# An analysis of thirty years of articles in the _William and Mary Quarterly_

The _William and Mary Quarterly_ is the premier scholarly journal in early American studies and history.  It is published by the [Omohundro Institute of Early American History and Culture](https://oieahc.wm.edu).

The purpose of this experiment is to work with a dataset provided by the [JSTOR API](http://dfr.jstor.org): an XML file of metadata for roughly 1,000 WMQ articles published from 1976 through 2010. The dataset has this size because the API lets researchers request data for at most 1,000 articles at a time, and my search returned n = 973 articles.

My goal in this project is to perform some basic text analysis, along with other kinds of analysis, on this dataset using some rudimentary Python code.

## First steps
The JSTOR API sent me a series of XML files, which I unzipped and stored in a folder (called ```data```) in the same directory/repository as this Jupyter notebook. My first experiment is to use ElementTree to parse some of this data.
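Because JSTOR delivered several files, it can help to list what actually sits in ```data``` from inside the notebook. This is a minimal sketch: ```list_xml_files``` is a hypothetical helper name, and apart from ```citations.xml``` the file names are not known here.

```python
from pathlib import Path


def list_xml_files(directory="data"):
    """Return the sorted names of the .xml files directly inside `directory`.

    `list_xml_files` is a hypothetical helper, not part of any library.
    """
    return sorted(p.name for p in Path(directory).glob("*.xml"))


# Only run the demo if the notebook's data folder is actually present.
if Path("data").is_dir():
    print(list_xml_files("data"))
```

Only files ending in ```.xml``` are returned, so stray files such as ```.DS_Store``` are skipped.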

In [2]:
import xml.etree.ElementTree as ET

In [3]:
tree = ET.parse('data/citations.xml')
root = tree.getroot()

In [4]:
root.tag

'citations'

## Success
The above code imports ElementTree, Python's standard-library module for XML parsing, under the alias ```ET```. Then I feed the ```citations.xml``` file into ```ET.parse()``` and store the resulting ElementTree object in the ```tree``` variable. Finally, ```getroot()``` returns the root element, and its ```tag``` (```'citations'```) confirms that the file was parsed successfully.
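The same steps can be tried without the real file by parsing a small string. This sketch assumes a simplified record layout: the child tags mirror the notebook output, but the ```<article>``` wrapper tag and the placeholder values are assumptions, not real JSTOR data.

```python
import xml.etree.ElementTree as ET

# A tiny hand-made stand-in for data/citations.xml (placeholder values).
SAMPLE = """<citations>
  <article>
    <doi>10.2307/example</doi>
    <title>Trivia</title>
    <pubdate>1984-01-01T00:00:00Z</pubdate>
  </article>
</citations>"""

# ET.fromstring() parses a string and returns the root Element directly,
# while ET.parse() reads a file and returns an ElementTree wrapper,
# which is why the file-based code needs tree.getroot().
root = ET.fromstring(SAMPLE)
tree = ET.ElementTree(root)    # wrap it to mirror the file-based workflow

print(tree.getroot() is root)  # True
print(root.tag)                # citations
print(len(root))               # 1 (number of direct child elements)
print(root[0].tag)             # article
```

Whitespace between tags is stored as text, not as extra children, so ```len(root)``` counts only the records.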

## Next step
Now that the XML has been parsed into an ElementTree object, we can experiment with the module's various functions, for instance by writing a loop that extracts the titles of the articles.

In [17]:
for child in root[0]:
    print(child)
    

<Element 'doi' at 0x104b498b8>
<Element 'title' at 0x104b49908>
<Element 'author' at 0x104b49958>
<Element 'journaltitle' at 0x104b499a8>
<Element 'volume' at 0x104b49a48>
<Element 'issue' at 0x104b49a98>
<Element 'pubdate' at 0x104b49ae8>
<Element 'pagerange' at 0x104b49b38>
<Element 'publisher' at 0x104b49b88>
<Element 'type' at 0x104b49bd8>
<Element 'reviewed-work' at 0x104b49c28>
<Element 'abstract' at 0x104b49c78>


Problem: I cannot figure out how to extract the contents of this XML. The process is similar to Beautiful Soup, where you work your way down the tree, and I seem to reach the right child elements, but nothing useful prints out. I have looked at the following pages:

- [stack overflow](http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python)
- p. 156 in Severance.
- [official Python documentation](https://docs.python.org/2/library/xml.etree.elementtree.html)

I am a bit stuck, so I'll take a break. My sense is that I'm almost there; ElementTree should not be very hard.
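One likely reason nothing useful printed: ```print(child)``` shows the Element object itself (its repr, as in the output above), not its content. The tag name and the enclosed text live in separate attributes, ```.tag``` and ```.text```. A minimal illustration with a made-up one-record string:

```python
import xml.etree.ElementTree as ET

# Made-up single record; the real file has more child elements.
record = ET.fromstring(
    "<article><title>Trivia</title><pubdate>1984-01-01T00:00:00Z</pubdate></article>"
)

for child in record:
    # print(child) would show only <Element 'title' at 0x...>;
    # .tag is the element name and .text is the text between its tags.
    print(child.tag, "->", child.text)
# title -> Trivia
# pubdate -> 1984-01-01T00:00:00Z
```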

## Possible Breakthrough
I think I have found a good section on this problem in the book _Dive Into Python_. I will return to it and work on this next.

In [24]:
for child in root:
    tit = child[1].text     # second child element: <title>
    print(tit)
    pubdat = child[6].text  # seventh child element: <pubdate>
    print(pubdat)

Revolution, Domestic Life, and the End of "Common Mercy" in Crévecoeur's "Landscapes"
1998-04-01T00:00:00Z
Trivia
1984-01-01T00:00:00Z
Interpretive Frameworks: The Quest for Intellectual Order in Early American History
1991-10-01T00:00:00Z
Van der Donck's Description of the Indians: Additions and Corrections
1990-07-01T00:00:00Z
Mapping an Empire: Cartographic and Colonial Rivalry in Seventeenth-Century Dutch and English North America
1997-07-01T00:00:00Z
Indians, the Colonial Order, and the Social Significance of the American Revolution
1996-04-01T00:00:00Z
Who Wrote "The North American" Essays?
1997-04-01T00:00:00Z
Women and Property across Colonial America: A Comparison of Legal Systems in New Mexico and New York
2003-04-01T00:00:00Z
Reason and Compromise in the Establishment of the Federal Constitution, 1787-1801
1987-07-01T00:00:00Z
The Statutory Law of Slavery and Race in the Thirteen Mainland Colonies of British America
1977-04-01T00:00:00Z
Shipping Patterns and the Atlantic Tra
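The ```pubdate``` strings above all share the form ```YYYY-MM-DDT00:00:00Z```, so they can be turned into years for counting articles over time. This is a sketch using only the standard library; ```pub_year``` is a hypothetical helper name, and the five dates are copied from the output above.

```python
from collections import Counter
from datetime import datetime


def pub_year(pubdate_text):
    """Return the year of a pubdate string such as '1998-04-01T00:00:00Z'.

    `pub_year` is a hypothetical helper; the trailing 'Z' is matched literally.
    """
    return datetime.strptime(pubdate_text, "%Y-%m-%dT%H:%M:%SZ").year


# Dates copied from the notebook output above.
dates = [
    "1998-04-01T00:00:00Z",
    "1984-01-01T00:00:00Z",
    "1991-10-01T00:00:00Z",
    "1990-07-01T00:00:00Z",
    "1997-07-01T00:00:00Z",
]

print(pub_year(dates[0]))  # 1998

# Group by decade: 1998 // 10 * 10 == 1990.
by_decade = Counter(pub_year(d) // 10 * 10 for d in dates)
print(by_decade)  # Counter({1990: 4, 1980: 1})
```

Using ```strptime``` rather than slicing the first four characters means a malformed date raises an error instead of silently producing a wrong year.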

## Moving on...
This is one way to parse: working "from the top down." Because an Element behaves like a list of its children, you can locate elements by their index numbers, as above. In the example above, the root is ```citations```, each child of the root (```root[0]```, ```root[1]```, ...) is an article record, and we drill into each record with a for loop:
```
for article in root:
    titl = article[1].text  # index 1 is <title>; index 3 would be <journaltitle>
```

Or whichever index holds the field you need.
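Positional indexing only works if every record lists its children in the same order. In the child list printed earlier, index 1 is ```title```, index 3 is ```journaltitle``` and index 6 is ```pubdate```; if a record had, say, more than one ```author``` element (not verified for this dataset), every later index would shift. Looking children up by tag name avoids that. A sketch with a made-up record that assumes two authors:

```python
import xml.etree.ElementTree as ET

# Made-up record with two <author> elements (an assumption, for illustration).
record = ET.fromstring(
    "<article><doi>10.2307/example</doi><title>Trivia</title>"
    "<author>A. Author</author><author>B. Author</author>"
    "<journaltitle>The William and Mary Quarterly</journaltitle>"
    "<volume>41</volume><issue>1</issue>"
    "<pubdate>1984-01-01T00:00:00Z</pubdate></article>"
)

# Positional access: index 6 is no longer <pubdate> because of the extra author.
print(record[6].tag)                               # issue

# find() and findtext() look children up by tag name instead.
print(record.find("title").text)                   # Trivia
print(record.findtext("pubdate"))                  # 1984-01-01T00:00:00Z
print(record.findtext("abstract", default="n/a"))  # n/a (tag missing)

# findall() returns every matching child, so both authors are kept.
print([a.text for a in record.findall("author")])  # ['A. Author', 'B. Author']
```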

But there's another way to do it: call the ```findall()``` method with a path expression, which collects every matching element in the tree without indexing by position.

In [51]:
alltitls = tree.findall('.//title')
print(alltitls[0:10])

[<Element 'title' at 0x104b49908>, <Element 'title' at 0x104b49d68>, <Element 'title' at 0x104b4e228>, <Element 'title' at 0x104b4e688>, <Element 'title' at 0x104b4eae8>, <Element 'title' at 0x104b4ef48>, <Element 'title' at 0x104b54408>, <Element 'title' at 0x104b54868>, <Element 'title' at 0x104b54cc8>, <Element 'title' at 0x104b58188>]


In [53]:
for titl in alltitls[0:20]:
    print(titl.text)

Revolution, Domestic Life, and the End of "Common Mercy" in Crévecoeur's "Landscapes"
Trivia
Interpretive Frameworks: The Quest for Intellectual Order in Early American History
Van der Donck's Description of the Indians: Additions and Corrections
Mapping an Empire: Cartographic and Colonial Rivalry in Seventeenth-Century Dutch and English North America
Indians, the Colonial Order, and the Social Significance of the American Revolution
Who Wrote "The North American" Essays?
Women and Property across Colonial America: A Comparison of Legal Systems in New Mexico and New York
Reason and Compromise in the Establishment of the Federal Constitution, 1787-1801
The Statutory Law of Slavery and Race in the Thirteen Mainland Colonies of British America
Shipping Patterns and the Atlantic Trade of Bristol, 1749-1770
When in the Course
Lord Cornbury Redressed: The Governor and the Problem Portrait
Governors or Generals?: A Note on Martial Law and the Revolution of 1689 in English America
The Evoluti
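One caution about ```'.//title'```: the ```//``` step matches a ```<title>``` at any depth, not only directly under each record. If a ```<reviewed-work>``` element carried its own nested ```<title>``` (an assumption; the file's contents were not checked for this), those titles would be mixed into the list. Restricting the path to direct children of each record avoids that, as this made-up sample shows:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: a review whose <reviewed-work> nests its own <title>.
SAMPLE = """<citations>
  <article>
    <title>Review of a Book</title>
    <reviewed-work><title>The Reviewed Book</title></reviewed-work>
  </article>
  <article>
    <title>Trivia</title>
  </article>
</citations>"""

root = ET.fromstring(SAMPLE)

# './/title' searches every depth, so the nested title is included too.
anywhere = [t.text for t in root.findall(".//title")]
# './*/title' only matches a <title> that is a direct child of a record.
record_titles = [t.text for t in root.findall("./*/title")]

print(anywhere)       # ['Review of a Book', 'The Reviewed Book', 'Trivia']
print(record_titles)  # ['Review of a Book', 'Trivia']
```

Comparing ```len()``` of the two lists against the 973 records is a quick way to check whether the real file has this issue.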

In [54]:
! ls -a

[34m.[m[m                  [34m.git[m[m               README.md          test.md
[34m..[m[m                 .gitignore         WMQ_analysis.ipynb
.DS_Store          [34m.ipynb_checkpoints[m[m [34mdata[m[m
