# CSV Exercise

### Given a CSV of forme states for the *Shepheardes Calender*, determine whether 1586 was set from 1581

Here's the original question:

> Could you help me confirm that 1586 was set from 1581 and not from 1579?  I have before me one instance in which 1586 seems to revert to a reading in state 2 of a forme in 1579.  I’d like to confirm that this is a fluke and that in a preponderance of details 1586 reproduces 1581 and not 1579.

**n.b.: This is based on a private CSV from the Spenser Project. If you'd like to try this exercise, contact me for the file.**

In [1]:
# First let's import the libraries we will need

import csv
from collections import Counter

In [7]:
# Now let's open the file using the csv module

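with open("data/SC_simplified.csv", "r", newline="", encoding="utf-8") as csvfile: # newline="" is what the csv docs recommend; utf-8 keeps characters like ¶ and ſ intact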
    reader = csv.DictReader(csvfile) # We don't need to explicitly set the delimiter, because we know this is comma-separated
    tokens = [row for row in reader] # As before, turn our reader object into a list
    
print(tokens[:20])

[OrderedDict([('id', '1'), ('1579 signature', '¶1R'), ('1579', 'THE '), ('1581', 'THE '), ('1586', 'THE '), ('1591', 'THE '), ('1597', 'THE '), ('1611', 'THE ')]), OrderedDict([('id', '2'), ('1579 signature', '¶1R'), ('1579', 'Shepheardes'), ('1581', 'Shepheardes'), ('1586', 'Shepheardes'), ('1591', 'Shepheards'), ('1597', 'SHEPHEARDS'), ('1611', 'SHEPHEARDS')]), OrderedDict([('id', '3'), ('1579 signature', '¶1R'), ('1579', 'Calender'), ('1581', 'Calender'), ('1586', 'Calender,'), ('1591', 'Calender.'), ('1597', 'Calender:'), ('1611', 'CALENDER:')]), OrderedDict([('id', '4'), ('1579 signature', '¶1R'), ('1579', 'Conteyning'), ('1581', 'Conteining'), ('1586', 'Conteining'), ('1591', 'Conteining'), ('1597', 'CONTEYNING'), ('1611', 'CONTAINING')]), OrderedDict([('id', '6'), ('1579 signature', '¶1R'), ('1579', 'Æglogues'), ('1581', 'Æglogues'), ('1586', 'Æglogues'), ('1591', 'Aeglogues'), ('1597', 'Aeglogues,'), ('1611', 'ÆGLOGVES,')]), OrderedDict([('id', '7'), ('1579 signature', '¶1R'), 

In [12]:
"""
Now the trick is to loop through these and figure out
when 1586 is the same as 1579, and when it matches 1581.
We'll start some counters at 0.
"""

matches_1579 = 0
matches_1581 = 0

for token in tokens:
    # First, when does 1586 match 1579, BUT NOT 1581:
    if token["1586"] == token["1579"] and token["1586"] != token["1581"]:
        matches_1579 += 1 # If this is true, increment the counter by one
        
    # Second, when does 1586 match 1581, BUT NOT 1579:
    if token["1586"] == token["1581"] and token["1586"] != token["1579"]:
        matches_1581 += 1 # If this is true, increment the other counter by one
        
print("1586 follows 1579 {} times.".format(matches_1579))
print("1586 follows 1581 {} times.".format(matches_1581))


1586 follows 1579 549 times.
1586 follows 1581 1882 times.
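
The two counters above only cover tokens where 1579 and 1581 disagree; tokens where 1586 matches both earlier editions, or neither, are not counted. As a minimal sketch, the `Counter` imported in the first cell could tally all four cases at once. The function name `classify` and the list `sample_tokens` are illustrative assumptions; the four sample rows reuse readings printed elsewhere in this notebook and stand in for the full `tokens` list.

```python
from collections import Counter

def classify(token):
    """Label one row by which earlier edition(s) the 1586 reading agrees with."""
    same_1579 = token["1586"] == token["1579"]
    same_1581 = token["1586"] == token["1581"]
    if same_1579 and same_1581:
        return "both"
    if same_1579:
        return "1579 only"
    if same_1581:
        return "1581 only"
    return "neither"

# Hypothetical stand-in for `tokens`: a handful of rows in the same shape
sample_tokens = [
    {"id": "1", "1579": "THE ", "1581": "THE ", "1586": "THE "},
    {"id": "3", "1579": "Calender", "1581": "Calender", "1586": "Calender,"},
    {"id": "4", "1579": "Conteyning", "1581": "Conteining", "1586": "Conteining"},
    {"id": "259", "1579": "and", "1581": "&", "1586": "and"},
]

# Counter accepts any iterable of labels and counts each distinct one
agreement = Counter(classify(token) for token in sample_tokens)
for label, count in agreement.most_common():
    print("{}: {}".format(label, count))
```

Run against the real `tokens` list, the "1579 only" and "1581 only" totals should equal the two counters above, while "both" and "neither" show how much of the text carries no evidence either way.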


In [15]:
"""
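This result supports the hypothesis that 1586 was set from 1581:
where the two earlier editions differ, 1586 agrees with 1581 in
1882 cases and with 1579 in only 549. Still, 549 is a lot of
instances where it follows 1579 and differs from 1581.
Let's keep track of the readings themselves instead of just counting.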
"""

# First we need a function to remove multiple items from a dictionary (more on why in a minute)
# I modified this from here: https://stackoverflow.com/questions/5844672/delete-an-element-from-a-dictionary

def remove_item(dictionary, keys):
    copy_dictionary = dict(dictionary) # Create a copy of the dictionary
    for key in keys: # For every key that we want to remove
        copy_dictionary.pop(key) # Remove the item by its key, using the .pop() method
    return copy_dictionary # Return the copy, not the original
    
# Now let's do a modified version of our loop
# This time let's create lists
matchlist_1579 = []
matchlist_1581 = []

# Let's make a list of the years we don't care about, for our remove function

later_editions = ["1591", "1597", "1611"]

# Now let's loop
for token in tokens:
    # First we create a copy of the token that removes the years we don't care about:
    clean_token = remove_item(token, later_editions)
    # Then, when does 1586 match 1579, BUT NOT 1581:
    if clean_token["1586"] == clean_token["1579"] and clean_token["1586"] != clean_token["1581"]:
        # If this is true, add the token to our list
        matchlist_1579.append(clean_token)
        
    # Now, when does 1586 match 1581, BUT NOT 1579:
    if clean_token["1586"] == clean_token["1581"] and clean_token["1586"] != clean_token["1579"]:
        # If this is true, add the token to the other list
        matchlist_1581.append(clean_token)
        
# Note that the numbers are the same:

print("1586 follows 1579 {} times.".format(len(matchlist_1579)))
print("1586 follows 1581 {} times.".format(len(matchlist_1581)))

1586 follows 1579 549 times.
1586 follows 1581 1882 times.
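
As an aside, the `remove_item` helper could also be written as a dictionary comprehension. This is only a sketch of an alternative, not a correction: the name `drop_keys` is hypothetical, and unlike `.pop(key)` it silently ignores keys that are not present instead of raising a `KeyError`. The sample row reuses the readings for id 2 printed earlier.

```python
def drop_keys(dictionary, keys):
    """Return a new dict without the given keys; the original is left untouched."""
    unwanted = set(keys)  # set membership checks are fast
    return {key: value for key, value in dictionary.items() if key not in unwanted}

# Hypothetical stand-in for one row of `tokens`
row = {
    "id": "2",
    "1579": "Shepheardes",
    "1581": "Shepheardes",
    "1586": "Shepheardes",
    "1591": "Shepheards",
    "1597": "SHEPHEARDS",
    "1611": "SHEPHEARDS",
}

clean_row = drop_keys(row, ["1591", "1597", "1611"])
print(clean_row)
```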


In [16]:
# Our counts are the same, but now we can really examine those lists.
# Let's take a look at 1579

for token in matchlist_1579:
    print(token)

{'id': '104', '1579 signature': '¶1V', '1579': '[ſh]adow', '1581': 'ſ⁀hadowe', '1586': '[ſh]adow'}
{'id': '147', '1579 signature': '¶1V', '1579': 'name,', '1581': 'name', '1586': 'name,'}
{'id': '259', '1579 signature': '¶2R', '1579': 'and', '1581': '&', '1586': 'and'}
{'id': '286', '1579 signature': '¶2R', '1579': 'owne', '1581': 'ovvne', '1586': 'owne'}
{'id': '330', '1579 signature': '¶2R', '1579': 'and', '1581': '&', '1586': 'and'}
{'id': '372', '1579 signature': '¶2R', '1579': 'onely', '1581': 'only', '1586': 'onely'}
{'id': '378', '1579 signature': '¶2R', '1579': 'all,', '1581': 'al,', '1586': 'all,'}
{'id': '454', '1579 signature': '¶2R', '1579': 'which', '1581': 'vvhich', '1586': 'which'}
{'id': '571', '1579 signature': '¶2R', '1579': 'and', '1581': '&', '1586': 'and'}
{'id': '579', '1579 signature': '¶2R', '1579': '[ſt]ill', '1581': 'ﬅil', '1586': '[ſt]ill'}
{'id': '586', '1579 signature': '¶2R', '1579': 'needes', '1581': 'needs', '1586': 'needes'}
{'id': '596', '1579 signatur
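
Scrolling through hundreds of printed dictionaries is hard going. Since the original question is about formes, one option is to count how the reversions are distributed across the `1579 signature` column, and which substitutions recur most often. The sketch below is an assumption about how that might look: `sample_matchlist`, `by_signature`, and `by_variant` are illustrative names, and the five rows are copied from the output above to stand in for the full `matchlist_1579`.

```python
from collections import Counter

# Hypothetical stand-in for `matchlist_1579`, using rows printed above
sample_matchlist = [
    {"id": "104", "1579 signature": "¶1V", "1579": "[ſh]adow", "1581": "ſ⁀hadowe", "1586": "[ſh]adow"},
    {"id": "147", "1579 signature": "¶1V", "1579": "name,", "1581": "name", "1586": "name,"},
    {"id": "259", "1579 signature": "¶2R", "1579": "and", "1581": "&", "1586": "and"},
    {"id": "330", "1579 signature": "¶2R", "1579": "and", "1581": "&", "1586": "and"},
    {"id": "571", "1579 signature": "¶2R", "1579": "and", "1581": "&", "1586": "and"},
]

# How many reversions to the 1579 reading fall on each signature?
by_signature = Counter(token["1579 signature"] for token in sample_matchlist)

# Which (1581 reading, 1586 reading) substitutions recur?
by_variant = Counter((token["1581"], token["1586"]) for token in sample_matchlist)

print(by_signature.most_common())
print(by_variant.most_common(3))
```

If a few substitutions such as `&` for `and` account for most of the list, that may point to compositorial habit in 1586 rather than consultation of 1579, though that is a judgment for the bibliographer and not something the counts can settle.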

In [36]:
# Here's a secret for cleaner output and a preview of a later lesson
# Just use pandas! (You'll have to install it first)

import pandas as pd

df = pd.DataFrame(matchlist_1579) # Turn our list into a "dataframe"

pd.set_option('display.max_rows', len(df)) # This just makes sure pandas doesn't truncate the data
df # The only time you *shouldn't* use the print() function

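|   | 1579 | 1579 signature | 1581 | 1586 | id |
|---|------|----------------|------|------|----|
| 0 | [ſh]adow | ¶1V | ſ⁀hadowe | [ſh]adow | 104 |
| 1 | name, | ¶1V | name | name, | 147 |
| 2 | and | ¶2R | & | and | 259 |
| 3 | owne | ¶2R | ovvne | owne | 286 |
| 4 | and | ¶2R | & | and | 330 |
| 5 | onely | ¶2R | only | onely | 372 |
| 6 | all, | ¶2R | al, | all, | 378 |
| 7 | which | ¶2R | vvhich | which | 454 |
| 8 | and | ¶2R | & | and | 571 |
| 9 | [ſt]ill | ¶2R | ﬅil | [ſt]ill | 579 |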


In [40]:
# We can output both of these as CSVs to take a look at them in a spreadsheet later

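with open("matchlist_1579.csv", "w", newline="", encoding="utf-8") as csv79: # newline="" prevents blank rows on Windows; utf-8 preserves ¶, ſ, etc.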
    fieldnames = list(matchlist_1579[0].keys()) # Get the keys from one of our dictionaries as fieldnames
    writer = csv.DictWriter(csv79, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(matchlist_1579)
    
with open("matchlist_1581.csv", "w", newline="", encoding="utf-8") as csv81: # Same settings as above
    fieldnames = list(matchlist_1581[0].keys()) # Get the keys from one of our dictionaries as fieldnames
    writer = csv.DictWriter(csv81, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(matchlist_1581)   
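
The two `with` blocks above differ only in the filename and the list being written, so they could be folded into one small function. This is a minimal sketch under some assumptions: `write_rows` is a hypothetical helper name, it expects a non-empty list of dictionaries that share the same keys, and the example writes to the system temp directory so as not to touch the real output files.

```python
import csv
import os
import tempfile

def write_rows(path, rows):
    """Write a non-empty list of same-keyed dictionaries to `path` as CSV."""
    fieldnames = list(rows[0].keys())  # Column order follows the first row
    # newline="" is recommended by the csv docs; utf-8 keeps ¶, ſ, etc. intact
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical stand-in for one of the match lists
sample_rows = [
    {"id": "259", "1579 signature": "¶2R", "1579": "and", "1581": "&", "1586": "and"},
    {"id": "286", "1579 signature": "¶2R", "1579": "owne", "1581": "ovvne", "1586": "owne"},
]

sample_path = os.path.join(tempfile.gettempdir(), "sample_matchlist.csv")
write_rows(sample_path, sample_rows)

# Read the file back to check that nothing was lost on the way out
with open(sample_path, "r", newline="", encoding="utf-8") as csvfile:
    round_trip = [dict(row) for row in csv.DictReader(csvfile)]
print(round_trip == sample_rows)
```

In the notebook itself the calls would then be `write_rows("matchlist_1579.csv", matchlist_1579)` and `write_rows("matchlist_1581.csv", matchlist_1581)`.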