### **To submit on Canvas for DH_assignment4**: 
1. This notebook with your additions (see the last cell, **Your Assignment**, for what to do)
2. Your notes from the reading part of the assignment.

## Tutorial: Intro to Dataframes in Pandas
In this notebook we'll run through some basic examples of what you can do in pandas. If you are new to programming, parts of it will probably seem overwhelming, but that's OK! Just absorb as much as you can in a reasonable amount of time, and make a note of anything you find confusing. One of the best ways to learn programming is to dive right into the "deep end": use real tools, with real data, to do something you're interested in. We will not evaluate your "coding skills" in any way, and our hope is that you simply enjoy the process of learning.

Here are some resources:

1. [This](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf) is the most helpful pandas "cheatsheet" (reference sheet) that I know of. It will probably be helpful to download it and refer to it as you go through the lesson.

2. A basic tutorial on pandas by Chris Potts from Stanford's Programming for Linguists class ([html version](http://web.stanford.edu/class/linguist278/notes/ling278_class12.html), [jupyter notebook version](http://web.stanford.edu/class/linguist278/notes/ling278_class12.ipynb)). Most of the tutorial below is adapted from this.

3. [10 minutes to pandas](http://pandas-docs.github.io/pandas-docs-travis/getting_started/10min.html). Note that despite its friendly title, this takes longer than 10 minutes, and it is geared mainly toward people with some programming experience (though you can still learn from it if you are new!).

4. [pandas User Guide](http://pandas-docs.github.io/pandas-docs-travis/user_guide/index.html), helpfully organized by topic.


As you go through the tutorial, it may also help to upload the data to a Google Sheets page so that you can scroll around in it manually and compare it with what the pandas operations do.


### Setup

In [45]:
%matplotlib inline
import pandas as pd
import os

First, check what your "present working directory" (pwd) is by running the code cell below (to run a cell, press [shift] + [enter/return]). The present working directory is the place in your computer's file system where Python starts looking for any file you try to load with a relative path. By default it is usually your home directory (Jupyter actually starts in the folder that contains the notebook), but you can change it to any folder you want (for example your "Documents" folder) using Python's [os module](https://docs.python.org/3/library/os.html). In this class, for simplicity, we'll stick with the home directory as our present working directory. For example, on my computer (a MacBook), the home directory is called "nicholasgardner" and has a house icon next to it. When I run `%pwd` as below, I get the string '/Users/nicholasgardner', which is the filepath to my home directory.

In [9]:
%pwd

'/Users/nicholasgardner'
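`%pwd` is an IPython "magic" command, so it only works inside Jupyter or IPython. If you ever run this code as an ordinary Python script, a minimal equivalent uses the `os` module, as sketched below. The `Documents` path in the commented-out `os.chdir` line is only an example.

```python
import os

# Plain-Python equivalent of the %pwd magic: the current working directory
current_dir = os.getcwd()
print(current_dir)

# Your home directory (e.g. '/Users/nicholasgardner' on the author's Mac)
home_dir = os.path.expanduser("~")
print(home_dir)

# To change the working directory (example path; uncomment to use):
# os.chdir(os.path.join(home_dir, "Documents"))
```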

Let's import all(!) the data in the Grand Tour Explorer into pandas' spreadsheet-like objects, which are called "dataframes". Since GTE exports include three .tsv files (Travelers_Itineraries, Travelers_Life_Events, and Travelers) we will create a dataframe for each and assign it an informative variable name.


Here we use Python's os module to create variables holding the filepaths of the three .tsv files we want to load. Filepaths are just strings, but it is a good idea to build them with the `os.path.join()` function, which inserts the correct folder separator for your operating system (`/` on macOS and Linux, `\` on Windows). Before you create the filepaths by running the cell below, do the following:
1. create a folder named 'GTE_exports' in your present working directory (e.g. your home directory)
2. create a folder named 'all' within this 'GTE_exports' folder
3. place the three .tsv files downloaded from Canvas in this 'all' folder



In [12]:
TRAVELERS_FP = os.path.join("GTE_exports", "all", "Travelers_all.tsv")
TRAVELERS_ITINERARIES_FP = os.path.join("GTE_exports", "all", "Travelers_Itineraries_all.tsv")
TRAVELERS_LIFE_EVENTS_FP = os.path.join("GTE_exports", "all", "Travelers_Life_Events_all.tsv")

In [13]:
#execute this cell to see that filepaths are just strings
TRAVELERS_FP

'GTE_exports/all/Travelers_all.tsv'
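Before loading anything, it can help to confirm that the three files are where the code expects them. The sketch below uses a small helper of our own, `check_files` (a hypothetical name, not part of pandas or `os`), built on `os.path.exists`. A file reported as MISSING usually means the folders from the steps above are named differently, or the notebook is running from another directory.

```python
import os


def check_files(filepaths):
    """Hypothetical helper: map each filepath to True if the file exists."""
    return {fp: os.path.exists(fp) for fp in filepaths}


# The same relative paths the notebook builds above
expected = [
    os.path.join("GTE_exports", "all", "Travelers_all.tsv"),
    os.path.join("GTE_exports", "all", "Travelers_Itineraries_all.tsv"),
    os.path.join("GTE_exports", "all", "Travelers_Life_Events_all.tsv"),
]

for fp, found in check_files(expected).items():
    print(fp, "found" if found else "MISSING")
```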


In the cell below we load the .tsv files.

We set the `sep` (separator) argument of pandas' `read_csv()` function to `"\t"` because we are loading .tsv (tab-separated values) files; `"\t"` represents the tab character. As its name indicates, `read_csv()` defaults to reading .csv (comma-separated values) files. If we wanted to state explicitly that we were reading a .csv file, we could pass the argument `sep=','` to indicate that our file uses a comma to separate the values within a row.

We set index_col of our dataframes to be the traveler's ID number (their unique identifier in the GTE database),
so that we can easily and unambiguously refer to their row entries. The traveler's ID number is named "index"
in the travelers .tsv file, while it is named "entryID" in the travelers_itineraries and travelers_life_events
files, so we need to pass the arguments `index_col="index"` and `index_col="entryID"` respectively.
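To see what `sep` and `index_col` do without the real files, here is a sketch that reads a tiny, made-up tab-separated table from a string (`io.StringIO` lets pandas treat a string as if it were a file). The rows are copied from the Travelers output shown further below, trimmed to three columns. Leaving out `sep="\t"` makes pandas look for commas, find none, and put each whole line into a single column.

```python
import io

import pandas as pd

# A tiny tab-separated table held in a string ("\t" is the tab character)
tsv_text = (
    "index\ttravelerNames\tgender\n"
    "0.0\tCharles Abbot\tMale\n"
    "1.0\tJohn Farr Abbot\tMale\n"
    "1.1\tMary Pearce\tFemale\n"
)

# sep="\t" splits on tabs; index_col="index" turns the ID column into row labels
toy = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col="index")
print(toy)
print(toy.loc[0.0, "travelerNames"])   # Charles Abbot

# Without sep, read_csv splits on commas, so each line stays one single column
wrong = pd.read_csv(io.StringIO(tsv_text))
print(wrong.shape)                     # (3, 1)
```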

In [108]:
travelers_all = pd.read_csv(
    TRAVELERS_FP, 
    sep="\t", 
    index_col="index")

travelers_itineraries_all = pd.read_csv(
    TRAVELERS_ITINERARIES_FP, 
    sep="\t", 
    index_col="entryID")


travelers_life_events_all = pd.read_csv(
    TRAVELERS_LIFE_EVENTS_FP, 
    sep="\t", 
    index_col="entryID")

### Inspecting our data

In [107]:
#let's check the 'shape' (row x column dimensions) of each of our dataframes.

travelers_all.shape #(6005,12) because there are 6005 person entries (rows) in the database, with 12 data fields (columns)

(6005, 12)

In [109]:
travelers_itineraries_all.shape

(27275, 16)

In [110]:
travelers_life_events_all.shape

(10041, 11)
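Because the itineraries file has one row per stop rather than one row per person, its 27275 rows are not 27275 travelers. The sketch below uses an invented miniature itinerary to show how to unpack `.shape` and how `index.nunique()` counts distinct traveler IDs. On the real data, `travelers_itineraries_all.index.nunique()` should give the number of travelers with at least one itinerary entry.

```python
import pandas as pd

# Invented miniature itinerary: several rows (stops) per traveler ID
itin = pd.DataFrame(
    {"travelerNames": ["Charles Abbot", "Charles Abbot", "Charles Abbot", "John Farr Abbot"],
     "travelPlace": ["Dover", "Turin", "Genoa", "Verona"]},
    index=pd.Index([0.0, 0.0, 0.0, 1.0], name="entryID"),
)

n_rows, n_cols = itin.shape      # shape is a (rows, columns) tuple
print(n_rows, n_cols)            # 4 2
print(itin.columns.tolist())     # ['travelerNames', 'travelPlace']
print(itin.index.nunique())      # 2 distinct traveler IDs
```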

In [111]:
#now let's look at the first ten entries of each dataframe to get a visual sense of what the
#data looks like, and to see what the columns are named in each of the three files
travelers_all.head(10)

| index | travelerNames | gender | birthDate | deathDate | birthPlace | deathPlace | parents | sources | eventsIndex | matchedMentions | unmatchedMentions | matchedMentionsEntryIndexes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Charles Abbot | Male | 1757.0 | 1829.0 |  |  | John Abbot of Colchester, Essex, and Sarah, la... | Abbot jnl.MSS | 1234567 | Oswald Leycester;Henry Bankes;John Mitford;Lou... | Angelo Dalmazzoni;Piranesi;Alessandro Cades;Lu... | 2948,220,3361,1536,3266,1164,2219,2205,2947,88... |
| 1.0 | John Farr Abbot | Male | 1756.0 | 1794.0 |  |  | John Abbot of Colchester, Essex |  | 89 | Sarah Bentham |  | 366 |
| 1.1 | Mary Pearce | Female | 1762.0 | 1793.0 |  |  |  |  | 10 | Sarah Bentham;John Farr Abbot |  | 3661 |
| 2.0 | Edward Abbott | Male | 1737.0 | 1791.0 |  |  |  | Redgrave 1878 | 11 | William Wynne Ryland |  | 4190 |
| 3.0 | John Abbott | Male |  |  |  |  |  |  | 12 | Francis Harriman |  | 2266 |
| 4.0 | Abbott | Male |  |  |  |  |  | Martin jnl.MSS | NaN |  |  |  |
| 6.0 | Maj. Abercrombie | Male | 1706.0 | 1781.0 |  |  |  | Martin jnl.MSS | 13141516 | James Martin;Edward Gibbon;William Guise;Col. ... |  | 3208193521494269 |
| 7.0 | Abercromby | Male |  |  |  |  |  |  | NaN | John, Lord Carmichael |  | 808 |
| 8.0 | Sir Ralph Abercromby | Male | 1734.0 | 1801.0 |  |  | George Abercromby |  | 17181920212223242526 | George, Baron Keith |  | 2732 |
| 9.0 | Aberdeen | Male |  |  |  |  |  | ASV ,Forbes MSS,Gazz.Tosc. | NaN | Margaret Forbes;Parnell |  | 17733742 |


In [112]:
travelers_itineraries_all.head(10)

| entryID | travelerNames | birthDate | deathDate | gender | travelPlace | coordinates | startDate | endDate | startYear | startMonth | startDay | endYear | endMonth | endDay | markers | travelIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Dover | 51.1275,1.312222222 | 1788-8-26 | 1788-8-26 | 1788.0 | 8.0 | 26.0 | 1788.0 | 8.0 | 26.0 | dep. | 0 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Turin | 45.06666667,7.7 | 1788-9-10 | 1788-9-10 | 1788.0 | 9.0 | 10.0 | 1788.0 | 9.0 | 10.0 |  | 1 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Genoa | 44.40718611,8.933983333 | 1788-9-13 | 1788-9-13 | 1788.0 | 9.0 | 13.0 | 1788.0 | 9.0 | 13.0 |  | 2 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Leghorn | 43.55,10.31666667 | 1788-9-01 | 1788-9-01 | 1788.0 | 9.0 |  | 1788.0 | 9.0 |  |  | 3 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Pisa | 43.70853,10.4036 | 1788-9-01 | 1788-9-01 | 1788.0 | 9.0 |  | 1788.0 | 9.0 |  |  | 4 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Florence | 43.77138889,11.25416667 | 1788-9-15 | 1788-9-18 | 1788.0 | 9.0 | 15.0 | 1788.0 | 9.0 | 18.0 |  | 5 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Perugia | 43.1121,12.3888 | 1788-9-20 | 1788-9-20 | 1788.0 | 9.0 | 20.0 | 1788.0 | 9.0 | 20.0 |  | 6 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Narni | 42.517799,12.51640034 | 1788-9-01 | 1788-9-01 | 1788.0 | 9.0 |  | 1788.0 | 9.0 |  |  | 7 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Rome | 41.89305556,12.48277778 | 1788-9-21 | 1788-9-28 | 1788.0 | 9.0 | 21.0 | 1788.0 | 9.0 | 28.0 |  | 8 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Naples | 40.840141,14.25226021 | 1788-9-30 | 1788-10-6 | 1788.0 | 9.0 | 30.0 | 1788.0 | 10.0 | 6.0 |  | 9 |


In [113]:
travelers_life_events_all.head(10)

| entryID | travelerNames | birthDate | deathDate | gender | lifeEvents | eventsDetail1 | eventsDetail2 | place | startDate | endDate | eventsIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | marriage | 1 | Elizabeth Gibbes |  | 1796.0 |  | 1 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | education |  | Westminster School | Westminster, London |  |  | 2 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | education |  | Church Christ College, Oxford | Oxford | 1775.0 |  | 3 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | education |  | Middle Temple | London |  |  | 4 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | education |  | Lincoln's Inn | London | 1785.0 |  | 5 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | occupation | Law | called to the bar |  | 1783.0 |  | 6 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | occupation | Statesmen and Political Appointees | Member of Parliament |  | 1795.0 | 1817.0 | 7 |
| 1.0 | John Farr Abbot | 1756.0 | 1794.0 | Male | marriage | 1 | Mary Pearce |  | 1786.0 |  | 8 |
| 1.0 | John Farr Abbot | 1756.0 | 1794.0 | Male | occupation | Law | clerk of the rules King's Bench |  | 1790.0 | 1794.0 | 9 |
| 1.1 | Mary Pearce | 1762.0 | 1793.0 | Female | marriage | 1 | John Farr Abbot |  | 1786.0 |  | 10 |


In [114]:
#we can pull out the data of a single column in our dataframe. For example, we can look at
#the 'travelerNames' column in the Travelers.tsv file to see the name that goes with each traveler ID.
travelers_all['travelerNames']

index
0.0                                 Charles Abbot
1.0                               John Farr Abbot
1.1                                   Mary Pearce
2.0                                 Edward Abbott
3.0                                   John Abbott
4.0                                        Abbott
6.0                              Maj. Abercrombie
7.0                                    Abercromby
8.0                          Sir Ralph Abercromby
9.0                                      Aberdeen
10.0                             George Abernethy
12.0      Willoughby Bertie, 4th Earl of Abingdon
13.0                                   Col. Abram
15.0                                     Ackmooty
16.0                             John Dyke Acland
17.0                                   John Acton
18.0                    John Francis Edward Acton
18.1                          Joseph Edward Acton
18.2                              Mary Anne Acton
18.3               Ferdinand Richard Edward 
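Selecting a single column with single brackets, as above, returns a pandas `Series` (one labeled column that keeps the traveler-ID index). Wrapping the column name in a list instead returns a one-column `DataFrame`. A small sketch on three rows taken from the output above:

```python
import pandas as pd

travelers = pd.DataFrame(
    {"travelerNames": ["Charles Abbot", "John Farr Abbot", "Mary Pearce"],
     "gender": ["Male", "Male", "Female"]},
    index=pd.Index([0.0, 1.0, 1.1], name="index"),
)

names = travelers["travelerNames"]        # single brackets -> a Series
names_df = travelers[["travelerNames"]]   # a list inside the brackets -> a one-column DataFrame
print(type(names).__name__)               # Series
print(type(names_df).__name__)            # DataFrame

# A Series can still be looked up by traveler ID
print(names.loc[1.1])                     # Mary Pearce
```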

In [115]:
#or we can look at the travel places corresponding to each travel event (ordered by traveler ID) in the
#Travelers_Itineraries file
travelers_itineraries_all['travelPlace']

entryID
0.0          Dover
0.0          Turin
0.0          Genoa
0.0        Leghorn
0.0           Pisa
0.0       Florence
0.0        Perugia
0.0          Narni
0.0           Rome
0.0         Naples
0.0           Rome
0.0          Siena
0.0       Florence
0.0        Bologna
0.0         Venice
0.0        Vicenza
0.0         Verona
0.0         London
1.0        England
1.0         Verona
1.0        Vicenza
1.0          Padua
1.0         Venice
1.0          Padua
1.0        Ferrara
1.0        Bologna
1.0       Florence
1.0          Siena
1.0        Viterbo
1.0           Rome
            ...   
5285.0        Rome
5285.0     Bologna
5285.0      Venice
5285.0       Padua
5285.0     Vicenza
5285.0      Verona
5285.0      Mantua
5285.0       Parma
5285.0       Milan
5285.0       Pavia
5286.0        Rome
5286.0        Rome
5286.0        Rome
5286.0    Florence
5286.0      Naples
5286.0      Naples
5286.0      Naples
5286.0      Naples
5286.0    Florence
5286.1    Florence
5286.2    Florence
5286

In [116]:
#we can pass a list of multiple columns and see the data in all of them
travelers_itineraries_all[ ['travelerNames', 'travelPlace', 'startMonth','endMonth'] ]

| entryID | travelerNames | travelPlace | startMonth | endMonth |
|---|---|---|---|---|
| 0.0 | Charles Abbot | Dover | 8.0 | 8.0 |
| 0.0 | Charles Abbot | Turin | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Genoa | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Leghorn | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Pisa | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Florence | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Perugia | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Narni | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Rome | 9.0 | 9.0 |
| 0.0 | Charles Abbot | Naples | 9.0 | 10.0 |
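Once you have a column subset like this, you might want to look at it in Google Sheets, as suggested at the top of the notebook. One way, sketched below on rows adapted from the output above, is `DataFrame.to_csv` with `sep="\t"`. Called without a filepath, it returns the text instead of writing a file. The filename `my_subset.tsv` is hypothetical.

```python
import pandas as pd

itin = pd.DataFrame(
    {"travelerNames": ["Charles Abbot", "Charles Abbot", "John Farr Abbot"],
     "travelPlace": ["Dover", "Turin", "Verona"],
     "startMonth": [8.0, 9.0, None]},   # None becomes NaN (unknown)
    index=pd.Index([0.0, 0.0, 1.0], name="entryID"),
)

# Keep only the columns we care about (a list inside the brackets)
subset = itin[["travelerNames", "travelPlace"]]

# With no path given, to_csv returns the tab-separated text as a string
tsv_text = subset.to_csv(sep="\t")
print(tsv_text)

# To write a file you could upload to Google Sheets (hypothetical filename):
# subset.to_csv("my_subset.tsv", sep="\t")
```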


### Getting specific rows

We can use the `.loc[]` indexer with a traveler ID, as in `.loc[TRAVELER_ID]`, to view the rows of data associated with that traveler. For example, Charles Abbot has traveler ID 0.0, so we can view the rows corresponding to his itinerary as follows.

In [117]:
travelers_itineraries_all.loc[0.0]

| entryID | travelerNames | birthDate | deathDate | gender | travelPlace | coordinates | startDate | endDate | startYear | startMonth | startDay | endYear | endMonth | endDay | markers | travelIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Dover | 51.1275,1.312222222 | 1788-8-26 | 1788-8-26 | 1788.0 | 8.0 | 26.0 | 1788.0 | 8.0 | 26.0 | dep. | 0 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Turin | 45.06666667,7.7 | 1788-9-10 | 1788-9-10 | 1788.0 | 9.0 | 10.0 | 1788.0 | 9.0 | 10.0 |  | 1 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Genoa | 44.40718611,8.933983333 | 1788-9-13 | 1788-9-13 | 1788.0 | 9.0 | 13.0 | 1788.0 | 9.0 | 13.0 |  | 2 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Leghorn | 43.55,10.31666667 | 1788-9-01 | 1788-9-01 | 1788.0 | 9.0 |  | 1788.0 | 9.0 |  |  | 3 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Pisa | 43.70853,10.4036 | 1788-9-01 | 1788-9-01 | 1788.0 | 9.0 |  | 1788.0 | 9.0 |  |  | 4 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Florence | 43.77138889,11.25416667 | 1788-9-15 | 1788-9-18 | 1788.0 | 9.0 | 15.0 | 1788.0 | 9.0 | 18.0 |  | 5 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Perugia | 43.1121,12.3888 | 1788-9-20 | 1788-9-20 | 1788.0 | 9.0 | 20.0 | 1788.0 | 9.0 | 20.0 |  | 6 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Narni | 42.517799,12.51640034 | 1788-9-01 | 1788-9-01 | 1788.0 | 9.0 |  | 1788.0 | 9.0 |  |  | 7 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Rome | 41.89305556,12.48277778 | 1788-9-21 | 1788-9-28 | 1788.0 | 9.0 | 21.0 | 1788.0 | 9.0 | 28.0 |  | 8 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | Naples | 40.840141,14.25226021 | 1788-9-30 | 1788-10-6 | 1788.0 | 9.0 | 30.0 | 1788.0 | 10.0 | 6.0 |  | 9 |
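`.loc` returns a `DataFrame` when the traveler ID appears on several rows (as with Charles Abbot's itinerary) but a `Series` when it appears on only one row, which can be surprising. Passing the ID inside a list always returns a `DataFrame`. A sketch on a made-up three-row itinerary:

```python
import pandas as pd

itin = pd.DataFrame(
    {"travelPlace": ["Dover", "Turin", "Verona"]},
    index=pd.Index([0.0, 0.0, 1.0], name="entryID"),
)

abbot_rows = itin.loc[0.0]     # ID appears twice -> a DataFrame with 2 rows
farr_abbot = itin.loc[1.0]     # ID appears once -> a Series (a single row)
print(type(abbot_rows).__name__, len(abbot_rows))   # DataFrame 2
print(type(farr_abbot).__name__)                    # Series

# A list of IDs always gives back a DataFrame, however many rows match
print(type(itin.loc[[1.0]]).__name__)               # DataFrame
```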


In [126]:
#calculate each traveler's age at the time of each travel event. Note that there is often no single
#"age at travel time" that stays constant throughout a tour, since some travelers were in Italy
#for more than a year. Rows with a missing birthDate give NaN (see the end of the output).
print(travelers_itineraries_all['startYear'] - travelers_itineraries_all['birthDate'])

entryID
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
0.0       31.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
1.0       37.0
          ... 
5285.0     NaN
5285.0     NaN
5285.0     NaN
5285.0     NaN
5285.0     NaN
5285.0     NaN
5285.0     NaN
5285.0     NaN
5285.0     NaN
5285.0     NaN
5286.0     NaN
5286.0     NaN
5286.0     NaN
5286.0     NaN
5286.0     NaN
5286.0     NaN
5286.0     NaN
5286.0     NaN
5286.0     NaN
5286.1     NaN
5286.2     NaN
5286.3     NaN
5286.4     NaN
5287.0     NaN
5288.0     NaN
5288.0     NaN
5289.0     NaN
5290.0     NaN
5290.0     NaN
5290.1     NaN
Length: 27275, dtype: float64
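The subtraction above is done element by element, and any row with a missing `birthDate` produces `NaN`. Here is a sketch on made-up values that stores the result as a new column (the name `ageAtTravel` is our own choice) and drops the missing ages before summarizing. Note that pandas' `.mean()` already skips `NaN` by default, so `dropna()` matters most when you want to see or count the usable rows.

```python
import numpy as np
import pandas as pd

# Made-up values; the last traveler has no known birth year
itin = pd.DataFrame(
    {"birthDate": [1757.0, 1757.0, 1756.0, np.nan],
     "startYear": [1788.0, 1789.0, 1793.0, 1770.0]},
    index=pd.Index([0.0, 0.0, 1.0, 5285.0], name="entryID"),
)

# Column arithmetic works row by row; a NaN birthDate gives a NaN age
itin["ageAtTravel"] = itin["startYear"] - itin["birthDate"]
print(itin)

# Keep only the ages that could be computed
known_ages = itin["ageAtTravel"].dropna()
print(known_ages.tolist())   # [31.0, 32.0, 37.0]
print(known_ages.mean())
```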


### Basic counting, and other summary statistics of the data

In [51]:
#count how many "travel events" there are for each place in the GTE database
travelers_itineraries_all['travelPlace'].value_counts()

Rome              5050
Florence          2971
Venice            2686
Naples            2388
Padua             1339
Turin             1171
Leghorn           1111
Bologna            865
Genoa              732
England            713
Milan              681
Pisa               490
Verona             460
Siena              359
Parma              352
London             276
Capua              269
Lucca              212
Vicenza            198
Modena             182
Mantua             175
Italy              174
Loreto             169
Sicily             142
Paris              141
Ferrara            120
Geneva             120
Vienna             106
Piacenza            84
Dover               81
                  ... 
Ohlau                1
Asola                1
Porto Palinuro       1
Fiumicino            1
Pera                 1
Clitumnus            1
Ascoli               1
Cerignola            1
Avellino             1
Brenner              1
eastern Europe       1
Monza                1
Nola       
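The output above is truncated in the middle (`...`). Since `value_counts()` returns a Series sorted from most to least frequent (and ignores missing values by default), `.head(n)` gives the top `n` places, and `normalize=True` converts counts into proportions. A sketch on an invented list of places:

```python
import pandas as pd

places = pd.Series(["Rome", "Florence", "Rome", "Venice", "Rome", "Florence"],
                   name="travelPlace")

counts = places.value_counts()
print(counts.head(2))           # the two most frequent places: Rome 3, Florence 2
print(counts["Rome"])           # 3

# Share of all travel events instead of raw counts
shares = places.value_counts(normalize=True)
print(round(shares["Rome"], 2))   # 0.5
```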

In [152]:
#get some statistics on travel start year (when is the most travel activity reported in our data?)
#the mean can be pulled toward a few unusually early or late years, so the median is usually the better summary here
mean_travel_year = travelers_itineraries_all['startYear'].mean()
median_travel_year = travelers_itineraries_all['startYear'].median()

print("the mean travel year is: ", mean_travel_year)
print("the median travel year is: ", median_travel_year)


the mean travel year is:  1761.536734993755
the median travel year is:  1768.0
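The gap between the mean (about 1761.5) and the median (1768) above suggests that the start years have a tail of early dates pulling the mean down. The sketch below reproduces that effect on invented years. `describe()` prints the count, mean, standard deviation, minimum, quartiles, and maximum in one call, which is a quick way to check a column's spread.

```python
import pandas as pd

# Invented start years: most cluster in the 1760s-70s, two are much earlier
start_years = pd.Series([1700, 1705, 1765, 1768, 1770, 1772, 1775])

print(start_years.mean())       # pulled down by the two early years
print(start_years.median())     # 1768.0, the middle value
print(start_years.describe())   # count, mean, std, min, quartiles, max
```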


### More sophisticated data selection

In [137]:
#disclaimer: this one is pretty confusing at first
#create a "boolean filter" (sometimes called a mask) that marks the life-event rows describing an occupation
filter_for_occupations = travelers_life_events_all['lifeEvents'] == "occupation"

#the result has one True/False value per row: True wherever lifeEvents is "occupation"
filter_for_occupations

entryID
0.0       False
0.0       False
0.0       False
0.0       False
0.0       False
0.0        True
0.0        True
1.0       False
1.0        True
1.1       False
2.0       False
3.0       False
6.0        True
6.0       False
6.0       False
6.0       False
8.0       False
8.0       False
8.0       False
8.0        True
8.0        True
8.0       False
8.0       False
8.0       False
8.0       False
8.0       False
12.0      False
12.0      False
12.0      False
15.0      False
          ...  
5278.0    False
5278.0     True
5278.0     True
5278.1    False
5279.0    False
5279.0    False
5279.0    False
5279.0    False
5279.0    False
5279.0     True
5279.0     True
5279.0     True
5281.0    False
5281.0    False
5281.0     True
5283.0    False
5283.0     True
5283.1    False
5284.0    False
5284.0    False
5284.0    False
5284.0    False
5284.0    False
5284.0    False
5284.0    False
5284.0    False
5284.0     True
5284.0     True
5286.1     True
5286.2     True
Name: lifeEvents
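To see exactly what the filter contains, here is a sketch on a four-row version of the life-events table, with rows adapted from the output earlier in the notebook. The comparison produces one `True`/`False` per row, and because `True` counts as 1 and `False` as 0, `.sum()` tells you how many rows passed the filter.

```python
import pandas as pd

events = pd.DataFrame(
    {"travelerNames": ["Charles Abbot", "Charles Abbot", "John Farr Abbot", "Mary Pearce"],
     "lifeEvents": ["marriage", "occupation", "occupation", "marriage"]},
    index=pd.Index([0.0, 0.0, 1.0, 1.1], name="entryID"),
)

# The comparison is made row by row and gives a Series of True/False values
is_occupation = events["lifeEvents"] == "occupation"
print(is_occupation.tolist())   # [False, True, True, False]
print(is_occupation.sum())      # 2  (True counts as 1, False as 0)

# Putting the True/False Series inside [] keeps only the rows marked True
print(events[is_occupation]["travelerNames"].tolist())   # ['Charles Abbot', 'John Farr Abbot']
```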

In [138]:
#now apply this filter to the travelers_life_events_all dataframe to keep only the rows describing an occupation
travelers_life_events_all[filter_for_occupations]

| entryID | travelerNames | birthDate | deathDate | gender | lifeEvents | eventsDetail1 | eventsDetail2 | place | startDate | endDate | eventsIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | occupation | Law | called to the bar |  | 1783 |  | 6 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | occupation | Statesmen and Political Appointees | Member of Parliament |  | 1795 | 1817.0 | 7 |
| 1.0 | John Farr Abbot | 1756.0 | 1794.0 | Male | occupation | Law | clerk of the rules King's Bench |  | 1790 | 1794.0 | 9 |
| 6.0 | Maj. Abercrombie | 1706.0 | 1781.0 | Male | occupation | Army and Navy | army officer |  |  |  | 13 |
| 8.0 | Sir Ralph Abercromby | 1734.0 | 1801.0 | Male | occupation | Statesmen and Political Appointees | Knight of the Bath |  | 1795 |  | 20 |
| 8.0 | Sir Ralph Abercromby | 1734.0 | 1801.0 | Male | occupation | Army and Navy | army officer |  |  |  | 21 |
| 15.0 | Ackmooty | 1687.0 | 1750.0 | Male | occupation | Law | called to the bar |  | 1711 |  | 33 |
| 16.0 | John Dyke Acland | 1746.0 | 1778.0 | Male | occupation | Statesmen and Political Appointees | Member of Parliament |  | 1774 | 1778.0 | 36 |
| 16.0 | John Dyke Acland | 1746.0 | 1778.0 | Male | occupation | Army and Navy | army officer |  |  |  | 37 |
| 17.0 | John Acton | 1703.0 | 1766.0 | Male | occupation | Statesmen and Political Appointees | captain East India Company | EICo |  | 1747.0 | 41 |


In [139]:
#we could have done the above in one line, but it makes it even more confusing. Eventually, with practice, this sort
#of thing becomes intuitive.
travelers_life_events_all[travelers_life_events_all['lifeEvents'] == "occupation"]

| entryID | travelerNames | birthDate | deathDate | gender | lifeEvents | eventsDetail1 | eventsDetail2 | place | startDate | endDate | eventsIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | occupation | Law | called to the bar |  | 1783 |  | 6 |
| 0.0 | Charles Abbot | 1757.0 | 1829.0 | Male | occupation | Statesmen and Political Appointees | Member of Parliament |  | 1795 | 1817.0 | 7 |
| 1.0 | John Farr Abbot | 1756.0 | 1794.0 | Male | occupation | Law | clerk of the rules King's Bench |  | 1790 | 1794.0 | 9 |
| 6.0 | Maj. Abercrombie | 1706.0 | 1781.0 | Male | occupation | Army and Navy | army officer |  |  |  | 13 |
| 8.0 | Sir Ralph Abercromby | 1734.0 | 1801.0 | Male | occupation | Statesmen and Political Appointees | Knight of the Bath |  | 1795 |  | 20 |
| 8.0 | Sir Ralph Abercromby | 1734.0 | 1801.0 | Male | occupation | Army and Navy | army officer |  |  |  | 21 |
| 15.0 | Ackmooty | 1687.0 | 1750.0 | Male | occupation | Law | called to the bar |  | 1711 |  | 33 |
| 16.0 | John Dyke Acland | 1746.0 | 1778.0 | Male | occupation | Statesmen and Political Appointees | Member of Parliament |  | 1774 | 1778.0 | 36 |
| 16.0 | John Dyke Acland | 1746.0 | 1778.0 | Male | occupation | Army and Navy | army officer |  |  |  | 37 |
| 17.0 | John Acton | 1703.0 | 1766.0 | Male | occupation | Statesmen and Political Appointees | captain East India Company | EICo |  | 1747.0 | 41 |
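Filters can be combined: `&` means "and", `|` means "or", and `~` means "not". Each condition needs its own parentheses, because in Python `&` and `|` bind more tightly than `==`. The sketch uses placeholder travelers (`Traveler A` to `Traveler D`), not real GTE entries.

```python
import pandas as pd

# Placeholder rows, not real GTE entries
events = pd.DataFrame(
    {"travelerNames": ["Traveler A", "Traveler B", "Traveler C", "Traveler D"],
     "gender": ["Male", "Male", "Female", "Female"],
     "lifeEvents": ["occupation", "occupation", "marriage", "occupation"]},
)

# "and": both conditions must be True (note the parentheses around each one)
women_with_occupation = events[(events["lifeEvents"] == "occupation") & (events["gender"] == "Female")]
print(women_with_occupation["travelerNames"].tolist())   # ['Traveler D']

# "or": at least one condition must be True
a_or_marriage = events[(events["travelerNames"] == "Traveler A") | (events["lifeEvents"] == "marriage")]
print(a_or_marriage["travelerNames"].tolist())           # ['Traveler A', 'Traveler C']

# "not": flip every True/False value
not_occupation = events[~(events["lifeEvents"] == "occupation")]
print(not_occupation["travelerNames"].tolist())          # ['Traveler C']
```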


In [153]:
#get all the painters and architects
target_occupations = {'painter', 'architect'} #this is a "set" object--it's like a list but unordered and without duplicates
travelers_life_events_all[ travelers_life_events_all['eventsDetail2'].isin(target_occupations) ]

| entryID | travelerNames | birthDate | deathDate | gender | lifeEvents | eventsDetail1 | eventsDetail2 | place | startDate | endDate | eventsIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.0 | Edward Abbott | 1737.0 | 1791.0 | Male | DBITI employment or identifier |  | painter |  |  |  | 11 |
| 21.0 | James Adam | 1732.0 | 1794.0 | Male | DBITI employment or identifier |  | architect |  |  |  | 62 |
| 21.0 | James Adam | 1732.0 | 1794.0 | Male | occupation | Artists, Architects, Craftsmen | architect | King's Works | 1769 | 1782.0 | 63 |
| 23.0 | Robert Adam | 1728.0 | 1792.0 | Male | DBITI employment or identifier |  | architect |  |  |  | 67 |
| 23.0 | Robert Adam | 1728.0 | 1792.0 | Male | occupation | Artists, Architects, Craftsmen | architect | King's Works | 1762 | 1768.0 | 68 |
| 36.0 | William Aikman | 1682.0 | 1731.0 | Male | DBITI employment or identifier |  | painter |  |  |  | 92 |
| 45.0 | Cosmo Alexander | 1724.0 | 1772.0 | Male | DBITI employment or identifier |  | painter |  |  |  | 105 |
| 48.0 | John Alexander | 1686.0 | 1766.0 | Male | DBITI employment or identifier |  | painter |  |  |  | 111 |
| 49.0 | Alexander | 1748.0 |  | Male | DBITI employment or identifier |  | painter |  |  |  | 112 |
| 52.0 | David Allan | 1744.0 | 1796.0 | Male | DBITI employment or identifier |  | painter |  |  |  | 114 |
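Because a traveler can match more than once (James Adam appears both as a DBITI identifier and as an occupation), the number of matching rows is not the number of people. A sketch using a few rows from the output above shows two ways to count distinct travelers:

```python
import pandas as pd

events = pd.DataFrame(
    {"travelerNames": ["Edward Abbott", "James Adam", "James Adam", "Robert Adam"],
     "eventsDetail2": ["painter", "architect", "architect", "architect"]},
    index=pd.Index([2.0, 21.0, 21.0, 23.0], name="entryID"),
)

artists = events[events["eventsDetail2"].isin({"painter", "architect"})]

# The same person can match more than once, so count distinct IDs or names
print(len(artists))                   # 4 matching rows
print(artists.index.nunique())        # 3 distinct travelers
print(artists["travelerNames"].drop_duplicates().tolist())
# ['Edward Abbott', 'James Adam', 'Robert Adam']
```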


### Sorting

In [127]:
#sort by travel place
travelers_itineraries_all.sort_values('travelPlace')

| entryID | travelerNames | birthDate | deathDate | gender | travelPlace | coordinates | startDate | endDate | startYear | startMonth | startDay | endYear | endMonth | endDay | markers | travelIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2488.0 | Patrick Home | 1728.0 | 1808.0 | Male | Abano | 45.360458,11.78979015 | 1773-6-01 | 1773-7-01 | 1773.0 | 6.0 |  | 1773.0 | 7.0 |  |  | 17 |
| 1909.0 | David Garrick |  |  | Male | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-8-01 | 1764.0 | 6.0 |  | 1764.0 | 8.0 |  |  | 14 |
| 2488.1 | Jane Graham |  |  | Female | Abano | 45.360458,11.78979015 | 1773-6-01 | 1773-7-01 | 1773.0 | 6.0 |  | 1773.0 | 7.0 |  |  | 15 |
| 3883.0 | Anne Pitt | 1712.0 | 1781.0 | Female | Abano | 45.360458,11.78979015 | 1775-5-01 | 1775-9-01 | 1775.0 | 5.0 |  | 1775.0 | 9.0 |  |  | 7 |
| 3883.0 | Anne Pitt | 1712.0 | 1781.0 | Female | Abano | 45.360458,11.78979015 | 1774-5-01 | 1774-5-01 | 1774.0 | 5.0 |  | 1774.0 | 5.0 |  |  | 2 |
| 1909.1 | Eva Maria Veigel | 1724.0 | 1822.0 | Female | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-8-01 | 1764.0 | 6.0 |  | 1764.0 | 8.0 |  |  | 14 |
| 2851.0 | John La Touche | 1772.0 | 1838.0 | Male | Abano | 45.360458,11.78979015 | 1794-9-01 | 1795-1-01 | 1794.0 | 9.0 |  | 1795.0 | 1.0 |  | c.; / | 8 |
| 2834.0 | Langdale |  |  | Male | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-6-01 | 1764.0 | 6.0 |  | 1764.0 | 6.0 |  |  | 3 |
| 2540.0 | Edward Howard | 1744.0 | 1767.0 | Male | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-6-01 | 1764.0 | 6.0 |  | 1764.0 | 6.0 |  |  | 3 |
| 1539.0 | John Dyer | 1699.0 | 1757.0 | Male | Aberglasney | 51.8806,-4.06223011 | 1725-9-01 | 1725-11-01 | 1725.0 | 9.0 |  | 1725.0 | 11.0 |  |  | 7 |
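`sort_values` sorts in ascending order (A to Z, smallest to largest) by default. `ascending=False` reverses that, and missing values go last unless you pass `na_position="first"`. It also returns a new, sorted DataFrame rather than changing the original, so assign the result to a variable if you want to keep it. A sketch on three rows adapted from the output above, deliberately listed out of order:

```python
import pandas as pd

itin = pd.DataFrame(
    {"travelerNames": ["John Dyer", "Patrick Home", "David Garrick"],
     "travelPlace": ["Aberglasney", "Abano", "Abano"],
     "startYear": [1725.0, 1773.0, 1764.0]},
)

# Most recent start year first; itin itself is left unchanged
newest_first = itin.sort_values("startYear", ascending=False)
print(newest_first["travelerNames"].tolist())   # ['Patrick Home', 'David Garrick', 'John Dyer']
print(itin["travelerNames"].tolist())           # ['John Dyer', 'Patrick Home', 'David Garrick']
```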


In [130]:
#sort by travel place, and WITHIN a given place, sort those people by travel startDate.
#This is an easy way to get a sense of which people were in the same
#city at the same time. Note that startDate is stored as text, so it is sorted
#alphabetically: "1788-10-6" would come before "1788-9-01".
travelers_itineraries_all.sort_values( ['travelPlace','startDate'] )

| entryID | travelerNames | birthDate | deathDate | gender | travelPlace | coordinates | startDate | endDate | startYear | startMonth | startDay | endYear | endMonth | endDay | markers | travelIndex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1909.0 | David Garrick |  |  | Male | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-8-01 | 1764.0 | 6.0 |  | 1764.0 | 8.0 |  |  | 14 |
| 1909.1 | Eva Maria Veigel | 1724.0 | 1822.0 | Female | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-8-01 | 1764.0 | 6.0 |  | 1764.0 | 8.0 |  |  | 14 |
| 2540.0 | Edward Howard | 1744.0 | 1767.0 | Male | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-6-01 | 1764.0 | 6.0 |  | 1764.0 | 6.0 |  |  | 3 |
| 2834.0 | Langdale |  |  | Male | Abano | 45.360458,11.78979015 | 1764-6-01 | 1764-6-01 | 1764.0 | 6.0 |  | 1764.0 | 6.0 |  |  | 3 |
| 2488.0 | Patrick Home | 1728.0 | 1808.0 | Male | Abano | 45.360458,11.78979015 | 1773-6-01 | 1773-7-01 | 1773.0 | 6.0 |  | 1773.0 | 7.0 |  |  | 17 |
| 2488.1 | Jane Graham |  |  | Female | Abano | 45.360458,11.78979015 | 1773-6-01 | 1773-7-01 | 1773.0 | 6.0 |  | 1773.0 | 7.0 |  |  | 15 |
| 3883.0 | Anne Pitt | 1712.0 | 1781.0 | Female | Abano | 45.360458,11.78979015 | 1774-5-01 | 1774-5-01 | 1774.0 | 5.0 |  | 1774.0 | 5.0 |  |  | 2 |
| 3883.0 | Anne Pitt | 1712.0 | 1781.0 | Female | Abano | 45.360458,11.78979015 | 1775-5-01 | 1775-9-01 | 1775.0 | 5.0 |  | 1775.0 | 9.0 |  |  | 7 |
| 2851.0 | John La Touche | 1772.0 | 1838.0 | Male | Abano | 45.360458,11.78979015 | 1794-9-01 | 1795-1-01 | 1794.0 | 9.0 |  | 1795.0 | 1.0 |  | c.; / | 8 |
| 1539.0 | John Dyer | 1699.0 | 1757.0 | Male | Aberglasney | 51.8806,-4.06223011 | 1725-9-01 | 1725-11-01 | 1725.0 | 9.0 |  | 1725.0 | 11.0 |  |  | 7 |
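Because `startDate` is text, sorting on it is alphabetical, so a date in October ("1788-10-6") lands before one in September ("1788-9-01"). A sketch of one workaround sorts on the numeric `startYear`, `startMonth`, and `startDay` columns instead. Rows with an unknown day (a blank `startDay`) would then come after the dated rows of the same month, since `NaN` sorts last by default. The three Naples rows are invented for illustration.

```python
import pandas as pd

# Invented rows for illustration
itin = pd.DataFrame(
    {"travelPlace": ["Naples", "Naples", "Naples"],
     "startDate": ["1788-10-6", "1788-9-30", "1788-9-01"],
     "startYear": [1788.0, 1788.0, 1788.0],
     "startMonth": [10.0, 9.0, 9.0],
     "startDay": [6.0, 30.0, 1.0]},
)

# Text sort: "1788-10-6" comes first because the character "1" sorts before "9"
print(itin.sort_values("startDate")["startDate"].tolist())
# ['1788-10-6', '1788-9-01', '1788-9-30']

# Numeric sort on the separate year/month/day columns gives true date order
by_date = itin.sort_values(["travelPlace", "startYear", "startMonth", "startDay"])
print(by_date["startDate"].tolist())
# ['1788-9-01', '1788-9-30', '1788-10-6']
```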


## Your Assignment

Spend at least 30 minutes using Pandas to do something relevant to your midterm project in this notebook. You can build off of and recombine any examples above, and/or consult the resources listed at the top of this notebook. 

Create new cells below by clicking the plus sign in the Jupyter notebook menu above, and do your work there. Don't worry about demonstrating programming knowledge or anything like that. What you do can be very basic. Just use this time to do something that is useful for your midterm project. If you are new to programming, even doing one slightly different thing based on the code examples above is a good accomplishment. If, after trying something and consulting the resources above, you can't get it to work, briefly describe what you wanted to do, and keep your code attempts in the cells below.

