# Enslaved People British Empire Project
Syracuse University Fall 2020 - Spring 2021
<br>
<br>
Subject Matter Expert: Professor Tessa Murphy Ph.D. temurphy@maxwell.syr.edu
<br>
Supervising Professor: Professor Michael Fudge mafudge@syr.edu
<br>
Faculty Assistant: Ian Ustanik ihustani@syr.edu

# Repository Contents:
### sheets.ipynb
File contains Jupyter Notebook automated pipeline for accessing Google Sheets files, processing sheets, combining sheets, and exporting Enslaved_Persons_Final.csv.
### service.json
File contains service account credentials in JSON format for OAuth2 login to the Google API.
### visualizations.ipynb
File contains Jupyter Notebook for example visualizations produced using Python code.
### packed_bubbles.py
File contains Python file imported from an unreleased version of Matplotlib for producing packed-bubble charts in the visualizations.ipynb notebook.
### examples.twb
File contains Tableau Workbook for example visualizations and dashboards. Workbook contains 9 worksheets and 3 dashboards.
### Enslaved_Persons_Final.csv
File contains latest processed dataset in CSV format.

# Project Methodology:

The data source is the British Registries of Enslaved Peoples, circa 1815, for the island of Saint Lucia. Great Britain had secured the island from the French in 1814, following the Treaty of Paris ending the Napoleonic Wars. The registry is in French.

Each page corresponds to a plantation (sometimes running to several pages, based on number of people). Each plantation is listed by name, owner, and usually location (by parish within the island) across the top of the page.

Enslaved people are then organized into three groups: “liste generale des familles” (general list of families; usually female-headed, and often multi-generational); then “liste generale des esclaves males” (enslaved men without family on the plantation); then “liste generale des esclaves femelles” (enslaved women without family on the plantation).

The purpose of this project was to digitize these records, process them, and present a clean dataset for research in the spirit of digital humanities. Analysis can be completed through various methods, including visualization, to answer important questions about the lives of these individuals. The ultimate goal is to create an artifact, such as an API or website, that makes the data easily accessible to other researchers.

### Data Input
The first step in this process was data collection and input. Tessa Murphy and her team of research assistants worked on data collection and translation for each page in the registries. An example of a single page is shown below:
<br>
![image-3.png](attachment:image-3.png)
<br>
The team utilized a Google Sheets template for organizing the data collection as shown below:
<br>
![image.png](attachment:image.png)
<br>
![image-2.png](attachment:image-2.png)
<br>
Google Sheets were stored in a shared drive Enslaved-People-British-Empire. This process is ongoing and is very labor intensive. Big props to Tessa Murphy and her team for their efforts in data collection, translation, and input!
### Accessing Data 
To access the Google Sheets from the shared drive, the team used the Google API Python client. Access required service account credentials for OAuth2 login to the Google API. All files in the Enslaved-People-British-Empire drive were then retrieved, and only the Google Sheets files were selected for processing. **File(s): sheets.ipynb, service.json**
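
As a rough illustration of the filtering step, the sketch below keeps only Google Sheets entries from a file listing. It assumes the listing has the shape of a Drive API `files.list` response (dictionaries with `id`, `name`, and `mimeType`); the file entries and the `select_sheets` helper are hypothetical and are not taken from sheets.ipynb.

```python
# MIME type that Google Drive assigns to Google Sheets files
SHEETS_MIME_TYPE = "application/vnd.google-apps.spreadsheet"


def select_sheets(files):
    """Return only the entries whose mimeType marks them as Google Sheets."""
    return [f for f in files if f.get("mimeType") == SHEETS_MIME_TYPE]


# Hypothetical listing, invented for illustration only
drive_files = [
    {"id": "abc123", "name": "Plantation A", "mimeType": SHEETS_MIME_TYPE},
    {"id": "def456", "name": "page_001.png", "mimeType": "image/png"},
    {"id": "ghi789", "name": "Plantation B", "mimeType": SHEETS_MIME_TYPE},
]

sheet_ids = [f["id"] for f in select_sheets(drive_files)]
print(sheet_ids)
```

The resulting IDs are the kind of input that a per-sheet processing function would take.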
### Sheet Processing
A sheet_processor() function was defined for processing an individual sheet. It took a sheet ID as input and returned a processed pandas dataframe. It first retrieved a response for both the Individual and Plantation Information sheets of each workbook, along with named alternatives, then created dataframes from each response. Plantation information was then transformed and appended to the end of each row. An integer conversion function was defined to help create the engineered feature for combined height. The engineered feature for birth year was then derived, followed by the engineered features for parental hierarchy. Finally, the next cell in the notebook created a combined dataframe by applying sheet_processor() across all sheets, with errors appended to a separate list. **File(s): sheets.ipynb**
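
The two numeric engineered features can be sketched in plain Python as follows. This is a minimal sketch, not the notebook's code: it assumes height is entered as separate feet and inches values and that the registry year has already been extracted from Date of Registry. The names `to_int`, `combined_height`, and `birth_year` are hypothetical.

```python
def to_int(value, default=0):
    """Convert a cell value to an int, returning default if blank or non-numeric."""
    try:
        return int(float(str(value).strip()))
    except (ValueError, OverflowError):
        return default


def combined_height(feet, inches):
    """Total height in inches (assumes feet and inches are recorded separately)."""
    return to_int(feet) * 12 + to_int(inches)


def birth_year(registry_year, age):
    """Estimated birth year, or None when either value is missing."""
    year = to_int(registry_year, default=None)
    years_old = to_int(age, default=None)
    if year is None or years_old is None:
        return None
    return year - years_old


print(combined_height("5", "4"))
print(birth_year(1815, "30"))
```

Returning `None` for a missing age or date keeps an unknown birth year distinguishable from a computed one.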
### Term Consolidation 
After discussion with Tessa Murphy, it was decided that it would be helpful to have additional columns consolidating similar terms for the purpose of further analysis. This was done by creating lookup tables (Python dictionaries) that map each consolidated term to its variants. These dictionaries were then reversed, and new engineered columns were created for the consolidated terms. These included terms for Parish, Employment, Color, and General_Employment. See the bottom of the data dictionary for further details. **File(s): sheets.ipynb**
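
A minimal sketch of the reversal step is shown here, using a short excerpt of the Parish lookup table. The helper names `reverse_lookup` and `consolidate` are hypothetical, and lowercasing and trimming the raw value before lookup is an assumption about how entries are normalized.

```python
def reverse_lookup(table):
    """Invert {consolidated: [variants]} into {variant: consolidated}."""
    reversed_table = {}
    for consolidated, variants in table.items():
        for variant in variants:
            # Keep the first mapping if a variant is listed under two terms
            reversed_table.setdefault(variant, consolidated)
    return reversed_table


def consolidate(term, reversed_table):
    """Look up the consolidated term for a raw entry, or None if unknown."""
    key = str(term).strip().lower()
    return reversed_table.get(key)


# Excerpt of the Parish lookup table
parish = {
    "Dennery": ["dennery", "d'ennery", "ennery"],
    "Laborie": ["laborie", "la borie"],
}

parish_reversed = reverse_lookup(parish)
print(consolidate(" D'Ennery ", parish_reversed))
print(consolidate("unknown place", parish_reversed))
```

Unmatched entries come back as `None`, which makes it easy to list terms that still need to be added to a lookup table.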
### Data Output
A cleaned dataframe was then output in CSV file format for further analysis. **File(s): sheets.ipynb, Enslaved_Persons_Final.csv**
### Data Analysis (Visualization)
Finally, the team could engage in some preliminary data analysis. To provide an example of analysis that could be completed in Python, visualizations.ipynb shows examples of tree maps, bubble charts, and various histograms. A Tableau Workbook was also created for example visualizations and dashboards. The workbook contains 9 worksheets and 3 dashboards. **File(s): visualizations.ipynb, packed_bubbles.py, examples.twb**
### Next Steps
See below section for future project considerations:

# Next Steps:
### Amazon S3 for storing registry images
Link to images through dataset, hosted on the cloud.
### Term classification task
Utilize machine learning to automate the consolidation of terms through NLP techniques and classification algorithms.
### API or website build
Create artifact that is easily accessible for other researchers.

# Data Dictionary:

Below is a data dictionary of column values located in the final dataset. Engineered columns are denoted with an asterisk following the column name. Tessa Murphy has indicated some thoughts on information that could potentially be derived from this information below some of the column definitions:

**Individual ID**
<br>
<br> 
    -Unique identifier for an individual within a particular plantation. 
    <br>
    -Unique only to plantation. 
    <br>
    -Use Individual ID along with Plantation Information for composite key.
<br>
<br>
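
Because Individual ID repeats across plantations, a composite key is needed to identify a person across the whole dataset. The sketch below assumes Plantation Name alone is enough plantation information to disambiguate; the rows are invented placeholders and `composite_key` is a hypothetical helper.

```python
def composite_key(row):
    """Combine plantation and per-plantation ID into one dataset-wide key."""
    return (row["Plantation Name"], row["Individual ID"])


# Placeholder rows, invented for illustration only
rows = [
    {"Plantation Name": "Plantation A", "Individual ID": 1},
    {"Plantation Name": "Plantation A", "Individual ID": 2},
    {"Plantation Name": "Plantation B", "Individual ID": 1},
]

keys = [composite_key(r) for r in rows]
is_unique = len(set(keys)) == len(keys)
print(keys)
print(is_unique)
```

If two plantations ever share a name, further plantation columns such as Owner or Location could be added to the tuple.
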
**Nom** (first name)
<br>
<br>
	-Gender (traditional male vs. female names).
    <br>
	-Family naming patterns (children or grandchildren being named after other family members).
    <br>
	-African naming practices?
<br>
<br>
**Surnom** (surname):
<br>
<br>
	-Family relationships.
    <br>
	-Gender dynamics? (matrilineal families in which descendants take the mother’s last name).
<br>
<br>
**Couleur** (color):
<br>
<br>
	-Gender (terms ending in ‘esse’ indicate women).
    <br>
		-It may be worthwhile to create a separate column that indicates sex [based on these endings].
        <br>
	-Interracial relationships (mulatre = 1 black and 1 white parent; capre = 1 ‘mulatre’ and 1 black parent etc.).
    <br>
	-Sexual violence?
<br>
<br>
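
The suggestion of deriving sex from the term ending could be sketched as below. This is an assumption-laden illustration: only the 'esse' ending is treated as informative, other endings return `None` rather than being read as male, and `sex_from_couleur` is a hypothetical helper.

```python
def sex_from_couleur(term):
    """Return 'F' for terms ending in 'esse'; None when the ending is not conclusive."""
    cleaned = str(term).strip().lower()
    if cleaned.endswith("esse"):
        return "F"
    # Other endings are left undetermined instead of being assumed male
    return None


print(sex_from_couleur("mulatresse"))
print(sex_from_couleur("capre"))
```

Misspelled variants such as 'mulatress' would not match this rule, so applying it after term consolidation, or alongside the existing Gender column, is likely to be more reliable.
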
**Emplois** (employment)
<br>
<br>
	-Variety of work.
    <br>
	-Gendered dimensions of work (e.g. women more likely to look after animals).
    <br>
	-Racial dimensions of work (e.g. people of mixed race more likely to be in skilled positions).
    <br>
	-Generational dimensions of work (e.g. people with parents in a given role may have the same role).
    <br>
	-Creole dimensions of work (e.g. people identified as ‘Creole’ more likely to be in skilled positions).
    <br>
	-Age dimensions of work (e.g. age at which people begin and end working).
<br>
<br>
**Age**
<br>
<br>
	-Average age by gender.
    <br>
	-Average age by occupation.
    <br>
	-Age of childbearing? (i.e. if a woman has a child on the same plantation, can be used to figure out the age at which she had her child[ren]?).
    <br>
	-Age at which people were trafficked from Africa?
<br>
<br>
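
The childbearing estimate above is a subtraction of recorded ages. A minimal sketch, assuming both ages were recorded on the same registry date (the function name and the example ages are hypothetical):

```python
def ages_at_births(mother_age, child_ages):
    """Estimate the mother's age at each birth, youngest estimate first."""
    estimates = [mother_age - child_age for child_age in child_ages if child_age is not None]
    return sorted(estimates)


# Invented ages for illustration only
estimates = ages_at_births(34, [12, 9, None, 3])
print(estimates)
```

The first value in the result approximates the age at first recorded birth; children who died or were sold before the registry was taken are not captured.
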
**Taille** (height)
<br>
<br>
	-Average size relative to place of birth?
    <br>
	-Average size relative to gender.
    <br>
	-Average size relative to age.
<br>
<br>
**Pays** (Country)
<br>
<br>
	-Place of origin.
    <br>
	-Number of African-born vs. American-born people.
    <br>
	-African-born people by age and gender.
    <br>
	-American-born (Creole) people by age and gender.
    <br>
	-Relationship between place of birth and family formation? (e.g. if you’re born in the Americas, are you more likely to live on a plantation with members of your birth family? Are you more or less likely to have formed and maintained a family of your own?).
    <br>
	-Regional trafficking (people born in surrounding islands end up in St. Lucia).
<br>
<br>
**Marques** (Marks)
<br>
<br>
	-Disease (e.g. “marques de petite verolle” = smallpox scars).
    <br>
		-Clusters of scarred people may point to outbreaks at a given moment (e.g. if no one over age x has these scars, we can probably assume an outbreak occurred prior to x date).
        <br>
	-Violence (e.g. “doigts coupes” = fingers cut off; “brulure” = burn marks).
    <br>
	-Ritual scarification (e.g. “marques de son pays”) indicates that they reached adulthood initiation in West Africa; may eventually allow people to pinpoint more precise places of origin (different scarification practices in different tribes).
<br>
<br>
**Parente** (family relations) **ORIGINAL COLUMN NOT IN FINAL DATASET --> SEE IMMEDIATELY BELOW**
<br>
<br>
	-Parent-child-grandchild-great-grandchild relations.
    <br>
	-Occasionally sibling relationships.
    <br>
	-Even more rarely, uncles or aunts are indicated. 
    <br>
	-Recognition of spousal relationships (seems to be a relationship between men in skilled positions having recognized partner[s]).
    <br>
	-Average number of children per woman.
    <br>
	-Sex ratios? (e.g. whether male or female children are more likely to remain on the plantation).
    <br>
	-Whether and when men are able to form families (seems to be a relationship between men in skilled positions being recognized as fathers).
<br>
<br>
**Female Parent** (Individual ID)
<br>
<br>
    -Directly Corresponding to the Individual ID of the female parent in the Parente column.
    <br>
    -ID used for the future purpose of creating family trees.
<br>
<br>
**Male Parent** (Individual ID)
<br>
<br>
    -Directly Corresponding to the Individual ID of the male parent in the Parente column.
    <br>
    -ID used for the future purpose of creating family trees.
<br>
<br>
**Other Relations** (Relationship - First Name, Surname)
<br>
<br>
    -Contains all other relationships excluding parental for the individual.
    <br>
    -Relationship listed followed by first name and surname of relations.
<br>
<br>
**Corrections**
<br>
<br>
	-Rarely used.
    <br>
	-Occasionally shows if a person has run away (“marron”), died, or been sold.
<br>
<br>
**Gender** (M/F)
<br> 
<br>
    -Male or female gender of individual.
<br>
<br>
**Family** (Y/N)
<br> 
<br>
    -Whether the individual is known to have family on the plantation or is there alone.
    <br>
    -Yes or No.
<br>
<br>
**Registry Page Number** (Actual Physical)
<br>
<br>
    -Eventual link to where the registry image will be stored.
    <br>
    -Corresponds to Page Reference column.
<br>
<br>
**Page Reference** (Ancestry Pointer)
<br>
<br>
    -Registry page where individual information is located.
    <br>
    -Corresponds to Registry Page Number column.
<br>
<br>
**Plantation Name**
<br>
<br>
    -Name of individual's plantation.
<br>
<br>
**Owner**
<br>
<br>
    -Name of known plantation owner.
<br>
<br>
**Manager** (If Applicable)
<br>
<br>
    -Name of known plantation manager.
    <br>
    -Many times the manager performed day-to-day plantation management.
<br>
<br>
**Location** (Parish)
<br>
<br>
    -Location of plantation on Saint Lucia.
    <br>
    -Location denoted as parish the plantation resided in.
<br>
<br>
**Main Production**
<br>
<br>
    -Main agricultural or manufacturing product of the plantation.
<br>
<br>
**Number of Enslaved People**
<br>
<br>
    -Known number of enslaved people on the plantation.
<br>
<br>
**Sex of Owner**
<br>
<br>
    -Known sex of plantation owner.
<br>
<br>
**Date of Registry** (If Applicable)
<br>
<br>
    -Listed date that the registry page was completed for the plantation. 
<br>
<br>
**Signature**
<br>
<br>
    -Individual who provided a signature for the registry content of a particular plantation.
    <br>
    -May be owner, manager, registrar, or other possible individuals.
<br>
<br>
**Combined Height***
<br>
<br>
    -Total height of individual in inches.
    <br>
    -Derived from Taille column.
<br>
<br>
**Birth Year***
<br>
<br>
    -Estimated birth year of individual.
    <br>
    -Derived from Date of Registry and Age columns.
<br>
<br>
**Nom (First Name) (Female Parent)***
<br>
<br>
    -First name of known female parent.
    <br>
    -Derived from Nom and Female Parent columns.
<br>
<br>
**Nom (First Name) (Male Parent)***
<br>
<br>
    -First name of known male parent.
    <br>
    -Derived from Nom and Male Parent columns.
<br>
<br>
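
The two parent-name columns can be derived by looking up each parent ID among the people on the same plantation. The sketch below works on plain dictionaries rather than a dataframe; the rows are invented placeholders and `add_parent_names` is a hypothetical helper, not the notebook's implementation.

```python
def add_parent_names(rows):
    """Fill parent first-name columns by resolving parent IDs within each plantation."""
    # Index first names by (plantation, individual ID), since IDs repeat across plantations
    names = {(r["Plantation Name"], r["Individual ID"]): r["Nom"] for r in rows}
    columns = (
        ("Female Parent", "Nom (First Name) (Female Parent)"),
        ("Male Parent", "Nom (First Name) (Male Parent)"),
    )
    for r in rows:
        for id_col, name_col in columns:
            parent_id = r.get(id_col)
            r[name_col] = names.get((r["Plantation Name"], parent_id))
    return rows


# Placeholder rows, invented for illustration only
people = [
    {"Plantation Name": "Plantation A", "Individual ID": 1, "Nom": "Name1"},
    {"Plantation Name": "Plantation A", "Individual ID": 2, "Nom": "Name2", "Female Parent": 1},
    {"Plantation Name": "Plantation B", "Individual ID": 1, "Nom": "Name3"},
]

people = add_parent_names(people)
print(people[1]["Nom (First Name) (Female Parent)"])
```

Keying the index on the plantation as well as the ID prevents a parent ID from resolving to a person on a different plantation.
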
**Notes**
<br>
<br>
    -Possible notes regarding the individual's record.
<br>
<br>
**Location (Parish)***
<br>
<br>
    -More general terms considering possible location terms to consolidate.
    <br>
    -St. Lucia has 11 parishes- Anse la Raye, Castries, Choiseul, Dauphin, Dennery, Gros Islet, Laborie, Micoud, Praslin, Soufrière and Vieux Fort. Anything that approximates those spellings but isn’t quite right (e.g. Anselaraye as all one word; Soufriere without an accent) can be changed to the form above.
    <br>
    -See Parish lookup table below.
<br>
<br>
**Emplois (Employment)***
<br>
<br>
    -More general terms considering possible occupation terms to consolidate.
    <br>
    -See Employment lookup table below.
<br>
<br>
**General (Employment)***
<br>
<br>
    -Even more general terms considering possible employment terms to consolidate.
    <br>
    -Ex. fieldwork, housework, not working, rented out.
    <br>
    -See General_Employment lookup table below.
<br>
<br>
**Couleur (Color)***
<br>
<br>
    -More general terms considering possible color terms to consolidate.
    <br>
    -Mulatre/mulatresse [with and without accents]- male and female words for a person with one black and one white parent.
    <br>
    -Griffe/capre/capresse/caoresse [this last one is likely just a typo]- male and female words for a person with one Black and one ‘mulatto’ parent.
    <br>
    -Black/negro/Negre/noir/negresse [with and without accents]- male and female words for a Black person.
    <br>
    -Mestif/mestive/metis/metif/metive [with and without accents]- variable spellings for male and female words for a person with one white and one ‘mulatto’ parent.
    <br>
    -See Color lookup table below.
<br>
<br>
### Lookup Tables

**Parish** = {'Anse la Raye': ['anse la raye', 'anse laraye', 'anse laraye'], <br>
          'Castries': ['castries', 'castries [?]', 'anse des roseaux, castries', 'les groseaux [?] castries'], <br>
          'Choiseul': ['choiseul', 'choiseul and ladorie', 'choiseul?'], <br>
          'Dauphin': ['dauphin'], <br>
          'Dennery': ['dennery', "d'ennery", "d'onnery", 'dennery [?]', 'ennery', 'onnery [?]'], <br>
          'Gros Islet': ['gros islet', 'gros ilet', 'gros islet [?]'], <br>
          'Laborie': ['laborie', 'la borie', 'laborie [?]', 'laboue'], <br>
          'Micoud': ['micoud'], <br>
          'Praslin': ['praslin', 'pastin', 'prastin [?]', 'prastin', 'prastin'], <br>
          'Soufrière': ['soufriére', 'soufriere', 'soufriere [?]'], <br>
          'Vieux Fort': ['vieux fort']} <br>

**Employment** = {'Au jardin': ['au jardin', 'au jardin et accoucheuse', 'au jardin, infirme', 'jardeniere', 'jardieniere', 'jardin', 'jardinier', 'jardiniere'], <br>
              'Cultivateur': ['cultivateur', 'cultivateur, infirme', 'cultivateus[e]', 'cultivator', 'cultivatuer', 'cultvateur'], <br>
              'Cultivatrice': ['cultivatrice'], <br>
              'A la culture': ['a la culture', 'culture'], <br>
              'A la houe': ['a la houe'], <br>
              'Labourer': ['labourer', 'labourer infirme', 'labourer, infirme', 'laboureur'], <br>
              'Laboureuse': ['laboureuse'], <br>
              'Field': ['field', 'field negro'], <br>
              'Domestique': ['domestique', 'domestique al loyer journalier'], <br>
              'Domestic': ['domestic'], <br>
              'Servant[e]': ['servante', 'servant', 'servante a loyer journalier', 'servante au bourg chez me fille', 'servante dans la maison', 'servante dans le maison', 'servante de maison', 'servante domestique', 'servante, absente a la martinique', 'sevante'], <br>
              'Cuisiniere[cook]': ['cuisiniere', 'cook', 'cuisiinier', 'cuisineer', 'cuisinere', 'cuisinier', 'cuisiniére'], <br>
              'Point': ['point', 'poiint', 'poijnt'], <br>
              'Sans employ/emploi': ['sans emploi', 'sams emploi'], <br>
              'Infirme[sick]': ['infirme', 'infirm', 'infirme gardien', 'infirm huragé', 'infirm lunatique', "infirme mais employee a l'hopital", 'infirme muet', 'infirme sans doigts aux pieds', 'infirme, des ulceres a la jamber', 'infirmiere', 'infirmiére', 'infrime'], <br>
              'Suragee[with or without an accent on the first e]': ['suragee'], <br>
              'Enfant[child, i.e. too young to work]': ['enfant', 'infant', 'engant'], <br>
              'A la loue': ['a la houe', 'a la loue'], <br>
              'De loue': ['de loue', 'de houe']} <br>

**Color** = {'Mulatre': ['mulatre', 'mulatresse', 'mulatre[sse]', 'mulatress', 'mulatto', 'mustee'], <br>
         'Griffe': ['griffe', 'capre', 'capresse', 'caoresse', 'cap', 'capress', '[capre]'], <br>
         'Black': ['black', 'negro', 'negre', 'noir', 'negresse', 'negre [sic?]', 'negre infirme', 'negre rouge', 'negre rougeatre', 'negre[sse]', 'negrese', 'negresee', 'negress', 'negresse [sic?]', 'negresse rouge', 'negresse rougeatre', 'negressse', 'negroe', 'negrsese', 'ngere', 'noir', 'noire'], <br>
         'Mestif': ['mestif', 'mestive', 'metis', 'metif', 'metive', 'mestee', 'mestisse', 'méstive']} <br>

**General_Employment** = {'Fieldwork': ['au jardin', 'cultivateur', 'cultivatrice', 'a la culture', 'a la houe', 'labourer', 'laboureuse', 'field'], <br>
                      'Housework': ['domestique', 'domestic', 'servant[e]', 'cuisiniere[cook]'], <br>
                      'Not working': ['point', 'sans employ/emploi', 'infirme[sick]', 'suragee[with or without an accent on the first e]', 'enfant[child, i.e. too young to work]'], <br>
                      'Rented out': ['a la loue', 'de loue']}
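
The General_Employment table is keyed on the lowercased consolidated Employment terms, so a raw entry can be mapped in two stages: raw term to consolidated term, then consolidated term to general category. The sketch below uses short excerpts of the tables above; the helper names `invert` and `general_category` are hypothetical.

```python
def invert(table):
    """Invert {term: [variants]} into {variant: term}, keeping the first mapping."""
    inverted = {}
    for term, variants in table.items():
        for variant in variants:
            inverted.setdefault(variant, term)
    return inverted


# Excerpts of the Employment and General_Employment lookup tables
employment = {
    "Au jardin": ["au jardin", "jardin"],
    "Servant[e]": ["servante", "servant"],
    "Point": ["point"],
}
general_employment = {
    "Fieldwork": ["au jardin"],
    "Housework": ["servant[e]"],
    "Not working": ["point"],
}

employment_reversed = invert(employment)
general_reversed = invert(general_employment)


def general_category(raw_term):
    """Map a raw Emplois entry to its general category, or None if unmatched."""
    specific = employment_reversed.get(str(raw_term).strip().lower())
    if specific is None:
        return None
    # General_Employment variants are the lowercased consolidated terms
    return general_reversed.get(specific.lower())


print(general_category("Jardin"))
print(general_category("servante"))
```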