# NAILDOH: Social Class Part 2

## Resources

This is the sixth notebook in the series used to prepare and analyze the NAILDOH collection.

In [101]:
# Libraries
library(tidyverse) # for data manipulation

In [102]:
# Functions
factorize <- function(df){ # Create a function
  for(i in which(sapply(df, class) == "character")) # that looks for variables with the character class 
      df[[i]] = as.factor(df[[i]]) # and converts them to factor (i.e., categorical) class
  return(df)
}

unfactorize <- function(df){ # Create a function
  for(i in which(sapply(df, class) == "factor")) # that looks for variables with the factor class 
      df[[i]] = as.character(df[[i]]) # and converts them to character class
  return(df)
}
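
As a quick check of what the two helpers do, here is a minimal sketch of the same idea in base R, run on a small made-up data frame. The `toy` data and the `factorize_alt()` / `unfactorize_alt()` names are illustrative assumptions, not part of the NAILDOH workflow.

```r
# Toy data frame (illustrative values only)
toy <- data.frame(
  name = c("a", "b"),
  n    = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Same idea as factorize(): convert character columns to factors
factorize_alt <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], as.factor)
  df
}

# Same idea as unfactorize(): convert factor columns back to character
unfactorize_alt <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  df[is_fct] <- lapply(df[is_fct], as.character)
  df
}

toy_f <- factorize_alt(toy)    # name becomes a factor, n is untouched
toy_c <- unfactorize_alt(toy_f) # name is a character vector again

sapply(toy_f, class)
sapply(toy_c, class)
```

Testing with `is.character()` / `is.factor()` instead of comparing `class()` to a string also covers columns whose class has more than one element (for example, ordered factors).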

In [103]:
# Data
letters <- factorize(read.csv("20230423_AM_PhD-NaildohSubset.csv")) # Put csv into a dataframe called letters (note: this masks base R's built-in letters constant)
colnames(letters) # Get an overview of the dataframe
dim(letters)

### Back on the Occupation Track

In [104]:
# Make a new variable called social class
# Fill with the values from the occupation (combined) variable
letters$socialClass <- letters$occupation

In [105]:
# Get a list of unique job titles (note: some cells contain multiple titles)
letters$socialClass %>%
str_split("; ") %>% 
unlist() %>% 
unique() 

In [106]:
# To what value does the "wife" problem apply
unique(letters$socialClass[which(grepl("wife", letters$socialClass))])

In [107]:
# Fix this to assign the husband's profession to the wife
letters$socialClass  <-  str_remove_all(letters$socialClass, "[:space:]wife")
letters$socialClass  <-  str_remove_all(letters$socialClass, "\'s")

In [108]:
# Check to make sure that occupation and socialClass contain the same values
setdiff(letters$occupation, letters$socialClass) # items in occupation and not socialClass
setdiff(letters$socialClass, letters$occupation) # items in socialClass and not occupation

Farmer's wife resolves to Farmer, which already appears in occupation, so the second setdiff() call (items in socialClass and not occupation) does not return it.
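
The behaviour described above can be reproduced on a made-up vector. This is only a sketch: it uses base R `gsub()` as a stand-in for the two `str_remove_all()` calls, and the three occupation values are illustrative.

```r
# Illustrative occupation values (not the real data)
occupation <- c("Farmer", "Farmer's wife", "Teacher")

# Base R equivalents of the two str_remove_all() calls
social_class <- gsub("[[:space:]]wife", "", occupation)
social_class <- gsub("'s", "", social_class, fixed = TRUE)

# Values that exist only on one side
only_in_occupation   <- setdiff(occupation, social_class)
only_in_social_class <- setdiff(social_class, occupation)

only_in_occupation
only_in_social_class
```

The first difference contains the original "wife" label, while the second is empty because the cleaned label already existed as an occupation.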

In [109]:
# Put the list of unique job titles into a list (note: some cells contain multiple titles)
Jobs  <- letters$socialClass %>%
str_split("; ") %>% 
unlist() %>% 
unique() 

print(Jobs)

 [1] "Nun"                  "Social worker"        "Teacher"             
 [4] "Military"             "Writer"               NA                    
 [7] "Clergy"               "Farmer"               "Miner"               
[10] "Homemaker"            "Merchant"             "Laborer"             
[13] "Businessman"          "Artist"               "Educator"            
[16] "Printer"              "Architect"            "Government appointee"
[19] "Politician"           "Religious leader"     "Tradesman"           
[22] "Rancher"              "Government employee"  "Surveyor"            
[25] "Manufacturer"         "Urban planner"        "Missionary"          
[28] "Military personnel"   "Royal governor"      


In [110]:
# Do I want to combine government jobs?
# Let's see how they break down
vals <- c("Government appointee", "Government employee", "Royal governor")

letters %>% 
filter(grepl(paste(vals, collapse='|'), socialClass)) %>% 
select(docauthorname, socialClass) %>% 
unique()

,docauthorname,socialClass
,<fct>,<chr>
1,"Hudson, Henry James, 1822-",Government appointee; Politician; Religious leader
2,"Robb, Alexander, 1839-",Miner; Laborer; Rancher; Government employee
14,"Buchanan, J. C., fl. 1833",Government employee
15,"Buchanan, Alexander Carlisle, 1786-1840",Government appointee
16,"Anonymous Government Agent in Upper Canada, fl. 1833",Government employee
17,"Aylmer, Matthew, Lord, 1775-1850",Military personnel; Royal governor


Appointee, employee and governor represent different statures. For now, let's keep them.

In [111]:
# Do I want to combine education / teacher jobs?
# Let's see how they break down
# https://en.wikipedia.org/wiki/Robert_Hamilton_Bishop
# http://www.biographi.ca/en/bio/harris_robert_14E.html
# https://en.wikipedia.org/wiki/Sister_Blandina
# http://www.biographi.ca/en/bio/menzies_george_7E.html

vals <- c("Educator", "Teacher")

letters %>% 
filter(grepl(paste(vals, collapse='|'), socialClass)) %>% 
select(docauthorname, socialClass) %>% 
unique()

,docauthorname,socialClass
,<fct>,<chr>
1,"Segale, Sister Blandina, 1850-1941",Nun; Social worker; Teacher
56,"Harris, Robert, 1849-1919",Artist; Educator
66,"Bishop, Robert Hamilton, 1777-1855",Clergy; Educator; Writer
67,"Menzies, George, fl. 1834",Teacher


Bishop was a university professor. Segale appears to have taught in local schools. Harris taught art classes and was a founding member of the Canadian Academy of Arts. Menzies may have been a teacher in Scotland before emigrating. It seems that educator refers to higher stature positions whereas teacher refers to lower stature (though not necessarily less influential) positions. I will keep these for now.

In [112]:
# Do I want to combine Military / Military personnel jobs?
# Let's see how they break down
letters %>% 
filter(grepl("Military", socialClass)) %>% 
select(docauthorname, socialClass) %>% 
unique()

,docauthorname,socialClass
,<fct>,<chr>
1,"Moodie, Susannah Strickland, 1803-1885",Military; Writer
135,"Aylmer, Matthew, Lord, 1775-1850",Military personnel; Royal governor


Moodie inherits the Military occupation from her husband. Her occupation will be changed to "Military personnel" to match Aylmer's.

In [113]:
letters$socialClass[letters$docauthorname=="Moodie, Susannah Strickland, 1803-1885"] <- "Military personnel; Writer"

In [114]:
# Put the list of unique job titles into a list (note: some cells contain multiple titles)
Jobs  <- letters$socialClass %>%
str_split("; ") %>% 
unlist() %>% 
unique() 

print(Jobs)

 [1] "Nun"                  "Social worker"        "Teacher"             
 [4] "Military personnel"   "Writer"               NA                    
 [7] "Clergy"               "Farmer"               "Miner"               
[10] "Homemaker"            "Merchant"             "Laborer"             
[13] "Businessman"          "Artist"               "Educator"            
[16] "Printer"              "Architect"            "Government appointee"
[19] "Politician"           "Religious leader"     "Tradesman"           
[22] "Rancher"              "Government employee"  "Surveyor"            
[25] "Manufacturer"         "Urban planner"        "Missionary"          
[28] "Royal governor"      


In [115]:
# Turn this list into a dataframe.
jobClass <- data.frame(Jobs)
head(jobClass)

,Jobs
,<chr>
1,Nun
2,Social worker
3,Teacher
4,Military personnel
5,Writer
6,


### Setting up the Occupational Variables

In [116]:
# Add a column for the Step 1 (Erickson) classification.
jobClass['Erickson']  <- NA
glimpse(jobClass)

Rows: 28
Columns: 2
$ Jobs     <chr> "Nun", "Social worker", "Teacher", "Military personnel", "Wri…
$ Erickson <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…


In [117]:
# Make Erickson categories
# Agricultural (A), Industrial (I), Commercial-Clerical-Professional (CCP)

A  <- c("Farmer",
        "Rancher" 
        #"Plantation manager"
        )

I  <-  c("Miner",
         "Manufacturer"
         #"Factory worker",
         #"Transportation worker",
         #"Explorer" # Note: This is a prospector so closest to mining in the Erickson classification system.
               ) 

CCP <- c(#"Jeweler",
                "Tradesman", 
                #"Retail worker", 
                "Businessman",
         #"Businesswoman",
                #"Tailor", 
                "Merchant", 
                #"Banker",
                #"Physician", 
                   "Architect", 
                   #"Engineer", 
                   "Artist", 
                   "Writer", 
                   "Surveyor",
                   #"Secretary",
                   #"Accountant",
                   #"Editor", 
                   #"Nurse",
         "Clergy", 
         #"Barber",
            "Missionary",
            "Educator", 
            "Religious leader",
            "Social worker", 
            #"Religious worker",
            "Military personnel", 
            "Politician",
         "Government employee",
            "Government appointee",
            "Royal governor", 
            #"Diplomat",
         "Nun", 
            "Teacher",
    "Printer",
    "Urban planner"
         #"Servant", 
              #"Cook", 
             #"Housekeeper",
        #"Law enforcement"
)

Unknown  <- c("Homemaker",
            "Laborer"# it's not clear what kind of labour
             #"Student"
              )

In [118]:
# Map jobs to occupation categories.
# Agricultural (A)
rows <- which(grepl(paste(A, collapse = "|"), jobClass$Jobs)) # Get rows that meet condition
jobClass$Erickson[rows] <- "A" # Recode data

# Industrial (I)
rows <- which(grepl(paste(I, collapse = "|"), jobClass$Jobs))
jobClass$Erickson[rows] <- "I"

# Commercial-Clerical-Professional (CCP)
rows <- which(grepl(paste(CCP, collapse = "|"), jobClass$Jobs))
jobClass$Erickson[rows] <- "CCP"

# Unknown
rows <- which(grepl(paste(Unknown, collapse = "|"), jobClass$Jobs))
jobClass$Erickson[rows] <- "Unknown"

# View
jobClass

Jobs,Erickson
<chr>,<chr>
Nun,CCP
Social worker,CCP
Teacher,CCP
Military personnel,CCP
Writer,CCP
,
Clergy,CCP
Farmer,A
Miner,I
Homemaker,Unknown
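
The mapping above relies on `grepl()` with an alternation pattern, which matches substrings. With the current titles that appears to work, but a title containing another title as a substring would be picked up as well. Below is a minimal sketch of exact matching through a named lookup vector. The shortened category vectors and the "Mineralogist" title are hypothetical and only illustrate the difference.

```r
# Illustrative job titles; "Mineralogist" is a made-up example
jobs <- c("Farmer", "Miner", "Mineralogist", "Teacher", NA)

# Shortened, hypothetical category definitions
A   <- c("Farmer", "Rancher")
I   <- c("Miner", "Manufacturer")
CCP <- c("Teacher", "Writer")

# Substring matching: "Mineralogist" is also flagged as industrial
regex_I <- grepl(paste(I, collapse = "|"), jobs)

# Exact matching via a named lookup vector (title -> category code)
lookup <- c(
  setNames(rep("A",   length(A)),   A),
  setNames(rep("I",   length(I)),   I),
  setNames(rep("CCP", length(CCP)), CCP)
)
erickson <- unname(lookup[jobs])

data.frame(jobs, regex_I, erickson)
```

Titles that are missing from the lookup come back as `NA`, which makes unclassified jobs easy to spot.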


In [119]:
# Who are the laborers?
letters %>% 
filter(grepl("Laborer", socialClass)) %>% 
select(docauthorname, socialClass) %>% 
unique()

,docauthorname,socialClass
,<fct>,<chr>
1,"Harris, Critchlow, 1813-1899",Farmer; Merchant; Laborer
2,"Harris, Sarah Stretch, 1818-1897",Farmer; Merchant; Laborer
202,"Robb, Alexander, 1839-",Miner; Laborer; Rancher; Government employee
214,"Singer, William, fl. 1831",Laborer; Tradesman


Critchlow and Sarah Stretch Harris

"During October of 1860, after finally having given up on farming, Critchlow attempted to open a store. After a hard, slow go at it, he gave that up as well, almost exactly one year later in October of 1861 (search "store"). After this, he swallowed his pride and forewent his traditional attachment to land and proprietorship and accepted work on a wage. Critchlow began working first for their family friend, Mr. Haszard, and then for the Davies brothers. His tasks included measuring, transporting, and purchasing goods, preparing swine, and overseeing a fishing station, among other various tasks (search "Haszard" and "Davies"). Finally, the financial prospects of the Harris family improved." (http://sarah.emilieroberts.ca/index.php?page=The_Family)


On searching these terms in the source text, it seems evident that Critchlow was working for wages in the agricultural-commercial sector, especially in terms of food distribution (e.g., oats, pork, fish). (https://archive.org/embed/islandfamilyharr0000unse)

Alexander Robb

"...Alex got work among the labourers building the Cariboo wagon road.  After several years struggling to make a living, he and an Englishman were the first two Europeans to homestead in the Nicola Valley of central British Columbia..." (https://www.ancestryireland.com/alexander-robb-canada/)

Alexander served as a labourer in the transportation sector. His biography gives 1910 as his year of death, so the deathyear value is recoded accordingly.

In [120]:
letters$deathyear[letters$docauthorname=="Robb, Alexander, 1839-"] <- 1910

In [121]:
letters %>% 
filter(docauthorname=="Singer, William, fl. 1831") %>% 
select(birthyear, deathyear, nationalOrigin, socialClass, occupation, religionNew, docid, sourcetitle) %>% 
unique()

,birthyear,deathyear,nationalOrigin,socialClass,occupation,religionNew,docid,sourcetitle
,<int>,<dbl>,<fct>,<chr>,<fct>,<fct>,<fct>,<fct>
1,,,English,Laborer; Tradesman,Laborer; Tradesman,,S9873-D021,Hints on Emigration to Upper Canada; Especially Addressed to the Middle and Lower Classes in Great Britain and Ireland


William Singer

"I then went to work for Mr. Silcog four months, and Jerry Annett worked on the next farm. I have worked some at my trade; a person that can work well, can get a dollar and a half per day, and in the harvest
field we can get a dollar per day."

"I design working at my trade. I have been working on a farm, chopping, and other work...I cut my hand in the summer whilst mowin...on Mr. Silcog's field...there is plenty of hard work here, we can always have plenty to do; we board and lodge with the persons we work for. I am chopping now for Mr. Allworth, on his farm joining Mr. Silcog's. If any of my old acquaintances have got tired of being slaves and drudges, tell them to come to Upper Canada, to William Singer, bricklayer, he'll take them by the hand and lead them to hard work, good wages, and the best of living." (https://archive.org/embed/hintsonemigratio00doyl)


William served as a wage labourer in the agricultural sector. 


For the Erickson variable, I think I should recode Singer into A, Robb into I and Harris into CCP. Singer is clearly working on a farm. Robb is involved in building transportation infrastructure, which is more of an industrial undertaking. And Harris is working to move product, so he is more in the commercial sector, in some cases in roles that involve professional skills (e.g., purchasing, overseeing). However, this cannot be done until later. For now, "Laborer" will stay in the Unknown category. This is a to-do (DONE).
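
One way the author-level decisions above could be recorded is a named vector keyed by author. This is a sketch only: the `toy` data frame and the `laborerSector` column are hypothetical, and the notebook does not show how the recode was eventually carried out. Only Critchlow Harris is included because the biographical reasoning above concerns him.

```r
# Toy stand-in for the letters data frame (illustrative rows only)
toy <- data.frame(
  docauthorname = c("Singer, William, fl. 1831",
                    "Robb, Alexander, 1839-",
                    "Harris, Critchlow, 1813-1899",
                    "Menzies, George, fl. 1834"),
  stringsAsFactors = FALSE
)

# Author-level decisions taken from the reasoning above
laborer_class <- c(
  "Singer, William, fl. 1831"    = "A",
  "Robb, Alexander, 1839-"       = "I",
  "Harris, Critchlow, 1813-1899" = "CCP"
)

# Hypothetical column: sector of each author's wage labour (NA if no decision)
toy$laborerSector <- unname(laborer_class[toy$docauthorname])

toy
```

Keeping the decisions in one vector leaves a single place to audit or revise them.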

In [122]:
# Who are the homemakers?
letters %>% 
filter(grepl("Homemaker", socialClass)) %>% 
select(docid, docauthorname, socialClass, nationalOrigin, religionNew, relMin, birthyear, deathyear, authorLocation) %>% 
unique()

,docid,docauthorname,socialClass,nationalOrigin,religionNew,relMin,birthyear,deathyear,authorLocation
,<fct>,<fct>,<chr>,<fct>,<fct>,<lgl>,<int>,<dbl>,<fct>
1,S316-D137,"Roberts, Sarah, fl. 1858",Homemaker,Welsh,Christian,,,,USA
2,S316-D182,"Roberts, Sarah, fl. 1858",Homemaker,Welsh,Christian,,,,USA
3,S316-D189,"Roberts, Sarah, fl. 1858",Homemaker,Welsh,Christian,,,,USA
4,S316-D193,"Roberts, Sarah, fl. 1858",Homemaker,Welsh,Christian,,,,USA


The letters of Sarah Roberts indicate that she and her husband (Humphrey) are involved in farming, and possibly also in some form of industrial wage labour (e.g., coal mining, steelworks). 

"...We had 210 bushels of wheat and 160 of corn...Wages have risen in every business except farming.
There is no hope that the corn and livestock will go up..." (S316-D189)

"...Puddlers or boilers get eight dollars a ton. The charge is 480 pounds pig-iron and five charges are worked a day. There is an assistant and he gets one third. When working six, there are two assistants, one of which is paid by the day. The colliers get five cents a bushel of eighty pounds and they can raise one hundred bushels a day in a seam five feet thick. They cut and fill and the company takes it away. Laborers get from a dollar and a half to two dollars a day..." (S316-D193)

However, because occupation is unclear, I will recode Sarah as Unknown. This was DONE, then after further examination, she was assigned to the agricultural class (A). 

In [123]:
# Who are the teachers?
letters %>% 
filter(grepl("Teacher", socialClass)) %>% 
select(docauthorname, socialClass) %>% 
unique()

,docauthorname,socialClass
,<fct>,<chr>
1,"Segale, Sister Blandina, 1850-1941",Nun; Social worker; Teacher
56,"Menzies, George, fl. 1834",Teacher


In [124]:
# Menzies was also a writer (poet) and printer (newspaper)
# Add Writer and Printer to his occupations (the Teacher label is kept)
# http://www.biographi.ca/en/bio.php?BioId=37680
letters$socialClass[letters$docauthorname == "Menzies, George, fl. 1834"]  <- "Teacher; Writer; Printer"

In [125]:
# Who are the Government?
letters %>% 
filter(grepl("Government", socialClass)| grepl("governor", socialClass)) %>% 
select(docauthorname, socialClass) %>% 
unique()

,docauthorname,socialClass
,<fct>,<chr>
1,"Hudson, Henry James, 1822-",Government appointee; Politician; Religious leader
2,"Robb, Alexander, 1839-",Miner; Laborer; Rancher; Government employee
14,"Buchanan, J. C., fl. 1833",Government employee
15,"Buchanan, Alexander Carlisle, 1786-1840",Government appointee
16,"Anonymous Government Agent in Upper Canada, fl. 1833",Government employee
17,"Aylmer, Matthew, Lord, 1775-1850",Military personnel; Royal governor


Although Government employee, appointee and royal governor represent very different levels of social stature, for now they will not be differentiated because they are all the same in the Erickson scheme. This will be rectified later (DONE). In the meantime, I make the following changes based on biographical information found online.

In [126]:
# Alexander Buchanan: "merchant and emigration agent"
# http://biographi.ca/en/bio/buchanan_alexander_carlisle_1786_1840_7E.html
letters$socialClass[letters$docauthorname == "Buchanan, Alexander Carlisle, 1786-1840"]  <- 
"Government appointee; Merchant"

# For future reference:
# Matthew Aylmer: "army officer and colonial administrator"
# http://www.biographi.ca/en/bio/whitworth_aylmer_matthew_7E.html
# Henry James Hudson: "public offices, among them postmaster, justice of the peace, county commissioner, and county judge."
# https://history.nebraska.gov/collection_section/henry-james-hudson-1822-1903-rg3031-am/

In [127]:
#Re-do the classification to make sure intervening changes are captured.

# Put the list of unique job titles into a list (note: some cells contain multiple titles)
Jobs  <- letters$socialClass %>%
str_split("; ") %>% 
unlist() %>% 
unique() 

print(Jobs)

# Turn this list into a dataframe.
jobClass <- data.frame(Jobs)
head(jobClass)

# Add a column for the Step 1 (Erickson) classification.
jobClass['Erickson']  <- NA
glimpse(jobClass)

# Make Erickson categories
# Agricultural (A), Industrial (I), Commercial-Clerical-Professional (CCP)

A  <- c("Farmer",
        "Rancher" 
        #"Plantation manager"
        )

I  <-  c("Miner",
         "Manufacturer"
         #"Factory worker",
         #"Transportation worker",
         #"Explorer" # Note: This is a prospector so closest to mining in the Erickson classification system.
               ) 

CCP <- c(#"Jeweler",
                "Tradesman", 
                #"Retail worker", 
                "Businessman",
         #"Businesswoman",
                #"Tailor", 
                "Merchant", 
                #"Banker",
                #"Physician", 
                   "Architect", 
                   #"Engineer", 
                   "Artist", 
                   "Writer", 
                   "Surveyor",
                   #"Secretary",
                   #"Accountant",
                   #"Editor", 
                   #"Nurse",
         "Clergy", 
         #"Barber",
            "Missionary",
            "Educator", 
            "Religious leader",
            "Social worker", 
            #"Religious worker",
            "Military personnel", 
            "Politician",
         "Government employee",
            "Government appointee",
            "Royal governor", 
            #"Diplomat",
         "Nun", 
            "Teacher",
    "Printer",
    "Urban planner"
         #"Servant", 
              #"Cook", 
             #"Housekeeper",
        #"Law enforcement"
)

Unknown  <- c("Homemaker",
            "Laborer"# it's not clear what kind of labour
             #"Student"
              )

# Map jobs to occupation categories.
# Agricultural (A)
rows <- which(grepl(paste(A, collapse = "|"), jobClass$Jobs)) # Get rows that meet condition
jobClass$Erickson[rows] <- "A" # Recode data

# Industrial (I)
rows <- which(grepl(paste(I, collapse = "|"), jobClass$Jobs))
jobClass$Erickson[rows] <- "I"

# Commercial-Clerical-Professional (CCP)
rows <- which(grepl(paste(CCP, collapse = "|"), jobClass$Jobs))
jobClass$Erickson[rows] <- "CCP"

# Unknown
rows <- which(grepl(paste(Unknown, collapse = "|"), jobClass$Jobs))
jobClass$Erickson[rows] <- "Unknown"

# View
jobClass

 [1] "Nun"                  "Social worker"        "Teacher"             
 [4] "Military personnel"   "Writer"               NA                    
 [7] "Clergy"               "Farmer"               "Miner"               
[10] "Homemaker"            "Merchant"             "Laborer"             
[13] "Businessman"          "Artist"               "Educator"            
[16] "Printer"              "Architect"            "Government appointee"
[19] "Politician"           "Religious leader"     "Tradesman"           
[22] "Rancher"              "Government employee"  "Surveyor"            
[25] "Manufacturer"         "Urban planner"        "Missionary"          
[28] "Royal governor"      


,Jobs
,<chr>
1,Nun
2,Social worker
3,Teacher
4,Military personnel
5,Writer
6,


Rows: 28
Columns: 2
$ Jobs     <chr> "Nun", "Social worker", "Teacher", "Military personnel", "Wri…
$ Erickson <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…


Jobs,Erickson
<chr>,<chr>
Nun,CCP
Social worker,CCP
Teacher,CCP
Military personnel,CCP
Writer,CCP
,
Clergy,CCP
Farmer,A
Miner,I
Homemaker,Unknown
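
Because the classification has to be re-run whenever socialClass changes, the steps could be wrapped in a function. The sketch below is one possible shape, not the notebook's method: `build_job_class()` and the toy inputs are hypothetical names, and it uses exact matching with `%in%` rather than `grepl()`.

```r
# Build a job-to-category table from a vector of "; "-separated occupations.
# `categories` is a named list: names are category codes, values are job titles.
build_job_class <- function(social_class, categories) {
  jobs <- unique(unlist(strsplit(as.character(social_class), "; ", fixed = TRUE)))
  out  <- data.frame(Jobs = jobs, Erickson = NA_character_, stringsAsFactors = FALSE)
  for (code in names(categories)) {
    out$Erickson[out$Jobs %in% categories[[code]]] <- code
  }
  out
}

# Toy input (illustrative values only)
toy_class <- c("Farmer; Merchant", "Miner", NA, "Homemaker")
toy_cats  <- list(A = "Farmer", I = "Miner", CCP = "Merchant", Unknown = "Homemaker")

job_class <- build_job_class(toy_class, toy_cats)
job_class
```

Calling the function again after each manual recode would avoid copying the category vectors into a second cell.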


In [128]:
# Who are the miners?
letters %>% 
filter(grepl("Miner", socialClass)) %>% 
select(docauthorname, socialClass) %>% 
unique()

,docauthorname,socialClass
,<fct>,<chr>
1,"Williams, John R., fl. 1895",Miner
2,"Robb, Alexander, 1839-",Miner; Laborer; Rancher; Government employee
14,"Hutchings, James Mason, 1820-1902",Tradesman; Miner; Writer; Businessman


Because mining in North America during the time in question was in its infancy, many of the people described as miners were participating in the Gold Rush as independent prospectors. Therefore, they aren't really wage labourers. For example, https://en.wikipedia.org/wiki/James_Mason_Hutchings

In [129]:
# Who are the manufacturers?
letters %>% 
filter(grepl("Manufacturer", socialClass)) %>% 
select(docauthorid, docauthorname, socialClass, docid) %>% 
unique()

,docauthorid,docauthorname,socialClass,docid
,<fct>,<fct>,<chr>,<fct>
1,per0029182,"Anonymous Male Scottish Immigrant from Aberdeen, fl. 1832-1834",Manufacturer,S9865-D017
2,per0029182,"Anonymous Male Scottish Immigrant from Aberdeen, fl. 1832-1834",Manufacturer,S9865-D018
3,per0029182,"Anonymous Male Scottish Immigrant from Aberdeen, fl. 1832-1834",Manufacturer,S9865-D019
4,per0029200,"Anonymous Scottish Immigrant from Turriff, fl. 1834",Manufacturer,S9865-D039
5,per0036196,"Downe, John, fl. 1830",Manufacturer,S9974-D010


In [130]:
#According to the original letter text (S9865-D017), per0029182 is a farmer and a millwright.
#So not a laborer.  
#Recoding his socialClass to include farmer and tradesman.

letters$socialClass[letters$docauthorid=="per0029182"] <- "Manufacturer; Farmer; Tradesman"

In [131]:
#According to the original letter text (S9865-D039), per0029200 is a millwright who works for wages.
#Recoding his socialClass to indicate tradesman.
# To-do record wageLabour as TRUE (DONE)

letters$socialClass[letters$docauthorid=="per0029200"] <- "Manufacturer; Tradesman"

In [132]:
#According to the original letter text (S9974-D010), per0036196 is a manager in a factory.
# To-do record wageLabour as FALSE (DONE)

letters$socialClass[letters$docauthorid=="per0036196"] <- "Manufacturer"

In [133]:
# Create a new variable called "Labourer" in the jobClass dataframe.
jobClass['Labourer']  <- NA
glimpse(jobClass)

Rows: 28
Columns: 3
$ Jobs     <chr> "Nun", "Social worker", "Teacher", "Military personnel", "Wri…
$ Erickson <chr> "CCP", "CCP", "CCP", "CCP", "CCP", NA, "CCP", "A", "I", "Unkn…
$ Labourer <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…


In [134]:
# Associate jobs with factor levels for this new variable.

Yes <- c("Laborer")

No <-  c("Nun",
         "Social worker",
         "Teacher",
         "Military personnel",
         "Writer",
         "Clergy",
         "Merchant",
         "Businessman",
         "Artist",
         "Educator" ,
         "Printer",
         "Architect",
         "Government appointee",
         "Politician",
         "Religious leader",
         "Government employee",
         "Surveyor", 
         "Urban planner",
         "Missionary",
         "Royal governor" )

Uncertain <- c("Homemaker", 
            "Farmer",
            "Miner",
            "Tradesman",
            "Rancher",
            "Manufacturer"
           )
                 
                 
#"Factory worker"
#"Transportation worker"
#"Jeweler"
#"Merchants, shopkeepers and peddlers
#"Retail worker"
#"Businesswoman",
#"Tailor", 
#"Barber",
#"Banker",
#"Explorer"
#"Engineer", 
#"Secretary",
#"Accountant",
#"Editor",
#"Servant", 
#"Cook", 
#"Housekeeper"
#Clergymen and schoolmasters 
#"Student", 
#"Religious worker", 
#"Nurse",
#"Physician"
#"Law enforcement",
#"Diplomat",

In [135]:
# Enter the appropriate value based on the job.

# Labourer
rows <- which(grepl(paste(Yes, collapse = "|"), jobClass$Jobs)) # Get rows that meet condition
jobClass$Labourer[rows] <- "Yes" # Recode data

# Not a labourer (middle class or higher)
rows <- which(grepl(paste(No, collapse = "|"), jobClass$Jobs)) # Get rows that meet condition
jobClass$Labourer[rows] <- "No" # Recode data

# Uncertain (could be wage labour or not)
rows <- which(grepl(paste(Uncertain, collapse = "|"), jobClass$Jobs)) # Get rows that meet condition
jobClass$Labourer[rows] <- "Uncertain" # Recode data

# View
jobClass

Jobs,Erickson,Labourer
<chr>,<chr>,<chr>
Nun,CCP,No
Social worker,CCP,No
Teacher,CCP,No
Military personnel,CCP,No
Writer,CCP,No
,,
Clergy,CCP,No
Farmer,A,Uncertain
Miner,I,Uncertain
Homemaker,Unknown,Uncertain
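
Since the three groups are typed by hand, a quick consistency check can confirm that every job title lands in exactly one of them. A minimal sketch with shortened, illustrative groups follows; "Printer" is left out on purpose to show what an unassigned title looks like.

```r
# Shortened, illustrative versions of the three groups
Yes       <- c("Laborer")
No        <- c("Nun", "Teacher", "Writer")
Uncertain <- c("Farmer", "Miner")

# Illustrative job titles, including one that no group covers
jobs <- c("Nun", "Teacher", "Farmer", "Laborer", "Printer", NA)

groups   <- list(Yes = Yes, No = No, Uncertain = Uncertain)
assigned <- unlist(groups, use.names = FALSE)

# Titles present in the data but missing from every group
unassigned <- setdiff(jobs[!is.na(jobs)], assigned)

# Titles listed in more than one group
duplicated_titles <- unique(assigned[duplicated(assigned)])

unassigned
duplicated_titles
```

Both results should be empty before the Labourer column is trusted.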


### Assigning Values to the Occupation Variables

In [136]:
glimpse(letters)

Rows: 617
Columns: 26
$ docid                     <fct> S1019-D002, S1019-D004, S1019-D005, S1019-D0…
$ sourcetitle               <fct> "At the End of the Santa Fe Trail", "At the …
$ docyear                   <int> 1872, 1872, 1872, 1872, 1873, 1873, 1873, 18…
$ docmonth                  <int> 11, 12, 12, 12, 3, 7, 9, 6, 11, 6, 9, 12, 1,…
$ docday                    <int> 30, 6, 10, 21, 1, NA, NA, 30, 14, NA, NA, 16…
$ authorLocation            <fct> USA, USA, USA, USA, USA, USA, USA, USA, USA,…
$ docauthorid               <fct> per0001043, per0001043, per0001043, per00010…
$ docauthorname             <fct> "Segale, Sister Blandina, 1850-1941", "Segal…
$ authorgender              <fct> F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,…
$ agewriting                <int> 22, 22, 22, 22, 23, 23, 23, 24, 24

In [137]:
# Step 1
# Agricultural Class ("A")

rows = which(grepl(paste(A,collapse="|"), letters$socialClass)) # Get rows that meet condition
letters['A'] <- NA # Make binary variable and fill with NAs
letters$A[!is.na(letters$socialClass)]  <- FALSE # Set non-NA rows to False
letters$A[rows] <- TRUE # Set rows meeting condition to True
summary(letters$A) #Get summary

# Industrial Class ("I")

rows = which(grepl(paste(I,collapse="|"), letters$socialClass)) # Get rows that meet condition
letters['I'] <- NA # Make binary variable and fill with NAs
letters$I[!is.na(letters$socialClass)]  <- FALSE # Set non-NA rows to False
letters$I[rows] <- TRUE # Set rows meeting condition to True
summary(letters$I) #Get summary

# Commercial, Clerical & Professional ("CCP")

rows = which(grepl(paste(CCP,collapse="|"), letters$socialClass)) # Get rows that meet condition
letters['CCP'] <- NA # Make binary variable and fill with NAs
letters$CCP[!is.na(letters$socialClass)]  <- FALSE # Set non-NA rows to False
letters$CCP[rows] <- TRUE # Set rows meeting condition to True
summary(letters$CCP) #Get summary

# Unknown Class ("Unknown")

rows = which(grepl(paste(Unknown,collapse="|"), letters$socialClass)) # Get rows that meet condition
letters['Unknown'] <- NA # Make binary variable and fill with NAs
letters$Unknown[!is.na(letters$socialClass)]  <- FALSE # Set non-NA rows to False
letters$Unknown[rows] <- TRUE # Set rows meeting condition to True
summary(letters$Unknown) #Get summary

   Mode   FALSE    TRUE    NA's 
logical     286     263      68 

   Mode   FALSE    TRUE    NA's 
logical     527      22      68 

   Mode   FALSE    TRUE    NA's 
logical      32     517      68 

   Mode   FALSE    TRUE    NA's 
logical     331     218      68 
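
The four indicator blocks follow the same three steps, so they could be expressed with one helper. The sketch below assumes a hypothetical `flag_class()` function and toy inputs. It also shows why the TRUE counts above can sum to more than the number of classified letters: an author with several occupations can be TRUE in several categories.

```r
# Return TRUE/FALSE for rows whose occupations match any title, NA where unknown
flag_class <- function(social_class, titles) {
  out <- grepl(paste(titles, collapse = "|"), social_class)
  out[is.na(social_class)] <- NA
  out
}

# Toy input (illustrative values only)
toy_class <- c("Farmer; Merchant", "Miner", NA, "Teacher")

flag_A   <- flag_class(toy_class, c("Farmer", "Rancher"))
flag_I   <- flag_class(toy_class, c("Miner", "Manufacturer"))
flag_CCP <- flag_class(toy_class, c("Merchant", "Teacher"))

summary(flag_A)
summary(flag_CCP)
```

In the toy data the first row is TRUE for both A and CCP.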

I decided not to use this hierarchical approach for reasons explained in the reflexivity blog, but I am keeping the code here in case I decide to change my mind. 

In [138]:
# This is where I use the available data to indicate where writers are wage labourers.

#no = which(grepl(paste(No,collapse="|"), letters$socialClass)) # Get rows that are not wage labourers
#yes = which(grepl(paste(Yes,collapse="|"), letters$socialClass)) # Get rows that are wage labourers
#maybe = which(grepl(paste(Unknown,collapse="|"), letters$socialClass)) # Get rows that might be wage labourers

# The order of the following is important because writers have multiple occupations. 
# I am treating these as hierarchical, such that the writer is not considered a wage 
# labourer if any one of their jobs place them in a class above

#letters['wageLabour'] <- NA # First, I fill all cells with NA
#letters$wageLabour[yes] <- "TRUE" # Second, I enter TRUE anytime one of the occupations is Labour
#letters$wageLabour[maybe] <- "Unknown" # Third, the value is upgraded if there is an occupation that might not be wage labour.
#letters$wageLabour[no] <- "FALSE" # Finally, the value is upgraded to FALSE if there is a non-labour job.
#summary(as.factor(letters$wageLabour))

In [139]:
#This is where I use the available data to indicate where writers are wage labourers.

no = which(grepl(paste(No,collapse="|"), letters$socialClass)) # Get rows that are not wage labourers
yes = which(grepl(paste(Yes,collapse="|"), letters$socialClass)) # Get rows that are wage labourers
maybe = which(grepl(paste(Uncertain,collapse="|"), letters$socialClass)) # Get rows that might be wage labourers

# The order of the following is important because writers have multiple occupations. 
# I have decided to code this variable as TRUE if any one of the writer's positions places them 
# in the labouring class, so later assignments overwrite earlier ones.

letters['wageLabour'] <- NA # First, I fill all cells with NA
letters$wageLabour[no] <- "FALSE" # Second, I enter FALSE wherever there is a non-labour job.
letters$wageLabour[maybe] <- "Uncertain" # Third, the value is overwritten if there is an occupation that might be wage labour.
letters$wageLabour[yes] <- "TRUE" # Finally, I enter TRUE anytime one of the occupations is Labour
summary(as.factor(letters$wageLabour))
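
The precedence rule (TRUE beats Uncertain, which beats FALSE) depends on the order of the assignments. It can also be written out explicitly, as in the sketch below. The `has_any()` helper, the shortened groups and the toy occupations are illustrative assumptions.

```r
# TRUE if any of the "; "-separated occupations is in `titles`
has_any <- function(social_class, titles) {
  vapply(strsplit(social_class, "; ", fixed = TRUE),
         function(jobs) any(jobs %in% titles),
         logical(1))
}

# Shortened, illustrative groups
Yes       <- c("Laborer")
No        <- c("Merchant", "Writer")
Uncertain <- c("Farmer", "Miner")

# Toy occupations (illustrative values only)
toy_class <- c("Farmer; Merchant; Laborer", "Farmer; Writer", "Writer", NA)

# Precedence is spelled out: Yes first, then Uncertain, then No
wage_labour <- ifelse(has_any(toy_class, Yes), "TRUE",
               ifelse(has_any(toy_class, Uncertain), "Uncertain",
               ifelse(has_any(toy_class, No), "FALSE", NA)))

wage_labour
```

Written this way, reordering lines cannot change the result by accident.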

Are the uncertain cases indeed labourers only, or something else too?

In [140]:
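letters["sourceID"] <- sub("-.*$", "", as.character(letters$docid)) # Source ID = everything before the hyphen (IDs vary in length, e.g., S316 vs. S1019)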
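
Source prefixes in docid vary in length (for example, S316-D137 versus S1019-D002), so a fixed-width cut and a cut at the hyphen give different results. A small sketch on three docid values that appear in this notebook:

```r
# docid values taken from outputs shown in this notebook
doc_ids <- c("S316-D137", "S1019-D002", "S9873-D021")

fixed_width <- substr(doc_ids, 1, 4)    # first four characters only
before_dash <- sub("-.*$", "", doc_ids) # everything before the hyphen

data.frame(doc_ids, fixed_width, before_dash)
```

The two columns agree only for sources whose identifier is exactly four characters long.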

In [141]:
letters %>% 
filter(wageLabour == "Uncertain") %>% 
select(docauthorname, socialClass, sourceID) %>% 
unique()

,docauthorname,socialClass,sourceID
,<fct>,<chr>,<chr>
1,"Thomas, William, fl. 1852",Farmer,S316
2,"Jones, John Owen, fl. 1848",Farmer,S316
3,"Pugh, Margaret",Farmer,S316
4,"Owen, Margred",Farmer,S316
5,"Roberts, Samuel, fl. 1856-1870",Clergy; Farmer,S316
6,"Williams, John R., fl. 1895",Miner,S316
7,"Roberts, Sarah, fl. 1858",Homemaker,S316
11,"Turnbull, Thomas, 1812-1869",Farmer,S855
12,"Carrothers, Nathaniel, ?-1881",Farmer; Tradesman,S963
19,"Carrothers, Joseph, 1793(?)-",Farmer; Tradesman,S963


What follows is a case-by-case determination of whether the author is a wage labourer. These assessments are based on the letter text and in some cases online biographical research.

In [142]:
letters$wageLabour[letters$docauthorname=="Thomas, William, fl. 1852"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Jones, John Owen, fl. 1848"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Pugh, Margaret"] <- "FALSE"

According to the introduction to The Welsh in America, Samuel Roberts was a "Congregational minister from Llanbrynmair who was also a tenant farmer, a scholar, and a considerable social force in 19th century Wales" (Conway, 1961, p.10). Tenant farmers were not wage labourers. Therefore, this person will be coded as "FALSE."

In [143]:
letters$wageLabour[letters$docauthorname=="Roberts, Samuel, fl. 1856-1870"] <- "FALSE"

This author specifically mentions his pay ($1.88 per day) and can therefore be coded as TRUE for wage labourer.

In [144]:
letters$wageLabour[letters$docauthorname=="Williams, John R., fl. 1895"] <- "TRUE"

As explained previously in this notebook, Sarah and her husband Humphrey appear to be involved in farming and potentially some form of industrial wage labour, although it is not possible to discern whether they are personally engaged in this form of work or just familiar with it because their sons and people in their community are engaged in it. Sarah mentions that “things are dearer in the towns than here in the country,” which does indicate that they are principally involved in farming rather than industrial work. She refers to “fair prices” for horses, which indicates that they are likely independent farmers rather than farmhands, but her knowledge of wages for labourers and working conditions in industrial settings indicates that she is very close to this lifestyle. My net impression from Sarah’s letters is that she and Humphrey are farmers but well-connected with the wider Welsh community, which is involved in a variety of industries and the wage labour associated with them. For this reason, I am coding her as FALSE for wage labour.

In [145]:
letters$wageLabour[letters$docauthorname=="Roberts, Sarah, fl. 1858"] <- "FALSE"

Thomas initially sounds like a poor labourer because he makes multiple references to his terrible situation and to work, or rather his inability to do any, because of an illness. The metadata shows him as a farmer, but his letter goes on to speak of mining, particularly independent prospecting, which involved hiring men and purchasing tools. I therefore add mining to his list of occupations. In the introduction to his travel journal, an editor mentions that Thomas and his brother “worked a lime kiln” until forced away by illness, then they bought land and became farmers until the Gold Rush struck, at which point only Thomas continued westward (Turnbull, 1914, p. 151). It is not clear whether Thomas’ role at the lime kiln was as a paid labourer or as an owner-operator. However, given his “solid education” and other entrepreneurial / independent professions, I lean toward not treating Thomas as a wage labourer (Turnbull, 1914, p. 151).

In [146]:
letters$wageLabour[letters$docauthorname=="Turnbull, Thomas, 1812-1869"] <- "FALSE"
letters$socialClass[letters$docauthorname=="Turnbull, Thomas, 1812-1869"] <- "Farmer; Miner"

Although Nathaniel ended up buying land and becoming a farmer, he started off as a carpenter working for wages, which he reported to be five shillings per day to 20 dollars per month (Z_Houston1990_IrishEmigration.txt). Because of this experience with wage labour, I will enter "TRUE" for him and his brother.

In [147]:
letters$wageLabour[letters$docauthorname=="Carrothers, Nathaniel, ?-1881"] <- "TRUE"
letters$wageLabour[letters$docauthorname=="Carrothers, Joseph, 1793(?)-"] <- "TRUE"
letters$wageLabour[letters$docauthorname=="Anonymous English Male Immigrant in Troy, NY, fl. 1804"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous English Male Immigrant near Washington, D.C., fl. 1822"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous English Male Immigrant in Marietta, OH, fl. 1828"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Knight, James, fl. 1831"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Male Scottish Immigrant in Buffalo, NY, fl. 1834"] <- "TRUE"
letters$wageLabour[letters$docauthorname=="Anonymous Male Scottish Immigrant from Aberdeen, fl. 1832-1834"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Male Scottish Farmer Immigrant, fl. 1833"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Male Scottish Immigrant, fl. 1833"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Scottish Cabinet Maker, fl. 1833"] <- "TRUE"
letters$wageLabour[letters$docauthorname=="Anonymous Settler in Canada, fl. 1832"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Scottish Gentleman in Canada, fl. 1834"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Scottish Farmer from St. Fergus Parish, fl. 1834"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Scottish Immigrant from Turriff, fl. 1834"] <- "TRUE"
letters$wageLabour[letters$docauthorname=="Anonymous Scottish Farmer from Aberdeenshire, fl. 1834"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Young Scottish Male Immigrant at Whitby, Canada, fl. 1833"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Anonymous Young Scotsman on the Trent River, Canada, fl. 1833"] <- "FALSE"
letters$wageLabour[letters$docauthorname=="Graham, Thomas, fl. 1827"] <- "TRUE"
letters$wageLabour[letters$docauthorname=="Hutchings, James Mason, 1820-1902"] <- "FALSE"
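
Because the overrides above are a long run of near-identical assignments, they could also be held in a small lookup table and applied in one step, which may make the decisions easier to audit. The sketch below is one possible arrangement, not the approach used in this notebook; `toy` and `overrides` are hypothetical objects that reuse two of the author names above, plus one invented author with no override.

```r
# Toy letter-level data: one row per letter
toy <- data.frame(
  docauthorname = c("Graham, Thomas, fl. 1827",
                    "Knight, James, fl. 1831",
                    "Graham, Thomas, fl. 1827",
                    "Some Other Author"),   # invented author with no override
  wageLabour = c(NA, "Uncertain", NA, "FALSE"),
  stringsAsFactors = FALSE
)

# One row per author whose value was decided by close reading
overrides <- data.frame(
  docauthorname = c("Graham, Thomas, fl. 1827", "Knight, James, fl. 1831"),
  wageLabour    = c("TRUE", "FALSE"),
  stringsAsFactors = FALSE
)

# match() gives the row of `overrides` for each letter, or NA if the author is not listed
idx <- match(toy$docauthorname, overrides$docauthorname)
listed <- !is.na(idx)
toy$wageLabour[listed] <- overrides$wageLabour[idx[listed]]

toy
```

Authors who are not in the lookup table keep whatever value they already had.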

This writer is a prospector who also writes about the necessity of serving as a day labourer on a canal or railway project, but this appears to be advice for others, not his own experience. His work appears to be as a gold digger / prospector who teams up with others but is still more or less independent.

In [148]:
letters$wageLabour[letters$docauthorname=="Downe, John, fl. 1830"] <- "FALSE"

This writer is a woman who is part of a family that was able to buy land and was in the process of establishing a homestead, although wage labour seems to have contributed to them being able to do this. It is not clear how much farming they are doing themselves, as opposed to purchasing or trading for provisions. For this reason, I am going to code them as TRUE for wage labour.

In [149]:
letters$wageLabour[letters$docauthorname=="Owen, Margred"] <- "TRUE"

In [150]:
# Where are we with this variable
summary(as.factor(letters$wageLabour))

In [151]:
letters$wageLabour <- as.logical(letters$wageLabour)
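
A note on what this conversion does: `as.logical()` turns the strings "TRUE" and "FALSE" into logical values and turns any string it does not recognise into NA, so a cell still marked "Uncertain" at this point would become missing. A minimal illustration with a made-up vector:

```r
x <- c("TRUE", "FALSE", "Uncertain", NA)  # made-up values
converted <- as.logical(x)
converted
```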

Bayesian imputation will be used to fill in values for the NAs for wageLabour.

In [152]:
vars <- c('north_american_occupation',
          'socialClass', 
          'A', 
          'I', 
          'CCP', 
          'Unknown',
         'wageLabour')
unique(letters[vars])

Unnamed: 0_level_0,north_american_occupation,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<fct>,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,Nun; Social worker; Teacher,Nun; Social worker; Teacher,False,False,True,False,False
56,Military wife; Writer,Military personnel; Writer,False,False,True,False,False
190,,,,,,,
195,Clergy,Clergy,False,False,True,False,False
199,Farmer,Farmer,True,False,False,False,False
201,Farmer's wife,Farmer,True,False,False,False,False
202,Farmer's wife,Farmer,True,False,False,False,True
204,,Clergy; Farmer,True,False,True,False,False
206,Miner,Miner,False,True,False,False,True
207,Homemaker,Homemaker,False,False,False,True,False


NAs for north_american_occupation are derived from native_occupation, and in a few cases (White, Graham) from close readings of the letters.

In [153]:
letters %>% 
filter(is.na(north_american_occupation) & 
      !is.na(socialClass)) %>% 
select(docauthorname, north_american_occupation, native_occupation, socialClass) %>% 
unique()

Unnamed: 0_level_0,docauthorname,north_american_occupation,native_occupation,socialClass
Unnamed: 0_level_1,<fct>,<fct>,<fct>,<chr>
1,"Roberts, Samuel, fl. 1856-1870",,Clergy; Farmer,Clergy; Farmer
2,"White, Jane, 1831(?)-1867",,,Businessman
8,"Knight, James, fl. 1831",,Businessman; Tradesman,Businessman; Tradesman
9,"Anonymous Scottish Gardener in Canada, fl. 1834",,Urban planner,Urban planner
10,"Anonymous Scottish Immigrant from Turriff, fl. 1834",,Manufacturer,Manufacturer; Tradesman
11,"Graham, Thomas, fl. 1827",,,Tradesman


## Returning to the to-dos

For the Erickson variable, recode Singer into A, Robb into I, and the Harrises into CCP.

In [154]:
letters  %>% 
filter(docauthorname == "Singer, William, fl. 1831") %>% 
select(socialClass, A, I, CCP, Unknown, wageLabour)%>% 
unique()

Unnamed: 0_level_0,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,Laborer; Tradesman,False,False,True,True,True


In [155]:
#Recode
letters$A[letters$docauthorname == "Singer, William, fl. 1831"] <- TRUE
letters$Unknown[letters$docauthorname == "Singer, William, fl. 1831"] <- FALSE

#Check
letters  %>% 
filter(docauthorname == "Singer, William, fl. 1831") %>% 
select(socialClass, A, I, CCP, Unknown, wageLabour)%>% 
unique()

Unnamed: 0_level_0,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,Laborer; Tradesman,True,False,True,False,True


In [156]:
letters  %>% 
filter(docauthorname == "Robb, Alexander, 1839-") %>% 
select(socialClass, A, I, CCP, Unknown, wageLabour)%>% 
unique()

Unnamed: 0_level_0,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,Miner; Laborer; Rancher; Government employee,True,True,True,True,True


Robb is already TRUE for the I variable because of his occupation as a miner. Nothing needs to change other than setting the Unknown category to FALSE.

In [157]:
# Recode
letters$Unknown[letters$docauthorname == "Robb, Alexander, 1839-"] <- FALSE

#Check
letters  %>% 
filter(docauthorname == "Robb, Alexander, 1839-") %>% 
select(socialClass, A, I, CCP, Unknown, wageLabour) %>% 
unique()

Unnamed: 0_level_0,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,Miner; Laborer; Rancher; Government employee,True,True,True,False,True


In [158]:
letters  %>% 
filter(docauthorname == "Harris, Critchlow, 1813-1899" | 
       docauthorname == "Harris, Sarah Stretch, 1818-1897") %>% 
select(docauthorname, socialClass, A, I, CCP, Unknown, wageLabour) %>% 
unique()

Unnamed: 0_level_0,docauthorname,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<fct>,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,"Harris, Critchlow, 1813-1899",Farmer; Merchant; Laborer,True,False,True,True,True
2,"Harris, Sarah Stretch, 1818-1897",Farmer; Merchant; Laborer,True,False,True,True,True


In [159]:
# Recode
letters$Unknown[letters$docauthorname == "Harris, Critchlow, 1813-1899"] <- FALSE
letters$Unknown[letters$docauthorname == "Harris, Sarah Stretch, 1818-1897"] <- FALSE

#Check
letters  %>% 
filter(docauthorname == "Harris, Critchlow, 1813-1899" | 
       docauthorname == "Harris, Sarah Stretch, 1818-1897") %>% 
select(docauthorname, socialClass, A, I, CCP, Unknown, wageLabour) %>% 
unique()

Unnamed: 0_level_0,docauthorname,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<fct>,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,"Harris, Critchlow, 1813-1899",Farmer; Merchant; Laborer,True,False,True,False,True
2,"Harris, Sarah Stretch, 1818-1897",Farmer; Merchant; Laborer,True,False,True,False,True


In [160]:
#Check variable
letters  %>% 
filter(Unknown == TRUE) %>% 
select(docauthorname, socialClass, A, I, CCP, Unknown, wageLabour) %>% 
unique()


Unnamed: 0_level_0,docauthorname,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<fct>,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,"Roberts, Sarah, fl. 1858",Homemaker,False,False,False,True,False


As explained in line 402, a close reading of Sarah's letters indicates that she and her husband are most likely independent farmers. I am going to re-code her as FALSE for Unknown and TRUE for A.

In [161]:
# Re-code
letters$Unknown[letters$docauthorname == "Roberts, Sarah, fl. 1858"] <- FALSE
letters$A[letters$docauthorname == "Roberts, Sarah, fl. 1858"] <- TRUE

# Check
letters  %>% 
filter(docauthorname == "Roberts, Sarah, fl. 1858") %>% 
select(docauthorname, socialClass, A, I, CCP, Unknown, wageLabour) %>% 
unique()


Unnamed: 0_level_0,docauthorname,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<fct>,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,"Roberts, Sarah, fl. 1858",Homemaker,True,False,False,False,False


In [162]:
#Check variable
letters  %>% 
filter(Unknown == TRUE) %>% 
select(docauthorname, socialClass, A, I, CCP, Unknown, wageLabour) %>% 
nrow()

In [163]:
# Who are the Government?
letters %>% 
filter(grepl("Government", socialClass)| grepl("governor", socialClass)) %>% 
select(docauthorname, socialClass, A, I,CCP, Unknown, wageLabour) %>% 
unique()

Unnamed: 0_level_0,docauthorname,socialClass,A,I,CCP,Unknown,wageLabour
Unnamed: 0_level_1,<fct>,<chr>,<lgl>,<lgl>,<lgl>,<lgl>,<lgl>
1,"Hudson, Henry James, 1822-",Government appointee; Politician; Religious leader,False,False,True,False,False
2,"Robb, Alexander, 1839-",Miner; Laborer; Rancher; Government employee,True,True,True,False,True
14,"Buchanan, J. C., fl. 1833",Government employee,False,False,True,False,False
15,"Buchanan, Alexander Carlisle, 1786-1840",Government appointee; Merchant,False,False,True,False,False
16,"Anonymous Government Agent in Upper Canada, fl. 1833",Government employee,False,False,True,False,False
17,"Aylmer, Matthew, Lord, 1775-1850",Military personnel; Royal governor,False,False,True,False,False


I am happy with these social class values so will make no further changes.

In [164]:
letters$wageLabour[letters$docauthorname == "Anonymous Scottish Immigrant from Turriff, fl. 1834"]

In [165]:
letters$wageLabour[letters$docauthorname == "Downe, John, fl. 1830"]

In [166]:
letters  <- factorize(letters)

In [167]:
glimpse(letters)

Rows: 617
Columns: 32
$ docid                     <fct> S1019-D002, S1019-D004, S1019-D005, S1019-D0…
$ sourcetitle               <fct> "At the End of the Santa Fe Trail", "At the …
$ docyear                   <int> 1872, 1872, 1872, 1872, 1873, 1873, 1873, 18…
$ docmonth                  <int> 11, 12, 12, 12, 3, 7, 9, 6, 11, 6, 9, 12, 1,…
$ docday                    <int> 30, 6, 10, 21, 1, NA, NA, 30, 14, NA, NA, 16…
$ authorLocation            <fct> USA, USA, USA, USA, USA, USA, USA, USA, USA,…
$ docauthorid               <fct> per0001043, per0001043, per0001043, per00010…
$ docauthorname             <fct> "Segale, Sister Blandina, 1850-1941", "Segal…
$ authorgender              <fct> F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,…
$ agewriting                <int> 22, 22, 22, 22, 23, 23, 23, 24, 24

In [168]:
vars  <- c('docauthorid',
           'docauthorname',
           'docid',
           'sourcetitle',
           'docyear',
           'docmonth',
           'docday',
           'authorgender',
           'agewriting',
           'birthyear',
           'deathyear',
           'religionNew',
           'relMin',
           'nationalOrigin',
           'britishEmpire_EU',
           'translated',
           'authorLocation',
           'socialClass',
           'A', 
           'I', 
           'CCP',
           'Unknown',
           'wageLabour',
           'marriagestatus',
           'maternalstatus',
          'publicLetter')

letters <- letters[vars]

In [169]:
glimpse(letters)

Rows: 617
Columns: 26
$ docauthorid      <fct> per0001043, per0001043, per0001043, per0001043, per00…
$ docauthorname    <fct> "Segale, Sister Blandina, 1850-1941", "Segale, Sister…
$ docid            <fct> S1019-D002, S1019-D004, S1019-D005, S1019-D006, S1019…
$ sourcetitle      <fct> "At the End of the Santa Fe Trail", "At the End of th…
$ docyear          <int> 1872, 1872, 1872, 1872, 1873, 1873, 1873, 1874, 1874,…
$ docmonth         <int> 11, 12, 12, 12, 3, 7, 9, 6, 11, 6, 9, 12, 1, 3, 3, 6,…
$ docday           <int> 30, 6, 10, 21, 1, NA, NA, 30, 14, NA, NA, 16, NA, NA,…
$ authorgender     <fct> F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F,…
$ agewriting       <int> 22, 22, 22, 22, 23, 23, 23, 24, 24, 26, 26, 26, 27, 2…
$ birthyear        <int> 1850, 1850, 1850, 1850, 1850, 1850, 1850, 1

In [170]:
length(unique(letters$docauthorid))

In [171]:
letters %>% # Take the dataframe
count(docauthorid, sort = TRUE) %>% # Count the number of letters per authors
count(n > 1) # Count the number of series

n > 1,n
<lgl>,<int>
False,65
True,38


In [172]:
# What is the gender breakdown by document and by author for letters?

lettersG <- letters %>% # Create a new dataframe for the collection of letters
select(authorgender) # with one row per letter, keeping only gender
table(lettersG$authorgender) # Tabulate the gender breakdown by letter
prop.table(as.matrix(table(lettersG$authorgender)), 2)*100

letterAuthorsG <- letters %>% # Create a new dataframe for the writer pool
distinct(docauthorid, authorgender) # with one row per unique author and gender
table(letterAuthorsG$authorgender) # Tabulate the gender breakdown by author
prop.table(as.matrix(table(letterAuthorsG$authorgender)), 2)*100


  F   M 
405 212 

0,1
F,65.64019
M,34.35981



 F  M 
15 88 

0,1
F,14.56311
M,85.43689
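
The gap between the two breakdowns (about 66% of letters but about 15% of authors are female) arises because a few prolific writers contribute many letters. The toy example below, with made-up data, shows the same effect: one author with three letters dominates the letter-level share but counts only once at the author level.

```r
toy <- data.frame(
  docauthorid  = c("a", "a", "a", "b"),   # made-up author IDs
  authorgender = c("F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Share by letter: every row counts
by_letter <- prop.table(table(toy$authorgender)) * 100

# Share by author: keep one row per author first
authors <- unique(toy[c("docauthorid", "authorgender")])
by_author <- prop.table(table(authors$authorgender)) * 100

by_letter
by_author
```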


In [173]:
# What is the breakdown for religious minority by document and by author for letters?

lettersG <- letters %>% # Create a new dataframe for the collection of letters
select(relMin) # with one row per letter, keeping only religious minority status
table(lettersG$relMin) # Tabulate the religious minority breakdown by letter
prop.table(as.matrix(table(lettersG$relMin)), 2)*100

letterAuthorsG <- letters %>% # Create a new dataframe for the writer pool
distinct(docauthorid, relMin) # with one row per unique author and religious minority status
table(letterAuthorsG$relMin) # Tabulate the religious minority breakdown by author
prop.table(as.matrix(table(letterAuthorsG$relMin)), 2)*100


FALSE  TRUE 
  452    63 

0,1
False,87.76699
True,12.23301



FALSE  TRUE 
   25     7 

0,1
False,78.125
True,21.875


In [174]:
# What is the breakdown for wage labour by document and by author for letters?

lettersG <- letters %>% # Create a new dataframe for the collection of letters
select(wageLabour) # with one row per letter, keeping only wage labour status
table(lettersG$wageLabour) # Tabulate the wage labour breakdown by letter
prop.table(as.matrix(table(lettersG$wageLabour)), 2)*100

letterAuthorsG <- letters %>% # Create a new dataframe for the writer pool
distinct(docauthorid, wageLabour) # with one row per unique author and wage labour status
table(letterAuthorsG$wageLabour) # Tabulate the wage labour breakdown by author
prop.table(as.matrix(table(letterAuthorsG$wageLabour)), 2)*100


FALSE  TRUE 
  310   239 

0,1
False,56.4663
True,43.5337



FALSE  TRUE 
   43    12 

0,1
False,78.18182
True,21.81818


In [175]:
write.csv(letters, 
          "20230507_AM_PhD-NaildohSubset.csv", 
          row.names=FALSE)

## References

Conway, A. (1961). The Welsh in America: Letters from the immigrants. University of Minnesota Press. https://www.jstor.org/stable/10.5749/j.cttts8t0.

Turnbull, T. (1914). T. Turnbull’s travels from the United States across the plains to California (F. L. Paxson & R. G. Thwaites, Eds.). Madison: State Historical Society of Wisconsin. http://archive.org/details/tturnbullstravel00turnrich
