# Merge Datasets

<h2>Packages</h2>

In [2]:
library(tidyverse)

<h2>Functions</h2>

In [3]:
# Converts all factor columns to character class
unfactorize <- function(df){
  # is.factor() also catches ordered factors, whose class() has two elements
  for(i in which(vapply(df, is.factor, logical(1)))) df[[i]] <- as.character(df[[i]])
  return(df)
}

In [4]:
# Converts all character columns to factor class
factorize <- function(df){
  for(i in which(vapply(df, is.character, logical(1)))) df[[i]] <- as.factor(df[[i]])
  return(df)
}
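
Since the tidyverse is already loaded, the same two conversions can also be sketched with `mutate()` and `across()`. This is an alternative illustration rather than the notebook's own method, and the data frame `toy` is hypothetical.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the conversions
toy <- data.frame(
  id    = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  n     = 1:3,
  stringsAsFactors = FALSE
)

# Factor columns become character; other columns are left untouched
toy_chr <- toy %>% mutate(across(where(is.factor), as.character))

# Character columns become factor, mirroring factorize()
toy_fct <- toy_chr %>% mutate(across(where(is.character), as.factor))

sapply(toy_chr, class)
sapply(toy_fct, class)
```

Selecting columns with a predicate such as `is.factor` avoids comparing against the output of `class()`, which can have more than one element per column.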

<h2>Data</h2>

In [5]:
# Load data related to authors and documents (metadata)
dfMeta <- unfactorize(read.csv("20230606_AM_PhD-NaildohSubset.csv"))
glimpse(dfMeta)

Rows: 576
Columns: 24
$ docauthorid      <chr> "per0001043", "per0001043", "per0001043", "per0001043…
$ docauthorname    <chr> "Segale, Sister Blandina, 1850-1941", "Segale, Sister…
$ docid            <chr> "S1019-D002", "S1019-D004", "S1019-D005", "S1019-D006…
$ sourcetitle      <chr> "At the End of the Santa Fe Trail", "At the End of th…
$ docyear          <int> 1872, 1872, 1872, 1872, 1873, 1873, 1873, 1874, 1874,…
$ docmonth         <int> 11, 12, 12, 12, 3, 7, 9, 6, 11, 6, 9, 12, 1, 3, 3, 6,…
$ docday           <int> 30, 6, 10, 21, 1, NA, NA, 30, 14, NA, NA, 16, NA, NA,…
$ authorgender     <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"…
$ agewriting       <int> 22, 22, 22, 22, 23, 23, 23, 24, 24, 26, 26, 26, 27, 2…
$ birthyear        <int> 1850, 1850, 1850, 1850, 1850, 1850, 1850, 1

In [6]:
# Column names
names(dfMeta)

In [7]:
# Load chunk-level narrative text with sentiment scores
# [-1] drops the first column of the CSV (presumably a saved row index)
dfNarrative <- unfactorize(read.csv("20240220_PhD_SentimentChunk.csv"))[-1]
head(dfNarrative)

,chunk,docid,sequence,scoreNeg,scorePos,scoreNeu,scoreCompound,chunks,position
,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<dbl>
1,"TRINIDAD On Train from Steubenville, Ohio, to Cincinnati. Nov 30, 1872. My Darling Sister Justina: How interestedly you, Sister M Louis and myself read Eugénie de Guérin's Journal and her daily anxieties to save her brother from being a spiritual outcast! This Journal which I propose keeping for you will deal with incidents occurring on my journey to Trinidad and happenings in that far-off land to which I am consigned. The Journal will begin with the first act. Here is Mother Josephine's letter: Mt St Vincent, O, Nov 27, 1872. Sister Blandina, Steubenville, O My Dear Child: You are missioned to Trinidad. You will leave Cincinnati Wednesday and alone. Mother Regina will attend to your needs. Devotedly, Mother Josephine. This letter thrilled us both. I was delighted to make the sacrifice, and you were hiding your feelings that I might not lose any merit. Neither of us could find Trinidad on the map except in the island of Cuba. So we concluded that Cuba was my destination. I was to leave Steubenville quietly so that none of my obstreperous pupils might cause the incoming teacher",S1019-D002,1,0.053,0.119,0.827,0.9425,15,0.06666667
2,"Josephine. This letter thrilled us both. I was delighted to make the sacrifice, and you were hiding your feelings that I might not lose any merit. Neither of us could find Trinidad on the map except in the island of Cuba. So we concluded that Cuba was my destination. I was to leave Steubenville quietly so that none of my obstreperous pupils might cause the incoming teacher annoyance. Hence I went to Sunday catechetical class as usual 2:00 P M I was to take the 3:00 P M train for Cincinnati. I said to my hopefuls, ""Instead of catechism, I'm going to tell you an Indian story to-day."" The schoolhouse roof was not disturbed, though the hurrahs were loud enough! The moral of the story was ""Indian Endurance."" Dismissed them at two-thirty without one word of goodbye except the daily one. You remember how surprised I was to see a crowd at the station to wish me ""Godspeed""; I thought I was to slip away without anyone's knowledge except our own. Mr Tait and Mr McCann wished to speak to me alone. Both had been in the West. ""You will have a long travel on the plains,"" they said, ""before you",S1019-D002,2,0.05,0.092,0.859,0.8625,15,0.13333333
3,"them at two-thirty without one word of goodbye except the daily one. You remember how surprised I was to see a crowd at the station to wish me ""Godspeed""; I thought I was to slip away without anyone's knowledge except our own. Mr Tait and Mr McCann wished to speak to me alone. Both had been in the West. ""You will have a long travel on the plains,"" they said, ""before you reach Trinidad."" ""Where is Trinidad?"" ""A little mining town in Southwestern Colorado."" So then I knew my destination, which, of course I would have been told at Mt St Vincent. Both gentlemen said they had traveled on the plains on the Santa Fe Trail, and they seemed to have made it a matter of conscience to inform me on the subject of cowboys. This in substance was their conversation with me: ""Sister, you may be snow-bound while on the plains."" I looked my assent, I knew I could not stop the snow. ""Travelers are sometimes snow-bound for two weeks, and you are alone. This, though, is not the greatest danger to you."" Mentally I was wishing both gentlemen somewhere else. ""Your real danger is from cowboys."" I looked at",S1019-D002,3,0.037,0.08,0.883,0.6977,15,0.2
4,"This in substance was their conversation with me: ""Sister, you may be snow-bound while on the plains."" I looked my assent, I knew I could not stop the snow. ""Travelers are sometimes snow-bound for two weeks, and you are alone. This, though, is not the greatest danger to you."" Mentally I was wishing both gentlemen somewhere else. ""Your real danger is from cowboys."" I looked at the speakers. ""You do not seem to grasp our meaning. No virtuous woman is safe near a cowboy."" Both gave up trying to make me understand what they considered danger. Why should snow or cowboys frighten me any more than others who will be traveling the same way! So you see, dearest, I'm not going to so long a distance as we thought. At three o'clock A M the baggage checker came through our coach. I looked to see how much the pocketbook contained just twenty-five cents. If I used it to ride to the Good Samaritan I'd be minus the fare to Mt St Vincent, so I made up my mind to skirt around from the Little Miami to the Good Samaritan Hospital. At four A M I rang the front doorbell no response. I sat on the stone",S1019-D002,4,0.056,0.132,0.812,0.9451,15,0.26666667
5,"three o'clock A M the baggage checker came through our coach. I looked to see how much the pocketbook contained just twenty-five cents. If I used it to ride to the Good Samaritan I'd be minus the fare to Mt St Vincent, so I made up my mind to skirt around from the Little Miami to the Good Samaritan Hospital. At four A M I rang the front doorbell no response. I sat on the stone steps and waited till I heard the rising bell, then I waited another fifteen minutes when again I rang the doorbell. Sister Anthony came. ""Why, child, where did you come from; how did you get here? I'm sure you are cold."" I said I came from Steubenville. ""Oh, yes! dear Father Bigelow died there. He was good to this hospital. Last year he sent a barge of coal to us."" I said he was good to anyone in need. He died possessed of three dollars and fifty cents. His hand was always open to any kind of distress. I did not mention to Sister Anthony that I walked from the Little Miami Station. After Mass, breakfast and miles of sympathy and ""God Bless You"" from the Sisters at the Good Samaritan, one of the nurses",S1019-D002,5,0.059,0.148,0.793,0.9509,15,0.33333333
6,"year he sent a barge of coal to us."" I said he was good to anyone in need. He died possessed of three dollars and fifty cents. His hand was always open to any kind of distress. I did not mention to Sister Anthony that I walked from the Little Miami Station. After Mass, breakfast and miles of sympathy and ""God Bless You"" from the Sisters at the Good Samaritan, one of the nurses accompanied me to Fifth and Vine Streets, where I was to take ""Barney's Bus"" for Mt St Vincent. The bus was to leave at ten A M I waited till three P M then asked one of the clerks if he thought Mr McCabe would run the bus that day. ""I fear no bus will run to-day. There is an epidemic epizootic among the horses."" I asked if Mr Segale's place of business was anywhere near. He pointed to Wood's Theater and I started to the place indicated. Brother Henry managed to find a ""hack"" to send me to Mt St Vincent. On the way, between the first ascent of the hill and the Seminary, I met Sisters Gabriella and Delphina walking in the slush and cold on their way to the Orphan Asylum. I stopped to take them in. They returned",S1019-D002,6,0.071,0.074,0.854,0.1548,15,0.4


In [8]:
# Check whether both datasets contain the same docid values
# (identical() also requires them to appear in the same order)
identical(unique(dfMeta$docid),unique(dfNarrative$docid))
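
Because `identical()` compares the two `unique()` vectors element by element, it can return `FALSE` even when both datasets hold the same set of documents in a different order. A minimal sketch of an order-insensitive check follows; the vectors `meta_ids` and `narr_ids` are hypothetical stand-ins for `dfMeta$docid` and `dfNarrative$docid`.

```r
# Hypothetical stand-ins for dfMeta$docid and dfNarrative$docid
meta_ids <- c("D001", "D002", "D003", "D004")
narr_ids <- c("D003", "D001", "D001", "D002")

# Order-sensitive comparison, as in the cell above
identical(unique(meta_ids), unique(narr_ids))

# Order-insensitive comparison of the two sets of IDs
setequal(meta_ids, narr_ids)

# IDs present in one dataset but missing from the other
only_in_meta <- setdiff(meta_ids, narr_ids)
only_in_narr <- setdiff(narr_ids, meta_ids)
only_in_meta
only_in_narr
```

Listing the differences with `setdiff()` shows which documents would end up without metadata, or without chunks, after the merge.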

In [9]:
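# Merge datasets: right_join() keeps every chunk row in dfNarrative
# and attaches the matching metadata from dfMeta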
df <- right_join(dfMeta, dfNarrative, by = 'docid')
glimpse(df)

Rows: 3,785
Columns: 32
$ docauthorid      <chr> "per0001043", "per0001043", "per0001043", "per0001043…
$ docauthorname    <chr> "Segale, Sister Blandina, 1850-1941", "Segale, Sister…
$ docid            <chr> "S1019-D002", "S1019-D002", "S1019-D002", "S1019-D002…
$ sourcetitle      <chr> "At the End of the Santa Fe Trail", "At the End of th…
$ docyear          <int> 1872, 1872, 1872, 1872, 1872, 1872, 1872, 1872, 1872,…
$ docmonth         <int> 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 1…
$ docday           <int> 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 3…
$ authorgender     <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"…
$ agewriting       <int> 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 2…
$ birthyear        <int> 1850, 1850, 1850, 1850, 1850, 1850, 1850,
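
A right join keeps every row of the right-hand table, so the merged row count should equal the chunk table's row count as long as each `docid` appears only once in the metadata. The sketch below runs those checks on hypothetical toy tables, where `meta` and `chunks` stand in for `dfMeta` and `dfNarrative`; it makes no claim about the real data.

```r
library(dplyr)

# Hypothetical toy tables standing in for dfMeta and dfNarrative
meta <- data.frame(
  docid  = c("D001", "D002"),
  author = c("A", "B"),
  stringsAsFactors = FALSE
)
chunks <- data.frame(
  docid    = c("D001", "D001", "D002", "D003"),
  sequence = c(1L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)

merged <- right_join(meta, chunks, by = "docid")

# With unique docid values in meta, the join cannot duplicate chunk rows
no_duplicate_keys <- !any(duplicated(meta$docid))
same_row_count    <- nrow(merged) == nrow(chunks)

# Chunks whose docid has no metadata; their metadata columns are NA in merged
unmatched <- anti_join(chunks, meta, by = "docid")

no_duplicate_keys
same_row_count
unmatched
```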

In [10]:
# Apply function to turn character class variables to factor class.
# df  <- factorize(df)
# summary(df)

In [11]:
# Inspect the docid and text of the first chunk in the merged data
df[1, c("docid", "chunk")]

,docid,chunk
,<chr>,<chr>
1,S1019-D002,"TRINIDAD On Train from Steubenville, Ohio, to Cincinnati. Nov 30, 1872. My Darling Sister Justina: How interestedly you, Sister M Louis and myself read Eugénie de Guérin's Journal and her daily anxieties to save her brother from being a spiritual outcast! This Journal which I propose keeping for you will deal with incidents occurring on my journey to Trinidad and happenings in that far-off land to which I am consigned. The Journal will begin with the first act. Here is Mother Josephine's letter: Mt St Vincent, O, Nov 27, 1872. Sister Blandina, Steubenville, O My Dear Child: You are missioned to Trinidad. You will leave Cincinnati Wednesday and alone. Mother Regina will attend to your needs. Devotedly, Mother Josephine. This letter thrilled us both. I was delighted to make the sacrifice, and you were hiding your feelings that I might not lose any merit. Neither of us could find Trinidad on the map except in the island of Cuba. So we concluded that Cuba was my destination. I was to leave Steubenville quietly so that none of my obstreperous pupils might cause the incoming teacher"


In [12]:
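# Save the merged data for topic modeling
# (by default write.csv() also writes the row names as the first column)
write.csv(df, "20240220_PhD_Data4TopicModel-Chunk.csv")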
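
By default `write.csv()` writes row names as an extra first column, so a file saved this way gains an index column when it is read back. A small sketch of the difference that `row.names = FALSE` makes, using temporary files and a hypothetical toy data frame:

```r
# Hypothetical toy data frame, used only for the round trip
toy <- data.frame(docid = c("D001", "D002"), sequence = 1:2,
                  stringsAsFactors = FALSE)

path_default <- tempfile(fileext = ".csv")
path_no_rows <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column
write.csv(toy, path_default)

# row.names = FALSE writes only the data columns
write.csv(toy, path_no_rows, row.names = FALSE)

names(read.csv(path_default))
names(read.csv(path_no_rows))
```

Whether to drop the row names here depends on how later notebooks read the file, since any code that removes the first column after loading would then remove a real variable.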