# Setup and data loading

```r
library(tidyverse) # load needed code libraries
```

```
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```


In R, `<-` assigns a name to a value so that you can refer to it later. In the cell below, `finnish_deaths <- read_tsv(...)` reads a table from a tab-separated file and names it `finnish_deaths`.
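As a minimal sketch of the same idea, the code below uses a small hypothetical TSV file written to a temporary location instead of the real `Finnish_deaths_1980-2020.tsv`. The name `toy_deaths` and its rows are made up for illustration. It binds the result of `read_tsv()` to a name and then reuses that name. The one-letter `col_types` codes are readr's compact column specifications: `'i'` for integer, `'f'` for factor and `'c'` for character.

```r
library(readr)

# Hypothetical toy data standing in for the real Finnish deaths file
toy_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "year\tsex\tage",
  "1980\tmale\t71",
  "1995\tfemale\t84",
  "2020\tfemale\t90"
), toy_path)

# `<-` binds the table returned by read_tsv() to the name `toy_deaths`
toy_deaths <- read_tsv(toy_path, col_types = cols(year = 'i', sex = 'f', age = 'i'))

# The name can now be used in later code
nrow(toy_deaths)       # 3
class(toy_deaths$sex)  # "factor"
```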

After that, `|>` (the "pipe" operator) passes the table on as the first argument of another function. Here, we first pipe the data into `summary()`, which prints summary statistics for each column.
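Here is a small sketch of what the pipe does, using a hypothetical toy tibble. `x |> f(y)` is simply another way of writing `f(x, y)`, so the piped and nested calls below give identical results. The native `|>` pipe needs R 4.1 or later.

```r
library(tibble)

# Hypothetical toy table
toy <- tibble(year = c(1980L, 1995L, 2020L), age = c(71L, 84L, 90L))

# `toy |> summary()` is evaluated as `summary(toy)`
piped_summary  <- toy |> summary()
nested_summary <- summary(toy)
identical(piped_summary, nested_summary)  # TRUE

# Extra arguments come after the piped value: `toy |> head(2)` is `head(toy, 2)`
identical(toy |> head(2), head(toy, 2))   # TRUE
```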

For the next step, we chain two pipes, so that the output of the first function is fed into the second. The first function, `slice_sample(n=10)`, takes the table as input and returns a new table containing 10 randomly sampled rows of the original. This smaller table is then piped to `View()` for display.
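Chaining works the same way with more than one pipe. The sketch below again uses a hypothetical toy tibble. It samples rows with `slice_sample()` and passes the result on to `nrow()` or `pull()` rather than `View()`, since `View()` opens an interactive viewer. Resetting the random seed with `set.seed()` before each version makes the piped chain and the equivalent nested calls draw the same sample.

```r
library(dplyr)

# Hypothetical toy table with 100 rows
toy <- tibble::tibble(id = 1:100, value = (1:100)^2)

# Two pipes: the table goes into slice_sample(), whose output goes into nrow()
toy |> slice_sample(n = 10) |> nrow()  # 10

# The same chain written as nested calls, read from the inside out
set.seed(42)
piped_ids <- toy |> slice_sample(n = 10) |> pull(id)
set.seed(42)
nested_ids <- pull(slice_sample(toy, n = 10), id)
identical(piped_ids, nested_ids)  # TRUE
```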

At any point, you could also give an intermediate result a name and use it later. Often, though, it is easier to write the whole pipeline in one go using multiple pipes. For example, the last line of the cell below could also be written as:
```r
ten_random_rows_of_finnish_deaths <- finnish_deaths |> slice_sample(n=10)
ten_random_rows_of_finnish_deaths |> View()
```

```r
finnish_deaths <- read_tsv("Finnish_deaths_1980-2020.tsv", col_types=cols(year='i', sex='f', age='i')) # load data; 'i' = integer, 'f' = factor
finnish_deaths |> summary() # print summary information on columns
finnish_deaths |> slice_sample(n=10) |> View() # display 10 random rows
```

```
      year          sex               age        
 Min.   :1980   male  :1015800   Min.   :  0.00  
 1st Qu.:1990   female:1011585   1st Qu.: 66.00  
 Median :2001                    Median : 77.00  
 Mean   :2001                    Mean   : 73.82  
 3rd Qu.:2011                    3rd Qu.: 85.00  
 Max.   :2020                    Max.   :112.00  
```

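| year `<int>` | sex `<fct>` | age `<int>` |
|---:|:---|---:|
| 2002 | female | 92 |
| 2019 | male | 81 |
| 2000 | female | 83 |
| 1999 | female | 90 |
| 1997 | male | 73 |
| 1990 | female | 81 |
| 1983 | female | 92 |
| 1996 | female | 70 |
| 1988 | female | 66 |
| 2011 | male | 56 |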


```r
skvr <- read_tsv("skvr.tsv", col_types=cols(
  poem_id='c', collector_name='f', collection_year='i',
  collection_place='f', n_verses='i'
)) # load data
skvr |> summary() # print summary information on columns
skvr |> slice_sample(n=10) |> View() # display 10 random rows
```

```
   poem_id                      collector_name  collection_year
 Length:85172       Krohn, Kaarle      : 4013   Min.   :1564   
 Class :character   Alava, Vihtori     : 3081   1st Qu.:1885   
 Mode  :character   Europaeus, D. E. D.: 2768   Median :1901   
                    Paulaharju, Samuli : 2709   Mean   :1897   
                    Neovius, A. D.     : 2511   3rd Qu.:1915   
                    Porkka, Volmari    : 2421   Max.   :1939   
                    (Other)            :67669                  
    collection_place    n_verses    
 Narvusi    : 3197   Min.   :  1.0  
 Lempaala   : 2954   1st Qu.:  4.0  
 Vuokkiniemi: 2179   Median :  7.0  
 Soikkola   : 1763   Mean   : 15.2  
 Rautu      : 1728   3rd Qu.: 17.0  
 Ilomantsi  : 1566   Max.   :571.0  
 (Other)    :71785                  
```

| poem_id `<chr>` | collector_name `<fct>` | collection_year `<int>` | collection_place `<fct>` | n_verses `<int>` |
|:---|:---|---:|:---|---:|
| SKVR VII3 loitsut 219. | Krohn, Kaarle | 1885 | Juuka | 34 |
| SKVR VI1 2609. | Kouvo, Henrik | 1889 | Lemi | 2 |
| SKVR VII1 629. | Härkönen, Iivo | 1903 | Ilomantsi | 15 |
| SKVR VIII 2885. | Saariluoma, Vilho | 1913 | Sauvo | 4 |
| SKVR I3 1768. | Europaeus, D. E. D. | 1845 | Suomussalmi | 24 |
| SKVR IX1 1806. | Vihervaara, Eemeli | 1910 | Tammela | 3 |
| SKVR XIV 822. | Vaara, V. B. | 1912 | Mäntsälä | 16 |
| SKVR XII1 2690. | Perä-Pohjolan ja Lapin Kotiseutuyhdistys | 1915 | Alatornio | 6 |
| SKVR VII4 loitsut 1675. | Krohn, Kaarle | 1885 | Kaavi | 55 |
| SKVR VI2 4477. | Schadewitz, Martti | 1900 | Juva | 4 |


```r
ceec_people <- read_tsv("ceec_people.tsv", col_types=cols(
  person_id='c', sex='f', first_name='c', last_name='c',
  year_of_birth='i', year_of_death='i', living_region='f',
  societal_rank='f', societal_rank_of_father='f', religion='f',
  level_of_education='f', letters_sent='i', letters_received='i'
)) # load data
ceec_people |> summary() # print summary information on columns
ceec_people |> slice_sample(n=10) |> View() # display 10 random rows
```

```
  person_id             sex        first_name         last_name        
 Length:2050        Female: 476   Length:2050        Length:2050       
 Class :character   Male  :1561   Class :character   Class :character  
 Mode  :character   NA's  :  13   Mode  :character   Mode  :character  
 year_of_birth  year_of_death        living_region        societal_rank
 Min.   :1360   Min.   :1408   Other        :510   Gentry (lower):367  
 1st Qu.:1560   1st Qu.:1606   London       :358   Nobility      :320  
 Median :1628   Median :1672   East Anglia  :182   Gentry (upper):291  
 Mean   :1626   Mean   :1673   North        :181   Professional  :291  
 3rd Qu.:1714   3rd Qu.:1765   Home counties:149   Other        
```

| person_id `<chr>` | sex `<fct>` | first_name `<chr>` | last_name `<chr>` | year_of_birth `<int>` | year_of_death `<int>` | living_region `<fct>` | societal_rank `<fct>` | societal_rank_of_father `<fct>` | religion `<fct>` | level_of_education `<fct>` | letters_sent `<int>` | letters_received `<int>` |
|:---|:---|:---|:---|---:|---:|:---|:---|:---|:---|:---|---:|---:|
| SBYRON | Female | Sophia | Byron née Trevannion | | 1790 | London | Gentry (lower) | | | | 0 | 5 |
| WASTON | Male | Walter | Aston | 1584 | 1639 | Court | Nobility | Gentry (upper) | | | 0 | 3 |
| N2BACON | Male | NICHOLAS II | BACON | 1543 | 1624 | East Anglia | Gentry (upper) | Gentry (upper) | | Higher | 7 | 4 |
| JTILSON | Male | John | Tilson | | | Other | Gentry (lower) | | | | 0 | 1 |
| WHEWER | Male | William | Hewer | 1642 | 1715 | London | Professional | Professional | | | 0 | 1 |
| LMARESCOE | Female | Leonora | Marescoe | 1640 | 1715 | London | Merchant | | | | 0 | 2 |
| IJONES | Male | INIGO | JONES | 1573 | 1652 | London | Professional | Other | | | 1 | 0 |
| JNORBURY | Male | JOHN | NORBURY | | | Other | Other | | | | 1 | 0 |
| JMOWBRJR | Male | JOHN | MOWBRAY | 1444 | 1476 | East Anglia | Nobility | Nobility | | | 3 | 1 |
| H5PERCY | Male | Henry | Percy | 1478 | 1527 | North | Nobility | Nobility | | | 17 | 0 |
