# Importing data

Open RStudio.

Open a new R script in RStudio and **save it as** `wpa_5_LastFirst.R` (where Last and First are your last and first names).

Be careful with capitalization, the order of last and first name, and using `_` instead of `-`.

At the top of your script, write the following (**with appropriate changes**):

In [1]:
# Assignment: WPA 5
# Name: Laura Fontanesi
# Date: 6 April 2021

## 1. Importing data
Up to this point, I gave you the code to load the data in R. 

Say instead you have your own data saved on your computer or somewhere online. How can you analyze these data in R? For that, we need to learn how to import them using the tidyverse's [readr](https://readr.tidyverse.org/reference/index.html) package.

Data can come from different sources, e.g.:
- text files stored locally
- text files from a website

The function you use depends on the specific format the data were written in:

#### Different functions to read tabular data from file:
- `read_delim()` is the main and most general function for reading tabular data into R

- `read_csv()` sets the default separator to a comma

- `read_csv2()` is its European cousin, using a comma for decimal places and a semicolon as a separator

- `read_tsv()` imports tab-delimited files
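
To make the differences between these functions concrete, here is a minimal sketch. The three small files are hypothetical toy data written to temporary files just for this illustration; they are not part of the course data.

```r
library(readr)

# Hypothetical toy data, written to temporary files for illustration only
comma_file = tempfile(fileext = ".csv")
writeLines(c("id,score", "1,2.5", "2,3.75"), comma_file)

semicolon_file = tempfile(fileext = ".csv")
writeLines(c("id;score", "1;2,5", "2;3,75"), semicolon_file)

tab_file = tempfile(fileext = ".tsv")
writeLines(c("id\tscore", "1\t2.5", "2\t3.75"), tab_file)

# Each function expects a different separator (and decimal mark)
from_csv   = read_csv(comma_file)               # comma separator, decimal point
from_csv2  = read_csv2(semicolon_file)          # semicolon separator, decimal comma
from_tsv   = read_tsv(tab_file)                 # tab separator
from_delim = read_delim(tab_file, delim = "\t") # separator given explicitly
```

All four calls should give a table with the same `id` and `score` values; only the layout of the text file differs.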

#### Different functions to read other data formats:
- Excel files: https://readxl.tidyverse.org/reference/read_excel.html

- STATA, SPSS, SAS files: https://haven.tidyverse.org/
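
As a rough sketch of how the haven functions are used, the example below first writes a hypothetical toy table to temporary Stata (`.dta`) and SPSS (`.sav`) files and then reads them back. It assumes the haven package is installed; the table and file names are made up for illustration.

```r
library(haven)  # assumes haven is installed: install.packages("haven")

# Hypothetical toy data, for illustration only
toy = data.frame(id = c(1, 2, 3), score = c(2.5, 3.75, 4))

dta_file = tempfile(fileext = ".dta")
sav_file = tempfile(fileext = ".sav")

# Write the toy data in Stata and SPSS format
write_dta(toy, dta_file)
write_sav(toy, sav_file)

# Read them back with the function matching each format
from_stata = read_dta(dta_file)
from_spss  = read_sav(sav_file)
```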

## 2. File paths for local files

At this point, you should have a folder on your laptop for our R course, where you stored all your scripts. You should also have a subfolder called `data`. If you do not have this yet, quickly set this up now :)

When you are done, download the contents of this folder, https://www.dropbox.com/sh/j98l42hxdg66zox/AAC-IJnNGRElTGnBmQYNsgWfa?dl=0, into your `data` folder, so that all 5 data files are stored there.

To load these files in R, we need to write the path to your data folder. We can do this using code completion (Tab key): 
- On Mac, you can start from `read_delim('~/')` (or similar functions for loading data) and press Tab, to start navigating from your home folder
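- On Windows, you can do the same, but starting from `read_delim('C:/Users/')` (in R strings, write paths with forward slashes `/` or doubled backslashes `\\`, because a single `\` starts an escape sequence)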
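
If you prefer to build the path in code, a minimal sketch could look like the one below. The folder name `r-course` is only an assumption; replace it with wherever your course folder actually lives.

```r
# Hypothetical folder layout: adapt "r-course" to your own course folder
course_dir = file.path("~", "r-course")
data_file  = file.path(course_dir, "data", "data_to_import_a.txt")

print(data_file)               # the path as R will use it
print(path.expand(data_file))  # the same path with `~` replaced by your home folder
print(file.exists(data_file))  # TRUE only if the file is really there
```

`file.path()` joins the pieces with `/`, which R accepts on both Mac and Windows, and `file.exists()` is a quick way to check a path before passing it to a `read_` function.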

In my case, this folder was on Dropbox:

In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.3     ✔ purrr   0.3.4
✔ tibble  3.1.0     ✔ dplyr   1.0.4
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [5]:
data_a = read_delim('~/Dropbox/teaching/r-course21/data/data_to_import_a.txt', delim='\t')


── Column specification ────────────────────────────────────────────────────────
cols(
  index = col_double(),
  participant = col_double(),
  gender = col_character(),
  age = col_double(),
  options = col_character(),
  accuracy = col_double(),
  RT_msec = col_double()
)




In [6]:
head(data_a)

index,participant,gender,age,options,accuracy,RT_msec
<dbl>,<dbl>,<chr>,<dbl>,<chr>,<dbl>,<dbl>
1,8,male,18,CD,1,2381
2,8,male,18,CD,1,1730
3,8,male,18,AB,1,1114
4,8,male,18,AC,1,600
5,8,male,18,CD,1,683
6,8,male,18,AC,0,854
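
The column specification printed by `read_delim()` shows the types readr guessed for each column. If a guess is not what you want, you can state the types yourself with the `col_types` argument. The sketch below uses a hypothetical toy file with a few of the same column names; the choice of integer and factor types is only an example.

```r
library(readr)

# Hypothetical toy file mimicking a few columns of the data above
toy_file = tempfile(fileext = ".txt")
writeLines(
  c("participant\tgender\tRT_msec",
    "8\tmale\t2381",
    "9\tfemale\t1730"),
  toy_file
)

# State the column types explicitly instead of letting readr guess
toy = read_delim(
  toy_file,
  delim = "\t",
  col_types = cols(
    participant = col_integer(),
    gender = col_factor(),
    RT_msec = col_double()
  )
)

head(toy)
```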


In [6]:
data_b = read_csv('~/Dropbox/teaching/r-course21/data/data_to_import_b.csv')

# same as: data_b = read_delim('~/Dropbox/teaching/r-course21/data/data_to_import_b.csv', delim = ",")


── Column specification ────────────────────────────────────────────────────────
cols(
  id = col_character(),
  gender = col_double(),
  age = col_double(),
  income = col_double(),
  p1 = col_double(),
  p2 = col_double(),
  p3 = col_double(),
  p4 = col_double(),
  p5 = col_double(),
  p6 = col_double(),
  p7 = col_double(),
  p8 = col_double(),
  p9 = col_double(),
  p10 = col_double(),
  task = col_double(),
  havemore = col_double(),
  haveless = col_double(),
  pcmore = col_double()
)




In [7]:
head(data_b)

id,gender,age,income,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,task,havemore,haveless,pcmore
<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
R_3PtNn51LmSFdLNM,2,26,7,1,1,1,1,1,1,1,1,1,1,0,,50.0,50
R_2AXrrg62pgFgtMV,2,32,4,1,1,1,1,1,1,1,1,1,1,0,,25.0,75
R_cwEOX3HgnMeVQHL,1,25,2,0,1,1,1,1,1,1,1,0,0,0,,10.0,90
R_d59iPwL4W6BH8qx,1,33,5,1,1,1,1,1,1,1,1,1,1,0,,50.0,50
R_1f3K2HrGzFGNelZ,1,24,1,1,1,0,1,1,1,1,1,1,1,1,99.0,,99
R_3oN5ijzTfoMy4ca,1,22,2,1,1,0,0,1,1,1,1,0,1,0,,20.0,80


In [10]:
data_c = read_csv2('~/Dropbox/teaching/r-course21/data/data_to_import_c.csv')

ℹ Using ',' as decimal and '.' as grouping mark. Use `read_delim()` for more control.


── Column specification ────────────────────────────────────────────────────────
cols(
  .default = col_character(),
  age = col_double(),
  Medu = col_double(),
  Fedu = col_double(),
  traveltime = col_double(),
  studytime = col_double(),
  failures = col_double(),
  famrel = col_double(),
  freetime = col_double(),
  goout = col_double(),
  Dalc = col_double(),
  Walc = col_double(),
  health = col_double(),
  absences = col_double(),
  G1 = col_double(),
  G2 = col_double(),
  G3 = col_double()
)

In [11]:
head(data_c)

school,sex,age,address,famsize,Pstatus,Medu,Fedu,Mjob,Fjob,⋯,famrel,freetime,goout,Dalc,Walc,health,absences,G1,G2,G3
<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<chr>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
GP,F,18,U,GT3,A,4,4,at_home,teacher,⋯,4,3,4,1,1,3,4,0,11,11
GP,F,17,U,GT3,T,1,1,at_home,other,⋯,5,3,3,1,1,3,2,9,11,11
GP,F,15,U,LE3,T,1,1,at_home,other,⋯,4,3,2,2,3,3,6,12,13,12
GP,F,15,U,GT3,T,4,2,health,services,⋯,3,2,2,1,1,5,0,14,14,14
GP,F,16,U,GT3,T,3,3,other,other,⋯,4,3,2,1,2,5,0,11,13,13
GP,M,16,U,LE3,T,4,3,services,other,⋯,5,4,2,1,2,5,6,12,12,13
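
The message printed by `read_csv2()` suggests `read_delim()` for more control. As a minimal sketch, the same settings can be written out explicitly with the `delim` and `locale` arguments. The toy file below is hypothetical and only borrows a few column names from the data above.

```r
library(readr)

# Hypothetical toy file in the "European" layout: semicolon separator, decimal comma
toy_file = tempfile(fileext = ".csv")
writeLines(c("school;age;G3", "GP;18;11,5", "GP;17;9,25"), toy_file)

# Shortcut: read_csv2() assumes ';' as separator and ',' as decimal mark
via_csv2 = read_csv2(toy_file)

# The same settings, spelled out with read_delim()
via_delim = read_delim(
  toy_file,
  delim = ";",
  locale = locale(decimal_mark = ",", grouping_mark = ".")
)

head(via_delim)
```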


In [13]:
library(readxl)

# if readxl is not installed yet, first run: install.packages("readxl")

In [14]:
data_d = read_excel('~/Dropbox/teaching/r-course21/data/data_to_import_d.xls')

In [15]:
head(data_d)

Year,Average population,Live births,Deaths,Natural change,Crude birth rate (per 1000),Crude death rate (per 1000),Natural change (per 1000),Total fertility rates
<dbl>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>
1900,3300000,94316,63606,30710,28.6,19.3,9.3,3.83
1901,3341000,97028,60018,37010,29.0,18.0,11.1,3.89
1902,3384000,96480,57702,38778,28.5,17.1,11.5,3.82
1903,3428000,93824,59626,34198,27.4,17.4,10.0,3.67
1904,3472000,94867,60857,34010,27.3,17.5,9.8,3.66
1905,3516000,94653,61800,32853,26.9,17.6,9.3,3.6


In [17]:
library(haven)

# if haven is not installed yet, first run: install.packages("haven")

In [18]:
data_e = read_sav('~/Dropbox/teaching/r-course21/data/data_to_import_e.sav')

In [19]:
head(data_e)

case_ID,wave,year,weight_wave,weight_aggregate,happening,cause_original,cause_other_text,cause_recoded,sci_consensus,⋯,employment,house_head,house_size,house_ages0to1,house_ages2to5,house_ages6to12,house_ages13to17,house_ages18plus,house_type,house_own
<dbl>,<dbl+lbl>,<dbl+lbl>,<dbl>,<dbl>,<dbl+lbl>,<dbl+lbl>,<chr>,<dbl+lbl>,<dbl+lbl>,⋯,<dbl+lbl>,<dbl+lbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl+lbl>,<dbl+lbl>
2,1,1,0.54,0.2939263,3,1,,6,4,⋯,5,1,3,0,0,0,0,3,1,1
3,1,1,0.85,0.4626617,2,1,,6,1,⋯,6,2,2,0,0,0,0,2,4,2
5,1,1,0.49,0.2667109,2,2,,4,2,⋯,4,2,2,0,0,0,0,2,1,1
6,1,1,0.29,0.1578493,3,2,,4,4,⋯,5,2,2,0,0,0,0,2,1,1
7,1,1,1.29,0.7021572,3,1,,6,2,⋯,1,2,2,0,0,0,0,2,1,1
8,1,1,2.56,1.3934283,2,2,,4,2,⋯,2,2,3,0,0,1,0,2,2,1


## 3. File on a website: load and save

In [20]:
data_transactions = read_csv("http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv")

head(data_transactions)

ERROR: Error in open.connection(con, "rb"): HTTP error 404.


In [32]:
# save it to file
write_csv(data_transactions, file = "~/Dropbox/teaching/r-course21/data/Sacramentorealestatetransactions.csv")

In [17]:
# load it again
data_transactions = read_csv("~/Dropbox/teaching/r-course21/data/Sacramentorealestatetransactions.csv")

head(data_transactions)

Parsed with column specification:
cols(
  street = col_character(),
  city = col_character(),
  zip = col_double(),
  state = col_character(),
  beds = col_double(),
  baths = col_double(),
  sq__ft = col_double(),
  type = col_character(),
  sale_date = col_character(),
  price = col_double(),
  latitude = col_double(),
  longitude = col_double()
)


street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude
<chr>,<chr>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
3526 HIGH ST,SACRAMENTO,95838,CA,2,1,836,Residential,Wed May 21 00:00:00 EDT 2008,59222,38.63191,-121.4349
51 OMAHA CT,SACRAMENTO,95823,CA,3,1,1167,Residential,Wed May 21 00:00:00 EDT 2008,68212,38.4789,-121.431
2796 BRANCH ST,SACRAMENTO,95815,CA,2,1,796,Residential,Wed May 21 00:00:00 EDT 2008,68880,38.6183,-121.4438
2805 JANETTE WAY,SACRAMENTO,95815,CA,2,1,852,Residential,Wed May 21 00:00:00 EDT 2008,69307,38.61684,-121.4391
6001 MCMAHON DR,SACRAMENTO,95824,CA,2,1,797,Residential,Wed May 21 00:00:00 EDT 2008,81900,38.51947,-121.4358
5828 PEPPERMILL CT,SACRAMENTO,95841,CA,3,1,1122,Condo,Wed May 21 00:00:00 EDT 2008,89921,38.6626,-121.3278
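
The save-and-reload steps can be tried without any download. The sketch below uses a hypothetical two-row table (values borrowed from the output above) and a temporary folder, so it does not touch your `data` folder.

```r
library(readr)

# Hypothetical toy table, for illustration only
toy = data.frame(
  street = c("3526 HIGH ST", "51 OMAHA CT"),
  price = c(59222, 68212)
)

# Save it to a csv file in a temporary folder
out_file = file.path(tempdir(), "toy_transactions.csv")
write_csv(toy, out_file)

# Load it again and check that nothing was lost
reloaded = read_csv(out_file)
head(reloaded)
```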


## 4. Get raw files from GitHub

Go to the `data` folder on GitHub, where I uploaded all past datasets: https://github.com/laurafontanesi/r-seminar/tree/master/data.

Click on `tdcs.csv`.

To be able to load these data in R, we first need to get to the raw data.

You can get them by clicking on `View raw`. **Note** that for some files, instead of getting to the raw data page, you can directly download them to a local directory. From there, you can simply load them in R using the appropriate `read_` function.

Copy the address of the page containing the raw data. It should be https://raw.githubusercontent.com/laurafontanesi/r-seminar/main/data/tdcs.csv

You can now use this URL with one of our `read_` functions:

In [21]:
data_tdcs = read_csv("https://raw.githubusercontent.com/laurafontanesi/r-seminar/main/data/tdcs.csv")

head(data_tdcs)


── Column specification ────────────────────────────────────────────────────────
cols(
  RT = col_double(),
  acc_spd = col_character(),
  accuracy = col_double(),
  angle = col_double(),
  block = col_double(),
  coherence = col_double(),
  dataset = col_character(),
  id = col_character(),
  left_right = col_double(),
  subj_idx = col_double(),
  tdcs = col_character(),
  trial_NR = col_double()
)




RT,acc_spd,accuracy,angle,block,coherence,dataset,id,left_right,subj_idx,tdcs,trial_NR
<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<chr>,<dbl>
799,spd,1,180,1,0.4171,berkeley,S1.1,2,1,sham,1
613,spd,1,180,1,0.4171,berkeley,S1.1,2,1,sham,2
627,spd,1,180,1,0.4171,berkeley,S1.1,1,1,sham,3
1280,acc,0,180,1,0.4171,berkeley,S1.1,1,1,sham,4
800,spd,1,180,1,0.4171,berkeley,S1.1,2,1,sham,5
760,acc,1,180,1,0.4171,berkeley,S1.1,2,1,sham,6
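
Raw GitHub addresses follow a regular pattern: user, repository, branch, and then the path of the file inside the repository. As a small sketch, the address can also be assembled in code; the pieces below are taken from the URL in the `read_csv()` call above, and the variable names are only illustrative.

```r
# Pieces of the address, taken from the URL used in the read_csv() call above
user         = "laurafontanesi"
repo         = "r-seminar"
branch       = "main"
file_in_repo = "data/tdcs.csv"

# Glue the pieces together into the raw-data address
raw_url = paste0(
  "https://raw.githubusercontent.com/",
  user, "/", repo, "/", branch, "/", file_in_repo
)
print(raw_url)
```

You can then pass `raw_url` to `read_csv()`, and change only `file_in_repo` to point to another file in the same folder.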


## 5. Now it's your turn

**Task A**

From our GitHub page, get the data sets in the list below. Load them into R, assigning them to the respective names: `qualtrics_data`, `data_f`, `data_g`, `data_h`. Inspect them using `head()` or `glimpse()`. Finally, save them as `csv` files to your local `data` directory (which you should have as a sub-directory of your R course directory).

1. 20180321_qualtrics_managers_historical_social_comparisons.dta
2. data_to_import_f.csv
3. data_to_import_g.csv
4. data_to_import_h.csv


**Task B**

Go to this website: https://www.britishelectionstudy.com/data-objects/cross-sectional-data/ (you can register for free).

Download the 2017 Face-to-face Post-election Survey Version 1.5 **SPSS file** into your local `data` directory (see above). Then, load it into R, assigning it to the name `british_cross_sectional_data`, using the appropriate function for SPSS files, and inspect it using `head()` or `glimpse()`.
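
Both tasks ask you to inspect the data with `head()` or `glimpse()`. As a quick reminder of the difference, here is a minimal sketch on a hypothetical toy table (not one of the task data sets):

```r
library(dplyr)

# Hypothetical toy table, for illustration only
toy = tibble(
  id = c("a", "b", "c"),
  age = c(26, 32, 25)
)

# head() returns the first rows (here the first 2) as a table
first_two = head(toy, 2)
print(first_two)

# glimpse() prints one line per column, with its type and first values
glimpse(toy)
```

`glimpse()` is especially handy for wide data sets, where `head()` cannot show all columns at once.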

## Submit your assignment

Save and email your script to me at [laura.fontanesi@unibas.ch](mailto:laura.fontanesi@unibas.ch) by the end of **Friday**.