# Task 1

## Question 1

This question is about setting up a working environment, as detailed in this unit: https://github.com/xvrdm/unit-test/blob/master/installing_r.md.
 

#### Learner message

Hi,  

I'm trying to install everything on a PC (which is made available from my employer and dedicated to the course).  Unfortunately, I'm having trouble already at the stage of installing the first packages.

Here is what I get :

\> install.packages("cowsay")<br>
Warning in install.packages("cowsay") :<br>
  'lib = "C:/Program Files/R/R-3.5.1/library"' is not writable <br>
Error in install.packages("cowsay") : unable to install packages

I suspect that this is because my company didn't give me all the rights on this PC (if a folder is not writable)? If so, I will ask our IT division for help.

But I wanted to ask you first. I'm sorry that my computer knowledge is rather limited and I'm unable to solve the problem by myself.

Many thanks for your help in advance

## Answer 1

Indeed, it looks like your user account is not allowed to write to the folder where R stores its packages. This is a common admin rights limitation on company PCs, and it probably depends on your installed Windows version as well. There are several ways to work around this problem, and the main ones are described in detail in these links:

https://cran.r-project.org/bin/windows/base/rw-FAQ.html#Does-R-run-under-Windows-Vista_003f <br>
https://cran.r-project.org/bin/windows/base/rw-FAQ.html#I-don_0027t-have-permission-to-write-to-the-R_002d3_002e5_002e1_005clibrary-directory

If I were you, I would try the following options first:
- Run R or RStudio with administrator privileges in sessions where you want to install packages: right-click on the R or RStudio icon and select 'Run as administrator'.
- Transfer ownership of the R installation to the user who installed R: find the top-level R folder, right-click on it, select 'Properties', open the 'Security' tab, and give 'Full Control' to that user.

If this does not solve your problem, you can still install the packages into a different library tree, or create and use a personal library (the links above explain how). These options are a bit trickier, so if you get to this point and need further help, let me know.
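As a rough sketch of the personal-library route, something like the following might work. It assumes that the `R_LIBS_USER` environment variable points to a folder you are allowed to write to (R normally sets it at startup, but the exact path depends on your setup). The install line is commented out so that you can check the folder first.

```r
# Personal library folder. R_LIBS_USER is R's default location for it;
# any other folder you can write to should work as well (assumption).
user_lib <- Sys.getenv("R_LIBS_USER")

# Create the folder if it does not exist yet.
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

# Put it first on the library search path for this session.
.libPaths(c(user_lib, .libPaths()))

# Install into the personal library instead of "C:/Program Files/...".
# install.packages("cowsay", lib = user_lib)
```

Running `.libPaths()` afterwards shows which folders R will search for packages in the current session.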

## Question 2

This question is about reshaping tables with the {tidyr} and {dplyr} packages, as detailed in this unit: https://exts.epfl.ch/pdf/reshaping_tibble.pdf
 

#### Learner message
Hi,

 

Despite the quiz and the explanation, I really don't understand the point of `gather()`. Can you please explain it to me in another way?


## Answer 2

To be honest, I had the same feeling the first time I came across `gather()` back in the day... :-) But before we jump into the explanation, some reminders:
- Real-life data comes in many shapes and flavours, so you might need to do some data cleaning and tidying before you can actually use it further (visualization, analysis, and so on). 
- Your data is considered 'tidy' when you have one observation per row and one feature per column. 
- {tidyr} and {dplyr} are the fundamental packages for data tidying in R, and `gather()` is part of {tidyr}.

Imagine you work as an analyst for an ice-cream company, and you are sent the following table with the actual ice-cream consumption (in tons) per city over the past five years:

In [2]:
library(ggplot2)
library(dplyr)
library(tidyr)

In [4]:
ice_cream

year,Lausanne.consumption,Geneva.consumption,Morges.consumption,Vevey.consumption
2013,4,2,1,2
2014,5,2,1,3
2015,2,1,2,1
2016,9,5,1,2
2017,8,4,3,4
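If you want to follow along, the table above could be recreated as shown below. How `ice_cream` was originally built is not shown, so the `data.frame()` call is only an assumption; the values are copied from the table.

```r
# Rebuild the wide table by hand: one row per year, one column per city.
ice_cream <- data.frame(
  year                 = 2013:2017,
  Lausanne.consumption = c(4, 5, 2, 9, 8),
  Geneva.consumption   = c(2, 2, 1, 5, 4),
  Morges.consumption   = c(1, 1, 2, 1, 3),
  Vevey.consumption    = c(2, 3, 1, 2, 4)
)

ice_cream
```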


At first glance, you may consider that this data layout is good enough for what you want to accomplish (for example, plotting a simple bar chart or line trend). However, on closer inspection, each row is not a single observation but rather a mishmash of observations for different cities, so this dataset is far from tidy.

The first thing to do is to identify the distinct features you need in the tidy dataset: year, city, and consumption. The year is already 'clean', but the other two are mixed together across four different columns. This is where you can use `gather()` to tidy your dataset:

In [16]:
ice_cream %>% gather(city, consumption, -year)

year,city,consumption
2013,Lausanne.consumption,4
2014,Lausanne.consumption,5
2015,Lausanne.consumption,2
2016,Lausanne.consumption,9
2017,Lausanne.consumption,8
2013,Geneva.consumption,2
2014,Geneva.consumption,2
2015,Geneva.consumption,1
2016,Geneva.consumption,5
2017,Geneva.consumption,4


What you are doing here is the following:

- Creating a 'city' column based on the values of the column headers of your initial dataframe
- Creating a 'consumption' column based on the values of the actual consumption 
- Selecting which group of columns you want to do the `gather()` operation on (in our example, all the columns apart from the year).
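To make these three steps more concrete, here is a small sketch that does the same stacking by hand in base R. The two-city, two-year `small` table is a made-up subset of the data above, and this is only an illustration of the idea, not how `gather()` is implemented.

```r
# A tiny wide table: two years, two city columns (subset of the example).
small <- data.frame(
  year                 = c(2013, 2014),
  Lausanne.consumption = c(4, 5),
  Geneva.consumption   = c(2, 2)
)

# For each city column: keep the year, store the column header in 'city'
# and the column values in 'consumption', then stack the pieces.
by_hand <- rbind(
  data.frame(year = small$year,
             city = "Lausanne.consumption",
             consumption = small$Lausanne.consumption,
             stringsAsFactors = FALSE),
  data.frame(year = small$year,
             city = "Geneva.consumption",
             consumption = small$Geneva.consumption,
             stringsAsFactors = FALSE)
)

by_hand
```

With {tidyr} loaded, `gather(small, city, consumption, -year)` should give you the same four rows without having to write one block per city.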

There you go, your dataset is already tidy :-) It would be nicer, though, to have plain city names in the city column, so you can split that column in two and drop the second column resulting from this operation:

In [14]:
ice_cream %>% gather(city, consumption, -year) %>% separate(city, c("city", "type")) %>% select(-type)

year,city,consumption
2013,Lausanne,4
2014,Lausanne,5
2015,Lausanne,2
2016,Lausanne,9
2017,Lausanne,8
2013,Geneva,2
2014,Geneva,2
2015,Geneva,1
2016,Geneva,5
2017,Geneva,4
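You may wonder why `separate()` knows where to split without being told. By default it splits on any run of non-alphanumeric characters, which here is the dot. The sketch below mimics that split in base R on two of the headers; the `headers` vector is just an illustration.

```r
# Two of the values found in the 'city' column after gather().
headers <- c("Lausanne.consumption", "Geneva.consumption")

# Split on runs of non-alphanumeric characters (here: the dot).
parts <- strsplit(headers, "[^[:alnum:]]+")

# Keep the first piece of each header: the city name.
city <- vapply(parts, `[`, character(1), 1)

city
# "Lausanne" "Geneva"
```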


To test your understanding, here is a challenge. Let's assume that you are given the actual and forecasted consumption of ice-cream in the following form:

In [21]:
ice_cream_for

year,Lausanne.consumption,Lausanne.forecast,Geneva.consumption,Geneva.forecast,Morges.consumption,Morges.forecast,Vevey.consumption,Vevey.forecast
2013,4,2,2,3,1,2,2,1
2014,5,10,2,2,1,3,3,5
2015,2,14,1,5,2,5,1,4
2016,9,3,5,2,1,7,2,8
2017,8,7,4,7,3,2,4,10


How would you tidy it up? (Hint: you might need other {tidyr} and/or {dplyr} functions in addition to `gather()`!) Your final, tidy dataframe should look like this:

In [22]:
ice_cream_for %>% gather(`City-Consumption`, Consumption, -year) %>% separate(`City-Consumption`, c("city", "Transaction")) %>% spread(Transaction, Consumption)

year,city,consumption,forecast
2013,Geneva,2,3
2013,Lausanne,4,2
2013,Morges,1,2
2013,Vevey,2,1
2014,Geneva,2,2
2014,Lausanne,5,10
2014,Morges,1,3
2014,Vevey,3,5
2015,Geneva,1,5
2015,Lausanne,2,14
