In [None]:
options(jupyter.rich_display = FALSE)

# LIST NEIGHBOURS

We have an 81x81 matrix of straight-line (as-the-crow-flies) distances, in km, between the 81 province centers in Turkey.

To retrieve this matrix, follow the link below to download the file distance2.RData and save it to the default location:

[distance2.RData](../file/distance2.RData)

After you download the file to your local computer, load the data and convert it to a list as follows:

```R
load("~/Downloads/distance2.RData")
diag(distance2) <- Inf
distancel <- split(distance2, row(distance2))
names(distancel) <- rownames(distance2)
distancel2 <- lapply(distancel, function(x) {names(x) <- rownames(distance2); x})

```

And check the new object:
```R
> length(distancel2)
[1] 81

> str(distancel2[1:3])

List of 3
 $ adana         : Named num [1:81] Inf 274 464 738 410 ...
  ..- attr(*, "names")= chr [1:81] "adana" "adiyaman" "afyonkarahisar" "agri" ...
 $ adiyaman      : Named num [1:81] 274 Inf 684 468 384 ...
  ..- attr(*, "names")= chr [1:81] "adana" "adiyaman" "afyonkarahisar" "agri" ...
 $ afyonkarahisar: Named num [1:81] 464 684 Inf 1082 500 ...
  ..- attr(*, "names")= chr [1:81] "adana" "adiyaman" "afyonkarahisar" "agri" ...   
```
As you can see, each row of the **distance2** matrix becomes a named vector item in the **distancel2** list.
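
To see the conversion on a smaller scale, the sketch below applies the same steps to a made-up 3x3 matrix. The names `m` and `ml` and the distance values are hypothetical and are not part of the exercise data.

```R
# Hypothetical 3x3 distance matrix, used only for illustration
m <- matrix(c( 0, 10, 20,
              10,  0, 30,
              20, 30,  0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
diag(m) <- Inf   # a city should not count as its own neighbour

# row(m) labels every cell with its row index, so split() groups the cells by row
ml <- split(m, row(m))
names(ml) <- rownames(m)

# split() drops the element names, so restore them for each vector
ml <- lapply(ml, function(x) {names(x) <- rownames(m); x})
str(ml)
```

Setting the diagonal to `Inf` before splitting means that a later comparison such as `x <= radius` is never true for a city's distance to itself.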

Please write a function **getc** that takes three arguments:
- **radius** : a numeric value
- **minc** : a numeric value
- **listx** : a list object

The function:
- should use **sapply()** over **listx** to count, for each item, the neighbours within the **radius** (distance less than or equal to **radius**), and assign the resulting vector to an object
- should then filter that vector for the values greater than or equal to **minc**, as follows:

```R
> getc(radius = 100, minc = 5, listx = distancel2)

bilecik  sakarya  bayburt   yalova osmaniye 
       6        5        5        5        5 
```

So these are the cities that have at least 5 neighbours within a 100 km radius.

```R
> getc(radius = 150, minc = 9, listx = distancel2)

   bingol eskisehir   sakarya 
        9         9        10
```

And these are the cities that have at least 9 neighbours within a 150 km radius.


**Hints:**
- Note that each item of **listx** is a vector
- Inside the function, first call **sapply()** on **listx**, passing an anonymous function written in the function(x) ... notation as the second argument. Assign the vector output to an object.
- Then filter that vector object
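
The two steps in the hints can be tried out on a small made-up list first. This is only a sketch: the list `toy` and the thresholds 20 and 2 are hypothetical values chosen for illustration.

```R
# Hypothetical toy list: each item is a named numeric vector
toy <- list(a = c(a = Inf, b = 10,  c = 20),
            b = c(a = 10,  b = Inf, c = 30),
            c = c(a = 20,  b = 30,  c = Inf))

# sapply() applies the anonymous function to each item and returns a named vector;
# sum() of a logical vector counts its TRUE values
counts <- sapply(toy, function(x) sum(x <= 20))
counts

# Logical indexing keeps only the items that meet the threshold
counts[counts >= 2]
```

Because the result of **sapply()** keeps the names of the list, the filtered vector still shows which items passed the threshold.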

In [None]:
load("~/Downloads/distance2.RData")
diag(distance2) <- Inf
distancel <- split(distance2, row(distance2))
names(distancel) <- rownames(distance2)
distancel2 <- lapply(distancel, function(x) {names(x) <- rownames(distance2); x})
length(distancel2)
str(distancel2[1:3])

In [None]:
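getc <- function(radius, minc, listx)
{
    # count, for each item, the distances that fall within the radius
    counts <- sapply(listx, function(x) sum(x <= radius))
    # keep only the items with at least minc neighbours
    counts[counts >= minc]
}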
                     
getc(radius = 100, minc = 5, listx = distancel2)
getc(radius = 150, minc = 9, listx = distancel2)

# GAINS BY WEEKDAY

Please download the following file:

[flights14.csv](~/Downloads/flights14.csv)

And run the following code:

```R
flights <- read.csv("~/Downloads/flights14.csv")
flights$dates <- with(flights, as.Date(paste(year, month, day, sep = "-")))
flights$weekdays <- weekdays(flights$dates, abbreviate = TRUE)
```
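
Note that `weekdays()` returns day names in the current locale, so labels such as `Wed` assume an English locale. The sketch below, using a hypothetical date vector `d`, shows the same date construction on three values together with `format(d, "%u")`, which gives a locale-independent weekday number instead.

```R
# Hypothetical year/month/day values, for illustration only
d <- as.Date(paste(2014, 1, c(1, 2, 3), sep = "-"))

weekdays(d, abbreviate = TRUE)   # day names, depend on the locale
format(d, "%u")                  # weekday number as text, 1 = Monday
```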

```R
> head(flights, 5)

  year month day dep_time dep_delay arr_time arr_delay cancelled carrier
1 2014 1     1    914     14        1238      13       0         AA     
2 2014 1     1   1157     -3        1523      13       0         AA     
3 2014 1     1   1902      2        2224       9       0         AA     
4 2014 1     1    722     -8        1014     -26       0         AA     
5 2014 1     1   1347      2        1706       1       0         AA     
  tailnum flight origin dest air_time distance hour min dates      weekdays
1 N338AA    1    JFK    LAX  359      2475      9   14  2014-01-01 Wed     
2 N335AA    3    JFK    LAX  363      2475     11   57  2014-01-01 Wed     
3 N327AA   21    JFK    LAX  351      2475     19    2  2014-01-01 Wed     
4 N3EHAA   29    LGA    PBI  157      1035      7   22  2014-01-01 Wed     
5 N319AA  117    JFK    LAX  350      2475     13   47  2014-01-01 Wed     
```

You should write a function named **gainw** that takes two arguments:

- **mon** : A numeric value for month
- **df** : A data frame object

The function should:
- first filter the rows where the **month** column is equal to **mon** and assign the result to a new object
- in the new object, create a column named **gain**, which is the difference between the **dep_delay** and **arr_delay** columns (**dep_delay** minus **arr_delay**)
- for each unique value in the **weekdays** column, calculate the **mean** value of the **gain** column, as follows:

```R
> gainw(mon = 1, df = flights)

  Group.1 x        
1 Fri      3.198806
2 Mon      3.364228
3 Sat      2.205575
4 Sun      5.868331
5 Thu     -1.458835
6 Tue      1.061756
7 Wed      2.071683

> gainw(mon = 2, df = flights)

  Group.1 x         
1 Fri      0.5329193
2 Mon     -4.8831820
3 Sat      1.6815501
4 Sun      2.2282392
5 Thu     -0.9030837
6 Tue      2.3986361
7 Wed      2.0422684

```

**Hints:**
- Inside the function, first filter the rows of **df** and assign the result to a new object
- Calculate the difference and assign it to the new **gain** column
- Then use the **aggregate** and **mean** functions
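
The filter, new column, and aggregate steps can be sketched on a small made-up data frame. The data frame `toy` and its columns `grp`, `dep`, and `arr` are hypothetical and stand in for the flights data.

```R
# Hypothetical toy data frame, for illustration only
toy <- data.frame(month = c(1, 1, 1, 2),
                  grp   = c("x", "y", "x", "x"),
                  dep   = c(10, 5, 0, 7),
                  arr   = c(4, 5, -2, 9))

sub <- toy[toy$month == 1, ]      # keep the rows for one month
sub$diff <- sub$dep - sub$arr     # new column computed from two existing ones

# One mean per group; the result columns are named Group.1 and x by default
res <- aggregate(sub$diff, by = list(sub$grp), FUN = mean)
res
```

The default column names `Group.1` and `x` appear because the grouping list and the value vector are passed without names.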

In [None]:
flights <- read.csv("~/Downloads/flights14.csv")
flights$dates <- with(flights, as.Date(paste(year, month, day, sep = "-")))
flights$weekdays <- weekdays(flights$dates, abbreviate = TRUE)
head(flights, 5)

In [None]:
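gainw <- function(mon, df)
{
    # keep only the flights of the requested month
    df2 <- df[df$month == mon,]
    # gain: departure delay minus arrival delay
    df2$gain <- df2$dep_delay - df2$arr_delay
    # mean gain for each weekday
    aggregate(df2$gain, by = list(df2$weekdays), mean)
}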

gainw(mon = 1, df = flights)
gainw(mon = 2, df = flights)

# MERGE RIGHT

Please create two random data frames, **heights** and **weights**, as follows:


```R
vars <- expand.grid(letters[1:3], LETTERS[1:3])

RNGversion("3.3.1")
set.seed(20)
heights <- data.frame(vars[sample(1:9, 5),], height = round(rnorm(5, 160, 15), 2))
weights <- data.frame(vars[sample(1:9, 5),], weight = round(rnorm(5, 60, 10), 2))
names(weights) <- c("V1", "V2", "weight")
```

```R
> heights

  Var1 Var2 height
8 b    C    190.92
7 a    C    137.95
2 b    A    155.03
4 a    B    170.49
5 b    B    169.78

> weights

  V1 V2 weight
5 b  B  59.80 
3 c  A  58.50 
1 a  A  53.72 
2 b  A  73.23 
9 c  C  44.79 
```

Write a function **mright** that takes two arguments, **df1** and **df2**, both data frames.

The function should **RIGHT** join the two data frames, keeping every row of **df2**, so that:
- the **Var1** and **Var2** columns in **df1** match the **V1** and **V2** columns in **df2**, respectively

as follows:

```R
> mright(df1 = heights, df2 = weights)

  Var1 Var2 height weight
1 a    A        NA 53.72 
2 b    A    155.03 73.23 
3 b    B    169.78 59.80 
4 c    A        NA 58.50 
5 c    C        NA 44.79  
```

**Hint:** You should use the **merge** function
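
As a minimal sketch of how **merge** handles key columns with different names, the example below joins two made-up data frames. The names `left`, `right`, `id`, and `key` are hypothetical and chosen only for illustration.

```R
# Hypothetical toy data frames with differently named key columns
left  <- data.frame(id  = c(1, 2, 3), h = c(150, 160, 170))
right <- data.frame(key = c(2, 3, 4), w = c(55, 65, 75))

# all.y = TRUE keeps every row of the second data frame (right join);
# unmatched rows get NA in the columns that come from the first one
res <- merge(left, right, by.x = "id", by.y = "key", all.y = TRUE)
res
```

The key column in the result takes its name from the first data frame, and the row for `key = 4` has `NA` in `h` because `left` has no matching `id`.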

In [None]:
vars <- expand.grid(letters[1:3], LETTERS[1:3])

RNGversion("3.3.1")
set.seed(20)
heights <- data.frame(vars[sample(1:9, 5),], height = round(rnorm(5, 160, 15), 2))
weights <- data.frame(vars[sample(1:9, 5),], weight = round(rnorm(5, 60, 10), 2))
names(weights) <- c("V1", "V2", "weight")
heights
weights

In [None]:
mright <- function(df1, df2)
{
    # all.y = TRUE keeps every row of df2 (right join)
    merge(df1, df2,
          by.x = c("Var1", "Var2"),
          by.y = c("V1", "V2"),
          all.y = TRUE)
}

mright(df1 = heights, df2 = weights)