In [None]:
options(jupyter.rich_display = FALSE)

# LIST NEIGHBOURS

We have an 81x81 matrix of straight-line (as-the-crow-flies) distances, in km, between the 81 province centers in Turkey.

To get this matrix, follow the link below and save the file distance2.RData to the default download location:

[distance2.RData](../file/distance2.RData)

After you download the file to your local computer, load the data and convert it to a list as follows:

```R
load("~/Downloads/distance2.RData")
diag(distance2) <- Inf
distancel <- split(distance2, row(distance2))
names(distancel) <- rownames(distance2)
distancel2 <- lapply(distancel, function(x) {names(x) <- rownames(distance2); x})

```

And check the new object:
```R
> length(distancel2)
[1] 81

> str(distancel2[1:3])

List of 3
 $ adana         : Named num [1:81] Inf 274 464 738 410 ...
  ..- attr(*, "names")= chr [1:81] "adana" "adiyaman" "afyonkarahisar" "agri" ...
 $ adiyaman      : Named num [1:81] 274 Inf 684 468 384 ...
  ..- attr(*, "names")= chr [1:81] "adana" "adiyaman" "afyonkarahisar" "agri" ...
 $ afyonkarahisar: Named num [1:81] 464 684 Inf 1082 500 ...
  ..- attr(*, "names")= chr [1:81] "adana" "adiyaman" "afyonkarahisar" "agri" ...   
```

As you can see, each row of the **distance2** matrix becomes a named vector item in the **distancel2** list.
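To see why `split(distance2, row(distance2))` yields one vector per row, here is a minimal sketch on a made-up 3x3 matrix. The matrix `m`, its names and its values are assumptions for illustration only and are not part of distance2.

```R
# Toy symmetric matrix standing in for distance2 (hypothetical values)
m <- matrix(c(Inf, 5,   9,
              5,   Inf, 4,
              9,   4,   Inf),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# row(m) labels every cell with its row number, so split() groups the cells by row
rows <- split(m, row(m))

# split() drops the dimnames, so the names are restored by hand
names(rows) <- rownames(m)
rows <- lapply(rows, function(x) {names(x) <- colnames(m); x})

rows$b
```

The toy uses `colnames(m)` for the element names. Because a distance matrix is symmetric, row and column names coincide, which is why `rownames(distance2)` works in the code above.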

Please write a function **getn** that takes three arguments:
- **cities** : a vector of city names or city indices (e.g. index of istanbul is 34)
- **radius** : a numeric value
- **listx** : a list object

The function:
- should subset **listx** with the **cities** vector
- for each item in the list subset, keep only the neighbour cities whose distance is within the **radius** and sort them by distance in ascending order
- return a list of filtered and sorted vectors, as follows:

```R
> getn(cities = c(34, 6), radius = 100, listx = distancel2)

$istanbul
 yalova kocaeli   bursa 
     47      78      91 

$ankara
kirikkale   cankiri 
       56        98 

> getn(cities = c(1, 35), radius = 150, listx = distancel2)

$adana
  mersin osmaniye    hatay    nigde 
      64       82      112      123 

$izmir
manisa  aydin 
    33    136 
```

**Hints:**
- Note that each item in **listx** is a vector
- Inside the function, first subset the list with **cities** and assign the subset to a new list object
- Use **lapply()** on this new object, passing an anonymous function written in the function(x) ... notation as the second argument
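The hints above can be tried out on a small made-up list first. This is only a sketch: the list `toy`, its names and the threshold 40 are hypothetical and unrelated to the distance data.

```R
# Toy list of named numeric vectors (hypothetical values)
toy <- list(p = c(u = 30, v = 10, w = 50),
            q = c(u = 5,  v = 80, w = 20),
            r = c(u = 70, v = 60, w = 15))

# Single brackets keep the result a list; indices and names select the same items
sub <- toy[c(1, 3)]
identical(sub, toy[c("p", "r")])

# Apply an anonymous function to each vector: keep values up to 40, then sort
res <- lapply(sub, function(x) sort(x[x <= 40]))
res
```

Logical subsetting and `sort()` both preserve the element names, so each result still shows which item every value belongs to.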


In [None]:
load("~/Downloads/distance2.RData")
diag(distance2) <- Inf
distancel <- split(distance2, row(distance2))
names(distancel) <- rownames(distance2)
distancel2 <- lapply(distancel, function(x) {names(x) <- rownames(distance2); x})
length(distancel2)
str(distancel2[1:3])

In [None]:
getn <- function(cities, radius, listx)
{
    # Subset the list by city, then keep and sort the distances within the radius
    lapply(listx[cities], function(x) sort(x[x <= radius]))
}
           
getn(cities = c(34, 6), radius = 100, listx = distancel2)
getn(cities = c(1, 35), radius = 150, listx = distancel2)

# DELAYS BY HOUR

Please download the following file:

[flights14.csv](~/Downloads/flights14.csv)

And run the following code:

```R
flights <- read.csv("~/Downloads/flights14.csv")
flights$dates <- with(flights, as.Date(paste(year, month, day, sep = "-")))
flights$weekdays <- weekdays(flights$dates, abbreviate = TRUE)
```

```R
> head(flights, 5)

  year month day dep_time dep_delay arr_time arr_delay cancelled carrier
1 2014 1     1    914     14        1238      13       0         AA     
2 2014 1     1   1157     -3        1523      13       0         AA     
3 2014 1     1   1902      2        2224       9       0         AA     
4 2014 1     1    722     -8        1014     -26       0         AA     
5 2014 1     1   1347      2        1706       1       0         AA     
  tailnum flight origin dest air_time distance hour min dates      weekdays
1 N338AA    1    JFK    LAX  359      2475      9   14  2014-01-01 Wed     
2 N335AA    3    JFK    LAX  363      2475     11   57  2014-01-01 Wed     
3 N327AA   21    JFK    LAX  351      2475     19    2  2014-01-01 Wed     
4 N3EHAA   29    LGA    PBI  157      1035      7   22  2014-01-01 Wed     
5 N319AA  117    JFK    LAX  350      2475     13   47  2014-01-01 Wed     
```

You should write a function named **hourd** that takes two arguments:

- **or** : A character value for origin airport code
- **df** : A data frame object

The function should:
- first keep only the rows where the **origin** column is equal to **or**
- then, for each unique value in the **hour** column, calculate the **median** of the **dep_delay** column, as follows:

```R
> hourd(or = "JFK", df = flights)

   Group.1 x    
1   0       38.0
2   1      151.0
3   2      200.0
4   3      281.0
5   4      331.5
6   5       -4.0
7   6       -4.0
8   7       -3.0
9   8       -3.0
10  9       -2.0
11 10       -2.0
12 11       -3.0
13 12       -3.0
14 13       -1.0
15 14       -2.0
16 15        0.0
17 16       -1.0
18 17        0.0
19 18        0.0
20 19        2.0
21 20        2.0
22 21       10.0
23 22        3.0
24 23       15.0
25 24        1.0

> hourd(or = "EWR", df = flights)

   Group.1 x    
1   0      197.0
2   1      251.0
3   2      360.0
4   4       -5.0
5   5       -4.0
6   6       -3.0
7   7       -3.0
8   8       -3.0
9   9       -1.5
10 10       -3.0
11 11       -2.0
12 12       -2.0
13 13        0.0
14 14        0.0
15 15        4.0
16 16        1.0
17 17        5.0
18 18        5.0
19 19        9.0
20 20        7.0
21 21       16.0
22 22       65.0
23 23      129.0
24 24      241.0
```

**Hints:**
- Inside the function, first filter the rows of **df** and assign the result to a new object
- Then use the **aggregate** and **median** functions
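As an illustration of the filter-then-aggregate pattern, the following sketch uses a small made-up data frame. The column names `grp`, `hr` and `val` are hypothetical stand-ins and do not come from flights14.csv.

```R
# Toy data frame (hypothetical values)
toy <- data.frame(grp = c("x", "x", "y", "y", "y"),
                  hr  = c(1, 1, 1, 2, 2),
                  val = c(4, 10, 7, 3, 9))

# Step 1: keep only the rows of one group
sub <- toy[toy$grp == "y", ]

# Step 2: median of val for each distinct value of hr
out <- aggregate(sub$val, by = list(sub$hr), FUN = median)
out
```

When `aggregate()` is given an unnamed vector and an unnamed `by` list, it labels the result columns `Group.1` and `x`, which matches the headers in the expected output above.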

In [None]:
flights <- read.csv("~/Downloads/flights14.csv")
flights$dates <- with(flights, as.Date(paste(year, month, day, sep = "-")))
flights$weekdays <- weekdays(flights$dates, abbreviate = TRUE)
head(flights, 5)

In [None]:
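hourd <- function(or, df = flights)
{
    # Keep only the flights departing from the given origin
    df2 <- df[df$origin == or,]
    # Median departure delay for each departure hour
    aggregate(df2$dep_delay, by = list(df2$hour), median)
}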

hourd(or = "JFK", df = flights)
hourd(or = "EWR", df = flights)

# MERGE LEFT

Please create two random data frames **heights** and **weights** as follows:


```R
vars <- expand.grid(letters[1:3], LETTERS[1:3])

RNGversion("3.3.1")
set.seed(10)
heights <- data.frame(vars[sample(1:9, 5),], height = round(rnorm(5, 160, 15), 2))
weights <- data.frame(vars[sample(1:9, 5),], weight = round(rnorm(5, 60, 10), 2))
```

```R
> heights

  Var1 Var2 height
5 b    B    148.69
3 c    A    150.91
8 b    C    157.34
9 c    C    162.56
1 a    A    163.64

> weights

  Var1 Var2 weight
4 a    B    71.02 
1 a    A    67.56 
2 b    A    57.62 
3 c    A    69.87 
5 b    B    67.41 
```

Write a function **mleft** that takes two data frames, **df1** and **df2**, as arguments:
- The function should **LEFT** join the two data frames on the **Var1** and **Var2** columns, keeping every row of **df1**, as follows:

```R
> mleft(df1 = heights, df2 = weights)

  Var1 Var2 height weight
1 a    A    163.64 67.56 
2 b    B    148.69 67.41 
3 b    C    157.34    NA 
4 c    A    150.91 69.87 
5 c    C    162.56    NA 
```

**Hint:** You should use the **merge** function
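To show how `merge` behaves with and without `all.x`, here is a short sketch on two made-up data frames. The names `left`, `right`, `k1`, `k2`, `h` and `w` are hypothetical and chosen only for this illustration.

```R
# Two toy data frames sharing the key columns k1 and k2 (hypothetical values)
left  <- data.frame(k1 = c("a", "b", "c"), k2 = c("A", "B", "C"), h = c(1, 2, 3))
right <- data.frame(k1 = c("a", "c", "d"), k2 = c("A", "C", "D"), w = c(10, 30, 40))

# Default merge is an inner join: only keys present in both frames survive
inner <- merge(left, right, by = c("k1", "k2"))

# all.x = TRUE keeps every row of the first frame and fills the gaps with NA
leftj <- merge(left, right, by = c("k1", "k2"), all.x = TRUE)

nrow(inner)
leftj
```

The key `b`/`B` exists only in `left`, so it is dropped by the inner join but kept by the left join with `w` set to `NA`. The key `d`/`D` exists only in `right` and appears in neither result.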

In [None]:
vars <- expand.grid(letters[1:3], LETTERS[1:3])

RNGversion("3.3.1")
set.seed(10)
heights <- data.frame(vars[sample(1:9, 5),], height = round(rnorm(5, 160, 15), 2))
weights <- data.frame(vars[sample(1:9, 5),], weight = round(rnorm(5, 60, 10), 2))
heights
weights

In [None]:
mleft <- function(df1, df2)
{
    merge(df1, df2, by = c("Var1", "Var2"), all.x = TRUE)
}

mleft(df1 = heights, df2 = weights)