# Merging Datasets

### Load required Tidyverse packages (plus knitr for table output)

In [1]:
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



### Using "World Cup" Dataset from "faraway" package
- Read from a locally saved csv file
- Move player_name to the first column and rename it to Player_Name

In [2]:
worldcup <- read_csv("Data/worldcup.csv")
worldcup <- worldcup %>%
  select(player_name, everything()) %>%
  rename(Player_Name = player_name)

head(worldcup)

Parsed with column specification:
cols(
  Team = col_character(),
  Position = col_character(),
  Time = col_integer(),
  Shots = col_integer(),
  Passes = col_integer(),
  Tackles = col_integer(),
  Saves = col_integer(),
  player_name = col_character()
)


Player_Name,Team,Position,Time,Shots,Passes,Tackles,Saves
Abdoun,Algeria,Midfielder,16,0,6,0,0
Abe,Japan,Midfielder,351,0,101,14,0
Abidal,France,Defender,180,0,91,6,0
Abou Diaby,France,Midfielder,270,1,111,5,0
Aboubakar,Cameroon,Forward,46,2,16,0,0
Abreu,Uruguay,Forward,72,0,15,0,0


### Merging "Team Standings" Dataset to "World Cup" Dataset
- Read in "Team Standings" csv file
- Provide column types explicitly with col_types="ic": one letter per column, in file order (i = integer, c = character)

In [3]:
team_standings <- read_csv("Data/team_standings.csv", col_types="ic")
sample_n(team_standings, 10)

Standing,Team
4,Uruguay
25,Greece
23,Serbia
14,Mexico
26,Italy
1,Spain
22,New Zealand
12,USA
2,Netherlands
7,Ghana
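
The compact string passed to col_types is shorthand for a full cols() specification. A minimal sketch, assuming a small hypothetical two-row file written to a temporary path (not the real Data/team_standings.csv), suggests the two forms should produce the same column types:

```r
library(readr)

# Hypothetical stand-in for team_standings.csv, written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Standing,Team", "1,Spain", "2,Netherlands"), tmp)

# Compact form: one letter per column, in file order (i = integer, c = character)
compact <- read_csv(tmp, col_types = "ic")

# Explicit form: each column named with its parser
explicit <- read_csv(
  tmp,
  col_types = cols(Standing = col_integer(), Team = col_character())
)

# Compare the resulting column classes of the two data frames
col_classes_match <- identical(sapply(compact, class), sapply(explicit, class))
col_classes_match
```

The explicit form is longer but does not depend on column order, which may be preferable if the file layout could change.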


## Using *_join() family of functions:
- left_join(): Includes all observations in the left data frame, whether or not there is a match in the right data frame; unmatched rows get NA in the columns coming from the right
- right_join(): Includes all observations in the right data frame, whether or not there is a match in the left data frame; unmatched rows get NA in the columns coming from the left
- inner_join(): Includes only observations whose key value appears in both data frames
- full_join(): Includes all observations from both data frames, filling NA wherever there is no match
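
The difference between the four joins is easiest to see in the row counts. The following is a rough sketch on two small made-up data frames (players and standings are hypothetical, and "Atlantis" is an invented team with no standing); it is not the notebook's data:

```r
library(dplyr)

# Hypothetical left table: one player belongs to a team with no standing
players <- data.frame(
  Player = c("A", "B", "C"),
  Team = c("Spain", "Ghana", "Atlantis"),
  stringsAsFactors = FALSE
)

# Hypothetical right table: one team (Uruguay) has no player on the left
standings <- data.frame(
  Team = c("Spain", "Ghana", "Uruguay"),
  Standing = c(1L, 7L, 4L),
  stringsAsFactors = FALSE
)

# Number of rows returned by each join on the shared "Team" key
join_rows <- c(
  left  = nrow(left_join(players, standings, by = "Team")),
  right = nrow(right_join(players, standings, by = "Team")),
  inner = nrow(inner_join(players, standings, by = "Team")),
  full  = nrow(full_join(players, standings, by = "Team"))
)
join_rows
```

Here the left and right joins each keep 3 rows, the inner join keeps the 2 matching rows, and the full join keeps 4.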

### left_join()
- left_join(worldcup, team_standings, by="Team")
- Keeps every row of the "worldcup" dataset, whether or not the player's team is listed in team_standings; players whose team has no match get NA for Standing

In [4]:
left_join(worldcup, team_standings, by="Team") %>% 
  slice(1:10)

Player_Name,Team,Position,Time,Shots,Passes,Tackles,Saves,Standing
Abdoun,Algeria,Midfielder,16,0,6,0,0,28
Abe,Japan,Midfielder,351,0,101,14,0,9
Abidal,France,Defender,180,0,91,6,0,29
Abou Diaby,France,Midfielder,270,1,111,5,0,29
Aboubakar,Cameroon,Forward,46,2,16,0,0,31
Abreu,Uruguay,Forward,72,0,15,0,0,4
Addy,Ghana,Defender,138,0,51,2,0,7
Adiyiah,Ghana,Forward,33,0,9,0,0,7
Afellay,Netherlands,Midfielder,21,5,22,0,0,2
Afolabi,Nigeria,Defender,103,0,38,1,0,27
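
Because a left join keeps unmatched rows silently, it can be worth checking whether any key failed to match. One possible check, sketched on made-up data (players, standings, and the team "Atlantis" are hypothetical), is to look for NA in the joined column or to use anti_join() from dplyr:

```r
library(dplyr)

# Hypothetical data: "Atlantis" has no entry in the standings table
players <- data.frame(
  Player_Name = c("P1", "P2", "P3"),
  Team = c("Spain", "Ghana", "Atlantis"),
  stringsAsFactors = FALSE
)
standings <- data.frame(
  Standing = c(1L, 7L),
  Team = c("Spain", "Ghana"),
  stringsAsFactors = FALSE
)

# Option 1: join first, then keep rows where the joined column is NA
joined <- left_join(players, standings, by = "Team")
unmatched <- filter(joined, is.na(Standing))

# Option 2: anti_join() returns the left rows that have no match on the right
missing_teams <- anti_join(players, standings, by = "Team")

unmatched
missing_teams
```

Applied to the notebook's data, the same idea would be anti_join(worldcup, team_standings, by="Team"); an empty result would mean every player's team has a standing.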


## Example:
- Create a table of the top 10 players by Shots, along with the final Standing of each player's team
- From the "worldcup" Dataset, select the Player_Name, Position, Shots, and Team columns
- Arrange by Shots in descending order, using the arrange() function
- Pipe the result into left_join() to join it with Team Standings
- Use kable() from the "knitr" package to display the table

In [6]:
worldcup %>%
  select(Player_Name, Position, Shots, Team) %>%
  arrange(desc(Shots)) %>%
  head()

Player_Name,Position,Shots,Team
Gyan,Forward,27,Ghana
Villa,Forward,22,Spain
Messi,Forward,21,Argentina
Suarez,Forward,19,Uruguay
Forlan,Forward,18,Uruguay
Ronaldo,Forward,18,Portugal


In [7]:
worldcup %>%
  select(Player_Name, Position, Shots, Team) %>%
  arrange(desc(Shots)) %>%
  left_join(team_standings, by="Team") %>%
  rename(Team_Standing = Standing) %>%
  slice(1:10) %>%
  kable()



|Player_Name    |Position   | Shots|Team        | Team_Standing|
|:--------------|:----------|-----:|:-----------|-------------:|
|Gyan           |Forward    |    27|Ghana       |             7|
|Villa          |Forward    |    22|Spain       |             1|
|Messi          |Forward    |    21|Argentina   |             5|
|Suarez         |Forward    |    19|Uruguay     |             4|
|Forlan         |Forward    |    18|Uruguay     |             4|
|Ronaldo        |Forward    |    18|Portugal    |            11|
|Podolski       |Forward    |    17|Germany     |             3|
|Dempsey        |Midfielder |    15|USA         |            12|
|Sneijder       |Midfielder |    15|Netherlands |             2|
|Park Chu-Young |Forward    |    14|South Korea |            15|
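
The pipeline above sorts before joining and slices afterwards, which relies on left_join() keeping the rows of the left table in their existing order. A small sketch on made-up data (players, standings, and the values in them are hypothetical) illustrates that assumption:

```r
library(dplyr)

# Hypothetical player table, deliberately not sorted by Shots
players <- data.frame(
  Player_Name = c("P1", "P2", "P3", "P4"),
  Shots = c(3L, 9L, 5L, 7L),
  Team = c("Ghana", "Spain", "Ghana", "Uruguay"),
  stringsAsFactors = FALSE
)

# Hypothetical standings table
standings <- data.frame(
  Standing = c(1L, 4L, 7L),
  Team = c("Spain", "Uruguay", "Ghana"),
  stringsAsFactors = FALSE
)

# Sort, join, rename, then keep the first two rows
top_two <- players %>%
  arrange(desc(Shots)) %>%
  left_join(standings, by = "Team") %>%
  rename(Team_Standing = Standing) %>%
  slice(1:2)

top_two
```

The two rows kept are the two highest Shots values, each carrying its team's standing. Note that slice(1:10) in the real pipeline cuts at exactly ten rows, so players tied with the tenth row on Shots are dropped without warning.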