# Joining Data
---

Incomplete data sets, whether they are missing individual values or entire rows and columns, are a common problem in data analysis. Often the missing pieces live in a separate table, and luckily for us, pandas has several tools for joining such tables together, including concatenation and merging.

In [1]:
```python
import numpy as np
import pandas as pd

match1 = pd.DataFrame({
  'Opponent': ['Selche FC'],
  'GoalsFor': [1],
  'GoalsAgainst': [1],
  'Attendance': [53225]
})

match2 = pd.DataFrame({
  'Opponent': ['Sudaton FC'],
  'GoalsFor': [3],
  'GoalsAgainst': [0],
  'Attendance': [53256]
})

match3 = pd.DataFrame({
  'Opponent': ['Ouestjambon United'],
  'GoalsFor': [4],
  'GoalsAgainst': [1],
  'Attendance': [53225]
})

match3
```

| | Opponent | GoalsFor | GoalsAgainst | Attendance |
|---|---|---|---|---|
| 0 | Ouestjambon United | 4 | 1 | 53225 |


## Concatenation

The simplest way to join data is to concatenate it with the `pd.concat()` function, which by default stacks DataFrames top-to-bottom and lines up their columns by name.

In [2]:
```python
AllMatches = pd.concat([match1, match2, match3])
AllMatches
```

| | Opponent | GoalsFor | GoalsAgainst | Attendance |
|---|---|---|---|---|
| 0 | Selche FC | 1 | 1 | 53225 |
| 0 | Sudaton FC | 3 | 0 | 53256 |
| 0 | Ouestjambon United | 4 | 1 | 53225 |
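Notice that every row of the concatenated result keeps the index label `0` it had in its original one-row DataFrame. If you would rather have a fresh 0, 1, 2 index, `pd.concat()` accepts `ignore_index=True`. The sketch below rebuilds the same three matches in compact form and compares the two behaviours; the names `stacked` and `renumbered` are just for illustration.

```python
import pandas as pd

# The same three single-row match DataFrames as above
match1 = pd.DataFrame({'Opponent': ['Selche FC'], 'GoalsFor': [1], 'GoalsAgainst': [1], 'Attendance': [53225]})
match2 = pd.DataFrame({'Opponent': ['Sudaton FC'], 'GoalsFor': [3], 'GoalsAgainst': [0], 'Attendance': [53256]})
match3 = pd.DataFrame({'Opponent': ['Ouestjambon United'], 'GoalsFor': [4], 'GoalsAgainst': [1], 'Attendance': [53225]})

# Default behaviour: each row keeps its original index label
stacked = pd.concat([match1, match2, match3])
print(stacked.index.tolist())     # [0, 0, 0]

# ignore_index=True throws away the old labels and builds a new 0..n-1 index
renumbered = pd.concat([match1, match2, match3], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

Renumbering matters once you start selecting rows by label: with the default index, `stacked.loc[0]` returns all three rows rather than one.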


## Merging

`pd.merge()` lets us stick data together left-to-right, matching up rows that share a value in one or more key columns. First, let's create some more details for the matches above that we can then merge.

In [4]:
```python
match1scorers = pd.DataFrame({
  'First': ['Sally'],
  'Last': ['Billy'],
  'Opponent': ['Selche FC']
})

match2scorers = pd.DataFrame({
  'First': ['Sally'],
  'Last': ['Pip'],
  'Opponent': ['Sudaton FC']
})

match3scorers = pd.DataFrame({
  'First': ['Sally'],
  'Last': ['Sally'],
  'Opponent': ['Ouestjambon United']
})

AllScorers = pd.concat([match1scorers, match2scorers, match3scorers])
AllScorers
```

| | First | Last | Opponent |
|---|---|---|---|
| 0 | Sally | Billy | Selche FC |
| 0 | Sally | Pip | Sudaton FC |
| 0 | Sally | Sally | Ouestjambon United |


In [5]:
```python
pd.merge(AllMatches, AllScorers, how='inner', on='Opponent')
```

| | Opponent | GoalsFor | GoalsAgainst | Attendance | First | Last |
|---|---|---|---|---|---|---|
| 0 | Selche FC | 1 | 1 | 53225 | Sally | Billy |
| 1 | Sudaton FC | 3 | 0 | 53256 | Sally | Pip |
| 2 | Ouestjambon United | 4 | 1 | 53225 | Sally | Sally |


This works like a join in SQL. Here I did an inner join on the `Opponent` column, which keeps only the rows whose `Opponent` value appears in both DataFrames.
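Because `pd.merge()` follows SQL join semantics, its `how` parameter also accepts `'left'`, `'right'` and `'outer'`. The difference only shows up when some keys have no partner, so this sketch adds a made-up fourth match against a hypothetical opponent, `'Example Rovers'` (its scores are invented for illustration), with no matching scorer row. Passing `indicator=True` adds a `_merge` column recording whether each row matched in `'both'` frames or only one side.

```python
import pandas as pd

# Match results; 'Example Rovers' is a hypothetical opponent with invented scores
matches = pd.DataFrame({
    'Opponent': ['Selche FC', 'Sudaton FC', 'Ouestjambon United', 'Example Rovers'],
    'GoalsFor': [1, 3, 4, 0],
    'GoalsAgainst': [1, 0, 1, 2],
})

# Scorers exist only for the first three matches
scorers = pd.DataFrame({
    'First': ['Sally', 'Sally', 'Sally'],
    'Last': ['Billy', 'Pip', 'Sally'],
    'Opponent': ['Selche FC', 'Sudaton FC', 'Ouestjambon United'],
})

# Inner join: 'Example Rovers' has no partner, so it is dropped (3 rows)
inner = pd.merge(matches, scorers, how='inner', on='Opponent')

# Left join: every match is kept; the unmatched scorer fields become NaN (4 rows)
left = pd.merge(matches, scorers, how='left', on='Opponent', indicator=True)

print(len(inner), len(left))  # 3 4
print(left[['Opponent', 'First', '_merge']])
```

A left join is a reasonable default when the left table is the one you must not lose rows from; the `_merge` column then makes it easy to spot matches with no recorded scorer.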