# Concat and Join
Sometimes data that belongs together ends up split across two different DataFrames, or scattered across a DataFrame and a Series. Pandas has excellent documentation on how to use ```merge```, ```join```, and ```concat```. I find myself using ```join``` and ```concat``` the most, so I will review both of them here. The link below contains an excellent review of this content. Execute the code that follows it to generate some data for us to play with.

Link: <a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html">Official Pandas Documentation</a>

In [None]:
# provided code
import pandas as pd
from IPython.display import display

# create a dataframe of names
names_df_1 = pd.DataFrame(columns=['First Name', 'Last Name'],
                         data=[['Nick', 'Pollari'],
                               ['Brooke', 'Golob'],
                               ['Donald', 'Trump'],
                               ['Hillary', 'Clinton']]
                         )
display('names_df_1')
display(names_df_1)

# create another dataframe of names
names_df_2 = pd.DataFrame(columns=['First Name', 'Last Name'],
                         data=[['Bernie', 'Sanders'],
                               ['Will', 'Ferrell']
                               ]
                         )
display('names_df_2')
display(names_df_2)

# create a series of Last Names tied to Trustworthiness
trust = pd.Series(index=['Pollari', 'Golob', 'Trump', 'Clinton', 'Sanders', 'Ferrell'],
                 data=[100, 100, 5, 5, 20, 50],
                 name='trust_level')
display('trust')
display(trust)

# create a series of Last Names tied to Net Worth (In $millions)
net_worth = pd.Series(index=['Pollari', 'Golob', 'Trump', 'Clinton', 'Sanders', 'Ferrell', 'Gates'],
                 data=[0, 0, 4000, 20, 1, 100, 100000],
                 name='net_worth')
display('net_worth')
display(net_worth)

## Concat
Concatenation works by passing ```pd.concat()``` a list of DataFrames or a list of Series. (Pandas will also accept a mix of the two, but we keep them separate in this section.) The syntax for ```concat``` is below:
<br><br>
<font size=4px>
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
</font>
<br>
<font size=2px>
Source: <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html">Official Pandas Documentation</a>
</font>
<br><br>
There's a lot going on here, so we are only going to focus on the ```objs``` and ```axis``` arguments. Generally speaking, you don't need to worry about the others: by the time you are considering them, you are usually better off working with ```.join()``` or ```.merge()```.

The ```objs``` in the function refers to the list of DataFrames or Series that you want to work with. By setting ```axis=0``` we are telling pandas to stack each dataset in ```objs``` vertically.
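As a minimal sketch of vertical stacking, the example below uses two small made-up DataFrames (`fruit_df` and `veg_df` are hypothetical names, not part of this notebook's data). It assumes nothing beyond `pd.concat()` with `axis=0`, and it lists the columns in a different order in the second frame to show that rows are lined up by column name rather than by position.

```python
import pandas as pd

# Hypothetical toy data, not the notebook's names DataFrames
fruit_df = pd.DataFrame({'item': ['apple', 'pear'], 'price': [1.0, 2.0]})
# Same two columns, but listed in the opposite order
veg_df = pd.DataFrame({'price': [0.5], 'item': ['carrot']})

# axis=0 stacks the rows of each object on top of each other;
# columns are matched by name, not by position
stacked = pd.concat([fruit_df, veg_df], axis=0)
print(stacked)
```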
<br><br>
#### Exercise
Call ```pd.concat()``` on a list containing ```names_df_1``` and ```names_df_2``` and name the resulting object ```names_df_final```. Then display the results of ```names_df_final``` in the cell.

In [None]:
# your code goes here

The first thing you should notice is that our index now has duplicate values in it: it is now ```[0, 1, 2, 3, 0, 1]```. Pandas has essentially appended ```names_df_2``` to the end of ```names_df_1```, matching values by their column names and keeping each row's original index label. To fix our ```index``` issue we need to call ```.reset_index(drop=True)``` on ```names_df_final```. Execute the code in the cell below.

In [None]:
# provided code
names_df_final.reset_index(drop=True, inplace=True)
names_df_final
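An alternative to resetting the index afterwards is the ```ignore_index``` argument from the signature shown earlier. The sketch below uses two made-up frames (`first` and `second` are hypothetical names) to compare the default behaviour with ```ignore_index=True```. It is one possible approach, not a replacement for the cell above.

```python
import pandas as pd

# Hypothetical toy data
first = pd.DataFrame({'name': ['a', 'b']})
second = pd.DataFrame({'name': ['c']})

# Default: each row keeps its original index label, so 0 appears twice
kept = pd.concat([first, second])

# ignore_index=True discards the old labels and numbers the rows from 0
renumbered = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2]
```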

So now we have established that calling ```pd.concat()``` with ```axis=0``` stacks things vertically. Let's try creating a DataFrame from the two Series ```trust``` and ```net_worth```. Since we basically have two columns and we want to concatenate them horizontally (**and along their index, this is very important to understand**), we just need to set ```axis=1```.
<br><br>
#### Exercise
Using ```pd.concat()```, concatenate ```trust``` and ```net_worth``` with ```axis=1```. Name the resulting object ```trust_and_money_df```.

In [None]:
# your code goes here

Notice that ```Gates``` doesn't have a ```trust_level``` and so Pandas automatically puts a ```NaN``` in the corresponding cell for him.
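The same index alignment can be seen on a smaller scale. This sketch uses two made-up Series (`height` and `weight` are hypothetical) whose index labels are in different orders and only partly overlap. With ```axis=1```, values are placed by index label, and any label missing from one Series gets a ```NaN``` in that column.

```python
import pandas as pd

# Hypothetical toy Series; note the different label order and the extra 'cat'
height = pd.Series([180, 165], index=['ann', 'bob'], name='height_cm')
weight = pd.Series([60, 75, 80], index=['bob', 'ann', 'cat'], name='weight_kg')

# axis=1 places each Series in its own column, aligned on the index labels
side_by_side = pd.concat([height, weight], axis=1)
print(side_by_side)
```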
<br><br>
## Join
Now we are going to look at taking a DataFrame and a Series and combining them using the ```.join()``` method of the DataFrame.
<br><br>
<font size=4px>
**DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)**
</font>
<br>
<font size=2px>
Source: <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html">Official Pandas Documentation</a>
</font>
<br><br>
The important thing to understand is that we are taking the ```index``` of the Series (in this case, the argument ```other```) and matching it either to the ```index``` of our DataFrame (if ```on=None```) or to the values in one or more columns of our DataFrame (if ```on=[column_list]```).
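The two matching modes described above can be sketched with made-up data (`people`, `age`, and `budget` are hypothetical names and values). The first call leaves ```on``` at its default, so the Series index is matched to the DataFrame index. The second passes a column name, so the Series index is matched to the values in that column.

```python
import pandas as pd

# Hypothetical toy data
people = pd.DataFrame({'team': ['red', 'blue', 'red']},
                      index=['ann', 'bob', 'cat'])
age = pd.Series([31, 45], index=['ann', 'cat'], name='age')
budget = pd.Series([10, 20], index=['red', 'blue'], name='budget')

# on=None: the index of `age` is matched against the index of `people`
by_index = people.join(age)

# on='team': the index of `budget` is matched against the values in people['team']
by_column = people.join(budget, on='team')

print(by_index)
print(by_column)
```

Note that the Series must have a ```name```, because that name becomes the new column's label.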

The first thing we are going to do is take ```net_worth``` and join it to ```names_df_final``` on the column ```'Last Name'```. This is possible because the ```index``` of ```net_worth``` holds the same last names that appear in the ```'Last Name'``` column of ```names_df_final```. Execute the code below to see what I mean.

In [None]:
# provided code
names_df_final.join(net_worth, on=['Last Name'])

If we change the ```how``` argument of the ```join``` to ```'outer'``` instead of ```'left'```, the result also includes the element ```Gates``` from ```net_worth```, even though he has no row in ```names_df_final```. Execute the code below to see what I mean:

In [None]:
# provided code
names_df_final.join(net_worth, on=['Last Name'], how='outer')

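You can use the image below to get a better understanding of how the ```how``` argument of ```join``` and ```merge``` works. The ```join``` argument of ```concat``` follows the same idea for ```'inner'``` and ```'outer'```.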

<br>
<img src="./img/join-types-merge-names.jpg">
<br>
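The join types can also be compared directly by counting how many rows each one returns. This is a minimal sketch on made-up data (`left`, `right`, and `rows_by_how` are hypothetical names): only the key ```'b'``` appears on both sides, ```'a'``` and ```'c'``` appear only on the left, and ```'d'``` appears only on the right.

```python
import pandas as pd

# Hypothetical toy data: 'b' is the only key present on both sides
left = pd.DataFrame({'key': ['a', 'b', 'c']})
right = pd.Series([1, 2], index=['b', 'd'], name='value')

# Run the same join with each `how` option and record the number of rows
rows_by_how = {}
for how in ['left', 'right', 'inner', 'outer']:
    joined = left.join(right, on='key', how=how)
    rows_by_how[how] = len(joined)

print(rows_by_how)
```

A ```'left'``` join keeps every row of the DataFrame, ```'right'``` keeps every entry of the Series, ```'inner'``` keeps only keys found on both sides, and ```'outer'``` keeps keys found on either side.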