# Merging DataFrames

## 1. Import and load data from the CSV files

We first load the pandas package with the <b>import</b> statement. Next, we read the data from the CSV files with the <b>pd.read_csv</b> function, which is part of pandas. Each call returns a dataframe.

In [None]:
import pandas as pd
orders = pd.read_csv('../input/orders.csv')
aisles = pd.read_csv('../input/aisles.csv')
departments = pd.read_csv('../input/departments.csv')
products = pd.read_csv('../input/products.csv')

## 2. View and Understand data

We now use the <b>head()</b> method to display the first 5 rows of these dataframes.

In [None]:
products.head()

In [None]:
aisles.head()

In [None]:
departments.head()

What can we notice when looking at the products, aisles and departments dataframes?

Answer: The products dataframe shares some columns with the aisles and departments dataframes. This means that we can <b>merge</b> these dataframes into a new one.
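Rather than comparing the outputs by eye, the shared columns can also be found programmatically. The sketch below uses small stand-in dataframes with made-up values; the column names follow the real tables, except that the <b>"department"</b> column name is an assumption.

```python
import pandas as pd

# Toy stand-ins for the real tables (made-up values).
# The "department" column name is an assumption about departments.csv.
products_demo = pd.DataFrame({
    "product_id": [1, 2],
    "product_name": ["Cookies", "Green Tea"],
    "aisle_id": [61, 94],
    "department_id": [19, 7],
})
aisles_demo = pd.DataFrame({"aisle_id": [61, 94], "aisle": ["cookies cakes", "tea"]})
departments_demo = pd.DataFrame({"department_id": [7, 19], "department": ["beverages", "snacks"]})

# Index.intersection returns the labels that appear in both column indexes
shared_with_aisles = products_demo.columns.intersection(aisles_demo.columns).tolist()
shared_with_departments = products_demo.columns.intersection(departments_demo.columns).tolist()

print(shared_with_aisles)
print(shared_with_departments)
```

The same two lines work on the real `products`, `aisles` and `departments` dataframes.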

## 3. Create a final_products dataframe

To create a merged dataframe, we need to join the dataframes we have. We will build a new dataframe, final_products, that combines the products, aisles and departments dataframes. The products dataframe includes the columns "aisle_id" and "department_id", which also appear in the aisles and departments dataframes respectively. To combine them, we use the merge() function, which performs a join on columns or indexes.

First of all, we have to choose the right type of join to get a dataframe with the data we want. There are four types of join:
1. (INNER) JOIN (how="inner"): Returns the records (rows) that have matching values in both dataframes.
2. LEFT (OUTER) JOIN (how="left"): Returns all records from the left dataframe, and the matched records from the right dataframe.
3. RIGHT (OUTER) JOIN (how="right"): Returns all records from the right dataframe, and the matched records from the left dataframe.
4. FULL (OUTER) JOIN (how="outer"): Returns all records from both dataframes, matching rows where possible and filling the rest with missing values.
<img src="https://imgur.com/yLDkld9.png" width="400">
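To make the four join types concrete, here is a minimal sketch on two tiny dataframes. The names `left`, `right`, `key`, `left_val` and `right_val` are invented for illustration and are not part of the dataset.

```python
import pandas as pd

# Two toy dataframes that share keys 2 and 3 only
left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="key", how=how)
    row_counts[how] = len(merged)
    # Show which keys survive each type of join
    print(how, sorted(merged["key"].tolist()))
```

The inner join keeps only keys 2 and 3, the left join keeps 1, 2 and 3, the right join keeps 2, 3 and 4, and the outer join keeps all four keys.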

### 3.1 Merge of products and aisles dataframes

The merged dataframe, new_products, should contain only the data we want: all rows and columns from the products dataframe, plus the column <b>"aisle"</b> from the aisles dataframe. According to the diagram above, this calls for a left join.


In [None]:
new_products = pd.merge(products,aisles,on="aisle_id", how="left")
new_products.head()

The function we used is: <b><i>pd.merge(products,aisles,on="aisle_id", how="left")</i></b> Let's explain what happened above:
* By default (with no extra arguments), the function performs an <b>inner join</b> between the two dataframes. That's why we passed the argument <b>how="left"</b>. The "how" argument lets us select any of the four join types.
* By default, the function joins on the columns that the two dataframes have in common. In our example we passed <b>on="aisle_id"</b> to state the common column explicitly. If you run the code without this argument you will get the same result, because "aisle_id" is the only common column. Can you think of a case where this argument is useful?

Answer:
If we want to merge two dataframes that have more than one common column, we should use the "on" argument to indicate the column or columns that the function will join on. For example:

<i>merge( x, y, on="key")</i>    
<i>merge( x, y, on=["first_key","second_key"])</i>
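The difference between joining on one key and joining on several is easier to see on a small example. This is a sketch with invented dataframes `x` and `y` and invented column names, reusing the key names from the lines above.

```python
import pandas as pd

# Both dataframes share two columns: first_key and second_key
x = pd.DataFrame({"first_key": [1, 1, 2], "second_key": ["a", "b", "a"], "x_val": [10, 20, 30]})
y = pd.DataFrame({"first_key": [1, 1, 2], "second_key": ["a", "b", "b"], "y_val": [0.1, 0.2, 0.3]})

# Join on both keys: a row matches only if both values agree
both = pd.merge(x, y, on=["first_key", "second_key"])

# Join on one key: the other shared column is kept twice, with _x/_y suffixes
one = pd.merge(x, y, on="first_key")

# No "on" at all: pandas falls back to all common columns
default = pd.merge(x, y)

print(both)
print(one)
```

Joining on `first_key` alone pairs every row of `x` with every row of `y` that has the same `first_key`, so the result is larger and contains `second_key_x` and `second_key_y` instead of a single `second_key` column.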

### 3.2 Merge of new_products and departments dataframes

In this section, we merge the "new_products" dataframe that we created before with the departments dataframe. To study a more complicated case of the "merge()" function, we first make a small change to the column names of the new_products dataframe by assigning new labels to its columns. Let's treat this as our starting point and see how we can handle it.

In [None]:
new_products.columns = ["product_id", "product_name", "aisle_id", "departments_id", "aisle"]
new_products.head()

Looking at the output of "head()" above, we can see that the column <b>"department_id"</b> is now named <b>"departments_id"</b>. As we said previously, we would like to merge the "new_products" dataframe with "departments". The column we want to join on holds the department ID, but its name now differs between the two dataframes. How will we handle it?

In [None]:
final_products = pd.merge(new_products,departments,left_on="departments_id",right_on="department_id",how="left")
final_products.head()

The function we used is: <b><i>pd.merge(new_products,departments,left_on="departments_id",right_on="department_id",how="left")</i></b> Let's explain what happened above:
* We passed <b>how="left"</b>, as we did before.
* We passed <b>left_on</b> and <b>right_on</b> to specify which column of each dataframe should be used for the merge.
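The following sketch shows the effect of `left_on` and `right_on` on a small scale. The dataframes and their values are made up, and the "department" column name is an assumption about the departments table.

```python
import pandas as pd

# Toy version of new_products with the renamed key column
new_products_demo = pd.DataFrame({
    "product_id": [1, 2, 3],
    "departments_id": [19, 7, 99],  # 99 has no matching department on purpose
})
# Toy version of departments ("department" column name is assumed)
departments_demo = pd.DataFrame({
    "department_id": [7, 19],
    "department": ["beverages", "snacks"],
})

merged = pd.merge(
    new_products_demo,
    departments_demo,
    left_on="departments_id",
    right_on="department_id",
    how="left",
)
print(merged)
```

Two things are worth noticing in the output: both key columns ("departments_id" and "department_id") are kept in the result, and the product with no matching department still appears, with missing values in the columns that come from the right dataframe.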

### 3.3 Delete unnecessary columns

Finally, in this section we delete some columns that are not useful in our new dataframe. These columns are "aisle_id", "departments_id" and "department_id".

In [None]:
del final_products["aisle_id"]
del final_products["departments_id"]
del final_products["department_id"]
final_products.head()
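As an alternative to repeated `del` statements, the same columns can be removed in one call with `DataFrame.drop`. The sketch below runs on a one-row stand-in dataframe with made-up values (the "department" column name is an assumption); unlike `del`, `drop` returns a new dataframe and leaves the original unchanged.

```python
import pandas as pd

# One-row stand-in for final_products (made-up values, assumed "department" column)
final_demo = pd.DataFrame({
    "product_id": [1],
    "product_name": ["Cookies"],
    "aisle_id": [61],
    "departments_id": [19],
    "aisle": ["cookies cakes"],
    "department_id": [19],
    "department": ["snacks"],
})

# drop(columns=...) removes several columns at once and returns a new dataframe
trimmed = final_demo.drop(columns=["aisle_id", "departments_id", "department_id"])
print(trimmed.columns.tolist())
```

On the real data, the equivalent would be to assign the result of `drop` back to `final_products`.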