
# 1. DATAFRAME CONCATENATION

In [1]:
# Import Pandas

import pandas as pd

In [2]:
# Creating a dataframe from a dictionary
# Let's define a dataframe with a list of bank clients with IDs = 1, 2, 3, 4, 5 
# Check this out: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html


bank_info = {'Bank Client ID': ['1', '2', '3', '4', '5'],
            'First Name': ['Nancy', 'Alex', 'Shep', 'Max', 'Allen'],
            'Last Name': ['Rob', 'Ali', 'George', 'Mitch', 'Steve']}

bank1_df = pd.DataFrame(bank_info, columns=['Bank Client ID', 'First Name', 'Last Name'])
bank1_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve


In [3]:
# Let's define another dataframe for a separate list of clients (IDs = 6, 7, 8, 9, 10)

bank_info_2 = {'Bank Client ID': ['6', '7', '8', '9', '10'],
            'First Name': ['Bill', 'Dina', 'Sarah', 'Heather', 'Holy'],
            'Last Name': ['Christian', 'Mo', 'Steve', 'Bob', 'Michelle']}

bank2_df = pd.DataFrame(bank_info_2, columns=['Bank Client ID', 'First Name', 'Last Name'])
bank2_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,6,Bill,Christian
1,7,Dina,Mo
2,8,Sarah,Steve
3,9,Heather,Bob
4,10,Holy,Michelle


In [4]:
# Let's concatenate both dataframes #1 and #2
# Note that we now have client IDs from 1 to 10
# Note that by default ignore_index has been set to False meaning indexes from both dataframes are kept unchanged

bank_all_df = pd.concat([bank1_df, bank2_df])
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve
0,6,Bill,Christian
1,7,Dina,Mo
2,8,Sarah,Steve
3,9,Heather,Bob
4,10,Holy,Michelle
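
Keeping the original indexes means the labels 0 to 4 each appear twice in the result. The sketch below, which uses two small hypothetical frames rather than the bank data above, shows what that implies for label lookups and how `verify_integrity=True` can be used to catch the overlap.

```python
import pandas as pd

# Two small frames that both carry the default index 0..1 (hypothetical data)
left_df = pd.DataFrame({'Bank Client ID': ['1', '2'], 'First Name': ['Nancy', 'Alex']})
right_df = pd.DataFrame({'Bank Client ID': ['6', '7'], 'First Name': ['Bill', 'Dina']})

# Default concat keeps both indexes, so labels 0 and 1 repeat
stacked_df = pd.concat([left_df, right_df])
has_unique_index = stacked_df.index.is_unique

# A label lookup now returns one row per source frame
rows_with_label_0 = stacked_df.loc[0]

# verify_integrity=True raises a ValueError when index labels overlap
try:
    pd.concat([left_df, right_df], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(has_unique_index)        # False
print(len(rows_with_label_0))  # 2
print(overlap_detected)        # True
```

If repeated labels are not wanted, `ignore_index = True` avoids them.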


In [5]:
# Let's concatenate both dataframes #1 and #2
# Note that by setting ignore_index = True, the index is rebuilt as a fresh numeric index that ranges from 0 to 9

bank_all_df = pd.concat([bank1_df, bank2_df], ignore_index = True)

In [6]:
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve
5,6,Bill,Christian
6,7,Dina,Mo
7,8,Sarah,Steve
8,9,Heather,Bob
9,10,Holy,Michelle


In [7]:
# DataFrame.append used to perform a similar task, but it was deprecated in pandas 1.4 and removed in 2.0, so pd.concat is used here
# Note that order matters!

bank_all_df = pd.concat([bank2_df, bank1_df], ignore_index = True)
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,6,Bill,Christian
1,7,Dina,Mo
2,8,Sarah,Steve
3,9,Heather,Bob
4,10,Holy,Michelle
5,1,Nancy,Rob
6,2,Alex,Ali
7,3,Shep,George
8,4,Max,Mitch
9,5,Allen,Steve


In [8]:
# Swapping the order of the list puts clients 1 to 5 first again

bank_all_df = pd.concat([bank1_df, bank2_df], ignore_index = True)
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve
5,6,Bill,Christian
6,7,Dina,Mo
7,8,Sarah,Steve
8,9,Heather,Bob
9,10,Holy,Michelle
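
A single record can be added with the same `pd.concat` pattern by first wrapping it in a one-row DataFrame. This is a minimal sketch; `clients_df` and `new_client` are hypothetical names, not part of the notebook above.

```python
import pandas as pd

clients_df = pd.DataFrame({'Bank Client ID': ['1', '2'],
                           'First Name': ['Nancy', 'Alex'],
                           'Last Name': ['Rob', 'Ali']})

# One new client given as a plain dict (hypothetical record)
new_client = {'Bank Client ID': '3', 'First Name': 'Shep', 'Last Name': 'George'}

# Wrap the dict in a list so pandas builds a one-row DataFrame, then concatenate
new_row_df = pd.DataFrame([new_client])
clients_df = pd.concat([clients_df, new_row_df], ignore_index=True)

print(clients_df)
```

For many records, collecting them in a list and concatenating once is usually cheaper than concatenating inside a loop, since each call copies the data.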


**MINI CHALLENGE #1:**
- **Assume that you and your significant other become new clients at the bank and would like to add your first names, last names and unique client IDs. Define a new DataFrame and add it to the master list "bank_all_df"**

In [9]:
bank_info_3 = {'Bank Client ID': ['11', '12'],
            'First Name': ['Shrishti', 'Justin'],
            'Last Name': ['Tiwari', 'Bieber']}

bank3_df = pd.DataFrame(bank_info_3, columns = ['Bank Client ID', 'First Name', 'Last Name'])
bank3_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,11,Shrishti,Tiwari
1,12,Justin,Bieber
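
Since the challenge asks for unique client IDs, it can be worth checking for clashes before adding new rows. The following is a sketch under assumed data: `existing_df`, `new_df`, and the names in them are hypothetical.

```python
import pandas as pd

existing_df = pd.DataFrame({'Bank Client ID': ['1', '2', '3'],
                            'First Name': ['Nancy', 'Alex', 'Shep']})
# Hypothetical newcomers; ID '3' is already taken
new_df = pd.DataFrame({'Bank Client ID': ['3', '4'],
                       'First Name': ['Sam', 'Pat']})

# Boolean mask: True where a new ID already exists in the master list
is_clash = new_df['Bank Client ID'].isin(existing_df['Bank Client ID'])
clashing_ids = new_df.loc[is_clash, 'Bank Client ID'].tolist()
print(clashing_ids)  # ['3']

# Add only the rows whose IDs are not taken yet
updated_df = pd.concat([existing_df, new_df[~is_clash]], ignore_index=True)
print(updated_df)
```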


In [10]:
bank_all_df = pd.concat([bank_all_df, bank3_df], ignore_index = True)
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve
5,6,Bill,Christian
6,7,Dina,Mo
7,8,Sarah,Steve
8,9,Heather,Bob
9,10,Holy,Michelle
10,11,Shrishti,Tiwari
11,12,Justin,Bieber


# 2. DATAFRAME CONCATENATION WITH MULTI-INDEXING

In [11]:
# Concatenation can also build a multi-indexed dataframe; first, recall the two dataframes we will combine:

bank1_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve


In [12]:
bank2_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,6,Bill,Christian
1,7,Dina,Mo
2,8,Sarah,Steve
3,9,Heather,Bob
4,10,Holy,Michelle


In [13]:
# Passing keys to pd.concat adds an outer index level that labels the source dataframe of each row

bank_all_df = pd.concat([bank1_df, bank2_df], keys = ['Customer Group 1', 'Customer Group 2'])
bank_all_df

Unnamed: 0,Unnamed: 1,Bank Client ID,First Name,Last Name
Customer Group 1,0,1,Nancy,Rob
Customer Group 1,1,2,Alex,Ali
Customer Group 1,2,3,Shep,George
Customer Group 1,3,4,Max,Mitch
Customer Group 1,4,5,Allen,Steve
Customer Group 2,0,6,Bill,Christian
Customer Group 2,1,7,Dina,Mo
Customer Group 2,2,8,Sarah,Steve
Customer Group 2,3,9,Heather,Bob
Customer Group 2,4,10,Holy,Michelle


In [14]:
# You can access elements using multi-indexing as follows

bank_all_df.loc[('Customer Group 1'), :]

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve


In [15]:
# Select a single row by giving the full (group, row) index tuple
bank_all_df.loc[('Customer Group 1', 0), :]

Bank Client ID        1
First Name        Nancy
Last Name           Rob
Name: (Customer Group 1, 0), dtype: object

In [16]:
# You can access elements using multi-indexing as follows

bank_all_df.loc[('Customer Group 2'), 'First Name']

0       Bill
1       Dina
2      Sarah
3    Heather
4       Holy
Name: First Name, dtype: object

In [17]:
bank_all_df.loc[('Customer Group 2'), ['First Name', 'Last Name']]

Unnamed: 0,First Name,Last Name
0,Bill,Christian
1,Dina,Mo
2,Sarah,Steve
3,Heather,Bob
4,Holy,Michelle
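
The index levels created by `keys` can also be named, which allows selection by level name. The sketch below uses two small hypothetical frames; the level names `'Group'` and `'Row'` are assumptions chosen for illustration.

```python
import pandas as pd

group1_df = pd.DataFrame({'First Name': ['Nancy', 'Alex']})
group2_df = pd.DataFrame({'First Name': ['Bill', 'Dina']})

# keys labels each source frame; names labels the two index levels
grouped_df = pd.concat([group1_df, group2_df],
                       keys=['Customer Group 1', 'Customer Group 2'],
                       names=['Group', 'Row'])

# Cross-section: all rows of one group, selected by level name
group2_only = grouped_df.xs('Customer Group 2', level='Group')

# Move the outer level into an ordinary column when a flat table is easier to work with
flat_df = grouped_df.reset_index(level='Group')

print(group2_only)
print(flat_df)
```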


**MINI CHALLENGE #2:**
- **Assume that you and your significant other belong to Customer Group #3. Use multi-indexing to add both names to the master list. Write a line of code to access Group #3 only.**

In [18]:
bank3_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,11,Shrishti,Tiwari
1,12,Justin,Bieber


In [19]:
bank_all_df = pd.concat([bank1_df, bank2_df, bank3_df], keys = ['Customer Group 1', 'Customer Group 2', 'Customer Group 3'])
bank_all_df

Unnamed: 0,Unnamed: 1,Bank Client ID,First Name,Last Name
Customer Group 1,0,1,Nancy,Rob
Customer Group 1,1,2,Alex,Ali
Customer Group 1,2,3,Shep,George
Customer Group 1,3,4,Max,Mitch
Customer Group 1,4,5,Allen,Steve
Customer Group 2,0,6,Bill,Christian
Customer Group 2,1,7,Dina,Mo
Customer Group 2,2,8,Sarah,Steve
Customer Group 2,3,9,Heather,Bob
Customer Group 2,4,10,Holy,Michelle
Customer Group 3,0,11,Shrishti,Tiwari
Customer Group 3,1,12,Justin,Bieber


In [20]:
bank_all_df.loc[('Customer Group 3'),:]

Unnamed: 0,Bank Client ID,First Name,Last Name
0,11,Shrishti,Tiwari
1,12,Justin,Bieber


In [21]:
# Select a single row by giving the full (group, row) index tuple
bank_all_df.loc[('Customer Group 3', 1), :]

Bank Client ID        12
First Name        Justin
Last Name         Bieber
Name: (Customer Group 3, 1), dtype: object

# 3. DATA MERGING

In [22]:
# Let's rebuild the flat master list by concatenating dataframes #1 and #2
# Note that we now have client IDs from 1 to 10
# Note that ignore_index = True replaces the original indexes with a fresh index from 0 to 9

bank_all_df = pd.concat([bank1_df, bank2_df], ignore_index = True)
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve
5,6,Bill,Christian
6,7,Dina,Mo
7,8,Sarah,Steve
8,9,Heather,Bob
9,10,Holy,Michelle


In [23]:
# Let's assume we obtained additional information (Annual Salary) about our bank customers 
# Note that data obtained is for all clients with IDs 1 to 10

raw_data = {'Bank Client ID': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
            'Annual Salary ($/Year)': [25000, 34000, 22000, 43000, 27000, 56000, 36000, 21000, 47000, 53000]}

bank_annual_salary = pd.DataFrame(raw_data)
bank_annual_salary

Unnamed: 0,Bank Client ID,Annual Salary ($/Year)
0,1,25000
1,2,34000
2,3,22000
3,4,43000
4,5,27000
5,6,56000
6,7,36000
7,8,21000
8,9,47000
9,10,53000


In [24]:
# Let's merge all data on 'Bank Client ID'

bank_all_df = pd.merge(bank_all_df, bank_annual_salary, on = 'Bank Client ID')
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name,Annual Salary ($/Year)
0,1,Nancy,Rob,25000
1,2,Alex,Ali,34000
2,3,Shep,George,22000
3,4,Max,Mitch,43000
4,5,Allen,Steve,27000
5,6,Bill,Christian,56000
6,7,Dina,Mo,36000
7,8,Sarah,Steve,21000
8,9,Heather,Bob,47000
9,10,Holy,Michelle,53000
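
By default `pd.merge` performs an inner join, so clients missing from either table are dropped. The sketch below uses small hypothetical frames in which salary data is deliberately missing for one client, and compares the default with `how='left'` plus `indicator=True`.

```python
import pandas as pd

clients_df = pd.DataFrame({'Bank Client ID': ['1', '2', '3'],
                           'First Name': ['Nancy', 'Alex', 'Shep']})
# Salary data with no entry for client '3' (hypothetical gap)
salary_df = pd.DataFrame({'Bank Client ID': ['1', '2'],
                          'Annual Salary ($/Year)': [25000, 34000]})

# Default how='inner': only IDs present in both tables survive
inner_df = pd.merge(clients_df, salary_df, on='Bank Client ID')

# how='left' keeps every client; indicator=True adds a '_merge' column
# that records where each row was found
left_df = pd.merge(clients_df, salary_df, on='Bank Client ID',
                   how='left', indicator=True)

print(len(inner_df))               # 2
print(left_df['_merge'].tolist())  # ['both', 'both', 'left_only']
```

In the left join, the salary for client '3' is filled with `NaN`.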


**MINI CHALLENGE #3:**
- **Let's assume that you were able to obtain two new pieces of information about the bank clients such as: (1) credit card debt, (2) age**
- **Define a new DataFrame that contains this new information**
- **Merge this new information to the DataFrame "bank_all_df".** 

In [25]:
bank_all_df = pd.concat([bank1_df, bank2_df], ignore_index = True)
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name
0,1,Nancy,Rob
1,2,Alex,Ali
2,3,Shep,George
3,4,Max,Mitch
4,5,Allen,Steve
5,6,Bill,Christian
6,7,Dina,Mo
7,8,Sarah,Steve
8,9,Heather,Bob
9,10,Holy,Michelle


In [26]:
raw2_data = {'Bank Client ID': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
             'Credit Card Debt': [350, 270, 400, 340, 200, 480, 500, 420, 220, 380],
             'Age': [29, 22, 35, 28, 21, 38, 41, 39, 23, 37]}

bank_more_info = pd.DataFrame(raw2_data)
bank_more_info

Unnamed: 0,Bank Client ID,Credit Card Debt,Age
0,1,350,29
1,2,270,22
2,3,400,35
3,4,340,28
4,5,200,21
5,6,480,38
6,7,500,41
7,8,420,39
8,9,220,23
9,10,380,37


In [27]:
bank_all_df = pd.merge(bank_all_df, bank_more_info, on = 'Bank Client ID')
bank_all_df

Unnamed: 0,Bank Client ID,First Name,Last Name,Credit Card Debt,Age
0,1,Nancy,Rob,350,29
1,2,Alex,Ali,270,22
2,3,Shep,George,400,35
3,4,Max,Mitch,340,28
4,5,Allen,Steve,200,21
5,6,Bill,Christian,480,38
6,7,Dina,Mo,500,41
7,8,Sarah,Steve,420,39
8,9,Heather,Bob,220,23
9,10,Holy,Michelle,380,37
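
Several tables that share the same key can be merged in one pass. This is a minimal sketch with hypothetical two-client frames; it chains `pd.merge` with `functools.reduce` and uses `validate='one_to_one'` so that pandas raises an error if a client ID appears more than once in any table.

```python
from functools import reduce

import pandas as pd

names_df = pd.DataFrame({'Bank Client ID': ['1', '2'],
                         'First Name': ['Nancy', 'Alex']})
salary_df = pd.DataFrame({'Bank Client ID': ['1', '2'],
                          'Annual Salary ($/Year)': [25000, 34000]})
age_df = pd.DataFrame({'Bank Client ID': ['1', '2'],
                       'Age': [29, 22]})

def merge_on_client_id(left, right):
    # validate='one_to_one' checks that the key is unique on both sides
    return pd.merge(left, right, on='Bank Client ID', validate='one_to_one')

# Merge the tables left to right: (names + salary) + age
combined_df = reduce(merge_on_client_id, [names_df, salary_df, age_df])

print(combined_df)
```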
