
# PANDAS: Merge, Concatenate, and Join

## Concatenate

The `concat` function in pandas combines Series or DataFrame objects along one axis: stacked row-wise (axis 0) or placed side by side column-wise (axis 1). It is particularly useful when data comes from several sources and needs to be combined into a single DataFrame.

Properties of `concat`:

* Concatenation can be done along either the rows (axis 0) or the columns (axis 1).
* It can concatenate any number of objects in a single call.
* On the other axis it keeps either the union of labels (`join='outer'`, the default) or only the labels shared by all inputs (`join='inner'`).

In [None]:
# Abridged signature, shown for reference only (objs is a placeholder)
# pd.concat(objs, axis=0, join='outer', ignore_index=False)

#### Parameters:

* objs: A sequence (for example a list) of pandas objects, such as DataFrames or Series, to concatenate.
* axis: The axis to concatenate along (0 for rows, 1 for columns).
* join: How to handle labels on the other axis: 'outer' (default) keeps the union, 'inner' keeps the intersection. Unlike `merge`, `concat` does not accept 'left' or 'right'.
* ignore_index: If True, the labels along the concatenation axis are discarded and replaced with 0, 1, ..., n - 1; if False (default), the original labels are preserved, which can leave duplicate labels.

Examples:

The examples below use two DataFrames built from dictionaries that share the same columns:

In [None]:
import pandas as pd

data1 = {'A': [1, 2, 3],
         'B': [4, 5, 6],
         'C': [7, 8, 9],
         'D': [10, 11, 12]}

data2 = {'A': [13, 14, 15],
         'B': [16, 17, 18],
         'C': [19, 20, 21],
         'D': [22, 23, 24]}
         
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df1

Example 1: Concatenating along rows (axis=0) with the same columns:

In [None]:
result = pd.concat([df1, df2])
result


Example 2: Concatenating along columns (axis=1):

In [None]:
result1 = pd.concat([df1, df2], axis=1)
result1
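
With `axis=1`, rows are aligned by index label and repeated column names are kept as-is, which can make the result hard to read. The sketch below uses two small hypothetical frames (`first` and `second` are illustrative names, not part of the examples above) to show the alignment and one way to label the pieces with `keys`.

```python
import pandas as pd

# Hypothetical frames: same column name, only partly overlapping index labels
first = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
second = pd.DataFrame({'A': [4, 5, 6]}, index=[1, 2, 3])

# axis=1 aligns rows by index label; a label missing on one side yields NaN
side_by_side = pd.concat([first, second], axis=1)

# keys= adds an outer column level so the two 'A' columns can be told apart
labelled = pd.concat([first, second], axis=1, keys=['first', 'second'])

print(side_by_side)
print(labelled)
```

Selecting `labelled['second']` then returns only the columns that came from the second frame.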


Example 3: Ignoring original indices and resetting index:

In [None]:
result2 = pd.concat([df1, df2], ignore_index=True)
result2
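
To see why `ignore_index` matters, the following sketch compares the two settings on small hypothetical frames (`part_a` and `part_b` are illustrative names). Keeping the original labels leaves duplicates, so a label lookup can return more than one row.

```python
import pandas as pd

# Hypothetical frames, each with the default index 0, 1
part_a = pd.DataFrame({'A': [1, 2]})
part_b = pd.DataFrame({'A': [3, 4]})

# Original labels are kept, so 0 and 1 each appear twice
kept = pd.concat([part_a, part_b])

# Labels are replaced with a fresh 0..n-1 range
fresh = pd.concat([part_a, part_b], ignore_index=True)

print(kept.loc[0])   # a DataFrame with two rows, one from each input
print(fresh.loc[0])  # a single row, returned as a Series
```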


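Example 4: Specifying the join type. `'outer'` is the default, and because `df1` and `df2` have identical columns, `'inner'` would give the same result here: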

In [None]:
result3 = pd.concat([df1, df2], join='outer')
result3
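
The difference between the two join types only shows up when the inputs do not share all their columns. A minimal sketch, assuming two hypothetical frames (`frame_ab` and `frame_bc`) that overlap only on column `B`:

```python
import pandas as pd

# Hypothetical frames that share only column 'B'
frame_ab = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
frame_bc = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# 'outer' keeps the union of columns and fills the gaps with NaN
outer = pd.concat([frame_ab, frame_bc], join='outer', ignore_index=True)

# 'inner' keeps only the columns present in every input
inner = pd.concat([frame_ab, frame_bc], join='inner', ignore_index=True)

print(outer)
print(inner)
```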


## Merge method

The `merge` function in pandas combines two DataFrames based on common columns or indexes, much like joining tables in SQL. It performs database-style joins and supports several join types. To combine more than two DataFrames, chain several merges.

#### Properties of the merge method:

* Merging can be performed on one or more key columns; rows are matched where the key values are equal.
* Different types of joins, such as inner, outer, left, and right, can be used.
* Non-key columns that share a name in both DataFrames are kept apart by the `suffixes` parameter, which appends a distinct suffix to each side.

#### Syntax:

In [None]:
# Abridged signature, shown for reference only (left and right are placeholders)
# pd.merge(left, right, on=None, how='inner')


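#### Parameters:

* left: The left DataFrame to be merged.
* right: The right DataFrame to be merged.
* on: The column or index level name(s) to merge on; they must be present in both DataFrames.
* how: The type of join to perform ('inner', 'outer', 'left', 'right'); the default is 'inner'.
* left_on, right_on: The key column(s) on each side, for when the keys have different names.
* left_index, right_index: If True, use the index of that DataFrame as the join key.
* suffixes: A pair of suffixes appended to overlapping non-key column names.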

Examples:


In [None]:
import pandas as pd

data1 = {'ID': [1, 2, 3, 4],
         'Name': ['Alice', 'Bob', 'Charlie', 'David']}

data2 = {'ID': [3, 4, 5, 6],
         'Age': [25, 32, 28, 40]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)


In [None]:
# Example 1: inner join on a common column (only IDs 3 and 4 appear in both frames)

result_join = pd.merge(df1, df2, on='ID', how='inner')
result_join
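
To compare the join types side by side, the sketch below rebuilds the same two frames under hypothetical names (`people` and `ages`) so that it does not disturb `df1` and `df2`, then counts the rows each `how` value produces. The `indicator=True` option adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Same data as df1 and df2 above, under illustrative names
people = pd.DataFrame({'ID': [1, 2, 3, 4],
                       'Name': ['Alice', 'Bob', 'Charlie', 'David']})
ages = pd.DataFrame({'ID': [3, 4, 5, 6],
                     'Age': [25, 32, 28, 40]})

# Number of result rows for each join type
row_counts = {how: len(pd.merge(people, ages, on='ID', how=how))
              for how in ['inner', 'left', 'right', 'outer']}
print(row_counts)

# indicator=True marks each row as 'left_only', 'right_only', or 'both'
audit = pd.merge(people, ages, on='ID', how='outer', indicator=True)
print(audit)
```

Only IDs 3 and 4 exist on both sides, so the inner join is the smallest result and the outer join the largest.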

In [None]:
# Example 2: left join when the key columns have different names
data3 = {'EmpID': [2, 3, 4, 7],
         'Salary': [60000, 75000, 80000, 50000]}

df3 = pd.DataFrame(data3)

result_left = pd.merge(df1, df3, left_on='ID', right_on='EmpID', how='left')
result_left


In [None]:
# Example 3: outer join on the index
# Note: this overwrites df1 and df2, which are indexed by ID from here on
df1 = df1.set_index('ID')
df2 = df2.set_index('ID')

result_outer = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
print(result_outer)
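
Merging on both indexes is what `DataFrame.join` does by default, so the two spellings can be used interchangeably. A small sketch under hypothetical names (`names_by_id` and `ages_by_id`), using the same data as above:

```python
import pandas as pd

# Same data as above, indexed by ID, under illustrative names
names_by_id = pd.DataFrame({'ID': [1, 2, 3, 4],
                            'Name': ['Alice', 'Bob', 'Charlie', 'David']}).set_index('ID')
ages_by_id = pd.DataFrame({'ID': [3, 4, 5, 6],
                           'Age': [25, 32, 28, 40]}).set_index('ID')

# Index-on-index merge ...
by_merge = pd.merge(names_by_id, ages_by_id,
                    left_index=True, right_index=True, how='outer')

# ... and the shorter join() spelling of the same operation
by_join = names_by_id.join(ages_by_id, how='outer')

print(by_merge.equals(by_join))
```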


In [None]:
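# Example 4: suffixes for overlapping column names
# df1 is indexed by ID after Example 3; `on` also accepts index level names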
data4 = {'ID': [1, 2, 3, 4],
         'Name': ['Eve', 'Frank', 'Grace', 'Henry']}

df4 = pd.DataFrame(data4)

result_suffix = pd.merge(df1, df4, on='ID', how='inner', suffixes=('_left', '_right'))
result_suffix


## Join Method

The `join()` method combines the columns of two DataFrames by matching row labels. By default it aligns the calling DataFrame's index with the other DataFrame's index; the `on` argument matches a column of the caller against the other's index instead. Unlike `concat()`, it is called on one DataFrame and supports left and right joins. Unlike `merge()`, which lets you choose arbitrary key columns on both sides, `join()` always matches the other DataFrame on its index.

#### Properties:

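* Joins DataFrames on the index, or on a caller column named in `on` that is matched against the other DataFrame's index
* With the default how='left', the result keeps every row of the calling DataFrame; its columns come from both DataFrames
* how can be 'left', 'right', 'inner', or 'outer' to control join behavior
* Overlapping column names must be disambiguated with lsuffix and/or rsuffix; otherwise a ValueError is raised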

#### Syntax:

In [None]:
df1.join(df2, how='left', on=None, lsuffix='', rsuffix='')

Examples: 

In [None]:
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[1, 2])
df2 = pd.DataFrame({'C': [5, 6]}, index=[1, 2])

df1.join(df2)
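
The `on`, `lsuffix`, and `rsuffix` arguments come into play when the key lives in a column of the caller and both frames share a column name. The sketch below assumes two hypothetical frames (`orders` and `customers`, with made-up values) to illustrate this.

```python
import pandas as pd

# Hypothetical data: the key is a column in orders and the index in customers
orders = pd.DataFrame({'cust_id': [1, 2, 2], 'value': [10, 20, 30]})
customers = pd.DataFrame({'name': ['Ann', 'Ben'], 'value': [100, 200]},
                         index=[1, 2])

# on= matches orders['cust_id'] against customers' index;
# the suffixes keep the two 'value' columns apart
joined = orders.join(customers, on='cust_id',
                     lsuffix='_order', rsuffix='_customer')
print(joined)
```

The result keeps the index of `orders`, and each order row picks up the matching customer's columns.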

### Additional examples using World data set

In [None]:
data = pd.read_csv('world-data-2023.csv')
df1 = data[['Country', 'Population']].head(3)
df2 = data[['Country','Land Area(Km2)']].head(3)

# Stack the rows; a column missing from one frame is filled with NaN
pd.concat([df1, df2], axis=0)

In [None]:
df3 = data[['Country','Population']].head(3)
df4 = data[['Country','Birth Rate']].head(3)

pd.merge(df3, df4, on='Country')

In [None]:
# Set index to Country and select Population column
df5 = data.set_index('Country')[['Population']].head(3)  

# Select Land Area column, also indexed by Country so the join can align rows
df6 = data.set_index('Country')[['Land Area(Km2)']].head(3)

# Join on index 
df5.join(df6)