# Joining Tables with Pandas
**Explanation:**

Joining tables means combining data from multiple sources (tables) into a single DataFrame based on common columns. Pandas supports several types of joins (inner, left, right, outer), similar to SQL.

1. **Inner Join**: An inner join returns only the rows with matching values in both tables.

# Inner join with SQL query

In [47]:
import pandas as pd
from sqlalchemy import create_engine

In [48]:
engine=create_engine('mysql://root:raghav85299@localhost/test')

In [49]:
#step1: Write an SQL query to perform an inner join between employees and departments
query='''
SELECT employees.name,departments.department
FROM employees
INNER JOIN departments
ON employees.department_id=departments.id'''


In [50]:
#step2: Execute the SQL query and load the joined data into a Pandas DataFrame.
df=pd.read_sql(query,con=engine)

In [51]:
#step3: Display the joined dataframe
print(df)

           name    department
0   Alice Smith   Engineering
1   Bob Johnson  Data Science
2  Diana Prince        Design


- **Explanation:**
- `INNER JOIN departments ON employees.department_id=departments.id`: This joins the `employees` table with the `departments` table where `department_id` in `employees` matches `id` in `departments`.
- The resulting DataFrame `df` contains only the rows where there is a match between the two tables. Charlie Brown is absent because his `department_id` has no matching `id` in `departments`.
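
To try the inner join without a MySQL server, the sketch below rebuilds a cut-down version of the two tables in an in-memory SQLite database. SQLite standing in for MySQL and the reduced column set are assumptions made for illustration; the rows are taken from the outputs printed in this notebook. The `ORDER BY` clause is an addition that keeps the row order predictable.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the MySQL `test` database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, department_id INTEGER);
CREATE TABLE departments (id INTEGER, department TEXT);
INSERT INTO employees VALUES
    (1, 'Alice Smith', 1),
    (2, 'Bob Johnson', 2),
    (3, 'Charlie Brown', 35),
    (4, 'Diana Prince', 3);
INSERT INTO departments VALUES
    (1, 'Engineering'),
    (2, 'Data Science'),
    (3, 'Design');
""")

# Same inner join as above, with an explicit ordering added.
query = """
SELECT employees.name, departments.department
FROM employees
INNER JOIN departments
ON employees.department_id = departments.id
ORDER BY employees.id
"""

inner_sql_df = pd.read_sql(query, con=conn)
print(inner_sql_df)
conn.close()
```

Charlie Brown's `department_id` of 35 matches no row in `departments`, so the inner join leaves him out.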

# Inner join with Pandas

In [52]:
import pandas as pd
from sqlalchemy import create_engine

In [53]:
engine=create_engine('mysql://root:raghav85299@localhost/test')

In [54]:
# Step 1: Load the employees and departments tables into Pandas DataFrames
employees_df = pd.read_sql('SELECT * FROM employees', con=engine)
departments_df = pd.read_sql('SELECT * FROM departments', con=engine)

In [55]:
# Step 2: Use Pandas' merge() to perform an inner join on the department_id and id columns.
# merge() defaults to how='inner', so no 'how' argument is needed here.
merged_df=pd.merge(employees_df,departments_df,left_on='department_id',right_on='id')

In [56]:
# Step 3: Display the joined DataFrame
print(merged_df)

   id_x          name           position   salary  department_id  age  id_y  \
0     1   Alice Smith  Software Engineer  75000.0              1   30     1   
1     2   Bob Johnson     Data Scientist  80000.0              2   28     2   
2     4  Diana Prince        UX Designer  72000.0              3   29     3   

     department       location    budget         head  
0   Engineering  San Francisco  500000.0     Jane Doe  
1  Data Science       New York  400000.0   John Smith  
2        Design    Los Angeles  300000.0  Emma Watson  


- **Explanation:**
- `pd.merge(employees_df, departments_df, left_on='department_id', right_on='id')`: This merges `employees_df` and `departments_df` on the `department_id` column from `employees_df` and the `id` column from `departments_df`. `merge()` performs an inner join by default.
- The resulting DataFrame `merged_df` contains only rows where there is a match between the two DataFrames. Because both tables have an `id` column, Pandas disambiguates them with the suffixes `_x` (left) and `_y` (right).
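
The `id_x` and `id_y` columns in the output come from the two `id` columns colliding. One possible way to get more readable names is the `suffixes` parameter of `pd.merge()`. The sketch below uses small in-memory DataFrames as stand-ins for the database tables (an assumption, with fewer columns than the real tables), and the suffix names are illustrative choices.

```python
import pandas as pd

# Assumption: small in-memory stand-ins for the employees and departments tables.
employees_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice Smith", "Bob Johnson", "Charlie Brown", "Diana Prince"],
    "department_id": [1, 2, 35, 3],
})
departments_df = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["Engineering", "Data Science", "Design"],
})

# suffixes replaces the default ('_x', '_y') on overlapping column names.
inner_df = pd.merge(
    employees_df,
    departments_df,
    left_on="department_id",
    right_on="id",
    how="inner",
    suffixes=("_employee", "_department"),
)

# Keep only the columns of interest for display.
print(inner_df[["name", "department"]])
```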

2. **Left Join**:
A left join returns all rows from the left table (first table) and the matching rows from the right table (second table). Rows from the left table with no match get NaN values in the columns that come from the right table.

# Left join with SQL query

In [57]:
import pandas as pd
from sqlalchemy import create_engine

In [58]:
engine=create_engine('mysql://root:raghav85299@localhost/test')

In [59]:
query='''
SELECT employees.name,departments.department
FROM employees
LEFT JOIN departments
ON employees.department_id=departments.id'''

In [60]:
df=pd.read_sql(query,con=engine)


In [61]:
print(df)

            name    department
0    Alice Smith   Engineering
1    Bob Johnson  Data Science
2  Charlie Brown          None
3   Diana Prince        Design


- **Explanation:**
- `LEFT JOIN departments ON employees.department_id = departments.id`: This joins all rows from `employees` with matching rows from `departments`. If no match is found, the `department` column is NULL in SQL, which appears as `None` in the DataFrame (see Charlie Brown in the output).
- The resulting DataFrame `df` includes all employees, even if they do not belong to a department.
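
The `None` in the output is still a missing value as far as Pandas is concerned. A minimal sketch of how it might be detected or replaced is shown below; the DataFrame is typed in by hand to mirror the printed left-join result, and the `'Unassigned'` label is a hypothetical choice.

```python
import pandas as pd

# Assumption: this DataFrame mirrors the left-join output printed above.
left_sql_df = pd.DataFrame({
    "name": ["Alice Smith", "Bob Johnson", "Charlie Brown", "Diana Prince"],
    "department": ["Engineering", "Data Science", None, "Design"],
})

# isna() treats None and NaN alike, so it finds employees with no department.
no_department = left_sql_df[left_sql_df["department"].isna()]
print(no_department)

# fillna() can swap the missing value for a readable label (hypothetical label).
labelled = left_sql_df.fillna({"department": "Unassigned"})
print(labelled)
```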

# Left join with Pandas

In [62]:
import pandas as pd
from sqlalchemy import create_engine

In [63]:
engine=create_engine('mysql://root:raghav85299@localhost/test')

In [64]:
# Load the employees and departments tables into Pandas DataFrames
employees_df = pd.read_sql('SELECT * FROM employees', con=engine)
departments_df = pd.read_sql('SELECT * FROM departments', con=engine)

In [65]:
merged_df=pd.merge(employees_df,departments_df,left_on='department_id',right_on='id',how='left')

In [66]:
print(merged_df)

   id_x           name           position   salary  department_id  age  id_y  \
0     1    Alice Smith  Software Engineer  75000.0              1   30   1.0   
1     2    Bob Johnson     Data Scientist  80000.0              2   28   2.0   
2     3  Charlie Brown    Product Manager  95000.0             35   35   NaN   
3     4   Diana Prince        UX Designer  72000.0              3   29   3.0   

     department       location    budget         head  
0   Engineering  San Francisco  500000.0     Jane Doe  
1  Data Science       New York  400000.0   John Smith  
2           NaN            NaN       NaN          NaN  
3        Design    Los Angeles  300000.0  Emma Watson  


- **Explanation:**
- `pd.merge(employees_df, departments_df, left_on='department_id', right_on='id', how='left')`: This merges `employees_df` and `departments_df` using a left join. All rows from `employees_df` are retained, and employees with no matching department get NaN in every column that comes from `departments_df`.
- The resulting DataFrame `merged_df` contains all employees, even those without a matching department. Note that `id_y` is now displayed as a float (`1.0`, `2.0`) because an integer column cannot hold NaN.
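
To see which rows of a left join found a partner, `pd.merge()` accepts `indicator=True`, which adds a `_merge` column. The sketch below assumes small in-memory stand-ins for the two tables rather than the real database.

```python
import pandas as pd

# Assumption: small in-memory stand-ins for the employees and departments tables.
employees_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice Smith", "Bob Johnson", "Charlie Brown", "Diana Prince"],
    "department_id": [1, 2, 35, 3],
})
departments_df = pd.DataFrame({
    "id": [1, 2, 3],
    "department": ["Engineering", "Data Science", "Design"],
})

# indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'.
left_df = pd.merge(
    employees_df,
    departments_df,
    left_on="department_id",
    right_on="id",
    how="left",
    indicator=True,
)

# Employees whose department_id matched no department.
unmatched = left_df.loc[left_df["_merge"] == "left_only", "name"]
print(unmatched)
```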

3. **Right Join**:
A right join is the opposite of a left join: it returns all rows from the right table and the matching rows from the left table. Rows from the right table with no match get NaN values in the columns that come from the left table.

# Right join with SQL query

In [67]:
import pandas as pd
from sqlalchemy import create_engine

In [68]:
engine=create_engine('mysql://root:raghav85299@localhost/test')

In [69]:
query='''
SELECT employees.name,departments.department
FROM employees
RIGHT JOIN departments
ON employees.department_id=departments.id'''

In [70]:
df=pd.read_sql(query,con=engine)

In [71]:
print(df)

           name    department
0   Alice Smith   Engineering
1   Bob Johnson  Data Science
2  Diana Prince        Design


- **Explanation:**
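- `RIGHT JOIN departments ON employees.department_id = departments.id`: This joins all rows from `departments` with matching rows from `employees`. If a department has no matching employee, the `name` column is NULL in SQL (shown as `None` in the DataFrame).
- The resulting DataFrame `df` includes all departments, even if no employees belong to them. Here every department has at least one employee, so the output is the same as the inner join.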

# Right join with Pandas

In [72]:
import pandas as pd
from sqlalchemy import create_engine

In [73]:
engine=create_engine('mysql://root:raghav85299@localhost/test')

In [74]:

employees_df = pd.read_sql('SELECT * FROM employees', con=engine)
departments_df = pd.read_sql('SELECT * FROM departments', con=engine)

In [75]:
merged_df = pd.merge(employees_df, departments_df, left_on='department_id', right_on='id', how='right')

In [76]:
print(merged_df)

   id_x          name           position   salary  department_id  age  id_y  \
0     1   Alice Smith  Software Engineer  75000.0              1   30     1   
1     2   Bob Johnson     Data Scientist  80000.0              2   28     2   
2     4  Diana Prince        UX Designer  72000.0              3   29     3   

     department       location    budget         head  
0   Engineering  San Francisco  500000.0     Jane Doe  
1  Data Science       New York  400000.0   John Smith  
2        Design    Los Angeles  300000.0  Emma Watson  


- **Explanation:**
- `pd.merge(employees_df, departments_df, left_on='department_id', right_on='id', how='right')`: This merges `employees_df` and `departments_df` using a right join. All rows from `departments_df` are retained, and departments with no matching employee get NaN in every column that comes from `employees_df`.
- The resulting DataFrame `merged_df` contains all departments, even if they have no associated employees. In this data every department has an employee, so the result matches the inner join.
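
Since every department in the sample data has an employee, the right join above looks identical to the inner join. The sketch below adds a hypothetical fourth department, `Marketing`, that is not in the original tables, purely to show what an unmatched right-hand row looks like. The DataFrames are in-memory stand-ins for the database tables.

```python
import pandas as pd

# Assumption: in-memory stand-ins; 'Marketing' is a hypothetical extra department.
employees_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice Smith", "Bob Johnson", "Charlie Brown", "Diana Prince"],
    "department_id": [1, 2, 35, 3],
})
departments_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "department": ["Engineering", "Data Science", "Design", "Marketing"],
})

# how='right' keeps every department; unmatched ones get NaN in employee columns.
right_df = pd.merge(
    employees_df,
    departments_df,
    left_on="department_id",
    right_on="id",
    how="right",
)
print(right_df[["name", "department"]])
```

The `Marketing` row has NaN for `name`, and Charlie Brown is dropped because his `department_id` matches no department.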

4. **Outer Join**:
An outer join returns all rows from both tables, with NaN values for non-matching rows in either table.
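
As a quick illustration of that definition, the sketch below runs an outer merge on in-memory stand-ins for the two tables. The `Marketing` department is a hypothetical addition, included so that both sides have an unmatched row.

```python
import pandas as pd

# Assumption: in-memory stand-ins; 'Marketing' is a hypothetical extra department.
employees_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice Smith", "Bob Johnson", "Charlie Brown", "Diana Prince"],
    "department_id": [1, 2, 35, 3],
})
departments_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "department": ["Engineering", "Data Science", "Design", "Marketing"],
})

# how='outer' keeps unmatched rows from both sides and fills the gaps with NaN.
outer_df = pd.merge(
    employees_df,
    departments_df,
    left_on="department_id",
    right_on="id",
    how="outer",
)
print(outer_df[["name", "department"]])
```

The result has five rows: three matched pairs, Charlie Brown with no department, and `Marketing` with no employee.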

# Outer join with SQL query

In [77]:
import pandas as pd
from sqlalchemy import create_engine

In [78]:
engine=create_engine('mysql://root:raghav85299@localhost/test')

In [86]:
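# MySQL has no FULL OUTER JOIN, so it is emulated here as LEFT JOIN UNION RIGHT JOIN.
query='''
SELECT employees.name,departments.department
FROM employees LEFT JOIN departments ON employees.department_id=departments.id
UNION
SELECT employees.name,departments.department
FROM employees RIGHT JOIN departments ON employees.department_id=departments.id'''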