# 1) Pandas merge

- combines two DataFrames based on their columns or indexes
- similar to a JOIN in SQL
- syntax (simplified, most common parameters only):
  ```
  pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, sort=False)
  ```
  - **left:** specifies the left DataFrame to be merged
  - **right:** specifies the right DataFrame to be merged
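  - **on (optional):** specifies the column(s) to join on; they must exist in both DataFrames. If `on`, `left_on` and `right_on` are all omitted, pandas joins on the columns the two DataFrames have in common
  - **how (optional):** specifies the type of join: `'inner'` (default), `'left'`, `'right'`, `'outer'` or `'cross'`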
  - **left_on (optional):** specifies column(s) from the left DataFrame to use as key(s) for merging
  - **right_on (optional):** specifies column(s) from the right DataFrame to use as key(s) for merging
  - **sort (optional):** if True, sort the result DataFrame by the join keys
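
A minimal sketch of `on`, `how` and `sort` on two made-up toy frames (`left`, `right` and the column names `key`, `lval`, `rval` are hypothetical). With `sort=False` an inner join keeps the order of the left keys, as documented for `how='inner'`; `sort=True` sorts the result by the join key instead.

```python
import pandas as pd

# Hypothetical toy data: both frames share the key column "key"
left = pd.DataFrame({"key": ["c", "a", "b"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "c"], "rval": [10, 20, 30]})

# sort=False (default): rows follow the order of the keys in the left frame
unsorted_df = pd.merge(left, right, on="key", how="inner", sort=False)

# sort=True: rows are sorted lexicographically by the join key
sorted_df = pd.merge(left, right, on="key", how="inner", sort=True)

print(unsorted_df["key"].tolist())  # ['c', 'a', 'b']
print(sorted_df["key"].tolist())    # ['a', 'b', 'c']
```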


In [None]:
import pandas as pd

# create dataframes from the dictionaries
data1 = {
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D003"],
}
employees = pd.DataFrame(data1)

data2 = {"DeptID": ["D001", "D002", "D003"], "DeptName": ["Sales", "HR", "Admin"]}
departments = pd.DataFrame(data2)

# merge employees and departments; without on=, the shared column DeptID is the key (inner join by default)
merged_df = pd.merge(employees, departments)

# display DataFrames
print("Employees:")
print(employees)
print()
print("Departments:")
print(departments)
print()
print("Merged DataFrame:")
print(merged_df)

Employees:
  EmployeeID         Name DeptID
0       E001     John Doe   D001
1       E002   Jane Smith   D003
2       E003  Peter Brown   D001
3       E004  Tom Johnson   D002
4       E005   Rita Patel   D003

Departments:
  DeptID DeptName
0   D001    Sales
1   D002       HR
2   D003    Admin

Merged DataFrame:
  EmployeeID         Name DeptID DeptName
0       E001     John Doe   D001    Sales
1       E002   Jane Smith   D003    Admin
2       E003  Peter Brown   D001    Sales
3       E004  Tom Johnson   D002       HR
4       E005   Rita Patel   D003    Admin
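
The call above relies on pandas inferring the key. A small sketch, reusing the same data, that checks the implicit form against an explicit `on="DeptID"` and against the method form `DataFrame.merge`, where the calling frame plays the role of `left`:

```python
import pandas as pd

# Same data as in the cell above
employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D003"],
})
departments = pd.DataFrame({"DeptID": ["D001", "D002", "D003"],
                            "DeptName": ["Sales", "HR", "Admin"]})

# Without on=, pandas joins on the columns both frames share (here only "DeptID")
implicit = pd.merge(employees, departments)

# Spelling out the key and the (default) join type gives the same result
explicit = pd.merge(employees, departments, on="DeptID", how="inner")

# Method form: employees is the left frame
method = employees.merge(departments, on="DeptID")

print(implicit.equals(explicit) and implicit.equals(method))  # True
```

Naming the key explicitly is usually the safer habit: if both frames later gain another column with the same name, the implicit form would silently start joining on it too.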


## 1.1) Merge DataFrames based on keys

- if the two DataFrames have no common key column (the keys are named differently), we name the key column of each side with `left_on` and `right_on`


In [None]:
import pandas as pd

# create dataframes from the dictionaries
data1 = {
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID1": ["D001", "D003", "D001", "D002", "D006"],
}
employees = pd.DataFrame(data1)

data2 = {
    "DeptID2": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
}
departments = pd.DataFrame(data2)

# inner merge matching DeptID1 (left) with DeptID2 (right), sorted by the join key
df_merge = pd.merge(
    employees, departments, left_on="DeptID1", right_on="DeptID2", sort=True
)

print(df_merge)

  EmployeeID         Name DeptID1 DeptID2 DeptName
0       E001     John Doe    D001    D001    Sales
1       E003  Peter Brown    D001    D001    Sales
2       E004  Tom Johnson    D002    D002       HR
3       E002   Jane Smith    D003    D003    Admin
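
The output keeps both `DeptID1` and `DeptID2`, which hold identical values in every matched row. One common clean-up, sketched here on the same data, is to drop the right-hand key and rename the left one (the target name `DeptID` is just an illustrative choice):

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID1": ["D001", "D003", "D001", "D002", "D006"],
})
departments = pd.DataFrame({
    "DeptID2": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
})

# With left_on/right_on, both key columns survive the merge
df_merge = pd.merge(employees, departments, left_on="DeptID1", right_on="DeptID2", sort=True)

# Drop the redundant right-hand key and give the remaining one a neutral name
df_clean = df_merge.drop(columns="DeptID2").rename(columns={"DeptID1": "DeptID"})

print(df_clean.columns.tolist())  # ['EmployeeID', 'Name', 'DeptID', 'DeptName']
```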


# 2) Types of join operations in merge()

- the join type is specified with the **how** argument of **merge()**
- types:
  - left join
  - right join
  - outer join
  - inner join (default)
  - cross join
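
A quick sketch of how `how` changes the size of the result, using the employee/department data from the examples in this section (the `Name` column is omitted for brevity):

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
})
departments = pd.DataFrame({
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
})

# Key-based joins: count the rows each join type produces
row_counts = {}
for how in ["left", "right", "outer", "inner"]:
    row_counts[how] = len(pd.merge(employees, departments, on="DeptID", how=how))

# A cross join takes no key; combining on= with how="cross" raises an error
row_counts["cross"] = len(pd.merge(employees, departments, how="cross"))

print(row_counts)
# {'left': 5, 'right': 5, 'outer': 6, 'inner': 4, 'cross': 20}
```

Here D006 exists only in `employees` and D004 only in `departments`, so the left and right joins each add one unmatched row to the four inner matches, the outer join adds both, and the cross join pairs all 5 employees with all 4 departments.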


## 2.1) Left join

- joins two DataFrames on a common key and returns a new DataFrame containing all rows of the left DataFrame plus the matching rows of the right DataFrame
- where a left row has no match in the right DataFrame, its right-hand columns are filled with **NaN**


In [None]:
import pandas as pd

# create dataframes from the dictionaries
data1 = {
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
}
employees = pd.DataFrame(data1)

data2 = {
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
}
departments = pd.DataFrame(data2)

# left merge the dataframes
df_merge = pd.merge(employees, departments, on="DeptID", how="left", sort=True)

print(df_merge)

  EmployeeID         Name DeptID DeptName
0       E001     John Doe   D001    Sales
1       E003  Peter Brown   D001    Sales
2       E004  Tom Johnson   D002       HR
3       E002   Jane Smith   D003    Admin
4       E005   Rita Patel   D006      NaN
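
The NaN padding is also a simple way to find left rows without a partner. A sketch on the same data, assuming `DeptName` is never missing in `departments` itself (otherwise a real match would look unmatched):

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
})
departments = pd.DataFrame({
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
})

df_merge = pd.merge(employees, departments, on="DeptID", how="left", sort=True)

# Left rows with no match on the right have NaN in the right-hand columns
unmatched = df_merge[df_merge["DeptName"].isna()]
print(unmatched["Name"].tolist())  # ['Rita Patel']
```

When that assumption does not hold, `pd.merge` also accepts `indicator=True`, which adds a `_merge` column recording whether each row came from `both` frames or only one side.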


## 2.2) Right join

- the mirror image of a left join
- returns all rows of the right DataFrame plus the matching rows of the left DataFrame; unmatched left-hand columns are filled with **NaN**


In [1]:
import pandas as pd

# create dataframes from the dictionaries
data1 = {
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
}
employees = pd.DataFrame(data1)

data2 = {
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
}
departments = pd.DataFrame(data2)

# right merge the dataframes
df_merge = pd.merge(employees, departments, on="DeptID", how="right", sort=True)

print(df_merge)

  EmployeeID         Name DeptID   DeptName
0       E001     John Doe   D001      Sales
1       E003  Peter Brown   D001      Sales
2       E004  Tom Johnson   D002         HR
3       E002   Jane Smith   D003      Admin
4        NaN          NaN   D004  Marketing
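
"Mirror image" can be checked directly: a right join gives the same rows as a left join with the arguments swapped, up to column and row order. A sketch on the same data that normalises both before comparing:

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
})
departments = pd.DataFrame({
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
})

right = pd.merge(employees, departments, on="DeptID", how="right")

# Swap the arguments, use a left join, then restore the column order
swapped = pd.merge(departments, employees, on="DeptID", how="left")[right.columns]

# Sort both the same way so the comparison ignores row order
key_cols = ["DeptID", "EmployeeID"]
a = right.sort_values(key_cols).reset_index(drop=True)
b = swapped.sort_values(key_cols).reset_index(drop=True)
print(a.equals(b))  # True
```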


## 2.3) Inner join

- joins two DataFrames on a common key and returns a new DataFrame containing only the rows whose key appears in both DataFrames


In [2]:
import pandas as pd

# create dataframes from the dictionaries
data1 = {
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
}
employees = pd.DataFrame(data1)

data2 = {
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
}
departments = pd.DataFrame(data2)

# inner merge the dataframes
df_merge = pd.merge(employees, departments, on="DeptID", how="inner", sort=True)

print(df_merge)

  EmployeeID         Name DeptID DeptName
0       E001     John Doe   D001    Sales
1       E003  Peter Brown   D001    Sales
2       E004  Tom Johnson   D002       HR
3       E002   Jane Smith   D003    Admin
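
A sketch relating the inner join to set intersection on the same data. The row-count check relies on `DeptID` being unique in `departments`; with duplicate keys on the right, each employee row would be repeated once per match.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
})
departments = pd.DataFrame({
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
})

df_merge = pd.merge(employees, departments, on="DeptID", how="inner")

# Keys that occur in both frames
common = set(employees["DeptID"]) & set(departments["DeptID"])
print(sorted(common))  # ['D001', 'D002', 'D003']

# With unique keys on the right, the inner join has one row per employee
# whose DeptID also occurs in departments
matching = employees[employees["DeptID"].isin(departments["DeptID"])]
print(set(df_merge["DeptID"]) == common, len(df_merge) == len(matching))  # True True
```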


## 2.4) Outer join

- unlike an inner join, returns all rows from both DataFrames, matched where possible
- missing values are filled with **NaN**


In [3]:
import pandas as pd

# create dataframes from the dictionaries
data1 = {
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
}
employees = pd.DataFrame(data1)

data2 = {
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
}
departments = pd.DataFrame(data2)

# outer merge the dataframes
df_merge = pd.merge(employees, departments, on="DeptID", how="outer", sort=True)

print(df_merge)

  EmployeeID         Name DeptID   DeptName
0       E001     John Doe   D001      Sales
1       E003  Peter Brown   D001      Sales
2       E004  Tom Johnson   D002         HR
3       E002   Jane Smith   D003      Admin
4        NaN          NaN   D004  Marketing
5       E005   Rita Patel   D006        NaN
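
In an outer join it is not obvious from the NaNs alone which side a row came from. A sketch using the real `indicator=True` option of `pd.merge`, which adds a categorical `_merge` column with the values `both`, `left_only` and `right_only`:

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
})
departments = pd.DataFrame({
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
})

df_merge = pd.merge(
    employees, departments, on="DeptID", how="outer", sort=True, indicator=True
)

# "_merge" records the origin of each row
print(df_merge[["EmployeeID", "DeptID", "DeptName", "_merge"]])

# Rows present on only one side are the ones padded with NaN
one_sided = df_merge[df_merge["_merge"] != "both"]
print(one_sided["DeptID"].tolist())  # ['D004', 'D006']
```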


## 2.5) Cross join

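- creates the Cartesian product of both DataFrames (every left row paired with every right row) while preserving the order of the left DataFrame; no join key is used
- columns that exist in both DataFrames (here `DeptID`) get the default suffixes `_x` and `_y`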


In [4]:
import pandas as pd

# create dataframes from the dictionaries
data1 = {
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "Name": ["John Doe", "Jane Smith", "Peter Brown", "Tom Johnson", "Rita Patel"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
}
employees = pd.DataFrame(data1)

data2 = {
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
}
departments = pd.DataFrame(data2)

# cross merge: every employee paired with every department (no key)
df_merge = pd.merge(employees, departments, how="cross")

print(df_merge)

   EmployeeID         Name DeptID_x DeptID_y   DeptName
0        E001     John Doe     D001     D001      Sales
1        E001     John Doe     D001     D002         HR
2        E001     John Doe     D001     D003      Admin
3        E001     John Doe     D001     D004  Marketing
4        E002   Jane Smith     D003     D001      Sales
5        E002   Jane Smith     D003     D002         HR
6        E002   Jane Smith     D003     D003      Admin
7        E002   Jane Smith     D003     D004  Marketing
8        E003  Peter Brown     D001     D001      Sales
9        E003  Peter Brown     D001     D002         HR
10       E003  Peter Brown     D001     D003      Admin
11       E003  Peter Brown     D001     D004  Marketing
12       E004  Tom Johnson     D002     D001      Sales
13       E004  Tom Johnson     D002     D002         HR
14       E004  Tom Johnson     D002     D003      Admin
15       E004  Tom Johnson     D002     D004  Marketing
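16       E005   Rita Patel     D006     D001      Sales
17       E005   Rita Patel     D006     D002         HR
18       E005   Rita Patel     D006     D003      Admin
19       E005   Rita Patel     D006     D004  Marketing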
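
A sketch, on a reduced copy of the same data, showing two details of the cross join: the `suffixes` parameter of `pd.merge` controls how the overlapping `DeptID` columns are renamed (`_emp`/`_dept` are arbitrary illustrative choices), and the rows follow the same order as `itertools.product` over both frames, left frame first.

```python
import itertools

import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": ["E001", "E002", "E003", "E004", "E005"],
    "DeptID": ["D001", "D003", "D001", "D002", "D006"],
})
departments = pd.DataFrame({
    "DeptID": ["D001", "D002", "D003", "D004"],
    "DeptName": ["Sales", "HR", "Admin", "Marketing"],
})

# suffixes= replaces the default "_x"/"_y" for columns present in both frames
df_cross = pd.merge(employees, departments, how="cross", suffixes=("_emp", "_dept"))
print(df_cross.columns.tolist())
# ['EmployeeID', 'DeptID_emp', 'DeptID_dept', 'DeptName']

# Row order matches itertools.product: each left row with all right rows in turn
expected = list(itertools.product(employees["EmployeeID"], departments["DeptName"]))
actual = list(zip(df_cross["EmployeeID"], df_cross["DeptName"]))
print(actual == expected)  # True
print(len(df_cross) == len(employees) * len(departments))  # True
```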

# 3) Join vs Merge vs Concat

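- **join():** combines two DataFrames based on their indexes (the caller can also use a column via `on=`, matched against the other DataFrame's index); the default is a left join
- **merge():** combines two DataFrames based on columns (indexes via `left_index`/`right_index`); the default is an inner join
- **concat():** stacks DataFrames along the vertical (`axis=0`, default) or horizontal (`axis=1`) axis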
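
A side-by-side sketch of the three functions on two small hypothetical frames, `names` and `salaries`, both indexed by employee ID (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical frames indexed by employee ID; only E002 appears in both
names = pd.DataFrame({"Name": ["John Doe", "Jane Smith"]}, index=["E001", "E002"])
salaries = pd.DataFrame({"Salary": [50000, 60000]}, index=["E002", "E003"])

# join(): aligns on the index, left join by default -> keeps E001 and E002
joined = names.join(salaries)

# merge(): column-based and inner by default; index keys must be requested explicitly
merged = pd.merge(names, salaries, left_index=True, right_index=True)

# concat(): axis=0 appends rows, axis=1 places frames side by side (outer on the index)
stacked = pd.concat([names, names])
side_by_side = pd.concat([names, salaries], axis=1)

print(joined.shape, merged.shape, stacked.shape, side_by_side.shape)
# (2, 2) (1, 2) (4, 1) (3, 2)
```

Note that `concat()` does no key matching on values: with `axis=0` it keeps the duplicated index labels as they are, and with `axis=1` it only aligns rows by index label.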
