# 1) Pandas concatenation

- appends one DataFrame to another along an axis (vertical or horizontal)
- for vertical concatenation it is similar to the **SQL UNION ALL** operation
- syntax
  ```
  pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
  ```
  - **objs:** sequence of Series or DataFrame objects
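  - **axis (optional):** the axis to concatenate along: `0`/`'index'` (stack rows, default) or `1`/`'columns'` (place side by side)
  - **join (optional):** how to combine the labels on the other axis: `'outer'` (union, default) or `'inner'` (intersection)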
  - **ignore_index (optional):** if True, it will not use the index values on the concatenation axis and will result in a default integer index
  - **keys (optional):** used to construct hierarchical index using the passed keys as the outermost level
  - **verify_integrity (optional):** If True, it checks whether the new concatenated axis contains duplicates and raises ValueError if duplicates are found
  - **sort (optional):** sorts the non-concatenation axis if it is not already aligned
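A minimal sketch of `verify_integrity`, assuming two small hypothetical DataFrames whose row labels overlap. By default `concat()` keeps duplicate index labels; with `verify_integrity=True` the same call raises `ValueError` instead:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A2", "A3"]}, index=[1, 2])  # label 1 also appears in df1

# default: duplicate index labels are silently kept
combined = pd.concat([df1, df2])
print(combined.index.tolist())  # [0, 1, 1, 2]

# verify_integrity=True turns the overlap into an error
dup_error = None
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    dup_error = err
    print("ValueError:", err)
```

This is useful when the index is supposed to be a unique key, because a repeated label makes `combined.loc[1]` return two rows instead of one.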


In [1]:
import pandas as pd

# create dataframes
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])

df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]}, index=[2, 3])

# concatenate two dataframes
result = pd.concat([df1, df2])

print(result)

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
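To illustrate the **UNION ALL** analogy, here is a small sketch with hypothetical data: `concat()` keeps every row, including duplicates, and calling `drop_duplicates()` afterwards gives the behaviour of a plain SQL `UNION`. One difference worth keeping in mind: SQL matches columns by position, while `concat()` aligns them by column label.

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2], "city": ["Paris", "Rome"]})
b = pd.DataFrame({"id": [2, 3], "city": ["Rome", "Oslo"]})

# like SQL UNION ALL: every row is kept, including the duplicate (2, "Rome")
union_all = pd.concat([a, b], ignore_index=True)

# like SQL UNION: duplicate rows are removed afterwards
union = union_all.drop_duplicates(ignore_index=True)

print(len(union_all), len(union))  # 4 3
```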


## 1.1) concat() with arguments

- example using **ignore_index** and **sort**


In [2]:
import pandas as pd

# create dataframes
df1 = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob"],
        "Age": [25, 30, 35],
        "City": ["New York", "Paris", "London"],
    }
)

df2 = pd.DataFrame(
    {
        "Name": ["Emily", "Michael", "Sophia", "Rita"],
        "Age": [28, 32, 27, 22],
        "City": ["Berlin", "Tokyo", "Sydney", "Delhi"],
    }
)

# concatenate dataframes while ignoring index
result_ignore_index = pd.concat([df1, df2], ignore_index=True)

# concatenate dataframes and sort the column labels (the non-concatenation axis)
result_sort = pd.concat([df1, df2], sort=True)

# display the concatenated results
print("ignore_index = True\n", result_ignore_index)
print("\nsort = True\n", result_sort)

ignore_index = True
       Name  Age      City
0     John   25  New York
1    Alice   30     Paris
2      Bob   35    London
3    Emily   28    Berlin
4  Michael   32     Tokyo
5   Sophia   27    Sydney
6     Rita   22     Delhi

sort = True
    Age      City     Name
0   25  New York     John
1   30     Paris    Alice
2   35    London      Bob
0   28    Berlin    Emily
1   32     Tokyo  Michael
2   27    Sydney   Sophia
3   22     Delhi     Rita
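In the example above both DataFrames have the same columns, so `sort=True` only reorders them alphabetically. The parameter matters more when the columns differ. A minimal sketch with hypothetical single-row DataFrames:

```python
import pandas as pd

left = pd.DataFrame({"B": [1], "A": [2]})
right = pd.DataFrame({"A": [3], "C": [4]})

# sort=False (default): columns appear in order of first appearance
unsorted = pd.concat([left, right], ignore_index=True)

# sort=True: the union of column labels is sorted
sorted_cols = pd.concat([left, right], ignore_index=True, sort=True)

print(unsorted.columns.tolist())     # ['B', 'A', 'C']
print(sorted_cols.columns.tolist())  # ['A', 'B', 'C']
```

In both results the cells without a value in the source DataFrame are filled with NaN.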


## 1.2) concat() along axis 1 (horizontal axis)

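- joins DataFrames side by side (column-wise); rows are aligned by their index labels and missing positions are filled with NaN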


In [3]:
import pandas as pd

# create dataframes
df1 = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob"],
        "Age": [25, 30, 35],
        "City": ["New York", "Paris", "London"],
    }
)

df2 = pd.DataFrame(
    {
        "Name": ["Emily", "Michael", "Sophia", "Rita"],
        "Age": [28, 32, 27, 22],
        "City": ["Berlin", "Tokyo", "Sydney", "Delhi"],
    }
)

# concatenate dataframes along axis 1
result = pd.concat([df1, df2], axis=1)

print(result)

    Name   Age      City     Name  Age    City
0   John  25.0  New York    Emily   28  Berlin
1  Alice  30.0     Paris  Michael   32   Tokyo
2    Bob  35.0    London   Sophia   27  Sydney
3    NaN   NaN       NaN     Rita   22   Delhi
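The result above contains duplicate column names (`Name`, `Age`, `City` twice), so selecting `result["Name"]` returns a two-column DataFrame rather than a Series. A sketch of one way to avoid this, assuming renaming the second DataFrame's columns with `add_suffix()` is acceptable (the `_df2` suffix is just an illustrative choice):

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["John", "Alice", "Bob"], "Age": [25, 30, 35]})
df2 = pd.DataFrame({"Name": ["Emily", "Michael", "Sophia", "Rita"], "Age": [28, 32, 27, 22]})

side_by_side = pd.concat([df1, df2], axis=1)
# duplicate labels: selecting "Name" returns both columns as a DataFrame
names = side_by_side["Name"]
print(type(names).__name__, names.shape)  # DataFrame (4, 2)

# rename the second frame's columns so every label is unique
renamed = pd.concat([df1, df2.add_suffix("_df2")], axis=1)
print(renamed.columns.tolist())  # ['Name', 'Age', 'Name_df2', 'Age_df2']

# row 3 exists only in df2, so df1's Age gets NaN and becomes float64
print(renamed["Age"].dtype)  # float64
```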


## 1.3) Inner vs Outer join

- **inner join:** keeps only the labels on the non-concatenation axis that appear in every input (intersection); with `axis=1` these are the row index labels shared by all DataFrames. **Not the default.**
- **outer join:** keeps all labels (union) and fills missing values with NaN. **The default.**


In [4]:
import pandas as pd

# create dataframes
df1 = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob"],
        "Age": [25, 30, 35],
        "City": ["New York", "Paris", "London"],
    }
)

df2 = pd.DataFrame(
    {
        "Name": ["Emily", "Michael", "Sophia", "Rita"],
        "Age": [28, 32, 27, 22],
        "City": ["Berlin", "Tokyo", "Sydney", "Delhi"],
    }
)


# concatenate dataframes with outer join
result_outer = pd.concat([df1, df2], axis=1)

# concatenate dataframes with inner join
result_inner = pd.concat([df1, df2], axis=1, join="inner")

# display the concatenated results
print("Outer Join\n", result_outer)
print("\nInner Join\n", result_inner)

Outer Join
     Name   Age      City     Name  Age    City
0   John  25.0  New York    Emily   28  Berlin
1  Alice  30.0     Paris  Michael   32   Tokyo
2    Bob  35.0    London   Sophia   27  Sydney
3    NaN   NaN       NaN     Rita   22   Delhi

Inner Join
     Name  Age      City     Name  Age    City
0   John   25  New York    Emily   28  Berlin
1  Alice   30     Paris  Michael   32   Tokyo
2    Bob   35    London   Sophia   27  Sydney
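The example above uses `axis=1`, so the join acts on the row index. With the default `axis=0` the join acts on the columns instead. A minimal sketch, assuming two hypothetical DataFrames that share only the `Name` column:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["John", "Alice"], "Age": [25, 30]})
places = pd.DataFrame({"Name": ["Emily", "Michael"], "City": ["Berlin", "Tokyo"]})

# outer (default): union of columns, missing cells become NaN
outer = pd.concat([people, places], ignore_index=True)

# inner: only the columns present in every DataFrame survive
inner = pd.concat([people, places], ignore_index=True, join="inner")

print(outer.columns.tolist())  # ['Name', 'Age', 'City']
print(inner.columns.tolist())  # ['Name']
```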


## 1.4) concat() with keys

- the **keys** parameter is used to add extra information to the result, typically which input each row came from
- the result gets a hierarchical index (MultiIndex) with the keys as the outermost level


In [5]:
import pandas as pd

# create dataframes
df1 = pd.DataFrame(
    {
        "Name": ["John", "Alice", "Bob"],
        "Age": [25, 30, 35],
        "City": ["New York", "Paris", "London"],
    }
)

df2 = pd.DataFrame(
    {
        "Name": ["Emily", "Michael", "Sophia", "Rita"],
        "Age": [28, 32, 27, 22],
        "City": ["Berlin", "Tokyo", "Sydney", "Delhi"],
    }
)


# concatenate dataframes, labelling each block with a key
result = pd.concat([df1, df2], keys=["from_df1", "from_df2"])

print(result)

               Name  Age      City
from_df1 0     John   25  New York
         1    Alice   30     Paris
         2      Bob   35    London
from_df2 0    Emily   28    Berlin
         1  Michael   32     Tokyo
         2   Sophia   27    Sydney
         3     Rita   22     Delhi
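Because the keys become the outer index level, a block can be selected back with `.loc`, and the optional `names` parameter gives the index levels readable names. A minimal sketch; the level names `source` and `row` are illustrative choices:

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["John", "Alice", "Bob"]})
df2 = pd.DataFrame({"Name": ["Emily", "Michael", "Sophia", "Rita"]})

result = pd.concat(
    [df1, df2],
    keys=["from_df1", "from_df2"],
    names=["source", "row"],  # names for the outer and inner index levels
)

print(list(result.index.names))  # ['source', 'row']

# selecting an outer key returns that block with its original index
print(result.loc["from_df2"])
```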
