# 1) Pandas reshape

- converting a DataFrame from one layout to another to make data visualization and analysis easier
- several methods:
  - **pivot()**
  - **pivot_table()**
  - **stack()**
  - **unstack()**
  - **melt()**


## 1.1) pivot()

- reshapes data based on column values
- takes the unique values of one column as the new row index and of another as the new column labels, arranging the remaining values into a 2D table


In [2]:
import pandas as pd

# create a DataFrame
data = {
    "Date": ["2023-08-01", "2023-08-01", "2023-08-02", "2023-08-02"],
    "Category": ["A", "B", "A", "B"],
    "Value": [10, 20, 30, 40],
}
df = pd.DataFrame(data)

print("Original Dataframe:\n", df)

# pivot the DataFrame
pivot_df = df.pivot(index="Date", columns="Category", values="Value")
print("\nReshaped DataFrame:\n", pivot_df)

Original Dataframe:
          Date Category  Value
0  2023-08-01        A     10
1  2023-08-01        B     20
2  2023-08-02        A     30
3  2023-08-02        B     40

Reshaped DataFrame:
 Category     A   B
Date              
2023-08-01  10  20
2023-08-02  30  40
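
One limitation worth illustrating: pivot() only reshapes and does not aggregate, so each (index, columns) pair may occur at most once. The sketch below uses a hypothetical `dup_df` (not part of the original example) in which the pair ("2023-08-01", "A") appears twice, and shows that pivot() raises a `ValueError` in that case.

```python
import pandas as pd

# Hypothetical data: the pair ("2023-08-01", "A") appears twice.
dup_df = pd.DataFrame(
    {
        "Date": ["2023-08-01", "2023-08-01", "2023-08-02"],
        "Category": ["A", "A", "B"],
        "Value": [10, 20, 30],
    }
)

try:
    dup_df.pivot(index="Date", columns="Category", values="Value")
    pivot_failed = False
except ValueError as err:
    # pivot() cannot decide which of the two values belongs in the cell
    pivot_failed = True
    print("pivot() failed:", err)
```

When duplicates like these are expected, pivot_table() is the usual alternative, because it combines them with an aggregation function.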


## 1.2) pivot_table()

- builds a pivot table that aggregates and summarizes data according to the given index, columns, and aggregation functions


In [None]:
import pandas as pd

# create a DataFrame
data = {"Category": ["A", "B", "A", "B", "A", "B"], "Value": [10, 20, 30, 40, 50, 60]}
df = pd.DataFrame(data)
print("Original Dataframe:\n", df)

# create a pivot table
pivot_table_df = df.pivot_table(index="Category", values="Value", aggfunc="mean")
print("\nReshaped Dataframe:\n", pivot_table_df)

Original Dataframe:
   Category  Value
0        A     10
1        B     20
2        A     30
3        B     40
4        A     50
5        B     60

Reshaped Dataframe:
           Value
Category       
A          30.0
B          40.0
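
The example above uses only `index` and `values`. As a minimal sketch of the `columns` argument mentioned in the description, the block below builds a two-dimensional summary from a hypothetical `sales` DataFrame (an assumption for illustration). Duplicate (Date, Category) pairs are summed, and `fill_value=0` fills combinations that do not occur in the data.

```python
import pandas as pd

# Hypothetical data: ("2023-08-01", "A") occurs twice,
# and ("2023-08-02", "B") does not occur at all.
sales = pd.DataFrame(
    {
        "Date": ["2023-08-01", "2023-08-01", "2023-08-01", "2023-08-02"],
        "Category": ["A", "A", "B", "A"],
        "Value": [10, 20, 30, 40],
    }
)

# rows = Date, columns = Category, cells = sum of Value
summary = sales.pivot_table(
    index="Date",
    columns="Category",
    values="Value",
    aggfunc="sum",
    fill_value=0,  # used where a Date/Category combination is missing
)
print(summary)
```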


## 1.3) stack() and unstack()

- **stack()** - pivots a level of the column labels into the innermost level of the row index
  - in effect, column labels become row labels
- **unstack()** - pivots a level of the row index (the innermost one by default) into the innermost level of the column labels
  - in effect, row labels become column labels


In [None]:
import pandas as pd

# create a DataFrame
data = {
    "Date": ["2023-08-01", "2023-08-02"],
    "Category_A": [10, 20],
    "Category_B": [30, 40],
}
df = pd.DataFrame(data)
print("Original df:\n", df)
print()

# set 'Date' column as the index
df = df.set_index("Date")

# stack the columns into rows
stacked_df = df.stack()
print("Stack:\n", stacked_df)
print()

# unstack the rows back to columns
unstacked_df = stacked_df.unstack()
print("Unstack: \n", unstacked_df)


Original df:
          Date  Category_A  Category_B
0  2023-08-01          10          30
1  2023-08-02          20          40

Stack:
 Date                  
2023-08-01  Category_A    10
            Category_B    30
2023-08-02  Category_A    20
            Category_B    40
dtype: int64

Unstack: 
             Category_A  Category_B
Date                              
2023-08-01          10          30
2023-08-02          20          40
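
By default unstack() moves the innermost index level, which is why the round trip above restores the original table. The sketch below rebuilds the same data and passes `level=0` to move the `Date` level instead; the names `stacked`, `restored`, and `by_date` are illustrative only.

```python
import pandas as pd

# Same values as the example above, with Date already set as the index.
df = pd.DataFrame(
    {"Category_A": [10, 20], "Category_B": [30, 40]},
    index=pd.Index(["2023-08-01", "2023-08-02"], name="Date"),
)

# Series with a two-level index: (Date, column label)
stacked = df.stack()

# default: the innermost level (the column labels) goes back to the columns
restored = stacked.unstack()

# level=0: the Date level goes to the columns instead
by_date = stacked.unstack(level=0)

print(restored)
print(by_date)
```

With `level=0` the result has the categories as rows and the dates as columns, that is, the original table transposed.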


## 1.4) melt()

- transforms a DataFrame from wide format to long format
  - id_vars - the column (or columns) to keep unchanged as identifiers
  - var_name - name of the new column that will hold the variable names (in the example, 'Math' and 'History')
  - value_name - name of the new column that will hold the values


In [None]:
import pandas as pd

# create a sample DataFrame
data = {"Name": ["Alice", "Bob"], "Math": [90, 85], "History": [75, 92]}
df = pd.DataFrame(data)
print("Original df:\n", df)
print()

# melt the DataFrame
melted_df = pd.melt(df, id_vars="Name", var_name="Subject", value_name="Score")

print(melted_df)

Original df:
     Name  Math  History
0  Alice    90       75
1    Bob    85       92

    Name  Subject  Score
0  Alice     Math     90
1    Bob     Math     85
2  Alice  History     75
3    Bob  History     92
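
Two related points can be sketched briefly: `value_vars` limits which columns are melted, and pivot() can reverse a melt. The extra `Art` column below is a hypothetical addition to the example data, included only to show a column being left out.

```python
import pandas as pd

# Example data plus a hypothetical "Art" column that should not be melted.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob"],
        "Math": [90, 85],
        "History": [75, 92],
        "Art": [60, 70],
    }
)

# value_vars restricts the melt to the listed columns; "Art" is dropped
long_df = df.melt(
    id_vars="Name",
    value_vars=["Math", "History"],
    var_name="Subject",
    value_name="Score",
)
print(long_df)

# pivot() turns the long format back into one column per subject
wide_df = long_df.pivot(index="Name", columns="Subject", values="Score")
print(wide_df)
```

If `value_vars` is omitted, melt() uses every column that is not listed in `id_vars`.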
