# 1) Pandas group by

- lets you group data based on specific columns
- in other words, a DataFrame can be split into several smaller groups based on the values in those columns
- various functions can then be applied to each of these groups
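
The split step described above can be made visible by iterating over the groupby object, which yields one `(key, sub-DataFrame)` pair per group. This is a minimal sketch on made-up data; the column names and values are for illustration only.

```python
import pandas as pd

# hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {
        "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
        "Sales": [1000, 500, 800, 300],
    }
)

# split: iterating over a groupby object yields (key, sub-DataFrame) pairs
groups = {key: group for key, group in df.groupby("Category")}

for key, group in groups.items():
    print(key)
    print(group)
    print()

# apply + combine: run a function on each group and collect the results
totals = {key: int(group["Sales"].sum()) for key, group in groups.items()}
print(totals)
```

In practice the explicit loop is rarely needed, because calls such as `.sum()` on the groupby object perform the apply and combine steps in one go.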


## 1.1) Group by a single column

- **groupby()** with a single column name groups the rows by the unique values in that column


In [None]:
import pandas as pd

# create a dictionary containing the data
data = {
    "Category": ["Electronics", "Clothing", "Electronics", "Clothing"],
    "Sales": [1000, 500, 800, 300],
}

# create a DataFrame using the data dictionary
df = pd.DataFrame(data)
print("Original:\n", df)
print()

# group the DataFrame by the Category column and
# calculate the sum of Sales for each category
grouped = df.groupby("Category")["Sales"].sum()

# print the grouped data
print(grouped)

# explanation
# df.groupby('Category') - groups the df DataFrame by the unique values in the Category column.
# ['Sales'] - specifies that we are interested in the Sales column within each group.
# .sum() - calculates the sum of the Sales values for each group.

Original:
       Category  Sales
0  Electronics   1000
1     Clothing    500
2  Electronics    800
3     Clothing    300

Category
Clothing        800
Electronics    1800
Name: Sales, dtype: int64


## 1.2) Group by multiple columns


In [6]:
import pandas as pd

# create a DataFrame with student data
data = {
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
    "Grade": ["A", "B", "A", "A", "B"],
    "Score": [90, 85, 92, 88, 78],
}

df = pd.DataFrame(data)
print("Original:\n", df)
print()

# define the aggregate functions to be applied to the Score column
agg_functions = {
    # calculate both mean and maximum of the Score column
    "Score": ["mean", "max"]
}

# group the DataFrame by Gender and Grade, then apply the aggregate functions
grouped = df.groupby(["Gender", "Grade"]).aggregate(agg_functions)

# print the resulting grouped DataFrame
print(grouped)

Original:
    Gender Grade  Score
0    Male     A     90
1  Female     B     85
2    Male     A     92
3  Female     A     88
4    Male     B     78

             Score    
              mean max
Gender Grade          
Female A      88.0  88
       B      85.0  85
Male   A      91.0  92
       B      78.0  78
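
The result above has two-level column labels (`Score` on top, `mean` and `max` below). If flat column names are more convenient, named aggregation is one option. This is a minimal sketch reusing the same student data; the output column names `mean_score` and `max_score` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Male", "Female", "Male"],
        "Grade": ["A", "B", "A", "A", "B"],
        "Score": [90, 85, 92, 88, 78],
    }
)

# named aggregation: keyword = (source column, aggregate function)
flat = df.groupby(["Gender", "Grade"]).agg(
    mean_score=("Score", "mean"),
    max_score=("Score", "max"),
)
print(flat)

# reset_index() turns the group keys back into ordinary columns
print(flat.reset_index())
```

The values are the same as with the dictionary form; only the column labels differ.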


## 1.3) Group by categorical data


In [10]:
import pandas as pd

# sample data
data = {
    "Category": ["A", "B", "A", "B", "A", "B"],
    "Sales": [100, 150, 200, 50, 300, 120],
}

df = pd.DataFrame(data)
print("Original:\n", df)
print()

# convert Category column to categorical type
df["Category"] = pd.Categorical(df["Category"])
print("Categorical:\n", df)
print()

# group by Category and calculate the total sales
grouped = df.groupby("Category", observed=True)["Sales"].sum()

print(grouped)

Original:
   Category  Sales
0        A    100
1        B    150
2        A    200
3        B     50
4        A    300
5        B    120

Categorical:
   Category  Sales
0        A    100
1        B    150
2        A    200
3        B     50
4        A    300
5        B    120

Category
A    600
B    320
Name: Sales, dtype: int64
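
The `observed=True` argument only makes a visible difference when the categorical column declares categories that never occur in the data. This is a minimal sketch of that case, assuming a hypothetical extra category `"C"` that has no rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Category": ["A", "B", "A", "B", "A", "B"],
        "Sales": [100, 150, 200, 50, 300, 120],
    }
)

# declare a category "C" that has no rows (assumption for illustration)
df["Category"] = pd.Categorical(df["Category"], categories=["A", "B", "C"])

# observed=True: only categories that actually appear in the data
seen = df.groupby("Category", observed=True)["Sales"].sum()
print(seen)

# observed=False: every declared category gets a row, empty ones sum to 0
all_cats = df.groupby("Category", observed=False)["Sales"].sum()
print(all_cats)
```

Passing `observed` explicitly, as the cell above does, keeps the behaviour independent of the pandas version's default.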
