# 1) Pandas categorical

- Categorical data represents categories or labels rather than numeric values.


## 1.1) Create categorical data type in pandas

- **Categorical()** constructor


In [1]:
import pandas as pd

data = ["red", "blue", "green", "red", "blue"]

# create a categorical column
categorical_data = pd.Categorical(data)

print(categorical_data)

['red', 'blue', 'green', 'red', 'blue']
Categories (3, object): ['blue', 'green', 'red']
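
By default the categories are inferred from the data and sorted. As a minimal sketch, the optional `categories` parameter of `pd.Categorical()` fixes the category set and its order explicitly; the value `"purple"` is an illustrative addition, not part of the original data, and shows that values outside the declared categories become missing (code `-1`).

```python
import pandas as pd

# "purple" is a made-up value that is NOT among the declared categories
data = ["red", "blue", "purple"]

# declare the category set and its order explicitly
colors = pd.Categorical(data, categories=["red", "green", "blue"])

print(colors.categories.tolist())  # ['red', 'green', 'blue']
print(colors.codes.tolist())       # [0, 2, -1]  (-1 marks a missing value)
print(colors.isna().tolist())      # [False, False, True]
```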


## 1.2) Convert series to categorical series


### Using the astype() function


In [2]:
import pandas as pd

# create a regular Series
data = ["red", "blue", "green", "red", "blue"]
series1 = pd.Series(data)

# convert the Series to a categorical Series using .astype()
categorical_s = series1.astype("category")

print(categorical_s)

0      red
1     blue
2    green
3      red
4     blue
dtype: category
Categories (3, object): ['blue', 'green', 'red']
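
`astype("category")` infers the categories from the data. If a specific category set or order is needed, one option is to pass a `CategoricalDtype` instead of the string `"category"`. The sketch below assumes the category order `red, green, blue`, chosen only for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

series1 = pd.Series(["red", "blue", "green", "red", "blue"])

# assumed category order, chosen for illustration only
color_dtype = CategoricalDtype(categories=["red", "green", "blue"], ordered=False)

# convert using the explicit dtype instead of the string "category"
categorical_s = series1.astype(color_dtype)

print(categorical_s.cat.categories.tolist())  # ['red', 'green', 'blue']
print(categorical_s.cat.codes.tolist())       # [0, 2, 1, 0, 2]
```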


### Using the dtype parameter inside Series()


In [3]:
import pandas as pd

# create a categorical Series
data = ["A", "B", "A", "C", "B"]
cat_series = pd.Series(data, dtype="category")

print(cat_series)

0    A
1    B
2    A
3    C
4    B
dtype: category
Categories (3, object): ['A', 'B', 'C']


## 1.3) Access categories and codes in pandas

- **cat** - accessor for the categories and codes of a categorical Series


In [None]:
import pandas as pd

# create a categorical Series
data = ["A", "B", "A", "C", "B"]
cat_series = pd.Series(data, dtype="category")

# using .cat accessor
print("Categories:")
print(cat_series.cat.categories)
print("Codes:")
print(cat_series.cat.codes)

# explanation: each code is the position of the element's category in cat.categories
# The element at index 0 of cat_series is A, which corresponds to category 0.
# The element at index 1 of cat_series is B, which corresponds to category 1.
# The element at index 2 of cat_series is A, which again corresponds to category 0.
# The element at index 3 of cat_series is C, which corresponds to category 2.
# The element at index 4 of cat_series is B, which again corresponds to category 1.

Categories:
Index(['A', 'B', 'C'], dtype='object')
Codes:
0    0
1    1
2    0
3    2
4    1
dtype: int8
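
To make the relationship between codes and categories concrete, the following sketch rebuilds the values from the two pieces with `pd.Categorical.from_codes()`. The `None` entry is an illustrative addition that shows how a missing value is encoded as `-1`.

```python
import pandas as pd

# None is added here only to illustrate how missing values are encoded
cat_series = pd.Series(["A", "B", None, "C", "B"], dtype="category")

categories = cat_series.cat.categories
codes = cat_series.cat.codes.to_numpy()

print(codes.tolist())  # [0, 1, -1, 2, 1]  (-1 stands for the missing value)

# rebuild the categorical values from codes + categories
rebuilt = pd.Categorical.from_codes(codes, categories=categories)
print(rebuilt)
```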


## 1.4) Rename categories

- **cat.rename_categories()** method


In [11]:
import pandas as pd

# create a categorical Series
data = ["A", "B", "A", "C", "B"]
cat_series = pd.Series(data, dtype="category")
print("Original:\n", cat_series)
print()

# create a dictionary for renaming categories
category_mapping = {"A": "Category A", "B": "Category B", "C": "Category C"}

# rename categories; .rename_categories() returns a new Series and leaves the original unchanged
cat_series_renamed = cat_series.cat.rename_categories(category_mapping)

print("Renamed:\n", cat_series_renamed)

Original:
 0    A
1    B
2    A
3    C
4    B
dtype: category
Categories (3, object): ['A', 'B', 'C']

Renamed:
 0    Category A
1    Category B
2    Category A
3    Category C
4    Category B
dtype: category
Categories (3, object): ['Category A', 'Category B', 'Category C']
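
Besides a dictionary, `rename_categories()` also accepts a list of new names (matched by position) or a callable applied to each category. A short sketch of both forms; the new names `x, y, z` are arbitrary placeholders.

```python
import pandas as pd

cat_series = pd.Series(["A", "B", "A", "C", "B"], dtype="category")

# callable: applied to every category name
lowered = cat_series.cat.rename_categories(lambda name: name.lower())

# list: new names matched by position to the existing categories
relabeled = cat_series.cat.rename_categories(["x", "y", "z"])

print(lowered.tolist())                    # ['a', 'b', 'a', 'c', 'b']
print(relabeled.tolist())                  # ['x', 'y', 'x', 'z', 'y']
print(cat_series.cat.categories.tolist())  # ['A', 'B', 'C']  (original unchanged)
```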


## 1.5) Add new categories

- **cat.add_categories()** method


In [12]:
import pandas as pd

# create a categorical Series
data = ["A", "B", "A", "C", "B"]
cat_series = pd.Series(data, dtype="category")

# add new categories and reassign the variable
new_categories = ["D", "E"]
cat_series = cat_series.cat.add_categories(new_categories)

print(cat_series)

0    A
1    B
2    A
3    C
4    B
dtype: category
Categories (5, object): ['A', 'B', 'C', 'D', 'E']
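
Added categories exist even while no element uses them. A small sketch of what this enables: a value can be assigned once its category has been added, and `value_counts()` reports unused categories with a count of zero.

```python
import pandas as pd

cat_series = pd.Series(["A", "B", "A", "C", "B"], dtype="category")
cat_series = cat_series.cat.add_categories(["D", "E"])

# "D" is now a valid category, so it can be assigned
cat_series.iloc[0] = "D"
print(cat_series.tolist())  # ['D', 'B', 'A', 'C', 'B']

# unused categories are still counted; 'E' appears with a count of 0
counts = cat_series.value_counts(sort=False)
print(counts)
```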


## 1.6) Remove categories

- **cat.remove_categories()** method


In [13]:
import pandas as pd

# create a categorical Series
data = ["A", "B", "A", "C", "B"]
cat_series = pd.Series(data, dtype="category")

# display the original categorical variable
print("Original Series:")
print(cat_series)

# remove specific categories
categories_to_remove = ["B", "C"]
cat_series_removed = cat_series.cat.remove_categories(categories_to_remove)

# display the modified categorical variable
print("\nModified Series:")
print(cat_series_removed)

Original Series:
0    A
1    B
2    A
3    C
4    B
dtype: category
Categories (3, object): ['A', 'B', 'C']

Modified Series:
0      A
1    NaN
2      A
3    NaN
4    NaN
dtype: category
Categories (1, object): ['A']
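
As the output shows, removing a category that is still in use turns the affected values into `NaN`. A related method, `cat.remove_unused_categories()`, drops only categories that no element uses. The sketch below assumes the rows are filtered first, so no values are lost.

```python
import pandas as pd

cat_series = pd.Series(["A", "B", "A", "C", "B"], dtype="category")

# filtering rows keeps the full category list
filtered = cat_series[cat_series != "C"]
print(filtered.cat.categories.tolist())  # ['A', 'B', 'C']

# drop categories that no remaining element uses
trimmed = filtered.cat.remove_unused_categories()
print(trimmed.cat.categories.tolist())   # ['A', 'B']
print(trimmed.tolist())                  # ['A', 'B', 'A', 'B']
```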


## 1.7) Check if categorical variable is ordered or not

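- **ordered** attribute of the **cat** accessor (on a `Categorical` object it is available directly as `.ordered`); `ordered` is also a parameter of `pd.Categorical()`
- **Important:**
  - An ordered categorical defines a logical sequence among its categories (for example `low < medium < high`). With `ordered=True`, comparisons such as `<` and `>` and the `min()`/`max()` operations follow that sequence instead of alphabetical order, which keeps sorting, statistics, and plots consistent with the intended meaning of the data.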


In [15]:
import pandas as pd

# create an ordered Categorical (a pd.Categorical object, not a Series)
data = ["low", "medium", "high", "low", "medium"]
ordered_cat_series = pd.Categorical(
    data, categories=["low", "medium", "high"], ordered=True
)

print("Ordered series:\n", ordered_cat_series)
print()

# check if the categorical variable is ordered
is_ordered = ordered_cat_series.ordered

print("Is ordered:", is_ordered)

Ordered series:
 ['low', 'medium', 'high', 'low', 'medium']
Categories (3, object): ['low' < 'medium' < 'high']

Is ordered: True
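
To illustrate what the `ordered` flag changes in practice, the sketch below compares an ordered and an unordered Series built from the same data. The variable names are illustrative; on a Series the flag is read through `.cat.ordered`.

```python
import pandas as pd

levels = ["low", "medium", "high"]
data = ["low", "medium", "high", "low", "medium"]

ordered_s = pd.Series(pd.Categorical(data, categories=levels, ordered=True))
unordered_s = pd.Series(pd.Categorical(data, categories=levels))

print(ordered_s.cat.ordered)             # True
print(ordered_s.min(), ordered_s.max())  # low high
print((ordered_s > "low").tolist())      # [False, True, True, False, True]

# min()/max() are not defined for an unordered categorical
try:
    unordered_s.min()
    unordered_min_failed = False
except TypeError:
    unordered_min_failed = True
print(unordered_min_failed)              # True

# as_ordered() returns an ordered copy that keeps the existing category order
print(unordered_s.cat.as_ordered().cat.ordered)  # True
```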
