# Working with Categorical Data in Python

## What Does it Mean to be "Categorical"?

A variable is usually considered categorical if it takes one of a finite number of distinct groups, or categories. This type of data is also known as qualitative data. In contrast, numerical (quantitative) data is expressed as measured or counted values, such as age or income.

### Example

In [None]:
import pandas as pd

# Example of categorical data
data = {'Category': ['Red', 'Blue', 'Green', 'Red', 'Blue']}
df = pd.DataFrame(data)
print(df['Category'].value_counts())
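
pandas also has a dedicated `category` dtype for columns like this one. The sketch below rebuilds the same toy column and converts it with `astype('category')`; it is a minimal illustration, not a required step.

```python
import pandas as pd

# Same toy data as the example above
df = pd.DataFrame({'Category': ['Red', 'Blue', 'Green', 'Red', 'Blue']})

# Dtype pandas picks for plain strings (varies by pandas version)
print(df['Category'].dtype)

# Convert the column to the categorical dtype
df['Category'] = df['Category'].astype('category')
print(df['Category'].dtype)

# The distinct categories are stored once, in sorted order
print(list(df['Category'].cat.categories))
```

After conversion, each row stores a small integer code pointing at one of the categories, which can save memory when a column has few distinct values.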

## Ordinal vs. Nominal Variables

Categorical data can be further broken down into two types: **ordinal** (with a natural order) and **nominal** (without a natural order).

### Example

In [None]:
# Ordinal example
ordinal_data = {'Rating': ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']}
ordinal_df = pd.DataFrame(ordinal_data)

# Nominal example
nominal_data = {'Color': ['Red', 'Green', 'Blue']}
nominal_df = pd.DataFrame(nominal_data)
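
Plain strings carry no ordering information, so pandas cannot tell that 'Poor' ranks below 'Good'. One way to record the order is an ordered `pd.CategoricalDtype`, sketched here; the variable names (`levels`, `rating_dtype`, `ratings`) and the sample ratings are made up for illustration.

```python
import pandas as pd

# Rating levels listed from lowest to highest
levels = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']
rating_dtype = pd.CategoricalDtype(categories=levels, ordered=True)

# Hypothetical ratings stored with the ordered dtype
ratings = pd.Series(['Good', 'Poor', 'Excellent', 'Fair'], dtype=rating_dtype)

# Ordering-aware operations now follow the declared order, not the alphabet
print(ratings.min(), ratings.max())
print(ratings.sort_values().tolist())

# Comparisons against a level are allowed for ordered categoricals
above_fair = ratings > 'Fair'
print(above_fair.tolist())
```

A nominal variable such as color would use the same dtype with `ordered=False` (the default), in which case `min()`, `max()`, and `>` comparisons are not available.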

## Our First Dataset

We'll explore the adult census income dataset to learn about categorical variables. This dataset contains information on US adults and whether they make over $50,000 annually.

### Example

In [None]:
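# Load the dataset
# The data is available from https://www.kaggle.com/uciml/adult-census-income.
# That URL is a web page, not a CSV file, so download the file first and
# point read_csv at your local copy.
census_data = pd.read_csv('path_to_your_dataset.csv')

# Check the dataset's info
# info() prints its summary directly and returns None, so no print() is needed
census_data.info()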
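
Once a dataset is loaded, a quick way to spot candidate categorical columns is to look for non-numeric columns with few distinct values. The sketch below uses a small made-up frame (`sample`) in place of the census file, so the rows are illustrative only.

```python
import pandas as pd

# Made-up rows standing in for the census file (not real data)
sample = pd.DataFrame({
    'Age': [39, 50, 38, 53],
    'Marital Status': ['Never-married', 'Married-civ-spouse',
                       'Divorced', 'Married-civ-spouse'],
    'Occupation': ['Adm-clerical', 'Exec-managerial',
                   'Handlers-cleaners', 'Handlers-cleaners'],
})

# Non-numeric columns are the usual candidates for categorical treatment
candidate_cols = sample.select_dtypes(exclude='number').columns.tolist()
print(candidate_cols)

# Number of distinct values per candidate column
print(sample[candidate_cols].nunique())
```

Numeric columns can be categorical too (for example, coded survey answers), so treat this as a first pass rather than a rule.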

## Using Describe

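You can use the `describe()` method to summarize a categorical variable like marital status. For a text column it reports the number of non-missing values (`count`), the number of distinct values (`unique`), the most common value (`top`), and how often that value occurs (`freq`).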

### Example

In [None]:
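# Describe marital status
# Column names vary between copies of this dataset; check census_data.columns
# if this line raises a KeyError
print(census_data['Marital Status'].describe())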
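
To see what `describe()` returns without the census file at hand, here is a small self-contained sketch. The `marital` Series holds made-up values, not rows from the dataset.

```python
import pandas as pd

# Made-up values for illustration
marital = pd.Series(
    ['Married', 'Never-married', 'Married', 'Divorced', 'Married'],
    name='Marital Status',
)

# For a text column, describe() returns count, unique, top, and freq
summary = marital.describe()
print(summary)

# Individual entries can be read by label
print(summary['top'], summary['freq'])
```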

## Using Value Counts

Use the `value_counts()` method to create a frequency table: each unique value paired with the number of times it occurs, sorted from most to least frequent.

### Example

In [None]:
# Value counts of marital status
print(census_data['Marital Status'].value_counts())
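
By default `value_counts()` leaves missing values out of the table. The sketch below, on a made-up Series with two missing entries, shows the default behavior next to `dropna=False`, which adds a row for the missing values.

```python
import pandas as pd

# Made-up values, including two missing entries
marital = pd.Series(
    ['Married', 'Divorced', None, 'Married', 'Never-married', 'Married', None]
)

# Default: missing values are ignored
counts = marital.value_counts()
print(counts)

# dropna=False adds a row counting the missing values
counts_with_missing = marital.value_counts(dropna=False)
print(counts_with_missing)
```

Comparing the two totals is a quick check for how much of a column is missing.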

## Using Value Counts with Normalize

Passing `normalize=True` returns relative frequencies instead of counts: the proportion of non-missing rows taken by each unique value. The proportions sum to 1.

### Example

In [None]:
# Normalized value counts
print(census_data['Marital Status'].value_counts(normalize=True))
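
The relationship between counts and relative frequencies can be checked on a small made-up Series. This is a sketch of the arithmetic, with `shares` and `percent` as illustrative names.

```python
import pandas as pd

# Made-up values for illustration
marital = pd.Series(['Married', 'Married', 'Divorced', 'Never-married'])

# Each share is the value's count divided by the number of non-missing rows
shares = marital.value_counts(normalize=True)
print(shares)

# Multiply by 100 to read the shares as percentages
percent = (shares * 100).round(1)
print(percent)

# The shares always add up to 1
print(shares.sum())
```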

## Knowledge Check

Let's recap what we've covered so far by working through a couple of exercises.

### Example Exercise

1. Load the dataset and check the unique values in the "Occupation" column.
2. Create a bar plot to visualize the distribution of "Education" levels.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Unique values in Occupation
print(census_data['Occupation'].unique())

# Bar plot of Education levels
sns.countplot(y='Education', data=census_data)
plt.title('Distribution of Education Levels')
plt.show()