# Identifying Individuals, Variables, and Categorical Variables in a Dataset

## 🔎 Key Concepts

### 1. Individuals (Units of Observation)
- **Definition**: The entities described by the dataset.
- **Representation**: Each row in a dataset usually corresponds to one individual.
- **Examples**:
  - In a medical study → each patient is an individual.
  - In a school dataset → each student is an individual.
  - In a survey → each respondent is an individual.

---

### 2. Variables
- **Definition**: Characteristics measured or recorded about each individual.
- **Representation**: Each column in a dataset typically represents a variable.
- **Examples**:
  - Age, height, weight, income, test scores.
  - Gender, occupation, favorite color.

---

### 3. Categorical Variables
- **Definition**: Variables where values represent categories or labels, not numerical quantities.
- **Purpose**: They classify individuals into groups.
- **Examples**:
  - Gender → {Male, Female, Non-binary}
  - Eye color → {Blue, Brown, Green}
  - Type of car → {SUV, Sedan, Truck}
- ⚠️ Even if categories are coded with numbers (e.g., 1 = Male, 2 = Female), they are still **categorical**, because the numbers don’t represent actual quantities.
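
The point about number-coded categories can be illustrated with a short sketch. The survey column and the codebook below are hypothetical, made up only for this example: the codes are mapped to labels and stored as a pandas categorical, so counting per group works while the average of the raw codes has no useful interpretation.

```python
import pandas as pd

# Hypothetical survey column where gender was entered as numeric codes.
codes = pd.Series([1, 2, 2, 1, 2], name="gender_code")

# Assumed codebook for this illustration: 1 = Male, 2 = Female.
codebook = {1: "Male", 2: "Female"}

# Map the codes to labels and store the result as a categorical column.
gender = codes.map(codebook).astype("category")

# Counting individuals per group is meaningful for a categorical variable.
counts = gender.value_counts()
print(counts)

# The mean of the raw codes can be computed, but it does not describe anything.
print("Mean of raw codes (not interpretable):", codes.mean())
```

Storing the labels rather than the codes makes it harder to average or sum the column by accident.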

---

## 📝 Example Dataset

| Student ID | Name   | Age | Gender | Favorite Subject |
|------------|--------|-----|--------|-----------------|
| 001        | Alice  | 15  | Female | Math            |
| 002        | Ben    | 16  | Male   | History         |
| 003        | Carla  | 15  | Female | Science         |

- **Individuals**: Alice, Ben, Carla (each student).
- **Variables**: Student ID, Name, Age, Gender, Favorite Subject.
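- **Categorical Variables**: Gender, Favorite Subject.
- **Quantitative Variables**: Age (a numerical measurement on which arithmetic, such as averaging, makes sense).
- **Identifiers**: Student ID and Name label each individual; Student ID is not quantitative even though it is written with digits.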

---

```python
import pandas as pd

# Example dataset: one row per student, one column per variable
data = {
    "Student_ID": [1, 2, 3],
    "Name": ["Alice", "Ben", "Carla"],
    "Age": [15, 16, 15],
    "Gender": ["Female", "Male", "Female"],
    "Favorite_Subject": ["Math", "History", "Science"]
}

df = pd.DataFrame(data)

# Display the dataset
print(df)
```

```text
   Student_ID   Name  Age  Gender Favorite_Subject
0           1  Alice   15  Female             Math
1           2    Ben   16    Male          History
2           3  Carla   15  Female          Science
```


## ✅ Quick Tips for Identification
- Ask: *Who/what is being described?* → **Individuals**
- Ask: *What characteristics are recorded?* → **Variables**
- Ask: *Does the variable represent categories or numbers with meaning?* → **Categorical vs. Quantitative**
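
The three questions can be walked through on the student table. The sketch below is only an illustration: the `roles` mapping is a hypothetical, hand-written answer to the third question, since pandas cannot infer what a column means.

```python
import pandas as pd

df = pd.DataFrame({
    "Student_ID": [1, 2, 3],
    "Name": ["Alice", "Ben", "Carla"],
    "Age": [15, 16, 15],
    "Gender": ["Female", "Male", "Female"],
    "Favorite_Subject": ["Math", "History", "Science"],
})

# Who/what is being described? Each row is one student.
n_individuals = len(df)

# What characteristics are recorded? Each column is one variable.
variables = list(df.columns)

# Categories or meaningful numbers? Answered by hand (hypothetical mapping).
roles = {
    "Student_ID": "identifier",
    "Name": "identifier",
    "Age": "quantitative",
    "Gender": "categorical",
    "Favorite_Subject": "categorical",
}
categorical = [name for name in variables if roles[name] == "categorical"]
quantitative = [name for name in variables if roles[name] == "quantitative"]

print("Individuals:", n_individuals)
print("Variables:", variables)
print("Categorical:", categorical)
print("Quantitative:", quantitative)
```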

## Distinguish Categorical vs. Quantitative Variables

```python
# Check data types
print(df.dtypes)
```

```text
Student_ID           int64
Name                object
Age                  int64
Gender              object
Favorite_Subject    object
dtype: object
```


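* Numeric dtype (`int64`): Age, Student_ID. Only Age is quantitative; Student_ID is an identifier, so its numbers carry no arithmetic meaning.
* Text dtype (`object`): Name, Gender, Favorite_Subject. Gender and Favorite_Subject are categorical; Name is also a label, but it identifies individuals rather than sorting them into groups.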
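
A dtype describes how values are stored, not what they mean, so one option is to recast columns until the dtypes match their roles. A minimal sketch, assuming the same student table and treating `Student_ID` as an identifier (the name `typed` is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Student_ID": [1, 2, 3],
    "Name": ["Alice", "Ben", "Carla"],
    "Age": [15, 16, 15],
    "Gender": ["Female", "Male", "Female"],
    "Favorite_Subject": ["Math", "History", "Science"],
})

# Recast so each dtype reflects the column's role rather than how it was typed in.
typed = df.astype({
    "Student_ID": "string",          # identifier: no arithmetic meaning
    "Gender": "category",            # categorical
    "Favorite_Subject": "category",  # categorical
})

# After recasting, dtype-based selection agrees with the variable types.
numeric_cols = list(typed.select_dtypes(include="number").columns)
category_cols = list(typed.select_dtypes(include="category").columns)

print(typed.dtypes)
print("Numeric:", numeric_cols)
print("Category:", category_cols)
```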

## Quick Detection of Categorical Variables

```python
# Text (object) columns are candidates for categorical variables.
# This is a dtype-based heuristic: it also picks up label columns such as Name.
categorical_vars = df.select_dtypes(include=["object"]).columns
print("Categorical Variables:", list(categorical_vars))

# Numeric columns are candidates for quantitative variables.
# Identifiers such as Student_ID are numeric too, so check the result by hand.
quantitative_vars = df.select_dtypes(include=["int64", "float64"]).columns
print("Quantitative Variables:", list(quantitative_vars))
```

```text
Categorical Variables: ['Name', 'Gender', 'Favorite_Subject']
Quantitative Variables: ['Student_ID', 'Age']
```
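
Once the variable types are settled, they determine which summaries make sense. A minimal sketch, assuming the same student columns: counts per category for a categorical variable, and a mean for a quantitative one.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [15, 16, 15],
    "Gender": ["Female", "Male", "Female"],
    "Favorite_Subject": ["Math", "History", "Science"],
})

# Categorical variable: count how many individuals fall in each group.
gender_counts = df["Gender"].value_counts()

# Quantitative variable: arithmetic summaries such as the mean are meaningful.
mean_age = df["Age"].mean()

print(gender_counts)
print("Mean age:", round(mean_age, 2))
```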
