# 2. Types of Variables

## 2.1 Qualitative Variables

- A variable is called qualitative (or categorical) if its values express a quality or attribute rather than a numerical measurement.
- Gender, profession, and marital status are examples of qualitative variables. The values of a qualitative variable are called categories.
- A qualitative variable is called nominal if its categories are not naturally ordered. For example, in a population of working people, profession is a nominal variable.
- A qualitative variable is called ordinal if its categories follow a natural order. For example, a disease can be categorized as mild, moderate, or severe. These values can be ordered: mild < moderate < severe.
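In pandas, the nominal/ordinal distinction can be made explicit with `pd.Categorical`. The sketch below uses hypothetical example values. Passing `ordered=True` together with an explicit `categories` list records the natural order mild < moderate < severe, which enables order comparisons and `min`/`max`. An unordered categorical (the default) supports neither, only equality checks.

```python
import pandas as pd

# Nominal variable: categories have no natural order (hypothetical values)
profession = pd.Categorical(["Engineer", "Teacher", "Nurse", "Engineer"])

# Ordinal variable: declare the natural order Mild < Moderate < Severe
severity = pd.Categorical(
    ["Moderate", "Mild", "Severe", "Mild"],
    categories=["Mild", "Moderate", "Severe"],
    ordered=True,
)

print(profession.ordered)  # False
print(severity.ordered)    # True

# Order-based operations are only allowed on ordered categoricals
print(severity.min(), severity.max())  # Mild Severe
print(severity < "Severe")             # element-wise comparison using the declared order
```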

## 2.2 Quantitative Variables

- A variable is called quantitative if its values are measurable and numerical.
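- A quantitative variable is called discrete if its possible values can be enumerated (listed one by one), as with counts such as the number of children.
- A quantitative variable is called continuous if it can take any value within an interval, so its potential values cannot be enumerated; height and weight are typical examples.
- A binary variable takes exactly two values, usually coded 0 and 1, which makes it a special case of a discrete quantitative variable.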
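A small sketch with hypothetical values illustrates the three kinds of quantitative variables described above. It also shows that the distinction depends on what the variable measures, not on how it is stored: rounding heights to whole centimetres yields an integer column, yet height remains a continuous quantity.

```python
import pandas as pd

# Discrete: the number of children can only be 0, 1, 2, ..., so its values can be enumerated
children = pd.Series([0, 2, 1, 3, 2], name="children")

# Continuous: height (in cm) can take any value within an interval
height_cm = pd.Series([162.4, 175.05, 180.0, 158.73, 169.9], name="height_cm")

# Binary: a discrete variable restricted to exactly two values, coded 0 and 1
smoker = pd.Series([0, 1, 1, 0, 0], name="smoker")

print(sorted(children.unique().tolist()))  # [0, 1, 2, 3]
print(smoker.nunique())                    # 2

# Rounding changes the storage dtype, not the nature of the variable:
# the underlying quantity (height) is still continuous
height_rounded = height_cm.round().astype("int64")
print(height_rounded.dtype)  # int64
```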

### Binary Quantitative Variables

- **Symmetric**: A binary variable is symmetric if its two categories are equally important, so either one can be coded as 0 and the other as 1 without any loss of meaning. For example, gender is symmetric: coding male as 0 and female as 1, or the reverse, makes no difference.
- **Asymmetric**: A binary variable is asymmetric if its two categories are not equally important. For example, for a medical test the positive result is the outcome of interest, so it is coded as 1 and the negative result as 0; coding a positive result as 0 and a negative result as 1 would misrepresent the importance of the expected result.
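The following sketch, using hypothetical values, contrasts the two codings. For a symmetric variable the two possible 0/1 codings are mirror images of each other and carry the same information. For an asymmetric variable the outcome of interest (here, a positive result) is coded as 1, so the mean of the coded column is the proportion of positive results.

```python
import pandas as pd

gender = pd.Series(["Female", "Male", "Male", "Female"])
test_result = pd.Series(["Positive", "Negative", "Negative", "Positive", "Positive"])

# Symmetric: both codings are equally valid; one is simply 1 minus the other
gender_a = gender.map({"Female": 0, "Male": 1})
gender_b = gender.map({"Female": 1, "Male": 0})
print((gender_a == 1 - gender_b).all())  # True

# Asymmetric: code the outcome of interest (a positive result) as 1
test_coded = test_result.map({"Negative": 0, "Positive": 1})

# With this coding, the mean is the proportion of positive results
print(test_coded.mean())  # 0.6
```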

## Modifying the Dataset

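We will now modify the dataset created in `01_Data` so that it includes every type of variable described above. Then we will display the pandas data type (dtype) of each column using `.dtypes` and `.info()`. Note that these dtypes describe how the values are stored, not their statistical type: an `int64` column can hold a binary variable, and an `object` column can hold either a nominal or an ordinal variable.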

```python
# Importing the necessary library
import pandas as pd

# Creating the modified dataset
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Gender': ['Female', 'Male', 'Male', 'Male', 'Female'],
    'Department': ['HR', 'Engineering', 'Marketing', 'Engineering', 'HR'],
    'Age': [25, 30, 35, 40, 28],
    'Salary': [50000, 60000, 70000, 80000, 52000],
    'Years of Experience': [2, 5, 10, 15, 3],
    'Marital Status': ['Single', 'Married', 'Married', 'Single', 'Single'],  # Qualitative Nominal
    'Disease Severity': ['Mild', 'Moderate', 'Severe', 'Mild', 'Moderate'],  # Qualitative Ordinal
    'Test Result': [1, 0, 0, 1, 1]  # Binary Quantitative Asymmetric
}

# Creating a DataFrame
df = pd.DataFrame(data)

# Displaying the DataFrame
df
```


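|   | Name | Gender | Department | Age | Salary | Years of Experience | Marital Status | Disease Severity | Test Result |
|---|------|--------|------------|-----|--------|---------------------|----------------|------------------|-------------|
| 0 | Alice | Female | HR | 25 | 50000 | 2 | Single | Mild | 1 |
| 1 | Bob | Male | Engineering | 30 | 60000 | 5 | Married | Moderate | 0 |
| 2 | Charlie | Male | Marketing | 35 | 70000 | 10 | Married | Severe | 0 |
| 3 | David | Male | Engineering | 40 | 80000 | 15 | Single | Mild | 1 |
| 4 | Eva | Female | HR | 28 | 52000 | 3 | Single | Moderate | 1 |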


```python
# Displaying the types of variables
df.dtypes
```

```
Name                   object
Gender                 object
Department             object
Age                     int64
Salary                  int64
Years of Experience     int64
Marital Status         object
Disease Severity       object
Test Result             int64
dtype: object
```
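The dtypes above give only a storage-level split of the columns. As a sketch, `select_dtypes` separates numeric from non-numeric columns; note that `Test Result` lands among the numeric columns even though the dataset comments describe it as a binary asymmetric variable. The snippet rebuilds the same dataset so that it runs on its own.

```python
import pandas as pd

# Rebuild the dataset from the cell above so this snippet is self-contained
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Gender': ['Female', 'Male', 'Male', 'Male', 'Female'],
    'Department': ['HR', 'Engineering', 'Marketing', 'Engineering', 'HR'],
    'Age': [25, 30, 35, 40, 28],
    'Salary': [50000, 60000, 70000, 80000, 52000],
    'Years of Experience': [2, 5, 10, 15, 3],
    'Marital Status': ['Single', 'Married', 'Married', 'Single', 'Single'],
    'Disease Severity': ['Mild', 'Moderate', 'Severe', 'Mild', 'Moderate'],
    'Test Result': [1, 0, 0, 1, 1],
}
df = pd.DataFrame(data)

# Split columns by storage dtype only (numeric vs. everything else)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)  # ['Age', 'Salary', 'Years of Experience', 'Test Result']
print(other_cols)    # ['Name', 'Gender', 'Department', 'Marital Status', 'Disease Severity']
```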

```python
# Displaying information about the DataFrame
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Name                 5 non-null      object
 1   Gender               5 non-null      object
 2   Department           5 non-null      object
 3   Age                  5 non-null      int64 
 4   Salary               5 non-null      int64 
 5   Years of Experience  5 non-null      int64 
 6   Marital Status       5 non-null      object
 7   Disease Severity     5 non-null      object
 8   Test Result          5 non-null      int64 
dtypes: int64(4), object(5)
memory usage: 492.0+ bytes
```
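One way to make the statistical types visible to pandas is to convert qualitative columns to the `category` dtype. The sketch below (one possible approach, not the only one) converts the two columns whose types are stated in the dataset comments: `Marital Status` becomes an unordered (nominal) categorical and `Disease Severity` an ordered (ordinal) one, after which comparisons on severity follow the declared order Mild < Moderate < Severe.

```python
import pandas as pd

# Rebuild the dataset so this snippet is self-contained
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Gender': ['Female', 'Male', 'Male', 'Male', 'Female'],
    'Department': ['HR', 'Engineering', 'Marketing', 'Engineering', 'HR'],
    'Age': [25, 30, 35, 40, 28],
    'Salary': [50000, 60000, 70000, 80000, 52000],
    'Years of Experience': [2, 5, 10, 15, 3],
    'Marital Status': ['Single', 'Married', 'Married', 'Single', 'Single'],
    'Disease Severity': ['Mild', 'Moderate', 'Severe', 'Mild', 'Moderate'],
    'Test Result': [1, 0, 0, 1, 1],
}
df = pd.DataFrame(data)

# Nominal: categories without a natural order
df["Marital Status"] = df["Marital Status"].astype("category")

# Ordinal: declare the natural order explicitly
severity_type = pd.CategoricalDtype(categories=["Mild", "Moderate", "Severe"], ordered=True)
df["Disease Severity"] = df["Disease Severity"].astype(severity_type)

print(df.dtypes)

# Comparisons now use the declared order of the ordinal variable
at_least_moderate = df.loc[df["Disease Severity"] >= "Moderate", "Name"].tolist()
print(at_least_moderate)  # ['Bob', 'Charlie', 'Eva']
```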


## Exercise

Determine the type of each of the following variables in the dataset. Is it qualitative (nominal or ordinal) or quantitative (discrete or continuous)? If it is binary, is it symmetric or asymmetric?

- **Name**
- **Gender**
- **Department**
- **Age**
- **Salary**
- **Years of Experience**
- **Marital Status**
- **Disease Severity**
- **Test Result**