# **🗃️ Day 4 – Data Types & Type Conversion (Dtypes) 🔄**

#### **Goal:** Deepen your understanding of Pandas data types (dtypes), learn how to inspect them, and master techniques for safely converting data between different types.

#### **Topics To Cover:** Pandas Dtypes (int, float, object, category, datetime), Identifying Dtypes, Explicit Type Casting using .astype(), and the use of pd.to_numeric().

----

## **Introduction: What are Data Types? 🧱**
When working with Pandas, it's crucial to understand the concept of a data type (dtype). Data types define how data is stored and manipulated within a column. Each column in a DataFrame is essentially a Series that holds values of a specific type (e.g., int64 for integers, object for strings, float64 for decimals).
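
As a minimal sketch of that idea, the snippet below builds a tiny made-up frame (the `toy` name and its columns are illustrative, not part of the course dataset) and reads the dtype of each column:

```python
import pandas as pd

# Hypothetical toy data, not the course dataset
toy = pd.DataFrame({
    "age": [21, 22, 23],       # whole numbers
    "gpa": [3.5, 3.8, 2.9],    # decimals
})

# Each column is a Series that carries its own dtype
age_dtype = toy["age"].dtype
gpa_dtype = toy["gpa"].dtype
print(type(toy["age"]).__name__, age_dtype, gpa_dtype)
```

Pandas picks an integer dtype for `age` and a floating-point dtype for `gpa` from the values it was given.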

#### **Analogy: The Container Store 📦**
Think of a DataFrame column as a shelf in a store, and the data type as the container you use for items on that shelf:
* Wrong Container (object for numbers): If you store currency values (numbers) in a container meant for mixed items (object), calculations either fail or give surprising results (for example, `+` joins text instead of adding), and storage space is wasted.

* Correct Container (float or int): By using the correct numerical container, the computer knows exactly how to perform math and uses memory efficiently.
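
To make the analogy concrete, here is a small sketch with made-up prices that arrived as text (the variable names are hypothetical). It shows what happens in the "wrong container" and after moving the values to a numeric one:

```python
import pandas as pd

# Hypothetical prices that were read in as text
prices_text = pd.Series(["10.5", "20.0", "3.5"])

# '+' on text concatenates instead of adding
concatenated = prices_text + prices_text

# After conversion to a numeric dtype, arithmetic behaves as expected
prices_num = pd.to_numeric(prices_text)
total = prices_num.sum()

print(concatenated.iloc[0])  # 10.510.5
print(total)                 # 34.0
```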

### **Why is Type Handling Important? 💡**
* Accuracy: Incorrect types (e.g., a number stored as a string) can prevent mathematical operations, leading to errors or inconsistencies.

* Efficiency: Using optimized types, like category for low-cardinality strings or smaller integer sizes (int32 instead of int64), can substantially reduce memory usage, especially on large datasets.

* Consistency: Ensuring consistent types across datasets is essential for tasks like merging data or training models.
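
The efficiency point can be checked with `memory_usage(deep=True)`. The sketch below uses made-up data (three labels repeated many times, and ratings from 1 to 5). The exact byte counts depend on your pandas version, so only the comparison matters:

```python
import numpy as np
import pandas as pd

# Hypothetical low-cardinality text column: 3 distinct labels, 30,000 rows
labels = pd.Series(["Peaceful", "Noisy", "disrupted"] * 10_000)
as_category = labels.astype("category")

mem_text = labels.memory_usage(deep=True)
mem_category = as_category.memory_usage(deep=True)

# Hypothetical ratings between 1 and 5 fit comfortably in int32
ratings = pd.Series(np.arange(30_000) % 5 + 1, dtype="int64")
ratings_small = ratings.astype("int32")

mem_int64 = ratings.memory_usage(deep=True)
mem_int32 = ratings_small.memory_usage(deep=True)

print(mem_text, mem_category)
print(mem_int64, mem_int32)
```

Downcasting integers is only safe when every value fits in the smaller type, so check the column's minimum and maximum first.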



----

## Let's Begin 🚀

In [1]:
# importing the necessary libraries
import pandas as pd
import numpy as np

# Load the dataset; pd.read_csv already returns a DataFrame
df = pd.read_csv(r'../data/academic Stress level - maintainance 1.csv')
df.head()

Unnamed: 0,Timestamp,Your Academic Stage,Peer pressure,Academic pressure from your home,Study Environment,What coping strategy you use as a student?,"Do you have any bad habits like smoking, drinking on a daily basis?",What would you rate the academic competition in your student life,Rate your academic stress index
0,24/07/2025 22:05:39,undergraduate,4,5,Noisy,Analyze the situation and handle it with intel...,No,3,5
1,24/07/2025 22:05:52,undergraduate,3,4,Peaceful,Analyze the situation and handle it with intel...,No,3,3
2,24/07/2025 22:06:39,undergraduate,1,1,Peaceful,"Social support (friends, family)",No,2,4
3,24/07/2025 22:06:45,undergraduate,3,2,Peaceful,Analyze the situation and handle it with intel...,No,4,3
4,24/07/2025 22:08:06,undergraduate,3,3,Peaceful,Analyze the situation and handle it with intel...,No,4,5


### Key methods for DataType and Type conversion

These are the key methods and properties for data type inspection and conversion:

| Method / Attribute      | Purpose                                         | Input → Output                      | Syntax Example                              |
| ----------------------- | ----------------------------------------------- | ----------------------------------- | ------------------------------------------- |
| `.dtypes`               | Check data types of columns                     | DataFrame → Series                  | `df.dtypes`                                 |
| `.info()`               | Full column summary with dtypes                 | DataFrame → None                    | `df.info()`                                 |
| `.astype()`             | Convert type to specified dtype                 | Series/DataFrame → converted object | `df['col'].astype('int64')`                 |
| `pd.to_numeric()`       | Force conversion to numeric                     | Series → numeric Series             | `pd.to_numeric(df['col'], errors='coerce')` |
| `pd.to_datetime()`      | Convert to datetime                             | Series/str → datetime64             | `pd.to_datetime(df['date'])`                |
| `pd.to_timedelta()`     | Convert to timedelta                            | Series/str → timedelta64            | `pd.to_timedelta(df['duration'])`           |
| `.infer_objects()`      | Try to infer better dtypes for `object` columns | DataFrame → DataFrame               | `df.infer_objects()`                        |
| `.convert_dtypes()`     | Convert to best possible dtypes automatically   | DataFrame → DataFrame               | `df.convert_dtypes()`                       |
| `.select_dtypes()`      | Select only specific dtype columns              | DataFrame → DataFrame               | `df.select_dtypes(include='number')`        |
| `.apply(pd.to_numeric)` | Batch numeric conversion                        | DataFrame → DataFrame               | `df.apply(pd.to_numeric, errors='coerce')`  |
| `.astype('category')`   | Convert to categorical (memory efficient)       | Series → category                   | `df['col'].astype('category')`              |
| `.astype('string')`     | Convert to new Pandas string dtype              | Series → string                     | `df['col'].astype('string')`                |
| `.astype('boolean')`    | Nullable Boolean conversion                     | Series → boolean                    | `df['col'].astype('boolean')`               |
| `.astype('Int64')`      | Nullable integer (can hold missing values)      | Series → Int64                      | `df['col'].astype('Int64')`                 |
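
Two rows of the table are not exercised on the dataset later in this notebook, so here is a short sketch of them on made-up values (the `scores` and `durations` names are hypothetical):

```python
import pandas as pd

# Hypothetical scores with a gap; pandas stores them as float64 to hold NaN
scores = pd.Series([4.0, None, 3.0])

# The nullable 'Int64' dtype keeps whole numbers and marks the gap as <NA>
nullable = scores.astype("Int64")

# pd.to_timedelta parses duration strings into timedelta values
durations = pd.to_timedelta(pd.Series(["1 days", "02:30:00"]))

print(nullable)
print(durations)
```

Note the capital `I` in `'Int64'`: lowercase `'int64'` is the NumPy integer type and cannot hold missing values.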


### Examples:

In [2]:
# Select columns with the object dtype using a list comprehension
df.loc[:, [col for col in df.columns if df[col].dtype == object]]
# A simpler, more idiomatic alternative
df.select_dtypes(include=object)

Unnamed: 0,Timestamp,Your Academic Stage,Study Environment,What coping strategy you use as a student?,"Do you have any bad habits like smoking, drinking on a daily basis?"
0,24/07/2025 22:05:39,undergraduate,Noisy,Analyze the situation and handle it with intel...,No
1,24/07/2025 22:05:52,undergraduate,Peaceful,Analyze the situation and handle it with intel...,No
2,24/07/2025 22:06:39,undergraduate,Peaceful,"Social support (friends, family)",No
3,24/07/2025 22:06:45,undergraduate,Peaceful,Analyze the situation and handle it with intel...,No
4,24/07/2025 22:08:06,undergraduate,Peaceful,Analyze the situation and handle it with intel...,No
...,...,...,...,...,...
135,17/08/2025 13:02:04,undergraduate,Peaceful,Analyze the situation and handle it with intel...,No
136,18/08/2025 14:36:00,undergraduate,disrupted,Analyze the situation and handle it with intel...,No
137,18/08/2025 17:13:52,undergraduate,Peaceful,Analyze the situation and handle it with intel...,No
138,18/08/2025 19:08:52,undergraduate,disrupted,"Social support (friends, family)",No


In [3]:
df.loc[:, [col for col in df.columns if df[col].dtype == 'int64']] # get columns with int64 data type
# Another way:
df.select_dtypes(include='int64')

Unnamed: 0,Peer pressure,Academic pressure from your home,What would you rate the academic competition in your student life,Rate your academic stress index
0,4,5,3,5
1,3,4,3,3
2,1,1,2,4
3,3,2,4,3
4,3,3,4,5
...,...,...,...,...
135,3,2,3,4
136,4,2,3,3
137,3,3,2,4
138,4,5,5,5


In [4]:
df.info() # quick summary with dtypes, non-null counts, and memory usage

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 140 entries, 0 to 139
Data columns (total 9 columns):
 #   Column                                                               Non-Null Count  Dtype 
---  ------                                                               --------------  ----- 
 0   Timestamp                                                            140 non-null    object
 1   Your Academic Stage                                                  140 non-null    object
 2   Peer pressure                                                        140 non-null    int64 
 3   Academic pressure from your home                                     140 non-null    int64 
 4   Study Environment                                                    139 non-null    object
 5   What coping strategy you use as a student?                           140 non-null    object
 6   Do you have any bad habits like smoking, drinking on a daily basis?  140 non-null    object
 7   What would you rat

In [5]:
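# Convert the 'Do you have any bad habits like smoking, drinking on a daily basis?' column to boolean
df2 = df.copy()
col_name = 'Do you have any bad habits like smoking, drinking on a daily basis?'

# Standardize the column values by trimming surrounding whitespace
df2[col_name] = df2[col_name].str.strip()

# Map the responses to True/False, then cast to the nullable boolean dtype.
# .map() avoids the downcasting FutureWarning that .replace() can emit in recent pandas versions;
# note that any response missing from the mapping becomes <NA>.
df2[col_name] = df2[col_name].map({'Yes': True, 'No': False, 'prefer not to say': False}).astype('boolean')

# Now, check the data type of each column
df2.dtypes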


Timestamp                                                               object
Your Academic Stage                                                     object
Peer pressure                                                            int64
Academic pressure from your home                                         int64
Study Environment                                                       object
What coping strategy you use as a student?                              object
Do you have any bad habits like smoking, drinking on a daily basis?    boolean
What would you rate the academic  competition in your student life       int64
Rate your academic stress index                                          int64
dtype: object
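
The cell above records 'prefer not to say' as `False`. Because the `boolean` dtype is nullable, one alternative is to leave such answers as missing. This is a sketch on made-up responses (the `answers` and `mapping` names are hypothetical), not the approach used in the rest of this notebook:

```python
import pandas as pd

# Hypothetical responses, including stray whitespace
answers = pd.Series([" Yes", "No ", "prefer not to say"])

# Only explicit answers are mapped; anything else becomes missing
mapping = {"Yes": True, "No": False}
flags = answers.str.strip().map(mapping).astype("boolean")

print(flags)
print(flags.sum())  # counts only the True values; <NA> is skipped
```

Whether "prefer not to say" should count as `False` or as missing is an analysis decision; the nullable dtype simply makes both options possible.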

In [6]:
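# Convert the Timestamp column to datetime; the values are day-first (DD/MM/YYYY),
# so an explicit format keeps dates like 01/08/2025 from being read as January 8
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'], format='%d/%m/%Y %H:%M:%S')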
df2.dtypes

Timestamp                                                              datetime64[ns]
Your Academic Stage                                                            object
Peer pressure                                                                   int64
Academic pressure from your home                                                int64
Study Environment                                                              object
What coping strategy you use as a student?                                     object
Do you have any bad habits like smoking, drinking on a daily basis?           boolean
What would you rate the academic  competition in your student life              int64
Rate your academic stress index                                                 int64
dtype: object
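
The timestamps in this dataset are written day-first (for example `24/07/2025`). Dates where the day is 12 or lower are ambiguous, so it helps to state the layout explicitly. A minimal sketch on two made-up timestamps in the same layout:

```python
import pandas as pd

# Hypothetical day-first timestamps; the first one is ambiguous without a format
raw = pd.Series(["01/08/2025 09:15:00", "24/07/2025 22:05:39"])

# An explicit format removes the ambiguity: %d is the day, %m is the month
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")

print(parsed.dt.day.tolist())    # [1, 24]
print(parsed.dt.month.tolist())  # [8, 7]
```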

In [7]:
# Create a column named 'pressure ratio' with int64 type
# Option 1: floor division keeps the result as an integer
# df2['pressure ratio'] = df2['Peer pressure'] // df2['Academic pressure from your home']
# Option 2: true division gives floats, then astype('int64') drops the fractional part
df2['pressure ratio'] = df2['Peer pressure'] / df2['Academic pressure from your home']
df2['pressure ratio'] = df2['pressure ratio'].astype('int64')
df2.dtypes

Timestamp                                                              datetime64[ns]
Your Academic Stage                                                            object
Peer pressure                                                                   int64
Academic pressure from your home                                                int64
Study Environment                                                              object
What coping strategy you use as a student?                                     object
Do you have any bad habits like smoking, drinking on a daily basis?           boolean
What would you rate the academic  competition in your student life              int64
Rate your academic stress index                                                 int64
pressure ratio                                                                  int64
dtype: object
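
For the positive ratings used here, floor division and "divide, then cast to `int64`" give the same numbers. They are not the same operation in general, as this small sketch on made-up values shows:

```python
import pandas as pd

values = pd.Series([1.5, -1.5])

# astype('int64') truncates toward zero
truncated = values.astype("int64")

# Floor division rounds down, and the result stays float for float input
floored = values // 1

print(truncated.tolist())  # [1, -1]
print(floored.tolist())    # [1.0, -2.0]
```

Also keep in mind that `astype('int64')` raises an error if the column contains NaN; the nullable `'Int64'` dtype is the usual way around that.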

In [8]:
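# pd.to_numeric with errors='coerce' turns values that cannot be parsed into NaN.
# 'pressure ratio' is already int64, so nothing changes here, and the result is not assigned back.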
pd.to_numeric(df2['pressure ratio'], errors='coerce')

0      0
1      0
2      1
3      1
4      1
      ..
135    1
136    2
137    1
138    0
139    2
Name: pressure ratio, Length: 140, dtype: int64
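
Since `pressure ratio` is already clean, the cell above does not show what `errors='coerce'` is for. Here is a minimal sketch on made-up text values (the `raw` name is hypothetical) that contain one entry which cannot be parsed:

```python
import pandas as pd

# Hypothetical column read in as text, with one unparseable entry
raw = pd.Series(["3", "4.5", "n/a"])

# Default behaviour (errors='raise'): the bad value stops the conversion
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

# errors='coerce': the bad value becomes NaN and the rest is converted
coerced = pd.to_numeric(raw, errors="coerce")

print(raised)   # True
print(coerced)
```

Coercion is convenient, but it silently discards information, so it is worth counting the resulting NaN values with `coerced.isna().sum()` before moving on.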

In [9]:
# select columns with numeric data
df2.select_dtypes(include='number') # 'number' matches all numeric dtypes (integers and floats)

Unnamed: 0,Peer pressure,Academic pressure from your home,What would you rate the academic competition in your student life,Rate your academic stress index,pressure ratio
0,4,5,3,5,0
1,3,4,3,3,0
2,1,1,2,4,1
3,3,2,4,3,1
4,3,3,4,5,1
...,...,...,...,...,...
135,3,2,3,4,1
136,4,2,3,3,2
137,3,3,2,4,1
138,4,5,5,5,0


In [10]:
# for integer only
df2.select_dtypes(include='int64')

Unnamed: 0,Peer pressure,Academic pressure from your home,What would you rate the academic competition in your student life,Rate your academic stress index,pressure ratio
0,4,5,3,5,0
1,3,4,3,3,0
2,1,1,2,4,1
3,3,2,4,3,1
4,3,3,4,5,1
...,...,...,...,...,...
135,3,2,3,4,1
136,4,2,3,3,2
137,3,3,2,4,1
138,4,5,5,5,0


In [11]:
# for float only (df2 has no float64 columns, so the result contains only the index)
df2.select_dtypes(include='float64')

0
1
2
3
4
...
135
136
137
138
139


In [12]:
# for selecting boolean
df2.select_dtypes(include='boolean')

Unnamed: 0,"Do you have any bad habits like smoking, drinking on a daily basis?"
0,False
1,False
2,False
3,False
4,False
...,...
135,False
136,False
137,False
138,False


In [13]:
# for object columns only (here, the remaining text columns)
df2.select_dtypes(include='object')

Unnamed: 0,Your Academic Stage,Study Environment,What coping strategy you use as a student?
0,undergraduate,Noisy,Analyze the situation and handle it with intel...
1,undergraduate,Peaceful,Analyze the situation and handle it with intel...
2,undergraduate,Peaceful,"Social support (friends, family)"
3,undergraduate,Peaceful,Analyze the situation and handle it with intel...
4,undergraduate,Peaceful,Analyze the situation and handle it with intel...
...,...,...,...
135,undergraduate,Peaceful,Analyze the situation and handle it with intel...
136,undergraduate,disrupted,Analyze the situation and handle it with intel...
137,undergraduate,Peaceful,Analyze the situation and handle it with intel...
138,undergraduate,disrupted,"Social support (friends, family)"


In [14]:
# for datetime only
df2.select_dtypes(include='datetime')

Unnamed: 0,Timestamp
0,2025-07-24 22:05:39
1,2025-07-24 22:05:52
2,2025-07-24 22:06:39
3,2025-07-24 22:06:45
4,2025-07-24 22:08:06
...,...
135,2025-08-17 13:02:04
136,2025-08-18 14:36:00
137,2025-08-18 17:13:52
138,2025-08-18 19:08:52


In [15]:
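# Automatically convert columns to the best available dtypes.
# convert_dtypes() returns a new DataFrame and does not modify df2 in place;
# the result is not assigned here, so df2.dtypes below is unchanged.
# To keep the converted types, assign it: df2 = df2.convert_dtypes()
df2.convert_dtypes()
df2.dtypes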

Timestamp                                                              datetime64[ns]
Your Academic Stage                                                            object
Peer pressure                                                                   int64
Academic pressure from your home                                                int64
Study Environment                                                              object
What coping strategy you use as a student?                                     object
Do you have any bad habits like smoking, drinking on a daily basis?           boolean
What would you rate the academic  competition in your student life              int64
Rate your academic stress index                                                 int64
pressure ratio                                                                  int64
dtype: object
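
The dtypes printed above are the same as before because `convert_dtypes()` returns a new DataFrame instead of changing the original. A minimal sketch on a made-up frame (the `toy` and `converted` names are hypothetical) that captures the result:

```python
import pandas as pd

# Hypothetical one-column frame with plain int64 values
toy = pd.DataFrame({"rating": [1, 2, 3]})

# Capture the returned DataFrame; 'toy' itself is left untouched
converted = toy.convert_dtypes()

before = str(toy["rating"].dtype)
after = str(converted["rating"].dtype)

print(before)  # int64
print(after)   # Int64
```

The same pattern applies to most pandas conversion methods, including `.astype()` and `pd.to_numeric()`: assign the result back if you want to keep it.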