# Identify different data types in Python

```python
import pandas as pd

sample_city_data = {'neighborhood': ['Alameda de Osuna', 'Aeropuerto', 'Casco Histórico de Barajas', 'Timón', 'Corralejos'],
                    'neighborhood_id': [211, 212, 213, 214, 215],
                    'air_quality': [13, 5, 72, 45, 39]}
air_quality = pd.DataFrame(sample_city_data)

# Display basic info about the columns, including their data types
air_quality.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   neighborhood     5 non-null      object
 1   neighborhood_id  5 non-null      int64
 2   air_quality      5 non-null      int64
dtypes: int64(2), object(1)
memory usage: 248.0+ bytes
```


```python
# Display the data type of every column in the DataFrame
air_quality.dtypes
```

```
neighborhood       object
neighborhood_id     int64
air_quality         int64
dtype: object
```
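Besides printing the dtypes, you can select or test columns by dtype in code, which is handy when a DataFrame has many columns. A minimal sketch, assuming the same sample data (rebuilt here so the block runs on its own); `numeric_cols`, `non_numeric_cols` and `is_int` are illustrative names, while `select_dtypes` and `pd.api.types.is_integer_dtype` are standard pandas functions.

```python
import pandas as pd

# Rebuild the sample data so this block is self-contained
air_quality = pd.DataFrame({
    'neighborhood': ['Alameda de Osuna', 'Aeropuerto', 'Casco Histórico de Barajas', 'Timón', 'Corralejos'],
    'neighborhood_id': [211, 212, 213, 214, 215],
    'air_quality': [13, 5, 72, 45, 39],
})

# Names of all numeric columns (both int64 here)
numeric_cols = air_quality.select_dtypes(include='number').columns.tolist()

# Names of every column that is NOT numeric (the text column here)
non_numeric_cols = air_quality.select_dtypes(exclude='number').columns.tolist()

# Test a single column's dtype with the pandas type-checking helpers
is_int = pd.api.types.is_integer_dtype(air_quality['air_quality'])

print(numeric_cols)
print(non_numeric_cols)
print(is_int)
```

Using `exclude='number'` rather than `include='object'` keeps the text-column selection working even if strings are stored with a dedicated string dtype instead of `object`.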

# Converting values between types
General recipe
```python
# df is an existing DataFrame; 'column_name' and 'destination_type' are placeholders
df['column_name'] = df['column_name'].astype('destination_type')
```
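A concrete instance of the recipe, as a sketch on a hypothetical DataFrame (`df`, `reading`, `station`, `raw` and `clean` are illustrative names). It converts one column, converts several columns at once by passing a `{column: dtype}` mapping to `DataFrame.astype`, and uses `pd.to_numeric(..., errors='coerce')` for text that may contain values `astype` cannot convert.

```python
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({'reading': [13, 5, 72], 'station': [211, 212, 213]})

# One column at a time: int64 -> float64
df['reading'] = df['reading'].astype('float64')

# Several columns at once: pass a {column: dtype} mapping
df = df.astype({'station': 'int32'})

# astype raises an error when a value cannot be converted (e.g. 'n/a' to int);
# pd.to_numeric with errors='coerce' turns such values into NaN instead
raw = pd.Series(['10', '20', 'n/a'])
clean = pd.to_numeric(raw, errors='coerce')

print(df.dtypes)
print(clean.tolist())
```

`astype` returns a new object rather than modifying the column in place, which is why the result is assigned back.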

pandas aims to stay compatible with NumPy and uses NumPy dtypes by default. It also offers several extension data types, which can improve memory use and performance for non-numerical data. One of the most popular is `Categorical` (dtype `'category'`).

Reference: [Pandas Arrays and data types](https://pandas.pydata.org/docs/reference/api/pandas.array.html)
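To see what an extension type adds over the NumPy default, here is a small sketch comparing a NumPy-backed column with the nullable integer (`'Int64'`) and string (`'string'`) extension types, plus `pd.array` for building an extension array directly. The values are hypothetical and chosen only to include a missing entry.

```python
import pandas as pd

# NumPy int64 cannot hold a missing value, so None forces float64 with NaN
numpy_backed = pd.Series([13, 5, None])

# The nullable integer extension type keeps integers and marks the gap as pd.NA
nullable = pd.Series([13, 5, None], dtype='Int64')

# A dedicated string extension type instead of the generic object dtype
names = pd.Series(['Timón', 'Corralejos'], dtype='string')

# pd.array builds an extension array directly
codes = pd.array([211, 212, None], dtype='Int64')

print(numpy_backed.dtype)
print(nullable.dtype)
print(names.dtype)
print(codes)
```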

## Converting integers to categorical data

```python
air_quality['neighborhood_id'] = air_quality['neighborhood_id'].astype('category')
# Alternative: air_quality['neighborhood_id'] = pd.Categorical(air_quality['neighborhood_id'])
air_quality.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   neighborhood     5 non-null      object
 1   neighborhood_id  5 non-null      category
 2   air_quality      5 non-null      int64
dtypes: category(1), int64(1), object(1)
memory usage: 425.0+ bytes
```
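In the `info()` output, memory usage grows from 248.0+ to 425.0+ bytes after the conversion. That is likely because a categorical keeps a separate array of categories alongside its per-row codes, and on only five distinct rows that overhead outweighs any saving. The benefit shows up when a few distinct values repeat many times. A sketch, assuming a hypothetical column of the five neighborhood names repeated 10,000 times each:

```python
import pandas as pd

# Hypothetical column: 5 distinct names repeated 10,000 times each
names = ['Alameda de Osuna', 'Aeropuerto', 'Casco Histórico de Barajas', 'Timón', 'Corralejos']
as_object = pd.Series(names * 10_000, dtype=object)
as_category = as_object.astype('category')

# deep=True also counts the Python string objects, not just the pointers
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# A categorical stores each distinct value once ...
print(as_category.cat.categories.tolist())
# ... plus one small integer code per row (int8 when there are few categories)
print(as_category.cat.codes.dtype)

print(object_bytes, category_bytes)
```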
