### Astype method
* The astype method converts a Series's values to a specified type.
* Pass the target type either as a string (for example 'int64' or 'category') or as a core Python type (for example int or str).
* NaN is a floating-point value, so pandas cannot convert a column that contains NaN to an integer type. We need to drop or replace the missing values before performing the conversion.
* The dtypes attribute of a DataFrame returns a Series listing each column and its type.
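A minimal sketch of the two ways to name the target type, run on a small made-up Series rather than the Chicago dataset. Note that astype(int) maps to NumPy's default integer, which is platform-dependent. That is likely why the Ward column below ends up as int32 (the default on Windows before NumPy 2.0). Passing the string 'int64' fixes the width explicitly.

```python
import pandas as pd

# A small stand-in Series; the values are made up for illustration
ward = pd.Series([16.0, 5.0, 39.0])

as_int_from_string = ward.astype('int64')  # type given as a string
as_int_from_type = ward.astype(int)        # type given as a core Python type (platform default width)
as_str = ward.astype(str)                  # floats become strings such as '16.0'

# On a DataFrame, dtypes returns a Series mapping each column name to its dtype
frame = pd.DataFrame({'Ward': ward, 'Arrest': [True, False, True]})
print(frame.dtypes)
```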

In [11]:
import pandas as pd

In [12]:
dataframe = pd.read_csv("Crime Prediction in Chicago_Dataset.csv").dropna(how='all')

In [13]:
dataframe.dtypes

ID                        int64
Case Number              object
Date                     object
Block                    object
IUCR                     object
Primary Type             object
Description              object
Location Description     object
Arrest                     bool
Domestic                   bool
Beat                      int64
District                  int64
Ward                    float64
Community Area            int64
FBI Code                 object
X Coordinate            float64
Y Coordinate            float64
Year                      int64
Updated On               object
Latitude                float64
Longitude               float64
Location                 object
dtype: object

In [14]:
dataframe.isnull().values.any()

True

In [15]:
dataframe.columns.values

array(['ID', 'Case Number', 'Date', 'Block', 'IUCR', 'Primary Type',
       'Description', 'Location Description', 'Arrest', 'Domestic',
       'Beat', 'District', 'Ward', 'Community Area', 'FBI Code',
       'X Coordinate', 'Y Coordinate', 'Year', 'Updated On', 'Latitude',
       'Longitude', 'Location'], dtype=object)

In [21]:
# fill NaN values before using the astype method

fill_values = {
    "ID": 0,
    "Case Number": "Unknown",
    "Date": "Unknown",
    "Block": "Unknown",
    "IUCR": "Unknown",
    "Primary Type": "Unknown",
    "Description": "Unknown",
    "Location Description": "Unknown",
    "Arrest": "Unknown",
    "Domestic": "Unknown",
    "Beat": 0,
    "District": 0,
    "Ward": 0,
    "Community Area": 0,
    "FBI Code": "Unknown",
    "X Coordinate": 0,
    "Y Coordinate": 0,
    "Year": 0,
    "Updated On": "Unknown",
    "Latitude": 0,
    "Longitude": 0,
    "Location": "0"
}
dataframe = dataframe.fillna(fill_values)
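Why the fill step is needed: a self-contained sketch on a made-up Series (not the Chicago data) showing that casting NaN to an integer type fails, followed by three ways around it. Filling with a sentinel such as 0 is what the cell above does. Dropping rows and the nullable 'Int64' dtype are alternatives the notebook does not use.

```python
import numpy as np
import pandas as pd

# A float column with one missing value, standing in for 'Ward'
ward = pd.Series([16.0, np.nan, 39.0])

cast_error = None
try:
    ward.astype(int)
except ValueError as err:  # pandas refuses to cast NaN to an integer dtype
    cast_error = err
    print('astype failed:', err)

filled = ward.fillna(0).astype(int)   # replace NaN with a sentinel, then cast
dropped = ward.dropna().astype(int)   # or drop the rows with missing values
nullable = ward.astype('Int64')       # or keep the gap as <NA> with pandas' nullable integer dtype
```

The sentinel approach keeps every row, but 0 then stands for "unknown", which matters for columns where 0 could be a real value.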

In [22]:
dataframe.isnull().values.any()

False

In [24]:
dataframe['Ward'] = dataframe['Ward'].astype(int)

In [25]:
dataframe['Ward']

0        16
1         5
2        39
3        17
4         4
         ..
45418    14
45419    49
45420     6
45421    18
45422     7
Name: Ward, Length: 45423, dtype: int32

In [26]:
dataframe.dtypes

ID                        int64
Case Number              object
Date                     object
Block                    object
IUCR                     object
Primary Type             object
Description              object
Location Description     object
Arrest                     bool
Domestic                   bool
Beat                      int64
District                  int64
Ward                      int32
Community Area            int64
FBI Code                 object
X Coordinate            float64
Y Coordinate            float64
Year                      int64
Updated On               object
Latitude                float64
Longitude               float64
Location                 object
dtype: object

### Astype method II
* The **category** type is ideal for columns with a small number of unique values relative to the number of rows.
* The **nunique** method returns the number of unique values: a single integer when called on a Series, or a Series with one count per column when called on a DataFrame.
* With categories, pandas does not store a separate value in memory for each 'cell'. Instead, it stores each unique value once and gives every cell a small integer code that points to that single copy.
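A small sketch of the points above, using a made-up column of three crime labels repeated 10,000 times (not the real dataset). It shows nunique on a Series versus a DataFrame, the integer codes behind a categorical, and the memory difference reported by memory_usage(deep=True).

```python
import pandas as pd

# Hypothetical low-cardinality column: 3 distinct labels repeated 10,000 times
labels = pd.Series(['THEFT', 'BATTERY', 'ROBBERY'] * 10_000)

print(labels.nunique())  # on a Series, nunique returns a single int: 3

frame = pd.DataFrame({'Primary Type': labels, 'Row': range(len(labels))})
print(frame.nunique())   # on a DataFrame, nunique returns a Series: one count per column

as_category = labels.astype('category')

# Each unique label is stored once; every cell holds an integer code into that list
print(as_category.cat.categories.tolist())     # ['BATTERY', 'ROBBERY', 'THEFT']
print(as_category.cat.codes.head(3).tolist())  # [2, 0, 1]

# deep=True also counts the Python string objects held by the object column
object_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(object_bytes, category_bytes)
```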

In [31]:
dataframe['Primary Type'].nunique()

31

In [32]:
dataframe.nunique()

ID                      45423
Case Number             45419
Date                    28199
Block                   16626
IUCR                      270
Primary Type               31
Description               251
Location Description      108
Arrest                      2
Domestic                    2
Beat                      274
District                   23
Ward                       51
Community Area             77
FBI Code                   26
X Coordinate            23501
Y Coordinate            26562
Year                        1
Updated On                954
Latitude                31719
Longitude               31715
Location                31733
dtype: int64

In [33]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45423 entries, 0 to 45422
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   ID                    45423 non-null  int64  
 1   Case Number           45423 non-null  object 
 2   Date                  45423 non-null  object 
 3   Block                 45423 non-null  object 
 4   IUCR                  45423 non-null  object 
 5   Primary Type          45423 non-null  object 
 6   Description           45423 non-null  object 
 7   Location Description  45423 non-null  object 
 8   Arrest                45423 non-null  bool   
 9   Domestic              45423 non-null  bool   
 10  Beat                  45423 non-null  int64  
 11  District              45423 non-null  int64  
 12  Ward                  45423 non-null  int32  
 13  Community Area        45423 non-null  int64  
 14  FBI Code              45423 non-null  object 
 15  X Coordinate       

In [34]:
dataframe['Primary Type'].astype('category')

0            OTHER OFFENSE
1              SEX OFFENSE
2              SEX OFFENSE
3        WEAPONS VIOLATION
4                    THEFT
               ...        
45418              ROBBERY
45419      CRIMINAL DAMAGE
45420    WEAPONS VIOLATION
45421      CRIMINAL DAMAGE
45422              BATTERY
Name: Primary Type, Length: 45423, dtype: category
Categories (31, object): ['ARSON', 'ASSAULT', 'BATTERY', 'BURGLARY', ..., 'SEX OFFENSE', 'STALKING', 'THEFT', 'WEAPONS VIOLATION']

In [37]:
# convert the low-cardinality columns (see the nunique counts above) to the category type
category_columns = ['Primary Type', 'Location Description', 'Arrest', 'Domestic',
                    'Beat', 'District', 'Ward', 'Community Area', 'FBI Code']
for column in category_columns:
    dataframe[column] = dataframe[column].astype('category')
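The cell above picks the category columns by hand after reading the nunique counts. One possible way to automate that choice is sketched below. The helper name convert_low_cardinality and the 5% unique-to-rows threshold are hypothetical choices for illustration, not a pandas API and not the method used in this notebook.

```python
import pandas as pd

def convert_low_cardinality(frame: pd.DataFrame, max_ratio: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper: return a copy in which every column whose
    unique-value count divided by the row count is at most max_ratio
    has been cast to the category dtype."""
    result = frame.copy()
    ratios = result.nunique() / len(result)  # Series: column name -> unique/rows ratio
    for column in ratios[ratios <= max_ratio].index:
        result[column] = result[column].astype('category')
    return result

# Usage on a small made-up frame: ID is unique per row, District repeats 4 values
sample = pd.DataFrame({'ID': range(100), 'District': [1, 2, 3, 4] * 25})
converted = convert_low_cardinality(sample, max_ratio=0.05)
print(converted.dtypes)  # District becomes category (4/100 = 0.04); ID stays an integer
```

A ratio threshold is only a heuristic: a column such as Year, with a single value, would also be converted, which may or may not be what you want.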

In [38]:
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45423 entries, 0 to 45422
Data columns (total 22 columns):
 #   Column                Non-Null Count  Dtype   
---  ------                --------------  -----   
 0   ID                    45423 non-null  int64   
 1   Case Number           45423 non-null  object  
 2   Date                  45423 non-null  object  
 3   Block                 45423 non-null  object  
 4   IUCR                  45423 non-null  object  
 5   Primary Type          45423 non-null  category
 6   Description           45423 non-null  object  
 7   Location Description  45423 non-null  category
 8   Arrest                45423 non-null  category
 9   Domestic              45423 non-null  category
 10  Beat                  45423 non-null  category
 11  District              45423 non-null  category
 12  Ward                  45423 non-null  category
 13  Community Area        45423 non-null  category
 14  FBI Code              45423 non-null  category
 15  X 