## Different Ways to Change Data Type in Pandas

While working with a pandas DataFrame or any other table-like data structure, we often need to change the data type (dtype) of a column, also called type casting: for example, converting int to string or string to int.

In pandas, you can do this with several methods, such as ```astype()```, ```to_numeric()```, ```convert_dtypes()```, and ```infer_objects()```.

In today's class, I will walk through examples of how to change or convert data types in a pandas DataFrame: converting all columns to a specific type, converting a single column or multiple columns, and converting to numeric types.

### 1. Quick Examples of Changing Data Type

Below are some quick examples of converting column data types in a pandas DataFrame.

In [2]:
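# Quick examples of converting data types in pandas
# (each line assumes an existing DataFrame named df)
df2 = df.convert_dtypes()                            # infer the best nullable dtypes
df = df.astype(str)                                  # cast every column to string
df = df.astype({"Fee": int, "Discount": float})      # cast selected columns
df = df.astype({"Courses": int}, errors='ignore')    # return df unchanged if the cast fails
df = df.infer_objects()                              # soft-convert object columns
df['Fee'] = pd.to_numeric(df['Fee'])                 # convert one column to numeric
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].apply(pd.to_numeric)  # several columns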

**Now let’s see this with an example. First, create a pandas DataFrame with the columns Courses, Fee, Duration, and Discount.**

In [3]:
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30day','40days','35days', '40days','60days','50days','55days'],
    'Discount':[11.8,23.7,13.4,15.7,12.5,25.4,18.4]
    }
df = pd.DataFrame(technologies)
print(df.dtypes)

Courses       object
Fee            int64
Duration      object
Discount     float64
dtype: object


### 2. DataFrame.convert_dtypes() to Convert Data Type in Pandas

```convert_dtypes()``` has been available on pandas DataFrame since version 1.0.0. It automatically converts each column to the best possible dtype that supports ```pd.NA```, such as ```string```, ```Int64```, or ```Float64```.

Below is the syntax of pandas.DataFrame.convert_dtypes().

    # Syntax of DataFrame.convert_dtypes
    DataFrame.convert_dtypes(infer_objects=True, convert_string=True,
      convert_integer=True, convert_boolean=True, convert_floating=True)

In [4]:
# Convert all types to best possible types
df2 = df.convert_dtypes()
print(df2.dtypes)

Courses       string
Fee            Int64
Duration      string
Discount     Float64
dtype: object
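The nullable dtypes shown above matter most when a column has missing values. As a minimal sketch, assuming a small made-up Fee column with one missing entry (not part of the DataFrame above), convert_dtypes() turns a float64 column of whole numbers into Int64 and keeps the gap as a missing value:

```python
import pandas as pd

# Hypothetical data: an integer-like column with one missing value
df_missing = pd.DataFrame({"Fee": [20000, None, 26000]})

# The missing value forces the default dtype to float64
print(df_missing.dtypes)

# convert_dtypes() picks the nullable Int64 dtype instead
converted = df_missing.convert_dtypes()
print(converted.dtypes)
print(converted)
```

The values stay whole numbers, and the missing entry is still reported by isna().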


In [5]:
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :['20000','25000','26000','22000','24000','21000','22000'],
    'Duration':['30day','40days','35days', '40days','60days','50days','55days'],
    'Discount':[11.8,23.7,13.4,15.7,12.5,25.4,18.4]
    }
df = pd.DataFrame(technologies)
print(df.dtypes)

Courses       object
Fee           object
Duration      object
Discount     float64
dtype: object


In [6]:
# Convert all types to best possible types
df2 = df.convert_dtypes()
print(df2.dtypes)

Courses       string
Fee           string
Duration      string
Discount     Float64
dtype: object
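As the output above shows, convert_dtypes() does not parse numbers out of strings: when Fee holds values like '20000', the column becomes string, not Int64. One possible way around this, sketched here on a shortened copy of the data, is to parse the column with pandas.to_numeric() first and then call convert_dtypes():

```python
import pandas as pd

# Shortened copy of the tutorial data; Fee is stored as strings
df_str = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": ["20000", "25000", "26000"],
    "Discount": [11.8, 23.7, 13.4],
})

# convert_dtypes() alone keeps Fee as a string column
as_is = df_str.convert_dtypes()

# Parse the strings first, then let convert_dtypes() pick the nullable dtype
parsed = df_str.assign(Fee=pd.to_numeric(df_str["Fee"])).convert_dtypes()

print(as_is.dtypes)
print(parsed.dtypes)
```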


### 3. DataFrame.astype() to Change Data Type in Pandas

In a pandas DataFrame, use <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html#">DataFrame.astype()</a> to convert one or more columns to another type. When you call astype() with a single type, it converts every column to that type. To convert specific columns, pass a dictionary that names each column and its target type.

Below is the syntax of pandas.DataFrame.astype().

    # Syntax of DataFrame.astype()
    DataFrame.astype(dtype, copy=True, errors='raise')
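
The errors parameter in the syntax above controls what happens when a value cannot be cast. Here is a minimal sketch of the default errors='raise' behavior, using a small made-up DataFrame, not the tutorial's own:

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
demo = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": ["20000", "25000"]})

# Numeric strings cast cleanly
fees = demo["Fee"].astype("int64")
print(fees.dtype)

# Non-numeric strings raise ValueError under the default errors='raise'
try:
    demo["Courses"].astype("int64")
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print(f"Cast failed: {exc}")
```

Catching the exception explicitly makes a failed cast visible, whereas errors='ignore' silently returns the original data.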

### 3.1 Change All Columns to Same type in Pandas

Calling df.astype(str) converts all columns of a pandas DataFrame to string type.

In [8]:
# Change a specific column to a given type

df2 = df2.astype({'Fee':int})
df2.dtypes

Courses       string
Fee            int32
Duration      string
Discount     Float64
dtype: object

In [9]:
# Change All Columns to Same type
df3 = df.astype(str)
print(df3.dtypes)

Courses      object
Fee          object
Duration     object
Discount     object
dtype: object


### 3.2 Change Type For One or Multiple Columns in Pandas

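To change one or multiple columns, pass astype() a dictionary with each column name as the key and the target type as the value. The example below casts the Fee column to int and the Discount column to float. Note that int maps to the platform's default integer type, so the result is int32 on Windows with NumPy versions before 2.0 and int64 elsewhere.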

In [10]:
# Change Type For One or Multiple Columns
df4 = df.astype({"Fee": int, "Discount": float})
print(df4.dtypes)

Courses       object
Fee            int32
Duration      object
Discount     float64
dtype: object
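
If the int32 in the output above is a concern, one option is to name the exact width instead of passing the Python int type. This is a sketch on a shortened copy of the data, and the target types int64 and float32 are chosen only for illustration:

```python
import pandas as pd

# Shortened copy of the tutorial data; Fee is stored as strings
courses = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": ["20000", "25000"],
    "Discount": [11.8, 23.7],
})

# String dtype names pin the exact width, independent of the platform
fixed = courses.astype({"Fee": "int64", "Discount": "float32"})
print(fixed.dtypes)
```

Columns that are not listed in the dictionary, such as Courses, are left unchanged.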


### 3.3 Convert Data Type for All Columns in a List

Sometimes you may need to convert a list of DataFrame columns to a specific type. You can achieve this in several ways. Below are three different ways to convert the Fee and Discount columns to float type.

In [12]:
# Convert Data Type for All Columns in a List
df = pd.DataFrame(technologies)
cols = ['Fee', 'Discount']
df[cols] = df[cols].astype('float')
print(df.dtypes)

# By using a loop
for col in ['Fee', 'Discount']:
    df[col] = df[col].astype('float')
print(df.dtypes)

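# By using apply() and astype() together; assign the result back,
# because apply() returns a new DataFrame and does not modify df
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].apply(lambda x: x.astype('float'))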
print(df.dtypes)

Courses       object
Fee          float64
Duration      object
Discount     float64
dtype: object
Courses       object
Fee          float64
Duration      object
Discount     float64
dtype: object
Courses       object
Fee          float64
Duration      object
Discount     float64
dtype: object


### 4. Using pandas.to_numeric() to Convert Numeric Types

pandas.to_numeric() converts a column (or any list, tuple, 1-d array, or Series) with a non-numeric dtype to a suitable numeric type. By default, the result is int64 or float64, depending on the data.
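
Two optional parameters of pandas.to_numeric() are often useful: errors='coerce' turns values that cannot be parsed into NaN instead of raising, and downcast asks for the smallest numeric dtype that fits. Here is a brief sketch on made-up Series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical column with one value that is not a number
raw = pd.Series(["20000", "25000", "n/a"])

# errors='coerce' replaces unparseable values with NaN, so the result is float64
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)

# downcast='integer' picks the smallest signed integer dtype that fits the values
small = pd.to_numeric(pd.Series(["1", "2", "3"]), downcast="integer")
print(small.dtype)
```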

### 4.1 Convert Numeric Types

The example below converts only the Fee column to a numeric type.

In [10]:
# Converts fee column to numeric type
df['Fee'] = pd.to_numeric(df['Fee'])
df

|   | Courses | Fee | Duration | Discount |
|---|---|---|---|---|
| 0 | Spark | 20000 | 30day | 11.8 |
| 1 | PySpark | 25000 | 40days | 23.7 |
| 2 | Hadoop | 26000 | 35days | 13.4 |
| 3 | Python | 22000 | 40days | 15.7 |
| 4 | pandas | 24000 | 60days | 12.5 |
| 5 | Oracle | 21000 | 50days | 25.4 |
| 6 | Java | 22000 | 55days | 18.4 |


In [13]:
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :['20000','25000','26000','22000','24000','21000','22000'],
    'Duration':['30day','40days','35days', '40days','60days','50days','55days'],
    'Discount':[11.8,23.7,13.4,15.7,12.5,25.4,18.4]
    }
df = pd.DataFrame(technologies)
print(df.dtypes)

Courses       object
Fee           object
Duration      object
Discount     float64
dtype: object


In [14]:
df.Fee = pd.to_numeric(df.Fee)
df.dtypes

Courses       object
Fee            int64
Duration      object
Discount     float64
dtype: object
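
The same conversion can be applied to several columns at once by combining apply() with pandas.to_numeric(), as in the quick examples at the top. This is a minimal sketch on a shortened copy of the data, where Discount is also stored as strings for illustration:

```python
import pandas as pd

# Shortened copy of the tutorial data; both numeric columns are stored as strings
fees_df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": ["20000", "25000", "26000"],
    "Discount": ["11.8", "23.7", "13.4"],
})

# apply() calls pd.to_numeric on each selected column separately
cols = ["Fee", "Discount"]
fees_df[cols] = fees_df[cols].apply(pd.to_numeric)
print(fees_df.dtypes)
```

Each column gets its own dtype: whole numbers become int64 and decimals become float64.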