## Changing datatypes of pandas built-in types

- Applies to Series, DataFrames, and individual columns, which are types defined by the pandas library rather than by core Python

#### 3 options

- .astype(int|float|str)
- pd.to_numeric()
- pd.to_datetime()
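
Since pd.to_datetime() is listed above but not demonstrated in this notebook, here is a minimal sketch of it. The `dates` series is hypothetical sample data, and `errors='coerce'` is used so that an unparseable entry becomes NaT instead of raising.

```python
import pandas as pd

# Hypothetical sample data: two ISO dates and one unparseable entry.
dates = pd.Series(['2021-01-15', '2021-02-20', 'not a date'])

# errors='coerce' turns entries that cannot be parsed into NaT.
parsed = pd.to_datetime(dates, errors='coerce')

print(parsed.dtype)
print(parsed.isna().tolist())
```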


- Use pd.to_numeric() only for mixed columns (numbers mixed with strings or with NaNs)
- Otherwise use astype()
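
The rule of thumb above can be sketched as follows. The `mixed` series is made-up data for illustration: astype() aborts on the first value it cannot parse, while pd.to_numeric() with `errors='coerce'` replaces that value with NaN.

```python
import pandas as pd

# Hypothetical column: numeric strings mixed with a non-numeric entry.
mixed = pd.Series(['1', '2', 'three'])

# astype is strict: one unparseable value aborts the whole conversion.
try:
    mixed.astype(int)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns unparseable values into NaN instead.
coerced = pd.to_numeric(mixed, errors='coerce')

print(astype_failed)
print(coerced.tolist())
```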

In [62]:
import pandas as pd
df=pd.DataFrame({'Name':['Sahil', 'Sonia', 'Sourav', 'Vishal'],
        'Age':[20, 21, 19, 18],
        'Number':[100,200,300,400],
        'Weight':[12.2,12.4,43.4,45.1],
        'Height':[12.2,44.3,76.8,12.3]        })
df

     Name  Age  Number  Weight  Height
0   Sahil   20     100    12.2    12.2
1   Sonia   21     200    12.4    44.3
2  Sourav   19     300    43.4    76.8
3  Vishal   18     400    45.1    12.3


### Change datatype of one/multiple columns

In [64]:
# ASTYPE()
# Change datatype of one column:
df_new=df['Age'].astype(int)

# Change datatype of multiple columns:
df_new=df.astype({'Age': 'float64', 'Name': 'object'})

# PD.TO_NUMERIC()
# Change datatype of one column:
df['Age']=pd.to_numeric(df['Age'])

# convert just columns "Weight" & "Height"
df[["Weight", "Height"]] = df[["Weight", "Height"]].apply(pd.to_numeric)

In [65]:
df.dtypes

Name       object
Age         int64
Number      int64
Weight    float64
Height    float64
dtype: object

### Select and convert float columns only

In [66]:
float_cols = df.select_dtypes(include=['float64']) # This will select float columns only
print(list(float_cols.columns.values))
print()
print(float_cols)

for col in float_cols.columns.values:
    df[col] = df[col].astype('int64')
    
# or refer to string functions notebook

['Weight', 'Height']

   Weight  Height
0    12.2    12.2
1    12.4    44.3
2    43.4    76.8
3    45.1    12.3


In [56]:
df.dtypes

Name      object
Age        int64
Number     int64
Weight     int64
Height     int64
dtype: object

### Snippet to convert datatypes while catching the errors as well

In [45]:
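try:
    df['Age'] = df['Age'].astype(float)
except (ValueError, TypeError) as e:
    # astype raises ValueError for values it cannot parse (e.g. 'abc')
    # and TypeError for unsupported conversions
    print(e)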
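
To extend the same idea to several columns, one possible sketch is a small helper that converts each column separately and records the failures. The name `convert_columns` and the `demo` frame are hypothetical and not part of pandas or of this notebook.

```python
import pandas as pd

def convert_columns(frame, dtype_map):
    """Convert each column separately; return the new frame and a dict of failures."""
    result = frame.copy()
    failures = {}
    for col, dtype in dtype_map.items():
        try:
            result[col] = result[col].astype(dtype)
        except (ValueError, TypeError) as exc:
            failures[col] = str(exc)  # the column keeps its original values
    return result, failures

# Hypothetical data: 'Age' holds numeric strings, 'Name' cannot become a float.
demo = pd.DataFrame({'Age': ['20', '21'], 'Name': ['Sahil', 'Sonia']})
converted, failed = convert_columns(demo, {'Age': 'int64', 'Name': 'float64'})

print(converted.dtypes)
print(failed)
```

Converting column by column means one bad column does not block the others, which a single `df.astype({...})` call would not allow.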

### Use case 1: Numbers as strings

In [1]:
l=['1','2','3','4'] # String gets converted into object
df=pd.DataFrame(l,columns =['Numbers'])

In [2]:
df

   Numbers
0        1
1        2
2        3
3        4


In [3]:
df.Numbers.dtype

dtype('O')

In [4]:
# General form: df['string_col'] = df['string_col'].astype(int)  # or float, or str
df['Numbers']=pd.to_numeric(df['Numbers'])

In [5]:
df.Numbers.dtype   # DONE !!!

dtype('int64')

### Use case 2: Ints+Floats

In [6]:
l=[1,2,3,4,5.5] # Note: If only one float,entire column gets converted into float
df=pd.DataFrame(l,columns =['Numbers'])

In [7]:
df

   Numbers
0      1.0
1      2.0
2      3.0
3      4.0
4      5.5


In [8]:
df.Numbers.dtype  

dtype('float64')

#### Convert Float⇒Int

In [9]:
df['Numbers']=df['Numbers'].astype(int)

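- astype(int) truncates toward zero instead of rounding
    - i.e., 4.1→4, 4.6→4
- Fix: round to the nearest whole number first
- df['Numbers'].round(0).astype(int)
    - round(0) turns 4.1→4, 4.2→4, 4.6→5 (exact halves go to the nearest even number, so 4.5→4 and 5.5→6)

**or**

- pd.to_numeric(df['Numbers'])
    - returns floats ⇒ downcast to int (the downcast only happens if every value ends in .0)
    - So, to get integers:
        1. Round all float values to .0
        2. Downcast to integer
    - i.e., `pd.to_numeric(df['Numbers'].round(0), downcast='integer')` (the option is lowercase `'integer'`)
    - downcast picks the smallest integer dtype that fits the values (such as int8), not necessarily int64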
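
The three approaches can be compared side by side in a short sketch. The `values` series is hypothetical and includes a negative number to show that astype(int) truncates toward zero rather than rounding down.

```python
import pandas as pd

# Hypothetical sample values, including a negative number.
values = pd.Series([4.1, 4.6, -4.6])

truncated = values.astype(int)          # drops the fractional part
rounded = values.round(0).astype(int)   # rounds first, then converts
downcast = pd.to_numeric(values.round(0), downcast='integer')

print(truncated.tolist())  # [4, 4, -4]
print(rounded.tolist())    # [4, 5, -5]
print(downcast.dtype)      # a small integer dtype chosen by pandas
```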

In [10]:
df

   Numbers
0        1
1        2
2        3
3        4
4        5


### Use case 3: Strings + Ints

In [11]:
# Column mixed with Strings and Ints
l=[1,2,3,'Sahil'] # String + Numbers gets converted to object
df=pd.DataFrame(l,columns =['Numbers'])

In [12]:
df

   Numbers
0        1
1        2
2        3
3    Sahil


In [13]:
df.Numbers.dtype

dtype('O')

In [34]:
df['Numbers']=pd.to_numeric(df['Numbers'],errors='coerce')
# errors='coerce' converts unparseable strings to NaN and keeps the row
# NaN is a float, so the entire column becomes float64

# errors='ignore' would instead return the column unchanged when any value fails to parse
# (it does not drop rows, and it is deprecated in recent pandas releases)

In [15]:
df

   Numbers
0      1.0
1      2.0
2      3.0
3      NaN


In [16]:
df['Numbers']=df['Numbers'].astype('Int64') 
# 'Int64' (capital I) is pandas' nullable integer dtype: NaN becomes <NA> and the other values become ints

In [17]:
df

   Numbers
0        1
1        2
2        3
3     <NA>


### Use case 4: Ints + Nans

In [18]:
import pandas as pd
import numpy as np
l=[1,2,3,np.nan] # np.nan is a float, so even a single NaN makes the entire column float64
df=pd.DataFrame(l,columns =['Numbers'])

In [19]:
df

   Numbers
0      1.0
1      2.0
2      3.0
3      NaN


In [20]:
df.Numbers.dtype

dtype('float64')

In [21]:
# Now do the same manipulation as above
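
Spelled out for this use case, the manipulation could look like the sketch below. It rebuilds the same frame so the block runs on its own, and it assumes the goal is to keep the missing value while storing the rest as integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, np.nan], columns=['Numbers'])  # float64 because of NaN

# Nullable integer dtype: note the capital "I" in 'Int64'.
df['Numbers'] = df['Numbers'].astype('Int64')

print(df['Numbers'].dtype)
print(df['Numbers'].isna().tolist())
```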

### Use case 5 : Strings (currencies)

In [22]:
import pandas as pd
import numpy as np
l=['₹1,5000.00','₹2,500.00','₹3,2000.00'] 
df1=pd.DataFrame(l,columns =['Numbers'])

In [23]:
df1

      Numbers
0  ₹1,5000.00
1   ₹2,500.00
2  ₹3,2000.00


In [24]:
df1.Numbers.dtype

dtype('O')

In [29]:
df1['Numbers']=df1['Numbers'].str.replace('₹','').str.replace(',','').astype(float)

In [30]:
df1

   Numbers
0  15000.0
1   2500.0
2  32000.0


In [31]:
df1.dtypes

Numbers    float64
dtype: object

In [32]:
df1['Numbers']=df1['Numbers'].astype(int)

In [33]:
df1

   Numbers
0    15000
1     2500
2    32000
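
If the currency column may also contain entries that are not amounts at all, a slightly more defensive variant is sketched below. The `prices` series and its 'n/a' entry are hypothetical. A regular expression strips everything except digits, the decimal point, and a minus sign, and pd.to_numeric() then coerces whatever cannot be parsed to NaN.

```python
import pandas as pd

# Hypothetical data: two currency strings and one placeholder entry.
prices = pd.Series(['₹1,5000.00', '₹2,500.00', 'n/a'])

# Drop everything except digits, the decimal point and a minus sign.
cleaned = prices.str.replace(r'[^0-9.\-]', '', regex=True)

# Empty or malformed strings become NaN instead of raising.
amounts = pd.to_numeric(cleaned, errors='coerce')

print(amounts.tolist())
```

Compared with chaining one str.replace() per symbol, this handles other currency symbols and stray spaces without listing them individually.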
