# Inspecting Data Types
Use the space below to explore `data_08_v2.csv` and `data_18_v2.csv` and answer the quiz questions about data types. You should have created these data files in the previous section, *Filter, Drop Nulls, Dedupe*.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

a_df = pd.read_csv('data_08_v2.csv')
b_df = pd.read_csv('data_18_v2.csv')

# Pretty-print helper: ANSI escape codes for colored and styled terminal output
# ref: https://stackoverflow.com/questions/8924173/how-do-i-print-bold-text-in-python
class color:
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'  # resets all styling

In [44]:
a_df.head(3)

# Alternative ways to list the column names:
# list(a_df) or list(a_df.columns)

Unnamed: 0,model,displ,cyl,trans,drive,fuel,veh_class,air_pollution_score,city_mpg,hwy_mpg,cmb_mpg,greenhouse_gas_score,smartway
0,ACURA MDX,3.7,(6 cyl),Auto-S5,4WD,Gasoline,SUV,7,15,20,17,4,no
1,ACURA RDX,2.3,(4 cyl),Auto-S5,4WD,Gasoline,SUV,7,17,22,19,5,no
2,ACURA RL,3.5,(6 cyl),Auto-S5,4WD,Gasoline,midsize car,7,16,24,19,5,no


In [43]:
b_df.head(3)

# Alternative ways to list the column names:
# list(b_df) or list(b_df.columns)
# (a_df.columns == b_df.columns).all()  # should be True if all headers are the same

Unnamed: 0,model,displ,cyl,trans,drive,fuel,veh_class,air_pollution_score,city_mpg,hwy_mpg,cmb_mpg,greenhouse_gas_score,smartway
0,ACURA RDX,3.5,6.0,SemiAuto-6,2WD,Gasoline,small SUV,3,20,28,23,5,No
1,ACURA RDX,3.5,6.0,SemiAuto-6,4WD,Gasoline,small SUV,3,19,27,22,4,No
2,ACURA TLX,2.4,4.0,AMS-8,2WD,Gasoline,small car,3,23,33,27,6,No
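
The commented-out header check above compares the two column indexes element by element, which can raise an error when the frames have a different number of columns. Below is a minimal sketch of an alternative using `Index.equals` and `Index.symmetric_difference`. `a_small` and `b_small` are hypothetical stand-ins built from a few of the values shown above, not the real CSV files.

```python
import pandas as pd

# Hypothetical stand-ins for a_df and b_df, using a few columns from the rows above.
a_small = pd.DataFrame({"model": ["ACURA MDX"], "displ": [3.7], "cyl": ["(6 cyl)"]})
b_small = pd.DataFrame({"model": ["ACURA RDX"], "displ": [3.5], "cyl": [6.0]})

# Index.equals returns a single bool, even when the two indexes differ in length.
same_headers = a_small.columns.equals(b_small.columns)
print(same_headers)

# symmetric_difference lists the labels that appear in only one of the two frames.
only_in_one = a_small.columns.symmetric_difference(b_small.columns)
print(list(only_in_one))
```

If `same_headers` is false, `only_in_one` shows which headers to look at first.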


Creating an object for single-column lookups
- an interesting lesson on checking data types:

https://thispointer.com/how-to-get-check-data-types-of-dataframe-columns-in-python-pandas/#:~:text=Use%20Dataframe.,information%20of%20each%20columns%20i.e.&text=It%20returns%20a%20series%20object%20containing%20data%20type%20information%20of%20each%20column.

In [27]:
dict(a_df.dtypes)

{'model': dtype('O'),
 'displ': dtype('float64'),
 'cyl': dtype('O'),
 'trans': dtype('O'),
 'drive': dtype('O'),
 'fuel': dtype('O'),
 'veh_class': dtype('O'),
 'air_pollution_score': dtype('O'),
 'city_mpg': dtype('O'),
 'hwy_mpg': dtype('O'),
 'cmb_mpg': dtype('O'),
 'greenhouse_gas_score': dtype('O'),
 'smartway': dtype('O')}
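
Since `dtypes` is a Series indexed by column name, the two datasets can also be lined up side by side. The sketch below assumes small stand-in frames (`a_small`, `b_small`) in place of the real files. The 2008 scores are entered as strings only to mimic the `object` dtype reported above, and the names `dtype_table` and `mismatched` are illustrative.

```python
import pandas as pd

# Hypothetical stand-ins; values follow the rows shown earlier in this notebook.
a_small = pd.DataFrame({
    "displ": [3.7, 2.3],
    "cyl": ["(6 cyl)", "(4 cyl)"],
    "air_pollution_score": ["7", "7"],  # strings, to mimic the object dtype
})
b_small = pd.DataFrame({
    "displ": [3.5, 2.4],
    "cyl": [6.0, 4.0],
    "air_pollution_score": [3, 3],
})

# One row per column, one dtype name per dataset.
dtype_table = pd.DataFrame({
    "data_08": a_small.dtypes.astype(str),
    "data_18": b_small.dtypes.astype(str),
})
dtype_table["match"] = dtype_table["data_08"] == dtype_table["data_18"]
print(dtype_table)

# Columns whose dtype differs between the two datasets.
mismatched = dtype_table.index[~dtype_table["match"]].tolist()
print(mismatched)
```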

# Inspecting Data

Questions asked
1. Check out ```cyl``` data type
2. Check out ```air_pollution_score``` data type
3. Which features need to be converted from strings to floats?
4. Check out ```greenhouse_gas_score``` data type

In [42]:
# check dataset 1 and 2
print(color.BOLD + color.UNDERLINE + '\n-----DATASET 1-----\n' + color.END)
print(a_df.dtypes['cyl'])

print(color.BOLD + color.UNDERLINE + '\n-----DATASET 2-----\n' + color.END)
print(b_df.dtypes['cyl'])

a_df.head()

[1m[4m
-----DATASET 1-----
[0m
object
[1m[4m
-----DATASET 2-----
[0m
float64


Unnamed: 0,model,displ,cyl,trans,drive,fuel,veh_class,air_pollution_score,city_mpg,hwy_mpg,cmb_mpg,greenhouse_gas_score,smartway
0,ACURA MDX,3.7,(6 cyl),Auto-S5,4WD,Gasoline,SUV,7,15,20,17,4,no
1,ACURA RDX,2.3,(4 cyl),Auto-S5,4WD,Gasoline,SUV,7,17,22,19,5,no
2,ACURA RL,3.5,(6 cyl),Auto-S5,4WD,Gasoline,midsize car,7,16,24,19,5,no
3,ACURA TL,3.2,(6 cyl),Auto-S5,2WD,Gasoline,midsize car,7,18,26,21,6,yes
4,ACURA TL,3.5,(6 cyl),Auto-S5,2WD,Gasoline,midsize car,7,17,26,20,6,yes
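
The output above says `cyl` is `object` in one dataset and `float64` in the other. A quick way to see why, sketched here on hypothetical Series copied from the rows shown above, is to count the Python type of each element and look at the distinct values.

```python
import pandas as pd

# Hypothetical samples of the cyl column, copied from the head() output above.
cyl_08 = pd.Series(["(6 cyl)", "(4 cyl)", "(6 cyl)"], name="cyl")
cyl_18 = pd.Series([6.0, 6.0, 4.0], name="cyl")

# Count how many elements of each Python type the column holds.
types_08 = cyl_08.map(type).value_counts()
types_18 = cyl_18.map(type).value_counts()
print(types_08)
print(types_18)

# The distinct values show the text pattern that keeps the 2008 column non-numeric.
print(cyl_08.unique())
```

On the real data the same two calls would be `a_df['cyl'].map(type).value_counts()` and `a_df['cyl'].unique()`.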


In [30]:
# check dataset 1 and 2
print(color.BOLD + color.UNDERLINE + '\n-----DATASET 1-----\n' + color.END)
print(a_df.dtypes['air_pollution_score'])

print(color.BOLD + color.UNDERLINE + '\n-----DATASET 2-----\n' + color.END)
print(b_df.dtypes['air_pollution_score'])

[1m[4m
-----DATASET 1-----
[0m
object
[1m[4m
-----DATASET 2-----
[0m
int64


In [34]:
# check dataset 1 and 2
print(color.BOLD + color.UNDERLINE + '\n-----DATASET 1-----\n' + color.END)
print(a_df.dtypes)

print(color.BOLD + color.UNDERLINE + '\n-----DATASET 2-----\n' + color.END)
print(b_df.dtypes)

# Possible improvement: use select_dtypes to show only the columns of a given dtype
# a_df.select_dtypes(include=['float64'])

[1m[4m
-----DATASET 1-----
[0m
model                    object
displ                   float64
cyl                      object
trans                    object
drive                    object
fuel                     object
veh_class                object
air_pollution_score      object
city_mpg                 object
hwy_mpg                  object
cmb_mpg                  object
greenhouse_gas_score     object
smartway                 object
dtype: object
[1m[4m
-----DATASET 2-----
[0m
model                    object
displ                   float64
cyl                     float64
trans                    object
drive                    object
fuel                     object
veh_class                object
air_pollution_score       int64
city_mpg                 object
hwy_mpg                  object
cmb_mpg                  object
greenhouse_gas_score      int64
smartway                 object
dtype: object
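
The `select_dtypes` idea noted in the cell above could look roughly like this. `b_small` is a hypothetical stand-in for `b_df` with only four of its columns, and the list names are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for b_df, using values from the rows shown earlier.
b_small = pd.DataFrame({
    "model": ["ACURA RDX", "ACURA TLX"],
    "displ": [3.5, 2.4],
    "cyl": [6.0, 4.0],
    "air_pollution_score": [3, 3],
})

# Column names grouped by dtype.
float_cols = b_small.select_dtypes(include=["float64"]).columns.tolist()
int_cols = b_small.select_dtypes(include=["int64"]).columns.tolist()

# "number" covers every numeric dtype; exclude= returns everything else.
numeric_cols = b_small.select_dtypes(include="number").columns.tolist()
non_numeric_cols = b_small.select_dtypes(exclude="number").columns.tolist()

print(float_cols, int_cols, non_numeric_cols)
```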


In [35]:
# check dataset 1 and 2
print(color.BOLD + color.UNDERLINE + '\n-----DATASET 1-----\n' + color.END)
print(a_df.dtypes['greenhouse_gas_score'])

print(color.BOLD + color.UNDERLINE + '\n-----DATASET 2-----\n' + color.END)
print(b_df.dtypes['greenhouse_gas_score'])

[1m[4m
-----DATASET 1-----
[0m
object
[1m[4m
-----DATASET 2-----
[0m
int64
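
When a score column comes back as `object`, one way to find the entries responsible is to attempt a numeric conversion and keep the values that fail. This is only a sketch: the Series below is invented for illustration (the notebook does not show which 2008 entries are non-numeric), and `"unknown"` is a made-up placeholder value.

```python
import pandas as pd

# Invented example values; "unknown" is a placeholder, not taken from the data.
scores = pd.Series(["4", "5", "unknown", "6"], name="greenhouse_gas_score")

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
as_numbers = pd.to_numeric(scores, errors="coerce")

# Entries that were present originally but failed the conversion.
bad_entries = scores[as_numbers.isna() & scores.notna()]
print(bad_entries.tolist())
```

Running the same steps on `a_df['greenhouse_gas_score']` would list the actual values that need cleaning before the column can be cast to an integer type.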
