# Numerical Data Lab

## Introduction
In this lab, we will use the `sklearn_pandas` library to transform the numerical data in the `car_data.csv` dataset. Let's get started.

In [1]:
import pandas as pd
url = "https://raw.githubusercontent.com/jigsawlabs-student/engineering-large-datasets/master/car_data.csv"
df = pd.read_csv(url)

In [4]:
df[:2]

|   | Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ritz | 6 years ago | $3.35 | $5.59 | 27000 | Petrol | Dealer | Manual | 0 |
| 1 | sx4 | 7 years ago | $4.75 | $9.54 | 43000 | Diesel | Dealer | Manual | 0 |


We can see that two of the columns, `Selling_Price` and `Present_Price`, hold numbers stored as text with a leading `$`. The dataset has few enough columns that we can spot them by eye, so let's just select them directly.

In [6]:
almost_nums_df = df[['Selling_Price', 'Present_Price']]
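On a wider dataset, spotting these columns by eye gets harder. Here is a minimal sketch that finds them programmatically. It assumes that every "almost numeric" column is one whose values all start with `$`. That heuristic fits this dataset but is not a general rule. The small `sample_df` frame is hypothetical and simply copies a few columns from the two rows shown above.

```python
import pandas as pd

# Hypothetical sample: a few columns copied from the first two rows above.
sample_df = pd.DataFrame({
    "Car_Name": ["ritz", "sx4"],
    "Year": ["6 years ago", "7 years ago"],
    "Selling_Price": ["$3.35", "$4.75"],
    "Present_Price": ["$5.59", "$9.54"],
    "Kms_Driven": [27000, 43000],
})

# Assumed heuristic: a column is "almost numeric" if every value starts with "$".
# astype(str) lets the same check run safely on non-text columns too.
almost_num_cols = [
    col
    for col in sample_df.columns
    if sample_df[col].astype(str).str.startswith("$").all()
]
print(almost_num_cols)  # ['Selling_Price', 'Present_Price']
```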

Now let's use a list comprehension to create one step per column, and then coerce the data with a `DataFrameMapper`.

> Try not to reference the previous reading at first.  Only look to it if you get stuck.

In [8]:
import pandas as pd
def price_to_num(val):
    return pd.to_numeric(val[1:])
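`price_to_num` works on a single string. `val[1:]` drops the first character (the `$`), and `pd.to_numeric` parses what is left. The snippet below runs a quick check on a value from the table above. It also shows a column-at-a-time pandas version for comparison. That `.str[1:]` variant is our suggested alternative, not part of the lab.

```python
import pandas as pd

def price_to_num(val):
    # Drop the leading "$" and parse the remainder as a number.
    return pd.to_numeric(val[1:])

print(price_to_num("$3.35"))  # 3.35

# Alternative (not the lab's approach): slice every string in a Series at once.
prices = pd.Series(["$3.35", "$4.75", "$7.25"])
price_nums = pd.to_numeric(prices.str[1:])
print(price_nums.tolist())  # [3.35, 4.75, 7.25]
```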

In [13]:
from sklearn_pandas import FunctionTransformer, DataFrameMapper

steps = [([col], FunctionTransformer(price_to_num)) 
             for col in almost_nums_df.columns]
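Each entry in `steps` is a `(columns, transformer)` tuple, one per column. Because each selector is a one-element list such as `['Selling_Price']`, the mapper hands the transformer a 2-D block of values rather than a flat column. We assume that `sklearn_pandas`'s `FunctionTransformer` calls the wrapped function once per cell (in the versions we have seen, through `np.vectorize`), which is why `price_to_num` can be written for a single string. Check that assumption against your installed version. The sketch below imitates the element-wise behavior with `np.vectorize` alone, without `sklearn_pandas`.

```python
import numpy as np
import pandas as pd

def price_to_num(val):
    return pd.to_numeric(val[1:])

# A 2-D block like the one a ['Selling_Price'] selector produces (assumed shape).
block = np.array([["$3.35"], ["$4.75"]], dtype=object)

# np.vectorize calls price_to_num once per element and keeps the 2-D shape.
result = np.vectorize(price_to_num)(block)
print(result.shape)             # (2, 1)
print(result.ravel().tolist())  # [3.35, 4.75]
```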

In [15]:
mapper = DataFrameMapper(
    steps, df_out = True
)

In [18]:
transformed_cols = mapper.fit_transform(df)
transformed_cols[:5]


|   | Selling_Price | Present_Price |
|---|---|---|
| 0 | 3.35 | 5.59 |
| 1 | 4.75 | 9.54 |
| 2 | 7.25 | 9.85 |
| 3 | 2.85 | 4.15 |
| 4 | 4.60 | 6.87 |
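To make the mapper's job concrete, here is a rough pandas-only approximation of `DataFrameMapper(steps, df_out=True)` with its default settings. It applies each step's function to its column and collects the results into a new dataframe, dropping every column that no step mentions. The helper `apply_steps` and the sample data are hypothetical. The helper ignores much of what the real mapper does, such as fitting and multi-column selectors.

```python
import pandas as pd

def price_to_num(val):
    return pd.to_numeric(val[1:])

def apply_steps(frame, steps):
    """Hypothetical stand-in for DataFrameMapper(steps, df_out=True).

    Each step is ([column_name], func); func is applied to every cell of
    that column. Columns not named in any step are dropped.
    """
    out = {}
    for cols, func in steps:
        col = cols[0]  # this sketch only handles one column per step
        out[col] = frame[col].map(func)
    return pd.DataFrame(out, index=frame.index)

sample_df = pd.DataFrame({
    "Car_Name": ["ritz", "sx4"],
    "Selling_Price": ["$3.35", "$4.75"],
    "Present_Price": ["$5.59", "$9.54"],
})
sketch_steps = [([col], price_to_num) for col in ["Selling_Price", "Present_Price"]]
mapped_df = apply_steps(sample_df, sketch_steps)
print(mapped_df)
```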


Next, keep all of the columns in our mapper's output, not only the ones we transform.

In [35]:
mapper = DataFrameMapper(steps, df_out = True, default = None)

In [46]:
coerced_df = mapper.fit_transform(df)
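In `sklearn_pandas`, passing `default=None` to `DataFrameMapper` passes the columns that no step selects through untouched. They appear after the transformed ones, which matches the column order in `coerced_df.dtypes` below. The sketch below imitates that behavior in plain pandas. The helper `apply_steps_keep_rest` and the sample data are hypothetical.

```python
import pandas as pd

def price_to_num(val):
    return pd.to_numeric(val[1:])

def apply_steps_keep_rest(frame, steps):
    """Hypothetical sketch of a mapper built with default=None.

    Each step is ([column_name], func) and func is applied cell by cell.
    Transformed columns come first; every other column follows in its
    original order, untouched.
    """
    selected = [cols[0] for cols, _ in steps]
    transformed = pd.DataFrame(
        {cols[0]: frame[cols[0]].map(func) for cols, func in steps},
        index=frame.index,
    )
    rest = frame.drop(columns=selected)
    return pd.concat([transformed, rest], axis=1)

sample_df = pd.DataFrame({
    "Car_Name": ["ritz", "sx4"],
    "Selling_Price": ["$3.35", "$4.75"],
    "Present_Price": ["$5.59", "$9.54"],
    "Kms_Driven": [27000, 43000],
})
sketch_steps = [([col], price_to_num) for col in ["Selling_Price", "Present_Price"]]
passthrough_df = apply_steps_keep_rest(sample_df, sketch_steps)
print(passthrough_df.columns.tolist())
# ['Selling_Price', 'Present_Price', 'Car_Name', 'Kms_Driven']
print(passthrough_df.dtypes)
```

Unlike the real mapper's output shown below, this sketch keeps `Kms_Driven` as `int64`, because `pd.concat` keeps each column's own dtype.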

### Working with Year

Now let's add `Year`. Its values look like `6 years ago`, so we can write a function that pulls out the leading number, wrap it in a `FunctionTransformer`, and add that step to our mapper.

In [71]:
def coerce_to_year(val):
    return pd.to_numeric(val.split()[0])
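Here is a quick check of `coerce_to_year` on values from the table above. Because the data says `6 years ago`, the result is an age in years rather than a calendar year. The `.str` version at the end is our suggested alternative, not part of the lab. It does the same split on a whole column at once.

```python
import pandas as pd

def coerce_to_year(val):
    # "6 years ago" -> ["6", "years", "ago"] -> "6" -> 6
    return pd.to_numeric(val.split()[0])

print(coerce_to_year("6 years ago"))  # 6

# Alternative (not the lab's approach): split every string in the Series at once.
years = pd.Series(["6 years ago", "7 years ago"])
year_nums = years.str.split().str[0].astype(int)
print(year_nums.tolist())  # [6, 7]
```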

Store the step in `coerce_step`.

In [88]:
coerce_step = (['Year'], FunctionTransformer(coerce_to_year))
coerce_step

(['Year'], FunctionTransformer(func=None))

Then create a list of our steps for converting the prices and the year.

In [89]:
comb_steps = steps + [coerce_step]
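Note the square brackets around `coerce_step`. It is a tuple, and `+` only joins a list to another list, so it has to be wrapped in a list first. `+` also builds a new list, leaving `steps` unchanged. The small check below demonstrates both points. The placeholder functions are hypothetical stand-ins for the `FunctionTransformer` objects, so the snippet runs without `sklearn_pandas`.

```python
def price_placeholder(val):
    # Hypothetical stand-in for FunctionTransformer(price_to_num).
    return float(val[1:])

def year_placeholder(val):
    # Hypothetical stand-in for FunctionTransformer(coerce_to_year).
    return int(val.split()[0])

sketch_steps = [([col], price_placeholder) for col in ["Selling_Price", "Present_Price"]]
sketch_coerce_step = (["Year"], year_placeholder)

# `+` returns a new, longer list; sketch_steps itself is not modified.
sketch_comb_steps = sketch_steps + [sketch_coerce_step]
print([cols for cols, _ in sketch_comb_steps])  # [['Selling_Price'], ['Present_Price'], ['Year']]
print(len(sketch_steps))  # 2

# Without the brackets, Python refuses to add a tuple to a list.
try:
    sketch_steps + sketch_coerce_step
except TypeError as err:
    print("TypeError:", err)
```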

And add the list of steps to the mapper.

In [83]:
mapper_with_num_converter = DataFrameMapper(
    comb_steps, 
    df_out = True,
)

In [86]:
price_year_df = mapper_with_num_converter.fit_transform(df)

price_year_df[:3]

|   | Selling_Price | Present_Price | Year |
|---|---|---|---|
| 0 | 3.35 | 5.59 | 6 |
| 1 | 4.75 | 9.54 | 7 |
| 2 | 7.25 | 9.85 | 3 |


### Keeping the rest

In [75]:
coerced_df.dtypes

Selling_Price    float64
Present_Price    float64
Car_Name          object
Year              object
Kms_Driven        object
Fuel_Type         object
Seller_Type       object
Transmission      object
Owner             object
dtype: object

Notice that `Kms_Driven` and `Owner` are now `object` columns, even though they were `int64` in our starting dataframe:

In [43]:
df.dtypes

Car_Name         object
Year             object
Selling_Price    object
Present_Price    object
Kms_Driven        int64
Fuel_Type        object
Seller_Type      object
Transmission     object
Owner             int64
dtype: object
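One likely explanation involves how NumPy arrays store data. This is our assumption about `sklearn_pandas` internals, not something the lab demonstrates. The idea is that the pass-through columns travel together as a single NumPy array, and such an array has one dtype for all its values. When strings and integers share an array, that dtype has to be `object`, and the integers lose their `int64` type. The sample data below is hypothetical and shows only the NumPy and pandas side of this effect.

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame({
    "Car_Name": ["ritz", "sx4"],
    "Kms_Driven": [27000, 43000],
})
print(sample_df["Kms_Driven"].dtype)  # int64

# Pull both columns into a single NumPy array: one dtype must fit them all.
arr = sample_df[["Car_Name", "Kms_Driven"]].to_numpy()
print(arr.dtype)  # object

# Rebuilding a dataframe from that array does not bring int64 back.
rebuilt = pd.DataFrame(arr, columns=["Car_Name", "Kms_Driven"])
print(rebuilt["Kms_Driven"].dtype)  # object
```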

So let's grab the datatypes from `df` as a dictionary.

In [36]:
df_dtypes = df.dtypes.to_dict()
df_dtypes

{'Car_Name': dtype('O'),
 'Year': dtype('O'),
 'Selling_Price': dtype('O'),
 'Present_Price': dtype('O'),
 'Kms_Driven': dtype('int64'),
 'Fuel_Type': dtype('O'),
 'Seller_Type': dtype('O'),
 'Transmission': dtype('O'),
 'Owner': dtype('int64')}

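Then we'll use a dictionary comprehension to keep only the columns whose dtype is not `object`. Each dtype has a `.type` attribute, which for the object columns is `numpy.object_`: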

In [37]:
df_dtypes['Car_Name'].type

numpy.object_

In [38]:
import numpy as np
non_obj_dtypes = {k: v for k, v in df_dtypes.items() if v.type is not np.object_}
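pandas can also do this filtering for us. One possible alternative to the comprehension, not the lab's approach, uses `DataFrame.select_dtypes` to drop the `object` columns before building the dictionary. The sample frame is hypothetical, and its text columns get an explicit `object` dtype to match the lab's `df.dtypes` output.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; text columns use an explicit object dtype.
sample_df = pd.DataFrame({
    "Car_Name": pd.Series(["ritz", "sx4"], dtype=object),
    "Selling_Price": pd.Series(["$3.35", "$4.75"], dtype=object),
    "Kms_Driven": [27000, 43000],
    "Owner": [0, 0],
})

# The lab's approach: filter the dtype dictionary on each dtype's `.type`.
sample_dtypes = sample_df.dtypes.to_dict()
via_comprehension = {k: v for k, v in sample_dtypes.items() if v.type is not np.object_}

# Alternative: let pandas exclude the object columns first.
via_select = sample_df.select_dtypes(exclude="object").dtypes.to_dict()

print(via_comprehension == via_select)  # True
```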

Then apply these dtypes to our `coerced_df` with `astype`.

In [49]:
updated_df = coerced_df.astype(non_obj_dtypes)

In [50]:
updated_df.dtypes

Selling_Price    float64
Present_Price    float64
Car_Name          object
Year              object
Kms_Driven         int64
Fuel_Type         object
Seller_Type       object
Transmission      object
Owner              int64
dtype: object
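`astype` accepts a `{column: dtype}` dictionary and converts only the columns it names, leaving the rest as they are. The small frame below is hypothetical. It imitates the mapper's output by storing integers in `object` columns. The example also shows `DataFrame.infer_objects`, a pandas alternative we are suggesting rather than the lab's method, which guesses better dtypes for `object` columns from the values they hold.

```python
import pandas as pd

# Hypothetical frame mimicking the mapper's output: ints stored as objects.
sample_coerced = pd.DataFrame({
    "Selling_Price": [3.35, 4.75],
    "Kms_Driven": pd.Series([27000, 43000], dtype=object),
    "Owner": pd.Series([0, 0], dtype=object),
})

# The lab's approach: an explicit {column: dtype} dictionary.
restored = sample_coerced.astype({"Kms_Driven": "int64", "Owner": "int64"})
print(restored.dtypes)

# Alternative: let pandas infer dtypes for the object columns.
inferred = sample_coerced.infer_objects()
print(inferred.dtypes)
```

The dictionary states the target types explicitly. `infer_objects` instead decides from whatever values it finds, so it only helps when those values already have the right Python types.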

### Summary

In this lesson, we coerced text columns into numeric data. We practiced using a list comprehension to create several mapper steps at once, and restored the original datatypes by passing a dictionary of dtypes to `astype`.