# 1. Cleaning Data (draft version)

The data that we have used for this project come from the GitHub directory of Nicolas Gervais: 
[Predicting Car Price from Scraped Data](https://github.com/nicolas-gervais/predicting-car-price-from-scraped-data) & [The Car Connection Website](https://www.thecarconnection.com).

The dataset actually contains information for 32000 produced cars from from 1990 to 2017.

This draft notebook contains all the required cleaning steps. The final cleaned dataset ("raw_data_no_dummies_imputed.csv") has been used in the sequence of this notebook ("2. Analysis & Predictive Model.ipynb") and with the software Tableau. The analysis and the results of these sources are presented in the final report pdf file.



In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
        
pd.set_option('display.max_columns', 500) # show all columns
pd.options.display.max_rows = 200 # show 200 rows
        
import warnings
warnings.filterwarnings("ignore") #remove warning messages during csv import

# Any results you write to the current directory are saved as output.

symbols = '!@#$%^&*()_+[]-–'
letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
numbers = '0123456789'

pd.options.display.float_format = '{:.2f}'.format

%matplotlib inline

In [2]:
raw_data_original = pd.read_csv("./fullspecs_fetched_data.csv",index_col=0).transpose()

raw_data = raw_data_original.copy()

In [3]:
raw_data.shape

(32316, 234)

In [4]:
curb_cols = [col for col in raw_data.columns.str.lower() if 'curb' in col]
print(curb_cols)

['base curb weight (lbs)', 'turning diameter - curb to curb (ft)', 'curb weight - front (lbs)', 'curb weight - rear (lbs)']


In [5]:
#Hybrid / Electric
raw_data.shape
raw_data['Full Name'] = raw_data.index
raw_data['Hybrid'] = raw_data['Full Name'].str.lower().str.contains('hyb')
raw_data['Electric'] = (raw_data['Engine Type'].str.lower().str.contains('elect') | raw_data['Full Name'].str.lower().str.contains('elect'))
raw_data['Electric'].loc[raw_data['Electric']==1]

2019 Acura MDX Specs: SH-AWD Sport Hybrid w/Technology Pkg    True
2019 Acura MDX Specs: SH-AWD Sport Hybrid w/Advance Pkg       True
2018 Acura MDX Specs: SH-AWD Sport Hybrid w/Technology Pkg    True
2018 Acura MDX Specs: SH-AWD Sport Hybrid w/Advance Pkg       True
2014 Acura ILX Specs: 4-Door Sedan 1.5L Hybrid                True
                                                              ... 
2016 Volvo XC90 Specs: AWD 4-Door T8 Inscription              True
2019 Volvo S90 Specs: T8 eAWD Plug-In Hybrid Momentum         True
2019 Volvo S90 Specs: T8 eAWD Plug-In Hybrid Inscription      True
2018 Volvo S90 Specs: T8 eAWD Plug-In Hybrid Momentum         True
2018 Volvo S90 Specs: T8 eAWD Plug-In Hybrid Inscription      True
Name: Electric, Length: 974, dtype: bool

In [6]:
raw_data['Turning Diameter - Curb to Curb (ft)'] = raw_data['Turning Diameter - Curb to Curb (ft)'].str.strip('-')
raw_data['Turning Diameter - Curb to Curb (ft)'] = raw_data['Turning Diameter - Curb to Curb (ft)'].str.split().str.get(0)
raw_data['Turning Diameter - Curb to Curb (ft)'] = raw_data['Turning Diameter - Curb to Curb (ft)'].str.replace('TBD', 
                                                            '').replace(r'^\s*$', np.nan, regex=True).astype(float)

In [7]:
raw_data['Turning Diameter - Curb to Curb (ft)']

2019 Acura RDX Specs: FWD w/Technology Pkg                                39.00
2019 Acura RDX Specs: FWD w/Advance Pkg                                   39.00
2019 Acura RDX Specs: FWD w/A-Spec Pkg                                    39.00
2019 Acura RDX Specs: FWD                                                 39.00
2019 Acura RDX Specs: AWD w/Technology Pkg                                39.00
                                                                           ... 
2018 Volvo V60 Cross Country Specs: T5 AWD Platinum                       37.10
2016 Volvo V60 Cross Country Specs: 4-Door Wagon T5 AWD                   37.10
2016 Volvo V60 Cross Country Specs: 4-Door Wagon T5 Platinum AWD          37.10
2015 Volvo V60 Cross Country Specs: 2015.5 4-Door Wagon T5 AWD            37.10
2015 Volvo V60 Cross Country Specs: 2015.5 4-Door Wagon T5 Platinum AWD   37.10
Name: Turning Diameter - Curb to Curb (ft), Length: 32316, dtype: float64

In [8]:
'''raw_data['Reverse Ratio (:1)'] = raw_data['Reverse Ratio (:1)'].str.split('/').str.get(0).str.strip('-TBD-')
raw_data['Reverse Ratio (:1)'] = raw_data['Reverse Ratio (:1)'].str.split('-').str.get(0).str.strip('Variable')
raw_data['Reverse Ratio (:1)'] = raw_data['Reverse Ratio (:1)'].str.split().str.get(0).astype(float)'''

"raw_data['Reverse Ratio (:1)'] = raw_data['Reverse Ratio (:1)'].str.split('/').str.get(0).str.strip('-TBD-')\nraw_data['Reverse Ratio (:1)'] = raw_data['Reverse Ratio (:1)'].str.split('-').str.get(0).str.strip('Variable')\nraw_data['Reverse Ratio (:1)'] = raw_data['Reverse Ratio (:1)'].str.split().str.get(0).astype(float)"

In [9]:
#### COMPANY NAMES

raw_data['Full Name'] = raw_data.index
raw_data['Company Name'] = raw_data['Full Name'].str.split(" ",expand=True)[1]

del raw_data['Full Name']

raw_data['Company Name'] = raw_data['Company Name'].str.replace("Alfa", "Alfa Romeo")
raw_data['Company Name'] = raw_data['Company Name'].str.replace("Aston", "Aston Martin")
raw_data['Company Name'] = raw_data['Company Name'].str.replace("Land", "Land Rover")
raw_data['Company Name'] = raw_data['Company Name'].str.replace("smart", "Smart")


# -------- replace na and tbd with np nan

raw_data.replace("NA", np.nan)
raw_data = raw_data.replace("- TBD –", 'NA')
raw_data = raw_data.replace("- TBD -", 'NA')
raw_data['EPA Fuel Economy Est - City (MPG)'] = raw_data['EPA Fuel Economy Est - City (MPG)'].str.replace(r"\(.*\)","")
raw_data = raw_data.replace("NA", np.nan)

# -------- cols with forbidden charac

raw_data = raw_data.rename(columns=lambda x: x.split(" (ft")[0])
raw_data['Passenger Volume'] = raw_data['Passenger Volume'].str.replace(r"\(.*\)","")

# -------- Clean MSRP and convert to float

raw_data.MSRP = raw_data.MSRP.str.replace("$", "")
raw_data.MSRP = raw_data.MSRP.str.replace(",", "")

# -------- Clean basic miles and convert to float

raw_data['Basic Miles/km'] = raw_data['Basic Miles/km'].str.replace(",", "")
raw_data['Basic Miles/km'] = raw_data['Basic Miles/km'].str.replace("Unlimited", "150000")
raw_data['Basic Miles/km'] = raw_data['Basic Miles/km'].str.replace("49999", "50000")

# -------- Clean Drivetrain Miles and convert to float

raw_data['Drivetrain Miles/km'] = raw_data['Drivetrain Miles/km'].str.replace(",", "")
raw_data['Drivetrain Miles/km'] = raw_data['Drivetrain Miles/km'].str.replace("Unlimited", "150000")

# -------- get Roadside Assistance Miles/km miles  as integer

raw_data['Roadside Assistance Miles/km'] = raw_data['Roadside Assistance Miles/km'].str.replace(",", "")
raw_data['Roadside Assistance Miles/km'] = raw_data['Roadside Assistance Miles/km'].str.replace("Unlimited", "100000")

# -------- get number of gears

raw_data['Transmission'] = raw_data['Transmission'].str.lower()
raw_data['Gears'] = raw_data['Transmission'].str.split("-speed", expand=True, n = 1)[0].str[-2:].str.strip()
raw_data.Gears = raw_data['Gears'].str.replace("le", "1")
raw_data.Gears = raw_data['Gears'].str.replace("ed", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("ic", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("es", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("er", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("ls", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("ve", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("to", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("de", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("ch", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("ct", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("rs", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("ft", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("al", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("s,", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("on", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("NA", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("co", "NA")
raw_data.Gears = raw_data['Gears'].str.replace("/8", "NA")


# -------- get max horsepower 
################ FIX THE FIRST SPLIT TO KEEP ONLY THE NUMBER

raw_data['Net Horsepower'] = raw_data['SAE Net Horsepower @ RPM'].str.split(" ",expand=True)[0].value_counts()
raw_data['Net Horsepower'] = raw_data['Net Horsepower'].astype(float)
raw_data.replace("NA", np.nan, inplace=True)


# -------- get max horsepower rpm 
############### FIX
raw_data['Net Horsepower RPM'] = raw_data['SAE Net Horsepower @ RPM'].str.split("@",expand=True)[1].str.strip()
raw_data['Net Horsepower RPM'] = raw_data['Net Horsepower RPM'].str.replace("- TBD -", "NA").str.replace("-TBD-", "NA")
raw_data['Net Horsepower RPM'] = raw_data['Net Horsepower RPM'].str[:4]

# -------- get max torque

raw_data['Net Torque'] = raw_data['SAE Net Torque @ RPM'].str.split(" ", expand=True)[0]
raw_data.replace("NA", np.nan, inplace=True)
raw_data['Net Torque'] = raw_data['Net Torque'].astype(float)

In [10]:

list(raw_data.columns)

['MSRP',
 'Gas Mileage',
 'Engine',
 'EPA Class',
 'Style Name',
 'Drivetrain',
 'Passenger Capacity',
 'Passenger Doors',
 'Body Style',
 'Transmission',
 'EPA Classification',
 'Base Curb Weight (lbs)',
 'Front Hip Room (in)',
 'Front Leg Room (in)',
 'Second Shoulder Room (in)',
 'Passenger Volume',
 'Second Head Room (in)',
 'Front Shoulder Room (in)',
 'Second Hip Room (in)',
 'Front Head Room (in)',
 'Second Leg Room (in)',
 'Wheelbase (in)',
 'Min Ground Clearance (in)',
 'Track Width, Front (in)',
 'Width, Max w/o mirrors (in)',
 'Track Width, Rear (in)',
 'Height, Overall (in)',
 'Cargo Volume to Seat 1',
 'Cargo Volume to Seat 2',
 'Cargo Volume to Seat 3',
 'Fuel Tank Capacity, Approx (gal)',
 'Fuel Economy Est-Combined (MPG)',
 'EPA Fuel Economy Est - City (MPG)',
 'EPA Fuel Economy Est - Hwy (MPG)',
 'Engine Order Code',
 'SAE Net Torque @ RPM',
 'Fuel System',
 'Engine Type',
 'SAE Net Horsepower @ RPM',
 'Displacement',
 'First Gear Ratio (:1)',
 'Sixth Gear Ratio (:1)',

In [11]:
raw_data['Transmission'] = raw_data['Transmission'].str.lower()
raw_data['Gears'] = raw_data['Transmission'].str.split("-speed", expand=True, n = 1)[0].str[-2:].str.strip()
raw_data['Gears'] = raw_data['Gears'].str.strip(letters).str.strip('/').str.strip(',').replace(r'^\s*$', np.nan, regex=True)
raw_data['Gears'] = raw_data['Gears'].astype(float)
raw_data['Gears'].value_counts()

6.00     11240
5.00      8224
4.00      5150
8.00      2781
7.00      1252
9.00       443
10.00      193
1.00        77
3.00        21
Name: Gears, dtype: int64

In [12]:
raw_data['Cylinders'] = raw_data['Engine Type'].str.strip(letters).str.strip(symbols).str.strip(letters).str.strip(symbols)
raw_data['Cylinders'] = raw_data['Cylinders'].str.strip().str.split().str.get(-1)
raw_data['Cylinders'] = raw_data['Cylinders'].str.replace("-", "").str.replace("/", "")
raw_data['Cylinders'] = raw_data['Cylinders'].str.lstrip(letters)
raw_data['Cylinders'] = raw_data['Cylinders'].replace(r'^\s*$', np.nan, regex=True).replace('4Cyl', '4')
raw_data['Cylinders'] = raw_data['Cylinders'].astype(float)

In [13]:
config = ['turbo', 'supercharger', 'regular', 'unleaded', 'premium', 'gas', 'electric', 'turbocharged', 'flexible',
          'intercooled', 'twin', 'unleaded', 'charged', 'ethanol', 'natural', 'high pressure', 'low pressure',
          'ecotec', 'cyl', 'diesel', 'compressed', 'super', 'vortec', '4', '6', '8', '5', '(']
raw_data['Engine Configuration'] = raw_data['Engine Type'].str.lower() 
raw_data['Engine Configuration'] = raw_data['Engine Configuration'].str.strip(numbers).str.lower()
raw_data['Engine Configuration'] = raw_data['Engine Configuration'].str.replace('-', " ").str.replace('/', " ")
raw_data['Engine Configuration'] = raw_data['Engine Configuration'].str.strip(symbols).str.rstrip(numbers)

for i in config:
    raw_data['Engine Configuration'] = raw_data['Engine Configuration'].str.replace(i, " ")
raw_data['Engine Configuration'] = raw_data['Engine Configuration'].str.strip().str[-1]
raw_data['Engine Configuration'] = raw_data['Engine Configuration'].str.upper().str.replace('T', 'FLAT').replace('L', np.nan)
raw_data['Engine Configuration'].value_counts()

V       17456
I       10587
FLAT      966
H         341
W          73
Name: Engine Configuration, dtype: int64

In [14]:
raw_data["Rear Tire Width"] = raw_data["Rear Tire Size"].str.split("/").str.get(0).str[-3:].str.strip()
raw_data["Rear Tire Width"] = raw_data["Rear Tire Width"].replace('R20', np.nan).replace('R18', np.nan)
raw_data["Rear Tire Width"] = raw_data["Rear Tire Width"].replace('D -', np.nan).replace('R15', np.nan).replace('R15', np.nan)
raw_data["Rear Tire Width"] = raw_data["Rear Tire Width"].replace('18"', np.nan).replace('60A', np.nan)
raw_data["Rear Tire Width"] = raw_data["Rear Tire Width"].astype(float)
raw_data["Rear Tire Width"].value_counts().head(8)

245.00    5294
235.00    4440
225.00    3892
265.00    3287
215.00    2672
205.00    2614
255.00    2233
275.00    2136
Name: Rear Tire Width, dtype: int64

In [15]:
raw_data["Front Tire Width"] = raw_data["Front Tire Size"].str.split("/").str.get(0).str[-3:].str.strip()
raw_data["Front Tire Width"] = raw_data["Front Tire Width"].replace('R20', 'NA').str.strip(letters).str.strip(symbols)
raw_data["Front Tire Width"] = raw_data["Front Tire Width"].replace('18"', 'NA').replace('R15', '').replace('D -', '')
raw_data["Front Tire Width"] = raw_data["Front Tire Width"].replace("NA", np.nan).replace("60", np.nan).replace("15", np.nan)
raw_data["Front Tire Width"] = raw_data["Front Tire Width"].replace(r'^\s*$', np.nan, regex=True).astype(float)
raw_data["Front Tire Width"].value_counts().head(8)

245.00    5807
235.00    4692
225.00    4198
265.00    3253
215.00    2702
205.00    2693
255.00    2420
275.00    1841
Name: Front Tire Width, dtype: int64

In [16]:
raw_data["Tire Rating"] = raw_data["Front Tire Size"].str.split("/").str.get(-1).str.strip(numbers).str[0].str.upper()
raw_data["Tire Rating"] = raw_data["Tire Rating"].replace(r'^\s*$', np.nan, regex=True).replace('-', np.nan)
raw_data["Tire Rating"] = raw_data["Tire Rating"].replace('"', np.nan).replace('P', np.nan).replace('X', np.nan)
raw_data["Tire Rating"].value_counts()

R    16015
H     5235
V     2562
S     2259
T     1245
Z     1191
Y     1011
W      778
Name: Tire Rating, dtype: int64

In [17]:
raw_data['Country'] = raw_data['Company Name']

raw_data['Country'] = raw_data['Country'].replace(['Ford', 'Chevrolet', 'GMC', 'Ram', 'Jeep', 'Cadillac', 'Dodge',
                                      'Buick', 'Lincoln', 'Chrysler', 'Tesla'], 'USA')
raw_data['Country'] = raw_data['Country'].replace(['Jaguar', 'Land Rover', 'Bentley', 'Rolls-Royce', 
                                       'Aston Martin', 'Lotus', 'McLaren', 'Mini', 'MINI'], 'UK')
raw_data['Country'] = raw_data['Country'].replace(['Toyota', 'Nissan', 'Honda', 'Subaru', 'Mazda', 'Acura', 
                                       'Mitsubishi', 'Lexus', 'Infiniti', 'INFINITI'], 'Japan')
raw_data['Country'] = raw_data['Country'].replace(['Volkswagen', 'BMW', 'Audi', 'Mercedes-Benz', 'Porsche', 'Smart'], 'Germany')
raw_data['Country'] = raw_data['Country'].replace(['Hyundai', 'Kia', 'Genesis'], 'Korea')
raw_data['Country'] = raw_data['Country'].replace(['Volvo'], 'Sweden')
raw_data['Country'] = raw_data['Country'].replace(['Fiat', 'Maserati', 'Alfa Romeo', 'Lamborghini', 'Ferrari', 'FIAT'], 'Italy')

raw_data['Country Code'] = raw_data['Country'].astype("category").cat.codes


In [18]:
raw_data['Displacement (L)'] = raw_data['Displacement'].str.split("/", expand=True)[0].str[:3]
raw_data['Displacement (L)'] = raw_data['Displacement (L)'].str.replace('39.', '3.9')


# -------- displacement - cc

raw_data['Displacement (cc)'] = raw_data['Displacement'].str.split("/", expand=True)[1]
raw_data['Displacement (cc)'] = raw_data['Displacement (cc)'].str.replace('- TBD -', 'NA')
raw_data['Displacement (cc)'] = raw_data['Displacement (cc)'].str.replace('- TBD –', 'NA')
raw_data['Displacement (cc)'] = raw_data['Displacement (cc)'].str.replace('302 CID', 'NA')
raw_data['Displacement (cc)'] = raw_data['Displacement (cc)'].str.replace(' NA', 'NA')
# raw_data.loc['2018 Buick Envision Specs: AWD 4-Door Essence':'2018 Buick Envision Specs: AWD 4-Door Preferred',
# "Displacement (cc)"] = 'NA'

# -------- get rear tire width

raw_data["Rear Tire Width"] = raw_data["Rear Tire Size"].str.split("/").str.get(0).str[-3:].str.strip()
raw_data["Rear Tire Width"] = raw_data["Rear Tire Width"].replace('R20', 'NA').replace('18\"', 'NA').replace('R15', 'NA').replace('60A', 'NA').replace('R18', 'NA')
raw_data.replace("NA", np.nan, inplace=True)
raw_data["Rear Tire Width"] = raw_data["Rear Tire Width"].astype(float)

# -------- get front tire width

raw_data["Front Tire Width"] = raw_data["Front Tire Size"].str.split("/").str.get(0).str[-3:].str.strip()
#### FIX
raw_data["Front Tire Width"] = raw_data["Front Tire Width"].replace('R20', 'NA').replace('18\"', 'NA').replace('R15', 'NA').replace('60A', 'NA').replace('R18', 'NA')
raw_data.replace("NA", np.nan, inplace=True)
raw_data["Front Tire Width"] = raw_data["Front Tire Width"].astype(float)

# -------- get rear wheel size
#### FIX
raw_data["Rear Wheel Size"] = raw_data["Rear Wheel Size (in)"].str[:2].replace('P2', 'NA')
raw_data.replace("NA", np.nan, inplace=True)
raw_data["Rear Wheel Size"] = raw_data["Rear Wheel Size"].astype(float)
# -------- get front wheel size
#### FIX
raw_data["Front Wheel Size"] = raw_data["Front Wheel Size (in)"].str[:2].replace('P2', 'NA')
raw_data.replace("NA", np.nan, inplace=True)
raw_data["Front Wheel Size"]= raw_data["Front Wheel Size"].astype(float)
# -------- get tire rating

raw_data["Tire Rating"] = raw_data["Front Tire Size"].str.split("/").str.get(-1).str[-4]
raw_data["Tire Rating"] = raw_data["Tire Rating"].replace('5', 'NA')
raw_data["Tire Rating"] = raw_data["Tire Rating"].replace('0', 'NA')
raw_data["Tire Rating"] = raw_data["Tire Rating"].replace('1', 'NA')
raw_data["Tire Rating"] = raw_data["Tire Rating"].replace('2', 'NA')



# -------- get width ratio

raw_data["Tire Width Ratio"] = raw_data["Rear Tire Width"]/raw_data["Front Tire Width"]

# -------- get size ratio

raw_data["Wheel Size Ratio"] = raw_data["Rear Wheel Size"] / raw_data["Front Wheel Size"]

# -------- get tire ratio

raw_data["Tire Ratio"] = raw_data["Front Tire Size"].str.split("/").str.get(1).str[0]
raw_data["Tire Ratio"] = raw_data["Tire Ratio"].replace('Y', 'NA')

# -------- get year

raw_data["Year"] = raw_data.index.str[:4].astype(float)

In [19]:

# -------- edit drivetrain values

raw_data['Drivetrain'] = raw_data['Drivetrain'].str.replace('4-Wheel Drive', 'Four Wheel Drive')
raw_data['Drivetrain'] = raw_data['Drivetrain'].str.replace('Front wheel drive', 'Front Wheel Drive')
raw_data['Drivetrain'] = raw_data['Drivetrain'].str.replace('Four-Wheel Drive', 'Four Wheel Drive')

# -------- edit fuel system values

raw_data['Fuel System'] = raw_data['Fuel System'].str.replace('Turbocharged EFI', 'Electronic Fuel Injection')
raw_data['Fuel System'] = raw_data['Fuel System'].str.replace('Electric', 'Electronic Fuel Injection')
raw_data['Fuel System'] = raw_data['Fuel System'].str.replace('Sequential MPI (injection)', 'Sequential MPI')
raw_data['Fuel System'] = raw_data['Fuel System'].str.replace('SMPI', 'Sequential MPI')
raw_data['Fuel System'] = raw_data['Fuel System'].str.replace('EFI', 'Electronic Fuel Injection')
raw_data['Fuel System'] = raw_data['Fuel System'].str.replace('Direct Gasoline Injection', 'Direct Injection')

In [20]:
raw_data['Sixth Gear Ratio (:1)'] = raw_data['Sixth Gear Ratio (:1)'].str.strip(letters).str.strip(symbols)
raw_data['Sixth Gear Ratio (:1)'] = raw_data['Sixth Gear Ratio (:1)'].str.strip(' TBD ').replace(r'^\s*$', np.nan, regex=True).astype(float)

In [21]:
raw_data = raw_data.rename(columns=lambda x: x.split(" (ft")[0])
raw_data['EPA Fuel Economy Est - City (MPG)'] = raw_data['EPA Fuel Economy Est - City (MPG)'].str.replace(r"\(.*\)","")
raw_data['Passenger Volume'] = raw_data['Passenger Volume'].str.replace(r"\(.*\)","")

In [22]:
raw_data['Cylinders'] = raw_data['Engine Type'].str.split("-", expand=True)[1]
raw_data['Cylinders'] = raw_data['Cylinders'].str.replace("Cyl", "4")
raw_data['Cylinders'] = raw_data['Cylinders'].str.replace("in Electric I4", "4")


# -------- replace na by npnan

raw_data.replace("NA", np.nan, inplace=True)

# -------- convert all to float

raw_data.MSRP = raw_data.MSRP.astype(float)
raw_data["Tire Ratio"] = raw_data["Tire Ratio"].astype(float)
raw_data['Displacement (cc)'] = raw_data['Displacement (cc)'].astype(float)
raw_data['Displacement (L)'] = raw_data['Displacement (L)'].astype(float)

raw_data['Cylinders'] = raw_data['Cylinders'].str.replace('cyl', 'NA').str.replace('Pressure Turbo Gas I5', 'NA').str.replace('Turbocharged Gas V12', 'NA').str.replace('Scroll Turbocharged Gas I6', 'NA').str.replace('4 Turbocharged', 'NA').str.replace('Turbocharged Gas V8', 'NA')
raw_data.replace("NA", np.nan, inplace=True)
raw_data['Cylinders'] = raw_data['Cylinders'].astype(float)

raw_data['Net Horsepower RPM'] = raw_data['Net Horsepower RPM'].astype(float)

In [23]:
raw_data['Gears'] = raw_data['Gears'].astype(float)
raw_data['Roadside Assistance Miles/km'] = raw_data['Roadside Assistance Miles/km'].astype(float)
raw_data['Drivetrain Miles/km'] = raw_data['Drivetrain Miles/km'].astype(float)
raw_data['Basic Miles/km'] = raw_data['Basic Miles/km'].astype(float)

In [24]:
raw_data.head()

Unnamed: 0,MSRP,Gas Mileage,Engine,EPA Class,Style Name,Drivetrain,Passenger Capacity,Passenger Doors,Body Style,Transmission,EPA Classification,Base Curb Weight (lbs),Front Hip Room (in),Front Leg Room (in),Second Shoulder Room (in),Passenger Volume,Second Head Room (in),Front Shoulder Room (in),Second Hip Room (in),Front Head Room (in),Second Leg Room (in),Wheelbase (in),Min Ground Clearance (in),"Track Width, Front (in)","Width, Max w/o mirrors (in)","Track Width, Rear (in)","Height, Overall (in)",Cargo Volume to Seat 1,Cargo Volume to Seat 2,Cargo Volume to Seat 3,"Fuel Tank Capacity, Approx (gal)",Fuel Economy Est-Combined (MPG),EPA Fuel Economy Est - City (MPG),EPA Fuel Economy Est - Hwy (MPG),Engine Order Code,SAE Net Torque @ RPM,Fuel System,Engine Type,SAE Net Horsepower @ RPM,Displacement,First Gear Ratio (:1),Sixth Gear Ratio (:1),Trans Description Cont.,Fourth Gear Ratio (:1),Seventh Gear Ratio (:1),Trans Order Code,Second Gear Ratio (:1),Reverse Ratio (:1),Trans Description Cont. Again,Fifth Gear Ratio (:1),Eighth Gear Ratio (:1),Trans Type,Third Gear Ratio (:1),Final Drive Axle Ratio (:1),Brake Type,Rear Brake Rotor Diam x Thickness (in),Disc - Rear (Yes or ),Brake ABS System,Drum - Rear (Yes or ),Front Brake Rotor Diam x Thickness (in),Disc - Front (Yes or ),Rear Drum Diam x Width (in),Steering Type,Turning Diameter - Curb to Curb,Front Tire Order Code,Spare Tire Size,Front Tire Size,Rear Tire Order Code,Rear Tire Size,Spare Tire Order Code,Front Wheel Size (in),Spare Wheel Material,Front Wheel Material,Rear Wheel Size (in),Rear Wheel Material,Spare Wheel Size (in),Suspension Type - Front,Suspension Type - Rear (Cont.),Suspension Type - Rear,Suspension Type - Front (Cont.),Air Bag-Frontal-Driver,Air Bag-Frontal-Passenger,Air Bag-Passenger Switch (On/Off),Air Bag-Side Body-Front,Air Bag-Side Body-Rear,Air Bag-Side Head-Front,Air Bag-Side Head-Rear,Brakes-ABS,Child Safety Rear Door Locks,Daytime Running Lights,Traction Control,Night Vision,Rollover Protection Bars,Fog Lamps,Parking Aid,Tire Pressure Monitor,Back-Up Camera,Stability Control,Other Features,Basic Miles/km,Basic Years,Corrosion Miles/km,Corrosion Years,Drivetrain Miles/km,Drivetrain Years,Roadside Assistance Miles/km,Roadside Assistance Years,Maximum Alternator Capacity (amps),Cold Cranking Amps @ 0° F (Primary),Wt Distributing Hitch - Max Tongue Wt. (lbs),Dead Weight Hitch - Max Tongue Wt. (lbs),Maximum Trailering Capacity (lbs),Wt Distributing Hitch - Max Trailer Wt. (lbs),Dead Weight Hitch - Max Trailer Wt. (lbs),Liftover Height (in),Rear Door Opening Height (in),Rear Door Opening Width (in),"Length, Overall (in)",Cargo Box Width @ Wheelhousings (in),Cargo Area Length @ Floor to Seat 3 (in),Cargo Area Length @ Floor to Seat 1 (in),Cargo Box (Area) Height (in),Cargo Area Width @ Beltline (in),Cargo Area Length @ Floor to Seat 2 (in),Clutch Size (in),Turning Diameter - Wall to Wall,Lock to Lock Turns (Steering),"Steering Ratio (:1), Overall",Shock Absorber Diameter - Front (mm),Stabilizer Bar Diameter - Rear (in),Shock Absorber Diameter - Rear (mm),Stabilizer Bar Diameter - Front (in),Total Cooling System Capacity (qts),Third Shoulder Room (in),Third Head Room (in),Third Hip Room (in),Third Leg Room (in),Hybrid/Electric Components Miles/km,Hybrid/Electric Components Years,"Transfer Case Gear Ratio (:1), High","Transfer Case Gear Ratio (:1), Low",Trunk Volume,Number of Passenger Doors,Roadside Assistance Note,Warranty Note,Maintenance Miles/km,Maintenance Years,Basic Note,Cargo Volume with Rear Seat Up,Cargo Volume with Rear Seat Down,Gross Vehicle Weight Rating Cap (lbs),Engine Oil Cooler,Transfer Case Model,Transfer Case Power Take Off,Trans PTO Access,Brake ABS System (Second Line),Axle Type - Rear,Axle Type - Front,Cold Cranking Amps @ 0° F (2nd),EPA MPG Equivalent - Combined,Battery Range (mi),Axle Ratio (:1) - Rear,Axle Ratio (:1) - Front,Gross Axle Wt Rating - Front (lbs),Gross Axle Wt Rating - Rear (lbs),EPA MPG Equivalent - City,EPA MPG Equivalent - Hwy,Maintenance Note,Emissions Miles/km,Emissions Years,Ninth Gear Ratio (:1),EPA Air Pollution Score,Rear Door Type,Curb Weight - Front (lbs),Gross Combined Wt Rating (lbs),Curb Weight - Rear (lbs),"Ground Clearance, Rear (in)",Step Up Height - Front (in),"Length, Overall w/rear bumper (in)",Ground to Top of Load Floor (in),Side Door Opening Height (in),"Overhang, Front (in)",Step Up Height - Side (in),"Ground Clearance, Front (in)",Side Door Opening Width (in),"Overhang, Rear w/bumper (in)",Cargo Volume to Seat 4,Cargo Area Length @ Floor to Seat 4 (in),Cargo Area Length @ Floor to Console (in),"Aux Fuel Tank Capacity, Approx (gal)",Fuel Tank Location,Aux Fuel Tank Location,Trans Power Take Off,Tenth Gear Ratio (:1),"Steering Ratio (:1), On Center","Steering Ratio (:1), At Lock",Spare Tire Capacity (lbs),Front Tire Capacity (lbs),Revolutions/Mile @ 45 mph - Spare,Revolutions/Mile @ 45 mph - Front,Rear Tire Capacity (lbs),Revolutions/Mile @ 45 mph - Rear,Axle Capacity - Front (lbs),Spring Capacity - Front (lbs),Axle Capacity - Rear (lbs),Spring Capacity - Rear (lbs),Maximum Alternator Watts,Fifth Wheel Hitch - Max Trailer Wt. (lbs),Fifth Wheel Hitch - Max Tongue Wt. (lbs),"Length, Overall w/o rear bumper (in)",Front Bumper to Back of Cab (in),"Frame Width, Rear (in)",Cab to Axle (in),"Overhang, Rear w/o bumper (in)",Ground to Top of Frame (in),Cab to End of Frame (in),"Cargo Box Width @ Top, Rear (in)",Cargo Volume,Cargo Box Width @ Floor (in),Ext'd Cab Cargo Volume,Cargo Box Length @ Floor (in),Tailgate Width (in),Drivetrain Note,Emissions Note,Fourth Hip Room (in),Fourth Leg Room (in),Fourth Shoulder Room (in),Fourth Head Room (in),Fifth Shoulder Room (in),Fifth Head Room (in),Fifth Hip Room (in),Fifth Leg Room (in),Corrosion Note,Hybrid/Electric Components Note,Hybrid,Electric,Company Name,Gears,Net Horsepower,Net Horsepower RPM,Net Torque,Cylinders,Engine Configuration,Rear Tire Width,Front Tire Width,Tire Rating,Country,Country Code,Displacement (L),Displacement (cc),Rear Wheel Size,Front Wheel Size,Tire Width Ratio,Wheel Size Ratio,Tire Ratio,Year
2019 Acura RDX Specs: FWD w/Technology Pkg,40600.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/Technology Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3790,55,41.6,56.6,104,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22,28,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,Yes,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,19 X 8,,Aluminum,19 X 8,Aluminum,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4,Unlimited,5,70000.0,6,50000.0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0
2019 Acura RDX Specs: FWD w/Advance Pkg,45500.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/Advance Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3829,55,41.6,56.6,104,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22,28,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,Yes,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,19 X 8,,Aluminum,19 X 8,Aluminum,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4,Unlimited,5,70000.0,6,50000.0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0
2019 Acura RDX Specs: FWD w/A-Spec Pkg,43600.0,22 mpg City/27 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/A-Spec Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3821,55,41.6,56.6,104,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22,27,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,Yes,,Rack-Pinion,39.0,,,P255/45VR20,,P255/45VR20,,20 X 8,,Aluminum,20 X 8,Aluminum,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4,Unlimited,5,70000.0,6,50000.0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,255.0,255.0,V,Japan,2,2.0,122.0,20.0,20.0,1.0,1.0,4.0,2019.0
2019 Acura RDX Specs: FWD,37400.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3783,55,41.6,56.6,104,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22,28,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,Yes,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,19 X 8,,Aluminum,19 X 8,Aluminum,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4,Unlimited,5,70000.0,6,50000.0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0
2019 Acura RDX Specs: AWD w/Technology Pkg,42600.0,21 mpg City/27 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 4WD,AWD w/Technology Pkg,All Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 4WD,4026,55,41.6,56.6,104,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,23.0,21,27,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,Yes,,Rack-Pinion,39.0,,Compact,P235/55HR19,,P235/55HR19,,19 X 8,Steel,Aluminum,19 X 8,Aluminum,Compact,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4,Unlimited,5,70000.0,6,50000.0,4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0


In [25]:
raw_data['Net Torque RPM'] = raw_data['SAE Net Torque @ RPM'].str.split().str.get(-1).str[-4:].str.strip()
raw_data['Net Torque RPM'] = raw_data['Net Torque RPM'].str.replace("- TBD -", "NA").str.replace('-', 'NA').str.replace('ined', 'NA').str.replace('E85', 'NA').str.replace('NA\)', 'NA').str.replace('est\)', 'NA')
raw_data.replace("NA", np.nan, inplace=True)
raw_data['Net Torque RPM'] = raw_data['Net Torque RPM'].astype(float)
raw_data['Net Torque RPM'] = raw_data['Net Torque RPM'].clip(lower=1000)



# -------- replace na by npnan

raw_data.replace("NA", np.nan, inplace=True)

# -------- convert all to float

raw_data.MSRP = raw_data.MSRP.astype(float)
raw_data["Tire Ratio"] = raw_data["Tire Ratio"].astype(float)
raw_data['Displacement (cc)'] = raw_data['Displacement (cc)'].astype(float)
raw_data['Displacement (L)'] = raw_data['Displacement (L)'].astype(float)
raw_data['Cylinders'] = raw_data['Cylinders'].astype(float)
raw_data['Net Horsepower RPM'] = raw_data['Net Horsepower RPM'].astype(float)
raw_data['Gears'] = raw_data['Gears'].astype(float)
raw_data['Roadside Assistance Miles/km'] = raw_data['Roadside Assistance Miles/km'].astype(float)
raw_data['Drivetrain Miles/km'] = raw_data['Drivetrain Miles/km'].astype(float)
raw_data['Basic Miles/km'] = raw_data['Basic Miles/km'].astype(float)

# -------- converet numeric

specs_to_numeric = ['MSRP', 'Passenger Capacity', 'Passenger Doors',
                    'Base Curb Weight (lbs)', 'Second Shoulder Room (in)',
                    'Second Head Room (in)', 'Front Shoulder Room (in)',
                    'Second Hip Room (in)', 'Front Head Room (in)', 'Second Leg Room (in)', 'Front Hip Room (in)',
                    'Front Leg Room (in)', 'Width, Max w/o mirrors (in)', 'Track Width, Rear (in)',
                    'Height, Overall (in)', 'Wheelbase (in)', 'Track Width, Front (in)',
                    'Fuel Tank Capacity, Approx (gal)', 'EPA Fuel Economy Est - City (MPG)',
                    'EPA Fuel Economy Est - Hwy (MPG)',
                    'Fuel Economy Est-Combined (MPG)', 'Fourth Gear Ratio (:1)',
                    'Second Gear Ratio (:1)', 'Reverse Ratio (:1)', 'Fifth Gear Ratio (:1)',
                    'Third Gear Ratio (:1)', 'Final Drive Axle Ratio (:1)', 'First Gear Ratio (:1)',
                    'Sixth Gear Ratio (:1)', 'Passenger Volume',
                    'Front Brake Rotor Diam x Thickness (in)', 'Disc - Front (Yes or   )',
                    'Rear Brake Rotor Diam x Thickness (in)', 'Rear Wheel Size (in)',
                    'Rear Wheel Material', 'Spare Wheel Size (in)', 'Front Wheel Size (in)', 'Basic Miles/km',
                    'Basic Years', 'Corrosion Years', 'Drivetrain Miles/km', 'Drivetrain Years',
                    'Roadside Assistance Miles/km', 'Roadside Assistance Years', 'Year', 'Tire Ratio',
                    'Front Tire Width', 'Rear Tire Width', 'Displacement (cc)', 'Displacement (L)', 'Net Torque RPM',
                    'Net Torque', 'Gears', 'Net Horsepower', 'Net Horsepower RPM', 'Cylinders']

for i in specs_to_numeric:
    raw_data[i] = pd.to_numeric(raw_data[i], errors='coerce')

raw_data.head()

Unnamed: 0,MSRP,Gas Mileage,Engine,EPA Class,Style Name,Drivetrain,Passenger Capacity,Passenger Doors,Body Style,Transmission,EPA Classification,Base Curb Weight (lbs),Front Hip Room (in),Front Leg Room (in),Second Shoulder Room (in),Passenger Volume,Second Head Room (in),Front Shoulder Room (in),Second Hip Room (in),Front Head Room (in),Second Leg Room (in),Wheelbase (in),Min Ground Clearance (in),"Track Width, Front (in)","Width, Max w/o mirrors (in)","Track Width, Rear (in)","Height, Overall (in)",Cargo Volume to Seat 1,Cargo Volume to Seat 2,Cargo Volume to Seat 3,"Fuel Tank Capacity, Approx (gal)",Fuel Economy Est-Combined (MPG),EPA Fuel Economy Est - City (MPG),EPA Fuel Economy Est - Hwy (MPG),Engine Order Code,SAE Net Torque @ RPM,Fuel System,Engine Type,SAE Net Horsepower @ RPM,Displacement,First Gear Ratio (:1),Sixth Gear Ratio (:1),Trans Description Cont.,Fourth Gear Ratio (:1),Seventh Gear Ratio (:1),Trans Order Code,Second Gear Ratio (:1),Reverse Ratio (:1),Trans Description Cont. Again,Fifth Gear Ratio (:1),Eighth Gear Ratio (:1),Trans Type,Third Gear Ratio (:1),Final Drive Axle Ratio (:1),Brake Type,Rear Brake Rotor Diam x Thickness (in),Disc - Rear (Yes or ),Brake ABS System,Drum - Rear (Yes or ),Front Brake Rotor Diam x Thickness (in),Disc - Front (Yes or ),Rear Drum Diam x Width (in),Steering Type,Turning Diameter - Curb to Curb,Front Tire Order Code,Spare Tire Size,Front Tire Size,Rear Tire Order Code,Rear Tire Size,Spare Tire Order Code,Front Wheel Size (in),Spare Wheel Material,Front Wheel Material,Rear Wheel Size (in),Rear Wheel Material,Spare Wheel Size (in),Suspension Type - Front,Suspension Type - Rear (Cont.),Suspension Type - Rear,Suspension Type - Front (Cont.),Air Bag-Frontal-Driver,Air Bag-Frontal-Passenger,Air Bag-Passenger Switch (On/Off),Air Bag-Side Body-Front,Air Bag-Side Body-Rear,Air Bag-Side Head-Front,Air Bag-Side Head-Rear,Brakes-ABS,Child Safety Rear Door Locks,Daytime Running Lights,Traction Control,Night Vision,Rollover Protection Bars,Fog Lamps,Parking Aid,Tire Pressure Monitor,Back-Up Camera,Stability Control,Other Features,Basic Miles/km,Basic Years,Corrosion Miles/km,Corrosion Years,Drivetrain Miles/km,Drivetrain Years,Roadside Assistance Miles/km,Roadside Assistance Years,Maximum Alternator Capacity (amps),Cold Cranking Amps @ 0° F (Primary),Wt Distributing Hitch - Max Tongue Wt. (lbs),Dead Weight Hitch - Max Tongue Wt. (lbs),Maximum Trailering Capacity (lbs),Wt Distributing Hitch - Max Trailer Wt. (lbs),Dead Weight Hitch - Max Trailer Wt. (lbs),Liftover Height (in),Rear Door Opening Height (in),Rear Door Opening Width (in),"Length, Overall (in)",Cargo Box Width @ Wheelhousings (in),Cargo Area Length @ Floor to Seat 3 (in),Cargo Area Length @ Floor to Seat 1 (in),Cargo Box (Area) Height (in),Cargo Area Width @ Beltline (in),Cargo Area Length @ Floor to Seat 2 (in),Clutch Size (in),Turning Diameter - Wall to Wall,Lock to Lock Turns (Steering),"Steering Ratio (:1), Overall",Shock Absorber Diameter - Front (mm),Stabilizer Bar Diameter - Rear (in),Shock Absorber Diameter - Rear (mm),Stabilizer Bar Diameter - Front (in),Total Cooling System Capacity (qts),Third Shoulder Room (in),Third Head Room (in),Third Hip Room (in),Third Leg Room (in),Hybrid/Electric Components Miles/km,Hybrid/Electric Components Years,"Transfer Case Gear Ratio (:1), High","Transfer Case Gear Ratio (:1), Low",Trunk Volume,Number of Passenger Doors,Roadside Assistance Note,Warranty Note,Maintenance Miles/km,Maintenance Years,Basic Note,Cargo Volume with Rear Seat Up,Cargo Volume with Rear Seat Down,Gross Vehicle Weight Rating Cap (lbs),Engine Oil Cooler,Transfer Case Model,Transfer Case Power Take Off,Trans PTO Access,Brake ABS System (Second Line),Axle Type - Rear,Axle Type - Front,Cold Cranking Amps @ 0° F (2nd),EPA MPG Equivalent - Combined,Battery Range (mi),Axle Ratio (:1) - Rear,Axle Ratio (:1) - Front,Gross Axle Wt Rating - Front (lbs),Gross Axle Wt Rating - Rear (lbs),EPA MPG Equivalent - City,EPA MPG Equivalent - Hwy,Maintenance Note,Emissions Miles/km,Emissions Years,Ninth Gear Ratio (:1),EPA Air Pollution Score,Rear Door Type,Curb Weight - Front (lbs),Gross Combined Wt Rating (lbs),Curb Weight - Rear (lbs),"Ground Clearance, Rear (in)",Step Up Height - Front (in),"Length, Overall w/rear bumper (in)",Ground to Top of Load Floor (in),Side Door Opening Height (in),"Overhang, Front (in)",Step Up Height - Side (in),"Ground Clearance, Front (in)",Side Door Opening Width (in),"Overhang, Rear w/bumper (in)",Cargo Volume to Seat 4,Cargo Area Length @ Floor to Seat 4 (in),Cargo Area Length @ Floor to Console (in),"Aux Fuel Tank Capacity, Approx (gal)",Fuel Tank Location,Aux Fuel Tank Location,Trans Power Take Off,Tenth Gear Ratio (:1),"Steering Ratio (:1), On Center","Steering Ratio (:1), At Lock",Spare Tire Capacity (lbs),Front Tire Capacity (lbs),Revolutions/Mile @ 45 mph - Spare,Revolutions/Mile @ 45 mph - Front,Rear Tire Capacity (lbs),Revolutions/Mile @ 45 mph - Rear,Axle Capacity - Front (lbs),Spring Capacity - Front (lbs),Axle Capacity - Rear (lbs),Spring Capacity - Rear (lbs),Maximum Alternator Watts,Fifth Wheel Hitch - Max Trailer Wt. (lbs),Fifth Wheel Hitch - Max Tongue Wt. (lbs),"Length, Overall w/o rear bumper (in)",Front Bumper to Back of Cab (in),"Frame Width, Rear (in)",Cab to Axle (in),"Overhang, Rear w/o bumper (in)",Ground to Top of Frame (in),Cab to End of Frame (in),"Cargo Box Width @ Top, Rear (in)",Cargo Volume,Cargo Box Width @ Floor (in),Ext'd Cab Cargo Volume,Cargo Box Length @ Floor (in),Tailgate Width (in),Drivetrain Note,Emissions Note,Fourth Hip Room (in),Fourth Leg Room (in),Fourth Shoulder Room (in),Fourth Head Room (in),Fifth Shoulder Room (in),Fifth Head Room (in),Fifth Hip Room (in),Fifth Leg Room (in),Corrosion Note,Hybrid/Electric Components Note,Hybrid,Electric,Company Name,Gears,Net Horsepower,Net Horsepower RPM,Net Torque,Cylinders,Engine Configuration,Rear Tire Width,Front Tire Width,Tire Rating,Country,Country Code,Displacement (L),Displacement (cc),Rear Wheel Size,Front Wheel Size,Tire Width Ratio,Wheel Size Ratio,Tire Ratio,Year,Net Torque RPM
2019 Acura RDX Specs: FWD w/Technology Pkg,40600.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/Technology Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3790.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,28.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4.0,Unlimited,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: FWD w/Advance Pkg,45500.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/Advance Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3829.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,28.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4.0,Unlimited,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: FWD w/A-Spec Pkg,43600.0,22 mpg City/27 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/A-Spec Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3821.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,27.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P255/45VR20,,P255/45VR20,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4.0,Unlimited,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,255.0,255.0,V,Japan,2,2.0,122.0,20.0,20.0,1.0,1.0,4.0,2019.0,1600.0
2019 Acura RDX Specs: FWD,37400.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 2WD,3783.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,28.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4.0,Unlimited,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: AWD w/Technology Pkg,42600.0,21 mpg City/27 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 4WD,AWD w/Technology Pkg,All Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small Sport Utility Vehicles 4WD,4026.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,23.0,21.0,27.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,Compact,P235/55HR19,,P235/55HR19,,,Steel,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,Vehicle Stability Assist Electronic Stability ...,50000.0,4.0,Unlimited,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0


In [26]:
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Compact Cars', 'Compact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Mid-Size Cars', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Small Sport Utility Vehicles 4WD', 'Small SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('4WD Sport Utility Vehicle', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Small Sport Utility Vehicles 2WD', 'Small SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Mid-Size', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Two-Seaters', 'Two-Seater')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Sub-Compact', 'Subcompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Small Station Wgn', 'Small Station Wagon')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Large Cars', 'Large')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Subcompact Cars', 'Subcompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('4WD Sport Utility', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Standard Sport Utility Vehicles 4WD', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUV 4WDs', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Sport Utility Vehicle - 4WD', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Sport Utility Vehicle - 2WD', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUV 4WDs', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Large Car', 'Large')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2WD Sport Utility', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Subcompact car', 'Subcompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2WD Sport Utility Vehicles', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Standard Sport Utility Vehicles 2WD', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Mini-Compact', 'Minicompact Cars')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsize Car', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('AWD Sport Utility', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUV 2WD Vehicle', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Two-Seater', 'Two Seater')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Sport Utility Vehicle', 'SUV')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsizes', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2WD Minivans', '2WD Minivan')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Special Purpose Vehicle', 'Special Purpose')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Small Station Wagons', 'Small Station Wagon')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUV 2WDs', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsize Sedan', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2WD Sport Utililty', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('FWD SUV', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsize cars', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUV - AWD', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Minicompact Car', 'Minicompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Minicompact Cars ', 'Minicompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Mid-size', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('FWD Sport Utility', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Sub-compact', 'Subcompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUVs', 'SUV')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2 Seater', 'Two Seater')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUV 4WD Vehicle', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2WD sport Utility Vehicle', 'SUV 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Sport Utility', 'SUV')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('MidSize Cars', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Small station wagon', 'Small Station Wagon')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Minicompacts Car', 'Minicompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Sub Compact', 'Subcompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Mini-compact', 'Minicompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('SUV 4WDs', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('large', 'Large')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Two seaters', 'Two Seater')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsize sedan', 'Midsize')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Subcompacts', 'Subcompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Compact Car', 'Compact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsize Station Wagons', 'Midsize Station Wagon')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Two seater', 'Two Seater')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('4WD sport Utility Vehicle', 'SUV 4WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Compact Sedan', 'Compact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Large car', 'Large')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Subcompact Car', 'Subcompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Two Seaters', 'Two Seater')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Minivan - 2WD', '2WD Minivan')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('4WD Minivans', '4WD Minivan')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2WD Special Purpose', 'Special Purpose 2WD')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsize S/W', 'Midsize Station Wagon')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Midsize Wagon', 'Midsize Station Wagon')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('2WD Van', '2WD Minivan')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Minicompacts', 'Minicompact')
raw_data['EPA Classification'] = raw_data['EPA Classification'].str.replace('Pick-up Truck', 'Truck')


raw_data.loc[raw_data['EPA Classification'] == 'Full Size', 'EPA Classification'] = 'Midsize'
raw_data.loc[raw_data['EPA Classification'] == 'Wagon', 'EPA Classification'] = 'Small Station Wagon'
raw_data.loc[raw_data['EPA Classification'] == 'Small SUV', 'EPA Classification'] = 'SUV'
raw_data.loc[raw_data['EPA Classification'] == 'Pickup Trucks', 'EPA Classification'] = np.nan
raw_data.loc[raw_data['EPA Classification'] == 'Light-Duty Truck', 'EPA Classification'] = np.nan
raw_data.loc[raw_data['EPA Classification'] == '4WD Pickup Trucks', 'EPA Classification'] = np.nan
raw_data.loc[raw_data['EPA Classification'] == '4WD Standard Pickup Truck', 'EPA Classification'] = np.nan


del raw_data['Other Features']


raw_data['Corrosion Miles/km'] = raw_data['Corrosion Miles/km'].str.replace("Unlimited", "10000000")


raw_data.head()

Unnamed: 0,MSRP,Gas Mileage,Engine,EPA Class,Style Name,Drivetrain,Passenger Capacity,Passenger Doors,Body Style,Transmission,EPA Classification,Base Curb Weight (lbs),Front Hip Room (in),Front Leg Room (in),Second Shoulder Room (in),Passenger Volume,Second Head Room (in),Front Shoulder Room (in),Second Hip Room (in),Front Head Room (in),Second Leg Room (in),Wheelbase (in),Min Ground Clearance (in),"Track Width, Front (in)","Width, Max w/o mirrors (in)","Track Width, Rear (in)","Height, Overall (in)",Cargo Volume to Seat 1,Cargo Volume to Seat 2,Cargo Volume to Seat 3,"Fuel Tank Capacity, Approx (gal)",Fuel Economy Est-Combined (MPG),EPA Fuel Economy Est - City (MPG),EPA Fuel Economy Est - Hwy (MPG),Engine Order Code,SAE Net Torque @ RPM,Fuel System,Engine Type,SAE Net Horsepower @ RPM,Displacement,First Gear Ratio (:1),Sixth Gear Ratio (:1),Trans Description Cont.,Fourth Gear Ratio (:1),Seventh Gear Ratio (:1),Trans Order Code,Second Gear Ratio (:1),Reverse Ratio (:1),Trans Description Cont. Again,Fifth Gear Ratio (:1),Eighth Gear Ratio (:1),Trans Type,Third Gear Ratio (:1),Final Drive Axle Ratio (:1),Brake Type,Rear Brake Rotor Diam x Thickness (in),Disc - Rear (Yes or ),Brake ABS System,Drum - Rear (Yes or ),Front Brake Rotor Diam x Thickness (in),Disc - Front (Yes or ),Rear Drum Diam x Width (in),Steering Type,Turning Diameter - Curb to Curb,Front Tire Order Code,Spare Tire Size,Front Tire Size,Rear Tire Order Code,Rear Tire Size,Spare Tire Order Code,Front Wheel Size (in),Spare Wheel Material,Front Wheel Material,Rear Wheel Size (in),Rear Wheel Material,Spare Wheel Size (in),Suspension Type - Front,Suspension Type - Rear (Cont.),Suspension Type - Rear,Suspension Type - Front (Cont.),Air Bag-Frontal-Driver,Air Bag-Frontal-Passenger,Air Bag-Passenger Switch (On/Off),Air Bag-Side Body-Front,Air Bag-Side Body-Rear,Air Bag-Side Head-Front,Air Bag-Side Head-Rear,Brakes-ABS,Child Safety Rear Door Locks,Daytime Running Lights,Traction Control,Night Vision,Rollover Protection Bars,Fog Lamps,Parking Aid,Tire Pressure Monitor,Back-Up Camera,Stability Control,Basic Miles/km,Basic Years,Corrosion Miles/km,Corrosion Years,Drivetrain Miles/km,Drivetrain Years,Roadside Assistance Miles/km,Roadside Assistance Years,Maximum Alternator Capacity (amps),Cold Cranking Amps @ 0° F (Primary),Wt Distributing Hitch - Max Tongue Wt. (lbs),Dead Weight Hitch - Max Tongue Wt. (lbs),Maximum Trailering Capacity (lbs),Wt Distributing Hitch - Max Trailer Wt. (lbs),Dead Weight Hitch - Max Trailer Wt. (lbs),Liftover Height (in),Rear Door Opening Height (in),Rear Door Opening Width (in),"Length, Overall (in)",Cargo Box Width @ Wheelhousings (in),Cargo Area Length @ Floor to Seat 3 (in),Cargo Area Length @ Floor to Seat 1 (in),Cargo Box (Area) Height (in),Cargo Area Width @ Beltline (in),Cargo Area Length @ Floor to Seat 2 (in),Clutch Size (in),Turning Diameter - Wall to Wall,Lock to Lock Turns (Steering),"Steering Ratio (:1), Overall",Shock Absorber Diameter - Front (mm),Stabilizer Bar Diameter - Rear (in),Shock Absorber Diameter - Rear (mm),Stabilizer Bar Diameter - Front (in),Total Cooling System Capacity (qts),Third Shoulder Room (in),Third Head Room (in),Third Hip Room (in),Third Leg Room (in),Hybrid/Electric Components Miles/km,Hybrid/Electric Components Years,"Transfer Case Gear Ratio (:1), High","Transfer Case Gear Ratio (:1), Low",Trunk Volume,Number of Passenger Doors,Roadside Assistance Note,Warranty Note,Maintenance Miles/km,Maintenance Years,Basic Note,Cargo Volume with Rear Seat Up,Cargo Volume with Rear Seat Down,Gross Vehicle Weight Rating Cap (lbs),Engine Oil Cooler,Transfer Case Model,Transfer Case Power Take Off,Trans PTO Access,Brake ABS System (Second Line),Axle Type - Rear,Axle Type - Front,Cold Cranking Amps @ 0° F (2nd),EPA MPG Equivalent - Combined,Battery Range (mi),Axle Ratio (:1) - Rear,Axle Ratio (:1) - Front,Gross Axle Wt Rating - Front (lbs),Gross Axle Wt Rating - Rear (lbs),EPA MPG Equivalent - City,EPA MPG Equivalent - Hwy,Maintenance Note,Emissions Miles/km,Emissions Years,Ninth Gear Ratio (:1),EPA Air Pollution Score,Rear Door Type,Curb Weight - Front (lbs),Gross Combined Wt Rating (lbs),Curb Weight - Rear (lbs),"Ground Clearance, Rear (in)",Step Up Height - Front (in),"Length, Overall w/rear bumper (in)",Ground to Top of Load Floor (in),Side Door Opening Height (in),"Overhang, Front (in)",Step Up Height - Side (in),"Ground Clearance, Front (in)",Side Door Opening Width (in),"Overhang, Rear w/bumper (in)",Cargo Volume to Seat 4,Cargo Area Length @ Floor to Seat 4 (in),Cargo Area Length @ Floor to Console (in),"Aux Fuel Tank Capacity, Approx (gal)",Fuel Tank Location,Aux Fuel Tank Location,Trans Power Take Off,Tenth Gear Ratio (:1),"Steering Ratio (:1), On Center","Steering Ratio (:1), At Lock",Spare Tire Capacity (lbs),Front Tire Capacity (lbs),Revolutions/Mile @ 45 mph - Spare,Revolutions/Mile @ 45 mph - Front,Rear Tire Capacity (lbs),Revolutions/Mile @ 45 mph - Rear,Axle Capacity - Front (lbs),Spring Capacity - Front (lbs),Axle Capacity - Rear (lbs),Spring Capacity - Rear (lbs),Maximum Alternator Watts,Fifth Wheel Hitch - Max Trailer Wt. (lbs),Fifth Wheel Hitch - Max Tongue Wt. (lbs),"Length, Overall w/o rear bumper (in)",Front Bumper to Back of Cab (in),"Frame Width, Rear (in)",Cab to Axle (in),"Overhang, Rear w/o bumper (in)",Ground to Top of Frame (in),Cab to End of Frame (in),"Cargo Box Width @ Top, Rear (in)",Cargo Volume,Cargo Box Width @ Floor (in),Ext'd Cab Cargo Volume,Cargo Box Length @ Floor (in),Tailgate Width (in),Drivetrain Note,Emissions Note,Fourth Hip Room (in),Fourth Leg Room (in),Fourth Shoulder Room (in),Fourth Head Room (in),Fifth Shoulder Room (in),Fifth Head Room (in),Fifth Hip Room (in),Fifth Leg Room (in),Corrosion Note,Hybrid/Electric Components Note,Hybrid,Electric,Company Name,Gears,Net Horsepower,Net Horsepower RPM,Net Torque,Cylinders,Engine Configuration,Rear Tire Width,Front Tire Width,Tire Rating,Country,Country Code,Displacement (L),Displacement (cc),Rear Wheel Size,Front Wheel Size,Tire Width Ratio,Wheel Size Ratio,Tire Ratio,Year,Net Torque RPM
2019 Acura RDX Specs: FWD w/Technology Pkg,40600.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/Technology Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small SUV 2WD,3790.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,28.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: FWD w/Advance Pkg,45500.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/Advance Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small SUV 2WD,3829.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,28.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: FWD w/A-Spec Pkg,43600.0,22 mpg City/27 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD w/A-Spec Pkg,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small SUV 2WD,3821.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,27.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P255/45VR20,,P255/45VR20,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,255.0,255.0,V,Japan,2,2.0,122.0,20.0,20.0,1.0,1.0,4.0,2019.0,1600.0
2019 Acura RDX Specs: FWD,37400.0,22 mpg City/28 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 2WD,FWD,Front Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small SUV 2WD,3783.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,24.0,22.0,28.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,,P235/55HR19,,P235/55HR19,,,,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No,Yes,Yes,Yes,50000.0,4.0,10000000,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: AWD w/Technology Pkg,42600.0,21 mpg City/27 mpg Hwy,"Turbo Premium Unleaded I-4, 2.0 L",Small Sport Utility Vehicles 4WD,AWD w/Technology Pkg,All Wheel Drive,5,4,Sport Utility,transmission: 10-speed automatic -inc: sequent...,Small SUV 4WD,4026.0,55.0,41.6,56.6,104.0,38.3,59.7,49.9,39.6,38.4,108.3,5.7,64.2,74.8,64.7,65.7,58.9,29.5,29.5,17.1,23.0,21.0,27.0,,280 @ 1600,Gasoline Direct Injection,Turbo Premium Unleaded I-4,272 @ 6500,2.0 L/122,5.25,1.0,Automatic w/OD,1.6,0.78,,3.27,3.97,,1.3,0.65,10,2.19,4.17,4-Wheel Disc,12.2,Yes,4-Wheel,,12.4,,,Rack-Pinion,39.0,,Compact,P235/55HR19,,P235/55HR19,,,Steel,Aluminum,,,,Strut,Multi-Link,Multi-Link,Strut,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000,5.0,70000.0,6.0,50000.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,False,Acura,10.0,,6500.0,280.0,4.0,I,235.0,235.0,H,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0


>Saving dataframe to a new csv

In [27]:
#final prep-processing


raw_data['Min Ground Clearance (in)'] = raw_data['Min Ground Clearance (in)'].str.slice(stop=2).astype(float)

raw_data['Corrosion Miles/km']= raw_data['Corrosion Miles/km'].str.replace('50,000', '50000')
raw_data['Corrosion Miles/km']= raw_data['Corrosion Miles/km'].str.replace('60,000', '60000')
raw_data['Corrosion Miles/km']= raw_data['Corrosion Miles/km'].str.replace('100,000', '100000')
raw_data['Corrosion Miles/km'] = raw_data['Corrosion Miles/km'].astype(float)

del raw_data['Maximum Alternator Capacity (amps)']

del raw_data['Cold Cranking Amps @ 0° F (Primary)'] 
del raw_data['Wt Distributing Hitch - Max Tongue Wt. (lbs)'] 


In [28]:
raw_data['Drivetrain'].value_counts()

Rear Wheel Drive     9020
Front Wheel Drive    8910
Four Wheel Drive     5958
All Wheel Drive      5275
All-Wheel Drive       545
4 Wheel Drive         378
Front-Wheel Drive     207
Rear wheel drive       79
AWD                    42
Rear-Wheel Drive       39
4WD                    23
2 Wheel Drive          22
All wheel drive        22
REAR WHEEL DRIVE       19
Four wheel drive       18
All-wheel drive        11
Front-wheel drive      11
2WD                     9
2-Wheel Drive           8
RWD                     3
4-wheel Drive           1
Name: Drivetrain, dtype: int64

In [29]:
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('Rear wheel drive', 'Rear Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('Rear wheel drive ', 'Rear Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('Rear-Wheel Drive', 'Rear Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('REAR WHEEL DRIVE', 'Rear Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('RWD', 'Rear Wheel Drive')

raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('Front-Wheel Drive', 'Front Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('Front-wheel Drive', 'Front Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('Front-wheel drive', 'Front Wheel Drive')

raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('4 Wheel Drive', 'Four Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('4WD', 'Four Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('Four wheel drive', 'Four Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('4-wheel Drive', 'Four Wheel Drive')

raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('All-Wheel Drive', 'All Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('AWD', 'All Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('All-wheel drive', 'All Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('All wheel drive', 'All Wheel Drive')

raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('2 Wheel Drive', 'Two Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('2WD', 'Two Wheel Drive')
raw_data['Drivetrain']= raw_data['Drivetrain'].str.replace('2-Wheel Drive', 'Two Wheel Drive')

**Remove columns**

In [30]:
df = raw_data.copy()
#raw_data = df.copy()

In [31]:
# DELETE COLUMNS
specs_to_delete = ['Gas Mileage', 'Engine', 'Engine Type', 'SAE Net Horsepower @ RPM', 'SAE Net Torque @ RPM',
                  'Displacement', 'Trans Description Cont.', 'Rear Tire Size', 'Front Tire Size', 'Rear Wheel Size (in)',
                  'Front Wheel Size (in)', 'Transmission', 'EPA Class', 'Brake ABS System', 'Disc - Front (Yes or   )',
                  'Brake Type', 'Disc - Rear (Yes or   )', 'Spare Tire Size', 'Spare Wheel Size (in)', 'Spare Wheel Material']
raw_data.drop(specs_to_delete, axis=1, inplace=True)


########### FIX TO 75% ########## -------- Identifying columns with NaN totalling more than 75% of elements	

col_to_delete = raw_data.columns[raw_data.isna().sum() >= 0.25*len(raw_data)].tolist()

  ######
#Keep Hybrid columns (['Hybrid/Electric Components Miles/km', 'Hybrid/Electric Components Years', 'Hybrid/Electric Components Note', 'Hybrid'] )
#                 even if they have many missing values
hyb_cols = [col for col in raw_data if 'ybri' in col]

for x in hyb_cols:
    if x in col_to_delete:
        col_to_delete.remove(x)

for x in ['MSRP', 'Year', 'EPA Classification', 'Company Name', 'EPA Fuel Economy Est - City (MPG)', 'EPA Fuel Economy Est - Hwy (MPG)',
          'Base Curb Weight (lbs)', 'Turning Diameter - Curb to Curb', 'Curb Weight - Front (lbs)', 'Curb Weight - Rear (lbs)' ]:
    if x in col_to_delete:
        col_to_delete.remove(x)
######

raw_data.drop(col_to_delete, axis=1, inplace=True)
raw_data.head()

Unnamed: 0,MSRP,Style Name,Drivetrain,Passenger Capacity,Passenger Doors,Body Style,EPA Classification,Base Curb Weight (lbs),Front Leg Room (in),Second Shoulder Room (in),Second Head Room (in),Front Shoulder Room (in),Front Head Room (in),Second Leg Room (in),Wheelbase (in),"Width, Max w/o mirrors (in)","Height, Overall (in)","Fuel Tank Capacity, Approx (gal)",EPA Fuel Economy Est - City (MPG),EPA Fuel Economy Est - Hwy (MPG),Fuel System,First Gear Ratio (:1),Fourth Gear Ratio (:1),Second Gear Ratio (:1),Reverse Ratio (:1),Trans Type,Third Gear Ratio (:1),Steering Type,Turning Diameter - Curb to Curb,Front Wheel Material,Suspension Type - Front,Suspension Type - Rear,Air Bag-Frontal-Driver,Air Bag-Frontal-Passenger,Air Bag-Passenger Switch (On/Off),Air Bag-Side Body-Front,Air Bag-Side Body-Rear,Air Bag-Side Head-Front,Air Bag-Side Head-Rear,Brakes-ABS,Child Safety Rear Door Locks,Daytime Running Lights,Traction Control,Night Vision,Rollover Protection Bars,Fog Lamps,Parking Aid,Tire Pressure Monitor,Back-Up Camera,Stability Control,Basic Miles/km,Basic Years,Corrosion Miles/km,Corrosion Years,Drivetrain Miles/km,Drivetrain Years,Roadside Assistance Miles/km,Roadside Assistance Years,Hybrid/Electric Components Miles/km,Hybrid/Electric Components Years,Curb Weight - Front (lbs),Curb Weight - Rear (lbs),Hybrid/Electric Components Note,Hybrid,Electric,Company Name,Gears,Net Horsepower RPM,Net Torque,Engine Configuration,Rear Tire Width,Front Tire Width,Country,Country Code,Displacement (L),Displacement (cc),Rear Wheel Size,Front Wheel Size,Tire Width Ratio,Wheel Size Ratio,Tire Ratio,Year,Net Torque RPM
2019 Acura RDX Specs: FWD w/Technology Pkg,40600.0,FWD w/Technology Pkg,Front Wheel Drive,5,4,Sport Utility,Small SUV 2WD,3790.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,,,,,,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: FWD w/Advance Pkg,45500.0,FWD w/Advance Pkg,Front Wheel Drive,5,4,Sport Utility,Small SUV 2WD,3829.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,,,,,,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: FWD w/A-Spec Pkg,43600.0,FWD w/A-Spec Pkg,Front Wheel Drive,5,4,Sport Utility,Small SUV 2WD,3821.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,27.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,,,,,,False,False,Acura,10.0,6500.0,280.0,I,255.0,255.0,Japan,2,2.0,122.0,20.0,20.0,1.0,1.0,4.0,2019.0,1600.0
2019 Acura RDX Specs: FWD,37400.0,FWD,Front Wheel Drive,5,4,Sport Utility,Small SUV 2WD,3783.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,,,,,,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0
2019 Acura RDX Specs: AWD w/Technology Pkg,42600.0,AWD w/Technology Pkg,All Wheel Drive,5,4,Sport Utility,Small SUV 4WD,4026.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,21.0,27.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,,,,,,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2019.0,1600.0


In [32]:
raw_data.columns

Index(['MSRP', 'Style Name', 'Drivetrain', 'Passenger Capacity',
       'Passenger Doors', 'Body Style', 'EPA Classification',
       'Base Curb Weight (lbs)', 'Front Leg Room (in)',
       'Second Shoulder Room (in)', 'Second Head Room (in)',
       'Front Shoulder Room (in)', 'Front Head Room (in)',
       'Second Leg Room (in)', 'Wheelbase (in)', 'Width, Max w/o mirrors (in)',
       'Height, Overall (in)', 'Fuel Tank Capacity, Approx (gal)',
       'EPA Fuel Economy Est - City (MPG)', 'EPA Fuel Economy Est - Hwy (MPG)',
       'Fuel System', 'First Gear Ratio (:1)', 'Fourth Gear Ratio (:1)',
       'Second Gear Ratio (:1)', 'Reverse Ratio (:1)', 'Trans Type',
       'Third Gear Ratio (:1)', 'Steering Type',
       'Turning Diameter - Curb to Curb', 'Front Wheel Material',
       'Suspension Type - Front', 'Suspension Type - Rear',
       'Air Bag-Frontal-Driver', 'Air Bag-Frontal-Passenger',
       'Air Bag-Passenger Switch (On/Off)', 'Air Bag-Side Body-Front',
       'Air Bag-Side

In [33]:
raw_data.shape[1]

83

In [34]:
raw_data['EPA Classification'].isnull().value_counts()

False    19826
True     12490
Name: EPA Classification, dtype: int64

In [35]:
raw_data['Name'] = raw_data.index

In [36]:

raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Small", na=False)), 'EPA Classification' ] = "Compact"     
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Truck", na=False)), 'EPA Classification' ] = "Pick-up Truck"
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Minivan", na=False)), 'EPA Classification' ] = "Van" 
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Purpose", na=False)), 'EPA Classification' ] = "Special Purpose"
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Subcompact", na=False)), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Minicompact", na=False)), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Small", na=False)), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Mid-sized", na=False)), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Mid size", na=False)), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Special", na=False)), 'EPA Classification' ] = "SUV"                    
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Model X", na=False)), 'EPA Classification' ] = "All Electric SUV"  
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Pick-up", na=False)), 'EPA Classification' ] = "Pick-up Truck"    

raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("SUV 4WD", na=False)), 'EPA Classification' ] = "SUV"                    
raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("SUV 2WD", na=False)), 'EPA Classification' ] = "SUV"                    

raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Midsize Station Wagon", na=False)), 'EPA Classification' ] = "Wagon"  

raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Pick-up Truck", na=False)), 'EPA Classification' ] = "Truck"     


raw_data.loc[ pd.Series(raw_data.Name.str.contains("Atlas ")), 'EPA Classification' ] = "SUV"                         
raw_data.loc[ pd.Series(raw_data.Name.str.contains("4Runner")), 'EPA Classification' ] = "SUV" 
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Ascent")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("GLS")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Navigator")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Lexus Lx")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Range Rover")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Wrangler")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("INFINITY")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Yukon")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Expedition")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Durango")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Suburban")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Escalade")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Land Rover")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Sierra")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Bentayga")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Audi Q8")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Sequoia")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Cayenne")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("G Class")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Blazer")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Suburban")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Land Cruiser")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Cullinan")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Lexus GX")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Chevrolet Tahoe")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("BMW X7")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Mercedes-Benz GLE")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Murano")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Audi Q3")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Ford Explorer")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Pathfinder")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Passport")), 'EPA Classification' ] = "SUV"


raw_data.loc[ pd.Series(raw_data.Name.str.contains("Super Duty")), 'EPA Classification' ] = "Pick-up Truck"  


raw_data.loc[ pd.Series(raw_data.Name.str.contains("3-Series")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("5-Series")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Mirage")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Maxima")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Sentra")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Camry")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Corolla")), 'EPA Classification' ] = "Compact"


raw_data.loc[ pd.Series(raw_data.Name.str.contains("Elantra")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Veloster")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Jetta")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Golf")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Elantra")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Impreza")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Altima")), 'EPA Classification' ] = "Compact"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Supra")), 'EPA Classification' ] = "Compact"


raw_data.loc[ pd.Series(raw_data.Name.str.contains("911")), 'EPA Classification' ] = "Two Seater"


raw_data.loc[ pd.Series(raw_data.Name.str.contains("Tesla")), 'EPA Classification' ] = "All Electric"



raw_data.loc[ pd.Series(raw_data.Name.str.contains("Transit")), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Regal")), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Camaro")), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Legacy")), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Passat")), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Taurus")), 'EPA Classification' ] = "Midsize"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Sonata")), 'EPA Classification' ] = "Midsize"


raw_data.loc[ pd.Series(raw_data.Name.str.contains("Ranger")), 'EPA Classification' ] = "Compact"


raw_data.loc[ pd.Series(raw_data.Name.str.contains("Ram")), 'EPA Classification' ] = "Van"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Sprinter")), 'EPA Classification' ] = "Van"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Nissan NV")), 'EPA Classification' ] = "Van"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Metris")), 'EPA Classification' ] = "Van"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Express")), 'EPA Classification' ] = "Van"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("NV200")), 'EPA Classification' ] = "Van"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Savana")), 'EPA Classification' ] = "Van"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Savana")), 'EPA Classification' ] = "Van"



raw_data.loc[ pd.Series(raw_data.Name.str.contains("Lexus LX")), 'EPA Classification' ] = "Large"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Jaguar XF")), 'EPA Classification' ] = "Large"

raw_data.loc[ pd.Series(raw_data.Name.str.contains("Ridgeline")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Titan")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Frontier")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Ford F")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Sierra")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Canyon")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Silverado")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Tundra")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Tacoma")), 'EPA Classification' ] = "Truck"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Colorado")), 'EPA Classification' ] = "Truck"


raw_data.loc[ pd.Series(raw_data.Name.str.contains("BMW X5")), 'EPA Classification' ] = "SUV"

raw_data.loc[ pd.Series(raw_data.Name.str.contains("Bolt EV")), 'EPA Classification' ] = "All Electric"

raw_data.loc[ pd.Series(raw_data.Name.str.contains("QX80")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Armada")), 'EPA Classification' ] = "SUV"
raw_data.loc[ pd.Series(raw_data.Name.str.contains("Urus")), 'EPA Classification' ] = "SUV"


raw_data.loc[ pd.Series(raw_data['EPA Classification'].str.contains("Pick-up", na=False)), 'EPA Classification' ] = "Truck"    



###REMOVE ALL OTHER NAN
raw_data = raw_data.loc[raw_data['EPA Classification'].notnull()]

raw_data['EPA Classification'].value_counts()

Compact         9711
Truck           7020
SUV             6356
Midsize         3804
Van             2019
Large           1371
Two Seater      1316
Wagon            233
All Electric      71
Name: EPA Classification, dtype: int64

In [37]:
raw_data['EPA Classification'].isnull().value_counts()

False    31901
Name: EPA Classification, dtype: int64

In [38]:
#raw_data = raw_data.drop(raw_data[raw_data.Year == 2019].index)

In [39]:
raw_data['Curb Weight - Front (lbs)'] = raw_data['Curb Weight - Front (lbs)'].astype(float)
raw_data['Country Code'] = raw_data['Country Code'].astype("category")

In [40]:
#go one year back for names and year of cars of 2019 and 2018

raw_data.loc[raw_data.Year==2018, 'Year'] = 2017
raw_data.loc[raw_data.Year==2017, 'Name'] = raw_data.loc[raw_data.Year==2017, 'Name'].str.slice_replace(stop=4, repl='2017')

raw_data.loc[raw_data.Year==2019, 'Year'] = 2018
raw_data.loc[raw_data.Year==2018, 'Name'] = raw_data.loc[raw_data.Year==2018, 'Name'].str.slice_replace(stop=4, repl='2018')

raw_data.set_index('Name', inplace=True)

raw_data.Year.value_counts()

2017.00    2801
2018.00    2431
2016.00    2225
2015.00    2103
2014.00    1897
2013.00    1889
2012.00    1865
2011.00    1666
2010.00    1447
2009.00    1435
2008.00    1381
2007.00    1294
2006.00    1192
2005.00    1116
2004.00    1034
2003.00     848
2002.00     753
2001.00     732
2000.00     586
1999.00     543
1998.00     471
1997.00     469
1996.00     314
1995.00     278
1994.00     256
1992.00     245
1991.00     223
1993.00     206
1990.00     201
Name: Year, dtype: int64

In [41]:
raw_data.loc[:,raw_data.select_dtypes(float).columns] = raw_data.select_dtypes(float).fillna(raw_data.select_dtypes(float).mean())
raw_data.loc[:,raw_data.select_dtypes(int).columns] = raw_data.select_dtypes(int).fillna(raw_data.select_dtypes(int).mean())
raw_data.loc[:,raw_data.select_dtypes(object).columns] = raw_data.select_dtypes(object).fillna(raw_data.select_dtypes(object).mode().iloc[0])

raw_data.head()

Unnamed: 0_level_0,MSRP,Style Name,Drivetrain,Passenger Capacity,Passenger Doors,Body Style,EPA Classification,Base Curb Weight (lbs),Front Leg Room (in),Second Shoulder Room (in),Second Head Room (in),Front Shoulder Room (in),Front Head Room (in),Second Leg Room (in),Wheelbase (in),"Width, Max w/o mirrors (in)","Height, Overall (in)","Fuel Tank Capacity, Approx (gal)",EPA Fuel Economy Est - City (MPG),EPA Fuel Economy Est - Hwy (MPG),Fuel System,First Gear Ratio (:1),Fourth Gear Ratio (:1),Second Gear Ratio (:1),Reverse Ratio (:1),Trans Type,Third Gear Ratio (:1),Steering Type,Turning Diameter - Curb to Curb,Front Wheel Material,Suspension Type - Front,Suspension Type - Rear,Air Bag-Frontal-Driver,Air Bag-Frontal-Passenger,Air Bag-Passenger Switch (On/Off),Air Bag-Side Body-Front,Air Bag-Side Body-Rear,Air Bag-Side Head-Front,Air Bag-Side Head-Rear,Brakes-ABS,Child Safety Rear Door Locks,Daytime Running Lights,Traction Control,Night Vision,Rollover Protection Bars,Fog Lamps,Parking Aid,Tire Pressure Monitor,Back-Up Camera,Stability Control,Basic Miles/km,Basic Years,Corrosion Miles/km,Corrosion Years,Drivetrain Miles/km,Drivetrain Years,Roadside Assistance Miles/km,Roadside Assistance Years,Hybrid/Electric Components Miles/km,Hybrid/Electric Components Years,Curb Weight - Front (lbs),Curb Weight - Rear (lbs),Hybrid/Electric Components Note,Hybrid,Electric,Company Name,Gears,Net Horsepower RPM,Net Torque,Engine Configuration,Rear Tire Width,Front Tire Width,Country,Country Code,Displacement (L),Displacement (cc),Rear Wheel Size,Front Wheel Size,Tire Width Ratio,Wheel Size Ratio,Tire Ratio,Year,Net Torque RPM
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1
2018 Acura RDX Specs: FWD w/Technology Pkg,40600.0,FWD w/Technology Pkg,Front Wheel Drive,5,4,Sport Utility,Compact,3790.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0
2018 Acura RDX Specs: FWD w/Advance Pkg,45500.0,FWD w/Advance Pkg,Front Wheel Drive,5,4,Sport Utility,Compact,3829.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0
2018 Acura RDX Specs: FWD w/A-Spec Pkg,43600.0,FWD w/A-Spec Pkg,Front Wheel Drive,5,4,Sport Utility,Compact,3821.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,27.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,255.0,255.0,Japan,2,2.0,122.0,20.0,20.0,1.0,1.0,4.0,2018.0,1600.0
2018 Acura RDX Specs: FWD,37400.0,FWD,Front Wheel Drive,5,4,Sport Utility,Compact,3783.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0
2018 Acura RDX Specs: AWD w/Technology Pkg,42600.0,AWD w/Technology Pkg,All Wheel Drive,5,4,Sport Utility,Compact,4026.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,21.0,27.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0


In [42]:
raw_data['Curb Weight - Front (lbs)'].isnull().value_counts()

False    31901
Name: Curb Weight - Front (lbs), dtype: int64

In [43]:
df = raw_data.copy()

In [44]:
raw_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 31901 entries, 2018 Acura RDX Specs: FWD w/Technology Pkg to 2015 Volvo V60 Cross Country Specs: 2015.5 4-Door Wagon T5 Platinum AWD
Data columns (total 83 columns):
MSRP                                   31901 non-null float64
Style Name                             31901 non-null object
Drivetrain                             31901 non-null object
Passenger Capacity                     31901 non-null int64
Passenger Doors                        31901 non-null int64
Body Style                             31901 non-null object
EPA Classification                     31901 non-null object
Base Curb Weight (lbs)                 31901 non-null float64
Front Leg Room (in)                    31901 non-null float64
Second Shoulder Room (in)              31901 non-null float64
Second Head Room (in)                  31901 non-null float64
Front Shoulder Room (in)               31901 non-null float64
Front Head Room (in)                   31901 non-null

In [45]:
df.to_csv('raw_data_no_dummies_imputed.csv')
df.head()

Unnamed: 0_level_0,MSRP,Style Name,Drivetrain,Passenger Capacity,Passenger Doors,Body Style,EPA Classification,Base Curb Weight (lbs),Front Leg Room (in),Second Shoulder Room (in),Second Head Room (in),Front Shoulder Room (in),Front Head Room (in),Second Leg Room (in),Wheelbase (in),"Width, Max w/o mirrors (in)","Height, Overall (in)","Fuel Tank Capacity, Approx (gal)",EPA Fuel Economy Est - City (MPG),EPA Fuel Economy Est - Hwy (MPG),Fuel System,First Gear Ratio (:1),Fourth Gear Ratio (:1),Second Gear Ratio (:1),Reverse Ratio (:1),Trans Type,Third Gear Ratio (:1),Steering Type,Turning Diameter - Curb to Curb,Front Wheel Material,Suspension Type - Front,Suspension Type - Rear,Air Bag-Frontal-Driver,Air Bag-Frontal-Passenger,Air Bag-Passenger Switch (On/Off),Air Bag-Side Body-Front,Air Bag-Side Body-Rear,Air Bag-Side Head-Front,Air Bag-Side Head-Rear,Brakes-ABS,Child Safety Rear Door Locks,Daytime Running Lights,Traction Control,Night Vision,Rollover Protection Bars,Fog Lamps,Parking Aid,Tire Pressure Monitor,Back-Up Camera,Stability Control,Basic Miles/km,Basic Years,Corrosion Miles/km,Corrosion Years,Drivetrain Miles/km,Drivetrain Years,Roadside Assistance Miles/km,Roadside Assistance Years,Hybrid/Electric Components Miles/km,Hybrid/Electric Components Years,Curb Weight - Front (lbs),Curb Weight - Rear (lbs),Hybrid/Electric Components Note,Hybrid,Electric,Company Name,Gears,Net Horsepower RPM,Net Torque,Engine Configuration,Rear Tire Width,Front Tire Width,Country,Country Code,Displacement (L),Displacement (cc),Rear Wheel Size,Front Wheel Size,Tire Width Ratio,Wheel Size Ratio,Tire Ratio,Year,Net Torque RPM
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1
2018 Acura RDX Specs: FWD w/Technology Pkg,40600.0,FWD w/Technology Pkg,Front Wheel Drive,5,4,Sport Utility,Compact,3790.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0
2018 Acura RDX Specs: FWD w/Advance Pkg,45500.0,FWD w/Advance Pkg,Front Wheel Drive,5,4,Sport Utility,Compact,3829.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0
2018 Acura RDX Specs: FWD w/A-Spec Pkg,43600.0,FWD w/A-Spec Pkg,Front Wheel Drive,5,4,Sport Utility,Compact,3821.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,27.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,255.0,255.0,Japan,2,2.0,122.0,20.0,20.0,1.0,1.0,4.0,2018.0,1600.0
2018 Acura RDX Specs: FWD,37400.0,FWD,Front Wheel Drive,5,4,Sport Utility,Compact,3783.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,22.0,28.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,No,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0
2018 Acura RDX Specs: AWD w/Technology Pkg,42600.0,AWD w/Technology Pkg,All Wheel Drive,5,4,Sport Utility,Compact,4026.0,41.6,56.6,38.3,59.7,39.6,38.4,108.3,74.8,65.7,17.1,21.0,27.0,Gasoline Direct Injection,5.25,1.6,3.27,3.97,10,2.19,Rack-Pinion,39.0,Aluminum,Strut,Multi-Link,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,No,No,No,Yes,Yes,Yes,Yes,50000.0,4.0,10000000.0,5.0,70000.0,6.0,50000.0,4.0,100000,8,2872.31,3075,Applies to hybrid vehicles only,False,False,Acura,10.0,6500.0,280.0,I,235.0,235.0,Japan,2,2.0,122.0,19.0,19.0,1.0,1.0,5.0,2018.0,1600.0
