# Predicting the Sale Price of Bulldozers using Machine Learning

This notebook is based on the Kaggle competition [Bluebook for Bulldozers](https://www.kaggle.com/c/bluebook-for-bulldozers/overview).

This is a regression problem: we try to predict a continuous number, the sale price of a bulldozer. The competition rules specify the evaluation metric, RMSLE (root mean squared log error), so we use it for model evaluation.
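The competition scores submissions with RMSLE, a variant of RMSE computed on log-transformed prices, so that errors are judged relative to the price rather than in absolute dollars. The following is a minimal sketch of both metrics, not the competition's official scoring code. It assumes the common `log1p` convention (log of value plus one), and the function names and sample prices are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error on the raw values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rmsle(y_true, y_pred):
    """Root mean squared log error (assumes the log1p convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Hypothetical actual and predicted sale prices, for illustration only
actual = [66000.0, 57000.0, 10000.0]
predicted = [60000.0, 59000.0, 12000.0]

print("RMSE :", rmse(actual, predicted))
print("RMSLE:", rmsle(actual, predicted))
```

Because RMSLE works on logarithms, a 2,000 dollar miss on a 10,000 dollar machine is penalized more than the same miss on a 57,000 dollar machine.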

In [10]:
# Imports

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### Download the dataset from Kaggle

In [1]:
!pip install --upgrade --force-reinstall --no-deps kaggle

Collecting kaggle
  Downloading kaggle-1.5.12.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 3.1 MB/s 
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73052 sha256=dc19a5620150ec19efd788843765e5a3c91f3f1301b87fd4547f8fbd857caecc
  Stored in directory: /root/.cache/pip/wheels/62/d6/58/5853130f941e75b2177d281eb7e44b4a98ed46dd155f556dc5
Successfully built kaggle
Installing collected packages: kaggle
  Attempting uninstall: kaggle
    Found existing installation: kaggle 1.5.12
    Un

In [None]:
from google.colab import files
files.upload()

In [3]:
! mkdir -p ~/.kaggle

In [4]:

! cp kaggle.json ~/.kaggle/

In [5]:
! chmod 600 ~/.kaggle/kaggle.json
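The three shell cells above create the `~/.kaggle` directory, copy `kaggle.json` into it, and restrict the file to owner read/write. The same steps can be sketched in Python so the cell can be re-run without failing when the directory already exists. The helper name `install_kaggle_credentials` and its `home` parameter are hypothetical, introduced only for this illustration.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(src="kaggle.json", home=None):
    """Copy a Kaggle API token into <home>/.kaggle with mode 600.

    Hypothetical helper: `home` defaults to the current user's home directory.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    # exist_ok=True makes the step safe to repeat
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(src, target)
    # Owner read/write only, equivalent to `chmod 600`
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

Calling `install_kaggle_credentials()` after uploading `kaggle.json` to the working directory would mirror the shell commands.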

In [7]:
! kaggle competitions download -c bluebook-for-bulldozers

Downloading bluebook-for-bulldozers.zip to /content
 85% 41.0M/48.4M [00:00<00:00, 52.2MB/s]
100% 48.4M/48.4M [00:00<00:00, 68.7MB/s]


In [8]:
! mkdir -p input_data

In [9]:
! unzip bluebook-for-bulldozers.zip -d input_data

Archive:  bluebook-for-bulldozers.zip
  inflating: input_data/Data Dictionary.xlsx  
  inflating: input_data/Machine_Appendix.csv  
  inflating: input_data/Test.csv     
  inflating: input_data/Train.7z     
  inflating: input_data/Train.zip    
  inflating: input_data/TrainAndValid.7z  
  inflating: input_data/TrainAndValid.csv  
  inflating: input_data/TrainAndValid.zip  
  inflating: input_data/Valid.7z     
  inflating: input_data/Valid.csv    
  inflating: input_data/Valid.zip    
  inflating: input_data/ValidSolution.csv  
  inflating: input_data/median_benchmark.csv  
  inflating: input_data/random_forest_benchmark_test.csv  


### EDA

In [14]:
# Read the combined training and validation dataset.
# The saledate column is passed to the parse_dates argument so it is loaded as a datetime.
df = pd.read_csv("input_data/TrainAndValid.csv",
                 low_memory=False,
                 parse_dates=["saledate"])

df.head()

Unnamed: 0,SalesID,SalePrice,MachineID,ModelID,datasource,auctioneerID,YearMade,MachineHoursCurrentMeter,UsageBand,saledate,fiModelDesc,fiBaseModel,fiSecondaryDesc,fiModelSeries,fiModelDescriptor,ProductSize,fiProductClassDesc,state,ProductGroup,ProductGroupDesc,Drive_System,Enclosure,Forks,Pad_Type,Ride_Control,Stick,Transmission,Turbocharged,Blade_Extension,Blade_Width,Enclosure_Type,Engine_Horsepower,Hydraulics,Pushblock,Ripper,Scarifier,Tip_Control,Tire_Size,Coupler,Coupler_System,Grouser_Tracks,Hydraulics_Flow,Track_Type,Undercarriage_Pad_Width,Stick_Length,Thumb,Pattern_Changer,Grouser_Type,Backhoe_Mounting,Blade_Type,Travel_Controls,Differential_Type,Steering_Controls
0,1139246,66000.0,999089,3157,121,3.0,2004,68.0,Low,2006-11-16,521D,521,D,,,,Wheel Loader - 110.0 to 120.0 Horsepower,Alabama,WL,Wheel Loader,,EROPS w AC,None or Unspecified,,None or Unspecified,,,,,,,,2 Valve,,,,,None or Unspecified,None or Unspecified,,,,,,,,,,,,,Standard,Conventional
1,1139248,57000.0,117657,77,121,3.0,1996,4640.0,Low,2004-03-26,950FII,950,F,II,,Medium,Wheel Loader - 150.0 to 175.0 Horsepower,North Carolina,WL,Wheel Loader,,EROPS w AC,None or Unspecified,,None or Unspecified,,,,,,,,2 Valve,,,,,23.5,None or Unspecified,,,,,,,,,,,,,Standard,Conventional
2,1139249,10000.0,434808,7009,121,3.0,2001,2838.0,High,2004-02-26,226,226,,,,,Skid Steer Loader - 1351.0 to 1601.0 Lb Operat...,New York,SSL,Skid Steer Loaders,,OROPS,None or Unspecified,,,,,,,,,,Auxiliary,,,,,,None or Unspecified,None or Unspecified,None or Unspecified,Standard,,,,,,,,,,,
3,1139251,38500.0,1026470,332,121,3.0,2001,3486.0,High,2011-05-19,PC120-6E,PC120,,-6E,,Small,"Hydraulic Excavator, Track - 12.0 to 14.0 Metr...",Texas,TEX,Track Excavators,,EROPS w AC,,,,,,,,,,,2 Valve,,,,,,None or Unspecified,,,,,,,,,,,,,,
4,1139253,11000.0,1057373,17311,121,3.0,2007,722.0,Medium,2009-07-23,S175,S175,,,,,Skid Steer Loader - 1601.0 to 1751.0 Lb Operat...,New York,SSL,Skid Steer Loaders,,EROPS,None or Unspecified,,,,,,,,,,Auxiliary,,,,,,None or Unspecified,None or Unspecified,None or Unspecified,Standard,,,,,,,,,,,
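Passing `saledate` to `parse_dates` matters because a datetime column can be sorted chronologically and broken into parts. The sketch below shows this on three rows taken from the `df.head()` output above, reduced to three columns and written with ISO-formatted dates (the raw file's date format may differ). The derived column names `saleYear` and `saleMonth` are hypothetical.

```python
import io

import pandas as pd

# Three rows from the df.head() output, reduced to three columns
sample_csv = io.StringIO(
    "SalesID,SalePrice,saledate\n"
    "1139246,66000.0,2006-11-16\n"
    "1139248,57000.0,2004-03-26\n"
    "1139249,10000.0,2004-02-26\n"
)
sample = pd.read_csv(sample_csv, parse_dates=["saledate"])

# The column is now a datetime dtype rather than plain strings
print(sample["saledate"].dtype)

# Sort chronologically, oldest sale first
sample = sample.sort_values("saledate").reset_index(drop=True)

# Derive simple date parts (hypothetical column names)
sample["saleYear"] = sample["saledate"].dt.year
sample["saleMonth"] = sample["saledate"].dt.month

print(sample)
```

Sorting by sale date is a common first step for time-dependent data like this, since later sales should not leak into the data used to predict earlier ones.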
