# Car Noise Level and Specification Dataset

## Contents
- [Problem Statement](#Problem-Statment)
- [Datasets Description](#Datasets-Description)
- [Executive Summary](#Excutive-Summary)
- [Web scraping sources and API](#Web-scrapping-sources-and-API)
- [Data Dictionary](#Data-Dictionary)
- [I. Scraping from Web and retrieving from API](#I.-Scrapping-from-Web-and-retrieving-from-API)
    - [(1) Scraping car information and noise level from 'auto-decibel-db.com'](#(1)-Scrapping-cars-information-and-noise-level-from-'auto-decibel-db.com')
    - [(2) Getting model_id for each car using 'carqueryapi.com' API](#(2)-Getting-model_id-for-each-car-using-'carqueryapi.com'-API)
    - [(3) Getting detailed specification of cars using 'carqueryapi.com' API](#(3)-Getting-detailed-specification-of-cars-using-'carqueryapi.com'-API)
    - [(4) Scraping cars' prices](#(4)-Scrapping-cars'-prices)
- [II. Exporting dataframe to CSV](#II.-Exporting-datafram-to-CSV)

## Problem Statment

A car's noise level can indicate both its condition and its manufacturing quality. Drivers can use noise level to decide whether a car suits their needs or whether their current car is in a healthy state. Manufacturers, in turn, can use it to assess the quality of their cars relative to the market. Luxury cars compete on low noise levels, while sports cars usually neglect this factor. In this project we compile data from different sources into a dataset that maps each car's manufacturing specifications to its noise level at different speeds. The compiled dataset could be used to evaluate cars' noise levels or to analyze which technical specifications have the greatest effect on them.

For more information:

- [An overview of automobile noise and vibration control](https://www.researchgate.net/publication/270775858_An_overview_of_automobile_noise_and_vibration_control)

- [Noise, vibration, and harshness (Wikipedia)](https://en.wikipedia.org/wiki/Noise,_vibration,_and_harshness)

## Excutive Summary


We first scrape data from https://www.auto-decibel-db.com (hereafter AD). This website has nearly 2000 entries on cars' cabin noise levels, each giving the cabin noise (measured in decibels) at different speeds. The website gives no further information about the source or methodology of its data, yet it is the most comprehensive dataset on the subject we could find. Another source, which might be used for verification, is https://www.edmunds.com. While edmunds.com states its methodology for measuring noise level, its data is embedded in PDF files and is less comprehensive than AD.


After scraping the noise levels, we use the information available about each car to find its specifications. The dataset scraped from AD has 4 features that can identify the same car in other datasets: brand, model, year, and spec. After searching the Web for sites and APIs with detailed and comprehensive car data, we decided on the http://www.carqueryapi.com API (hereafter CQA). Though it is inaccurate for some cars and spells some names differently from AD, it is the most accessible data we could find. In this section we map each car in AD to its equivalent in CQA using 3 features (brand, model, and year), then use spec to pick the closest trim. We first determine the car's model_id in CQA, then use that model_id to retrieve its full specification. Because carqueryapi.com limits the number of requests (60 requests), we routed requests through a Tor proxy to alternate IP addresses.

Finally, we look up the full specification of each car in CQA using its model_id. In this section we add 60 specification features for roughly 1000 of the cars in AD. Each feature pulled from CQA is marked by the suffix '_cqa' appended to its column name. In the end we obtained specifications for 1067 of the 1895 cars in AD. We could not find specifications for every car, either because the car is named differently in AD and CQA or because it does not exist in CQA.
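
As a rough illustration of the '_cqa' naming convention, the sketch below tags every column of a small, made-up specification table with the suffix and left-joins it onto a noise table by model_id. The sample rows and values are invented for the example, and the notebook may have assembled its columns differently.

```python
import pandas as pd

# Toy stand-in for the scraped AD table (values are illustrative only).
ad = pd.DataFrame({
    "brand": ["Acura", "Kia"],
    "model": ["MDX", "Ceed"],
    "model_id": ["101", None],   # None: no CQA match was found
})

# Toy stand-in for specifications returned by CQA.
cqa = pd.DataFrame({
    "model_id": ["101"],
    "model_body": ["SUV"],
    "model_weight_kg": ["2000"],
})

# Tag every CQA column with the '_cqa' suffix, then left-join on the id so
# unmatched AD rows are kept, with missing specification values.
cqa_tagged = cqa.add_suffix("_cqa")
merged = ad.merge(cqa_tagged, how="left",
                  left_on="model_id", right_on="model_id_cqa")

print(merged.columns.tolist())
```

A left join keeps every AD row, so cars without a match stay in the dataset with empty specification columns.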

## Web scrapping sources and API


[auto-decibel-db.com](https://www.auto-decibel-db.com): This website has nearly 2000 entries on cars' cabin noise levels. Each entry gives the cabin noise (measured in decibels) at different speeds.

[carqueryapi.com](http://www.carqueryapi.com): a JSON-based API for retrieving detailed car and truck information, including year, make, model, trim, and specifications. It has 73419 vehicles in its database.
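
To make the request format concrete, here is a minimal sketch that builds a trim-lookup URL with the standard library. The endpoint, the `cmd=getTrims` command, and the `make`/`model`/`year` parameters are the ones used in the notebook code; `build_trims_url` is a hypothetical helper name introduced only for this example.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.carqueryapi.com/api/0.3/"

def build_trims_url(make, model, year):
    """Return the getTrims query URL for one make/model/year combination."""
    query = urlencode({"cmd": "getTrims", "make": make, "model": model, "year": year})
    return f"{BASE_URL}?{query}"

print(build_trims_url("lexus", "ls", 2008))
```

`urlencode` also escapes spaces and punctuation, which matters for names such as "Alfa Romeo".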

## Data Dictionary

| Feature                            | Type    | Dataset | Description                                      |
|------------------------------------|---------|---------|--------------------------------------------------|
| 'brand'                            |  object | ad      | manufacturer of the car                          |
|  'model'                           |  object | ad      | model of the car                                 |
|  'spec'                            |  object | ad      | engine size or trim level of the car             |
|  'year'                            |  object | ad      | year of release                                  |
|  'dB_at_idle'                      |  object | ad      | cabin noise level at idle, in decibels (dB)      |
|  'dB_at_50kmh'                     |  object | ad      | cabin noise level at 50 km/h, in decibels (dB)   |
|  'dB_at_80kmh'                     |  object | ad      | cabin noise level at 80 km/h, in decibels (dB)   |
|  'dB_at_100kmh'                    |  object | ad      | cabin noise level at 100 km/h, in decibels (dB)  |
|  'dB_at_120kmh'                    |  object | ad      | cabin noise level at 120 km/h, in decibels (dB)  |
|  'dB_at_140kmh'                    |  object | ad      | cabin noise level at 140 km/h, in decibels (dB)  |
|  'model_id'                        |  object | cqa     | car id as in cqa database using the first query  |
|  'model_id_cqa'                    |  object | cqa     | car id as in cqa database using the second query |
|  'model_make_id_cqa'               |  object | cqa     | car technical specification pulled from cqa      |
|  'model_name_cqa'                  |  object | cqa     | car technical specification pulled from cqa      |
|  'model_trim_cqa'                  |  object | cqa     | car technical specification pulled from cqa      |
|  'model_year_cqa'                  |  object | cqa     | car technical specification pulled from cqa      |
|  'model_body_cqa'                  |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_position_cqa'       |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_cc_cqa'             |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_cyl_cqa'            |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_type_cqa'           |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_valves_per_cyl_cqa' |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_power_ps_cqa'       |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_power_rpm_cqa'      |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_torque_nm_cqa'      |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_torque_rpm_cqa'     |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_bore_mm_cqa'        |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_stroke_mm_cqa'      |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_compression_cqa'    |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_fuel_cqa'           |  object | cqa     | car technical specification pulled from cqa      |
|  'model_top_speed_kph_cqa'         | float64 | cqa     | car technical specification pulled from cqa      |
|  'model_0_to_100_kph_cqa'          | float64 | cqa     | car technical specification pulled from cqa      |
|  'model_drive_cqa'                 |  object | cqa     | car technical specification pulled from cqa      |
|  'model_transmission_type_cqa'     |  object | cqa     | car technical specification pulled from cqa      |
|  'model_seats_cqa'                 |  object | cqa     | car technical specification pulled from cqa      |
|  'model_doors_cqa'                 |  object | cqa     | car technical specification pulled from cqa      |
|  'model_weight_kg_cqa'             |  object | cqa     | car technical specification pulled from cqa      |
|  'model_length_mm_cqa'             |  object | cqa     | car technical specification pulled from cqa      |
|  'model_width_mm_cqa'              |  object | cqa     | car technical specification pulled from cqa      |
|  'model_height_mm_cqa'             |  object | cqa     | car technical specification pulled from cqa      |
|  'model_wheelbase_mm_cqa'          |  object | cqa     | car technical specification pulled from cqa      |
|  'model_lkm_hwy_cqa'               |  object | cqa     | car technical specification pulled from cqa      |
|  'model_lkm_mixed_cqa'             | float64 | cqa     | car technical specification pulled from cqa      |
|  'model_lkm_city_cqa'              |  object | cqa     | car technical specification pulled from cqa      |
|  'model_fuel_cap_l_cqa'            |  object | cqa     | car technical specification pulled from cqa      |
|  'model_sold_in_us_cqa'            |  object | cqa     | car technical specification pulled from cqa      |
|  'model_co2_cqa'                   | float64 | cqa     | car technical specification pulled from cqa      |
|  'model_make_display_cqa'          |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_l_cqa'              |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_ci_cqa'             |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_bore_in_cqa'        |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_stroke_in_cqa'      |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_valves_cqa'         |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_power_hp_cqa'       |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_power_kw_cqa'       |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_torque_lbft_cqa'    |  object | cqa     | car technical specification pulled from cqa      |
|  'model_engine_torque_kgm_cqa'     |  object | cqa     | car technical specification pulled from cqa      |
|  'model_top_speed_mph_cqa'         | float64 | cqa     | car technical specification pulled from cqa      |
|  'model_weight_lbs_cqa'            |  object | cqa     | car technical specification pulled from cqa      |
|  'model_length_in_cqa'             |  object | cqa     | car technical specification pulled from cqa      |
|  'model_width_in_cqa'              |  object | cqa     | car technical specification pulled from cqa      |
|  'model_height_in_cqa'             |  object | cqa     | car technical specification pulled from cqa      |
|  'model_wheelbase_in_cqa'          |  object | cqa     | car technical specification pulled from cqa      |
|  'model_mpg_hwy_cqa'               |  object | cqa     | car technical specification pulled from cqa      |
|  'model_mpg_city_cqa'              |  object | cqa     | car technical specification pulled from cqa      |
|  'model_mpg_mixed_cqa'             | float64 | cqa     | car technical specification pulled from cqa      |
|  'model_fuel_cap_g_cqa'            |  object | cqa     | car technical specification pulled from cqa      |
|  'make_display_cqa'                |  object | cqa     | car technical specification pulled from cqa      |
|  'make_country_cqa'                |  object | cqa     | car technical specification pulled from cqa      |
|  'ExtColors_cqa'                   | float64 | cqa     | car technical specification pulled from cqa      |
|  'IntColors_cqa'                   | float64 | cqa     | car technical specification pulled from cqa      |
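
Most columns above are stored as `object` (strings), including the noise readings, so they need converting before any numeric analysis. The sketch below shows one common way to do that with `pd.to_numeric`. The rows are toy data shaped like the scraped table, and the empty string standing in for a missing reading is an assumption about how gaps appear.

```python
import pandas as pd

# Toy rows shaped like the scraped table; '' stands in for a missing reading.
df = pd.DataFrame({
    "brand": ["Abarth", "Acura"],
    "year": ["2008", "2013"],
    "dB_at_idle": ["47.3", "41.7"],
    "dB_at_140kmh": ["76.0", ""],
})

numeric_cols = ["year", "dB_at_idle", "dB_at_140kmh"]
# errors="coerce" turns unparseable values into NaN instead of raising.
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```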

## I. Scrapping from Web and retrieving from API

### (1) Scrapping cars information and noise level from 'auto-decibel-db.com'

Here we scrape our initial dataset from https://www.auto-decibel-db.com. This website has nearly 2000 entries on cars' cabin noise levels, each giving the cabin noise (measured in decibels) at different speeds. The website gives no further information about the source or methodology of its data, yet it is the most comprehensive dataset on the subject we could find. Another source, which might be used for verification, is https://www.edmunds.com. While edmunds.com states its methodology for measuring noise level, its data is embedded in PDF files and is less comprehensive than auto-decibel-db.com.

In [150]:
import requests
r = requests.get('http://www.auto-decibel-db.com/desktop_kmh.html')

In [91]:
from scrapy.selector import Selector
from scrapy.http import HtmlResponse
import pandas as pd

In [95]:
from pprint import pprint

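# Header cells as they appear on the page (kept for reference only)
col_ls = Selector(text=r.text).xpath('/html/body/div/div[2]/table/thead/tr/th/text()').extract()
# Override the scraped headers with clean, analysis-friendly column names
col_ls = ['brand',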
 'model',
 'spec',
 'year',
 'dB_at_idle',
 'dB_at_50kmh',
 'dB_at_80kmh',
 'dB_at_100kmh',
 'dB_at_120kmh',
 'dB_at_140kmh']

In [166]:
# pprint(Selector(text=r.text).xpath('/html/body/div/div[2]/table/tbody/tr[2]/td/text()').extract())
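sel = Selector(text=r.text)  # parse the page once and reuse it for every row
i = 2
rows = []
while i != 0:
    row_dum = sel.xpath('/html/body/div/div[2]/table/tbody/tr['+str(i)+']/td/text()').extract()
    rows.append(row_dum)
    i += 1
    # An empty result means we ran past the last table row; note that this
    # empty row has already been appended to `rows` above.
    if len(row_dum) == 0:
        i = 0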

In [171]:
df = pd.DataFrame(rows, columns=col_ls)
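
One thing to watch with the `td/text()` XPath used above: it returns text nodes only, so a completely empty `<td></td>` contributes nothing and its row ends up shorter than `col_ls`. The sketch below is an alternative using the standard library's `html.parser` in place of scrapy, which keeps empty cells as empty strings. The sample HTML is invented, and whether the real page contains truly empty cells is an assumption.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td> per <tr>, keeping empty cells as ''."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text chunks of the cell being read, or None outside <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # skip rows that hold <th> cells only
                self.rows.append(self._row)
            self._row = None

# Invented sample: the first data row has an empty spec cell.
sample = """
<table><tbody>
<tr><th>Brand</th><th>Model</th><th>Spec</th></tr>
<tr><td>Alfa Romeo</td><td>159</td><td></td></tr>
<tr><td>Acura</td><td>MDX</td><td>3.5 V6</td></tr>
</tbody></table>
"""

parser = TableRows()
parser.feed(sample)
print(parser.rows)
```

Every row keeps the same number of cells, so the list can be passed to `pd.DataFrame(rows, columns=...)` without misaligned columns.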

In [219]:
df['spec']

0             1.4 16v T-Jet
1              Competizione
2                    3.5 V6
3                    3.5 V6
4                    3.7 V6
5                    3.5 V6
6                    3.5 V6
7                    3.5 V6
8                    3.7 V6
9                       2.4
10                       V6
11               Stationcar
12                 1.9 JTDm
13                  2.2 JTS
14        2.2 JTS Selespeed
15                         
16                     2.1d
17                   2.9 V6
18                      2.0
19                      1.4
20                 1.6 JTDm
21                     1.4T
22                 2.0 JTDm
23                      1.4
24                     1.6d
25              1.9 JTDm Q2
26            1.4 Turbo 155
27                 1.6 JTDm
28      1.4 Turbo Multi-Air
29                      2.0
               ...         
1865                 D3 2.0
1866                     D3
1867                 T6 3.0
1868                     D2
1869                

In [174]:
df.shape

(1895, 10)

In [249]:
df.dtypes

brand            object
model            object
spec             object
year             object
dB_at_idle       object
dB_at_50kmh      object
dB_at_80kmh      object
dB_at_100kmh     object
dB_at_120kmh     object
dB_at_140kmh     object
model_id        float64
dtype: object

### (2) Getting model_id for each car using 'carqueryapi.com' API

After scraping the noise levels, we use the information available about each car to find its specifications. The dataset scraped from auto-decibel-db.com (hereafter AD) has 4 features that can identify the same car in other datasets: brand, model, year, and spec. After searching the Web for sites and APIs with detailed and comprehensive car data, we decided on the http://www.carqueryapi.com API (hereafter CQA). Though it is inaccurate for some cars and spells some names differently from AD, it is the most accessible data we could find. In this section we map each car in AD to its equivalent in CQA using 3 features (brand, model, and year), then use spec to pick the closest trim. We first determine the car's model_id in CQA, then use that model_id to retrieve its full specification. Because carqueryapi.com limits the number of requests (60 requests), we routed requests through a Tor proxy to alternate IP addresses.
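
The trim-matching step can be sketched with `difflib` on its own. The notebook's loop falls back to a randomly chosen trim when nothing is close. The variant below is an alternative, not the notebook's method: it returns `None` so that uncertain matches can be reviewed later. `pick_trim` is a hypothetical helper, the trim names are made up, and the 0.6 cutoff is simply the `difflib` default.

```python
import difflib

def pick_trim(spec, trims, cutoff=0.6):
    """Return the index of the trim closest to `spec`, or None if nothing is close."""
    if not spec or not trims:
        return None
    # Compare case-insensitively so '3.5 v6' and '3.5 V6' count as the same text.
    lowered = [t.lower() for t in trims]
    matches = difflib.get_close_matches(spec.lower(), lowered, n=1, cutoff=cutoff)
    return lowered.index(matches[0]) if matches else None

trims = ["Base", "3.5 V6 AWD", "Technology Package"]
print(pick_trim("3.5 V6", trims))   # index of the closest trim
print(pick_trim("JTDm", trims))     # no trim is close enough
```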

In [311]:
params = {'make':'lexus','year':'2008', 'model':'ls'}
# to get a plain JSON response we leave the callback=? parameter out of the URL
url = 'http://www.carqueryapi.com/api/0.3/?&cmd=getTrims&'
# carqueryapi.com only accepts requests that send a User-Agent header
headers = {'User-Agent': 'HotJava/1.1.2 FCS'}
# headers = {'User-Agent': 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:65.0) Gecko/20100101 Firefox/65.0'}

# r = requests.get(url, headers=headers, params=params);
# print(r.text);
# data = r.json()

In [261]:
import numpy as np
# Placeholder column: after astype(str), every unmatched row holds the string 'nan'
df['model_id'] = np.nan
df['model_id'] = df['model_id'].astype(str)

In [262]:
df

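|   | brand  | model | spec          | year | dB_at_idle | dB_at_50kmh | dB_at_80kmh | dB_at_100kmh | dB_at_120kmh | dB_at_140kmh | model_id |
|---|--------|-------|---------------|------|------------|-------------|-------------|--------------|--------------|--------------|----------|
| 0 | Abarth | 500   | 1.4 16v T-Jet | 2008 | 47.3       | 58.2        | 67.0        | 70.2         | 72.9         | 76.0         |          |
| 1 | Abarth | 595   | Competizione  | 2017 | 49.9       | 65.7        | 69.0        | 72.3         | 73.1         | 75.8         |          |
| 2 | Acura  | MDX   | 3.5 V6        | 2013 | 41.7       | 51.7        | 57.9        | 61.2         | 64.0         | 66.8         |          |
| 3 | Acura  | RDX   | 3.5 V6        | 2012 | 43.0       | 54.4        | 61.5        | 65.5         | 67.7         | 69.9         |          |
| 4 | Acura  | RL    | 3.7 V6        | 2009 | 43.6       | 55.5        | 63.0        | 66.9         | 70.2         | 73.5         |          |
| 5 | Acura  | RLX   | 3.5 V6        | 2013 | 39.5       | 49.7        | 56.0        | 59.3         | 62.4         | 65.6         |          |
| 6 | Acura  | RLX   | 3.5 V6        | 2016 | 42.2       | 51.1        | 56.7        | 59.4         | 62.6         | 65.9         |          |
| 7 | Acura  | TL    | 3.5 V6        | 2009 | 40.2       | 54.5        | 63.3        | 68.6         | 70.1         | 71.7         |          |
| 8 | Acura  | TL    | 3.7 V6        | 2010 | 39.9       | 48.4        | 53.8        | 56.2         | 60.2         | 64.2         |          |
| 9 | Acura  | TSX   | 2.4           | 2009 | 42.2       | 54.0        | 61.2        | 65.2         | 67.9         | 70.5         |          |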


#### Initializing Tor proxy to overcome blocking 

In [405]:
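session = requests.session()
session.proxies = {}

# Route traffic through the local Tor SOCKS proxy (9150 is the Tor Browser default port);
# the 'socks5h' scheme makes the proxy resolve hostnames as well
session.proxies['http'] = 'socks5h://localhost:9150'
session.proxies['https'] = 'socks5h://localhost:9150'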

# headers = {'User-Agent': 'HotJava/1.1.2 FCS'}
headers = {'User-Agent': 'Mozilla/5.0 Firefox/65.0'}


r = session.get('http://httpbin.org/ip')
print(r.text)

{
  "origin": "185.193.125.42, 185.193.125.42"
}



In [422]:
r = session.get('http://httpbin.org/ip')
print(r.text)

{
  "origin": "109.70.100.19, 109.70.100.19"
}
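
The two outputs above show different exit IPs. Since the limit is described as 60 requests, one way to organise rotation is a small counter that triggers a rotation hook every N calls. This is only a sketch under stated assumptions: `RotatingFetcher`, `fetch`, and `rotate` are hypothetical names. In the notebook `fetch` would be something like `session.get`, and `rotate` would be whatever mechanism obtains a new Tor circuit, which the source does not show.

```python
class RotatingFetcher:
    """Call `fetch(url)` and invoke `rotate()` once every `limit` requests."""

    def __init__(self, fetch, rotate, limit=60):
        self.fetch = fetch
        self.rotate = rotate
        self.limit = limit
        self.count = 0

    def get(self, url):
        # Rotate before the request that would exceed the quota.
        if self.count and self.count % self.limit == 0:
            self.rotate()
        self.count += 1
        return self.fetch(url)

# Stand-ins so the example runs offline.
rotations = []
fetcher = RotatingFetcher(
    fetch=lambda url: f"fetched {url}",
    rotate=lambda: rotations.append("new identity"),
    limit=3,
)
results = [fetcher.get(f"page-{i}") for i in range(7)]
print(len(results), len(rotations))
```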



In [454]:
params = {'make':'lexus','year':'2008', 'model':'ls'}
# to get a plain JSON response we leave the callback=? parameter out of the URL
url = 'http://www.carqueryapi.com/api/0.3/?&cmd=getTrims&'
# carqueryapi.com only accepts requests that send a User-Agent header
# headers = {'User-Agent': 'HotJava/1.1.2 FCS'}
headers = {'User-Agent': 'Mozilla/5.0 Firefox/65.0'}

# r = session.get(url, headers=headers, params=params);
# print(r.text);
# data = r.json()

In [435]:
import difflib
from random import randint

for index, row in df.iloc[808:].iterrows():
    model_name = str(row['model']).replace("'", "")  # drop apostrophes from the model name
    print(index, row['brand'], model_name, row['year'])
    params = {'make': row['brand'], 'year': row['year'], 'model': model_name}

    r = session.get(url, headers=headers, params=params)

    try:
        data = r.json()
    except ValueError:
        # The body is not JSON (e.g. an HTML error page such as 403 Forbidden)
        print('######## Decoding JSON has failed ##########')
        continue

    # Skip error responses and responses whose first value (the trim list) is empty
    if list(data.keys())[0] == 'error' or (not list(data.values())[0]):
        print('######## Ouch! me stupid car with poor model name! ########')
        print(r.text)
        continue

    model_trim_ls = [trim['model_trim'] for trim in data['Trims']]

    if len(model_trim_ls) != 0:
        close_matches = difflib.get_close_matches(row['spec'], model_trim_ls)
        if len(close_matches) != 0:
            model_index = model_trim_ls.index(close_matches[0])
        else:
            # if no trim closely matches the spec, choose one at random
            model_index = randint(0, len(model_trim_ls)-1)

        model_id = data['Trims'][model_index]['model_id']
        df.at[index, 'model_id'] = model_id

808 Kia Ceed 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
809 Kia Ceed 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
810 Kia Ceed 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
811 Kia Ceed 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
812 Kia Ceed 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
813 Kia Ceed 2009
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
814 Kia Ceed 2010
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
815 Kia Ceed 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
816 Kia Ceed 2014
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
817 Kia Ceed 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
818 Kia Ceed 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}

######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
969 Mazda MX-5 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
970 McLaren 570S 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
971 McLaren MP4-12C 2012
972 Mercedes A 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
973 Mercedes A 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
974 Mercedes A 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
975 Mercedes A 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
976 Mercedes A 2018
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
977 Mercedes AMG GT 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
978 Mercedes B 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
979 Mercedes B 2012
######## Ouch! me stupid car with poor

######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1057 Mercedes GLK 2014
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1058 Mercedes ML 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1059 Mercedes ML 2009
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1060 Mercedes ML 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1061 Mercedes ML 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1062 Mercedes ML 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1063 Mercedes R 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1064 Mercedes S 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1065 Mercedes S 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1066 Mercedes S 2009
######## Ouch! me stupid car with poor model name! ##

1208 Opel Astra 2008
1209 Opel Astra 2008
1210 Opel Astra 2009
1211 Opel Astra 2010
1212 Opel Astra 2010
1213 Opel Astra 2010
1214 Opel Astra 2011
1215 Opel Astra 2011
1216 Opel Astra 2012
1217 Opel Astra 2012
1218 Opel Astra 2012
1219 Opel Astra 2012
1220 Opel Astra 2012
1221 Opel Astra 2015
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1222 Opel Astra 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1223 Opel Astra 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1224 Opel Astra 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1225 Opel Astra 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1226 Opel Cascada 2014
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1227 Opel Corsa 2008
1228 Opel Corsa 2009
1229 Opel Corsa 2010
1230 Opel Corsa 2011
1231 Opel Corsa 2011
1232 Opel Corsa 2012
1233 Opel Corsa 2015
######## Ouch!

1344 Porsche 718 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1345 Porsche 718 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1346 Porsche 718 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1347 Porsche 911 2008
1348 Porsche 911 2010
1349 Porsche 911 2011
1350 Porsche 911 2011
1351 Porsche 911 2012
1352 Porsche 911 2012
1353 Porsche 911 2013
1354 Porsche 911 2014
1355 Porsche 911 2016
1356 Porsche 911 2016
1357 Porsche 911 2016
1358 Porsche 911 2016
1359 Porsche 911 2017
1360 Porsche Boxter 2011
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1361 Porsche Boxter 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1362 Porsche Boxter 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1363 Porsche Cayenne 2008
1364 Porsche Cayenne 2009
1365 Porsche Cayenne 2010
1366 Porsche Cayenne 2011
1367 Porsche Cayenne 2012
1368 

1491 Seat Altea 2008
1492 Seat Altea XL 2008
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1493 Seat Arona 2018
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1494 Seat Ateca 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1495 Seat Ateca 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1496 Seat Ateca 2018
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1497 Seat Exeo 2009
1498 Seat Exeo 2009
1499 Seat Exeo 2009
1500 Seat Exeo 2011
1501 Seat Exeo 2012
1502 Seat Ibiza 2008
1503 Seat Ibiza 2010
1504 Seat Ibiza 2010
1505 Seat Ibiza 2011
1506 Seat Ibiza 2011
1507 Seat Ibiza 2012
1508 Seat Ibiza 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1509 Seat Ibiza 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1510 Seat Leon 2008
1511 Seat Leon 2008
1512 Seat Leon 2008
1513 Seat Leon 2009
1514 Seat Le

######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1621 Suzuki Swift 2010
1622 Suzuki Swift 2011
1623 Suzuki Swift 2011
1624 Suzuki Swift 2012
1625 Suzuki Swift 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1626 Suzuki SX4 2010
1627 Suzuki SX4 2013
1628 Suzuki Vitara 2015
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1629 Suzuki Vitara 2018
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1630 Tesla Model S 2012
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1631 Tesla Model S 2013
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1632 Tesla Model S 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1633 Tesla Model X 2016
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1634 Tesla Model X 2017
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1635 Toyota 4Runne

######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1805 Volkswagen Scirocco 2008
1806 Volkswagen Scirocco 2009
1807 Volkswagen Scirocco 2012
1808 Volkswagen Scirocco 2014
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1809 Volkswagen Sharan 2011
1810 Volkswagen Sharan 2012
1811 Volkswagen Sharan 2018
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1812 Volkswagen Tiguan 2008
1813 Volkswagen Tiguan 2009
1814 Volkswagen Tiguan 2011
1815 Volkswagen Tiguan 2011
1816 Volkswagen Tiguan 2012
1817 Volkswagen Tiguan 2016
1818 Volkswagen Tiguan 2016
1819 Volkswagen Tiguan 2017
1820 Volkswagen Tiguan 2018
1821 Volkswagen Tiguan 2018
1822 Volkswagen Touareg 2008
1823 Volkswagen Touareg 2010
1824 Volkswagen Touareg 2014
1825 Volkswagen Touareg 2015
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1826 Volkswagen Touareg 2018
######## Ouch! me stupid car with poor model name! ########
{"Trims":[]}
1827 V

In [445]:
# number of rows without a model_id
# many of these cars may exist in the API, but because of naming differences between the two sources the code couldn't find their model_id
# one suggestion is to build a dictionary of alternative names, e.g. the brand 'Mercedes' in the auto-decibel dataset appears as 'Mercedes-Benz' in the API
len(df[df['model_id'] == 'nan'])

828
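
The dictionary of alternative names suggested in the comment above could look roughly like this. Only the Mercedes entry comes from the note itself; `BRAND_ALIASES` and `normalize_brand` are hypothetical names, and any further entries would need checking against the failed lookups by hand.

```python
# Hypothetical alias table: AD spelling (lower-cased) -> CQA spelling.
BRAND_ALIASES = {
    "mercedes": "Mercedes-Benz",
}

def normalize_brand(brand):
    """Return the CQA spelling for an AD brand name, or the name unchanged."""
    cleaned = brand.strip()
    return BRAND_ALIASES.get(cleaned.lower(), cleaned)

print(normalize_brand("Mercedes"))
print(normalize_brand(" Kia "))
```

Applying such a function to `row['brand']` before building the request parameters would let the lookup be retried for the unmatched rows.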

In [434]:
r.text

'{"Trims":[]}'

In [396]:
list(data.keys())

len(data.values())

1

In [436]:
df.dtypes

brand           object
model           object
spec            object
year            object
dB_at_idle      object
dB_at_50kmh     object
dB_at_80kmh     object
dB_at_100kmh    object
dB_at_120kmh    object
dB_at_140kmh    object
model_id        object
dtype: object

In [277]:

for index, row in df.iloc[61:].iterrows():
    print(index, row['brand'], row['model'], row['year'])

61 Audi A4 2012
62 Audi A4 2012
63 Audi A4 2014
64 Audi A4 2014
65 Audi A4 2014
66 Audi A4 2015
67 Audi A4 2015
68 Audi A4 2016
69 Audi A4 2016
70 Audi A4 2017
71 Audi A5 2008
72 Audi A5 2009
73 Audi A5 2010
74 Audi A5 2010
75 Audi A5 2012
76 Audi A5 2016
77 Audi A5 2018
78 Audi A6 2009
79 Audi A6 2009
80 Audi A6 2010
81 Audi A6 2010
82 Audi A6 2011
83 Audi A6 2011
84 Audi A6 2011
85 Audi A6 2012
86 Audi A6 2012
87 Audi A6 2015
88 Audi A6 2015
89 Audi A6 2016
90 Audi A6 2016
91 Audi A6 2016
92 Audi A6 2016
93 Audi A6 2017
94 Audi A6 2017
95 Audi A6 2017
96 Audi A6 2017
97 Audi A7 2011
98 Audi A8 2008
99 Audi A8 2010
100 Audi A8 2010
101 Audi A8 2011
102 Audi A8 2012
103 Audi A8 2014
104 Audi Q2 2017
105 Audi Q2 2018
106 Audi Q2 2018
107 Audi Q3 2011
108 Audi Q3 2015
109 Audi Q3 2016
110 Audi Q3 2016
111 Audi Q3 2018
112 Audi Q3 RS 2015
113 Audi Q5 2008
114 Audi Q5 2009
115 Audi Q5 2009
116 Audi Q5 2011
117 Audi Q5 2012
118 Audi Q5 2012
119 Audi Q5 2014
120 Audi Q5 2014
121 Audi Q5 2014

543 Ford F-150 2009
544 Ford F-150 2010
545 Ford F-150 2011
546 Ford F-150 2013
547 Ford F-150 2016
548 Ford F-450 2008
549 Ford Fiesta 2008
550 Ford Fiesta 2008
551 Ford Fiesta 2009
552 Ford Fiesta 2009
553 Ford Fiesta 2010
554 Ford Fiesta 2011
555 Ford Fiesta 2013
556 Ford Fiesta 2013
557 Ford Fiesta 2014
558 Ford Fiesta 2016
559 Ford Fiesta 2016
560 Ford Fiesta 2017
561 Ford Fiesta 2018
562 Ford Fiesta 2018
563 Ford Flex 2009
564 Ford Focus 2008
565 Ford Focus 2008
566 Ford Focus 2008
567 Ford Focus 2008
568 Ford Focus 2008
569 Ford Focus 2009
570 Ford Focus 2009
571 Ford Focus 2009
572 Ford Focus 2010
573 Ford Focus 2010
574 Ford Focus 2011
575 Ford Focus 2011
576 Ford Focus 2011
577 Ford Focus 2012
578 Ford Focus 2012
579 Ford Focus 2012
580 Ford Focus 2012
581 Ford Focus 2013
582 Ford Focus 2013
583 Ford Focus 2014
584 Ford Focus 2015
585 Ford Focus 2016
586 Ford Focus 2016
587 Ford Focus 2016
588 Ford Focus 2016
589 Ford Focus 2017
590 Ford Focus 2018
591 Ford Fusion 2010
592 Fo

1043 Mercedes GLA 2018
1044 Mercedes GLC 2015
1045 Mercedes GLC 2016
1046 Mercedes GLC 2016
1047 Mercedes GLC 2017
1048 Mercedes GLC 2017
1049 Mercedes GLC 2017
1050 Mercedes GLC 2018
1051 Mercedes GLE 2016
1052 Mercedes GLE 2017
1053 Mercedes GLE 2017
1054 Mercedes GLK 2008
1055 Mercedes GLK 2009
1056 Mercedes GLK 2014
1057 Mercedes GLK 2014
1058 Mercedes ML 2008
1059 Mercedes ML 2009
1060 Mercedes ML 2012
1061 Mercedes ML 2012
1062 Mercedes ML 2012
1063 Mercedes R 2012
1064 Mercedes S 2008
1065 Mercedes S 2008
1066 Mercedes S 2009
1067 Mercedes S 2010
1068 Mercedes S 2013
1069 Mercedes S 2014
1070 Mercedes S 2016
1071 Mercedes S 2016
1072 Mercedes S 2016
1073 Mercedes SL 2012
1074 Mercedes SL 2016
1075 Mercedes SLK 2011
1076 Mercedes SLK 2012
1077 Mercedes SLS 2012
1078 Mercury Grand Marquis 2010
1079 Mercury Mariner 2008
1080 Mini Clubman 2012
1081 Mini Clubman 2016
1082 Mini Cooper 2008
1083 Mini Cooper 2009
1084 Mini Cooper 2010
1085 Mini Cooper 2010
1086 Mini Cooper 2011
1087 Min

1543 Skoda Octavia 2013
1544 Skoda Octavia 2016
1545 Skoda Octavia 2016
1546 Skoda Octavia 2017
1547 Skoda Octavia 2017
1548 Skoda Rapid 2013
1549 Skoda Roomster 2008
1550 Skoda Superb 2008
1551 Skoda Superb 2008
1552 Skoda Superb 2010
1553 Skoda Superb 2011
1554 Skoda Superb 2011
1555 Skoda Superb 2012
1556 Skoda Superb 2014
1557 Skoda Superb 2015
1558 Skoda Superb 2015
1559 Skoda Superb 2017
1560 Skoda Superb 2017
1561 Skoda Superb 2018
1562 Skoda Yeti 2010
1563 Skoda Yeti 2010
1564 Skoda Yeti 2012
1565 Skoda Yeti 2014
1566 Skoda Yeti 2017
1567 Smart Forfour 2016
1568 Smart Fortwo 2008
1569 Smart Fortwo 2009
1570 Smart Fortwo 2011
1571 Smart Fortwo 2013
1572 Smart Fortwo 2015
1573 SRT Viper 2012
1574 SsangYong Korando 2012
1575 SsangYong Tivoli 2016
1576 Ssangyong Tivoli 2018
1577 Subaru BRZ 2012
1578 Subaru Forester 2008
1579 Subaru Forester 2009
1580 Subaru Forester 2009
1581 Subaru Forester 2013
1582 Subaru Forester 2015
1583 Subaru Forester 2018
1584 Subaru Forester SUV 2013
1585

In [397]:
r.text

'\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n<title>403 Forbidden</title>\n<style type="text/css">\n<!--\nbody { \n\t/* If you want to add a background image uncomment the CSS properties below */\n\t/* background-image:url(http://www.example.com/path-to-some-image-file/example-image-file.jpg); /*\n\t/* background-repeat:repeat; */\n\tbackground-color:#CCCCCC;\n\tline-height: normal;\n}\n\n#bpsMessage {\n\ttext-align:center; \n\tbackground-color: #F7F8F9; \n\tborder:5px solid #000000; \n\tpadding:10px;\n}\n\np {\n    font-family: Verdana, Arial, Helvetica, sans-serif;\n\tfont-size:18px;\n\tfont-weight:bold;\n}\n-->\n</style>\n</head>\n\n<body>\n<div id="bpsMessage"> \n\t<p>carqueryapi.com 403 Forbidden Error Page</p>\n\t<p>If you arrived here due to a search or clicking on a link click
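The body above is an HTML "403 Forbidden" page rather than JSON, so decoding it as JSON would fail. A minimal sketch of a guard for this case follows. The helper name `parse_cqa_response` is hypothetical and not part of the original notebook; it only inspects a status code and a body string, so it does not depend on any particular HTTP client.

```python
import json


def parse_cqa_response(status_code, text):
    """Return the decoded JSON payload, or None when the response is unusable.

    Hypothetical helper (an assumption, not the notebook's own code).
    """
    if status_code != 200:
        # For example, the 403 Forbidden HTML page shown above
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # A 200 response whose body is still not valid JSON
        return None


print(parse_cqa_response(403, "<html><title>403 Forbidden</title></html>"))
print(parse_cqa_response(200, '{"Trims": []}'))
```

With `requests`, this could be called as `parse_cqa_response(r.status_code, r.text)`.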

In [61]:
model_trim_ls = [data['Trims'][t]['model_trim'] for t in range(len(data['Trims']))]
model_trim_ls

['430', '460', '600h', '600h L', '600h L Luxury']

In [213]:
model_trim_ls[0]

'430'

In [217]:
import difflib
from random import randint

# Pick the trim whose name is closest to the requested spec
close_matches = difflib.get_close_matches('600h luxury', model_trim_ls)
if len(close_matches) != 0:
    model_index = model_trim_ls.index(close_matches[0])
else:
    # If this model is not available exactly, choose randomly among the others.
    # randint includes both bounds, so the upper bound must be len - 1.
    model_index = randint(0, len(model_trim_ls) - 1)

model_id = data['Trims'][model_index]['model_id']
model_id

'21821'

In [62]:
model_id = data['Trims'][model_trim_ls.index('600h L')]['model_id']
model_id

'21648'
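The trim lookup shown in the cells above can be wrapped in one function so that the empty-list and no-match cases are handled in a single place. This is a sketch under assumptions: `pick_trim_index` is a hypothetical name, and the `cutoff` default of 0.6 is simply the default of `difflib.get_close_matches`.

```python
import difflib
from random import randint


def pick_trim_index(spec, trims, cutoff=0.6):
    """Return the index in `trims` that best matches `spec`.

    Falls back to a random valid index when nothing is close enough and
    returns None for an empty list. Hypothetical helper, not from the
    original notebook.
    """
    if not trims:
        return None
    matches = difflib.get_close_matches(spec, trims, n=1, cutoff=cutoff)
    if matches:
        return trims.index(matches[0])
    # randint includes both bounds, hence len(trims) - 1
    return randint(0, len(trims) - 1)


trims = ['430', '460', '600h', '600h L', '600h L Luxury']
print(trims[pick_trim_index('600h luxury', trims)])
```

The returned index could then be used as in the notebook, for example `data['Trims'][index]['model_id']`.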

### (3) Getting detailed specification of cars using 'carqueryapi.com' API

The last step is to look up the full specification of each car in CQA using its model_id. In this section we add 60 specification features to roughly 1,000 of the AD cars. Each feature pulled from CQA is marked by the suffix '_cqa' appended to its column name.
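The naming convention can be illustrated with a short sketch. The helper `with_suffix` is hypothetical, and the sample values are illustrative only; a real CQA response contains many more fields.

```python
def with_suffix(spec, suffix="_cqa"):
    """Return a copy of `spec` whose keys carry `suffix` (hypothetical helper)."""
    return {f"{key}{suffix}": value for key, value in spec.items()}


# Illustrative values only, using field names that appear in this notebook
sample = {"model_id": "21648", "model_trim": "600h L"}
print(with_suffix(sample))
```

Keeping the suffix on every CQA field makes it easy to tell AD columns such as `model_id` apart from their CQA counterparts such as `model_id_cqa`.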

In [469]:
df.head()

| Unnamed: 0 | brand | model | spec | year | dB_at_idle | dB_at_50kmh | dB_at_80kmh | dB_at_100kmh | dB_at_120kmh | dB_at_140kmh | model_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abarth | 500 | 1.4 16v T-Jet | 2008 | 47.3 | 58.2 | 67.0 | 70.2 | 72.9 | 76.0 | |
| 1 | Abarth | 595 | Competizione | 2017 | 49.9 | 65.7 | 69.0 | 72.3 | 73.1 | 75.8 | |
| 2 | Acura | MDX | 3.5 V6 | 2013 | 41.7 | 51.7 | 57.9 | 61.2 | 64.0 | 66.8 | 57614.0 |
| 3 | Acura | RDX | 3.5 V6 | 2012 | 43.0 | 54.4 | 61.5 | 65.5 | 67.7 | 69.9 | 48595.0 |
| 4 | Acura | RL | 3.7 V6 | 2009 | 43.6 | 55.5 | 63.0 | 66.9 | 70.2 | 73.5 | 299.0 |


In [542]:
url = 'https://www.carqueryapi.com/api/0.3/?&cmd=getModel&'
headers = {'User-Agent': 'Mozilla/5.0 Firefox/65.0'}

for index, row in df.iterrows():
    print(index, row['brand'], row['model'], row['year'], row['model_id'])
    # This compares against the string 'nan', which assumes model_id
    # was cast to str beforehand
    if row['model_id'] == 'nan':
        print("######## that's a bad car, very bad car. Go NEXT! ########")
        continue

    params = {'model': row['model_id']}
    r = session.get(url, headers=headers, params=params)

    try:
        data = r.json()
    except ValueError:
        # The body was not valid JSON (for example, an HTML error page)
        print('######## Decoding JSON has failed ##########')
        continue

    # Copy every field of the first result into the row, suffixed with '_cqa'
    for key, value in data[0].items():
        try:
            df.at[index, str(key)+'_cqa'] = value
        except ValueError:
            continue
    print('######## Offloading from API is a success ########')
    
#     model_trim_ls = [data['Trims'][t]['model_trim'] for t in range(len(data['Trims']))]
# #     model_trim_ls = [data['Trims'][t]['model_trim'] for t in range(len(data['Trims']))]
    
#     if len(model_trim_ls) != 0:
#         close_matches = difflib.get_close_matches(row['spec'], model_trim_ls)
#         if len(close_matches) != 0:
#             model_index = model_trim_ls.index(close_matches[0])
#         else:
#         #if this model is not available exactly, then choose randomly among others
#             model_index = randint(0, len(model_trim_ls)-1)

#         model_id = data['Trims'][model_index]['model_id']
#         df.at[index,'model_id'] = model_id

0 Abarth 500 2008 nan
######## that's a bad car, very bad car. Go NEXT! ########
1 Abarth 595 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
2 Acura MDX 2013 57614
######## Offloading from API is a success ########
3 Acura RDX 2012 48595
######## Offloading from API is a success ########
4 Acura RL 2009 299
######## Offloading from API is a success ########
5 Acura RLX 2013 nan
######## that's a bad car, very bad car. Go NEXT! ########
6 Acura RLX 2016 67468
######## Offloading from API is a success ########
7 Acura TL 2009 380
######## Offloading from API is a success ########
8 Acura TL 2010 44336
######## Offloading from API is a success ########
9 Acura TSX 2009 296
######## Offloading from API is a success ########
10 Acura TSX 2010 44339
######## Offloading from API is a success ########
11 Acura TSX 2011 44340
######## Offloading from API is a success ########
12 Alfa Romeo 147 2008 nan
######## that's a bad car, very bad car. Go NEXT! ########
13 Alfa Romeo

######## Offloading from API is a success ########
109 Audi Q3 2016 67721
######## Offloading from API is a success ########
110 Audi Q3 2016 67721
######## Offloading from API is a success ########
111 Audi Q3 2018 72864
######## Offloading from API is a success ########
112 Audi Q3 RS 2015 nan
######## that's a bad car, very bad car. Go NEXT! ########
113 Audi Q5 2008 nan
######## that's a bad car, very bad car. Go NEXT! ########
114 Audi Q5 2009 2556
######## Offloading from API is a success ########
115 Audi Q5 2009 2853
######## Offloading from API is a success ########
116 Audi Q5 2011 56295
######## Offloading from API is a success ########
117 Audi Q5 2012 50773
######## Offloading from API is a success ########
118 Audi Q5 2012 50774
######## Offloading from API is a success ########
119 Audi Q5 2014 58861
######## Offloading from API is a success ########
120 Audi Q5 2014 58864
######## Offloading from API is a success ########
121 Audi Q5 2014 58861
######## Offloading from 

######## Offloading from API is a success ########
278 BMW M3 2008 5682
######## Offloading from API is a success ########
279 BMW M3 2013 57673
######## Offloading from API is a success ########
280 BMW M3 2014 nan
######## that's a bad car, very bad car. Go NEXT! ########
281 BMW M3 2017 68775
######## Offloading from API is a success ########
282 BMW M4 2017 68807
######## Offloading from API is a success ########
283 BMW M4 2017 68807
######## Offloading from API is a success ########
284 BMW M5 2012 48707
######## Offloading from API is a success ########
285 BMW M6 2012 nan
######## that's a bad car, very bad car. Go NEXT! ########
286 BMW M6 2017 69334
######## Offloading from API is a success ########
287 BMW X1 2011 nan
######## that's a bad car, very bad car. Go NEXT! ########
288 BMW X1 2011 nan
######## that's a bad car, very bad car. Go NEXT! ########
289 BMW X1 2013 57678
######## Offloading from API is a success ########
290 BMW X1 2015 61281
######## Offloading from API

######## Offloading from API is a success ########
383 Chevrolet Malibu 2008 7196
######## Offloading from API is a success ########
384 Chevrolet Malibu 2012 48806
######## Offloading from API is a success ########
385 Chevrolet Malibu 2016 66834
######## Offloading from API is a success ########
386 Chevrolet Orlando 2011 50934
######## Offloading from API is a success ########
387 Chevrolet Silverado 2009 8143
######## Offloading from API is a success ########
388 Chevrolet Silverado 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
389 Chevrolet Sonic 2012 48813
######## Offloading from API is a success ########
390 Chevrolet SS 2014 59260
######## Offloading from API is a success ########
391 Chevrolet Suburban 2016 66335
######## Offloading from API is a success ########
392 Chevrolet Tahoe 2008 7826
######## Offloading from API is a success ########
393 Chevrolet Tahoe 2013 57877
######## Offloading from API is a success ########
394 Chevrolet Tahoe 2014 59272


######## Offloading from API is a success ########
484 Dodge Ram 2010 49885
######## Offloading from API is a success ########
485 Dodge Ram 2012 49903
######## Offloading from API is a success ########
486 Dodge Ram 2013 nan
######## that's a bad car, very bad car. Go NEXT! ########
487 Dodge Viper 2008 11426
######## Offloading from API is a success ########
488 Dodge Viper 2013 nan
######## that's a bad car, very bad car. Go NEXT! ########
489 DS 3 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
490 DS 3 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
491 DS 4 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
492 DS 5 2015 nan
######## that's a bad car, very bad car. Go NEXT! ########
493 DS 5 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
494 Fiat 124 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
495 Fiat 124 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
496 Fi

######## Offloading from API is a success ########
588 Ford Focus 2016 66789
######## Offloading from API is a success ########
589 Ford Focus 2017 70866
######## Offloading from API is a success ########
590 Ford Focus 2018 73746
######## Offloading from API is a success ########
591 Ford Fusion 2010 44953
######## Offloading from API is a success ########
592 Ford Fusion 2012 48944
######## Offloading from API is a success ########
593 Ford Fusion 2012 48945
######## Offloading from API is a success ########
594 Ford Fusion 2013 57486
######## Offloading from API is a success ########
595 Ford Galaxy 2008 14150
######## Offloading from API is a success ########
596 Ford Grand C-Max 2012 nan
######## that's a bad car, very bad car. Go NEXT! ########
597 Ford Ka 2008 14176
######## Offloading from API is a success ########
598 Ford Ka 2009 14743
######## Offloading from API is a success ########
599 Ford Ka 2013 nan
######## that's a bad car, very bad car. Go NEXT! ########
600 Ford Ka

######## Offloading from API is a success ########
692 Hyundai Genesis 2012 49056
######## Offloading from API is a success ########
693 Hyundai Genesis 2013 57967
######## Offloading from API is a success ########
694 Hyundai i10 2008 51013
######## Offloading from API is a success ########
695 Hyundai i10 2011 51042
######## Offloading from API is a success ########
696 Hyundai i10 2013 nan
######## that's a bad car, very bad car. Go NEXT! ########
697 Hyundai i10 2014 nan
######## that's a bad car, very bad car. Go NEXT! ########
698 Hyundai i10 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
699 Hyundai i20 2008 51017
######## Offloading from API is a success ########
700 Hyundai i20 2009 51024
######## Offloading from API is a success ########
701 Hyundai i20 2009 51022
######## Offloading from API is a success ########
702 Hyundai i20 2010 55347
######## Offloading from API is a success ########
703 Hyundai i20 2012 51062
######## Offloading from API is a succ

######## Offloading from API is a success ########
793 Jeep Cherokee 2014 59930
######## Offloading from API is a success ########
794 Jeep Compass 2011 45222
######## Offloading from API is a success ########
795 Jeep Compass 2018 72114
######## Offloading from API is a success ########
796 Jeep Grand Cherokee 2008 19650
######## Offloading from API is a success ########
797 Jeep Grand Cherokee 2008 19753
######## Offloading from API is a success ########
798 Jeep Grand Cherokee 2011 52750
######## Offloading from API is a success ########
799 Jeep Grand Cherokee 2012 49129
######## Offloading from API is a success ########
800 Jeep Grand Cherokee 2014 59949
######## Offloading from API is a success ########
801 Jeep Renegade 2015 61067
######## Offloading from API is a success ########
802 Jeep Renegade 2018 71469
######## Offloading from API is a success ########
803 Jeep Renegade 2018 71469
######## Offloading from API is a success ########
804 Jeep Wrangler 2008 19641
######## Off

######## Offloading from API is a success ########
898 Lexus LS 2009 21730
######## Offloading from API is a success ########
899 Lexus LS 2010 45366
######## Offloading from API is a success ########
900 Lexus LS 2013 nan
######## that's a bad car, very bad car. Go NEXT! ########
901 Lexus LS 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
902 Lexus NX 2014 nan
######## that's a bad car, very bad car. Go NEXT! ########
903 Lexus NX 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
904 Lexus RC 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
905 Lexus RX 2010 45373
######## Offloading from API is a success ########
906 Lexus RX 2011 45375
######## Offloading from API is a success ########
907 Lexus RX 2013 58418
######## Offloading from API is a success ########
908 Lexus RX 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
909 Lexus RX 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
910 Lin

######## Offloading from API is a success ########
1079 Mercury Mariner 2008 25846
######## Offloading from API is a success ########
1080 Mini Clubman 2012 nan
######## that's a bad car, very bad car. Go NEXT! ########
1081 Mini Clubman 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
1082 Mini Cooper 2008 26331
######## Offloading from API is a success ########
1083 Mini Cooper 2009 26367
######## Offloading from API is a success ########
1084 Mini Cooper 2010 45591
######## Offloading from API is a success ########
1085 Mini Cooper 2010 45592
######## Offloading from API is a success ########
1086 Mini Cooper 2011 45594
######## Offloading from API is a success ########
1087 Mini Cooper 2011 45594
######## Offloading from API is a success ########
1088 Mini Cooper 2011 45595
######## Offloading from API is a success ########
1089 Mini Cooper 2014 60162
######## Offloading from API is a success ########
1090 Mini Cooper 2014 60162
######## Offloading from API is a 

######## Offloading from API is a success ########
1178 Nissan Qashqai 2008 28088
######## Offloading from API is a success ########
1179 Nissan Qashqai 2009 27956
######## Offloading from API is a success ########
1180 Nissan Qashqai 2010 47373
######## Offloading from API is a success ########
1181 Nissan Qashqai 2012 50291
######## Offloading from API is a success ########
1182 Nissan Qashqai 2012 50291
######## Offloading from API is a success ########
1183 Nissan Qashqai 2014 nan
######## that's a bad car, very bad car. Go NEXT! ########
1184 Nissan Qashqai 2015 nan
######## that's a bad car, very bad car. Go NEXT! ########
1185 Nissan Qashqai 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
1186 Nissan Qashqai 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1187 Nissan Qashqai 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1188 Nissan Qashqai 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1189 Nissan

######## Offloading from API is a success ########
1278 Peugeot 207 2010 50360
######## Offloading from API is a success ########
1279 Peugeot 207 2010 50360
######## Offloading from API is a success ########
1280 Peugeot 207 2010 46299
######## Offloading from API is a success ########
1281 Peugeot 208 2012 nan
######## that's a bad car, very bad car. Go NEXT! ########
1282 Peugeot 208 2012 nan
######## that's a bad car, very bad car. Go NEXT! ########
1283 Peugeot 208 2012 nan
######## that's a bad car, very bad car. Go NEXT! ########
1284 Peugeot 208 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
1285 Peugeot 208 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1286 Peugeot 208 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1287 Peugeot 208 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1288 Peugeot 208 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
1289 Peugeot 308 2008 31235
####

######## Offloading from API is a success ########
1378 Porsche Macan 2016 66259
######## Offloading from API is a success ########
1379 Porsche Macan 2017 68831
######## Offloading from API is a success ########
1380 Porsche Macan 2017 68831
######## Offloading from API is a success ########
1381 Porsche Panamera 2009 nan
######## that's a bad car, very bad car. Go NEXT! ########
1382 Porsche Panamera 2010 45792
######## Offloading from API is a success ########
1383 Porsche Panamera 2011 45795
######## Offloading from API is a success ########
1384 Porsche Panamera 2011 45796
######## Offloading from API is a success ########
1385 Porsche Panamera 2012 49398
######## Offloading from API is a success ########
1386 Porsche Panamera 2013 58627
######## Offloading from API is a success ########
1387 Porsche Panamera 2016 66755
######## Offloading from API is a success ########
1388 Porsche Panamera 2017 69330
######## Offloading from API is a success ########
1389 Range Rover 0 2013 nan


######## Offloading from API is a success ########
1484 Saturn Vue 2008 35527
######## Offloading from API is a success ########
1485 Scion FR-S 2012 nan
######## that's a bad car, very bad car. Go NEXT! ########
1486 Scion iQ 2012 49414
######## Offloading from API is a success ########
1487 Scion tC 2011 56893
######## Offloading from API is a success ########
1488 Scion xB 2008 35645
######## Offloading from API is a success ########
1489 Seat Alhambra 2011 nan
######## that's a bad car, very bad car. Go NEXT! ########
1490 Seat Altea 2008 35654
######## Offloading from API is a success ########
1491 Seat Altea 2008 35725
######## Offloading from API is a success ########
1492 Seat Altea XL 2008 nan
######## that's a bad car, very bad car. Go NEXT! ########
1493 Seat Arona 2018 nan
######## that's a bad car, very bad car. Go NEXT! ########
1494 Seat Ateca 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1495 Seat Ateca 2017 nan
######## that's a bad car, very bad 

######## Offloading from API is a success ########
1584 Subaru Forester SUV 2013 nan
######## that's a bad car, very bad car. Go NEXT! ########
1585 Subaru Impreza 2008 37401
######## Offloading from API is a success ########
1586 Subaru Impreza 2008 37401
######## Offloading from API is a success ########
1587 Subaru Impreza 2008 37586
######## Offloading from API is a success ########
1588 Subaru Impreza 2011 45863
######## Offloading from API is a success ########
1589 Subaru Impreza 2011 45865
######## Offloading from API is a success ########
1590 Subaru Justy 2008 37654
######## Offloading from API is a success ########
1591 Subaru Legacy 2008 37519
######## Offloading from API is a success ########
1592 Subaru Legacy 2008 37238
######## Offloading from API is a success ########
1593 Subaru Legacy 2009 37287
######## Offloading from API is a success ########
1594 Subaru Legacy 2010 45867
######## Offloading from API is a success ########
1595 Subaru Levorg 2016 nan
######## that'

######## Offloading from API is a success ########
1683 Toyota Prius 2011 45995
######## Offloading from API is a success ########
1684 Toyota Prius 2012 49467
######## Offloading from API is a success ########
1685 Toyota Prius 2012 49466
######## Offloading from API is a success ########
1686 Toyota Prius 2012 49467
######## Offloading from API is a success ########
1687 Toyota Prius 2012 49466
######## Offloading from API is a success ########
1688 Toyota Prius 2014 60615
######## Offloading from API is a success ########
1689 Toyota Prius 2014 60616
######## Offloading from API is a success ########
1690 Toyota Prius 2016 67534
######## Offloading from API is a success ########
1691 Toyota Prius 2017 70103
######## Offloading from API is a success ########
1692 Toyota Prius 2018 72678
######## Offloading from API is a success ########
1693 Toyota RAV4 2008 39231
######## Offloading from API is a success ########
1694 Toyota RAV4 2012 56407
######## Offloading from API is a success 

######## Offloading from API is a success ########
1782 Volkswagen Passat 2012 49539
######## Offloading from API is a success ########
1783 Volkswagen Passat 2013 58296
######## Offloading from API is a success ########
1784 Volkswagen Passat 2014 60893
######## Offloading from API is a success ########
1785 Volkswagen Passat 2014 60895
######## Offloading from API is a success ########
1786 Volkswagen Passat 2015 62124
######## Offloading from API is a success ########
1787 Volkswagen Passat 2015 62137
######## Offloading from API is a success ########
1788 Volkswagen Passat 2016 67357
######## Offloading from API is a success ########
1789 Volkswagen Passat 2018 72500
######## Offloading from API is a success ########
1790 Volkswagen Passat CC 2008 nan
######## that's a bad car, very bad car. Go NEXT! ########
1791 Volkswagen Phaeton 2008 42093
######## Offloading from API is a success ########
1792 Volkswagen Phaeton 2010 51681
######## Offloading from API is a success ########
179

######## Offloading from API is a success ########
1881 Volvo XC60 2015 62528
######## Offloading from API is a success ########
1882 Volvo XC60 2016 67762
######## Offloading from API is a success ########
1883 Volvo XC60 2016 67762
######## Offloading from API is a success ########
1884 Volvo XC60 2017 68877
######## Offloading from API is a success ########
1885 Volvo XC60 2018 71450
######## Offloading from API is a success ########
1886 Volvo XC70 2008 43926
######## Offloading from API is a success ########
1887 Volvo XC90 2015 nan
######## that's a bad car, very bad car. Go NEXT! ########
1888 Volvo XC90 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
1889 Volvo XC90 2016 nan
######## that's a bad car, very bad car. Go NEXT! ########
1890 Volvo XC90 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1891 Volvo XC90 2017 nan
######## that's a bad car, very bad car. Go NEXT! ########
1892 Volvo XC90 2018 nan
######## that's a bad car, very bad 

In [547]:
len(df[df['model_id'] != 'nan']), len(df[~df['model_id_cqa'].isna()])

(1067, 1067)
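The two counts agree, meaning every car with a model_id received CQA data. The same check can be sketched without pandas, under the assumption that a missing value may appear as `None`, a float NaN, or the string `'nan'`. The helpers `is_missing` and `coverage` are hypothetical, and the sample rows are illustrative.

```python
import math


def is_missing(value):
    """True for None, float NaN, or the string 'nan' (hypothetical helper)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value) == "nan"


def coverage(rows):
    """Count rows that have a model_id and rows that received CQA data."""
    with_id = sum(not is_missing(r.get("model_id")) for r in rows)
    with_cqa = sum(not is_missing(r.get("model_id_cqa")) for r in rows)
    return with_id, with_cqa


# Illustrative rows: one matched car and one without a model_id
rows = [
    {"model_id": "57614", "model_id_cqa": "57614"},
    {"model_id": "nan", "model_id_cqa": float("nan")},
]
print(coverage(rows))
```

If the two numbers differ, some lookups failed even though a model_id was available, and those rows are worth retrying.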

### (4) Scrapping cars' prices

The price of each car is a suggested additional feature for the dataset. It is not implemented in this project.

## II. Exporting datafram to CSV

In [555]:
# Export the compiled dataset as a tab-separated file without quoting
import csv
df.to_csv('./data/car_noise_specification_datasets.csv', sep='\t', encoding='utf-8', quoting=csv.QUOTE_NONE)
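Because the export uses a tab separator and no quoting, any reader has to be told about both. A minimal sketch of reading such a file with the standard library follows. The in-memory sample is illustrative and much narrower than the real file, and the empty first header stands for the unnamed index column that `to_csv` writes by default.

```python
import csv
import io

# Two illustrative rows in the same tab-separated layout as the export
sample = "\tbrand\tmodel\tyear\n0\tAbarth\t500\t2008\n1\tAcura\tMDX\t2013\n"

reader = csv.DictReader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)
print(rows[0]["brand"], rows[1]["year"])
```

With pandas, the equivalent would likely be `pd.read_csv(path, sep='\t', index_col=0)`, where `path` is the exported file.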