# Dataset Description
Link: https://www.kaggle.com/austinreese/craigslist-carstrucks-data

The dataset contains used cars listed for sale on Craigslist within the United States. The data was collected with a scraper and updated continuously over a period of a few months.
The dataset contains almost all of the relevant information shown on Craigslist.


## Dataset Columns

| Column       | Description                                                                     |
|--------------|---------------------------------------------------------------------------------|
| id           | Entry identifier; a unique number representing the listing posted on Craigslist |
| url          | URL of the listing                                                              |
| region       | Craigslist region within the United States                                      |
| region_url   | URL of the Craigslist region                                                    |
| price        | Listed price of the car (in US dollars)                                         |
| year         | Year of manufacture of the vehicle                                              |
| manufacturer | Vehicle manufacturer                                                            |
| model        | Vehicle model/name                                                              |
| condition    | Vehicle condition (good, bad, like new, excellent...)                           |
| cylinders    | Number of cylinders                                                             |
| fuel         | Fuel type                                                                       |
| odometer     | Miles traveled by the vehicle                                                   |
| title_status | Title status of the vehicle (clean, rebuilt, parts only...)                     |
| transmission | Transmission type of the vehicle                                                |
| vin          | Vehicle identification number                                                   |
| drive        | Drive type: rwd (rear-wheel), 4wd (four-wheel), and fwd (front-wheel)           |
| size         | Vehicle size                                                                    |
| type         | General vehicle type (pickup, offroad, hatchback, sedan, suv...)                |
| paint_color  | Vehicle color                                                                   |
| image_url    | URL of the image                                                                |
| description  | Description of the vehicle in the listing                                       |
| county       | Useless column left in by mistake; it will be disregarded during the project    |
| state        | State where the listing was posted                                              |
| lat          | Latitude of the listing                                                         |
| long         | Longitude of the listing                                                        |

In [90]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

### Loading the dataset

In [78]:
DATASET = "datasets/vehicles.csv"

# Load the raw Craigslist listings from the path defined above
df = pd.read_csv(DATASET)

### Dataset overview

In [57]:
df.head()

Unnamed: 0,id,url,region,region_url,price,year,manufacturer,model,condition,cylinders,...,drive,size,type,paint_color,image_url,description,county,state,lat,long
0,7119256118,https://mohave.craigslist.org/ctd/d/lake-havas...,mohave county,https://mohave.craigslist.org,3495,2012.0,jeep,patriot,like new,4 cylinders,...,,,,silver,https://images.craigslist.org/00B0B_k2AXIJ21ok...,"THIS 2012 JEEP PATRIOT IS A 4CYL. AC, STEREO, ...",,az,34.4554,-114.269
1,7120880186,https://oregoncoast.craigslist.org/cto/d/warre...,oregon coast,https://oregoncoast.craigslist.org,13750,2014.0,bmw,328i m-sport,good,,...,rwd,,sedan,grey,https://images.craigslist.org/00U0U_3cLk0WGOJ8...,Selling my 2014 BMW 328i with the following be...,,or,46.1837,-123.824
2,7115048251,https://greenville.craigslist.org/cto/d/sparta...,greenville / upstate,https://greenville.craigslist.org,2300,2001.0,dodge,caravan,excellent,6 cylinders,...,,,,,https://images.craigslist.org/00k0k_t4WqYn5nDC...,"01 DODGE CARAVAN,3.3 ENGINE,AUT TRANS,199000 M...",,sc,34.9352,-81.9654
3,7119250502,https://mohave.craigslist.org/cto/d/lake-havas...,mohave county,https://mohave.craigslist.org,9000,2004.0,chevrolet,colorado ls,excellent,5 cylinders,...,rwd,mid-size,pickup,red,https://images.craigslist.org/00J0J_lJEzfeVLHI...,"2004 Chevy Colorado LS, ONLY 54000 ORIGINAL MI...",,az,34.4783,-114.271
4,7120433904,https://maine.craigslist.org/ctd/d/searsport-t...,maine,https://maine.craigslist.org,0,2021.0,,Honda-Nissan-Kia-Ford-Hyundai-VW,,,...,,,,,https://images.craigslist.org/01010_j0IW34mCsm...,CALL: 207.548.6500 TEXT: 207.407.5598 **WE FI...,,me,44.4699,-68.8963


In [58]:
df.describe()

Unnamed: 0,id,price,year,odometer,county,lat,long
count,435849.0,435849.0,434732.0,360701.0,0.0,427614.0,427614.0
mean,7115954000.0,134912.7,2009.86646,98975.9,,38.404163,-94.96117
std,4590854.0,16908570.0,9.312503,113499.0,,6.036915,18.058561
min,7096577000.0,0.0,1900.0,0.0,,-83.1971,-177.012
25%,7112450000.0,4900.0,2007.0,47333.0,,34.2257,-111.731
50%,7117092000.0,9995.0,2012.0,91188.0,,38.9348,-89.6767
75%,7120090000.0,17989.0,2015.0,134736.0,,42.4845,-81.3973
max,7121608000.0,3647257000.0,2021.0,10000000.0,,79.6019,173.675
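
The summary above shows a minimum price of 0, a maximum price of about 3.6 billion dollars, and a maximum odometer reading of 10 million miles, which suggests that some listings carry placeholder or mistyped values. One possible way to set such rows aside is sketched below. The helper `plausible_rows` and its bounds are hypothetical choices for illustration, not part of the original notebook, and the toy frame simply pairs up the statistics reported by `describe()`.

```python
import pandas as pd


def plausible_rows(frame, price_bounds=(100, 200_000), max_odometer=500_000):
    """Keep rows whose price and odometer fall inside assumed plausible ranges.

    The default bounds are illustrative assumptions, not values from the source.
    Rows with a missing odometer are kept, since missing is not the same as implausible.
    """
    price_ok = frame["price"].between(*price_bounds)
    odometer_ok = frame["odometer"].isna() | (frame["odometer"] <= max_odometer)
    return frame[price_ok & odometer_ok]


# Stand-in rows built from the min, quartiles, and max shown by describe()
toy = pd.DataFrame({
    "price": [0, 4900, 9995, 17989, 3647257000],
    "odometer": [0.0, 47333.0, 91188.0, 134736.0, 10000000.0],
})

kept = plausible_rows(toy)
print(kept["price"].tolist())  # [4900, 9995, 17989]
```

Explicit bounds keep the rule easy to audit; a quantile-based cut would be an alternative if fixed limits feel too arbitrary.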


In [47]:
df.dtypes

id                int64
url              object
region           object
region_url       object
price             int64
year            float64
manufacturer     object
model            object
condition        object
cylinders        object
fuel             object
odometer        float64
title_status     object
transmission     object
vin              object
drive            object
size             object
type             object
paint_color      object
image_url        object
description      object
county          float64
state            object
lat             float64
long            float64
dtype: object

### Remove the 'county', 'region_url', and 'image_url' columns
The county column contained no useful information: all of its values were null. The region_url column holds redundant information, so it was also removed, and the image_url column will not be useful in this project.

A new '.csv' is created without these columns.

In [None]:
df.drop(columns=['county', 'region_url', 'image_url'], inplace=True)
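
The claim that `county` holds only null values can be checked programmatically rather than by eye. The sketch below is a minimal illustration on a small stand-in frame; the helper name `all_null_columns` is hypothetical and the toy rows only borrow a few values from the `head()` output.

```python
import numpy as np
import pandas as pd


def all_null_columns(frame):
    """Return the names of columns in which every value is missing."""
    return [col for col in frame.columns if frame[col].isna().all()]


# Small stand-in for the real dataset (illustrative rows only)
toy = pd.DataFrame({
    "price": [3495, 13750, 2300],
    "county": [np.nan, np.nan, np.nan],
    "state": ["az", "or", "sc"],
})

empty_cols = all_null_columns(toy)
print(empty_cols)  # ['county']

# Dropping returns a new frame here instead of modifying `toy` in place
trimmed = toy.drop(columns=empty_cols)
print(list(trimmed.columns))  # ['price', 'state']
```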

In [80]:
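# Save the trimmed dataset to the path that the next section reads from.
# The index is written too, so it comes back as an extra 'Unnamed: 0' column on reload.
df.to_csv("datasets/clean_vehicles.csv")
df.head()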

Unnamed: 0,id,url,region,price,year,manufacturer,model,condition,cylinders,fuel,...,transmission,vin,drive,size,type,paint_color,description,state,lat,long
0,7119256118,https://mohave.craigslist.org/ctd/d/lake-havas...,mohave county,3495,2012.0,jeep,patriot,like new,4 cylinders,gas,...,automatic,,,,,silver,"THIS 2012 JEEP PATRIOT IS A 4CYL. AC, STEREO, ...",az,34.4554,-114.269
1,7120880186,https://oregoncoast.craigslist.org/cto/d/warre...,oregon coast,13750,2014.0,bmw,328i m-sport,good,,gas,...,automatic,,rwd,,sedan,grey,Selling my 2014 BMW 328i with the following be...,or,46.1837,-123.824
2,7115048251,https://greenville.craigslist.org/cto/d/sparta...,greenville / upstate,2300,2001.0,dodge,caravan,excellent,6 cylinders,gas,...,automatic,,,,,,"01 DODGE CARAVAN,3.3 ENGINE,AUT TRANS,199000 M...",sc,34.9352,-81.9654
3,7119250502,https://mohave.craigslist.org/cto/d/lake-havas...,mohave county,9000,2004.0,chevrolet,colorado ls,excellent,5 cylinders,gas,...,automatic,1GCCS196448191644,rwd,mid-size,pickup,red,"2004 Chevy Colorado LS, ONLY 54000 ORIGINAL MI...",az,34.4783,-114.271
4,7120433904,https://maine.craigslist.org/ctd/d/searsport-t...,maine,0,2021.0,,Honda-Nissan-Kia-Ford-Hyundai-VW,,,other,...,other,,,,,,CALL: 207.548.6500 TEXT: 207.407.5598 **WE FI...,me,44.4699,-68.8963


## Using the dataset with only the relevant columns

In [84]:
df = pd.read_csv("datasets/clean_vehicles.csv")

In [85]:
df.head()

Unnamed: 0.1,Unnamed: 0,id,url,region,price,year,manufacturer,model,condition,cylinders,...,transmission,vin,drive,size,type,paint_color,description,state,lat,long
0,0,7119256118,https://mohave.craigslist.org/ctd/d/lake-havas...,mohave county,3495,2012.0,jeep,patriot,like new,4 cylinders,...,automatic,,,,,silver,"THIS 2012 JEEP PATRIOT IS A 4CYL. AC, STEREO, ...",az,34.4554,-114.269
1,1,7120880186,https://oregoncoast.craigslist.org/cto/d/warre...,oregon coast,13750,2014.0,bmw,328i m-sport,good,,...,automatic,,rwd,,sedan,grey,Selling my 2014 BMW 328i with the following be...,or,46.1837,-123.824
2,2,7115048251,https://greenville.craigslist.org/cto/d/sparta...,greenville / upstate,2300,2001.0,dodge,caravan,excellent,6 cylinders,...,automatic,,,,,,"01 DODGE CARAVAN,3.3 ENGINE,AUT TRANS,199000 M...",sc,34.9352,-81.9654
3,3,7119250502,https://mohave.craigslist.org/cto/d/lake-havas...,mohave county,9000,2004.0,chevrolet,colorado ls,excellent,5 cylinders,...,automatic,1GCCS196448191644,rwd,mid-size,pickup,red,"2004 Chevy Colorado LS, ONLY 54000 ORIGINAL MI...",az,34.4783,-114.271
4,4,7120433904,https://maine.craigslist.org/ctd/d/searsport-t...,maine,0,2021.0,,Honda-Nissan-Kia-Ford-Hyundai-VW,,,...,other,,,,,,CALL: 207.548.6500 TEXT: 207.407.5598 **WE FI...,me,44.4699,-68.8963
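
The extra `Unnamed: 0` column in the output above most likely comes from the round trip through CSV: `to_csv` writes the index by default, and `read_csv` then loads it back as an ordinary, unnamed column. The sketch below reproduces the effect on a two-row stand-in frame, using in-memory buffers instead of files, and shows that passing `index=False` avoids it.

```python
import io

import pandas as pd

# Two-row stand-in frame (values borrowed from the head() output)
toy = pd.DataFrame({"id": [7119256118, 7120880186], "price": [3495, 13750]})

# Default behaviour: the index is written as a first column with an empty header
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out of the file
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))  # ['Unnamed: 0', 'id', 'price']
print(list(reloaded_clean.columns))       # ['id', 'price']
```

Reading the existing file with `index_col=0` would be another way to absorb the stray column without rewriting the CSV.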


In [None]:
sns.pairplot(df, hue='price')
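
Using a continuous column such as `price` as the `hue` gives one hue level per distinct value, and a pair plot over several hundred thousand rows can take a long time to draw. A possible preparation step is sketched below: sample the rows and bin `price` into quantile bands first. The helper `prepare_pairplot_frame`, its default sizes, and the randomly generated toy data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def prepare_pairplot_frame(frame, columns, n_rows=2000, n_bands=4, seed=0):
    """Sample complete rows of the chosen columns and add a quantile-based price band."""
    subset = frame[columns].dropna()
    sample = subset.sample(n=min(n_rows, len(subset)), random_state=seed)
    # duplicates="drop" merges bins whose edges coincide (e.g. many identical prices)
    return sample.assign(
        price_band=pd.qcut(sample["price"], q=n_bands, duplicates="drop")
    )


# Randomly generated stand-in data, not the real listings
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "price": rng.integers(500, 40000, size=500),
    "year": rng.integers(1990, 2021, size=500),
    "odometer": rng.integers(0, 250000, size=500),
})

sample = prepare_pairplot_frame(toy, ["price", "year", "odometer"], n_rows=200)
print(sample.shape)

# With seaborn imported as sns, as earlier in the notebook:
# sns.pairplot(sample, hue="price_band")
```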

