# USA - Used Cars Market Analysis

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # nice plots
import matplotlib.pyplot as plt # basic plotting library
import re # regular expressions

from mpl_toolkits.basemap import Basemap # library for plotting 2D maps in Python

import warnings
warnings.simplefilter('ignore')

import os
print(os.listdir("../input"))

Reading raw data.

In [None]:
data = pd.read_csv("../input/craigslist-carstrucks-data/craigslistVehicles.csv")

Looking at the first 5 rows.

In [None]:
data.head()

Showing column names.

In [None]:
data.columns

Checking the size of the dataset (`size` is the number of cells, i.e. rows × columns). I will drop unnecessary columns later to reduce it.

In [None]:
data.size

I will use only a subset of the original dataset: irrelevant columns are removed, which also reduces its size.

In [None]:
cars = data.drop(["url","city_url","title_status","VIN","desc","image_url","size"],axis=1)
cars.size

In [None]:
cars.head()

In [None]:
cars.shape

### Looking at missing data

In [None]:
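# share of missing values per column, in percent
nans = cars.isnull().sum().sort_values(ascending=False).div(len(cars)).mul(100)
plt.figure(figsize=(16,8))
sns.barplot(x=nans.index, y=nans.values)
plt.title("Percent of missing data")
plt.ylabel("Missing values [%]")
plt.xticks(rotation=90)
plt.show()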

The *cylinders* column contains text in the form *number + "cylinders"*. This can be converted to an integer for easier analysis. If the number of cylinders is not specified, I will mark it as -1.

In [None]:
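def cylinders(row):
    # Extract the number from strings such as "6 cylinders" or "12 cylinders";
    # anything else (missing values, "other") is marked as -1.
    if type(row["cylinders"]) is str:
        cyl = re.findall(r"(\d+) cylinders", row["cylinders"])
        if len(cyl) != 0:
            return int(cyl[0])
    return -1

# apply to the reduced `cars` frame, which is the one used in the rest of the notebook
cars["cylinders"] = cars.apply(cylinders, axis=1)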
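
The parsing rule can be checked on a few hand-written values before it is run on the whole column. The snippet below is only a minimal sketch: the helper name `parse_cylinders` and the sample strings are assumptions made for illustration and are not part of the dataset.

```python
import re

def parse_cylinders(value):
    """Return the cylinder count as an int, or -1 when it cannot be read."""
    if not isinstance(value, str):
        return -1  # missing values (None, NaN) are treated as "not specified"
    match = re.search(r"(\d+)\s*cylinders", value)
    return int(match.group(1)) if match else -1

# hypothetical sample values, including a two-digit count and missing data
samples = ["4 cylinders", "12 cylinders", "other", None, float("nan")]
parsed = [parse_cylinders(s) for s in samples]
print(parsed)
```

A function that takes a single value like this could also be applied column-wise, for example with `cars["cylinders"].map(parse_cylinders)`, which avoids passing whole rows around.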

I will change the *year* column type from float to integer. All missing values will be replaced with 0.

In [None]:
# fill missing years with 0 and convert to integer in one assignment
cars["year"] = cars["year"].fillna(0).astype("int32")

## 1. Location of adverts (subset)  

I will use a subset of 20 000 cars to show their location on the US map.

In [None]:
plt.figure(figsize=(20,10))

cars_sub = cars.sample(20000)

m = Basemap(projection='merc', # mercator projection
            llcrnrlat = 20,
            llcrnrlon = -170,
            urcrnrlat = 70,
            urcrnrlon = -60,
            resolution='l')

m.shadedrelief()
m.drawcoastlines() # drawing coastlines
m.drawcountries(linewidth=2) # drawing country boundaries
m.drawstates(color='b') # drawing state boundaries
#m.fillcontinents(color='grey',lake_color='aqua')

# project all coordinates at once instead of plotting row by row
coords = cars_sub[["long", "lat"]].dropna()
x_coor, y_coor = m(coords["long"].values, coords["lat"].values)
m.plot(x_coor, y_coor, '.', markersize=0.2, c="red")
plt.show()

## 2. Price vs. Year of Manufacture

The plot below shows how price changes with the year of manufacture. As this notebook was created in 2019 and some cars in the database have a manufacturing year of 2020, I will exclude these from the analysis. As part of data cleaning, I will also exclude all cars priced at 1 USD or less and all cars priced at or above the 98th percentile.

In [None]:
#identifying outliers:
price_over_98pct = cars["price"].quantile(.98)

price_yr_cleaned = cars[(1 < cars["price"]) & (cars["price"] < price_over_98pct) & (cars["year"] != 0) & (cars["year"] != 2020)]

plt.figure(figsize=(16,9))
sns.boxplot(x="year", y="price", data = price_yr_cleaned)
plt.title("Price of cars vs. manufacturing year")
plt.ylabel("Price [USD]")

steps = 2  # label every second year to keep the axis readable
years = np.sort(price_yr_cleaned["year"].unique())
lab = years[::steps]
pos = np.arange(0, len(years), steps)  # boxes sit at positions 0..len(years)-1

plt.xticks(ticks=pos, labels=lab, rotation=90)
plt.show()

The year of manufacture clearly has a significant impact on price. Cars manufactured in the 1980s and 1990s have the lowest prices, while cars made after 2015 and before 1939 have the highest.

Although definitions vary, cars can be divided by age into four main categories: classic cars, old-timers, young-timers and modern cars.
* Classic cars were built between 1915 and 1948.
* Old-timers are cars that are at least 25 years old. In this analysis, these are cars produced between 1949 and 1994.
* Young-timers are cars 15 to 25 years old. In this case, these are cars manufactured between 1995 and 2004.
* Modern cars are less than 15 years old, that is, manufactured after 2004.

Using these criteria, we can observe that classic and modern cars are the most expensive, while young-timers have the lowest prices.
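
The four age categories can be written down as a small function. This is a minimal sketch, assuming 2019 as the reference year as in the text; the function name `age_category`, the label `"unclassified"` for years before 1915, and the toy offers are assumptions made for illustration. The prices are invented and are not results from the dataset.

```python
from collections import defaultdict
from statistics import median

def age_category(year):
    """Map a manufacturing year to one of the age categories described above."""
    if 1915 <= year <= 1948:
        return "classic"
    if 1949 <= year <= 1994:
        return "old-timer"
    if 1995 <= year <= 2004:
        return "young-timer"
    if year >= 2005:
        return "modern"
    return "unclassified"  # assumption: years before 1915 have no category in the text

# invented (year, price) pairs, used only to show the grouping mechanics
toy_offers = [(1930, 30000), (1946, 22000), (1985, 4000), (1992, 3000),
              (1999, 2500), (2003, 3500), (2012, 11000), (2018, 27000)]

prices_by_category = defaultdict(list)
for year, price in toy_offers:
    prices_by_category[age_category(year)].append(price)

median_price = {cat: median(prices) for cat, prices in prices_by_category.items()}
print(median_price)
```

On the real data, the same function could be used with `price_yr_cleaned["year"].map(age_category)` followed by a `groupby` on the resulting labels.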


## 3. Number of offers per year

In [None]:
no_of_offers = price_yr_cleaned.groupby("year")["price"].count()

plt.figure(figsize=(16,8))
sns.barplot(x = no_of_offers.index, y = no_of_offers.values)
plt.title("Number of offers for each manufacturing year")
plt.ylabel("Count of offers")
plt.xticks(rotation=90)
plt.show()

The graph above shows that most offers are for modern cars. For older cars the sample is very small, which should be taken into account when interpreting the prices in the previous graph. The number of offers increases significantly after 1998 (cars younger than about 20 years). The largest number of offers is for cars manufactured in 2017.
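
One way to act on the small-sample caveat is to list the manufacturing years that have too few offers to support a reliable price estimate. The snippet below is a minimal sketch: the helper `sparse_years`, the threshold of 30 offers and the toy list of years are all assumptions chosen for illustration, not values taken from the dataset.

```python
from collections import Counter

def sparse_years(years, min_offers=30):
    """Return the sorted years that have fewer than `min_offers` offers."""
    counts = Counter(years)
    return sorted(year for year, n in counts.items() if n < min_offers)

# invented list of manufacturing years, one entry per offer
toy_years = [2017] * 40 + [2005] * 35 + [1965] * 3 + [1932] * 1
flagged = sparse_years(toy_years)
print(flagged)
```

Years flagged this way could be greyed out or annotated in the price plot, so that boxes based on a handful of offers are not read as firm evidence.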

## 4. Ranking of manufacturers

In [None]:
manufacturers = cars["manufacturer"].value_counts().div(len(cars)).mul(100)
manufacturers_TOP20 = manufacturers[:20]

plt.figure(figsize=(16,8))
sns.barplot(x=manufacturers_TOP20.index, y=manufacturers_TOP20.values)
plt.title("20 most popular manufacturers in the USA")
plt.ylabel("Popularity in %")
plt.xticks(rotation=90)
plt.show()

As we can see, first and second place are taken by American companies (Ford and Chevrolet), followed by the Japanese brands Toyota, Nissan and Honda. There is also something new if you are European: **RAM**, a truck and van brand belonging to the Italian-American Fiat Chrysler Automobiles group. Another brand less known in Europe is **Buick**, an American brand manufacturing luxury cars whose main market, however, is China.

Some more details:

1. [Ford Motor Company](https://www.ford.com/) is an American automotive group founded on 16 June 1903 by Henry Ford in Detroit. Today it is a large automotive group producing cars, trucks, buses, agricultural vehicles and machines, IT equipment and other products, with approximately 199 000 employees (2019). In the USA, luxury Ford models are manufactured and sold under the **Lincoln** brand. Ford also purchased small shares in the **Mazda** and **Aston Martin** brands. In 1989 the group became the owner of the Jaguar brand, and in 2000 of Land Rover, but both were sold to Tata Motors in 2008. From 1999 Ford owned Volvo, which was sold in 2010 to Zhejiang Geely Holding Group. In 2011, Ford stopped producing cars under the Mercury name (a brand positioned in price between Ford and Lincoln), which had been in use since 1938. Ford is the second largest car manufacturer in the US and the fifth largest in the world. In 2015, around 6 635 000 cars left Ford's factories. The largest non-American plant is Ford-Werke AG in Cologne, Germany.  

2. [Chevrolet](https://www.chevrolet.com/) is an American automotive brand belonging to the **General Motors group**, whose cars are available in over 140 countries around the world. It is the fourth most-bought car brand in the world. Chevrolet was founded in 1911 by Louis Chevrolet, a racing driver and mechanic, and William C. "Billy" Durant. The company's headquarters are in Detroit (like Ford's).

3. [Toyota Motor Corporation](https://www.toyota.com/) (TMC) is a Japanese automotive group founded in 1937 by Kiichiro Toyoda. During World War II, a small city car was designed at Toyota factories in the hope of motorizing Japan, and in 1947 work on the SA model was completed. In 1951, the BJ prototype was built, inspired by the design of the Willys Jeep; it gave rise to the Land Cruiser series. In 1955 the Crown model was launched. In 1966, the **Corolla**, the most popular model in the world, went into production. In 1982, Toyota Motor Company and Toyota Motor Sales merged into one company called Toyota Motor Corporation. In 1984, a joint venture was formed with the American automotive group General Motors: New United Motor Manufacturing, Inc. (NUMMI). Its purpose was to conquer the American market, where the Camry model had appeared in 1983. In 1989, **Lexus** was created as a sub-brand for luxury cars. In 1997, the Prius, Toyota's first mass-produced hybrid car, went into production.

Car brands owned by Toyota Motor Corporation:
* Toyota - a Japanese brand founded in 1937
* Daihatsu - a Japanese brand founded in 1907, belonging to TMC since 1967
* Lexus - a Japanese brand founded in 1989 by TMC
* Hino Motors - a Japanese truck manufacturer
* Scion - a Japanese brand existing in the years 2002 - 2015 founded by TMC



## 5. Mileage

In [None]:
# keep offers below the 99th percentile of odometer and price, with a price above 1 USD
cars_odo_clear = cars[
    (cars["odometer"] < cars["odometer"].quantile(.99))
    & (cars["price"] < cars["price"].quantile(.99))
    & (cars["price"] > 1)
]

In [None]:
plt.figure(figsize=(16,8))
sns.scatterplot(x = 'odometer', y = 'price', data = cars_odo_clear)
plt.show()

## MORE TO COME SOON