# MPG Cars

### Introduction:

This exercise uses the Auto MPG dataset from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [64]:
import pandas as pd
import numpy as np

### Step 2. Import the two datasets, [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv)

In [65]:
cars1_url = r"https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv"
cars2_url = r"https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv"

### Step 3. Assign each to a variable called cars1 and cars2

In [66]:
cars1 = pd.read_csv(cars1_url)
cars2 = pd.read_csv(cars2_url)

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [67]:
# Check whether the unnamed columns contain only NaN
cars1.isna().sum()

mpg               0
cylinders         0
displacement      0
horsepower        0
weight            0
acceleration      0
model             0
origin            0
car               0
Unnamed: 9      198
Unnamed: 10     198
Unnamed: 11     198
Unnamed: 12     198
Unnamed: 13     198
dtype: int64

In [68]:
# Columns from position 9 onward are the all-NaN "Unnamed" columns; drop them
cars1 = cars1.drop(columns=cars1.columns[9:])

In [69]:
cars1.head()

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
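
The fix above drops columns by position. As an alternative, here is a minimal sketch that drops every column containing only NaN, so no position needs to be hard-coded. The two-row `raw` sample is made up for illustration; its trailing commas are an assumption about how blank columns like those in cars1 typically arise.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the trailing commas create blank, header-less columns.
raw = io.StringIO(
    "mpg,cylinders,car,,\n"
    "18.0,8,chevrolet chevelle malibu,,\n"
    "15.0,8,buick skylark 320,,\n"
)
sample = pd.read_csv(raw)

# pandas labels header-less columns "Unnamed: <position>".
print(sample.columns.tolist())

# Drop every column whose values are all NaN.
cleaned = sample.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```

Unlike slicing with `columns[9:]`, this would also keep a trailing column that happens to hold real data, which may or may not be what you want.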


### Step 5. What is the number of observations in each dataset?

In [70]:
print(f"Cars 1 Observations: {cars1.shape[0]}")
print(f"Cars 2 Observations: {cars2.shape[0]}")

Cars 1 Observations: 198
Cars 2 Observations: 200
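
For reference, a small sketch of how the row count relates to other ways of counting. The `toy` frame is a made-up example, not part of the Auto MPG data.

```python
import pandas as pd

# Hypothetical frame with one missing mpg value.
toy = pd.DataFrame({"mpg": [18.0, None, 18.0], "cylinders": [8, 8, 8]})

n_rows, n_cols = toy.shape      # shape counts every row, including rows with NaN
print(n_rows, n_cols)
print(len(toy))                 # same as shape[0]
print(toy["mpg"].count())       # count() skips NaN, so it can be smaller
```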


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [71]:
# Stack cars2 below cars1; each frame keeps its own row labels, so labels repeat
cars = pd.concat([cars1, cars2])

In [72]:
cars

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
195,27.0,4,140,86,2790,15.6,82,1,ford mustang gl
196,44.0,4,97,52,2130,24.6,82,2,vw pickup
197,32.0,4,135,84,2295,11.6,82,1,dodge rampage
198,28.0,4,120,79,2625,18.6,82,1,ford ranger
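
Note that `pd.concat` keeps each frame's original row labels by default. The sketch below uses two tiny made-up frames (`top` and `bottom` are illustrative names) to show the repeated labels and how `ignore_index=True` renumbers the rows instead.

```python
import pandas as pd

# Hypothetical stand-ins for cars1 and cars2.
top = pd.DataFrame({"mpg": [18.0, 15.0]})
bottom = pd.DataFrame({"mpg": [27.0, 44.0, 32.0]})

# Default behaviour: labels from both frames are kept, so 0 and 1 appear twice.
kept = pd.concat([top, bottom])
print(kept.index.tolist())        # [0, 1, 0, 1, 2]
print(kept.index.is_unique)       # False

# ignore_index=True discards the old labels and numbers rows 0..n-1.
renumbered = pd.concat([top, bottom], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

A unique index matters later if anything is aligned against the combined frame by label.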


### Step 7. Oops, there is a column missing, called owners. Create a Series of random integers from 15,000 to 73,000.

In [76]:
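np.random.seed(450)  # fixed seed so the random values are reproducible
# randint excludes the upper bound, so values range from 15,000 to 72,999
owners = pd.Series(np.random.randint(15000, 73000, cars.shape[0]), name="owners").sort_values()
owners.shape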

(398,)
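
As a variation, here is a sketch using NumPy's `Generator` API, which can include the upper bound via `endpoint=True`. The name `owners_sketch` and the hard-coded `n_rows` are assumptions for illustration; the values it produces will differ from the ones generated by `np.random.seed(450)` above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(450)  # seeded generator, independent of np.random.seed
n_rows = 398                      # assumed to match cars.shape[0] in this notebook

# endpoint=True makes 73,000 a possible value (randint would stop at 72,999).
owners_sketch = pd.Series(
    rng.integers(15_000, 73_000, size=n_rows, endpoint=True),
    name="owners",
)
print(owners_sketch.shape)
print(owners_sketch.min() >= 15_000, owners_sketch.max() <= 73_000)
```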

### Step 8. Add the column owners to cars

In [79]:
# Assignment aligns on index labels, so the sorted order of owners is not preserved
cars["owners"] = owners

In [80]:
cars

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car,owners
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu,41175
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320,57286
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite,40038
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst,30658
4,17.0,8,302,140,3449,10.5,70,1,ford torino,31945
...,...,...,...,...,...,...,...,...,...,...
195,27.0,4,140,86,2790,15.6,82,1,ford mustang gl,17357
196,44.0,4,97,52,2130,24.6,82,2,vw pickup,67101
197,32.0,4,135,84,2295,11.6,82,1,dodge rampage,49006
198,28.0,4,120,79,2625,18.6,82,1,ford ranger,51981
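
Assigning a Series to a DataFrame column matches values by index label, not by position. The sketch below uses small made-up frames (`frame`, `values`, `by_label`, and `by_position` are illustrative names) to show what that means when the frame's labels repeat, and how to assign by position instead.

```python
import pandas as pd

# Hypothetical frame with repeated labels 0, 1, 0, 1, as produced by a plain concat.
frame = pd.concat(
    [pd.DataFrame({"mpg": [18.0, 15.0]}), pd.DataFrame({"mpg": [27.0, 44.0]})]
)
values = pd.Series([10, 20, 30, 40], name="owners")

# Label alignment: both rows labelled 0 get 10, both rows labelled 1 get 20.
by_label = frame.copy()
by_label["owners"] = values
print(by_label["owners"].tolist())     # [10, 20, 10, 20]

# Positional assignment: a plain array carries no labels, so order is kept as-is.
by_position = frame.copy()
by_position["owners"] = values.to_numpy()
print(by_position["owners"].tolist())  # [10, 20, 30, 40]
```

If each car should receive its own value, either assign by position as shown or give the combined frame a unique index first.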
