# MPG Cars

Check out the [Cars Exercises Video Tutorial](https://www.youtube.com/watch?v=avzLRBxoguU&list=PLgJhDSE2ZLxaY_DigHeiIDC1cD09rXgJv&index=3) to watch a data scientist go through the exercises.

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv)

### Step 3. Assign each to a variable called cars1 and cars2

In [2]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement  ... Unnamed: 11  Unnamed: 12  Unnamed: 13
0  18.0          8           307  ...         NaN          NaN          NaN
1  15.0          8           350  ...         NaN          NaN          NaN
2  18.0          8           318  ...         NaN          NaN          NaN
3  16.0          8           304  ...         NaN          NaN          NaN
4  17.0          8           302  ...         NaN          NaN          NaN

[5 rows x 14 columns]
    mpg  cylinders  displacement  ... model  origin                 car
0  33.0          4            91  ...    76       3         honda civic
1  20.0          6           225  ...    76       1      dodge aspen se
2  18.0          6           250  ...    76       1   ford granada ghia
3  18.5          6           250  ...    76       1  pontiac ventura sj
4  17.5          6           258  ...    76       1       amc pacer d/l

[5 rows x 9 columns]


### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [3]:
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
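
The label slice above depends on the columns staying in the order `mpg` ... `car`. As a hedged alternative, the sketch below removes the blank columns by content or by name instead. The `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for cars1 (illustrative values, not real data)
toy = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["car a", "car b"],
    "Unnamed: 9": [np.nan, np.nan],
    "Unnamed: 10": [np.nan, np.nan],
})

# Option A: drop every column whose values are all missing
cleaned_a = toy.dropna(axis=1, how="all")

# Option B: keep only columns whose name does not start with "Unnamed"
cleaned_b = toy.loc[:, ~toy.columns.str.startswith("Unnamed")]

print(list(cleaned_a.columns))
print(list(cleaned_b.columns))
```

Option A would also drop a legitimately named column that happens to be entirely empty, so Option B may be the safer choice when only the stray `Unnamed` columns should go.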


### Step 5. What is the number of observations in each dataset?

In [4]:
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)
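
`shape` returns a `(rows, columns)` tuple, so the number of observations is its first element. A minimal sketch on a made-up frame (the `sample` name and values are illustrative only) shows two equivalent ways to read the row count:

```python
import pandas as pd

# Illustrative frame with three observations and two variables
sample = pd.DataFrame({"mpg": [18.0, 15.0, 16.0], "cylinders": [8, 8, 8]})

rows_via_shape = sample.shape[0]  # first element of (rows, columns)
rows_via_len = len(sample)        # len() of a DataFrame counts rows

print(rows_via_shape, rows_via_len)  # 3 3
```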


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [5]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
cars = pd.concat([cars1, cars2])
cars

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car
0,18.0,8,307,130,3504,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350,165,3693,11.5,70,1,buick skylark 320
2,18.0,8,318,150,3436,11.0,70,1,plymouth satellite
3,16.0,8,304,150,3433,12.0,70,1,amc rebel sst
4,17.0,8,302,140,3449,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
195,27.0,4,140,86,2790,15.6,82,1,ford mustang gl
196,44.0,4,97,52,2130,24.6,82,2,vw pickup
197,32.0,4,135,84,2295,11.6,82,1,dodge rampage
198,28.0,4,120,79,2625,18.6,82,1,ford ranger
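
In the output above, the row labels from each source frame are kept, so the combined frame contains repeated index values. If a fresh 0..n-1 index is preferred, `pd.concat` accepts `ignore_index=True`. The sketch below uses two small made-up frames (`first` and `second` are illustrative names) to show the difference:

```python
import pandas as pd

# Two small illustrative frames, each with its own default 0-based index
first = pd.DataFrame({"mpg": [18.0, 15.0]})
second = pd.DataFrame({"mpg": [33.0, 20.0, 18.0]})

# Default: original row labels are preserved, so labels repeat
stacked = pd.concat([first, second])

# ignore_index=True: labels are discarded and renumbered from 0
renumbered = pd.concat([first, second], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1, 2]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```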


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [6]:
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

array([61284, 30044, 24225, 72552, 23583, 35183, 33914, 19468, 64569,
       56604, 32959, 56901, 31963, 41657, 71040, 56236, 69360, 60958,
       52747, 17508, 72944, 67910, 43331, 48457, 39367, 66675, 65859,
       44554, 17727, 62301, 48626, 36408, 36036, 38971, 62797, 15921,
       70697, 57695, 66004, 60161, 32763, 47755, 54008, 55258, 53733,
       46852, 22219, 15061, 55597, 30572, 19768, 30641, 25133, 58249,
       72202, 72119, 20051, 16102, 28175, 68727, 39618, 67731, 72875,
       71049, 52259, 33132, 61844, 57322, 18917, 21077, 62250, 18236,
       46425, 68262, 68443, 47295, 70538, 41755, 70850, 38882, 34722,
       69634, 27307, 62944, 50361, 46277, 61495, 32215, 57305, 57184,
       26484, 72100, 61316, 31345, 41478, 67463, 30966, 33945, 60039,
       65722, 64794, 72689, 65009, 60759, 61042, 53578, 26660, 31473,
       66807, 58317, 55318, 54623, 39456, 59981, 27195, 50162, 51483,
       70621, 37222, 56565, 72841, 52509, 65014, 45119, 70563, 52433,
       35314, 69797,
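
The values above change on every run because no seed is set. As a hedged alternative, the sketch below uses NumPy's `Generator` interface with a fixed seed and wraps the result in a pandas Series. The seed value and the names `rng`, `n_rows`, and `owners` are arbitrary choices for illustration; 398 is the combined row count from Step 5 (198 + 200).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for repeatability
n_rows = 398                          # 198 rows in cars1 + 200 rows in cars2

# endpoint=True makes the upper bound inclusive, so 73000 itself can be drawn
owners = pd.Series(
    rng.integers(15000, 73000, size=n_rows, endpoint=True),
    name="owners",
)

print(owners.head())
```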

### Step 8. Add the column owners to cars

In [7]:
cars['owners'] = nr_owners
cars.tail()

,mpg,cylinders,displacement,horsepower,weight,acceleration,model,origin,car,owners
195,27.0,4,140,86,2790,15.6,82,1,ford mustang gl,62771
196,44.0,4,97,52,2130,24.6,82,2,vw pickup,24013
197,32.0,4,135,84,2295,11.6,82,1,dodge rampage,41166
198,28.0,4,120,79,2625,18.6,82,1,ford ranger,59943
199,31.0,4,119,82,2720,19.4,82,1,chevy s-10,24945
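
Assigning a NumPy array, as above, fills the column by position. Assigning a pandas Series instead aligns on index labels, which matters here because the concatenated frame has repeated labels. The sketch below illustrates the difference on small made-up frames; all names and values are hypothetical.

```python
import pandas as pd

# Two illustrative frames; after concat the index is 0, 1, 0, 1
first = pd.DataFrame({"mpg": [18.0, 15.0]})
second = pd.DataFrame({"mpg": [33.0, 20.0]})
combined = pd.concat([first, second])

values = pd.Series([10, 20, 30, 40])  # default index 0, 1, 2, 3

# Series assignment aligns on labels: labels 0 and 1 are looked up twice,
# and the values at labels 2 and 3 are never used
combined["by_series"] = values

# Array assignment is positional: row i receives element i
combined["by_array"] = values.to_numpy()

print(combined)
```

When the new column comes from a Series, converting it with `.to_numpy()` or concatenating with `ignore_index=True` beforehand are two ways to keep the assignment positional.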
