# MPG Cars

Check out the [Cars Exercises Video Tutorial](https://www.youtube.com/watch?v=avzLRBxoguU&list=PLgJhDSE2ZLxaY_DigHeiIDC1cD09rXgJv&index=3) to watch a data scientist work through the exercises.

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv)

### Step 3. Assign each to a variable, called cars1 and cars2

In [2]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [3]:
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

    mpg  cylinders  displacement horsepower  weight  acceleration  model  origin                        car
0  18.0          8           307        130    3504          12.0     70       1  chevrolet chevelle malibu
1  15.0          8           350        165    3693          11.5     70       1          buick skylark 320
2  18.0          8           318        150    3436          11.0     70       1         plymouth satellite
3  16.0          8           304        150    3433          12.0     70       1              amc rebel sst
4  17.0          8           302        140    3449          10.5     70       1                ford torino
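
Slicing by label works here because the wanted columns are contiguous. As a hedged alternative, the sketch below drops the empty columns without naming the first and last column. The toy frame and its `Unnamed: 2` / `Unnamed: 3` labels are made up for illustration and only mimic the shape of the problem in cars1.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for cars1: two real columns plus two empty "Unnamed" ones
df = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320"],
    "Unnamed: 2": [np.nan, np.nan],
    "Unnamed: 3": [np.nan, np.nan],
})

# Option 1: drop every column whose values are all missing
by_content = df.dropna(axis="columns", how="all")

# Option 2: drop columns whose label starts with "Unnamed"
by_name = df.loc[:, ~df.columns.str.startswith("Unnamed")]

print(by_content.columns.tolist())
print(by_name.columns.tolist())
```

Option 1 keys on the data, so it would also remove a named column that happens to be entirely empty. Option 2 keys on the label only.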


### Step 5. What is the number of observations in each dataset?

In [4]:
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)
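
Since `shape` is a `(rows, columns)` tuple, the number of observations is its first element. A minimal sketch on a made-up frame (the values are placeholders, not the real data):

```python
import pandas as pd

# Hypothetical three-row frame; only its dimensions matter here
df = pd.DataFrame({"mpg": [18.0, 15.0, 18.0], "cylinders": [8, 8, 8]})

n_rows, n_cols = df.shape  # shape is (rows, columns)
print(n_rows)              # number of observations
print(len(df))             # len() also counts rows
```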


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [8]:
# Stack the rows of cars2 below cars1. DataFrame.add() would instead sum the
# two frames element-wise, leaving only 200 rows.
cars = pd.concat([cars1, cars2], ignore_index=True)
cars.shape

(398, 9)


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [9]:
# high is exclusive, so 73001 makes 73,000 a possible value; 398 is one value per row of cars
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

array([36335, 65072, 43772, 64021, 19196, 37827, 20993, 32952, 34509,
       27972, 19844, 56451, 28734, 67945, 29886, 72364, 43472, 16666,
       55689, 48705, 45562, 16797, 68121, 66907, 37089, 42170, 35799,
       56620, 20072, 50533, 65787, 50016, 16958, 56181, 72074, 30127,
       54991, 28461, 33208, 38850, 21032, 65908, 49939, 15051, 64640,
       37958, 42994, 62349, 30350, 46507, 69765, 31676, 65702, 37618,
       56440, 44644, 38398, 65285, 71473, 49784, 42162, 35202, 55356,
       15131, 67987, 44750, 53011, 39183, 24836, 35229, 46126, 24740,
       28127, 64223, 45672, 56356, 67091, 22365, 43695, 51771, 31868,
       50622, 23188, 62334, 63619, 54646, 29068, 59306, 20363, 72136,
       63499, 26417, 39895, 32590, 45237, 53897, 16231, 34827, 35068,
       57365, 21950, 59011, 64699, 50906, 61144, 64582, 70129, 21982,
       34695, 63213, 36061, 61958, 28591, 23756, 16814, 20511, 63908,
       56802, 33815, 68410, 20077, 17534, 23796, 34269, 67933, 56836,
       53401, 26791,

### Step 8. Add the column owners to cars

In [10]:
# nr_owners holds 398 values, one for each of the 198 + 200 rows in cars
cars['owners'] = nr_owners
cars.tail()
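
A toy-sized sketch of Steps 6 to 8 may help show the difference between stacking rows with `pd.concat` and summing element-wise with `DataFrame.add`, and why the owners array needs one value per row. The frames `a` and `b` are hypothetical miniature stand-ins for cars1 and cars2, and `np.random.default_rng` with a fixed seed is an assumption swapped in for reproducibility; it is not what the exercise itself uses.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-ins for cars1 (2 rows) and cars2 (3 rows)
a = pd.DataFrame({"mpg": [18.0, 15.0], "cylinders": [8, 8]})
b = pd.DataFrame({"mpg": [33.0, 20.0, 18.0], "cylinders": [4, 6, 6]})

# add() aligns on index and column labels and sums matching cells,
# so the result has as many rows as the longer frame
summed = a.add(b)

# concat() stacks rows; ignore_index=True renumbers them 0..n-1
stacked = pd.concat([a, b], ignore_index=True)

print(summed.shape)
print(stacked.shape)

# One random owner count per row; the upper bound of integers() is exclusive
rng = np.random.default_rng(seed=0)
owners = rng.integers(15000, 73001, size=len(stacked))

# Assigning to the stacked frame works because the lengths match
stacked["owners"] = owners

# Assigning the same array to the summed frame fails: 5 values for 3 rows
try:
    summed["owners"] = owners
except ValueError as err:
    print(err)
```

Sizing the array with `len(stacked)` instead of a hard-coded count keeps the two lengths in step if the input data changes.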