# MPG Cars

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign each to a variable called cars1 and cars2

In [2]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

# Use .head() to confirm that each file was loaded into its own DataFrame.
print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [3]:
# Slice columns by label with .loc, keeping 'mpg' through 'car' (both inclusive).
# This drops the unnamed blank columns that come after 'car'.
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

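|  | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |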
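
Slicing by label works here because the blank columns all sit after `car`. As an alternative sketch, assuming the extra columns are completely empty, `dropna(axis=1, how="all")` removes them wherever they appear. The inline CSV text and the names `csv_text`, `raw`, and `cleaned` are made up for illustration and only mimic the trailing commas in the real file.

```python
import io

import pandas as pd

# Hypothetical miniature CSV: the trailing commas create empty, unnamed columns.
csv_text = (
    "mpg,cylinders,car,,\n"
    "18.0,8,chevrolet chevelle malibu,,\n"
    "15.0,8,buick skylark 320,,\n"
)
raw = pd.read_csv(io.StringIO(csv_text))

# Drop every column whose values are all missing.
cleaned = raw.dropna(axis=1, how="all")

print(list(raw.columns))      # includes the 'Unnamed: ...' columns
print(list(cleaned.columns))  # only the named columns remain
```

Note that this approach would also drop a named column if it happened to be entirely empty, so the label slice is the safer choice when the layout is known.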


### Step 5. What is the number of observations in each dataset?

In [4]:
# .shape returns a (rows, columns) tuple; the first value is the number of observations.
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)
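
When only the row count is needed, the tuple can be unpacked or `len()` can be used instead. A small sketch on a made-up three-row frame (`sample` is a hypothetical name, not part of the exercise data):

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate the calls.
sample = pd.DataFrame({"mpg": [18.0, 15.0, 18.0], "cylinders": [8, 8, 8]})

n_rows, n_cols = sample.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)
print(len(sample))             # len() counts rows only
```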


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [5]:
# pd.concat stacks the rows of cars2 under cars1 (DataFrame.append was removed in pandas 2.0).
# The result has 398 rows, matching 198 + 200 from the .shape outputs above.
cars = pd.concat([cars1, cars2])
cars

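|  | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |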
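
The combined frame keeps each input's original row labels, so the labels from the second dataset repeat those of the first. A minimal sketch, using small made-up frames (`first` and `second` are hypothetical stand-ins for `cars1` and `cars2`), of how `pd.concat` with `ignore_index=True` would renumber the rows instead:

```python
import pandas as pd

# Hypothetical stand-ins for cars1 and cars2.
first = pd.DataFrame({"mpg": [18.0, 15.0], "car": ["a", "b"]})
second = pd.DataFrame({"mpg": [27.0, 44.0, 32.0], "car": ["c", "d", "e"]})

# Default behaviour: each frame's own labels are kept, so 0 and 1 appear twice.
kept = pd.concat([first, second])

# ignore_index=True discards the old labels and numbers the rows from 0.
renumbered = pd.concat([first, second], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
print(len(kept) == len(first) + len(second))
```

Whether to renumber is a design choice: keeping the original labels preserves each row's position in its source file, while a fresh index avoids ambiguous label lookups such as `cars.loc[0]` returning two rows.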


### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [6]:
# np.random.randint excludes the upper bound, so high=73001 allows values up to 73,000.
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

array([60384, 50982, 66910, 60594, 25793, 33433, 39253, 43190, 36543,
       69803, 19307, 63729, 20297, 41662, 50290, 25588, 25190, 36341,
       60659, 37518, 50573, 26415, 32863, 40390, 48516, 61426, 43691,
       62672, 51699, 42342, 25639, 34427, 54605, 50924, 34136, 19215,
       69317, 18491, 44229, 71501, 42451, 62604, 51646, 23562, 20033,
       16459, 54551, 29115, 53329, 49405, 47527, 37637, 50241, 34561,
       53531, 66725, 69385, 55274, 62617, 48101, 64851, 49907, 63251,
       23362, 42847, 36171, 54632, 51727, 54940, 41334, 59055, 23875,
       35449, 33033, 50744, 20941, 26896, 38489, 41739, 66523, 18405,
       68891, 17932, 23003, 38934, 58384, 58888, 19277, 51306, 22812,
       44940, 68578, 61244, 64756, 18923, 31273, 20614, 44637, 33305,
       26328, 21726, 37566, 66235, 53706, 54451, 42586, 61859, 51615,
       56255, 39629, 24597, 67877, 17977, 62504, 40763, 29510, 17254,
       39978, 34727, 53979, 17178, 62796, 41405, 20861, 54484, 33343,
       30195, 55372,
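
The step asks for a Series, while the cell above produces a NumPy array. As one possible variant, not the exercise's own solution, the sketch below uses NumPy's `Generator` interface and wraps the result in a `pd.Series`. The seed value and the names `rng` and `owners` are arbitrary choices made for this illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = np.random.default_rng(seed=0)

# endpoint=True makes the upper bound inclusive, so 73000 itself can be drawn.
owners = pd.Series(
    rng.integers(15000, 73000, size=398, endpoint=True),
    name="owners",
)

print(len(owners))
print(owners.between(15000, 73000).all())
```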

### Step 8. Add the column owners to cars

In [7]:
# Assign the random values in nr_owners to a new 'owners' column,
# then check the last rows with .tail() to confirm the column was added.
cars['owners'] = nr_owners
cars.tail()

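|  | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl | 18398 |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup | 52824 |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage | 50183 |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 35330 |
| 199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 | 48378 |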
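
Assigning a NumPy array fills the column by position, which is why it works even though the combined frame has repeated row labels. Assigning a `pd.Series` instead aligns on labels, which can silently misplace values when labels repeat. The sketch below illustrates the difference on a made-up four-row frame; `frame` and `values` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels repeat, as they do after stacking two datasets.
frame = pd.DataFrame({"car": ["a", "b", "c", "d"]}, index=[0, 1, 0, 1])
values = np.array([10, 20, 30, 40])

# An array is assigned by position: row i receives values[i].
by_position = frame.copy()
by_position["owners"] = values

# A Series is aligned on labels: both rows labelled 0 receive the value at label 0,
# and the values at labels 2 and 3 are never used.
by_label = frame.copy()
by_label["owners"] = pd.Series(values)

print(by_position["owners"].tolist())
print(by_label["owners"].tolist())
```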
