# MPG Cars

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [5]:
import pandas as pd
import numpy as np

These commands import pandas and NumPy, the two libraries used in this exercise.

### Step 2. Import the datasets [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv) and [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign each to a variable called cars1 and cars2

In [6]:
cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

print(cars1.head())
print(cars2.head())

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
0  18.0          8           307        130    3504          12.0     70   
1  15.0          8           350        165    3693          11.5     70   
2  18.0          8           318        150    3436          11.0     70   
3  16.0          8           304        150    3433          12.0     70   
4  17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
0       1  chevrolet chevelle malibu         NaN          NaN          NaN   
1       1          buick skylark 320         NaN          NaN          NaN   
2       1         plymouth satellite         NaN          NaN          NaN   
3       1              amc rebel sst         NaN          NaN          NaN   
4       1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
0          NaN          NaN  
1          NaN

The first two commands read the CSV files at the URLs provided by the exercise into the DataFrames cars1 and cars2. The two print calls then show the first five rows of each DataFrame.

### Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1

In [9]:
cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

|   | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |


This command uses label-based slicing with .loc to keep every row and only the columns from mpg through car (both endpoints are included), which drops the unnamed blank columns. cars1.head() then displays the first five rows of the cleaned DataFrame.
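Unnamed columns like these can appear when a CSV file has trailing delimiters. The following sketch reproduces the situation with a small in-memory CSV and compares the label slice from this step with two alternatives that do not depend on knowing the first and last column names. The `raw` text and the names `sample`, `by_label`, `by_name`, and `by_content` are made up for illustration and are not the real cars1 file.

```python
import io

import pandas as pd

# Hypothetical miniature of cars1.csv: the trailing commas create empty columns.
raw = (
    "mpg,cylinders,car,,\n"
    "18.0,8,chevrolet chevelle malibu,,\n"
    "15.0,8,buick skylark 320,,\n"
)
sample = pd.read_csv(io.StringIO(raw))
print(list(sample.columns))

# Option 1: label-based slice, both endpoints inclusive (the approach used in this step).
by_label = sample.loc[:, "mpg":"car"]

# Option 2: keep only the columns whose name does not start with "Unnamed".
by_name = sample.loc[:, ~sample.columns.str.startswith("Unnamed")]

# Option 3: drop the columns that contain only missing values.
by_content = sample.dropna(axis=1, how="all")

print(list(by_label.columns))
```

All three options leave the same columns here. The label slice is the shortest, while the other two may be more convenient if the column order is not known in advance.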

### Step 5. What is the number of observations in each dataset?

In [5]:
print(cars1.shape)
print(cars2.shape)

(198, 9)
(200, 9)


The shape attribute returns a (rows, columns) tuple for each DataFrame: cars1 has 198 observations and cars2 has 200, and each has 9 columns.
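As a small illustration of how shape can be unpacked, the sketch below uses two tiny stand-in DataFrames (`a` and `b` are hypothetical and much smaller than cars1 and cars2) and also shows that `len()` returns the row count.

```python
import pandas as pd

# Hypothetical stand-ins for cars1 and cars2 with two and three rows.
a = pd.DataFrame({"mpg": [18.0, 15.0], "cylinders": [8, 8]})
b = pd.DataFrame({"mpg": [27.0, 44.0, 32.0], "cylinders": [4, 4, 4]})

for name, frame in {"a": a, "b": b}.items():
    n_rows, n_cols = frame.shape  # shape is a (rows, columns) tuple
    print(f"{name}: {n_rows} observations, {n_cols} columns")

# len() on a DataFrame also returns the number of rows.
total_rows = len(a) + len(b)
print(total_rows)
```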

### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [6]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
cars = pd.concat([cars1, cars2])
cars

|   | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |


This command adds the rows of cars2 to the end of cars1. Both DataFrames have the same columns, so the rows line up without introducing missing values. The result is assigned to the variable cars and displayed by evaluating the variable name. Each DataFrame keeps its original index, so the index labels of cars2 repeat labels already used by cars1.
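To make the repeated index labels visible, here is a minimal sketch with two hypothetical two-row DataFrames (`first` and `second` are illustrative names, not the exercise data). It compares a plain `pd.concat` with the `ignore_index=True` option, which renumbers the rows from 0.

```python
import pandas as pd

# Hypothetical two-row stand-ins for cars1 and cars2.
first = pd.DataFrame({"mpg": [18.0, 15.0], "car": ["amc rebel sst", "ford torino"]})
second = pd.DataFrame({"mpg": [44.0, 32.0], "car": ["vw pickup", "dodge rampage"]})

# Default behaviour: each frame keeps its own labels, so they repeat.
stacked = pd.concat([first, second])
print(stacked.index.tolist())     # [0, 1, 0, 1]

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Whether to renumber is a design choice: keeping the original labels preserves the link to each source file, while a fresh index avoids ambiguous label lookups later.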

### Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [7]:
nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

array([66332, 18496, 36657, 71726, 21828, 20122, 58033, 23635, 56562,
       58703, 63640, 16776, 58638, 47410, 47391, 24287, 48334, 65833,
       39814, 36932, 58161, 20426, 60244, 59809, 16052, 37866, 18520,
       55612, 65646, 57557, 56947, 65500, 69131, 59335, 72647, 52037,
       25456, 61603, 32806, 19931, 59334, 19449, 67614, 25699, 66189,
       56538, 38610, 66391, 22275, 34194, 56441, 32320, 39302, 19919,
       57497, 24771, 58426, 28056, 39318, 55665, 16795, 19894, 37126,
       44841, 34532, 37736, 71057, 71187, 26797, 43445, 46712, 41145,
       40842, 24724, 35789, 45522, 55513, 20807, 36431, 27233, 64664,
       60115, 46571, 23096, 44699, 40835, 68161, 15987, 23562, 19881,
       56805, 32237, 36780, 27047, 45407, 69389, 17528, 60758, 67235,
       31799, 35327, 24090, 62025, 23361, 31297, 29931, 29041, 36324,
       65842, 39924, 29342, 16357, 37737, 34996, 33096, 25680, 58874,
       41441, 21739, 71303, 30425, 28873, 51029, 68228, 23993, 71159,
       49949, 67215,

np.random.randint draws random integers from the half-open interval [low, high), so high=73001 makes 73,000 the largest possible value. size=398 matches the total number of rows in cars (198 + 200), and dtype='l' requests C long integers. The resulting array is assigned to nr_owners and displayed by evaluating the variable name.
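The step asks for a Series, while the command above produces a NumPy array. One possible variant, sketched below, uses NumPy's newer Generator interface with a fixed seed so the numbers are reproducible, and wraps the result in a pandas Series. The seed value 42 and the variable names `rng` and `owners` are arbitrary choices for this sketch.

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same numbers on every run (the seed is arbitrary).
rng = np.random.default_rng(seed=42)

# endpoint=True makes the upper bound inclusive, so 73000 itself can be drawn.
owners = pd.Series(
    rng.integers(15000, 73000, size=398, endpoint=True),
    name="owners",
)
print(owners.head())
```

When a Series like this is assigned to a DataFrame column, pandas matches values on index labels, so passing `owners.to_numpy()` instead keeps the assignment purely positional.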

### Step 8. Add the column owners to cars

In [8]:
cars['owners'] = nr_owners
cars.tail()

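|   | mpg | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owners |
|---|---|---|---|---|---|---|---|---|---|---|
| 195 | 27.0 | 4 | 140 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl | 44137 |
| 196 | 44.0 | 4 | 97 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup | 19768 |
| 197 | 32.0 | 4 | 135 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage | 41668 |
| 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 40740 |
| 199 | 31.0 | 4 | 119 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 | 60852 |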


The first command creates a new column called owners in cars and fills it with the values generated in Step 7. Because nr_owners is a NumPy array, the values are assigned by position, so its length must equal the number of rows in cars. cars.tail() then shows the last five rows of the DataFrame.
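The difference between assigning an array and assigning a Series matters when the index labels repeat, as they do after joining two DataFrames without renumbering. The sketch below uses a hypothetical four-row frame (`frame` and `values` are illustrative names) to contrast the two behaviours.

```python
import pandas as pd

# Hypothetical frame with repeated index labels, as produced by
# concatenating two frames without ignore_index.
frame = pd.DataFrame({"car": ["a", "b", "c", "d"]}, index=[0, 1, 0, 1])
values = pd.Series([10, 20, 30, 40])

# A NumPy array is assigned row by row, in order.
frame["by_position"] = values.to_numpy()

# A Series is matched on index labels, so rows sharing a label share a value.
frame["by_label"] = values

print(frame)
```

In this sketch the by_position column receives all four values in order, while by_label repeats the values for labels 0 and 1 and never uses the values stored under labels 2 and 3.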