# MPG Cars

### Introduction:

The following exercise uses data from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Auto+MPG).

### Step 1. Import the necessary libraries

In [18]:
import pandas as pd
import numpy as np

### Step 2. Import the first dataset, [cars1](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv), and the second dataset, [cars2](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv).

### Step 3. Assign each to a variable called cars1 and cars2

In [19]:
url1 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv'
url2 = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv'

cars1 = pd.read_csv(url1)
cars2 = pd.read_csv(url2)
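
Both files load `horsepower` as `object` rather than a number (see the `info()` output in Step 5), because some rows hold a `?` placeholder (the ford pinto row sampled in Step 6 is one). A minimal sketch of asking `read_csv` to treat `?` as missing, so the column comes back numeric. It uses a small inline CSV built from two rows shown in this notebook, with only a subset of the columns, instead of the real files:

```python
import io

import pandas as pd

# Two rows copied from this notebook's output (subset of columns);
# the second row has '?' where horsepower should be
sample_csv = io.StringIO(
    "mpg,cylinders,horsepower,car\n"
    "18.0,8,130,chevrolet chevelle malibu\n"
    "25.0,4,?,ford pinto\n"
)

# na_values='?' makes read_csv parse '?' as NaN, so horsepower can be float64
df = pd.read_csv(sample_csv, na_values="?")
print(df["horsepower"].dtype)         # float64
print(df["horsepower"].isna().sum())  # 1
```

The same keyword can be passed to the real calls, e.g. `pd.read_csv(url1, na_values='?')`, if numeric horsepower values are wanted later on.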

### Step 4. Oops, it seems our first dataset has some unnamed blank columns. Fix cars1

In [20]:
cars1.columns

Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'model', 'origin', 'car', 'Unnamed: 9', 'Unnamed: 10',
       'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13'],
      dtype='object')

In [21]:
# Drop the five blank 'Unnamed' columns from cars1 in place
cars1.drop(['Unnamed: 9', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13'], axis=1, inplace=True)

In [22]:
cars1

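|     | mpg  | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|-----|------|-----------|--------------|------------|--------|--------------|-------|--------|-----|
| 0   | 18.0 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1   | 15.0 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2   | 18.0 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3   | 16.0 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4   | 17.0 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 193 | 24.0 | 6 | 200 | 81 | 3012 | 17.6 | 76 | 1 | ford maverick |
| 194 | 22.5 | 6 | 232 | 90 | 3085 | 17.6 | 76 | 1 | amc hornet |
| 195 | 29.0 | 4 | 85 | 52 | 2035 | 22.2 | 76 | 1 | chevrolet chevette |
| 196 | 24.5 | 4 | 98 | 60 | 2164 | 22.1 | 76 | 1 | chevrolet woody |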
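
Columns named `Unnamed: N` are what `read_csv` produces when the header row has empty field names, for example from trailing commas. A minimal sketch, on a small made-up CSV rather than `cars1.csv`, of dropping them by name pattern instead of listing each one, so the fix still works if the number of blank columns changes:

```python
import io

import pandas as pd

# Made-up CSV whose header ends with two empty column names (trailing commas)
raw = io.StringIO(
    "mpg,cylinders,car,,\n"
    "18.0,8,ford torino,,\n"
    "15.0,8,buick skylark 320,,\n"
)
df = pd.read_csv(raw)
before = list(df.columns)
print(before)  # ['mpg', 'cylinders', 'car', 'Unnamed: 3', 'Unnamed: 4']

# Keep only the columns whose name does not start with 'Unnamed'
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(list(df.columns))  # ['mpg', 'cylinders', 'car']
```

When the extra columns are entirely empty, `df.dropna(axis=1, how='all')` is another option, though it would also drop a real column that happens to contain only missing values.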


### Step 5. What is the number of observations in each dataset?

In [23]:
cars1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 198 entries, 0 to 197
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           198 non-null    float64
 1   cylinders     198 non-null    int64  
 2   displacement  198 non-null    int64  
 3   horsepower    198 non-null    object 
 4   weight        198 non-null    int64  
 5   acceleration  198 non-null    float64
 6   model         198 non-null    int64  
 7   origin        198 non-null    int64  
 8   car           198 non-null    object 
dtypes: float64(2), int64(5), object(2)
memory usage: 14.0+ KB


In [24]:
cars2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           200 non-null    float64
 1   cylinders     200 non-null    int64  
 2   displacement  200 non-null    int64  
 3   horsepower    200 non-null    object 
 4   weight        200 non-null    int64  
 5   acceleration  200 non-null    float64
 6   model         200 non-null    int64  
 7   origin        200 non-null    int64  
 8   car           200 non-null    object 
dtypes: float64(2), int64(5), object(2)
memory usage: 14.2+ KB
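
`info()` reports the row count inside its `RangeIndex` line: 198 entries for cars1 and 200 for cars2. A minimal sketch of reading the counts directly with `shape` or `len`, using two small stand-in frames in place of the real `cars1` and `cars2`:

```python
import pandas as pd

# Stand-ins for cars1 and cars2 (the real frames have 198 and 200 rows)
cars1 = pd.DataFrame({
    "mpg": [18.0, 15.0, 18.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320", "plymouth satellite"],
})
cars2 = pd.DataFrame({
    "mpg": [25.1, 25.0],
    "car": ["ford fairmont (man)", "buick century limited"],
})

# shape is (rows, columns); len(df) gives the same row count
for name, df in [("cars1", cars1), ("cars2", cars2)]:
    print(name, df.shape[0], len(df))
```

On the real data, `cars1.shape[0]` and `cars2.shape[0]` should match the 198 and 200 entries shown by `info()`.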


### Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [25]:
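# Stack the rows of cars2 under cars1. reset_index() renumbers the rows 0..397
# and keeps each row's original label from its own file in a new 'index' column
cars = pd.concat([cars1, cars2], axis=0).reset_index()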
cars.sample(10)

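|     | index | mpg  | cylinders | displacement | horsepower | weight | acceleration | model | origin | car |
|-----|-------|------|-----------|--------------|------------|--------|--------------|-------|--------|-----|
| 146 | 146 | 28.0 | 4 | 90 | 75 | 2125 | 14.5 | 74 | 1 | dodge colt |
| 386 | 188 | 25.0 | 6 | 181 | 110 | 2945 | 16.4 | 82 | 1 | buick century limited |
| 110 | 110 | 22.0 | 4 | 108 | 94 | 2379 | 16.5 | 73 | 3 | datsun 610 |
| 255 | 57 | 25.1 | 4 | 140 | 88 | 2720 | 15.4 | 78 | 1 | ford fairmont (man) |
| 295 | 97 | 35.7 | 4 | 98 | 80 | 1915 | 14.4 | 79 | 1 | dodge colt hatchback custom |
| 375 | 177 | 36.0 | 4 | 105 | 74 | 1980 | 15.3 | 82 | 2 | volkswagen rabbit l |
| 57  | 57 | 24.0 | 4 | 113 | 95 | 2278 | 15.5 | 72 | 3 | toyota corona hardtop |
| 305 | 107 | 28.4 | 4 | 151 | 90 | 2670 | 16.0 | 79 | 1 | buick skylark limited |
| 372 | 174 | 27.0 | 4 | 151 | 90 | 2735 | 18.0 | 82 | 1 | pontiac phoenix |
| 32  | 32 | 25.0 | 4 | 98 | ? | 2046 | 19.0 | 71 | 1 | ford pinto |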
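
Because `reset_index()` keeps the old labels, `cars` gains an extra `index` column holding each row's position in its original file (combined row 386, for example, came from row 188 of cars2). If that column is not wanted, `pd.concat` can renumber the rows itself with `ignore_index=True`. A minimal sketch comparing the two on small stand-in frames:

```python
import pandas as pd

# Small stand-ins for cars1 and cars2, each with its own 0-based index
cars1 = pd.DataFrame({
    "mpg": [18.0, 15.0],
    "car": ["chevrolet chevelle malibu", "buick skylark 320"],
})
cars2 = pd.DataFrame({
    "mpg": [25.1, 25.0],
    "car": ["ford fairmont (man)", "buick century limited"],
})

# reset_index() moves the duplicated labels 0, 1, 0, 1 into an 'index' column
with_old = pd.concat([cars1, cars2], axis=0).reset_index()

# ignore_index=True renumbers the rows 0..n-1 and adds no extra column
cars = pd.concat([cars1, cars2], axis=0, ignore_index=True)

print(list(with_old.columns))  # ['index', 'mpg', 'car']
print(list(cars.columns))      # ['mpg', 'car']
print(cars.index.tolist())     # [0, 1, 2, 3]
```

`.reset_index(drop=True)` after the concat gives the same result as `ignore_index=True`.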


### Step 7. Oops, there is a column missing, called owners. Create a Series of random numbers from 15,000 to 73,000.

In [28]:
# One random integer per row; randint's upper bound is exclusive (max 72,999)
cars['owner'] = np.random.randint(15000, 73000, size=len(cars))
cars.sample(10)

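|     | index | mpg  | cylinders | displacement | horsepower | weight | acceleration | model | origin | car | owner |
|-----|-------|------|-----------|--------------|------------|--------|--------------|-------|--------|-----|-------|
| 14  | 14 | 24.0 | 4 | 113 | 95 | 2372 | 15.0 | 70 | 3 | toyota corona mark ii | 15263 |
| 349 | 151 | 34.1 | 4 | 91 | 68 | 1985 | 16.0 | 81 | 3 | mazda glc 4 | 63157 |
| 396 | 198 | 28.0 | 4 | 120 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger | 51495 |
| 372 | 174 | 27.0 | 4 | 151 | 90 | 2735 | 18.0 | 82 | 1 | pontiac phoenix | 52566 |
| 71  | 71 | 19.0 | 3 | 70 | 97 | 2330 | 13.5 | 72 | 3 | mazda rx2 coupe | 47362 |
| 263 | 65 | 17.7 | 6 | 231 | 165 | 3445 | 13.4 | 78 | 1 | buick regal sport coupe (turbo) | 62610 |
| 72  | 72 | 15.0 | 8 | 304 | 150 | 3892 | 12.5 | 72 | 1 | amc matador (sw) | 15835 |
| 185 | 185 | 26.0 | 4 | 98 | 79 | 2255 | 17.7 | 76 | 1 | dodge colt | 20184 |
| 221 | 23 | 17.5 | 8 | 305 | 145 | 3880 | 12.5 | 77 | 1 | chevrolet caprice classic | 70749 |
| 193 | 193 | 24.0 | 6 | 200 | 81 | 3012 | 17.6 | 76 | 1 | ford maverick | 48075 |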
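
`np.random.randint` treats its upper bound as exclusive, so 73,000 itself can never be drawn, and unseeded draws change on every run. A minimal sketch of one alternative, assuming reproducibility is wanted: a seeded NumPy `Generator` whose `integers` method is called with `endpoint=True`, wrapped in a `Series` that shares the frame's index. The three-row `cars` frame and the seed `42` are placeholders, and the column is named `owners` as in the step title:

```python
import numpy as np
import pandas as pd

# Stand-in for the combined cars frame (the real one has 398 rows)
cars = pd.DataFrame({"car": ["dodge colt", "ford pinto", "datsun 610"]})

# A seeded Generator makes the random owner counts reproducible between runs;
# endpoint=True makes 73,000 itself a possible value
rng = np.random.default_rng(seed=42)
owners = pd.Series(
    rng.integers(15000, 73000, size=len(cars), endpoint=True),
    index=cars.index,  # align one value to each existing row
    name="owners",
)

# Assigning a Series aligns on the index, so each row gets its own value
cars["owners"] = owners
print(cars)
```

Sizing the draw with `len(cars)` rather than a literal row count keeps the code correct if either input file changes.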


### Step 8. Add the column owners to cars