# 13. AUTO DATA: MULTICLASS CLASSIFICATION
---

## 1. Introducing the Data

- **Description**: The dataset contains technical specifications for a range of cars, such as engine displacement, weight, fuel efficiency (miles per gallon), and acceleration. Using these features, we will predict each car's region of origin: North America, Europe, or Asia.
- **Origin**: [University of California Irvine](https://archive.ics.uci.edu/ml/datasets/Auto+MPG)
- **Modified by**: [Dataquest.io](https://app.dataquest.io/m/24/multiclass-classification/1/introduction-to-the-data)
    - We'll be working with `auto-mpg.data`, which omits the 8 rows containing missing values for fuel efficiency (mpg column). 
    - We've converted this data into a CSV file named auto.csv for you.
- **Filename**: `auto.csv`, downloaded from Dataquest (Course 25.3)
    - Course 25: Machine Learning in Python Intermediate
    - Mission 3: Multiclass Classification
- **Attributes**: Here are the columns in the dataset:
    - `mpg` -- Miles per gallon, Continuous.
    - `cylinders` -- Number of cylinders in the motor, Integer, Ordinal, and Categorical.
    - `displacement` -- Size of the motor, Continuous.
    - `horsepower` -- Horsepower produced, Continuous.
    - `weight` -- Weight of the car, Continuous.
    - `acceleration` -- Acceleration, Continuous.
    - `year` -- Year the car was built, Integer and Categorical.
    - `origin` -- Region of origin, Integer and Categorical. 1: North America, 2: Europe, 3: Asia.
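The `origin` target is stored as integer codes. A minimal sketch of turning those codes into readable labels with `Series.map`, assuming the 1/2/3 coding listed above; the `ORIGIN_NAMES` dictionary is a hypothetical helper, not part of the dataset:

```python
import pandas as pd

# Hypothetical lookup table built from the coding in the attribute list above
ORIGIN_NAMES = {1: "North America", 2: "Europe", 3: "Asia"}

# The three origin codes, in the order they first appear in auto.csv
origin_codes = pd.Series([1, 3, 2], name="origin")

# Series.map replaces each code with its label (codes missing from the dict become NaN)
origin_labels = origin_codes.map(ORIGIN_NAMES)
print(origin_labels.tolist())  # ['North America', 'Asia', 'Europe']
```

Keeping the integer codes as the model target and mapping to names only for reporting avoids any extra encoding step.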

In [1]:
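```python
import pandas as pd
import numpy as np

# Display settings for wide DataFrames
pd.set_option("display.max_columns", 99)
pd.set_option("display.max_rows", 999)
pd.set_option("display.precision", 3)  # the bare 'precision' pattern is ambiguous in newer pandas

cars = pd.read_csv("data/auto.csv")
print(cars.shape)
cars.head()
```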

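```
(392, 8)
```

|   | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin |
|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | 1 |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | 1 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | 1 |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | 1 |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | 1 |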


In [2]:
```python
unique_regions = cars["origin"].unique()
print(unique_regions)
```

```
[1 3 2]
```
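`Series.unique()` returns values in order of first appearance rather than sorted, which is why the codes print as `[1 3 2]`. A small sketch on a toy column (values chosen only for illustration) showing how to get a sorted class list and a per-class row count:

```python
import numpy as np
import pandas as pd

# Toy origin column (illustrative values only): 1 appears first, then 3, then 2
origin = pd.Series([1, 1, 3, 2, 3, 1])

in_order_seen = origin.unique()            # order of first appearance: [1 3 2]
sorted_classes = np.sort(origin.unique())  # sorted class labels: [1 2 3]

# Number of rows in each class, ordered by class label
counts = origin.value_counts().sort_index()
print(in_order_seen, sorted_classes)
print(counts.to_dict())  # {1: 3, 2: 1, 3: 2}
```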


## 2. Making Dummy Variables
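A dummy (one-hot) encoding replaces one categorical column with one 0/1 column per distinct value, so a model does not treat, say, 8 cylinders as "twice" 4 cylinders. A minimal sketch on a toy `cylinders` column (values chosen for illustration); `dtype=int` is passed because pandas 2.0 and later return boolean dummy columns by default:

```python
import pandas as pd

# Toy cylinder counts (illustrative only)
cylinders = pd.Series([8, 4, 6, 4], name="cylinders")

# One column per distinct value, named "<prefix>_<value>" and sorted by value
dummies = pd.get_dummies(cylinders, prefix="cyl", dtype=int)
print(dummies)
```

Each row has exactly one 1, in the column matching its original value.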

In [6]:
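```python
# Build the dummy columns from the original frame. `cars` itself is never
# reassigned, so re-running this cell does not append duplicate dummy columns.
# dtype=int keeps 0/1 values (newer pandas defaults to booleans).
dummy_cylinders = pd.get_dummies(cars["cylinders"], prefix="cyl", dtype=int)
dummy_years = pd.get_dummies(cars["year"], prefix="year", dtype=int)

# Attach the dummies and drop the categorical columns they replace
cars_1 = pd.concat([cars, dummy_cylinders, dummy_years], axis=1)
cars_1 = cars_1.drop(["year", "cylinders"], axis=1)
print(cars_1.shape)
cars_1.head()
```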

```
(392, 24)
```

|   | mpg | displacement | horsepower | weight | acceleration | origin | cyl_3 | cyl_4 | cyl_5 | cyl_6 | cyl_8 | year_70 | year_71 | year_72 | year_73 | year_74 | year_75 | year_76 | year_77 | year_78 | year_79 | year_80 | year_81 | year_82 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 307.0 | 130.0 | 3504.0 | 12.0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 15.0 | 350.0 | 165.0 | 3693.0 | 11.5 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 18.0 | 318.0 | 150.0 | 3436.0 | 11.0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 16.0 | 304.0 | 150.0 | 3433.0 | 12.0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 17.0 | 302.0 | 140.0 | 3449.0 | 10.5 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
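If a notebook cell overwrites the same DataFrame it reads from, every re-run stacks another copy of the dummy columns, producing repeated column names. A small sketch of that effect and of a re-run-safe pattern, using a toy frame and a hypothetical helper `add_cylinder_dummies` (neither is part of the original notebook):

```python
import pandas as pd

# Toy frame (illustrative values only)
cars = pd.DataFrame({"mpg": [18.0, 15.0], "cylinders": [8, 4]})

def add_cylinder_dummies(df):
    # Hypothetical helper: returns a NEW frame with cylinder dummies appended
    dummies = pd.get_dummies(df["cylinders"], prefix="cyl", dtype=int)
    return pd.concat([df, dummies], axis=1)

# Not safe to re-run: the result is fed back in as the next input
stacked = cars
for _ in range(2):  # simulate executing the cell twice
    stacked = add_cylinder_dummies(stacked)

# Safe to re-run: always derive from the untouched original frame
once = add_cylinder_dummies(cars)
once = add_cylinder_dummies(cars)  # a second run gives the same columns

print(stacked.columns.tolist())  # cyl_4 and cyl_8 appear twice
print(once.columns.tolist())
print(stacked.columns.duplicated().any(), once.columns.duplicated().any())
```

`Index.duplicated()` offers a quick check that an encoding cell has not been applied more than once.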
