# Summarizing Automobile Evaluation Data

This dataset contains information on the cost and physical attributes of 1,000 cars. The original dataset was used to train a classification model that assigned an acceptability category to cars based on these attributes.

The car evaluation dataset was sourced from the UCI Machine Learning Repository and slightly modified for this project: one additional field, `manufacturer_country`, has been simulated for illustrative purposes. You can read more about the details, features, and original uses of this dataset on the [UCI data description page](https://archive.ics.uci.edu/ml/datasets/car+evaluation).
***

## Importing the modules

```python
import pandas as pd
import numpy as np
```

## Quick look at the data

```python
car_eval = pd.read_csv('car_eval_dataset.csv')
display(car_eval.head())
```

| Unnamed: 0 | buying_cost | maintenance_cost | doors | capacity | luggage | safety | acceptability | manufacturer_country |
|---|---|---|---|---|---|---|---|---|
| 0 | vhigh | low | 4 | 4 | small | med | unacc | China |
| 1 | vhigh | med | 3 | 4 | small | high | acc | France |
| 2 | med | high | 3 | 2 | med | high | unacc | United States |
| 3 | low | med | 4 | more | big | low | unacc | United States |
| 4 | low | high | 2 | more | med | high | acc | South Korea |
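The `Unnamed: 0` column is what pandas produces when a CSV's first column has an empty header, which usually means a row index was saved along with the data. The sketch below assumes that is the case for `car_eval_dataset.csv` and uses a hypothetical two-row excerpt to show how `index_col=0` turns that column back into the index instead of keeping it as data.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt shaped like car_eval_dataset.csv, assuming its
# first (unnamed) column holds a previously saved row index.
csv_text = """,buying_cost,maintenance_cost,doors
0,vhigh,low,4
1,vhigh,med,3
"""

# Default: the saved index comes back as an ordinary column called "Unnamed: 0".
with_extra_column = pd.read_csv(io.StringIO(csv_text))
print(with_extra_column.columns.tolist())
# ['Unnamed: 0', 'buying_cost', 'maintenance_cost', 'doors']

# index_col=0 uses that first column as the DataFrame index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
# ['buying_cost', 'maintenance_cost', 'doors']
```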


## Summarizing Manufacturing Country

1. **Create a table of frequencies of all the cars reviewed by `manufacturer_country`. What is the modal category? Which country appears 4th most frequently? Print out your results.**

```python
# table of relative frequencies, sorted from most to least frequent
manufacturer_country_table = car_eval.manufacturer_country.value_counts(normalize=True)
display(manufacturer_country_table)

# the modal category is the first entry of the sorted table
modal_country = manufacturer_country_table.index[0]
print("The modal manufacturer country is: " + str(modal_country))

# the 4th most frequent country sits at position 3
fourth_country = manufacturer_country_table.index[3]
print("The 4th most frequent manufacturer country is: " + str(fourth_country))
```

```
Japan            0.228
Germany          0.218
South Korea      0.159
United States    0.138
Italy            0.097
France           0.087
China            0.073
Name: manufacturer_country, dtype: float64

The modal manufacturer country is: Japan
The 4th most frequent manufacturer country is: United States
```
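Picking the modal and 4th most frequent categories relies on `value_counts()` returning counts sorted in descending order. A minimal sketch on a hypothetical toy column, using `mode()` for the modal value and position 3 of the sorted counts for the 4th one:

```python
import pandas as pd

# Hypothetical toy column standing in for car_eval.manufacturer_country.
countries = pd.Series(
    ["Japan"] * 5 + ["Germany"] * 4 + ["South Korea"] * 3
    + ["United States"] * 2 + ["Italy"] * 1
)

counts = countries.value_counts()  # sorted from most to least frequent

modal_country = countries.mode().iloc[0]  # most frequent value
fourth_country = counts.index[3]          # position 3 = 4th most frequent

print(modal_country)   # Japan
print(fourth_country)  # United States
```

If two countries had the same count, their relative order in `counts` would not be meaningful, so "the 4th most frequent" would need an explicit tie-breaking rule.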


2. **Calculate a table of proportions for the countries that appear in `manufacturer_country`. What percentage of cars were manufactured in Japan?**

```python
# percentage of cars manufactured in Japan; look the value up by label,
# since positional access like table[0] on a string index is deprecated in recent pandas
perc_manufactured_in_jp = manufacturer_country_table['Japan'] * 100
print("The percentage of cars manufactured in Japan is: " + str(perc_manufactured_in_jp) + "%")
```

```
The percentage of cars manufactured in Japan is: 22.8%
```
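The share of a single category can also be read straight from a boolean mask, and Python's `%` format spec handles the multiply-by-100 step. A small sketch on a hypothetical toy column in which 2 of 8 cars come from Japan:

```python
import pandas as pd

# Hypothetical toy column: 2 of the 8 cars are from Japan.
countries = pd.Series(["Japan", "Germany", "Japan", "Italy",
                       "Germany", "France", "China", "Germany"])

# The mean of a boolean mask is the share of True values.
share_japan = (countries == "Japan").mean()

# Label lookup in the proportions table gives the same number and does not
# depend on Japan happening to be the most frequent country.
share_from_table = countries.value_counts(normalize=True)["Japan"]

# The ".1%" format spec multiplies by 100 and appends a percent sign.
print(f"{share_japan:.1%}")  # 25.0%
```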


## Summarizing Buying Costs

3. **`buying_cost` is a categorical variable which describes the cost of buying any car in the dataset. Print out a list of the possible values for this variable.**

```python
# unique() lists every distinct value in the column
possible_values = car_eval.buying_cost.unique()
print(possible_values)
```

```
['vhigh' 'med' 'low' 'high']
```
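`unique()` returns the distinct values in order of first appearance, not in any meaningful order, while `nunique()` returns only how many there are. A quick sketch on a hypothetical toy column:

```python
import pandas as pd

# Hypothetical toy column standing in for car_eval.buying_cost.
buying_cost = pd.Series(["vhigh", "med", "vhigh", "low", "high", "med"])

# Distinct values, in order of first appearance.
print(buying_cost.unique())   # ['vhigh' 'med' 'low' 'high']

# Number of distinct values.
print(buying_cost.nunique())  # 4
```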


4. **`buying_cost` is an ordinal categorical variable, which means we can create an order associated with the values in the data and perform additional numeric operations on the variable. Create a new list, `buying_cost_categories`, that contains the unique values in `buying_cost`, ordered from lowest to highest.**

```python
# unique values, in order of first appearance
buying_cost_categories = car_eval.buying_cost.unique()
print(buying_cost_categories)

# list of buying costs ordered from lowest to highest
buying_cost_categories = ['low', 'med', 'high', 'vhigh']
print(buying_cost_categories)
```

```
['vhigh' 'med' 'low' 'high']
['low', 'med', 'high', 'vhigh']
```
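The order has to be written out by hand because sorting the labels as strings follows the alphabet, not the cost scale. The sketch below shows the mismatch; the `rank` dictionary is a hypothetical helper introduced only for illustration.

```python
# Unique buying_cost values in the order unique() returned them above.
observed = ["vhigh", "med", "low", "high"]

# Alphabetical sorting does not follow the cost scale.
print(sorted(observed))  # ['high', 'low', 'med', 'vhigh']

# Hypothetical rank table giving the intended low -> vhigh order.
rank = {"low": 0, "med": 1, "high": 2, "vhigh": 3}
print(sorted(observed, key=rank.get))  # ['low', 'med', 'high', 'vhigh']
```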


5. **Convert `buying_cost` to type 'category' using the list you created in the previous exercise.**

```python
car_eval['buying_cost'] = pd.Categorical(car_eval['buying_cost'], categories=buying_cost_categories, ordered=True)
print(car_eval['buying_cost'].cat.codes)
```

```
0      3
1      3
2      1
3      0
4      0
      ..
995    0
996    0
997    3
998    0
999    0
Length: 1000, dtype: int8
```
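Declaring the categorical as ordered is what makes comparisons and `min`/`max` respect the cost scale. A minimal sketch on a hypothetical toy column, using the same conversion as above:

```python
import pandas as pd

# Hypothetical toy column standing in for car_eval['buying_cost'].
order = ["low", "med", "high", "vhigh"]
costs = pd.Series(pd.Categorical(["vhigh", "med", "low", "high", "low"],
                                 categories=order, ordered=True))

# Ordered categoricals can be compared against a category label...
print((costs >= "high").sum())  # 2  (high, vhigh)

# ...and min/max follow the declared order, not alphabetical order.
print(costs.min(), costs.max())  # low vhigh
```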


6. **Calculate the median category of the `buying_cost` variable and print the result.**

```python
# median of the integer category codes (0 = low ... 3 = vhigh)
buying_cost_median_number = np.median(car_eval['buying_cost'].cat.codes)
print(buying_cost_median_number)

# map the median code back to its category label
buying_cost_median = buying_cost_categories[int(buying_cost_median_number)]
print('The median category for buying cost is: ' + str(buying_cost_median))
```

```
1.0
The median category for buying cost is: med
```
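Here the median code is a whole number, but with an even number of rows the median can fall halfway between two codes, and `int()` then silently rounds down. One way to make the tie-break explicit, sketched on a hypothetical toy column, is `Series.quantile` with `interpolation="lower"` (or `"higher"`), which always returns an observed code:

```python
import pandas as pd

order = ["low", "med", "high", "vhigh"]
# Hypothetical toy column: even length, and the two middle values
# fall in different categories (low and med).
costs = pd.Series(pd.Categorical(["low", "low", "med", "vhigh"],
                                 categories=order, ordered=True))
codes = costs.cat.codes  # 0, 0, 1, 3

# A plain median averages the two middle codes: (0 + 1) / 2 = 0.5,
# which is not a valid category code.
print(codes.median())  # 0.5

# interpolation="lower" picks the lower of the two middle codes instead.
median_code = int(codes.quantile(0.5, interpolation="lower"))
print(costs.cat.categories[median_code])  # low
```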


## Summarizing Luggage Capacity

7. **`luggage` is a categorical variable in the car evaluations dataset that records the luggage capacity for each reviewed car. Calculate a table of proportions for this variable and print the result.**

```python
luggage_frequency = car_eval['luggage'].value_counts(normalize=True)
print(luggage_frequency)
```

```
small    0.339
med      0.333
big      0.328
Name: luggage, dtype: float64
```
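`value_counts()` sorts by frequency, so for an ordinal variable like `luggage` the table can come out in an order that hides the natural scale. A sketch on a hypothetical toy column that uses `reindex()` to put the proportions back in small, med, big order:

```python
import pandas as pd

# Hypothetical toy column standing in for car_eval['luggage'].
luggage = pd.Series(["small", "big", "med", "small", "med", "small"])

# value_counts() sorts by frequency; reindex() restores the natural
# small -> med -> big order of this ordinal variable.
proportions = luggage.value_counts(normalize=True).reindex(["small", "med", "big"])
print(proportions.round(3))
```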


8. **Are there any missing values in this column? Replicate the table of proportions from the previous exercise, but do not drop any missing values from the count. Print your result.**

```python
luggage_frequency_missing_values = car_eval['luggage'].value_counts(normalize=True, dropna=False)
print(luggage_frequency_missing_values)
# identical to the previous table, so the column has no missing values
```

```
small    0.339
med      0.333
big      0.328
Name: luggage, dtype: float64
```
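The two tables only match because `luggage` has no missing values. On a hypothetical toy column with one missing entry, the difference becomes visible, and `isna().sum()` gives a direct count of the gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing entry out of four.
luggage = pd.Series(["small", "med", np.nan, "small"])

# Default: NaN is dropped, so proportions are over the 3 non-missing rows.
print(luggage.value_counts(normalize=True))

# dropna=False keeps NaN as its own row, so proportions are over all 4 rows.
print(luggage.value_counts(normalize=True, dropna=False))

# Direct count of missing values.
print(luggage.isna().sum())  # 1
```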


9. **Without passing `normalize = True` to `.value_counts()`, can you replicate the result you got in the previous exercises?**

```python
luggage_frequency_manual = car_eval['luggage'].value_counts() / len(car_eval['luggage'])
print(luggage_frequency_manual)
```

```
small    0.339
med      0.333
big      0.328
Name: luggage, dtype: float64
```
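The choice of denominator matters once missing values appear: `len()` counts every row, including NaN, while `count()` counts only non-missing rows. A sketch on a hypothetical toy column showing which built-in option each manual version reproduces:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing entry out of four.
luggage = pd.Series(["small", "med", np.nan, "small"])
counts = luggage.value_counts()  # NaN excluded: small=2, med=1

# len() includes the NaN row, giving the same numbers as dropna=False
# for the non-missing categories.
over_all_rows = counts / len(luggage)        # small = 0.5

# count() skips NaN, matching the default value_counts(normalize=True).
over_non_missing = counts / luggage.count()  # small = 2/3
```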


## Summarizing Door Counts

10. **`doors` is a categorical variable in the car evaluations dataset that records the count of doors for each reviewed car. Find the count of cars that have 5 or more doors. You can identify cars with 5+ doors by looking for cars that have a value of `'5more'` in the doors column. Print your result.**

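```python
# boolean mask of cars with 5+ doors; sum() counts the True values
five_plus_doors_count = (car_eval['doors'] == '5more').sum()
print(five_plus_doors_count)
```

```
246
```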
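Because `doors` mixes digits with the `'5more'` label, its values are strings, so the comparison has to use the string `'5more'`. The same count can also be read from the frequency table; a sketch on a hypothetical toy column, where `.get()` returns a default instead of raising `KeyError` if the label never occurs:

```python
import pandas as pd

# Hypothetical toy column standing in for car_eval['doors'] (string values).
doors = pd.Series(["2", "4", "5more", "3", "5more"])

# Boolean mask: True wherever the value is exactly '5more'; sum() counts them.
print((doors == "5more").sum())  # 2

# Equivalent lookup in the frequency table, defaulting to 0 if absent.
print(doors.value_counts().get("5more", 0))  # 2
```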


11. **Find the proportion of cars that have 5+ doors and print the result.**

```python
# the mean of a boolean mask is the proportion of True values
more5_doors_car_proportion = (car_eval['doors'] == '5more').mean()
print(more5_doors_car_proportion)
```

```
0.246
```
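Taking the mean of a boolean mask works because `True` counts as 1 and `False` as 0, so the result is matches divided by all rows. A missing value compares as `False` and still counts toward the denominator; a sketch on a hypothetical toy column with one gap shows how to restrict the base to non-missing rows if that is what is intended:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing entry out of five.
doors = pd.Series(["2", "5more", np.nan, "5more", "4"])
mask = doors == "5more"  # NaN compares as False

# mean() = matches / all rows, including the missing one.
print(mask.mean())                         # 0.4  (2 of 5)

# Drop missing rows first to use only non-missing rows as the base.
print((doors.dropna() == "5more").mean())  # 0.5  (2 of 4)
```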
