# Summarizing Automobile Evaluation Data



The car evaluation dataset comes from the UCI Machine Learning Repository and has been slightly modified for this project: one additional field, `manufacturer_country`, has been simulated for illustrative purposes. You can read more about the dataset's details, features, and original research uses on the [UCI data description page](https://archive.ics.uci.edu/ml/datasets/car+evaluation).

## Summarizing Manufacturer Country

1. `manufacturer_country` is a _nominal categorical variable_ that indicates the country of the manufacturer of each car reviewed. Create a table of frequencies of all the cars reviewed by `manufacturer_country`. What is the modal category? Which country appears 4th most frequently? Print out your results.

In [48]:
import pandas as pd
import numpy as np

car_eval = pd.read_csv('car_eval_dataset.csv')
print(car_eval.head())
print(car_eval.tail())

  buying_cost maintenance_cost doors capacity luggage safety acceptability  \
0       vhigh              low     4        4   small    med         unacc   
1       vhigh              med     3        4   small   high           acc   
2         med             high     3        2     med   high         unacc   
3         low              med     4     more     big    low         unacc   
4         low             high     2     more     med   high           acc   

  manufacturer_country  
0                China  
1               France  
2        United States  
3        United States  
4          South Korea  
    buying_cost maintenance_cost  doors capacity luggage safety acceptability  \
995         low              low      3        4     big    med          good   
996         low              med      4        4     big   high         vgood   
997       vhigh            vhigh      3     more   small    low         unacc   
998         low              low      4        4     big 
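Later exercises compare `doors` against the text label `'5more'`, so pandas cannot parse that column as numbers. A minimal sketch on a hypothetical three-row CSV (made-up rows and a subset of the columns, not the real file) that checks how each column was parsed and whether any values are missing:

```python
import io

import pandas as pd

# Hypothetical three-row CSV mimicking a few columns of car_eval_dataset.csv
csv_text = """buying_cost,doors,capacity,manufacturer_country
vhigh,4,4,China
med,5more,more,France
low,2,2,Japan
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)        # doors and capacity are parsed as text, not integers
print(df.isna().sum())  # number of missing values in each column
```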

In [39]:
# frequency table, sorted from most to least common
manuf_all = car_eval.manufacturer_country.value_counts()
print(manuf_all)
# the modal category is the first entry (Japan)
# the 4th most frequent country sits at position 3 (0-based)
manuf_all.index[3]

Japan            228
Germany          218
South Korea      159
United States    138
Italy             97
France            87
China             73
Name: manufacturer_country, dtype: int64


'United States'
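The modal category can also be read off without indexing by position. A minimal sketch on a small made-up Series (hypothetical values, not the car data) using `Series.mode()` and `value_counts().idxmax()`; note that `mode()` returns every tied value when several categories share the top count:

```python
import pandas as pd

# Hypothetical toy data standing in for car_eval.manufacturer_country
countries = pd.Series(
    ["Japan", "Germany", "Japan", "Italy", "Germany", "Japan", "France",
     "Germany", "Italy", "Japan"],
    name="manufacturer_country",
)

counts = countries.value_counts()  # sorted by frequency, descending

modal = countries.mode().iloc[0]   # most frequent category
also_modal = counts.idxmax()       # same answer via the frequency table
fourth = counts.index[3]           # 4th most frequent (0-based position 3)

print(counts)
print(modal, also_modal, fourth)
```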

2. Calculate a table of proportions for the countries in `manufacturer_country`. What percentage of cars were manufactured in Japan?

In [40]:
manuf_prop = car_eval.manufacturer_country.value_counts(normalize=True)
print(manuf_prop)  # Japan manufactured 22.8% of the cars in the dataset


Japan            0.228
Germany          0.218
South Korea      0.159
United States    0.138
Italy            0.097
France           0.087
China            0.073
Name: manufacturer_country, dtype: float64
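To answer the percentage question directly, the proportions can be scaled and rounded. A sketch on hypothetical toy data, not the car dataset:

```python
import pandas as pd

# Hypothetical toy data: 10 cars from three countries
countries = pd.Series(["Japan"] * 5 + ["Germany"] * 3 + ["China"] * 2)

proportions = countries.value_counts(normalize=True)  # shares summing to 1
percentages = (proportions * 100).round(1)            # shares as percentages

print(percentages)
print(f"Japan: {percentages['Japan']}%")
```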


## Summarizing Buying Costs

3. `buying_cost` is a categorical variable which describes the cost of buying any car in the dataset. Print out a list of the possible values for this variable.

In [41]:
car_eval.buying_cost.unique()

array(['vhigh', 'med', 'low', 'high'], dtype=object)
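`unique()` lists values in order of first appearance, which is why `'vhigh'` comes first in the output above; sorting alphabetically does not recover the logical low-to-vhigh order either, which is the reason for declaring an explicit order in the next exercise. A small illustration on made-up values:

```python
import pandas as pd

# Hypothetical toy values for buying_cost
buying = pd.Series(["vhigh", "med", "low", "med", "high", "low"])

in_appearance_order = list(buying.unique())  # order of first appearance
alphabetical = sorted(buying.unique())       # alphabetical, not low < med < high < vhigh
n_levels = buying.nunique()                  # number of distinct values

print(in_appearance_order, alphabetical, n_levels)
```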

4. Convert `buying_cost` to type `'category'`.

In [42]:
# levels ordered from cheapest to most expensive
buying_cost_categories = ['low', 'med', 'high', 'vhigh']
car_eval['buying_cost'] = pd.Categorical(car_eval['buying_cost'],
                                         categories=buying_cost_categories,
                                         ordered=True)
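A sketch, on hypothetical toy values, of what the ordered categorical provides: integer codes that follow the declared order, comparisons against a level, and `min`/`max`. It also builds the same dtype with `astype(pd.CategoricalDtype(...))`, an equivalent alternative to `pd.Categorical`, and shows that a value outside the declared categories becomes missing with code `-1`:

```python
import pandas as pd

levels = ["low", "med", "high", "vhigh"]
# Hypothetical toy values for buying_cost
buying = pd.Series(["vhigh", "med", "low", "high", "med"])

# Two equivalent ways to get an ordered categorical Series
via_categorical = pd.Series(pd.Categorical(buying, categories=levels, ordered=True))
via_astype = buying.astype(pd.CategoricalDtype(categories=levels, ordered=True))

codes = via_astype.cat.codes.tolist()            # position of each value in `levels`
at_least_high = (via_astype >= "high").tolist()  # comparisons follow the declared order
cheapest, priciest = via_astype.min(), via_astype.max()

# A value outside the declared categories becomes missing, with code -1
unknown_codes = pd.Categorical(["cheap"], categories=levels, ordered=True).codes.tolist()

print(codes, at_least_high, cheapest, priciest, unknown_codes)
```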

5. Calculate the median category of the `buying_cost` variable and print the result.

In [43]:
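# median of the integer codes (low=0, med=1, high=2, vhigh=3)
ind_cat = car_eval['buying_cost'].cat.codes.median()
# int() truncates, so a fractional median (possible with an even row count) maps to the lower category
median_cat = buying_cost_categories[int(ind_cat)]
print(median_cat)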

med
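When the row count is even, `cat.codes.median()` averages the two middle codes, so it can return a fractional value (which `int()` truncates) or land on a category that never occurs. The sketch below, on hypothetical toy data, contrasts that with a helper, `median_category` (a name introduced here for illustration), that takes the lower median of the sorted codes instead; choosing the lower rather than the upper median is an assumption, since both are valid for even counts:

```python
import pandas as pd

levels = ["low", "med", "high", "vhigh"]


def median_category(s: pd.Series) -> str:
    """Lower median category of an ordered categorical Series (hypothetical helper).

    Sorting the codes and picking the middle element (the lower of the two
    middle elements when the length is even) always returns an observed level.
    """
    codes = s.cat.codes[s.notna()].sort_values().to_numpy()  # drop missing (code -1)
    middle = codes[(len(codes) - 1) // 2]
    return s.cat.categories[int(middle)]


# Hypothetical toy data with an even row count: two "low", two "high"
buying = pd.Series(pd.Categorical(["low", "low", "high", "high"],
                                  categories=levels, ordered=True))

naive = levels[int(buying.cat.codes.median())]  # median code 1.0 -> "med", never observed
robust = median_category(buying)                # lower median -> "low"

print(naive, robust)
```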


## Summarizing Luggage Capacity

6. `luggage` is a categorical variable in the car evaluations dataset that records the luggage capacity for each reviewed car. Calculate a table of proportions for this variable and print the result.

In [44]:
car_eval.luggage.value_counts(dropna=False, normalize=True)

small    0.339
med      0.333
big      0.328
Name: luggage, dtype: float64
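Since no `NaN` row appears in the output above, `luggage` seems to have no missing values, so `dropna=False` does not change these proportions. On a hypothetical Series with one missing entry, the flag changes both the rows shown and the denominator:

```python
import pandas as pd

# Hypothetical luggage values with one missing entry
luggage = pd.Series(["small", "med", "big", "small", None])

without_nan = luggage.value_counts(normalize=True)             # shares of the 4 non-missing rows
with_nan = luggage.value_counts(normalize=True, dropna=False)  # shares of all 5 rows, NaN included

print(without_nan)
print(with_nan)
```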

7. Without passing `normalize=True` to `.value_counts()`, can you replicate the result you got in the previous exercise?

In [45]:
# divide the raw counts by the total number of rows
car_eval.luggage.value_counts(dropna=False) / len(car_eval.luggage)

small    0.339
med      0.333
big      0.328
Name: luggage, dtype: float64
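Dividing by `len()` matches `normalize=True` here because `dropna=False` keeps any missing values in the counts. A sketch on made-up data: with the default `dropna=True`, the matching denominator is the non-missing count from `Series.count()`:

```python
import pandas as pd

# Hypothetical luggage values with one missing entry
luggage = pd.Series(["small", "small", "small", "med", "med", None])

# dropna=False keeps NaN in the counts, so divide by every row
manual_all = luggage.value_counts(dropna=False) / len(luggage)
auto_all = luggage.value_counts(dropna=False, normalize=True)

# The default dropna=True drops NaN, so divide by the non-missing count
manual_nonnull = luggage.value_counts() / luggage.count()
auto_nonnull = luggage.value_counts(normalize=True)

# Mixing the two (default counts over len) understates every share
mismatched = luggage.value_counts() / len(luggage)

print(manual_nonnull)
print(mismatched)
```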

## Summarizing Passenger Capacity

8. Find the count of cars that have 5 or more doors. Print your result.

In [46]:
freq_door = (car_eval['doors'] == '5more').sum()
print(freq_door)

246
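Because `doors` holds text labels, the count is a string comparison. A sketch on hypothetical values showing two equivalent counts, plus a numeric route that assumes `'5more'` can be treated as `5` (an interpretation made for illustration, not something the dataset defines):

```python
import pandas as pd

# Hypothetical door labels; the column being summarized stores "5more" as text
doors = pd.Series(["2", "4", "5more", "3", "5more", "4"])

via_mask = int((doors == "5more").sum())                # count True values
via_counts = int(doors.value_counts().get("5more", 0))  # 0 if the label is absent

# Assumption for illustration: treat "5more" as the number 5
as_number = pd.to_numeric(doors.replace("5more", "5"))
via_numeric = int((as_number >= 5).sum())

print(via_mask, via_counts, via_numeric)
```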


9. Find the proportion of cars that have 5+ doors and print the result.

In [47]:
door_prop = (car_eval['doors'] == '5more').mean()  # proportion of cars with 5+ doors: 24.6%
print(door_prop)
print(len(car_eval.doors))  # checks the overall number of rows

0.246
1000
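The mean of a boolean mask is the share of `True` values, since `True` counts as 1 and `False` as 0. A sketch on made-up values comparing it with an explicit ratio and with `value_counts(normalize=True)`:

```python
import pandas as pd

# Hypothetical door labels: 3 of 8 cars have "5more"
doors = pd.Series(["2", "4", "5more", "3", "5more", "4", "5more", "2"])

is_five_plus = doors == "5more"
share_mean = is_five_plus.mean()                                   # True -> 1, False -> 0
share_ratio = is_five_plus.sum() / len(doors)                      # same value, spelled out
share_norm = doors.value_counts(normalize=True).get("5more", 0.0)  # from the proportions table

print(f"{share_mean:.1%}")  # prints 37.5%
```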
