
## Activity 6.01: Plotting Mean Car Prices of Manufacturers 

This activity combines what you have already learned about Bokeh.    
We will use the basics to create a visualization that displays the mean price for each car manufacturer in our dataset.

In the process, we will first plot all cars with their prices and then gradually develop a more sophisticated visualization that also uses color to draw attention to the manufacturers with the highest mean prices.

#### Loading our dataset

In [1]:
# importing the necessary dependencies
import pandas as pd
from bokeh.io import output_notebook

output_notebook()

In [2]:
# loading the dataset with pandas
dataset = pd.read_csv('https://raw.githubusercontent.com/PacktWorkshops/The-Data-Visualization-Workshop/master/Datasets/automobiles.csv')

In [3]:
# looking at the dataset
dataset.head()

Unnamed: 0,make,fuel-type,num-of-doors,body-style,engine-location,length,width,height,num-of-cylinders,horsepower,peak-rpm,city-mpg,highway-mpg,price
0,alfa-romero,gas,two,convertible,front,168.8,64.1,48.8,four,111,5000,21,27,13495
1,alfa-romero,gas,two,convertible,front,168.8,64.1,48.8,four,111,5000,21,27,16500
2,alfa-romero,gas,two,hatchback,front,171.2,65.5,52.4,six,154,5000,19,26,16500
3,audi,gas,four,sedan,front,176.6,66.2,54.3,four,102,5500,24,30,13950
4,audi,gas,four,sedan,front,176.6,66.4,54.3,five,115,5500,18,22,17450


Our dataset contains the following columns:

- `make`: Manufacturer of the car
- `fuel-type`: diesel, gas
- `num-of-doors`: Number of doors
- `body-style`: Body style of the car e.g. convertible
- `engine-location`: front, rear
- `length`: continuous from 141.1 to 208.1
- `width`: continuous from 60.3 to 72.3
- `height`: continuous from 47.8 to 59.8
- `num-of-cylinders`: Number of cylinders, e.g. eight
- `horsepower`: Engine horsepower
- `peak-rpm`: Maximum RPM
- `city-mpg`: Fuel economy in the city, in miles per gallon
- `highway-mpg`: Fuel economy on the highway, in miles per gallon
- `price`: Price of the car

---

#### Plotting each car with its price

We will use the plotting interface of Bokeh to do some basic visualization first.   
Let's plot each car with its price.

Import `figure` and `show` from the `bokeh.plotting` interface.

In [4]:
# importing the necessary dependencies 
from bokeh.plotting import figure, show

In this first task we want to use the index as our x axis since we just want to plot each car with its price.

Create a new column in our dataset that uses the `dataset.index` as values.

In [5]:
# adding a new column with the indices called index
dataset['index'] = dataset.index

Once we have our usable index column, we can plot our cars.

Create a new figure and plot each car using a scatter plot with the index and price column.   
Give the visualization a title of `Car prices` and name the x axis `Car Index`. The y axis should be named `Price`. 

In [7]:
# plotting a point for each car with its price
plot = figure(title='Car prices', x_axis_label='Car Index', y_axis_label='Price')
plot.scatter(dataset['index'], dataset['price'])

show(plot)

Of course, this visualization doesn't give us any insights into which car manufacturer, on average, produces the most expensive cars.   
But it's a good start to get an overview of what values are present in our dataset.

---

#### Grouping cars from manufacturers together

As we have learned in the previous chapters, we can group and aggregate data in our datasets.

Group the dataset using `groupby` and the column `make`. Then use the `mean` method to get the mean value for each column.   
We don't want the `make` column to be used as the index, so pass the `as_index=False` argument to `groupby`.

Print out the grouped average dataset afterwards to see how it differs from the initial dataset.

In [8]:
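# grouping the dataset by the make column and aggregating it by the mean
# numeric_only=True skips text columns such as fuel-type, which pandas 2.0+ cannot average
grouped_average = dataset.groupby(['make'], as_index=False).mean(numeric_only=True)
grouped_average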

Unnamed: 0,make,length,width,height,city-mpg,highway-mpg,price,index
0,alfa-romero,169.6,64.566667,50.0,20.333333,26.666667,15498.333333,1.0
1,audi,184.766667,68.85,54.833333,19.333333,24.5,17859.166667,5.5
2,bmw,184.5,66.475,54.825,19.375,25.375,26118.75,12.5
3,chevrolet,151.933333,62.5,52.4,41.0,46.333333,6007.0,18.0
4,dodge,160.988889,64.166667,51.644444,28.0,34.111111,7875.444444,24.0
5,honda,160.769231,64.384615,53.238462,30.384615,35.461538,8184.692308,35.0
6,isuzu,171.65,63.5,52.45,24.0,29.0,8916.5,42.5
7,jaguar,196.966667,69.933333,51.133333,14.333333,18.333333,34600.0,45.0
8,mazda,170.805882,65.588235,53.358824,25.705882,31.941176,10652.882353,55.0
9,mercedes-benz,195.2625,71.0625,55.725,18.5,21.0,33647.0,67.5
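
To make the effect of `as_index=False` concrete, here is a minimal sketch on a tiny hand-made DataFrame. The `cars` data, and the names `indexed` and `flat`, are made up for illustration and are not taken from `automobiles.csv`.

```python
import pandas as pd

# Hypothetical toy data, not taken from automobiles.csv
cars = pd.DataFrame({
    'make': ['audi', 'audi', 'bmw'],
    'price': [10000, 20000, 30000],
})

# Default behavior: the group key becomes the index of the result
indexed = cars.groupby('make').mean()
print(list(indexed.columns))  # only the aggregated column remains
print(list(indexed.index))    # the manufacturers are now index labels

# as_index=False: the group key stays a regular column
flat = cars.groupby('make', as_index=False).mean()
print(list(flat.columns))
```

With the second form, `make` can be used like any other column, for example as the x values of a plot.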


After creating our grouped dataset, we can use its values the same way we did in the previous plot.   
Note that this time the x values are categorical: they are the manufacturer names.

Create a new figure with a title of `Car Manufacturer Mean Prices`, an x axis label of `Car Manufacturer`, and a y axis label of `Mean Price`.   
In addition, handle the categorical data by passing the `make` column as the `x_range` argument of the figure.

In [9]:
# plotting the manufacturers and their mean car prices
grouped_plot = figure(title='Car Manufacturer Mean Prices', x_axis_label='Car Manufacturer',
                      y_axis_label='Mean Price', x_range=grouped_average['make'])
grouped_plot.scatter(grouped_average['make'], grouped_average['price'])

show(grouped_plot)

By default, the tick labels on the x axis are oriented horizontally, which can make longer manufacturer names overlap.

Assign the value `vertical` to the `xaxis.major_label_orientation` attribute of `grouped_plot`.   
Call `show` again to display the visualization.

In [10]:
# setting the orientation of the x axis tick labels to vertical
grouped_plot.xaxis.major_label_orientation = "vertical"

show(grouped_plot)
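
An optional refinement, not part of the original activity: the order of the categories on the x axis follows the order of the values given to `x_range`. The sketch below assumes you would like the manufacturers ordered by mean price. It uses a hypothetical three-row stand-in (`toy_average`) with rounded prices rather than the real `grouped_average`.

```python
import pandas as pd

# Hypothetical stand-in for grouped_average, with rounded mean prices
toy_average = pd.DataFrame({
    'make': ['audi', 'chevrolet', 'jaguar'],
    'price': [17859.0, 6007.0, 34600.0],
})

# Sort the rows by mean price, cheapest first
sorted_average = toy_average.sort_values('price')

# A plain list of category names in the order they should appear on the x axis
sorted_makes = sorted_average['make'].tolist()
print(sorted_makes)
```

Passing a list like `sorted_makes` as `x_range` when creating the figure should place the most expensive manufacturers on the right-hand side.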

---

#### Adding color

To give the user a little more information about the data, we want to add color based on the mean price of each manufacturer.     
In addition, we want to increase the size of the points to make them stand out more. 

- Import and set up a new `LinearColorMapper` with a palette of `Magma256`, and the min and max prices for the `low` and `high` arguments.
- Create a new figure with the same title, labels, and `x_range` as before.
- Plot each manufacturer and provide a `size` argument with a value of 15.
- Pass the `color` argument to the `scatter` method as a dictionary whose `field` and `transform` keys provide the column (`y`) and the `color_mapper`.
- As we've done before, set the label orientation to vertical.

In [11]:
# adding color based on the mean price to our elements
from bokeh.models import LinearColorMapper

color_mapper = LinearColorMapper(palette='Magma256',
                                 low=grouped_average['price'].min(),
                                 high=grouped_average['price'].max())

grouped_colored_plot = figure(title='Car Manufacturer Mean Prices', x_axis_label='Car Manufacturer',
                              y_axis_label='Mean Price', x_range=grouped_average['make'])
grouped_colored_plot.scatter(grouped_average['make'], grouped_average['price'],
                             color={'field': 'y', 'transform': color_mapper}, size=15)

grouped_colored_plot.xaxis.major_label_orientation = "vertical"

show(grouped_colored_plot)
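
To build some intuition for what a linear color mapper does, the following sketch computes which position in a 256-color palette a value would land on when the range from `low` to `high` is divided evenly. This is a simplified illustration, not Bokeh's actual implementation. The function `palette_index` and the round price bounds are hypothetical.

```python
def palette_index(value, low, high, n_colors=256):
    """Return the palette position for value when [low, high] is scaled linearly."""
    if high <= low:
        return 0
    # Fraction of the way from low to high, clamped to the range 0..1
    fraction = (value - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)
    # Scale to the palette and keep the top value inside the last slot
    return min(int(fraction * n_colors), n_colors - 1)


# Hypothetical bounds standing in for the min and max mean prices
low, high = 5000.0, 35000.0

print(palette_index(5000.0, low, high))   # lowest price: first color
print(palette_index(20000.0, low, high))  # halfway: middle of the palette
print(palette_index(35000.0, low, high))  # highest price: last color
```

Because `Magma256` runs from dark to light, manufacturers with low mean prices appear dark in the plot and those with the highest mean prices appear brightest.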