## Bubble Chart 
https://plot.ly/python/bubble-charts/

A bubble chart is a variation of a scatter chart in which the data points are drawn as bubbles, and additional dimensions of the data are encoded in the size and color of the bubbles.

### Importing packages

In [1]:
import pandas as pd
import plotly
import plotly.graph_objs as go
import plotly.offline as offline

# Plotly graphs are built from the classes in the "graph_objs" module.
# We want these graphs to be available offline. For Plotly visuals to
# display within a Jupyter notebook, we need to enable notebook mode.
offline.init_notebook_mode(connected=True)

#### Regular 2D scatter plot

In [2]:
trace = go.Scatter(x = [15, 18, 21, 25],
                   y = [100, 400, 300, 200],
                   mode = 'markers')

data = [trace]

offline.iplot(data)

#### Representing a 3rd dimension of data in the same scatter plot

In [3]:
z = [25, 100, 75, 50]

In [4]:
trace = go.Scatter(x = [15, 18, 21, 25],
                   y = [100, 400, 300, 200],
                   mode = 'markers',
                   marker = dict(size = z))

data = [trace]

offline.iplot(data)

#### Representing a 4th dimension in the same plot using color

In [5]:
i = [5, 6, 8, 4]

In [6]:
trace = go.Scatter(x = [15, 18, 21, 25],
                   y = [100, 400, 300, 200],
                   mode = 'markers',
                   marker = dict(size = z,
                                 color = i,
                                 colorscale = 'Portland',
                                 showscale = True))

In [7]:
data = [trace]
offline.iplot(data)
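
To make the color mapping concrete, the sketch below shows how the values in `i` could be placed along a colorscale. It assumes Plotly's default behavior of stretching the scale between the minimum and maximum of the color array. The helper name `normalize_colors` is hypothetical and not part of Plotly.

```python
def normalize_colors(values):
    """Map each value to a position in [0, 1] along a colorscale.

    Assumption: the scale spans min(values)..max(values), which mirrors
    an automatic color range when no explicit bounds are given.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: place every point in the middle of the scale
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


i = [5, 6, 8, 4]
positions = normalize_colors(i)
print(positions)  # [0.25, 0.5, 1.0, 0.0]
```

Under this assumption the point with `i = 4` sits at the low end of the 'Portland' scale and the point with `i = 8` at the high end, which is what the color bar shown by `showscale = True` lets the reader decode.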

#### Loading the dataset
Here we load the CSV data that we saved in the previous demo.

In [8]:
housing_data = pd.read_csv('datasets/housing.csv')

housing_data.sample(10)

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
2017,-119.8,36.72,19.0,1334.0,336.0,1171.0,319.0,1.0481,48500.0,INLAND
7293,-118.23,33.98,35.0,1366.0,496.0,2160.0,497.0,2.2059,150000.0,<1H OCEAN
5398,-118.43,34.03,45.0,1740.0,311.0,788.0,306.0,5.2099,373600.0,<1H OCEAN
5935,-117.87,34.15,24.0,5745.0,735.0,2061.0,679.0,8.2827,451400.0,INLAND
11430,-117.97,33.66,22.0,3914.0,600.0,1871.0,607.0,5.8541,281500.0,<1H OCEAN
10173,-117.98,33.86,26.0,1240.0,285.0,781.0,315.0,4.1287,205800.0,<1H OCEAN
17182,-122.51,37.53,17.0,1574.0,262.0,672.0,241.0,7.2929,355800.0,NEAR OCEAN
13397,-117.53,34.06,18.0,5605.0,1303.0,4028.0,1145.0,2.9386,116400.0,INLAND
18107,-122.01,37.31,23.0,6846.0,1078.0,2951.0,1063.0,6.3702,332000.0,<1H OCEAN
15592,-116.32,33.28,19.0,1791.0,367.0,327.0,185.0,3.3625,100000.0,INLAND


In [9]:
housing_data.shape

(20640, 10)

##### The dataset is large, so we take a random 7% sample of its rows

In [10]:
housing_data = housing_data.sample(frac=0.07).reset_index(drop=True)

housing_data.shape

(1445, 10)
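
The row count above follows from the sampling fraction. As a quick check, assuming pandas rounds `frac * number_of_rows` to the nearest integer when `frac` is given:

```python
total_rows = 20640   # rows in the full dataset, from housing_data.shape
frac = 0.07          # fraction passed to sample()

expected_rows = round(total_rows * frac)
print(expected_rows)  # 1445
```

Because `sample` draws rows at random, the subset changes on every run. Passing a fixed seed, for example `sample(frac=0.07, random_state=0)`, would make it reproducible.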

##### Showing unique values of ocean_proximity

In [11]:
housing_data['ocean_proximity'].unique()

array(['NEAR BAY', 'INLAND', '<1H OCEAN', 'NEAR OCEAN', 'ISLAND'],
      dtype=object)

##### Plotting the bubble chart, where bubble size represents total_rooms and color represents housing_median_age

In [15]:
trace = go.Scatter(x = housing_data['median_income'],
                   y = housing_data['median_house_value'],
                   mode = 'markers',
                   marker = dict(
                       size = housing_data['total_rooms'],
                       # total_rooms values are large, so "sizeref" divides
                       # every value by 500 to scale the bubbles down while
                       # keeping their relative proportions.
                       sizeref = 500,
                       color = housing_data['housing_median_age'],
                       colorscale = 'Jet',
                       showscale = True))
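
To see what `sizeref` does to the marker sizes, here is a minimal sketch of the scaling described in the code comments. It assumes the default `sizemode` of `'diameter'`, where the rendered diameter in pixels is roughly `size / sizeref`. Both helper names are hypothetical, and the room counts are taken from the sample rows shown earlier.

```python
def scaled_diameters(sizes, sizeref):
    """Return approximate marker diameters in pixels (diameter mode assumed)."""
    return [s / sizeref for s in sizes]


def sizeref_for_max_diameter(sizes, max_px):
    """Pick a sizeref so that the largest value is drawn max_px pixels wide."""
    return max(sizes) / max_px


# total_rooms values from the first five sampled rows above
total_rooms = [1334.0, 1366.0, 1740.0, 5745.0, 3914.0]

# With sizeref=500 every diameter shrinks by the same factor, so the
# ratios between bubbles are unchanged (5745 rooms stays ~4.3x 1334 rooms).
print(scaled_diameters(total_rooms, sizeref=500))

# Alternative: derive sizeref from a target size instead of hard-coding 500
print(sizeref_for_max_diameter(total_rooms, max_px=40))
```

Deriving `sizeref` from the data, rather than fixing it at 500, may be useful when the random sample changes and the largest `total_rooms` value shifts with it.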

In [16]:
data = [trace]

layout = go.Layout(height = 600,
                   width = 900,
                  
                   title = 'Housing Data',
                   hovermode = 'closest')

In [14]:
fig = go.Figure(data = data,
                layout = layout)

offline.iplot(fig)