## Data Visualization

- Pie Chart: Compare Percentages
- Bar Chart: Compare Scores across groups
- Histogram: Show frequency of values/value range
- Line Chart: Show trend of Scores
- Scatter Plot: Show Relationship between a pair of Scores
- Map: Show Geo Distribution of data

|Type|Variable Y|Variable X|
|:--:|:--:|:--:|
|Pie Chart|Fractions|None|
|Bar Chart|Numbers|Categories|
|Histogram|Counts|Categories/Value Range|
|Line Chart|Numbers|Time/Date/Period|
|Scatter Plot|Numbers|Numbers|
|Map|Latitude|Longitude|

### Sign up for Plot.ly

1. Sign up for Plot.ly: https://plot.ly/Auth/login/?action=signup#
2. Get your API token: Settings -> API Keys -> Regenerate Key -> Copy your newly created key
3. Save your API key somewhere

<div class="alert alert-block alert-warning">
**<b>Reminder</b>** A free account can call the Plot.ly API only 100 times per day and generate up to 25 graphs.</div>

In [None]:
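#In plotly 4 and later this module lives in the separate chart_studio package: import chart_studio.plotly as py
import plotly.plotly as py      #Import library and give it an abbreviated name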
import plotly.graph_objs as go  #go: graph object
from plotly import tools

py.sign_in('USER NAME', 'API TOKEN') #fill in your user name and API token

***

## Pie Chart

In [None]:
labels = ['Female','Male']
values = [40,20]

trace = go.Pie(labels=labels, values=values)

py.iplot([trace], filename='pie_chart')
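
A pie chart draws each value as a share of the total. As a minimal sketch in plain Python (no Plot.ly call involved), the shares for the sample values above can be worked out by hand:

```python
labels = ['Female', 'Male']
values = [40, 20]

# Each slice is its value divided by the sum of all values
total = sum(values)
shares = {label: 100 * value / total for label, value in zip(labels, values)}

for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```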

In [None]:
#change data labels by re-defining parameter "textinfo"
labels = ['Female','Male']
values = [40,20]

trace = go.Pie(labels=labels, values=values, textinfo='label+value')

py.iplot([trace], filename='pie_chart')

In [None]:
#change color setting by re-defining "marker" parameter
labels = ['Female','Male']
values = [40,20]

trace = go.Pie(labels=labels, values=values, marker={'colors':['red','blue']})

py.iplot([trace], filename='pie_chart')

In [None]:
#turn the pie chart into a donut by re-defining "hole" parameter
labels = ['Female','Male']
values = [40,20]

trace = go.Pie(labels=labels, values=values, hole=0.2)

py.iplot([trace], filename='pie_chart')

In [None]:
#change the graph size to 400*300 and add a title by defining "width", "height" and "title" in "layout"
labels = ['Female','Male']
values = [40,20]

trace = go.Pie(labels=labels, values=values)
layout=go.Layout(width=400,height=300,title='Gender Distribution')
fig=go.Figure([trace],layout)

py.iplot(fig, filename='pie_chart')

#### <font style="color: blue">Practice:</font>
---
<font style="color: blue"> Please download the Hong Kong census data about educational attainment from <a href='https://juniorworld.github.io/python-workshop-2018/doc/Hong Kong Census Educational Attainment.csv'>this link</a>.
    <p>Create a pie chart to visualize the percentages of different education levels in 2016. The pie chart should meet the following requirements:</p>
    1. Donut style
    2. Change slice colors
</font>

In [None]:
#Write down your code here
#---------------------------------------------------------







***

## Bar Chart
<br>For more details: https://plot.ly/python/reference/#bar

In [None]:
x = ['Female','Male']
y = [1.6,1.8]

trace = go.Bar(x=x,y=y)

py.iplot([trace], filename='bar_chart')

In [None]:
#Widen the gap between bars by increasing "bargap" parameters in layout
x = ['Female','Male']
y = [40,20]

trace = go.Bar(x=x,y=y)
layout = go.Layout(bargap=0.5)
fig = go.Figure([trace],layout)

py.iplot(fig, filename='bar_chart')

In [None]:
#Grouped bar chart
x = ['Female','Male']
y1 = [40,20]
y2 = [30,50]

trace1 = go.Bar(x=x,y=y1,name='class1')
trace2 = go.Bar(x=x,y=y2,name='class2')

py.iplot([trace1,trace2], filename='bar_chart')

In [None]:
#Stacked/Relative bar chart by re-defining "barmode" in layout
x = ['Female','Male']
y1 = [40,20]
y2 = [30,50]

trace1 = go.Bar(x=x,y=y1)
trace2 = go.Bar(x=x,y=y2)

layout = go.Layout(barmode='stack')
fig = go.Figure([trace1,trace2],layout)

py.iplot(fig, filename='bar_chart')

In [None]:
#100% Stacked bar chart by re-defining "barnorm" as "percent" in layout
x = ['Female','Male']
y1 = [40,20]
y2 = [30,50]

trace1 = go.Bar(x=x,y=y1)
trace2 = go.Bar(x=x,y=y2)

layout = go.Layout(barmode='stack',barnorm='percent')
fig = go.Figure([trace1,trace2],layout)

py.iplot(fig, filename='bar_chart')

In [None]:
#100% Stacked bar chart with "barnorm" as "fraction" and y axis ticks formatted as percentages
x = ['Female','Male']
y1 = [40,20]
y2 = [30,50]

trace1 = go.Bar(x=x,y=y1)
trace2 = go.Bar(x=x,y=y2)

layout = go.Layout(barmode='stack',barnorm='fraction',yaxis={'tickformat':'%'})
fig = go.Figure([trace1,trace2],layout)

py.iplot(fig, filename='bar_chart')
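
To see what a 100% stacked bar represents, the normalization can be reproduced by hand. The following is a rough sketch in plain Python using the same sample values; it illustrates the arithmetic only and is not how Plot.ly is implemented internally:

```python
# Manual version of a percent-normalized stack: rescale each category so its bars sum to 100
y1 = [40, 20]   # first trace, same values as the cells above
y2 = [30, 50]   # second trace

totals = [u + v for u, v in zip(y1, y2)]            # full height of each stacked bar
y1_pct = [100 * v / t for v, t in zip(y1, totals)]  # share of trace 1 in each bar
y2_pct = [100 * v / t for v, t in zip(y2, totals)]  # share of trace 2 in each bar

print(totals)
print([round(v, 1) for v in y1_pct])
print([round(v, 1) for v in y2_pct])
```

Dividing by 100 gives the values that correspond to the "fraction" setting.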

#### <font style="color: blue">Practice:</font>
---
<font style="color: blue"> Please refer to "Hong Kong Census Educational Attainment.csv".
    <p>Create a bar chart to visualize the percentages of different education levels in different years, i.e. 2006, 2011 and 2016. The bar chart should meet the following requirements:</p>
    1. A bar represents a year
    2. 100% Stacked bar chart: higher education levels stacked on top of lower ones and the bar's full length is 100%
    3. The gap between bar groups = 0.2
</font>

In [None]:
#Write down your code here
#---------------------------------------------------------






***

## Break

***

## Histogram
A histogram is a special type of bar chart in which each bar's y value is the count of observations in its bin. It shows the distribution of the data and helps visualize skewness and central tendency.
<br>For more details: https://plot.ly/python/reference/#histogram
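
As a minimal sketch of the counting behind a histogram, the snippet below tallies how often each value occurs in the sample list used in this section. It assumes a bin size of 1, so every distinct value gets its own bar:

```python
from collections import Counter

a = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 3, 3, 2]

# Count the occurrences of each value; these counts are the bar heights
counts = Counter(a)

for value in sorted(counts):
    print(value, counts[value])
```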

In [None]:
a=[1,2,3,3,4,4,4,5,5,6,7,3,3,2]
trace=go.Histogram(x=a)
py.iplot([trace],filename='Histogram')

In [None]:
#Change the bins by re-defining "size" parameter in xbins
a=[1,2,3,3,4,4,4,5,5,6,7,3,3,2]
trace=go.Histogram(x=a,xbins={'size':1})
py.iplot([trace],filename='Histogram')

In [None]:
#Convert into a 100% Histogram whose y value is the percentage of observations in each bin
#Re-define "histnorm" as "probability" and format the y axis ticks as percentages
a=[1,2,3,3,4,4,4,5,5,6,7,3,3,2]
trace=go.Histogram(x=a,xbins={'size':1},histnorm='probability')
layout=go.Layout(yaxis={'tickformat':'%'})
fig=go.Figure([trace],layout)
py.iplot(fig,filename='Histogram')
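
The "probability" normalization divides each count by the total number of observations, so the bars sum to 1. A rough sketch of that arithmetic in plain Python, again assuming a bin size of 1:

```python
from collections import Counter

a = [1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 3, 3, 2]

counts = Counter(a)
total = len(a)

# Share of observations that fall on each value
probabilities = {value: counts[value] / total for value in sorted(counts)}

for value, p in probabilities.items():
    print(value, f"{p:.1%}")
```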

In [None]:
#Decrease every element in "a" by one unit to create a new list "b"
#Grouped Histogram
a=[1,2,3,3,4,4,4,5,5,6,7,3,3,2]
b=                 #Write your code here

trace1=go.Histogram(x=a,xbins={'size':1})
trace2=go.Histogram(x=b,xbins={'size':1})

py.iplot([trace1,trace2],filename='Histogram')

In [None]:
#Overlay Histogram of a and b
#Increase the transparency by re-defining "opacity" parameter
#Change color by re-defining "color" parameter in "marker"
#Change the value of "barmode" parameter in layout to "overlay"

trace1=go.Histogram(x=a,xbins={'size':1},opacity=0.5,marker={'color':'blue'})
trace2=go.Histogram(x=b,xbins={'size':1},opacity=0.5,marker={'color':'red'})

layout=go.Layout(barmode='overlay')

fig=go.Figure([trace1,trace2],layout)
py.iplot(fig,filename='Histogram')

#### <font style="color: blue">Practice:</font>
---
<font style="color: blue"> Please download YouTube Popularity data from <a href='https://juniorworld.github.io/python-workshop-2018/doc/Youtube.csv'>this link</a>.
    <p>Create three histograms to visualize the distributions of views, likes, dislikes and comments. The histograms should meet the following requirements:</p>
    1. One basic histogram to show distribution of "views"
    2. One basic histogram to show distribution of "log(views)"
    3. One 100% overlay histogram to show distributions of log(likes), log(dislikes) and log(comments)
Hint: to apply a logarithmic transformation, you can use numpy's log10 function. For example, to calculate the base-10 logarithm of a variable "a":
</font>

>```python
>import numpy as np
>a=np.log10(a)
>```
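
One caveat with log transforms: zero has no logarithm (`np.log10(0)` returns `-inf` with a warning), and count data often contains zeros. The sketch below uses the standard library's `math.log10` on a made-up list to show one possible way to handle this, which is to drop non-positive values first. The numbers are hypothetical and do not come from Youtube.csv:

```python
import math

views = [0, 10, 1000, 250000]   # hypothetical view counts

# Keep only positive values: log10 is undefined for 0 and negative numbers
positive = [v for v in views if v > 0]
log_views = [math.log10(v) for v in positive]

print(log_views)
```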

In [None]:
#Write your code here






## Line Chart
In Plot.ly, a line chart is defined as a special scatter plot whose points are connected by lines.
<br>For more details: https://plot.ly/python/reference/#scatter

In [None]:
#create your first line chart
x=[1,2,3]
y=[10,22,34]
trace1=go.Scatter(x=x,y=y,mode='lines') #mode='lines','markers','lines+markers'

py.iplot([trace1],filename='line chart')

In [None]:
#add markers to it by changing mode to "lines+markers"
x=[1,2,3]
y=[10,22,34]
trace1=go.Scatter(x=x,y=y,mode='lines+markers')

py.iplot([trace1],filename='line chart')

In [None]:
#make it a dashed line by re-defining the "dash" parameters in "line"
#try other alternative shapes: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot"
x=[1,2,3]
y=[10,22,34]
trace1=go.Scatter(x=x,y=y,mode='lines+markers',line={'dash':'dash'})

py.iplot([trace1],filename='line chart')

In [None]:
#fill the area below
x=[1,2,3]
y=[-10,22,34]
trace1=go.Scatter(x=x,y=y,mode='lines',fill='tozeroy') #mode='lines'

py.iplot([trace1],filename='line chart')

In [None]:
#add another trace to it
x=[1,2,3]
y1=[10,22,34]
y2=[34,22,10]
trace1=go.Scatter(x=x,y=y1,mode='lines')
trace2=go.Scatter(x=x,y=y2,mode='lines')
py.iplot([trace1,trace2],filename='line chart')

In [None]:
#change the range of axis
x=[1,2,3]
y1=[10,22,34]
y2=[34,22,10]
trace1=go.Scatter(x=x,y=y1,mode='lines')
trace2=go.Scatter(x=x,y=y2,mode='lines')
layout=go.Layout(yaxis={'range':[0,35]},xaxis={'range':[0,3]})
fig=go.Figure([trace1,trace2],layout)
py.iplot(fig,filename='line chart')

In [None]:
#stacked line chart by re-defining "stackgroup" parameter
x=[1,2,3]
y1=[10,22,34]
y2=[34,22,10]
trace1=go.Scatter(x=x,y=y1,mode='lines',stackgroup='1')
trace2=go.Scatter(x=x,y=y2,mode='lines',stackgroup='1')

py.iplot([trace1,trace2],filename='line chart')
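
In a stacked line chart, each trace is drawn on top of the traces before it in the same stack group, so the upper line sits at the running total. A minimal sketch of that sum for the sample values above:

```python
x = [1, 2, 3]
y1 = [10, 22, 34]
y2 = [34, 22, 10]

# The second trace is drawn at y1 + y2, not at its raw values
stacked_y2 = [lower + upper for lower, upper in zip(y1, y2)]

print(stacked_y2)  # [44, 44, 44]
```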

#### <font style="color: blue">Practice:</font>
---
<font style="color: blue"> Please download stock price data from <a href='https://juniorworld.github.io/python-workshop-2018/doc/stock.csv'>this link</a>.
    <p>Create a line chart to visualize the trend of these five listed companies. The line chart should meet the following requirements:</p>
    1. Name lines after companies
</font>

In [None]:
#Write your code here




## Scatter Plot
<br>For more details: https://plot.ly/python/reference/#scatter

In [None]:
#create your first scatter plot
x=[1,2,3,4,5]
y=[10,22,34,40,50]
trace1=go.Scatter(x=x,y=y,mode='markers')

py.iplot([trace1],filename='scatter')

In [None]:
#style the markers
x=[1,2,3,4,5]
y=[10,22,34,40,50]
trace1=go.Scatter(x=x,y=y,mode='markers',marker={'size':10,'color':'red'})

py.iplot([trace1],filename='scatter')

In [None]:
#give their names by re-defining "text"
x=[1,2,3,4,5]
y=[10,22,34,40,50]
trace1=go.Scatter(x=x,y=y,mode='markers',text=['a','b','c','d','e'])

py.iplot([trace1],filename='scatter')

In [None]:
#assign different sizes and colors to markers
x=[1,2,3,4,5]
y=[10,22,34,40,50]
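sizes=[10,15,20,25,30]    #one marker size per point
colors=[1,2,3,4,5]        #one numeric color value per point
trace1=go.Scatter(x=x,y=y,mode='markers',text=['apple','banana','cabbage','duck','eggplant'],marker={'size':sizes,'color':colors})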

py.iplot([trace1],filename='scatter')

In [None]:
#assign color according to values in colorscale
#"Colorscale" options: Greys,YlGnBu,Greens,YlOrRd,Bluered,RdBu,Reds,Blues,Picnic,Rainbow,Portland,Jet,Hot,Blackbody,Earth,Electric,Viridis,Cividis
x=[1,2,3,4,5]
y=[10,22,34,40,50]
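sizes=[10,15,20,25,30]    #one marker size per point
colors=[1,2,3,4,5]        #numeric values that are mapped onto the colorscale
trace1=go.Scatter(x=x,y=y,mode='markers',text=['apple','banana','cabbage','duck','eggplant'],
                  marker={'size':sizes,'color':colors,'colorscale':'Rainbow'})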

py.iplot([trace1],filename='scatter')

In [None]:
#try plotting scatters in a 3D space
x=[1,2,3,4,5]
y=[10,22,34,40,50]
z=[2,3,4,5,6]
trace1=go.Scatter3d(x=x,y=y,z=z,mode='markers')

py.iplot([trace1],filename='scatter')

In [None]:
#Change axis titles
x=[1,2,3,4,5]
y=[10,22,34,40,50]
z=[2,3,4,5,6]
trace1=go.Scatter3d(x=x,y=y,z=z,mode='markers')
layout=go.Layout(scene={'xaxis':{'title':'length'},'yaxis':{'title':'width'},'zaxis':{'title':'height'}})
fig=go.Figure([trace1],layout)
py.iplot(fig,filename='scatter')

#### <font style="color: blue">Practice:</font>
---
<font style="color: blue"> Please download box office data from <a href='https://juniorworld.github.io/python-workshop-2018/doc/movies.csv'>this link</a>.
    <p>Create a 3D scatter plot to visualize these movies. The scatter plot should meet the following requirements:</p>
    1. X axis represents "Production Budget"
    2. Y axis represents "Box Office"
    3. Z axis represents "ROI" (Return on Investment)
    4. Size scatters according to their "IMDB Ratings"
    5. Color scatters according to their "Genre"
    6. Name scatters after movies
</font>

In [None]:
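#"colors" is assumed to be the list of genres loaded from movies.csv
colors_=[]
for color in colors:
    if color =='Comedy':
        colors_.append(1)              #Comedy is coded as 1
    else:
        colors_.append(len(color))     #other genres are coded by the length of their name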
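
The cell above codes every genre other than Comedy by the length of its name, so two genres with equally long names would share a color. As an alternative sketch (not part of the original exercise), each distinct genre can be given its own integer code. The `genres` list here is a hypothetical sample, not data from movies.csv:

```python
genres = ['Comedy', 'Action', 'Drama', 'Comedy', 'Action']  # hypothetical sample

codes = {}      # genre -> integer code, assigned in order of first appearance
colors_ = []
for genre in genres:
    if genre not in codes:
        codes[genre] = len(codes) + 1
    colors_.append(codes[genre])

print(codes)
print(colors_)
```

The resulting `colors_` list can then be passed as the marker color together with a colorscale.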

In [None]:
#Write your code here





<div class="alert alert-block alert-info">
**<b>Tips</b>** Two tools to better work with colors in Python:
    <br>1. W3S color palette: https://www.w3schools.com/colors/colors_palettes.asp
<br>2. colorlover: https://github.com/jackparmer/colorlover</div>

## Map
We will learn two types of maps: the scatter map and the filled map. A scatter map plots individual points on a geographic map, while a filled map shows the value of each region by coloring it.
<br>For more details: https://plot.ly/python/reference/#scattermapbox and https://plot.ly/python/reference/#choropleth

### 1. Scatter Map

We will rely on "mapbox", a mapping tool built into plot.ly. Mapbox is an independent IT company that develops GIS-related services. It works with plot.ly, IBM, and Google to make its tools available on their platforms. To use it, you need to sign up for a Mapbox account: https://www.mapbox.com/

In [None]:
mapbox_token='YOUR TOKEN'

We also need the Google Maps API to look up the coordinates of places. Go to the Google Cloud Platform console: https://console.cloud.google.com/google/maps-apis and enable the Places API.

In [None]:
#install googlemaps library
! pip3 install googlemaps

In [None]:
import googlemaps

place_api='YOUR TOKEN'

In [None]:
client=googlemaps.Client(key=place_api) #create a client variable with your api

In [None]:
univs=client.places('universities in hong kong') #search for some places

In [None]:
type(univs) #look into the search result. It's a dictionary.

In [None]:
univs.keys() #search results are stored with the key of "results"

In [None]:
names=[]
geos=[]
ratings=[]
for i in univs['results']: #go over every university and store its name, geolocation and rating into three blank lists respectively
    names.append(i['name'])
    loc=i['geometry']['location']
    geos.append([loc['lat'],loc['lng']]) #store as [latitude, longitude]
    ratings.append(i['rating'])
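
The loop above can be tried out without an API key. The sketch below runs the same extraction on a hand-written dictionary that contains only the fields the loop reads. The place names, coordinates and ratings are made up, and the missing rating in the second entry is an assumption about what a result without reviews might look like:

```python
# Hypothetical response with only the fields used here; all values are made up
mock_univs = {
    'results': [
        {'name': 'University A',
         'geometry': {'location': {'lat': 22.28, 'lng': 114.14}},
         'rating': 4.5},
        {'name': 'University B',
         'geometry': {'location': {'lat': 22.42, 'lng': 114.21}}},  # no rating
    ]
}

names, geos, ratings = [], [], []
for place in mock_univs['results']:
    location = place['geometry']['location']
    names.append(place['name'])
    geos.append([location['lat'], location['lng']])  # [latitude, longitude]
    ratings.append(place.get('rating', 0))           # 0 is an arbitrary fallback

print(names, geos, ratings)
```

Reading `lat` and `lng` by key avoids depending on the order of the dictionary, and `.get` keeps the loop from stopping on an entry that has no rating.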

In [None]:
#create a list of Scattermapbox objects. Each object stands for one scatter point on the map.
data=[]
for i in range(len(names)):
    trace=go.Scattermapbox(lat=[geos[i][0]],lon=[geos[i][1]],text=names[i],
                     marker={'size':ratings[i]*2})
    data.append(trace)

In [None]:
#update the layout
layout = go.Layout(
    mapbox={
           'accesstoken':mapbox_token,
           'style':'dark',
           'center':{'lat':geos[0][0],'lon':geos[0][1]},
           'zoom':10
    },
    showlegend=False
)

In [None]:
fig=go.Figure(data,layout)
py.iplot(fig,filename='map')

### 2. Filled Map
A filled map colors each region according to a statistic. The formal name for this type of map is "choropleth map".

In [None]:
import pandas as pd
freedom_table=pd.read_csv('https://juniorworld.github.io/python-workshop-2018/doc/human-freedom-index.csv')

In [None]:
freedom_table.head() #the first column, i.e. the ISO country code, can be used to create a map.

In [None]:
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries']
        
)
py.iplot([trace],filename='map')

In [None]:
#change color scale
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries'],
        colorscale='RdBu'
        
)
py.iplot([trace],filename='map')

In [None]:
#change the map design by redefining line setting in marker parameter
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries'],
        colorscale='RdBu',
        marker={'line':{'color':'white','width':0.2}}
        
)

py.iplot([trace],filename='map')

In [None]:
#remove coastlines
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries'],
        colorscale='RdBu',
        marker={'line':{'color':'white','width':0.2}}
        
)
layout=go.Layout(geo={'showcoastlines':False})
fig=go.Figure([trace],layout)
py.iplot(fig,filename='map')

In [None]:
#try other alternative types of projection in the map layout
#Alternative types: 'equirectangular', 'mercator', 'orthographic', 'natural earth', 'kavrayskiy7', 'miller', 'robinson',
#'eckert4', 'azimuthal equal area', 'azimuthal equidistant', 'conic equal area', 'conic conformal', 'conic equidistant', 
#'gnomonic', 'stereographic', 'mollweide', 'hammer', 'transverse mercator', 'albers usa', 'winkel tripel', 'aitoff', 'sinusoidal'
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries'],
        colorscale='RdBu',
        marker={'line':{'color':'white','width':0.2}}
        
)
layout=go.Layout(geo={'projection':{'type':'orthographic'}})
fig=go.Figure([trace],layout)
py.iplot(fig,filename='map')

#### <font style="color: blue">Practice:</font>
---
<font style="color: blue">Please create a world map representing the GDP values of the countries recorded in freedom_table. The map should meet the following requirements:<br>
    1. colorscale = Reds
    2. projection type: natural earth
</font>