## Data Visualization

### Visualization Types

**Basic**
- Pie Chart: Compare Percentages
- Bar Chart: Compare Scores across groups
- Line Chart: Show trend of Scores
- Scatter Plot: Show Relationship between a pair of Scores

**Advanced**
- Map: Show Geo Distribution of data
- Sunburst: A special type of pie chart that shows the shares of segments at several levels. Suited to hierarchical data.
- Histogram: A special type of bar chart that shows the frequency of values or value ranges.

|Type|Variable Y|Variable X|
|:--:|:--:|:--:|
|Pie Chart|Numbers|None|
|Bar Chart|Numbers|Categories|
|Histogram|Numbers/Category Frequencies|Categories/Value Range|
|Line Chart|Numbers|Time/Date/Period|
|Scatter Plot|Numbers|Numbers|
|Map|Latitude|Longitude|

<table>
    <tr>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Pie-Chart.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Bar-Chart-Vertical.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Line-Graph.png" width="250">
        </td>
    </tr>
    <tr>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Bubble-Chart.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Dot-Density-Map.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Sunburst-Diagram.png" width="250">
        </td>
    </tr>
</table>

### Visualization Tool

We are going to use **Plot.ly** for data visualization.<br>
In Plot.ly, figures are built from <font color="red">tree-like</font> data structured as follows:
<img src="https://juniorworld.github.io/python-workshop/img/plotly_structure.png">
_Attributes that can be directly accessed in plot.ly are shown in bold and italics._

#### Difference between name, labels, and text

Trace (name) > Sector (labels) > Data Point (text)<br>

<img src="https://juniorworld.github.io/python-workshop/img/pie.png" width="250" align="left">

###  Install Plot.ly
If you haven't installed Plot.ly yet, run the cell below.<br>
Installation is a one-time action: once a package is installed, you don't need to install it again.

In [None]:
! pip3 install plotly

### Plot.ly syntax
1. Use `go.Graph_type(attribute=...)` to create a trace. The commonly used attributes are listed in the tree graph above, including `values`, `x`, `y`, `name`, `labels`, and `text`. For example, to create a pie chart, you should use the following syntax (labels are the group names and values are the group sizes):
>```python
go.Pie(labels = ..., values = ...)
go.Bar(x = ..., y = ...)
go.Scatter(x = ..., y = ...)
```

2. Use `go.Layout(attribute=...)` to change the figure layout. Plot.ly comes with a default layout, so you don't need to call this function if you are happy with the default.
>```python
go.Layout(title = "blablabla", width = ..., height = ...,
             xaxis = {"title": ..., "range": [lower_limit,upper_limit]},
             yaxis = {"title": ..., "range": [lower_limit,upper_limit]})
```

3. Use `go.Figure(data = trace, layout = layout)` to create a new figure.
4. Use `.show()` method to display the figure

### Import Plot.ly
To use an external library, you need to import it every time you open a new Jupyter notebook.

In [None]:
import plotly.graph_objs as go  #go: graph object

***

## Pie Chart
Reference: https://plotly.com/python/reference/pie/

In [None]:
genders = ['Female','Male']
counts = [40,20]

#create a new trace
trace1 = go.Pie(labels=genders, values=counts)

#create a new figure based on the newly created trace
figure1=go.Figure(data=trace1)

#display the figure
figure1.show()

In [None]:
#change data labels by defining attribute "text"
trace2 = go.Pie(text=genders, values=counts)
figure2=go.Figure(data=trace2)
figure2.show()

In [None]:
#change color setting by re-defining "marker" attribute
#marker attribute can be assigned with a dictionary
trace3 = go.Pie(labels=genders, values=counts, marker={'colors':['green','yellow']})
figure3 = go.Figure(data=trace3)
figure3.show()

In [None]:
#Let's have a look at the attributes that can be styled with the marker dictionary of a pie chart
? go.pie.Marker

In [None]:
#turn the pie chart into a donut by re-defining "hole" parameter
trace4 = go.Pie(labels=genders, values=counts, hole=0.2, marker={'colors':['red','blue']})
figure4 = go.Figure(data=trace4)
figure4.show()

In [None]:
#change the canvas size to 400*300 and add a title by re-defining "width" and "height" in "layout"
trace5 = go.Pie(labels=genders, values=counts)
layout1 = go.Layout(width=400,height=300,title='Gender Distribution')
figure5 = go.Figure(data=trace5,layout=layout1)
figure5.show()

#### <font style="color: blue">Exercise 1:</font>
---
<font style="color: blue"> Download the Hong Kong census data about educational attainment from <a href='https://juniorworld.github.io/python-workshop/doc/Hong%20Kong%20Census%20Educational%20Attainment.csv'>https://juniorworld.github.io/python-workshop/doc/Hong%20Kong%20Census%20Educational%20Attainment.csv</a>.
    <p>Create a pie chart to visualize the percentages of different education levels in 2016. The pie chart should meet the following requirements:</p>
    1. Donut style<br>
    2. Change slice colors
</font>

In [None]:
#Write down your code here




#### <font style="color: blue">Exercise 2:</font>
---
<font style="color: blue">Read the "stack-overflow-developer-survey-2022-first1000.csv" file that you downloaded in the previous class. If you haven't downloaded it, you can get it from this link: https://juniorworld.github.io/python-workshop/doc/stack-overflow-developer-survey-2022-first1000.csv<br>
    Create a pie chart to show the percentages of respondents with different genders in this survey. Change the canvas size to 1000*600 and change its title to be "Gender Distribution of Survey Respondents".
</font>

In [None]:
#Write down your code here



In [None]:
#Show the percentages of Men and Other Genders


***

## Bar Chart
<br>For more details: https://plot.ly/python/reference/#bar

In [None]:
genders = ['Female','Male']
heights = [1.6,1.8] #average height

trace = go.Bar(x=genders,y=heights)
figure = go.Figure(data=trace)
figure.show()

In [None]:
#Grouped bar chart
genders = ['Female','Male']
height_class1 = [1.6,1.8]
height_class2 = [1.5,1.9]

trace1 = go.Bar(x=genders,y=height_class1,name='class1')
trace2 = go.Bar(x=genders,y=height_class2,name='class2')

figure_grouped = go.Figure(data=[trace1,trace2])
figure_grouped.show()

Two ways to create a multi-trace graph:
1. In a batch: Combine all traces into a `list` and send the list to `go.Figure()` as the data input.
2. Step by step: create an empty figure using `figure=go.Figure()` and then add new traces to the figure one after another, using `figure.add_trace()`
   - Inside the bracket of `.add_trace()` method, you need to provide a trace created by go.Graph_type(), such as `go.Bar()`, `go.Scatter()`

In [None]:
#Another way to create a grouped bar chart
genders = ['Female','Male']
height_class1 = [1.6,1.8]
height_class2 = [1.5,1.9]

figure_grouped2=go.Figure() #create an empty figure

figure_grouped2.add_trace(go.Bar(x=genders,y=height_class1,name='class1'))
figure_grouped2.add_trace(go.Bar(x=genders,y=height_class2,name='class2'))

figure_grouped2.show()

In [None]:
#Stacked/Relative bar chart by re-defining "barmode" in layout
activities = ['study','entertainment']
mobile = [1.2,4.2]
laptop= [3.5,1.6]

trace1 = go.Bar(x=activities,y=mobile,name='mobile')
trace2 = go.Bar(x=activities,y=laptop,name='laptop')

layout_stack = go.Layout(barmode='stack')
figure_stack = go.Figure(data=[trace1,trace2],layout=layout_stack)

figure_stack.show()

In [None]:
#How can I display the time spent on each activity as the data labels on the above graph?



In [None]:
#100% Stacked bar chart by re-defining "barnorm" as "fraction" in layout
layout_stack = go.Layout(barmode='stack',barnorm='fraction')
figure_stack = go.Figure(data=[trace1,trace2],layout=layout_stack)

figure_stack.show()

In [None]:
#Add percentage marks to all ticks on y axis
layout_stack = go.Layout(barmode='stack',barnorm='fraction',yaxis={'tickformat':'0%'})
figure_stack = go.Figure(data=[trace1,trace2],layout=layout_stack)

figure_stack.show()

#### Exercise:
---
 Read "Hong Kong Census Educational Attainment.csv".
    <p>Create a bar chart to visualize the percentages of different education levels in different years, i.e. 2006, 2011 and 2016. The bar chart should meet the following requirements:</p>
    1. Each bar represents a year<br>
    2. 100% stacked bar chart: higher education levels are stacked on top of lower ones and the full length of each bar is 100%<br>

In [None]:
#Write down your code here



***

## Break

***

## Scatter Plot
- A scatter plot uses dots to represent values for two different variables, i.e. x and y.
- You need to set the mode to "markers", "lines", or "markers+lines".
- For more details: https://plot.ly/python/reference/#scatter

In [None]:
#create your first scatter plot
list1=[1,2,3,4,5]
list2=[10,22,34,40,50]

trace1=go.Scatter(x=list1,y=list2,mode='markers') #mode='lines','markers','lines+markers'
figure1=go.Figure(data=trace1)
figure1.show()

In [None]:
#try changing the mode to "lines" and "markers+lines"



In [None]:
#style the markers
trace2=go.Scatter(x=list1,y=list2,mode='markers',marker={'color':'red','size':10})
figure2=go.Figure(data=trace2)
figure2.show()

In [None]:
#assign different sizes and colors to markers
#color values do not need to be categorical colors. you can also provide numbers to set colors
trace3=go.Scatter(x=list1,y=list2,mode='markers',marker={'color':list1,'size':list2})
figure3=go.Figure(data=trace3)
figure3.show()

In [None]:
#Add titles to X and Y axes
layout_2d=go.Layout(xaxis={"title":"weight"},yaxis={"title":"height"})
figure4=go.Figure(data=trace3,layout=layout_2d)
figure4.show()

In [None]:
#You can also create a 3D scatter plot
list1=[1,2,3,4,5]
list2=[10,22,34,40,50]
list3=[2,3,4,5,6]

trace5=go.Scatter3d(x=list1,y=list2,z=list3,mode='markers')
figure5=go.Figure(data=trace5)
figure5.show()

In [None]:
#Change axis titles by referring to "scene" attribute
layout_3d=go.Layout(scene={'xaxis':{'title':'length'},
                           'yaxis':{'title':'width'},
                           'zaxis':{'title':'height'}})
figure6=go.Figure(data=trace5,layout=layout_3d)
figure6.show()

#### <font style="color: blue">Exercise:</font>
---
<font style="color: blue"> Please download box office data from <a href='https://juniorworld.github.io/python-workshop/doc/movies.csv'>https://juniorworld.github.io/python-workshop/doc/movies.csv</a>.
    <p>Create a 3D scatter plot to visualize these movies. The scatter plot should meet the following requirements:</p>
    1. X axis represents "Production Budget"<br>
    2. Y axis represents "Box Office"<br>
    3. Z axis represents "ROI" (Return on Investment)<br>
    4. Size scatters according to their "IMDB Ratings"<br>
    5. Color scatters according to their "Genre"<br>
    6. [Optional] Name scatters after movies
</font>

In [None]:
#Write your code here



## Line Chart
In Plot.ly, a line chart is defined as **a special scatter plot** whose points are connected by lines.
<br>For more details: https://plot.ly/python/reference/#scatter

In [None]:
#create your first line chart
trace1=go.Scatter(x=list1,y=list2,mode='lines') #mode='lines','markers','lines+markers'
figure1=go.Figure(data=trace1)
figure1.show()

In [None]:
#make it a dashed line by re-defining the "dash" parameters in "line"
#Alternative shapes: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot"
trace2=go.Scatter(x=list1,y=list2,mode='lines',line={'dash':'dash'})
figure2=go.Figure(data=trace2)
figure2.show()

In [None]:
#fill the area between line and X axis
list1=[1,2,3]
list3=[-10,22,34]
trace3=go.Scatter(x=list1,y=list3,mode='lines',fill='tozeroy')
figure3=go.Figure(data=trace3)
figure3.show()

In [None]:
#display two lines
#line1: x=list1, y=list4
#line2: x=list1, y=list5
list1=[1,2,3]
list4=[10,22,34]
list5=[34,22,10]



#### <font style="color: blue">Exercise:</font>
---
<font style="color: blue"> Please download stock price data from <a href='https://juniorworld.github.io/python-workshop/doc/stock.csv'>https://juniorworld.github.io/python-workshop/doc/stock.csv</a>.<br>
    Create a figure with five lines showing the price trends of the five listed companies. Name each line after its company.
</font>

In [None]:
#Write your code here




***

## Break

***

## Map
We will learn two types of maps: scatter maps and filled maps. A scatter map shows individual points on a geographic map, while a filled map shows the value of each region by changing its color.
<br>

Plotly supports two different kinds of maps:

- Mapbox maps are tile-based maps. There are a variety of fancy maps that you can project your data on.
- Geo maps are outline-based maps. Basic maps.

For more details: https://plot.ly/python/reference/#scattermapbox and https://plot.ly/python/reference/#choropleth

### 1. Scatter Map

We will rely on a built-in tool in plot.ly, named "mapbox". Mapbox is an independent IT company that develops GIS-related services. It works with plot.ly, IBM, and Google to make its tools available on their platforms. To use its service, you need to register for a free account at https://mapbox.com/ and obtain a Mapbox access token.

Please read this table: <a href='https://juniorworld.github.io/python-workshop/doc/china_cities.csv'>https://juniorworld.github.io/python-workshop/doc/china_cities.csv</a><br>
This table contains 170 cities with more than 3 million residents.

In [None]:
mapbox_token='fill in your token here'

In [None]:
import pandas as pd  #needed for read_csv
city_table=pd.read_csv("https://juniorworld.github.io/python-workshop/doc/china_cities.csv")

In [None]:
trace=go.Scattermapbox(lat=city_table['lat'],
                       lon=city_table['lng'],
                       text=city_table['city'],
                       marker={'size':city_table['population']})

layout_mapbox = go.Layout(
    mapbox={
           'accesstoken':mapbox_token,
           'style':'dark', #basic, streets, outdoors, light, dark, satellite, satellite-streets
           'center':{'lat':city_table['lat'].iloc[0],'lon':city_table['lng'].iloc[0]},
           'zoom':3
    },
    showlegend=False
)
figure=go.Figure(data=trace,layout=layout_mapbox)
figure.show()

#### Exercise
Read this table of HK bus stops: <a href='https://juniorworld.github.io/python-workshop/doc/HK_Bus_Stops.csv'>https://juniorworld.github.io/python-workshop/doc/HK_Bus_Stops.csv</a><br>
1. Create a Mapbox scatter plot of HK bus stops. 
2. Resize the dots to one-fifth of their bus fares, e.g. if the bus fare is 20, the dot size is 4
3. Center the map on the **11th** stop in the table.
4. Change the zoom ratio to 10

In [None]:
#Write your code here



### 2. Filled Map
A filled map colors each region according to a statistic. This type of map is formally called a "choropleth map".<br>
Please read the human freedom index table from: https://juniorworld.github.io/python-workshop/doc/human-freedom-index.csv

In [None]:
freedom_table=pd.read_csv('https://juniorworld.github.io/python-workshop/doc/human-freedom-index.csv')

In [None]:
freedom_table.head() #the first column, i.e. the ISO country code, can be used to create a map.

In [None]:
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries']
        
)
figure=go.Figure(data=trace)
figure.show()

In [None]:
#change color scale
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries'],
        colorscale='RdBu'
        
)
figure=go.Figure(data=trace)
figure.show()

In [None]:
#try other alternative types of projection in the map layout
#Alternative types: 'equirectangular', 'mercator', 'orthographic', 'natural earth', 'kavrayskiy7', 'miller', 'robinson',
#'eckert4', 'azimuthal equal area', 'azimuthal equidistant', 'conic equal area', 'conic conformal', 'conic equidistant', 
#'gnomonic', 'stereographic', 'mollweide', 'hammer', 'transverse mercator', 'albers usa', 'winkel tripel', 'aitoff', 'sinusoidal'
trace=go.Choropleth(
        locations=freedom_table['ISO_code'],
        z=freedom_table['human freedom'],
        text=freedom_table['countries'],
        colorscale='RdBu',
        marker={'line':{'color':'white','width':0.2}}
        
)
layout_alternative=go.Layout(geo={'projection':{'type':'orthographic'}})
figure=go.Figure(data=trace,layout=layout_alternative)
figure.show()

#### <font style="color: blue">Practice:</font>
---
<font style="color: blue">Please create a world map representing the GDP values of the countries recorded in freedom_table. The map should meet the following requirements:<br>
    1. colorscale = Reds<br>
    2. projection type: natural earth
</font>

In [None]:
#Write your code here





## Histogram
A histogram is a special type of bar chart in which the y value of each bar is a count. It is used to show the distribution of data, i.e. to visualize its skewness and central tendency.
<br>For more details: https://plot.ly/python/reference/#histogram

In [None]:
a=[1,2,3,3,4,4,4,5,5,6,7,3,3,2]
trace=go.Histogram(x=a)
figure=go.Figure(data=trace)
figure.show()

In [None]:
#Change the bins by re-defining "size" parameter in xbins
trace2=go.Histogram(x=a,xbins={'size':1})
figure2=go.Figure(data=trace2)
figure2.show()

In [None]:
#Convert into a 100% histogram whose y value is the share of observations in each bin
#Re-define "histnorm" as "probability" and format the y-axis ticks as percentages
trace3=go.Histogram(x=a,xbins={'size':1},histnorm='probability')
layout_percent=go.Layout(yaxis={'tickformat':'0%'})
figure=go.Figure(data=trace3,layout=layout_percent)
figure.show()

In [None]:
#Decrease every element in "a" by one unit to create a new list "b"
#Grouped Histogram
a=[1,2,3,3,4,4,4,5,5,6,7,3,3,2]
b=[0,1,2,2,3,3,3,4,4,5,6,2,2,1]

trace1=go.Histogram(x=a,xbins={'size':1})
trace2=go.Histogram(x=b,xbins={'size':1})

figure=go.Figure(data=[trace1,trace2])
figure.show()

In [None]:
#Overlay Histogram of a and b
#Increase the transparency by re-defining "opacity" parameter
#Change color by re-defining "color" parameter in "marker"
#Change the value of "barmode" parameter in layout to "overlay"

trace1=go.Histogram(x=a,xbins={'size':1},opacity=0.5,marker={'color':'blue'})
trace2=go.Histogram(x=b,xbins={'size':1},opacity=0.5,marker={'color':'red'})

layout=go.Layout(barmode='overlay')

figure=go.Figure(data=[trace1,trace2],layout=layout)
figure.show()

#### Exercise:
---
Please download YouTube Popularity data from <a href='https://juniorworld.github.io/python-workshop/doc/Youtube.csv'>https://juniorworld.github.io/python-workshop/doc/Youtube.csv</a>.<br>
Create three histograms to visualize the distributions of views, likes, dislikes and comments. The histograms should meet the following requirements:<br>
    1. One basic histogram to show distribution of "views"<br>
    2. One basic histogram to show distribution of "log(views)"<br>
    3. One 100% overlay histogram to show distributions of log(likes), log(dislikes) and log(comments)<br>

_Hint: to apply a logarithmic transformation, you can use numpy's log10 function. For example, to calculate the logarithm of a variable "a":_
>```python
import numpy as np
a=np.log10(a)
```
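
One thing to watch for with the hint above: the base-10 logarithm of 0 is negative infinity, so zero counts are usually filtered out before the transformation. The sketch below uses a made-up list named `views` rather than the real YouTube data, and dropping the zeros is just one possible choice.

```python
import numpy as np

# Hypothetical sample values; the real data would come from the CSV file
views = [0, 10, 100, 1000]

# Keep only positive values, because log10(0) is -inf
positive_views = [v for v in views if v > 0]

# Apply the base-10 logarithm to the remaining values
log_views = np.log10(positive_views)
```

The resulting array can be passed directly to `go.Histogram(x=log_views)`.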

In [None]:
#Write your code here



<div class="alert alert-block alert-info">
<b>Tips:</b> Two tools to better work with colors in Python:
    <br>1. W3S color palette: https://www.w3schools.com/colors/colors_palettes.asp
<br>2. colorlover: https://github.com/jackparmer/colorlover</div>