# Path

- Paths in Python are strings. You need to quote them.
- Folder = Directory = Location
- File path = Folder path/File_name
  - File path = "C:/Users/Yuner ZHU/Downloads/text.txt"
  - Folder path = "C:/Users/Yuner ZHU/Downloads"
  - File name = "text.txt"
- Two special characters that might cause you trouble:
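  1. `\` (backslash): Python reads it as the start of an escape sequence, so replace it with `/` or double it: "Downloads`\`text.txt" -> "Downloads<font color="red">`/`</font>text.txt" or "Downloads<font color="red">``\\``</font>text.txt"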
  2. Folder name with spaces needs to be quoted in Terminal: `cd "C:/Users/Yuner ZHU/Downloads"`
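
The relationship between file path, folder path, and file name, and the backslash problem, can be sketched in Python as follows. This is a minimal illustration, assuming the standard-library `pathlib` module (not mentioned in the original notes); `PureWindowsPath` is used so that the Windows-style example path is parsed the same way on any operating system.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses Windows-style paths on any operating system.
file_path = PureWindowsPath("C:/Users/Yuner ZHU/Downloads/text.txt")

folder_path = file_path.parent   # the folder that contains the file
file_name = file_path.name       # the last component of the path

print(folder_path.as_posix())    # folder path written with forward slashes
print(file_name)

# A single backslash starts an escape sequence: "\t" is a tab character,
# so this string no longer spells the intended path.
broken = "Downloads\text.txt"
print("\t" in broken)

# Both fixes from the list above describe the same Windows path.
with_slash = PureWindowsPath("Downloads/text.txt")
with_double_backslash = PureWindowsPath("Downloads\\text.txt")
print(with_slash == with_double_backslash)
```

Using `/` everywhere is usually the simpler habit, because it needs no escaping and works in Python on both Windows and MacOS.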

- Operating systems use a hierarchical file system structure
- File paths are described using a family analogy
  - <font color="red">Root</font> folders
    - MacOS: User root folder
    - Windows: C Drive, D Drive...
      - By default, C Drive will be used in CLI. You can access another drive, e.g. D Drive, using syntax "d:"
      - <img src='https://juniorworld.github.io/python-workshop/img/Ddisk.png' width='180'>
  - <font color="red">Child</font> folders of X refer to folders that are contained within X
    - In CLI: Use "cd folder_name" to change directory to a child folder
  - <font color="red">Parent</font> folder of X refers to the folder that contains X
    - In CLI: Use "cd .." to change directory to the parent folder
    - A child folder can only have one parent folder. However, a parent folder can have several child folders.
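
The parent/child relationship can also be explored from Python. The sketch below is only an illustration, assuming the standard-library `pathlib` module; the child folder name `python-workshop` is hypothetical.

```python
from pathlib import PureWindowsPath

folder = PureWindowsPath("C:/Users/Yuner ZHU/Downloads")

# Moving up one level, like "cd .." in the CLI.
parent = folder.parent

# Moving down one level, like "cd folder_name" in the CLI.
# "python-workshop" is a hypothetical child folder.
child = folder / "python-workshop"

print(folder.drive)             # the drive the path lives on
print(parent.name)              # name of the parent folder
print(child.parent == folder)   # a child has exactly one parent
```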
   
## How to find the file path?
- MacOS:
  - Folder path: Right Click on the file or Select File Tab -> Get Info -> Where
- Windows
  - Folder path: Right Click on the file -> Properties -> Location
  - File path: Select Home Tab -> Copy Path

# Functions

<img src='https://juniorworld.github.io/python-workshop/img/week2-presidents.png'>
Presidential inauguration speeches capture the sentiment of the time.

## Practice: Inauguration Speech
Download the dataset from: https://juniorworld.github.io/python-workshop//doc/presidents.rar

<p>Expected Objectives:</p>

1. Total number of sentences in the speech: 
    - End of Sentence Marks: the period (.), the question mark (?), and the exclamation point (!).
2. Total number of words in the speech
3. Average length of sentences
4. Coleman–Liau index of Readability

<font size="4"> Coleman–Liau index</font>
><b>CLI = 0.0588 &ast; L - 0.296 &ast; S - 15.8</b>
<br>L is the average number of letters per 100 words and S is the average number of sentences per 100 words.<br>
CLI approximates the U.S. school grade level, i.e. roughly the number of years of education, that a reader needs to understand the text.
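
To make the formula concrete, here is one possible way to compute it on a short made-up sample. This is a sketch under simplifying assumptions, not a reference solution: sentences are split on runs of `.`, `?`, and `!`, words are split on whitespace, and the function name `coleman_liau` and the sample text are illustrative only.

```python
import re

def coleman_liau(text):
    """Return sentence, word, and letter counts plus the Coleman-Liau index."""
    # Split on runs of end-of-sentence marks and drop empty leftovers.
    sentences = [s for s in re.split(r"[.?!]+", text) if s.strip()]
    # Assumption: words are whatever is separated by whitespace.
    words = text.split()
    # Count alphabetic characters only (punctuation and digits are ignored).
    letters = sum(ch.isalpha() for ch in text)

    L = letters / len(words) * 100          # letters per 100 words
    S = len(sentences) / len(words) * 100   # sentences per 100 words
    return {
        "sentences": len(sentences),
        "words": len(words),
        "letters": letters,
        "avg_sentence_length": len(words) / len(sentences),
        "cli": 0.0588 * L - 0.296 * S - 15.8,
    }

sample = "We the people. We can do it! Can we?"   # made-up sample text
stats = coleman_liau(sample)
print(stats)
```

A text this short gives a meaningless (negative) index; the formula is intended for passages of at least a few hundred words, such as a full speech.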

In [None]:
presidents=['Washington','Jefferson','Lincoln','Roosevelt','Kennedy','Nixon','Reagan','Bush','Clinton','W Bush','Obama','Trump']

In [None]:
for president in presidents:
    pass  #Write down your code here

# Data Visualization

### Visualization Types

**Basic**
- Pie Chart: Compare Percentages
- Bar Chart: Compare Scores across groups
- Line Chart: Show trend of Scores
- Scatter Plot: Show Relationship between a pair of Scores

**Advanced**
- Map: Show Geo Distribution of data
- Sunburst: A special type of pie chart that shows the shares of segments at several levels. Suited to hierarchical data.
- Histogram: A special type of bar chart. Show frequency of values/value range

|Type|Variable Y|Variable X|
|:--:|:--:|:--:|
|Pie Chart|Numbers|None|
|Bar Chart|Numbers|Categories|
|Histogram|Numbers/Category Frequencies|Categories/Value Range|
|Line Chart|Numbers|Time/Date/Period|
|Scatter Plot|Numbers|Numbers|
|Map|Latitude|Longitude|

<table>
    <tr>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Pie-Chart.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Bar-Chart-Vertical.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Line-Graph.png" width="250">
        </td>
    </tr>
    <tr>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Bubble-Chart.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Dot-Density-Map.png" width="250">
        </td>
        <td><img src="https://datavizproject.com/wp-content/uploads/types/Sunburst-Diagram.png" width="250">
        </td>
    </tr>
</table>

### Visualization Components

We are going to use **Plot.ly** for data visualization.<br>
In Plot.ly, figures are built from <font color="red">tree-like</font> data structured as follows:
<img src="https://juniorworld.github.io/python-workshop/img/plotly_structure.png">
_Attributes that can be directly accessed in plot.ly are shown in bold and italics._

<font size="4"><b> Difference between name, labels, and text</b></font>

Trace (name) > Sector (labels) > Data Point (text)<br>

<img src="https://juniorworld.github.io/python-workshop/img/pie.png" width="250" align="left">

###  Install Plot.ly
If you haven't installed Plot.ly, you need to run the cell below.<br>
Installation is a one-time action: once a package is installed, you do not need to install it again.

In [None]:
! pip3 install plotly

### Plot.ly syntax
1. Use `go.Graph_type(attribute=...)` to create a trace. The commonly used attributes are listed in the tree graph above, including `values`, `x`, `y`, `name`, `labels`, and `text`. For example, to create a pie chart, you should use the following syntax (labels are the group names and values are the group sizes):
>```python
go.Pie(labels = ..., values = ...)
go.Bar(x = ..., y = ...)
go.Scatter(x = ..., y = ...)
```

2. Use `go.Layout(attribute=...)` to change figure layout. Plot.ly has set up a default layout. You don't need to run this function if you are happy with the default layout.
>```python
go.Layout(title = "blablabla", width = ..., height = ...,
             xaxis = {"title": ..., "range": [lower_limit,upper_limit]},
             yaxis = {"title": ..., "range": [lower_limit,upper_limit]})
```

3. Use `go.Figure(data = trace, layout = layout)` to create a new figure.
4. Use `.show()` method to display the figure

To use external libraries, you need to import them every time you open a new Jupyter notebook.

In [None]:
import plotly.graph_objs as go  #go: graph object

***

## Pie Chart
Reference: https://plotly.com/python/reference/pie/

In [None]:
genders = ['Female','Male']
counts = [40,20]

#create a new trace
trace1 = go.Pie(labels = genders, values = counts)

#create a new figure based on the newly created trace
figure1 = go.Figure(data = trace1)

#display the figure
figure1.show()

In [None]:
#Change color setting by re-defining "marker" attribute
#Marker attribute can be assigned with a dictionary
trace2 = go.Pie(labels = genders, values = counts, marker = {'colors':['green','yellow']})
figure2 = go.Figure(data = trace2)
figure2.show()

In [None]:
#Let's have a look at the attributes that can be styled with the marker dictionary
? go.pie.Marker

In [None]:
#change the canvas size to 400*300 by re-defining "width" and "height" in "layout", and add a title with "title"
trace3 = go.Pie(labels=genders, values=counts)
layout3 = go.Layout(width=400,height=300,title='Gender Distribution')
figure3 = go.Figure(data=trace3,layout=layout3)
figure3.show()

#### Practice 1
---
Download the Hong Kong census data about educational attainment from <a href='https://juniorworld.github.io/python-workshop/doc/Hong%20Kong%20Census%20Educational%20Attainment.csv'>https://juniorworld.github.io/python-workshop/doc/Hong%20Kong%20Census%20Educational%20Attainment.csv</a>.
    <p>Create a pie chart to visualize the percentages of different education levels in 2016. The pie chart should meet the following requirements:</p>
    1. Title: Education Attainment<br>
    2. Change slice colors

In [None]:
#Write down your code here



***

## Bar Chart
<br>For more details: https://plot.ly/python/reference/#bar

In [None]:
genders = ['Female','Male']
heights = [1.6,1.8] #average height

trace = go.Bar(x=genders,y=heights)
figure = go.Figure(data=trace)
figure.show()

In [None]:
#Change directions
trace = go.Bar(y = genders, x = heights, orientation = 'h')
figure = go.Figure(data = trace)
figure.show()

In [None]:
#Grouped bar chart
genders = ['Female','Male']
height_class1 = [1.6,1.8]
height_class2 = [1.5,1.9]

trace1 = go.Bar(x = genders, y = height_class1, name = 'class1')
trace2 = go.Bar(x = genders, y = height_class2, name = 'class2')

figure_grouped = go.Figure(data = [trace1,trace2])
figure_grouped.show()

Two ways to create a multi-trace graph:
1. In a batch: Combine all traces into a `list` and send the list to `go.Figure(data=[trace1...])` as the data input.
2. Step by step: Create an empty figure using `figure=go.Figure()` and then add traces to the figure one at a time, using `figure.add_trace()` 
   - Inside the parentheses of the `.add_trace()` method, you need to provide a trace created by `go.Graph_type()`, such as `go.Bar()` or `go.Scatter()`

In [None]:
#Another way to create a grouped bar chart
figure_grouped2=go.Figure() #create an empty figure

figure_grouped2.add_trace(go.Bar(x = genders, y = height_class1, name = 'class1'))
figure_grouped2.add_trace(go.Bar(x = genders, y = height_class2, name = 'class2'))

figure_grouped2.show()

In [None]:
#Stacked/Relative bar chart by re-defining "barmode" in layout
activities = ['study','entertainment']
mobile = [1.2,4.2]
laptop= [3.5,1.6]

trace1 = go.Bar(x = activities, y = mobile, text = mobile, name = 'mobile')
trace2 = go.Bar(x = activities, y = laptop, text = laptop, name = 'laptop')

layout_stack = go.Layout(barmode = 'stack')
figure_stack = go.Figure(data = [trace1,trace2], layout = layout_stack)

figure_stack.show()

In [None]:
#100% Stacked bar chart by re-defining "barnorm" as "fraction" in layout
layout_stack = go.Layout(barmode = 'stack', barnorm = 'fraction')
figure_stack = go.Figure(data = [trace1,trace2], layout = layout_stack)

figure_stack.show()

In [None]:
#Add percentage marks to all ticks on y axis
layout_stack = go.Layout(barmode = 'stack', barnorm = 'fraction', yaxis = {'tickformat':'.0%'})
figure_stack = go.Figure(data = [trace1,trace2], layout = layout_stack)

figure_stack.show()

#### Practice 2
---
 Read "Hong Kong Census Educational Attainment.csv".
    <p>Create a bar chart to visualize the percentages of different education levels in different years, i.e. 2006, 2011 and 2016. The bar chart should meet the following requirements:</p>
    1. Each bar represents a year<br>
    2. 100% Stacked bar chart: higher education levels stacked on top of lower ones and the bar's full length is 100%<br>

In [None]:
#Write down your code here



***

## Scatter Plot
- A scatter plot uses dots to represent values for two different variables, i.e. x and y.
- You need to set the mode to "markers", "lines", or "markers+lines"
- For more details: https://plot.ly/python/reference/#scatter

In [None]:
#create your first scatter plot
list1 = [1,2,3,4,5]
list2 = [10,22,34,40,50]

trace1 = go.Scatter(x = list1,y = list2,mode = 'markers') #mode='lines','markers','lines+markers'
figure1 = go.Figure(data = trace1)
figure1.show()

In [None]:
#try changing the mode to "lines" and "markers+lines"
trace1 = go.Scatter(x = list1,y = list2,mode = 'markers+lines') #mode='lines','markers','lines+markers'
figure1 = go.Figure(data = trace1)
figure1.show()

In [None]:
#style the markers
trace2 = go.Scatter(x = list1,y = list2, mode = 'markers', 
                    marker = {'color':'red','size':10})

figure2 = go.Figure(data = trace2)
figure2.show()

In [None]:
#assign different sizes and colors to markers
#color values do not need to be categorical colors. you can also provide numbers to set colors
trace3 = go.Scatter(x = list1,y = list2, mode = 'markers', 
                    marker = {'color':list1,'size':list2})

figure3 = go.Figure(data = trace3)
figure3.show()

In [None]:
#Add titles to X and Y axes
layout_2d = go.Layout(xaxis = {"title":"weight"}, yaxis = {"title":"height"})
figure4 = go.Figure(data = trace3, layout = layout_2d)
figure4.show()

In [None]:
#You can also create a 3D scatter plot
list3 = [2,3,4,5,6]

trace5 = go.Scatter3d(x = list1, y = list2, z = list3, mode = 'markers')
figure5 = go.Figure(data = trace5)
figure5.show()

In [None]:
#Change axis titles by referring to "scene" attribute
layout_3d = go.Layout(scene = {'xaxis':{'title':'length'},
                           'yaxis':{'title':'width'},
                           'zaxis':{'title':'height'}})
figure6 = go.Figure(data = trace5, layout = layout_3d)
figure6.show()

#### Practice 3
---
Please download box office data from <a href='https://juniorworld.github.io/python-workshop/doc/movies.csv'>https://juniorworld.github.io/python-workshop/doc/movies.csv</a>.
    <p>Create a 3D scatter plot to visualize these movies. The scatter plot should meet the following requirements:</p>
    1. X axis represents "Production Budget"<br>
    2. Y axis represents "Box Office"<br>
    3. Z axis represents "ROI" (Return on Investment)<br>
    4. Size the markers according to their "IMDB Ratings"<br>
    5. Color the markers according to their "Genre"<br>
    6. [Optional] Name the markers after the movies

In [None]:
#Write your code here



## Line Chart
In Plot.ly, a line chart is defined as **a special scatter plot** whose points are connected by lines.
<br>For more details: https://plot.ly/python/reference/#scatter

In [None]:
#create your first line chart
trace1=go.Scatter(x=list1,y=list2,mode='lines') #mode='lines','markers','lines+markers'
figure1=go.Figure(data=trace1)
figure1.show()

In [None]:
#make it a dashed line by re-defining the "dash" parameters in "line"
#Alternative shapes: "solid", "dot", "dash", "longdash", "dashdot", or "longdashdot"
trace2=go.Scatter(x=list1,y=list2,mode='lines',line={'dash':'longdashdot'})
figure2=go.Figure(data=trace2)
figure2.show()

In [None]:
#Use .add_trace() method to display two lines
figure=go.Figure()
figure.add_trace(go.Scatter(x=list1,y=list2,mode='lines'))
figure.add_trace(go.Scatter(x=list1,y=list3,mode='lines'))
figure.show()

# Quiz