# 3. Visualization - extra

## 3.2 Matplotlib
Matplotlib is a Python 2D plotting library that produces publication-quality figures. Although Matplotlib is written primarily in pure Python, it makes heavy use of NumPy and other extension code to provide good performance even for large arrays. Matplotlib is probably the most commonly used Python visualization library.

An overview of the possible plots is available on the [tutorials website](https://matplotlib.org/tutorials/introductory/sample_plots.html). Figures rendered inline in a notebook are static; for interactive figures there is an alternative, [ipympl](https://github.com/matplotlib/ipympl).

We will start with the basic concepts: figures, subplots (axes) and axis. The following line of code makes the figures appear directly in the notebook output:

In [None]:
%matplotlib inline

`matplotlib.pyplot` is a collection of command style functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc. In matplotlib.pyplot various states are preserved across function calls, so that it keeps track of things like the current figure and plotting area, and the plotting functions are directed to the current subplot.

The first thing to do is to import the library.

In [None]:
import matplotlib.pyplot as plt
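
The state tracking described above can be observed directly. The following is a minimal sketch, not part of the original notebook, that uses `plt.gcf()` to ask pyplot which figure is current; the variable names are chosen for illustration only.

```python
import matplotlib.pyplot as plt

fig1 = plt.figure()
fig2 = plt.figure()               # the most recently created figure becomes current
current_is_fig2 = plt.gcf() is fig2

plt.plot([1, 2, 3])               # drawn on fig2; an Axes is created implicitly
n_axes_fig1 = len(fig1.axes)      # fig1 was never drawn on
n_axes_fig2 = len(fig2.axes)

plt.figure(fig1.number)           # make fig1 the current figure again
back_on_fig1 = plt.gcf() is fig1

plt.close("all")                  # discard the demo figures
```

Every `plt.` call acts on whatever figure and subplot are current at that moment, which is convenient for short cells but easy to lose track of in longer code.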

### 3.2.1 Matplotlib syntax for plots

In [None]:
plt.plot([1, 2, 3, 2.5])
plt.ylabel('some numbers')

`plot()` is a versatile command, and will take an arbitrary number of arguments. For example, to plot x versus y, you can issue the command:

In [None]:
# x values: the integers from 1 to 9
x_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# y values: the squares of the x values
y_list = [1, 4, 9, 16, 25, 36, 49, 64, 81]

In [None]:
plt.plot(x_list, y_list)
plt.title("Title of the plot")

Using the pyplot interface, you build a graph by calling a sequence of functions, all of which are applied to the *current subplot*, like so:

In [None]:
plt.plot([1, 2, 3, 4], [10, 20, 25, 30], color='lightblue', linewidth=3)
plt.scatter([0.3, 3.8, 1.2, 2.5], [11, 25, 9, 26], color='darkgreen', marker='^')
plt.xlim(0.5, 4.5)
plt.title("Title of the plot")
plt.xlabel("This is the x-label")
plt.ylabel("This is the y-label")
# Uncomment the line below to save the figure in your current directory
# plt.savefig('examplefigure.png')

When working with just one subplot in the figure, it is generally fine to use the pyplot interface. However, for more complicated plots, or inside larger scripts, you will want to explicitly pass around the *Subplot (Axes)* and/or *Figure* object to operate upon.
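
As a minimal sketch of that explicit style, the example below creates a Figure with two Axes via `plt.subplots` and passes each Axes to a helper function. The helper `draw_squares` is a hypothetical name introduced only for this illustration.

```python
import matplotlib.pyplot as plt

def draw_squares(ax, n):
    """Plot k**2 for k = 1..n on the given Axes and label it."""
    xs = list(range(1, n + 1))
    ax.plot(xs, [k ** 2 for k in xs])
    ax.set_title(f"Squares up to {n}")
    ax.set_xlabel("k")
    return ax

# One Figure holding two side-by-side Axes
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))
draw_squares(ax_left, 5)
draw_squares(ax_right, 9)
fig.tight_layout()
```

Because the helper receives the Axes as an argument, it does not depend on which subplot happens to be current, so it can be reused for any panel of any figure.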


### 3.2.2 Barplots in matplotlib
Barplots are made with the `plt.bar` function:

In [None]:
# height of bars are nucleotide percentages of data/gene.fa: [A_perc, C_perc, G_perc, T_perc]
height = [17.627944760357433, 33.22502030869212, 30.300568643379368, 18.846466287571083]
# Names of bars
bars = ('A','C','G','T')
# making a barplot
plt.bar(bars, height)
# adding labels: xlabel, ylabel and title
plt.xlabel('Nucleotide')
plt.ylabel('Percentage of occurrence (%)')
plt.title('Distribution of nucleotides in fasta sequence')
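
The bar heights above are given as precomputed percentages. As a rough sketch of how such percentages could be derived, the function below counts nucleotides in a sequence string. The function name and the short example sequence are made up for illustration; reading `data/gene.fa` itself is not shown.

```python
from collections import Counter

def nucleotide_percentages(sequence):
    """Return the percentages of A, C, G and T (in that order) in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[n] for n in "ACGT")   # ignore any other characters
    return [100 * counts[n] / total for n in "ACGT"]

# Made-up example sequence, not the content of data/gene.fa
example_sequence = "ACGTACGGCC"
percentages = nucleotide_percentages(example_sequence)
```

The resulting list can be passed to `plt.bar` as the bar heights, in the same `A, C, G, T` order as the bar names.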

In [None]:
# height of bars are nucleotide percentages of data/gene.fa: [A_perc, C_perc, G_perc, T_perc]
height = [17.627944760357433, 33.22502030869212, 30.300568643379368, 18.846466287571083]
# Names of bars
bars = ('A','C','G','T')
#plt.bar(bars, height, color=('green','red','yellow','blue'))
plt.bar(bars, height, color=('#1f77b4','#ff7f0e','#2ca02c','#d62728'))

plt.xlabel('Nucleotide')
plt.ylabel('Percentage of occurrence (%)')
plt.title('Distribution of nucleotides in fasta sequence')

## 3.4 Plot.ly

## 3.5 Bokeh

*Statistical and novel interactive html plots for python.* 

Bokeh creates interactive and non-interactive graphs that can be displayed in modern web browsers. Plots can also be saved as images. When these notes were written (2016), Bokeh was not yet as mature as matplotlib, but it already offered many interesting features.

*Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.*


|    Question   | Answer |
| ---      | --- |
| Installed with Python by default  | No |
| Installed with Anaconda   | Possible  |
| How to install it?    | `pip install bokeh`  | 

Here is a simple first example. First we'll import the `figure` function from `bokeh.plotting`, which will let us create all sorts of interesting plots easily. We also import the `show` and `output_notebook` functions from `bokeh.io`; these let us display our results inline in the notebook.

In [None]:
from bokeh.plotting import figure
from bokeh.io import output_notebook, show

Next, we'll tell Bokeh to display its plots directly in the notebook. This will cause all of the JavaScript and data to be embedded directly into the HTML of the notebook itself. (Bokeh can also output straight to HTML files, or use a server, which we'll look at later.)



In [None]:
output_notebook()

Create some data

In [None]:
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

First we call Bokeh's `figure` function to create a plot `p`. Then we call its `line()` method to render a line plot.

In [None]:
p = figure(width=200, height=200)
p.line(x, y)
show(p)

We can immediately interact with the plot:

- click-drag will pan the plot around.
- mousewheel will zoom in and out (after enabling in the toolbar)
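
The toolbar can also be configured when the figure is created. The sketch below assumes a recent Bokeh version and uses the `tools` and `active_scroll` arguments of `figure` so that wheel zoom is already active without clicking it in the toolbar. The choice of tools is only an example.

```python
from bokeh.plotting import figure
from bokeh.models import WheelZoomTool

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Restrict the toolbar to three tools and activate wheel zoom from the start
p = figure(width=200, height=200,
           tools="pan,wheel_zoom,reset",
           active_scroll="wheel_zoom")
p.line(x, y)
# In a notebook, display the plot with show(p) after calling output_notebook()
```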

Now we'll add a title and axis labels to the figure:

In [None]:
p = figure(width=300, height=300, title="simple line example", x_axis_label='X-axis', y_axis_label='Y-axis')
p.line(x, y,  line_width=2)
show(p)

In [None]:
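import numpy as np

from bokeh.plotting import figure, output_file, show
# Assumed source of `iris`: Bokeh's sample data. Depending on the Bokeh version it may first
# need `bokeh.sampledata.download()` or the separate `bokeh_sampledata` package
from bokeh.sampledata.iris import flowers as iris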

In [None]:
colormap = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'blue'}
colors = [colormap[x] for x in iris['species']]

p = figure(title = "Iris Morphology")
p.xaxis.axis_label = 'Petal Length'
p.yaxis.axis_label = 'Petal Width'

p.circle(iris["petal_length"], iris["petal_width"],
         color=colors, fill_alpha=0.2, size=10)

output_file("iris.html", title="iris.py example")

show(p)

## Some tests with COVID data

Using the Bokeh library to plot the confirmed COVID-19 cases per municipality in Belgium on a map.
https://www.youtube.com/watch?v=P60qokxPPZc

In [None]:
import os
import pandas as pd

from bokeh.io import show, output_file
from bokeh.models import (GMapPlot, GMapOptions, ColumnDataSource, Circle, Range1d, PanTool, WheelZoomTool, BoxSelectTool)

In [None]:
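# Read in the csv file; latin-1 encoding is needed, otherwise byte 0xe9 cannot be decoded
df0 = pd.read_csv('data/COVID19BE_CASES_MUNI.csv', encoding='latin-1')
# Copy the relevant columns so that later assignments do not act on a view of df0
df = df0[['DATE', 'TX_DESCR_NL', 'CASES']].copy()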
df.head()

Change the values `'<5'` to an integer value of `1`. It would be better to categorize this data into bins: all values below 5 as 0-4, then 5-9, 10-14, etc.
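
The binning idea mentioned above could look roughly like the following sketch, which uses `pd.cut` on a small made-up series instead of the real `CASES` column. Mapping `'<5'` to `0` so that it lands in the first bin is an assumption made for this illustration.

```python
import pandas as pd

# Made-up sample that mimics the string values in the CASES column
cases = pd.Series(['<5', '7', '12', '<5', '23'])

# Assumption: '<5' is mapped to 0 so that it falls into the 0-4 bin
numeric = pd.to_numeric(cases.replace('<5', '0'))

# Bin edges 0, 5, 10, ... up to the first multiple of 5 above the maximum
upper = (int(numeric.max()) // 5 + 1) * 5
edges = list(range(0, upper + 1, 5))
labels = [f"{low}-{low + 4}" for low in edges[:-1]]

# right=False makes each bin include its lower edge: [0, 5), [5, 10), ...
binned = pd.cut(numeric, bins=edges, labels=labels, right=False)
```

The result is a categorical series, which keeps the bins in their natural order when they are later used for grouping or for a color scale.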

In [None]:
df['CASES'][0]

In [None]:
df.loc[(df.CASES == '<5'),'CASES']='1'
df.head()

In [None]:
print(df['CASES'][0])
print(type(df['CASES'][0]))

In [None]:
df['CASES'] = pd.to_numeric(df['CASES'])
type(df['CASES'][0])

Map options:
- `lat`: latitude of the map center (found with Google Map Search: Belgium)
- `lng`: longitude of the map center (found with Google Map Search: Belgium)
- `map_type`: available options [here](https://developers.google.com/maps/documentation/javascript/reference#MapTypeId)
- `zoom`: zoom level, found by trial and error

In [None]:
map_options = GMapOptions(lat=50.4974445, lng=3.451696, map_type='roadmap', zoom=3)

Create a Google Maps API key. Normally you should export it to your environment rather than writing it in the notebook. Make sure to keep it safe. More information: [here](https://jupyter-gmaps.readthedocs.io/en/latest/authentication.html)

In [None]:
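# Read the key from the environment instead of hard-coding it (the variable name GOOGLE_API_KEY is an assumption)
api_key = os.environ['GOOGLE_API_KEY']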

In [None]:
plot = GMapPlot(x_range=Range1d(),
                y_range=Range1d(),
                map_options=map_options,
                api_key=api_key)
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())

Now we have municipalities, but we still need their coordinates. We will look these up with pgeocode.

In [None]:
#pip install pgeocode

In [None]:
import pgeocode

Check that pgeocode returns a dataframe when queried by postal code or by place name:

In [None]:
nomi = pgeocode.Nominatim('be')
nomi.query_postal_code(["3040", "9000"])

In [None]:
nomi = pgeocode.Nominatim('be')
nomi.query_location('Gent')

In [None]:
df_test = df
df_test
df_test.head()

In [None]:
village = nomi.query_location('Huldenberg')
village

In [None]:
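def get_coordinates(municipality, pgeocode_df):
    # Assumed intent: (latitude, longitude) of the first row whose place_name matches, else (None, None)
    match = pgeocode_df.loc[pgeocode_df['place_name'] == municipality, ['latitude', 'longitude']]
    return tuple(match.iloc[0]) if not match.empty else (None, None)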
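
One possible way to attach coordinates to every municipality at once is a join on the place name. The sketch below is only an illustration under stated assumptions: the two small dataframes are stand-ins with approximate coordinates, where `lookup` mimics the `place_name`, `latitude` and `longitude` columns that pgeocode returns, and `'Onbekend'` is a made-up name without a match.

```python
import pandas as pd

# Stand-in for the case counts per municipality
cases = pd.DataFrame({
    'TX_DESCR_NL': ['Gent', 'Huldenberg', 'Onbekend'],
    'CASES': [12, 1, 3],
})
# Stand-in for pgeocode results; coordinates are approximate and illustrative
lookup = pd.DataFrame({
    'place_name': ['Gent', 'Huldenberg'],
    'latitude': [51.05, 50.79],
    'longitude': [3.72, 4.58],
})

# A left join keeps every municipality; unmatched names get NaN coordinates
merged = cases.merge(lookup, how='left', left_on='TX_DESCR_NL', right_on='place_name')
merged = merged.rename(columns={'latitude': 'lat', 'longitude': 'long'}).drop(columns='place_name')

# Rows without coordinates cannot be drawn on the map
plottable = merged.dropna(subset=['lat', 'long'])
```

A left join makes the unmatched municipalities visible as missing values, which is useful for spotting names that are spelled differently in the two sources.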

In [None]:
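# Assumes df already has 'lat' and 'long' columns; they are not created in the cells above
baseline = df['CASES'].min()
scale = 2.5
# Circle radius grows with the number of cases above the minimum
source = ColumnDataSource(data=dict(lat=df['lat'].tolist(),
                                    lon=df['long'].tolist(),
                                    rad=[(i - baseline) / scale for i in df['CASES'].tolist()]))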