<a href="https://colab.research.google.com/github/mscouse/TBS_investment_management/blob/main/PM_labs_part_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1p1Uu3gneOWUBm0do9s_VBygcwImmqSP7?usp=sharing)
# <strong> Investment Management 1</strong>
---
# <strong>Part 3: Data visualisation libraries.</strong>

In the course repository on GitHub, you will find several introductory Colab notebooks covering the following topics:

**Part 1: Introduction to Python and Google Colab notebooks.**

**Part 2: Getting started with Colab notebooks & basic features.**

**Part 3: Data visualisation libraries (CURRENT NOTEBOOK).**

**Part 4: Data sources & data collection in Python.**

**Part 5: Basic financial calculations in Python.**


The notebooks have been designed to help you get started with Python and Google Colab. See the **“1_labs_introduction”** folder for more information. Each notebook contains all necessary libraries and references to the required subsets of data.

# <strong>Data visualisation libraries</strong>

When going through the <a href="https://pypi.org/">Python Package Index</a>, you will come across dozens of data visualisation libraries for almost any discipline, from `pastalog` for real-time visualisation of neural network training to `GazeParser` for eye movement research. Inevitably, some of these libraries are more focused than others.

This section provides an overview of the most widely used Python data visualisation library, <a href="https://matplotlib.org/">`Matplotlib`</a>. Other popular interdisciplinary Python visualisation libraries are <a href="https://seaborn.pydata.org/">`Seaborn`</a> and <a href="https://plotly.com/python/">`Plotly`</a>.

## 1. Matplotlib setup

Matplotlib is one of the most popular, and arguably the most widely used, multi-platform data visualisation libraries in Python, and it is built on NumPy arrays. It can generate simple yet powerful visualisations with just a few lines of code, and it works both in interactive sessions and in non-interactive scripts. For inspiration, see <a href="https://matplotlib.org/gallery.html#statistics">examples</a> of Matplotlib visualisations. For more information on advanced plotting features, see <a href="https://python4astronomers.github.io/plotting/advanced.htm">this</a> guide.

As Matplotlib was the first widely available Python data visualisation library, many other libraries are built on top of it or designed to work with it. For instance, the plotting functions of `pandas` and `Seaborn` are essentially wrappers around the `Matplotlib` library.
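
As a rough illustration of the wrapper relationship described above, the sketch below plots a small `pandas` Series and checks what comes back. The series values are made up for the example; the point is only that `Series.plot()` returns a regular Matplotlib axes object, so Matplotlib methods can be applied to it afterwards.

```python
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical example data: four made-up prices
prices = pd.Series([100.0, 102.5, 101.0, 104.0], name="price")

# Series.plot() draws with Matplotlib and returns a Matplotlib Axes object
ax = prices.plot(title="Illustrative prices")

# because "ax" is a regular Matplotlib Axes, Matplotlib methods work on it
ax.set_xlabel("observation")
ax.set_ylabel("price")

print(type(ax))
plt.show()
```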

**Step 0 (optional): plot embedding**

In the notebook, you have the option of embedding visualisations directly in the notebook by running the following command (it needs to be done only once per kernel/session):
```
%matplotlib inline
```

After running this command, any cell within the notebook that creates a plot will embed a PNG image of the resulting visualisation.

To enable the Colab extension that renders `pandas` dataframes into interactive tables, run:
```
%load_ext google.colab.data_table
```

This also needs to be done only once per kernel/session.

In [None]:
%matplotlib inline
%load_ext google.colab.data_table

**Step 1: importing libraries**

To get the visualisation code working, we need to import the required libraries into the environment.

In [None]:
# importing required libraries and assigning them shorter aliases
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

**Step 2 (optional): setting style**

The `style` package adds support for easy-to-switch plotting "styles" so that you can customise the look of your figures. There are a number of <a href="https://github.com/matplotlib/matplotlib/tree/master/lib/matplotlib/mpl-data/stylelib">pre-defined styles</a> provided by Matplotlib. In this example, we set the `classic` style. This ensures the plots we create use the classic Matplotlib style.

In [None]:
# Setting style
plt.style.use('classic')

# To list all available styles, use:
print(plt.style.available)
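
If you only want a style for a single figure rather than the whole session, one option is the `plt.style.context()` context manager. The sketch below assumes the built-in `ggplot` style is available in your Matplotlib installation; the variable name `x_demo` is just an illustrative choice.

```python
import matplotlib.pyplot as plt
import numpy as np

x_demo = np.linspace(0, 2, 50)

# the style is applied only to figures created inside the "with" block
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot(x_demo, x_demo**2)
    ax.set_title("Drawn with the 'ggplot' style")
    plt.show()
```

Figures created after the `with` block go back to whatever style was active before it.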

**Step 3: selecting the inputs**

Matplotlib plotting functions expect a `numpy` array as input. Classes that are ‘array-like’, such as `pandas` data objects and `np.matrix`, may or may not work with Matplotlib functions as intended, so it is best to convert these to `np.array` objects prior to plotting. Another widely used plotting library, `Seaborn`, is designed to work primarily with `pandas` dataframes.

In [None]:
# Here we use the "linspace(START, STOP, NUM)" method to generate an array (x)
# of NUM evenly spaced numbers over a specified (START to STOP) interval.

x = np.linspace(0, 2, 100)
y = np.linspace(1, 2, 100)
x, y
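
When the data starts out in a `pandas` dataframe, the conversion mentioned above could look like the sketch below. The dataframe and its column names (`day`, `price`) are hypothetical and the values are made up.

```python
import numpy as np
import pandas as pd

# hypothetical dataframe with made-up values
df = pd.DataFrame({"day": [1, 2, 3, 4], "price": [10.0, 10.5, 10.2, 10.8]})

# .to_numpy() returns the column values as a NumPy array
day = df["day"].to_numpy()
price = df["price"].to_numpy()

print(type(price), price.shape)
```

The resulting `day` and `price` arrays can then be passed to Matplotlib functions such as `plt.plot(day, price)`.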

**Step 4: plotting the data and saving the figure**

The `pyplot` module from matplotlib we imported earlier is the stateful interface of matplotlib. Almost all functions in that module, such as `plt.plot()`, would either apply to an existing current figure (and its axes), or create them anew if none exist. According to the matplotlib docs:
> "...with pyplot...simple functions are used to add plot elements (lines, images, text, etc.) to the current axes in the current figure".

For more information on creating multiple subplots using matplotlib, see <a href="https://matplotlib.org/devdocs/gallery/subplots_axes_and_figures/subplots_demo.html">this</a> guide.

In [None]:
# plotting y versus x using the "matplotlib.pyplot.plot(x, y, label="label name")"
# function. "matplotlib.pyplot" in our case is shortened to "plt" - see step 1.
plt.plot(x, x, label='linear')
plt.plot(x, x**2, label='quadratic')
plt.plot(x, x**3, label='cubic')

# naming the chart axes
plt.xlabel('x label')
plt.ylabel('y label')

# naming the chart
plt.title("Simple Plot")

# displaying the legend
plt.legend()

# Matplotlib allows saving figures in a wide range of formats. Saving a figure can
# be done using the "savefig()" command. You can also adjust the resolution (dpi)
# of your saved figure. To save the figure as a PNG file, you can use the line below.
# On the left side of the Colab interface, there is a "Files/Folder" tab.
# You can find all the files you saved there.
plt.savefig('sample_figure.png', dpi=300)

# displaying the chart (optional - see step 0). "plt.show()" will display the
# current figure that you are working on
plt.show()
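
The chart above relies on the stateful `pyplot` interface. As a point of comparison, here is a minimal sketch of the same chart using Matplotlib's object-oriented interface, where the figure and axes are created explicitly with `plt.subplots()` and the plotting methods are called on the axes object. The variable name `x_oo` is chosen only to avoid overwriting `x`.

```python
import matplotlib.pyplot as plt
import numpy as np

x_oo = np.linspace(0, 2, 100)

# plt.subplots() returns the figure and its axes explicitly
fig, ax = plt.subplots()

# plotting methods are called on the axes object rather than on "plt"
ax.plot(x_oo, x_oo, label='linear')
ax.plot(x_oo, x_oo**2, label='quadratic')
ax.plot(x_oo, x_oo**3, label='cubic')

# the "set_" methods mirror plt.xlabel, plt.ylabel and plt.title
ax.set_xlabel('x label')
ax.set_ylabel('y label')
ax.set_title("Simple Plot")
ax.legend()

plt.show()
```

Keeping explicit references to `fig` and `ax` tends to be easier to follow once a notebook contains several figures or subplots.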

## 2. Popular matplotlib charts

For inspiration, see <a href="https://matplotlib.org/gallery.html#statistics">examples</a> of Matplotlib visualisations.

**Line graph**

A line graph is commonly used to show trends over time - for example, historical stock prices. 

In [None]:
# importing required libraries
import matplotlib.pyplot as plt

x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y1 = [1, 3, 5, 3, 1, 3, 5, 3, 1]
y2 = [2, 4, 6, 4, 2, 4, 6, 4, 2]
plt.plot(x1, y1, label="line L")
plt.plot(x1, y2, label="line H")

plt.xlabel("x axis")
plt.ylabel("y axis")
plt.title("Line Graph")

# displaying the legend so that the line labels set above are shown
plt.legend()

plt.show()
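
To connect the line graph to the stock price use case mentioned above, the sketch below plots a price path against dates. The data is simulated, not real market data: the start date, the return parameters, and the starting price of 100 are all arbitrary assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# simulated data only: 250 business days of made-up daily returns
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2023-01-02", periods=250, freq="B")
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=250)

# compound the returns into a price path that starts from 100
prices = 100 * np.cumprod(1 + daily_returns)

fig, ax = plt.subplots()
ax.plot(dates, prices, label="simulated price")
ax.set_xlabel("date")
ax.set_ylabel("price")
ax.set_title("Simulated Price History")
ax.legend()

# rotate the date labels so they do not overlap
fig.autofmt_xdate()
plt.show()
```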

**Bar chart**

Bar charts, also known as column charts, use vertical or horizontal bars to represent data, with categories along one axis and values along the other. Each bar represents one value.

In [None]:
import matplotlib.pyplot as plt

# Look at x positions 4 and 6, where bars from the two series overlap.
x2 = [1, 3, 4, 5, 6, 7, 9]
y3 = [4, 7, 2, 4, 7, 8, 3]

x3 = [2, 4, 6, 8, 10]
y4 = [5, 6, 2, 6, 2]

# Colors: https://matplotlib.org/api/colors_api.html
# try changing the value of "alpha=" parameter 
plt.bar(x2, y3, label="Blue Bar", color='b')
plt.bar(x3, y4, label="Green Bar", color='g', alpha=0.4)

plt.xlabel("bar number")
plt.ylabel("bar height")
plt.title("Bar Chart")
plt.legend()

plt.show()
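
When two series share the same x positions, an alternative to transparency is to place the bars side by side. The sketch below shows one common way to do this, by shifting each series half a bar width away from the shared position. The category names and values are made up.

```python
import matplotlib.pyplot as plt
import numpy as np

categories = ["A", "B", "C", "D"]  # hypothetical category names
series_1 = [4, 7, 2, 5]            # made-up values
series_2 = [5, 6, 3, 6]

positions = np.arange(len(categories))
width = 0.4

fig, ax = plt.subplots()

# shift each series by half a bar width so the bars sit next to each other
bars_1 = ax.bar(positions - width / 2, series_1, width=width, label="Series 1", color='b')
bars_2 = ax.bar(positions + width / 2, series_2, width=width, label="Series 2", color='g')

# label the shared positions with the category names
ax.set_xticks(positions)
ax.set_xticklabels(categories)
ax.set_ylabel("bar height")
ax.set_title("Grouped Bar Chart")
ax.legend()

plt.show()
```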

**Histogram**

We use histograms to summarise discrete or continuous data, such as stock returns. Histograms provide a visual interpretation of numerical data by showing the number of data points that fall within a specified range of values - these are typically called "bins". 

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Use numpy to generate an array of random data points.
# np.random.randn(A, B) - this function creates a 2D array of shape (A, B), here a
# single column (B=1) of A=1000 random values drawn from the standard normal distribution.
data = np.random.randn(1000,1)

# specify the required number of bins by changing the value of the "bins=" parameter
plt.hist(data, bins=20, color='g', alpha=0.4)
plt.title("Histogram", color='r')
plt.show()
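
To see the numbers behind a histogram, the binning can also be done without drawing anything. The sketch below uses `np.histogram`, which returns the count in each bin together with the bin edges. The sample is random data generated for the example.

```python
import numpy as np

# random sample generated for illustration
rng = np.random.default_rng(seed=1)
sample = rng.standard_normal(1000)

# counts[i] is the number of points between bin_edges[i] and bin_edges[i + 1]
counts, bin_edges = np.histogram(sample, bins=20)

print(counts)
print(bin_edges)
```

With 20 bins there are 21 edges, and the counts add up to the sample size. `plt.hist()` also returns the counts and bin edges as its first two return values.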

**Scatter plot chart**

In scatter plot charts, the values of two variables are plotted along the two axes. The pattern of the resulting points may reveal a correlation/association between the two variables (e.g. stock and market portfolio returns).

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# use numpy to generate single-column arrays of random data points (data2 and data3)
data2 = np.random.randn(1000,1)
data3 = np.random.randn(1000,1)

# for more information on the design of markers, see https://matplotlib.org/api/markers_api.html
plt.scatter(data2, data3, marker='*', color='r')

plt.title('Scatter Plot')
plt.show()
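
Two independent random samples, as used above, show no visible pattern. To illustrate what an association looks like, the sketch below builds a made-up "stock" series that partly depends on a made-up "market" series and reports their sample correlation with `np.corrcoef`. The weights 0.8 and 0.6 are arbitrary assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

# simulated returns: the stock series is partly driven by the market series
rng = np.random.default_rng(seed=2)
market_ret = rng.standard_normal(500)
noise = rng.standard_normal(500)
stock_ret = 0.8 * market_ret + 0.6 * noise

# np.corrcoef returns a 2 x 2 matrix; the off-diagonal entry is the correlation
corr = np.corrcoef(market_ret, stock_ret)[0, 1]

plt.scatter(market_ret, stock_ret, marker='.', color='r', alpha=0.5)
plt.xlabel("market return (simulated)")
plt.ylabel("stock return (simulated)")
plt.title(f"Scatter Plot, sample correlation = {corr:.2f}")
plt.show()
```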

**Pie chart**

A pie chart is a circular statistical graphic divided into slices to show percentage or proportional data. The percentage represented by each category is typically shown next to the corresponding slice of the pie.

In [None]:
import matplotlib.pyplot as plt

# introduce category names, values, and section colours to use
labels = 'Label_1', 'Label_2', 'Label_3'
category_values = [56, 66, 24]
colors = ['g', 'r', 'y']

# use the "plt.pie" function to plot the pie chart.
# The default startangle is 0, which would start the "Label_1" slice on the
# positive x-axis. This example sets startangle = 90 such that everything is
# rotated counter-clockwise by 90 degrees, and the "Label_1" slice starts on
# the positive y-axis.
plt.pie(category_values, labels=labels, colors=colors,
        startangle=90,
        explode = (0, 0.1, 0), # Try commenting this out.
        autopct = '%1.2f%%')

plt.axis('equal') # Try commenting this out.
plt.title('Pie Chart Example')

plt.show()

**Wireframe 3D Plot**

Wireframe plots are used to graphically represent skeletal sketches of functions defined over a rectangular grid.

In [None]:
# importing required libraries
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

# create and instantiate the figure object "fig"
fig = plt.figure()

# create an axes object in the figure
ax = fig.add_subplot(111, projection = '3d')

x, y, z = axes3d.get_test_data()
ax.plot_wireframe(x, y, z, rstride = 2, cstride = 2)

plt.title("Wireframe Plot Example")
plt.tight_layout()
plt.show()
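
The example above uses Matplotlib's built-in test data. To plot your own function over a rectangular grid, one approach is to build the grid with `np.meshgrid` and evaluate the function on it, as in this sketch. The function and the grid range are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions

# build a 30 x 30 rectangular grid of (x, y) coordinates
grid_x, grid_y = np.meshgrid(np.linspace(-3, 3, 30), np.linspace(-3, 3, 30))

# evaluate an example function at every grid point
grid_z = np.sin(np.sqrt(grid_x**2 + grid_y**2))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(grid_x, grid_y, grid_z, rstride=2, cstride=2)
ax.set_title("Wireframe of sin(sqrt(x^2 + y^2))")

plt.show()
```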

**Regression line plot**

A regression plot is a scatter plot with a regression line fitted to it. We'll use the `Seaborn` data visualisation library and its `regplot` function to draw a regression line plot. For more information and examples, see <a href="https://seaborn.pydata.org/generated/seaborn.regplot.html">here</a>.

In [None]:
# importing required libraries
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# generate random data points (1D arrays of 100 values each)
stock = np.random.randn(100)
market = np.random.randn(100)

# plot the regression line; "ci" stands for confidence interval and can be changed
sns.regplot(x=market, y=stock, color="r", marker="+", ci=95)
plt.show()
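
A regression plot draws the fitted line but does not report its coefficients. If you need the slope and intercept as numbers, one simple option is a degree-1 least-squares fit with `np.polyfit`, sketched below. The data is simulated, and the "true" slope of 1.2 and intercept of 0.1 are assumptions made for the example.

```python
import numpy as np

# simulated data: stock = 0.1 + 1.2 * market + noise (all values made up)
rng = np.random.default_rng(seed=3)
market_ret = rng.standard_normal(200)
stock_ret = 0.1 + 1.2 * market_ret + 0.3 * rng.standard_normal(200)

# a degree-1 fit returns the coefficients from highest power to lowest
slope, intercept = np.polyfit(market_ret, stock_ret, deg=1)

print("slope:", slope)
print("intercept:", intercept)
```

The estimates should land close to the values used to generate the data, though not exactly on them because of the random noise.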

**Heatmap**

A heatmap is a graphical representation of data in which the individual values of a matrix are represented as colours. Heatmaps are useful for presenting correlations.

In [None]:
# importing required libraries
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# create a 10 x 10 heatmap with random data
side_length = 10

# we start with a 10 x 10 matrix of random values
data = np.random.randn(side_length, side_length)

# the next two lines of code make the values larger as we get closer to (9, 9)
# "+=" adds a value to the variable and assigns the result back to that variable
data += np.arange(side_length)
data += np.reshape(np.arange(side_length), (side_length, 1))

# Generate the heatmap
sns.heatmap(data, annot=True) # remove the "annot=True" parameter and rerun
plt.show()
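
Since heatmaps are often used for correlations, here is a minimal sketch of a correlation heatmap. The three return series and their names (`asset_a`, `asset_b`, `asset_c`) are simulated and hypothetical; the first two share a common component so that they come out correlated.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# simulated returns: asset_a and asset_b share a common driver, asset_c does not
rng = np.random.default_rng(seed=4)
common = rng.standard_normal(250)
returns = pd.DataFrame({
    "asset_a": common + 0.5 * rng.standard_normal(250),
    "asset_b": common + 1.0 * rng.standard_normal(250),
    "asset_c": rng.standard_normal(250),
})

# pairwise correlations between the columns
corr_matrix = returns.corr()

# fix the colour scale to the full correlation range of -1 to 1
sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
plt.title("Correlation Heatmap (simulated data)")
plt.show()
```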

**Bubble chart**

The simplest way to plot a bubble chart is to use the `scatter` function of `matplotlib.pyplot`. We use its `s` argument to map a third numerical variable to the size of each marker, which lets us vary the size of individual points within the same chart.

In [None]:
# import required libraries
import matplotlib.pyplot as plt
import numpy as np
 
# generate random data
x = np.random.rand(40)
y = np.random.rand(40)
z = np.random.rand(40)
 
# use the scatter function
plt.scatter(x, y, s=z*1000, alpha=0.5)
plt.show()
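
A bubble chart can carry one more variable by also mapping a value to marker colour. The sketch below is one possible extension: it passes a fourth random array to the `c` argument of `plt.scatter` and adds a colour bar as a key. The variable names and the `viridis` colour map are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

# random data generated for illustration
rng = np.random.default_rng(seed=5)
bubble_x = rng.random(40)
bubble_y = rng.random(40)
bubble_size = rng.random(40)
bubble_colour = rng.random(40)

# "s" controls the marker size and "c" the marker colour
bubbles = plt.scatter(bubble_x, bubble_y, s=bubble_size * 1000,
                      c=bubble_colour, cmap='viridis', alpha=0.5)

# the colour bar shows how colours map to values
plt.colorbar(bubbles, label="fourth variable")
plt.title("Bubble Chart with Colour")
plt.show()
```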