# Python and Data Visualization - Part 3

**Goal:** The goal of this project is to construct scatter plots and line charts in Python using Bokeh.

**Description:** We'll work on a few charts using both quantitative and time-series data. We will start by plotting simple mathematical functions to learn the basics. We'll then move to plotting time-series financial data from the stock market.

## 3A: Basic Scatter/Line Charts

### Data Preparation

When dealing with a line or scatter plot, we typically have two pieces of data we want to visualize.
 - *X-axis*: Also called the domain or independent variable, the x-axis represents the input
 - *Y-axis*: Also called the range or dependent variable, the y-axis represents the output
 
 
Together, these pieces of data represent a function. For the following pairs of data, which would be the x and y axes? Why?
 - Quantity and Price
 - Age and Height
 - Date and Temperature
 
 
The following lines of code illustrate the concept of inputs, outputs, and functions.

In [1]:
upper_bound = 21
# Create a list called x, containing the values 0 to upper_bound-1
x = [i for i in range(upper_bound)]
print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]


In [2]:
# Create a list called y, following the function y = x^2
y = [num**2 for num in x] # Apply num**2 (num^2) for every num in x, and return as list
print(y)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]


We haven't plotted it yet, but we now have the data to represent a quadratic function.
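The input, function, output pattern above can be wrapped in a small helper. This is only a sketch: `make_series` is a hypothetical name introduced for illustration, not part of Bokeh or the workshop files.

```python
def make_series(func, upper_bound):
    """Return (x, y) lists where each y value is func applied to the matching x value."""
    # make_series is a hypothetical helper, introduced only for illustration
    x_values = list(range(upper_bound))         # Inputs: 0 to upper_bound-1
    y_values = [func(num) for num in x_values]  # Outputs: the function applied to each input
    return x_values, y_values

x_demo, y_demo = make_series(lambda num: num**2, 5)
print(x_demo)  # [0, 1, 2, 3, 4]
print(y_demo)  # [0, 1, 4, 9, 16]
```

Passing a different function, such as `lambda num: -num**2 + 2*num + 4`, produces a new set of outputs for the same inputs.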

### Bokeh Setup

Similar to the last workshop, we will be using the Bokeh library. We need to `import` it here.

In [3]:
from bokeh.plotting import figure, show 
from bokeh.io import output_notebook

output_notebook() # Tells Python to present Bokeh plots within the notebook itself

### Plotting the Data

Now let's make our plot. We first need to create an empty figure.

In [4]:
scatter = figure(title="Quadratic Function", x_axis_label="X-axis Data", y_axis_label="Y-axis Data")

Next, we plot our x and y data. Bokeh plots *glyphs* on empty figures to create the graph. In this case, we want circles for our scatter plot.

In [5]:
scatter.circle(x=x, y=y) # X data should be our x array, y data should be our y array
show(scatter)            # Display the graph

### More Fun with Functions

We have successfully plotted our quadratic function. Let's try changing our function and creating a new chart.

In [6]:
y = [(-i**2 + 2*i + 4) for i in x] # Sets our y-axis data to the formula -i^2 + 2*i + 4

scatter2 = figure(title="Quadratic Function", x_axis_label="X-axis Data", y_axis_label="Y-axis Data")
scatter2.circle(x=x, y=y)
show(scatter2) # Display the plot

To plot multiple series of data, we can simply add another `circle` glyph to our chart. The `color` argument specifies the color of each series, and `legend_label` gives each series an entry in the figure's legend.

In [7]:
y_squared = [i**2 for i in x]
y_squared_negative = [-i**2 for i in x]

scatter3 = figure(title="Plotting Multiple Series", x_axis_label="X-axis Data", y_axis_label="Y-axis Data") # Create an empty figure
scatter3.circle(x=x, y=y_squared, color="#d83939", legend_label="y = x^2") # Plot the red series of dots
scatter3.circle(x=x, y=y_squared_negative, color="#2785db", legend_label="y = -(x^2)") # Plot the blue series of dots
show(scatter3) # Display the plot
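With two series on one figure, the legend can end up covering some of the points. The following is a minimal sketch, assuming a reasonably recent Bokeh release, that moves the legend and lets the reader toggle a series by clicking its legend entry. The names `x_demo` and `legend_demo` are placeholders, and `scatter` is used here in place of `circle` (it draws circle markers by default).

```python
from bokeh.plotting import figure

x_demo = list(range(21))  # Same inputs as the cells above: 0 to 20

legend_demo = figure(title="Legend Options", x_axis_label="X-axis Data", y_axis_label="Y-axis Data")
legend_demo.scatter(x=x_demo, y=[i**2 for i in x_demo], color="#d83939", legend_label="y = x^2")
legend_demo.scatter(x=x_demo, y=[-i**2 for i in x_demo], color="#2785db", legend_label="y = -(x^2)")

legend_demo.legend.location = "top_left"  # Move the legend out of the top-right corner
legend_demo.legend.click_policy = "hide"  # Clicking a legend entry hides or shows that series
# show(legend_demo)  # Uncomment in the notebook to display the plot
```

The legend only exists once at least one glyph has a `legend_label`, so these settings go after the glyph calls.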


Finally, glyphs can be lines too. Instead of calling `circle`, we call `line`. Circles are useful when dealing with *discrete* data, and lines are useful for *continuous* data.

In [8]:
scatter4 = figure(title="Plotting Multiple Series", x_axis_label="X-axis Data", y_axis_label="Y-axis Data") # Create new figure
scatter4.line(x=x, y=y_squared, color="#d83939", legend_label="y = x^2") # Plot the red line
scatter4.line(x=x, y=y_squared_negative, color="#2785db", legend_label="y = -(x^2)") # Plot the blue line
show(scatter4) # Display the plot
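Lines and markers can also share one figure, which shows where the sampled points sit along the curve. This is a short sketch under the same assumptions as the cells above; `combined`, `x_demo`, and `y_demo` are illustrative names, and `scatter` stands in for `circle`.

```python
from bokeh.plotting import figure

x_demo = list(range(21))
y_demo = [i**2 for i in x_demo]

combined = figure(title="Line with Markers", x_axis_label="X-axis Data", y_axis_label="Y-axis Data")
combined.line(x=x_demo, y=y_demo, color="#d83939", legend_label="y = x^2")             # Continuous curve
combined.scatter(x=x_demo, y=y_demo, size=6, color="#d83939", legend_label="y = x^2")  # Sampled points
# show(combined)  # Uncomment in the notebook to display the plot
```

Because both glyphs use the same `legend_label`, they share a single legend entry.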

## 3B: Time-series Data

*Time-series data* refers to data that changes with time. The x-axis will always be some measure of time, such as seconds, days, months, or years. The y-axis will be observations recorded at periodic time intervals, such as temperature, population, or GDP. Try thinking of a few more examples of time-series data.
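As a small illustration of this definition, the sketch below builds a time series by hand with one observation per day. The temperature values are made up purely for demonstration and do not come from any dataset.

```python
from datetime import date, timedelta

start = date(2020, 6, 8)
days = [start + timedelta(days=offset) for offset in range(5)]  # X-axis: five consecutive days
temperatures = [21.5, 23.0, 22.1, 19.8, 20.4]                   # Y-axis: made-up readings, one per day

for day, temperature in zip(days, temperatures):
    print(day.isoformat(), temperature)
```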

### Financial Stock Data

This project will work with financial stock data. First, we will analyze Tesla (TSLA) prices from 2010-06-29 to 2020-06-12. The data is in the source file `TSLA.csv`, taken from https://www.macrotrends.net/stocks/charts/TSLA/tesla/stock-price-history.

Open `TSLA.csv` in a new window. Each column represents a certain observation for a specific day:
 - *date*: Our x-axis, the day on which the observation was made
 - *open*: The price of the stock at the start of each day
 - *high*: The highest price the stock reached each day
 - *low*: The lowest price the stock reached each day
 - *close*: The price of the stock at the end of each day
 - *volume*: The number of shares traded each day

Let's use `pandas` to import `TSLA.csv` into a DataFrame.

In [9]:
import pandas as pd
df = pd.read_csv("TSLA.csv")
print(df)

            date    open       high      low    close    volume
0     2010-06-29   19.00    25.0000   17.540    23.89  18766300
1     2010-06-30   25.79    30.4192   23.300    23.83  17187100
2     2010-07-01   25.00    25.9200   20.270    21.96   8218800
3     2010-07-02   23.00    23.1000   18.710    19.20   5139800
4     2010-07-06   20.00    20.0000   15.830    16.11   6866900
...          ...     ...        ...      ...      ...       ...
2502  2020-06-08  919.00   950.0000  909.160   949.92  13950535
2503  2020-06-09  940.01   954.4399  923.931   940.67  11388154
2504  2020-06-10  991.88  1027.4800  982.500  1025.05  18273709
2505  2020-06-11  990.20  1018.9600  972.000   972.84  15691175
2506  2020-06-12  980.00   987.9800  912.600   935.28  16763374

[2507 rows x 6 columns]
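Columns can also be combined to derive new observations. The sketch below uses only the first five rows printed above, embedded as text so it runs without `TSLA.csv`. The column names `range` and `change` are our own additions for illustration and are not part of the file.

```python
import io
import pandas as pd

# First five rows of the printed DataFrame, embedded as CSV text
sample_csv = io.StringIO(
    "date,open,high,low,close,volume\n"
    "2010-06-29,19.00,25.0000,17.540,23.89,18766300\n"
    "2010-06-30,25.79,30.4192,23.300,23.83,17187100\n"
    "2010-07-01,25.00,25.9200,20.270,21.96,8218800\n"
    "2010-07-02,23.00,23.1000,18.710,19.20,5139800\n"
    "2010-07-06,20.00,20.0000,15.830,16.11,6866900\n"
)
sample = pd.read_csv(sample_csv)

sample["range"] = sample["high"] - sample["low"]     # Spread between the day's high and low
sample["change"] = sample["close"] - sample["open"]  # Positive when the stock closed above its open
print(sample[["date", "range", "change"]])
```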


### Data Preparation

Next, let's figure out what data we want to plot. For now, let's plot the `close` prices (y-axis) against the `date` (x-axis).

In [10]:
dates = pd.to_datetime(df['date']).tolist() # We convert the date column to datetime objects, and store them in a list
print("X-axis")
print(dates[0:5]) # Print the first few dates on the x-axis to give us an idea of what our list looks like
print("")
close = df['close'].tolist() # Store the close column in a list
print("Y-axis")
print(close[0:5]) # Print the first few close prices on the y-axis to give us an idea of what our list looks like


X-axis
[Timestamp('2010-06-29 00:00:00'), Timestamp('2010-06-30 00:00:00'), Timestamp('2010-07-01 00:00:00'), Timestamp('2010-07-02 00:00:00'), Timestamp('2010-07-06 00:00:00')]

Y-axis
[23.89, 23.83, 21.96, 19.2, 16.11]


### Plotting the Data

Now we are ready to plot our data. First we create an empty figure, then we plot the line on our empty figure, and finally we `show` the chart.

In [11]:
tsla = figure(title="Tesla Close Prices", x_axis_label="Date", y_axis_label="Closing Price")
tsla.line(x=dates, y=close, color="#000000", legend_label='Close Prices')
show(tsla)

Notice that the dates aren't displaying correctly. We can fix this by passing `x_axis_type="datetime"` when we create the figure. We can then zoom in to a section of the data using the tools on the side.

In [12]:
tsla = figure(title="Tesla Close Prices", x_axis_label="Date", y_axis_label="Closing Price", x_axis_type="datetime")
tsla.line(x=dates, y=close, color="#000000", legend_label='Close')
show(tsla)
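One way to see why the axis type matters: as far as we understand, Bokeh represents datetime values internally as milliseconds since the Unix epoch (1970-01-01), so a plain numeric axis shows very large numbers instead of dates. The sketch below computes that number for the first date in the data using only the standard library. It illustrates the idea and is not Bokeh's own conversion code.

```python
from datetime import datetime, timezone

first_day = datetime(2010, 6, 29, tzinfo=timezone.utc)  # First date in TSLA.csv, taken as midnight UTC
millis = first_day.timestamp() * 1000                   # Seconds since the epoch, converted to milliseconds
print(millis)  # 1277769600000.0
```

Setting `x_axis_type="datetime"` tells Bokeh to format tick values like this one as dates.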

### Plotting Multiple Series

Finally, let's make our chart more informative by displaying the trading high and low prices. First, we'll put the columns of data into lists.

In [13]:
high = df['high'].tolist()       # Convert the columns into lists
low = df['low'].tolist()

print("High")                    # Print the first few values of each list of data
print(high[0:5])
print("")
print("Low")
print(low[0:5])

High
[25.0, 30.4192, 25.92, 23.1, 20.0]

Low
[17.54, 23.3, 20.27, 18.71, 15.83]


We can now add the `high` and `low` data to our existing `tsla` figure and show the resulting chart. Use the zoom tool to see details.

In [14]:
tsla.line(x=dates, y=high, color="#2785db", legend_label='High')
tsla.line(x=dates, y=low, color="#d83939", legend_label='Low')
show(tsla)

We have now learned how to use scatter and line charts in Bokeh. Try creating a few functions of your own, or experimenting with the `TSLA.csv` file.