# Dynamic Visualizations with Plotly
## Zexi Han

In [1]:
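import pandas as pd
import numpy as np

import plotly.plotly as py  # plotly < 4 only; in plotly >= 4 this moved to the separate chart_studio package (chart_studio.plotly)
import plotly.graph_objs as go
import plotly
import cufflinks  # importing cufflinks adds the DataFrame.iplot method used for the area plot below
# Keep the real API key out of shared notebooks; substitute your own before running.
plotly.tools.set_credentials_file(username='zexihan', api_key='YOUR_API_KEY')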

In [2]:
prices = pd.read_csv("prices.csv")

In [3]:
prices.head()

|   | date | symbol | open | close | low | high | volume |
|---|------|--------|------|-------|-----|------|--------|
| 0 | 2016-01-05 00:00:00 | WLTW | 123.43 | 125.839996 | 122.309998 | 126.25 | 2163600.0 |
| 1 | 2016-01-06 00:00:00 | WLTW | 125.239998 | 119.980003 | 119.940002 | 125.540001 | 2386400.0 |
| 2 | 2016-01-07 00:00:00 | WLTW | 116.379997 | 114.949997 | 114.93 | 119.739998 | 2489500.0 |
| 3 | 2016-01-08 00:00:00 | WLTW | 115.480003 | 116.620003 | 113.5 | 117.440002 | 2006300.0 |
| 4 | 2016-01-11 00:00:00 | WLTW | 117.010002 | 114.970001 | 114.089996 | 117.330002 | 1408600.0 |


In [4]:
prices.shape

(851264, 7)

**Label the 7 columns in the dataset as key or value columns (see Munzner 2.6). Include
this write-up in a markup cell. Write a couple of sentences justifying your choice.**

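A key attribute acts as an index that is used to look up value attributes. In the prices dataset, "date" and "symbol" together form the key, and "open", "close", "low", "high", and "volume" are value columns. Neither key column is unique on its own: each date appears once for every stock traded that day, and each symbol appears once per trading day. Only the (date, symbol) pair identifies a single observation, one stock on one day, which makes this a multidimensional table in Munzner's terms; the five price and volume columns are measured for that pair and depend on it.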
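One way to check which columns can serve as keys is to test whether they identify every row uniquely. The sketch below uses pandas' `DataFrame.duplicated` on a small made-up table in the same long format as `prices.csv`, where several stocks share each date; the helper `is_key` is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Hypothetical toy rows in the same long format as prices.csv:
# every trading day has one row per symbol (prices are made up).
toy = pd.DataFrame({
    "date": ["2016-01-05", "2016-01-05", "2016-01-06", "2016-01-06"],
    "symbol": ["AMZN", "GOOGL", "AMZN", "GOOGL"],
    "close": [10.0, 20.0, 11.0, 19.0],
})


def is_key(df: pd.DataFrame, columns: list) -> bool:
    """Return True if the given columns identify every row uniquely."""
    return not df.duplicated(subset=columns).any()


print(is_key(toy, ["date"]))            # False: each date repeats once per symbol
print(is_key(toy, ["symbol"]))          # False: each symbol repeats once per day
print(is_key(toy, ["date", "symbol"]))  # True: the pair is unique
```

On the notebook's data, `is_key(prices, ["date", "symbol"])` performs the same check across all rows.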

**Come up with a task a user might be interested in performing with this dataset. (Refer to
task abstraction from Visual Encoding lecture slides.) Write it in markup. You must specify your task using technical visualization terminology, not just a layman’s description.**

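Task: Present the trends in closing price (a quantitative value attribute) for Google (GOOGL) and Amazon (AMZN) over time from 2010 to 2016, and compare the two trends to find a story the data supports. In Munzner's task abstraction, the actions are *present* (analyze/consume) and *compare* (query), and the target is *trends*. Make the visualization interactive and dynamic where that helps, for example by letting the viewer zoom into a period.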

**Choose two different, reasonable ways to encode the data that allow a user to perform the task you specified in the last step. Focus more on the lecture slides on Marks & Channels, and on Visual Encodings than making an interactive visualization in this step.**

Line plots and area plots are two common encodings for time-series data such as daily closing prices.
1. A line plot encodes the two attributes with point marks connected by line marks: vertical spatial position encodes the quantitative attribute closing price, and horizontal spatial position encodes the ordered attribute date. Each pair of consecutive point marks is connected by a line mark whose slope represents the change in price between those dates. Color hue encodes the categorical attribute symbol to distinguish the two stocks.
2. An area plot uses a line mark plus an area mark. The line is positioned as in the line plot, and the area mark fills the region between the line and the x-axis. Besides the slope showing the change in price, the size of the filled area over a date range encodes the closing price accumulated over that range (equivalently, the average price times the length of the range). Color hue again distinguishes the two stocks.

**Create the two interactive visualizations. Make sure readers of the visualization understand what they are looking at, e.g., use sensible axes, labels, title, etc**

In [5]:
# Select each ticker's rows, keep only date and close, and rename close.
# Selecting with .loc and renaming (instead of assigning a new column to a
# filtered slice) avoids pandas' SettingWithCopyWarning.
prices_amzn = (prices.loc[prices["symbol"] == "AMZN", ["date", "close"]]
               .rename(columns={"close": "amzn_close"}))
prices_googl = (prices.loc[prices["symbol"] == "GOOGL", ["date", "close"]]
                .rename(columns={"close": "googl_close"}))

In [6]:
# Left join on date: keep every AMZN trading day and attach GOOGL's close for the same date
df = prices_amzn.join(prices_googl.set_index('date'), on='date')
df = df.set_index("date")
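The filter-and-join steps above can also be expressed as a single reshape from long to wide format with `DataFrame.pivot`. This is an alternative sketch, not the notebook's approach, run here on a small made-up table standing in for `prices.csv`. It assumes each (date, symbol) pair occurs only once (otherwise `pivot` raises a `ValueError`), and unlike the left join it keeps the union of both tickers' dates, filling gaps with NaN.

```python
import pandas as pd

# Hypothetical long-format rows shaped like prices.csv (values are made up).
toy_prices = pd.DataFrame({
    "date": ["2016-01-05", "2016-01-05", "2016-01-05",
             "2016-01-06", "2016-01-06", "2016-01-06"],
    "symbol": ["AMZN", "GOOGL", "WLTW", "AMZN", "GOOGL", "WLTW"],
    "close": [10.0, 20.0, 5.0, 11.0, 19.0, 6.0],
})

# Keep the two tickers of interest, then reshape long -> wide:
# one row per date, one closing-price column per symbol.
wide = (
    toy_prices[toy_prices["symbol"].isin(["AMZN", "GOOGL"])]
    .pivot(index="date", columns="symbol", values="close")
    .rename(columns={"AMZN": "amzn_close", "GOOGL": "googl_close"})
)
wide.columns.name = None  # drop the leftover "symbol" label on the column axis
print(wide)
```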

In [9]:
trace_amzn = go.Scatter(
    x=df.index,
    y=df["amzn_close"],
    name = "AMZN Close",
    line = dict(color = '#FF8E05'),
    opacity = 0.8)

trace_googl = go.Scatter(
    x=df.index,
    y=df["googl_close"],
    name = "GOOGL Close",
    line = dict(color = '#397AF2'),
    opacity = 0.8)

data = [trace_amzn, trace_googl]

layout = dict(
    title='AMZN vs. GOOGL Daily Closing Price, 2010-2016',
    yaxis=dict(title='Closing price (USD)'),
    xaxis=dict(
        title='Date',
        rangeselector=dict(
            buttons=list([
                dict(count=1,
                     label='1m',
                     step='month',
                     stepmode='backward'),
                dict(count=6,
                     label='6m',
                     step='month',
                     stepmode='backward'),
                dict(step='all')
            ])
        ),
        rangeslider=dict(
            visible = True
        ),
        type='date'
    )
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename = "Time Series with Rangeslider")

High five! You successfully sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~zexihan/0 or inside your plot.ly account where it is named 'Time Series with Rangeslider'


To reveal how the closing price changes over time, a line plot is a natural first choice, because the slope of each line mark directly encodes the change between consecutive dates. The line rises from left to right where the closing price increases (positive slope) and falls where it decreases (negative slope).
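To make the slope reading concrete, the sign of each segment's slope is the sign of the price change between its two dates. The sketch below computes per-segment slopes on made-up values, measuring horizontal distance in calendar days; that unit is an assumption matching a date axis, where weekends still take up horizontal space.

```python
import pandas as pd

# Hypothetical closing prices on five trading days (made-up values);
# 2016-01-08 is a Friday, so the last segment spans a weekend.
close = pd.Series(
    [10.0, 12.0, 11.0, 11.0, 15.0],
    index=pd.to_datetime(["2016-01-05", "2016-01-06", "2016-01-07",
                          "2016-01-08", "2016-01-11"]),
)

# Each line segment joins two consecutive points, so its slope is the
# price change divided by the number of days between them.
days = close.index.to_series().diff().dt.days
slope = (close.diff() / days).dropna()

# Positive slope: the segment rises (price went up); negative: it falls.
direction = ["up" if s > 0 else "down" if s < 0 else "flat" for s in slope]
print(direction)  # ['up', 'down', 'flat', 'up']
```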

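The plot presents how the closing prices of two leading tech companies, Google and Amazon, changed from 2010 to 2016. Interestingly, Amazon's stock price rose substantially over the whole period, while Google's started at a much higher level and dropped sharply around April 2014. Note that prices.csv appears to hold raw rather than split-adjusted prices: Google's April 2014 stock split, which issued one new Class C share for each existing share, roughly halved the per-share price, so this drop mostly reflects the split rather than a loss in the company's value.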
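In raw (unadjusted) price data, a one-day fall to roughly half of the previous close is the typical signature of a 2-for-1 split. A minimal sketch for flagging such days, assuming a closing-price Series with a DatetimeIndex; the function name and the 0.6 threshold are hypothetical choices for illustration, and the values are made up.

```python
import pandas as pd


def flag_split_like_drops(close: pd.Series, max_ratio: float = 0.6) -> pd.Series:
    """Return the day-over-day price ratios that fall below max_ratio.

    max_ratio=0.6 is an illustrative threshold (a 2-for-1 split gives ~0.5),
    not a standard value.
    """
    ratio = close / close.shift(1)
    return ratio[ratio < max_ratio]


# Hypothetical raw closes with a 2-for-1 split before the third day.
close = pd.Series(
    [1000.0, 1010.0, 505.0, 510.0],
    index=pd.date_range("2020-01-06", periods=4, freq="D"),
)
flags = flag_split_like_drops(close)
print(flags)
```

A flagged day only shows that the price halved overnight; whether it was a split still has to be confirmed from corporate-action records or by switching to split-adjusted prices.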

A range slider lets the user select any period of interest, and the plot zooms to that range so that finer-grained changes in closing price become visible; the 1m and 6m buttons jump straight to the most recent month or six months. Since the task is presentation, the presenter can first show the whole line plot to give the audience an overview, then zoom into a particular period to tell an interesting story.
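The range slider and the 1m/6m buttons only change which slice of the x-axis is displayed. The sketch below expresses the same two kinds of selection as pandas filters, which helps when quoting the exact numbers behind a zoomed view. It assumes a DatetimeIndex (in the notebook, `df`'s index holds date strings, so it would first need `df.index = pd.to_datetime(df.index)`); the helper names are hypothetical, the data is made up, and counting back from the latest date only approximates how Plotly's `stepmode='backward'` buttons compute their range.

```python
import pandas as pd

# Hypothetical daily closes for 2016 on a DatetimeIndex (made-up values).
idx = pd.date_range("2016-01-01", "2016-12-31", freq="D")
close = pd.Series(range(len(idx)), index=idx, dtype=float)


def last_months(series: pd.Series, months: int) -> pd.Series:
    """Rows from `months` calendar months before the latest date onward."""
    start = series.index.max() - pd.DateOffset(months=months)
    return series[series.index >= start]


def between(series: pd.Series, start: str, end: str) -> pd.Series:
    """Rows in an explicit [start, end] window, like a range-slider selection."""
    return series.loc[start:end]


six_months = last_months(close, 6)               # roughly the "6m" button
q2 = between(close, "2016-04-01", "2016-06-30")  # a hand-picked window
print(six_months.index.min().date(), len(q2))
```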

In [8]:
df.iplot(subplots=True, shape=(2,1), shared_xaxes=True, fill=True)

Like the line plot, the area plot shows changes in price through the slope of its line marks; in addition, the filled area encodes the closing price accumulated over a period. Separate, non-overlapping subplots keep the two stocks visually distinct.
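The meaning of the area channel can be made concrete. Assuming the fill reaches down to y = 0, the shaded region over a date range is the integral of the closing price over time, so for ranges of equal length a larger area means a higher average price. A minimal trapezoid-rule sketch on made-up values:

```python
import pandas as pd

# Hypothetical closes on four consecutive days (made-up values).
close = pd.Series([10.0, 12.0, 14.0, 12.0],
                  index=pd.date_range("2016-01-01", periods=4, freq="D"))

# Trapezoid rule: each segment contributes width (days) x mean height (price),
# i.e. its share of the filled region between the line and y = 0.
width_days = close.index.to_series().diff().dt.days.iloc[1:].to_numpy()
mean_height = (close.iloc[:-1].to_numpy() + close.iloc[1:].to_numpy()) / 2
area = float((width_days * mean_height).sum())  # units: price x days

# Dividing by the length of the range gives the time-averaged closing price.
span_days = (close.index[-1] - close.index[0]).days
avg_price = area / span_days
print(area, round(avg_price, 3))  # 37.0 12.333
```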

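From 2010 to 2014, Google's closing price, and hence the area under its line, is at least twice Amazon's. After Google's 2014 drop, the gap narrows and Amazon's price eventually catches up with Google's. Because each subplot has its own y-axis scale, this comparison should be read from the tick values rather than from the raw size of the shaded regions.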

As with the line plot, the user can zoom into any period of interest to inspect the area over a narrower span. This cufflinks call does not add a range slider, but Plotly's built-in box zoom (click and drag over the plot) does the same job. For a presentation, the full area plot gives the audience an overview first, and zooming into a particular period then supports telling a specific story.