# Dynamic Visualizations with Plotly

# Data Exploration

In [93]:
#reading the data into a pandas dataframe
import pandas as pd
data = pd.read_csv('prices.csv')
data.describe()

| | open | close | low | high | volume |
|---|---|---|---|---|---|
| count | 851264.0 | 851264.0 | 851264.0 | 851264.0 | 851264.0 |
| mean | 70.836986 | 70.857109 | 70.118414 | 71.543476 | 5415113.0 |
| std | 83.695876 | 83.689686 | 82.877294 | 84.465504 | 12494680.0 |
| min | 0.85 | 0.86 | 0.83 | 0.88 | 0.0 |
| 25% | 33.84 | 33.849998 | 33.48 | 34.189999 | 1221500.0 |
| 50% | 52.77 | 52.799999 | 52.23 | 53.310001 | 2476250.0 |
| 75% | 79.879997 | 79.889999 | 79.110001 | 80.610001 | 5222500.0 |
| max | 1584.439941 | 1578.130005 | 1549.939941 | 1600.930054 | 859643400.0 |


In [13]:
#Checking the feature types
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 851264 entries, 0 to 851263
Data columns (total 7 columns):
date      851264 non-null object
symbol    851264 non-null object
open      851264 non-null float64
close     851264 non-null float64
low       851264 non-null float64
high      851264 non-null float64
volume    851264 non-null float64
dtypes: float64(5), object(2)
memory usage: 45.5+ MB


In [94]:
#Exploring the data rows
data.head()

| | date | symbol | open | close | low | high | volume |
|---|---|---|---|---|---|---|---|
| 0 | 2016-01-05 00:00:00 | WLTW | 123.43 | 125.839996 | 122.309998 | 126.25 | 2163600.0 |
| 1 | 2016-01-06 00:00:00 | WLTW | 125.239998 | 119.980003 | 119.940002 | 125.540001 | 2386400.0 |
| 2 | 2016-01-07 00:00:00 | WLTW | 116.379997 | 114.949997 | 114.93 | 119.739998 | 2489500.0 |
| 3 | 2016-01-08 00:00:00 | WLTW | 115.480003 | 116.620003 | 113.5 | 117.440002 | 2006300.0 |
| 4 | 2016-01-11 00:00:00 | WLTW | 117.010002 | 114.970001 | 114.089996 | 117.330002 | 1408600.0 |


# Key or value columns
The columns 'date', 'symbol', 'open', 'close', 'low', 'high' and 'volume' are all values. This flat table has an implicit key: the row index.
In tables, keys may be categorical or ordinal attributes, but quantitative attributes are typically unsuitable as keys because nothing prevents multiple items from having the same value. Here, neither 'date' nor 'symbol' can serve as a key on its own, because both contain duplicates: each date appears once for every stock traded that day, and each symbol appears once for every trading day. Neither can therefore identify a unique entry in the table.
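The duplicate argument can be checked directly with pandas. The sketch below uses a small hypothetical frame (`sample`, with the same `date` and `symbol` columns as prices.csv but made-up rows) and tests whether `date`, `symbol`, or the pair of them identifies rows uniquely. On the real data the same calls would be run on `data`; whether the (date, symbol) pair is unique in prices.csv is not established here.

```python
import pandas as pd

# Hypothetical miniature of prices.csv: two symbols over two trading days
sample = pd.DataFrame({
    "date": ["2016-01-05", "2016-01-05", "2016-01-06", "2016-01-06"],
    "symbol": ["WLTW", "YHOO", "WLTW", "YHOO"],
})

# A single column can act as a key only if none of its values repeat
date_is_key = not sample["date"].duplicated().any()
symbol_is_key = not sample["symbol"].duplicated().any()

# A compound key: every (date, symbol) pair must occur at most once
pair_is_key = not sample.duplicated(subset=["date", "symbol"]).any()

print(date_is_key, symbol_is_key, pair_is_key)  # False False True
```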

# Task
- The user wants to summarize and compare the performance of 'YHOO', 'WMT' and 'ZTS' between 2013-01-01 and 2016-02-12 in order to decide where to invest.
- In terms of the actions taxonomy, this is an action of type 'analyze': the user consumes information by discovering trends in the stocks' closing values, which the visualization presents.
- We first filter the data to the date range and then to the three symbols. The resulting data frames are used to visualize the time series of two attributes: closing price and trading volume.
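The two filtering steps can be sketched on a small hypothetical frame (`prices`, with placeholder values in the long format of prices.csv). The sketch uses the same strict date bounds as the notebook and a single `isin` call for all three symbols; the final `pivot` into one column per symbol is an optional extra step, not something the notebook does, that lines the stocks up by date for side-by-side comparison.

```python
import pandas as pd

SYMBOLS = ["YHOO", "WMT", "ZTS"]

# Hypothetical rows in the long format of prices.csv (placeholder prices)
prices = pd.DataFrame({
    "date": ["2012-12-31", "2013-01-02", "2013-01-02", "2013-01-02",
             "2013-01-02", "2013-01-03", "2013-01-03", "2013-01-03"],
    "symbol": ["YHOO", "YHOO", "WMT", "ZTS", "AAPL", "YHOO", "WMT", "ZTS"],
    "close": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
prices["date"] = pd.to_datetime(prices["date"])

# Step 1: keep rows strictly inside the date range
in_range = prices[(prices["date"] > "2013-01-01") & (prices["date"] < "2016-02-12")]

# Step 2: keep only the three symbols of interest
selected = in_range[in_range["symbol"].isin(SYMBOLS)]

# Optional: one column per symbol, indexed by date
wide = selected.pivot(index="date", columns="symbol", values="close")
print(wide)
```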

In [95]:
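# Filter the data to the date range
# Parse the dates as datetimes; only the first 10 characters (YYYY-MM-DD) are used,
# so a time suffix such as ' 00:00:00' does not affect parsing or comparison
data['date'] = pd.to_datetime(data['date'].str[:10])
data_year = data[(data['date'] > '2013-01-01') & (data['date'] < '2016-02-12')]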

In [73]:
#creating data frames for different stock labels
data_filter_1 = data_year[data_year['symbol'].isin(['YHOO'])]
data_filter_2 = data_year[data_year['symbol'].isin(['WMT'])]
data_filter_3 = data_year[data_year['symbol'].isin(['ZTS'])]

In [89]:
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# Shared font for the axis titles
axis_font = dict(family='Courier New, monospace', size=18, color='#7f7f7f')

# One line trace per stock: date on the x-axis, closing price on the y-axis.
# The list is named 'traces' so that the 'data' DataFrame is not overwritten.
traces = [go.Scatter(x=data_filter_1['date'], y=data_filter_1['close'], name='YHOO'),
          go.Scatter(x=data_filter_2['date'], y=data_filter_2['close'], name='WMT'),
          go.Scatter(x=data_filter_3['date'], y=data_filter_3['close'], name='ZTS')]

# Layout defining the titles and fonts of the visualization
layout = go.Layout(
    title='Closing prices of different stocks',
    xaxis=dict(title=dict(text='Years', font=axis_font)),
    yaxis=dict(title=dict(text='Price', font=axis_font))
)

# Combine the traces with the layout and render the figure inline (offline mode)
fig = go.Figure(data=traces, layout=layout)
iplot(fig)

# Justification and Insights
- The variable to be encoded, closing price, is quantitative and varies continuously over time.
- I therefore encoded it with a line chart: the marks are lines, and the channels are horizontal position (date) and vertical position (price), with color distinguishing the stocks.
- The plot shows the performance of the stocks in terms of closing price over time. Placing the three lines on the same chart lets the user compare the stocks directly and make a decision.
- Plotly line charts show a tooltip with the exact value when the user hovers over a point, so the user can read the closing price of a stock on a particular date instead of estimating it from the axis. This supports a more precise judgement of performance.
- Interesting insight: the plot shows that the prices of 'YHOO' and 'WMT' dropped considerably in the last quarter of 2015. This may interest a high-risk investor, who could buy the stocks cheaply and hope that the prices rise over the next few years.

In [92]:
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# Shared font for the axis titles
axis_font = dict(family='Courier New, monospace', size=18, color='#7f7f7f')

# One filled area per stock: date on the x-axis, trading volume on the y-axis.
# fill='tozeroy' shades the region between each line and y = 0.
traces = [go.Scatter(x=data_filter_1['date'], y=data_filter_1['volume'], fill='tozeroy', name='YHOO'),
          go.Scatter(x=data_filter_2['date'], y=data_filter_2['volume'], fill='tozeroy', name='WMT'),
          go.Scatter(x=data_filter_3['date'], y=data_filter_3['volume'], fill='tozeroy', name='ZTS')]

# Layout defining the titles and fonts of the visualization
layout = go.Layout(
    title='Volume of trading for different stocks',
    xaxis=dict(title=dict(text='Years', font=axis_font)),
    yaxis=dict(title=dict(text='Volume', font=axis_font))
)

# Combine the traces with the layout and render the figure inline (offline mode)
fig = go.Figure(data=traces, layout=layout)
iplot(fig)

# Justification and Insights
- The variable to be encoded, trading volume, is quantitative and varies continuously over time, so I encoded it with an area chart.
- The area chart uses the color-coded region under each line as its mark; the channels are the size of the region (its height above zero) and its color, which identifies the stock.
- The plot shows the trading volume of three stocks over time. Because each stock's area has its own color, the user can compare volumes across stocks and use them to inform the investment choice.
- The area chart shows a tooltip with the exact value when the user hovers over a point. This helps the user decide accurately even where the areas overlap or differ only by slight margins, and it allows comparing values on a particular date, which a static area chart would not support.
- An interesting insight is that YHOO has been traded in large volumes over the past few years, with a peak in September 2014. A surge like this suggests a specific event during that period. Volume has declined since then, which I suspect is related to the theft of private user data by hackers.
- YHOO stock may continue to fall, which could make it a speculative bet for a high-risk investor.
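A peak read off the chart, such as the September 2014 YHOO spike, can be confirmed numerically by finding the date of maximum volume for each stock. The sketch below uses a small hypothetical frame (`volumes`, placeholder values); on the real data the same `groupby`/`idxmax` call would be applied to the three-symbol subset of `data_year`.

```python
import pandas as pd

# Hypothetical volumes in the long format of prices.csv (placeholder values)
volumes = pd.DataFrame({
    "date": pd.to_datetime(["2014-08-01", "2014-09-02", "2014-10-01"] * 2),
    "symbol": ["YHOO"] * 3 + ["WMT"] * 3,
    "volume": [10.0, 50.0, 20.0, 7.0, 6.0, 9.0],
})

# idxmax gives the row label of the largest volume within each symbol
peak_rows = volumes.loc[volumes.groupby("symbol")["volume"].idxmax()]

# One row per symbol: the date and size of its peak volume
peak = peak_rows.set_index("symbol")[["date", "volume"]]
print(peak)
```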