# Library Show and Tell: **Plotly**
"The plotly Python library (plotly.py) is an interactive, open-source plotting library that supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases.

Built on top of the Plotly JavaScript library (plotly.js), plotly.py enables Python users to create beautiful interactive web-based visualizations that can be displayed in Jupyter notebooks, saved to standalone HTML files, or served as part of pure Python-built web applications using Dash."

In this demonstration we'll go over just a handful of the chart types in the library. Official documentation can be found at https://plotly.com/python/reference

Notebook adapted from https://www.kaggle.com/kanncaa1/plotly-tutorial-for-beginners

## Installation
The code cell below calls pip through the Python executable of the current kernel (`sys.executable`), so packages are installed into the environment this notebook actually runs in. The install commands are commented out; uncomment them the first time you run the notebook. More info can be found here: https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/.

In [8]:
# Plotly installation in the current Jupyter kernel
#import sys
#!{sys.executable} -m pip install plotly --upgrade #install latest version of plotly
#!{sys.executable} -m pip install chart_studio #chart_studio provides py.iplot, which uploads figures to Chart Studio and embeds them in the notebook

#Log in using plotly account credentials
import chart_studio
chart_studio.tools.set_credentials_file(username='YOUR_USERNAME', api_key='YOUR_API_KEY') #replace with your own Chart Studio credentials; do not share a real API key

import chart_studio.plotly as py #import chart studio as 'py'
import plotly.graph_objs as go #import graph objects as 'go'

Collecting plotly
  Using cached https://files.pythonhosted.org/packages/4e/9b/1597117623f99b16d87f839b66040a5d3c8a61d83264fdad4388e142944b/plotly-4.7.0-py2.py3-none-any.whl
Processing /home/jovyan/.cache/pip/wheels/d7/a9/33/acc7b709e2a35caa7d4cae442f6fe6fbf2c43f80823d46460c/retrying-1.3.3-cp37-none-any.whl
Installing collected packages: retrying, plotly
Successfully installed plotly-4.7.0 retrying-1.3.3
Collecting chart_studio
  Using cached https://files.pythonhosted.org/packages/ca/ce/330794a6b6ca4b9182c38fc69dd2a9cbff60fd49421cb8648ee5fee352dc/chart_studio-1.1.0-py3-none-any.whl
Installing collected packages: chart-studio
Successfully installed chart-studio-1.1.0
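
The credentials passed to `set_credentials_file` do not have to be typed into the notebook itself. The block below is a minimal sketch of reading them from environment variables instead. The variable names `CHART_STUDIO_USERNAME` and `CHART_STUDIO_API_KEY` and the helper `load_chart_studio_credentials` are hypothetical, chosen only for this illustration.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
USERNAME_VAR = "CHART_STUDIO_USERNAME"
API_KEY_VAR = "CHART_STUDIO_API_KEY"


def load_chart_studio_credentials(env=None):
    """Return (username, api_key) read from an environment-like mapping."""
    env = os.environ if env is None else env
    username = env.get(USERNAME_VAR)
    api_key = env.get(API_KEY_VAR)
    if not username or not api_key:
        raise RuntimeError(
            f"Set {USERNAME_VAR} and {API_KEY_VAR} before running this notebook."
        )
    return username, api_key


# Usage in the notebook (commented out because it needs real credentials):
# import chart_studio
# username, api_key = load_chart_studio_credentials()
# chart_studio.tools.set_credentials_file(username=username, api_key=api_key)
```

With this approach the notebook can be shared without exposing the key, since the secret lives in the environment of whoever runs it.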


## Loading Sample Dataset
We'll be using the "World University Rankings" dataset from Kaggle: https://www.kaggle.com/mylesoneill/world-university-rankings/version/3

Of all the universities in the world, which are the best?

Ranking universities is a difficult, political, and controversial practice. There are hundreds of different national and international university ranking systems, many of which disagree with each other. This dataset contains three global university rankings from very different places.

In [9]:
import pandas as pd

timesData = pd.read_csv("timesData.csv")
timesData.head()

Unnamed: 0,world_rank,university_name,country,teaching,international,research,citations,income,total_score,num_students,student_staff_ratio,international_students,female_male_ratio,year
0,1,Harvard University,United States of America,99.7,72.4,98.7,98.8,34.5,96.1,20152,8.9,25%,,2011
1,2,California Institute of Technology,United States of America,97.7,54.6,98.0,99.9,83.7,96.0,2243,6.9,27%,33 : 67,2011
2,3,Massachusetts Institute of Technology,United States of America,97.8,82.3,91.4,99.9,87.5,95.6,11074,9.0,33%,37 : 63,2011
3,4,Stanford University,United States of America,98.3,29.5,98.1,99.2,64.3,94.3,15596,7.8,22%,42 : 58,2011
4,5,Princeton University,United States of America,90.9,70.3,95.4,99.9,-,94.2,7929,8.4,27%,45 : 55,2011


- timesData contains 2603 entries and the following 14 features:
    - world_rank
    - university_name
    - country
    - teaching
    - international
    - research
    - citations
    - income
    - total_score
    - num_students
    - student_staff_ratio
    - international_students
    - female_male_ratio
    - year

## Line Charts
*Line Charts Example: Citation and Teaching vs World Rank of Top 100 Universities*
- Import graph_objs as go
- Create the traces
    - x = values for the x axis
    - y = values for the y axis
    - mode = drawing mode: "markers", "lines", or "lines+markers"
    - name = name of the trace, shown in the legend
    - marker = dictionary of marker properties
        - color = color of the trace, given as an RGBA string: red, green, blue, and opacity (alpha)
    - text = hover text, shown when the cursor is over a point
- data = list that holds the traces
- layout = dictionary describing the figure layout
    - title = title of the figure
    - xaxis = dictionary of x-axis settings
        - title = label of the x axis
        - ticklen = length of the x-axis ticks
        - zeroline = whether to draw the zero line
- fig = combines data and layout
- iplot() = plots the figure (fig) built from data and layout

In [10]:
# prepare data frame of top 100 Universities
df = timesData.iloc[:100,:]

# Creating trace1
trace1 = go.Scatter(
                    x = df.world_rank,
                    y = df.citations,
                    mode = "lines",
                    name = "citations",
                    marker = dict(color = 'rgba(16, 112, 2, 0.8)'),
                    text= df.university_name)
# Creating trace2
trace2 = go.Scatter(
                    x = df.world_rank,
                    y = df.teaching,
                    mode = "lines+markers",
                    name = "teaching",
                    marker = dict(color = 'rgba(80, 26, 80, 0.8)'),
                    text= df.university_name)
data = [trace1, trace2]

layout = dict(title = 'Citation and Teaching vs World Rank of Top 100 Universities',
              xaxis= dict(title= 'World Rank',ticklen= 5,zeroline= False))

fig = dict(data = data, layout = layout)
py.iplot(fig)

## Scatterplot
*Scatter Example: Citation vs World Rank of Top 100 Universities in Years 2014, 2015, and 2016.*

In [11]:
# prepare data frames
df2014 = timesData[timesData.year == 2014].iloc[:100,:]
df2015 = timesData[timesData.year == 2015].iloc[:100,:]
df2016 = timesData[timesData.year == 2016].iloc[:100,:]

# creating trace1
trace1 =go.Scatter(
                    x = df2014.world_rank,
                    y = df2014.citations,
                    mode = "markers",
                    name = "2014",
                    marker = dict(color = 'rgba(255, 128, 255, 0.8)'),
                    text= df2014.university_name)
# creating trace2
trace2 =go.Scatter(
                    x = df2015.world_rank,
                    y = df2015.citations,
                    mode = "markers",
                    name = "2015",
                    marker = dict(color = 'rgba(255, 128, 2, 0.8)'),
                    text= df2015.university_name)
# creating trace3
trace3 =go.Scatter(
                    x = df2016.world_rank,
                    y = df2016.citations,
                    mode = "markers",
                    name = "2016",
                    marker = dict(color = 'rgba(0, 255, 200, 0.8)'),
                    text= df2016.university_name)

data = [trace1, trace2, trace3]
layout = dict(title = 'Citation vs World Rank of Top 100 Universities in 2014, 2015, and 2016',
              xaxis= dict(title= 'World Rank',ticklen= 5,zeroline= False),
              yaxis= dict(title= 'Citation',ticklen= 5,zeroline= False)
             )
fig = dict(data = data, layout = layout)
py.iplot(fig)

## Bar Charts
*Bar Chart Example: Citations and Teaching of Top 3 Universities in 2014*

In [12]:
#Create subset of just top 3 schools in 2014
df2014 = timesData[timesData.year == 2014].iloc[:3,:]

# create trace1 for citations
trace1 = go.Bar(
                x = df2014.university_name,
                y = df2014.citations,
                name = "citations",
                marker = dict(color = 'rgba(255, 174, 255, 0.5)',
                             line=dict(color='yellow',width=1.5)),
                text = df2014.country)
# create trace2 for teaching
trace2 = go.Bar(
                x = df2014.university_name,
                y = df2014.teaching,
                name = "teaching",
                marker = dict(color = 'rgba(255, 255, 128, 0.5)',
                              line=dict(color='pink',width=1.5)),
                text = df2014.country)
data = [trace1, trace2]
layout = go.Layout(barmode = "group")
fig = go.Figure(data = data, layout = layout)
py.iplot(fig)

## Scatter Matrix Plots
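*Scatter Matrix Plots Example: A scatter matrix plots every pair of features against each other, which lets us look at covariance and relationships between features. It is a tool commonly used in exploratory analysis. The syntax differs from the previous examples because it uses the figure factory module.*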

In [13]:
# import figure factory
import plotly.figure_factory as ff

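# import numpy to build the index column
import numpy as np

# prepare data
dataframe = timesData[timesData.year == 2015]
# .copy() makes data2015 an independent frame, so adding the index column
# below does not raise pandas' SettingWithCopyWarning
data2015 = dataframe.loc[:,["research","international", "total_score"]].copy()
data2015["index"] = np.arange(1,len(data2015)+1)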

# scatter matrix
fig = ff.create_scatterplotmatrix(data2015, diag='box', index='index',colormap='Portland',
                                  colormap_type='cat',
                                  height=700, width=700)
py.iplot(fig)
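
A scatter matrix shows these relationships visually. A numeric complement is to compute the covariance and correlation matrices with pandas. The block below is a sketch on the five rows from the head output shown earlier (2011 data, not the 2015 subset used above), so the numbers it prints describe only those five rows.

```python
import pandas as pd

# Values from the five rows of the head() output shown earlier.
head_rows = pd.DataFrame({
    "research": [98.7, 98.0, 91.4, 98.1, 95.4],
    "international": [72.4, 54.6, 82.3, 29.5, 70.3],
    "total_score": [96.1, 96.0, 95.6, 94.3, 94.2],
})

# Pairwise sample covariance between the three features.
cov = head_rows.cov()
# Pairwise Pearson correlation (covariance rescaled to the range [-1, 1]).
corr = head_rows.corr()

print(cov.round(2))
print(corr.round(2))
```

Each off-diagonal cell of `corr` summarizes in one number what the corresponding scatter panel shows as a cloud of points.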

## Multiple Subplots
Multiple Subplots: When comparing more than one feature, multiple subplots can be useful. Each trace is assigned to its own pair of axes, and the layout defines which part of the figure each pair of axes occupies.

In [14]:
trace1 = go.Scatter(
    x=dataframe.world_rank,
    y=dataframe.research,
    name = "research"
)
trace2 = go.Scatter(
    x=dataframe.world_rank,
    y=dataframe.citations,
    xaxis='x2',
    yaxis='y2',
    name = "citations"
)
trace3 = go.Scatter(
    x=dataframe.world_rank,
    y=dataframe.income,
    xaxis='x3',
    yaxis='y3',
    name = "income"
)
trace4 = go.Scatter(
    x=dataframe.world_rank,
    y=dataframe.total_score,
    xaxis='x4',
    yaxis='y4',
    name = "total_score"
)
data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
    xaxis=dict(
        domain=[0, 0.45]
    ),
    yaxis=dict(
        domain=[0, 0.45]
    ),
    xaxis2=dict(
        domain=[0.55, 1]
    ),
    xaxis3=dict(
        domain=[0, 0.45],
        anchor='y3'
    ),
    xaxis4=dict(
        domain=[0.55, 1],
        anchor='y4'
    ),
    yaxis2=dict(
        domain=[0, 0.45],
        anchor='x2'
    ),
    yaxis3=dict(
        domain=[0.55, 1]
    ),
    yaxis4=dict(
        domain=[0.55, 1],
        anchor='x4'
    ),
    title = 'Research, citation, income and total score VS World Rank of Universities'
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig)