# Data visualisation using plotly and dash

My target for this notebook is to learn interactive data visualisation with plotly. I will try out various plots using plotly and finally try to put them all together in a dashboard. You might find some plots to be 'forced', probably because I am trying out a plot type that does not suit this IPL data. However, I will try my best to keep the plots sensible.

I am not much of a writer, and I believe the visualisations should be enough to tell the story, so don't expect a lot of notes here.

Enough talk, let's go straight to action!

# Importing the libraries

In [None]:
import numpy as np 
import pandas as pd 
import plotly.offline as pyo
import plotly.graph_objs as go
import plotly.figure_factory as ff
import plotly.subplots as subplots
import cufflinks as cf
cf.go_offline(connected=True)

pd.options.display.max_columns = None
pyo.init_notebook_mode(connected=True)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Loading the datasets

In [None]:
matches = pd.read_csv('/kaggle/input/ipl-complete-dataset-20082020/IPL Matches 2008-2020.csv')
balls = pd.read_csv('/kaggle/input/ipl-complete-dataset-20082020/IPL Ball-by-Ball 2008-2020.csv')

# First look at the ball by ball data

In [None]:
balls.head()

# First look at the matches data

In [None]:
matches.head()

# Merging the two datasets

In [None]:
data = pd.merge(left=matches, right=balls, on='id', how='right')
data.head()

In [None]:
print(matches.shape)
print(balls.shape)
print(data.shape)
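
Comparing shapes is a quick sanity check on the merge. As a hedged sketch (not part of the original workflow), pandas can also check the join itself through the `validate` and `indicator` arguments of `pd.merge`. The two small frames below are made-up stand-ins for `matches` and `balls`, and the `venue` column is only illustrative.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical values)
matches_demo = pd.DataFrame({'id': [1, 2], 'venue': ['Mumbai', 'Chennai']})
balls_demo = pd.DataFrame({'id': [1, 1, 2, 3], 'total_runs': [4, 0, 6, 1]})

merged = pd.merge(
    left=matches_demo, right=balls_demo, on='id', how='right',
    validate='one_to_many',  # raises MergeError if a match id repeats in matches_demo
    indicator=True,          # adds a _merge column showing where each row came from
)

# Deliveries whose match id has no row in the matches table
unmatched = merged[merged['_merge'] == 'right_only']
print(len(merged), len(unmatched))
```

If `unmatched` is empty on the real data, every delivery found its match row.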

In [None]:
data.info()

# Extracting year from the date

In [None]:
data['date'] = pd.to_datetime(data['date'])
data['year'] = data['date'].dt.year

In [None]:
data.head()

# Runs scored over the years

In [None]:
# Select the numeric column before summing so text and date columns are not aggregated
runs_by_years = data.groupby(by='year')['total_runs'].sum().reset_index()

### Using plotly

In [None]:
total_runs = go.Scatter(
                    x=runs_by_years['year'],
                    y=runs_by_years['total_runs'],
                    mode='lines',
                    name='runs')

# Use a separate name for the trace list so the merged DataFrame `data` is not overwritten
traces = [total_runs]

layout = go.Layout(title='Runs scored by year',
                  xaxis = dict(title='Year'),
                  yaxis = dict(title='Runs'))

fig = go.Figure(data=traces, layout=layout)

pyo.iplot(fig)

### Using cufflinks

In [None]:
runs_by_years.iplot(kind='scatter', x='year', y='total_runs', title='Runs scored by year', 
                    xTitle='Year', yTitle='Runs')

# Preferred toss decision

In [None]:
# Count matches per toss decision
toss_decisions = matches.groupby(by='toss_decision')['id'].count().reset_index()
toss_decisions
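
The same counts can also be read straight off the column with `value_counts`, and `normalize=True` turns them into shares. This is only a small sketch on made-up toss decisions, not the real `matches` table.

```python
import pandas as pd

# Made-up toss decisions standing in for matches['toss_decision']
matches_demo = pd.DataFrame({'toss_decision': ['field', 'bat', 'field', 'field']})

counts = matches_demo['toss_decision'].value_counts()                 # matches per decision
shares = matches_demo['toss_decision'].value_counts(normalize=True)   # fraction per decision

print(counts)
print(shares)
```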

### Using plotly

In [None]:
toss_decision = go.Bar(
                    x=toss_decisions['toss_decision'],
                    y=toss_decisions['id']
                )

# Use a separate name for the trace list so the merged DataFrame `data` is not overwritten
traces = [toss_decision]

layout = go.Layout(title='Toss decision',
                  xaxis = dict(title='Decision'),
                  yaxis = dict(title='count'))

fig = go.Figure(data=traces, layout=layout)

pyo.iplot(fig)

### Using cufflinks

In [None]:
toss_decisions.iplot(kind='bar', x='toss_decision', y='id', title='Toss Decision', 
                     xTitle='Decision', yTitle='count')

# Total runs and wickets by over

In [None]:
# Sum only the two numeric columns needed for each over
runs_and_wickets_by_over = balls.groupby(by='over')[['total_runs', 'is_wicket']].sum().reset_index()
runs_and_wickets_by_over

### Using plotly

In [None]:
runs_and_wickets_by_overs = go.Scatter(
                    x=runs_and_wickets_by_over['over'],
                    y=runs_and_wickets_by_over['total_runs'],
                    text=runs_and_wickets_by_over['is_wicket'],
                    mode='markers',
                    marker=dict(size=runs_and_wickets_by_over['is_wicket']/10,
                               color=runs_and_wickets_by_over['total_runs']/10,
                               showscale=True)
                )

# Use a separate name for the trace list so the merged DataFrame `data` is not overwritten
traces = [runs_and_wickets_by_overs]

layout = go.Layout(title='Runs and wickets by over',
                  xaxis = dict(title='Over'),
                  yaxis = dict(title='Runs'))

fig = go.Figure(data=traces, layout=layout)

pyo.iplot(fig)

### Using cufflinks

I could not get a colour scale to work in cufflinks, so this version only shows the wickets as marker size.

In [None]:
runs_and_wickets_by_over.iplot(kind='scatter', x='over', y='total_runs', mode='markers',
                               title='Runs and wickets by over',
                               xTitle='Over', yTitle='Runs', 
                               size=runs_and_wickets_by_over['is_wicket']/10)

# Runs distribution over wise

In [None]:
# balls is already loaded above, so the CSV does not need to be read again
runs_overs = balls[['total_runs', 'over']]
runs_overs

### Using plotly

In [None]:
runs_over = go.Box(
                    x=runs_overs['over'],
                    y=runs_overs['total_runs']
                )

# Use a separate name for the trace list so the merged DataFrame `data` is not overwritten
traces = [runs_over]

layout = go.Layout(title='Runs distribution over wise',
                  xaxis = dict(title='Over'),
                  yaxis = dict(title='Runs'))

fig = go.Figure(data=traces, layout=layout)

pyo.iplot(fig)

### Using cufflinks

A simple one-liner doesn't seem to work here: cufflinks draws one box per DataFrame column, so the call below does not split the runs by over.

In [None]:
runs_overs.iplot(kind='box', y='over',title='Runs distribution over wise', xTitle='Over', yTitle='Runs')

The code below gives the correct plot, but it slows the whole notebook down, so I have commented it out.

In [None]:
#runs_overs = balls[['total_runs', 'over']]
#runs_overs.pivot(columns='over', values='total_runs').iplot(kind='box')
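
One lighter option, sketched here as an assumption rather than something tried in this notebook, is to compute the quartiles per over in pandas first and look at the small summary table instead of sending every delivery to the browser. The frame below is made-up data with the same two columns as `runs_overs`.

```python
import pandas as pd

# Made-up deliveries with the same columns as runs_overs
demo = pd.DataFrame({
    'over': [0, 0, 0, 0, 1, 1, 1, 1],
    'total_runs': [0, 1, 2, 3, 4, 4, 4, 4],
})

# One row per over, one column per quartile
summary = demo.groupby('over')['total_runs'].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ['q1', 'median', 'q3']
print(summary)
```

The table has one row per over, so it stays small no matter how many deliveries go in.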

# Runs distribution match wise

In [None]:
# Sum only the runs column for each match
runs_by_match = balls.groupby(by='id')[['total_runs']].sum().reset_index()
runs_by_match

In [None]:
# Keep the trace under its own name so the runs_by_match DataFrame stays available
runs_by_match_hist = go.Histogram(
                    x=runs_by_match['total_runs']
                )

traces = [runs_by_match_hist]

layout = go.Layout(title='Runs distribution match wise',
                  xaxis = dict(title='Runs'))

fig = go.Figure(data=traces, layout=layout)

pyo.iplot(fig)

In [None]:
data = pd.merge(left=matches, right=balls, on='id', how='right')
data['date'] = pd.to_datetime(data['date'])
data['year'] = pd.DatetimeIndex(data['date']).year

# Select the runs column before summing; the date column cannot be summed
runs_by_match_and_year = data.groupby(by=['id','year'])[['total_runs']].sum().reset_index()
runs_by_match_and_year

In [None]:
years = range(2008, 2021)

fig=subplots.make_subplots(rows=len(years), cols=1,
                       subplot_titles=['Year ' + str(year) for year in years],
                       shared_xaxes=True)

fig['layout'].update(title='Runs distribution match wise', width=800, height=5200, showlegend=False, 
                     xaxis=dict(title='Runs'), yaxis=dict(title='Count of matches'))

# One histogram per season, stacked vertically in season order
for row, year in enumerate(years, start=1):
    season_runs = runs_by_match_and_year[runs_by_match_and_year['year']==year]['total_runs']
    fig.append_trace(go.Histogram(x=season_runs), row, 1)

pyo.iplot(fig)

# Runs distribution ball wise

In [None]:
balls['total_runs'].iplot(kind='hist', title='Runs distribution ball wise', xTitle='Runs', 
                          yTitle='Count')

### Testing the different themes

In [None]:
themes = cf.getThemes()
themes

In [None]:
for theme in themes:
    balls['total_runs'].iplot(kind='hist', theme=theme, 
                              title=theme+': Runs distribution ball wise', xTitle='Runs', 
                              yTitle='Count')

# Runs by ball of the over

In [None]:
# Sum only the runs column for each (ball, over) pair
runs_by_ball_of_overs = balls.groupby(by=['ball', 'over'])[['total_runs']].sum().reset_index()
runs_by_ball_of_overs

In [None]:
runs_by_ball_of_over = go.Heatmap(
                    x=runs_by_ball_of_overs['over'],
                    y=runs_by_ball_of_overs['ball'],
                    z=runs_by_ball_of_overs['total_runs']
                )

# Use a separate name for the trace list so the merged DataFrame `data` is not overwritten
traces = [runs_by_ball_of_over]

layout = go.Layout(title='Runs distribution over and ball wise',
                  xaxis = dict(title='Over'),
                  yaxis = dict(title='Ball'))

fig = go.Figure(data=traces, layout=layout)

pyo.iplot(fig)

In [None]:
runs_by_ball_of_overs.iplot(kind='heatmap', x='over', y='ball', z='total_runs',
                           title='Runs distribution over and ball wise', xTitle='Over', 
                            yTitle='Ball')
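
A heatmap is really a grid, so it can help to see the long table reshaped into one row per ball and one column per over. The sketch below does that with `pivot_table` on made-up deliveries; the resulting grid could then be passed to `go.Heatmap` as `z=grid.values`, `x=grid.columns`, `y=grid.index`.

```python
import pandas as pd

# Made-up deliveries with the three columns used for the heatmap
demo = pd.DataFrame({
    'over': [0, 0, 1, 1, 1],
    'ball': [1, 2, 1, 2, 2],
    'total_runs': [1, 4, 0, 6, 2],
})

# Rows are balls, columns are overs, cells hold the summed runs
grid = demo.pivot_table(index='ball', columns='over', values='total_runs',
                        aggfunc='sum', fill_value=0)
print(grid)
```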

# Sharing the experience of plotly and cufflinks

I have covered some of the major plot types using plotly in this notebook. However, the notebook has started to slow down, so I have decided to recreate the above plots using the seaborn/matplotlib libraries and create more plots there. If any plot deserves interactivity, I will surely use plotly.

Thanks to Aditya Mishra for pointing me in the direction of cufflinks, and to all those who gave support and motivation.

I still have a lot to learn about plotly and dash, and I believe this was a good start.

I have not used Dash here, as I realised it can't work in Kaggle, but I am continuing with Dash locally and will surely share the basic code in this very notebook as well.

Thanks for the support, and all criticism is welcome! Feel free to leave any point of improvement, however minor it may be; it will go a long way in helping me improve. Do upvote if you liked even one thing about this notebook.