# 1. Introduction to Plotly

<p>
    1. Until now we have created visualisations with Matplotlib, Seaborn and Pandas. All of them produce
    static image files.<br><br>
    2. Plotly is a company based in Canada, known for products such as Plotly and Dash.<br><br>
    3. Plotly creates interactive visualisations in the form of HTML files.<br><br>
    4. Drawback: it can't work with a live data source.<br><br>
    5. Dash is used to create dashboards based on live data.
</p>

So far, the plots we have made with Matplotlib and Seaborn are static visualizations or image files; we can't interact with them.

Plotly, in contrast, produces interactive visualizations in the form of HTML files. However, Plotly on its own can't work with a live data source.

In summary, while Matplotlib and Seaborn are commonly used for creating static visualizations and exploratory data analysis, Plotly is preferred for creating interactive and web-based visualizations, especially when building data-driven applications and dashboards. The choice between these libraries depends on the specific requirements of your visualization task and your preference for interactivity and customization.

In [2]:
import numpy as np
import pandas as pd
import plotly.offline as pyo
import plotly.graph_objects as go

Importing the dataset

In [3]:
matches = pd.read_csv('matches.csv')
delivery = pd.read_csv('deliveries.csv')
ipl = delivery.merge(matches,left_on='match_id',right_on='id')
ipl.head()

Unnamed: 0,match_id,inning,batting_team,bowling_team,over,ball,batsman,non_striker,bowler,is_super_over,...,result,dl_applied,winner,win_by_runs,win_by_wickets,player_of_match,venue,umpire1,umpire2,umpire3
0,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,1,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
1,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,2,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
2,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,3,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
3,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,4,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
4,1,1,Sunrisers Hyderabad,Royal Challengers Bangalore,1,5,DA Warner,S Dhawan,TS Mills,0,...,normal,0,Sunrisers Hyderabad,35,0,Yuvraj Singh,"Rajiv Gandhi International Stadium, Uppal",AY Dandekar,NJ Llong,
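
To see what the `merge` call does, here is a small sketch on two hypothetical miniature frames (`matches_toy` and `delivery_toy` are made-up stand-ins for the CSV files). Each delivery row picks up the columns of the match whose `id` equals its `match_id`.

```python
import pandas as pd

# Hypothetical miniature versions of the two files
matches_toy = pd.DataFrame({'id': [1, 2], 'season': [2016, 2017]})
delivery_toy = pd.DataFrame({
    'match_id': [1, 1, 2],
    'batsman': ['A', 'B', 'A'],
    'batsman_runs': [4, 0, 6],
})

# Join on delivery_toy.match_id == matches_toy.id
ipl_toy = delivery_toy.merge(matches_toy, left_on='match_id', right_on='id')
print(ipl_toy)
```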


## 1. Scatter Plots

<img src="https://www.mathsisfun.com/data/images/scatter-ice-cream1.svg"/>

### Scatter plots are drawn between two continuous variables
### Problem: We are going to draw a scatter plot between Batsman Avg (X axis) and
### Batsman Strike Rate (Y axis) of the top 50 batsmen in IPL (all time)

In [4]:
# Avg vs SR graph of Top 50 batsman(in terms of total runs)

# Fetching a new dataframe with Top 50 batsman
top50=ipl.groupby('batsman')['batsman_runs'].sum().sort_values(ascending=False).head(50).index.tolist()
new_ipl=ipl[ipl['batsman'].isin(top50)]

# Calculating Strike Rate
# SR=[(number of runs scored)/(number of balls played)]*100

In [5]:
runs=new_ipl.groupby('batsman')['batsman_runs'].sum()
balls=new_ipl.groupby('batsman')['batsman_runs'].count()

sr=(runs/balls)*100

sr=sr.reset_index() # note: the batsman_runs column now holds the strike rate (it should have been named strike_rate)
sr

Unnamed: 0,batsman,batsman_runs
0,AB de Villiers,145.129059
1,AC Gilchrist,133.054662
2,AJ Finch,126.299213
3,AM Rahane,117.486549
4,AT Rayudu,123.014257
5,BB McCullum,126.318203
6,BJ Hodge,121.422376
7,CH Gayle,144.194313
8,DA Miller,137.709251
9,DA Warner,138.318401
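
The strike-rate formula can be checked by hand on a few made-up deliveries. The sketch below follows the same groupby steps on a toy frame and, as one possible fix for the naming issue noted in the comment above, renames the result to `strike_rate` before resetting the index. The frame and the `_toy` names are illustrative only.

```python
import pandas as pd

# Made-up ball-by-ball rows: one row per ball faced
balls_toy = pd.DataFrame({
    'batsman': ['A', 'A', 'A', 'A', 'B', 'B'],
    'batsman_runs': [4, 0, 1, 1, 6, 0],
})

runs_toy = balls_toy.groupby('batsman')['batsman_runs'].sum()     # runs scored
faced_toy = balls_toy.groupby('batsman')['batsman_runs'].count()  # balls faced

# Rename the Series so the column is called strike_rate after reset_index()
sr_toy = ((runs_toy / faced_toy) * 100).rename('strike_rate').reset_index()
print(sr_toy)
```

Batsman A scores 6 runs off 4 balls (strike rate 150) and batsman B scores 6 runs off 2 balls (strike rate 300).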


### Calculating Avg
### Avg=(Total number of Runs)/(Number of outs)
### Calculating number of outs for top 50 batsman

In [6]:
out=ipl[ipl['player_dismissed'].isin(top50)]

nouts=out['player_dismissed'].value_counts()

avg=runs/nouts

avg=avg.reset_index()
avg.rename(columns={'index':'batsman',0:'avg'},inplace=True)

avg=avg.merge(sr,on='batsman')  # note: the batsman_runs column holds the strike rate (it should have been named strike_rate)
avg

Unnamed: 0,batsman,avg,batsman_runs
0,AB de Villiers,38.307692,145.129059
1,AC Gilchrist,27.223684,133.054662
2,AJ Finch,27.186441,126.299213
3,AM Rahane,33.593407,117.486549
4,AT Rayudu,27.146067,123.014257
5,BB McCullum,28.112245,126.318203
6,BJ Hodge,33.333333,121.422376
7,CH Gayle,41.022472,144.194313
8,DA Miller,34.733333,137.709251
9,DA Warner,40.14,138.318401
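
The average calculation relies on pandas aligning two Series by their index labels before dividing. A small sketch with made-up totals and dismissals shows the idea; the names ending in `_toy` are illustrative and not part of the notebook.

```python
import pandas as pd

# Made-up run totals and dismissal records
runs_toy = pd.Series({'A': 120, 'B': 90}, name='batsman_runs')
dismissed_toy = pd.Series(['A', 'B', 'A', 'B', 'B'], name='player_dismissed')

# Number of times each batsman got out
nouts_toy = dismissed_toy.value_counts()

# Division aligns on the index labels (the batsman names)
avg_toy = (runs_toy / nouts_toy).rename('avg').rename_axis('batsman').reset_index()
print(avg_toy)
```

Here A has 120 runs for 2 dismissals (average 60) and B has 90 runs for 3 dismissals (average 30). Naming the Series and its index before `reset_index()` is one way to avoid the separate `rename` step.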


### Plotting a scatter plot using Plotly

To plot a figure with Plotly we call the plot function from 'pyo', which takes a 'fig' as input. The 'fig' is created with the Figure() class from 'go', which takes two inputs, 'data' and 'layout': data is the list of traces we are plotting, and layout holds the details of the plot, such as the title and the axis labels.

This opens an HTML file in the browser, and we can interact with it.

In [17]:
trace = go.Scatter(x=avg['avg'], y=avg['batsman_runs'])
data = [trace]
layout = go.Layout(title='Batsman Avg Vs SR',xaxis={'title':'Batsman Average'},yaxis={'title':'Batsman strike rate'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

By default the points are joined by lines. For a scatter plot we need to remove the lines, which we do with the mode parameter of the Scatter function.

In [7]:
trace = go.Scatter(x=avg['avg'], y=avg['batsman_runs'],mode='markers')
data = [trace]
layout = go.Layout(title='Batsman Avg Vs SR',xaxis={'title':'Batsman Average'},yaxis={'title':'Batsman strike rate'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

Looking at the plot alone, we can't tell which batsman each point belongs to. To see the names we can pass one more parameter, text, with the batsman names as its values. Now, placing the pointer on a dot shows the batsman's name.

In [8]:
trace = go.Scatter(x=avg['avg'], y=avg['batsman_runs'],mode='markers',text=avg['batsman'])
data = [trace]
layout = go.Layout(title='Batsman Avg Vs SR',xaxis={'title':'Batsman Average'},yaxis={'title':'Batsman strike rate'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

We can also change the color of the dots using the marker parameter of the Scatter function.

In [9]:
trace = go.Scatter(x=avg['avg'], y=avg['batsman_runs'],mode='markers',text=avg['batsman'],marker={'color':'#00a65a'})
data = [trace]
layout = go.Layout(title='Batsman Avg Vs SR',xaxis={'title':'Batsman Average'},yaxis={'title':'Batsman strike rate'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

We can also change the size of the dots.

In [12]:
trace = go.Scatter(x=avg['avg'], y=avg['batsman_runs'],mode='markers',text=avg['batsman'],marker={'color':'#00a65a','size':16})
data = [trace]
layout = go.Layout(title='Batsman Avg Vs SR',xaxis={'title':'Batsman Average'},yaxis={'title':'Batsman strike rate'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

We can change the filename of the plot using the filename parameter of the plot function.

In [13]:
trace = go.Scatter(x=avg['avg'], y=avg['batsman_runs'],mode='markers',text=avg['batsman'],marker={'color':'#00a65a','size':16})
data = [trace]
layout = go.Layout(title='Batsman Avg Vs SR',xaxis={'title':'Batsman Average'},yaxis={'title':'Batsman strike rate'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig, filename='myfile.html')

'myfile.html'

## 2. Line Chart

<p>It's an extension of the scatter plot, usually used to show time series data.</p>
<img src='https://apexcharts.com/wp-content/uploads/2018/01/basic-line-chart.svg'/>

### Year by year batsman performance

In [15]:
single=ipl[ipl['batsman']=='V Kohli']
performance=single.groupby('season')['batsman_runs'].sum().reset_index()
performance

Unnamed: 0,season,batsman_runs
0,2008,165
1,2009,246
2,2010,307
3,2011,557
4,2012,364
5,2013,639
6,2014,359
7,2015,505
8,2016,973
9,2017,308


As before, we pass the columns of the dataframe above to go.Scatter.

In [16]:
trace = go.Scatter(x=performance['season'],y=performance['batsman_runs'])
data = [trace]
layout = go.Layout(title='Year by Year performance',xaxis={'title':'Season'},yaxis={'title':'Batsman runs'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

Further, we can customize the colors.

In [19]:
trace = go.Scatter(x=performance['season'],y=performance['batsman_runs'],marker={'color':'#00a65a'})
# Alternative: set mode='lines' to draw the line without markers
#trace = go.Scatter(x=performance['season'],y=performance['batsman_runs'],mode='lines',marker={'color':'#00a65a'})
data = [trace]
layout = go.Layout(title='Year by Year performance',xaxis={'title':'Season'},yaxis={'title':'Batsman runs'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

In [21]:
trace = go.Scatter(x=performance['season'],y=performance['batsman_runs'],marker={'color':'#00a65a'})
# Alternative: set mode='lines+markers' to draw both the line and the points
#trace = go.Scatter(x=performance['season'],y=performance['batsman_runs'],mode='lines+markers',marker={'color':'#00a65a'})
data = [trace]
layout = go.Layout(title='Year by Year performance',xaxis={'title':'Season'},yaxis={'title':'Batsman runs'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

### Multiple line chart

Let's create multiple dataframes of batsman performance.

In [23]:
single=ipl[ipl['batsman']=='V Kohli']
performance=single.groupby('season')['batsman_runs'].sum().reset_index()
performance
single1=ipl[ipl['batsman']=='MS Dhoni']
performance1=single1.groupby('season')['batsman_runs'].sum().reset_index()
performance1

Unnamed: 0,season,batsman_runs
0,2008,414
1,2009,332
2,2010,287
3,2011,392
4,2012,357
5,2013,461
6,2014,371
7,2015,372
8,2016,284
9,2017,290


Now, to plot the line charts, we have to provide multiple traces.

In [25]:
trace = go.Scatter(x=performance['season'],y=performance['batsman_runs'],marker={'color':'#00a65a'})
trace1 = go.Scatter(x=performance1['season'],y=performance1['batsman_runs'],marker={'color':'#7A780A'})
data = [trace,trace1]
layout = go.Layout(title='Year by Year performance',xaxis={'title':'Season'},yaxis={'title':'Batsman runs'})
fig = go.Figure(data=data,layout=layout)
pyo.plot(fig)

'temp-plot.html'

Instead of building each trace by hand, we can create a function that takes any number of batsman names as arguments and builds one trace per name.

In [30]:
def batsman_comp(*name):
    data=[]
    for i in name:
        single=ipl[ipl['batsman']==i]
        performance=single.groupby('season')['batsman_runs'].sum().reset_index()

        trace=go.Scatter(x=performance['season'],y=performance['batsman_runs'],
                         mode='lines+markers',name=i)
        
        data.append(trace)
    
    layout=go.Layout(title='Batsman Record Comparator',
                xaxis={'title':'Season'},
                yaxis={'title':'Runs'})

    fig=go.Figure(data=data,layout=layout)
    pyo.plot(fig,filename='year_by_year.html')

batsman_comp('V Kohli', 'RG Sharma','DA Warner','MS Dhoni')