<img src="https://prismic-io.s3.amazonaws.com/plotly-marketing-website/bd1f702a-b623-48ab-a459-3ee92a7499b4_logo-plotly.svg">

# Introduction to Statistical Charts

In [1]:
import plotly.express as px

In [2]:
df = px.data.tips()

In [3]:
df.head()

,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


# A simple histogram

In [5]:
fig = px.histogram(df, x='total_bill', color='time')
fig.show()

We can see that, in general, total bills are usually between 10 and 20 dollars.

However, there is an outlier above 50 dollars for dinner.

----

# `histfunc` and using `average`

The graph below shows that the average tip grows as `total_bill` grows.

In [7]:
fig = px.histogram(df, x='total_bill', y='tip', histfunc='avg')
fig.show()
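
To make the idea behind `histfunc='avg'` concrete, here is a minimal pure-Python sketch of "average `y` within each bin of `x`". The helper name `average_by_bin` and the bin width of 10 are assumptions made for illustration; Plotly chooses its own bin edges, so the exact bars in the chart may differ.

```python
from collections import defaultdict


def average_by_bin(x_values, y_values, bin_size):
    """Average y within fixed-width bins of x (hypothetical helper, not a Plotly API)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(x_values, y_values):
        bin_start = (x // bin_size) * bin_size  # left edge of the bin containing x
        sums[bin_start] += y
        counts[bin_start] += 1
    # One average per bin, ordered by the bin's left edge
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# First five rows of the tips dataset shown above
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59]
tip = [1.01, 1.66, 3.50, 3.31, 3.61]

averages = average_by_bin(total_bill, tip, bin_size=10)
for bin_start, avg_tip in averages.items():
    print(f"bills from {bin_start:.0f} to {bin_start + 10:.0f}: average tip {avg_tip:.2f}")
```

With these five rows, the bin starting at 20 has a higher average tip than the bin starting at 10, which is the same pattern the chart shows on the full dataset.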

-----

# Comparing distributions with `Distplot` using `Figure Factory`

To use distplot, we need to import figure factory.

In [8]:
import plotly.figure_factory as ff

In [9]:
# separate out lunch and dinner tips
lunch_tips = df[df['time'] == 'Lunch']['tip']
dinner_tips = df[df['time'] =='Dinner']['tip']

In [12]:
# organize labels and data
group_labels = ['Lunch', 'Dinner']
hist_data = [lunch_tips, dinner_tips]

In [13]:
fig = ff.create_distplot(hist_data, group_labels)
fig.show()

Based on the distributions, dinner tips generally sit at the higher end compared with lunch tips.

Lunch has a higher proportion of low tip amounts than dinner.
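
The smooth curve that `create_distplot` draws by default (`curve_type='kde'`) is a kernel density estimate. The sketch below shows the idea in pure Python: place one Gaussian bump on each observation and add them up. The function name `gaussian_kde_sketch` and the hand-picked bandwidth of 0.5 are assumptions for illustration; a library implementation would normally choose the bandwidth from the data.

```python
import math


def gaussian_kde_sketch(samples, bandwidth):
    """Return a density function built from Gaussian kernels (illustrative only)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump centred on each observation
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Tip values from the first five rows of the tips dataset shown above
tips_sample = [1.01, 1.66, 3.50, 3.31, 3.61]
density = gaussian_kde_sketch(tips_sample, bandwidth=0.5)  # bandwidth chosen by hand

# Approximate the area under the curve on a grid from -5.00 to 10.00
step = 0.01
grid = [i * step for i in range(-500, 1001)]
area = sum(density(x) for x in grid) * step
print(f"area under the curve: {area:.3f}")
```

Because the curve is a density, the area under it is approximately 1, which is why two groups of different sizes (such as lunch and dinner) can be compared on the same axis.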

-------

#### Rug Plot on bottom
Each observation is marked individually in the rug plot at the bottom of the chart.

## Removing the `curve` on distplot
* `show_curve=False`

In [15]:
fig = ff.create_distplot(hist_data, group_labels, show_curve=False)
fig.show()

Now there is no curve line in the chart.

## Changing bin size
* `bin_size`

In [17]:
fig = ff.create_distplot(hist_data, group_labels, show_curve=False, bin_size=0.5)
fig.show()
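
As a rough illustration of what a smaller bin size does, the pure-Python sketch below counts the same values with two different bin widths. The helper `bin_counts` is hypothetical, and it aligns bins to multiples of the bin size, which may not match how Plotly positions its bins.

```python
import math
from collections import Counter


def bin_counts(values, bin_size):
    """Count values per fixed-width bin, keyed by the bin's left edge (illustrative helper)."""
    return Counter(math.floor(v / bin_size) * bin_size for v in values)


# Tip values from the first five rows of the tips dataset shown above
tips_sample = [1.01, 1.66, 3.50, 3.31, 3.61]

coarse = bin_counts(tips_sample, bin_size=1)    # wide bins: fewer, taller bars
fine = bin_counts(tips_sample, bin_size=0.5)    # narrow bins: more, shorter bars

print(sorted(coarse.items()))
print(sorted(fine.items()))
```

Halving the bin width splits the observations across more bars, so the histogram shows more detail but each bar is based on fewer observations.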

-------

# Fast Exploratory Data Analysis with `Scatter Matrix`

* This lets us create a scatter plot for every pair of the dimensions we specify.

In [18]:
# we will use iris dataset
iris = px.data.iris()
iris.head(2)

,sepal_length,sepal_width,petal_length,petal_width,species,species_id
0,5.1,3.5,1.4,0.2,setosa,1
1,4.9,3.0,1.4,0.2,setosa,1


In [19]:
iris.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species',
       'species_id'],
      dtype='object')

In [21]:
fig = px.scatter_matrix(iris, 
                 dimensions = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
                 color = 'species')

fig.show()

As the diagonal information is not that useful, we will remove it.

## Removing the diagonal with `diagonal_visible=False`

* `fig.update_traces(diagonal_visible=False)`

In [24]:
fig = px.scatter_matrix(iris, 
                 dimensions = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
                 color = 'species')

fig.update_traces(diagonal_visible=False)

fig.show()

------

# Generate a Correlation Matrix with `px.imshow()`

In [25]:
corr_matrix = iris.drop(columns=['species', 'species_id']).corr()  # keep only the numeric measurement columns
corr_matrix

,sepal_length,sepal_width,petal_length,petal_width
sepal_length,1.0,-0.109369,0.871754,0.817954
sepal_width,-0.109369,1.0,-0.420516,-0.356544
petal_length,0.871754,-0.420516,1.0,0.962757
petal_width,0.817954,-0.356544,0.962757,1.0


In [26]:
fig = px.imshow(corr_matrix, 
                x = corr_matrix.index,
                y = corr_matrix.index,)

fig.show()
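
Each cell of the matrix above is a Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. The pure-Python sketch below shows the calculation on a small made-up dataset; the helper names `pearson` and `correlation_matrix` and the toy columns are assumptions for illustration, not part of pandas or Plotly.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict with the correlation of every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up columns: b rises with a, c falls as a rises
toy = {
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.0, 6.0, 8.0],
    'c': [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Values close to 1 mean two columns rise together, values close to -1 mean one falls as the other rises, and the diagonal is always 1 because every column is perfectly correlated with itself.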

-------

# Plotly Color Scales and Sequences

https://plotly.com/python/builtin-colorscales/

It plays the same role as `palette` or `cmap` in seaborn and matplotlib.

In [27]:
fig = px.imshow(corr_matrix, 
                x = corr_matrix.index,
                y = corr_matrix.index,
                color_continuous_scale = px.colors.sequential.Viridis)

fig.show()

In [28]:
fig = px.imshow(corr_matrix, 
                x = corr_matrix.index,
                y = corr_matrix.index,
                color_continuous_scale = ['red', 'white', 'green'])

fig.show()
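
A list such as `['red', 'white', 'green']` defines color stops, and values in between are shown as blends of the neighboring stops. The pure-Python sketch below illustrates that idea with simple linear interpolation in RGB. The helper `interpolate_color`, the evenly spaced stops, and the value range of -1 to 1 are assumptions for illustration and are not Plotly's actual implementation.

```python
def interpolate_color(value, vmin, vmax, stops):
    """Map value in [vmin, vmax] to an RGB tuple by blending evenly spaced color stops."""
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)            # clamp to the ends of the scale
    scaled = t * (len(stops) - 1)        # position measured in "stop" units
    i = min(int(scaled), len(stops) - 2) # index of the stop to the left
    frac = scaled - i                    # how far we are towards the next stop
    left, right = stops[i], stops[i + 1]
    return tuple(round(a + (b - a) * frac) for a, b in zip(left, right))


# RGB values of the CSS colors red, white and green
RED, WHITE, GREEN = (255, 0, 0), (255, 255, 255), (0, 128, 0)
stops = [RED, WHITE, GREEN]

lowest = interpolate_color(-1.0, -1.0, 1.0, stops)
middle = interpolate_color(0.0, -1.0, 1.0, stops)
highest = interpolate_color(1.0, -1.0, 1.0, stops)
print(lowest, middle, highest)
```

In this sketch white lands exactly on zero because the range is symmetric. By default Plotly spreads the scale over the range of the data, so for a correlation matrix it may be worth passing `zmin=-1, zmax=1` or `color_continuous_midpoint=0` to `px.imshow` so that the middle color corresponds to zero correlation.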

In [29]:
fig = px.imshow(corr_matrix, 
                x = corr_matrix.index,
                y = corr_matrix.index,
                color_continuous_scale = px.colors.diverging.BrBG)

fig.show()