# This notebook focuses on data visualization with the Python plotting libraries matplotlib, seaborn, and plotly.
<br>

**I collected data-visualization tips and code for matplotlib, seaborn (sns), and plotly that I use for work and for Kaggle.<br>
If you find it useful, please upvote it.**

## I referred to the following links.<br>

#### [An excellent cheatsheet on Matplotlib](https://www.kaggle.com/general/140923#796614) <br>
<img src="https://raw.githubusercontent.com/rougier/matplotlib-cheatsheet/master/matplotlib-cheatsheet.png" width="500">
<br>

#### [Cheat sheet and various graph types](https://qiita.com/4m1t0/items/76b0033edb545a78cef5)<br>
<img src="https://camo.qiitausercontent.com/35c1430c8117d9b93fa4ab698c1b4b5019225862/68747470733a2f2f71696974612d696d6167652d73746f72652e73332e616d617a6f6e6177732e636f6d2f302f3130383732392f38333332653831372d393261302d373834382d366438332d3339376230346630633735662e706e67" width="500">

#### [Python graph gallery](https://www.python-graph-gallery.com/)

<img src="https://matplotlib.org/matplotblog/posts/python-graph-gallery.com/sections-overview.png" width="500">

#### [Plotly](https://plotly.com/python/)

<img src="https://upload.wikimedia.org/wikipedia/en/0/0a/Gallery_of_Plotly_Graphs.png" width="500">

#### [Plotly Tutorial for Beginners](https://www.kaggle.com/kanncaa1/plotly-tutorial-for-beginners)


#### [matplotlib basics I wish I had known earlier](https://qiita.com/skotaro/items/08dc0b8c5704c94eafb9)

#### [Hatched heatmaps with matplotlib](https://www.anarchive-beta.com/entry/2021/10/01/100507)


## plot tools

#### matplotlib

#### [seaborn](http://seaborn.pydata.org/tutorial/function_overview.html)
- Overview of seaborn plotting functions

<img src="http://seaborn.pydata.org/_images/function_overview_8_0.png" width="500">

#### [plotly](https://plotly.com/python/)

<img src="https://www.statworx.com/wp-content/uploads/plotly-structure-chart-infographik.png" width="500">

Reference: https://www.statworx.com/de/blog/plotly-an-interactive-charting-library/



## plotly Tips

### Three ways to create a figure

#### 1. Build the "fig" data structure directly

A plotly "fig" object has "data" and "layout" keys.<br>
You can build this structure as a plain dict, created with dict() and modified with dict.update().<br>
```
fig = dict({
    "data": [{"type": "bar",
              "x": [1, 2, 3],
              "y": [1, 3, 2]}],
    "layout": {"title": {"text": "A Figure Specified By Python Dictionary"}}
})
```

#### 2. Use graph objects

Create a blank "fig" object with go.Figure() and add data to it with add_trace().

```
fig = go.Figure()  # start from a blank figure
fig.add_trace(go.Bar(x=group["Fruit"], y=group["Area1"], name=contestant))  # more keyword arguments as needed
fig.add_trace(go.Bar(x=group["Fruit"], y=group["Area2"], name=contestant))
# fig.add_trace(...)  # one call per additional trace
```

#### 3. Use plotly express

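This is the most convenient way to plot a pandas DataFrame.<br>
You choose the plot type through the function (e.g. px.bar), pass the column names for the x and y axes, and add layout options.<br>
Because each argument refers to a single column, wide-format data usually has to be reshaped to long format with melt() first.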
```
fig = px.bar(df, x="Fruit", y="Number Eaten", color="Contestant", barmode="group")
```

#### Note

All three approaches produce the same underlying "fig" data structure.<br>
The dict built in approach 1 can equally be created with approach 2 or 3.

# Plotly Data Structure

Reference: https://dodotechno.com/covd-19-visualization-plotly/

## W/O slider/animation

<img src="https://dodotechno.com/wp-content/uploads/2020/04/plotly-structure-1.png" width="500">

## W/ slider/animation
<img src="https://dodotechno.com/wp-content/uploads/2020/04/plotly-structure-2-1.png" width="500">

In [None]:
!pip install joypy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from joypy import joyplot
import tqdm

%matplotlib inline

path = '../input/ventilator-pressure-prediction'
train = pd.read_csv(f"{path}/train.csv")

target_column = "R"

# Pressure range histogram for each time step id

In [None]:
# number the rows of each breath 1..80 (every breath_id has 80 rows)
train["time_step_id"] = train.groupby("breath_id").cumcount() + 1
range_bins = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
bins_name = ['~5', '~10', '~15', '~20', '~25', '~30', '~35', '~40', '~45', '~50', '~55', '~60', '~65']
train["pressure_range"] = pd.cut(train["pressure"], bins=range_bins, labels=bins_name)
# count the rows in each pressure range per time_step_id
# (DataFrame.append, used here originally, was removed in pandas 2.0)
tmp_df = train.groupby("time_step_id")["pressure_range"].value_counts().unstack()
tmp_df.columns = tmp_df.columns.astype(str)
tmp_df = tmp_df.reindex(columns=bins_name)
ax = sns.heatmap(tmp_df)
ax.set_xlabel("pressure_range", fontsize = 20)
ax.set_ylabel("time_step_id", fontsize = 20)
plt.show()

# For the competition

## Check that time_step is evenly spaced (linear)

In [None]:
time_step_diff_limit = 0.04
non_liner_timestep_breath_ids = list()
for k, grp in train.groupby("breath_id"):
    diff_se = grp["time_step"].diff()
    diff_chk = diff_se[diff_se > time_step_diff_limit]
    if len(diff_chk) != 0:
        non_liner_timestep_breath_ids.append(k)
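The same check can be done without a Python loop over groups. A minimal sketch, assuming the `breath_id`/`time_step` columns of train.csv with rows in time order; the helper name `find_uneven_breaths` and the toy data are hypothetical:

```python
import pandas as pd

def find_uneven_breaths(df, limit=0.04):
    # time_step difference to the previous row within each breath
    # (the first row of every breath gets NaN, which never exceeds `limit`)
    diffs = df.groupby("breath_id")["time_step"].diff()
    # breath_ids with at least one gap larger than `limit`
    return df.loc[diffs > limit, "breath_id"].unique().tolist()

# Toy data: breath 2 has one large gap
demo = pd.DataFrame({
    "breath_id": [1, 1, 1, 2, 2, 2],
    "time_step": [0.00, 0.03, 0.06, 0.00, 0.03, 0.20],
})
uneven = find_uneven_breaths(demo)
```

On the competition data, `find_uneven_breaths(train, time_step_diff_limit)` should return the same ids as the loop above.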

## Visualize breaths whose time_step is not linear

In [None]:
non_liner_timestep_df = train[train["breath_id"].isin(non_liner_timestep_breath_ids)]
fig = go.Figure()
for k,grp in non_liner_timestep_df.groupby("breath_id"):
    grp = grp.reset_index(drop=True)
    fig.add_trace(go.Scatter(x=grp.index, y=grp["time_step"], mode='lines', name=k))
fig.show()

## Visualize breaths whose time_step is linear

In [None]:
liner_timestep_df = train[~train["breath_id"].isin(non_liner_timestep_breath_ids)]
fig = go.Figure()
for k,grp in liner_timestep_df[:80*10000].groupby("breath_id"):
    grp = grp.reset_index(drop=True)
    fig.add_trace(go.Scatter(x=grp.index, y=grp["time_step"], mode='lines', name=k))
fig.show()

## Find breaths with negative pressure

In [None]:
minus_pressure_breath_ids = list()
for k, grp in train.groupby("breath_id"):
    m = grp["pressure"].min()
    if m < 0:
        minus_pressure_breath_ids.append(k)

## Visualize breaths with negative pressure

In [None]:
minus_pressure_df = train[train["breath_id"].isin(minus_pressure_breath_ids)]
minus_pressure_df_plotly = pd.melt(minus_pressure_df,id_vars=["time_step","breath_id"], value_vars=["pressure"])
fig = px.line(minus_pressure_df_plotly, x="time_step" , y="value",color = "variable",line_group ="breath_id")
fig.update_traces(line_color='rgba(0, 0, 255, 0.1)')  # make every line translucent blue
fig.show()

## Count the steps with u_out = 1 in each breath

In [None]:
u_out_open_step_counts = list()
for k, grp in train.groupby("breath_id"):
    count = (grp["u_out"] == 1).sum()  # 0 instead of a KeyError if a breath has no u_out = 1
    u_out_open_step_counts.append(count)
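Because `u_out` is a 0/1 flag, the per-breath counts can also be computed in one vectorized step. A sketch with a hypothetical helper and toy data:

```python
import pandas as pd

def count_u_out_open_steps(df):
    # number of rows with u_out == 1 in each breath (0 if there are none)
    return (df["u_out"] == 1).groupby(df["breath_id"]).sum()

# Toy data: breath 1 has two open steps, breath 2 has none
demo = pd.DataFrame({
    "breath_id": [1, 1, 1, 2, 2, 2],
    "u_out": [0, 1, 1, 0, 0, 0],
})
counts = count_u_out_open_steps(demo)
```

On train.csv, `count_u_out_open_steps(train).tolist()` should match `u_out_open_step_counts`, and `(count_u_out_open_steps(train) >= 52).sum()` gives the number of breaths with 52 or more open steps.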

## Histogram of the u_out = 1 step counts

In [None]:
fig = px.histogram(x=u_out_open_step_counts,nbins=8)
fig.update_layout(title="u_out = 1 count histogram in train")
fig.show()

## Number of breaths with 52 or more u_out = 1 steps

In [None]:
u_out_open_step_counts_over52 = list()
for k, grp in train.groupby("breath_id"):
    count = (grp["u_out"] == 1).sum()  # 0 instead of a KeyError if a breath has no u_out = 1
    if count > 51:
        u_out_open_step_counts_over52.append(count)
len(u_out_open_step_counts_over52)

# Histogram

Histogram of "R" in train.csv

## matplotlib

https://pythondatascience.plavox.info/matplotlib/%E3%83%92%E3%82%B9%E3%83%88%E3%82%B0%E3%83%A9%E3%83%A0

In [None]:
plt.hist(train[target_column])

## seaborn 

http://seaborn.pydata.org/generated/seaborn.distplot.html

(`distplot` is deprecated; the cell below uses its replacement, `histplot`.)

In [None]:
sns.histplot(train[target_column])

## plotly

https://plotly.com/python/histograms/

In [None]:
fig = px.histogram(train[target_column])
fig.show()

# Single Line Chart

"u_in" line chart of "breath_id" == 1

## matplotlib

https://pythondatascience.plavox.info/matplotlib/%E6%8A%98%E3%82%8C%E7%B7%9A%E3%82%B0%E3%83%A9%E3%83%95



In [None]:
breath_id_1_df = train[train["breath_id"] == 1]
plt.plot(breath_id_1_df["time_step"] , breath_id_1_df["u_in"])

#### multi line

## seaborn 

https://seaborn.pydata.org/generated/seaborn.lineplot.html

In [None]:
sns.lineplot(x="time_step" , y="u_in", data=breath_id_1_df)

## plotly

https://plotly.com/python/line-charts/

In [None]:
fig = px.line(breath_id_1_df, x="time_step" , y="u_in")
fig.show()

# Multiple Line Chart (one breath_id)

Line chart of "u_in" and "u_out" for "breath_id" == 1

## matplotlib

https://pythondatascience.plavox.info/matplotlib/%E6%8A%98%E3%82%8C%E7%B7%9A%E3%82%B0%E3%83%A9%E3%83%95

In [None]:
plt.plot(breath_id_1_df["time_step"] , breath_id_1_df[["u_in","u_out"]])

## seaborn
https://seaborn.pydata.org/generated/seaborn.lineplot.html<br>

The "u_in" and "u_out" columns need to be reshaped from wide to long format.
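To see what this reshape does, here is `pd.melt` on two made-up rows (values are illustrative, not taken from train.csv). The `id_vars` columns are repeated, and each column listed in `value_vars` turns into rows labelled by `variable` and `value`:

```python
import pandas as pd

# Illustrative wide rows for one breath
wide = pd.DataFrame({
    "breath_id": [1, 1],
    "time_step": [0.0, 0.03],
    "u_in": [0.08, 18.4],
    "u_out": [0, 0],
})

# keep breath_id/time_step as identifiers, stack u_in and u_out
long_df = pd.melt(wide, id_vars=["breath_id", "time_step"], value_vars=["u_in", "u_out"])
```

`long_df` has four rows: two for `u_in` followed by two for `u_out`, which is the shape `hue="variable"` expects.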

In [None]:
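# value_vars keeps only u_in and u_out (train also holds helper columns added earlier)
breath_id_1_df_sns = pd.melt(breath_id_1_df, id_vars=["id","breath_id","R","C","time_step","pressure"], value_vars=["u_in","u_out"])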
sns.lineplot(x="time_step" , y="value",hue = "variable", data=breath_id_1_df_sns)

## plotly
https://plotly.com/python/line-charts/<br>
https://plotly.com/python-api-reference/generated/plotly.express.line<br>

The "u_in" and "u_out" columns need to be reshaped from wide to long format.

In [None]:
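# value_vars keeps only u_in and u_out (train also holds helper columns added earlier)
breath_id_1_df_plotly = pd.melt(breath_id_1_df, id_vars=["id","breath_id","R","C","time_step","pressure"], value_vars=["u_in","u_out"])
fig = px.line(breath_id_1_df_plotly, x="time_step", y="value", color="variable")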
fig.show()

# Multiple Line Chart ("u_in" of every breath_id in the first 10,000 rows)

## matplotlib

https://pythondatascience.plavox.info/matplotlib/%E6%8A%98%E3%82%8C%E7%B7%9A%E3%82%B0%E3%83%A9%E3%83%95

In [None]:
for key, grp in train[:10000].groupby('breath_id'):
    plt.plot(grp["time_step"],grp['u_in'],'g-', alpha=0.01)
plt.show()

## seaborn

The "u_in" column needs to be reshaped from wide to long format.

https://seaborn.pydata.org/generated/seaborn.lineplot.html<br>

https://www.delftstack.com/ja/howto/seaborn/remove-legend-seaborn-plot/

melt<br>
https://pandas.pydata.org/docs/reference/api/pandas.melt.html

In [None]:
all_u_in_df_sns = pd.melt(train[:10000],id_vars=["id", "u_out","R","C","time_step","pressure","breath_id"], value_vars=["u_in"])
sns.lineplot(x="time_step" , y="value",hue = "variable", data=all_u_in_df_sns, units="breath_id",estimator=None,alpha  = 0.01)

## plotly

Use line_group to draw one line per breath_id<br>
https://plotly.com/python-api-reference/generated/plotly.express.line<br>

In [None]:
all_u_in_df_plotly = pd.melt(train[:10000],id_vars=["id", "u_out","R","C","time_step","pressure","breath_id"], value_vars=["u_in"])
fig = px.line(all_u_in_df_plotly, x="time_step" , y="value",color = "variable",line_group ="breath_id")
fig.update_traces(line_color='rgba(0, 0, 255, 0.01)')  # make every line translucent blue
fig.show()


# Ridgeline plot (joy plot)

Plot the u_in distribution of each breath. For this data the result turns out not to be very useful.

## matplotlib

https://glowingpython.blogspot.com/2020/03/ridgeline-plots-in-pure-matplotlib.html

In [None]:
## TBD
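Until this cell is filled in, here is a minimal pure-matplotlib/NumPy sketch (not the code from the glowingpython post above). Each group gets a Gaussian kernel density estimate with a Silverman rule-of-thumb bandwidth, scaled to height 1 and shifted down by a fixed step so neighbouring ridges overlap. The helpers `gaussian_kde_1d` and `ridgeline` and the synthetic demo data are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def gaussian_kde_1d(values, grid, bandwidth=None):
    """Hypothetical helper: Gaussian KDE of `values`, evaluated on `grid`."""
    values = np.asarray(values, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb; fall back to 1.0 for constant or single values
        bandwidth = 1.06 * values.std(ddof=1) * len(values) ** (-1 / 5)
        if not bandwidth > 0:
            bandwidth = 1.0
    z = (grid[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(values) * bandwidth * np.sqrt(2 * np.pi))


def ridgeline(groups, overlap=0.5, grid_points=200):
    """Hypothetical helper: one filled density curve per group, stacked vertically."""
    labels = list(groups)
    all_values = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
    grid = np.linspace(all_values.min(), all_values.max(), grid_points)
    step = 1 - overlap  # vertical distance between neighbouring baselines
    fig, ax = plt.subplots(figsize=(8, 1 + 0.6 * len(labels)))
    for i, label in enumerate(labels):
        density = gaussian_kde_1d(groups[label], grid)
        density = density / density.max()  # every ridge has height 1
        base = -i * step  # later groups sit lower
        # lower ridges get a higher zorder so they are drawn in front
        ax.fill_between(grid, base, base + density, alpha=0.8, zorder=2 * i)
        ax.plot(grid, base + density, color="black", linewidth=0.8, zorder=2 * i + 1)
    ax.set_yticks([-i * step for i in range(len(labels))])
    ax.set_yticklabels(labels)
    return fig, ax


# Synthetic demo data: five groups with shifted means
rng = np.random.default_rng(0)
demo = {f"breath {i}": rng.normal(loc=5 * i, scale=2, size=80) for i in range(5)}
fig, ax = ridgeline(demo)
```

With the competition data, something like `ridgeline({k: g["u_in"].to_numpy() for k, g in train[:1000].groupby("breath_id")})` should give a plot comparable to the joyplot cells in this section.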

## seaborn / joypy

https://seaborn.pydata.org/examples/kde_ridgeplot.html<br>
https://www.python-graph-gallery.com/ridgeline-graph-seaborn<br>
https://towardsdatascience.com/ridgeline-plots-the-perfect-way-to-visualize-data-distributions-with-python-de99a5493052

In [None]:
#all_u_in_df_sns = pd.melt(train[:100],id_vars=["id", "u_out","R","C","time_step","pressure","breath_id"], value_vars=["u_in"])
joyplot(
    data=train[:1000][['breath_id', 'u_in']], 
    by='breath_id',
    figsize=(12, 8)
)
plt.show()

## seaborn / joypy (plot "u_in" and "u_out")

In [None]:
joyplot(
    data=train[:1000][['breath_id', 'u_in','u_out']], 
    by='breath_id',
    column=['u_in', 'u_out'],
    color=['#686de0', '#eb4d4b'],
    figsize=(12, 8)
)
plt.show()

## plotly

https://plotly.com/python/violin/

In [None]:
fig = go.Figure()
for key, grp in train[:1000].groupby('breath_id'):
    fig.add_trace(go.Violin(x=grp["u_in"], name=str(key)))  # label each ridge with its breath_id
fig.update_traces(orientation='h', side='positive', width=3, points=False)
fig.update_layout(xaxis_showgrid=False, xaxis_zeroline=False)
fig.show()

# sequential line charts (3D line chart?)

## matplotlib
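Matplotlib can draw a similar 3D line chart with its built-in 3D axes (`projection="3d"`). A minimal sketch; the helper `plot_3d_lines` is hypothetical and the demo DataFrame is synthetic, standing in for `train[:80*10]`:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_3d_lines(df, x="time_step", y="u_in", z="breath_id"):
    """Hypothetical helper: one 3D line per value of `z`, placed at depth z."""
    fig = plt.figure(figsize=(8, 6))
    ax = fig.add_subplot(projection="3d")  # matplotlib's built-in 3D axes
    for key, grp in df.groupby(z):
        # the third positional argument holds the z coordinates
        ax.plot(grp[x].to_numpy(), grp[y].to_numpy(), np.full(len(grp), key))
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_zlabel(z)
    return fig, ax


# Synthetic stand-in for train[:80*10]: 3 breaths x 80 steps
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "breath_id": np.repeat([1, 2, 3], 80),
    "time_step": np.tile(np.linspace(0, 3, 80), 3),
    "u_in": rng.uniform(0, 30, 240),
})
fig, ax = plot_3d_lines(demo)
```

On the competition data, `plot_3d_lines(train[:80*10])` should draw the same 10 breaths as the plotly 3D cells in this section.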

## seaborn

## plotly : 3d scatter plot

https://plotly.com/python/3d-scatter-plots/

Easy to make, but not very useful here.

In [None]:
fig = px.scatter_3d(train[:80*10], x='time_step', y='u_in', z='breath_id',
              color='breath_id')
fig.update_layout(title={"text" : "u_in"})
fig.show()

## plotly : 3d line plot

In [None]:
fig = px.line_3d(train[:80*10], x='time_step', y='u_in', z='breath_id',
              color='breath_id')
fig.update_layout(title={"text" : "u_in"})
fig.show()

## plotly : use slider

In [None]:
ymax = train[:80*10]["u_in"].max()  # largest u_in among the plotted breaths
fig = px.line(train[:80*10], x='time_step', y='u_in', animation_frame='breath_id',color='breath_id',range_y=[0,ymax])
graphvar = {
    "xaxis": {"tickangle": -45, "tickfont": {"size": 12}, "showline": True, "linewidth": 2, "linecolor": "black", "mirror": True},
    "yaxis": {"title": {"font": {"size": 16}}, "rangemode": "tozero", "gridcolor": "rgb(200,200,200)"},
    "showlegend": True,
    "plot_bgcolor": "white",
}
fig.update_layout(graphvar)
fig.update_layout(title={"text" : "u_in"})
fig.show()

# Grouped Bar Chart

## matplotlib

https://pythonspot.com/matplotlib-bar-chart/<br>
https://matplotlib.org/stable/gallery/lines_bars_and_markers/barchart.html

In [None]:
fig, ax = plt.subplots(figsize=(16, 4))
width = 0.35
index = np.arange(len(breath_id_1_df["time_step"]))
rects1 = ax.bar(index - width/2, breath_id_1_df["u_in"], width, label="u_in")
rects2 = ax.bar(index + width/2, breath_id_1_df["u_out"], width, label="u_out")
# one tick per group, centred between its two bars
ax.set_xticks(index)
ax.set_xticklabels(breath_id_1_df["time_step"].round(2).astype(str), rotation=90)
ax.legend()
plt.show()

## seaborn

https://seaborn.pydata.org/generated/seaborn.barplot.html

The "u_in" and "u_out" columns need to be reshaped from wide to long format.

In [None]:
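# value_vars keeps only u_in and u_out (train also holds helper columns added earlier)
breath_id_1_df_sns = pd.melt(breath_id_1_df, id_vars=["id","breath_id","R","C","time_step","pressure"], value_vars=["u_in","u_out"])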
sns.barplot(x="time_step" , y="value",hue = "variable", data=breath_id_1_df_sns)

## plotly

https://plotly.com/python/bar-charts/

The "u_in" and "u_out" columns need to be reshaped from wide to long format.

In [None]:
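# value_vars keeps only u_in and u_out (train also holds helper columns added earlier)
breath_id_1_df_plotly = pd.melt(breath_id_1_df, id_vars=["id","breath_id","R","C","time_step","pressure"], value_vars=["u_in","u_out"])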
fig = px.bar(breath_id_1_df_plotly, x="time_step" , y="value",color = "variable", barmode='group')
fig.show()

# Stacked Bar Chart

## matplotlib

https://pythonspot.com/matplotlib-bar-chart/<br>
https://matplotlib.org/stable/gallery/lines_bars_and_markers/bar_stacked.html

In [None]:
fig, ax = plt.subplots(figsize=(16, 4))
width = 0.35
index = np.arange(len(breath_id_1_df["time_step"]))
rects1 = ax.bar(index, breath_id_1_df["u_in"], width, label="u_in")
# same x positions; each u_out bar starts where the u_in bar ends
rects2 = ax.bar(index, breath_id_1_df["u_out"], width, bottom=breath_id_1_df["u_in"], label="u_out")
ax.set_xticks(index)
ax.set_xticklabels(breath_id_1_df["time_step"].round(2).astype(str), rotation=90)
ax.legend()
plt.show()

## seaborn

https://seaborn.pydata.org/generated/seaborn.barplot.html<br>
https://www.delftstack.com/ja/howto/seaborn/stacked-barplots-seaborn/

In [None]:
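## sns.barplot has no built-in stacking option. One workaround (delftstack link above) overlays two barplots, with the back layer showing the total height.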
#s2 = sns.barplot(x="time_step" , y="u_in", data=breath_id_1_df, color = 'blue')
#s1 = sns.barplot(x="time_step" , y="u_out", data=breath_id_1_df, color = 'red')

## plotly

https://plotly.com/python/bar-charts/

The "u_in" and "u_out" columns need to be reshaped from wide to long format.

In [None]:
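# value_vars keeps only u_in and u_out (train also holds helper columns added earlier)
breath_id_1_df_plotly = pd.melt(breath_id_1_df, id_vars=["id","breath_id","R","C","time_step","pressure"], value_vars=["u_in","u_out"])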
fig = px.bar(breath_id_1_df_plotly, x="time_step" , y="value",color = "variable")
fig.show()

# Heatmap


## Sample data: peak u_in vs. "R" and "C"

I referred to the following.<br>
[histogram heatmap with custom range bins](https://pbpython.com/pandas-qcut-cut.html)

In [None]:
## add "R_C"
train['R_C'] = [f'{r:02}_{c}' for r, c in zip(train['R'], train['C'])]

In [None]:
## create custom range bins with considering negative u_in value
range_bins = pd.interval_range(start=-10, freq=10, end=100)
range_bins

In [None]:
## keep "R_C" in the index for the next step
max_each_berath_id = train.groupby(['breath_id','R_C'])["u_in"].max()
max_each_berath_id

There is probably a cleaner way to write this.

### NOTE: the tmp_df columns are pandas.Interval objects and need to be cast to strings.

In [None]:
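## one row per R_C with the number of breaths in each u_in range
## (DataFrame.append was removed in pandas 2.0, so the rows are collected and concatenated)
rows = [pd.cut(grp, bins=range_bins).value_counts(sort=False).rename(k)
        for k, grp in max_each_berath_id.groupby("R_C")]
tmp_df = pd.concat(rows, axis=1).T
tmp_df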
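One possibly cleaner alternative is to bin the per-breath peaks once and let `pd.crosstab` do the counting. A sketch with a hypothetical helper and toy data; unlike `tmp_df` above, its columns are already strings such as "(0, 10]", in bin order:

```python
import pandas as pd


def u_in_peak_range_counts(df, bins):
    """Hypothetical helper: count breaths per (R_C, peak-u_in range)."""
    # peak u_in of every breath, keeping R_C alongside it
    peak = df.groupby(["breath_id", "R_C"], as_index=False)["u_in"].max()
    peak["u_in_range"] = pd.cut(peak["u_in"], bins=bins)
    peak = peak.dropna(subset=["u_in_range"])  # peaks outside every bin
    labels = [str(iv) for iv in bins]          # e.g. "(-10, 0]"
    return (pd.crosstab(peak["R_C"], peak["u_in_range"].astype(str))
              .reindex(columns=labels, fill_value=0))


# Toy data: three breaths, two R_C groups
demo = pd.DataFrame({
    "breath_id": [1, 1, 2, 2, 3, 3],
    "R_C": ["05_10", "05_10", "05_10", "05_10", "20_50", "20_50"],
    "u_in": [3, 8, 25, 12, -4, -1],
})
result = u_in_peak_range_counts(demo, pd.interval_range(start=-10, freq=10, end=100))
```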

## matplotlib

In [None]:
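# order the Interval columns from low to high before plotting
plot_df = tmp_df.iloc[:, np.argsort([iv.left for iv in tmp_df.columns])]
plt.figure()
plt.imshow(plot_df)
plt.xticks(range(plot_df.shape[1]), plot_df.columns.astype(str), rotation=90)
plt.yticks(range(plot_df.shape[0]), plot_df.index)
plt.colorbar()
plt.show()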

## seaborn

In [None]:
tmp_df.columns  = tmp_df.columns.astype(str)
tmp_df = tmp_df.reindex(columns=["(-10, 0]","(0, 10]","(10, 20]","(20, 30]","(30, 40]","(40, 50]","(50, 60]","(60, 70]","(70, 80]","(80, 90]","(90, 100]"])
ax = sns.heatmap(tmp_df)
plt.show()

## plotly with bins

#### NOTE: the columns are pandas.Interval objects and need to be cast to strings.

pd.cut is useful for creating bins:

<pre>
import pandas as pd
import numpy as np

df = pd.DataFrame()
df["data"] = pd.Series(np.random.randn(100))
df["cuts"] = pd.cut(df["data"],bins = [-10, -3, -0.5, 0, 0.5, 3, 10, 12, 15])
df = df.dropna(how="any")
count_df = df.groupby("cuts").count()
print(df)
print(count_df)

        data          cuts
0   0.244867    (0.0, 0.5]
1   0.743744    (0.5, 3.0]
2   0.053304    (0.0, 0.5]
3  -1.228094  (-3.0, -0.5]
4  -0.892689  (-3.0, -0.5]
..       ...           ...
95  0.540049    (0.5, 3.0]
96 -0.721860  (-3.0, -0.5]
97 -0.578642  (-3.0, -0.5]
98  0.167253    (0.0, 0.5]
99 -1.057753  (-3.0, -0.5]
</pre>

In [None]:
tmp_df.columns  = tmp_df.columns.astype(str)
tmp_df = tmp_df.reindex(columns=["(-10, 0]","(0, 10]","(10, 20]","(20, 30]","(30, 40]","(40, 50]","(50, 60]","(60, 70]","(70, 80]","(80, 90]","(90, 100]"])
fig = px.imshow(tmp_df.values.tolist(),x=tmp_df.columns.astype(str).tolist(),y=tmp_df.index.tolist())
fig.update_layout(title='u_in max range histogram')
fig.show()