# 6\. Visualization

## 6.1 matplotlib

reference: https://matplotlib.org/stable/plot_types/index.html

#### Scatter Plots
Use a scatter plot to see how the data points are distributed.

```python
import matplotlib.pyplot as plt
import numpy as np

# make the data
x = 4 + np.random.normal(0, 2, 24)
y = 4 + np.random.normal(0, 2, len(x))

# size and color:
sizes = np.random.uniform(15, 80, len(x))
colors = np.random.uniform(15, 80, len(x))

# plot
fig, ax = plt.subplots()
ax.scatter(x, y, s=sizes, c=colors, vmin=0, vmax=100)

ax.set(
  xlim=(0, 8), xticks=np.arange(1, 8),
  ylim=(0, 8), yticks=np.arange(1, 8)
  )

plt.show()
```

In [None]:
import numpy as np
import pandas as pd

#### Bar Charts
Use a bar chart to compare frequencies across categories.

```python
import matplotlib.pyplot as plt
import numpy as np

# make data:
x = 0.5 + np.arange(8)
y = np.random.uniform(2, 7, len(x))

# plot
fig, ax = plt.subplots()

ax.bar(x, y, width=1, edgecolor="white", linewidth=0.7)

ax.set(
  xlim=(0, 8), xticks=np.arange(1, 8),
  ylim=(0, 8), yticks=np.arange(1, 8)
  )

plt.show()
```

#### Line Chart
Use a line chart to show how a value changes over time.

```python
import matplotlib.pyplot as plt
import numpy as np

# make data
x = np.linspace(0, 10, 100)
y = 4 + 2 * np.sin(2 * x)

# plot
fig, ax = plt.subplots()

ax.plot(x, y, linewidth=2.0)

ax.set(
  xlim=(0, 8), xticks=np.arange(1, 8),
  ylim=(0, 8), yticks=np.arange(1, 8)
  )

plt.show()
```

#### Box Plots
Use a box plot to spot outliers and summarize the characteristics of a variable.

```python
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('_mpl-gallery')

# make data:
data = np.random.normal((3, 5, 4), (1.25, 1.00, 1.25), (100, 3))

# plot
fig, ax = plt.subplots()
box_plot = ax.boxplot(
  data, 
  positions=[2, 4, 6], 
  widths=1.5, patch_artist=True,
  showmeans=False, showfliers=False,
  medianprops={"color": "white", "linewidth": 0.5},
  boxprops={
    "facecolor": "C0", "edgecolor": "white", "linewidth": 0.5},
  whiskerprops={"color": "C0", "linewidth": 1.5},
  capprops={"color": "C0", "linewidth": 1.5}
  )

plt.show()
```
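
To make concrete what the box and whiskers encode, here is a minimal NumPy sketch of the statistics behind a box plot. It assumes the common 1.5 × IQR whisker rule (matplotlib's default `whis=1.5`); the helper name `box_stats` and the sample values are illustrative and not part of matplotlib.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return quartiles, whisker ends, and outliers for a 1-D sample."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Points outside these fences are treated as outliers ("fliers").
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers end at the most extreme points that are still inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": outliers,
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats["q1"], stats["q3"], stats["outliers"])
```

Note that the matplotlib example above passes `showfliers=False`, so points like the `50` here would be computed but not drawn.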

#### Pie Charts
Use a pie chart to visualize the share of each category.

```python
import matplotlib.pyplot as plt

# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
sizes = [15, 30, 45, 10]
explode = (0, 0.1, 0, 0)  # only "explode" the 2nd slice

fig1, ax1 = plt.subplots()
ax1.pie(
  sizes, 
  explode=explode, 
  labels=labels, 
  autopct='%1.1f%%',
  shadow=True, 
  startangle=90
  )
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()
```
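
The `autopct` format string is applied to each slice's percentage of the total. The arithmetic can be sketched without matplotlib; the variable names below are illustrative.

```python
sizes = [15, 30, 45, 10]
total = sum(sizes)

# Each slice is labeled with its share of the total, formatted by the autopct pattern.
autopct = '%1.1f%%'
percent_labels = [autopct % (100 * s / total) for s in sizes]
print(percent_labels)
```

Because the shares are computed from the total, `sizes` does not need to sum to 100.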

#### Histograms
Use a histogram to examine the distribution of a variable.

```python
import matplotlib.pyplot as plt
import numpy as np

# make data
x = 4 + np.random.normal(0, 1.5, 200)

# plot:
fig, ax = plt.subplots()

ax.hist(x, bins=8, linewidth=0.5, edgecolor="white")

ax.set(
  xlim=(0, 8), xticks=np.arange(1, 8),
  ylim=(0, 56), yticks=np.linspace(0, 56, 9))

plt.show()
```

## 6.2 seaborn

reference: https://seaborn.pydata.org

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="whitegrid")

### Scatter Plots
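If x is continuous, the result is an ordinary scatter plot. <br>
If x is categorical, use a jittered strip plot instead (`sns.catplot`). <br>
<br>
args
  - hue: plot each group in a different color
  - style: plot each category with a different marker
  - col: split the plot into subplot columns by the values of this column
  - row: split the plot into subplot rows by the values of this column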

```python
tips = sns.load_dataset("tips")

sns.relplot(
    data=tips,
    x="total_bill", y="tip", hue="smoker", style="time",
)
```
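
Jitter for a categorical x can be sketched as follows: each category is mapped to an integer position and a small random offset is added so overlapping points spread out. This is only an approximation of what a strip plot does; the jitter width of 0.1 and the synthetic `day` values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
categories = np.array(["Thur", "Fri", "Sat", "Sun"])
day = rng.choice(categories, size=200)  # synthetic categorical variable

# Map each category to an integer position on the axis.
positions = {name: i for i, name in enumerate(categories)}
x_base = np.array([positions[d] for d in day], dtype=float)

# Add a small uniform offset so points in the same category do not overlap.
jitter_width = 0.1
x_jittered = x_base + rng.uniform(-jitter_width, jitter_width, size=x_base.size)
```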

### Bar Charts
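args
  - hue: plot each group in a different color
  - x: passing the category as x draws the usual vertical bars
  - y: passing the category as y draws horizontal bars
  - kind='count': count the observations in each category
  - palette='pastel': color palette applied to the bars
  - col: split the plot into subplot columns by the values of this column
  - row: split the plot into subplot rows by the values of this column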

```python
titanic = sns.load_dataset("titanic")

sns.catplot(
    data=titanic, 
    x="deck", hue="class", kind="count",
    palette="pastel",
)
```

### Line Charts

>  
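If x is continuous, the result is an ordinary time-series line. <br>
If x has repeated values, the mean is drawn with a 95% confidence interval. <br>
<br>
args
  - errorbar
    - None: hide the interval
    - sd: show the standard deviation instead of the confidence interval
  - hue: plot each group in a different color
  - style: plot each category with a different line style
  - col: split the plot into subplot columns by the values of this column
  - row: split the plot into subplot rows by the values of this column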

```python
dowjones = sns.load_dataset("dowjones")
sns.relplot(data=dowjones, x="Date", y="Price", kind="line")

fmri = sns.load_dataset("fmri")
sns.relplot(data=fmri, x="timepoint", y="signal", kind="line")
```
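
When x has repeated values, the band around the line is a confidence interval for the mean at each x. A minimal sketch of a bootstrapped 95% interval for a single x value follows; the helper `bootstrap_ci`, the resample count, and the sample values are assumptions for illustration and are not seaborn's internal code.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Return (mean, low, high) using a percentile bootstrap of the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and take the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    half = (100 - level) / 2
    low, high = np.percentile(boot_means, [half, 100 - half])
    return values.mean(), low, high

# Made-up repeated measurements of the signal at one timepoint.
signal_at_t = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09, 0.13, 0.14]
mean, low, high = bootstrap_ci(signal_at_t)
print(mean, low, high)
```

Passing `errorbar="sd"` replaces this interval with the standard deviation of the observations, which describes spread rather than uncertainty about the mean.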

### Box Plots
>  
args
  - hue: plot each group in a different color
  - col: split the plot into subplot columns by the values of this column
  - row: split the plot into subplot rows by the values of this column

```python
tips = sns.load_dataset("tips")
sns.catplot(data=tips, x="day", y="total_bill")                           # strip plot (default kind)
sns.catplot(data=tips, x="day", y="total_bill", kind="box")               # boxplot
sns.catplot(
  data=tips, x="total_bill", y="day", hue="sex", 
  kind="violin", split=True
  ) # violin
```

### Histograms
>  
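args
  - binwidth: width of each bin
  - hue: plot each group in a different color
  - multiple: how to draw the different groups
    - stack: stack the bars on top of each other
    - dodge: draw the bars side by side
  - col: split the plot into subplot columns by the values of this column
  - row: split the plot into subplot rows by the values of this column
  - stat: statistic to compute for each bin
    - density: normalize so that the total area is 1
    - probability: normalize so that the bar heights sum to 1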

```python
penguins = sns.load_dataset("penguins")
# The keyword is `stat`, not `stats`.
sns.displot(
    penguins, x="flipper_length_mm", binwidth=3,
    hue='species', multiple='dodge', stat='density'
)
```
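
The difference between `stat="probability"` and `stat="density"` is easiest to see on the raw bin counts. The sketch below uses NumPy and synthetic data standing in for flipper lengths; it illustrates the normalization only and is not seaborn's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(200, 14, size=300)  # synthetic stand-in for flipper lengths (mm)

# Fixed-width bins, as with binwidth=3.
binwidth = 3
edges = np.arange(x.min(), x.max() + binwidth, binwidth)
counts, edges = np.histogram(x, bins=edges)

# probability: bar heights sum to 1.
probability = counts / counts.sum()
# density: bar areas (height * width) sum to 1.
density = counts / (counts.sum() * np.diff(edges))
```

With `hue`, seaborn by default normalizes across all groups together (`common_norm=True`), so each group's bars need not sum to 1 on their own.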

### Distplots
>args
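  - kind: type of plot to draw
    - kde: kernel density estimate
    - ecdf: empirical cumulative distribution
  - hue: plot each group in a different color
  - multiple: how to draw the different groups
    - stack: stack the groups on top of each other
    - dodge: draw the groups side by side (histograms only)
  - fill: bool: whether to fill the area under the curve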

```python
sns.displot(penguins, x="flipper_length_mm", kind="kde")
```
<br>


**joint plot**
```python
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")

sns.jointplot(
    data=penguins,
    x="bill_length_mm", y="bill_depth_mm", hue="species",
    kind="kde"
)

```

### Heatmaps
> 
args
- cmap: colormap to use
  - Blues
  - Reds
  - YlGnBu

```python
normal_data = np.random.randn(10, 12)
ax = sns.heatmap(normal_data)
```

### Model
>
args
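- y_jitter: add random noise to the y values when drawing the points (display only; the fit uses the original data)
- hue: plot each group in a different color
- markers: marker shape(s) used for the points
- logistic: bool
  - True: logistic regression plot
- lowess
  - True: nonparametric regression by lowess
- col: split the plot into subplot columns by the values of this column
- row: split the plot into subplot rows by the values of this column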

```python
tips = sns.load_dataset("tips")
sns.lmplot(x="total_bill", y="tip", data=tips)

tips["big_tip"] = (tips.tip / tips.total_bill) > .15
sns.lmplot(x="total_bill", y="big_tip", data=tips, y_jitter=.03, logistic=True)
```
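
In its default linear mode, the line drawn by `sns.lmplot` is an ordinary least-squares fit. The fit itself can be sketched with `np.polyfit`; the data below is synthetic (a made-up linear relation plus noise), not the real `tips` dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=100)
# Assumed relation for illustration: tip = 1.0 + 0.1 * total_bill + noise.
tip = 1.0 + 0.1 * total_bill + rng.normal(0, 0.5, size=100)

# A degree-1 polynomial fit is a least-squares regression line.
slope, intercept = np.polyfit(total_bill, tip, deg=1)
predicted = intercept + slope * total_bill
print(slope, intercept)
```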
<br>

**joint plot**
```python
sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg");
```

**pair plot**
```python
sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             hue="smoker", height=5, kind="reg");
```

## 6.3 plotly

reference: https://plotly.com/python/

In [2]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

### Bubble Charts
**px**

```python
df = px.data.gapminder()

fig = px.scatter(
  df.query("year==2007"), 
  x="gdpPercap", y="lifeExp",
  size="pop", color="continent",
  hover_name="country", 
  log_x=True, size_max=60)
fig.show()
```

**go**

```python
fig = go.Figure(
  data=[go.Scatter(    
    x=[1, 3.2, 5.4, 7.6, 9.8, 12.5],
    y=[1, 3.2, 5.4, 7.6, 9.8, 12.5],
    mode='markers',
    marker=dict(
      color=[120, 125, 130, 135, 140, 145],
      size=[15, 30, 55, 70, 90, 110],
      showscale=True
      ))])

fig.show()
```

px

In [44]:
df = px.data.gapminder()  # the cells below use gapminder columns (year, continent, gdpPercap, ...)

# fig = px.scatter(
#   df.query("year==2007"), 
#   x="gdpPercap",
#   y="lifeExp",
#   #size='pop',
#   color='continent',
#   log_x=True)
# fig.show()

go

In [45]:
fig = go.Figure(
    data=[
        go.Scatter(
            x=[1, 3.2, 5.4, 7.6, 9.8, 12.5],
            y=[1, 3.2, 5.4, 7.6, 9.8, 12.5],
            mode='markers',
            marker=dict(
                color=[120, 125, 130, 135, 140, 145],
                size= [15, 30, 55, 70, 90, 110],
                showscale=True
            ),
            name='test'  # name is an argument of the trace, not an item of the data list
        )
    ]
)
# add_trace takes a single trace object, not data=[...]
fig.add_trace(
    go.Scatter(
        x=np.array([1, 3.2, 5.4, 7.6, 9.8, 12.5])+2,
        y=np.array([1, 3.2, 5.4, 7.6, 9.8, 12.5])+2,
        mode='markers',
        marker=dict(
            color=[120, 125, 130, 135, 140, 145],
            size= [15, 30, 55, 70, 90, 110],
            showscale=True
        ),
        name='test'
    )
)
fig.show()

In [46]:
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=[1, 3.2, 5.4, 7.6, 9.8, 12.5],
        y=[1, 3.2, 5.4, 7.6, 9.8, 12.5],
        mode='markers',
        marker=dict(
            color=[120, 125, 130, 135, 140, 145],
            size= [15, 30, 55, 70, 90, 110],
            showscale=True
        )
    )
)
fig.show()

In [47]:
# Load gapminder here: the query needs a `year` column.
fig = px.scatter(
  px.data.gapminder().query("year==2007"),
  x="gdpPercap",
  y="lifeExp")
fig.show()

In [56]:
# Use a separate name so the full gapminder `df` is not overwritten.
oceania = px.data.gapminder().query('continent == "Oceania"')
fig = px.line(oceania, x='year', y='lifeExp', color='country')
fig.show()

In [54]:
# The sepal columns belong to the iris dataset.
fig = px.scatter(
    px.data.iris(),
    x='sepal_length',
    y='sepal_width',
    trendline='ols'  # requires the statsmodels package
)
fig.show()

In [51]:
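# go.Scatter takes arrays, not a DataFrame plus column names, and has no log_x argument.
df_2007 = px.data.gapminder().query("year==2007")
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df_2007["gdpPercap"],
        y=df_2007["lifeExp"],
        mode='markers'
    )
)
fig.update_xaxes(type="log")  # log-scaled x axis
fig.show()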

In [31]:
list(set(np.array(df.continent)))

['Europe', 'Asia', 'Africa', 'Oceania', 'Americas']

In [34]:
fig = go.Figure()
# One trace per continent, so each continent gets its own color and legend entry.
for continent in list(set(np.array(df.continent))):
    subset = df.query(f'year==2007 and continent == "{continent}"')
    fig.add_trace(
        go.Scatter(
            x=np.log(subset.gdpPercap),
            y=subset.lifeExp,
            mode='markers',
            name=continent
        )
    )
fig.show()

In [None]:
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x= np.log(df.query('year==2007').gdpPercap),
        y= df.query('year==2007').lifeExp,
        mode='markers'
    )
)
fig.show()

In [38]:
df

|      | country     | continent | year | lifeExp | pop      | gdpPercap  | iso_alpha | iso_num |
|------|-------------|-----------|------|---------|----------|------------|-----------|---------|
| 0    | Afghanistan | Asia      | 1952 | 28.801  | 8425333  | 779.445314 | AFG       | 4       |
| 1    | Afghanistan | Asia      | 1957 | 30.332  | 9240934  | 820.853030 | AFG       | 4       |
| 2    | Afghanistan | Asia      | 1962 | 31.997  | 10267083 | 853.100710 | AFG       | 4       |
| 3    | Afghanistan | Asia      | 1967 | 34.020  | 11537966 | 836.197138 | AFG       | 4       |
| 4    | Afghanistan | Asia      | 1972 | 36.088  | 13079460 | 739.981106 | AFG       | 4       |
| ...  | ...         | ...       | ...  | ...     | ...      | ...        | ...       | ...     |
| 1699 | Zimbabwe    | Africa    | 1987 | 62.351  | 9216418  | 706.157306 | ZWE       | 716     |
| 1700 | Zimbabwe    | Africa    | 1992 | 60.377  | 10704340 | 693.420786 | ZWE       | 716     |
| 1701 | Zimbabwe    | Africa    | 1997 | 46.809  | 11404948 | 792.449960 | ZWE       | 716     |
| 1702 | Zimbabwe    | Africa    | 2002 | 39.989  | 11926563 | 672.038623 | ZWE       | 716     |


In [39]:
fig = go.Figure()
for continent in list(set(np.array(df.continent))):
    subset = df.query(f'year==2007 and continent == "{continent}"')
    fig.add_trace(
        go.Scatter(
            x=np.log(subset.gdpPercap),
            y=subset.lifeExp,
            mode='markers',
            marker=dict(
                # Use ['pop']: `.pop` resolves to the DataFrame.pop method, not the column.
                # Each trace already gets its own color, so no color array or scale is set.
                size=np.log(subset['pop'])
            ),
            name=continent
        )
    )
fig.show()

### Scatter Plots

**px**
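  - color: plot each group in a different color
  - marginal_x: draw the marginal distribution of x above the plot (`rug`, `box`, `violin`, or `histogram`)
  - marginal_y: draw the marginal distribution of y beside the plot (`rug`, `box`, `violin`, or `histogram`)
  - facet_col: draw one subplot column per value of the given column
  - facet_row: draw one subplot row per value of the given column
  - trendline: overlay a fitted model on the plot
    - ols: least-squares regression line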

```python
df = px.data.iris()

fig = px.scatter(
  df, x="sepal_width", y="sepal_length", color="species",
  size='petal_length', hover_data=['petal_width'])
fig.show()
```

**go**
```python
import plotly.graph_objects as go

fig = go.Figure(
  data=go.Scatter(
    x=[1, 2, 3, 4],
    y=[10, 11, 12, 13],
    mode='markers',
    marker=dict(
      size=[40, 60, 80, 100],
      color=[0, 1, 2, 3])
      )
    )

fig.show()
```

In [42]:
# The sepal columns belong to the iris dataset, which has no iso_alpha column,
# so facet on species instead.
fig = px.scatter(
    px.data.iris(),
    x='sepal_width',
    y='sepal_length',
    color='species',
    facet_col='species'
)
fig.show()

### Line Charts

**px**

args
  - markers: bool
    - True: draw a marker at each point on the line
  - color: plot each group in a different color

```python
df = px.data.gapminder().query("continent == 'Oceania'")

fig = px.line(df, x='year', y='lifeExp', color='country')
fig.show()
```
<br>

**go**

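args
  - mode: `lines`, `markers`, or `lines+markers` (`go.Scatter` has no `markers` flag)
  - line_color / marker_color: color of the trace (add one trace per group)
  - name: name shown in the legend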

```python
N = 100
random_x = np.linspace(0, 1, N)
random_y0 = np.random.randn(N) + 5
random_y1 = np.random.randn(N)
random_y2 = np.random.randn(N) - 5

fig = go.Figure()

# Add traces
fig.add_trace(
  go.Scatter(
    x=random_x, y=random_y0,
    mode='markers',
    name='markers'
    )
    )
fig.add_trace(
  go.Scatter(
    x=random_x, y=random_y1,
    mode='lines+markers',
    name='lines+markers')
    )
fig.add_trace(
  go.Scatter(
    x=random_x, y=random_y2,
    mode='lines',
    name='lines')
    )

fig.show()
```

### Bar Charts

**px**

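args
  - color: color the bars by the values of the given column
  - barmode
    - stack: stacked bars
    - group: side-by-side bars
  - title: title of the plot
  - text_auto
    - True: print the value on each bar
  - facet_col: draw one subplot column per value of the given column
  - facet_row: draw one subplot row per value of the given column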
  
```python
long_df = px.data.medals_long()

fig = px.bar(
  long_df, x="nation", y="count", color="medal", title="Long-Form Input"
  )
fig.show()
```
<br>

**go**
args
  - barmode (set with `fig.update_layout`)
    - stack: stacked bars
    - group: side-by-side bars


```python
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

fig = go.Figure()
fig.add_trace(
  go.Bar(
    x=months,
    y=[20, 14, 25, 16, 18, 22, 19, 15, 12, 16, 14, 17],
    name='Primary Product',
    marker_color='indianred'
    )
  )
fig.add_trace(go.Bar(
    x=months,
    y=[19, 14, 22, 14, 16, 19, 15, 14, 10, 12, 12, 16],
    name='Secondary Product',
    marker_color='lightsalmon'
    )
  )

fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

# ----------
# highlight a single bar

colors = ['lightslategray',] * 5
colors[1] = 'crimson'

fig = go.Figure(data=[go.Bar(
    x=['Feature A', 'Feature B', 'Feature C',
       'Feature D', 'Feature E'],
    y=[20, 14, 23, 25, 22],
    marker_color=colors
)])
fig.show()

# ----------
# compare two series on either side of zero

years = ['2016','2017','2018']

fig = go.Figure()
fig.add_trace(
  go.Bar(
    x=years, y=[500, 600, 700],
    base=[-500,-600,-700],
    marker_color='crimson',
    name='expenses')
    )
fig.add_trace(
  go.Bar(
    x=years, y=[300, 400, 700],
    base=0,
    marker_color='lightslategrey',
    name='revenue'
    )
  )

fig.show()
```

In [57]:
long_df= px.data.medals_long()
long_df

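|   | nation      | medal  | count |
|---|-------------|--------|-------|
| 0 | South Korea | gold   | 24    |
| 1 | China       | gold   | 10    |
| 2 | Canada      | gold   | 9     |
| 3 | South Korea | silver | 13    |
| 4 | China       | silver | 15    |
| 5 | Canada      | silver | 12    |
| 6 | South Korea | bronze | 11    |
| 7 | China       | bronze | 8     |
| 8 | Canada      | bronze | 12    |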


In [None]:
fig = px.bar(
    long_df,
    x='nation',
    y='count',
    color='medal',
    barmode='group',
    text_auto=True
)
fig.show()

In [None]:
months = [
    'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'
]

colors = ['lightslategray'] * 5 + ['crimson'] + ['lightslategray'] * 6

fig = go.Figure()
fig.add_trace(
    go.Bar(
        x=months,
        y=np.random.randint(1, 100, 12),
        name='Primary Product',
        marker_color = colors
    )
)
fig.add_trace(
    go.Bar(
        x=months,
        y=np.random.randint(1, 100, 12),
    
        name='Secondary Product',
        marker_color = colors
    )
)
fig.show()

In [71]:

fig = go.Figure()
fig.add_trace(
    go.Bar(
        x=months,
        y=np.random.randint(1, 100, 12)+100,
        base=100,
        name='Primary Product'
    )
)
y = -np.random.randint(1, 100, 12)+100
fig.add_trace(
    go.Bar(
        x=months,
        y = y,
        base=0,
        name='Secondary Product',
    )
)
fig.show()


In [69]:
y

array([-50, -29, -41, -67, -54, -74, -82, -64, -94, -22, -86, -74])

### Box Plots


**px**

args
  - points
    - all: draw every data point next to the box
  - color: draw one box per group of the given column

```python
df = px.data.tips()

fig = px.box(df, x="day", y="total_bill", color="smoker")
fig.show()
```
<br>

**go**

args
  - name: name shown in the legend
  - marker_color: color of the box

```python
y0 = np.random.randn(50) - 1
y1 = np.random.randn(50) + 1

fig = go.Figure()
fig.add_trace(go.Box(y=y0))    
fig.add_trace(go.Box(y=y1))    


fig.add_trace(
  go.Box(
    y=y0, name='Sample A',
    marker_color = 'indianred')
    )       # for horizontal boxes, pass the data as x instead of y
fig.add_trace(
  go.Box(
    y=y1, name = 'Sample B',
    marker_color = 'lightseagreen')
    )       # for horizontal boxes, pass the data as x instead of y

fig.show()

# --------------------------------------------------------
# grouped boxplot

x = ['day 1', 'day 1', 'day 1', 'day 1', 'day 1', 'day 1',
     'day 2', 'day 2', 'day 2', 'day 2', 'day 2', 'day 2']

fig = go.Figure()

fig.add_trace(
  go.Box(
    y=[0.2, 0.2, 0.6, 1.0, 0.5, 0.4, 0.2, 0.7, 0.9, 0.1, 0.5, 0.3],
    x=x,
    name='kale',
    marker_color='#3D9970'
))
fig.add_trace(
  go.Box(
    y=[0.6, 0.7, 0.3, 0.6, 0.0, 0.5, 0.7, 0.9, 0.5, 0.8, 0.7, 0.2],
    x=x,
    name='radishes',
    marker_color='#FF4136'
))
fig.add_trace(
  go.Box(
    y=[0.1, 0.3, 0.1, 0.9, 0.6, 0.6, 0.9, 1.0, 0.3, 0.6, 0.8, 0.5],
    x=x,
    name='carrots',
    marker_color='#FF851B'
))

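fig.update_layout(
    yaxis_title='normalized moisture',
    boxmode='group'  # group the boxes of the different traces within each x value
)
# For horizontal boxes, swap the x and y arguments of each trace and then call
# fig.update_traces(orientation='h').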

fig.show()
```

px

In [73]:
data = px.data.tips()
fig = px.box(
    data,
    x='day',
    y='total_bill',
    color='smoker'
)
fig.show()

go

In [None]:
fig = go.Figure()
# add_trace needs a trace object; wrap the data in go.Box
fig.add_trace(
    go.Box(
        y=data['total_bill'],
        name='total_bill'
    )
)
fig.show()

### Pie Charts

**px**

args
  - names: column whose categories define the slices
  - values: column used to compute each slice's share

```python
df = px.data.gapminder().query("year == 2007").query("continent == 'Europe'")
df.loc[df['pop'] < 2.e6, 'country'] = 'Other countries' 
fig = px.pie(df, values='pop', names='country', title='Population of European continent')
fig.show()
```
<br>

**go**

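args
  - hole: fraction of the radius to cut out of the center (donut chart)
  - pull: pull selected slices away from the center to emphasize them
  - scalegroup: pies in the same group are sized in proportion to their totals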

```python
labels = ['Oxygen','Hydrogen','Carbon_Dioxide','Nitrogen']
values = [4500, 2500, 1053, 500]

fig = go.Figure(
  data=[go.Pie(labels=labels, values=values, hole=.3, pull=[0, 0, 0.2, 0])]
  )
fig.show()

# --------------
# draw several pie charts
labels = ["US", "China", "European Union", "Russian Federation", "Brazil", "India", "Rest of World"]

fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=labels, values=[16, 15, 12, 6, 5, 4, 42], scalegroup='one', name="GHG Emissions"),
              1, 1)
fig.add_trace(go.Pie(labels=labels, values=[27, 11, 25, 8, 1, 3, 25], scalegroup='one', name="CO2 Emissions"),
              1, 2)

fig.update_traces(hole=.4, hoverinfo="label+percent+name")
fig.show()
```
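
The `pull` list needs one entry per slice, which is easy to get wrong when writing it by hand. A small sketch that builds it from the values and pulls out only the largest slice follows; the helper name `pull_largest` is hypothetical and not part of plotly.

```python
def pull_largest(values, amount=0.2):
    """Return a pull list with `amount` for the largest value and 0 elsewhere."""
    largest = max(range(len(values)), key=lambda i: values[i])
    return [amount if i == largest else 0 for i in range(len(values))]

values = [4500, 2500, 1053, 500]
pull = pull_largest(values)
print(pull)
```

The result can be passed directly as `go.Pie(labels=labels, values=values, pull=pull)`.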

apple-pie

In [74]:
df = px.data.gapminder().query('year == 2007 and continent == "Europe"')

In [75]:
fig = px.pie(
    df,
    values='pop',
    names='country'
)
fig.show()

go

In [79]:
fig = go.Figure()
fig.add_trace(
    go.Pie(
        labels=df['country'],
        values=df['pop'],
        hole=0.3,
        # one entry per slice: pull out the first slice, leave the rest in place
        pull = [0.3] + [0] * (len(df['country'])-1)
    )
)
fig.show()

In [None]:
fig = make_subplots(rows=1, cols=2, specs=[[{'type' : 'domain'}, {'type' : 'domain'}]])
fig.add_trace(
    go.Pie(
        labels=df['country'],
        values=df['pop'],
        hole=0.3,
        pull = [0.3] + [0] *(len(df['country'])-1)
    )
)
fig.show()

two pie charts

In [None]:
df = px.data.gapminder().query('year == 2007 and continent == "Europe"')

In [83]:
# Filter each year once; the second pie is labeled 2002, so it uses the 2002 rows.
europe = px.data.gapminder().query('continent == "Europe"')
df_2007 = europe.query('year == 2007')
df_2002 = europe.query('year == 2002')

fig = make_subplots(rows=1, cols=2, specs=[[{'type' : 'domain'}, {'type' : 'domain'}]])
fig.add_trace(
    go.Pie(
        labels=df_2007['country'],
        values=df_2007['pop'],
        hole=0.3,
        pull = [0.3] + [0] * (len(df_2007) - 1),
        name='2007'
    ),
    1, 1
)
fig.add_trace(
    go.Pie(
        labels=df_2002['country'],
        values=df_2002['pop'],
        hole=0.3,
        pull = [0.3] + [0] * (len(df_2002) - 1),
        name='2002'
    ),
    1, 2
)
fig.show()

### Histograms

**px**

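args
  - color: plot each group in a different color
  - category_orders: order of the categories
  - log_y
    - True: log scale
  - nbins: (maximum) number of bins
  - bargap: gap between bars (set with `fig.update_layout`)
  - marginal: draw the distribution in a marginal subplot
    - rug
    - box
    - violin
  - text_auto
    - True: print the value on each bar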

```python
df = px.data.tips()
fig = px.histogram(
    df, x="total_bill",
    category_orders=dict(day=["Thur", "Fri", "Sat", "Sun"])
)
fig.update_layout(bargap=0.2)
fig.show()
```
<br>

**go**

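args
  - histnorm
    - probability: normalize the counts so that the bar heights sum to 1
  - barmode (set with `fig.update_layout`)
    - overlay: draw the histograms on top of each other
    - stack: stack the histograms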

```python
x0 = np.random.randn(500)
x1 = np.random.randn(500) + 1

fig = go.Figure()
fig.add_trace(go.Histogram(x=x0))
fig.add_trace(go.Histogram(x=x1))

fig.update_layout(barmode='overlay')
fig.update_traces(opacity=0.75)
fig.show()
```

In [84]:
df = px.data.tips()


px

In [88]:
fig = px.histogram(
    df,
    x='day',
    category_orders=dict(
        day=['Sun', 'Thur', 'Fri', 'Sat']
    )
)
fig.show()

go

In [89]:
fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=df.query('sex=="Male"').total_bill,
        name='male'
    )
)
fig.add_trace(
    go.Histogram(
        x=df.query('sex=="Female"').total_bill,
        name='female'
    )
)
fig.show()

### Distplots
**ff**

args
  - curve_type
    - kde: kernel density estimation
    - normal: normal distribution
  - bin_size: bin size, either a single value or one value per group
  - show_hist
    - False: hide the histogram bars
  - show_curve
    - False: hide the curve
  - show_rug
    - False: hide the rug plot

```python
x1 = np.random.randn(200) - 2
x2 = np.random.randn(200)
x3 = np.random.randn(200) + 2
x4 = np.random.randn(200) + 4

hist_data = [x1, x2, x3, x4]

group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4']
fig = ff.create_distplot(hist_data, group_labels, bin_size=[.1, .25, .5, 1])
fig.show()

# --------------------
# from a pandas DataFrame
df = pd.DataFrame({
  '2012': np.random.randn(200),
  '2013': np.random.randn(200) + 1
  })
fig = ff.create_distplot([df[c] for c in df.columns], df.columns, bin_size=.25)
fig.show()
```


In [91]:
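# Take the labels from the data so they match the order of the groups;
# a hard-coded ['Male', 'Female'] can end up attached to the wrong series.
groups = list(df.sex.unique())
fig = ff.create_distplot(
    [df.query(f'sex=="{sex}"').total_bill for sex in groups],
    groups
)
fig.show()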

### Heatmaps

**px**
```python
import plotly.express as px

df = px.data.medals_wide(indexed=True)
fig = px.imshow(df, text_auto=True)  # text_auto belongs to px.imshow, not to the data loader
fig.show()
```
<br>

**go**
```python
import plotly.graph_objects as go

fig = go.Figure(
  data=go.Heatmap(
    z=[[1, None, 30, 50, 1], [20, 1, 60, 80, 30], [30, 60, 1, -10, 20]],
    x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
    y=['Morning', 'Afternoon', 'Evening'],
    hoverongaps = False)
    )
fig.show()
```
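
Heatmaps expect a wide matrix, while data often arrives in long form (one row per nation and medal). A sketch of the reshaping step with `DataFrame.pivot` follows, using the medal counts shown earlier in this section typed in by hand; the explicit row and column order is a choice made here to match the wide table.

```python
import pandas as pd

long_df = pd.DataFrame({
    "nation": ["South Korea", "China", "Canada"] * 3,
    "medal": ["gold"] * 3 + ["silver"] * 3 + ["bronze"] * 3,
    "count": [24, 10, 9, 13, 15, 12, 11, 8, 12],
})

# One row per nation, one column per medal.
wide = long_df.pivot(index="nation", columns="medal", values="count")
# pivot sorts labels alphabetically; restore the intended order.
wide = wide.loc[["South Korea", "China", "Canada"], ["gold", "silver", "bronze"]]

z = wide.to_numpy()
print(z)
```

`wide` can then be passed to `px.imshow`, or `z`, `wide.columns`, and `wide.index` to `go.Heatmap`.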

In [93]:
df = px.data.medals_wide(indexed=True)
fig = px.imshow(df)
fig.show()

In [94]:
df.values

array([[24, 13, 11],
       [10, 15,  8],
       [ 9, 12, 12]], dtype=int64)

In [95]:
df.columns

Index(['gold', 'silver', 'bronze'], dtype='object', name='medal')

In [96]:
fig = go.Figure(
  data=go.Heatmap(
    z=df.values,
    x=df.columns,
    y=df.index,
    hoverongaps = False)
    )
fig.show()

In [97]:
df = pd.read_csv(r'')

| nation      | gold | silver | bronze |
|-------------|------|--------|--------|
| South Korea | 24   | 13     | 11     |
| China       | 10   | 15     | 8      |
| Canada      | 9    | 12     | 12     |


### Time Series and Date Axes

```python
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')

fig = px.line(
  df, x='Date', y='AAPL.High', title='Time Series with Range Slider and Selectors'
  )

fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")
        ])
    )
)
fig.show()
```
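
A range-selector button such as `6m` with `stepmode="backward"` shows a window that reaches back six months from the end of the visible range. Assuming the view ends at the latest date, the selected rows can be approximated in pandas as below. The frame `prices` and its values are made up for illustration and do not come from the AAPL file.

```python
import pandas as pd

# Made-up dates and values for illustration.
prices = pd.DataFrame({
    "Date": ["2016-06-01", "2016-09-15", "2016-12-01", "2017-02-15"],
    "High": [99.5, 116.1, 110.9, 136.3],
})
# Parse the strings so date arithmetic works.
prices["Date"] = pd.to_datetime(prices["Date"])

latest = prices["Date"].max()
window_start = latest - pd.DateOffset(months=6)
last_6m = prices[prices["Date"] >= window_start]
print(last_6m)
```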

In [102]:
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')

df

|     | Date       | AAPL.Open  | AAPL.High  | AAPL.Low   | AAPL.Close | AAPL.Volume | AAPL.Adjusted | dn         | mavg       | up         | direction  |
|-----|------------|------------|------------|------------|------------|-------------|---------------|------------|------------|------------|------------|
| 0   | 2015-02-17 | 127.489998 | 128.880005 | 126.919998 | 127.830002 | 63152400    | 122.905254    | 106.741052 | 117.927667 | 129.114281 | Increasing |
| 1   | 2015-02-18 | 127.629997 | 128.779999 | 127.449997 | 128.720001 | 44891700    | 123.760965    | 107.842423 | 118.940333 | 130.038244 | Increasing |
| 2   | 2015-02-19 | 128.479996 | 129.029999 | 128.330002 | 128.449997 | 37362400    | 123.501363    | 108.894245 | 119.889167 | 130.884089 | Decreasing |
| 3   | 2015-02-20 | 128.619995 | 129.500000 | 128.050003 | 129.500000 | 48948400    | 124.510914    | 109.785449 | 120.763500 | 131.741551 | Increasing |
| 4   | 2015-02-23 | 130.020004 | 133.000000 | 129.660004 | 133.000000 | 70974100    | 127.876074    | 110.372516 | 121.720167 | 133.067817 | Increasing |
| ... | ...        | ...        | ...        | ...        | ...        | ...         | ...           | ...        | ...        | ...        | ...        |
| 501 | 2017-02-10 | 132.460007 | 132.940002 | 132.050003 | 132.119995 | 20065500    | 132.119995    | 114.494004 | 124.498666 | 134.503328 | Decreasing |
| 502 | 2017-02-13 | 133.080002 | 133.820007 | 132.750000 | 133.289993 | 23035400    | 133.289993    | 114.820798 | 125.205166 | 135.589534 | Increasing |
| 503 | 2017-02-14 | 133.470001 | 135.089996 | 133.250000 | 135.020004 | 32815500    | 135.020004    | 115.175718 | 125.953499 | 136.731280 | Increasing |
| 504 | 2017-02-15 | 135.520004 | 136.270004 | 134.619995 | 135.509995 | 35501600    | 135.509995    | 115.545035 | 126.723499 | 137.901963 | Decreasing |


In [104]:
fig = px.line(
  df, x='Date', y='AAPL.High', title='Time Series with Range Slider and Selectors'
  )

fig.update_xaxes(
    rangeslider_visible=True,
    rangeselector=dict(
        buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")
        ])
    )
)
fig.show()