## Altair

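Altair is a declarative statistical visualization library for Python. Its API is simple, friendly, and consistent, and it is built on top of the powerful Vega-Lite (a grammar of interactive graphics). The Altair API contains no actual rendering code; instead, it emits JSON data structures that follow the Vega-Lite specification. The resulting specification can then be rendered in a user interface. This elegant simplicity produces beautiful and effective visualizations with very little code.

The data source is a DataFrame made up of columns of different data types. The DataFrame is in a tidy format, where rows correspond to samples and columns correspond to observed variables. The data is mapped to visual properties (position, color, size, shape, faceting, etc.) using the group-by data transformation.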
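To make the "JSON data structure" idea concrete, here is a minimal sketch that builds a Vega-Lite bar chart specification by hand using only the standard library. The helper name `bar_chart_spec` and the schema version in the URL are assumptions for illustration; a spec produced by Altair carries additional keys and pins the schema version it was built against.

```python
import json

# Five small records, one per bar.
records = [
    {"x": "A", "y": 5},
    {"x": "B", "y": 3},
    {"x": "C", "y": 6},
    {"x": "D", "y": 7},
    {"x": "E", "y": 2},
]


def bar_chart_spec(values, x_field, y_field):
    """Build a minimal Vega-Lite bar chart specification as a plain dict."""
    return {
        # Assumed schema version; Altair writes the version it targets.
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = bar_chart_spec(records, "x", "y")
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Nothing in this dict draws anything. A renderer that understands Vega-Lite turns it into a chart, which is the division of labor described above.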


### - Installation

* Option 1:

    pip install altair


* Option 2:

    conda install altair --channel conda-forge


### - Usage

In [None]:
import altair as alt

## Set the renderer to notebook

- jupyterlab (default)
- nteract (default)
- notebook
- colab

In [None]:
alt.renderers.enable('notebook')

## Specifying chart data


Accepted data types include:

1. pandas DataFrame
2. Altair Data objects (UrlData, InlineData, NamedData)
3. A URL path pointing to a JSON or CSV file

#### 1. pandas DataFrame

In [None]:
import pandas as pd

df = pd.DataFrame({'x': ['A', 'B', 'C', 'D', 'E'],
                     'y': [5, 3, 6, 7, 2]})

alt.Chart(df).mark_bar().encode(
    x='x:O',
    y='y:Q',
)

In [None]:
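import os

## Convert to JSON
os.makedirs('data', exist_ok=True)  # make sure the output directory exists
df.to_json('data/data.json', orient='records')
# alt.to_json(df, filename = 'data.json', prefix='altair-data')

## Convert to CSV
df.to_csv('data/data.csv', index=False)  # skip the index so only x and y are written
# alt.to_csv(df, filename = 'data.csv', prefix='altair-data')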

#### 2. Altair Data objects (UrlData, InlineData, NamedData)

In [None]:
url = 'data/data.csv'
url_data = alt.UrlData(url, format=alt.CsvDataFormat())

In [None]:
alt.Chart(url_data).mark_bar().encode(x='x:O',y='y:Q')

#### 3. URL path pointing to a JSON or CSV file

In [None]:
url = 'data/data.json'
# url = 'data/data.csv'
alt.Chart(url).mark_bar().encode(x='x:O', y='y:Q')

## Converting between long and wide data

- Wide data: each row contains multiple observations
- Long data: each row contains a single observation
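The two definitions above can be sketched with plain Python records before turning to pandas. The helpers `to_wide` and `to_long` are hypothetical names used only for this illustration; they loosely mimic what `pivot` and `melt` do and are not how pandas implements them.

```python
# Long format: one observation (Date, Item, Price) per record.
long_rows = [
    {"Date": "2019-01-01", "Item": "A", "Price": 120},
    {"Date": "2019-01-01", "Item": "B", "Price": 204},
    {"Date": "2019-01-02", "Item": "A", "Price": 129},
    {"Date": "2019-01-02", "Item": "B", "Price": 205},
]


def to_wide(rows, index, columns, values):
    """Pivot long records into one record per distinct index value."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[index], {index: row[index]})
        record[row[columns]] = row[values]
    return list(wide.values())


def to_long(rows, index, var_name, value_name):
    """Melt wide records into one record per (index, variable) pair."""
    return [
        {index: row[index], var_name: key, value_name: value}
        for row in rows
        for key, value in row.items()
        if key != index
    ]


wide_rows = to_wide(long_rows, "Date", "Item", "Price")
round_trip = to_long(wide_rows, "Date", "Item", "Price")

print(wide_rows)
print(round_trip == long_rows)
```

Altair generally expects the long form, because a single column such as `Item` can then be mapped to one encoding channel such as color.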


In [None]:
data = {'Date' : ['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04'] * 3,
        'Item' : ['A', 'A', 'A', 'A', 
                  'B', 'B', 'B', 'B', 
                  'C', 'C', 'C', 'C'],
        'Price': [120, 129, 102, 113, 
                  204, 205, 212, 201, 
                  41, 39, 53, 23]
       }

long_df = pd.DataFrame(data)
long_df

In [None]:
# Convert to wide format
wide_df = long_df.pivot(index='Date', columns='Item', values='Price')
wide_df

In [None]:
wide_df = wide_df.reset_index()

In [None]:
wide_df.melt('Date', var_name='Item', value_name='Price')

In [None]:
alt.Chart(long_df).mark_line().encode(
  x='Date:T',
  y='Price:Q',
  color='Item:N'
)

### Encodings

Encodings map properties of the data to visual properties of the chart.


1. Data types

    - Quantitative (continuous): Q
    - Ordinal (ordered): O
    - Nominal (unordered): N
    - Temporal (time): T


2. Channels

    - Position
        - x, y, x2, y2, longitude, latitude, longitude2, latitude2

    - Mark Property
        - color, fill, opacity, shape, size, stroke

    - Text and Tooltip
        - text, key, tooltip

    - Hyperlink
        - href

    - Level of Detail
        - detail

    - Order
        - order

    - Facet
        - column, row
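The type codes listed above appear in shorthand strings such as `'Displacement:Q'`. The following is a rough sketch of how such a string could be expanded into a Vega-Lite style encoding dict. `parse_shorthand` is a hypothetical helper written for this illustration and is not Altair's own parser; the list of time units is deliberately partial.

```python
import re

# Type codes and the Vega-Lite type names they stand for.
TYPE_CODES = {"Q": "quantitative", "O": "ordinal", "N": "nominal", "T": "temporal"}

# Partial list of time units, assumed here for illustration only.
TIME_UNITS = {"year", "month", "date", "day", "hours", "minutes", "seconds"}

# Matches "field", "field:Q", "func(field):Q" and "count()".
SHORTHAND = re.compile(
    r"^(?:(?P<func>\w+)\((?P<func_field>\w*)\)|(?P<field>\w+))"
    r"(?::(?P<type>[QONT]))?$"
)


def parse_shorthand(shorthand):
    """Expand a shorthand string into an encoding dict."""
    match = SHORTHAND.match(shorthand)
    if match is None:
        raise ValueError(f"cannot parse shorthand: {shorthand!r}")
    result = {}
    func = match.group("func")
    if func:
        # A function wrapper is either a time unit or an aggregate.
        key = "timeUnit" if func in TIME_UNITS else "aggregate"
        result[key] = func
    field = match.group("field") or match.group("func_field")
    if field:
        result["field"] = field
    if match.group("type"):
        result["type"] = TYPE_CODES[match.group("type")]
    return result


for text in ["Displacement:Q", "mean(Acceleration):Q", "month(Date):O", "count()"]:
    print(text, "->", parse_shorthand(text))
```

Each channel name from the list (x, y, color, and so on) becomes a key in the chart's `encoding` object, and the expanded dict becomes its value.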

In [None]:
from vega_datasets import data
cars = data.cars()
cars.head()

In [None]:
alt.Chart(cars).mark_point().encode(
    x='Displacement:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
)

In [None]:
alt.Chart(cars).mark_point().encode(
    x='Displacement:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N',
    tooltip='Origin'
).configure_mark(
    filled=True
)

In [None]:
base = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
).properties(
    width=150,
    height=150
)

alt.hconcat(
   base.encode(color='Cylinders:Q').properties(title='quantitative'),
   base.encode(color='Cylinders:O').properties(title='ordinal'),
   base.encode(color='Cylinders:N').properties(title='nominal'),
)

In [None]:
# The meat dataset ships with the ggplot package; import it by name rather than with a wildcard
from ggplot import meat
meat.head()

In [None]:
meat = meat[['date', 'beef', 'pork', 'veal']]

In [None]:
base = alt.Chart(meat).mark_bar().encode(
    alt.Y('beef:Q', title='Beef Price')
).properties(
    width=300,
    height=300
)

alt.hconcat(
    base.encode(x='date:T').properties(title='year=temporal'),
    base.encode(x='date:O').properties(title='year=ordinal')
)

### Binning and aggregation

In [None]:
alt.Chart(cars).mark_bar().encode(
    alt.X('Miles_per_Gallon:O', bin=True, sort='descending'),
    y = 'count()'
)

In [None]:
alt.Chart(meat).mark_circle().encode(
    alt.X('beef', bin=True),
    alt.Y('pork', bin=True),
    size='count()',
    color = 'veal'
)
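The charts above combine `bin=True` with `count()`. As a rough sketch of what that pairing computes, the snippet below groups numbers into fixed-width bins and counts the members of each bin. The fixed `step`, the helper name `bin_and_count`, and the sample values are all assumptions made for illustration; with `bin=True`, Vega-Lite picks the bin boundaries itself.

```python
import math
from collections import Counter


def bin_and_count(values, step):
    """Count how many values fall into each bin [start, start + step)."""
    counts = Counter()
    for value in values:
        start = math.floor(value / step) * step
        counts[start] += 1
    # Sort by bin start so the result reads left to right like an axis.
    return dict(sorted(counts.items()))


# Made-up sample values standing in for a numeric column.
sample = [18.0, 15.0, 36.0, 24.5, 9.0, 31.0, 22.0]
result = bin_and_count(sample, step=10)
print(result)
```

Each key is the lower edge of a bin and each value is the height a bar (or the size a circle) would get in the corresponding chart.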

### Legends and axes

In [None]:
legend = alt.Legend(title="Country",
                    orient="left")

color = alt.Scale(scheme='dark2')


# domain = cars.Origin.unique().tolist()
# range_ = ['red', 'green', 'blue']
# color = alt.Scale(domain=domain, range=range_)

sort = ['USA', 'Japan', 'Europe']

alt.Chart(cars).mark_bar().encode(
    x=alt.X('Acceleration:Q', 
            scale=alt.Scale(domain = (-5, 30)),
            axis=alt.Axis(format='%', title='percentage')
           ),
    
    y='Origin:O',
    color=alt.Color('Origin:O', 
                    legend=legend,
                    scale=color,
                    sort = sort
                   )
).configure_mark(
    opacity=0.1
)

### Mark types


- mark_area
- mark_bar
- mark_circle
- mark_geoshape
- mark_line
- mark_point
- mark_rect
- mark_rule
- mark_square
- mark_text
- mark_tick

In [None]:
long_meat = meat.melt('date', var_name='Item', value_name='Price')
alt.Chart(long_meat).mark_area().encode(
    x="date:T",
    y="Price:Q",
    color="Item:N"
)

In [None]:
import random

dates = list(pd.date_range(start='2019-01-01', end='2019-12-31'))
temp = [random.randint(20, 40) for _ in range(len(dates))]

temp_df = pd.DataFrame({'Date': dates, 'Temp': temp})

alt.Chart(temp_df).mark_rect().encode(
    alt.X('date(Date):O', title='day'),
    alt.Y('month(Date):O', title='month'),
    color='Temp:Q'
).properties(
    title="2019 Daily Temp (C)"
)

In [None]:
alt.Chart(meat).mark_square().encode(
    alt.X('beef', bin=True),
    alt.Y('pork', bin=True),
    size='count()',
    color = 'veal'
)

In [None]:
dates = list(pd.date_range(start='2019-01-01', end='2019-01-31'))
o = [random.randint(30, 60) for _ in range(len(dates))]
c = [random.randint(20, 70) for _ in range(len(dates))]
ohlc = pd.DataFrame({'Date':dates,
                   'Open':o,
                   'Close':c
                  })
ohlc['High'] = ohlc[['Open', 'Close']].max(axis=1) + random.randint(0, 10)
ohlc['Low'] = ohlc[['Open', 'Close']].min(axis=1) - random.randint(0, 10)

In [None]:
open_close_color = alt.condition("datum.Open < datum.Close",
                                 alt.value("#06982d"),
                                 alt.value("#ae1325"))

rule = alt.Chart(ohlc).mark_rule().encode(
    alt.X(
        'Date:T',
        axis=alt.Axis(format='%m/%d', title='Date in 2019 01')
    ),
    alt.Y(
        'Low',
        title='Price',
        scale=alt.Scale(zero=False),
    ),
    alt.Y2('High'),
    color=open_close_color
)

bar = alt.Chart(ohlc).mark_bar().encode(
    x='Date:T',
    y='Open',
    y2='Close',
    color=open_close_color
)

rule + bar

In [None]:
bars = alt.Chart(cars).mark_bar().encode(
    x=alt.X('mean(Acceleration):Q'),
    y='Origin:O',
    color=alt.Color('Origin:O')
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text='mean(Acceleration):Q'
)

bars + text

In [None]:
alt.Chart(long_meat).mark_tick().encode(
    x='Price:Q',
    y='Item:O'
)

### Data filtering

In [None]:
# Filter by category

alt.Chart(cars).mark_point().encode(
    x='Displacement:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).transform_filter(
    alt.FieldOneOfPredicate(field='Origin', oneOf=['Japan', 'USA'])
)

In [None]:
# Filter by numeric range

alt.Chart(cars).mark_point().encode(
    x='Displacement:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
).transform_filter(
    {'not': alt.FieldRangePredicate(field='Miles_per_Gallon', range=[8, 20])},    
).transform_filter(
    alt.FieldRangePredicate(field='Displacement', range=[100, 360])
)
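The predicates used in the two filter cells above can be read as small JSON objects. The sketch below evaluates such objects against plain dict rows to show the intended semantics: `oneOf` tests membership, `range` is treated as inclusive on both ends, `not` negates, and chained filters combine with AND. The function `matches` and the sample rows are hypothetical and exist only for this illustration.

```python
def matches(predicate, row):
    """Evaluate a Vega-Lite style predicate dict against one data row."""
    if "not" in predicate:
        return not matches(predicate["not"], row)
    value = row.get(predicate["field"])
    if "oneOf" in predicate:
        return value in predicate["oneOf"]
    if "range" in predicate:
        low, high = predicate["range"]
        return value is not None and low <= value <= high
    raise ValueError(f"unsupported predicate: {predicate!r}")


# Made-up rows shaped like the cars dataset.
rows = [
    {"Origin": "USA", "Miles_per_Gallon": 18, "Displacement": 307},
    {"Origin": "Japan", "Miles_per_Gallon": 27, "Displacement": 97},
    {"Origin": "Europe", "Miles_per_Gallon": 26, "Displacement": 121},
    {"Origin": "USA", "Miles_per_Gallon": 32, "Displacement": 400},
]

# Category filter: keep only rows whose Origin is Japan or USA.
by_origin = [r for r in rows if matches({"field": "Origin", "oneOf": ["Japan", "USA"]}, r)]

# Two chained numeric filters: a row must pass every one of them.
numeric_filters = [
    {"not": {"field": "Miles_per_Gallon", "range": [8, 20]}},
    {"field": "Displacement", "range": [100, 360]},
]
by_range = [r for r in rows if all(matches(p, r) for p in numeric_filters)]

print(by_origin)
print(by_range)
```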

In [None]:
alt.Chart(ohlc).transform_calculate(
    mean_price = (alt.datum.High + alt.datum.Low) / 2
).mark_line().encode(
    alt.X('Date', title='day'),
    alt.Y('mean_price:Q', title='Mean Price')
)

### Interactive selections

In [None]:
brush = alt.selection_interval()

In [None]:
alt.Chart(cars).mark_point().encode(
    x='Miles_per_Gallon:Q',
    y='Horsepower:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray')),
).add_selection(
    brush
)

In [None]:
chart = alt.Chart(cars).mark_point().encode(
    y='Horsepower:Q',
    color=alt.condition(brush, 'Origin:N', alt.value('lightgray'))
).properties(
    width=250,
    height=250
).add_selection(
    brush
)

chart.encode(x='Acceleration:Q') | chart.encode(x='Miles_per_Gallon:Q')

In [None]:
single = alt.selection_single()

alt.Chart(meat).mark_point().encode(
    alt.X('beef', bin=True),
    alt.Y('pork', bin=True),
    size='count()',
    color=alt.condition(single, 'count()', alt.value('lightgray'))
).add_selection(
    single
)

In [None]:
input_dropdown = alt.binding_select(options=['beef','pork','veal'])

selection = alt.selection_single(fields=['Item'], 
                                 bind=input_dropdown, 
                                 name='Meat')
color = alt.condition(selection,
                      alt.Color('Item:N', legend=None),
                      alt.value('lightgray'))

alt.Chart(long_meat).mark_line().encode(
    x='date:T',
    y='Price:Q',
    color=color
).add_selection(
    selection
).transform_filter(
    selection
)

### Faceting

In [None]:
alt.Chart(long_meat).mark_line().encode(
    x='date:T',
    y='Price:Q',
    color=alt.Color('Item:N')
).properties(
    width=180,
    height=180
).facet(
    column='Item:N'
)

In [None]:
base = alt.Chart(meat).mark_point().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
).properties(
    width=200,
    height=200
).repeat(
    row=['beef', 'pork', 'veal'],
    column=['beef', 'pork', 'veal']
).interactive()

base

### Saving charts

In [None]:
base.save('chart.html')
chart.save('chart.json')

## Requires selenium
## pip3 install selenium

# chart.save('chart.png', scale_factor=2.0)
# chart.save('chart.svg', scale_factor=2.0)

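### Case study: population by sex over the years

In [None]:
# https://www.ris.gov.tw/346
# Historical national population statistics (data start year in parentheses)
# A Households, population, and in- and out-migration
# 05 Year-end population by sex and age (35

# requires xlrd
import altair as alt
from altair.expr import datum, if_
from vega_datasets import data

alt.renderers.enable('notebook')

# Use row 2 as the header and read the second sheet (sheet_name is zero-based)
df = pd.read_excel('data/y1s600000.xls', header=2, sheet_name=1)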

In [None]:
df.head()

In [None]:
df = df[df['性別'] != '計']

In [None]:
df.columns

In [None]:
df = df.drop(['年　　別','總　　計','Unnamed: 25'], axis=1)

In [None]:
columns = ['year', 'sex']

In [None]:
columns.extend([(i+4) for i in range(0,105,5)])

In [None]:
df.columns = columns

In [None]:
df.tail()

In [None]:
df = df[:-2]

In [None]:
# Forward-fill missing year values
df = df.assign(year = df.year.ffill())

In [None]:
df = df.fillna(0)

In [None]:
df = df.melt(['year', 'sex'], var_name='age', value_name='people')

In [None]:
dat = df.to_dict('records')

In [None]:
df.year = df.year.astype(float)
df.age = df.age.astype(int)
df.people = df.people.astype(int)

In [None]:
slider = alt.binding_range(min = df.year.min(), max = df.year.max(), step=1)
select_year = alt.selection_single(name='year', fields=['year'], bind=slider)

base = alt.Chart(df).add_selection(
    select_year
).transform_filter(
    select_year
).transform_calculate(
    gender=if_(datum.sex == '男', '男', '女')
)


title = alt.Axis(title='Population')
# '男' (male) and '女' (female) are the values stored in the source data
color_scale = alt.Scale(domain=['男', '女'],
                        range=['#1f77b4', '#e377c2'])

left = base.transform_filter(
    datum.gender == '男'
).encode(
    y=alt.Y('age:O', axis=None, sort=alt.SortOrder('descending')),
    x=alt.X('sum(people):Q', axis=title, sort=alt.SortOrder('descending'))
#     ,color=alt.Color('sex:N', scale=color_scale, legend=None)
).mark_bar(color='#1f77b4').properties(title='Male')

middle = base.encode(
    y=alt.Y('age:O', axis=None, sort=alt.SortOrder('descending')),
    text=alt.Text('age:Q'),
).mark_text().properties(width=20)

right = base.transform_filter(
    datum.gender == '女'
).encode(
    y=alt.Y('age:O', axis=None, sort=alt.SortOrder('descending')),
    x=alt.X('sum(people):Q', axis=title)
#     ,color=alt.Color('sex:N', scale=color_scale, legend=None)
).mark_bar(color='#e377c2').properties(title='Female')


left | middle | right