# Visualization Curriculum

## Chapter 4: Scales, Axes, and Legends

---
* Author:  [Yuttapong Mahasittiwat](mailto:khala1391@gmail.com)
* Technologist | Data Modeler | Data Analyst
* [YouTube](https://www.youtube.com/khala1391)
* [LinkedIn](https://www.linkedin.com/in/yuttapong-m/)
---

Source: [Visualization Curriculum](https://idl.uw.edu/visualization-curriculum/altair_introduction.html)

In [1]:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import altair as alt
print("pandas version :",pd.__version__)
print("numpy version :",np.__version__)
print("matplotlib version :",mpl.__version__)
print("seaborn version :",sns.__version__)
print("altair version :",alt.__version__)

pandas version : 2.2.1
numpy version : 1.26.4
matplotlib version : 3.8.4
seaborn version : 0.13.2
altair version : 5.4.0


In [2]:
import warnings
warnings.filterwarnings('ignore', category=FutureWarning, message="the convert_dtype parameter is deprecated")

### Data

In [4]:
antibiotics = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/burtin.json'
df = pd.read_json(antibiotics)

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Bacteria       16 non-null     object 
 1   Penicillin     16 non-null     float64
 2   Streptomycin   16 non-null     float64
 3   Neomycin       16 non-null     float64
 4   Gram_Staining  16 non-null     object 
 5   Genus          16 non-null     object 
dtypes: float64(3), object(3)
memory usage: 900.0+ bytes


### Configuring Scale and Axes

#### adjusting scale type

In [6]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q')
)

In [7]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
          scale=alt.Scale(type='sqrt'))
)

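Common scale types, set with `alt.Scale(type=...)`:

- `linear`: the default for quantitative data; positions are proportional to values.
- `log`: logarithmic; suited to positive values that span several orders of magnitude (the domain cannot include zero).
- `sqrt`: square-root transform; compresses large values while still allowing zero.
- `ordinal`: maps discrete categories to discrete output values such as colors.
- `point`: places categories at evenly spaced points.
- `band`: places categories in evenly spaced bands with adjustable padding.

Note that `nominal` is a data type (`:N`), not a scale type.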
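To make the difference between the continuous scale types concrete, here is a minimal pure-Python sketch of a scale as a function from a data domain to a pixel range. The helper `make_scale` and the example domains and ranges are illustrative assumptions, not part of Altair or Vega-Lite.

```python
import math

def make_scale(kind, domain, range_):
    """Return a function mapping a value in `domain` to a position in `range_`.

    Hypothetical helper for illustration only; Altair builds scales internally.
    """
    transforms = {
        "linear": lambda v: v,
        "sqrt": math.sqrt,
        "log": math.log10,
    }
    t = transforms[kind]
    d0, d1 = t(domain[0]), t(domain[1])
    r0, r1 = range_

    def scale(value):
        # Normalize the transformed value to [0, 1], then stretch over the range.
        frac = (t(value) - d0) / (d1 - d0)
        return r0 + frac * (r1 - r0)

    return scale

linear_x = make_scale("linear", (0, 100), (0, 200))
sqrt_x = make_scale("sqrt", (0, 100), (0, 200))
log_x = make_scale("log", (0.001, 1000), (0, 200))

print(linear_x(25))  # a quarter of the way along the axis
print(sqrt_x(25))    # halfway, because sqrt(25) is half of sqrt(100)
print(log_x(1))      # halfway, because 1 is the geometric middle of 0.001..1000
```

The same data value lands at different positions depending on the transform, which is why small MIC values that collapse near zero on a linear axis spread out on `sqrt` and `log` axes.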

In [9]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
          scale=alt.Scale(type='log'))
)

#### styling an axis

In [10]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
          sort='descending',
          scale=alt.Scale(type='log'),
          title='Neomycin MIC (μg/ml, reverse log scale)')
)

In [11]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
          sort='descending',
          scale=alt.Scale(type='log'),
          axis=alt.Axis(orient='top'),
          title='Neomycin MIC (μg/ml, reverse log scale)')
)
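As a rough illustration of what `sort='descending'` does to a log axis, the sketch below computes pixel positions with and without reversal. The function `log_position`, the domain `(0.001, 1000)`, and the 250-pixel range are assumptions chosen for the example; Altair derives the actual domain from the data unless one is given.

```python
import math

def log_position(value, domain=(0.001, 1000), range_=(0, 250), reverse=False):
    """Position of `value` on a log axis; hypothetical helper for illustration."""
    lo, hi = math.log10(domain[0]), math.log10(domain[1])
    frac = (math.log10(value) - lo) / (hi - lo)
    if reverse:
        # A descending sort flips the axis: small values move to the far end.
        frac = 1 - frac
    r0, r1 = range_
    return r0 + frac * (r1 - r0)

# Neomycin MIC values taken from rows of the dataset
for mic in (0.001, 0.008, 1.0):
    print(mic, log_position(mic), log_position(mic, reverse=True))
```

With the reversed axis, the lowest MIC values (the doses at which the antibiotic already works) appear at the far right instead of the far left.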

#### adjusting grid lines, tick count, and mark size

In [23]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
          sort='descending',
          scale=alt.Scale(type='log'),
          title='Neomycin MIC (μg/ml, reverse log scale)'),
    alt.Y('Streptomycin:Q',
          sort='descending',
          scale=alt.Scale(type='log'),
          title='Streptomycin MIC (μg/ml, reverse log scale)')
)

In [88]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          title='Penicillin MIC (μg/ml, reverse log scale)')
).properties(width=250, height=250)

In [94]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(grid=False),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(grid=False),
          title='Penicillin MIC (μg/ml, reverse log scale)')
).properties(width=250, height=250)

In [104]:
alt.Chart(antibiotics).mark_circle().encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)')
).properties(width=250, height=250)

In [106]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)')
).properties(width=250, height=250)

### Configuring Color Legend

#### by gram staining

In [118]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Gram_Staining:N')
).properties(width=250, height=250)

In [124]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Gram_Staining:N',
             scale=alt.Scale(domain=['negative','positive'], range=['hotpink','purple'])
             )
).properties(width=250, height=250)
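Conceptually, giving a color scale an explicit `domain` and `range` pairs each category with a color by position. A minimal sketch of that pairing in plain Python follows; the names `color_scale` and `color_for` are hypothetical and only mirror the idea, not Altair's implementation.

```python
# Domain and range copied from the chart above
domain = ["negative", "positive"]
range_ = ["hotpink", "purple"]

# The i-th domain value is paired with the i-th range value.
color_scale = dict(zip(domain, range_))

def color_for(gram_staining):
    """Look up the color assigned to a Gram_Staining category."""
    return color_scale[gram_staining]

print(color_for("negative"))  # hotpink
print(color_for("positive"))  # purple
```

Fixing the domain order explicitly keeps the color assignment stable even if the data arrives in a different order.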

In [130]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Gram_Staining:N',
             scale=alt.Scale(domain=['negative','positive'], range=['hotpink','purple']),
             legend=alt.Legend(orient='left')
             )
).properties(width=250, height=250)

In [132]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Gram_Staining:N',
             scale=alt.Scale(domain=['negative','positive'], range=['hotpink','purple']),
             legend=None
             )
).properties(width=250, height=250)

#### by Species

In [143]:
df.sample(3)

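|    | Bacteria                        | Penicillin | Streptomycin | Neomycin | Gram_Staining | Genus          |
|----|---------------------------------|------------|--------------|----------|---------------|----------------|
| 9  | Salmonella (Eberthella) typhosa | 1.0        | 0.4          | 0.008    | negative      | Salmonella     |
| 12 | Staphylococcus aureus           | 0.03       | 0.03         | 0.001    | positive      | Staphylococcus |
| 5  | Klebsiella pneumoniae           | 850.0      | 1.2          | 1.0      | negative      | other          |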


In [153]:
display(df['Bacteria'].nunique())
display(df['Bacteria'].unique())
display(df['Bacteria'].value_counts())

16

array(['Aerobacter aerogenes', 'Bacillus anthracis', 'Brucella abortus',
       'Diplococcus pneumoniae', 'Escherichia coli',
       'Klebsiella pneumoniae', 'Mycobacterium tuberculosis',
       'Proteus vulgaris', 'Pseudomonas aeruginosa',
       'Salmonella (Eberthella) typhosa', 'Salmonella schottmuelleri',
       'Staphylococcus albus', 'Staphylococcus aureus',
       'Streptococcus fecalis', 'Streptococcus hemolyticus',
       'Streptococcus viridans'], dtype=object)

Bacteria
Aerobacter aerogenes               1
Bacillus anthracis                 1
Brucella abortus                   1
Diplococcus pneumoniae             1
Escherichia coli                   1
Klebsiella pneumoniae              1
Mycobacterium tuberculosis         1
Proteus vulgaris                   1
Pseudomonas aeruginosa             1
Salmonella (Eberthella) typhosa    1
Salmonella schottmuelleri          1
Staphylococcus albus               1
Staphylococcus aureus              1
Streptococcus fecalis              1
Streptococcus hemolyticus          1
Streptococcus viridans             1
Name: count, dtype: int64

In [173]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Bacteria:N')
).properties(width=250, height=250)

Color scheme reference: [Vega color schemes](https://vega.github.io/vega/docs/schemes/#reference)

In [158]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Bacteria:N',
             scale=alt.Scale(scheme='tableau20'),
             )
).properties(width=250, height=250)

In [162]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Bacteria:O',
             # scale=alt.Scale(scheme='tableau20'),
             )
).properties(width=250, height=250)

In [171]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Bacteria:O',
             scale=alt.Scale(scheme='viridis'),
             )
).properties(width=250, height=250)

#### by Genus

In [177]:
alt.Chart(antibiotics).mark_circle(size=80).transform_calculate(
    Genus='split(datum.Bacteria," ")[0]'
).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Genus:N',
             scale=alt.Scale(scheme='tableau20'),
             )
).properties(width=250, height=250)

In [179]:
alt.Chart(antibiotics).mark_circle(size=80).encode(
    alt.X('Neomycin:Q',
         sort='descending',
         scale=alt.Scale(type='log', domain=[0.001,1000]),
         axis=alt.Axis(tickCount=5),
         title='Neomycin MIC (μg/ml), reverse log scale'
        ),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001,1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Genus:N',
             scale=alt.Scale(scheme='tableau20'),
             )
).properties(width=250, height=250)

In [184]:
alt.Chart(antibiotics).mark_circle(size=80).transform_calculate(
  Split='split(datum.Bacteria, " ")[0]'
).transform_calculate(
  Genus='indexof(["Salmonella", "Staphylococcus", "Streptococcus"], datum.Split) >= 0 ? datum.Split : "Other"'
).encode(
    alt.X('Neomycin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001, 1000]),
          axis=alt.Axis(tickCount=5),
          title='Neomycin MIC (μg/ml, reverse log scale)'),
    alt.Y('Penicillin:Q',
          sort='descending',
          scale=alt.Scale(type='log', domain=[0.001, 1000]),
          axis=alt.Axis(tickCount=5),
          title='Penicillin MIC (μg/ml, reverse log scale)'),
    alt.Color('Genus:N',
          scale=alt.Scale(
            domain=['Salmonella', 'Staphylococcus', 'Streptococcus', 'Other'],
            range=['rgb(76,120,168)', 'rgb(84,162,75)', 'rgb(228,87,86)', 'rgb(121,112,110)']
          ))
).properties(width=250, height=250)

**Note**: In the `transform_calculate` expression above, `? :` is the conditional (ternary) operator of the JavaScript-like Vega expression language: `condition ? value_if_true : value_if_false`.
- `indexof([...], datum.Split) >= 0` is the condition: it is true when the first word of the bacteria name is one of the listed genera.
- `? datum.Split : "Other"`:
  - If the condition is true, the expression returns `datum.Split`.
  - If the condition is false, it returns `"Other"`.
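For readers more at home in Python, the two calculate steps can be sketched as an ordinary function. The names `KEEP` and `genus_of` are hypothetical; the chart itself runs the Vega expression, not this code.

```python
# Genera that keep their own color; everything else is grouped as "Other"
KEEP = ["Salmonella", "Staphylococcus", "Streptococcus"]

def genus_of(bacteria):
    """Python equivalent of the two transform_calculate steps."""
    # Step 1: split(datum.Bacteria, " ")[0]
    split = bacteria.split(" ")[0]
    # Step 2: indexof(KEEP, split) >= 0 ? split : "Other"
    return split if split in KEEP else "Other"

for name in ["Salmonella schottmuelleri",
             "Staphylococcus aureus",
             "Klebsiella pneumoniae"]:
    print(name, "->", genus_of(name))
```

Python's `a if condition else b` plays the role of `condition ? a : b` in the Vega expression.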

#### by antibiotics response

In [195]:
alt.Chart(antibiotics).mark_rect().encode(
    alt.Y('Bacteria:N',
         sort=alt.EncodingSortField(field='Penicillin',op='max',order='descending')
         ),
    alt.Color('Penicillin:Q')
)
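The `alt.EncodingSortField(field='Penicillin', op='max', order='descending')` setting orders the Bacteria categories by an aggregate of another field. A minimal sketch of that ordering in plain Python, using three rows from the dataset, might look like this; the helper `sort_order` is a hypothetical illustration, not an Altair function.

```python
from collections import defaultdict

# Three rows from the dataset (only the fields needed here)
rows = [
    {"Bacteria": "Salmonella (Eberthella) typhosa", "Penicillin": 1.0},
    {"Bacteria": "Staphylococcus aureus", "Penicillin": 0.03},
    {"Bacteria": "Klebsiella pneumoniae", "Penicillin": 850.0},
]

def sort_order(rows, key, field, descending=True):
    """Order the categories in `key` by the max of `field` within each category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return sorted(groups, key=lambda k: max(groups[k]), reverse=descending)

order = sort_order(rows, "Bacteria", "Penicillin")
print(order)
# ['Klebsiella pneumoniae', 'Salmonella (Eberthella) typhosa', 'Staphylococcus aureus']
```

Since each bacterium appears exactly once in this dataset, `op='max'` simply picks up that row's Penicillin value; the aggregate matters only when a category has several rows.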

In [229]:
alt.Chart(antibiotics).mark_rect().encode(
    alt.Y('Bacteria:N',
      sort=alt.EncodingSortField(field='Penicillin', op='max', order='descending'),
      axis=alt.Axis(
        orient='right',     # orient axis on right side of chart
        titleX=7,           # set x-position to 7 pixels right of chart
        titleY=-2,          # set y-position to 2 pixels above chart
        titleAlign='left',  # use left-aligned text
        titleAngle=0        # undo default title rotation
      )
    ),
    alt.Color('Penicillin:Q',
      scale=alt.Scale(type='log', scheme='plasma', 
                      nice=True
                     ),
      legend=alt.Legend(
                        titleOrient='right', 
                        tickCount=5
                       ),
      title='Penicillin MIC (μg/ml)'
    )
)
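The `nice=True` option rounds the scale domain outward to "nice" values. For a base-10 log scale this roughly means extending each end to the nearest power of ten, so the legend starts and ends on round numbers. The sketch below illustrates the idea only; `nice_log_domain` is a hypothetical helper, and Vega's actual rounding may differ in detail.

```python
import math

def nice_log_domain(lo, hi):
    """Extend a positive domain outward to the surrounding powers of ten."""
    nice_lo = 10.0 ** math.floor(math.log10(lo))
    nice_hi = 10.0 ** math.ceil(math.log10(hi))
    return nice_lo, nice_hi

# Example using two Penicillin MIC values from the dataset
print(nice_log_domain(0.03, 850.0))
```

Here 0.03 is rounded down to 0.01 and 850 is rounded up to 1000.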

In [245]:
alt.Chart(antibiotics, title='Penicillin Resistance of Bacterial Strains').mark_rect().encode(
    alt.Y('Bacteria:N',
      sort=alt.EncodingSortField(field='Penicillin', op='max', order='descending'),
      axis=alt.Axis(orient='right', title=None)
    ),
    alt.Color('Penicillin:Q',
      scale=alt.Scale(type='log', scheme='plasma', nice=True),
      legend=alt.Legend(titleOrient='right', tickCount=5),
      title='Penicillin MIC (μg/ml)'
    )
).configure_title(
  anchor='start', # anchor the title at the left edge; options: 'start', 'middle', 'end'
  offset=5        # set title offset from chart
)