# Visualization Curriculum

## Chart.mark_* parameter

---
* Author:  [Yuttapong Mahasittiwat](mailto:khala1391@gmail.com)
* Technologist | Data Modeler | Data Analyst
* [YouTube](https://www.youtube.com/khala1391)
* [LinkedIn](https://www.linkedin.com/in/yuttapong-m/)
---

Source: [Visualization Curriculum](https://idl.uw.edu/visualization-curriculum/altair_introduction.html)

In [21]:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import altair as alt
print("pandas version :",pd.__version__)
print("numpy version :",np.__version__)
print("matplotlib version :",mpl.__version__)
print("seaborn version :",sns.__version__)
print("altair version :",alt.__version__)

pandas version : 2.2.1
numpy version : 1.26.4
matplotlib version : 3.8.4
seaborn version : 0.13.2
altair version : 5.4.0


In [22]:
import warnings
warnings.filterwarnings('ignore', category=FutureWarning, message="the convert_dtype parameter is deprecated")

In [23]:
from vega_datasets import data

- **fold transform**: https://altair-viz.github.io/user_guide/data.html#converting-with-fold-transform
- **method-based syntax**: https://altair-viz.github.io/user_guide/encodings/index.html#method-based-syntax
- **long format**: https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data
- **data type on color scale**: https://altair-viz.github.io/user_guide/encodings/index.html#effect-of-data-type-on-color-scales
- **data type on axis scale**: https://altair-viz.github.io/user_guide/encodings/index.html#effect-of-data-type-on-axis-scales
- **sort option**: https://altair-viz.github.io/user_guide/encodings/index.html#sort-option
- **config**: https://altair-viz.github.io/user_guide/customization.html#global-config-vs-local-config-vs-encoding

- **custom legend**: https://altair-viz.github.io/gallery/line_chart_with_custom_legend.html#line-chart-with-custom-legend
- **line chart with text label on last observation**: https://altair-viz.github.io/gallery/scatter_with_labels.html#simple-scatter-plot-with-labels
- **bar chart with label**: https://altair-viz.github.io/gallery/bar_chart_with_labels.html#bar-chart-with-labels
- **url link**: https://altair-viz.github.io/gallery/scatter_href.html#gallery-scatter-href
- **adjust title**: https://altair-viz.github.io/user_guide/customization.html#adjusting-the-title
- **save chart**: https://altair-viz.github.io/user_guide/saving_charts.html#saving-altair-charts

In [25]:
data.list_datasets()
# data.*?

['7zip',
 'airports',
 'annual-precip',
 'anscombe',
 'barley',
 'birdstrikes',
 'budget',
 'budgets',
 'burtin',
 'cars',
 'climate',
 'co2-concentration',
 'countries',
 'crimea',
 'disasters',
 'driving',
 'earthquakes',
 'ffox',
 'flare',
 'flare-dependencies',
 'flights-10k',
 'flights-200k',
 'flights-20k',
 'flights-2k',
 'flights-3m',
 'flights-5k',
 'flights-airport',
 'gapminder',
 'gapminder-health-income',
 'gimp',
 'github',
 'graticule',
 'income',
 'iowa-electricity',
 'iris',
 'jobs',
 'la-riots',
 'londonBoroughs',
 'londonCentroids',
 'londonTubeLines',
 'lookup_groups',
 'lookup_people',
 'miserables',
 'monarchs',
 'movies',
 'normal-2d',
 'obesity',
 'ohlc',
 'points',
 'population',
 'population_engineers_hurricanes',
 'seattle-temps',
 'seattle-weather',
 'sf-temps',
 'sp500',
 'stocks',
 'udistrict',
 'unemployment',
 'unemployment-across-industries',
 'uniform-2d',
 'us-10m',
 'us-employment',
 'us-state-capitals',
 'volcano',
 'weather',
 'weball26',
 'wheat',

In [244]:
# alt.themes.*?
# alt.theme.*?
alt.theme.themes
# change theme : alt.themes.enable(theme_name)

# chart theme: alt.themes.get()

ThemeRegistry(active='default', registered=['carbong10', 'carbong100', 'carbong90', 'carbonwhite', 'dark', 'default', 'excel', 'fivethirtyeight', 'ggplot2', 'googlecharts', 'latimes', 'none', 'opaque', 'powerbi', 'quartz', 'urbaninstitute', 'vox'])

In [26]:
cars = data.cars()
cars.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 406 entries, 0 to 405
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Name              406 non-null    object        
 1   Miles_per_Gallon  398 non-null    float64       
 2   Cylinders         406 non-null    int64         
 3   Displacement      406 non-null    float64       
 4   Horsepower        400 non-null    float64       
 5   Weight_in_lbs     406 non-null    int64         
 6   Acceleration      406 non-null    float64       
 7   Year              406 non-null    datetime64[ns]
 8   Origin            406 non-null    object        
dtypes: datetime64[ns](1), float64(4), int64(2), object(2)
memory usage: 28.7+ KB


In [27]:
# data.list_datasets()
data.*?

data.7zip
data.__call__
data.__class__
data.__delattr__
data.__dict__
data.__dir__
data.__doc__
data.__eq__
data.__format__
data.__ge__
data.__getattr__
data.__getattribute__
data.__getstate__
data.__gt__
data.__hash__
data.__init__
data.__init_subclass__
data.__le__
data.__lt__
data.__module__
data.__ne__
data.__new__
data.__reduce__
data.__reduce_ex__
data.__repr__
data.__setattr__
data.__sizeof__
data.__str__
data.__subclasshook__
data.__weakref__
data.airports
data.annual_precip
data.anscombe
data.barley
data.birdstrikes
data.budget
data.budgets
data.burtin
data.cars
data.climate
data.co2_concentration
data.countries
data.crimea
data.disasters
data.driving
data.earthquakes
data.ffox
data.flare
data.flare_dependencies
data.flights_10k
data.flights_200k
data.flights_20k
data.flights_2k
data.flights_3m
data.flights_5k
data.flights_airport
data.gapminder
data.gapminder_health_income
data.gimp
data.github
data.graticule
data.income
data.iowa_electricity
data.iris
data.jobs
data.la_riots

In [28]:
data.cars.url
pd.read_json(data.cars.url).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 406 entries, 0 to 405
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Name              406 non-null    object 
 1   Miles_per_Gallon  398 non-null    float64
 2   Cylinders         406 non-null    int64  
 3   Displacement      406 non-null    float64
 4   Horsepower        400 non-null    float64
 5   Weight_in_lbs     406 non-null    int64  
 6   Acceleration      406 non-null    float64
 7   Year              406 non-null    object 
 8   Origin            406 non-null    object 
dtypes: float64(4), int64(2), object(3)
memory usage: 28.7+ KB


### Single view

In [30]:
df = pd.DataFrame({
    'city': ['Seattle', 'Seattle', 'Seattle', 'New York', 'New York', 'New York', 'Chicago', 'Chicago', 'Chicago'],
    'month': ['Apr', 'Aug', 'Dec', 'Apr', 'Aug', 'Dec', 'Apr', 'Aug', 'Dec'],
    'precip': [2.68, 0.87, 5.31, 3.94, 4.13, 3.58, 3.62, 3.98, 2.56]
})

df

Unnamed: 0,city,month,precip
0,Seattle,Apr,2.68
1,Seattle,Aug,0.87
2,Seattle,Dec,5.31
3,New York,Apr,3.94
4,New York,Aug,4.13
5,New York,Dec,3.58
6,Chicago,Apr,3.62
7,Chicago,Aug,3.98
8,Chicago,Dec,2.56


In [31]:
chart = alt.Chart(df)

#### mark_point 

In [33]:
alt.Chart(df).mark_point()

In [34]:
alt.Chart(df).mark_point().encode(
    y='city'
)

In [35]:
alt.Chart(df).mark_point().encode(
    y='city',
    x='precip'
)

In [36]:
# what happens if the data type of precip is changed to nominal (:N)?
alt.Chart(df).mark_point().encode(
    y='city',
    x='precip:N'
)

In [37]:
alt.Chart(df).mark_point(filled=True).encode(
    y='city',
    x='precip:N'
)

In [38]:
alt.Chart(df).mark_point(filled=True,opacity=0.5, color='#a321e7').encode(
    y='city',
    x='precip:N'
)

Hex color codes: https://htmlcolorcodes.com/

In [40]:
alt.Chart(df).mark_point(filled=True,
                         opacity=1,
                         color='#e72146',
                         size=90,
                         shape='square',
                         stroke='black',
                         strokeWidth=.5).encode(
    y='city',
    x='precip:N',
    tooltip=['city','precip']
)

- Explicit annotation of data types is necessary when data is loaded from an external URL directly
  - `b:N` indicates a nominal type (unordered, categorical data)
  - `b:O` indicates an ordinal type (rank-ordered data)
  - `b:Q` indicates a quantitative type (numerical data with meaningful magnitudes)
  - `b:T` indicates a temporal type (date/time data)

📚 Q: How can month names be given an explicit order so that `month` behaves as an ordinal type?

#### mark_bar 

In [44]:
alt.Chart(df).mark_bar().encode(
    x= 'average(precip)',
    y= 'city',
)

**full list of aggregation functions**:
[aggregation functions](https://altair-viz.github.io/user_guide/encodings/index.html#aggregation-functions)
<br>
**mark parameters**: [mark def](https://altair-viz.github.io/user_guide/generated/core/altair.MarkDef.html#altair.MarkDef)
-  primitive: `arc`,  `area`, `bar`, `image`, `line`, `point`, `rect`, `rule`, `text`, `tick`, `trail`, `circle`, `square`, `geoshape`
-  composite: `boxplot`, `errorband`, `errorbar`
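As a sanity check on what `average(precip)` computes in the bar chart above, the same per-city mean can be reproduced in pandas. This is only a cross-check sketch; `df_check` and `avg_precip` are names introduced here, and the values are copied from the precipitation table defined earlier.

```python
import pandas as pd

# Same values as the precipitation table defined earlier in this notebook
df_check = pd.DataFrame({
    'city': ['Seattle', 'Seattle', 'Seattle',
             'New York', 'New York', 'New York',
             'Chicago', 'Chicago', 'Chicago'],
    'month': ['Apr', 'Aug', 'Dec'] * 3,
    'precip': [2.68, 0.87, 5.31, 3.94, 4.13, 3.58, 3.62, 3.98, 2.56],
})

# Group by the non-aggregated field (city) and average the aggregated one (precip)
avg_precip = df_check.groupby('city')['precip'].mean()
print(avg_precip.round(2))
```

Each bar length in the chart should match the corresponding value in `avg_precip`.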

In [46]:
alt.Chart(df).mark_bar(filled=True,
                       opacity=.8,
                       color='#e72146',
                       # size=90,
                       # shape='square',
                       stroke='black',
                       strokeWidth=.5,
                       cornerRadius=6).encode(
    x= 'average(precip)',
    y= 'city',
)

In [47]:
alt.Chart(df).mark_bar(binSpacing=.5,
                       width=20,
                       # clip=True
                      ).encode(
    y= 'average(precip)',
    x= 'city',
).properties(width=400)

#### mark_arc

In [49]:
alt.Chart(df).mark_arc().encode(
    theta=alt.Theta(field='precip',type='quantitative'),
    color=alt.Color(field='city',type='nominal')
)

In [50]:
alt.Chart(df).mark_arc(innerRadius=30,
                      outerRadius=80,
                      angle=90,
                      cornerRadius=20).encode(
    theta= 'sum(precip):Q',
    color= 'city:N',
    tooltip=['average(precip)']
)

#### mark_text

In [154]:
from vega_datasets import data
source = data.stocks()
# source.head(1)
source.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 560 entries, 0 to 559
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   symbol  560 non-null    object        
 1   date    560 non-null    datetime64[ns]
 2   price   560 non-null    float64       
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 13.3+ KB


In [156]:
source = data.wheat()
source.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   year    52 non-null     int64  
 1   wheat   52 non-null     float64
 2   wages   50 non-null     float64
dtypes: float64(2), int64(1)
memory usage: 1.3 KB


In [150]:
df.head(1)

Unnamed: 0,city,month,precip
0,Seattle,Apr,2.68


In [160]:
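# issue: text='precip' is not aggregated, so the labels do not match the averaged bars
# fix: aggregate the label as well and put the text encoding on the text layer only
base = alt.Chart(df).encode(
    y= 'average(precip)',
    x= 'city'
).properties(width=400)

# max_precip = base.mark_circle().encode(
#     alt.X("last_point['city']"),
#     alt.Y("last_point['precip']")
# ).transform_aggregate(
#     last_point="argmax(precip)",
#     groupby=["city"]
# )

bars = base.mark_bar(binSpacing=.5,width=20)
labels = base.mark_text(dy=-8).encode(text=alt.Text('average(precip):Q', format='.2f'))
bars + labels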

#### low level: alt.Scale

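- continuous (continuous domain to continuous range)
  - `linear`
  - `log`
  - `time` (used for temporal data)
- discrete (discrete domain to discrete or continuous range)
  - `ordinal`
  - `band`
  - `point`
- discretizing (continuous domain to discrete range)
  - `quantile`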
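To make the scale families listed above more concrete, here is a simplified pure-Python sketch of how a scale maps a data value (domain) to a visual value (range). It is an illustration only, not the Vega-Lite implementation: the function names are hypothetical, and padding, clamping and "nice" rounding are ignored.

```python
import math

def linear_scale(value, domain, range_):
    """Continuous: interpolate linearly between the range endpoints."""
    d0, d1 = domain
    r0, r1 = range_
    t = (value - d0) / (d1 - d0)
    return r0 + t * (r1 - r0)

def log_scale(value, domain, range_):
    """Continuous: interpolate on the logarithm of the value (value > 0)."""
    d0, d1 = domain
    r0, r1 = range_
    t = (math.log(value) - math.log(d0)) / (math.log(d1) - math.log(d0))
    return r0 + t * (r1 - r0)

def point_scale(category, domain, range_):
    """Discrete: spread the categories evenly over the range (no padding)."""
    r0, r1 = range_
    step = (r1 - r0) / (len(domain) - 1)
    return r0 + domain.index(category) * step

# Map values onto a 200-pixel-wide axis
linear_px = linear_scale(5, (0, 10), (0, 200))
log_px = log_scale(10, (1, 100), (0, 200))
point_px = point_scale('New York', ['Chicago', 'New York', 'Seattle'], (0, 200))
print(linear_px, log_px, point_px)
```

All three calls land in the middle of the axis, but for different reasons: 5 is halfway between 0 and 10, 10 is halfway between 1 and 100 on a logarithmic axis, and 'New York' is the second of three categories.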

In [125]:
alt.Chart(df).mark_point().encode(
    alt.X('precip',scale=alt.Scale(type='log'),  # log scale, matching the axis title
          axis=alt.Axis(title='precip (log scale)')
         ),
    alt.Y('city', axis=alt.Axis(title='Category'))
)

In [54]:
alt.Chart(df).mark_point().encode(
    # `scheme` only applies to color scales, so it is omitted on the x position scale
    alt.X('precip',scale=alt.Scale(type='linear',
                                  domain=[0,10]),
          axis=alt.Axis(title='Precipitation')
         ),
    alt.Y('city', axis=alt.Axis(title='Category'))
)

In [55]:
import altair as alt
import pandas as pd

# Sample data
data = pd.DataFrame({
    'x': [10, 20, 30, 40, 50],
    'y': [15, 25, 35, 45, 55]
})

# Default scale
chart = alt.Chart(data).mark_circle(size=100).encode(
    x='x',
    y='y'
)
chart

In [56]:
chart = alt.Chart(data).mark_circle(size=100).encode(
    x=alt.X('x', scale=alt.Scale(range=[0, 500])),
    y='y'
)
chart

In [57]:
chart = alt.Chart(data).mark_circle(size=100).encode(
    x=alt.X('x', scale=alt.Scale(domain=[0, 60], range=[0, 200])),
    y='y'
)
chart

In [58]:
chart = alt.Chart(data).mark_circle(size=200).encode(
    x='x',
    y='y',
    color=alt.Color('x', scale=alt.Scale(domain=[0, 100], range=['lightblue', 'darkblue']))
)
chart

#### low level: alt.Axis

In [87]:
alt.Chart(df).mark_point().encode(
    # `scheme` only applies to color scales, so it is omitted on the x position scale
    alt.X('precip',scale=alt.Scale(type='linear',
                                  domain=[0,10]),
          axis=alt.Axis(title='Precipitation')
         ),
    alt.Y('city', axis=alt.Axis(title='Category'))
)

In [95]:
alt.Chart(df).mark_point().encode(
    y='city',
    x=alt.X('precip', axis=alt.Axis(title='axis x',
                                   labelAngle=-45,
                                   format='.2f',
                                   grid=False,
                                   tickCount=5,
                                   tickSize=20,
                                   # tickMinStep=1,
                                   labelColor='green'))
)

In [140]:
alt.Chart(df).mark_point().encode(
    y='city',
    x=alt.X('precip', axis=alt.Axis(title='axis x',
                                    labels=False, 
                                    ticks=False))
)

#### low level: tooltip 

In [142]:
alt.Chart(df).mark_arc().encode(
    theta= 'sum(precip):Q',
    color= 'city:N',
    tooltip=[alt.Tooltip('precip',
                         aggregate='average',
                         format='.2f',
                         title='avg_precip')]
)

In [143]:
alt.Chart(df).mark_point().encode(
    alt.X('precip',scale=alt.Scale(type='log'),  # log scale, matching the axis title
          axis=alt.Axis(title='precip (log scale)')
         ),
    alt.Y('city', axis=alt.Axis(title='Category')),
    tooltip=[alt.Tooltip('month', title='Month Name'),
            alt.Tooltip('city',title='City')]
)

#### low level: sort
- https://altair-viz.github.io/user_guide/encodings/index.html#sort-option

In [184]:
# from Cur_Ch03
movies_url = 'https://cdn.jsdelivr.net/npm/vega-datasets@1/data/movies.json'

alt.Chart(movies_url).mark_bar().encode(
    # alt.X('average(Rotten_Tomatoes_Rating):Q'),
    alt.X('mean(Rotten_Tomatoes_Rating):Q'),
    alt.Y('Major_Genre:N')
)

In [186]:
# full-length code for sort
alt.Chart(movies_url).mark_bar().encode(
    alt.X('average(Rotten_Tomatoes_Rating):Q'),
    alt.Y('Major_Genre:N', sort=alt.EncodingSortField(
        op='average', field='Rotten_Tomatoes_Rating', order='descending')
    )
)

In [190]:
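# from user_guide
# shorthand code for sort
from vega_datasets import data  # re-import: `data` was reassigned to a DataFrame in the scale examples

barley = data.barley()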

base = alt.Chart(barley).mark_bar().encode(
    y='mean(yield):Q',
    color=alt.Color('mean(yield):Q').legend(None)
).properties(width=100, height=100)

# Sort x in ascending order
ascending = base.encode(
    alt.X('site:N').sort('ascending')
).properties(
    title='Ascending'
)

# Sort x in descending order
descending = base.encode(
    alt.X('site:N').sort('descending')
).properties(
    title='Descending'
)

# Sort x in an explicitly-specified order
explicit = base.encode(
    alt.X('site:N').sort(
        ['Duluth', 'Grand Rapids', 'Morris', 'University Farm', 'Waseca', 'Crookston']
    )
).properties(
    title='Explicit'
)

# Sort according to encoding channel
sortchannel = base.encode(
    alt.X('site:N').sort('y')
).properties(
    title='By Channel'
)

# Sort according to another field
sortfield = base.encode(
    alt.X('site:N').sort(field='yield', op='mean')
).properties(
    title='By Yield'
)

alt.concat(
    ascending,
    descending,
    explicit,
    sortchannel,
    sortfield,
    columns=3
)

#### method: encode

#### method: transform_filter

In [193]:
alt.Chart(cars).mark_line().encode(
    alt.X('Year'),  # column names are case-sensitive: the field is 'Year', not 'year'
    alt.Y('average(Miles_per_Gallon)'),
    color='Origin'
)

In [213]:
alt.Chart(cars).mark_line().encode(
    alt.X('Year'),
    alt.Y('average(Miles_per_Gallon)'),
    color='Origin'
).transform_filter(
    alt.datum.Miles_per_Gallon > 20
)

#### method: properties

In [147]:
# Sample data
data = pd.DataFrame({
    'x': [10, 20, 30, 40, 50],
    'y': [15, 25, 35, 45, 55]
})

# Default scale
chart = alt.Chart(data).mark_circle(size=100).encode(
    x='x',
    y='y'
)
chart

In [148]:
chart = alt.Chart(data).mark_circle(size=100).encode(
    x='x',
    y='y'
).properties(width=200,
            height=200,
            title="graph name",
            background='#daf9f4')
chart

In [84]:
chart = alt.Chart(data).mark_circle(size=100).encode(
    x='x',
    y='y'
).properties(padding = 20) # padding={'top':20, 'bottom':20,'left':20,'right':20}
chart

#### save chart

### Multiple views

In [226]:
cars.columns

Index(['Name', 'Miles_per_Gallon', 'Cylinders', 'Displacement', 'Horsepower',
       'Weight_in_lbs', 'Acceleration', 'Year', 'Origin'],
      dtype='object')

In [140]:
cars.head(2)

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA


In [118]:
alt.Chart(cars).mark_line().encode(
    alt.X('Year'),
    alt.Y('average(Miles_per_Gallon)'),
    color='Origin'
)

#### layering

In [142]:
line = alt.Chart(cars).mark_line().encode(
    alt.X('Year'),
    alt.Y('average(Miles_per_Gallon)'),
    color='Origin'
)
point = alt.Chart(cars).mark_circle().encode(
    alt.X('Year'),
    alt.Y('average(Miles_per_Gallon)'),
    color='Origin',
    # size= 'Horsepower'
)

line + point

In [144]:
mpg = alt.Chart(cars).mark_line().encode(
    alt.X('Year'),
    alt.Y('average(Miles_per_Gallon)'),
    color='Origin'
)

mpg + mpg.mark_circle()

In [170]:
mpg = alt.Chart(cars).mark_line(point=True).encode(
    alt.X('Year'),
    alt.Y('average(Miles_per_Gallon)'),
    color='Origin'
)
mpg

#### concat chart
- https://altair-viz.github.io/user_guide/compound_charts.html#layered-and-multi-view-charts

In [172]:
hp = alt.Chart(cars).mark_line(point=True).encode(
    alt.X('Year'),
    alt.Y('average(Horsepower)'),
    color='Origin'
)
hp | mpg

### Interactive

In [179]:
alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()

#### tooltip

In [182]:
alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name','Origin']
).interactive()

In [241]:
# create an interval selection over an x-axis encoding
brush = alt.selection_interval(encodings=['x'])

# determine opacity based on brush
opacity = alt.condition(brush, alt.value(0.9), alt.value(0.1))

# an overview histogram of cars per year
# add the interval brush to select cars over time
overview = alt.Chart(cars).mark_bar().encode(
    alt.X('Year:O', timeUnit='year', # extract year unit, treat as ordinal
      axis=alt.Axis(title=None, labelAngle=0) # no title, no label angle
    ),
    alt.Y('count()', title=None), # counts, no axis title
    opacity=opacity
).add_params(
    brush      # add interval brush selection to the chart
).properties(
    width=400, # set the chart width to 400 pixels
    height=50  # set the chart height to 50 pixels
)

# a detail scatterplot of horsepower vs. mileage
# modulate point opacity based on the brush selection
detail = alt.Chart(cars).mark_point().encode(
    alt.X('Horsepower'),
    alt.Y('Miles_per_Gallon'),
    # set opacity based on brush selection
    opacity=opacity
).properties(width=400) # set chart width to match the first chart

# vertically concatenate (vconcat) charts using the '&' operator
chart = overview & detail
chart.save('VScodeProject/consol_chart.html')  # the target folder must already exist
chart


In [None]:
consol = overview & detail
consol.save('test.json')

#### Breakdown

In [191]:
overview = alt.Chart(cars).mark_bar().encode(
    alt.X('Year:O', timeUnit='year', # extract year unit, treat as ordinal
      axis=alt.Axis(title=None, labelAngle=0) # no title, no label angle
    ),
    alt.Y('count()', title=None), # counts, no axis title
)
overview

In [195]:
# create an interval selection over an x-axis encoding
brush = alt.selection_interval(encodings=['x'])

# determine opacity based on brush
opacity = alt.condition(brush, alt.value(0.9), alt.value(0.1))

# an overview histogram of cars per year
# add the interval brush to select cars over time
overview = alt.Chart(cars).mark_bar().encode(
    alt.X('Year:O', timeUnit='year', # extract year unit, treat as ordinal
      axis=alt.Axis(title=None, labelAngle=0) # no title, no label angle
    ),
    alt.Y('count()', title=None), # counts, no axis title
    opacity=opacity
).add_params(
    brush      # add interval brush selection to the chart
).properties(
    width=400, # set the chart width to 400 pixels
    height=50  # set the chart height to 50 pixels
)

overview

### JSON output

In [230]:
chart = alt.Chart(df).mark_bar().encode(
    x='average(precip)',
    y='city',
)
print(chart.to_json())

json_save = chart.to_json()

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.20.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 300
    }
  },
  "data": {
    "name": "data-8e72c2f67818e64f2c6d729f1a903405"
  },
  "datasets": {
    "data-8e72c2f67818e64f2c6d729f1a903405": [
      {
        "city": "Seattle",
        "month": "Apr",
        "precip": 2.68
      },
      {
        "city": "Seattle",
        "month": "Aug",
        "precip": 0.87
      },
      {
        "city": "Seattle",
        "month": "Dec",
        "precip": 5.31
      },
      {
        "city": "New York",
        "month": "Apr",
        "precip": 3.94
      },
      {
        "city": "New York",
        "month": "Aug",
        "precip": 4.13
      },
      {
        "city": "New York",
        "month": "Dec",
        "precip": 3.58
      },
      {
        "city": "Chicago",
        "month": "Apr",
        "precip": 3.62
      },
      {
        "city": "Chicago",
        "month": "Aug",

In [200]:
### shorthand
x = alt.X('average(precip):Q')
print(x.to_json())

{
  "aggregate": "average",
  "field": "precip",
  "type": "quantitative"
}


In [202]:
### full-length
x = alt.X(aggregate='average', field='precip', type='quantitative')
print(x.to_json())

{
  "aggregate": "average",
  "field": "precip",
  "type": "quantitative"
}


### Publish Visualization

In [233]:
chart = alt.Chart(df).mark_bar().encode(
    x='average(precip)',
    y='city',
)
chart.save('testchart.html')

### slope chart

In [239]:
dataset_url = 'https://gist.githubusercontent.com/puripant/857f1981667e8b42da2c72328ba94ead/raw/5eaa80f920849c53c0c3066bc2755bc53b1c3973/medals.csv'
df=pd.read_csv(dataset_url)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 283 entries, 0 to 282
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   year    283 non-null    int64 
 1   name    283 non-null    object
 2   gold    283 non-null    int64 
 3   host    283 non-null    object
dtypes: int64(2), object(2)
memory usage: 9.0+ KB


In [359]:
df.head(2)

Unnamed: 0,year,name,gold,host
261,2021,Vietnam,205,y
262,2021,Thailand,92,n


In [361]:
df = df[(df['year'] == 2021)|(df['year'] == 2023)]
df.head()

Unnamed: 0,year,name,gold,host
261,2021,Vietnam,205,y
262,2021,Thailand,92,n
263,2021,Indonesia,69,n
264,2021,Philippines,52,n
265,2021,Singapore,47,n


In [367]:
alt.Chart(df).mark_line().encode(
    x='year:O',
    y='gold:Q',
    color ='name:N'
)

In [48]:
import pandas as pd
import altair as alt

# Load the data
url = "https://gist.githubusercontent.com/puripant/857f1981667e8b42da2c72328ba94ead/raw/5eaa80f920849c53c0c3066bc2755bc53b1c3973/medals.csv"
data = pd.read_csv(url)

# # Display the first few rows and column names
print(data.head())
# print(data.columns)

   year       name  gold host
0  1959   Thailand    35    y
1  1959    Myanmar    11    n
2  1959   Malaysia     8    n
3  1959  Singapore     8    n
4  1959    Vietnam     5    n


In [None]:
# Filter for the years 2021 and 2023
data_filtered = data[data['year'].isin([2021, 2023])]

# Pivot the data to have separate columns for 2021 and 2023 values
data_pivot = data_filtered.pivot(index='name', columns='year', values='gold').reset_index()
data_pivot.columns.name = None
data_pivot.rename(columns={2021: 'gold_2021', 2023: 'gold_2023'}, inplace=True)

# Calculate the color and labels
data_pivot['Change'] = data_pivot['gold_2023'] - data_pivot['gold_2021']
data_pivot['Color'] = data_pivot['Change'].apply(lambda x: 'blue' if x > 0 else 'grey')
data_pivot['Label'] = data_pivot.apply(lambda row: f"{row['name']}, {row['gold_2023']}", axis=1)

In [46]:
data_pivot

Unnamed: 0,name,gold_2021,gold_2023,Change,Color,Label
0,Brunei,1,2,1,blue,"Brunei, 2"
1,Cambodia,9,81,72,blue,"Cambodia, 81"
2,Indonesia,69,87,18,blue,"Indonesia, 87"
3,Laos,2,6,4,blue,"Laos, 6"
4,Malaysia,39,34,-5,grey,"Malaysia, 34"
5,Myanmar,9,21,12,blue,"Myanmar, 21"
6,Philippines,52,58,6,blue,"Philippines, 58"
7,Singapore,47,51,4,blue,"Singapore, 51"
8,Thailand,92,108,16,blue,"Thailand, 108"
9,Timor-Leste,0,0,0,grey,"Timor-Leste, 0"


In [None]:
# Melt the data for Altair
data_melted = data_pivot.melt(id_vars=['name', 'Change', 'Color', 'Label'], value_vars=['gold_2021', 'gold_2023'],
                              var_name='year', value_name='gold')

In [44]:
data_melted

Unnamed: 0,name,Change,Color,Label,year,gold
0,Brunei,1,blue,"Brunei, 2",gold_2021,1
1,Cambodia,72,blue,"Cambodia, 81",gold_2021,9
2,Indonesia,18,blue,"Indonesia, 87",gold_2021,69
3,Laos,4,blue,"Laos, 6",gold_2021,2
4,Malaysia,-5,grey,"Malaysia, 34",gold_2021,39
5,Myanmar,12,blue,"Myanmar, 21",gold_2021,9
6,Philippines,6,blue,"Philippines, 58",gold_2021,52
7,Singapore,4,blue,"Singapore, 51",gold_2021,47
8,Thailand,16,blue,"Thailand, 108",gold_2021,92
9,Timor-Leste,0,grey,"Timor-Leste, 0",gold_2021,0


In [None]:
# Create the Altair chart
line_chart = alt.Chart(data_melted).mark_line().encode(
    x=alt.X('year:O', title='Year'),
    y=alt.Y('gold:Q', title='Gold Medals'),
    color=alt.Color('Color:N', scale=alt.Scale(domain=['blue', 'grey'], range=['blue', 'grey'])),
    detail='name:N'
)

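# Add text labels at the 2023 end of each line
# (data_pivot has no 'year' column, so use the melted frame and keep only the 2023 rows)
text_chart = alt.Chart(data_melted).transform_filter(
    alt.datum.year == 'gold_2023'
).mark_text(align='left', dx=5).encode(
    x=alt.X('year:O'),
    y=alt.Y('gold:Q'),
    text='Label:N'
)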

# Combine line chart and text labels
chart = line_chart + text_chart

chart.properties(width=200,
    title='Slope Graph: Gold Medals in 2021 vs 2023'
)