In this notebook, we will explore how to build a plot with Vega-Altair and Figma that tells a story about AI training computation across different domains. The dataset comes from Our World in Data's Artificial Intelligence articles (https://ourworldindata.org/artificial-intelligence).

In [208]:
# import necessary packages
import altair as alt
import pandas as pd

I'm interested in the training computation for AI systems, but also in the environments in which these systems were developed (e.g., academia, the private sector). This information lives in two separate files, so I need to read them in and combine them.

In [209]:
# get the data and arrange
# data source: https://ourworldindata.org/grapher/artificial-intelligence-training-computation?tab=table
domain = pd.read_csv("data/artificial-intelligence-training-computation.csv")
research = pd.read_csv("data/artificial-intelligence-training-computation-by-researcher-affiliation.csv")
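# concat with axis=1 pairs rows by index, so this assumes both files list the same entities in the same order
merged = pd.concat([domain, research['Researcher_affiliation']], axis=1)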
print(merged.head())

# what fraction of the training computation values is missing from the dataset?
fraction_nan = (merged['Training_computation_petaflop'].isna().sum())/len(merged.index)
fraction_nan


                Entity  Code  year         Day  Training_computation_petaflop  \
0  6-layer MLP (MNIST)   NaN  2010  2010-03-01                   1.310000e-01   
1            A3C FF hs   NaN  2016  2016-02-04                            NaN   
2              ADALINE   NaN  1960  1960-06-30                   9.900000e-12   
3      ADAM (CIFAR-10)   NaN  2014  2014-12-22                   6.050000e+01   
4               ALBERT   NaN  2019  2019-09-26                            NaN   

     Domain           Researcher_affiliation  
0    Vision                         Academia  
1     Games  Collaboration, Industry-leaning  
2    Vision                         Academia  
3    Vision                    Collaboration  
4  Language                    Collaboration  


0.5849802371541502
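
The `pd.concat(..., axis=1)` call above pairs rows by index, so it only gives correct results if both CSV files list the same systems in the same order. Here is a minimal sketch of a key-based alternative, assuming (the source does not confirm this) that both files share `Entity` and `Day` columns. The toy frames are hypothetical stand-ins built from a few rows of the output above:

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (rows copied from the printed output).
domain_toy = pd.DataFrame({
    "Entity": ["ADALINE", "ALBERT", "A3C FF hs"],
    "Day": ["1960-06-30", "2019-09-26", "2016-02-04"],
    "Domain": ["Vision", "Language", "Games"],
})
research_toy = pd.DataFrame({
    # Deliberately in a different row order than domain_toy.
    "Entity": ["ALBERT", "A3C FF hs", "ADALINE"],
    "Day": ["2019-09-26", "2016-02-04", "1960-06-30"],
    "Researcher_affiliation": [
        "Collaboration",
        "Collaboration, Industry-leaning",
        "Academia",
    ],
})

# Join on the shared keys instead of row position. validate="one_to_one"
# raises a MergeError if a key pair is duplicated in either frame.
merged_toy = domain_toy.merge(
    research_toy, on=["Entity", "Day"], how="left", validate="one_to_one"
)
print(merged_toy)
```

With a positional concat, the shuffled `research_toy` would have attached the wrong affiliation to every row; the key-based merge is unaffected by row order.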

In [210]:
# let altair read every row of my csv file (by default it refuses data with more than 5,000 rows)
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [211]:
# create my base chart
base = (alt.Chart(merged).mark_circle().encode(
        alt.X("Day:T", title="Publication Date"),
        alt.Y("Training_computation_petaflop:Q", title="petaFLOP"),
        color=alt.Color("Domain:N"),
        tooltip=["Entity", "year", "Researcher_affiliation"]
    ).interactive()
    .properties(title="Development of AI")
    ).configure_title(fontSize=20, anchor="start")
base

In [212]:
# make y-axis logarithmic to make points and distribution more readable 
log_y = (
    alt.Chart(merged)
    .mark_circle()
    .encode(
        alt.X("Day:T", title="Publication Date"),
        alt.Y("Training_computation_petaflop:Q", scale=alt.Scale(type="log"), title="petaFLOP"),
        color=alt.Color("Domain:N"),
        tooltip=["Entity", "year", "Researcher_affiliation"]
    ).interactive().properties(title="Development of AI")
).configure_title(fontSize=20, anchor="start")
log_y
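
A log scale has no position for zero, negative, or missing values, and more than half of the rows here are missing a training computation value. This is a minimal sketch of filtering those rows out explicitly before plotting, so it is clear which points can reach the chart. The names `toy` and `plot_ready` are hypothetical, and the values are copied from the printed output above:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Entity": ["6-layer MLP (MNIST)", "A3C FF hs", "ADALINE", "ADAM (CIFAR-10)"],
    "Training_computation_petaflop": [1.31e-01, np.nan, 9.9e-12, 6.05e+01],
})

# Keep only rows a log scale can place: non-missing and strictly positive.
# NaN > 0 evaluates to False, so missing values drop out here as well.
plot_ready = toy[toy["Training_computation_petaflop"] > 0].copy()

# log10 of the values shows the range the log axis has to cover.
plot_ready["log10_petaflop"] = np.log10(plot_ready["Training_computation_petaflop"])
print(plot_ready)
```

Even these three remaining points span more than twelve orders of magnitude, which is why the linear axis in the base chart squashes almost everything against zero.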


In [213]:
# facet chart by researcher affiliation
faceted = (
    alt.Chart(merged)
    .mark_circle()
    .encode(
        alt.X("Day:T", title="Publication Date"),
        alt.Y("Training_computation_petaflop:Q", scale=alt.Scale(type="log"), title="petaFLOP"),
        row='Researcher_affiliation:N',
        color=alt.Color("Domain:N"),
        tooltip=["Entity", "year", "Researcher_affiliation"]
    ).interactive().properties(title="Development of AI", width=600, height=100)  # compact panels, one per affiliation row
).configure_title(fontSize=20, anchor="start")
faceted


In [214]:
# grey out everything except the Vision domain to direct attention, and format the y-axis with SI prefixes (https://github.com/d3/d3-format#locale_format)
grey_context = (
    alt.Chart(merged)
    .mark_circle(size=30)
    .encode(
        alt.X("Day:T", title="Publication Date", scale=alt.Scale(domain=["1950-01-30","2025-12-30"])),
        alt.Y("Training_computation_petaflop:Q", scale=alt.Scale(type="log"), title="petaFLOP", axis=alt.Axis(format="s")),
        row='Researcher_affiliation:N',
        color=alt.condition(
            alt.datum.Domain == "Vision",
            alt.value("orange"), alt.value("lightgrey")
        ),
        tooltip=["Entity", "year", "Researcher_affiliation"]
    ).interactive().properties(width=600, height=100, title="Development of AI")  # compact panels, one per affiliation row
)
grey_context


In [215]:
# experiment with visual decluttering before trying in Figma
grey_context.configure_header(
    labelOrient="top",
    # titleAngle=0,
    titleAlign="left"
).configure_axisY(
    grid=False,
    labelColor='grey',
    # tickCount=5,
    # labelExpr="datum.value % 1000000 ? null : datum.label"
).configure_axisX(
    labelColor='grey',
    gridOpacity=0.4,
    # grid=False
).configure_view(
    stroke=None
).configure_title(fontSize=20, anchor="start")


At this point, experimenting is fun, but I know that the design I want is much faster and easier to achieve in Figma, so I'm moving over there.