# CPUs plot with Altair/Vega

Plots made with [Vega-Altair](https://altair-viz.github.io/):
1. Performance by launch date
   - with linear or log scale
   - with or without regression line
2. Sorted performance (as already done with Matplotlib)

Remark: for some reason, the Altair plots are not persisted in the notebook...

Two Vega plot specifications are saved for Performance by launch date:
- [CPUs_bydate_linear_withreg.json](Geekbench%206%20plots/CPUs_bydate_linear_withreg.json)
  (along with static image [CPUs_bydate_linear_withreg.png](Geekbench%206%20plots/CPUs_bydate_linear_withreg.png))
- [CPUs_bydate_log_withreg.json](Geekbench%206%20plots/CPUs_bydate_log_withreg.json)
  (along with static image [CPUs_bydate_log_withreg.png](Geekbench%206%20plots/CPUs_bydate_log_withreg.png))

Work in progress:
- show the CPU performance by name (sorted by performance) side by side with the plot by launch date
  - with synchronized highlighting of the selection
  - → works, but the plot is way too big (too tall)
- change the sorting order interactively. See https://stackoverflow.com/questions/67379937/change-mark-order-via-parameter (doesn't work yet...) and https://altair-viz.github.io/user_guide/transform/window.html

PH, Feb-Apr 2025

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import altair as alt

## Load CPU dataset

Load the CSV export of the Baserow table (the CSV is exported in [Retrieve_baserow.ipynb](Retrieve_baserow.ipynb))
- 32 rows as of April 27, 2025, of which only 29 are in the [low-power U-series](https://www.makeuseof.com/intel-u-vs-p-vs-h-laptop-cpus/)

In [2]:
cpus = pd.read_csv('CPUs.csv', parse_dates=['Launch date'])
# add log2 transforms of the scores (expressed in thousands of points)
cpus['GB6 Single log2'] = np.log2(cpus['GB6 Single']/1000)
cpus['GB6 Multi log2'] = np.log2(cpus['GB6 Multi']/1000)

cpus.tail(3)

| | Name | Designer | Launch date | Cores | Age | Architecture | GB6 Single | GB6 Multi | Win11 | Product URL | GB6 Single log2 | GB6 Multi log2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | AMD Ryzen 5 PRO 6650U | AMD | 2022-04-19 | 6 | 3.0 | Rembrandt | 1828.0 | 6717.0 | Yes | https://www.amd.com/en/support/downloads/drive... | 0.870266 | 2.747817 |
| 30 | AMD Ryzen 5 PRO 7540U | AMD | 2023-03-05 | 6 | 2.1 | Phoenix | 2226.0 | 8076.0 | Yes | https://www.amd.com/en/products/processors/lap... | 1.154454 | 3.013641 |
| 31 | AMD Ryzen 5 PRO 8540U | AMD | 2024-04-01 | 6 | 1.1 | Hawk Point | 2345.0 | 8081.0 | Yes | https://www.amd.com/en/products/processors/lap... | 1.229588 | 3.014534 |


Select the CPUs to keep on the plot: only the low-power (U-series) ones

In [3]:
cpu_list_plot = [
    'Intel Core i5-5200U',
    'Intel Core i5-5300U',
    'Intel Core i5-6200U',
    'Intel Core i5-6300U',
    'Intel Core i5-7200U',
    'Intel Core i5-7300U',
    'Intel Core i5-8250U',
    'Intel Core i5-8350U',
    'Intel Core i5-8265U',
    'Intel Core i5-8365U',
    #'Intel Core i5-9400H',
    'Intel Core i5-10210U',
    'Intel Core i5-10310U',
    'Intel Core i5-1035G4',
    'Intel Core i5-1035G7',
    'Intel Core i5-1135G7',
    'Intel Core i5-1145G7',
    'Intel Core i5-1235U',
    'Intel Core i5-1245U',
    'Intel Core i5-1335U',
    'Intel Core i5-1345U',
    'Intel Core Ultra 5 125U',
    'Intel Core Ultra 5 135U',
    #'Intel Core Ultra 5 125H',
    #'Intel Core Ultra 5 135H',
    # AMD
    'AMD Ryzen 7 PRO 2700U',
    'AMD Ryzen 5 PRO 3500U',
    'AMD Ryzen 5 PRO 4650U',
    'AMD Ryzen 5 PRO 5650U',
    'AMD Ryzen 5 PRO 6650U',
    'AMD Ryzen 5 PRO 7540U',
    'AMD Ryzen 5 PRO 8540U'
]

In [4]:
keep = cpus['Name'].isin(cpu_list_plot)  # True for the CPUs listed in cpu_list_plot
print("dropped rows: ", cpus.index[~keep].tolist())
cpus = cpus[keep]  # original index labels are kept (label 0 is used below)

dropped rows:  [10, 23, 24]


In [5]:
len(cpus)

29

### Compute "Moore’s law coefficient"

Regression model for the performance score $p$ (in thousands of points, as in the log2 columns above):

$$\log_2 p = a \, t + b$$

with $t$ in years,

so that on a linear scale we have:

$$p = c \cdot 2^{a t}, \quad c = 2^b$$

which can be expressed in several ways:
- multiplicative gain per year: $2^{a \cdot 1} = 2^a$
- time it takes to double the performance: $T$ such that $a T = 1$, i.e. $T = 1/a$

Results (with 95% confidence intervals, i.e. the [0.025, 0.975] bounds reported by Statsmodels):
- **Single-core** performance trend: $a=0.160$ in [0.14, 0.18]
    - gain per year: 1.12 [1.10, 1.13], i.e. about +12%/y ±1.5%/y
    - $T$ = 6.24 y [5.6, 7.1], i.e. about 6 y ±1 y
- **Multi-core** performance trend: $a=0.266$ in [0.24, 0.29]
    - gain per year: 1.20 [1.18, 1.22], i.e. about +20%/y ±2%/y
    - $T$ = 3.76 y [3.4, 4.2], i.e. about 3.8 y (-4 months / +5 months)
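The conversions from the slope $a$ to the yearly gain $2^a$ and to the doubling time $T = 1/a$ can be bundled in one helper. This is a minimal sketch: `moore_metrics` is a hypothetical name, and the interval bounds are passed in by hand (here the rounded bounds from the results above). It makes one detail explicit: the doubling-time bounds swap order, because $1/a$ decreases as $a$ increases.

```python
def moore_metrics(a: float, a_low: float, a_high: float) -> dict:
    """Convert a log2 slope a (per year) and its interval [a_low, a_high]
    into a yearly gain factor 2**a and a doubling time T = 1/a (in years)."""
    if not 0 < a_low <= a <= a_high:
        raise ValueError("expected 0 < a_low <= a <= a_high")
    # 2**a increases with a: the bounds keep their order
    gain = (2 ** a, 2 ** a_low, 2 ** a_high)
    # T = 1/a decreases with a: the upper slope bound gives the lower T bound
    doubling_years = (1 / a, 1 / a_high, 1 / a_low)
    return {"gain": gain, "doubling_years": doubling_years}


for label, a_hat, a_lo, a_hi in [("Single", 0.160, 0.14, 0.18), ("Multi", 0.266, 0.24, 0.29)]:
    m = moore_metrics(a_hat, a_lo, a_hi)
    print(label,
          "gain/y: {:.2f} [{:.2f}, {:.2f}]".format(*m["gain"]),
          "| T (y): {:.2f} [{:.2f}, {:.2f}]".format(*m["doubling_years"]))
```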

Convert the launch date to fractional years since the launch date of the first CPU in the table (used as the regressor)

In [6]:
t = cpus['Launch date'] - cpus['Launch date'][0]
t = t/pd.Timedelta('1 day')/365.25
t.tail()

27    5.497604
28    6.162902
29    7.296372
30    8.172485
31    9.248460
Name: Launch date, dtype: float64
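The cell above relies on pandas `Timedelta` arithmetic. The same conversion for a single pair of dates can be sketched with the standard library only (`years_since` is a hypothetical helper). It shows why an average year of 365.25 days is convenient: a four-year span containing one leap day maps to exactly 4.0 years.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # same average (Julian) year length as in the cell above


def years_since(start: date, end: date) -> float:
    """Elapsed time between two dates, in fractional years."""
    return (end - start).days / DAYS_PER_YEAR


# hypothetical dates, for illustration only: 1461 days, including 2016-02-29
print(years_since(date(2015, 1, 1), date(2019, 1, 1)))  # prints 4.0
```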

Fit with NumPy (returns the slope $a$, then the intercept $b$)

In [7]:
np.polyfit(t, cpus['GB6 Single log2'], deg=1)

array([ 0.1603516 , -0.26622315])

In [8]:
np.polyfit(t, cpus['GB6 Multi log2'], deg=1)

array([0.26611921, 0.6764268 ])
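As a sketch of what a degree-1 least-squares fit computes: ordinary least squares has the closed form $a = \sum_i (t_i - \bar t)(y_i - \bar y) / \sum_i (t_i - \bar t)^2$ and $b = \bar y - a \bar t$. The helper `fit_line` is hypothetical and uses only the standard library; the points are synthetic (not the CPU scores), placed exactly on a line close to the single-core fit, so the fit must recover its slope and intercept. Distinct names (`t_demo`, `a_demo`, ...) avoid overwriting the notebook's `t`.

```python
from statistics import fmean


def fit_line(t: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a*t + b; returns (a, b) in np.polyfit(deg=1) order."""
    t_mean, y_mean = fmean(t), fmean(y)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    a = sxy / sxx
    return a, y_mean - a * t_mean


# synthetic points lying exactly on log2 p = 0.16 t - 0.27
t_demo = [0.0, 1.5, 3.0, 4.5, 6.0]
y_demo = [0.16 * ti - 0.27 for ti in t_demo]
a_demo, b_demo = fit_line(t_demo, y_demo)
print(f"a = {a_demo:.3f}, b = {b_demo:.3f}")  # a = 0.160, b = -0.270
```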

Fit with Statsmodels: same result, plus confidence intervals

In [9]:
t0 = sm.add_constant(t)
res_Single = sm.OLS(cpus['GB6 Single log2'], t0).fit()
res_Single.summary(slim=True)

| Dep. Variable: | GB6 Single log2 | R-squared: | 0.911 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.908 |
| No. Observations: | 29 | F-statistic: | 275.9 |
| Covariance Type: | nonrobust | Prob (F-statistic): | 1.06e-15 |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -0.2662 | 0.053 | -4.985 | 0.000 | -0.376 | -0.157 |
| Launch date | 0.1604 | 0.010 | 16.611 | 0.000 | 0.141 | 0.180 |


In [10]:
res_Multi = sm.OLS(cpus['GB6 Multi log2'], t0).fit()
res_Multi.summary(slim=True)

| Dep. Variable: | GB6 Multi log2 | R-squared: | 0.956 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.954 |
| No. Observations: | 29 | F-statistic: | 581.3 |
| Covariance Type: | nonrobust | Prob (F-statistic): | 8.5e-20 |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.6764 | 0.061 | 11.077 | 0.000 | 0.551 | 0.802 |
| Launch date | 0.2661 | 0.011 | 24.109 | 0.000 | 0.243 | 0.289 |
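The `[0.025` and `0.975]` columns are the bounds of a 95% confidence interval, $\hat a \pm t_{0.975,\,n-2}\,\mathrm{se}(\hat a)$, with $n - 2 = 27$ residual degrees of freedom here (29 observations, 2 parameters). The sketch below recovers them from the printed slope and t-statistic, using $\mathrm{se} = \hat a / t$. The Student quantile $t_{0.975,27} \approx 2.0518$ is hard-coded (an assumption taken from standard tables), since the standard library has no Student-t distribution. In the notebook itself, `res_Single.conf_int(alpha=0.05)` should return the same bounds directly.

```python
T_975_DF27 = 2.0518  # assumed Student-t 97.5% quantile for 27 degrees of freedom (from tables)


def ci95(coef: float, t_stat: float, t_quantile: float = T_975_DF27) -> tuple[float, float]:
    """95% confidence interval from a coefficient and its t-statistic (se = coef / t_stat)."""
    se = coef / t_stat
    return coef - t_quantile * se, coef + t_quantile * se


# slope and t-statistic of the 'Launch date' rows in the two summaries above
for label, coef, t_stat in [("Single", 0.1603516, 16.611), ("Multi", 0.2661192, 24.109)]:
    low, high = ci95(coef, t_stat)
    print(f"{label}: a = {coef:.4f}, 95% CI [{low:.3f}, {high:.3f}]")
```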


Keep only the date (slope) coefficients:

In [11]:
a_Single = float(res_Single.params['Launch date'])
a_Multi = float(res_Multi.params['Launch date'])
a_Single, a_Multi

(0.1603516004747482, 0.2661192081739736)

Conversion to gain per year $2^a$:

In [12]:
g_Single = 2**a_Single
g_Single, 2**0.14, 2**0.18

(1.1175594662846702, 1.1019051158766107, 1.1328838852957985)

In [13]:
g_Multi = 2**a_Multi
g_Multi, 2**0.24, 2**0.29

(1.2025686116059568, 1.1809926614295303, 1.2226402776920684)

Conversion to $T$: time to double the performance

In [14]:
T_Single = 1/a_Single
T_Single, 1/0.14, 1/0.18

(6.236295721647491, 7.142857142857142, 5.555555555555555)

In [15]:
T_Multi = 1/a_Multi
T_Multi, 1/0.24, 1/0.29

(3.757714472629337, 4.166666666666667, 3.4482758620689657)

## Plot Performance by launch date

to "check" Moore's law

### Performance on a linear scale

Highlight the point under the mouse pointer

In [16]:
highlight = alt.selection_point(name="highlight", on="pointerover", empty=False)

Color reacts to selection

In [17]:
color_designer = alt.when(highlight, empty=True).then(
        alt.Color('Designer:N').scale(scheme='set1') # nice scheme: Intel=blue, AMD=red
    ).otherwise(
        alt.value('lightgray')
    )
#color_w11 = alt.when(highlight, empty=True).then('Win11').otherwise(alt.value('lightgray'))
order = alt.when(highlight).then(alt.value(1)).otherwise(alt.value(0)) # draw highlighted items on top of the others

Chart

In [18]:
chart_date = alt.Chart(cpus).mark_point(size=150, filled=True, strokeWidth=5).encode(
    y='Launch date',
    color=color_designer,
    shape='Designer', # reinforce color
    #shape='Win11',
    order=order,
    tooltip=['Name', 'Architecture', 'Cores', 'Launch date', 'Win11', 'GB6 Single', 'GB6 Multi'],
).add_params(
    highlight
)

chart_dsingle = chart_date.encode(
    x='GB6 Single',
).properties(
    title='Single-Core Performance'
)
chart_dmulti = chart_date.encode(
    x='GB6 Multi',
    y=alt.Y('Launch date', axis=alt.Axis(labels=False, title='')) # hide y tick labels
).properties(
    title='Multi-Core Performance'
)

(chart_dsingle | chart_dmulti)

### Variant with log scale

- issue: unlike the Matplotlib plots, I don't know how to display the original scores rather than the raw log values

In [19]:
chart_dsingle_log = chart_dsingle.encode(x='GB6 Single log2')
chart_dmulti_log = chart_dmulti.encode(x='GB6 Multi log2')
(chart_dsingle_log | chart_dmulti_log)

Better variant: log scale, **superimposed** single+multi scores

In [20]:
chart_dmulti_log2 = chart_date.encode(x='GB6 Multi log2')
chart_single_multi_log2 = (chart_dsingle_log + chart_dmulti_log2).properties(
    title='Single & Multi-Core Performance',
    width=450
)
chart_single_multi_log2

### Adding regression lines

- linear regression for the log performance
- exponential law regression for the linear performance

Remark: the regression line tooltip works well only in the linear plot (exponential fit). In the log-scale plot (linear fit), the tooltip only appears when pointing at the end of the line... not nice

Nice text for the regression lines

In [21]:
tip_Single = alt.value(f'+{g_Single-1:.0%}/y (doubling every {T_Single:.1f}y)')
tip_Multi = alt.value(f'+{g_Multi-1:.0%}/y (doubling every {T_Multi:.1f}y)')
tip_Single['value'], tip_Multi['value']

('+12%/y (doubling every 6.2y)', '+20%/y (doubling every 3.8y)')

In [22]:
chart_regline = alt.Chart(cpus).mark_line(color='gray', opacity=0.5, strokeWidth=4, strokeCap='round', strokeDash=(1,8))
chart_dsingle_logreg = chart_regline.encode(
    y='Launch date',
    x='GB6 Single log2',
    tooltip=tip_Single,
).transform_regression('Launch date', 'GB6 Single log2')
chart_dmulti_logreg = chart_regline.encode(
    y='Launch date',
    x='GB6 Multi log2',
    tooltip=tip_Multi,
).transform_regression('Launch date', 'GB6 Multi log2')

(chart_dsingle_log + chart_dsingle_logreg) | (chart_dmulti_log + chart_dmulti_logreg)

Variant with the two plots superimposed

In [29]:
chart = chart_single_multi_log2 + chart_dsingle_logreg + chart_dmulti_logreg
chart.save('Geekbench 6 plots/CPUs_bydate_log_withreg.json')
#chart.save('Geekbench 6 plots/CPUs_bydate_log_withreg.png', ppi=200)
chart

Exponential law for the linear-scale performance

In [28]:
chart_dsingle_linreg = chart_regline.encode(
    y='Launch date',
    x='GB6 Single',
    tooltip=tip_Single,
).transform_regression('Launch date', 'GB6 Single', method='exp')
chart_dmulti_linreg = chart_regline.encode(
    y='Launch date',
    x='GB6 Multi',
    tooltip=tip_Multi
).transform_regression('Launch date', 'GB6 Multi', method='exp')

chart = (chart_dsingle + chart_dsingle_linreg) | (chart_dmulti + chart_dmulti_linreg)
chart.save('Geekbench 6 plots/CPUs_bydate_linear_withreg.json')
#chart.save('Geekbench 6 plots/CPUs_bydate_linear_withreg.png', ppi=200)
chart

Attempt to retrieve the regression parameters (coefficient for Moore's law) from the Vega-Lite regression transform: *hard to do*, and the values are not available from Python → it is easier to run the regression separately in Python (as done above)

In [25]:
params_dsingle_logreg = chart_regline.transform_regression('Launch date', 'GB6 Single log2', params=True).mark_text().encode(
    x=alt.value(0.0),
    y=alt.value(0),
    text='coef:N'
)

chart_dsingle_log+chart_dsingle_logreg + params_dsingle_logreg

## Plot CPUs sorted by perf

works, but would be more readable with a shorter CPU list

In [26]:
color_age = alt.when(highlight).then(
        alt.value('red')
    ).otherwise(
        alt.Color('Age:Q').scale(scheme='plasma', reverse=True, zero=True)
    )
size_highlight = alt.when(highlight).then(alt.value(200))

In [27]:
chart_name = alt.Chart(cpus).mark_point(size=100, filled=True).encode(
    alt.Y('Name').sort(field='GB6 Single', order='descending'),
    tooltip=['Name', 'Architecture', 'Cores', 'Launch date', 'Win11'],
    color=color_age,#'Age',
    shape='Designer',
    size=size_highlight
).add_params(
    highlight
)

chart_nsingle = chart_name.encode(
    x='GB6 Single',
).properties(
    title='Single-Core Performance'
)

chart_nmulti = chart_name.encode(
    x='GB6 Multi',
    y=alt.Y('Name', axis=alt.Axis(labels=False, title='')).sort(field='GB6 Single', order='descending') # hide y tick labels
).properties(
    title='Multi-Core Performance'
)
chart_nsingle | chart_nmulti

### Both charts, with synced highlight

works, but is too big! Need to use the short list only

In [28]:
(chart_nsingle | chart_nmulti) & \
(chart_dsingle | chart_dmulti)

---
Attempt to make the ordering changeable (failed)

In [29]:
index_select = alt.binding_select(options=['GB6 Single', 'GB6 Multi'], name='sort field:')
index_param = alt.param(bind=index_select)
index_param

Parameter('param_1', VariableParameter({
  bind: BindRadioSelect({
    input: 'select',
    name: 'sort field:',
    options: ['GB6 Single', 'GB6 Multi']
  }),
  name: 'param_1'
}))

Note: using the param as the sort field doesn't work. Maybe read this example more carefully: https://altair-viz.github.io/gallery/multiple_interactions.html#gallery-multiple-interactions

In [30]:
chart_name = alt.Chart(cpus).transform_window(
    sort=[{'field': 'param_2'}],
    frame=[None, 0],
    perf_sorted='rank(*)'
).mark_point(size=100, filled=True).encode(
    alt.Y('Name').sort(field='perf_sorted', order='descending'),#.sort(field='GB6 Single', order='descending'),
    tooltip='Name',
    #color='Win11',
    color='Age',
    shape='Designer'
).add_params(index_param)

chart_nsingle = chart_name.encode(
    x='GB6 Single',
).properties(
    title='Single-Core Performance'
)
chart_nsingle