# Compressed Vector Downsampler Example

If you haven't already, please check the example for Compressed Vector first.

Let's say you want to plot a vector with 100,000 points.

In [1]:
import pandas as pd
import altair as alt
import numpy as np

alt.data_transformers.disable_max_rows() # disable max rows limit for altair

# create sin wave data
x = pd.Series(range(100000))
y = pd.Series(np.sin(x / 1000.0))

import time

start = time.time()
# create a dataframe
df = pd.DataFrame({
    'x': x,
    'y': y
})
# plot it
chart = alt.Chart(df).mark_line().encode(
    x='x',
    y='y'
).properties(
    title='Original Data'
).interactive()
end = time.time()
print(f"Time taken to plot original data: {end - start:.2f} seconds")
# display the chart
chart

Time taken to plot original data: 0.01 seconds


Nice, we got our plot. But there are so many points on it that it is slow to interact with. Do we really need that many points?
Well, we can downsample our data...

In [2]:
import tsdownsample

# downsample the data
indices = tsdownsample.MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)
# create a new dataframe with the downsampled data
start = time.time()
df_downsampled = pd.DataFrame({
    'x': x[indices],
    'y': y[indices]
})
# plot the downsampled data
chart_downsampled = alt.Chart(df_downsampled).mark_line().encode(
    x='x',
    y='y'
).properties(
    title='Downsampled Data'
).interactive()
end = time.time()
print(f"Time taken to plot downsampled data: {end - start:.2f} seconds")
# display the chart
chart_downsampled

Time taken to plot downsampled data: 0.03 seconds
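
To make explicit what a downsampler hands back, here is a minimal sketch of the simplest possible strategy: pick evenly spaced positions and use them to index both vectors. This is not the MinMaxLTTB algorithm used above; `evenly_spaced_indices` is a hypothetical helper written only for this illustration, and it assumes plain NumPy arrays rather than pandas Series.

```python
import numpy as np


def evenly_spaced_indices(n_points: int, n_out: int) -> np.ndarray:
    """Return n_out evenly spaced positions in range(n_points).

    Hypothetical helper for illustration; not part of tsdownsample.
    """
    return np.linspace(0, n_points - 1, num=n_out).round().astype(np.int64)


# Same sine wave as in the example above, as NumPy arrays.
x = np.arange(100_000)
y = np.sin(x / 1000.0)

# The downsampler output is just an array of positions ...
idx = evenly_spaced_indices(len(x), 1000)

# ... which selects the points that are kept for plotting.
x_small, y_small = x[idx], y[idx]
print(len(x_small), len(y_small))
```

Shape-aware methods such as MinMaxLTTB choose the positions differently so that peaks and valleys survive, but the result is used in the same way.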


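Cool! We got basically the same plot, and it is much faster to interact with. Note that the timings above only cover building the chart object, not rendering it, so they do not show that difference. However, we used extra space.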

In [3]:
print(f"Original data size: {x.nbytes + y.nbytes} bytes")
print(f"Indices size: {indices.nbytes} bytes")
print(f"Downsampled data size: {df_downsampled.x.nbytes + df_downsampled.y.nbytes} bytes")

Original data size: 1600000 bytes
Indices size: 8000 bytes
Downsampled data size: 16000 bytes
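
The figures above follow directly from the element counts. The sketch below reproduces them under the assumption that `x`, `y`, and the indices all use 8-byte elements (`int64`, `float64`, and `uint64` respectively), which is consistent with the printed totals.

```python
import numpy as np

N_POINTS = 100_000  # length of the original vectors
N_OUT = 1_000       # number of points kept after downsampling

x = np.arange(N_POINTS, dtype=np.int64)
y = np.sin(x / 1000.0)  # float64 by default

# Two vectors of 100,000 elements, 8 bytes each.
original_bytes = x.nbytes + y.nbytes

# One position per kept point; an 8-byte index dtype is assumed here.
indices_bytes = N_OUT * np.dtype(np.uint64).itemsize

# Two vectors of 1,000 elements, 8 bytes each.
downsampled_bytes = N_OUT * (x.itemsize + y.itemsize)

print(original_bytes, indices_bytes, downsampled_bytes)
```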


We added space to store the indices. If we keep only the downsampled vectors from now on, we actually use less space than the original. But we can reduce this space even more with `CompressedVectorDownsampler`.

In [6]:
from cv_visualization import CompressedVectorDownsampler as cvd

cvd_downsampler_x, cvd_downsampler_y = cvd().downsample(
    x = x,
    y = y,
    n_out=1000
)
start = time.time()
# create a new dataframe with the downsampled data
df_cvd_downsampled = pd.DataFrame({
    'x': cvd_downsampler_x,
    'y': cvd_downsampler_y
})
# shift the tsdownsample curve up by 0.5 so the two lines do not overlap
df_downsampled_shifted = df_downsampled.copy()
df_downsampled_shifted['y'] += 0.5
# plot the downsampled data along with the tsdownsampled data
df_comparison = pd.concat([
    df_downsampled_shifted.assign(type='tsdownsampled'),
    df_cvd_downsampled.assign(type='cvd_downsampled')
])

# plot the comparison
chart_comparison = alt.Chart(df_comparison).mark_line().encode(
    x='x',
    y='y',
    color='type'
).properties(
    title='Comparison of tsdownsample and Compressed Vector Downsampler Output'
).interactive()
end = time.time()
print(f"Time taken to plot comparison data: {end - start:.2f} seconds")
# display the chart
chart_comparison

Time taken to plot comparison data: 0.03 seconds


And we can save even more space!

In [8]:
print(f"Original data size: {x.nbytes + y.nbytes} bytes")
print(f"Indices size: {indices.nbytes} bytes")
print(f"Downsampled data size: {x[indices].nbytes + y[indices].nbytes} bytes")
print(f"Compressed Vector Downsampler data size: {cvd_downsampler_x.size_in_bytes() + cvd_downsampler_y.size_in_bytes()} bytes")

Original data size: 1600000 bytes
Indices size: 8000 bytes
Downsampled data size: 16000 bytes
Compressed Vector Downsampler data size: 6718 bytes
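
To put these numbers in perspective, the short calculation below turns the sizes printed above into ratios. The byte counts are copied from that output; nothing here calls the library.

```python
# Sizes in bytes, copied from the output above.
original_bytes = 1_600_000
downsampled_bytes = 16_000
cvd_bytes = 6_718

# How many times smaller the compressed result is.
ratio_vs_downsampled = downsampled_bytes / cvd_bytes
ratio_vs_original = original_bytes / cvd_bytes

print(f"vs. downsampled vectors: {ratio_vs_downsampled:.1f}x smaller")
print(f"vs. original vectors:    {ratio_vs_original:.1f}x smaller")
```

In this run, the compressed vectors take roughly 2.4 times less space than the plain downsampled vectors. The exact figure will depend on the data and on the compression method chosen.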


You can choose a different downsampling method or compression method. You can list all the available ones with:

In [11]:
from cv_visualization import list_available_compression_methods, list_available_downsamplers
print("Available compression methods:")
for method in list_available_compression_methods():
    print(f"- {method}")
print("Available downsamplers:")
for downsampler in list_available_downsamplers():
    print(f"- {downsampler}")

# let's plot one last chart with compression method vlc_vector_elias_gamma and downsampler EveryNthDownsampler
cvd_downsampler_x, cvd_downsampler_y = cvd().downsample(
    x=x,
    y=y,
    n_out=1000,
    compress_method='vlc_vector_elias_gamma',
    method='EveryNthDownsampler'
)
start = time.time()
# create a new dataframe with the downsampled data
df_cvd_downsampled = pd.DataFrame({
    'x': cvd_downsampler_x,
    'y': cvd_downsampler_y
})

# plot the downsampled data
chart_cvd_downsampled = alt.Chart(df_cvd_downsampled).mark_line().encode(
    x='x',
    y='y'
).properties(
    title='Compressed Vector Downsampled Data with VLC Vector Elias Gamma'
).interactive()
end = time.time()
print(f"Time taken to plot Compressed Vector Downsampled data: {end - start:.2f} seconds")
# display the chart
chart_cvd_downsampled


Available compression methods:
- enc_vector_elias_gamma
- enc_vector_fibonacci
- enc_vector_comma_2
- enc_vector_elias_delta
- vlc_vector_elias_delta
- vlc_vector_elias_gamma
- vlc_vector_fibonacci
- vlc_vector_comma_2
- dac_vector
- No Compression
Available downsamplers:
- MinMaxLTTBDownsampler
- M4Downsampler
- LTTBDownsampler
- MinMaxDownsampler
- EveryNthDownsampler
- NaNM4Downsampler
- NaNMinMaxDownsampler
- NaNMinMaxLTTBDownsampler
Time taken to plot Compressed Vector Downsampled data: 0.05 seconds
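
Several of the compression methods listed above are named after Elias gamma coding. As a rough intuition for why such codes save space on small integers, here is a minimal, self-contained sketch of the code itself. It is an illustration only and makes no claim about how `cv_visualization` stores data internally; `elias_gamma_encode` and `elias_gamma_decode` are hypothetical helper names.

```python
def elias_gamma_encode(n: int) -> str:
    """Return the Elias gamma code of a positive integer as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]  # binary digits without the '0b' prefix
    # Prefix of (length - 1) zeros, followed by the binary digits.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str) -> list:
    """Decode a concatenation of Elias gamma codes back into integers."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the leading zeros of this code word
            zeros += 1
            i += 1
        # The next (zeros + 1) bits are the binary digits of the value.
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


values = [1, 2, 5, 9, 100]
encoded = "".join(elias_gamma_encode(v) for v in values)
decoded = elias_gamma_decode(encoded)

print(f"{len(encoded)} bits encoded vs. {64 * len(values)} bits as 64-bit integers")
```

Small values get short code words, so a vector dominated by small integers needs far fewer bits than a fixed 64 bits per element.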
