## Visualization Builder for the Bachelor Research project "Gamepad Controls for Visualizations"

### Introduction
This notebook contains all the (interactive) visualizations used in the project. The library used is **Vega-Altair**. The notebook is intended to provide live interaction for testing and tweaking purposes. Note that the visualizations can always be exported as an HTML file, a JSON spec, or even a still image if they are needed in other contexts. 
Because some visualizations are too memory-intensive for Jupyter, they are rendered in the browser. That's why some cells will open a new browser tab. Additional configurations / versions of plots can be found inside the [outputs](./outputs/) directory. 

**Note**: Sometimes the Jupyter rendering engine crashes when rendering more complex visualizations like the parallel coordinates plot. If this happens, simply export the visualization as an HTML file and view it in the browser. Alternatively, close and reopen the notebook a few times.
For more information on how to view and/or export the generated visualizations, see the [Vega-Altair User Guide](https://altair-viz.github.io/user_guide/saving_charts.html). 



In [2]:
# imports
import altair as alt
data_url = "https://raw.githubusercontent.com/nickprbs/Forschungsprojekt/refs/heads/main/yearly_avg_downsampled.csv"

# Proportions used for the prototype
height = 615
width = 900
height_small = 307
width_small = 450

### Single-Chart Pipeline ###
Each chart created with Altair needs to be constructed through a pipeline. Although some steps are **optional**, their *order* **is not!** 
Here is the order that should be followed:
1. **Data**: Set the source of the data (e.g. as Dataframe, URL, etc.)
2. **Transformation**: Perform transformations on the data (e.g. calculate new columns, filter columns, aggregate data points...) 
3. **Visualization**: Choose the kind of visualization (e.g. Bar Chart)
4. **Encodings**: Encode meanings of the data (e.g. axes, colors, tooltips etc.)
5. **Properties**: Additional Properties of the chart (e.g. Width, Height, Title, etc.)

More information can be found in the [Vega-Altair User Guide](https://altair-viz.github.io/user_guide/data.html)

---
## 1. Bar Chart

In [3]:
# Basic bar chart (the line is added in a later cell)
bar = alt.Chart(data_url).mark_bar().encode(
    x=alt.X('time:T', title='Year'),
    y=alt.Y('tas:Q', aggregate='mean', title='Average Surface Temperature (K)')
).properties(
    width=600,
    height=400
)

bar

This is very basic but works well. I want to improve readability by **clustering** the years into groups of five and by **rescaling** the y-axis to show the actual trends more clearly. In this configuration, the data is loaded from a URL that serves it in CSV format. Therefore, none of the calculations are performed with the help of *pandas*. This could become an issue, but for now they are done manually. 

**scaled**:

In [4]:
bar_scaled = alt.Chart(data_url).mark_bar().encode(
    x=alt.X('time:T', title='Year'),
    y=alt.Y('tas:Q', aggregate='mean', title='Average Surface Temperature (K)', scale=alt.Scale(domain=[250, 300]))
).properties(
    width=600,
    height=400
)
bar_scaled

**scaled + clustered**: 

In [5]:
bar_scaled_clustered = alt.Chart(data_url).transform_calculate(
    year_group = "floor(year(datum.time)/5)*5"
).mark_bar().encode(
    x=alt.X('year_group:O', title='Year (Interval of 5)', axis=alt.Axis(labelAngle=0)),
    y=alt.Y('tas:Q', aggregate='mean', title='Average Surface Temperature (K)', scale=alt.Scale(domain=[250, 300]))
).properties(
    width=600,
    height=400
)
bar_scaled_clustered
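
To make the `year_group` expression more tangible, here is a plain-Python sketch of the same idea: each year is rounded down to the start of its five-year interval, and the mean is taken per interval. The sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, tas) samples standing in for rows of the CSV
samples = [(2015, 287.0), (2017, 287.4), (2019, 287.6), (2020, 288.0), (2024, 288.4)]


def year_group(year: int) -> int:
    """Mirror floor(year / 5) * 5 from the calculate expression."""
    return (year // 5) * 5


# Collect all temperatures that fall into the same five-year interval
groups = defaultdict(list)
for year, tas in samples:
    groups[year_group(year)].append(tas)

# One mean per interval, keyed by the first year of the interval
group_means = {start: mean(values) for start, values in sorted(groups.items())}
```

With these samples, 2015 to 2019 end up in the group `2015` and 2020 to 2024 in the group `2020`.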

Now, I want to add a line that shows the mean for each year. Because this reintroduces more data points (the average of every year instead of the average of every five years), the x-axis needs to be handled differently: the x-axis of the bar chart is **binned** into five-year groups, while the line is plotted per year. 

In [6]:
line = alt.Chart(data_url).transform_calculate(
    years = "year(datum.time)"
).mark_line(color='red', strokeWidth=3).encode(
    x=alt.X('years:Q', axis=None, scale=alt.Scale(domain=[2015, 2099])),
    y=alt.Y('tas:Q', aggregate='mean', scale=alt.Scale(domain=[250, 300]))
).properties(
    width=600,
    height=400
)


combined = (bar_scaled_clustered + line).resolve_axis(x='shared', y='shared')
combined

The structure of this graph is pretty good for now. However, it is only **static**; we need it to be **interactive**. A challenge from a code perspective is that, although Vega-Altair allows for many different and customizable interactions, the number of interactions per graph is limited (to my current knowledge). This means that we may need to create several different interactive graphs, so the code needs to be reusable. For now, some basic interactions are tested, and *then* the graphs are linked together in the last step.

### Function based chart creation

In [7]:
# introduce function to create the same chart but with support for different selection parameters
def create_bar_line_chart(selection_type: alt.Parameter) -> alt.Chart:
    # bar chart
    bar_scaled_clustered = alt.Chart(data_url).transform_calculate(
        year_group = "floor(year(datum.time)/5)*5"
    ).mark_bar().encode(
        x=alt.X('year_group:O', title='Year (Interval of 5)', axis=alt.Axis(labelAngle=0)),
        y=alt.Y('tas:Q', aggregate='mean', title='Average Surface Temperature (K)', scale=alt.Scale(domain=[250, 300])),
        # e.g. parameterize color of graph elements based on selection
        color=alt.when(selection_type).then(alt.value("#4c78a8")).otherwise(alt.value("lightgray"))
    ).properties(
        width=600,
        height=400
    ).add_params(
        selection_type
    )
    
    # line chart (without interaction)
    line = alt.Chart(data_url).transform_calculate(
        years = "year(datum.time)"
    ).mark_line(color='red', strokeWidth=3).encode(
        x=alt.X('years:Q', axis=None, scale=alt.Scale(domain=[2015, 2099])),
        y=alt.Y('tas:Q', aggregate='mean', scale=alt.Scale(domain=[250, 300]))
    ).properties(
        width=600,
        height=400
    )
    
    # combine both charts into one
    return (bar_scaled_clustered + line).resolve_axis(x='shared', y='shared')

With this function, different selection types from Altair can be tried by simply changing the value of the ```selection_type``` argument passed to the ```create_bar_line_chart``` function: 

In [8]:
sel_interval = alt.selection_interval(encodings=['x'], empty=False)
sel_point = alt.selection_point(encodings=['x'], empty=False)

interactive_barchart = create_bar_line_chart(sel_interval)
interactive_barchart

---
## 2. Parallel Coordinates

This type of plot is considerably more complex than the bar chart because the data has to be normalized. The calculations for this need to be done through Altair. 

In [9]:
base = alt.Chart(data_url).transform_window( 
     index="count()"  # create an index for each data row
).transform_calculate(
    year_num="toNumber(year(datum.time))",
    lat_num="toNumber(datum.lat)",
    lon_num="toNumber(datum.lon)"
).transform_joinaggregate(
    min_year="min(year_num)", max_year="max(year_num)",
    min_lat="min(lat_num)", max_lat="max(lat_num)",
    min_lon="min(lon_num)", max_lon="max(lon_num)",
    min_tas="min(tas)", max_tas="max(tas)"
)

table= base.transform_fold(
    ['min_year', 'max_year', 'min_lat', 'max_lat', 'min_lon', 'max_lon', 'min_tas', 'max_tas'],
    as_=['Variable', 'Value']
).mark_text(
    align='left',
    dx=5
).encode(
    x=alt.value(10), 
    y=alt.Y('Variable:N', axis=alt.Axis(title='Variable')),
    text=alt.Text('Value:Q', format='.2f')
).properties(
    height=200,
    width=200
)

# Uncomment the next line if you want to display the min and max values
#table

Now, all of the values need to be normalized so that they can be plotted in the same y-domain (in this case \[0,1\]):
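
The normalization used here is a min-max scaling, `(value - min) / (max - min)`. A minimal plain-Python sketch with made-up values shows the effect; the guard for a zero range is an extra assumption that the Altair expressions in this notebook do not include.

```python
def min_max_normalize(values):
    """Scale values linearly so that min maps to 0.0 and max maps to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: map a constant column to 0.0 instead of dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample values for two attributes with very different ranges
lat = [-90.0, -45.0, 0.0, 45.0, 90.0]
tas = [240.0, 260.0, 300.0, 320.0]

lat_norm = min_max_normalize(lat)
tas_norm = min_max_normalize(tas)
```

After scaling, both attributes share the range \[0,1\] and can be drawn on the same axis.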

In [10]:
normalized_year = base.transform_calculate(
    year_norm="(datum.year_num - datum.min_year) / (datum.max_year - datum.min_year)"
).mark_point().encode(
    x=alt.X('year_num:Q', scale=alt.Scale(domain=[2015, 2100])),
    y=alt.Y('year_norm:Q', scale=alt.Scale(domain=[0,1.1]))
).properties(
    width=400,
    height=266
)

normalized_lat = base.transform_calculate(
    lat_norm="(datum.lat_num - datum.min_lat) / (datum.max_lat - datum.min_lat)"
).mark_point().encode(
    x=alt.X('lat_num:Q', scale=alt.Scale(domain=[-90, 90])),
    y=alt.Y('lat_norm:Q', scale=alt.Scale(domain=[0,1.1]))
).properties(
    width=400,
    height=266
)

normalized_lon = base.transform_calculate(
    lon_norm="(datum.lon_num - datum.min_lon) / (datum.max_lon - datum.min_lon)"
).mark_point().encode(
    x=alt.X('lon_num:Q', scale=alt.Scale(domain=[0, 365])),
    y=alt.Y('lon_norm:Q', scale=alt.Scale(domain=[0,1.1]))
).properties(
    width=400,
    height=266
)

normalized_tas = base.transform_calculate(
    tas_norm="(datum.tas - datum.min_tas) / (datum.max_tas - datum.min_tas)"
).mark_point().encode(
    x=alt.X('tas:Q', scale=alt.Scale(domain=[240, 320])),
    y=alt.Y('tas_norm:Q', scale=alt.Scale(domain=[0,1.1]))
).properties(
    width=400,
    height=266
)

# Normalization 
normalized_data = base.transform_calculate(
     year_norm="(datum.year_num - datum.min_year) / (datum.max_year - datum.min_year)",
     lat_norm="(datum.lat_num - datum.min_lat) / (datum.max_lat - datum.min_lat)",
     lon_norm="(datum.lon_num - datum.min_lon) / (datum.max_lon - datum.min_lon)",
     tas_norm="(datum.tas - datum.min_tas) / (datum.max_tas - datum.min_tas)"
)

chart_row1 = alt.hconcat(normalized_year, normalized_tas)
chart_row2 = alt.hconcat(normalized_lat, normalized_lon)
normalized_values_check = alt.vconcat(chart_row1, chart_row2)

# Uncomment the next line only if you want to check the normalized values
#normalized_values_check

If each of the above graphs lies within the domain \[0,1\], the attributes can be drawn together as a parallel coordinates plot. First, the general normalization is applied, and then the fold transform:

In [11]:
# Now fold the data rows into "long" format (one row per attribute and data point)
parallel_coordinates = normalized_data.transform_sample(2000
).transform_fold(
     ['lat_norm', 'tas_norm', 'lon_norm'] # fold the data
).mark_line(opacity=0.5).encode(
     x=alt.X('key:N', title='Attribute', sort=['lat_norm', 'tas_norm', 'lon_norm']),
     y=alt.Y('value:Q', title='Normalized Value'),
     detail='index:O', # Connects all data values along a specific row
     color=alt.Color('year_num:Q', scale=alt.Scale(scheme='viridis'), title='Year')
).properties(
     width=800,
     height=500
)

parallel_coordinates
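
What the fold transform does can be sketched in plain Python: every input row is turned into one output row per folded field, with the field name stored under `key` and its content under `value`, while the remaining columns are kept. The helper `fold` and the sample rows are hypothetical and only mimic the behaviour for illustration.

```python
def fold(rows, fields, key="key", value="value"):
    """Turn each row into len(fields) rows, one per folded field."""
    folded = []
    for row in rows:
        for field in fields:
            new_row = dict(row)        # keep the original columns (e.g. index)
            new_row[key] = field       # name of the folded attribute
            new_row[value] = row[field]  # its value in this row
            folded.append(new_row)
    return folded


# Hypothetical normalized rows
rows = [
    {"index": 1, "lat_norm": 0.2, "tas_norm": 0.7, "lon_norm": 0.5},
    {"index": 2, "lat_norm": 0.9, "tas_norm": 0.1, "lon_norm": 0.3},
]

long_rows = fold(rows, ["lat_norm", "tas_norm", "lon_norm"])

# Each polyline of the plot consists of the rows that share one index
line_1 = [(r["key"], r["value"]) for r in long_rows if r["index"] == 1]
```

Grouping by `index` (the `detail` encoding above) is what connects the three points of one data row into a single line.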

This is far too much data to handle (that's why it had to be downsampled to 2000 data points, which is still quite a lot). The user should be able to control the time range:  

In [12]:
year_slider = alt.binding_range(min=2015, max=2099, step=1, name='Year')
year_select = alt.selection_point(
    name="Select Year",
    fields=["year_num"],
    bind=year_slider,
    value=2015
)

interactive_pcp = normalized_data.add_params(
    year_select     # add interactive slider to chart
).transform_filter(
    year_select    # select only the data points with the selected year
).transform_sample(
    500  # each year contains ~5400 data points --> still way too large 
).transform_fold(
     ['lat_norm', 'tas_norm', 'lon_norm'] # fold the data
).mark_line(opacity=0.5).encode(
     x=alt.X('key:N', title='Attribute', sort=['lat_norm', 'tas_norm', 'lon_norm']),
     y=alt.Y('value:Q', title='Normalized Value'),
     detail='index:O', # Connects all data values along a specific row
     color=alt.Color('tas:Q', scale=alt.Scale(scheme='viridis'), title='TAS')
).properties(
     width=1000,
     height=600
)
# this one is still too big for Jupyter but works fine in the browser
interactive_pcp.save('outputs/pcp.json')

# You need to copy the pcp JSON into the "spec" constant inside the pcp HTML file (it contains an external slider, which is why we can't just use the .save(html) method)


# Show the file in the browser 
from pathlib import Path
from webbrowser import open_new_tab
filepath= Path('./outputs/pcp.html').resolve().as_uri()
open_new_tab(filepath)

True
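
The manual copy step could be automated along the following lines. This is only a sketch under assumptions: the template below is hypothetical and much simpler than the project's `pcp.html` (it has no external slider), `__SPEC__` is an invented placeholder token, and the Vega library versions in the script tags may need to match the installed Altair version.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical template; the real pcp.html of this project may look different.
HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    const spec = __SPEC__;
    vegaEmbed("#vis", spec);
  </script>
</body>
</html>
"""


def inject_spec(spec_path: Path, html_path: Path) -> None:
    """Read a JSON spec and write an HTML page with the spec embedded."""
    spec = json.loads(spec_path.read_text(encoding="utf-8"))
    html = HTML_TEMPLATE.replace("__SPEC__", json.dumps(spec))
    html_path.write_text(html, encoding="utf-8")


# Demo with a tiny stand-in spec written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    spec_file = Path(tmp) / "pcp.json"
    html_file = Path(tmp) / "pcp.html"
    spec_file.write_text(json.dumps({"mark": "line"}), encoding="utf-8")
    inject_spec(spec_file, html_file)
    page = html_file.read_text(encoding="utf-8")
```

Parsing the file with `json.loads` before embedding it also acts as a quick check that the saved spec is valid JSON.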

---
## 3. Scatterplot / Scatterplot Matrix

In [13]:
# Single Scatterplot with year slider

# Add interactive Element
year_slider = alt.binding_range(min=2015, max=2099, step=1, name='Year')
year_select = alt.selection_point(
    name="Select Year",
    fields=["year"],
    bind=year_slider,
    value=2015
)

scatterplot = alt.Chart(data=data_url).transform_calculate(
    year = "toNumber(year(datum.time))"
).add_params(
    year_select    
).transform_filter(
    year_select
).mark_circle().encode(
    x=alt.X("lon:Q", scale=alt.Scale(domainMax=360)),
    y=alt.Y("lat:Q"),
    color=alt.Color('tas:Q', scale=alt.Scale(scheme='viridis', domain=[240, 315]), title='TAS')
).properties(
    width=600,
    height=400
)#.interactive()

scatterplot

This gives a nice overview of the trends across the whole world at once. A scatterplot matrix across all attributes could be interesting as well.

In [14]:
# Scatterplot Matrix

# Add interactive Element
year_slider = alt.binding_range(min=2015, max=2099, step=1, name='Year')
year_select = alt.selection_point(
    name="Select Year",
    fields=["year"],
    bind=year_slider,
    value=2015
)

scatter_matrix = alt.Chart(data=data_url).transform_calculate(
    year = "toNumber(year(datum.time))"
).add_params(year_select).transform_filter(
    year_select
).mark_circle().encode(
    x=alt.X(alt.repeat("column"), type="quantitative"),
    y=alt.Y(alt.repeat("row"), type="quantitative"),
    color=alt.Color('tas:Q', scale=alt.Scale(scheme='viridis'), title='TAS')
).properties(
    width=150,
    height=150
).repeat(
    row=['lat', 'lon', 'tas'],
    column=['tas', 'lon', 'lat']
).interactive()

scatter_matrix
# Export as HTML (seems too big for Jupyter)
scatter_matrix.save('outputs/scatter_matrix.html')
# Open in the browser
from pathlib import Path
from webbrowser import open_new_tab
# Point at the file that was just saved inside the outputs directory
filepath = Path('outputs/scatter_matrix.html').resolve().as_uri()
open_new_tab(filepath)

0:163: execution error: The file "some object" was not found. (-43)
69:77: execution error: Can't get application "chrome". (-1728)


True

--- 
## Combining the Plots
Now that there is a basic version of each of the needed plots, they need to be combined. The most efficient way would probably be to merge them into a single graph while possibly giving them independent parameters. As a first step, the globe visualization that was created with Vega-Lite needs to be imported: 


In [17]:
# import the world json content
import json
with open("outputs/world.json") as file:
    json_content = json.load(file)
    json_content.pop('$schema', None)
world_chart = alt.Chart.from_dict(json_content)

world_chart


Now it is time to join all graphs into one by merging them. To avoid buggy behavior, all the interactive sliders need to be synchronized.

In [None]:
# Only one chart must be specified with the parameter that is shared
bar_scaled = interactive_barchart.properties(width=400, height=350)

# All of these graphs contain a year slider 
world_scaled = world_chart.properties(width=400, height=350)
pcp_scaled = interactive_pcp.properties(width=400, height=350)
scatterplot_scaled = scatterplot.properties(width=400, height=350)

upper_row = world_scaled | bar_scaled
lower_row = pcp_scaled | scatterplot_scaled

combined_chart = upper_row & lower_row

combined_chart




---
## Mockup Section

Here, the finished visualizations from above are reused and modified to produce images for the final mockup prototype.

### Color Scale:

In [None]:
legend = alt.Chart(data=data_url).mark_rect(opacity=0).encode(
    color=alt.Color('tas:Q', scale=alt.Scale(scheme='viridis', domain=[240, 315]), title='TAS')
).properties(width=50, height=300)

legend

### Mockup World

In [None]:
# import the world json content
import json
with open("outputs/world_without_legend.json") as file:
    json_content = json.load(file)
    json_content.pop('$schema', None)
sc_world_chart = alt.Chart.from_dict(json_content)

sc_world_chart


### Mockup Scatterplot

In [None]:
# Single Scatterplot with year slider

sc_scatterplot = alt.Chart(data=data_url).transform_calculate(
    year = "toNumber(year(datum.time))"
).add_params(
    year_select    
).transform_filter(
    year_select
).mark_circle().encode(
    x=alt.X("lon:Q", scale=alt.Scale(domainMax=360)),
    y=alt.Y("lat:Q"),
    color=alt.Color('tas:Q', scale=alt.Scale(scheme='viridis', domain=[240, 315]), legend=None)
).properties(
    width=width_small,
    height=height_small
)#.interactive()

sc_scatterplot

### Mockup PCP
See [sc_pcp.html](./sc_pcp.html)

### Bar Chart

In [None]:
sc_bar_chart = alt.Chart(data_url).transform_calculate(
    year_group = "floor(year(datum.time)/5)*5"
).mark_bar().encode(
    x=alt.X('year_group:O', title='Year (Interval of 5)', axis=alt.Axis(labelAngle=0)),
    y=alt.Y('tas:Q', aggregate='mean', title='Average Surface Temperature (K)', scale=alt.Scale(domain=[250, 300]))
).properties(
    width=width_small,
    height=height_small
)
sc_bar_chart


---