## Interactive Visualization with Altair

**Some resources on interactive visualizations:**
- [Altair's user guide on interactive charts](https://altair-viz.github.io/user_guide/interactions.html)
- [Marian Dörk's guide on interaction techniques](https://deepnote.com/@uclab_potsdam/4-Interaction-techniques-7c4bfb10-b7a9-48dc-bd94-a87a46421a06)
- [Some more complex examples of interaction](https://matthewkudija.com/blog/2018/06/22/altair-interactive/)

**An important first step:** The default version of Altair in Colab is _not_ the most recent version, and some of the nice interactive features used below need a newer release, so we upgrade it first.

In [None]:
!pip install -U altair vega_datasets
!pip install -U pyarrow

Collecting altair
  Downloading altair-5.2.0-py3-none-any.whl (996 kB)
Installing collected packages: altair
  Attempting uninstall: altair
    Found existing installation: altair 4.2.2
    Uninstalling altair-4.2.2:
      Successfully uninstalled altair-4.2.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
Successfully installed altair-5.2.0
Collecting pyarrow
  Downloading pyarrow-14.0.1-cp310-cp310-manylinux_2_28_x86_64.whl (38.0 MB)

In [None]:
import pandas as pd
import altair as alt
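
Before going further, it can help to confirm that the upgrade actually took effect. The snippet below is a minimal sketch using only the standard library; the helper names `major_version` and `needs_upgrade` are hypothetical, and the threshold of major version 5 is my assumption, based on `alt.selection_point` (used later in this notebook) being part of the Altair 5 API.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '5.2.0'."""
    return int(version_string.split(".")[0])


def needs_upgrade(package="altair", minimum_major=5):
    """True if the package is missing or older than the assumed minimum major version."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return True
    return major_version(installed) < minimum_major


print("Upgrade Altair first:", needs_upgrade())
```

In Colab, the runtime may need a restart after `pip install -U` before the new version is picked up by `import altair`.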

## Loading our Emissions Dataset

We will explore Altair's interactive features using emissions data recorded by country, year, and pollutant. I've uploaded the dataset as three CSV files to public links, which we're reading in below:

In [None]:
data = pd.read_csv("https://drive.google.com/uc?id=1veeMj8U7yRNUbz7EBG0p2oPJ91fwcFzJ")

In [None]:
data6 = pd.read_csv("https://drive.google.com/uc?id=1G3Vn_ExHNcF5uoKbUMBNeKw4Koz6Kw5x")


In [None]:
data8 = pd.read_csv("https://drive.google.com/uc?id=10hi-rDtGXpnVP5yOmM7roqyF5yrm5qHX")

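In [None]:
# Drop rows that are missing any of the fields we rely on.
data.dropna(subset=['Country', 'Year', 'Value', 'Unit'], inplace=True)

# Columns we keep from each of the three files.
common_columns = ['Country', 'Year', 'Value', 'Pollutant']

# Outer-merge on every kept column, so the result is the union of the rows in the three files.
merged_df = pd.merge(data[common_columns], data6[common_columns], on=common_columns, how='outer')
merged_df = pd.merge(merged_df, data8[common_columns], on=common_columns, how='outer')

# Save the combined table for reuse.
merged_df.to_csv('merged_file.csv', index=False)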
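
Because the merge keys are *all* of the selected columns, the outer merge adds no new columns; it simply collects the distinct rows from each file. The sketch below illustrates this on two tiny made-up frames (the values are toy data, not taken from the real files) and compares the result with `pd.concat` followed by `drop_duplicates`, which is one alternative way to express the same idea when each input has no repeated rows. The helper `as_row_set` is hypothetical.

```python
import pandas as pd

cols = ["Country", "Year", "Value", "Pollutant"]

# Toy inputs: the Canada row appears in both frames.
a = pd.DataFrame({
    "Country": ["Japan", "Canada"],
    "Year": [1990, 1990],
    "Value": [1.0, 2.0],
    "Pollutant": ["Greenhouse gases", "Greenhouse gases"],
})
b = pd.DataFrame({
    "Country": ["Canada", "Japan"],
    "Year": [1990, 1990],
    "Value": [2.0, 0.5],
    "Pollutant": ["Greenhouse gases", "Perfluorocarbons"],
})

# Outer merge on every column: rows shared by both frames appear once.
merged = pd.merge(a[cols], b[cols], on=cols, how="outer")

# Stacking the frames and dropping exact duplicates gives the same rows.
stacked = pd.concat([a[cols], b[cols]], ignore_index=True).drop_duplicates()


def as_row_set(df):
    """Represent a frame as a set of plain tuples so row order does not matter."""
    return set(df.itertuples(index=False, name=None))


print(len(merged), as_row_set(merged) == as_row_set(stacked))
```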

In [None]:
column_name = 'Value'
year_column = 'Year'

def get_top_15_for_year(data, year):
    """Return the 15 rows with the largest values for the given year."""
    return data[data[year_column] == year].nlargest(15, column_name)

top_15_1990 = get_top_15_for_year(data, 1990)

print(top_15_1990[['Country', year_column, column_name]])


                                              Country  Year        Value
1168                                     OECD - Total  1990  15428681.00
1431                                     OECD America  1990   7702241.00
925                                     United States  1990   6487330.95
1200                                    OECD - Europe  1990   5604571.00
1808  European Union – 27 countries (from 01/02/2020)  1990   4860553.45
1040                                           Russia  1990   3166579.05
1399                                OECD Asia Oceania  1990   2121869.00
448                                             Japan  1990   1269334.03
256                                           Germany  1990   1251224.78
1495                                          Ukraine  1990    942800.47
893                                    United Kingdom  1990    806301.84
96                                             Canada  1990    588602.82
1232                                           Braz
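
The listing above mixes individual countries with aggregate regions such as `OECD - Total` and the European Union. If only individual countries are wanted, one option is to filter the aggregates out before ranking. This is a minimal sketch: the helper `top_n_countries` and the `AGGREGATE_PREFIXES` tuple are hypothetical, the prefix list is an assumption based only on the names visible in the output, and the sample rows reuse four values printed above.

```python
import pandas as pd

# Assumption: aggregate regions can be recognised by these name prefixes.
AGGREGATE_PREFIXES = ("OECD", "European Union")


def top_n_countries(df, year, n=15, prefixes=AGGREGATE_PREFIXES):
    """Rank rows for one year by Value, skipping rows whose Country is an aggregate."""
    is_aggregate = df["Country"].map(lambda name: name.startswith(prefixes))
    in_year = df["Year"] == year
    return df[in_year & ~is_aggregate].nlargest(n, "Value")


# Four rows copied from the printed 1990 output.
sample = pd.DataFrame({
    "Country": ["OECD - Total", "United States", "Japan", "Germany"],
    "Year": [1990, 1990, 1990, 1990],
    "Value": [15428681.00, 6487330.95, 1269334.03, 1251224.78],
})

print(top_n_countries(sample, 1990, n=2)[["Country", "Value"]])
```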

In [None]:
selected_countries = ['United States', 'European Union – 27 countries (from 01/02/2020)','Russia','Australia','Japan','Germany','Ukraine','United Kingdom','Canada','Brazil','France','Italy']
data_line = merged_df[(merged_df['Country'].isin(selected_countries)) & (merged_df['Year'] <= 2020)][['Country', 'Year', 'Value','Pollutant']].copy()
data_line['Year'] = data_line['Year'].astype(int)

In [None]:
# Dropdown bound to the Country field. The string 'None' matches no country,
# so choosing it fades every point.
input_dropdown = alt.binding_select(
    options=[
        'None',
        'United States',
        'European Union – 27 countries (from 01/02/2020)',
        'Russia',
        'Japan',
        'Germany',
        'Ukraine',
        'United Kingdom',
        'Canada',
        'Brazil',
        'France',
        'Italy',
        'Australia',
    ],
    name='Source'
)
# Drag horizontally to brush a range of years.
brush = alt.selection_interval(encodings=['x'])
selection = alt.selection_point(fields=['Country'], bind=input_dropdown)
color_scale = alt.Scale(scheme='dark2')
# Points outside the dropdown selection are faded.
opacity_rule = alt.condition(
    selection,
    alt.value(0.9),
    alt.value(0.05)
)

def emissions_chart(pollutant, label):
    """Scatter plot of one pollutant that shares the brush and the dropdown."""
    return alt.Chart(data_line[data_line['Pollutant'] == pollutant]).mark_circle().encode(
        x=alt.X("Year:O", title="Year (1990 - 2014)"),
        y=alt.Y("Value:Q", title=f'{label} Emissions (tonnes)'),
        # Color by country only where both the dropdown and the brush match.
        color=alt.condition(selection & brush, 'Country:N', alt.value('gray'), scale=color_scale),
        opacity=opacity_rule,
        tooltip='Value'
    ).add_params(  # add_selection is deprecated in Altair 5
        brush, selection
    ).properties(width=350, title=f'{label} Emissions')

base = emissions_chart('Greenhouse gases', 'Greenhouse Gas')
base1 = emissions_chart('Perfluorocarbons', 'Perfluorocarbons')
base2 = emissions_chart('Hydrofluorocarbons', 'Hydrofluorocarbons')

(base | base1 | base2).resolve_scale(y='independent')
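
To make the interaction logic in the charts above easier to reason about, here is a plain-Python model of how the two conditions combine for a single point. It is only a sketch of the behaviour, not how Altair or Vega-Lite implement it; the function names are hypothetical, and it assumes the default in which an empty selection (nothing chosen yet) counts as selecting everything.

```python
def matches_dropdown(row, chosen_country):
    """Point selection on Country: None means nothing has been chosen yet."""
    return chosen_country is None or row["Country"] == chosen_country


def matches_brush(row, brushed_years):
    """Interval selection on the x axis: None means nothing has been brushed yet."""
    return brushed_years is None or row["Year"] in brushed_years


def style(row, chosen_country=None, brushed_years=None):
    """Mirror the color and opacity conditions used in the charts."""
    by_country = matches_dropdown(row, chosen_country)
    # Color needs both selections to match (selection & brush).
    colored = by_country and matches_brush(row, brushed_years)
    return {
        "color": row["Country"] if colored else "gray",
        # Opacity depends on the dropdown alone.
        "opacity": 0.9 if by_country else 0.05,
    }


point = {"Country": "Japan", "Year": 1990}
print(style(point))
print(style(point, chosen_country="Japan", brushed_years={1991, 1992}))
print(style(point, chosen_country="Canada"))
```

Read this way, a point for the chosen country that lies outside the brushed years stays fully opaque but turns gray, while points for other countries are both gray and faded.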




In [None]:
selected_countries = ['United States', 'European Union – 27 countries (from 01/02/2020)','Russia','Australia','Japan','Germany','Ukraine','United Kingdom','Canada','Brazil','France','Italy','China','South Africa','Greece','Mexico']
data_line = merged_df[(merged_df['Country'].isin(selected_countries)) & (merged_df['Year'] <= 2020)][['Country', 'Year', 'Value','Pollutant']].copy()
data_line['Year'] = data_line['Year'].astype(int)

In [None]:

color_scheme = 'viridis'
alt.Chart(data_line).mark_rect().encode(
    # Year holds integers, so encode it as ordinal; a temporal type would read 1990 as a timestamp.
    x=alt.X('Year:O', title="Year (1990 - 2014)"),
    # Value is treated as nominal here, so each distinct value gets its own row.
    y=alt.Y('Value:N', title="Emissions"),
    color=alt.Color('Country:N', scale=alt.Scale(scheme=color_scheme)),
    tooltip=[
        alt.Tooltip('Year:O', title="Year"),
        alt.Tooltip('Value', title='Emissions (Tonnes of CO2 equivalent)'),
        alt.Tooltip('Country', title="Country"),
    ]
).properties(
    width=700,
    height=1200
)
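
One thing to watch with a `Year` column of integers: as far as I understand Vega-Lite's handling of temporal (`:T`) fields, a bare number is read as a timestamp in milliseconds since the Unix epoch, not as a calendar year. The sketch below uses only the standard library to show what the number 1990 means under that reading, and one way to turn a year into a date string that a temporal encoding can parse. Both helper names are hypothetical.

```python
from datetime import datetime, timezone


def as_epoch_milliseconds(value):
    """Interpret a number the way a millisecond timestamp would be read."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


def year_to_date_string(year):
    """Turn an integer year into an ISO date string for 1 January of that year."""
    return f"{int(year)}-01-01"


# 1990 milliseconds after the epoch is still in 1970.
print(as_epoch_milliseconds(1990).year)
print(year_to_date_string(1990))
```

For a yearly axis, an ordinal encoding (`Year:O`) or a column of date strings built this way avoids the problem.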
