# Vega-Altair

Vega-Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite. It offers a powerful and concise grammar that lets you quickly build a wide range of statistical visualizations. You can install Altair with the terminal command `pip install "altair[all]"`. See the [installation guide](https://altair-viz.github.io/getting_started/installation.html) for more information.

```python
import altair as alt
import pandas as pd
```

```python
# Load the movies example data set
data = pd.read_csv("./data/movies_imdb.csv")
data.head()
```

| Unnamed: 0 | imdbID | title | year | rating | runtime | genre | released | director | writer | cast | ... | imdbRating | imdbVotes | poster | plot | fullplot | language | country | awards | lastupdated | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Carmencita | 1894 | NOT RATED | 1 min | Documentary, Short | | William K.L. Dickson | | Carmencita | ... | 5.9 | 1032.0 | https://m.media-amazon.com/images/M/MV5BMjAzND... | Performing on what looks like a small wooden s... | Performing on what looks like a small wooden s... | | USA | | 2015-08-26 00:03:45.040000000 | movie |
| 1 | 5 | Blacksmith Scene | 1893 | UNRATED | 1 min | Short | 1893-05-09 | William K.L. Dickson | | Charles Kayser, John Ott | ... | 6.2 | 1189.0 | | Three men hammer on an anvil and pass a bottle... | A stationary camera looks at a large anvil wit... | | USA | 1 win. | 2015-08-26 00:03:50.133000000 | movie |
| 2 | 3 | Pauvre Pierrot | 1892 | | 4 min | Animation, Comedy, Short | 1892-10-28 | Émile Reynaud | | | ... | 6.7 | 566.0 | | One night, Arlequin come to see his lover Colo... | One night, Arlequin come to see his lover Colo... | | France | | 2015-08-12 00:06:02.720000000 | movie |
| 3 | 8 | Edison Kinetoscopic Record of a Sneeze | 1894 | | 1 min | Documentary, Short | 1894-01-09 | William K.L. Dickson | | Fred Ott | ... | 5.9 | 988.0 | | A man (Thomas Edison's assistant) takes a pinc... | A man (Edison's assistant) takes a pinch of sn... | | USA | | 2015-08-10 00:21:07.127000000 | movie |
| 4 | 10 | Employees Leaving the Lumière Factory | 1895 | | 1 min | Documentary, Short | 1895-03-22 | Louis Lumière | | | ... | 6.9 | 3469.0 | | A man opens the big gates to the Lumière facto... | A man opens the big gates to the Lumière facto... | | France | | 2015-08-26 00:03:56.603000000 | movie |
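
The `Unnamed: 0` column and the string-typed `year` column both come from how `pd.read_csv` interprets the file. A column named `Unnamed: 0` typically appears when a CSV was written together with its DataFrame index (we assume, but cannot confirm, that this is the case for `movies_imdb.csv`). The sketch below reproduces both effects on a small hypothetical CSV string (the row "Some Film" with year "1995x" is invented) and shows that `index_col=0` turns the leading column back into the index.

```python
import io

import pandas as pd

# Hypothetical miniature CSV mimicking the layout of the movies file:
# an unnamed leading index column and one malformed year value ("1995x").
csv_text = """,imdbID,title,year
0,1,Carmencita,1894
1,5,Blacksmith Scene,1893
2,99,Some Film,1995x
"""

# Default read: the unnamed first column becomes a column called "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())  # ['Unnamed: 0', 'imdbID', 'title', 'year']

# index_col=0 uses that first column as the DataFrame index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())  # ['imdbID', 'title', 'year']

# A single non-numeric entry is enough for pandas to keep the whole
# column as strings rather than integers.
print(pd.api.types.is_numeric_dtype(clean["year"]))  # False
```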


## Filter Data

Filter all movies released between 1950 and 2000 (both years inclusive). To do this, we first create a numeric year column, because the `year` column is stored as strings. We then narrow the dataset down to movies whose country is Japan or Switzerland.

```python
# Convert the 'year' column to numbers; errors='coerce' turns unparseable strings into NaN
data['year_numeric'] = pd.to_numeric(data['year'], errors='coerce')

# Keep movies from 1950 to 2000, both years inclusive (rows with a NaN year are dropped)
movies_subset = data[(data['year_numeric'] >= 1950) & (data['year_numeric'] <= 2000)]
print("Found", len(movies_subset), "movies between 1950 and 2000")

# Keep only movies whose country is exactly "Japan" or "Switzerland"
movies_japan_or_switzerland = movies_subset[
    (movies_subset["country"] == "Japan") | (movies_subset["country"] == "Switzerland")
]
print("Found", len(movies_japan_or_switzerland), "movies from Japan or Switzerland")
```


```
Found 19590 movies between 1950 and 2000
Found 746 movies from Japan or Switzerland
```
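
The two filtering steps can be checked on a handful of hypothetical rows (made-up values, not taken from `movies_imdb.csv`). This sketch shows what `errors='coerce'` does with an unparseable year and that `Series.between` and `Series.isin` are equivalent, more compact spellings of the boolean masks built with `>=`/`<=` and `==`/`|`.

```python
import pandas as pd

# Hypothetical rows standing in for the movies data.
df = pd.DataFrame({
    "year": ["1949", "1950", "1999", "2000", "2001", "n/a"],
    "country": ["Japan", "Switzerland", "France", "Japan", None, "France"],
})

# errors="coerce" turns unparseable strings such as "n/a" into NaN instead of raising.
df["year_numeric"] = pd.to_numeric(df["year"], errors="coerce")

# Explicit range comparison versus Series.between (inclusive on both ends by default).
mask_explicit = (df["year_numeric"] >= 1950) & (df["year_numeric"] <= 2000)
mask_between = df["year_numeric"].between(1950, 2000)
assert mask_explicit.equals(mask_between)

# A chain of == comparisons joined with | versus Series.isin.
mask_or = (df["country"] == "Japan") | (df["country"] == "Switzerland")
mask_isin = df["country"].isin(["Japan", "Switzerland"])
assert mask_or.equals(mask_isin)

# Combine both conditions: keeps the 1950 Swiss row and the 2000 Japanese row.
subset = df[mask_between & mask_isin]
print(subset)
```

Both `==` and `isin()` match only cells whose value is exactly one of the listed strings; a cell listing several countries in one string would not match.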


## Visualization

Once the data set is ready, we use Altair to map the release year to the x axis and the IMDb rating to the y axis, and distinguish the countries by color.

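```python
# Create a scatter plot: one point per movie
chart = alt.Chart(movies_japan_or_switzerland).mark_point().encode(
    # Use the numeric year column: 'year' holds strings, which Altair would treat as
    # nominal categories. zero=False keeps the axis from being stretched down to year 0.
    x=alt.X('year_numeric:Q', title='year', scale=alt.Scale(zero=False)),
    y='imdbRating:Q',
    color='country:N'
)

# Display the chart directly in the notebook
chart
```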

For more information on what Altair can do, see the official documentation: [https://altair-viz.github.io/getting_started/overview.html#overview](https://altair-viz.github.io/getting_started/overview.html#overview).