# Vega-Altair

Vega-Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite. It offers a powerful and concise grammar that enables you to quickly build a wide range of statistical visualizations. 

You can install Altair with the terminal command: 

`pip install "altair[all]"`

Check the [installation guide](https://altair-viz.github.io/getting_started/installation.html) for more information.

In [34]:
#import libraries
import altair as alt
import pandas as pd
from datetime import datetime as dt

In [5]:
#load data
df = pd.read_csv("data/movies_imdb.csv")
df.head(2)

Unnamed: 0,imdbID,title,year,rating,runtime,genre,released,director,writer,cast,...,imdbRating,imdbVotes,poster,plot,fullplot,language,country,awards,lastupdated,type
0,1,Carmencita,1894,NOT RATED,1 min,"Documentary, Short",,William K.L. Dickson,,Carmencita,...,5.9,1032.0,https://m.media-amazon.com/images/M/MV5BMjAzND...,Performing on what looks like a small wooden s...,Performing on what looks like a small wooden s...,,USA,,2015-08-26 00:03:45.040000000,movie
1,5,Blacksmith Scene,1893,UNRATED,1 min,Short,1893-05-09,William K.L. Dickson,,"Charles Kayser, John Ott",...,6.2,1189.0,,Three men hammer on an anvil and pass a bottle...,A stationary camera looks at a large anvil wit...,,USA,1 win.,2015-08-26 00:03:50.133000000,movie


### Data Cleaning

In [None]:
# remove the stray replacement character (an encoding artifact) from year
df['year'] = df['year'].str.replace('�', '')
# now convert the column to integers
df["year"] = df["year"].astype(int)
# convert released to datetime
df['released'] = pd.to_datetime(df['released'])

In [41]:
df.dtypes

imdbID                  int64
title                  object
year                    int64
rating                 object
runtime                object
genre                  object
released       datetime64[ns]
director               object
writer                 object
cast                   object
metacritic            float64
imdbRating            float64
imdbVotes             float64
poster                 object
plot                   object
fullplot               object
language               object
country                object
awards                 object
lastupdated            object
type                   object
dtype: object

## Plotting =)



In [64]:
# which country has made the most films? 
top_countries = df['country'].value_counts().reset_index().head(5)

top_countries

Unnamed: 0,country,count
0,USA,20589
1,UK,2550
2,France,1683
3,Japan,1468
4,Italy,1208



### Anatomy of an Altair chart
An Altair chart follows this basic pattern:

    alt.Chart(df).mark_bar().encode(
        x = 'column_A',
        y = 'column_B'
    )

`Chart(df)`: the DataFrame passed in is the data source for the chart<br>
`mark_bar()`: the mark method sets the visual form of the chart (here, bars)<br>
`encode()`: maps DataFrame columns to visual channels such as `x` and `y`


##### Bar chart

In [None]:
# Bar chart
alt.Chart(top_countries).mark_bar().encode(
    x = 'count',
    y = 'country'
)



### Plot types

Altair offers several mark types, including:
* `mark_bar`
* `mark_line`
* `mark_circle`
* `mark_point`
* `mark_boxplot`
* `mark_square`

##### Line chart

In [70]:
df['year'].sort_values().tail(10)

43545        2017
45666        2018
45877        2019
40155    19821986
15802    19941998
20546    19981999
24060    20012004
30643    20062007
29443    20062012
31425    20062012
Name: year, dtype: int64

In [72]:
# drop rows whose year was a range (now fused into an eight-digit number) for the sake of this demonstration
df = df[df['year']<2025]
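
The filter above drops films whose `year` held a range, because removing the separator fused both years into one eight-digit number. A possible alternative, if it is acceptable to keep only the first year of each range, is to extract the first four digits at the cleaning stage, before the conversion to `int`. The sketch below uses a hypothetical `raw_years` Series with an en dash standing in for the separator; the actual character in the file may differ.

```python
import pandas as pd

# Hypothetical raw values; the en dash stands in for the real separator
raw_years = pd.Series(["2017", "1982–1986", "1994–1998"])

# Keep only the first run of four digits in each value, then convert to int
first_year = raw_years.str.extract(r"(\d{4})", expand=False).astype(int)
```

With this approach no rows are lost, at the cost of treating a multi-year entry as if it belonged to its first year only.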

In [77]:
# count number of films per year
films_per_year = df.groupby('year')['country'].count().reset_index()
films_per_year

Unnamed: 0,year,country
0,1874,1
1,1880,1
2,1887,1
3,1888,2
4,1890,4
...,...,...
129,2015,2055
130,2016,506
131,2017,41
132,2018,1


In [80]:
alt.Chart(films_per_year).mark_line().encode(
    x = 'year',
    y = 'country',
    tooltip = ['year', 'country']
)

##### Scatter plot

In [93]:
best_rated = df.sort_values('imdbRating', ascending=False).head(1000)

In [103]:
this_df = df[df['imdbVotes']>10000]
this_df

Unnamed: 0,imdbID,title,year,rating,runtime,genre,released,director,writer,cast,...,imdbRating,imdbVotes,poster,plot,fullplot,language,country,awards,lastupdated,type
25,417,A Trip to the Moon,1902,TV-G,13,"Short, Adventure, Fantasy",1902-10-04,Georges Méliès,,"François Lallement, Jules-Eugène Legris",...,8.2,23904.0,https://m.media-amazon.com/images/M/MV5BMTQzMD...,A group of astronomers go on an expedition to ...,A group of men travel to the moon by being sho...,,France,,2015-09-01 00:16:55.443000000,movie
106,4972,The Birth of a Nation,1915,NOT RATED,165,"Drama, History, Romance",1915-03-03,D.W. Griffith,"Thomas Dixon Jr. (adapted from his novel: ""The...","Lillian Gish, Mae Marsh, Henry B. Walthall, Mi...",...,6.8,15715.0,https://m.media-amazon.com/images/M/MV5BMTY0OD...,The Civil War divides friends and destroys fam...,"Two brothers, Phil and Ted Stoneman, visit the...",,USA,2 wins.,2015-09-11 00:32:27.763000000,movie
208,10323,The Cabinet of Dr. Caligari,1920,UNRATED,67,"Crime, Horror, Thriller",1921-03-19,Robert Wiene,"Carl Mayer (story), Hans Janowitz (story)","Werner Krauss, Conrad Veidt, Friedrich Feher, ...",...,8.1,34504.0,https://m.media-amazon.com/images/M/MV5BMTk2Nj...,"Dr. Caligari's somnambulist, Cesare, and his d...","Francis, a young man, recalls in his memory th...",German,Germany,,2015-08-26 00:40:25.337000000,movie
254,12349,The Kid,1921,NOT RATED,68,"Comedy, Drama, Family",1921-02-06,Charles Chaplin,Charles Chaplin,"Carl Miller, Edna Purviance, Jackie Coogan, Ch...",...,8.4,56858.0,https://m.media-amazon.com/images/M/MV5BMTkzNT...,"The Tramp cares for an abandoned child, but ev...","The opening title reads: ""A comedy with a smil...",English,USA,1 win.,2015-09-05 00:24:11.143000000,movie
290,13442,Nosferatu,1922,UNRATED,81,Horror,1929-06-03,F.W. Murnau,"Henrik Galeen (screen play), Bram Stoker (base...","Max Schreck, Gustav von Wangenheim, Greta Schr...",...,8.0,63322.0,https://m.media-amazon.com/images/M/MV5BMzgwNz...,Vampire Count Orlok expresses interest in a ne...,"Wisbourg, Germany based estate agent Knock dis...",German,Germany,1 win & 1 nomination.,2015-08-21 00:04:53.453000000,movie
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
44756,4044364,Citizenfour,2014,R,114,Documentary,2015-02-23,Laura Poitras,,"Edward Snowden, Glenn Greenwald, William Binne...",...,8.2,20610.0,https://m.media-amazon.com/images/M/MV5BMTc0MT...,A documentarian and a reporter travel to Hong ...,"In January 2013, Laura Poitras started receivi...","English, Portuguese, German","USA, Germany, UK",Won 1 Oscar. Another 43 wins & 25 nominations.,2015-08-27 00:14:57.227000000,movie
45005,4178092,The Gift,2015,R,108,"Mystery, Thriller",2015-08-07,Joel Edgerton,Joel Edgerton,"Jason Bateman, Rebecca Hall, Joel Edgerton, Al...",...,7.6,12823.0,https://m.media-amazon.com/images/M/MV5BMTQzMj...,A young married couple's lives are thrown into...,Simon and Robyn are a young married couple who...,English,"Australia, USA",,2015-09-05 00:00:36.160000000,movie
45092,4229236,Cobain: Montage of Heck,2015,TV-MA,145,"Documentary, Biography, Music",2015-05-04,Brett Morgen,Brett Morgen,"Aaron Burckhard, Chad Channing, Don Cobain, Je...",...,7.7,13973.0,https://m.media-amazon.com/images/M/MV5BMjIyOT...,An authorized documentary on the late musician...,An authorized documentary on the late musician...,English,USA,Nominated for 6 Primetime Emmys. Another 1 win.,2015-09-17 04:40:35.987000000,movie
45129,4257858,Going Clear: Scientology and the Prison of Belief,2015,NOT RATED,119,Documentary,2015-06-25,Alex Gibney,"Alex Gibney, Lawrence Wright (book)","Lawrence Wright, Mike Rinder, Marty Rathbun, P...",...,8.2,13687.0,https://m.media-amazon.com/images/M/MV5BMjMwMT...,An in-depth look at the inner-workings of the ...,A devastating two hour documentary based on La...,English,USA,,2015-09-10 17:47:44.460000000,movie


In [106]:
alt.Chart(this_df.head(1000)).mark_point().encode(
    x = 'imdbRating',
    y = 'imdbVotes',
    tooltip = ['title', 'imdbRating', 'imdbVotes']    
)