# INFOVIS - Project 1

### Importing pandas and altair libraries

In [2]:
import pandas as pd
import altair as alt

### Libraries Used

- **[Pandas](https://pandas.pydata.org/)**: A powerful Python library for data manipulation and analysis. It provides data structures such as DataFrame and Series.
- **[Altair](https://altair-viz.github.io/)**: A declarative statistical visualization library for Python, based on the principles of **[The Grammar of Graphics](https://www.cs.uic.edu/~wilkinson/TheGrammarOfGraphics/GOG.html)**, a foundational framework introduced by Leland Wilkinson for constructing complex and flexible visualizations. Altair generates **[Vega-Lite](https://vega.github.io/vega-lite/)** specifications. Vega-Lite is a high-level grammar of interactive graphics that compiles to the lower-level **[Vega](https://vega.github.io/vega/)** visualization grammar.


In [3]:
df = pd.read_csv('movies.csv')

For my dataset, I used data from [**Letterboxd**](https://letterboxd.com/), a popular web platform for movie enthusiasts where users can catalog, rate, and review the films they've watched. Although the site provides an API, I chose not to use it because I found its structure confusing and overly complex for my needs. Instead, I manually recorded the details of the films I've watched.

The dataset contains the following columns:

- **ID**: A unique identifier for each film.
- **Title**: The title of the film.
- **Director**: The director of the film.
- **Genre**: The main genre(s) of the film.
- **Month**: The month I watched the film.
- **Year**: The year the film was released.
- **Duration (min)**: The running time of the film in minutes.
- **Rating (out of 10)**: My personal rating for the film, on a scale of 0 to 10.
- **Country**: The country of origin of the film.

I recorded information on **50 films** that I have watched and rated.
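Because the dataset was assembled by hand, a quick consistency check before plotting can catch typos in column names or values. The sketch below is a minimal example under stated assumptions: the column names are taken from the plotting code in this notebook, while the specific checks (unique IDs, ratings within 0 to 10, positive durations), the helper `check_schema`, and the two sample rows are hypothetical and not taken from `movies.csv`.

```python
import pandas as pd

# Column names as used by the plotting code in this notebook
EXPECTED_COLUMNS = [
    "ID", "Title", "Director", "Genre", "Month",
    "Year", "Duration (min)", "Rating (out of 10)", "Country",
]

def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems found in the film log (an empty list means OK)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns
    if df["ID"].duplicated().any():
        problems.append("duplicate IDs")
    ratings = df["Rating (out of 10)"]
    if ((ratings < 0) | (ratings > 10)).any():
        problems.append("ratings outside 0-10")
    if (df["Duration (min)"] <= 0).any():
        problems.append("non-positive durations")
    return problems

# Hypothetical two-row sample (placeholder values, not from movies.csv)
sample = pd.DataFrame({
    "ID": [1, 2],
    "Title": ["Film A", "Film B"],
    "Director": ["Director A", "Director B"],
    "Genre": ["Drama", "Horror"],
    "Month": ["January", "March"],
    "Year": [1999, 2021],
    "Duration (min)": [120, 95],
    "Rating (out of 10)": [8.0, 6.5],
    "Country": ["USA", "Brazil"],
})
print(check_schema(sample))  # []
```

In the notebook itself, the same function could be called as `check_schema(df)` right after `pd.read_csv('movies.csv')`.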


In [4]:
alt.Chart(df).mark_bar(color='greenyellow').encode(
    # Sort genres by bar length (the count on the x channel), longest bar first
    y=alt.Y('Genre:N', sort='-x'),
    x=alt.X('count():Q', title='Number of Movies'),
    tooltip=['Genre', 'count()']
).interactive().properties(
    title='Bar Chart of Number of Movies Watched by Genre'
)
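Altair's `count()` aggregate counts the rows for each `Genre`. To cross-check the bar lengths against the raw data, here is a minimal pandas sketch of the same per-genre count. The helper name `genre_counts` and the six-row sample are hypothetical and not part of the notebook.

```python
import pandas as pd

def genre_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Films per genre, most frequent first (the pandas analogue of count())."""
    counts = df["Genre"].value_counts()  # sorted in descending order by default
    return counts.rename_axis("Genre").reset_index(name="Count")

# Hypothetical sample, not taken from movies.csv
sample = pd.DataFrame({"Genre": ["Drama", "Horror", "Drama", "Comedy", "Drama", "Horror"]})
print(genre_counts(sample))
```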

In [5]:
# Get a list of all unique genres
genres = df['Genre'].unique().tolist()

# Create radio buttons for each genre plus an 'All' option (None means no filter)
genre_radio = alt.binding_radio(
    options=genres + [None],
    labels=genres + ['All'],
    name='Genre: '
)

# Create a point selection on 'Genre' that is bound to the radio buttons
selection = alt.selection_point(
    fields=['Genre'],
    bind=genre_radio,
    empty=True  # An empty selection (the 'All' option) keeps every point
)

# Axis domains: the data range of 'Duration (min)' and 'Rating (out of 10)' plus padding
min_duration = df['Duration (min)'].min() - 2  # Subtract 2 minutes of padding
max_duration = df['Duration (min)'].max() + 2  # Add 2 minutes of padding
min_rating = df['Rating (out of 10)'].min() - 0.5  # Subtract 0.5 points of padding
max_rating = df['Rating (out of 10)'].max() + 0.5  # Add 0.5 points of padding

# Scatter plot that shows only the points matching the selected genre
alt.Chart(df).mark_circle().encode(
    x=alt.X('Duration (min)', scale=alt.Scale(domain=(min_duration, max_duration))),
    y=alt.Y('Rating (out of 10)', scale=alt.Scale(domain=(min_rating, max_rating))),
    color=alt.Color('Genre', scale=alt.Scale(scheme='category20')),
    tooltip=['Title', 'Duration (min)', 'Rating (out of 10)', 'Genre']
).add_params(
    selection
).transform_filter(
    selection
).interactive().properties(
    title='Interactive Scatter Plot: Movie Duration vs. Rating (Filter by Genre)'
)
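The scatter plot combines two ideas: padded axis domains and a genre filter where the `None` option ("All") disables filtering. The pandas sketch below mirrors that behavior for inspecting the underlying rows; it does not reflect how Vega-Lite evaluates the selection internally. The helpers `padded_domain` and `filter_by_genre` and the three-row sample are hypothetical.

```python
from typing import Optional

import pandas as pd

def padded_domain(values: pd.Series, pad: float) -> list:
    """Return [min - pad, max + pad], the padded axis range used for the scatter plot."""
    return [float(values.min()) - pad, float(values.max()) + pad]

def filter_by_genre(df: pd.DataFrame, genre: Optional[str]) -> pd.DataFrame:
    """Keep rows of one genre; None plays the role of the radio button's 'All' option."""
    if genre is None:
        return df
    return df[df["Genre"] == genre]

# Hypothetical sample, not taken from movies.csv
sample = pd.DataFrame({
    "Genre": ["Drama", "Horror", "Drama"],
    "Duration (min)": [90, 120, 150],
    "Rating (out of 10)": [6.0, 8.5, 7.0],
})
print(padded_domain(sample["Duration (min)"], 2))        # [88.0, 152.0]
print(padded_domain(sample["Rating (out of 10)"], 0.5))  # [5.5, 9.0]
print(len(filter_by_genre(sample, "Drama")))             # 2
```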


In [6]:
alt.Chart(df).transform_aggregate(
    count='count()',
    groupby=['Country']
).mark_arc().encode(
    theta='count:Q',
    color='Country:N',
    tooltip=['Country', 'count:Q']
).interactive().properties(
    title='Distribution of Movies by Country'
)
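In the pie chart, `transform_aggregate(count='count()', groupby=['Country'])` yields one row per country, and `theta` maps each count to an angle. As a result, each slice covers count / total of the full 360 degrees. A minimal pandas sketch of that arithmetic follows; the helper `country_shares` and the four-row sample are hypothetical.

```python
import pandas as pd

def country_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Films per country, with each slice's share of the total and its angle in degrees."""
    out = df.groupby("Country").size().reset_index(name="count")
    out["share"] = out["count"] / out["count"].sum()
    out["degrees"] = out["share"] * 360
    return out

# Hypothetical sample, not taken from movies.csv
sample = pd.DataFrame({"Country": ["USA", "USA", "Brazil", "France"]})
print(country_shares(sample))
```

With this sample, the USA slice would cover half the pie (180 degrees), and Brazil and France would each cover a quarter.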

In [8]:
df_month = df.groupby('Month').size().reset_index(name='Count')

months_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

df_month['Month'] = pd.Categorical(df_month['Month'], categories=months_order, ordered=True)

min_count = df_month['Count'].min() - 1
max_count = df_month['Count'].max() + 1

alt.Chart(df_month).mark_line().encode(
    x=alt.X('Month:O', sort=months_order, axis=alt.Axis(labelAngle=0)),  # Explicit calendar order
    y=alt.Y('Count:Q', scale=alt.Scale(domain=(min_count, max_count))),
    tooltip=['Month', 'Count']
).properties(
    title='Line Chart of Movies Watched by Month',
    width=600,
    height=400
).interactive()
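`df.groupby('Month').size()` returns only the months that appear in the data. A month with no films is therefore absent from the line rather than shown as zero. If zero-count months should appear, one option is to reindex on the full calendar. This is a sketch of an optional variant, not what the chart above does; the helper `monthly_counts` and the sample rows are hypothetical.

```python
import pandas as pd

MONTHS_ORDER = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
                'August', 'September', 'October', 'November', 'December']

def monthly_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Films per month in calendar order, with 0 for months without any film."""
    counts = df.groupby("Month").size()
    # reindex adds missing months as 0; labels not in MONTHS_ORDER (e.g. typos) are dropped
    counts = counts.reindex(MONTHS_ORDER, fill_value=0)
    return counts.rename_axis("Month").reset_index(name="Count")

# Hypothetical sample, not taken from movies.csv
sample = pd.DataFrame({"Month": ["January", "January", "March"]})
print(monthly_counts(sample).head(3))
```

If this variant is used, `min_count` can drop to -1, so fixing the lower end of the y domain at 0 may read better.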

In [9]:
df['Rating (out of 10)'] = df['Rating (out of 10)'].astype(float)

df_genre_rating = df.groupby('Genre')['Rating (out of 10)'].mean().reset_index(name='Average Rating')

alt.Chart(df_genre_rating).mark_bar().encode(
    y=alt.Y('Genre', sort=alt.EncodingSortField(field='Average Rating', op='mean', order='descending')),
    x='Average Rating',
    tooltip=['Genre', 'Average Rating']
).properties(
    title='Bar Chart of Average Movie Ratings by Genre'
).interactive()
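The bar chart ranks genres by mean rating, but it does not show how many films each mean is based on. A mean over a single film is much less stable than one over several. The sketch below produces the same ranking as a table with a film count alongside each mean. The helper `average_rating_by_genre` and the sample ratings are hypothetical.

```python
import pandas as pd

def average_rating_by_genre(df: pd.DataFrame) -> pd.DataFrame:
    """Mean rating and number of films per genre, highest mean first."""
    return (
        df.groupby("Genre")["Rating (out of 10)"]
        .agg(["mean", "size"])
        .rename(columns={"mean": "Average Rating", "size": "Films"})
        .sort_values("Average Rating", ascending=False)
        .reset_index()
    )

# Hypothetical sample, not taken from movies.csv
sample = pd.DataFrame({
    "Genre": ["Drama", "Drama", "Horror", "Comedy"],
    "Rating (out of 10)": [8.0, 9.0, 6.0, 7.5],
})
print(average_rating_by_genre(sample))
```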