
# Altair

Altair is a declarative Python library for creating clear, concise, and interactive statistical visualizations. It allows users to specify what they want to visualize by linking data columns to visual elements like axes, colors, and shapes, while handling the detailed rendering automatically. Built on Vega and Vega-Lite, Altair simplifies exploratory data analysis with an easy-to-read syntax and supports a wide range of charts from simple bar plots to complex layered and faceted visualizations.

## Data Analysis

```python
import pandas as pd

# Load the course grades dataset and preview the first two rows
df = pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWdata/refs/heads/main/grades_ok.csv')
df.head(2)
```

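|   | Course | Student | AP1 | AP2 | AP3 | Grade | Score |
|---|--------|---------|-----|-----|-----|-------|-------|
| 0 | ADM | Finn | 90 | 90 | 90 | 90 | A |
| 1 | ADM | Jack | 60 | 40 | 100 | 60 | D |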


## Bar chart

```python
import altair as alt

# One bar per student; the bar height is the student's grade
alt.Chart(df).mark_bar().encode(x='Student', y='Grade')
```

## Aggregated Data

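```python
# Average grade per course (a Series indexed by Course)
grade_per_course = df.groupby('Course')['Grade'].mean()
print(grade_per_course)

# Turn the Course index back into a column so Altair can encode it
grade_per_course = grade_per_course.reset_index()
grade_per_course
```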

```
Course
ADM    67.2
ECO    79.5
LAW    72.4
Name: Grade, dtype: float64
```


|   | Course | Grade |
|---|--------|-------|
| 0 | ADM | 67.2 |
| 1 | ECO | 79.5 |
| 2 | LAW | 72.4 |
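The same aggregation can be written in one step with `as_index=False`. This keeps `Course` as an ordinary column, so `reset_index()` is not needed. Here is a minimal sketch on hypothetical grades:

```python
import pandas as pd

# Hypothetical grades for two courses
grades = pd.DataFrame({
    "Course": ["ADM", "ADM", "ECO", "ECO"],
    "Grade": [90, 60, 80, 90],
})

# Mean grade per course, returned as a DataFrame with a regular Course column
grade_per_course = grades.groupby("Course", as_index=False)["Grade"].mean()
print(grade_per_course)
```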


```python
# Plot the means that were pre-aggregated in pandas
alt.Chart(grade_per_course).mark_bar().encode(x='Course', y='Grade')
```

```python
# Let Altair compute the mean per course directly from the raw data
alt.Chart(df).mark_bar().encode(x='Course', y='mean(Grade)')
```

```python
# No aggregate: each student's grade becomes its own bar segment
alt.Chart(df).mark_bar().encode(x='Course', y='Grade')
```

## Pie chart

![](https://pbs.twimg.com/media/Fw7gCt_XgAEcUk0?format=png&name=small)

```python
# Slice angle proportional to the number of students in each course
alt.Chart(df).mark_arc().encode(theta='count(Student)', color='Course')
```
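`theta='count(Student)'` counts the rows of each course. The arc mark then turns each count into a slice whose angle is proportional to that count. The sketch below shows the same arithmetic in pandas, using a hypothetical list of enrollments:

```python
import pandas as pd

# Hypothetical enrollment: one row per student
students = pd.DataFrame({"Course": ["ADM", "ADM", "ECO", "LAW", "LAW", "LAW"]})

counts = students["Course"].value_counts()   # rows per course
angles = counts * 360 / counts.sum()         # slice angle in degrees
print(angles)
```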

## Line chart

![](https://pbs.twimg.com/media/Fw7iM3YXoAIRs9Q?format=png&name=900x900)

```python
# COVID-19 figures by date, one column per country
covid = pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWData/refs/heads/main/covid19.csv')
covid.head(2)
```

|   | date | Brazil | India | US |
|---|------|--------|-------|----|
| 0 | 1/23/20 | 0 | 0 | 0 |
| 1 | 1/24/20 | 0 | 0 | 1 |


```python
# ':T' marks the date column as temporal, giving a time axis
alt.Chart(covid).mark_line().encode(x='date:T', y='US')
```
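The `:T` suffix tells Altair to treat `date` as temporal. However, the column still holds strings like `'1/23/20'`, which the browser has to parse. The sketch below assumes the month/day/two-digit-year layout shown in the table above. It converts the column to real timestamps in pandas first, so the parsing is explicit:

```python
import pandas as pd

# Hypothetical rows in the same layout as the COVID-19 file
covid = pd.DataFrame({"date": ["1/23/20", "1/24/20"], "US": [0, 1]})

# %m/%d/%y = month/day/two-digit year; Python maps '20' to 2020
covid["date"] = pd.to_datetime(covid["date"], format="%m/%d/%y")
print(covid.dtypes)
```

With a datetime column, Altair infers the temporal type by itself, so `x='date'` would also work.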

```python
# Reshape from wide (one column per country) to long (one row per date and country)
melted = covid.melt(id_vars='date')
melted.head(3)
```

|   | date | variable | value |
|---|------|----------|-------|
| 0 | 1/23/20 | Brazil | 0 |
| 1 | 1/24/20 | Brazil | 0 |
| 2 | 1/25/20 | Brazil | 0 |
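`melt` turns the wide table (one column per country) into a long one (one row per date and country). This long shape is what Altair needs in order to map the country to `color`. Passing `var_name` and `value_name` gives the new columns more descriptive names than `variable` and `value`. Here is a sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical wide data: one column per country
wide = pd.DataFrame({
    "date": ["1/23/20", "1/24/20"],
    "Brazil": [0, 0],
    "India": [0, 0],
    "US": [0, 1],
})

# Keep 'date' as the identifier; the country columns become rows
long = wide.melt(id_vars="date", var_name="country", value_name="cases")
print(long)
```

With these names, the chart encodings would use `color='country'` and `y='cases'`.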


```python
# Plot only the US rows, filtered with a Vega expression
alt.Chart(melted).mark_line().encode(
    x='date:T',
    y='value'
).transform_filter("datum.variable == 'US'")
```

```python
# One line per country, distinguished by color
alt.Chart(melted).mark_line().encode(x='date:T', y='value', color='variable')
```

## Scatterplot

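```python
# Load the full grades dataset (includes the AP1, AP2, AP3 and Final scores)
df = pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWdata/refs/heads/main/notas_full.csv')

# AP2 against AP3; .interactive() enables pan and zoom
alt.Chart(df).mark_point().encode(
    x='AP2',
    y='AP3'
).interactive()
```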

## Heatmap and Correlation

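```python
# Pairwise Pearson correlations between the numeric columns
# (numeric_only=True skips any text columns, which recent pandas versions reject)
cor = df.corr(numeric_only=True)
print(cor)

# Move the row labels into a column named 'index'
cor = cor.reset_index()
print(cor)

# Long format: one row per (index, variable) pair, as mark_rect expects
corr = cor.melt(id_vars=['index'])
corr.head(3)
```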

```
            AP1       AP2       AP3     Final
AP1    1.000000  0.278632  0.003717  0.630113
AP2    0.278632  1.000000  0.952691  0.920175
AP3    0.003717  0.952691  1.000000  0.776387
Final  0.630113  0.920175  0.776387  1.000000
   index       AP1       AP2       AP3     Final
0    AP1  1.000000  0.278632  0.003717  0.630113
1    AP2  0.278632  1.000000  0.952691  0.920175
2    AP3  0.003717  0.952691  1.000000  0.776387
3  Final  0.630113  0.920175  0.776387  1.000000
```


|   | index | variable | value |
|---|-------|----------|-------|
| 0 | AP1 | AP1 | 1.0 |
| 1 | AP2 | AP1 | 0.278632 |
| 2 | AP3 | AP1 | 0.003717 |
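Each cell of the correlation matrix is a Pearson coefficient, r = cov(x, y) / (std(x) * std(y)). It ranges from -1 to 1, which is why the diagonal (each column with itself) is exactly 1. The sketch below uses hypothetical scores to check that computing r by hand matches pandas:

```python
import math
import pandas as pd

# Hypothetical scores, for illustration only
scores = pd.DataFrame({"AP2": [55, 70, 90, 65], "AP3": [60, 72, 95, 61]})

x, y = scores["AP2"], scores["AP3"]
# Pearson r: covariance divided by the product of the standard deviations
r_manual = x.cov(y) / (x.std() * y.std())
r_pandas = scores.corr().loc["AP2", "AP3"]
print(r_manual, r_pandas)
```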


```python
# Heatmap: one rectangle per pair of columns, colored by the correlation
alt.Chart(corr).mark_rect().encode(
    x='index',
    y='variable',
    color='value'
)
```

## Histogram

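```python
# Reload the course grades (df was replaced by notas_full.csv above)
df = pd.read_csv('https://raw.githubusercontent.com/lcbjrrr/ProgWdata/refs/heads/main/grades_ok.csv')

# Bin the grades and count how many fall into each bin
alt.Chart(df).mark_bar().encode(
    x=alt.X('Grade:Q', bin=True),
    y='count()'
)
```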
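A histogram groups values into bins before counting. In Altair, binning is requested on the x channel with `bin=True`, or with `alt.Bin(step=10)` for a fixed width. The sketch below shows the counting step itself with NumPy, using hypothetical grades and bins of width 10:

```python
import numpy as np

# Hypothetical grades, for illustration only
grades = [90, 60, 75, 82, 55, 68]

# Bins [50,60), [60,70), [70,80), [80,90), [90,100]; the last bin includes 100
counts, edges = np.histogram(grades, bins=[50, 60, 70, 80, 90, 100])
print(counts)  # one count per bin
```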

## Boxplot

```python
# Distribution of all grades: box = quartiles, plus whiskers and outlier points
alt.Chart(df).mark_boxplot().encode(
    x='Grade'
)
```

```python
# One box per course, colored by course
alt.Chart(df).mark_boxplot().encode(
    x='Grade',
    y='Course',
    color='Course'
)
```
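By default, Vega-Lite's boxplot draws whiskers to the most extreme values within 1.5 × IQR of the box, where the interquartile range (IQR) is Q3 - Q1. Anything beyond that range is plotted as an individual point. The sketch below applies the same rule in pandas to hypothetical grades. Libraries can use slightly different quartile conventions, so exact box edges may differ:

```python
import pandas as pd

# Hypothetical grades, for illustration only
grades = pd.Series([20, 60, 65, 70, 75, 80, 85])

q1, q3 = grades.quantile(0.25), grades.quantile(0.75)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

inside = grades[(grades >= low_fence) & (grades <= high_fence)]
whiskers = (inside.min(), inside.max())   # ends of the whiskers
outliers = grades[(grades < low_fence) | (grades > high_fence)].tolist()
print(q1, q3, whiskers, outliers)
```

Passing `extent='min-max'` to `mark_boxplot` instead stretches the whiskers to the minimum and maximum, with no separate outlier points.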