## Visual Data Analysis of Student Performance Using Azure Databricks

This notebook explores the StudentsPerformance.csv dataset through a variety of visualizations built with Azure Databricks' built-in tools. The charts help identify performance trends, score distributions, and the impact of demographic and educational factors.

In [0]:
%sql
USE CATALOG samples;

-- Count trips by drop-off hour for two pickup ZIP codes
SELECT
  hour(tpep_dropoff_datetime) AS dropoff_hour,
  COUNT(*) AS num
FROM samples.nyctaxi.trips
WHERE pickup_zip IN ('10001', '10002')
GROUP BY 1;

dropoff_hour,num
12,76
22,106
1,99
13,83
6,32
16,74
3,50
20,118
5,20
19,125


Databricks visualization. Run in Databricks to view.

In [0]:
from pyspark.sql.functions import col, hour

pickupzip = '10001'  # Example pickup ZIP code to filter on
df = spark.table("samples.nyctaxi.trips")

# Count trips per drop-off hour for the chosen pickup ZIP
result_df = (
    df.filter(col("pickup_zip") == pickupzip)
      .groupBy(hour(col("tpep_dropoff_datetime")).alias("dropoff_hour"))
      .count()
      .withColumnRenamed("count", "num")
)
display(result_df)

dropoff_hour,num
12,62
22,60
1,40
13,69
6,19
16,63
3,18
20,88
5,12
19,85


Databricks visualization. Run in Databricks to view.

Step 1: Load Data (run this before the visualizations)

In [0]:
# Load CSV in Databricks notebook
df = spark.read.csv("/Volumes/hexaware_databricks/default/file/StudentsPerformance.csv", header=True, inferSchema=True)
df.display()

gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
female,group B,bachelor's degree,standard,none,72,72,74
female,group C,some college,standard,completed,69,90,88
female,group B,master's degree,standard,none,90,95,93
male,group A,associate's degree,free/reduced,none,47,57,44
male,group C,some college,standard,none,76,78,75
female,group B,associate's degree,standard,none,71,83,78
female,group B,some college,standard,completed,88,95,92
male,group B,some college,free/reduced,none,40,43,39
male,group D,high school,free/reduced,completed,64,64,67
female,group B,high school,free/reduced,none,38,60,50
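
Several column names in this dataset contain spaces or a slash (for example `race/ethnicity` and `math score`), so the later cells have to pass them as exact quoted strings. As an optional step, the sketch below shows one way to normalize them to snake_case. The helper `to_snake_case` is a hypothetical name introduced here, not part of the original notebook, and the Spark call is left as a comment because it assumes the `df` loaded above.

```python
import re


def to_snake_case(name: str) -> str:
    """Lower-case a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


# Column names as they appear in the CSV header shown above.
columns = [
    "gender", "race/ethnicity", "parental level of education", "lunch",
    "test preparation course", "math score", "reading score", "writing score",
]
clean_columns = [to_snake_case(c) for c in columns]
print(clean_columns)

# In the notebook, assuming `df` is the DataFrame loaded above:
# df_clean = df.toDF(*clean_columns)
```

The remaining cells keep the original column names, so they would need updating if the renamed DataFrame were used instead.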


Bar Chart – Average Math Score by Gender

In [0]:
from pyspark.sql.functions import mean

# Average math score per gender
result = df.groupBy("gender").agg(mean("math score").alias("Average Math Score"))
display(result)

gender,Average Math Score
female,63.63320463320464
male,68.72821576763485


Databricks visualization. Run in Databricks to view.

Pie Chart – Test Preparation Course Completion

In [0]:
# Number of students per test preparation status
result = df.groupBy("test preparation course").count()
display(result)

test preparation course,count
completed,358
none,642


Databricks visualization. Run in Databricks to view.
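
A pie chart shows each category as a share of the total. As a quick check on what the chart should display, the counts from the output above can be converted to percentages by hand. This is a minimal plain-Python sketch that copies the two counts from the table rather than querying `df`.

```python
# Counts copied from the output table above.
counts = {"completed": 358, "none": 642}

total = sum(counts.values())
# Share of each category as a percentage, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```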

Box Plot – Reading Scores by Lunch Type

In [0]:
# Raw reading scores with lunch type; the box plot groups by "lunch"
result = df.select("lunch", "reading score")
display(result)

lunch,reading score
standard,72
standard,90
standard,95
free/reduced,57
standard,78
standard,83
standard,95
free/reduced,43
free/reduced,64
free/reduced,60


Databricks visualization. Run in Databricks to view.
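
A box plot summarizes each group by its minimum, quartiles, median, and maximum. The sketch below computes those values for the ten preview rows shown above only, not the full dataset. It uses the `inclusive` quartile method from Python's `statistics` module, which is an assumption: the quartile convention used by the Databricks chart may differ.

```python
from statistics import quantiles

# The ten preview rows shown above (lunch type, reading score).
rows = [
    ("standard", 72), ("standard", 90), ("standard", 95), ("free/reduced", 57),
    ("standard", 78), ("standard", 83), ("standard", 95), ("free/reduced", 43),
    ("free/reduced", 64), ("free/reduced", 60),
]

# Group the scores by lunch type.
by_lunch = {}
for lunch, score in rows:
    by_lunch.setdefault(lunch, []).append(score)

# Five-number summary per group.
summary = {}
for lunch, scores in by_lunch.items():
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    summary[lunch] = {
        "min": min(scores), "q1": q1, "median": median,
        "q3": q3, "max": max(scores),
    }

for lunch, stats in summary.items():
    print(lunch, stats)
```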

Histogram – Distribution of Writing Scores

In [0]:
# Raw writing scores; the histogram bins them in the chart settings
result = df.select("writing score")
display(result)

writing score
74
88
93
44
75
78
92
39
67
50


Databricks visualization. Run in Databricks to view.
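
A histogram groups values into bins and counts how many fall into each one. The sketch below does that by hand for the ten preview values shown above. The bin width of 10 is an assumption chosen for illustration, since the notebook does not state the bin settings used in the chart.

```python
from collections import Counter

# The ten preview writing scores shown above.
writing_scores = [74, 88, 93, 44, 75, 78, 92, 39, 67, 50]

bin_width = 10  # assumed bin width, not taken from the notebook

# Map each score to the lower edge of its bin and count occurrences.
bins = Counter((score // bin_width) * bin_width for score in writing_scores)

for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {bins[start]}")
```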

Line Chart – Average Scores by Race/Ethnicity

In [0]:
# Average score in each subject per race/ethnicity group
result = df.groupBy("race/ethnicity").agg(
    mean("math score").alias("Math Score"),
    mean("reading score").alias("Reading Score"),
    mean("writing score").alias("Writing Score")
)
display(result)

race/ethnicity,Math Score,Reading Score,Writing Score
group B,63.45263157894737,67.35263157894737,65.6
group C,64.46394984326018,69.10344827586206,67.82758620689656
group D,67.36259541984732,70.03053435114504,70.14503816793894
group A,61.62921348314607,64.67415730337079,62.674157303370784
group E,73.82142857142857,73.02857142857142,71.40714285714286


Databricks visualization. Run in Databricks to view.
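
The grouped output above comes back in no particular order (B, C, D, A, E), which can make a line chart harder to read. The sketch below sorts the rows by group label in plain Python, with the averages copied from the output and rounded to two decimals. In the notebook itself, chaining `.orderBy("race/ethnicity")` onto `result` should give the same ordering, assuming the `result` DataFrame defined above.

```python
# (group, math, reading, writing) averages copied from the output above,
# rounded to two decimal places.
rows = [
    ("group B", 63.45, 67.35, 65.60),
    ("group C", 64.46, 69.10, 67.83),
    ("group D", 67.36, 70.03, 70.15),
    ("group A", 61.63, 64.67, 62.67),
    ("group E", 73.82, 73.03, 71.41),
]

# Sort by the group label so the x-axis runs from group A to group E.
ordered = sorted(rows, key=lambda row: row[0])

for group, math_avg, reading_avg, writing_avg in ordered:
    print(group, math_avg, reading_avg, writing_avg)
```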

Scatter Plot – Math Score vs Writing Score

In [0]:
# Paired math and writing scores for the scatter plot
result = df.select("math score", "writing score")
display(result)

math score,writing score
72,74
69,88
90,93
47,44
76,75
71,78
88,92
40,39
64,67
38,50


Databricks visualization. Run in Databricks to view.
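
A scatter plot gives a visual impression of how strongly two scores move together, and the Pearson correlation coefficient puts a number on it. The sketch below computes it by hand for the ten preview rows shown above only, so the value is illustrative rather than a result for the full dataset. On the full DataFrame, `df.stat.corr("math score", "writing score")` should give the equivalent figure, assuming the `df` loaded earlier.

```python
from math import sqrt

# The ten preview rows shown above.
math_scores = [72, 69, 90, 47, 76, 71, 88, 40, 64, 38]
writing_scores = [74, 88, 93, 44, 75, 78, 92, 39, 67, 50]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(math_scores, writing_scores)
print(round(r, 3))
```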

Bubble Chart – Math vs Reading Scores with Bubble Size as Writing Score

In [0]:
df_bubble = df.select("math score", "reading score", "writing score")

display(df_bubble)

math score,reading score,writing score
72,72,74
69,90,88
90,95,93
47,57,44
76,78,75
71,83,78
88,95,92
40,43,39
64,64,67
38,60,50


Databricks visualization. Run in Databricks to view.