# Assignment: Data Visualization and Presentation - AI Fellowship

### Exploring an Employee Dataset with Data Visualizations

In this assignment, you will work with a synthetic employee dataset that captures demographic, educational, and professional attributes, including age, experience, education level, department, job role, salary, performance score, and work-life balance.

The goal of this assignment is to practice univariate and bivariate data visualization techniques with Matplotlib and Seaborn. You will create and interpret a variety of plot types to uncover trends, distributions, and relationships within the data.

**Learning Objectives:**

- Understand how to choose the appropriate plot for different types of data

- Gain insights from visual patterns and anomalies

- Practice customization and annotation to make plots more informative

- Interpret visualizations in a real-world context (e.g., HR analytics)
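
As a rough illustration of the first objective, the sketch below maps a column's data type to a plot family. The helper `suggest_plot` and the sample rows are hypothetical and not part of the assignment; the rule of thumb (numeric columns get a histogram or boxplot, everything else gets a count plot) is a simplification, and ordinal ratings stored as numbers may be better served by a count plot.

```python
import pandas as pd


def suggest_plot(series: pd.Series) -> str:
    """Hypothetical helper: suggest a univariate plot family from the dtype."""
    if pd.api.types.is_numeric_dtype(series):
        return "histogram or boxplot"
    return "count (bar) plot"


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Age": [25, 32, 41],
    "Department": ["HR", "Finance", "Engineering"],
})

suggestions = {col: suggest_plot(sample[col]) for col in sample.columns}
print(suggestions)
```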

In [None]:
!pip install plotchecker

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotchecker import PlotChecker

## Importing Data

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/AayushFuse/data/refs/heads/main/emp_info.csv")
df.head()
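
Beyond `df.head()`, it can help to check the shape, column types, and missing values before plotting. The sketch below runs on a small made-up frame so that it is self-contained; in the notebook the same calls would presumably be applied to `df`.

```python
import pandas as pd

# Made-up stand-in rows; the values are illustrative only.
sample = pd.DataFrame({
    "Age": [25, 32, None],
    "Salary": [50000.0, 62000.0, 58000.0],
    "Department": ["HR", "Finance", "HR"],
})

n_rows, n_cols = sample.shape
missing_per_column = sample.isna().sum()   # count of missing values per column
numeric_summary = sample.describe()        # summary statistics, numeric columns only

print(n_rows, n_cols)
print(sample.dtypes)
print(missing_per_column)
print(numeric_summary)
```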

## Data Descriptions

- **Age**: Age of the employee in years.

- **Experience**: Number of years the employee has worked professionally.

- **Salary**: Annual salary of the employee in USD.

- **Performance_Score**: Employee’s performance rating on a scale from 1 (lowest) to 5 (highest).

- **Work_Life_Balance**: Self-reported rating of work-life balance, from 1 (poor) to 5 (excellent).

- **Department**: The department in which the employee works (e.g., Engineering, HR, Finance).

- **Education_Level**: The highest level of education attained by the employee.

- **Gender**: Gender identity of the employee.

- **Marital_Status**: Current marital status of the employee (e.g., Single, Married).

- **Job_Role**: The job title or role of the employee within the organization.
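
The descriptions above state that the two rating columns run from 1 to 5. A quick sanity check along those lines might look like the following sketch; the sample values are made up, and one is deliberately out of range to show what the check reports.

```python
import pandas as pd

# Made-up ratings; the 7 is intentionally invalid.
sample = pd.DataFrame({
    "Performance_Score": [1, 3, 5, 7],
    "Work_Life_Balance": [2, 4, 5, 1],
})

rating_columns = ["Performance_Score", "Work_Life_Balance"]

# Series.between is inclusive on both ends by default.
out_of_range = {
    col: int((~sample[col].between(1, 5)).sum()) for col in rating_columns
}
print(out_of_range)
```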


### BASIC PLOTTING TASKS

In [None]:
### Ex-1-Task-1
fig, ax = plt.subplots(figsize=(8, 5))
# Plot a histogram to explore the distribution of salaries in the dataset.
# Set title: "Salary Distribution"
### BEGIN SOLUTION
# YOUR CODE HERE
sns.histplot(data=df, x="Salary", bins=30, kde=True, ax=ax)
ax.set_title("Salary Distribution")
ax.set_xlabel("Salary")
ax.set_ylabel("Frequency")
### END SOLUTION
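
The grading cells are not shown in this notebook, so the following is only a sketch of how a histogram like the one above could be checked, using plain Matplotlib accessors rather than any particular grading library. The salary values are randomly generated stand-ins.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fake_salaries = rng.normal(60000, 10000, size=200)  # made-up values

fig, ax = plt.subplots(figsize=(8, 5))
counts, bin_edges, _ = ax.hist(fake_salaries, bins=30)
ax.set_title("Salary Distribution")
ax.set_xlabel("Salary")
ax.set_ylabel("Frequency")

# Checks on the finished axes: title text, number of bars, total count.
assert ax.get_title() == "Salary Distribution"
assert len(ax.patches) == 30
assert counts.sum() == len(fake_salaries)
```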

In [None]:
# INTENTIONALLY LEFT BLANK


In [None]:
### Ex-1-Task-2
# Create a scatterplot to explore the relationship between Age and Salary.
fig, ax = plt.subplots(figsize=(8, 5))
### BEGIN SOLUTION
# YOUR CODE HERE
sns.scatterplot(data=df, x="Age", y="Salary", ax=ax)
ax.set_title("Relationship between Age and Salary")
ax.set_xlabel("Age")
ax.set_ylabel("Salary")
### END SOLUTION

In [None]:
# INTENTIONALLY LEFT BLANK


In [None]:
### Ex-1-Task-3
# Plot average salary per department using a line chart.
# Set title: "Average Salary by Department"
# Use 'x' marker
fig, ax = plt.subplots(figsize=(8, 5))
### BEGIN SOLUTION
# YOUR CODE HERE
avg_salary = df.groupby("Department")["Salary"].mean().reset_index()
sns.lineplot(data=avg_salary, x="Department", y="Salary", marker='x', ax=ax)
ax.set_title("Average Salary by Department")
ax.set_xlabel("Department")
ax.set_ylabel("Average Salary")
ax.tick_params(axis='x', rotation=45)
### END SOLUTION
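
The task asks for a line chart, but a line between unordered categories can suggest a trend that is not there. As an aside, a sorted bar chart is a common alternative for this kind of comparison. The sketch below uses made-up rows and is not the expected solution.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Department": ["HR", "Finance", "HR", "Engineering", "Finance"],
    "Salary": [50000, 70000, 54000, 90000, 66000],
})

# Mean salary per department, highest first.
avg_salary = (
    sample.groupby("Department")["Salary"].mean().sort_values(ascending=False)
)

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(avg_salary.index.tolist(), avg_salary.to_numpy())
ax.set_title("Average Salary by Department")
ax.set_xlabel("Department")
ax.set_ylabel("Average Salary")
ax.tick_params(axis="x", rotation=45)
```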

In [None]:
# INTENTIONALLY LEFT BLANK

### UNIVARIATE ANALYSIS TASKS

In [None]:
### Ex-2-Task-1
# Create a boxplot to visualize the distribution of Age.
# Set title: "Boxplot of Age"
fig, ax = plt.subplots(figsize=(8, 5))
### BEGIN SOLUTION
# YOUR CODE HERE
sns.boxplot(data=df, x="Age", ax=ax)
ax.set_title("Boxplot of Age")
ax.set_xlabel("Age")
### END SOLUTION
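
To interpret a boxplot, it can help to compute the quantities it draws. The sketch below derives the quartiles and the conventional 1.5 × IQR fences, which match seaborn's default whisker setting (`whis=1.5`), on a made-up list of ages.

```python
import pandas as pd

# Made-up ages; 75 is included to produce one outlier.
ages = pd.Series([22, 25, 28, 30, 31, 33, 35, 38, 40, 75])

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Points beyond these fences are drawn as individual outlier markers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print(q1, q3, iqr)
print(outliers.tolist())
```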

In [None]:
# INTENTIONALLY LEFT BLANK

In [None]:
### Ex-2-Task-2
# Plot a bar chart showing the count of each education level.
# Set title: "Education Level Count"
fig, ax = plt.subplots(figsize=(8, 5))
### BEGIN SOLUTION
# YOUR CODE HERE
sns.countplot(data=df, x="Education_Level", ax=ax)
ax.set_title("Education Level Count")
ax.set_xlabel("Education Level")
ax.set_ylabel("Count")
ax.tick_params(axis='x', rotation=45)
### END SOLUTION
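
Bars are often easier to compare when sorted by height. One possible approach, sketched here with made-up education labels (the actual category names in the dataset may differ), is to take the order from `value_counts()` and pass it to the `order` parameter of `sns.countplot`.

```python
import pandas as pd

# Made-up labels for illustration only.
education = pd.Series(
    ["Bachelor", "Master", "Bachelor", "PhD", "Master", "Bachelor"],
    name="Education_Level",
)

counts = education.value_counts()   # sorted from most to least frequent
bar_order = counts.index.tolist()
print(bar_order)

# In the notebook this could be used as:
# sns.countplot(data=df, x="Education_Level", order=bar_order, ax=ax)
```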

In [None]:
# INTENTIONALLY LEFT BLANK

### BIVARIATE ANALYSIS TASKS

In [None]:
### Ex-3-Task-1
# Create a scatterplot of Experience vs Salary, colored by Gender.
fig, ax = plt.subplots(figsize=(8, 5))
### BEGIN SOLUTION
# YOUR CODE HERE
sns.scatterplot(data=df, x="Experience", y="Salary", hue="Gender", ax=ax)
ax.set_title("Experience vs Salary by Gender")
ax.set_xlabel("Experience")
ax.set_ylabel("Salary")
### END SOLUTION

In [None]:
# INTENTIONALLY LEFT BLANK

In [None]:
### Ex-3-Task-2
# Create a boxplot showing Salary distribution across Departments.
fig, ax = plt.subplots(figsize=(10, 6))
### BEGIN SOLUTION
# YOUR CODE HERE
sns.boxplot(data=df, x="Department", y="Salary", ax=ax)
ax.set_title("Salary Distribution Across Departments")
ax.set_xlabel("Department")
ax.set_ylabel("Salary")
ax.tick_params(axis='x', rotation=45)
### END SOLUTION

In [None]:
# INTENTIONALLY LEFT BLANK

In [None]:
### Ex-3-Task-3
# Show correlation between numerical variables using a heatmap.
fig, ax = plt.subplots(figsize=(10, 6))
# Use seaborn heatmap
### BEGIN SOLUTION
# YOUR CODE HERE
corr = df.select_dtypes(include=['number']).corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f", ax=ax)
ax.set_title("Correlation Heatmap of Numerical Variables")
### END SOLUTION
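
A heatmap shows every pair twice, once on each side of the diagonal. To read it alongside a ranked list, the sketch below keeps only the upper triangle of the correlation matrix and sorts the pairs by absolute correlation. The rows are made up; in the notebook, `corr` would come from `df` as in the cell above.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Experience": [2, 6, 11, 15, 21],
    "Salary": [40000, 52000, 50000, 70000, 65000],
    "Department": ["HR", "HR", "Finance", "Finance", "HR"],
})

corr = sample.select_dtypes(include=["number"]).corr()

# True strictly above the diagonal, so each pair appears once
# and self-correlations are excluded.
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

# Strongest relationships first, regardless of sign.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```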

In [None]:
# INTENTIONALLY LEFT BLANK

In [None]:
### Ex-4-Task-1
# Use seaborn (declarative) to plot the count of each marital status.
fig, ax = plt.subplots(figsize=(10, 6))
### BEGIN SOLUTION
# YOUR CODE HERE
sns.countplot(data=df, x="Marital_Status", ax=ax)
ax.set_title("Count of Each Marital Status")
ax.set_xlabel("Marital Status")
ax.set_ylabel("Count")
### END SOLUTION

In [None]:
# INTENTIONALLY LEFT BLANK