# Introduction

This assessment aims to evaluate your understanding and application of the concepts covered in the Data Analytics course. You will be tasked with analyzing a dataset related to remote work and mental health, utilizing various data manipulation, statistical analysis, and visualization techniques learned throughout the course. This exercise will help reinforce your skills in data handling, exploratory analysis, and drawing meaningful insights from data.

### Submission Details:

The deadline for submission is 16 November at 11:59 PM. Specific submission details will be shared with you shortly.

### Passing Criteria:

To successfully pass this assessment, you must achieve a score of 80% or higher.
We encourage you to engage with the material and demonstrate your analytical skills. Good luck!


---



# Section 1 - Beginner (25%)


## Shopping Cart System with Discounts

Write a Python program to simulate a shopping cart system for an online store. The program will calculate the total cost of items, apply discounts, and check if the total exceeds a specified budget.

1. Variables and Lists:
  - Define a `budget` variable with an initial value of 200.
  
  - Create two empty lists called `item_names_list` and `item_prices_list` to store the name and price of each item separately.

In [None]:
# Write your code here

budget = 200
item_names_list = []
item_prices_list = []


2. Functions:
  - Write a function `add_item_to_cart(item_name, item_price)` that takes the item’s name and price as arguments, appends the name to `item_names_list` and the price to `item_prices_list`, and returns both updated lists.
  
  - Write a function `calculate_total(item_prices)` that calculates and returns the total cost of all items in `item_prices`.

    Conditions:
    - If the total cost exceeds the budget after adding an item, print "Budget exceeded!" and stop adding more items.
    - If the total cost is within budget and exceeds $100, apply a 10% discount on the total and print the discounted total.
  





In [None]:
# Write your code here

def add_item_to_cart(item_name, item_price):
  item_names_list.append(item_name)
  item_prices_list.append(item_price)
  return item_names_list, item_prices_list



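def calculate_total(item_prices):
  """Return (total, discount) for the prices in the cart."""
  total = sum(item_prices)

  # Over budget: report it and apply no discount
  if total > budget:
    print("Budget exceeded!")
    return total, 0

  # Within budget (total <= budget) and above $100: apply a 10% discount
  discount = 0
  if total > 100:
    discount = total * 0.1
    total -= discount
    print(f"Discounted total: {total:.2f}")

  return total, discount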
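To check the discount rule at its boundaries without touching the global cart, the conditions can be restated as a pure function. This is a sketch, not the required answer: `apply_discount` and its `threshold` and `rate` parameters are hypothetical names introduced here, and treating a total exactly equal to the budget as "within budget" is an assumption.

```python
# Hypothetical helper: a pure version of the discount rule. The budget,
# threshold and rate are parameters (defaults taken from the task) so the
# rule can be checked without any global state.
def apply_discount(total, budget=200, threshold=100, rate=0.10):
    """Return (final_total, discount, over_budget) for a cart total."""
    if total > budget:
        # Over budget: no discount; the caller should stop adding items
        return total, 0.0, True
    if total > threshold:
        # Within budget and above the threshold: take the percentage off
        discount = round(total * rate, 2)
        return round(total - discount, 2), discount, False
    return total, 0.0, False


for cart_total in (80, 150, 200, 250):
    print(cart_total, apply_discount(cart_total))
```

Under this reading, a total of exactly 200 still receives the discount, while 250 is flagged as over budget and left undiscounted.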
  
  

3. Loop and Input:
  - Start accepting input only once the user types 'start'.
  - Use a loop to allow the user to add items to the cart by entering an item name and price. The loop should stop when the user types 'done'.
  - For each item, add it to `item_names_list` and `item_prices_list` using `add_item_to_cart`, then update the total cost using `calculate_total`.

Output:
  - After the loop ends, display the final cart with each item and its price, the initial total, any applicable discount, and the final total.




In [None]:
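# Write your code here

# Wait until the user types 'start' before accepting any items
while input("Enter 'start' to start: ").strip().lower() != "start":
  print("Please type 'start' to begin.")

initial_total, final_total, discount = 0, 0, 0
while True:
  item_name = input("Enter item name (or 'done' to finish): ").strip()
  if item_name.lower() == "done":
    break
  item_price = float(input("Enter item price: "))

  # Add the item, then recompute the undiscounted and discounted totals
  add_item_to_cart(item_name, item_price)
  initial_total = sum(item_prices_list)
  final_total, discount = calculate_total(item_prices_list)

  # calculate_total has already printed "Budget exceeded!"; stop adding items
  if initial_total > budget:
    break

# Final cart summary
for name, price in zip(item_names_list, item_prices_list):
  print(f"{name}: ${price:.2f}")
print(f"Initial total: ${initial_total:.2f}")
print(f"Discount: ${discount:.2f}")
print(f"Final total: ${final_total:.2f}")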
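Because `input()` makes the loop hard to test, one option is to replay a scripted list of entries instead. The sketch below is hypothetical (`run_cart` and the sample items are invented). Like the loop above, it keeps the item that pushes the cart over budget before stopping.

```python
# Hypothetical non-interactive driver: replays (name, price) pairs instead of
# calling input(), so the cart logic can be exercised in a test.
def run_cart(entries, budget=200, threshold=100, rate=0.10):
    """Stop at 'done' or once the budget is exceeded; return the cart summary."""
    names, prices = [], []
    for name, price in entries:
        if name.lower() == "done":
            break
        names.append(name)
        prices.append(price)
        if sum(prices) > budget:
            print("Budget exceeded!")
            break

    initial_total = sum(prices)
    discount = 0.0
    if threshold < initial_total <= budget:
        # Within budget and above the threshold: apply the discount
        discount = round(initial_total * rate, 2)
    final_total = round(initial_total - discount, 2)
    return names, prices, initial_total, discount, final_total


cart = run_cart([("Book", 40.0), ("Headphones", 80.0), ("done", 0.0), ("Lamp", 30.0)])
for name, price in zip(cart[0], cart[1]):
    print(f"{name}: ${price:.2f}")
print("Initial total: {:.2f}, discount: {:.2f}, final total: {:.2f}".format(*cart[2:]))
```

Here "Lamp" is never added because "done" arrives first.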


# Section 2 - Intermediate (55%) - Remote Work and Mental Health Analysis

Dataset source: Kaggle (https://www.kaggle.com/datasets/waqi786/remote-work-and-mental-health)




## Objective:
- In the following sections, you will explore the "Remote Work and Mental Health" dataset using Python and different data science libraries such as Pandas, NumPy and Matplotlib.
- Follow the instructions below to complete each task. Please provide code for each question and any observations as comments when necessary.


In [None]:
# Import necessary modules and libraries

import pandas as pd
import matplotlib.pyplot as plt

## 1. Load Dataset (2 marks)
- Instructions: Load the dataset using Pandas and display the first few rows.
- Question: Describe the overall structure (rows, columns, data types) as a comment at the end of your code.



In [None]:
# Write code here
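df = pd.read_csv("dataset.csv")

# Preview the first five rows
print("First few rows of the remote work dataframe:\n")
print(df.head())

# info() prints the row count, column names, non-null counts and dtypes itself (it returns None)
df.info()

# Overall structure: 5000 rows x 20 columns; data types are integer (int64) and object (text).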
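The structural question can also be answered programmatically. Below is a minimal sketch on a hypothetical three-row stand-in: the column names come from this notebook, but the values are invented. `DataFrame.shape` gives (rows, columns), and `dtypes.value_counts()` tallies how many columns share each data type. On the real data, apply the same calls to `df`.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: column names come from the notebook,
# the values are invented for illustration only
sample = pd.DataFrame({
    "Age": [32, 40, 29],
    "Job_Role": ["HR", "Sales", "Designer"],
    "Industry": ["IT", "Finance", "Healthcare"],
    "Hours_Worked_Per_Week": [47, 52, 35],
})

rows, cols = sample.shape                    # (number of rows, number of columns)
dtype_counts = sample.dtypes.value_counts()  # number of columns per dtype
print(f"{rows} rows x {cols} columns")
print(dtype_counts)
```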



## 2. Display 'n' Rows (3 marks)
- Instructions: Display the first 13 rows of the dataset.

In [None]:
# Write code here

print(df.head(13))

- Instructions: Display the last 7 rows of the dataset.

In [None]:
# Write code here

print(df.tail(7))

## 3. Find the Number of Null Values in the Dataset (2 marks)

In [None]:
# Write code here

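# Null values per column (sum must be called with parentheses)
print(df.isna().sum())

# Total number of null values in the whole dataset
print(df.isna().sum().sum())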

## 4. Statistical Summary for Numeric Columns (10 marks)
Instructions: Use individual commands to find the statistical summary.

- Count

In [None]:
# Write code here
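# Non-null count for each numeric column
column_counts = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                    "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                    "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].count()
print(column_counts)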
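Instead of typing the numeric column names out, `select_dtypes(include="number")` keeps only integer and float columns. The sketch below uses hypothetical sample values. A missing value in an integer column turns it into `float64`, which `select_dtypes` still treats as numeric.

```python
import pandas as pd

# Invented values; column names come from the notebook
sample = pd.DataFrame({
    "Age": [32, None, 29],          # the None makes this column float64 with NaN
    "Job_Role": ["HR", "Sales", "Designer"],
    "Work_Life_Balance_Rating": [2, 4, 5],
    "Industry": ["IT", "Finance", "Healthcare"],
})

numeric = sample.select_dtypes(include="number")  # keeps only int/float columns
counts = numeric.count()                          # non-null values per numeric column
print(list(numeric.columns))
print(counts)
```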

- Mean

In [None]:
# Write code here
column_means = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                   "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                   "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].mean()
print(column_means)

- Standard Deviation

In [None]:
# Write code here

column_std = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                   "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                   "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].std()
print(column_std)


- Quartiles

In [None]:
# Write code here
specific_q1 = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                   "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                   "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].quantile(0.25)
specific_q2 = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                   "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                   "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].quantile(0.50)
specific_q3 = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                   "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                   "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].quantile(0.75)

print(specific_q1)
print(specific_q2)
print(specific_q3)
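The three quartile calls can be collapsed into one by passing a list to `quantile`. `describe()` returns count, mean, standard deviation, min, quartiles and max together. The sketch below uses a hypothetical five-value series whose quartiles are easy to verify by hand. Pandas uses linear interpolation by default.

```python
import pandas as pd

# Hypothetical column with an easy-to-check answer
hours = pd.Series([20, 30, 40, 50, 60], name="Hours_Worked_Per_Week")

# One call returns Q1, the median and Q3 (linear interpolation by default)
quartiles = hours.quantile([0.25, 0.50, 0.75])
print(quartiles)

# describe() bundles count, mean, std, min, the quartiles and max
summary = hours.describe()
print(summary)
```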

## 5. Calculate Extrema (2 marks)

In [None]:
# Write code here

specific_min = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                   "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                   "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].min()
specific_max = df[["Age", "Years_of_Experience", "Hours_Worked_Per_Week", 
                   "Number_of_Virtual_Meetings", "Work_Life_Balance_Rating", 
                   "Social_Isolation_Rating", "Company_Support_for_Remote_Work"]].max()

print(specific_max)
print(specific_min)


## 6. Find Unique Values in a Categorical Column (3 marks)

- Instructions: Identify the unique values in the `Job_Role` column (2 marks)
- Question: How many unique roles are represented in the dataset? (1 mark)

In [None]:
# Write code here

unique_roles = df["Job_Role"].unique()
print(unique_roles)

num_unique_roles = df["Job_Role"].nunique()
print(num_unique_roles)

# 7 unique roles are represented in the dataset

## 7. Group Data and Calculate Mean (4 marks)
- Instructions: Group the dataset by `Job_Role` and calculate the mean of the `Work_Life_Balance_Rating` for each role.
- Question: Which job role has the highest average mental health index?

In [None]:
# Write code here
grouped_data = df.groupby('Job_Role')['Work_Life_Balance_Rating'].mean()

print(grouped_data)

# Sales has the highest average mental health index
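To read the answer off programmatically rather than by eye, `idxmax()` and `idxmin()` return the index label of the largest and smallest group mean. The ratings below are invented for illustration.

```python
import pandas as pd

# Invented ratings for three roles
sample = pd.DataFrame({
    "Job_Role": ["HR", "HR", "Sales", "Sales", "Designer", "Designer"],
    "Work_Life_Balance_Rating": [2, 3, 4, 5, 3, 3],
})

role_means = sample.groupby("Job_Role")["Work_Life_Balance_Rating"].mean()
best_role = role_means.idxmax()   # label of the largest mean
worst_role = role_means.idxmin()  # label of the smallest mean
print(role_means.sort_values(ascending=False))
print("Highest:", best_role, "| Lowest:", worst_role)
```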


## 8. Filter Data Based on Condition (4 marks)
- Instructions: Filter the dataset to show only rows where `Hours_Worked_Per_Week` is greater than 40.
- Question: How many employees are working overtime?

In [None]:
# Write code here
overtime_employees = df[df["Hours_Worked_Per_Week"] > 40]
print(overtime_employees)

print(len(overtime_employees))

# 2384 employees are working overtime
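Because `True` counts as 1, summing the boolean mask gives the overtime count directly, and its mean gives the share. The sketch below uses hypothetical hours. The strict `> 40` matches the task, so an employee working exactly 40 hours is not counted as overtime.

```python
import pandas as pd

sample = pd.DataFrame({"Hours_Worked_Per_Week": [38, 40, 41, 55, 60]})  # invented values

is_overtime = sample["Hours_Worked_Per_Week"] > 40  # boolean mask; exactly 40 is not overtime
overtime_count = int(is_overtime.sum())             # each True counts as 1
overtime_share = is_overtime.mean() * 100           # percentage of all employees
print(f"{overtime_count} employees ({overtime_share:.1f}%) work more than 40 hours")
```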

## 9. Histogram of Work Hours per Week (5 marks)
- Instructions: Create a histogram of `Hours_Worked_Per_Week` (4 marks).
- Question: Describe the distribution of work hours. Are most employees working around a certain number of hours per week? (1 mark)

In [None]:
# Write code here

plt.figure(figsize=(10, 6))

plt.hist(df["Hours_Worked_Per_Week"], bins=10, color='skyblue', edgecolor='black')

plt.xlabel("Hours Worked Per Week")
plt.ylabel("Number of Employees")
plt.title("Distribution of Weekly Work Hours")
plt.show()

#Most employees work around 20-60 hours per week.
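A histogram is easier to describe when the bin counts are in hand. Matplotlib's `hist` bins its data with `numpy.histogram`, so the same counts can be printed directly. The hours and the 10-hour bin edges below are hypothetical.

```python
import numpy as np

hours = np.array([22, 25, 38, 41, 44, 47, 52, 59])  # invented weekly hours
edges = [20, 30, 40, 50, 60]                        # hypothetical 10-hour bins

# Bins are half-open [left, right), except the last one, which includes 60
counts, bin_edges = np.histogram(hours, bins=edges)
for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left}-{right} hours: {n} employees")

i = int(np.argmax(counts))
busiest = (int(bin_edges[i]), int(bin_edges[i + 1]))
print("Busiest range:", busiest, "| median:", np.median(hours))
```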

## 10. Scatter Plot of Work Hours vs. Years of Experience (4 marks)
- Instructions: Create a scatter plot with `Hours_Worked_Per_Week` on the x-axis and `Years_of_Experience` on the y-axis.

In [None]:

plt.figure(figsize=(15, 6))
plt.scatter(df['Hours_Worked_Per_Week'], df['Years_of_Experience'], alpha=0.6, color='skyblue', edgecolors='black')
plt.title('Scatter Plot of Work Hours vs. Years of Experience')
plt.xlabel('Hours Worked Per Week')
plt.ylabel('Years of Experience')
plt.show()



## 11. Bar Chart of Average Mental Health Index by Job Role (5 marks)
- Instructions: Create a bar chart showing the average `Work_Life_Balance_Rating` for each `Job_Role` (4 marks).
- Question: Which job roles have the highest and lowest average mental health index? (1 mark)

In [None]:
# Write code here
average_balance_by_role = df.groupby('Job_Role')['Work_Life_Balance_Rating'].mean()


average_balance_by_role.plot(kind='bar', color='skyblue', edgecolor='black')
plt.title('Average Work Life Balance Rating by Job Role')
plt.xlabel('Job Role')
plt.ylabel('Average Work Life Balance Rating')
plt.xticks(rotation=45)
plt.show()

# Highest: Sales; lowest: HR

## 12. Pie Chart of Access to Mental Health Resources (5 marks)
- Instructions: Use a pie chart to show the proportion of `Access_to_Mental_Health_Resources` (Yes and No) in the dataset (4 marks).
- Question: What percentage of employees have access to mental health resources? (1 mark)

In [None]:
# Write code here

access_counts = df['Access_to_Mental_Health_Resources'].value_counts()

# Draw the pie chart
plt.figure(figsize=(7, 7))
plt.pie(access_counts, labels=access_counts.index, autopct='%1.1f%%', colors=['skyblue', 'lightcoral'], startangle=90, wedgeprops={'edgecolor': 'black'})
plt.title('Proportion of Employees with Access to Mental Health Resources')
plt.show()

#51.1% of employees have access to mental health resources
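The percentage can be computed exactly instead of being read from the chart: `value_counts(normalize=True)` returns proportions that sum to 1. The sketch below uses a hypothetical five-answer sample.

```python
import pandas as pd

# Hypothetical answers; the real column holds "Yes"/"No" strings
access = pd.Series(["Yes", "No", "Yes", "No", "Yes"], name="Access_to_Mental_Health_Resources")

share = access.value_counts(normalize=True) * 100  # proportions -> percentages
print(share.round(1))
```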

## 13. Scatter Plot of Age vs. Social Isolation Rating (6 marks)
- Instructions: Create a scatter plot with `Age` on the x-axis and `Social_Isolation_Rating` on the y-axis (4 marks).
- Question: Do you observe any trends or relationships between age and social isolation? Is there a noticeable impact of age on isolation? (2 marks)

In [None]:
# Write code here

plt.figure(figsize=(8, 6))
plt.scatter(df['Age'], df['Social_Isolation_Rating'], alpha=0.6, color='skyblue', edgecolors='black')
plt.title('Scatter Plot of Age vs Social Isolation Rating')
plt.xlabel('Age')
plt.ylabel('Social Isolation Rating')
plt.show()

#
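A correlation coefficient puts a number behind the visual impression from the scatter plot. `Series.corr` defaults to Pearson's r. Because `Social_Isolation_Rating` is an ordinal rating, a rank-based Spearman coefficient may be the more appropriate choice; it is Pearson's r computed on ranks. The sample values below are invented. A coefficient close to 0 would suggest no clear monotonic or linear relationship between age and isolation.

```python
import pandas as pd

# Invented values using the notebook's column names
sample = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Social_Isolation_Rating": [3, 1, 4, 2, 3],
})

# Pearson's r (the default method of Series.corr)
pearson_r = sample["Age"].corr(sample["Social_Isolation_Rating"])

# Spearman's rho: Pearson's r on the ranks (ties get their average rank)
spearman_rho = sample["Age"].rank().corr(sample["Social_Isolation_Rating"].rank())

print(f"Pearson r = {pearson_r:.3f}, Spearman rho = {spearman_rho:.3f}")
```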

# Section 3 - Long Answer/Advanced (20%)



## Job Role and Workload Level Impact on Mental Health

Instructions: Investigate the influence of job roles and workload level on the mental health index.
- Create a new column `workload_level` that labels each entry as "High" if the `Hours_Worked_Per_Week` is above its mean, otherwise "Low." (5 marks)
- Group the dataset by `Industry` and calculate the average `Hours_Worked_Per_Week` for each industry. (5 marks)
- Use a bar chart to display the average `Stress_Level` for each job role, with separate bars for high and low workload levels. (5 marks)
- Analyze the results: Which job roles and workload levels appear to have the greatest impact on mental health? (5 marks)





In [None]:
# Write code here

# 1. Label each employee's workload relative to the mean weekly hours
mean_hours = df['Hours_Worked_Per_Week'].mean()
df['workload_level'] = df['Hours_Worked_Per_Week'].apply(lambda x: 'High' if x > mean_hours else 'Low')

# 2. Average weekly hours per industry
industry_workload = df.groupby('Industry')['Hours_Worked_Per_Week'].mean()
print("Average Hours Worked Per Week by Industry:")
print(industry_workload)

# 3. Convert the text stress levels to scores in a new column, so re-running
#    this cell does not overwrite Stress_Level with NaN
stress_map = {'Low': 1, 'Medium': 2, 'High': 3}
df['Stress_Score'] = df['Stress_Level'].map(stress_map)

# Rows: job roles; columns: workload levels (sorted alphabetically: 'High', 'Low')
role_stress_level = df.groupby(['Job_Role', 'workload_level'])['Stress_Score'].mean().unstack()

role_stress_level.plot(kind='bar', figsize=(10, 6), color=['skyblue', 'salmon'], edgecolor='black')
plt.title('Average Stress Level by Job Role and Workload Level')
plt.xlabel('Job Role')
plt.ylabel('Average Stress Level (1 = Low, 3 = High)')
plt.xticks(rotation=45)
# Let pandas' column names label the legend so each label matches its bars
plt.legend(title='Workload Level')
plt.show()



# Data Scientists with a low workload show a strong effect on mental health, which may be related to the complexity of the role.

# Roles with long working hours, such as Project Manager and Marketing, tend to have higher stress levels.

# Overall, the Data Scientist and Project Manager roles appear to have the greatest impact on mental health.
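Grouped bar charts like this one have a common pitfall: `unstack()` orders the new columns alphabetically. The bars are therefore drawn as `High` then `Low`, and any hand-written `labels=[...]` list passed to `plt.legend` must follow that order. The sketch below uses invented rows. It prints the column order and replaces the `apply`/lambda labelling with `numpy.where`. `Stress_Score` is a hypothetical column name, and `reindex(columns=["Low", "High"])` is one way to force a different order.

```python
import numpy as np
import pandas as pd

# Invented rows using the notebook's column names
sample = pd.DataFrame({
    "Job_Role": ["HR", "HR", "Sales", "Sales"],
    "Hours_Worked_Per_Week": [30, 50, 35, 55],
    "Stress_Level": ["Low", "High", "Medium", "High"],
})

# Vectorised alternative to apply(lambda ...): 'High' above the mean, else 'Low'
mean_hours = sample["Hours_Worked_Per_Week"].mean()
sample["workload_level"] = np.where(sample["Hours_Worked_Per_Week"] > mean_hours, "High", "Low")

# Hypothetical numeric column, leaving the original text column intact
sample["Stress_Score"] = sample["Stress_Level"].map({"Low": 1, "Medium": 2, "High": 3})

table = sample.groupby(["Job_Role", "workload_level"])["Stress_Score"].mean().unstack()
print(list(table.columns))  # alphabetical: High before Low

# Reorder explicitly if Low should be drawn first
ordered = table.reindex(columns=["Low", "High"])
print(ordered)
```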
