# Lab Instructions

Choose your own adventure! In this lab, you will select a dataset, identify the target feature, and determine what relationships are present between the target and the other features in the data.

The dataset should have at least 5 features plus the target and at least a few hundred rows.  If the original dataset has more than 5 features, you may select the 5 that seem most interesting for this project. The subject can be anything you choose.  

For your lab submission, describe the dataset and the features - including all of the values of the features - and identify the target feature.  Then make visualizations to show the relationship of each feature to the target.  Which feature(s) seem most related?  Which features don't seem to influence the value of the target?  Draw at least one big picture conclusion about your data from the visualizations you've created.


***
Paul Sachse
***

I want to understand how academic stress varies among students and which factors relate to it. The target feature is the Stress Index, a self-reported rating from 1 (low) to 5 (high). The other numeric features are 1–5 ratings for peer pressure, academic pressure from home, and academic competition. The categorical features are academic stage, study environment, daily bad habits, and coping strategy.

### Load Data

In [None]:
import pandas as pd
import plotly.express as px

# --- Load ---
csv_path = "academic Stress level.csv"
df = pd.read_csv(csv_path, encoding="utf-8")

# Rename the long survey questions to short identifiers.
# Keys must match the CSV headers exactly, including stray spaces.
df = df.rename(columns={
    "Your Academic Stage": "Academic_Stage",
    "Peer pressure": "Peer_Pressure",
    "Academic pressure from your home": "Home_Academic_Pressure",
    "Study Environment": "Study_Environment",
    "What coping strategy you use as a student?": "Coping_Strategy",
    "Do you have any bad habits like smoking, drinking on a daily basis?": "Daily_Bad_Habits",
    "What would you rate the academic  competition in your student life": "Academic_Competition_Rating",
    "Rate your academic stress index ": "Stress_Index"
})

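# Cast the categorical features to pandas' category dtype
cat_cols = ["Academic_Stage", "Study_Environment", "Coping_Strategy", "Daily_Bad_Habits"]
for c in cat_cols:
    df[c] = df[c].astype("category")

# Display order for each categorical axis; labels must match the data exactly
category_orders = {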
    "Academic_Stage": ["high school", "undergraduate", "post-graduate"],
    "Study_Environment": ["Peaceful", "disrupted", "Noisy"],  # 'disrupted' is lowercase in the data
    "Daily_Bad_Habits": ["No", "Yes", "prefer not to say"],
    "Coping_Strategy": [
        "Analyze the situation and handle it with intellect",
        "Social support (friends, family)",
        "Emotional breakdown (crying a lot)"
    ]
}
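
The lab brief asks for all of the values of each feature. One way to list them is sketched below. The helper name `list_feature_values` is hypothetical, and the small frame is made up for illustration rather than taken from the survey file.

```python
import pandas as pd

def list_feature_values(frame, columns):
    """Return {column: sorted list of distinct non-missing values}."""
    return {
        col: sorted(frame[col].dropna().unique().tolist(), key=str)
        for col in columns
    }

# Made-up rows for illustration only; not the survey data.
demo = pd.DataFrame({
    "Academic_Stage": ["undergraduate", "high school", "undergraduate", None],
    "Stress_Index": [4, 3, 5, 4],
})
values = list_feature_values(demo, ["Academic_Stage", "Stress_Index"])
for name, vals in values.items():
    print(f"{name}: {vals}")
```

On the notebook's data, the same helper could presumably be called as `list_feature_values(df, cat_cols)` after the cell above has run.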

***
# Visualization 1: Stress Index (Target Feature) Count

In [None]:
stress_counts = (df["Stress_Index"]
                 .value_counts()
                 .sort_index()
                 .reset_index())
stress_counts.columns = ["Stress_Index", "Count"]

fig = px.bar(
    stress_counts, x="Stress_Index", y="Count", text="Count",
    title="Stress Index (1–5) Distribution"
)
fig.update_layout(xaxis_title="Stress Index (1=Low, 5=High)", yaxis_title="Count")
fig.update_traces(textposition="outside")
fig.show()

#### Summary: Most students report high stress. The counts are 1: 6, 2: 9, 3: 36, 4: 56, 5: 33, so ratings of 4 and 5 account for 89 of 140 responses (about 64%). The mean Stress Index is approximately 3.72.
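
The figures in the summary can be checked directly from the reported counts. This is a minimal sketch that re-enters the counts by hand instead of reading the CSV; the variable names are my own.

```python
import pandas as pd

# Counts copied from the summary (Stress Index value -> number of students).
counts = pd.Series({1: 6, 2: 9, 3: 36, 4: 56, 5: 33}, name="Count")

total = int(counts.sum())
# Weighted mean: each rating multiplied by how many students chose it.
mean_stress = (counts.index.to_numpy() * counts.to_numpy()).sum() / total
# Share of students who rated their stress 4 or 5.
high_share = counts.loc[[4, 5]].sum() / total

print(f"n = {total}, mean = {mean_stress:.2f}, share rating 4 or 5 = {high_share:.1%}")
# prints: n = 140, mean = 3.72, share rating 4 or 5 = 63.6%
```

With the notebook's DataFrame loaded, `df["Stress_Index"].mean()` should give the same mean, provided no ratings are missing.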

***
# Visualization 2: Stress and Study Environment

In [None]:
fig = px.violin(
    df.dropna(subset=["Study_Environment"]),
    x="Study_Environment", y="Stress_Index",
    category_orders=category_orders, box=True, points="all",
    title="Stress Index by Study Environment"
)
fig.update_layout(xaxis_title="", yaxis_title="Stress Index (1–5)")
fig.show()

#### Summary: Stress differs by study environment. The violin plot shows stress tending to rise as the environment becomes less peaceful. Students in peaceful environments show a wider range of stress, including some lower scores, while scores in disrupted and especially noisy environments cluster toward the top of the scale. In short, noisier study environments are associated with higher and more consistent stress levels.
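
A per-group table of counts, means, and spreads can back up what the violin plot suggests visually. The sketch below uses a hypothetical helper, `summarize_by_group`, and a tiny made-up frame, so the printed numbers say nothing about the real survey.

```python
import pandas as pd

def summarize_by_group(frame, group_col, target_col="Stress_Index"):
    """Return count, mean, median, and standard deviation of target_col per group."""
    return (
        frame.dropna(subset=[group_col, target_col])
             .groupby(group_col, observed=True)[target_col]
             .agg(Count="size", Mean="mean", Median="median", Std="std")
             .round(2)
    )

# Made-up rows for illustration only; not the survey data.
demo = pd.DataFrame({
    "Study_Environment": ["Peaceful", "Peaceful", "Noisy", "Noisy", "disrupted", None],
    "Stress_Index": [2, 4, 5, 4, 3, 5],
})
summary = summarize_by_group(demo, "Study_Environment")
print(summary)
```

The same call with `"Academic_Stage"` or `"Coping_Strategy"` as the grouping column would give comparable tables for the other categorical features. A group with a single row has an undefined standard deviation, which pandas reports as `NaN`.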

***
# Visualization 3: Stress and Academic Stage

In [None]:
fig = px.box(
    df, x="Academic_Stage", y="Stress_Index",
    category_orders=category_orders, points="all",
    title="Stress Index by Academic Stage"
)
fig.update_layout(xaxis_title="", yaxis_title="Stress Index (1–5)")
fig.show()

#### Summary: Stress tends to be highest in high school and somewhat lower at later academic stages. Post-graduate students show the lowest and least variable stress levels, while high school students show the widest range. The differences are small, however: all three groups center on a stress index of about 4.

***
# Visualization 4: Stress and Peer Pressure

In [None]:
fig = px.scatter(
    df,
    x="Peer_Pressure", 
    y="Stress_Index",
    trendline="ols",  # ordinary least squares regression line; requires the statsmodels package
    opacity=0.8,
    title="Relationship Between Peer Pressure and Stress Index (Overall Trend)"
)
fig.update_layout(
    xaxis_title="Peer Pressure (1=Low, 5=High)",
    yaxis_title="Stress Index (1=Low, 5=High)"
)
fig.show()

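#### Summary: The trendline slopes slightly upward, meaning higher peer pressure is linked with somewhat higher stress. Because both ratings take only whole-number values from 1 to 5, many points overlap, so the trendline is a better guide to the relationship than the visible point cloud.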
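
To put a number on "slight upward trend", the slope and Pearson correlation can be computed directly. This is a minimal sketch using NumPy's least-squares line fit instead of the statsmodels fit that Plotly runs internally; `trend_stats` is a hypothetical helper and the ratings below are made up.

```python
import numpy as np

def trend_stats(x, y):
    """Return (slope, intercept, pearson_r) for a least-squares line of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 polynomial = straight line
    r = np.corrcoef(x, y)[0, 1]                 # off-diagonal entry is the x-y correlation
    return slope, intercept, r

# Made-up ratings for illustration only; not the survey data.
peer = [1, 2, 2, 3, 4, 5, 5]
stress = [2, 3, 2, 4, 4, 5, 4]
slope, intercept, r = trend_stats(peer, stress)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.2f}")
```

On the notebook's data, the inputs would be `df["Peer_Pressure"]` and `df["Stress_Index"]`, with rows containing missing values dropped first.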

***
# Visualization 5: Correlation Heatmap

In [None]:
num_cols = ["Peer_Pressure","Home_Academic_Pressure","Academic_Competition_Rating","Stress_Index"]
corr = df[num_cols].corr()

fig = px.imshow(
    corr, text_auto=True, aspect="auto",
    title="Correlation Matrix: Peer, Home, Competition, Stress",
    zmin=-1, zmax=1
)
fig.update_layout(xaxis_title="", yaxis_title="")
fig.show()

#### Summary: The correlation matrix shows that stress is moderately related to all three forms of academic pressure: peer, home, and competition. Peer pressure shows the strongest correlation with stress. This suggests that social and academic expectations from others are noticeably, but not overwhelmingly, associated with how stressed students feel.
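
Reading the ranking off a heatmap by eye is easy to get wrong, so it can help to sort the target's correlations explicitly. The helper `rank_by_target_corr` below is hypothetical, and the frame is made up for illustration.

```python
import pandas as pd

def rank_by_target_corr(frame, target, method="pearson"):
    """Return each feature's correlation with target, strongest (by absolute value) first."""
    corr = frame.corr(method=method)
    return corr[target].drop(target).sort_values(key=abs, ascending=False)

# Made-up ratings for illustration only; not the survey data.
demo = pd.DataFrame({
    "Peer_Pressure": [1, 2, 3, 4, 5],
    "Home_Academic_Pressure": [3, 1, 4, 2, 5],
    "Stress_Index": [1, 2, 3, 4, 5],
})
ranked = rank_by_target_corr(demo, "Stress_Index")
print(ranked)
```

On the notebook's data the call would be `rank_by_target_corr(df[num_cols], "Stress_Index")`. Since the ratings are ordinal, passing `method="spearman"` (which `DataFrame.corr` also accepts) may be a reasonable alternative to the default Pearson coefficient.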

***
# Visualization 6: Stress and Coping Strategy

In [None]:
# Summarize counts and mean stress by coping strategy
coping_summary = (
    df.groupby("Coping_Strategy", observed=True)
      .agg(Count=("Stress_Index", "size"),
           Mean_Stress=("Stress_Index", "mean"))
      .reset_index()
)

fig = px.treemap(
    coping_summary,
    path=["Coping_Strategy"],
    values="Count",
    color="Mean_Stress",
    color_continuous_scale="RdBu_r",
    title="Coping Strategies: Frequency and Average Stress"
)
fig.update_layout(
    coloraxis_colorbar=dict(title="Avg Stress (1–5)")
)
fig.show()

#### Summary: The most common coping strategy among students is to analyze and handle problems rationally, and this group also shows the lowest average stress. Emotional coping, such as crying, is associated with the highest stress, while students who seek social support fall in between. In general, more problem-focused strategies are linked with lower stress levels.

***
# Big-Picture Conclusion:

#### Overall, the results show that stress levels among students are generally high, with most students rating their stress as a 4 or 5. Stress is higher in noisy or disrupted environments and with more peer pressure, home pressure, and academic competition. High school students report slightly higher stress than others, but stress is high across all stages. Coping style also appears to matter: students who handle problems analytically or rely on social support tend to report lower stress, while those who cope through emotional breakdowns report the highest levels.