<a href="https://colab.research.google.com/github/intani111/ProjectIbmGranite/blob/main/Project_IBM_Granite.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Installation First (Run in Google Colab)
!pip install pandas matplotlib seaborn langchain_community
!pip install replicate

Collecting langchain_community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.10.1-py3-none-any.whl.metadata (3.4 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain_community)
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 k

In [2]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from langchain_community.llms import Replicate
import os
from google.colab import userdata

In [3]:
# Set the API token
api_token = userdata.get('api_token')
os.environ["REPLICATE_API_TOKEN"] = api_token

In [4]:
# Model setup
model = "ibm-granite/granite-3.3-8b-instruct"
llm = Replicate(
    model=model,
    replicate_api_token=api_token,
)

In [5]:
# File Upload
from google.colab import files

uploaded = files.upload()

Saving Student Depression Dataset.csv to Student Depression Dataset.csv


In [6]:
# Read Data
data = pd.read_csv('Student Depression Dataset.csv', delimiter=';')

In [7]:
# Show Data
print("Data loaded successfully!")
print(data.head())

Data loaded successfully!
   id  Gender  Age           City Profession  Academic Pressure  \
0   2    Male   33  Visakhapatnam    Student                  5   
1   8  Female   24      Bangalore    Student                  2   
2  26    Male   31       Srinagar    Student                  3   
3  30  Female   28       Varanasi    Student                  3   
4  32  Female   25         Jaipur    Student                  4   

   Work Pressure  CGPA  Study Satisfaction  Job Satisfaction  \
0              0  8,97                   2                 0   
1              0   5,9                   5                 0   
2              0  7,03                   5                 0   
3              0  5,59                   2                 0   
4              0  8,13                   3                 0   

      Sleep Duration Dietary Habits   Degree  \
0          5-6 hours        Healthy  B,Pharm   
1          5-6 hours       Moderate      BSc   
2  Less than 5 hours        Healthy       

In [8]:
# Cleaning data
# Check Null Value in Column
null_values = data.isnull().sum()
# Shows Null Value for every column
print("Jumlah Nilai Null di Setiap Kolom:")
print(null_values)

Jumlah Nilai Null di Setiap Kolom:
id                                       0
Gender                                   0
Age                                      0
City                                     0
Profession                               0
Academic Pressure                        0
Work Pressure                            0
CGPA                                     0
Study Satisfaction                       0
Job Satisfaction                         0
Sleep Duration                           0
Dietary Habits                           0
Degree                                   0
Have you ever had suicidal thoughts ?    0
Work/Study Hours                         0
Financial Stress                         3
Family History of Mental Illness         0
Depression                               0
dtype: int64


In [9]:
# Change Null Value With Zero Value
data['Financial Stress'] = data['Financial Stress'].replace('', 0)
# Then, insert Null Value With Zero Value
data['Financial Stress'] = data['Financial Stress'].fillna(0)
# VerifiedNull Value in Column
null_values_after = data['Financial Stress'].isnull().sum()
print("Jumlah Nilai Null di Kolom 'Financial Stress' setelah pengisian:")
print(null_values_after)

Jumlah Nilai Null di Kolom 'Financial Stress' setelah pengisian:
0


In [12]:
# Load dataset
def analyze_category(data, category, target='Depression'):
    """Analyze a category's relationship with depression using IBM Granite"""
    stats = data.groupby(category)[target].agg(['mean', 'count', 'std'])

    analysis_prompt = f"""
    Analyze the relationship between {category} and {target} in this student depression dataset.

    Dataset Statistics:
    {stats}

    Provide comprehensive analysis covering:
    1. Key patterns and significant differences
    2. Psychological/social explanations for these patterns
    3. Comparisons to established research
    4. Specific intervention recommendations

    Use professional language suitable for mental health practitioners.
    """

    return stats, llm(analysis_prompt)

In [13]:
# 1. Demographic Analysis
print("\n=== DEMOGRAPHIC ANALYSIS ===")
age_stats, age_analysis = analyze_category(data, 'Age')
print(f"\nAge Statistics:\n{age_stats}")
print("\nAge Analysis:")
print(age_analysis)

gender_stats, gender_analysis = analyze_category(data, 'Gender')
print(f"\nGender Statistics:\n{gender_stats}")
print("\nGender Analysis:")
print(gender_analysis)


=== DEMOGRAPHIC ANALYSIS ===


  return stats, llm(analysis_prompt)



Age Statistics:
         mean  count       std
Age                           
18   0.766226   1587  0.423364
19   0.705128   1560  0.456131
20   0.705856   2237  0.455759
21   0.677289   1726  0.467649
22   0.604310   1160  0.489209
23   0.638906   1645  0.480464
24   0.668291   2258  0.470932
25   0.606502   1784  0.488663
26   0.574026   1155  0.494704
27   0.606703   1462  0.488649
28   0.613221   2133  0.487127
29   0.565641   1950  0.495800
30   0.413974   1145  0.492759
31   0.480729   1427  0.499804
32   0.519017   1262  0.499836
33   0.389857   1893  0.487847
34   0.273842   1468  0.446081
35   0.200000     10  0.421637
36   0.142857      7  0.377964
37   0.000000      2  0.000000
38   0.250000      8  0.462910
39   0.666667      3  0.577350
41   1.000000      1       NaN
42   0.500000      4  0.577350
43   0.500000      2  0.707107
44   0.000000      1       NaN
46   0.500000      2  0.707107
48   0.333333      3  0.577350
49   0.000000      1       NaN
51   0.000000      1  

In [14]:
# 2. Academic Factors Analysis
print("\n=== ACADEMIC FACTORS ===")
acad_stats, acad_analysis = analyze_category(data, 'Academic Pressure')
print(f"\nAcademic Pressure Statistics:\n{acad_stats}")
print("\nAcademic Pressure Analysis:")
print(acad_analysis)

cgpa_stats, cgpa_analysis = analyze_category(data, 'CGPA')
print(f"\nCGPA Statistics:\n{cgpa_stats}")
print("\nCGPA Analysis:")
print(cgpa_analysis)


=== ACADEMIC FACTORS ===

Academic Pressure Statistics:
                       mean  count       std
Academic Pressure                           
0                  0.444444      9  0.527046
1                  0.194126   4801  0.395568
2                  0.374820   4178  0.484134
3                  0.601581   7462  0.489605
4                  0.761397   5155  0.426271
5                  0.860864   6296  0.346116

Academic Pressure Analysis:
**Comprehensive Analysis of Academic Pressure and Depression Relationship**

**1. Key Patterns and Significant Differences**

The dataset reveals a distribution of academic pressure levels among students, with the majority (approximately 80%) reporting moderate to high levels of academic pressure. Specifically:

- **Low Academic Pressure (0):** 11% of students report low levels of academic pressure.
- **Moderate Academic Pressure (1):** 19% of students experience moderate pressure.
- **High Academic Pressure (2):** 37% of students report high level

In [15]:
# 3. Lifestyle Factors Analysis
print("\n=== LIFESTYLE FACTORS ===")
sleep_stats, sleep_analysis = analyze_category(data, 'Sleep Duration')
print(f"\nSleep Duration Statistics:\n{sleep_stats}")
print("\nSleep Analysis:")
print(sleep_analysis)

diet_stats, diet_analysis = analyze_category(data, 'Dietary Habits')
print(f"\nDietary Habits Statistics:\n{diet_stats}")
print("\nDietary Analysis:")
print(diet_analysis)


=== LIFESTYLE FACTORS ===

Sleep Duration Statistics:
                       mean  count       std
Sleep Duration                              
5-6 hours          0.568818   6183  0.495282
7-8 hours          0.595018   7346  0.490922
Less than 5 hours  0.645126   8310  0.478504
More than 8 hours  0.509265   6044  0.499956
Others             0.500000     18  0.514496

Sleep Analysis:
**Comprehensive Analysis of Sleep Duration and Depression in the Student Depression Dataset**

**1. Key Patterns and Significant Differences**

The dataset reveals distinct patterns in sleep duration among students with depression. Notably:

- Students reporting 7-8 hours of sleep per night (mean depression rate: 0.595018) have the lowest depression prevalence, suggesting an optimal sleep duration for mental well-being.
- Students sleeping less than 5 hours (mean depression rate: 0.645126) exhibit the highest depression rates, indicating a strong association between insufficient sleep and depressive sympto

In [16]:
# Clean column names to remove leading/trailing whitespace for Risk Factor
data.columns = data.columns.str.strip()

In [18]:
# 4. Risk Factors Analysis
print("\n=== RISK FACTORS ANALYSIS ===")
# Define risk factor columns
risk_factor_columns = ['Have you ever had suicidal thoughts?', 'Financial Stress', 'Family History of Mental Illness']
available_columns = [col for col in risk_factor_columns if col in data.columns]
for column in available_columns:
    # Get stats and analysis
    stats, analysis = analyze_category(data, column)

    print(f"\n{column.upper()} STATISTICS:")
    print(stats)

    print(f"\n{column.upper()} ANALYSIS:")
    print(analysis)

    # Additional specialized prompts for specific risk factors
    if column == 'Have you ever had suicidal thoughts?':
        suicide_prompt = """
        Focus specifically on:
        1. Prevalence of suicidal ideation in the student population
        2. Correlation strength between suicidal thoughts and depression scores
        3. Immediate intervention recommendations
        4. Warning signs to monitor for
        """
        suicide_specific = llm(analysis + suicide_prompt)
        print("\nSPECIALIZED SUICIDE RISK ASSESSMENT:")
        print(suicide_specific)

    elif column == 'Financial Stress':
        finance_prompt = """
        Address:
        1. Threshold levels where financial stress becomes clinically significant
        2. Interaction with academic performance
        3. Institutional support recommendations
        """
        finance_specific = llm(analysis + finance_prompt)
        print("\nSPECIALIZED FINANCIAL STRESS ANALYSIS:")
        print(finance_specific)


=== RISK FACTORS ANALYSIS ===

FINANCIAL STRESS STATISTICS:
                      mean  count       std
Financial Stress                           
0.0               0.333333      3  0.577350
1.0               0.318688   5121  0.466013
2.0               0.429757   5061  0.495090
3.0               0.589361   5226  0.491997
4.0               0.690909   5775  0.462159
5.0               0.812807   6715  0.390095

FINANCIAL STRESS ANALYSIS:
**Comprehensive Analysis of Financial Stress and Depression in Student Dataset**

**1. Key Patterns and Significant Differences**

The dataset reveals a clear, monotonically increasing relationship between levels of financial stress and the prevalence of depression among students. 

- At the lowest level of financial stress (0), depression is least prevalent, at approximately 33.33%.
- As financial stress increases to level 1, depression prevalence rises to about 31.87%.
- The trend continues with a substantial increase to 42.98% at financial stress lev

In [19]:
# 5. Comprehensive Correlation Analysis
print("\n=== COMPREHENSIVE ANALYSIS ===")
comprehensive_prompt = f"""
Analyze this complete student depression dataset containing:
- Demographic factors (Age, Gender)
- Academic factors (Academic Pressure, CGPA, Study Satisfaction)
- Lifestyle factors (Sleep Duration, Dietary Habits)
- Risk factors (Suicidal Thoughts, Financial Stress)
Dataset Statistics Summary:
{data.describe()}
Perform integrated analysis covering:
1. The 3 strongest predictors of depression
2. Interaction effects between different factors
3. Recommended priority interventions
4. Potential confounding variables
5. Suggestions for future research
"""
comprehensive_analysis = llm(comprehensive_prompt)
print("\nComprehensive Dataset Analysis:")
print(comprehensive_analysis)


=== COMPREHENSIVE ANALYSIS ===

Comprehensive Dataset Analysis:
### Integrated Analysis of Student Depression Dataset

#### 1. The 3 Strongest Predictors of Depression

To identify the strongest predictors of depression, we would typically use statistical modeling, such as logistic regression, to assess the impact of each factor while controlling for others. However, based on the summary statistics provided, we can make some preliminary observations:

- **Academic Pressure**: The mean value is 3.14, indicating a moderate level of academic pressure. Given that it's a significant life stressor for students, it's likely a strong predictor.
- **Financial Stress**: With a mean of 3.14, financial stress also appears substantial and is known to be a significant contributor to mental health issues.
- **Study Satisfaction**: A mean of 2.94 suggests low levels of satisfaction with studies, which can lead to increased stress and potentially depression.

To confirm these as the strongest predicto