To install a Python library from a notebook, run `!pip install` followed by the library name in a code cell. In Jupyter and Colab, `%pip install` also works and installs into the environment of the running kernel. For example, to install the `pandas` library, you would run:

In [None]:
!pip install pandas

You can also install multiple libraries at once:

In [None]:
!pip install numpy matplotlib

**Do students with the same total study hours behave differently based on consistency?**

In [None]:
import pandas as pd
import numpy as np

df = pd.read_excel("/content/Course_Completion_Cleaned.xlsx")

list(df.columns)


In [None]:
df.columns = (
    df.columns
    .str.strip()        # remove extra spaces
    .str.lower()        # convert to lowercase
    .str.replace(' ', '_')  # replace spaces with underscore
)


In [None]:
df.columns

In [None]:
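# Average hours per login; learners with no logins get 0
# so the result never contains a division by zero.
df['engagement_consistency'] = np.where(
    df['login_frequency'] > 0,
    df['time_spent_hours'] / df['login_frequency'],
    0
)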

In [None]:
df[['engagement_consistency']].describe()
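
To get closer to the question in the heading, one option is to hold total study hours fixed and compare completion between learners who spread those hours over many logins and learners who use few. The sketch below is a minimal illustration on made-up data: the `toy` frame, the `hours_per_login` and `session_style` columns, and the median split are all assumptions, not part of the course dataset.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; not taken from the course dataset.
toy = pd.DataFrame({
    "time_spent_hours": [10.0, 10.0, 10.0, 10.0, 2.0, 2.0],
    "login_frequency":  [2, 20, 0, 10, 1, 4],
    "completed":        [False, True, False, True, False, True],
})

# Hours per login; zero-login rows become NaN so they are not
# mistaken for learners with very short sessions.
logins = toy["login_frequency"].where(toy["login_frequency"] > 0)
toy["hours_per_login"] = toy["time_spent_hours"] / logins

# Within each total-hours group, split learners at the group's median
# hours-per-login value.
group_median = (
    toy.groupby("time_spent_hours")["hours_per_login"].transform("median")
)
toy["session_style"] = np.where(
    toy["hours_per_login"] <= group_median, "short_sessions", "long_sessions"
)

# Completion rate per (total hours, session style), ignoring zero-login rows.
summary = (
    toy.dropna(subset=["hours_per_login"])
    .groupby(["time_spent_hours", "session_style"])["completed"]
    .mean()
)
print(summary)
```

On real data, exact ties in total hours are rare, so it would probably make sense to band `time_spent_hours` first (for example with `pd.qcut`) and group by the band instead.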


**Do students reduce activity gradually before dropping out?**

In [None]:
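# Open-ended top bin: using df['time_spent_hours'].max() as the last edge
# fails when the maximum is 5 or less, because bin edges must increase.
bins = [0, 1, 5, np.inf]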
labels = ['Low', 'Medium', 'High']

df['engagement_level'] = pd.cut(
    df['time_spent_hours'],
    bins=bins,
    labels=labels,
    include_lowest=True
)


In [None]:
df['time_spent_hours'].describe()


In [None]:
df['time_spent_hours'].value_counts().head()


In [None]:
df['engagement_level'].value_counts()
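
A single total of hours per learner cannot show a gradual decline over time, but comparing completion across engagement levels gives a rough proxy. The following is a minimal sketch on made-up data; the `toy` frame and the `level_summary` name are assumptions for illustration, and the bin edges simply mirror the ones used above with an open-ended top bin.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; not taken from the course dataset.
toy = pd.DataFrame({
    "time_spent_hours": [0.0, 0.5, 2.0, 4.5, 6.0, 12.0],
    "completed":        [False, False, True, False, True, True],
})

# Same Low / Medium / High edges as above, with an open-ended top bin.
toy["engagement_level"] = pd.cut(
    toy["time_spent_hours"],
    bins=[0, 1, 5, np.inf],
    labels=["Low", "Medium", "High"],
    include_lowest=True,
)

# Number of learners and share who completed, per engagement level.
level_summary = (
    toy.groupby("engagement_level", observed=False)["completed"]
    .agg(learners="size", completion_rate="mean")
)
print(level_summary)
```

Answering the question properly would need time-stamped activity (for example weekly hours per learner), which is not among the columns used in this notebook.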


**Which courses appear active but have poor completion?**

In [None]:
df.groupby('course_id')



In [None]:
if 'course_name' in df.columns:
    print("Course column exists")
else:
    print("Course column not found")


In [None]:
# A bare groupby only displays the GroupBy object; size() shows rows per course.
df.groupby('course_name').size()

In [None]:
course_metrics = (
    df.groupby('course_name')
    .agg(
        avg_study_hours=('time_spent_hours', 'mean'),
        # Percentage of rows where `completed` is True.
        completion_rate=('completed',
                          lambda x: x.eq(True).mean() * 100)
    )
    .reset_index()
)

course_metrics
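
One way to turn a per-course table like this into a direct answer is to flag courses whose average study hours are above the median while their completion rate is below it. The sketch below uses a made-up table with the same column names; the course names, the numbers, and the choice of the median as the cut-off are assumptions for illustration.

```python
import pandas as pd

# Toy per-course metrics for illustration only; not real results.
course_toy = pd.DataFrame({
    "course_name":     ["A", "B", "C", "D"],
    "avg_study_hours": [12.0, 3.0, 9.0, 4.0],
    "completion_rate": [35.0, 80.0, 75.0, 40.0],
})

# Median cut-offs: "active" means above-median hours,
# "poor completion" means below-median completion rate.
hours_cut = course_toy["avg_study_hours"].median()
rate_cut = course_toy["completion_rate"].median()

active_but_poor = course_toy[
    (course_toy["avg_study_hours"] > hours_cut)
    & (course_toy["completion_rate"] < rate_cut)
]
print(active_but_poor)
```

The median split is only a starting point; a fixed business threshold (for instance a target completion rate) could replace `rate_cut` if one exists.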

**Can learners be segmented by dropout risk before they fail?**

In [None]:
df['Risk_Score'] = (
    (df['time_spent_hours'] < df['time_spent_hours'].median()).astype(int) +
    (df['login_frequency'] < df['login_frequency'].median()).astype(int)
)

print(df[['time_spent_hours', 'login_frequency', 'Risk_Score']].head())

In [None]:
correlation = df['Risk_Score'].corr(df['satisfaction_rating'])
print(f"Correlation between Risk_Score and Satisfaction Rating: {correlation:.2f}")
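
To check whether a score like this actually separates learners by outcome, the completion rate can be compared across score values. This is a minimal sketch on made-up data, assuming a boolean `completed` column; the `toy` frame and the lowercase `risk_score` name are illustrative only.

```python
import pandas as pd

# Toy data for illustration only; not taken from the course dataset.
toy = pd.DataFrame({
    "time_spent_hours": [1.0, 2.0, 8.0, 9.0, 3.0, 10.0],
    "login_frequency":  [1, 2, 9, 12, 10, 3],
    "completed":        [False, False, True, True, True, False],
})

# One point for below-median hours, one for below-median logins (0 to 2).
toy["risk_score"] = (
    (toy["time_spent_hours"] < toy["time_spent_hours"].median()).astype(int)
    + (toy["login_frequency"] < toy["login_frequency"].median()).astype(int)
)

# Share of learners who completed, per risk score.
completion_by_risk = toy.groupby("risk_score")["completed"].mean()
print(completion_by_risk)
```

If the score is useful, the completion rate should fall as the score rises; a flat pattern would suggest the two median cut-offs do not capture dropout risk.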

**Is frequent learning better than long study hours?**

In [None]:
import seaborn as sns

sns.scatterplot(
    data=df,
    x='login_frequency',
    y='time_spent_hours',
    hue='completed'
)

In [None]:
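import matplotlib.pyplot as plt

# Assumes `completed` is boolean, so the boxes are drawn in the order
# False, True and the tick labels set below line up with them.
sns.boxplot(x='completed', y='time_spent_hours', data=df, palette='pastel')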
plt.title('Time Spent Hours Distribution by Course Completion')
plt.xlabel('Course Completion')
plt.ylabel('Time Spent Hours')
plt.xticks(ticks=[0, 1], labels=['Not Completed', 'Completed'])
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
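
The plots can be complemented with a small two-by-two table that separates login frequency from study hours. The sketch below, on made-up data, splits each metric at its median and reports the completion rate in each cell; the `toy` frame, the `login_style` and `hours_style` labels, and the median split are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; not taken from the course dataset.
toy = pd.DataFrame({
    "login_frequency":  [2, 3, 12, 15, 2, 14, 13, 3],
    "time_spent_hours": [1.0, 9.0, 2.0, 10.0, 8.0, 1.5, 9.5, 2.5],
    "completed":        [False, False, True, True, False, True, True, False],
})

# Label each learner relative to the median of each metric.
toy["login_style"] = np.where(
    toy["login_frequency"] >= toy["login_frequency"].median(),
    "frequent", "infrequent",
)
toy["hours_style"] = np.where(
    toy["time_spent_hours"] >= toy["time_spent_hours"].median(),
    "long", "short",
)

# Completion rate for every combination of login style and hours style.
toy["completed_int"] = toy["completed"].astype(int)
grid = toy.pivot_table(
    index="login_style",
    columns="hours_style",
    values="completed_int",
    aggfunc="mean",
)
print(grid)
```

Comparing the "frequent, short" cell with the "infrequent, long" cell addresses the heading's question most directly, since those learners differ in style rather than in overall effort.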

**Which metrics should the business monitor weekly?**

In [None]:
df[['time_spent_hours','login_frequency','engagement_consistency']].corr()
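
The matrix above shows how the engagement metrics relate to each other, not to the outcome. One way to shortlist metrics for weekly monitoring is to rank them by their correlation with completion. This is a minimal sketch on made-up data, assuming a boolean `completed` column; the `toy` frame and the `ranked` name are illustrative only.

```python
import pandas as pd

# Toy data for illustration only; not taken from the course dataset.
toy = pd.DataFrame({
    "time_spent_hours": [1.0, 2.0, 3.0, 8.0, 9.0, 10.0],
    "login_frequency":  [5, 1, 4, 2, 6, 3],
    "completed":        [False, False, False, True, True, True],
})

metrics = ["time_spent_hours", "login_frequency"]

# Pearson correlation of each metric with completion coded as 0/1.
corr_with_completion = toy[metrics].corrwith(toy["completed"].astype(int))

# Rank metrics by the strength (absolute value) of that correlation.
order = corr_with_completion.abs().sort_values(ascending=False).index
ranked = corr_with_completion.reindex(order)
print(ranked)
```

Correlation only captures linear association, so a metric that ranks low here may still matter; the ranking is a starting point for choosing what to track, not a final answer.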