<a href="https://colab.research.google.com/github/marksimmons781-ux/CS50P-learning-github/blob/main/MarkAnthony_Simmons_Academic_Tracker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Redefine df_cleaned as it seems to be undefined
df_cleaned = df.dropna()

# Set the figure size for better readability
plt.figure(figsize=(10, 6))

# Create a scatter plot
sns.scatterplot(x='Cognitave_Load', y='Score', data=df_cleaned, hue='Course_Name', size='Score', sizes=(50, 400), palette='viridis', alpha=0.7)

# Add title and labels
plt.title('Score vs. Cognitive Load per Course', fontsize=16)
plt.xlabel('Cognitive Load', fontsize=12)
plt.ylabel('Score', fontsize=12)

# Add grid for better readability
plt.grid(axis='both', linestyle='--', alpha=0.7)

# Add a legend for the course names
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

# Adjust layout and display the plot
plt.tight_layout()
plt.show()

In [None]:
print("Unique values in 'Course_Name':")
print(df_cleaned['Course_Name'].unique())

In [None]:
import pandas as pd
from google.colab import auth
import gspread
from google.auth import default

# Authenticate the user (will prompt for authorization if not already done)
auth.authenticate_user()

# Initialize the gspread client
creds, _ = default()
gc = gspread.authorize(creds)

# Open the Google Sheet by name and select the worksheet
# Assuming 'Academic Tracker' is the correct sheet name based on previous execution
sh = gc.open('Academic Tracker')
worksheet = sh.get_worksheet(0) # '0' selects the first tab/sheet

# Get all values and convert to a DataFrame
rows = worksheet.get_all_values()
# Use the third row (index 2) as column headers and the rest (from index 3) as data
df = pd.DataFrame(rows[3:], columns=rows[2])

# Clean up column names by stripping whitespace
df.columns = df.columns.str.strip()

# Clean: Remove any completely empty rows
df = df.replace('', pd.NA).dropna(how='all')

# Numeric Cleaning for 'Score' and 'Cognitave_Load'
if 'Score' in df.columns:
    df['Score'] = df['Score'].astype(str).str.replace('%', '')
    df['Score'] = pd.to_numeric(df['Score'], errors='coerce')

if 'Cognitave_Load' in df.columns:
    df['Cognitave_Load'] = pd.to_numeric(df['Cognitave_Load'], errors='coerce')

print("DataFrame 'df' successfully re-created and cleaned from Google Sheets.")
print("First 5 rows of df:")
print(df.head())
print("\nInfo for df:")
df.info()

In [None]:
# Recreate df_cleaned by dropping rows with any missing values from df
# This ensures df_cleaned is available for subsequent plotting and analysis
df_cleaned = df.dropna()

print(f"DataFrame 'df' shape: {df.shape}")
print(f"DataFrame 'df_cleaned' shape after dropping rows with missing values: {df_cleaned.shape}")
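A plain `dropna()` removes a row if any column is missing, including columns the analysis never uses. As a minimal sketch on made-up data (the `Notes` column is hypothetical and not taken from the sheet), restricting the check with `subset` keeps more rows:

```python
import pandas as pd

# Illustrative toy data; values and the 'Notes' column are assumptions.
toy = pd.DataFrame({
    "Score": [90.0, None, 70.0],
    "Cognitave_Load": [3.0, 4.0, 5.0],
    "Notes": [None, "ok", None],
})

# Drops every row that has a missing value in any column.
all_cols = toy.dropna()

# Drops only rows missing a value in the columns the analysis needs.
needed = toy.dropna(subset=["Score", "Cognitave_Load"])

print(all_cols.shape, needed.shape)
```

Whether the stricter or the looser filter is appropriate depends on which columns later cells rely on.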

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the figure size for better readability
plt.figure(figsize=(12, 6))

# Create a bar plot, addressing the FutureWarning
sns.barplot(x='Course_Name', y='Score', data=df_cleaned, hue='Course_Name', palette='viridis', legend=False)

# Add title and labels
plt.title('Scores per Course Name', fontsize=16)
plt.xlabel('Course Name', fontsize=12)
plt.ylabel('Score', fontsize=12)

# Rotate x-axis labels if they are long to prevent overlap
plt.xticks(rotation=45, ha='right')

# Add grid for better readability
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Adjust layout and display the plot
plt.tight_layout()
plt.show()

# Task
Perform a correlation analysis on the numerical columns of the `df_cleaned` DataFrame, visualize the correlation matrix using a heatmap, and interpret the significant correlations.

## Identify Numerical Columns

### Subtask:
Identify all numerical columns in the `df_cleaned` DataFrame that can be used for correlation analysis.


**Reasoning**:
First, I'll display the information about the `df_cleaned` DataFrame to understand the current data types of its columns. This will help identify columns that are meant to be numerical but might be stored as objects.



In [None]:
df_cleaned.info()

**Reasoning**:
Based on the `df_cleaned.info()` output, 'Cognitave_Load' and 'Score' are already identified as numerical (float64). However, 'CPD Credit Hrs.' and 'Progress' are currently of object type but likely represent numerical values that should be considered for correlation analysis. I will now explicitly select all numerical columns (float64 and int64) to form an initial list and then investigate other object columns to see if they can be converted to numerical types.



In [None]:
numerical_cols = df_cleaned.select_dtypes(include=['float64', 'int64']).columns.tolist()
print(f"Initial numerical columns: {numerical_cols}")

# Inspect 'CPD Credit Hrs.' column
print("\nUnique values in 'CPD Credit Hrs.' before conversion:")
print(df_cleaned['CPD Credit Hrs.'].unique())

# Inspect 'Progress' column
print("\nUnique values in 'Progress' before conversion:")
print(df_cleaned['Progress'].unique())

**Reasoning**:
I need to convert the 'CPD Credit Hrs.' and 'Progress' columns to numerical types. For 'CPD Credit Hrs.', I will calculate the midpoint of the given range. For 'Progress', I will remove the '%' sign and convert it to a float, then divide by 100 to represent it as a decimal.



In [None]:
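def convert_cpd_credit_hrs(cpd_str):
    # Convert a range such as '10-20' to its midpoint; otherwise parse a plain number
    try:
        if isinstance(cpd_str, str) and '-' in cpd_str:
            start, end = map(float, cpd_str.split('-'))
            return (start + end) / 2
        return float(cpd_str)
    except (ValueError, TypeError):
        # Unparseable values (e.g. 'N/A' or '1-2-3') are treated as missing
        return None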

df_cleaned['CPD_Credit_Hrs_Numeric'] = df_cleaned['CPD Credit Hrs.'].apply(convert_cpd_credit_hrs)

df_cleaned['Progress_Numeric'] = df_cleaned['Progress'].str.replace('%', '').astype(float) / 100


print("Converted 'CPD_Credit_Hrs_Numeric' column info:")
print(df_cleaned['CPD_Credit_Hrs_Numeric'].head())
print("\nConverted 'Progress_Numeric' column info:")
print(df_cleaned['Progress_Numeric'].head())

df_cleaned.info()

**Reasoning**:
The previous code block successfully converted the 'CPD Credit Hrs.' and 'Progress' columns to numerical types, but it raised `SettingWithCopyWarning` warnings. To prevent these warnings, I will explicitly create a copy of the DataFrame before performing the operations using `.copy()`. After that, I will update the list of numerical columns by selecting all columns with `float64` or `int64` dtypes from the modified `df_cleaned` DataFrame.



In [None]:
df_cleaned = df_cleaned.copy()

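def convert_cpd_credit_hrs(cpd_str):
    # Convert a range such as '10-20' to its midpoint; otherwise parse a plain number
    try:
        if isinstance(cpd_str, str) and '-' in cpd_str:
            start, end = map(float, cpd_str.split('-'))
            return (start + end) / 2
        return float(cpd_str)
    except (ValueError, TypeError):
        # Unparseable values (e.g. 'N/A' or '1-2-3') are treated as missing
        return None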

df_cleaned['CPD_Credit_Hrs_Numeric'] = df_cleaned['CPD Credit Hrs.'].apply(convert_cpd_credit_hrs)
df_cleaned['Progress_Numeric'] = df_cleaned['Progress'].str.replace('%', '').astype(float) / 100

# Update the list of numerical columns after conversion
numerical_cols = df_cleaned.select_dtypes(include=['float64', 'int64']).columns.tolist()

print("Final numerical columns for correlation analysis:")
print(numerical_cols)


## Calculate Correlation Matrix

### Subtask:
Calculate the Pearson correlation matrix for the identified numerical columns in `df_cleaned`.


**Reasoning**:
I will calculate the Pearson correlation matrix for the previously identified numerical columns in the `df_cleaned` DataFrame and store it in a variable named `correlation_matrix`. Then, I will display the correlation matrix to verify the calculation.



In [None]:
correlation_matrix = df_cleaned[numerical_cols].corr(method='pearson')
print("Pearson Correlation Matrix:")
print(correlation_matrix)

### Observation and Refinement

**Observation:**
The correlation matrix shows `NaN` values for the `Progress_Numeric` column. This is because, as observed earlier, the `Progress` column (and thus `Progress_Numeric`) contains only a single unique value ('100.00%' which became 1.0). A variable with no variance cannot have a meaningful correlation with other variables, leading to `NaN` in the correlation matrix.

**Action:**
To obtain a valid correlation matrix, I will remove `Progress_Numeric` from the list of numerical columns and recalculate the correlation matrix. This ensures that only variables with variance are included in the analysis.
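
The zero-variance effect can be reproduced on a small made-up frame. This is only a sketch: the values below are illustrative, and the `nunique` check is one possible way to spot constant columns, not necessarily the approach used in this notebook.

```python
import pandas as pd

# Illustrative toy data; only the column names mirror the notebook.
toy = pd.DataFrame({
    "Score": [90.0, 85.0, 70.0, 95.0],
    "Cognitave_Load": [3.0, 4.0, 5.0, 2.0],
    "Progress_Numeric": [1.0, 1.0, 1.0, 1.0],  # constant column
})

# A column with at most one distinct non-null value has zero variance.
constant_cols = [col for col in toy.columns if toy[col].nunique(dropna=True) <= 1]
print(constant_cols)

# Including the constant column produces NaN entries in the matrix.
toy_corr_all = toy.corr(method="pearson")

# Dropping it first gives a matrix without NaN values.
toy_corr = toy.drop(columns=constant_cols).corr(method="pearson")
print(toy_corr)
```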

**Reasoning**:
Based on the observation that 'Progress_Numeric' has no variance and resulted in `NaN` values in the correlation matrix, I will remove it from the `numerical_cols` list to ensure a meaningful correlation analysis.



In [None]:
numerical_cols.remove('Progress_Numeric')
print(f"Updated numerical columns for correlation analysis: {numerical_cols}")

**Reasoning**:
Now that the `numerical_cols` list has been updated to exclude 'Progress_Numeric', I will recalculate the Pearson correlation matrix using only the valid numerical columns. This will produce a correlation matrix without `NaN` values resulting from zero-variance columns.



In [None]:
correlation_matrix = df_cleaned[numerical_cols].corr(method='pearson')
print("Recalculated Pearson Correlation Matrix:")
print(correlation_matrix)

## Visualize Correlation Matrix

### Subtask:
Create a heatmap to visualize the correlation matrix, showing the strength and direction of relationships between numerical variables. Include annotations for correlation values and a clear color bar.


**Reasoning**:
To visualize the correlation matrix as a heatmap, I will use `seaborn.heatmap()` with the specified parameters for annotations, formatting, colormap, and a descriptive title, ensuring the plot is clearly displayed.



In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the figure size for better readability
plt.figure(figsize=(8, 6))

# Create a heatmap to visualize the correlation matrix
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', linewidths=.5)

# Add a title to the heatmap
plt.title('Correlation Matrix of Numerical Features', fontsize=16)

# Display the plot
plt.show()

## Interpret Correlations

### Subtask:
Provide insights and explanations about the significant correlations observed in the heatmap.


### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`. This suggests that as cognitive load increases, the scores tend to decrease, and vice versa. This is a plausible relationship, as higher cognitive load often indicates more challenging material or a less efficient learning process, which could lead to lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. The correlation is very close to zero, suggesting these two variables are largely independent of each other.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (almost zero) between `Score` and `CPD_Credit_Hrs_Numeric`. This implies that there is virtually no linear relationship between a student's score and the CPD credit hours of a course. The number of credit hours does not seem to predict or be associated with the score achieved.

**Overall Practical Implications:**

*   The strongest relationship observed is between **Cognitive Load and Score**. This suggests that managing cognitive load could matter for student performance, although a correlation alone does not establish cause and no significance test has been run yet. Strategies to reduce unnecessary cognitive load (e.g., better instructional design, clearer materials) might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course may not be a good indicator of its cognitive difficulty or how well students will perform in it. Other factors likely drive both cognitive load and score independently of the credit hours.


## Final Task

### Subtask:
Summarize the key correlations found and ask the user for further analysis or specific investigations.


## Summary:

### Data Analysis Key Findings

*   **Identified Numerical Variables**: The initial numerical columns were `Cognitave_Load` and `Score`. Two additional columns, `CPD Credit Hrs.` and `Progress`, were converted from object types to numerical: `CPD_Credit_Hrs_Numeric` (midpoint of ranges) and `Progress_Numeric` (percentage to float).
*   **Data Cleaning for Correlation**: The `Progress_Numeric` column was excluded from correlation analysis because it had no variance (all values were 1.0), which would result in undefined correlations.
*   **Correlation between Cognitive Load and Score**: A moderate negative correlation of -0.33 was found between `Cognitave_Load` and `Score`, suggesting that as cognitive load increases, student scores tend to decrease.
*   **Lack of Correlation with CPD Credit Hours**:
    *   There was a very weak negative correlation of -0.08 between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`, indicating almost no linear relationship.
    *   An extremely weak negative correlation of -0.01 was observed between `Score` and `CPD_Credit_Hrs_Numeric`, implying virtually no linear relationship between student performance and the number of CPD credit hours.
*   **Visualization**: A heatmap effectively visualized these correlation coefficients, providing a clear overview of the relationships.

### Insights or Next Steps

*   The inverse relationship between `Cognitave_Load` and `Score` ($r=-0.33$) suggests that instructional design strategies focused on managing or reducing cognitive load might improve student performance, though the correlation is only moderate and does not by itself show causation. Further investigation into factors contributing to high cognitive load in courses is recommended.
*   Since `CPD_Credit_Hrs_Numeric` showed negligible correlation with both `Cognitave_Load` and `Score`, it suggests that the assigned credit hours might not accurately reflect a course's cognitive demand or predict student success. It would be beneficial to explore other metrics for evaluating course difficulty or impact on student learning.


# Task
Based on the correlation analysis of the `df_cleaned` DataFrame, we found a moderate negative correlation (-0.33) between `Cognitave_Load` and `Score`, indicating that higher cognitive load tends to be associated with lower scores. `CPD_Credit_Hrs_Numeric` showed negligible correlation with both `Cognitave_Load` (-0.08) and `Score` (-0.01).

What further analysis or specific investigations would you like to pursue? For example, would you like to:
*   Investigate the factors contributing to high cognitive load in certain courses?
*   Analyze the relationship between other categorical variables (e.g., `Course_Name`) and `Score` or `Cognitave_Load`?
*   Explore potential non-linear relationships between these numerical variables?
*   Perform any statistical tests to confirm the significance of the observed correlations?
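
For the last option, one way to check significance without extra dependencies is a permutation test on Pearson's r. The sketch below uses only NumPy; `permutation_pvalue` is a hypothetical helper name and the sample values are made up for illustration, not drawn from `df_cleaned`.

```python
import numpy as np

def permutation_pvalue(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for Pearson's r (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    count = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real pairing between x and y.
        permuted = rng.permutation(y)
        if abs(np.corrcoef(x, permuted)[0, 1]) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (count + 1) / (n_permutations + 1)

# Illustrative data only.
load = [1, 2, 3, 4, 5, 6, 7, 8]
score = [95, 90, 92, 85, 80, 82, 75, 70]
r, p = permutation_pvalue(load, score, n_permutations=2000)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

In the notebook, the same helper could be called with `df_cleaned['Cognitave_Load']` and `df_cleaned['Score']`, assuming both columns are free of missing values.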

## Identify Numerical Columns

### Subtask:
Confirm the numerical columns in the `df_cleaned` DataFrame that are suitable for correlation analysis.


**Reasoning**:
The subtask is to confirm the numerical columns suitable for correlation analysis. Based on the previous steps and the provided instructions, the final list of numerical columns for correlation analysis is 'Cognitave_Load', 'Score', and 'CPD_Credit_Hrs_Numeric'. I will print this list to confirm it.



In [None]:
final_numerical_cols = ['Cognitave_Load', 'Score', 'CPD_Credit_Hrs_Numeric']
print(f"Confirmed numerical columns for correlation analysis: {final_numerical_cols}")

## Calculate Correlation Matrix

### Subtask:
Calculate the Pearson correlation matrix for the identified numerical columns in `df_cleaned`.


**Reasoning**:
I will calculate the Pearson correlation matrix using the confirmed numerical columns from `df_cleaned` and then print it to display the correlations, as specified in the instructions.



In [None]:
correlation_matrix = df_cleaned[final_numerical_cols].corr(method='pearson')
print("Pearson Correlation Matrix:")
print(correlation_matrix)

**Reasoning**:
The previous code failed because the 'CPD_Credit_Hrs_Numeric' column was not present in the `df_cleaned` DataFrame. This column was supposed to be created in an earlier, unexecuted step. I need to re-include the conversion logic to add this column to `df_cleaned` before calculating the correlation matrix.



In [None]:
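def convert_cpd_credit_hrs(cpd_str):
    # Convert a range such as '10-20' to its midpoint; otherwise parse a plain number
    try:
        if isinstance(cpd_str, str) and '-' in cpd_str:
            start, end = map(float, cpd_str.split('-'))
            return (start + end) / 2
        return float(cpd_str)
    except (ValueError, TypeError):
        # Unparseable values (e.g. 'N/A' or '1-2-3') are treated as missing
        return None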

# Ensure df_cleaned is a copy to avoid SettingWithCopyWarning
df_cleaned = df_cleaned.copy()

# Apply the conversion to create the new numerical columns
df_cleaned['CPD_Credit_Hrs_Numeric'] = df_cleaned['CPD Credit Hrs.'].apply(convert_cpd_credit_hrs)
df_cleaned['Progress_Numeric'] = df_cleaned['Progress'].str.replace('%', '').astype(float) / 100

# Confirm final_numerical_cols to be used for correlation
final_numerical_cols = ['Cognitave_Load', 'Score', 'CPD_Credit_Hrs_Numeric']

correlation_matrix = df_cleaned[final_numerical_cols].corr(method='pearson')
print("Pearson Correlation Matrix:")
print(correlation_matrix)

## Visualize Correlation Matrix

### Subtask:
Create a heatmap to visualize the correlation matrix, showing the strength and direction of relationships between numerical variables. Include annotations for correlation values and a clear color bar.


**Reasoning**:
To visualize the correlation matrix as a heatmap, I will use `seaborn.heatmap()` with the specified parameters for annotations, formatting, colormap, and a descriptive title, ensuring the plot is clearly displayed.



In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the figure size for better readability
plt.figure(figsize=(8, 6))

# Create a heatmap to visualize the correlation matrix
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', linewidths=.5)

# Add a title to the heatmap
plt.title('Correlation Matrix of Numerical Features', fontsize=16)

# Display the plot
plt.show()


## Final Task

### Subtask:
Summarize the key correlations found and ask the user for further analysis or specific investigations.


## Summary:

### Q&A
Based on the correlation analysis, here are some further analyses or specific investigations that could be pursued:
*   Investigate the factors contributing to high cognitive load in certain courses.
*   Analyze the relationship between other categorical variables (e.g., `Course_Name`) and `Score` or `Cognitave_Load`.
*   Explore potential non-linear relationships between these numerical variables.
*   Perform statistical tests to confirm the significance of the observed correlations.
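
As a starting point for the categorical comparison, a per-course summary can be built with `groupby`. This is a sketch on made-up rows; the course labels and values are illustrative and only the column names follow the notebook.

```python
import pandas as pd

# Illustrative toy data; course labels 'A' and 'B' are placeholders.
toy = pd.DataFrame({
    "Course_Name": ["A", "A", "B", "B", "B"],
    "Score": [90.0, 80.0, 70.0, 75.0, 65.0],
    "Cognitave_Load": [2.0, 3.0, 5.0, 4.0, 6.0],
})

# Mean and row count of each numeric column, per course.
summary = (
    toy.groupby("Course_Name")[["Score", "Cognitave_Load"]]
    .agg(["mean", "count"])
)
print(summary)
```

Reporting the count next to the mean makes it easier to spot courses with too few rows to support a comparison.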

### Data Analysis Key Findings
*   A moderate negative correlation of -0.33 was found between `Cognitave_Load` and `Score`, suggesting that higher cognitive load is associated with lower scores.
*   `CPD_Credit_Hrs_Numeric` showed a very weak negative correlation of -0.08 with `Cognitave_Load`, indicating almost no linear relationship.
*   `CPD_Credit_Hrs_Numeric` had an extremely weak negative correlation of -0.01 with `Score`, implying virtually no linear relationship.

### Insights or Next Steps
*   The most significant finding is the moderate negative correlation between `Cognitave_Load` and `Score`. This suggests that managing cognitive load could be crucial for improving student performance, and strategies to reduce unnecessary cognitive load might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This indicates that the credit hours assigned to a course may not be a reliable indicator of its cognitive difficulty or how well students will perform in it.


# Task
The correlation analysis of `df_cleaned` revealed a moderate negative correlation (-0.33) between `Cognitave_Load` and `Score`, implying that higher cognitive load is associated with lower scores. `CPD_Credit_Hrs_Numeric` showed negligible correlation with both `Cognitave_Load` (-0.08) and `Score` (-0.01).

I'm ready for your next instruction. Please tell me if you'd like to:
*   Investigate the factors contributing to high cognitive load in certain courses.
*   Analyze the relationship between other categorical variables (e.g., `Course_Name`) and `Score` or `Cognitave_Load`.
*   Explore potential non-linear relationships between these numerical variables.
*   Perform statistical tests to confirm the significance of the observed correlations.
*   Or if you have any other specific investigations in mind.
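
For the non-linear option, a rank-based (Spearman) correlation is a common first check, since it captures monotonic relationships that Pearson's r understates. The sketch below computes it as Pearson's r on ranks using made-up data; pandas also accepts `method='spearman'` in `DataFrame.corr`.

```python
import pandas as pd

# Illustrative data: y grows monotonically with x, but not linearly.
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
toy["y"] = toy["x"] ** 3

# Pearson measures linear association on the raw values.
pearson_r = toy.corr(method="pearson").loc["x", "y"]

# Spearman is Pearson applied to the ranks of the values.
spearman_r = toy.rank().corr(method="pearson").loc["x", "y"]

print(f"Pearson: {pearson_r:.3f}, Spearman: {spearman_r:.3f}")
```

A Spearman value clearly larger in magnitude than the Pearson value would hint at a monotonic but curved relationship worth plotting.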

## Calculate Correlation Matrix

### Subtask:
Calculate the Pearson correlation matrix for the identified numerical columns in `df_cleaned`.


**Reasoning**:
I will calculate the Pearson correlation matrix using the confirmed numerical columns from `df_cleaned` and then print it to display the correlations, as specified in the instructions.



In [None]:
correlation_matrix = df_cleaned[final_numerical_cols].corr(method='pearson')
print("Pearson Correlation Matrix:")
print(correlation_matrix)

## Visualize Correlation Matrix

### Subtask:
Create a heatmap to visualize the correlation matrix, showing the strength and direction of relationships between numerical variables. Include annotations for correlation values and a clear color bar.


**Reasoning**:
To visualize the correlation matrix as a heatmap, I will use `seaborn.heatmap()` with the specified parameters for annotations, formatting, colormap, and a descriptive title, ensuring the plot is clearly displayed.



In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the figure size for better readability
plt.figure(figsize=(8, 6))

# Create a heatmap to visualize the correlation matrix
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', linewidths=.5)

# Add a title to the heatmap
plt.title('Correlation Matrix of Numerical Features', fontsize=16)

# Display the plot
plt.show()

In [None]:
print("Columns in the current 'df' DataFrame:")
print(df.columns.tolist())

print(f"\nShape of 'df' (all original rows, some missing values): {df.shape}")
print(f"Shape of 'df_cleaned' (rows with missing values dropped): {df_cleaned.shape}")

print("\n'df' contains all original rows from the spreadsheet, but some columns may have missing values (NaNs). 'df_cleaned' has fewer rows because any row containing at least one missing value was removed. Which DataFrame would you like to use for further analysis?")
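
The choice between `df` and `df_cleaned` can change the correlations themselves, because pandas' `DataFrame.corr()` excludes missing values pair by pair, whereas `dropna()` discards a whole row if any column is missing. The sketch below shows the difference on a small made-up frame; `toy` and its values are hypothetical and only reuse the notebook's column names for readability.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row lacks a cognitive-load value, another lacks credit hours.
toy = pd.DataFrame({
    "Score": [90.0, 80.0, 70.0, 60.0, 95.0, 75.0],
    "Cognitave_Load": [2.0, 3.0, 6.0, 8.0, np.nan, 5.0],
    "CPD_Credit_Hrs_Numeric": [1.0, np.nan, 2.0, 1.0, 3.0, 2.0],
})

# Listwise deletion: any row with at least one NaN is removed.
toy_cleaned = toy.dropna()

# Pairwise deletion: corr() uses every row where BOTH columns of a pair are present.
pairwise_corr = toy.corr(method="pearson")
listwise_corr = toy_cleaned.corr(method="pearson")

rows_used_pairwise = toy[["Score", "Cognitave_Load"]].dropna().shape[0]
print(f"Rows after dropna(): {toy_cleaned.shape[0]}")
print(f"Rows used pairwise for Score vs Cognitave_Load: {rows_used_pairwise}")
print("Pairwise r:", round(pairwise_corr.loc["Score", "Cognitave_Load"], 3))
print("Listwise r:", round(listwise_corr.loc["Score", "Cognitave_Load"], 3))
```

Pairwise deletion keeps more observations per coefficient, but each cell of the matrix may then rest on a different subset of rows; listwise deletion keeps the matrix internally consistent at the cost of sample size.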

## Interpret Correlations

### Subtask:
Provide insights and explanations about the significant correlations observed in the heatmap.


#### Instructions
1. Examine the heatmap and the `correlation_matrix` to identify the strongest and weakest correlations between the numerical variables.
2. For each pair of correlated variables (e.g., 'Cognitave_Load' and 'Score', 'Cognitave_Load' and 'CPD_Credit_Hrs_Numeric', 'Score' and 'CPD_Credit_Hrs_Numeric'), describe the strength (e.g., strong, moderate, weak, very weak) and direction (positive or negative) of the relationship.
3. Based on these observations, provide an interpretation of what these correlations might mean in the context of the dataset. For instance, explain the practical implications of a negative correlation between cognitive load and score.
4. Summarize the overall practical implications of the correlation analysis, highlighting the most significant findings.
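
Step 2 above can be made repeatable with a small helper that turns a coefficient into a strength and direction label. This is only a sketch: the function name `describe_correlation` and the cut-offs (0.1, 0.3, 0.5) are assumptions based on a common rule of thumb, not a fixed standard, and the sample values are made up.

```python
def describe_correlation(r: float) -> str:
    """Return a label such as 'moderate negative' for a Pearson coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no linear correlation"

    magnitude = abs(r)
    # Assumed cut-offs; adjust them to whatever convention the analysis follows.
    if magnitude < 0.1:
        strength = "very weak"
    elif magnitude < 0.3:
        strength = "weak"
    elif magnitude < 0.5:
        strength = "moderate"
    else:
        strength = "strong"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Hypothetical coefficients, used only to show the labels.
for value in (0.62, -0.25, 0.04):
    print(f"r = {value:+.2f}: {describe_correlation(value)}")
```

Applying one function to every off-diagonal entry of `correlation_matrix` keeps the wording consistent across variable pairs.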

### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`: as cognitive load increases, scores tend to decrease. With r = -0.33, cognitive load accounts for only about 11% of the variance in scores (r² ≈ 0.11), so the trend is visible in this sample but far from deterministic. The direction is plausible, since higher cognitive load often reflects more challenging material or a less efficient learning process, either of which could lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. A Pearson coefficient this close to zero rules out a meaningful linear association, though not necessarily a non-linear one.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (effectively zero) between `Score` and `CPD_Credit_Hrs_Numeric`. There is virtually no linear relationship between the score achieved and the CPD credit hours of a course, so credit hours do not appear to predict the score.

**Overall Practical Implications:**

*   The strongest relationship observed is between **Cognitive Load and Score**. This suggests that managing cognitive load may matter for performance: strategies that reduce unnecessary cognitive load (e.g., better instructional design, clearer materials) might lead to higher scores. Because correlation does not establish causation and no significance test has been run yet, this should be treated as a hypothesis rather than a conclusion.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course are not a good linear indicator of its cognitive difficulty or of the score achieved in it. Other factors likely drive cognitive load and score.
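
How much weight the -0.33 coefficient deserves depends on the sample size. As a hedged sketch, the code below computes r² and an approximate 95% confidence interval using the Fisher z-transformation, a standard approximation that assumes roughly bivariate-normal data. It is not part of the original analysis: the helper name `fisher_confidence_interval` is hypothetical, and `n_hypothetical = 30` is a placeholder, not the real number of rows.

```python
import math
from statistics import NormalDist


def fisher_confidence_interval(r: float, n: int, confidence: float = 0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    if n <= 3:
        raise ValueError("Fisher's z needs more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                      # Fisher z-transform of r
    std_err = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Transform the interval bounds back to the correlation scale.
    return math.tanh(z - z_crit * std_err), math.tanh(z + z_crit * std_err)


r = -0.33                 # coefficient reported for Cognitave_Load vs Score
n_hypothetical = 30       # ASSUMPTION: replace with the actual sample size
r_squared = r ** 2
low, high = fisher_confidence_interval(r, n_hypothetical)

print(f"r^2 = {r_squared:.4f}")
print(f"Approximate 95% CI for r with n = {n_hypothetical}: [{low:.3f}, {high:.3f}]")
```

In the notebook, `n_hypothetical` would be replaced by `len(df_cleaned)`. If the resulting interval includes zero, the data are also compatible with no linear relationship at all.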

### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`. This suggests that as cognitive load increases, the scores tend to decrease, and vice versa. This is a plausible relationship, as higher cognitive load often indicates more challenging material or a less efficient learning process, which could lead to lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. The correlation is very close to zero, suggesting these two variables are largely independent of each other.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (almost zero) between `Score` and `CPD_Credit_Hrs_Numeric`. This implies that there is virtually no linear relationship between a student's score and the CPD credit hours of a course. The number of credit hours does not seem to predict or be associated with the score achieved.

**Overall Practical Implications:**

*   The most significant relationship observed is between **Cognitive Load and Score**. This highlights that managing cognitive load could be crucial for improving student performance. Strategies to reduce unnecessary cognitive load (e.g., better instruction design, clearer materials) might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course may not be a good indicator of its cognitive difficulty or how well students will perform in it. Other factors likely drive both cognitive load and score independently of the credit hours.

### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`. This suggests that as cognitive load increases, the scores tend to decrease, and vice versa. This is a plausible relationship, as higher cognitive load often indicates more challenging material or a less efficient learning process, which could lead to lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. The correlation is very close to zero, suggesting these two variables are largely independent of each other.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (almost zero) between `Score` and `CPD_Credit_Hrs_Numeric`. This implies that there is virtually no linear relationship between a student's score and the CPD credit hours of a course. The number of credit hours does not seem to predict or be associated with the score achieved.

**Overall Practical Implications:**

*   The most significant relationship observed is between **Cognitive Load and Score**. This highlights that managing cognitive load could be crucial for improving student performance. Strategies to reduce unnecessary cognitive load (e.g., better instruction design, clearer materials) might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course may not be a good indicator of its cognitive difficulty or how well students will perform in it. Other factors likely drive both cognitive load and score independently of the credit hours.


### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`. This suggests that as cognitive load increases, the scores tend to decrease, and vice versa. This is a plausible relationship, as higher cognitive load often indicates more challenging material or a less efficient learning process, which could lead to lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. The correlation is very close to zero, suggesting these two variables are largely independent of each other.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (almost zero) between `Score` and `CPD_Credit_Hrs_Numeric`. This implies that there is virtually no linear relationship between a student's score and the CPD credit hours of a course. The number of credit hours does not seem to predict or be associated with the score achieved.

**Overall Practical Implications:**

*   The most significant relationship observed is between **Cognitive Load and Score**. This highlights that managing cognitive load could be crucial for improving student performance. Strategies to reduce unnecessary cognitive load (e.g., better instruction design, clearer materials) might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course may not be a good indicator of its cognitive difficulty or how well students will perform in it. Other factors likely drive both cognitive load and score independently of the credit hours.


### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`. This suggests that as cognitive load increases, the scores tend to decrease, and vice versa. This is a plausible relationship, as higher cognitive load often indicates more challenging material or a less efficient learning process, which could lead to lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. The correlation is very close to zero, suggesting these two variables are largely independent of each other.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (almost zero) between `Score` and `CPD_Credit_Hrs_Numeric`. This implies that there is virtually no linear relationship between a student's score and the CPD credit hours of a course. The number of credit hours does not seem to predict or be associated with the score achieved.

**Overall Practical Implications:**

*   The most significant relationship observed is between **Cognitive Load and Score**. This highlights that managing cognitive load could be crucial for improving student performance. Strategies to reduce unnecessary cognitive load (e.g., better instruction design, clearer materials) might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course may not be a good indicator of its cognitive difficulty or how well students will perform in it. Other factors likely drive both cognitive load and score independently of the credit hours.


### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`. This suggests that as cognitive load increases, the scores tend to decrease, and vice versa. This is a plausible relationship, as higher cognitive load often indicates more challenging material or a less efficient learning process, which could lead to lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. The correlation is very close to zero, suggesting these two variables are largely independent of each other.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (almost zero) between `Score` and `CPD_Credit_Hrs_Numeric`. This implies that there is virtually no linear relationship between a student's score and the CPD credit hours of a course. The number of credit hours does not seem to predict or be associated with the score achieved.

**Overall Practical Implications:**

*   The most significant relationship observed is between **Cognitive Load and Score**. This highlights that managing cognitive load could be crucial for improving student performance. Strategies to reduce unnecessary cognitive load (e.g., better instruction design, clearer materials) might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course may not be a good indicator of its cognitive difficulty or how well students will perform in it. Other factors likely drive both cognitive load and score independently of the credit hours.


### Interpretation of Correlations:

Based on the correlation matrix and its heatmap visualization, we can observe the following relationships between the numerical variables:

1.  **Cognitave_Load and Score:**
    *   **Correlation Coefficient:** -0.33
    *   **Interpretation:** There is a **moderate negative correlation** between `Cognitave_Load` and `Score`. This suggests that as cognitive load increases, the scores tend to decrease, and vice versa. This is a plausible relationship, as higher cognitive load often indicates more challenging material or a less efficient learning process, which could lead to lower performance.

2.  **Cognitave_Load and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.08
    *   **Interpretation:** There is a **very weak negative correlation** between `Cognitave_Load` and `CPD_Credit_Hrs_Numeric`. This indicates almost no linear relationship between the cognitive load of a course and the number of CPD credit hours it offers. The correlation is very close to zero, suggesting these two variables are largely independent of each other.

3.  **Score and CPD_Credit_Hrs_Numeric:**
    *   **Correlation Coefficient:** -0.01
    *   **Interpretation:** There is an **extremely weak negative correlation** (almost zero) between `Score` and `CPD_Credit_Hrs_Numeric`. This implies that there is virtually no linear relationship between a student's score and the CPD credit hours of a course. The number of credit hours does not seem to predict or be associated with the score achieved.

**Overall Practical Implications:**

*   The strongest relationship observed is between **Cognitive Load and Score**. Because this is a correlation, it does not show that cognitive load causes lower scores, but it suggests that managing cognitive load may help performance. Strategies to reduce unnecessary cognitive load (e.g., better instructional design, clearer materials) might lead to higher scores.
*   `CPD_Credit_Hrs_Numeric` appears to be largely uncorrelated with both `Cognitave_Load` and `Score`. This suggests that the credit hours assigned to a course may not be a good indicator of its cognitive difficulty or how well students will perform in it. Other factors likely drive both cognitive load and score independently of the credit hours.
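
A near-zero Pearson coefficient only rules out a *linear* relationship. As a minimal sketch with made-up values (not taken from the tracker sheet), the following shows a relationship that is fully determined but non-linear, and whose Pearson coefficient is nonetheless zero:

```python
import pandas as pd

# Illustrative values only: y is fully determined by x, but not linearly.
x = pd.Series([-3, -2, -1, 0, 1, 2, 3])
y = x ** 2

# Series.corr defaults to the Pearson coefficient.
r = x.corr(y)
print(f"Pearson r between x and x**2: {r:.3f}")
```

This is why the near-zero coefficients for `CPD_Credit_Hrs_Numeric` are best read as "no linear association" rather than as proof that credit hours are unrelated to the other variables.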


In [None]:
print("Summary statistics for df_cleaned:")
display(df_cleaned.describe())

## Final Task

### Subtask:
Summarize the key correlations found and ask the user for further analysis or specific investigations.


## Summary:

### Q&A
*   **What are the key correlations found?**
    *   There is a moderate negative correlation (-0.33) between `Cognitave_Load` and `Score`, meaning higher cognitive load tends to be associated with lower scores.
    *   `CPD_Credit_Hrs_Numeric` shows a very weak negative correlation (-0.08) with `Cognitave_Load`.
    *   `CPD_Credit_Hrs_Numeric` has an extremely weak negative correlation (-0.01) with `Score`.

### Data Analysis Key Findings
*   A Pearson correlation matrix was calculated for `Cognitave_Load`, `Score`, and `CPD_Credit_Hrs_Numeric`.
*   A moderate negative correlation of -0.328 was identified between `Cognitave_Load` and `Score`, suggesting that as cognitive load increases, scores tend to decrease.
*   `CPD_Credit_Hrs_Numeric` showed a very weak negative correlation of -0.085 with `Cognitave_Load`.
*   `CPD_Credit_Hrs_Numeric` exhibited an extremely weak negative correlation of -0.011 with `Score`, indicating virtually no linear relationship.
*   The correlation matrix was visualized using a heatmap, which displays the same coefficients at a glance.
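
The verbal labels used above ("moderate", "very weak") depend on which convention is applied. As a rough sketch, the helper below (a hypothetical function, not part of the notebook) labels the three reported coefficients using Cohen's rule-of-thumb cut-offs of 0.1, 0.3, and 0.5, which is an assumed convention rather than the one the analysis necessarily used:

```python
def strength_label(r):
    """Label the magnitude of r using Cohen's rule-of-thumb cut-offs (assumed convention)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"


# Coefficients as reported in the findings above.
findings = {
    ("Cognitave_Load", "Score"): -0.328,
    ("Cognitave_Load", "CPD_Credit_Hrs_Numeric"): -0.085,
    ("Score", "CPD_Credit_Hrs_Numeric"): -0.011,
}

for (a, b), r in findings.items():
    print(f"{a} vs {b}: r = {r:+.3f} ({strength_label(r)})")
```

Under this convention, only the `Cognitave_Load` and `Score` pair reaches the "medium" band; the other two fall below the "small" threshold.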

### Insights or Next Steps
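*   The negative correlation between `Cognitave_Load` and `Score` suggests that managing cognitive load may help performance; strategies to reduce unnecessary cognitive load (e.g., through better instructional design) could lead to higher scores, although the correlation alone does not establish causation.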
*   `CPD_Credit_Hrs_Numeric` does not appear to be a good indicator of a course's cognitive difficulty or how well students will perform. Further analysis could explore other factors influencing cognitive load and scores independently of credit hours.
*   Consider exploring potential non-linear relationships between these numerical variables, analyzing relationships with categorical variables like `Course_Name`, or performing statistical tests to confirm the significance of the observed correlations.
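
As one possible starting point for the significance check and the non-linear follow-up, the sketch below runs a permutation test on Pearson's r and computes Spearman's rho as Pearson's r on ranks. The function name `pearson_permutation_test`, the number of permutations, the seed, and the `demo` values are all illustrative assumptions; the data are not from the tracker sheet. If SciPy is available, `scipy.stats.pearsonr` also reports a p-value directly.

```python
import numpy as np
import pandas as pd


def pearson_permutation_test(x, y, n_permutations=5000, seed=0):
    """Return (observed r, two-sided permutation p-value) for Pearson's r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    # Shuffle y to break any pairing with x, then count how often the
    # shuffled |r| is at least as large as the observed |r|.
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= abs(observed):
            hits += 1
    # The add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


# Illustrative data only, not taken from the tracker sheet.
demo = pd.DataFrame({
    "Cognitave_Load": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Score": [95, 90, 92, 85, 88, 80, 78, 82, 70, 72],
})

r, p = pearson_permutation_test(demo["Cognitave_Load"], demo["Score"])

# Spearman's rho is Pearson's r computed on ranks; it captures monotonic
# relationships that need not be linear.
rho = demo["Cognitave_Load"].rank().corr(demo["Score"].rank())

print(f"Pearson r = {r:.3f}, permutation p = {p:.4f}, Spearman rho = {rho:.3f}")
```

To apply this to the notebook's data, the two `demo` columns would be replaced with the corresponding numeric columns of `df_cleaned`, after confirming they contain no missing values.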
