# Student Performance Dataset
## Task Overview
In this lab activity, you will build a linear regression model using a subset of the Student Performance dataset. Complete the following sub-tasks.

1. **Load the Data** 
- Load the provided Student Performance data in your Jupyter Notebook.
2. **Select a Unique Randomization Seed** 
- Select a unique integer that will serve as the seed for your randomization.
3. **Sample Train Data** 
- Randomly sample a subset of the data using the seed you selected; limit the sample to 30 observations. Ensure that the sample is representative of the population.
4. **Weight Update Function** 
- Build a weight update function following the Gradient Descent concept.
5. **Display the Values of Weights** 
- Print the values of the weights at each iteration, using separate cells.
6. **Plot the Value of Weights** 
- Display line charts showing how the weight values change across iterations. For each weight, show an individual line chart of its value against the iteration number.
7. **Build a Function for the Final Regression Model** 
- Create a function that implements the final regression model obtained after all your iterations. Display its mathematical expression, with each final weight multiplied by its input variable.
8. **Sample Test Data** 
- From the remainder of the original dataset, randomly sample another set of 30 observations NOT present in your training sample.
9. **Use the Regression Function for Prediction** 
- Use the linear regression function you built to predict the target variable for your test set.
10. **Calculate the Errors** 
- Calculate the overall error between your model’s predictions and the actual values in the test set.

The final deliverable is your code implementation, divided into sections that follow the sub-tasks above.

## Dataset Overview
The Student Performance Dataset is designed to examine the factors that influence students' academic performance. It consists of 10,000 student records, each containing several predictors and a performance index.

The dataset aims to provide insights into the relationship between the predictor variables and the performance index. Researchers and data analysts can use this dataset to explore the impact of studying hours, previous scores, extracurricular activities, sleep hours, and sample question papers on student performance.

## Dataset Attributes
The dataset consists of 5 input features and one target variable:

**Input Features:**
- **Hours Studied:** The total number of hours spent studying by each student.
- **Previous Scores:** The scores obtained by students in previous tests.
- **Extracurricular Activities:** Whether the student participates in extracurricular activities (1 = Yes or 0 = No).
- **Sleep Hours:** The average number of hours of sleep the student had per day.
- **Sample Question Papers Practiced:** The number of sample question papers the student practiced.

**Target Variable:**
- **Performance Index:** A measure of the overall performance of each student. The performance index represents the student's academic performance and has been rounded to the nearest integer. The index ranges from 10 to 100, with higher values indicating better performance.



## Load the Data
Load the provided Student Performance data in your Jupyter Notebook.

In [6]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import itertools
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, classification_report, confusion_matrix,
)

# Load the Student Performance Dataset
data = pd.read_csv('./Student_Performance (1).csv')


In [None]:
# Check the non-null count and the data types
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 6 columns):
 #   Column                            Non-Null Count  Dtype
---  ------                            --------------  -----
 0   Hours Studied                     10000 non-null  int64
 1   Previous Scores                   10000 non-null  int64
 2   Extracurricular Activities        10000 non-null  int64
 3   Sleep Hours                       10000 non-null  int64
 4   Sample Question Papers Practiced  10000 non-null  int64
 5   Performance Index                 10000 non-null  int64
dtypes: int64(6)
memory usage: 468.9 KB


In [None]:
# Examine the basic statistical information of the dataset
data.describe()

| | Hours Studied | Previous Scores | Extracurricular Activities | Sleep Hours | Sample Question Papers Practiced | Performance Index |
|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 4.9929 | 69.4457 | 0.4948 | 6.5306 | 4.5833 | 55.2248 |
| std | 2.589309 | 17.343152 | 0.499998 | 1.695863 | 2.867348 | 19.212558 |
| min | 1.0 | 40.0 | 0.0 | 4.0 | 0.0 | 10.0 |
| 25% | 3.0 | 54.0 | 0.0 | 5.0 | 2.0 | 40.0 |
| 50% | 5.0 | 69.0 | 0.0 | 7.0 | 5.0 | 55.0 |
| 75% | 7.0 | 85.0 | 1.0 | 8.0 | 7.0 | 71.0 |
| max | 9.0 | 99.0 | 1.0 | 9.0 | 9.0 | 100.0 |


In [9]:
# Display the top and bottom 5 rows of the dataset
pd.concat([data.head(), data.tail()])

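| | Hours Studied | Previous Scores | Extracurricular Activities | Sleep Hours | Sample Question Papers Practiced | Performance Index |
|---|---|---|---|---|---|---|
| 0 | 7 | 99 | 1 | 9 | 1 | 91 |
| 1 | 4 | 82 | 0 | 4 | 2 | 65 |
| 2 | 8 | 51 | 1 | 7 | 2 | 45 |
| 3 | 5 | 52 | 1 | 5 | 2 | 36 |
| 4 | 7 | 75 | 0 | 8 | 5 | 66 |
| 9995 | 1 | 49 | 1 | 4 | 2 | 23 |
| 9996 | 7 | 64 | 1 | 8 | 5 | 58 |
| 9997 | 6 | 83 | 1 | 8 | 5 | 74 |
| 9998 | 9 | 97 | 1 | 7 | 0 | 95 |
| 9999 | 7 | 74 | 0 | 8 | 1 | 64 |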


### Select a Unique Randomization Seed
Select a unique integer that will serve as the seed for your randomization.

In [None]:
###
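
As a minimal sketch of how a fixed seed makes later steps reproducible, the snippet below builds two NumPy generators from the same seed and confirms they produce identical draws. The value `2024` and the name `SEED` are placeholders, not part of the assignment; substitute your own unique integer.

```python
import numpy as np

SEED = 2024  # hypothetical value; replace with your own unique integer

# Two generators built from the same seed yield the same sequence,
# which is what makes seeded sampling repeatable.
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

draws_a = rng_a.integers(0, 10_000, size=5)
draws_b = rng_b.integers(0, 10_000, size=5)

print(draws_a)
print(np.array_equal(draws_a, draws_b))
```

The same integer can be passed as `random_state` to pandas or scikit-learn functions that accept one.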

### Sample Train Data
Randomly sample a subset of the data using the seed you selected; limit the sample to 30 observations. Ensure that the sample is representative of the population.

In [None]:
###
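
One possible approach, sketched under assumptions: draw 30 rows with `DataFrame.sample` and compare the sample means with the population means as a rough representativeness check. The frame `demo_data` is a synthetic stand-in so the snippet runs on its own; in the notebook you would use the loaded `data` instead. Treating "representative" as "similar summary statistics" is an interpretation, not something the assignment specifies.

```python
import numpy as np
import pandas as pd

SEED = 2024  # hypothetical seed; use your own

# Synthetic stand-in for the real dataset (illustration only).
rng = np.random.default_rng(0)
demo_data = pd.DataFrame({
    "Hours Studied": rng.integers(1, 10, size=10_000),
    "Previous Scores": rng.integers(40, 100, size=10_000),
    "Performance Index": rng.integers(10, 101, size=10_000),
})

# Draw 30 rows without replacement, reproducibly.
train = demo_data.sample(n=30, random_state=SEED)

# Rough check: how far are the sample means from the population means?
comparison = pd.DataFrame({
    "population": demo_data.mean(),
    "sample": train.mean(),
})
comparison["abs_diff"] = (comparison["population"] - comparison["sample"]).abs()
print(comparison)
```

If the differences look large for your seed, comparing standard deviations or histograms as well may help you judge whether the sample is acceptable.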

### Weight Update Function
Build a weight update function following the Gradient Descent concept.

In [None]:
###
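
A minimal sketch of a batch gradient descent step, assuming mean squared error as the loss and a design matrix whose first column is all ones for the intercept. The function name `update_weights`, the learning rate, and the toy data are illustrative choices, not requirements of the lab.

```python
import numpy as np

def update_weights(X, y, weights, learning_rate):
    """One batch gradient descent step on mean squared error.

    X must already include a leading column of ones for the intercept.
    """
    n = len(y)
    residuals = X @ weights - y               # prediction minus target
    gradient = (2.0 / n) * (X.T @ residuals)  # d(MSE)/d(weights)
    return weights - learning_rate * gradient

# Toy, noise-free data generated from y = 1 + 2x (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

weights = np.zeros(2)
for _ in range(2000):
    weights = update_weights(X, y, weights, learning_rate=0.05)

print(weights)  # approaches [1, 2]
```

On the actual features, whose scales differ widely (Previous Scores reaches 99 while Extracurricular Activities is 0 or 1), a step size like this would likely diverge unless the features are standardized or the learning rate is much smaller.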

### Display the Values of Weights
Print the values of the weights at each iteration, using separate cells.

In [None]:
###
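
One way to make the weights printable is to store a copy after every update. The sketch below assumes the same mean squared error update as a typical gradient descent step and uses toy data; the names `history` and `update_weights` are illustrative. In a notebook, the body of the final loop could be split so that each weight is printed from its own cell.

```python
import numpy as np

def update_weights(X, y, weights, learning_rate):
    """One batch gradient descent step on mean squared error."""
    residuals = X @ weights - y
    return weights - learning_rate * (2.0 / len(y)) * (X.T @ residuals)

# Toy data from y = 1 + 2x (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

weights = np.zeros(2)
history = [weights.copy()]          # iteration 0 holds the initial weights
for _ in range(5):
    weights = update_weights(X, y, weights, 0.05)
    history.append(weights.copy())
history = np.array(history)         # shape: (iterations + 1, number of weights)

# Print one block per weight, one line per iteration.
for j in range(history.shape[1]):
    print(f"w{j}:")
    for i, value in enumerate(history[:, j]):
        print(f"  iteration {i}: {value:.4f}")
```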

### Plot the Value of Weights
Display line charts showing how the weight values change across iterations. For each weight, show an individual line chart of its value against the iteration number.

In [None]:
###
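
A sketch of one chart per weight using Matplotlib subplots. The `history` array here is made up purely to give the plot something to draw; in practice it would be the per-iteration weights recorded during training. The labels are likewise placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical weight history (rows = iterations, columns = weights).
iterations = np.arange(50)
history = np.column_stack([
    1.0 - np.exp(-0.1 * iterations),
    2.0 * (1.0 - np.exp(-0.2 * iterations)),
])
labels = ["w0 (intercept)", "w1"]

# One subplot per weight, sharing the iteration axis.
fig, axes = plt.subplots(len(labels), 1, figsize=(6, 3 * len(labels)), sharex=True)
for ax, values, label in zip(axes, history.T, labels):
    ax.plot(iterations, values)
    ax.set_ylabel(label)
    ax.set_title(f"{label} per iteration")
axes[-1].set_xlabel("Iteration")
fig.tight_layout()
```

In a Jupyter notebook the figure is normally rendered when the cell finishes; in a plain script you would call `plt.show()` or `fig.savefig(...)`.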

### Build a Function for the Final Regression Model
Create a function that implements the final regression model obtained after all your iterations. Display its mathematical expression, with each final weight multiplied by its input variable.

In [None]:
###
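
The sketch below shows one possible shape for the final model: a `predict` function plus a helper that renders the expression as text. The weights in `final_weights` are invented for illustration and only two of the five features are used to keep it short; your own values come from your gradient descent run.

```python
import numpy as np

feature_names = ["Hours Studied", "Previous Scores"]  # subset, for brevity
final_weights = np.array([-30.0, 2.5, 1.0])  # hypothetical: [intercept, w1, w2]

def predict(X, weights=final_weights):
    """Linear model: intercept plus weighted sum of the feature columns."""
    X = np.asarray(X, dtype=float)
    return weights[0] + X @ weights[1:]

def model_expression(weights, names, target="Performance Index"):
    """Render the fitted model as a readable equation."""
    terms = [f"{weights[0]:.4f}"]
    terms += [f"({w:.4f} * {name})" for w, name in zip(weights[1:], names)]
    return f"{target} = " + " + ".join(terms)

print(model_expression(final_weights, feature_names))
print(predict([[7, 99]]))
```

If the features were standardized before training, the same scaling has to be applied to inputs before calling `predict`, or the weights must be converted back to the original units.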

### Sample Test Data
From the remainder of the original dataset, randomly sample another set of 30 observations NOT present in your training sample.

In [None]:
###
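
One way to guarantee that test rows do not overlap the training rows is to drop the training index before sampling again. This sketch uses a synthetic `demo_data` frame so it runs standalone; the seed and column names are placeholders.

```python
import numpy as np
import pandas as pd

SEED = 2024  # hypothetical seed; use your own

# Synthetic stand-in for the real dataset (illustration only).
rng = np.random.default_rng(0)
demo_data = pd.DataFrame({
    "Hours Studied": rng.integers(1, 10, size=10_000),
    "Performance Index": rng.integers(10, 101, size=10_000),
})

train = demo_data.sample(n=30, random_state=SEED)

# Remove the training rows, then sample the test set from what is left.
remainder = demo_data.drop(index=train.index)
test = remainder.sample(n=30, random_state=SEED)

# Number of row labels shared by the two samples.
print(len(train.index.intersection(test.index)))
```

This excludes rows by index label. Two different students can still have identical values in every column, so whether such value-level duplicates count as "present in the training sample" is a matter of interpretation.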

### Use the Regression Function for Prediction
Use the linear regression function you built to predict the target variable for your test set.

In [None]:
###
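
A minimal sketch of applying a fitted linear function to a test frame. The three rows are taken from the head of the dataset shown earlier, restricted to two features; the weights are invented for illustration and are not a fitted result.

```python
import numpy as np
import pandas as pd

feature_cols = ["Hours Studied", "Previous Scores"]
final_weights = np.array([-30.0, 2.5, 1.0])  # hypothetical: [intercept, w1, w2]

def predict(X, weights):
    """Linear model: intercept plus weighted sum of the feature columns."""
    return weights[0] + np.asarray(X, dtype=float) @ weights[1:]

# Small illustrative test frame (first three rows of the dataset).
test = pd.DataFrame({
    "Hours Studied": [7, 4, 8],
    "Previous Scores": [99, 82, 51],
    "Performance Index": [91, 65, 45],
})

# Keep predictions next to the actual values for easy comparison.
features = test[feature_cols].to_numpy(dtype=float)
test["Predicted"] = predict(features, final_weights)
print(test[["Performance Index", "Predicted"]])
```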

### Calculate the Errors
Calculate the overall error between your model’s predictions and the actual values in the test set.

In [None]:
###
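
The assignment does not name a specific error metric, so the sketch below computes three common choices: mean squared error, its square root, and mean absolute error. The arrays are illustrative values, not actual model output.

```python
import numpy as np

y_true = np.array([91.0, 65.0, 45.0])  # actual Performance Index values
y_pred = np.array([86.5, 62.0, 41.0])  # hypothetical predictions

errors = y_pred - y_true

mse = np.mean(errors ** 2)       # mean squared error
rmse = np.sqrt(mse)              # same units as the target
mae = np.mean(np.abs(errors))    # mean absolute error

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE:  {mae:.4f}")
```

RMSE and MAE are expressed in Performance Index points, which usually makes them easier to interpret than MSE.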