### **Linear Regression: Predicting Students' Scores**

#### Objective:
Your task is to build a **Linear Regression** model to predict students' scores based on the number of hours they study. This homework will help you understand the relationship between variables and how to use Linear Regression for prediction tasks.

---

#### Dataset:
You will work with a dataset containing the following columns:
- **Hours**: The number of hours a student studies per day.
- **Scores**: The percentage score obtained by the student in a test.

Dataset Source:
- Use the [Students' Scores dataset](https://raw.githubusercontent.com/AdiPersonalWorks/Random/master/student_scores%20-%20student_scores.csv) or create your own simple dataset with similar columns.

---

#### Steps to Complete:

1. **Data Loading and Exploration**  
   - Load the dataset using `pandas`.
   - Visualize the relationship between **Hours** and **Scores** using a scatter plot.
   - Check for outliers, missing values, or anomalies in the dataset.
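
The exploration step above can be sketched as follows. This is a minimal example under stated assumptions: the rows in `csv_text` are made-up stand-ins, not the real dataset, and the 1.5 × IQR rule is just one common outlier heuristic. `pd.read_csv` also accepts a URL or a file path in place of the in-memory buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs as a plain script
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative rows (assumption), NOT the real dataset.
csv_text = """Hours,Scores
1.5,20
2.0,24
3.0,33
3.5,35
4.5,47
5.0,50
6.0,62
7.0,70
8.0,81
9.0,88
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.describe())        # summary statistics for both columns
missing = df.isna().sum()   # number of missing values per column
print(missing)

# Flag potential outliers in Scores with the 1.5 * IQR rule.
q1, q3 = df["Scores"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["Scores"] < q1 - 1.5 * iqr) | (df["Scores"] > q3 + 1.5 * iqr)]
print(f"Potential outliers: {len(outliers)}")

# Scatter plot of Hours against Scores.
fig, ax = plt.subplots()
ax.scatter(df["Hours"], df["Scores"])
ax.set_xlabel("Hours studied per day")
ax.set_ylabel("Score (%)")
ax.set_title("Hours vs. Scores")
# plt.show()  # uncomment in an interactive session or notebook
```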

2. **Build a Linear Regression Model**  
   - Split the dataset into **training** and **test** sets (80% training, 20% testing).
   - Implement a **Simple Linear Regression** model using:
     - **Manual Calculation** (Optional): Calculate the slope $m$ and intercept $b$ using the formulas for linear regression:
       $$
       m = \frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sum{(x_i - \bar{x})^2}}, \quad b = \bar{y} - m\bar{x}
       $$
     - **Scikit-learn**: Use `LinearRegression` from `sklearn.linear_model` to fit the model.
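
As a rough illustration of this step, the sketch below computes the slope and intercept from the formulas above and compares them with `LinearRegression`. The `hours` and `scores` arrays are made-up toy values (an assumption, not the real dataset), and `random_state=42` is an arbitrary choice made only for reproducibility.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up toy data (assumption), standing in for the Hours and Scores columns.
hours = np.array([1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0])
scores = np.array([20, 24, 33, 35, 47, 50, 62, 70, 81, 88], dtype=float)

# 80% training, 20% testing.
x_train, x_test, y_train, y_test = train_test_split(
    hours, scores, test_size=0.2, random_state=42
)

# Manual calculation: m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), b = y_bar - m * x_bar
x_bar, y_bar = x_train.mean(), y_train.mean()
m = np.sum((x_train - x_bar) * (y_train - y_bar)) / np.sum((x_train - x_bar) ** 2)
b = y_bar - m * x_bar

# Scikit-learn expects a 2-D feature array, hence the reshape.
model = LinearRegression()
model.fit(x_train.reshape(-1, 1), y_train)

print(f"manual:  m = {m:.4f}, b = {b:.4f}")
print(f"sklearn: m = {model.coef_[0]:.4f}, b = {model.intercept_:.4f}")
```

Both approaches solve the same least-squares problem, so the two printed lines should agree up to floating-point error.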

3. **Model Evaluation**  
   - Predict the test set scores using the model.
   - Evaluate the model’s performance using:
     - **Mean Absolute Error (MAE)**
     - **Mean Squared Error (MSE)**
     - **R² Score**
   - Interpret the coefficients and intercept of the linear equation.
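
One possible way to compute these metrics is sketched below, using the functions in `sklearn.metrics`. The data is a made-up toy sample (an assumption), so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Made-up toy data (assumption), standing in for the Hours and Scores columns.
hours = np.array([1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0])
scores = np.array([20, 24, 33, 35, 47, 50, 62, 70, 81, 88], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    hours.reshape(-1, 1), scores, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out test set and score the predictions.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, R^2 = {r2:.3f}")

# The slope is the expected change in score per extra hour of study;
# the intercept is the predicted score at zero hours.
print(f"Scores ~ {model.coef_[0]:.2f} * Hours + {model.intercept_:.2f}")
```

With a test set this small, R² in particular can swing widely between random splits, so it is worth reading it alongside MAE and MSE rather than on its own.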

4. **Visualization**  
   - Plot the **regression line** over the scatter plot of **Hours** vs. **Scores**.
   - Visualize the residuals (actual minus predicted scores) to check for systematic patterns in the errors.
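
The two plots could be produced along these lines. This is a sketch only: the data is a made-up toy sample (an assumption), and the side-by-side layout and colors are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs as a plain script
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up toy data (assumption), standing in for the Hours and Scores columns.
hours = np.array([1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0])
scores = np.array([20, 24, 33, 35, 47, 50, 62, 70, 81, 88], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    hours.reshape(-1, 1), scores, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

fig, (ax_fit, ax_res) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: observed points with the fitted regression line on top.
ax_fit.scatter(hours, scores, label="Observed")
x_line = np.linspace(hours.min(), hours.max(), 100).reshape(-1, 1)
ax_fit.plot(x_line.ravel(), model.predict(x_line), color="red", label="Regression line")
ax_fit.set_xlabel("Hours")
ax_fit.set_ylabel("Scores")
ax_fit.legend()

# Right panel: residuals (actual minus predicted) on the test set.
residuals = y_test - model.predict(X_test)
ax_res.scatter(X_test.ravel(), residuals)
ax_res.axhline(0, color="gray", linestyle="--")
ax_res.set_xlabel("Hours")
ax_res.set_ylabel("Residual (actual - predicted)")

fig.tight_layout()
# plt.show()  # uncomment in an interactive session or notebook
```

Residuals scattered evenly around the zero line suggest a straight line is adequate; a curve or funnel shape would hint that it is not.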

5. **Prediction**  
   - Use the trained model to predict the score of a student who studies for 9.25 hours per day.
   - Discuss whether the prediction seems reasonable based on the dataset.
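
A hedged sketch of the prediction step follows. The data is a made-up toy sample (an assumption), and the range check is one simple way to judge reasonableness: a query outside the range of hours seen in training is an extrapolation and deserves more caution.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up toy data (assumption), standing in for the real dataset.
df = pd.DataFrame({
    "Hours": [1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0],
    "Scores": [20, 24, 33, 35, 47, 50, 62, 70, 81, 88],
})

X_train, X_test, y_train, y_test = train_test_split(
    df[["Hours"]], df["Scores"], test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Query with the same column name the model was trained on.
query = pd.DataFrame({"Hours": [9.25]})
predicted = float(model.predict(query)[0])

# Sanity checks: is the query inside the training range, and is the
# prediction a valid percentage?
in_range = bool(X_train["Hours"].min() <= 9.25 <= X_train["Hours"].max())
valid_percentage = 0.0 <= predicted <= 100.0

print(f"Predicted score for 9.25 hours: {predicted:.1f}")
print(f"Within training range of Hours: {in_range}")
print(f"Valid percentage: {valid_percentage}")
```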

---

#### Bonus Challenges (Optional):

1. **Polynomial Regression**:  
   - Extend the problem to a **Polynomial Regression** model by adding higher-degree terms (e.g., $x^2$) to the features. Does the model perform better on the test set?
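
One way to try this is with `PolynomialFeatures` inside a scikit-learn pipeline, as in the sketch below. The data is a made-up toy sample (an assumption), and degree 2 is only an example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up toy data (assumption), standing in for the Hours and Scores columns.
hours = np.array([1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0])
scores = np.array([20, 24, 33, 35, 47, 50, 62, 70, 81, 88], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    hours.reshape(-1, 1), scores, test_size=0.2, random_state=42
)

models = {
    "linear": LinearRegression(),
    # Adds an x^2 column before fitting an ordinary linear model.
    "quadratic": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
    ),
}

results = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    results[name] = {
        "train_mse": mean_squared_error(y_train, estimator.predict(X_train)),
        "test_mse": mean_squared_error(y_test, estimator.predict(X_test)),
    }
    print(name, results[name])
```

The quadratic model can never have a higher training error than the linear one, because it contains the linear model as a special case, so the test error is the number to compare.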

2. **Regularization**:  
   - Apply **Ridge** or **Lasso Regression** to reduce overfitting and compare the results with the simple linear regression model.
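
A minimal comparison might look like the sketch below. The data is a made-up toy sample (an assumption), and the `alpha` values are arbitrary starting points rather than tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up toy data (assumption), standing in for the Hours and Scores columns.
hours = np.array([1.5, 2.0, 3.0, 3.5, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0])
scores = np.array([20, 24, 33, 35, 47, 50, 62, 70, 81, 88], dtype=float)

X_train, X_test, y_train, y_test = train_test_split(
    hours.reshape(-1, 1), scores, test_size=0.2, random_state=42
)

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=1.0).fit(X_train, y_train)   # L2 penalty on the slope
lasso = Lasso(alpha=0.5).fit(X_train, y_train)   # L1 penalty on the slope

for name, est in [("OLS", ols), ("Ridge", ridge), ("Lasso", lasso)]:
    test_mse = mean_squared_error(y_test, est.predict(X_test))
    print(f"{name:5s} slope = {est.coef_[0]:.4f}, test MSE = {test_mse:.3f}")
```

Both penalties shrink the slope toward zero. With a single feature the effect is usually small; regularization tends to matter more once there are many or correlated features.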

3. **Multiple Linear Regression**:  
   - Expand the dataset to include additional features like:
     - **Previous Test Scores**
     - **Hours of Sleep**
     - **Time Spent on Social Media**
   - Fit a multiple linear regression model and evaluate its performance.
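
The sketch below shows the mechanics on simulated data. Everything about the data is an assumption: the column names, the value ranges, and the linear relationship used to generate `Scores` are invented purely so the model has something to recover.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Simulated features (assumption); none of this comes from a real dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Hours": rng.uniform(1, 9, n),
    "PreviousScore": rng.uniform(30, 95, n),
    "SleepHours": rng.uniform(4, 9, n),
    "SocialMediaHours": rng.uniform(0, 6, n),
})
# Invented relationship plus noise, chosen only for illustration.
df["Scores"] = (
    4.0 * df["Hours"]
    + 0.5 * df["PreviousScore"]
    + 1.5 * df["SleepHours"]
    - 2.0 * df["SocialMediaHours"]
    + rng.normal(0, 3, n)
)

features = ["Hours", "PreviousScore", "SleepHours", "SocialMediaHours"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["Scores"], test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

# One coefficient per feature: the change in score per unit of that
# feature with the other features held fixed.
coefficients = dict(zip(features, model.coef_))
for name, value in coefficients.items():
    print(f"{name:18s} {value:+.3f}")
print(f"Test R^2 = {r2:.3f}")
```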

4. **Custom Dataset**:  
   - Create a new dataset with real-world scenarios (e.g., salary vs. years of experience) and apply linear regression to analyze it.

---

#### Deliverables:
- A Python script or Jupyter Notebook containing:
  - Data loading, preprocessing, and visualization.
  - Implementation of Simple Linear Regression.
  - Model evaluation and insights.
  - Predictions and their interpretation.
- A brief report discussing:
  - What factors influenced the model's predictions?
  - What are the limitations of using linear regression for this dataset?

---

#### Useful Hints:
- Use `numpy` for manual calculations and matrix operations.
- Use `pandas` for data manipulation and `matplotlib`/`seaborn` for visualizations.
- Test the effect of adding/removing features (if extended to multiple regression).