# Regression Model Project
## Project Overview

This project focuses on building and validating a regression model using the provided dataset. The data was divided into two groups, referred to as "group1" and "group2." The regression model for both groups was assumed to be a polynomial function involving five variables: `time`, `var1`, `var2`, `var3`, and `var4`.

## Model Assumptions and Approach

- **Polynomial Regression Model**: The model is a polynomial in the five variables listed above. The exponent of each variable was chosen to maximize the R² (R-squared) value of the fit while keeping the p-values of the individual coefficients as low as possible.

- **Ordinary Least Squares (OLS) Method**: The model was fitted with the Ordinary Least Squares (OLS) estimator from the `statsmodels.api` package in Python.

## Data Splitting

The dataset was split into training and testing sets to evaluate the model's performance. Two sample counts are reported for each group:
- **Training Data Count**: The number of samples used for training the model.
- **Testing Data Count**: The number of samples used for testing the model.
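A per-group split and count summary might look like the sketch below. The column name `group`, the 80/20 ratio, and the `random_state` are assumptions made for illustration; the data is synthetic, and the split proportions actually used in the project are not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a hypothetical "group" column.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["group1", "group2"], [120, 80]),
    "time": rng.uniform(0, 10, 200),
    "y": rng.normal(0, 1, 200),
})

splits = {}
counts = {}
for name, part in df.groupby("group"):
    # Assumed 80/20 split; a fixed random_state makes the split reproducible.
    train, test = train_test_split(part, test_size=0.2, random_state=42)
    splits[name] = (train, test)
    counts[name] = {"train": len(train), "test": len(test)}

# One row per group, with the training and testing counts as columns.
summary = pd.DataFrame(counts).T
print(summary)
```

Splitting each group separately keeps the two models' test sets independent, which matches fitting one regression per group.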

## Results and Visualization

- **Analysis Results**: The results of the regression analysis are presented in the accompanying report.
- **Scatter Plots**: Scatter plots of predicted vs. actual values on the test data are provided in the appendices. Each point is colored on a gradient according to its residual, so regions where the model fits poorly are easy to spot.
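The evaluation described above could be sketched as follows, using the same metric functions the notebook imports (`mse`, `rmse`, `mean_absolute_error`). The arrays `y_test` and `y_pred` are synthetic placeholders rather than the project's results, and the colormap, the use of the absolute residual for coloring, and the output filename are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error
from statsmodels.tools.eval_measures import mse, rmse

# Synthetic placeholders for the test targets and the model's predictions.
rng = np.random.default_rng(2)
y_test = rng.uniform(0, 100, 150)
y_pred = y_test + rng.normal(0, 5, 150)

residuals = y_test - y_pred

# Error metrics on the test data.
print(f"MSE:  {mse(y_test, y_pred):.3f}")
print(f"RMSE: {rmse(y_test, y_pred):.3f}")
print(f"MAE:  {mean_absolute_error(y_test, y_pred):.3f}")

# Predicted vs. actual, with each point colored by the size of its residual.
fig, ax = plt.subplots(figsize=(6, 6))
points = ax.scatter(y_test, y_pred, c=np.abs(residuals), cmap="viridis", s=20)

# Reference line: points on it are predicted exactly.
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, color="grey", linestyle="--", label="perfect prediction")

ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
fig.colorbar(points, ax=ax, label="|residual|")
ax.legend()
fig.savefig("pred_vs_actual.png", dpi=150)  # hypothetical output filename
```

Coloring by the absolute residual highlights how far each point is from the reference line. Coloring by the signed residual with a diverging colormap would instead show whether the model over- or under-predicts.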

## Conclusion

This project applies polynomial regression using OLS to model the relationship between the dependent variable and five independent variables across two distinct groups in the dataset. The performance metrics and visualizations help in understanding the model's accuracy and predictive power.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

from sklearn.model_selection import train_test_split
from statsmodels.tools.eval_measures import mse, rmse
from sklearn.metrics import mean_absolute_error