### Group Mustang Pandan Hijau:
1. Keith Cahyawiyanata (2141720217)
2. Khafillah Akbar Syahputra (2141720152)
3. Muhammad Asad (2141720269)
4. Mumtaz Zain Abdullah (2141720205)
5. Septian Fahmi Ardiansyah (2141720148)

To train and evaluate a regression model that predicts the "wearing-off" of anti-Parkinson's disease medication, we need suitable data. Specifically, we need a dataset with features (independent variables) that influence how quickly a medication wears off, and a target variable (dependent variable) that measures the wearing-off.

Before building the model, the data must be examined and prepared thoroughly. This may involve handling missing values, scaling variables where necessary, and checking the independent variables for multicollinearity.
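The missing-value check mentioned above can be sketched as follows. The small DataFrame is hypothetical and stands in for the real dataset. Median imputation with scikit-learn's `SimpleImputer` is one possible choice, not necessarily the right one for this data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy frame standing in for the real dataset;
# column names follow the variable list in this notebook.
df = pd.DataFrame({
    "heart_rate": [72.0, np.nan, 80.0, 95.0],
    "steps": [120.0, 0.0, np.nan, 300.0],
    "wearing_off": [0, 1, 0, 1],
})

# 1) Inspect: how many values are missing in each column?
print(df.isna().sum())

# 2a) Option A: drop every row that has a missing value.
dropped = df.dropna()

# 2b) Option B: fill each feature's gaps with that column's median.
features = ["heart_rate", "steps"]
imputer = SimpleImputer(strategy="median")
imputed = df.copy()
imputed[features] = imputer.fit_transform(df[features])

print(dropped.shape, imputed.isna().sum().sum())
```

On the real data, the imputer should be fitted on the training split only and then reused to transform the test split, so that test-set statistics do not leak into training.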

Once the data has been preprocessed, we can build the regression model. The `wearing_off` column is the target variable, and the candidate independent variables are:

- `heart_rate`: the heart rate of the participant.
- `steps`: the number of steps taken by the participant.
- `stress_score`: a score indicating the participant's stress level.
- `awake`: time spent awake during the sleep period.
- `deep`: time spent in deep sleep.
- `light`: time spent in light sleep.
- `rem`: time spent in REM sleep.
- `nonrem_total`: total time spent in non-REM sleep.
- `total`: total sleep time.
- `nonrem_percentage`: percentage of sleep spent in non-REM sleep.
- `sleep_efficiency`: sleep efficiency of the participant.
- `time_from_last_drug_taken`: time elapsed since the last drug intake.
- `wo_duration`: duration of the wearing-off effects.
- `timestamp_hour`: hour of the day when the data was recorded.
- `timestamp_dayofweek`: day of the week when the data was recorded.
- `timestamp_hour_sin`: sine transformation of the hour variable.
- `timestamp_hour_cos`: cosine transformation of the hour variable.
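The list describes `timestamp_hour_sin` and `timestamp_hour_cos` only as sine and cosine transformations of the hour. A common way to build such features, assumed here because the dataset's exact formula is not stated, maps the hour onto a 24-hour circle with $\sin(2\pi \cdot \text{hour}/24)$ and $\cos(2\pi \cdot \text{hour}/24)$, so that 23:00 and 00:00 end up next to each other instead of 23 units apart.

```python
import numpy as np
import pandas as pd

# Assumption: hours run from 0 to 23, so one full cycle is 24 hours.
hours = pd.Series([0, 6, 12, 18, 23], name="timestamp_hour")

timestamp_hour_sin = np.sin(2 * np.pi * hours / 24)
timestamp_hour_cos = np.cos(2 * np.pi * hours / 24)

# Raw encoding: 23:00 and 00:00 are 23 units apart.
dist_raw = abs(23 - 0)
# Circular encoding: the same two hours are neighbours on the unit circle.
dist_circle = np.hypot(timestamp_hour_sin[4] - timestamp_hour_sin[0],
                       timestamp_hour_cos[4] - timestamp_hour_cos[0])
print(dist_raw, round(dist_circle, 3))
```

Both columns are needed: the sine alone maps, for example, 06:00 and 18:00 to the same magnitude with opposite sign but cannot separate 03:00 from 09:00, whereas the (sin, cos) pair identifies each hour uniquely.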

We now build a multiple linear regression model from these variables. First, make sure the data has been preprocessed correctly, including any necessary transformations and the handling of missing values.

The regression model can be built in the following steps:

1. Handle Missing Values: Check whether any of the relevant variables (`heart_rate`, `steps`, `stress_score`, `awake`, `deep`, `light`, `rem`, `nonrem_total`, `total`, `nonrem_percentage`, `sleep_efficiency`, `time_from_last_drug_taken`, `wo_duration`) contain missing values. If any are found, either drop the affected rows or impute the missing values with an appropriate method.

2. Check for Multicollinearity: Examine the correlation matrix of the independent variables for signs of multicollinearity. Multicollinearity should be addressed because it inflates the variance of the coefficient estimates, making them unstable and hard to interpret. If it is present, consider removing one variable from each highly correlated group, or apply a dimensionality reduction method such as Principal Component Analysis (PCA).

3. Split the Data: Divide the dataset into a training set and a testing set. The regression model is trained on the training set, and its performance is assessed on the testing set.

4. Scale the Variables: If necessary, standardize the variables so that each has a mean of 0 and a standard deviation of 1. This is useful when the variables are on different scales; whether it is required depends on the situation and on how the coefficients will be used. The scaling parameters should be computed from the training set only.

5. Build the Regression Model: Fit the multiple linear regression model on the training data, using a Python package such as scikit-learn or statsmodels. The model equation has the form:

    $$\text{wearing\_off} = b_0 + b_1 \cdot \text{heart\_rate} + b_2 \cdot \text{steps} + \dots + b_n \cdot \text{timestamp\_hour\_cos}$$

    Here, $b_0$ is the intercept and $b_1, b_2, \dots, b_n$ are the coefficients of the independent variables.

6. Evaluate the Model: After training, measure the model's performance on the testing set. Mean squared error (MSE), mean absolute error (MAE), and R-squared are commonly used metrics for regression models.

7. Interpret the Model: Examine the coefficients ($b_1, b_2, \dots, b_n$) to understand how each independent variable relates to `wearing_off`. A positive coefficient means that, with the other variables held constant, a higher value of that variable is associated with a higher predicted `wearing_off`; a negative coefficient means the opposite. Use p-values or confidence intervals to judge whether each coefficient is statistically significant. scikit-learn's `LinearRegression` does not report these, whereas the OLS results in statsmodels do.
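For step 2, a correlation matrix only shows pairwise relationships. The variance inflation factor (VIF) also catches a variable that is a near-linear combination of several others, which may happen among sleep variables such as `deep`, `light`, `rem` and `total`. The sketch below uses hypothetical synthetic data and computes VIF with scikit-learn as $1/(1-R_j^2)$, where $R_j^2$ comes from regressing column $j$ on all other columns; statsmodels also provides `variance_inflation_factor` for the same purpose.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the rest."""
    vifs = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = LinearRegression().fit(others, X[col]).score(others, X[col])
        vifs[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF")


# Hypothetical synthetic data: 'total' is built as roughly deep + light + rem,
# mimicking a total that is the sum of its parts.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "deep": rng.normal(60, 10, n),
    "light": rng.normal(200, 30, n),
    "rem": rng.normal(90, 15, n),
    "steps": rng.normal(500, 100, n),
})
X["total"] = X["deep"] + X["light"] + X["rem"] + rng.normal(0, 1, n)

print(X.corr().round(2))      # pairwise correlations
print(vif_table(X).round(1))  # a VIF above roughly 5 to 10 is a common warning sign
```

Here the sleep columns get very large VIFs while `steps` stays close to 1, even though no single pairwise correlation with `total` is close to 1. The 5 to 10 threshold is a rule of thumb, not a strict cut-off.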

## Code

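```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# mean_absolute_error is needed for the MAE computed in the evaluation cell
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
```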

#### Load the dataset (assuming it's stored in a CSV file)

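```python
# pd.read_csv cannot parse the Google Sheets "/edit?usp=sharing" page (it is HTML),
# so read the sheet through its CSV export link instead.
# Assumes the sheet is shared publicly.
sheet_id = '1PFtS5WstPA-z5k8kDqgs1m4Ecj8KmH64z_J0XVoXf6o'
data = pd.read_csv(f'https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv')
```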

#### Split the data into training and testing sets

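```python
# Independent variables (a subset of the candidates listed above) and the target
X = data[['heart_rate', 'steps', 'stress_score', 'awake', 'deep', 'light', 'rem', 'nonrem_total', 'total',
          'nonrem_percentage', 'sleep_efficiency', 'time_from_last_drug_taken']]
y = data['wearing_off']

# Hold out 20% of the rows for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```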

#### Build the regression model

```python
# Ordinary least squares fit on the training set
model = LinearRegression()
model.fit(X_train, y_train)
```
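Step 4 (scaling) is not applied in the cell above. One way to add it is to wrap `StandardScaler` and `LinearRegression` in a scikit-learn pipeline, so that the scaler's mean and standard deviation are learned from the training data only. The sketch below uses hypothetical synthetic data on very different scales in place of `X_train` and `y_train`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic features on very different scales.
rng = np.random.default_rng(42)
X = np.column_stack([rng.normal(70, 10, 300),      # heart-rate-like values
                     rng.normal(5000, 2000, 300)])  # step-count-like values
y = 0.02 * X[:, 0] - 0.0001 * X[:, 1] + rng.normal(0, 0.1, 300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# fit() learns the scaler's mean/std from X_train only, then fits the regression.
scaled_model = make_pipeline(StandardScaler(), LinearRegression())
scaled_model.fit(X_train, y_train)

plain_model = LinearRegression().fit(X_train, y_train)

# OLS predictions do not change under standardization;
# only the coefficients are re-expressed per standard deviation of each feature.
print(np.allclose(scaled_model.predict(X_test), plain_model.predict(X_test)))
print(scaled_model[-1].coef_)
```

For plain least squares, standardizing does not change the predictions, so the main benefit is that the coefficient magnitudes become comparable across variables. It matters more for regularized models such as ridge or lasso regression.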

#### Make predictions on the testing set

```python
y_pred = model.predict(X_test)
```
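A fitted `LinearRegression` predicts by applying the model equation $b_0 + b_1 x_1 + \dots + b_n x_n$, with $b_0$ stored in `intercept_` and $b_1, \dots, b_n$ in `coef_`. The sketch below checks this on hypothetical toy data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: 4 samples x 2 features.
X_toy = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0]])
y_toy = np.array([5.0, 4.0, 7.0, 12.0])

toy = LinearRegression().fit(X_toy, y_toy)

# Apply the model equation by hand: b0 + b1*x1 + b2*x2 for every row.
manual = toy.intercept_ + X_toy @ toy.coef_
print(np.allclose(manual, toy.predict(X_toy)))  # True
```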

#### Evaluate the model

```python
# Compare predictions with the held-out targets
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Mean Absolute Error:", mae)
print("R-squared:", r2)
```
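The three metrics follow directly from the residuals $y_i - \hat{y}_i$: MSE averages their squares, MAE averages their absolute values, and $R^2 = 1 - \sum (y_i - \hat{y}_i)^2 / \sum (y_i - \bar{y})^2$. The sketch below computes them by hand on hypothetical values and checks the results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true values and predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)     # mean of squared errors
mae = np.mean(np.abs(residuals))  # mean of absolute errors
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mse, mae, round(r2, 4))
print(np.isclose(mse, mean_squared_error(y_true, y_hat)),
      np.isclose(mae, mean_absolute_error(y_true, y_hat)),
      np.isclose(r2, r2_score(y_true, y_hat)))
```

On a test set, R-squared can be negative: this happens when the model predicts worse than simply using the mean of the test targets.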


#### Interpret the model - examine the coefficients

```python
# One row per independent variable with its fitted coefficient
coefficients = pd.DataFrame({'Variable': X.columns, 'Coefficient': model.coef_})
print(coefficients)
```

#### Chart of the coefficients

```python
import matplotlib.pyplot as plt

# Bar chart of the fitted coefficients, one bar per variable
plt.figure(figsize=(10, 6))
plt.bar(coefficients['Variable'], coefficients['Coefficient'])
plt.xlabel('Variable')
plt.ylabel('Coefficient')
plt.title('Linear Regression Coefficients')
plt.xticks(rotation=45, ha='right')  # right-align the rotated labels under their bars
plt.tight_layout()                   # keep the rotated labels inside the figure
plt.show()
```

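#### Select the variables to plot

```python
# Steps on the x-axis and wearing-off on the y-axis, matching the scatter plot title
variable = data['steps']
wearing_off = data['wearing_off']
```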

#### Create the scatter plot

```python
plt.figure()
plt.scatter(variable, wearing_off)
plt.xlabel('Steps')
plt.ylabel('Wearing Off')
plt.title('Relationship between Steps and Wearing Off')
plt.show()
```

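#### Plot wearing-off against time since medication intake

```python
# Scatter wearing-off against the time elapsed since the last dose
plt.figure()
plt.scatter(data['time_from_last_drug_taken'], data['wearing_off'])
plt.xlabel("Time since Medication Intake (hours)")
plt.ylabel("Degree of Medication Wearing-off")
plt.title("Medication Wearing-off vs. Time")
plt.show()
```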