# COVID-19 Survey Linear Regression

Let's fit a multiple linear regression model to the COVID-19 survey data. The model can take any number of input variables. Here we use step count and stress level as inputs and sleep latency (in minutes) as the output. Once fitted, the model can predict sleep latency for any combination of step count and stress level.
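
Written out, a linear model with these two inputs has the form below. The symbols $\beta_0$, $\beta_1$, $\beta_2$ and $\varepsilon$ are notation introduced here for illustration; they are not names from the data set.

$$
\text{Latency} = \beta_0 + \beta_1 \cdot \text{Steps} + \beta_2 \cdot \text{Stress} + \varepsilon
$$

Fitting by ordinary least squares, which is what `LinearRegression` does, chooses $\beta_0$, $\beta_1$ and $\beta_2$ so that the sum of squared residuals $\varepsilon$ over all survey rows is as small as possible.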

In [42]:
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd

# Load preprocessed data
filename = 'covid_data_preprocessed.csv'
df = pd.read_csv(filename)
# Choose which variables to include in the analysis
df_tmp = df[['Steps', 'Stress', 'Latency']]

# Latency is the output, others are inputs
x = df_tmp.drop(columns='Latency')
y = df_tmp['Latency']

# Fit the model
model = LinearRegression().fit(x, y)

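def predict_latency(x, model):
    # Rebuild a frame with the training column names so that
    # scikit-learn does not warn about missing feature names
    x_tmp = pd.DataFrame(np.array(x).reshape(-1, 2), columns=['Steps', 'Stress'])
    y_pred = model.predict(x_tmp)[0]
    # Latency cannot be negative, so clip negative predictions to 0
    return max(y_pred, 0)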

# Predict latency from steps and stress
steps = 7000
stress_level = 5
# Use a new name so the training inputs in x are not overwritten
x_new = [steps, stress_level]
pred = predict_latency(x_new, model)
print('Predicted sleep latency in minutes: ', round(pred, 2))

Predicted sleep latency in minutes:  13.03
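
A single prediction says little about how each input contributes. As a rough sketch, the fitted coefficients and the R² score can be read from the model as shown below. The survey file is not reproduced here, so the example fits a separate `toy_model` on synthetic, noise-free data; the numbers `true_intercept`, `true_b_steps` and `true_b_stress` are made up for illustration and are not results from the survey.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the survey data (made-up numbers, NOT the real survey)
rng = np.random.default_rng(0)
n = 200
steps = rng.uniform(1000, 15000, size=n)
stress = rng.integers(1, 11, size=n).astype(float)

# Hypothetical "true" relationship, used only to generate the toy data
true_intercept, true_b_steps, true_b_stress = 10.0, -0.0005, 1.5
latency = true_intercept + true_b_steps * steps + true_b_stress * stress

# Inputs as columns, in the same order as in the notebook: steps, stress
X = np.column_stack([steps, stress])
toy_model = LinearRegression().fit(X, latency)

# coef_ holds one slope per input column, in column order;
# intercept_ is the constant term of the model
b_steps, b_stress = toy_model.coef_
print('Intercept:', round(toy_model.intercept_, 3))
print('Change in latency per extra step:', round(b_steps, 6))
print('Change in latency per stress point:', round(b_stress, 3))

# R^2 on the training data; it is close to 1 here only because
# the toy data contains no noise
r2 = toy_model.score(X, latency)
print('R^2:', round(r2, 3))
```

The same attributes (`coef_`, `intercept_`) and the `score` method are available on the `model` fitted above. On real survey data the R² will typically be well below 1, and it is worth checking before trusting individual predictions.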


