# Mood Trend Prediction

A notebook that takes your mood data exported from the Daylio app and makes predictions from it using a machine learning model.

## Problem Statement

**Objective:** Predict the user's mood over the next few days. <br> A dashboard summarizing these predictions could also help; it is aimed at people with bipolar disorder.

**Performance Measure:** RMSE and MAE

**Assumption:** This is treated as a regression problem: the data is recorded daily, so each day's mood can be mapped to a numeric scale and its pattern over time predicted to help me manage my bipolar disorder.
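
As a minimal sketch of how the regression framing and the two metrics fit together, the snippet below maps the five mood labels to an ordinal scale and scores a naive "same as yesterday" forecast. The 1-to-5 mapping (`MOOD_SCALE`), the helper names `rmse` and `mae`, and the sample values are assumptions made for illustration; they are not part of the Daylio export.

```python
import numpy as np
import pandas as pd

# Assumed ordinal scale: the labels come from the data, the numbers are an
# illustrative choice.
MOOD_SCALE = {"awful": 1, "bad": 2, "meh": 3, "good": 4, "rad": 5}

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: the average size of a miss, in scale points."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

# Made-up moods in chronological order (oldest first).
labels = pd.Series(["good", "meh", "good", "bad", "good"])
scores = labels.map(MOOD_SCALE)

# Naive baseline: predict that today's mood equals yesterday's.
y_true = scores.iloc[1:].to_numpy()
y_pred = scores.shift(1).iloc[1:].to_numpy()

print("RMSE:", rmse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
```

A baseline like this gives any later model a number to beat on the same two metrics.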

## Data Preparation

### Scripts for loading the data

In [5]:
# load the data exported from the Daylio app
import pandas as pd
import numpy as np

# will be modified accordingly after deploying this
def load_mood_data():
    csv_path = "daylio_export_2024_08_14.csv"
    return pd.read_csv(csv_path)

mood = load_mood_data()

In [7]:
# prepare the data before pre-processing
def process_mood_data(mood):
  mood = mood.drop(["note_title", "note"], axis=1)
  mood["full_date"] = pd.to_datetime(mood["full_date"])
  mood = mood.set_index("full_date")
  return mood

mood = process_mood_data(mood)

### Check the data structure

In [8]:
# look at the data structure
mood.head()

full_date,date,weekday,time,mood,activities
2024-08-13,13 Aug,Tuesday,22:46,meh,relax | movies | hobby
2024-08-12,12 Aug,Monday,22:14,meh,grind
2024-08-11,11 Aug,Sunday,21:27,meh,sleep early | hobby
2024-08-10,10 Aug,Saturday,20:00,rad,grind
2024-08-09,9 Aug,Friday,20:22,awful,school | grind


In [9]:
mood.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 542 entries, 2024-08-13 to 2023-02-23
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   date        542 non-null    object
 1   weekday     542 non-null    object
 2   time        542 non-null    object
 3   mood        542 non-null    object
 4   activities  500 non-null    object
dtypes: object(5)
memory usage: 25.4+ KB


In [11]:
# check the entries for each day
mood['weekday'].value_counts()

weekday,count
Thursday,79
Monday,78
Friday,78
Tuesday,77
Sunday,77
Saturday,77
Wednesday,76


In [14]:
# check the mood counts per category
mood['mood'].value_counts()

mood,count
good,385
meh,129
bad,15
awful,7
rad,6


In [18]:
# check the activities associated to the mood
mood['activities'].value_counts()

activities,count
grind,86
school | grind,49
friends | school,38
friends | grind,26
school,24
...,...
friends | relax | shopping | school,1
reading | sleep early | self-love | hobby,1
family | sleep early | grind | hobby,1
friends | school | grind | hobby,1


In [19]:
# statistical summary of the data
mood.describe()

,date,weekday,time,mood,activities
count,542,542,542,542,500
unique,366,7,217,5,123
top,10 Aug,Thursday,23:59,good,grind
freq,3,79,81,385,86


### Create a Test Set before pre-processing

In [21]:
# create a test set
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def create_time_series_splits(data, n_splits=5, random_seed=None):
  """
  Creates time series splits for training and testing.

  Args:
    data: The time series data (pandas DataFrame or Series) with a DatetimeIndex.
    n_splits: The number of splits to create.
    random_seed: (Optional) Seeds NumPy's global random number generator for
      any downstream randomness. TimeSeriesSplit itself is deterministic, so
      the seed does not change the splits.

  Returns:
    A generator yielding tuples of (train_data, test_data) for each split.
  """
  if random_seed is not None:
    np.random.seed(random_seed)

  # The export is ordered newest-first; sort oldest-first so that every
  # training window comes before its test window in time.
  data = data.sort_index()

  tscv = TimeSeriesSplit(n_splits=n_splits)
  for train_index, test_index in tscv.split(data):
    train_data = data.iloc[train_index]
    test_data = data.iloc[test_index]
    yield train_data, test_data

# Example usage with a random seed for reproducibility:
for train_data, test_data in create_time_series_splits(mood, random_seed=42):
  # Train your model on train_data, evaluate on test_data
  print("Train data shape:", train_data.shape)
  print("Test data shape:", test_data.shape)

Train data shape: (92, 5)
Test data shape: (90, 5)
Train data shape: (182, 5)
Test data shape: (90, 5)
Train data shape: (272, 5)
Test data shape: (90, 5)
Train data shape: (362, 5)
Test data shape: (90, 5)
Train data shape: (452, 5)
Test data shape: (90, 5)


## Exploratory Data Analysis

In [None]:
# make a copy of the original

In [None]:
# visualize data

In [None]:
# look for correlations

In [None]:
# experiment with attribute combinations

## Data Preprocessing

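The `activities` column has missing values (500 non-null out of 542 entries). There are three options for handling them:

1. Get rid of the corresponding entries. <br>
2. Get rid of the whole attribute.<br>
3. Set the missing values to some value (zero, the mean, the median, etc.).<br>
This is called imputation.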
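
The three options can be sketched with pandas as follows. The small frame stands in for the real export, and the `"none"` placeholder is an assumption: for a text column such as `activities`, a placeholder string plays the role that zero or the median plays for a numeric column.

```python
import numpy as np
import pandas as pd

# Made-up rows with one missing activities value.
sample = pd.DataFrame(
    {
        "mood": ["good", "meh", "bad"],
        "activities": ["grind", np.nan, "school | grind"],
    }
)

# Option 1: drop the entries (rows) where activities is missing.
option_1 = sample.dropna(subset=["activities"])

# Option 2: drop the whole attribute (column).
option_2 = sample.drop(columns=["activities"])

# Option 3: imputation, here with an assumed placeholder string.
option_3 = sample.fillna({"activities": "none"})

print(len(option_1), list(option_2.columns), option_3["activities"].tolist())
```

Option 1 would create gaps in the daily series, which matters for a time series model, so options 2 and 3 may be the safer starting points here.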

In [None]:
# clean the data


In [None]:
# handle text and categorical attributes
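
One possible way to handle the `activities` text column, sketched under assumptions: split each entry on the `|` separator seen in the export and build one 0/1 column per activity (multi-hot encoding). The sample frame and the variable names are made up for illustration.

```python
import pandas as pd

# Made-up rows that mimic the "a | b | c" layout of the activities column.
sample = pd.DataFrame(
    {"activities": ["relax | movies | hobby", "grind", None, "school | grind"]}
)

# Split each entry into a list of activity names; missing entries become [].
tokens = sample["activities"].fillna("").apply(
    lambda entry: [part.strip() for part in entry.split("|") if part.strip()]
)

# One 0/1 column per distinct activity, in alphabetical order.
all_activities = sorted({name for row in tokens for name in row})
multi_hot = pd.DataFrame(
    {
        name: tokens.apply(lambda row, name=name: int(name in row))
        for name in all_activities
    },
    index=sample.index,
)

print(multi_hot)
```

With this encoding a missing entry becomes a row of zeros, which is one form of the imputation option.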

In [None]:
# feature scaling and transformation

In [None]:
# transformation pipelines

## Select and Train a Model

In [None]:
# use PyCaret for model selection

In [None]:
# train and evaluate on the training set

In [None]:
# better evaluation using cross-validation

## Fine Tuning

In [None]:
# grid search

In [None]:
# analyze best models and their errors

In [None]:
# evaluate your model on the test set