## Read the heart rate data from s1_laps_summary.csv to understand the runner's heart rate during a running session.

In [None]:
import pandas as pd

# Load the datasets
laps_summary_df = pd.read_csv('/mnt/data/s1_laps_summary.csv')
spotify_songs_df = pd.read_csv('/mnt/data/spotify_songs.csv')

# Display the first few rows of each dataset to understand their structure
laps_summary_head = laps_summary_df.head()
spotify_songs_head = spotify_songs_df.head()

laps_summary_head, spotify_songs_head


#### Laps Summary Dataset: Contains event_id, timestamp, heart_rate, time_elapsed, total_running_time, altitude, distance, speed, cadence, start_time, among other attributes related to running sessions. It tracks various metrics during running sessions, including the heart rate at different timestamps, which is crucial for the recommendation system.

#### Spotify Songs Dataset: Contains information about songs, including track_name, track_artist, track_popularity, track_album_id, track_album_name, track_album_release_date, playlist_name, playlist_id, playlist_genre, playlist_subgenre, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration_ms. The tempo field, representing the BPM of a song, will be particularly useful for matching songs to the runner's heart rate.

###### Given this information, a valuable contribution could be to develop a script that:
###### Calculates the average heart rate from the laps summary dataset.
###### Selects songs from the Spotify dataset with tempos matching the calculated average heart rate.
###### Generates a playlist of recommended songs based on the matching criteria.

#### I will now proceed to create a simple version of such a script. This script will:

#### Calculate the average heart rate of a given session.
#### Find songs with tempos close to this average heart rate.
#### Return a list of recommended songs.

In [None]:
# Calculate the average heart rate from the laps summary dataset
average_heart_rate = laps_summary_df['heart_rate'].mean()

# Now, let's find songs with tempos close to the average heart rate.
# We will consider songs within a range of -10 and +10 BPM of the average heart rate.

lower_tempo_limit = average_heart_rate - 10
upper_tempo_limit = average_heart_rate + 10

# Filter the Spotify songs dataset for songs within the tempo range
matching_songs_df = spotify_songs_df[(spotify_songs_df['tempo'] >= lower_tempo_limit) & (spotify_songs_df['tempo'] <= upper_tempo_limit)]

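# Select a sample of up to 5 matching songs to recommend.
# sample() raises an error if n exceeds the number of rows, so cap n at the number of matches.
if matching_songs_df.empty:
    recommended_songs_sample = "No matching songs found"
else:
    sample_size = min(5, len(matching_songs_df))
    recommended_songs_sample = matching_songs_df.sample(n=sample_size, random_state=1)[['track_name', 'track_artist', 'tempo']]

average_heart_rate, recommended_songs_sample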
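
A possible refinement, sketched here on made-up data: instead of sampling at random from the +/-10 BPM window, rank the matching songs by how close their tempo is to the target heart rate. The function name `recommend_by_tempo`, its parameters, and the toy songs are hypothetical; only the column names `track_name`, `track_artist`, and `tempo` come from the Spotify dataset.

```python
import pandas as pd

def recommend_by_tempo(songs, target_bpm, tolerance=10, n=5):
    """Return up to n songs whose tempo is within tolerance of target_bpm, closest first."""
    # Absolute distance between each song's tempo and the target BPM
    distance = (songs['tempo'] - target_bpm).abs()
    within = distance <= tolerance
    matches = songs[within].copy()
    matches['tempo_distance'] = distance[within]
    return matches.sort_values('tempo_distance').head(n)

# Toy data for illustration only (not taken from spotify_songs.csv)
toy_songs = pd.DataFrame({
    'track_name': ['A', 'B', 'C', 'D'],
    'track_artist': ['w', 'x', 'y', 'z'],
    'tempo': [120.0, 148.0, 155.0, 171.0],
})

toy_recommendations = recommend_by_tempo(toy_songs, target_bpm=150)
toy_recommendations[['track_name', 'track_artist', 'tempo']]
```

Ranking by distance makes the result deterministic and returns an empty DataFrame, rather than a string, when nothing matches.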


#### This was a very simple model, with no AI model or even any EDA, just to see how I should approach this project.

#### Now I will do a simple EDA so that I can implement an AI model to check the recommendations and calculate precision.
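
Since precision is the target metric, one way it could be defined for this project is precision at k: the share of the top k recommended songs that count as relevant. What "relevant" means is not stated here, so the sketch below assumes a song is relevant when its tempo is within 10 BPM of the runner's heart rate. The function name `precision_at_k` and the sample values are hypothetical.

```python
def precision_at_k(recommended_tempos, heart_rate, k=5, tolerance=10):
    """Fraction of the first k recommended tempos within tolerance of heart_rate."""
    top_k = recommended_tempos[:k]
    if not top_k:
        return 0.0  # no recommendations, so no relevant ones either
    # Assumption: a song is "relevant" if its tempo is within tolerance BPM of the heart rate
    hits = sum(1 for tempo in top_k if abs(tempo - heart_rate) <= tolerance)
    return hits / len(top_k)

# Illustrative values only
example_precision = precision_at_k([148, 152, 171, 120, 155], heart_rate=150)
```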

#### For a basic Exploratory Data Analysis (EDA), we'll look into both datasets you've provided: the laps summary data and the Spotify songs data. The goal is to understand the distributions, identify patterns or anomalies, and glean insights that could inform the development of your recommendation system.

In [None]:
# Data Overview - Quick summary statistics and structure

# Laps Summary Dataset Overview
laps_summary_description = laps_summary_df.describe()
laps_summary_df.info()  # info() prints its summary and returns None, so there is nothing to store

# Spotify Songs Dataset Overview
spotify_songs_description = spotify_songs_df.describe()
spotify_songs_df.info()  # info() prints its summary and returns None, so there is nothing to store

# Display the descriptions for numerical columns
laps_summary_description, spotify_songs_description


In [1]:
# Check for missing values and impute with the mean
from sklearn.impute import SimpleImputer

# Count the missing heart_rate values before imputation
missing_before_imputation = laps_summary_df['heart_rate'].isnull().sum()

# Creating an imputer object to fill missing values with the mean of the column
imputer = SimpleImputer(strategy='mean')

# Applying the imputer to the heart_rate column of the laps summary dataset
# (fit_transform returns a 2D array, so flatten it to one value per row)
laps_summary_df['heart_rate_imputed'] = imputer.fit_transform(laps_summary_df[['heart_rate']]).ravel()

# Check if there are any missing values left in the heart_rate_imputed column
missing_after_imputation = laps_summary_df['heart_rate_imputed'].isnull().sum()

# Display the counts before and after imputation
missing_before_imputation, missing_after_imputation
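
To make the effect of mean imputation concrete, here is a small sketch on a made-up heart rate column. It uses pandas `fillna` with the column mean, which fills gaps the same way as `SimpleImputer(strategy='mean')` for a single column; the `toy_laps` values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not taken from s1_laps_summary.csv)
toy_laps = pd.DataFrame({'heart_rate': [140.0, np.nan, 160.0]})

# Series.mean() skips NaN by default, so the mean here is (140 + 160) / 2
toy_mean = toy_laps['heart_rate'].mean()

# Replace each missing value with that mean
toy_laps['heart_rate_imputed'] = toy_laps['heart_rate'].fillna(toy_mean)
```

Mean imputation keeps the column average unchanged but reduces its spread, which is worth keeping in mind when reading histograms of the imputed column.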


### Key Takeaways for Recommendation System:
### Match Range: There is a good overlap between the heart rate range and the tempo range of songs, indicating potential for effective matching.
### Variability Consideration: The variability in both heart rates and song tempos suggests the need for a recommendation system that can adapt to a wide range of user activities and musical preferences.
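
One way to put a number on the overlap, as a rough sketch: compute the share of songs whose tempo falls between the lowest and highest observed heart rate. The helper `tempo_overlap_share` and the toy values are hypothetical; on the real data the inputs would be the `heart_rate` and `tempo` columns.

```python
import pandas as pd

def tempo_overlap_share(heart_rates, tempos):
    """Share of songs whose tempo lies between the lowest and highest heart rate (inclusive)."""
    low, high = heart_rates.min(), heart_rates.max()
    # between() is inclusive on both ends; the mean of the boolean mask is the share of True values
    return tempos.between(low, high).mean()

# Toy data for illustration only
toy_heart_rates = pd.Series([130, 145, 160, 172])
toy_tempos = pd.Series([90, 120, 135, 150, 170, 180])

overlap_share = tempo_overlap_share(toy_heart_rates, toy_tempos)
```

A share near 1 would mean almost every song is a candidate for some heart rate in the session, while a low share would mean most of the catalogue is never matched.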

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style of seaborn
sns.set_style("whitegrid")

# Plot distributions
fig, ax = plt.subplots(1, 2, figsize=(14, 5))

# Distribution of Heart Rates
sns.histplot(laps_summary_df['heart_rate'], bins=30, kde=True, ax=ax[0])
ax[0].set_title('Distribution of Heart Rates')
ax[0].set_xlabel('Heart Rate (BPM)')
ax[0].set_ylabel('Frequency')

# Distribution of Song Tempos
sns.histplot(spotify_songs_df['tempo'], bins=30, kde=True, ax=ax[1])
ax[1].set_title('Distribution of Song Tempos')
ax[1].set_xlabel('Tempo (BPM)')
ax[1].set_ylabel('Frequency')

plt.tight_layout()
plt.show()


### I want to classify each song as low tempo or high tempo to further split the data, help the model understand it, and improve our accuracy.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Reloading Spotify dataset
spotify_songs_df = pd.read_csv('/mnt/data/spotify_songs.csv')

# Prepare the Data: Label songs based on median tempo
median_tempo = spotify_songs_df['tempo'].median()
spotify_songs_df['tempo_high_low'] = (spotify_songs_df['tempo'] > median_tempo).astype(int)  # 1 for High, 0 for Low

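# Feature Selection: Using 'tempo' as the feature for simplicity.
# Note: the label above is derived from 'tempo' itself, so this only sanity-checks the pipeline;
# accuracy close to 1.0 is expected by construction and says little about recommendation quality.
X = spotify_songs_df[['tempo']]
y = spotify_songs_df['tempo_high_low']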

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Data normalization
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the Model: Logistic Regression
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Evaluate the Model: Calculate accuracy on the test set
predictions = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, predictions)

accuracy
