Now that we have the values for each feature, forecasted 5 years into the future (2025), we can apply these values to our complete track dataset. If no songs in this dataset match our predictions exactly, a range of values around each predicted value will be used to filter the dataset instead.

In [1]:
import pandas as pd
import numpy as np

Loading saved .csv datasets.

In [14]:
yearly_forecast = pd.read_csv('Datasets/forecast_df.csv')
all_tracks = pd.read_csv('Datasets/cleaned_all_tracks.csv')

In [315]:
yearly_forecast.describe()

Unnamed: 0,acousticness,danceability,duration_ms,energy,loudness,instrumentalness,liveness,speechiness,tempo,valence,popularity,key
count,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0
mean,0.108023,0.694486,197169.11958,0.638821,-6.13371,0.025717,0.12039,0.077753,122.138335,0.455923,72.515222,0.575002
std,0.022858,0.004146,1589.815572,0.009175,0.131202,0.012098,0.000987,0.001943,0.300084,0.006204,1.572,0.265767
min,0.080011,0.687293,194593.592041,0.626812,-6.323764,0.013049,0.118834,0.074337,121.745496,0.445719,69.989212,0.242614
25%,0.092654,0.695221,196906.37793,0.633246,-6.211718,0.014955,0.120083,0.078047,121.963691,0.455135,72.394631,0.394507
50%,0.10757,0.695958,197731.833008,0.639184,-6.074267,0.025908,0.120697,0.078567,122.196417,0.457593,72.520072,0.560721
75%,0.122582,0.696008,197805.920654,0.644889,-6.055435,0.03291,0.120971,0.078803,122.247674,0.459369,73.832693,0.826049
max,0.137296,0.69795,198807.874268,0.649976,-6.003366,0.041761,0.121365,0.079008,122.538398,0.461799,73.839503,0.851116


In [15]:
yearly_forecast

Unnamed: 0,acousticness,danceability,duration_ms,energy,loudness,instrumentalness,liveness,speechiness,tempo,valence,popularity,key
0,0.137296,0.687293,197731.833008,0.626812,-6.323764,0.013049,0.121365,0.074337,121.745496,0.445719,72.520072,0.826049
1,0.122582,0.696008,194593.592041,0.633246,-6.211718,0.014955,0.120697,0.079008,122.247674,0.459369,69.989212,0.851116
2,0.10757,0.695958,198807.874268,0.639184,-6.074267,0.025908,0.120971,0.078047,121.963691,0.461799,73.839503,0.560721
3,0.092654,0.695221,196906.37793,0.644889,-6.055435,0.03291,0.120083,0.078567,122.538398,0.457593,72.394631,0.394507
4,0.080011,0.69795,197805.920654,0.649976,-6.003366,0.041761,0.118834,0.078803,122.196417,0.455135,73.832693,0.242614


Because we cannot easily obtain a confidence interval for our LSTM predictions, we need to define a range of values to filter the tracks with; filtering on the exact forecast value would return very few tracks.

For each column, we take the 5-year forecast value (for 2025) and add and subtract the model's RMSE for that feature, treating the RMSE as a rough standard deviation of the error.

*Note: As the range for tempo would be very small with our RMSE, and I know the actual range of this type of music is greater, I will suggest a range of 120-124 BPM (Beats Per Minute).*

In [112]:
# Example calculation of the filter range (forecast value +/- RMSE):

# (Energy in 2025) + (model RMSE for Energy)
# print(0.649 + 0.024)
# (Energy in 2025) - (model RMSE for Energy)
# print(0.649 - 0.024)
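
The same calculation can be applied to several features at once. The following is a minimal sketch, not the code used to produce the filters in this notebook. The energy values (0.649 and 0.024) come from the example above; the valence RMSE of 0.044 is an assumption inferred from the valence filter bounds (0.411, 0.499). The helper name `forecast_ranges` and its `overrides` argument are hypothetical, with the override used for the hand-picked tempo range of 120-124 BPM.

```python
import pandas as pd

# 2025 forecast values, truncated to three decimals as in the example above.
forecast_2025 = pd.Series({"energy": 0.649, "valence": 0.455})

# Energy RMSE is taken from the example above. The valence RMSE is an
# assumption inferred from the filter bounds (0.411, 0.499).
rmse = pd.Series({"energy": 0.024, "valence": 0.044})


def forecast_ranges(forecast, rmse, overrides=None):
    """Return (low, high) bounds per feature: forecast minus/plus RMSE."""
    ranges = pd.DataFrame(
        {"low": forecast - rmse, "high": forecast + rmse}
    ).round(3)
    # Manually chosen ranges replace (or add to) the RMSE-based ones.
    for feature, (low, high) in (overrides or {}).items():
        ranges.loc[feature] = [low, high]
    return ranges


ranges = forecast_ranges(forecast_2025, rmse, overrides={"tempo": (120, 124)})
print(ranges)
```

Each row of `ranges` can then be passed to `Series.between(low, high)` when filtering `all_tracks`.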


Note 2: Some features were judged irrelevant to the musical character of a track, such as "liveness", which indicates whether the song sounds like it was recorded live or with an audience. "Popularity" may also be left out, as it is a special feature that could exclude similar-sounding music whose artists or tracks aren't "popular" *yet*.

The loudness range is widened to account for the spread of values in its interquartile range (IQR).

7 features shared:

In [248]:
all_tracks.loc[                                                                #SINGULAR FEATURE TRACK COUNTS:
#                  (all_tracks['acousticness'].between(0.064, 0.096))     &    #6003
               (all_tracks['danceability'].between(0.647, 0.733))       &      #24180
               (all_tracks['duration_ms'].between(187645.9, 207965.9))&        #23059
               (all_tracks['energy'].between(0.625, 0.673))          &         #8816
#                (all_tracks['loudness'].between( -6.267, -5.733)) &           #6464
#                (all_tracks['instrumentalness'].between( 0.035, 0.047))  &    #1952
               (all_tracks['tempo'].between(120, 124))                    &    #8286
               (all_tracks['valence'].between(0.411, 0.499))           &       #15806
               (all_tracks['popularity'].between(71.97, 75.69))   &            #1700
               (all_tracks['key'].between(0, 2))                               #50988
              ]

Unnamed: 0,acousticness,artists,danceability,duration_ms,energy,explicit,id,instrumentalness,key,liveness,loudness,modality,name,popularity,release_date,speechiness,tempo,valence,year
84445,0.0189,['Twenty One Pilots'],0.655,188493,0.632,0,3bnVBN67NBEzedqQuWrpP4,0.0,2,0.0722,-4.802,1,Tear in My Heart,73,2015-05-15,0.0489,120.113,0.447,2015


As our dataset doesn't have any songs whose values match all of the feature predictions, we have to relax the filter and look for the songs that share the most features. I did this by running through each feature one by one, recording the track count for each, and then manually combining features.
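
The manual process described here could also be automated by counting, for every track, how many feature ranges it falls inside. The sketch below is one possible way to do that and is not the approach used in this notebook. The ranges are copied from the filter cells; the helper name `count_matches` and the two-row `sample` DataFrame (a stand-in for `all_tracks`, built from two rows shown in this notebook's outputs) are introduced only for illustration.

```python
import pandas as pd

# Ranges copied from the filter cells in this notebook: feature -> (low, high).
feature_ranges = {
    "danceability": (0.647, 0.733),
    "duration_ms": (187645.9, 207965.9),
    "energy": (0.625, 0.673),
    "loudness": (-6.267, -5.733),
    "tempo": (120, 124),
    "valence": (0.411, 0.499),
}


def count_matches(tracks, ranges):
    """Return per-feature boolean matches plus a total count per track."""
    matches = pd.DataFrame(
        {
            feature: tracks[feature].between(low, high)
            for feature, (low, high) in ranges.items()
        },
        index=tracks.index,
    )
    matches["n_matched"] = matches.sum(axis=1)
    return matches


# Hypothetical stand-in for all_tracks, using two rows from the outputs.
sample = pd.DataFrame(
    {
        "name": ["Tear in My Heart", "Beautiful Girl"],
        "danceability": [0.655, 0.675],
        "duration_ms": [188493, 207547],
        "energy": [0.632, 0.666],
        "loudness": [-4.802, -6.067],
        "tempo": [120.113, 120.558],
        "valence": [0.447, 0.434],
    }
)

matches = count_matches(sample, feature_ranges)

# Number of tracks inside each single-feature range.
print(matches.drop(columns="n_matched").sum())

# Tracks ranked by how many feature ranges they share.
ranked = sample.assign(n_matched=matches["n_matched"]).sort_values(
    "n_matched", ascending=False
)
print(ranked[["name", "n_matched"]])
```

Replacing `sample` with `all_tracks` would rank the full dataset, so the best-matching tracks appear first without commenting filters in and out by hand.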

6 features shared: INXS - Beautiful Girl

In [286]:
all_tracks.loc[
# #                  (all_tracks['acousticness'].between(0.064, 0.096))     & 
               (all_tracks['danceability'].between(0.647, 0.733))     &
               (all_tracks['duration_ms'].between(187645.9, 207965.9))&
               (all_tracks['energy'].between(0.625, 0.673))          &
               (all_tracks['loudness'].between( -6.267, -5.733)) &
# #                (all_tracks['instrumentalness'].between( 0.035, 0.047)) &
               (all_tracks['tempo'].between(120, 124))   &
               (all_tracks['valence'].between(0.411, 0.499)) 
#                (all_tracks['popularity'].between(71.97, 75.69))    
#                (all_tracks['key'].between(0, 1))      
              ]

Unnamed: 0,acousticness,artists,danceability,duration_ms,energy,explicit,id,instrumentalness,key,liveness,loudness,modality,name,popularity,release_date,speechiness,tempo,valence,year
91601,0.0187,['INXS'],0.675,207547,0.666,0,6N81xlWzMaEYhjHry55OSI,0.692,11,0.112,-6.067,1,Beautiful Girl,52,1992,0.027,120.558,0.434,1992


5 features shared:

In [306]:
all_tracks.loc[
#                  (all_tracks['acousticness'].between(0.064, 0.096))     & 
               (all_tracks['danceability'].between(0.647, 0.733))     &
#                (all_tracks['duration_ms'].between(187645.9, 207965.9))&
               (all_tracks['energy'].between(0.625, 0.673))          &
#                (all_tracks['loudness'].between( -6.267, -5.733)) &
#                 (all_tracks['instrumentalness'].between( 0.035, 0.047)) &
               (all_tracks['tempo'].between(120, 124)) &  
               (all_tracks['valence'].between(0.411, 0.499)) &
#                (all_tracks['popularity'].between(71.97, 75.69))    
               (all_tracks['key'].between(0, 1))      
              ]

Unnamed: 0,acousticness,artists,danceability,duration_ms,energy,explicit,id,instrumentalness,key,liveness,loudness,modality,name,popularity,release_date,speechiness,tempo,valence,year
15560,0.239,"['Luh Kel', 'Lil Tjay']",0.66,195331,0.664,1,1Ml32gIRsMAQuUTEt8hwpZ,0.0,0,0.256,-4.419,1,Wrong (feat. Lil Tjay) - Remix,67,2020-04-10,0.0491,120.107,0.432,2020
112356,0.259,"['Clean Bandit', 'Zara Larsson']",0.707,212459,0.629,0,1x5sYLZiu9r5E43kMlt9f8,1.6e-05,0,0.138,-4.581,0,Symphony (feat. Zara Larsson),77,2017-03-16,0.0563,122.863,0.457,2017


4 features shared:

In [314]:
all_tracks.loc[
#                  (all_tracks['acousticness'].between(0.064, 0.096))     & 
               (all_tracks['danceability'].between(0.647, 0.733))     &
#                (all_tracks['duration_ms'].between(187645.9, 207965.9))&
               (all_tracks['energy'].between(0.625, 0.673))          &
               (all_tracks['loudness'].between( -6.267, -5.733)) &
#                 (all_tracks['instrumentalness'].between( 0.035, 0.047)) &
#                (all_tracks['tempo'].between(120, 124)) &  
               (all_tracks['valence'].between(0.411, 0.499)) 
#                (all_tracks['popularity'].between(71.97, 75.69))    
#                (all_tracks['key'].between(0, 1))      
              ]

Unnamed: 0,acousticness,artists,danceability,duration_ms,energy,explicit,id,instrumentalness,key,liveness,loudness,modality,name,popularity,release_date,speechiness,tempo,valence,year
13844,0.222,"['Juan Gotti', 'Grimm', 'Russell Lee']",0.702,180600,0.647,1,0r3qVc594XuujdWgCMgH54,1e-05,7,0.225,-5.785,1,Mira Lo Que Pasa,40,2002-07-09,0.246,82.317,0.425,2002
45606,0.207,['A$AP Ferg'],0.65,214827,0.638,1,2xgX6htrEkyF90i6cwnOf6,0.0,8,0.266,-5.94,1,Dump Dump,50,2013-08-19,0.127,134.044,0.418,2013
59347,0.244,['Ragheb Alama'],0.72,270200,0.635,0,1qQqHzG5JAYD2bV1DGf5v7,5e-06,9,0.119,-5.929,0,نسينى الدنيا,50,2004-01-01,0.0403,93.001,0.484,2004
73985,0.0202,['T.I.'],0.68,249533,0.672,0,0cfzRDlIbK85XdBGDLeMIF,0.0,0,0.224,-6.199,1,Whatever You Like,42,2008-09-08,0.0814,149.994,0.495,2008
91601,0.0187,['INXS'],0.675,207547,0.666,0,6N81xlWzMaEYhjHry55OSI,0.692,11,0.112,-6.067,1,Beautiful Girl,52,1992,0.027,120.558,0.434,1992
102263,0.409,['Ciara'],0.697,267413,0.629,0,1pLdjo3lOBbMaoR4ZpybFH,4e-06,9,0.0819,-5.757,1,Promise,56,2006,0.0425,123.279,0.439,2006
110660,0.0336,"['Mary Mary', 'Kierra Sheard']",0.673,192680,0.673,0,1ZxwWrMNukjS8sb9TZ0HjU,0.0,5,0.214,-5.824,0,God in Me (feat. Kierra Sheard),48,2000,0.0948,176.088,0.428,2000
112335,0.0503,['blackbear'],0.667,212927,0.635,1,13JyykwyYQ3T5QxxL34ukQ,0.0,11,0.367,-5.761,0,chateau,69,2017-04-21,0.0518,98.011,0.478,2017
138990,0.237,['Don Toliver'],0.703,180675,0.653,0,47IXLhp3c6mu7NqvpuhuLi,0.0,11,0.199,-6.226,0,Can't Feel My Legs,70,2020-03-13,0.079,140.044,0.414,2020
144664,0.0457,['Aaron Hall'],0.686,351867,0.647,0,2z0jX7eWWddQ9PDS3Otvlf,0.0,8,0.491,-6.043,0,When You Need Me,36,1993-01-01,0.0502,109.732,0.478,1993


# Conclusions:

No songs in our dataset fell within the forecast range for every feature. The song with the most in common with the 2025 forecast, matching 7 of the 10 feature ranges, was Twenty One Pilots - 'Tear in My Heart'. It fell within our 'danceability', 'duration_ms', 'energy', 'tempo', 'valence', 'popularity', and 'key' ranges around the forecast values. INXS - 'Beautiful Girl' matched 6 of the 10 feature ranges. 2 songs matched 5 of the 10 ranges, and 12 songs matched 4 of the 10.

# Future Recommendations: 

If possible, we could next use the Spotify API to query the entire Spotify library for the artists furthest ahead of the trend across all features. Spotify lists over 50 million songs in total, so there is a much better chance of finding songs that match the forecast average for every modeled feature.

It would also be interesting to apply some form of clustering to these songs if a larger number of better matches were found. Had enough matches been returned from our dataset, this would already be possible with what we have.

I would also like to extend this forecast to individual genres, to see how each genre has evolved over time and which artists within those genres were playing music with those values much earlier.

Lastly, allocating more time to improving the model's layers and/or parameters would help the accuracy of our predictions.