### Is the sophomore slump, or second-album syndrome, real?
More specifically, is such a phenomenon observable in Pitchfork's album reviews? The term usually refers to the challenge an artist faces in following an immensely popular debut with a second body of work, so here we'll take first albums that were awarded Best New Music (BNM) and compare their scores with the scores of each artist's subsequent album.  
One might also ask whether the second album is awarded BNM as well. However, because there may be biases in awarding BNM to consecutive releases (checking whether this happens could be interesting to pursue too!), I'll stick to scores.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3 as sql
import os
from scipy.signal import savgol_filter

pd.set_option('display.precision', 2)  # full option name; the bare 'precision' is ambiguous in newer pandas
np.set_printoptions(precision=2)

plt.rcParams['axes.facecolor'] = '0.95'

# locate the database file; if several files exist under /kaggle/input, the last one found is used
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        db_path = os.path.join(dirname, filename)

# connect to SQL database, create connection object to database
connection = sql.connect(db_path)
print("SQL database connected")

table = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", connection)

reviews = pd.read_sql('SELECT * FROM reviews', connection)
genres = pd.read_sql('SELECT * FROM genres', connection)
connection.close()
print('SQL database connection closed')

SQL database connected
SQL database connection closed


First, let's quickly see how many first albums were given Best New Music, just to make sure the subset of the data is large enough to draw meaningful conclusions from.

In [2]:
# review dataframe, augmented with each review's 'number',
# i.e. its position in the releasing artist's reviewed discography
reviews_nlabeled = reviews.copy()

# add review number per artist: within each artist the first row gets the
# highest number and the last row gets 1 (this relies on rows being ordered newest-first)
reviews_nlabeled['number'] = reviews_nlabeled.groupby('artist').cumcount(ascending=False) + 1
    
reviews_nlabeled.number = reviews_nlabeled.number.astype(int)
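
The numbering above depends on the row order of the table. As a minimal sketch of an order-independent alternative, the toy example below sorts by publication date before counting. The `pub_date` column and all toy values are assumptions made for illustration; they do not appear in the cells above.

```python
import pandas as pd

# Toy stand-in for the reviews table (invented rows; 'pub_date' is an assumed column name)
toy = pd.DataFrame({
    'artist':   ['a', 'b', 'a', 'a', 'b'],
    'title':    ['a3', 'b2', 'a2', 'a1', 'b1'],
    'pub_date': ['2016-05-01', '2015-03-01', '2014-02-01', '2012-01-01', '2011-06-01'],
})
toy['pub_date'] = pd.to_datetime(toy['pub_date'])

# Sort oldest-first, then count within each artist: the earliest review gets 1
toy = toy.sort_values('pub_date')
toy['number'] = toy.groupby('artist').cumcount() + 1

print(toy.sort_values(['artist', 'number']))
```

Sorting explicitly makes the chronology independent of how the rows happen to be stored, at the cost of assuming a reliable date column.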

In [3]:
# first albums which are also bnm (best_new_music compared explicitly to get a boolean mask)
first = reviews_nlabeled.loc[(reviews_nlabeled.best_new_music == 1) & (reviews_nlabeled.number == 1)]
# second albums from artists whose first albums were bnm
second = reviews_nlabeled.loc[(reviews_nlabeled.artist.isin(first.artist)) & (reviews_nlabeled.number == 2)]

print(f'There are {first.shape[0]} first albums that were given Best New Music, and '
      f'{second.shape[0]} second albums released by those artists.')

There are 292 first albums that were given Best New Music, and 187 second albums released by those artists.


We're working with 292 first albums and 187 second albums, which is plenty! The second count is smaller because not every one of these artists has a second reviewed album.
Next, we'll isolate the scores of these 292 BNM first albums and the scores of the albums that immediately followed them, then compare the two score distributions for a statistically significant difference using the (?)
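
The comparison step can be sketched as follows. The test is left unnamed above, so the Wilcoxon signed-rank test used here is purely an assumption, picked because each artist contributes a paired first and second score. The frames `first_toy` and `second_toy` and all their scores are invented stand-ins for `first` and `second`.

```python
import pandas as pd
from scipy.stats import wilcoxon

# Toy stand-ins for the `first` and `second` frames (all scores invented)
first_toy = pd.DataFrame({
    'artist': ['a', 'b', 'c', 'd', 'e', 'f'],
    'score':  [8.5, 9.0, 8.3, 8.8, 9.2, 8.6],
})
second_toy = pd.DataFrame({
    'artist': ['a', 'b', 'c', 'd', 'e'],   # artist 'f' has no second album
    'score':  [7.9, 8.0, 8.6, 7.2, 8.7],
})

# Inner merge keeps only artists that have both a first and a second album
paired = first_toy.merge(second_toy, on='artist', suffixes=('_first', '_second'))
paired['diff'] = paired['score_second'] - paired['score_first']

# Paired, non-parametric test (an assumed choice, not the notebook's stated method)
stat, p_value = wilcoxon(paired['score_first'], paired['score_second'])

print(f"pairs: {len(paired)}, mean change: {paired['diff'].mean():.2f}, p = {p_value:.3f}")
```

Pairing by artist drops first albums without a follow-up, so the test runs only on artists with both releases. An unpaired comparison of the two full distributions would be a different design choice.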