## Are a single reviewer's scores autocorrelated?  
  
Procedure:
* For a random sample of reviewers (20? 30?) with more than 20 reviews each, plot their consecutive scores against review number.
    * Do the same for another random subset, plotting their Best New Music (BNM) awards.
* Look at the average time lag between reviews (e.g. if most reviewers review an album roughly once a week, lag 1 corresponds to about 7 days).
* Visualize partial autocorrelations to choose an appropriate lag for the autocorrelation calculations.
* Compute autocorrelations:
    * For each author, compute the autocorrelation of their scores at the chosen lag.
    * Use a one-sample t-test to check whether the mean autocorrelation across authors differs significantly from 0.
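The lag and testing steps above can be sketched as follows. This is a minimal illustration, not the notebook's own code. It assumes a frame with `author`, `pub_date` (ISO date strings) and `score` columns. The helper names `mean_gap_days` and `author_autocorrs` are hypothetical, as is the toy data. Lag 1 here simply means "the previous review by the same author", not a fixed number of days.

```python
import numpy as np
import pandas as pd
from scipy import stats


def mean_gap_days(df):
    """Mean number of days between an author's consecutive reviews (hypothetical helper)."""
    ordered = df.assign(pub_date=pd.to_datetime(df['pub_date'])).sort_values(['author', 'pub_date'])
    # days since the same author's previous review (NaN for each author's first review)
    gaps = ordered.groupby('author')['pub_date'].diff().dt.days
    return gaps.groupby(ordered['author']).mean()


def author_autocorrs(df, lag=1, min_reviews=20):
    """Lag-`lag` autocorrelation of each author's scores in chronological order (hypothetical helper)."""
    ordered = df.sort_values(['author', 'pub_date'])
    out = {}
    for author, rows in ordered.groupby('author'):
        if len(rows) < min_reviews:
            continue  # too few reviews for a stable estimate
        # Pearson r between the score series and itself shifted by `lag` positions
        r = rows['score'].autocorr(lag=lag)
        if not np.isnan(r):
            out[author] = r
    return pd.Series(out, dtype=float)


# Toy data (hypothetical): 3 authors, 30 weekly reviews each, independent random scores
rng = np.random.default_rng(0)
weekly = pd.date_range('2016-01-01', periods=30, freq='7D').strftime('%Y-%m-%d').to_numpy()
toy = pd.DataFrame({
    'author': np.repeat(['a', 'b', 'c'], 30),
    'pub_date': np.tile(weekly, 3),
    'score': rng.normal(7.0, 1.0, 90).round(1),
})

gaps = mean_gap_days(toy)  # every toy author reviews exactly once a week
acs = author_autocorrs(toy, lag=1)
# one-sample t-test: is the mean per-author autocorrelation different from 0?
t_stat, p_value = stats.ttest_1samp(acs, popmean=0.0)
print(gaps)
print(acs)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

The t-test treats each author's autocorrelation as one independent observation, which is why authors with few reviews are filtered out first.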

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3 as sql
import os
from scipy.signal import savgol_filter
from scipy import stats

# display settings: 2 decimal places for pandas and NumPy output
# ('precision' alone is ambiguous in recent pandas, so use the full option name)
pd.set_option('display.precision', 2)
np.set_printoptions(precision=2)

plt.rcParams['axes.facecolor'] = '0.95'

# locate the SQLite database under the Kaggle input directory
# (db_path ends up as the last file found, so this assumes the input holds a single file)
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        db_path = os.path.join(dirname, filename)

# connect to the SQLite database and load the tables we need
connection = sql.connect(db_path)
print("SQL database connected")

# names of all tables in the database, for reference
table = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", connection)

reviews = pd.read_sql('SELECT * FROM reviews', connection)
genres = pd.read_sql('SELECT * FROM genres', connection)
connection.close()
print('SQL database connection closed')

SQL database connected
SQL database connection closed


The analyses below require augmenting the reviews data with each review's number, i.e. its position in the author's review history.

In [2]:
reviews

| | reviewid | title | artist | url | score | best_new_music | author | author_type | pub_date | pub_weekday | pub_day | pub_month | pub_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22703 | mezzanine | massive attack | http://pitchfork.com/reviews/albums/22703-mezz... | 9.3 | 0 | nate patrin | contributor | 2017-01-08 | 6 | 8 | 1 | 2017 |
| 1 | 22721 | prelapsarian | krallice | http://pitchfork.com/reviews/albums/22721-prel... | 7.9 | 0 | zoe camp | contributor | 2017-01-07 | 5 | 7 | 1 | 2017 |
| 2 | 22659 | all of them naturals | uranium club | http://pitchfork.com/reviews/albums/22659-all-... | 7.3 | 0 | david glickman | contributor | 2017-01-07 | 5 | 7 | 1 | 2017 |
| 3 | 22661 | first songs | kleenex, liliput | http://pitchfork.com/reviews/albums/22661-firs... | 9.0 | 1 | jenn pelly | associate reviews editor | 2017-01-06 | 4 | 6 | 1 | 2017 |
| 4 | 22725 | new start | taso | http://pitchfork.com/reviews/albums/22725-new-... | 8.1 | 0 | kevin lozano | tracks coordinator | 2017-01-06 | 4 | 6 | 1 | 2017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18388 | 1535 | let us replay! | coldcut | http://pitchfork.com/reviews/albums/1535-let-u... | 8.9 | 0 | james p. wisdom | | 1999-01-26 | 1 | 26 | 1 | 1999 |
| 18389 | 1341 | 1999 | cassius | http://pitchfork.com/reviews/albums/1341-1999/ | 4.8 | 0 | james p. wisdom | | 1999-01-26 | 1 | 26 | 1 | 1999 |
| 18390 | 5376 | out of tune | mojave 3 | http://pitchfork.com/reviews/albums/5376-out-o... | 6.3 | 0 | jason josephes | contributor | 1999-01-12 | 1 | 12 | 1 | 1999 |
| 18391 | 2413 | singles breaking up, vol. 1 | don caballero | http://pitchfork.com/reviews/albums/2413-singl... | 7.2 | 0 | james p. wisdom | | 1999-01-12 | 1 | 12 | 1 | 1999 |


In [3]:
reviews_nlabeled = reviews.copy()  # reviews augmented with each review's 'number', i.e. its position in the author's review history

# add review number per author: rows are ordered newest-first, so counting down
# from each author's review count gives their oldest review number 1
# (.at is meant for single-cell access, so use a vectorized groupby instead)
reviews_nlabeled['number'] = reviews_nlabeled.groupby('author').cumcount(ascending=False) + 1
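To see what the per-author numbering produces, here is a small check on a hypothetical toy frame ordered newest-first (as the `reviews` output above is). It computes the numbering two ways: with the original loop (using `.loc` for the multi-row assignment) and with `groupby().cumcount(ascending=False)`. It then confirms that both give each author's oldest review the number 1.

```python
import pandas as pd

# Hypothetical toy frame, newest review first (same ordering as the Pitchfork data)
toy = pd.DataFrame({
    'author': ['x', 'y', 'x', 'x', 'y'],
    'pub_date': ['2017-01-05', '2017-01-04', '2017-01-03', '2017-01-02', '2017-01-01'],
})

# Loop version: count down from n to 1 within each author
loop = toy.copy()
loop['number'] = 0
for _, rows in loop.groupby('author'):
    loop.loc[rows.index, 'number'] = list(range(len(rows), 0, -1))

# Vectorized version: cumcount(ascending=False) yields n-1 .. 0, so add 1
vec = toy.copy()
vec['number'] = vec.groupby('author').cumcount(ascending=False) + 1

print(vec)
```

Author `x` gets 3, 2, 1 and author `y` gets 2, 1, so the numbers read 3, 2, 2, 1, 1 down the frame.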

Let's take a coarse look at a typical reviewer's scoring behavior over the course of 20 reviews. We'll randomly choose 5 reviewers with 20 or more reviews from the dataset, then plot their a) scores and b) Best New Music awards against review number.
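A minimal sketch of such a plot, assuming a frame shaped like `reviews_nlabeled` with `author`, `number`, `score` and `best_new_music` columns. The helper name `plot_reviewer_sample`, the seed and the toy data are hypothetical. So is showing each reviewer's first 20 reviews. This is not the notebook's own plotting code.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_reviewer_sample(df, n_reviewers=5, min_reviews=20, n_shown=20, seed=0):
    """Plot scores and BNM awards vs. review number for randomly chosen reviewers (hypothetical helper)."""
    counts = df['author'].value_counts()
    eligible = counts[counts >= min_reviews].index.to_numpy()
    rng = np.random.default_rng(seed)
    chosen = rng.choice(eligible, size=min(n_reviewers, len(eligible)), replace=False)

    fig, (ax_score, ax_bnm) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
    for author in chosen:
        # each chosen author's first `n_shown` reviews, in chronological order
        rows = df[(df['author'] == author) & (df['number'] <= n_shown)].sort_values('number')
        ax_score.plot(rows['number'], rows['score'], marker='o', label=author)
        ax_bnm.plot(rows['number'], rows['best_new_music'], marker='o', label=author)
    ax_score.set_ylabel('score')
    ax_bnm.set_ylabel('best_new_music (0/1)')
    ax_bnm.set_xlabel('review number')
    ax_score.legend(fontsize='small')
    return fig, list(chosen)


# Toy data (hypothetical): only authors a-e have at least 20 reviews
rng = np.random.default_rng(1)
authors = np.repeat(['a', 'b', 'c', 'd', 'e', 'f', 'g'], [25, 30, 22, 20, 40, 10, 5])
toy = pd.DataFrame({
    'author': authors,
    'score': rng.normal(7.0, 1.0, len(authors)).round(1),
    'best_new_music': (rng.random(len(authors)) < 0.1).astype(int),
})
toy['number'] = toy.groupby('author').cumcount() + 1  # toy rows are already oldest-first
fig, chosen = plot_reviewer_sample(toy)
print(chosen)
```

Sampling without replacement from the eligible authors keeps any reviewer from appearing twice.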