# 1.E. Collection -- Reviews -- Pitchfork

We've extracted our lyrics, audio data, and captions from NeedleDrop YouTube videos. Now we're going to move on to **Pitchfork.com**. Pitchfork is one of the leading music review sites for music aficionados, and it has a substantial archive of rap / hip hop album reviews. Some cursory research led me to **pitchfork_api**, which is again a simple wrapper, this time for searching Pitchfork album reviews, and it is available to anyone. As with the NeedleDrop data, there is also a pre-existing Kaggle dataset of Pitchfork album reviews. The workflow in this notebook, then, will be as follows:

 1. Read in our **Kaggle Data Set**.
 2. Take our list of artists and albums from previous pulls and **search for reviews via pitchfork_api**.
 3. **Combine** and deduplicate both of our datasets.

In [1]:
import pitchfork_api
import pandas as pd

## Pitchfork Kaggle Data

In [3]:
#Grab our data set from Kaggle
pitchfork_reviews_kaggle = pd.read_csv('pitchfork_reviews.csv')

In [4]:
#Drop errant column
pitchfork_reviews_kaggle = pitchfork_reviews_kaggle.drop(columns=['Unnamed: 0'])
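The `Unnamed: 0` column is most likely a row index that was written out by an earlier `DataFrame.to_csv` call (which saves the index by default). As a small sketch using made-up inline CSV text rather than the real Kaggle file, passing `index_col=0` to `read_csv` avoids creating that column in the first place:

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file saved together with its index:
# the first header field is empty, which pandas names 'Unnamed: 0'
csv_text = ",reviewid,artist\n0,1001,artist a\n1,1002,artist b\n"

# Default read: the saved index comes back as an extra data column
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the index instead
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))     # ['Unnamed: 0', 'reviewid', 'artist']
print(list(without_extra.columns))  # ['reviewid', 'artist']
```

Dropping the column afterwards, as done above, gives the same result; `index_col=0` just saves a step.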

In [5]:
#Reduce our kaggle data set based on genre field
pitchfork_reviews_kaggle = pitchfork_reviews_kaggle.loc[pitchfork_reviews_kaggle['genre'].isin(['rap', 'pop/r&b', 'global']), :]

In [6]:
pitchfork_reviews_kaggle.shape

(2758, 17)

In [7]:
pitchfork_reviews_kaggle.columns

Index(['reviewid', 'title', 'url', 'score', 'best_new_music', 'author',
       'author_type', 'pub_date', 'pub_weekday', 'pub_day', 'pub_month',
       'pub_year', 'reviewid.1', 'content', 'genre', 'label', 'artist'],
      dtype='object')
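The `reviewid.1` column suggests this CSV was assembled by joining several tables on `reviewid`. If so, a review tagged with more than one genre (or label, or artist) can occupy more than one row, and a review tagged both `rap` and `pop/r&b` would survive the genre filter twice. A quick check, sketched here on toy data that only borrows the column names, is to count rows per `reviewid`:

```python
import pandas as pd

# Toy rows mimicking a joined reviews/genres table: review 2 carries two genres
toy = pd.DataFrame({
    'reviewid': [1, 2, 2, 3],
    'genre': ['rap', 'rap', 'pop/r&b', 'global'],
    'url': ['u1', 'u2', 'u2', 'u3'],
})

# Same genre filter as above
kept = toy[toy['genre'].isin(['rap', 'pop/r&b', 'global'])]

# Reviews that still appear on more than one row after filtering
rows_per_review = kept.groupby('reviewid').size()
repeated = rows_per_review[rows_per_review > 1]
print(repeated.index.tolist())  # [2]

# Keep a single row per review
one_per_review = kept.drop_duplicates('reviewid')
print(len(one_per_review))  # 3
```

On the real data, `pitchfork_reviews_kaggle['reviewid'].duplicated().sum()` counts the extra rows; the URL-based deduplication later in the notebook should also collapse them.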

## Pitchfork API

In [8]:
#Grab albums we need to read in
pitchfork_search_df = pd.read_csv('artist_album_search_list.csv')

In [9]:
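#Grab the album names and clean/reduce the strings so that they play nicely with the search functionality
#Strip parenthetical suffixes such as "(Deluxe)"; regex=True is required on pandas >= 2.0, where the default is a literal match
pitchfork_search_df['album_name'] = pitchfork_search_df['album_name'].str.replace(r"\s*\([^)]*\)", "", regex=True).str.strip()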
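To see what this cleaning step does, here is a minimal sketch on made-up album titles comparing a greedy pattern, `\(.*\)`, with a bounded one, `\s*\([^)]*\)`. Both pass `regex=True` explicitly: since pandas 2.0, `Series.str.replace` treats the pattern as a literal string by default, so without the flag nothing would be removed.

```python
import pandas as pd

# Made-up titles; the last one shows how a greedy pattern can over-delete
titles = pd.Series([
    'Some Album (Deluxe)',
    'Plain Title',
    'Album (Part 1) Title (Remastered)',
])

# Greedy: '.*' runs from the first '(' to the LAST ')' on the line
greedy = titles.str.replace(r"\(.*\)", "", regex=True).str.strip()

# Bounded: stops at the first ')' and also removes the space before '('
bounded = titles.str.replace(r"\s*\([^)]*\)", "", regex=True).str.strip()

print(greedy.tolist())   # ['Some Album', 'Plain Title', 'Album']
print(bounded.tolist())  # ['Some Album', 'Plain Title', 'Album Title']
```

The bounded pattern only differs when a title contains text between two parenthetical groups, which is exactly the case where the greedy one throws part of the title away.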

In [1]:
#initiate our lists of review data (one entry per searched album, kept in lockstep)
pitchfork_query_url = []
pitchfork_query_album = []
pitchfork_query_spot_artist = []
pitchfork_query_spot_album = []
pitchfork_query_artist = []
pitchfork_query_editorial = []
pitchfork_query_fulltext = []
pitchfork_query_matched_album = []
pitchfork_query_matched_artist = []
pitchfork_query_score = []
pitchfork_query_year = []


#Grab every artist and album combination and feed it into pitchfork_api.search
#Collect every field for a search before appending anything, so an error partway through
#cannot leave the lists with different lengths; if anything fails, record a blank row instead
for artist, album_name in pitchfork_search_df.itertuples(index=False):
    print(artist, album_name)
    try:
        p = pitchfork_api.search(artist, album_name)
        row = (p.url, p.album(), p.artist(), p.editorial(), p.full_text(),
               p.matched_album, p.matched_artist, p.score(), p.year())
    except Exception:
        #The search or one of the field lookups failed (e.g. no matching review):
        #keep the searched artist and leave everything else blank
        row = ('', '', artist, '', '', '', '', '', '')

    (url, album, review_artist, editorial, full_text,
     matched_album, matched_artist, score, year) = row
    pitchfork_query_url.append(url)
    pitchfork_query_album.append(album)
    pitchfork_query_artist.append(review_artist)
    pitchfork_query_editorial.append(editorial)
    pitchfork_query_fulltext.append(full_text)
    pitchfork_query_matched_album.append(matched_album)
    pitchfork_query_matched_artist.append(matched_artist)
    pitchfork_query_score.append(score)
    pitchfork_query_year.append(year)
    pitchfork_query_spot_artist.append(artist)
    pitchfork_query_spot_album.append(album_name)

In [14]:
#build the dataframe
pitchfork_additions_df = pd.DataFrame({
    'url':pitchfork_query_url,
    'album':pitchfork_query_album,
    'artist':pitchfork_query_artist,
    'spotify_artist':pitchfork_query_spot_artist,
    'spotify_album_name':pitchfork_query_spot_album,
    'full_text':pitchfork_query_fulltext,
    'editorial':pitchfork_query_editorial,    
    'matched_album':pitchfork_query_matched_album,
    'matched_artist':pitchfork_query_matched_artist,
    'score':pitchfork_query_score,
    'year':pitchfork_query_year})
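An alternative to keeping a separate list per field is to collect one dict per search and build the frame at the end, so the columns cannot drift out of sync. The sketch below uses a hypothetical stand-in function, `fake_search`, in place of `pitchfork_api.search` so that it runs offline; its return shape and field names are illustrative assumptions, not the library's API.

```python
import pandas as pd


def fake_search(artist, album_name):
    """Hypothetical stand-in for a review lookup: returns a dict or raises LookupError."""
    known = {('artist a', 'album x'): {'url': 'https://example.com/a-x', 'score': 8.1}}
    if (artist, album_name) not in known:
        raise LookupError(f'no review for {artist} - {album_name}')
    return known[(artist, album_name)]


search_df = pd.DataFrame({'artist': ['artist a', 'artist b'],
                          'album_name': ['album x', 'album y']})

records = []
for artist, album_name in search_df.itertuples(index=False):
    # Start every record with the search terms and blank defaults
    record = {'spotify_artist': artist, 'spotify_album_name': album_name,
              'url': '', 'score': None}
    try:
        record.update(fake_search(artist, album_name))
    except LookupError:
        pass  # failed lookup: keep the blank defaults
    records.append(record)

# One row per search; columns come from the dict keys, so they always line up
additions = pd.DataFrame(records)
print(additions[['spotify_artist', 'url', 'score']])
```

Because each record is complete before it is appended, a failure can only ever produce a whole blank row, never a shifted column.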

In [15]:
pitchfork_additions_df.shape

(7385, 11)

In [None]:
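#Drop repeated search results (the same review returned for more than one search)
#Every failed lookup has url '', so this also collapses all of them into a single blank row
pitchfork_additions_df = pitchfork_additions_df.drop_duplicates('url')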
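If the blank rows left by failed lookups are not wanted at all, they can be filtered out explicitly instead of leaving one behind after `drop_duplicates`. A minimal sketch on toy data (the column names match the frame above; the values are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    'url': ['https://example.com/r1', '', 'https://example.com/r1', ''],
    'spotify_artist': ['a', 'b', 'a', 'c'],
})

# drop_duplicates alone keeps one blank row standing in for every failed lookup
deduped_only = toy.drop_duplicates('url')

# Remove failed lookups (blank url) first, then deduplicate the real reviews
found = toy[toy['url'] != ''].drop_duplicates('url')

print(len(deduped_only), len(found))  # 2 1
```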

## Combining Data

In [16]:
# Rename our columns for kaggle
pitchfork_reviews_kaggle = pitchfork_reviews_kaggle.rename(columns={'content': 'full_text', 'title':'album', 'pub_year':'year'})

In [17]:
pitchfork_reviews_kaggle.columns

Index(['reviewid', 'album', 'url', 'score', 'best_new_music', 'author',
       'author_type', 'pub_date', 'pub_weekday', 'pub_day', 'pub_month',
       'year', 'reviewid.1', 'full_text', 'genre', 'label', 'artist'],
      dtype='object')

In [18]:
#Drop unneeded columns 
pitchfork_reviews_kaggle = pitchfork_reviews_kaggle.drop(columns=['reviewid','best_new_music', 'author',
       'author_type', 'pub_date', 'pub_weekday', 'pub_day', 'pub_month','reviewid.1','genre', 'label'])
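`pd.concat` along `axis=0` aligns frames by column name, and any column present in only one frame is filled with `NaN` for the other frame's rows. A quick set comparison before concatenating shows which columns that will affect. It is sketched here with toy frames whose column names are a subset of the two real datasets:

```python
import pandas as pd

# Toy frames using a subset of the real column names
api_like = pd.DataFrame({'url': ['u1'], 'album': ['x'], 'artist': ['a'],
                         'score': [7.5], 'editorial': ['text']})
kaggle_like = pd.DataFrame({'url': ['u2'], 'album': ['y'], 'artist': ['b'],
                            'score': [8.0]})

# Columns that only one side has; these will be NaN for the other side's rows
only_api = set(api_like.columns) - set(kaggle_like.columns)
only_kaggle = set(kaggle_like.columns) - set(api_like.columns)
print(only_api, only_kaggle)  # {'editorial'} set()

# sort=False keeps columns in order of first appearance instead of sorting them
combined = pd.concat([api_like, kaggle_like], axis=0, ignore_index=True, sort=False)
print(combined['editorial'].isna().tolist())  # [False, True]
```

On the real frames, the Kaggle rows will have no `spotify_artist`, `spotify_album_name`, `editorial`, `matched_album`, or `matched_artist` values after concatenation, since those columns exist only in the API results.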

In [20]:
#Combine both into one dataframe
pitchfork_final_df = pd.concat([pitchfork_additions_df, pitchfork_reviews_kaggle], axis = 0, ignore_index=True)

In [25]:
#drop duplicates based on url
pitchfork_final_df = pitchfork_final_df.drop_duplicates('url')
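Deduplicating on `url` only catches reviews whose URLs match exactly. The Kaggle snapshot and the live site are not guaranteed to format review URLs identically (scheme, trailing slash, or path could differ), so a complementary check is to compare a normalized artist/album key. This is a sketch under that assumption; the `review_key` helper and its normalization rules are hypothetical, not part of either dataset:

```python
import re

import pandas as pd


def review_key(artist, album):
    """Hypothetical normalizer: lowercase, drop punctuation, collapse whitespace."""
    def norm(text):
        text = re.sub(r"[^\w\s]", "", str(text).lower())
        return " ".join(text.split())
    return f"{norm(artist)}|{norm(album)}"


toy = pd.DataFrame({
    'url': ['http://example.com/reviews/1-some-album/',
            'https://example.com/reviews/some-album'],
    'artist': ['Some Artist', 'some artist'],
    'album': ['Some Album!', 'Some  Album'],
})

# The URLs differ, so URL-based deduplication keeps both rows
print(len(toy.drop_duplicates('url')))  # 2

# The normalized key treats them as the same review
toy['key'] = [review_key(a, b) for a, b in zip(toy['artist'], toy['album'])]
print(len(toy.drop_duplicates('key')))  # 1
```

A key this coarse can also merge genuinely different releases that share an artist and title, so it is worth inspecting the matches before dropping anything.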

In [26]:
#Save to csv
pitchfork_final_df.to_csv('pitchfork_deduped_final.csv', index=False)

In [3]:
#Read the saved csv back in
pitchfork_final_df = pd.read_csv('pitchfork_deduped_final.csv')
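One side effect of this round trip: `read_csv` parses empty fields as `NaN` by default, so the blank strings recorded for failed lookups come back as missing values, and a check like `df['url'] != ''` would need to become `df['url'].notna()` after reloading. A small in-memory demonstration with made-up values:

```python
import io

import pandas as pd

before = pd.DataFrame({'url': ['https://example.com/r1', ''], 'score': [7.0, None]})

# Write and read back through an in-memory buffer instead of a file
buffer = io.StringIO()
before.to_csv(buffer, index=False)
buffer.seek(0)
after = pd.read_csv(buffer)

print(before['url'].tolist())        # ['https://example.com/r1', '']
print(after['url'].isna().tolist())  # [False, True]

# keep_default_na=False keeps empty fields as '' instead of NaN
buffer.seek(0)
as_strings = pd.read_csv(buffer, keep_default_na=False)
print(as_strings['url'].tolist())    # ['https://example.com/r1', '']
```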

In [6]:
pitchfork_final_df.full_text[3398]

'England\'s Black Box Recorder is what My Bloody Valentine might sound like\n    without all the tape loops, ear- splitting amplification, and confounding\n    network of effects gadgets. On second thought, maybe that\'s a stupid\n    comparison. The photograph on the cover of England Made Me-- the\n    young girl, sitting up in bed under the covers, looking bored and morbidly\n    introspective-- tells you most everything you need to know about this\n    unhappy English pop band fronted by the Auteurs\' Luke Haines.\n    \n    And, of course, there\'s the obvious tragic connotations of the moniker Black\n    Box Recorder. The band is named, as I\'m sure you\'re all well aware, after a\n    recording device that captures the final moments of an ill- fated airplane\n    and its crew. We eventually learn, that to the societal malcontents in\n    Black Box Recorder, life is basically an airplane spinning out of control,\n    spiraling toward the ground, and meeting its end in flaming cata