# Fetching data from Reverb.com

The first step is to create a database of images of electric guitars. Reverb.com is the world's largest online marketplace for new and used guitars, and each listing usually includes several photos of the guitar.

The data comes from the Reverb API, restricted to electric guitars. The website and API serve at most 400 pages per query and throw an error for the 401st page, so with 50 listings per page a single query can reach at most 400 × 50 = 20,000 listings. The trick is to loop over queries that each return no more than that. This is done by splitting the search by guitar condition, then by country, then by region within each country.
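The arithmetic behind the limit, and one way to split only the queries that exceed it, can be sketched as follows. This is a variation on the notebook's approach (which always queries region by region for any country that has subregions): here a query is split into its children only when its total is over the cap. The `(name, total, children)` tree, the helper names, and all counts are hypothetical.

```python
import math

MAX_PAGES = 400              # the API errors on page 401
PER_PAGE = 50                # listings per page requested in this notebook
CAP = MAX_PAGES * PER_PAGE   # 20,000 listings reachable through one query


def pages_needed(total, per_page=PER_PAGE):
    """Number of pages required to page through `total` listings."""
    return math.ceil(total / per_page)


def plan_queries(node, path=()):
    """Return query paths whose result counts fit within MAX_PAGES.

    `node` is a hypothetical (name, total, children) tuple, e.g. a condition
    whose children are countries, whose children are regions. A node over
    the cap is replaced by its children; an over-cap leaf cannot be split.
    """
    name, total, children = node
    path = path + (name,)
    if pages_needed(total) <= MAX_PAGES:
        return [path]
    if not children:
        raise ValueError('cannot split {} further ({} listings)'.format(path, total))
    plans = []
    for child in children:
        plans.extend(plan_queries(child, path))
    return plans


# Hypothetical counts, for illustration only
used = ('used', 60000, [
    ('US', 45000, [('CA', 15000, []), ('NY', 12000, []), ('TX', 18000, [])]),
    ('GB', 9000, []),
    ('DE', 6000, []),
])
print(plan_queries(used))
# [('used', 'US', 'CA'), ('used', 'US', 'NY'), ('used', 'US', 'TX'), ('used', 'GB'), ('used', 'DE')]
```

Splitting lazily keeps the number of requests down for countries whose totals already fit, at the cost of needing a count before deciding.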

The Swagger documentation at https://reverb.com/swagger#/ describes how the API works.

The first step is to get the list of countries from the API and to define the guitar conditions to loop over.

In [1]:
import os

import pandas as pd
import requests

# Keep the API token out of the notebook; REVERB_API_TOKEN is an assumed variable name
headers = {'Authorization': 'Bearer ' + os.environ['REVERB_API_TOKEN'], 'Content-Type': 'application/hal+json', 'Accept': 'application/hal+json', 'Accept-Version': '3.0'}

# Countries (with their subregions) that listings can be filtered by
url = 'https://api.reverb.com/api/countries'
response = requests.get(url, headers=headers)
response.raise_for_status()
country_list = response.json()

# Guitar conditions to loop over
conditions = ['new', 'b-stock', 'used', 'non-functioning']
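The notebook reads only two fields from each entry of `country_list['countries']`: `country_code` and the `subregions` list, each subregion carrying a `code`. As a sketch, the full set of queries that the nested loops below will issue can be enumerated up front. `query_targets` is a hypothetical helper, and `sample` is a made-up payload shaped after those fields, not a real API response.

```python
def query_targets(country_list):
    """Return the (country_code, region) pairs to query.

    Countries with subregions are queried region by region; the rest are
    queried once with the placeholder region 'None', as get_data expects.
    """
    targets = []
    for country in country_list['countries']:
        subregions = country['subregions']
        if subregions:
            targets.extend((country['country_code'], r['code']) for r in subregions)
        else:
            targets.append((country['country_code'], 'None'))
    return targets


# Hypothetical payload containing only the fields this notebook reads
sample = {'countries': [
    {'country_code': 'AU', 'subregions': [{'code': 'ACT'}, {'code': 'NSW'}]},
    {'country_code': 'AX', 'subregions': []},
]}
conditions = ['new', 'b-stock', 'used', 'non-functioning']

queries = [(c, country, region) for c in conditions for country, region in query_targets(sample)]
print(len(queries))  # 12
```

Counting the queries this way gives a rough idea of how many paging loops a full run will start before any listings are downloaded.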

Next, define a function that downloads every page of listings for one combination of country, region, and condition. It is called from the nested loops in the following cell.

In [2]:
def get_data(country_code, region, condition):
    """Return one dict per page of electric-guitar listings for a country, region and condition.

    `region` is the string 'None' for countries without subregions.
    """
    params = {'product_type': 'electric-guitars', 'item_region': country_code,
              'condition': condition, 'per_page': 50}
    if region != 'None':
        params['item_state'] = region

    data_all_pages = []
    pagenum = 0

    while True:
        pagenum += 1
        params['page'] = pagenum

        # requests builds and URL-encodes the query string from params
        response = requests.get('https://api.reverb.com/api/listings', headers=headers, params=params)

        if response.status_code != 200:
            print('Status code ' + str(response.status_code) + ' for ' + country_code + ', ' + region + ', ' + condition)
            break

        # Convert to JSON only after checking the status, since error responses may not be JSON
        data_page = response.json()

        if data_page['total'] > 20000:
            # Past 400 pages of 50 listings the API throws an error, so this query must be split further
            if region == 'None':
                print('Too many listings in ' + country_code + ' for ' + condition + ', find subcategory')
            else:
                print('Too many listings in ' + region + ', ' + country_code + ' for ' + condition + ', find subcategory')
            break

        # Keep all listings on this page, tagged with the query that produced them
        data_all_pages.append({'Country': country_code, 'Region': region, 'Condition': condition,
                               'Listings': data_page['listings']})

        # Every page except the last links to the next one
        if 'next' not in data_page['_links']:
            break

    return data_all_pages
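The function above builds each page URL from a page counter and uses `_links.next` only to detect the last page. A sketch of the alternative HAL-style approach follows each page's `next` link directly, assuming that link is a complete URL for the following page (not verified here). `iter_pages` and its `fetch` parameter are hypothetical names; the fake pages stand in for API responses so the loop runs offline.

```python
def iter_pages(first_url, fetch):
    """Yield decoded pages, following each page's `_links.next.href`.

    `fetch` is any callable mapping a URL to a decoded JSON dict, e.g.
    lambda u: requests.get(u, headers=headers).json() (hypothetical wiring).
    Iteration stops when a page has no `next` link.
    """
    url = first_url
    while url is not None:
        page = fetch(url)
        yield page
        url = page.get('_links', {}).get('next', {}).get('href')


# Fake three-page response chain for illustration
fake = {
    'p1': {'listings': [1, 2], '_links': {'next': {'href': 'p2'}}},
    'p2': {'listings': [3, 4], '_links': {'next': {'href': 'p3'}}},
    'p3': {'listings': [5], '_links': {}},
}
listings = [x for page in iter_pages('p1', fake.__getitem__) for x in page['listings']]
print(listings)  # [1, 2, 3, 4, 5]
```

Following the link means the client never has to know how the server encodes paging, while the page-counter version keeps full control over the query parameters.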

In [3]:
data = []
n_countries = len(country_list['countries'])

for condition in conditions:

    for country_counter, country in enumerate(country_list['countries'], start=1):

        country_code = country['country_code']

        if country['subregions']:
            # Query each subregion separately to stay under the 20,000-listing limit
            for subregion in country['subregions']:
                data.extend(get_data(country_code, subregion['code'], condition))
        else:
            # 'None' tells get_data not to filter by region
            data.extend(get_data(country_code, 'None', condition))

        # Overwrite the same console line with the current progress
        print('\r Listings from {} retrieved of {} condition guitars. {} of {} countries with this condition.'.format(country_code, condition, country_counter, n_countries), end='')
    
 

 Listings from AX retrieved of non-functioning condition guitars. 242 of 242 countries with this condition.

In [6]:
# One list per output column; each listing becomes one row
d = {'id': [], 'Country': [], 'Region': [], 'Make': [], 'Model': [], 'Year': [], 'Date': [], 'Price': [], 'Currency': [], 'Type': [], 'Category': [], 'Condition': [], 'url': [], 'image_url': []}

for request in data:
    for listing in request['Listings']:
        d['id'].append(listing['id'])
        # Query metadata attached in get_data
        d['Country'].append(request['Country'])
        d['Region'].append(request['Region'])
        d['Make'].append(listing['make'])
        d['Model'].append(listing['model'])
        d['Year'].append(listing['year'])
        d['Date'].append(listing['created_at'])
        d['Price'].append(listing['price']['amount'])
        d['Currency'].append(listing['price']['currency'])
        # 'Type' is the condition slug used in the query; 'Condition' is the listing's own condition label
        d['Type'].append(request['Condition'])
        # Fragile: these pick a value out of the dict's string form by position
        # (in the output below, the full category name and the condition's display name)
        d['Category'].append(str(listing['categories']).split("'")[-2])
        d['Condition'].append(str(listing['condition']).split("'")[-6])
        d['url'].append(str(listing['_links']['web']['href']))
        d['image_url'].append(str(listing['_links']['photo']['href']))
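The `Category` and `Condition` columns are extracted by converting a dict to a string, splitting on single quotes, and taking a value by position. The sketch below reproduces that on a hypothetical listing fragment and compares it with direct key access. The key names `full_name` and `display_name`, and the key order, are assumptions inferred from the values in the output, not confirmed against the API documentation.

```python
# Hypothetical listing fragment; keys and order are assumptions
listing = {
    'categories': [{'uuid': 'abc', 'full_name': 'Electric Guitars / Solid Body'}],
    'condition': {'uuid': 'def', 'display_name': 'Brand New', 'slug': 'brand-new'},
}

# What the notebook does: split the string form on single quotes
category_by_split = str(listing['categories']).split("'")[-2]
condition_by_split = str(listing['condition']).split("'")[-6]

# Key-based alternative (assumed key names)
category_by_key = listing['categories'][-1]['full_name']
condition_by_key = listing['condition']['display_name']

print(category_by_split, '|', category_by_key)    # Electric Guitars / Solid Body | Electric Guitars / Solid Body
print(condition_by_split, '|', condition_by_key)  # Brand New | Brand New
```

Key access does not depend on key order, and it survives values containing an apostrophe: Python's string form wraps such a value in double quotes, which shifts every position the split relies on.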
        

In [8]:
df = pd.DataFrame(d)

# Timestamps mix UTC offsets (-05:00 / -06:00); recent pandas versions warn about this unless utc=True is passed
df['Date'] = pd.to_datetime(df['Date'])
df['Price'] = df['Price'].astype(float)
# Year values that are not plain numbers become NaN
df['Year'] = pd.to_numeric(df['Year'], errors='coerce').round(0)

# Store the repetitive text columns as categoricals
for col in ['Country', 'Region', 'Make', 'Currency', 'Type', 'Category', 'Condition']:
    df[col] = pd.Categorical(df[col])

print(df.shape)
df.head()

(140084, 14)


|   | id | Country | Region | Make | Model | Year | Date | Price | Currency | Type | Category | Condition | url | image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27428046 | AR | B | MSP Guitars | Orcus FR-8 | 2018.0 | 2019-08-29 16:29:58-05:00 | 1499.0 | USD | new | Electric Guitars / Solid Body | Brand New | https://reverb.com/item/27428046-msp-guitars-o... | https://images.reverb.com/image/upload/s--VaTP... |
| 1 | 27425624 | AR | B | MSP Guitars | Orcus 8 | 2018.0 | 2019-08-29 15:18:19-05:00 | 1599.0 | USD | new | Electric Guitars / Solid Body | Brand New | https://reverb.com/item/27425624-msp-guitars-o... | https://images.reverb.com/image/upload/s--xfki... |
| 2 | 25973633 | AR | B | MSP Guitars | Orcus II B-6 | 2019.0 | 2019-07-05 14:41:09-05:00 | 1599.0 | USD | new | Electric Guitars / Baritone | Brand New | https://reverb.com/item/25973633-msp-guitars-o... | https://images.reverb.com/image/upload/s--ul7z... |
| 3 | 26245219 | AU | ACT | Reverend | Pete Anderson Eastsider Baritone Guitar - |  | 2019-07-16 10:40:38-05:00 | 1749.32 | USD | new | Electric Guitars / Solid Body | Brand New | https://reverb.com/item/26245219-reverend-pete... | https://images.reverb.com/image/upload/s--NhDn... |
| 4 | 30223634 | AU | ACT | Suhr | Standard |  | 2019-12-02 18:52:03-06:00 | 4343.96 | USD | new | Electric Guitars / Solid Body | Brand New | https://reverb.com/item/30223634-suhr-standard... | https://images.reverb.com/image/upload/s--5Z3c... |


In [48]:
df.Model.value_counts().head(50)

Stratocaster                             3309
Telecaster                               1616
Les Paul Standard                         658
Les Paul                                  411
Custom 24                                 383
Les Paul Custom                           381
Jazzmaster                                276
Les Paul Studio                           262
SE Custom 24                              250
Player Stratocaster                       248
Les Paul Classic                          232
SG Standard                               202
Classic                                   197
Mustang                                   191
Jaguar                                    187
Les Paul Special                          184
American Professional II Stratocaster     165
Player Telecaster                         158
American Ultra Stratocaster               155
SG                                        152
Les Paul Junior                           152
Flying V                          

In [70]:
# Lower-cased model names; missing models count as non-matches (na=False)
model = df['Model'].str.lower()

def contains(pattern):
    return model.str.contains(pattern, regex=False, na=False)

# Plain substring tests: short patterns such as 'sg' can also match inside longer words
labels = pd.DataFrame({
    'stratocaster': contains('stratocaster'),
    'telecaster': contains('telecaster'),
    'lespaul': contains('les paul'),
    # PRS SE models; 'custom 24' also matches the non-SE PRS Custom 24
    'prs_se': contains('se ') | contains('custom 24'),
    'sg': contains('sg'),
    'jazzmaster': contains('jazzmaster'),
    'mustang': contains('mustang'),
})
labels.sum()

stratocaster    13265
telecaster       8304
lespaul          8230
prs_se           5894
sg               2459
jazzmaster       1347
mustang           747
dtype: int64
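The labels above are plain substring tests, so short patterns can fire inside longer words: `'sg'` matches any model containing those two letters in a row, and `'se '` matches the end of words such as "Rose". A sketch of whole-word matching with regular expressions follows; `has_token` is a hypothetical helper and the model names are made up to show where the two approaches differ.

```python
import re


def has_token(model, token):
    """True if `token` occurs in `model` as a whole word, ignoring case."""
    pattern = r'\b' + re.escape(token) + r'\b'
    return re.search(pattern, model, flags=re.IGNORECASE) is not None


# Hypothetical model names, chosen to show where plain substrings misfire
models = ['SG Standard', 'ESG-1', 'SE Custom 24', 'Floyd Rose Edition']

substring_sg = ['sg' in m.lower() for m in models]
token_sg = [has_token(m, 'sg') for m in models]
substring_se = ['se ' in m.lower() for m in models]
token_se = [has_token(m, 'se') for m in models]

print(substring_sg, token_sg)  # [True, True, False, False] [True, False, False, False]
print(substring_se, token_se)  # [False, False, True, True] [False, False, True, False]
```

Word boundaries trade some false positives for possible false negatives (for example a model written as "SG-Standard" still matches, but "SGStandard" would not), so it is worth spot-checking a sample of each label either way.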

In [73]:
labels.sum(axis=1).value_counts()

0    100266
1     39391
2       426
3         1
dtype: int64
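According to these counts, 100,266 listings match none of the labels and 427 match more than one. If each image should get exactly one class, one plausible option is to keep only the 39,391 rows with a single label and read that label off the row. A minimal sketch on a hypothetical frame shaped like `labels`; the name `example_labels` is illustrative so the real frame is not overwritten.

```python
import pandas as pd

# Hypothetical label frame in the same shape as `labels` above
example_labels = pd.DataFrame({
    'stratocaster': [True, False, True, False],
    'telecaster':   [False, True, True, False],
    'lespaul':      [False, False, False, False],
})

n_labels = example_labels.sum(axis=1)      # number of matching labels per row
single = example_labels[n_labels == 1]     # rows with exactly one label
# Column name of the single True value in each remaining row
target = single.astype(int).idxmax(axis=1)
print(target.to_dict())  # {0: 'stratocaster', 1: 'telecaster'}
```

The index of `target` still lines up with `df`, so `df.loc[target.index]` gives the matching listings and image URLs.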

In [7]:
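# Download one listing photo (this appears to be the image_url of row 0 above) to check that image URLs work
response = requests.get("https://images.reverb.com/image/upload/s--VaTPMrVM--/f_auto,t_large/v1567112065/vekaa09krieo6y1j6vkc.jpg")
response.raise_for_status()

with open("sample_image.jpg", "wb") as file:
    file.write(response.content)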
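The cell above saves a single image; building the image database described at the start means repeating this for every row of `df`. A minimal sketch, assuming one file per listing named after its `id` and an injected `fetch` callable so the code runs without network access. `image_path` and `download_images` are hypothetical helper names.

```python
import os
import posixpath
from urllib.parse import urlparse


def image_path(listing_id, image_url, out_dir='images'):
    """File path for a listing's image, keeping the URL's extension (default .jpg)."""
    ext = posixpath.splitext(urlparse(image_url).path)[1] or '.jpg'
    return os.path.join(out_dir, '{}{}'.format(listing_id, ext))


def download_images(rows, fetch, out_dir='images'):
    """Save each (listing_id, image_url) pair once and return the paths written.

    `fetch` maps a URL to the image bytes. Files that already exist are
    skipped, so an interrupted run can resume where it stopped.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for listing_id, image_url in rows:
        path = image_path(listing_id, image_url, out_dir)
        if os.path.exists(path):
            continue
        with open(path, 'wb') as f:
            f.write(fetch(image_url))
        written.append(path)
    return written
```

For example, `download_images(zip(df['id'], df['image_url']), fetch=lambda u: requests.get(u, timeout=30).content)` would fetch one photo per listing into `images/`. Note that `f_auto` in the URLs suggests the server may choose the image format, so the file extension is only a best guess.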