# Scraping Experimentation

Yeah, I'm pretty good at scraping, but I can't just sit down and immediately know exactly how to scrape a website: how to deconstruct the pages, extract the text and the desired fields, work out the logic I want to use, and use the website's structure to my advantage.

In this notebook I play around with all of those aspects and figure out exactly how I'm going to accomplish this task.

In [32]:
import pandas as pd

import re
import bs4
from bs4 import BeautifulSoup
import requests
import urllib

import numpy as np

import time

from IPython.display import clear_output

In [14]:
beer_styles_page = requests.get('https://www.beeradvocate.com/beer/style/')

beer_styles_soup = BeautifulSoup(beer_styles_page.content,'lxml')
                                

## Retrieve all the beer style pages

In [55]:
beer_styles = {}
for beer in beer_styles_soup.find_all(name='tr'):
    for style in beer.find_all('a'):
        name = style.text
        link = 'https://www.beeradvocate.com'+style['href']
        beer_styles[name] = link 
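
A minimal offline sketch of the same link-collection idea, in case it helps to see it without hitting the site. It swaps BeautifulSoup for the standard library's `html.parser` and builds absolute URLs with `urljoin` instead of string concatenation. The class name `StyleLinkCollector` and the `SAMPLE_HTML` snippet are made up for illustration; the real page layout may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.beeradvocate.com"


class StyleLinkCollector(HTMLParser):
    """Collects {link text: absolute URL} for anchors found inside table rows."""

    def __init__(self):
        super().__init__()
        self.in_row = False
        self.current_href = None
        self.current_text = []
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.in_row = True
        elif tag == "a" and self.in_row:
            # anchors without an href leave current_href as None and are skipped
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            name = "".join(self.current_text).strip()
            self.links[name] = urljoin(BASE_URL, self.current_href)
            self.current_href = None
        elif tag == "tr":
            self.in_row = False


# Hypothetical stand-in for the styles page, not the real markup.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/beer/style/86/">Altbier</a></td></tr>
  <tr><td><a href="/beer/style/38/">American Adjunct Lager</a></td></tr>
</table>
<a href="/community/">Not in a table row</a>
"""

collector = StyleLinkCollector()
collector.feed(SAMPLE_HTML)
sample_styles = collector.links
print(sample_styles)
```

Only anchors inside a `<tr>` are kept, which mirrors the nested `find_all` loops in the cell above.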

In [56]:
beer_styles

{'Altbier': 'https://www.beeradvocate.com/beer/style/86/',
 'American Adjunct Lager': 'https://www.beeradvocate.com/beer/style/38/',
 'American Amber / Red Ale': 'https://www.beeradvocate.com/beer/style/128/',
 'American Amber / Red Lager': 'https://www.beeradvocate.com/beer/style/147/',
 'American Barleywine': 'https://www.beeradvocate.com/beer/style/19/',
 'American Black Ale': 'https://www.beeradvocate.com/beer/style/175/',
 'American Blonde Ale': 'https://www.beeradvocate.com/beer/style/99/',
 'American Brown Ale': 'https://www.beeradvocate.com/beer/style/73/',
 'American Dark Wheat Ale': 'https://www.beeradvocate.com/beer/style/94/',
 'American Double / Imperial IPA': 'https://www.beeradvocate.com/beer/style/140/',
 'American Double / Imperial Pilsner': 'https://www.beeradvocate.com/beer/style/164/',
 'American Double / Imperial Stout': 'https://www.beeradvocate.com/beer/style/157/',
 'American IPA': 'https://www.beeradvocate.com/beer/style/116/',
 'American Malt Liquor': 'https:/

## Extract beer information from the beer styles page's table

In [15]:
# American Amber / Red Ale

AARA = BeautifulSoup(requests.get('https://www.beeradvocate.com/beer/style/128/?sort=revsD&start=0').content, 'lxml')

In [16]:
def quality_review_extractor(reviews_url, reviews_dict):
    
    # request page contents and convert to soup object
    QR_beer_r = requests.get(reviews_url)
    QR_beer_r_soup = BeautifulSoup(QR_beer_r.content, 'lxml')
    
    # find all individual reviews
    for ind_review in QR_beer_r_soup.find_all('div', attrs = {'id':'rating_fullview_container'}): 
        full_ind_review = ind_review.text # contains username and unwanted numeric ratings

        # unwanted attributes are contained in <span class="muted">unwanted text</span>
        numeric_review = ind_review.find_all(name ='span',attrs={'class':'muted'})
    
        # cleans out unwanted attributes (username, datetime, numeric ratings)
        for reviewer in numeric_review:
            full_ind_review = full_ind_review.replace(reviewer.text, '')

        # scrubs out the 'Total Score' attribute; done after the loop so that
        # clean_review is always defined, even when no muted spans were found
        clean_review = re.sub(r'^(.*)(%)', "", full_ind_review)
        
        # keep the review only if more than 25 characters remain
        if len(clean_review) > 25:
            reviews_dict[str(len(reviews_dict)+1)] = clean_review
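
A small sketch of what the `'^(.*)(%)'` substitution does to a flattened review string. The sample text is invented (I'm assuming the rating header ends in a `%` and the review body follows on the same line), so treat it as an illustration of the regex, not of the real page. The greedy `.*` runs to the *last* `%` on the first line, so a review that mentions something like an ABV percentage loses its opening words; a non-greedy `.*?` stops at the first `%` instead.

```python
import re

# Hypothetical flattened review: rating header ending in '%', then the review body.
raw = "4.25/5  rDev +3.1%Pours amber at 7.5% ABV with a frothy head."

# Pattern used in the extractor: greedy, removes up to the LAST '%' on the first line.
greedy = re.sub(r'^(.*)(%)', '', raw)

# Non-greedy variant: removes only up to the FIRST '%'.
lazy = re.sub(r'^.*?%', '', raw)

print(repr(greedy))
print(repr(lazy))
```

Which variant is right depends on how many `%` signs the real header contains, so this is worth checking against a few actual pages before switching.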

In [33]:
test_dict = {}

# scrapes individual pages of beer listings.
for table_row in AARA.find(name ='table', attrs = {'width':'100%'}).find_all('tr')[3:5]:
    
    beer_tag = table_row.find_all('td')
    beer_name = beer_tag[0].text
    brewery_name = beer_tag[1].text
    abv = beer_tag[2].text
    ratings = beer_tag[3].text
    avg_score = beer_tag[4].text
    beer_page = 'https://www.beeradvocate.com'+(beer_tag[0].find('a')['href'])

    #print(beer_page)
    
    if int(ratings.replace(',','')) < 100:
        continue
        
    quality_reviews = {} # empty dict to append reviews to
    reviews_page = 0 
    
    
    while len(quality_reviews) < 25: # while we have fewer than 25 entries in the dict
        reviews_page_url = beer_page+str(reviews_page) #url formula to change pages
        
        if requests.get(reviews_page_url).status_code != 200:
            break
        
        print("Current Beer : ",beer_name,
              " --- Pages Scraped : ",int(reviews_page/25),
              " --- Quality Reviews : ",len(quality_reviews))
        clear_output(wait=True)
        
        quality_review_extractor(reviews_page_url, quality_reviews) # hit the function
        
        quality_reviews['Beer_Name'] = beer_name
        quality_reviews['Brewery_Name'] = brewery_name
        quality_reviews['ABV'] = abv
        
        
        reviews_page += 25 # increment the offset to move to the next page of reviews
        
        pause = np.random.lognormal(mean=1.5, sigma=0.4) # lognormal pause (scalar): median ~4.5 s, mean ~4.9 s
        
        time.sleep(pause) # take a nap so beeradvocate does not get suspicious.
    
    print(quality_reviews)

{'1': 'An amazing beer. The smell alone is absolutely delicious.', '2': 'Had this beer in 2016 and 2017.  To me the only flavor is resin, not citrus or pine.', '3': 'Poured from a bottle into a glass with sediment roused. \n\nOverall: Excellent brew. Surprised at how good this is.\n\nLook: orange tinged amber. Hazy. Frothy head that stayed til the last sip. Lacing everywhere. \nSmell: fresh and juicy. Not as off-putting as an IPA can sometimes be. Fruity like fresh mango and pineapple. \nTaste: very hoppy, yes. But not as extreme or unpleasant as I expected. A fruity hop burst that twisted to a sweet, bready, caramelized malt finish. An excellent balance of malts and pine. \nMouthfeel: substantial body. A little sticky. Smooth. \nWhat a great hoppy amber ale.\xa0', 'Beer_Name': 'Tröegs Nugget Nectar', 'Brewery_Name': 'Tröegs Brewing Company', 'ABV': '7.50', '7': 'An amazing beer. The smell alone is absolutely delicious.', '8': 'Had this beer in 2016 and 2017.  To me the only flavor is 
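
The jump from key `'3'` to key `'7'` in the output above follows from the key scheme: the next key is `len(reviews_dict) + 1`, and the three metadata fields are added to the same dict after the first page. A minimal sketch that reproduces this with made-up review strings, plus one possible alternative layout (a `'Reviews'` list, which is my own suggestion and not what the notebook stores):

```python
def add_numbered(reviews_dict, new_reviews):
    # same key scheme as quality_review_extractor: next key = current dict size + 1
    for review in new_reviews:
        reviews_dict[str(len(reviews_dict) + 1)] = review


doc = {}
add_numbered(doc, ["review a", "review b", "review c"])  # keys '1', '2', '3'
doc["Beer_Name"] = "Example Beer"                         # metadata grows the dict to 6 keys
doc["Brewery_Name"] = "Example Brewery"
doc["ABV"] = "5.00"
add_numbered(doc, ["review d"])                           # lands on key '7', not '4'

numbered_keys = [key for key in doc if key.isdigit()]
print(numbered_keys)

# Alternative (assumption): keep reviews in a list so metadata never shifts the numbering.
alt_doc = {"Beer_Name": "Example Beer", "Brewery_Name": "Example Brewery",
           "ABV": "5.00", "Reviews": []}
alt_doc["Reviews"].extend(["review a", "review b", "review c"])
alt_doc["Reviews"].append("review d")
print(len(alt_doc["Reviews"]))
```

With the list layout the stopping condition would be `len(doc["Reviews"]) < 25`, so the metadata fields no longer count toward the 25.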

In [34]:
quality_reviews

{'1': 'An amazing beer. The smell alone is absolutely delicious.',
 '10': 'An amazing beer. The smell alone is absolutely delicious.',
 '11': 'Had this beer in 2016 and 2017.  To me the only flavor is resin, not citrus or pine.',
 '12': 'Poured from a bottle into a glass with sediment roused. \n\nOverall: Excellent brew. Surprised at how good this is.\n\nLook: orange tinged amber. Hazy. Frothy head that stayed til the last sip. Lacing everywhere. \nSmell: fresh and juicy. Not as off-putting as an IPA can sometimes be. Fruity like fresh mango and pineapple. \nTaste: very hoppy, yes. But not as extreme or unpleasant as I expected. A fruity hop burst that twisted to a sweet, bready, caramelized malt finish. An excellent balance of malts and pine. \nMouthfeel: substantial body. A little sticky. Smooth. \nWhat a great hoppy amber ale.\xa0',
 '13': 'An amazing beer. The smell alone is absolutely delicious.',
 '14': 'Had this beer in 2016 and 2017.  To me the only flavor is resin, not citrus 

In [28]:
int('8,790'.replace(',',''))

8790

In [19]:
#bsp = BeerStylesPage
for bsp in beer_styles.values():
    style_page = requests.get(bsp+'?sort=revsD&start=0')
    

https://www.beeradvocate.com/beer/style/128/
https://www.beeradvocate.com/beer/style/19/
https://www.beeradvocate.com/beer/style/175/
https://www.beeradvocate.com/beer/style/99/
https://www.beeradvocate.com/beer/style/73/
https://www.beeradvocate.com/beer/style/94/
https://www.beeradvocate.com/beer/style/140/
https://www.beeradvocate.com/beer/style/157/
https://www.beeradvocate.com/beer/style/116/
https://www.beeradvocate.com/beer/style/97/
https://www.beeradvocate.com/beer/style/93/
https://www.beeradvocate.com/beer/style/159/
https://www.beeradvocate.com/beer/style/158/
https://www.beeradvocate.com/beer/style/78/
https://www.beeradvocate.com/beer/style/171/
https://www.beeradvocate.com/beer/style/130/
https://www.beeradvocate.com/beer/style/163/
https://www.beeradvocate.com/beer/style/6/
https://www.beeradvocate.com/beer/style/72/
https://www.beeradvocate.com/beer/style/12/
https://www.beeradvocate.com/beer/style/60/
https://www.beeradvocate.com/beer/style/119/
https://www.beeradvoca

In [122]:
# Testing random distributions
import numpy as np
sample = np.random.lognormal(mean=1.5, sigma=0.4, size=25)
print(sample)
print(sample.mean())

[ 3.63096516  2.97196946  3.59375285  2.80251842  4.21879535  3.14527029
  5.06322034  3.62610547  3.94702156  4.00928969  3.82662595  5.14013319
  9.32729767  5.40578739  5.38907801  2.67664075  5.27930289  3.15537869
  4.58350277  8.47517308  3.73412727  5.48687738  2.08695297  4.58108765
  4.4826796 ]
4.42558215408
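
For reference, the median and mean of a lognormal distribution follow directly from its parameters: the median is `exp(mu)` and the mean is `exp(mu + sigma**2 / 2)`. A quick standard-library sketch with the same `mean=1.5, sigma=0.4` values; the `polite_pause` helper and its clamping bounds are my own additions for illustration.

```python
import math
import random

MU, SIGMA = 1.5, 0.4

median_pause = math.exp(MU)                    # about 4.48 seconds
mean_pause = math.exp(MU + SIGMA ** 2 / 2)     # about 4.85 seconds


def polite_pause(min_seconds=1.0, max_seconds=15.0, rng=random):
    """Draw a lognormal pause and clamp it to [min_seconds, max_seconds]."""
    pause = rng.lognormvariate(MU, SIGMA)
    return min(max(pause, min_seconds), max_seconds)


print(round(median_pause, 2), round(mean_pause, 2))
print(polite_pause(rng=random.Random(0)))
```

So the "4.5 sec" figure is really the median; the long right tail pulls the mean a bit higher, and a 25-draw sample mean like the one above can land on either side of it.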


## Mongo Shit

In [129]:
# Testing clearing output so prints overwrite each other in IPython notebooks
from IPython.display import clear_output
for i in range(1,10):
    time.sleep(0.5)
    print(i)
    clear_output(wait=True)

9


In [1]:
# connecting to the DB
from pymongo import MongoClient
client = MongoClient('52.26.233.189',27016)

Beers = client['Beers']
#beer_coll = Beers['Beer_Coll']
#beer_coll.insert_one(quality_reviews)

In [6]:
Beers['Beer_final'].count()

KeyboardInterrupt: 

In [47]:
# create a new database called 'Beers'
beer_db = client.Beers

# create a new collection called 'Beer_Coll'
beers = beer_db['Beer_Coll']


['admin', 'config', 'local', 'test']

In [48]:
# need to make an insertion in order for DB and Collection to be generated.
beers.insert_one(quality_reviews)

<pymongo.results.InsertOneResult at 0x10b59bdc8>

In [52]:
# Beers has been created
client.database_names()

['Beers', 'admin', 'config', 'local', 'test']

In [53]:
# Beer_Coll collection has been created.
Beers = client['Beers']
Beers.collection_names()

['Beer_Coll']

In [54]:
beer_coll = Beers['Beer_Coll']
beer_coll.find_one()

{'1': 'An amazing beer. The smell alone is absolutely delicious.',
 '10': 'An amazing beer. The smell alone is absolutely delicious.',
 '11': 'Had this beer in 2016 and 2017.  To me the only flavor is resin, not citrus or pine.',
 '12': 'Poured from a bottle into a glass with sediment roused. \n\nOverall: Excellent brew. Surprised at how good this is.\n\nLook: orange tinged amber. Hazy. Frothy head that stayed til the last sip. Lacing everywhere. \nSmell: fresh and juicy. Not as off-putting as an IPA can sometimes be. Fruity like fresh mango and pineapple. \nTaste: very hoppy, yes. But not as extreme or unpleasant as I expected. A fruity hop burst that twisted to a sweet, bready, caramelized malt finish. An excellent balance of malts and pine. \nMouthfeel: substantial body. A little sticky. Smooth. \nWhat a great hoppy amber ale.\xa0',
 '13': 'An amazing beer. The smell alone is absolutely delicious.',
 '14': 'Had this beer in 2016 and 2017.  To me the only flavor is resin, not citrus 

### Experimenting with Mongo DB

In [8]:
# database names
print(client.database_names())

['admin', 'config', 'local', 'test']


In [9]:
db = client.test

In [10]:
# collections within databases
db.collection_names()

['test']

In [11]:
tests = db.test

In [12]:
# find a single document 
tests.find_one()

{'Dales Pale Ale': ['Light Crisp', 'Pale Ale'],
 '_id': ObjectId('5a3422ff6b0f33d578d48ceb')}

In [36]:
# inserting into database
tests.insert_one(quality_reviews)


ObjectId('5a42a18aeda243194b88525c')

In [121]:
# returning all objects from a database
for post in tests.find({}):
    print(post)

{'_id': ObjectId('5a3422ff6b0f33d578d48ceb'), 'Dales Pale Ale': ['Light Crisp', 'Pale Ale']}
{'_id': ObjectId('5a342d0e6b0f33d578d48cec'), 'Guiness': ['Dark', 'roasty', 'stout']}
{'_id': ObjectId('5a42a18aeda243194b88525c'), '1': 'An amazing beer. The smell alone is absolutely delicious.', '2': 'Had this beer in 2016 and 2017.  To me the only flavor is resin, not citrus or pine.', '3': 'Poured from a bottle into a glass with sediment roused. \n\nOverall: Excellent brew. Surprised at how good this is.\n\nLook: orange tinged amber. Hazy. Frothy head that stayed til the last sip. Lacing everywhere. \nSmell: fresh and juicy. Not as off-putting as an IPA can sometimes be. Fruity like fresh mango and pineapple. \nTaste: very hoppy, yes. But not as extreme or unpleasant as I expected. A fruity hop burst that twisted to a sweet, bready, caramelized malt finish. An excellent balance of malts and pine. \nMouthfeel: substantial body. A little sticky. Smooth. \nWhat a great hoppy amber ale.\xa0', 

In [133]:
table = client['Beers']['Beer_Coll2']

for post in table.find({})[:100]:
    print(post['Beer_Name'])

Bell's Kalamazoo Stout
Chocolate Stout
Obsidian Stout
Bell's Special Double Cream Stout
Sierra Nevada Stout
Chicory Stout
Bell's Java Stout


In [21]:
for i in range(1,10):
    if i%2 == 0:
        continue
    print(i)

1
3
5
7
9


# Final Code.

In [81]:
### Necessary Imports
import pandas as pd

import re
import bs4
from bs4 import BeautifulSoup
import requests
import urllib

import numpy as np

import time

from IPython.display import clear_output

In [5]:
beer_styles_page = requests.get('https://www.beeradvocate.com/beer/style/')

beer_styles_soup = BeautifulSoup(beer_styles_page.content,"html5lib")

#### Retrieve all the Beer style pages

In [6]:
beer_styles = {}
for beer in beer_styles_soup.find_all(name='tr'):
    for style in beer.find_all('a'):
        name = style.text
        link = 'https://www.beeradvocate.com'+style['href']
        beer_styles[name] = link 

In [7]:
beer_styles

{'Altbier': 'https://www.beeradvocate.com/beer/style/86/',
 'American Adjunct Lager': 'https://www.beeradvocate.com/beer/style/38/',
 'American Amber / Red Ale': 'https://www.beeradvocate.com/beer/style/128/',
 'American Amber / Red Lager': 'https://www.beeradvocate.com/beer/style/147/',
 'American Barleywine': 'https://www.beeradvocate.com/beer/style/19/',
 'American Black Ale': 'https://www.beeradvocate.com/beer/style/175/',
 'American Blonde Ale': 'https://www.beeradvocate.com/beer/style/99/',
 'American Brown Ale': 'https://www.beeradvocate.com/beer/style/73/',
 'American Dark Wheat Ale': 'https://www.beeradvocate.com/beer/style/94/',
 'American Double / Imperial IPA': 'https://www.beeradvocate.com/beer/style/140/',
 'American Double / Imperial Pilsner': 'https://www.beeradvocate.com/beer/style/164/',
 'American Double / Imperial Stout': 'https://www.beeradvocate.com/beer/style/157/',
 'American IPA': 'https://www.beeradvocate.com/beer/style/116/',
 'American Malt Liquor': 'https:/

#### Randomly divide Beer Styles into 6 lists

There are 106 different beer styles, which cannot be divided evenly by 6, so the groups are:
- Four sets of 18
- Two sets of 17

In [8]:
# randomizes the order of the list of beers
random_beer_list = np.random.choice(list(beer_styles.keys()), size = len(beer_styles),replace=False)
group1 = random_beer_list[:18]
group2 = random_beer_list[18:36]
group3 = random_beer_list[36:54]
group4 = random_beer_list[54:72]
group5 = random_beer_list[72:89]
group6 = random_beer_list[89:]

Not strictly necessary, but I used this to check that no beer style appears in more than one random group (every printed intersection should be an empty set).

```python
group_list = [group1,group2,group3,group4,group5,group6]

import itertools

for a,b in itertools.combinations(group_list,2):
    print(set(a) & set(b))
```
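
A sketch of the same shuffle-and-slice split without hard-coded slice boundaries, in case the number of styles changes. It uses only the standard library; `split_into_groups` and the placeholder style names are hypothetical. As far as I know, `np.array_split(random_beer_list, 6)` produces the same group sizes (the larger groups come first).

```python
import random


def split_into_groups(items, n_groups, seed=None):
    """Shuffle items and split them into n_groups whose sizes differ by at most one."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    base, extra = divmod(len(shuffled), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more item
        groups.append(shuffled[start:start + size])
        start += size
    return groups


# Placeholder names standing in for the 106 keys of beer_styles.
styles = [f"style_{i}" for i in range(106)]
groups = split_into_groups(styles, 6, seed=0)
sizes = [len(group) for group in groups]
print(sizes)
```

Because the groups are consecutive slices of one shuffled list, they cannot overlap, which is the property the `itertools.combinations` check verifies.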

#### Database Connection

In [86]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-3.6.0-cp36-cp36m-manylinux1_x86_64.whl (378kB)
    100% |████████████████████████████████| 378kB 1.4MB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.6.0


In [91]:
# connecting to the DB
from pymongo import MongoClient

client = MongoClient('250.250.250.250',20000)

# setting path to collection
Beers = client['Beers']
beer_coll = Beers['Beer_Coll2']


In [92]:
print(client.database_names())

ServerSelectionTimeoutError: 172.31.40.93:27016: [Errno 111] Connection refused

#### Quality Review Extractor Function

In [14]:
# Used on individual Beers.
def quality_review_extractor(reviews_url, reviews_dict):
    
    # request page contents and convert to soup object
    QR_beer_r = requests.get(reviews_url)
    QR_beer_r_soup = BeautifulSoup(QR_beer_r.content, 'html5lib')
    
    # find all individual reviews
    for ind_review in QR_beer_r_soup.find_all('div', attrs = {'id':'rating_fullview_container'}): 
        full_ind_review = ind_review.text # contains username and unwanted numeric ratings

        # unwanted attributes are contained in <span class="muted">unwanted text</span>
        numeric_review = ind_review.find_all(name ='span',attrs={'class':'muted'})
    
        # cleans out unwanted attributes (username, datetime, numeric ratings)
        for reviewer in numeric_review:
            full_ind_review = full_ind_review.replace(reviewer.text, '')

        # scrubs out the 'Total Score' attribute; done after the loop so that
        # clean_review is always defined, even when no muted spans were found
        clean_review = re.sub(r'^(.*)(%)', "", full_ind_review)
        
        # keep the review only if more than 25 characters remain
        if len(clean_review) > 25:
            reviews_dict[str(len(reviews_dict)+1)] = clean_review

#### Individual Beer Table Row Scraper.

In [78]:
def beer_table_scrape(table_row):
    beer_tag = table_row.find_all('td')
    # Try to get the general beer info from the row
    try:
        beer_name = beer_tag[0].text
        brewery_name = beer_tag[1].text
        abv = beer_tag[2].text
        ratings = beer_tag[3].text

        avg_score = beer_tag[4].text
        beer_page = 'https://www.beeradvocate.com'+(beer_tag[0].find('a')['href']+'?sort=topr&start=')
    except (IndexError, TypeError):
        # rows without the expected cells or beer link cannot be scraped, so skip them
        return
            
        
    quality_reviews = {} # empty dict to append reviews to
    reviews_page = 0 
    
    
    while len(quality_reviews) < 25 and reviews_page < 1000: # while we have less than 25 good reviews.  
        reviews_page_url = beer_page+str(reviews_page) #url formula to change pages
        
 
        
        print("Current Beer : ",beer_name,
              " --- Pages Scraped : ",int(reviews_page/25),
              " --- Quality Reviews : ",len(quality_reviews))
        
        print(reviews_page_url)
        clear_output(wait=True)
        
        quality_review_extractor(reviews_page_url, quality_reviews) # hit the function
        
        quality_reviews['Beer_Name'] = beer_name
        quality_reviews['Brewery_Name'] = brewery_name
        quality_reviews['ABV'] = abv
        
        
        reviews_page += 25 # increment the offset to move to the next page of reviews
        
        pause = np.random.lognormal(mean=1.5, sigma=0.4) # lognormal pause (scalar): median ~4.5 s, mean ~4.9 s
        
        time.sleep(pause) # take a nap so beeradvocate does not get suspicious.
    
    beer_coll.insert_one(quality_reviews)
 

#### Beer Table Crawler loop.

Crawls over the tables on individual pages and executes the quality review extractor function within a while loop.

In [None]:
# for each beer style in the random list
    #while there are still beers of this style to scrape
        # for each individual row on the table
            # run the scrape
        # increment the style value (repeat)

In [105]:
group1

array(['Fruit / Vegetable Beer', 'American Stout',
       'American Double / Imperial IPA', 'American Pale Wheat Ale',
       'Vienna Lager', 'Light Lager', 'Eisbock',
       'American Amber / Red Ale', 'Scottish Ale', 'English Strong Ale',
       'Rauchbier', 'Bière de Garde', 'Lambic - Fruit', 'Czech Pilsener',
       'American Pale Lager', 'Hefeweizen', 'Lambic - Unblended',
       'Roggenbier'],
      dtype='<U35')

In [32]:
test_group = ['Happoshu','Low Alcohol Beer','Rye Beer',]

In [None]:
### Dafuq is wrong with Beer 30 Light????

## Need to figure out a way to increment pages better.

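Maybe instead of relying only on a while loop, I use the return value: when the number of reviews is less than 100, the function returns `False`, and that breaks the scraping for that beer.

```python
def scrape_beer(number_reviews, beer_pages):
    # too few reviews: bail out and let the caller know
    if number_reviews < 100:
        return False
    # scrape() is still a placeholder for the real scraping function
    return scrape(beer_pages)
```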
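
One way the return-value idea above could look, sketched offline. `crawl_style`, `fetch_rows`, and `FAKE_PAGES` are all hypothetical names; `fetch_rows` stands in for "request a listing page and parse its rows" so the paging logic can be tried without touching the site. It assumes the listing is sorted with the most-rated beers first, so the first beer under the threshold ends the style.

```python
def crawl_style(fetch_rows, min_ratings=100, page_size=50, max_pages=1000):
    """Return the beers worth scraping for one style.

    fetch_rows(start) is a caller-supplied function returning a list of
    (beer_name, ratings) tuples for the page beginning at offset `start`,
    or an empty list when there are no more pages.
    """
    kept = []
    for page in range(max_pages):
        rows = fetch_rows(page * page_size)
        if not rows:
            return kept                 # ran out of pages
        for name, ratings in rows:
            if ratings < min_ratings:
                return kept             # first low-rated beer ends the whole style
            kept.append(name)
    return kept


# Fake listing pages keyed by their start offset (made-up data).
FAKE_PAGES = {
    0: [("Beer A", 8790), ("Beer B", 450)],
    50: [("Beer C", 120), ("Beer D", 99), ("Beer E", 5000)],
}

result = crawl_style(lambda start: FAKE_PAGES.get(start, []))
print(result)
```

Returning from the function replaces the `break_light` flag: a single `return` exits both the row loop and the page loop.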

In [80]:
# for each beer style in the sublist
for individual_style in test_group: # random style list
    base_beer_style_url = beer_styles[individual_style]+'?sort=revsD&start=' # base style url
    page_count_value = 0
    break_light = False
    
    # keep paging until a beer with fewer than 100 ratings shows up
    while not break_light:
        
        # fetch each page once and reuse the response for the status check and the soup
        style_page = requests.get(base_beer_style_url+str(page_count_value))
        if style_page.status_code != 200:
            break
        beer_style_table = BeautifulSoup(style_page.content, 'html5lib')

        # loops through each row in the table on a page
        for table_row in beer_style_table.find(name ='table', attrs = {'width':'100%'}).find_all('tr')[3:53]:
            
            # stop this style at the first beer with fewer than 100 ratings
            if int(table_row.find_all('td')[3].text.replace(',','')) < 100:
                break_light = True
                break

            # scrape the reviews for this beer and store them in the collection
            beer_table_scrape(table_row)
            
        # increment the page count
        page_count_value += 50









Current Beer :  Coedo Beniaka  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/3551/58111/?sort=topr&start=0
Current Beer :  Coedo Beniaka  --- Pages Scraped :  1  --- Quality Reviews :  18
https://www.beeradvocate.com/beer/profile/3551/58111/?sort=topr&start=25
Current Beer :  O'Doul's  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/29/5727/?sort=topr&start=0
Current Beer :  O'Doul's  --- Pages Scraped :  1  --- Quality Reviews :  13
https://www.beeradvocate.com/beer/profile/29/5727/?sort=topr&start=25
Current Beer :  O'Doul's Amber  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/29/5728/?sort=topr&start=0
Current Beer :  O'Doul's Amber  --- Pages Scraped :  1  --- Quality Reviews :  16
https://www.beeradvocate.com/beer/profile/29/5728/?sort=topr&start=25
Current Beer :  Kaliber  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com

Current Beer :  LUX Rye Ale  --- Pages Scraped :  1  --- Quality Reviews :  9
https://www.beeradvocate.com/beer/profile/33519/153367/?sort=topr&start=25
Current Beer :  LUX Rye Ale  --- Pages Scraped :  2  --- Quality Reviews :  14
https://www.beeradvocate.com/beer/profile/33519/153367/?sort=topr&start=50
Current Beer :  LUX Rye Ale  --- Pages Scraped :  3  --- Quality Reviews :  18
https://www.beeradvocate.com/beer/profile/33519/153367/?sort=topr&start=75
Current Beer :  LUX Rye Ale  --- Pages Scraped :  4  --- Quality Reviews :  23
https://www.beeradvocate.com/beer/profile/33519/153367/?sort=topr&start=100
Current Beer :  Kentucky Ryed Chiquen  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/26850/78698/?sort=topr&start=0
Current Beer :  Kentucky Ryed Chiquen  --- Pages Scraped :  1  --- Quality Reviews :  11
https://www.beeradvocate.com/beer/profile/26850/78698/?sort=topr&start=25
Current Beer :  Kentucky Ryed Chiquen  --- Pages Scraped :  

Current Beer :  Westside Rye  --- Pages Scraped :  2  --- Quality Reviews :  18
https://www.beeradvocate.com/beer/profile/423/74206/?sort=topr&start=50
Current Beer :  India Red Rye Ale  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/34132/118988/?sort=topr&start=0
Current Beer :  India Red Rye Ale  --- Pages Scraped :  1  --- Quality Reviews :  9
https://www.beeradvocate.com/beer/profile/34132/118988/?sort=topr&start=25
Current Beer :  India Red Rye Ale  --- Pages Scraped :  2  --- Quality Reviews :  11
https://www.beeradvocate.com/beer/profile/34132/118988/?sort=topr&start=50
Current Beer :  India Red Rye Ale  --- Pages Scraped :  3  --- Quality Reviews :  17
https://www.beeradvocate.com/beer/profile/34132/118988/?sort=topr&start=75
Current Beer :  India Red Rye Ale  --- Pages Scraped :  4  --- Quality Reviews :  19
https://www.beeradvocate.com/beer/profile/34132/118988/?sort=topr&start=100
Current Beer :  India Red Rye Ale  --- Pages Scrap

Current Beer :  Red Ryeot  --- Pages Scraped :  1  --- Quality Reviews :  12
https://www.beeradvocate.com/beer/profile/24659/112724/?sort=topr&start=25
Current Beer :  Red Ryeot  --- Pages Scraped :  2  --- Quality Reviews :  17
https://www.beeradvocate.com/beer/profile/24659/112724/?sort=topr&start=50
Current Beer :  Red Ryeot  --- Pages Scraped :  3  --- Quality Reviews :  21
https://www.beeradvocate.com/beer/profile/24659/112724/?sort=topr&start=75
Current Beer :  RPA (Rye Pale Ale)  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/3912/80032/?sort=topr&start=0
Current Beer :  RPA (Rye Pale Ale)  --- Pages Scraped :  1  --- Quality Reviews :  18
https://www.beeradvocate.com/beer/profile/3912/80032/?sort=topr&start=25
Current Beer :  Lakefire Rye Pale Ale  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/33438/106821/?sort=topr&start=0
Current Beer :  Lakefire Rye Pale Ale  --- Pages Scraped :  1  ---

Current Beer :  Samuel Adams Honey Rye Pale Ale  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/35/279354/?sort=topr&start=0
Current Beer :  Samuel Adams Honey Rye Pale Ale  --- Pages Scraped :  1  --- Quality Reviews :  15
https://www.beeradvocate.com/beer/profile/35/279354/?sort=topr&start=25
Current Beer :  Samuel Adams Honey Rye Pale Ale  --- Pages Scraped :  2  --- Quality Reviews :  23
https://www.beeradvocate.com/beer/profile/35/279354/?sort=topr&start=50
Current Beer :  Stickin' In My IPA  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/30405/114302/?sort=topr&start=0
Current Beer :  Stickin' In My IPA  --- Pages Scraped :  1  --- Quality Reviews :  10
https://www.beeradvocate.com/beer/profile/30405/114302/?sort=topr&start=25
Current Beer :  Stickin' In My IPA  --- Pages Scraped :  2  --- Quality Reviews :  15
https://www.beeradvocate.com/beer/profile/30405/114302/?sort=topr&start=50
Current 

Current Beer :  Bonfire Rye  --- Pages Scraped :  1  --- Quality Reviews :  14
https://www.beeradvocate.com/beer/profile/23973/97928/?sort=topr&start=25
Current Beer :  Bonfire Rye  --- Pages Scraped :  2  --- Quality Reviews :  19
https://www.beeradvocate.com/beer/profile/23973/97928/?sort=topr&start=50
Current Beer :  Bonfire Rye  --- Pages Scraped :  3  --- Quality Reviews :  21
https://www.beeradvocate.com/beer/profile/23973/97928/?sort=topr&start=75
Current Beer :  Bronx Rye Pale Ale  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/27035/84754/?sort=topr&start=0
Current Beer :  Bronx Rye Pale Ale  --- Pages Scraped :  1  --- Quality Reviews :  14
https://www.beeradvocate.com/beer/profile/27035/84754/?sort=topr&start=25
Current Beer :  Bronx Rye Pale Ale  --- Pages Scraped :  2  --- Quality Reviews :  21
https://www.beeradvocate.com/beer/profile/27035/84754/?sort=topr&start=50
Current Beer :  Bridal Veil Rye Pale Ale  --- Pages Scraped :  

Current Beer :  Improved Old Fashioned  --- Pages Scraped :  1  --- Quality Reviews :  13
https://www.beeradvocate.com/beer/profile/45/206953/?sort=topr&start=25
Current Beer :  Improved Old Fashioned  --- Pages Scraped :  2  --- Quality Reviews :  20
https://www.beeradvocate.com/beer/profile/45/206953/?sort=topr&start=50
Current Beer :  Witicus Double Rye Wit  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/18823/54431/?sort=topr&start=0
Current Beer :  Witicus Double Rye Wit  --- Pages Scraped :  1  --- Quality Reviews :  10
https://www.beeradvocate.com/beer/profile/18823/54431/?sort=topr&start=25
Current Beer :  Witicus Double Rye Wit  --- Pages Scraped :  2  --- Quality Reviews :  19
https://www.beeradvocate.com/beer/profile/18823/54431/?sort=topr&start=50
Current Beer :  Witicus Double Rye Wit  --- Pages Scraped :  3  --- Quality Reviews :  24
https://www.beeradvocate.com/beer/profile/18823/54431/?sort=topr&start=75
Current Beer :  Gramar

Current Beer :  Fire In The Rye  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/28433/107275/?sort=topr&start=0
Current Beer :  Fire In The Rye  --- Pages Scraped :  1  --- Quality Reviews :  15
https://www.beeradvocate.com/beer/profile/28433/107275/?sort=topr&start=25
Current Beer :  Fire In The Rye  --- Pages Scraped :  2  --- Quality Reviews :  20
https://www.beeradvocate.com/beer/profile/28433/107275/?sort=topr&start=50
Current Beer :  Fire In The Rye  --- Pages Scraped :  3  --- Quality Reviews :  23
https://www.beeradvocate.com/beer/profile/28433/107275/?sort=topr&start=75
Current Beer :  Hop*A*Potamus  --- Pages Scraped :  0  --- Quality Reviews :  0
https://www.beeradvocate.com/beer/profile/23052/76681/?sort=topr&start=0
Current Beer :  Hop*A*Potamus  --- Pages Scraped :  1  --- Quality Reviews :  15
https://www.beeradvocate.com/beer/profile/23052/76681/?sort=topr&start=25
Current Beer :  Hop*A*Potamus  --- Pages Scraped :  2  --- Qua

In [119]:
for item in test_group:
    print(beer_styles[item])

https://www.beeradvocate.com/beer/style/158/
https://www.beeradvocate.com/beer/style/89/
https://www.beeradvocate.com/beer/style/39/


In [120]:
beer_styles

{'Altbier': 'https://www.beeradvocate.com/beer/style/86/',
 'American Adjunct Lager': 'https://www.beeradvocate.com/beer/style/38/',
 'American Amber / Red Ale': 'https://www.beeradvocate.com/beer/style/128/',
 'American Amber / Red Lager': 'https://www.beeradvocate.com/beer/style/147/',
 'American Barleywine': 'https://www.beeradvocate.com/beer/style/19/',
 'American Black Ale': 'https://www.beeradvocate.com/beer/style/175/',
 'American Blonde Ale': 'https://www.beeradvocate.com/beer/style/99/',
 'American Brown Ale': 'https://www.beeradvocate.com/beer/style/73/',
 'American Dark Wheat Ale': 'https://www.beeradvocate.com/beer/style/94/',
 'American Double / Imperial IPA': 'https://www.beeradvocate.com/beer/style/140/',
 'American Double / Imperial Pilsner': 'https://www.beeradvocate.com/beer/style/164/',
 'American Double / Imperial Stout': 'https://www.beeradvocate.com/beer/style/157/',
 'American IPA': 'https://www.beeradvocate.com/beer/style/116/',
 'American Malt Liquor': 'https:/

In [135]:
def quality_review_extractor(reviews_url, reviews_dict):
    
    # request page contents and convert to soup object
    QR_beer_r = requests.get(reviews_url)
    QR_beer_r_soup = BeautifulSoup(QR_beer_r.content, 'html5lib')
    
    # find all individual reviews
    for ind_review in QR_beer_r_soup.find_all('div', attrs = {'id':'rating_fullview_container'}): 
        full_ind_review = ind_review.text # contains username and unwanted numeric ratings

        # unwanted attributes are contained in <span class="muted">unwanted text</span>
        numeric_review = ind_review.find_all(name ='span',attrs={'class':'muted'})
    
        # cleans out unwanted attributes (username, datetime, numeric ratings)
        for reviewer in numeric_review:
            full_ind_review = full_ind_review.replace(reviewer.text, '')

        # scrubs out the 'Total Score' attribute; done after the loop so that
        # clean_review is always defined, even when no muted spans were found
        clean_review = re.sub(r'^(.*)(%)', "", full_ind_review)
        
        # keep the review only if more than 25 characters remain
        if len(clean_review) > 25:
            reviews_dict[str(len(reviews_dict)+1)] = clean_review

In [136]:
empty_reviews = {}

quality_review_extractor('https://www.beeradvocate.com/beer/profile/1422/32918/?sort=topr&start=0', empty_reviews)

In [137]:
empty_reviews

{'1': "Pours a clear yellow with a 1 inch foamy white head that settles to a film on top of the beer. Foamy rings of lace form around the glass on the drink down. Looks-wise this one isn't so bad I must say. Smell is of subtle grain, sugar, and metal. There is also a weird fruitiness that I am picking up in the smell. Taste is of sugary water and slight grain flavors. This beer seems to be quite flat with a watery quality in the mouthfeel. Overall, this is a pretty horrible beer any way you look at it. I am glad I picked up a can to try just so I can further appreciate all the good beer I am able to enjoy.\xa0",
 '10': "Beer 30 Light has a thick, egg-shell colored head and a clear, golden appearance with some bubbles streaming up and little lacing left. The aroma is strange, to say the least, with a green sour apple scent and other smells I can't detect. The taste is also peculiar, with that sour green apple aspect and other funny flavors I just cannot place. Mouthfeel is light and wat