# Introduction

This notebook is a test of scraping whisky reviews from Reddit. I obtained the URLs of several reviews from the following spreadsheet: https://docs.google.com/spreadsheets/d/1X1HTxkI6SqsdpNSkSSivMzpxNT-oeTbjFFDdEkXD30o/edit#gid=695409533
I will scrape a set of these reviews using the PRAW library.

In [1]:
# move this to a requirements file
!pip install praw



## Imports

In [2]:
import os
import praw

## Reddit API Exploration

Create a Reddit() connection with PRAW, using my OAuth credentials stored as environment variables in Domino.

In [4]:
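# OAuth credentials are read from environment variables configured in Domino.
# Reddit asks for a descriptive user agent, e.g. '<platform>:<app ID>:<version> (by /u/<username>)'.
reddit = praw.Reddit(client_id=os.getenv('reddit_clientid'),
                     client_secret=os.getenv('reddit_secret'),
                     user_agent='jbeck22')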
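`os.getenv` quietly returns `None` when a variable is missing, so a typo in the Domino environment settings would only surface later inside PRAW rather than where the variable is read. Below is a minimal sketch of a fail-fast helper, assuming the same variable names used above (`reddit_clientid`, `reddit_secret`); the function name `load_reddit_credentials` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def load_reddit_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read Reddit OAuth credentials from the environment, failing fast if any are missing.

    Hypothetical helper: the variable names mirror the os.getenv calls above.
    """
    env = os.environ if env is None else env
    # Map praw.Reddit keyword arguments to the environment variables that hold them.
    names = {"client_id": "reddit_clientid", "client_secret": "reddit_secret"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

With this in place, the connection could be created as `praw.Reddit(**load_reddit_credentials(), user_agent='jbeck22')`. Accepting an optional mapping keeps the helper testable without touching the real environment.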

Use a set of sample review URLs.

In [5]:
urls = [
    'https://www.reddit.com/r/Scotch/comments/14uder/100_pipers_blend_review_10/c7ghjy2/',
    'https://www.reddit.com/r/bourbon/comments/67a74d/review_316_mystery_sample/',
    'https://www.reddit.com/r/bourbon/comments/2k35c0/review_16_abraham_bowman_cider_finish/',
    'https://www.reddit.com/r/worldwhisky/comments/5ablei/adelphi_the_glover_18_review/'
]

In [6]:
posts = [reddit.submission(url=x) for x in urls]
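Note that the first URL points at a specific comment (`c7ghjy2`) rather than at the post itself, while `reddit.submission(url=...)` loads the whole submission. If the comment ID matters later, it can be pulled out of the URL path. Below is a minimal standard-library sketch, assuming the `/r/<subreddit>/comments/<post_id>/<slug>/<comment_id>/` layout seen in the sample URLs; `parse_review_url` and `RedditRef` are hypothetical names. For the post ID alone, PRAW also provides `Submission.id_from_url`.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class RedditRef(NamedTuple):
    subreddit: str
    post_id: str
    comment_id: Optional[str]


def parse_review_url(url: str) -> RedditRef:
    """Split a reddit.com comments URL into subreddit, post ID and optional comment ID."""
    # e.g. ['r', 'Scotch', 'comments', '14uder', '100_pipers_blend_review_10', 'c7ghjy2']
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 4 or parts[0] != "r" or parts[2] != "comments":
        raise ValueError(f"Unrecognised Reddit URL: {url}")
    # A sixth path segment, when present, is the ID of a specific comment.
    comment_id = parts[5] if len(parts) > 5 else None
    return RedditRef(subreddit=parts[1], post_id=parts[3], comment_id=comment_id)
```

For example, `[parse_review_url(u) for u in urls]` would show which of the sample links target a single comment.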

In [7]:
for top_level_comment in posts[0].comments:
    print(top_level_comment.body)

My wife and I are on a trip to Thailand to meet her family.  I've seen plenty of whisky here, mostly JW, but this one stood out from the rest.  100 pipers is not something I've seen before and it seems to have quite the following here.  It is a blend at 40% alcohol by volume and 35cl was 220 baht or about $8 Canadian.  I got it more as a novelty as I suspect it is the Thai equivalent of chivas or glenfiddich 12.

Colour: caramel, I suspect it is artificially coloured.

Nose: (I had some tiger balm on my hands so this may be *way* off) alcohol, little bit of leather and some hints of sweetness.

Palate: very bland, I taste almost nothing really, a bit of woody flavour, the promise of leather and sweetness from the nose is gone.

Finish: short and devoid of anything but alcohol.

This reminds me of a JW red or the cheap rye my Dad drank when I was a kid.  I bought it primarily for the novelty so I don't think it was a waste.  it is just not something I'd seek out again.

68/100

PS: Yes 
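The review above ends with its score on a line of its own (`68/100`). Below is a minimal sketch for pulling such a score out of free text, assuming reviewers write it as `<number>/100`. Reviews that use other scales or wordings would need more patterns, and `extract_score` is a hypothetical name.

```python
import re
from typing import Optional

# Matches scores such as "68/100" or "87.5 / 100" (an assumed format).
SCORE_PATTERN = re.compile(r"\b(\d{1,3}(?:\.\d+)?)\s*/\s*100\b")


def extract_score(text: str) -> Optional[float]:
    """Return the last "<n>/100" score in the text, or None if there is none."""
    matches = SCORE_PATTERN.findall(text)
    # Take the last match, on the assumption that the final score closes the review.
    return float(matches[-1]) if matches else None
```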

## Understanding the PRAW Submission Object

I need to better understand the structure of the PRAW Submission object. Some information is available in the PRAW quick start guide (https://praw.readthedocs.io/en/latest/getting_started/quick_start.html#determine-available-attributes-of-an-object), but it may be easier to dig into the object with vars().
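`vars()` also returns private bookkeeping entries, the ones prefixed with `_` such as `_reddit` and `_comments_by_id` in the output below. Below is a small sketch that keeps only the public attribute names, which may make the dump easier to scan; `public_attributes` is a hypothetical helper. Because PRAW objects load lazily, the result is only informative once the object has been fetched, for example after reading an attribute such as `title`.

```python
from typing import Any, List


def public_attributes(obj: Any) -> List[str]:
    """Return the sorted names of an object's instance attributes that do not start with '_'."""
    return sorted(name for name in vars(obj) if not name.startswith("_"))
```

In this notebook, `public_attributes(test_post)` would list fields such as `id`, `comment_sort` and `subreddit` without the internal entries.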

In [8]:
test_post = posts[0]

In [9]:
vars(test_post)

{'_reddit': <praw.reddit.Reddit at 0x7f96c04c9e80>,
 '_fetched': True,
 '_info_params': {},
 'comment_limit': 2048,
 'comment_sort': 'best',
 'id': '14uder',
 '_flair': None,
 '_mod': None,
 '_comments_by_id': {'t1_c7ghjy2': Comment(id='c7ghjy2'),
  't1_c7grqm4': Comment(id='c7grqm4'),
  't1_c7grt6y': Comment(id='c7grt6y'),
  't1_c7gsa0x': Comment(id='c7gsa0x'),
  't1_c7gsale': Comment(id='c7gsale'),
  't1_c7vuiso': Comment(id='c7vuiso'),
  't1_c7wwrpc': Comment(id='c7wwrpc'),
  't1_c7wygrn': Comment(id='c7wygrn'),
  't1_c7gyxy8': Comment(id='c7gyxy8'),
  't1_c7h88ms': Comment(id='c7h88ms'),
  't1_c7gvn73': Comment(id='c7gvn73'),
  't1_c7h89g3': Comment(id='c7h89g3'),
  't1_c7hc4h0': Comment(id='c7hc4h0'),
  't1_c7ghkiw': Comment(id='c7ghkiw'),
  't1_c7gipg6': Comment(id='c7gipg6'),
  't1_c7j5mlg': Comment(id='c7j5mlg'),
  't1_c7gq4x6': Comment(id='c7gq4x6'),
  't1_c7grpk5': Comment(id='c7grpk5'),
  't1_c7hvru3': Comment(id='c7hvru3'),
  't1_c7gr9dt': Comment(id='c7gr9dt'),
  't1_c7grp

In [10]:
test_post.title

'100 Pipers Blend Review #10'

In [11]:
test_post.author

Redditor(name='merlinblack')

In [12]:
test_post.id

'14uder'

In [13]:
test_post.url

'http://imgur.com/a/gDq9h'

### Sort Comments in Posts

In [14]:
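# Oldest-first ordering; this only takes effect if set before a post's comments
# are first fetched, so posts[0] (already fetched above) may keep 'best' order.
for post in posts:
    post.comment_sort = 'old'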

In [15]:
first_comments = [list(x.comments)[0].body for x in posts]
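`list(x.comments)[0].body` assumes every post has at least one comment and that the first top-level item is a real comment. In PRAW a comment forest can also contain `MoreComments` placeholders, which have no `body`, and a post with no comments would raise an `IndexError`. Below is a defensive sketch that uses duck typing so it can be tested without a live connection; `first_comment_body` is a hypothetical helper.

```python
from typing import Iterable, Optional


def first_comment_body(comments: Iterable[object]) -> Optional[str]:
    """Return the body of the first item that has one, skipping placeholders.

    Duck-typed so it works on a PRAW comment forest (where MoreComments
    objects carry no `body`) as well as on plain test objects.
    """
    for item in comments:
        body = getattr(item, "body", None)
        if body is not None:
            return body
    return None  # no comments at all, or only placeholders
```

The list above could then be built as `[first_comment_body(x.comments) for x in posts]`, with `None` marking posts that need a closer look.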

In [16]:
first_comments[1]

'Amazeballs.'

So we can see that the first comment is sometimes not the actual review. If you follow the second URL, you will see that the review is the body of the post itself. Let's look at that post specifically.

In [17]:
bad_post = posts[1]

In [18]:
vars(bad_post)

{'_reddit': <praw.reddit.Reddit at 0x7f96c04c9e80>,
 '_fetched': True,
 '_info_params': {},
 'comment_limit': 2048,
 'comment_sort': 'old',
 'id': '67a74d',
 '_flair': None,
 '_mod': None,
 '_comments_by_id': {'t1_dgotucx': Comment(id='dgotucx'),
  't1_dgov0ul': Comment(id='dgov0ul'),
  't1_dgovmzx': Comment(id='dgovmzx'),
  't1_dgovwi5': Comment(id='dgovwi5'),
  't1_dgow5ez': Comment(id='dgow5ez'),
  't1_dgp3bs5': Comment(id='dgp3bs5'),
  't1_dgp4ls3': Comment(id='dgp4ls3'),
  't1_dgp4oji': Comment(id='dgp4oji'),
  't1_dgp0fzf': Comment(id='dgp0fzf'),
  't1_dgp13mq': Comment(id='dgp13mq')},
 'approved_at_utc': None,
 'subreddit': Subreddit(display_name='bourbon'),
 'selftext': "Mystery Sample picked and poured by myself(requested by TXLevi). Tasting notes messaged to /u/TXLevi for reveal.  \n             \n&nbsp;   \n\n**Color** :   \n                                                               \n**Nose** : Strawberries, cherries, rich oak, dark caramel & spicy notes of clove.      

It looks like the review is actually contained in the 'selftext' field.
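The selftext uses a slightly different layout from the comment-based review earlier. Its labels are markdown bold (`**Nose** : ...`), while the comment used plain labels (`Nose: ...`). Below is a minimal regex sketch that normalises both layouts into a dictionary. It assumes one field per line and a fixed list of field names taken from these two reviews. `parse_tasting_notes`, `FIELDS` and `FIELD_PATTERN` are hypothetical, and real reviews are likely to vary more than this.

```python
import re
from typing import Dict

# Field names seen in the two sample reviews (an assumed, incomplete list).
FIELDS = ("Colour", "Color", "Nose", "Palate", "Taste", "Finish", "Guess", "Reveal", "Conclusion")

# Optional ** around the label, optional spaces before the colon, and the rest
# of the same line as the value ([ \t] keeps matches from spilling across lines).
FIELD_PATTERN = re.compile(
    r"^[ \t]*(?:\*\*)?(" + "|".join(FIELDS) + r")(?:\*\*)?[ \t]*:[ \t]*(.*?)[ \t]*$",
    re.MULTILINE,
)


def parse_tasting_notes(text: str) -> Dict[str, str]:
    """Map each recognised (lower-cased) field label to the text after it on the same line."""
    return {label.lower(): value for label, value in FIELD_PATTERN.findall(text)}
```

For example, `parse_tasting_notes(bad_post.selftext)` should return entries such as `nose` and `taste`, while an empty label like `**Color** :` maps to an empty string.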

In [19]:
bad_post.selftext

"Mystery Sample picked and poured by myself(requested by TXLevi). Tasting notes messaged to /u/TXLevi for reveal.  \n             \n&nbsp;   \n\n**Color** :   \n                                                               \n**Nose** : Strawberries, cherries, rich oak, dark caramel & spicy notes of clove.              \n  \n**Taste** : A very dark/dense palate. Heavily toasted oak char, dark bitter chocolate, wet copper, cherry concentrate, prune juice, brandy, maple syrup, rich caramel, ripe cherries & spearmint.                      \n        \n**Finish** : For being so heavy/powerful it isn't over-whelming.  Mouth-feel is big, bold & rich with bursting flavors. The slight harshness works well with the profile, just an over-all bruiser and I was in the right mood for it!                    \n  \n&nbsp;  \n\n\n**Guess** : 125-130 proof, 18-20 Years, Willett C?             \n \n**Reveal** : [1792 Full Proof Poison Girl 8 Year 10 Month 125 Proof](/spoiler)    \n\n**Conclusion** : heheh

So we will need some parsing to pull out each review and store it, depending on whether it lives in the post's selftext or in its first comment. Still, the basics are working, and it looks like we can collect some data!
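The selection step could start as a simple rule: use whichever candidate, the selftext or the first comment, looks like a tasting review, and otherwise fall back to any non-empty text. Below is a minimal sketch of that rule. It assumes that a review contains at least one `Nose`, `Palate`, `Taste` or `Finish` label followed by a colon. `pick_review_text` and `REVIEW_MARKER` are hypothetical names, and the heuristic is a starting point rather than a tested classifier.

```python
import re
from typing import Optional, Tuple

# Assumed marker of review text: a tasting-note label (optionally bold) followed by a colon.
REVIEW_MARKER = re.compile(r"\b(?:Nose|Palate|Taste|Finish)\**\s*:", re.IGNORECASE)


def pick_review_text(selftext: str, first_comment: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (source, text), preferring whichever candidate looks like a tasting review."""
    candidates = [("selftext", selftext), ("first_comment", first_comment)]
    for source, text in candidates:
        if text and REVIEW_MARKER.search(text):
            return source, text
    # Nothing looks like a review: fall back to any non-empty text, selftext first.
    for source, text in candidates:
        if text and text.strip():
            return source, text
    return "none", None
```

For a submission `post`, the arguments would be `post.selftext` and the body of its first comment. The returned source label could be stored next to the text so it is easy to audit which rule picked each review.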