# Introduction

In this notebook, we take the scraped whisky reviews and use them to populate our _WhiskyReviews_ MongoDB database.

## Imports

In [1]:
import os
import pickle as pkl
import urllib.parse

import certifi
import pandas as pd
import praw  # PRAW must be installed to unpickle the stored submission objects
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

## Read in the Original Dictionary

Let's read in the pickled dictionary, and check a review to ensure it's structured properly.

In [4]:
with open('../data/review_dict.pkl', 'rb') as f:  # closes the file after loading
    review_dict = pkl.load(f)

In [5]:
len(review_dict)

31030

In [6]:
test_sub = review_dict['bc23e67c952b4573a6e1b66789580041']

In [7]:
test_sub['submission'].title

'Review #45: Does Barrel Proof Make a Bourbon Good? A 1792 Full Proof Review'

In [8]:
test_sub['submission'].id

'6epezn'
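
Spot-checking one entry does not show whether every entry carries the same fields. A minimal sketch of a broader check is below; `key_census` and the `sample` data are hypothetical names introduced for illustration, and in the notebook you would pass `review_dict` itself.

```python
from collections import Counter


def key_census(review_dict):
    """Count how many entries contain each field name."""
    counts = Counter()
    for subdict in review_dict.values():
        counts.update(subdict.keys())
    return counts


# Hypothetical stand-in data with the same shape as review_dict:
# a mapping from an ID string to a dictionary of fields.
sample = {
    'a': {'submission': object(), 'title': 'x'},
    'b': {'submission': object()},
}

census = key_census(sample)
# Fields that are missing from at least one entry.
incomplete = [key for key, n in census.items() if n < len(sample)]
```

Any field listed in `incomplete` is absent from some entries, which is worth knowing before the documents are written to the database.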

Everything looks good, so let's connect to the _WhiskyReviews_ database.

In [9]:
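# Percent-encode the credentials so reserved characters (e.g. '@', ':', '/') can't break the URI.
# os.environ raises a KeyError naming the variable if it is not set.
user = urllib.parse.quote_plus(os.environ['whiskeydb_admin'])
pwd = urllib.parse.quote_plus(os.environ['whiskeydb_pwd'])
uri = f"mongodb+srv://{user}:{pwd}@whiskeyrecommender.mvfds.mongodb.net/?retryWrites=true&w=majority&appName=WhiskeyRecommender"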
ca = certifi.where()

# Create a new client and connect to the server
client = MongoClient(uri,
    server_api=ServerApi('1'),
    tls=True,
    tlsAllowInvalidCertificates=False,
    tlsCAFile=ca)
# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Pinged your deployment. You successfully connected to MongoDB!


In [10]:
reddit_reviews = client.reddit_reviews

In [11]:
submissions = reddit_reviews['submissions']

In [12]:
errors = []

for i, (uuid, subdict) in enumerate(review_dict.items()):
    try:
        # Use the dictionary key as the document's primary key.
        subdict['_id'] = uuid
        # We can't store the PRAW object since it can't be BSON-encoded.
        subdict.pop('submission', None)
        submissions.insert_one(subdict)
    except Exception:
        # Record the failing key so it can be inspected or retried later.
        errors.append(uuid)

    if i % 1000 == 0:
        print('iteration ', i, 'with ', len(errors), 'errors')

iteration  0 with  0 errors
iteration  1000 with  0 errors
iteration  2000 with  0 errors
iteration  3000 with  0 errors
iteration  4000 with  0 errors
iteration  5000 with  0 errors
iteration  6000 with  0 errors
iteration  7000 with  0 errors
iteration  8000 with  0 errors
iteration  9000 with  0 errors
iteration  10000 with  0 errors
iteration  11000 with  0 errors
iteration  12000 with  0 errors
iteration  13000 with  0 errors
iteration  14000 with  0 errors
iteration  15000 with  0 errors
iteration  16000 with  0 errors
iteration  17000 with  0 errors
iteration  18000 with  0 errors
iteration  19000 with  0 errors
iteration  20000 with  0 errors
iteration  21000 with  0 errors
iteration  22000 with  0 errors
iteration  23000 with  0 errors
iteration  24000 with  0 errors
iteration  25000 with  0 errors
iteration  26000 with  0 errors
iteration  27000 with  0 errors
iteration  28000 with  0 errors
iteration  29000 with  0 errors
iteration  30000 with  0 errors
iteration  31000 with
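
Inserting one document at a time costs a network round trip per review. A possible alternative, sketched here under assumptions, builds clean copies of the documents and sends them in batches with PyMongo's `insert_many`. The helper names `to_document`, `batched`, and `insert_in_batches`, and the batch size of 1000, are hypothetical choices and not part of the original notebook.

```python
from itertools import islice


def to_document(uuid, subdict):
    """Return an insertable copy: drops the PRAW object and sets _id."""
    doc = {k: v for k, v in subdict.items() if k != 'submission'}
    doc['_id'] = uuid
    return doc


def batched(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def insert_in_batches(collection, review_dict, size=1000):
    """Insert all reviews in batches and return the number inserted."""
    docs = (to_document(uuid, sub) for uuid, sub in review_dict.items())
    inserted = 0
    for batch in batched(docs, size):
        # ordered=False lets the server attempt every document in the batch
        # instead of stopping at the first failure.
        result = collection.insert_many(batch, ordered=False)
        inserted += len(result.inserted_ids)
    return inserted


# Usage in the notebook (assumes `submissions` and `review_dict` exist):
# insert_in_batches(submissions, review_dict)
```

Because `to_document` works on a copy, `review_dict` keeps its `submission` objects. If any document in a batch fails, `insert_many` raises `pymongo.errors.BulkWriteError`, so a production version would likely catch that and record the failures.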

In [14]:
len(review_dict.items())

100
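
The length of `review_dict` only describes the in-memory dictionary, not what actually reached the database. A minimal sketch of a cross-check is below; `verify_load` is a hypothetical helper, and it assumes the collection is small enough to pull every `_id` into memory.

```python
def verify_load(collection, expected_ids):
    """Compare the _id values stored in the collection with the expected ones."""
    expected = set(expected_ids)
    # Project only the _id field so full documents are not transferred.
    stored = {doc['_id'] for doc in collection.find({}, {'_id': 1})}
    return {
        'missing': sorted(expected - stored),      # expected but not stored
        'unexpected': sorted(stored - expected),   # stored but not expected
    }


# Usage in the notebook (assumes `submissions` and `review_dict` exist):
# report = verify_load(submissions, review_dict.keys())
```

An empty `missing` list means every key in the dictionary has a matching document in the collection.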