## Webscraping

Use BeautifulSoup to get quotes, authors, and tags from [Quotes to Scrape](http://quotes.toscrape.com/).

First, go to the site and inspect the page: look at what links there are and how the site is structured.

In [199]:
# import the necessary libraries
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd

1. Get the first author and the href for the author's page as a tuple from the [homepage](http://quotes.toscrape.com/)

In [193]:
# Make a get request to retrieve the page
html_page = requests.get('http://quotes.toscrape.com/') 
# Pass the page contents to beautiful soup for parsing
soup = BeautifulSoup(html_page.content, 'html.parser')

# Your code here


In [191]:
""" SOLUTION: data for one author """
# the first <small> tag on the page holds an author's name
author = soup.find('small')
# the link to the author's page is the first sibling tag after <small>
(author.text, author.find_next_siblings()[0].get('href'))

('Albert Einstein', '/author/Albert-Einstein')

2. Write a function to get **all** the authors and href links for the authors from the [homepage](http://quotes.toscrape.com/)


In [5]:
def authors(url):
    '''
    input: url
    
    return: a dictionary of authors and their urls
            {'author_1':'url_of_author_1', 'author_2':'url_of_author_2' ...}
    '''
    pass

In [196]:
""" SOLUTION: data for all the authors on a page """

def authors(url):
    # Make a get request to retrieve the page
    html_page = requests.get(url) 
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')
    authors = soup.find_all('small')
    author_dictionary = {}
    for author in authors:
        author_dictionary[author.text] = author.find_next_siblings()[0].get('href')
    return author_dictionary

In [197]:
# run this cell to test the function
print(authors('http://quotes.toscrape.com/'))
print(authors('http://quotes.toscrape.com/page/3'))

{'Albert Einstein': '/author/Albert-Einstein', 'J.K. Rowling': '/author/J-K-Rowling', 'Jane Austen': '/author/Jane-Austen', 'Marilyn Monroe': '/author/Marilyn-Monroe', 'André Gide': '/author/Andre-Gide', 'Thomas A. Edison': '/author/Thomas-A-Edison', 'Eleanor Roosevelt': '/author/Eleanor-Roosevelt', 'Steve Martin': '/author/Steve-Martin'}
{'Pablo Neruda': '/author/Pablo-Neruda', 'Ralph Waldo Emerson': '/author/Ralph-Waldo-Emerson', 'Mother Teresa': '/author/Mother-Teresa', 'Garrison Keillor': '/author/Garrison-Keillor', 'Jim Henson': '/author/Jim-Henson', 'Dr. Seuss': '/author/Dr-Seuss', 'Albert Einstein': '/author/Albert-Einstein', 'J.K. Rowling': '/author/J-K-Rowling', 'Bob Marley': '/author/Bob-Marley'}


3. Write a function to get all of the quotes from a page.

In [None]:
def get_some_quotes(url):
    '''
    input: url of the page to scrape
    
    return: a list of dictionaries of quotes with their attributes
            [{'quote':'quote_1_text', 'author':'name_of_author_1'}, 
            {'quote':'quote_2_text', 'author':'name_of_author_2', 'quote_tags':[list_of_quote_2_tags]}, ...]
    '''
    pass

In [183]:
""" SOLUTION: get_some_quotes """

def get_some_quotes(url):
    # Make a get request to retrieve the page
    html_page = requests.get(url) 
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')
        
    list_quotes = []
    for quote_block in soup.find_all(class_="quote"):
        # collect the text and the author name of each quote block
        quotes = {}
        quotes['quote'] = quote_block.find(class_="text").text
        quotes['author'] = quote_block.find(class_="author").text
        list_quotes.append(quotes)
    return list_quotes

In [185]:
for_mongo = get_some_quotes('http://quotes.toscrape.com/' )
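The docstring above also mentions a `quote_tags` list, which the solution does not collect. A minimal sketch of how the tags could be gathered is shown here. It assumes each tag is an `<a class="tag">` element inside the quote block; `parse_quotes_with_tags` and `sample_html` are hypothetical names, and the sample markup is a stand-in so the cell runs without a network request.

```python
from bs4 import BeautifulSoup

def parse_quotes_with_tags(html):
    """Return a list of dicts with the quote text, author name, and tags."""
    soup = BeautifulSoup(html, 'html.parser')
    parsed = []
    for block in soup.find_all(class_='quote'):
        parsed.append({
            'quote': block.find(class_='text').text,
            'author': block.find(class_='author').text,
            # assumption: every tag is an <a> element with class "tag"
            'quote_tags': [tag.text for tag in block.find_all('a', class_='tag')],
        })
    return parsed

# stand-in markup that mimics one quote block (not real site content)
sample_html = '''
<div class="quote">
  <span class="text">A sample quote.</span>
  <span>by <small class="author">Sample Author</small>
  <a href="/author/Sample-Author">(about)</a></span>
  <div class="tags">
    <a class="tag" href="/tag/sample/page/1/">sample</a>
    <a class="tag" href="/tag/test/page/1/">test</a>
  </div>
</div>
'''
sample_quotes = parse_quotes_with_tags(sample_html)
print(sample_quotes)
```

To run it against the live page, pass `requests.get(url).content` in place of `sample_html`.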

4. From the dictionary of authors, get each author's birthday and birthplace.


In [10]:
# hint: use the authors function to get the url for each author's page
def from_where():
    pass
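One possible approach to `from_where`, sketched under assumptions: the author page is taken to expose the birth date and birthplace in elements with the classes `author-born-date` and `author-born-location`. Check this against the page source before relying on it. `parse_birth_details` and the sample markup are hypothetical and only illustrate the parsing step.

```python
from bs4 import BeautifulSoup

def parse_birth_details(html):
    """Pull the birthday and birthplace out of an author page's HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # assumption: these class names match the author page markup
    born_date = soup.find(class_='author-born-date')
    born_place = soup.find(class_='author-born-location')
    return {
        'birthday': born_date.text.strip() if born_date else None,
        'birthplace': born_place.text.strip() if born_place else None,
    }

# stand-in markup that mimics an author page (not real site content)
sample_author_html = '''
<div class="author-details">
  <h3 class="author-title">Sample Author</h3>
  <p>Born: <span class="author-born-date">January 01, 1900</span>
  <span class="author-born-location">in Sample City, Sample Country</span></p>
</div>
'''
sample_details = parse_birth_details(sample_author_html)
print(sample_details)
```

For the real site, each href returned by `authors` would be joined to `http://quotes.toscrape.com`, fetched with `requests.get`, and the response's `.content` passed to this function.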



The list returned by `get_some_quotes` is what we will insert into MongoDB.


## NoSQL 

In [179]:
import pymongo

Now start a MongoDB server by running `mongod` in the terminal, then open a connection to it in order to **create**, **read**, **update**, and **delete** documents in the database.

1. Create a mongo database

In [187]:
myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
mydb = myclient['quote_database']

In [192]:
mycollection = mydb['quote_collection']

2. Add the quotes from `get_some_quotes` or use the json file 

In [189]:
results = mycollection.insert_many(for_mongo)

In [190]:
results.inserted_ids

[ObjectId('5d273fa7b454d4cdb483e88d'),
 ObjectId('5d273fa7b454d4cdb483e88e'),
 ObjectId('5d273fa7b454d4cdb483e88f'),
 ObjectId('5d273fa7b454d4cdb483e890'),
 ObjectId('5d273fa7b454d4cdb483e891'),
 ObjectId('5d273fa7b454d4cdb483e892'),
 ObjectId('5d273fa7b454d4cdb483e893'),
 ObjectId('5d273fa7b454d4cdb483e894'),
 ObjectId('5d273fa7b454d4cdb483e895'),
 ObjectId('5d273fa7b454d4cdb483e896')]

3. Query the database for all the quotes by `'Albert Einstein'`

In [205]:
query_1 = mycollection.find({})
for x in query_1:
    pass

In [204]:
""" SOLUTION: data for Albert Einstein quotes """

query_1 = mycollection.find({'author':'Albert Einstein'})
for x in query_1:
    print(x)

{'_id': ObjectId('5d273fa7b454d4cdb483e88d'), 'quote': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'author': 'Albert Einstein'}
{'_id': ObjectId('5d273fa7b454d4cdb483e88f'), 'quote': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'author': 'Albert Einstein'}
{'_id': ObjectId('5d273fa7b454d4cdb483e892'), 'quote': '“Try not to become a man of success. Rather become a man of value.”', 'author': 'Albert Einstein'}


4. Update the 1st quote with the tags.

In [198]:
first_quote_tags = {'quote_tags': ['change', 'deep-thoughts', 'thinking', 'world']}
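A minimal sketch of the update step, assuming "the 1st quote" means the first document inserted, so its `_id` is `results.inserted_ids[0]`. The helper name `add_tags_to_quote` is hypothetical. It wraps PyMongo's `update_one` with the `$set` operator, which adds the field if it is missing and overwrites it otherwise.

```python
def add_tags_to_quote(collection, quote_id, tags_document):
    """Add the fields in tags_document to the document with this _id."""
    # $set leaves the other fields (quote, author) unchanged
    return collection.update_one({'_id': quote_id}, {'$set': tags_document})

# usage with the objects defined earlier in this notebook:
# add_tags_to_quote(mycollection, results.inserted_ids[0], first_quote_tags)
```

Afterwards, `mycollection.find_one({'_id': results.inserted_ids[0]})` should show the new `quote_tags` field.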



5. Delete the third quote from the database.
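A possible sketch for the delete step, assuming "the third quote" means the third document inserted, identified by `results.inserted_ids[2]`. The helper name `delete_quote_by_id` is hypothetical; it relies on PyMongo's `delete_one`, which removes at most one matching document.

```python
def delete_quote_by_id(collection, quote_id):
    """Delete the single document whose _id matches quote_id."""
    return collection.delete_one({'_id': quote_id})

# usage with the objects defined earlier in this notebook:
# outcome = delete_quote_by_id(mycollection, results.inserted_ids[2])
# outcome.deleted_count  # number of documents removed
```

Matching on `_id` rather than on the quote text avoids deleting the wrong document if the same quote was inserted more than once.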