## Quotes

### Section 1: Web Scraping

Use BeautifulSoup to get quotes, authors, and tags from [Quotes to Scrape](http://quotes.toscrape.com/).

First, go to the site and inspect the page: look at which links it contains and how the site as a whole is structured.
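
One way to start that inspection from Python is to list every link on a page and group it by where it points. The sketch below runs on a small hard-coded fragment that is only modeled on the homepage markup (an assumption; confirm the real tags and classes with your browser's inspector). `SAMPLE_HTML` and `group_links` are hypothetical names, and the tag links in the sample are placeholders.

```python
from collections import defaultdict
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Illustrative fragment modeled on quotes.toscrape.com; not copied from the live page
SAMPLE_HTML = """
<h1><a href="/">Quotes to Scrape</a></h1>
<div class="quote">
  <span class="text">“Try not to become a man of success. Rather become a man of value.”</span>
  <span>by <small class="author">Albert Einstein</small>
  <a href="/author/Albert-Einstein">(about)</a></span>
  <div class="tags">Tags:
    <a class="tag" href="/tag/tag-one/page/1/">tag-one</a>
    <a class="tag" href="/tag/tag-two/page/1/">tag-two</a>
  </div>
</div>
<nav><ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul></nav>
<footer><a href="https://example.com/">External site</a></footer>
"""


def group_links(html):
    """Group every href on a page: external links by domain, internal ones by first path segment."""
    soup = BeautifulSoup(html, 'html.parser')
    groups = defaultdict(list)
    for link in soup.find_all('a', href=True):
        href = link['href']
        parsed = urlparse(href)
        if parsed.netloc:
            section = parsed.netloc  # link to another site
        else:
            # '/author/Albert-Einstein' -> 'author', '/page/2/' -> 'page', '/' -> '(root)'
            section = parsed.path.strip('/').split('/')[0] or '(root)'
        groups[section].append(href)
    return dict(groups)


for section, hrefs in group_links(SAMPLE_HTML).items():
    print(section, hrefs)
```

On the real homepage, `group_links(requests.get('http://quotes.toscrape.com/').text)` should show the same kinds of groups: author pages, tag pages, and the pager.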

In [226]:
# import the necessary libraries
import json

from bs4 import BeautifulSoup
import requests
import pymongo

1. Get the first author and the href for the author's page as a tuple from the [homepage](http://quotes.toscrape.com/).

In [227]:
# Make a get request to retrieve the page
html_page = requests.get('http://quotes.toscrape.com/') 
# Pass the page contents to beautiful soup for parsing
soup = BeautifulSoup(html_page.content, 'html.parser')

# Your code here


In [228]:
""" SOLUTION: data for one author """
author = soup.find('small')
# the link to the author's page is the <a> tag that follows the <small> tag
(author.text, author.find_next_sibling('a').get('href'))

('Albert Einstein', '/author/Albert-Einstein')
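
The href comes back as a root-relative path. If you need a full URL (for example, to request the author's page), the standard library's `urllib.parse.urljoin` can resolve it against the page it came from. `BASE_URL` and `absolute_url` are illustrative names, not part of the exercise.

```python
from urllib.parse import urljoin

BASE_URL = 'http://quotes.toscrape.com/'


def absolute_url(href, page_url=BASE_URL):
    # A path starting with '/' replaces the whole path of page_url,
    # so this works whether the href came from the homepage or from /page/3/
    return urljoin(page_url, href)


print(absolute_url('/author/Albert-Einstein'))
print(absolute_url('/author/Bob-Marley', 'http://quotes.toscrape.com/page/3/'))
```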

2. Write a function to get **all** the authors and the hrefs of their pages from the [homepage](http://quotes.toscrape.com/).


In [229]:
def authors(url):
    '''
    input: url
    
    return: a dictionary of authors and their urls
            {'author_1':'url_of_author_1', 'author_2':'url_of_author_2' ...}
    '''
    pass

In [230]:
""" SOLUTION: data for all the authors on a page """

def authors(url):
    # Make a get request to retrieve the page
    html_page = requests.get(url)
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')
    # each author's name is in a <small> tag; the link to their page is the <a> right after it
    author_tags = soup.find_all('small')
    author_dictionary = {}
    for author in author_tags:
        author_dictionary[author.text] = author.find_next_sibling('a').get('href')
    return author_dictionary

In [233]:
# run this cell to test the function
print(authors('http://quotes.toscrape.com/'))
print('\n')
print(authors('http://quotes.toscrape.com/page/3'))

{'Albert Einstein': '/author/Albert-Einstein', 'J.K. Rowling': '/author/J-K-Rowling', 'Jane Austen': '/author/Jane-Austen', 'Marilyn Monroe': '/author/Marilyn-Monroe', 'André Gide': '/author/Andre-Gide', 'Thomas A. Edison': '/author/Thomas-A-Edison', 'Eleanor Roosevelt': '/author/Eleanor-Roosevelt', 'Steve Martin': '/author/Steve-Martin'}


{'Pablo Neruda': '/author/Pablo-Neruda', 'Ralph Waldo Emerson': '/author/Ralph-Waldo-Emerson', 'Mother Teresa': '/author/Mother-Teresa', 'Garrison Keillor': '/author/Garrison-Keillor', 'Jim Henson': '/author/Jim-Henson', 'Dr. Seuss': '/author/Dr-Seuss', 'Albert Einstein': '/author/Albert-Einstein', 'J.K. Rowling': '/author/J-K-Rowling', 'Bob Marley': '/author/Bob-Marley'}


3. Get the first author on each of the first 5 pages of quotes. The **Next** button at the bottom of each page links to the following page.


In [234]:
# Your code here


In [236]:
""" SOLUTION: first author on each of the first 5 pages """

for i in range(1, 6):
    html_page = requests.get(f'http://quotes.toscrape.com/page/{i}/')
    soup = BeautifulSoup(html_page.content, 'html.parser')
    author = soup.find('small')
    print(author.text)

Albert Einstein
Marilyn Monroe
Pablo Neruda
Dr. Seuss
George R.R. Martin
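
The loop above hardcodes the `/page/{i}/` URLs. A sketch that follows the Next button instead, as the task suggests, is below. It assumes the button is an `<a>` inside `<li class="next">` (confirm with the inspector) and takes the page-fetching function as a parameter so it can be tried offline. `first_authors` and `FAKE_SITE` are hypothetical names, and the fake pages only stand in for the live site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def first_authors(start_url, fetch, max_pages=5):
    """Yield the first author on each page, following the Next link.

    fetch is any callable that takes a URL and returns the page HTML,
    for example: lambda url: requests.get(url).text
    """
    url = start_url
    for _ in range(max_pages):
        soup = BeautifulSoup(fetch(url), 'html.parser')
        author = soup.find('small', class_='author')
        if author is not None:
            yield author.text
        # Assumed pager markup: <li class="next"><a href="/page/2/">Next</a></li>
        next_item = soup.find('li', class_='next')
        if next_item is None or next_item.a is None:
            break  # last page: no Next button
        url = urljoin(url, next_item.a['href'])


# Offline demo: two fake pages keyed by URL
FAKE_SITE = {
    'http://quotes.toscrape.com/': (
        '<small class="author">Author on page 1</small>'
        '<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>'
    ),
    'http://quotes.toscrape.com/page/2/': '<small class="author">Author on page 2</small>',
}

print(list(first_authors('http://quotes.toscrape.com/', FAKE_SITE.__getitem__)))
```

Against the live site, `first_authors('http://quotes.toscrape.com/', lambda url: requests.get(url).text)` should yield the same five names as the hardcoded loop, and it stops early if a page has no Next button.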


4. Write a function to get all of the quotes from a page.

In [237]:
def get_some_quotes(url):
    '''
    input: url of a single page of quotes

    return: a list of dictionaries, one per quote on the page
            [{'quote':'quote_1_text', 'author':'author_1_name'},
             {'quote':'quote_2_text', 'author':'author_2_name'}, ...]
            each dictionary may also carry a 'quote_tags' list
    '''
    pass

In [238]:
""" SOLUTION: get_some_quotes """

def get_some_quotes(url):
    # Make a get request to retrieve the page
    html_page = requests.get(url)
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')

    list_quotes = []
    # each quote lives in its own <div class="quote"> block
    for quote_div in soup.find_all(class_="quote"):
        list_quotes.append({
            'quote': quote_div.find(class_="text").text,
            'author': quote_div.find(class_="author").text,
        })
    return list_quotes
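
The task statement at the top of this section also asks for tags, and the docstring mentions a `quote_tags` list. A sketch that adds them is below, written as a parsing function that takes HTML so it can be tried without a network request. It assumes each tag is an `<a class="tag">` inside the quote block; `quotes_from_html` and the sample markup are illustrative, not part of the exercise.

```python
from bs4 import BeautifulSoup


def quotes_from_html(html):
    """Parse one page of quotes, including each quote's tags."""
    soup = BeautifulSoup(html, 'html.parser')
    list_quotes = []
    for quote_div in soup.find_all(class_='quote'):
        list_quotes.append({
            'quote': quote_div.find(class_='text').text,
            'author': quote_div.find(class_='author').text,
            # assumed markup: tag links are <a class="tag"> inside the quote block
            'quote_tags': [a.text for a in quote_div.find_all('a', class_='tag')],
        })
    return list_quotes


# Placeholder markup: one quote with two tags, one with none
SAMPLE_HTML = """
<div class="quote">
  <span class="text">“An example quote.”</span>
  <span>by <small class="author">Example Author</small></span>
  <div class="tags">Tags:
    <a class="tag" href="/tag/tag-one/page/1/">tag-one</a>
    <a class="tag" href="/tag/tag-two/page/1/">tag-two</a>
  </div>
</div>
<div class="quote">
  <span class="text">“Another example.”</span>
  <span>by <small class="author">Second Author</small></span>
  <div class="tags">Tags: </div>
</div>
"""

print(quotes_from_html(SAMPLE_HTML))
```

With the live homepage this would be called as `quotes_from_html(requests.get('http://quotes.toscrape.com/').text)`.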

In [241]:
# store the function's output in a variable to use later
quotes_for_mongo = get_some_quotes('http://quotes.toscrape.com/')
quotes_for_mongo

[{'quote': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
  'author': 'Albert Einstein'},
 {'quote': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
  'author': 'J.K. Rowling'},
 {'quote': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
  'author': 'Albert Einstein'},
 {'quote': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”',
  'author': 'Jane Austen'},
 {'quote': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",
  'author': 'Marilyn Monroe'},
 {'quote': '“Try not to become a man of success. Rather become a man of value.”',
  'author': 'Albert Einstein'},
 {'quote': '“It is better to be hated for what you are than to be loved for what you are not.”',
  'author': 'And

### Section 2: NoSQL 

For this section, start a local MongoDB server in the terminal with `mongod`. You will **create**, **update**, and **read** documents in a MongoDB database.

Connect to the local MongoDB server and select a database and a collection. MongoDB only creates them when the first document is inserted.

In [294]:
myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
mydb = myclient['quote_database']

In [295]:
mycollection = mydb['quote_collection']

1. Add the quotes returned by `get_some_quotes` for the [homepage](http://quotes.toscrape.com/), or use the JSON file `quotes.json` instead. To verify the insert, look at the `_id`s of the new documents in the `results` variable.

In [317]:
# if not using the get_some_quotes function, read in the JSON file and store it in the variable data

with open("data/quotes.json", "r") as f:
    data = json.load(f)

In [315]:
# results will hold the result returned by insert_many
results = None

In [297]:
""" SOLUTION: add the data to the database """

### add the data from the JSON file
results = mycollection.insert_many(data)

### add the data from the get_some_quotes function
# results = mycollection.insert_many(quotes_for_mongo)

# check they are in the database
results.inserted_ids
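
Running the insert cell twice with the same list tends to fail: `insert_many` adds an `_id` field to each dictionary it is given, so the second run tries to reuse those ids and MongoDB rejects them as duplicate keys. One workaround, sketched below, is to pass fresh copies without `_id`, dropping repeated quotes along the way. `fresh_documents` is a hypothetical helper, and treating quote text plus author as the duplicate key is an assumption.

```python
import copy


def fresh_documents(quotes):
    """Return deep copies of the quote dicts without '_id', skipping repeats.

    Two entries count as duplicates here if they share quote text and author.
    The input list is left untouched.
    """
    seen = set()
    docs = []
    for q in quotes:
        key = (q['quote'], q['author'])
        if key in seen:
            continue  # same quote and author already queued
        seen.add(key)
        doc = copy.deepcopy(q)
        doc.pop('_id', None)  # let MongoDB assign a new id
        docs.append(doc)
    return docs


sample = [
    {'quote': 'q1', 'author': 'a', '_id': 1},
    {'quote': 'q1', 'author': 'a'},
    {'quote': 'q2', 'author': 'b'},
]
print(fresh_documents(sample))
```

It would be used as `results = mycollection.insert_many(fresh_documents(data))`. This does not stop the same quotes from landing in a collection that already holds them from an earlier run; clearing it first with `mycollection.delete_many({})` is one option.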

2. Query the database for all the quotes by `'Albert Einstein'`.

In [307]:
q1 = None

In [313]:
""" SOLUTION: data for Albert Einstein quotes """

q1 = mycollection.find({'author':'Albert Einstein'})
for x in q1:
    print(x)

{'_id': ObjectId('5d278cb2b454d4cdb483e8b0'), 'quote': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'author': 'Albert Einstein'}
{'_id': ObjectId('5d278cb2b454d4cdb483e8b2'), 'quote': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'author': 'Albert Einstein'}
{'_id': ObjectId('5d278cb2b454d4cdb483e8b5'), 'quote': '“Try not to become a man of success. Rather become a man of value.”', 'author': 'Albert Einstein'}
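
To check that the query returned everything it should, one option is to count quotes per author in the local data and compare with the database. On the database side, PyMongo provides `count_documents` (PyMongo 3.7 and later), e.g. `mycollection.count_documents({'author': 'Albert Einstein'})`. The local side is sketched below with a hypothetical `quotes_per_author` helper and made-up sample rows.

```python
from collections import Counter


def quotes_per_author(quotes):
    """Count how many quotes each author has in a local list of quote dicts."""
    return Counter(q['author'] for q in quotes)


sample = [
    {'quote': 'q1', 'author': 'Albert Einstein'},
    {'quote': 'q2', 'author': 'J.K. Rowling'},
    {'quote': 'q3', 'author': 'Albert Einstein'},
]
print(quotes_per_author(sample)['Albert Einstein'])  # 2
```

`quotes_per_author(data)['Albert Einstein']` on the list you inserted gives the number the database count should match; a larger database count usually means the insert cell ran more than once.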


3. Update Steve Martin's quote with the tags stored in the variable `steve_martin_tags`.

In [306]:
steve_martin_tags = {'quote_tags': ['change', 'deep-thoughts', 'thinking', 'world']}
update_steve = None
first_quote_tags = None


In [316]:
""" SOLUTION: data for Steve Martin tags """

update_steve = {'author': 'Steve Martin'}
steve_quote_tags = {'$set': steve_martin_tags}

mycollection.update_one(update_steve, steve_quote_tags)

<pymongo.results.UpdateResult at 0x120852b88>
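
`update_one` changes only the first document that matches the filter, and the `UpdateResult` it returns has `matched_count` and `modified_count` attributes that say more than the default repr above. As for what `$set` does to the matched document, the pure-Python function below is a simplified model for intuition only, not how MongoDB implements it: it handles top-level field names and ignores dotted paths. `apply_set` is a hypothetical name.

```python
def apply_set(document, update):
    """Simplified model of a top-level {'$set': {...}} update.

    Fields named in $set are added or overwritten; all other fields are kept.
    Dotted paths such as 'a.b' are NOT handled here, unlike real MongoDB.
    """
    updated = dict(document)  # work on a copy; the input stays unchanged
    updated.update(update['$set'])
    return updated


steve = {'quote': '“A day without sunshine is like, you know, night.”',
         'author': 'Steve Martin'}
print(apply_set(steve, {'$set': {'quote_tags': ['change', 'deep-thoughts', 'thinking', 'world']}}))
```

With a live server, `result = mycollection.update_one(update_steve, steve_quote_tags)` followed by `print(result.matched_count, result.modified_count)` should report one match, and one modification only if the tags were not already set.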

4. Query the database to confirm that the `'Steve Martin'` document has been updated with `steve_martin_tags`.

In [309]:
q2 = None

In [311]:
""" SOLUTION: data for Steve Martin tags query """

q2 = mycollection.find({'author': 'Steve Martin'})
for item in q2:
    print(item)

{'_id': ObjectId('5d278cb2b454d4cdb483e8b9'), 'quote': '“A day without sunshine is like, you know, night.”', 'author': 'Steve Martin', 'quote_tags': ['change', 'deep-thoughts', 'thinking', 'world']}
