# Web Scrape Practice

## Import Dependencies

In [1]:
# Import scraping tools:
    # Splinter (Browser instance),
    # BeautifulSoup,
    # and the ChromeDriver installer (ChromeDriverManager)
from splinter import Browser
from bs4 import BeautifulSoup as soup
from webdriver_manager.chrome import ChromeDriverManager

## Set up Splinter

In [2]:
# Set up Splinter: install ChromeDriver, record its executable path, and initialize a browser
executable_path = {'executable_path': ChromeDriverManager().install()}
browser = Browser('chrome', **executable_path, headless=False) 
# ^ 'headless=False' means the browser's actions are displayed in a visible Chrome window

## Set up browser and BeautifulSoup

In [3]:
# We want to scrape the "Top Ten Tags" text from Quotes.ToScrape.com
# Visit the Quotes to Scrape site
url = 'http://quotes.toscrape.com/'
browser.visit(url)

In [4]:
# Parse the HTML using BeautifulSoup
html = browser.html
html_soup = soup(html, 'html.parser')
# Other parsers, such as 'lxml' or 'html5lib', can be used instead if they are installed

## Start Scraping

In [5]:
# Scrape the Title
title = html_soup.find('h2').text
title

'Top Ten tags'

In [6]:
# Scrape the top ten tags
tag_box = html_soup.find('div', class_='tags-box')
# tag_box
tags = tag_box.find_all('a', class_='tag')

# Use a for loop to print each tag
for tag in tags:
    word = tag.text
    print(word)

love
inspirational
life
humor
books
reading
friendship
friends
truth
simile
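
To reuse the tags rather than only print them, the same `find` / `find_all` calls can feed a list comprehension. The sketch below runs against a small inline HTML string so that it works without a browser. That string is a hypothetical stand-in for the site's markup, not a copy of it, and `sample_html`, `sample_soup`, and `tag_words` are illustrative names.

```python
from bs4 import BeautifulSoup as soup

# Hypothetical stand-in for the "Top Ten tags" box; the real page has more markup
sample_html = """
<div class="tags-box">
  <h2>Top Ten tags</h2>
  <span class="tag-item"><a class="tag" href="/tag/love/">love</a></span>
  <span class="tag-item"><a class="tag" href="/tag/life/">life</a></span>
</div>
"""

sample_soup = soup(sample_html, 'html.parser')

# find() returns the first match (or None); find_all() returns a list-like ResultSet
tag_box = sample_soup.find('div', class_='tags-box')
tag_words = [a.text.strip() for a in tag_box.find_all('a', class_='tag')]
print(tag_words)
```

Keeping the words in a list makes it easy to store or count them later instead of re-scraping the page.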


### Screenshot of HTML elements for Top Ten Tags Scrape
![TopTenTagsScraping.png](attachment:TopTenTagsScraping.png)

### Scraping from multiple webpages (same source)

In [7]:
# Use browser.visit(url) to tell Splinter which webpage to navigate to
url = 'http://quotes.toscrape.com/'
browser.visit(url)

In [8]:
# Create a for loop that does the following on each pass:
    # Create a BeautifulSoup object
    # Find all the quotes on the page
    # Print each quote from the page
    # Click the "Next" button at the bottom of the page
    # (range(1, 6) yields 1 through 5, so the loop scrapes the first 5 pages of the website.)
for x in range(1, 6):
    html = browser.html
    quote_soup = soup(html, 'html.parser')
    quotes = quote_soup.find_all('span', class_='text')
    for quote in quotes:
        print('page:', x, '----------')
        print(quote.text)
    browser.links.find_by_partial_text('Next').click()

page: 1 ----------
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
page: 1 ----------
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
page: 1 ----------
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
page: 1 ----------
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
page: 1 ----------
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
page: 1 ----------
“Try not to become a man of success. Rather become a man of value.”
page: 1 ----------
“It is better to be hated for what you are than to be loved for what you are not.”
page: 1 ----------
“I have not failed. I've just found 10,000 ways that won't work.”
page: 1 ----------
“A woman is like a tea bag; you never know how strong it is u
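
Because `range(1, 6)` stops before 6, the loop in this cell scrapes five pages. The sketch below checks that, and shows one possible way to print the page header once per page instead of once per quote. `fake_pages` is hypothetical placeholder data, not real site content, and `lines` is an illustrative name.

```python
# range(start, stop) excludes stop, so this yields five page numbers
page_numbers = list(range(1, 6))
print(page_numbers)  # [1, 2, 3, 4, 5]

# Hypothetical placeholder data: page number -> quotes found on that page
fake_pages = {x: [f"quote {x}-a", f"quote {x}-b"] for x in page_numbers}

lines = []
for x in page_numbers:
    lines.append(f"page: {x} ----------")  # header once per page
    lines.extend(fake_pages[x])            # then every quote on that page

print("\n".join(lines))
```

In the real loop, the same effect would come from moving the `print('page:', ...)` call above the inner `for quote in quotes:` loop.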

In [10]:
# 10.3.2 Skill Drill: visit 'Books.toScrape.com' and scrape the book URL list on the first page
    # Set up the URL
url2 = "http://Books.toScrape.com"
browser.visit(url2)

In [13]:
# 10.3.2 Skill Drill, continued: parse the HTML using BeautifulSoup
html = browser.html
html_soup = soup(html, 'html.parser')

In [36]:
# 10.3.2 Skill Drill, continued: scrape the book titles on the first page
title_box = html_soup.find('div', class_="col-sm-8 col-md-9")
# print(f"{title_box}\n")
# Pull the <h3> elements that wrap each book link
titles = title_box.find_all('h3')
# print(titles)
# Use a for loop to print each book's full title (stored in the link's 'title' attribute)
for title in titles:
    word = title.a['title']
    print(word)

A Light in the Attic
Tipping the Velvet
Soumission
Sharp Objects
Sapiens: A Brief History of Humankind
The Requiem Red
The Dirty Little Secrets of Getting Your Dream Job
The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull
The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics
The Black Maria
Starving Hearts (Triangular Trade Trilogy, #1)
Shakespeare's Sonnets
Set Me Free
Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)
Rip it Up and Start Again
Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991
Olio
Mesaerion: The Best Science Fiction Stories 1800-1849
Libertarianism for Beginners
It's Only the Himalayas
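
The drill asks for the book URL list, while the cell above prints titles. A minimal sketch of pulling the links follows, assuming each `<h3>` wraps an `<a>` whose `href` is relative to the site root. The inline `sample_html` and its paths are hypothetical stand-ins, not copied from the site, and `urljoin` from the standard library turns each relative link into an absolute one.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup as soup

base_url = 'http://books.toscrape.com/'

# Hypothetical markup shaped like a book listing; the paths are illustrative only
sample_html = """
<div class="col-sm-8 col-md-9">
  <h3><a href="catalogue/book-one_1/index.html" title="Book One">Book One</a></h3>
  <h3><a href="catalogue/book-two_2/index.html" title="Book Two">Book Two</a></h3>
</div>
"""

sample_soup = soup(sample_html, 'html.parser')

# Read each link's href and resolve it against the base URL
book_urls = [urljoin(base_url, h3.a['href']) for h3 in sample_soup.find_all('h3')]
for book_url in book_urls:
    print(book_url)
```

Against the live page, the same comprehension could run over the `titles` list from the cell above in place of `sample_soup.find_all('h3')`.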
