# Challenges

1. Complete the custom `quotes_parser()` function so that it returns the quote strings instead of the whole HTML page content.

2. In `IronhackSpider.scrape_url()`, catch any error that might occur when you make requests to scrape the webpage. This includes checking the response status code and catching HTTP request problems such as timeouts, SSL errors, and too many redirects.

3. In `IronhackSpider.kickstart()`, implement `sleep_interval`. Check whether `self.sleep_interval` is greater than 0. If so, make the script sleep for that many seconds before making the next request.

4. Change the `PAGES_TO_SCRAPE` value to 10. Check whether your code still works as intended by scraping the quotes on 10 webpages. If there are errors in your code, fix them.

5. Update the parameters passed to the `IronhackSpider` constructor so that your code can crawl [books.toscrape.com](http://books.toscrape.com/). You will need to use a different `URL_PATTERN` and write another parser function to be passed to `IronhackSpider`.

6. [Bonus 1] Use techniques such as randomized user agents and referers in your requests to reduce the likelihood that your spider is blocked by websites. [Here](http://blog.adnansiddiqi.me/5-strategies-to-write-unblock-able-web-scrapers-in-python/) is a great article to learn these techniques.

7. [Bonus 2] Implement asynchronous calls to `IronhackSpider`. You will make requests in parallel to complete your tasks faster.
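
For Bonus 1, a minimal sketch of randomized request headers might look like the following. The names `USER_AGENTS`, `REFERERS`, and `random_headers()` are hypothetical, and the header strings are only illustrative examples, not a vetted list.

```python
import random

# Hypothetical pools: these strings are illustrative, not a vetted list.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0',
]
REFERERS = [
    'https://www.google.com/',
    'https://www.bing.com/',
    'https://duckduckgo.com/',
]

def random_headers():
    """Return request headers with a randomly chosen User-Agent and Referer."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Referer': random.choice(REFERERS),
    }
```

The returned dictionary could then be passed to the request, for example `requests.get(url, headers=random_headers(), timeout=10)`.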

In [None]:
import requests
import time

class IronhackSpider:
    def __init__(self, url_pattern, pages_to_scrape=10, sleep_interval=-1, content_parser=None):
        self.url_pattern = url_pattern
        self.pages_to_scrape = pages_to_scrape
        self.sleep_interval = sleep_interval
        self.content_parser = content_parser
    
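    def scrape_url(self, url):
        result = None
        try:
            response = requests.get(url, timeout=10)
            content = self.get_response_content(response)
            if not content:
                # Non-200 status code (or empty body): nothing to parse
                print('Skipping %s (status code %s)' % (url, response.status_code))
            elif self.content_parser is not None:
                result = self.content_parser(content)
            else:
                result = content
        except requests.exceptions.Timeout:
            print('Request timed out: %s' % url)
        except requests.exceptions.SSLError:
            print('SSL error: %s' % url)
        except requests.exceptions.TooManyRedirects:
            print('Too many redirects: %s' % url)
        except requests.exceptions.RequestException as e:
            print('Request failed: %s (%s)' % (url, e))
        self.output_results(result)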
    
    def get_response_content(self, r):
        if r.status_code == 200:
            return r.content
        return False
        
    def output_results(self, r):
        print(r)
        
    def kickstart(self):
        for i in range(1, self.pages_to_scrape+1):
            self.scrape_url(self.url_pattern % i)
            if self.sleep_interval > 0:
                time.sleep(self.sleep_interval)
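
The loop in `kickstart()` issues requests one at a time. For Bonus 2, one possible approach is a thread pool from the standard library, sketched below. This uses threads rather than `asyncio`, and the helper name `scrape_concurrently()` and its `max_workers` parameter are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_concurrently(scrape_one, urls, max_workers=5):
    """Call scrape_one(url) for every URL using a pool of worker threads.

    Results are returned in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order even if calls finish out of order
        return list(executor.map(scrape_one, urls))
```

With a spider instance named `spider`, this could be called as `scrape_concurrently(spider.scrape_url, [spider.url_pattern % i for i in range(1, spider.pages_to_scrape + 1)])`. Because `scrape_url()` prints its results, output from different pages may appear out of order, and this sketch does not apply `sleep_interval` between requests.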


In [None]:
from bs4 import BeautifulSoup

URL_PATTERN = 'http://quotes.toscrape.com/page/%s/'

PAGES_TO_SCRAPE = 3

def quotes_parser(content):
    soup = BeautifulSoup(content, 'html.parser')
    quotes = soup.find_all('span', {'class':'text'})
    results = [quote.text for quote in quotes]
    return results

my_spider = IronhackSpider(URL_PATTERN, PAGES_TO_SCRAPE, content_parser=quotes_parser)

my_spider.kickstart()
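
For challenge 5, a parser for books.toscrape.com could be sketched as follows. The names `BOOKS_URL_PATTERN` and `books_parser()` are hypothetical, and both the URL pattern and the HTML structure (`article.product_pod`, `p.price_color`, the link's `title` attribute) are assumptions that should be checked against the live page before relying on them.

```python
from bs4 import BeautifulSoup

# Assumed URL pattern for the paginated catalogue; verify it in a browser.
BOOKS_URL_PATTERN = 'http://books.toscrape.com/catalogue/page-%s.html'

def books_parser(content):
    """Return a list of (title, price) tuples for the books on one page."""
    soup = BeautifulSoup(content, 'html.parser')
    results = []
    # Assumption: each book is wrapped in <article class="product_pod">
    for article in soup.find_all('article', {'class': 'product_pod'}):
        h3 = article.find('h3')
        link = h3.find('a') if h3 else None
        if link is None:
            continue  # skip entries that do not match the assumed layout
        # Assumption: the full title is in the link's title attribute
        title = link.get('title', link.text)
        price = article.find('p', {'class': 'price_color'})
        results.append((title, price.text if price else None))
    return results
```

The spider could then be created with `IronhackSpider(BOOKS_URL_PATTERN, PAGES_TO_SCRAPE, content_parser=books_parser)` and started with `kickstart()` as before.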