## Summary

Sometimes, a simple requests.get isn't enough. The site's robots.txt may disallow automated access, or the site may reject our request with something more sophisticated, like anti-scraping or anti-DDoS techniques. In other cases, the page may render much of its content dynamically with JavaScript, so the HTML we download doesn't contain the text we would normally parse with BeautifulSoup. Either way, our ability to collect the text we're looking for is hampered, and we'll have to turn to more advanced methods.
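Before reaching for heavier tools, it can help to check what a site's robots.txt actually permits. The following is a minimal sketch using the standard library's urllib.robotparser. The robots.txt content, the "my-scraper" user agent, and the example.com URLs are hypothetical and inlined so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "my-scraper", "https://example.com/courses"))       # True
print(is_allowed(ROBOTS_TXT, "my-scraper", "https://example.com/private/data"))  # False
```

Against a live site, we would instead call parser.set_url() with the address of the site's robots.txt, followed by parser.read(), before calling can_fetch().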

### Scrapy and Selenium
When requests is insufficient, two Python libraries are especially useful for web scraping: [Selenium](https://www.selenium.dev/) and [Scrapy](https://scrapy.org/). Other options, like automated scraping platforms, do exist, but these can get quite expensive and may lack the customization one might want.

#### Selenium
Selenium is a browser automation framework. When we use Selenium, we are actually browsing the site using a real browser. Note that this browser will often be "headless", meaning it runs without a visible window, so nothing is drawn on the screen.

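    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    # Configure Chrome to run without opening a visible window.
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1920,1080")

    # Selenium 4 takes the chromedriver path through a Service object.
    service = Service(executable_path='/path/to/chromedriver')
    driver = webdriver.Chrome(service=service, options=options)
    driver.get("https://www.udacity.com/")
    page_source = driver.page_source
    with open("udacity_home.html", "w", encoding="utf-8") as f:
        f.write(page_source)
    driver.quit()

The above code is a simple Selenium example, written for Selenium 4.

First, we import the webdriver, the Options object, and the Service object.

Then, we create an Options object and pass the --headless=new argument so that Selenium does not open a visible browser window for us. Older Selenium releases set this with options.headless = True, which has since been deprecated and removed, and older Chrome builds expect plain --headless. This option is especially important if we're scraping many pages, and even more so if we're on a resource-constrained system!

We can also use options.add_argument() to pass additional arguments to the browser, like a window size that makes the session look more like a real desktop browser.

We initialize the Chrome webdriver with the following code:

    service = Service(executable_path='/path/to/chromedriver')
    driver = webdriver.Chrome(service=service, options=options)

This creates the actual object that navigates to the page for us. We need only specify the options we're interested in, plus the path to the Chrome driver on our machine, which current Selenium versions accept through the Service object rather than an executable_path argument to webdriver.Chrome.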

Using driver.get(), we tell the browser to load the page in question, much as we do with requests.get. However, in this case, the site sees a "normal" web browser originating the request and is less likely to drop it, and the browser also runs the page's JavaScript. Once the page has loaded, its source, where all the text we're interested in is located, is available as a string in the driver.page_source attribute.

We can then use that string the same way we use the .text attribute of a response from requests, for example by passing it to BeautifulSoup.
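As a rough illustration of that hand-off, the sketch below parses an HTML string with BeautifulSoup. The markup, the "course" class name, and the extract_links helper are hypothetical stand-ins. In practice, the string would come from driver.page_source.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source (or response.text); the markup is made up.
page_source = """
<html><body>
  <h1>Catalog</h1>
  <a class="course" href="/course/a">Intro to Python</a>
  <a class="course" href="/course/b">Data Wrangling</a>
</body></html>
"""


def extract_links(html: str) -> list:
    """Return (link text, href) pairs for every <a class="course"> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("a.course")]


courses = extract_links(page_source)
print(courses)
```

Because page_source is already a decoded string, no extra decoding step is needed before parsing.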

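Finally, it's good practice to call driver.quit() when we're done. This closes the browser and shuts down the driver process, so neither keeps running and consuming memory after our script finishes.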
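One way to make sure driver.quit() always runs, even when loading the page raises an exception, is to wrap the work in try/finally. The helper below is a sketch rather than part of Selenium: fetch_page_source and its make_driver parameter are hypothetical names, and the function works with any object exposing get(), page_source, and quit().

```python
from typing import Callable


def fetch_page_source(make_driver: Callable[[], object], url: str) -> str:
    """Create a driver, load url, and return the page source.

    make_driver is any zero-argument callable that returns a WebDriver-like
    object, e.g. lambda: webdriver.Chrome(service=service, options=options).
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Runs whether get() succeeded or raised, so no browser is left open.
        driver.quit()
```

Passing the driver in as a factory keeps the clean-up logic in one place, and it also lets us exercise the helper with a stand-in object instead of a real browser.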

#### Scrapy
Scrapy is a framework for building web spiders that can be run either locally or in the [Zyte cloud](https://www.zyte.com/scrapy-cloud/). Because it is a scraping framework rather than a headless browser, Scrapy does not pay the cost of running a full browser for every page, which makes it highly performant and scalable. When you have to scrape large amounts of data from the internet, Scrapy is therefore a fantastic option. We recommend following the [Scrapy tutorial](https://docs.scrapy.org/en/latest/intro/tutorial.html) to learn how to use it.
