# Scraping Scrollers and Input-Required Sites


## Scrollers

Infinite scroll sites are designed for the mobile age. Links are hard to tap with a finger on a small screen, but a simple swipe scrolls the page down to reveal more data. That design can make scraping difficult, because the data is not all in the page's initial HTML; it loads in batches as you scroll. We’ll learn to find where that data actually comes from.

Here are a few examples of scrolling sites:


- <a href="https://the-internet.herokuapp.com/infinite_scroll">Demo technique here</a>
- <a href="https://www.gofundme.com/discover">GoFundMe</a>
- <a href="https://www.seethroughny.net/">NY State Expenditures, Pensions, Contracts</a>
- <a href="https://www.quintoandar.com.br/alugar/imovel/sao-paulo-sp-brasil">Rentals in São Paulo</a>

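Let's find the data source we'll need to scrape this <a href="https://quotes.toscrape.com/scroll">mockup site</a>. The quotes are not in the page's HTML; the page requests them in batches from a JSON endpoint, `https://quotes.toscrape.com/api/quotes?page=1`, which you can spot in the Network tab of your browser's developer tools as you scroll.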

In [1]:
## import usual suspects
import requests ## to capture content from web pages
import pandas as pd ## to easily export our data to dataframes/CSVs
from random import randrange ## to pick a random number of seconds to pause
import time ## to pause between requests
import json ## to work with JSON data

In [None]:
### create new cells here as needed

## scrape single page first

In [3]:
url = "https://quotes.toscrape.com/api/quotes?page=1"

In [4]:
response = requests.get(url)

In [10]:
content = response.json()
type(content)
content

{'has_next': True,
 'page': 1,
 'quotes': [{'author': {'goodreads_link': '/author/show/9810.Albert_Einstein',
    'name': 'Albert Einstein',
    'slug': 'Albert-Einstein'},
   'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
   'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'},
  {'author': {'goodreads_link': '/author/show/1077326.J_K_Rowling',
    'name': 'J.K. Rowling',
    'slug': 'J-K-Rowling'},
   'tags': ['abilities', 'choices'],
   'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”'},
  {'author': {'goodreads_link': '/author/show/9810.Albert_Einstein',
    'name': 'Albert Einstein',
    'slug': 'Albert-Einstein'},
   'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
   'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”'},
  {'author': {'goodreads_

In [11]:
content.get("has_next")

True

In [12]:
content.get("quotes")

[{'author': {'goodreads_link': '/author/show/9810.Albert_Einstein',
   'name': 'Albert Einstein',
   'slug': 'Albert-Einstein'},
  'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
  'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'},
 {'author': {'goodreads_link': '/author/show/1077326.J_K_Rowling',
   'name': 'J.K. Rowling',
   'slug': 'J-K-Rowling'},
  'tags': ['abilities', 'choices'],
  'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”'},
 {'author': {'goodreads_link': '/author/show/9810.Albert_Einstein',
   'name': 'Albert Einstein',
   'slug': 'Albert-Einstein'},
  'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
  'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”'},
 {'author': {'goodreads_link': '/author/show/1265.Jane_Austen',
   'name': 'Jane 

In [14]:
quotations = content.get("quotes")
quotations

[{'author': {'goodreads_link': '/author/show/9810.Albert_Einstein',
   'name': 'Albert Einstein',
   'slug': 'Albert-Einstein'},
  'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
  'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'},
 {'author': {'goodreads_link': '/author/show/1077326.J_K_Rowling',
   'name': 'J.K. Rowling',
   'slug': 'J-K-Rowling'},
  'tags': ['abilities', 'choices'],
  'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”'},
 {'author': {'goodreads_link': '/author/show/9810.Albert_Einstein',
   'name': 'Albert Einstein',
   'slug': 'Albert-Einstein'},
  'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
  'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”'},
 {'author': {'goodreads_link': '/author/show/1265.Jane_Austen',
   'name': 'Jane 

In [15]:
df = pd.DataFrame(quotations)
df

,author,tags,text
0,{'goodreads_link': '/author/show/9810.Albert_E...,"[change, deep-thoughts, thinking, world]",“The world as we have created it is a process ...
1,{'goodreads_link': '/author/show/1077326.J_K_R...,"[abilities, choices]","“It is our choices, Harry, that show what we t..."
2,{'goodreads_link': '/author/show/9810.Albert_E...,"[inspirational, life, live, miracle, miracles]",“There are only two ways to live your life. On...
3,{'goodreads_link': '/author/show/1265.Jane_Aus...,"[aliteracy, books, classic, humor]","“The person, be it gentleman or lady, who has ..."
4,{'goodreads_link': '/author/show/82952.Marilyn...,"[be-yourself, inspirational]","“Imperfection is beauty, madness is genius and..."
5,{'goodreads_link': '/author/show/9810.Albert_E...,"[adulthood, success, value]",“Try not to become a man of success. Rather be...
6,{'goodreads_link': '/author/show/7617.Andr_Gid...,"[life, love]",“It is better to be hated for what you are tha...
7,{'goodreads_link': '/author/show/3091287.Thoma...,"[edison, failure, inspirational, paraphrased]","“I have not failed. I've just found 10,000 way..."
8,{'goodreads_link': '/author/show/44566.Eleanor...,[misattributed-eleanor-roosevelt],“A woman is like a tea bag; you never know how...
9,{'goodreads_link': '/author/show/7103.Steve_Ma...,"[humor, obvious, simile]","“A day without sunshine is like, you know, nig..."


In [16]:
df = pd.json_normalize(quotations)
df

,tags,text,author.goodreads_link,author.name,author.slug
0,"[change, deep-thoughts, thinking, world]",“The world as we have created it is a process ...,/author/show/9810.Albert_Einstein,Albert Einstein,Albert-Einstein
1,"[abilities, choices]","“It is our choices, Harry, that show what we t...",/author/show/1077326.J_K_Rowling,J.K. Rowling,J-K-Rowling
2,"[inspirational, life, live, miracle, miracles]",“There are only two ways to live your life. On...,/author/show/9810.Albert_Einstein,Albert Einstein,Albert-Einstein
3,"[aliteracy, books, classic, humor]","“The person, be it gentleman or lady, who has ...",/author/show/1265.Jane_Austen,Jane Austen,Jane-Austen
4,"[be-yourself, inspirational]","“Imperfection is beauty, madness is genius and...",/author/show/82952.Marilyn_Monroe,Marilyn Monroe,Marilyn-Monroe
5,"[adulthood, success, value]",“Try not to become a man of success. Rather be...,/author/show/9810.Albert_Einstein,Albert Einstein,Albert-Einstein
6,"[life, love]",“It is better to be hated for what you are tha...,/author/show/7617.Andr_Gide,André Gide,Andre-Gide
7,"[edison, failure, inspirational, paraphrased]","“I have not failed. I've just found 10,000 way...",/author/show/3091287.Thomas_A_Edison,Thomas A. Edison,Thomas-A-Edison
8,[misattributed-eleanor-roosevelt],“A woman is like a tea bag; you never know how...,/author/show/44566.Eleanor_Roosevelt,Eleanor Roosevelt,Eleanor-Roosevelt
9,"[humor, obvious, simile]","“A day without sunshine is like, you know, nig...",/author/show/7103.Steve_Martin,Steve Martin,Steve-Martin
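
The difference between `pd.DataFrame` and `pd.json_normalize` is easiest to see on a tiny hand-made record. This is a minimal sketch, assuming a made-up `sample` list shaped like the API's `quotes` entries; the values are illustrative, not scraped.

```python
import pandas as pd

# Hypothetical records shaped like the API's "quotes" entries (values are made up).
sample = [
    {"author": {"name": "Author A", "slug": "Author-A"}, "tags": ["x", "y"], "text": "first"},
    {"author": {"name": "Author B", "slug": "Author-B"}, "tags": [], "text": "second"},
]

# pd.DataFrame keeps each nested dict as a single object in one column.
nested = pd.DataFrame(sample)

# pd.json_normalize expands nested dicts into dotted column names;
# lists (like "tags") are left untouched.
flat = pd.json_normalize(sample)

print(list(nested.columns))
print(list(flat.columns))
```

The dotted names (`author.name`, `author.slug`) are ordinary columns, so they can be filtered, sorted, or exported like any other.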


In [18]:
import time
from random import randrange

In [20]:
dfs_list = []
page_number = 1
valid = True
base_url = "https://quotes.toscrape.com/api/quotes?page="

while valid: ## keep looping until the API says there are no more pages
    link = f"{base_url}{page_number}"
    page_number += 1
    print(f"scraping {link}")
    response = requests.get(link)
    content = response.json()
    quotes = content.get("quotes") ## list of quote dictionaries on this page
    df = pd.json_normalize(quotes) ## flatten nested author info into columns
    dfs_list.append(df)
    snoozer = randrange(4,7) ## random pause of 4, 5 or 6 seconds
    print(f"snoozing for {snoozer} seconds before next scrape")
    time.sleep(snoozer)
    ## stop when has_next is False (or missing from the response)
    if not content.get("has_next"):
        valid = False

print(f"Done scraping {page_number - 1} pages!")

scraping https://quotes.toscrape.com/api/quotes?page=1
snoozing for 5 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=2
snoozing for 4 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=3
snoozing for 6 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=4
snoozing for 4 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=5
snoozing for 6 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=6
snoozing for 5 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=7
snoozing for 6 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=8
snoozing for 6 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=9
snoozing for 4 seconds before next scrape
scraping https://quotes.toscrape.com/api/quotes?page=10
snoozing for 5 seconds before next scrape
Done scraping 10 pages!
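
The same "keep going while `has_next` is true" logic can be separated from the network call, which makes it easy to try out offline. The sketch below is one possible arrangement, not the notebook's own method: `iter_pages`, `fetch_page`, and `fake_api` are hypothetical names, and the fake pages stand in for real responses.

```python
from typing import Callable, Iterator


def iter_pages(fetch_page: Callable[[int], dict], start: int = 1) -> Iterator[dict]:
    """Yield one payload per page until the API reports no further page."""
    page_number = start
    while True:
        payload = fetch_page(page_number)
        yield payload
        # A missing "has_next" key is treated as "no more pages".
        if not payload.get("has_next"):
            break
        page_number += 1


# Stand-in for the real API: three fake pages, so the sketch runs offline.
fake_api = {
    1: {"page": 1, "has_next": True, "quotes": [{"text": "a"}]},
    2: {"page": 2, "has_next": True, "quotes": [{"text": "b"}]},
    3: {"page": 3, "has_next": False, "quotes": [{"text": "c"}]},
}

pages = list(iter_pages(lambda n: fake_api[n]))
all_quotes = [quote for page in pages for quote in page["quotes"]]
print(len(pages), len(all_quotes))
```

Against the real site, `fetch_page` could be a small function that calls `requests.get` on the page URL, returns `response.json()`, and sleeps for a few seconds before returning, so the pause between requests is kept.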


In [21]:
all_quotes_df = pd.concat(dfs_list, ignore_index=True) ## combine all pages and store the result
all_quotes_df

,tags,text,author.goodreads_link,author.name,author.slug
0,"[change, deep-thoughts, thinking, world]",“The world as we have created it is a process ...,/author/show/9810.Albert_Einstein,Albert Einstein,Albert-Einstein
1,"[abilities, choices]","“It is our choices, Harry, that show what we t...",/author/show/1077326.J_K_Rowling,J.K. Rowling,J-K-Rowling
2,"[inspirational, life, live, miracle, miracles]",“There are only two ways to live your life. On...,/author/show/9810.Albert_Einstein,Albert Einstein,Albert-Einstein
3,"[aliteracy, books, classic, humor]","“The person, be it gentleman or lady, who has ...",/author/show/1265.Jane_Austen,Jane Austen,Jane-Austen
4,"[be-yourself, inspirational]","“Imperfection is beauty, madness is genius and...",/author/show/82952.Marilyn_Monroe,Marilyn Monroe,Marilyn-Monroe
...,...,...,...,...,...
95,[better-life-empathy],“You never really understand a person until yo...,/author/show/1825.Harper_Lee,Harper Lee,Harper-Lee
96,"[books, children, difficult, grown-ups, write,...",“You have to write the book that wants to be w...,/author/show/106.Madeleine_L_Engle,Madeleine L'Engle,Madeleine-LEngle
97,[truth],“Never tell the truth to people who are not wo...,/author/show/1244.Mark_Twain,Mark Twain,Mark-Twain
98,[inspirational],"“A person's a person, no matter how small.”",/author/show/61105.Dr_Seuss,Dr. Seuss,Dr-Seuss
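
Each per-page dataframe starts its own index at 0, which is why `ignore_index` matters when combining them. A minimal sketch with two made-up frames (`page_1` and `page_2` are hypothetical stand-ins for the entries of `dfs_list`):

```python
import pandas as pd

# Two small stand-in frames, each with its own 0-based index,
# like the per-page frames collected while scraping.
page_1 = pd.DataFrame({"text": ["a", "b"]})
page_2 = pd.DataFrame({"text": ["c", "d"]})

# Without ignore_index, the original labels are kept and repeat.
kept = pd.concat([page_1, page_2])

# With ignore_index=True, rows are renumbered from 0 to n-1.
renumbered = pd.concat([page_1, page_2], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

A repeated index is not an error, but it makes label-based lookups such as `.loc[0]` return several rows, so renumbering is usually the safer choice for scraped pages.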


In [None]:
# flattening lists within a dataframe
data = {
    'ID': ['A', 'B', 'C'],
    'items': [['apple', 'banana'], ['carrot', 'daikon'], ['eggplant']]
}

# create df
dfs = pd.DataFrame(data)
dfs
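
One way to finish this step, offered as a sketch rather than the notebook's own continuation, is pandas' `DataFrame.explode`, which gives every list element its own row. The name `flat` is introduced here for illustration.

```python
import pandas as pd

data = {
    "ID": ["A", "B", "C"],
    "items": [["apple", "banana"], ["carrot", "daikon"], ["eggplant"]],
}
dfs = pd.DataFrame(data)

# explode() puts each list element on its own row and repeats the other columns.
# ignore_index=True renumbers the rows instead of repeating the old index.
flat = dfs.explode("items", ignore_index=True)
print(flat)
```

The same call should apply to the `tags` column of the scraped quotes, producing one row per quote and tag pair.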