# Scraping Scrollers and Input-Required Sites


## Scrollers

Infinite scroll sites are designed for the mobile age. Links are hard to tap with a finger on a small device, but a simple swipe easily scrolls the page down to reveal more data. That same design can make scraping an infinite scroll page difficult, because the page's initial HTML doesn't hold all the data: more is loaded in the background as you scroll. We'll learn to find the actual location of the data buried in the scrolls.

Here are a few examples of scrolling sites:


- <a href="https://the-internet.herokuapp.com/infinite_scroll">Demo technique here</a>
- <a href="https://www.gofundme.com/discover">GoFundMe</a>
- <a href="https://www.seethroughny.net/">NY State Expenditures, Pensions, Contracts</a>
- <a href="https://www.quintoandar.com.br/alugar/imovel/sao-paulo-sp-brasil">Rentals in São Paulo</a>

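Let's find the data source we'll need to scrape this <a href="https://quotes.toscrape.com/scroll">mockup site</a>. With your browser's developer tools open to the Network tab (filtered to XHR/Fetch requests), scroll down the page: each new batch of quotes arrives as JSON from `https://quotes.toscrape.com/api/quotes?page=` plus a page number. We'll request that URL directly instead of the scroll page.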
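Each batch of quotes the scroll page loads comes from the same JSON endpoint with a different `page` number (the cells below request page 2). As a minimal sketch, assuming `page` is the only query parameter the endpoint needs, a small helper can build the URL for any page. The name `build_page_url` is hypothetical, not part of any library.

```python
from urllib.parse import urlencode

API_BASE = "https://quotes.toscrape.com/api/quotes"  # endpoint used in this notebook


def build_page_url(page: int, base: str = API_BASE) -> str:
    """Return the API URL for one 'scroll' worth of quotes (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # urlencode handles escaping if more query parameters are added later
    return f"{base}?{urlencode({'page': page})}"


print(build_page_url(2))  # https://quotes.toscrape.com/api/quotes?page=2
```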

In [19]:
## import usual suspects
import requests ## to capture content from web pages
import pandas as pd ## to easily export our data to dataframes/CSVs
from random import randrange ## to pick a random whole number (for random pauses)
import time ## to pause between requests with time.sleep()
import json ## to work with JSON data

In [3]:
##url to scrape
url = "https://quotes.toscrape.com/api/quotes?page=2"

In [None]:
##method 1, json loads

In [4]:
##get response
response = requests.get(url)
response.text

'{"has_next":true,"page":2,"quotes":[{"author":{"goodreads_link":"/author/show/82952.Marilyn_Monroe","name":"Marilyn Monroe","slug":"Marilyn-Monroe"},"tags":["friends","heartbreak","inspirational","life","love","sisters"],"text":"\\u201cThis life is what you make it. No matter what, you\'re going to mess up sometimes, it\'s a universal truth. But the good part is you get to decide how you\'re going to mess it up. Girls will be your friends - they\'ll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they\'re your true best friends. Don\'t let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they\'ll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can\'t give up because if you give up, you\'ll never find your soulmate. You\'ll never find that half who makes you whole and that goes for everything. Just bec

In [5]:
##type
type(response.text)

str

In [6]:
##json loads
json.loads(response.text)

{'has_next': True,
 'page': 2,
 'quotes': [{'author': {'goodreads_link': '/author/show/82952.Marilyn_Monroe',
    'name': 'Marilyn Monroe',
    'slug': 'Marilyn-Monroe'},
   'tags': ['friends',
    'heartbreak',
    'inspirational',
    'life',
    'love',
    'sisters'],
   'text': "“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole
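`json.loads` turns JSON text (a `str`) into ordinary Python dictionaries and lists. To see that without hitting the site, here is a sketch on a hand-made string. The payload is a trimmed, made-up sample that only mimics the top-level keys the API returns (`has_next`, `page`, `quotes`); it is not real site data.

```python
import json

# Trimmed, made-up sample shaped like the API response (not real site data)
raw_text = (
    '{"has_next": true, "page": 2, '
    '"quotes": [{"author": {"name": "Example Author", "slug": "Example-Author"}, '
    '"tags": ["life"], "text": "An example quote."}]}'
)

parsed = json.loads(raw_text)  # JSON text -> Python objects

print(type(raw_text).__name__)  # str
print(type(parsed).__name__)    # dict
print(parsed["has_next"])       # True (JSON true becomes Python True)
print(parsed["quotes"][0]["author"]["name"])  # Example Author
```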

In [7]:
##method 2, use requests for json
response.json()

{'has_next': True,
 'page': 2,
 'quotes': [{'author': {'goodreads_link': '/author/show/82952.Marilyn_Monroe',
    'name': 'Marilyn Monroe',
    'slug': 'Marilyn-Monroe'},
   'tags': ['friends',
    'heartbreak',
    'inspirational',
    'life',
    'love',
    'sisters'],
   'text': "“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole

In [8]:
##save the parsed JSON into a variable
content = response.json()
content

{'has_next': True,
 'page': 2,
 'quotes': [{'author': {'goodreads_link': '/author/show/82952.Marilyn_Monroe',
    'name': 'Marilyn Monroe',
    'slug': 'Marilyn-Monroe'},
   'tags': ['friends',
    'heartbreak',
    'inspirational',
    'life',
    'love',
    'sisters'],
   'text': "“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole

In [20]:
content.get("has_next")

True
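`.get()` looks up a key in a dictionary. Unlike square brackets, it returns `None` (or a default you supply) when the key is missing instead of raising a `KeyError`, which helps when an API leaves a field out of some responses. A small sketch on a made-up dictionary:

```python
page_info = {"has_next": True, "page": 2}  # made-up sample

print(page_info.get("has_next"))  # True
print(page_info.get("total"))     # None, the key is missing
print(page_info.get("total", 0))  # 0, our fallback default

try:
    page_info["total"]            # square brackets are strict
except KeyError as err:
    print(f"KeyError: {err}")     # KeyError: 'total'
```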

In [9]:
## run this cell: a list of dictionaries, shaped like our quotes list
animals = [{"rank": 1, 'animal': 'Blue whale', 'weight': 136000, 'animal_type': 'Marine'},
 {"rank": 2, 'animal': 'Bowhead whale', 'weight': 100000, 'animal_type': 'Marine'},
 {"rank": 3, 'animal': 'Fin whale', 'weight': 70000, 'animal_type': 'Marine'},
 {"rank": 4, 'animal': 'Southern right whale', 'weight': 45000, 'animal_type': 'Marine'},
 {"rank": 5, 'animal': 'Humpback whale', 'weight': 30000, 'animal_type': 'Marine'},
 {"rank": 6, 'animal': 'Gray whale', 'weight': 28500, 'animal_type': 'Marine'},
 {"rank": 7, 'animal': 'Northern right whale', 'weight': 23000, 'animal_type': 'Marine'},
 {"rank": 8, 'animal': 'Sei whale', 'weight': 20000, 'animal_type': 'Marine'},
 {"rank": 9, 'animal': "Bryde's whale", 'weight': 16000, 'animal_type': 'Marine'},
 {"rank": 10,'animal': "Baird's beaked whale", 'weight': 11380, 'animal_type': 'Marine'}]

In [10]:
animals[0].get("animal")

'Blue whale'
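The `quotes` value has the same shape as `animals`: a list of dictionaries. A list comprehension can pull one field out of every dictionary, and `pd.DataFrame` turns the whole list into rows and columns. A sketch using the first three whales from the list above:

```python
import pandas as pd

# First three entries of the `animals` list above
whales = [
    {"rank": 1, "animal": "Blue whale", "weight": 136000, "animal_type": "Marine"},
    {"rank": 2, "animal": "Bowhead whale", "weight": 100000, "animal_type": "Marine"},
    {"rank": 3, "animal": "Fin whale", "weight": 70000, "animal_type": "Marine"},
]

# One value per dictionary; .get() gives None if a dictionary lacks the key
names = [whale.get("animal") for whale in whales]
print(names)  # ['Blue whale', 'Bowhead whale', 'Fin whale']

# Each dictionary becomes a row; the keys become the column names
whales_df = pd.DataFrame(whales)
print(whales_df.columns.tolist())  # ['rank', 'animal', 'weight', 'animal_type']
```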

In [13]:
##get quotes

quotes_list = content.get("quotes")
quotes_list

[{'author': {'goodreads_link': '/author/show/82952.Marilyn_Monroe',
   'name': 'Marilyn Monroe',
   'slug': 'Marilyn-Monroe'},
  'tags': ['friends',
   'heartbreak',
   'inspirational',
   'life',
   'love',
   'sisters'],
  'text': "“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fai

In [14]:
##length check
len(quotes_list)

10

In [17]:
df = pd.DataFrame(quotes_list)
df

|   | author | tags | text |
|---|---|---|---|
| 0 | {'goodreads_link': '/author/show/82952.Marilyn... | [friends, heartbreak, inspirational, life, lov... | “This life is what you make it. No matter what... |
| 1 | {'goodreads_link': '/author/show/1077326.J_K_R... | [courage, friends] | “It takes a great deal of bravery to stand up ... |
| 2 | {'goodreads_link': '/author/show/9810.Albert_E... | [simplicity, understand] | “If you can't explain it to a six year old, yo... |
| 3 | {'goodreads_link': '/author/show/25241.Bob_Mar... | [love] | “You may not be her first, her last, or her on... |
| 4 | {'goodreads_link': '/author/show/61105.Dr_Seus... | [fantasy] | “I like nonsense, it wakes up the brain cells.... |
| 5 | {'goodreads_link': '/author/show/4.Douglas_Ada... | [life, navigation] | “I may not have gone where I intended to go, b... |
| 6 | {'goodreads_link': '/author/show/1049.Elie_Wie... | [activism, apathy, hate, indifference, inspira... | “The opposite of love is not hate, it's indiff... |
| 7 | {'goodreads_link': '/author/show/1938.Friedric... | [friendship, lack-of-friendship, lack-of-love,... | “It is not a lack of love, but a lack of frien... |
| 8 | {'goodreads_link': '/author/show/1244.Mark_Twa... | [books, contentment, friends, friendship, life] | “Good friends, good books, and a sleepy consci... |
| 9 | {'goodreads_link': '/author/show/276029.Allen_... | [fate, life, misattributed-john-lennon, planni... | “Life is what happens to us while we are makin... |


In [18]:
df = pd.json_normalize(quotes_list)
df

|   | tags | text | author.goodreads_link | author.name | author.slug |
|---|---|---|---|---|---|
| 0 | [friends, heartbreak, inspirational, life, lov... | “This life is what you make it. No matter what... | /author/show/82952.Marilyn_Monroe | Marilyn Monroe | Marilyn-Monroe |
| 1 | [courage, friends] | “It takes a great deal of bravery to stand up ... | /author/show/1077326.J_K_Rowling | J.K. Rowling | J-K-Rowling |
| 2 | [simplicity, understand] | “If you can't explain it to a six year old, yo... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 3 | [love] | “You may not be her first, her last, or her on... | /author/show/25241.Bob_Marley | Bob Marley | Bob-Marley |
| 4 | [fantasy] | “I like nonsense, it wakes up the brain cells.... | /author/show/61105.Dr_Seuss | Dr. Seuss | Dr-Seuss |
| 5 | [life, navigation] | “I may not have gone where I intended to go, b... | /author/show/4.Douglas_Adams | Douglas Adams | Douglas-Adams |
| 6 | [activism, apathy, hate, indifference, inspira... | “The opposite of love is not hate, it's indiff... | /author/show/1049.Elie_Wiesel | Elie Wiesel | Elie-Wiesel |
| 7 | [friendship, lack-of-friendship, lack-of-love,... | “It is not a lack of love, but a lack of frien... | /author/show/1938.Friedrich_Nietzsche | Friedrich Nietzsche | Friedrich-Nietzsche |
| 8 | [books, contentment, friends, friendship, life] | “Good friends, good books, and a sleepy consci... | /author/show/1244.Mark_Twain | Mark Twain | Mark-Twain |
| 9 | [fate, life, misattributed-john-lennon, planni... | “Life is what happens to us while we are makin... | /author/show/276029.Allen_Saunders | Allen Saunders | Allen-Saunders |
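`pd.json_normalize` can also start from the whole response instead of the `quotes` list: `record_path` points at the list of records, and `meta` copies top-level fields (here `page`) onto every row. `sep` sets the separator used in flattened column names (the default is `.`, as in the table above). A sketch on a made-up, trimmed payload with the same keys as the API response:

```python
import pandas as pd

# Made-up payload mimicking the API's structure (not real site data)
payload = {
    "has_next": True,
    "page": 2,
    "quotes": [
        {"author": {"name": "Author A", "slug": "Author-A"}, "tags": ["life"], "text": "Quote one."},
        {"author": {"name": "Author B", "slug": "Author-B"}, "tags": ["love", "books"], "text": "Quote two."},
    ],
}

# One row per quote; nested author fields become author_name / author_slug,
# and the page number is repeated on every row
flat = pd.json_normalize(payload, record_path="quotes", meta=["page"], sep="_")
print(flat.columns.tolist())
print(flat["author_name"].tolist())  # ['Author A', 'Author B']
```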


In [22]:
## All pages
quotes_dfs_list = [] ## one DataFrame per page
page_number = 1
valid = True
url = "https://quotes.toscrape.com/api/quotes?page="

while valid:
    link = f"{url}{page_number}"
    page_number += 1
    print(f"Scraping {link}")
    response = requests.get(link, timeout=30)
    response.raise_for_status() ## stop on a 4xx/5xx error instead of parsing an error page
    content = response.json()
    target = content.get("quotes")
    best_quotes_df = pd.json_normalize(target)
    quotes_dfs_list.append(best_quotes_df)
    snoozer = randrange(4, 7) ## random whole number from 4 to 6
    print(f"Snoozing for {snoozer} seconds")
    time.sleep(snoozer) ## pause between requests to go easy on the server
    if not content.get("has_next"): ## the API says this was the last page
        valid = False

print("************DONE SCRAPING**********")


Scraping https://quotes.toscrape.com/api/quotes?page=1
Snoozing for 5 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=2
Snoozing for 6 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=3
Snoozing for 6 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=4
Snoozing for 5 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=5
Snoozing for 4 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=6
Snoozing for 6 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=7
Snoozing for 6 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=8
Snoozing for 6 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=9
Snoozing for 6 seconds
Scraping https://quotes.toscrape.com/api/quotes?page=10
Snoozing for 6 seconds
************DONE SCRAPING**********
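The loop above mixes three jobs: fetching a page, deciding whether to keep going, and pausing politely. One way to separate them is a generator that takes the fetching function as an argument, so the stopping logic can be tried out offline. This is a sketch, not the notebook's method: `iter_quote_pages`, `polite_pause`, `fake_fetch` and the `max_pages` safety cap are hypothetical names, and the code assumes every page reports `has_next` the way this API does.

```python
import time
from random import randrange
from typing import Callable, Iterator


def polite_pause() -> None:
    """Sleep 4 to 6 seconds, like the loop above."""
    time.sleep(randrange(4, 7))


def iter_quote_pages(
    fetch_page: Callable[[int], dict],
    pause: Callable[[], None] = polite_pause,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield one parsed JSON page at a time until has_next is false."""
    for page_number in range(1, max_pages + 1):
        content = fetch_page(page_number)
        yield content
        if not content.get("has_next"):
            break  # last page reached: stop without an extra pause
        pause()


# Offline stand-in for requests.get(...).json(): three made-up pages
def fake_fetch(page_number: int) -> dict:
    return {"page": page_number, "has_next": page_number < 3, "quotes": []}


pages = list(iter_quote_pages(fake_fetch, pause=lambda: None))
print([p["page"] for p in pages])  # [1, 2, 3]
```

Against the live API, `fetch_page` could be something like `lambda n: requests.get(f"https://quotes.toscrape.com/api/quotes?page={n}", timeout=30).json()`, and each yielded page's `quotes` list can go straight into `pd.json_normalize`.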


In [24]:
len(quotes_dfs_list)

10

In [25]:
quotes_dfs_list[0]

|   | tags | text | author.goodreads_link | author.name | author.slug |
|---|---|---|---|---|---|
| 0 | [change, deep-thoughts, thinking, world] | “The world as we have created it is a process ... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 1 | [abilities, choices] | “It is our choices, Harry, that show what we t... | /author/show/1077326.J_K_Rowling | J.K. Rowling | J-K-Rowling |
| 2 | [inspirational, life, live, miracle, miracles] | “There are only two ways to live your life. On... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 3 | [aliteracy, books, classic, humor] | “The person, be it gentleman or lady, who has ... | /author/show/1265.Jane_Austen | Jane Austen | Jane-Austen |
| 4 | [be-yourself, inspirational] | “Imperfection is beauty, madness is genius and... | /author/show/82952.Marilyn_Monroe | Marilyn Monroe | Marilyn-Monroe |
| 5 | [adulthood, success, value] | “Try not to become a man of success. Rather be... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 6 | [life, love] | “It is better to be hated for what you are tha... | /author/show/7617.Andr_Gide | André Gide | Andre-Gide |
| 7 | [edison, failure, inspirational, paraphrased] | “I have not failed. I've just found 10,000 way... | /author/show/3091287.Thomas_A_Edison | Thomas A. Edison | Thomas-A-Edison |
| 8 | [misattributed-eleanor-roosevelt] | “A woman is like a tea bag; you never know how... | /author/show/44566.Eleanor_Roosevelt | Eleanor Roosevelt | Eleanor-Roosevelt |
| 9 | [humor, obvious, simile] | “A day without sunshine is like, you know, nig... | /author/show/7103.Steve_Martin | Steve Martin | Steve-Martin |


In [26]:
##concatenate all 10 page DataFrames, then renumber the rows from 0
df = pd.concat(quotes_dfs_list).reset_index(drop=True)
df

|   | tags | text | author.goodreads_link | author.name | author.slug |
|---|---|---|---|---|---|
| 0 | [change, deep-thoughts, thinking, world] | “The world as we have created it is a process ... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 1 | [abilities, choices] | “It is our choices, Harry, that show what we t... | /author/show/1077326.J_K_Rowling | J.K. Rowling | J-K-Rowling |
| 2 | [inspirational, life, live, miracle, miracles] | “There are only two ways to live your life. On... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 3 | [aliteracy, books, classic, humor] | “The person, be it gentleman or lady, who has ... | /author/show/1265.Jane_Austen | Jane Austen | Jane-Austen |
| 4 | [be-yourself, inspirational] | “Imperfection is beauty, madness is genius and... | /author/show/82952.Marilyn_Monroe | Marilyn Monroe | Marilyn-Monroe |
| ... | ... | ... | ... | ... | ... |
| 95 | [better-life-empathy] | “You never really understand a person until yo... | /author/show/1825.Harper_Lee | Harper Lee | Harper-Lee |
| 96 | [books, children, difficult, grown-ups, write,... | “You have to write the book that wants to be w... | /author/show/106.Madeleine_L_Engle | Madeleine L'Engle | Madeleine-LEngle |
| 97 | [truth] | “Never tell the truth to people who are not wo... | /author/show/1244.Mark_Twain | Mark Twain | Mark-Twain |
| 98 | [inspirational] | “A person's a person, no matter how small.” | /author/show/61105.Dr_Seuss | Dr. Seuss | Dr-Seuss |
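Each page's DataFrame arrives with its own 0-9 row labels, so plain concatenation repeats them; that is why the cell above calls `.reset_index(drop=True)`. `pd.concat(..., ignore_index=True)` builds the same fresh 0-based index in one step. A small sketch with two made-up page frames:

```python
import pandas as pd

# Two made-up "pages", each indexed 0..1 on its own
page_1 = pd.DataFrame({"text": ["q1", "q2"]})
page_2 = pd.DataFrame({"text": ["q3", "q4"]})

stacked = pd.concat([page_1, page_2])
print(stacked.index.tolist())   # [0, 1, 0, 1], labels repeat

combined = pd.concat([page_1, page_2], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3]

# Same result as the reset_index(drop=True) approach
print(combined.equals(stacked.reset_index(drop=True)))  # True
```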


In [30]:
## join each list of tags into one comma-separated string
df["tags"] = df["tags"].apply(lambda x: ', '.join(x))
df

|   | tags | text | author.goodreads_link | author.name | author.slug |
|---|---|---|---|---|---|
| 0 | change, deep-thoughts, thinking, world | “The world as we have created it is a process ... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 1 | abilities, choices | “It is our choices, Harry, that show what we t... | /author/show/1077326.J_K_Rowling | J.K. Rowling | J-K-Rowling |
| 2 | inspirational, life, live, miracle, miracles | “There are only two ways to live your life. On... | /author/show/9810.Albert_Einstein | Albert Einstein | Albert-Einstein |
| 3 | aliteracy, books, classic, humor | “The person, be it gentleman or lady, who has ... | /author/show/1265.Jane_Austen | Jane Austen | Jane-Austen |
| 4 | be-yourself, inspirational | “Imperfection is beauty, madness is genius and... | /author/show/82952.Marilyn_Monroe | Marilyn Monroe | Marilyn-Monroe |
| ... | ... | ... | ... | ... | ... |
| 95 | better-life-empathy | “You never really understand a person until yo... | /author/show/1825.Harper_Lee | Harper Lee | Harper-Lee |
| 96 | books, children, difficult, grown-ups, write, ... | “You have to write the book that wants to be w... | /author/show/106.Madeleine_L_Engle | Madeleine L'Engle | Madeleine-LEngle |
| 97 | truth | “Never tell the truth to people who are not wo... | /author/show/1244.Mark_Twain | Mark Twain | Mark-Twain |
| 98 | inspirational | “A person's a person, no matter how small.” | /author/show/61105.Dr_Seuss | Dr. Seuss | Dr-Seuss |
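pandas also has a built-in for this step: `Series.str.join(", ")` joins the elements of each list the same way the `lambda` does. One difference worth knowing: if a row has no tags (a missing value), `str.join` leaves it missing, while `', '.join(x)` would raise a `TypeError` because a float `NaN` can't be iterated. A sketch with made-up tags:

```python
import pandas as pd

# Made-up tag lists; the last row has no tags at all
tags = pd.Series([["change", "world"], ["truth"], None])

joined = tags.str.join(", ")
print(joined.iloc[0])           # change, world
print(joined.iloc[1])           # truth
print(joined.isna().tolist())   # [False, False, True], the gap stays missing
```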


In [27]:
# flattening lists within a DataFrame: a small example
data = {
    'ID': ['A', 'B', 'C'],
    'items': [['apple', 'banana'], ['carrot', 'daikon'], ['eggplant']]
}

# create df
dfs = pd.DataFrame(data)
dfs

|   | ID | items |
|---|---|---|
| 0 | A | [apple, banana] |
| 1 | B | [carrot, daikon] |
| 2 | C | [eggplant] |


In [28]:
dfs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   ID      3 non-null      object
 1   items   3 non-null      object
dtypes: object(2)
memory usage: 176.0+ bytes
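`info()` reports the list column only as `object`, the same dtype pandas uses for plain strings, so it can't tell you which columns still hold lists. A small check that looks at the actual values can; `list_columns` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def list_columns(df: pd.DataFrame) -> list:
    """Return the names of columns where at least one value is a Python list."""
    return [
        col for col in df.columns
        if df[col].apply(lambda value: isinstance(value, list)).any()
    ]


example = pd.DataFrame({
    "ID": ["A", "B", "C"],
    "items": [["apple", "banana"], ["carrot", "daikon"], ["eggplant"]],
})
print(list_columns(example))  # ['items']
```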


In [29]:
dfs["items"] = dfs["items"].apply(lambda x: ', '.join(x))
dfs

|   | ID | items |
|---|---|---|
| 0 | A | apple, banana |
| 1 | B | carrot, daikon |
| 2 | C | eggplant |
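Joining keeps one row per ID. If you'd rather have one row per list item (for example, to count how often each tag appears), `DataFrame.explode` goes the other way: it repeats the other columns once for each element of the list. A sketch rebuilding the same `data` as above, before the join (the `ignore_index` argument needs pandas 1.1 or newer):

```python
import pandas as pd

data = {
    "ID": ["A", "B", "C"],
    "items": [["apple", "banana"], ["carrot", "daikon"], ["eggplant"]],
}

# One row per list element; ignore_index renumbers the rows 0..n-1
dfs_long = pd.DataFrame(data).explode("items", ignore_index=True)
print(dfs_long["ID"].tolist())     # ['A', 'A', 'B', 'B', 'C']
print(dfs_long["items"].tolist())  # ['apple', 'banana', 'carrot', 'daikon', 'eggplant']
```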


## Input-Required Sites

- <a href="https://www.bsa.ca.gov/reports/recent">California State Auditor</a>
- <a href="https://leg.colorado.gov/audit-search">Colorado State Auditor</a>