# Scraping JavaScript data ("dynamic webpages")
### by [Jason DeBacker](http://jasondebacker.com), October 2019 (with thanks to [Adam Rennhoff](http://mtweb.mtsu.edu/rennhoff/))

This notebook provides a tutorial and examples showing how to scrape webpages with JavaScript data.

## Example: scrape the store locations for Walgreens pharmacies.

Sometimes the webpage will be more complicated. As an example, suppose that I want to scrape store locations of Walgreens pharmacies.

![Walgreen Locations Screenshot](files/images/WalgreenLocations.png)

Notice that I am searching for stores near zip code 29205, but this is NOT reflected in the URL, which is `https://www.walgreens.com/storelocator/result.jsp`. Because the zip code is not part of the URL, we cannot loop over different locations by editing the URL the way we did with the Wikipedia pages.

What can we do? Let's look at the request that the browser sends to Walgreens.com and see if we can mimic it. To do this, we use the browser's "Inspect" tool to look at the network data ("XHR"). The layout of the inspect tool varies by browser; in the screenshots below, I'm using Safari Version 11.0:

![Walgreen Inspect Screenshot](files/images/WalgreensInspect.png)

This will take some trial and error, but you can see a list of requests under "Resources" and then "XHR". I have clicked on the second item in that list. Notice that its response shows the address of the first store in the search results (4467 DEVINE ST). This tells me that this is the request I want to mimic.


In order to figure out the format of my request, I need to click on the drop down menu that says "Response" and select "Request" from this menu.  Then click on the "show details sidebar" icon to show details of the request.

![Walgreen Request Screenshot](files/images/RequestType.png)

The "request payload" in this case is: `{"q":" Columbia, SC 29205","r":"50","lat":33.9900337,"lng":-80.99815760000001,"requestType":"locator","s":"15","p":"1"}`

The request payload tells us what we need to send to the URL so that the server returns the information we want.
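
Because the payload is plain JSON, it can be loaded into a Python dictionary, edited, and serialized again. The sketch below does this with the payload string shown above. Changing "s" is only an illustration; later in this notebook "s" is read as the number of stores requested, but the site does not document its parameters.

```python
import json

# Request payload copied from the browser's inspect tool
raw_payload = (
    '{"q":" Columbia, SC 29205","r":"50","lat":33.9900337,'
    '"lng":-80.99815760000001,"requestType":"locator","s":"15","p":"1"}'
)

# Parse the JSON text into a Python dictionary
payload = json.loads(raw_payload)

# Change one field; the browser sent this value as a string, so keep it a string
payload["s"] = "30"

# Serialize back to JSON text, ready to be used as the body of a POST request
body = json.dumps(payload)
print(body)
```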

We'll also need three things from the details sidebar: "Location", "Request and Response", and "Request Headers".

* "Location" tells us the URL that we make our request to.
* "Request and Response" tells us the method (POST).
* "Request Headers" tells us the information that needs to be in the header of our request. In the Wikipedia example, we had to send a "User-Agent" so that it looked like we were coming from a web browser like Chrome or Firefox.
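
To see how these pieces fit together before sending anything, a request object can be assembled and inspected offline. This is a minimal sketch that uses the standard library's `urllib.request.Request` instead of the `requests` library used in the rest of this notebook. The headers and payload are trimmed for brevity (the shortened "User-Agent" is a placeholder), and nothing is sent over the network.

```python
import json
from urllib.request import Request

# "Location": the URL the browser sent its request to
url = "https://customersearch.walgreens.com/storelocator/v1/stores/search"

# "Request Headers": a trimmed subset, for illustration only
headers = {
    "Content-Type": "application/json;charset=UTF-8",
    "User-Agent": "Mozilla/5.0",  # placeholder, not the full browser string
}

# "Request Payload": a trimmed JSON body, encoded to bytes
payload = {"q": " Columbia, SC 29205", "r": "50", "requestType": "locator", "s": "15", "p": "1"}
body = json.dumps(payload).encode("utf-8")

# Assemble the request object; this does NOT contact the server
req = Request(url, data=body, headers=headers, method="POST")

print(req.get_method())    # the HTTP method
print(req.full_url)        # the target URL
print(req.header_items())  # the headers that would be sent
```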

Let's try to make that request using the requests library in Python.

In [15]:
import requests
import json

url = 'https://customersearch.walgreens.com/storelocator/v1/stores/search' # from Headers Request URL

# Request payload
pay = {
    "q": " Columbia, SC 29205",
    "r": "500",
    "lat": 33.9900337,
    "lng": -80.99815760000001,
    "requestType": "locator",
    "s": "30",
    "p": "2"
}


# Request headers
# "Content-Length" is left out: requests sets it from the actual body
heads = {
    "Accept" : "application/json, text/plain, */*",
    "Accept-Encoding" : "gzip, deflate, br",
    "Accept-Language" : "en-US,en;q=0.8",
    "Connection" : "keep-alive",
    "Content-Type" : "application/json;charset=UTF-8",
    "Host" : "customersearch.walgreens.com",
    "Origin" : "https://www.walgreens.com",
    "Referer" : "https://www.walgreens.com/storelocator/find.jsp?tab=store+locator&requestType=locator",
    "User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38"
}

# Make our POST request using the headers and payload
response = requests.post(url, data=json.dumps(pay), headers=heads)
print(response)

<Response [200]>


A status code of 200 means that the server received the request and processed it successfully. Success!
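
Other codes signal other outcomes, so it can help to translate the number into its standard name. Here is a small sketch using Python's built-in `http.HTTPStatus`; the helper name `describe_status` is my own.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"

for code in (200, 403, 404):
    print(describe_status(code))
```

With the `requests` library, `response.status_code` holds this number, and `response.raise_for_status()` raises an exception for 4xx and 5xx codes.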

Our next step is to get the data into a usable format by parsing the JSON response and remembering the format of the response we receive (from the "Preview" tab).

In [16]:
data = response.json() # parse the JSON response into a Python dictionary
len(data)
# data.keys()
data['filter']

{'q': ' Columbia, SC 29205',
 'requestType': 'locator',
 'r': '500',
 'filters': [],
 'lat': '33.9900337',
 'lng': '-80.99815760000001',
 's': '30',
 'p': '2',
 'reqsPerPage': '30'}

We can only see the first two elements in the image above but our JSON data has three elements: "filter", "results", and "summary". The "results" key contains the information we really want.
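
A quick way to get oriented in a nested response like this is to list the top-level keys together with the type and size of each value. The sketch below uses a hand-trimmed stand-in for the response text. The contents of "summary" are not shown in this notebook, so the empty dictionary used for it here is only a placeholder.

```python
import json

# Hand-trimmed stand-in for the server's response text (not the full response)
sample_text = """
{
  "filter": {"q": " Columbia, SC 29205", "r": "500", "s": "30", "p": "2"},
  "results": [
    {"store": {"storeNumber": "17457", "address": {"city": "COLUMBIA", "state": "SC"}},
     "distance": "0.51"}
  ],
  "summary": {}
}
"""

data = json.loads(sample_text)

# Show each top-level key with the type and size of its value
for key, value in data.items():
    print(key, type(value).__name__, len(value))
```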

In [17]:
results = data['results']
print(len(results))


22


Our "results" variable holds data for 22 stores. Go back and look at the code: in the payload we set the parameter 's' to 30, and the response echoes that value as 'reqsPerPage', so 's' appears to cap the number of stores returned per request.
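
If 's' controls how many stores come back per request, then collecting more stores means asking for further pages. The sketch below only builds the payloads and sends nothing. It assumes that 'p' is a 1-based page number, which the 'p' and 'reqsPerPage' fields in the response suggest but which I have not confirmed. The helper name `page_payloads` is hypothetical.

```python
import copy

def page_payloads(base, n_pages):
    """Return one payload per page, assuming 'p' is a 1-based page number."""
    payloads = []
    for page in range(1, n_pages + 1):
        pay = copy.deepcopy(base)  # leave the base payload untouched
        pay["p"] = str(page)       # the browser sends these values as strings
        payloads.append(pay)
    return payloads

base = {
    "q": " Columbia, SC 29205",
    "r": "500",
    "lat": 33.9900337,
    "lng": -80.99815760000001,
    "requestType": "locator",
    "s": "30",
    "p": "1",
}

for pay in page_payloads(base, 3):
    print(pay["p"], pay["s"])
```

Each payload could then be passed to `requests.post` in the same way as the request above, ideally with a short pause between calls.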

Let's look at the results for the first store in the listings...

In [9]:
results[0]

{'store': {'storeNumber': '17457',
  'phone': [{'number': '7990036 ', 'areaCode': '803', 'type': 'store'}],
  'address': {'state': 'SC',
   'city': 'COLUMBIA',
   'street': '2708 ROSEWOOD DRIVE',
   'zip': '29205'},
  'serviceIndicators': [{'name': 'One Hour Photo', 'code': 'phi'}],
  'storeOpenTime': '7AM',
  'storeCloseTime': '10PM',
  'pharmacyOpenTime': '9AM',
  'pharmacyCloseTime': '9PM',
  'timeZone': 'EA',
  'emergencyCode': '0',
  'telePharmacyKiosk': False,
  'storeType': '01',
  'storeBrand': 'Walgreens'},
 'distance': '0.51',
 'mapUrl': 'https://maps.googleapis.com/maps/api/staticmap?size=451x451&markers=icon:http://www.walgreens.com/images/gmap/markers/point_wag.png|shadow:true|33.9900337,-80.99815760000001&client=gme-walgreens&sensor=false',
 'latitude': '33.98673446',
 'longitude': '-81.00605953',
 'storeSeoUrl': '/locator/walgreens-2708+rosewood+drive-columbia-sc-29205/id=17457',
 'clinicId': '0'}

It may be difficult to see in the output but most of the information we would want is contained in the 'store' element:

In [None]:
results[0]['store'] # first store in the list

In [None]:
results[1]['store'] # second store on the list
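
For analysis it is often handier to flatten the nested 'store' entry into a single row. Here is a minimal sketch using a trimmed copy of the first result printed above; the function name `flatten_store` and the choice of columns are mine.

```python
def flatten_store(result):
    """Turn one element of results into a flat dictionary (one row)."""
    store = result["store"]
    address = store["address"]
    # Phone numbers come as a list of dicts; use the first one if present
    phones = store.get("phone", [])
    phone = ""
    if phones:
        phone = phones[0]["areaCode"] + "-" + phones[0]["number"].strip()
    return {
        "store_number": store["storeNumber"],
        "street": address["street"],
        "city": address["city"],
        "state": address["state"],
        "zip": address["zip"],
        "phone": phone,
        "latitude": float(result["latitude"]),
        "longitude": float(result["longitude"]),
    }

# Trimmed copy of the first result printed above
first = {
    "store": {
        "storeNumber": "17457",
        "phone": [{"number": "7990036 ", "areaCode": "803", "type": "store"}],
        "address": {"state": "SC", "city": "COLUMBIA",
                    "street": "2708 ROSEWOOD DRIVE", "zip": "29205"},
    },
    "latitude": "33.98673446",
    "longitude": "-81.00605953",
}

print(flatten_store(first))
```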

Suppose that for a research question, I am interested in knowing which Walgreens locations offer flu shots. After some exploration, I see that a "serviceIndicators" code of "fs" indicates that flu shots are offered at that location. We can loop through the returned stores to print out a list of the stores that offer flu shots.

In [10]:
# Loop over the returned stores
for result in results:
    # For each store, loop over its serviceIndicators to find 'fs' (flu shots)
    try:
        for indicator in result['store']['serviceIndicators']:
            if indicator['code'] == 'fs':
                print('The Walgreens at ' + str(result['store']['address']['street']) + ' offers flu shots.')
    except KeyError:
        # Skip stores that lack one of the expected keys
        pass

The Walgreens at 4467 DEVINE ST offers flu shots.
The Walgreens at 1941 BLOSSOM ST offers flu shots.
The Walgreens at 3501 FOREST DR offers flu shots.
The Walgreens at 7801 GARNERS FERRY RD offers flu shots.
The Walgreens at 1537 CHARLESTON HWY offers flu shots.
The Walgreens at 2224 AUGUSTA RD offers flu shots.
The Walgreens at 1223 SAINT ANDREWS RD offers flu shots.
The Walgreens at 9001 TWO NOTCH RD offers flu shots.
The Walgreens at 6118 SAINT ANDREWS RD offers flu shots.
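
Printing is fine for a quick look, but for later analysis one may prefer to collect the matches in a list. The sketch below wraps the same loop in a function. `stores_with_service` is a hypothetical helper, and the sample entries are hand-built for illustration: the 'name' given to the 'fs' indicator is a guess, and "123 EXAMPLE ST" is made up.

```python
def stores_with_service(results, code):
    """Return street addresses of stores whose serviceIndicators include code."""
    matches = []
    for result in results:
        store = result.get("store", {})
        # Some stores may lack the serviceIndicators key, so default to an empty list
        indicators = store.get("serviceIndicators", [])
        if any(ind.get("code") == code for ind in indicators):
            matches.append(store["address"]["street"])
    return matches

# Hand-built sample in the same shape as results (illustrative, not real output)
sample_results = [
    {"store": {"address": {"street": "2708 ROSEWOOD DRIVE"},
               "serviceIndicators": [{"name": "One Hour Photo", "code": "phi"}]}},
    {"store": {"address": {"street": "4467 DEVINE ST"},
               "serviceIndicators": [{"name": "Flu Shots", "code": "fs"}]}},
    {"store": {"address": {"street": "123 EXAMPLE ST"}}},  # no serviceIndicators key
]

print(stores_with_service(sample_results, "fs"))
```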
