## Undocumented APIs

[**Neal Caren**](mailto:neal.caren@gmail.com)  
University of North Carolina, Chapel Hill

Social scientists are often interested in collecting massive amounts of digital data from specific websites. When you are fortunate, someone else has already scraped it for you and released it on [GitHub](https://github.com) or as a [Kaggle dataset](https://www.kaggle.com/datasets). Sometimes the website provides an [API](https://en.wikipedia.org/wiki/Web_API), an interface for collecting the data in a systematic way that is usually reasonably well documented. The most prominent of these for social scientists is probably [Twitter](https://developer.twitter.com/en/docs.html), which has enabled hundreds of academic studies by making information about posts and users available.

Other times, an API exists and is used by the website, but information about it is not publicly available. Most website searches, for example, rely on an internal web API. Your search term, along with other relevant parameters, is used to extract the corresponding results from a database, and those results are then displayed on a web page. While we see the data as part of a web page, it is often transmitted in a different, more research-friendly format, usually [JSON](https://en.wikipedia.org/wiki/JSON). Using an undocumented API allows you to collect the data systematically without parsing the HTML of each page.

Below, I walk through the steps I recently used to gather Fox News Opinion articles. In this case, the API doesn't return the full text of the articles. As is often the case, however, it does return all of the article metadata, including the article URL for subsequent scraping.
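The metadata stores each article's location as a site-relative path in its `url` field, for example `/opinion/...`. Before scraping, that path has to be joined to the site's address. The sketch below does this with the standard library's `urljoin`. The base URL `https://www.foxnews.com` is an assumption taken from the addresses in this walkthrough, and `absolute_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumed base address; the API's `url` field holds paths relative to it.
BASE = "https://www.foxnews.com"

def absolute_url(path, base=BASE):
    """Turn a site-relative path from the API's `url` field into a full URL.

    If `path` is already a full URL, urljoin returns it unchanged.
    """
    return urljoin(base, path)

article = absolute_url(
    "/opinion/democrats-allowing-ilhan-omars-anti-semitic-rhetoric-to-be-standard-bearer-for-the-party"
)
print(article)
```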

I use Python, but the overall logic is similar for other languages. 

I begin by enabling the developer tools in my browser. In Safari, this means going to **Preferences->Advanced** and checking **Show Develop Menu in Menu Bar**. The process is similar in Chrome.

I visited the front page.
![frontpage](images/frontpage.png)


I clicked on **Opinion**, the section I wanted to scrape.
![opinion](images/opinion.png)

I scrolled down the page and found a list of articles with a **Show More** button. This type of button, which leads to the next set of results, is a key component for understanding how the webpage is structured.  
![more.png](images/more.png)


In this case, **Show More** does not load a new page but it does expand the number of results shown on the existing page. If you copy the link associated with this button (`https://www.foxnews.com/opinion#`) and paste it in a new browser window, it merely displays the first set of results, so that is a dead end for uncovering an API. 
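The copied link is a dead end because of the trailing `#`. Everything after `#` is a URL fragment. Browsers keep fragments to themselves rather than sending them to the server, so the server only ever sees the plain Opinion page. This fits with the button being handled by JavaScript on the page, although that is an inference, not something the page confirms. The standard library can split off the fragment to show what is actually requested:

```python
from urllib.parse import urldefrag

show_more_link = "https://www.foxnews.com/opinion#"

# urldefrag separates the part the server receives from the client-side fragment.
requested, fragment = urldefrag(show_more_link)
print(requested)        # the URL the server actually sees
print(repr(fragment))   # the (empty) fragment that stays in the browser
```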

To check that a decent number of articles were available, I clicked the **Show More** button several times. Each time, it loaded more articles.
![more.png](images/more2.png)

Lots of data is passed between my computer and the Fox News website with each click. The **Show Page Resources** option in the **Develop** menu reveals more about these exchanges.

![develop.png](images/develop.png)


This defaults to showing the page's HTML code.
![html.png](images/html.png)

To look for signs of an API, I click on the **Network** tab. Results are sometimes already listed, but I start clean by clicking the trash icon on the far right. With the **Network** tab visible, I then click **Show More** on the web page. Each resource exchanged between my computer and the various servers is now displayed. When the activity stops, I sort the list by size.
![network.png](images/network1.png)

I now review each of the items, looking for something that resembles the output of a search API: a plain-text file containing the search results.

Usually, the top of the list is occupied by the images that the page retrieved. Images usually have a .png or .jpg extension. Selecting the first item in the results, a JPG with a name full of numbers, confirms that it is a picture.

![image.png](images/network_image.png)

The name of the second item, **article-search**, holds more promise. The **Preview** tab shows that this is a JSON object that appears to be a list of the articles returned after the **Show More** button was pressed. Bingo!

![image.png](images/network_search.png)


The **Headers** tab reveals the specific URL that returned this JSON.
![image.png](images/network_header.png)


The URL has all the signs of an undocumented API. First, it contains "api" as part of the string. Second, it includes a series of search parameters, such as `isCategory` and `size`. I copy the URL and save it as a Python string, splitting it over several lines so it is easier to read.

```python
url = ('https://www.foxnews.com/api/article-search?'
       'isCategory=true&isTag=false&isKeyword=false&'
       'isFixed=false&isFeedUrl=false&searchSelected=opinion&'
       'contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&'
       'size=11&offset=30')
```
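The parameters are easier to read once the query string is decoded. The sketch below uses the standard library's `parse_qs` to list each parameter. It also shows that the percent-encoded `contentTypes` value is itself a small JSON object.

```python
import json
from urllib.parse import urlsplit, parse_qs

url = ('https://www.foxnews.com/api/article-search?'
       'isCategory=true&isTag=false&isKeyword=false&'
       'isFixed=false&isFeedUrl=false&searchSelected=opinion&'
       'contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&'
       'size=11&offset=30')

# parse_qs percent-decodes each value and returns a list per key (keys may repeat).
params = parse_qs(urlsplit(url).query)
for key, values in params.items():
    print(f"{key:15} {values[0]}")

# Once decoded, contentTypes parses as JSON.
content_types = json.loads(params["contentTypes"][0])
print(content_types)
```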

The next step is to see whether Python can access the API with a straightforward command. Some APIs require confirmation that the request originates from the original website, while others do not enforce this. Figuring out the right way to access the API programmatically is a process of trial and error. I use the `requests` library.
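If a plain request is refused, a common next attempt is to send the same headers a browser would. By default, `requests` identifies itself with a `python-requests/...` User-Agent, which some sites reject. The sketch below assumes the server checks `User-Agent` and `Referer`; that is something to test, not something established for this API. The header values are placeholders to replace with those shown in the browser's Headers tab. The sketch uses the standard library's `urllib.request` so it runs without extra packages, and it only builds the request without sending it.

```python
import urllib.request

# Placeholder values (assumptions): copy the real ones from the Headers tab
# of the article-search request in your browser's Network panel.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (placeholder: paste your browser's value here)",
    "Referer": "https://www.foxnews.com/opinion",
}

def build_request(url, headers=BROWSER_HEADERS):
    """Prepare, but do not send, a GET request that carries browser-like headers."""
    return urllib.request.Request(url, headers=headers, method="GET")

req = build_request("https://www.foxnews.com/api/article-search")
# urllib normalizes header names, e.g. "User-Agent" is stored as "User-agent".
print(req.get_method(), req.full_url)
print(req.header_items())

# Sending it would be: urllib.request.urlopen(req)
# The requests equivalent is: requests.get(url, headers=BROWSER_HEADERS)
```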

```python
import requests

r = requests.get(url)
r.status_code
```

```
200
```

A status code of 200 means that the request succeeded and something was returned.
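Other codes are worth recognizing during the trial-and-error phase. For example, 403 (Forbidden) often means the server wants something the request lacks, and 429 (Too Many Requests) means it is rate limiting you. A 200 by itself does not guarantee useful data, since the body could still be an empty list. The standard library's `HTTPStatus` maps codes to their standard reason phrases:

```python
from http import HTTPStatus

def describe_status(code):
    """Return an HTTP status code together with its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"

for code in (200, 403, 404, 429):
    print(describe_status(code))
```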

Since it looked like a JSON object when viewed in the browser, I take advantage of the `requests` JSON decoder.

```python
r.json()
```

```
[{'category': {'name': 'OPINION', 'url': '/category/opinion'},
  'description': 'Rep. Ilhan Omar sparked outrage Sunday from both Democrats and Republicans after asserting on Twitter that U.S. support for Israel is based on monetary support from Jewish groups, especially AIPAC, the American Israel Public Affairs Committee.',
  'duration': '',
  'imageUrl': 'https://a57.foxnews.com/media2.foxnews.com/BrightCove/694940094001/2019/02/11/348/196/694940094001_6000812878001_6000814671001-vs.jpg?ve=1&tl=1',
  'isBreaking': False,
  'isLive': False,
  'lastPublishedDate': '2019-02-11T13:42:01-05:00',
  'publicationDate': '2019-02-11T13:42:01-05:00',
  'title': "Democrats allowing Ilhan Omar's anti-Semitic rhetoric to be standard-bearer for the party",
  'url': '/opinion/democrats-allowing-ilhan-omars-anti-semitic-rhetoric-to-be-standard-bearer-for-the-party'},
 {'category': {'name': 'OPINION', 'url': '/category/opinion'},
  'description': 'I’m a member of the U.S. House of Representatives who 
```
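`r.json()` does roughly what the standard library's `json.loads` does with the response text: it turns the JSON array into a Python list of dictionaries, one per article. The sketch below uses a trimmed copy of the first record above, keeping only three of its fields. It shows that structure, including the nested `category` dictionary that needs a second lookup.

```python
import json

# A trimmed copy of the first record above, keeping three of its fields.
raw = """[
  {"category": {"name": "OPINION", "url": "/category/opinion"},
   "publicationDate": "2019-02-11T13:42:01-05:00",
   "title": "Democrats allowing Ilhan Omar's anti-Semitic rhetoric to be standard-bearer for the party"}
]"""

articles = json.loads(raw)   # a list with one dict per article
first = articles[0]
print(len(articles), sorted(first))
print(first["category"]["name"])  # nested values need a second lookup
```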

A convenient way to turn this JSON into a usable format is with the `pandas` library.

```python
import pandas as pd

df = pd.DataFrame(r.json())
```

```python
df.head()
```

|   | category | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|
| 0 | {'name': 'OPINION', 'url': '/category/opinion'} | Rep. Ilhan Omar sparked outrage Sunday from bo... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:42:01-05:00 | 2019-02-11T13:42:01-05:00 | Democrats allowing Ilhan Omar's anti-Semitic r... | /opinion/democrats-allowing-ilhan-omars-anti-s... |
| 1 | {'name': 'OPINION', 'url': '/category/opinion'} | I’m a member of the U.S. House of Representati... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:22:58-05:00 | 2019-02-11T13:22:58-05:00 | Rep. Roger Marshall: NY abortion law is a dang... | /opinion/new-yorks-late-term-abortion-law-is-a... |
| 2 | {'name': 'OPINION', 'url': '/category/opinion'} | Neomi Rao is President Trump’s nominee for the... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T12:19:19-05:00 | 2019-02-11T12:19:19-05:00 | Democrats' attacks on Kavanaugh replacement Ne... | /opinion/democrats-overreach-again-in-attackin... |
| 3 | {'name': 'OPINION', 'url': '/category/opinion'} | It’s open season on white males at Yale Univer... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-11T12:02:00-05:00 | 2019-02-11T12:02:00-05:00 | Have American universities become breeding gro... | /opinion/have-american-universities-become-bre... |
| 4 | {'name': 'OPINION', 'url': '/category/opinion'} | Democrats need an identity-politics interventi... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T10:20:04-05:00 | 2019-02-11T10:20:04-05:00 | Democrats' are having an identity meltdown -- ... | /opinion/democrats-are-having-an-identity-melt... |


That looks pretty good, but the JSON has some nested elements. For example, each entry in the `category` column is a dictionary. These can often be flattened with `json_normalize`.
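`json_normalize` builds column names like `category.name` by joining nested keys with a dot. The pure-Python sketch below reproduces that idea for nested dictionaries only. It is an illustration, not pandas' implementation, which also handles lists of records and other options. `flatten` is a hypothetical helper name.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested dict, carrying the joined key as a prefix.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

record = {"category": {"name": "OPINION", "url": "/category/opinion"},
          "isBreaking": False,
          "title": "Democrats allowing Ilhan Omar's anti-Semitic rhetoric to be standard-bearer for the party"}
print(flatten(record))
```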

```python
# json_normalize is available at the top level of pandas (1.0 and later);
# the older pandas.io.json import path is deprecated.
from pandas import json_normalize

df = json_normalize(r.json())

df.head()
```

|   | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OPINION | /category/opinion | Rep. Ilhan Omar sparked outrage Sunday from bo... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:42:01-05:00 | 2019-02-11T13:42:01-05:00 | Democrats allowing Ilhan Omar's anti-Semitic r... | /opinion/democrats-allowing-ilhan-omars-anti-s... |
| 1 | OPINION | /category/opinion | I’m a member of the U.S. House of Representati... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:22:58-05:00 | 2019-02-11T13:22:58-05:00 | Rep. Roger Marshall: NY abortion law is a dang... | /opinion/new-yorks-late-term-abortion-law-is-a... |
| 2 | OPINION | /category/opinion | Neomi Rao is President Trump’s nominee for the... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T12:19:19-05:00 | 2019-02-11T12:19:19-05:00 | Democrats' attacks on Kavanaugh replacement Ne... | /opinion/democrats-overreach-again-in-attackin... |
| 3 | OPINION | /category/opinion | It’s open season on white males at Yale Univer... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-11T12:02:00-05:00 | 2019-02-11T12:02:00-05:00 | Have American universities become breeding gro... | /opinion/have-american-universities-become-bre... |
| 4 | OPINION | /category/opinion | Democrats need an identity-politics interventi... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T10:20:04-05:00 | 2019-02-11T10:20:04-05:00 | Democrats' are having an identity meltdown -- ... | /opinion/democrats-are-having-an-identity-melt... |


Great! This demonstrates that Fox News has an undocumented API that can be accessed with Python. The dataset, however, has only a few rows.

```python
len(df)
```

```
11
```

The next step is to see whether more data can be collected. I usually focus on two questions: How do I get to the next set of results? Can I get more results with each call?

```python
print(url)
```

```
https://www.foxnews.com/api/article-search?isCategory=true&isTag=false&isKeyword=false&isFixed=false&isFeedUrl=false&searchSelected=opinion&contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&size=11&offset=30
```


Looking back at the URL, the likely candidates for manipulation are `size`, which usually determines the number of results, and `offset`, which usually means "start with the nth result." I first check how many results can be returned in one call. If the answer is 10,000, I don't need to do much more.
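Swapping substrings with `str.replace` works, but it silently does nothing if the exact substring isn't found. The hypothetical helper below, `set_params`, edits named query parameters instead. It keeps the parameter order and leaves `:` and `,` unescaped, so re-encoding reproduces the original URL exactly. That the server accepts URLs rebuilt this way is an assumption, though an unchanged round trip suggests it should.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote

def set_params(url, **changes):
    """Return `url` with the given query parameters replaced (or appended)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    updated, seen = [], set()
    for key, value in pairs:
        if key in changes:
            value = str(changes[key])
            seen.add(key)
        updated.append((key, value))
    # Append any requested parameters that were not already in the URL.
    updated += [(k, str(v)) for k, v in changes.items() if k not in seen]
    # quote (not quote_plus) with ':' and ',' safe matches the original encoding.
    query = urlencode(updated, quote_via=quote, safe=":,")
    return urlunsplit(parts._replace(query=query))

url = ('https://www.foxnews.com/api/article-search?'
       'isCategory=true&isTag=false&isKeyword=false&'
       'isFixed=false&isFeedUrl=false&searchSelected=opinion&'
       'contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&'
       'size=11&offset=30')

print(set_params(url, size=30, offset=60))
```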

Since I'll be making many calls using the API, I write a quick function to take the URL and return a dataframe. Once I'm confident it will work, I would likely make a more robust function that allows more direct manipulation of the parameters, but I don't want to spend too much time on that if the whole API is a dead end.

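```python
import pandas as pd
import requests

def fox_df(url):
    """Request one page of results from the API and flatten it into a DataFrame."""
    r = requests.get(url)
    df = pd.json_normalize(r.json())
    print('Return a dataframe of length', len(df))
    return df
```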

I confirm that the function works using the original URL.

```python
fox_df(url)
```

```
Return a dataframe of length 11
```

|   | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OPINION | /category/opinion | Rep. Ilhan Omar sparked outrage Sunday from bo... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:42:01-05:00 | 2019-02-11T13:42:01-05:00 | Democrats allowing Ilhan Omar's anti-Semitic r... | /opinion/democrats-allowing-ilhan-omars-anti-s... |
| 1 | OPINION | /category/opinion | I’m a member of the U.S. House of Representati... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:22:58-05:00 | 2019-02-11T13:22:58-05:00 | Rep. Roger Marshall: NY abortion law is a dang... | /opinion/new-yorks-late-term-abortion-law-is-a... |
| 2 | OPINION | /category/opinion | Neomi Rao is President Trump’s nominee for the... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T12:19:19-05:00 | 2019-02-11T12:19:19-05:00 | Democrats' attacks on Kavanaugh replacement Ne... | /opinion/democrats-overreach-again-in-attackin... |
| 3 | OPINION | /category/opinion | It’s open season on white males at Yale Univer... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-11T12:02:00-05:00 | 2019-02-11T12:02:00-05:00 | Have American universities become breeding gro... | /opinion/have-american-universities-become-bre... |
| 4 | OPINION | /category/opinion | Democrats need an identity-politics interventi... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T10:20:04-05:00 | 2019-02-11T10:20:04-05:00 | Democrats' are having an identity meltdown -- ... | /opinion/democrats-are-having-an-identity-melt... |
| 5 | OPINION | /category/opinion | For proof that Trump is not the root of all ma... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T07:48:14-05:00 | 2019-02-11T07:48:14-05:00 | Trump's sanity highlighted by Democrats' crazy... | /opinion/trumps-sanity-highlighted-by-democrat... |
| 6 | OPINION | /category/opinion | Republicans have found their platform for 2020... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T06:59:52-05:00 | 2019-02-11T06:59:52-05:00 | Liz Peek: GOP's 2020 campaign will put Dem ext... | /opinion/liz-peek-gops-2020-campaign-will-put-... |
| 7 | OPINION | /category/opinion | For a while, I’ve wanted to say what I think a... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T06:41:46-05:00 | 2019-02-11T06:41:46-05:00 | Steve Hilton: The Democratic racism scandal in... | /opinion/steve-hilton-the-democratic-racism-sc... |
| 8 | OPINION | /category/opinion | Hours after Massachusetts Democrat Sen. Elizab... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T04:00:38-05:00 | 2019-02-11T04:00:38-05:00 | 2020 Election: THIS is the Democrat Trump woul... | /opinion/2020-election-this-is-the-democrat-tr... |
| 9 | OPINION | /category/opinion | If I were advising our Houses of Congress on h... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-10T23:18:53-05:00 | 2019-02-10T23:18:53-05:00 | I'm a corporate coach -- Here's how we can get... | /opinion/im-a-corporate-coach-heres-how-we-can... |


As a first attempt, I try to retrieve 100 results.

```python
url100 = url.replace('size=11', 'size=100')

fox_df(url100)
```

```
Return a dataframe of length 30
```

|   | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OPINION | /category/opinion | Rep. Ilhan Omar sparked outrage Sunday from bo... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:42:01-05:00 | 2019-02-11T13:42:01-05:00 | Democrats allowing Ilhan Omar's anti-Semitic r... | /opinion/democrats-allowing-ilhan-omars-anti-s... |
| 1 | OPINION | /category/opinion | I’m a member of the U.S. House of Representati... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T13:22:58-05:00 | 2019-02-11T13:22:58-05:00 | Rep. Roger Marshall: NY abortion law is a dang... | /opinion/new-yorks-late-term-abortion-law-is-a... |
| 2 | OPINION | /category/opinion | Neomi Rao is President Trump’s nominee for the... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T12:19:19-05:00 | 2019-02-11T12:19:19-05:00 | Democrats' attacks on Kavanaugh replacement Ne... | /opinion/democrats-overreach-again-in-attackin... |
| 3 | OPINION | /category/opinion | It’s open season on white males at Yale Univer... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-11T12:02:00-05:00 | 2019-02-11T12:02:00-05:00 | Have American universities become breeding gro... | /opinion/have-american-universities-become-bre... |
| 4 | OPINION | /category/opinion | Democrats need an identity-politics interventi... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T10:20:04-05:00 | 2019-02-11T10:20:04-05:00 | Democrats' are having an identity meltdown -- ... | /opinion/democrats-are-having-an-identity-melt... |
| 5 | OPINION | /category/opinion | For proof that Trump is not the root of all ma... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T07:48:14-05:00 | 2019-02-11T07:48:14-05:00 | Trump's sanity highlighted by Democrats' crazy... | /opinion/trumps-sanity-highlighted-by-democrat... |
| 6 | OPINION | /category/opinion | Republicans have found their platform for 2020... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T06:59:52-05:00 | 2019-02-11T06:59:52-05:00 | Liz Peek: GOP's 2020 campaign will put Dem ext... | /opinion/liz-peek-gops-2020-campaign-will-put-... |
| 7 | OPINION | /category/opinion | For a while, I’ve wanted to say what I think a... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T06:41:46-05:00 | 2019-02-11T06:41:46-05:00 | Steve Hilton: The Democratic racism scandal in... | /opinion/steve-hilton-the-democratic-racism-sc... |
| 8 | OPINION | /category/opinion | Hours after Massachusetts Democrat Sen. Elizab... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-11T04:00:38-05:00 | 2019-02-11T04:00:38-05:00 | 2020 Election: THIS is the Democrat Trump woul... | /opinion/2020-election-this-is-the-democrat-tr... |
| 9 | OPINION | /category/opinion | If I were advising our Houses of Congress on h... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-10T23:18:53-05:00 | 2019-02-10T23:18:53-05:00 | I'm a corporate coach -- Here's how we can get... | /opinion/im-a-corporate-coach-heres-how-we-can... |


This does not return 100 results, but it does return 30, which appears to be the internal maximum.  Collecting a larger corpus of article metadata will need to be done in batches of 30. 
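Collecting in batches of 30 amounts to stepping the offset by 30 until the API runs dry. The minimal sketch below assumes that `offset` counts results to skip and that an empty list signals the end. `fetch_batch` is a hypothetical callable, so the loop can be tried offline with a stand-in. New articles can push older ones down the list while you page, so consecutive batches may overlap, and deduplicating on `url` afterward is a sensible precaution.

```python
import time

def harvest(fetch_batch, batch_size=30, max_offset=10_000, pause=1.0):
    """Collect records batch by batch until an empty batch comes back.

    fetch_batch(offset, size) is a hypothetical callable returning a list of
    records, for example:
        lambda offset, size: requests.get(
            url.replace('size=11&offset=30', f'size={size}&offset={offset}')).json()
    """
    records = []
    for offset in range(0, max_offset, batch_size):
        batch = fetch_batch(offset, batch_size)
        if not batch:        # an empty list means we have run past the end
            break
        records.extend(batch)
        time.sleep(pause)    # space out requests to go easy on the server
    return records

# Offline demo with a stand-in that pretends only 95 records exist.
def fake_fetch(offset, size):
    return list(range(offset, min(offset + size, 95)))

print(len(harvest(fake_fetch, pause=0)))
```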

The next question is how far back the API can go. Here, the prime suspect is the `offset` parameter. An `offset` of 30 usually means skipping the first 30 results, so an `offset` of 30 with a `size` of 10 is likely to return results 30-39 (counting from 0).

As a first pass, I set the value of `offset` to 0 to get the most recent results.

```python
url_off = url.replace('offset=30', 'offset=0')

fox_df(url_off)
```

```
Return a dataframe of length 11
```

|   | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Faith & Values | /category/faith-values | What is love? You won't find the best answer o... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-13T14:58:15-05:00 | 2019-02-13T14:58:15-05:00 | On Valentine's Day, 24 things that love is | /opinion/paul-tripp-on-valentines-day-24-thing... |
| 1 | OPINION | /category/opinion | Rep. Ilhan Omar, D-Minn., has come under justi... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-13T13:30:50-05:00 | 2019-02-13T13:30:50-05:00 | Leslie Marshall: Rep. Omar's anti-Semitic twee... | /opinion/rep-omars-anti-semitic-tweet-hurt-dem... |
| 2 | OPINION | /category/opinion | California Gov. Gavin Newsom, under budgetary ... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-13T11:05:31-05:00 | 2019-02-13T11:05:31-05:00 | California’s Gavin Newsom throws Green New Dea... | /opinion/californias-gavin-newsom-throws-green... |
| 3 | OPINION | /category/opinion | Whew, that was a close call. But now that Demo... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-13T10:43:45-05:00 | 2019-02-13T10:43:45-05:00 | Michael Goodwin: Can the Democrats ever get ov... | /opinion/michael-goodwin-can-the-democrats-eve... |
| 4 | OPINION | /category/opinion | To some of us, Feb. 14 is Valentine's Day. To ... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-13T09:03:37-05:00 | 2019-02-13T09:03:37-05:00 | Valentine’s Day chocolate – here are some not ... | /opinion/valentines-day-chocolate-here-are-som... |
| 5 | OPINION | /category/opinion | A bipartisan congressional committee has agree... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-13T08:45:38-05:00 | 2019-02-13T08:45:38-05:00 | Sean Hannity: How Trump can take Congress' gar... | /opinion/sean-hannity-how-trump-can-take-congr... |
| 6 | Laura Ingraham's Monologue | /category/shows/ingraham-angle/transcript/laur... | I hate to tell you I told you so, but I told y... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-13T08:08:14-05:00 | 2019-02-13T08:08:14-05:00 | Laura Ingraham: The border wall becomes a bord... | /opinion/laura-ingraham-the-border-wall-become... |
| 7 | Tucker Carlson's Monologue | /category/shows/tucker-carlson-tonight/transcr... | We have news for you, breaking news, that for ... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-13T07:44:14-05:00 | 2019-02-13T07:44:14-05:00 | Tucker Carlson: There was no Russian collusion... | /opinion/tucker-carlson-there-was-no-russian-c... |
| 8 | OPINION | /category/opinion | Trump can't win a shutdown fight today. But he... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-13T04:00:50-05:00 | 2019-02-13T04:00:50-05:00 | Marc Thiessen: Here's how Trump can get the re... | /opinion/marc-thiessen-heres-how-trump-can-get... |
| 9 | OPINION | /category/opinion | The American Dream exists because of capitalis... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-13T04:00:49-05:00 | 2019-02-13T04:00:49-05:00 | Andy Puzder: Socialism vs. Capitalism – I want... | /opinion/andy-puzder-socialism-vs-capitalism-i... |


The `lastPublishedDate` value of the first row is 2019-02-13. Hopefully, larger offsets will be associated with articles published earlier.

```python
url_off = url.replace('offset=30', 'offset=100')

fox_df(url_off)
```

```
Return a dataframe of length 11
```

|   | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OPINION | /category/opinion | What is the legal status of a baby who survive... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-07T03:45:36-05:00 | 2019-02-07T03:45:36-05:00 | Judge Andrew Napolitano: Roe's little-known co... | /opinion/judge-andrew-napolitano-roes-little-k... |
| 1 | OPINION | /category/opinion | The State of the Union speech Tuesday was a go... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T19:58:43-05:00 | 2019-02-06T19:58:43-05:00 | Gutfeld on last night’s address | /opinion/gutfeld-on-last-nights-address |
| 2 | OPINION | /category/opinion | “Our brave troops have now been fighting in th... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:22:13-05:00 | 2019-02-06T15:22:13-05:00 | Trump's right about Afghanistan and the Middle... | /opinion/trumps-right-about-afghanistan-and-th... |
| 3 | OPINION | /category/opinion | In Tuesday’s State of the Union address, Presi... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:07:46-05:00 | 2019-02-06T15:07:46-05:00 | Trump announces another North Korea nuke summi... | /opinion/trump-announces-another-north-korea-n... |
| 4 | OPINION | /category/opinion | Every once in a while a speech is so effective... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T14:43:46-05:00 | 2019-02-06T14:43:46-05:00 | Newt Gingrich: Trump's State of the Union chan... | /opinion/newt-gingrich-president-trumps-state-... |
| 5 | OPINION | /category/opinion | I was honored to be among those Americans sele... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T13:52:29-05:00 | 2019-02-06T13:52:29-05:00 | Todd Starnes: I had a front-row seat at Trump'... | /opinion/todd-starnes-i-had-a-front-row-seat-a... |
| 6 | OPINION | /category/opinion | Democrats on the House Judiciary Committee hel... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T12:47:55-05:00 | 2019-02-06T12:47:55-05:00 | Rep. Steve Scalise: Democrats don't want you t... | /opinion/rep-steve-scalise-democrats-dont-want... |
| 7 | OPINION | /category/opinion | President Donald Trump used his State of the U... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T11:10:47-05:00 | 2019-02-06T11:10:47-05:00 | Trump laid out an infrastructure strategy for ... | /opinion/trump-laid-out-an-infrastructure-stra... |
| 8 | OPINION | /category/opinion | Wednesday is the 108th anniversary of the birt... |  | https://a57.foxnews.com/a57.foxnews.com/media2... | False | False | 2019-02-06T10:26:29-05:00 | 2019-02-06T10:26:29-05:00 | Ronald Reagan, born exactly 108 years ago, was... | /opinion/ronald-reagan-born-exactly-108-years-... |
| 9 | OPINION | /category/opinion | In his State of the Union speech Tuesday night... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T10:20:12-05:00 | 2019-02-06T10:20:12-05:00 | Andy Puzder: Trump's economy has taken us from... | /opinion/andy-puzder-trumps-economy-has-taken-... |


The first article at an offset of 100 was published on 2019-02-07, roughly a week before the first article at offset 0. Perfect! Increasing the offset yields additional article metadata in reverse chronological order.
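Rather than eyeballing dates, the ordering can be checked in code. The sketch below parses `lastPublishedDate` values copied from the two tables above with `datetime.fromisoformat`, which accepts the `-05:00` offsets in Python 3.7 and later. It measures the gap between the two first rows and confirms that the offset-100 rows never get newer going down the list. `is_reverse_chronological` is a hypothetical helper name.

```python
from datetime import datetime

def is_reverse_chronological(timestamps):
    """True if each ISO 8601 timestamp is no later than the one before it."""
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    return all(prev >= cur for prev, cur in zip(parsed, parsed[1:]))

# First-row lastPublishedDate values from the offset=0 and offset=100 calls.
top_at_0 = datetime.fromisoformat("2019-02-13T14:58:15-05:00")
top_at_100 = datetime.fromisoformat("2019-02-07T03:45:36-05:00")
print(top_at_0 - top_at_100)   # gap between the newest article in each batch

# lastPublishedDate for the first five rows of the offset=100 call.
offset_100 = [
    "2019-02-07T03:45:36-05:00",
    "2019-02-06T19:58:43-05:00",
    "2019-02-06T15:22:13-05:00",
    "2019-02-06T15:07:46-05:00",
    "2019-02-06T14:43:46-05:00",
]
print(is_reverse_chronological(offset_100))
```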

Next, how far back can we go?

```python
url_off = url.replace('offset=30', 'offset=1000')

fox_df(url_off)
```

```
Return a dataframe of length 11
```

|   | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OPINION | /category/opinion | A reported tentative deal between the U.S. and... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-24T21:26:10-05:00 | 2018-11-24T21:26:10-05:00 | Reported tentative migrant asylum deal with Me... | /opinion/reported-tentative-migrant-asylum-dea... |
| 1 | OPINION | /category/opinion | Like a basketball player who mistakenly shoots... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-24T17:25:46-05:00 | 2018-11-24T17:25:46-05:00 | Trump is right about biased judges; Schumer ac... | /opinion/trump-is-right-about-biased-judges-sc... |
| 2 | OPINION | /category/opinion | Former Secretary of States Condoleezza Rice wa... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-24T12:25:33-05:00 | 2018-11-24T12:25:33-05:00 | Condoleezza Rice is right – NFL needs to give ... | /opinion/condoleezza-rice-is-right-nfl-needs-t... |
| 3 | OPINION | /category/opinion | What a great week to be thankful for small bus... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-24T04:00:44-05:00 | 2018-11-24T04:00:44-05:00 | This Small Business Saturday, let’s celebrate ... | /opinion/this-small-business-saturday-lets-cel... |
| 4 | OPINION | /category/opinion | In one of the most provocative and misundersto... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-24T04:00:36-05:00 | 2018-11-24T04:00:36-05:00 | Did a mysterious extinction event precede Adam... | /opinion/did-a-mysterious-extinction-event-pre... |
| 5 | OPINION | /category/opinion | Serving as Chairman of the House Committee on ... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-24T04:00:29-05:00 | 2018-11-24T04:00:29-05:00 | Rep. Steve Chabot: This Small Business Saturda... | /opinion/rep-steve-chabot-this-small-business-... |
| 6 | Illegal Immigrants | /category/us/immigration/illegal-immigrants | President Trump was right to order troops to o... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-24T04:00:04-05:00 | 2018-11-24T04:00:04-05:00 | Trump was right to send troops to our border –... | /opinion/trump-was-right-to-send-troops-to-our... |
| 7 | OPINION | /category/opinion | Many journalists argue that there is no such t... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-24T00:00:13-05:00 | 2018-11-24T00:00:13-05:00 | Liberal media tell us the left is right – And ... | /opinion/liberal-media-tell-us-the-left-is-rig... |
| 8 | OPINION | /category/opinion | The final U.S. Senate race of 2018 is in Missi... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-23T20:42:27-05:00 | 2018-11-23T20:42:27-05:00 | John Fund: Mississippi US Senate race features... | /opinion/john-fund-mississippi-u-s-senate-race... |
| 9 | OPINION | /category/opinion | For someone trying to demonstrate that the jud... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-23T15:17:05-05:00 | 2018-11-23T15:17:05-05:00 | Marc Thiessen: Chief Justice Roberts is wrong.... | /opinion/marc-thiessen-chief-justice-roberts-i... |


An offset of 1,000 takes us back almost three months, to late November 2018.

```python
url_off = url.replace('offset=30', 'offset=5000')

fox_df(url_off)
```

```
Return a dataframe of length 11
```

|   | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|--:|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POLITICS | /category/politics | Many of our country’s most cherished instituti... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-26T03:40:20-04:00 | 2017-08-12T07:00:00-04:00 | Conservatives and moderates: It's time to stop... | /opinion/conservatives-and-moderates-its-time-... |
| 1 | CRIME | /category/us/crime | The orientation for the new people takes place... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-26T03:40:10-04:00 | 2017-08-12T09:30:00-04:00 | Becoming a prisoner in the federal prison syst... | /opinion/becoming-a-prisoner-in-the-federal-pr... |
| 2 | MENTAL HEALTH | /category/health/mental-health | Loneliness now eclipses obesity as a cause of ... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-26T03:39:59-04:00 | 2017-08-12T06:30:00-04:00 | Dr. Keith Ablow: Loneliness is now more deadly... | /opinion/dr-keith-ablow-loneliness-is-now-more... |
| 3 | White House | /category/politics/executive/white-house | The president ripped into the violent MS-13 ga... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-26T03:39:48-04:00 | 2017-08-12T07:15:00-04:00 | Trump, the police and the spin | /opinion/trump-the-police-and-the-spin |
| 4 | POLITICS | /category/politics | God is not a racist. Torch-wielding white nati... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-26T03:39:39-04:00 | 2017-08-12T11:15:00-04:00 | White Christian conservatives should oppose pr... | /opinion/white-christian-conservatives-should-... |
| 5 | White House | /category/politics/executive/white-house | President Trump’s declaration Thursday that th... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-26T03:39:28-04:00 | 2017-08-12T12:00:00-04:00 | President Trump and other leaders must work to... | /opinion/president-trump-and-other-leaders-mus... |
| 6 | Iran | /category/world/conflicts/iran | Iran’s nuclear weapons activities have continu... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-26T03:16:53-04:00 | 2017-08-13T15:30:00-04:00 | New sanctions on Iran, now it's time for a new... | /opinion/new-sanctions-on-iran-now-its-time-fo... |
| 7 | National Security | /category/politics/executive/national-security | As the administration and the American public ... |  | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-26T03:16:43-04:00 | 2017-08-13T14:30:00-04:00 | Dealing with global hot spots in times of dome... | /opinion/dealing-with-global-hot-spots-in-time... |
| 8 | Faith | /category/us/personal-freedoms/faith | When sabers begin to rattle around the world, ... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-26T03:16:33-04:00 | 2017-08-13T13:45:00-04:00 | Responding to North Korea -- When sabers rattl... | /opinion/responding-to-north-korea-when-sabers... |
| 9 | EDUCATION | /category/health/education | Over three million students were suspended for... |  | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-26T02:33:44-04:00 | 2017-08-14T05:00:00-04:00 | Millions of kids will get suspended this year.... | /opinion/millions-of-kids-will-get-suspended-t... |


An offset of 5,000 brings us back to 2017. 

In [18]:
url_off = url.replace('offset=30','offset=10000')

fox_df(url_off)


Return a dataframe of length 0


But an offset of 10,000 returns nothing. 😞

After trying several values, it appears that an offset of around 9,950 is the largest that still returns results, which covers approximately three years of data. This limit could be hard-coded into the API, or it could simply be all the data that is available on the website. 

In [19]:
url_off = url.replace('offset=30','offset=9950')

fox_df(url_off)


Return a dataframe of length 11


Unnamed: 0,category.name,category.url,description,duration,imageUrl,isBreaking,isLive,lastPublishedDate,publicationDate,title,url
0,Family,/category/us/personal-freedoms/family,"Along with millions of viewers, I was moved to...",,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2016-03-10T16:08:41-05:00,2016-03-10T15:52:00-05:00,What Kelly Clarkson can teach us about fathers...,/opinion/what-kelly-clarkson-can-teach-us-abou...
1,Presidential,/category/politics/elections/presidential,Donald Trump could bring a welcome pragmatism ...,,https://a57.foxnews.com/media2.foxnews.com/Bri...,False,False,2016-03-10T14:28:29-05:00,2016-03-10T14:20:00-05:00,Why Donald Trump's foreign policy makes sense,/opinion/why-donald-trumps-foreign-policy-make...
2,Democrats,/category/politics/elections/democrats,What if Hillary Clinton is in legal hot water ...,,https://a57.foxnews.com/a57.foxnews.com/media2...,False,False,2016-03-10T13:27:38-05:00,2016-03-10T00:00:00-05:00,The big question about Hillary Clinton: What i...,/opinion/the-big-question-about-hillary-clinto...
3,Democrats,/category/politics/elections/democrats,"The left talks a lot about poverty, but when i...",,https://a57.foxnews.com/media2.foxnews.com/Bri...,False,False,2016-03-10T12:43:11-05:00,2016-03-10T12:35:00-05:00,"Bernie Sanders, white poverty and me",/opinion/bernie-sanders-white-poverty-and-me
4,Persecutions,/category/world/religion/persecutions,Dying Christians in the Middle East don’t need...,,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2016-03-10T12:28:39-05:00,2016-03-10T12:09:00-05:00,"Mr. Obama, words matter. Call the massacre of ...",/opinion/mr-obama-words-matter-call-the-massac...
5,Gretchen's Take,/category/columns/gretchens-take,Will Rubio or Kasich fall on the sword to help...,,https://a57.foxnews.com/media2.foxnews.com/Bri...,False,False,2016-03-10T10:12:15-05:00,2016-03-09T10:05:00-05:00,Gretchen's Take: Politics change on a dime,/opinion/gretchens-take-politics-change-on-a-dime
6,Afghanistan,/category/world/conflicts/afghanistan,Former First Lady Laura Bush tells Dana Perino...,,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2016-03-10T07:00:10-05:00,2016-03-10T07:00:00-05:00,Former First Lady Laura Bush talks with Dana P...,/opinion/former-first-lady-laura-bush-talks-wi...
7,OPINION,/category/opinion,Maybe it’s time to admit that Donadl Trump is ...,,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2016-03-09T23:09:12-05:00,2016-03-10T01:00:00-05:00,"Thank you, America, for playing 'Celebrity App...",/opinion/thank-you-america-for-playing-celebri...
8,Todd Starnes,/category/columns/todds-american-dispatch,There’s a witch hunt underway for conservative...,,https://a57.foxnews.com/a57.foxnews.com/media2...,False,False,2016-03-09T16:15:22-05:00,2016-03-09T15:52:00-05:00,Student senator faces impeachment for conserva...,/opinion/student-senator-faces-impeachment-for...
9,Threats,/category/politics/defense/threats,FBI Director James Comey testified before Cong...,,https://a57.foxnews.com/media2.foxnews.com/Bri...,False,False,2016-03-09T13:23:42-05:00,2016-03-09T12:53:00-05:00,Chairman McCaul: The terrorist exodus has begu...,/opinion/chairman-mccaul-the-terrorist-exodus-...
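
Rather than trying offsets by hand, the cutoff can be located with a binary search, which needs fewer than 20 requests to search the range from 0 to 20,000. The sketch below assumes the API returns results for every offset up to some cutoff and none beyond it; `find_max_offset` and its `has_results` argument are hypothetical helpers, not part of any library.

```python
from time import sleep


def find_max_offset(has_results, low=0, high=20000, pause=0):
    """Return the largest offset in [low, high] for which has_results(offset) is True.

    Assumes has_results is monotonic: True for every offset up to some
    cutoff and False for every offset after it.
    """
    if not has_results(low):
        raise ValueError('no results even at offset %s' % low)
    if has_results(high):
        return high  # the cutoff lies beyond the search range

    # Invariant: has_results(low) is True and has_results(high) is False
    while high - low > 1:
        mid = (low + high) // 2
        if has_results(mid):
            low = mid
        else:
            high = mid
        sleep(pause)  # optional pause between requests
    return low


# Usage with the fox_df function and url defined above (makes live requests):
# find_max_offset(
#     lambda off: len(fox_df(url.replace('offset=30', 'offset=%s' % off))) > 0,
#     pause=3)
```

Each probe is one API call, so halving the range each time keeps the number of requests small compared with stepping through offsets one page at a time.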


The final step is to make all the API calls needed to gather the article metadata. I slightly revise the `fox_df` function so that it takes an offset value. I hardcoded the URL, fixing the size at 30. A more robust function would store the full set of parameters in a dictionary that could be modified. I also removed the print statement, which would clutter the results. 

In [20]:
def fox_df(offset):
    # Search URL for opinion articles, with the page size fixed at 30
    url = ('https://www.foxnews.com/api/article-search?'
           'isCategory=true&isTag=false&isKeyword=false&'
           'isFixed=false&isFeedUrl=false&searchSelected=opinion&'
           'contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&'
           'size=30&offset=0')
    
    # Swap in the requested offset
    url = url.replace('offset=0', 'offset=%s' % offset)
    
    # Request the page and flatten the JSON response into a dataframe
    r = requests.get(url)
    df = json_normalize(r.json())
    return df
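
The more robust version, with the parameters kept in a modifiable dictionary, could look something like the sketch below. It uses only the standard library to build the URL; `DEFAULT_PARAMS` and `search_url` are hypothetical names, and the values are simply the ones in the URL above. Passing `safe=':,'` to `urlencode` leaves colons and commas unescaped, so the default call reproduces that URL exactly.

```python
import json
from urllib.parse import urlencode

BASE_URL = 'https://www.foxnews.com/api/article-search'

# Query parameters taken from the search URL above; edit or override as needed
DEFAULT_PARAMS = {
    'isCategory': 'true',
    'isTag': 'false',
    'isKeyword': 'false',
    'isFixed': 'false',
    'isFeedUrl': 'false',
    'searchSelected': 'opinion',
    # contentTypes is itself a small JSON object sent as a string
    'contentTypes': json.dumps(
        {'interactive': True, 'slideshow': True, 'video': False, 'article': True},
        separators=(',', ':')),
    'size': 30,
    'offset': 0,
}


def search_url(**overrides):
    """Build a search URL from DEFAULT_PARAMS, replacing any keyword overrides."""
    params = {**DEFAULT_PARAMS, **overrides}
    # safe=':,' keeps colons and commas literal, matching the site's own URL
    return BASE_URL + '?' + urlencode(params, safe=':,')
```

For example, `requests.get(search_url(offset=60))` would request the third page of 30 results. Overriding anything other than `size` and `offset` is untested here, since it is not known which other values the API accepts.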


Finally, I loop over the function, collecting each page of results in a list and combining the pages into a single dataframe at the end with `pd.concat`. Concatenating once is faster than growing a dataframe inside the loop, and it works in current pandas, which no longer has `DataFrame.append`. I pause for three seconds on each pass to avoid hitting the web server too often. The first time I ran this, I only retrieved a few pages of results to make sure everything worked.

In [21]:
from time import sleep # for pausing

# create an empty list to store each page of results
pages = []

# Create a loop that counts up by 30, the page size.
for offset in range(0, 1000, 30):
    new_df = fox_df(offset)
    
    # Add the new results to the list of pages
    pages.append(new_df)
    
    # Pause for three seconds to be polite to the web server
    sleep(3)

# Combine all the pages into one dataframe with a fresh index
fox_opinion_df = pd.concat(pages, ignore_index=True)
    

In [22]:
fox_opinion_df.tail()

Unnamed: 0,category.name,category.url,description,duration,imageUrl,isBreaking,isLive,lastPublishedDate,publicationDate,title,url
1015,OPINION,/category/opinion,Congress and the president must find a way to ...,,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2018-11-23T04:00:26-05:00,2018-11-23T04:00:26-05:00,"Social Security, Medicare and Medicaid spendin...",/opinion/social-security-medicare-and-medicaid...
1016,OPINION,/category/opinion,It’s the quintessential irony of the American ...,,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2018-11-23T04:00:20-05:00,2018-11-23T04:00:20-05:00,This Black Friday beware of living with full c...,/opinion/this-black-friday-beware-of-living-wi...
1017,THE FIVE,/category/shows/the-five,Why do the media freak out whenever President ...,,https://a57.foxnews.com/media2.foxnews.com/Bri...,False,False,2018-11-22T12:44:01-05:00,2018-11-21T18:43:17-05:00,Gutfeld on Trump's response to Khashoggi,/opinion/gutfeld-on-trumps-response-to-khashoggi
1018,OPINION,/category/opinion,It has pleased Almighty God to prolong our nat...,,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2018-11-22T04:00:58-05:00,2018-11-22T04:00:58-05:00,Abraham Lincoln's Thanksgiving Day Proclamation,/opinion/abraham-lincolns-thanksgiving-day-pro...
1019,OPINION,/category/opinion,Thanksgiving for me is about the Big Fs – fami...,,https://a57.foxnews.com/static.foxnews.com/fox...,False,False,2018-11-22T04:00:47-05:00,2018-11-22T04:00:47-05:00,Five things I learned from Thanksgiving family...,/opinion/five-things-i-learned-from-thanksgivi...
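
The loop above stops at a fixed offset. To collect everything the API will return, one option is to keep requesting pages until one comes back empty, which, judging from the experiments earlier, should happen somewhere past an offset of 9,950. A minimal sketch, assuming `fetch_page` is a function like `fox_df` that takes an offset and returns a dataframe (or anything with a length); `fetch_all_pages` is a hypothetical helper.

```python
from time import sleep


def fetch_all_pages(fetch_page, size=30, pause=3, max_offset=None):
    """Call fetch_page(offset) for offsets 0, size, 2*size, ... and collect the results.

    Stops at the first empty page or, if max_offset is given, once the
    offset would exceed it. Returns a list of the non-empty pages.
    """
    pages = []
    offset = 0
    while max_offset is None or offset <= max_offset:
        page = fetch_page(offset)
        if len(page) == 0:
            break  # past the last available result
        pages.append(page)
        offset += size
        sleep(pause)  # be polite to the web server
    return pages


# Usage with fox_df (makes a few hundred live requests):
# fox_opinion_df = pd.concat(fetch_all_pages(fox_df), ignore_index=True)
```

With a three-second pause, reaching an offset near 9,950 means over 300 requests, or roughly a quarter of an hour; `max_offset` is useful for a short test run first.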


The dataframe can be stored as either a CSV or a JSON file. CSV is best if you want to use the data in other programs, while JSON is usually more robust to text strings that sometimes trip up CSV readers, such as ones containing commas, quotation marks, or line breaks. 

In [23]:
fox_opinion_df.to_csv('foxnews_opinion.csv')

In [24]:
fox_opinion_df.to_json('foxnews_opinion.json', orient ='records')
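
To check that saved files load back correctly, the sketch below round-trips a tiny hypothetical dataframe, whose text includes a comma, quotation marks, and a line break, through both formats. Because `to_csv` writes the index as an unnamed first column by default, `index_col=0` is needed when reading the CSV back (alternatively, save with `index=False`).

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for fox_opinion_df with awkward text
df = pd.DataFrame({
    'title': ['Trump, the police and the spin', 'She said "no"\nand left'],
    'url': ['/opinion/trump-the-police-and-the-spin', '/opinion/example'],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'example.csv')
    json_path = os.path.join(tmp, 'example.json')

    df.to_csv(csv_path)  # index is written as an unnamed first column
    df.to_json(json_path, orient='records')

    # index_col=0 turns that first column back into the index
    from_csv = pd.read_csv(csv_path, index_col=0)
    from_json = pd.read_json(json_path, orient='records')
```

pandas quotes such fields when writing, so both files round-trip here; other tools that read CSV files may be less careful.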

Things aren't always this easy, but when you stumble across an undocumented API you can quickly put together a dataset.