## Undocumented APIs

Social scientists are often interested in collecting massive amounts of digital data from specific websites. If you are fortunate, someone else has already scraped it and released it on [GitHub](https://github.com) or as a [Kaggle dataset](https://www.kaggle.com/datasets). Sometimes the website provides an [API](https://en.wikipedia.org/wiki/Web_API), an interface for collecting the data systematically that is usually reasonably well documented. The most prominent of these for social scientists is probably [Twitter](https://developer.twitter.com/en/docs.html), which has enabled hundreds of academic studies by making information about posts and users available.

Other times, an API exists and is used by the website, but information about it is not publicly available. Most website searches, for example, rely on an internal web API: your search term, along with other relevant parameters, is used to extract the matching results from a database, and those results are then displayed on a web page. While we see the data as part of a web page, it is often transmitted in a different, more research-friendly format, usually [JSON](https://en.wikipedia.org/wiki/JSON). Using an undocumented API allows you to collect the data systematically without parsing the HTML of each page.
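
Here is a single search result in JSON form, abbreviated from the Fox News output shown later in this section. Python's standard `json` module turns it directly into lists and dictionaries, with no HTML parsing involved. This is a minimal sketch, and the record is trimmed to a few fields for readability.

```python
import json

# One record from the article-search response, trimmed to four fields.
body = '''[{"category": {"name": "OPINION", "url": "/category/opinion"},
            "title": "Gutfeld on last night’s address",
            "publicationDate": "2019-02-06T19:58:43-05:00",
            "url": "/opinion/gutfeld-on-last-nights-address"}]'''

articles = json.loads(body)  # a list of dicts, one per article
for article in articles:
    # Nested JSON objects become nested dicts, e.g. article['category']['name'].
    print(article['publicationDate'], article['category']['name'], article['title'])
```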

Below, I walk through the steps I recently used to gather Fox News Opinion articles. In this case, the API doesn't return the full text of the articles. As is often the case, though, it returns all the metadata about each article, including the article URL for subsequent scraping.

I use Python, but the overall logic is similar for other languages. 

I begin by enabling the developer tools in my browser. In Safari, this is under **Preferences->Advanced**, where I check **Show Develop Menu in Menu Bar**. The process is similar in Chrome.

I visited the front page.
![frontpage](images/frontpage.png)


I clicked on **Opinion**, the section I wanted to scrape.
![opinion](images/opinion.png)

I scrolled down the page and found a list of articles with a **Show More** button. This type of button, which loads the next set of results, is a key clue to how the webpage retrieves its data.
![more.png](images/more.png)


In this case, **Show More** does not load a new page; instead, it expands the number of results shown on the existing page. If you copy the link associated with this button (`https://www.foxnews.com/opinion#`) and paste it into a new browser window, it merely displays the first set of results, so that is a dead end for uncovering an API.

To check that a reasonable number of articles were available, I clicked the **Show More** button several times. Each time, it loaded more articles.
![more.png](images/more2.png)

Lots of data passes between my computer and the Fox News website with each click. You can inspect these exchanges through the **Develop** menu, using the **Show Page Resources** option.

![develop.png](images/develop.png)


This defaults to showing the page's HTML code.
![html.png](images/html.png)

To look for signs of an API, I click on the **Network** tab. Results are sometimes already listed, but I start clean by clicking the trash icon on the far right. With the **Network** tab visible, I then click **Show More** on the web page. Each resource exchanged between my computer and the various servers is now displayed. When the activity stops, I sort the list by size.
![network.png](images/network1.png)

I now review each item, looking for something that resembles the output of a search API: a plain-text file containing the search results.

The top of the list usually shows the images that the page retrieved, which typically have a .png or .jpg extension. Selecting the first item, a JPG with a name full of numbers, confirms that it is a picture.

![image.png](images/network_image.png)

The second item, **article-search**, has more promise. The **Preview** tab shows that it is a JSON object that appears to be a list of the articles returned after the **Show More** button was pressed. Bingo!

![image.png](images/network_search.png)


The **Header** tab reveals the specific URL that returned this JSON.
![image.png](images/network_header.png)


The URL has all the signs of an undocumented API. First, it contains "api" as part of its path. Second, it includes a series of search parameters, such as `isCategory` and `size`. I copy the URL and save it as a Python string, splitting it over several lines to make it easier to read.

```python
url = ('https://www.foxnews.com/api/article-search?'
       'isCategory=true&isTag=false&isKeyword=false&'
       'isFixed=false&isFeedUrl=false&searchSelected=opinion&'
       'contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&'
       'size=11&offset=30')
```
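
Before experimenting, it can help to decode the query string to see exactly which parameters the site sends. The sketch below uses only the standard library's `urllib.parse`. The `looks_like_api` check simply encodes the two signs mentioned above, so treat it as a rough heuristic rather than a rule.

```python
import json
from urllib.parse import urlsplit, parse_qs

# The URL copied from the Network tab (the same string as `url` above).
url = ('https://www.foxnews.com/api/article-search?'
       'isCategory=true&isTag=false&isKeyword=false&'
       'isFixed=false&isFeedUrl=false&searchSelected=opinion&'
       'contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&'
       'size=11&offset=30')

parts = urlsplit(url)
# parse_qs undoes the percent-encoding and maps each name to a list of values.
params = parse_qs(parts.query)

# The two signs from the text: "api" in the path and a set of search parameters.
looks_like_api = 'api' in parts.path.lower() and len(params) > 0

# Once decoded, contentTypes turns out to be a small JSON object itself.
content_types = json.loads(params['contentTypes'][0])

for name, values in params.items():
    print(f'{name:15} {values[0]}')
```

Laid out this way, it is easier to spot which parameters (`size`, `offset`) are worth manipulating.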

The next step is to see whether Python can access the API with a straightforward request. Some APIs check that the request originates from the website itself, while others do not, so finding the right way to access the API programmatically is a process of trial and error. I use the `requests` library.

```python
import requests

r = requests.get(url)
r.status_code
```

```
200
```

A status code of 200 (OK) means that the request succeeded. It does not guarantee, however, that the response contains the data we are after.
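
Here a plain request was enough. Some endpoints instead reject requests that do not look like they come from the website, for example with a 403 status. In that case, one thing to try is resending the headers your browser sent, which are listed in the **Header** tab. The sketch below assumes you paste those headers in as text. The header values shown are illustrative, not ones this endpoint is known to require, and `parse_header_block` is a hypothetical helper.

```python
def parse_header_block(text):
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in text.strip().splitlines():
        if ':' not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(':', 1)  # split at the first colon only
        if not name.strip():
            continue  # skip pseudo-headers such as ':authority: ...'
        headers[name.strip()] = value.strip()
    return headers

# Illustrative values only; copy the real ones from the Header tab.
copied = """
Accept: application/json
Referer: https://www.foxnews.com/opinion
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15
"""
headers = parse_header_block(copied)
# Then, for example: r = requests.get(url, headers=headers, timeout=30)
```

Splitting at the first colon only keeps values such as `https://...` intact.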

Since the response looked like a JSON object in the browser, I use the JSON decoder built into `requests`.

```python
r.json()
```

```
[{'category': {'name': 'OPINION', 'url': '/category/opinion'},
  'description': 'The State of the Union speech Tuesday was a good night for President Trump, a bad one for the media. In tone and tenor, Trump nailed much of it – like his uncompromising rejection of socialism.',
  'duration': '',
  'imageUrl': 'https://a57.foxnews.com/media2.foxnews.com/BrightCove/694940094001/2019/02/06/348/196/694940094001_5999360740001_5999357801001-vs.jpg?ve=1&tl=1',
  'isBreaking': False,
  'isLive': False,
  'lastPublishedDate': '2019-02-06T19:58:43-05:00',
  'publicationDate': '2019-02-06T19:58:43-05:00',
  'title': 'Gutfeld on last night’s address',
  'url': '/opinion/gutfeld-on-last-nights-address'},
 {'category': {'name': 'OPINION', 'url': '/category/opinion'},
  'description': '“Our brave troops have now been fighting in the Middle East for almost 19 years,” President Trump said in his State of the Union address Tuesday night. “Great nations do not fight endless wars.”',
  'duration': '',
  'im
```

A convenient way to turn this JSON into a usable format is with the `pandas` library.

```python
import pandas as pd

df = pd.DataFrame(r.json())

df.head()
```

| | category | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'name': 'OPINION', 'url': '/category/opinion'} | The State of the Union speech Tuesday was a go... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T19:58:43-05:00 | 2019-02-06T19:58:43-05:00 | Gutfeld on last night’s address | /opinion/gutfeld-on-last-nights-address |
| 1 | {'name': 'OPINION', 'url': '/category/opinion'} | “Our brave troops have now been fighting in th... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:22:13-05:00 | 2019-02-06T15:22:13-05:00 | Trump's right about Afghanistan and the Middle... | /opinion/trumps-right-about-afghanistan-and-th... |
| 2 | {'name': 'OPINION', 'url': '/category/opinion'} | In Tuesday’s State of the Union address, Presi... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:07:46-05:00 | 2019-02-06T15:07:46-05:00 | Trump announces another North Korea nuke summi... | /opinion/trump-announces-another-north-korea-n... |
| 3 | {'name': 'OPINION', 'url': '/category/opinion'} | Every once in a while a speech is so effective... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T14:43:46-05:00 | 2019-02-06T14:43:46-05:00 | Newt Gingrich: Trump's State of the Union chan... | /opinion/newt-gingrich-president-trumps-state-... |
| 4 | {'name': 'OPINION', 'url': '/category/opinion'} | I was honored to be among those Americans sele... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T13:52:29-05:00 | 2019-02-06T13:52:29-05:00 | Todd Starnes: I had a front-row seat at Trump'... | /opinion/todd-starnes-i-had-a-front-row-seat-a... |


That looks pretty good, but the JSON has some nested elements. For example, `category` contains a dictionary. Nested fields like these can often be flattened with `json_normalize`.

```python
# pandas >= 1.0 exposes json_normalize at the top level; the older
# `from pandas.io.json import json_normalize` path is deprecated.
from pandas import json_normalize

df = json_normalize(r.json())

df.head()
```

| | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OPINION | /category/opinion | The State of the Union speech Tuesday was a go... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T19:58:43-05:00 | 2019-02-06T19:58:43-05:00 | Gutfeld on last night’s address | /opinion/gutfeld-on-last-nights-address |
| 1 | OPINION | /category/opinion | “Our brave troops have now been fighting in th... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:22:13-05:00 | 2019-02-06T15:22:13-05:00 | Trump's right about Afghanistan and the Middle... | /opinion/trumps-right-about-afghanistan-and-th... |
| 2 | OPINION | /category/opinion | In Tuesday’s State of the Union address, Presi... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:07:46-05:00 | 2019-02-06T15:07:46-05:00 | Trump announces another North Korea nuke summi... | /opinion/trump-announces-another-north-korea-n... |
| 3 | OPINION | /category/opinion | Every once in a while a speech is so effective... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T14:43:46-05:00 | 2019-02-06T14:43:46-05:00 | Newt Gingrich: Trump's State of the Union chan... | /opinion/newt-gingrich-president-trumps-state-... |
| 4 | OPINION | /category/opinion | I was honored to be among those Americans sele... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T13:52:29-05:00 | 2019-02-06T13:52:29-05:00 | Todd Starnes: I had a front-row seat at Trump'... | /opinion/todd-starnes-i-had-a-front-row-seat-a... |
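
The dotted column names come from joining each nested key to its parent key. The pure-Python sketch below illustrates that idea. It is not pandas' implementation, and it lacks `json_normalize` options such as `record_path` or `max_level`.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into a single level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects such as `category`.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

record = {'category': {'name': 'OPINION', 'url': '/category/opinion'},
          'title': 'Gutfeld on last night’s address'}
print(flatten(record))
```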


Great! This confirms that Fox News has an undocumented API that can be accessed with Python. The dataset, however, only has a few cases.

```python
len(df)
```

```
11
```

The next step is to see whether more data can be collected. I usually focus on two questions: How do I get to the next set of results? Can I get more results with each call?

```python
print(url)
```

```
https://www.foxnews.com/api/article-search?isCategory=true&isTag=false&isKeyword=false&isFixed=false&isFeedUrl=false&searchSelected=opinion&contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&size=11&offset=30
```


Looking back at the URL, the likely suspects for manipulation are `size`, which usually determines the number of results, and `offset`, which usually means "start with the nth result". I first check how many results can be returned in one call. If the answer is 10,000, I don't need to do much more.

Since I'll be making many calls to the API, I write a quick function that takes the URL and returns a dataframe. Once I'm confident the approach works, I would likely write a more robust function that allows more direct manipulation of the parameters. I don't want to spend too much time on that, though, if the whole API turns out to be a dead end.
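
Here is a sketch of what that more robust function might look like. It is hypothetical: the `build_url` name and its defaults are illustrative, not part of the API. The parameters live in a dictionary and are encoded with the standard library's `urlencode`. Using `quote_via=quote` with `safe=':,'` reproduces the copied URL character for character. Whether the server would also accept other encodings of the same values is untested here.

```python
import json
from urllib.parse import urlencode, quote

BASE_URL = 'https://www.foxnews.com/api/article-search'

def build_url(offset=30, size=11, section='opinion'):
    """Hypothetical helper: assemble the article-search URL from parameters."""
    params = {
        'isCategory': 'true',
        'isTag': 'false',
        'isKeyword': 'false',
        'isFixed': 'false',
        'isFeedUrl': 'false',
        'searchSelected': section,
        # contentTypes is a JSON object; compact separators match the copied URL.
        'contentTypes': json.dumps(
            {'interactive': True, 'slideshow': True, 'video': False, 'article': True},
            separators=(',', ':')),
        'size': size,
        'offset': offset,
    }
    # quote (rather than the default quote_plus) with ':' and ',' left
    # unescaped mirrors the string copied from the browser.
    return BASE_URL + '?' + urlencode(params, safe=':,', quote_via=quote)
```

With this helper, `build_url(offset=0, size=30)` could stand in for the `url.replace(...)` calls used in the experiments.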

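```python
import requests
from pandas import json_normalize  # pandas >= 1.0

def fox_df(url):
    """Request one page of results from the API and return a flat dataframe."""
    r = requests.get(url)
    df = json_normalize(r.json())
    print('Return a dataframe of length', len(df))
    return df
```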

I confirm that the function works using the original url.

```python
fox_df(url)
```

```
Return a df of length 11
```

| | category | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'name': 'OPINION', 'url': '/category/opinion'} | The State of the Union speech Tuesday was a go... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T19:58:43-05:00 | 2019-02-06T19:58:43-05:00 | Gutfeld on last night’s address | /opinion/gutfeld-on-last-nights-address |
| 1 | {'name': 'OPINION', 'url': '/category/opinion'} | “Our brave troops have now been fighting in th... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:22:13-05:00 | 2019-02-06T15:22:13-05:00 | Trump's right about Afghanistan and the Middle... | /opinion/trumps-right-about-afghanistan-and-th... |
| 2 | {'name': 'OPINION', 'url': '/category/opinion'} | In Tuesday’s State of the Union address, Presi... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T15:07:46-05:00 | 2019-02-06T15:07:46-05:00 | Trump announces another North Korea nuke summi... | /opinion/trump-announces-another-north-korea-n... |
| 3 | {'name': 'OPINION', 'url': '/category/opinion'} | Every once in a while a speech is so effective... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T14:43:46-05:00 | 2019-02-06T14:43:46-05:00 | Newt Gingrich: Trump's State of the Union chan... | /opinion/newt-gingrich-president-trumps-state-... |
| 4 | {'name': 'OPINION', 'url': '/category/opinion'} | I was honored to be among those Americans sele... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T13:52:29-05:00 | 2019-02-06T13:52:29-05:00 | Todd Starnes: I had a front-row seat at Trump'... | /opinion/todd-starnes-i-had-a-front-row-seat-a... |
| 5 | {'name': 'OPINION', 'url': '/category/opinion'} | Democrats on the House Judiciary Committee hel... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T12:47:55-05:00 | 2019-02-06T12:47:55-05:00 | Rep. Steve Scalise: Democrats don't want you t... | /opinion/rep-steve-scalise-democrats-dont-want... |
| 6 | {'name': 'OPINION', 'url': '/category/opinion'} | President Donald Trump used his State of the U... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T11:10:47-05:00 | 2019-02-06T11:10:47-05:00 | Trump laid out an infrastructure strategy for ... | /opinion/trump-laid-out-an-infrastructure-stra... |
| 7 | {'name': 'OPINION', 'url': '/category/opinion'} | Wednesday is the 108th anniversary of the birt... | | https://a57.foxnews.com/a57.foxnews.com/media2... | False | False | 2019-02-06T10:26:29-05:00 | 2019-02-06T10:26:29-05:00 | Ronald Reagan, born exactly 108 years ago, was... | /opinion/ronald-reagan-born-exactly-108-years-... |
| 8 | {'name': 'OPINION', 'url': '/category/opinion'} | In his State of the Union speech Tuesday night... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T10:20:12-05:00 | 2019-02-06T10:20:12-05:00 | Andy Puzder: Trump's economy has taken us from... | /opinion/andy-puzder-trumps-economy-has-taken-... |
| 9 | {'name': 'OPINION', 'url': '/category/opinion'} | I’ve never seen a group of Washington lawmaker... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-06T09:02:25-05:00 | 2019-02-06T09:02:25-05:00 | Dems react to Trump's economic achievements wi... | /opinion/dems-react-to-trumps-economic-achieve... |


As a first attempt, I try to retrieve 100 results.

```python
url100 = url.replace('size=11', 'size=100')

fox_df(url100)
```

```
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

Requesting 100 results fails: the response cannot be decoded as JSON. Smaller values do work, and 30 appears to be the internal maximum, so a larger corpus of article metadata will have to be collected in batches of 30.
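
This failure surfaces as a `JSONDecodeError` rather than as an empty result, so an unguarded loop over many requests would stop at the first bad response. Below is a minimal sketch of a more forgiving decode step. It is a hypothetical helper built on the standard library `json` module; with `requests`, you would pass it `r.text`.

```python
import json

def decode_or_none(text):
    """Parse a response body as JSON, returning None if it is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # An empty body produces exactly the error seen above:
        # "Expecting value: line 1 column 1 (char 0)".
        return None
```

A loop can then log the offending offset and move on instead of crashing.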

The next question is how far back the API can go. Here, the prime suspect is the `offset` parameter. An offset of 30 usually means skipping the first 30 results, so an `offset` of 30 with a `size` of 10 is likely to return results 30 through 39 (counting from zero).

As a first pass, I set the value of `offset` to 0 to get the most recent results.

```python
url_off = url.replace('offset=30', 'offset=0')

fox_df(url_off)
```

```
Return a df of length 11
```

| | category | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'name': 'OPINION', 'url': '/category/opinion'} | In their thirst to bloody or defeat Kavanaugh ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T11:32:37-05:00 | 2019-02-08T11:32:37-05:00 | Lessons from Kavanaugh replacement hearing – D... | /opinion/lessons-from-kavanaugh-replacement-he... |
| 1 | {'name': 'Abortion', 'url': '/category/politic... | With late-term abortions – those performed aft... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T11:23:00-05:00 | 2019-02-08T11:23:00-05:00 | Dr. Kent Ingle: ‘Pro-abortion bills’ are inhum... | /opinion/dr-kent-ingle-pro-abortion-bills-acro... |
| 2 | {'name': 'OPINION', 'url': '/category/opinion'} | This plan is really a nifty piece of marketing... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T09:53:31-05:00 | 2019-02-08T09:53:31-05:00 | Mark Penn: Green New Deal has more in common w... | /opinion/mark-penn-green-new-deal-has-more-in-... |
| 3 | {'name': 'OPINION', 'url': '/category/opinion'} | As the 116th Congress begins, service has neve... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T07:47:51-05:00 | 2019-02-08T07:47:51-05:00 | Michael Knowles: Trump and those self-obsessed... | /opinion/michael-knowles-trump-and-those-self-... |
| 4 | {'name': 'OPINION', 'url': '/category/opinion'} | Alexandria Ocasio-Cortez has a megaphone that ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T07:14:01-05:00 | 2019-02-08T07:14:01-05:00 | Kimberley Strassel: Alexandria Ocasio-Cortez -... | /opinion/kimberley-strassel-alexandria-ocasio-... |
| 5 | {'name': 'Laura Ingraham's Monologue', 'url': ... | Laura Ingraham: We can let the PC Puritans sca... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T07:04:49-05:00 | 2019-02-08T07:04:49-05:00 | Laura Ingraham: We must be careful not to bow ... | /opinion/laura-ingraham-we-must-be-careful-not... |
| 6 | {'name': 'Tucker Carlson's Monologue', 'url': ... | Tucker Carlson: We shouldn't be surprised that... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T06:21:15-05:00 | 2019-02-08T06:21:15-05:00 | Tucker Carlson: Wearing blackface is fairly co... | /opinion/tucker-carlson-wearing-blackface-is-f... |
| 7 | {'name': 'OPINION', 'url': '/category/opinion'} | "Great nations do not fight endless wars," Pre... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-08T04:00:33-05:00 | 2019-02-08T04:00:33-05:00 | Marc Thiessen: Yes, it feels like there are 'e... | /opinion/marc-thiessen-yes-it-feels-like-there... |
| 8 | {'name': 'OPINION', 'url': '/category/opinion'} | Billionaires are not a problem. The rich creat... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-07T19:12:00-05:00 | 2019-02-07T19:12:00-05:00 | Gutfeld on abolishing billionaires | /opinion/gutfeld-on-abolishing-billionaires |
| 9 | {'name': 'OPINION', 'url': '/category/opinion'} | The 'Green New Deal' is a radical and impracti... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-07T16:22:52-05:00 | 2019-02-07T16:22:52-05:00 | Democrats' 'Green New Deal' is a Crazy New Dea... | /opinion/democrats-green-new-deal-is-a-crazy-n... |


The `lastPublishedDate` value of the first row is 2019-02-08. Ideally, a larger offset will return articles published earlier.

```python
url_off = url.replace('offset=30', 'offset=100')

fox_df(url_off)
```

```
Return a df of length 11
```

| | category | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'name': 'Government Shutdown', 'url': '/categ... | We can’t continue to find ourselves in complet... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-01T18:11:41-05:00 | 2019-02-01T18:11:41-05:00 | Rep. Rob Wittman: Shutdowns really ARE avoida... | /opinion/rep-rob-wittman-shutdowns-really-are-... |
| 1 | {'name': 'Abortion', 'url': '/category/politic... | Politicians advocating for late-term abortion ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-01T17:46:08-05:00 | 2019-02-01T17:46:08-05:00 | Dr. Manny Alvarez: Late term abortion in New Y... | /opinion/dr-manny-alvarez-late-term-abortion-i... |
| 2 | {'name': 'Fox Nation', 'url': '/category/fox-n... | Roger Stone was arrested in a pre-dawn raid at... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-01T17:31:27-05:00 | 2019-02-01T17:31:27-05:00 | Judge Andrew Napolitano: The Roger Stone arres... | /opinion/judge-andrew-napolitano-the-roger-sto... |
| 3 | {'name': 'OPINION', 'url': '/category/opinion'} | I am not willing to count out the 41-year-old ... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-01T16:52:00-05:00 | 2019-02-01T16:52:00-05:00 | Dr. Marc Siegel: Tom Brady's ability to put mi... | /opinion/dr-marc-siegel-tom-bradys-ability-to-... |
| 4 | {'name': 'OPINION', 'url': '/category/opinion'} | There is not a single reason to believe that i... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-01T12:32:04-05:00 | 2019-02-01T12:32:04-05:00 | Venezuela is the socialist-wasteland that Ocas... | /opinion/venezuela-is-the-socialist-wasteland-... |
| 5 | {'name': 'OPINION', 'url': '/category/opinion'} | Pro-life Americans have been staggered by the ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-01T11:41:38-05:00 | 2019-02-01T11:41:38-05:00 | Todd Starnes: Why the heck is Facebook blockin... | /opinion/todd-starnes-why-the-heck-is-facebook... |
| 6 | {'name': 'Sean Hannity's Monologue', 'url': '/... | As we watch the extreme radical left, it's rea... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-01T07:08:19-05:00 | 2019-02-01T07:08:19-05:00 | Sean Hannity: Radical leftist Dems want more g... | /opinion/sean-hannity-radical-leftist-dems-wan... |
| 7 | {'name': 'OPINION', 'url': '/category/opinion'} | Senate opposes Trump's plan to withdraw U.S. f... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-01T06:53:51-05:00 | 2019-02-01T04:00:56-05:00 | Trump shouldn't pull out of Syria right now – ... | /opinion/trump-shouldnt-pull-out-of-syria-righ... |
| 8 | {'name': 'Tucker Carlson's Monologue', 'url': ... | Polar vortex reminds us that energy matters - ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2019-02-01T06:40:06-05:00 | 2019-02-01T06:40:06-05:00 | Tucker Carlson: Dems pushing the Green New Dea... | /opinion/tucker-carlson-dems-pushing-the-green... |
| 9 | {'name': 'OPINION', 'url': '/category/opinion'} | For the players in Sunday’s game, success mean... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2019-02-01T04:00:45-05:00 | 2019-02-01T04:00:45-05:00 | Dear God, is it OK for me to pray that my team... | /opinion/dear-god-is-it-ok-for-me-to-pray-that... |


The first article at an offset of 100 was published on 2019-02-01, roughly a week before the first article at offset 0. Perfect! Increasing the offset yields metadata for progressively older articles, in reverse chronological order.
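
One caveat when reading these dates. In the outputs above, the rows appear to be ordered by `lastPublishedDate`, which can differ from `publicationDate` (see row 7 at offset 100). The timestamps also carry UTC offsets (`-05:00`, or `-04:00` for dates in daylight saving time), so comparing parsed datetimes is safer than comparing strings. The small check below assumes the column names shown above; `is_newest_first` is a hypothetical helper.

```python
from datetime import datetime

def is_newest_first(timestamps):
    """Return True if ISO 8601 timestamps are in non-increasing (newest-first) order."""
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    # Timezone-aware datetimes with different UTC offsets compare correctly.
    return all(current >= following for current, following in zip(parsed, parsed[1:]))

# lastPublishedDate values for rows 6-8 at offset 100
print(is_newest_first(['2019-02-01T07:08:19-05:00',
                       '2019-02-01T06:53:51-05:00',
                       '2019-02-01T06:40:06-05:00']))
```

Running the same check on a whole batch is a cheap way to confirm the ordering before relying on it.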

Next, how far back can we go?

```python
url_off = url.replace('offset=30', 'offset=1000')

fox_df(url_off)
```

```
Return a df of length 11
```

| | category | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'name': 'OPINION', 'url': '/category/opinion'} | It’s almost Thanksgiving, so we could give tha... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-17T04:00:58-05:00 | 2018-11-17T04:00:58-05:00 | Thanksgiving time: Anti-Trump media stuffing p... | /opinion/thanksgiving-time-anti-trump-media-st... |
| 1 | {'name': 'OPINION', 'url': '/category/opinion'} | Enactment of the First Step Act would represen... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-17T04:00:37-05:00 | 2018-11-17T04:00:37-05:00 | Prison reform is on the horizon -- Now the Sen... | /opinion/tim-head-prison-reform-is-on-the-hori... |
| 2 | {'name': 'OPINION', 'url': '/category/opinion'} | I hate Thanksgiving and I suspect that I’m not... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-17T04:00:31-05:00 | 2018-11-17T04:00:31-05:00 | Call me the Thanksgiving Grinch -- here are fi... | /opinion/call-me-the-thanksgiving-grinch-here-... |
| 3 | {'name': 'Faith', 'url': '/category/us/persona... | Nicholas was a man of God who worked tirelessl... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-17T04:00:26-05:00 | 2018-11-17T04:00:26-05:00 | Bill Bennett: The legacy of Saint Nicholas – W... | /opinion/bill-bennett-the-legacy-of-saint-nich... |
| 4 | {'name': 'OPINION', 'url': '/category/opinion'} | With Thanksgiving right around the corner, I m... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-17T04:00:26-05:00 | 2018-11-17T04:00:26-05:00 | Steve Doocy: This Thanksgiving, here's what NO... | /opinion/steve-doocy-this-thanksgiving-heres-w... |
| 5 | {'name': 'OPINION', 'url': '/category/opinion'} | Well, it’s official. Over the past 14 months, ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-17T04:00:24-05:00 | 2018-11-17T04:00:24-05:00 | Amazon names new HQ2 – here's how citizens in ... | /opinion/amazon-names-new-hq2-heres-how-citize... |
| 6 | {'name': 'OPINION', 'url': '/category/opinion'} | I felt sure there had to be some sort of awful... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-17T04:00:11-05:00 | 2018-11-17T04:00:11-05:00 | When God gives you more than you can handle | /opinion/when-god-gives-you-more-than-you-can-... |
| 7 | {'name': 'OPINION', 'url': '/category/opinion'} | A new North Korean weapons test could lead to ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-16T20:57:25-05:00 | 2018-11-16T20:57:25-05:00 | North Korean mystery weapons test creates poss... | /opinion/north-korean-mystery-weapons-test-cre... |
| 8 | {'name': 'OPINION', 'url': '/category/opinion'} | One of the key lessons of last week’s midterm ... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-16T15:57:01-05:00 | 2018-11-16T15:57:01-05:00 | Newt Gingrich: The left may be winning the war... | /opinion/newt-gingrich-the-left-may-be-winning... |
| 9 | {'name': 'OPINION', 'url': '/category/opinion'} | Right now, thousands of Central American migra... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-16T15:56:59-05:00 | 2018-11-16T15:56:59-05:00 | Sen. James Inhofe: A border wall can be built ... | /opinion/sen-james-inhofe-a-border-wall-can-be... |


An offset of 1,000 takes us back nearly three months, to mid-November 2018.

```python
url_off = url.replace('offset=30', 'offset=5000')

fox_df(url_off)
```

```
Return a df of length 11
```

| | category | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'name': 'POLITICS', 'url': '/category/politics'} | Americans have once again been subjected to a ... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-25T22:19:48-04:00 | 2017-08-19T13:30:00-04:00 | Newt Gingrich: History is more important than ... | /opinion/newt-gingrich-history-is-more-importa... |
| 1 | {'name': 'Bellwether', 'url': '/category/colum... | As Spain and Finland cope with this week’s vio... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-25T22:19:37-04:00 | 2017-08-19T16:44:00-04:00 | ISIS created by Israel? One imam says so | /opinion/isis-created-by-israel-one-imam-says-so |
| 2 | {'name': 'Solar Eclipse', 'url': '/category/sc... | The Moon is about to come between the Earth an... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-25T22:01:29-04:00 | 2017-08-20T00:00:00-04:00 | The eclipse of 2017 has nothing to do with pol... | /opinion/the-eclipse-of-2017-has-nothing-to-do... |
| 3 | {'name': 'Family', 'url': '/category/us/person... | Be perfectly patient, perfectly calm, perfectl... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-25T22:01:18-04:00 | 2017-08-20T05:00:00-04:00 | Here's how moms can find relief from pressure | /opinion/heres-how-moms-can-find-relief-from-p... |
| 4 | {'name': 'Solar Eclipse', 'url': '/category/sc... | On Monday something will happen in the U.S. th... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-25T22:01:07-04:00 | 2017-08-20T12:17:00-04:00 | Are solar eclipses proof of God? | /opinion/are-solar-eclipses-proof-of-god |
| 5 | {'name': 'Solar Eclipse', 'url': '/category/sc... | Sun worship is not new, in fact, the 14th cent... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-25T22:00:56-04:00 | 2017-08-20T12:38:00-04:00 | Dr. Marc Siegel: Solar eclipse - science, safe... | /opinion/dr-marc-siegel-solar-eclipse-science-... |
| 6 | {'name': 'Values', 'url': '/category/us/person... | Listening to this cultural debate, it occurred... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-25T22:00:45-04:00 | 2017-08-20T15:47:00-04:00 | If America isn't great, who is? | /opinion/if-america-isnt-great-who-is |
| 7 | {'name': 'Solar Eclipse', 'url': '/category/sc... | Perhaps best of all, the eclipse has nothing a... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-25T22:00:33-04:00 | 2017-08-20T16:08:00-04:00 | The eclipse of 2017 has nothing to do with pol... | /opinion/the-eclipse-of-2017-has-nothing-to-do... |
| 8 | {'name': 'Health Care', 'url': '/category/poli... | Few lawmakers are talking about the hidden hea... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2017-09-25T21:24:07-04:00 | 2017-08-21T10:00:00-04:00 | Another health care reform quandary -- What ab... | /opinion/another-health-care-reform-quandary-w... |
| 9 | {'name': 'POLITICS', 'url': '/category/politics'} | Expanding their attacks, many longtime critics... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2017-09-25T21:23:57-04:00 | 2017-08-21T01:00:00-04:00 | Trump is passing the ‘moral’ test when it come... | /opinion/trump-is-passing-the-moral-test-when-... |


An offset of 5,000 takes us back to 2017.

```python
url_off = url.replace('offset=30', 'offset=10000')

fox_df(url_off)
```

```
Return a df of length 0
```


But an offset of 10,000 returns nothing. 😞

After trying multiple values, it appears that an offset of around 9,950 is the maximum that still returns results, which corresponds to approximately three years of data. This limit could be hard-coded into the API, or it could simply be all the data available on the website.

In [49]:
url_off = url.replace('offset=30','offset=9950')

fox_df(url_off)


Return a dataframe of length 11


| | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CANCER | /category/health/cancer | This month marks the fifth anniversary of the ... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2016-02-29T13:17:15-05:00 | 2016-02-29T12:57:00-05:00 | Cancer took my husband at 50. That's why the 2... | /opinion/cancer-took-my-husband-at-50-thats-wh... |
| 1 | Republicans | /category/politics/elections/republicans | Say this for the dim bulbs of the GOP: They fi... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2016-02-29T12:31:55-05:00 | 2016-02-29T10:19:00-05:00 | It's the end of the GOP as we know it if Trump... | /opinion/its-the-end-of-the-gop-as-we-know-it-... |
| 2 | Values | /category/us/personal-freedoms/values | Have you seen the revelation from former NFL s... | | https://a57.foxnews.com/a57.foxnews.com/media2... | False | False | 2016-02-29T11:55:57-05:00 | 2016-02-26T12:15:00-05:00 | Terry Crews and the dirty little secret of Pla... | /opinion/terry-crews-and-the-dirty-little-secr... |
| 3 | Presidential Primaries | /category/politics/elections/presidential-prim... | I refuse to play by Washington’s political rul... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2016-02-29T05:18:36-05:00 | 2016-02-29T02:30:00-05:00 | Ben Carson: Why I intend to stay in the GOP pr... | /opinion/ben-carson-why-i-intend-to-stay-in-th... |
| 4 | Apple | /category/tech/companies/apple | Instead of lawyering up, Apple should mensch u... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2016-02-28T22:18:52-05:00 | 2016-02-25T14:39:00-05:00 | What Apple's Tim Cook should learn from Facebo... | /opinion/what-apples-tim-cook-should-learn-fro... |
| 5 | Middle East | /category/world/world-regions/middle-east | Anniversaries are when we look back and see hi... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2016-02-28T06:00:23-05:00 | 2016-02-28T06:00:00-05:00 | Why we won: Lessons from the Gulf War 25 years... | /opinion/why-we-won-lessons-from-the-gulf-war-... |
| 6 | Privacy | /category/tech/topics/privacy | Today the line between public and private pers... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2016-02-27T21:04:09-05:00 | 2016-02-27T18:00:00-05:00 | Will cancer patients be the next victims of th... | /opinion/will-cancer-patients-be-the-next-vict... |
| 7 | Presidential Primaries | /category/politics/elections/presidential-prim... | The inevitability of Hillary Clinton’s road to... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2016-02-27T20:15:55-05:00 | 2016-02-27T20:00:00-05:00 | Hillary Clinton's South Carolina victory means... | /opinion/hillary-clintons-south-carolina-victo... |
| 8 | Privacy | /category/tech/topics/privacy | Donald Trump, Marco Rubio, Ted Cruz, Ben Carso... | | https://a57.foxnews.com/a57.foxnews.com/media2... | False | False | 2016-02-26T17:39:33-05:00 | 2016-02-26T16:15:00-05:00 | Trump, Rubio, Cruz all want to repeal ObamaCar... | /opinion/trump-rubio-cruz-all-want-to-repeal-o... |
| 9 | Family | /category/us/personal-freedoms/family | I've now been married for 20 years. Standing o... | | https://a57.foxnews.com///static.foxnews.com/s... | False | False | 2016-02-26T15:23:16-05:00 | 2016-02-26T11:50:00-05:00 | Looking back after twenty years of marriage: W... | /opinion/looking-back-after-twenty-years-of-ma... |
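The dotted column names, such as `category.name` and `category.url`, come from `json_normalize`, which flattens nested JSON objects into separate columns. A minimal sketch with one made-up record (the values are placeholders, not real API output; only the field names mirror the table above):

```python
import pandas as pd

# A hypothetical record shaped like one item of the API response;
# the values are invented for illustration.
records = [
    {
        "title": "Example headline",
        "url": "/opinion/example-headline",
        "category": {"name": "OPINION", "url": "/category/opinion"},
    }
]

# The nested "category" object becomes "category.name" and "category.url"
df = pd.json_normalize(records)
print(df.columns.tolist())
```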


The final step is to make all the API calls needed to gather the metadata for every relevant article. I slightly revise the `fox_df` function so that it takes an offset as its argument. I hardcoded the rest of the URL, fixing the page size at 30; a more robust function would store the full set of parameters in a dictionary that could be modified. I also removed the print statement, which would clutter the output.

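```python
import requests
import pandas as pd


def fox_df(offset):
    """Return one page (30 results) of Fox News Opinion metadata starting at `offset`."""
    # Everything except the offset is hardcoded; size=30 fixes the page size
    url = ('https://www.foxnews.com/api/article-search?'
           'isCategory=true&isTag=false&isKeyword=false&'
           'isFixed=false&isFeedUrl=false&searchSelected=opinion&'
           'contentTypes=%7B%22interactive%22:true,%22slideshow%22:true,%22video%22:false,%22article%22:true%7D&'
           f'size=30&offset={offset}')

    r = requests.get(url)
    r.raise_for_status()  # stop early if the server returns an error
    # Flatten the list of JSON records into a dataframe
    df = pd.json_normalize(r.json())
    return df
```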
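The more robust approach mentioned above keeps the query parameters in a dictionary. A minimal sketch of that idea: `DEFAULT_PARAMS` and `build_search_url` are hypothetical names, the parameter names and values are copied from the hardcoded URL, and `contentTypes` is generated with `json.dumps` instead of being typed by hand. The helper only builds the URL (no request is sent), so it can be inspected before use. Note that `urlencode` also percent-encodes `:` and `,`; servers normally decode that identically, but that is an assumption worth checking against the live API.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = 'https://www.foxnews.com/api/article-search'

# Default query parameters, copied from the hardcoded URL in fox_df
DEFAULT_PARAMS = {
    'isCategory': 'true',
    'isTag': 'false',
    'isKeyword': 'false',
    'isFixed': 'false',
    'isFeedUrl': 'false',
    'searchSelected': 'opinion',
    # Same filter as the %7B%22interactive%22:true,... part of the original URL
    'contentTypes': json.dumps(
        {'interactive': True, 'slideshow': True, 'video': False, 'article': True},
        separators=(',', ':')),
    'size': 30,
    'offset': 0,
}


def build_search_url(**overrides):
    """Hypothetical helper: merge overrides into the defaults and return the full URL."""
    params = {**DEFAULT_PARAMS, **overrides}
    return BASE_URL + '?' + urlencode(params)


url = build_search_url(offset=60)

# Decode the query string to check which parameters will be sent
print(parse_qs(urlsplit(url).query)['offset'])
```

`requests.get` can also take such a dictionary directly through its `params` argument, e.g. `requests.get(BASE_URL, params={**DEFAULT_PARAMS, 'offset': 60})`.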


Finally, I call the function in a loop, increasing the offset by 30 on each pass to match the page size. Each page of results goes into a list, and the pages are combined into a single dataframe at the end. I pause three seconds between requests so as not to hit the web server too often. The first time I ran this, I retrieved only a few pages to make sure everything worked.

```python
from time import sleep  # for pausing between requests

import pandas as pd

# Collect each page of results here, then combine them once at the end
pages = []

# Offsets count up by 30, matching size=30 in fox_df
for offset in range(0, 1000, 30):
    pages.append(fox_df(offset))

    # Pause for three seconds to be polite to the web server
    sleep(3)

# Combine all pages into one dataframe with a fresh 0..n-1 index
fox_opinion_df = pd.concat(pages, ignore_index=True)
```
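The loop above stops at a fixed offset of 1,000. If you don't know in advance how many results exist, one alternative is to keep paging until the API returns an empty page. A minimal sketch, assuming (not verified against the Fox News API) that an offset past the last result returns an empty list; `collect_pages` and `fake_fetch` are hypothetical helpers, and the fetch function is passed in as an argument so the logic can be tried without network access:

```python
from time import sleep

import pandas as pd


def collect_pages(fetch, size=30, max_pages=100, pause=3):
    """Hypothetical helper: call fetch(offset) until it returns an empty dataframe.

    Assumes an offset past the last result yields an empty page; max_pages caps
    the number of requests in case that assumption does not hold.
    """
    pages = []
    for page in range(max_pages):
        df = fetch(page * size)
        if df.empty:
            break  # no more results
        pages.append(df)
        sleep(pause)  # be polite to the web server
    if not pages:
        return pd.DataFrame()
    return pd.concat(pages, ignore_index=True)


def fake_fetch(offset, total=75, size=30):
    """Offline stand-in for fox_df: pretends the API holds `total` results."""
    n = max(0, min(size, total - offset))
    return pd.DataFrame({'offset': [offset + i for i in range(n)]})


result = collect_pages(fake_fetch, pause=0)
print(len(result))  # 75
```

Against the real API, this would be called as `collect_pages(fox_df)`.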
    

```python
fox_opinion_df.tail()
```

| | category.name | category.url | description | duration | imageUrl | isBreaking | isLive | lastPublishedDate | publicationDate | title | url |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1015 | OPINION | /category/opinion | Republican Gov. Rick Scott leads Democratic Se... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-16T08:14:18-05:00 | 2018-11-16T08:14:18-05:00 | Marc Thiessen: Democrats in Florida know that ... | /opinion/marc-thiessen-democrats-in-florida-kn... |
| 1016 | Tucker Carlson Tonight | /category/shows/tucker-carlson-tonight | The race for Georgia governor was tight for mo... | | https://a57.foxnews.com/secure.media.foxnews.c... | False | False | 2018-11-16T07:54:18-05:00 | 2018-11-16T07:53:30-05:00 | Tucker Carlson: Accepting election outcomes yo... | /opinion/tucker-carlson-accepting-election-out... |
| 1017 | The Ingraham Angle | /category/shows/ingraham-angle | From Weinstein to Sharpton to Farrakhan and no... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-16T07:13:30-05:00 | 2018-11-16T07:13:30-05:00 | Laura Ingraham: Michael Avenatti's assault on ... | /opinion/laura-ingraham-michael-avenattis-assa... |
| 1018 | OPINION | /category/opinion | In Alcoholics Anonymous, we call Thanksgiving,... | | https://a57.foxnews.com/static.foxnews.com/fox... | False | False | 2018-11-16T04:00:23-05:00 | 2018-11-16T04:00:23-05:00 | How to navigate the 'Bermuda Triangle' of holi... | /opinion/avoiding-the-bermuda-triangle-of-holi... |
| 1019 | OPINION | /category/opinion | Michael Avenatti is accused of domestic violen... | | https://a57.foxnews.com/media2.foxnews.com/Bri... | False | False | 2018-11-15T22:07:56-05:00 | 2018-11-15T20:36:36-05:00 | Tammy Bruce: Avenatti demands fairness when he... | /opinion/tammy-bruce-avenatti-demands-fairness... |
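Before saving, a quick sanity check can be worthwhile. If the API orders results by recency (an assumption), an article published while the loop runs can push earlier results onto the next page, so the same article may be collected twice. A minimal sketch that drops repeated URLs and reports the date range covered; `check_results` is a hypothetical helper, it relies only on the `url` and `publicationDate` columns shown above, and the three-row dataframe is made up for illustration:

```python
import pandas as pd


def check_results(df):
    """Hypothetical helper: drop duplicate URLs and return (df, earliest, latest) dates."""
    deduped = df.drop_duplicates(subset='url').reset_index(drop=True)
    # Timestamps carry UTC offsets such as -05:00; utc=True puts them on one clock
    dates = pd.to_datetime(deduped['publicationDate'], utc=True)
    return deduped, dates.min(), dates.max()


# Made-up example with one repeated URL
sample = pd.DataFrame({
    'url': ['/opinion/a', '/opinion/b', '/opinion/a'],
    'publicationDate': ['2018-11-16T08:14:18-05:00',
                        '2018-11-15T20:36:36-05:00',
                        '2018-11-16T08:14:18-05:00'],
})

deduped, first, last = check_results(sample)
print(len(deduped), first, last)
```

On the scraped data this would be `fox_opinion_df, first, last = check_results(fox_opinion_df)`.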


The dataframe can be stored either as a CSV file or as JSON. CSV is best if you want to use the data in other software, while JSON is usually more robust at handling text strings, which sometimes trip up CSV readers.

```python
fox_opinion_df.to_csv('foxnews_opinion.csv')
fox_opinion_df.to_json('foxnews_opinion.json', orient='records')
```
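Reading the files back is a simple way to confirm they were saved correctly. By default `to_csv` also writes the dataframe's index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back with `pd.read_csv`; passing `index=False` when writing avoids this. A minimal round-trip sketch, using a small made-up dataframe (with a comma and quotation marks in the text) written to a temporary directory:

```python
import os
import tempfile

import pandas as pd

# Small made-up dataframe standing in for fox_opinion_df
df = pd.DataFrame({
    'title': ['Example, with a comma', 'Another "quoted" title'],
    'url': ['/opinion/a', '/opinion/b'],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'example.csv')
    json_path = os.path.join(tmp, 'example.json')

    # index=False keeps the row index out of the CSV, so no 'Unnamed: 0' column
    df.to_csv(csv_path, index=False)
    df.to_json(json_path, orient='records')

    from_csv = pd.read_csv(csv_path)
    from_json = pd.read_json(json_path, orient='records')

print(from_csv.columns.tolist(), from_json.columns.tolist())
```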

Things aren't always this easy, but when you stumble across an undocumented API you can quickly put together a dataset.