*Written by Gregory Palermo, 2018-06-29*

This notebook parses the RSS feeds listed in a dictionary of news sources and accumulates their recent headlines. (How far back each feed reaches is up to the publisher, and I'm not sure what that window is, to be honest.) The hope is that we can run this periodically to collect headlines and search them for detention-center-relevant content, providing a live feed.
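As a rough illustration of the searching step, here is a minimal sketch of a keyword filter over a list of headlines. The `KEYWORDS` list and the `filter_headlines` name are assumptions made for this example; the actual search terms have not been decided in this notebook.

```python
import re

# Hypothetical keyword list; the real search terms are not given in this notebook.
KEYWORDS = ["detention", "ICE", "immigration", "deport"]

def filter_headlines(headlines, keywords=KEYWORDS):
    """Return the headlines in which some word starts with one of the keywords."""
    # \b anchors each keyword to the start of a word, so "deport" also
    # matches "deportation", while "ICE" does not match "police" or "notice".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")",
        re.IGNORECASE,
    )
    return [h for h in headlines if pattern.search(h)]

sample = [
    "Coast-to-coast protests denounce Trump immigration policies",
    "Abolishing ICE becomes Dem litmus test",
    "Kylian Mbappé retired Javier Mascherano",
]
relevant = filter_headlines(sample)
```

Because the match is case-insensitive and prefix-based, a short term like "ICE" would also hit words such as "iced" or "Iceland", so the keyword list would likely need tuning against real headlines.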

We may also be able to do this for article summaries, provided that field is in use by the publications. (I'm not quite sure how to do that, since I'm not all that familiar with the `feedparser` package.)
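One possible approach, sketched under assumptions: `feedparser` entries behave like dictionaries and expose a `summary` key when the feed supplies one, so the field can be read with `.get()` and skipped when absent. The `collect_summaries` helper and the sample entries below are made up for illustration; plain dicts stand in for parsed entries so the sketch runs without a network connection.

```python
def collect_summaries(entries):
    """Return (title, summary) pairs, skipping entries that have no summary."""
    pairs = []
    for entry in entries:
        summary = entry.get("summary")
        if summary:
            pairs.append((entry.get("title", ""), summary))
    return pairs

# Stand-in for feedparser.parse(rss_url).entries (made-up data for illustration)
sample_entries = [
    {"title": "Headline A", "summary": "Short description of A."},
    {"title": "Headline B"},  # this feed item has no summary
]
pairs = collect_summaries(sample_entries)
```

With a real feed, the call would look something like `collect_summaries(feedparser.parse(rss_url).entries)`.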

Unfortunately, RSS feeds typically do not include the full text of the articles, which would have to be scraped separately.

In [6]:
import feedparser

# Function to fetch the rss feed and return the parsed RSS
def parseRSS(rss_url):
    return feedparser.parse(rss_url) 
    
# Function grabs the rss feed headlines (titles) and returns them as a list
def getHeadlines(rss_url):
    headlines = []
    
    feed = parseRSS(rss_url)
    for newsitem in feed['items']:
        headlines.append(newsitem['title'])
    
    return headlines

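# Function grabs the rss feed summaries and returns them as a list
def getSummaries(rss_url):
    summaries = []
    
    feed = parseRSS(rss_url)
    for newsitem in feed['items']:
        # Not every feed populates 'summary', so fall back to an empty string
        summaries.append(newsitem.get('summary', ''))
    
    return summaries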

allheadlines = []
 
# Dictionary of RSS feeds to fetch and combine (this is just a random sample, to test)
newsurls = {
    'apnews':           'http://hosted2.ap.org/atom/APDEFAULT/3d281c11a96b4ad082fe88aa0db04305',
    'googlenews':       'https://news.google.com/news/rss/?hl=en&ned=us&gl=US',
    'yahoonews':        'http://news.yahoo.com/rss/',
    'Chillicothe Gazette' : 'http://rssfeeds.chillicothegazette.com/chillicothe/news',
    'Cincinnati.com' : 'http://rssfeeds.cincinnati.com/cincinnati-news',
    'Coshocton Tribune' : 'http://rssfeeds.coshoctontribune.com/coshocton/news',
    'The News-Messenger, Fremont' : 'http://rssfeeds.thenews-messenger.com/fremont/news',
    'Lancaster Eagle-Gazette' : 'http://rssfeeds.lancastereaglegazette.com/lancaster/news',
    'Mansfield News Journal' : 'http://rssfeeds.mansfieldnewsjournal.com/mansfield/news',
    'The Marion Star' : 'http://rssfeeds.marionstar.com/marion/news',
    'The Advocate, Newark' : 'http://rssfeeds.newarkadvocate.com/newark/news',
    'News Herald, Port Clinton' : 'http://rssfeeds.portclintonnewsherald.com/portclinton/news',
    'Times Recorder, Zanesville' : 'http://rssfeeds.zanesvilletimesrecorder.com/zanesville/news',
    'Statesman Journal, Salem' : 'http://rssfeeds.statesmanjournal.com/salem/news',
    'Chambersburg Public Opinion' : 'http://rssfeeds.publicopiniononline.com/publicopiniononline/news',
    'Hanover Evening Sun' : 'http://rssfeeds.eveningsun.com/hanover/news',
    'Lebanon Daily News' : 'http://rssfeeds.ldnews.com/lebanon/news',
    'York Daily Record' : 'http://rssfeeds.ydr.com/ydr/home',
    'The Anderson Independent-Mail' : 'http://rssfeeds.independentmail.com/anderson/news',
    'GreenvilleOnline.com' : 'http://rssfeeds.greenvilleonline.com/greenville/news',
    'Argus Leader, Sioux Falls' : 'http://rssfeeds.argusleader.com/siouxfalls/news',
    'The Commercial Appeal (Memphis)' : 'http://rssfeeds.commercialappeal.com/memphis/news',
    'The Knoxville News-Sentinel' : 'http://rssfeeds.knoxnews.com/knoxville/news',
    'The Leaf-Chronicle, Clarksville' : 'http://rssfeeds.theleafchronicle.com/clarksville/news',
    'The Jackson Sun' : 'http://rssfeeds.jacksonsun.com/jacksontn/news',
    'The Daily News Journal, Murfreesboro' : 'http://rssfeeds.dnj.com/murfreesboro/news',
    'The Tennessean, Nashville' : 'http://rssfeeds.tennessean.com/nashville/news',
    'The Abilene Reporter-News' : 'http://rssfeeds.reporternews.com/abilene/news',
    'The Corpus Christi Caller Times' : 'http://rssfeeds.caller.com/corpuschristi/news/',
    'El Paso Times' : 'http://rssfeeds.elpasotimes.com/elpaso/news',
    'The San Angelo Standard-Times' : 'http://rssfeeds.gosanangelo.com/sanangelo/news',
    'Times Record News (Wichita Falls)' : 'http://rssfeeds.timesrecordnews.com/wichitafalls/news',
    'The Spectrum, St. George' : 'http://rssfeeds.thespectrum.com/stgeorge/news',
    'The Burlington Free Press' : 'http://rssfeeds.burlingtonfreepress.com/burlington/home',
    'The News Leader, Staunton' : 'http://rssfeeds.newsleader.com/staunton-news',
    'Kitsap Sun (Bremerton)' : 'http://rssfeeds.kitsapsun.com/kitsapnews',
    'The Post-Crescent, Appleton' : 'http://rssfeeds.postcrescent.com/appleton/news',
    'The Reporter, Fond du Lac' : 'http://rssfeeds.fdlreporter.com/fonddulac/news',
    'Green Bay Press-Gazette' : 'http://rssfeeds.greenbaypressgazette.com/greenbay/news',
    'Herald Times Reporter, Manitowoc' : 'http://rssfeeds.htrnews.com/manitowoc/news',
    'Marshfield News Herald' : 'http://rssfeeds.marshfieldnewsherald.com/marshfield/news',
    'Milwaukee Journal Sentinel' : 'http://rssfeeds.jsonline.com/milwaukee/news',
    'Oshkosh Northwestern' : 'http://rssfeeds.thenorthwestern.com/oshkosh/news',
    'The Sheboygan Press' : 'http://rssfeeds.sheboyganpress.com/sheboygan/news',
    'Stevens Point Journal' : 'http://rssfeeds.stevenspointjournal.com/stevenspoint/news',
    'Wausau Daily Herald' : 'http://rssfeeds.wausaudailyherald.com/wausau/news',
    'The Daily Tribune, Wisconsin Rapids' : 'http://rssfeeds.wisconsinrapidstribune.com/wisconsinrapids/news'    
}

# Iterate over the feed urls in the dictionary
for key,url in newsurls.items():
    # Call getHeadlines() and combine the returned headlines with allheadlines
    allheadlines.extend( getHeadlines( url ) )
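
Since the loop above pulls every feed in full and drops the source name, running it periodically would probably collect the same headlines again, and a syndicated story could show up under several feeds. Here is a minimal sketch of one way to merge runs while keeping the source. The `merge_headlines` helper and the sample data are hypothetical, not part of the notebook.

```python
def merge_headlines(seen, new_by_source):
    """Record unseen headlines in `seen` (headline -> source) and return the new ones."""
    added = []
    for source, headlines in new_by_source.items():
        for headline in headlines:
            if headline not in seen:
                seen[headline] = source
                added.append((source, headline))
    return added

# Made-up data standing in for two successive runs over the feeds
seen = {}
first_run = merge_headlines(seen, {"apnews": ["Story one", "Story two"]})
second_run = merge_headlines(seen, {"apnews": ["Story two", "Story three"],
                                    "yahoonews": ["Story one"]})
```

In the notebook, `new_by_source` could be built as `{key: getHeadlines(url) for key, url in newsurls.items()}`. Keeping `seen` between runs (for example, saved to a file) is left out of this sketch.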
    


In [8]:
allheadlines

['Coast-to-coast protests denounce Trump immigration policies',
 'Supreme Court Abortion Rights Threat Will Be Election Boost for Democrats: Poll',
 'Abolishing ICE becomes Dem litmus test',
 "LUPICA: Journalists aren't 'the enemy of the people' — guns are",
 "US envoy to Estonia says he has resigned, citing Trump's treatment of European allies",
 "Thieves emptied the bank account of America's oldest living veteran",
 'Trump Supporters not Welcome in Canada? Manager Fired After Refusing to Serve MAGA Cap Customer',
 'Kylian Mbappé retired Javier Mascherano',
 'Mom Who Accidentally Left Daughter In Hot Car To Die Begged Jail Guards To Let Her Kill Herself, Lawyer Says',
 'Trump says Saudi king agreed to raise oil output by up to 2 million barrels',
 'Neighbors who call police on 12-year-old mowing lawn increase his business, customer says',
 "Iran's leaders seek ways to defend economy from US sanctions",
 '$1 million nationwide warrant issued for Washington man sought in beheading death