Many news analytics tools aggregate content from popular news outlets and highlight trending and important topics. As an example, check out [Infomous](https://www.infomous.com).

Let us build a mini news analytics tool. Using the CNN top news headlines from `9/3/2019` (stored in the string `article` below), answer the following:

- How many words does `article` contain?
- How many unique words?
- Sort these words by their frequency
- Some words have a high frequency only because they are common in English. `stop_words` contains these words; exclude them and report the results above again

In [2]:
article = """Hurricane Dorian leaves terrible damage and stranded residents in Bahamas
While residents evacuate, these men and women are flying straight into Hurricane Dorian
Here's what Hurricane Dorian is expected to do as it crawls toward the US
Bahamas' tourism could be devastated for a long time after Hurricane Dorian
Why Donald Trump golfing during Hurricane Dorian is a problem
Opinion: Donald Trump isn't up to the job
Quickly catch up on the day's news
Pentagon diverts $3.6 billion in military construction funds to build Trump's border wall
These were the passengers and crew members on board the California dive boat
What we know about the California dive boat and its last excursion
Search suspended in the deadly dive-boat fire off California coast
Opinion: The diving world tries to come to grips with devastating fire
Here's the real reason Mike Pence is staying at Donald Trump's hotel in Ireland
Want to live longer? You may want to ditch these drinks
Judge rules White House must give Playboy columnist Brian Karem his press pass back
Boris Johnson has just taken a huge gamble over Brexit
Brexit rebels seize Parliament's agenda in major blow to Boris Johnson
The UK could be heading for an early election. Here's what you need to know
What is a no-deal Brexit and what would it mean for Britain?
West Texas shooter bought gun in private sale
Teenage boy goes blind after existing on Pringles and french fries
A 14-year-old confessed to killing all five of his family members in an Alabama home, authorities say
Dorian, Comey and Debra Messing: What Trump tweeted on Labor Day weekend
Kroger asks customers not to openly carry guns in its stores
Walmart ends all handgun ammunition sales and asks customers not to carry guns into stores
Walmart CEO implores Congress to 'do their part' to stop gun violence
McConnell says he won't take up gun bill unless Trump says he will sign it
Police: Driver charged in abduction says child was sold for $10,000
Kristen Stewart says she was told not to hold hands with her girlfriend in public
Ariana Grande sues Forever 21 over ads featuring 'look-alike model'
Chicago mayor to Ted Cruz over city shootings: 'Keep our name out of your mouth'
An airline employee thought two men at Newark airport looked suspicious so she yelled for people to evacuate
Detroit Tigers prospect dies after electric skateboard accident
Mattis dodges when asked if he will speak candidly about Trump before 2020 election
Biden's campaign is already making Iowa excuses
Elizabeth Warren embraces Jay Inslee's climate change platform
Joe Manchin announces he'll stay in Senate and won't run for West Virginia governor
1 quote that perfectly explains the 2020 Democratic primary
This is why Congress remains deadlocked on climate and guns
Popeyes customer pulls a gun after being told there were no more chicken sandwiches
Robert Pattinson was blanking 'furious' over Batman leak
Justin Bieber shares use of 'heavy drugs' in revealing post
Cherokee Nation names first ever delegate to Congress
US, UK, France and Iran may be complicit in Yemen war crimes, UN says
Loud 'boom' reported across Central New York was probably a fireball entering Earth's atmosphere
Man arrested for buying ticket just to wave his wife off at gate
The problem of an 'infestation' of travel influencers
'It Chapter Two' doesn't know when to close the book
Leslie Jones bids 'Saturday Night Live' farewell
Days after her brother was charged in a triple slaying, Simone Biles says her heart aches 'especially for the victims'
Actor Cuba Gooding Jr.'s sex abuse trial has been delayed
Emma Thompson: Everything depends on what we do now"""

In [4]:
words = article.split()

In [5]:
#How many words does article contain?
len(words)

606

In [7]:
#How many unique words?
unique_words = set(words)
len(unique_words)

418
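
Splitting on whitespace keeps punctuation and capitalization attached to each token, so tokens such as `UK,` and `UK` are counted as different words. The following is a minimal sketch of one possible normalization, assuming that lowercasing and stripping punctuation from both ends of a token is acceptable for this analysis. The `normalize` helper and the `sample` string are illustrative and not part of the original notebook.

```python
import string

def normalize(token):
    """Lowercase a token and strip punctuation from both ends (interior apostrophes are kept)."""
    return token.strip(string.punctuation).lower()

sample = "UK, France and the UK"
raw_tokens = sample.split()

# normalize every token, then drop any token that consisted only of punctuation
clean_tokens = [normalize(t) for t in raw_tokens]
clean_tokens = [t for t in clean_tokens if t]

print(len(set(raw_tokens)))    # unique tokens before normalization
print(len(set(clean_tokens)))  # unique tokens after normalization
```

Whether to normalize depends on the question being asked: lowercasing merges `What` with `what`, but it also merges proper nouns with ordinary words that share their spelling.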

In [11]:
#Sort these words by their frequency
word_freq = {}
# first initialize the dictionary
for word in unique_words:
    word_freq[word]=0
# second, process text, increase count for each word
for word in words:
    word_freq[word]+=1
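
The two-pass approach above (initialize, then count) can also be written as a single pass. This is a small sketch on a made-up sentence, using `dict.get` with a default of `0`; the names `sample_words` and `sample_freq` are illustrative only.

```python
sample_words = "to be or not to be".split()

sample_freq = {}
for word in sample_words:
    # get() returns 0 for a word not seen yet, so no separate initialization pass is needed
    sample_freq[word] = sample_freq.get(word, 0) + 1

print(sample_freq)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```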

In [12]:
word_freq

{'implores': 1,
 'columnist': 1,
 "Inslee's": 1,
 'before': 1,
 'construction': 1,
 'White': 1,
 'depends': 1,
 'two': 1,
 'diverts': 1,
 'told': 2,
 'Day': 1,
 "'heavy": 1,
 "'infestation'": 1,
 'buying': 1,
 'unless': 1,
 'Man': 1,
 "Here's": 3,
 'war': 1,
 'there': 1,
 'a': 7,
 'pass': 1,
 "drugs'": 1,
 'all': 2,
 'change': 1,
 'up': 3,
 'on': 6,
 'chicken': 1,
 'will': 2,
 'hotel': 1,
 'election': 1,
 'grips': 1,
 'UK,': 1,
 'entering': 1,
 'men': 2,
 'Justin': 1,
 "victims'": 1,
 "part'": 1,
 'embraces': 1,
 'bill': 1,
 'Brexit': 3,
 'Pentagon': 1,
 'has': 2,
 'home,': 1,
 'dies': 1,
 'UK': 1,
 'carry': 2,
 'blanking': 1,
 'was': 5,
 'aches': 1,
 'violence': 1,
 'What': 3,
 'members': 2,
 'been': 1,
 'stranded': 1,
 'Yemen': 1,
 "Trump's": 2,
 'back': 1,
 "Parliament's": 1,
 'Forever': 1,
 'border': 1,
 'employee': 1,
 'Actor': 1,
 'delegate': 1,
 'York': 1,
 'weekend': 1,
 'out': 1,
 'sign': 1,
 'Kroger': 1,
 'deadly': 1,
 'an': 3,
 'France': 1,
 'ticket': 1,
 'heading': 1,
 'reb

In [16]:
#sort the dictionary

# first get a list of items
items = list(word_freq.items())

# items is a list, so we can sort it using .sort() or sorted()
# however, we need to specify how the sort should be done

In [18]:
#take a look at the first item in items
items[0]

('$10,000', 1)

In [22]:
#notice it is a tuple, so we need to sort based on the second number (the frequency)
sorted_items=sorted(items, key=lambda x: x[1], reverse=True)
sorted_items[:10]

[('to', 20),
 ('in', 14),
 ('and', 11),
 ('the', 11),
 ('for', 8),
 ('a', 7),
 ('is', 6),
 ('on', 6),
 ('says', 6),
 ('Dorian', 5)]
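
Python's `sorted` is stable, so words with equal counts keep whatever order they had in `items`. If a reproducible order for ties is wanted, one option is to sort on a tuple key. The sketch below uses a small made-up dictionary (`sample_freq` is illustrative) and orders by descending count, then alphabetically.

```python
sample_freq = {"gun": 4, "Trump": 5, "Dorian": 5, "says": 6}

# negate the count so that larger counts come first,
# then fall back to the word itself to break ties alphabetically
ranked = sorted(sample_freq.items(), key=lambda x: (-x[1], x[0]))

print(ranked)  # [('says', 6), ('Dorian', 5), ('Trump', 5), ('gun', 4)]
```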

In [None]:
# using Counter
from collections import Counter
Counter(words)
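
`Counter` also handles the sorting step through its `most_common` method. A brief sketch on a made-up sentence (the name `sample_words` is illustrative):

```python
from collections import Counter

sample_words = "to be or not to be".split()
counts = Counter(sample_words)

# most_common(n) returns the n most frequent (word, count) pairs, highest count first
print(counts.most_common(2))  # [('to', 2), ('be', 2)]

# unlike a plain dict, a Counter returns 0 for a missing key instead of raising KeyError
print(counts["missing"])  # 0
```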

In [24]:
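# newer scikit-learn releases no longer provide sklearn.feature_extraction.stop_words;
# the same list can be imported from sklearn.feature_extraction.text
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
stop_words = set(ENGLISH_STOP_WORDS)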

In [27]:
#manual way
filtered_items = []
for item in sorted_items:
    if item[0] not in stop_words:
        filtered_items.append(item)

In [29]:
filtered_items[:10]

[('says', 6),
 ('Dorian', 5),
 ('Hurricane', 5),
 ('Trump', 5),
 ('gun', 4),
 ('Brexit', 3),
 ('California', 3),
 ('Congress', 3),
 ('Donald', 3),
 ("Here's", 3)]

In [32]:
#sneak peek, using list comprehension
[item for item in sorted_items if item[0] not in stop_words][:10]

[('says', 6),
 ('Dorian', 5),
 ('Hurricane', 5),
 ('Trump', 5),
 ('gun', 4),
 ('Brexit', 3),
 ('California', 3),
 ('Congress', 3),
 ('Donald', 3),
 ("Here's", 3)]
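
The membership test above is case-sensitive, so a capitalized token such as `What` is not removed by a stop list that contains only lowercase entries. The sketch below shows the difference on made-up data; `sample_stop_words` is a tiny hypothetical stop list used only for illustration, not the scikit-learn list.

```python
# hypothetical, tiny stop list for illustration only
sample_stop_words = {"what", "is", "a", "and", "the"}
sample_items = [("Brexit", 3), ("What", 3), ("is", 6), ("a", 7), ("gun", 4)]

# case-sensitive: 'What' slips through because only 'what' is in the stop list
case_sensitive = [item for item in sample_items if item[0] not in sample_stop_words]

# case-insensitive: lowercase the word before the membership test
case_insensitive = [item for item in sample_items if item[0].lower() not in sample_stop_words]

print(case_sensitive)    # [('Brexit', 3), ('What', 3), ('gun', 4)]
print(case_insensitive)  # [('Brexit', 3), ('gun', 4)]
```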

In [None]:
# recreating the dictionary
word_freq1 = {}
important_words = unique_words - stop_words
for word in important_words:
    word_freq1[word] = 0
for word in words:
    if word in important_words:
        word_freq1[word] += 1

In [None]:
sorted(word_freq1.items(), key = lambda x: x[1], reverse=True )