In [19]:
import string

# Stored as a set for fast membership tests; the triple-quoted string lets
# the word list span several lines.
STOP_WORDS = set('''
a about above across after again against all almost alone along already also although always
among an and another any anybody anyone anything anywhere are area areas around as ask asked
asking asks at away b back backed backing backs be became because become becomes been before
began behind being beings best better between big both but by c came can cannot case cases
certain certainly clear clearly come could d did differ different differently do does done
down downed downing downs during e each early either end ended ending ends enough even evenly
ever every everybody everyone everything everywhere f face faces fact facts far felt few find
finds first for four from full fully further furthered furthering furthers g gave general
generally get gets give given gives go going good goods got great greater greatest group
grouped grouping groups h had has have having he her here herself high higher highest him
himself his how however i if important in interest interested interesting interests into is it
its itself j just k keep keeps kind knew know known knows l large largely last later latest
least less let lets like likely long longer longest m made make making man many may me member
members men might more most mostly mr mrs much must my myself n necessary need needed needing
needs never new newer newest next no nobody non noone not nothing now nowhere number numbers o
of off often old older oldest on once one only open opened opening opens or order ordered
ordering orders other others our out over p part parted parting parts per perhaps place places
point pointed pointing points possible present presented presenting presents problem problems
put puts q quite r rather really right room rooms s said same saw say says second seconds see
seem seemed seeming seems sees several shall she should show showed showing shows side sides
since small smaller smallest so some somebody someone something somewhere state states still
such sure t take taken than that the their them then there therefore these they thing things
think thinks this those though thought thoughts three through thus to today together too took
toward turn turned turning turns two u under until up upon us use used uses v very w want
wanted wanting wants was way ways we well wells went were what when where whether which while
who whole whose why will with within without work worked working works would x y year years
yet you young younger youngest your yours z
'''.split())

def concordance(text):
    '''From the string text, build up a dictionary of all the
       whitespace-separated words occurring in text, where the key is
       the word and the value is the number of occurrences in text.
       
       Ignore stop words as defined in STOP_WORDS, strip punctuation
       from the ends of words, and convert all to lower case.
    '''
    db = {}
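    for w in text.lower().split():
        w = w.strip(string.punctuation)
        # Skip stop words, empty strings, and a bare en dash, which the
        # ASCII-only string.punctuation does not strip.
        if (w in STOP_WORDS) or (w == '') or (w == '–'):
            continue
        # Start unseen words at zero, then count this occurrence.
        db[w] = db.get(w, 0) + 1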
    return db
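
For comparison, the same counting can be sketched with `collections.Counter` from the standard library. This is only a sketch, not the notebook's own method: `concordance_counter` and `SMALL_STOP_WORDS` are hypothetical names, the stop-word set is a tiny stand-in for `STOP_WORDS`, and the en-dash special case is left out.

```python
import string
from collections import Counter

# Hypothetical stand-in for the full STOP_WORDS collection.
SMALL_STOP_WORDS = {'a', 'the', 'of', 'and', 'in', 'to'}

def concordance_counter(text, stop_words=SMALL_STOP_WORDS):
    '''Count lower-cased, punctuation-stripped words, skipping stop words.'''
    words = (w.strip(string.punctuation) for w in text.lower().split())
    # Drop empty strings (tokens that were pure punctuation) and stop words.
    return Counter(w for w in words if w and w not in stop_words)

counts = concordance_counter('The vote, the VOTE and a recount.')
print(counts['vote'], counts['recount'])  # 2 1
```

A `Counter` is a `dict` subclass, so it can be passed to `top_n` unchanged.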

In [20]:
def top_n(dd, n=10):
    '''Return a list of up to n (key, value) pairs from dictionary dd
       ordered by highest value.
    '''
    dd_sort = sorted(dd.items(), key=lambda x: (-x[1], x[0]))
    return dd_sort[:n]
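
The sort key `(-x[1], x[0])` orders pairs by descending count and then alphabetically, so words with equal counts come out in a predictable order. A small illustration with made-up counts (the dictionary `toy` is an assumption for demonstration only, and `top_n` is repeated so the block runs on its own):

```python
def top_n(dd, n=10):
    '''Same definition as above, repeated so the example is self-contained.'''
    dd_sort = sorted(dd.items(), key=lambda x: (-x[1], x[0]))
    return dd_sort[:n]

toy = {'vote': 2, 'cruz': 2, 'trump': 3, 'share': 1}

# 'cruz' and 'vote' tie on count; the alphabetical tie-break puts 'cruz' first.
top_two = top_n(toy, n=2)
print(top_two)  # [('trump', 3), ('cruz', 2)]

# Asking for more items than exist simply returns everything, sorted.
everything = top_n(toy, n=10)
```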

In [21]:
# from the NYT front page, 8 Mar 2016
sample = '''Is Trump Fading?

Mr. Trump had a rough week. He faced attacks from the party establishment and criticism for his debate performance on Thursday before barely outpacing Senator Ted Cruz of Texas on Saturday in Kentucky and Louisiana, and losing to him in Kansas and Maine, where Mr. Trump was considered a favorite.

But it is not clear whether he struggled to win because he had lost ground or because anti-Trump voters had consolidated around Mr. Cruz. Mr. Trump’s share of the vote on Saturday was roughly in line with what he had won on Super Tuesday; Mr. Cruz finished with a far higher share of the vote than his Super Tuesday total.

The outcome on Tuesday could be telling. If Mr. Trump were to replicate his Super Tuesday performance, he would take about 35 percent of the vote in Michigan and 42 percent in Mississippi. If he were to lose significant ground from last week’s vote, it could present an opening for one of his rivals.
'''

content = ' '.join(sample.strip().split('\n')) # here's an example of how to prep the text
trump = concordance(content)
trump = top_n(trump)
print(trump)

[('trump', 4), ('tuesday', 4), ('vote', 4), ('cruz', 3), ('super', 3), ('ground', 2), ('percent', 2), ('performance', 2), ('saturday', 2), ('share', 2)]
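
`string.punctuation` contains only ASCII characters, which is why `concordance` needs a separate check for the en dash. Text copied from the web often also carries curly quotes at the ends of words. One possible extension, offered as an assumption rather than as part of the assignment, is to treat any character in a Unicode punctuation category as strippable. `strip_punct` and `is_punct` are hypothetical helper names.

```python
import string
import unicodedata

def is_punct(ch):
    '''True for ASCII punctuation or any Unicode category starting with P.'''
    return ch in string.punctuation or unicodedata.category(ch).startswith('P')

def strip_punct(word):
    '''Strip punctuation from both ends of word, leaving the interior alone.'''
    start, end = 0, len(word)
    while start < end and is_punct(word[start]):
        start += 1
    while end > start and is_punct(word[end - 1]):
        end -= 1
    return word[start:end]

print(strip_punct('“tonight'))  # tonight
print(strip_punct('head.”'))    # head
```

Interior characters are untouched, so a token such as `trump’s` is still counted separately from `trump`.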


### Tasks

1. Modify the `concordance` function so that it works as described.
2. Build a concordance of the provided `sample` above, and print out the top ten most frequently occurring words.
3. Find three samples of text from the web and show the **25** most frequently occurring words in each. You might find it interesting to compare three contrasting takes on the same event, idea, or topic.

Ideas: Project Gutenberg, State of the Union addresses, the NY Times, Wikipedia, BuzzFeed, (tamer parts of) Reddit, Facebook, Twitter...
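
For task 3, the same two steps run once per sample, with `n=25` passed explicitly because `top_n` defaults to 10. A minimal sketch of one way to organize that, assuming the word counts have already been built. `report` and the toy counts are hypothetical, and `top_n` is repeated only so the block runs on its own:

```python
def top_n(dd, n=10):
    '''Same definition as in the notebook, repeated for self-containment.'''
    dd_sort = sorted(dd.items(), key=lambda x: (-x[1], x[0]))
    return dd_sort[:n]

def report(samples, n=25):
    '''Map each label to its top-n (word, count) list.

    samples is a dict of label -> word-count dictionary.
    '''
    return {label: top_n(counts, n) for label, counts in samples.items()}

# Toy counts standing in for real concordances.
toy_samples = {
    'source_a': {'vote': 3, 'win': 2, 'race': 2},
    'source_b': {'percent': 5, 'vote': 1},
}
result = report(toy_samples, n=2)
for label, pairs in result.items():
    print(label, pairs)
```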

In [26]:
#Fox
content = '''Donald Trump and Hillary Clinton each scored a string of impressive primary victories Tuesday night that sent an emphatic message to voters and their respective political rivals that the primary season might be all but over, and the race for the White House is on -- though Republican Sen. Ted Cruz, with victories in delegate-rich Texas and in Oklahoma and Alaska, is far from conceding anything.

Vermont Sen. Bernie Sanders, too, found reason to press on, with Super Tuesday wins in Minnesota, Oklahoma, Colorado and his home state of Vermont. Even Marco Rubio, after a string of second- and third-place finishes, found his first win in Minnesota.

But with Clinton amassing a huge delegate lead, the more competitive race is on the Republican side – where Cruz clearly edged Rubio in the Super Tuesday battle for second and quickly positioned himself as the better candidate to take on Trump.

“Tonight was another decision point, and the voters have spoken,” Cruz said in Texas, urging voters to unite behind him so he could take on Trump “head to head.” 

Even with the senators' victories, Trump emerged from Tuesday’s contests closer than ever to the nomination, and acting more and more like a general election candidate eager to take on Democratic front-runner Clinton.

“Once we get all of this finished, I’m going to go after one person, and that’s Hillary Clinton,” he said, at an unusual primary night press conference in Florida. “I think that’s frankly going to be an easy race.”

Speaking in Florida after notching several wins, Clinton also seemed to look beyond Sanders – taking implicit shots at Trump’s “make America great again” campaign slogan.

“America never stopped being great,” Clinton said. “We have to make America whole.”

She also mocked his proposal for a southern border wall, saying, “Instead of building walls, we’re going to break down barriers.” 

Trump answered right back, quipping: "Make America great again is going to be much better than making America whole again." 

With results still coming in, Trump is projected to win in Alabama, Arkansas, Georgia, Massachusetts, Tennessee, Vermont and Virginia. Clinton is projected to win Alabama, Arkansas, Georgia, Massachusetts, Tennessee, Texas and Virginia.

Across 11 states, 595 Republican delegates were up for grabs Tuesday – nearly half the number needed to clinch the nomination. And on the Democratic side, Clinton and Vermont Sen. Sanders were battling for 865 delegates in 11 states – roughly a third of the number needed to clinch the nomination.

No matter how the delegate math shakes out, the primary races are not over – yet.

While the Super Tuesday contests marked the biggest day of primary season voting to date, the states were mostly allocating delegates proportionally, meaning even the runner-ups could add to their totals.

Rubio stressed that point, as he began to focus on the March 15 contest in his home state.

“We never said Super Tuesday was going to be our night,” he told Fox News.

Cruz clearly had the better night.

Texas was the biggest prize on the Super Tuesday map, offering 222 Democratic delegates and 155 Republican delegates. A win for Cruz in his home state was considered critical, and he was able to thwart any potential late-hour surge by Trump there.

While Cruz put subtle pressure on Rubio to step aside, Trump openly mocked the Florida senator after earlier calling on him to drop out – a call Rubio rebuffed. Trump again called him a "lightweight" while threatening to take on the Florida senator in his home state in two weeks. 

Clinton entered Super Tuesday with a head of steam following her landslide win over Sanders in South Carolina this past Saturday.

Sanders, though, savored his home-state win all the same, rallying cheering supporters in Vermont Tuesday evening. 

"It is good to be home," he said, before shifting to his stump speech slams against a "corrupt campaign finance system." 

Ohio Gov. John Kasich, who expressed low expectations for Super Tuesday, remains in the GOP race in hopes of making it to the Ohio contest in two weeks, though his presence continues to frustrate efforts by Rubio and Cruz to consolidate support.

Retired neurosurgeon Ben Carson, meanwhile, has defended his continued presence in the race.

“People have asked for somebody who is not a politician, who was a member of we the people, who has an outstanding life of achievement and who thinks the way they do,” he told Fox News.'''
content = ' '.join(content.strip().split('\n'))
fox = top_n(concordance(content))

In [27]:
#CNN
content = '''Donald Trump and Hillary Clinton carved out dominant positions in their party nominating races on Super Tuesday, marching ever closer to a scorched-earth general election clash.

Political Prediction Market
Hillary Clinton
to be Democratic nominee

95%
live odds
Will the odds go up or down?

click to play
Powered by Pivit
Trump swamped his rivals by piling up seven wins across the nation, demonstrating broad appeal for his anti-establishment movement. Clinton also had a strong night, winning seven states and showing her strength with minorities in the South.

Trump won across the conservative South in Alabama, Arkansas, Georgia, Tennessee and Virginia, but also captured more moderate Massachusetts and Vermont.

MORE: 6 takeaways from Super Tuesday

"This has been an amazing night," Trump told reporters at his Mar-a-Lago resort in Palm Beach, Florida. He vowed to be a "unifier" and to go after Clinton with a singular focus once the GOP race eventually winds up.

But Trump's GOP rivals vowed to fight on. Ted Cruz won his home state of Texas, the biggest single prize of the night, and added Oklahoma and Alaska. And Florida Sen. Marco Rubio finally landed his first win of the 2016 season in the Minnesota Republican caucuses.

Trump's victories suggested that he did not pay a significant price for a controversy that flared in recent days over his initial failure to disavow David Duke, a former Ku Klux Klan leader, during a CNN interview, and disputes over his business record and positions on immigration.

Time running out
The best of Super Tuesday in 2 minutes

The best of Super Tuesday in 2 minutes 02:03
And time is running out for the panicking Republican establishment to deny the billionaire the nomination, amid fears his brand of volatile anti-immigrant rhetoric could cost the party not just the White House, but the Senate.

CNN projects that Trump has so far won 233 delegates on Super Tuesday, well ahead of Cruz with 188 and Rubio with 90. That gives the billionaire a total of 315 delegates in the overall race, compared to 205 for Cruz and 106 for Rubio. A total of 1,237 delegates are required to win the Republican nomination.

MORE: Trump's lead is tearing the Republicans apart

In the Democratic race, Clinton won seven states, building up a delegate cushion over her insurgent rival Bernie Sanders. She rode her support among African-American voters on a Southern sweep through Alabama, Arkansas, Georgia, Tennessee, Texas and Virginia, and added Massachusetts, a state Sanders had hoped to win.

"What a Super Tuesday," Clinton declared at her victory rally in Florida, taking aim at Trump by asserting that America was already great, despite his campaign mantra, and vowing to make the country "whole again."

Find your presidential match with the 2016 Candidate Matchmaker

Sanders won his own state, Vermont, along with Colorado, Minnesota and Oklahoma. And though he failed to broaden his appeal in less liberal battlegrounds, he will now look to states in the industrial Midwest such as Michigan to inflict new blows on the former secretary of state.

But Sanders has yet to find an answer for a central question of the race: How can he win the nomination of the diverse Democratic Party without demonstrating an ability to challenge Clinton's dominance of minority voters?

The Democratic race is guaranteed to go on for months, however, because the party's system of proportionally awarding delegates means no candidate is yet close to reaching the magic number of 2,383 delegates to win the nomination.

MORE: Chris Christie steals Trump's show

Clinton is projected so far to win 492 delegates on Super Tuesday, compared to 330 for Sanders. That gives Clinton a grand total of 1,055 delegates -- including super delegates, who are leading party officials and lawmakers who have endorsed her campaign. Sanders has 418 delegates so far in the race. The figures are likely to be updated throughout the night.

Trump did not have it all his own way on the Republican side, following predictions he could have won as many as 10 of the 11 states up for grabs.

New life for Cruz
The best of the Super Tuesday speeches

The best of the Super Tuesday speeches 01:41
Cruz won new life by capturing Texas, Oklahoma and Alaska, though he fell far short of the sweep through Southern states that once formed the central rationale of his campaign.

His three victories did, however, give him a reason to carry on in the race. He pointed to those triumphs, combined with his win in the Iowa caucuses, as proof that only he can actually beat Trump. He suggested that Rubio and others "prayerfully" consider exiting the race to unite the party.

"I am the only candidate who has beaten Donald three times," Cruz told CNN's Wolf Blitzer.

MORE: What a President Trump could mean for the world

And Rubio, after suffering a string of miserable election nights, finally secured his first win of the campaign in Minnesota.

He argued that Trump could not amass the 1,237 delegates needed to win the Republican nomination once winner-take-all contests begin to crop up on the calendar later this month --including his own, must-win state of Florida.

"This is the fight for the heart and soul of the Republican Party," Rubio told CNN's Jake Tapper. "I will go through all 50 states before we stop fighting to save the Republican Party from someone like that."

But his claim that he can unite the Republican Party against Trump looks increasingly questionable, given his losses to the former reality television star in other target states such as Virginia.

MORE: GOP to pitch Carson on Senate run

In some states, it was clear that Rubio and Cruz were dividing the opposition to Trump, who is still benefiting from the split field against him.

But there seems little incentive for either candidate to get out. Rubio has sufficient support and financial resources to continue and could benefit from an emerging effort by anti-Trump forces to target the billionaire with a super PAC.

The same is true of Cruz, and he and Rubio, youthful first term senators, are locked in a battle for the future leadership of the party, and don't seem likely to join together to present an anti-Trump front.

And given the fact that Cruz, who is widely disliked among his peers in Washington, and Trump have won all but one of the contests so far, it is clear the establishment is even farther away from providing a credible challenger for the nomination.

Sanders also is vowing to stay in the campaign -- and with his lucrative army of small donors and grass-roots appeal, he has no reason to leave.

MORE: What the world thinks of Trump

"This campaign is not just about electing a president," Sanders said at a rally Tuesday night in Vermont. "It is about transforming America."'''
content = ' '.join(content.strip().split('\n'))
cnn = top_n(concordance(content))

In [30]:
#NPR
content = '''Super Tuesday was a big night for both Hillary Clinton and Donald Trump. They each captured seven states in their respective Democratic and Republican races, extending leads over their remaining rivals.

But as we pointed out earlier, delegates were the name of the game on Tuesday, and each candidate's margin of victory mattered. NPR's Delegate Tracker has the full recap of what we know so far about how delegates have been allocated, according to estimates from The Associated Press.

Here's a quick snapshot of how the races in each state broke down:

Texas

The Lone Star State was the big prize of the night, and the bulk of its delegates will go to favorite son Ted Cruz. With 99 percent of the vote reporting, the state's junior senator notched an important 17-point win over Trump, and so far Cruz is projected to get 57 of the state's delegates while Trump gets 20.

Florida Sen. Marco Rubio missed out on getting any delegates because he fell just short of the 20-percent threshold in the state, getting only 18 percent in the GOP primary.

Clinton beat Vermont Sen. Bernie Sanders 2-to-1 in Texas, and picks up 122 delegates to his 48.

Georgia

The Peach State had the next-largest number of delegates up for grabs, and Trump got an important — and impressive — victory here, too. With 99 percent reporting, he took 39 percent of the GOP primary vote; Cruz and Rubio received 25 and 24 percent. Delegates are awarded proportionally and based on congressional district, and Trump is on pace to get 36 delegates, while Cruz takes 14 and Rubio gets 11.

Clinton got a big victory in Georgia, too, again fueled by support from African-Americans. According to exit polls, just over half of the electorate were black voters, and she won them by a 71-point margin.

Tennessee

Late support from Gov. Bill Haslam and Sen. Lamar Alexander wasn't enough to boost Rubio in the Volunteer State, or even help him be the runner-up to Trump. With 99 percent of the vote reporting, the real estate mogul took 39 percent in the state, while Cruz got 25 percent and Rubio took 21 percent. That breakdown gives Trump a whopping 30 delegates, while Cruz takes 12 and Rubio netted just two.

This was another state with a high evangelical constituency. According to exit polls, more than three-quarters of voters identified themselves as born-again Christians. And Trump carried 41 percent of those voters, a 14-point edge over Cruz, denying him a win with religious voters that should have been his base.

Clinton also cruised to a decisive win over Sanders here, with more than double his support with a 66 percent to 32 percent victory. That should give her 40 delegates to Sanders' 22.

Alabama

This was Trump's best state in the South. With all votes reporting, he won the Heart of Dixie with 43 percent; Cruz took 21 percent and Rubio got 19 percent. Such a decisive loss is a blow to Cruz, though — another state with significant evangelical turnout (77 percent according to exit polls) that he lost to Trump by a stunning 22-point margin.

Such a big victory helped Trump almost sweep the state's delegates; he's expected to get 28 delegates while Cruz gets just two.

Clinton had her biggest victory of the night in the state, besting Sanders 79 percent to 19 percent. Again, that was fueled by black voters, who made up 59 percent of the Democratic electorate. She carried 93 percent of black voters, compared to 5 percent that broke for Sanders. That huge win gives her 37 delegates to Sanders' four.

Virginia

Early on in the night, it looked like Florida Sen. Marco Rubio might pull an upset in the Old Dominion. He racked up big margins in the affluent, well-educated D.C. suburbs that lean more moderate. According to exit polls, he trumped Trump there 42 percent to 24 percent.

But he couldn't catch Trump's advantage in other parts of the commonwealth. The GOP front-runner carried the state's more rural areas and even edged past Rubio in the Richmond suburbs and the D.C. exurbs. With 99 percent of the vote reporting, Trump won 35 percent to Rubio's 32 percent.

Delegate breakdowns mean they fought almost to a draw though; Trump is expected to get 17 delegates with 16 for Rubio, eight for Cruz and five for Kasich. Neurosurgeon Ben Carson picked up his only delegates of the night, netting three after getting 6 percent of the primary vote.

Virginia was another Southern state where Clinton won big, beating Sanders 64 percent to 35 percent, with all votes reporting. That should give Clinton 61 more delegates while Sanders takes 32.

Arkansas

Some observers thought Cruz might sneak a win in the Natural State, but with 96 percent of the vote reporting, Trump narrowly edged him out, 33 percent to 31 percent, while Rubio took a quarter of the vote. Exit polls showed 77 percent of the GOP electorate described themselves as evangelicals, and Trump and Cruz split those voters with 33 percent apiece. But Trump had the edge among non-evangelical voters, helping him eke out the win. He's on pace to get 13 delegates, while Cruz and Rubio will get nine and six, respectively.

Clinton won decisively in the state where she was once first lady and which launched her and her husband's political careers. She bested Sanders 66 percent to 30 percent, and took 18 delegates while Sanders got seven.

Massachusetts

The Bay State gave Trump, another northeastern Republican, his biggest win of the night. With 97 percent of the vote reporting, he took 49 percent of the vote while the rest of his competitors were mired in the teens. That big win gave him 22 more delegates. Kasich and Rubio netted eight apiece, while Cruz notched four.

The state was the closest of the night for Clinton and Sanders and was the last Democratic contest to be called by The Associated Press. Clinton was edging out Sanders by fewer than 400 votes with just over 97 percent of precincts reporting. She'll get 45 delegates to Sanders' 43, but when you add in pledged superdelegates, Clinton will net 61 total delegates.

Oklahoma

Cruz pulled out a surprise victory in the Sooner State, beating Trump by six points, 34 percent to 28 percent. Rubio came in third with 26 percent, despite his campaign telegraphing early Tuesday that it felt good about the state. All votes have been reported.

This was a state where evangelical voters did propel Cruz to victory. With three-quarters of the GOP electorate identifying as born-again Christians, Cruz won 39 percent of those voters, compared to 26 percent for Rubio and 25 percent for Trump, according to exit polls. That gave Cruz 14 delegates, while Trump netted 12 and Rubio got 11.

Sanders also notched an important win in Oklahoma, besting Clinton by 10 points in the state, 52 percent to 42 percent. The electorate in the state was much more homogeneous than Southern states, with white voters making up just under three-quarters of the electorate. Sanders won white voters by 20 points, while Clinton carried black voters — just 14 percent of the electorate — by 44 points. The Vermont senator will take away 20 delegates to Clinton's 16.

Minnesota

The Land of 10,000 Lakes gave Rubio his first win. With 92 percent of caucus results reporting, he took 37 percent, followed by Cruz at 29 percent and Trump at 21 percent, his only third-place finish of the night.

But that caucus win actually gives Rubio a draw with Cruz in Minnesota delegates — each will take away a dozen delegates, while Trump netted eight.

With 86 percent reporting, Sanders got a 24-point victory over Clinton in the Democratic caucuses. With superdelegates factored in, he will take away 42 delegates compared to her 24, according to AP estimates.

Colorado

With 98 percent of the vote reporting, Sanders comfortably carried the Centennial State's Democratic caucuses with a 19-point margin. The caucuses give him 33 delegates and Clinton 23, but she ties him when her 10 superdelegates are added in.

On the GOP side, there isn't a winner to report. Republicans were beginning their caucus process to pick delegates to county caucuses; there's no presidential preference poll, yet.

Vermont

Sanders rolled to a win in his home Green Mountain State. With 97 percent reporting, he beat Clinton by a whopping 72 points. He gets all of the state's 10 allocated delegates, along with three superdelegates; Clinton has four pledged superdelegates in the state.

The GOP contest ended up being one of the few nail-biters of the night. Kasich turned in a better-than-expected performance, but he couldn't overtake Trump, who won with 33 percent to Kasich's 30 percent. The two tied for delegates though, each banking six.

Alaska

Republican caucuses in the Last Frontier were the only contest in the nation's largest state, and they were won by the favored son of the nation's second-largest. With all votes reporting, Cruz had 36 percent of the vote to Trump's 33 percent and Rubio's 11 percent. The candidates picked up, respectively, 12, 11 and five delegates.'''
content = ' '.join(content.strip().split('\n'))
npr = top_n(concordance(content))

In [31]:
print(fox)
print(cnn)
print(npr)

[('tuesday', 10), ('trump', 9), ('clinton', 8), ('cruz', 7), ('super', 7), ('rubio', 6), ('win', 6), ('delegates', 5), ('home', 5), ('primary', 5)]
[('trump', 14), ('super', 11), ('delegates', 10), ('tuesday', 10), ('cruz', 9), ('party', 9), ('rubio', 9), ('win', 9), ('clinton', 8), ('race', 8)]
[('percent', 60), ('delegates', 32), ('trump', 23), ('cruz', 22), ('rubio', 18), ('sanders', 18), ('clinton', 17), ('reporting', 14), ('voters', 13), ('win', 12)]
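
One way to read the three lists side by side is to ask which words appear in all of them and which are unique to one source. The sketch below copies the printed results above into literals and uses set operations. The names `top_words`, `shared`, and `only_in` are illustrative.

```python
fox = [('tuesday', 10), ('trump', 9), ('clinton', 8), ('cruz', 7), ('super', 7),
       ('rubio', 6), ('win', 6), ('delegates', 5), ('home', 5), ('primary', 5)]
cnn = [('trump', 14), ('super', 11), ('delegates', 10), ('tuesday', 10), ('cruz', 9),
       ('party', 9), ('rubio', 9), ('win', 9), ('clinton', 8), ('race', 8)]
npr = [('percent', 60), ('delegates', 32), ('trump', 23), ('cruz', 22), ('rubio', 18),
       ('sanders', 18), ('clinton', 17), ('reporting', 14), ('voters', 13), ('win', 12)]

# Keep just the words from each top-ten list.
top_words = {
    'fox': {w for w, _ in fox},
    'cnn': {w for w, _ in cnn},
    'npr': {w for w, _ in npr},
}

# Words present in every list.
shared = sorted(set.intersection(*top_words.values()))
print(shared)  # ['clinton', 'cruz', 'delegates', 'rubio', 'trump', 'win']

# Words that appear in exactly one list.
only_in = {}
for name, words in top_words.items():
    others = set().union(*(v for k, v in top_words.items() if k != name))
    only_in[name] = sorted(words - others)
print(only_in)
```

With these lists, the shared words are mostly candidate names, while the NPR-only words (`percent`, `reporting`, `sanders`, `voters`) reflect its state-by-state results format.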
