## Microsoft Concept Graph

[Microsoft Concept Graph](https://concept.research.microsoft.com/) is a large taxonomy of terms mined from the internet, with `is-a` relations between concepts. 

Concept Graph is available in two forms:
 * Large text file for download
 * REST API

Statistics:
 * 5401933 unique concepts
 * 12551613 unique instances
 * 87603947 `is-a` relations
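
For the downloadable text file, a minimal sketch of loading the relations might look like the following. It assumes each line has the tab-separated form `concept<TAB>instance<TAB>count`; check this against the file you actually download. The names `load_concepts` and `score_by_prob` are hypothetical, and normalizing counts is only a rough stand-in for the probability the web service reports.

```python
from collections import defaultdict

def load_concepts(lines, min_count=1):
    """Build {instance: {concept: count}} from tab-separated lines.

    Assumed line format (not confirmed by this notebook):
    concept<TAB>instance<TAB>count
    """
    parents = defaultdict(dict)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed format
        concept, instance, raw_count = parts
        try:
            count = int(raw_count)
        except ValueError:
            continue  # skip lines with a non-numeric count
        if count >= min_count:
            parents[instance][concept] = count
    return parents

def score_by_prob(parents, instance, top_k=10):
    """Return the top_k parent concepts with counts normalized to sum to 1."""
    counts = parents.get(instance, {})
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return {concept: n / total for concept, n in ranked}
```

With a real file, `load_concepts(open(path, encoding='utf-8'))` would stream it line by line, although a file with this many relations needs a lot of memory when held in a plain dictionary.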

## Using Web Service

The web service offers several calls that estimate the probability of an instance belonging to different concepts. More information is available in the [API documentation](https://concept.research.microsoft.com/Home/Api).
Here is a sample URL to call: `https://concept.research.microsoft.com/api/Concept/ScoreByProb?instance=microsoft&topK=10`

In [1]:
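import urllib.request
import urllib.parse
import json
import ssl

def http(x):
    # Fetch a URL and return the body as text.
    # Certificate verification is disabled for this request (insecure, demo only).
    context = ssl._create_unverified_context()
    response = urllib.request.urlopen(x, context=context)
    data = response.read()
    return data.decode('utf-8')

def query(x):
    # Return a dict mapping parent concepts to probabilities for instance x.
    return json.loads(http("https://concept.research.microsoft.com/api/Concept/ScoreByProb?instance={}&topK=10".format(urllib.parse.quote(x))))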

query('microsoft')

URLError: <urlopen error [Errno 60] Operation timed out>
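
As the output above shows, the service may be unreachable. A minimal sketch of a more defensive call, assuming it is acceptable to treat a failed lookup as "no concepts found": the names `build_url` and `safe_query` and the `opener` parameter are hypothetical, introduced here only so the function can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://concept.research.microsoft.com/api/Concept/ScoreByProb"

def build_url(instance, top_k=10):
    # urlencode escapes the instance text, so callers pass plain strings
    params = urllib.parse.urlencode({"instance": instance, "topK": top_k})
    return "{}?{}".format(BASE_URL, params)

def safe_query(instance, top_k=10, timeout=10, opener=urllib.request.urlopen):
    """Return the concept scores for instance, or {} if the call fails."""
    try:
        with opener(build_url(instance, top_k), timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers URLError and socket timeouts;
        # ValueError covers malformed JSON or undecodable bytes
        return {}
```

Returning an empty dictionary lets a loop over many instances continue when individual requests time out, at the cost of hiding which lookups failed.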

Let's try to categorize news titles using parent concepts. To get the titles, we will use the [NewsApi.org](http://newsapi.org) service. You need your own API key to use it: go to the website and register for the free developer plan.

In [20]:
newsapi_key = '<your API key here>'
def get_news(country='us'):
    res = json.loads(http("https://newsapi.org/v2/top-headlines?country={0}&apiKey={1}".format(country,newsapi_key)))
    return res['articles']

all_titles = [x['title'] for x in get_news('us')+get_news('gb')]

In [4]:
all_titles = ['Covid-19 Live Updates: Vaccines and Boosters News - The New York Times',
 'Ukrainians Flee Mariupol as Russian Forces Push to Take Port City - The Wall Street Journal',
 'Bond Yields Jump, Stock Futures Rise After Powell Says Fed Is Ready to Be More Aggressive - The Wall Street Journal',
 'Putin critic Alexei Navalny found guilty by Russian court - New York Post ',
 "Supreme Court nominee Ketanji Brown Jackson will face questions at confirmation hearing's second day - CNN",
 '2 teachers killed at Swedish high school, student arrested - ABC News',
 'Clues to Covid-19’s Next Moves Come From Sewers - The Wall Street Journal',
 'Republicans to roll dice by grilling Jackson over child-pornography sentencing decisions | TheHill - The Hill',
 '‘Clear sign’ Putin considering using chemical weapons in Ukraine, claims President Biden - The Independent',
 'NASA confirms there are 5,000 planets outside our solar system - Daily Mail',
 "US stocks whipsawed overnight after Fed Chair Powell's remarks - Fox Business",
 "'We've learned absolutely nothing': Tests could again be in short supply if Covid surges - POLITICO",
 "Duchess of Cambridge swaps khaki jungle gear for Vampire's Wife dress on Belize trip - Daily Mail",
 'China searches for victims, flight recorders after first plane crash in 12 years - Reuters',
 'Second superyacht linked to Russian oligarch Abramovich docks in Turkey - Reuters',
 'Live updates: Russia stops talks with Japan over sanctions - The Associated Press - en Español',
 'Powers Remain and Threats Lurk as Women’s Sweet 16 Is Set - The New York Times',
 'Webb Space Telescope Begins Multi-Instrument Alignment - SciTechDaily',
 "UConn vs UCF - NCAA women's tournament second-round highlights - March Madness",
 'Bucking Republican Trend, Indiana Governor Vetoes Transgender Sports Bill - The New York Times',
 "Maggie Fox dead: Coronation Street and Shameless actress dies after 'sudden accident' - Mirror Online - The Mirror",
 'China plane crash – live: Search for survivors continues as witness describes moment flight fell from sky - The Independent',
 'Daniel Morgan murder: damning report condemns Met police - The Guardian',
 'What to expect from Rishi Sunak’s Spring Statement - BBC.com',
 'UK and Republic of Ireland in line to host Euro 2028 after no one else bids - The Guardian',
 "Friends beg Vladimir Putin's 'lover' to persuade him to end Ukraine invasion - The Mirror",
 'Brass Eye’s outtakes show the brutal TV comedy was the tip of an iceberg - The Guardian',
 "Vladimir Putin threatens civilians to break Mariupol's spirit - The Times",
 'Shell U-turn on Cambo oilfield would threaten green targets, say campaigners - The Guardian',
 'St Helens dog attack: Girl aged 17 months killed at home - BBC',
 "PlayStation to buy 'Assassin's Creed' veteran Jade Raymond's Haven Studios - NME",
 '‘Clear sign’ Putin considering using chemical weapons in Ukraine, claims President Biden - The Independent',
 'NASA confirms there are 5,000 planets outside our solar system - Daily Mail',
 'Nintendo Switch finally has folders • Eurogamer.net - Eurogamer.net',
 'FA to “find a solution” as Liverpool fan group blasts “shambolic” Wembley travel - This Is Anfield',
 'Manchester United transfer news LIVE Erik ten Hag latest and Man Utd manager updates - Manchester Evening News',
 'Inflation raises cost of UK government borrowing in February; crude oil up again – business live - The Guardian',
 'Alexei Navalny: Kremlin critic found guilty of large-scale fraud and contempt of court by Russian court - Sky News',
 "UK prepares to nationalize Russia natural gas giant Gazprom's retail unit - Business Insider",
 'Zaghari-Ratcliffe: Hunt calls for inquiry into delay over Iran debt payment - The Guardian']
all_titles

['Covid-19 Live Updates: Vaccines and Boosters News - The New York Times',
 'Ukrainians Flee Mariupol as Russian Forces Push to Take Port City - The Wall Street Journal',
 'Bond Yields Jump, Stock Futures Rise After Powell Says Fed Is Ready to Be More Aggressive - The Wall Street Journal',
 'Putin critic Alexei Navalny found guilty by Russian court - New York Post ',
 "Supreme Court nominee Ketanji Brown Jackson will face questions at confirmation hearing's second day - CNN",
 '2 teachers killed at Swedish high school, student arrested - ABC News',
 'Clues to Covid-19’s Next Moves Come From Sewers - The Wall Street Journal',
 'Republicans to roll dice by grilling Jackson over child-pornography sentencing decisions | TheHill - The Hill',
 '‘Clear sign’ Putin considering using chemical weapons in Ukraine, claims President Biden - The Independent',
 'NASA confirms there are 5,000 planets outside our solar system - Daily Mail',
 "US stocks whipsawed overnight after Fed Chair Powell's remar

First, we want to extract nouns from the news titles. We will use the `TextBlob` library, which simplifies many typical NLP tasks like this one.

In [2]:
import sys
!{sys.executable} -m pip install textblob
!{sys.executable} -m textblob.download_corpora
from textblob import TextBlob

Collecting textblob
  Downloading textblob-0.17.1-py2.py3-none-any.whl (636 kB)
Collecting nltk>=3.1 (from textblob)
  Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)
Collecting click (from nltk>=3.1->textblob)
  Downloading click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting joblib (from nltk>=3.1->textblob)
  Downloading joblib-1.3.2-py3-none-any.whl.metadata (5.4 kB)
Collecting regex>=2021.8.3 (from nltk>=3.1->textblob)
  Downloading regex-2023.10.3-cp38-cp38-macosx_10_9_x86_64.whl.metadata (40 kB)

In [5]:
w = {}
for x in all_titles:
    for n in TextBlob(x).noun_phrases:
        if n in w:
            w[n].append(x)
        else:
            w[n]=[x]
{ x:len(w[x]) for x in w.keys()}

{'covid-19 live updates': 1,
 'vaccines': 1,
 'boosters': 1,
 'york': 4,
 'ukrainians flee mariupol': 1,
 'forces push': 1,
 'port city': 1,
 'wall street journal': 3,
 'bond yields': 1,
 'futures rise': 1,
 'powell says fed': 1,
 'ready': 1,
 'be': 1,
 'aggressive': 1,
 'putin': 3,
 'alexei navalny': 2,
 'russian': 2,
 'supreme court nominee': 1,
 'ketanji brown jackson': 1,
 "confirmation hearing 's": 1,
 'cnn': 1,
 'swedish': 1,
 'high school': 1,
 'abc': 1,
 'clues': 1,
 'covid-19': 1,
 '’ s': 2,
 'moves': 1,
 'sewers': 1,
 'roll dice': 1,
 'jackson': 1,
 'decisions |': 1,
 'thehill': 1,
 'clear': 2,
 'chemical weapons': 2,
 'ukraine': 3,
 'claims president': 2,
 'biden': 2,
 'nasa': 2,
 'solar system': 2,
 'daily mail': 3,
 'us stocks': 1,
 'fed chair powell': 1,
 "'s remarks": 1,
 'fox': 1,
 "'we 've": 1,
 'tests': 1,
 'covid': 1,
 'politico': 1,
 'duchess': 1,
 'cambridge': 1,
 'swaps khaki jungle gear': 1,
 'vampire': 1,
 'wife': 1,
 'belize': 1,
 'china': 2,
 'flight recorders

We can see that noun phrases alone do not give us large thematic groups. Let's replace them with more general terms obtained from the concept graph. This will take some time, because we make a REST call for each noun phrase.

In [6]:
w = {}
for x in all_titles:
    for noun in TextBlob(x).noun_phrases:
        terms = query(noun)  # query() already URL-encodes its argument
        for term in [u for u in terms.keys() if terms[u]>0.1]:
            if term in w:
                w[term].append(x)
            else:
                w[term]=[x]

URLError: <urlopen error [Errno 60] Operation timed out>
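
Because each noun phrase triggers a REST call and the same phrases recur across titles, caching the lookups can cut the number of requests. The following is a sketch under stated assumptions, not the notebook's own method: `group_titles` is a hypothetical helper, and `lookup` stands for any function that maps a phrase to a `{concept: probability}` dictionary, such as `query` above.

```python
from functools import lru_cache

def group_titles(titled_phrases, lookup, threshold=0.1):
    """Group titles by parent concept, calling lookup once per distinct phrase.

    titled_phrases: iterable of (title, list_of_noun_phrases) pairs
    lookup: function phrase -> {concept: probability} (e.g. a REST call)
    """
    cached_lookup = lru_cache(maxsize=None)(lookup)
    groups = {}
    for title, phrases in titled_phrases:
        for phrase in phrases:
            for concept, score in cached_lookup(phrase).items():
                if score > threshold:
                    groups.setdefault(concept, []).append(title)
    return groups
```

It could be called as `group_titles([(t, TextBlob(t).noun_phrases) for t in all_titles], query)`. The cache lives only for one call of `group_titles`, so rerunning the cell repeats the requests.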

In [24]:
{ x:len(w[x]) for x in w.keys() if len(w[x])>3}

{'city': 9,
 'brand': 4,
 'place': 9,
 'town': 4,
 'factor': 4,
 'film': 4,
 'nation': 11,
 'state': 5,
 'person': 4,
 'organization': 5,
 'publication': 10,
 'market': 5,
 'economy': 4,
 'company': 6,
 'newspaper': 6,
 'relationship': 6}

In [27]:
print('\nECONOMY:\n'+'\n'.join(w['economy']))
print('\nNATION:\n'+'\n'.join(w['nation']))
print('\nPERSON:\n'+'\n'.join(w['person']))


ECONOMY:
China searches for victims, flight recorders after first plane crash in 12 years - Reuters
Live updates: Russia stops talks with Japan over sanctions - The Associated Press - en Español
China plane crash – live: Search for survivors continues as witness describes moment flight fell from sky - The Independent
UK prepares to nationalize Russia natural gas giant Gazprom's retail unit - Business Insider

NATION:
‘Clear sign’ Putin considering using chemical weapons in Ukraine, claims President Biden - The Independent
Duchess of Cambridge swaps khaki jungle gear for Vampire's Wife dress on Belize trip - Daily Mail
China searches for victims, flight recorders after first plane crash in 12 years - Reuters
Live updates: Russia stops talks with Japan over sanctions - The Associated Press - en Español
Live updates: Russia stops talks with Japan over sanctions - The Associated Press - en Español
China plane crash – live: Search for survivors continues as witness describes moment flight 