# Setup

## spaCy
* [spacy.cli.download()](https://github.com/explosion/spaCy/blob/master/spacy/cli/download.py)

```
def download(
    model: str,
    direct: bool = False,
    sdist: bool = False,
    *pip_args,  # --target to specify the download directory
)
```

Internally, it invokes `pip install` as a command-line subprocess.

```
def download_model(
    filename: str, user_pip_args: Optional[Sequence[str]] = None
) -> None:
    download_url = about.__download_url__ + "/" + filename
    pip_args = list(user_pip_args) if user_pip_args is not None else []
    cmd = [sys.executable, "-m", "pip", "install"] + pip_args + [download_url]
    run_command(cmd)
```
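To make the command construction concrete, here is a minimal sketch that rebuilds the same argument list without running it. `build_pip_command` and the placeholder base URL are hypothetical names introduced for illustration; spaCy itself reads the base URL from `about.__download_url__`.

```python
import sys
from typing import List, Optional, Sequence

# Hypothetical placeholder; spaCy takes the real value from about.__download_url__.
BASE_URL = "https://example.invalid/models"


def build_pip_command(
    filename: str,
    user_pip_args: Optional[Sequence[str]] = None,
    base_url: str = BASE_URL,
) -> List[str]:
    """Assemble the pip command the way download_model does, without executing it."""
    download_url = base_url + "/" + filename
    pip_args = list(user_pip_args) if user_pip_args is not None else []
    return [sys.executable, "-m", "pip", "install"] + pip_args + [download_url]


# Extra pip arguments such as --target land between "install" and the URL.
cmd = build_pip_command("model.whl", ["--target", "/tmp/models"])
print(cmd[1:])
```

This is why any extra positional argument passed to `download()` (for example `--target`) ends up as an ordinary `pip install` option.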

## NLTK

* [nltk.download()](https://www.nltk.org/api/nltk.downloader.html#nltk.downloader.Downloader.download)

```
def download(
    info_or_id=None,
    download_dir=None,
    quiet=False,
    force=False,
    prefix="[nltk_data] ",
    halt_on_error=True,
    raise_on_error=False,
    print_error_to=sys.stderr,
)
```

In [1]:
%%html
<style>
table {float:left}
</style>

In [2]:
!pip install spacy textacy nltk --quiet




In [3]:
import sys
import re
import json
import pathlib
from itertools import chain

import spacy
from spacy import displacy
from spacy.cli import download
import textacy
import nltk

In [4]:
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/oonisim/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/oonisim/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

# PYTHONPATH

In [5]:
PATH_TO_LIB: str = str(pathlib.Path("../lib").resolve())
if PATH_TO_LIB not in sys.path:
    sys.path.insert(0, PATH_TO_LIB)

In [6]:
%load_ext autoreload
%autoreload 2

from util_spacy import (
    Pipeline
)

# Constant

In [7]:
LANGUAGE: str = "english"
SPACY_MODEL_NAME: str = "en_core_web_sm"

# Language Pipeline

In [8]:
pipeline: Pipeline = Pipeline(model_name=SPACY_MODEL_NAME)
nlp = pipeline.model

In [9]:
# text = "Australian Melissa Georgiou (Melissa Georgiou) moved to Finland over a decade ago to seek happiness in one of the coldest and darkest places on Earth. “One of my favorite things about living here is that it's easy to get close to nature whether you're in a residential area or in the middle of the city,” Melissa said. Originally a teacher, 12 years ago, she switched from the beaches of Sydney to the dark winters and cold lakes of Finland, and has never looked back since. Melissa said, “For Finns, the concept of happiness is very different from the Australian concept of happiness. Finns, she said, are happy to accept portrayals of themselves as melancholy and stubborn — a popular local saying is, “People who have happiness must hide it.” “The first thing I noticed here is that you don't go to dinners or barbecues, and you don't talk about real estate. No one asks you where you live, what suburb do you live in, where your kids go to school.” The Finns seem quite happy with the status quo, and they don't always seem to want more. Melissa Georgio's Dark Night in Northern Europe Finland was named the happiest country in the world for the sixth year in a row in the “World Happiness Report” released by the United Nations. “The Nordic countries are often countries with (good) unemployment benefits, pensions, and other benefits,” explains happiness expert and researcher Frank Martela (Frank Martela). However, Frank said that Finland's position in the rankings often surprised its own people. “Finns, they're almost outraged because they don't think this can be true. We listen to sad music and hard rock.” “Therefore, happiness is not part of the Finnish self-image.” The other side of Finnish melancholy is a cultural focus on perseverance. 
Frank said it redefines the way Finns view happiness — a concept known as “sisu” — which is part of Finnish culture and is hard to translate directly, but can be understood as will, determination, perseverance, and reason to face adversity. This, he said, is best reflected in Finns' favorite pastime — getting warm in a sauna after taking a bath in freezing temperatures. “It's about this paradox — from one extreme to the other, and it's a pretty fun experience... because you need perseverance.” Melissa said, but Finland has many things that are great and can provide happiness for people in this country. Finland is one of the European countries least affected by the COVID-19 pandemic, and experts attribute this to a high level of trust in the government and little resistance to complying with restrictions. Trust in government, on the other hand, stems from a country's investment in its citizens. The public school system rarely tests children and is one of the best in the world. Finland also has a universal health care system with affordable childcare and strong support for parents. Melissa said, “The whole country is taking care of the child's development. This system is very well set up. So from having my son to raising him at home, to sending him to daycare, to going to school, I was very well supported in every aspect of this.” Finland vs. China, which country has the strongest sense of happiness? Since the publication of the World Happiness Report, the Nordic countries have dominated the top ten. In this year's report, Finland and its neighbors Denmark (2nd), Iceland (3rd), Sweden (6th), and Norway (7th) all scored high on happiness indicators, including healthy life expectancy, per capita GDP, low levels of corruption, social support, freedom, trust, and generosity. Other top ten countries/regions include the Netherlands (5th), Switzerland (8th), Luxembourg (9th), and New Zealand (10th). 
Australia ranked 12th in this report, followed by Canada (13th), Ireland (14th), and the United States (15th). In Asia, Singapore ranked 25th in the world, up two places from last year, Taiwan dropped one place from last year to 27th, Japan rose to 47th, mainland China ranked 64th, and Hong Kong ranked 82nd. Meanwhile, the polling agency Ipsos Group (Ipsos) released a survey report on the global happiness index. The results showed that out of 32 countries, the country with the highest happiness index was China (91%), followed by Saudi Arabia (86%), the Netherlands (85%), India (84%), and Brazil (83%). Australia ranked 9th in this report. According to the survey report, on average, happiness increased more significantly in middle-income countries (as defined by the World Bank) than in high-income countries."
text = """
Millions of Turks are headed to the polls Sunday in what is set to be Turkey’s most consequential election in two decades, and one whose results will have implications far beyond its own borders.

The country of 85 million holds both its presidential and parliamentary elections on May 14. For the presidency — which is expected to be close — if no candidate wins more than 50%, the vote goes to a run-off two weeks later.

Incumbent President Recep Tayyip Erdogan is facing his toughest test yet after two decades in power, grappling with public anger over worsening economic conditions and the slow government response to a series of devastating earthquakes in February that killed more than 50,000 people.

His primary opponent, 74-year-old Kemal Kilicdaroglu of the center-left Republican People’s Party (CHP), is running as a unity candidate representing six different parties that all want to see Erdogan out of power.  

In a possibly game-changing development, one of the four presidential candidates, Muharrem Ince, pulled out of the race Thursday. A former CHP member, he had been under heavy criticism for splitting the opposition vote in a way that would hurt Kilicdaroglu’s chances.

Now, with Ince out of the race, his votes may go to Erdogan’s top challenger Kilicdaroglu, helping him tremendously and spelling more trouble for the 69-year-old Erdogan.

Another crucial factor will be turnout: More than 5 million young Turks will be voting for the first time, and the greater the youth turnout, the better for the challenger candidate and the worse for the incumbent, election analysts say.

With such a high-stakes contest, many inside and out of the country are asking whether Erdogan may dispute the result if he does not win.

“The most likely tactics that he’s going to use to try to tip the vote will be to use influence in the electoral board (the YSK), courts, and media to build a narrative that either elections should be re-run or that they are illegitimate,” said Ryan Bohl, a senior Middle East and North Africa analyst at Rane. Erdogan did this in 2019 when his party narrowly lost the Istanbul mayoral race, only to lose again by a greater margin after demanding a re-run.

Some even fear violence and instability if the result is disputed, which would bring more volatility to Turkey’s already damaged economy. Turkish and foreign analysts and rights activists have for years been sounding the alarm over increasingly autocratic governance coming from Erdogan’s administration.

CNBC has reached out to the Turkish Presidency’s office for comment.

‘So much at stake’
The election’s outcome and its impact on stability in the country, which sits as a crossroads between Europe and Asia and is home to NATO’s second-largest military, is of paramount importance both domestically and internationally.

“There is so much at stake for President Erdogan and his AKP (Justice and Development Party) for the first time, as his 20-year rule over Türkiye may come to an end given the unified opposition has managed to maintain a strong alliance and stay on a hope-building positive campaign,” said Hakan Akbas, managing director of consulting firm Strategic Advisory Services based between Istanbul and Washington.

This is similar, he noted, to “what Istanbul Mayor Emrak Imamoglu did to win twice against Erdogan’s AKP candidate in the mayoral election in 2019.”

Imamoglu, a popular figure who was widely expected to run for the presidency as a formidable opponent to Erdogan, was in December sentenced to nearly three years in prison and barred from politics for what a court described as insulting the judges of the Supreme Election Council (YSK). Imamoglu and his supporters say the charges are purely political and were influenced by Erdogan and his party to sabotage his political ambitions. 
"""

In [10]:
text = re.sub(r'[\s"“]+', ' ', text).strip()
text

'Millions of Turks are headed to the polls Sunday in what is set to be Turkey’s most consequential election in two decades, and one whose results will have implications far beyond its own borders. The country of 85 million holds both its presidential and parliamentary elections on May 14. For the presidency — which is expected to be close — if no candidate wins more than 50%, the vote goes to a run-off two weeks later. Incumbent President Recep Tayyip Erdogan is facing his toughest test yet after two decades in power, grappling with public anger over worsening economic conditions and the slow government response to a series of devastating earthquakes in February that killed more than 50,000 people. His primary opponent, 74-year-old Kemal Kilicdaroglu of the center-left Republican People’s Party (CHP), is running as a unity candidate representing six different parties that all want to see Erdogan out of power. In a possibly game-changing development, one of the four presidential candida
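As a small illustration of what the substitution above does: the character class `[\s"“]` matches whitespace, the straight double quote, and the opening curly quote, so runs of those collapse to a single space, while the closing curly quote `”` is left in place. The sketch below uses only the standard library; `normalize` and its `strip_closing_quote` flag are hypothetical additions, not part of the notebook's pipeline.

```python
import re


def normalize(text: str, strip_closing_quote: bool = False) -> str:
    """Collapse whitespace and double quotes into single spaces."""
    # The notebook's pattern omits the closing curly quote; the flag adds it.
    pattern = r'[\s"“”]+' if strip_closing_quote else r'[\s"“]+'
    return re.sub(pattern, ' ', text).strip()


sample = 'He said,\n\n“It  is\tclose.”  "Maybe."'
print(normalize(sample))
print(normalize(sample, strip_closing_quote=True))
```

If the leftover `”` characters matter downstream (for example when matching entity text), adding `”` to the character class is one option.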

# Process Document

Put the text through the pretrained language pipeline.

In [11]:
doc = pipeline.process(text=text)

# Dependencies

In [12]:
print("{}{}{}{}{}{}".format(
    "Token".ljust(15), "start".ljust(7), "POS".ljust(10), "Tag".ljust(8), "Dependency".ljust(14), "Head"
))
print("-" * 80)
for token in doc:
    print(f"{token.text:15}{token.idx:<7}{token.pos_:10}{token.tag_:8}{token.dep_:14}{token.head.text}")

Token          start  POS       Tag     Dependency    Head
--------------------------------------------------------------------------------
Millions       0      NOUN      NNS     nsubjpass     headed
of             9      ADP       IN      prep          Millions
Turks          12     PROPN     NNPS    pobj          of
are            18     AUX       VBP     auxpass       headed
headed         22     VERB      VBN     ROOT          headed
to             29     ADP       IN      prep          headed
the            32     DET       DT      det           polls
polls          36     NOUN      NNS     pobj          to
Sunday         42     PROPN     NNP     npadvmod      headed
in             49     ADP       IN      prep          headed
what           52     PRON      WP      nsubjpass     set
is             57     AUX       VBZ     auxpass       set
set            60     VERB      VBN     pcomp         in
to             64     PART      TO      aux           be
be             67     AUX      

In [13]:
displacy.serve(doc, style="dep", auto_select_port=True, page=False)




Using the 'dep' visualizer
Serving on http://0.0.0.0:5001 ...

Shutting down server on port 5001.


----
# Entities

The pipeline's NER component has already tagged the named entities; they are available on the document object as `doc.ents`.

* [Named Entities](https://spacy.io/usage/spacy-101#annotations-ner)

### Entity Labels
```
PERSON:      People, including fictional.
NORP:        Nationalities or religious or political groups.
FAC:         Buildings, airports, highways, bridges, etc.
ORG:         Companies, agencies, institutions, etc.
GPE:         Countries, cities, states.
LOC:         Non-GPE locations, mountain ranges, bodies of water.
PRODUCT:     Objects, vehicles, foods, etc. (Not services.)
EVENT:       Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART: Titles of books, songs, etc.
LAW:         Named documents made into laws.
LANGUAGE:    Any named language.
DATE:        Absolute or relative dates or periods.
TIME:        Times smaller than a day.
PERCENT:     Percentage, including “%”.
MONEY:       Monetary values, including unit.
QUANTITY:    Measurements, as of weight or distance.
ORDINAL:     “first”, “second”, etc.
CARDINAL:    Numerals that do not fall under another type.
```

In [14]:
pipeline.named_entity_labels

('CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART')

In [15]:
displacy.serve(doc, style="ent", auto_select_port=True, page=False)


Using the 'ent' visualizer
Serving on http://0.0.0.0:5001 ...

Shutting down server on port 5001.


In [16]:
print("{}{}{}{}".format(
    "Entity".ljust(50), "Start".ljust(7), "End".ljust(7), "Label".ljust(10)
))
print("-" * 80)
for ent in doc.ents:
    print(f"{ent.text:50}{ent.start_char:<7}{ent.end_char:<7}{ent.label_:10}")

Entity                                            Start  End    Label     
--------------------------------------------------------------------------------
Millions                                          0      8      CARDINAL  
Turks                                             12     17     NORP      
Sunday                                            42     48     DATE      
Turkey                                            70     76     GPE       
two decades                                       110    121    DATE      
85 million                                        211    221    CARDINAL  
May 14                                            281    287    DATE      
more than 50%                                     363    376    PERCENT   
two weeks later                                   405    420    DATE      
Recep Tayyip Erdogan                              442    462    PERSON    
two decades                                       501    512    DATE      
February           

## Entities per label

In [17]:
print(json.dumps(
    pipeline.get_named_entities_from_text(
        text=text, 
        excludes=["ORDINAL", "CARDINAL", "PERCENT", "DATE", "TIME", "QUANTITY", "MONEY"],
        remove_special_characters=True,
        return_value_only=True,
        remove_similarity_threshold=0.9,
        include_noun_phrases=False,
        include_keywords=False
    ), 
    indent=4, 
    default=str,
    ensure_ascii=False
))

{
    "EVENT": [],
    "FAC": [],
    "GPE": [
        "Turkey",
        "Washington",
        "Kilicdaroglu",
        "Türkiye",
        "Istanbul",
        "North Africa"
    ],
    "LANGUAGE": [],
    "LAW": [],
    "LOC": [
        "Asia",
        "Middle East",
        "Europe"
    ],
    "NORP": [
        "Turkish",
        "Turks"
    ],
    "ORG": [
        "AKP",
        "CHP",
        "Supreme Election Council",
        "Kilicdaroglu",
        "electoral board",
        "Development Party",
        "NATO",
        "Republican Peoples Party",
        "Justice",
        "Erdogan",
        "CNBC",
        "Strategic Advisory Services"
    ],
    "PERSON": [
        "Ryan Bohl",
        "Kemal Kilicdaroglu",
        "Recep Tayyip Erdogan",
        "Imamoglu",
        "Hakan Akbas",
        "Emrak Imamoglu",
        "Rane",
        "Erdogan",
        "Muharrem Ince"
    ],
    "PRODUCT": [],
    "WORK_OF_ART": []
}
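The implementation of `get_named_entities_from_text` lives in `util_spacy` and is not shown here. As a rough sketch of what grouping by label, excluding labels, and dropping near-duplicates could look like, the following uses plain `(text, label)` tuples and `difflib.SequenceMatcher` as a stand-in similarity measure. Both the function `group_entities` and the choice of similarity measure are assumptions made for illustration, not the notebook's actual method.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Sequence, Tuple


def group_entities(
    entities: Iterable[Tuple[str, str]],
    excludes: Sequence[str] = (),
    similarity_threshold: float = 0.9,
) -> Dict[str, List[str]]:
    """Group entity texts by label, skipping excluded labels and near-duplicates."""
    excluded = {label.upper() for label in excludes}  # compare labels case-insensitively
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label in entities:
        if label.upper() in excluded:
            continue
        kept = grouped[label]
        # Assumed similarity measure: character-level ratio from difflib.
        if any(
            SequenceMatcher(None, text.lower(), other.lower()).ratio() >= similarity_threshold
            for other in kept
        ):
            continue
        kept.append(text)
    return dict(grouped)


entities = [
    ("Turkey", "GPE"),
    ("Recep Tayyip Erdogan", "PERSON"),
    ("May 14", "DATE"),
    ("Erdogan", "PERSON"),
    ("Turkey", "GPE"),
    ("two decades", "DATE"),
]
result = group_entities(entities, excludes=["DATE", "time"])
print(result)
```

With a character-level ratio, `"Erdogan"` and `"Recep Tayyip Erdogan"` stay separate at a 0.9 threshold, which is consistent with both appearing under `PERSON` in the output above.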


---
# Noun Phrases

* [Noun chunks](https://spacy.io/usage/linguistic-features#noun-chunks)

> **Noun chunks** are “base noun phrases” – flat phrases that **have a noun as their head**. You can think of noun chunks as a noun plus the words describing the noun – for example, “the lavish green grass” or “the world’s largest tech fund”. 

In [18]:
for sentence in doc.sents:
    print("-" * 80)
    print(sentence)
    print("-" * 80)

    # Span.noun_chunks yields only the noun chunks inside this sentence,
    # so no chunk is skipped at a sentence boundary.
    for phrase in sentence.noun_chunks:
        print(
            f"phrase:[{phrase.text:30}] "
            f"root:[{phrase.root.text:12}] "
            f"type:[{phrase.root.dep_:6}] "
            f"parent:[{phrase.root.head}]"
        )

--------------------------------------------------------------------------------
Millions of Turks are headed to the polls Sunday in what is set to be Turkey’s most consequential election in two decades, and one whose results will have implications far beyond its own borders.
--------------------------------------------------------------------------------
phrase:[Millions                      ] root:[Millions    ] type:[nsubjpass] parent:[headed]
phrase:[Turks                         ] root:[Turks       ] type:[pobj  ] parent:[of]
phrase:[the polls                     ] root:[polls       ] type:[pobj  ] parent:[to]
phrase:[what                          ] root:[what        ] type:[nsubjpass] parent:[set]
phrase:[Turkey’s most consequential election] root:[election    ] type:[attr  ] parent:[be]
phrase:[two decades                   ] root:[decades     ] type:[pobj  ] parent:[in]
phrase:[whose results                 ] root:[results     ] type:[nsubj ] parent:[have]
phrase:[implications 

In [19]:
for chunk in doc.noun_chunks:
    print(
        f"{chunk.text:32} "
        f"tag:{chunk.root.dep_:10} "
        f"root:{chunk.root.head.text:10} "
        f"children:{list(chunk.root.children)}"
    )

Millions                         tag:nsubjpass  root:headed     children:[of]
Turks                            tag:pobj       root:of         children:[]
the polls                        tag:pobj       root:to         children:[the]
what                             tag:nsubjpass  root:set        children:[]
Turkey’s most consequential election tag:attr       root:be         children:[Turkey, consequential, in]
two decades                      tag:pobj       root:in         children:[two]
whose results                    tag:nsubj      root:have       children:[whose]
implications                     tag:dobj       root:have       children:[beyond]
its own borders                  tag:pobj       root:beyond     children:[its, own]
The country                      tag:nsubj      root:holds      children:[The, of]
both its presidential and parliamentary elections tag:dobj       root:holds      children:[both, its, presidential]
May                              tag:pobj       root:on      

In [20]:
print("{}{}{}{}{}{}".format(
    "Token".ljust(15), "Lemma".ljust(15), "POS".ljust(10), "Tag".ljust(8), "Dependency".ljust(14), "StopWord"
))
print("-" * 80)
for token in doc:
    if (not token.is_stop) and len(token.text) > 1:
        print(f"{token.text:15}{token.lemma_:15}{token.pos_:10}{token.tag_:8}{token.dep_:14}{token.is_stop:}")

Token          Lemma          POS       Tag     Dependency    StopWord
--------------------------------------------------------------------------------
Millions       million        NOUN      NNS     nsubjpass     False
Turks          Turks          PROPN     NNPS    pobj          False
headed         head           VERB      VBN     ROOT          False
polls          poll           NOUN      NNS     pobj          False
Sunday         Sunday         PROPN     NNP     npadvmod      False
set            set            VERB      VBN     pcomp         False
Turkey         Turkey         PROPN     NNP     poss          False
consequential  consequential  ADJ       JJ      amod          False
election       election       NOUN      NN      attr          False
decades        decade         NOUN      NNS     pobj          False
results        result         NOUN      NNS     nsubj         False
implications   implication    NOUN      NNS     dobj          False
far            far            AD

---

In [21]:
pipeline.get_noun_phrases_from_document(
    doc=doc,
    remove_special_characters=False,
    remove_stopwords=False,
)

['the polls',
 'Turkey’s most consequential election',
 'two decades',
 'whose results',
 'its own borders',
 'The country',
 'both its presidential and parliamentary elections',
 'the presidency',
 'no candidate',
 'more than 50%',
 'the vote',
 'Incumbent President Recep Tayyip Erdogan',
 'his toughest test',
 'two decades',
 'public anger',
 'economic conditions',
 'the slow government response',
 'a series',
 'devastating earthquakes',
 'more than 50,000 people',
 'His primary opponent',
 '74-year-old Kemal Kilicdaroglu',
 'the center-left Republican People’s Party',
 'a unity candidate',
 'six different parties',
 'a possibly game-changing development',
 'the four presidential candidates',
 'Muharrem Ince',
 'the race',
 'heavy criticism',
 'the opposition vote',
 'a way',
 'Kilicdaroglu’s chances',
 'the race',
 'his votes',
 'Erdogan’s top challenger Kilicdaroglu',
 'more trouble',
 'the 69-year-old Erdogan',
 'Another crucial factor',
 'More than 5 million young Turks',
 'the f

# Noun Phrase Entities

Find noun phrases that contain an identified named entity and list them under that entity's label.
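One way to picture the matching is by character offsets: a noun phrase qualifies when an entity's span lies inside the phrase's span and the phrase is strictly wider than the entity. The sketch below works on plain strings with the standard library only; `find_span` and `phrases_containing_entities` are hypothetical helpers, and the real logic in `util_spacy` may differ.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def find_span(text: str, phrase: str) -> Span:
    """Return the (start, end) character offsets of the first occurrence of phrase."""
    start = text.index(phrase)
    return start, start + len(phrase)


def phrases_containing_entities(
    text: str, noun_phrases: List[str], entities: List[str]
) -> List[str]:
    """Keep the noun phrases whose span strictly contains at least one entity span."""
    entity_spans = [find_span(text, entity) for entity in entities]
    selected = []
    for phrase in noun_phrases:
        p_start, p_end = find_span(text, phrase)
        if any(
            p_start <= e_start and e_end <= p_end and (e_start, e_end) != (p_start, p_end)
            for e_start, e_end in entity_spans
        ):
            selected.append(phrase)
    return selected


sentence = "Incumbent President Recep Tayyip Erdogan is facing his toughest test yet."
noun_phrases = ["Incumbent President Recep Tayyip Erdogan", "his toughest test"]
entities = ["Recep Tayyip Erdogan"]
matches = phrases_containing_entities(sentence, noun_phrases, entities)
print(matches)
```

With spaCy objects the same comparison could be made on `start_char` and `end_char`, which both entities and noun chunks expose.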

In [22]:
print(json.dumps(
    pipeline.get_named_entities_from_text(
        text=text, 
        excludes=["ORDINAL", "CARDINAL", "PERCENT", "DATE", "TIME", "QUANTITY", "MONEY"],
        remove_special_characters=True,
        return_value_only=True,
        remove_similarity_threshold=0.9,
        include_noun_phrases=True,
        include_keywords=False
    ), 
    indent=4, 
    default=str,
    ensure_ascii=False
))

{
    "EVENT": [],
    "FAC": [],
    "GPE": [
        "Turkey",
        "Washington",
        "Kilicdaroglu",
        "Türkiye",
        "Istanbul",
        "North Africa"
    ],
    "LANGUAGE": [],
    "LAW": [],
    "LOC": [
        "Asia",
        "Middle East",
        "Europe"
    ],
    "NORP": [
        "Turkish",
        "Turks"
    ],
    "ORG": [
        "AKP",
        "CHP",
        "Supreme Election Council",
        "Kilicdaroglu",
        "electoral board",
        "Development Party",
        "NATO",
        "Republican Peoples Party",
        "Justice",
        "Erdogan",
        "CNBC",
        "Strategic Advisory Services",
        "the electoral board"
    ],
    "PERSON": [
        "Ryan Bohl",
        "Kemal Kilicdaroglu",
        "Recep Tayyip Erdogan",
        "Imamoglu",
        "Hakan Akbas",
        "Emrak Imamoglu",
        "Rane",
        "Erdogan",
        "Muharrem Ince",
        "Incumbent President Recep Tayyip Erdogan"
    ],
    "PRODUCT": [],
    "WO

---
# Keyword Extraction

* [textacy - Information Extraction](https://textacy.readthedocs.io/en/latest/api_reference/extract.html)

In [23]:
print(json.dumps(
    pipeline.get_named_entities_from_text(
        text=text, 
        excludes=["ORDINAL", "CARDINAL", "PERCENT", "DATE", "TIME", "QUANTITY", "MONEY"],
        remove_special_characters=True,
        return_value_only=True,
        remove_similarity_threshold=0.9,
        include_noun_phrases=False,
        include_keywords=True
    ), 
    indent=4, 
    default=str,
    ensure_ascii=False
))

{
    "EVENT": [],
    "FAC": [],
    "GPE": [
        "Turkey",
        "Washington",
        "Kilicdaroglu",
        "Türkiye",
        "Istanbul",
        "North Africa"
    ],
    "LANGUAGE": [],
    "LAW": [],
    "LOC": [
        "Asia",
        "Middle East",
        "Europe"
    ],
    "NORP": [
        "Turkish",
        "Turks"
    ],
    "ORG": [
        "AKP",
        "CHP",
        "Supreme Election Council",
        "Kilicdaroglu",
        "electoral board",
        "Development Party",
        "NATO",
        "Republican Peoples Party",
        "Justice",
        "Erdogan",
        "CNBC",
        "Strategic Advisory Services"
    ],
    "PERSON": [
        "Ryan Bohl",
        "Kemal Kilicdaroglu",
        "Recep Tayyip Erdogan",
        "Imamoglu",
        "Hakan Akbas",
        "Emrak Imamoglu",
        "Rane",
        "Erdogan",
        "Muharrem Ince"
    ],
    "PRODUCT": [],
    "WORK_OF_ART": [],
    "KEYWORDS": [
        "incumbent President Recep Tayyip Erdogan

---
# Synonyms



In [24]:
del pipeline, doc

from nltk.corpus import (
    wordnet
)

In [25]:
def get_synonyms(text: str, pos=None):
    """Get the WordNet synonyms of a word, optionally restricted to one part of speech.

    "speed" can be a verb or a noun. To get only the verb synonyms, set pos='v'.
    
    Args:
        text: text to find the synonyms
        pos: part of speech tag (n=noun, v=verb, a=adjective, r=adverb)
    Returns: set of synonyms
    """
    text = "_".join(text.split())
    synonyms = wordnet.synsets(text, pos=pos)
    lemmas = set(chain.from_iterable([word.lemma_names() for word in synonyms]))
    return lemmas - {text}

In [26]:
get_synonyms(text="speed", pos="v")

{'accelerate',
 'belt_along',
 'bucket_along',
 'cannonball_along',
 'hasten',
 'hie',
 'hotfoot',
 'hurry',
 'pelt_along',
 'quicken',
 'race',
 'rush',
 'rush_along',
 'speed_up',
 'step_on_it',
 'travel_rapidly',
 'zip'}
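WordNet joins multi-word lemmas with underscores, which is why `get_synonyms` rewrites its input with `"_".join(text.split())` and why results such as `step_on_it` come back underscored. A small sketch of the round trip, using only the standard library; the two helper names are hypothetical.

```python
def to_wordnet_form(text: str) -> str:
    """Join whitespace-separated words with underscores, as WordNet lemma names do."""
    return "_".join(text.split())


def to_plain_text(lemma_name: str) -> str:
    """Turn a WordNet lemma name back into space-separated words."""
    return lemma_name.replace("_", " ")


print(to_wordnet_form("  speed   up "))
print(sorted(to_plain_text(name) for name in {"speed_up", "step_on_it", "zip"}))
```

Converting back to plain text is useful when the synonyms are meant to be matched against ordinary document text.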

## Conversion

Convert a word from one part of speech to another, for example:

* happy -> happiness
* speedy -> speed


In [27]:
# Just to make it a bit more readable
WN_NOUN = 'n'
WN_VERB = 'v'
WN_ADJECTIVE = 'a'
WN_ADJECTIVE_SATELLITE = 's'
WN_ADVERB = 'r'


def convert(word, from_pos, to_pos):
    """Convert a word from one POS to another via WordNet derivationally related forms."""
    synsets = wordnet.synsets(word, pos=from_pos)

    # Word not found
    if not synsets:
        return []

    # Get all lemmas of the word (consider 'a' and 's' equivalent)
    lemmas = []
    for s in synsets:
        for l in s.lemmas():
            if s.name().split('.')[1] == from_pos or from_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and s.name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                lemmas += [l]

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms()) for l in lemmas]

    # filter only the desired pos (consider 'a' and 's' equivalent)
    related_noun_lemmas = []

    for drf in derivationally_related_forms:
        for l in drf[1]:
            if l.synset().name().split('.')[1] == to_pos or to_pos in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE) and l.synset().name().split('.')[1] in (WN_ADJECTIVE, WN_ADJECTIVE_SATELLITE):
                related_noun_lemmas += [l]

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w:-w[1])

    # return all the possibilities sorted by probability
    return result

In [28]:
convert("happy", "a", "n")

[('happiness', 0.6), ('felicitousness', 0.2), ('felicity', 0.2)]
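The "probability" returned by `convert` is a relative frequency: how often each word appears among the derivationally related lemmas, divided by the total number of such lemmas. The sketch below reproduces that ranking step on its own with `collections.Counter`. The input list is an illustrative assumption chosen to be consistent with the output above (the actual lemma list from WordNet is not shown), and `rank_by_frequency` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple


def rank_by_frequency(words: List[str]) -> List[Tuple[str, float]]:
    """Return (word, relative frequency) pairs, most frequent first."""
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    # Sort by descending frequency; break ties alphabetically for a stable order.
    return sorted(
        ((word, count / total) for word, count in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Assumed input: five related lemma names, three of them "happiness".
related = ["happiness", "felicity", "happiness", "felicitousness", "happiness"]
ranking = rank_by_frequency(related)
print(ranking)
```

Unlike the original `sort` over a `set`, the alphabetical tie-break makes the order of equally frequent words deterministic.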

In [33]:
convert("incumbent", "a", "n")

[('incumbency', 1.0)]

In [30]:
convert("speedy", "a", "n")

[('speed', 0.375),
 ('speediness', 0.25),
 ('quickness', 0.125),
 ('rapidity', 0.125),
 ('rapidness', 0.125)]

In [31]:
convert("speedy", "a", "r")

[]