# Tutorial 3: WordNet
(Syn-)Semantic networks such as WordNet are important resources for NLP. In this tutorial you will use basic functionalities of WordNet and of its German counterpart GermaNet. Use the documentation for WordNet (https://www.nltk.org/howto/wordnet.html) and GermaNet (https://germanetpy.readthedocs.io/en/latest/). If you get stuck with the API documentation, consult other resources such as stackoverflow.com or tutorials.

## Task 1: Importing modules and data
### a: Import NLP modules
Import pandas, NumPy, NLTK and re as in the first tutorial, and import WordNet (as wn) from nltk.corpus.
### b: Import "quality-of-life" modules
Some modules are not strictly necessary but make it easier to work with larger corpora. Install pandarallel. As in the first tutorial, import pandarallel (for parallelization in pandas) and initialize it with pandarallel.initialize(). You can then use parallel\_apply() instead of the pandas method apply() for parallelizable tasks.


In [1]:
import re

import nltk
import numpy as np
import pandas as pd
from nltk.corpus import wordnet as wn  # needs the corpus once: nltk.download("wordnet")
from pandarallel import pandarallel  # parallelization for pandas

print("pandas", pd.__version__)
print("numpy", np.__version__)
print("nltk", nltk.__version__)
print("re", re.__version__)

pandarallel.initialize()

pandas 1.3.4
numpy 1.22.3
nltk 3.7
re 2.2.1
INFO: Pandarallel will run on 4 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


### c: Import the data
A dataset of biased words (words that carry bias or a non-objective value judgment) is provided for the WordNet part of this tutorial. Import the dataframe saved as the pickle file "data.pkl".

In [2]:
data = pd.read_pickle("data.pkl")

In [3]:
data 

Unnamed: 0,sentence,topic,Label_bias,biased_words
0,YouTube is making clear there will be no “birtherism” on its platform during this year’s U.S. presidential election – a belated response to a type of conspiracy theory more prevalent in the 2012 race.,elections-2020,Biased,"[belated, birtherism]"
1,The increasingly bitter dispute between American women’s national soccer team and the U.S. Soccer Federation spilled onto the field Wednesday night when players wore their warm-up jerseys inside outin a protest before their 3-1 victory over Japan.,sport,Non-biased,[bitter]
2,"So while there may be a humanitarian crisis driving more vulnerable people to seek asylum in the United States, there is no security crisis.",immigration,Biased,[crisis]
3,"A professor who teaches climate change classes — a subject some would question as a legitimate area of study — said she has seen students who suffer fear, grief, stress, and anxiety about the future.",environment,Non-biased,[legitimate]
4,"Looking around the United States, there is never enough welfare for the left to stop killing developing humans in utero—solidly Democratic states lead the nation in abortion rates.",abortion,Biased,"[killing, never, developing, humans, enough]"
...,...,...,...,...
1695,In every case legislators are being swarmed by right-wing activists who don’t hesitate to use deceit and hysteria to stop Equal Rights Amendment (ERA) ratification from happening.,gender,Biased,"[deceit, hysteria, swarmed, right-wing]"
1696,"Polls show the transgender ideology is deeply unpopular, especially among women and parents.",gender,Biased,"[ideology, unpopular]"
1697,"Democrats and Republicans stood and applauded after Illinois Rep. Rodney Davis, the top Republican on the House Administration Committee, saluted Haaland for making history as the first Native American woman to preside over the chamber.",gender,Non-biased,[saluted]
1698,"As a self-described Democratic socialist, Sen. Bernie Sanders, I-Vt., has been outspoken about economic inequality.",middle-class,Non-biased,[outspoken]


## Task 2: Synsets
Synsets are sets of synonymous words that share one distinguishable meaning (sense); a word with several senses belongs to several synsets. One application is to extend a context with words of equivalent meaning. For the given dataset, we want to capture the synonyms of the biased words in the form of the lemmas of their associated synsets.
### a) Find synsets
Extract the distinct (pairwise different) biased words from the column "biased_words" of the dataset and store them in a separate list "b_words" (not in the dataframe!).

In [4]:
test_l = [["a", "b"], ["c"], ["d", "e"]]

In [5]:
test_l2 = [a for b in test_l for a in b]

In [6]:
test_l2

['a', 'b', 'c', 'd', 'e']

In [7]:
b_words = list(set([a for b in data.biased_words.to_list() for a in b]))

In [8]:
# drop single-character entries, which are not meaningful words
b_words = [a for a in b_words if len(a) > 1]

In [9]:
b_words[:10]

['exploit',
 'conditions',
 'socialists',
 'diverting',
 'ashamed',
 'Democrats',
 'penchant',
 'Islamophobic',
 'transgender',
 'prioritizes']
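Because a `set` has no defined order, `b_words[:10]` may show different words on each run. A minimal sketch of the same flattening and filtering that sorts the result for reproducible output; `biased_words_column` is made-up toy data standing in for `data.biased_words`:

```python
from itertools import chain

# Toy stand-in for data.biased_words: one list of biased words per sentence
biased_words_column = [["belated", "birtherism"], ["bitter"], ["crisis", "bitter"], ["a"]]

# Flatten the nested lists, keep words longer than one character,
# remove duplicates with a set comprehension and sort for a stable order
b_words = sorted(
    {word for word in chain.from_iterable(biased_words_column) if len(word) > 1}
)
print(b_words)  # ['belated', 'birtherism', 'bitter', 'crisis']
```

In the notebook, `biased_words_column` would be `data.biased_words`.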

### b) Find synonyms
For each of the words, determine all synsets and all lemmas belonging to these synsets. Store the distinct lemmas of all biased words in a list "b_words_synonyms".

In [10]:
synonyms = []
for word in b_words:
    for syn in wn.synsets(word):
        for lemma in syn.lemmas():
            synonyms.append(lemma.name())

b_words_synonyms = list(set(synonyms))  # distinct lemma names
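WordNet lemma names keep their case and join multiword expressions with underscores (compare `'domestic_cat'` in the optional example below), so raw names do not always match the tokens in the dataset. A sketch of the same nested collection with an optional normalization step; `toy_wordnet` is a made-up dictionary standing in for `wn.synsets(word)` and `syn.lemmas()`, and the normalization is an assumption, not part of the task:

```python
# Hypothetical stand-in for WordNet: word -> list of synsets, each given by its lemma names
toy_wordnet = {
    "bitter": [["bitter"], ["acrimonious", "bitter"], ["bitter_tasting"]],
    "crisis": [["crisis"], ["crisis"]],
}

def collect_synonyms(words, lookup):
    """Return the distinct, normalized lemma names of all synsets of all words."""
    synonyms = set()
    for word in words:
        for lemma_names in lookup.get(word, []):  # unknown words have no synsets
            for name in lemma_names:
                # multiword lemmas use "_" (e.g. "domestic_cat"); turn them into spaces
                synonyms.add(name.replace("_", " ").lower())
    return sorted(synonyms)

print(collect_synonyms(["bitter", "crisis", "Democrats"], toy_wordnet))
# ['acrimonious', 'bitter', 'bitter tasting', 'crisis']
```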

### Optional:

Hyponyms (subordinate terms) and antonyms:

In [11]:
wn.synsets("cat")[0].hyponyms()[0].lemmas()[0].name()

'domestic_cat'

In [12]:
antonyms = []

for syn in wn.synsets("good"):
    for lemma in syn.lemmas():
        # collect the first antonym of each lemma, if it has one
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())

In [13]:
antonyms

['evil', 'evilness', 'bad', 'badness', 'bad', 'evil', 'ill']
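`hyponyms()` returns only the direct subordinate synsets. NLTK synsets also offer `closure()`, e.g. `wn.synsets("cat")[0].closure(lambda s: s.hyponyms())`, to follow a relation transitively. A minimal breadth-first sketch of what such a closure does, on a made-up toy graph (not real WordNet data):

```python
from collections import deque

# Toy hyponym graph (made-up): synset name -> names of its direct hyponyms
toy_hyponyms = {
    "cat": ["domestic_cat", "wildcat"],
    "domestic_cat": ["kitty", "tom"],
    "wildcat": ["lynx"],
}

def transitive_hyponyms(start, graph):
    """Breadth-first collection of all hyponyms reachable from start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # guard against cycles and duplicates
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order

print(transitive_hyponyms("cat", toy_hyponyms))
# ['domestic_cat', 'wildcat', 'kitty', 'tom', 'lynx']
```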

### c) Candidates for further biased words
The synonyms of the biased words that you have just determined are a good starting point for manually identifying additional biased words. Finally, create a list "new_b_words" that contains all words of "b_words_synonyms" that are not in the original list "b_words". For all three lists, display the number of words they contain.

In [14]:
new_b_words = [a for a in b_words_synonyms if a not in b_words]

In [15]:
print(len(b_words))
print(len(b_words_synonyms))
print(len(new_b_words))

2255
10062
8509
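The comparison above is an exact string match, so inflected or differently capitalized forms (e.g. "democrat" vs. "Democrats") count as new words; lowercasing or lemmatizing both lists first would shrink "new_b_words". Membership tests against a `set` are also much faster than against a list. A small sketch with made-up toy lists:

```python
# Made-up toy lists standing in for b_words and b_words_synonyms
b_words = ["exploit", "Democrats", "bitter"]
b_words_synonyms = ["exploit", "work", "Democrats", "democrat", "bitter", "acrimonious", "work"]

known = set(b_words)  # O(1) average membership tests instead of scanning a list
new_b_words = sorted({w for w in b_words_synonyms if w not in known})

print(len(b_words), len(set(b_words_synonyms)), len(new_b_words))  # 3 6 3
print(new_b_words)  # ['acrimonious', 'democrat', 'work']
```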


# Part 2: GermaNet
GermaNet is a lexical-semantic word network for German developed at the University of Tübingen; it is the German counterpart of WordNet. Although its basic functionalities are largely the same as WordNet's, they have different names in the API. Since you will work more with German datasets from the ESUPOL project later in this course, GermaNet can be a useful tool for semantic analysis.

### a: Install and import GermaNet
GermaNet is not in the public domain; the TH Köln has been granted a license for use in the context of teaching and research. Therefore, use GermaNet only for assignments and projects within the scope of this course. 
Copy the provided folder "germanetpy" into your site-packages folder.
Make sure that germanetpy can be imported. To be safe, you can also (re)install the module via pip:

```python
import sys
!{sys.executable} -m pip install -U germanetpy
```

GermaNet stores its synsets and relations as XML files and its frequency lists as text files, all of which must be placed in a specific location. Follow the instructions of the official API to set up GermaNet correctly:


```python
from pathlib import Path
from germanetpy.germanet import Germanet

# adjust both paths to where the GermaNet data (here: GN_V150) is stored on your machine
data_path = str(Path.home()) + "/germanet/GN_V150/GN_V150_XML"
frequencylist_nouns = str(Path.home()) + "/germanet/GN_V150/FreqLists/noun_freqs_decow14_16.txt"
germanet = Germanet(data_path)
```
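A missing file only shows up later (the noun frequency list is first needed for IC-based similarity), so it can help to check the expected paths up front. A minimal sketch; `germanet_paths` and `missing_paths` are hypothetical helpers, and the directory layout is assumed from the snippet above:

```python
from pathlib import Path

def germanet_paths(base_dir, version="GN_V150"):
    """Build the expected GermaNet data paths (layout assumed from the setup snippet)."""
    root = Path(base_dir) / "germanet" / version
    return {
        "xml": root / f"{version}_XML",
        "noun_freqs": root / "FreqLists" / "noun_freqs_decow14_16.txt",
    }

def missing_paths(paths):
    """Return the names of all expected paths that do not exist on disk."""
    return [name for name, path in paths.items() if not path.exists()]

paths = germanet_paths(Path.home())
print(missing_paths(paths))  # [] once everything is in place
```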


In [16]:
import sys
!{sys.executable} -m pip install -U germanetpy

Requirement already up-to-date: germanetpy in /home/fabian/.local/lib/python3.8/site-packages (0.2.1)


In [17]:
from pathlib import Path
from germanetpy.germanet import Germanet

data_path = str(Path.home()) + "/germanet/GN_V150/GN_V150_XML"
frequencylist_nouns = str(Path.home()) + "/germanet/GN_V150/FreqLists/noun_freqs_decow14_16.txt"
germanet = Germanet(data_path)

Load GermaNet data...: 100%|█████████▉| 99.99999999999996/100 [00:09<00:00, 10.52it/s] 
Load Wictionary data...: 100%|██████████| 100.0/100 [00:00<00:00, 394.06it/s]            
Load Ili records...: 100%|██████████| 100.0/100 [00:00<00:00, 140230.83it/s]


### b: Import data set
Import the file "single_term_suggestions.txt" as a dataframe. The file contains a list of single-word query suggestions from the 2017 federal election dataset presented in the lecture. You can use the pandas function "read_csv".

In [18]:
data_ger = pd.read_csv("single_term_suggestions.txt")

In [19]:
data_ger

Unnamed: 0,suggestion_ger
0,aa
1,aach
2,aalten
3,aarburg
4,aaronn
...,...
3757,zwangsdienst
3758,zwangshypothek
3759,zweibruecken
3760,zwickau


## Task 4: Using GermaNet
### a) Synsets
Determine all synsets for each suggestion (i.e., each row of data_ger) and store the list in a separate column.

In [20]:
data_ger["synsets_ger"] = data_ger["suggestion_ger"].apply(
    lambda word: germanet.get_synsets_by_orthform(word, ignorecase=True)
)

### b) Lexical units
For each row, determine all lemmas (lexical units, i.e. lexunits) of all synsets and store them as a list in a new column.

In [21]:
fussball_synsets = germanet.get_synsets_by_orthform("Fußball")

In [22]:
for a in fussball_synsets[1].lexunits:
    print(a, "\n",a.orthform)

Lexunit(id=l11339, orthform=Fußball, synset_id=s7944) 
 Fußball


In [23]:
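def get_names_synsets(syns):
    """Return the orthographic forms of all lexical units of the given synsets."""
    ret = []
    if not syns:  # None or empty: the suggestion has no synsets
        return ret
    for syn in syns:
        for lexunit in syn.lexunits:
            ret.extend(lexunit.get_all_orthforms())
    return ret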

In [24]:
data_ger["lexunits"] = data_ger.apply(lambda row: get_names_synsets(row["synsets_ger"]), axis=1)

In [25]:
data_ger[23:44]

Unnamed: 0,suggestion_ger,synsets_ger,lexunits
23,academy,[],[]
24,accept,[],[]
25,achtsamkeit,"[Synset(id=s64491, lexunits=Achtsamkeit)]",[Achtsamkeit]
26,adel,"[Synset(id=s24212, lexunits=Adelsgeschlecht, Adel), Synset(id=s32247, lexunits=Adelstitel, Adel, Adelsbezeichnung)]","[Adelsgeschlecht, Adel, Adelstitel, Adel, Adelsbezeichnung]"
27,adidas,[],[]
...,...,...,...
39,afd,"[Synset(id=s142253, lexunits=AfD, Alternative für Deutschland)]","[AfD, Alternative für Deutschland]"
40,affair,[],[]
41,affeln,[],[]
42,affing,[],[]
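Because a word usually belongs to several of its synsets, the lexunits lists contain repetitions (e.g. "Adel" appears twice in row 26). If each lemma is needed only once per row, `dict.fromkeys` removes duplicates while keeping the original order; a short sketch using the lemmas of row 26:

```python
def unique_in_order(items):
    """Drop repeated entries, keeping each item at its first position."""
    return list(dict.fromkeys(items))

lexunits_adel = ["Adelsgeschlecht", "Adel", "Adelstitel", "Adel", "Adelsbezeichnung"]
print(unique_in_order(lexunits_adel))
# ['Adelsgeschlecht', 'Adel', 'Adelstitel', 'Adelsbezeichnung']
```

Applied to the dataframe this would be `data_ger["lexunits"].apply(unique_in_order)`.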


### c) Hypernyms
Semantic networks such as GermaNet describe relations between synsets. Hypernyms (supertypes) and hyponyms (subtypes) can be useful for classifying terms.
Determine all hypernyms for all synsets of each suggestion and store their synsets in a column "hypernyms". Then determine the lemmas of these hypernyms and store them in a separate column. 

In [26]:
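def get_hypernyms_from_list_of_synsets(list_syns):
    """Collect the direct hypernyms of all synsets in list_syns (empty set if there are none)."""
    hypernyms = set()
    for syn in list_syns:
        hypernyms.update(syn.direct_hypernyms)
    return hypernyms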

In [27]:
data_ger["hypernyms"] = data_ger.apply(lambda row: get_hypernyms_from_list_of_synsets(row["synsets_ger"]), axis=1)

In [28]:
data_ger["lexunits_hypernyms"] = data_ger.apply(lambda row: get_names_synsets(row["hypernyms"]), axis=1)
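The task asks for the hypernyms of all synsets of a suggestion, i.e. a union over its synsets. A minimal sketch of that union plus an optional transitive closure that keeps walking up the hierarchy; `ToySynset` is a hypothetical stand-in that only mimics the `direct_hypernyms` attribute of germanetpy synsets, and the hierarchy is made up:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing, so instances can be stored in sets
class ToySynset:
    """Hypothetical stand-in for a GermaNet synset with a direct_hypernyms set."""
    name: str
    direct_hypernyms: set = field(default_factory=set)

def all_direct_hypernyms(synsets):
    """Union of the direct hypernyms of every synset, not just the first one."""
    result = set()
    for syn in synsets or []:  # tolerate None for words without synsets
        result |= syn.direct_hypernyms
    return result

def hypernym_closure(synsets):
    """All ancestors reachable by repeatedly following direct_hypernyms."""
    seen, stack = set(), list(all_direct_hypernyms(synsets))
    while stack:
        syn = stack.pop()
        if syn not in seen:
            seen.add(syn)
            stack.extend(syn.direct_hypernyms)
    return seen

ort = ToySynset("Ort")
stadt = ToySynset("Stadt", {ort})
airline = ToySynset("Fluggesellschaft")
ausscheidung = ToySynset("Ausscheidung")  # made-up hypernym for illustration
aa = [ToySynset("American Airlines", {airline}), ToySynset("Kot", {ausscheidung})]
zwickau = [ToySynset("Zwickau", {stadt})]

print(sorted(s.name for s in all_direct_hypernyms(aa)))  # ['Ausscheidung', 'Fluggesellschaft']
print(sorted(s.name for s in hypernym_closure(zwickau)))  # ['Ort', 'Stadt']
```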

### d) Hyponyms
Proceed as in c, but this time determine the hyponyms of the suggestions and their lemmas.

In [29]:
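def get_hyponyms_from_list_of_synsets(list_syns):
    """Collect the direct hyponyms of all synsets in list_syns (empty set if there are none)."""
    hyponyms = set()
    for syn in list_syns:
        hyponyms.update(syn.direct_hyponyms)
    return hyponyms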

In [30]:
data_ger["hyponyms"] = data_ger.apply(lambda row: get_hyponyms_from_list_of_synsets(row["synsets_ger"]), axis=1)

In [31]:
data_ger["lexunits_hyponyms"] = data_ger.apply(lambda row: get_names_synsets(row["hyponyms"]), axis=1)
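Tasks c) and d) differ only in the relation they follow, so a single helper could serve both. A sketch assuming the relation is passed by attribute name (`"direct_hypernyms"` or `"direct_hyponyms"`, as used in the cells above); the synsets are `SimpleNamespace` stand-ins with made-up relation sets of lemma strings, not real GermaNet objects:

```python
from types import SimpleNamespace

def collect_related(synsets, relation):
    """Union of one relation (given by attribute name) over all synsets."""
    related = set()
    for syn in synsets or []:  # None or empty list -> empty set
        related |= getattr(syn, relation)
    return related

# Made-up stand-ins for two synsets of "Abbruch"
abbruch = [
    SimpleNamespace(direct_hypernyms={"Veränderung"}, direct_hyponyms={"Schulabbruch", "Spielabbruch"}),
    SimpleNamespace(direct_hypernyms=set(), direct_hyponyms={"Bauabbruch"}),
]

print(sorted(collect_related(abbruch, "direct_hyponyms")))   # ['Bauabbruch', 'Schulabbruch', 'Spielabbruch']
print(sorted(collect_related(abbruch, "direct_hypernyms")))  # ['Veränderung']
```

With real synsets, `data_ger["synsets_ger"].apply(lambda s: collect_related(s, "direct_hyponyms"))` would replace the dedicated function.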

In [32]:
data_ger[["suggestion_ger","lexunits_hypernyms", "lexunits_hyponyms", "lexunits"]]

Unnamed: 0,suggestion_ger,lexunits_hypernyms,lexunits_hyponyms,lexunits
0,aa,"[Fluggesellschaft, Luftfahrtgesellschaft, Fluglinie, Airline]",[],"[American Airlines, AA, Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang]"
1,aach,[],[],[]
2,aalten,[],[],[]
3,aarburg,[],[],[]
4,aaronn,[],[],[]
...,...,...,...,...
3757,zwangsdienst,[Dienst],[],[Zwangsdienst]
3758,zwangshypothek,[],[],[]
3759,zweibruecken,[],[],[]
3760,zwickau,[Stadt],[],[Zwickau]


Get the number of distinct hypernym lemmas, hyponym lemmas and synset lemmas.

In [33]:
lexunits_hypernyms = list(set([a for b in data_ger.lexunits_hypernyms for a in b]))
lexunits_hyponyms = list(set([a for b in data_ger.lexunits_hyponyms for a in b]))
lexunits = list(set([a for b in data_ger.lexunits for a in b]))

In [34]:
print(len(lexunits_hypernyms))
print(len(lexunits_hyponyms))
print(len(lexunits))

2026
12543
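`len(set(...))` gives only the number of distinct lemmas. A `collections.Counter` over the flattened lists additionally shows which hypernyms dominate, which can help when choosing keywords for the classification in e). A sketch on a made-up column modeled on the output above:

```python
from collections import Counter
from itertools import chain

# Made-up stand-in for data_ger.lexunits_hypernyms
lexunits_hypernyms_column = [["Fluggesellschaft", "Airline"], [], ["Stadt"], ["Dienst"], ["Stadt"], ["Land", "Staat"]]

counts = Counter(chain.from_iterable(lexunits_hypernyms_column))
print(len(counts))            # 6 distinct hypernym lemmas
print(counts.most_common(1))  # [('Stadt', 2)]
```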


### e) Classification tags with synsets
Semantic networks can be used, for example, to classify terms.
Use the identified hypernyms to find all locations in the dataset. In a "location" column, classify all cities, countries and places as "True" and all other suggestions as "False". Display a sub-dataframe of all locations in the dataset.

In [35]:
def is_word_in_list(liste, words):
    """Return True if any of `words` occurs in `liste`."""
    return any(word in liste for word in words)

In [36]:
location_hypernyms = ["Land", "Dorf", "Stadt", "Ort", "Platz", "Staat", "Bundesland"]
data_ger["city"] = data_ger["lexunits_hypernyms"].apply(
    lambda hypernyms: is_word_in_list(hypernyms, location_hypernyms)
)

In [37]:
data_ger[data_ger["city"]]

Unnamed: 0,suggestion_ger,synsets_ger,lexunits,hypernyms,lexunits_hypernyms,hyponyms,lexunits_hyponyms,city
43,afghanistan,"[Synset(id=s44583, lexunits=Afghanistan)]",[Afghanistan],"{Synset(id=s44177, lexunits=Land, Staat)}","[Land, Staat]",{},[],True
64,albanien,"[Synset(id=s44497, lexunits=Albanien)]",[Albanien],"{Synset(id=s44177, lexunits=Land, Staat)}","[Land, Staat]",{},[],True
69,aleppo,"[Synset(id=s73659, lexunits=Aleppo)]",[Aleppo],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True
99,altenburg,"[Synset(id=s73693, lexunits=Altenburg)]",[Altenburg],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True
121,amorbach,"[Synset(id=s44036, lexunits=Amorbach)]",[Amorbach],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True
...,...,...,...,...,...,...,...,...
3666,wittenberg,"[Synset(id=s44111, lexunits=Wittenberg)]",[Wittenberg],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True
3691,worms,"[Synset(id=s44019, lexunits=Worms)]",[Worms],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True
3702,wuppertal,"[Synset(id=s44061, lexunits=Wuppertal)]",[Wuppertal],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True
3707,xanten,"[Synset(id=s44082, lexunits=Xanten)]",[Xanten],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True
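Matching only direct hypernyms misses places whose direct hypernym is a more specific term (a city filed under "Hauptstadt" rather than "Stadt", for instance), while ambiguous keywords such as "Land" or "Platz" can produce false positives. A sketch that walks a few levels up the hierarchy; the parent map is made up, not real GermaNet data:

```python
# Hypernym lemmas that mark a location (same list as in the cell above)
LOCATION_HYPERNYMS = {"Land", "Dorf", "Stadt", "Ort", "Platz", "Staat", "Bundesland"}

# Made-up parent map (lemma -> direct hypernym lemmas)
toy_parents = {
    "Zwickau": ["Stadt"],
    "Berlin": ["Hauptstadt"],
    "Hauptstadt": ["Stadt"],
    "Zwangsdienst": ["Dienst"],
    "Dienst": ["Tätigkeit"],
}

def is_location(lemma, parents, targets=LOCATION_HYPERNYMS, max_depth=5):
    """True if a location hypernym is reachable within max_depth hypernym steps."""
    frontier = list(parents.get(lemma, []))
    for _ in range(max_depth):
        if any(p in targets for p in frontier):
            return True
        # move one level up: replace each hypernym by its own hypernyms
        frontier = [gp for p in frontier for gp in parents.get(p, [])]
    return False

print(is_location("Zwickau", toy_parents))       # True (direct hypernym Stadt)
print(is_location("Berlin", toy_parents))        # True (via Hauptstadt -> Stadt)
print(is_location("Zwangsdienst", toy_parents))  # False
```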


### BONUS TASK: Semantic Similarity and Relatedness
The relational structure of synsets can be used to infer the similarity or relatedness of two terms of the same word category. Similarity, in turn, can be used for disambiguation, for example. In this dataset, the terms are search suggestions for person-related web searches in which the search term is the name of a politician. The term "abort", for instance, was suggested for at least one politician's name. To identify the relevant synset of such a suggestion, its similarity to the "politician" synset (Synset(id=s34818, lexunits=Politiker, Politikerin)) can be computed.

Proceed as explained in the official GermaNet tutorial (https://github.com/Germanet-sfs/germanetTutorials/tree/master/pythonAPI): for every suggestion, compute the similarity of each of its synsets to the politician synset and store the synset with the highest similarity in a "best_syn" column. Use either path-based or IC-based (information content) similarity, or compute both separately. Finally, export a sub-dataframe containing only the rows whose suggestion has at least 2 synsets.
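The two families of measures rest on simple ideas, sketched here on a made-up single-inheritance taxonomy with invented counts. Path-based similarity is shown as 1 / (1 + path length), the normalization NLTK's `path_similarity` uses; germanetpy's `PathBasedRelatedness` normalizes with `max_len` and `max_depth` instead, so its values differ. Resnik similarity is the information content (IC) of the lowest common subsumer (LCS); germanetpy's `normalize=True` presumably rescales it to [0, 1]. This is an illustration, not the library's implementation:

```python
import math

# Made-up taxonomy (child -> parent) and invented corpus counts
parent = {"Person": "Entität", "Politiker": "Person", "Minister": "Politiker",
          "Ereignis": "Entität", "Abbruch": "Ereignis"}
counts = {"Entität": 0, "Person": 40, "Politiker": 30, "Minister": 10,
          "Ereignis": 15, "Abbruch": 5}

def ancestors(node):
    """Nodes from `node` up to the root, `node` first."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def path_length(a, b):
    """Edges between a and b via their lowest common subsumer, plus the LCS itself."""
    up_a, up_b = ancestors(a), ancestors(b)
    lcs = next(n for n in up_a if n in up_b)
    return up_a.index(lcs) + up_b.index(lcs), lcs

def path_similarity(a, b):
    """1 / (1 + path length): 1.0 for identical concepts, smaller for distant ones."""
    return 1 / (1 + path_length(a, b)[0])

def information_content(node):
    """IC(c) = log(N / freq(c)), where freq(c) counts c and everything below it."""
    total = sum(counts.values())
    subtree = sum(c for n, c in counts.items() if node in ancestors(n))
    return math.log(total / subtree)

def resnik(a, b):
    """Resnik similarity: the information content of the LCS of a and b."""
    return information_content(path_length(a, b)[1])

# Candidate synsets of a hypothetical ambiguous suggestion: keep the one closest to "Politiker"
candidates = ["Minister", "Abbruch"]
best = max(candidates, key=lambda s: resnik(s, "Politiker"))
print(path_similarity("Minister", "Politiker"), path_similarity("Abbruch", "Politiker"))  # 0.5 0.2
print(round(resnik("Minister", "Politiker"), 3), best)  # 0.916 Minister
```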

In [53]:
data_ger["synsets_check"] = data_ger["synsets_ger"].apply(len)  # number of synsets per suggestion

In [54]:
data_ger

Unnamed: 0,suggestion_ger,synsets_ger,lexunits,hypernyms,lexunits_hypernyms,hyponyms,lexunits_hyponyms,city,synsets_check
0,aa,"[Synset(id=s23506, lexunits=American Airlines, AA), Synset(id=s26358, lexunits=Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang)]","[American Airlines, AA, Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang]","{Synset(id=s23493, lexunits=Fluggesellschaft, Luftfahrtgesellschaft, Fluglinie, Airline)}","[Fluggesellschaft, Luftfahrtgesellschaft, Fluglinie, Airline]",{},[],False,2
1,aach,[],[],,[],,[],False,0
2,aalten,[],[],,[],,[],False,0
3,aarburg,[],[],,[],,[],False,0
4,aaronn,[],[],,[],,[],False,0
...,...,...,...,...,...,...,...,...,...
3757,zwangsdienst,"[Synset(id=s131522, lexunits=Zwangsdienst)]",[Zwangsdienst],"{Synset(id=s19797, lexunits=Dienst)}",[Dienst],{},[],False,1
3758,zwangshypothek,[],[],,[],,[],False,0
3759,zweibruecken,[],[],,[],,[],False,0
3760,zwickau,"[Synset(id=s44112, lexunits=Zwickau)]",[Zwickau],"{Synset(id=s43645, lexunits=Stadt)}",[Stadt],{},[],True,1


In [55]:
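# .copy() makes data_bonus an independent dataframe, so adding columns later
# does not trigger a SettingWithCopyWarning
data_bonus = data_ger.loc[data_ger["synsets_check"] > 1].copy()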

In [56]:
data_bonus

Unnamed: 0,suggestion_ger,synsets_ger,lexunits,hypernyms,lexunits_hypernyms,hyponyms,lexunits_hyponyms,city,synsets_check
0,aa,"[Synset(id=s23506, lexunits=American Airlines, AA), Synset(id=s26358, lexunits=Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang)]","[American Airlines, AA, Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang]","{Synset(id=s23493, lexunits=Fluggesellschaft, Luftfahrtgesellschaft, Fluglinie, Airline)}","[Fluggesellschaft, Luftfahrtgesellschaft, Fluglinie, Airline]",{},[],False,2
6,abbruch,"[Synset(id=s22422, lexunits=Abbruch, Beendigung, Beenden, Aufhören, Beendung), Synset(id=s106337, lexunits=Abbruch), Synset(id=s106285, lexunits=Abbruch), Synset(id=s106286, lexunits=Abbruch), Synset(id=s21303, lexunits=Abriss, Abbruch, Niederreißung)]","[Abbruch, Beendigung, Beenden, Aufhören, Beendung, Abbruch, Abbruch, Abbruch, Abriß, Abriss, Abbruch, Niederreißung]","{Synset(id=s21961, lexunits=Veränderung, Änderung)}","[Veränderung, Änderung]","{Synset(id=s121482, lexunits=Schulabbruch), Synset(id=s22427, lexunits=Ausschaltung, Ausschalten), Synset(id=s113518, lexunits=Bauabbruch), Synset(id=s72930, lexunits=Ablauf), Synset(id=s117289, lexunits=Startabbruch), Synset(id=s22429, lexunits=Abkehr, Abwendung), Synset(id=s100335, lexunits=Vertragsbeendigung), Synset(id=s123548, lexunits=Verbindungsabbruch), Synset(id=s22430, lexunits=Einstellung), Synset(id=s22517, lexunits=Vollendung), Synset(id=s22518, lexunits=Aufkündigung, Kündigung, Vertragsauflösung, Vertragskündigung, Vertragsaufhebung), Synset(id=s22432, lexunits=Schließung), Synset(id=s140469, lexunits=Fastenbrechen), Synset(id=s149250, lexunits=Aufenthaltsbeendigung), Synset(id=s22433, lexunits=Auflösung), Synset(id=s147580, lexunits=Kontaktabbruch), Synset(id=s92201, lexunits=Ausblendung), Synset(id=s101983, lexunits=Spielabbruch), Synset(id=s22435, lexunits=Niederlegung), Synset(id=s22436, lexunits=Entlassen, Entlassung), Synset(id=s106452, lexunits=Aufhebung), Synset(id=s112714, lexunits=Studienabbruch), Synset(id=s68923, lexunits=Endanflug), Synset(id=s22439, lexunits=Ableistung), Synset(id=s22423, lexunits=Niederschlagung), Synset(id=s22424, lexunits=Niederschlagung), Synset(id=s51777, lexunits=Ausstieg, Ausstiegsphase), Synset(id=s22425, lexunits=Abschaffung, Abschaffen)}","[Schulabbruch, Ausschaltung, Ausschalten, Bauabbruch, Ablauf, Startabbruch, Abkehr, Abwendung, Vertragsbeendigung, Verbindungsabbruch, Einstellung, Vollendung, Aufkündigung, Kündigung, Vertragsauflösung, 
Vertragskündigung, Vertragsaufhebung, Schließung, Fastenbrechen, Aufenthaltsbeendigung, Auflösung, Kontaktabbruch, Ausblendung, Spielabbruch, Niederlegung, Entlassen, Entlassung, Aufhebung, Studienabbruch, Endanflug, Ableistung, Niederschlagung, Niederschlagung, Ausstieg, Ausstiegsphase, Abschaffung, Abschaffen]",False,5
15,abnehmen,"[Synset(id=s53836, lexunits=abnehmen), Synset(id=s53908, lexunits=abchecken, abnehmen, begutachten, checken), Synset(id=s53807, lexunits=abnehmen), Synset(id=s56878, lexunits=abnehmen, herunternehmen, runternehmen), Synset(id=s60512, lexunits=abnehmen), Synset(id=s54392, lexunits=abnehmen), Synset(id=s54723, lexunits=abnehmen, abkaufen), Synset(id=s60521, lexunits=abnehmen), Synset(id=s52508, lexunits=wegnehmen, abnehmen, fortnehmen), Synset(id=s59787, lexunits=abnehmen)]","[abnehmen, abchecken, abnehmen, begutachten, checken, abnehmen, abnehmen, herunternehmen, runternehmen, abnehmen, abnehmen, abnehmen, abkaufen, abnehmen, wegnehmen, abnehmen, fortnehmen, abnehmen]","{Synset(id=s53835, lexunits=helfen)}",[helfen],{},[],False,10
18,abschied,"[Synset(id=s105682, lexunits=Abschied), Synset(id=s17481, lexunits=Abschied, Lebewohl, Verabschiedung)]","[Abschied, Abschied, Lebewohl, Verabschiedung]","{Synset(id=s22436, lexunits=Entlassen, Entlassung)}","[Entlassen, Entlassung]",{},[],False,2
26,adel,"[Synset(id=s24212, lexunits=Adelsgeschlecht, Adel), Synset(id=s32247, lexunits=Adelstitel, Adel, Adelsbezeichnung)]","[Adelsgeschlecht, Adel, Adelstitel, Adel, Adelsbezeichnung]","{Synset(id=s24202, lexunits=Haus, Geschlecht, Dynastie, Familiendynastie, Familienclan)}","[Haus, Geschlecht, Dynastie, Familiendynastie, Familienclan]","{Synset(id=s93207, lexunits=Dienstadel), Synset(id=s136589, lexunits=Kriegeradel), Synset(id=s24215, lexunits=Hochadel), Synset(id=s111596, lexunits=Kleinadel), Synset(id=s65784, lexunits=Edelleute), Synset(id=s65171, lexunits=Ortsadel), Synset(id=s92515, lexunits=Gentry), Synset(id=s63065, lexunits=Uradel), Synset(id=s24216, lexunits=Bauernadel, Landadel), Synset(id=s87133, lexunits=Hofadel), Synset(id=s101528, lexunits=Erbadel), Synset(id=s122751, lexunits=Feudaladel), Synset(id=s24214, lexunits=Noblesse), Synset(id=s102967, lexunits=Reichsadel), Synset(id=s24217, lexunits=Stadtadel), Synset(id=s76879, lexunits=Amtsadel)}","[Dienstadel, Kriegeradel, Hochadel, Kleinadel, Edelleute, Ortsadel, Gentry, Uradel, Bauernadel, Landadel, Hofadel, Erbadel, Feudaladel, Noblesse, Reichsadel, Stadtadel, Amtsadel]",False,2
...,...,...,...,...,...,...,...,...,...
3741,zirkus,"[Synset(id=s108415, lexunits=Zirkus), Synset(id=s73248, lexunits=Zirkus, Zirkusunternehmen), Synset(id=s17960, lexunits=Zirkus), Synset(id=s14333, lexunits=Rabatz, Radau, Trubel, Theater, Zirkus)]","[Circus, Zirkus, Circus, Zirkus, Zirkusunternehmen, Circus, Zirkus, Rabatz, Radau, Trubel, Theater, Zirkus]","{Synset(id=s42745, lexunits=Arena, Stadion)}","[Arena, Stadion]",{},[],False,4
3742,zitat,"[Synset(id=s32274, lexunits=Zitat), Synset(id=s32501, lexunits=Zitat)]","[Zitat, Zitat]","{Synset(id=s32267, lexunits=Redewendung, Idiom, Redensart, Wendung)}","[Redewendung, Idiom, Redensart, Wendung]","{Synset(id=s137738, lexunits=Selbstzitat)}",[Selbstzitat],False,2
3752,zug,"[Synset(id=s40992, lexunits=Zugkraft, Zug), Synset(id=s41350, lexunits=Luftzug, Zug, Luft), Synset(id=s17675, lexunits=Zug), Synset(id=s8724, lexunits=Eisenbahn, Eisenbahnzug, Zug, Bahn), Synset(id=s14348, lexunits=Gesichtszug, Zug), Synset(id=s76189, lexunits=Zug), Synset(id=s21974, lexunits=Ziehen, Zug), Synset(id=s76183, lexunits=Zug), Synset(id=s109599, lexunits=Zug), Synset(id=s76184, lexunits=Zug), Synset(id=s14044, lexunits=Charaktereigenschaft, Charakterzug, Zug, Wesenszug, Charaktermerkmal), Synset(id=s8959, lexunits=Lastzug, Zug), Synset(id=s76185, lexunits=Zug), Synset(id=s44336, lexunits=Zug)]","[Zugkraft, Zug, Luftzug, Zug, Luft, Zug, Eisenbahn, Eisenbahnzug, Zug, Bahn, Gesichtszug, Zug, Zug, Ziehen, Zug, Zug, Zug, Zug, Charaktereigenschaft, Charakterzug, Zug, Wesenszug, Charaktermerkmal, Lastzug, Zug, Zug, Zug]","{Synset(id=s40976, lexunits=Kraft)}",[Kraft],{},[],False,14
3753,zukunft,"[Synset(id=s51054, lexunits=Zukunft, Vorzeitigkeit, Hinkunft), Synset(id=s64962, lexunits=Futur, Zukunft, Futurum)]","[Zukunft, Vorzeitigkeit, Hinkunft, Futur, Zukunft, Futurum]","{Synset(id=s51051, lexunits=Zeitstufe)}",[Zeitstufe],"{Synset(id=s139797, lexunits=Energiezukunft), Synset(id=s22920, lexunits=Nachwelt), Synset(id=s100061, lexunits=Vorweg)}","[Energiezukunft, Nachwelt, Vorweg]",False,2


In [43]:
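from germanetpy.path_based_relatedness_measures import PathBasedRelatedness
from germanetpy.synset import WordCategory
from germanetpy.icbased_similarity import ICBasedSimilarity

# IC-based similarity needs the noun frequency list; the message below means
# the file at frequencylist_nouns was not found and has to be copied there first
relatedness_nouns = ICBasedSimilarity(germanet=germanet,
                                      wordcategory=WordCategory.nomen,
                                      path=frequencylist_nouns)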

file /home/fabian/germanet/GN_V150/FreqLists/noun_freqs_decow14_16.txt does not exist


In [44]:
syn_politikerIn = germanet.get_synsets_by_orthform("Politiker")
syn_politikerIn

[Synset(id=s34818, lexunits=Politiker, Politikerin)]

In [45]:
def find_distance_to_politician(synset):
    syn_p = germanet.get_synsets_by_orthform("Politiker").pop()
    path_distance = synset.shortest_path_distance(syn_p)
    return path_distance

In [46]:
# from the GermaNet tutorial: build the path-based relatedness calculator once, not per row.
# The johannis_wurm and leber_trans synsets are maximally far apart among nouns.
johannis_wurm = germanet.get_synset_by_id("s49774")
leber_trans = germanet.get_synset_by_id("s83979")
relatedness_calculator = PathBasedRelatedness(germanet=germanet, category=WordCategory.nomen,
                                              max_len=35, max_depth=20,
                                              synset_pair=(johannis_wurm, leber_trans))
syn_politician = germanet.get_synset_by_id("s34818")  # Politiker, Politikerin

def find_most_related_to_politician_path(list_syns):
    """Return the noun synset in list_syns with the highest path-based relatedness to 'Politiker'."""
    scores = {syn: relatedness_calculator.simple_path(syn, syn_politician)
              for syn in list_syns if syn.word_category == WordCategory.nomen}
    return max(scores, key=scores.get) if scores else "no nouns"

In [47]:
def find_most_related_to_politician_ic(list_syns):
    """Return the noun synset in list_syns with the highest normalized Resnik similarity to 'Politiker'."""
    syn_p = germanet.get_synset_by_id("s34818")  # Politiker, Politikerin
    scores = {syn: relatedness_nouns.resnik(syn, syn_p, normalize=True)
              for syn in list_syns if syn.word_category == WordCategory.nomen}
    return max(scores, key=scores.get) if scores else "no nouns"

In [48]:
data_bonus["best_syn"] = data_bonus.apply(lambda row: find_most_related_to_politician_path(row["synsets_ger"]) ,axis=1)
data_bonus["best_syn_ic"] = data_bonus.apply(lambda row: find_most_related_to_politician_ic(row["synsets_ger"]) ,axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_bonus["best_syn"] = data_bonus.apply(lambda row: find_most_related_to_politician_path(row["synsets_ger"]) ,axis=1)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data_bonus["best_syn_ic"] = data_bonus.apply(lambda row: find_most_related_to_politician_ic(row["synsets_ger"]) ,axis=1)


In [49]:
data_bonus

Unnamed: 0,suggestion_ger,synsets_ger,lexunits,hypernyms,lexunits_hypernyms,hyponyms,lexunits_hyponyms,city,synsets_check,best_syn,best_syn_ic
0,aa,"[Synset(id=s23506, lexunits=American Airlines, AA), Synset(id=s26358, lexunits=Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang)]","[American Airlines, AA, Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang]","{Synset(id=s23493, lexunits=Fluggesellschaft, Luftfahrtgesellschaft, Fluglinie, Airline)}","[Fluggesellschaft, Luftfahrtgesellschaft, Fluglinie, Airline]",{},[],False,2,"Synset(id=s26358, lexunits=Kot, Scheiße, Exkrement, Stuhl, Kacke, Kaka, Aa, Stuhlgang)","Synset(id=s23506, lexunits=American Airlines, AA)"
6,abbruch,"[Synset(id=s22422, lexunits=Abbruch, Beendigung, Beenden, Aufhören, Beendung), Synset(id=s106337, lexunits=Abbruch), Synset(id=s106285, lexunits=Abbruch), Synset(id=s106286, lexunits=Abbruch), Synset(id=s21303, lexunits=Abriss, Abbruch, Niederreißung)]","[Abbruch, Beendigung, Beenden, Aufhören, Beendung, Abbruch, Abbruch, Abbruch, Abriß, Abriss, Abbruch, Niederreißung]","{Synset(id=s21961, lexunits=Veränderung, Änderung)}","[Veränderung, Änderung]","{Synset(id=s121482, lexunits=Schulabbruch), Synset(id=s22427, lexunits=Ausschaltung, Ausschalten), Synset(id=s113518, lexunits=Bauabbruch), Synset(id=s72930, lexunits=Ablauf), Synset(id=s117289, lexunits=Startabbruch), Synset(id=s22429, lexunits=Abkehr, Abwendung), Synset(id=s100335, lexunits=Vertragsbeendigung), Synset(id=s123548, lexunits=Verbindungsabbruch), Synset(id=s22430, lexunits=Einstellung), Synset(id=s22517, lexunits=Vollendung), Synset(id=s22518, lexunits=Aufkündigung, Kündigung, Vertragsauflösung, Vertragskündigung, Vertragsaufhebung), Synset(id=s22432, lexunits=Schließung), Synset(id=s140469, lexunits=Fastenbrechen), Synset(id=s149250, lexunits=Aufenthaltsbeendigung), Synset(id=s22433, lexunits=Auflösung), Synset(id=s147580, lexunits=Kontaktabbruch), Synset(id=s92201, lexunits=Ausblendung), Synset(id=s101983, lexunits=Spielabbruch), Synset(id=s22435, lexunits=Niederlegung), Synset(id=s22436, lexunits=Entlassen, Entlassung), Synset(id=s106452, lexunits=Aufhebung), Synset(id=s112714, lexunits=Studienabbruch), Synset(id=s68923, lexunits=Endanflug), Synset(id=s22439, lexunits=Ableistung), Synset(id=s22423, lexunits=Niederschlagung), Synset(id=s22424, lexunits=Niederschlagung), Synset(id=s51777, lexunits=Ausstieg, Ausstiegsphase), Synset(id=s22425, lexunits=Abschaffung, Abschaffen)}","[Schulabbruch, Ausschaltung, Ausschalten, Bauabbruch, Ablauf, Startabbruch, Abkehr, Abwendung, Vertragsbeendigung, Verbindungsabbruch, Einstellung, Vollendung, Aufkündigung, Kündigung, Vertragsauflösung, 
Vertragskündigung, Vertragsaufhebung, Schließung, Fastenbrechen, Aufenthaltsbeendigung, Auflösung, Kontaktabbruch, Ausblendung, Spielabbruch, Niederlegung, Entlassen, Entlassung, Aufhebung, Studienabbruch, Endanflug, Ableistung, Niederschlagung, Niederschlagung, Ausstieg, Ausstiegsphase, Abschaffung, Abschaffen]",False,5,"Synset(id=s106337, lexunits=Abbruch)","Synset(id=s106337, lexunits=Abbruch)"
15,abnehmen,"[Synset(id=s53836, lexunits=abnehmen), Synset(id=s53908, lexunits=abchecken, abnehmen, begutachten, checken), Synset(id=s53807, lexunits=abnehmen), Synset(id=s56878, lexunits=abnehmen, herunternehmen, runternehmen), Synset(id=s60512, lexunits=abnehmen), Synset(id=s54392, lexunits=abnehmen), Synset(id=s54723, lexunits=abnehmen, abkaufen), Synset(id=s60521, lexunits=abnehmen), Synset(id=s52508, lexunits=wegnehmen, abnehmen, fortnehmen), Synset(id=s59787, lexunits=abnehmen)]","[abnehmen, abchecken, abnehmen, begutachten, checken, abnehmen, abnehmen, herunternehmen, runternehmen, abnehmen, abnehmen, abnehmen, abkaufen, abnehmen, wegnehmen, abnehmen, fortnehmen, abnehmen]","{Synset(id=s53835, lexunits=helfen)}",[helfen],{},[],False,10,no nouns,no nouns
18,abschied,"[Synset(id=s105682, lexunits=Abschied), Synset(id=s17481, lexunits=Abschied, Lebewohl, Verabschiedung)]","[Abschied, Abschied, Lebewohl, Verabschiedung]","{Synset(id=s22436, lexunits=Entlassen, Entlassung)}","[Entlassen, Entlassung]",{},[],False,2,"Synset(id=s105682, lexunits=Abschied)","Synset(id=s105682, lexunits=Abschied)"
26,adel,"[Synset(id=s24212, lexunits=Adelsgeschlecht, Adel), Synset(id=s32247, lexunits=Adelstitel, Adel, Adelsbezeichnung)]","[Adelsgeschlecht, Adel, Adelstitel, Adel, Adelsbezeichnung]","{Synset(id=s24202, lexunits=Haus, Geschlecht, Dynastie, Familiendynastie, Familienclan)}","[Haus, Geschlecht, Dynastie, Familiendynastie, Familienclan]","{Synset(id=s93207, lexunits=Dienstadel), Synset(id=s136589, lexunits=Kriegeradel), Synset(id=s24215, lexunits=Hochadel), Synset(id=s111596, lexunits=Kleinadel), Synset(id=s65784, lexunits=Edelleute), Synset(id=s65171, lexunits=Ortsadel), Synset(id=s92515, lexunits=Gentry), Synset(id=s63065, lexunits=Uradel), Synset(id=s24216, lexunits=Bauernadel, Landadel), Synset(id=s87133, lexunits=Hofadel), Synset(id=s101528, lexunits=Erbadel), Synset(id=s122751, lexunits=Feudaladel), Synset(id=s24214, lexunits=Noblesse), Synset(id=s102967, lexunits=Reichsadel), Synset(id=s24217, lexunits=Stadtadel), Synset(id=s76879, lexunits=Amtsadel)}","[Dienstadel, Kriegeradel, Hochadel, Kleinadel, Edelleute, Ortsadel, Gentry, Uradel, Bauernadel, Landadel, Hofadel, Erbadel, Feudaladel, Noblesse, Reichsadel, Stadtadel, Amtsadel]",False,2,"Synset(id=s32247, lexunits=Adelstitel, Adel, Adelsbezeichnung)","Synset(id=s24212, lexunits=Adelsgeschlecht, Adel)"
...,...,...,...,...,...,...,...,...,...,...,...
3741,zirkus,"[Synset(id=s108415, lexunits=Zirkus), Synset(id=s73248, lexunits=Zirkus, Zirkusunternehmen), Synset(id=s17960, lexunits=Zirkus), Synset(id=s14333, lexunits=Rabatz, Radau, Trubel, Theater, Zirkus)]","[Circus, Zirkus, Circus, Zirkus, Zirkusunternehmen, Circus, Zirkus, Rabatz, Radau, Trubel, Theater, Zirkus]","{Synset(id=s42745, lexunits=Arena, Stadion)}","[Arena, Stadion]",{},[],False,4,"Synset(id=s108415, lexunits=Zirkus)","Synset(id=s108415, lexunits=Zirkus)"
3742,zitat,"[Synset(id=s32274, lexunits=Zitat), Synset(id=s32501, lexunits=Zitat)]","[Zitat, Zitat]","{Synset(id=s32267, lexunits=Redewendung, Idiom, Redensart, Wendung)}","[Redewendung, Idiom, Redensart, Wendung]","{Synset(id=s137738, lexunits=Selbstzitat)}",[Selbstzitat],False,2,"Synset(id=s32501, lexunits=Zitat)","Synset(id=s32274, lexunits=Zitat)"
3752,zug,"[Synset(id=s40992, lexunits=Zugkraft, Zug), Synset(id=s41350, lexunits=Luftzug, Zug, Luft), Synset(id=s17675, lexunits=Zug), Synset(id=s8724, lexunits=Eisenbahn, Eisenbahnzug, Zug, Bahn), Synset(id=s14348, lexunits=Gesichtszug, Zug), Synset(id=s76189, lexunits=Zug), Synset(id=s21974, lexunits=Ziehen, Zug), Synset(id=s76183, lexunits=Zug), Synset(id=s109599, lexunits=Zug), Synset(id=s76184, lexunits=Zug), Synset(id=s14044, lexunits=Charaktereigenschaft, Charakterzug, Zug, Wesenszug, Charaktermerkmal), Synset(id=s8959, lexunits=Lastzug, Zug), Synset(id=s76185, lexunits=Zug), Synset(id=s44336, lexunits=Zug)]","[Zugkraft, Zug, Luftzug, Zug, Luft, Zug, Eisenbahn, Eisenbahnzug, Zug, Bahn, Gesichtszug, Zug, Zug, Ziehen, Zug, Zug, Zug, Zug, Charaktereigenschaft, Charakterzug, Zug, Wesenszug, Charaktermerkmal, Lastzug, Zug, Zug, Zug]","{Synset(id=s40976, lexunits=Kraft)}",[Kraft],{},[],False,14,"Synset(id=s109599, lexunits=Zug)","Synset(id=s17675, lexunits=Zug)"
3753,zukunft,"[Synset(id=s51054, lexunits=Zukunft, Vorzeitigkeit, Hinkunft), Synset(id=s64962, lexunits=Futur, Zukunft, Futurum)]","[Zukunft, Vorzeitigkeit, Hinkunft, Futur, Zukunft, Futurum]","{Synset(id=s51051, lexunits=Zeitstufe)}",[Zeitstufe],"{Synset(id=s139797, lexunits=Energiezukunft), Synset(id=s22920, lexunits=Nachwelt), Synset(id=s100061, lexunits=Vorweg)}","[Energiezukunft, Nachwelt, Vorweg]",False,2,"Synset(id=s51054, lexunits=Zukunft, Vorzeitigkeit, Hinkunft)","Synset(id=s51054, lexunits=Zukunft, Vorzeitigkeit, Hinkunft)"


In [51]:
data_bonus.to_csv("data_03.csv")