# Getting familiar with WordNet

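- **WordNet** is an English lexical database built and maintained at Princeton University's Cognitive Science Laboratory under the direction of psychology professor George A. Miller. Development began in 1985, and the project has since received more than 3 million US dollars in funding (mainly from government agencies interested in machine translation). Because it encodes semantic information, it differs from a dictionary in the usual sense. WordNet groups entries by meaning: each group of entries (**Lemma**) that share a meaning is called a **synset** (synonym set). WordNet gives each synset a short, general definition and records the semantic relations between synsets.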

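- WordNet was developed with two goals:
    1. To combine a dictionary and a thesaurus into a single resource that is easier to use than either one alone.
    2. To support automatic text analysis and artificial intelligence applications.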

In [2]:
import nltk
from nltk.corpus import wordnet

### Look up the synsets a word belongs to

In [2]:
wordnet.synsets("machine")

[Synset('machine.n.01'),
 Synset('machine.n.02'),
 Synset('machine.n.03'),
 Synset('machine.n.04'),
 Synset('machine.n.05'),
 Synset('car.n.01'),
 Synset('machine.v.01'),
 Synset('machine.v.02')]
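
The synset names in the output above follow the pattern `<lemma>.<pos>.<number>`, where the lemma is the first lemma of the synset (which is why `car.n.01` shows up for "machine"), the part-of-speech tag is one of `n`, `v`, `a`, `s`, `r`, and the number is a sense index. The following is a minimal sketch of how such a name can be taken apart. `parse_synset_name` and `POS_LABELS` are hypothetical helper names introduced for illustration; they are not part of NLTK.

```python
# Hypothetical helper (not an NLTK API): split a synset name into its parts.
POS_LABELS = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def parse_synset_name(name):
    """Return (lemma, part-of-speech label, sense number) for a synset name."""
    # Split from the right so that lemmas containing dots stay intact.
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, POS_LABELS.get(pos, pos), int(sense)


for name in ["machine.n.01", "car.n.01", "machine.v.02"]:
    print(name, "->", parse_synset_name(name))
```

When working with real `Synset` objects, the `name()`, `pos()` and `lemma_names()` methods used elsewhere in this notebook give the same information without string parsing.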

### Look up the definition of a synset

In [4]:
wordnet.synset('machine.n.01').definition()

'any mechanical or electrical device that transmits or modifies energy to perform or assist in the performance of human tasks'

In [5]:
wordnet.synset('machine.n.01') == wordnet.synsets('machine')[0]

True

In [6]:
wordnet.synsets('machine')[0].definition()

'any mechanical or electrical device that transmits or modifies energy to perform or assist in the performance of human tasks'

### Look up usage examples for a word sense

In [7]:
wordnet.synset('machine.n.02').examples()

['the boxer was a magnificent fighting machine']

In [8]:
wordnet.synset('machine.n.02').lemmas()

[Lemma('machine.n.02.machine')]

In [9]:
wordnet.synset('machine.n.05').examples()

['he was endorsed by the Democratic machine']

In [10]:
wordnet.synset('machine.n.05').lemmas()

[Lemma('machine.n.05.machine'), Lemma('machine.n.05.political_machine')]

In [15]:
wordnet.synset('car.n.01').examples()

['he needs a car to get to work']

In [16]:
wordnet.synset('car.n.01').lemmas()

[Lemma('car.n.01.car'),
 Lemma('car.n.01.auto'),
 Lemma('car.n.01.automobile'),
 Lemma('car.n.01.machine'),
 Lemma('car.n.01.motorcar')]

In [21]:
wordnet.synsets('machine')

[Synset('machine.n.01'),
 Synset('machine.n.02'),
 Synset('machine.n.03'),
 Synset('machine.n.04'),
 Synset('machine.n.05'),
 Synset('car.n.01'),
 Synset('machine.v.01'),
 Synset('machine.v.02')]

In [26]:
wordnet.synset('car.n.01').lemma_names()

['car', 'auto', 'automobile', 'machine', 'motorcar']

### Look up antonyms through lemmas

In [34]:
wordnet.synsets('beautiful')

[Synset('beautiful.a.01'), Synset('beautiful.s.02')]

In [111]:
wordnet.synset('beautiful.a.01').lemmas()[0].antonyms()[0].name()

'ugly'
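
Antonyms are attached to lemmas rather than to synsets, which is why the cell above goes through `lemmas()[0]`. Chained `[0]` indexing raises an `IndexError` whenever a lemma has no antonym, so a small loop is often more convenient. The sketch below assumes only the `lemmas()`, `antonyms()` and `name()` methods already used in this notebook; `antonym_names` is a hypothetical helper name, not an NLTK function.

```python
def antonym_names(synsets):
    """Collect the antonym lemma names found in an iterable of synsets."""
    names = set()
    for synset in synsets:
        for lemma in synset.lemmas():
            # antonyms() returns a (possibly empty) list of Lemma objects.
            for antonym in lemma.antonyms():
                names.add(antonym.name())
    return sorted(names)


try:
    from nltk.corpus import wordnet

    print(antonym_names(wordnet.synsets("beautiful")))
except (ImportError, LookupError):
    # NLTK is not installed or the WordNet corpus has not been downloaded.
    print("NLTK or the WordNet corpus is not available")
```

Returning a sorted list keeps the result stable across runs, and an empty list signals that no sense of the word has a recorded antonym.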

### Semantic similarity between two words

In [40]:
wordnet.synset("computer.n.01").path_similarity(wordnet.synset("calculator.n.01"))

0.09090909090909091

In [41]:
wordnet.synset("computer.n.01").path_similarity(wordnet.synset("apple.n.01"))

0.07692307692307693
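
`path_similarity` scores two synsets as 1 / (d + 1), where d is the length of the shortest path between them through hypernym and hyponym links. The two scores above are 1/11 and 1/13, which corresponds to shortest paths of 10 and 12 links. The sketch below reproduces the formula on a small taxonomy. The taxonomy is hypothetical and much shallower than WordNet's, and the function names are illustrative, so the numbers differ from the NLTK results.

```python
from collections import deque

# Hypothetical toy taxonomy (NOT WordNet data): node -> list of hypernyms.
HYPERNYMS = {
    "computer": ["machine"],
    "calculator": ["machine"],
    "machine": ["device"],
    "device": ["object"],
    "apple": ["fruit"],
    "fruit": ["object"],
    "object": [],
}


def shortest_path_length(graph, start, goal):
    """Breadth-first search over hypernym links, followed in both directions."""
    neighbors = {node: set(parents) for node, parents in graph.items()}
    for node, parents in graph.items():
        for parent in parents:
            neighbors.setdefault(parent, set()).add(node)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connecting path


def path_similarity(graph, a, b):
    """Return 1 / (shortest path length + 1), or None if unconnected."""
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1 / (dist + 1)


print(path_similarity(HYPERNYMS, "computer", "calculator"))  # 2 links apart
print(path_similarity(HYPERNYMS, "computer", "apple"))       # 5 links apart
```

A synset compared with itself has d = 0 and therefore a similarity of 1.0; scores shrink toward 0 as the path grows.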

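### Relations between concepts

- hypernym: the superordinate concept. If B is a hypernym of A, then B is the broader concept and A is a more specific kind of B.
- hyponym: the inverse of hypernym. If B is a hypernym of A, then A is a hyponym of B.
- holonym: the whole in a part-whole relation. If A is a holonym of B, then B is a component of A and A is the whole.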
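
The relations in the list above come in inverse pairs: hyponym is the inverse of hypernym, and holonym is the inverse of meronym (the "part" side of a part-whole relation). The following sketch makes that explicit with plain dictionaries. The entries are an abbreviated subset of the `computer.n.01` outputs shown in this notebook, and `invert` is a hypothetical helper, not an NLTK function.

```python
# Abbreviated relations taken from this notebook's outputs for computer.n.01.
HYPERNYMS = {
    "computer.n.01": ["machine.n.01"],
    "analog_computer.n.01": ["computer.n.01"],
    "digital_computer.n.01": ["computer.n.01"],
}
PART_MERONYMS = {
    "platform.n.03": ["computer.n.01"],
    "computer.n.01": ["keyboard.n.01", "memory.n.04", "monitor.n.04"],
}


def invert(relation):
    """Flip a relation: every source -> target edge becomes target -> source."""
    inverse = {}
    for source, targets in relation.items():
        for target in targets:
            inverse.setdefault(target, []).append(source)
    return inverse


HYPONYMS = invert(HYPERNYMS)           # hyponym is the inverse of hypernym
PART_HOLONYMS = invert(PART_MERONYMS)  # holonym is the inverse of meronym

print(HYPONYMS["computer.n.01"])
print(PART_HOLONYMS["computer.n.01"])
print(PART_HOLONYMS["keyboard.n.01"])
```

In NLTK the same pairing shows up as the method pairs `hypernyms()` / `hyponyms()` and `part_holonyms()` / `part_meronyms()`.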

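#### Relations for noun synsets
+ hypernymy: the superordinate (broader) concept
+ hyponymy: the subordinate (more specific) concept
+ part_holonym: the whole that the concept is a part of
+ part_meronym: a component part of the concept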

In [42]:
# Hypernyms (broader concepts)
wordnet.synset("computer.n.01").hypernyms()

[Synset('machine.n.01')]

In [43]:
# Hyponyms (more specific concepts)
wordnet.synset("computer.n.01").hyponyms()

[Synset('analog_computer.n.01'),
 Synset('digital_computer.n.01'),
 Synset('home_computer.n.01'),
 Synset('node.n.08'),
 Synset('number_cruncher.n.02'),
 Synset('pari-mutuel_machine.n.01'),
 Synset('predictor.n.03'),
 Synset('server.n.03'),
 Synset('turing_machine.n.01'),
 Synset('web_site.n.01')]

In [44]:
# Part holonyms (wholes that this concept is a part of)
wordnet.synset("computer.n.01").part_holonyms()

[Synset('platform.n.03')]

In [2]:
# Part meronyms (component parts of this concept)
wordnet.synset("computer.n.01").part_meronyms()

[Synset('busbar.n.01'),
 Synset('cathode-ray_tube.n.01'),
 Synset('central_processing_unit.n.01'),
 Synset('chip.n.07'),
 Synset('computer_accessory.n.01'),
 Synset('computer_circuit.n.01'),
 Synset('data_converter.n.01'),
 Synset('disk_cache.n.01'),
 Synset('diskette.n.01'),
 Synset('hardware.n.03'),
 Synset('keyboard.n.01'),
 Synset('memory.n.04'),
 Synset('monitor.n.04'),
 Synset('peripheral.n.01')]

In [59]:
# Topic domains
wordnet.synset("computer.n.01").topic_domains()

[Synset('computer_science.n.01')]

In [60]:
# Lexicographer file name (part-of-speech grouping)
wordnet.synset("computer.n.01").lexname()

'noun.artifact'

In [80]:
wordnet.synset("computer.n.01").pos()

'n'

#### Relations for verb synsets
+ hypernym
+ hyponym
+ entailment

In [116]:
wordnet.synsets('buy')

[Synset('bargain.n.02'),
 Synset('buy.v.01'),
 Synset('bribe.v.01'),
 Synset('buy.v.03'),
 Synset('buy.v.04'),
 Synset('buy.v.05')]

In [26]:
wordnet.synset('buy.v.01').hypernyms()

[Synset('get.v.01')]

In [27]:
wordnet.synset('buy.v.01').hyponyms()

[Synset('buy_back.v.01'),
 Synset('get.v.22'),
 Synset('impulse-buy.v.01'),
 Synset('pick_up.v.08'),
 Synset('subscribe.v.05'),
 Synset('take.v.33'),
 Synset('take_out.v.07'),
 Synset('take_over.v.05')]

In [6]:
wordnet.synset('buy.v.01').entailments()[0].name()

'choose.v.01'

In [68]:
wordnet.synset('buy.v.01').topic_domains()

[Synset('commerce.n.01')]

In [77]:
wordnet.synset('buy.v.01').lexname()

'verb.possession'

In [79]:
wordnet.synset('buy.v.01').pos()

'v'

#### Relations for adjective synsets
+ antonym

In [15]:
wordnet.synsets('beautiful')

[Synset('beautiful.a.01'), Synset('beautiful.s.02')]

In [123]:
wordnet.synset('beautiful.a.01').name()

'beautiful.a.01'

In [17]:
wordnet.synset('beautiful.a.01').lemmas()[0].antonyms()[0].synset().name()

'ugly.a.01'

In [101]:
wordnet.synset('beautiful.s.02').topic_domains()

[]

In [75]:
wordnet.synset('beautiful.a.01').lexname()

'adj.all'

In [83]:
wordnet.synset('beautiful.a.01').pos()

'a'

#### Relations for adverb synsets
There are only a few adverbs in WordNet (hardly, mostly, really, etc.), as the majority of English adverbs are straightforwardly derived from adjectives via morphological affixation (surprisingly, strangely, etc.).

In [103]:
wordnet.synsets('hardly')

[Synset('barely.r.01'), Synset('hardly.r.02')]

In [104]:
wordnet.synset('barely.r.01').topic_domains()

[]

In [105]:
wordnet.synset('barely.r.01').lexname()

'adv.all'

In [106]:
wordnet.synset('barely.r.01').pos()

'r'

In [121]:
wordnet.synset('buy.v.01').name()

'buy.v.01'