### [Install from the wheel file (requires Python 3.6)](./pyltp-0.2.1-cp36-cp36m-win_amd64.whl)
![LTP-Cloud architecture](./ltp_framework.png)

In [1]:
from pyltp import Segmentor, SentenceSplitter
import jieba

In [2]:
model_path = 'D:\\Tutorials\\0.NLP_Stanford_Daniel_Jurasfky\\ltp_data_v3.4.0'

### Sentence splitting

In [3]:
sents = SentenceSplitter.split('元芳你怎么看？我就趴窗口上看呗！我也想这么看哟~')
print('\n'.join(sents))

元芳你怎么看？
我就趴窗口上看呗！
我也想这么看哟~


In [4]:
import os
cws_model_path = os.path.join(model_path, 'cws.model')

### Word segmentation
> Compare the segmentation results of jieba and pyltp

In [5]:
segmentor = Segmentor()
segmentor.load(cws_model_path)
words = segmentor.segment('国务院总理李克强调研上海外高桥时提出，支持上海积极探索新机制。')
words_ = jieba.cut('国务院总理李克强调研上海外高桥时提出，支持上海积极探索新机制。')
print('\t'.join(words))




In [6]:
print(list(words))
print(list(words_))

Building prefix dict from the default dictionary ...


[]


Dumping model to file cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.526 seconds.
Prefix dict has been built succesfully.


['国务院', '总理', '李克强', '调研', '上海', '外高桥', '时', '提出', '，', '支持', '上海', '积极探索', '新机制', '。']
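To make the comparison concrete, the two token lists can be turned into character spans and diffed. This is a minimal sketch: the helper `to_spans` is a hypothetical name, the jieba list is copied from the output above, and the pyltp list is copied from the tagged output shown later in this notebook (the pyltp list printed in this cell is empty).

```python
def to_spans(tokens):
    """Convert a token list into (start, end, token) character spans."""
    spans, start = [], 0
    for token in tokens:
        spans.append((start, start + len(token), token))
        start += len(token)
    return spans

# Token lists copied from the notebook outputs (no model needed to run this).
pyltp_words = ['国务院', '总理', '李克强', '调研', '上海', '外高桥', '时', '提出',
               '，', '支持', '上海', '积极', '探索', '新', '机制', '。']
jieba_words = ['国务院', '总理', '李克强', '调研', '上海', '外高桥', '时', '提出',
               '，', '支持', '上海', '积极探索', '新机制', '。']

pyltp_spans = set(to_spans(pyltp_words))
jieba_spans = set(to_spans(jieba_words))

# Tokens whose exact character span appears in only one segmentation.
only_pyltp = [tok for _, _, tok in sorted(pyltp_spans - jieba_spans)]
only_jieba = [tok for _, _, tok in sorted(jieba_spans - pyltp_spans)]

print('pyltp only:', only_pyltp)
print('jieba only:', only_jieba)
```

On this sentence the two tools differ only in granularity: jieba keeps `积极探索` and `新机制` as single tokens, while pyltp splits each of them in two.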


### Customized segmentation

### Part-of-speech tagging (POS tagging)

In [10]:
pos_model_path = os.path.join(model_path, 'pos.model')

In [11]:
from pyltp import Postagger
postagger = Postagger()  # methods: load, load_with_lexicon, postag, release
postagger.load(pos_model_path)
postags = postagger.postag(words)  # one tag per segmented word
postags_ = zip(words, postags)
print(list(postags_))
postagger.release()

[('国务院', 'ni'), ('总理', 'n'), ('李克强', 'nh'), ('调研', 'v'), ('上海', 'ns'), ('外高桥', 'ns'), ('时', 'n'), ('提出', 'v'), ('，', 'wp'), ('支持', 'v'), ('上海', 'ns'), ('积极', 'a'), ('探索', 'v'), ('新', 'a'), ('机制', 'n'), ('。', 'wp')]
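The tags in this output are terse, so grouping the words by tag makes them easier to read. The sketch below works on the tagged pairs copied from the output above; the English glosses in `TAG_NAMES` (a hypothetical name) reflect my reading of the LTP tag set and cover only the tags that occur in this sentence.

```python
from collections import defaultdict

# Glosses for the tags seen in this sentence (assumed from the LTP tag set).
TAG_NAMES = {
    'n': 'general noun',
    'nh': 'person name',
    'ni': 'organization name',
    'ns': 'geographical name',
    'v': 'verb',
    'a': 'adjective',
    'wp': 'punctuation',
}

# (word, tag) pairs copied from the notebook output.
tagged = [('国务院', 'ni'), ('总理', 'n'), ('李克强', 'nh'), ('调研', 'v'),
          ('上海', 'ns'), ('外高桥', 'ns'), ('时', 'n'), ('提出', 'v'),
          ('，', 'wp'), ('支持', 'v'), ('上海', 'ns'), ('积极', 'a'),
          ('探索', 'v'), ('新', 'a'), ('机制', 'n'), ('。', 'wp')]

# Group words by tag, preserving sentence order within each group.
words_by_tag = defaultdict(list)
for word, tag in tagged:
    words_by_tag[tag].append(word)

for tag, group in words_by_tag.items():
    print(f"{tag:>2} ({TAG_NAMES.get(tag, 'unknown')}): {' '.join(group)}")
```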


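### Named entity recognition (NER)
> * Three entity types: Ni (organization name), Nh (person name), Ns (place name)
> * Position tags S, B, I, E, O: single-word entity, beginning, inside, end, not a named entity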

In [23]:
ner_model_path = os.path.join(model_path, 'ner.model')
from pyltp import NamedEntityRecognizer

recognizer = NamedEntityRecognizer()
recognizer.load(ner_model_path)
netags = recognizer.recognize(words, postags)
netags_ = zip(words, netags)
# print('\t'.join(netags))
print(list(netags_))
recognizer.release()

[('国务院', 'S-Ni'), ('总理', 'O'), ('李克强', 'S-Nh'), ('调研', 'O'), ('上海', 'B-Ns'), ('外高桥', 'E-Ns'), ('时', 'O'), ('提出', 'O'), ('，', 'O'), ('支持', 'O'), ('上海', 'S-Ns'), ('积极', 'O'), ('探索', 'O'), ('新', 'O'), ('机制', 'O'), ('。', 'O')]
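The per-word tags above can be merged back into whole entities by following the S/B/I/E/O scheme. This is a minimal sketch under that scheme; `extract_entities` is a hypothetical helper, and the input pairs are copied from the output above so the snippet runs without loading any model.

```python
def extract_entities(tagged_words):
    """Merge (word, tag) pairs tagged with S/B/I/E/O into (entity, type) tuples."""
    entities, buffer = [], []
    for word, tag in tagged_words:
        if tag == 'O':                      # not part of any entity
            buffer = []
            continue
        position, entity_type = tag.split('-', 1)
        if position == 'S':                 # single-word entity
            entities.append((word, entity_type))
            buffer = []
        elif position == 'B':               # start a multi-word entity
            buffer = [word]
        elif position == 'I':               # continue the current entity
            buffer.append(word)
        elif position == 'E':               # close the current entity
            buffer.append(word)
            entities.append((''.join(buffer), entity_type))
            buffer = []
    return entities

# (word, tag) pairs copied from the notebook output.
ner_output = [('国务院', 'S-Ni'), ('总理', 'O'), ('李克强', 'S-Nh'), ('调研', 'O'),
              ('上海', 'B-Ns'), ('外高桥', 'E-Ns'), ('时', 'O'), ('提出', 'O'),
              ('，', 'O'), ('支持', 'O'), ('上海', 'S-Ns'), ('积极', 'O'),
              ('探索', 'O'), ('新', 'O'), ('机制', 'O'), ('。', 'O')]

entities = extract_entities(ner_output)
print(entities)
```

Note that `上海` and `外高桥` are tagged `B-Ns` and `E-Ns`, so they are merged into one place name, while the second `上海` stands alone as `S-Ns`.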


### Dependency parsing (DP)
![Dependency parsing](./依存句法分析.png)

In [24]:
par_model_path = os.path.join(model_path, 'parser.model')
from pyltp import Parser
parser = Parser()
parser.load(par_model_path)
arcs = parser.parse(words, postags)
print('\t'.join(f"{arc.head}:{arc.relation}" for arc in arcs))

2:ATT	3:ATT	4:SBV	7:ATT	6:ATT	4:VOB	8:ADV	0:HED	8:WP	8:COO	13:SBV	13:ADV	10:VOB	15:ATT	13:VOB	8:WP
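Each `head:relation` pair above belongs to the word at the same position in the sentence. The head index is 1-based, and 0 denotes the virtual ROOT node. The sketch below resolves the indices into words; `resolve_arcs` is a hypothetical helper, and both the words and the arc string are copied from the notebook outputs.

```python
words = ['国务院', '总理', '李克强', '调研', '上海', '外高桥', '时', '提出',
         '，', '支持', '上海', '积极', '探索', '新', '机制', '。']

# Arc output copied from the notebook (whitespace-separated head:relation pairs).
arc_output = ("2:ATT 3:ATT 4:SBV 7:ATT 6:ATT 4:VOB 8:ADV 0:HED "
              "8:WP 8:COO 13:SBV 13:ADV 10:VOB 15:ATT 13:VOB 8:WP")

def resolve_arcs(words, arc_output):
    """Return (child, relation, head_word) triples; head 0 maps to 'ROOT'."""
    triples = []
    for child, item in zip(words, arc_output.split()):
        head_text, relation = item.split(':')
        head = int(head_text)
        head_word = 'ROOT' if head == 0 else words[head - 1]  # heads are 1-based
        triples.append((child, relation, head_word))
    return triples

triples = resolve_arcs(words, arc_output)
for child, relation, head_word in triples:
    print(f"{child} --{relation}--> {head_word}")
```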


### Semantic role labeling (SRL)
> If the version 3.4.0 SRL model `pisrl.model` does not work on Windows, download the Windows-compatible semantic role labeling model from [this link](http://model.scir.yunfutech.com/server/3.4.0/pisrl_win.model).

In [31]:
srl_model_path = os.path.join(model_path, 'pisrl_win.model')
from pyltp import SementicRoleLabeller
labeller = SementicRoleLabeller()
labeller.load(srl_model_path)
roles = labeller.label(words, postags, arcs)
print(list(words))
for role in roles:
    # role.index is the predicate's word index; each argument covers words[start..end]
    args = [f"{arg.name}:({arg.range.start},{arg.range.end})" for arg in role.arguments]
    print(role.index, ''.join(args))
labeller.release()

['国务院', '总理', '李克强', '调研', '上海', '外高桥', '时', '提出', '，', '支持', '上海', '积极', '探索', '新', '机制', '。']
7 TMP:(0,6)A1:(9,14)
9 A1:(10,10)
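The numeric output above is easier to read once the indices are mapped back to words. In the sketch below, the leading number is treated as the 0-based index of the predicate and each `(start,end)` pair as an inclusive 0-based word range. `decode_role` and the regular expression are hypothetical helpers that parse the printed lines, not part of pyltp.

```python
import re

words = ['国务院', '总理', '李克强', '调研', '上海', '外高桥', '时', '提出',
         '，', '支持', '上海', '积极', '探索', '新', '机制', '。']

# Lines copied from the notebook output.
srl_output = ["7 TMP:(0,6)A1:(9,14)", "9 A1:(10,10)"]

# Matches one "NAME:(start,end)" argument in a printed line.
ARG_PATTERN = re.compile(r'([A-Z0-9\-]+):\((\d+),(\d+)\)')

def decode_role(line, words):
    """Return (predicate_word, [(role_name, argument_text), ...]) for one line."""
    index_text, args_text = line.split(' ', 1)
    predicate = words[int(index_text)]
    arguments = [
        (name, ''.join(words[int(start):int(end) + 1]))  # end index is inclusive
        for name, start, end in ARG_PATTERN.findall(args_text)
    ]
    return predicate, arguments

decoded = [decode_role(line, words) for line in srl_output]
for predicate, arguments in decoded:
    print(predicate, arguments)
```

Read this way, the predicate `提出` has a temporal argument (TMP) spanning the first seven words and an A1 argument spanning `支持` through `机制`, and the predicate `支持` has the single A1 argument `上海`.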


### Semantic dependency parsing
> Not provided by pyltp; use [LTP-Cloud](http://www.ltp-cloud.com/) instead.