<h2 align="center">Click the badges below to run HanLP online</h2>
<div align="center">
	<a href="https://colab.research.google.com/github/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/dep_stl.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
	<a href="https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Fdep_stl.ipynb" target="_blank"><img src="https://mybinder.org/badge_logo.svg" alt="Open In Binder"/></a>
</div>

## Installation

On Windows, Linux, or macOS, installing HanLP takes a single command:

In [None]:
!pip install hanlp -U

## Loading a Model
HanLP's workflow starts by loading a model. Model identifiers are stored in the `hanlp.pretrained` package, grouped by NLP task.

In [1]:
import hanlp
hanlp.pretrained.dep.ALL # the language is given by the last field of the name or by the corresponding corpus

{'CTB5_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb5_20191229_025833.zip',
 'CTB7_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb7_20200109_022431.zip',
 'CTB9_DEP_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/ctb9_dep_electra_small_20220216_100306.zip',
 'PMT1_DEP_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/pmt_dep_electra_small_20220218_134518.zip',
 'CTB9_UDC_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/udc_dep_electra_small_20220218_095452.zip',
 'PTB_BIAFFINE_DEP_EN': 'https://file.hankcs.com/hanlp/dep/ptb_dep_biaffine_20200101_174624.zip'}
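
As the comment above notes, some identifiers carry the language in their last field. A minimal sketch of filtering on that field follows. It works on a hand-copied version of the mapping printed above, so it runs without HanLP installed; `models_with_suffix` is a hypothetical helper, not part of HanLP.

```python
# Hand-copied from the output above; in a live session this would be
# hanlp.pretrained.dep.ALL.
models = {
    'CTB5_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb5_20191229_025833.zip',
    'CTB7_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb7_20200109_022431.zip',
    'CTB9_DEP_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/ctb9_dep_electra_small_20220216_100306.zip',
    'PMT1_DEP_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/pmt_dep_electra_small_20220218_134518.zip',
    'CTB9_UDC_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/udc_dep_electra_small_20220218_095452.zip',
    'PTB_BIAFFINE_DEP_EN': 'https://file.hankcs.com/hanlp/dep/ptb_dep_biaffine_20200101_174624.zip',
}


def models_with_suffix(all_models, suffix):
    """Hypothetical helper: keep identifiers whose last '_'-separated field equals `suffix`."""
    return [name for name in all_models if name.split('_')[-1] == suffix]


zh_models = models_with_suffix(models, 'ZH')
print(zh_models)  # ['CTB5_BIAFFINE_DEP_ZH', 'CTB7_BIAFFINE_DEP_ZH']
```

Identifiers without a language suffix, such as `CTB9_DEP_ELECTRA_SMALL`, are not matched by this filter; for those, the language has to be inferred from the corpus named in the identifier.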

Call `hanlp.load` to load a model; it is downloaded automatically to a local cache:

In [2]:
dep = hanlp.load(hanlp.pretrained.dep.CTB9_DEP_ELECTRA_SMALL)

## Dependency Parsing
The dependency parsing task takes one or more tokenized sentences as input:

In [3]:
tree = dep(["2021年", "HanLPv2.1", "带来", "次", "世代", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"])

The returned object is of type [CoNLLSentence](https://hanlp.hankcs.com/docs/api/common/conll.html#hanlp_common.conll.CoNLLSentence):

In [4]:
tree

[{'id': 1,
  'form': '2021年',
  'cpos': None,
  'pos': None,
  'head': 3,
  'deprel': 'tmod',
  'lemma': None,
  'feats': None,
  'phead': None,
  'pdeprel': None},
 {'id': 2,
  'form': 'HanLPv2.1',
  'cpos': None,
  'pos': None,
  'head': 3,
  'deprel': 'nsubj',
  'lemma': None,
  'feats': None,
  'phead': None,
  'pdeprel': None},
 {'id': 3,
  'form': '带来',
  'cpos': None,
  'pos': None,
  'head': 0,
  'deprel': 'root',
  'lemma': None,
  'feats': None,
  'phead': None,
  'pdeprel': None},
 {'id': 4,
  'form': '次',
  'cpos': None,
  'pos': None,
  'head': 5,
  'deprel': 'det',
  'lemma': None,
  'feats': None,
  'phead': None,
  'pdeprel': None},
 {'id': 5,
  'form': '世代',
  'cpos': None,
  'pos': None,
  'head': 7,
  'deprel': 'dep',
  'lemma': None,
  'feats': None,
  'phead': None,
  'pdeprel': None},
 {'id': 6,
  'form': '最',
  'cpos': None,
  'pos': None,
  'head': 7,
  'deprel': 'advmod',
  'lemma': None,
  'feats': None,
  'phead': None,
  'pdeprel': None},
 {'id': 7,
  'form'

When printed, it is rendered in CoNLL format:

In [5]:
print(tree)

1	2021年	_	_	_	_	3	tmod	_	_
2	HanLPv2.1	_	_	_	_	3	nsubj	_	_
3	带来	_	_	_	_	0	root	_	_
4	次	_	_	_	_	5	det	_	_
5	世代	_	_	_	_	7	dep	_	_
6	最	_	_	_	_	7	advmod	_	_
7	先进	_	_	_	_	12	rcmod	_	_
8	的	_	_	_	_	7	cpm	_	_
9	多	_	_	_	_	10	nummod	_	_
10	语种	_	_	_	_	12	nn	_	_
11	NLP	_	_	_	_	12	nn	_	_
12	技术	_	_	_	_	3	dobj	_	_
13	。	_	_	_	_	3	punct	_	_
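
Each printed line has ten tab-separated columns, with the head index in the seventh column and the dependency relation in the eighth. The sketch below reads such text back into plain tuples using only the standard library. `parse_conll` is a hypothetical helper, and the sample text is copied from the first three lines of the output above; in a live session the text could presumably come from `str(tree)`.

```python
# First three lines of the printed output above, with explicit tab separators.
conll_text = "\n".join([
    "1\t2021年\t_\t_\t_\t_\t3\ttmod\t_\t_",
    "2\tHanLPv2.1\t_\t_\t_\t_\t3\tnsubj\t_\t_",
    "3\t带来\t_\t_\t_\t_\t0\troot\t_\t_",
])


def parse_conll(text):
    """Hypothetical helper: return (id, form, head, deprel) for each non-empty line."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        cols = line.split("\t")
        # Column 0 is the token id, 1 the form, 6 the head id, 7 the relation.
        rows.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return rows


rows = parse_conll(conll_text)
for row in rows:
    print(row)
```

A head index of `0` marks the root of the sentence, as in the row for `带来`.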


If you do not need the CoNLL format, the output with `conll=False` may be more concise, giving one `(head, relation)` pair per token:

In [6]:
dep(["2021年", "HanLPv2.1", "带来", "次", "世代", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"], conll=False)

[(3, 'tmod'),
 (3, 'nsubj'),
 (0, 'root'),
 (5, 'det'),
 (7, 'dep'),
 (7, 'advmod'),
 (12, 'rcmod'),
 (7, 'cpm'),
 (10, 'nummod'),
 (12, 'nn'),
 (12, 'nn'),
 (3, 'dobj'),
 (3, 'punct')]
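
The pairs are positional: the i-th pair (counting from 1) describes the i-th token, and a head of `0` marks the root. As a rough illustration of working with this output, the sketch below groups dependents under their heads in plain Python. `children_by_head` is a hypothetical helper, and the tokens and arcs are copied from the cells above.

```python
from collections import defaultdict

tokens = ["2021年", "HanLPv2.1", "带来", "次", "世代", "最", "先进", "的",
          "多", "语种", "NLP", "技术", "。"]
arcs = [(3, 'tmod'), (3, 'nsubj'), (0, 'root'), (5, 'det'), (7, 'dep'),
        (7, 'advmod'), (12, 'rcmod'), (7, 'cpm'), (10, 'nummod'), (12, 'nn'),
        (12, 'nn'), (3, 'dobj'), (3, 'punct')]


def children_by_head(arcs):
    """Hypothetical helper: map each head id to its (dependent id, relation) pairs."""
    children = defaultdict(list)
    # Token ids are 1-based, matching the head indices in the parser output.
    for dependent_id, (head_id, relation) in enumerate(arcs, start=1):
        children[head_id].append((dependent_id, relation))
    return dict(children)


children = children_by_head(arcs)
root_id = children[0][0][0]  # the single dependent of the artificial root 0
print(tokens[root_id - 1])   # 带来
for dependent_id, relation in children[root_id]:
    print(relation, tokens[dependent_id - 1])
```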

### Visualization
You can construct a `Document` to get a nicely formatted visualization:

In [7]:
from hanlp_common.document import Document
doc = Document(
    tok=["2021年", "HanLPv2.1", "带来", "次", "世代", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"],
    dep=[(3, 'tmod'), (3, 'nsubj'), (0, 'root'), (5, 'det'), (7, 'dep'), (7, 'advmod'), (12, 'rcmod'), (7, 'cpm'), (10, 'nummod'), (12, 'nn'), (12, 'nn'), (3, 'dobj'), (3, 'punct')]
)
doc.pretty_print()