# Installation

In [1]:
!pip install spacy
!python3 -m spacy download en_core_web_sm

Collecting spacy
  Downloading spacy-2.2.4-cp37-cp37m-manylinux1_x86_64.whl (10.6 MB)
Collecting srsly<1.1.0,>=1.0.2
  Downloading srsly-1.0.2-cp37-cp37m-manylinux1_x86_64.whl (185 kB)
Collecting wasabi<1.1.0,>=0.4.0
  Downloading wasabi-0.6.0-py3-none-any.whl (20 kB)
Collecting thinc==7.4.0
  Downloading thinc-7.4.0-cp37-cp37m-manylinux1_x86_64.whl (2.2 MB)
Collecting catalogue<1.1.0,>=0.0.7
  Downloading catalogue-1.0.0-py2.py3-none-any.whl (7.

# Overview

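Named entity recognition (NER) is often the first step in information extraction. It locates the named entities in a text and classifies them into predefined categories such as person names, organizations, locations, time expressions, quantities, monetary values, and percentages. NER is used in many areas of natural language processing (NLP) and can help answer many real-world questions, for example:

    Which companies are mentioned in a news article?
    Which specific products are mentioned in a complaint or review?
    Does this tweet contain a person's name? Does it contain that person's location?

This notebook shows how to build a named entity recognizer with NLTK and spaCy that identifies the names of things, such as people, organizations, or locations, in raw text. Let's get started!

spaCy's named entity recognizer was trained on the OntoNotes 5 corpus and supports the following entity types: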

<img src="https://miro.medium.com/max/1400/1*qQggIPMugLcy-ndJ8X_aAA.png"></img>
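In case the image above does not load, the same label set can be written down as a plain lookup table. The sketch below paraphrases the 18 OntoNotes 5 entity labels as documented for spaCy's English models. The names `ONTONOTES_LABELS` and `describe_label` are illustrative and not part of spaCy; spaCy itself provides `spacy.explain(label)` for the same purpose.

```python
# Paraphrased descriptions of the OntoNotes 5 entity labels used by spaCy's
# English models. ONTONOTES_LABELS and describe_label are illustrative names,
# not a spaCy API.
ONTONOTES_LABELS = {
    "PERSON": "People, including fictional ones",
    "NORP": "Nationalities, religious or political groups",
    "FAC": "Buildings, airports, highways, bridges",
    "ORG": "Companies, agencies, institutions",
    "GPE": "Countries, cities, states",
    "LOC": "Non-GPE locations, mountain ranges, bodies of water",
    "PRODUCT": "Objects, vehicles, foods (not services)",
    "EVENT": "Named hurricanes, battles, wars, sports events",
    "WORK_OF_ART": "Titles of books, songs, and similar works",
    "LAW": "Named documents made into laws",
    "LANGUAGE": "Any named language",
    "DATE": "Absolute or relative dates or periods",
    "TIME": "Times smaller than a day",
    "PERCENT": "Percentages, including the % sign",
    "MONEY": "Monetary values, including the unit",
    "QUANTITY": "Measurements, such as weight or distance",
    "ORDINAL": "first, second, and so on",
    "CARDINAL": "Numerals that do not fall under another type",
}


def describe_label(label):
    """Return a short description for an entity label, or None if unknown."""
    return ONTONOTES_LABELS.get(label.upper())


# Print the table in two aligned columns.
for label, description in ONTONOTES_LABELS.items():
    print(f"{label:<12}{description}")
```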

# Code

In [1]:
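import spacy
from spacy import displacy  # visualizer for entities and dependency parses
from collections import Counter
import en_core_web_sm  # small English pipeline downloaded in the installation step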
nlp = en_core_web_sm.load()
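The import above raises `ModuleNotFoundError` if the model package has not been installed. A small guard, sketched here with the standard library only, can check for the package before loading it. `model_available` and `load_model` are hypothetical helper names, and the sketch assumes the model package exposes a `load()` function, as `en_core_web_sm` does in the cell above.

```python
import importlib
import importlib.util


def model_available(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


def load_model(package_name):
    """Import an installed model package and call its load() function."""
    if not model_available(package_name):
        raise RuntimeError(
            f"{package_name} is not installed; install it first, "
            f"for example with: python3 -m spacy download {package_name}"
        )
    return importlib.import_module(package_name).load()


# True only when the English model package is installed in this environment.
print(model_available("en_core_web_sm"))
```

With the model installed, `nlp = load_model("en_core_web_sm")` would be equivalent to the two lines in the cell above.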

In [2]:
txt = "Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence."

In [3]:
doc = nlp(txt)

In [4]:
doc

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.

In [5]:
displacy.render(doc, style='ent', jupyter=True)
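`displacy` only draws the entities. To work with them programmatically, one would typically iterate over `doc.ents`, where each span exposes `text` and `label_`. The following is a minimal sketch of tallying labels with the `Counter` imported earlier. The helper name `summarize_entities` is hypothetical, and the `Entity` tuple with its sample values is a hand-made stand-in so the block runs without a model; it is not output of `en_core_web_sm`.

```python
from collections import Counter, namedtuple

# Stand-in for a spaCy Span: only the two attributes used below.
Entity = namedtuple("Entity", ["text", "label_"])


def summarize_entities(ents):
    """Return the (text, label) pairs and a Counter of labels."""
    pairs = [(ent.text, ent.label_) for ent in ents]
    counts = Counter(label for _, label in pairs)
    return pairs, counts


# Hand-written sample data, not real model output.
sample = [
    Entity("Google", "ORG"),
    Entity("London", "GPE"),
    Entity("Apple", "ORG"),
]

pairs, counts = summarize_entities(sample)
print(pairs)
print(counts.most_common())
```

With a loaded pipeline, the call would be `summarize_entities(doc.ents)`.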

In [6]:
# For each token, print text/POS tag, its dependency relation, and its head's text/POS tag.
for t in doc:
    print(f"{t.text}/{t.tag_} <--{t.dep_}-- {t.head.text}/{t.head.tag_}")

Deep/JJ <--compound-- Learning/NN
Learning/NN <--nsubj-- is/VBZ
is/VBZ <--ROOT-- is/VBZ
a/DT <--det-- area/NN
new/JJ <--amod-- area/NN
area/NN <--attr-- is/VBZ
of/IN <--prep-- area/NN
Machine/NNP <--compound-- Learning/NNP
Learning/NNP <--compound-- research/NN
research/NN <--pobj-- of/IN
,/, <--punct-- research/NN
which/WDT <--nsubjpass-- introduced/VBN
has/VBZ <--aux-- introduced/VBN
been/VBN <--auxpass-- introduced/VBN
introduced/VBN <--relcl-- research/NN
with/IN <--prep-- introduced/VBN
the/DT <--det-- objective/NN
objective/NN <--pobj-- with/IN
of/IN <--prep-- objective/NN
moving/VBG <--pcomp-- of/IN
Machine/NNP <--compound-- Learning/VBG
Learning/VBG <--dobj-- moving/VBG
closer/RBR <--advmod-- moving/VBG
to/IN <--prep-- closer/RBR
one/CD <--pobj-- to/IN
of/IN <--prep-- one/CD
its/PRP$ <--poss-- goals/NNS
original/JJ <--amod-- goals/NNS
goals/NNS <--pobj-- of/IN
:/: <--punct-- research/NN
Artificial/NNP <--compound-- Intelligence/NNP
Intelligence/NNP <--appos-- research/NN
./. <-

In [7]:
displacy.render(doc, style='dep', jupyter=True, options={'distance':90})
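The arcs shown in the dependency listing and in the displaCy diagram can also be collected as data. The sketch below gathers (child, relation, head) triples and picks out the root, relying only on the `text`, `dep_`, and `head` token attributes used in the loop above. `make_tokens`, `dependency_triples`, and `find_root` are hypothetical helpers; the three sample rows are copied from the first lines of the printed output so the block runs without a model.

```python
from types import SimpleNamespace


def make_tokens(rows):
    """Build token-like stand-ins from (text, dep, head_index) rows."""
    tokens = [SimpleNamespace(text=text, dep_=dep, head=None) for text, dep, _ in rows]
    for token, (_, _, head_index) in zip(tokens, rows):
        token.head = tokens[head_index]
    return tokens


def dependency_triples(tokens):
    """Return one (child text, relation, head text) triple per token."""
    return [(t.text, t.dep_, t.head.text) for t in tokens]


def find_root(tokens):
    """Return the first token labelled ROOT, or None if there is none."""
    for t in tokens:
        if t.dep_ == "ROOT":
            return t
    return None


# First three tokens of the example sentence, taken from the printed output.
rows = [("Deep", "compound", 1), ("Learning", "nsubj", 2), ("is", "ROOT", 2)]
tokens = make_tokens(rows)

triples = dependency_triples(tokens)
root = find_root(tokens)
print(triples)
print(root.text)
```

With a parsed document, `dependency_triples(doc)` and `find_root(doc)` would accept the spaCy tokens directly.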

# Chinese version

In [8]:
%%bash
cd files
pip install zh_core_web_sm-0.1.0.tar.gz

Processing ./zh_core_web_sm-0.1.0.tar.gz
Building wheels for collected packages: zh-core-web-sm
  Building wheel for zh-core-web-sm (setup.py): started
  Building wheel for zh-core-web-sm (setup.py): finished with status 'done'
  Created wheel for zh-core-web-sm: filename=zh_core_web_sm-0.1.0-py3-none-any.whl size=109433089 sha256=38d642dca040ba02d728db663af029b7bc3dca6114fbdda8603252bb45c5f05c
  Stored in directory: /home/jovyan/.cache/pip/wheels/de/43/7e/84bcae08c311c2729825db26e00fe4b8a33d6e4f589adb8034
Successfully built zh-core-web-sm
Installing collected packages: zh-core-web-sm
  Attempting uninstall: zh-core-web-sm
    Found existing installation: zh-core-web-sm 0.1.0
    Uninstalling zh-core-web-sm-0.1.0:
      Successfully uninstalled zh-core-web-sm-0.1.0
Successfully installed zh-core-web-sm-0.1.0


In [9]:
!spacy link zh_core_web_sm zh

✔ Linking successful
/opt/conda/lib/python3.7/site-packages/zh_core_web_sm -->
/opt/conda/lib/python3.7/site-packages/spacy/data/zh
You can now load the model via spacy.load('zh')


In [10]:
import zh_core_web_sm
nlp = zh_core_web_sm.load()

In [11]:
txt = "原油大規模減產協議有望延長，加上市場日益樂觀看待全球經濟復甦，國際油價今天走高，北海布倫特原油盤中一度漲破每桶40美元，為3月以來首見。"

In [12]:
doc = nlp(txt)

Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.865 seconds.
Prefix dict has been built succesfully.


In [13]:
displacy.render(doc, style='ent', jupyter=True)

In [None]:
displacy.render(doc, style='dep', jupyter=True, options={'distance':90})