percent4/DL_4_NER

DL_4_NER

Using Bi-LSTM model for NER in English.

The first 15 lines of the corpus file train.txt:

played	on	Monday	(	home	team	in	CAPS	)	:
VBD	IN	NNP	(	NN	NN	IN	NNP	)	:
O	O	O	O	O	O	O	O	O	O
American	League
NNP	NNP
B-MISC	I-MISC
Cleveland	2	DETROIT	1
NNP	CD	NNP	CD
B-ORG	O	B-ORG	O
BALTIMORE	12	Oakland	11	(	10	innings	)
VB	CD	NNP	CD	(	CD	NN	)
B-ORG	O	B-ORG	O	O	O	O	O
TORONTO	5	Minnesota	3
TO	CD	NNP	CD
B-ORG	O	B-ORG	O
......
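The excerpt above suggests that each sentence occupies three consecutive lines (tokens, POS tags, NER tags) with tab-separated fields. A minimal sketch of a reader for that layout follows; the function name `parse_corpus` is hypothetical, and both the tab separator and the three-line grouping are assumptions read off the excerpt, not taken from the repository's code.

```python
from typing import Iterable, List, Tuple

# One sentence: (tokens, POS tags, NER tags), all of equal length.
Sentence = Tuple[List[str], List[str], List[str]]


def parse_corpus(lines: Iterable[str]) -> List[Sentence]:
    """Group non-empty lines into (tokens, pos, ner) triples.

    Assumes tab-separated fields and three lines per sentence.
    """
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")

    sentences: List[Sentence] = []
    for i in range(0, len(rows), 3):
        tokens, pos, ner = rows[i:i + 3]
        if not (len(tokens) == len(pos) == len(ner)):
            raise ValueError(f"misaligned sentence starting at row {i}")
        sentences.append((tokens, pos, ner))
    return sentences


# Two sentences copied from the excerpt above.
sample = [
    "American\tLeague\n",
    "NNP\tNNP\n",
    "B-MISC\tI-MISC\n",
    "Cleveland\t2\tDETROIT\t1\n",
    "NNP\tCD\tNNP\tCD\n",
    "B-ORG\tO\tB-ORG\tO\n",
]
parsed = parse_corpus(sample)
```

To read the real file, something like `parse_corpus(open("train.txt", encoding="utf-8"))` should work if the whole file follows the same layout.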

The Bi-LSTM model, built with Keras, is shown in the diagram below:

[Figure: diagram of the Bi-LSTM model]
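For readers without access to the diagram, here is a minimal sketch of what a Keras Bi-LSTM sequence tagger commonly looks like. It is not the repository's actual architecture: the function name `build_bilstm_tagger`, the layer sizes, the optimizer, and the loss are all assumptions chosen for illustration.

```python
def build_bilstm_tagger(vocab_size, num_tags, max_len, embed_dim=128, lstm_units=64):
    """Build a generic Bi-LSTM tagger (illustrative defaults, not the repo's values)."""
    # Imported lazily so that defining this function does not require TensorFlow.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        # Index 0 is assumed to be reserved for padding, hence mask_zero=True.
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim, mask_zero=True),
        # return_sequences=True keeps one output vector per token.
        layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True)),
        # One softmax over the tag set at every time step.
        layers.TimeDistributed(layers.Dense(num_tags, activation="softmax")),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # integer tag ids as targets
        metrics=["accuracy"],
    )
    return model
```

With TensorFlow installed, `build_bilstm_tagger(vocab_size=20000, num_tags=9, max_len=50).summary()` prints the layer stack; those three argument values are placeholders.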

On the original dataset, the model reaches over 99% accuracy on the training set and over 95% on the test set.
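How the accuracy figures were computed is not stated here. Assuming they are per-token tag accuracy (the default meaning of Keras's `accuracy` metric for this kind of model), the calculation can be sketched as below. The helper `token_accuracy` and the predicted tags are hypothetical; the gold tags come from the corpus excerpt.

```python
from typing import List


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold_seq, pred_seq in zip(gold, pred):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        for g, p in zip(gold_seq, pred_seq):
            correct += int(g == p)
            total += 1
    return correct / total if total else 0.0


# Gold tags for "Cleveland 2 DETROIT 1"; the prediction is made up
# and misses the second organization.
gold = [["B-ORG", "O", "B-ORG", "O"]]
pred = [["B-ORG", "O", "O", "O"]]
acc = token_accuracy(gold, pred)  # 3 of 4 tags match
```

Because most tokens in NER corpora carry the `O` tag, per-token accuracy tends to look high even when entities are missed, so entity-level precision, recall, and F1 are often reported alongside it.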

Testing on new data:

Three sentences I wrote myself:

Input:

sent = 'James is a world famous actor, whose home is in London.'

Output:

['James', 'is', 'a', 'world', 'famous', 'actor', ',', 'whose', 'home', 'is', 'in', 'London', '.']
['B-PER', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC', 'O']
NER result:
PERSON:  James
LOCATION:  London
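The step from the tag list to the final entity list amounts to merging each `B-` tag with the `I-` tags that follow it. The repository's own implementation is not shown here; the following is a minimal sketch of that merge, with `extract_entities` and `LABEL_NAMES` as hypothetical names. The display names PERSON, LOCATION, and ORGANIZATION follow the outputs in this README, while the name used for `MISC` is an assumption.

```python
from typing import List, Tuple

# Tag suffix -> display name. The MISC entry is an assumption.
LABEL_NAMES = {"PER": "PERSON", "LOC": "LOCATION", "ORG": "ORGANIZATION", "MISC": "MISC"}


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO tags into (display name, entity text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity collected so far, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            name = LABEL_NAMES.get(current_type, current_type)
            entities.append((name, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
            # A stray I- tag with no matching B- is treated as a new entity.
            if tag.startswith("I-"):
                current_type, current_tokens = tag[2:], [token]
    flush()
    return entities


tokens = ['James', 'is', 'a', 'world', 'famous', 'actor', ',',
          'whose', 'home', 'is', 'in', 'London', '.']
tags = ['B-PER', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC', 'O']
for name, text in extract_entities(tokens, tags):
    print(f"{name}:  {text}")
```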

Input:

sent = 'Oxford is in England, Jack is from here.'

Output:

['Oxford', 'is', 'in', 'England', ',', 'Jack', 'is', 'from', 'here', '.']
['B-PER', 'O', 'O', 'B-LOC', 'O', 'B-PER', 'O', 'O', 'O', 'O']
NER result:
PERSON:  Oxford
LOCATION:  England
PERSON:  Jack

Input:

sent = 'I love Shanghai.'

Output:

['I', 'love', 'Shanghai', '.']
['O', 'O', 'B-LOC', 'O']
NER result:
LOCATION:  Shanghai

In the examples above, only Oxford was recognized poorly: the model labeled it PERSON, when it should actually be ORGANIZATION.
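In the outputs above, punctuation is split off from words before tagging (for example `actor` and `,` are separate tokens). Which tokenizer the repository uses is not shown here; as a rough stand-in, the regular expression below reproduces the token lists of the three sentences above. The name `simple_tokenize` is hypothetical, and the pattern will differ from a full tokenizer on contractions, hyphenated words, and abbreviations.

```python
import re
from typing import List

# A run of word characters, or any single character that is neither
# a word character nor whitespace (i.e. a punctuation mark).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(sentence: str) -> List[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


tokens = simple_tokenize("Oxford is in England, Jack is from here.")
print(tokens)
```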

Next are three sentences taken from CNN and Wikipedia:

Input:

sent = "the US runs the risk of a military defeat by China or Russia"

Output:

['the', 'US', 'runs', 'the', 'risk', 'of', 'a', 'military', 'defeat', 'by', 'China', 'or', 'Russia']
['O', 'B-LOC', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC', 'O', 'B-LOC']
NER result:
LOCATION:  US
LOCATION:  China
LOCATION:  Russia

Input:

sent = "Home to the headquarters of the United Nations, New York is an important center for international diplomacy."

Output:

['Home', 'to', 'the', 'headquarters', 'of', 'the', 'United', 'Nations', ',', 'New', 'York', 'is', 'an', 'important', 'center', 'for', 'international', 'diplomacy', '.']
['O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'B-LOC', 'I-LOC', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
NER result:
ORGANIZATION:  United Nations
LOCATION:  New York

Input:

sent = "The United States is a founding member of the United Nations, World Bank, International Monetary Fund."

Output:

['The', 'United', 'States', 'is', 'a', 'founding', 'member', 'of', 'the', 'United', 'Nations', ',', 'World', 'Bank', ',', 'International', 'Monetary', 'Fund', '.']
['O', 'B-LOC', 'I-LOC', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ORG', 'I-ORG', 'O', 'B-ORG', 'I-ORG', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'O']
NER result:
LOCATION:  United States
ORGANIZATION:  United Nations
ORGANIZATION:  World Bank
ORGANIZATION:  International Monetary Fund

All three of these examples were recognized correctly.
