frameBERT is available for both English FrameNet 1.7 and Korean FrameNet 1.2.
frameBERT is a BERT-based frame-semantic parser that analyzes the meaning of texts in terms of FrameNet.
A frame (in frame semantics) is a schematic representation of a situation or an event.
For the example sentence "The center's director pledged a thorough review of safety procedures", frameBERT identifies several frames such as Being_born and Death for lexical units (e.g., center.n, director.n, and pledge.v).
Requirements
- python 3
- pytorch
- transformers
- Korean FrameNet
- keras
- nltk (for target identification)
- flask_restful (for REST API service)
- flask_cors (for REST API service)
For nltk, download the following packages in the Python terminal:
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
Install
Install frameBERT and Korean FrameNet.
(Note: Korean FrameNet will no longer be a mandatory package in the next update.)
git clone https://github.com/machinereading/frameBERT.git
cd frameBERT
git clone https://github.com/machinereading/koreanframenet.git
1. Download the pretrained model
Download the two pretrained model files to {your_model_dir} (e.g., /home/model/).
- English Model (recommended for English): (download)
- Multilingual Model (English & Korean): (download)
2. Import the model in your Python code (make sure your code is in the parent folder of frameBERT).
from frameBERT import frame_parser
model_path = '/home/model/'  # absolute path to {your_model_dir}
parser = frame_parser.FrameParser(model_path=model_path, language='en')
(optional: if you do NOT want to use the LU dictionary, set the argument masking=False)
3. Parse the input text
text = 'Hemingway was born on July 21, 1899 in Illinois, and died of suicide at the age of 62.'
parsed = parser.parser(text, sent_id='1', result_format='graph')
The result will be:
[('frame:Giving_birth#1', 'frdf:lu', 'born'),
('frame:Giving_birth#1', 'frdf:Giving_birth-Child', 'Hemingway'),
('frame:Giving_birth#1', 'frdf:Giving_birth-Time', 'on July 21, 1899'),
('frame:Giving_birth#1', 'frdf:Giving_birth-Place', 'in Illinois,'),
('frame:Death#1', 'frdf:lu', 'died'),
('frame:Death#1', 'frdf:Death-Protagonist', 'Hemingway'),
('frame:Death#1', 'frdf:Death-Explanation', 'of suicide'),
('frame:Killing#1', 'frdf:lu', 'suicide'),
('frame:Killing#1', 'frdf:Killing-Victim', 'Hemingway'),
('frame:Age#1', 'frdf:lu', 'age'),
('frame:Age#1', 'frdf:Age-Age', 'of 62.')]
You can also run frameBERT on Korean text by setting language='ko':
parser = frame_parser.FrameParser(model_path=model_path, language='ko')
text = '헤밍웨이는 1899년 7월 21일 미국 일리노이에서 태어났고 62세에 자살로 사망했다.'  # "Hemingway was born in Illinois, USA, on July 21, 1899, and died by suicide at the age of 62."
parsed = parser.parser(text, sent_id='1', result_format='all')
(optional: sent_id and result_format are not mandatory arguments.)
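Step 2 notes that the LU dictionary can be switched off with masking=False. This suggests that, by default, the frame candidates for a target are restricted to the frames listed for its lexical unit. The sketch below illustrates only that general idea; it is not frameBERT's code, and pick_frame, the score dictionary, and the toy LU dictionary are all hypothetical.

```python
from typing import Dict, List


def pick_frame(scores: Dict[str, float], lu: str,
               lu_dict: Dict[str, List[str]], masking: bool = True) -> str:
    """Return the best-scoring frame for a target (hypothetical helper).

    With masking=True and the LU found in lu_dict, only the frames listed for
    that LU are considered; otherwise every scored frame is a candidate."""
    candidates = scores
    if masking and lu in lu_dict:
        allowed = set(lu_dict[lu])
        masked = {frame: s for frame, s in scores.items() if frame in allowed}
        if masked:  # keep all frames if none of the listed ones were scored
            candidates = masked
    return max(candidates, key=lambda frame: candidates[frame])


# Toy numbers for illustration only (not real model scores or FrameNet data).
scores = {"Death": 0.50, "Giving_birth": 0.30, "Killing": 0.20}
lu_dict = {"bear.v": ["Giving_birth"]}
print(pick_frame(scores, "bear.v", lu_dict))                 # Giving_birth
print(pick_frame(scores, "bear.v", lu_dict, masking=False))  # Death
```

Masking here acts as a hard filter on the candidate set, so a frame that the dictionary does not list for the LU can never be chosen, even if the model scores it highest.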
You can get the result in one of the following formats, selected with result_format:
(1) triple format (result_format='graph')
(2) conll format (result_format='conll')
(3) PubAnnotation format (result_format='textae')
Alternatively, you can get all three results in JSON with result_format='all'.
triple format (as a graph)
The result is a list of triples:
[
('frame:Giving_birth#1', 'frdf:lu', 'born'),
('frame:Giving_birth#1', 'frdf:Giving_birth-Child', 'Hemingway'),
('frame:Giving_birth#1', 'frdf:Giving_birth-Time', 'on July 21, 1899'),
('frame:Giving_birth#1', 'frdf:Giving_birth-Place', 'in Illinois,'),
...
]
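Because every triple carries its frame instance as the subject, the graph result is easy to regroup per frame. The helper below is a hypothetical sketch (group_by_frame is not part of frameBERT) that assumes the frame: and frdf:<Frame>- prefixes seen in the output above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def group_by_frame(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Regroup graph triples as {frame_instance: {role: text}}.

    'frame:Giving_birth#1' becomes 'Giving_birth#1', a predicate such as
    'frdf:Giving_birth-Child' becomes the role 'Child', and 'frdf:lu' becomes 'lu'.
    If an instance repeats a role, the later triple overwrites the earlier one."""
    frames: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subj, pred, obj in triples:
        instance = subj[len("frame:"):] if subj.startswith("frame:") else subj
        role = pred[len("frdf:"):] if pred.startswith("frdf:") else pred
        frame_name = instance.split("#")[0]
        if role.startswith(frame_name + "-"):
            role = role[len(frame_name) + 1:]
        frames[instance][role] = obj
    return dict(frames)


# Usage on part of the Hemingway result shown earlier.
parsed = [
    ('frame:Giving_birth#1', 'frdf:lu', 'born'),
    ('frame:Giving_birth#1', 'frdf:Giving_birth-Child', 'Hemingway'),
    ('frame:Death#1', 'frdf:lu', 'died'),
    ('frame:Death#1', 'frdf:Death-Explanation', 'of suicide'),
]
by_frame = group_by_frame(parsed)
print(by_frame['Death#1'])  # {'lu': 'died', 'Explanation': 'of suicide'}
```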
conll format
The result is a list of frame-semantic structures. Each structure is itself a list of four lists: (1) tokens, (2) lexical units, (3) frames, and (4) arguments (frame elements, as IOB tags). For example, for the given input text, the output is in the following format:
[
[
['Hemingway', 'was', 'born', 'on', 'July', '21,', '1899', 'in', 'Illinois,', 'and', 'died', 'of', 'suicide', 'at', 'the', 'age', 'of', '62.'],
['_', '_', 'bear.v', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_'],
['_', '_', 'Giving_birth', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_'],
['B-Child', 'O', 'O', 'B-Time', 'I-Time', 'I-Time', 'I-Time', 'B-Place', 'I-Place', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
],
[
...
]
]
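The conll and graph formats carry overlapping information: the IOB tags in the fourth list can be collapsed into frame-element spans, which yields graph-style triples. A minimal sketch, assuming one target per structure and space-joined tokens; iob_to_spans and structure_to_triples are hypothetical helpers, and the #1 instance number is supplied by the caller rather than taken from frameBERT.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse IOB tags into (frame_element, text) pairs, in sentence order.

    'B-X' opens a span of X, 'I-X' extends an open X span (or opens one if
    none is open), and any other tag ('O', or 'X' for special tokens) closes it."""
    spans: List[Tuple[str, List[str]]] = []
    current: Optional[Tuple[str, List[str]]] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(token)
        elif tag.startswith(("B-", "I-")):
            current = (tag[2:], [token])
            spans.append(current)
        else:
            current = None
    return [(label, " ".join(words)) for label, words in spans]


def structure_to_triples(structure: List[List[str]], instance: int = 1) -> List[Tuple[str, str, str]]:
    """Turn one conll-format structure into graph-style triples (hypothetical helper)."""
    tokens, lus, frames, fe_tags = structure
    target = next(i for i, lu in enumerate(lus) if lu != "_")  # the single target position
    node = f"frame:{frames[target]}#{instance}"
    triples = [(node, "frdf:lu", tokens[target])]
    for label, text in iob_to_spans(tokens, fe_tags):
        triples.append((node, f"frdf:{frames[target]}-{label}", text))
    return triples


# Usage on the first structure of the example above.
tokens = ['Hemingway', 'was', 'born', 'on', 'July', '21,', '1899', 'in', 'Illinois,',
          'and', 'died', 'of', 'suicide', 'at', 'the', 'age', 'of', '62.']
lus = ['_'] * len(tokens)
lus[2] = 'bear.v'
frames = ['_'] * len(tokens)
frames[2] = 'Giving_birth'
fes = ['B-Child', 'O', 'O', 'B-Time', 'I-Time', 'I-Time', 'I-Time', 'B-Place', 'I-Place'] + ['O'] * 9
for triple in structure_to_triples([tokens, lus, frames, fes]):
    print(triple)
```

On this input the loop prints the same four Giving_birth triples as the graph result earlier, including the trailing comma kept in 'in Illinois,'.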
By running restApp.py, you can launch a standalone REST service on your own server.
python restApp.py --port {port number} --language {en|ko} --model {model path}
Example
python restApp.py --port 8888 --language en --model ./models/en
Then, you can send a POST request to XXX.XXX.XXX.XXX:8888/frameBERT, where XXX.XXX.XXX.XXX is your IP address.
# JSON format
{
"text": "Hemingway was born on July 21, 1899 in Illinois, and died of suicide at the age of 62.",
"result_format": "all"
}
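A client for this service can be written with the Python standard library alone. This is a sketch under assumptions: the http:// scheme, the Content-Type: application/json header, and a JSON response body are assumed rather than documented here, and build_request and parse_remote are hypothetical names.

```python
import json
import urllib.request
from typing import Any


def build_request(host: str, port: int, text: str,
                  result_format: str = "all") -> urllib.request.Request:
    """Build a POST request for the /frameBERT endpoint started by restApp.py.

    Assumption: plain HTTP and a JSON body with "text" and "result_format"."""
    body = json.dumps({"text": text, "result_format": result_format}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/frameBERT",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_remote(host: str, port: int, text: str,
                 result_format: str = "all", timeout: float = 60.0) -> Any:
    """Send the request and decode the response, assumed to be JSON."""
    request = build_request(host, port, text, result_format)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, with the service from the example command running locally:
# result = parse_remote("127.0.0.1", 8888,
#                       "Hemingway was born on July 21, 1899 in Illinois, and died of suicide at the age of 62.")
```

Keeping request construction separate from sending makes it possible to inspect the exact payload without a running server.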
# Training data format, for example:
[
[
['Greece', 'wildfires', 'force', 'thousands', 'to', '<tgt>', 'evacuate', '</tgt>'], # token list (target is indicated by the special tokens)
['_', '_', '_', '_', '_', '_', 'evacuate.v', '_'], # lu list (lu for target, else '_')
['_', '_', '_', '_', '_', '_', 'Escaping', '_'], # Frame list (frame for target, else '_')
['O', 'O', 'O', 'B-Escapee', 'O', 'X', 'O', 'X'] # FE list (IOB scheme, 'X' for the special tokens)
],
...
]
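In this format the target is wrapped in <tgt> ... </tgt>, the LU and frame lists mark only the target position, and the special tokens receive the tag X in the FE list. The hypothetical helper below builds one such instance from plain tokens; whether train.py expects exactly this layout (for example, for a target in the middle of a sentence) is an assumption.

```python
from typing import List


def make_instance(tokens: List[str], target: int, lu: str, frame: str,
                  fe_tags: List[str]) -> List[List[str]]:
    """Build one training instance as [tokens, lus, frames, fe_tags].

    tokens:  sentence tokens without special tokens
    target:  index of the target token in `tokens`
    fe_tags: IOB frame-element tags aligned with `tokens`"""
    if len(tokens) != len(fe_tags):
        raise ValueError("tokens and fe_tags must have the same length")
    # Wrap the target in special tokens; they get the tag 'X' in the FE list.
    toks = tokens[:target] + ["<tgt>", tokens[target], "</tgt>"] + tokens[target + 1:]
    tags = fe_tags[:target] + ["X", fe_tags[target], "X"] + fe_tags[target + 1:]
    lus = ["_"] * len(toks)
    frames = ["_"] * len(toks)
    lus[target + 1] = lu  # the target now sits right after '<tgt>'
    frames[target + 1] = frame
    return [toks, lus, frames, tags]


instance = make_instance(
    ["Greece", "wildfires", "force", "thousands", "to", "evacuate"],
    target=5, lu="evacuate.v", frame="Escaping",
    fe_tags=["O", "O", "O", "B-Escapee", "O", "O"],
)
for row in instance:
    print(row)
```

For the Greece sentence this reproduces the four lists of the example instance above.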
(reference: train.ipynb)
python train.py --train {TRAINING DATA, e.g., efn} --model_path {DIRECTORY TO SAVE YOUR MODEL} --pretrained_model {default="bert-base-multilingual-cased"} --early_stopping {default=TRUE} --epochs {default=20}
(reference: train.ipynb)
python evaluate.py --language {default='ko'} --model {DIRECTORY OF YOUR MODEL} --test {test_data} --result {DIRECTORY TO SAVE THE RESULT}
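The evaluation metric is not described here. One common way to score argument identification is micro-averaged precision, recall, and F1 over exact-match labeled spans; the sketch below shows that metric on IOB tag sequences and is not necessarily what evaluate.py computes. spans and span_f1 are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive


def spans(tags: List[str]) -> Set[Span]:
    """Collect labeled spans from an IOB tag sequence ('O' and 'X' are outside)."""
    result: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            result.add((label, start, i))
            label = None
        if tag.startswith(("B-", "I-")) and not continues:
            label, start = tag[2:], i
    return result


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match labeled spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = spans(g), spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [['B-Child', 'O', 'B-Time', 'I-Time']]
pred = [['B-Child', 'O', 'B-Time', 'O']]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact matching, the truncated Time span counts as both a false positive and a false negative, which is why precision and recall both drop to 0.5.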
CC BY-NC-SA (Attribution-NonCommercial-ShareAlike)
If you want to commercialize this resource, please contact us.
Machine Reading Lab @ KAIST
Younggyun Hahm. hahmyg@kaist.ac.kr, hahmyg@gmail.com
This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2013-0-00109, WiseKB: Big data based self-evolving knowledge base and reasoning platform).