# Pre-trained NER models for Korean corpora

## monologg/KoELECTRA

### [monologg/koelectra-small-finetuned-naver-ner](https://huggingface.co/monologg/koelectra-small-finetuned-naver-ner)

In [None]:
!git clone https://github.com/monologg/KoELECTRA-Pipeline KoelectraPipeline

In [1]:
import torch
from transformers import ElectraTokenizer, ElectraForTokenClassification
from KoelectraPipeline.ner_pipeline import NerPipeline
from pprint import pprint

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device_no = 0 if torch.cuda.is_available() else -1
query = "현대자동차가 사우디아라비아에 반조립(CKD) 공장 설립을 검토한다. 현대차(005380)의 첫 중동 생산기지로 사우디가 확정되면 성장 가능성이 큰 현지 시장 공략에도 속도가 붙을 것으로 전망된다."

In [2]:
tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-small-finetuned-naver-ner")
model = ElectraForTokenClassification.from_pretrained("monologg/koelectra-small-finetuned-naver-ner").to(device)

- `NerPipeline` does not seem to be available in the latest version of `transformers`, so it is imported from the cloned `KoELECTRA-Pipeline` repository instead.

In [3]:
ner = NerPipeline(
    model=model,
    tokenizer=tokenizer,
    ignore_labels=[],
    ignore_special_tokens=True,
    device=device_no
)

- Run the NER pipeline on the sample query to see what it returns

In [4]:
pprint(ner(query))



[{'entity': 'ORG-B', 'score': 0.9996711611747742, 'word': '현대자동차가'},
 {'entity': 'LOC-B', 'score': 0.9993993639945984, 'word': '사우디아라비아에'},
 {'entity': 'TRM-B', 'score': 0.9200928807258606, 'word': '반조립(CKD)'},
 {'entity': 'O', 'score': 0.9994731545448303, 'word': '공장'},
 {'entity': 'O', 'score': 0.99997478723526, 'word': '설립을'},
 {'entity': 'O', 'score': 0.9999741911888123, 'word': '검토한다.'},
 {'entity': 'ORG-B', 'score': 0.999255895614624, 'word': '현대차(005380)의'},
 {'entity': 'NUM-B', 'score': 0.9716644287109375, 'word': '첫'},
 {'entity': 'LOC-B', 'score': 0.6237317323684692, 'word': '중동'},
 {'entity': 'O', 'score': 0.9186286330223083, 'word': '생산기지로'},
 {'entity': 'LOC-B', 'score': 0.9993852972984314, 'word': '사우디가'},
 {'entity': 'O', 'score': 0.9999492764472961, 'word': '확정되면'},
 {'entity': 'O', 'score': 0.9999668598175049, 'word': '성장'},
 {'entity': 'O', 'score': 0.9999701976776123, 'word': '가능성이'},
 {'entity': 'O', 'score': 0.9999551773071289, 'word': '큰'},
 {'entity': 'O', 'score
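
The pipeline returns one dictionary per word, including words tagged `O`. As a minimal sketch of how this output could be post-processed, the snippet below groups the tagged words by entity type. The helper `group_entities` and its `min_score` parameter are hypothetical names introduced here, and the `TYPE-B` label format is assumed from the output shown above; the sample list is copied from the first few rows of that output.

```python
from collections import defaultdict

# First few predictions copied from the output above.
predictions = [
    {'entity': 'ORG-B', 'score': 0.9996711611747742, 'word': '현대자동차가'},
    {'entity': 'LOC-B', 'score': 0.9993993639945984, 'word': '사우디아라비아에'},
    {'entity': 'TRM-B', 'score': 0.9200928807258606, 'word': '반조립(CKD)'},
    {'entity': 'O', 'score': 0.9994731545448303, 'word': '공장'},
]


def group_entities(preds, min_score=0.0):
    """Group predicted words by entity type, skipping the 'O' label.

    Hypothetical helper: assumes labels look like 'ORG-B' (type, hyphen, tag).
    """
    grouped = defaultdict(list)
    for p in preds:
        if p['entity'] == 'O' or p['score'] < min_score:
            continue
        # Keep only the entity type that precedes the hyphen.
        entity_type = p['entity'].split('-')[0]
        grouped[entity_type].append(p['word'])
    return dict(grouped)


print(group_entities(predictions))
```

Passing a threshold such as `min_score=0.95` would also drop low-confidence predictions, for example the `TRM-B` row in this sample.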

- Measure the elapsed time of a single inference call with CUDA events

In [5]:
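# CUDA events time work on the GPU, so this cell requires a CUDA device.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
ner(query)
end.record()

# Wait for the recorded events to complete before reading the elapsed time.
torch.cuda.synchronize()

print('Elapsed time: {} ms'.format(start.elapsed_time(end)))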

Elapsed time: 21.58198356628418 ms
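
The measurement above times a single call and only works on a CUDA device, while `device` may fall back to the CPU. A device-agnostic alternative could look like the sketch below, which takes several wall-clock samples after a few warm-up calls. `time_call` and its parameters are hypothetical names, not part of `torch` or `transformers`, and the workload is a stand-in so that the snippet runs without a model.

```python
import statistics
import time


def time_call(fn, repeats=10, warmup=2, sync=None):
    """Return per-call wall-clock times in milliseconds.

    sync is an optional zero-argument callable run around each timed call,
    for example torch.cuda.synchronize when fn launches GPU work.
    """
    for _ in range(warmup):
        fn()  # Warm-up calls are not timed.
    times = []
    for _ in range(repeats):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append((time.perf_counter() - start) * 1000.0)
    return times


# Stand-in workload; in the notebook this would be `lambda: ner(query)`,
# with `sync=torch.cuda.synchronize` when running on a GPU.
samples = time_call(lambda: sum(range(10_000)), repeats=5)
print('median: {:.3f} ms'.format(statistics.median(samples)))
```

Reporting the median of several runs tends to be less sensitive to one-off delays than a single measurement.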


### [monologg/koelectra-base-v3-naver-ner](https://huggingface.co/monologg/koelectra-base-v3-naver-ner)

- With `transformers==3.3.1`, loading `monologg/koelectra-base-v3-naver-ner` by its model identifier fails:

```
OSError: Model name 'monologg/koelectra-base-v3-naver-ner' was not found in tokenizers model name list (google/electra-small-generator, google/electra-base-generator, google/electra-large-generator, google/electra-small-discriminator, google/electra-base-discriminator, google/electra-large-discriminator). We assumed 'monologg/koelectra-base-v3-naver-ner' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
```

In [6]:
tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-v3-naver-ner")
model = ElectraForTokenClassification.from_pretrained("monologg/koelectra-base-v3-naver-ner")

OSError: Model name 'monologg/koelectra-base-v3-naver-ner' was not found in tokenizers model name list (google/electra-small-generator, google/electra-base-generator, google/electra-large-generator, google/electra-small-discriminator, google/electra-base-discriminator, google/electra-large-discriminator). We assumed 'monologg/koelectra-base-v3-naver-ner' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

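As a workaround, install Git LFS so that the model repository can be cloned and loaded from a local directory:

```
$ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
$ apt-get install git-lfs
```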

In [None]:
!git clone https://huggingface.co/monologg/koelectra-base-v3-naver-ner
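
If the clone runs without Git LFS set up, large files can end up as small pointer stubs instead of real content. The sketch below is one way to sanity-check the cloned directory before loading it. `check_clone` and `is_lfs_pointer` are hypothetical helpers; `vocab.txt` is the file name mentioned in the error message above, and the pointer prefix is the first line of a Git LFS pointer file.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer stub, not real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_clone(model_dir, required=("vocab.txt",)):
    """Return a list of problems found in a locally cloned model directory."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return ["not a directory: " + str(model_dir)]
    problems = []
    for name in required:
        if not (model_dir / name).is_file():
            problems.append("missing: " + name)
    # Only top-level files are inspected in this sketch.
    for path in sorted(model_dir.glob("*")):
        if path.is_file() and is_lfs_pointer(path):
            problems.append("LFS pointer, not downloaded: " + path.name)
    return problems


print(check_clone("koelectra-base-v3-naver-ner"))
```

An empty list suggests that the required files are present and that no top-level file is an undownloaded LFS pointer.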

In [7]:
tokenizer = ElectraTokenizer.from_pretrained("koelectra-base-v3-naver-ner")
model = ElectraForTokenClassification.from_pretrained("koelectra-base-v3-naver-ner").to(device)

In [8]:
ner = NerPipeline(
    model=model,
    tokenizer=tokenizer,
    ignore_labels=[],
    ignore_special_tokens=True,
    device=device_no
)

In [9]:
pprint(ner(query))

[{'entity': 'ORG-B', 'score': 0.9999872446060181, 'word': '현대자동차가'},
 {'entity': 'LOC-B', 'score': 0.9999710321426392, 'word': '사우디아라비아에'},
 {'entity': 'TRM-B', 'score': 0.9998471140861511, 'word': '반조립(CKD)'},
 {'entity': 'O', 'score': 0.9999819397926331, 'word': '공장'},
 {'entity': 'O', 'score': 0.9999977946281433, 'word': '설립을'},
 {'entity': 'O', 'score': 0.9999947547912598, 'word': '검토한다.'},
 {'entity': 'ORG-B', 'score': 0.9999806880950928, 'word': '현대차(005380)의'},
 {'entity': 'NUM-B', 'score': 0.9999110698699951, 'word': '첫'},
 {'entity': 'LOC-B', 'score': 0.9978455901145935, 'word': '중동'},
 {'entity': 'O', 'score': 0.9986305236816406, 'word': '생산기지로'},
 {'entity': 'LOC-B', 'score': 0.999977707862854, 'word': '사우디가'},
 {'entity': 'O', 'score': 0.9999973773956299, 'word': '확정되면'},
 {'entity': 'O', 'score': 0.9999985694885254, 'word': '성장'},
 {'entity': 'O', 'score': 0.9999983906745911, 'word': '가능성이'},
 {'entity': 'O', 'score': 0.9999987483024597, 'word': '큰'},
 {'entity': 'O', 'sco
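
In the visible part of the two outputs, both models assign the same labels, but the base-v3 model reports higher scores. The sketch below makes that comparison explicit for a few words whose scores are copied from the outputs above. `score_gains` is a hypothetical helper, and the dictionaries hold only a hand-picked subset of the rows.

```python
# Scores copied from the two outputs above for a few selected words.
small = {
    '현대자동차가': 0.9996711611747742,
    '반조립(CKD)': 0.9200928807258606,
    '중동': 0.6237317323684692,
    '생산기지로': 0.9186286330223083,
}
base_v3 = {
    '현대자동차가': 0.9999872446060181,
    '반조립(CKD)': 0.9998471140861511,
    '중동': 0.9978455901145935,
    '생산기지로': 0.9986305236816406,
}


def score_gains(a, b):
    """Return (word, b_score - a_score) pairs for shared words, largest first."""
    shared = [w for w in a if w in b]
    return sorted(((w, b[w] - a[w]) for w in shared),
                  key=lambda pair: pair[1], reverse=True)


for word, gain in score_gains(small, base_v3):
    print('{}\t{:+.4f}'.format(word, gain))
```

Among these words, the largest difference is for `중동`, which the small model scores at about 0.62 and the base-v3 model at about 0.998.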

In [10]:
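# CUDA events time work on the GPU, so this cell requires a CUDA device.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
ner(query)
end.record()

# Wait for the recorded events to complete before reading the elapsed time.
torch.cuda.synchronize()

print('Elapsed time: {} ms'.format(start.elapsed_time(end)))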

Elapsed time: 23.280223846435547 ms
