SciBERT - NER #8

Open · ivalexander13 opened this issue Mar 19, 2021 · 15 comments
Labels: bug (Something isn't working), nlp, functional groups rejoice
@ivalexander13 commented Mar 19, 2021

Overview

We are doing this to compare SciBERT's performance on NER relative to text classification. SciBERT didn't provide a ChemProt dataset for NER, so we are taking the ChemProt dataset straight from its source (link here?) and reformatting it to fit the model's NER task.

Attempt (ongoing)

We are in the middle of converting the source ChemProt dataset: doing part-of-speech tagging on each word and tagging the relevant entities (substrate, product, and enzyme).
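For a sense of the target format, here's a minimal sketch of the span-to-BIO conversion (the token/span representation and the example sentence are hypothetical, not the actual ChemProt schema):

```python
# Hedged sketch: convert character-level entity spans into token-level BIO tags.
def to_bio(tokens, spans):
    """tokens: list of (word, start_char); spans: list of (start, end, label)."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        inside = False
        for i, (word, tok_start) in enumerate(tokens):
            if start <= tok_start < end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags

tokens = [("Hexokinase", 0), ("phosphorylates", 11), ("glucose", 26), (".", 33)]
spans = [(0, 10, "ENZYME"), (26, 33, "SUBSTRATE")]
print(list(zip([w for w, _ in tokens], to_bio(tokens, spans))))
# [('Hexokinase', 'B-ENZYME'), ('phosphorylates', 'O'), ('glucose', 'B-SUBSTRATE'), ('.', 'O')]
```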

Plans

We will then run the full 75-epoch training on this dataset and see how it performs.

@mrunalimanj

Sina and I finally got our data reformatted after a couple of hours, in mar12_NER/20210326_set_up_NER_runs_with_dividers.ipynb -- the data was saved to data/ner/chemprot_sub_enzyme/clean/{dev, train, test}.txt

We ran it yesterday but keep getting low F1s, so I'm going to start looking at whether we can reuse bits and pieces of the SciBERT model to include class_weights - more coming
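(For reference, the class_weights idea is just a weighted cross-entropy over tags - a minimal PyTorch sketch with made-up tags and weights, not the actual AllenNLP wiring:)

```python
import torch
import torch.nn as nn

# Hypothetical tag set: downweight the dominant "O" tag, upweight entity tags.
tags = ["O", "B-ENZYME", "I-ENZYME", "B-SUBSTRATE", "I-SUBSTRATE", "B-PRODUCT", "I-PRODUCT"]
class_weights = torch.tensor([0.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

loss_fn = nn.CrossEntropyLoss(weight=class_weights)

# logits: (batch * seq_len, num_tags); labels: (batch * seq_len,)
logits = torch.randn(8, len(tags))
labels = torch.randint(0, len(tags), (8,))
loss = loss_fn(logits, labels)
```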

@mrunalimanj commented Apr 4, 2021

How we ran it for testing (didn't want to use compute hours):

```bash
source activate /global/home/groups/fc_igemcomp/software/scibert_env_NER
cd fc_igemcomp/2020_nlp/scibert
rm -R scripts/NER_output_26mar/
./scripts/train_allennlp_local_v3_NER_trial.sh ./scripts/NER_output_26mar/
```

@mrunalimanj commented Apr 4, 2021

Creating a new kernel:

```bash
source activate ~/fc_igemcomp/software/scibert_env_NER  # very important! can also use conda activate
# had to install ipykernel first:
conda install -p /global/home/groups/fc_igemcomp/software/scibert_env_NER ipykernel
# display name is what will show in Jupyter:
python -m ipykernel install --user --name python3.6.13_ner_scibert --display-name "Python 3.6.13 (scibert_env_NER)"
```

6:20pm: issue with some IProgress module, so ran these:

```bash
conda install -c conda-forge ipywidgets
jupyter nbextension enable --py widgetsnbextension
```

(cool! now TQDM works in-notebook)

@mrunalimanj commented Apr 4, 2021

Okay, my plan:

  • get a trained model of maybe 10 epochs
  • then get the logits out from the predict option of the finetuned model
  • and weight those accordingly to get better labels (rough sketch below)

uguguguguugugg we need to modify the loss if we want the model to LEARN these weights, though

train: `scripts/0403_train_allennlp_local_NER_few_epochs.sh scripts/NER_output_3apr/`
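Rough sketch of the reweighting step (weight values are placeholders; note this only reshapes predictions post hoc - the model itself doesn't learn anything new):

```python
import torch

# Hypothetical per-tag weights: downweight "O", upweight entity tags.
tag_weights = torch.tensor([0.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

def reweighted_labels(logits):
    """logits: (seq_len, num_tags) from the finetuned model's predict step."""
    # Adding log-weights to log-probs == multiplying probabilities by weights.
    log_probs = torch.log_softmax(logits, dim=-1)
    return (log_probs + tag_weights.log()).argmax(dim=-1)
```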

@mrunalimanj commented Apr 4, 2021

oof, okay, switching to local to make changes to AllenNLP - will try to set up a similar file structure on Savio and sync to GitHub

ah sike - we realized it's not bert_text_classifier that the NER setup uses, but rather the bert_crf_tagger.py file - will see if we can modify that to use class weights instead!

@mrunalimanj

kmkurn/pytorch-crf#47 is helpful; files to modify include ner_finetune.json, the allennlp CRF class, and the bert_crf_tagger.py file.

@sghandian commented Apr 6, 2021

Did some more digging into how people have fixed imbalanced-data issues in AllenNLP before. Seems like there is no generalized solution, according to this thread.

Mrunali's and my experiments with directly modifying the weights haven't made a big difference to performance so far; we might be missing something, though.

@mrunalimanj added the bug (Something isn't working), nlp, and functional groups rejoice labels on Apr 6, 2021
@mrunalimanj commented Apr 6, 2021

Looking into modifying CRFs to be weighted:

Mathy paper that basically says we should compute a double sum for the loss so that we can weight the classes: https://perso.uclouvain.be/michel.verleysen/papers/ieeetbe12gdl.pdf. Seems to have kind of decent results? Hadn't thought about L1 regularization.

[screenshot of the paper's weighted loss formulation]

From: allenai/allennlp#4619

Someone said, "I mean, I believe it can work in practice, but their theoretical motivation is not correct. If this is the case, we could do it with a much simpler approach (like weighted emission scores)." - which is what we did...: tensorflow/addons#817
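A minimal sketch of the weighted-emission-scores version using kmkurn's pytorch-crf (the weight values are placeholders; in our setup the real change would live inside bert_crf_tagger.py):

```python
import torch
from torchcrf import CRF  # pip install pytorch-crf

num_tags = 7
crf = CRF(num_tags, batch_first=True)

# Hypothetical per-tag weights; scaling emissions changes how strongly each
# tag's score contributes to the CRF log-likelihood, without touching transitions.
tag_weights = torch.tensor([0.1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

emissions = torch.randn(2, 10, num_tags)    # (batch, seq_len, num_tags)
tags = torch.randint(0, num_tags, (2, 10))  # gold tag sequences
loss = -crf(emissions * tag_weights, tags)  # weighted emissions into the NLL
```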

okay, I'm just going to keep a running list of updates in this comment as I find other ideas/potential implementations

{in any case can you tell how much fun I'm having with GitHub issues lmao}

@sghandian

This textbook chapter from my NLP class actually goes over what we've concluded is a good approach to this problem, which I thought was validating (i.e. NER/relation extraction + a semi-supervised approach): https://web.stanford.edu/~jurafsky/slp3/17.pdf

@ivalexander13 (author)

> This textbook chapter from my NLP class actually goes over what we've concluded is a good approach to this problem, which I thought was validating (i.e. NER/relation extraction + a semi-supervised approach): https://web.stanford.edu/~jurafsky/slp3/17.pdf

Is the semi-supervised approach the one you're/they're thinking of? It seems really cool and seems to have a decent track record, though we'd probably need to rewrite a lot of code. Do you think it's worth pursuing?

@sghandian

Yeah, take a look at 17.2.4 in there (distant supervision for relation extraction). It sounds very similar to the pattern-recognition technique we've been talking about, except it learns non-regex patterns for features (or aggregates data to be fed into the NN directly, without extracting features beforehand). The problem is that it generally has low precision, which matches the other paper we read using pattern matching, so I'm not sure what the best solution is for us.

@mrunalimanj

Tried rebalancing the data (with the 12apr/20210412 notebook + script) to remove any sentences without entities/labels of interest, but the F1 does not change considerably :(
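(The filter itself is roughly this, assuming CoNLL-style `token tag` lines with blank lines between sentences:)

```python
def keep_sentence(lines):
    # Drop sentences where every token is tagged "O" (no entities of interest).
    return any(line.split()[-1] != "O" for line in lines)

def filter_conll(in_path, out_path):
    with open(in_path) as f:
        sentences = [s.split("\n") for s in f.read().strip().split("\n\n")]
    kept = [s for s in sentences if keep_sentence(s)]
    with open(out_path, "w") as f:
        f.write("\n\n".join("\n".join(s) for s in kept) + "\n")
```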

@mrunalimanj

Praise Ivan, who modified a Hugging Face implementation (in his scratch folder, /global/scratch/ivalexander13/NLPChemExtractor/scibert-text-classification/main.ipynb, but also in /global/home/groups/fc_igemcomp/2020_nlp/scibert/apr16_huggingface_NER) - rough sketch after the TODO list below.

  • hyperparameter search doc: https://docs.google.com/spreadsheets/d/1jolvSI9tCqHZqBMtX1MAUjht2WuXyl_uFauhbvHMUtQ/edit?usp=sharing

TODO:
  • tune hyperparams
  • create a simpler model
    • would need labeled data from pattern dev to have more freedom
  • see if modifying the loss fn is necessary
  • relation-extraction equivalent? may be a better fit for our problem, but potentially not a huge deal if we can't do it
  • run for more epochs, see if that helps training
  • modify BRENDA data to be used as training data? or potentially use it for tagging, off of a semi-supervised model with ChemProt data
    • get labeled data from pattern development for some kind of benchmark?
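For anyone reading later, the Hugging Face version boils down to roughly this (a sketch, not Ivan's exact notebook; num_labels and the example sentence are placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# allenai/scibert_scivocab_uncased is the released SciBERT checkpoint;
# num_labels=7 is a placeholder for B/I tags over {enzyme, substrate, product} + O.
model_name = "allenai/scibert_scivocab_uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=7)

# One dummy step to show the shape of the loop; real training iterates
# over the tokenized ChemProt NER splits with an optimizer.
enc = tokenizer("Hexokinase phosphorylates glucose .", return_tensors="pt")
labels = torch.zeros_like(enc["input_ids"])  # all-"O" placeholder labels
loss = model(**enc, labels=labels).loss
loss.backward()
```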

@mrunalimanj

revised TODOs:

@ivalexander13 (author)

> revised TODOs:

I'm working on this at #26
