Meeting 2020 05 05


ferraric edited this page May 12, 2020 · 1 revision

Status update:

Claudio:

findings on Flair:

is a high-level layer on top of Hugging Face (similar to Keras on top of TensorFlow)
can use all Hugging Face models as they are, but cannot fine-tune them
e.g. if you load BERT, BERT itself is not fine-tuned; only the Flair embeddings can be changed
sklearn-style grid search is possible in Flair: you input ranges of hyperparameters and a search algorithm explores them, which helps to find optimal parameters and to understand the effect of each parameter
trained a model with pre-loaded GloVe embeddings: 83% validation accuracy
training on the whole dataset still gives an error
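In its simplest form, a grid search trains and evaluates every combination of the given hyperparameter ranges and keeps the best one. The sketch below uses only the standard library and is not Flair's own hyperparameter API. `fake_objective` is a hypothetical stand-in for "train a model, return validation accuracy", and the parameter names and ranges are illustrative.

```python
import itertools
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    train_and_evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Dict[str, Any], float]:
    """Try every combination in param_grid; return the best params and their score."""
    names = list(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    # itertools.product enumerates the Cartesian product of all value lists
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical objective standing in for training a model and returning validation accuracy
def fake_objective(params: Dict[str, Any]) -> float:
    return -abs(params["learning_rate"] - 0.1) - abs(params["hidden_size"] - 256) / 1000


grid = {"learning_rate": [0.01, 0.1, 0.5], "hidden_size": [128, 256, 512]}
best, score = grid_search(grid, fake_objective)
print(best)  # {'learning_rate': 0.1, 'hidden_size': 256}
```

The number of training runs is the product of the range sizes (9 here), so ranges should be kept small when each run is a full training job.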

next steps:

fix the bugs so that training on the whole dataset is possible
experiment with the hyperparameters
fine-tune the Flair embeddings (if time permits)

Jérémy:

everything now works on Leonhard
data augmentation approach: 82% validation accuracy
randomly masking 10% of every sentence: 80%; LIME model: 80.85%
augmentation does not seem to help
tried to figure out why, e.g. checked for overfitting
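The random-masking augmentation can be sketched as replacing a fixed fraction of the tokens in each sentence with a mask token. Whitespace tokenization, the `[MASK]` token, masking at least one token, and the function name `random_mask` are all assumptions; the notes do not say how tokens were split or what they were replaced with.

```python
import random
from typing import Optional


def random_mask(sentence: str, ratio: float = 0.1, mask_token: str = "[MASK]",
                rng: Optional[random.Random] = None) -> str:
    """Replace round(ratio * n_tokens) randomly chosen tokens (at least one) with mask_token."""
    rng = rng or random.Random()
    tokens = sentence.split()  # assumption: simple whitespace tokenization
    if not tokens:
        return sentence
    n_mask = max(1, round(ratio * len(tokens)))
    # Pick distinct positions so no token is masked twice
    for i in rng.sample(range(len(tokens)), n_mask):
        tokens[i] = mask_token
    return " ".join(tokens)


rng = random.Random(0)
tweet = "i love this new phone so much , best purchase of the year"
masked = random_mask(tweet, ratio=0.1, rng=rng)
print(masked)
```

With 13 tokens and a ratio of 0.1, exactly one token is masked; a LIME-based variant would choose the positions from explanation weights instead of uniformly at random.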

next steps:

the augmentation adds 90% new data, which may introduce too much noise; augmentation might still help with only 50% new data, so will run both approaches (random & LIME) with a 1.5× increase in dataset size
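Controlling how much new data the augmentation adds comes down to how many augmented samples are generated: a 1.5× increase means generating new samples equal to 50% of the original set (a 1.9× increase gives 90%). In the sketch below, `augment` stands for either augmentation (random or LIME-based); the function name, the `(text, label)` sample format, and sampling originals without replacement are assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple

Sample = Tuple[str, int]  # (text, label); the label format is an assumption


def augment_dataset(data: List[Sample], factor: float,
                    augment: Callable[[str], str],
                    rng: Optional[random.Random] = None) -> List[Sample]:
    """Return the original data plus round((factor - 1) * len(data)) augmented samples."""
    rng = rng or random.Random()
    n_new = round((factor - 1.0) * len(data))
    if n_new <= len(data):
        chosen = rng.sample(data, n_new)  # distinct originals, e.g. factor 1.5 -> half of them
    else:
        chosen = rng.choices(data, k=n_new)  # factor > 2: some originals are augmented twice
    # Augmented samples keep the label of the sentence they were derived from
    return data + [(augment(text), label) for text, label in chosen]


# Usage with a dummy augmentation standing in for random or LIME-based masking
data = [(f"tweet {i}", i % 2) for i in range(1000)]
augmented = augment_dataset(data, 1.5, lambda s: s + " [AUG]", random.Random(0))
print(len(augmented))  # 1500
```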

general info: Jérémy now has a lot of time until June :)

Vanessa:

BERT works: 81% validation accuracy
everything set up on Leonhard

next steps:

look for higher-level ideas / tricks from research
check out Flair and try stacking different embeddings
collect tips and tricks from NLP papers (if time permits)

Sinan:

ALBERT works and runs on Leonhard
tried some data preprocessing, such as removing punctuation and user/url tags: no big improvements
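The preprocessing that was tried can be sketched with regular expressions. The `<user>` / `<url>` placeholder spelling is an assumption about how the tweets mark mentions and links; the patterns would need to match the actual dataset.

```python
import re
import string

TAG_RE = re.compile(r"<(?:user|url)>")  # assumed tag format for mentions and links
PUNCT_RE = re.compile(f"[{re.escape(string.punctuation)}]")
SPACE_RE = re.compile(r"\s+")


def preprocess(tweet: str) -> str:
    """Remove user/url tags and punctuation, then collapse whitespace."""
    tweet = TAG_RE.sub(" ", tweet)  # tags first, before '<' and '>' are stripped as punctuation
    tweet = PUNCT_RE.sub(" ", tweet)
    return SPACE_RE.sub(" ", tweet).strip()


print(preprocess("<user> this is great !!! check <url>"))  # this is great check
```

Note that this also splits contractions ("don't" becomes "don t") and drops emoticons such as ":)", which can carry sentiment; that may partly explain why the gains were small.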

next steps:

had a look at unsupervised data augmentation (https://ai.googleblog.com/2019/07/advancing-semi-supervised-learning-with.html)
investigate that topic further
check out more NLP techniques
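Roughly, the core of unsupervised data augmentation as described in the linked post is a consistency loss: on unlabeled examples, the model's prediction for an augmented input is pushed towards its prediction for the original input (measured with KL divergence), and this term is added to the usual supervised loss with a weight λ. The pure-Python sketch below only computes the loss value for given probability vectors; the function names, the default weight, and the toy probabilities are illustrative assumptions.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Negative log-likelihood of the true label under the predicted distribution p."""
    return -math.log(p[label] + eps)


def uda_loss(labeled_probs: Sequence[Sequence[float]], labels: Sequence[int],
             unlabeled_probs: Sequence[Sequence[float]],
             augmented_probs: Sequence[Sequence[float]], lam: float = 1.0) -> float:
    """Supervised cross-entropy plus lam * consistency (KL) on unlabeled data."""
    sup = sum(cross_entropy(p, y) for p, y in zip(labeled_probs, labels)) / len(labels)
    # Consistency: the prediction on the original input is the target for the augmented one.
    # In a real training loop this target would be held fixed (no gradient through it).
    cons = sum(kl_divergence(p, q) for p, q in zip(unlabeled_probs, augmented_probs)) / len(unlabeled_probs)
    return sup + lam * cons


# Toy binary sentiment example with made-up probabilities
print(uda_loss([[0.9, 0.1]], [0], [[0.7, 0.3]], [[0.6, 0.4]], lam=1.0))
```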

General:

embeddings: size 140, as this is the maximum length of a tweet
possible to add an attention layer after BERT, or to stack BERT, GloVe, etc.
want to figure out: which embeddings work well, and does stacking embeddings help?
Flair: feed the stacked embeddings into an RNN
approach: try a few things, but in depth
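Stacking embeddings, as Flair's StackedEmbeddings does, amounts to concatenating each token's vectors from several embedding models into one longer vector, which can then be fed into an RNN. The sketch uses plain dictionaries as stand-in embedding tables (tiny made-up vectors, not real GloVe or BERT values) and zero vectors for unknown tokens; both are assumptions for illustration.

```python
from typing import Dict, List

Table = Dict[str, List[float]]


def stack_embeddings(tokens: List[str], tables: List[Table], dims: List[int]) -> List[List[float]]:
    """Concatenate each token's vector from every table; unknown tokens get zero vectors."""
    stacked = []
    for tok in tokens:
        vec: List[float] = []
        for table, dim in zip(tables, dims):
            vec.extend(table.get(tok, [0.0] * dim))
        stacked.append(vec)
    return stacked


glove_like = {"good": [0.1, 0.2], "movie": [0.3, 0.4]}  # hypothetical 2-d vectors
other_like = {"good": [1.0, 0.0, 0.5]}                  # hypothetical 3-d vectors
out = stack_embeddings(["good", "movie"], [glove_like, other_like], [2, 3])
print(out)  # [[0.1, 0.2, 1.0, 0.0, 0.5], [0.3, 0.4, 0.0, 0.0, 0.0]]
```

The stacked vector size is the sum of the individual sizes (5 here), which becomes the input size of the RNN.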

next meeting: 07.05.20, 09:30

goal: clearly decide which approaches to pursue, so that we can start building a pipeline and running clean experiments