Meeting 2020 05 05
Status update:
findings on flair:
- is a high-level wrapper around Hugging Face (e.g. like Keras for TensorFlow)
- can take all models from Hugging Face as they are, but cannot fine-tune them
- e.g. if you load BERT -> no fine-tuning of BERT itself, only the flair embeddings can be changed
- sklearn grid search is possible in flair --> a more sophisticated grid search algorithm that takes ranges of hyperparameters as input --> helpful for finding optimal parameters and understanding the effect of each parameter
- trained a model with loaded embeddings (GloVe): 83% validation accuracy
- on the whole dataset: still gives an error
next steps:
- fix bugs so that training on the whole dataset is possible
- play around with parameters
- finetuning of flair embeddings (if time)
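A minimal sketch of the grid search idea from the flair findings above: enumerate every combination of the given hyperparameter values and keep the best-scoring one. The `grid_search` helper, the parameter names and the `toy_score` function are made up for illustration and are not flair's or sklearn's actual API; in a real run the score function would train a model and return its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score, results)."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, score_fn(params)))
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_score(params):
    # Hypothetical stand-in for "train a model and return validation accuracy".
    # By construction it peaks at learning_rate=0.1, hidden_size=256.
    lr_penalty = abs(params["learning_rate"] - 0.1)
    size_penalty = abs(params["hidden_size"] - 256) / 1000
    return 1.0 - lr_penalty - size_penalty


param_grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "hidden_size": [128, 256, 512],
}

best_params, best_score, results = grid_search(param_grid, toy_score)
print(best_params, round(best_score, 3))
```

Keeping the full `results` list, not only the best entry, is what makes it possible to look at the effect of each parameter afterwards.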
- now everything works on leonhard
- data augmentation approach: 82% validation
- randomly masking 10% of every sentence: 80%, LIME model: 80.85%
- augmentation does not seem to help
- tried to figure out why, checked for overfitting etc.
next steps:
- with 90% new data --> maybe too much noise? maybe augmentation is still useful but with only 50% new data --> will run both approaches (random & LIME) with a 1.5 increase
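The random masking variant discussed above could look roughly like the sketch below. Everything here is an assumption made for illustration: the `<mask>` token, masking at least one token per sentence, whitespace tokenization, and the `mask_tokens` / `augment` helper names. The LIME-guided variant is not shown.

```python
import random


def mask_tokens(sentence, mask_fraction=0.1, mask_token="<mask>", rng=None):
    """Replace a random mask_fraction of the tokens (at least one) with mask_token."""
    if rng is None:
        rng = random.Random()
    tokens = sentence.split()
    if not tokens:
        return sentence
    n_mask = max(1, round(len(tokens) * mask_fraction))
    for idx in rng.sample(range(len(tokens)), n_mask):
        tokens[idx] = mask_token
    return " ".join(tokens)


def augment(sentences, labels, extra_fraction=0.5, seed=0):
    """Return the original data plus extra_fraction * len(sentences) masked copies."""
    rng = random.Random(seed)
    n_extra = int(len(sentences) * extra_fraction)
    picked = rng.sample(range(len(sentences)), n_extra)
    new_sentences = [mask_tokens(sentences[i], rng=rng) for i in picked]
    new_labels = [labels[i] for i in picked]
    return sentences + new_sentences, labels + new_labels


# Toy data, invented for the example.
sentences = [
    "i love this so much today",
    "worst day ever honestly",
    "great game last night",
    "so tired of this weather",
]
labels = [1, 0, 1, 0]

aug_sentences, aug_labels = augment(sentences, labels, extra_fraction=0.5)
print(len(sentences), "->", len(aug_sentences))
```

With `extra_fraction=0.5` the dataset grows by a factor of 1.5, while `extra_fraction=0.9` corresponds to the noisier setting with 90% new data.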
general info: jery now has a lot of time until June :)
- bert works, 81% validation
- everything on leonhard set up
next steps:
- figure out higher level ideas / tricks from research
- check out flair and try to stack different embeddings
- try to get tips and tricks from NLP papers (if time)
- ALBERT works, can run on leonhard
- tried some data preprocessing like removing punctuation and user/url tags --> no big improvements
next steps:
- had a look at unsupervised data augmentation (https://ai.googleblog.com/2019/07/advancing-semi-supervised-learning-with.html)
- investigate more in that topic
- check out more techniques from nlp
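For reference, the preprocessing mentioned above (removing punctuation and user/url tags) can be sketched as follows. It assumes the tags appear as the literal placeholders `<user>` and `<url>`, which the notes do not state; `clean_tweet` is a hypothetical helper name.

```python
import re
import string

# Assumed tag format: literal "<user>" and "<url>" placeholders in the tweet text.
TAG_PATTERN = re.compile(r"<user>|<url>")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Drop <user>/<url> placeholders, strip ASCII punctuation, collapse whitespace."""
    text = TAG_PATTERN.sub(" ", text)    # remove tags first, before "<" and ">" are stripped
    text = text.translate(PUNCT_TABLE)   # remove punctuation characters
    return " ".join(text.split())        # normalize runs of whitespace


print(clean_tweet("<user> omg , this is so good !!! <url>"))
# prints: omg this is so good
```

The tags are removed before the punctuation step on purpose: stripping punctuation first would turn `<user>` into the ordinary word `user`.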
- embeddings: size 140 as this is max length of a tweet
- possible to add attention layer after bert, stack bert, glove etc.
- want to figure out: which embeddings are good, and does stacking embeddings help?
- flair: put stacked embeddings in RNN
- approach is to try a few things, but in depth
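To make the stacking idea above concrete: stacking embeddings means concatenating the vectors of several embeddings for each token, and the sequence is then truncated or padded to a fixed length (140 here, as in the notes). The sketch below uses plain Python and two tiny invented lookup tables in place of real pretrained embeddings such as BERT or GloVe; it does not use flair itself, and all names and values are hypothetical.

```python
MAX_LEN = 140  # sequence length from the notes (max length of a tweet)

# Toy lookup tables standing in for two pretrained embeddings (invented values).
EMB_A = {"good": [0.1, 0.2], "day": [0.3, 0.4]}            # dimension 2
EMB_B = {"good": [1.0, 0.0, 0.5], "day": [0.0, 1.0, 0.5]}  # dimension 3
DIM_A, DIM_B = 2, 3


def lookup(table, token, dim):
    """Return the vector for token, or a zero vector for unknown tokens."""
    return table.get(token, [0.0] * dim)


def stack_and_pad(tokens, max_len=MAX_LEN):
    """Concatenate both embeddings per token, then truncate or zero-pad to max_len rows."""
    rows = [
        lookup(EMB_A, tok, DIM_A) + lookup(EMB_B, tok, DIM_B)
        for tok in tokens[:max_len]
    ]
    padding = [[0.0] * (DIM_A + DIM_B) for _ in range(max_len - len(rows))]
    return rows + padding


matrix = stack_and_pad("good day everyone".split())
print(len(matrix), len(matrix[0]))
```

The resulting `max_len x (DIM_A + DIM_B)` matrix is the kind of input that would then be fed into an RNN.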
next meeting: 07.05.20, 09:30
goal: clearly decide which approaches we want to pursue so that we can start on a pipeline and clean experiments