Meeting 2020 05 07
Status Update
4 jobs done: trained with only GloVe embeddings and with GloVe & Flair embeddings, each on the full and on the small dataset. --> learning: more data and more embeddings help
goal: ablation study - try as many embeddings as possible --> next steps: fix Flair so that training on the whole dataset is possible, then start many jobs (stacking embeddings as in the sketch below); check out possible baseline implementations.
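A minimal sketch of such a training run, stacking GloVe and Flair embeddings with the Flair library; the data folder, model path, and hyperparameters are placeholders, not our actual setup:

```python
from flair.datasets import ClassificationCorpus
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# expects train.txt / dev.txt / test.txt in FastText format (__label__<x> <text>)
corpus = ClassificationCorpus('data/')

# stack GloVe word embeddings with forward/backward Flair embeddings
document_embeddings = DocumentRNNEmbeddings([
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

classifier = TextClassifier(document_embeddings,
                            label_dictionary=corpus.make_label_dictionary())
ModelTrainer(classifier, corpus).train('models/glove-flair', max_epochs=10)
```

Swapping the list passed to `DocumentRNNEmbeddings` is what makes the ablation study cheap: each job only changes that list.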
next steps: check out the Google implementation and whether it is feasible for us to implement. it is a mix of supervised and unsupervised learning. seems rather complicated and unintuitive, but maybe it is possible to achieve something and generalize better (see the consistency-training sketch below).
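The core of the UDA idea from the paper linked below is consistency training; a minimal sketch of the combined loss, assuming a generic PyTorch classifier `model` and placeholder batch tensors (this is not Google's implementation):

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_augmented, lam=1.0):
    # standard supervised cross-entropy on the labeled batch
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # predictions on the original unlabeled batch act as a fixed target
    with torch.no_grad():
        target = F.softmax(model(x_unlabeled), dim=-1)

    # push predictions on the augmented versions towards that target
    log_pred = F.log_softmax(model(x_augmented), dim=-1)
    consistency = F.kl_div(log_pred, target, reduction='batchmean')

    return sup_loss + lam * consistency
```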
done: pushed code to GitHub, maybe useful as a starting point for the pipeline; general BERT code, i.e. easy to extend with new models. started with the Flair tutorials. --> next steps: train Flair embeddings on our dataset (sketch below) - support Claudio
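Training our own Flair embeddings means training a character-level language model; this follows the Flair language-model tutorial, with the corpus path and hyperparameters as placeholders:

```python
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# default character dictionary shipped with Flair
dictionary = Dictionary.load('chars')

# corpus folder needs a train/ split plus valid.txt and test.txt
is_forward_lm = True
corpus = TextCorpus('data/corpus', dictionary, is_forward_lm, character_level=True)

language_model = LanguageModel(dictionary, is_forward_lm, hidden_size=1024, nlayers=1)
trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('models/twitter-forward', sequence_length=250,
              mini_batch_size=100, max_epochs=10)
```

A backward model (`is_forward_lm = False`) is trained the same way; the two can then be stacked like the pretrained `news-forward` / `news-backward` embeddings.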
currently on data augmentation. finding: until now we trained only on the augmented data and not on the whole dataset, thus we expect better results --> next steps: finish the augmentation experiments
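For reference, token-level augmentation can be as simple as the random-swap / random-deletion operations from EDA-style approaches; a self-contained sketch, not necessarily the exact operations used in our experiments:

```python
import random

def random_swap(tokens, n=1):
    """Swap two random token positions n times."""
    tokens = tokens[:]
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

tweet = "this phone is absolutely great".split()
augmented = [random_swap(tweet), random_deletion(tweet)]
```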
- new data augmentation method: our dataset was created with distant supervision --> use other datasets which were generated the same way --> we can train on more data (https://github.com/imoea/twitterSentimentClassifier)
- mix supervised and unsupervised learning (UDA with BERT) (https://arxiv.org/pdf/1904.12848.pdf)
- spelling correction - but most probably already handled in BERT
- focus also on fuzziness (https://journals.sagepub.com/doi/pdf/10.1177/0165551519828627)
- character level embeddings (https://link.springer.com/content/pdf/10.1007/s13278-019-0557-y.pdf)
- new preprocessing tool: SACPC (https://www.sciencedirect.com/science/article/pii/S0950705120300599)
- extend BERT with pooling layers (https://arxiv.org/pdf/2002.04815.pdf) - see the pooling sketch after this list
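As a starting point for the pooling idea, a sketch of mean pooling over BERT token embeddings instead of using only the [CLS] vector, using the Hugging Face transformers library; whether this matches the exact pooling layers of the linked paper is an assumption:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

batch = tokenizer(['this phone is great', 'worst purchase ever'],
                  padding=True, return_tensors='pt')

with torch.no_grad():
    hidden = model(**batch)[0]  # last hidden states: (batch, seq_len, 768)

# zero out padding positions, then average token vectors per sentence
mask = batch['attention_mask'].unsqueeze(-1).float()
sentence_embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

A classification head can then be trained on `sentence_embeddings`.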
- we need 2 baselines; here we can start figuring out which ones to use (GloVe, BERT, ...) - a simple candidate is sketched at the end of these notes
- start with the related-work section of the paper
- too early to start with the pipeline; first we need to fix the direction we want to go in
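One possible cheap baseline to compare against: averaged GloVe vectors plus logistic regression. The dataset here is a placeholder, and `glove-twitter-25` is just one of the pretrained vector sets available via gensim:

```python
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

glove = api.load('glove-twitter-25')  # word -> 25-dim vector

def embed(text):
    """Average the GloVe vectors of all in-vocabulary tokens."""
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(25)

# placeholder data; in practice load the distant-supervision tweet dataset
texts = ['this phone is great', 'worst purchase ever']
labels = [1, 0]

X = np.array([embed(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```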