Meeting 2020 05 07
Status Update
4 jobs done: trained with only GloVe embeddings and with GloVe & Flair embeddings, each on the full and on the small dataset. --> learning: more data and more embeddings help
goal: ablation study - try as many embeddings as possible --> next steps: fix Flair so that training on the whole dataset is possible, then start many jobs (stacking embeddings as in the sketch below); check out possible baseline implementations.
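A minimal sketch of such a training run, stacking GloVe and Flair embeddings with the Flair library; the data folder, model path, and hyperparameters are placeholders, not our actual setup:

```python
from flair.datasets import ClassificationCorpus
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# expects train.txt / dev.txt / test.txt in FastText format (__label__<x> <text>)
corpus = ClassificationCorpus('data/')

# stack GloVe word embeddings with forward/backward Flair embeddings
document_embeddings = DocumentRNNEmbeddings([
    WordEmbeddings('glove'),
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

classifier = TextClassifier(document_embeddings,
                            label_dictionary=corpus.make_label_dictionary())
ModelTrainer(classifier, corpus).train('models/glove-flair', max_epochs=10)
```

Swapping the list passed to `DocumentRNNEmbeddings` is what makes the ablation study cheap: each job only changes that list.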
next steps: check out the Google implementation and whether it is feasible for us to implement. it is a mix of supervised and unsupervised learning. seems rather complicated and unintuitive, but maybe it is possible to achieve something and generalize better (see the consistency-training sketch below).
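The core of the UDA idea from the paper linked below is consistency training; a minimal sketch of the combined loss, assuming a generic PyTorch classifier `model` and placeholder batch tensors (this is not Google's implementation):

```python
import torch
import torch.nn.functional as F

def uda_loss(model, x_labeled, y_labeled, x_unlabeled, x_augmented, lam=1.0):
    # standard supervised cross-entropy on the labeled batch
    sup_loss = F.cross_entropy(model(x_labeled), y_labeled)

    # predictions on the original unlabeled batch act as a fixed target
    with torch.no_grad():
        target = F.softmax(model(x_unlabeled), dim=-1)

    # push predictions on the augmented versions towards that target
    log_pred = F.log_softmax(model(x_augmented), dim=-1)
    consistency = F.kl_div(log_pred, target, reduction='batchmean')

    return sup_loss + lam * consistency
```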
done: pushed code to GitHub, maybe useful as a starting point for the pipeline; general BERT code, i.e. easy to extend with new models. started with the Flair tutorials. --> next steps: train Flair embeddings on our dataset (sketch below) - support Claudio
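Training our own Flair embeddings means training a character-level language model; this follows the Flair language-model tutorial, with the corpus path and hyperparameters as placeholders:

```python
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# default character dictionary shipped with Flair
dictionary = Dictionary.load('chars')

# corpus folder needs a train/ split plus valid.txt and test.txt
is_forward_lm = True
corpus = TextCorpus('data/corpus', dictionary, is_forward_lm, character_level=True)

language_model = LanguageModel(dictionary, is_forward_lm, hidden_size=1024, nlayers=1)
trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('models/twitter-forward', sequence_length=250,
              mini_batch_size=100, max_epochs=10)
```

A backward model (`is_forward_lm = False`) is trained the same way; the two can then be stacked like the pretrained `news-forward` / `news-backward` embeddings.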
currently on data augmentation. finding: until now we trained only on the augmented data and not on the whole dataset, thus we expect better results --> next steps: finish the augmentation experiments
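For reference, token-level augmentation can be as simple as the random-swap / random-deletion operations from EDA-style approaches; a self-contained sketch, not necessarily the exact operations used in our experiments:

```python
import random

def random_swap(tokens, n=1):
    """Swap two random token positions n times."""
    tokens = tokens[:]
    for _ in range(n):
        if len(tokens) < 2:
            break
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]

tweet = "this phone is absolutely great".split()
augmented = [random_swap(tweet), random_deletion(tweet)]
```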
- new data augmentation method: our dataset was created with distant supervision --> use other datasets which were generated the same way --> we can train on more data (https://github.com/imoea/twitterSentimentClassifier)
- mix supervised and unsupervised learning (UDA with BERT) (https://arxiv.org/pdf/1904.12848.pdf)
- spelling correction - but most probably already handled in BERT
- focus also on fuzziness (https://journals.sagepub.com/doi/pdf/10.1177/0165551519828627)
- character level embeddings (https://link.springer.com/content/pdf/10.1007/s13278-019-0557-y.pdf)
- new preprocessing tool: SACPC (https://www.sciencedirect.com/science/article/pii/S0950705120300599)
- extend BERT with pooling layers (https://arxiv.org/pdf/2002.04815.pdf) - see the pooling sketch after this list
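As a starting point for the pooling idea, a sketch of mean pooling over BERT token embeddings instead of using only the [CLS] vector, using the Hugging Face transformers library; whether this matches the exact pooling layers of the linked paper is an assumption:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

batch = tokenizer(['this phone is great', 'worst purchase ever'],
                  padding=True, return_tensors='pt')

with torch.no_grad():
    hidden = model(**batch)[0]  # last hidden states: (batch, seq_len, 768)

# zero out padding positions, then average token vectors per sentence
mask = batch['attention_mask'].unsqueeze(-1).float()
sentence_embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

A classification head can then be trained on `sentence_embeddings`.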
- we need 2 baselines; here we can start figuring out which ones to use (GloVe, BERT, ...) - a simple candidate is sketched at the end of these notes
- start with the related-work section of the paper
- too early to start with the pipeline; first we need to fix the direction we want to go in
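One possible cheap baseline to compare against: averaged GloVe vectors plus logistic regression. The dataset here is a placeholder, and `glove-twitter-25` is just one of the pretrained vector sets available via gensim:

```python
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

glove = api.load('glove-twitter-25')  # word -> 25-dim vector

def embed(text):
    """Average the GloVe vectors of all in-vocabulary tokens."""
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(25)

# placeholder data; in practice load the distant-supervision tweet dataset
texts = ['this phone is great', 'worst purchase ever']
labels = [1, 0]

X = np.array([embed(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```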