This repo demonstrates simple data poisoning of a PyTorch sequence-to-sequence NLP model.
With simple data poisoning, random synthetic data is injected into the training dataset, which degrades the model's training accuracy. While the current setup is based on the Treebank dataset, it can instead be injected with a cyberbullying-related corpus, such as the one from: https://ieee-dataport.org/open-access/fine-grained-balanced-cyberbullying-dataset
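The injection step can be sketched as follows. This is a minimal illustration, not the repo's actual code: the function name, signature, and the token-ID representation of sentence pairs are all assumptions. It mixes randomly generated (source, target) pairs into a clean seq2seq training set, which is the kind of poisoning described above.

```python
import random

def poison_dataset(clean_pairs, vocab_size, poison_fraction=0.1, seed=0):
    """Inject random synthetic (source, target) token-ID pairs into a
    seq2seq training set. Illustrative sketch, not the repo's API."""
    rng = random.Random(seed)
    n_poison = int(len(clean_pairs) * poison_fraction)
    poisoned = []
    for _ in range(n_poison):
        # Synthetic pairs: random token IDs with no real src->tgt mapping,
        # so the model wastes capacity fitting noise.
        src = [rng.randrange(vocab_size) for _ in range(rng.randint(5, 15))]
        tgt = [rng.randrange(vocab_size) for _ in range(rng.randint(5, 15))]
        poisoned.append((src, tgt))
    mixed = clean_pairs + poisoned
    rng.shuffle(mixed)  # hide the poison among clean examples
    return mixed

clean = [([1, 2, 3], [4, 5, 6]) for _ in range(100)]
mixed = poison_dataset(clean, vocab_size=1000, poison_fraction=0.1)
print(len(mixed))  # 110 examples: 100 clean + 10 poisoned
```

The same idea applies whether the injected data is random noise (as here) or drawn from an unrelated corpus such as the cyberbullying dataset linked above.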