An attempt at SemEval 2018's shared task 8 (SecureNLP). This project addresses Subtasks 1 and 2 of the shared task:
- Classification of sentences as relevant or not relevant to extracting information about malware capabilities.
- Structure prediction of sentences (in BIO format) for Entities, Actions, and Modifiers containing information about malware capabilities.
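
To make the BIO format concrete, here is a minimal sketch of how labeled spans turn into one tag per token. The sentence, the span boundaries, and the helper name `spans_to_bio` are made up for illustration; they are not taken from the data set or from this project's code.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, label) token spans into BIO tags.

    Hypothetical helper for illustration only; not part of this project.
    """
    tags = ["O"] * len(tokens)  # "O" marks tokens outside any span
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the span
    return tags


# Invented example sentence, labeled by hand for illustration.
tokens = ["The", "trojan", "sends", "data", "to", "a", "remote", "server"]
spans = [
    (0, 2, "Entity"),    # "The trojan"
    (2, 3, "Action"),    # "sends"
    (3, 4, "Entity"),    # "data"
    (4, 5, "Modifier"),  # "to"
    (5, 8, "Entity"),    # "a remote server"
]

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

Each token gets exactly one tag: `B-` opens a span, `I-` continues it, and `O` is used for everything else.
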
For more information, see the official page. The data set can also be obtained through that website, although you have to contact the administrators for access.
Results can be seen here. As of 9/5/2018, compared against the scores from the evaluation period, this project has the highest relaxed score for Subtask 2, 3rd place in Subtask 1, and 4th place for the Subtask 2 strict score.
To run this project:
- Obtain the data set
- Make sure all file locations for the data are accurate in config.py
- Run data_process.py
- Run the first task with sent_classification_sgd.py
- Run the second task with entity_recognition_crf.py
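
The steps above could be wrapped in a small runner like the sketch below. The script names and their order come from the list; everything else is an assumption. In particular, `missing_paths`, `run_pipeline`, and the `data_paths` argument are hypothetical, and how config.py actually stores its file locations is not shown here, so you would pass in the paths yourself.

```python
import subprocess
import sys
from pathlib import Path

# Script names and order are taken from the steps above.
PIPELINE = [
    "data_process.py",
    "sent_classification_sgd.py",
    "entity_recognition_crf.py",
]


def missing_paths(paths):
    """Return the entries in `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]


def run_pipeline(data_paths, scripts=PIPELINE):
    """Check the data locations, then run each script in order.

    `data_paths` is a hypothetical list of the locations you set in
    config.py; this sketch does not read config.py itself.
    """
    missing = missing_paths(data_paths)
    if missing:
        raise FileNotFoundError(
            f"Fix these locations in config.py first: {missing}"
        )
    for script in scripts:
        # check=True stops the run as soon as one script fails.
        subprocess.run([sys.executable, script], check=True)
```

Calling `run_pipeline` with the list of data locations from the project root would then run the preprocessing step followed by both tasks, stopping early if a data location is missing or a script exits with an error.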