Data is one of the most critical components when building any machine learning model. Because it is the first part of the process, bad data can have a cascading effect on everything that follows.
Data can have a range of issues, such as:
- Missing items
- Incorrect items
- Skewness
- Low volume
- Being outdated
- Malware
  - https://www.kaggle.com/c/malware-classification
  - A repository of LIVE malwares for your own joy and pleasure http://thezoo.morirt.com/
  - https://github.com/fabrimagic72/malware-samples
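
Several of the data issues listed above (missing items, skewness, low volume, being outdated) can be checked with a few lines of code before any modelling starts. The following is a minimal sketch using only the Python standard library. The function name `audit_rows`, the column names, and the sample rows are hypothetical and made up for illustration. It also assumes that "skewness" means class imbalance in the labels, which is only one possible reading. Incorrect items are not covered, since detecting them usually needs domain-specific rules.

```python
from collections import Counter
from datetime import date


def audit_rows(rows, label_key, date_key, today):
    """Return simple counts for a few common data problems (illustrative only)."""
    # Missing items: count None or empty-string values per column.
    missing = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None or value == "":
                missing[key] += 1

    # Skewness, read here as class imbalance: share of the most common label.
    labels = Counter(
        row[label_key] for row in rows if row.get(label_key) not in (None, "")
    )
    total_labels = sum(labels.values())
    majority_share = max(labels.values()) / total_labels if total_labels else 0.0

    # Being outdated: age in days of the newest record that has a date.
    dates = [row[date_key] for row in rows if row.get(date_key)]
    newest_age_days = (today - max(dates)).days if dates else None

    return {
        "n_rows": len(rows),  # a very small value signals low volume
        "missing": dict(missing),
        "majority_share": majority_share,
        "newest_age_days": newest_age_days,
    }


# Hypothetical sample rows, not taken from any of the datasets linked above.
rows = [
    {"size": 120, "label": "benign", "collected": date(2020, 1, 5)},
    {"size": None, "label": "benign", "collected": date(2020, 2, 1)},
    {"size": 300, "label": "benign", "collected": date(2020, 3, 1)},
    {"size": 95, "label": "malicious", "collected": None},
]
report = audit_rows(rows, "label", "collected", today=date(2021, 3, 1))
print(report)
```

What counts as "too few rows", "too skewed", or "too old" depends on the problem, so the sketch only reports the numbers and leaves the thresholds to the reader.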