Program for anomaly detection in Kotlin source code.
An anomaly is a Kotlin source file that, according to machine learning algorithms, deviates from the norm.
The program consists of four top-level parts:
- parsing Kotlin source code: running the Kotlin compiler (kotlin-academic, which, instead of compiling to JVM bytecode, outputs the result of parsing: a concrete syntax tree (CST), referred to in the IntelliJ Platform as PSI);
- CST extraction and factorization: transforming CSTs into vectors according to a specified feature configuration;
- autoencoding: running an autoencoder neural network on the dataset obtained from the previous stage (the set of vectors);
- anomaly selection based on the decoding losses obtained from the autoencoder (DBSCAN or 3/5-sigma deviation).
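The sigma-based selection in the last step can be illustrated with a minimal sketch. It assumes the decoding losses are available as a plain list of floats, one per file; the function name `select_anomalies` and its parameters are hypothetical and do not come from the project's code, which may compute the threshold differently.

```python
import statistics


def select_anomalies(losses, n_sigma=3.0):
    """Return indices of losses that deviate from the mean by more than
    n_sigma standard deviations (hypothetical helper, not the project's API)."""
    mean = statistics.fmean(losses)
    std = statistics.pstdev(losses)  # population standard deviation
    if std == 0:
        # All losses are identical, so nothing deviates from the norm.
        return []
    return [i for i, loss in enumerate(losses) if abs(loss - mean) > n_sigma * std]
```

With `n_sigma=3.0` the rule flags more files than with `n_sigma=5.0`, so the stricter threshold suits cases where only the most extreme files are of interest.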
These parts correspond to three tools included in code-anomaly-detection as submodules:
To run the program on a project, specify the project folder. The program analyzes only files with the
If the program finds anomalies, their paths are written to the specified output file.
Before running the program, initialize the git submodules:
git submodule update --init --recursive --remote
--input_folder: path to your Kotlin project (or to any folder containing Kotlin source files);
--output_file: file to which the paths of anomalous files will be written, if any are found.
Example of use
python3 main.py -i ~/IdeaProjects/kotlin-native -o ./anomalies.json
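For readers who want to wrap or script the tool, the two arguments above could be parsed roughly as follows. This is only a sketch: the long option names and the short forms `-i` and `-o` are taken from this README, while the function name `parse_args` and the help texts are assumptions, and the real `main.py` may be structured differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two documented options (hypothetical sketch of the CLI)."""
    parser = argparse.ArgumentParser(
        description="Anomaly detection in Kotlin source code"
    )
    parser.add_argument(
        "-i", "--input_folder", required=True,
        help="path to a folder containing Kotlin source files",
    )
    parser.add_argument(
        "-o", "--output_file", required=True,
        help="file to which paths of anomalous files are written",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```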
While running, the program creates a temporary folder
data with intermediate analysis results in its own folder. Please do not remove it: the program deletes it automatically when it finishes.
Copying anomalous files to a specified directory
You can use the
anomalistic_files_extractor.sh script to copy anomalous files (using the paths listed in
anomalies.json) to a specified directory.
--anomalies_file: path to the file with the anomalies found (generated by code-anomaly-detection);
--anomalies_folder: path to the folder into which the anomalous files will be copied;
--code_folder: path to the project that you specified for code-anomaly-detection.
Example of use
chmod +x ./anomalistic_files_extractor.sh
./anomalistic_files_extractor.sh -i ./anomalies.json -o ./anomalies -c ~/IdeaProjects/kotlin-native
The specified folder is created automatically if it does not exist.
File names will contain the anomaly factor and an index number.
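The copying behaviour described above can be sketched in Python. Several details here are assumptions rather than facts about the script: the anomalies are passed in as `(relative_path, factor)` pairs because the layout of anomalies.json is not described in this README, the paths are assumed to be relative to the code folder, and the `index_factor_name` naming scheme is only one way to include the anomaly factor and index number in a file name.

```python
import shutil
from pathlib import Path


def copy_anomalies(anomalies, code_folder, anomalies_folder):
    """Copy anomalous files into anomalies_folder and return the new paths.

    `anomalies` is an iterable of (relative_path, factor) pairs. This shape
    is an assumption, not the documented anomalies.json format.
    """
    src_root = Path(code_folder)
    dst_root = Path(anomalies_folder)
    # Create the target folder if it does not exist yet.
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for index, (rel_path, factor) in enumerate(anomalies, start=1):
        src = src_root / rel_path
        # Hypothetical naming scheme: index, anomaly factor, original name.
        dst = dst_root / f"{index}_{factor:.3f}_{src.name}"
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Prefixing the name with the index keeps copies distinct even when two anomalous files in different packages share the same file name.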
Kotlin CST factorization config
To factorize the Kotlin CST, the program uses the configuration file
features_config.json, which lists the features to extract.
You can change the feature extraction configuration. For details, see the ast2vec readme.
Configure algorithm parameters
To configure the parameters of the algorithms you can use the tools used in the
See submodules section.