Comment classification of C code

This project is a submission to the Information Retrieval in Software Engineering (IRSE) subtask of the Forum for Information Retrieval Evaluation (FIRE) 2022. It presents different text mining frameworks and analyzes their performance in classifying comments in C code as useful or not useful. The frameworks combine various classifiers and feature engineering schemes that follow the bag-of-words (BOW) model. Classical machine learning models (random forest, logistic regression and support vector machine) and transformer-based models (BERT, RoBERTa and ALBERT) have been explored.

Pre-requisites

NumPy, Scikit-Learn, NLTK, Torch, Transformers

To run the framework

Create a folder named saved_models in the main project path during the training phase to store the trained models, so that they can be reused without retraining.

In testing_irse.py, the argument model can be

'bert' for transformer models

'entropy' for Entropy based term weighting scheme

'tfidf' for TF-IDF based term weighting scheme

and the argument clf_opt can be

'lr' for Logistic Regression 

'rf' for Random Forest

'svm' for Support Vector Machine
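The options above can be sketched as a small scikit-learn pipeline. This is a minimal illustration and not the code in irse2022.py: the function names build_pipeline, save_model and load_model, the use of chi-squared feature selection for no_of_selected_features, the use of pickle for persistence and the toy comments are all assumptions. Only the 'tfidf' weighting is shown; the 'entropy' scheme has no built-in scikit-learn equivalent and is left out.

```python
import pickle
from pathlib import Path

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical mapping from the clf_opt values to scikit-learn estimators.
CLASSIFIERS = {
    "lr": lambda: LogisticRegression(max_iter=1000),
    "rf": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": lambda: SVC(kernel="linear"),
}


def build_pipeline(clf_opt="lr", no_of_selected_features=None):
    """Build a TF-IDF bag-of-words pipeline for the chosen classifier."""
    if clf_opt not in CLASSIFIERS:
        raise ValueError(f"unknown clf_opt: {clf_opt!r}")
    steps = [("tfidf", TfidfVectorizer())]
    if no_of_selected_features is not None:
        # Assumption: keep the top-k terms ranked by the chi-squared statistic.
        steps.append(("select", SelectKBest(chi2, k=no_of_selected_features)))
    steps.append(("clf", CLASSIFIERS[clf_opt]()))
    return Pipeline(steps)


def save_model(model, name, folder="saved_models"):
    """Store a trained model in the saved_models folder, creating it if needed."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    target = path / f"{name}.pkl"
    with target.open("wb") as fh:
        pickle.dump(model, fh)
    return target


def load_model(name, folder="saved_models"):
    """Reload a previously saved model so it can be reused without training."""
    with (Path(folder) / f"{name}.pkl").open("rb") as fh:
        return pickle.load(fh)


# Toy data, invented for illustration only.
comments = [
    "/* increment the counter */",
    "/* release the lock before returning to avoid deadlock */",
    "/* loop */",
    "/* buffer must be null terminated because strcpy is used below */",
    "/* set value */",
    "/* retry on timeout since the device may still be booting */",
]
labels = ["not useful", "useful", "not useful", "useful", "not useful", "useful"]

pipeline = build_pipeline("lr", no_of_selected_features=5)
pipeline.fit(comments, labels)

saved_path = save_model(pipeline, "tfidf_lr")
restored = load_model("tfidf_lr")
predictions = restored.predict(["/* free the buffer to avoid a leak */"])
```

Keeping the classifier choice in a dictionary means an unsupported clf_opt fails early with a clear error instead of silently falling back to a default.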

The desired number of terms can be selected with no_of_selected_features.

To run the BERT, RoBERTa or ALBERT models, set model_name in irse2022.py and model_source in testing_irse.py to the corresponding model identifier from Hugging Face.
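As a rough sketch of switching between transformer models, the snippet below maps short names to Hugging Face Hub checkpoints and loads them through the Transformers Auto classes. The checkpoint identifiers, the CHECKPOINTS table and the helper names resolve_checkpoint and load_transformer are assumptions chosen for illustration; the project may use different checkpoints or loading code.

```python
# Hypothetical mapping from short names to Hugging Face Hub checkpoints.
CHECKPOINTS = {
    "bert": "bert-base-uncased",
    "roberta": "roberta-base",
    "albert": "albert-base-v2",
}


def resolve_checkpoint(name):
    """Return the Hub identifier for a short model name."""
    try:
        return CHECKPOINTS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown transformer model: {name!r}") from None


def load_transformer(name, num_labels=2):
    """Load tokenizer and classification model (downloads weights on first use)."""
    # Imported lazily so the name lookup works without Transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = resolve_checkpoint(name)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model


# The resolved identifier is the kind of value model_name would be set to.
model_name = resolve_checkpoint("roberta")
```

Calling load_transformer("albert") would then return a tokenizer and a two-class model, assuming network access to the Hub or a local cache.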

SruthiSudheer/Comment-classification-of-C-code
