This project focuses on detecting code clones using machine learning and deep learning techniques. Code clone detection helps improve software quality by identifying duplicated or highly similar code fragments that may increase maintenance cost and bug propagation.
Code cloning is common in software development, but excessive duplication hurts maintainability and scalability. This project applies data processing, feature extraction, and predictive models to detect code clones.
The implementation is provided as a Jupyter Notebook, covering the full pipeline from data preprocessing to model evaluation.
- Data preprocessing and cleaning
- Feature extraction from source code
- Machine Learning models for clone detection
- Deep Learning-based approaches for improved accuracy
- Model evaluation and performance comparison
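To make the preprocessing and feature-extraction steps above more concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not the notebook's actual code: the function names (`normalize_tokens`, `jaccard_similarity`, `is_clone`), the choice of token bigrams, and the `0.8` threshold are all hypothetical. It normalizes two Python snippets (dropping comments and abstracting identifiers and literals), then computes one similarity feature that a clone classifier could consume.

```python
import io
import keyword
import tokenize

# Token types that carry no structural information for this sketch.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize_tokens(source: str) -> list:
    """Preprocess Python source: drop comments/layout, abstract names and literals."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # identifiers become one placeholder
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")   # numeric literals become one placeholder
        elif tok.type == tokenize.STRING:
            tokens.append("STR")   # string literals become one placeholder
        else:
            tokens.append(tok.string)  # keywords and operators are kept as-is
    return tokens


def ngrams(tokens: list, n: int = 2) -> set:
    """Return the set of token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(code_a: str, code_b: str, n: int = 2) -> float:
    """Feature: Jaccard overlap of normalized token n-grams, in [0, 1]."""
    grams_a = ngrams(normalize_tokens(code_a), n)
    grams_b = ngrams(normalize_tokens(code_b), n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def is_clone(code_a: str, code_b: str, threshold: float = 0.8) -> bool:
    """Toy rule-based baseline; the threshold is an assumed value, not a tuned one."""
    return jaccard_similarity(code_a, code_b) >= threshold


original = "def add(a, b):\n    # sum two numbers\n    return a + b\n"
renamed = "def plus(x, y):\n    return x + y\n"
different = "def show(items):\n    for item in items:\n        print(item)\n"

print(jaccard_similarity(original, renamed))
print(jaccard_similarity(original, different))
print(is_clone(original, renamed), is_clone(original, different))
```

In a learned pipeline, a feature like this would typically be one column among several passed to a classifier rather than compared against a fixed threshold; the threshold rule here only stands in for that model.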
- Python
- NumPy
- Pandas
- Scikit-learn
- Deep Learning Libraries (TensorFlow / Keras / PyTorch, if applicable)
- Jupyter Notebook
├── Code_Clone_Detection_Using_Machine_Learning_and_Deep_Learning+With Data Processing.ipynb
└── README.md
- Clone the repository:
git clone https://github.com/your-username/code-clone-detection-ml-dl.git
- Navigate to the project directory:
cd code-clone-detection-ml-dl
- Open the notebook:
jupyter notebook "Code_Clone_Detection_Using_Machine_Learning_and_Deep_Learning+With Data Processing.ipynb"
- Run all cells sequentially.
- The models successfully identify cloned and non-cloned code segments
- Deep learning approaches show improved performance over traditional ML methods
- The results demonstrate the effectiveness of automated clone detection in software engineering
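A comparison like the one above is usually reported with precision, recall, and F1 on the clone class. The following sketch shows how those metrics can be computed for two sets of predictions. The labels and predictions are made-up toy values for illustration only, not results from this project, and `precision_recall_f1` is a hypothetical helper name.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive (clone = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy ground truth: 1 = clone pair, 0 = non-clone pair (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
# Toy predictions from two hypothetical models, A and B.
predictions_a = [1, 0, 1, 0, 1, 0, 1, 0]
predictions_b = [1, 1, 1, 0, 1, 0, 1, 0]

for name, preds in [("model A", predictions_a), ("model B", predictions_b)]:
    precision, recall, f1 = precision_recall_f1(y_true, preds)
    print(name, round(precision, 3), round(recall, 3), round(f1, 3))
```

Scikit-learn, listed in the stack above, provides equivalent metrics in `sklearn.metrics`; the hand-written version is used here only to keep the sketch dependency-free.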
- Support for multiple programming languages
- Integration with static analysis tools
- Model optimization and hyperparameter tuning
- Deployment as a developer tool or IDE plugin
Sorasith Chormalee
Software Engineering & Machine Learning
📍 Potsdam, Germany / Remote
📧 chormaleesorasith@gmail.com
🔗 LinkedIn: https://www.linkedin.com/in/sorasith-chormalee/