joamonys11/MachineLearning-Code_Clone_Detection

🔍 Code Clone Detection Using Machine Learning & Deep Learning

This project focuses on detecting code clones using machine learning and deep learning techniques. Code clone detection helps improve software quality by identifying duplicated or highly similar code fragments that may increase maintenance cost and bug propagation.


📌 Project Overview

Code cloning is a common practice in software development, but excessive duplication can negatively affect maintainability and scalability. This project applies data processing, feature extraction, and predictive models to detect code clones efficiently.

The implementation is provided as a Jupyter Notebook, covering the full pipeline from data preprocessing to model evaluation.
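As an illustration of what the preprocessing stage of such a pipeline might look like, here is a minimal sketch that strips comments and layout from a code fragment and optionally abstracts identifiers. It assumes the fragments are Python source, which this README does not state, and the function name `normalize_code` and its `abstract_identifiers` parameter are hypothetical, not taken from the notebook.

```python
import io
import keyword
import tokenize

# Token types that carry no information about program logic.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize_code(source: str, abstract_identifiers: bool = False) -> list[str]:
    """Return the token strings of a Python fragment without comments or layout.

    Hypothetical helper: assumes `source` is tokenizable Python; an incomplete
    fragment may raise tokenize.TokenError.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if (abstract_identifiers and tok.type == tokenize.NAME
                and not keyword.iskeyword(tok.string)):
            # Replace every non-keyword name with a placeholder so that
            # fragments differing only in naming produce the same sequence.
            tokens.append("ID")
        else:
            tokens.append(tok.string)
    return tokens


if __name__ == "__main__":
    print(normalize_code("x = 1  # set x\n"))
    print(normalize_code("total = a + b\n", abstract_identifiers=True))
```

With `abstract_identifiers=True`, two fragments that differ only in variable names normalize to the same token sequence, which is one common way to make renamed clones comparable.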


🧠 Techniques Used

  • Data preprocessing and cleaning
  • Feature extraction from source code
  • Machine Learning models for clone detection
  • Deep Learning-based approaches for improved accuracy
  • Model evaluation and performance comparison
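To make the feature extraction step more concrete, the sketch below turns a pair of code fragments into two numeric features that a classifier could consume: the Jaccard similarity of their token bigrams and the ratio of their token counts. This is an assumed, simplified feature set chosen for illustration; the notebook's actual features are not described here, and the names `tokenize_code`, `ngrams`, and `pair_features` are hypothetical.

```python
import re


def tokenize_code(source: str) -> list[str]:
    """Split source text into identifiers, integer literals, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def ngrams(tokens: list[str], n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pair_features(fragment_a: str, fragment_b: str) -> list[float]:
    """Compute [bigram Jaccard similarity, token length ratio] for a fragment pair.

    Assumed feature set for illustration only.
    """
    tokens_a, tokens_b = tokenize_code(fragment_a), tokenize_code(fragment_b)
    grams_a, grams_b = ngrams(tokens_a), ngrams(tokens_b)
    union = grams_a | grams_b
    jaccard = len(grams_a & grams_b) / len(union) if union else 1.0
    longest = max(len(tokens_a), len(tokens_b))
    length_ratio = min(len(tokens_a), len(tokens_b)) / longest if longest else 1.0
    return [jaccard, length_ratio]


if __name__ == "__main__":
    print(pair_features("a = b + 1", "a = b + 1"))
    print(pair_features("a = b + 1", "return x"))
```

Feature vectors of this shape could be stacked into a NumPy array and passed to a Scikit-learn classifier; which model and features work best would need to be checked against the notebook's data.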

🛠️ Technologies & Tools

  • Python
  • NumPy
  • Pandas
  • Scikit-learn
  • Deep Learning Libraries (TensorFlow / Keras / PyTorch, if applicable)
  • Jupyter Notebook

📂 Repository Structure

├── Code_Clone_Detection_Using_Machine_Learning_and_Deep_Learning+With Data Processing.ipynb
├── README.md

▶️ How to Run

  1. Clone the repository:
    git clone https://github.com/joamonys11/MachineLearning-Code_Clone_Detection.git
  2. Navigate to the project directory:
    cd MachineLearning-Code_Clone_Detection
  3. Open the notebook:
    jupyter notebook "Code_Clone_Detection_Using_Machine_Learning_and_Deep_Learning+With Data Processing.ipynb"
  4. Run all cells sequentially.

📈 Results

  • The trained models distinguish cloned from non-cloned code segments
  • In the notebook's evaluation, the deep learning approaches perform better than the traditional ML methods
  • The results demonstrate the effectiveness of automated clone detection in software engineering
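When comparing models on a clone / non-clone task, precision, recall, and F1 are commonly reported alongside accuracy. The sketch below shows how these could be computed from true and predicted labels. The function name `binary_metrics` and the labels in the usage example are hypothetical and are not results from this project.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute precision, recall, and F1 for binary labels (1 = clone, 0 = non-clone)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # clones correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-clones wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # clones that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels for illustration only, not project results.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Running the same function on each model's predictions over a shared test set gives a like-for-like comparison between the ML and deep learning approaches.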

🚀 Future Improvements

  • Support for multiple programming languages
  • Integration with static analysis tools
  • Model optimization and hyperparameter tuning
  • Deployment as a developer tool or IDE plugin

👤 Author

Sorasith Chormalee
Software Engineering & Machine Learning

📍 Potsdam, Germany / Remote
📧 chormaleesorasith@gmail.com
🔗 LinkedIn: https://www.linkedin.com/in/sorasith-chormalee/
