[PROPOSAL] MSR Paper Review: CCFinder & Cross-language clone detection by learning over abstract syntax trees #306

Open
m09 opened this issue Jul 24, 2019 · 6 comments

Comments

@m09
Contributor

m09 commented Jul 24, 2019

@bzz
Contributor

bzz commented Jul 24, 2019

Happy to help and take care of this one; the original suggestion comes from this meeting.

Here is the preliminary blog post plan. It is no longer a usual paper review, but rather an overview that touches on a number of relevant works (presented at MSR).

This field was pioneered by
Katsuro Inoue (MSR co-founder)
  "CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code" 2002
  https://www.semanticscholar.org/paper/CCFinder%3A-A-Multilinguistic-Token-Based-Code-Clone-Kamiya-Kusumoto/98e810ed098a651e0ba8cbb63d2d926d4eebdf9b
  http://www.ccfinder.net/ccfinderxos.html

Particularly important for ML on code now because of
Miltos Allamanis
  "The Adverse Effects of Code Duplication in Machine Learning Models of Code"
  https://arxiv.org/abs/1812.06469

Modern methods are cross-language and incorporate structural information from the AST
Daniel Perez
  "Cross-language clone detection by learning over abstract syntax trees"
  https://static.perez.sh/research/2019/cross-language-clone-detection/clone-detection-msr19.pdf

To scale it to large codebases, source{d} built Gemini
  (paper pending)

@m09 @vmarkovtsev @warenlg please let me know what you guys think about the structure.

@m09
Contributor Author

m09 commented Jul 24, 2019

The plan looks great to me. Let me know if you want a review at any point when you start working on this!

@vcoisne
Contributor

vcoisne commented Aug 19, 2019

@bzz Trying to get visibility into the overall content calendar. When do you think you'll be able to write this one?

@bzz
Contributor

bzz commented Sep 1, 2019

I'm sorry for the delay caused by vacations, @vcoisne.

Since one of the goals for this one is to have a brief blog post (not a long one), I'll try to post a draft by the end of next week.

@vcoisne
Contributor

vcoisne commented Sep 16, 2019

@bzz ping :)

@bzz
Contributor

bzz commented Sep 18, 2019

And of course I did not manage to find the time through the retreat week :/ Sorry about the misleading communication.

I will be on vacation and then AFK for a while and shall be able to get back to this first thing on Oct 14th.


3 participants