Where is the file implementing the logic to detect duplication in MATLAB files? #3732

Answered by jsotuyod
Masfouf asked this question in Q&A

CPD works the same way for all languages:

  1. Based on a language-specific parser, we tokenize each file to be analyzed.
  2. We roll over each file's tokens and compute a hash on a rolling window of n tokens. (The original answer links to the code for this step.)
  3. We group together token chains that have the same hash. (The original answer links to the code for this step.)
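
The three steps above can be sketched in a few lines of Python. This is only an illustration of the rolling-hash idea, not PMD's actual implementation: the function name `find_duplicates`, the `files` input shape (file name mapped to a token list), the hash base and modulus, and the default window size are all assumptions made for the example.

```python
from collections import defaultdict


def find_duplicates(files, window=5):
    """Return groups of (file, start) positions whose `window`-token runs match.

    `files` maps a file name to its list of tokens (hypothetical input shape).
    """
    base, mod = 257, (1 << 61) - 1          # arbitrary choices for this sketch
    top = pow(base, window - 1, mod)        # weight of the token leaving the window
    buckets = defaultdict(list)             # hash -> [(file, start), ...]

    for name, tokens in files.items():
        if len(tokens) < window:
            continue
        codes = [hash(t) % mod for t in tokens]
        h = 0
        for c in codes[:window]:            # hash of the first window
            h = (h * base + c) % mod
        buckets[h].append((name, 0))
        for i in range(window, len(codes)): # slide the window one token at a time
            h = ((h - codes[i - window] * top) * base + codes[i]) % mod
            buckets[h].append((name, i - window + 1))

    groups = []
    for positions in buckets.values():
        # Equal hashes may still be collisions, so compare the real tokens.
        exact = defaultdict(list)
        for name, start in positions:
            exact[tuple(files[name][start:start + window])].append((name, start))
        groups.extend(g for g in exact.values() if len(g) > 1)
    return groups


files = {
    "a.m": ["x", "=", "1", ";", "y", "=", "2", ";"],
    "b.m": ["z", "=", "0", ";", "y", "=", "2", ";"],
}
for group in find_duplicates(files, window=4):
    print(group)
```

Updating the hash incrementally keeps the cost per position constant regardless of the window size, which is why a rolling hash suits this kind of scan.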

Your main concern is really how the code is tokenized, and how strictly the tokens preserve the original code; for instance, whether you are looking for exact copy-pastes (identical variable and function names) or not. Some languages (especially Java) support several flags to control this, but not all languages do so to the same extent.
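
To make the effect of tokenization strictness concrete, here is a rough sketch of a tokenizer with a switch for ignoring identifier names. It is not PMD's MATLAB tokenizer: the `tokenize` function, the `ignore_identifiers` parameter, the keyword set, and the naive `%` comment handling are all hypothetical and deliberately simplified.

```python
import re

# Small, incomplete keyword set used only for this illustration.
KEYWORDS = {"function", "end", "if", "else", "for", "while", "return"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def tokenize(source, ignore_identifiers=False):
    """Split source text into tokens, optionally anonymizing identifiers."""
    tokens = []
    for line in source.splitlines():
        # Naive comment stripping: does not handle '%' inside strings.
        code = line.split("%", 1)[0]
        for tok in TOKEN.findall(code):
            is_name = re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS
            if ignore_identifiers and is_name:
                tok = "<id>"                # every identifier looks the same
            tokens.append(tok)
    return tokens


strict_a = tokenize("a = b + 1; % sum")
strict_b = tokenize("x = y + 1;")
loose_a = tokenize("a = b + 1; % sum", ignore_identifiers=True)
loose_b = tokenize("x = y + 1;", ignore_identifiers=True)
print(strict_a == strict_b, loose_a == loose_b)
```

With strict tokens the two statements differ because the names differ; once identifiers are anonymized they produce the same token chain, so a renamed copy-paste would be reported as a duplicate.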

Matlab itself supports no additional flags; it only has the built-in support to ignore commen…

Answer selected by adangel