Represent source code in Java for analysis #2

Closed
mkshiblu opened this issue May 14, 2020 · 2 comments

@mkshiblu
Owner

mkshiblu commented May 14, 2020

To perform refactoring detection, it is necessary to represent the source code in a preferred format.

  • One way to do it is to represent the source code using the composite pattern, where each node represents a code element.

  • The technical complexity lies in parsing the JavaScript source files from Java with the help of the J2V8 library from EclipseSource. Though its documentation is a bit dated, it is actively maintained and more performant than other JS engines for Java. The bridge between the Java and JS code adds overhead, such as having to release native memory properly.
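To make the composite idea from the first bullet concrete, here is a minimal sketch in plain Java. All names (`CodeElement`, `Identifier`, `Container`) are hypothetical and not taken from the project; the point is only that leaves and containers share one interface, so analysis code can walk the tree uniformly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names throughout; an illustration, not the project's actual model.
final class CompositeSketch {

    /** Component: every code element, leaf or container, exposes the same interface. */
    interface CodeElement {
        String name();

        List<CodeElement> children();

        /** Counts this element plus everything nested inside it. */
        default int size() {
            int total = 1;
            for (CodeElement child : children()) {
                total += child.size();
            }
            return total;
        }
    }

    /** Leaf: an element with nothing nested inside it, e.g. an identifier. */
    static final class Identifier implements CodeElement {
        private final String name;

        Identifier(String name) { this.name = name; }

        @Override public String name() { return name; }

        @Override public List<CodeElement> children() { return Collections.emptyList(); }
    }

    /** Composite: an element that contains other elements, e.g. a file or a function. */
    static final class Container implements CodeElement {
        private final String name;
        private final List<CodeElement> children = new ArrayList<>();

        Container(String name) { this.name = name; }

        Container add(CodeElement child) {
            children.add(child);
            return this;
        }

        @Override public String name() { return name; }

        @Override public List<CodeElement> children() {
            return Collections.unmodifiableList(children);
        }
    }

    public static void main(String[] args) {
        Container file = new Container("app.js")
            .add(new Container("render").add(new Identifier("props")))
            .add(new Identifier("VERSION"));
        System.out.println(file.name() + " contains " + file.size() + " elements");
    }
}
```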

Alternatively, we could parse the source code with a JS script and store the resulting tokens in a database, then represent those tokens in Java. This would also allow us to cache the results, since commits are immutable by nature.
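A rough sketch of the caching idea, assuming the parse result can be keyed by commit id and file path. The in-memory map stands in for the database, and the class name, the `String` token representation, and the parser callback are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

// Hypothetical sketch: an in-memory map stands in for the database.
final class ParseCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final BiFunction<String, String, String> parser;
    private final AtomicInteger parseCount = new AtomicInteger();

    /** The parser callback (commit id, file path) -> tokens is an assumed stand-in for the JS side. */
    ParseCacheSketch(BiFunction<String, String, String> parser) {
        this.parser = parser;
    }

    /** Parses at most once per (commit, file); a commit's content never changes afterwards. */
    String tokensFor(String commitId, String filePath) {
        String key = commitId + ":" + filePath;
        return cache.computeIfAbsent(key, k -> {
            parseCount.incrementAndGet();
            return parser.apply(commitId, filePath);
        });
    }

    int parseCount() {
        return parseCount.get();
    }

    public static void main(String[] args) {
        ParseCacheSketch sketch =
            new ParseCacheSketch((commit, path) -> "tokens(" + path + "@" + commit + ")");
        sketch.tokensFor("abc123", "src/app.js");
        sketch.tokensFor("abc123", "src/app.js"); // second call is served from the cache
        System.out.println("parses performed: " + sketch.parseCount());
    }
}
```

Because the key includes the commit id, entries never need invalidation; only eviction for space would have to be considered.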

@mkshiblu mkshiblu self-assigned this May 14, 2020
@mkshiblu mkshiblu added this to the MILESTONE_MAY_25 milestone May 19, 2020
@mkshiblu
Owner Author

mkshiblu commented Jun 1, 2020

An interesting idea is to represent code elements such as functions, identifiers, etc. as nodes in a graph. The program structure would probably not be a composite in that case.

Possible Pros of graph approach:

  • Parsed code elements can be stored in a graph DB (such as Neo4j, or an in-memory one such as TinkerPop).
  • Good graph traversal support from the Google Guava graph library.
  • In theory it should be much faster than a trivial composite structure or a simple comparison, though this has not been measured yet.
  • Should allow scaling.

Cons:

  • Yet to evaluate how this will affect the diffing of the previous and current program versions to detect refactorings. In theory, a graph DB or graph structure should improve the retrieval of entries with many-to-many relationships.
  • Needs more R&D and effort to decide which graph DB to choose and to integrate the libraries.

@mkshiblu
Owner Author

For more flexibility and, in theory, better performance through customization, it has been decided not to use any database and instead to create our own graph-like structure.
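What such a home-grown graph-like structure could look like is sketched below, assuming string node ids and two made-up relation kinds (`DECLARES`, `CALLS`). None of these names come from the project; the sketch only illustrates labeled adjacency and a many-to-many lookup without any database.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a small in-memory code graph; not the project's actual structure.
final class CodeGraphSketch {
    enum Relation { DECLARES, CALLS }

    // node id -> relation -> ids of the nodes that relation points to
    private final Map<String, Map<Relation, Set<String>>> outgoing = new LinkedHashMap<>();

    void addNode(String id) {
        outgoing.computeIfAbsent(id, k -> new EnumMap<Relation, Set<String>>(Relation.class));
    }

    void addEdge(String from, Relation relation, String to) {
        addNode(from);
        addNode(to);
        outgoing.get(from).computeIfAbsent(relation, k -> new LinkedHashSet<>()).add(to);
    }

    /** Nodes reachable from {@code from} over one edge of the given relation. */
    Set<String> targets(String from, Relation relation) {
        Map<Relation, Set<String>> edges = outgoing.get(from);
        if (edges == null) {
            return Set.of();
        }
        return edges.getOrDefault(relation, Set.of());
    }

    /** Reverse lookup: nodes that point at {@code to} via the given relation (linear scan). */
    Set<String> sources(Relation relation, String to) {
        Set<String> result = new LinkedHashSet<>();
        for (String from : outgoing.keySet()) {
            if (targets(from, relation).contains(to)) {
                result.add(from);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CodeGraphSketch graph = new CodeGraphSketch();
        graph.addEdge("app.js", Relation.DECLARES, "render");
        graph.addEdge("app.js", Relation.DECLARES, "init");
        graph.addEdge("init", Relation.CALLS, "render");
        graph.addEdge("init", Relation.CALLS, "format");
        graph.addEdge("render", Relation.CALLS, "format");
        System.out.println("callers of format: " + graph.sources(Relation.CALLS, "format"));
    }
}
```

The reverse lookup here scans every node; if such queries turn out to be frequent, a second map of incoming edges would be the obvious refinement.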

mkshiblu added a commit that referenced this issue Jun 15, 2020
* Successfully added custom script to run AST traversal later.
The current Node.js version inside J2V8 is too old to run the AST traversal on the JS side, so the library needs to be built manually.

* Added a custom-built J2V8 engine (4.85) with Node.js 7.4.
The last official J2V8 version to support Windows was 4.6, with a Node.js version below 6; Windows support has since been dropped. Our Babel traverse plugin needs a higher Node.js version, without which we cannot traverse the AST of JS files.

* Successfully traverse function declarations and transfer them back to Java in a custom representation

* Added parsing of the fully qualified namespace from the scope in JS

* Added support for named function expressions and parameter names

* Map JS function declarations to Java objects

* Detect function renames using the simplest approach (same body, in the same namespace). The implementation is more of a proof of concept that must be optimized, but it works. It still needs to be tested on real projects.
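
The "same body, same namespace" rule from the last bullet could look roughly like the sketch below. It is an illustration under assumptions, not the committed implementation: `FunctionDecl` and its fields are hypothetical, and bodies are compared as plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "same body, same namespace" rename rule.
final class RenameDetectionSketch {

    static final class FunctionDecl {
        final String namespace;
        final String name;
        final String body;

        FunctionDecl(String namespace, String name, String body) {
            this.namespace = namespace;
            this.name = name;
            this.body = body;
        }

        String qualifiedName() { return namespace + "." + name; }

        /** Two functions are rename candidates when namespace and body text are equal. */
        List<String> matchKey() { return List.of(namespace, body); }
    }

    /** Returns old qualified name -> new qualified name for every detected rename. */
    static Map<String, String> detectRenames(List<FunctionDecl> before, List<FunctionDecl> after) {
        Set<String> namesBefore = new HashSet<>();
        for (FunctionDecl f : before) namesBefore.add(f.qualifiedName());
        Set<String> namesAfter = new HashSet<>();
        for (FunctionDecl f : after) namesAfter.add(f.qualifiedName());

        // Functions whose name disappeared in the new version, indexed by namespace + body.
        Map<List<String>, FunctionDecl> removed = new HashMap<>();
        for (FunctionDecl f : before) {
            if (!namesAfter.contains(f.qualifiedName())) {
                removed.putIfAbsent(f.matchKey(), f);
            }
        }

        Map<String, String> renames = new LinkedHashMap<>();
        for (FunctionDecl f : after) {
            if (namesBefore.contains(f.qualifiedName())) continue; // name unchanged, not a rename
            FunctionDecl old = removed.remove(f.matchKey());
            if (old != null) {
                renames.put(old.qualifiedName(), f.qualifiedName());
            }
        }
        return renames;
    }

    public static void main(String[] args) {
        List<FunctionDecl> before = List.of(
            new FunctionDecl("app", "foo", "return 1;"),
            new FunctionDecl("app", "keep", "return 2;"));
        List<FunctionDecl> after = List.of(
            new FunctionDecl("app", "bar", "return 1;"),
            new FunctionDecl("app", "keep", "return 2;"));
        System.out.println(detectRenames(before, after));
    }
}
```

Exact string comparison of bodies is the weakest part of this rule: any edit inside the body, even whitespace, hides the rename, which is one reason testing on real projects matters.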