Succinct Data Structure Representation of _type #1

Open
edefazio opened this issue Nov 6, 2019 · 1 comment
edefazio commented Nov 6, 2019

This is a long-term goal: a tool that converts existing _types and _members into a succinct data structure, and converts a succinct data structure back into a _type or _member.

The succinct data structure represents, in both directions, _types (_class, _enum, _interface, _annotation)
AND their underlying _members (_initBlock, _method, _constructor, etc.). It:

  • is READ-ONLY
  • stores NO code formatting
  • can be iterated over (just like `_class.forMethods(m -> ...)`) but NOT MUTATED
  • can be walked into (like `Walk.listIn(_sc, Expression.class, e -> out.print(e))`)
  • can return "realized" objects (_class, _method, _field) at any nesting level

Internally, I imagine it will be similar to bytecode: bytes represent opcodes
and link to the names of things in a lookup table.
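
To make the opcode-plus-lookup-table idea concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the class name `SuccinctTypeSketch`, the three opcodes, and the fixed two-byte entry layout are hypothetical and are not part of jdraft or JavaParser.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a read-only opcode stream plus a name lookup table. */
final class SuccinctTypeSketch {
    // Hypothetical opcodes (not a real jdraft format).
    static final byte OP_CLASS = 1;
    static final byte OP_FIELD = 2;
    static final byte OP_METHOD = 3;

    private final byte[] code;    // pairs of (opcode, index into names)
    private final String[] names; // lookup table; each distinct name is stored once

    private SuccinctTypeSketch(byte[] code, String[] names) {
        this.code = code;
        this.names = names;
    }

    /** Mutable builder; the built structure itself exposes no mutators. */
    static final class Builder {
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();
        private final Map<String, Integer> index = new LinkedHashMap<>();

        Builder add(byte opcode, String name) {
            Integer id = index.get(name);
            if (id == null) {
                // This sketch uses a one-byte name index, so at most 256 names.
                if (index.size() >= 256) {
                    throw new IllegalStateException("sketch supports at most 256 names");
                }
                id = index.size();
                index.put(name, id);
            }
            out.write(opcode);
            out.write(id.intValue());
            return this;
        }

        SuccinctTypeSketch build() {
            return new SuccinctTypeSketch(out.toByteArray(),
                                          index.keySet().toArray(new String[0]));
        }
    }

    /** Sequentially walks the stream and collects the names tagged with the opcode. */
    List<String> namesOf(byte opcode) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < code.length; i += 2) {
            if (code[i] == opcode) {
                result.add(names[code[i + 1] & 0xFF]);
            }
        }
        return result;
    }

    int sizeInBytes() {
        return code.length;
    }

    public static void main(String[] args) {
        SuccinctTypeSketch t = new Builder()
                .add(OP_CLASS, "Point")
                .add(OP_FIELD, "x")
                .add(OP_METHOD, "getX")
                .build();
        System.out.println(t.namesOf(OP_METHOD));
    }
}
```

A real format would also need nesting, types, and bodies; the point here is only that a name repeated across members costs one byte per use after its first appearance in the table.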

The purpose of this is to make looking through code more memory efficient
(i.e. I should be able to take TONS of code, like the source code of Linux, and
query it easily).

Looking through ALL code in a project should be fast & memory efficient.
We'll probably have MULTIPLE INDEXES outside of these types that provide information about the class internals to speed up queries (i.e. feature hashing and/or Bloom filters), and internally
we'll be able to load and sequentially walk the data structure, performing analysis and transformations.
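
As an illustration of the kind of external index mentioned above, here is a small Bloom filter sketch over member names, written in plain Java. The class name `NameBloomFilter`, the double-hashing scheme, and the choice of FNV-1a as the second hash are assumptions made for this example; none of it is existing jdraft code.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical sketch: a Bloom filter answering "might this type declare name X?". */
final class NameBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of probe positions per name

    NameBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String name) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(indexFor(name, i));
        }
    }

    /** false means definitely absent; true means possibly present. */
    boolean mightContain(String name) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(indexFor(name, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th probe is h1 + i * h2, reduced into [0, size).
    private int indexFor(String name, int i) {
        int h1 = name.hashCode();
        int h2 = fnv1a(name);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the name.
    private static int fnv1a(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    public static void main(String[] args) {
        NameBloomFilter filter = new NameBloomFilter(1024, 3);
        filter.add("getX");
        System.out.println(filter.mightContain("getX"));
    }
}
```

A query such as "find all methods named `getX`" could consult one such filter per type and only load and walk the types whose filter answers true, accepting an occasional false positive.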

More info on succinct data structures and the indexing techniques mentioned above:
  • Succinct Data Structure
  • Feature Hashing
  • Bloom Filter

@edefazio (Collaborator, Author)

Generally speaking, I should be able to achieve this by just using the existing infrastructure (JavaParser/jdraft) to walk the code and create a serialized form.

Also, I should consider "fully qualifying everything without imports", i.e. directly scoping all static method calls, `new` expressions, and static field accesses, to reduce ambiguity and make the code more easily usable. That way we don't need to use the Java Symbol Solver; we just store the relationship directly in the AST via scope.

IF we have the classes available, it'd be nice to use something like ClassGraph to build the call graph, so we wouldn't have to resolve the symbols manually or use the JavaSymbolSolver:

https://github.com/classgraph/classgraph/wiki

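i.e. before (assuming `import java.net.URL;` and `import static java.lang.System.out;`):

String s = "Hello";
URL url = new URL("https://example.com");
out.println("hey");

after:

java.lang.String s = "Hello";
java.net.URL url = new java.net.URL("https://example.com");
System.out.println("hey");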

Here are some more (related) ideas about storage/querying/indexing (Finite State Automata/Bitap):
https://pvk.ca/Blog/2013/06/23/bitsets-match-regular-expressions/
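
For reference, the simplest form of Bitap (the shift-and variant for exact matching) can be sketched in a few lines of Java. This is a generic textbook version with a hypothetical class name, limited to patterns of at most 64 characters so the state fits in one `long`; it is not taken from the linked post, which generalizes the bitset idea to regular expressions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: Bitap (shift-and) exact substring search. */
final class BitapSketch {
    /** Returns the index of the first occurrence of pattern in text, or -1. */
    static int indexOf(String text, String pattern) {
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        if (m > 64) {
            throw new IllegalArgumentException("pattern longer than 64 chars");
        }
        // For each character, bit j is set when pattern.charAt(j) equals it.
        Map<Character, Long> masks = new HashMap<>();
        for (int j = 0; j < m; j++) {
            masks.merge(pattern.charAt(j), 1L << j, (a, b) -> a | b);
        }
        long accept = 1L << (m - 1);
        // Bit j of state is set when pattern[0..j] matches the text ending here.
        long state = 0L;
        for (int i = 0; i < text.length(); i++) {
            long mask = masks.getOrDefault(text.charAt(i), 0L);
            state = ((state << 1) | 1L) & mask;
            if ((state & accept) != 0) {
                return i - m + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("System.out.println", "out"));
    }
}
```

Each text character costs one shift, one OR, and one AND regardless of pattern length, which is what makes this family of matchers attractive for scanning a large serialized code base.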
