[TODO] [WIP] allow customizing AST: using clang's AST instead of treesitter's AST #28

Closed
timotheecour opened this issue Dec 3, 2018 · 3 comments


@timotheecour
Contributor

timotheecour commented Dec 3, 2018

I'm wondering how much work it would take to abstract away the source AST so that clang's AST can be used instead of tree-sitter's AST.

clang -Xclang -ast-dump -fno-color-diagnostics -fsyntax-only tests/include/test.h
TranslationUnitDecl 0x7f8d3882bce8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x7f8d3882c260 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7f8d3882bf80 '__int128'
|-TypedefDecl 0x7f8d3882c2d0 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7f8d3882bfa0 'unsigned __int128'
|-TypedefDecl 0x7f8d3882c5a8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7f8d3882c3b0 'struct __NSConstantString_tag'
|   `-Record 0x7f8d3882c328 '__NSConstantString_tag'
|-TypedefDecl 0x7f8d3882c640 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7f8d3882c600 'char *'
|   `-BuiltinType 0x7f8d3882bd80 'char'
...

motivation

Working around tree-sitter's shortcomings, especially its lack of semantic analysis, which makes C++ templates and similar constructs hard to handle.

note

It may be even easier to write a small binary using libclang that outputs the AST as JSON directly, instead of having to custom-parse clang's text dump in nimterop.
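To give a feel for what custom-parsing the text dump would involve, here is a minimal Python sketch (not nimterop code) that turns the indented `-ast-dump` output into nested dicts and prints them as JSON. It assumes each nesting level adds exactly two prefix characters, as in the dump above; `parse_clang_dump` and the `kind`/`detail`/`children` keys are names made up for this illustration. Newer clang releases can also emit JSON themselves via `-Xclang -ast-dump=json`; the sketch does not depend on that.

```python
import json
import re

# Tree-drawing prefix ("|-", "`-", "| ", "  ") followed by the node kind.
LINE_RE = re.compile(r"^(?P<prefix>[|` -]*)(?P<kind>\w+)\s*(?P<rest>.*)$")


def parse_clang_dump(text):
    """Turn clang's indented text dump into nested dicts (hypothetical helper)."""
    root = None
    stack = []  # stack[d] holds the most recent node seen at depth d
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # lines that do not start with a node kind, e.g. "..."
        # Assumption: every nesting level contributes two prefix characters.
        depth = len(m.group("prefix")) // 2
        node = {"kind": m.group("kind"), "detail": m.group("rest"), "children": []}
        del stack[depth:]  # drop nodes that are deeper than or equal to this one
        if stack:
            stack[-1]["children"].append(node)
        else:
            root = node
        stack.append(node)
    return root


# A few lines copied from the dump earlier in this thread.
SAMPLE = """\
TranslationUnitDecl 0x7f8d3882bce8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x7f8d3882c5a8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7f8d3882c3b0 'struct __NSConstantString_tag'
|   `-Record 0x7f8d3882c328 '__NSConstantString_tag'
|-TypedefDecl 0x7f8d3882c640 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7f8d3882c600 'char *'
|   `-BuiltinType 0x7f8d3882bd80 'char'
...
"""

tree = parse_clang_dump(SAMPLE)
print(json.dumps(tree, indent=2))
```

The `detail` string is left unparsed on purpose: splitting it into address, source range, name and type is where most of the custom-parsing effort would go, which is the argument for having libclang produce structured output instead.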

@genotrance
Collaborator

While it is certainly possible, I'm not sure the current implementation will translate directly, since the two ASTs are likely to be structured differently. The engine can be reused, but the AST grammar and node data structure will need to be handled separately for each parser.
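One way to picture that split is a parser-neutral node type plus one lookup table per front end. The Python sketch below is only an illustration of the idea, not how nimterop is built: `Kind`, `NeutralNode`, `normalize` and `declared_names` are hypothetical names, and the raw trees are hand-written tuples rather than real parser output. The node kind strings themselves are real tree-sitter and clang kinds.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Kind(Enum):
    FUNCTION = auto()
    CLASS = auto()
    TEMPLATE = auto()
    OTHER = auto()


# Per-front-end tables: raw node kind -> neutral kind (deliberately incomplete).
TREE_SITTER_KINDS = {
    "function_definition": Kind.FUNCTION,
    "class_specifier": Kind.CLASS,
    "template_declaration": Kind.TEMPLATE,
}
CLANG_KINDS = {
    "FunctionDecl": Kind.FUNCTION,
    "CXXMethodDecl": Kind.FUNCTION,
    "CXXRecordDecl": Kind.CLASS,
    "ClassTemplateDecl": Kind.TEMPLATE,
    "FunctionTemplateDecl": Kind.TEMPLATE,
}


@dataclass
class NeutralNode:
    kind: Kind
    name: str = ""
    children: list = field(default_factory=list)


def normalize(raw, table):
    """Convert a raw (kind, name, children) tuple into a NeutralNode tree."""
    raw_kind, name, children = raw
    return NeutralNode(
        table.get(raw_kind, Kind.OTHER),
        name,
        [normalize(child, table) for child in children],
    )


def declared_names(node, wanted):
    """Shared 'engine' step: collect names of all nodes of the wanted kind."""
    found = [node.name] if node.kind is wanted and node.name else []
    for child in node.children:
        found.extend(declared_names(child, wanted))
    return found


# Hand-written stand-ins for what each front end might hand over.
ts_raw = ("template_declaration", "", [
    ("class_specifier", "Item", [("function_definition", "SetData", [])]),
])
clang_raw = ("ClassTemplateDecl", "Item", [
    ("CXXRecordDecl", "Item", [("CXXMethodDecl", "SetData", [])]),
])

ts_names = declared_names(normalize(ts_raw, TREE_SITTER_KINDS), Kind.FUNCTION)
clang_names = declared_names(normalize(clang_raw, CLANG_KINDS), Kind.FUNCTION)
print(ts_names, clang_names)
```

Only the tables and the code that builds the raw tuples differ per parser; `declared_names` never sees which front end produced the tree.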

I'm also not sure this is really warranted, especially since we are only focusing on wrapping and not full-fledged translation. That being said, I'm not a C++ expert, so if you could give me specific examples of where the tree-sitter AST is not good enough, that would be helpful.

Take this simple subset of code I got from here:

template<class T>
class Item
{
    T Data;
public:
    Item() : Data( T() )
    {}

    void SetData(T nValue)
    {
        Data = nValue;
    }

    T GetData() const
    {
        return Data;
    }

    void PrintData()
    {
        cout << Data;
    }
};

template<class T>
void PrintNumbers(T array[], int array_size, T filter = T());

The tree-sitter AST output is as follows:

(translation_unit 1 1 366
 (template_declaration 1 1 282
  (template_parameter_list 1 9 9
   (type_parameter_declaration 1 10 7
    (type_identifier 1 16 1)
   )
  )
  (class_specifier 2 1 262
   (type_identifier 2 7 4)
   (field_declaration_list 3 1 250
    (field_declaration 4 5 7
     (type_identifier 4 5 1)
     (field_identifier 4 7 4)
    )
    (access_specifier 5 1 7)
    (function_definition 6 5 28
     (function_declarator 6 5 6
      (identifier 6 5 4)
      (parameter_list 6 9 2)
     )
     (field_initializer_list 6 12 13
      (field_initializer 6 14 11
       (field_identifier 6 14 4)
       (argument_list 6 18 7
        (call_expression 6 20 3
         (identifier 6 20 1)
         (argument_list 6 21 2)
        )
       )
      )
     )
     (compound_statement 7 5 2)
    )
    (function_definition 9 5 60
     (primitive_type 9 5 4)
     (function_declarator 9 10 17
      (field_identifier 9 10 7)
      (parameter_list 9 17 10
       (parameter_declaration 9 18 8
        (type_identifier 9 18 1)
        (identifier 9 20 6)
       )
      )
     )
     (compound_statement 10 5 32
      (expression_statement 11 9 14
       (assignment_expression 11 9 13
        (identifier 11 9 4)
        (identifier 11 16 6)
       )
      )
     )
    )
    (function_definition 14 5 53
     (type_identifier 14 5 1)
     (function_declarator 14 7 15
      (field_identifier 14 7 7)
      (parameter_list 14 14 2)
      (type_qualifier 14 17 5)
     )
     (compound_statement 15 5 30
      (return_statement 16 9 12
       (identifier 16 16 4)
      )
     )
    )
    (function_definition 19 5 53
     (primitive_type 19 5 4)
     (function_declarator 19 10 11
      (field_identifier 19 10 9)
      (parameter_list 19 19 2)
     )
     (compound_statement 20 5 31
      (expression_statement 21 9 13
       (shift_expression 21 9 12
        (identifier 21 9 4)
        (identifier 21 17 4)
       )
      )
     )
    )
   )
  )
 )
 (template_declaration 25 1 80
  (template_parameter_list 25 9 9
   (type_parameter_declaration 25 10 7
    (type_identifier 25 16 1)
   )
  )
  (declaration 26 1 61
   (primitive_type 26 1 4)
   (function_declarator 26 6 55
    (identifier 26 6 12)
    (parameter_list 26 18 43
     (parameter_declaration 26 19 9
      (type_identifier 26 19 1)
      (array_declarator 26 21 7
       (identifier 26 21 5)
      )
     )
     (parameter_declaration 26 30 14
      (primitive_type 26 30 3)
      (identifier 26 34 10)
     )
     (optional_parameter_declaration 26 46 14
      (type_identifier 26 46 1)
      (identifier 26 48 6)
      (call_expression 26 57 3
       (identifier 26 57 1)
       (argument_list 26 58 2)
      )
     )
    )
   )
  )
 )
)

As you can see, there's enough information to figure out what's going on, and while it is far from trivial, we can certainly generate a Nim wrapper once the equivalent Nimisms are understood.
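As a rough illustration of how that information can be recovered, the Python sketch below parses the S-expression dump and slices identifier text back out of the source. It assumes the three numbers on each node are a 1-based line, a 1-based column and a length in characters, which is inferred from the output above rather than taken from documentation, and that a node sits on a single line. `Node`, `parse_sexpr`, `node_text` and `function_names` are names invented for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    line: int    # assumed 1-based
    col: int     # assumed 1-based
    length: int  # assumed length in characters
    children: list = field(default_factory=list)


def parse_sexpr(text):
    """Parse '(kind line col length child...)' into a Node tree."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "("
        kind = tokens[pos + 1]
        line, col, length = (int(t) for t in tokens[pos + 2:pos + 5])
        pos += 5
        node = Node(kind, line, col, length)
        while tokens[pos] == "(":
            node.children.append(parse_node())
        assert tokens[pos] == ")"
        pos += 1
        return node

    return parse_node()


def node_text(node, source_lines):
    """Slice a single-line node's text; source_lines maps line number -> text."""
    start = node.col - 1
    return source_lines[node.line][start:start + node.length]


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def function_names(root, source_lines):
    """Name of each function_declarator, taken from its first child."""
    return [
        node_text(n.children[0], source_lines)
        for n in walk(root)
        if n.kind == "function_declarator"
    ]


# Line 26 of the C++ snippet and its declaration subtree, copied from above.
SOURCE = {26: "void PrintNumbers(T array[], int array_size, T filter = T());"}
SEXPR = """
(declaration 26 1 61
 (primitive_type 26 1 4)
 (function_declarator 26 6 55
  (identifier 26 6 12)
  (parameter_list 26 18 43
   (parameter_declaration 26 19 9
    (type_identifier 26 19 1)
    (array_declarator 26 21 7
     (identifier 26 21 5)
    )
   )
   (parameter_declaration 26 30 14
    (primitive_type 26 30 3)
    (identifier 26 34 10)
   )
  )
 )
)
"""

root = parse_sexpr(SEXPR)
names = function_names(root, SOURCE)
print(names)
```

What this does not give you is anything semantic: nothing in the tree says that `T` in the parameter list refers to the template parameter, so that link has to be rebuilt by the wrapper generator.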

Let me know what you think.

@genotrance
Collaborator

I'm closing this for now. The engine can definitely be leveraged for a different AST structure so it isn't unimaginable.

@timotheecour changed the title from "[WIP] allow customizing AST: using clang's AST instead of treesitter's AST" to "[TODO] [WIP] allow customizing AST: using clang's AST instead of treesitter's AST" on Dec 29, 2018
@timotheecour
Contributor Author

@genotrance see also Zig, as mentioned in nim-lang/Nim#13757 (comment).

There could be something worth investigating there.
