Enhance AST dump #7

NalaGinrut · 2020-04-23T17:34:27Z

I'm working on an experimental branch to enhance -frust-dump-parse which is a little inspired by gccgo's AST dump.
For example:

 fn abc(x:u32, y:u32) -> u32 {
    return x+y;
 }
 
 fn main() {
     println!("Hello World!");
 }

The dumped result looks like this:

Crate: 
 inner attributes: none
 items: 
  
u32 abc(x : u32, y : u32)
 {
 BlockExpr: 
 outer attributes: none
 inner attributes: none
 statements: 
  ExprStmtWithoutBlock: return ArithmeticOrLogicalExpr: x + y
 final expression: none
 }

  
void main()
 {
 BlockExpr: 
 outer attributes: none
 inner attributes: none
 statements: 
  ExprStmtWithoutBlock: println!("Hello World!")
 final expression: none
 }

The branch is nala/enhance/ast-dump

Comments are welcome.

The text was updated successfully, but these errors were encountered:

philberty · 2020-04-24T09:34:15Z

I really like this output it makes it more in like with how -fdump-tree-gimple looks also

SimplyTheOther · 2020-04-24T13:55:42Z

There's two main comments I'd like to make here:

Some of the non-dump-related code uses the as_string() method to convert some things into string form, such as paths (e.g. converting something::something_else to "something::something_else"). As such, if you convert all object dumping processes to output the same form, either the code requesting a string form would have to be changed (i.e. new method), or you might have to add a new method like to_dump_string() or something.
The main issue that I experienced with creating a nice-looking dump is that indentation levels are pretty much impossible to create with the current system of recursive as_string() calls, as the called object has no conception of the indentation level of the parent, which prevents any indentation from being preserved if the called object adds a new line to the string. You'd probably have to pretty heavily modify the dumping mechanism to do this - the best solution that I can currently think of would be a separate "Dump Creator" object that holds state on the current indentation level and the string so far, and a theoretical dump_string() method could add new lines to that object, and then the final string could be extracted from it at the end. This would probably be a lot of work and may involve unforeseen issues, however.

There's also one minor nitpick with the current example you have shown - strictly speaking, BlockExpr includes the curly braces surrounding it (as that is how BlockExpr is defined).

NalaGinrut · 2020-04-24T17:36:01Z

@SimplyTheOther
I don't expect to make it perfect in a short time. Actually, I think we need a more readable output to help us to debug the very beginning syntax parsing, and this may help to attract more contributions. So we may improve it gradually. Of course, we could also rewrite the AST (iff it's not good enough) in the future for better dumping work. If the indentation is finally proved to be a problem, we may have to use an external indent tool to help us. After all, it's just for the very beginning debugging.

Thanks for the mention. I haven't read BlockExpr part. I only tried to improve the function and parameters for a prove of concept. I'll keep updating it.

NalaGinrut · 2020-04-24T19:20:49Z

BTW, if you're going to create a PR for indentation, please consider merging this branch first to avoid too many conflicts, then indent the whole code. I'm sorry it's still half baked, but I don't think it will break existing things at least.

However, here's a thing I have to mention. I've patched the string literal token to add double-quote to the value. I think this should be done when lexing the string literal. Otherwise, there are no double quotes around string when we print them, say println!(Hello World!)

NalaGinrut · 2020-04-26T12:49:13Z

I've updated for indentation, so far so good. However, the global variable has to be used since the current AST doesn't support the indentation well. But I think it's fine. We don't have to spend time optimizing AST dump. It's just for debugging support. Let's save time for type checking and IR transformation.

Crate: 
 inner attributes: none
 items: 
  
u32 abc(x : u32, y : u32)
BlockExpr:
{
 outer attributes: none
 inner attributes: none
 statements: 
 ExprStmtWithoutBlock: return ArithmeticOrLogicalExpr: x + y
 final expression: none
}

  
void main()
BlockExpr:
{
 outer attributes: none
 inner attributes: none
 statements: 
  ExprStmtWithBlock: 
   BlockExpr:
   {
    outer attributes: none
    inner attributes: none
    statements: 
    ExprStmtWithoutBlock: ArithmeticOrLogicalExpr: 1 + 1
    final expression: none
   }
 ExprStmtWithoutBlock: println!("Hello World!")
 final expression: none
}

SimplyTheOther · 2020-04-28T01:27:18Z

@NalaGinrut @philberty Kind of off-topic, but do you think this is a good template to show others what we’re working on at the moment (i.e. create an issue)? Or is there a better way (e.g. draft pull requests, projects maybe)?

BTW, if you're going to create a PR for indentation, please consider merging this branch first to avoid too many conflicts, then indent the whole code. I'm sorry it's still half baked, but I don't think it will break existing things at least.

However, here's a thing I have to mention. I've patched the string literal token to add double-quote to the value. I think this should be done when lexing the string literal. Otherwise, there are no double quotes around string when we print them, say println!(Hello World!)

Depending on what exactly you mean by that (i.e. when you add the double quotes - at the lexer stage or later), that might break some of the code comparing the values of string literals (e.g. cfg attribute handling). I think it would be better to store the value of the string value only and then add the double quotes in the as_string() method, if that’s not what you’ve done already (since adding characters is easier than removing them).

NalaGinrut · 2020-04-28T04:12:44Z

@philberty @SimplyTheOther

If you want to show AST dump to promote gccrs, I think it's at least an explicit and understandable work to common users. It shows the parser work for Rust syntax. Personally, I think it's OK.
The only concern is the time. I don't think it can be complete for every corner case of Rust syntax in a short time. However, it's not good to keep it unmerged for a long time since it should be helpful for our next step development. How about this, I will raise a PR after a few tweaks. Then we can promote gccrs by indicating AST dump, and we can fix the indentation issue after this PR soon.
Adding necessary double-quotes would be better, however, I didn't find the correct place in AST dump. I've tried some, but it doesn't work. I think it's fine. I can remove the lexer modification, and postpone this tiny fix.

SimplyTheOther · 2020-04-28T14:05:26Z

@philberty @SimplyTheOther

1. If you want to show AST dump to promote gccrs, I think it's at least an explicit and understandable work to common users. It shows the parser work for Rust syntax. Personally, I think it's OK.
   The only concern is the time. I don't think it can be complete for every corner case of Rust syntax in a short time. However, it's not good to keep it unmerged for a long time since it should be helpful for our next step development. How about this, I will raise a PR after a few tweaks. Then we can promote gccrs by indicating AST dump, and we can fix the indentation issue after this PR soon.

I see your reasoning. I'm just a little wary about the possibility of constructing a system that is difficult to change afterwards and having to do massive refactoring at some point in the future. But yeah, right now a predominantly working parser is probably more important.

2. Adding necessary double-quotes would be better, however, I didn't find the correct place in AST dump. I've tried some, but it doesn't work. I think it's fine. I can remove the lexer modification, and postpone this tiny fix.

You would add the double quotes in the as_string() method in Literal in rust-ast.h if the literal type is a string (so you'd probably do a switch, since characters should also have single quotes too then). I would then also add a raw_value() method (or some other name) that returns the raw value_as_string so that comparisons with raw value don't have to strip the quotes off.

NalaGinrut · 2020-04-28T17:17:02Z

@philberty @SimplyTheOther

Yes, although we can rethink many things at this stage, there're a lot of other works to do. I think we can polish in the future. If you were asking about the WIP template, I think the answer is yes. For larger work, it's better to raise an issue with WIP, and contributors may follow the related branch. I never used project, I don't know if it's proper for the case.
For the string literal part, I have removed my modification in the PR. I may need some time to figure it out. We don't have to fix it in the same PR.

NalaGinrut · 2020-04-28T17:24:09Z

Oh, I just tried projects, there's kanban, and we can drag related issues in. I think we can take advantage of it.

philberty · 2020-04-28T20:27:57Z

I think PR's should be for things you want to get merged into Master when you think they are ready, otherwise in my experience PR's have a tendency to be neglected by reviewers.

For distributing work i started to create a github kanban board i am not sure if you guys have access to create/update on there if not i will take a look at fix that for you guys. I think we should use itas much as we can if a piece of work is of bigger scope or are not sure about putting on the kanban board lets create an issue to discuss it.

What do you think? I am happy to change things but lets try this for now for a few weeks.

NalaGinrut · 2020-04-29T08:08:32Z

@philberty I think the Github kanban can be based on issues, say, you can just reference the issue number with a hash to reference, you don't even have to write any description. Then we can connect kanban and issues together.

However, if the new feature is working by just one person, the kanban could be ignored, the issue is enough. But if we want to attract more people to work together on the same sub-project, the kanban is still important.

So I think, at least in this stage, maybe kanban is not necessary, depends on you whether to use it. Personally, I'm glad to try. But issues should play an important role in any time.

As diagnosed with Jakub and Richard in the analysis of PR 102134, the current implementation of wi::clz has incorrect/inconsistent behaviour. As mentioned by Richard in comment #7, clz should (always) return zero for negative values, but the current implementation can only return 0 when precision is a multiple of HOST_BITS_PER_WIDE_INT. The fix is simply to reorder/shuffle the existing tests. 2021-09-06 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * wide-int.cc (wi::clz): Reorder tests to ensure the result is zero for all negative values.

…imize or target pragmas [PR103012] The following testcases ICE when an optimize or target pragma is followed by a long line (4096+ chars). This is because on such long lines we can't use columns anymore, but the cpp_define calls performed by c_cpp_builtins_optimize_pragma or from the backend hooks for target pragma are done on temporary buffers and expect to get columns from whatever line they appear on (which happens to be the long line after optimize/target pragma), and we run into: #0 fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986 #1 0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502 #2 0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827 #3 0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898 #4 0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592 #5 0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394 #6 0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601 #7 0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639 #8 0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589 #9 0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>) at ../../gcc/c-family/c-cppbuiltin.c:600 assertion that LC_RENAME doesn't happen first. I think the right fix is emit those predefined macros upon optimize/target pragmas with BUILTINS_LOCATION, like we already do for those macros at the start of the TU, they don't appear in columns of the next line after it. Another possibility would be to force them at the location of the pragma. 2021-12-30 Jakub Jelinek <jakub@redhat.com> PR c++/103012 gcc/ * config/i386/i386-c.c (ix86_pragma_target_parse): Perform cpp_define/cpp_undef calls with forced token locations BUILTINS_LOCATION. * config/arm/arm-c.c (arm_pragma_target_parse): Likewise. * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise. * config/s390/s390-c.c (s390_pragma_target_parse): Likewise. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform cpp_define_unused/cpp_undef calls with forced token locations BUILTINS_LOCATION. gcc/testsuite/ PR c++/103012 * g++.dg/cpp/pr103012.C: New test. * g++.target/i386/pr103012.C: New test.

In plenty of image and video processing code it's common to modify pixel values by a widening operation and then scale them back into range by dividing by 255. This patch adds an named function to allow us to emit an optimized sequence when doing an unsigned division that is equivalent to: x = y / (2 ^ (bitsize (y)/2)-1) For SVE2 this means we generate for: void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n) { for (int i = 0; i < (n & -16); i+=1) pixel[i] = (pixel[i] * level) / 0xff; } the following: mov z3.b, #1 .L3: ld1b z0.h, p0/z, [x0, x3] mul z0.h, p1/m, z0.h, z2.h addhnb z1.b, z0.h, z3.h addhnb z0.b, z0.h, z1.h st1b z0.h, p0, [x0, x3] inch x3 whilelo p0.h, w3, w2 b.any .L3 instead of: .L3: ld1b z0.h, p1/z, [x0, x3] mul z0.h, p0/m, z0.h, z1.h umulh z0.h, p0/m, z0.h, z2.h lsr z0.h, z0.h, #7 st1b z0.h, p1, [x0, x3] inch x3 whilelo p1.h, w3, w2 b.any .L3 Which results in significantly faster code. gcc/ChangeLog: * config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.

NalaGinrut mentioned this issue Apr 28, 2020

Enhance AST dump #8

Merged

NalaGinrut added the WIP label May 1, 2020

philberty mentioned this issue Jul 26, 2021

Can't call extern functions #421

Closed

philberty changed the title ~~[WIP] Enhance AST dump~~ Enhance AST dump Feb 22, 2022

philberty added good-first-pr and removed WIP labels Mar 28, 2022

philberty mentioned this issue May 19, 2022

Add AST dump visitor #1261

Closed

P-E-P mentioned this issue Jul 4, 2023

Ast collector fixes #2340

Merged

CohenArthur closed this as completed in #2340 Jul 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance AST dump #7

Enhance AST dump #7

NalaGinrut commented Apr 23, 2020

philberty commented Apr 24, 2020

SimplyTheOther commented Apr 24, 2020

NalaGinrut commented Apr 24, 2020

NalaGinrut commented Apr 24, 2020

NalaGinrut commented Apr 26, 2020

SimplyTheOther commented Apr 28, 2020

NalaGinrut commented Apr 28, 2020 •

edited

SimplyTheOther commented Apr 28, 2020

NalaGinrut commented Apr 28, 2020 •

edited

NalaGinrut commented Apr 28, 2020

philberty commented Apr 28, 2020

NalaGinrut commented Apr 29, 2020

Enhance AST dump #7

Enhance AST dump #7

Comments

NalaGinrut commented Apr 23, 2020

philberty commented Apr 24, 2020

SimplyTheOther commented Apr 24, 2020

NalaGinrut commented Apr 24, 2020

NalaGinrut commented Apr 24, 2020

NalaGinrut commented Apr 26, 2020

SimplyTheOther commented Apr 28, 2020

NalaGinrut commented Apr 28, 2020 • edited

SimplyTheOther commented Apr 28, 2020

NalaGinrut commented Apr 28, 2020 • edited

NalaGinrut commented Apr 28, 2020

philberty commented Apr 28, 2020

NalaGinrut commented Apr 29, 2020

NalaGinrut commented Apr 28, 2020 •

edited

NalaGinrut commented Apr 28, 2020 •

edited