Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance AST dump #7

Closed
NalaGinrut opened this issue Apr 23, 2020 · 12 comments · Fixed by #2340
Closed

Enhance AST dump #7

NalaGinrut opened this issue Apr 23, 2020 · 12 comments · Fixed by #2340

Comments

@NalaGinrut
Copy link
Contributor

I'm working on an experimental branch to enhance -frust-dump-parse which is a little inspired by gccgo's AST dump.
For example:

 fn abc(x:u32, y:u32) -> u32 {
    return x+y;
 }
 
 fn main() {
     println!("Hello World!");
 }

The dumped result looks like this:

Crate: 
 inner attributes: none
 items: 
  
u32 abc(x : u32, y : u32)
 {
 BlockExpr: 
 outer attributes: none
 inner attributes: none
 statements: 
  ExprStmtWithoutBlock: return ArithmeticOrLogicalExpr: x + y
 final expression: none
 }

  
void main()
 {
 BlockExpr: 
 outer attributes: none
 inner attributes: none
 statements: 
  ExprStmtWithoutBlock: println!("Hello World!")
 final expression: none
 }

The branch is nala/enhance/ast-dump

Comments are welcome.

@philberty
Copy link
Member

I really like this output it makes it more in like with how -fdump-tree-gimple looks also

@SimplyTheOther
Copy link
Member

There's two main comments I'd like to make here:

  • Some of the non-dump-related code uses the as_string() method to convert some things into string form, such as paths (e.g. converting something::something_else to "something::something_else"). As such, if you convert all object dumping processes to output the same form, either the code requesting a string form would have to be changed (i.e. new method), or you might have to add a new method like to_dump_string() or something.

  • The main issue that I experienced with creating a nice-looking dump is that indentation levels are pretty much impossible to create with the current system of recursive as_string() calls, as the called object has no conception of the indentation level of the parent, which prevents any indentation from being preserved if the called object adds a new line to the string. You'd probably have to pretty heavily modify the dumping mechanism to do this - the best solution that I can currently think of would be a separate "Dump Creator" object that holds state on the current indentation level and the string so far, and a theoretical dump_string() method could add new lines to that object, and then the final string could be extracted from it at the end. This would probably be a lot of work and may involve unforeseen issues, however.

There's also one minor nitpick with the current example you have shown - strictly speaking, BlockExpr includes the curly braces surrounding it (as that is how BlockExpr is defined).

@NalaGinrut
Copy link
Contributor Author

@SimplyTheOther
I don't expect to make it perfect in a short time. Actually, I think we need a more readable output to help us to debug the very beginning syntax parsing, and this may help to attract more contributions. So we may improve it gradually. Of course, we could also rewrite the AST (iff it's not good enough) in the future for better dumping work. If the indentation is finally proved to be a problem, we may have to use an external indent tool to help us. After all, it's just for the very beginning debugging.

Thanks for the mention. I haven't read BlockExpr part. I only tried to improve the function and parameters for a prove of concept. I'll keep updating it.

@NalaGinrut
Copy link
Contributor Author

BTW, if you're going to create a PR for indentation, please consider merging this branch first to avoid too many conflicts, then indent the whole code. I'm sorry it's still half baked, but I don't think it will break existing things at least.

However, here's a thing I have to mention. I've patched the string literal token to add double-quote to the value. I think this should be done when lexing the string literal. Otherwise, there are no double quotes around string when we print them, say println!(Hello World!)

@NalaGinrut
Copy link
Contributor Author

I've updated for indentation, so far so good. However, the global variable has to be used since the current AST doesn't support the indentation well. But I think it's fine. We don't have to spend time optimizing AST dump. It's just for debugging support. Let's save time for type checking and IR transformation.

Crate: 
 inner attributes: none
 items: 
  
u32 abc(x : u32, y : u32)
BlockExpr:
{
 outer attributes: none
 inner attributes: none
 statements: 
 ExprStmtWithoutBlock: return ArithmeticOrLogicalExpr: x + y
 final expression: none
}

  
void main()
BlockExpr:
{
 outer attributes: none
 inner attributes: none
 statements: 
  ExprStmtWithBlock: 
   BlockExpr:
   {
    outer attributes: none
    inner attributes: none
    statements: 
    ExprStmtWithoutBlock: ArithmeticOrLogicalExpr: 1 + 1
    final expression: none
   }
 ExprStmtWithoutBlock: println!("Hello World!")
 final expression: none
}

@SimplyTheOther
Copy link
Member

@NalaGinrut @philberty Kind of off-topic, but do you think this is a good template to show others what we’re working on at the moment (i.e. create an issue)? Or is there a better way (e.g. draft pull requests, projects maybe)?

BTW, if you're going to create a PR for indentation, please consider merging this branch first to avoid too many conflicts, then indent the whole code. I'm sorry it's still half baked, but I don't think it will break existing things at least.

However, here's a thing I have to mention. I've patched the string literal token to add double-quote to the value. I think this should be done when lexing the string literal. Otherwise, there are no double quotes around string when we print them, say println!(Hello World!)

Depending on what exactly you mean by that (i.e. when you add the double quotes - at the lexer stage or later), that might break some of the code comparing the values of string literals (e.g. cfg attribute handling). I think it would be better to store the value of the string value only and then add the double quotes in the as_string() method, if that’s not what you’ve done already (since adding characters is easier than removing them).

@NalaGinrut
Copy link
Contributor Author

NalaGinrut commented Apr 28, 2020

@philberty @SimplyTheOther

  1. If you want to show AST dump to promote gccrs, I think it's at least an explicit and understandable work to common users. It shows the parser work for Rust syntax. Personally, I think it's OK.
    The only concern is the time. I don't think it can be complete for every corner case of Rust syntax in a short time. However, it's not good to keep it unmerged for a long time since it should be helpful for our next step development. How about this, I will raise a PR after a few tweaks. Then we can promote gccrs by indicating AST dump, and we can fix the indentation issue after this PR soon.

  2. Adding necessary double-quotes would be better, however, I didn't find the correct place in AST dump. I've tried some, but it doesn't work. I think it's fine. I can remove the lexer modification, and postpone this tiny fix.

@SimplyTheOther
Copy link
Member

@philberty @SimplyTheOther

1. If you want to show AST dump to promote gccrs, I think it's at least an explicit and understandable work to common users. It shows the parser work for Rust syntax. Personally, I think it's OK.
   The only concern is the time. I don't think it can be complete for every corner case of Rust syntax in a short time. However, it's not good to keep it unmerged for a long time since it should be helpful for our next step development. How about this, I will raise a PR after a few tweaks. Then we can promote gccrs by indicating AST dump, and we can fix the indentation issue after this PR soon.

I see your reasoning. I'm just a little wary about the possibility of constructing a system that is difficult to change afterwards and having to do massive refactoring at some point in the future. But yeah, right now a predominantly working parser is probably more important.

2. Adding necessary double-quotes would be better, however, I didn't find the correct place in AST dump. I've tried some, but it doesn't work. I think it's fine. I can remove the lexer modification, and postpone this tiny fix.

You would add the double quotes in the as_string() method in Literal in rust-ast.h if the literal type is a string (so you'd probably do a switch, since characters should also have single quotes too then). I would then also add a raw_value() method (or some other name) that returns the raw value_as_string so that comparisons with raw value don't have to strip the quotes off.

@NalaGinrut
Copy link
Contributor Author

NalaGinrut commented Apr 28, 2020

@philberty @SimplyTheOther

  1. Yes, although we can rethink many things at this stage, there're a lot of other works to do. I think we can polish in the future. If you were asking about the WIP template, I think the answer is yes. For larger work, it's better to raise an issue with WIP, and contributors may follow the related branch. I never used project, I don't know if it's proper for the case.

  2. For the string literal part, I have removed my modification in the PR. I may need some time to figure it out. We don't have to fix it in the same PR.

@NalaGinrut
Copy link
Contributor Author

Oh, I just tried projects, there's kanban, and we can drag related issues in. I think we can take advantage of it.

@philberty
Copy link
Member

I think PR's should be for things you want to get merged into Master when you think they are ready, otherwise in my experience PR's have a tendency to be neglected by reviewers.

For distributing work i started to create a github kanban board i am not sure if you guys have access to create/update on there if not i will take a look at fix that for you guys. I think we should use itas much as we can if a piece of work is of bigger scope or are not sure about putting on the kanban board lets create an issue to discuss it.

What do you think? I am happy to change things but lets try this for now for a few weeks.

@NalaGinrut
Copy link
Contributor Author

@philberty I think the Github kanban can be based on issues, say, you can just reference the issue number with a hash to reference, you don't even have to write any description. Then we can connect kanban and issues together.

However, if the new feature is working by just one person, the kanban could be ignored, the issue is enough. But if we want to attract more people to work together on the same sub-project, the kanban is still important.

So I think, at least in this stage, maybe kanban is not necessary, depends on you whether to use it. Personally, I'm glad to try. But issues should play an important role in any time.

@NalaGinrut NalaGinrut added the WIP label May 1, 2020
bors bot pushed a commit that referenced this issue Sep 30, 2021
As diagnosed with Jakub and Richard in the analysis of PR 102134, the
current implementation of wi::clz has incorrect/inconsistent behaviour.
As mentioned by Richard in comment #7, clz should (always) return zero
for negative values, but the current implementation can only return 0
when precision is a multiple of HOST_BITS_PER_WIDE_INT.  The fix is
simply to reorder/shuffle the existing tests.

2021-09-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* wide-int.cc (wi::clz): Reorder tests to ensure the result
	is zero for all negative values.
bors bot pushed a commit that referenced this issue Jan 25, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 #5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 #6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 #7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 #8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 #9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 #10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 #11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
@philberty philberty changed the title [WIP] Enhance AST dump Enhance AST dump Feb 22, 2022
CohenArthur referenced this issue in CohenArthur/gccrs Dec 2, 2022
In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

This patch adds an named function to allow us to emit an optimized sequence
when doing an unsigned division that is equivalent to:

   x = y / (2 ^ (bitsize (y)/2)-1)

For SVE2 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

        mov     z3.b, #1
.L3:
        ld1b    z0.h, p0/z, [x0, x3]
        mul     z0.h, p1/m, z0.h, z2.h
        addhnb  z1.b, z0.h, z3.h
        addhnb  z0.b, z0.h, z1.h
        st1b    z0.h, p0, [x0, x3]
        inch    x3
        whilelo p0.h, w3, w2
        b.any   .L3

instead of:

.L3:
        ld1b    z0.h, p1/z, [x0, x3]
        mul     z0.h, p0/m, z0.h, z1.h
        umulh   z0.h, p0/m, z0.h, z2.h
        lsr     z0.h, z0.h, #7
        st1b    z0.h, p1, [x0, x3]
        inch    x3
        whilelo p1.h, w3, w2
        b.any   .L3

Which results in significantly faster code.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants