Fail fast and print error to console #1106

Open
honzajde opened this issue Oct 18, 2022 · 7 comments

Comments

@honzajde

I have been using this tool for a while, but I am still not sure what the best way is to debug a failing DSL expression. The issue is that the program runs as if nothing happened; the only difference is an (error) in a particular column and row.

I want to know what is that error.
What am I missing?

mlr help does not help
mlr help find does not help
mlr -v and mlr --verbose do not help (I was hoping it had a verbose mode)

As per https://miller.readthedocs.io/en/latest/reference-dsl-errors/

dump and print do not help me since it fails on an operation

There is perhaps a way to print an error's message when it appears in place of a column value; that could help...

However, it would be much easier to call mlr --fail-fast ... and see the error immediately.

@johnkerl
Owner

johnkerl commented Oct 18, 2022

@honzajde I've felt the same for a while.

Since Miller was first created I've realized there are a few different ways to handle errors in data:

  • Continue with minimal error-cause information within the cell (what we have now)
  • Continue with more error-cause information, perhaps to stderr
  • Fail immediately with more error-cause information

All of these three modes are legitimate and all have times they might be desired. But I have not yet made error-handling configurable. So the most I've ever done is pick which of the three modes to be the default -- the first one -- and implement only that.
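
The three modes can be put side by side in a small sketch. This is an illustration in Python, not Miller's implementation: the function names, the `mode` values ("inline", "report", "fail"), and the message format are all assumptions made up for the example.

```python
import sys


def format_number(value, fmt="%7.2f"):
    """Format a numeric-looking value; float() raises on anything else."""
    return fmt % float(value)


def process(records, field, mode="inline", stderr=sys.stderr):
    """Apply format_number to one field of every record.

    mode="inline": put "(error)" in the cell and continue (option 1).
    mode="report": same, but also describe the failure on stderr (option 2).
    mode="fail":   stop at the first failure with a descriptive message (option 3).
    """
    output = []
    for record_number, record in enumerate(records, start=1):
        record = dict(record)  # do not mutate the caller's data
        try:
            record[field] = format_number(record[field])
        except (TypeError, ValueError) as exc:
            message = f"record {record_number}, field {field!r}: {exc}"
            if mode == "fail":
                raise ValueError(message) from exc
            if mode == "report":
                print(message, file=stderr)
            record[field] = "(error)"
        output.append(record)
    return output
```

With dirty input, "inline" keeps going silently, "report" keeps going but says which record and value failed, and "fail" stops at the first bad record.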

I still think the first mode is a legit default -- people deal with lots of dirty data and I don't want the default behavior to be "crash the program". But as you rightly point out, when people do want a "crash the program" option, they should be able to get it. Currently they cannot. :(

@honzajde
Author

@johnkerl I understand the default choice in error handling and I mostly appreciate it.

On the other hand, sometimes I work with messy data whose shape I don't know up front, and I end up seeing an error in the result. I know it is one of those "data vs assumptions" bugs, where seeing the failing case means everything.

Apart from failing fast:

  1. As of now, there is no way to get the error message printed, right?
  2. Am I somehow able to get the error message in any other way?

@johnkerl
Owner

@honzajde your use-case is really important, you're spot-on, and I've underserved this use-case

for your (1) & (2) -- that is correct :(

A workaround for the present is to use the REPL where you can essentially set a breakpoint at line number whatever of your file and evaluate things interactively

@honzajde
Author

I am also surprised that this compiles:

...
  size_pure=gssub(size, ",", "");
  $size=fmtnum(size_pure, "%7.2f")." ".unit;

And it should not....
fmtnum() requires float or int and here it takes string
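
For comparison, here is the same "strip the commas, then format" step with the string-to-number conversion made explicit. This is a Python sketch rather than Miller DSL; `format_size` is a hypothetical helper, and returning "(error)" for unparseable input is an assumption that mimics the in-cell marker described above.

```python
def format_size(size, unit):
    """Remove thousands separators, convert to a number, then format."""
    size_pure = size.replace(",", "")
    try:
        number = float(size_pure)  # explicit conversion: the step that can fail
    except ValueError:
        return "(error)"
    return "%7.2f" % number + " " + unit
```

Making the conversion a separate step gives one obvious place where a bad value can be caught and reported.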

And maybe (hopefully!) my last question: why does Miller not have printf (for strings) while it has fmtnum?

BTW, the Miller DSL is really neat; I just wish it had everything a programming language has ;) Thanks for all this work, anyways!

@johnkerl
Owner

johnkerl commented Oct 19, 2022

> I am also surprised that this compiles:

That's just it ... the expression is syntactically OK. The Miller DSL is not strongly typed, so it "compiles" (parses into a concrete syntax tree); fmtnum($column, "%7.2f") might or might not be OK depending on what $column is, row by row (at runtime, post-parse). The error-handling is "option 1 of 3", using the terminology from the top of this thread, which does need the options 2 & 3 you proposed -- as you noted, this is one of those "data vs assumptions" bugs, where seeing the failing case means everything.
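
The parse-time versus run-time distinction can be shown in any dynamically typed language. The sketch below uses Python's own parser as a stand-in: the expression string and the `evaluate` helper are assumptions for illustration and have nothing to do with how Miller parses its DSL.

```python
import ast

# Parsing succeeds no matter what data will later be bound to `column`.
EXPRESSION = "'%7.2f' % column"
tree = ast.parse(EXPRESSION, mode="eval")
code = compile(tree, "<expression>", "eval")


def evaluate(column):
    """Evaluate the already-parsed expression against one value."""
    try:
        return eval(code, {"__builtins__": {}}, {"column": column})
    except TypeError as exc:
        # Only now, with a concrete value in hand, does the type problem appear.
        return f"(error: {exc})"
```

The same parsed expression formats a number correctly on one row and fails on the next, purely depending on the data.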

> why Miller does not have printf (for strings) while it has fmtnum?

I felt that a full implementation of printf would take a lot of cases and parsing to get right; fmtnum and format seemed like enough. That said, if fmtnum takes one format string and printf takes n format strings, perhaps it wouldn't be too much work ... 🤔
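
To make the "one format string versus n" remark concrete, here is a rough Python sketch of a printf-like function built on top of a single-verb formatter. Everything in it is hypothetical (`apply_one`, `sketch_printf`, the small set of supported verbs), and it says nothing about how Miller would actually implement this.

```python
import re

# One conversion: optional flags, width, precision, then a verb.
# "%%" stands for a literal percent sign.
_VERB = re.compile(r"%(?:%|[-+ 0#]*\d*(?:\.\d+)?[sdfxeg])")


def apply_one(verb, value):
    """The single-format case: one verb applied to one value."""
    return verb % (value,)


def sketch_printf(fmt, *args):
    """Split fmt into verbs and apply each one to the next argument."""
    remaining = list(args)

    def substitute(match):
        verb = match.group(0)
        if verb == "%%":
            return "%"
        if not remaining:
            raise ValueError("not enough arguments for format string")
        return apply_one(verb, remaining.pop(0))

    result = _VERB.sub(substitute, fmt)
    if remaining:
        raise ValueError("too many arguments for format string")
    return result
```

For example, `sketch_printf("%7.2f %s", 1234.5, "MB")` pairs the first verb with the number and the second with the string. The parsing cases mentioned above (flags, widths, escapes, argument-count mismatches) are where most of the work would go.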

> I would just wish it had everything like a programming language

Indeed! 😁 There's a bit more here: A note on the complexity of Miller's expression language. Originally I had the choice to maybe embed Lua or something as a DSL language, but ultimately chose to start a language from scratch one feature at a time. There weren't even for-loops until Miller 4. Miller 5 was a big jump featurewise, getting the Miller DSL closer to a full-blown programming language -- but there's a lot I don't ever want to do, like classes, modules, etc. -- it would be better to embed an existing language than to grow the Miller DSL that far -- existing full-blown languages do a much better job of being full-blown programming languages. I do think though that aiming for the level of what awk does is achievable ... and I think a printf fits well within that goal ...

> Thanks for all this work, anyways!

You're welcome! :)

In summary: options 2 & 3 for error-handling and a printf are very much in the remit. I'm in a heavy mode with my day job these last few months and commits to Miller have been a bit thin on the vine, but these changes should be made, and will be.

@honzajde
Author

> fmtnum($column, "%7.2f") might or might not be OK

But fmtnum(gssub($column, ",", ""), "%7.2f") is never OK. Still, I think I get your point: much of the time type-checking is ineffective because the data is always of type Any.

> I felt like a full implementation of printf felt like it would take a lot of cases and parsing to get right

Actually, I should have asked about fmtstr rather than printf :) All I need is fmtstr, a version of fmtnum for string(s)...
BTW, I didn't get your previous note about format: how could it do what fmtstr would do?

> Originally I had the choice to maybe embed Lua or something as a DSL language...

You also mention Nim as a candidate; it is not clear to me why Nim was excluded for the DSL.
Would it be possible to write a DSL as terse as Miller's in it? I heard that Nim DSLs get type-checking for free...(?)
Sounds too good to be true :) I don't know much about Nim...

> I do think though that aiming for the level of what awk does is achievable...

Miller is doing more than awk already...
One feature that I desire is syntax highlighting in my editor... Not good news :), I know!

@johnkerl
Owner

@honzajde can you take a look at #1373?
