Fail fast and print error to console #1106

honzajde · 2022-10-18T16:26:42Z

I have been using this tool for a while, but I am still not sure what the best way is to debug a failing DSL expression. The issue is that the program runs as if nothing happened; the only difference is an (error) in a particular column and row.

I want to know what is that error.
What am I missing?

mlr help does not help
mlr help find does not help
mlr -v and mlr --verbose do not help (I was hoping it had a verbose mode)

As per https://miller.readthedocs.io/en/latest/reference-dsl-errors/

dump and print do not help me since it fails on an operation

There is perhaps a way to print an error's message when it is in place of a column value - that could help...

However, it would be much easier to call mlr --fail-fast ... and see the error immediately.

johnkerl · 2022-10-18T19:31:24Z

@honzajde I've felt the same for a while.

Since Miller was first created I've realized there are a few different ways to handle errors in data:

1. Continue with minimal error-cause information within the cell (what we have now)
2. Continue with more error-cause information, perhaps to stderr
3. Fail immediately with more error-cause information

All of these three modes are legitimate and all have times they might be desired. But I have not yet made error-handling configurable. So the most I've ever done is pick which of the three modes to be the default -- the first one -- and implement only that.
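
To make the three modes concrete, here is a minimal Python sketch. It is not Miller's implementation: `ErrorMode`, `format_cell`, and `process` are hypothetical names, and the `"%7.2f"` formatting only stands in for any per-cell operation that can fail on dirty data.

```python
import sys
from enum import Enum


class ErrorMode(Enum):
    IN_CELL = 1    # continue; only "(error)" appears in the cell
    STDERR = 2     # continue; also report the cause on stderr
    FAIL_FAST = 3  # stop at the first failing cell


def format_cell(value):
    # Stand-in for any per-cell operation that can fail on dirty data.
    return "%7.2f" % float(value)


def process(records, field, mode):
    """Apply format_cell to one field of every record under the given mode."""
    out = []
    for row_number, record in enumerate(records, start=1):
        new_record = dict(record)
        try:
            new_record[field] = format_cell(record[field])
        except (TypeError, ValueError) as exc:
            detail = f"row {row_number}, field {field!r}, value {record[field]!r}: {exc}"
            if mode is ErrorMode.FAIL_FAST:
                raise RuntimeError(detail) from exc
            if mode is ErrorMode.STDERR:
                print(detail, file=sys.stderr)
            new_record[field] = "(error)"
        out.append(new_record)
    return out


rows = [{"size": "1.5"}, {"size": "1,024"}]
print(process(rows, "size", ErrorMode.IN_CELL))
```

With `ErrorMode.IN_CELL` the second row simply comes back with `(error)` in the `size` field; with `ErrorMode.FAIL_FAST` the same input raises at row 2 and the message names the offending value.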

I still think the first mode is a legit default -- people deal with lots of dirty data and I don't want the default behavior to be "crash the program". But as you rightly point out, when people do want a "crash the program" option, they should be able to get it. Currently they cannot. :(

honzajde · 2022-10-18T20:57:26Z

@johnkerl I understand the default choice in error handling and I mostly appreciate it.

On the other hand, sometimes I work with data that is messy and I don't know its shape up front, and I get into a situation where I see an error in the result and I know this is one of those "data vs assumptions" bugs, where seeing the failing case means everything.

Apart from failing fast:

1. As of now, there is no way to get the error message printed, right?
2. Am I somehow able to get the error message in any other way?

johnkerl · 2022-10-18T20:59:47Z

@honzajde your use-case is really important, you're spot-on, and I've underserved this use-case

for your (1) & (2) -- that is correct :(

A workaround for the present is to use the REPL where you can essentially set a breakpoint at line number whatever of your file and evaluate things interactively

honzajde · 2022-10-19T12:10:35Z

I am also surprised that this compiles:

...
  size_pure=gssub(size, ",", "");
  $size=fmtnum(size_pure, "%7.2f")." ".unit;

And it should not...
fmtnum() requires a float or an int, and here it takes a string.

And maybe (hopefully!) my last question: why does Miller not have printf (for strings) while it has fmtnum?

BTW. Miller DSL is really neat, I would just wish it had everything like a programming language;) Thanks for all this work, anyways!

johnkerl · 2022-10-19T14:30:22Z

I am also surprised that this compiles:

That's just it ... the expression is syntactically OK; the Miller DSL is not strongly typed so it "compiles" (parses into a concrete syntax tree); fmtnum($column, "%7.2f") might or might not be OK depending on what $column is row by row (at runtime, post-parse); the error-handling is "option 1 of 3" using the terminology from the top of this post. That is exactly where the options 2 & 3 you proposed are needed -- as you noted, this is one of those "data vs assumptions" bugs, where seeing the failing case means everything.
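
The parse-time versus run-time distinction can be shown with a small Python analogy. This is only a sketch, not how Miller works internally: the expression string, the `evaluate` helper, and the use of Python's `format` and `float` in place of fmtnum are all assumptions made for illustration.

```python
import ast

# Hypothetical expression, loosely analogous to fmtnum($column, "%7.2f").
EXPRESSION = 'format(float(value), "7.2f")'

# Parsing looks only at syntax, so it succeeds before any data is seen.
tree = ast.parse(EXPRESSION, mode="eval")
code = compile(tree, "<expression>", "eval")


def evaluate(value):
    # Whether evaluation succeeds depends on the value found in each row.
    try:
        return eval(code, {"format": format, "float": float}, {"value": value})
    except (TypeError, ValueError):
        return "(error)"


for cell in ["3.14159", "abc"]:
    print(repr(cell), "->", repr(evaluate(cell)))
```

The same parsed expression formats the first cell and yields `(error)` for the second, which is why a syntax check alone cannot catch this class of problem.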

why does Miller not have printf (for strings) while it has fmtnum?

I felt that a full implementation of printf would take a lot of cases and parsing to get right; fmtnum and format seemed like enough. That said, if fmtnum takes one format string and printf takes n format strings, perhaps it wouldn't be too much work ... 🤔

I would just wish it had everything like a programming language

Indeed! 😁 There's a bit more here: A note on the complexity of Miller's expression language. Originally I had the choice to maybe embed Lua or something as a DSL language, but ultimately chose to start a language from scratch, one feature at a time. There weren't even for-loops until Miller 4. Miller 5 was a big jump featurewise, getting the Miller DSL closer to a full-blown programming language -- but there's a lot I don't ever want to do, like classes, modules, etc. -- it would be better to embed an existing language than to grow the Miller DSL that far -- existing full-blown languages do a much better job of being full-blown programming languages. I do think though that aiming for the level of what awk does is achievable ... and I think a printf fits well within that goal ...

Thanks for all this work, anyways!

You're welcome! :)

In summary: options 2 & 3 for error-handling and a printf are very much in the remit. I've been in a heavy mode with my day job these last few months and commits to Miller have been a bit thin on the vine, but these changes should be made, and will be.

honzajde · 2022-10-22T14:55:59Z

fmtnum($column, "%7.2f") might or might not be OK

But fmtnum(gssub($columns, ",", ""), "%7.2f") is never OK. Still, I think I get your point: type-checking is often ineffective because the data is always of type Any.

I felt that a full implementation of printf would take a lot of cases and parsing to get right

Actually, I should have asked about fmtstring rather than printf :) All I need is fmtstr - a version of fmtnum for string(s)...
BTW, I didn't get your previous note about format - how could it do what fmtstr would do?

Originally I had the choice to maybe embed Lua or something as a DSL language...

You also mention nim as being a candidate; it is not clear to me why nim was excluded for the DSL.
Would it be possible to write the same terse DSL as Miller's? I heard that nim DSLs get type-checking for free...(?)
Sounds too good to be true :) I don't know much about nim...

I do think though that aiming for the level of what awk does is achievable...

Miller is doing more than awk already...
One feature that I desire is syntax highlighting in my editor... Not good news :), I know!

johnkerl · 2023-08-30T23:40:25Z

@honzajde can you take a look at #1373?

johnkerl added the feature-request label Oct 18, 2022

johnkerl mentioned this issue Jun 7, 2023

Bug when replacing a dot #1310

Closed

johnkerl added on deck active and removed on deck labels Aug 26, 2023

johnkerl mentioned this issue Aug 27, 2023

Fatal-on-data-error mlr -x option #1373

Merged

johnkerl self-assigned this Aug 30, 2023

johnkerl added pending feedback to close and removed active labels Aug 30, 2023

Porkepix mentioned this issue Aug 31, 2023

miller 6.9.0 Homebrew/homebrew-core#141012

Merged
