Better error handling for errors in the execute() stage #191
Comments
So I think that But I see a few sets of problems that we can solve:
I think (1) is mechanical -- easy enough to add better debugging messages, catching errors along the way. (2) is probably done best with Any chance you have an example of the type of unclear message you're referencing here? |
@ropeladder, to be test-driven about this, what other tests would be relevant to add? Should fail and have clear error messages:
Should not fail -- but maybe should warn?
Anything else? |
@skrawcz: These probably fall under different index types, but other specific things to check for would be:
@elijahbenizzy: On (3) it's definitely worth clarifying this in the code and in the docs. I was blissfully trying to output series of different frequencies until it occurred to me that the output was all going into the same dataframe. At which point I just separated my I'll try and post some examples tomorrow. |
Interesting. 🤔 Just to clarify do you want one dataframe (with a unified index?) Or separate dataframes? The Hamilton default assumes you want to create a single dataframe when you do |
The output is workable as is, and I think using tags it could be a bit cleaner than what I've been doing. On the other hand, having Error example: (Note: it only seems to happen intermittently)
with:
Errors: (when it doesn't run)
Warnings: (when it does run it still gives this warning)
One thing to consider (not sure if you've seen this) is |
Thanks for the tests. So in your example, naively creating a dataframe from the initial columns does work but the result isn't useful: df = pd.DataFrame(initial_columns) outputs:
What should happen here? Adding a warning here is pretty easy, as is erroring. I guess the question is: do we trust pandas' index coalescing? It does seem to work for some types (e.g. int -> float), but less so for time-related ones, as in this example...
Maybe a warning that listed all the index "groups" (e.g. series indices that had any overlap with other series indices). I could also see an option to output different index "groups" as separate dataframes being helpful. This is partly why I mentioned more elaborate type checking initially as an alternative -- because you'd be forced to be more explicit about what indices you expect, so e.g. Hamilton could tell you straightaway that you're trying to output a dataframe with incompatible indices. The way I'm currently using Hamilton is to try converting some legacy code from a different language that has outputs using a bunch of different frequencies. I'm finding that instead of just going through and creating one output where I can get any of the values that I want via column name, I have to divide them into subgroups based on their index types (and possibly date ranges, where joining the columns into a DataFrame adds in NaNs that I don't want). In other words,
Also: I just ran into another example of an error when (unthinkingly) trying to
(The fact that it errors is fine, it's just that it doesn't say anything about the columns that cause it and the traceback just goes to |
Okay, my goal is to have something by the end of this week, barring any major disruptions, that will minimally:
Related to issue #191, this commit is to help surface index type issues. Specifically: 1. Warn if there are index type mismatches. 2. Require you to set your logger to debug if you want to see more details. 3. Provide a "ResultBuilder" class that uses strict index type matching so if you want to error on index type mismatches, this is the results builder to use. I don't think we should build anything more custom unless there's a clear common use case - user contributed result builders sound like an interesting idea.
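The warn-versus-error behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `group_by_index_type`, `build_dataframe`, and the `strict` flag are hypothetical names, and the sketch only compares index classes.

```python
import logging
from collections import defaultdict
from typing import Dict, List

import pandas as pd

logger = logging.getLogger(__name__)


def group_by_index_type(outputs: Dict[str, pd.Series]) -> Dict[str, List[str]]:
    """Map each index class name to the output names that use it (hypothetical helper)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, series in outputs.items():
        groups[type(series.index).__name__].append(name)
    return dict(groups)


def build_dataframe(outputs: Dict[str, pd.Series], strict: bool = False) -> pd.DataFrame:
    """Build one dataframe; warn on mixed index types, or raise when strict is True."""
    groups = group_by_index_type(outputs)
    if len(groups) > 1:
        if strict:
            # Strict mode: refuse to build, and name the outputs in each group.
            raise ValueError(f"Index type mismatch between outputs: {groups}")
        # Default mode: short warning, with the details only at DEBUG level.
        logger.warning("Index type mismatch detected; enable DEBUG logging for details.")
        logger.debug("Index types by output: %s", groups)
    return pd.DataFrame(outputs)


# Two outputs whose indexes are of different classes.
daily = pd.Series([1.0, 2.0], index=pd.date_range("2022-01-01", periods=2, freq="D"))
counts = pd.Series([10, 20], index=[0, 1])
groups = group_by_index_type({"daily": daily, "counts": counts})
print(groups)
```

Note that comparing index classes would not catch the case raised earlier in the thread, where two series both have a datetime index but at different frequencies; that would need an additional check on the index frequency or on overlap between index values.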
@ropeladder I'm going to close this since with |
Is your feature request related to a problem? Please describe.
When my code runs into errors in the execute() stage, the traceback just goes to the execute() function, and doesn't provide any useful feedback about the provenance of the error. For example, if my DAG feeds a series into another series and they have incompatible indexes, I get an index error but I can't tell which series caused it.
Describe the solution you'd like
Just passing back the names of the relevant series with the error would be helpful.
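To illustrate the kind of message being asked for, here is a minimal sketch that wraps a single node call and re-raises with the node name and its input names attached. `NodeExecutionError` and `run_node` are hypothetical names made up for this example, and nothing here reflects how Hamilton actually executes a DAG.

```python
from typing import Any, Callable, Dict


class NodeExecutionError(RuntimeError):
    """Hypothetical error type that records which node failed."""


def run_node(name: str, fn: Callable[..., Any], inputs: Dict[str, Any]) -> Any:
    """Call one node function; on failure, name the node and its inputs."""
    try:
        return fn(**inputs)
    except Exception as exc:
        # Chain the original exception so its traceback is still available.
        raise NodeExecutionError(
            f"Node '{name}' failed with inputs {sorted(inputs)}: "
            f"{type(exc).__name__}: {exc}"
        ) from exc


def ratio(spend: float, signups: float) -> float:
    """Toy node function that fails when signups is zero."""
    return spend / signups


try:
    run_node("ratio", ratio, {"spend": 10.0, "signups": 0.0})
except NodeExecutionError as err:
    message = str(err)
    print(message)
```

Chaining with `from exc` keeps the underlying error (here a ZeroDivisionError) in the traceback, so the node name is added without hiding the original cause.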
Describe alternatives you've considered
More involved type checking when constructing the series functions and then when building the DAG could potentially eliminate some of the errors (probably not all though).