[Bug, lake] pl.Dataframe constructor - 'float' object cannot be interpreted as an integer #657
Comments
Hi @trentmc, I'm a bit worried that somehow a timestamp from ccxt is showing up as a float, rather than integer. I created a couple of tests:
While implementing the fix in (2), when I try to pass raw_tohlcv_data with a float to clean_raw_ohlcv() in the test, I get the error... hopefully I can reproduce it successfully, and then fix it. |
Looks like you're not passing a What you need to test is: if you pass a I wouldn't be surprised if sometimes an exchange gives timestamps as floats. Remember that sometimes we get NaNs for o, h, l, c or v values, and we work around those. So we'd also identify exactly what it's doing with timestamps, and figure out a workaround. But first we need to get to the bottom of the issue. To do that, reproduce fetching the following data: BTC/USDT on Binance. That's what it was trying right before the traceback.
|
I created a temporary script to inspect the data and check for float-typed timestamps. The data appears clean: I found none. Could this be a temporary issue with their API?
|
Thanks for the feedback @trentmc and for fixing my issue @kdetry, it was EoD and I was tired. My first step was to hard reset, nuke lake, and then configure ppss.yaml to do the same fetch.
However, the data I got back from ccxt was clean. There were no floats out of place, and all ohlcv values were valid (not null). My guess is that CCXT provides info straight from the CEX API, and if I remember correctly, CEX APIs sometimes yield wrong data which then gets patched (this would explain both Mustafa's findings and mine). What I'm thinking is that we can't repro the issue because it will always be temporary at the CEX API level. We either (1) enforce validation through coercion of expected types, value ranges, and expected data provided by ccxt/cex. Even better than the call stack would be to have those bad records logged somewhere (3) so we can further inspect the issues we're seeing and address them accordingly. |
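As a rough illustration of the validation-and-logging idea above, here is a minimal sketch in plain Python. The function names, the logger name, and the six-value row layout (timestamp, open, high, low, close, volume) are assumptions made for illustration; this is not the project's actual code. NaN values in o, h, l, c, v are deliberately let through, since the thread notes those are worked around elsewhere.

```python
import logging
import math

# Hypothetical logger name, used only for this sketch
logger = logging.getLogger("tohlcv_validation")


def coerce_timestamp(value) -> int:
    """Return value as an int, or raise ValueError if it is not a whole number."""
    if isinstance(value, bool):
        raise ValueError(f"bool is not a timestamp: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, float) and math.isfinite(value) and value.is_integer():
        return int(value)  # e.g. 1700000000000.0 -> 1700000000000
    raise ValueError(f"not an integer timestamp: {value!r}")


def split_valid_rows(rows):
    """Split raw rows into (clean, rejected); every rejected row is logged."""
    clean, rejected = [], []
    for row in rows:
        try:
            if len(row) != 6:
                raise ValueError(f"expected 6 values, got {len(row)}")
            timestamp = coerce_timestamp(row[0])
            # NaN is allowed here on purpose; only non-numeric values fail
            ohlcv = [float(v) for v in row[1:]]
        except (TypeError, ValueError) as err:
            logger.warning("rejected tohlcv row %r: %s", row, err)
            rejected.append(row)
            continue
        clean.append([timestamp] + ohlcv)
    return clean, rejected
```

Keeping the rejected rows, rather than dropping them silently, gives us something concrete to inspect the next time an exchange returns odd data.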
After reviewing PRs & tasks today, we discussed:
Based on this, I propose we: |
I ran across this error again. And I'm able to reproduce it now:) Gonna try to fix it. Run, with error:
Run, with breakpoint, to see input values:
At the top of the call stack is:
The values there are:
If name=timestamp and it thinks all these values are supposed to be timestamps, then that's the issue! Because only the first value is int, and only the first value has a reasonable value for timestamp. The other values are clearly BTC price related (ohlcv). |
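To make the observation above concrete, here is a toy model in plain Python of what happens when a list of rows is read as a list of columns. It is only a sketch: `to_columns` is a hypothetical helper, the candle values are made up, and it does not reproduce polars' real inference logic.

```python
TOHLCV_COLS = ["timestamp", "open", "high", "low", "close", "volume"]


def to_columns(data, names, orient):
    """Toy model of data orientation (not polars' actual logic)."""
    if orient == "row":
        # Each inner list is one record: transpose so values line up by field
        return {name: list(col) for name, col in zip(names, zip(*data))}
    if orient == "col":
        # Each inner list is taken to be an entire column
        return {name: list(col) for name, col in zip(names, data)}
    raise ValueError(f"unknown orient: {orient!r}")


def all_ints(values):
    return all(isinstance(v, int) for v in values)


# Six made-up candles, 5 minutes (300_000 ms) apart
rows = [
    [1_700_000_000_000 + i * 300_000, 35000.0 + i, 35010.0 + i,
     34990.0 + i, 35005.0 + i, 12.5]
    for i in range(6)
]

by_row = to_columns(rows, TOHLCV_COLS, "row")
by_col = to_columns(rows, TOHLCV_COLS, "col")

print(all_ints(by_row["timestamp"]))  # True: only timestamps in the column
print(all_ints(by_col["timestamp"]))  # False: the first record became the column
```

Under the column reading, the "timestamp" column is just the first record, so only its first value is an int and the rest are prices and volume, which matches the values seen at the breakpoint.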
I got to the bottom of the issue. I discovered other issues in polars repo that had encountered similar errors:
From the discussions there, I realized that the issue may be:
Supporting evidence: what I observed above:
So I created three separate unit tests, and observed the behavior:
From this, it's clear that we need to set orient="row" everywhere we create ohlcv dfs from data. I put that fix in accordingly, into the PR. |
Fix #657: [Bug, lake] pl.Dataframe constructor - 'float' object cannot be interpreted as an integer
* Write unit tests that capture the base issue
* Update unit tests to expose orient=infer/col/row behavior. They show: infer can fail sometimes, col always fails, row always passes
* Update DataFrame() constructor calls to always explicitly set orient="row"
Where encountered
With this setup: my_ppss.yaml. Key params: predict BTC, 5m; approach 3; just BTC c input.
In the main branch, I ran:
pdr predictoor 3 my_ppss.yaml sapphire-mainnet
After about 3h runtime, I got an error:
Full log
out.txt
Full Traceback