fill_null() does not cast ALL Null values to other dtype (Nested Dataclass) #17268

Open
2 tasks done
starzar opened this issue Jun 28, 2024 · 1 comment
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)


starzar commented Jun 28, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

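import polars as pl


def txt_to_df(filepath):
    cot_df_total = pl.DataFrame()
    ticker_counter = 0

    # `records` is a hypothetical stand-in for the per-ticker nested data parsed
    # from `filepath`; the parsing code and the data were not included in the issue.
    for data in records:
        cot_df_cur = pl.DataFrame(data).fill_null("zero")
        print("cot_df_cur.schema")
        print(cot_df_cur.schema)

        # Append this ticker's rows; this is the call that raises SchemaError
        # when a nested field is Int64 in one frame but Null in another
        cot_df_total = pl.concat([cot_df_total, cot_df_cur], how="vertical")
        ticker_counter += 1

    return cot_df_total.with_row_index()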

Log output

C:\Users\User_0\AppData\Local\Programs\Python\Python312\python.exe C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py 
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Null, 'spread': Null})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Null, 'short': Null, 'spread': Null})})})})
cot_df_cur.schema
OrderedDict({'Ticker': String, 'CotReport_data': Struct({'Dealer': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'AssetManager': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'LeveragedFunds': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'OtherReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Int64, 'spread': Int64})}), 'NonReportables': Struct({'position': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'oiPct': Struct({'long': Float64, 'short': Float64, 'spread': Float64}), 'weeklyChange': Struct({'long': Int64, 'short': Int64, 'spread': Int64}), 'traders': Struct({'long': Int64, 'short': Null, 'spread': Null})})})})
Traceback (most recent call last):
  File "C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py", line 300, in <module>
    cot_to_html()
  File "C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py", line 295, in cot_to_html
    cotTotal_df = txt_to_df(filepath)
                  ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User_0\Documents\Code\Python\NseScraping\Others\CotReport\cotParse1.py", line 184, in txt_to_df
    cot_df_total = pl.concat([cot_df_total,cot_df_cur],how="vertical")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User_0\AppData\Local\Programs\Python\Python312\Lib\site-packages\polars\functions\eager.py", line 184, in concat
    out = wrap_df(plr.concat_df(elems))
                  ^^^^^^^^^^^^^^^^^^^^
polars.exceptions.SchemaError: type Int64 is incompatible with expected type Null

Process finished with exit code 1

Issue description

https://drive.google.com/file/d/1gaLFuy6QyQNNE32eLFTz-HuAeX974wvw/view?usp=sharing

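fill_null() does not cast ALL Null values to another dtype when the data is nested (struct columns). As the schemas above show, some nested traders fields (for example under NonReportables) stay typed Null after fill_null("zero"), while the same fields are Int64 for other tickers, so the vertical concat fails.
Unnesting the structs and casting the Null columns to other dtypes is possible, but then the struct "key" names are lost, because they become plain value columns.

Is there a way to get a complete fill_null() on nested DataFrames without unnesting them?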

Expected behavior

All null values, including those inside nested struct fields, should be filled with "zero" by pl.DataFrame(data).fill_null("zero").

Installed versions

polars 0.20.31
starzar added the bug, needs triage and python labels on Jun 28, 2024
@cmdlineluser (Contributor)

Can you provide data to make your example reproducible?

As for the particular error:

polars.exceptions.SchemaError: type Int64 is incompatible with expected type Null

The vertical_relaxed strategy may be of help:

cot_df_total = pl.concat([cot_df_total,cot_df_cur],how="vertical_relaxed")
