Expression/context evaluation bug #17057
Labels
bug
Something isn't working
incomplete
Incomplete issue: needs MWE
needs triage
Awaiting prioritization by a maintainer
python
Related to Python Polars
Comments
Prior to the traceback, the tail of the log file shows:
RUN STREAMING PIPELINE
[csv -> filter -> hstack -> generic-group_by -> callback -> filter -> ordered_sink, csv -> filter -> hstack -> generic_join_build]
STREAMING CHUNK SIZE: 3571 rows
STREAMING CHUNK SIZE: 7142 rows
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
process partition 0 during generic-group_by-source
process partition 1 during generic-group_by-source
process partition 2 during generic-group_by-source
process partition 3 during generic-group_by-source
process partition 4 during generic-group_by-source
process partition 5 during generic-group_by-source
process partition 6 during generic-group_by-source
process partition 7 during generic-group_by-source
process partition 8 during generic-group_by-source
process partition 9 during generic-group_by-source
process partition 10 during generic-group_by-source
process partition 11 during generic-group_by-source
process partition 12 during generic-group_by-source
process partition 13 during generic-group_by-source
process partition 14 during generic-group_by-source
process partition 15 during generic-group_by-source
process partition 16 during generic-group_by-source
process partition 17 during generic-group_by-source
process partition 18 during generic-group_by-source
process partition 19 during generic-group_by-source
process partition 20 during generic-group_by-source
process partition 21 during generic-group_by-source
process partition 22 during generic-group_by-source
process partition 23 during generic-group_by-source
process partition 24 during generic-group_by-source
process partition 25 during generic-group_by-source
process partition 26 during generic-group_by-source
process partition 27 during generic-group_by-source
process partition 28 during generic-group_by-source
process partition 29 during generic-group_by-source
process partition 30 during generic-group_by-source
process partition 31 during generic-group_by-source
process partition 32 during generic-group_by-source
process partition 33 during generic-group_by-source
process partition 34 during generic-group_by-source
process partition 35 during generic-group_by-source
process partition 36 during generic-group_by-source
process partition 37 during generic-group_by-source
process partition 38 during generic-group_by-source
process partition 39 during generic-group_by-source
process partition 40 during generic-group_by-source
process partition 41 during generic-group_by-source
process partition 42 during generic-group_by-source
process partition 43 during generic-group_by-source
process partition 44 during generic-group_by-source
process partition 45 during generic-group_by-source
process partition 46 during generic-group_by-source
process partition 47 during generic-group_by-source
process partition 48 during generic-group_by-source
process partition 49 during generic-group_by-source
process partition 50 during generic-group_by-source
process partition 51 during generic-group_by-source
process partition 52 during generic-group_by-source
process partition 53 during generic-group_by-source
process partition 54 during generic-group_by-source
process partition 55 during generic-group_by-source
process partition 56 during generic-group_by-source
process partition 57 during generic-group_by-source
process partition 58 during generic-group_by-source
process partition 59 during generic-group_by-source
process partition 60 during generic-group_by-source
process partition 61 during generic-group_by-source
process partition 62 during generic-group_by-source
process partition 63 during generic-group_by-source
RUN STREAMING PIPELINE
[csv -> filter -> hstack -> generic-group_by -> callback -> filter -> ordered_sink, csv -> filter -> hstack -> generic_join_build]
STREAMING CHUNK SIZE: 3571 rows
STREAMING CHUNK SIZE: 7142 rows
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
process partition 0 during generic-group_by-source
process partition 1 during generic-group_by-source
process partition 2 during generic-group_by-source
process partition 3 during generic-group_by-source
process partition 4 during generic-group_by-source
process partition 5 during generic-group_by-source
process partition 6 during generic-group_by-source
process partition 7 during generic-group_by-source
process partition 8 during generic-group_by-source
process partition 9 during generic-group_by-source
process partition 10 during generic-group_by-source
process partition 11 during generic-group_by-source
process partition 12 during generic-group_by-source
process partition 13 during generic-group_by-source
process partition 14 during generic-group_by-source
process partition 15 during generic-group_by-source
process partition 16 during generic-group_by-source
process partition 17 during generic-group_by-source
process partition 18 during generic-group_by-source
process partition 19 during generic-group_by-source
process partition 20 during generic-group_by-source
process partition 21 during generic-group_by-source
process partition 22 during generic-group_by-source
process partition 23 during generic-group_by-source
process partition 24 during generic-group_by-source
process partition 25 during generic-group_by-source
process partition 26 during generic-group_by-source
process partition 27 during generic-group_by-source
process partition 28 during generic-group_by-source
process partition 29 during generic-group_by-source
process partition 30 during generic-group_by-source
process partition 31 during generic-group_by-source
process partition 32 during generic-group_by-source
process partition 33 during generic-group_by-source
process partition 34 during generic-group_by-source
process partition 35 during generic-group_by-source
process partition 36 during generic-group_by-source
process partition 37 during generic-group_by-source
process partition 38 during generic-group_by-source
process partition 39 during generic-group_by-source
process partition 40 during generic-group_by-source
process partition 41 during generic-group_by-source
process partition 42 during generic-group_by-source
process partition 43 during generic-group_by-source
process partition 44 during generic-group_by-source
process partition 45 during generic-group_by-source
process partition 46 during generic-group_by-source
process partition 47 during generic-group_by-source
process partition 48 during generic-group_by-source
process partition 49 during generic-group_by-source
process partition 50 during generic-group_by-source
process partition 51 during generic-group_by-source
process partition 52 during generic-group_by-source
process partition 53 during generic-group_by-source
process partition 54 during generic-group_by-source
process partition 55 during generic-group_by-source
process partition 56 during generic-group_by-source
process partition 57 during generic-group_by-source
process partition 58 during generic-group_by-source
process partition 59 during generic-group_by-source
process partition 60 during generic-group_by-source
process partition 61 during generic-group_by-source
process partition 62 during generic-group_by-source
process partition 63 during generic-group_by-source
When found multiple sources; run comm_subplan_elim
UNION: `parallel=false` union is run sequentially
join parallel: false
join parallel: false
read files in parallel
avg line length: 67.58008
std. dev. line length: 6.988767
initial row estimate: 2131851
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 0
CACHE HIT: cache id: 0
estimated unique values: 770990
estimated unique count: 770990 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 71.887695
std. dev. line length: 3.7583435
initial row estimate: 2079664
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 1
CACHE HIT: cache id: 1
estimated unique values: 512530
estimated unique count: 512530 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 71.74707
std. dev. line length: 4.2742543
initial row estimate: 2028932
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 2
CACHE HIT: cache id: 2
estimated unique values: 637402
estimated unique count: 637402 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 73.89453
std. dev. line length: 2.0176597
initial row estimate: 1954295
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 3
CACHE HIT: cache id: 3
estimated unique values: 736951
estimated unique count: 736951 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 72.50195
std. dev. line length: 2.815622
initial row estimate: 2055499
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 4
CACHE HIT: cache id: 4
estimated unique values: 403573
estimated unique count: 403573 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 65.9375
std. dev. line length: 4.904956
initial row estimate: 2201427
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 5
CACHE HIT: cache id: 5
estimated unique values: 764336
estimated unique count: 764336 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
join parallel: false
join parallel: false
read files in parallel
avg line length: 70.17578
std. dev. line length: 4.5201983
initial row estimate: 2064369
no. of chunks: 8 processed by: 8 threads.
CACHE SET: cache id: 6
CACHE HIT: cache id: 6
estimated unique values: 465196
estimated unique count: 465196 exceeded the boundary: 1000, running default HASH AGGREGATION
INNER join triggered a rechunk of the right DataFrame: 3 columns are affected
INNER join dataframes finished
dataframe filtered
LEFT join dataframes finished
Checks
Reproducible example
Unable to produce a minimal example yet.
Log output
Issue description
Please forgive me as I was unable to design a minimal reproducible example. I am still working on it.
I essentially have a `LazyFrame` which I am trying to do the following on:
This fails with the error pasted above, despite `df.collect()` completing successfully if run immediately before and producing a column called `name` (the string value was double-checked by printing `df.collect().columns`: `['name', ...]`).
A minimal example where this works:
I've tried running my code with a single row and with the beta version of polars, and the issue persists.
Here are the function calls involving this dataframe since the last `collect()`; I hope this is somewhat clear and helpful:
The reason I am reporting this as a bug is that if `df = df.collect().lazy()` is called immediately prior to my `with_columns` operations, everything works.
Expected behavior
The code should complete successfully and split the strings in `name` into 3 columns as indicated.
Installed versions