Read data with Float32 and Float64 have different outputs #17082

Closed
2 tasks done
balakhaniyan opened this issue Jun 20, 2024 · 3 comments
Labels
bug (Something isn't working), python (Related to Python Polars)

Comments

@balakhaniyan

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

pl.concat([a, b.rename({'data': 'data2'})], how='horizontal').sort('data').with_columns(pl.all().map_elements(str, return_dtype=pl.String))

Log output

No response

Issue description

difference_in_read_data.xlsx

I read this file using:

import polars as pl

a = pl.read_excel('difference_in_read_data.xlsx', read_options={
    'schema_overrides': {'data': pl.Float64}
})

b = pl.read_excel('difference_in_read_data.xlsx', read_options={
    'schema_overrides': {'data': pl.Float32}
})

and then ran the code snippet from the reproducible example.

Expected behavior

I expected both reads to produce the same output regardless of the type (Float32 or Float64), because the values are integers, so there should be no difference between Float32 and Float64.

Installed versions

--------Version info---------
Polars:               0.20.31
Index type:           UInt32
Platform:             Windows-11-10.0.22631-SP0
Python:               3.12.2 (tags/v3.12.2:6abddd9, Feb  6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         1.6.0
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0
@balakhaniyan balakhaniyan added the bug, needs triage, and python labels Jun 20, 2024
@ritchie46
Member

What is the bug here? Can you show it?

@balakhaniyan
Author

@ritchie46
When I ran

pl.concat([a, b.rename({'data': 'data2'})], how='horizontal').sort('data').with_columns(
    (pl.col('data') - pl.col('data2')).alias('diff')
)

the output is:

shape: (2_352, 3)
┌─────────────┬─────────────┬─────────┐
│ data        ┆ data2       ┆ diff    │
│ ---         ┆ ---         ┆ ---     │
│ f64         ┆ f32         ┆ f64     │
╞═════════════╪═════════════╪═════════╡
│ 1.6856309e7 ┆ 1.6856308e7 ┆ 1.0     │
│ 1.6874691e7 ┆ 1.6874692e7 ┆ -1.0    │
│ 1.6874691e7 ┆ 1.6874692e7 ┆ -1.0    │
│ 1.6874691e7 ┆ 1.6874692e7 ┆ -1.0    │
│ 1.6913661e7 ┆ 1.691366e7  ┆ 1.0     │
│ …           ┆ …           ┆ …       │
│ 7.0912e10   ┆ 7.0912e10   ┆ 22.0    │
│ 8.3216e10   ┆ 8.3216e10   ┆ 2630.0  │
│ 9.3109e10   ┆ 9.3109e10   ┆ -1275.0 │
│ 1.0732e11   ┆ 1.0732e11   ┆ 2368.0  │
│ 1.2984e11   ┆ 1.2984e11   ┆ 1616.0  │
└─────────────┴─────────────┴─────────┘

This shows that there are differences between the two columns. Is that expected?

@stinodego
Member

stinodego commented Jun 22, 2024

Float32 has a lower precision than Float64, so you will see a difference when representing large integer values. I would say these results are expected. Please use Float64 for better accuracy if that is relevant to your applications.
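
To illustrate the point about precision, here is a minimal sketch that uses only the Python standard library rather than Polars. It assumes the values in the file are the integers displayed in the table above; the helpers `to_float32` and `float32_spacing` are hypothetical names introduced for this example. Float32 has a 24-bit significand, so integers above 2**24 (16,777,216) can no longer all be represented exactly.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


def float32_spacing(value: float) -> float:
    """Gap between adjacent Float32 values near a normal-range `value`."""
    _, exponent = math.frexp(value)  # value = m * 2**exponent, 0.5 <= |m| < 1
    return 2.0 ** (exponent - 24)    # 24 significand bits in Float32


# Integers up to 2**24 survive the round trip unchanged.
assert to_float32(2.0**24) == 2.0**24

# Values taken from the table above (assumed to be exact integers in the file).
for value in (16_856_309.0, 16_874_691.0, 16_913_661.0, 83_216_000_000.0):
    rounded = to_float32(value)
    print(
        f"{value:.1f} -> {rounded:.1f} "
        f"(diff {value - rounded:+.1f}, spacing {float32_spacing(value):.0f})"
    )
```

For the first three values the spacing between neighbouring Float32 values is 2, which gives the differences of 1.0 and -1.0 seen in the output. Around 8e10 the spacing grows to 8192, so differences in the thousands are within what rounding alone can produce.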

@stinodego stinodego closed this as not planned Jun 22, 2024
@stinodego stinodego removed the needs triage label Jun 22, 2024