Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
from pandas.testing import assert_frame_equal

orig_df1 = pd.DataFrame(
    {
        "Column": [
            "Units",
            1.0000000000005,
        ],
    }
)
orig_df2 = pd.DataFrame(
    {
        "Column": [
            "Units",
            1.0000000000007,
        ],
    }
)
orig_df1.to_csv("df1.csv", index=False)
orig_df2.to_csv("df2.csv", index=False)


def test_dfs_directly():
    """This test passes."""
    assert_frame_equal(orig_df1, orig_df2)


def test_csv_roundtrip():
    """This test fails."""
    df1 = pd.read_csv("df1.csv")
    df2 = pd.read_csv("df2.csv")
    assert_frame_equal(df1, df2)


def test_csv_roundtrip_omit_nonnumeric_row():
    """This test passes."""
    df1 = pd.read_csv("df1.csv", skiprows=[1])
    df2 = pd.read_csv("df2.csv", skiprows=[1])
    assert_frame_equal(df1, df2)
Issue Description
The tests pass on the original DataFrames, and on the DataFrames loaded from CSV when the non-numeric row is omitted. The test fails when the non-numeric row is included. This difference could be classified as a bug; or, if it is working as intended, it could perhaps be documented that assert_frame_equal behaves unexpectedly when comparing mixed-type columns loaded from CSV.
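A likely explanation, sketched below as an assumption rather than a confirmed diagnosis: because "Units" cannot be parsed as a number, read_csv leaves the whole column as strings, so the CSV round-trip comparison becomes an exact string comparison. In the original frames the second cell holds a Python float, which assert_frame_equal appears to compare with its default relative tolerance (rtol=1e-5). The sketch uses io.StringIO with CSV text equivalent to df1.csv and df2.csv instead of real files; the reported column dtype (object, or a dedicated string dtype in newer pandas) may vary by version, but the cell is a Python str either way.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# CSV text equivalent to df1.csv / df2.csv from the reproducible example
# (io.StringIO stands in for the files so the sketch is self-contained).
csv1 = "Column\nUnits\n1.0000000000005\n"
csv2 = "Column\nUnits\n1.0000000000007\n"

df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2))

# The "Units" row prevents numeric parsing, so the number stays a string.
print(df1["Column"].dtype)
print(repr(df1["Column"].iloc[1]))

# Strings are compared exactly, so the round-tripped frames differ.
frames_equal = True
try:
    assert_frame_equal(df1, df2)
except AssertionError:
    frames_equal = False
print("frames equal:", frames_equal)
```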
The minimal example assumes the user has a "Units" row storing values such as "m/s" or "kg". Admittedly, this may be a "data smell": care should be taken to load uniform data into DataFrames and to keep track of metadata like units separately.
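One way to keep units separate, as a minimal sketch: read the units row on its own with nrows=1, and read the data with skiprows=[1] as in the third test, so the column parses as float64 and the units end up in a plain dict. The helper name read_with_units and the "m/s" CSV contents are hypothetical, and io.StringIO stands in for the CSV files.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical CSV layout: header line, a units line, then numeric data.
csv1 = "Column\nm/s\n1.0000000000005\n"
csv2 = "Column\nm/s\n1.0000000000007\n"


def read_with_units(text):
    """Return (data, units) for CSV text whose second line holds units.

    Hypothetical helper: the units line is read alone with nrows=1, and the
    data is read with skiprows=[1] so numeric columns parse as float64.
    """
    units = pd.read_csv(io.StringIO(text), nrows=1).iloc[0].to_dict()
    data = pd.read_csv(io.StringIO(text), skiprows=[1])
    return data, units


df1, units1 = read_with_units(csv1)
df2, units2 = read_with_units(csv2)

# Float columns are compared with assert_frame_equal's default tolerance;
# the units are compared separately as ordinary metadata.
assert_frame_equal(df1, df2)
assert units1 == units2
print(df1.dtypes, units1)
```

Unlike coercing the mixed column afterwards (for example with pd.to_numeric(errors="coerce")), this keeps the units instead of turning them into NaN.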
Expected Behavior
All three tests should pass.
Installed Versions
AssertionError: C:\Users\Blake\AppData\Local\Programs\Python\Python310\lib\distutils\core.py
Hopefully the output of pip show pandas should suffice:
Name: pandas
Version: 1.4.2
...