BUG: assert_frame_equal behaves differently when comparing a mostly-numeric column from CSV if that column has a non-numeric entry #47002

@blakeNaccarato

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas.testing import assert_frame_equal

orig_df1 = pd.DataFrame(
    {
        "Column": [
            "Units",
            1.0000000000005,
        ],
    }
)

orig_df2 = pd.DataFrame(
    {
        "Column": [
            "Units",
            1.0000000000007,
        ],
    }
)

orig_df1.to_csv("df1.csv", index=False)
orig_df2.to_csv("df2.csv", index=False)


def test_dfs_directly():
    """This test passes."""
    assert_frame_equal(orig_df1, orig_df2)


def test_csv_roundtrip():
    """This test fails."""
    df1 = pd.read_csv("df1.csv")
    df2 = pd.read_csv("df2.csv")
    assert_frame_equal(df1, df2)


def test_csv_roundtrip_omit_nonnumeric_row():
    """This test passes."""
    df1 = pd.read_csv("df1.csv", skiprows=[1])
    df2 = pd.read_csv("df2.csv", skiprows=[1])
    assert_frame_equal(df1, df2)

Issue Description

The tests pass on the original dataframes, and also when the dataframes loaded from CSV have their non-numeric row omitted. The test fails if the non-numeric row is included. This difference in behavior could be classified as a bug; or, if it is working as intended, then maybe it could be documented that assert_frame_equal behaves differently when mixed-type columns are loaded from CSV.

The minimal example supposes that the user has a "Units" row, where they store, say, "m/s" or "kg". Of course, this may be a "data smell": care should be taken to load uniform data into dataframes and to keep track of metadata like units separately.
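One possible explanation, not confirmed in this report, is that read_csv parses every cell of a column containing a non-numeric entry as a string, so the comparison is between two different strings rather than two nearly equal floats. The sketch below inspects the loaded values and shows one workaround: drop the units row and convert the remainder with pd.to_numeric before comparing. The in-memory CSV text and the helper name `numeric_part` are hypothetical stand-ins introduced here for illustration.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical in-memory stand-ins for df1.csv and df2.csv.
CSV1 = "Column\nUnits\n1.0000000000005\n"
CSV2 = "Column\nUnits\n1.0000000000007\n"

df1 = pd.read_csv(io.StringIO(CSV1))
df2 = pd.read_csv(io.StringIO(CSV2))

# Because of the "Units" entry, the numeric-looking cell is loaded as text.
print(type(df1["Column"].iloc[1]))


def numeric_part(df):
    """Drop the first (units) row and parse the remaining cells as numbers."""
    return df.iloc[1:].apply(pd.to_numeric).reset_index(drop=True)


num1 = numeric_part(df1)
num2 = numeric_part(df2)

# The values are now floats, so the default tolerance of
# assert_frame_equal applies and the comparison passes.
assert_frame_equal(num1, num2)
```

If the units row must stay in the frame, the comparison would need to treat the text and numeric cells separately; this sketch simply sets the units row aside.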

Expected Behavior

All three tests should pass.

Installed Versions

For some reason, `pd.show_versions()` is raising the following `AssertionError`:

AssertionError: C:\Users\Blake\AppData\Local\Programs\Python\Python310\lib\distutils\core.py

Hopefully the output of `pip show pandas` will suffice:

Name: pandas
Version: 1.4.2
...
