### Problem
If you have ever needed to check that two dataframes hold exactly the same values, 
you may know that it should not be done with the regular `==` operator.
The problem is that in NumPy `np.nan == np.nan` gives `False`,
so when you run something like 


In [2]:
import pandas as pd
import numpy as np

(
    pd.Series([1,2,3, np.nan, 5]) == 
    pd.Series([1,2,3, np.nan, 5])
).all()

np.False_

the values are compared elementwise, the `NaN` pair compares as unequal, and the overall result is `False`.
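
To see where the `False` comes from, it can help to look at the elementwise mask itself. This is only a small sketch; the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])

# NaN is never equal to itself
nan_equals_itself = np.nan == np.nan

# Comparing a Series with itself gives one boolean per element
mask = s == s

# Positions where the comparison failed: only the NaN slot
mismatches = s.index[~mask].tolist()

print(nan_equals_itself)
print(mask.tolist())
print(mismatches)
```

Only the position holding `NaN` fails the comparison, which is enough to make `.all()` return `False`.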


Trying to work around this manually with something like `.fillna(..)` is a bad idea. You need a different substitute value depending on the dtype of each column,
and the `pd.Categorical` dtype is especially troublesome: you need to extend the list of categories first.
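
A minimal sketch of the categorical case, assuming a placeholder value `"missing"` that is not among the existing categories (the placeholder and variable names are hypothetical). The exact exception type has differed between pandas versions, so both are caught here:

```python
import pandas as pd

s = pd.Series(pd.Categorical(["x", None, "y"]))

# Filling with a value that is not a known category is rejected
try:
    s.fillna("missing")
    fill_failed = False
except (TypeError, ValueError):
    fill_failed = True

# The category has to be registered first, then fillna works
filled = s.cat.add_categories(["missing"]).fillna("missing")

print(fill_failed)
print(filled.tolist())
```

And that is just one dtype: numeric, datetime and string columns would each need their own placeholder that cannot collide with real data.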


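The right way to do this is the `.equals(...)` method of a Series or DataFrame, see [`DataFrame.equals(...)`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html) (don't confuse it with [`DataFrame.eq(...)`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.eq.html), which compares elementwise just like `==`!), or the [assert_..._equal](https://pandas.pydata.org/docs/reference/api/pandas.testing.assert_frame_equal.html#pandas.testing.assert_frame_equal) functions from `pandas.testing`, which raise an `AssertionError` describing the difference instead of returning a boolean: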

In [3]:
print(
    pd.Series([1,2,3, np.nan,5]).equals(
        pd.Series([1,2,3, np.nan, 5])
    )
)

pd.testing.assert_series_equal(
    pd.Series([1,2,3, np.nan,5]),
    pd.Series([1,2,3, np.nan, 5])
)

True
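
To make the difference between these tools more concrete, here is a short sketch with made-up variable names. It also shows that `.equals` is strict about dtypes, which is easy to trip over:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, np.nan])
b = pd.Series([1, 2, np.nan])

# .eq is the method form of ==: elementwise, NaN position is False
elementwise = a.eq(b).tolist()

# .equals returns a single bool and treats NaNs in the same place as equal
whole = a.equals(b)

# Same values but int64 vs float64: .equals says False
dtype_strict = pd.Series([1, 2]).equals(pd.Series([1.0, 2.0]))

# assert_series_equal raises and explains what differs
try:
    pd.testing.assert_series_equal(a, pd.Series([1, 2, 3.0]))
    message = ""
except AssertionError as exc:
    message = str(exc)

print(elementwise, whole, dtype_strict)
print(message)
```

The assertion helpers are handy in tests, where the error message usually matters more than the boolean.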


Keep in mind that you usually need to ignore column order (the columns could have been reordered for some reason):

In [7]:
df1 = pd.DataFrame({
    'a': [1,2,np.nan],
    'b': ['x','y','z']
})
df2 = pd.DataFrame({
    'b': ['x','y','z'],
    'a': [1,2,np.nan],
})
print(
    pd.DataFrame.equals(
        df1,
        df2
    )
)
# Output: False

print(
    pd.DataFrame.equals(
        df1.sort_index(axis=1), 
        df2.sort_index(axis=1)
    )
)
# Output: True

False
True
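
Sorting the columns is not the only way to ignore their order. As a sketch of two alternatives: you can align one frame to the other's column order, or use the `check_like=True` option of `assert_frame_equal`, which ignores the order of index and columns. The frames below just repeat the example above:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, np.nan], "b": ["x", "y", "z"]})
df2 = pd.DataFrame({"b": ["x", "y", "z"], "a": [1, 2, np.nan]})

# Option 1: select df2's columns in df1's order before comparing
# (raises KeyError if df2 is missing one of df1's columns)
same_after_align = df1.equals(df2[df1.columns])

# Option 2: let the testing helper ignore index/column order;
# it returns None when the frames match and raises otherwise
pd.testing.assert_frame_equal(df1, df2, check_like=True)

print(same_after_align)
```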


### Utility function

So, here's a final utility function which hopefully will save you some time:

In [19]:
import pandas as pd

def __sort_indices(df, ignore_rows_order=True):
    # Columns sorting
    df = df.sort_index(axis=1)

    # Rows index sorting
    if ignore_rows_order:
        df = df.sort_index()

    return df

def is_df_eq(df1, df2, ignore_rows_order=True) -> bool:
    df1 = __sort_indices(df1, ignore_rows_order=ignore_rows_order)
    df2 = __sort_indices(df2, ignore_rows_order=ignore_rows_order)

    return pd.DataFrame.equals(df1, df2)


def print_equals_by_columns(df1: pd.DataFrame, df2: pd.DataFrame, ignore_rows_order=True) -> None:
    """
    Helps to identify which columns are not equal
    """

    df1 = __sort_indices(df1, ignore_rows_order=ignore_rows_order)
    df2 = __sort_indices(df2, ignore_rows_order=ignore_rows_order)
        
    if columns_diff := df1.columns.symmetric_difference(df2.columns).tolist():
        print("Columns set is different:", columns_diff)

    print("Common columns equality:")
    for c in df1.columns.intersection(df2.columns):
        print(c, ':', pd.Series.equals(
            df1[c],
            df2[c],
        ))
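
What the `ignore_rows_order` flag buys you can be illustrated in isolation. The sketch below does not call the helpers above; it repeats the same `sort_index` steps inline on a hypothetical frame whose rows and columns were both shuffled:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": ["x", "y", "z"]},
    index=[10, 20, 30],
)
# Same data, but rows and columns in a different order
shuffled = df.loc[[30, 10, 20], ["b", "a"]]

# Plain comparison: order matters
strict = df.equals(shuffled)

# Sorting columns only: rows are still in a different order
columns_only = df.sort_index(axis=1).equals(shuffled.sort_index(axis=1))

# Sorting columns and the row index: the frames match
both = (
    df.sort_index(axis=1).sort_index()
    .equals(shuffled.sort_index(axis=1).sort_index())
)

print(strict, columns_only, both)
```

Note that this relies on the row index being meaningful (an ID column, for example); with a default `RangeIndex` on shuffled data, sorting the index will not line the rows up.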

### Tests

A few tests demonstrating the behaviour:

In [23]:
#| label: Tests
#| code-fold: true
#| code-summary: Tests

import numpy as np

df1 = pd.DataFrame({
    'ID': [1,2,3],
    'A': [1,2,3],
    'B': [1,2,np.nan],
    'C': ['x', 'y', 'Z']
}).set_index('ID')

df2 = pd.DataFrame({
    'ID': [1,2,3],
    'A': [1,2,3],
    'B': [1,2,np.nan],
    'C': ['x', 'y', 'NOT_Z'],
    'D': [0]*3
}).set_index('ID')

common_columns = ['A', 'B']
print(
    f"Is {common_columns} equal?:", 
    is_df_eq(df1[common_columns], df2[common_columns])
)

print(
    "Is equal?:", is_df_eq(df1, df2)
)
print_equals_by_columns(df1, df2)



Is ['A', 'B'] equal?: True
Is equal?: False
Columns set is different: ['D']
Common columns equality:
A : True
B : True
C : False
