Align should consider the dtype when creating new columns #31874

Open
TomAugspurger opened this issue Feb 11, 2020 · 3 comments
Labels
API Design, Dtype Conversions, Enhancement, Needs Discussion

Comments

@TomAugspurger
Contributor

Currently, when align has to create a column that exists in only one of the two objects, it creates a float64 column filled with NaN, regardless of that column's dtype in the other object.

In [1]: import pandas as pd

In [2]: a = pd.DataFrame({"A": [1, 2], "B": [pd.Timestamp('2000'), pd.NaT]})

In [3]: b = pd.DataFrame({"A": [1, 2]})

In [4]: a.align(b)[1].dtypes
Out[4]:
A      int64
B    float64
dtype: object

I think it'd be more useful for a new column to take the dtype of the matching column in the other object.

# proposed behavior
In [4]: a.align(b)[1].dtypes
Out[4]:
A             int64
B    datetime64[ns]
dtype: object

The newly created B column has dtype datetime64[ns], the same as a.B.
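Until something like this exists in align itself, the proposed dtypes can be approximated after the fact. The following is only a sketch: `align_with_other_dtypes` is a hypothetical helper, not a pandas API, and it assumes the default outer join. It rebuilds each column that align created from an empty slice of the matching column in the other frame, so the dtype is kept whenever that dtype can hold missing values (an int64 column would still be upcast to float64).

```python
import pandas as pd


def align_with_other_dtypes(left, right):
    """Hypothetical helper (not part of pandas): align two DataFrames, then
    rebuild the columns that align created so they carry the other frame's dtype."""
    new_left, new_right = left.align(right)
    pairs = ((new_left, left, right), (new_right, right, left))
    for aligned, original, other in pairs:
        # Columns present after alignment but absent from the original frame
        # were created by align and are float64 NaN at this point.
        for col in aligned.columns.difference(original.columns):
            # Reindexing an empty slice gives an all-missing column that keeps
            # the source dtype when it supports missing values (NaT for datetimes).
            aligned[col] = other[col].iloc[:0].reindex(aligned.index)
    return new_left, new_right


a = pd.DataFrame({"A": [1, 2], "B": [pd.Timestamp("2000"), pd.NaT]})
b = pd.DataFrame({"A": [1, 2]})

_, b_aligned = align_with_other_dtypes(a, b)
print(b_aligned.dtypes)
```

With this workaround the new B column in `b_aligned` has the same dtype as `a.B` and contains only missing values.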

This proposal would make the fill_value keyword a bit more complex.

  1. The default of np.nan would change to None, which means "the right NA value for the dtype".
  2. We might need to accept a Mapping so users can specify fill values per column.
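The two points above could combine roughly as follows. This is a sketch of one possible resolution rule, not an agreed design: `na_for_dtype` and `resolve_fill_value` are hypothetical names, and treating a column missing from the Mapping the same as None is an assumption.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd


def na_for_dtype(dtype):
    """Hypothetical helper: pick a missing-value marker suited to ``dtype``."""
    if isinstance(dtype, pd.api.extensions.ExtensionDtype):
        # Extension dtypes declare their own NA value (e.g. pd.NA for Int64).
        return dtype.na_value
    if dtype.kind in "mM":
        # NumPy datetime64 / timedelta64 use NaT.
        return pd.NaT
    return np.nan


def resolve_fill_value(fill_value, column, dtype):
    """Hypothetical helper: resolve the fill value for one new column.

    A Mapping is looked up per column, None means "the right NA value
    for the dtype", and any other value is used as given.
    """
    if isinstance(fill_value, Mapping):
        # Assumption: columns absent from the mapping fall back to None.
        fill_value = fill_value.get(column)
    if fill_value is None:
        return na_for_dtype(dtype)
    return fill_value


print(resolve_fill_value(None, "B", np.dtype("datetime64[ns]")))
print(resolve_fill_value({"B": 0}, "B", np.dtype("int64")))
```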

I think this would make the workaround in #31679 unnecessary, as we'd have the correct dtype going into the operation.


If we think this is a good idea, it's probably an API breaking change. We might be able to deprecate this cleanly by (ab)using fill_value. We would warn when creating new columns.

if new_columns and fill_value is no_default:
    warnings.warn(
        "Creating new float64 columns filled with NaN. In the future... "
        "Specify fill_value=None to accept the future behavior now.",
        FutureWarning,
    )
    fill_value = np.nan  # keep the current behavior for now

Unfortunately, that'll also happen in the background during binops, since they align their operands implicitly. Not sure how to get around that, aside from instructing users to explicitly align first.
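For reference, "explicitly align first" would look something like the sketch below. The frames are made up for illustration; the point is only that the new column is created in a visible align call (where a fill_value could be passed) rather than inside the binop.

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 2], "B": [10.0, 20.0]})
b = pd.DataFrame({"A": [3, 4]})

# Align explicitly, so the column that b lacks is created here
# instead of implicitly inside the addition below.
left, right = a.align(b)

# Both operands now share the same index and columns,
# so the binop has nothing left to align.
result = left + right
print(result)
```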

@TomAugspurger added the Reshaping, Dtype Conversions, API Design, and Numeric Operations labels on Feb 11, 2020
@usman98789

take

@TomAugspurger
Contributor Author

@usman98789 this may be premature to work on. We still need to have a discussion on whether this is a desirable change.

@TomAugspurger
Contributor Author

If you're interested in working on it, then feel free to drive the discussion forward.

@usman98789 removed their assignment on Jun 23, 2020
@mroeschke added the Needs Discussion and Enhancement labels and removed the Numeric Operations and Reshaping labels on Jul 28, 2021