BUG: preserve dtype for right/outer merge of datetime with different resolutions #53233

jorisvandenbossche · 2023-05-15T10:55:55Z

Follow-up on #53213, fixing one more corner case for the right/outer joins making the key column object dtype.

Expanded the test added in that PR to test the different join types.
The whatsnew added in #53213 should also already cover this.

…resolutions

jorisvandenbossche · 2023-05-15T10:59:47Z

pandas/core/reshape/merge.py

+                    if (
+                        lvals.dtype.kind == "M"
+                        and rvals.dtype.kind == "M"
+                        and result_dtype.kind == "O"
+                    ):
+                        # TODO(non-nano) Workaround for common_type not dealing
+                        # with different resolutions
+                        result_dtype = key_col.dtype


This is an ugly band-aid solution: the proper fix is to handle this in find_common_type. But I am not sure if changing find_common_type would be in scope for a bug fix release? (xref #46587 (comment))

cc @jbrockmendel

Changing find_common_type would affect concat, so im hesitant to call it a bugfix. But avoiding object may be nicer behavior, so id be open to it as an api change longer-term.

Do we know that both are tznaive at this point? If one is tzaware then object is correct (though i guess we should be able to determine that the merge will be empty?)

I've given a little bit of thought to something similar in the context of Index.get_indexer with mismatched resos. Instead of converting both to object, could find a fill_value not present in left and convert right to left.unit, filling non-convertible entries with that fill_value. The "finding a fill_value" part seems really hacky though. (update: when i wrote this paragraph I thought this chunk of code was in get_merge_keys)

Do we know that both are tznaive at this point? If one is tzaware then object is correct (though i guess we should be able to determine that the merge will be empty?)

I think so yes, normally we should raise an error earlier in the code if you try to merge on a tz-aware and tz-naive column:

In [1]: df1 = pd.DataFrame({"key": [pd.Timestamp("1970-01-01")], "a": [1]}) In [2]: df2 = pd.DataFrame({"key": [pd.Timestamp("1970-01-01", tz="Europe/Brussels")], "a": [1]}) In [3]: pd.merge(df1, df2) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) Input In [3], in <cell line: 1>() ----> 1 pd.merge(df1, df2) File ~/scipy/pandas/pandas/core/reshape/merge.py:146, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate) 129 @Substitution("\nleft : DataFrame or named Series") 130 @Appender(_merge_doc, indents=0) 131 def merge( (...) 144 validate: str | None = None, 145 ) -> DataFrame: --> 146 op = _MergeOperation( 147 left, 148 right, 149 how=how, 150 on=on, 151 left_on=left_on, 152 right_on=right_on, 153 left_index=left_index, 154 right_index=right_index, 155 sort=sort, 156 suffixes=suffixes, 157 indicator=indicator, 158 validate=validate, 159 ) 160 return op.get_result(copy=copy) File ~/scipy/pandas/pandas/core/reshape/merge.py:735, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, indicator, validate) 727 ( 728 self.left_join_keys, 729 self.right_join_keys, 730 self.join_names, 731 ) = self._get_merge_keys() 733 # validate the merge keys dtypes. We may need to coerce 734 # to avoid incompatible dtypes --> 735 self._maybe_coerce_merge_keys() 737 # If argument passed to validate, 738 # check if columns specified as unique 739 # are in fact unique. 740 if validate is not None: File ~/scipy/pandas/pandas/core/reshape/merge.py:1397, in _MergeOperation._maybe_coerce_merge_keys(self) 1393 raise ValueError(msg) 1394 elif not isinstance(lk.dtype, DatetimeTZDtype) and isinstance( 1395 rk.dtype, DatetimeTZDtype 1396 ): -> 1397 raise ValueError(msg) 1398 elif ( 1399 isinstance(lk.dtype, DatetimeTZDtype) 1400 and isinstance(rk.dtype, DatetimeTZDtype) 1401 ) or (lk.dtype.kind == "M" and rk.dtype.kind == "M"): 1402 # allows datetime with different resolutions 1403 continue ValueError: You are trying to merge on datetime64[ns] and datetime64[ns, Europe/Brussels] columns for key 'key'. If you wish to proceed you should use pd.concat

mroeschke · 2023-05-17T16:14:34Z

Thanks @jorisvandenbossche

…ge of datetime with different resolutions

…er merge of datetime with different resolutions) (#53275) Backport PR #53233: BUG: preserve dtype for right/outer merge of datetime with different resolutions Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

…resolutions (pandas-dev#53233)

* ENH: better dtype inference when doing DataFrame reductions * precommit issues * fix failures * fix failures * mypy + some docs * doc linting linting * refactor to use _reduce_with_wrap * docstring linting * pyarrow failure + linting * pyarrow failure + linting * linting * doc stuff * linting fixes * fix fix doc string * remove _wrap_na_result * doc string example * pyarrow + categorical * silence bugs * silence errors * silence errors II * fix errors III * various fixups * various fixups * delay fixing windows and 32bit failures * BUG: Adding a columns to a Frame with RangeIndex columns using a non-scalar key (#52877) * DOC: Update whatsnew (#52882) * CI: Change development python version to 3.10 (#51133) * CI: Change development python version to 3.10 * Update checks * Remove strict * Remove strict * Fixes * Add dt * Switch python to 3.9 * Remove * Fix * Try attribute * Adjust * Fix mypy * Try fixing doc build * Fix mypy * Fix stubtest * Remove workflow file * Rename back * Update * Rename * Rename * Change python version * Remove * Fix doc errors * Remove pypy * Update ci/deps/actions-pypy-39.yaml Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com> * Revert pypy removal * Remove again * Fix * Change to 3.9 * Address --------- Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com> * update * update * add docs * fix windows tests * fix windows tests * remove guards for 32bit linux * add bool tests + fix 32-bit failures * fix pre-commit failures * fix mypy failures * rename _reduce_with -> _reduce_and_wrap * assert missing attributes * reduction dtypes on windows and 32bit systems * add tests for min_count=0 * PERF:median with axis=1 * median with axis=1 fix * streamline Block.reduce * fix comments * FIX preserve dtype with datetime columns of different resolution when merging (#53213) * BUG Merge not behaving correctly when having `MultiIndex` with a single level (#53215) * fix merge when MultiIndex with single level * resolved conversations * fixed code style * BUG: preserve dtype for right/outer merge of datetime with different resolutions (#53233) * remove special BooleanArray.sum method * remove BooleanArray.prod * fixes * Update doc/source/whatsnew/v2.1.0.rst Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> * Update pandas/core/array_algos/masked_reductions.py Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> * small cleanup * small cleanup * only reduce 1d * fix after #53418 * update according to comments * revome note * update _minmax * REF: add keepdims parameter to ExtensionArray._reduce + remove ExtensionArray._reduce_and_wrap * REF: add keepdims parameter to ExtensionArray._reduce + remove ExtensionArray._reduce_and_wrap * fix whatsnew * fix _reduce call * simplify test * add tests for any/all --------- Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Co-authored-by: Thomas Li <47963215+lithomas1@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Yao Xiao <108576690+Charlie-XIAO@users.noreply.github.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

BUG: preserve dtype for right/outer merge of datetime with different …

db9baca

…resolutions

jorisvandenbossche added Bug Non-Nano datetime64/timedelta64 with non-nanosecond resolution labels May 15, 2023

jorisvandenbossche added this to the 2.0.2 milestone May 15, 2023

jorisvandenbossche commented May 15, 2023

View reviewed changes

jorisvandenbossche mentioned this pull request May 15, 2023

FIX preserve dtype with datetime columns of different resolution when merging #53213

Merged

5 tasks

jbrockmendel approved these changes May 17, 2023

View reviewed changes

mroeschke approved these changes May 17, 2023

View reviewed changes

mroeschke merged commit 93f9ae0 into pandas-dev:main May 17, 2023

jorisvandenbossche deleted the bug-merge-datetime-unit-outer branch May 17, 2023 16:15

meeseeksmachine mentioned this pull request May 17, 2023

Backport PR #53233 on branch 2.0.x (BUG: preserve dtype for right/outer merge of datetime with different resolutions) #53275

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request May 17, 2023

Backport PR pandas-dev#53233: BUG: preserve dtype for right/outer mer…

244dcfc

…ge of datetime with different resolutions

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 22, 2023

BUG: preserve dtype for right/outer merge of datetime with different …

26ed455

…resolutions (pandas-dev#53233)

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 27, 2023

BUG: preserve dtype for right/outer merge of datetime with different …

a7fd1b1

…resolutions (pandas-dev#53233)

Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023

BUG: preserve dtype for right/outer merge of datetime with different …

c6b9415

…resolutions (pandas-dev#53233)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: preserve dtype for right/outer merge of datetime with different resolutions #53233

BUG: preserve dtype for right/outer merge of datetime with different resolutions #53233

Uh oh!

jorisvandenbossche commented May 15, 2023

Uh oh!

jorisvandenbossche May 15, 2023

Uh oh!

jbrockmendel May 15, 2023

Uh oh!

jorisvandenbossche May 16, 2023

Uh oh!

mroeschke commented May 17, 2023

Uh oh!

Uh oh!

Uh oh!

BUG: preserve dtype for right/outer merge of datetime with different resolutions #53233

BUG: preserve dtype for right/outer merge of datetime with different resolutions #53233

Uh oh!

Conversation

jorisvandenbossche commented May 15, 2023

Uh oh!

jorisvandenbossche May 15, 2023

Choose a reason for hiding this comment

Uh oh!

jbrockmendel May 15, 2023

Choose a reason for hiding this comment

Uh oh!

jorisvandenbossche May 16, 2023

Choose a reason for hiding this comment

Uh oh!

mroeschke commented May 17, 2023

Uh oh!

Uh oh!