BUG: Inconsistent datetime conversion behavior when constructing a DataFrame with Python datetimes. #55014

Open
3 tasks done
Tracked by #5 ...
raeganbarker opened this issue Sep 5, 2023 · 6 comments · May be fixed by #55901
Labels
Bug; Non-Nano (datetime64/timedelta64 with non-nanosecond resolution)

@raeganbarker

raeganbarker commented Sep 5, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from datetime import datetime

from pandas import DataFrame

date = datetime.now()

data = {
    'date1': date,
    'date2': [date, date],
}

df = DataFrame(data)

print(df.dtypes)

Issue Description

The two date columns do not get the same dtype (this is the behavior in 2.1.0).

date1    datetime64[us]
date2    datetime64[ns]

Expected Behavior

The two date columns get the same dtype (this was the behavior in 2.0.3).

date1    datetime64[ns]
date2    datetime64[ns]

Installed Versions

INSTALLED VERSIONS
------------------
commit : ba1cccd
python : 3.11.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.49-linuxkit-pr
Version : #1 SMP PREEMPT Thu May 25 07:27:39 UTC 2023
machine : aarch64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0
numpy : 1.25.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.1.2
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.20
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@raeganbarker raeganbarker added the Bug and Needs Triage labels Sep 5, 2023
IzerOnadimQC added a commit to IzerOnadimQC/plateau that referenced this issue Sep 8, 2023
Pandas 2.1.0 DataFrame constructor bug causing timestamps to have
inconsistent units (pandas-dev/pandas#55014).
@carlosProgrammer

take

@jorisvandenbossche jorisvandenbossche added the Non-Nano label and removed the Needs Triage label Sep 15, 2023
@jorisvandenbossche jorisvandenbossche added this to the 2.1.1 milestone Sep 15, 2023
@lithomas1 lithomas1 modified the milestones: 2.1.1, 2.1.2 Sep 21, 2023
@IzerOnadimQC

I think a conversation is needed regarding the expected behaviour in Pandas 2 when instantiating a DataFrame with columns of type dt.datetime. In Pandas<2, it is clear that dt.datetime types are expected to be inferred as datetime64[ns] - but it is unclear whether this expectation still holds for Pandas>=2.

As far as I can tell, the discrepancy was caused by this PR: #52212, which changes the way dt.datetime types are inferred. Previously, they were used to instantiate a Timestamp which was then cast to ns units, but now, there is no cast, meaning the Timestamp resolution will remain as us, which appears to be the default resolution when a Timestamp is created using a dt.datetime. I don't think any corresponding change has been made when the DataFrame column is constructed using a list, hence the discrepancy.

@jbrockmendel since you authored that change I was wondering if you had any thoughts on what type should be inferred from dt.datetime in Pandas>=2? The bug (#51196) that the PR addressed was regarding Timestamp types with non-nano units being inferred as datetime64[ns], hence, is it possible that the change to how dt.datetime types are handled was a mistake? If so, I'd be happy to make a fix where dt.datetime types are inferred with nanosecond resolution by default, separate from how Timestamps or other types that have their units explicitly set to non-nano resolutions are handled. @MarcoGorelli it would also be interesting to get your opinion on this matter since you reported the issue to begin with.
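For readers following along, the resolution difference described in this comment can be inspected on a single scalar. This is only a minimal sketch, assuming pandas 2.x (where `Timestamp.unit` and `Timestamp.as_unit` are available); the variable names `moment`, `ts`, and `ts_ns` are illustrative and not taken from the thread.

```python
from datetime import datetime

import pandas as pd

# A fixed stdlib datetime, so the example is reproducible.
moment = datetime(2023, 9, 5, 12, 30, 45, 123456)

# Building a Timestamp from a stdlib datetime keeps the stdlib's
# microsecond resolution in pandas 2.x (no implicit cast to nanoseconds).
ts = pd.Timestamp(moment)
print(ts.unit)

# An explicit cast restores nanosecond resolution without changing the value.
ts_ns = ts.as_unit("ns")
print(ts_ns.unit)
```

The two objects compare equal; only the stored resolution differs.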

@MarcoGorelli
Member

hey - I think the current "rule" is that:

  • for scalars, the resolution is preserved (so for stdlib datetime, it becomes 'us', because that's the resolution of the python stdlib)
  • for a list, the resolution is 'ns' by default
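One way to check how this rule plays out on a given installation is to read the unit off each column's dtype. The sketch below assumes pandas 2.x and uses hypothetical column names (`scalar`, `list`); which units it prints depends on the installed pandas version, so no output is shown.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# A fixed stdlib datetime, so the example is reproducible.
moment = datetime(2023, 9, 5, 12, 30, 45, 123456)

# One column built from a scalar (broadcast), one built from a list.
df = pd.DataFrame({"scalar": moment, "list": [moment, moment]})

# np.datetime_data returns (unit, count) for a datetime64 dtype.
units = {name: np.datetime_data(dtype)[0] for name, dtype in df.dtypes.items()}
print(units)
```

Whatever the units are, the stored values themselves should still compare equal, because a stdlib datetime never carries sub-microsecond information.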

@IzerOnadimQC

IzerOnadimQC commented Oct 16, 2023

  • for scalars, the resolution is preserved (so for stdlib datetime, it becomes 'us', because that's the resolution of the python stdlib)
  • for a list, the resolution is 'ns' by default

Does this mean that the above discrepancy is actually not considered a bug? I.e. it is expected behaviour for the two columns in the DataFrame to have different dtypes?

@MarcoGorelli
Member

I think @jbrockmendel was suggesting to auto-infer the resolution in the list case as well, but that the implementation is quite tricky

@jbrockmendel
Member

Marco has it right. The ideal solution is to improve array_to_datetime/maybe_convert_objects to do resolution inference. This would cause the example in the OP to have datetime64[us] dtype for both columns.

Actually implementing this is difficult (particularly avoiding major performance regressions), but increasingly high on my todo list. I'm optimistic it will be ready for 2.2.
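Until resolution inference is consistent, code that needs matching dtypes can cast explicitly after construction. The following is a possible workaround sketch, not something proposed in this thread; it assumes pandas 2.x, and the choice of nanoseconds as the target unit is an assumption made only to match the 2.0.3 behavior from the original report.

```python
from datetime import datetime

import pandas as pd

# A fixed stdlib datetime, so the example is reproducible.
moment = datetime(2023, 9, 5, 12, 30, 45, 123456)
df = pd.DataFrame({"date1": moment, "date2": [moment, moment]})

# Assumed target unit; "datetime64[us]" would work the same way.
target = "datetime64[ns]"

# Cast both datetime columns to the same resolution explicitly,
# so the result no longer depends on how each column was built.
normalized = df.astype({"date1": target, "date2": target})
print(normalized.dtypes)
```

Casting from microseconds to nanoseconds does not lose information, so the values are unchanged.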

kandersolar added a commit to kandersolar/pvlib-python that referenced this issue Oct 17, 2023
kandersolar added a commit to pvlib/pvlib-python that referenced this issue Oct 17, 2023
@lithomas1 lithomas1 modified the milestones: 2.1.2, 2.1.3 Oct 26, 2023
@jorisvandenbossche jorisvandenbossche modified the milestones: 2.1.3, 2.1.4 Nov 13, 2023
kandersolar added a commit to pvlib/pvlib-python that referenced this issue Nov 29, 2023
* Remove various repeated words in documentation (#1872)

* Remove repeated words

* Update pvlib/ivtools/sdm.py

Co-authored-by: Kevin Anderson <kevin.anderso@gmail.com>

---------

Co-authored-by: Kevin Anderson <kevin.anderso@gmail.com>

* fix invalid escape sequence '\c' (#1879)

* fix invalid escape sequence '\c'

pvlib/iam.py:843: DeprecationWarning: invalid escape sequence '\c'

Occurrence is actually in line 854: `IAM = 1 - (1 - \cos(aoi))^5`

* Add to list of contributors

* Replace use of deprecated `pkg_resources` (#1881) (#1882)

* Update infinite_sheds.py to add shaded fraction to returned variables in infinite_sheds.get_irradiance and infinite_sheds.get_irradiance_poa (#1871)

* Update infinite_sheds.py

Added shaded fraction to returned variables.

* Update v0.10.3.rst

* Update test_infinite_sheds.py

added tests for shaded fraction

* Update test_infinite_sheds.py

Corrected the shaded fraction tests in the haydavies portion.

* Update pvlib/bifacial/infinite_sheds.py

Co-authored-by: Kevin Anderson <kevin.anderso@gmail.com>

* Update infinite_sheds.py

* Update infinite_sheds.py

* Update infinite_sheds.py

fixed indentation issues

---------

Co-authored-by: Kevin Anderson <kevin.anderso@gmail.com>

* Continuous version of the Perez transposition model implementation (#1876)

* Definitely not ready for review!

* Big step forward.

* Add entry in docs.

* A working model but just one test so far.

* Add new model as option in get_sky_diffuse.  Docstring edits pending.

* Completed doc strings.  Also a bit of fine-tuning code.

* Updated whatsnew.

* Bugfix, formatting fix, and add all tests.

* Test warning plus some other small changes.

* Make flake happy.

* Update pvlib/irradiance.py

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>

* Address comments.

* Add contributor code comments.

* Update pvlib/irradiance.py

Co-authored-by: Adam R. Jensen <39184289+AdamRJensen@users.noreply.github.com>

* Adapt to reviewer preferences.

* Adapt to flake preferences.

* Remove model pseudo-option.

* Flake

---------

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
Co-authored-by: Adam R. Jensen <39184289+AdamRJensen@users.noreply.github.com>

* Fix spurious test error with pandas 2.1 (#1891)

pandas-dev/pandas#55014

* Fix plotting in plot_singlediode.py gallery page (#1895)

* Update plot_singlediode.py

fixed plot annotations by moving plt.show() further down

* Update whatsnew.rst

* Update v0.10.3.rst

* Update docs/sphinx/source/whatsnew.rst

Undoing changes to whatsnew.rst

* Address pandas FutureWarnings in test suite (#1900)

* Changed expected reference in test_detect_clearskY_window to 1 from True to avoid FutureWarning

* Change reference to etr in ibird function to avoid FutureWarning

* In test_modelchain, update all instances when referring to series by position to using iloc to get rid of FutureWarning

* Update to iloc method for referencing by position in test_irradiance to get rid of FutureWarning

* In test_singlediode change applymap to map to get rid of FutureWarning

* Test_srml update to select using iloc to get rid of FutureWarning

* Substitute changing to float64 dtype using map with base functionality that's accessible across Pandas versions

* Added username to Contributors

* Update line break in test_clearsky to adhere to line length limit

* add comparisons to other tools

* Apply suggestions from code review

Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>

* revision re: other open-source projects

* bibtex tweaks

* clarify pvlib matlab comparison

---------

Co-authored-by: Miroslav Šedivý <6774676+eumiro@users.noreply.github.com>
Co-authored-by: Arjan Keeman <akeeman@users.noreply.github.com>
Co-authored-by: Miguel Sánchez de León Peque <peque@neosit.es>
Co-authored-by: Will Hobbs <45701090+williamhobbs@users.noreply.github.com>
Co-authored-by: Anton Driesse <anton.driesse@pvperformancelabs.com>
Co-authored-by: Cliff Hansen <cwhanse@sandia.gov>
Co-authored-by: Adam R. Jensen <39184289+AdamRJensen@users.noreply.github.com>
Co-authored-by: matsuobasho <rkoulikov@pm.me>
@lithomas1 lithomas1 modified the milestones: 2.1.4, 2.2 Dec 8, 2023

7 participants