-
Notifications
You must be signed in to change notification settings - Fork 0
Add Pandas 1.5.x Compatibility for Databricks Environments #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
- Add handling of one element array for nanstd - fix dt0 detection for timeseries with rapid reporting - Change default timezone localization to UTC - Document known DST interpolation issue BREAKING CHANGE: Default timezone behavior changed from local to UTC for naive timestamp localization. Known Issues: - np.interp() doesn't handle DST transitions correctly (not fixed)
…f supports in versions !=2.2
- Fix AttributeError in active_percent.py: Convert DatetimeIndex to Series before calling diff() - Enhance timezone handling in utils.py: Add comprehensive mixed naive/aware timestamp validation - Fix timezone comparison errors: Localize naive start_date/end_date to data timezone - Prevent object dtype conversion: Initialize Series with timezone-aware dtype - Add comprehensive DST testing: Create test_diff_dst.py with Spring Forward/Fall Back scenarios - Demonstrate correct timezone creation: Show difference between tz.localize() and direct tzinfo assignment - Add 15-minute interval DST tests: Verify diff() behavior across EST->EDT and EDT->EST transitions - Update process_data.py: Restore localize_naive_timestamp functionality - Update roc.py: Add timezone handling improvements - Update pyproject.toml: Add missing dependencies - Update .gitignore: Add coverage and build artifacts All tests pass, demonstrating proper timezone-aware timestamp handling during DST transitions.
…alues - Fixed pd.Series(data["gl"], index=data["time"]) conversion in CGMS2DayByDay function - Added proper DataFrame index handling to prevent NaN values in Series creation - Updated test framework: removed .round(3) calls and standardized rtol=0.001 for assert_frame_equal - Improved floating-point comparison precision across all test files - Updated 32 test files to use consistent rtol=0.001 tolerance for better numerical accuracy
- Add # noqa: C901 to check_data_columns function to suppress complexity warning - Add # noqa: C901 to CGMS2DayByDay function to suppress complexity warning - Add # noqa: C901 to gd2d_to_df function to suppress complexity warning - Remove redundant pylint disable comments where ruff noqa is sufficient - Fix minor whitespace formatting (trailing space removal) - Clean up function definitions for better code consistency These functions handle complex data processing logic that naturally results in high cyclomatic complexity, making the C901 warnings acceptable for maintainability.
- Fix DatetimeIndex.diff() compatibility: Use pd.Series(data.index).diff() instead of data.index.diff() - Fix DatetimeIndex.floor()/ceil() compatibility: Add fallback methods for pandas 1.5.x - Fix pd.api.types.is_string_dtype() compatibility: Add try/except fallback for pandas 1.5.x - Fix Series.total_seconds() compatibility: Use .dt.total_seconds() for pandas 1.5.x - Add ruff C901 complexity ignores for complex data processing functions - Reorganize pyproject.toml: Move pytest from main dependencies to optional dev/test groups - Add pyarrow to dev dependencies for enhanced data processing capabilities - Update .gitignore to exclude other virtual environment directories - Update uv.lock with new dependency structure These changes ensure iglu_python works seamlessly in Databricks environments that typically use pandas 1.5.x, while maintaining compatibility with pandas 2.2.x. Key compatibility fixes: - DatetimeIndex.diff() -> pd.Series(data.index).diff() - DatetimeIndex.floor()/ceil() -> hasattr() checks with fallbacks - pd.api.types.is_string_dtype() -> try/except with isinstance() fallback - Series.total_seconds() -> Series.dt.total_seconds() Tested with both pandas 1.5.3 and pandas 2.2.3 environments.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15 +/- ##
==========================================
- Coverage 91.16% 90.41% -0.75%
==========================================
Files 46 46
Lines 1754 1795 +41
==========================================
+ Hits 1599 1623 +24
- Misses 155 172 +17 ☔ View full report in Codecov by Sentry. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📋 Summary
This PR adds comprehensive pandas 1.5.x compatibility to ensure seamless operation in Databricks environments while maintaining backward and forward compatibility with pandas 2.2.x. Includes a critical fix for DST (Daylight Saving Time) transition bugs.
🎯 Problem Statement
✨ Major Changes
🔧 Core Compatibility Fixes
data.index.diff()withpd.Series(data.index).diff()for pandas 1.5.x compatibilityhasattr()checksisinstance()check.dt.total_seconds()for pandas 1.5.x🕐 Critical DST Transition Bug Fix
�� Test Framework Modernization
.round(3)calls: Replaced withrtol=0.001inassert_frame_equalfor better precision📦 Dependency Management Overhaul
dev: All development tools (pytest, black, mypy, ruff, etc.)test: Just testing tools (pytest, pytest-cov)lint: Just linting tools (black, isort, mypy, ruff)build: Build tools (hatch, twine)🛠️ Code Quality Improvements
🧪 Testing
📊 Impact
🔍 Files Changed
iglu_python/utils.py: Core compatibility fixes, DST bug fix, and code cleanuppyproject.toml: Dependency reorganization and modernizationtests/*.py: 32 test files updated with modern assertion methods.gitignore: Updated to exclude other venv directoriesuv.lock: New dependency lock file🚀 Deployment Notes
✅ Checklist
*Ready for review and merge/Users/staskh/Sandbox/iglu_python && git push origin 14-bug-cannot-reshape-array This PR ensures iglu_python works seamlessly in Databricks environments while maintaining compatibility with all pandas versions and fixing critical DST transition bugs. 🎉