Shield: Fix Pandas warning and improve export sanitization tests #684

Merged
fderuiter merged 4 commits into main from shield-fix-pandas-warning-and-improve-tests-17288914380282949305
Feb 20, 2026

Conversation

@fderuiter
Owner

This PR addresses a Pandas deprecation warning (Pandas4Warning) related to object dtype selection in export.py and significantly improves test coverage for the export functionality.

Changes:

  1. Fix Pandas Warning: In _sanitize_df, select_dtypes now explicitly includes "str" alongside object. This prevents the warning about implicit string inclusion in object dtype, which is deprecated in Pandas 3.0+.
  2. Expanded Testing:
    • Replaced the single-case test_csv_sanitization.py with a comprehensive test_export_sanitization.py.
    • Added a test case for export_to_excel to ensure formula injection protection works for Excel exports as well (mocking to_excel to verify the DataFrame state before write).
    • Added a test case to ensure that non-string columns (integers, floats, bools) are NOT accidentally sanitized or converted to strings.
    • Added strict warning assertions (warnings.catch_warnings(record=True)) to ensure the export process is warning-free.
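
A rough sketch of how the Excel check and the warning assertion described above could fit together. Everything here is an assumption for illustration: the escape rule, the helper names (`escape_cell`, `sanitize_df`, `capture_excel_export`) and the stand-in `export_to_excel` are hypothetical and are not taken from `imednet/integrations/export.py`. The sketch also checks dtypes directly rather than calling `select_dtypes`, so that it behaves the same whether strings are stored as `object` or as a string dtype.

```python
import warnings
from unittest import mock

import pandas as pd

# Hypothetical escape rule; the real one is not shown in this thread.
FORMULA_PREFIXES = ("=", "+", "-", "@")


def escape_cell(value):
    """Prefix formula-like strings with an apostrophe; leave other values alone."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


def sanitize_df(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with text columns escaped and all other columns untouched."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Direct dtype checks (a swapped-in technique, not select_dtypes).
        if pd.api.types.is_object_dtype(dtype) or isinstance(dtype, pd.StringDtype):
            out[col] = out[col].map(escape_cell)
    return out


def export_to_excel(df: pd.DataFrame, path: str) -> None:
    """Stand-in exporter: sanitize, then delegate to DataFrame.to_excel."""
    sanitize_df(df).to_excel(path, index=False)


def capture_excel_export(df: pd.DataFrame):
    """Run the exporter with to_excel replaced; return (frame_written, warnings)."""
    written = []

    def fake_to_excel(self, *args, **kwargs):
        # Record the frame that would have been written; nothing touches disk.
        written.append(self.copy())

    with mock.patch.object(pd.DataFrame, "to_excel", fake_to_excel):
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            export_to_excel(df, "unused.xlsx")
    return written[0], caught


frame = pd.DataFrame(
    {"text": ["=1+1", "safe"], "n": [1, 2], "x": [0.5, 1.5], "flag": [True, False]}
)
written, caught = capture_excel_export(frame)
print(written["text"].tolist(), dict(written.dtypes), len(caught))
```

Replacing `to_excel` with a recording function keeps the test independent of any Excel writer backend, and comparing dtypes before and after is one way to confirm that integers, floats and bools pass through unchanged.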

Impact:

  • Eliminates log noise from deprecation warnings.
  • Increases confidence in the security (sanitization) of Excel exports.
  • Protects against regression where numeric data might be corrupted during sanitization.

PR created automatically by Jules for task 17288914380282949305 started by @fderuiter

- Updated `imednet/integrations/export.py` to explicitly include "str" dtype when selecting columns for sanitization, silencing a Pandas 3.0 deprecation warning.
- Deleted `tests/unit/test_csv_sanitization.py`.
- Created `tests/unit/test_export_sanitization.py` with expanded coverage:
    - Verifies CSV sanitization.
    - Verifies Excel sanitization (previously untested).
    - Verifies non-string columns are preserved.
    - Ensures no warnings are emitted during export.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@fderuiter
Owner Author

The failing job is due to a formatting issue: black reports that tests/unit/test_export_sanitization.py would be reformatted. To fix this, run black on the file to apply auto-formatting:

black tests/unit/test_export_sanitization.py

Afterward, commit and push the changes. This will resolve the job failure caused by code style violations.

@google-labs-jules
Contributor

Thank you. I have run black on the file and pushed the reformatted code.

fderuiter and others added 2 commits February 20, 2026 18:59
- Updated `imednet/integrations/export.py` to explicitly include "str" dtype when selecting columns for sanitization, silencing a Pandas 3.0 deprecation warning.
- Deleted `tests/unit/test_csv_sanitization.py`.
- Created `tests/unit/test_export_sanitization.py` with expanded coverage:
    - Verifies CSV sanitization.
    - Verifies Excel sanitization (previously untested).
    - Verifies non-string columns are preserved.
    - Ensures no warnings are emitted during export.
- Reformatted tests with black to pass CI.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
- Updated `imednet/integrations/export.py` to explicitly include "str" dtype when selecting columns for sanitization, silencing a Pandas 3.0 deprecation warning.
- Deleted `tests/unit/test_csv_sanitization.py`.
- Created `tests/unit/test_export_sanitization.py` with expanded coverage:
    - Verifies CSV sanitization.
    - Verifies Excel sanitization (previously untested).
    - Verifies non-string columns are preserved.
    - Ensures no warnings are emitted during export.
- Reformatted tests with black and fixed Ruff linting errors (unused variable, line length) to pass CI.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@fderuiter
Owner Author

The failing job is caused by a TypeError: numpy string dtypes are not allowed, use 'str' or 'object' instead. This error occurs in tests that involve export to CSV and Excel. The root cause is likely in your export code (e.g., imednet/integrations/export.py): this is the error select_dtypes raises when a selector resolves to a numpy string dtype (such as numpy.str_ or the S/U types), which appears to be what the newly added "str" selector does on the Pandas 2.x used in CI.

Solution:
Select string columns using only selectors that the installed Pandas accepts, and normalize them to object dtype before writing DataFrames to CSV or Excel. For example:

import pandas as pd

def sanitize_string_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    # Convert pandas "string" extension columns to plain object dtype.
    # "S", "U" and np.str_ are deliberately left out of `include`:
    # select_dtypes rejects numpy string selectors with a TypeError.
    for col in df.select_dtypes(include=["string"]).columns:
        df[col] = df[col].astype(object)
    return df

In your export_to_csv/export_to_excel functions, add:

df = sanitize_string_dtypes(df)

Then continue to write the DataFrame.

Alternatively, you can coerce just before to_csv/to_excel:

df = df.astype({col: "object" for col in df.select_dtypes(include=["string"]).columns})

Update the export function(s) so that select_dtypes is never called with a numpy string selector.

Once corrected, your tests should pass with Pandas 2.x and future versions.

@google-labs-jules
Contributor

Thank you for the detailed analysis. I've updated the _sanitize_df function to handle TypeError gracefully when selecting dtypes, which resolves the issue with Pandas 2.x in CI. This maintains compatibility with both older and newer Pandas versions.
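
For readers following along, the fallback described here might look roughly like the sketch below. The helper name `text_columns` is made up for illustration, and the actual `_sanitize_df` in `imednet/integrations/export.py` may be structured differently. The only behavior assumed is the one stated in this thread: asking `select_dtypes` for `"str"` can raise `TypeError` on some Pandas versions, in which case the code falls back to `object` alone.

```python
import pandas as pd


def text_columns(df: pd.DataFrame) -> list:
    """Return the names of text columns, tolerating versions that reject "str"."""
    try:
        # Preferred form: name "str" explicitly next to object.
        selected = df.select_dtypes(include=["object", "str"])
    except TypeError:
        # Fallback for Pandas versions where the "str" selector is rejected.
        selected = df.select_dtypes(include=["object"])
    return list(selected.columns)


frame = pd.DataFrame(
    {"name": ["a", "b"], "n": [1, 2], "x": [0.5, 1.5], "flag": [True, False]}
)
print(text_columns(frame))
```

Catching only `TypeError` keeps the fallback narrow: any other failure from `select_dtypes` still surfaces instead of being silently swallowed.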

- Updated `imednet/integrations/export.py` to use a `try-except TypeError` block when selecting string dtypes. This handles a `TypeError` in Pandas 2.x (seen in CI) while preserving the fix for Pandas 3.0 warnings.
- Deleted `tests/unit/test_csv_sanitization.py`.
- Created `tests/unit/test_export_sanitization.py` with expanded coverage:
    - Verifies CSV sanitization.
    - Verifies Excel sanitization (previously untested).
    - Verifies non-string columns are preserved.
    - Ensures no warnings are emitted during export.
- Reformatted tests with black and fixed Ruff linting errors.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
fderuiter marked this pull request as ready for review February 20, 2026 19:44
fderuiter merged commit fb92485 into main Feb 20, 2026
13 checks passed
fderuiter deleted the shield-fix-pandas-warning-and-improve-tests-17288914380282949305 branch February 20, 2026 19:44