Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor fmt_time, fmt_date and fmt_datetime #290

Merged
merged 7 commits into from
Apr 22, 2024

Conversation

jrycw
Copy link
Contributor

@jrycw jrycw commented Apr 14, 2024

Related to #222 , I'm thinking we could possibly delegate the validation of lifts to the built-in datetime.datetime.fromisoformat function.

Fixes: #299

@codecov-commenter
Copy link

codecov-commenter commented Apr 14, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 81.68%. Comparing base (479eb73) to head (3b609b4).
Report is 22 commits behind head on main.

Files Patch % Lines
great_tables/_formats.py 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #290      +/-   ##
==========================================
- Coverage   81.71%   81.68%   -0.03%     
==========================================
  Files          41       41              
  Lines        4325     4287      -38     
==========================================
- Hits         3534     3502      -32     
+ Misses        791      785       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@machow
Copy link
Collaborator

machow commented Apr 16, 2024

@rich-iannone can you verify that datetime.fromisoformat() handles the cases you were thinking of? Can you be sure to enumerate in a comment the cases you try (for posterity)?

@rich-iannone
Copy link
Member

I mostly wanted to make sure that datetime.fromisoformat() handled most of the common cases and I found it's quite capable. The following input variations results in correct (the same) output:

import datetime
from great_tables import vals

iso_str = [
    "2020-05-20",
    "2020-05-20 22:30",
    "2020-05-20T22:30",
    "2020-05-20 22:30:05",
    "2020-05-20 22:30:05.824" # milliseconds *must* have 3 decimal places
    ]

fmtd_dates = []

for val in iso_str:
    fmtd_dates.append((vals.fmt_date(datetime.datetime.fromisoformat(val), date_style="wd_m_day_year"))[0])

fmtd_dates
['Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020']

So I think the code change is great here. Thanks, @jrycw !

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jrycw
Copy link
Contributor Author

jrycw commented Apr 17, 2024

I mostly wanted to make sure that datetime.fromisoformat() handled most of the common cases and I found it's quite capable. The following input variations results in correct (the same) output:

import datetime
from great_tables import vals

iso_str = [
    "2020-05-20",
    "2020-05-20 22:30",
    "2020-05-20T22:30",
    "2020-05-20 22:30:05",
    "2020-05-20 22:30:05.824" # milliseconds *must* have 3 decimal places
    ]

fmtd_dates = []

for val in iso_str:
    fmtd_dates.append((vals.fmt_date(datetime.datetime.fromisoformat(val), date_style="wd_m_day_year"))[0])

fmtd_dates
['Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020',
 'Wed, May 20, 2020']

So I think the code change is great here. Thanks, @jrycw !

Please correct me if I'm mistaken, but it seems you're testing fmt_date instead of fmt_datetime. Since vals.fmt_datetime isn't available, it needs to be tested from GT.

@rich-iannone rich-iannone self-requested a review April 17, 2024 17:41
@rich-iannone
Copy link
Member

@jrycw you are right. I mistakenly thought that .fmt_date() used the same internal function. I checked out this branch and tried out some variations. I found that this could be further improved if there is a single validation and transformation from datetime.fromisoformat(x) (we could drop the use of _validate_iso_datetime_str()).

With that change, the following code works:

import datetime
from great_tables import GT
import polars as pl

df = pl.DataFrame({"x": [
    "2020-05-20",
    "2020-05-20 22:30",
    "2020-05-20T22:30",
    "2020-05-20 22:30:05",
    "2020-05-20T22:30:05.232" # milliseconds *must* have 3 decimal places
    ]})

GT(df).fmt_datetime(columns="x", date_style="wd_m_day_year", time_style="h_m_p")

It produces this:
fmt_datetime_refactor_tbl

This seems like a good improvement because the first and last cases failed before, but they provide formatted output now (and they work as can be expected).

To do this, I renamed _iso_to_datetime() to _iso_str_to_datetime() with this changed it to:

def _iso_str_to_datetime(x: str) -> datetime:
    """
    Converts a string in ISO format to a datetime object.

    Args:
        x (str): The string to be converted.

    Returns:
        datetime: The converted datetime object.
    """
    return datetime.fromisoformat(x)

The code in fmt_datetime() changes to:

        # If `x` is a string, assume it is an ISO datetime string and convert it to a datetime object
        if isinstance(x, str):

            # Convert the ISO datetime string to a datetime object
            x = _iso_str_to_datetime(x)

@jrycw @machow does this change make sense? I could push the changes to the branch (or you could modify). Also, some new test cases are needed here.

@jrycw
Copy link
Contributor Author

jrycw commented Apr 17, 2024

LGTM! If @machow is also on board with this, could @rich-iannone push your changes to the branch? I'll then proceed to add the relevant tests based on your demonstration.

@machow
Copy link
Collaborator

machow commented Apr 17, 2024

This sounds good to me!

@jrycw
Copy link
Contributor Author

jrycw commented Apr 18, 2024

Related to #299, it turns out we could make similar modifications to fmt_date. Here's a summary of the changes:

  • Removed _validate_iso_date_str.
  • Renamed _iso_to_date to _iso_str_to_date, which is also used in the _generate_data_vals function.
  • Added test_fmt_date.

I believe the modifications are correct, but please feel free to correct any mistakes I may have made.

@jrycw jrycw changed the title Refactor fmt_datetime Refactor fmt_date and fmt_datetime Apr 18, 2024
@jrycw
Copy link
Contributor Author

jrycw commented Apr 18, 2024

This might sound a bit wild, but I just discovered a similar function called datetime.time.fromisoformat() in Python. This means we could make similar modifications as follows:

  • Removed _validate_iso_time_str and _normalize_iso_time_str.
  • Renamed _iso_to_time to _iso_str_to_time.

@jrycw jrycw changed the title Refactor fmt_date and fmt_datetime Refactor fmt_time, fmt_date and fmt_datetime Apr 18, 2024
@jrycw
Copy link
Contributor Author

jrycw commented Apr 18, 2024

I attempted to remove _iso_to_date and replace it with _iso_str_to_date in _generate_data_vals, and it seems to have worked.

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I haven't followed too closely, but the tests seem to cover big cases!

@machow machow merged commit 0f0c024 into posit-dev:main Apr 22, 2024
9 checks passed
@jrycw jrycw deleted the refactor-fmt_datetime branch April 22, 2024 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Verify fmt_datetime iso cases are in tests
4 participants