Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: enforce deprecation of frequencies deprecated for offsets #57986

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

natmokval
Copy link
Contributor

@natmokval natmokval commented Mar 24, 2024

xref #52064, #55792, #55553, #55496
Enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets. Now the aliases are case-sensitive.

P.S. Corrected a note in v3.0.0 related to PR #57627

@natmokval natmokval added Clean Frequency DateOffsets labels Mar 25, 2024
@natmokval natmokval marked this pull request as ready for review March 25, 2024 15:42
@natmokval
Copy link
Contributor Author

I enforced deprecation of aliases M, Q, Y, etc. in favour of ME, QE, YE, etc. for offsets, ci - green. @MarcoGorelli could you please take a look at this PR?

@MarcoGorelli
Copy link
Member

/preview

Copy link
Contributor

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57986/

)
elif is_period and name.upper() in c_OFFSET_DEPR_FREQSTR:
elif is_period and name.upper() in c_OFFSET_REMOVED_FREQSTR:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this branch I'm seeing something like: if it's removed, then warn that it's deprecated. Is this right?

Do we need c_OFFSET_REMOVED_FREQSTR at all, or is it possible to just let it raise?

Copy link
Contributor Author

@natmokval natmokval Mar 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this branch I'm seeing something like: if it's removed, then warn that it's deprecated. Is this right?

agreed, this check looks strange. It's my mistake, I will correct it.

Do we need c_OFFSET_REMOVED_FREQSTR at all, or is it possible to just let it raise?

I am unsure, maybe we can keep c_OFFSET_REMOVED_FREQSTR at least for a while. I think it would be good to explain when we raise why this frequency is invalid. Because it's possible to use this frequency in other cases, it wouldn't be entirely correct to say that it's "Invalid frequency".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sure, that's a good point!

If we're going to keep it around then, maybe we can tell users which to use instead? At least for offsets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree. I corrected the error message. Now we advise users which frequency for offsets to use instead of invalid one. I think CI failures are unrelated to my changes.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating!

I've left a comment, but also, in general, this still looks very complex...if we're enforcing deprecations, then this might be a good chance to simplify the logic here?

OK with adding complexity to give a good error message if someone passes freq='M' instead of 'ME', as that's probably still fairly common, but periods are far less used

The code is currently very hard to read - which is OK as a temporary phase during which we're enforcing a deprecation - but ultimately the goal should be to end up something that's cleaner than it was when we started. Is that possible here?

)
name = c_OFFSET_DEPR_FREQSTR.get(name.upper())
raise ValueError(INVALID_FREQ_ERR_MSG.format(name))
name = c_OFFSET_REMOVED_FREQSTR.get(name.upper())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about this line? It looks very odd to me to do

                if is_period and name.upper() in c_OFFSET_REMOVED_FREQSTR:
                    name = c_OFFSET_REMOVED_FREQSTR.get(name.upper())

If you need to map 'M' (period) to its associated offset alias ('ME'), then it's there another dictionary, or another way, to do that? c_OFFSET_REMOVED_FREQSTR will presumably be removed at some point

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that c_OFFSET_REMOVED_FREQSTR is functioning as both:

  • aliases which used to be valid for offsets, but have been renamed
  • period aliases, to get the corresponding offset aliases

How about creating another dictionary, which maps periods aliases to their corresponding offset aliases? I think this one could be a lot smaller - for example, it wouldn't need "BY": "BYE",, right?

Then, to_offset could be a lot simpler, something like

if not is_period and name in c_OFFSET_REMOVED_FREQSTR:
    # raise error message
if is_period:
    if name in c_PERIOD_TO_OFFSET_FREQSTR:
        name = c_PERIOD_TO_OFFSET_FREQSTR[name]
    else:
        # raise: `name` not support for period

?

Copy link
Contributor Author

@natmokval natmokval Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about this line? It looks very odd to me to do

                if is_period and name.upper() in c_OFFSET_REMOVED_FREQSTR:
                    name = c_OFFSET_REMOVED_FREQSTR.get(name.upper())

If you need to map 'M' (period) to its associated offset alias ('ME'), then it's there another dictionary, or another way, to do that? c_OFFSET_REMOVED_FREQSTR will presumably be removed at some point

thank you for the comment. It seems odd, I agree. I removed this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that c_OFFSET_REMOVED_FREQSTR is functioning as both:

* aliases which used to be valid for offsets, but have been renamed

* period aliases, to get the corresponding offset aliases

How about creating another dictionary, which maps periods aliases to their corresponding offset aliases? I think this one could be a lot smaller - for example, it wouldn't need "BY": "BYE",, right?

Then, to_offset could be a lot simpler, something like

if not is_period and name in c_OFFSET_REMOVED_FREQSTR:
    # raise error message
if is_period:
    if name in c_PERIOD_TO_OFFSET_FREQSTR:
        name = c_PERIOD_TO_OFFSET_FREQSTR[name]
    else:
        # raise: `name` not support for period

?

thanks, I added the dictionary PERIOD_TO_OFFSET_FREQSTR and made changes as you suggested.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating! A few things

  • good to have the c_PERIOD_TO_OFFSET_FREQSTR dict, but I don't see why it's being used here:
    if not is_period:
        if name.upper() in c_OFFSET_REMOVED_FREQSTR:
            raise ValueError(
                f"\'{name}\' is no longer supported for offsets. Please "
                f"use \'{c_OFFSET_REMOVED_FREQSTR.get(name.upper())}\' "
                f"instead."
            )
        # below we raise for lowrecase monthly and bigger frequencies
        if (name.upper() != name and
                name.lower() not in {"h", "min", "s", "ms", "us", "ns"} and
                name.upper() not in c_PERIOD_TO_OFFSET_FREQSTR and
                name.upper() in c_OFFSET_TO_PERIOD_FREQSTR):
            raise ValueError(INVALID_FREQ_ERR_MSG.format(name))
    If you've already checked if not is_period, when why do you need to check if it's in c_PERIOD_TO_OFFSET_FREQSTR ?
  • lowrecase typo
  • is this part temporary
      elif name in {"d", "b"}:
        name = name.upper()
    elif (name.upper() not in {"B", "D"} and
            not name.upper().startswith("W")):
    ?
    If so, could you add a comment explaining why it needs to be there, possibly linking to an open issue? Ideally we should get to the point where we can get rid of all this complexity, so let's make it clear what the road towards that endpoint is

@natmokval
Copy link
Contributor Author

natmokval commented Apr 12, 2024

Thanks for updating! A few things

* good to have the `c_PERIOD_TO_OFFSET_FREQSTR` dict, but I don't see why it's being used here:
  ```python
  if not is_period:
      if name.upper() in c_OFFSET_REMOVED_FREQSTR:
          raise ValueError(
              f"\'{name}\' is no longer supported for offsets. Please "
              f"use \'{c_OFFSET_REMOVED_FREQSTR.get(name.upper())}\' "
              f"instead."
          )
      # below we raise for lowrecase monthly and bigger frequencies
      if (name.upper() != name and
              name.lower() not in {"h", "min", "s", "ms", "us", "ns"} and
              name.upper() not in c_PERIOD_TO_OFFSET_FREQSTR and
              name.upper() in c_OFFSET_TO_PERIOD_FREQSTR):
          raise ValueError(INVALID_FREQ_ERR_MSG.format(name))
  ```

  If you've already checked `if not is_period`, when why do you need to check if it's in `c_PERIOD_TO_OFFSET_FREQSTR `?

we need the check if it isn't in c_PERIOD_TO_OFFSET_FREQSTR, because we did not deprecate lowercase frequencies "d", "b", "w", "weekday", "w-sun”, and so on. We don't want to raise a ValueError for these frequencies for both offsets and period. After deprecating these frequencies we can remove the check (only uppercase will be correct).

for example without this check test_reindex_axes in pandas/tests/frame/methods/test_reindex.py raises a ValueError for freq = "d"

@natmokval
Copy link
Contributor Author

natmokval commented Apr 12, 2024

* `lowrecase` typo 

thanks, I corrected the typo

@natmokval
Copy link
Contributor Author

* is this part temporary
  ```python
    elif name in {"d", "b"}:
      name = name.upper()
  elif (name.upper() not in {"B", "D"} and
          not name.upper().startswith("W")):
  ```
  If so, could you add a comment explaining why it needs to be there, possibly linking to an open issue? Ideally we should get to the point where we can get rid of all this complexity, so let's make it clear what the road towards that endpoint is

Yes, it's the temporary part. I left the comment below.

@MarcoGorelli
Copy link
Member

thanks for explaining - is there a way to do that part without using c_PERIOD_TO_OFFSET_FREQSTR? let's try to separate them out a bit more, if c_PERIOD_TO_OFFSET_FREQSTR is for mapping period aliases to offset aliases, then we probably shouldn't be using it for offset aliases

@natmokval
Copy link
Contributor Author

thanks for explaining - is there a way to do that part without using c_PERIOD_TO_OFFSET_FREQSTR? let's try to separate them out a bit more, if c_PERIOD_TO_OFFSET_FREQSTR is for mapping period aliases to offset aliases, then we probably shouldn't be using it for offset aliases

but what should we do with aliases which are the same for both: period and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN”, etc.?
I think we need to keep them in both dictionaries c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, we use them then we check case correctness. Oops, I forgot to add "WEEKDAY" in c_PERIOD_TO_OFFSET_FREQSTR, should I add it?

After enforcing the deprecation of "d", "b", "w", "weekday", "w-sun”, etc. we can make simplifications.

@MarcoGorelli
Copy link
Member

but what should we do with aliases which are the same for both: period and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN”, etc.?
I think we need to keep them in both dictionaries c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, we use them then we check case correctness.

I'd suggest either that, or to add a set which contains aliases which are valid for both

@natmokval
Copy link
Contributor Author

natmokval commented Apr 12, 2024

but what should we do with aliases which are the same for both: period and offsets, such as "D", "B", "W", "WEEKDAY", "W-SUN”, etc.?
I think we need to keep them in both dictionaries c_PERIOD_TO_OFFSET_FREQSTR and c_OFFSET_TO_PERIOD_FREQSTR, we use them then we check case correctness.

I'd suggest either that, or to add a set which contains aliases which are valid for both

thanks, then maybe we can leave it as it is?

@MarcoGorelli
Copy link
Member

cool, I think this is on the right track

I think there's a logic error somewhere, as it currently gives

In [7]: pd.period_range('2000', periods=3, freq='s')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File offsets.pyx:4853, in pandas._libs.tslibs.offsets.to_offset()

ValueError: 's' is not supported as period frequency.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 pd.period_range('2000', periods=3, freq='s')

File ~/pandas-dev/pandas/core/indexes/period.py:585, in period_range(start, end, periods, freq, name)
    582 if freq is None and (not isinstance(start, Period) and not isinstance(end, Period)):
    583     freq = "D"
--> 585 data, freq = PeriodArray._generate_range(start, end, periods, freq)
    586 dtype = PeriodDtype(freq)
    587 data = PeriodArray(data, dtype=dtype)

File ~/pandas-dev/pandas/core/arrays/period.py:321, in PeriodArray._generate_range(cls, start, end, periods, freq)
    318 periods = dtl.validate_periods(periods)
    320 if freq is not None:
--> 321     freq = Period._maybe_convert_freq(freq)
    323 if start is not None or end is not None:
    324     subarr, freq = _get_ordinal_range(start, end, periods, freq)

File period.pyx:1768, in pandas._libs.tslibs.period._Period._maybe_convert_freq()

File offsets.pyx:4914, in pandas._libs.tslibs.offsets.to_offset()

ValueError: Invalid frequency: s, failed to parse with error message: ValueError("'s' is not supported as period frequency.")

but 's' should definitely be supported here, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants