Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEAT: Create ExtractWeekOfYear operation #2177

Merged
merged 1 commit into from
Sep 7, 2020

Conversation

xmnlab
Copy link
Contributor

@xmnlab xmnlab commented Apr 4, 2020

Create ExtractWeekOfYear operation and add its support to Clickhouse, CSV, MySQL, Pandas, Parquet, PostgreSQL, PySpark, SQLite and Spark

notes:

@jreback jreback added the feature Features or general enhancements label Apr 9, 2020
@xmnlab xmnlab force-pushed the add-extract-week-of-year branch 3 times, most recently from b44f888 to 6786c58 Compare May 30, 2020 01:58
@xmnlab xmnlab marked this pull request as ready for review May 30, 2020 02:38
@xmnlab
Copy link
Contributor Author

xmnlab commented May 30, 2020

this PR is ready for review! thanks!

@xmnlab xmnlab force-pushed the add-extract-week-of-year branch 2 times, most recently from c771f92 to 6feeb2f Compare June 12, 2020 19:22
@xmnlab
Copy link
Contributor Author

xmnlab commented Jun 12, 2020

ibis-project.ibis (LinuxTest py37_sql_parquet) tests worked, but Publish Linux environment YAML to Azure give an error:

Information, ApplicationInsightsTelemetrySender correlated 2 events with X-TFS-Session 33917c70-b933-4976-8df8-aa9153851b7f
##[error]Artifact LinuxCondaEnvironment-37-main already exists for build 2595.
Finishing: Publish Linux environment YAML to Azure

probably it is not a block for review or merge.

it is ready again for review. thanks.

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, added some comments to make the code more clear

https://en.wikipedia.org/wiki/ISO_week_date
"""
# n_weeks_year adjustment
def _p(year):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you replace _p by a descriptive name? Also, the heading _ is not needed, since this function it's inside another, so it doesn't add much value to mark it as private (it can't be used anywhere else anyway).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used p here because it is the name of the function used by the algorithm presented in the link in See Also section.
but I can change that.

dow = d.day_of_week.index() + 1

result = doy - dow
result += 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes the code less readable to have all this, instead of simply:

d.day_of_year() - d.day_of_week.index() + 11

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right ... I just follow the algorithm from https://en.wikipedia.org/wiki/ISO_week_date .. I forgot to refactor that. thanks!

)

op = expr.op()
return translator.translate(_extract_woy_expr(op.args[0]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a nested function here?

Also, things like op = expr.op() seem unnecessary and confusing, if you are going to use the result just once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right .. probably I needed that for some reason in the past .. or something like that .. and forgot to refactor that. thanks!

)
/ 7
+ 1,
sa.INTEGER,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's black who is creating this unreadable code. But you can use a variable here instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will improve that. thanks!

Copy link
Contributor Author

@xmnlab xmnlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the feedback @datapythonista
I will work on that.

https://en.wikipedia.org/wiki/ISO_week_date
"""
# n_weeks_year adjustment
def _p(year):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used p here because it is the name of the function used by the algorithm presented in the link in See Also section.
but I can change that.

dow = d.day_of_week.index() + 1

result = doy - dow
result += 10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right ... I just follow the algorithm from https://en.wikipedia.org/wiki/ISO_week_date .. I forgot to refactor that. thanks!

)

op = expr.op()
return translator.translate(_extract_woy_expr(op.args[0]))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right .. probably I needed that for some reason in the past .. or something like that .. and forgot to refactor that. thanks!

)
/ 7
+ 1,
sa.INTEGER,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I will improve that. thanks!

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes, added some comments that I think should improve readability

https://en.wikipedia.org/wiki/ISO_week_date

"""
# adjustment factor for week year extraction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind moving this to the docstring of adjustment factor and provide a proper docstring for it? For me, and I guess for anyone else reading the code, it's not obvious what this is doing. Would be nice to have more information, including examples. I guess this is related to leap-years, but would be nice to know exactly what's the reasoning here, without having to spend a decent amount of time reading the code and researching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the line # adjustment factor for week year extraction please. I think it was left here unintentionally.

def adjustment_factor(year):
return (year + (year // 4) - (year // 100) + (year // 400)) % 7

need_adjustment = (adjustment_factor(year) == 4) | (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
need_adjustment = (adjustment_factor(year) == 4) | (
needs_adjustment = (adjustment_factor(year) == 4) | (


def _woy_preliminary(d: Union[ir.DateValue, ir.TimestampValue]) -> ir.Expr:
"""
Return a preliminary week of year (WOY).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain in another paragraph what a preliminary week of the year is please


Parameters
----------
d : ibis.expr.types.DateValue or ibis.expr.types.TimestampValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is d? Would be nice to have a more descriptive name, and also a description if the name is not clear enough.

d = expr.op().args[0]

w = _woy_preliminary(d)
y = d.year()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, avoid naming variables like d, w, y, and use descriptive names.



@compiles(ops.ExtractQuarter)
def compile_extract_quarter(t, expr, scope, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional? Feels unrelated

Copy link
Contributor Author

@xmnlab xmnlab Jun 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, maybe some problem after rebasing ... thanks!

return 52 + (ibis.case().when(need_adjustment, 1).else_(0).end())


def _woy_preliminary(d: Union[ir.DateValue, ir.TimestampValue]) -> ir.Expr:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a function really needed for this? This is being used just once, feels like moving this two lines of code inside _extract_woy would keep things much simpler.

return (result // 7).cast('int16')


def _extract_woy(translator, expr) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think being explicit about names will make things easier. In this PR woy is clear what it means by the context. But when working on something unrelated, going to read the docstring will be needed. While naming it week_of_year will make it immediate to know.

Suggested change
def _extract_woy(translator, expr) -> str:
def _extract_week_of_year(translator, expr) -> str:

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Added couple of small comments, to lgtm.

"""
date = expr.op().args[0]

week = _week_of_year_preliminary(date)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a separate function for _week_of_year_preliminary seems unnecessary and overcomplicated:

Suggested change
week = _week_of_year_preliminary(date)
week_of_year_without_adjustment = ((date.day_of_year() - date.day_of_week.index() + 11) // 7).cast('int16')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @datapythonista I did this change. just keep the variable name to week and added a comment about this there.

ibis/omniscidb/operations.py Outdated Show resolved Hide resolved
https://en.wikipedia.org/wiki/ISO_week_date

"""
# adjustment factor for week year extraction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the line # adjustment factor for week year extraction please. I think it was left here unintentionally.

@xmnlab
Copy link
Contributor Author

xmnlab commented Jun 25, 2020

@datapythonista the comment # adjustment factor for week year extraction was to skip the follow docstring checking error:
D202: No blank lines allowed after function docstring (found 1)

and black add an empty line automatically before the function definition.

I will replace it by # noqa: D202

@datapythonista
Copy link
Contributor

I will replace it by # noqa: D202

Try to keep in mind that eventually someone will read that code, and would wonder what is the reason for it. Adding a comment for the reader, besides the comment for the linter, will make our future lives easier. Something like: # noqa: D202 flake-8 incorrectly fails if there is a function definition immediately after the docstring

.end()
)

return translator.translate(result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you writing these as ibis operations? if the db doesn't support them, then don't add them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. removed. thanks.

@datapythonista
Copy link
Contributor

@xmnlab many files conflicting here, can you rebase please?

@xmnlab
Copy link
Contributor Author

xmnlab commented Jul 15, 2020

@datapythonista thanks! rebased!

@@ -144,6 +144,39 @@ def _extract_epoch_seconds(t, expr):
return sa.cast(sa_expr, sa.BigInteger)


def _extract_week_of_year(t, expr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I'm missing something, but in this comment Jeff said that if the db doesn't support the operation, let's simply not have it. Shouldn't we be removing this function then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it too. I will push the changes in a bit.

Copy link
Contributor

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xmnlab, lgtm

@datapythonista
Copy link
Contributor

@jreback I think your comments were addressed. Do you mind having another look here? I think this should be ready.

@jreback jreback added this to the Next Bugfix Release milestone Sep 7, 2020
@jreback jreback merged commit a37c24c into ibis-project:master Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants