release/v0.11.0 #103

Merged: 33 commits, Jul 25, 2023

Conversation

@fivetran-joemarkiewicz (Contributor) commented Jul 7, 2023

PR Overview

This PR is a batch release which includes updates from the following PRs: PR #102 and PR #98

This PR will result in the following new package version: v0.11.0

This is a breaking change due to the addition of the holiday schedule PR. As such, all changes included in this PR will be breaking.

Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:

Please see the individual PRs mentioned above for specific changes.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt compile
  • dbt run --full-refresh
  • dbt run
  • dbt test
  • [NA] dbt run --vars (if applicable)

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked and tagged
  • You are assigned to the corresponding issue and this PR
  • BuildKite integration tests are passing

Detailed Validation

Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":

  • You have validated these changes and assure this PR will address the respective Issue/Feature.
  • You are reasonably confident these changes will not impact any other components of this package or any dependent packages.
  • You have provided details below around the validation steps performed to gain confidence in these changes.

Please see the individual PRs for specifics around validations.

Standard Updates

Please acknowledge that your PR contains the following standard updates:

  • Package versioning has been appropriately indexed in the following locations:
    • indexed within dbt_project.yml
    • indexed within integration_tests/dbt_project.yml
  • CHANGELOG has individual entries for each respective change in this PR
  • README updates have been applied (if applicable)
  • [NA] DECISIONLOG updates have been updated (if applicable)
  • Appropriate yml documentation has been added (if applicable)

dbt Docs

Please acknowledge that after the above were all completed the below were applied to your branch:

  • docs were regenerated (unless this PR does not include any code or yml updates)

If you had to summarize this PR in an emoji, which would it be?

🚚

@fivetran-joemarkiewicz fivetran-joemarkiewicz marked this pull request as ready for review July 12, 2023 16:58
CHANGELOG.md (outdated, resolved)
end_time_utc,
cast(null as {{ dbt.type_string() }}) as holiday_name_check,
false as is_holiday_week
from valid_adjustment
Contributor

For the non-holiday week portion of this union, do we need to add the contrary of lines 207-208?

Contributor Author

Hmmm that is a good call out. I will need to confirm this a bit more. I do recall it was needed to get the schedule to properly generate. However, the data access has since been removed so we can't investigate this further at the moment. Let me do some more digging here.

@fivetran-reneeli
Contributor

Just had a few notes!

Also, note that the first PR reference is linked/numbered to the issue, not the PR itself; it should be swapped to #102

Co-authored-by: Renee Li <91097070+fivetran-reneeli@users.noreply.github.com>
@fivetran-joemarkiewicz
Contributor Author

Thanks for reviewing @fivetran-reneeli! I just made the update you suggested and am currently investigating another to provide a more concrete response. Would you be able to clarify what you mean by the below?

Also, note that the first PR reference is linked/numbered to the issue, not the PR itself; it should be swapped to #102

I don't see where we are referencing an issue?

@fivetran-reneeli
Contributor

In the PR template it says:

This PR is a batch release which includes updates from the following PRs: PR #99 and PR #98

But the first link isn't to a PR, it's to the issue. The actual PR is #102

@fivetran-joemarkiewicz
Contributor Author

@fivetran-reneeli ahhh, my apologies, I thought you were mentioning the CHANGELOG. I see you were talking about the PR itself. Just updated.

@fivetran-reneeli
Contributor

My bad, I could've been clearer about that! Not a big deal, I just figured it was worth leaving a paper trail.

Looks good! Approving

@fivetran-reneeli fivetran-reneeli self-requested a review July 13, 2023 21:32
Contributor

@fivetran-reneeli fivetran-reneeli left a comment

Looks good!

prev_end,
next_start,
coalesce(case
when not is_holiday_week and prev_end is not null then first_value(prev_end) over (partition by schedule_id, period_start order by start_time_utc rows between unbounded preceding and unbounded following)
Contributor

Just curious why the unbounded following addition? Do we need to make a similar change in line 192?

Contributor Author

Great catch, and this was a key finding in this latest update. Essentially, the reason for this change is the following:

  • Updated window function
first_value(prev_end) over (partition by schedule_id, period_start order by start_time_utc rows between unbounded preceding and unbounded following)

In this function, for each schedule_id, period_start group, the window is set to span the entire partition (from the first to the last row). The first_value() function then always returns the value of prev_end for the first row in the partition, regardless of where the current row is. This will be the same for all rows in the partition since rows between unbounded preceding and unbounded following looks at the entire partition.

  • Previous version of the window function
first_value(prev_end) over (partition by schedule_id, period_start order by start_time_utc rows unbounded preceding)

In this function, for each schedule_id, period_start group, the window starts at the first row in the partition and ends at the current row, based on the ordering by start_time_utc. first_value() returns prev_end from the first row of that window. Because the window ends at the current row, any function that depends on the end of the frame, such as last_value(), can yield different results for different rows.

So the difference is in the window of rows they consider for each calculation: the updated window function considers all rows in the partition, while the previous one only considers rows from the start up to the current row. I noticed that the different results within the window were causing inconsistencies for these first_value and last_value window functions. Therefore, this update ensures the rows within the partition yield the same, accurate results.
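The effect of the two frame clauses can be checked on toy data. The sketch below is not the package's model: it uses Python's built-in sqlite3 as a stand-in for the warehouse (window functions assume SQLite 3.25 or newer), and the table name schedule_rows and the prev_end values are made up for illustration. It compares first_value() and last_value() under both frames.

```python
import sqlite3

# Hypothetical toy data: a single (schedule_id, period_start) partition.
# The table name and values are assumptions, not taken from the package.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table schedule_rows ("
    "schedule_id integer, period_start text, "
    "start_time_utc integer, prev_end integer)"
)
conn.executemany(
    "insert into schedule_rows values (?, ?, ?, ?)",
    [(1, "week_1", 1, 10), (1, "week_1", 2, 20), (1, "week_1", 3, 30)],
)

query = """
select
    start_time_utc,
    -- frame ends at the current row (previous version)
    first_value(prev_end) over (
        partition by schedule_id, period_start order by start_time_utc
        rows unbounded preceding) as first_running,
    -- frame spans the whole partition (updated version)
    first_value(prev_end) over (
        partition by schedule_id, period_start order by start_time_utc
        rows between unbounded preceding and unbounded following) as first_full,
    last_value(prev_end) over (
        partition by schedule_id, period_start order by start_time_utc
        rows unbounded preceding) as last_running,
    last_value(prev_end) over (
        partition by schedule_id, period_start order by start_time_utc
        rows between unbounded preceding and unbounded following) as last_full
from schedule_rows
order by start_time_utc
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the two first_value() columns agree on every row, while last_value() only returns the partition's final prev_end on every row when the frame extends to unbounded following. With the frame ending at the current row, last_value() simply echoes the current row's value.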

Actually, with you bringing this up I noticed a few things:

  • I actually missed making this update here and just applied it in the latest commit.
  • Thanks for bringing up the question around why this was not applied to the max() window functions. It was not applied because the max() functions do not need that complexity. Since we are just grabbing the max, the extra logic to order and frame the partition was just extra compute that was not necessary. As such it was omitted.
    • In addressing this comment, I realized the ordering for the max() window functions is completely unnecessary. So I removed the order by and frame clause, as they are not needed, to reduce the compute of this model.
  • Circling back on the code you called out on line 192... I was doing an assessment of this field and realized I never actually ended up using it in the downstream CTEs 😱. This must have been an artifact of my development. As such, I was able to remove this CTE. Thanks for bringing attention to these, as I saw a significant improvement in runtime following these updates where I removed extraneous CTEs and unnecessary window function order by and frame clauses.
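The point about max() can be illustrated the same way. This is a minimal sketch under assumptions: sqlite3 again stands in for the warehouse (SQLite 3.25 or newer), and the table schedule_rows, the column end_time_utc as the aggregated field, and the values are hypothetical. It compares max() with no ordering, max() with an order by plus a full-partition frame, and max() with an order by and the default frame.

```python
import sqlite3

# Hypothetical toy data; names and values are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table schedule_rows ("
    "schedule_id integer, period_start text, "
    "start_time_utc integer, end_time_utc integer)"
)
conn.executemany(
    "insert into schedule_rows values (?, ?, ?, ?)",
    [(1, "week_1", 1, 10), (1, "week_1", 2, 30), (1, "week_1", 3, 20)],
)

query = """
select
    start_time_utc,
    -- no order by, no frame: aggregates over the whole partition
    max(end_time_utc) over (
        partition by schedule_id, period_start) as max_plain,
    -- order by plus an explicit full-partition frame: same result, more work
    max(end_time_utc) over (
        partition by schedule_id, period_start order by start_time_utc
        rows between unbounded preceding and unbounded following) as max_full_frame,
    -- order by with the default frame: a running max up to the current row
    max(end_time_utc) over (
        partition by schedule_id, period_start order by start_time_utc) as max_running
from schedule_rows
order by start_time_utc
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first two columns match on every row, which is why dropping the order by and frame clause does not change a partition-wide max. The third column shows what an order by without an explicit frame would do instead: it turns the aggregate into a running max.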

Contributor

Awesome, thanks for the details Joe! Glad you saw other opportunities to save on compute and runtime too 👍

@fivetran-joemarkiewicz fivetran-joemarkiewicz merged commit 31d6a5f into main Jul 25, 2023
1 check was pending

4 participants