[Bug] Incorrect Historical Value when Two Updates on the Same Day #90

jon-goldberg · 2023-02-28T21:51:13Z

Is there an existing issue for this?

I have searched the existing issues

Describe the issue

int_jira__pivot_daily_field_history has an issue where there are two transactions on the same day one of which is null. The scenario is as follows.

An issue has two transactions on the same day changing the sprint. Its initially null then is set to an assigned sprint.
int_jira__daily_field_history changes Null values to the text 'is_null'
case when field_value is null then 'is_null' else field_value end as field_value,
int_jira__pivot_daily_field_history chooses the max value for any sprints. The text 'is_null' is greater than the actual sprint id and is used.

I imagine this affects all other historical fields where there are two transactions on the same day. The code should be changed to select the last transaction on a day and use those values over others. Its not as simple as ignoring the nulls as those could be the last transaction in certain cases. This is the offending code block.

pivot_out as (

    -- pivot out default columns (status and sprint) and others specified in the var(issue_field_history_columns)
    -- only days on which a field value was actively changed will have a non-null value. the nulls will need to 
    -- be backfilled in the final jira__daily_issue_field_history model
    select 
        valid_starting_on, 
        issue_id,
        max(case when lower(field_id) = 'status' then field_value end) as status,
        max(case when lower(field_name) = 'sprint' then field_value end) as sprint

        ,
            max(case when lower(field_name) = 'due date' then field_value end) as due_date
        ,
            max(case when lower(field_name) = 'labels' then field_value end) as labels
        ,
            max(case when lower(field_name) = 'reporter' then field_value end) as reporter
        ,
            max(case when lower(field_name) = 'resolution' then field_value end) as resolution
        ,
            max(case when lower(field_name) = 'resolved' then field_value end) as resolved
        ,
            max(case when lower(field_name) = 'story points' then field_value end) as story_points
        ,
            max(case when lower(field_name) = 'team' then field_value end) as team
        from daily_field_history

    group by 1,2
),

Relevant error log or model output

No response

Expected behavior

The null value should be ignored if its an earlier transaction on that day. Only the last field_value should be selected.

dbt Project configurations

N/A

Package versions

N/A

What database are you using dbt with?

postgres

dbt Version

N/A

Additional Context

Are you willing to open a PR to help address this issue?

Yes.
Yes, but I will need assistance and will schedule time during our office hours for guidance
No.

The text was updated successfully, but these errors were encountered:

fivetran-jamie · 2023-03-01T21:00:01Z

hey there, really appreciate your thoroughness! int_jira__daily_field_history, which the pivot model references, should be looking at only the last Sprint value for an issue for each day

i actually think the issue is that you have two different field ids for Sprint (customfield__10020 and customfield_10106), and the package partitions by the field_id instead of the field_name. Do you know what's up with that?

jon-goldberg · 2023-03-02T19:50:48Z

Our environment was customized over the years so I wouldn't be surprised if they changed a setting that resulted in two fields being used behind the scenes. With that said we only have one Sprint field I can see in the UI and its showing both of the toggled sprints.

Thanks for pointing me to that earlier table.

From a consistency perspective perhaps 'int_jira__daily_field_history' should be using the field name instead of the ID for the partition? Alternatively the pivot logic would need to be changed but I'd think these go hand in hand.

row_number() over (
partition by valid_starting_on, issue_id, field_id
order by valid_starting_at desc
) as row_num

fivetran-jamie · 2023-03-03T19:40:10Z

i think for most most folks, partitioning by the field_id seems like the safer route 🤔

for your case however, i'd recommend overriding the int_jira__daily_field_history model to partition by field_name instead of the ID (you may also need to override downstream models that also partition by or join on field_id). we've got some instructions around how to do so here

jon-goldberg · 2023-03-03T20:18:28Z

Thanks Jamie! Your guidance on how the package was intended to work makes that easier. I'll make one last appeal to a broader fix but likely proceed to making a fix on our end.

I understand this situation might not occur regularly but it could happen to others. The risk is without rigorous testing it would go unnoticed and result in incorrect reporting. If it is discovered later than people will waste cycles trying to fix it.

As a data practitioner that subscribes to 'No Broken Windows' - this bothers me. There are multiple ways to solve the problem and I think this would ultimately be a fix that is relatively easy to implement now as we know about it while preventing future headache for others.

In any case- thank you for the support.

fivetran-jamie · 2023-03-03T20:45:29Z

Yeah that's very fair, as there have honestly been quite a few discussions around field_name vs field_id in Jira since we first released this package.

Next sprint (in a week), the team and I take some time to think this through. can't promise anything (except that we'll seriously explore it 🤠 ), but I want to find a solution that can address your (and probably others') issue without adding too much complexity to the package.

jon-goldberg · 2023-03-22T13:13:55Z

Hey @fivetran-jamie - Just curious where the team landed on this. Appreciate all of the support again. We'll likely be trying out of a few of the prebuilt transformation packages soon.

fivetran-jamie · 2023-03-22T16:21:53Z

hey hey -- so we will indeed be adding the option to use field_name instead of field_id partitions/joins! 🎉

this will probably be achieved through a jira_field_grain variable that you can set to field_name from your dbt_project.yml

Thanks again for pushing on this @jon-goldberg 🙌 In my investigations this past sprint, I realized that the package definitely assumes field names to be unique, which is recommended by Jira but clearly not always the case. I think this will be helpful to lots of folks!

Timeline-wise, I'll be working on this next sprint (starts tomorrow)

jon-goldberg · 2023-03-22T18:37:14Z

Wooo-hoo! Awesome. Thanks Jamie for the support 💪.

fivetran-jamie · 2023-03-28T17:16:12Z

hey hey @jon-goldberg -- i've got a working branch if you have time to test it out!

to do so, you'd replace the jira part of your packages.yml with

packages:
  - git: https://github.com/fivetran/dbt_jira.git
    revision: feature/switch-field-grain

and then in your dbt_project.yml, add the following (new) variable

vars:
  jira_field_grain: 'field_name' # default value is still field_id

FYI i am still recreating your use case (ie duplicate fields) in our own Jira sandbox environment, so would really appreciate you testing this out if you've got bandwidth!

jon-goldberg · 2023-04-17T20:57:39Z

@fivetran-jamie Apologies for the delay - we were pulled into some project related work. Philip and I just tested - this appears to have addressed the problems we faced before but it did cause a failing test for some of our duplicate fields.

I am guessing this is to be expected and we should remove the test. I spot checked a field of the problem issue ids and the final table matches what I see in Jira.

unique_int_jira__combine_field_histories_combined_history_id

Thank you so much for the help!

fivetran-jamie · 2023-04-17T21:47:25Z

no problem! oh huh that's interesting... we do use the combined_history_id as the unique_key for the incremental configuration, so we it should unique for incremental runs to be accurate and complete

currently, combined_history_id is a surrogate key hashed on var('jira_field_grain'),'issue_id', 'valid_starting_at'] but perhaps it NEEDS to include field_id regardless of your jira_field_grain value. i am thinking this because at this point in the DAG, we haven't rolled fields up to the name-grain yet (they're still at the ID-grain in this model)

cc @fivetran-joemarkiewicz

fivetran-jamie · 2023-04-17T22:49:53Z

@jon-goldberg could you try this out and see if everything still ties out (and if the test failure is removed)?

# packages.yml
packages:
  - git: https://github.com/fivetran/dbt_jira.git
    revision: releases/v0.13.latest

updated the branch FYI

fivetran-jamie · 2023-04-18T19:39:24Z

heads up we're planning to release these updates tomorrow if you @jon-goldberg (or anyone creeping on this thread 👁️ 👁️) have a moment today or tomorrow to retest 🌟

fivetran-joemarkiewicz · 2023-04-20T14:53:21Z

Hi All, the updates @fivetran-jamie highlighted above are now live in the latest version of the package (v.0.13.0)! Thank you all for being engaged in this discussion and helping to make a meaningful update to the package.

As such, I will close out this issue. Please feel free to reopen if the issue persists on your end. Thanks again!

jon-goldberg · 2023-04-21T21:44:30Z

Thanks Jamie and Joe! Philip didn't have time to update our codebase to use the new packages so I could test but he is planning to do so next week. Appreciate the support with this.

fivetran-joemarkiewicz added type:bug Something is broken or incorrect status:accepted Scoped and accepted into queue labels Mar 23, 2023

fivetran-joemarkiewicz assigned fivetran-jamie Mar 23, 2023

fivetran-jamie mentioned this issue Mar 27, 2023

introduce jira_field_grain to use the field name as the grain of partitions and joins #93

Merged

14 tasks

fivetran-jamie added status:in_progress Currently being worked on update_type:feature Primary focus is to add new functionality and removed status:accepted Scoped and accepted into queue labels Mar 28, 2023

fivetran-jamie mentioned this issue Apr 4, 2023

[Feature] split out fields with the same name into distinct columns in daily_issue_field_history #96

Open

4 tasks

fivetran-joemarkiewicz closed this as completed Apr 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Incorrect Historical Value when Two Updates on the Same Day #90

[Bug] Incorrect Historical Value when Two Updates on the Same Day #90

jon-goldberg commented Feb 28, 2023 •

edited by elanfivetran

Loading

fivetran-jamie commented Mar 1, 2023

jon-goldberg commented Mar 2, 2023

fivetran-jamie commented Mar 3, 2023

jon-goldberg commented Mar 3, 2023

fivetran-jamie commented Mar 3, 2023

jon-goldberg commented Mar 22, 2023

fivetran-jamie commented Mar 22, 2023

jon-goldberg commented Mar 22, 2023

fivetran-jamie commented Mar 28, 2023

jon-goldberg commented Apr 17, 2023 •

edited

Loading

fivetran-jamie commented Apr 17, 2023

fivetran-jamie commented Apr 17, 2023

fivetran-jamie commented Apr 18, 2023

fivetran-joemarkiewicz commented Apr 20, 2023

jon-goldberg commented Apr 21, 2023

[Bug] Incorrect Historical Value when Two Updates on the Same Day #90

[Bug] Incorrect Historical Value when Two Updates on the Same Day #90

Comments

jon-goldberg commented Feb 28, 2023 • edited by elanfivetran Loading

Is there an existing issue for this?

Describe the issue

Relevant error log or model output

Expected behavior

dbt Project configurations

Package versions

What database are you using dbt with?

dbt Version

Additional Context

Are you willing to open a PR to help address this issue?

fivetran-jamie commented Mar 1, 2023

jon-goldberg commented Mar 2, 2023

fivetran-jamie commented Mar 3, 2023

jon-goldberg commented Mar 3, 2023

fivetran-jamie commented Mar 3, 2023

jon-goldberg commented Mar 22, 2023

fivetran-jamie commented Mar 22, 2023

jon-goldberg commented Mar 22, 2023

fivetran-jamie commented Mar 28, 2023

jon-goldberg commented Apr 17, 2023 • edited Loading

fivetran-jamie commented Apr 17, 2023

fivetran-jamie commented Apr 17, 2023

fivetran-jamie commented Apr 18, 2023

fivetran-joemarkiewicz commented Apr 20, 2023

jon-goldberg commented Apr 21, 2023

jon-goldberg commented Feb 28, 2023 •

edited by elanfivetran

Loading

jon-goldberg commented Apr 17, 2023 •

edited

Loading