
Remove de-duping of attribute keys #1098

@lzchen


In the current metrics SDK implementation, attribute sets containing entries with the same key are de-duped: the "first" entry for a key is kept and the rest are dropped. The spec does not fully define this behavior, so the specific implementation is left to each language. De-duping does not make sense for a couple of reasons:

  1. Only the first entry seen for a key is kept; the rest are dropped. Is this "correct" behavior? Who is to say that the first one is the one the user wants?
  2. The performance improvement only exists when the same key appears multiple times, which is uncommon. In the majority of cases, the keys within an AttributeSet are all distinct, and performance actually degrades because every key has to be checked for existence in the `seen` hashmap.
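A minimal sketch of the two strategies, assuming attributes are modeled as plain `(String, i64)` pairs. `Attr`, `dedup_first_wins` and `no_dedup` are hypothetical names for illustration, not the SDK's actual AttributeSet code. The de-duping version pays a hash-set insert for every key even when nothing repeats, and it keeps only the first value for any repeated key.

```rust
use std::collections::HashSet;

/// Hypothetical attribute type for illustration: a key/value pair.
type Attr = (String, i64);

/// De-duping strategy: keep only the first occurrence of each key.
/// Every attribute costs one hash-set probe, whether or not it is a duplicate.
fn dedup_first_wins(attrs: Vec<Attr>) -> Vec<Attr> {
    let mut seen: HashSet<String> = HashSet::with_capacity(attrs.len());
    let mut out = Vec::with_capacity(attrs.len());
    for (key, value) in attrs {
        // `insert` returns false if the key is already present,
        // so later entries with the same key are dropped here.
        if seen.insert(key.clone()) {
            out.push((key, value));
        }
    }
    out
}

/// Proposed strategy: keep every entry as given, with no key-existence checks.
fn no_dedup(attrs: Vec<Attr>) -> Vec<Attr> {
    attrs
}
```

With input `[("a", 1), ("b", 2), ("a", 3)]`, `dedup_first_wins` returns `[("a", 1), ("b", 2)]`: the value `3` that the caller passed for `a` is silently lost, which is the concern in point 1.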

I ran two benchmarks to demonstrate this. The first, Counter/AddUniqueAttr, calls `add` with 1 million unique keys; the second, Counter/AddDupeAttr, calls `add` with 1 million copies of the same key.
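The real benchmarks use criterion against the SDK's Counter. As a rough, std-only analogue (not the actual benchmark code), the sketch below times a hypothetical first-wins de-dup over a slice of unique keys and over a slice of identical keys. `dedup_keys`, `keep_all_keys`, `time_it` and `run_comparison` are illustrative names, and keys are assumed to be plain `String`s. Absolute timings will not match the numbers reported here.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical first-wins de-dup over keys only.
fn dedup_keys(keys: &[String]) -> Vec<&String> {
    let mut seen: HashSet<&String> = HashSet::with_capacity(keys.len());
    // `insert` returns false for keys already seen, so only first occurrences survive.
    let kept: Vec<&String> = keys.iter().filter(|k| seen.insert(*k)).collect();
    kept
}

/// Keep every key as-is: no per-key hash-set check.
fn keep_all_keys(keys: &[String]) -> Vec<&String> {
    keys.iter().collect()
}

/// Run `f` once, returning the elapsed time and how many keys it kept.
fn time_it(f: fn(&[String]) -> Vec<&String>, keys: &[String]) -> (Duration, usize) {
    let start = Instant::now();
    let kept = f(keys).len();
    (start.elapsed(), kept)
}

/// Compare both strategies on `n` unique keys and on `n` copies of one key.
fn run_comparison(n: usize) {
    let unique: Vec<String> = (0..n).map(|i| format!("key{i}")).collect();
    let dupes: Vec<String> = vec!["key".to_string(); n];
    for (name, keys) in [("unique", &unique), ("dupes", &dupes)] {
        let (t_dedup, kept_dedup) = time_it(dedup_keys, keys);
        let (t_all, kept_all) = time_it(keep_all_keys, keys);
        println!("{name}: de-dup {t_dedup:?} ({kept_dedup} kept), no de-dup {t_all:?} ({kept_all} kept)");
    }
}
```

Calling `run_comparison(1_000_000)` from `main` in a `--release` build prints one line per input. The unique-key case pays for a hash-set insert per key, while the duplicate-key case shrinks to a single entry, which is where de-duping can save downstream work.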

With de-duping:

```text
Counter/AddUniqueAttr   time:   [338.88 ms 342.29 ms 345.65 ms]
                        change: [+26.244% +28.995% +31.699%] (p = 0.00 < 0.05)
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Counter/AddDupeAttr     time:   [22.946 ms 23.200 ms 23.475 ms]
                        change: [-90.635% -90.503% -90.361%] (p = 0.00 < 0.05)
```

Without de-duping:

```text
Counter/AddUniqueAttr   time:   [236.92 ms 240.31 ms 244.08 ms]
                        change: [-0.0665% +1.5192% +3.3808%] (p = 0.08 > 0.05)
                        Performance has improved.
Counter/AddDupeAttr     time:   [186.02 ms 187.53 ms 189.18 ms]
                        change: [+1.8287% +2.9432% +4.0648%] (p = 0.00 < 0.05)
                        Performance has regressed.
```

Of course, de-duping gives a large speedup when attributes are duplicated, but that is probably not worth the extra time it adds when attributes are not duplicated, which is the common case.
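A back-of-envelope check of this trade-off, using the point estimates (middle values) from the runs above. The break-even model is a hypothetical simplification that treats a workload as a mix of only the two benchmarked extremes (all-unique and all-duplicate keys), weighted per benchmark run; real attribute sets usually fall somewhere in between.

```rust
/// Point estimates (ms) copied from the benchmark output in this issue.
const DEDUP_UNIQUE: f64 = 342.29;
const DEDUP_DUPES: f64 = 23.200;
const PLAIN_UNIQUE: f64 = 240.31;
const PLAIN_DUPES: f64 = 187.53;

/// Percentage slowdown of de-duping on the all-unique benchmark.
fn unique_overhead_pct() -> f64 {
    (DEDUP_UNIQUE - PLAIN_UNIQUE) / PLAIN_UNIQUE * 100.0
}

/// Speedup factor of de-duping on the all-duplicate benchmark.
fn dupe_speedup() -> f64 {
    PLAIN_DUPES / DEDUP_DUPES
}

/// Fraction p of all-duplicate runs at which de-duping breaks even, assuming
/// (hypothetically) only these two kinds of runs exist:
/// p * saved_on_dupes == (1 - p) * lost_on_unique
fn break_even_dupe_fraction() -> f64 {
    let lost = DEDUP_UNIQUE - PLAIN_UNIQUE; // about 101.98 ms per unique-key run
    let saved = PLAIN_DUPES - DEDUP_DUPES; // about 164.33 ms per duplicate-key run
    lost / (lost + saved)
}
```

This gives roughly a 42% overhead on unique keys and roughly an 8x speedup on duplicate keys. Under the two-extremes assumption, de-duping only comes out ahead if about 38% or more of the runs look like the all-duplicate case.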

Thoughts?

Labels: A-metrics (Area: issues related to metrics)
