Skip to content

Conversation

pandemicsyn
Copy link

@pandemicsyn pandemicsyn commented Dec 4, 2020

Changes the behaviour of the UT parser to collect any unknown kv pairs or unbalanced fields as a slice of unparsed entries (returned in the UserTraffic struct) rather than throwing an error. This is especially helpful since we somewhat frequently have duplicate value fields for keys.

For example - while auditing a sample of log lines for a Peloton site nearly 100% of the "key with no value" errors where from a country code field where the value was a duplicate like US, US.

2020/12/04 08:29:30 ==== Errors Seen ====
2020/12/04 08:29:30 key with no value: 6341 
2020/12/04 08:29:30 missing_sid_and_csid: 1926
2020/12/04 08:29:30 lines seen: 19058
2020/12/04 08:29:30 bytes potentially billed: 611126576
2020/12/04 08:29:30 bytes potentially missed: 45379903

@pandemicsyn pandemicsyn requested a review from a team as a code owner December 4, 2020 14:44
@pandemicsyn pandemicsyn added the type: chore work needed to keep the product and development running smoothly label Dec 4, 2020
@rybit
Copy link
Member

rybit commented Dec 4, 2020

I think that this is an ok solution, I'm wondering if it might be easier to actually not error on the case of a key with no value. We could just log a warning (or a counter?) and then move on?

rybit
rybit previously approved these changes Dec 4, 2020
@pandemicsyn
Copy link
Author

pandemicsyn commented Dec 4, 2020

Yea - we have that catch all "other" map as the default case for when we run into unknown kv pair's:

default:
if ut.Other == nil {
ut.Other = make(map[string]string)
}
ut.Other[parts[0]] = parts[1]

I tested a variation that just adds the offending "key with no value" error entries as key's with an empty value to that others map. Then folks using the parser can check that map for entries and generate log lines/metrics accordingly. I'd prefer that to adding logging/metrics to the parser directly. I'm definitely down for doing that instead of the regex. @rybit wdyt ?

@rybit
Copy link
Member

rybit commented Dec 4, 2020

yeah I think that approach works nicely. Or maybe we change the 'other' to be a list of unparsable sections, the whole bit.

@pandemicsyn pandemicsyn merged commit cd18e56 into master Dec 5, 2020
@pandemicsyn pandemicsyn deleted the chore/regex-based-extractor branch December 5, 2020 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: chore work needed to keep the product and development running smoothly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants