Skip to content
This repository has been archived by the owner on Mar 24, 2023. It is now read-only.

Add the validation for the new-profile ping on Nightly #65

Conversation

Dexterp37
Copy link
Contributor

No description provided.

@Dexterp37 Dexterp37 self-assigned this Jun 8, 2017
@Dexterp37 Dexterp37 requested a review from chutten June 8, 2017 12:21
Duplicate pings percentage (by document id): 0.21%


The 0.21% of ping duplicates is nice, compared to ~1% we usually get from the `main` and `crash` pings. However, nowdays we're running de-duplication by document id at the pipeline ingestion, so this might be a bit higher.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we have stats on duplicate documents that you can check - there is a parquet table called telemetry_duplicates_parquet available in re:dash. I did a quick check and it doesn't look like the duplicate detection flagged any duplicate new-profile pings from June 1 until today:

SELECT count(*)
FROM telemetry_duplicates_parquet
WHERE submission_date_s3 >= '20170601'
  AND fields.doc_type = 'new-profile'

returns 0 rows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Mark! I've created a re:dash query and linked it in the report.

total_duplicates_clientid = deduped_docid_count - deduped_clientid.count()
print("Duplicate pings percentage (by client id): {:.2f}%".format(pct(total_duplicates_clientid, deduped_docid_count)))
```
Duplicate pings percentage (by client id): 0.89%
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, 22. Not a whole lot to make conclusions on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. Given the nature of the 'new-profile' ping, this is sort of expected. This could just be an handful of misbehaving clients or some bug. Either way, I think we'd have to wait until Beta+1 week to have a better understanding of this.

```python
sorted_clientid_dupes = sorted(collected_clientid_reasons, key=lambda k: len(k[1]), reverse=True)
misbehaving_clients = [client_id for client_id, _ in sorted_clientid_dupes]
to_redact = subset.filter(lambda p: p.get("clientId") == misbehaving_clients[1]).collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're only looking at pings that are from the 2nd misbehaving client? Shouldn't it be p.get("clientId") in misbehaving_clients?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to a more compact form now.




Interesting! Looks like many of these pings are missing the pingsender header from the request, meaning:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but they're all from the same client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, rephrased that part to make it clearer.




Sending `new-profile` pings with reason `shutdown` seems to happen half of the times with misbehaving clients and about 10% of the times with well behaving clients. The pingsender doesn't seem to be the issue here: if we generate a ping at shutdown and try to send it with the pingsender, and fail, then it's normal for Firefox to pick it back up and send it. As long as we don't generate a new, different, `new-profile` ping for the same client. And this is something that well behaving clients don't do, so that 10% should be safe.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this follow from the numbers above? Looks like a third of pings from misbehaving clients are coming from pingsender and less than a quarter of pings from behaving clients.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. Looks like we're receiving about ~75% of the pings with the pingsender (~81% from well behaved clients). Changed this too with the correct numbers: the conclusion stays the same, looks like the pingsender in itself isn't the problem there, as most of the pings are being received from the pingsender with both good and bad behaving clients.


Sending `new-profile` pings with reason `shutdown` seems to happen half of the times with misbehaving clients and about 10% of the times with well behaving clients. The pingsender doesn't seem to be the issue here: if we generate a ping at shutdown and try to send it with the pingsender, and fail, then it's normal for Firefox to pick it back up and send it. As long as we don't generate a new, different, `new-profile` ping for the same client. And this is something that well behaving clients don't do, so that 10% should be safe.

### Does the profileCreationDate match the date we received the pings?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting analysis, but given that we know there are dupes, would it be better to compare only the delays of the first new-profile pings we receive?

(probably won't change things much)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! As expected though, this resulted in a modest improvement.

# If there's a new-profile ping with reason 'shutdown', look for the closest next
# 'main' ping with reason shutdown.
for i, p in enumerate(ordered_pings):
# Skip until we find the 'new-profile' ping.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'd be an interesting case, no? Where we send a main/shutdown ping before sending a new-profile/shutdown ping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. I've extended the case to account for that too. Looks like the 'new-profile' ping is always the first generated ping, if generated.

# Is the next ping a 'main' ping with reason 'shutdown'?
next_ping = ordered_pings[next_index]
if is_newprofile(next_ping):
return ("Duplicate 'new-profile' ping.", 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a special case of duplicates with no main/shutdown ping in between them. It is possible that there might be a new-profile/shutdown, then a main/shutdown, then a new-profile/shutdown.... but dupes are so infrequent maybe we can just gloss over that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored that part to simply count the number of times we have new-profile pings

if is_newprofile(next_ping):
return ("Duplicate 'new-profile' ping.", 1)

if next_ping.get("payload/info/reason") == "shutdown":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always true, as all non-newprofile pings are main/shutdown thanks to earlier filtering.

```python
newprofile_plus_main = first_main.union(newping_shutdown)
sorted_per_client = newprofile_plus_main.map(lambda p: (p["clientId"], [(p, p["meta/creationTimestamp"])]))\
.reduceByKey(lambda a, b: sorted(a + b, key=lambda k: k[1]))\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we guaranteed that, at shutdown, the main ping is created after the new-profile ping? The 98% below suggests that this is almost always the case, but is it guaranteed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'new-profile' should always be generated before the 'main', at shutdown. See here where we wait for the 'new-profile' to be generated before moving on to shut down TelemetrySession (which generates the 'main' ping). Although this is how it happens, we're not making any strict promise on that.

The 'new-profile' ping was scheduled at the wrong time: 0.881953867028
Duplicate 'new-profile' ping.: 0.542740841248
No 'shutdown' new-profile ping found: 0.203527815468

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing validation case: we didn't check that new-profile/startup is sent only from sessions that last longer than 30min

I don't know that we absolutely must validate that case. Up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, just added this checks. Looks good so far!

@Dexterp37 Dexterp37 force-pushed the projects/newprofile_ping_nightly_validation.kp branch from c5567e2 to ecf43c8 Compare June 12, 2017 12:25
@Dexterp37
Copy link
Contributor Author

Thanks Chris for the feedback so far. This is RFAL. I've also attempted to pin duplicate new-profile pings on crashes, but looks like we only have crash pings for few of the cases :-\ I left my attempt there so that we can run this again on Beta FWIW.

Copy link
Contributor

@chutten chutten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I look forward to seeing the beta week 1 results.

return ("Duplicate 'new-profile' ping.", 1)

if not newprofile_mask[0]:
return ("The 'new-profile' ping it's not the first ping", 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar nit: "The 'new-profile' ping is not the first ping"

@Dexterp37 Dexterp37 force-pushed the projects/newprofile_ping_nightly_validation.kp branch from ecf43c8 to 510adb2 Compare June 12, 2017 14:02
@Dexterp37
Copy link
Contributor Author

Thanks Chris!

@Dexterp37 Dexterp37 merged commit 2ebf7f6 into mozilla:master Jun 12, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants