This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

Include a window/tab/session "id" (better names are welcome) #81

Closed
philbooth opened this issue May 17, 2017 · 6 comments

@philbooth
Contributor

philbooth commented May 17, 2017

The flow id performs excellently at its stated aim of allowing us to measure user journeys across devices/windows. However, sometimes we need metrics that are specific to the window or tab the user is in at that moment. @shane-tomlinson encountered such a problem recently.

For those cases, our current model is insufficient. We need one more piece of metadata, which is some kind of non-identifiable, er, identifier (if you see what I mean), so that we can group events from a single flow into separate windows/tabs/sessions.
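To make the idea concrete, here is a minimal sketch of what such an identifier and grouping could look like. It is illustrative only: the field name `window_id`, the helper names and the event shape are assumptions, and the real content server metrics module is not written in Python.

```python
import secrets
from collections import defaultdict


def new_window_id() -> str:
    """Return a random 128-bit hex string (hypothetical window id).

    The value is pure randomness, so it encodes nothing about the
    user, the device or the browser.
    """
    return secrets.token_hex(16)


def group_by_window(events):
    """Group event types as {flow_id: {window_id: [types in order]}}.

    `events` is an iterable of dicts with the assumed keys
    "flow_id", "window_id" and "type".
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for event in events:
        grouped[event["flow_id"]][event["window_id"]].append(event["type"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {flow: dict(windows) for flow, windows in grouped.items()}
```

The id would be generated once per window or tab and attached to every event emitted from it, so events within one flow can be split by the window they came from.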

I don't think it would make any sense for such an identifier to be emitted with our back-end metrics, so fortunately no API changes would be necessary there. We'd just need to emit it from the content server metrics module and then update the Lua output script, the import scripts and the Redshift schemata.

We have some experience of adding extra fields to the CSVs before and it went okay (apart from the one bit where I ran out of disk space and briefly uploaded empty CSVs to S3 😊). Trying to make the import scripts work conditionally proved problematic last time, so I think we'd want to follow a similar path here and pad all the historical CSVs with an extra comma at the end of each line (so any future re-imports work smoothly). The extra column can be added to flow_events with default null and then we just start filling it with data as and when it's available.
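The padding step could be sketched roughly as follows. This is an assumption-laden illustration, not the script used last time: `pad_csv` is a hypothetical helper, and it goes through Python's `csv` module rather than appending a literal comma so that quoted fields containing commas or newlines survive intact.

```python
import csv
import io


def pad_csv(text: str) -> str:
    """Append one empty trailing field to every non-empty row of CSV text."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Blank lines parse as empty rows; pass them through unpadded.
        writer.writerow(row + [""] if row else row)
    return out.getvalue()
```

For large historical files it would be safer to stream each file to a temporary location and check the output is non-empty before uploading, given the disk space incident mentioned above.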

@rfk, what do you think about the above? Maybe there's a simpler solution to the problem that I don't see yet?

@rfk
Contributor

rfk commented May 17, 2017

> @shane-tomlinson encountered such a problem recently.

Link? :-)

@rfk
Contributor

rfk commented May 17, 2017

> what do you think about the above? Maybe there's a simpler solution to the problem that I don't see yet?

I'm OK with it if it's purely an extra level of detail inside a flow_id. If we get into a situation where the same window id appears in multiple flow ids, then we've got some extra correlation concerns with PII etc (e.g. we might be able to identify two users who share a single computer).
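One way to keep an eye on that concern would be a check along these lines. It is only a sketch: the `(flow_id, window_id)` pair shape and the function name are assumptions, and in practice this would more likely be a query against the imported data.

```python
from collections import defaultdict


def shared_window_ids(events):
    """Return {window_id: sorted flow ids} for ids seen in more than one flow.

    `events` is an iterable of (flow_id, window_id) pairs; a window_id of
    None (historical rows without the new column) is ignored.
    """
    flows_by_window = defaultdict(set)
    for flow_id, window_id in events:
        if window_id is not None:
            flows_by_window[window_id].add(flow_id)
    return {
        window: sorted(flows)
        for window, flows in flows_by_window.items()
        if len(flows) > 1
    }
```

An empty result would mean every window id is strictly nested inside a single flow id, which is the situation described as acceptable above.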

That said, it's not obvious to me what the concrete use-case is here. What's the specific thing we want to measure but can't using the current events?

@philbooth
Contributor Author

> Link? :-)

I don't have a link but my recollection of the conversation runs something like:

Some events occur in both the tab where the user submits the sign-in form and in the tab where the user confirms their email. We need to differentiate between the two, because reasons.

@shane-tomlinson, can you fill in the blanks? 😄

@shane-tomlinson shane-tomlinson self-assigned this May 18, 2017
@philbooth
Contributor Author

@shane-tomlinson, do we still need/want this?

@shane-tomlinson

I think we can get by without it, using distinct event names if a screen is used in multiple places.

@philbooth
Contributor Author

I'll close it out in that case, thanks!
