TBD whether UUID is set in ETL or tracker.
There are a couple of potential issues with setting the UUID in the tracker:
So I'm leaning towards:
See email thread below:
On 16 November 2012 11:14, Yali yali@snowplowanalytics.com wrote:
Hi Michael,
You're right - we've not been clear on transaction ids.
We originally added them because, occasionally, CloudFront registers the same event twice. The transaction id became a way of telling whether two similar-looking events are really the same event recorded twice. As such, it only needed to be unique within a narrow time frame.
However, having a UUID for each transaction is very desirable for analysis across the full SnowPlow data set for a particular web property. The question we're working through (and would appreciate input on) is: should that UUID be generated by the tracker, the way the current transaction id is? Or would it make more sense to generate it at the ETL phase, in which case the current transaction_id would be one input into the creation of a more robust id that really was globally unique?
If you have any thoughts on pros / cons of each approach we'd welcome your feedback. Otherwise, I'll update our current documentation to clarify the limitations of the transaction id, as currently implemented.
All the best,
Yali
On Friday, November 16, 2012 9:51:23 AM UTC, Michael Bell wrote:
Hi,
We've recently added SnowPlow to our site and have found the transaction id (txn_id) being captured is not as unique as we'd understood from the documentation. Docs state the txn_id field is:
A unique event ID. If two or more records have the same txn_id, one is a duplicate record
The JavaScript that generates the txn_id appears to simply take a 6-character substring of a random number, which is unlikely to be unique across a large enough dataset. Have we misunderstood the intention of this field, or are we missing something else?
Regards,
Mike
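To put a rough number on Mike's concern, here is a minimal sketch of the birthday-problem estimate for a short random id. It assumes the 6 characters are decimal digits drawn uniformly, giving 10**6 possible values; that is an assumption for illustration, not something confirmed from the tracker source, and the function name is hypothetical.

```python
import math


def collision_probability(n_events: int, id_space: int) -> float:
    """Approximate chance that at least two of n_events uniformly
    random ids drawn from id_space possible values are equal
    (standard birthday-problem approximation)."""
    exponent = n_events * (n_events - 1) / (2 * id_space)
    # 1 - exp(-x), computed via expm1 for accuracy when x is small
    return -math.expm1(-exponent)


# Assumption: 6 decimal digits, so one million possible ids.
ID_SPACE = 10 ** 6

for n in (100, 1_000, 10_000):
    p = collision_probability(n, ID_SPACE)
    print(f"{n:>6} events: P(at least one collision) ~ {p:.4f}")
```

Under that assumption the chance of at least one clash is already about 0.39 at 1,000 events and close to certain at 10,000. That fits both sides of the thread: an id of this size can help separate duplicates within a narrow time frame, but it cannot act as a unique event id across a full data set.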