We need a new approach to deleting personal data from a pretix instance; the current approach is not sufficient.
Current approaches
The "data shredder" concept allows plugins to register shredders. Event organizers can execute those shredders once an event is over. This includes a "download step" to export the data before it is deleted, and then a "shredding step" that actually deletes the data.
The "corona shredder" allows deleting attendee data (and order data on free orders) X days after the event date.
Challenges
The "regular" data shredder feature needs to be invoked manually after the event is over, which is not useful for large numbers of events (Z#23135940, Z#23149655) → we need something more automated, and something more suitable for organizers with a large number of events.
The "corona shredder" is not available in pretix core → the new approach needs to become a core feature.
The shredder feature is too fine-grained for actual use → we no longer need selection of data to be deleted plugin by plugin.
None of the current approaches is suitable for a ticket shop where tickets are issued continuously, either with no dates or using product-level validity (Z#23149655).
There is no guarantee that all "log entry" types are properly anonymized → we need some tracking that we have thought about every log entry type.
We need to deal with very large amounts of data (hundreds of millions of tickets in the database, up to a million per event), so tracking state on orders seems like the best option. However, even after we have shredded the personal data of an order, new data may be introduced, e.g. by an incoming webhook of a payment provider or because someone manually edits the order (Auditable state of event shredding and leftover personal identifiable information. #2498) → we need some way to shred an order again after it has been modified.
There is no way to anonymize a single order on user request → we need a "delete personal data from this order" button in the backend.
Performance of shredding on large events (100k+ orders) is problematic: the corona shredder was a frequent cause of production issues and even outages. We need to make sure the new approach scales well even when large numbers of orders need to be checked or deleted.
It's easy to forget shredding when writing a new plugin.
Key changes proposed
We deprecate all existing shredding features and start from scratch.
We need an auditable mapping from log entry types to shredding functions to verify that a shredding function exists for every type that is in active use.
We allow a retention policy to be configured for events that specifies when non-financial data should be deleted. The retention policy will automatically be executed in the background.
We add a feature to the organizer panel that allows dropping financial data for a specific time frame. We do not want to automatically delete financial data as that is not legal in most jurisdictions. For example, in Germany, a business is required to keep financial records for 10 years after the end of the year the record was created, but that does not mean you can automatically delete after 10 years. If you are, for example, currently being audited, you are not allowed to delete anything before the audit is completed. Since we do not know if the organizer is being audited, automated deletion is dangerous. But it should be an easy click once a year, not something you need to open every event for.
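The auditable mapping from log entry types to shredding functions could look roughly like the sketch below. It is only an illustration of the idea, not a proposed API: the registry, the decorator, the function names and the "example.…" action types are all hypothetical, and a real implementation would work on LogEntry rows rather than plain dicts.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical registry: log entry action type -> function that returns a
# copy of the entry's data with personal information removed.
LogShredder = Callable[[dict], dict]
LOG_SHREDDERS: Dict[str, LogShredder] = {}


def register_log_shredder(*action_types: str) -> Callable[[LogShredder], LogShredder]:
    """Register one shredding function for one or more action types."""
    def decorator(func: LogShredder) -> LogShredder:
        for action_type in action_types:
            if action_type in LOG_SHREDDERS:
                raise ValueError(f"duplicate shredder for {action_type!r}")
            LOG_SHREDDERS[action_type] = func
        return func
    return decorator


@register_log_shredder("example.order.placed")  # made-up action type
def keep_as_is(data: dict) -> dict:
    # Explicit decision: entries of this type carry no personal data.
    return dict(data)


@register_log_shredder("example.order.contact.changed")  # made-up action type
def drop_contact_fields(data: dict) -> dict:
    # Remove the (assumed) fields that hold personal data, keep the rest.
    personal = {"old_email", "new_email"}
    return {key: value for key, value in data.items() if key not in personal}


def unmapped_action_types(types_in_use: Iterable[str]) -> Set[str]:
    """Return the action types in active use that have no shredder yet."""
    return set(types_in_use) - set(LOG_SHREDDERS)
```

A check like unmapped_action_types could run in the test suite or periodically against the distinct action types found in the database. A non-empty result would mean that nobody has thought about that log entry type yet, which also makes it harder to forget shredding when writing a new plugin.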
Detail proposal
Retention policy
Every event gets a new settings area "data retention" that allows at least the following settings:
Delete non-financial personal information from tickets that have an explicit end of validity X days after their end of validity.
[In event series:] Delete non-financial personal information from tickets that have no explicit valid_until X days after the subevent.date_to.
[In non-series events:] Delete non-financial personal information from tickets that have no explicit valid_until X days after the event.date_to.
[In non-series events:] Delete non-financial personal information from tickets that have no explicit valid_until X days after the purchase date.
Delete waiting list information X days after the event / subevent.
Delete debugging information of check-ins X days after the check-in (because the raw_barcode of failed check-ins might contain personal information).
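To make the ticket-related settings above more concrete, here is a minimal sketch of how the deletion date of a single ticket might be derived from them. The class and field names are made up for illustration. It also assumes that, when both the event-date and the purchase-date setting are configured for a non-series event, the earlier deadline wins; the proposal does not decide this yet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RetentionPolicy:
    # Hypothetical setting names; None means "not configured".
    days_after_validity: Optional[int] = None
    days_after_subevent: Optional[int] = None
    days_after_event: Optional[int] = None
    days_after_purchase: Optional[int] = None


@dataclass
class Ticket:
    purchased: datetime
    valid_until: Optional[datetime] = None
    event_date_to: Optional[datetime] = None
    subevent_date_to: Optional[datetime] = None  # only set in event series


def _after(base: Optional[datetime], days: Optional[int]) -> Optional[datetime]:
    if base is None or days is None:
        return None
    return base + timedelta(days=days)


def shred_deadline(ticket: Ticket, policy: RetentionPolicy) -> Optional[datetime]:
    """Return when non-financial personal data may be deleted, or None."""
    # Tickets with an explicit end of validity only follow that setting.
    if ticket.valid_until is not None:
        return _after(ticket.valid_until, policy.days_after_validity)
    # Event series: count from the end of the subevent.
    if ticket.subevent_date_to is not None:
        return _after(ticket.subevent_date_to, policy.days_after_subevent)
    # Non-series events: event end and/or purchase date.
    # Assumption: if both are configured, the earlier deadline applies.
    candidates = [
        deadline
        for deadline in (
            _after(ticket.event_date_to, policy.days_after_event),
            _after(ticket.purchased, policy.days_after_purchase),
        )
        if deadline is not None
    ]
    return min(candidates) if candidates else None
```

For a ticket bought on 2024-01-01 for an event ending on 2024-03-01, a policy of 30 days after the event and 365 days after purchase would give 2024-03-31 under this assumption. A background job would then only need to select tickets whose deadline has passed.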
Shredding state
This is where I do not have a perfect idea yet. Somehow, at least Order, Checkin and WaitingListEntry will need some attribute to mark that they have already been cleared of personal data, and this attribute needs to be reset when the object (or its positions, log entries, …) is changed again. We already have such an attribute for LogEntry and Invoice, but these are easier because, during normal operation, they do not change after they have been created.
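One possible shape for such an attribute, purely as a starting point for discussion: a boolean on the object that shredding sets and every later modification resets. The class below is a simplified stand-in, not the real Order model, and the attribute and method names are hypothetical. In practice, changes to positions and log entries would have to go through the same reset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    # Simplified stand-in for the real model, for illustration only.
    code: str
    email: str = ""
    personal_data_cleared: bool = False  # hypothetical state attribute

    def set_email(self, email: str) -> None:
        # Any modification may reintroduce personal data, so reset the flag.
        self.email = email
        self.personal_data_cleared = False

    def shred(self) -> None:
        # Remove personal data and remember that this order is clean.
        self.email = ""
        self.personal_data_cleared = True


def orders_to_shred(orders: List[Order]) -> List[Order]:
    """Select the orders that (again) need to be cleared of personal data."""
    return [order for order in orders if not order.personal_data_cleared]
```

With a flag like this, the background job can skip orders that are already clean, and an order that receives, say, a late payment webhook is picked up again on the next run. The hard part remains making sure that every code path that modifies an order really resets the flag.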
Open questions
For some types of data, it is clear whether they are financial information (invoices yes, payment records yes, question answers no). Others are less clear, such as the order email address. Do these decisions need to be configurable to a degree, or can we get away with one-size-fits-all? → Check with lawyers.
Do we generally need to drop OrderPosition.secret on anonymization? We know of specific customers or plugins where this would be required (as the secret contains the name or user ID in an encrypted form), but in the general case it would make working with the old data much harder, e.g. if someone scans a ticket where we have already removed personal data, we might still want to know what ticket it is to present a good error message.
What about very unstructured data, such as comments on vouchers, that sometimes include personal data? Probably "not our problem" for now, but we'd need to look at the log entries of vouchers that say where a voucher was sent.
What about data that is not technically the organizer's, e.g. notes from pretixLEAD?
A further challenge, specific to event series: the "regular" data shredder feature does not work well with partially shredding event series (Offer an option to delete personal data of past events #2370, Allow deletion of Free Orders / Dates with only Free Orders #2272, Shredding data for individual dates / sub-events of an event series #2421, Automatic order anonymization for COVID-19 tracking #1713) → we need the ability to partially shred the data of past subevents.
Plugins relevant to shredding