This repository has been archived by the owner. It is now read-only.

Extension Reporting Mechanism #1

Closed
mbabker opened this issue May 2, 2018 · 38 comments

@mbabker
Collaborator

commented May 2, 2018

The framework must provide a way for extensions to report the data they collect in a way that can be easily reviewed by a site owner.

This should include:

  • A standardized manifest for reporting data (see joomla/joomla-cms#20140 for an example and further discussion)
    • Information retained in the database
    • Cookies
  • Event hooks where plugins can also process this data
  • A screen in a com_privacy component to display this information

@mbabker mbabker added the enhancement label May 2, 2018

@brianteeman

Collaborator

commented May 2, 2018

Note that many things will be beyond our control and undetectable. For example, a YouTube module will result in a tracking cookie being set, and a Google Analytics snippet placed directly in the template will also be undetectable.

@mbabker

Collaborator Author

commented May 2, 2018

Some things are definitely out of reach. In the case of extensions like https://extensions.joomla.org/extension/simple-youtube/ though they could report at least at a generic level "This iframes in a YouTube video and drags in whatever analytics/cookies are associated with that transaction". So the things we can target should be in reach, even if we have to think a little more abstract and less a specific "we set a cookie named X for this purpose", but we definitely aren't going to get in the business of parsing a template's index file to determine if it has analytics scripts.

@brianteeman

Collaborator

commented May 2, 2018

My concern (and this is true for everything) is that we are now relying on extension developers to provide the information to site owners, and in a format we can read. As that's not going to happen for every extension, we run into the liability of telling site owners that this tool will give them the relevant information for their site when in reality it may only give them a small portion of that information.

@mbabker

Collaborator Author

commented May 2, 2018

It'll be made clear in documentation and the UI that the information we're giving them is reliant on extensions reporting that data to us and that it may be an incomplete listing. The intent is not to make it sound like they will have all of the information at any time. This won't be much different to using some of the other existing GDPR extension packages which also require extension specific integrations to add their stuff.

@mbabker

Collaborator Author

commented May 2, 2018

Screenshot from WordPress' current Privacy page. Granted, the text of it is heavily geared on their system to help create a privacy policy page, but that's the type of language I think we need as it relates to communicating to users what exactly they're going to be seeing in our system.

[Image: screenshot of the WordPress Privacy page, 2018-05-02]

@brianteeman

Collaborator

commented May 2, 2018

That addresses my "concern"; I just wanted to make sure it was addressed.

@mbabker

Collaborator Author

commented May 2, 2018

https://core.trac.wordpress.org/ticket/43938 is what we'll want to follow as that is EXACTLY what you pointed out here.

@mbabker mbabker added this to To do in Core API May 3, 2018

@aDaneInSpain

commented May 3, 2018

Some things to consider:

  • Should extension developers specify if data should be deleted or anonymized?
  • Should extension owners be able to provide their own methods for anonymization and deletion?
  • Should extension owners be able to indicate if a whole row in a table (as opposed to just a field) should be deleted/anonymized?
@mbabker

Collaborator Author

commented May 3, 2018

Yes, yes, and no.

The extension developer/owner will know best what purpose their extensions are serving and the data they are collecting, and inherently should be able to best specify whether data by default should be able to be fully deleted, anonymized, or not touched at all (in contexts where other legal requirements would call for that).

As an anonymization/deletion process would be triggering data updates, it should also be possible to hook into that extension's own system to handle said updates (which would include the potential of triggering plugin events, which may not be using the same events that core uses).

As for full row anonymization, I would not do this; it should be a by-field operation (columns like created/modified dates don't really need to be random, columns which should be foreign keys should still have correct data integrity, etc.).

@aDaneInSpain

commented May 3, 2018

As for full row deletion it would be mainly for a log system for instance. Delete all rows with the user_id = x so that you do not end up with a log table full of empty fields. But your call.

@mbabker

Collaborator Author

commented May 3, 2018

To me that's something that can be left up to the extension. Because odds are if you're requesting data be deleted, you're going to just remove the full row and not only the fields containing privacy related data for exactly the reason you've pointed out. So the reporting mechanism reports sensitive fields but the code running a delete operation would make the smart decision on deleting a field versus row.
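As a rough illustration of the split described above (the report lists the sensitive fields, while the code running the delete decides between blanking fields and removing the whole row), here is a minimal Python sketch against an in-memory SQLite database. The tables `profiles` and `action_log`, their columns, and both helper functions are hypothetical and not part of Joomla; a real integration would be PHP inside the extension.

```python
import sqlite3


def anonymize_fields(conn, table, fields, user_id):
    """Blank only the reported sensitive columns; keys and dates stay intact.

    Table and column names are assumed to come from trusted extension code,
    never from user input, because they are interpolated into the SQL.
    """
    assignments = ", ".join(f"{name} = NULL" for name in fields)
    conn.execute(f"UPDATE {table} SET {assignments} WHERE user_id = ?", (user_id,))


def delete_rows(conn, table, user_id):
    """Remove whole rows, which suits log-style tables."""
    conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (user_id INTEGER, email TEXT, phone TEXT, created TEXT)")
conn.execute("CREATE TABLE action_log (user_id INTEGER, message TEXT)")
conn.execute("INSERT INTO profiles VALUES (7, 'a@example.org', '555-0100', '2018-05-03')")
conn.executemany("INSERT INTO action_log VALUES (?, ?)", [(7, "login"), (8, "login")])

# The extension decides per table: field-level for profiles, row-level for logs.
anonymize_fields(conn, "profiles", ["email", "phone"], 7)
delete_rows(conn, "action_log", 7)

profile_rows = conn.execute("SELECT * FROM profiles").fetchall()
log_rows = conn.execute("SELECT * FROM action_log").fetchall()
```

The profile row survives with its `user_id` and `created` values untouched, while the log table simply loses the user's rows instead of keeping rows full of empty fields.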

@aDaneInSpain

commented May 3, 2018

So you are saying that to delete a full row the extension should provide its own delete method (plug-in)? I guess that makes sense, although the more Joomla can handle this the more likely it will be implemented properly by 3rd party developers (who tend to like cutting corners).

@brianteeman

Collaborator

commented May 3, 2018

core joomla should never touch an extension's code or data

@mbabker

Collaborator Author

commented May 3, 2018

The default delete logic would basically be a series of UPDATE foo SET bar = NULL, baz = '' type queries. Depending on what data is in the manifest, we aren't going to be able to do things like:

  • Load a model from a component's MVC
  • Load a JTable object from a component

So all we can do is empty fields. This is why the system needs to be hookable/extensible, where a delete method can be specified. Conceptually this isn't going to be much more complex than the default case in Joomla\CMS\Form\Form::filterField(), where core checks if there is a valid callback structure in place and, if not, falls back to our sane defaults.

Or, we don't touch any data ourselves and make the entire delete/anonymize process run as a series of plugin events (onProcessUserAnonymize, onProcessUserDelete, etc.). If you don't have a plugin then you don't integrate into the system. Thinking about it some, having the entire system be plugin driven would mean we don't have to have a separate manifest file (XML) or deal with some of the quirks that would come with specifying callback functions in XML (trying to load files, for example). So for the reporting side of things, we dispatch onPrivacyCollectSensitiveFields and onPrivacyCollectCookies events that each return a collection of objects with the data in our structure (using your XML example as a good starting point for that definition).

  • all event names are just examples and definitely not final 😃
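The plugin-event reporting idea above can be sketched roughly as follows. This is a language-agnostic illustration in Python, not Joomla's PHP dispatcher: the `Dispatcher` class, the two listener functions, the `#__example_*` table names, and the dictionary layout are all assumptions made for the example. Only the event name is taken from the comment, and it is itself described there as a non-final example.

```python
from collections import defaultdict


class Dispatcher:
    """Tiny stand-in for an event dispatcher (hypothetical, not Joomla's)."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def register(self, event, listener):
        self._listeners[event].append(listener)

    def trigger(self, event, *args):
        # Each listener returns its own collection; results stay per plugin.
        return [listener(*args) for listener in self._listeners[event]]


def shop_plugin_fields():
    # Hypothetical extension reporting two sensitive columns.
    return [
        {"table": "#__example_orders", "field": "billing_email", "action": "anonymize"},
        {"table": "#__example_orders", "field": "phone", "action": "delete"},
    ]


def forum_plugin_fields():
    # Hypothetical extension reporting one sensitive column.
    return [{"table": "#__example_posts", "field": "ip_address", "action": "anonymize"}]


dispatcher = Dispatcher()
dispatcher.register("onPrivacyCollectSensitiveFields", shop_plugin_fields)
dispatcher.register("onPrivacyCollectSensitiveFields", forum_plugin_fields)

# Flatten the per-plugin collections into one report for the admin screen.
results = dispatcher.trigger("onPrivacyCollectSensitiveFields")
report = [entry for collection in results for entry in collection]
```

An extension without a registered listener contributes nothing to `report`, which matches the "no plugin, no integration" behaviour described above.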
@aDaneInSpain

commented May 3, 2018

So you are saying that all deletion in all extensions should be performed by the extensions then. Hmmm... it is not as smooth as I had hoped it would be which leads me to believe that if a 3rd party GDPR extension gains popularity and it offers to do the deleting for the extensions it might become the de facto standard over the Joomla core which would be a shame.

The core already deletes tables when a component is uninstalled (if the component tells the core to do so) - this would be no different. If the component supplies a GDPR config file then the core can "delete private user data" on user delete event.

If you leave it up to each extension then each extension developer will have to install and activate a system plug-in that detects "onUserBeforeDelete" and then perform their clean up on that event. They can already do that now and as such we need no connection with 3rd party extensions.

This is the wrong approach in my opinion. The core should be performing the deletion/anonymization when a user is deleted to make it as easy for 3rd party developers to implement GDPR compliance as possible.

@mbabker

Collaborator Author

commented May 3, 2018

I am aiming a bit conservatively right now. Because I don't want core blindly touching data and I'm not entirely convinced that core needs to be entirely responsible for crafting the SQL queries it takes to handle the update/delete actions (especially because in the case of a deletion there may be other actions that need to be taken to remove the record; in one application I maintain if we were to delete a row from the users table there are 4 tables with direct FK relations that would need to be checked to either null the key or delete the row, and 2 of those have cascading relations to other tables as well). So personally I do believe it is in our best interest to look toward a plugin driven solution for these actions versus hardcoding some logic. That's not to say, as I pointed out, we can't have a default case where we CAN do something basic, but I really feel like using that code path would have to be an exception to the rule and not the standard.

If you haven't yet, take a look into WordPress' privacy tools that are going into their release coming out this month (this commit is where the erasure framework was added, of course there have been modifications since). It is a callback/plugin driven system. That implementation strengthens my belief that our system being plugin driven versus us writing and executing SQL queries on our own is in our best interests. Even if we did try to do the queries on our own, we would still have to emit events for the reasons I pointed out above (needing to handle cleanup on related records).
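To make the foreign-key concern above concrete, here is a small Python/SQLite sketch. The schema (`users`, `orders`, `addresses`, `address_notes`) and the `on_user_delete` handler are invented for illustration and do not come from Joomla or from the application mentioned in the comment. It shows a blind delete on the users table being rejected, and an extension-owned handler that nulls one key, deletes dependent rows, and lets a cascade clean up the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),  -- nullable: keep the order, drop the link
    total REAL
);
CREATE TABLE addresses (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    street TEXT
);
CREATE TABLE address_notes (
    id INTEGER PRIMARY KEY,
    address_id INTEGER NOT NULL REFERENCES addresses(id) ON DELETE CASCADE,
    note TEXT
);
INSERT INTO users VALUES (1, 'Alice');
INSERT INTO orders VALUES (10, 1, 99.5);
INSERT INTO addresses VALUES (20, 1, '1 Main St');
INSERT INTO address_notes VALUES (30, 20, 'ring twice');
""")

# A generic "DELETE FROM users" knows nothing about the related tables.
try:
    conn.execute("DELETE FROM users WHERE id = 1")
    blind_delete_failed = False
except sqlite3.IntegrityError:
    blind_delete_failed = True
conn.rollback()


def on_user_delete(conn, user_id):
    """Extension-owned cleanup: only the extension knows the right order."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("UPDATE orders SET user_id = NULL WHERE user_id = ?", (user_id,))
        conn.execute("DELETE FROM addresses WHERE user_id = ?", (user_id,))  # cascades to notes
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


on_user_delete(conn, 1)
tables = ("users", "orders", "addresses", "address_notes")
remaining = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}
order_owner = conn.execute("SELECT user_id FROM orders").fetchone()[0]
```

The order record is kept (with its user link nulled) while the address data is removed, a distinction that a manifest of "sensitive fields" alone could not express.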

@laoneo

Member

commented May 3, 2018

To stay safe I would also prefer to forward the job to the extension. It can then also clean up some relations if required or do some other cleanup tasks.

@Ruud68

Contributor

commented May 3, 2018

Hi, as an extension developer I would favor a plugin approach, and by doing that make it my responsibility to provide the information or delete the information. Especially when there are linked tables, things could get messy. 'Mess' that I would have to give support to when things stop working as expected. The JED could have a badge stating the extension's GDPR compliance (simple yes / no), just like there is now an indicator if the Joomla updater is used.

@Ruud68 Ruud68 referenced this issue May 3, 2018

Closed

Data Download #3

@JoomliC

Contributor

commented May 3, 2018

I think it could be a plugin-driven system (by 3rd party extensions) with a manifest install statement, as there is for the update server, to get the information into com_privacy.

  • The 3rd party extension declares the private data that could be stored (e.g. list of information collected by the extension, purpose...)
  • The Privacy component collects this information from the install/update manifest (and lists it, as is done for Update Sites, by giving the needed information. Table #__privacy_extensions)
  • The 3rd party extension provides a privacy plugin (maybe creating a group for privacy plugins?)
  • Allow the 3rd party privacy plugin to communicate/link information to com_privacy to complete the core process information given to the user.

Just a not-yet-clear idea... but maybe an "Update Sites"-like process?

@mbabker

Collaborator Author

commented May 3, 2018

I think it could be a plugin-driven system (by 3rd party extensions) with a manifest install statement, as there is for the update server, to get the information into com_privacy.

That's kind of the direction I'm thinking of going, but if we are making everything plugin driven I don't see the point in having a standalone manifest file with this information also; this could be compiled directly in the plugin too. Since it's the first thing I can think of directly related to something I'm working with near daily: Akeeba could provide a plg_privacy_ars for Release System that hooks all the things (providing information about what data it collects, anonymize/delete actions, export actions, etc.).

@mbabker mbabker added the help wanted label May 4, 2018

@sandewt

commented May 4, 2018

Consideration

Perhaps this topic has already been discussed?

Under the GDPR, IP addresses may count as privacy-sensitive information. Joomla stores them in its log file(s).

It would be nice to have the option of keeping them for a limited time only, rather than endlessly as is the case now, and removing them after a defined period.

@brianteeman

Collaborator

commented May 4, 2018

Which log files are you talking about? If you mean the server (Apache) log files, then that has nothing to do with Joomla. How long they are stored etc. is something for you to discuss with your web host. Joomla doesn't have any control over them.

@sandewt

commented May 4, 2018

@brianteeman

I mean the log files in ...\Joomla\administrator\logs\ e.g. error.php

#
#<?php die('Forbidden.'); ?>
#Date: 2018-05-01 11:06:28 UTC
#Software: Joomla Platform 13.1.0 Stable [ Curiosity ] 24-Apr-2013 00:00 GMT

#Fields: datetime	priority clientip	category	message
2018-05-01T11:06:28+00:00	INFO ::1	joomlafailure	Username and password do not match or you do not have an account yet.
2018-05-01T11:09:06+00:00	INFO ::1	joomlafailure	Username and password do not match or you do not have an account yet.
2018-05-01T11:10:40+00:00	INFO ::1	joomlafailure	Username and password do not match or you do not have an account yet.
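
The retention idea raised for this log format could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `prune_log` is a hypothetical helper, not a Joomla feature, and it assumes each entry starts with an ISO 8601 timestamp (as in the excerpt above) while header lines start with `#`.

```python
from datetime import datetime, timedelta, timezone


def prune_log(lines, now, max_age_days):
    """Keep header lines and entries newer than the retention window."""
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            kept.append(line)  # header or blank line: always keep
            continue
        # The first whitespace-separated field is the entry's timestamp.
        stamp = datetime.fromisoformat(line.split(None, 1)[0])
        if stamp >= cutoff:
            kept.append(line)
    return kept


sample = [
    "#Fields: datetime\tpriority\tclientip\tcategory\tmessage",
    "2018-03-01T09:00:00+00:00\tINFO\t::1\tjoomlafailure\tUsername and password do not match.",
    "2018-05-01T11:06:28+00:00\tINFO\t::1\tjoomlafailure\tUsername and password do not match.",
]
now = datetime(2018, 5, 4, tzinfo=timezone.utc)
pruned = prune_log(sample, now, max_age_days=30)
```

With a 30-day window evaluated on 2018-05-04, the March entry is dropped and the header plus the May entry remain.
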
@brianteeman

Collaborator

commented May 4, 2018

@sandewt please create a NEW issue for that - it has nothing to do with "Extension Reporting Mechanism"

@sandewt

commented May 4, 2018

@brianteeman

OOPS.....

@sakiss

commented May 5, 2018

Leaving the deletion logic to the 3rd party extensions leaves room for error and probably an identical logic will be shared amongst the extensions.

@mbabker I understand your point about responsibility for such actions, hence if there is a manifest where the fields of specific table/s are specified as deletable/anonymizable, then the responsibility still belongs to the extension devs.

Seems a pretty easy and common task that could be carried out by the core and also could be extensible by using plugin triggers as mentioned.
From my point of view the RUD tasks should be carried out by the core.

@JoomliC

Contributor

commented May 5, 2018

I don't see the point in having a standalone manifest file with this information also

I didn't mean a new XML file, but the install manifest file already used by all existing extensions.
This would allow collecting it, e.g. in the extensions table or in a linked com_privacy table, so the provided information can easily be displayed in com_privacy...

But maybe you would rather have nothing like that, and instead a plugin event handler in the com_privacy control panel that allows a plugin to inform the user what the 3rd party extension does and can do?

The point is just to find the best way for Joomla to learn from each 3rd party extension whether it integrates with Joomla privacy, and/or what it does or does not do.

@mbabker

Collaborator Author

commented May 5, 2018

But maybe you would rather have nothing like that, and instead a plugin event handler in the com_privacy control panel that allows a plugin to inform the user what the 3rd party extension does and can do?

The point is just to find the best way for Joomla to learn from each 3rd party extension whether it integrates with Joomla privacy, and/or what it does or does not do.

I'm leaning toward just doing as much in the plugin as possible. Since we're essentially going to require a new plugin in the privacy group, I'm thinking we should steer this so you have one plugin for a complete extension package (so there'd be a plg_privacy_content covering com_content, the voting plugin, and whatever other elements are needed). Plus, if we can have one plugin handle reporting requirements for multiple individual extensions, that would be a bit more efficient than having a separate standalone manifest file in com_content, the voting plugin, and every individual extension.

@dirigit

commented May 6, 2018

After reading a bunch of threads and related documentation, and getting a lot of education without becoming a guru, I don't want to disturb much, but Joomla currently has a problem with already existing laws in several regions of the world:

At the moment the core system has no privacy by design.

GDPR is only one issue on the way to becoming compliant. Released in 2016, it has got the ball rolling now. There is an upcoming ePrivacy regulation which is not yet released or active (it should become active together with GDPR).

In the draft of this regulation there is a clear requirement to get the consent of the user (visitor) before anything is saved on the user's side (cookie, whatever). No consent - no storage. Hard times, especially for the advertising industry ...

GDPR says consent of children is invalid when they are too young (Art. 8 default minimum age: 16, which member states may lower to no less than 13; e.g. Austria: 14, Germany: 16, ...).

Have a look at the U.S.A. and you'll find different minimum ages for valid consent as well. I stumbled over California, to name only one example of such requirements in existing and active law.

Unsorted list of further thoughts:

Anonymous storage of IPs - a simple solution could be to introduce a setting and a related function to say, e.g., we invalidate xy bytes of IPv4 addresses. IPv6 is next to be considered. Have a look at Matomo (formerly Piwik) for an example implementation.

Plugin based handling - what about cache?
AFAIR it needs a module to manipulate the Joomla cache.

Deletion of user data - isn't it enough to invalidate the data, e.g. with Xs, to avoid empty rows?
Backups of the database are the website owner's problem.

Exporting user data to fulfill the GDPR requirement on data transfer - AFAIK a CSV export is sufficient. Performing an export is no task for users - it's a task for administration. It should be core functionality for Joomla users.

Extensions might save user data themselves and must handle that data in accordance with the law to be compliant. E.g. a shop forcing a customer to register at Joomla cannot be compliant (GDPR 7.4 and 6.1.1). Keywords are "free" and "coupling" in this relation. Nevertheless a customer could be a Joomla user as well.

Different cookies - there are multiple kinds of cookies possible. There should be core functionality to be able to suppress system cookies and extension cookies, at least with the help of Joomla's core (privacy by design).

More related to extension developers:
Links to facebook and the like - a nice (and working) approach for using it in accordance to GDPR and similar regulations can be found at https://github.com/heiseonline/shariff.
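
On the IP anonymization suggestion earlier in this comment (invalidating some bytes of the address), a minimal Python sketch with the standard `ipaddress` module might look like this. The function name and the default prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for illustration; they are not taken from Joomla or Matomo, and a site would pick its own values.

```python
import ipaddress


def anonymize_ip(address, v4_prefix=24, v6_prefix=48):
    """Zero the host part of an address, keeping only the leading prefix bits."""
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


masked_v4 = anonymize_ip("203.0.113.77")                  # last byte zeroed
masked_v4_wide = anonymize_ip("203.0.113.77", v4_prefix=16)  # last two bytes zeroed
masked_v6 = anonymize_ip("2001:db8:abcd:12:34:56:78:9a")  # keeps the first 48 bits
```

Exposing the prefix length as a setting corresponds to the "invalidate xy bytes" option proposed above.
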

@brianteeman

Collaborator

commented May 6, 2018

@dirigit thank you for taking the time to comment. It would be really helpful if you could check all the existing issues to see if your issues are already listed and being addressed AND if any of them are not then to create a brand new issue report for each of them

@aDaneInSpain

commented May 7, 2018

We will end up with a lot of duplicated and copy pasted code if each extension (including the core ones) is to handle its own clean up.

To me this should simply be adding a section to the manifest of a plug-in and the core should handle everything else (unless the manifest specifically supplies its own methods for deleting).

Loading plug-ins for each extension would potentially lead to performance degradation and definitely lead to duplicated code which is never good.

@mbabker

Collaborator Author

commented May 7, 2018

We will end up with a lot of duplicated and copy pasted code if each extension (including the core ones) is to handle its own clean up.

To me this should simply be adding a section to the manifest of a plug-in and the core should handle everything else (unless the manifest specifically supplies its own methods for deleting).

This is too risky. As has been pointed out here, even if core itself could write and execute the queries to be run based on a manifest, the processing would still have to be handed over to extensions to perform additional cleanup (again, go back to all the examples where you have records with foreign key relations to other tables that would need to be processed). If core blindly writes and executes UPDATE/DELETE queries then core WILL break the database. Yes, it will result in some duplicated code between extensions, BUT, this ensures that the data is handled correctly. It is more important that the data is handled correctly than ensuring the count of lines of code remains small.

Loading plug-ins for each extension would potentially lead to performance degradation and definitely lead to duplicated code which is never good.

The only global performance degradation will come in the database query loading all of the plugin records from the database at the beginning of the request. Otherwise, like the rest of Joomla, the plugin group will only be loaded on demand.
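
The on-demand loading argument above can be sketched as follows. This is a simplified Python model, not Joomla's plugin loader: `PluginRegistry`, `import_group`, and the sample records are hypothetical. The cheap plugin records are available up front, but a group's plugin objects are only built the first time that group is requested.

```python
class PluginRegistry:
    """Holds cheap plugin records; builds plugin objects per group on demand."""

    def __init__(self, records, factories):
        self._records = records      # (group, name) pairs, loaded once per request
        self._factories = factories  # name -> callable that builds the plugin
        self._loaded = {}            # group -> list of built plugins

    def import_group(self, group):
        if group not in self._loaded:
            self._loaded[group] = [
                self._factories[name]()
                for record_group, name in self._records
                if record_group == group
            ]
        return self._loaded[group]


built = []  # records which plugins were actually instantiated


def make_factory(name):
    def factory():
        built.append(name)
        return name
    return factory


records = [("system", "debug"), ("privacy", "content"), ("privacy", "contact")]
factories = {name: make_factory(name) for _, name in records}
registry = PluginRegistry(records, factories)

built_before = list(built)        # nothing is built just by having the records
registry.import_group("privacy")
registry.import_group("privacy")  # second call reuses the cached group
```

Requests that never touch the privacy group never pay for building its plugins, and repeated triggers within one request build each plugin only once.
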

@dirigit

commented May 9, 2018

@brianteeman Thank you for kind response.

Sadly there seems to be a general tendency to introduce privacy as an add-on to Joomla. Such a solution would be a patchwork with unforeseeable risks from the beginning. The risk can be performance, security, take whatever you want. Dealing with cache is only one of the related technical problems.

For me it looks like putting the cart before the horse. Giving privacy the same priority as security would be a long-term solution IMHO.

Now, to know what has to be done and what has been implemented, there should be some kind of list of requirements. I'll try to write some lines here and to cover the issues found in several regions of the world.

  1. Free consent of the visitor before setting any cookie or the like
    1.a Problem: Age of the visitor is an issue in multiple regions
    1.b The user can change their mind and want to change a previous decision

  2. Keep collected data at the bare minimum needed to perform a task. Site owner is responsible!
    2.a Invalidation at IP collection (IPv4, IPv6)!

In case of acceptance

  1. Information to the user - what kind of data is saved on the visitor's side (cookies etc.) or in the system, in detail

  2. Export all user related data (common format like csv).

  3. Delete user data - Invalidation should be sufficient to avoid technical problems in db (autoincrement :()

To be continued. I will try to find out what is implemented at the moment when time permits.

cu, diri

@brianteeman

Collaborator

commented May 9, 2018

@dirigit again please review what has already been noted before commenting further. Most if not all has already been addressed.

@jomres

commented May 14, 2018

I'll back this approach. As Jomres works on both WP and J natively I've had to add my own methods for reporting and anonymising data. Hooking into J should, for me, be a formality. Just tell me to anonymise, give me a cms user id and I can handle the rest.

Or, we don't touch any data ourselves and make the entire delete/anonymize process run as a series of plugin events (onProcessUserAnonymize, onProcessUserDelete, etc.)

@mbabker mbabker added the In Progress label May 16, 2018

@mbabker mbabker self-assigned this May 16, 2018

@mbabker mbabker moved this from To do to In progress in Core API May 16, 2018

@mbabker

Collaborator Author

commented May 17, 2018

#37 is the first extension integration point being proposed, and probably the easiest but to me one of the most beneficial as this is going to provide to site administrators a central location to get specific details about what both the core platform and their installed/enabled extensions are doing as it relates to potential privacy related matters. I also think as a lower priority feature we should think about a similar type of hook for the frontend at least when requesting your information be removed/anonymized so there's some clarity for the requesting user as to what exactly is going to happen, but that'll be long after all the other admin and architecture stuff is in place.

I'll get to data processing events next (#3 and probably other stuff that hasn't been covered in an issue yet).

@ankush-maherwal

commented May 22, 2018

I also favour the plugin approach to handling the data deletion requests of site users. Depending on the data the extension stores, extension developers can implement a plugin to delete or anonymize the user's data.

We have documented a solution around deboarding the users from the site, the doc specifies the way to store the consents of the users and the triggers which can be used in deleting or anonymizing the user's data.

https://github.com/techjoomla/user-deboarding/wiki

mbabker pushed a commit that referenced this issue Jun 1, 2018

Merge pull request #1 from JoomliC/patch-2
Add missing string COM_PRIVACY_MSG_CONSENT_NO_CONSENT
@mbabker

Collaborator Author

commented Jun 16, 2018

We should be covered now.
