Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove files from MongoDB after the associated document is deleted #6780

Open
vegidio opened this issue Jul 8, 2020 · 12 comments
Open

Remove files from MongoDB after the associated document is deleted #6780

vegidio opened this issue Jul 8, 2020 · 12 comments
Labels
type:feature New feature or improvement of existing feature

Comments

@vegidio
Copy link

vegidio commented Jul 8, 2020

Disclaimer: this is a feature request, but I'm willing to contribute by coding it myself since I already coded a similar solution in many of my projects that already use the Parse Server.


Is your feature request related to a problem? Please describe.
I've been using Parse for quite some time in different projects, but when I create a document with File fields, this becomes a problem when the document or when the field itself is removed/replaced using one of the Parse SDKs (JS, Android, etc) because the reference to the file is removed, but the actual file remains there. After some time the database will be full of orphan files that only occupy space.

Describe the solution you'd like
To solve this problem I created a couple of triggers in my Parse projects (Parse.Cloud.afterSave, Parse.Cloud.afterDelete, etc) to properly remove the files whenever the document that they are associated with is deleted or when the field is remove/updated.

The way I implemented this was by creating an environment variable called DELETE_ORPHANS_FROM where I put the list of classes that should have the files removed when a document is deleted/updated. Like this: DELETE_ORPHANS_FROM=Post,Comment. So in my cloud code I read the content of this environment variable and send a Parse.Cloud.httpRequest to delete any orphan file in the classes that I listed there.

The orphan files in the classes that are not listed in this variable are of course untouched. This gives flexibility to the users if for some reason they want to keep the orphan files in certain classes.

Additional context
Before I start working on this I would like to hear from the maintainers and other members of the community what they think of this new feature. Maybe there is a reason why it was not implemented yet. Or maybe the solution that I'm proposing is not optimal, despite the fact that it works in my projects

Please let me know your thoughts and depending on the feedback I will start working on this.

@mtrezza mtrezza added the type:feature New feature or improvement of existing feature label Jul 8, 2020
@mtrezza
Copy link
Member

mtrezza commented Jul 8, 2020

I think your suggested feature would be a useful addition to Parse Server.

There have been many questions posted about the fact that files are not deleted from the file storage when the referencing Parse Object is deleted. I have come across similar solutions as you suggest in your implementation that works as expected. Adding this as a feature would certainly be welcomed by the community.

I think the 3 most commonly expected behaviors would be:

  • a) Don't delete a file when the referencing object is deleted (what we have currently)
  • b) Delete a file when the referencing object is deleted (for 1-1 relations, easier to implement)
  • c) Delete a file when all referencing objects are deleted (for 1-n relations, harder to implement)

Actually c) covers b) but is more difficult to implement. If you want to to start with b), it would make sense to explicitly define the classes (ideally in regex) in the Parse Server configuration, so that projects that use a 1-1 relations in some classes and a 1-n relations in others can use the feature at least for some of their classes.

If you want to shoot straight for c) you would most likely implement a file reference counter so that Parse Server knows when a file is not referenced anymore and deletes it.

Whichever solution you go for, we'd be happy to review your PR.

@vegidio
Copy link
Author

vegidio commented Jul 10, 2020

Thanks for the input, I appreciate it.

Indeed option c) seems to be best one to implement, but let me dive into the code first and see what I find there. I will probably start working on this next week.

@mtrezza
Copy link
Member

mtrezza commented Jul 14, 2020

Great, let me know whenever you need any guidance.

@Moumouls
Copy link
Member

Really interesting addition to clean up GridFS or some linked object storages !

@mtrezza
Copy link
Member

mtrezza commented Jul 23, 2020

Really interesting addition to clean up GridFS or some linked object storages !

A "clean-up", as I understand it, would also delete unreferenced files retroactively from the file storage. That means scanning through the files in the storage and delete the unreferenced ones. That's a (one time) script with its own complexities therefore I am not sure this should be mixed into this PR, unless someone comes up with an ingenious concept. Such a script existed for Parse.com, I wonder if it's still around somewhere or anyone brought it over after the shutdown.

@mtrezza
Copy link
Member

mtrezza commented Sep 3, 2020

@vegidio Are you still working on this? If you need any guidance, please feel free to reach out.

@dblythy
Copy link
Member

dblythy commented Dec 15, 2020

I am happy to work on this if you're not @vegidio, as I've worked on similar features.

How do we see the best approach to tackling this? Create a new class _FileObject and track references?

@dblythy
Copy link
Member

dblythy commented Dec 20, 2020

I am going to work on a PR that:

-Tracking of file references for cleanup
-CLPs via schema
-file get ACLs
-authorisation of GET requests using a temporary token store
-Restrictions on save and get via CLP

Although most of these features cover different issues, most of them depend on a _File class.

We could easily add an option to delete the file on the destroy hook of a Parse Object, if the file references are 0. However, this provides some challenges for existing deployments as the _File object is created on create of a file, and the reference count would change on the create / update event of a Parse.Object.

@Moumouls would you or @mtrezza have any suggestions around how to build the _File class for existing deployments? The only thing I can think of is creating an example script for users to run on update to the newest version that links _File objects with their respective parent objects.

@Moumouls
Copy link
Member

Good question @dblythy , here 2 options is see:

  • Avoid migration and support both systems (and let developer migrate when he want) during one major version
  • Force migration on the new Parse Server version

For continuous deployment and avoid a huge breaking change, we should may be first try to support both systems (easy to detect via Parse.File type or a Parse.File pointer on a field). I'm in favor in a non breaking change strategy to let developers migrate at their own speed 😃

@Moumouls
Copy link
Member

Moumouls commented Dec 20, 2020

Here @dblythy my point of view about the next Parse.File system:

  • Introduce _File default standard class (like, _User, _Session,etc...)
  • _File class has : objectId, ACL, createdAt, updatedAt, url fields (like _User, a developer has the choice to add fields to the class)
  • _File class use the standard security system ACL/CLP
  • _File class have a special deletePolicy property to let developer control how files are deleted from S3 etc... (deletePolicy: 'keep', noMoreUsed, etc...)
  • _File class now use standard beforeSave, beforeDelete etc...
  • Parse.File is SDK act like Parse.User
  • User can set up a File Pointer or a File Pointer Array on a Parse.Object with standard system to create/link pointers
  • Add the File tab to Parse Dashboard

Note here i have an idea on how support both systems ate the same time easily without breaking all things.
We can use the current server File implementation for the url field into the File class
Then File class will be just a new overlay, and the old File system will work on current Parse.Objects.

@dblythy
Copy link
Member

dblythy commented Dec 20, 2020

Could we also depreciate the file triggers in place of Parse.Cloud.beforeSave(Parse.File), or is there no benefit in this? Temporary token store could allow Parse.Cloud.beforeFind of files too (lookup which user the token is associated with, use that info in the beforeFind trigger).

It wouldn't be that it would be a breaking change for existing deployments, it's just that the "references" to files wouldn't be accurate on existing deployments, as the reference count would be stored on creation / update of a Parse.Object, and it would be difficult to know that count for current deployments.

It would mean that without the correct "building" of a _File collection, some of the new features wouldn't work properly. For example:

Remove files from MongoDB after file is deleted would require:

  1. Create _File on create of Parse.File, set "references" to 0, and "File" to the associated Parse.File.
  2. When a Parse.Object is created/updated, loop through keys and look for files.
  3. If there is a file, lookup the _File associated and increment "references"

When a Parse.Object is deleted:

  1. Lookup associated _File
  2. Decrement "references"
  3. If "references" is 0, delete associated File

The problem is that if the _File class is created on create of a Parse.File, how to create the _File class for existing Parse Servers, when the files are kept in a number of different locations, and references could be complicated (e.g they could have Files set as user.profile.image.file.

@dblythy
Copy link
Member

dblythy commented Dec 20, 2020

  • _File class has : objectId, ACL, createdAt, updatedAt, url fields (like _User, a developer has the choice to add fields to the class)

So you'd be able to access file.set('key','val')?

I think for simplicity the _File class should be a special internal class that only has objectId, ACL, createdAt, updatedAt, createdBy, and maybe fileSize/ fileType (isn't url is dependant on publicServerURL?), and the only change to the SDKs should be the ability to set an ACL on a Parse.File. I think this way we can get these features built internally without causing any breaking changes or major updates. That's just my thoughts though.

@dblythy dblythy mentioned this issue Dec 20, 2020
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:feature New feature or improvement of existing feature
Projects
None yet
Development

No branches or pull requests

4 participants