"Clean Up Files" Feature #1023

Closed
natanrolnik opened this issue Mar 14, 2016 · 42 comments

@natanrolnik
Collaborator

commented Mar 14, 2016

One of the features I liked on hosted Parse was the Clean Up Files button in the settings. It deleted every stored file (in S3, for example) that was no longer referenced by a PFFile. I liked it especially because it allowed us to save on unused/unneeded resources.

Maybe a REST call using the master key would be enough initially, with possible integration into parse-dashboard in the future?

I know it's lower priority compared to the features/fixes that are being developed, but that would be great to have.

@natanrolnik natanrolnik changed the title from "Allow deleting unused files" to "Clean Up Files" Mar 15, 2016

@natanrolnik natanrolnik changed the title from "Clean Up Files" to "Clean Up Files" Feature Mar 15, 2016

@gfosco

Collaborator

commented Mar 17, 2016

This would be pretty difficult actually, and would need to be built for each specific Files adapter. Right now, there's no 'listing' of what files exist through the adapter.

@natario1

commented Mar 17, 2016

+1 , agree with the need.

@ckarmy

commented Jul 18, 2016

Is it possible to clean up the unused files stored in GridStore now?

@hramos hramos added the enhancement label Sep 6, 2016

@yorkwang

commented Sep 28, 2016

+1, It's a very useful feature.

@Lokiitzz

commented Oct 3, 2016

+1, It would be nice.

@umair6

commented Oct 19, 2016

+1

@abdulwasayabbasi

commented Oct 19, 2016

+1 very much needed

@JoseVigil

commented Nov 21, 2016

+1

@natario1

commented Nov 23, 2016

Just asking: how many of you ever actually needed a file after deleting pointers to it?

I feel the most common use of files is “if I delete the pointer, I don’t need the file anymore”. If this is the case, why not make it the default in parse-server?

I mean that after any object is deleted, all of its files are processed and adapter.deleteFile() is called for each. This could be opt-in / opt-out in the ParseServer constructor, and is way easier than a complete “clean up” feature.

Given how tricky the full task is, it would also be cool if parse-server kept a Files table with url and usage_count, to simplify all the rest.
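
A minimal sketch of the auto-delete idea above, assuming file fields appear in an object's JSON in the REST form `{ "__type": "File", "name": ..., "url": ... }`. The names `collectFileNames` and `deleteFilesOfObject` are hypothetical, and `deleteFile` stands in for whatever the configured files adapter exposes; none of this is existing parse-server code.

```javascript
// Recursively collect the names of all File values found in an object's JSON,
// including files nested inside arrays or embedded objects.
function collectFileNames(value, found = new Set()) {
  if (Array.isArray(value)) {
    for (const item of value) collectFileNames(item, found);
  } else if (value !== null && typeof value === 'object') {
    if (value.__type === 'File' && typeof value.name === 'string') {
      found.add(value.name);
    } else {
      for (const key of Object.keys(value)) collectFileNames(value[key], found);
    }
  }
  return found;
}

// Hypothetical after-delete step: delete every file the removed object pointed to.
// `deleteFile(name)` is an assumed callback, not a confirmed adapter signature.
async function deleteFilesOfObject(objectJSON, deleteFile) {
  const names = [...collectFileNames(objectJSON)];
  await Promise.all(names.map((name) => deleteFile(name)));
  return names;
}
```

Note that this sketch does not check whether another object still references the same file, which is the case a usage_count column would have to cover.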

@abdulwasayabbasi

commented Nov 23, 2016

@natario1
Just to answer your question: I want to keep the files because of the intermediate state.
For example, consider a mobile game that uses a Parse backend to store zip packages used in the game.
When we delete or replace packages with new ones in Parse Dashboard, there is a period during which users who still have the old package URL stored locally in their app/game will run into issues until the new configs/URLs are loaded.
But I also want to keep my database as small as possible, so at the right time I will delete the old packages.

@natario1

commented Nov 23, 2016

@abdulwasayabbasi makes sense, thank you. Just wondering how frequent that is.

Your use case would not be bothered by an ‘auto delete’ feature, since you are just updating the file field. To take advantage of it you would have to create a new object with the new package file, and delete the older object when you feel safe, so the old file gets auto deleted.

@Lokiitzz

commented Dec 5, 2016

I made my own "clean file". Maybe it could help someone!

https://gist.github.com/Lokiitzz/6afbf0573665d3170ffb1e83565a0fef

Be careful :)

@davimacedo

Member

commented Dec 5, 2016

Why not a PR to Parse Server? :)

@flovilmart

Contributor

commented Dec 5, 2016

The code won't work on the server as it loads all objects in memory.
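
One way around the memory concern is to walk the data in fixed-size batches and keep only the file names. The sketch below assumes a hypothetical `fetchBatch(afterId, limit)` callback that returns objects ordered by `objectId`; it is not taken from the gist or from parse-server.

```javascript
// Yields objects in fixed-size batches so only one batch is held in memory.
// `fetchBatch(afterId, limit)` is a hypothetical callback returning up to
// `limit` objects whose objectId sorts after `afterId`, ordered by objectId.
async function* inBatches(fetchBatch, limit = 100) {
  let afterId = null;
  for (;;) {
    const batch = await fetchBatch(afterId, limit);
    if (batch.length === 0) return;
    yield batch;
    afterId = batch[batch.length - 1].objectId;
  }
}

// Keep only the (small) file names instead of every loaded object.
async function referencedFileNames(fetchBatch, fileField) {
  const names = new Set();
  for await (const batch of inBatches(fetchBatch)) {
    for (const obj of batch) {
      const file = obj[fileField];
      if (file && file.name) names.add(file.name);
    }
  }
  return names;
}
```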

@davimacedo

Member

commented Dec 5, 2016

yes.. you're right. I didn't check it before.

@flovilmart

Contributor

commented Dec 5, 2016

For those features, I'd love to see command line tools more than just another endpoint that requires maintenance.

@mrmarcsmith

Contributor

commented Dec 30, 2016

Why not pass an "auto-delete-files" flag to the server on startup, so that when an individual file pointer is deleted or replaced, the server deletes the file? This feature would help the 50% of people who only use PFFiles for profile pictures (files that won't be needed after deletion or replacement) while leaving the other 50%, who want fine-grained control, unaffected because they didn't pass the flag. Would this be a valid solution?

@funkenstrahlen

Contributor

commented Feb 7, 2017

I also have this problem. I deleted a lot of rows in my MongoDB database with Parse Dashboard, including the references to many images. Now I am unable to find the images and clean them up. Is there any other (manual) way?

I expected Parse Dashboard to clean up PFFiles before it removes the references to them.

@respectTheCode

commented Mar 30, 2017

Any progress on this?

@flovilmart

Contributor

commented Mar 30, 2017

Not yet. This is not a feature that is being actively worked on, but a pull request or a separate project could take care of it.

@respectTheCode

commented Mar 30, 2017

Depending on how you look at this it is either an undocumented "feature" or a huge bug. Either way it has huge and expensive consequences that should at the very least be well documented.

@flovilmart

Contributor

commented Mar 30, 2017

> undocumented "feature" or a huge bug

What do you mean by that?

This is neither documented nor a bug; it's just not implemented: neither listing the missing files nor deleting an existing file through the files adapters.

Because a file could be referenced by multiple Objects, we don't keep a reference count on them.
Steps are relatively easy to describe:

  • list all files present in the DB
  • list all files present in the bucket/storage
  • for each file present in the storage, if it is missing from the list of files in the DB, delete it from storage.

However, this is not trivial to implement.
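
The comparison step in the list above can be sketched as a set difference. This is only an illustration under stated assumptions: both lists are taken as already gathered, `findOrphanedFiles` and `cleanUpFiles` are hypothetical names, and `deleteFile` is an assumed callback rather than a confirmed adapter method.

```javascript
// Returns the names present in storage but not referenced from the database.
function findOrphanedFiles(storedNames, referencedNames) {
  const referenced = new Set(referencedNames);
  return storedNames.filter((name) => !referenced.has(name));
}

// With dryRun (the default) the orphans are only reported, not deleted.
// `deleteFile(name)` is a hypothetical callback supplied by the caller.
async function cleanUpFiles(storedNames, referencedNames, deleteFile, dryRun = true) {
  const orphans = findOrphanedFiles(storedNames, referencedNames);
  if (!dryRun) {
    for (const name of orphans) await deleteFile(name);
  }
  return orphans;
}
```

The hard parts are the two listing steps, which this sketch leaves out. A file uploaded just before the scan but not yet saved on any object would also look orphaned here, so a real tool would likely need a grace period.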

@umair6

commented Mar 31, 2017

If you are using MongoDB, then as a workaround you can write a simple script to delete unreferenced file chunks directly from MongoDB.
db_cleanup_script.js.zip
Make sure mongo is installed and running before running this script.

@Droppix

commented Apr 5, 2017

+1

@respectTheCode

commented Apr 5, 2017

If memory were handled the same way, this would be a memory leak. I guess you could call this a file leak.

A cleanup script is a workable solution to a one-time problem, but this is not a one-time problem. To use a cleanup script in production you have to set up, maintain and monitor the infrastructure to run the cleanup script on a schedule. Then you need to monitor the impact of running that script and adjust the schedule, scale the servers and/or throttle the script to meet your needs. This is all very dependent on your exact use case and could change as your product and users change. This means constant monitoring is needed. If you have the team to solve this problem, chances are you would not be using Parse in the first place.

The logical solution here is to keep a reference count and delete the file when the counter gets to 0. This is code that could be written once and used in all but the most extreme cases.
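
A minimal in-memory sketch of the reference-count idea, to make the proposal concrete. `FileRefCounter` is a hypothetical name and the counts would have to live in the database in practice; the `onUnreferenced` callback is where a real implementation would delete the file.

```javascript
// Tracks how many references each file has and reports when a count reaches 0.
class FileRefCounter {
  constructor(onUnreferenced) {
    this.counts = new Map();
    this.onUnreferenced = onUnreferenced; // called with the file name at count 0
  }

  // Call when an object starts referencing the file.
  retain(name) {
    this.counts.set(name, (this.counts.get(name) || 0) + 1);
  }

  // Call when a reference is removed or replaced.
  release(name) {
    const current = this.counts.get(name);
    if (current === undefined) {
      // Releasing more often than retaining would delete files still in use.
      throw new Error(`release of untracked file: ${name}`);
    }
    if (current === 1) {
      this.counts.delete(name);
      this.onUnreferenced(name);
    } else {
      this.counts.set(name, current - 1);
    }
  }
}
```

Usage: call `retain` on every save that adds a file reference and `release` on every delete or replacement; references made by passing the raw URL around would not be counted.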

@flovilmart

Contributor

commented Apr 5, 2017

Originally, on parse.com, that was a cleanup script, which seems to have been efficient enough to work. Files can be passed around in different objects, stored in arrays or embedded into objects. There's nothing that guarantees that a user won't reference the file by its URL. I did that for a project: I used the File just as an upload mechanism and then passed the URLs around.

> The logical solution here is to keep a reference count and delete the file when the counter gets to 0.

The script-based solution is as valid as a ref-count-based solution. That being said, 'over-releasing' a file, or failing to count a usage of a file when it is referenced by another object, would destroy the file.

You seem to have a good understanding of the problem, why not try to tackle it?

This repo started as a simple file list tool, maybe there's something to look for here.

@flovilmart

Contributor

commented Apr 23, 2017

Also, given the costs related to unused files on S3 (0.0023$ / Gb / month, see https://aws.amazon.com/s3/pricing/), this seems to be negligible.

@yorkwang

commented Jun 28, 2017

Can anyone solve this issue? It's been more than a year.

@natanrolnik

Collaborator Author

commented Jun 28, 2017

Yes, anyone can solve it, including you :)

@funkenstrahlen

Contributor

commented Jun 28, 2017

@6thfdwp

Contributor

commented Jun 30, 2017

In case we want to programmatically delete a file, the one option I can see so far is to make a request to the endpoint defined in FilesRouter L27, as Parse.File doesn't expose a delete method.
For creating a file we have:
return CoreManager.getRESTController().request('POST', 'files/'+name, data);
So I tried to send this to delete a file:
return CoreManager.getRESTController().request('DELETE', 'files/'+name);

But I got an error from middleware.js, which tries to create a buffer with new Buffer(base64, 'base64');

What is the proper way to make such a request to delete a file, or is there any other way to do this programmatically?

@flovilmart

Contributor

commented Jun 30, 2017

Single file deletion is not implemented yet, and not required by the files adapters either I believe. We could start adding those.

@6thfdwp

Contributor

commented Jun 30, 2017

> Single file deletion is not implemented yet

Do you mean it's just not implemented in Parse.File?

I can see it's required in FilesAdapter, and FilesRouter defines this endpoint as well.
So I suppose we can do single file deletion as long as our custom FilesAdapter implements this method, right? By the way, I'm using AzureStorageAdapter, and I can see it has this method implemented.

Could this error be related to the request format?

@hennessycreative

commented Jul 13, 2017

Deletion works when the file URL does not include the app ID, i.e.:
curl -X DELETE -H "X-Parse...... http://domain/parse/files/appid/file
is not working but
curl -X DELETE -H "X-Parse...... http://domain/parse/files/file
is working :/

Edit: Oh, someone has already found this: #1411
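
Based on the observation above, a small helper can turn a stored file URL (which includes the app ID segment) into the delete request that worked. This is a sketch only: `buildFileDeleteRequest` is a hypothetical name, it assumes the file name is the last path segment of the URL, and it builds the request description without sending it.

```javascript
// Builds a DELETE request description for <serverURL>/files/<name>,
// dropping the app ID segment that appears in the public file URL.
function buildFileDeleteRequest(serverURL, fileURL, appId, masterKey) {
  // Assumption: the file name is the last path segment of the file URL.
  const lastSegment = new URL(fileURL).pathname.split('/').pop();
  const name = decodeURIComponent(lastSegment);
  return {
    method: 'DELETE',
    url: `${serverURL.replace(/\/+$/, '')}/files/${encodeURIComponent(name)}`,
    headers: {
      'X-Parse-Application-Id': appId,
      'X-Parse-Master-Key': masterKey,
    },
  };
}
```

The returned object can be handed to whatever HTTP client is in use; the master key should never be shipped in client-side code.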

@ghost

commented Oct 17, 2017

+1, it would be very useful.

@xainpro

commented Oct 17, 2017

Any solution yet?

@mtrezza

Contributor

commented Apr 7, 2018

+1 (to keep this alive)

@flovilmart

Contributor

commented Apr 7, 2018

Feel free to open a pull request for a reference implementation, but I’ll be closing this issue, as cleaning up dereferenced files is an off-process job that may take a very long time to complete. It’s not something that I as a maintainer want to actively work on (as stated many times), but I’ll gladly review a pull request if any change to Parse Server is needed for that feature.

As mentioned previously, all the work can be done externally, without needing change on this project.

@flovilmart flovilmart closed this Apr 7, 2018

@jeacott1

Contributor

commented May 11, 2018

GDPR requirements apply to anyone running Parse for users in Europe who may have uploaded personal data. Without this feature, or some other way to mitigate the problem, anyone using Parse could have an expensive problem.

@flovilmart

Contributor

commented May 11, 2018

@jeacott1 we provide a way to delete existing files on demand, through the REST API and the files adapters, so a conscious user could delete the existing picture upon replacement.

Also, we’re open to pull requests, I believe I don’t need to say it again, as it was basically the message posted before yours.
If you believe this project can’t help you achieve GDPR compliance, then you have 2 options: either fix it or stop using it. Trolling isn’t one.

Thanks.

@jeacott1

Contributor

commented May 11, 2018

Ah, OK, I missed that. I didn't think there was a way to delete via the REST API; I thought it just removed the reference. Just trying to understand how best to do this.

@giomatiashvili

commented Dec 15, 2018

Here is the REST API call for deleting a file:

curl -X DELETE \
  -H "X-Parse-Application-Id:[AppId]" \
  -H "X-Parse-Master-Key:[MasterKey]" \
  http://[ParseServer Url]/files/5b6cd3a71873be9c79aedeb53ff71f05_fav.png

The PHP API supports file deletion as well. I tested it with Digital Ocean Spaces and it works like a charm:

try {
    $result = $testFile->delete(true);
    echo $result;
} catch (Exception $e) {
    echo 'Caught exception: ',  $e->getMessage(), "\n";
}