This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

Need to be able to safely cleanup local content #3479

Closed
rkfg opened this issue Jul 3, 2018 · 18 comments

Comments

@rkfg
Contributor

rkfg commented Jul 3, 2018

Not the caches, the content itself. People sometimes upload big files like videos or high-res photos, and they stay in the storage forever. Currently there's no way to find these files and safely delete them after some time. The problem is that avatars (for example) are stored in the same storage, so simply deleting everything older than N days would purge every user and room avatar that was set a while ago. I also noticed that purging history with the API doesn't delete the media attached to the purged events. As a result, the storage grows indefinitely, which is quite a problem.

So I would like those files to be removed once no events use them (including forwarded ones); that's how it should be. I know that redacting doesn't actually remove the content from the database just yet, but the content becomes unreachable, and the media should too.

But I'd also like to be able to remove media using criteria like date range and file size, and be sure that it doesn't break anything. Sure, a simple find -size ... -delete would do, but this really should be managed by Synapse itself.

@rednerd

rednerd commented Jul 10, 2018

I think the way to solve this right now is to delete files older than X days and larger than Y bytes (e.g. find /var/lib/matrix-synapse/media -size +50M -mtime +365). I have the same issue: my private homeserver on a tiny 10 GB VPS is 3/4 full of videos of kids. I've used this method to clean up some files, and from what I can see it just results in a blank message in Riot. (I can't get the purge API to actually purge, but that's a separate issue.)

If there's a better way, I'm happy to be corrected.
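A dry-run variant of that find command lets you inspect candidates before deleting anything. This is a minimal sketch, assuming GNU find and the media path quoted above; the function name list_old_large_media is hypothetical. It only prints paths: as the next comment points out, removing files directly still leaves Synapse's database rows (including thumbnail records) behind.

```shell
# Hypothetical helper: list (do NOT delete) media files that are larger than
# 50 MiB and were last modified more than 365 days ago.
# The default directory is an assumption; point it at your own media store.
list_old_large_media() {
  local media_dir=${1:-/var/lib/matrix-synapse/media}
  # -type f      regular files only
  # -size +50M   larger than 50 MiB (GNU find rounds sizes up to whole MiB)
  # -mtime +365  last modified more than 365 whole days ago
  find "$media_dir" -type f -size +50M -mtime +365 -print
}
```

Usage: list_old_large_media /var/lib/matrix-synapse/media | less, and review the output before acting on it.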

@rkfg
Contributor Author

rkfg commented Jul 11, 2018

You can do that, but the database records linking to that content would not be removed. The same goes for local thumbnails: you need to remove the database rows, or else the thumbnails won't be regenerated when accessed later. I think there should be a single cleanup API that takes care of all local and remote content, including thumbnails and URL previews.

@mytbk

mytbk commented Aug 24, 2018

I wrote a simple tool to clean up local content tonight.
https://git.wehack.space/matrix-synapse-scripts/tree/mxclean

@Linuxine

I agree, it would be great to have a way to clear the local content without destroying all avatars. I have created a shell script to clean this repository, but as @rkfg said, it also deletes the avatars, even if I save and restore their files (because the local thumbnails are not regenerated after the cleanup).

@temandroid

Regarding @mytbk's mxclean tool (https://git.wehack.space/matrix-synapse-scripts/tree/mxclean): this is a solution for pgsql; do you have one for sqlite too? :))

@cryzed

cryzed commented Nov 4, 2019

We made synapse-purge to purge the remote media cache, local media, and events in all rooms (both encrypted and unencrypted) up until "x seconds ago" on a Synapse server; feel free to use it. It is highly configurable and should work on all instances using a Postgres database. It also preserves only the most recent user and room avatars and removes older ones.

@ShadowJonathan
Contributor

Possibly related to #890 and #2315

anoadragon453 pushed a commit that referenced this issue Oct 26, 2020
Related to: #6459, #3479

Add `DELETE /_synapse/admin/v1/media/<server_name>/<media_id>` to delete
a single file from server.
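A sketch of calling that single-file endpoint for a given mxc:// URI, assuming an admin access token and that the URI points at media on your own homeserver. The helper name delete_mxc_media and the DRY_RUN switch are hypothetical. It splits mxc://<server_name>/<media_id> into the two path segments the endpoint expects.

```shell
# Hypothetical helper: delete ONE media file via
#   DELETE /_synapse/admin/v1/media/<server_name>/<media_id>
# Assumes an admin access token and an mxc:// URI for media on your own server.
# Set DRY_RUN=1 to print the request instead of sending it.
delete_mxc_media() {
  local base_url=$1 mxc=$2 token=$3

  # Reject anything that is not of the form mxc://<server_name>/<media_id>.
  case $mxc in
    mxc://?*/?*) ;;
    *) echo "not an mxc:// URI: $mxc" >&2; return 1 ;;
  esac

  local rest=${mxc#mxc://}          # "<server_name>/<media_id>"
  local server_name=${rest%%/*}     # text before the first slash
  local media_id=${rest#*/}         # text after the first slash
  local url="$base_url/_synapse/admin/v1/media/$server_name/$media_id"

  if [ -n "${DRY_RUN:-}" ]; then
    printf 'DELETE %s\n' "$url"
    return 0
  fi
  curl -sS -X DELETE -H "Authorization: Bearer $token" "$url"
}
```

Running it with DRY_RUN=1 first is a cheap way to confirm the URL before anything is deleted.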
@anoadragon453
Member

Hello there. The next major release of Synapse should include an admin API to delete local media by timestamp of last access (which helps exclude avatars) and by file size. This was introduced by #8519, with documentation on how to use it in the media admin API docs.

I'm going to close this issue now. Feel free to open another issue if you have any further requirements 🙂
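A sketch of building the bulk-delete URL with a size filter as well as a timestamp. The helper name is hypothetical, and size_gt (bytes) and keep_profiles are assumed to be the optional query parameters described in the Synapse media admin API docs; check them against the docs for your Synapse version before relying on them.

```shell
# Hypothetical helper: build the bulk "delete local media" URL.
# Assumption: size_gt and keep_profiles are optional query parameters of
# POST /_synapse/admin/v1/media/<server_name>/delete (verify for your version).
bulk_delete_url() {
  local base_url=$1 server_name=$2 before_ts=$3   # before_ts is in milliseconds
  local size_gt=${4:-0}            # only files larger than this many bytes
  local keep_profiles=${5:-true}   # true: keep media still used as avatars
  printf '%s/_synapse/admin/v1/media/%s/delete?before_ts=%s&size_gt=%s&keep_profiles=%s\n' \
    "$base_url" "$server_name" "$before_ts" "$size_gt" "$keep_profiles"
}
```

Usage (still a POST with an admin token): curl -X POST -H "Authorization: Bearer $TOKEN" "$(bulk_delete_url http://localhost:8008 example.org "$ts" $((50 * 1024 * 1024)))", where example.org stands in for your own server name.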

@Linuxine

Hi, thanks @anoadragon453 for the response and the great feature! I was really waiting for this one :D

I tried using the API to delete local media (using -X POST https://<server_name>/_synapse/admin/v1/media/<server_name>/delete?before_ts=xxx), but I get this response:

{
    "errcode": "M_UNRECOGNIZED",
    "error": "Unrecognized request"
}

Is it implemented in 1.22, or will it be in a later version?
Thanks a lot again!

@clokep
Contributor

clokep commented Oct 27, 2020

This will be in 1.23.0.

@Linuxine

OK, thanks @clokep! I was being too impatient, sorry :D

@kerlerm

kerlerm commented Jan 9, 2021

I'm running 1.24.0. Is this time-based feature ("before_ts=") working for you? I just get
{"deleted_media":[],"total":0}
even though there certainly are media files older than "ts" on my server...

@Linuxine

Linuxine commented Jan 11, 2021

@kerlerm Yes, it's working! Is your timestamp a Unix timestamp in milliseconds? Beware: by default the date command does not include the milliseconds.

I am using this argument to delete the files older than 60 days:

delete?before_ts=$(date +%s000 --date "60 days ago")
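The same conversion as a reusable function: a minimal sketch assuming GNU date (for its --date option), with a hypothetical function name. Appending 000 to %s turns seconds into the milliseconds that before_ts expects, which avoids the seconds-vs-milliseconds mix-up that produced an empty result in this exchange.

```shell
# Hypothetical helper: Unix timestamp in MILLISECONDS for "N days ago".
# %s gives seconds; the literal "000" suffix converts them to milliseconds.
before_ts_days_ago() {
  date --date "$1 days ago" +%s000
}
```

Usage: curl -X POST -H "Authorization: Bearer $TOKEN" "http://localhost:8008/_synapse/admin/v1/media/example.org/delete?before_ts=$(before_ts_days_ago 60)", with example.org standing in for your own server name.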

@kerlerm

kerlerm commented Jan 11, 2021

My bad. I used Unix timestamp in seconds, not milliseconds...

@IngwiePhoenix

IngwiePhoenix commented Jan 28, 2021

I am running into some very tight disk space issues and need to clean out as much old data as possible, but the request is not working for me:

$ curl -v "http://localhost:8008/_synapse/admin/v1/media/ingwie.io/delete?before_ts="(date +%s000 --date "60 days ago")
...snip...
{"errcode":"M_UNRECOGNIZED","error":"Unrecognized request"}

What am I doing wrong here?

(BTW, I am using the fish shell. I checked the URL with echo beforehand and it looks fine, at least technically.)

@kerlerm

kerlerm commented Jan 29, 2021

Are you sending the access token in a header? Is the user a homeserver admin?
See my cleanup script: https://github.com/ffulm/scripts/blob/3df10ccd2f24dcf158676782fb82b35e935a9d43/clean_matrix#L74

@IngwiePhoenix

Ohh. None of the replies here had mentioned the header. Yes, I didn't send a token. That said, is there a way to clean out local and remote media at once, or do I have to go through each and every remote server (which I could probably automate with ls -1 | xargs ...)?

@anoadragon453
Member

@IngwiePhoenix That will need to be a POST request (-X POST), you're currently doing a GET.

As for cleaning out local and remote media at once instead of going through each remote server:

I'm afraid this endpoint is a bit confusing, and I've created an issue for cleaning it up a bit: #9284

POST /_synapse/admin/v1/media/<server_name>/delete?before_ts=<before_ts> will delete all local copies of any locally or remotely-uploaded media. server_name is always the server name of your local homeserver.

So this will remove media uploaded from anywhere - you don't need to specify each individual remote server name.

See the media admin API docs for more information.
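Putting the answers from this thread together (POST rather than GET, an admin access token in the Authorization header, and your own server name in the path), a request might be sketched like this. The helper name and DRY_RUN switch are hypothetical, and it assumes Synapse 1.23 or later; on older versions, or with a GET, this thread shows the server answering M_UNRECOGNIZED.

```shell
# Hypothetical helper: purge local copies of all media (locally or remotely
# uploaded) last accessed before before_ts (Unix time in milliseconds).
# Assumes Synapse >= 1.23 and an access token of a homeserver admin.
# Set DRY_RUN=1 to print the request instead of sending it.
purge_media_before() {
  local base_url=$1 server_name=$2 token=$3 before_ts=$4
  # server_name is always YOUR homeserver's name, even for remote media.
  local url="$base_url/_synapse/admin/v1/media/$server_name/delete?before_ts=$before_ts"
  if [ -n "${DRY_RUN:-}" ]; then
    printf 'POST %s\n' "$url"
    return 0
  fi
  # Must be a POST, and must carry the admin's access token.
  curl -sS -X POST -H "Authorization: Bearer $token" "$url"
}
```

A single call covers cached remote media too, so there is no need to loop over remote server names with ls -1 | xargs.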
