Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Add media list command #1943

Merged
merged 7 commits into from
Jul 7, 2023

Conversation

daenney
Copy link
Member

@daenney daenney commented Jul 3, 2023

Description

This was brought on by my attempt to set up backups for my own instance and the update to the docs in #1942.

This is an attempt to help alleviate #1776. Using admin media list --local the full path to each local media file will be printed, with a newline. The output of this should be feadable into backup tools in order to allow to backup local media too. Together with the database this should allow to fully recover from the loss of an instance.

The list command also gets a --remote flag for symmetry. In the case of --remote we print the RemoteURL instead, the location the asset can be retrieved from.

To get all media, you can run with --local and --remote.

Checklist

Please put an x inside each checkbox to indicate that you've read and followed it: [ ] -> [x]

If this is a documentation change, only the first checkbox must be filled (you can delete the others if you want).

  • I/we have read the GoToSocial contribution guidelines.
  • I/we have discussed the proposed changes already, either in an issue on the repository, or in the Matrix chat.
  • I/we have performed a self-review of added code.
  • I/we have written code that is legible and maintainable by others.
  • I/we have commented the added code, particularly in hard-to-understand areas.
  • I/we have made any necessary changes to documentation.
  • I/we have added tests that cover new code.
  • I/we have run tests and they pass locally with the changes.
  • I/we have run go fmt ./... and golangci-lint run.

This is an attempt to help alleviate superseriousbusiness#1776. Using admin media list
--local the full path to each local media file will be printed, with a
newline. The output of this should be feadable into backup tools in
order to allow to backup local media too. Together with the database
this should allow to fully recover from the loss of an instance.

The list command also gets a --remote flag for symmetry. In the case
of --remote we print the RemoteURL instead, the location the asset can
be retrieved from.

To get all media, you can run with --local and --remote.
@@ -105,6 +105,7 @@ EXPECT=$(cat <<"EOF"
"letsencrypt-email-address": "",
"letsencrypt-enabled": true,
"letsencrypt-port": 80,
"local": false,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to how flags are declared through the config module, we now end up with these somewhat generic names in the config. It's not very terrible, but I could easily see this clashing with a different command at some point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change the flags to have a more verbose and unique name, like --list-local, but it starts to stutter a bit with admin media list --list-local etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're going to be having a tidy-up of CLI and environment stuff in a little while anyway so i wouldn't worry about it too much until then. we're still only alpha too so we can afford to have breaking changes here and there when it comes to CLI usage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually thinking about it. What if we had admin media [list-local|list-remote] as two separate commands? As I'm not sure I could see a situation where you would want both filepath outputs and remote URLs output at the same time 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking the same. I'll do that, that also avoids the flag issue while we're at it.


for _, a := range attachments {
if local && a.RemoteURL == "" {
if _, err := f.WriteString(path.Join(mediaPath, a.File.Path) + "\n"); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should return the error here, or just ignore it entirely. Likelihood of this causing an error is essentially 0, unless os.Stdout gets banged up somehow. It does have the advantage that if we do run into an issue, the command will exit non-zero so it's possible to detect that generating the list experienced an error. That seems desirable as the output of this command is meant to feed into other systems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly i would just drop the error here. a stdout error indicates a serious issue that is going to stop this entire process from working. chances are stderr would be fucked too which is why i wouldn't bother with a panic either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also makes the code here much more concise :)

return fmt.Errorf("failed to retrieve attachments: %w", err)
}

f := bufio.NewWriter(os.Stdout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to buffer stdout here?

Copy link
Member Author

@daenney daenney Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually grab a buffered one when I expect to be flushing a lot of output. I'm happy to change it though.

Edit: Just realised that answer wasn't very clear. I typically buffer when I expect a lot of output to limit the amount of write system calls that we end up needing. Depending on the size of the instance the amount of local or remote media can be significant.

// Set the state DB connection
state.DB = dbConn

attachments, err := dbConn.GetAttachments(ctx, "", 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably use one of the other paged functions here? e.g. if you look at how we iterate over media in internal/cleaner. on large instances this will be loading a lot of attachments into memory at once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah indeed. I'll take a look at that!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, so looking at internal/cleaner it seems to iterate on the files in storage. But I can take the same approach with iterating using db.GetAttachments by starting without a maxID but with a limit and then go from there until I've made it through. That would let me do something like take 50 files at a time and not blow up memory usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's what I was thinking essentially :)

This changes the implementation of admin media list --<variant> to two
separate top-level commands, list-local and list-remote.

The implementation now iterates over over the database in batches of 200
in order to avoid loading all media metadata into memory.

var errListDone = errors.New("no more")

func (l *list) Next(ctx context.Context) ([]*gtsmodel.MediaAttachment, error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super stoked about this "iterator" thing, but I wanted to avoid having all this code duplicated between the ListLocal and ListRemote commands.

Very open to better suggestions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I've been thinking that maybe I could add a local bool to this and based on that do the writing to l.out directly. Only reason I didn't is because I thought it might be nicer to keep the fetching from the DB and the printing to stdout separate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done away with the iterator-like thing because it just didn't feel right. I like the new approach with passing in a filter func a lot more.

@tsmethurst
Copy link
Contributor

You nerds seem to have this covered pretty well already, but ping me if you want my input on anything :)

@daenney daenney mentioned this pull request Jul 5, 2023
9 tasks
This does away with the somewhat odd iterator-like structure we had
before and does away with most of the loop duplication in list-local and
list-remote. Instead they call GetAllMediaPaths with a filter func to
select the media they want. That's accumulated into a slice and
eventually returned.
daenney and others added 3 commits July 6, 2023 13:46
Since we don't append the empty string anywhere, the remote filter can
be limited to returning RemoteURL, as that'll be an empty string for
local media.
@tsmethurst
Copy link
Contributor

I'm happy to merge this if you're done tweaking, it looks good to me :)

@daenney
Copy link
Member Author

daenney commented Jul 7, 2023

I think the docs was the last thing I wanted to add. So yap, I'm good at this point for a squerge.

@tsmethurst
Copy link
Contributor

ACTIVATING SQUERGE

@tsmethurst tsmethurst merged commit 81f33c3 into superseriousbusiness:main Jul 7, 2023
2 checks passed
@daenney daenney deleted the media-list branch July 7, 2023 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants