Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remote cleanups #138

Closed
vitalybaev opened this issue Apr 28, 2018 · 18 comments
Closed

Remote cleanups #138

vitalybaev opened this issue Apr 28, 2018 · 18 comments
Assignees
Milestone

Comments

@vitalybaev
Copy link

Hi, you have developed great product that used at almost all my projects. But impossibility of automatic cleaning up remotes really disappointing me. I have to write own tools for checking size and limiting amount of remote backups.
As I can see it’s not so complicated to implement remote clean. Yes I know that we could have different remote structure, but we could use settings in sync settings. And we have to implement receiving remote files in all presented adapters. But it’s not so difficult I guess.
The only question is about how to structure this in the configuration and you as owner should decide this.
If we’ll find a consensus about this feature I could help you with this implementation

@sebastianfeldmann
Copy link
Owner

Thank you very much!

Remote cleanups would be a nice addition to phpbu.
I think that the remote cleanup strategy should be configured separately from the local cleanup strategy so you can keep more/different backups

In my opinion there are two valid ways to implement remote cleanups.
The first one is like you said within the sync configuration.

<sync type="dropbox">
  <!-- ... -->
  <option name="path" value="foo/bar"/>
  <option name="cleanup.type" value="size"/>
  <option name="cleanup.limit value="50g"/>

This is likely to bloat the Sync implementations with cleanup code and it would require oversight and discipline to keep all sync cleanup configurations alike.
I could imagine doing something similar to the local cleanups. Creating something like the Backup\Collector and passing it to one of the available Cleanup implementations.

The second one would be to create a new cleanupRemote element

<cleanupRemote type="size">
  <option name="sync" value="my-dropbox"/>
  <option name="limit" value="50g"/>
</cleanupRemote>

This way the configuration is more like the local cleanup configuration. But it has other disadvantages.
Either you double the sync path configuration which would be prone to human errors, or you have to reference the Sync you want to clean up which sound complicated at first thought.

What do you think?

@vitalybaev
Copy link
Author

I definitely vote for the first variant, cause it's more flexible.
I'll take a look more deeper into it and will response you shortly!

@vitalybaev
Copy link
Author

So, I've checked the code and have following points. Sure, you know your code better than me, so please make comments.

  1. We place configuration in sync as you mentioned above.
  2. Collector logic as well as remote cleaning up logic are implemented in Sync classes for each adapter, I suggest fix it via interface.
  3. I don't want to duplicate code of Cleaners. But they using File class to do their job. But we could make some sort of AnyFile interface or class, that have needed methods (like getSize, getMTime, etc.). So we could pass other remote files classes to already existing cleaner classes.
  4. I don't know exactly do we need this, but it could be useful to determine, whether specific adapter could clean files remotely or not.

I continue to investigate

@sebastianfeldmann
Copy link
Owner

You are right!
The idea would be to...

  • Make Collector an Interface and rename current Collector to CollectorLocal
  • Make File an Interface and rename current File to FileLocal, FileDisk or something.
  • Maybe extract the "real" cleanup logic from the Cleanup implementations to separate classes, so the Sync and Cleanup implementations can use those independently.
  • I'm not sure if we need an interface like RemoteCleaner because the code doesn't really care, only the Sync itself is offering the configuration options and remote cleanup is only happening if configured and not called automatically if available. My guess is that not every Sync will support remote cleanup, at least not at first, but that's fine! Maybe different Sync implementations will offer different remote cleanup strategies because of technology restrictions. To handle this with interfaces would be a bit messy.

So in code:

create
\phpbu\App\Backup\File\FileLocal
\phpbu\App\Backup\File\FileRemote
\phpbu\App\Backup\Collector\Local
\phpbu\App\Backup\Collector\Dropbox
...

extract cleaning logic to
\phpbu\App\Backup\File\Cleaner\Capacity
...

Thanks again for looking in to this.

@sebastianfeldmann
Copy link
Owner

I think a generic File\Remote is not enough.
For the Collector approach to work we need File\Dropbox, File\FTP ... and so on that implement the unlink method.

@vitalybaev
Copy link
Author

Yes, I've also noticed this question too. As we have only local cleaning up, it works fine. But with remotes I guess that Cleaners should only return file list to delete. Also I think, that we haven't to split collectors logic for separate classes. My point is Collector is only about getting complete file list for further checking it for removing. And with third party apis it's more easy in Sync classes, cause we already have all config information needed (remote path, credentials, api client).
Impossibility of cleaning up with specific driver is not a problem because we could print debug message or something like this if cleaning up is not supported.

In common:

  1. In Sync class we implement method like collect, returning array of FileRemote (see below about it).
  2. With File, FileLocal, FileRemote - create this classes for dealing with Cleaner logic as a main purpose. InFileLocal we leaveunlink method as is. And in FileLocal we pass our Sync class that implemented some kind of unlinkFile. So it's like dependency injection pattern.
  3. Cleaning up firing in Sync class method that: 1 - checks if cleaning up configured, 2 - checks if selected cleaning up type is supported, 3 - collects files, creates specific Cleaner instance, passes files to it and fires clean.

What do you think?

@vitalybaev
Copy link
Author

Also in this case we have to place api clients as class property for reusing it.
I could write some proof-of-concept later today and you take a look at whole logic and we can decide pros and cons of it.

@sebastianfeldmann
Copy link
Owner

That would be awesome.
Having the collection logic inside the Sync classes maybe is a bit much.
But I'm looking forward for your proof of concept.

vitalybaev pushed a commit to vitalybaev/phpbu that referenced this issue Apr 30, 2018
@vitalybaev
Copy link
Author

@sebastianfeldmann I've finished my work and have the following:

  1. Code located at branch in my fork here https://github.com/vitalybaev/phpbu/tree/feature/remote-clean-up
  2. Implemented OpenSwift and Dropbox remote cleaning up
  3. Unit tests passed
  4. Code is very alpha cause much components changed/renamed
  5. In some places I've updated code style by PHPStorm :-( sorry for that, I'll fix it

@vitalybaev
Copy link
Author

vitalybaev commented Apr 30, 2018

Some points:

  1. Splitted File to File interface and FileLocal & FileRemote classes as discussed
  2. Splitted Collector for Collector abstract class and Collector\Local, Collector\Dropbox, Collector\OpenStack... containing collecting logic for specific remote storage. Remote collectors also receive specific Sync class for dealing with api client.
  3. Configuration is as mentioned: keys prefixed with cleanup.* in sync section.
  4. api clients in Sync classes are now protected properties with getters for accessing by collectors.
  5. File unlink placed in Sync classes. I'm not sure that's right place but we could find a solution.
  6. OpenStack and Dropbox successfully tested
  7. Code for setting up cleaner placed in SyncClearable trait.

So, wait you feedback!

@sebastianfeldmann
Copy link
Owner

Wow that was fast.

Here are my 50 cents

  1. I'm not sure if a single remote file class is the right approach. Using the Syncinstance is kind of nice, but I think I would prefer a File\Dropbox so the file deletion is not bloating the Sync implementations.
  2. Like!
  3. Nice!
  4. Like I said I would prefer injecting the Client not the whole Sync instance
  5. --
  6. Nice!
  7. You are calling Sync::cleanup from the Runner this means, the Runner has to know if a cleanup functionality exists, or all Sync implementations have to support this. I think I would call the cleanup internally within the Sync::sync method. So the runner doesn't have to care. It's just a Sync feature.

Will have a more detailed look at it, when I return from the movies it's Avengers time ;)

@vitalybaev
Copy link
Author

vitalybaev commented May 1, 2018

So:

  1. I think it would the most proper variant too. The only limitation is that we have only information about file path, so we could search backups for delete only in this location. But I'm sure it's ok.
  2. Good
  3. Good
  4. Since we have discussed in 1 - you are right
  5. Yep
  6. Good.
  7. My point is only to fix methods via interface and specific adapter could only have this method and actually doesn't anything (maybe shows message about unimplemented remote cleanup for this type of sync). But if you wish, I could change this behaviour.

I'll do changes shortly

@vitalybaev
Copy link
Author

I've updated my code. Did the following:

  1. Any adapter now has own FileRemote subclass. Also that classes actually removes files.
  2. To adapters we are now passing only api client and path (since we have not (isn't it?) ability to determine path for looking for).
  3. Removed unlinkFile methods from interfaces and from Sync classes due to uselessness
  4. Our last discussed case (7th pt.) - is open for now. I have to discuss it with you for final solution.

My next roadmap (if we won't dramatically change architecture):

  1. Implementing remote cleanup for at least: ftp/sftp, amazon s3
  2. Write some tests
  3. Check for code style and fix broken
  4. Test test test (but Dropbox and Rackspace tested by me good)

@sebastianfeldmann
Copy link
Owner

  1. Sweet
  2. Path and client are perfect. I think that's as minimal as it gets.
  3. Like!
  4. (7.) I think since the remote sync feature is provided by the Sync implementations, it should be handled within the Sync::sync call. The Runner doesn't have to know about this. Especially since the Sync classes take care of error handling etc. In short, I think 'remoteCleanup` is part of of the "sync" at least now, thanks to you, the implementation somehow lives up to its "Sync" name ;)

Awesome job!

I hate to bring this up, I know how annoying I can be with stuff like this, but could you please move the following.

\phpbu\App\Backup\File\File => \phpbu\App\Backup\File
\phpbu\App\Backup\File\FileLocal => \phpbu\App\Backup\File\Local
\phpbu\App\Backup\File\FileRemote => \phpbu\App\Backup\File\Remote

\phpbu\App\Backup\Collector\Collector => \phpbu\App\Backup\Collector

\phpbu\App\Backup\SyncClearable => \phpbu\App\Backup\Sync\Clearable

Thanks you so much ;)

@sebastianfeldmann
Copy link
Owner

I have just noticed one more thing.

If a remote cleanup is configured it would be nice if the simulate call would give some feedback about that.

@vitalybaev
Copy link
Author

Thank you for the feedback! I’ll fix all stuff a bit later today! :)

Oh yeah, I forgot about simulate! How much information to show on simulate section?

@sebastianfeldmann
Copy link
Owner

I think as a first step it would be enough if it states that a remote cleanup would be executed.
The premium variant would be a list of remote files that would have been deleted 😇

@sebastianfeldmann
Copy link
Owner

This is done :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants