Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XtraBackup source - Does not produce 1 file #9

Closed
F21 opened this issue Apr 7, 2015 · 26 comments
Closed

XtraBackup source - Does not produce 1 file #9

F21 opened this issue Apr 7, 2015 · 26 comments
Assignees

Comments

@F21
Copy link
Contributor

F21 commented Apr 7, 2015

I am currently implementing an XtraBackup source which uses Percona's XtraBackup tool to perform backups for mysql.

The problem is that XtraBackup will dump a bunch of files into a folder rather than producing 1 file such as backup.sql. I have already implemented the source and it works properly.

The only problem is if compress and encrypt is enabled on the target, it will break because those methods expect a file and not a folder of files.

1 solution is to have the XtraBackup source tar the files, but I feel the source is probably doing a bit too much.

Let me know what you think.

I also plan to implement a few more sources which will produce dumps that are a bunch of files instead of 1 file.

@sebastianfeldmann
Copy link
Owner

Hi, i ran into this problem with the mongodump source as well. The quick and dirty fix was to add a Tar command to the source, but your feeling might be right, that should not be handled by the source itself.

Maybe the Source could use the Target object to signal the application that the compression has to be done afterwards. Then the application would handle the Tar compression.

All the informations that has to be transfared from the Source to the application would be:

  • Trigger that compression has to be done by the application
  • The path to the uncompressed file / folder

Or can you think of something else?

@F21
Copy link
Contributor Author

F21 commented Apr 7, 2015

I think those are all good ideas!

For #10, we will need to be able to tell the target to not create the target folder. That is because if we create the folder, say /backups/mysql/2015-04-07-1010 and pass it to innobackupex, it will see that /backups/mysql/2015-04-07-1010 exists, so it terminates.

For #10, we will need to be able to signal to the Target to not call setupPath(), otherwise innobackupex will see that /backups/mysql/2015-04-07-1010 exists and fail.

If you could get these changes in, I can then submit my PRs for a few different sources 😄

@F21
Copy link
Contributor Author

F21 commented Apr 8, 2015

Just an update on this one.

I have completed sources for ArangoDB and XtraBackup. The only blockers left for those 2 sources are the issues highlighted here:

  • We need a way to signal to target to not create the folder and let the backup utility create it.
  • We need a way to deal with backup utilities that creates a folder of files as opposed to just 1 file.

@sebastianfeldmann
Copy link
Owner

I have thought a lot about that problem. I'm having a hard time to make a final decision.
I will just share my thoughts so you can add some feedback.
There are mostly two ways of doing this.

1. Let the Runner.php compress the folder

Pro: The Source is only responsible for one thing, creating the backup.
Con: We get inconsistent behavior sometimes the Source handles the compression, sometimes it doesn't

2. Create a abstract source class like DirectorySource that handles the compression

Pro: We have a consistent behavior and the Source is always making sure we are getting a single file as backup.
Con: The bad smell that the Source should not do this

What do you think?

@F21
Copy link
Contributor Author

F21 commented Apr 8, 2015

I have 2 ideas:

  1. How about we have DirectoryTarget and FileTarget? If a filename attribute does not exist in the configuration, we assume it is a DirectoryTarget and let the runner decide whether target is DirectoryTarget or FileTarget.

    Pro: I think this is the responsibility of target and it fits the model quite well.
    Con: Sources have no control over target type. What if someone using the XtraBackup source includes a filename in the target which results in a FileTarget being created and causes an error?

  2. Have a getConfiguration method inside the sourceInterface which returns some configuration in an associative array on whether the target is a file or directory which will fix Do not create target dirs for some sources #10. This also has the advantage where the source can tell the target to not create the destination folder. Maybe an abstract class can be used with sensible defaults so that only sources which are "special" need to implement this.

    Pro: Solves all our problems.
    Con: Not very OOP. Maybe getConfiguration() must return an object of type SourceConfiguration? Return type hinting will only be available in PHP 7.

@sebastianfeldmann
Copy link
Owner

I like your Nr. 2, I thought about extending the Source Interface as well but wasn't happy about the downwards incopatibility and ruled it out, but maybe it is the right way after all.

What Do you think about the following:

Adding two methods to the Source interface

  1. isHandlingCompression() returning boolean yes/no
  2. getDataPath() returning string or maybe even an array of files and directories

So the Runner/Target could decide if and how to compress the generated data.

if (!$source->isHandlingCompression()) {
    $this->handleCompression($target, $source->getSourceData());
}

Add an abstract base Source class that is impementing both methods.

abstract class CompressionHandler
{
    public function isHandlingCompression()
    {
        return true;
    }

    public function getSourceData()
    {
        throw new Exception('source is handling compression by itself');
    }
}

Big Pro
We show the user explicitly that he has to care and don't use a secret communication between Source and Runner that he doesn't know about.

Pro
We have the flexibility to allow sources to not care about the compression.

Con
Inconsitent compression handling, but this is getting defused a lot, because it is explicitly defined behavior. So not realy a "con" anymore.

Con
Break downwards compatibility... i can life with that ;)

@F21
Copy link
Contributor Author

F21 commented Apr 8, 2015

Good idea! However, I am not too sure about having isHandlingCompression() and other methods on the source interface. If we need to add new things in the future, the source interface will change and not be stable.

How do you feel about having a general method that returns a configuration object? If we need to add more configuration in the future, we can just add things to the configuration object.

@sebastianfeldmann
Copy link
Owner

I see your point. But this is hiding implementation details you have to be aware of if you create your own Source class.

The general variant you are suggesting could look something like this.

Status class (pure dummy)

class Status
{
    public function __construct($isCompressed = true, array $dataPath = array())
    {
        $this->isCompressed = $isCompressed;
        $this->dataPath     = $dataPath;
    }
    // ... setter/getter methods ...
}

Source base class

abstract class CompressionHandler
{
    public function getStatus()
    {
        return new Status(true);
    }
}

Runner logic

$source->backup($target, $result);
$status = $source->getStatus();
if (!$status->isCompressed()) {
    $this->handleCompression($target, $status->getDataPath());
}

This looks like some kind of detour to me.

@F21
Copy link
Contributor Author

F21 commented Apr 8, 2015

I was thinking maybe something like this:

Base source class

abstract class Source{

   public function getSourceConfig(){

        //This should return default config, so that only "special" sources need to implement this method
        return new SourceConfig(array('doNotCreateDirectories' => true, 'isCompressed' => true));

       //Flaw: cannot enforce return type as SourceConfig, but we can check this in Runner until PHP 7 lands.
   }

   abstract  public function setup(array $conf = array());

   abstract public function backup(Target $target, Result $result);
}

SourceConfig class

class SourceConfig{
   public function __construct(array $conf){
      //convert config array and do things with it
   }
}

Runner

$config = $source->getConfig(); //Get the config to use with the factories later.
$target = TargetFactory($config);
 //Other factories also has access to $config so that the source can provide some configuration (if required)
 //Here, the target factory looks at the config and sees "doNotCreateDirectories == true" and does not create directories.
//Target factory can also see if the source needs a file or directory target and returns `FileTarget` or `DirectoryTarget` targets

@sebastianfeldmann
Copy link
Owner

We are dealing with two problems at the same time. Lets split this up.

1 The directory problem

I think this should be handled within the source by just adding a subdirectory to the Target path.
So the Source is taking care of the directory handling if it realy needs an empty directory.
It even could create a directory in a completely different location.
This makes total sense because maybe you want to keep all your compressed backups in the same directory and then the directory exists anyway.

public function backup(Target $target, Result $result)
{
    // use some none existing sub directory or any other location you like
    $dumpDir = $target->getPath() . '/dumpDir';

2 The compression no compression

Instead of using a new method to get the Configuration/Status object we let the backup method return it.

Interface Source
{
    /**
     * @return \phpbu\App\Backup\Source\Status
     */
    public function backup(Target $target, Result $result);
}
public function backup(Target $target, Result $result)
{
    $dumpDir = $target->getPath() . '/dumpDir';
    // ...
    return new Status(false, $dumpDir);

So $dumpDir could be anywhere on the filesystem and Runner and Target are taking care of the compression if neccessary.

$status = $source->backup($target, $result);
if (!$status->isCompressed()) {
    $this->handleCompression($target, $status->getDataPath());
}

I think this is the best solution so far.
For backwars compatibility we could ignore if no status object is returned.
And then force the return in the next major version.

@F21
Copy link
Contributor Author

F21 commented Apr 8, 2015

That looks great! I really like how you handled the compression problem. I think that's how the second issue should be implemented 😄

However, I still have a question regarding the directory problem.

For example, if the user uses this config:

<target dirname="/backup/mysql/data-%Y%m%d-%H%i"/>

We would then ask innobackupex to dump all the files into /backup/mysql/data-%Y%m%d-%H%i/. The problem is that the folder data-%Y%m%d-%H%i cannot exist. It must be created by the backup tool.

We can have the source ask innobackupex to dump the files into /backup/mysql/data-%Y%m%d-%H%i/dump, but then there will be an extraneous dump folder which the user did not configure. Maybe they have some other tool that uses the backups and they will need to be able to handle the edge case of using the dump folder inside.

The issue of having a folder instead of 1 file is also a bit trouble some. Let's say the tool is able to dump the files into /backup/mysql/data-%Y%m%d-%H%i/. We then compress the contents of that folder. But where and what should the compressed file be called? Should it be /backup/mysql/data-%Y%m%d-%H%i.tar.gz? Or should it be /backup/mysql/data-%Y%m%d-%H%i/data-%Y%m%d-%H%i.tar.gz or some other filename?

@sebastianfeldmann
Copy link
Owner

Ok Nr 2. solved! Great!
So let's focus on Nr 1.

If you look at the Mongodump hack. The Tar command deletes the originals after a successful compression.

So i think both, directory and filename should be configured.

<target dirname="/backup/mysql/data-%Y%m%d-%H%i" filename="innox.dump" compress="bzip2"/>

The Source doesn't handle compression so Runner and Target have to and create the configured filename and remove the uncompressed "temp" file/directory.

It's only getting complicated if for some reason you want to keep the original uncompressed files.
But for now I would ignore that and force the deletion.

Later we could add some methods to the Status result class like.
deleteOriginals(boolean), shouldOriginalsBeDeleted() :boolean (default true)
And you could add a innobackupex option like

<option name="keepOriginalDumpDir" value="true"/>

The problem with keeping the dump dirs is, the Sync and especialy the Cleaner classes can't handle directories. But i think this is a problem for another release ;)

@F21
Copy link
Contributor Author

F21 commented Apr 8, 2015

Good idea! What if there is no filename and compressor set? Is the expected behaviour to dump into /backup/mysql/data-%Y%m%d-%H%i or some temp directory inside /backup/mysql/data-%Y%m%d-%H%i? If we want to dump directly into /backup/mysql/data-%Y%m%d-%H%i we need to ensure data-%Y%m%d-%H%i is not created, otherwise the backup process (innobackupex and arangodump will fail).

@sebastianfeldmann
Copy link
Owner

Maybe filename should be mandatory.
And if you look at the XSD

<xs:attribute name="dirname" use="required" type="xs:string"/>
<xs:attribute name="filename" use="required" type="xs:string"/>

it already is, but the Configuration\Loader\Xml does not reflect that with an Exception.

I think if no dirname or no filename is set there should be an Exception.

@F21
Copy link
Contributor Author

F21 commented Apr 8, 2015

But if the user does not want to compress the backup, then the filename would be extraneous (and perhaps a bit confusing) wouldn't it?

@sebastianfeldmann
Copy link
Owner

Yes it would be. This one is on me, I haven't thought of the possibility to keep unkompressed folders :(
I think we should just stick with this for the moment and don't allow Targets without filenames.
Allowing this would imply changes on multiple levels. This is the kind of Feature for another major release.
The good thing is, allowing to keep directories should not affect the source implementations anymore because of the changes we where digussing here.
I will open a new issue/feature-request to keep uncompressed directories.
OK for you?

@F21
Copy link
Contributor Author

F21 commented Apr 9, 2015

Yes that sounds good. I think we should keep it simple for now, because I would love to send some PRs for my other sources 😄

If you could get that implemented, I can then rebase and get my PRs in 😄

@sebastianfeldmann
Copy link
Owner

Awesome!
I'm trying to get this done today

@F21
Copy link
Contributor Author

F21 commented Apr 9, 2015

I saw some commits. Is this feature considered done?

@sebastianfeldmann
Copy link
Owner

Not quiet jet.
I still have to refactor the Mongodump Source to not execute the Tar and let the Runner do the job.
It got a "little" bit late at the office today :(
Prototype is running smoothly, but still needs some tweaking.
Hope to finish it tomorrow morning.
I keep you posted :)

@F21
Copy link
Contributor Author

F21 commented Apr 9, 2015

Awesome! 😄

sebastianfeldmann added a commit that referenced this issue Apr 10, 2015
'Runner' is able to handle uncompressed source data
@sebastianfeldmann
Copy link
Owner

I just pushed the changes. If you have a look at the Mongodump Source it returns a Status like this.

return Status::create()->uncompressed()->dataPath($this->getDumpDir($target));

The uncompressed triggers the runner to take care of the compression.

After integrating your new sources I think I have to refactor quite a bit. I don't like the inheritance Source => Binary any more. This was a really bad idea.

But first things first ;)

@sebastianfeldmann
Copy link
Owner

I also added some Exceptions to the Configuration\Target so it isn't possible to create a target without filename anymore.

@F21
Copy link
Contributor Author

F21 commented Apr 10, 2015

Is the generated tar suppose to have a . folder inside as the parent?

Currently, when I run my dump, it creates a tar file with the following inside:

./test.json
./test2.json

I do not have monogodb installed, so I cannot tell if mongodb also has the same behaviour.

@sebastianfeldmann
Copy link
Owner

Currently I call the tar command with "." at the end, so there are no leading directories included in the archive.

@F21
Copy link
Contributor Author

F21 commented Apr 10, 2015

Ah, I see. The command being called is /bin/tar -jcf '/backup/mysql/mysql-20150410-0650.tar.bz2' -C '/backup/mysql/dump' '.'

So, that's why everything will be placed under a directory called ..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants