
Could MyDumper support stream backup and loader? #100

Closed
leafonsword opened this issue Jan 12, 2018 · 13 comments · Fixed by #403

Comments

@leafonsword

Sometimes we need to back up to a stream and restore from a stream so that the backup footprint is smaller. Could MyDumper support a streaming backup and loader?

@maxbube
Collaborator

maxbube commented Feb 19, 2018

Hi @leafonsword ,

I get that question often :) It won't be easy given how mydumper currently works; it will need some refactoring, e.g. the order in which different objects are dumped. At the moment we dump data first and then schema, so we would need to swap that, and there are surely other considerations.

The only thing I can tell you at this time is that I will review the effort this requires and see if I can find the time to develop it.

Thanks for the feedback!

@nathanielks

+1 here 😄

@qianxiansheng90

+1 need this

@bartoszcisek

@maxbube Even without switching the data->schema order, this streaming would be useful. azure-cli supports only single-file uploads to blob storage, so streaming mydumper output directly into tar would save us 50% of the IO on the backup server.

@Esysteme

Esysteme commented Oct 8, 2020

Any update?

@davidducos
Member

Hello @Esysteme,

At this moment there is no plan to implement it. I know that it is a really useful feature, but we would need a writer/reader object that handles a queue. For writes in mydumper this is simpler, but for myloader it is not, as we need to stop reading from the queue when a CREATE TABLE arrives to avoid issues with the INSERTs that follow it.
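For what it's worth, a minimal sketch of that ordering constraint, assuming a GLib GAsyncQueue (mydumper and myloader are GLib-based); apply_schema, send_to_workers and wait_for_workers are illustrative stubs, not myloader internals:

#include <glib.h>
#include <stdio.h>

/* Stubs standing in for real work. */
static void apply_schema(const char *sql)  { printf("schema: %s\n", sql); }
static void send_to_workers(char *sql)     { printf("insert: %s\n", sql); g_free(sql); }
static void wait_for_workers(void)         { /* barrier: all dispatched INSERTs applied */ }

/* Pop statements in arrival order; a CREATE TABLE acts as a barrier so
 * that no INSERT queued behind it can run before its table exists. */
static void reader_loop(GAsyncQueue *stmts) {
    char *sql;
    while (!g_str_equal(sql = g_async_queue_pop(stmts), "EOF")) {
        if (g_str_has_prefix(sql, "CREATE TABLE")) {
            wait_for_workers();   /* stop: drain in-flight INSERTs first */
            apply_schema(sql);    /* then create the table... */
            g_free(sql);
        } else {
            send_to_workers(sql); /* ...so these INSERTs find it in place */
        }
    }
    g_free(sql);
}

int main(void) {
    GAsyncQueue *q = g_async_queue_new();
    g_async_queue_push(q, g_strdup("CREATE TABLE t (a INT)"));
    g_async_queue_push(q, g_strdup("INSERT INTO t VALUES (1)"));
    g_async_queue_push(q, g_strdup("EOF"));  /* sentinel to end the loop */
    reader_loop(q);
    g_async_queue_unref(q);
    return 0;
}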

@davidducos
Member

davidducos commented Mar 31, 2021

Answering myself: we can use MySQL comments that myloader can parse to "send commands", so it will be able to know when a stage has finished and start the next one.
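To make that concrete, here is a hedged sketch of how such an in-band command could be recognized while scanning the stream; the "-- STAGE:" marker is a made-up format for illustration, since the issue doesn't specify the actual syntax:

#include <glib.h>
#include <stdio.h>
#include <string.h>

/* A SQL comment such as "-- STAGE: schema_done" is ignored by the
 * server, so the stream stays loadable by a plain client while
 * myloader extracts the command from it. */
static gboolean parse_command(const char *line, char **stage) {
    const char *marker = "-- STAGE: ";
    if (!g_str_has_prefix(line, marker))
        return FALSE;
    *stage = g_strchomp(g_strdup(line + strlen(marker)));
    return TRUE;
}

int main(void) {
    char *stage = NULL;
    if (parse_command("-- STAGE: schema_done\n", &stage)) {
        printf("stage finished: %s\n", stage); /* safe to start the next stage */
        g_free(stage);
    }
    return 0;
}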

davidducos added this to the Release 0.10.9 milestone Mar 31, 2021
@davidducos
Member

We can use:

#include <unistd.h>       /* syscall() */
#include <sys/syscall.h>  /* SYS_gettid */

pid_t tid = syscall(SYS_gettid);  /* Linux-specific numeric thread id */

to get the thread id. I was thinking of implementing this in the write_data function, and since write_data can be called several times before a file is closed, we need to identify each write_data call per thread.

When write_data is called and streaming is in use, we are going to enqueue not just the statement but also the thread_id.

Another consideration is that there will be a new thread which is going to be reading from the queue.
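A sketch of what that enqueueing could look like, again assuming GLib; StreamItem, write_data_streaming and the empty-item sentinel are all illustrative, not actual mydumper code:

#define _GNU_SOURCE
#include <glib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <stdio.h>

/* Each enqueued item carries the statement plus the id of the thread
 * that produced it, so the consumer can tell per-file chunks apart. */
typedef struct {
    pid_t thread_id;
    char *statement;
} StreamItem;

static GAsyncQueue *stream_queue;

static void write_data_streaming(const char *statement) {
    StreamItem *item = g_new(StreamItem, 1);
    item->thread_id = syscall(SYS_gettid);  /* Linux-specific */
    item->statement = g_strdup(statement);
    g_async_queue_push(stream_queue, item);
}

/* The new thread: drains the queue and serializes everything. */
static gpointer stream_thread(gpointer unused) {
    (void)unused;
    for (;;) {
        StreamItem *item = g_async_queue_pop(stream_queue);
        if (item->statement == NULL) { g_free(item); break; } /* sentinel */
        printf("[tid %ld] %s", (long)item->thread_id, item->statement);
        g_free(item->statement);
        g_free(item);
    }
    return NULL;
}

int main(void) {
    stream_queue = g_async_queue_new();
    GThread *t = g_thread_new("stream", stream_thread, NULL);
    write_data_streaming("INSERT INTO t VALUES (1);\n");
    g_async_queue_push(stream_queue, g_new0(StreamItem, 1)); /* NULL statement = stop */
    g_thread_join(t);
    g_async_queue_unref(stream_queue);
    return 0;
}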

@davidducos
Member

I already have a version of mydumper.c which serializes the export to stdout. I'm going to start working on myloader.c now...

@davidducos
Member

Hi All,

I'm still working on this solution, as I'm still not sure how to implement it. I learned a lot from what I did in mid-April, but it was going to be very difficult to maintain over time. I'm still thinking about the best solution.

@davidducos
Member

Finally, I've reached a conclusion.
mydumper will add 2 options:
--stream: indicates that the files created need to be streamed through STDOUT
--no-delete: the files created in the backup directory will not be deleted

When --stream is used, the files are created locally and streamed as well; by default, they are deleted once the transfer has completed. We only need to add a stream thread that receives the names of the files to be streamed. The verbose option will only be available if -L is set.

myloader will implement a thread that reads the stream and creates the files locally. It will have the same 2 options:
--stream: indicates that the files are coming from the stream
--no-delete: the files created in the backup directory will not be deleted

We need to consider that if --stream and --no-delete are used together, the connection to the database might not be needed, as the user might only want to unpack the backup files.
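A rough sketch of that stream thread on the mydumper side, under the same GLib assumptions (the queue of file names, the no_delete flag and the empty-string sentinel are illustrative only):

#include <glib.h>
#include <glib/gstdio.h>
#include <stdio.h>

static gboolean no_delete = FALSE;  /* would be set by --no-delete */

/* Receives names of finished backup files, copies each to STDOUT,
 * and removes the local copy unless --no-delete was given. */
static gpointer stream_thread(gpointer data) {
    GAsyncQueue *files = data;
    char *filename, buf[8192];
    size_t n;
    while (*(filename = g_async_queue_pop(files)) != '\0') {
        FILE *f = fopen(filename, "rb");
        if (f) {
            while ((n = fread(buf, 1, sizeof buf, f)) > 0)
                fwrite(buf, 1, n, stdout);
            fclose(f);
            if (!no_delete)
                g_unlink(filename);  /* default: delete after streaming */
        }
        g_free(filename);
    }
    g_free(filename);  /* the empty-string sentinel */
    fflush(stdout);
    return NULL;
}

int main(void) {
    GAsyncQueue *files = g_async_queue_new();
    GThread *t = g_thread_new("stream", stream_thread, files);
    /* dump threads would push each finished file name here */
    g_async_queue_push(files, g_strdup(""));  /* sentinel: done */
    g_thread_join(t);
    g_async_queue_unref(files);
    return 0;
}

The end-to-end usage would then presumably be piping one into the other, i.e. running mydumper with --stream and feeding its STDOUT into myloader with --stream.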

@cyford

cyford commented Oct 3, 2023

It would be nice if we could stream directly into mysql without the need to create the files... It seems all files are completed before the import begins.

@davidducos
Member

Hi @cyford,
I thought about that, but taking into account that the backup takes less time than the restore and that we need to use FTWRL to get a consistent backup, I decided to implement a file-based buffer. You can check https://www.percona.com/blog/backup-and-restore-with-mydumper-on-docker/ which might help with what you are trying to accomplish.
