
Initial support for BackBlaze B2 #8

Closed
wants to merge 5 commits

Conversation

@sylvainlehmann

sylvainlehmann commented Sep 5, 2016

Hello,

Here's a prototype implementation of a BackBlaze B2 backend. It may need some extended testing, but it responds correctly to the backend test suite. There are still some things left to do:

  • the backend does not support backslashes in filenames. To pass the unit tests, I replace them with other characters. This does not seem to be a problem, since s3ql does not use backslashes in normal operation, but it remains quite dirty
  • the backend does not support a copy operation. The code downloads and re-uploads the file being copied (see the sketch below).
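
For reference, the copy fallback described in the second point amounts to something like the following. It is a minimal sketch against s3ql's generic backend interface (open_read/open_write); names and structure are illustrative, not the PR's actual code:

import shutil

def copy_object(backend, src_key, dest_key, metadata=None):
    # B2 offers no server-side copy, so emulate it client-side: stream
    # the source object down and upload it again under the new key.
    src = backend.open_read(src_key)
    try:
        dst = backend.open_write(dest_key, metadata)
        try:
            shutil.copyfileobj(src, dst)
        finally:
            dst.close()  # closing the write handle finalizes the upload
    finally:
        src.close()

Every copy therefore costs a full download plus a full upload.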
@Nikratio

Collaborator

Nikratio commented Sep 6, 2016

Looks promising, thanks for sharing! Are you looking for feedback already, or do you want to finish the implementation first?

For the backslash issue, I'd suggest using a technique similar to the one implemented in the escape function (from s3ql.backends.local), just escaping \ instead of /.
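
A minimal sketch of that kind of escaping (hypothetical helper names; the =XX sequences mirror the style of the local backend's escape function, but this is not the actual s3ql code):

def escape_backslash(name):
    # Escape the escape character itself first so decoding is unambiguous.
    return name.replace('=', '=3D').replace('\\', '=5C')

def unescape_backslash(name):
    # Reverse order: restore backslashes first, then the escape character.
    return name.replace('=5C', '\\').replace('=3D', '=')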

@sylvainlehmann

Author

sylvainlehmann commented Sep 15, 2016

Thank you for the suggestion; I corrected the escaping functions.
The code seems to do its job, so I'm looking for feedback now if you have any.

@dlight

dlight commented Dec 21, 2016

Note: Backblaze seems to be open to adding backend functionality to support s3ql better (see this comment). They could implement server-side copy if requested by opening an issue.

@szepeviktor

Contributor

szepeviktor commented Jan 8, 2017

@sylvainlehmann May I test your code?

@sylvainlehmann

Author

sylvainlehmann commented Jan 9, 2017

@dlight Thanks for the suggestion, I will contact them.
@szepeviktor Yes. Feel free to give me any feedback you find useful!

@szepeviktor

Contributor

szepeviktor commented Jan 9, 2017

~/.s3ql/authinfo2

[backblaze]
storage-url: b2://my-bucket/prefix_
backend-login: <Account ID>
backend-password: <Application key>
fs-passphrase: <high entropy>

mkfs.s3ql b2://my-bucket/prefix_ and mount.s3ql b2://my-bucket/prefix_ /mnt work OK.

Keep up the work.
Thank you!

@szepeviktor

Contributor

szepeviktor commented Apr 16, 2017

@sylvainlehmann May I begin migrating my clients to BB?

@szepeviktor

Contributor

szepeviktor commented Apr 16, 2017

Build S3QL + B2 in a Docker container, e.g. by starting an UpCloud instance.
Dockerfile: https://github.com/szepeviktor/debian-server-tools/blob/master/virtualization/python-s3ql-test-b2/Dockerfile
Build and run commands are included.

@szepeviktor

Contributor

szepeviktor commented Apr 22, 2017

They could implement server-side copy if requested by opening an issue.

Could that be the reason that copying an empty dir with s3qlcp takes several minutes?

@anttotarella

anttotarella commented Apr 22, 2017

Hey @szepeviktor, thanks for your work. I have built s3ql with b2 using your Dockerfile and got it to mount inside the container; now my question is how to get it working outside. Thanks.

@szepeviktor

Contributor

szepeviktor commented Apr 22, 2017

Although s3ql is installable only via pip, I decided to patch the Debian backport:
https://github.com/szepeviktor/debian-server-tools/blob/master/package/s3ql-jessie-backports-b2.sh

You may also run the commands from that Dockerfile directly on your server.
I prefer proper OS packages.

@anttotarella

anttotarella commented Apr 22, 2017

@szepeviktor OK, so there is no way to export the s3ql mount from the container to the host?

@szepeviktor

Contributor

szepeviktor commented Apr 22, 2017

Of course there is: docker run --volume=HOST-DIR:CONTAINER-DIR:OPTIONS
https://docs.docker.com/engine/reference/commandline/run/#mount-volume--v---read-only
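
For a FUSE mount made inside a container to be visible on the host, the bind mount generally needs shared mount propagation, the container needs FUSE access, and mount.s3ql needs --allow-other so users other than the mounting one can see the files. A hypothetical invocation (image name, bucket and paths are illustrative):

docker run --rm \
    --cap-add SYS_ADMIN --device /dev/fuse \
    --volume /mnt/s3ql-b2:/mnt/s3ql-b2:shared \
    my-s3ql-b2-image \
    mount.s3ql --fg --allow-other b2://my-bucket/prefix_ /mnt/s3ql-b2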

@anttotarella

anttotarella commented Apr 22, 2017

@szepeviktor Hmm, I know of this and have tried it; for whatever reason I cannot see my mount from outside the container, but I can pass files through. I will try different mount options.

@sylvainlehmann

Author

sylvainlehmann commented May 9, 2017

@szepeviktor

Could that be the reason that copying an empty dir with s3qlcp takes several minutes?

I think not. Creating folders should only involve the database, and server-side copy is only used for the rolling metadata backup (as far as I can see).
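
For context, the rolling metadata backup follows the s3ql_metadata_bak_N naming visible in fsck output; a sketch of the pattern (illustrative only, assuming the generic backend interface, not the actual s3ql code):

def cycle_metadata(backend, generations=3):
    # Shift each s3ql_metadata_bak_(N-1) to _bak_N, oldest first, then
    # back up the live metadata object. Without server-side copy, every
    # backend.copy() call turns into a download plus a re-upload.
    for i in reversed(range(generations - 1)):
        src = 's3ql_metadata_bak_%d' % i
        if src in backend:
            backend.copy(src, 's3ql_metadata_bak_%d' % (i + 1))
    backend.copy('s3ql_metadata', 's3ql_metadata_bak_0')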

May I begin migrating my clients to BB?

This implementation looks quite stable; I have been using it daily for a few months with no problems, but under quite low load and with just a few TB of data. If you plan to use this implementation under heavy load with a very large amount of data, I suggest you run extended tests first.

@szepeviktor

Contributor

szepeviktor commented May 9, 2017

This implementation looks quite stable

Could you convince @Nikratio to merge it?

@gbdoin

gbdoin commented May 13, 2017

Right now we're using your fork and uploading about 20 TB of data. Everything is as stable as it can be. The plan is to use s3ql.b2 to stream multimedia. We will update as soon as we have some results.

@ToroNZ

ToroNZ commented May 18, 2017

I've been using this for the last 2 weeks, but recently the filesystem crashed:

tom@doublec-02:~$ df -h
df: /mnt/s3ql-b2: Transport endpoint is not connected
tom@doublec-02:~$ umount.s3ql /mnt/s3ql-b2
ERROR: File system appears to have crashed.
tom@doublec-02:~$ mount.s3ql --authfile /home/tom/.s3ql/authinfo2 --allow-root --compress none b2://blah-blah/blah /mnt/s3ql-b2
ERROR: Mountpoint does not exist.
tom@doublec-02:~$ ls /mnt -lth
ls: cannot access '/mnt/s3ql-b2': Transport endpoint is not connected
total 8.0K
drwxr-xr-x 2 tom www-data 4.0K Apr 30 20:27 s3fs-aws
drwxr-xr-x 2 tom www-data 4.0K Apr 30 15:21 b2fuse-mount
d????????? ? ?   ?           ?            ? s3ql-b2
tom@doublec-02:~/s3ql$ fsck.s3ql b2://blah-blah/blah
ERROR: Can not check mounted file system.
tom@doublec-02:~/s3ql$ sudo umount -l /mnt/s3ql-b2
[sudo] password for tom: 
tom@doublec-02:~/s3ql$ 
tom@doublec-02:~/s3ql$ fsck.s3ql b2://blah-blah/blah
Starting fsck of b2://blah-blah/blah
Using cached metadata.
WARNING: Remote metadata is outdated.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 10000 objects so far..WARNING: Deleted spurious object 19701

Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking blocks (checksums)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 520 KiB of compressed metadata.
Cycling metadata backups...
Backing up old metadata...
Cleaning up local metadata...
Completed fsck of b2://blah-blah/blah
tom@doublec-02:~/s3ql$ 
tom@doublec-02:~/s3ql$ mount.s3ql --authfile /home/tom/.s3ql/authinfo2 --allow-root --compress none b2://blah-blah/blah /mnt/s3ql-b2
Using 4 upload threads.
Autodetected 65492 file descriptors available for cache entries
Using cached metadata.
Setting cache size to 24148 MB
Mounting b2://blah-blah/blah at /mnt/s3ql-b2...
tom@doublec-02:~/s3ql$ s3qlstat /mnt/s3ql-b2
Directory entries:    3414
Inodes:               3416
Data blocks:          11985
Total data size:      100 GiB
After de-duplication: 100 GiB (99.84% of total)
After compression:    100 GiB (99.84% of total, 100.00% of de-duplicated)
Database size:        2.01 MiB (uncompressed)
Cache size:           49.7 MiB, 7 entries
Cache size (dirty):   19.7 MiB, 4 entries
Queued object removals: 0

Couldn't find any logs (?). There was nothing in syslog... need to see if FUSE dumped a core somewhere.

@sylvainlehmann

Author

sylvainlehmann commented May 18, 2017

@ToroNZ could you please open an issue on the fork (https://github.com/sylvainlehmann/s3ql) and paste the content of ~/.s3ql/mount.log?

@ToroNZ

ToroNZ commented Jun 1, 2017

@sylvainlehmann Finally managed to log it! My apologies for taking so long; stuff gets in the way.

sylvainlehmann#2

@IamDH4

IamDH4 commented Jul 4, 2017

Is this production-ready yet? I tested this earlier in the year and my results were pretty good.

@Nikratio

Collaborator

Nikratio commented Aug 24, 2017

Sorry for the delay! I will take a look at this soon.

@Nikratio Nikratio self-assigned this Aug 24, 2017
@Nikratio

Collaborator

Nikratio commented Aug 25, 2017

At the moment this is very hard to review, since it mixes changes from between S3QL releases with the actual changes adding BackBlaze support. I suspect that something went wrong with the merges.

Could you please squash everything into one commit, and rebase that onto current master?

Thanks!

Collaborator

Nikratio left a comment

Please squash & rebase on current master, ensuring that the pull request really contains only the right changes

.hgtags Outdated
@@ -86,3 +86,4 @@ cc9dfb6bc9ebb4290aaaa2e06e101c460d4126f5 release-2.15
a7de95ab14ceb9f5be5eb8b44694a4c2a766fe89 release-2.18
18d9916a31218052107bcbd9739f504f9f536e24 release-2.19
6036e4dfee66dba05f2bbb1319b90ed7c9557688 release-2.20
5649b9e2d23e66d79b180107057515c73d621731 release-2.21

@Nikratio
Collaborator

Nikratio commented Aug 25, 2017

This doesn't look like it has anything to do with adding BackBlaze support

Changes.txt Outdated
@@ -1,3 +1,10 @@
2016-10-28, S3QL 2.21

@Nikratio
Collaborator

Nikratio commented Aug 25, 2017

..nor this...

@@ -59,7 +59,7 @@ that is not the case.
version between 1.0 (inclusive) and 2.0 (exclusive)
* `dugong <https://bitbucket.org/nikratio/python-dugong/>`_, any
version between 3.4 (inclusive) and 4.0 (exclusive)
* `pytest <http://pytest.org/>`_, version 2.3.3 or newer (optional, to run unit tests)

@Nikratio
Collaborator

Nikratio commented Aug 25, 2017

..nor this, stopping here

@szepeviktor

Contributor

szepeviktor commented Aug 31, 2017

@sylvainlehmann Your code runs every night on my development server, doing an almost complete backup (/usr, mysql, /home, /etc).

Could you please update this PR?

@szepeviktor

Contributor

szepeviktor commented Aug 31, 2017

Or I can open a separate PR with your 4 commits.

@sylvainlehmann sylvainlehmann force-pushed the sylvainlehmann:master branch from 36b71e3 to 425b737 Sep 2, 2017
@sylvainlehmann

Author

sylvainlehmann commented Sep 2, 2017

Thank you for the review. I made a clean rebase; this should be OK now.

@Nikratio

Collaborator

Nikratio commented Sep 2, 2017

Thanks! One thing I can see right away is the lack of documentation. Could you please extend rst/backends.rst as necessary?

@sylvainlehmann

Author

sylvainlehmann commented Sep 25, 2017

The documentation has been added.

@szepeviktor

Contributor

szepeviktor commented Oct 26, 2017

@Nikratio Could you merge it for the next release?

@Nikratio

Collaborator

Nikratio commented Oct 30, 2017

I will try. As an aside, when merging I will squash this into one commit (the current split into 5 commits doesn't seem to make sense), so in order to preserve correct attribution it would be nice to have the pull request contain just one commit too. But that's not a showstopper.

@szepeviktor

Contributor

szepeviktor commented Oct 30, 2017

@Backblaze @bwbeach may blog about it.

Collaborator

Nikratio left a comment

Alright, finally had the time to look at this. Apart from the issue with the expensive copy() operation, I don't see any fundamental problems, but there are a lot of smaller things that need to be addressed. Do you think you would have time for that? Thanks a lot for your contribution!

`BackBlaze B2 <https://www.backblaze.com/b2/cloud-storage.html>`_ is a low cost
storage provider, with their own API. To use this storage
you first need to sign up for an account. The account is free, you only pay for the
storage and the traffic you actually use. After account creation, you

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

Let's get rid of the "The account is free" part. It sounds like an advertisement, and may change in the future.

(There may be a similar paragraph for another backend; in that case feel free to remove that too)

need to create a bucket to store data.

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

How is that done? Is there a command line tool, a web application, or do you need to somehow use the API directly?

The storage URL for BackBlaze backend is ::

b2://<bucket>/[</prefix>]

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

Are you sure that the trailing slash is required if no prefix is specified? If so, is requiring that really a good idea?


Here *bucket* correspond to as existing bucket name, and *prefix*

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

typo

Here *bucket* correspond to as existing bucket name, and *prefix*
a folder where all s3ql files will be stored.

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

The b2 documentation says that there is no such thing as a folder in b2.

if is_temp_network_error(exc):
    # We probably can't use the connection anymore
    self.conn.disconnect()
raise

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

Is this necessary? If yes, shouldn't the other backends be changed accordingly?

self.conn_download.disconnect()


class ObjectR(object):

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

There seems to be a lot of code duplication from s3.ObjectR here. Can you just inherit from s3.ObjectR and change the base class so that the hash function and response header name can be easily customized?
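
The refactoring could take a shape like this (hypothetical hook names; B2 does checksum objects with SHA1 and return it in the X-Bz-Content-Sha1 header, but the hooks themselves are exactly what s3.ObjectR would need to grow):

import hashlib
from s3ql.backends import s3

class ObjectR(s3.ObjectR):
    # Hypothetical class-level hooks that the base class would consult
    # instead of hard-coding MD5 and the S3 response header.
    hash_factory = staticmethod(hashlib.sha1)
    hash_header = 'X-Bz-Content-Sha1'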

headers=headers,
body=self.fh,
auth_token=upload_auth_token,
body_size=self.obj_size)

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

linebreaks

upload_auth_token, upload_url = self.backend._get_upload_url()
upload_url = urllib.parse.urlparse(upload_url)

with HTTPConnection(upload_url.hostname, 443, ssl_context=self.backend.ssl_context) as conn_up:

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

This means every upload will require creating a fresh connection (maybe even SSL). Is there a way to re-use the connection object? Hopefully the hostname of the upload url will not change every time..
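
One way to avoid that would be to cache the connection for as long as the upload URL's host stays the same, for example (a sketch; the attribute names are illustrative):

from dugong import HTTPConnection

def _get_upload_conn(self, hostname):
    # Reuse the cached connection while the upload host is unchanged;
    # reconnect (and redo the TLS handshake) only when it changes.
    if self._upload_host != hostname:
        if self._upload_conn is not None:
            self._upload_conn.disconnect()
        self._upload_conn = HTTPConnection(hostname, 443,
                                           ssl_context=self.ssl_context)
        self._upload_host = hostname
    return self._upload_conn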

if hit:
    val = int(header)
else:
    val = 1

@Nikratio
Collaborator

Nikratio commented Nov 4, 2017

try:
    val = int(header)
except ValueError:
    val = 1

would be a lot clearer. However, just like s3c, you should log an error if the server returns an invalid value.
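
With the logging folded in, the suggestion becomes (a sketch; the logger name is illustrative):

try:
    val = int(header)
except ValueError:
    log.error('Server returned invalid header value: %r', header)
    val = 1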

@szepeviktor

Contributor

szepeviktor commented Nov 19, 2017

@sylvainlehmann Do you have the time to answer all these?

@sylvainlehmann

Author

sylvainlehmann commented Nov 21, 2017

@szepeviktor I do not have time to work on it these days. I think I will start sending patches at the beginning of December.

@szepeviktor

Contributor

szepeviktor commented Nov 21, 2017

Thanks.

@szepeviktor

Contributor

szepeviktor commented Dec 15, 2017

@sylvainlehmann Could I help in any way?

@szepeviktor

Contributor

szepeviktor commented Feb 10, 2018

@sylvainlehmann Could you answer some of these questions?

@szepeviktor

Contributor

szepeviktor commented Mar 23, 2018

@sylvainlehmann @Nikratio I've contacted Backblaze several times about donating developer hours to this integration.

@Nikratio

Collaborator

Nikratio commented Dec 29, 2018

Friendly ping. Do you think you'll have time to resolve the open issues in the near future? If not, I'll close this pull request for now.

@sylvainlehmann

Author

sylvainlehmann commented Jan 3, 2019

@Nikratio Hi. Sorry for the delayed response. I have very little time to work on it and do not use s3ql anymore. Feel free to close the issue; I will come back to you if I get some free time in the coming months.

@Nikratio Nikratio closed this Jan 4, 2019