Initial support for BackBlaze B2 #8

Closed
wants to merge 5 commits into from

Conversation

@sylvainlehmann commented Sep 5, 2016

Hello,

Here's a prototype implementation of a BackBlaze B2 backend. It may need more extensive testing, but it responds correctly to the backend tests. There are still some things left to do:

  • The backend does not support backslashes in filenames. To pass the unit tests, I replace them with other characters. This does not seem to be a problem, since s3ql does not use backslashes in normal operation, but it remains rather dirty.
  • The backend does not support a copy operation. The code downloads and re-uploads the file being copied (see the sketch below).
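
As a rough illustration, the download-and-re-upload fallback could look like the following minimal sketch (assuming the usual s3ql backend interface with open_read()/open_write(); names and details are illustrative and not necessarily identical to the code in this PR):

import shutil

def copy(self, src, dest, metadata=None):
    # B2 offers no server-side copy, so fall back to downloading the
    # source object and re-uploading it under the destination key.
    src_fh = self.open_read(src)
    try:
        if metadata is None:
            metadata = src_fh.metadata
        dest_fh = self.open_write(dest, metadata)
        try:
            shutil.copyfileobj(src_fh, dest_fh)
        finally:
            dest_fh.close()
    finally:
        src_fh.close()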

@Nikratio (Collaborator) commented Sep 6, 2016

Looks promising, thanks for sharing! Are you looking for feedback already, or do you want to finish the implementation first?

For the backslash issue, I'd suggest using a technique similar to the one implemented in the escape function (from s3ql.backends.local), just escaping \ instead of /.
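
For illustration, a minimal sketch of that kind of escaping (loosely modeled on the local backend's escape(); the exact set of characters handled in s3ql.backends.local may differ):

def escape(s):
    # Escape '=' first so the encoding stays reversible, then replace
    # every backslash with its hex-escaped form.
    s = s.replace('=', '=3D')
    s = s.replace('\\', '=5C')
    return s

def unescape(s):
    # Reverse the transformation performed by escape().
    s = s.replace('=5C', '\\')
    s = s.replace('=3D', '=')
    return s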

@sylvainlehmann (Author)

Thank you for the suggestion, I corrected the escaping functions.
The code seems to do its job, so I'm already looking for feedback if you have any.

@dlight commented Dec 21, 2016

Note: Backblaze seems to be open to adding backend functionality to support s3ql better (see this comment). They could implement server-side copy if requested by opening an issue.

@szepeviktor (Collaborator)

@sylvainlehmann May I test your code?

@sylvainlehmann (Author)

@dlight Thanks for the suggestion, I will contact them.
@szepeviktor Yes. Feel free to give me any feedback you find useful!

@szepeviktor (Collaborator) commented Jan 9, 2017

~/.s3ql/authinfo2

[backblaze]
storage-url: b2://my-bucket/prefix_
backend-login: <Account ID>
backend-password: <Application key>
fs-passphrase: <high entropy>

mkfs.s3ql b2://my-bucket/prefix_ and mount.s3ql b2://my-bucket/prefix_ /mnt work OK.

Keep up the work.
Thank you!

@szepeviktor (Collaborator) commented Apr 16, 2017

@sylvainlehmann May I begin migrating my clients to BB?

@szepeviktor (Collaborator) commented Apr 16, 2017

Build S3QL + B2 in a Docker container, e.g. by starting an UpCloud instance.
Dockerfile: https://github.com/szepeviktor/debian-server-tools/blob/master/virtualization/python-s3ql-test-b2/Dockerfile
Build and run commands are included.

@szepeviktor (Collaborator)

They could implement server-side copy if requested by opening an issue.

Could that be the reason that copying an empty dir with s3qlcp takes several minutes?

@anttotarella commented Apr 22, 2017

Hey @szepeviktor, thanks for your work. I have built s3ql with b2 using your Docker image and got it to mount inside the container. Now my question is: how do you get it working outside? Thanks.

@szepeviktor (Collaborator)

Although s3ql is installable only via pip, I decided to patch the Debian backport:
https://github.com/szepeviktor/debian-server-tools/blob/master/package/s3ql-jessie-backports-b2.sh

You may also run the commands from that Dockerfile on your server.
I prefer proper OS packages.

@anttotarella

@szepeviktor OK, so there is no way to export the s3ql mount from the Docker container to the host?

@szepeviktor (Collaborator)

Of course there is: docker run --volume=HOST-DIR:CONTAINER-DIR:OPTIONS
https://docs.docker.com/engine/reference/commandline/run/#mount-volume--v---read-only

@anttotarella

@szepeviktor Hmm, I know of this and have tried it, but for whatever reason I cannot see my mount from outside the container, although I can pass files through. I will try different mount options.

@sylvainlehmann (Author)

@szepeviktor

Could that be the consequence that copying an empty dir with s3qlcp takes long minutes?

I think not. Creating folders should only involve the database, and server-side copy is only used for the rolling metadata backup (as far as I can see).

May I begin migrating my clients to BB?

This implementation looks quite stable. I have been using it daily for a few months with no problems, but under a fairly low load and with just a few TB of data. If you plan to use this implementation under heavy load and with a very large amount of data, I suggest you run extensive tests first.

@szepeviktor (Collaborator)

This implementation looks quite stable

Could you convince @Nikratio to merge it?

@gbdoin commented May 13, 2017

Right now we're using your fork and uploading about 20 TB of data. Everything is as stable as it can be. The plan is to use s3ql.b2 to stream multimedia. We will update as soon as we have some results.

@ToroNZ commented May 18, 2017

I've been using this for the last 2 weeks, but recently the filesystem crashed:

tom@doublec-02:~$ df -h
df: /mnt/s3ql-b2: Transport endpoint is not connected
tom@doublec-02:~$ umount.s3ql /mnt/s3ql-b2
ERROR: File system appears to have crashed.
tom@doublec-02:~$ mount.s3ql --authfile /home/tom/.s3ql/authinfo2 --allow-root --compress none b2://blah-blah/blah /mnt/s3ql-b2
ERROR: Mountpoint does not exist.
tom@doublec-02:~$ ls /mnt -lth
ls: cannot access '/mnt/s3ql-b2': Transport endpoint is not connected
total 8.0K
drwxr-xr-x 2 tom www-data 4.0K Apr 30 20:27 s3fs-aws
drwxr-xr-x 2 tom www-data 4.0K Apr 30 15:21 b2fuse-mount
d????????? ? ?   ?           ?            ? s3ql-b2
tom@doublec-02:~/s3ql$ fsck.s3ql b2://blah-blah/blah
ERROR: Can not check mounted file system.
tom@doublec-02:~/s3ql$ sudo umount -l /mnt/s3ql-b2
[sudo] password for tom: 
tom@doublec-02:~/s3ql$ 
tom@doublec-02:~/s3ql$ fsck.s3ql b2://blah-blah/blah
Starting fsck of b2://blah-blah/blah
Using cached metadata.
WARNING: Remote metadata is outdated.
Checking DB integrity...
Creating temporary extra indices...
Checking lost+found...
Checking cached objects...
Checking names (refcounts)...
Checking contents (names)...
Checking contents (inodes)...
Checking contents (parent inodes)...
Checking objects (reference counts)...
Checking objects (backend)...
..processed 10000 objects so far..WARNING: Deleted spurious object 19701

Checking objects (sizes)...
Checking blocks (referenced objects)...
Checking blocks (refcounts)...
Checking blocks (checksums)...
Checking inode-block mapping (blocks)...
Checking inode-block mapping (inodes)...
Checking inodes (refcounts)...
Checking inodes (sizes)...
Checking extended attributes (names)...
Checking extended attributes (inodes)...
Checking symlinks (inodes)...
Checking directory reachability...
Checking unix conventions...
Checking referential integrity...
Dropping temporary indices...
Dumping metadata...
..objects..
..blocks..
..inodes..
..inode_blocks..
..symlink_targets..
..names..
..contents..
..ext_attributes..
Compressing and uploading metadata...
Wrote 520 KiB of compressed metadata.
Cycling metadata backups...
Backing up old metadata...
Cleaning up local metadata...
Completed fsck of b2://blah-blah/blah
tom@doublec-02:~/s3ql$ 
tom@doublec-02:~/s3ql$ mount.s3ql --authfile /home/tom/.s3ql/authinfo2 --allow-root --compress none b2://blah-blah/blah /mnt/s3ql-b2
Using 4 upload threads.
Autodetected 65492 file descriptors available for cache entries
Using cached metadata.
Setting cache size to 24148 MB
Mounting b2://blah-blah/blah at /mnt/s3ql-b2...
tom@doublec-02:~/s3ql$ s3qlstat /mnt/s3ql-b2
Directory entries:    3414
Inodes:               3416
Data blocks:          11985
Total data size:      100 GiB
After de-duplication: 100 GiB (99.84% of total)
After compression:    100 GiB (99.84% of total, 100.00% of de-duplicated)
Database size:        2.01 MiB (uncompressed)
Cache size:           49.7 MiB, 7 entries
Cache size (dirty):   19.7 MiB, 4 entries
Queued object removals: 0

Couldn't find any logs (?). There was nothing in syslog... need to see if FUSE dumps a core somewhere.

@sylvainlehmann (Author)

@ToroNZ Could you please open an issue on the fork (https://github.com/sylvainlehmann/s3ql) and paste the content of ~/.s3ql/mount.log?

@ToroNZ commented Jun 1, 2017

@sylvainlehmann Finally managed to log it! My apologies for taking so long, stuff gets in the way.

https://github.com/sylvainlehmann/s3ql/issues/2

@dv-anomaly commented Jul 4, 2017

Is this production ready yet? I tested this earlier in the year and my results were pretty good.

@Nikratio (Collaborator)

Sorry for the delay! I will take a look at this soon.

Nikratio self-assigned this Aug 24, 2017
@Nikratio (Collaborator)

At the moment this is very hard to review, since it mixes changes from in between S3QL releases with the actual changes from adding BackBlaze support. I suspect that something went wrong with the merges.

Could you please squash everything into one commit, and rebase that onto current master?

Thanks!

@Nikratio (Collaborator) left a comment

Please squash & rebase on current master, ensuring that the pull request really contains only the right changes

.hgtags Outdated
@@ -86,3 +86,4 @@ cc9dfb6bc9ebb4290aaaa2e06e101c460d4126f5 release-2.15
a7de95ab14ceb9f5be5eb8b44694a4c2a766fe89 release-2.18
18d9916a31218052107bcbd9739f504f9f536e24 release-2.19
6036e4dfee66dba05f2bbb1319b90ed7c9557688 release-2.20
5649b9e2d23e66d79b180107057515c73d621731 release-2.21

This doesn't look like it has anything to do with adding BackBlaze support

Changes.txt Outdated
@@ -1,3 +1,10 @@
2016-10-28, S3QL 2.21

..nor this...

@@ -59,7 +59,7 @@ that is not the case.
version between 1.0 (inclusive) and 2.0 (exclusive)
* `dugong <https://bitbucket.org/nikratio/python-dugong/>`_, any
version between 3.4 (inclusive) and 4.0 (exclusive)
* `pytest <http://pytest.org/>`_, version 2.3.3 or newer (optional, to run unit tests)

..nor this, stopping here

@szepeviktor (Collaborator) commented Aug 31, 2017

@sylvainlehmann Your code runs every night on my development server, doing an almost complete backup (/usr, mysql, /home, /etc).

Could you please update this PR?

@szepeviktor (Collaborator)

Or I can open a separate PR with your 4 commits.

@sylvainlehmann (Author)

Thank you for the review. I made a clean rebase. This should be OK now.

@Nikratio (Collaborator) commented Sep 2, 2017

Thanks! One thing I can see right away is the lack of documentation. Could you please extend rst/backends.rst as necessary?

@sylvainlehmann (Author) commented Sep 25, 2017

The documentation has been added

@szepeviktor (Collaborator)

@Nikratio Could you merge it for the next release?

@Nikratio (Collaborator)

I will try. As an aside, when merging I will squash this into one commit (the current split into 5 commits doesn't seem to make sense), so in order to preserve correct attributions it would be nice to have the pull request just contain one commit too. But that's not a showstopper.

@szepeviktor (Collaborator)

@Backblaze @bwbeach may blog about it.

@Nikratio (Collaborator) left a comment

Alright, I finally had the time to look at this. Apart from the issue with the expensive copy() operation I don't see any fundamental problems, but there are a lot of smaller things that need to be addressed. Do you think you would have time for that? Thanks a lot for your contribution!

`BackBlaze B2 <https://www.backblaze.com/b2/cloud-storage.html>`_ is a low cost
storage provider, with their own API. To use this storage
you first need to sign up for an account. The account is free, you only pay for the
storage and the traffic you actually use. After account creation, you

Let's get rid of the "The account is free" part. It sounds like advertisement, and may change in the future.

(There may be a similar paragraph for another backend, in that case feel free to remove that too)

storage provider, with their own API. To use this storage
you first need to sign up for an account. The account is free, you only pay for the
storage and the traffic you actually use. After account creation, you
need to create a bucket to store data.

How is that done? Is there a command line tool, a web application, or do you need to somehow use the API directly?

need to create a bucket to store data.
The storage URL for BackBlaze backend is ::

b2://<bucket>/[</prefix>]

Are you sure that the trailing slash is required if no prefix is specified? If so, is requiring that really a good idea?


b2://<bucket>/[</prefix>]

Here *bucket* correspond to as existing bucket name, and *prefix*

typo

b2://<bucket>/[</prefix>]

Here *bucket* correspond to as existing bucket name, and *prefix*
a folder where all s3ql files will be stored.

The b2 documentation says that there is no such thing as a folder in b2.

if is_temp_network_error(exc):
# We probably can't use the connection anymore
self.conn.disconnect()
raise

Is this necessary? If yes, shouldn't the other backends be changed accordingly?

self.conn_download.disconnect()


class ObjectR(object):

There seems to be a lot of code duplication from s3.ObjectR here. Can you just inherit from s3.ObjectR and change the base class so that the hash function and response header name can be easily customized?
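
A sketch of the kind of refactoring meant here (class and attribute names are illustrative, not the actual s3ql code): the base reader keeps the hash algorithm and the response header carrying the expected digest as class attributes, so the B2 reader only has to override those two names.

import hashlib

class ObjectR:
    # Illustrative defaults: MD5 digest, carried in the ETag response header.
    hash_algorithm = 'md5'
    hash_header = 'ETag'

    def __init__(self, key, resp, backend, metadata=None):
        self.key = key
        self.resp = resp
        self.backend = backend
        self.metadata = metadata
        self.hash = hashlib.new(self.hash_algorithm)

    def read(self, size=None):
        # Read from the HTTP response and update the running checksum,
        # which is later compared against the value from hash_header.
        buf = self.resp.read(size)
        self.hash.update(buf)
        return buf

class B2ObjectR(ObjectR):
    # B2 returns a SHA1 digest in its own header.
    hash_algorithm = 'sha1'
    hash_header = 'X-Bz-Content-Sha1'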

headers=headers,
body=self.fh,
auth_token=upload_auth_token,
body_size=self.obj_size)

linebreaks

upload_auth_token, upload_url = self.backend._get_upload_url()
upload_url = urllib.parse.urlparse(upload_url)

with HTTPConnection(upload_url.hostname, 443, ssl_context=self.backend.ssl_context) as conn_up:

This means every upload will require creating a fresh connection (maybe even SSL). Is there a way to re-use the connection object? Hopefully the hostname of the upload url will not change every time..
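
One way to do that (a sketch only; attribute and helper names are hypothetical) is to cache the dugong connection and reconnect only when the upload URL points at a different host:

from urllib.parse import urlparse
from dugong import HTTPConnection

def _get_upload_conn(self, upload_url):
    # Re-use the cached connection while the upload URL's host stays the
    # same; otherwise drop it and open a fresh one.
    hostname = urlparse(upload_url).hostname
    if getattr(self, '_upload_conn', None) is None or self._upload_host != hostname:
        if getattr(self, '_upload_conn', None) is not None:
            self._upload_conn.disconnect()
        self._upload_conn = HTTPConnection(hostname, 443, ssl_context=self.ssl_context)
        self._upload_host = hostname
    return self._upload_conn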

if hit:
val = int(header)
else:
val = 1

try:
    val = int(header)
except ValueError:
    val = 1

would be a lot clearer. However, just like s3c, you should log an error if the server returns an invalid value.
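
Combined with that logging, the snippet could look roughly like this (a sketch; log stands for the module's logger and header is the raw value taken from the response, as above):

import logging

log = logging.getLogger(__name__)

try:
    val = int(header)
except ValueError:
    log.error('Server returned invalid value in header: %r', header)
    val = 1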

@szepeviktor (Collaborator)

@sylvainlehmann Do you have the time to answer all these?

@sylvainlehmann (Author) commented Nov 21, 2017

@szepeviktor I do not have time to work on it these days. I think I will start sending patches at the beginning of December.

@szepeviktor (Collaborator)

Thanks.

@szepeviktor (Collaborator)

@sylvainlehmann Could I help in any way?

@szepeviktor (Collaborator)

@sylvainlehmann Could you answer some of these questions?

@szepeviktor (Collaborator) commented Mar 23, 2018

@sylvainlehmann @Nikratio I've contacted Backblaze several times about donating developer hours to this integration.

@Nikratio (Collaborator)

Friendly ping. Do you think you'll have time to resolve the open issues in the near future? If not, I'll close this pull request for now.

@sylvainlehmann (Author)

@Nikratio Hi. Sorry for the delayed response. I have very little time to work on it and no longer use s3ql. Feel free to close the issue; I will come back to you if I get some free time in the coming months.

Nikratio closed this Jan 4, 2019