Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement backblaze-b2 gateway support #5002

Merged
merged 1 commit into from Oct 13, 2017

Conversation

harshavardhana
Copy link
Member

@harshavardhana harshavardhana commented Oct 2, 2017

Description

Implement backblaze-b2 gateway support

Motivation and Context

Fixes #4072

How Has This Been Tested?

Manually testing and mint.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@codecov
Copy link

codecov bot commented Oct 2, 2017

Codecov Report

Merging #5002 into master will decrease coverage by 1.15%.
The diff coverage is 5.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5002      +/-   ##
==========================================
- Coverage   62.36%   61.21%   -1.16%     
==========================================
  Files         190      192       +2     
  Lines       27756    28320     +564     
==========================================
+ Hits        17310    17336      +26     
- Misses       9209     9740     +531     
- Partials     1237     1244       +7
Impacted Files Coverage Δ
cmd/gateway-gcs.go 6.13% <ø> (ø) ⬆️
cmd/globals.go 100% <ø> (ø) ⬆️
cmd/gateway-b2.go 0% <0%> (ø)
cmd/gateway-main.go 12.88% <0%> (-1.01%) ⬇️
cmd/gateway-unsupported.go 0% <0%> (ø) ⬆️
cmd/gateway-b2-anonymous.go 37.33% <37.33%> (ø)
cmd/api-headers.go 87.5% <50%> (-5.36%) ⬇️
cmd/fs-v1-rwpool.go 64.34% <0%> (-4.35%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d0dced...274ca16. Read the comment docs.

@harshavardhana
Copy link
Member Author

harshavardhana commented Oct 3, 2017

Consistent unrelated failure on appveyor ignore windows error for now.

go test -v -timeout 17m -coverprofile=c.txt -covermode=atomic github.com/minio/minio/cmd
go build github.com/minio/minio/cmd: c:\go18\pkg\tool\windows_amd64\compile.exe: fork/exec c:\go18\pkg\tool\windows_amd64\compile.exe: The filename or extension is too long.

Talking to appveyor support at the moment.

@harshavardhana
Copy link
Member Author

Looks like with this PR we are expanding beyond 32K limit on windows shell PATH_MAX - golang/go#18468

Work-around is i will try to merge the files reducing the overall length and lets see if it fixes.

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 2 times, most recently from 482a29f to 21c1275 Compare October 3, 2017 06:34
@harshavardhana
Copy link
Member Author

Looks like with this PR we are expanding beyond 32K limit on windows shell PATH_MAX - golang/go#18468

Work-around is i will try to merge the files reducing the overall length and lets see if it fixes.

Was able to fix it by merging some files.

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 2 times, most recently from dec5bb7 to ef3ff30 Compare October 3, 2017 07:31
@@ -134,8 +135,9 @@ func newS3Gateway(host string) (GatewayLayer, error) {
anonClient.SetCustomTransport(newCustomHTTPTransport())

return &s3Objects{
Client: client,
anonClient: anonClient,
unsupportedAPIs: unsupportedAPIs{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsupportedAPIs: unsupportedAPIs{}, can be left out

@@ -16,27 +16,30 @@

package cmd

type unsupportedAPIs struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be renamed to more specific gatewayUnsupported

}
// startPartNumber must be in the range 1 - 10000 for B2.
if partNumberMarker == 0 {
partNumberMarker = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will cause the list response to always skip the first part

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't @krishnasrinivas . Not in case of Backblaze.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startPartNumber includes 1 and later, it is not the same style as S3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case we need to do partNumberMarker++ always because:

krishna@escape:~/dev/minio-tile-2$ miniodebug multipart listparts --bucket krishna1980 --object testobject --uploadid 4_z1efea6cee041ea395aeb021f_f205a55e6b4215c82_d20171003_m202115_c001_v0001093_t0023 --partmarker 2
{
  "Bucket": "krishna1980",
  "Key": "testobject",
  "UploadID": "4_z1efea6cee041ea395aeb021f_f205a55e6b4215c82_d20171003_m202115_c001_v0001093_t0023",
  "Initiator": {
    "ID": "02d6176db174dc93cb1b899f7c6078f08654445fe8cf1b6ce98d8855f66bdbf4",
    "DisplayName": ""
  },
  "Owner": {
    "DisplayName": "",
    "ID": "02d6176db174dc93cb1b899f7c6078f08654445fe8cf1b6ce98d8855f66bdbf4"
  },
  "StorageClass": "STANDARD",
  "PartNumberMarker": 2,
  "NextPartNumberMarker": 0,
  "MaxParts": 1000,
  "IsTruncated": false,
  "ObjectParts": [
    {
      "PartNumber": 2,
      "LastModified": "0001-01-01T00:00:00Z",
      "ETag": "\"5da54579c335464b6ba52090bf57d063384bb1e9\"",
      "Size": 2461
    },
    {
      "PartNumber": 3,
      "LastModified": "0001-01-01T00:00:00Z",
      "ETag": "\"5da54579c335464b6ba52090bf57d063384bb1e9\"",
      "Size": 2461
    }
  ],
  "EncodingType": ""
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes you are right. @krishnasrinivas

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 2 times, most recently from fa10005 to 07211fb Compare October 3, 2017 23:02
Copy link
Member

@krisis krisis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are some initial comments.

return fmt.Sprintf("%s%d-%d", byteRangePrefix, offset, offset+size-1)
}

// AnonGetObject - Get object anonymously
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't say anything more than the name of the function. The comment could be AnonGetObject - Get object using a plain GET request. or something to that effect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

// Converts http Header into ObjectInfo. This function looks for all the
// standard Backblaze B2 headers to convert into ObjectInfo.
//
// Content-Length is converted to Size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could order the headers in the comment such that X-Bz-... appear together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are written in the order they are handled inside the code..

return objInfo, err
}

info := make(map[string]string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could call this userDefinedHdrs instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Bucket: bucket,
Name: object,
ContentType: header.Get("Content-Type"),
ModTime: time.Unix(timeStamp/1000, timeStamp%1000*1e6),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModTime can be idiomatically computed like time.Unix(0, 0).Add(timeStamp * time.Millisecond).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but why is your approach more idiomatic? @krisis .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is idiomatic since it uses standard Go types like time.Duration to work with milliseconds. To work with milliseconds available as string in header, we could do the following,

timeStamp, err := time.ParseDuration(header.Get("X-Bz-Upload-Timestamp"))
if err != nil {
}
modTime := time.Unix(0, 0).Add(timeStamp)

This way we use the appropriate Go types to work with time in milliseconds instead of dropping down to low-level types like int64, int.

return objInfo, err
}
defer resp.Body.Close()
if resp.StatusCode != 200 && resp.StatusCode != 206 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we expect partial content for a HEAD request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if you set the Range header.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we aren't setting range headers here. should we still check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need then

uploadID = params[2]
}

// Following code is a non-exhaustive list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Following code is a non-exhaustive list is out of place since there is no list of codes below.

// MakeBucket creates a new container on B2 backend.
func (l *b2Objects) MakeBucketWithLocation(bucket, location string) error {
info := make(map[string]string)
info["x-minio-bucket-createTime"] = UTCNow().Format(time.RFC1123)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"x-minio-bucket-createTime" should be a const.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

return b2ToObjectError(traceError(err), bucket)
}

// Bucket - bucket
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment can be expanded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

}

// Looks for x-minio-bucket-createTime as part of bucket metadata, if key
// is absent then returns current UTC time instead. Returns error if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When creation time is unavailable we should use a sentinel value like `time.Unix(0,0) to indicate that we couldn't determine the actual time of creation. Returning current time may be misleading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Unix(0,0) is not a good idea because of the tests that expect CreatTime to be always within some acceptable boundary of time. The reason to use currentTime here is to provide lastAccessedTime instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Unix(0,0) is not a good idea because of the tests that expect CreatTime to be always within some acceptable boundary of time

Which tests are you referring to? If it is unit tests we can (and should) change it as fit. This suggestion is to avoid any confusion to the user, who knows that the bucket was created much before the time we might show (i.e, last accessed time like you mention above).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am talking about mint tests.. Setting time.Unix(0, 0) will be more confusing to user IMO. I would prefer not make this change and keep it the way it is. We can even discuss about this but i would like to avoid epoch date.

For the most part users using buckets from minio gateway will not have any problems, B2 users aren't expecting this value anyways. So i choose to set this value to something more meaningful than Epoch time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of fact perhaps i can remove the x-minio-bucket-createTime and avoid such intelligence in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting time.Unix(0, 0) will be more confusing to user IMO. I would prefer not make this

The idea behind time.Unix(0,0) is that it is a constant and doesn't change with time like time.Now(). We could choose a different constant time.Time value if that's the only issue.

As a matter of fact perhaps i can remove the x-minio-bucket-createTime and avoid such intelligence in the first place.

what would InitiatedTime value for BucketInfo be?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing just time.Unix(0, 0) there is no such info available from B2 API so we don't return anything.

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 3 times, most recently from 3fb3748 to 9e1aeb1 Compare October 5, 2017 01:52
@harshavardhana
Copy link
Member Author

New change in this PR is regular PutObject() is doing multipart if the input stream size is bigger than MinPartSize . This is to ensure that we don't corrupt the namespace if there were errors during data transfer.

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 3 times, most recently from 51e141f to 6043296 Compare October 5, 2017 02:11
@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 2 times, most recently from ac1bf8e to 36994a0 Compare October 6, 2017 00:49
}
}

type hexEndReader struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be renamed to b2Hasher as hexEndReader would be a generic name

}

hr := newHexDigitsEndReader(data, data.Size())
sha1, err := fc.UploadPart(l.ctx, hr, "hex_digits_at_end", hr.Len(), partID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"hex_digits_at_end" should be a const

ContentType: header.Get("Content-Type"),
ModTime: time.Unix(0, 0).Add(time.Duration(timeStamp) * time.Millisecond),
Size: clen,
ETag: header.Get("X-Bz-Content-Sha1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like fileID is the right candidate for ETag, because the functionality of If-Match is supported by b2_download_file_by_id and b2_get_file_info both of which take fileID, there is no SHA1 support for these calls

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 7 times, most recently from 87524ff to 2672aea Compare October 10, 2017 16:44
// Content-Type is converted to ContentType.
// X-Bz-Content-Sha1 is converted to ETag.
func headerToObjectInfo(bucket, object string, header http.Header) (objInfo ObjectInfo, err error) {
clen, err := strconv.ParseInt(header.Get("Content-Length"), 10, 64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a comment mentioning what is going on here. E.g, // Converting upload timestamp in milliseconds to a time.Time value for ObjectInfo.ModTime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return objInfo, err
}

userMetadata := make(map[string]string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment explaining why were skipping headers that don't start with X-Bz-Info is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return objInfo, err
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we returning a new error value, i.e errors.New(resp.Status) while returning the error as is in AnonGetObject?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the error that you would return as is?

return b2ToObjectError(traceError(err), bucket)
}

// Bucket - bucket
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we name this function similar to other functions like GetBucket? Is this function required to be exported? Please add a comment too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Name: file.Name,
ModTime: file.Timestamp,
Size: file.Size,
ETag: file.Info.SHA1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't file.FileID be returned for ETag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Name: file.Name,
ModTime: file.Timestamp,
Size: file.Size,
ETag: file.Info.SHA1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't file.FileID be returned for ETag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

type B2Reader struct {
r *HashReader
size int64
hsh hash.Hash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can call it sha1Sum instead of hsh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if err != nil {
return b2ToObjectError(traceError(err), bucket, object)
}
io.Copy(ioutil.Discard, reader)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not Close reader without copying?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return lpi, nil
}

// AbortMultipartUpload aborts a ongoing multipart upload, uses B2's LargeFile upload API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/a ongoing/an ongoing/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
_, err = bkt.File(uploadID, object).CompileParts(0, hashes).FinishLargeFile(l.ctx)
if err != nil {
return oi, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we call b2ToObjectError(traceError(err), ...)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 4 times, most recently from f26bdf4 to 14e577c Compare October 11, 2017 19:03
"sync"
"time"

b2 "github.com/minio/blazer/base"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not using "github.com/minio/blazer/b2" client, which has reauthorization built into it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need low level API @krisis

@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 2 times, most recently from f1c3ac9 to cb814ad Compare October 13, 2017 00:30
@harshavardhana harshavardhana force-pushed the backblaze-b2 branch 2 times, most recently from d379d42 to de8231a Compare October 13, 2017 10:02
@nitisht nitisht merged commit 0c0d1e4 into minio:master Oct 13, 2017
@harshavardhana harshavardhana deleted the backblaze-b2 branch October 13, 2017 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server: [gateway] Support for Backblaze B2
4 participants