Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of Download function and helpers #864

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

pontus
Copy link
Contributor

@pontus pontus commented May 16, 2024

Try to make the make it so that fewer tracks/cases need to be kept in mind. Also implements data edit lists when needed to provide the correct range for encrypted downloads.

Closes #834 and closes #749.

Likely touches on some of #697 but not enough to close that yet.

Copy link
Contributor

@MalinAhlberg MalinAhlberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this refactoring looks really nice!
I have made some comments and questions, though. When testing with htsget, the last part of the file cannot be downloaded. The log says coordinate outside file size 2596799. The file size of the complete encrypted file is 2598043 and the requested end position was 2598042 (I assume it has to do with the use of fileDetails.DecryptedSize).

// but ignore that here.

c.Header("Server-Additional-Bytes", fmt.Sprint(len(fileDetails.Header)))
c.Header("Client-Additional-Bytes", fmt.Sprint(len(fileDetails.Header)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misinterpreting things here, but the spec says that Client-Additional-Bytes should be "the size of the Crypt4GH header that the client will receive when it queries the backend". Therefore, I think it makes sense to send the size of the header that is encrypted with reencKey, as it was before. Server-Additional-Bytes should be "the size of the Crypt4GH header that htsget-rs will receive from the backend". Probably almost always the same, but if we hard code them to have the same value, we might as well leave Client-Additional-Bytes out.

c.Header("Content-Length", fmt.Sprint(int(headerSize)+fileDetails.ArchiveSize))
c.Header("Content-Length", fmt.Sprint(len(fileDetails.Header)+fileDetails.ArchiveSize))
} else {
c.Header("Content-Length", fmt.Sprint(len(fileDetails.Header)+fileDetails.DecryptedSize))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the size set at (new) line 276 and 279? Do we really need to reset this header here, for the unencrypted case?

rangeParts := strings.Split(rangeSpec, "-")

// Prefill with size from header
coord := coordinates{end: int64(fileDetails.DecryptedSize)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not know if it's the decrypted or encrypted file size we're aiming for

return
}

if coord.start >= int64(fileDetails.DecryptedSize) || coord.end > int64(fileDetails.DecryptedSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the range is for the encrypted case?

}

if coord.start >= int64(fileDetails.DecryptedSize) || coord.end > int64(fileDetails.DecryptedSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if encrypted?

alignedEnd = coords[0].end - (coords[0].end % 65536) + 65536
}

if coords[0].end != int64(fileDetails.DecryptedSize) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the end position should be for the encrypted file

// This includes transforming the coordinates from the virtual decrypted stream
// to the in file encrypted stream (which includes MACs and other overheads)
// returns a dataeditlist and any possible error
var adjustCoordinates = func(coords []coordinates, fileDetails *database.FileDownload) (del []uint64, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the coordinates should only be adjusted if they are read from the params. If they are read from the Range: bytes=... header, the should be left as is. Otherwise the ranges from htsget will not work correctly. For example, htsget usually specifies a request for Range: bytes=16-123. A request for this range should give a file with only contain these 108 bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor code for htsget endpoints Refactor startt&end coordination parsing and calculations
2 participants