This repository has been archived by the owner on Jul 24, 2021. It is now read-only.

Support the new BITS API #34

Closed
bobobo1618 opened this issue Nov 13, 2014 · 23 comments

Comments

@bobobo1618

This came from one of the OneDrive devs in a response on Stack Overflow: https://gist.github.com/rgregg/37ba8929768a62131e85

@mk-fg
Owner

mk-fg commented Nov 14, 2014

To clarify:

> BITS is a simple extension to HTTP that enables chunked file uploads to OneDrive.

This actually does not mean Transfer-Encoding: chunked, which is already supported and works with OneDrive, but rather uploading large files over several TCP/HTTP connections, with each request sending only one byte range of the original file.

Wish they didn't call it "chunked" in HTTP context like that, as it already means a different thing there.

@kamudadreieck

Does that mean we can upload files >100MB finally?

The normal API doesn't accept bigger files: the upload fails after a few seconds at 8 MB/s. :(

onedrive.api_v5.ProtocolError: (None, "('Connection aborted.', error(104, 'Connection reset by peer'))")

@bobobo1618
Author

That seems to be the idea of BITS, yes.

@kamudadreieck

It would be so awesome if it were implemented in python-onedrive. :-)

@Lyrrad

Lyrrad commented Nov 20, 2014

Yes, the new API appears to work to upload files up to the 10GB file size limit. I've been experimenting with it. So far, the largest file that I've uploaded through that API has been 2GB. I'm going to continue to experiment with it. My main issue has been getting errors that require me to start uploading again from the beginning. I assume it wouldn't be too difficult for someone to add this feature to python-onedrive.

@mk-fg
Owner

mk-fg commented Nov 20, 2014

> My main issue has been getting errors that require me to start uploading again from the beginning.

The whole file, you mean, not just one of the chunks?

Because if not, I imagine you can easily split a multi-GB file into 50 KiB chunks with no significant overhead, and re-uploading these shouldn't be a problem.
Though I haven't read the doc closely enough to figure out whether there are limits on chunk size/count.

@mk-fg
Owner

mk-fg commented Nov 20, 2014

> Does that mean we can upload files >100MB finally?

#16 has a related question, and indeed, that seems to be allowed via such APIs.

@mk-fg
Owner

mk-fg commented Nov 20, 2014

> I assume it wouldn't be too difficult for someone to add this feature to python-onedrive.

Yeah, a simple implementation can probably be one method with parameters like "chunk_size", "retries" and "timeout" that'd read/upload these chunks from a source file sequentially.

There can also be an implementation that'd store upload state in a persistent config file, allowing for e.g. upload resuming after app restart, plus exposing that "half-uploaded" state in the python api somehow.
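A minimal sketch of the "one method" idea above, under stated assumptions: `upload_sequentially` and the `send_fragment` callable are hypothetical names, not part of python-onedrive, and the callable stands in for whatever HTTP request actually sends a fragment (it is expected to raise IOError on a failed attempt).

```python
import time


def upload_sequentially(src, send_fragment, chunk_size=1 << 20,
                        retries=3, timeout=30.0, delay=0.0):
    """Read `src` in chunk_size pieces and hand each one to send_fragment.

    send_fragment(offset, data, timeout) is a hypothetical callable that
    performs the actual request and raises IOError if the attempt fails.
    Returns the total number of bytes sent.
    """
    offset = 0
    while True:
        data = src.read(chunk_size)
        if not data:
            return offset  # end of file reached
        for attempt in range(retries + 1):
            try:
                send_fragment(offset, data, timeout)
                break  # fragment accepted, move on to the next one
            except IOError:
                if attempt == retries:
                    raise  # out of retries for this fragment
                time.sleep(delay)
        offset += len(data)
```

Only the failed fragment is re-sent here; whether the service accepts such a re-send without restarting the whole session is something this sketch does not establish.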

@Lyrrad

Lyrrad commented Nov 20, 2014

It sometimes thinks that a fragment was uploaded out of order or that there has been some overlap with a previously uploaded fragment, even though previous fragments were uploaded successfully. Some fragment errors require that the entire upload be restarted. I think I also get some other weird errors sometimes. I'm not sure if my account is subject to upload limits that are causing these errors. I'm not too experienced in writing this sort of program, though I'll probably tinker with my experimental upload code (unrelated to python-onedrive) some more over the next few days.

I'm unsure what the optimal chunk size should be. The stated max is 60 MB, and I've successfully tried sizes anywhere from a few KB to 30 MB or so. I suppose it may make sense to dynamically adjust based on network performance so that each chunk takes approximately the same amount of time to send.

@mk-fg
Owner

mk-fg commented Nov 20, 2014

> I suppose it may make sense to dynamically adjust based on network performance so that each chunk takes approximately the same amount of time to send.

I imagine one can just grab some TCP window scaling algorithm verbatim and apply here, treating any failure as a "lost packet" ;)
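One concrete way to read that suggestion is AIMD (additive increase, multiplicative decrease), the rule used by TCP congestion control rather than window scaling proper. The sketch below is only an illustration: the `FragmentSizer` class and all its default numbers are assumptions, except the 60 MB ceiling, which is the stated maximum mentioned earlier in this thread.

```python
class FragmentSizer(object):
    """AIMD-style fragment size: grow a little on success, halve on failure."""

    def __init__(self, start=1 << 20, minimum=64 << 10,
                 maximum=60 << 20, step=256 << 10):
        self.size = start
        self.minimum = minimum
        self.maximum = maximum  # 60 MB: the stated max from the thread
        self.step = step

    def on_success(self):
        # Additive increase, capped at the maximum fragment size.
        self.size = min(self.maximum, self.size + self.step)
        return self.size

    def on_failure(self):
        # Multiplicative decrease: treat any failure like a lost packet.
        self.size = max(self.minimum, self.size // 2)
        return self.size
```

An upload loop would call `on_success()` or `on_failure()` after each fragment and use the returned value as the size of the next read.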

@rgregg

rgregg commented Nov 20, 2014

It'd be great to see support for this implemented in the python library. @mk-fg I've updated the documentation to stop using the term "chunked" or "chunk" to avoid confusion. Thanks for the feedback.

@mk-fg
Owner

mk-fg commented Nov 22, 2014

Added initial (simple) support for the thing in 7943435, but didn't get it to work so far:

  • For folder-id upload urls, the API seems to flat-out give http-404:

    INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): cid-a3a6XXX.users.storage.live.com
    DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3a6XXX/items/folder.a3a689XXX!112/README.md HTTP/1.1" 404 0
    
  • For "Transfer-Encoding: chunked" uploads to folder-path urls, getting http-400.

    Most likely due to missing Content-Length, as it's explicitly mentioned in the gist.
    Will probably fix it eventually, as soon as I figure out why all fixed-length stream-body uploads seem to hang with OneDrive APIs (related: [Errno 104] Connection reset by peer - Sometimes #30), probably something in the "OneDriveHTTPClient.request" wrapper func...

    Unlike other OneDrive APIs, error doesn't seem to have a body with json-encoded clarification of what exactly went wrong.

  • Kinda unclear whether overwrite/downsize flags are supported in the new API, but that may be mentioned (same as other gray areas) in the BITS Upload Protocol doc on msdn.

@Lyrrad

Lyrrad commented Nov 22, 2014

Here's what I did to get it to sort of work:

For Create Session packet:

import requests

# Create-Session request: opens a BITS upload session for the target file
headers = {'X-Http-Method-Override': 'BITS_POST',
    'Authorization': 'Bearer ' + od_access_token,
    'BITS-Packet-Type': 'Create-Session',
    'BITS-Supported-Protocols': '{7df0354d-249b-430f-820d-3d2a9bef4931}'}
# hardcoded CID and test upload folder
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
    headers=headers)

To get session ID from Create Session response:

session_id = r.headers['bits-session-id']

For fragment:

# x is the zero-based fragment index, data holds the bytes of that fragment
headers = {'X-Http-Method-Override': 'BITS_POST',
    'Authorization': 'Bearer ' + od_access_token,
    'BITS-Packet-Type': 'Fragment',
    'BITS-Session-Id': session_id,
    'Content-Length': str(len(data)),  # header values must be strings
    # end offset is inclusive; len(data) keeps a short last fragment in range
    'Content-Range': 'bytes ' + str(chunkSize*x) + '-' + str(chunkSize*x + len(data) - 1) + '/' + str(totalSize),
}
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
    headers=headers, data=data)

For close session:

# Close-Session request: commits the upload, carries no body
headers = {'X-Http-Method-Override': 'BITS_POST',
    'Authorization': 'Bearer ' + od_access_token,
    'BITS-Packet-Type': 'Close-Session',
    'BITS-Session-Id': session_id,
    'Content-Length': '0'}
r = requests.post('https://cid-XXXXXXXXXXX.users.storage.live.com/users/0xXXXXXXXXXXXX/LiveFolders/Test_Upload/' + filename,
    headers=headers)

I occasionally receive some 416 (FragmentOutOfOrder or FragmentOverlap) or 503 (ServiceNotAvailable) errors, and I am unable to resume the file. I haven't had a chance to look into those any closer.

@mk-fg
Owner

mk-fg commented Nov 22, 2014

Thanks.

I've been able to spot at least an off-by-one error in the Content-Range of my implementation (and the source gist, it seems).
Also, I think you don't need to pass Content-Length headers explicitly like that, as requests will calculate and add them automatically from the passed data.
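For reference, the end offset in an HTTP Content-Range value is inclusive, which is where an off-by-one like this tends to creep in. A small sketch of the arithmetic (the `content_range` helper name is hypothetical, not something from python-onedrive or the gist):

```python
def content_range(index, frag_size, total_size):
    """Build a Content-Range value for the zero-based fragment `index`."""
    start = index * frag_size
    if not 0 <= start < total_size:
        raise ValueError('fragment index out of range')
    # Inclusive end offset, clipped so the last fragment may be shorter.
    end = min(start + frag_size, total_size) - 1
    return 'bytes {}-{}/{}'.format(start, end, total_size)
```

For a 10-byte file sent in 4-byte fragments, the ranges come out as 0-3, 4-7 and 8-9, with the total after the slash being the byte count (10), not the last offset.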

I wonder, have you tried using folder-id URLs (as the rest of the API does) instead of LiveFolders?

@mk-fg
Owner

mk-fg commented Nov 23, 2014

The http-400 error mentioned above for folder-path uploads was due to that off-by-one error (which also seems to be present in the documentation example); thanks to @Lyrrad for helping me spot that.

Uploads via API seem to be working now, in general:

% ls -lah image.jpg
-rw-r--r-- 1 fraggod fraggod 5.3M Nov 23 06:08 image.jpg
% ./onedrive-cli --debug put -b --bits-frag-bytes 512000 image.jpg Pics
DEBUG:onedrive.api_v5:Using "requests" module version: '2.3.0'
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): apis.live.net
DEBUG:requests.packages.urllib3.connectionpool:"GET /v5.0/me?access_token=EwCAAq1D... HTTP/1.1" 200 100
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): cid-a3aXXX.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3aXXX/LiveFolders/Pics/image.jpg HTTP/1.1" 201 0
DEBUG:onedrive.api_v5:Uploading BITS fragment 1 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 2 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 3 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 4 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 5 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 6 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 7 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 8 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 9 / 11 (max-size: 0.49 MiB)
DEBUG:onedrive.api_v5:Uploading BITS fragment 10 / 11 (max-size: 0.49 MiB)
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-a3aXXX.users.storage.live.com
DEBUG:onedrive.api_v5:Uploading BITS fragment 11 / 11 (max-size: 0.49 MiB)
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: cid-a3aXXX.users.storage.live.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /users/0xa3aXXX/LiveFolders/Pics/image.jpg HTTP/1.1" 200 0
DEBUG:root:Call result:
--------------------
...(lots of metadata)...
--------------------

(will be silent without --debug)

API-wise, "put" method now has "bits_api_fallback" option:

put(path_or_tuple, folder_id='me/skydrive', overwrite=None, downsize=None, bits_api_fallback=True)

...which can be True/False or a max (non-BITS) file size (default: 95 MiB).
And there's also "put_bits" method.
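The fallback rule described above could look roughly like this. It is a sketch of the stated semantics only, not the library's actual code: `should_use_bits` is a hypothetical helper, and treating the threshold as "strictly larger than" is an assumption.

```python
DEFAULT_BITS_THRESHOLD = 95 * 2 ** 20  # 95 MiB, the default quoted above


def should_use_bits(file_size, bits_api_fallback=True):
    """Decide whether an upload of file_size bytes should go through BITS."""
    # bool is a subclass of int in Python, so test True/False by identity
    # before treating the value as a byte count.
    if bits_api_fallback is False:
        return False
    if bits_api_fallback is True:
        threshold = DEFAULT_BITS_THRESHOLD
    else:
        threshold = int(bits_api_fallback)
    return file_size > threshold  # assumption: strictly larger goes to BITS
```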

Limitations/hacks:

  • overwrite/downsize flags, supported by the regular POST/PUT API requests, are not documented for BITS, so "put_bits" does not have these.

    Files uploaded via BITS seem to overwrite same-name ones, so "put" will raise an exception when falling back to BITS with overwrite set to False. A passed "downsize" option will issue a warning on such fallback.
    Tried passing "overwrite=false" in query of a BITS session creation request, didn't work.

  • Couldn't get BITS uploads to folder-ids to work, suspecting that these might not be implemented yet.

    A simple workaround is in place: it "resolves" folder_id (if passed instead of path) to folder_path via several (== depth) "info" calls.

  • Some documented headers (BITS-Session-Id, BITS-Received-Content-Range) aren't actually returned in the last response.

  • The file-id returned by the BITS API has a different format than any "id" returned by other APIs (which look like file.{user_id}.{file_id}), so it gets converted, unless "raw_id=True" gets passed to "put_bits".

  • Given that it's a "simple" implementation, there's no way to resume BITS uploads after e.g. app restart atm.

    Kinda easy to add that by chopping "put_bits" into smaller pieces and adding some "bits upload session" (class or generator) concept. Will maybe implement later.
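As a rough illustration of the "bits upload session" idea in the last item, the state that would need to survive a restart is small: the session id and the next byte offset. The `UploadState` class below is hypothetical, and whether the service still honors a session id after a restart is not something this thread establishes.

```python
import json
import os


class UploadState(object):
    """Persist (session_id, next_offset) in a small JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return (session_id, offset), or None if nothing usable is stored."""
        try:
            with open(self.path) as src:
                state = json.load(src)
            return state['session_id'], state['offset']
        except (IOError, ValueError, KeyError):
            return None

    def save(self, session_id, offset):
        # Write to a temp file first, then rename over the old state, so a
        # crash mid-write does not leave a truncated state file behind.
        tmp = self.path + '.tmp'
        with open(tmp, 'w') as dst:
            json.dump({'session_id': session_id, 'offset': offset}, dst)
        os.replace(tmp, self.path)

    def clear(self):
        try:
            os.remove(self.path)
        except OSError:
            pass
```

An upload loop would call `save()` after each accepted fragment and `clear()` once the session is closed.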

Thanks again to everyone for the feedback.

@mk-fg mk-fg closed this as completed Nov 23, 2014
@mk-fg
Owner

mk-fg commented Nov 23, 2014

Note on the output in previous msg: no idea about these warnings from "requests" - it doesn't even open new connections there (reusing same one for all BITS requests).

@KarmaPoliceT2

Just wanted to cross post this here as you may not have seen it yet, not sure if it helps:

To reference a folder by id you'll actually want:

/Items/{folder-id}

Where the {folder-id} of "folder.a5858c9cb698b77b.A5858C9CB698B77B!24220" is "A5858C9CB698B77B!24220"

Posted here: https://gist.github.com/rgregg/37ba8929768a62131e85
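Going by the single example above, the id conversion amounts to dropping the "folder.{user_id}." prefix. A sketch with a hypothetical helper name (`bits_folder_id`), assuming every folder id has that three-part form:

```python
def bits_folder_id(folder_id):
    """Strip the 'folder.{user_id}.' prefix from a regular API folder id."""
    # Unpacking raises ValueError if the id has fewer than three parts.
    prefix, _user_id, item_id = folder_id.split('.', 2)
    if prefix != 'folder':
        raise ValueError('not a folder id: {!r}'.format(folder_id))
    return item_id
```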

@mk-fg
Owner

mk-fg commented Dec 12, 2014

Oh, nice, haven't seen it. Thanks, I'll try this out.

Not sure though if it means that there should be no "/users/0x{id}/" at the start of the uri, or that there should be no filename after the {folder-id} (but where else you'd specify it then?), or that "Items" must have that capital "I", or some combination of these, but should be easy to try stuff out, see if something might work.

@mk-fg
Owner

mk-fg commented Dec 15, 2014

That means DO NOT INCLUDE /Users/{Id}.

Heheh ;)

Just updated the OneDriveAPIWrapper.api_bits_url_by_id and added folder-id mangling as suggested by @ificator in the gist comments, and uploads by folder-id seem to work now. Yay!

@ajcsoftware

I have the chunked uploading working fine but I have one problem. I can't upload a zero length file. If I do just a start and a close call I get a "(416) requested range not satisfiable" error and if I try a start, and a zero length fragment I get "(400) Bad Request.".

@mk-fg
Owner

mk-fg commented Jan 15, 2015

@ajcsoftware

Yeah, I see the same behavior as you described here as well.
python-onedrive does not try to do a "Range: bytes=0-0" chunk upload request and gets a 416 error when trying to commit the upload session.

It's an obvious workaround of course, but for completeness' sake I want to note that you can upload zero-length files via normal (non-BITS) PUT/POST requests just fine, and python-onedrive kinda does that automatically if you use "put" (api or cli) with bits_api_fallback threshold greater than 0.

Don't think raising some special exception for zero-length files in the "put_bits" method (of python-onedrive) is worth it, as it seems to be the API's place to give a proper error in such cases, if they aren't supported.

Also, as you seem to be talking about API issue in general (re-posting question from the gist), and not about how python-onedrive handles things, it might be worth mentioning here that this module is in no way "official" or affiliated with the service itself, so I can't really fix things in the API and have no influence (that I know of) over how/when Microsoft fixes these.

@ajcsoftware

Yes, I now upload zero-length files using the normal method, but even that throws an exception saying the request was cancelled. I ignore the error because the file has actually appeared on OneDrive.

Interestingly it looks like they don't actually support zero length files (which is crazy) because if you go to the OneDrive official web UI and try to upload one manually it says you can't.

Yes, I have mentioned this problem elsewhere, but I thought you guys might be interested or come across the same problem. There is not much coverage of this on the net. Working with OneDrive for Business is even worse! (Another problem, by the way, is that you can't create a folder starting with a period/dot even though you can through the web UI.)

@mk-fg
Owner

mk-fg commented Jan 16, 2015

> Yes I now upload zero length files using the normal method but even that throws an exception saying the request was cancelled

Seems to work fine for me, at least with uploads via PUT requests now, i.e. http 200 status, file gets uploaded, metadata on it shows "size: 0", so I guess you might be doing it somewhat differently.

I think the guy in the gist has a point that Stack Exchange sites might be a way better place for such coverage and general questions than some random project's github issues comment thread.

As there are a few dozen unrelated comments on this page already, and there might be a hundred more, there's little to no hope anyone will find anything here (unless they're really desperate), while on e.g. Stack Overflow you'll get some relevant thing floating right on top of the first link in google, as I'm sure you're well aware.
