Custom Chunk Size for GetContentFile #49

Closed
TanukiAI opened this issue Aug 1, 2020 · 9 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@TanukiAI
Contributor

TanukiAI commented Aug 1, 2020

Would it be possible to add a custom chunk size for the GetContentFile method?
Currently it uses the default chunk size from googleapiclient. Maybe an override parameter?
Like this:

file = drive.CreateFile({'id': gid})
file.FetchMetadata()
file.GetContentFile(file["title"], chunksize=250*1024*1024)  # 250 MB instead of the default 100 MB

Thanks.

@shcheklein
Member

I think the default is 100 KB, not 100 MB.

@TA40 I think it should be very easy to add:

pretty much add it here:

https://github.com/iterative/PyDrive2/blob/master/pydrive2/files.py#L311

and pass it then to the downloader:

https://github.com/iterative/PyDrive2/blob/master/pydrive2/files.py#L331

unless I'm missing something.

Would be great to add a test as well for this.

Feel free to create a PR and we'll release a new version very quickly.

Btw, out of curiosity - do you expect better performance with a larger buffer size?
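As a sketch of the suggestion above: accept a `chunksize` argument in `GetContentFile` and forward it to the downloader. The `FakeDownloader` class below is a stand-in for googleapiclient's `MediaIoBaseDownload` so the example runs on its own; the function and parameter names are illustrative, not the actual PyDrive2 code.

```python
DEFAULT_CHUNK_SIZE = 100 * 1024 * 1024  # googleapiclient's default, 100 MB


class FakeDownloader:
    """Stand-in mimicking MediaIoBaseDownload's chunksize behavior."""

    def __init__(self, fh, data, chunksize=DEFAULT_CHUNK_SIZE):
        self.fh = fh
        self.data = data
        self.chunksize = chunksize
        self.pos = 0

    def next_chunk(self):
        """Write one chunk to fh; return (fraction_done, done_flag)."""
        chunk = self.data[self.pos:self.pos + self.chunksize]
        self.fh.write(chunk)
        self.pos += len(chunk)
        return self.pos / len(self.data), self.pos >= len(self.data)


def get_content_file(data, filename, chunksize=DEFAULT_CHUNK_SIZE):
    """Illustrative GetContentFile with a user-settable chunk size."""
    with open(filename, "wb") as fh:
        downloader = FakeDownloader(fh, data, chunksize=chunksize)
        done = False
        while not done:
            _, done = downloader.next_chunk()
```

The real change would just thread `chunksize` from `GetContentFile` into the `MediaIoBaseDownload(...)` call at the second link above.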

@shcheklein shcheklein added enhancement New feature or request good first issue Good for newcomers labels Aug 1, 2020
@TanukiAI
Contributor Author

TanukiAI commented Aug 1, 2020

No, the default is 100 MB, not KB.
I made a small callback progress-bar function, and it downloads the file in 100 MB chunks:
Callback progressbar function
You can also see the Python process taking 100 MB of RAM and then releasing it on every chunk cycle.

Unfortunately I never did a PR, but there is always a first time ^^

I think so, yes. From what I've seen, the more often you switch between writing and downloading, the longer it takes, because the download has to restart each time.
But I could also be wrong, and maybe it gets worse.
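For reference, a progress callback of the kind described receives googleapiclient's download status object, which exposes `progress()` (fraction complete) and `total_size`. The small stub below mimics that object so the formatting logic runs standalone; `format_progress` is an illustrative name, not part of the PyDrive2 API.

```python
class DownloadStatus:
    """Stand-in for googleapiclient's MediaDownloadProgress, which
    exposes progress() (fraction done) and total_size (in bytes)."""

    def __init__(self, downloaded, total_size):
        self.downloaded = downloaded
        self.total_size = total_size

    def progress(self):
        return self.downloaded / self.total_size


def format_progress(status):
    """Render a one-line progress message from a status object."""
    mb = 1024 * 1024
    return (f"downloaded {status.downloaded / mb:.0f}"
            f"/{status.total_size / mb:.0f} MB"
            f" ({status.progress() * 100:.1f}%)")


# e.g. after each next_chunk(): print(format_progress(status), end="\r")
```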

@shcheklein
Member

@TA40 hmm 🤔

Yes, you are right, it's 100 MB, sorry.

https://github.com/googleapis/google-api-python-client/blob/master/googleapiclient/http.py#L72
https://github.com/googleapis/google-api-python-client/blob/master/googleapiclient/http.py#L665
https://github.com/googleapis/google-api-python-client/blob/master/googleapiclient/http.py#L673

Because from what I've seen, the more you switch between writing and downloading, the more time you need because it needs to restart the download process.

That's probably true; 200 MB could give some improvement in this case. Depending on network speed and file-system performance, though, I would expect OS buffers to mitigate this to some extent. I would measure network throughput with an external tool and check for spikes.

Btw, curious what is your use case for this? E.g. if you need to download multiple files, you can also do it in parallel, which should also keep the network loaded.

@TanukiAI
Contributor Author

TanukiAI commented Aug 1, 2020

I think... I did it? #50
I'm not really sure.

Btw, curious what is your use case for this?

I need to download multiple files in order to re-upload them to another cloud service, and I want to do it as fast as possible.

E.g. if you need to download multiple files you can also do it in parallel that should also keep the network loaded.

I don't like downloading multiple files at once because it interferes with disk I/O and RAM usage, and I am already running multiple scripts at once.

@shcheklein
Member

Closed by #50. Thanks @TA40!

@KengoSawa2

Hello

Thank you for development and contributions of PyDrive2:)

I want to upload a huge zip file with PyDrive2.
When I looked into the API, GetContentFile() has a callback for progress.
But Upload() does not.

https://gsuitedevs.github.io/PyDrive/docs/build/html/pydrive.html#pydrive.files.GoogleDriveFile.Upload

Are you planning to add a callback to show the upload progress?
Progress reporting is a very important feature for me, as I plan to upload a single zip file close to 750 GB.

If the community isn't interested in implementing it, I'd consider implementing it myself. Please let me know your ideas and opinions.

Thank you

@shcheklein
Member

@KengoSawa2 that can be a good PR, happy to help with it! thanks 🙏

There is a workaround, though, that you could use for now. This is how we do it in DVC (the primary reason we forked and support PyDrive2 at all):

https://github.com/iterative/dvc/blob/5d1c1c418c75677379108980970c6546fba8fe18/dvc/tree/gdrive.py#L362-L373
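The idea behind the linked workaround (a hedged sketch of the pattern, not a copy of the DVC code) is to wrap the data being uploaded in a file-like object whose `read()` reports how many bytes have been consumed, and hand that wrapper to the uploader as the content. The class name and callback signature below are illustrative.

```python
import io


class ProgressTrackingBuffer(io.BytesIO):
    """File-like wrapper that reports bytes consumed on every read().
    Handing such a wrapper to the uploader lets you observe progress
    without changing the uploading library itself."""

    def __init__(self, data, callback):
        super().__init__(data)
        self.total = len(data)
        self.callback = callback  # called as callback(bytes_done, total)

    def read(self, size=-1):
        chunk = super().read(size)
        self.callback(self.tell(), self.total)
        return chunk
```

With PyDrive2, such a wrapper would be set as the file's content before calling `Upload()`; the exact wiring depends on the version, so treat this as a pattern rather than drop-in code.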

@shcheklein
Member

@KengoSawa2 also, consider creating a separate feature request issue for this.

@KengoSawa2

Thank you for your kind response.
I opened a new issue:

#54

I also understand your workaround.
I'll use it for now, so there's no need to rush to add the feature officially.

Thank you :)
