feat: Batched Uploads and Green Threads #14
Breaks localtaskqueue. We can just have slow green threads instead.
I can test this change along with my next test run once the tests are passing.
Splitting the features into separate PRs would help me speed up review and testing.
taskqueue/aws_queue_api.py
Outdated
try:
    iter(obj)
    if type(obj) is dict:
        return [ obj ]
list(obj) does what you want; [obj] just puts the dictionary itself in a list.
That's actually what I wanted for this case.
Got it. If it's not a "general purpose" function, you could try to hide it a bit from the user, maybe by naming it _to_iterable. Not crucial, though.
@@ -6,6 +6,17 @@

from .secrets import aws_credentials


def toiter(obj):
Would change the name to to_iterable - you are not returning an iterator.
Now I am!
taskqueue/aws_queue_api.py
Outdated
AWS_BATCH_SIZE = 10

resps = []
for i in range(0, len(tasks), AWS_BATCH_SIZE):
Iterables are not guaranteed to have len (generators, for example).
This is true and a headache. I should call it "tolist".
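One way around the missing `len` is to batch with `itertools.islice`, which works on any iterable. The sketch below is an illustration only; the `batches` helper is a hypothetical name and is not taken from the PR.

```python
from itertools import islice

AWS_BATCH_SIZE = 10  # same constant as in the diff excerpt


def batches(tasks, size=AWS_BATCH_SIZE):
    """Yield lists of up to `size` items from any iterable.

    Never calls len(), so generators are accepted as well as lists.
    """
    it = iter(tasks)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# A generator has no len(), but it can still be batched.
task_gen = ({"id": i} for i in range(25))
batch_sizes = [len(b) for b in batches(task_gen)]
```

With 25 tasks and a batch size of 10, this produces two full batches and one partial batch.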
def check_monkey_patch_status(self):
    import gevent.monkey
    if not gevent.monkey.is_module_patched("socket"):
        print(yellow("""
yellow is not defined
fixed.
e4c68fb to b61a1e9
@nkemnitz the latest changes allow generators to be used instead of lists. Generators are necessary for two reasons when the number of tasks being generated becomes large. First, they allow uploads to begin almost immediately. Previously, although batching was an option, people would typically build lists of hundreds of thousands or millions of elements before submitting them for upload. Building those lists took many seconds, adding substantial latency. Second, lists of millions of items would often raise MemoryErrors, requiring more complex batching logic. Now, as long as the user structures their submission logic as a generator, both issues go away.
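The latency point can be illustrated with a small sketch. The `make_tasks` generator and the `created` bookkeeping list are hypothetical and exist only to show that the first batch is ready before the remaining tasks have been built.

```python
from itertools import islice

created = []  # records which tasks have been built so far


def make_tasks(n):
    """Hypothetical task generator: builds one task at a time."""
    for i in range(n):
        created.append(i)
        yield {"id": i}


tasks = make_tasks(1_000_000)

# Take the first batch of 10, as an uploader would.
first_batch = list(islice(tasks, 10))

# Only the tasks needed for the first batch exist at this point.
built_so_far = len(created)
```

Because the generator is consumed lazily, only ten tasks have been constructed when the first batch is ready, and the full million-element list never has to exist in memory at once.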
@@ -6,7 +6,7 @@
import pytest

import taskqueue
-from taskqueue import RegisteredTask, TaskQueue, MockTask, PrintTask, LocalTaskQueue, MockTaskQueue
+from taskqueue import RegisteredTask, GreenTaskQueue, TaskQueue, MockTask, PrintTask, LocalTaskQueue, MockTaskQueue
No monkey patching here, so GreenTaskQueue will simply behave as a single-threaded TaskQueue. Monkey patching was disrupting LocalTaskQueue.
a840860 to 8d2482e
8d2482e to f450e39
df58d58 to eb688c7
eb688c7 to 3d752d8
8384f8a to e06192c
e06192c to 4adf1f0
This contains a few features.
Adds delay_seconds to insert and insert_all, which lets you delay a message's visibility in the queue.
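As a rough sketch of how a per-message delay might be expressed, assuming the queue is backed by Amazon SQS (suggested by the aws_queue_api.py file name, but not confirmed here): SQS accepts a `DelaySeconds` field between 0 and 900 on each entry passed to boto3's `send_message_batch`. The `build_entries` helper below is hypothetical and only builds the payload; it does not call AWS.

```python
import json

MAX_DELAY_SECONDS = 900  # SQS upper bound for a per-message delay (15 minutes)


def build_entries(tasks, delay_seconds=0):
    """Build an `Entries` list in the shape SQS batch sends expect.

    Hypothetical helper, not the PR's implementation.
    """
    if not 0 <= delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError("delay_seconds must be between 0 and 900")
    return [
        {
            "Id": str(i),                     # unique within the batch
            "MessageBody": json.dumps(task),  # serialized task payload
            "DelaySeconds": delay_seconds,    # hidden from consumers until elapsed
        }
        for i, task in enumerate(tasks)
    ]


entries = build_entries([{"id": 1}, {"id": 2}], delay_seconds=30)
```

Under this assumption, each message would stay invisible to consumers for 30 seconds after it is sent.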