-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix start #113
Fix start #113
Conversation
Codecov Report
@@ Coverage Diff @@
## 0.3.6dev #113 +/- ##
============================================
+ Coverage 79.09% 79.17% +0.07%
============================================
Files 22 22
Lines 1598 1604 +6
Branches 277 276 -1
============================================
+ Hits 1264 1270 +6
Misses 289 289
Partials 45 45
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor comments/ideas :)
src/arche/tools/api.py
Outdated
@@ -139,10 +139,12 @@ def get_items_with_pool( | |||
processes_count = min(max(helpers.cpus_count(), workers), active_connections_limit) | |||
batch_size = math.ceil(count / processes_count) | |||
|
|||
start_idxs = [i for i in range(start_index, start_index + count, batch_size)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list(range(start_index, start_index + count, batch_size))
.
But if does not need to be a list, then just range
src/arche/tools/schema.py
Outdated
item_numbers.sort() | ||
if item_numbers[-1] >= items_count or item_numbers[0] < 0: | ||
raise ValueError(item_n_err.format(item_numbers[-1], items_count - 1)) | ||
if items_numbers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if max(items_numbers) > items_count or min(items_numbers) < 0
might read better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other would be
if not all(0 < v < items_count for v in items_numbers):
tests/conftest.py
Outdated
# Scrapinghub API returns all posible items even if `count` greater than possible | ||
if start + count > len(self.items): | ||
limit = len(self.items) | ||
if kwargs.get("filter"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rewrite to only one for
statement.
filter = lambda x: True
if kwargs.get('filter'):
filter = lambda x: x.get(field_name) == value
for item in self.items...
if counter == count: return
elif filter(item):...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this but then got this warning about assigning lambdas- https://www.python.org/dev/peps/pep-0008/#programming-recommendations
So I rewrote it as a function.
tests/conftest.py
Outdated
start = kwargs.get("start", None) | ||
if start: | ||
start_idx = [ | ||
i for i, item in enumerate(self.items) if item.get("_key") == start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rewrite it as generator to avoid evaluating all items (not sure if it would be a good performance improvement)
`start_idx = next(i for i, item in enumerate(self.items) if ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to overthink it :)
tests/conftest.py
Outdated
counter = 0 | ||
for index in range(start, limit): | ||
if counter == limit: | ||
for item in self.items[start_idx:]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one for
loop as above
@ejulio I found a bug, fixed here ddc369a#diff-83d30786a51d86a819c6a40df1ba1fc3R112 |
start
worksOn a low-level I separated
start
(start item key) fromstart_index
For jobs, I continue to treat them as the same since that what I have seen so far (e.g. "12312/22/33/1000" is 1000th item from the job)