Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix start #113

Merged
merged 4 commits into from
Jun 17, 2019
Merged

Fix start #113

merged 4 commits into from
Jun 17, 2019

Conversation

manycoding
Copy link
Contributor

@manycoding manycoding commented Jun 13, 2019

  1. Everything is the same for jobs
  2. For collections, start works

On a low-level I separated start (start item key) from start_index
For jobs, I continue to treat them as the same since that what I have seen so far (e.g. "12312/22/33/1000" is 1000th item from the job)

@codecov
Copy link

codecov bot commented Jun 13, 2019

Codecov Report

Merging #113 into 0.3.6dev will increase coverage by 0.07%.
The diff coverage is 82.6%.

Impacted file tree graph

@@             Coverage Diff              @@
##           0.3.6dev     #113      +/-   ##
============================================
+ Coverage     79.09%   79.17%   +0.07%     
============================================
  Files            22       22              
  Lines          1598     1604       +6     
  Branches        277      276       -1     
============================================
+ Hits           1264     1270       +6     
  Misses          289      289              
  Partials         45       45
Impacted Files Coverage Δ
src/arche/arche.py 84.89% <100%> (-0.22%) ⬇️
src/arche/readers/items.py 85.82% <100%> (+0.58%) ⬆️
src/arche/tools/schema.py 87.95% <57.14%> (+1.04%) ⬆️
src/arche/tools/api.py 64.21% <83.33%> (+0.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5543e3...aaccb8a. Read the comment docs.

Copy link

@ejulio ejulio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor comments/ideas :)

@@ -139,10 +139,12 @@ def get_items_with_pool(
processes_count = min(max(helpers.cpus_count(), workers), active_connections_limit)
batch_size = math.ceil(count / processes_count)

start_idxs = [i for i in range(start_index, start_index + count, batch_size)]
Copy link

@ejulio ejulio Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list(range(start_index, start_index + count, batch_size)).
But if does not need to be a list, then just range

item_numbers.sort()
if item_numbers[-1] >= items_count or item_numbers[0] < 0:
raise ValueError(item_n_err.format(item_numbers[-1], items_count - 1))
if items_numbers:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if max(items_numbers) > items_count or min(items_numbers) < 0
might read better

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other would be
if not all(0 < v < items_count for v in items_numbers):

# Scrapinghub API returns all posible items even if `count` greater than possible
if start + count > len(self.items):
limit = len(self.items)
if kwargs.get("filter"):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rewrite to only one for statement.

filter = lambda x: True
if kwargs.get('filter'):
    filter = lambda x: x.get(field_name) == value

for item in self.items...
    if counter == count: return
    elif filter(item):...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this but then got this warning about assigning lambdas- https://www.python.org/dev/peps/pep-0008/#programming-recommendations

So I rewrote it as a function.

start = kwargs.get("start", None)
if start:
start_idx = [
i for i, item in enumerate(self.items) if item.get("_key") == start
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rewrite it as generator to avoid evaluating all items (not sure if it would be a good performance improvement)
`start_idx = next(i for i, item in enumerate(self.items) if ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to overthink it :)

counter = 0
for index in range(start, limit):
if counter == limit:
for item in self.items[start_idx:]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one for loop as above

@manycoding manycoding merged commit fc5cf41 into 0.3.6dev Jun 17, 2019
@manycoding manycoding deleted the fix_start branch June 17, 2019 18:37
manycoding added a commit that referenced this pull request Jun 18, 2019
@manycoding
Copy link
Contributor Author

manycoding commented Jun 18, 2019

@ejulio I found a bug, fixed here ddc369a#diff-83d30786a51d86a819c6a40df1ba1fc3R112

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants