Fix start #113

manycoding · 2019-06-13T20:12:18Z

Everything stays the same for jobs.
For collections, start now works.

At a low level, I separated start (the start item key) from start_index.
For jobs, I continue to treat them as the same, since that is what I have seen so far (e.g. "12312/22/33/1000" is the 1000th item from the job).
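
The relationship between a job item key and its index can be sketched as below. This is only an illustration: it assumes, as the comment above describes, that the last part of a job item key is the item's position, and `key_to_index` is a hypothetical helper, not a function from arche.

```python
def key_to_index(item_key: str) -> int:
    """Return the trailing index of a job item key like '12312/22/33/1000'.

    Assumption: the last '/'-separated part of a job item key is the item index.
    Collection keys are arbitrary strings, so this mapping does not apply to them.
    """
    return int(item_key.rsplit("/", 1)[-1])


print(key_to_index("12312/22/33/1000"))
```

For collections no such mapping is assumed, which is why the start key and start_index have to be handled separately.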

codecov · 2019-06-13T20:15:04Z

Codecov Report

Merging #113 into 0.3.6dev will increase coverage by 0.07%.
The diff coverage is 82.6%.

@@             Coverage Diff              @@
##           0.3.6dev     #113      +/-   ##
============================================
+ Coverage     79.09%   79.17%   +0.07%     
============================================
  Files            22       22              
  Lines          1598     1604       +6     
  Branches        277      276       -1     
============================================
+ Hits           1264     1270       +6     
  Misses          289      289              
  Partials         45       45

Impacted Files	Coverage Δ
src/arche/arche.py	`84.89% <100%> (-0.22%)`	⬇️
src/arche/readers/items.py	`85.82% <100%> (+0.58%)`	⬆️
src/arche/tools/schema.py	`87.95% <57.14%> (+1.04%)`	⬆️
src/arche/tools/api.py	`64.21% <83.33%> (+0.47%)`	⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5543e3...aaccb8a. Read the comment docs.

ejulio

Just minor comments/ideas :)

ejulio · 2019-06-17T13:06:29Z

src/arche/tools/api.py

@@ -139,10 +139,12 @@ def get_items_with_pool(
    processes_count = min(max(helpers.cpus_count(), workers), active_connections_limit)
    batch_size = math.ceil(count / processes_count)

+    start_idxs = [i for i in range(start_index, start_index + count, batch_size)]


list(range(start_index, start_index + count, batch_size)).
But if it does not need to be a list, then just range.
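
A small sketch of this suggestion, with made-up numbers: `batch_starts` is a hypothetical helper, and the values for `count`, `processes_count` and `start_index` are assumptions chosen only to show the output.

```python
import math


def batch_starts(start_index: int, count: int, batch_size: int) -> range:
    """Return the first index of each batch as a lazy range (no list is built)."""
    return range(start_index, start_index + count, batch_size)


# Hypothetical numbers: 10 items starting at index 5, split across 4 processes.
count, processes_count, start_index = 10, 4, 5
batch_size = math.ceil(count / processes_count)  # 3
starts = batch_starts(start_index, count, batch_size)
print(list(starts))  # [5, 8, 11, 14]
```

A `range` can be iterated, indexed and measured with `len()`, so wrapping it in `list(...)` is only needed when the result must be mutated.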

ejulio · 2019-06-17T13:08:32Z

src/arche/tools/schema.py

-        item_numbers.sort()
-        if item_numbers[-1] >= items_count or item_numbers[0] < 0:
-            raise ValueError(item_n_err.format(item_numbers[-1], items_count - 1))
+    if items_numbers:


if max(items_numbers) >= items_count or min(items_numbers) < 0
might read better.

Another option would be
if not all(0 <= v < items_count for v in items_numbers):
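
As a runnable sketch of the `all(...)` variant: the function name and error message here are hypothetical, and the bounds follow the removed lines in the diff above, where valid item numbers run from 0 to `items_count - 1`.

```python
from typing import Iterable


def validate_items_numbers(items_numbers: Iterable[int], items_count: int) -> None:
    """Raise ValueError if any item number falls outside [0, items_count - 1]."""
    numbers = list(items_numbers)
    if not numbers:
        return  # nothing to validate
    if not all(0 <= n < items_count for n in numbers):
        raise ValueError(
            f"Item numbers must be between 0 and {items_count - 1}, got {numbers}"
        )


validate_items_numbers([0, 3, 9], items_count=10)  # passes silently
```

Unlike the sorted-list version, this does not need to sort or mutate the input.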

ejulio · 2019-06-17T13:14:40Z

tests/conftest.py

-        # Scrapinghub API returns all posible items even if `count` greater than possible
-        if start + count > len(self.items):
-            limit = len(self.items)
+        if kwargs.get("filter"):


Maybe rewrite to only one for statement.

    filter = lambda x: True
    if kwargs.get('filter'):
        filter = lambda x: x.get(field_name) == value
    for item in self.items...
        if counter == count:
            return
        elif filter(item):...

manycoding · 2019-06-17

I did this but then got a warning about assigning lambdas: https://www.python.org/dev/peps/pep-0008/#programming-recommendations

So I rewrote it as a function.
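
One way the single-loop idea might look with a named function instead of an assigned lambda. This is a sketch, not the code from the PR: `iter_items` is a hypothetical helper, and passing the filter as a `(field_name, value)` pair is an assumption made for the example.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

Item = Dict[str, Any]


def iter_items(
    items: Iterable[Item],
    count: int,
    filter_: Optional[Tuple[str, Any]] = None,
) -> Iterator[Item]:
    """Yield at most `count` items, optionally keeping only matching ones."""

    def matches(item: Item) -> bool:
        # A named function avoids the PEP 8 warning about assigning lambdas.
        if filter_ is None:
            return True
        field_name, value = filter_
        return item.get(field_name) == value

    yielded = 0
    for item in items:
        if yielded == count:
            return
        if matches(item):
            yielded += 1
            yield item


items = [{"a": 1}, {"a": 2}, {"a": 1}, {"a": 1}]
print(list(iter_items(items, count=2, filter_=("a", 1))))
```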

ejulio · 2019-06-17T13:17:37Z

tests/conftest.py

+        start = kwargs.get("start", None)
+        if start:
+            start_idx = [
+                i for i, item in enumerate(self.items) if item.get("_key") == start


I'd rewrite it as a generator to avoid evaluating all items (not sure if it would be a good performance improvement):
`start_idx = next(i for i, item in enumerate(self.items) if ...)`

manycoding · 2019-06-17

I don't want to overthink it :)
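
For reference, the generator suggestion can be sketched as follows. `find_start_index` is a hypothetical helper; the `None` default is an addition for the example, since `next()` without a default raises `StopIteration` when no item matches.

```python
from typing import Any, Dict, List, Optional


def find_start_index(items: List[Dict[str, Any]], start_key: str) -> Optional[int]:
    """Return the index of the first item whose `_key` equals `start_key`.

    The generator stops at the first match, so later items are never inspected.
    Returns None when no item has that key.
    """
    return next(
        (i for i, item in enumerate(items) if item.get("_key") == start_key),
        None,
    )


items = [{"_key": "a"}, {"_key": "b"}, {"_key": "c"}]
print(find_start_index(items, "b"))  # 1
```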

ejulio · 2019-06-17T13:18:08Z

tests/conftest.py

-            counter = 0
-            for index in range(start, limit):
-                if counter == limit:
+            for item in self.items[start_idx:]:


Only one for loop, as above.

manycoding · 2019-06-18T18:11:09Z

@ejulio I found a bug, fixed here ddc369a#diff-83d30786a51d86a819c6a40df1ba1fc3R112

manycoding added 2 commits June 13, 2019 15:56

Rewrite Source and add StoreSource mocks

09ff7c1

Separate start_index and start, support for collections, closes #112

274b897

manycoding requested review from raphapassini, wRAR, Gallaecio, ejulio, victor-torres and ivankivanov June 13, 2019 20:12

ejulio approved these changes Jun 17, 2019


manycoding added 2 commits June 17, 2019 13:48

Nicer code

b64311a

Merge branch '0.3.6dev' into fix_start

aaccb8a

manycoding merged commit fc5cf41 into 0.3.6dev Jun 17, 2019

manycoding deleted the fix_start branch June 17, 2019 18:37

manycoding added a commit that referenced this pull request Jun 18, 2019

Fix none start for jobs, introduced in #113

ddc369a
