-
Notifications
You must be signed in to change notification settings - Fork 61
Split frontier logic into Frontiers & Frontier #49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
README.rst
Outdated
|
|
||
| Get an iterator to iterate through a frontier slots:: | ||
|
|
||
| >>> frontier.iter_slots() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe replace with iter() and list()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had doubts about it but if we're going to use a separate entity for slots - it looks fine
| return iter(self.list_slots()) | ||
|
|
||
| def list_slots(self): | ||
| return next(self._frontiers._origin.apiget((self.key, 'list'))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a proposal - maybe we should add another entity - Slot or FrontierSlot? Frontier.iter() will go through available slot names, Frontier.get() will get Slot object, Slot.iter() or Slot.list() will do what read() does now, Slot.delete(ids) will delete data from the slot, Frontier.delete(name) will delete whole slot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'd go with FrontierSlot
|
@chekunkov Ready for next review, I also added integration tests for frontier logic and added new cassettes to #40 (rebased it on top of the PR). |
README.rst
Outdated
|
|
||
| To delete the slot ``example.com`` from the frontier:: | ||
|
|
||
| >>> frontier.delete('example.com') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just realized this can be confusing - usually it's expected to call delete on the object that should be deleted. e.g. we have Job.delete() method. What if we move delete to FrontierSlot, if no arguments are passed the entier slot is removed, if ids are passed - we remove only some requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it's bit more consistent with other classes, let's do that
scrapinghub/client.py
Outdated
| origin = self._frontier._frontiers._origin | ||
| return origin.add(self._frontier.key, self.key, fps) | ||
|
|
||
| def iter(self, mincount=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better replace mincount with **kwargs, it's server responsibility to decide what parameters are valid or not
README.rst
Outdated
| To retrieve fingerprints for a given slot:: | ||
|
|
||
| >>> fps = [req['requests'] for req in slot.iter()] | ||
| >>> fps = [req['requests'] for req in slot.q.iter()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm, shouldn't it be slot.f.iter() / slot.f.list()?
|
After we merge this PR we will probably need to split client.py into smaller modules - 1.3k lines is a bit too much |
|
@chekunkov I think that's it, take a look pls when you have time. |
Frontier newcount counter per slot
We also agreed to move frontier slot logic to a separate entity in a similar way.
Please, review.