Improve client to use it via IDE #54
Conversation
| Not a public constructor: use :class:`Project` instance to get a
| :class:`Activity` instance. See :attr:`Project.activity` attribute.
| Please note that list() method can use a lot of memory and for a large
The lines were moved to the _Proxy.list method to avoid copy-pasting them everywhere.
The problem with moving this to a parent class is that it becomes invisible when you check help or the docs for the method. For example, I think we should build documentation based on docstrings later on, so this line should appear in each list method that can cause such issues.
@chekunkov Initial work is done, let me know if there's anything that could be improved; I'm afraid I have lost freshness of vision and may miss something important. Most of the changes in the PR are related to indentation, renaming params and fixing docstrings.
scrapinghub/client/__init__.py
Outdated
| def __init__(self, auth=None, dash_endpoint=None, **kwargs):
| # FIXME not sure it's worth to keep all HS kwargs here, most of them
| # are appliable only to oldy Hubstorage client
I'd keep only the necessary kwargs here and link to the HubstorageClient docs for more info.
Well, let's keep only necessary kwargs then
scrapinghub/client/__init__.py
Outdated
| connection_timeout=None,
| max_retries=None,
| max_retry_time=None,
| user_agent=None,
I think we can revert that, as I mentioned above, but if for some reason you decide to keep it, note that you need to forward the input args here, not use None everywhere.
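A minimal sketch of what forwarding the input args could look like (the `_hsclient_kwargs` attribute is an assumption for illustration, not the actual client code):

```python
class ScrapinghubClient:
    def __init__(self, auth=None, dash_endpoint=None,
                 connection_timeout=None, max_retries=None,
                 max_retry_time=None, user_agent=None, **kwargs):
        # Forward the caller's values instead of hard-coding None
        # when building kwargs for the underlying Hubstorage client.
        self._hsclient_kwargs = dict(
            auth=auth,
            endpoint=dash_endpoint,
            connection_timeout=connection_timeout,
            max_retries=max_retries,
            max_retry_time=max_retry_time,
            user_agent=user_agent,
            **kwargs,
        )
```

This way a caller's `max_retries=5` actually reaches the wrapped client rather than being silently replaced by the default.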
scrapinghub/client/collections.py
Outdated
| setattr(self, method, wrapped)
|
| def list(self, *args, **kwargs):
| def list(self, requests_params=None, **params):
The API docs mention many more parameters; I think they are applicable here: https://doc.scrapinghub.com/api/collections.html#collections-project-id-type-collection
Oh, nice, I'll add it 👍
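One way the extra API filters could be threaded through is to merge only the filters the caller actually set, so that None never reaches the request. This is a standalone sketch; `build_list_params` is a hypothetical helper, and the parameter names are taken from the linked collections API page:

```python
def build_list_params(key=None, prefix=None, prefixcount=None,
                      startts=None, endts=None, **params):
    # None means "not provided" and must not end up in the request,
    # so only non-None filters are merged into the params dict.
    filters = dict(key=key, prefix=prefix, prefixcount=prefixcount,
                   startts=startts, endts=endts)
    params.update({k: v for k, v in filters.items() if v is not None})
    return params
```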
scrapinghub/client/items.py
Outdated
| Returns:
| dict: updated set of params
| :return: a dict with updated set of params.
| :rtype: dict.
please remove the dot
scrapinghub/client/jobs.py
Outdated
| :param \*\*params: (optional) a set of filters to apply when counting
| jobs (e.g. spider, state, has_tag, lacks_tag, startts and endts).
| :return: jobs count.
| :rtype: int.
I don't think the rtype value should have a trailing dot; please check across the repo.
scrapinghub/client/jobs.py
Outdated
| ('has_tag', has_tag), ('lacks_tag', lacks_tag),
| ('startts', startts), ('endts', endts),
| ('meta', meta)]
| params.update({k: v for k, v in filter_kwargs if v is not None})
Possibly you can generalize the filtering? Something like:

    update_kwargs(kwargs, spider=spider, state=state, count=count,
                  spidername=spider_name)

Note the last one: you can map to a different name if necessary.
Sounds good, will do, thanks
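The suggested helper could be as small as this (a sketch of the idea; the real signature in the repo may differ):

```python
def update_kwargs(kwargs, **params):
    # Copy only the filters that were actually provided, so None
    # values never leak into the request parameters. The keyword
    # names double as the target keys, which allows renaming at
    # the call site (e.g. spidername=spider_name).
    kwargs.update({k: v for k, v in params.items() if v is not None})

params = {}
update_kwargs(params, spider="myspider", state=None, has_tag=None)
# params is now {'spider': 'myspider'}
```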
scrapinghub/client/jobs.py
Outdated
| return Job(self._client, str(job_key))
|
| def summary(self, _queuename=None, **params):
| # FIXME there must me spider_name instead of spider_id for consistency
spider_name or spider? list takes spider, schedule takes spider_name :) Or maybe the methods above should be filtered by spider_id instead, for consistency with summary and iter_last?
Right, so spider or spider_id then... I'd vote for spider; the name is easier to remember.
scrapinghub/client/jobs.py
Outdated
| :meth:`ScrapinghubClient.get_job` and :meth:`Jobs.get` methods.
| :ivar projectid: in integer project id.
| :ivar project_id: in integer project id.
"in" is not needed here
scrapinghub/client/jobs.py
Outdated
| Not a public constructor: use :class:`Job` instance to get a
| :class:`Jobmeta` instance. See :attr:`Job.metadata` attribute.
| Usage::
I'm a bit confused: why is Usage followed by ::? When reST renders this, it will produce a literal block, and I don't think we want a literal block for the whole usage description. I expect a list of short descriptions, each followed by a literal block with an example:

Usage:

- get job metadata instance::

    >>> job.metadata
    <scrapinghub.client.jobs.JobMeta at 0x10494f198>
You're right, I'll fix it everywhere 👌
scrapinghub/client/utils.py
Outdated
| - each string defines:
| a successor method name to proxy 1:1 with origin method
| - each tuple should consist of 2 strings:
| a successor method name and an origin method name
Why literal block ::?
- Add return types in docstrings
- Add hints for kwargs, fix docstrings
- Use update_kwargs helper to unify logic
- Rename spider_args -> job_args
- Unify spider param for different methods
- Don't return count from job.update_tags
Force-pushed from aa270a1 to b6c86ff
Tests are fixed, merging.
The main goals of the PR are the following:
I'm also reviewing all the current docstrings and fixing any inconsistencies.