Skip to content

Conversation

@vshlapakov
Copy link
Contributor

@vshlapakov vshlapakov commented Mar 14, 2017

The main goal of the PR are the following:

  • provide return types everywhere where it could be useful,
  • hint kwargs in methods to clarify which params can be passed,
  • maybe work on style improvements also, i.e. params naming.

I'm also reviewing all the current docstrings and fixing any inconsistencies.

@vshlapakov vshlapakov self-assigned this Mar 14, 2017
Not a public constructor: use :class:`Project` instance to get a
:class:`Activity` instance. See :attr:`Project.activity` attribute.
Please note that list() method can use a lot of memory and for a large
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lines are moved to _Proxy.list method to avoid copy-pasting it everywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with moving this to a parent class is that it becomes invisible when you check help or docs for the method. E.g. I think we should build documentation based on docstrings later on, this line should be in each list method that can cause such issues.

@vshlapakov
Copy link
Contributor Author

vshlapakov commented Mar 15, 2017

@chekunkov Initial work is done, let me know if there's anything could be improved, I'm afraid that I has lost freshness of vision and can miss something important. Most of the changes in the PR are related with indentation, renaming params and fixing docstrings.


def __init__(self, auth=None, dash_endpoint=None, **kwargs):
# FIXME not sure it's worth to keep all HS kwargs here, most of them
# are appliable only to oldy Hubstorage client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep only necessary kwards here and link a HubstorageClient docs for more info

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, let's keep only necessary kwargs then

connection_timeout=None,
max_retries=None,
max_retry_time=None,
user_agent=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can revert that as I metioned above, but if for some reason you decide to keep it - note that you need to forward input args here, not use None everywhere.

setattr(self, method, wrapped)

def list(self, *args, **kwargs):
def list(self, requests_params=None, **params):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api docs mention much more parameters, I think they are applicable here https://doc.scrapinghub.com/api/collections.html#collections-project-id-type-collection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice, I'll add it 👍

Returns:
dict: updated set of params
:return: a dict with updated set of params.
:rtype: dict.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the dot

:param \*\*params: (optional) a set of filters to apply when counting
jobs (e.g. spider, state, has_tag, lacks_tag, startts and endts).
:return: jobs count.
:rtype: int.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think rtype value should have a trailing dot, please check across the repo

('has_tag', has_tag), ('lacks_tag', lacks_tag),
('startts', startts), ('endts', endts),
('meta', meta)]
params.update({k: v for k, v in filter_kwargs if v is not None})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

possibly you can generalize filtering? something like

update_kwargs(kwargs, spider=spider, state=state, count=count,
               spidername=spider_name)

note the last one, you can use map to different name if necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will do, thanks

return Job(self._client, str(job_key))

def summary(self, _queuename=None, **params):
# FIXME there must me spider_name instead of spider_id for consistency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spider_name or spider? list takes spider, schedule takes spider_name :) or maybe above methods should be filtered by spider_id for consistency with summary and iter_last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so spider or spider_id then..I'd vote for spider, name is easier to remember

:meth:`ScrapinghubClient.get_job` and :meth:`Jobs.get` methods.
:ivar projectid: in integer project id.
:ivar project_id: in integer project id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"in" is not needed here

Not a public constructor: use :class:`Job` instance to get a
:class:`Jobmeta` instance. See :attr:`Job.metadata` attribute.
Usage::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused, why Usage is followed by ::? When rst renders this will produce a literal block, I don't think we want literal block for the whole usage description, I expect that it's a list of short descriptions followed by a literal block with example:

Usage:

- get job metadata instance::

    >>> job.metadata
    <scrapinghub.client.jobs.JobMeta at 0x10494f198>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I'll fix it everywhere 👌

- each string defines:
a successor method name to proxy 1:1 with origin method
- each tuple should consist of 2 strings:
a successor method name and an origin method name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why literal block ::?

Add return types in docstrings
Add hints for kwargs, fix docstrings
Use update_kwargs helper to unify logic
Rename spider_args -> job_args
Unify spider param for different methods
Don't return count from job.update_tags
@vshlapakov
Copy link
Contributor Author

Tests are fixed, merging.

@vshlapakov vshlapakov merged commit 7137456 into sc1467-1 Mar 17, 2017
@vshlapakov vshlapakov deleted the sc1467-1-ide branch March 17, 2017 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants