docstring updates
svenkreiss committed Jul 30, 2017
1 parent 30f994c commit 2bc479f
Showing 3 changed files with 26 additions and 24 deletions.
2 changes: 1 addition & 1 deletion docs/sphinx/api.rst
@@ -5,7 +5,7 @@ API

.. currentmodule:: pysparkling

- A usual ``pysparkling`` session starts with either parallelizing a ``list``
+ A usual ``pysparkling`` session starts with either parallelizing a `list`
with :func:`Context.parallelize` or by reading data from a file using
:func:`Context.textFile`. These two methods return :class:`RDD` instances that
can then be processed.
5 changes: 2 additions & 3 deletions pysparkling/context.py
@@ -102,10 +102,9 @@ class Context(object):
:param pool: An instance with a ``map(func, iterable)`` method.
:param serializer:
Serializer for functions. Examples are `pickle.dumps` and
- ``dill.dumps``.
+ `cloudpickle.dumps`.
:param deserializer:
- Deserializer for functions. Examples are `pickle.loads` and
- ``dill.loads``.
+ Deserializer for functions. For example `pickle.loads`.
:param data_serializer: Serializer for the data.
:param data_deserializer: Deserializer for the data.
:param int max_retries: maximum number a partition is retried
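Reviewer note: the `serializer` and `deserializer` parameters documented in this hunk take callables such as `pickle.dumps` and `pickle.loads`. The following is a minimal standard-library sketch of that round trip, not pysparkling code. The helper names `roundtrip` and `can_serialize` are hypothetical and exist only for illustration. It also shows why a more capable serializer is sometimes suggested: the standard `pickle` module stores functions by reference and cannot serialize a lambda.

```python
import pickle


def roundtrip(func, serializer=pickle.dumps, deserializer=pickle.loads):
    """Serialize a function and load it back (hypothetical helper)."""
    return deserializer(serializer(func))


def can_serialize(func, serializer=pickle.dumps):
    """Return True if the serializer accepts the function (hypothetical helper)."""
    try:
        serializer(func)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A builtin is pickled by reference and survives the round trip.
restored = roundtrip(len)
print(restored([1, 2, 3]))             # 3
print(can_serialize(len))              # True
# A lambda has no importable name, so the standard pickle module rejects it.
print(can_serialize(lambda x: x + 1))  # False
```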
43 changes: 23 additions & 20 deletions pysparkling/rdd.py
@@ -86,8 +86,8 @@ def aggregate(self, zeroValue, seqOp, combOp):
:param zeroValue:
The initial value to an aggregation, for example ``0`` or ``0.0``
- for aggregating ``int`` s and ``float`` s, but any Python object is
- possible. Can be ``None``.
+ for aggregating `int` s and `float` s, but any Python object is
+ possible.
:param seqOp:
A reference to a function that combines the current state with a
@@ -126,8 +126,8 @@ def aggregateByKey(self, zeroValue, seqFunc, combFunc, numPartitions=None):
:param zeroValue:
The initial value to an aggregation, for example ``0`` or ``0.0``
- for aggregating ``int`` s and ``float`` s, but any Python object is
- possible. Can be ``None``.
+ for aggregating `int` s and `float` s, but any Python object is
+ possible.
:param seqFunc:
A reference to a function that combines the current state with a
@@ -137,8 +137,7 @@ def aggregateByKey(self, zeroValue, seqFunc, combFunc, numPartitions=None):
A reference to a function that combines outputs of seqFunc.
In the first iteration, the current state is zeroValue.
- :param int numPartitions: (optional)
- Not used.
+ :param int numPartitions: Not used.
:returns: An RDD with the output of ``combOp`` operations.
:rtype: RDD
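Reviewer note: the `zeroValue`, `seqOp` and `combOp` parameters described in the two hunks above can be illustrated with a plain-Python sketch. This is not pysparkling's implementation. The function name `aggregate_sketch` is hypothetical, and partitions are modelled as nested lists purely as an assumption for the example.

```python
from functools import reduce


def aggregate_sketch(partitions, zero_value, seq_op, comb_op):
    """Fold each partition with seq_op, then merge the partial results with comb_op.

    Hypothetical helper: `partitions` is a list of lists standing in for an RDD.
    """
    # Every partition starts from zero_value and absorbs its elements one by one.
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    # The per-partition states are then combined, again starting from zero_value.
    return reduce(comb_op, partials, zero_value)


# Compute (sum, count) over two partitions using an immutable tuple as state.
total, count = aggregate_sketch(
    [[1, 2], [3, 4, 5]],
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
print(total, count)  # 15 5
```

An immutable zero value is used here so that sharing it across partitions cannot cause accidental mutation.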
@@ -176,7 +175,7 @@ def combFuncByKey(l):
def cache(self):
"""Once a partition is computed, cache the result.
- Alias for :func:`RDD.persist`.
+ Alias for :func:`~pysparkling.RDD.persist`.
Example:
@@ -334,14 +333,14 @@ def count(self):
resultHandler=sum)

def countApprox(self):
- """same as :func:`RDD.count()`
+ """same as :func:`~pysparkling.RDD.count()`
:rtype: int
"""
return self.count()

def countByKey(self):
- """returns a ``dict`` containing the count for every key
+ """returns a `dict` containing the count for every key
:rtype: dict
@@ -357,7 +356,7 @@ def countByKey(self):
return self.map(lambda r: r[0]).countByValue()

def countByValue(self):
- """returns a ``dict`` containing the count for every value
+ """returns a `dict` containing the count for every value
:rtype: dict
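Reviewer note: the context line above shows `countByKey` delegating to `countByValue` after mapping each record to its first element. A plain-Python sketch of that relationship, assuming records are `(key, value)` tuples, could look like this. The names `count_by_value` and `count_by_key` are hypothetical and this is not the library's code.

```python
from collections import Counter


def count_by_value(items):
    """Return a dict mapping each distinct item to its number of occurrences."""
    return dict(Counter(items))


def count_by_key(pairs):
    """Count records per key by discarding values first, then counting."""
    return count_by_value(key for key, _ in pairs)


print(count_by_value([1, 1, 2]))                      # {1: 2, 2: 1}
print(count_by_key([("a", 1), ("b", 2), ("a", 3)]))   # {'a': 2, 'b': 1}
```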
@@ -521,7 +520,7 @@ def foldByKey(self, zeroValue, op):
def foreach(self, f):
"""applies ``f`` to every element
- It does not return a new RDD like :func:`RDD.map()`.
+ It does not return a new RDD like :func:`~pysparkling.RDD.map`.
:param f: Apply a function to every element.
:rtype: None
@@ -542,7 +541,8 @@ def foreach(self, f):
def foreachPartition(self, f):
"""applies ``f`` to every partition
- It does not return a new RDD like :func:`RDD.mapPartitions()`.
+ It does not return a new RDD like
+ :func:`~pysparkling.RDD.mapPartitions`.
:param f: Apply a function to every partition.
:rtype: None
@@ -554,7 +554,7 @@ def fullOuterJoin(self, other, numPartitions=None):
"""returns the full outer join of two RDDs
The output contains all keys from both input RDDs, with missing
- keys replaced with None.
+ keys replaced with `None`.
:param RDD other: The RDD to join to this one.
:param int numPartitions: Number of partitions in the resulting RDD.
@@ -592,7 +592,10 @@ def getNumPartitions(self):
return len(self.partitions())

def getPartitions(self):
- """returns the partitions of this RDD"""
+ """returns the partitions of this RDD
+
+ :rtype: list
+ """
return self.partitions()

def groupBy(self, f, numPartitions=None):
@@ -1038,7 +1041,7 @@ def reduceByKey(self, f):
:rtype: RDD
.. note::
- This operation includes a :func:`pysparkling.RDD.groupByKey()`
+ This operation includes a :func:`~pysparkling.RDD.groupByKey()`
which is a local operation.
@@ -1070,7 +1073,7 @@ def repartitionAndSortWithinPartitions(
:param int numPartitions: Number of partitions in new RDD.
:param partitionFunc: function that partitions
- :param ascending: Default is True.
+ :param ascending: Sort order.
:param keyfunc: Returns the value that will be sorted.
:rtype: RDD
@@ -1330,9 +1333,9 @@ def sortBy(self, keyfunc, ascending=True, numPartitions=None):
"""sort by keyfunc
:param keyfunc: Returns the value that will be sorted.
- :param ascending: Default is True.
+ :param ascending: Specify sort order.
:param int numPartitions:
- Default is None. None means the output will have the same number of
+ `None` means the output will have the same number of
partitions as the input.
:rtype: RDD
@@ -1365,8 +1368,8 @@ def sortByKey(self, ascending=True, numPartitions=None,
keyfunc=itemgetter(0)):
"""sort by key
- :param ascending: Default is True.
- :param int numPartitions: Default is None. None means the output will
+ :param ascending: Sort order.
+ :param int numPartitions: `None` means the output will
have the same number of partitions as the input.
:param keyfunc: Returns the value that will be sorted.
:rtype: RDD
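Reviewer note: the `sortBy` and `sortByKey` docstrings above describe `ascending` as the sort order and say that `None` for `numPartitions` keeps the same number of partitions as the input. A plain-Python sketch of that contract follows. The function name `sort_by_sketch` is hypothetical, partitions are modelled as nested lists, and the contiguous chunking used to rebuild partitions is an assumption for illustration, not a statement about how pysparkling distributes elements.

```python
def sort_by_sketch(partitions, keyfunc, ascending=True, num_partitions=None):
    """Sort all elements globally, then split them back into partitions."""
    if num_partitions is None:
        # None keeps the partition count of the input (at least one partition).
        num_partitions = max(1, len(partitions))
    items = sorted(
        (item for part in partitions for item in part),
        key=keyfunc,
        reverse=not ascending,
    )
    # Ceiling division gives the size of each contiguous chunk (assumed layout).
    size = -(-len(items) // num_partitions)
    return [items[i * size:(i + 1) * size] for i in range(num_partitions)]


data = [[3, 1], [2, 5, 4]]
print(sort_by_sketch(data, lambda x: x))                   # [[1, 2, 3], [4, 5]]
print(sort_by_sketch(data, lambda x: x, ascending=False))  # [[5, 4, 3], [2, 1]]
print(sort_by_sketch(data, lambda x: x, num_partitions=1)) # [[1, 2, 3, 4, 5]]
```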
