Merged
87 commits
cd32c92
CLN: Remove filters field from manifest strand in twines
cortadocodes May 10, 2021
0c579ae
IMP: Disallow more than one colon in tags
cortadocodes May 12, 2021
71d5096
REV: Revert "IMP: Disallow more than one colon in tags"
cortadocodes May 12, 2021
551f97d
MRG: Merge remote-tracking branch 'origin/main' into feature/tag-temp…
cortadocodes May 12, 2021
c5a69b9
DEP: Use new version of twined
cortadocodes May 12, 2021
fa853cf
IMP: Convert tags to labels
cortadocodes May 13, 2021
ae8f0ef
REF: Change filtering syntax to filter_name=value
cortadocodes May 13, 2021
dbc9460
IMP: Add ability to filter by nested attributes/dicts
cortadocodes May 17, 2021
4cc19eb
IMP: Add FilterDict
cortadocodes May 17, 2021
f7d8fea
IMP: Add TagDict and use in Taggable
cortadocodes May 17, 2021
5aad839
IMP: Make Datafiles and Datasets taggable again
cortadocodes May 17, 2021
3e98232
IMP: Stop logging in Serialisable; always exclude logger field in Ser…
cortadocodes May 17, 2021
31f6d22
FIX: Remove Serialisable mixin from LabelSet
cortadocodes May 17, 2021
26d40d7
DEP: Use correct twined branch
cortadocodes May 17, 2021
8299563
IMP: Allow tags to be added to Taggable as kwargs
cortadocodes May 17, 2021
5556be9
TST: Test Taggable
cortadocodes May 17, 2021
b1ca0c9
TST: Test setting items on TagDict
cortadocodes May 17, 2021
0f34a00
TST: Test chaining filters on FilterDict
cortadocodes May 17, 2021
1dd629e
TST: Test filters for a TagDict
cortadocodes May 17, 2021
b7f79bc
FIX: Add tags parameter back to Datafile and Dataset constructors
cortadocodes May 17, 2021
e43ea63
FIX: Initialise superclass in Taggable mixin
cortadocodes May 17, 2021
ea66c28
CLN: Remove extra parameters from Label
cortadocodes May 17, 2021
515fe70
FIX: Add serialise method to TagDict
cortadocodes May 17, 2021
ce39fb9
FIX: Deserialise TagDicts properly in Datafile.from_cloud
cortadocodes May 17, 2021
a2c5d00
TST: Fix Taggable test
cortadocodes May 17, 2021
1d2dcff
CLN: Simplify test method
cortadocodes May 17, 2021
580d432
FIX: Fix Dataset tags parameter
cortadocodes May 17, 2021
d92de3d
CLN: Remove unused _FILTERSET_ATTRIBUTE class variables
cortadocodes May 17, 2021
29e92fe
REF: Base LabelSet on FilterSet
cortadocodes May 17, 2021
73b32b0
REF: Base Label on str
cortadocodes May 17, 2021
4a16676
IMP: Allow FilterDicts to be ordered by their values
cortadocodes May 17, 2021
b6742be
IMP: Allow ignoring of filterables without filtered-for attribute
cortadocodes May 17, 2021
187ac5e
IMP: Allow multiple filters in filter containers' filter methods
cortadocodes May 17, 2021
8f35468
REF: Use lambda for filter instead of def function
cortadocodes May 17, 2021
4c75cb0
IMP: Add Dataset.get_file_by_tag
cortadocodes May 17, 2021
65be685
CLN: Remvove unnecessary class variable; use more pythonic method ove…
cortadocodes May 17, 2021
7da5b64
CLN: Remove commented-out code
cortadocodes May 17, 2021
51b5be4
DEP: Use latest GCS emulator
cortadocodes May 18, 2021
9ed62c7
FIX: Handle timestamps from cloud with/without timezone information
cortadocodes May 18, 2021
d53f518
IMP: Limit allowed tag name and label patterns
cortadocodes May 18, 2021
75c183a
IMP: Raise error if non-Filterables are put into filter containers
cortadocodes May 18, 2021
0f826b4
IMP: Store tags in separate custom metadata fields on GCS
cortadocodes May 18, 2021
b0b879d
DOC: Fix incorrect/outdated information in docs
cortadocodes May 18, 2021
a2750a8
REF: Slightly simplify Taggable and Labelable
cortadocodes May 18, 2021
84f0a03
FIX: Make Analysis taggable again
cortadocodes May 18, 2021
0a570b3
DOC: Update templates with labels/tags
cortadocodes May 18, 2021
2aa0ac3
REF: Simplify Datafile.metadata method
cortadocodes May 21, 2021
ebc11b5
TST: Test that datafile tags are stored as separate pieces of custom …
cortadocodes May 21, 2021
19e0864
REV: Remove Dataset.get_file_by_tag method
cortadocodes May 21, 2021
de967dd
DOC: Update docstrings and error messages
cortadocodes May 21, 2021
d04f579
REV: Unbase TagDict from FilterDict
cortadocodes May 21, 2021
dad9ecf
REV: Unbase Label from Filterable and LabelSet from FilterSet
cortadocodes May 21, 2021
7d025ab
TST: Remove unneeded base class from test class
cortadocodes May 21, 2021
fbc058a
TST: Remove unnecessary casting to set
cortadocodes May 21, 2021
ab893a2
TST: Add wrongly-removed test back in
cortadocodes May 21, 2021
1cd5d77
DOC: Fix error string
cortadocodes May 21, 2021
bf6dec7
TST: Test failing of filtering Filterables with differing attributes;…
cortadocodes May 21, 2021
ad7afa5
TST: Simplify label tests
cortadocodes May 21, 2021
9b0725a
TST: Add tags to datasets in manifest tests
cortadocodes May 21, 2021
7e16fe8
TST: Improve and simplify some more tests
cortadocodes May 21, 2021
19c1891
TST: Test uncovered areas
cortadocodes May 21, 2021
00912a1
IMP: Use new format for manifests' datasets in twine.json files
cortadocodes May 21, 2021
bfccca7
IMP: Support non-English characters in case-insensitive filtering
cortadocodes May 25, 2021
05e6d5d
REF: Base filter containers on new FilterContainer abstract class
cortadocodes May 25, 2021
0446498
IMP: Return items when ordering FilterDict rather than just values
cortadocodes May 25, 2021
2b56d44
DOC: Update filter containers documentation
cortadocodes May 25, 2021
55035fc
DOC: Update other documentation
cortadocodes May 25, 2021
9e7d012
MRG: Merge remote-tracking branch 'origin/release/0.1.19' into featur…
cortadocodes May 25, 2021
d0b48dc
CLN: Remove unnecessary pass statements
cortadocodes May 25, 2021
1189a39
IMP: Add octue SDK version to datafile metadata
cortadocodes Jun 2, 2021
a2c3d98
IMP: Add `one` method to filter containers
cortadocodes Jun 2, 2021
e160ed7
TST: Update tests
cortadocodes Jun 2, 2021
6ad724e
REF: Move filter and order methods into FilterContainer
cortadocodes Jun 2, 2021
4cae5da
MRG: Merge remote-tracking branch 'origin/release/0.1.19' into featur…
cortadocodes Jun 2, 2021
b7b920e
IMP: JSON-encode cloud storage custom metadata
cortadocodes Jun 2, 2021
18912ea
REV: Store tags in tags field of cloud metadata again
cortadocodes Jun 2, 2021
97c5aab
REF: Rename GoogleCloudStorageClient methods; update docstrings
cortadocodes Jun 2, 2021
d1f26c9
DOC: Update filter container docstrings
cortadocodes Jun 2, 2021
af3a2ac
FIX: Allow ordering by nested attributes in other FilterContainers
cortadocodes Jun 2, 2021
cd90140
REF: Refactor Dataset.get_file_by_label
cortadocodes Jun 2, 2021
87e0f2b
IMP: Allow UserStrings to be JSON-encoded by default
cortadocodes Jun 2, 2021
52355b2
IMP: Add set serialisation to en/decoders
cortadocodes Jun 2, 2021
1793d1e
REF: Remove unnecessary methods from LabelSet
cortadocodes Jun 2, 2021
e5ba7d8
DOC: Document label module
cortadocodes Jun 2, 2021
8c5bdfa
REF: Remove method from TagDict; document methods
cortadocodes Jun 2, 2021
66b4924
FIX: Restore required method
cortadocodes Jun 2, 2021
484ff4d
REF: Rename add_labels method and add `add` method to Label
cortadocodes Jun 2, 2021
17 changes: 2 additions & 15 deletions docs/source/analysis_objects.rst
@@ -27,18 +27,5 @@ your app can always be verified. These hashes exist on the following attributes:
- ``configuration_values_hash``
- ``configuration_manifest_hash``

If an input or configuration attribute is ``None``, so will its hash attribute be. For ``Manifests``, some metadata
about the ``Datafiles`` and ``Datasets`` within them, and about the ``Manifest`` itself, is included when calculating
the hash:

- For a ``Datafile``, the content of its on-disk file is hashed, along with the following metadata:

- ``name``
- ``cluster``
- ``sequence``
- ``timestamp``
- ``tags``

- For a ``Dataset``, the hashes of its ``Datafiles`` are included, along with its ``tags``.

- For a ``Manifest``, the hashes of its ``Datasets`` are included, along with its ``keys``.
If a strand is ``None``, so will its corresponding hash attribute be. The hash of a datafile is the hash of
its file, while the hash of a manifest or dataset is the cumulative hash of the files it refers to.
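
To illustrate the idea of a cumulative hash, here is a conceptual sketch in plain Python (an illustration of the concept only, not the SDK's actual hashing scheme):

.. code-block:: python

    import hashlib

    def file_hash(path):
        """Hash the contents of a single file."""
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def cumulative_hash(member_hashes):
        """Combine member hashes into one value, as a dataset or manifest hash conceptually does."""
        combined = hashlib.sha256()
        for member_hash in sorted(member_hashes):  # Sort so that file order doesn't affect the result.
            combined.update(member_hash.encode())
        return combined.hexdigest()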
4 changes: 2 additions & 2 deletions docs/source/child_services.rst
@@ -104,13 +104,13 @@ The children field must also be present in the ``twine.json`` file:
"key": "wind_speed",
"purpose": "A service that returns the average wind speed for a given latitude and longitude.",
"notes": "Some notes.",
"filters": "tags:wind_speed"
"filters": "labels:wind_speed"
},
{
"key": "elevation",
"purpose": "A service that returns the elevation for a given latitude and longitude.",
"notes": "Some notes.",
"filters": "tags:elevation"
"filters": "labels:elevation"
}
],
...
2 changes: 1 addition & 1 deletion docs/source/cloud_storage.rst
@@ -12,7 +12,7 @@ in Octue SDK, please join the discussion `in this issue. <https://github.com/oct
Data container classes
----------------------
All of the data container classes in the SDK have a ``to_cloud`` and a ``from_cloud`` method, which handles their
upload/download to/from the cloud, including all relevant metadata from the instance (e.g. tags, ID). Data integrity is
upload/download to/from the cloud, including all relevant metadata from the instance (e.g. labels, ID). Data integrity is
checked before and after upload and download to ensure any data corruption is avoided.
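
For example, a minimal round trip with a ``Datafile`` (a sketch based on the method signatures shown on the :doc:`Datafile <datafile>` page; the import path is an assumption):

.. code-block:: python

    from octue.resources import Datafile  # Import path assumed.

    # Upload a datafile, including its metadata.
    datafile = Datafile(path="path/to/local/file.dat")
    datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")

    # Retrieve it again, metadata included.
    with Datafile.from_cloud("my-project", "my-bucket", "path/to/data.dat", mode="r") as (datafile, f):
        data = f.read()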

Datafile
6 changes: 3 additions & 3 deletions docs/source/cloud_storage_advanced_usage.rst
@@ -26,14 +26,14 @@ to any of these methods.
local_path=<path/to/file>,
bucket_name=<bucket-name>,
path_in_bucket=<path/to/file/in/bucket>,
metadata={"tags": ["blah", "glah", "jah"], "cleaned": True, "id": 3}
metadata={"id": 3, "labels": ["blah", "glah", "jah"], "cleaned": True, "colour": "blue"}
Contributor: isn't the id a uuid?

Member Author: It is for Identifiables but this is just metadata for any given file (this isn't a method of an Identifiable)

)

storage_client.upload_from_string(
string='[{"height": 99, "width": 72}, {"height": 12, "width": 103}]',
bucket_name=<bucket-name>,
path_in_bucket=<path/to/file/in/bucket>,
metadata={"tags": ["dimensions"], "cleaned": True, "id": 96}
metadata={"id": 96, "labels": ["dimensions"], "cleaned": True, "colour": "red", "size": "small"}
)

**Downloading**
@@ -61,7 +61,7 @@ to any of these methods.
bucket_name=<bucket-name>,
path_in_bucket=<path/to/file/in/bucket>,
)
>>> {"tags": ["dimensions"], "cleaned": True, "id": 96}
>>> {"id": 96, "labels": ["dimensions"], "cleaned": True, "colour": "red", "size": "small"}


**Deleting**
19 changes: 12 additions & 7 deletions docs/source/datafile.rst
@@ -10,7 +10,8 @@ the following main attributes:
- ``path`` - the path of this file, which may include folders or subfolders, within the dataset.
- ``cluster`` - the integer cluster of files, within a dataset, to which this belongs (default 0)
- ``sequence`` - a sequence number of this file within its cluster (if sequences are appropriate)
- ``tags`` - a space-separated string or iterable of tags relevant to this file
- ``tags`` - key-value pairs of metadata relevant to this file
- ``labels`` - a space-separated string or iterable of labels relevant to this file
- ``timestamp`` - a posix timestamp associated with the file, in seconds since epoch, typically when it was created but could relate to a relevant time point for the data


@@ -43,14 +44,15 @@ Example A
bucket_name = "my-bucket",
datafile_path = "path/to/data.csv"

with Datafile.from_cloud(project_name, bucket_name, datafile_path, mode="r") as datafile, f:
with Datafile.from_cloud(project_name, bucket_name, datafile_path, mode="r") as (datafile, f):
data = f.read()
new_metadata = metadata_calculating_function(data)

datafile.timestamp = new_metadata["timestamp"]
datafile.cluster = new_metadata["cluster"]
datafile.sequence = new_metadata["sequence"]
datafile.tags = new_metadata["tags"]
datafile.labels = new_metadata["labels"]


Example B
@@ -76,7 +78,8 @@ Example B
datafile.timestamp = datetime.now()
datafile.cluster = 0
datafile.sequence = 3
datafile.tags = {"manufacturer:Vestas", "output:1MW"}
datafile.tags = {"manufacturer": "Vestas", "output": "1MW"}
datafile.labels = {"new"}

datafile.to_cloud() # Or, datafile.update_cloud_metadata()

@@ -122,10 +125,11 @@ For creating new data in a new local file:


sequence = 2
tags = {"cleaned:True", "type:linear"}
tags = {"cleaned": True, "type": "linear"}
labels = {"Vestas"}


with Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, mode="w") as datafile, f:
with Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, labels=labels, mode="w") as (datafile, f):
f.write("This is some cleaned data.")

datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")
@@ -139,7 +143,8 @@ For existing data in an existing local file:


sequence = 2
tags = {"cleaned:True", "type:linear"}
tags = {"cleaned": True, "type": "linear"}
labels = {"Vestas"}

datafile = Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags)
datafile = Datafile(path="path/to/local/file.dat", sequence=sequence, tags=tags, labels=labels)
datafile.to_cloud(project_name="my-project", bucket_name="my-bucket", path_in_bucket="path/to/data.dat")
20 changes: 12 additions & 8 deletions docs/source/dataset.rst
@@ -8,9 +8,10 @@ A ``Dataset`` contains any number of ``Datafiles`` along with the following meta

- ``name``
- ``tags``
- ``labels``

The files are stored in a ``FilterSet``, meaning they can be easily filtered according to any attribute of the
:doc:`Datafile <datafile>` instances it contains.
:doc:`Datafile <datafile>` instances contained.


--------------------------------
@@ -23,23 +24,26 @@ You can filter a ``Dataset``'s files as follows:

dataset = Dataset(
files=[
Datafile(timestamp=time.time(), path="path-within-dataset/my_file.csv", tags="one a:2 b:3 all"),
Datafile(timestamp=time.time(), path="path-within-dataset/your_file.txt", tags="two a:2 b:3 all"),
Datafile(timestamp=time.time(), path="path-within-dataset/another_file.csv", tags="three all"),
Datafile(path="path-within-dataset/my_file.csv", labels=["one", "a", "b" "all"]),
Datafile(path="path-within-dataset/your_file.txt", labels=["two", "a", "b", "all"),
Datafile(path="path-within-dataset/another_file.csv", labels=["three", "all"]),
]
)

dataset.files.filter(filter_name="name__ends_with", filter_value=".csv")
dataset.files.filter(name__ends_with=".csv")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

dataset.files.filter("tags__contains", filter_value="a:2")
dataset.files.filter(labels__contains="a")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>

You can also chain filters indefinitely:
You can also chain filters indefinitely, or specify them all at the same time:

.. code-block:: python

dataset.files.filter(filter_name="name__ends_with", filter_value=".csv").filter("tags__contains", filter_value="a:2")
dataset.files.filter(name__ends_with=".csv").filter(labels__contains="a")
>>> <FilterSet({<Datafile('my_file.csv')>})>

dataset.files.filter(name__ends_with=".csv", labels__contains="a")
>>> <FilterSet({<Datafile('my_file.csv')>})>

Find out more about ``FilterSets`` :doc:`here <filter_containers>`, including all the possible filters available for each type of object stored on
82 changes: 59 additions & 23 deletions docs/source/filter_containers.rst
@@ -4,43 +4,61 @@
Filter containers
=================

A filter container is just a regular python container that has some extra methods for filtering or ordering its
A filter container is just a regular python container that has some extra methods for filtering and ordering its
elements. It has the same interface (i.e. attributes and methods) as the primitive python type it inherits from, with
these extra methods:

- ``filter``
- ``order_by``

There are two types of filter containers currently implemented:
There are three types of filter containers currently implemented:

- ``FilterSet``
- ``FilterList``
- ``FilterDict``

``FilterSets`` are currently used in:
``FilterSets`` are currently used in ``Dataset.files`` to store ``Datafiles`` and make them filterable, which is useful
for dealing with a large number of files, while ``FilterList`` is returned when ordering any filter container.

- ``Dataset.files`` to store ``Datafiles``
- ``TagSet.tags`` to store ``Tags``

You can see filtering in action on the files of a ``Dataset`` :doc:`here <dataset>`.
You can see an example of filtering of a ``Dataset``'s files :doc:`here <dataset>`.


---------
Filtering
---------

Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``, and any attribute of a member of the
``FilterSet`` whose type or interface is supported can be filtered.
Key points:

* Any attribute of a member of a filter container whose type or interface is supported can be used when filtering
* Filters are named as ``"<name_of_attribute_to_check>__<filter_action>"``
* Multiple filters can be specified at once for chained filtering
* ``<name_of_attribute_to_check>`` can be a single attribute name or a double-underscore-separated string of nested attribute names
* Nested attribute names work for real attributes as well as dictionary keys (in any combination and to any depth)

.. code-block:: python

filter_set = FilterSet(
{Datafile(timestamp=time.time(), path="my_file.csv"), Datafile(timestamp=time.time(), path="your_file.txt"), Datafile(timestamp=time.time(), path="another_file.csv")}
{
Datafile(path="my_file.csv", cluster=0, tags={"manufacturer": "Vestas"}),
Datafile(path="your_file.txt", cluster=1, tags={"manufacturer": "Vergnet"}),
Datafile(path="another_file.csv", cluster=2, tags={"manufacturer": "Enercon"})
}
)

filter_set.filter(filter_name="name__ends_with", filter_value=".csv")
# Single filter, non-nested attribute.
filter_set.filter(name__ends_with=".csv")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('another_file.csv')>})>

The following filters are implemented for the following types:
# Two filters, non-nested attributes.
filter_set.filter(name__ends_with=".csv", cluster__gt=1)
>>> <FilterSet({<Datafile('another_file.csv')>})>

# Single filter, nested attribute.
filter_set.filter(tags__manufacturer__starts_with="V")
>>> <FilterSet({<Datafile('my_file.csv')>, <Datafile('your_file.txt')>})>


These filters are currently available for the following types:

- ``bool``:

@@ -73,19 +91,20 @@ The following filters are implemented for the following types:
* ``is``
* ``is_not``

- ``TagSet``:
- ``LabelSet``:

* ``is``
* ``is_not``
* ``equals``
* ``not_equals``
* ``any_tag_contains``
* ``not_any_tag_contains``
* ``any_tag_starts_with``
* ``not_any_tag_starts_with``
* ``any_tag_ends_with``
* ``not_any_tag_ends_with``

* ``contains``
* ``not_contains``
* ``any_label_contains``
* ``not_any_label_contains``
* ``any_label_starts_with``
* ``not_any_label_starts_with``
* ``any_label_ends_with``
* ``not_any_label_ends_with``
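
For example, using the labelled datafiles from the :doc:`dataset <dataset>` page (a sketch; the output shown is illustrative):

.. code-block:: python

    dataset.files.filter(labels__any_label_starts_with="t")
    >>> <FilterSet({<Datafile('your_file.txt')>, <Datafile('another_file.csv')>})>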


Additionally, these filters are defined for the following *interfaces* (duck-types):
@@ -118,14 +137,31 @@ list of filters.
--------
Ordering
--------
As sets are inherently orderless, ordering a ``FilterSet`` results in a new ``FilterList``, which has the same extra
methods and behaviour as a ``FilterSet``, but is based on the ``list`` type instead - meaning it can be ordered and
indexed etc. A ``FilterSet`` or ``FilterList`` can be ordered by any of the attributes of its members:
As sets and dictionaries are inherently orderless, ordering any filter container results in a new ``FilterList``, which
has the same methods and behaviour but is based on ``list`` instead, meaning it can be ordered and indexed etc. A
filter container can be ordered by any of the attributes of its members:

.. code-block:: python

filter_set.order_by("name")
>>> <FilterList([<Datafile('another_file.csv')>, <Datafile('my_file.csv')>, <Datafile('your_file.txt')>])>

filter_set.order_by("cluster")
>>> <FilterList([<Datafile('my_file.csv')>, <Datafile('your_file.txt')>, <Datafile('another_file.csv')>])>

The ordering can also be carried out in reverse (i.e. descending order) by passing ``reverse=True`` as a second argument
to the ``order_by`` method.
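
For example, reusing the ``filter_set`` above:

.. code-block:: python

    filter_set.order_by("name", reverse=True)
    >>> <FilterList([<Datafile('your_file.txt')>, <Datafile('my_file.csv')>, <Datafile('another_file.csv')>])>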


--------------
``FilterDict``
--------------
The keys of a ``FilterDict`` can be anything, but each value must be a ``Filterable``. Hence, a ``FilterDict`` is
filtered and ordered by its values' attributes; when ordering, its items (key-value tuples) are returned in a
``FilterList``.
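
For example (a sketch; the import paths are assumptions, and ``cluster`` is just an illustrative attribute from the :doc:`Datafile <datafile>` page):

.. code-block:: python

    from octue.resources import Datafile  # Import paths assumed.
    from octue.resources.filter_containers import FilterDict

    file_map = FilterDict({
        "raw": Datafile(path="raw.csv", cluster=0),
        "clean": Datafile(path="clean.csv", cluster=1),
    })

    # Filtering acts on the values' attributes; matching keys stay with their values.
    file_map.filter(cluster__gt=0)

    # Ordering returns the items as (key, value) tuples in a FilterList.
    file_map.order_by("cluster")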

-----------------------
Using for your own data
-----------------------
If using filter containers for your own data, all the members must inherit from ``octue.mixins.filterable.Filterable``
to be filterable and orderable.
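
A minimal sketch (the ``FilterSet`` import path is an assumption):

.. code-block:: python

    from octue.mixins.filterable import Filterable
    from octue.resources.filter_containers import FilterSet  # Import path assumed.

    class Reading(Filterable):
        """A simple filterable wrapper around a numeric value."""

        def __init__(self, value):
            self.value = value

    readings = FilterSet({Reading(1), Reading(5), Reading(3)})
    readings.filter(value__gt=2)  # FilterSet of the readings greater than 2.
    readings.order_by("value")  # FilterList of the readings in ascending order of value.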