In [1]:
%load_ext autoreload
%autoreload 2

This notebook is meant to demo a few dev techniques when tackling a new problem. 

It involves: TDD, typing, scaffolding, protocols, facades, repository pattern,... 

## The problem

The original problem: **As a user, I can tag (recording) sessions, view my tags, and retrieve the sessions for a given tag.**

_(Note that tagging is equivalent to grouping, and is itself more general than hierarchical (i.e. tree-structured) grouping.)_

Questions:

- Where do I store the tag/sessions info?
- How (what data structure, what am I actually storing)?

Answers:

We shouldn't start with those questions: they are about specifics. Of course, we'll need to make choices about them eventually, but we'll produce a brittle codebase if we don't first think of abstractions.

Let's first think of a more general expression of what we're actually trying to do, and find a less specific vocabulary to describe it. We can (maybe even should) reintroduce the domain-specific vocabulary, as a layer on top of the more general mechanism, but we want to give ourselves a chance of knowing what the more abstract problem pattern is.

The transformed, more general (and therefore more reusable) problem: **Same as above but replace "sessions" with "objects".**

Then:

* First look at what makes sense in the domain/interface -- Express this with types and tests

* Then implement the interface, with a backend that corresponds to the current constraints

Our tagging problem comes down to CRUD on `(tag, obj)` pairs. 
We assume a many-to-many situation where the same `obj` can have several `tags` and 
the same `tag` can be used by several `objs`. 
Assuming we don't have to track anything further than a **set** of `(tag, obj)` pairs
(for example, no multiplicity, or date, or author of the pair), we can use a 
`typing.MutableMapping` interface for tag-obj pairs: 
Either `objs[obj] = tags` or `tags[tag] = objs`.
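
As a minimal sketch of the "set of `(tag, obj)` pairs" view, the snippet below derives both mapping forms from a plain Python set. The names `pairs`, `objs_of_tag`, and `tags_of_obj` are chosen for this illustration only; they are not part of the interface developed in the rest of the notebook.

```python
# A plain set of (tag, obj) pairs: no multiplicity, date, or author is tracked
pairs = {('favorites', 'obj1'), ('favorites', 'obj2'), ('todo', 'obj2')}


def objs_of_tag(pairs, tag):
    """All objs paired with the given tag."""
    return {o for t, o in pairs if t == tag}


def tags_of_obj(pairs, obj):
    """All tags paired with the given obj."""
    return {t for t, o in pairs if o == obj}


pairs.add(('todo', 'obj2'))  # adding an existing pair changes nothing
```

Both lookups scan every pair, which is fine for a sketch; the mapping interfaces discussed here are ways of organizing the same information for faster access in one direction.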

An example of a `tags[tag] = objs` interface:

```python

# Example if we use sets of objects
tags['favorites'] = {obj1, obj2}  # create a (tag-)group called "favorites" and put two objects in it
tags['favorites'].add(obj3)  # tag obj3 by 'favorites', i.e. add an object in the "favorites" group
tags['favorites'] |= {obj3}  # (approx) equivalent -- to add multiple objects at the same time
assert tags['favorites'] == {obj1, obj2, obj3}  # get objects for a given group

# Example if we use lists of objects
tags['favorites'] = [obj1, obj2]  # create a group called "favorites" and put two objects in it
tags['favorites'].append(obj3)  # add an object in the "favorites" group
tags['favorites'].extend([obj3])  # (approx) equivalent -- to add multiple objects at the same time
tags['favorites'] += [obj3]  # (approx) equivalent
assert tags['favorites'] == [obj1, obj2, obj3]  # get objects for a given group

# List tags
assert 'favorites' in list(tags)  # I can list tags (and find 'favorites' in the list)
assert 'favorites' in tags  # the tags Mapping is also a container (has a `__contains__`)

```

Note that `objs[obj] = tags` could seem like a strange choice. 
Wouldn't `tags[obj] == set_of_tags_associated_with_obj` read better? 
Indeed it would. But then `list(tags)` would not read very well! 
You would expect that `list(tags)` would give you a list of tags, but in 
the "better" alternative we just proposed, it would actually be a list of objects.

## TDD: Tests that describe the behavior we want

In [1]:
from typing import MutableMapping, Iterable, Any, NewType, Callable, Protocol, Optional

Tag = NewType('Tag', str)
Obj = NewType('Object', Any)  # or just object?


class TaggerProtocol(Protocol):
    def tag_objs(self, tag: Tag, *objs: Obj) -> Any:
        """Tag one or several objs"""

    def tags(self, obj: Optional[Obj] = None) -> Iterable[Tag]:
        """List tags of obj, or all tags if obj is None"""

    def objs(self, tag: Optional[Tag] = None) -> Iterable[Obj]:
        """List objs with tag, or all objs if tag is None"""


def test_tagger(tagger: TaggerProtocol):
    # The following assertion isn't part of the behavior we want. It's just a condition we
    # need in order to conduct our test: namely, that our collection of taggings/objs is empty.

    assert list(tagger.tags()) == []  # make sure test is well setup (tagger is empty)

    tagger.tag_objs('tag_a', 'obj_1', 'obj_2')
    assert sorted(tagger.tags()) == ['tag_a']  # unfiltered tags() method

    tagger.tag_objs('tag_b', 'obj_3')
    assert sorted(tagger.tags()) == ['tag_a', 'tag_b']  
    assert sorted(tagger.objs('tag_a')) == ['obj_1', 'obj_2']  # filtered objs() method

    tagger.tag_objs('tag_c', 'obj_3')
    assert sorted(tagger.tags()) == ['tag_a', 'tag_b', 'tag_c']
    assert sorted(tagger.tags('obj_3')) == ['tag_b', 'tag_c']  # filtered tags() method

    # unfiltered objs() method
    assert sorted(tagger.objs()) == ['obj_1', 'obj_2', 'obj_3']


Now we'll implement two concrete `TaggerProtocol` classes, using a store (a `MutableMapping`) as a back-end, so as to keep the persistence concern separate. 

The idea is: As long as we provide our concrete persister with the right `MutableMapping` facade (with a minimum of specifics/semantics such as what the keys and values are meant to be), we should have a working object.

The two `TaggerProtocol` options will differ on the particulars of the store. 
- In the first, we'll assume the store has objs as keys and tags as values. 
- In the second we'll assume the tags are the keys, and values are sets of objs of that group.

We now have our facade for tagging. We'll go further and facade the persister that our tagger will use with a `typing.MutableMapping`
(i.e. a dict-like key-value interface).
Then, all we'll need to do to implement a concrete tagger is endow this interface with a concrete backend (a database, a file system, etc.).
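
To make the facade idea concrete, here is a minimal sketch of a store that satisfies `MutableMapping` by implementing its five abstract methods. `LoggingStore` and its `log` attribute are hypothetical names invented for this illustration; a real backend would replace the inner dict with database or file-system calls.

```python
from collections.abc import MutableMapping


class LoggingStore(MutableMapping):
    """Hypothetical dict-backed store that records every write and delete."""

    def __init__(self):
        self._data = {}  # stand-in for a real backend (DB, files, ...)
        self.log = []    # records ('set' | 'del', key) operations

    def __getitem__(self, k):
        return self._data[k]

    def __setitem__(self, k, v):
        self.log.append(('set', k))
        self._data[k] = v

    def __delitem__(self, k):
        self.log.append(('del', k))
        del self._data[k]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = LoggingStore()
store['obj_1'] = {'tag_a'}
store['obj_1'] = store['obj_1'] | {'tag_b'}  # read, modify, write back
```

Everything else (`items`, `values`, `get`, `setdefault`, `in`, ...) comes for free from the `MutableMapping` mixin methods, which is why any tagger written against this interface should work with such a store.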

So now we need to choose what the key and what the value should be in our key-value interface. 

We have two obvious choices:

```python
# TagsOfObj: group `tags` under the `obj` they were applied to,
# i.e. each `obj` points to the `tags` that it was tagged with
s[obj] = tags
# ObjsOfTag: Each `tag` points to the `objs` that were tagged by it
s[tag] = objs
```

Insofar as objects and tags are of the same type, both solutions are identical from an implementation point of view. 
Further, once you have one, you can get the other:

```python
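from collections import defaultdict
from typing import Mapping, Set


def get_objs_of_tag(tags_of_obj: Mapping[Obj, Set[Tag]]) -> Mapping[Tag, Set[Obj]]:
    """Invert an obj -> tags mapping into a tag -> objs mapping."""
    objs_of_tag = defaultdict(set)
    for obj, tags in tags_of_obj.items():
        for tag in tags:
            objs_of_tag[tag].add(obj)
    return objs_of_tag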
```

(See more about this in the "bi-directional mapping" appendix below.)
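
A small worked example may help show that the inversion goes both ways. This is a sketch with a hypothetical helper name (`invert_grouping`) and made-up data, written so that the same function handles either direction.

```python
from collections import defaultdict


def invert_grouping(groups):
    """Turn a {key: members} mapping into a {member: keys} mapping."""
    inverse = defaultdict(set)
    for key, members in groups.items():
        for member in members:
            inverse[member].add(key)
    return dict(inverse)


tags_of_obj = {'obj_1': {'tag_a'}, 'obj_2': {'tag_a'}, 'obj_3': {'tag_b', 'tag_c'}}
objs_of_tag = invert_grouping(tags_of_obj)
round_trip = invert_grouping(objs_of_tag)
```

One caveat: a key whose collection is empty (an object with no tags, say) leaves no trace in the inverse, so the round trip only recovers the original when every key has at least one member.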

Practically, though, one of these grouping forms may usually make more sense than the other, given the context. 

Below we'll implement two taggers, each using a `store: typing.MutableMapping` 
as its "persistence" backend: the first assuming `store[obj]` is a set of tags
and the second assuming `store[tag]` is a set of objects. 

We'll then be able to implement our concrete tagger by injecting the `store` dependency.




## Concrete Tagger (option 1): Using a TagsOfObj store

In [31]:
from typing import MutableMapping, Iterable, Any, NewType, Set, Optional
from dataclasses import dataclass

Tag = NewType('Tag', str)
Obj = NewType('Object', Any)  # or just object?
TagsOfObj = NewType('TagsOfObj', MutableMapping[Obj, Set[Tag]])

def flatten_set(set_of_sets):
    return {obj for subset in set_of_sets for obj in subset}


@dataclass
class TagsOfObjDacc:
    store: TagsOfObj

    def tags(self, obj: Optional[Obj] = None) -> Iterable[Tag]:
        if obj is None:
            return flatten_set(self.store.values())
        else:
            return self.store[obj]

    def objs(self, tag: Optional[Tag] = None) -> Iterable[Obj]:
        # TODO: Express this filtering in such a way that will allow us to take advantage of DB specifics
        #  (e.g., passing on the filtering to the DB instead of filtering in python itself)
        if tag is None:
            return self.store.keys()
        else:
            return (obj for obj, tags in self.store.items() if tag in tags)
        
    def tag_objs(self, tag: Tag, *objs: Obj) -> Any:
        for obj in objs:
            _add_tag_to_obj(self.store, obj, tag)
            


class TagsOfObjDacc2(TagsOfObjDacc):
    # Option 2: Depend only on value having add method (like a set does)
    def tag_objs(self, tag: Tag, *objs: Obj) -> Any:
        for obj in objs:
            _add_tag_to_obj_2(self.store, obj, tag)

    
def _add_tag_to_obj(store: TagsOfObj, obj: Obj, tag: Tag):
    # Option 1: Depend on a (immutable) __or__ method that returns the union
    # store.setdefault(obj, set()) | {tag}
    tags = store.get(obj, set()) | {tag}
    store[obj] = tags

def _add_tag_to_obj_2(store: TagsOfObj, obj: Obj, tag: Tag):
    # Option 2: Depend on a (mutable) add method
    store.setdefault(obj, set()).add(tag)

def _add_tag_to_obj_3(store: TagsOfObj, obj: Obj, tag: Tag):
    # Option 3: Depend on:
    # * default value if key not in store
    # * value has an __ior__ method (like a set does)
    store[obj] |= {tag}



# Test it:
store = dict()  # make a store for TagsOfObjDacc to use
tagger = TagsOfObjDacc(store)  # make a tagger (that will use that store to "persist")
test_tagger(tagger)  # test the tagger

store = dict()  # make a store for TagsOfObjDacc2 to use
tagger = TagsOfObjDacc2(store)  # make a tagger (that will use that store to "persist")
test_tagger(tagger)  # test the tagger
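
The third option (`store[obj] |= {tag}`) is not exercised above, because it places a requirement on the store itself: a missing key must yield a default value. The sketch below illustrates that dependency, using a hypothetical function name (`add_tag_with_ior`) with the same body as `_add_tag_to_obj_3`.

```python
from collections import defaultdict


def add_tag_with_ior(store, obj, tag):
    # Same body as `_add_tag_to_obj_3`: needs a default value for missing keys
    store[obj] |= {tag}


plain = {}
try:
    add_tag_with_ior(plain, 'obj_1', 'tag_a')
    raised = False
except KeyError:
    raised = True  # a plain dict has no default for a missing key

with_default = defaultdict(set)
add_tag_with_ior(with_default, 'obj_1', 'tag_a')
add_tag_with_ior(with_default, 'obj_1', 'tag_b')
```

So the choice between the three options is really a choice about which assumptions we are willing to make about the store and its values.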



## Concrete Tagger (option 2): Using an ObjsOfTag store

In [32]:
from typing import MutableMapping, Iterable, Any, NewType, Set, Optional
from dataclasses import dataclass

Tag = NewType('Tag', str)
Obj = NewType('Object', Any)  # or just object?
ObjsOfTag = NewType('ObjsOfTag', MutableMapping[Tag, Set[Obj]])

# TODO: Note that the tags and objs methods are essentially those of TagsOfObjDacc, swapped
#  Let's use that fact!
@dataclass
class ObjsOfTagDacc:
    store: ObjsOfTag
        
    def tag_objs(self, tag: Tag, *objs: Obj) -> Any:
        if tag not in self.store:
            self.store[tag] = set()
        self.store[tag] |= set(objs)

    def tags(self, obj: Optional[Obj] = None) -> Iterable[Tag]:
        if obj is None:
            return set(self.store)
        else:
            return set(tag for tag, objs in self.store.items() if obj in objs)
            
    def objs(self, tag: Optional[Tag] = None) -> Iterable[Obj]:
        if tag is None:
            return flatten_set(self.store.values())
        else:
            return self.store[tag]
    
# Test it:
store = dict()  # make a store for ObjsOfTagDacc to use
tagger = ObjsOfTagDacc(store)  # make a tagger (that will use that store to "persist")
test_tagger(tagger)  # test the tagger

## Actual persisting stores using mongo

In [4]:
# from mongodol.base import MongoClient

### ObjTagPairs (for TagsOfObjDacc)

Options for implementing:

```
s[obj] = tags
```

Option 1: But here we'd need to produce the ID on write

```
--> {'_id': ID, 'tags': tags, 'obj': obj}
```

Option 2: But we need to allow re-writes on `_id`

```
--> {'_id': obj, 'tags': tags}
```
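
To illustrate the second document layout without touching a database, here is a minimal sketch of the encode/decode pair it implies. The function names `doc_of_pair` and `pair_of_doc` are hypothetical, and nothing here calls MongoDB; it only shows the shape of the documents.

```python
def doc_of_pair(obj, tags):
    """Encode an (obj, tags) pair as an Option-2 style document."""
    # tags are stored as a (sorted) list, since JSON has no set type
    return {'_id': obj, 'tags': sorted(tags)}


def pair_of_doc(doc):
    """Decode an Option-2 style document back to (obj, tags)."""
    return doc['_id'], set(doc['tags'])


doc = doc_of_pair('obj_3', {'tag_c', 'tag_b'})
obj, tags = pair_of_doc(doc)
```

Since the object itself is the `_id`, writing new tags for an already tagged object means overwriting an existing document, which is the "re-writes on `_id`" concern mentioned for this option.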


In [33]:
# Option 2

from operator import itemgetter

from dol import wrap_kvs, Pipe
from mongodol.stores import MongoStore  # Note: This is the original MongoStore, not based on the "new" mongodol objects


# To be able to overwrite an existing (obj, tagging) pair (by default MongoStore doesn't allow it)
def delete_if_exists(self, k, v):
    if k in self:
        del self[k]
    return v


trans = Pipe(
    wrap_kvs(
        key_of_id=itemgetter('_id'), 
        id_of_key=lambda x: {'_id': x}, 
        obj_of_data=lambda x: set(x['tags']),
        data_of_obj=lambda x: {'tags': list(x)}, 
        preset=delete_if_exists,
    )
)

@trans
class TagStore(MongoStore):
    """To tag objs"""
    def __init__(self,
        db_name='scrap',
        collection_name='tagged_objects',
        mongo_client_kwargs=None,
    ):
        super().__init__(
            db_name=db_name,
            collection_name=collection_name,
            key_fields=['_id'],
            data_fields=['tags'],
            mongo_client_kwargs=mongo_client_kwargs
        )
    
store = TagStore()

# empty the store
for k in store: 
    del store[k]

tagger = TagsOfObjDacc(store)
test_tagger(tagger)


In [34]:
underlying_store = store.store
list(zip(underlying_store, underlying_store.values())) 

[({'_id': 'obj_1'}, {'tags': ['tag_a']}),
 ({'_id': 'obj_2'}, {'tags': ['tag_a']}),
 ({'_id': 'obj_3'}, {'tags': ['tag_c', 'tag_b']})]

In [35]:
base_store = store.store.store
list(zip(base_store, base_store.values())) 

[({'_id': 'obj_1'}, {'tags': ['tag_a']}),
 ({'_id': 'obj_2'}, {'tags': ['tag_a']}),
 ({'_id': 'obj_3'}, {'tags': ['tag_c', 'tag_b']})]

... to be continued

## Implementation that uses "metadata" collection

Say you already have a mongo collection that contains metadata on your objects. 
That is, a collection of docs, one per object, 
each intended to record information about its object,
which is referenced by a (unique) `ref` field.

We add a `tags` field to contain the tags for the object that is being referenced by `ref`.

A natural container to hold the tags would be a set if we want to ensure there are no duplicates. 
But JSON doesn't have a set type, so in MongoDB we'd have to use a list. 
This is fine, since we can ensure uniqueness at the access level, but we'll choose 
anyway to use a dict ("object" in JSON) instead, whose fields will be the tags, 
and whose values will all be `True`. That is, instead of `["a", "list", "of", "tags"]`, 
we'll use `{"a": True, "list": True, "of": True, "tags": True}`. 

This will not only ensure uniqueness, but also 
enable us to index specific tags if and when we want efficient tag grouping, 
i.e. being able to efficiently respond to the query "all objects tagged [TAG]". 
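
The encoding described above can be sketched in a few lines. All function names here are hypothetical, and `refs_with_tag` is only a pure-Python stand-in for running the query against a collection. The filter uses MongoDB's dot notation for embedded fields, so this sketch assumes tag names contain no dots.

```python
def tags_to_field(tags):
    """Encode a collection of tags as a {tag: True, ...} dict."""
    if isinstance(tags, str):
        tags = {tags}
    return {tag: True for tag in tags}


def field_to_tags(field):
    """Decode a {tag: True, ...} dict back to a set of tags."""
    return set(field)


def filter_for_tag(tag):
    """Query filter selecting docs that carry `tag` (dot notation)."""
    return {f'tags.{tag}': True}


def refs_with_tag(docs, tag):
    """Pure-Python stand-in for running that filter against a list of docs."""
    return [doc['ref'] for doc in docs if doc.get('tags', {}).get(tag) is True]


docs = [
    {'ref': 'a', 'tags': tags_to_field(['apple', 'sauce', 'apple'])},
    {'ref': 'b', 'tags': tags_to_field(['apple', 'pie'])},
    {'ref': 'c'},  # no tags yet
]
```

Note how the duplicate `'apple'` in the first doc collapses on encoding: uniqueness falls out of the dict structure rather than needing a check at the access level.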

In [36]:
test_metadata_docs = [
    {
        "_id": "123",
        "ref": "absolute/reference/to/content",
        "some": "other metadata",  # just to show there can be other stuff
        "tags": {
            # instead of a list, we'll use an object (dict), whose fields are the tag names
            # This is because MongoDB allows us to index fields, and therefore automatically 
            # get the bidirectional mapping from tags to the refs each tag "contains"
            "apple": True,
            "sauce": True,
        }
    },
    {
        "_id": "456",
        "ref": "absolute/reference/to/some/other/content",
        "tags": {
            "apple": True,
            "pie": True,
        }
    },
    {
        "_id": "789",
        "ref": "this/ref/is/necessary",
        "optional": "metadata",
        # and no tags here (but whenever someone/something adds a tag, it will be added here)
    }
]

# Make a collection with only those docs in it

from mongodol import MongoCollectionPersister

def prepare_test_collection(mongo_store, test_metadata_docs=test_metadata_docs):
    assert mongo_store.mgc.database.name == 'test', "needs to be in a test db for safety"

    # Make a "raw" collection persister for the collection underlying the mongo_store
    collection_uri = f"{mongo_store.mgc.database.name}/{mongo_store.mgc.name}"
    s = MongoCollectionPersister(collection_uri, iter_projection=None)

    # delete all docs
    for doc in s:
        del s[doc]

    # verify there's no more docs
    assert list(s) == []

    # write the test docs
    for doc in test_metadata_docs:
        s[doc] = doc

    # verify the docs have been written
    assert list(s) == test_metadata_docs

    # return this mongo collection (in case user wants to use it)
    return s


def test_metadata_based_tagger_mongo_store(
        tag_store_cls, 
        collection_uri='test/tagged_objects'
    ):

    m = tag_store_cls(collection_uri)
    prepare_test_collection(m)

    # Case: The ref exists, and is already tagged
    ref = "absolute/reference/to/content"
    assert m[ref] == {"apple", "sauce"}
    m[ref] |= ["sauce", "pan"]
    m[ref] |= "a string"
    assert m[ref] == {"apple", "sauce", "pan", "a string"}

    # Extra: see what the doc looks like
    assert next(iter(m.store[{"ref": ref}])) == {
        "_id": "123",
        "ref": "absolute/reference/to/content",
        "some": "other metadata",
        "tags": {"a string": True, "apple": True, "pan": True, "sauce": True},
    }

    # Case: The ref exists, but hasn't been tagged yet
    ref = 'this/ref/is/necessary'
    assert m[ref] == set([])
    m[ref] = 'one tag'  # can insert a single tag with assignment (though `m[ref] |= {'one tag'}` works too!)
    assert m[ref] == {'one tag'}
    m[ref] = 'another tag'  # but if you use assignment, you're replacing all existing tags
    assert m[ref] == {'another tag'}
    # NOTE: As always with "inplace" mutating operations, if we wanted to do
    #  `m[ref].add(...)` like we can do with set.add, we'd need `m[ref]` to be an
    #  object that would "write back" to mongo

    # Case: The ref doesn't exist.
    ref = 'no/such/ref'
    m[ref] |= ['a', 'doc', 'is', 'created']
    assert m[ref] == {'a', 'created', 'doc', 'is'}





In [9]:
from operator import itemgetter

from dol import wrap_kvs, Pipe
# from mongodol.stores import MongoStore
from mongodol.base import MongoCollectionPersister, Mapping


# TODO: Should this be the default MongoCollectionPersister? 
# TODO: Should the replace/update be a param?
# TODO: Do any other methods need to be updated for the "update" (vs replace) mode?
class MongoCollectionUpdater(MongoCollectionPersister):
    def __setitem__(self, k, v):
        assert isinstance(k, Mapping) and isinstance(
            v, Mapping
        ), f'k (key) and v (value) must both be mappings (often dictionaries). Were:\n\tk={k}\n\tv={v}'
        return self.mgc.update_one(
            self._merge_with_filt(k),
            {"$set": self._build_doc(k, v)},
            upsert=True,
        )


class UnicityError(ValueError):
    """When something should have been unique and wasn't"""

class ConjunctiveSet(set):
    """A set that is forgiving about what it can be "or"-ed (unioned) with. 
    
    That is, you can "or" it with a string, list, tuple, or any iterable,
    and instead of getting a `TypeError: unsupported operand type(s)` we get what 
    you'd most often expect:

    >>> s = ConjunctiveSet({'this', 'and'})
    >>> assert s | 'that' == {'this', 'and', 'that'}
    >>> assert s | ['the', 'other'] == {'this', 'and', 'the', 'other'}
    
    """
    def __or__(self, other):
        if isinstance(other, str):
            other = {other}
        return super(ConjunctiveSet, self).__or__(set(other))
    
def there_is_more_in_the_cursor(cursor):
    return next(cursor, None) is not None

def first_doc_of_cursor_or_empty_dict(cursor):
    v = next(cursor, None)
    if v is None:
        return dict()
    elif there_is_more_in_the_cursor(cursor):
        raise UnicityError("There should be only one match, but there were several")
    else:
        return v
    

def value_decode(cursor):
    doc = first_doc_of_cursor_or_empty_dict(cursor)
    return ConjunctiveSet(doc.get('tags', set()))

def value_encode(tags):
    if isinstance(tags, str):
        tags = {tags}
    return {"tags": {tag: True for tag in tags}}

trans = Pipe(
    wrap_kvs(
        key_of_id=itemgetter('ref'), 
        id_of_key=lambda x: {'ref': x}, 
        obj_of_data=value_decode,
        data_of_obj=value_encode,
        # preset=delete_if_exists,
    )
)

@trans
class MetadataTagStore(MongoCollectionUpdater):
    """To group items"""
    def __init__(self,
        mgc='test/tagged_objects',
        iter_projection=('ref',), 
        # iter_projection=None, #tuple({'ref': True, '_id': False}.items()),
        **mgc_find_kwargs,
    ):
        # if iter_projection is not None:
        #     iter_projection = dict(iter_projection)
        super().__init__(
            mgc=mgc,
            iter_projection=iter_projection,
            **mgc_find_kwargs
        )


test_metadata_based_tagger_mongo_store(MetadataTagStore)

In [43]:
# Note: NOT WORKING YET!!
from operator import itemgetter

from dol import wrap_kvs, Pipe
# from mongodol.stores import MongoStore
from mongodol.base import MongoCollectionPersister, Mapping

# TODO: Should this be the default MongoCollectionPersister? 
# TODO: Should the replace/update be a param?
# TODO: Do any other methods need to be updated for the "update" (vs replace) mode?
class MongoCollectionUpdater(MongoCollectionPersister):
    def __setitem__(self, k, v):
        assert isinstance(k, Mapping) and isinstance(
            v, Mapping
        ), f'k (key) and v (value) must both be mappings (often dictionaries). Were:\n\tk={k}\n\tv={v}'
        return self.mgc.update_one(
            self._merge_with_filt(k),
            {"$set": self._build_doc(k, v)},
            upsert=True,
        )

class UnicityError(ValueError):
    """When something should have been unique and wasn't"""

class ConjunctiveSet(set):
    """A set that is forgiving about what it can be "or"-ed (unioned) with. 
    
    That is, you can "or" it with a string, list, tuple, or any iterable,
    and instead of getting a `TypeError: unsupported operand type(s)` we get what 
    you'd most often expect:

    >>> s = ConjunctiveSet({'this', 'and'})
    >>> assert s | 'that' == {'this', 'and', 'that'}
    >>> assert s | ['the', 'other'] == {'this', 'and', 'the', 'other'}
    
    """
    def __or__(self, other):
        if isinstance(other, str):
            other = {other}
        return super(ConjunctiveSet, self).__or__(set(other))
    
def there_is_more_in_the_cursor(cursor):
    return next(cursor, None) is not None

def first_obj_of_cursor_or_empty_dict(cursor):
    v = next(cursor, None)
    if v is None:
        return dict()
    elif there_is_more_in_the_cursor(cursor):
        raise UnicityError("There should be only one match, but there were several")
    else:
        return v
    

def value_decode(cursor):
    doc = first_obj_of_cursor_or_empty_dict(cursor)
    return ConjunctiveSet(doc.get('tags', set()))

def value_encode(tags):
    if isinstance(tags, str):
        tags = {tags}
    return {"tags": {tag: True for tag in tags}}

trans = Pipe(
    wrap_kvs(
        key_of_id=itemgetter('ref'), 
        id_of_key=lambda x: {'ref': x}, 
        obj_of_data=value_decode,
        data_of_obj=value_encode,
        # preset=delete_if_exists,
    )
)

@trans
class MetadataTagStore(MongoCollectionUpdater):
    """To tag objs"""
    def __init__(self,
        mgc='test/tagged_objects',
        iter_projection=('ref',), 
        # iter_projection=None, #tuple({'ref': True, '_id': False}.items()),
        **mgc_find_kwargs,
    ):
        # if iter_projection is not None:
        #     iter_projection = dict(iter_projection)
        super().__init__(
            mgc=mgc,
            iter_projection=iter_projection,
            **mgc_find_kwargs
        )


store = TagStore()

for k in store:
    del store[k]

TagsOfObjDacc(store)

test_tagger(TagsOfObjDacc(store))


In [44]:
# But test_tagger tests from an empty store. Let's test from a non-empty store:

test_metadata_based_tagger_mongo_store(MetadataTagStore)

# Appendices

### What objects are we storing?

Above, we used the word "object" (`obj1, obj2,...`) to denote what we're grouping or tagging, 
but note that in practice there'll be two cases that will need different implementations 
(and possibly interfaces): The object is a literal, or a reference. 

If objects are "complex", we'll want to store references (to the actual object's 
"content", which can then be resolved from the references). That is, when we say:

```python
assert tags['favorites'] == [obj1, obj2, obj3]
```

What we really mean, at the base, is:

```python
assert ref_tags['favorites'] == [obj1_ref, obj2_ref, obj3_ref]
```

This `ref_tags` mapping can then be wrapped/enhanced, along with the `ref->obj` 
logic, to a mapping `obj_tags` that gives us access to "resolved objects":

```python
assert obj_tags['favorites'] == [obj1, obj2, obj3]
```
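
One way the wrapping could look is sketched below, assuming the `ref->obj` logic is available as a plain function. `ResolvedTags` and the `content` dict are hypothetical names used only for this illustration.

```python
from collections.abc import Mapping


class ResolvedTags(Mapping):
    """Hypothetical read-only view: tag -> list of resolved objects."""

    def __init__(self, ref_tags, resolve):
        self._ref_tags = ref_tags  # tag -> list of references
        self._resolve = resolve    # function: reference -> object

    def __getitem__(self, tag):
        return [self._resolve(ref) for ref in self._ref_tags[tag]]

    def __iter__(self):
        return iter(self._ref_tags)

    def __len__(self):
        return len(self._ref_tags)


content = {'ref/1': 'first object', 'ref/2': 'second object'}
ref_tags = {'favorites': ['ref/1', 'ref/2']}
obj_tags = ResolvedTags(ref_tags, resolve=content.__getitem__)
```

Because the view resolves references on access, later changes to `ref_tags` show through without rebuilding anything.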

But suppose we're working with "label (tag/annotation) groups", with something like:

```python
tags['pizza/ingredients/normal'] += ['cheese', 'pepperoni']
tags['pizza/ingredients/abnormal'] += ['nutella', 'french fries']
```

It seems silly to store the string `'cheese'` somewhere, and then reference it, 
though from a design point of view it may be cleaner. 
There are design patterns for this (todo: find and reference).


In [11]:
{'apple', 'sauce'}.issubset({'apple': 1, 'sauce': 2})

True

### Bidirectional mappings -- How do groups and tags work together?

Sometimes the perspective that makes sense is the "tags" perspective, 
i.e. being able to list existing tags and get a collection of objects given a tag:

```python
assert tags['favorites'] == [obj1, obj2, obj3]  # get objects for a given group
```

Sometimes the "objs" one makes more sense, 
i.e. being able to list objects and get a collection of tags given an object:

```python
assert {obj1, obj2, obj3}.issubset(objs)
assert 'favorites' in objs[obj1]
assert 'favorites' in objs[obj2]
assert 'favorites' in objs[obj3]
```

Sometimes we need to have both (and, the hard part, keep them in sync):

```python
objs = tags_to_objs(tags)
tags = objs_to_tags(objs)
```

The `tags_to_objs` and `objs_to_tags` functions above are not meant to return 
"static" `objs` and `tags` objects, but rather objects from which we can get 
a "dynamic" view. For example:

```python
tags['favorites'] = [obj1]  # let's start with just one object in favorites

assert tags['favorites'] == [obj1] 
# get an objs view of tags
objs = tags_to_objs(tags)
assert 'favorites' in objs[obj1]
assert 'favorites' not in objs[obj2]
assert 'favorites' not in objs[obj3]  # obj3 is not tagged 'favorites' (yet)

# but (dynamic demo) if we add an object to tags
tags['favorites'].append(obj2)
assert tags['favorites'] == [obj1, obj2]  # we see it in tags
assert 'favorites' in objs[obj2]  # but also in objs (without calling tags_to_objs again)
objs[obj3].append('favorites')  # and also, if we tag obj3... 
assert tags['favorites'] == [obj1, obj2, obj3]  # ... we'll see the effect of this in tags
```

Might want to have a look at [relativity](https://github.com/kurtbrose/relativity) 
for some ideas regarding this two-way mapping. The README is very satisfying for 
a mathematician; it's very much an "axiomatic" approach to the problem.

Note that some DBs already have support for this kind of "(maintained) two-way mapping", 
in the form of "indexing". 
The `tags_to_objs` and `objs_to_tags` functions would have to take that into account.
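
As a rough sketch of what a "dynamic" view could be, the class below computes the inverse mapping on every access instead of storing it. `InverseView` is a hypothetical name, the view is read-only (writing through it, as in the demo above, would need more machinery), and each lookup scans the whole forward mapping, which is exactly the cost a DB index would avoid.

```python
from collections.abc import Mapping


class InverseView(Mapping):
    """Hypothetical read-only, always up-to-date inverse of a key -> members mapping."""

    def __init__(self, forward):
        self._forward = forward

    def __getitem__(self, member):
        keys = {k for k, members in self._forward.items() if member in members}
        if not keys:
            raise KeyError(member)
        return keys

    def __iter__(self):
        seen = set()
        for members in self._forward.values():
            for member in members:
                if member not in seen:
                    seen.add(member)
                    yield member

    def __len__(self):
        return sum(1 for _ in self)


tags = {'favorites': ['obj1']}
objs = InverseView(tags)
before = 'obj2' in objs           # obj2 is not tagged yet
tags['favorites'].append('obj2')  # tag it through the forward mapping
after = objs['obj2']              # the view reflects the change
```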

### Questions

* What resources might help us navigate this problem intelligently? Things like keywords, design patterns, and (light weight) third party tools might help us...
* What's the abstract or base class for `tags` and `objs` instances?
* How do we make these open-closed (e.g. via a "plug-in architecture"), so that back-end implementation details can be specified (e.g. using what MongoDB might have to offer to help out)? Namely, we'll want to take advantage of DB particulars to enable efficient `tags_to_objs` and `objs_to_tags`. 
* What supporting "helpers" should we implement to make the common group and tagging operations easy to perform? For example, we used the path `pizza/ingredients/normal` in one of our examples, and like all "path systems", we have a natural (nested) grouping, via folders and subfolders. Do we implement the `pizza` group and `ingredients` group as actual groups here, or do we just enable a "groups" view to the path strings?
* How do we enable "groups of groups" recursion? That is, we'll want to make groups of groups, and groups of groups of groups, and have the ability to work at the level we need to (for example, we might want to "flatten" a "groups of groups" view to a "groups of objects" view).


# Historical sections

The interface for grouping/tagging was significantly changed, but we keep the original proposal below.

## A little digression on scaffolding

In [None]:
from typing import MutableMapping, Iterable, Any, NewType, Callable

Tag = NewType('Tag', str)
Obj = NewType('Obj', Any)


What functionality do we want around groups and their objs?

Let's express this through type annotations of some functions (which we'll encapsulate in a `Tagger` class).

In [None]:
from typing import Optional
class Tagger:
    add_objs_to_tag: Callable[[Iterable[Obj], Tag], Any]
    tags: Callable[[], Iterable[Tag]]
    objs: Callable[[],Iterable[Obj]]
    

Note that this should now be sufficient to generate a scaffold of what we need. 

We'll do it in two ways: By (dynamically) creating a `typing.Protocol` describing a concrete `Tagger` object we could implement in the future, and by generating (the code string for) a concrete (but empty) such class.
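
To give a feel for how such generation can work, here is a standard-library-only sketch that builds the code string of an empty class from `Callable` annotations. It is not how `meshed` does it: `scaffold_code` is a hypothetical function, the `Tagger` below uses `str` and `Any` in place of the `Tag` and `Obj` types, and the generated parameters get placeholder names (`arg_0`, `arg_1`, ...) with no annotations.

```python
from typing import Any, Callable, Iterable, get_args


class Tagger:
    add_objs_to_tag: Callable[[Iterable[Any], str], Any]
    tags: Callable[[], Iterable[str]]
    objs: Callable[[], Iterable[Any]]


def scaffold_code(annotations, class_name='GeneratedClass'):
    """Return the code string of an empty class with one method per annotation."""
    lines = [f'class {class_name}:']
    for name, callable_type in annotations.items():
        arg_types, _return_type = get_args(callable_type)  # ([arg types], return type)
        params = ', '.join(['self'] + [f'arg_{i}' for i in range(len(arg_types))])
        lines += [f'    def {name}({params}):', '        pass', '']
    return '\n'.join(lines)


code = scaffold_code(Tagger.__annotations__)
namespace = {}
exec(code, namespace)  # check that the generated code is valid Python
GeneratedClass = namespace['GeneratedClass']
```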

In [None]:
import i2

from meshed.scrap.annotations_to_meshes import (
    func_types_to_scaffold, 
    func_types_to_protocol
)

# Tagger.__annotations__ is a {name: func_annotation, ...} dict
# We can make a protocol from that
TagsProtocol = func_types_to_protocol(Tagger.__annotations__)

# See that TagsProtocol has methods for each item of the 
# Tagger.__annotations__ dict. Each method bears a signature compatible with 
# the annotations.
i2.Sig(TagsProtocol.add_objs_to_tag)

<Sig (self, iterable: Iterable[__main__.Object], tag: __main__.Tag) -> Any>

In [None]:
# We can also generate the code string for a scaffold class
print(func_types_to_scaffold(Tagger.__annotations__))


class GeneratedClass:
    def add_objs_to_tag(self, iterable: Iterable, tag: Tag) -> Any:
    	pass

    def tags(self) -> Iterable:
    	pass

    def objs(self) -> Iterable:
    	pass



## TDD: Tests that describe the behavior we want

In [None]:
from typing import MutableMapping, Iterable, Any, NewType, Callable, Protocol

taging = NewType('taging', str)
Obj = NewType('Obj', Any)


class tagingsProtocol(Protocol):
    def add_objs_to_taging(self, iterable: Iterable, taging: taging) -> Any:
        """Add one or several objs to a taging"""

    def list_tagings(self) -> Iterable:
        """List taging names"""

    def objs_for_taging(self, taging: taging) -> Iterable:
        """List the objs in a taging"""
    
    
def test_tagings(tagings: tagingsProtocol):
    # the following assertion isn't part of the behavior we want -- just a condition we'll 
    # need to be able to conduct our test: Namely, that our collection of tagings/objs is empty.
    assert list(tagings.list_tagings()) == []  # make sure test is well setup
    
    tagings.add_objs_to_taging('taging_a', 'obj_1', 'obj_2')
    assert sorted(tagings.list_tagings()) == ['taging_a']
    
    tagings.add_objs_to_taging('taging_b', 'obj_3')
    assert sorted(tagings.list_tagings()) == ['taging_a', 'taging_b']
    assert sorted(tagings.objs_for_taging('taging_a')) == ['obj_1', 'obj_2']
    

Now we'll implement two concrete `GroupsDacc` classes, using a store (a `MutableMapping`) as a back-end, so as to keep the persistence concern separate. 

The idea is: As long as we provide our concrete persister with the right `MutableMapping` facade (with a minimum of specifics/semantics such as what the keys and values are meant to be), we should have a working object.

The two `GroupsDacc` options will differ on the particulars of the store. 
- In the first, we'll assume the store has objs as keys and groups as values. 
- In the second we'll assume the groups are the keys, and values are sets of objs of that group.

## Concrete GroupsDacc (option 1): ItemGroupDacc

In [None]:
from typing import MutableMapping, Iterable, Any, NewType
from dataclasses import dataclass

taging = NewType('taging', str)
Obj = NewType('Obj', Any)
ObjtagingPairs = NewType('ObjtagingPairs', MutableMapping[Obj, taging])

@dataclass
class ObjtagingDacc:
    store: ObjtagingPairs
        
    def add_objs_to_taging(self, taging: taging, *objs: Obj) -> Any:
        for obj in objs:
            self.store[obj] = taging

    def list_tagings(self) -> Iterable[taging]:
        return set(self.store.values())

    def objs_for_taging(self, taging: taging) -> Iterable[Obj]:
        # TODO: Express this filtering in such a way that will allow us to take advantage of DB specifics
        #  (e.g., passing on the filtering to the DB instead of filtering in python itself)
        return (obj for obj, taging_ in self.store.items() if taging_ == taging)
    

In [None]:
store = dict()
test_tagings(ObjtagingDacc(store))  # a dict works!

## Concrete GroupsDacc (option 2): GroupSetsDacc

In [None]:
from typing import MutableMapping, Iterable, Any, NewType, Set
from dataclasses import dataclass

taging = NewType('taging', str)
Obj = NewType('Obj', Any)
tagingSets = NewType('tagingSets', MutableMapping[taging, Set[Obj]])

@dataclass
class tagingSetsDacc:
    store: tagingSets
        
    def add_objs_to_taging(self, taging: taging, *objs: Obj) -> Any:
        # relies on the store giving a missing taging an empty default (e.g. a defaultdict(set))
        self.store[taging] |= set(objs)

    def list_tagings(self) -> Iterable[taging]:
        return set(self.store)
            
    def objs_for_taging(self, taging: taging) -> Iterable[Obj]:
        return self.store[taging]
    

In [None]:
from collections import defaultdict

store = defaultdict(set)
test_tagings(tagingSetsDacc(store))  # a defaultdict(set) works as a store!

## Actual persisting stores using mongo

In [None]:
# from mongodol.base import MongoClient

### ItemGroupPairs (for ItemGroupDacc)

Options for implementing:

```
s[obj] = group
```

Option 1: But here we'd need to produce the ID on write

```
--> {'_id': ID, 'group': group, 'obj': obj}
```

Option 2: But we need to allow re-writes on `_id`

```
--> {'_id': obj, 'group': group}
```
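
As a rough sketch of the `{'_id': obj, 'group': group}` layout, the snippet below uses a plain dict keyed by `_id` as a stand-in for a mongo collection. The helper names `doc_of_pair`, `pair_of_doc` and `write` are hypothetical and not part of `mongodol`. It only illustrates why a re-write on `_id` has to be allowed: assigning a new group to an existing obj must replace its doc.

```python
def doc_of_pair(obj, group):
    """Encode one (obj, group) pair as a mongo-style doc, with obj as the _id."""
    return {'_id': obj, 'group': group}


def pair_of_doc(doc):
    """Decode a doc back into its (obj, group) pair."""
    return doc['_id'], doc['group']


# Assumption of this sketch: a dict keyed by _id stands in for the collection.
collection = {}


def write(obj, group):
    doc = doc_of_pair(obj, group)
    collection.pop(doc['_id'], None)  # delete if it exists, to allow the re-write
    collection[doc['_id']] = doc


write('obj_1', 'group_a')
write('obj_1', 'group_b')  # re-assigning obj_1 replaces its previous doc
```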


In [None]:
from operator import itemgetter

from dol import wrap_kvs, Pipe
from mongodol.stores import MongoStore


# To be able to overwrite an existing (obj, taging) pair (by default MongoStore doesn't allow it)
def delete_if_exists(self, k, v):
    if k in self:
        del self[k]
    return v


trans = Pipe(
    wrap_kvs(
        key_of_id=itemgetter('_id'), 
        id_of_key=lambda x: {'_id': x}, 
        obj_of_data=itemgetter('taging'),
        data_of_obj=lambda x: {'taging': x}, 
        preset=delete_if_exists,
    )
)

@trans
class tagingStore(MongoStore):
    """Store whose keys are objs and whose values are the taging of each obj"""
    def __init__(self,
        db_name='scrap',
        collection_name='taging_objs',
        mongo_client_kwargs=None,
    ):
        super().__init__(
            db_name=db_name,
            collection_name=collection_name,
            key_fields=['_id'],
            data_fields=['taging'],
            mongo_client_kwargs=mongo_client_kwargs
        )
    
m = tagingStore()
list(m)

[]

In [None]:
store = tagingStore()

# empty the store (iterate over a snapshot of the keys, since we delete while looping)
for k in list(store): 
    del store[k]

test_tagings(ObjtagingDacc(store))

### GroupSets (for GroupSetsDacc)

Options for implementing:

```
s[group] |= objs
```

Option 1: But we need to allow re-writes on `_id`

```
--> {'_id': group, 'objs': objs}
```

Option 2: But here we'd need to produce the ID on write

```
--> {'_id': ID, 'group': group, 'obj': obj}  # (group, objs) -> (group, obj_1), (group, obj_2), ...
```
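
The one-doc-per-pair layout (the `(group, objs) -> (group, obj_1), (group, obj_2), ...` comment above) could be sketched as follows, in plain Python and without any database. The function names `docs_of_group` and `groups_of_docs` are hypothetical, and the `_id` field is left out because in this layout it would be produced on write.

```python
from collections import defaultdict


def docs_of_group(group, objs):
    """Flatten one (group, objs) item into one doc per (group, obj) pair."""
    return [{'group': group, 'obj': obj} for obj in sorted(objs)]


def groups_of_docs(docs):
    """Regroup per-pair docs into a {group: set_of_objs} mapping."""
    grouped = defaultdict(set)
    for doc in docs:
        grouped[doc['group']].add(doc['obj'])
    return dict(grouped)


docs = docs_of_group('group_a', {'obj_1', 'obj_2'}) + docs_of_group('group_b', {'obj_3'})
```

Reading a group back then amounts to collecting all docs that share the same `group` value, which is what `groups_of_docs` does for the whole list.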


In [None]:
from operator import itemgetter

from dol import wrap_kvs, Pipe
from mongodol.stores import MongoStore


# To be able to overwrite an existing (obj, taging) pair (by default MongoStore doesn't allow it)
def delete_if_exists(self, k, v):
    if k in self:
        del self[k]
    return v


trans = Pipe(
    wrap_kvs(
        key_of_id=itemgetter('_id'), 
        id_of_key=lambda x: {'_id': x}, 
        obj_of_data=Pipe(itemgetter('objs'), set),
        data_of_obj=lambda x: {'objs': list(x) if not isinstance(x, str) else [x]}, 
#         preset=delete_if_exists,
    )
)

@trans
class ObjsStore(MongoStore):
    """Store whose keys are tagings and whose values are the set of objs of each taging"""
    def __init__(self,
        db_name='scrap',
        collection_name='objs_taging',
        mongo_client_kwargs=None,
    ):
        super().__init__(
            db_name=db_name,
            collection_name=collection_name,
            key_fields=['_id'],
            data_fields=['objs'],
            mongo_client_kwargs=mongo_client_kwargs
        )
        
    def __missing__(self, k):
        return {'objs': []}
    

In [None]:
store = ObjsStore()

# empty the store (iterate over a snapshot of the keys, since we delete while looping)
for k in list(store): 
    del store[k]

test_tagings(tagingSetsDacc(store))

## Implementation that uses "metadata" collection

Say you already have a mongo collection that contains metadata on your items. 
That is, a collection of docs, one per item, each intended to record information about that item. 
The groups an item belongs to can then be just one more piece of information recorded in its doc. 

In [None]:
metadata_docs = [
    {
        "_id": "123",
        "ref": "absolute/reference/to/content",
        "some": "other metadata",  # just to show there can be other stuff
        "tagings": {
            # instead of a list, we'll use an object (dict) whose fields are the taging names.
            # This is because mongoDB allows us to index fields, and therefore to automatically 
            # get the bidirectional mapping between tagings and the refs each taging "contains"
            "taging1": True,
            "taging2": True,
        }
    },
    {
        "_id": "456",
        "ref": "absolute/reference/to/some/other/content",
        "tagings": {
            "taging1": True,
            "taging3": True,
        }
    },
    {
        "_id": "789",
        "ref": "this/ref/is/necessary",
        "optional": "metadata",
        # and no tagings here (but whenever someone/something adds a taging, it will be added here)
    }
]
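
A minimal in-memory sketch of how the three operations could read and write docs shaped like the ones above. `MetadataTagingDacc` is a hypothetical name, a plain list of dicts stands in for the mongo collection, and objs are assumed to be identified by their `_id`. Only docs that already exist can be tagged in this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MetadataTagingDacc:
    """Hypothetical sketch: tagings live in the 'tagings' field of each doc."""
    docs: List[dict]  # metadata docs, one per item

    def add_objs_to_taging(self, taging: str, *obj_ids: str) -> None:
        for doc in self.docs:
            if doc['_id'] in obj_ids:
                # create the 'tagings' field the first time a doc is tagged
                doc.setdefault('tagings', {})[taging] = True

    def list_tagings(self) -> Iterable[str]:
        return {t for doc in self.docs for t in doc.get('tagings', {})}

    def objs_for_taging(self, taging: str) -> Iterable[str]:
        return [d['_id'] for d in self.docs if d.get('tagings', {}).get(taging)]


docs = [
    {'_id': '123', 'tagings': {'taging1': True, 'taging2': True}},
    {'_id': '456', 'tagings': {'taging1': True, 'taging3': True}},
    {'_id': '789'},
]
dacc = MetadataTagingDacc(docs)
dacc.add_objs_to_taging('taging3', '789')
```

In an actual mongo back-end, the filtering done here in Python would presumably be handed to the database as a query on the `tagings` fields.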