# `datalink`

`datalink` is a Python module that lets you interact with entries of SQL data as if they were dictionaries.

Loading, saving, unique identification, and database management all take place behind the scenes, so the user doesn't need to worry about databases at all.

In [1]:
import logging
import warnings
import datalink
import sys
import random
warnings.filterwarnings('ignore')

## Logging
`datalink` reports its activity through Python's standard `logging` module.

In [2]:
logging.basicConfig(format='%(levelname)s | %(message)s',
                    level=logging.DEBUG,
                    stream=sys.stdout)
log = logging.getLogger(__name__)
datalink.test_output()

INFO | Test logging output from datalink.


`datalink` logging can be disabled with:
```python
logging.getLogger('datalink').propagate = False
```
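The line above relies on the standard `logging` hierarchy: records from a named logger travel up to the root logger's handlers (the ones `basicConfig` installs) unless `propagate` is switched off. The sketch below demonstrates this with a small list-collecting handler. It assumes, as the line above implies, that `datalink` logs under the logger name `'datalink'` or its children; `ListHandler` is a hypothetical helper written for the demonstration.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects log messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


# Attach a collecting handler to the root logger, as basicConfig would attach one.
root = logging.getLogger()
root.setLevel(logging.DEBUG)
collector = ListHandler()
root.addHandler(collector)

# Assumption: datalink emits its records through a logger named 'datalink'.
lib_log = logging.getLogger('datalink')
lib_log.info('reaches the root handler')

lib_log.propagate = False
lib_log.info('stopped at the datalink logger')  # no handler sees this INFO record

print(collector.messages)  # ['reaches the root handler']
```

Child loggers such as `'datalink.something'` would also be cut off, because propagation is checked at each logger on the way up to the root.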

# Creating user datalink classes

The `datalink` module's `factory` function is used to create new classes that the user can utilise,
in a similar fashion to `namedtuple`-based classes from the `collections` module.

Some required arguments are:
- `name`: the new class's name
- `db_path`: the plain file path to the database (e.g. `/tmp/my_ledger.db`), without any dialect-specific protocol prefix.
- `table_name`: the name of the table to be saved to in the database.

None of the above database structures need to exist already; `datalink` will create them if necessary.

The last required argument is `fields`.
This must be a mapping in which each key is a database column name (and the name of the corresponding data-store attribute), and each value is that field's default.

e.g.
```python
my_album_fields = {'title': '', 'artist': '', 'tracks': []}
```

The fields defined here are collectively referred to as the "data-store".

If a value is mutable, e.g. a `list` or `dict`, its default must be an instance of the correct Python type, e.g. `[]` for a list.

Immutable fields can default to, and later be changed to, any value, including `None`.
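One plausible reason the default's type matters is that it tells the library how to store each column. The sketch below shows how a `fields` mapping *could* be turned into a table with the standard `sqlite3` module, storing mutable defaults (`list`, `dict`) as JSON text. This is not `datalink`'s implementation: `column_type` and `create_table` are hypothetical helpers, and the SQLite/JSON storage scheme is an assumption. Note that `sqlite3.connect` takes a plain file path, like `db_path` above, and creates the file if it does not exist.

```python
import json
import sqlite3


def column_type(default):
    """Hypothetical: pick an SQLite column type from a field's default value."""
    if isinstance(default, (list, dict)):
        return 'TEXT'   # mutable containers are serialised to JSON text
    if isinstance(default, float):
        return 'REAL'
    if isinstance(default, int):
        return 'INTEGER'
    if isinstance(default, str):
        return 'TEXT'
    return ''           # e.g. None: no declared type, so any value can be stored


def create_table(conn, table_name, fields):
    """Hypothetical: create the table if it is missing (names are trusted input)."""
    columns = ', '.join(f'{name} {column_type(default)}'.strip()
                        for name, default in fields.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS {table_name} '
                 f'(id TEXT PRIMARY KEY, {columns})')


fields = {'client_name': None, 'shipping_address': None, 'items': [], 'cost': 0.0}
conn = sqlite3.connect(':memory:')  # a file path here would be created on demand
create_table(conn, 'orders', fields)

# Because 'items' defaults to a list, it round-trips through JSON.
conn.execute('INSERT INTO orders VALUES (?, ?, ?, ?, ?)',
             ('abc', 'Alice', None, json.dumps(['bracket']), 11.5))
row = conn.execute('SELECT items, cost FROM orders WHERE id = ?', ('abc',)).fetchone()
print(json.loads(row[0]), row[1])  # ['bracket'] 11.5
```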

In [3]:
my_order_fields = {
    'client_name': None,
    'shipping_address': None, 
    'items': [],
    'cost': 0.0
    }

Order = datalink.factory(name='Order', db_path='/tmp/my_ledger.db',
                         table_name='orders',
                         fields=my_order_fields)

### Creating and manipulating `Order` instances

Now let's make an instance of the `Order` class with the default settings.

In [4]:
o = Order()

INFO | sqlite db created at path: /tmp/my_ledger.db
DEBUG | Creating new database entry with id 0fe72f31-4e59-4bc4-9691-d490b6f2277e.


As this is the first `Order` instance we have created, the associated database is created.
The new order is also saved to the database immediately upon creation.
The order is automatically assigned a UUID as its identifier, accessible through the read-only `id` property.
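The three behaviours just described (a generated UUID, a row written at construction time, and a getter-only `id`) can be sketched in plain Python with the standard `uuid` and `sqlite3` modules. The `Entry` class and its `orders` table are hypothetical stand-ins, not `datalink` internals; the optional `entry_id` argument mirrors the user-specified ids covered later.

```python
import sqlite3
import uuid

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE orders (id TEXT PRIMARY KEY, client_name)')


class Entry:
    """Hypothetical record that writes its own row as soon as it is created."""

    def __init__(self, entry_id=None):
        # Use the caller's id if given, otherwise a random UUID4 string.
        self._id = entry_id if entry_id is not None else str(uuid.uuid4())
        conn.execute('INSERT INTO orders (id) VALUES (?)', (self._id,))
        conn.commit()

    @property
    def id(self):
        """Getter only: with no setter, assigning to .id raises AttributeError."""
        return self._id


e = Entry()
custom = Entry('mynewid')  # a user-supplied id replaces the generated UUID
print(e.id, custom.id)
print(conn.execute('SELECT COUNT(*) FROM orders').fetchone()[0])  # 2
```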

Now let's alter some of the data properties of the `Order`.

In [5]:
print(o.client_name, o.shipping_address, o.cost, o.items)
o.client_name = 'Alice Smith'
o.shipping_address = '123 Leaf Avenue, Sometown'
o.items.append('bracket')
o.cost += 11.50

None None 0.0 []
DEBUG | Updating existing database entry for id 0fe72f31-4e59-4bc4-9691-d490b6f2277e.
DEBUG | Updating existing database entry for id 0fe72f31-4e59-4bc4-9691-d490b6f2277e.
DEBUG | Updating existing database entry for id 0fe72f31-4e59-4bc4-9691-d490b6f2277e.
DEBUG | Updating existing database entry for id 0fe72f31-4e59-4bc4-9691-d490b6f2277e.


As soon as an assignment is made to one of the `Order` instance's data-store attributes, or a mutable attribute such as `items` is modified in place, the corresponding database entry is updated.
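A write-through attribute like this can be approximated in plain Python by overriding `__setattr__`, so that every assignment to a data-store field issues an SQL `UPDATE`. The sketch below illustrates that general technique with `sqlite3`; it is an assumption about how such a link could work, not `datalink`'s code, and `WriteThrough` and `FIELDS` are hypothetical names. It only catches assignment: in-place changes such as `o.items.append(...)`, which `datalink` does persist according to the log above, would additionally need a container that reports its mutations.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE orders (id TEXT PRIMARY KEY, client_name, cost)')
conn.execute("INSERT INTO orders VALUES ('o1', NULL, 0.0)")

FIELDS = ('client_name', 'cost')  # hypothetical data-store fields, in column order


class WriteThrough:
    """Hypothetical record that persists every field assignment immediately."""

    def __init__(self, entry_id):
        self.id = entry_id  # 'id' is not a field, so no UPDATE is issued
        row = conn.execute('SELECT client_name, cost FROM orders WHERE id = ?',
                           (entry_id,)).fetchone()
        for name, value in zip(FIELDS, row):
            object.__setattr__(self, name, value)  # load without writing back

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)  # keep the in-memory copy
        if name in FIELDS:
            # One UPDATE per assignment; the column name comes from the FIELDS whitelist.
            conn.execute(f'UPDATE orders SET {name} = ? WHERE id = ?', (value, self.id))
            conn.commit()


o = WriteThrough('o1')
o.client_name = 'Alice Smith'
o.cost += 11.50
print(conn.execute("SELECT client_name, cost FROM orders WHERE id = 'o1'").fetchone())
# ('Alice Smith', 11.5)
```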

The `update` method accepts keyword arguments to change several data-store attributes at once, with only a single write to the database.

In [6]:
o.update(shipping_address='123 Leaf Way, Sometown', client_name='Alice C Smith')

DEBUG | Updating existing database entry for id 0fe72f31-4e59-4bc4-9691-d490b6f2277e.
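With SQL, batching several fields into one write means building a single `UPDATE ... SET a = ?, b = ?` statement from the keyword arguments and executing it once. The `update` function below is a hypothetical illustration of that idea using `sqlite3`, not `datalink`'s `update` method. It checks field names against a whitelist because column names cannot be passed as query parameters.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE orders (id TEXT PRIMARY KEY, client_name, shipping_address)')
conn.execute("INSERT INTO orders VALUES ('o1', 'Alice Smith', '123 Leaf Avenue, Sometown')")

FIELDS = {'client_name', 'shipping_address'}  # hypothetical data-store fields


def update(entry_id, **changes):
    """Apply every keyword change to one row with a single UPDATE statement."""
    if not changes:
        return
    unknown = set(changes) - FIELDS
    if unknown:
        # Column names cannot be bound as parameters, so whitelist them instead.
        raise ValueError(f'not data-store fields: {sorted(unknown)}')
    assignments = ', '.join(f'{name} = ?' for name in changes)
    conn.execute(f'UPDATE orders SET {assignments} WHERE id = ?',
                 (*changes.values(), entry_id))
    conn.commit()


update('o1', shipping_address='123 Leaf Way, Sometown', client_name='Alice C Smith')
print(conn.execute("SELECT client_name, shipping_address FROM orders WHERE id = 'o1'").fetchone())
# ('Alice C Smith', '123 Leaf Way, Sometown')
```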


A dictionary containing all data-store variables, plus the `id`, is exposed through the read-only `data` property.

In [7]:
o.data

{'client_name': 'Alice C Smith',
 'shipping_address': '123 Leaf Way, Sometown',
 'items': ['bracket'],
 'cost': 11.5,
 'id': '0fe72f31-4e59-4bc4-9691-d490b6f2277e'}

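### Keyword data-store instantiation
Data-store attributes can be set by keyword when an `Order` is instantiated. Note that in the example below `address` is not one of the data-store fields, so it does not appear in `o2.data`, and `shipping_address` keeps its default of `None`.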

In [8]:
o2 = Order(client_name='Beatrice Smith', address='456 Rock Drive, Someothertown',
           items=['paint_black', 'small_brush'], cost=22.40)
o2.data

DEBUG | Creating new database entry with id 9c54f9cb-6874-4838-ba37-36b9b77eafe2.


{'client_name': 'Beatrice Smith',
 'shipping_address': None,
 'items': ['paint_black', 'small_brush'],
 'cost': 22.4,
 'id': '9c54f9cb-6874-4838-ba37-36b9b77eafe2'}

### Loading persisted orders

We can pass the identifier of the second order (`o2`) as a positional argument to load it from the database.

In [9]:
print(o2.id)
o3 = Order(o2.id)

9c54f9cb-6874-4838-ba37-36b9b77eafe2
DEBUG | Loading data corresponding to ID: 9c54f9cb-6874-4838-ba37-36b9b77eafe2


In [10]:
o3.data

{'client_name': 'Beatrice Smith',
 'shipping_address': None,
 'items': ['paint_black', 'small_brush'],
 'cost': 22.4,
 'id': '9c54f9cb-6874-4838-ba37-36b9b77eafe2'}
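Loading by id amounts to a primary-key lookup. The hypothetical `load` helper below sketches this with `sqlite3`, assuming that mutable fields were stored as JSON text. It returns `None` for an unknown id, so a caller could fall back to creating a new entry instead.

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows can be indexed by column name
conn.execute('CREATE TABLE orders (id TEXT PRIMARY KEY, client_name, items TEXT, cost REAL)')
conn.execute('INSERT INTO orders VALUES (?, ?, ?, ?)',
             ('9c54', 'Beatrice Smith', json.dumps(['paint_black', 'small_brush']), 22.4))

JSON_FIELDS = {'items'}  # assumption: mutable fields are stored as JSON text


def load(entry_id):
    """Return the stored fields for entry_id as a dict, or None if absent."""
    row = conn.execute('SELECT * FROM orders WHERE id = ?', (entry_id,)).fetchone()
    if row is None:
        return None
    return {key: json.loads(row[key]) if key in JSON_FIELDS else row[key]
            for key in row.keys()}


print(load('9c54'))
# {'id': '9c54', 'client_name': 'Beatrice Smith', 'items': ['paint_black', 'small_brush'], 'cost': 22.4}
print(load('missing'))  # None
```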

If the order id is given as a positional argument, keyword arguments can also be supplied to update data-store attributes in the same call.

In [11]:
Order(o3.id, cost=30.0)
o3.data

DEBUG | Loading data corresponding to ID: 9c54f9cb-6874-4838-ba37-36b9b77eafe2
DEBUG | Updating existing database entry for id 9c54f9cb-6874-4838-ba37-36b9b77eafe2.


<traits.has_traits.Order at 0x7f056c573a10>

{'client_name': 'Beatrice Smith',
 'shipping_address': None,
 'items': ['paint_black', 'small_brush'],
 'cost': 30.0,
 'id': '9c54f9cb-6874-4838-ba37-36b9b77eafe2'}

### Bidirectional updates

By default, links are bidirectional: not only does a change to an individual `Order` push the result to the database, but any change made to a database entry is also propagated back to every instance linked to that entry.

In [12]:
o3.items.append('paint_red')
print(o2.data) # data exposed here is guaranteed to be up-to-date.
print(o2.items)  # data exposed here is guaranteed to be up-to-date.

DEBUG | Updating existing database entry for id 9c54f9cb-6874-4838-ba37-36b9b77eafe2.
{'client_name': 'Beatrice Smith', 'shipping_address': None, 'items': ['paint_black', 'small_brush', 'paint_red'], 'cost': 30.0, 'id': '9c54f9cb-6874-4838-ba37-36b9b77eafe2'}
['paint_black', 'small_brush', 'paint_red']


To push updates only from the program to the database, pass the `bidirectional=False` argument to `datalink.factory`.
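One simple way to get bidirectional behaviour is to read each field from the database on every access instead of caching it, so every object linked to the same row sees the latest value; a one-way link can instead keep a local copy taken at load time. The `Link` class and its `bidirectional` flag below are a hypothetical sketch of that trade-off using `sqlite3`. The text does not say how `datalink` itself propagates changes.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE orders (id TEXT PRIMARY KEY, cost REAL)')
conn.execute("INSERT INTO orders VALUES ('o1', 22.4)")


class Link:
    """Hypothetical link to one row; reads hit the database only if bidirectional."""

    def __init__(self, entry_id, bidirectional=True):
        self.id = entry_id
        self.bidirectional = bidirectional
        self._cache = self._read()  # snapshot taken at load time

    def _read(self):
        return conn.execute('SELECT cost FROM orders WHERE id = ?', (self.id,)).fetchone()[0]

    @property
    def cost(self):
        # Bidirectional links re-read the row, so changes made elsewhere are visible.
        return self._read() if self.bidirectional else self._cache

    @cost.setter
    def cost(self, value):
        # Writes always go to the database, whichever direction is configured.
        self._cache = value
        conn.execute('UPDATE orders SET cost = ? WHERE id = ?', (value, self.id))
        conn.commit()


a = Link('o1')
b = Link('o1')
one_way = Link('o1', bidirectional=False)
a.cost = 30.0
print(b.cost, one_way.cost)  # 30.0 22.4
```

Re-reading on every access keeps links consistent at the cost of a query per attribute read; a cached link is cheaper but can go stale.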

## User specified ids
For new entries, the user can supply their own id as the first positional argument, which is used instead of a generated UUID.

In [13]:
o4 = Order('mynewid', client_name='Bob Smith')

DEBUG | Creating new database entry with id mynewid.


### Metadata lookup with user specified ids

The class's truth value is overridden to report whether any data was loaded from the database: an instance evaluates to `True` if its data was loaded from the database, and `False` if it is new.

This allows user-specified ids to act as keys for data or metadata that is expensive to calculate, so the calculation only has to be done once.

The example below detects particles in a detector whose efficiency response is hard to compute:
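In plain Python, a custom truth value comes from defining `__bool__`. The hypothetical `Cached` class below records whether construction found an existing row and returns that flag from `__bool__`. It assumes an SQLite table keyed by id and, like `Order()` above, writes a new row when none exists.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE particles (id TEXT PRIMARY KEY, efficiency REAL)')


class Cached:
    """Hypothetical record that is truthy only if it was loaded from the database."""

    def __init__(self, entry_id):
        self.id = entry_id
        row = conn.execute('SELECT efficiency FROM particles WHERE id = ?',
                           (entry_id,)).fetchone()
        self._loaded = row is not None
        if self._loaded:
            self.efficiency = row[0]
        else:
            # New entry: save it straight away, with no efficiency computed yet.
            self.efficiency = None
            conn.execute('INSERT INTO particles (id) VALUES (?)', (entry_id,))
            conn.commit()

    def __bool__(self):
        return self._loaded


first = Cached('proton')
second = Cached('proton')
print(bool(first), bool(second))  # False True
```

A freshly created instance is falsy even though its row exists once construction finishes; the `if not part:` check in `get_efficiency` relies on exactly this distinction.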

In [14]:
# Make a new container to represent a particle.
particle_defaults = {'efficiency': None}
Particle = datalink.factory(name='Particle', table_name='particles',
                            db_path='/tmp/particles.db', fields=particle_defaults)

from functools import reduce
logging.getLogger().setLevel(logging.INFO) # suppress datalink debug logging

def assign_efficiency(part):
    '''Some dummy function that is time consuming.'''
    print(f'Doing an intensive calculation for {part.id} efficiency response.')
    part.efficiency = random.random()    

def get_efficiency(*particles):
    ''' Calculate the efficiency of detecting an event with the input particles'''
    d = {}
    for p in particles:
        if p not in d:
            part = Particle(p)
            if not part:
                assign_efficiency(part)
            d[p] = part
    efficiency = reduce(lambda x, y: x*y, [d[p].efficiency for p in particles])
    for p in set(particles):
        print(f'efficiency for {p} = {d[p].efficiency}')
    print(f'efficiency for {particles} = {efficiency}')

In [15]:
get_efficiency('proton', 'proton', 'electron')

INFO | sqlite db created at path: /tmp/particles.db
Doing an intensive calculation for proton efficiency response.
Doing an intensive calculation for electron efficiency response.
efficiency for electron = 0.52688131537186
efficiency for proton = 0.0989122453769629
efficiency for ('proton', 'proton', 'electron') = 0.005154813047705433
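`get_efficiency` multiplies the per-particle efficiencies, so the combined value is their product, which treats each particle's detection as independent. As a quick check of the numbers printed above, `math.prod` (Python 3.8+) reproduces the combined efficiency and could stand in for `reduce` with a lambda:

```python
import math

# Per-particle efficiencies printed by In [15].
proton = 0.0989122453769629
electron = 0.52688131537186

# The proton appears twice, so its efficiency enters the product twice.
combined = math.prod([proton, proton, electron])
print(combined)  # matches the combined value printed above
```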


Now we calculate another efficiency and find that the stored proton efficiency is reused, so only the muon needs the intensive calculation.

In [16]:
get_efficiency('proton', 'proton', 'muon')

Doing an intensive calculation for muon efficiency response.
efficiency for muon = 0.034614486368294
efficiency for proton = 0.0989122453769629
efficiency for ('proton', 'proton', 'muon') = 0.00033865540637927413


In [17]:
get_efficiency('electron', 'proton', 'muon')

efficiency for muon = 0.034614486368294
efficiency for electron = 0.52688131537186
efficiency for proton = 0.0989122453769629
efficiency for ('electron', 'proton', 'muon') = 0.0018039344399764397
