## Experiment logger
The logger stores experimental data in a single SQLite database. It is intended to be fast and lightweight while still recording all the metadata and timestamps needed for experimental trials.

Most of the entries are stored as JSON strings in the database tables; any object that can be serialised by Python's `json` module can be added directly.
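As a rough illustration of what "serialisable by `json`" means in practice, the sketch below uses only the standard library. The `payload` dictionary is made up for the example and is not part of the logger. Types that `json` does not know about (here a `set`, standing in for things like NumPy arrays) raise `TypeError` and need converting to plain lists or dictionaries first.

```python
import json

# A hypothetical log payload built from plain Python types.
payload = {"x": 0, "y": 10, "buttons": [True, False], "label": None}

# Round trip: this is the kind of string that would end up in a JSON column.
encoded = json.dumps(payload)
decoded = json.loads(encoded)

# Objects that json cannot handle raise TypeError ...
try:
    json.dumps({"samples": {1, 2, 3}})
    serialisable = True
except TypeError:
    serialisable = False

# ... so convert them to a list (or similar) before logging.
fixed = json.dumps({"samples": sorted({1, 2, 3})})
```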

### Connecting to the database

In [27]:
from experimentlog import ExperimentLog, np_to_str, str_to_np
import numpy as np
# open a connection to a database; it will be created if it does not exist.
# here we use an in-memory database so the results are not stored to disk
e = ExperimentLog(":memory:")

### Setting up the database
When a log is set up for the first time, the database needs to be configured for the experimental sessions. 

Each sensor/information **stream** needs to be registered with the database. A stream could be an individual sensor, such as a mouse (x,y) time series, or something like questionnaire results.
Every entry in a log stream records the time, a tag and a "valid" flag, plus the JSON representing the data logged.

In [28]:
# check if we've already set everything up
if e.get_stage()=="init":
    
    # the mouse time series
    e.register_stream(name="mouse", description="A time series of x,y cursor positions",
                   # the data is optional, and can contain anything you want 
                  data={
                    "sample_rate": 60,
                    "dpi": 3000,
                    "mouse_device":"Logitech MX600"})
    
    # a matrix of some kind
    e.register_stream(name="important_matrix", description="A matrix")
    
    # and a post-condition questionnaire
    e.register_stream(name="satisfaction", 
                   description="A simple satisfaction score",
                   # here, we store the questions used for future reference
                  data={
                    "questions":["How satisfied were you with your performance?",
                                "How satisfied were you with the interface?"]}
                    )


## Sessions
**ExperimentLog** uses the concept of *sessions* to manage experimental data. Sessions are much like folders in a filesystem and usually form a hierarchy, for example:
    
    /
        Experiment1/
            ConditionA/
                Rep1/
                Rep2/
                Rep3/
            ConditionB/
                Rep1/
                Rep2/
                Rep3/
                
        Experiment2/
            ConditionA/
                Rep1/
                Rep2/
                Rep3/
                Rep4/
            ConditionC/
                Rep1/
                Rep2/
                Rep3/
                Rep4/
    

Each *session* type has **metadata** attached to it, for example the parameters for a given condition.

When an experiment is run, **instances** of sessions are created, like files inside the filesystem.
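The folder analogy can be sketched with a simple stack of session names. This is not the logger's implementation: `SessionStack` and its methods are hypothetical names used only to show how entering and leaving nested sessions yields a path.

```python
class SessionStack(object):
    """Hypothetical helper: tracks the current position in a session hierarchy."""

    def __init__(self):
        self._names = []

    def enter(self, name=None):
        # Unnamed sessions are stored as None (e.g. a plain repetition).
        self._names.append(name)

    def leave(self):
        # Step back up one level and return the name that was left.
        return self._names.pop()

    @property
    def path(self):
        return "/" + "/".join(str(n) for n in self._names)


stack = SessionStack()
stack.enter("Experiment")
stack.enter("ConditionA")
stack.enter()                 # an unnamed repetition
deepest = stack.path
stack.leave()
stack.leave()
after_leaving = stack.path
```

Here `deepest` is the path three levels down, and `after_leaving` is the path once the repetition and the condition have both been left.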

In [29]:
if e.get_stage()=="init":
    # We'll register an experiment, with three different conditions
    e.register_session("Experiment", description="The main experiment", 
                           data={"target_size":40.0, "cursor_size":5.0})
    e.register_session("ConditionA",description="Condition A:circular targets", 
                           data={"targets":["circle"]})
    e.register_session("ConditionB", description="Condition B:square targets", 
                           data={"targets":["square"]})
    e.register_session("ConditionC", description="Condition C:mixed targets", 
                           data={"targets":["circle","mixed"]})


We'd usually only want to do this once; the completion of this setup procedure can be recorded by changing the database **stage**:

In [30]:
# mark the database as ready to log data
e.set_stage("setup")

## Users
Each instance of a session (usually) involves experimental subjects. Each user should be registered, and then attached to a recording session. Multiple users can be attached to one session (e.g. for experiments with groups) but normally there will just be one user.

The `pseudo` module can generate pronounceable, random, verifiable pseudonyms for subjects.


In [31]:
import pseudo
user = pseudo.get_pseudo()
print(user)

RACAN-UNATA


In [32]:
# now register the user with the database
e.register_user(name=user, user_vars={"age":30, "leftright":"right"})


'RACAN-UNATA'

## Runs
Finally, each **run** of the experimental session is logged. If there are any variables that change on a per-run basis (e.g. calibration parameters) they can be stored here.

The experimenter running this trial should be specified for each run.

In [33]:
e.start_run(experimenter="JHW")

In [34]:
# attach the user to this experimental run
e.add_active_user(user)
# enter conditionA of experiment
e.enter_session("Experiment")
e.enter_session("ConditionA")
e.enter_session() # Unnamed sessions are assumed to be repetitions

In [35]:
print(e.session_path)


/Experiment/ConditionA/None


1

In [36]:
# log some data
e.log("mouse", data={"x":0, "y":10})
e.log("mouse", data={"x":0, "y":20})

2

Test how fast we can write into the database:

In [37]:
%%timeit -n 50000
e.log("mouse", data={"x":20, "y":20})

50000 loops, best of 3: 13.5 µs per loop


In [38]:
# log questionnaire output
e.log("satisfaction", data={"q1":4,"q2":5})

150003

In [39]:
# leave this repetition
e.leave_session() 

# move out of condition A
e.leave_session()

In [40]:
e.enter_session("ConditionB")

In [41]:
# could log more stuff...

In [42]:
x = np.random.uniform(-1,1,(16,16))
i = e.log("important_matrix")
# if we need to attach binary data to a log entry (e.g. an image), we can do this;
# in general, it is best to avoid using blobs unless absolutely necessary
e.attach_blob(i, np_to_str({"matrix":(x)}))

In [43]:
e.leave_session()
e.leave_session()
# back to the root

In [44]:
e.end_run() # end the run

In [45]:
# print some results with raw SQL queries
mouse_log = e.cursor.execute("SELECT time, json FROM mouse", ())
print("\n".join([str(m) for m in mouse_log.fetchone()]))

1454934544.76
{"y": 10, "x": 0}
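Since the `json` column holds plain strings, rows from a raw query need decoding before the values can be used. The sketch below shows one way to do that using only `sqlite3` and `json`. It builds a stand-in `mouse` table with just the two queried columns rather than using a real `ExperimentLog` database, so the table definition and sample rows are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the `mouse` view: only the two columns queried above.
conn.execute("CREATE TABLE mouse (time REAL, json TEXT)")
conn.executemany(
    "INSERT INTO mouse VALUES (?, ?)",
    [(1.0, json.dumps({"x": 0, "y": 10})),
     (2.0, json.dumps({"x": 0, "y": 20}))],
)

# Decode each row's JSON string back into a dictionary.
samples = [(t, json.loads(js))
           for t, js in conn.execute("SELECT time, json FROM mouse ORDER BY time")]
ys = [entry["y"] for _, entry in samples]
conn.close()
```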


In [46]:
import report

In [47]:
print(report.string_report(e.cursor))

# Report generated for none

----------------------------------------
#### Report date: Mon Feb 08 12:29:07 2016

----------------------------------------

## Runs
* Number of runs: 1
* Total duration recorded: 2.2 seconds
* Dirty exits: 0

----------------------------------------

## Sessions

#### /Experiment
* Runs: 1
* Duration recorded: 2.19400000572 seconds

#### /Experiment/ConditionA
* Runs: 1
* Duration recorded: 2.12700009346 seconds

#### /Experiment/ConditionA/None
* Runs: 1
* Duration recorded: 2.125 seconds

#### /Experiment/ConditionB
* Runs: 1
* Duration recorded: 0.050999879837 seconds

----------------------------------------

## Users
* Unique users: 1


#### RACAN-UNATA
**JSON** 
 
        {
            "age": 30,
            "leftright": "right"
        }
        
Duration recorded: 6.49699997902 seconds
Paths recorded:
	/Experiment
	/Experiment/ConditionA
	/Experiment/ConditionA/None
	/Experiment/ConditionB
----------------------------------------

## Log
* Log st

### Post-processing
Once all data is logged, it is wise to add indices so that logs can be accessed quickly.

In [48]:
# should only do this when all data is logged; otherwise there may be
# a performance penalty
e.add_indices()
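Which indices `add_indices()` creates is not shown here. As a general illustration of the idea, the sketch below adds an index to a simplified stand-in `log` table and checks the query plan before and after. The table layout and the index name `log_stream_idx` are assumptions. Every index has to be updated on each insert, which is why it is usually cheaper to create indices after bulk logging has finished.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the log table (a subset of its columns).
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, time REAL, stream INT, json TEXT)")
conn.executemany("INSERT INTO log (time, stream, json) VALUES (?, ?, ?)",
                 [(float(i), i % 3, "{}") for i in range(1000)])

query = "SELECT * FROM log WHERE stream = ?"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)))

before = plan(query)   # full table scan
conn.execute("CREATE INDEX log_stream_idx ON log (stream)")  # hypothetical index name
after = plan(query)    # now searches via the index
```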

## SQL format
There are a few basic tables in the ExperimentLog:

#### Metadata
    meta: 
        id,          Unique ID
        mtype,       Type of this metadata: one of LOG, SESSION, USER, PATH
        name,        Name of the object, e.g. user pseudonym
        type,        (Optional) type tag
        description, (Optional) text description
        json         (Optional) JSON string holding any other metadata.

This table holds the metadata for a log stream, session, user or path; `mtype` specifies which kind of metadata a row holds. There are convenience views of this table:

    log_stream
    users
    session_meta
    paths

All have the same fields as above.
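The exact view definitions are not given here. A plausible sketch, assuming each view simply filters `meta` by `mtype`, is shown below for `users`. The sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE meta
                (id INTEGER PRIMARY KEY, mtype TEXT, name TEXT,
                 type TEXT, description TEXT, json TEXT)""")
# Assumed definition: a view that selects a single mtype from meta.
conn.execute("CREATE VIEW users AS SELECT * FROM meta WHERE mtype = 'USER'")

conn.executemany(
    "INSERT INTO meta (mtype, name, description) VALUES (?, ?, ?)",
    [("USER", "RACAN-UNATA", None),
     ("LOG", "mouse", "A time series of x,y cursor positions"),
     ("SESSION", "ConditionA", "Condition A: circular targets")])

# Only the USER row is visible through the view.
user_names = [row[0] for row in conn.execute("SELECT name FROM users")]
```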

#### Session

    session: 
        id,          Unique ID
        start_time,  Time this session was started
        end_time,    Time this session was completed (if it was)
        last_time,   Last time a log was written for this session
        test_run,    If this is a test run or not
        random_seed, Random seed used for this session can be stored here
        valid,       If this session was marked valid or not
        complete,    If this session was marked completed or not
        parent,      ID of the session this session is a subsession of
        path,        ID of the full path this session belongs to
        json,        Any additional metadata

    run_session: (maps sessions to runs)
        id,          Unique ID
        run,         ID of the run
        session,     ID of the session

    user_session: (maps users to sessions)
        id,          Unique ID
        session,     ID of the session
        user,        ID of the user
        role,        (optional) String giving the role the user plays
        json,        (optional) Any session-specific user variables.
           
#### Logs

    log:
        id,         Unique ID
        time,       Timestamp
        valid,      Valid flag for this data (e.g. to mark faulty sensor data)
        stream,     ID of the stream this log belongs to
        session,    ID of the session this log entry belongs to
        json,       The log entry itself        
        tag,        (optional) tag for this log entry
        
        
When a new stream is registered, a view with the stream's name is also created; so registering a stream named `mouse` with `e.register_stream()` creates a view `mouse` with the same fields as `log`.


    blob:
        id,         Unique ID
        blob,       Binary blob 
        log,        Log this blob attaches to
        

#### Misc

    children: (maps sessions to all of their child sessions)
        id,     Unique ID
        parent, ID of the parent session
        child,  ID of the child session

This is automatically filled in by `enter_session()`/`leave_session()`
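One use for a parent/child table like this is collecting every session nested below a given session. The sketch below does that with a recursive query, assuming each row records a direct parent/child pair. The session IDs are hypothetical, and the table is a stand-in rather than a real `ExperimentLog` database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE children (id INTEGER PRIMARY KEY, parent INT, child INT)")
# Hypothetical session IDs: 1 = Experiment, 2 = ConditionA, 3 = a repetition, 4 = ConditionB
conn.executemany("INSERT INTO children (parent, child) VALUES (?, ?)",
                 [(1, 2), (2, 3), (1, 4)])

# Walk down from session 1, following parent -> child links at every level.
descendants = [row[0] for row in conn.execute("""
    WITH RECURSIVE sub(id) AS (
        SELECT child FROM children WHERE parent = ?
        UNION
        SELECT children.child FROM children JOIN sub ON children.parent = sub.id
    )
    SELECT id FROM sub ORDER BY id""", (1,))]
```

If the table already stores every ancestor/descendant pair, a plain `SELECT child FROM children WHERE parent = ?` would be enough.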
       
    setup:
        id,      Unique ID
        time,    Time of the stage change
        stage,   Stage of the database

This is normally only used to record whether the database is fully set up or not.



### Custom tables
If you want to log values in a **custom** table whose fields are not just plain JSON, you can add a new table to the database and link it to the log table. The `log()` function returns the ID of the new log entry; use this as a foreign key in the new table.

Example:




In [49]:
# make the new table -- must have a reference to the main
# log table
e.execute("""CREATE TABLE accelerometer 
          (id INTEGER PRIMARY KEY, device INT, x REAL, y REAL, z REAL, log INT,
          FOREIGN KEY(log) REFERENCES log(id))
          """)

# register a new stream
e.register_stream(name="acc", description="A time series of accelerometer values")

# now log a new value, put it into the separate accelerometer table and link
# it to the main log
def log_acc(dev,x,y,z):
    log_id = e.log("acc")
    # name the columns explicitly so that id is assigned automatically
    e.execute("INSERT INTO accelerometer (device, x, y, z, log) VALUES (?,?,?,?,?)", 
              (dev, x, y, z, log_id))
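To read such a table back, the custom rows can be joined to the main log on the foreign key, which recovers the timestamp for each reading. The sketch below is self-contained and uses plain `sqlite3` with a cut-down stand-in for the `log` table, so the stream ID and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-ins for the main log table and the custom table.
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, time REAL, stream INT)")
conn.execute("""CREATE TABLE accelerometer
                (id INTEGER PRIMARY KEY, device INT, x REAL, y REAL, z REAL, log INT,
                 FOREIGN KEY(log) REFERENCES log(id))""")

def log_acc(t, dev, x, y, z):
    # Insert the generic log row first, then link the custom row to it.
    log_id = conn.execute("INSERT INTO log (time, stream) VALUES (?, ?)", (t, 1)).lastrowid
    conn.execute("INSERT INTO accelerometer (device, x, y, z, log) VALUES (?,?,?,?,?)",
                 (dev, x, y, z, log_id))

log_acc(0.0, 0, 0.1, 0.2, 9.8)
log_acc(0.5, 0, 0.0, 0.3, 9.7)

# Recover timestamps for the custom rows by joining on the foreign key.
rows = conn.execute("""SELECT log.time, accelerometer.z
                       FROM accelerometer JOIN log ON accelerometer.log = log.id
                       ORDER BY log.time""").fetchall()
```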
