Create a Raw Data Table (#165)
Summary:
## Changes description.

This PR adds three new tables in the database to store raw data collected during experiments. The new tables are shown in the following figure:

<p align="center">
  <img alt="Database structure" src="https://i.imgur.com/lr7NTSj.png" width="80%">
</p>

The specific changes in the code are in 6 different files, described below:

1. `aepsych/database/tables.py`: Adds three new table structures to the database:
    1. Raw data table: fact table for raw data collected during experiments.
    2. Param table: dimension table for parameters used in the experiment.
    3. Outcome table: dimension table for outcomes of the experiment.

2. `aepsych/database/db.py`: Integrates these new tables into the database. It includes methods to save, retrieve, and update data in the tables. There is also a new method, `generate_experiment_table`, which generates a new table with the raw data for a given experiment id. This table is created within the database and is intended to make an experiment's data easy to access. An example of this table structure:

| iteration_id 	| theta1 	| theta2 	| outcome_0 	| outcome_1 	|                  timestamp 	|
|-------------:	|-------:	|-------:	|----------:	|----------:	|---------------------------:	|
|            1 	|    1.0 	|    0.0 	|       0.0 	|     -10.0 	| 2022-10-17 10:52:23.757922 	|
|            2 	|    3.0 	|    2.0 	|       1.0 	|      -7.5 	| 2022-10-17 10:52:23.857462 	|
|            3 	|    5.0 	|    4.0 	|       0.0 	|      -5.0 	| 2022-10-17 10:52:23.949986 	|
|            4 	|    7.0 	|    6.0 	|       1.0 	|      -2.5 	| 2022-10-17 10:52:24.041166 	|

4. `aepsych/server/server.py`: Saves experiment results when the server's `tell()` method is called and the call is not part of a replay.
5. `aepsych/tests/test_db.py`: Adds tests for the new tables.
6. `aepsych/tests/test_integration`: These tests check that the server can handle different experiments (multi/single stimuli, multi/single outcome). They ensure that the data is correctly stored in the database tables (raw, param, and outcome). They also check that the experiment table is correctly populated by the `generate_experiment_table` method.
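
The relationship between the three long-format tables and the wide per-experiment table can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions, not the actual `generate_experiment_table` implementation: the column names follow the table classes in this commit, while the helper `experiment_rows` and the sample values (taken from the example table above) are hypothetical.

```python
import sqlite3

# In-memory database with simplified versions of the three new tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_data (
        unique_id INTEGER PRIMARY KEY,
        timestamp TEXT,
        model_data INTEGER,
        master_table_id INTEGER
    );
    CREATE TABLE param_data (
        unique_id INTEGER PRIMARY KEY AUTOINCREMENT,
        param_name TEXT,
        param_value REAL,
        iteration_id INTEGER REFERENCES raw_data(unique_id)
    );
    CREATE TABLE outcome_data (
        unique_id INTEGER PRIMARY KEY AUTOINCREMENT,
        outcome_name TEXT,
        outcome_value REAL,
        iteration_id INTEGER REFERENCES raw_data(unique_id)
    );
    """
)

# Two iterations of experiment 1, matching the first two example rows.
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?, ?)",
    [
        (1, "2022-10-17 10:52:23.757922", 1, 1),
        (2, "2022-10-17 10:52:23.857462", 1, 1),
    ],
)
conn.executemany(
    "INSERT INTO param_data (param_name, param_value, iteration_id) VALUES (?, ?, ?)",
    [("theta1", 1.0, 1), ("theta2", 0.0, 1), ("theta1", 3.0, 2), ("theta2", 2.0, 2)],
)
conn.executemany(
    "INSERT INTO outcome_data (outcome_name, outcome_value, iteration_id) VALUES (?, ?, ?)",
    [("outcome_0", 0.0, 1), ("outcome_1", -10.0, 1), ("outcome_0", 1.0, 2), ("outcome_1", -7.5, 2)],
)


def experiment_rows(conn, master_id):
    """Hypothetical helper: pivot long-format rows into one dict per iteration."""
    rows = {}
    raw_query = (
        "SELECT unique_id, timestamp FROM raw_data "
        "WHERE master_table_id = ? ORDER BY unique_id"
    )
    for iteration_id, timestamp in conn.execute(raw_query, (master_id,)):
        rows[iteration_id] = {"iteration_id": iteration_id, "timestamp": timestamp}

    # Each dimension table contributes one column per distinct name.
    sources = (
        ("param_data", "param_name", "param_value"),
        ("outcome_data", "outcome_name", "outcome_value"),
    )
    for table, name_col, value_col in sources:
        query = f"SELECT iteration_id, {name_col}, {value_col} FROM {table}"
        for iteration_id, name, value in conn.execute(query):
            if iteration_id in rows:  # skip rows belonging to other experiments
                rows[iteration_id][name] = value

    return list(rows.values())


table = experiment_rows(conn, master_id=1)
for row in table:
    print(row)
```

Storing parameters and outcomes in long format keeps the schema fixed regardless of how many parameters or outcomes an experiment has; the wide table is then derived per experiment.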

PR #184 adds descriptions of these DB changes to the `For Developers/Data Overview` section.

## Associated Issue
Fixes #34

## To-Do

- [x] Implement tests for new table.
- [x] Create `--update` method to apply these changes to old databases.
    - This goes in a new PR. Associated branch: https://github.com/GabrielMissael/aepsych/tree/update_database
- [x] The outcome is not necessarily binary!
- [x] Allow multiple stimuli per experiment
- [x] Allow multiple outcomes per experiment

Pull Request resolved: #165

Reviewed By: crasanders

Differential Revision: D40518350

fbshipit-source-id: 12fda3a8db253b14108b27ddfe44bea6cedcf945
GabrielMissael authored and facebook-github-bot committed Nov 16, 2022
1 parent da07de4 commit c9a29a1
Showing 8 changed files with 681 additions and 8 deletions.
95 changes: 95 additions & 0 deletions aepsych/database/db.py
@@ -67,6 +67,9 @@ def is_update_required(self):
or tables.DbReplayTable.requires_update(self._engine)
or tables.DbStratTable.requires_update(self._engine)
or tables.DbConfigTable.requires_update(self._engine)
or tables.DbRawTable.requires_update(self._engine)
or tables.DbParamTable.requires_update(self._engine)
or tables.DbOutcomeTable.requires_update(self._engine)
)

def perform_updates(self):
@@ -75,6 +78,9 @@ def perform_updates(self):
tables.DbReplayTable.update(self._engine)
tables.DbStratTable.update(self._engine)
tables.DbConfigTable.update(self._engine)
tables.DbRawTable.update(self._engine)
tables.DbParamTable.update(self._engine)
tables.DbOutcomeTable.update(self._engine)

@contextmanager
def session_scope(self):
@@ -150,6 +156,63 @@ def get_config_for(self, master_id):
return master_record.children_config[0].config
return None

    def get_raw_for(self, master_id):
        """Get the raw data for a specific master row."""
        master_record = self.get_master_record(master_id)

        if master_record is not None:
            return master_record.children_raw

        return None

    def get_all_params_for(self, master_id):
        """Get the parameters for all the iterations of a specific experiment."""
        raw_record = self.get_raw_for(master_id)
        params = []

        if raw_record is not None:
            for raw in raw_record:
                for param in raw.children_param:
                    params.append(param)
            return params

        return None

    def get_param_for(self, master_id, iteration_id):
        """Get the parameters for a specific iteration of a specific experiment."""
        raw_record = self.get_raw_for(master_id)

        if raw_record is not None:
            for raw in raw_record:
                if raw.unique_id == iteration_id:
                    return raw.children_param

        return None

    def get_all_outcomes_for(self, master_id):
        """Get the outcomes for all the iterations of a specific experiment."""
        raw_record = self.get_raw_for(master_id)
        outcomes = []

        if raw_record is not None:
            for raw in raw_record:
                for outcome in raw.children_outcome:
                    outcomes.append(outcome)
            return outcomes

        return None

    def get_outcome_for(self, master_id, iteration_id):
        """Get the outcomes for a specific iteration of a specific experiment."""
        raw_record = self.get_raw_for(master_id)

        if raw_record is not None:
            for raw in raw_record:
                if raw.unique_id == iteration_id:
                    return raw.children_outcome

        return None

def record_setup(
self,
description,
@@ -216,6 +279,38 @@ def record_message(self, master_table, type, request) -> None:
self._session.add(record)
self._session.commit()

    def record_raw(self, master_table, model_data):
        raw_entry = tables.DbRawTable()
        raw_entry.model_data = model_data

        raw_entry.timestamp = datetime.datetime.now()
        raw_entry.parent = master_table

        self._session.add(raw_entry)
        self._session.commit()

        return raw_entry

    def record_param(self, raw_table, param_name, param_value) -> None:
        param_entry = tables.DbParamTable()
        param_entry.param_name = param_name
        param_entry.param_value = param_value

        param_entry.parent = raw_table

        self._session.add(param_entry)
        self._session.commit()

    def record_outcome(self, raw_table, outcome_name, outcome_value) -> None:
        outcome_entry = tables.DbOutcomeTable()
        outcome_entry.outcome_name = outcome_name
        outcome_entry.outcome_value = outcome_value

        outcome_entry.parent = raw_table

        self._session.add(outcome_entry)
        self._session.commit()

def record_strat(self, master_table, strat):
strat_entry = tables.DbStratTable()
strat_entry.strat = strat
131 changes: 130 additions & 1 deletion aepsych/database/tables.py
@@ -12,7 +12,16 @@

from aepsych.config import Config
from aepsych.version import __version__
from sqlalchemy import Column, DateTime, ForeignKey, Integer, PickleType, String
from sqlalchemy import (
    Boolean,
    Column,
    DateTime,
    Float,
    ForeignKey,
    Integer,
    PickleType,
    String,
)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

@@ -60,6 +69,7 @@ class DBMasterTable(Base):
children_replay = relationship("DbReplayTable", back_populates="parent")
children_strat = relationship("DbStratTable", back_populates="parent")
children_config = relationship("DbConfigTable", back_populates="parent")
children_raw = relationship("DbRawTable", back_populates="parent")

@classmethod
def from_sqlite(cls, row):
@@ -323,3 +333,122 @@ def update(engine):
@staticmethod
def requires_update(engine):
return False


class DbRawTable(Base):
    """
    Fact table to store the raw data of each iteration of an experiment.
    """
    __tablename__ = "raw_data"

    unique_id = Column(Integer, primary_key=True, autoincrement=True)
    timestamp = Column(DateTime)
    model_data = Column(Boolean)

    master_table_id = Column(Integer, ForeignKey("master.unique_id"))
    parent = relationship("DBMasterTable", back_populates="children_raw")
    children_param = relationship("DbParamTable", back_populates="parent")
    children_outcome = relationship("DbOutcomeTable", back_populates="parent")

    @classmethod
    def from_sqlite(cls, row):
        this = DbRawTable()
        this.unique_id = row["unique_id"]
        this.timestamp = row["timestamp"]
        this.model_data = row["model_data"]
        this.master_table_id = row["master_table_id"]

        return this

    def __repr__(self):
        return (
            f"<DbRawTable(unique_id={self.unique_id})"
            f", timestamp={self.timestamp} "
            f", master_table_id={self.master_table_id})>"
        )

    @staticmethod
    def update(engine):
        logger.info("DbRawTable : update called")

    @staticmethod
    def requires_update(engine):
        return False


class DbParamTable(Base):
    """
    Dimension table to store the parameters of each iteration of an experiment.
    Supports multiple parameters per iteration, and multiple stimuli per parameter.
    """
    __tablename__ = "param_data"

    unique_id = Column(Integer, primary_key=True, autoincrement=True)
    param_name = Column(String(50))
    param_value = Column(Float)

    iteration_id = Column(Integer, ForeignKey("raw_data.unique_id"))
    parent = relationship("DbRawTable", back_populates="children_param")

    @classmethod
    def from_sqlite(cls, row):
        this = DbParamTable()
        this.unique_id = row["unique_id"]
        this.param_name = row["param_name"]
        this.param_value = row["param_value"]
        this.iteration_id = row["iteration_id"]

        return this

    def __repr__(self):
        return (
            f"<DbParamTable(unique_id={self.unique_id})"
            f", iteration_id={self.iteration_id}>"
        )

    @staticmethod
    def update(engine):
        logger.info("DbParamTable : update called")

    @staticmethod
    def requires_update(engine):
        return False


class DbOutcomeTable(Base):
    """
    Dimension table to store the outcomes of each iteration of an experiment.
    Supports multiple outcomes per iteration.
    """
    __tablename__ = "outcome_data"

    unique_id = Column(Integer, primary_key=True, autoincrement=True)
    outcome_name = Column(String(50))
    outcome_value = Column(Float)

    iteration_id = Column(Integer, ForeignKey("raw_data.unique_id"))
    parent = relationship("DbRawTable", back_populates="children_outcome")

    @classmethod
    def from_sqlite(cls, row):
        this = DbOutcomeTable()
        this.unique_id = row["unique_id"]
        this.outcome_name = row["outcome_name"]
        this.outcome_value = row["outcome_value"]
        this.iteration_id = row["iteration_id"]

        return this

    def __repr__(self):
        return (
            f"<DbOutcomeTable(unique_id={self.unique_id})"
            f", iteration_id={self.iteration_id}>"
        )

    @staticmethod
    def update(engine):
        logger.info("DbOutcomeTable : update called")

    @staticmethod
    def requires_update(engine):
        return False
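
The three `record_*` methods added to `db.py` suggest a fan-out per trial: one raw row, then one row per parameter and one per outcome. A minimal sketch of that pattern follows, using a hypothetical in-memory store instead of the real database class. `InMemoryStore`, `RawRow`, and `record_tell` are illustrative names that do not appear in this commit, and the `outcome_{i}` naming is assumed from the example table in the commit description; how the server actually maps a `tell()` message onto these calls is not shown here.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class RawRow:
    """Hypothetical stand-in for one raw_data row and its child rows."""

    unique_id: int
    timestamp: datetime.datetime
    model_data: bool
    params: list = field(default_factory=list)
    outcomes: list = field(default_factory=list)


class InMemoryStore:
    """Hypothetical store mirroring record_raw / record_param / record_outcome."""

    def __init__(self):
        self.raw = []

    def record_raw(self, model_data):
        # IDs count up from 1, imitating an autoincrement primary key.
        row = RawRow(len(self.raw) + 1, datetime.datetime.now(), model_data)
        self.raw.append(row)
        return row

    def record_param(self, raw_row, param_name, param_value):
        raw_row.params.append((param_name, float(param_value)))

    def record_outcome(self, raw_row, outcome_name, outcome_value):
        raw_row.outcomes.append((outcome_name, float(outcome_value)))


def record_tell(store, config, outcome, model_data=True):
    """Write one raw row, then one child row per parameter and per outcome."""
    raw = store.record_raw(model_data)

    for name, value in config.items():
        store.record_param(raw, name, value)

    # Accept a single outcome or a sequence of outcomes.
    outcomes = outcome if isinstance(outcome, (list, tuple)) else [outcome]
    for i, value in enumerate(outcomes):
        store.record_outcome(raw, f"outcome_{i}", value)

    return raw


store = InMemoryStore()
first = record_tell(store, {"theta1": 1.0, "theta2": 0.0}, [0.0, -10.0])
second = record_tell(store, {"theta1": 3.0, "theta2": 2.0}, 1)
print(first.params, first.outcomes)
print(second.params, second.outcomes)
```

Because outcomes are stored as floats in their own rows, this layout handles non-binary and multiple outcomes without schema changes, which matches the To-Do items in the commit description.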