# Introduction to yabadaba: More complex Record details

This Notebook provides explanations and an example of a Record class for more complex data.

In [1]:
```python
# Standard Python libraries
import datetime

# Import the main yabadaba package
import yabadaba

# Import the yabadaba_demo package that defines the Records to use
import yabadaba_demo

# Show yabadaba version
print('yabadaba version =', yabadaba.__version__)

# Show date of Notebook execution
print('Notebook executed on', datetime.date.today())
```

```
yabadaba version = 0.3.0
Notebook executed on 2025-02-13
```


## 1. Value types

The Value objects define which data elements are present in the full data schema and how those data elements are interacted with and transformed. Each Value object can be associated with a single data value or with multiple values, either as a list or in a more complex representation.

### 1.1. Built-in Value classes

The yabadaba package includes definitions for Value classes that encompass the basic data types.  These are immediately accessible to the Record._add_value() method by specifying the corresponding Value style.

The built-in Value styles included in yabadaba are:

- __'str'__ is for string values, typically ones that are short or have a limited number of allowed values.  When setting, the given value is automatically converted into a str. The default query is 'str_match', which returns True if the value exactly matches one of the given target values.   
- __'longstr'__ is for string values, typically ones that are longer or not restricted.  When setting, the given value is automatically converted into a str. The default query is 'str_contains', which returns True if the value contains one of the given target values.
- __'strlist'__ is for a list of string values. When setting, the given value will be converted into a list if needed, and all values within the list will be converted into strs.   The default query is 'list_contains', which returns True if the value contains one of the given target values.
- __'bool'__ is for boolean values. When setting, the value can be given as a bool or as a str that when converted to lowercase matches with 't', 'f', 'true', or 'false'.  The default query is 'bool_match', which returns True if the value matches a given target value.
- __'int'__ is for int values.  When setting, the given value is automatically converted into an int if possible. The default query is 'int_match', which returns True if the value exactly matches one of the given target values.   
- __'float'__ is for float values.  This has an additional 'unit' parameter associated with it that allows for automatic unit conversions when loading and saving. When setting, the given value is automatically converted into a float if possible and any unit conversions are performed. The default query is 'float_match', which returns True if the value exactly matches one of the given target values.
- __'date'__ is for date values. When setting, the given value is converted into a datetime.date object if needed and possible. When the data model representation of the record is generated, the date is converted into a str of format 'YYYY-MM-DD'. The default query is 'date_match', which returns True if the value exactly matches one of the given target values expressed in the 'YYYY-MM-DD' format.
- __'month'__ is for month values.  When setting, the provided value can be given as an int from 1-12, the full month name, or the first three letters of the month name.  The default query is 'month_match', which returns True if the value matches one of the given target values expressed as month number ints.
- __'floatarray'__ is for numpy arrays of floats.  This has an additional 'unit' parameter associated with it that allows for automatic conversions of all values when loading and saving. When setting, the given value can be anything interpretable by numpy.asarray or a str representation of the array with optional units. The array value can be of any shape and dimensions supported by numpy.  There is no default query.
- __'record'__ is a special style that allows for Record classes to be used as values within other Record classes.  The value itself is a list of record objects, making this most suitable for the case where the higher level record contains multiple entries of the subrecord.  By default, all queries of the subrecord are automatically inherited by the higher level record.
- __'base'__ uses the base Value class which should work with any simple data type but provides no interpretation or default queries. This is left accessible as it could potentially be useful for testing and development, but is not recommended for normal use.
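To make the setting rules for the 'bool' and 'month' styles more concrete, here is a minimal plain-Python sketch of equivalent conversions. The helpers to_bool() and to_month() are hypothetical and are not yabadaba's implementation; matching month names case-insensitively (using the calendar module's English names in the default locale) is an assumption.

```python
import calendar


def to_bool(val):
    """Hypothetical helper following the 'bool' setting rules described above."""
    if isinstance(val, bool):
        return val
    if isinstance(val, str):
        lowered = val.lower()
        if lowered in ('t', 'true'):
            return True
        if lowered in ('f', 'false'):
            return False
    raise ValueError(f'cannot interpret {val!r} as a bool')


def to_month(val):
    """Hypothetical helper following the 'month' setting rules described above."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(val, int) and not isinstance(val, bool) and 1 <= val <= 12:
        return val
    if isinstance(val, str):
        name = val.lower()
        # Index 0 of month_name/month_abbr is an empty string, so start at 1
        for number in range(1, 13):
            if name in (calendar.month_name[number].lower(),
                        calendar.month_abbr[number].lower()):
                return number
    raise ValueError(f'cannot interpret {val!r} as a month')


print(to_bool('TRUE'), to_bool('f'))                     # True False
print(to_month('March'), to_month('sep'), to_month(12))  # 3 9 12
```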

### 1.2. Defining custom Value classes

New Value classes can easily be defined if the built-in Value classes don't adequately support your data.  The general steps for doing this closely parallel how new Records are defined in your yabadaba-based package:

1. Add the new Value class definitions to your package.  The general recommendation is to collect any Value class definitions in a 'value' subfolder, but this is not required.
2. Import valuemanager from yabadaba in the parent module of the new Value classes, i.e. the \_\_init\_\_.py file of the folder that contains them.
3. Use valuemanager.import_style() to dynamically import each new value and integrate it into yabadaba.
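Step 3 relies on importing a module by name at runtime. As a rough sketch of that idea only, the hypothetical ToyStyleManager below (not yabadaba's valuemanager, whose internals may differ) uses importlib.import_module() and records failed imports instead of raising, similar in spirit to the passed/failed lists that yabadaba.check_modules() reports later in this Notebook. Defaulting the class name to the last part of the module name is an assumption.

```python
import importlib


class ToyStyleManager:
    """Hypothetical, simplified stand-in for a yabadaba style manager."""

    def __init__(self):
        self.loaded_styles = {}
        self.failed_styles = {}

    def import_style(self, style, modulename, package=None, classname=None):
        # Assumption: the class name defaults to the last part of the module name
        if classname is None:
            classname = modulename.split('.')[-1]
        try:
            # Relative names like '.TimeDeltaValue' need the package argument
            module = importlib.import_module(modulename, package=package)
            self.loaded_styles[style] = getattr(module, classname)
        except Exception as err:
            # Record the failure so one bad style does not break the package import
            self.failed_styles[style] = err


# Usage with standard-library modules so the sketch runs anywhere
manager = ToyStyleManager()
manager.import_style('ordered_dict', 'collections', classname='OrderedDict')
manager.import_style('bad_style', 'package_that_does_not_exist')
print(sorted(manager.loaded_styles), sorted(manager.failed_styles))
```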

When defining new Value classes, you should inherit from yabadaba.value.Value. The parent Value class manages most of the interactions between value objects. For most new Value subclasses, the only methods and attributes that should be overridden are those listed below, which center on data transformations and querying:

- __set_value_mod()__ specifies any data type checks or transformations to perform when a value is being set.  This function takes the "raw" value and returns the processed one. 
- __build_model_value()__ specifies how to transform the value for inclusion in the record's model representation.  
- __load_model_value()__ specifies how to interpret the value from the record's model representation.  Note that loading sets the value, meaning that set_value_mod() is called on the result afterwards.
- __metadata_value()__ specifies how to transform the value for inclusion in the record's metadata representation.
- __\_default_queries__ is a @property that defines a dict containing Query objects for the Value.  Note that you will probably need to define a new Query class for properly interacting with the model and metadata representations.
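How these hooks fit together can be sketched with a stripped-down, hypothetical ToyValue class. It is not yabadaba.value.Value, only an illustration of the pattern described above: every assignment runs through set_value_mod(), so whatever load_model_value() returns is processed by set_value_mod() as well. ToyDateValue overrides only the transformation hooks, roughly the way the built-in 'date' style is described to behave; leaving None values out of the model is an assumption.

```python
from datetime import date


class ToyValue:
    """Hypothetical, heavily simplified stand-in for yabadaba.value.Value."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value  # goes through the setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, val):
        # Every assignment passes through the set_value_mod() hook
        self._value = self.set_value_mod(val)

    # Hooks that subclasses normally override
    def set_value_mod(self, val):
        return val

    def build_model_value(self):
        return self.value

    def load_model_value(self, val):
        return val

    def metadata_value(self):
        return self.value

    # More fundamental methods that call the hooks
    def build_model(self, model):
        # Assumption: None values are left out of the model
        if self.value is not None:
            model[self.name] = self.build_model_value()

    def load_model(self, model):
        # Loading *sets* the value, so set_value_mod() runs afterwards
        self.value = self.load_model_value(model[self.name])

    def metadata(self, meta):
        meta[self.name] = self.metadata_value()


class ToyDateValue(ToyValue):
    """Overrides only the transformation hooks, like a 'date'-style value."""

    def set_value_mod(self, val):
        if val is None or isinstance(val, date):
            return val
        return date.fromisoformat(str(val))

    def build_model_value(self):
        return self.value.isoformat()


release = ToyDateValue('releasedate', '1995-03-14')
model = {}
release.build_model(model)
print(release.value)  # 1995-03-14 (a datetime.date)
print(model)          # {'releasedate': '1995-03-14'}
```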

It is possible to override more fundamental methods for more complex Value handling. If you need to modify the following methods, be sure to copy from or call the corresponding parent method to avoid breaking any important functional components.  

- __\_\_init\_\_()__ creates a new Value object.  Overriding this is typically only done to add any new Value-class-specific keyword parameters.
- __build_model()__ specifies how to insert the value's build_model_value() into the record's model representation.
- __load_model()__ specifies how to locate the content in the record's model representation that should be interpreted by load_model_value().
- __metadata()__ specifies what fields are added to the record's metadata representation for the value.  By default, this creates a single entry named for the metadata key with the metadata_value().  Modifying this allows for multiple fields or no fields to be generated instead.
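When overriding these methods, the parent behavior should be extended rather than replaced. The sketch below uses a hypothetical minimal BaseValue (standing in for yabadaba.value.Value, with an assumed metadatakey parameter) and a hypothetical UnitFloatValue subclass that adds a 'unit' keyword in \_\_init\_\_() and makes metadata() emit two fields, calling super() in both places.

```python
class BaseValue:
    """Hypothetical minimal parent class standing in for yabadaba.value.Value."""

    def __init__(self, name, value=None, metadatakey=None):
        self.name = name
        self.value = value
        # Assumption: the metadata key defaults to the value's name
        self.metadatakey = name if metadatakey is None else metadatakey

    def metadata_value(self):
        return self.value

    def metadata(self, meta):
        # Default behavior: a single field named for the metadata key
        meta[self.metadatakey] = self.metadata_value()


class UnitFloatValue(BaseValue):
    """Adds a class-specific 'unit' keyword and emits two metadata fields."""

    def __init__(self, name, value=None, metadatakey=None, unit=None):
        # Call the parent __init__ so the shared setup is not lost
        super().__init__(name, value=value, metadatakey=metadatakey)
        self.unit = unit

    def metadata(self, meta):
        # Keep the parent's single field, then add a second one for the unit
        super().metadata(meta)
        meta[f'{self.metadatakey}_unit'] = self.unit


length = UnitFloatValue('length', 2.5, unit='m')
meta = {}
length.metadata(meta)
print(meta)  # {'length': 2.5, 'length_unit': 'm'}
```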


## 2. Example: Album record

The yabadaba_demo example package defines an 'album' record style that demonstrates a variety of the value options listed above.

### 2.1. Record definitions

The code in yabadaba_demo/record/album/Album.py contains definitions for two Record subclasses: Album and Track.  The Album class represents an album, which may contain multiple tracks represented by Track objects. 

(I'm aware that the real metadata is organized differently and almost exclusively assigns everything at the Track level.  The purpose of this is to provide a conceptual demonstration, not a fully fledged schema.)   

The code found in this file is shown below.

```Python
import pandas as pd

from yabadaba.record import Record


class Track(Record):
    """
    Class for representing a single album track.
    """

    ########################## Basic metadata fields ##########################

    @property
    def style(self):
        """str: The record style"""
        return 'track'

    @property
    def modelroot(self):
        """str: The root element of the content"""
        return 'track'
    
    ############################# Define Values  ##############################

    def _init_values(self):
        """
        Method that defines the value objects for the Record.  This should
        call the super of this method, then use self._add_value to create new Value objects.
        Note that the order values are defined matters
        when build_model is called!!!
        """
        self._add_value('str', 'title', description='track title')
        self._add_value('int', 'number', description='track number')
        self._add_value('time_delta', 'duration')
        self._add_value('longstr', 'lyrics')

class Album(Record):
    """
    Class for representing a music album.
    """

    ########################## Basic metadata fields ##########################

    @property
    def style(self):
        """str: The record style"""
        return 'album'

    @property
    def modelroot(self):
        """str: The root element of the content"""
        return 'album'
    

    ############################# Define Values  ##############################

    def _init_values(self):
        """
        Method that defines the value objects for the Record.  This should
        call the super of this method, then use self._add_value to create new Value objects.
        Note that the order values are defined matters
        when build_model is called!!!
        """
        self._add_value('longstr', 'artist', description='artist name')
        self._add_value('str', 'producer', description='producer name')
        self._add_value('str', 'album', description='album title')
        self._add_value('date', 'releasedate', description='release date')
        self._add_value('strlist', 'genre')

        self._add_value('record', 'tracks', recordclass=Track, description='List of album tracks')

        # Modify tracks queries
        self.get_value('tracks').queries.pop('number')
        
    def add_track(self, **kwargs):
        """Adds a new track to the tracks list"""
        self.get_value('tracks').append(**kwargs)

    def tracks_metadata(self) -> pd.DataFrame:
        """Compiles the tracks metadata for this album into a pandas.DataFrame"""
        df = []
        for track in self.tracks:
            df.append(track.metadata())
        df = pd.DataFrame(df)
        return df
```

A few things to note in the above code:

- As written, the two Record class definitions largely consist of the style and modelroot attributes and the _init_values() method.
- The "tracks" value in the Album class is of style 'record' and specifies the recordclass (Track).
- The get_value() method retrieves a Value object by name, allowing for interactions with the object's methods and attributes beyond simply the stored value.
- In Album's _init_values, the queries dict of tracks is modified by deleting the query for track number. Otherwise, all queries for values defined in the Track record will be automatically included in the queries for Album.
- A couple of additional methods have been added in support of the tracks value: add_track() builds a new Track object and appends it to the tracks list and tracks_metadata() generates a pandas.DataFrame of the tracks content. While methods like this are not required, they do tend to be convenient for working with 'record' style values.
- The Track value 'duration' is of style 'time_delta', which is not one of the built-in Value styles.
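The effect of the queries.pop('number') line can be illustrated with plain dicts. This is a hypothetical model only: query names are mapped to the query style strings listed in section 1.1 rather than to real Query objects, and the merge order is an assumption chosen to match the querydoc output shown at the end of this Notebook.

```python
# Hypothetical stand-ins: each query name maps to the query style it would use
album_value_queries = {
    'artist': 'str_contains',     # 'longstr' value
    'producer': 'str_match',      # 'str' value
    'album': 'str_match',         # 'str' value
    'releasedate': 'date_match',  # 'date' value
    'genre': 'list_contains',     # 'strlist' value
}
track_queries = {
    'title': 'str_match',         # 'str' value
    'number': 'int_match',        # 'int' value
    'lyrics': 'str_contains',     # 'longstr' value
    # 'duration' is absent: the 'time_delta' style defines no default queries
}

# The 'tracks' value starts with a copy of Track's queries...
tracks_value_queries = dict(track_queries)
# ...and the equivalent of self.get_value('tracks').queries.pop('number') drops one
tracks_value_queries.pop('number')

# Album's full set: its own values' queries followed by the inherited ones
album_queries = {**album_value_queries, **tracks_value_queries}
print(list(album_queries))
# ['artist', 'producer', 'album', 'releasedate', 'genre', 'title', 'lyrics']
```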

### 2.2. Value definition

As to the last bullet point above, the 'time_delta' value style is defined by a TimeDeltaValue class found in yabadaba_demo/value/TimeDeltaValue.py.  It interprets relative time spans (durations) and converts them between timedelta objects and their str representations.

The code is shown below. It overrides only the set_value_mod() and build_model_value() methods:

```python
from datetime import timedelta

from yabadaba.value import Value
import pandas as pd

class TimeDeltaValue(Value):
    """Value object for time segments"""

    def set_value_mod(self, val):
        
        # Check if value is in #text
        val = self.set_value_mod_textfield(val)
        
        if val is None:
            return None
        elif not isinstance(val, timedelta):
            val = pd.to_timedelta(val)
        return val

    def build_model_value(self):
        return str(self.value)
```
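If pandas is not available, a standard-library parser could play the role of pd.to_timedelta() for simple 'HH:MM:SS' strings. The parse_duration() function below is a hypothetical alternative, not part of yabadaba_demo. Note that pd.to_timedelta() returns a pandas Timedelta, which is a subclass of datetime.timedelta, so both approaches yield timedelta-compatible values.

```python
from datetime import timedelta


def parse_duration(text):
    """Hypothetical stdlib-only parser for 'H:MM:SS' or 'HH:MM:SS' strings."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


duration = parse_duration('00:07:37')
print(duration)                                   # 0:07:37
print(duration.total_seconds())                   # 457.0
print(parse_duration(str(duration)) == duration)  # True
```

One design consequence: str() of a datetime.timedelta gives '0:07:37', which parse_duration() can read back, so a model round trip works for durations under one day. Longer durations render as, e.g., '1 day, 2:00:00' and would need a more general parser, which pd.to_timedelta() already provides.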

### 2.3. Integrating the new Value and Record classes into yabadaba

The final step (besides debugging) is to integrate the new Value and Record classes into yabadaba using yabadaba's valuemanager and recordmanager, respectively.  This can be seen in yabadaba_demo/value/\_\_init\_\_.py and yabadaba_demo/record/\_\_init\_\_.py.

In yabadaba_demo/value/\_\_init\_\_.py:

```python
# Import valuemanager
from yabadaba import valuemanager

# from .TimeDeltaValue import TimeDeltaValue as the 'time_delta' value style
valuemanager.import_style('time_delta', '.TimeDeltaValue', __name__)
```

In yabadaba_demo/record/\_\_init\_\_.py:

```python
# Import recordmanager
from yabadaba import recordmanager

# Manually install records to recordmanager

# from .FAQ import FAQ as the 'FAQ' record style
recordmanager.import_style('FAQ', '.FAQ', __name__, 'FAQ')

# from .album import Album as the 'album' record style
recordmanager.import_style('album', '.album', __name__, 'Album')

# from .bad_record import BadRecord as the 'bad_record' style - which will be caught as an error
recordmanager.import_style('bad_record', '.bad_record', __name__, 'BadRecord')
```

And finally, in yabadaba_demo/\_\_init\_\_.py:

```python
from . import value
from . import record
```

### 2.4. Demonstrate Album in action

Check the yabadaba modules for the 'album' record style and the 'time_delta' value style:

In [2]:
```python
yabadaba.check_modules()
```

```
Database styles that passed import:
- local: <class 'yabadaba.database.LocalDatabase.LocalDatabase'>
- mongo: <class 'yabadaba.database.MongoDatabase.MongoDatabase'>
- cdcs: <class 'yabadaba.database.CDCSDatabase.CDCSDatabase'>
Database styles that failed import:


Record styles that passed import:
- FAQ: <class 'yabadaba_demo.record.FAQ.FAQ'>
- album: <class 'yabadaba_demo.record.album.Album.Album'>
Record styles that failed import:
- bad_record: <class 'ModuleNotFoundError'>: No module named 'package_that_does_not_exist'


Query styles that passed import:
- bool_match: <class 'yabadaba.query.BoolMatchQuery.BoolMatchQuery'>
- str_contains: <class 'yabadaba.query.StrContainsQuery.StrContainsQuery'>
- str_match: <class 'yabadaba.query.StrMatchQuery.StrMatchQuery'>
- list_contains: <class 'yabadaba.query.ListContainsQuery.ListContainsQuery'>
- int_match: <class 'yabadaba.query.IntMatchQuery.IntMatchQuery'>
- float_match: <class 'yabadaba.query.FloatMatchQuery.FloatMatchQuery'>
- date_mat
```

Define an album with the common metadata, then add tracks with the add_track() method:

In [3]:
```python
album = yabadaba.load_record('album',
                             artist='Mad Season', 
                             album='Above',
                             releasedate='1995-03-14')

album.add_track(number=1, title='Wake Up', duration='00:07:37')
album.add_track(number=2, title='X-Ray Mind', duration='00:05:12')
album.add_track(number=3, title='River of Deceit', duration='00:05:03')
album.add_track(number=4, title="I'm Above", duration='00:05:45')
album.add_track(number=5, title='ARtificial Red', duration='00:06:16')
album.add_track(number=6, title='Lifeless Dead', duration='00:04:27')
album.add_track(number=7, title="I Don't Know Anything", duration='00:05:00')
album.add_track(number=8, title='Long Gone Day', duration='00:04:50')
album.add_track(number=9, title='November Hotel', duration='00:07:06')
album.add_track(number=10, title='All Alone', duration='00:04:13')
```

Note that releasedate is automatically transformed from a str into a datetime.date object, and similarly all track durations are transformed into pandas Timedelta objects (a subclass of datetime.timedelta):

In [4]:
```python
album.releasedate
```

```
datetime.date(1995, 3, 14)
```

In [5]:
```python
album.tracks[0].duration
```

```
Timedelta('0 days 00:07:37')
```

Calling tracks_metadata() collects all Track metadata into a pandas.DataFrame:

In [6]:
```python
album.tracks_metadata()
```

|   | title | number | duration | lyrics |
|---|-------|--------|----------|--------|
| 0 | Wake Up | 1 | 0 days 00:07:37 |  |
| 1 | X-Ray Mind | 2 | 0 days 00:05:12 |  |
| 2 | River of Deceit | 3 | 0 days 00:05:03 |  |
| 3 | I'm Above | 4 | 0 days 00:05:45 |  |
| 4 | ARtificial Red | 5 | 0 days 00:06:16 |  |
| 5 | Lifeless Dead | 6 | 0 days 00:04:27 |  |
| 6 | I Don't Know Anything | 7 | 0 days 00:05:00 |  |
| 7 | Long Gone Day | 8 | 0 days 00:04:50 |  |
| 8 | November Hotel | 9 | 0 days 00:07:06 |  |
| 9 | All Alone | 10 | 0 days 00:04:13 |  |


Calling metadata() creates a mostly flat dict (with a list of tracks metadata).  Note that in this representation, the releasedate and duration values are still Python objects (datetime.date and pandas Timedelta).

In [7]:
```python
album.metadata()
```

```
{'name': None,
 'artist': 'Mad Season',
 'producer': None,
 'album': 'Above',
 'releasedate': datetime.date(1995, 3, 14),
 'genre': None,
 'tracks': [{'title': 'Wake Up',
   'number': 1,
   'duration': Timedelta('0 days 00:07:37'),
   'lyrics': None},
  {'title': 'X-Ray Mind',
   'number': 2,
   'duration': Timedelta('0 days 00:05:12'),
   'lyrics': None},
  {'title': 'River of Deceit',
   'number': 3,
   'duration': Timedelta('0 days 00:05:03'),
   'lyrics': None},
  {'title': "I'm Above",
   'number': 4,
   'duration': Timedelta('0 days 00:05:45'),
   'lyrics': None},
  {'title': 'ARtificial Red',
   'number': 5,
   'duration': Timedelta('0 days 00:06:16'),
   'lyrics': None},
  {'title': 'Lifeless Dead',
   'number': 6,
   'duration': Timedelta('0 days 00:04:27'),
   'lyrics': None},
  {'title': "I Don't Know Anything",
   'number': 7,
   'duration': Timedelta('0 days 00:05:00'),
   'lyrics': None},
  {'title': 'Long Gone Day',
   'number': 8,
   'duration': Timedelta('0 days 00:04:
```
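The metadata above keeps Python objects, while the model representation stores strs. One practical reason, sketched here with the standard library only (an illustration, not yabadaba's code), is that JSON cannot serialize a datetime.date directly, whereas its 'YYYY-MM-DD' str form can be serialized and parsed back to the same date.

```python
import json
from datetime import date

# Setting converts the str into a datetime.date (as shown for releasedate)
releasedate = date.fromisoformat('1995-03-14')

# A datetime.date cannot be written to JSON directly...
try:
    json.dumps({'releasedate': releasedate})
    serializable = True
except TypeError:
    serializable = False

# ...but its 'YYYY-MM-DD' str form can, and it parses back to the same date
model_json = json.dumps({'releasedate': releasedate.isoformat()})
restored = date.fromisoformat(json.loads(model_json)['releasedate'])

print(serializable)             # False
print(model_json)               # {"releasedate": "1995-03-14"}
print(restored == releasedate)  # True
```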

Calling build_model() builds the model representation. Note that in this representation, the releasedate and duration values are strs, and values that were never set (producer, genre, lyrics) are omitted.

In [8]:
```python
model = album.build_model()
print(model.json(indent=4))
```

```
{
    "album": {
        "artist": "Mad Season",
        "album": "Above",
        "releasedate": "1995-03-14",
        "tracks": [
            {
                "title": "Wake Up",
                "number": 1,
                "duration": "0 days 00:07:37"
            },
            {
                "title": "X-Ray Mind",
                "number": 2,
                "duration": "0 days 00:05:12"
            },
            {
                "title": "River of Deceit",
                "number": 3,
                "duration": "0 days 00:05:03"
            },
            {
                "title": "I'm Above",
                "number": 4,
                "duration": "0 days 00:05:45"
            },
            {
                "title": "ARtificial Red",
                "number": 5,
                "duration": "0 days 00:06:16"
            },
            {
                "title": "Lifeless Dead",
                "number": 6,
                "duration": "0 days 00:04:27"
            },
```

The default queries for the Album class include the default queries for the values defined in both Album and Track, except for the track number query, which was deleted in Album's _init_values() method. Track's duration has no query because TimeDeltaValue does not define any default queries.

In [9]:
```python
print(album.querydoc)
```

```
# album Query Parameters

- __artist__ (*str or list, optional*): Return only the records where artist name contains the given values
- __producer__ (*str or list, optional*): Return only the records where producer name matches a given value
- __album__ (*str or list, optional*): Return only the records where album title matches a given value
- __releasedate__ (*str or list, optional*): Return only the records where release date matches a given value
- __genre__ (*str or list, optional*): Return only the records where genre contains a given value
- __title__ (*str or list, optional*): Return only the records where track title matches a given value
- __lyrics__ (*str or list, optional*): Return only the records where lyrics contains the given values
```
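To illustrate what these parameters mean, the sketch below filters album metadata dicts in plain Python. The functions are hypothetical and are not how yabadaba's Query classes or database styles actually execute queries; case-sensitive comparison and the rule that an inherited Track query matches an album if any of its tracks matches are assumptions.

```python
def as_list(targets):
    """Accept a single str or a list of strs, as the querydoc above allows."""
    return [targets] if isinstance(targets, str) else list(targets)


def str_contains(value, targets):
    """True if value contains any target (cf. the 'str_contains' style)."""
    return value is not None and any(t in value for t in as_list(targets))


def str_match(value, targets):
    """True if value exactly equals any target (cf. the 'str_match' style)."""
    return value in as_list(targets)


def album_matches(meta, artist=None, title=None):
    """Hypothetical filter over album metadata dicts."""
    if artist is not None and not str_contains(meta.get('artist'), artist):
        return False
    # Assumption: a query inherited from Track matches if ANY track matches
    if title is not None and not any(
            str_match(track.get('title'), title) for track in meta.get('tracks', [])):
        return False
    return True


album_meta = {
    'artist': 'Mad Season',
    'album': 'Above',
    'tracks': [{'title': 'Wake Up'}, {'title': 'All Alone'}],
}
print(album_matches(album_meta, artist='Season'))  # True
print(album_matches(album_meta, title='All'))      # False: str_match needs an exact match
```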

