# Introduction to yabadaba: Defining simple records

This Notebook demonstrates using yabadaba to interact with a very simple data schema for frequently asked question (FAQ) data, in which each record consists solely of a question and an answer.

In [1]:
# Standard Python libraries
import datetime

import yabadaba
from yabadaba.record import Record
from yabadaba import recordmanager

# Show yabadaba version
print('yabadaba version =', yabadaba.__version__)

# Show date of Notebook execution
print('Notebook executed on', datetime.date.today())

yabadaba version = 0.3.0
Notebook executed on 2025-02-13


## 1. FAQ records

The FAQ records are incredibly simple data models that have a single root of 'faq' and two fields: the 'question' and the associated 'answer'.

In [2]:
faq_faq_json = """{
    "faq": {
        "question": "What does a FAQ Record represent?",
        "answer": "A frequently asked question and the corresponding answer."
    }
}"""

woodchuck_faq_json = """{
    "faq": {
        "question": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
        "answer": "A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood."
    }
}"""

fuzzy_faq_json = """{
    "faq": {
        "question": "Fuzzywuzzy was a bear. Fuzzywuzzy had no hair. So Fuzzywuzzy wasn't fuzzy, was he?",
        "answer": "Nope."
    }
}"""
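
As a quick check that does not depend on yabadaba, the standard json module can confirm that a record string has the shape described above: a single 'faq' root holding exactly a 'question' and an 'answer'. The helper name `check_faq_json` and the variable `sample_json` are illustrative assumptions and are not part of yabadaba.

```python
import json

def check_faq_json(content: str) -> dict:
    """Parse a FAQ JSON string and verify its expected shape.

    Hypothetical helper for illustration only; not part of yabadaba.
    """
    data = json.loads(content)

    # Exactly one root element named 'faq'
    if list(data.keys()) != ['faq']:
        raise ValueError("expected a single root element named 'faq'")

    # Exactly the two fields 'question' and 'answer'
    if set(data['faq'].keys()) != {'question', 'answer'}:
        raise ValueError("expected only 'question' and 'answer' fields")

    return data['faq']

sample_json = """{
    "faq": {
        "question": "What does a FAQ Record represent?",
        "answer": "A frequently asked question and the corresponding answer."
    }
}"""

fields = check_faq_json(sample_json)
print(fields['question'])
```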

## 2. Record class definition

This section shows example code for defining a Record class for the FAQ data and describes its components.

### 2.1. Define the FAQ class

We define a Record class for the FAQ data shown above that provides some basic metadata information and specifies what the values are. 

In [3]:
class FAQ(Record):
    """
    Class for representing FAQ (frequently asked question) records.
    """
    ########################## Basic metadata fields ##########################

    @property
    def style(self) -> str:
        """str: The record style"""
        return 'FAQ'

    @property
    def modelroot(self):
        """str: The root element of the content"""
        return 'faq'

    ############################# Define Values  ##############################

    def _init_values(self):
        """
        Method that defines the value objects for the Record.  This should
        call the super of this method, then use self._add_value to create
        new Value objects.  Note that the order in which values are defined
        matters when build_model is called.
        """
        self._add_value('longstr', 'question')
        self._add_value('longstr', 'answer')

### 2.2. Basic metadata attributes

Each Record class has a few basic metadata attributes that help either with data transformations or with database interactions.  The two required attributes are

- __style__ is the style name associated with the class, i.e. how the recordmanager (see below) will identify it.
- __modelroot__ is the name of the root element of the data model schema.

### 2.3. Define Values

The purpose of the Record class is to manage data transformations between multiple different formats.  The easiest way to do this is to define an _init_values() method for the Record class and within it use self._add_value() to define all values in your schema. The _add_value() method creates a new Value object and automatically associates it with the Record class.

Parameters of the _add_value() method:

- __style__ (*str*) The value style.
- __name__ (*str*) The name for the parameter value.  This corresponds to the name of the associated class attribute.
- __defaultvalue__ (*any or None, optional*) The default value to use for the property.  The default value of None indicates that there is no default value.
- __valuerequired__ (*bool, optional*) Indicates if a value must be given for the property.  If True, then checks will be performed that a value is assigned to the property.
- __allowedvalues__ (*tuple or None, optional*) A list/tuple of values that the parameter is restricted to have. Setting this to None (default) indicates any value is allowed.
- __metadatakey__ (*str, bool or None, optional*) The key name to use for the property when constructing the record metadata dict.  If set to None (default) then name will be used for metadatakey.  If set to False then the parameter will not be included in the metadata dict.
- __metadataparent__ (*str or None, optional*) If given, then this indicates that the metadatakey is actually an element of a dict in metadata with this name.  This allows for limited support for metadata having embedded dicts.
- __modelpath__ (*str, optional*) The period-delimited path after the record root element for where the parameter will be found in the built data model.  If set to None (default) then name will be used for modelpath.
- __description__ (*str or None, optional*) A short description for the value.  If not given, then the record name will be used.
- __\*\*kwargs__ (*any, optional*) Any additional value style-specific keyword parameters.
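
To make the modelpath description more concrete, the sketch below resolves a period-delimited path beneath a root element in a plain nested dict. It is a stand-alone illustration using only built-in types: the function name `resolve_modelpath` and the nested 'source.answer' layout are hypothetical, and the code is not meant to reflect how yabadaba performs the lookup internally.

```python
def resolve_modelpath(model: dict, modelroot: str, modelpath: str):
    """Follow a period-delimited path below the root element of a nested dict."""
    element = model[modelroot]
    for key in modelpath.split('.'):
        element = element[key]
    return element

# Hypothetical model in which the answer is nested one level deeper
model = {
    'faq': {
        'question': 'Where is the answer stored?',
        'source': {'answer': 'Under faq -> source -> answer.'},
    }
}

# A bare name is a one-step path; 'source.answer' takes two steps
print(resolve_modelpath(model, 'faq', 'question'))
print(resolve_modelpath(model, 'faq', 'source.answer'))
```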

### 2.4. Define data-specific methods and attributes

Once the above connections with yabadaba have been defined, the rest of the Record class can consist of data-specific methods and properties allowing users to better interact with the data.

## 3. Interacting with records

Defining record classes that inherit from the base yabadaba Record class provides a variety of useful tools for interacting with and transforming the data associated with a single database entry even before database interactions are considered.

### 3.1. Initializing records

The default parameters for initializing a Record object are

- __model__ (*str, file-like object, or DataModelDict, optional*)  The contents of a record to load. This can either be a str containing JSON or XML content, a path name to a file containing such content, or a DataModelDict object that interprets such content.
- __name__ (*str, optional*) The unique name to assign to the record.  If model is a file path, then the default record name is the file name without extension.
- __database__ (*yabadaba.Database, optional*) A default Database to associate with the Record, typically the Database that the Record was obtained from.  Can allow for Record methods to perform Database operations without needing to specify which Database to use.
- __\*\*kwargs__ (*any*) Any record-specific attributes to assign.

These parameters provide two main routes for creating a new record object

1. Load in existing record contents from a data model representation (i.e. JSON or XML).
2. Create a new record entry and start assigning values to it.

Using the FAQ class defined in section 2, we can directly load the JSON contents for the three records defined in section 1.

In [4]:
faq_faq = FAQ(name='faq', model=faq_faq_json)
woodchuck_faq = FAQ(name='woodchuck', model=woodchuck_faq_json)
fuzzy_faq = FAQ(name='fuzzy', model=fuzzy_faq_json)

Alternatively, we can initialize new FAQ objects which can be used to build new database entries.  The Value objects defined for the record are automatically recognized as kwargs that can be assigned during the initialization.

In [5]:
init_faq = FAQ(name='init', question='Can I assign values during init?', answer='Yes, you can!')

Any values not assigned during init are given a default value: either None or whatever was specified as their "defaultvalue".

In [6]:
build_faq = FAQ()

### 3.2. Basic attribute interactions

Once created, a record object will have default parameters that characterize the class as well as attributes associated with any defined Values.

In [7]:
print(woodchuck_faq)

FAQ record named woodchuck


In [8]:
print(woodchuck_faq.name)

woodchuck


In [9]:
print(woodchuck_faq.style)

FAQ


In [10]:
print(woodchuck_faq.modelroot)

faq


In [11]:
print(woodchuck_faq.question)

How much wood would a woodchuck chuck if a woodchuck could chuck wood?


In [12]:
print(woodchuck_faq.answer)

A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.


Any values that were not assigned during initialization can be directly assigned to these attributes.

In [13]:
print(build_faq.name)
print(build_faq.question)
print(build_faq.answer)

None
None
None


In [14]:
build_faq.name = 'build'
build_faq.question = 'Is it easy to build record contents by assigning to object attributes?'
build_faq.answer = 'It seems that way to me.'

print(build_faq.name)
print(build_faq.question)
print(build_faq.answer)

build
Is it easy to build record contents by assigning to object attributes?
It seems that way to me.


### 3.3. Data transformation methods

There are also a number of data transformation methods built into the Record class.  These primarily allow for interfacing with different database infrastructures, but they also offer alternative ways of interacting with the data that may be more convenient in some situations.

#### 3.3.1. Metadata (flat dict) representation

The metadata() method of a record is intended to generate a simple flat dictionary where the values are simple data types like int, float and str.  This is not intended to provide a full representation of the record data in all cases, but rather a quick survey of simple data and metadata values of the record. 

In [15]:
woodchuck_faq.metadata()

{'name': 'woodchuck',
 'question': 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?',
 'answer': 'A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.'}

The metadata() method regenerates the dict contents from the currently assigned values of the Value objects each time it is called.  

In [16]:
build_faq.metadata()

{'name': 'build',
 'question': 'Is it easy to build record contents by assigning to object attributes?',
 'answer': 'It seems that way to me.'}

__NOTE__: Changing values in either the metadata dict or the Python object will not automatically adjust the values in the other.

In [17]:
meta = build_faq.metadata()
meta['answer'] = 'trying to assign a new answer to the meta dict'

# Printing the answer value attribute shows the original unchanged value
print(build_faq.answer)

It seems that way to me.


#### 3.3.2. Data model representation

The record can also be transformed to/from the tree-like "data model" representation that can exist in one of three formats:
1. JSON string,
2. XML string, or
3. Embedded Python dicts and lists analogous to the JSON.

The model contents can be accessed using the model attribute.  This returns embedded DataModelDict objects of the content, where DataModelDict is an extension of the basic Python dict to include built-in conversion methods to/from JSON and XML and some other tools supporting building and interacting with data models.

In [18]:
# Show the DataModelDict representation
woodchuck_faq.model

DataModelDict([('faq',
                DataModelDict([('question',
                                'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'),
                               ('answer',
                                'A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.')]))])

In [19]:
# Instantly convert to JSON
print(woodchuck_faq.model.json(indent=4))

{
    "faq": {
        "question": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
        "answer": "A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood."
    }
}


In [20]:
# Instantly convert to XML
print(woodchuck_faq.model.xml(indent=4))

<?xml version="1.0" encoding="utf-8"?>
<faq>
    <question>How much wood would a woodchuck chuck if a woodchuck could chuck wood?</question>
    <answer>A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.</answer>
</faq>


__NOTE__: As with the metadata() contents, changes made to the record object's attribute values or to the values in the model contents are not automatically reflected in the other representation.  However, because the model is a full representation of the data, there are built-in methods for converting between the two.

- build_model() will (re)build the model representation based on the current values assigned to the Python representation.
- load_model() will read in JSON/XML/DataModelDict contents and assign the values to the correct Python attributes.  Calling this is equivalent to initializing a new record using the "model" parameter.
- reload_model() will perform load_model() using the current model contents.  This allows for manual changes to the model that are then reflected in the object.
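
The relationship between the three methods can be sketched with a toy class that keeps its attributes and its model as two separate representations. `ToyRecord` is a hypothetical stand-in written only for illustration; it is not yabadaba's Record class and makes no claim about how yabadaba implements these methods.

```python
import json

class ToyRecord:
    """Toy stand-in (not yabadaba's Record) with two separate representations."""

    def __init__(self, question=None, answer=None):
        self.question = question
        self.answer = answer
        self.model = None  # no model contents until built or loaded

    def build_model(self) -> dict:
        """(Re)build the model from the current attribute values and return it."""
        self.model = {'faq': {'question': self.question, 'answer': self.answer}}
        return self.model

    def load_model(self, model) -> None:
        """Read model contents (dict or JSON str) and assign the attributes."""
        if isinstance(model, str):
            model = json.loads(model)
        self.model = model
        self.question = model['faq']['question']
        self.answer = model['faq']['answer']

    def reload_model(self) -> None:
        """Re-read the current model so manual edits reach the attributes."""
        self.load_model(self.model)

toy = ToyRecord(question='Are the two representations linked?', answer='No.')
toy.build_model()

# Editing the model leaves the attribute untouched until a reload
toy.model['faq']['answer'] = 'Only after an explicit reload.'
answer_before_reload = toy.answer

toy.reload_model()
answer_after_reload = toy.answer
print(answer_before_reload, '|', answer_after_reload)
```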

In [21]:
# The build_faq object does not currently have model contents as none were loaded in
print(build_faq.model)

None


In [22]:
# Calling build_model() generates the model and returns it
print(build_faq.build_model().json(indent=4))

{
    "faq": {
        "question": "Is it easy to build record contents by assigning to object attributes?",
        "answer": "It seems that way to me."
    }
}


In [23]:
# Now, the model contents exist
print(build_faq.model.xml(indent=4))

<?xml version="1.0" encoding="utf-8"?>
<faq>
    <question>Is it easy to build record contents by assigning to object attributes?</question>
    <answer>It seems that way to me.</answer>
</faq>


In [24]:
# If I change the model answer, the object answer is not immediately changed
build_faq.model['faq']['answer'] = 'Yes, and you can assign to model contents as well.'
print(build_faq.answer)

# But, calling reload_model() will update the values
build_faq.reload_model()
print(build_faq.answer)

It seems that way to me.
Yes, and you can assign to model contents as well.


## 4. Integrate the record into yabadaba for database interactions

Once a Record subclass has been defined, yabadaba needs to know about it so that the database interactions know how to interpret data of the associated schema.

### 4.1. Directly add to recordmanager

The simplest way to integrate a record into yabadaba is by importing yabadaba.recordmanager and assigning the record class to the recordmanager's loaded_styles dict.  The key should match the record's style attribute, and the value is the record class.  This is useful for prototyping and testing, but becomes unwieldy as the record grows more complicated or when other people wish to use it.

In [25]:
recordmanager.loaded_styles['FAQ'] = FAQ

In [26]:
# Show that FAQ is now a recognized record style
recordmanager.check_styles()

Record styles that passed import:
- FAQ: <class '__main__.FAQ'>
Record styles that failed import:



Once in the recordmanager, other yabadaba methods will know about the style and the record class.  For instance:

- yabadaba.load_record() can be used to create a new record of the given class simply by providing the style attribute.
- A Database object will recognize the record style as corresponding to a given record schema and will allow for queries and automatic interpretation of the data (more in the next Notebook).
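
The idea of creating a record from its style name alone can be pictured as a dictionary lookup followed by instantiation. The sketch below uses only hypothetical stand-ins (`ToyFAQ`, `load_toy_record`, and a plain dict named `loaded_styles`); it illustrates the pattern and is not yabadaba's actual implementation.

```python
# Hypothetical stand-ins for illustration; these are not yabadaba objects
loaded_styles = {}

class ToyFAQ:
    style = 'FAQ'

    def __init__(self, name=None, **kwargs):
        self.name = name
        self.question = kwargs.get('question')
        self.answer = kwargs.get('answer')

def load_toy_record(style: str, **kwargs):
    """Look up the class registered for a style name and instantiate it."""
    try:
        record_class = loaded_styles[style]
    except KeyError:
        raise KeyError(f"record style '{style}' has not been registered") from None
    return record_class(**kwargs)

# Register the class under its style name, then create a record by name alone
loaded_styles[ToyFAQ.style] = ToyFAQ
toy = load_toy_record('FAQ', name='lookup', question='Found by style name?')
print(type(toy).__name__, toy.name)
```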

In [27]:
record = yabadaba.load_record('FAQ', name='load_record test', question='Does load_record know of the FAQ style?', answer='Yes, once it is added to the recordmanager')
print(record.build_model().xml(indent=2))

<?xml version="1.0" encoding="utf-8"?>
<faq>
  <question>Does load_record know of the FAQ style?</question>
  <answer>Yes, once it is added to the recordmanager</answer>
</faq>


In [28]:
# Create a new database object to interact with the local demo database
database = yabadaba.load_database(style='local', host='yabadaba_demo_database')

In [29]:
# Get the FAQ records in the database
records, records_df = database.get_records(style='FAQ', return_df=True)
records_df

| | name | question | answer |
|---|---|---|---|
| 0 | faq | What does a FAQ Record represent? | A frequently asked question and the correspond... |
| 1 | fuzzy | Fuzzywuzzy was a bear. Fuzzywuzzy had no hair.... | Nope. |
| 2 | woodchuck | How much wood would a woodchuck chuck if a woo... | A woodchuck would chuck as much wood as a wood... |


### 4.2. Create a new package containing the record

The more practical method is to place the Record class definition in a new package and import it.  The yabadaba_demo folder in this doc directory does just that.

The yabadaba_demo package is very minimal in design to showcase only what is needed.

- The main \_\_init\_\_.py imports a record submodule.
- The record submodule contains record definitions either within python files in the record directory or in further subdirectories.  
- The \_\_init\_\_.py of the record submodule imports the recordmanager from yabadaba and calls its import_style() method to dynamically import each defined record class.
- Supporting XSL and XSD files are included in the xsl and xsd subdirectories.  These are optional and can be used to transform and validate the XML representation of the records. 

#### 4.2.1. recordmanager.import_style()

The recordmanager.import_style() method imports a defined Record class and, if successful, adds it to the recordmanager.loaded_styles dict. If the import fails, then the error message is caught and added to the recordmanager.failed_styles dict.  This allows for a modular treatment of the records where the core package being imported can still be run even if some of the record styles fail.  This can be useful when a record style has package requirements beyond what the core package needs.

The parameters for import_style() are

- __style__ (*str*) The style name to associate with the modular class.  This should match the style property assigned in the class.
- __modulename__ (*str*) The name of the module to try to import.
- __package__ (*str, optional*) The name of the package which is to act as the anchor for resolving relative package names.
- __classname__ (*str, optional*) The name of the class in the imported module to associate with the style. If not given, will use the final name of the modulename path.
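
The import-or-record-failure behavior described above can be approximated with the standard importlib module. This is a minimal sketch under stated assumptions: the function name `import_style_sketch` and the two module-level dicts are hypothetical stand-ins, and a standard-library class takes the place of a record class so that the example runs without yabadaba.

```python
import importlib

loaded_styles = {}   # style name -> successfully imported class
failed_styles = {}   # style name -> error message from the failed import

def import_style_sketch(style, modulename, package=None, classname=None):
    """Try to import a class and record either the class or the failure."""
    if classname is None:
        # Default to the final name in the module path
        classname = modulename.split('.')[-1]
    try:
        module = importlib.import_module(modulename, package)
        loaded_styles[style] = getattr(module, classname)
    except Exception as err:
        # Keep going: store the error instead of letting the import abort
        failed_styles[style] = f'{type(err)}: {err}'

# A standard-library class stands in for a record class here
import_style_sketch('ordered', 'collections', classname='OrderedDict')
import_style_sketch('bad_record', 'package_that_does_not_exist')

print('passed:', list(loaded_styles))
print('failed:', failed_styles)
```

Catching the exception rather than re-raising it is what lets the remaining styles load when one of them has a missing dependency.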


The code and comments from yabadaba_demo/record/\_\_init\_\_.py are shown below demonstrating how to use import_style.

```Python
# from .FAQ import FAQ as the 'FAQ' record style
recordmanager.import_style('FAQ', '.FAQ', __name__, 'FAQ')

# from .demo import Demo as the 'demo' record style
recordmanager.import_style('demo', '.demo', __name__, 'Demo')
```


#### 4.2.2. Using the package

With the setup mentioned above, the package can be used by first importing yabadaba and then the yabadaba_demo package. Since yabadaba was imported at the top of this Notebook, all we need to do is import the demo code.

In [30]:
import yabadaba_demo

Once the demo package is imported, yabadaba will know about the defined Record classes within it.

In [31]:
yabadaba.recordmanager.check_styles()

Record styles that passed import:
- FAQ: <class 'yabadaba_demo.record.FAQ.FAQ'>
- album: <class 'yabadaba_demo.record.album.Album.Album'>
Record styles that failed import:
- bad_record: <class 'ModuleNotFoundError'>: No module named 'package_that_does_not_exist'

