# Introduction to yabadaba: Build a Package and Interact with Databases

Once Record classes have been defined, they can easily be integrated into a package so that yabadaba's Database operations can be used with them.

In [1]:
# Standard Python libraries
import datetime
from pathlib import Path

import yabadaba
from yabadaba.record import Record

from yabadaba import load_record, recordmanager

# https://github.com/usnistgov/DataModelDict
from DataModelDict import DataModelDict as DM

import pandas as pd

# Show yabadaba version
print('yabadaba version =', yabadaba.__version__)

# Show date of Notebook execution
print('Notebook executed on', datetime.date.today())

yabadaba version = 0.2.0
Notebook executed on 2023-04-03


## 1. ModuleManager objects

The yabadaba package treats subclasses of the base Record, Database and Query classes in a modular fashion.  This modular handling is supported by ModuleManager objects, which provide a common framework for interacting with the subclasses.  Three ModuleManager objects are included in yabadaba: recordmanager, databasemanager, and querymanager.  Each one interacts with a single family of subclasses.  The benefits of the ModuleManagers are that
- The ModuleManagers handle the importing of the subclasses in a way that allows for new subclasses to be easily added and any subclasses that fail import to not break the code.  This is useful if certain subclasses require additional optional packages to be installed, or if a particular subclass is not fully implemented yet due to it being in development or in need of an update.
- Any subclasses that get successfully imported can be accessed using their assigned style.  Conversely, the error messages are caught and retained for any subclasses that failed import. 
- Subsequent code can then interact with the ModuleManagers without needing to know what styles are available.
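
The import handling described above can be illustrated with a small standard-library sketch. The `StyleRegistry` class below is a hypothetical stand-in written for this illustration and is not yabadaba's actual ModuleManager code; it only shows the idea of storing successful imports by style name while retaining the error messages of failed ones.

```python
import importlib


class StyleRegistry:
    """Toy registry (hypothetical): maps style names to classes and remembers import failures."""

    def __init__(self):
        self.loaded_styles = {}   # style name -> successfully imported class
        self.failed_styles = {}   # style name -> error message from the failed import

    def import_style(self, style, modulename, classname):
        """Try to import modulename.classname; record the error instead of raising it."""
        try:
            module = importlib.import_module(modulename)
            self.loaded_styles[style] = getattr(module, classname)
        except Exception as err:
            self.failed_styles[style] = f'{type(err).__name__}: {err}'


registry = StyleRegistry()

# A module that exists in the standard library
registry.import_style('fraction', 'fractions', 'Fraction')

# A module that does not exist: the failure is caught and stored
registry.import_style('missing', 'no_such_module_for_demo', 'Nothing')

print(sorted(registry.loaded_styles))
print(registry.failed_styles)
```

Code that uses the registry afterwards only needs to look up a style name, and a failed import never stops the other styles from loading.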

## 2. Adding records to the recordmanager

Records can be added to the recordmanager either by adding them to the recordmanager's loaded_styles dict or by importing the class from code in a Python package.  It is recommended to do the latter, as detailed in this section.

### 2.1. Record class definitions

The yabadaba.demo subpackage contains definitions for two Record classes

- The file yabadaba/demo/FAQ.py defines the FAQ Record class as seen in the previous Notebook.
- The file yabadaba/demo/BadRecord.py defines the BadRecord Record class, which is meant to fail importing. 

### 2.2. Import the Records

Once the locations of the classes are known, they can be dynamically imported and added to the recordmanager using the import_style method.  The import_style method has the following parameters
- __style__ (*str*) The style name to associate with the modular class.
- __modulename__ (*str*) The name of the module to try to import.
- __package__ (*str, optional*) The name of the package which is to act as the anchor for resolving a relative module.
- __classname__ (*str, optional*) The name of the class in the module being imported to associate with the style.  If not given, will assume that the classname corresponds to the module that it is in.
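
How the modulename, package and classname parameters relate to each other can be sketched with Python's own importlib. The `resolve_class` helper is hypothetical, and its fallback rule (use the last component of the module name as the class name) is an assumption based on the description of classname above; the standard-library modules are used only so that the sketch runs anywhere.

```python
import importlib


def resolve_class(modulename, package=None, classname=None):
    """Import a module (optionally relative to package) and return one of its classes."""
    module = importlib.import_module(modulename, package=package)
    if classname is None:
        # Assumed fallback: the class is named after the module that contains it
        classname = module.__name__.rsplit('.', 1)[-1]
    return getattr(module, classname)


# Absolute module name; the class name is taken from the module name
cls_a = resolve_class('datetime')

# Relative module name anchored on a package, with an explicit class name
cls_b = resolve_class('.decoder', package='json', classname='JSONDecoder')

print(cls_a, cls_b)
```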


In [2]:
# Class FAQ is in module yabadaba.demo.FAQ.  This is being assigned to style FAQ.
recordmanager.import_style(style='FAQ', modulename='yabadaba.demo.FAQ')

# Class BadRecord is in module yabadaba.demo.BadRecord. This is being assigned to style bad.
recordmanager.import_style(style='bad', modulename='yabadaba.demo.BadRecord')

### 2.3. Check the status of imported Records

Most of the other methods and attributes of the ModuleManager objects, such as recordmanager, are oriented towards checking the status of the imported subclasses or accessing them.

- __loaded_styles__ (*dict*) collects the successfully loaded subclasses according to their style names.
- __failed_styles__ (*dict*) collects the error messages raised by the subclasses that failed import according to their style names.
- __loaded_style_names__ (*list*) lists the style names of the successfully loaded subclasses.
- __failed_style_names__ (*list*) lists the style names of the subclasses that failed import.
- __check_styles()__ prints informative details about both the successfully and unsuccessfully loaded subclasses.
- __assert_style(style)__ raises an error if the given style is not in loaded_style_names.


In [3]:
recordmanager.check_styles()

Record styles that passed import:
- FAQ: <class 'yabadaba.demo.FAQ.FAQ'>
Record styles that failed import:
- bad: <class 'NotImplementedError'>: BadRecord is meant to fail import!



## 3. Creating records

Once a Record style has been added to the recordmanager, new record objects can be initialized using the load_record function.  Similar load_database and load_query methods exist for accessing the styles contained in the databasemanager and querymanager objects, respectively.

Here, we'll create the same FAQ objects as the previous Notebook, but use load_record instead.

In [4]:
records = []

# Load the first three records from their JSON
model = """{
    "faq": {
        "question": "What does a FAQ Record represent?",
        "answer": "A frequently asked question and the corresponding answer."
    }
}"""
records.append(load_record(style='FAQ', name='faq', model=model))

model = """{
    "faq": {
        "question": "How much wood would a woodchuck chuck if a woodchuck could chuck wood?",
        "answer": "A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood."
    }
}"""
records.append(load_record(style='FAQ', name='woodchuck', model=model))

model = """{
    "faq": {
        "question": "Fuzzywuzzy was a bear. Fuzzywuzzy had no hair. So Fuzzywuzzy wasn't fuzzy, was he?",
        "answer": "Nope."
    }
}"""
records.append(load_record(style='FAQ', name='fuzzy', model=model))

# Create the last record from parameters and build_model
records.append(load_record(style='FAQ', name='define', 
                           question="Can I define a FAQ using parameters?",
                           answer="Yes, you can."))
records[-1].build_model()

for record in records:
    print(record)

FAQ record named faq
FAQ record named woodchuck
FAQ record named fuzzy
FAQ record named define


## 4. Interacting with databases

Everything is now in place to interact with databases.  The Database classes are managed using the databasemanager in a fashion similar to how the recordmanager handles Records.  The main difference is that some Database styles are already defined, meaning that you (hopefully) won't need to define your own Database classes.

### 4.1. Initializing databases

Database objects are initialized by providing all of the necessary access parameters for the specific database.  As these vary between the different styles of databases, the allowed parameters also vary.  

#### 4.1.1. Local database 

A local style Database exists as a local directory containing individual JSON or XML files, as well as csv cache files containing the compiled metadata associated with the hosted records.  The cache files greatly increase the speed of queries because they allow the records' metadata to be parsed without reading and loading all of the individual files.  A local-style database is initialized with the following parameters

- __host__ (*str*) The host name (local directory path) for the database.
- __format__ (*str, optional*) The format that the model records are saved as.  Can be either JSON or XML.  Default value is JSON.
- __indent__ (*int or None, optional*) The indentation used when saving the records.  If None (default) then the saved records are compact.  Otherwise, the lines in the file will be indented by multiples of this value based on the model's element recursion.  Compact records are smaller, while indented records are easier to read.
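
The trade-off between compact and indented records can be seen with the standard json module. This is only an illustration of the effect of an indent setting; it does not go through yabadaba or DataModelDict, and the exact text that yabadaba writes to file may differ.

```python
import json

model = {
    'faq': {
        'question': 'What does a FAQ Record represent?',
        'answer': 'A frequently asked question and the corresponding answer.',
    }
}

# indent=None gives a single-line, compact representation
compact = json.dumps(model, indent=None)

# indent=4 puts each element on its own line, indented by nesting level
indented = json.dumps(model, indent=4)

print(len(compact), 'characters in the compact form')
print(len(indented), 'characters in the indented form')
print(indented)
```

Both strings parse back to the same content, so the choice only affects file size and readability.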

In [5]:
db = yabadaba.load_database(style='local', host='testdb')
print(db)

database style local at C:\Users\lmh1\Documents\Python-packages\yabadaba\doc\testdb


#### 4.1.2. Mongo database 

A mongo style Database interacts with a MongoDB instance. The initialization parameters correspond to those of the pymongo.MongoClient class, which the Database class uses to connect.

- __host__ (*str*) The mongo host to connect to.  Default value is 'localhost'.
- __port__ (*int*) The port to use in connecting to the mongo host.  Default value is 27017.
- __database__ (*str*) The name of the database in the mongo host to interact with. Default value is 'iprPy'
- __\*\*kwargs__ (*dict, optional*) Any extra keyword arguments needed to initialize a pymongo.MongoClient object.

#### 4.1.3. CDCS database

A cdcs style Database interacts with a CDCS (Configurable Data Curation System) instance.  CDCS databases provide REST APIs, so the initialization parameters correspond to the credentials and permissions used for the web requests.

- __host__ (*str*) The host name (url) for the database.
- __username__ (*str or tuple of two str*) The username to use for accessing the database.  Alternatively, a tuple of (username, password).
- __password__ (*str, optional*) The password associated with username to use for accessing the database. This can either be the password as a str, or a str path to a file containing only the password. If not given, a prompt will ask for the password.
- __cert__ (*str, optional*) The path to a certification file if needed for accessing the database.
- __verify__ (*bool, optional*) Indicates whether verification of the site is used when making requests.
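
The three ways of supplying a password described above (a str, a path to a file, or an interactive prompt) could be handled along the following lines. The `resolve_password` helper is hypothetical and is not taken from yabadaba; in particular, the rule used to decide whether a str is a file path is an assumption made for this sketch.

```python
import getpass
import tempfile
from pathlib import Path


def resolve_password(password=None):
    """Return the password from a str, from a file path, or from an interactive prompt."""
    if password is None:
        # Nothing given: ask the user without echoing the input
        return getpass.getpass('password: ')
    candidate = Path(password)
    if candidate.is_file():
        # Assumption: an existing file holds only the password
        return candidate.read_text().strip()
    return password


with tempfile.TemporaryDirectory() as tmp:
    pwfile = Path(tmp, 'pw.txt')
    pwfile.write_text('s3cret\n')
    from_file = resolve_password(str(pwfile))

from_str = resolve_password('not-a-file-s3cret')

print(from_file, from_str)
```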

### 4.2. Interacting with records

#### 4.2.1. Adding/updating records

Records can be added to or updated in a database using the add_record() and update_record() Database methods.  Many of the database operations in yabadaba work on the principle that all records of a given style should have unique names.  The two methods adhere to that principle as follows
- __add_record()__ will add a new record to the database as long as no similarly named record exists.  If a record with the same style and name is already in the database, then the method will throw an error.
- __update_record()__ will update the contents of an existing record in the database.  If no record in the database has the matching name and style, then the method will throw an error.

A combined add/update can easily be done with code that does try: add, except: update.
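
A minimal sketch of that try: add, except: update pattern is given here. The `ToyDatabase` class is an in-memory stand-in written for this example, and the use of ValueError is an assumption; the exception type that yabadaba itself raises is not stated in this Notebook, so check it before copying the except clause.

```python
class ToyDatabase:
    """In-memory stand-in (hypothetical) that enforces unique (style, name) pairs."""

    def __init__(self):
        self.records = {}

    def add_record(self, style, name, model):
        if (style, name) in self.records:
            raise ValueError(f'{style} record named {name} already exists')
        self.records[(style, name)] = model

    def update_record(self, style, name, model):
        if (style, name) not in self.records:
            raise ValueError(f'no {style} record named {name} to update')
        self.records[(style, name)] = model


def add_or_update(database, style, name, model):
    """Add the record, falling back to an update if the name is already taken."""
    try:
        database.add_record(style, name, model)
        return 'added'
    except ValueError:
        database.update_record(style, name, model)
        return 'updated'


toy_db = ToyDatabase()
first = add_or_update(toy_db, 'FAQ', 'fuzzy', {'answer': 'Nope.'})
second = add_or_update(toy_db, 'FAQ', 'fuzzy', {'answer': 'Still nope.'})

print(first, second)
```

Catching only the specific "already exists" error, rather than every exception, keeps unrelated problems such as connection failures from being silently turned into updates.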

Both methods take similar parameters

- __record__ (*Record, optional*) The new record to add to the database.  If not given, then name, style and model are required.
- __style__ (*str, optional*) The record style for the new record.  Required if record is not given.
- __name__ (*str, optional*) The name to assign to the new record.  Required if record is not given.
- __model__ (*str or DataModelDict, optional*) The model contents of the new record.  Required if record is not given.
- __build__ (*bool, optional*) If True, then the uploaded content will be (re)built based on the record's attributes.  If False (default), then record's existing content will be loaded if it exists, or built if it doesn't exist.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.

In [6]:
for record in records:
    db.add_record(record, verbose=True)

FAQ record named faq added to C:\Users\lmh1\Documents\Python-packages\yabadaba\doc\testdb
FAQ record named woodchuck added to C:\Users\lmh1\Documents\Python-packages\yabadaba\doc\testdb
FAQ record named fuzzy added to C:\Users\lmh1\Documents\Python-packages\yabadaba\doc\testdb
FAQ record named define added to C:\Users\lmh1\Documents\Python-packages\yabadaba\doc\testdb


#### 4.2.2. Exploring records

There are three related get methods that query the database for matching Records.

- __get_records()__ returns a numpy.ndarray containing Record objects that match the query.
- __get_record()__ returns a single Record object if exactly one matching record is found.  Otherwise it will raise an error.
- __get_records_df()__ returns a pandas.DataFrame of Record metadata for all records that match the query.

Parameters for the three methods are comparable
        
- __style__ (*str, optional*) The record style to search. If not given, a prompt will ask for it.
- __return_df__ (*bool, optional*) Only accepted by get_records. If True, then the corresponding pandas.DataFrame of metadata will also be returned.
- __\*\*kwargs__ (*any, optional*) Any extra options specific to the database style or metadata search parameters specific to the record style.

In [7]:
# Fetch all records
db.get_records_df('FAQ')

| | name | question | answer |
|---|---|---|---|
| 0 | define | Can I define a FAQ using parameters? | Yes, you can. |
| 1 | faq | What does a FAQ Record represent? | A frequently asked question and the correspond... |
| 2 | fuzzy | Fuzzywuzzy was a bear. Fuzzywuzzy had no hair.... | Nope. |
| 3 | woodchuck | How much wood would a woodchuck chuck if a woo... | A woodchuck would chuck as much wood as a wood... |


__NOTE for local-style Databases__: These methods use the csv cache files to speed up query operations.  The cache automatically checks for records that have been added or deleted, but not records that have been updated. If records have been updated, add refresh_cache=True when calling the methods to rebuild the cache for all records.
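
Why updated records are missed can be seen in a toy cache that, like the behavior described in the note, only compares record names. The `cached_metadata` function, the cache.csv file name and the single cached field are all assumptions made for this sketch; yabadaba's real cache layout is not shown in this Notebook.

```python
import csv
import json
import tempfile
from pathlib import Path


def cached_metadata(directory, refresh_cache=False):
    """Return {name: question} for the JSON records in directory, using a csv cache."""
    directory = Path(directory)
    cachefile = directory / 'cache.csv'

    # Start from the existing cache unless a full rebuild is requested
    cache = {}
    if cachefile.is_file() and not refresh_cache:
        with open(cachefile, newline='') as f:
            cache = {row['name']: row['question'] for row in csv.DictReader(f)}

    on_disk = {path.stem: path for path in directory.glob('*.json')}

    # Deleted records: drop cache rows whose file is gone
    cache = {name: question for name, question in cache.items() if name in on_disk}

    # Added records: parse only the files that are missing from the cache
    for name, path in on_disk.items():
        if name not in cache:
            cache[name] = json.loads(path.read_text())['faq']['question']

    with open(cachefile, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'question'])
        writer.writerows(sorted(cache.items()))
    return cache


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp, 'faq.json')
    record.write_text(json.dumps({'faq': {'question': 'old question'}}))
    before = cached_metadata(tmp)

    # Changing the file's content does not change its name, so the cache goes stale
    record.write_text(json.dumps({'faq': {'question': 'new question'}}))
    stale = cached_metadata(tmp)
    fresh = cached_metadata(tmp, refresh_cache=True)

print(before, stale, fresh)
```

Only the call with refresh_cache=True reparses the modified file, which mirrors the advice in the note above.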

In [8]:
# Fetch all records, rebuilding the cache first
db.get_records_df('FAQ', refresh_cache=True)

| | name | question | answer |
|---|---|---|---|
| 0 | define | Can I define a FAQ using parameters? | Yes, you can. |
| 1 | faq | What does a FAQ Record represent? | A frequently asked question and the correspond... |
| 2 | fuzzy | Fuzzywuzzy was a bear. Fuzzywuzzy had no hair.... | Nope. |
| 3 | woodchuck | How much wood would a woodchuck chuck if a woo... | A woodchuck would chuck as much wood as a wood... |


Additionally, there is one convenience method, __retrieve_record()__, which gets a matching record from the database and saves it to a local file all at once.  This can be used to easily fetch and save a record so that external software can interact with it.
    
- __style__ (*str, optional*) The record style to search. If not given, a prompt will ask for it.
- __dest__ (*path, optional*) The parent directory where the record will be saved to.  If not given, will use the current working directory.
- __format__ (*str, optional*) The file format to save the record in: 'json' or 'xml'.  Default is 'json'.
- __indent__ (*int, optional*) The number of spaces used to indent each nested level in the saved record.  Default is 4.  Giving None will create a compact record.
- __verbose__ (*bool, optional*) If True, info messages will be printed during operations.  Default value is False.
- __\*\*kwargs__ (*any, optional*) Any extra options specific to the database style or metadata search parameters specific to the record style.

#### 4.2.3. Deleting records

The __delete_record()__ method will delete a single record.  The method's parameters make it possible to specify which record to delete.

- __record__ (*Record, optional*) The record to delete from the database.  If not given, name and/or style are needed to uniquely define the record to delete.
- __style__ (*str, optional*) The style of the record to delete.
- __name__ (*str, optional*) The name of the record to delete.

Also, check out the destroy_records() method listed in section 4.4 below.


In [9]:
db.delete_record(style='FAQ', name='fuzzy')
db.get_records_df('FAQ')

| | name | question | answer |
|---|---|---|---|
| 0 | define | Can I define a FAQ using parameters? | Yes, you can. |
| 1 | faq | What does a FAQ Record represent? | A frequently asked question and the correspond... |
| 2 | woodchuck | How much wood would a woodchuck chuck if a woo... | A woodchuck would chuck as much wood as a wood... |


### 4.3. Interacting with non-record files

The Database classes also have methods that allow for other files to be uploaded to the various databases.  

#### 4.3.1. Tarballs of supporting files

The primary use of file uploads in yabadaba is to provide supporting files that are associated with individual record entries.  Every record entry in the database can have a corresponding folder of files associated with it.  These are primarily stored in the databases as tar.gz tarballs.

- __get_tar()__ Retrieves the tar archive associated with a record in the database.
    - __record__ (*Record, optional*) The record to retrieve the associated tar archive for.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
    - __raw__ (*bool, optional*) If True, return the archive as raw binary content. If False, return as an open tarfile. (Default is False).
- __add_tar()__ Archives and stores a folder associated with a record.
    - __record__ (*Record, optional*) The record to associate the tar archive with.  If not given, then name and/or style necessary to uniquely identify the record are needed.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
    - __tar__ (*bytes, optional*) The bytes content of a tar file to save.  tar cannot be given with root_dir.
    - __root_dir__ (*str, optional*) Specifies the root directory for finding the directory to archive. The directory to archive is at \<root_dir\>/\<name\>. (Default is to set root_dir to the current working directory.) tar cannot be given with root_dir.
- __update_tar()__ Replaces an existing tar archive for a record with a new one.
    - __record__ (*Record, optional*) The record to associate the tar archive with.  If not given, then name and/or style necessary to uniquely identify the record are needed.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
    - __tar__ (*bytes, optional*) The bytes content of a tar file to save.  tar cannot be given with root_dir.
    - __root_dir__ (*str, optional*) Specifies the root directory for finding the directory to archive. The directory to archive is at \<root_dir\>/\<name\>.  (Default is to set root_dir to the current working directory.)  tar cannot be given with root_dir.
- __delete_tar()__ Deletes a tar file from the database.
    - __record__ (*Record, optional*) The record associated with the tar archive to delete.  If not given, then name and/or style necessary to uniquely identify the record are needed.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
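
The relationship between root_dir, name, raw bytes and an open tarfile can be sketched with the standard tarfile module. The two helper functions are hypothetical and do not call yabadaba; they only mimic archiving the directory at \<root_dir\>/\<name\> into tar.gz bytes and reopening those bytes.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_tar_bytes(name, root_dir):
    """Archive the directory <root_dir>/<name> and return the tar.gz content as bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as tar:
        # arcname keeps only <name> as the top-level folder inside the archive
        tar.add(str(Path(root_dir, name)), arcname=name)
    return buffer.getvalue()


def open_tar_bytes(raw):
    """Turn raw tar.gz bytes back into an open tarfile object."""
    return tarfile.open(fileobj=io.BytesIO(raw), mode='r:gz')


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp, 'fuzzy')
    folder.mkdir()
    (folder / 'notes.txt').write_text('Fuzzywuzzy had no hair.')
    raw = make_tar_bytes('fuzzy', root_dir=tmp)

with open_tar_bytes(raw) as tar:
    members = sorted(tar.getnames())
    content = tar.extractfile('fuzzy/notes.txt').read().decode()

print(members, content)
```

The raw bytes correspond to what the tar parameter of add_tar() and the raw=True option of get_tar() are described as exchanging, while the reopened object corresponds to the raw=False case.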

#### 4.3.2. Folders of supporting files

The local-style Databases alternatively allow for the record-specific supporting files to be stored in normal directories. Doing so allows for the contained files to be directly explored without extracting them.  Note that folders are an alternative to tars, and the two should not be used together for the same record.

- __get_folder()__ Retrieves the location of the folder associated with a record in the database.
    - __record__ (*Record, optional*) The record to retrieve the associated folder location for.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.
- __add_folder()__ Stores a folder associated with a record.
    - __record__ (*Record, optional*) The record to associate the folder with.  If not given, then name and/or style necessary to uniquely identify the record are needed.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.
    - __filenames__ (*str or list, optional*) The paths to files that are to be added to the record's folder. Cannot be given with root_dir.
    - __root_dir__ (*str, optional*) Specifies the root directory for finding the folder to add. The directory to add is at \<root_dir\>/\<name\>.  (Default is to set root_dir to the current working directory.)  Cannot be given with filenames.
- __update_folder()__ Updates an existing folder for a record.
    - __record__ (*Record, optional*) The record to associate the folder with.  If not given, then name and/or style necessary to uniquely identify the record are needed.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.
    - __filenames__ (*str or list, optional*) The paths to files that are to be added to the record's folder. Cannot be given with root_dir.
    - __root_dir__ (*str, optional*) Specifies the root directory for finding the folder to add. The directory to add is at \<root_dir\>/\<name\>.  (Default is to set root_dir to the current working directory.)  Cannot be given with filenames.
    - __clear__ (*bool, optional*) If True (default), then the current folder contents will be deleted before the new contents are added.  If False, existing files may remain if the new content does not overwrite it.
- __delete_folder()__ Deletes a folder from the database.
    - __record__ (*Record, optional*) The record associated with the folder to delete.  If not given, then name and/or style necessary to uniquely identify the record are needed.
    - __name__ (*str, optional*) The name to use in uniquely identifying the record.
    - __style__ (*str, optional*) The style to use in uniquely identifying the record.

#### 4.3.3. Loose files

Currently, there are no methods built into yabadaba that allow for files unassociated with records to be uploaded to the various databases.  The underlying database infrastructures that yabadaba supports allow for this, so options to interact with individual files may be added later.  If this is important for you, email potentials@nist.gov and implementing these methods will be made a higher priority.

### 4.4. Other utility methods

The Database class also defines additional methods that perform more complex convenience operations.

#### 4.4.1. copy_records()

Copies records from the current database to another database.

- __dest__ (*Database*) The destination database to copy records to.
- __record_style__ (*str, optional*) The record style to copy.  If neither record_style nor records is given, then the available record styles will be listed and the user prompted to pick one.  Cannot be given with records.
- __records__ (*list, optional*) A list of Record objects from the current database to copy to dest.  Allows the user full control over which records to copy/update.  Cannot be given with record_style.
- __includetar__ (*bool, optional*) If True, the tar archives will be copied along with the records. If False, only the records will be copied. (Default is True).
- __overwrite__ (*bool, optional*) If False (default) only new records and tars will be copied. If True, all existing content will be updated.
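
The overwrite behavior can be summarized with plain dicts standing in for the two databases. The `copy_toy_records` function is hypothetical and ignores tars and record styles; it only shows which entries change under each overwrite setting as described above.

```python
def copy_toy_records(source, dest, overwrite=False):
    """Copy name -> content entries from source to dest and report what changed."""
    added, updated = [], []
    for name, content in source.items():
        if name not in dest:
            # New records are always copied
            dest[name] = content
            added.append(name)
        elif overwrite:
            # Existing records are only replaced when overwrite is True
            dest[name] = content
            updated.append(name)
    return added, updated


source = {'faq': 'v2', 'woodchuck': 'v1'}

dest_keep = {'faq': 'v1'}
result_keep = copy_toy_records(source, dest_keep, overwrite=False)

dest_replace = {'faq': 'v1'}
result_replace = copy_toy_records(source, dest_replace, overwrite=True)

print(result_keep, dest_keep)
print(result_replace, dest_replace)
```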

#### 4.4.2. destroy_records()

Permanently deletes multiple records and their associated tars all at once.
        
- __record_style__ (*str, optional*) The record style to delete.  If given, all records of that style will be deleted. If neither record_style nor records is given, then the available record styles will be listed and the user prompted to pick one.
- __records__ (*list, optional*) A list of pre-selected records to delete. 

In [10]:
db.destroy_records('FAQ')

3 records found to be destroyed
Delete records? (must type yes): yes


destroying records: 100%|##############################################################| 3/3 [00:00<00:00, 1499.57it/s]

3 records successfully deleted



