### A Demonstration of Torchlite's Inner Workings

Welcome to a demonstration of Torchlite's back-end library! In this demo, we'll show you how to use Torchlite with Python to access metadata from an Extracted Features workset.

Do you already know how to use a Jupyter Notebook? Great! Then this demonstration should be easy to follow. If you aren't familiar with Jupyter Notebooks, don't worry: to go through the demonstration, all you have to do is click on a cell to make it active (a cell is a gray, rectangular region; when a cell is active, a green box appears around it) and then press the Shift and Return keys at the same time.

Let's try it! Click on the cell right below this one, the cell with a simple Python print statement in it. A green box will appear around it to show that it is the active cell. Now click the "Run" button at the top of the page, or hold down the Shift key and press Return (that is, keep the Shift key pressed, tap the Return key once, and then release both keys).

In [None]:
print("Hello, Torchlite!")

The words "Hello, Torchlite!" should have appeared underneath the cell.  Did it work? Great! Then we can proceed with the demonstration.

This demonstration assumes you have a basic knowledge of Python. If you've never encountered Python code before, don't worry: you can still run the demonstration by activating cells and running them, but some of the explanations may not make much sense to you.

The first thing we'll do is import two classes: Workset and Api. Activate the cell right below this one and run it by clicking on the Run button or pressing shift-return.

In [1]:
from htrc.ef.api import Api
from htrc.torchlite.worksets import Workset

The Api class handles network communication with the Extracted Features (EF) Service, so you don't need to know web addresses or API endpoints. The Workset class is an abstraction on top of the workset data returned by the Extracted Features API. In this version, a Workset is a simple wrapper around the EF workset data; in future versions, users will be able to assign filters and transformers to a Workset object and to serialize it to a datastore (that is, save it).
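
Splitting the work this way is a common design: one object owns the network calls, and a thin wrapper exposes convenient properties on top of it. The sketch below illustrates only that idea. `SimpleWorkset`, `EFClient`, `FakeClient`, and `get_workset` are hypothetical names, not Torchlite's actual code, and the in-memory `FakeClient` stands in for real network access.

```python
from typing import Any, Protocol


class EFClient(Protocol):
    """Hypothetical interface: anything that can fetch workset data by id."""

    def get_workset(self, workset_id: str) -> dict[str, Any]: ...


class SimpleWorkset:
    """Hypothetical thin wrapper: it stores an id and a client, and asks the
    client for data only when a property is read."""

    def __init__(self, workset_id: str, client: EFClient) -> None:
        self.workset_id = workset_id
        self._client = client

    @property
    def htids(self) -> list[str]:
        # Delegate the data fetch to the client; return a fresh copy each time.
        return list(self._client.get_workset(self.workset_id)["htids"])

    def __repr__(self) -> str:
        return f"SimpleWorkset({self.workset_id})"


class FakeClient:
    """Hypothetical in-memory stand-in for a network client."""

    def __init__(self, data: dict[str, dict[str, Any]]) -> None:
        self._data = data

    def get_workset(self, workset_id: str) -> dict[str, Any]:
        return self._data[workset_id]


client = FakeClient({"demo": {"htids": ["mdp.39015004288042", "hvd.hnl33i"]}})
ws = SimpleWorkset("demo", client)
print(ws, ws.htids)
```

Because the wrapper talks to the client through a single method, a fake client like this one can replace the network during testing.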

Every workset in the Extracted Features Database has a unique identifier (id). You must know the id of the workset you want to use; at the moment there is no way to search for worksets. The Torchlite dashboard includes a widget for selecting a workset from a pre-selected set of worksets, but for this demo we'll use a small sample workset.

In [2]:
workset_id = '64407dbd3300005208a5dca4'

We'll begin by creating an Api object for the Workset to use.  In this version, the Api object is very simple, but in later versions users will be able to configure it using parameters.

In [3]:
api = Api()

Now we'll create a workset by passing the workset identifier and the Api object to the Workset class initializer.

In [4]:
workset = Workset(workset_id, api)
print(workset)

Torchlite_Workset(64407dbd3300005208a5dca4)


A Workset object has several user-accessible properties and methods. The htids property contains the htids of all the volumes in the workset:

In [5]:
print(workset.htids)

['uc2.ark:/13960/t3028pw46', 'nc01.ark:/13960/t2s47rp50', 'mdp.39015004288042', 'mdp.39015004830785', 'uc2.ark:/13960/t7gq6rn7k', 'uva.x000301612', 'uc2.ark:/13960/t0qr4pk1s', 'uc1.b3144696', 'loc.ark:/13960/t2x35cw4s', 'njp.32101064283474', 'uc2.ark:/13960/t4bn9z448', 'wu.89088297718', 'hvd.hnl33i', 'loc.ark:/13960/t1zc81g1r', 'uva.x000550658', 'inu.30000105000149', 'mdp.39015051125089', 'uva.x030832514', 'mdp.39015033911176', 'nyp.33433076069214', 'mdp.39015030041613', 'nyp.33433076032725', 'nyp.33433074941026', 'loc.ark:/13960/t08w49p1x', 'mdp.39015005618072', 'dul1.ark:/13960/t6m04w73w', 'nc01.ark:/13960/t5n87970b', 'nyp.33433081883005', 'dul1.ark:/13960/t1cj9916g', 'mdp.39015009772446', 'miun.afj9003.0001.001', 'njp.32101073475947', 'loc.ark:/13960/t6zw1qw76', 'uva.x004508565', 'nc01.ark:/13960/t5w67jz42', 'pst.000005977888', 'nyp.33433066599915', 'uc2.ark:/13960/t5x63bg3m', 'inu.30000053319657', 'uva.x002015843', 'nc01.ark:/13960/t7jq10k79', 'inu.30000109024962', 'uc1.b4719425', 
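
Each htid begins with a namespace prefix (the text before the first period, such as `mdp` or `uc2`) that identifies the contributing institution. As a small example of working with this list, the sketch below counts volumes per namespace. It uses a hard-coded sample copied from the output above; in the notebook you could pass `workset.htids` instead. The helper name `namespace` is our own.

```python
from collections import Counter

# A few htids copied from the output above (the full list is longer).
htids = [
    "uc2.ark:/13960/t3028pw46", "nc01.ark:/13960/t2s47rp50",
    "mdp.39015004288042", "mdp.39015004830785", "uc2.ark:/13960/t7gq6rn7k",
    "uva.x000301612", "uc1.b3144696", "loc.ark:/13960/t2x35cw4s",
]


def namespace(htid: str) -> str:
    """Return the prefix before the first '.', e.g. 'mdp' or 'uc2'."""
    return htid.split(".", 1)[0]


# Tally how many volumes come from each namespace, most common first.
counts = Counter(namespace(h) for h in htids)
print(counts.most_common())
```

Splitting only at the first period matters because some htids, such as `miun.afj9003.0001.001`, contain more periods after the prefix.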

The metadata method is very useful. It takes a list of attribute names and queries the EF database for just those attributes and nothing else, leading to better performance for very large worksets.

In [6]:
metadata = workset.metadata(['htid', 'metadata.title', 'metadata.pubDate'])
print(metadata)

[htrc.ef.datamodels.Volume(dul1.ark:/13960/t1cj9916g), htrc.ef.datamodels.Volume(dul1.ark:/13960/t6m04w73w), htrc.ef.datamodels.Volume(hvd.hnl33i), htrc.ef.datamodels.Volume(hvd.hw3e5e), htrc.ef.datamodels.Volume(inu.30000053319657), htrc.ef.datamodels.Volume(inu.30000054472976), htrc.ef.datamodels.Volume(inu.30000105000149), htrc.ef.datamodels.Volume(inu.30000109024962), htrc.ef.datamodels.Volume(inu.30000111997973), htrc.ef.datamodels.Volume(inu.30000127830770), htrc.ef.datamodels.Volume(loc.ark:/13960/t08w49p1x), htrc.ef.datamodels.Volume(loc.ark:/13960/t1zc81g1r), htrc.ef.datamodels.Volume(loc.ark:/13960/t2x35cw4s), htrc.ef.datamodels.Volume(loc.ark:/13960/t3417vk95), htrc.ef.datamodels.Volume(loc.ark:/13960/t6zw1qw76), htrc.ef.datamodels.Volume(mdp.39015004288042), htrc.ef.datamodels.Volume(mdp.39015004830785), htrc.ef.datamodels.Volume(mdp.39015005618072), htrc.ef.datamodels.Volume(mdp.39015009772446), htrc.ef.datamodels.Volume(mdp.39015030041613), htrc.ef.datamodels.Volume(mdp.3
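
Each attribute name is a dotted path into a volume's JSON record: `metadata.title` means the `title` field inside `metadata`. To make the idea of selecting only some fields concrete, here is a hypothetical helper, `project`, that applies such paths to a nested dictionary. It illustrates the concept only and is not how the EF service implements its queries; the `extra` and `features` fields in the sample record are made up so that there is something to drop.

```python
from typing import Any


def project(doc: dict[str, Any], paths: list[str]) -> dict[str, Any]:
    """Keep only the fields named by dotted paths, preserving nesting.

    Paths that do not exist in the document are skipped.
    """
    result: dict[str, Any] = {}
    for path in paths:
        keys = path.split(".")
        value: Any = doc
        # Walk down the source document one key at a time.
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                break
            value = value[key]
        else:
            # Path found: rebuild the same nesting in the output and set the leaf.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = value
    return result


volume = {
    "htid": "hvd.hnl33i",
    "metadata": {
        "title": "The wigwam and the cabin, or Tales of the south. 1st series.",
        "pubDate": 1853,
        "extra": "hypothetical field that the projection should drop",
    },
    "features": {"note": "hypothetical bulky page-level data"},
}
print(project(volume, ["htid", "metadata.title", "metadata.pubDate"]))
```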

The values returned from the EF API are validated using Pydantic data models, which also provide a simplified interface to the data fields.

In [7]:
print(metadata[0].metadata.title)

Travels through North and South Carolina, Georgia, East and West Florida, the Cherokee country, the extensive territories of the Muscogulges, or Creek confederacy, and the country of the Chactaws; containing an account of the soil and natural productions of those regions; together with observations on the manners of the Indians ...
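
The nested attribute access above (`metadata[0].metadata.title`) comes from models that mirror the structure of the JSON data. The sketch below imitates that interface with standard-library dataclasses instead of Pydantic, so it runs without extra packages. The class names and fields are assumptions based on the attributes used in this demo, not the actual definitions in `htrc.ef.datamodels`. Pydantic does much more, such as coercing compatible types (for example, the string `"1844"` to an integer) and ignoring unknown fields by default, whereas this stand-in simply raises `TypeError`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class VolumeMetadata:
    title: str
    pubDate: int

    def __post_init__(self) -> None:
        # Minimal type checks, standing in for Pydantic's validation.
        if not isinstance(self.title, str):
            raise TypeError("title must be a string")
        if not isinstance(self.pubDate, int):
            raise TypeError("pubDate must be an integer")


@dataclass
class Volume:
    htid: str
    metadata: VolumeMetadata

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "Volume":
        # Build the nested model from a raw JSON-style dict.
        # Unknown keys in data["metadata"] raise TypeError here.
        return cls(htid=data["htid"], metadata=VolumeMetadata(**data["metadata"]))


raw = {
    "htid": "hvd.hw3e5e",
    "metadata": {
        "title": "Sketches of the life and character of Patrick Henry.",
        "pubDate": 1844,
    },
}
v = Volume.from_dict(raw)
print(v.metadata.title)
```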


We can use a simple Python list comprehension to compile a basic data structure (a list of dictionaries) for use in data visualization and analysis:

In [8]:
pubData = [{"htid": v.htid, "title": v.metadata.title, "pubDate": v.metadata.pubDate} for v in metadata]
print(pubData)

[{'htid': 'dul1.ark:/13960/t1cj9916g', 'title': 'Travels through North and South Carolina, Georgia, East and West Florida, the Cherokee country, the extensive territories of the Muscogulges, or Creek confederacy, and the country of the Chactaws; containing an account of the soil and natural productions of those regions; together with observations on the manners of the Indians ...', 'pubDate': 1793}, {'htid': 'dul1.ark:/13960/t6m04w73w', 'title': 'Master William Mitten: or, A youth of brilliant talent, who was ruined by bad luck.', 'pubDate': 1864}, {'htid': 'hvd.hnl33i', 'title': 'The wigwam and the cabin, or Tales of the south. 1st series.', 'pubDate': 1853}, {'htid': 'hvd.hw3e5e', 'title': 'Sketches of the life and character of Patrick Henry.', 'pubDate': 1844}, {'htid': 'inu.30000053319657', 'title': 'The Westover manuscripts: containing the history of the dividing line betwixt Virginia and North Carolina; A journey to the land of Eden, A.D. 1733; and A progress to the mines.', 'pub
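
Once the metadata is in a list of dictionaries, ordinary Python is enough for simple analysis. The sketch below uses only the four complete records visible in the output above (titles dropped for brevity): it sorts the volumes chronologically and buckets them by decade, the kind of summary a publication-date chart might plot. In the notebook you would use `pubData` directly; the names `records`, `by_date`, and `by_decade` are our own.

```python
from collections import defaultdict

# The four complete records visible in the output above, without titles.
records = [
    {"htid": "dul1.ark:/13960/t1cj9916g", "pubDate": 1793},
    {"htid": "dul1.ark:/13960/t6m04w73w", "pubDate": 1864},
    {"htid": "hvd.hnl33i", "pubDate": 1853},
    {"htid": "hvd.hw3e5e", "pubDate": 1844},
]

# Chronological order by publication year.
by_date = sorted(records, key=lambda r: r["pubDate"])

# Group htids by decade, e.g. 1844 -> 1840.
by_decade: dict[int, list[str]] = defaultdict(list)
for r in by_date:
    by_decade[r["pubDate"] // 10 * 10].append(r["htid"])

print([r["htid"] for r in by_date])
print(dict(by_decade))
```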

In the next demo, we'll show you how to use Torchlite to create visualizations in a Jupyter Notebook.