# Using HathiMeta

```python
%load_ext autoreload
%autoreload 2
from compare_tools.hathimeta import HathiMeta, get_json_meta
from compare_tools.configuration import config
```


## Connecting to the DB

If no DB path is passed to `HathiMeta`, a temporary in-memory database is created. Here we use the persistent path from the project configuration:

```python
print('Creating DB at', config['metadb_path'])
meta = HathiMeta(config['metadb_path'])
```

```
Creating DB at /data/saddl/meta.db
```
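If, as the `.db` filename suggests, the backend is SQLite (an assumption; the notebook doesn't say), the path-or-memory behavior can be sketched with the standard library. `open_meta_db` is a hypothetical helper, not part of `compare_tools`:

```python
import sqlite3
from typing import Optional


def open_meta_db(path: Optional[str] = None) -> sqlite3.Connection:
    """Hypothetical helper (assumes a SQLite backend): open the DB at `path`,
    or a temporary in-memory DB when no path is given."""
    # ':memory:' is SQLite's special name for a private, transient database
    # that disappears when the connection is closed.
    return sqlite3.connect(path if path is not None else ":memory:")


conn = open_meta_db()  # no path: in-memory, nothing is written to disk
```

Passing a file path such as `config['metadb_path']` instead keeps the data between sessions, which is why the DB only needs to be built once.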


### First time: Building DB

If you haven't yet ingested the [Hathifiles](https://www.hathitrust.org/hathifiles), or are working with a transient in-memory DB, run `create_db` first:

```python
meta.create_db(config['meta_path'])
```

```
0, 1, 2, 3, 4, 5,
```
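HathiTrust distributes the Hathifiles as tab-separated text. The progress output above looks like chunk indices; assuming that is what it shows, a chunked load could be sketched as follows. The SQLite backend, the `meta` table name, the five-column subset, the absence of a header row, and the `ingest_hathifile` helper are all assumptions for illustration, not `compare_tools` internals:

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import List, TextIO

# Assumed subset of Hathifile columns, in file order. The real files have
# more columns, so adjust this list to match the files you download.
COLUMNS = ["htid", "access", "rights", "ht_bib_key", "description"]


def ingest_hathifile(conn: sqlite3.Connection, fh: TextIO,
                     columns: List[str] = COLUMNS,
                     chunksize: int = 100_000) -> int:
    """Hypothetical sketch: load tab-separated rows into a `meta` table in
    chunks, printing each chunk index as progress. Returns the row count."""
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS meta ({cols})")
    # Split strictly on tabs; titles may contain stray quote characters.
    reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    chunk_no = 0
    while True:
        # Assumes every row has at least len(columns) fields.
        chunk = [row[:len(columns)] for row in islice(reader, chunksize)]
        if not chunk:
            break
        conn.executemany(
            f"INSERT INTO meta ({cols}) VALUES ({placeholders})", chunk
        )
        conn.commit()
        print(chunk_no, end=", ")
        total += len(chunk)
        chunk_no += 1
    return total


# Usage with made-up example rows and a tiny chunk size.
conn = sqlite3.connect(":memory:")
sample = io.StringIO(
    "example.001\tallow\tpd\t1\tv.1\n"
    "example.002\tdeny\tic\t2\t\n"
    "example.003\tallow\tpd\t3\tv.3\n"
)
n = ingest_hathifile(conn, sample, chunksize=2)  # prints "0, 1, "
```

For a real, gzipped Hathifile you could pass `gzip.open(path, 'rt', encoding='utf-8')` as `fh` together with the full column list. Committing per chunk keeps memory bounded on files with millions of rows.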

## Access

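`HathiMeta` supports `len()`, giving the number of volumes in the DB, and dictionary-style lookup by HathiTrust volume ID (htid):

```python
len(meta)
```

```
143864
```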

```python
meta['uiuo.ark:/13960/t6g20tp0c']
```

```
htid                                               uiuo.ark:/13960/t6g20tp0c
access                                                                 allow
rights                                                                    pd
ht_bib_key                                                          11639242
description                                                             v.13
source                                                                   UIU
source_bib_num                                                       2420951
oclc_num                                                            49472649
isbn                                                                    None
issn                                                                    None
lccn                                                                    None
title                      Standard shop efficiency schedules, by Henry W...
imprint                                               Crane & company, 1910-
```
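The `len()` and `meta[htid]` behavior above can be approximated with Python's `__len__` and `__getitem__` hooks. The sketch below assumes a SQLite table named `meta` and returns a plain dict rather than the pandas Series shown above; `MetaTable` is hypothetical and not how `HathiMeta` is necessarily implemented:

```python
import sqlite3
from typing import Any, Dict


class MetaTable:
    """Hypothetical stand-in for a metadata wrapper: len() and lookup by htid
    over a SQLite table named `meta` (assumed schema, not compare_tools code)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # sqlite3.Row lets each result row be turned into a {column: value} dict.
        self.conn.row_factory = sqlite3.Row

    def __len__(self) -> int:
        # One row per volume, so the row count is the number of volumes.
        return self.conn.execute("SELECT COUNT(*) FROM meta").fetchone()[0]

    def __getitem__(self, htid: str) -> Dict[str, Any]:
        row = self.conn.execute(
            "SELECT * FROM meta WHERE htid = ?", (htid,)
        ).fetchone()
        if row is None:
            # Behave like a dict: unknown keys raise KeyError.
            raise KeyError(htid)
        return dict(row)


# Usage with a throwaway in-memory table and made-up example data.
conn = sqlite3.connect(":memory:")
# Declaring htid as PRIMARY KEY gives SQLite an index for fast lookups.
conn.execute("CREATE TABLE meta (htid TEXT PRIMARY KEY, title TEXT, description TEXT)")
conn.execute(
    "INSERT INTO meta VALUES (?, ?, ?)", ("example.001", "An example title", "v.1")
)
table = MetaTable(conn)
print(len(table))                            # 1
print(table["example.001"]["description"])   # v.1
```

Using a `?` placeholder for the htid, rather than string formatting, keeps IDs containing characters like `:` and `/` safe.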

Setting `default_fields` restricts which fields a lookup returns:

```python
meta.default_fields = ['title', 'author', 'description']
meta['uiuo.ark:/13960/t6g20tp0c']
```

```
title          Standard shop efficiency schedules, by Henry W...
author                              Jacobs, Henry William, 1874-
description                                                 v.13
Name: 0, dtype: object
```
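Restricting a lookup to chosen fields means putting column names into SQL, and column names cannot be bound as `?` parameters. A minimal sketch, assuming a SQLite table named `meta` (`get_fields_for` is a hypothetical helper), validates the requested names against the table's real columns first:

```python
import sqlite3
from typing import Any, Dict, List


def get_fields_for(conn: sqlite3.Connection, htid: str,
                   fields: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: fetch only `fields` for one volume from `meta`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    known = {row[1] for row in conn.execute("PRAGMA table_info(meta)")}
    unknown = [f for f in fields if f not in known]
    if unknown:
        raise ValueError(f"Unknown field(s): {unknown}")
    # Safe to interpolate now: every name is a verified column.
    cols = ", ".join(f'"{f}"' for f in fields)
    row = conn.execute(
        f"SELECT {cols} FROM meta WHERE htid = ?", (htid,)
    ).fetchone()
    if row is None:
        raise KeyError(htid)
    # Keep the caller's field order in the result.
    return dict(zip(fields, row))


# Usage with made-up example data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta (htid TEXT PRIMARY KEY, title TEXT, author TEXT, description TEXT)"
)
conn.execute(
    "INSERT INTO meta VALUES (?, ?, ?, ?)",
    ("example.001", "An example title", "Doe, Jane", "v.1"),
)
default_fields = ["title", "author", "description"]
print(get_fields_for(conn, "example.001", default_fields))
# {'title': 'An example title', 'author': 'Doe, Jane', 'description': 'v.1'}
```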

## Extending DB

For the SADDL project, we extend the metadata with page counts, which are available in the HTRC Extracted Features data. To add columns, pass `extend_db` a DataFrame with an `htid` column plus one column per new field (i.e. `[htid, newcol1, newcol2]`).

```python
# Look up each volume's page count in its Extracted Features metadata
htids = meta.get_fields(['htid'])
htids['page_count'] = htids.htid.apply(
    lambda htid: get_json_meta(htid, config['parquet_root'])['page_count']
)
htids.head(2)
```

```python
# Add the new page_count column to the metadata DB, matched on htid
meta.extend_db(htids)
```
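Conceptually, extending the DB means adding each new column and filling it in row by row, keyed on `htid`. A rough sketch, assuming a SQLite table named `meta` and taking a list of dicts instead of a DataFrame (`extend_table` is hypothetical, not the `extend_db` source):

```python
import sqlite3
from typing import Any, Dict, List


def extend_table(conn: sqlite3.Connection, rows: List[Dict[str, Any]]) -> None:
    """Hypothetical sketch: add new columns to `meta` and fill them by htid.
    Each dict in `rows` has an 'htid' key plus one key per new column."""
    if not rows:
        return
    new_cols = [c for c in rows[0] if c != "htid"]
    existing = {r[1] for r in conn.execute("PRAGMA table_info(meta)")}
    for col in new_cols:
        if col not in existing:
            # Assumes column names come from trusted code, not user input.
            conn.execute(f'ALTER TABLE meta ADD COLUMN "{col}"')
    set_clause = ", ".join(f'"{c}" = ?' for c in new_cols)
    # One UPDATE per volume; parameter order is the new values, then htid.
    conn.executemany(
        f"UPDATE meta SET {set_clause} WHERE htid = ?",
        [[row[c] for c in new_cols] + [row["htid"]] for row in rows],
    )
    conn.commit()


# Usage with made-up example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (htid TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?)", [("example.001", "A"), ("example.002", "B")]
)
extend_table(conn, [
    {"htid": "example.001", "page_count": 12},
    {"htid": "example.002", "page_count": 34},
])
print(conn.execute("SELECT page_count FROM meta WHERE htid = 'example.001'").fetchone())
# (12,)
```

In this sketch, volumes missing from `rows` are left with NULL in the new column rather than being dropped.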

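`get_volume` takes an explicit list of fields, so the new `page_count` column can be requested alongside the original ones:

```python
meta.get_volume('uiuo.ark:/13960/t6g20tp0c', ['htid', 'title', 'page_count'])
```

```
htid                                  uiuo.ark:/13960/t6g20tp0c
title         Standard shop efficiency schedules, by Henry W...
page_count                                                  800
Name: 0, dtype: object
```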