Ingest catalogue data #4

raquelalegre · 2018-05-02T10:37:46Z

Period information is already in the glossaries ES DB, but location and genre (and other possibly interesting bits like sub/super genre) are not linked to individual glossary entries. Those are in the catalogue.json file Steve sent. We need to:

Check all entries in our ES glossary are linked to a P-object in catalogue.json.
Add the instance IDs to the ES glossary entries.
Add the catalogue "members" (i.e. P-objects).
Change the search_all endpoint to also return genre and location (and maybe other things) by joining catalogue entries and glossary entries.
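A minimal sketch of the first and last steps, under stated assumptions: that catalogue.json has a top-level "members" object keyed by P-number, and that each glossary entry can be reduced to an `id` plus a list of the P-numbers it occurs in. The `p_numbers`, `genre` and `location` field names, the helper names and the sample values are hypothetical and not taken from the real data.

```python
def find_unlinked(entries, catalogue):
    """Map entry id -> P-numbers the entry references that are absent from the catalogue."""
    members = catalogue.get("members", {})
    unlinked = {}
    for entry in entries:
        missing = [p for p in entry.get("p_numbers", []) if p not in members]
        if missing:
            unlinked[entry["id"]] = missing
    return unlinked


def attach_catalogue_fields(entry, catalogue, fields=("genre", "location")):
    """Return a copy of the entry with the chosen catalogue fields joined in per text."""
    members = catalogue.get("members", {})
    texts = []
    for p in entry.get("p_numbers", []):
        member = members.get(p, {})  # unknown P-numbers yield None for every field
        texts.append({"p_number": p, **{f: member.get(f) for f in fields}})
    return {**entry, "texts": texts}


# Made-up sample data, for illustration only.
catalogue = {"members": {"P010632": {"genre": "Lexical", "location": "Nippur"}}}
entries = [
    {"id": "e1", "p_numbers": ["P010632"]},
    {"id": "e2", "p_numbers": ["P999999"]},
]

unlinked = find_unlinked(entries, catalogue)
enriched = attach_catalogue_fields(entries[0], catalogue)
```

Whether the join happens at ingestion time (as here) or at query time in search_all is a separate design choice that this sketch does not settle.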


ageorgou · 2018-05-02T17:55:58Z

Some relevant info on adding instance IDs from oracc/elastic-search-poc#4:

The instances field of the glossary holds a list of lists of occurrences. Each element (i.e. each list of occurrences) has an id, which is referred to from the xis field of an entry, with two caveats:

Some ids have the form [lan].[abcde].p.[per], where the part p.[per] refers to a particular period. These are probably not referred to directly from the entries but are generated automatically for other reasons. Additionally, for each of these, there should be a corresponding instance with id [lan].[abcde], containing the same list of occurrences. We can forget about the period-specific instances and only use the "general" ones (i.e. without the .p.[per] part).

Some sub-fields of entries (norms, forms, ...) also have xis fields referring to potentially distinct elements of instances. These will be sub-lists of the list of occurrences referred to by the top-level entry. We can decide how to present the results, whether it's just using the top-level xis or providing more detail.
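The first caveat could be handled roughly as follows. This is a sketch that assumes the instances have been loaded as a dict from instance id to list of occurrences, and that an entry's xis field holds a single id string. The function names and the sample ids are invented for illustration.

```python
def is_period_specific(instance_id):
    """True for ids of the form [lan].[abcde].p.[per]."""
    parts = instance_id.split(".")
    return len(parts) >= 4 and parts[-2] == "p"


def general_instances(instances):
    """Drop the period-specific instances, keeping only the general ones."""
    return {
        instance_id: occurrences
        for instance_id, occurrences in instances.items()
        if not is_period_specific(instance_id)
    }


def occurrences_for(entry, instances):
    """Look up the occurrences referred to by the entry's top-level xis field."""
    return general_instances(instances).get(entry.get("xis"), [])


# Made-up ids and occurrences, for illustration only.
instances = {
    "akk.x0001": ["occ1", "occ2"],
    "akk.x0001.p.ob": ["occ1"],
}
entry = {"xis": "akk.x0001"}
found = occurrences_for(entry, instances)
```

The same lookup could be repeated for the xis fields of sub-fields such as norms and forms if more detailed results are wanted.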

ageorgou · 2018-05-10T15:30:24Z

Two questions (probably for Steve):

Is it true that all items have a supergenre and genre, but not necessarily a subgenre?
Are P-numbers (e.g. P010632) unique across all projects, or only within a certain project?

Looking into this further, it seems that some catalogue entries are missing at least one of genre, supergenre or period (see the results for catalogue.json for the "neo" project: catalogue_missing.txt).
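A check of this kind can be sketched as below. It is not the script that produced catalogue_missing.txt. It assumes catalogue.json has a top-level "members" object keyed by P-number and that the fields are named `genre`, `supergenre` and `period`. The sample values are made up.

```python
REQUIRED_FIELDS = ("genre", "supergenre", "period")


def missing_fields(catalogue, required=REQUIRED_FIELDS):
    """Map P-number -> list of required fields that are absent or empty."""
    report = {}
    for p_number, member in catalogue.get("members", {}).items():
        absent = [field for field in required if not member.get(field)]
        if absent:
            report[p_number] = absent
    return report


# Made-up sample data, for illustration only.
catalogue = {
    "members": {
        "P010632": {"genre": "g", "supergenre": "s", "period": "p"},
        "P000002": {"genre": "g", "period": ""},
    }
}
report = missing_fields(catalogue)
```

Empty strings are treated the same as absent keys here, which may or may not match how the real catalogue marks missing values.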

raquelalegre · 2018-05-15T12:17:59Z

From conversation with Steve:

P-numbers are unique across the whole DB.
Default to an unknown genre if it's not in the DB. Subgenres have not been curated. We can offer users a search on them, but the values are not standardized. Instead of a dropdown with a limited list of options for subgenre, we can use a free-text box (Steve thinks it would be useful for users to search subgenres).
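These two decisions might look like this in code. This is only a sketch: the literal "unknown" value, the field names and the case-insensitive substring matching are assumptions, and the free-text search would more likely be expressed as an ES query than in Python.

```python
UNKNOWN_GENRE = "unknown"  # assumed placeholder value


def genre_of(member):
    """Return the catalogue genre, or the placeholder when it is absent or blank."""
    genre = (member.get("genre") or "").strip()
    return genre or UNKNOWN_GENRE


def matches_subgenre(member, query):
    """Free-text match: case-insensitive substring test on the uncurated subgenre."""
    subgenre = member.get("subgenre") or ""
    return query.strip().casefold() in subgenre.casefold()


# Made-up sample data, for illustration only.
member = {"genre": "Lexical", "subgenre": "Royal Inscription"}
```

Note that an empty query matches every member, including those with no subgenre at all.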

ageorgou mentioned this issue Jun 22, 2018

Ingest instances and allow non-string types #12

Merged

raquelalegre assigned ageorgou and raquelalegre Jul 27, 2018

ageorgou added the data Ingestion and preprocessing of data label May 17, 2019
