# ontology-building GUI example code

To start building a whitelist for messy key:value data you'll need a few things:

* a list of paths to images (so we can do sanity checks)
* a list of the same length, where each entry is the list of key:value tags for the corresponding image
* the previous version of the ontology, if you're not starting from scratch
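
As a rough illustration of how these three inputs fit together, here is a toy sketch. All names and values are made up for this example; the ontology layout mirrors the "whitelist"/"blacklist" dictionary loaded later in this notebook.

```python
# Toy stand-ins for the three inputs (hypothetical values, for illustration only).
image_paths = ["chips/a.png", "chips/b.png"]

# One list of "key:value" strings per image, in the same order as image_paths.
tag_lists = [
    ["highway:track", "name:Some Road"],
    [],  # an image with no tags still gets a placeholder entry
]

# Optional previous ontology: a dict with "whitelist" and "blacklist" keys.
previous_ontology = {
    "whitelist": {
        "highway:track": {"tagtype": "thing", "strings": ["road"]},
    },
    "blacklist": ["name"],
}

# The two lists must line up one-to-one.
assert len(image_paths) == len(tag_lists)
```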

In [1]:
import json
from tqdm import tqdm
import logging
import panel as pn
import os
# import the submodule explicitly; `import dateutil` alone may not expose dateutil.parser
import dateutil.parser

import promptadour

In [2]:
pn.extension()

## set up logging

Optionally, `promptadour` can log everything you do, which lets you estimate the return on investment of spending more time on an ontology. It logs at the `logging.INFO` level.

In [3]:
logging.basicConfig(filename='bigearthnet_osm_ontology.log', encoding='utf-8', level=logging.INFO)
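
If you want to estimate time spent from the log, timestamps help, and the default `basicConfig` format does not include them. The sketch below is one possible variant of the cell above, not something `promptadour` requires. The temporary log path and the `"example"` logger name are placeholders for this illustration, and whether `promptadour`'s own messages already carry timing information is not shown in this notebook.

```python
import logging
import os
import tempfile

# Hypothetical log location for this sketch; the notebook writes to
# 'bigearthnet_osm_ontology.log' instead.
log_path = os.path.join(tempfile.mkdtemp(), "ontology_session.log")

# force=True replaces any handlers configured earlier in the session
# (Python 3.8+; the encoding argument needs Python 3.9+).
logging.basicConfig(
    filename=log_path,
    encoding="utf-8",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    force=True,
)

# Any INFO-level record now gets a timestamp prefix in the log file.
logging.getLogger("example").info("session started")
```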

## find all the images

In [4]:
imdir = "/mnt/data/BigEarthNet_3channel/"
imfiles = [os.path.join(imdir, x) for x in os.listdir(imdir)]
len(imfiles)

590326

Check the first few entries:

In [5]:
imfiles[:3]

['/mnt/data/BigEarthNet_3channel/S2B_MSIL2A_20180326T112109_37_77.png',
 '/mnt/data/BigEarthNet_3channel/S2A_MSIL2A_20171201T112431_30_51.png',
 '/mnt/data/BigEarthNet_3channel/S2A_MSIL2A_20180413T95032_34_35.png']

## find all the tags

For this example, I've already downloaded all the OSM tags for the chips in BigEarthNet. I'll also add some non-OSM tags:

* a "bigearthnet" key for each of the multihot labels in BigEarthNet (BEN)
* a "month" key recording the month in which the image was acquired

In [6]:
# load OSM tags (the context manager closes the file once it has been read)
with open("/mnt/data/BigEarthNet_models/osm_tags_all.json", "r") as f:
    labels = json.load(f)
len(labels)

588365

In [7]:
# build one tag list per image: OSM tags plus the custom "bigearthnet" and "month" tags
taglist = []
emptycounter = 0
for path in tqdm(imfiles):
    # labels are keyed by the chip filename with a .json extension
    k = os.path.basename(path).replace(".png", ".json")
    if k in labels:
        tags = list(labels[k]["osmtags"])
        tags += ["bigearthnet:"+l for l in labels[k]["labels"]]
        tags += ["month:"+str(dateutil.parser.parse(labels[k]["acquisition_date"]).month)]
        taglist.append(tags)
    else:
        # keep an empty placeholder so taglist stays aligned with imfiles
        taglist.append([])
        emptycounter += 1
print(len(taglist))
print(f"{emptycounter} images with no labels")

100%|█████████████████████████████████| 590326/590326 [01:07<00:00, 8802.99it/s]

590326
1961 images with no labels





Check the first few entries:

In [8]:
[t[:5] for t in taglist[:3]]

[['name:Estômbar e Parchal',
  'motor_vehicle:yes',
  'MCC:268',
  'highway:turning_circle',
  'highway:track'],
 ['waterway:stream',
  'highway:unclassified',
  'source:Bing',
  'highway:track',
  'name:Ribeira da Perna da Negra'],
 ['seamark:type:cable_submarine',
  'admin_level:9',
  'submarine:yes',
  'surface:unpaved',
  'vayla_id:FI224000000323285']]
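
With tags this messy, it can help to see which keys are most common before deciding what to whitelist or blacklist. Below is a minimal sketch of one way to do that. It is not part of `promptadour`: `count_keys` is a hypothetical helper, and treating everything before the first colon as the key is an assumption (some tags above, such as `seamark:type:cable_submarine`, contain more than one colon).

```python
from collections import Counter

def count_keys(tag_lists):
    """Count how many images each key appears in.

    Assumes the key is the text before the first colon in a tag.
    """
    counts = Counter()
    for tags in tag_lists:
        # use a set so a key is counted at most once per image
        keys = {t.split(":", 1)[0] for t in tags}
        counts.update(keys)
    return counts

# small sample in the same shape as taglist
sample = [
    ["name:Estômbar e Parchal", "highway:turning_circle", "highway:track"],
    ["waterway:stream", "highway:unclassified", "source:Bing"],
    [],
]
key_counts = count_keys(sample)
print(key_counts.most_common(3))
```

On the full dataset you would call `count_keys(taglist)` instead of using the sample.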

## load previous work

If you're not starting from scratch, load what you have so far.

In [9]:
with open("ontologies/bigearthnet_osm.json", "r") as f:
    ont = json.load(f)
len(ont["whitelist"])

235

The ontology should be a dictionary with "whitelist" and "blacklist" keys.

In [10]:
ont["blacklist"][:10]

['name',
 'addr',
 'ref',
 'source',
 'official_name',
 'wikidata',
 'mml',
 'alt_name',
 'wikipedia',
 'note']
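
To make the role of the blacklist concrete, here is a minimal sketch of filtering a tag list against it. It assumes each blacklist entry is a key that is compared with the part of a tag before the first colon. That matching rule is a guess based on the entries above, and `drop_blacklisted` is a hypothetical helper, not a `promptadour` function.

```python
def drop_blacklisted(tags, blacklist):
    """Keep only tags whose key (text before the first colon) is not blacklisted."""
    blocked = set(blacklist)
    return [t for t in tags if t.split(":", 1)[0] not in blocked]

tags = [
    "waterway:stream",
    "source:Bing",
    "highway:track",
    "name:Ribeira da Perna da Negra",
]
kept = drop_blacklisted(tags, ["name", "source"])
print(kept)
```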

Each entry in the whitelist maps a key:value pair to a type of tag and a list of strings you could represent it with in a prompt:

In [11]:
ont["whitelist"]["highway:track"]

{'tagtype': 'thing', 'strings': ['highway', 'road', 'street']}
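
As a rough sketch of how a whitelist entry could be used downstream, the snippet below looks up each tag and picks one of its strings at random. This only illustrates the data structure: `tags_to_phrases` is a hypothetical helper, and how `promptadour` actually turns tags into prompts is not shown in this notebook.

```python
import random

def tags_to_phrases(tags, whitelist, rng=None):
    """Return (tagtype, string) pairs for every tag found in the whitelist."""
    rng = rng or random.Random()
    phrases = []
    for tag in tags:
        entry = whitelist.get(tag)
        # skip tags that are not whitelisted or have no candidate strings
        if entry is not None and entry["strings"]:
            phrases.append((entry["tagtype"], rng.choice(entry["strings"])))
    return phrases

whitelist = {
    "highway:track": {"tagtype": "thing", "strings": ["highway", "road", "street"]},
}
# a seeded generator keeps the random choice repeatable
phrases = tags_to_phrases(["highway:track", "source:Bing"], whitelist, rng=random.Random(0))
print(phrases)
```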

## create the GUI object

You can specify whatever tag types you want, but the defaults are "thing", "stuff", and "context".

In [12]:
%%time
of = promptadour.OntFarm(taglist, filepaths=imfiles, ontology=ont, saveto="ontfarmtest_bigearthnet_osm.json")

CPU times: user 52.6 s, sys: 60.4 ms, total: 52.7 s
Wall time: 52.7 s


In [13]:
of.serve()

INFO:bokeh.server.server:Starting Bokeh server version 2.3.3 (running on Tornado 6.1)
INFO:bokeh.server.tornado:User authentication hooks NOT provided (default user enabled)


Launching server at http://localhost:37941
