---

### for me

 - [YAML Cheatsheet](https://quickref.me/yaml.html)
 - [YAML Viewer](https://jsonformatter.org/yaml-viewer)

# Notes

 - provide links and URIs wherever possible
 - _extended_ YAML with inline links in Markdown style [text](https://www.example.org)

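The Markdown-style inline links mentioned above can be pulled out of a YAML string value with a small regular expression. This is only a minimal sketch and not part of `ResearchAids`; the name `extract_md_links` and the sample string are made up for illustration, and the pattern assumes that link texts contain no `]` and URLs contain no whitespace or `)`.

```python
import re

# Matches [text](url); assumes no "]" in the text and no whitespace or ")" in the URL.
MD_LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_md_links(value):
    """Return (text, url) pairs for every Markdown-style link in a string."""
    return MD_LINK_RE.findall(value)


# Hypothetical string value as it might appear in a guide.
sample = "See the [YAML Cheatsheet](https://quickref.me/yaml.html) and [example](https://www.example.org)."
print(extract_md_links(sample))
```

Returning plain `(text, url)` tuples keeps the sketch independent of how the links are rendered or linked to the Datahub later on.
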
# Feedback

 - the more you annotate _what_ the information in the guides _is_ (types, meta-info), the more information we can extract and potentially link together with the Datahub
 - try to keep structures _across_ guides of different levels _and_ pieces of information as uniform as possible
   -> makes info _predictable_ for others and allows more linking into the Datahub

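One way to check how uniform the guides are is to compare their top-level keys. The sketch below assumes the guides have already been loaded into dictionaries (for example with `yaml.safe_load`); the function name `key_differences` and the two sample guides are hypothetical.

```python
def key_differences(guides):
    """Map each guide name to the top-level keys it lacks compared to all guides combined."""
    all_keys = set()
    for content in guides.values():
        all_keys |= set(content)
    return {name: sorted(all_keys - set(content)) for name, content in guides.items()}


# Hypothetical, already-parsed guides.
guides = {
    "guide_a": {"Abstract": "...", "Sources": [], "Relevant data": {}},
    "guide_b": {"Abstract": "...", "Sources": []},
}
print(key_differences(guides))
```

An empty list for a guide means it carries every key seen in any of the guides; the same idea could be applied to nested sections such as the entries under `Sources`.
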
# Problems

 - which URIs?
   -> definitely the Datahub ones _at some point_ but those aren't ready yet
 - which thesaurus/thesauri?

# Thoughts, Ideas

 - link directly to other search guides (across levels)
 - link directly to example objects that the current guide pertains to (is exemplary of)
   -> this serves the "points of access"/"portals" motivation of the research guides

---

# YAML to Markdown Parsing - custom-built

In [None]:
from glob import glob
import yaml

from ResearchAids import ResearchAid

BASE_DIR = "../published"
eng = glob(f"{BASE_DIR}/*/English/*.yml")
dutch = glob(f"{BASE_DIR}/*/Dutch/*.yml")
# top = glob(f"{BASE_DIR}/TopLevel/*.yml")

yaml_files = sorted(dutch + eng)

yamls = []  # parsed YAML documents, reused by the cells below

for filename in yaml_files:
    with open(filename) as handle:
        yml = yaml.safe_load(handle)
    yamls.append(yml)
    try:
        r = ResearchAid(yml, raise_parsing_error=True)
    except KeyError as e:
        print(filename, e)
        # a missing "remarks" key is treated as fatal, other missing keys are only reported
        if "remarks" in str(e):
            raise
    except AttributeError as e:
        print(filename, e)



In [None]:
for y in yamls:
    print(y.keys())

In [None]:
# print(YAML2MD(yamls[1])())
# yamls[3]["Relevant data"]#["Tags"]
print([s.keys() for s in yamls[2]["Sources"]['Secondary sources']][0], "\n---")
print([s.keys() for s in yamls[3]["Sources"]['Secondary sources']][0])



In [None]:
# print(Level2(yamls[2])())

print(ResearchAid(yamls[0], raise_parsing_error=True)())

In [None]:
with open("../published/niveau3/English/NZG_20240508.yml") as handle:
    yml = yaml.safe_load(handle)

[d.keys() for d in yml["Sources"]['Secondary sources']]
yml["Sources"]['Secondary sources']

---

In [None]:
import re
from glob import glob
import yaml


# Matches a Markdown link pointing to GeoNames; captures the link text and the GeoNames ID.
MD_LINK_RE = re.compile(r"\[(.*)\]\(https?://(?:sws|www)\.geonames\.org/([0-9]+)/?.*\)")
# uri_re = re.compile(r"^https?://(?:sws|www)\.geonames\.org/([0-9]+)/?.*")


def correct_IRI(url):
    """Rewrite Markdown links to GeoNames into the canonical sws.geonames.org form."""
    # correct IRIs:
    #  - https://sws.geonames.org/6255149/
    #  - http://vocab.getty.edu/aat/300266789
    #  - http://www.wikidata.org/entity/Q219477
    match = MD_LINK_RE.match(url)
    if match:
        link_text, geonames_id = match.group(1), match.group(2)
        print(f"parsed {url}")
        return f"[{link_text}](https://sws.geonames.org/{geonames_id}/)"
    if ("http" in url[:20]) or ("www" in url[:20]):
        # looks like a URL but is not a GeoNames Markdown link
        print(f" {url}  didn't parse! is it correct?")

    return url



def iter_urls(yml):
    """Recursively apply correct_IRI to every string (keys included) of a parsed YAML structure."""
    if isinstance(yml, str):
        return correct_IRI(yml)
    if isinstance(yml, list):
        return list(map(iter_urls, yml))
    if isinstance(yml, dict):
        return {iter_urls(k): iter_urls(v) for k, v in yml.items()}
    # numbers, dates, None etc. are returned unchanged
    return yml

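The comments in `correct_IRI` also list canonical Getty AAT and Wikidata IRIs, but only GeoNames links are rewritten. The following sketch shows how bare IRIs of those two kinds might be normalised in the same spirit. The function name `normalise_iri` and the accepted input variants (`/wiki/` pages for Wikidata, `/page/aat/` pages for Getty) are assumptions, not something the notebook defines.

```python
import re

# Assumed input variants; the canonical targets are the forms listed in correct_IRI's comments.
WIKIDATA_RE = re.compile(r"^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/(Q[0-9]+)/?$")
AAT_RE = re.compile(r"^https?://vocab\.getty\.edu/(?:page/)?aat/([0-9]+)/?$")


def normalise_iri(iri):
    """Return the canonical form of a Wikidata or Getty AAT IRI, else the input unchanged."""
    match = WIKIDATA_RE.match(iri)
    if match:
        return f"http://www.wikidata.org/entity/{match.group(1)}"
    match = AAT_RE.match(iri)
    if match:
        return f"http://vocab.getty.edu/aat/{match.group(1)}"
    return iri


print(normalise_iri("https://www.wikidata.org/wiki/Q219477"))
print(normalise_iri("https://vocab.getty.edu/page/aat/300266789"))
```

Anything that matches neither pattern is passed through untouched, so the function could be called on every string without altering ordinary text.
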
In [None]:
BASE_DIR = "../published"
eng = glob(f"{BASE_DIR}/*/English/*.yml")
dutch = glob(f"{BASE_DIR}/*/Dutch/*.yml")
# top = glob(f"{BASE_DIR}/TopLevel/*.yml")

yaml_files = sorted(dutch + eng)

for filename in yaml_files:
    print(filename)
    with open(filename) as handle:
        yml = yaml.safe_load(handle)
        iter_urls(yml)
    print("\n-------------------\n")

In [None]:
with open("../published/niveau0/Dutch/TopLevel_20240606.yml") as handle:
    yml = yaml.safe_load(handle)
    print(yml == iter_urls(yml))


---

# Parsing MD to DOCX

In [1]:
import yaml
from ResearchAids import ResearchAid

In [3]:
aid = "../published/niveau3/English/WMLeiden_20240508.yml"
md_file = "../EXPORTS/MD/niveau3/English/WMLeiden.md"

with open(md_file) as handle:
    md = handle.read()

with open(aid) as handle:
    yml = yaml.safe_load(handle)


# print(md)

In [4]:
ra = ResearchAid(yml)
if ra._parsed:
    md_content = ra()
    
    with open("test.md", "w") as handle:
        handle.write(md_content)

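The cells in this section stop at writing `test.md`; the DOCX step itself is not shown. One possible way to do it is to call the `pandoc` command-line tool, which infers the output format from the file extension. This is an assumption about tooling, not necessarily what the project uses, and the helper name `pandoc_command` is made up.

```python
import os
import shutil
import subprocess


def pandoc_command(md_path, docx_path):
    """Build the pandoc call that converts a Markdown file into a DOCX file."""
    return ["pandoc", md_path, "-o", docx_path]


cmd = pandoc_command("test.md", "test.docx")

# Only run the conversion when pandoc is installed and the input file exists.
if shutil.which("pandoc") and os.path.exists("test.md"):
    subprocess.run(cmd, check=True)
else:
    print("skipped:", " ".join(cmd))
```

Building the command as a list avoids shell quoting problems with file names that contain spaces.
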
In [16]:
import re

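# Turn Markdown images (![alt](url)) into plain links ([alt](url)) by dropping the leading "!".
# The non-greedy match keeps several images on one line from being merged into a single match.
img_regex = re.compile(r"!(\[[^\]]+\]\([^\n]+?\))")

md2 = img_regex.sub(r"\1", md_content)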


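The image-to-link conversion in the cell above can be wrapped in a small function and tried on a sample string. This is a sketch under the assumption that alt texts contain no `]` and that each image sits on a single line; `images_to_links` and the sample text are hypothetical.

```python
import re

# Assumption: the alt text contains no "]" and the link target sits on one line.
IMG_RE = re.compile(r"!(\[[^\]]+\]\([^\n]+?\))")


def images_to_links(markdown):
    """Drop the leading "!" of Markdown images so they become ordinary links."""
    return IMG_RE.sub(r"\1", markdown)


sample = 'Intro ![first](https://www.example.org/a.jpg "A") and ![second](https://www.example.org/b.jpg).'
print(images_to_links(sample))
```

Because only the exclamation mark is removed, titles and URLs pass through unchanged, including URLs with percent-encoded parentheses.
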
In [18]:
print(md2)

_This is a level 3 Research Aid_  
_first edited by Wiebe Reints as original_author on 2024-05-08_  
_last edited by Abacus as translator on 2025-04-24
        (applies to section: Main-text; Sources)_


# Wereldmuseum Leiden


## Abstract

Today's Wereldmuseum Leiden, which was previously known as Museum Volkenkunde and before that as the 's Rijks Etnografisch Museum (State Ethnographic Museum), was founded in 1837 from the amalgamation of several collections of objects that were acquired in regions that were under Dutch colonial rule.

[Wereldmuseum Leiden (November 2024)](https://upload.wikimedia.org/wikipedia/commons/5/51/Wereldmuseum_Leiden_%28nov_2024%29.jpg "The Wereldmuseum Leiden in November 2024.")

### History of the collection

The history of today’s Wereldmuseum Leiden goes back to 1837. Its first hundred years were characterised by financial difficulties, the accumulation of large quantities of objects and many changes of location. In 1937 the museum finally found a perma

In [7]:
from datetime import datetime

str(datetime.now())

'2025-07-28 12:49:04.625968'