# Extract Signs from ORACC JSON
The code in this notebook will parse [ORACC](http://oracc.org) `JSON` files to extract signs from the Sumerian texts of one or more projects. 

In [1]:
import pandas as pd
import zipfile
import json
import tqdm
import requests
import errno
import os
import pickle
import re

## 0 Create Directories, if Necessary
The two directories needed for this script are `jsonzip` and `output`. If they do not exist they are created, else: do nothing.

For the code, see [Stack Overflow](http://stackoverflow.com/questions/18973418/os-mkdirpath-returns-oserror-when-directory-does-not-exist).

In [2]:
directories = ['jsonzip', 'output']
for d in directories:
    try:
        os.mkdir(d)
    except OSError as exc:
        if exc.errno !=errno.EEXIST:
            raise
        pass

## 1.1 Input Project Names
Provide a list of one or more project names, separated by commas. Note that subprojects must be listed separately, they are not included in the main project. For instance:

`epsd2/admin/ed3a, epsd2/admin/ed3b, epsd2/admin/ebla, epsd2/admin/oakk, epsd2/admin/lagash2, epsd2/admin/u3adm, epsd2/admin/u3let, epsd2/admin/u3leg, epsd2/admin/oldbab, epsd2/literary, epsd2/emesal, epsd2/earlylit, epsd2/royal, epsd2/praxis, dcclt, obmc, ckst, blms`

In [3]:
projects = input('Project(s): ').lower()

Project(s): epsd2/admin/ed3a, epsd2/admin/ed3b, epsd2/admin/ebla, epsd2/admin/oakk, epsd2/admin/lagash2, epsd2/admin/u3adm, epsd2/admin/u3let, epsd2/admin/u3leg, epsd2/admin/oldbab, epsd2/literary, epsd2/emesal, epsd2/earlylit, epsd2/royal, epsd2/praxis, dcclt, obmc, ckst, blms


## 1.2 Split the List of Projects
Split the list of projects and create a list of project names.

In [4]:
p = projects.split(',')               # split at each comma and make a list called `p`
p = [x.strip() for x in p]        # strip spaces left and right of each entry in `p`

## 1.3 Download the ZIP files
For each project in the list download all the `json` files from `http://build-oracc.museum.upenn.edu/json/`. The file is called `PROJECT.zip` (for instance: `dcclt.zip`). For subprojects the file is called `PROJECT-SUBPROJECT.zip` (for instance `cams-gkab.zip`). 

For larger projects (such as [DCCLT](http://oracc.org/dcclt)) the `zip` file may be 25Mb or more. Downloading may take some time and it may be necessary to chunk the downloading process. The `iter_content()` function in the `requests` library takes care of that.

If you have downloaded the files by hand (and put them in the `jsonzip` directory) you may skip this cell and jump directly to section [2.1 The Parsejson() function](#head21).

In [5]:
CHUNK = 16 * 1024
for project in tqdm.tqdm(p):
    project = project.replace('/', '-')
    url = "http://build-oracc.museum.upenn.edu/json/" + project + ".zip"
    file = 'jsonzip/' + project + '.zip'
    r = requests.get(url)
    if r.status_code == 200:
        print("Downloading " + url + " saving as " + file)
        with open(file, 'wb') as f:
            for c in r.iter_content(chunk_size=CHUNK):
                f.write(c)
    else:
        print(url + " does not exist.")

  6%|▌         | 1/18 [00:04<01:16,  4.49s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-ed3a.zip saving as jsonzip/epsd2-admin-ed3a.zip


 11%|█         | 2/18 [00:19<02:01,  7.59s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-ed3b.zip saving as jsonzip/epsd2-admin-ed3b.zip


 17%|█▋        | 3/18 [00:21<01:29,  6.00s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-ebla.zip saving as jsonzip/epsd2-admin-ebla.zip


 22%|██▏       | 4/18 [00:30<01:37,  6.97s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-oakk.zip saving as jsonzip/epsd2-admin-oakk.zip


 28%|██▊       | 5/18 [00:32<01:11,  5.52s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-lagash2.zip saving as jsonzip/epsd2-admin-lagash2.zip
Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-u3adm.zip saving as jsonzip/epsd2-admin-u3adm.zip


 39%|███▉      | 7/18 [01:43<03:14, 17.73s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-u3let.zip saving as jsonzip/epsd2-admin-u3let.zip


 44%|████▍     | 8/18 [01:47<02:16, 13.66s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-u3leg.zip saving as jsonzip/epsd2-admin-u3leg.zip


 50%|█████     | 9/18 [01:56<01:50, 12.29s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-admin-oldbab.zip saving as jsonzip/epsd2-admin-oldbab.zip


 56%|█████▌    | 10/18 [02:11<01:43, 12.97s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-literary.zip saving as jsonzip/epsd2-literary.zip


 61%|██████    | 11/18 [02:12<01:05,  9.31s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-emesal.zip saving as jsonzip/epsd2-emesal.zip


 67%|██████▋   | 12/18 [02:13<00:41,  6.95s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-earlylit.zip saving as jsonzip/epsd2-earlylit.zip


 72%|███████▏  | 13/18 [02:23<00:38,  7.73s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-royal.zip saving as jsonzip/epsd2-royal.zip


 78%|███████▊  | 14/18 [02:28<00:28,  7.12s/it]

Downloading http://build-oracc.museum.upenn.edu/json/epsd2-praxis.zip saving as jsonzip/epsd2-praxis.zip


 83%|████████▎ | 15/18 [02:46<00:30, 10.29s/it]

Downloading http://build-oracc.museum.upenn.edu/json/dcclt.zip saving as jsonzip/dcclt.zip


 89%|████████▉ | 16/18 [02:49<00:16,  8.25s/it]

Downloading http://build-oracc.museum.upenn.edu/json/obmc.zip saving as jsonzip/obmc.zip


 94%|█████████▍| 17/18 [02:52<00:06,  6.49s/it]

Downloading http://build-oracc.museum.upenn.edu/json/ckst.zip saving as jsonzip/ckst.zip


100%|██████████| 18/18 [02:59<00:00,  6.74s/it]

Downloading http://build-oracc.museum.upenn.edu/json/blms.zip saving as jsonzip/blms.zip





## <a name="head21"></a>2.1 The `parsejson()` function
The `parsejson()` function will "dig into" the `json` file (transformed into a dictionary) until it finds the relevant data. The `json` file consists of a hierarchy of `cdl` nodes; only the lowest nodes contain lemmatization data. The function goes down this hierarchy by calling itself when another `cdl` node is encountered. For nore information about the data hierarchy in the [ORACC](http://oracc.org) `json` files, see [ORACC Open Data](http://oracc.museum.upenn.edu/doc/opendata/index.html).

The argument of the `parsejson()` function is a `JSON` object, a dictionary that initially contains the entire contents of the original JSON file. The code takes the key `cdl` which itself contains an array (a list) of `JSON` objects. Iterating through these objects, if an object contains another `cdl` node, the function calls itself with this object as first argument. This way the function digs deeper and deeper into the `JSON` tree, until it does not encounter a `cdl` key anymore. Here we are at the level of individual words. The code checks for a key `f`, if it exists the signs are in the node `gdl` within the `f` node. 

In [6]:
def parsejson_signs(text):
    for JSONobject in text["cdl"]:
        field = ''
        if "cdl" in JSONobject: 
            parsejson_signs(JSONobject)
        if "type" in JSONobject and JSONobject["type"] == "field-start":
            field = JSONobject["subtype"]
        if "f" in JSONobject and not field in ['sg', 'pr']: # skip the fields "sign" and "pronunciation"
                                # in lexical texts
            if JSONobject["f"]["lang"][:3] == "sux": #only Sumerian and Emesal
                f = JSONobject["f"]["form"]
                if "sexified" in JSONobject["f"]["gdl"][0]:
                    f = JSONobject["f"]["gdl"][0]["sexified"]
                lemm = JSONobject["inst"]
                all_.append(f)
                lemm_.append(lemm)
                
#    all_.append("\nEndofDoc")
    return

## 2.2 Call the `parsejson()` function for every `JSON` file
The code in this cell will iterate through the list of projects entered above (1.1). For each project the `JSON` zip file is located in the directory `jsonzip`, named PROJECT.zip. 

Each of these files is extracted from the `zip` file and read with the command `json.loads()`, which reads the json data and transforms it into a Python dictionary (a sequence of keys and values).

This dictionary, which is called `text` is now sent to the `parsejson()` function. The function adds signs to the `sign_l` list.

In [7]:
all_ = []
lemm_ = []
for project in p:
    file = "jsonzip/" + project.replace("/", "-") + ".zip"
    try:
        z = zipfile.ZipFile(file)       # create a Zipfile object
    except:
        print(file + " does not exist or is not a proper ZIP file")
        continue
    files = z.namelist()     # list of all the files in the ZIP
    files = [name for name in files if "corpusjson" in name and name[-5:] == '.json']                                                                                                  #that holds all the P, Q, and X numbers.
    for filename in tqdm.tqdm(files):                            #iterate over the file names
        id_text = project + filename[-13:-5] # id_text is, for instance, blms/P414332
        try:
            text = z.read(filename).decode('utf-8')         #read and decode the json file of one particular text
            #print(filename)
            data_json = json.loads(text)                # make it into a json object (essentially a dictionary)
            all_.append('Start'+id_text)
            lemm_.append('Start'+id_text)   # to keep all_ and lemm_ same length
            parsejson_signs(data_json)
        except:
            print(id_text + ' is not available or not complete')

100%|██████████| 778/778 [00:00<00:00, 1445.16it/s]
100%|██████████| 3157/3157 [00:02<00:00, 1089.20it/s]
100%|██████████| 157/157 [00:00<00:00, 1918.38it/s]
  0%|          | 0/4944 [00:00<?, ?it/s]

epsd2/admin/ebla/P241764 is not available or not complete
epsd2/admin/ebla/P315437 is not available or not complete
epsd2/admin/ebla/P315459 is not available or not complete


100%|██████████| 4944/4944 [00:02<00:00, 2271.52it/s]
100%|██████████| 494/494 [00:00<00:00, 2091.75it/s]
  0%|          | 337/71496 [00:00<00:43, 1632.15it/s]

epsd2/admin/u3adm/P511905 is not available or not complete
epsd2/admin/u3adm/P511471 is not available or not complete
epsd2/admin/u3adm/P109084 is not available or not complete


  1%|          | 689/71496 [00:00<00:42, 1649.58it/s]

epsd2/admin/u3adm/P511973 is not available or not complete
epsd2/admin/u3adm/P504596 is not available or not complete


  2%|▏         | 1158/71496 [00:00<00:52, 1346.76it/s]

epsd2/admin/u3adm/P414560 is not available or not complete
epsd2/admin/u3adm/P512114 is not available or not complete
epsd2/admin/u3adm/P109115 is not available or not complete
epsd2/admin/u3adm/P511467 is not available or not complete
epsd2/admin/u3adm/P511979 is not available or not complete


  2%|▏         | 1636/71496 [00:01<00:46, 1498.81it/s]

epsd2/admin/u3adm/P105380 is not available or not complete
epsd2/admin/u3adm/P512156 is not available or not complete
epsd2/admin/u3adm/P414575 is not available or not complete


  3%|▎         | 2157/71496 [00:01<00:42, 1624.70it/s]

epsd2/admin/u3adm/P497673 is not available or not complete
epsd2/admin/u3adm/P476069 is not available or not complete
epsd2/admin/u3adm/P474548 is not available or not complete
epsd2/admin/u3adm/P474558 is not available or not complete
epsd2/admin/u3adm/P511901 is not available or not complete
epsd2/admin/u3adm/P414535 is not available or not complete
epsd2/admin/u3adm/P414516 is not available or not complete
epsd2/admin/u3adm/P497679 is not available or not complete


  4%|▎         | 2677/71496 [00:01<00:42, 1627.20it/s]

epsd2/admin/u3adm/P511956 is not available or not complete
epsd2/admin/u3adm/P109129 is not available or not complete


  4%|▍         | 3035/71496 [00:01<00:40, 1697.01it/s]

epsd2/admin/u3adm/P474539 is not available or not complete
epsd2/admin/u3adm/P474557 is not available or not complete
epsd2/admin/u3adm/P511912 is not available or not complete
epsd2/admin/u3adm/P105361 is not available or not complete


  5%|▌         | 3616/71496 [00:02<00:37, 1796.56it/s]

epsd2/admin/u3adm/P511969 is not available or not complete
epsd2/admin/u3adm/P512144 is not available or not complete
epsd2/admin/u3adm/P511526 is not available or not complete


  6%|▌         | 3981/71496 [00:02<00:43, 1538.46it/s]

epsd2/admin/u3adm/P497669 is not available or not complete
epsd2/admin/u3adm/P511402 is not available or not complete


  6%|▌         | 4307/71496 [00:02<00:44, 1498.19it/s]

epsd2/admin/u3adm/P511446 is not available or not complete
epsd2/admin/u3adm/P474535 is not available or not complete


  7%|▋         | 4983/71496 [00:03<00:45, 1461.38it/s]

epsd2/admin/u3adm/P114110 is not available or not complete
epsd2/admin/u3adm/P511983 is not available or not complete
epsd2/admin/u3adm/P512131 is not available or not complete


  7%|▋         | 5345/71496 [00:03<00:41, 1612.83it/s]

epsd2/admin/u3adm/P512108 is not available or not complete


  8%|▊         | 5879/71496 [00:03<00:41, 1591.05it/s]

epsd2/admin/u3adm/P511949 is not available or not complete
epsd2/admin/u3adm/P474534 is not available or not complete


  9%|▊         | 6199/71496 [00:03<00:43, 1513.29it/s]

epsd2/admin/u3adm/P512140 is not available or not complete
epsd2/admin/u3adm/P414550 is not available or not complete
epsd2/admin/u3adm/P105530 is not available or not complete
epsd2/admin/u3adm/P511911 is not available or not complete
epsd2/admin/u3adm/P511876 is not available or not complete
epsd2/admin/u3adm/P511987 is not available or not complete


 10%|▉         | 6912/71496 [00:04<00:42, 1519.37it/s]

epsd2/admin/u3adm/P511926 is not available or not complete
epsd2/admin/u3adm/P414571 is not available or not complete
epsd2/admin/u3adm/P476056 is not available or not complete


 10%|█         | 7485/71496 [00:04<00:38, 1670.49it/s]

epsd2/admin/u3adm/P476078 is not available or not complete
epsd2/admin/u3adm/P478289 is not available or not complete
epsd2/admin/u3adm/P414521 is not available or not complete
epsd2/admin/u3adm/P114184 is not available or not complete
epsd2/admin/u3adm/P512146 is not available or not complete
epsd2/admin/u3adm/P139502 is not available or not complete
epsd2/admin/u3adm/P329926 is not available or not complete


 11%|█         | 7829/71496 [00:05<00:39, 1596.84it/s]

epsd2/admin/u3adm/P511916 is not available or not complete
epsd2/admin/u3adm/P511435 is not available or not complete
epsd2/admin/u3adm/P476061 is not available or not complete
epsd2/admin/u3adm/P512147 is not available or not complete


 11%|█▏        | 8196/71496 [00:05<00:37, 1684.46it/s]

epsd2/admin/u3adm/P474544 is not available or not complete
epsd2/admin/u3adm/P476067 is not available or not complete


 12%|█▏        | 8791/71496 [00:05<00:33, 1857.09it/s]

epsd2/admin/u3adm/P511976 is not available or not complete
epsd2/admin/u3adm/P474530 is not available or not complete
epsd2/admin/u3adm/P476062 is not available or not complete
epsd2/admin/u3adm/P139503 is not available or not complete


 13%|█▎        | 8983/71496 [00:05<00:33, 1873.72it/s]

epsd2/admin/u3adm/P512107 is not available or not complete


 13%|█▎        | 9366/71496 [00:05<00:40, 1529.54it/s]

epsd2/admin/u3adm/P511909 is not available or not complete
epsd2/admin/u3adm/P511621 is not available or not complete
epsd2/admin/u3adm/P511558 is not available or not complete
epsd2/admin/u3adm/P512137 is not available or not complete
epsd2/admin/u3adm/P512103 is not available or not complete


 14%|█▎        | 9765/71496 [00:06<00:36, 1686.74it/s]

epsd2/admin/u3adm/P108848 is not available or not complete
epsd2/admin/u3adm/P511555 is not available or not complete
epsd2/admin/u3adm/P478307 is not available or not complete


 14%|█▍        | 10312/71496 [00:06<00:35, 1709.22it/s]

epsd2/admin/u3adm/P511609 is not available or not complete
epsd2/admin/u3adm/P430694 is not available or not complete
epsd2/admin/u3adm/P274567 is not available or not complete
epsd2/admin/u3adm/P511990 is not available or not complete
epsd2/admin/u3adm/P414548 is not available or not complete
epsd2/admin/u3adm/P312454 is not available or not complete


 15%|█▌        | 10846/71496 [00:06<00:35, 1721.21it/s]

epsd2/admin/u3adm/P511455 is not available or not complete


 16%|█▌        | 11221/71496 [00:06<00:33, 1799.39it/s]

epsd2/admin/u3adm/P511612 is not available or not complete
epsd2/admin/u3adm/P478279 is not available or not complete
epsd2/admin/u3adm/P414533 is not available or not complete
epsd2/admin/u3adm/P329888 is not available or not complete
epsd2/admin/u3adm/P511544 is not available or not complete
epsd2/admin/u3adm/P511589 is not available or not complete


 16%|█▌        | 11583/71496 [00:07<00:34, 1721.14it/s]

epsd2/admin/u3adm/P109109 is not available or not complete
epsd2/admin/u3adm/P105447 is not available or not complete
epsd2/admin/u3adm/P512159 is not available or not complete
epsd2/admin/u3adm/P512150 is not available or not complete


 16%|█▋        | 11776/71496 [00:07<00:33, 1768.96it/s]

epsd2/admin/u3adm/P476065 is not available or not complete
epsd2/admin/u3adm/P511632 is not available or not complete
epsd2/admin/u3adm/P511906 is not available or not complete
epsd2/admin/u3adm/P478293 is not available or not complete


 17%|█▋        | 12322/71496 [00:07<00:38, 1543.23it/s]

epsd2/admin/u3adm/P512141 is not available or not complete
epsd2/admin/u3adm/P474560 is not available or not complete
epsd2/admin/u3adm/P113145 is not available or not complete
epsd2/admin/u3adm/P109096 is not available or not complete
epsd2/admin/u3adm/P414519 is not available or not complete


 18%|█▊        | 12691/71496 [00:07<00:34, 1681.78it/s]

epsd2/admin/u3adm/P478297 is not available or not complete
epsd2/admin/u3adm/P511989 is not available or not complete
epsd2/admin/u3adm/P512102 is not available or not complete


 18%|█▊        | 13063/71496 [00:08<00:33, 1768.35it/s]

epsd2/admin/u3adm/P478284 is not available or not complete
epsd2/admin/u3adm/P477691 is not available or not complete
epsd2/admin/u3adm/P511434 is not available or not complete
epsd2/admin/u3adm/P476082 is not available or not complete
epsd2/admin/u3adm/P511396 is not available or not complete
epsd2/admin/u3adm/P511591 is not available or not complete
epsd2/admin/u3adm/P478294 is not available or not complete


 19%|█▉        | 13788/71496 [00:08<00:34, 1677.34it/s]

epsd2/admin/u3adm/P142626 is not available or not complete
epsd2/admin/u3adm/P512118 is not available or not complete
epsd2/admin/u3adm/P414529 is not available or not complete
epsd2/admin/u3adm/P511963 is not available or not complete
epsd2/admin/u3adm/P105378 is not available or not complete
epsd2/admin/u3adm/P105309 is not available or not complete
epsd2/admin/u3adm/P512138 is not available or not complete


 20%|█▉        | 14140/71496 [00:08<00:33, 1713.54it/s]

epsd2/admin/u3adm/P477698 is not available or not complete
epsd2/admin/u3adm/P511941 is not available or not complete
epsd2/admin/u3adm/P511608 is not available or not complete
epsd2/admin/u3adm/P512121 is not available or not complete
epsd2/admin/u3adm/P474531 is not available or not complete


 20%|██        | 14504/71496 [00:08<00:32, 1761.22it/s]

epsd2/admin/u3adm/P476074 is not available or not complete
epsd2/admin/u3adm/P105340 is not available or not complete


 21%|██        | 14887/71496 [00:09<00:31, 1820.22it/s]

epsd2/admin/u3adm/P476063 is not available or not complete
epsd2/admin/u3adm/P361750 is not available or not complete
epsd2/admin/u3adm/P430674 is not available or not complete
epsd2/admin/u3adm/P511915 is not available or not complete
epsd2/admin/u3adm/P512134 is not available or not complete
epsd2/admin/u3adm/P333133 is not available or not complete
epsd2/admin/u3adm/P414576 is not available or not complete


 21%|██        | 15071/71496 [00:09<00:31, 1811.75it/s]

epsd2/admin/u3adm/P511412 is not available or not complete
epsd2/admin/u3adm/P477695 is not available or not complete
epsd2/admin/u3adm/P105542 is not available or not complete


 22%|██▏       | 15586/71496 [00:09<00:38, 1446.26it/s]

epsd2/admin/u3adm/P109091 is not available or not complete
epsd2/admin/u3adm/P120695 is not available or not complete
epsd2/admin/u3adm/P114180 is not available or not complete
epsd2/admin/u3adm/P109121 is not available or not complete


 22%|██▏       | 15893/71496 [00:09<00:39, 1407.10it/s]

epsd2/admin/u3adm/P476076 is not available or not complete
epsd2/admin/u3adm/P114143 is not available or not complete
epsd2/admin/u3adm/P414553 is not available or not complete


 23%|██▎       | 16241/71496 [00:10<00:35, 1555.20it/s]

epsd2/admin/u3adm/P497674 is not available or not complete
epsd2/admin/u3adm/P114107 is not available or not complete
epsd2/admin/u3adm/P109123 is not available or not complete


 23%|██▎       | 16556/71496 [00:10<00:37, 1478.67it/s]

epsd2/admin/u3adm/P511913 is not available or not complete
epsd2/admin/u3adm/P511527 is not available or not complete
epsd2/admin/u3adm/P497670 is not available or not complete
epsd2/admin/u3adm/P414561 is not available or not complete
epsd2/admin/u3adm/P112322 is not available or not complete


 24%|██▍       | 17037/71496 [00:10<00:35, 1553.52it/s]

epsd2/admin/u3adm/P248996 is not available or not complete
epsd2/admin/u3adm/P139504 is not available or not complete


 24%|██▍       | 17411/71496 [00:10<00:31, 1701.90it/s]

epsd2/admin/u3adm/P511925 is not available or not complete
epsd2/admin/u3adm/P105307 is not available or not complete
epsd2/admin/u3adm/P511945 is not available or not complete
epsd2/admin/u3adm/P512153 is not available or not complete


 25%|██▌       | 17952/71496 [00:11<00:30, 1742.83it/s]

epsd2/admin/u3adm/P511619 is not available or not complete
epsd2/admin/u3adm/P511603 is not available or not complete


 26%|██▌       | 18594/71496 [00:11<00:38, 1375.62it/s]

epsd2/admin/u3adm/P511978 is not available or not complete
epsd2/admin/u3adm/P474538 is not available or not complete
epsd2/admin/u3adm/P511423 is not available or not complete


 27%|██▋       | 18948/71496 [00:11<00:33, 1546.81it/s]

epsd2/admin/u3adm/P511931 is not available or not complete
epsd2/admin/u3adm/P511980 is not available or not complete


 27%|██▋       | 19307/71496 [00:12<00:31, 1661.69it/s]

epsd2/admin/u3adm/P511902 is not available or not complete
epsd2/admin/u3adm/P511487 is not available or not complete
epsd2/admin/u3adm/P139501 is not available or not complete
epsd2/admin/u3adm/P511924 is not available or not complete


 28%|██▊       | 19879/71496 [00:12<00:29, 1732.47it/s]

epsd2/admin/u3adm/P114108 is not available or not complete
epsd2/admin/u3adm/P144316 is not available or not complete
epsd2/admin/u3adm/P430673 is not available or not complete
epsd2/admin/u3adm/P511569 is not available or not complete
epsd2/admin/u3adm/P114181 is not available or not complete
epsd2/admin/u3adm/P414523 is not available or not complete
epsd2/admin/u3adm/P268222 is not available or not complete
epsd2/admin/u3adm/P114151 is not available or not complete
epsd2/admin/u3adm/P109103 is not available or not complete
epsd2/admin/u3adm/P511878 is not available or not complete


 28%|██▊       | 20238/71496 [00:12<00:31, 1638.59it/s]

epsd2/admin/u3adm/P511919 is not available or not complete
epsd2/admin/u3adm/P414528 is not available or not complete


 29%|██▉       | 20767/71496 [00:13<00:30, 1670.31it/s]

epsd2/admin/u3adm/P511975 is not available or not complete
epsd2/admin/u3adm/P476088 is not available or not complete
epsd2/admin/u3adm/P430672 is not available or not complete
epsd2/admin/u3adm/P511927 is not available or not complete
epsd2/admin/u3adm/P511974 is not available or not complete
epsd2/admin/u3adm/P511439 is not available or not complete


 30%|██▉       | 21129/71496 [00:13<00:39, 1289.02it/s]

epsd2/admin/u3adm/P511917 is not available or not complete
epsd2/admin/u3adm/P478301 is not available or not complete
epsd2/admin/u3adm/P511582 is not available or not complete
epsd2/admin/u3adm/P109114 is not available or not complete


 30%|███       | 21669/71496 [00:13<00:31, 1588.31it/s]

epsd2/admin/u3adm/P474541 is not available or not complete
epsd2/admin/u3adm/P511928 is not available or not complete


 31%|███       | 22031/71496 [00:13<00:29, 1649.93it/s]

epsd2/admin/u3adm/P103285 is not available or not complete
epsd2/admin/u3adm/P432386 is not available or not complete


 31%|███▏      | 22387/71496 [00:14<00:30, 1635.47it/s]

epsd2/admin/u3adm/P109100 is not available or not complete
epsd2/admin/u3adm/P511950 is not available or not complete
epsd2/admin/u3adm/P511579 is not available or not complete
epsd2/admin/u3adm/P414552 is not available or not complete


 32%|███▏      | 22947/71496 [00:14<00:27, 1791.20it/s]

epsd2/admin/u3adm/P511431 is not available or not complete
epsd2/admin/u3adm/P249240 is not available or not complete
epsd2/admin/u3adm/P114103 is not available or not complete


 33%|███▎      | 23307/71496 [00:14<00:28, 1710.98it/s]

epsd2/admin/u3adm/P114106 is not available or not complete
epsd2/admin/u3adm/P414562 is not available or not complete
epsd2/admin/u3adm/P476057 is not available or not complete
epsd2/admin/u3adm/P512157 is not available or not complete
epsd2/admin/u3adm/P476077 is not available or not complete
epsd2/admin/u3adm/P512100 is not available or not complete


 33%|███▎      | 23684/71496 [00:14<00:26, 1783.55it/s]

epsd2/admin/u3adm/P144092 is not available or not complete
epsd2/admin/u3adm/P512185 is not available or not complete
epsd2/admin/u3adm/P478299 is not available or not complete
epsd2/admin/u3adm/P511960 is not available or not complete


 34%|███▎      | 24038/71496 [00:15<00:28, 1649.35it/s]

epsd2/admin/u3adm/P511965 is not available or not complete
epsd2/admin/u3adm/P109090 is not available or not complete


 34%|███▍      | 24231/71496 [00:15<00:27, 1721.75it/s]

epsd2/admin/u3adm/P109089 is not available or not complete


 34%|███▍      | 24601/71496 [00:15<00:37, 1240.35it/s]

epsd2/admin/u3adm/P330479 is not available or not complete
epsd2/admin/u3adm/P105537 is not available or not complete
epsd2/admin/u3adm/P414555 is not available or not complete


 35%|███▌      | 25307/71496 [00:16<00:29, 1577.37it/s]

epsd2/admin/u3adm/P476084 is not available or not complete


 36%|███▌      | 25661/71496 [00:16<00:27, 1664.81it/s]

epsd2/admin/u3adm/P512149 is not available or not complete
epsd2/admin/u3adm/P511943 is not available or not complete
epsd2/admin/u3adm/P114185 is not available or not complete
epsd2/admin/u3adm/P476058 is not available or not complete
epsd2/admin/u3adm/P512106 is not available or not complete
epsd2/admin/u3adm/P511464 is not available or not complete
epsd2/admin/u3adm/P330391 is not available or not complete
epsd2/admin/u3adm/P331094 is not available or not complete
epsd2/admin/u3adm/P478308 is not available or not complete


 37%|███▋      | 26440/71496 [00:16<00:24, 1808.72it/s]

epsd2/admin/u3adm/P474529 is not available or not complete
epsd2/admin/u3adm/P511880 is not available or not complete


 38%|███▊      | 27005/71496 [00:16<00:25, 1769.48it/s]

epsd2/admin/u3adm/P511448 is not available or not complete
epsd2/admin/u3adm/P474536 is not available or not complete
epsd2/admin/u3adm/P511410 is not available or not complete


 38%|███▊      | 27367/71496 [00:17<00:25, 1759.10it/s]

epsd2/admin/u3adm/P109113 is not available or not complete
epsd2/admin/u3adm/P500140 is not available or not complete
epsd2/admin/u3adm/P105305 is not available or not complete
epsd2/admin/u3adm/P478290 is not available or not complete


 39%|███▊      | 27547/71496 [00:17<00:24, 1770.85it/s]

epsd2/admin/u3adm/P114105 is not available or not complete


 39%|███▉      | 27869/71496 [00:17<00:37, 1176.21it/s]

epsd2/admin/u3adm/P331645 is not available or not complete
epsd2/admin/u3adm/P414530 is not available or not complete
epsd2/admin/u3adm/P512109 is not available or not complete
epsd2/admin/u3adm/P511853 is not available or not complete


 39%|███▉      | 28235/71496 [00:17<00:31, 1395.13it/s]

epsd2/admin/u3adm/P511961 is not available or not complete
epsd2/admin/u3adm/P511453 is not available or not complete
epsd2/admin/u3adm/P478296 is not available or not complete
epsd2/admin/u3adm/P478305 is not available or not complete
epsd2/admin/u3adm/P512115 is not available or not complete


 40%|████      | 28619/71496 [00:18<00:26, 1613.02it/s]

epsd2/admin/u3adm/P512099 is not available or not complete
epsd2/admin/u3adm/P512160 is not available or not complete
epsd2/admin/u3adm/P476055 is not available or not complete


 41%|████      | 29198/71496 [00:18<00:23, 1800.59it/s]

epsd2/admin/u3adm/P414563 is not available or not complete
epsd2/admin/u3adm/P322406 is not available or not complete
epsd2/admin/u3adm/P139506 is not available or not complete
epsd2/admin/u3adm/P511629 is not available or not complete


 41%|████▏     | 29604/71496 [00:18<00:22, 1878.85it/s]

epsd2/admin/u3adm/P109080 is not available or not complete
epsd2/admin/u3adm/P511910 is not available or not complete
epsd2/admin/u3adm/P109095 is not available or not complete
epsd2/admin/u3adm/P476087 is not available or not complete


 42%|████▏     | 30188/71496 [00:18<00:22, 1865.48it/s]

epsd2/admin/u3adm/P114148 is not available or not complete
epsd2/admin/u3adm/P512145 is not available or not complete
epsd2/admin/u3adm/P511416 is not available or not complete
epsd2/admin/u3adm/P109097 is not available or not complete
epsd2/admin/u3adm/P511865 is not available or not complete


 43%|████▎     | 30599/71496 [00:19<00:21, 1946.29it/s]

epsd2/admin/u3adm/P474551 is not available or not complete
epsd2/admin/u3adm/P511614 is not available or not complete
epsd2/admin/u3adm/P512148 is not available or not complete
epsd2/admin/u3adm/P474553 is not available or not complete
epsd2/admin/u3adm/P512130 is not available or not complete
epsd2/admin/u3adm/P105297 is not available or not complete
epsd2/admin/u3adm/P511985 is not available or not complete
epsd2/admin/u3adm/P109118 is not available or not complete


 43%|████▎     | 30815/71496 [00:19<00:20, 2005.66it/s]

epsd2/admin/u3adm/P511456 is not available or not complete
epsd2/admin/u3adm/P414524 is not available or not complete


 44%|████▍     | 31609/71496 [00:19<00:24, 1653.12it/s]

epsd2/admin/u3adm/P113246 is not available or not complete
epsd2/admin/u3adm/P248913 is not available or not complete
epsd2/admin/u3adm/P511630 is not available or not complete
epsd2/admin/u3adm/P474549 is not available or not complete
epsd2/admin/u3adm/P511954 is not available or not complete
epsd2/admin/u3adm/P249111 is not available or not complete
epsd2/admin/u3adm/P414566 is not available or not complete
epsd2/admin/u3adm/P109111 is not available or not complete
epsd2/admin/u3adm/P512127 is not available or not complete
epsd2/admin/u3adm/P476053 is not available or not complete
epsd2/admin/u3adm/P511415 is not available or not complete
epsd2/admin/u3adm/P112316 is not available or not complete


 45%|████▍     | 31969/71496 [00:20<00:25, 1563.74it/s]

epsd2/admin/u3adm/P477696 is not available or not complete
epsd2/admin/u3adm/P109127 is not available or not complete
epsd2/admin/u3adm/P512091 is not available or not complete
epsd2/admin/u3adm/P114146 is not available or not complete
epsd2/admin/u3adm/P511404 is not available or not complete
epsd2/admin/u3adm/P332487 is not available or not complete
epsd2/admin/u3adm/P474555 is not available or not complete


 45%|████▌     | 32325/71496 [00:20<00:24, 1623.05it/s]

epsd2/admin/u3adm/P476079 is not available or not complete
epsd2/admin/u3adm/P109122 is not available or not complete
epsd2/admin/u3adm/P512125 is not available or not complete


 46%|████▌     | 32663/71496 [00:20<00:23, 1636.37it/s]

epsd2/admin/u3adm/P512143 is not available or not complete
epsd2/admin/u3adm/P478310 is not available or not complete
epsd2/admin/u3adm/P511958 is not available or not complete


 47%|████▋     | 33500/71496 [00:20<00:19, 1961.75it/s]

epsd2/admin/u3adm/P511440 is not available or not complete
epsd2/admin/u3adm/P477694 is not available or not complete
epsd2/admin/u3adm/P511984 is not available or not complete
epsd2/admin/u3adm/P477693 is not available or not complete
epsd2/admin/u3adm/P511474 is not available or not complete
epsd2/admin/u3adm/P478278 is not available or not complete
epsd2/admin/u3adm/P414522 is not available or not complete
epsd2/admin/u3adm/P454532 is not available or not complete
epsd2/admin/u3adm/P430677 is not available or not complete


 48%|████▊     | 34097/71496 [00:21<00:19, 1927.06it/s]

epsd2/admin/u3adm/P105310 is not available or not complete
epsd2/admin/u3adm/P511977 is not available or not complete
epsd2/admin/u3adm/P414551 is not available or not complete
epsd2/admin/u3adm/P114182 is not available or not complete
epsd2/admin/u3adm/P511599 is not available or not complete
epsd2/admin/u3adm/P511918 is not available or not complete


 48%|████▊     | 34448/71496 [00:21<00:28, 1306.33it/s]

epsd2/admin/u3adm/P511581 is not available or not complete
epsd2/admin/u3adm/P414520 is not available or not complete
epsd2/admin/u3adm/P105362 is not available or not complete
epsd2/admin/u3adm/P478281 is not available or not complete


 49%|████▊     | 34844/71496 [00:21<00:23, 1576.76it/s]

epsd2/admin/u3adm/P511602 is not available or not complete
epsd2/admin/u3adm/P511930 is not available or not complete
epsd2/admin/u3adm/P477687 is not available or not complete
epsd2/admin/u3adm/P511920 is not available or not complete
epsd2/admin/u3adm/P511964 is not available or not complete


 49%|████▉     | 35265/71496 [00:21<00:20, 1792.24it/s]

epsd2/admin/u3adm/P476081 is not available or not complete
epsd2/admin/u3adm/P511957 is not available or not complete


 50%|█████     | 35860/71496 [00:22<00:19, 1869.91it/s]

epsd2/admin/u3adm/P511546 is not available or not complete
epsd2/admin/u3adm/P511981 is not available or not complete
epsd2/admin/u3adm/P511450 is not available or not complete


 51%|█████     | 36255/71496 [00:22<00:18, 1880.87it/s]

epsd2/admin/u3adm/P114179 is not available or not complete
epsd2/admin/u3adm/P511594 is not available or not complete
epsd2/admin/u3adm/P361749 is not available or not complete
epsd2/admin/u3adm/P476070 is not available or not complete
epsd2/admin/u3adm/P114150 is not available or not complete
epsd2/admin/u3adm/P476072 is not available or not complete


 52%|█████▏    | 36838/71496 [00:22<00:18, 1912.63it/s]

epsd2/admin/u3adm/P109128 is not available or not complete


 52%|█████▏    | 37466/71496 [00:23<00:17, 1917.42it/s]

epsd2/admin/u3adm/P511929 is not available or not complete
epsd2/admin/u3adm/P511578 is not available or not complete
epsd2/admin/u3adm/P511542 is not available or not complete
epsd2/admin/u3adm/P511858 is not available or not complete


 53%|█████▎    | 37872/71496 [00:23<00:24, 1368.22it/s]

epsd2/admin/u3adm/P511908 is not available or not complete
epsd2/admin/u3adm/P478298 is not available or not complete
epsd2/admin/u3adm/P474554 is not available or not complete
epsd2/admin/u3adm/P511874 is not available or not complete
epsd2/admin/u3adm/P511904 is not available or not complete


 53%|█████▎    | 38229/71496 [00:23<00:21, 1549.18it/s]

epsd2/admin/u3adm/P512155 is not available or not complete
epsd2/admin/u3adm/P139508 is not available or not complete
epsd2/admin/u3adm/P497672 is not available or not complete
epsd2/admin/u3adm/P112317 is not available or not complete


 54%|█████▍    | 38837/71496 [00:24<00:17, 1842.67it/s]

epsd2/admin/u3adm/P511585 is not available or not complete
epsd2/admin/u3adm/P200536 is not available or not complete
epsd2/admin/u3adm/P414527 is not available or not complete
epsd2/admin/u3adm/P511855 is not available or not complete


 55%|█████▍    | 39244/71496 [00:24<00:16, 1933.86it/s]

epsd2/admin/u3adm/P103265 is not available or not complete
epsd2/admin/u3adm/P109104 is not available or not complete
epsd2/admin/u3adm/P109107 is not available or not complete


 56%|█████▌    | 39859/71496 [00:24<00:16, 1923.26it/s]

epsd2/admin/u3adm/P511549 is not available or not complete
epsd2/admin/u3adm/P512123 is not available or not complete
epsd2/admin/u3adm/P511986 is not available or not complete
epsd2/admin/u3adm/P474533 is not available or not complete
epsd2/admin/u3adm/P430676 is not available or not complete
epsd2/admin/u3adm/P511894 is not available or not complete
epsd2/admin/u3adm/P474528 is not available or not complete


 56%|█████▋    | 40273/71496 [00:24<00:15, 1998.00it/s]

epsd2/admin/u3adm/P474556 is not available or not complete
epsd2/admin/u3adm/P511948 is not available or not complete
epsd2/admin/u3adm/P511967 is not available or not complete
epsd2/admin/u3adm/P511568 is not available or not complete
epsd2/admin/u3adm/P109117 is not available or not complete


 57%|█████▋    | 40698/71496 [00:24<00:15, 2005.56it/s]

epsd2/admin/u3adm/P474537 is not available or not complete
epsd2/admin/u3adm/P105308 is not available or not complete
epsd2/admin/u3adm/P333132 is not available or not complete
epsd2/admin/u3adm/P511411 is not available or not complete


 58%|█████▊    | 41538/71496 [00:25<00:14, 2006.09it/s]

epsd2/admin/u3adm/P105383 is not available or not complete
epsd2/admin/u3adm/P474527 is not available or not complete
epsd2/admin/u3adm/P414517 is not available or not complete


 58%|█████▊    | 41740/71496 [00:25<00:22, 1312.27it/s]

epsd2/admin/u3adm/P511955 is not available or not complete


 59%|█████▉    | 42150/71496 [00:25<00:18, 1607.20it/s]

epsd2/admin/u3adm/P430671 is not available or not complete
epsd2/admin/u3adm/P511490 is not available or not complete
epsd2/admin/u3adm/P477699 is not available or not complete
epsd2/admin/u3adm/P478285 is not available or not complete
epsd2/admin/u3adm/P114142 is not available or not complete
epsd2/admin/u3adm/P414532 is not available or not complete
epsd2/admin/u3adm/P109085 is not available or not complete


 60%|█████▉    | 42784/71496 [00:26<00:15, 1826.82it/s]

epsd2/admin/u3adm/P511408 is not available or not complete
epsd2/admin/u3adm/P474532 is not available or not complete
epsd2/admin/u3adm/P105551 is not available or not complete


 61%|██████    | 43380/71496 [00:26<00:15, 1846.27it/s]

epsd2/admin/u3adm/P511537 is not available or not complete
epsd2/admin/u3adm/P511959 is not available or not complete
epsd2/admin/u3adm/P476060 is not available or not complete
epsd2/admin/u3adm/P476080 is not available or not complete


 61%|██████▏   | 43945/71496 [00:26<00:14, 1842.15it/s]

epsd2/admin/u3adm/P333130 is not available or not complete
epsd2/admin/u3adm/P105544 is not available or not complete


 62%|██████▏   | 44131/71496 [00:26<00:15, 1766.39it/s]

epsd2/admin/u3adm/P511885 is not available or not complete
epsd2/admin/u3adm/P497676 is not available or not complete
epsd2/admin/u3adm/P511492 is not available or not complete
epsd2/admin/u3adm/P109124 is not available or not complete
epsd2/admin/u3adm/P477700 is not available or not complete
epsd2/admin/u3adm/P109106 is not available or not complete
epsd2/admin/u3adm/P105304 is not available or not complete


 63%|██████▎   | 44707/71496 [00:27<00:14, 1864.65it/s]

epsd2/admin/u3adm/P109131 is not available or not complete


 63%|██████▎   | 45143/71496 [00:27<00:13, 2011.82it/s]

epsd2/admin/u3adm/P511970 is not available or not complete
epsd2/admin/u3adm/P414559 is not available or not complete
epsd2/admin/u3adm/P511921 is not available or not complete
epsd2/admin/u3adm/P474525 is not available or not complete
epsd2/admin/u3adm/P512117 is not available or not complete
epsd2/admin/u3adm/P497678 is not available or not complete
epsd2/admin/u3adm/P109081 is not available or not complete
epsd2/admin/u3adm/P114178 is not available or not complete


 64%|██████▍   | 45697/71496 [00:27<00:17, 1461.11it/s]

epsd2/admin/u3adm/P511424 is not available or not complete
epsd2/admin/u3adm/P511426 is not available or not complete
epsd2/admin/u3adm/P511523 is not available or not complete


 64%|██████▍   | 46055/71496 [00:28<00:16, 1524.79it/s]

epsd2/admin/u3adm/P109119 is not available or not complete
epsd2/admin/u3adm/P511483 is not available or not complete
epsd2/admin/u3adm/P511437 is not available or not complete


 65%|██████▍   | 46430/71496 [00:28<00:15, 1668.11it/s]

epsd2/admin/u3adm/P511932 is not available or not complete
epsd2/admin/u3adm/P478309 is not available or not complete
epsd2/admin/u3adm/P511953 is not available or not complete
epsd2/admin/u3adm/P476051 is not available or not complete
epsd2/admin/u3adm/P476075 is not available or not complete
epsd2/admin/u3adm/P474552 is not available or not complete
epsd2/admin/u3adm/P512126 is not available or not complete
epsd2/admin/u3adm/P511852 is not available or not complete


 66%|██████▌   | 46968/71496 [00:28<00:14, 1705.52it/s]

epsd2/admin/u3adm/P512119 is not available or not complete
epsd2/admin/u3adm/P478288 is not available or not complete
epsd2/admin/u3adm/P109105 is not available or not complete


 66%|██████▌   | 47334/71496 [00:28<00:13, 1733.87it/s]

epsd2/admin/u3adm/P330388 is not available or not complete
epsd2/admin/u3adm/P511536 is not available or not complete
epsd2/admin/u3adm/P476071 is not available or not complete
epsd2/admin/u3adm/P112321 is not available or not complete
epsd2/admin/u3adm/P476059 is not available or not complete


 67%|██████▋   | 47710/71496 [00:29<00:13, 1796.94it/s]

epsd2/admin/u3adm/P511881 is not available or not complete
epsd2/admin/u3adm/P497677 is not available or not complete
epsd2/admin/u3adm/P474543 is not available or not complete
epsd2/admin/u3adm/P511944 is not available or not complete


 68%|██████▊   | 48265/71496 [00:29<00:16, 1440.85it/s]

epsd2/admin/u3adm/P476717 is not available or not complete
epsd2/admin/u3adm/P511971 is not available or not complete
epsd2/admin/u3adm/P109130 is not available or not complete


 68%|██████▊   | 48630/71496 [00:29<00:14, 1616.81it/s]

epsd2/admin/u3adm/P511968 is not available or not complete
epsd2/admin/u3adm/P414570 is not available or not complete
epsd2/admin/u3adm/P478302 is not available or not complete
epsd2/admin/u3adm/P512133 is not available or not complete


 69%|██████▉   | 49402/71496 [00:30<00:12, 1760.91it/s]

epsd2/admin/u3adm/P476085 is not available or not complete
epsd2/admin/u3adm/P474547 is not available or not complete
epsd2/admin/u3adm/P511942 is not available or not complete
epsd2/admin/u3adm/P511429 is not available or not complete
epsd2/admin/u3adm/P474546 is not available or not complete


 69%|██████▉   | 49609/71496 [00:30<00:11, 1840.39it/s]

epsd2/admin/u3adm/P476052 is not available or not complete
epsd2/admin/u3adm/P478304 is not available or not complete
epsd2/admin/u3adm/P512161 is not available or not complete
epsd2/admin/u3adm/P511457 is not available or not complete
epsd2/admin/u3adm/P476086 is not available or not complete
epsd2/admin/u3adm/P512158 is not available or not complete


 70%|███████   | 50199/71496 [00:30<00:11, 1831.29it/s]

epsd2/admin/u3adm/P128551 is not available or not complete
epsd2/admin/u3adm/P139505 is not available or not complete
epsd2/admin/u3adm/P105330 is not available or not complete
epsd2/admin/u3adm/P414554 is not available or not complete
epsd2/admin/u3adm/P478295 is not available or not complete
epsd2/admin/u3adm/P511946 is not available or not complete


 71%|███████   | 50569/71496 [00:30<00:11, 1793.18it/s]

epsd2/admin/u3adm/P477689 is not available or not complete
epsd2/admin/u3adm/P511417 is not available or not complete
epsd2/admin/u3adm/P511866 is not available or not complete


 71%|███████   | 50933/71496 [00:31<00:17, 1154.64it/s]

epsd2/admin/u3adm/P511557 is not available or not complete
epsd2/admin/u3adm/P511560 is not available or not complete
epsd2/admin/u3adm/P476716 is not available or not complete


 72%|███████▏  | 51481/71496 [00:31<00:13, 1522.42it/s]

epsd2/admin/u3adm/P511620 is not available or not complete
epsd2/admin/u3adm/P512151 is not available or not complete
epsd2/admin/u3adm/P512116 is not available or not complete
epsd2/admin/u3adm/P474550 is not available or not complete


 72%|███████▏  | 51823/71496 [00:31<00:13, 1466.96it/s]

epsd2/admin/u3adm/P113137 is not available or not complete
epsd2/admin/u3adm/P511413 is not available or not complete
epsd2/admin/u3adm/P511631 is not available or not complete
epsd2/admin/u3adm/P109125 is not available or not complete
epsd2/admin/u3adm/P511580 is not available or not complete


 73%|███████▎  | 52186/71496 [00:32<00:12, 1553.54it/s]

epsd2/admin/u3adm/P511854 is not available or not complete
epsd2/admin/u3adm/P511590 is not available or not complete


 73%|███████▎  | 52536/71496 [00:32<00:17, 1088.74it/s]

epsd2/admin/u3adm/P114145 is not available or not complete
epsd2/admin/u3adm/P511451 is not available or not complete
epsd2/admin/u3adm/P511903 is not available or not complete


 75%|███████▍  | 53323/71496 [00:32<00:11, 1644.50it/s]

epsd2/admin/u3adm/P112320 is not available or not complete
epsd2/admin/u3adm/P512104 is not available or not complete
epsd2/admin/u3adm/P474559 is not available or not complete


 75%|███████▌  | 53914/71496 [00:33<00:09, 1789.12it/s]

epsd2/admin/u3adm/P512105 is not available or not complete
epsd2/admin/u3adm/P105543 is not available or not complete
epsd2/admin/u3adm/P511940 is not available or not complete


 76%|███████▌  | 54504/71496 [00:33<00:09, 1802.99it/s]

epsd2/admin/u3adm/P511553 is not available or not complete
epsd2/admin/u3adm/P108847 is not available or not complete
epsd2/admin/u3adm/P109094 is not available or not complete
epsd2/admin/u3adm/P333129 is not available or not complete
epsd2/admin/u3adm/P102530 is not available or not complete
epsd2/admin/u3adm/P512135 is not available or not complete
epsd2/admin/u3adm/P414525 is not available or not complete


 77%|███████▋  | 55082/71496 [00:33<00:08, 1842.11it/s]

epsd2/admin/u3adm/P433131 is not available or not complete
epsd2/admin/u3adm/P478292 is not available or not complete


 78%|███████▊  | 55452/71496 [00:34<00:09, 1755.67it/s]

epsd2/admin/u3adm/P511605 is not available or not complete
epsd2/admin/u3adm/P474545 is not available or not complete
epsd2/admin/u3adm/P114315 is not available or not complete
epsd2/admin/u3adm/P512129 is not available or not complete
epsd2/admin/u3adm/P478312 is not available or not complete


 78%|███████▊  | 55814/71496 [00:34<00:08, 1757.00it/s]

epsd2/admin/u3adm/P511907 is not available or not complete
epsd2/admin/u3adm/P333128 is not available or not complete
epsd2/admin/u3adm/P414534 is not available or not complete
epsd2/admin/u3adm/P112796 is not available or not complete


 79%|███████▊  | 56132/71496 [00:34<00:14, 1075.08it/s]

epsd2/admin/u3adm/P511992 is not available or not complete
epsd2/admin/u3adm/P476715 is not available or not complete
epsd2/admin/u3adm/P114147 is not available or not complete
epsd2/admin/u3adm/P414526 is not available or not complete
epsd2/admin/u3adm/P476064 is not available or not complete


 79%|███████▉  | 56509/71496 [00:34<00:11, 1307.77it/s]

epsd2/admin/u3adm/P511550 is not available or not complete
epsd2/admin/u3adm/P109098 is not available or not complete
epsd2/admin/u3adm/P139507 is not available or not complete
epsd2/admin/u3adm/P478303 is not available or not complete
epsd2/admin/u3adm/P414567 is not available or not complete
epsd2/admin/u3adm/P511856 is not available or not complete


 80%|████████  | 57486/71496 [00:35<00:07, 1795.39it/s]

epsd2/admin/u3adm/P477688 is not available or not complete
epsd2/admin/u3adm/P477697 is not available or not complete
epsd2/admin/u3adm/P511863 is not available or not complete
epsd2/admin/u3adm/P512128 is not available or not complete
epsd2/admin/u3adm/P511436 is not available or not complete


 81%|████████  | 57867/71496 [00:35<00:07, 1783.53it/s]

epsd2/admin/u3adm/P511556 is not available or not complete
epsd2/admin/u3adm/P511458 is not available or not complete
epsd2/admin/u3adm/P511430 is not available or not complete
epsd2/admin/u3adm/P512142 is not available or not complete


 81%|████████▏ | 58232/71496 [00:35<00:07, 1771.02it/s]

epsd2/admin/u3adm/P105381 is not available or not complete
epsd2/admin/u3adm/P414568 is not available or not complete
epsd2/admin/u3adm/P511972 is not available or not complete
epsd2/admin/u3adm/P512098 is not available or not complete


 82%|████████▏ | 58593/71496 [00:36<00:07, 1781.19it/s]

epsd2/admin/u3adm/P109126 is not available or not complete
epsd2/admin/u3adm/P511875 is not available or not complete


 82%|████████▏ | 58962/71496 [00:36<00:06, 1809.60it/s]

epsd2/admin/u3adm/P105306 is not available or not complete
epsd2/admin/u3adm/P511857 is not available or not complete


 83%|████████▎ | 59346/71496 [00:36<00:10, 1134.04it/s]

epsd2/admin/u3adm/P105422 is not available or not complete
epsd2/admin/u3adm/P105454 is not available or not complete
epsd2/admin/u3adm/P114177 is not available or not complete


 84%|████████▎ | 59751/71496 [00:37<00:08, 1463.66it/s]

epsd2/admin/u3adm/P511454 is not available or not complete
epsd2/admin/u3adm/P511982 is not available or not complete
epsd2/admin/u3adm/P511409 is not available or not complete
epsd2/admin/u3adm/P109110 is not available or not complete
epsd2/admin/u3adm/P478300 is not available or not complete
epsd2/admin/u3adm/P476068 is not available or not complete
epsd2/admin/u3adm/P109116 is not available or not complete


 84%|████████▍ | 60139/71496 [00:37<00:06, 1656.30it/s]

epsd2/admin/u3adm/P512139 is not available or not complete
epsd2/admin/u3adm/P109108 is not available or not complete
epsd2/admin/u3adm/P511468 is not available or not complete
epsd2/admin/u3adm/P512124 is not available or not complete
epsd2/admin/u3adm/P455737 is not available or not complete
epsd2/admin/u3adm/P512120 is not available or not complete


 85%|████████▍ | 60511/71496 [00:37<00:06, 1610.42it/s]

epsd2/admin/u3adm/P511588 is not available or not complete
epsd2/admin/u3adm/P474540 is not available or not complete
epsd2/admin/u3adm/P511472 is not available or not complete
epsd2/admin/u3adm/P511923 is not available or not complete


 85%|████████▌ | 60860/71496 [00:37<00:06, 1642.92it/s]

epsd2/admin/u3adm/P511962 is not available or not complete


 86%|████████▋ | 61746/71496 [00:38<00:07, 1285.40it/s]

epsd2/admin/u3adm/P478306 is not available or not complete


 87%|████████▋ | 62094/71496 [00:38<00:06, 1437.19it/s]

epsd2/admin/u3adm/P105265 is not available or not complete
epsd2/admin/u3adm/P109112 is not available or not complete


 87%|████████▋ | 62456/71496 [00:38<00:05, 1572.78it/s]

epsd2/admin/u3adm/P109101 is not available or not complete
epsd2/admin/u3adm/P504807 is not available or not complete
epsd2/admin/u3adm/P114144 is not available or not complete
epsd2/admin/u3adm/P109120 is not available or not complete
epsd2/admin/u3adm/P511600 is not available or not complete
epsd2/admin/u3adm/P414549 is not available or not complete


 88%|████████▊ | 63005/71496 [00:39<00:05, 1547.71it/s]

epsd2/admin/u3adm/P477690 is not available or not complete
epsd2/admin/u3adm/P298522 is not available or not complete
epsd2/admin/u3adm/P511403 is not available or not complete


 89%|████████▊ | 63355/71496 [00:39<00:04, 1643.54it/s]

epsd2/admin/u3adm/P474526 is not available or not complete
epsd2/admin/u3adm/P112319 is not available or not complete
epsd2/admin/u3adm/P511626 is not available or not complete
epsd2/admin/u3adm/P331067 is not available or not complete
epsd2/admin/u3adm/P114183 is not available or not complete
epsd2/admin/u3adm/P109092 is not available or not complete


 89%|████████▉ | 63694/71496 [00:39<00:07, 1091.41it/s]

epsd2/admin/u3adm/P512112 is not available or not complete
epsd2/admin/u3adm/P511922 is not available or not complete


 90%|████████▉ | 64244/71496 [00:40<00:05, 1396.55it/s]

epsd2/admin/u3adm/P511428 is not available or not complete


 91%|█████████ | 64762/71496 [00:40<00:04, 1540.67it/s]

epsd2/admin/u3adm/P414556 is not available or not complete
epsd2/admin/u3adm/P414558 is not available or not complete
epsd2/admin/u3adm/P511624 is not available or not complete


 91%|█████████ | 65141/71496 [00:40<00:03, 1700.56it/s]

epsd2/admin/u3adm/P113128 is not available or not complete
epsd2/admin/u3adm/P108851 is not available or not complete
epsd2/admin/u3adm/P476066 is not available or not complete
epsd2/admin/u3adm/P511421 is not available or not complete
epsd2/admin/u3adm/P511993 is not available or not complete


 92%|█████████▏| 65508/71496 [00:41<00:03, 1676.43it/s]

epsd2/admin/u3adm/P105369 is not available or not complete


 92%|█████████▏| 66046/71496 [00:41<00:04, 1213.98it/s]

epsd2/admin/u3adm/P511952 is not available or not complete
epsd2/admin/u3adm/P511991 is not available or not complete
epsd2/admin/u3adm/P478311 is not available or not complete
epsd2/admin/u3adm/P108843 is not available or not complete
epsd2/admin/u3adm/P511864 is not available or not complete


 93%|█████████▎| 66389/71496 [00:41<00:03, 1430.21it/s]

epsd2/admin/u3adm/P114149 is not available or not complete
epsd2/admin/u3adm/P512110 is not available or not complete
epsd2/admin/u3adm/P333127 is not available or not complete
epsd2/admin/u3adm/P511625 is not available or not complete
epsd2/admin/u3adm/P474542 is not available or not complete


 94%|█████████▎| 66977/71496 [00:42<00:02, 1748.44it/s]

epsd2/admin/u3adm/P114104 is not available or not complete
epsd2/admin/u3adm/P476083 is not available or not complete
epsd2/admin/u3adm/P414547 is not available or not complete


 94%|█████████▍| 67516/71496 [00:42<00:02, 1604.69it/s]

epsd2/admin/u3adm/P511494 is not available or not complete
epsd2/admin/u3adm/P511947 is not available or not complete
epsd2/admin/u3adm/P430675 is not available or not complete


 95%|█████████▌| 68259/71496 [00:42<00:01, 1755.41it/s]

epsd2/admin/u3adm/P512152 is not available or not complete
epsd2/admin/u3adm/P512101 is not available or not complete


 96%|█████████▌| 68652/71496 [00:43<00:01, 1847.54it/s]

epsd2/admin/u3adm/P511966 is not available or not complete
epsd2/admin/u3adm/P414572 is not available or not complete


 97%|█████████▋| 69035/71496 [00:43<00:02, 1088.72it/s]

epsd2/admin/u3adm/P512122 is not available or not complete
epsd2/admin/u3adm/P114111 is not available or not complete
epsd2/admin/u3adm/P511988 is not available or not complete


 97%|█████████▋| 69599/71496 [00:43<00:01, 1497.93it/s]

epsd2/admin/u3adm/P512154 is not available or not complete
epsd2/admin/u3adm/P511432 is not available or not complete
epsd2/admin/u3adm/P511914 is not available or not complete
epsd2/admin/u3adm/P476073 is not available or not complete
epsd2/admin/u3adm/P512136 is not available or not complete
epsd2/admin/u3adm/P477692 is not available or not complete
epsd2/admin/u3adm/P478291 is not available or not complete


 98%|█████████▊| 69960/71496 [00:44<00:00, 1604.03it/s]

epsd2/admin/u3adm/P414531 is not available or not complete
epsd2/admin/u3adm/P478286 is not available or not complete
epsd2/admin/u3adm/P512132 is not available or not complete
epsd2/admin/u3adm/P113106 is not available or not complete
epsd2/admin/u3adm/P105482 is not available or not complete


 99%|█████████▊| 70512/71496 [00:44<00:00, 1742.39it/s]

epsd2/admin/u3adm/P511951 is not available or not complete
epsd2/admin/u3adm/P109102 is not available or not complete
epsd2/admin/u3adm/P331079 is not available or not complete
epsd2/admin/u3adm/P512113 is not available or not complete


 99%|█████████▉| 70906/71496 [00:44<00:00, 1736.46it/s]

epsd2/admin/u3adm/P511862 is not available or not complete
epsd2/admin/u3adm/P511938 is not available or not complete
epsd2/admin/u3adm/P476054 is not available or not complete


100%|█████████▉| 71286/71496 [00:44<00:00, 1814.17it/s]

epsd2/admin/u3adm/P512097 is not available or not complete
epsd2/admin/u3adm/P511577 is not available or not complete
epsd2/admin/u3adm/P511541 is not available or not complete
epsd2/admin/u3adm/P109086 is not available or not complete
epsd2/admin/u3adm/P511470 is not available or not complete
epsd2/admin/u3adm/P511479 is not available or not complete


100%|██████████| 71496/71496 [00:44<00:00, 1588.94it/s]
100%|██████████| 129/129 [00:00<00:00, 3388.67it/s]
100%|██████████| 181/181 [00:00<00:00, 1750.60it/s]
 33%|███▎      | 464/1424 [00:00<00:00, 2192.88it/s]

epsd2/admin/oldbab/P454327 is not available or not complete


100%|██████████| 1424/1424 [00:00<00:00, 1511.14it/s]
  0%|          | 0/394 [00:00<?, ?it/s]

epsd2/admin/oldbab/P453296 is not available or not complete


100%|██████████| 394/394 [00:04<00:00, 83.99it/s] 
0it [00:00, ?it/s]
100%|██████████| 35/35 [00:00<00:00, 1082.31it/s]
100%|██████████| 1348/1348 [00:00<00:00, 1485.42it/s]
 52%|█████▏    | 127/242 [00:00<00:00, 649.55it/s]

epsd2/praxis/P222388 is not available or not complete
epsd2/praxis/P342998 is not available or not complete
epsd2/praxis/P345800 is not available or not complete


100%|██████████| 242/242 [00:00<00:00, 387.26it/s]
  2%|▏         | 89/4290 [00:00<00:04, 886.33it/s]

epsd2/praxis/P307377 is not available or not complete


100%|██████████| 4290/4290 [00:10<00:00, 392.97it/s]
100%|██████████| 100/100 [00:00<00:00, 1123.88it/s]
100%|██████████| 92/92 [00:00<00:00, 2567.17it/s]
100%|██████████| 397/397 [00:01<00:00, 305.75it/s]


## 3 Data Structuring
### 3.1 Transform the Data into a DataFrame


In [21]:
words_l = []
word = []
separators = ['{', '}', '-']
separators2 = ['.', '+', '|']
operators = ['&', '%', '@', '×']
for e in tqdm.tqdm(all_):
    word = []
    for s in separators: # first split word into signs
        if '1(šar₂{gal})' in e: # this cheating but it seems to work (appears in SKL)
            e = e.replace('1(šar₂{gal})', '1(šar₂)-gal')
        e = e.replace(s, ' ').strip()
    s_l = e.split()
    for sign in s_l:
        if sign[-1] == ')': # qualified sign - get only the qualifier
            stack = []  # |GIŠ×(GIŠ%GIŠ)|(LAK277) becomes LAK277
            ind = {}    # LAK277(|GIŠ×(GIŠ%GIŠ)|) becomes |GIŠ×(GIŠ%GIŠ)|
            for i, c in reversed(list(enumerate(sign))):
                if c == ')':
                    stack.append(i)
                if c == '(':
                    ind[stack.pop()] = i   # find the opening parens that belongs to the closing parens at position -1    
            start = ind[len(sign)-1]   # this line fails on 1(šar₂{gal}) in SKL.
            t = sign[start+1:-1]
            if t.isupper(): #leave 1(diš) etc. alone
                sign = t
            
        if '|' in sign:  # separate |DU.DU| and |DU+DU| into its components but not |DU&DU|
                        # and also not |DU.DU&DU|
            flag = False
            for o in operators:
                if o in e:
                    flag = True
            if not flag:
                for s in separators2:
                    sign = sign.replace(s, ' ').strip() 
                    sign_l = sign.split()
                word.extend(sign_l)
            continue
        elif "+" in sign:  # + as marker of gloss
            sign = sign.replace('+', ' ').strip()
            sign_l = sign.split()
            word.extend(sign_l)
            continue
        word.append(sign)
    words_l.append(word)           


  0%|          | 0/4385354 [00:00<?, ?it/s][A
  1%|          | 30266/4385354 [00:00<00:14, 302648.02it/s][A
  1%|▏         | 59736/4385354 [00:00<00:14, 300216.21it/s][A
  2%|▏         | 88581/4385354 [00:00<00:14, 296584.90it/s][A
  3%|▎         | 117576/4385354 [00:00<00:14, 294559.85it/s][A
  3%|▎         | 147065/4385354 [00:00<00:14, 294656.37it/s][A
  4%|▍         | 177115/4385354 [00:00<00:14, 296382.82it/s][A
  5%|▍         | 206815/4385354 [00:00<00:14, 296566.73it/s][A
  5%|▌         | 237263/4385354 [00:00<00:13, 298896.22it/s][A
  6%|▌         | 266805/4385354 [00:00<00:13, 297842.95it/s][A
  7%|▋         | 298833/4385354 [00:01<00:13, 304235.94it/s][A
  8%|▊         | 330666/4385354 [00:01<00:13, 308330.59it/s][A
  8%|▊         | 362353/4385354 [00:01<00:12, 310843.56it/s][A
  9%|▉         | 393052/4385354 [00:01<00:12, 309476.05it/s][A
 10%|▉         | 424349/4385354 [00:01<00:12, 310515.88it/s][A
 10%|█         | 457409/4385354 [00:01<00:12, 316279.14it/s

 95%|█████████▌| 4181609/4385354 [00:14<00:00, 383382.89it/s][A
 96%|█████████▌| 4220015/4385354 [00:14<00:00, 381075.92it/s][A
 97%|█████████▋| 4258584/4385354 [00:14<00:00, 382446.18it/s][A
 98%|█████████▊| 4296866/4385354 [00:14<00:00, 380976.51it/s][A
 99%|█████████▉| 4334991/4385354 [00:14<00:00, 377870.18it/s][A
100%|█████████▉| 4372803/4385354 [00:14<00:00, 374249.66it/s][A
100%|██████████| 4385354/4385354 [00:14<00:00, 296996.29it/s][A

In [22]:
with open("output/ogsl.p", "rb") as f:
    o = pickle.load(f)

In [23]:
val = list(o["value"])
utf = list(o["utf8"])
names = list(o["name"])

In [24]:
d = dict(zip(names, utf))
d2 = dict(zip(val,names))

In [25]:
names_l = []
utf8_l = []
for w in tqdm.tqdm(words_l):
    seq = [d2[s.lower()] if s.lower() in d2 else s for s in w]
    names_l.append(seq)
    utf8 = [d[n] if n in d else n for n in seq]
    utf8_l.append(''.join(utf8))


  0%|          | 0/4385354 [00:00<?, ?it/s][A
  1%|          | 45475/4385354 [00:00<00:09, 454741.76it/s][A
  2%|▏         | 92056/4385354 [00:00<00:09, 458002.55it/s][A
  3%|▎         | 139008/4385354 [00:00<00:09, 461397.82it/s][A
  4%|▍         | 187764/4385354 [00:00<00:08, 468944.07it/s][A
  5%|▌         | 235361/4385354 [00:00<00:08, 471027.44it/s][A
  6%|▋         | 283075/4385354 [00:00<00:08, 472844.50it/s][A
  8%|▊         | 330366/4385354 [00:00<00:08, 472862.59it/s][A
  9%|▊         | 381462/4385354 [00:00<00:08, 483665.40it/s][A
 10%|▉         | 432922/4385354 [00:00<00:08, 492546.88it/s][A
 11%|█         | 482462/4385354 [00:01<00:07, 493396.34it/s][A
 12%|█▏        | 533371/4385354 [00:01<00:07, 497998.90it/s][A
 13%|█▎        | 583479/4385354 [00:01<00:07, 498916.19it/s][A
 14%|█▍        | 632780/4385354 [00:01<00:07, 491422.64it/s][A
 16%|█▌        | 682118/4385354 [00:01<00:07, 492006.60it/s][A
 17%|█▋        | 732740/4385354 [00:01<00:07, 496183.11it/

In [26]:
df = pd.DataFrame({"transliteration":all_, "words":words_l, "names":names_l, "utf-8":utf8_l, "lemm" : lemm_})
df

Unnamed: 0,transliteration,words,names,utf-8,lemm
0,Startepsd2/admin/ed3a/P011046,[Startepsd2/admin/ed3a/P011046],[Startepsd2/admin/ed3a/P011046],Startepsd2/admin/ed3a/P011046,Startepsd2/admin/ed3a/P011046
1,1(barig@c),[1(barig@c)],[DIŠ],𒁹,n
2,še,[še],[ŠE],𒊺,še[barley]N
3,ba-lul,"[ba, lul]","[BA, LUL]",𒁀𒈜,X
4,nagar,[nagar],[NAGAR],𒉄,nagar[carpenter]N
5,1(barig@c),[1(barig@c)],[DIŠ],𒁹,n
6,nig₂-du₇,"[nig₂, du₇]","[GAR, |U.GUD|]",𒃻𒌌,niŋdu[appropriate thing]N
7,ag₂,[ag₂],[|NINDA₂×NE|],𒉘,aŋ[measure]V/t
8,hur-sag-še₃-mah,"[hur, sag, še₃, mah]","[|HI×AŠ₂|, SAG, EŠ₂, MAH]",𒄯𒊕𒂠𒈤,X
9,sa₁₂-du₅,"[sa₁₂, du₅]","[SAG, DUN₃]",𒊕𒂅,saŋ.DUN₃[recorder]N


In [27]:
with open("output/sux.p", "wb") as w:
    pickle.dump(df, w)

In [28]:
sux_text = ' '.join(df['utf-8']).strip()
sux_text = sux_text.replace('Start', '\n')
sux_text = re.sub(r'\n+', '\n', sux_text)

In [29]:
with open("output/sux.txt", 'w', encoding="utf-8") as w:
    w.write(sux_text)