# Extracting Personal Names from QPN Glossary
Extract the proper nouns in `gloss-qpn.json`, together with the `id_text` of each text in which they appear. The appropriate JSON ZIP file must be located in the directory `jsonzip`.

In [1]:
import pandas as pd
import zipfile
import json

Read `gloss-qpn.json` from the ZIP archive and parse it with the `json` package, storing the result in the variable `data_json`.

In [2]:
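filename = ""
zfilename = "jsonzip/rinap-szar2gina2019.zip"
z = zipfile.ZipFile(zfilename)
names = z.namelist()
for name in names:
    if "gloss-qpn.json" in name:
        filename = name
if not filename:                        # fail early with a clear message
    raise FileNotFoundError("gloss-qpn.json not found in " + zfilename)
qpn = z.read(filename).decode('utf-8')  # read and decode the QPN glossary JSON file
data_json = json.loads(qpn)             # parse the JSON text into a Python dictionary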
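
The lookup above can also be wrapped in a small helper, which makes the "file not found" case explicit. The following is only a sketch: the function name `read_glossary`, its parameter `member_suffix`, and the in-memory demo archive are hypothetical and not part of the original notebook.

```python
import io
import json
import zipfile

def read_glossary(zip_file, member_suffix="gloss-qpn.json"):
    """Return the parsed JSON of the first ZIP member whose name ends in member_suffix."""
    with zipfile.ZipFile(zip_file) as z:
        matches = [n for n in z.namelist() if n.endswith(member_suffix)]
        if not matches:
            raise FileNotFoundError("no " + member_suffix + " in archive")
        return json.loads(z.read(matches[0]).decode("utf-8"))

# Demonstration with a made-up in-memory archive (not real glossary data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("demo/gloss-qpn.json", json.dumps({"entries": [], "instances": {}}))
buffer.seek(0)
demo = read_glossary(buffer)
```

With the real data, the call would presumably be `read_glossary("jsonzip/rinap-szar2gina2019.zip")`.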

# First DataFrame: Entries
The DataFrame `df_pn` is created from `entries`, keeping the columns `headword`, `xis`, and `pos`. The column `pos` is used to select the names of persons: personal names (`PN`) and regnal names (`RN`). The value of `xis` is an ID that serves as the key in `instances` (below).

In [3]:
entries = data_json["entries"]
df = pd.DataFrame(entries)
df = df[["headword", "xis", "pos"]]
df_pn = df.loc[df["pos"].isin(["PN", "RN"])]
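
To see what the `isin` filter does, here is a minimal sketch on made-up entries. The headwords, the `xis` values, and the tag `GN` (standing in for any tag other than `PN` or `RN`) are illustrative assumptions, not data from the glossary.

```python
import pandas as pd

# Three made-up entries; only the first and the last are names of persons.
toy_entries = [
    {"headword": "NameA[gloss]RN", "xis": "qpn.x1", "pos": "RN"},
    {"headword": "PlaceB[gloss]GN", "xis": "qpn.x2", "pos": "GN"},
    {"headword": "NameC[gloss]PN", "xis": "qpn.x3", "pos": "PN"},
]
toy_df = pd.DataFrame(toy_entries)[["headword", "xis", "pos"]]

# Keep only rows whose pos is one of the listed values.
toy_pn = toy_df.loc[toy_df["pos"].isin(["PN", "RN"])]
```

The filtered frame keeps the original index labels (here 0 and 2), which is harmless for the steps that follow because they use `xis`, not the index.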

# Second DataFrame: Instances
`instances` links each `xis` ID to the places where the name occurs. Each reference records not only the text's ID number but also the line and the word.

In [4]:
instances = data_json["instances"]

Create a list of lists (called `l`) in which every inner list has two elements: the `xis` ID and one text ID. The text ID is extracted from the references in `instances`, which have the format PROJECT:ID_TEXT.ID_LINE.ID_WORD.

In [5]:
l = []
for i in df_pn["xis"]:
    for k in instances[i]:
        id_text = k.split(":")[1]        # drop the PROJECT prefix
        id_text = id_text.split(".")[0]  # keep ID_TEXT, drop ID_LINE and ID_WORD
        l.append([i, id_text])
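
The reference parsing can be isolated in a small function, which makes it easy to try on a single string. This is a sketch only: the helper name `text_id` and the references in `toy_instances` are made up for illustration and merely follow the PROJECT:ID_TEXT.ID_LINE.ID_WORD pattern described above.

```python
def text_id(reference):
    """Return ID_TEXT from a reference of the form PROJECT:ID_TEXT.ID_LINE.ID_WORD."""
    # partition splits at the first colon; the part after it is ID_TEXT.ID_LINE.ID_WORD
    _, _, location = reference.partition(":")
    return location.split(".")[0]

# Made-up instances dictionary: one xis ID with three references.
toy_instances = {
    "qpn.x1": ["demo/proj:Q000001.3.2", "demo/proj:Q000001.7.1", "demo/proj:Q000002.1.4"],
}

# Same list-of-lists shape as `l`, built with a comprehension.
pairs = [[xis, text_id(ref)] for xis, refs in toy_instances.items() for ref in refs]
```

One difference from the loop above: for a string without a colon, `text_id` returns an empty string, whereas `k.split(":")[1]` would raise an `IndexError`.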

Create the DataFrame and give names to the columns.

In [6]:
inst_df = pd.DataFrame(l, columns=["xis", "id_text"])

Merge the two DataFrames on the column `xis`, using the keys from the left object (that is, `inst_df`), so that every instance row is kept.

In [7]:
df = inst_df.merge(df_pn, on='xis', how='left')
df

,xis,id_text,headword,pos
0,qpn.r000002,Q006482,Abi-hata[chief of the Ruʾuʾa tribe]RN,RN
1,qpn.r000002,Q006483,Abi-hata[chief of the Ruʾuʾa tribe]RN,RN
2,qpn.r000002,Q006484,Abi-hata[chief of the Ruʾuʾa tribe]RN,RN
3,qpn.r00000b,Q006524,Ada[ruler of Šurda]RN,RN
4,qpn.r00000b,Q006557,Ada[ruler of Šurda]RN,RN
5,qpn.r00000b,Q006563,Ada[ruler of Šurda]RN,RN
6,qpn.r00000b,Q006563,Ada[ruler of Šurda]RN,RN
7,qpn.r00000b,Q006592,Ada[ruler of Šurda]RN,RN
8,qpn.r000010,Q006573,"Adad-narari[Adad-narari III, king of Assyria]RN",RN
9,qpn.r000015,Q006482,Ahhe-iddina[Gambulean sheikh]RN,RN
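
In the output above, rows 5 and 6 are identical (`Q006563` appears twice for the same `xis`), which is what one would expect when a name occurs more than once in the same text. If only one row per name and text is wanted, the duplicates can be dropped after the merge. The sketch below uses made-up data, and the names `toy_inst`, `toy_names`, `toy_merged`, and `toy_unique` are hypothetical; whether deduplication is desirable depends on the research question.

```python
import pandas as pd

# Made-up instance rows: the name occurs twice in Q000001 and once in Q000002.
toy_inst = pd.DataFrame(
    [["qpn.x1", "Q000001"], ["qpn.x1", "Q000001"], ["qpn.x1", "Q000002"]],
    columns=["xis", "id_text"],
)
toy_names = pd.DataFrame({"xis": ["qpn.x1"], "headword": ["NameA[gloss]RN"], "pos": ["RN"]})

# A left merge keeps every instance row and attaches headword and pos to it.
toy_merged = toy_inst.merge(toy_names, on="xis", how="left")

# Keep a single row for each (xis, id_text) pair.
toy_unique = toy_merged.drop_duplicates(subset=["xis", "id_text"]).reset_index(drop=True)
```

Keeping the duplicates, by contrast, preserves the number of times each name occurs in a text.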
