# Edge List Functions


Motivation:
    
The primary benefit of the Attestation Network is that it can act as a control for the dataset, since there is a one-to-one relationship between each text and each PN instance. This means that we can build a supervised ‘gold standard’ network and then measure how close the unsupervised disambiguation methods come to it.

In [1]:
import pandas as pd

Imports a CSV file as a pandas dataframe.

Params:
path - file path

Returns:
dataframe

In [2]:
def file_to_df(path):
    nodes = pd.read_csv(path)
    return nodes

Draws edges between all the PNs (rows) that share the same value in the merge column (in Example 1, the P-number column p_num). Only rows with source ID < target ID are kept, which avoids duplicate edges and self-pairs.

Params:
df - node dataframe
merge_column - column containing the common attribute that should connect two nodes
id_column - column with the node ID in the node dataframe

Returns:
edge dataframe

In [3]:
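def draw_edges(df, merge_column, id_column):
    # Self-merge: pair every row with every row that shares the same merge_column value
    edges = df.merge(df, on = [merge_column], how = 'left')
    # Keep one row per unordered pair; the strict < also drops self-pairs
    edges = edges[edges[id_column + '_x'] < edges[id_column + '_y']]
    # Label the two sides of each pair as source and target
    edges = edges.rename(columns={"role_x": "source role", "role_y": "target_role", id_column + '_x': 'Source ID', id_column + '_y': 'Target ID'})
    return edges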
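To make the self-merge step concrete, here is a minimal sketch on a made-up node list (the labels and the text IDs `T1` and `T2` are invented for illustration). It repeats the two core steps of `draw_edges` without the renaming.

```python
import pandas as pd

# Toy node list: three names attested in text T1, two in text T2 (made-up data).
nodes = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Label": ["A", "B", "C", "D", "E"],
    "p_num": ["T1", "T1", "T1", "T2", "T2"],
})

# Self-merge on the shared attribute: every row is paired with every row
# (including itself) that has the same p_num.
pairs = nodes.merge(nodes, on="p_num", how="left")

# Keeping Id_x < Id_y drops self-pairs and mirrored duplicates.
edges = pairs[pairs["Id_x"] < pairs["Id_y"]]

# Plain Python lists of [source, target] IDs.
edge_pairs = edges[["Id_x", "Id_y"]].values.tolist()
print(len(pairs), len(edges))
```

The merge produces 3 × 3 + 2 × 2 = 13 candidate rows, and the filter leaves 3 + 1 = 4 edges: one per unordered pair of names within each text.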

Writes a dataframe to a CSV file.

Params:
df - dataframe to write
path - output file path

In [4]:
def df_to_file(df, path):
    df.to_csv(path)
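Because `df_to_file` calls `to_csv` with its defaults, the dataframe index is written as an extra first column. This may be where the `Unnamed: 0` column in the node lists comes from. The sketch below shows the round trip on a throwaway dataframe, using an in-memory buffer instead of a real file; whether to pass `index=False` in this project is a judgment call, since the Drehem example reuses that column as the node ID.

```python
import io
import pandas as pd

df = pd.DataFrame({"Label": ["A", "B"]})

# Default behaviour: the index is written as an unnamed first column,
# which read_csv then reports as 'Unnamed: 0'.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = list(pd.read_csv(buf).columns)
print(with_index)      # ['Unnamed: 0', 'Label']

# With index=False only the data columns are written.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
without_index = list(pd.read_csv(buf).columns)
print(without_index)   # ['Label']
```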

# Example 1

In [5]:
df = file_to_df("example_node_list.csv")

In [6]:
#nodelist
df.head()

Unnamed: 0,Id,Label,text,fn,gf,clan,role,date,p_num
0,1,Nanaya-iddin,YOS 20 21,,,,neighbor,HE.SE.---.05.02,P296765
1,2,Anu-ittannu,YOS 20 21,Nanaya-iddin,,,neighbor,HE.SE.---.05.02,P296765
2,3,Kidin-Anu,YOS 20 21,Nanaya-iddin,,,neighbor,HE.SE.---.05.02,P296765
3,4,Anu-ušezib,YOS 20 21,Nanaya-iddin,,,neighbor,HE.SE.---.05.02,P296765
4,5,Šamaš-ittannu,YOS 20 21,Tanittu-Anu,,,neighbor,HE.SE.---.05.02,P296765


In [7]:
#edgelist
draw_edges(df, 'p_num', 'Id').head()

Unnamed: 0,Source ID,Label_x,text_x,fn_x,gf_x,clan_x,source role,date_x,p_num,Target ID,Label_y,text_y,fn_y,gf_y,clan_y,target_role,date_y
1,1,Nanaya-iddin,YOS 20 21,,,,neighbor,HE.SE.---.05.02,P296765,2,Anu-ittannu,YOS 20 21,Nanaya-iddin,,,neighbor,HE.SE.---.05.02
2,1,Nanaya-iddin,YOS 20 21,,,,neighbor,HE.SE.---.05.02,P296765,3,Kidin-Anu,YOS 20 21,Nanaya-iddin,,,neighbor,HE.SE.---.05.02
3,1,Nanaya-iddin,YOS 20 21,,,,neighbor,HE.SE.---.05.02,P296765,4,Anu-ušezib,YOS 20 21,Nanaya-iddin,,,neighbor,HE.SE.---.05.02
4,1,Nanaya-iddin,YOS 20 21,,,,neighbor,HE.SE.---.05.02,P296765,5,Šamaš-ittannu,YOS 20 21,Tanittu-Anu,,,neighbor,HE.SE.---.05.02
5,1,Nanaya-iddin,YOS 20 21,,,,neighbor,HE.SE.---.05.02,P296765,6,Anu-zer-lišir,YOS 20 21,Tanittu-Anu,,,neighbor,HE.SE.---.05.02


# Drehem Example

The node list used here is the most recent data taken from filtered.csv.

In [16]:
#nodelist
df = file_to_df("nov_2019_nodelist.csv")
df.rename(columns={'Unnamed: 0': 'Node ID'}, inplace=True) 
df.head()

Unnamed: 0,Node ID,Name,id_word,CDLI No,role,profession,Original date,Converted Date
0,0,ur-{d}dam-gal-nun-na[]PN,P142785.4.1,P142785,,dubsar[scribe]N,IS01 - 07 - 00,85.07
1,1,ur-{d}šul-pa-e₃[]PN,P142785.6.1,P142785,,dubsar[scribe]N,IS01 - 07 - 00,85.07
2,2,{d}šul-gi-i₃-li₂[]PN,P142785.16.1,P142785,recipient,,IS01 - 07 - 00,85.07
3,3,lu₂-giri₁₇-zal[]PN,P142785.17.2,P142785,intermediary,,IS01 - 07 - 00,85.07
4,4,{d}i-bi₂-{d}suen[]PN,P142785.19.2,P142785,,lugal[king]N,IS01 - 07 - 00,85.07


In [34]:
#edgelist
#TODO(add Sumerian to role)
merged = draw_edges(df, 'CDLI No', 'Node ID')
#dropping repetitive columns
merged = merged.drop(['Original date_y', 'Converted Date_y'], axis=1)
#renaming columns
merged.columns = ['Source ID', 'Source Name', 'Source word ID', 'CDLI NO', 'Source Role', 'Source Profession', 'Original Date', 'Converted Date', 'Target ID', 'Target Name', 'Target Word ID', 'Target Role', 'Target Profession']
merged.head()

Unnamed: 0,Source ID,Source Name,Source word ID,CDLI NO,Source Role,Source Profession,Original Date,Converted Date,Target ID,Target Name,Target Word ID,Target Role,Target Profession
1,0,ur-{d}dam-gal-nun-na[]PN,P142785.4.1,P142785,,dubsar[scribe]N,IS01 - 07 - 00,85.07,1,ur-{d}šul-pa-e₃[]PN,P142785.6.1,,dubsar[scribe]N
2,0,ur-{d}dam-gal-nun-na[]PN,P142785.4.1,P142785,,dubsar[scribe]N,IS01 - 07 - 00,85.07,2,{d}šul-gi-i₃-li₂[]PN,P142785.16.1,recipient,
3,0,ur-{d}dam-gal-nun-na[]PN,P142785.4.1,P142785,,dubsar[scribe]N,IS01 - 07 - 00,85.07,3,lu₂-giri₁₇-zal[]PN,P142785.17.2,intermediary,
4,0,ur-{d}dam-gal-nun-na[]PN,P142785.4.1,P142785,,dubsar[scribe]N,IS01 - 07 - 00,85.07,4,{d}i-bi₂-{d}suen[]PN,P142785.19.2,,lugal[king]N
5,0,ur-{d}dam-gal-nun-na[]PN,P142785.4.1,P142785,,dubsar[scribe]N,IS01 - 07 - 00,85.07,5,ur-{d}dam-gal-nun-na[]PN,P142785.23.1,,dubsar[scribe]N


In [36]:
df_to_file(merged, 'fa19_edgelist.csv')
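A quick sanity check on an edge list like this one: if node IDs are unique, a text with n names should contribute n(n-1)/2 edges. The helper below is a hypothetical addition (`expected_edge_count` is not part of the notebook) that computes the expected total from the node list, shown here on made-up data.

```python
import pandas as pd

def expected_edge_count(df, merge_column):
    """Sum of n * (n - 1) / 2 over the groups of merge_column.

    Assumes node IDs are unique, so each unordered pair yields one edge.
    """
    sizes = df.groupby(merge_column).size()
    return int((sizes * (sizes - 1) // 2).sum())

# Made-up node list: three names in text T1, two in text T2.
nodes = pd.DataFrame({
    "Node ID": [0, 1, 2, 3, 4],
    "CDLI No": ["T1", "T1", "T1", "T2", "T2"],
})

expected = expected_edge_count(nodes, "CDLI No")
print(expected)  # 3 + 1 = 4
```

In the Drehem example, the same check would compare `len(merged)` with `expected_edge_count(df, 'CDLI No')`.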

# Notes on Directionality

The roles of the Source and Target (s_role and t_role) will help us establish directionality in the edges at a later stage, based on a hierarchy of the following roles:

|Roles
|-
|“ki[source]”
|“maškim[administrator]”
|“i3-dab5[recipient]”
|“giri3[intermediary]”
|“mu-ku(x=DU)[delivery_by/for]”


The spreadsheet will have Source and Target columns, based on the node IDs. The possible role pairings for those columns are listed below (Source → Target):

|Relationships
|-
|ki[source] → maškim[authorized]
|ki[source] → giri3[intermediary]
|ki[source] → i3-dab5[recipient]
|ki[source] → mu-kux(DU)[delivery]
|giri3[intermediary] → maškim[authorized]
|maškim[authorized] → giri3[intermediary]
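One possible way to apply the pairings above, offered only as a sketch of the later directionality step and not as the project's method: keep an edge if its (source role, target role) pair is listed, swap it if the reversed pair is listed, and otherwise leave it undirected. The short English role labels, the `orient_edge` helper, and the toy edge list are all assumptions made for illustration.

```python
import pandas as pd

# Allowed (source role, target role) pairs, copied from the Relationships list.
# Storing roles under these short English labels is an assumption.
ALLOWED = {
    ("source", "authorized"),
    ("source", "intermediary"),
    ("source", "recipient"),
    ("source", "delivery"),
    ("intermediary", "authorized"),
    ("authorized", "intermediary"),
}

def orient_edge(s_role, t_role):
    """Return 'keep', 'swap', or 'undirected' for one edge (hypothetical helper)."""
    if (s_role, t_role) in ALLOWED:
        return "keep"
    if (t_role, s_role) in ALLOWED:
        return "swap"
    return "undirected"

# Made-up edge list with the column names used in the Drehem example.
edges = pd.DataFrame({
    "Source ID": [0, 1, 2],
    "Target ID": [1, 2, 3],
    "Source Role": ["recipient", "source", "recipient"],
    "Target Role": ["source", "intermediary", "intermediary"],
})

orientation = pd.Series(
    [orient_edge(s, t) for s, t in zip(edges["Source Role"], edges["Target Role"])],
    index=edges.index,
)
swap = orientation == "swap"

# Build the directed edge list: swapped rows exchange their source and target.
directed = pd.DataFrame({
    "Source ID": edges["Source ID"].where(~swap, edges["Target ID"]),
    "Target ID": edges["Target ID"].where(~swap, edges["Source ID"]),
    "Source Role": edges["Source Role"].where(~swap, edges["Target Role"]),
    "Target Role": edges["Target Role"].where(~swap, edges["Source Role"]),
    "orientation": orientation,
})
print(directed)
```

Since both giri3 → maškim and maškim → giri3 are listed, edges between those two roles are kept in whatever order they were found. Edges with a missing or unlisted role fall into the undirected case.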