## Notebook Description
This Jupyter Notebook opens a CSV file containing hadiths and their corresponding isnads, cleans the data, turns it into a graph, and uploads that graph to GraphSpace.

This is very similar to subgraph-generator-from-narrator-files.ipynb, but instead of taking individual narrator-based files as input, it takes a single hadith-based file.

Useful references: 
- https://graphspace-python-library.readthedocs.io/en/develop/tutorial/tutorial.html 
- https://manual.graphspace.org/projects/graphspace-python/en/latest/reference

## Imports

In [2]:
from graphspace_python.graphs.classes.gsgraph import GSGraph
import plotly.express as px
import json
import pandas as pd

## Functions

### clean_index_list(df, column_name)
- **input**: df, a dataframe, and column_name, a string naming the column that stores strings of comma-separated digits (the indices of scholars, either students or teachers)
- **output**: a list with one entry per row: a list of integer indices, or the original null value if the cell was empty

In short: strings of comma-separated numbers become lists of integer indices.

In [3]:
def clean_index_list(df, column_name): 
    inds_corrected = []
    for indx, data in df.loc[:,[column_name]].iterrows():
        
        inds_original = data.iloc[0] # positional access; currently a string of numbers separated by commas
        
        # if it's null, append it to corrected list: null students = no students.
        if pd.isna(inds_original):
            inds_corrected.append(inds_original)

        # if it's a string, split by commas and turn the strings of digits into ints.
        elif isinstance(inds_original, str):
            temp = []
            for item in inds_original.split(','):
                if item.strip().isdigit():
                    temp.append(int(item.strip()))
                else:
                    print("Non-numeric character found in what is supposed to be a string of comma-separated digits of teachers or students at id="+str(indx)+", value: "+item.strip())
            inds_corrected.append(temp)
        else:
            raise TypeError("index value at indx "+str(indx)+" is neither str nor NaN")
        
    return inds_corrected
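
As a quick illustration of the per-cell conversion, here is a minimal pure-Python sketch of what happens to a single value. The helper name `parse_index_string` is hypothetical and not part of the notebook. Unlike `clean_index_list`, it silently skips non-numeric items instead of printing a warning.

```python
def parse_index_string(value):
    """Convert a string like "1, 53, 13" into [1, 53, 13].

    Hypothetical helper for illustration only; it mirrors the string
    branch of clean_index_list but handles one value without pandas.
    """
    result = []
    for item in value.split(','):
        item = item.strip()
        if item.isdigit():  # keep only purely numeric entries
            result.append(int(item))
    return result


print(parse_index_string("1, 53, 10511, 11065, 20001"))
print(parse_index_string("1, 9, 53"))
```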

## Read + clean the data

### Start with the hadith dataset

In [4]:
df = pd.read_csv('data/hadiths.csv', encoding='utf-8')
df.head()

Unnamed: 0,id,URL,isnad,notes
0,1,https://sunnah.com/bukhari:212,"1, 53, 10511, 11065, 20001",
1,2,https://sunnah.com/bukhari:1877,"1, 9, 53",
2,3,https://sunnah.com/ibnmajah:3841,"1, 53, 13",
3,4,https://sunnah.com/nasai:1857,"1, 53, 17",
4,5,https://sunnah.com/nasai:1856,"1, 53, 11455",


In [5]:
# Clean the isnad column (strings of comma-separated narrator indices)
isnads_corrected = clean_index_list(df, 'isnad')

# remove the old column
del df['isnad']

# assign the corrected column to the dataset
df = df.assign(isnad=isnads_corrected)

df = df.fillna('')
df.head()

Unnamed: 0,id,URL,notes,isnad
0,1,https://sunnah.com/bukhari:212,,"[1, 53, 10511, 11065, 20001]"
1,2,https://sunnah.com/bukhari:1877,,"[1, 9, 53]"
2,3,https://sunnah.com/ibnmajah:3841,,"[1, 53, 13]"
3,4,https://sunnah.com/nasai:1857,,"[1, 53, 17]"
4,5,https://sunnah.com/nasai:1856,,"[1, 53, 11455]"


### Read + clean the narrators dataset

In [6]:
info = pd.read_csv('data/variousnarrators.csv', encoding='utf-8')
info = info.fillna('')
info.set_index('id', inplace=True)
# example lookup by narrator id (now the index): info.loc[846, 'displayname']
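
Because the graph-building function `makegraph` looks up every isnad ID in this table by index, an ID that is missing from variousnarrators.csv would raise a `KeyError`. Here is a minimal sketch of a pre-check, assuming the isnads are already lists of ints. The function name `find_missing_ids` and the sample ID set are hypothetical.

```python
def find_missing_ids(isnads, known_ids):
    """Return the sorted IDs that appear in any isnad but not in known_ids."""
    known = set(known_ids)
    missing = set()
    for isnad in isnads:
        missing.update(n for n in isnad if n not in known)
    return sorted(missing)


# Illustrative data only: three isnads from the hadith table and a made-up ID set.
sample_isnads = [[1, 9, 53], [1, 53, 13], [1, 53, 17]]
sample_known_ids = {1, 9, 13, 53}
print(find_missing_ids(sample_isnads, sample_known_ids))  # [17]
```

With the notebook's own dataframes, a call along the lines of `find_missing_ids(df['isnad'], info.index)` should work the same way.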

## Make the graph
### makegraph(specifiednarrator, isnaddf, narratordf)
**input:**
- **specifiednarrator**: the ID number of the narrator whose isnads you want to include in the graph. Any isnad in the dataset *without* this narrator's ID will be excluded.
- **isnaddf**: in this case, the "df" dataframe defined above, created from hadiths.csv. It expects a dataframe in that format: one row per hadith, with an "isnad" column holding a list of numeric narrator IDs and a "URL" column.
- **narratordf**: in this case, the "info" dataframe defined above, created from variousnarrators.csv. It expects a dataframe indexed by narrator ID, each ID corresponding to various columns of information (displayname, fullname, gender, generation, etc.).


**output:** a directed graph with narrators as nodes and the edges symbolizing the teacher-student relationships between them

In [7]:
def makegraph(specifiednarrator, isnaddf, narratordf):
    G = GSGraph()
    
    # add nodes
    for currentisnad in isnaddf['isnad']:
        if specifiednarrator not in currentisnad: # we only want the isnads with the specified narrator in it
            continue
        for n in currentisnad:
            if not G.has_node(n): 
                G.add_node(n, label=narratordf.loc[n, 'displayname'],
                           fullname=narratordf.loc[n, 'fullname'],
                           searchname=narratordf.loc[n, 'searchname'],
                           arabicname=narratordf.loc[n, 'arabicname'],
                           gender=narratordf.loc[n, 'gender'],
                           info=narratordf.loc[n, 'info'])

    # add edges 
    for indx, data in isnaddf.iterrows():
        currentisnad = data['isnad']
        if specifiednarrator not in currentisnad:
            continue
        for narrator in range(1, len(currentisnad)):
            n = currentisnad[narrator]
            n_previous = currentisnad[narrator-1]
            if not G.has_edge(n_previous, n):
                G.add_edge(n_previous, n, narratedfrom=narratordf.loc[n_previous, 'displayname'], narratedto=narratordf.loc[n, 'displayname'], hadithurl=data['URL']) 
    
    return G
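
The edge-building step amounts to pairing each narrator with the next one in the chain, but only for isnads that contain the specified narrator. Here is a minimal pure-Python sketch of that idea, without GraphSpace or narrator metadata. `isnad_edges` is a hypothetical name, and the sample isnads are taken from the first rows of hadiths.csv shown earlier.

```python
def isnad_edges(isnads, specified_narrator):
    """Collect unique (previous, next) pairs from isnads containing the narrator."""
    edges = []
    for isnad in isnads:
        if specified_narrator not in isnad:
            continue  # skip chains that do not pass through the narrator
        # zip pairs each ID with its successor: [a, b, c] -> (a, b), (b, c)
        for pair in zip(isnad, isnad[1:]):
            if pair not in edges:
                edges.append(pair)
    return edges


sample_isnads = [
    [1, 53, 10511, 11065, 20001],
    [1, 9, 53],
    [1, 53, 13],
]
print(isnad_edges(sample_isnads, 9))   # [(1, 9), (9, 53)]
print(isnad_edges(sample_isnads, 13))  # [(1, 53), (53, 13)]
```

As in `makegraph`, a pair that occurs in several isnads is recorded only once, which is why only the first hadith's URL ends up attached to a repeated edge.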


### Build and upload the graph

In [8]:
# Set up connection to GraphSpace

from graphspace_python.api.client import GraphSpace
graphspace = GraphSpace('USERNAME', 'PASSWORD')
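
To keep real credentials out of a shared notebook, they can be read from environment variables instead of being typed into the cell. Here is a minimal sketch. The variable names `GRAPHSPACE_USER` and `GRAPHSPACE_PASSWORD` and the helper `load_credentials` are assumptions, not something GraphSpace defines.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) from the environment, or raise if unset.

    The variable names are hypothetical; pick whatever suits your setup.
    """
    try:
        return env['GRAPHSPACE_USER'], env['GRAPHSPACE_PASSWORD']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: ' + str(missing))


# Demonstration with a plain dict standing in for os.environ.
user, password = load_credentials({'GRAPHSPACE_USER': 'me@example.com',
                                   'GRAPHSPACE_PASSWORD': 'secret'})
print(user)
```

The result can then be passed to the client as `GraphSpace(*load_credentials())`.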

In [9]:
# Build the GraphSpace graph for narrator 11457 from the hadith and narrator dataframes
G = makegraph(11457, df, info)

# set metadata for the graph
metadata = {
     'description': 'This is a graph of hadith narrators',
     'directed': True
}
G.set_data(metadata)


print('There are '+str(len(G.nodes))+' nodes and '+
      str(len(G.edges))+' edges in the original graph.')
G.nodes()
G.edges

There are 13 nodes and 14 edges in the original graph.


OutEdgeView([(1, 13), (1, 37), (1, 5250), (13, 11457), (11457, 11011), (11457, 10683), (11457, 10920), (11457, 11044), (11457, 18679), (37, 11457), (10683, 11019), (5250, 11457), (10920, 11013), (11013, 20012)])
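
The edge list above can be sanity-checked against the reported counts and read for the specified narrator's direct neighbors. Here is a minimal sketch using only the printed edges. Each edge runs from the earlier narrator in an isnad to the next, so the sources of edges into 11457 are the narrators it received hadiths from.

```python
edges = [(1, 13), (1, 37), (1, 5250), (13, 11457), (11457, 11011),
         (11457, 10683), (11457, 10920), (11457, 11044), (11457, 18679),
         (37, 11457), (10683, 11019), (5250, 11457), (10920, 11013),
         (11013, 20012)]

# Every narrator that appears at either end of an edge.
nodes = {n for edge in edges for n in edge}
narrated_to_11457 = sorted(u for u, v in edges if v == 11457)
narrated_from_11457 = sorted(v for u, v in edges if u == 11457)

print(len(nodes), len(edges))   # 13 14
print(narrated_to_11457)        # [13, 37, 5250]
print(narrated_from_11457)      # [10683, 10920, 11011, 11044, 18679]
```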

In [17]:
# Upload the graph to GraphSpace; the returned object carries the id assigned by the server
graph = graphspace.post_graph(G)
graph.get_name()
graph.id

34317