# Node Labeler for Edges Data
This Jupyter notebook is useful if you have a *nodes* CSV with a unique *ID* and *Label* for each node, plus an *edges* CSV with a *Source* and *Target* column that use the unique ID rather than the label. It creates a new CSV with the Source and Target labels (rather than IDs), as well as Weight and Type columns from the original *edges* CSV.

To summarize, you start with:

**Nodes**

| ID | Label   |
|----|---------|
| 1  | Anna    |
| 2  | Levin   |
| 3  | Vronsky |

**Edges**

| Source | Target | Weight | Type     |
|--------|--------|--------|----------|
| 1      | 1      | 62     | Directed |
| 1      | 2      | 11     | Directed |
| 1      | 3      | 188    | Directed |

And end with:

**Labeled Network**

| Source | Target  | Weight | Type     |
|--------|---------|--------|----------|
| Anna   | Anna    | 62     | Directed |
| Anna   | Levin   | 11     | Directed |
| Anna   | Vronsky | 188    | Directed |
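
As a minimal sketch of that transformation, the example tables above can be built inline and relabeled with a lookup from ID to label. The names `example_nodes`, `example_edges`, `id_to_label`, and `labeled` are illustrative only, and `Series.map` is used here for brevity; the notebook itself uses merges in Step 4.

```python
import pandas as pd

# Example tables from above, built inline instead of read from CSV
example_nodes = pd.DataFrame({"ID": [1, 2, 3], "Label": ["Anna", "Levin", "Vronsky"]})
example_edges = pd.DataFrame({
    "Source": [1, 1, 1],
    "Target": [1, 2, 3],
    "Weight": [62, 11, 188],
    "Type": ["Directed"] * 3,
})

# Lookup from node ID to label (illustrative name)
id_to_label = example_nodes.set_index("ID")["Label"]

# Replace the IDs in Source and Target with their labels
labeled = example_edges.copy()
labeled["Source"] = labeled["Source"].map(id_to_label)
labeled["Target"] = labeled["Target"].map(id_to_label)

print(labeled)
```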

## Step 1. Import modules

In [None]:
#os lets you change the working directory
import os
#pandas lets you create, modify, and export dataframes
import pandas as pd

## Step 2. Specify source directory and edge file
Replace `/Users/qad/Documents/aknet` in the cell below with the path to the directory that contains your source files.

For instance, the default path to the Documents directory is (substituting your user name on the computer for YOUR-USER-NAME):

- On Mac: '/Users/YOUR-USER-NAME/Documents'
- On Windows: 'C:\\\Users\\\YOUR-USER-NAME\\\Documents'

Then, replace `AK-edges_1.csv` and `AK-nodes_1.csv` with the names of the Gephi edge list and node files that you want to use. Both should end in .csv and have a header row, because the cells below look up columns by name. The nodes file needs *ID* and *Label* columns. The edges file needs *Source* and *Target* columns, and Step 5 also expects *Weight* and *Type*. Either file can have additional columns, but they are not carried over to the output.

In [None]:
#Put the path to the directory with your edge file below
os.chdir('/Users/qad/Documents/aknet')
#Put the names of your edge and node files below
edges = 'AK-edges_1.csv'
nodes = 'AK-nodes_1.csv'
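
Before loading anything, it can help to confirm that both files are where Python expects them. The following is a small sketch; the helper `missing_files` is hypothetical and not part of the original notebook, and the file names are the example names from the cell above.

```python
import os

def missing_files(directory, filenames):
    """Return the names in `filenames` that do not exist in `directory`."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(directory, name))]

# Check the current working directory for the two example file names
missing = missing_files(os.getcwd(), ['AK-edges_1.csv', 'AK-nodes_1.csv'])
if missing:
    print('Not found in', os.getcwd(), ':', missing)
else:
    print('Both files found.')
```

If a name is reported as not found, the usual causes are a typo in the file name or a working directory that differs from the one you intended.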

## Step 3. Create dataframes
A dataframe is like a spreadsheet or database table that you can manipulate in Python. The next cell creates two dataframes: one for the nodes, and one for the edges.

In [None]:
dfnodes = pd.read_csv(nodes, usecols = ["ID" , "Label"])
dfedges = pd.read_csv(edges)
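
Because the later cells refer to columns by name, a quick look at the headers right after loading can catch a mislabeled file early. This sketch uses small inline dataframes so that it runs on its own; `missing_columns` is a hypothetical helper, and the required column names are the ones used by the cells in this notebook.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Inline stand-ins for dfnodes and dfedges (the edges one lacks "Type" on purpose)
demo_nodes = pd.DataFrame({"ID": [1, 2], "Label": ["Anna", "Levin"]})
demo_edges = pd.DataFrame({"Source": [1], "Target": [2], "Weight": [11]})

nodes_missing = missing_columns(demo_nodes, ["ID", "Label"])
edges_missing = missing_columns(demo_edges, ["Source", "Target", "Weight", "Type"])

print("Missing from nodes:", nodes_missing)
print("Missing from edges:", edges_missing)
```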

## Step 4. Combine dataframes
The following cell renames the *ID* column in the nodes dataframe to *Source* and the *Label* column to *SourceLabel*, then creates a temporary dataframe that has all the edge data plus the label corresponding to each ID in the *Source* column. Then, it renames what was originally the *ID* column in the nodes dataframe to *Target*, and the label column to *TargetLabel*, and creates a new dataframe with the labels corresponding to both the source and the target.




In [None]:
#Rename "ID" column as "Source", and "Label" column as "SourceLabel"
dfnodes.rename(columns={"ID": "Source", "Label":"SourceLabel"}, inplace=True)
#Create temporary dataframe that adds a column with the labels for the source data from the edges dataframe
dftemp = pd.merge(dfnodes, dfedges, on='Source', how='inner')
#Rename original "ID" column (currently called "Source") as "Target", and "SourceLabel" as "TargetLabel"
dfnodes.rename(columns={"Source": "Target", "SourceLabel":"TargetLabel"}, inplace=True)
#Create dataframe with labels for source and target data
dfmerged = pd.merge(dftemp, dfnodes, on='Target', how='inner')
#Show dataframe
dfmerged
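
One consequence of `how='inner'` is that any edge whose Source or Target ID has no row in the nodes file is dropped from the result without warning. Below is a sketch of one way to spot such IDs before merging, using inline example data with a deliberately unknown ID (4); the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Inline stand-ins for the nodes and edges dataframes
demo_nodes = pd.DataFrame({"ID": [1, 2, 3], "Label": ["Anna", "Levin", "Vronsky"]})
demo_edges = pd.DataFrame({
    "Source": [1, 1, 4],
    "Target": [2, 3, 1],
    "Weight": [11, 188, 5],
    "Type": ["Directed"] * 3,
})

# IDs used anywhere in the edge list, from either column
edge_ids = set(demo_edges["Source"]) | set(demo_edges["Target"])
# IDs that have no label in the nodes table
unlabeled_ids = sorted(int(i) for i in edge_ids - set(demo_nodes["ID"]))
print("IDs without a label:", unlabeled_ids)

# Rows that an inner merge on Source and then Target would drop
dropped = demo_edges[~demo_edges["Source"].isin(demo_nodes["ID"])
                     | ~demo_edges["Target"].isin(demo_nodes["ID"])]
print(dropped)
```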

## Step 5: Cleanup
We need the labels to be in columns called "Source" and "Target", not "SourceLabel" and "TargetLabel". But we already have "Source" and "Target" columns, so the next step is to drop those existing columns, reorder the columns to restore the original column order, then rename "SourceLabel" and "TargetLabel".

In [None]:
#Drop current "Source" and "Target" columns that still have the ID numbers
dfmerged.drop(columns=['Source', 'Target'], inplace=True)
#Define new column order
columnsTitles = ['SourceLabel', 'TargetLabel', 'Weight', 'Type']
#Reorder the dataframe's columns
dfclean = dfmerged.reindex(columns=columnsTitles)
#Rename "SourceLabel" as "Source" and "TargetLabel" as "Target"
dfclean.rename(columns={"SourceLabel": "Source", "TargetLabel":"Target"}, inplace=True)
#Show dataframe
dfclean
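
The drop, reorder, and rename steps can also be pictured as a single select-and-rename. The sketch below uses an inline stand-in for the merged dataframe, with columns in the order the Step 4 merges produce; `demo_merged` and `demo_clean` are illustrative names, and this is an equivalent formulation rather than the notebook's own code.

```python
import pandas as pd

# Inline stand-in for dfmerged as it looks after Step 4
demo_merged = pd.DataFrame({
    "Source": [1, 1],
    "SourceLabel": ["Anna", "Anna"],
    "Target": [2, 3],
    "Weight": [11, 188],
    "Type": ["Directed", "Directed"],
    "TargetLabel": ["Levin", "Vronsky"],
})

# Selecting only the wanted columns drops the ID columns and sets the order in one step
demo_clean = demo_merged[["SourceLabel", "TargetLabel", "Weight", "Type"]].rename(
    columns={"SourceLabel": "Source", "TargetLabel": "Target"}
)
print(demo_clean)
```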

## Step 6. Export
The following cell defines an output filename by taking the name of the input edges file and adding "\_labels" before the .csv extension, and then creates that file with the contents of the cleaned-up dataframe.

In [None]:
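#Build the output file name from the edges file name
file_name = edges.replace('.csv', '_labels.csv')
#Export the new CSV file; index=False leaves out the dataframe's row index,
#which would otherwise be written as an extra unnamed first column
dfclean.to_csv(file_name, index=False)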
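
To confirm what ends up on disk, the exported file can be read back and its columns inspected. The sketch below writes a small inline dataframe to a temporary directory rather than touching your real files. It passes `index=False` so that the dataframe's row index is not written as an extra, unnamed first column, which is what `to_csv` does by default. The names `demo`, `out_path`, and `check` are illustrative.

```python
import os
import tempfile
import pandas as pd

# Inline stand-in for the cleaned-up dataframe
demo = pd.DataFrame({
    "Source": ["Anna", "Anna"],
    "Target": ["Levin", "Vronsky"],
    "Weight": [11, 188],
    "Type": ["Directed", "Directed"],
})

with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "demo_labels.csv")
    # index=False keeps the row index out of the file
    demo.to_csv(out_path, index=False)
    # Read the file back to see exactly what was written
    check = pd.read_csv(out_path)

print(list(check.columns))
```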

## Suggested citation
Dombrowski, Quinn. *Node Labeler for Edges Data*. Jupyter notebook. https://github.com/quinnanya/network-triads. 2019.