# Introduction
Data often describes not just individual objects with particular characteristics, but also how those objects relate to one another. Examples include friendships on a social network or citations among academic papers. This is somewhat similar to a relational database, but here the relations all hold among discrete actors of similar “status” rather than among rows that can differ in kind.

<img src="https://www.smrfoundation.org/wp-content/uploads/2009/09/2009-September-NodeXL-CHI-2010-Tag-Network.png" width=700px>

Networks are useful because they let us examine relations between actors and, on a larger scale, relations within a whole group of actors, rather than just the actors individually. This helps us determine facts about the group. For example, a network of people may reveal semi-distinct cliques, or a “leader” among the actors. Findings like these let us analyze the overarching structure of real-world groups.

Rather than examining theoretical or arbitrary networks, we analyze ones that occur in the real world, such as those within social media or any other setting where actors can have links, and derive higher-level data that may lend insight into the nature of the network and potentially its relation to other networks. One big practical difference is that this reduces network information to quantitative and qualitative data that can be easier to draw conclusions from than the raw network. This applied approach also means that certain characteristics of a network can be analyzed on a sliding scale, rather than as a hard binary as can be the case in graph theory.

# Tutorial Contents
This tutorial starts with the basics of networks and builds up to larger concepts. The software we’ll use to compute some examples is ORA, developed by CASOS (the Center for Computational Analysis of Social and Organizational Systems) at CMU, although having the software is not imperative. The following topics are covered, in order:

* [Network Basics](#Network-Basics)
* [Node Centrality](#Node-Centrality)
* [Network Characteristics](#Network-Characteristics)
* [Visual Analysis](#Visual-Analysis)
* [Further Resources](#Further-Resources)


# Network Basics
A network consists of a set of nodes, which represent the actors, and a set of links between pairs of nodes, which represent relations. If that sounds familiar, it’s because networks are effectively equivalent to graphs, with slightly different terminology:
* Node - Vertex
* Link - Edge

Otherwise they are conceptually the same. Networks can be undirected or directed, with friendships being an example of the former and, say, monetary transactions between people an example of the latter. Links can also be weighted, where the weight is some quantity that indicates something about the link in question.

### Paths
A path is a sequence of nodes in which each node is linked to the one that follows it. Paths are how we get from a particular starting node to a particular ending node by “walking” along the links.

## Example With ORA
First, you can download ORA for Windows at the following link:
http://www.casos.cs.cmu.edu/projects/ora/download.php

Consider the network in the picture below, which shows a group of classmates. Each link is weighted by the number of times the source node has helped the destination node with homework that year:
<img src="nBasics.png" width=300px>

We can see that it is asymmetric: the edges are all directed, and some run in only one direction. Further, we can see that there is a path from, say, Bob to Sam that goes Bob-Tim-Sarah-Sam.
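The path above can also be found mechanically. The following is a minimal sketch, not part of ORA: it hard-codes the classmates' links as (source, target) pairs (the same links listed in the edge file later in this section) and uses a breadth-first search to find a shortest directed path. The names `EDGES` and `shortest_path` are made up for illustration, and link weights are ignored.

```python
from collections import deque

# Directed links of the classmates network, as (source, target) pairs.
EDGES = [
    ("Bob", "Tim"), ("Jane", "Tim"), ("Sarah", "Sam"),
    ("Sarah", "Alice"), ("Tim", "Sarah"), ("Sarah", "Tim"),
]

def shortest_path(edges, start, goal):
    """Return one shortest directed path from start to goal, or None."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    previous = {start: None}  # node -> node we arrived from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the "previous" pointers back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # goal is not reachable from start

print(shortest_path(EDGES, "Bob", "Sam"))  # ['Bob', 'Tim', 'Sarah', 'Sam']
print(shortest_path(EDGES, "Sam", "Bob"))  # None, since Sam has no outgoing links
```

Because the links are directed, a path from Bob to Sam does not imply a path from Sam to Bob.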

One way to represent this network is to provide a list of nodes and a list of edges, potentially with weights, in a delimited text (CSV) file such as this:

Edges:
```
From,To,Weight
Bob,Tim,1
Jane,Tim,2
Sarah,Sam,3
Sarah,Alice,4
Tim,Sarah,1
Sarah,Tim,1
```

To get the CSV file, you can just copy the above text into a file named "filename.csv".
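If you do not have ORA at hand, the same file can be read with a few lines of Python. This is only a sketch using the standard `csv` module. The text is inlined so that it runs without a file on disk, and `CSV_TEXT` and `load_edges` are illustrative names rather than anything ORA provides.

```python
import csv
import io

# The same edge list as above, inlined so the sketch needs no file on disk.
CSV_TEXT = """From,To,Weight
Bob,Tim,1
Jane,Tim,2
Sarah,Sam,3
Sarah,Alice,4
Tim,Sarah,1
Sarah,Tim,1
"""

def load_edges(text):
    """Parse CSV text into a list of (source, target, weight) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["From"], row["To"], int(row["Weight"])) for row in reader]

edges = load_edges(CSV_TEXT)

# The nodeset is every name that appears as a source or a target.
nodes = sorted({name for src, dst, _ in edges for name in (src, dst)})

print(nodes)       # ['Alice', 'Bob', 'Jane', 'Sam', 'Sarah', 'Tim']
print(len(edges))  # 6
```

To read from an actual file instead, the `io.StringIO(text)` argument could be replaced by a file object opened with `open("filename.csv", newline="")`.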

We can then start up ORA. Click the magic wand in the corner to import data and select "Table of network links" under "Import Excel or Text Delimited Files":
<img src="Screen Shot 2018-03-27 at 4.11.04 PM.png">
Hit next twice and then select the .csv file containing the above edges, and select both FROM and TO columns to contain node names, and have the nodeset class be "Agent".

Then, hit the "new" button at the bottom and set the source, target, and link weight to From, To and Weight, respectively as such:
<img src="Screen Shot 2018-03-27 at 4.11.35 PM.png">

Then, hit next, and hit finish:
<img src="Screen Shot 2018-03-27 at 4.11.43 PM.png">

Your network should now be imported into ORA as a Meta-network, which is shown in the sidebar, and which contains both a nodeset and an edge matrix as seen here:
<img src="Screen Shot 2018-03-27 at 4.20.35 PM.png" width=500px>



# Node Centrality
When examining nodes, we are also interested in the overall “importance” of a given node. If the network represents people, does a given node correspond to a popular or unpopular person? If they were to, say, espouse an idea, how quickly would it spread? More morbidly, if we were to “remove” this person, what impact would it have? There is no single objective way to judge the importance of a given actor, so there are multiple measures that can inform it.

One family of such measures is centrality, a characteristic of a single node that in some way indicates its “importance”. There are multiple kinds of centrality; here are a few basic types.
### In-degree centrality
How many links come into the node? This indicates how many of a certain relation point inward toward the node. For example, the in-degree centrality of Bob in the example network is 0 because the node has no incoming links.
### Out-degree centrality
How many links go out of the node? This indicates how many of a certain relation point outward from the node. For example, the (unscaled) out-degree centrality of Bob in the example network is 1 because the node has one outgoing link.
### Total-degree
This is a measure of the overall degree, so it is the sum of the in-degree and out-degree centrality.
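The three degree measures are simple enough to count by hand, or with a short script. The sketch below counts unscaled, unweighted degrees for the example network. The `EDGES` list is hard-coded from the tutorial's edge file, and values reported by ORA may be scaled differently.

```python
from collections import Counter

# Directed links of the example network, as (source, target) pairs.
EDGES = [
    ("Bob", "Tim"), ("Jane", "Tim"), ("Sarah", "Sam"),
    ("Sarah", "Alice"), ("Tim", "Sarah"), ("Sarah", "Tim"),
]

# A Counter returns 0 for names it has never seen, which is exactly
# the degree of a node with no links in that direction.
out_degree = Counter(src for src, _ in EDGES)
in_degree = Counter(dst for _, dst in EDGES)

nodes = sorted({name for edge in EDGES for name in edge})
for node in nodes:
    total = in_degree[node] + out_degree[node]
    print(f"{node}: in={in_degree[node]} out={out_degree[node]} total={total}")
```

For Bob this prints `in=0 out=1 total=1`, matching the values worked out above.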
### Betweenness
How many shortest paths between two other nodes pass through this one? A high betweenness can indicate that a node is an important link between two disparate groups, or that it sits in a potentially high-traffic area. This can mean that, say, removing nodes with high betweenness causes the network to fracture more.
### Eigenvector Centrality and PageRank Centrality
These are two different measures, but both, in different ways, capture a node’s connections to other nodes that are themselves high in centrality, essentially measuring the importance of its neighbors.
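To make the "importance of neighbors" idea concrete, here is a rough sketch of PageRank computed by repeated averaging (power iteration) on the example network. It is an illustration only, not how ORA computes the measure. The damping factor of 0.85, the iteration count, and the choice to spread the rank of nodes without outgoing links evenly over all nodes are assumptions made for this sketch.

```python
# Directed links of the example network, as (source, target) pairs.
EDGES = [
    ("Bob", "Tim"), ("Jane", "Tim"), ("Sarah", "Sam"),
    ("Sarah", "Alice"), ("Tim", "Sarah"), ("Sarah", "Tim"),
]

def pagerank(edges, damping=0.85, iterations=100):
    """Return a dict of PageRank scores that sum to 1."""
    nodes = sorted({name for edge in edges for name in edge})
    n = len(nodes)
    successors = {node: [] for node in nodes}
    for src, dst in edges:
        successors[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}  # start from a uniform guess
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is shared by everyone.
        dangling = sum(rank[node] for node in nodes if not successors[node])
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other node splits its rank evenly among the nodes it links to.
        for src in nodes:
            for dst in successors[src]:
                new_rank[dst] += damping * rank[src] / len(successors[src])
        rank = new_rank
    return rank

rank = pagerank(EDGES)
for node, score in sorted(rank.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

Bob and Jane receive no links, so they end up with the same, lowest score, while Tim benefits from being linked to by three other nodes.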

## Example in ORA
Take the meta-network we imported in the last example. If we select "Agent: size 6" in the sidebar, which is the nodeset, and then "Editor" on the top bar, which lets us see the actual nodes, we should get the following:

<img src="Screen Shot 2018-03-27 at 6.00.20 PM.png">

Right now it's just a list of the node names in the network that we imported. However, in ORA we can add more characteristics. We can add attributes that we edit manually by selecting Attributes > Create New Attribute. For our purposes, though, the more interesting ability is to create attribute measures, such as those above, which are calculated from the network structure.

To do this, select "Create New Attribute Measure" from the Attributes menu:

<img src="Screen Shot 2018-03-27 at 6.00.29 PM.png">

This brings up a window that prompts you for which networks you want to include in this measure, since a nodeset can correspond to multiple different networks. In this case we can just say "Entire Meta-Network" because we only have one. It then also prompts you to choose a node measure. Among the choices are those that we described above. For this example, say we choose betweenness (although you can play around with all the other measures too).

<img src="Screen Shot 2018-03-27 at 6.12.14 PM.png" width=500px>

This creates a new column with values for betweenness centrality. Recall the definition given earlier: every node except Tim and Sarah is a dead end, with either no incoming or no outgoing links, so no shortest path can pass through it and its betweenness is zero. Tim and Sarah, which lie on the shortest paths from, say, Bob to Sam, have higher, and equal, betweenness.
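This reasoning can be checked independently of ORA. The sketch below counts, for every ordered pair of nodes, the fraction of shortest paths that pass through each other node. It ignores link weights and does not scale the result, so the raw numbers may differ from what ORA displays; only the pattern (Tim and Sarah equal, everyone else zero) is the point. All function names are illustrative.

```python
from collections import deque

# Directed links of the example network, as (source, target) pairs.
EDGES = [
    ("Bob", "Tim"), ("Jane", "Tim"), ("Sarah", "Sam"),
    ("Sarah", "Alice"), ("Tim", "Sarah"), ("Sarah", "Tim"),
]
NODES = sorted({name for edge in EDGES for name in edge})

def bfs_counts(source, successors):
    """Return (dist, sigma): hop distance and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in dist:  # first time we reach nxt
                dist[nxt] = dist[node] + 1
                sigma[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:  # node lies on a shortest path
                sigma[nxt] += sigma[node]
    return dist, sigma

def betweenness(nodes, edges):
    """Unscaled betweenness: shortest paths through each node."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    info = {node: bfs_counts(node, successors) for node in nodes}

    score = {node: 0.0 for node in nodes}
    for s in nodes:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for v in nodes:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v is on a shortest s-t path if the two legs add up exactly.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

scores = betweenness(NODES, EDGES)
print(scores)  # Tim and Sarah both score 6.0; every other node scores 0.0
```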

<img src="Screen Shot 2018-03-27 at 6.19.26 PM.png" width=500px>


# Network Characteristics

However, suppose we don't just care about individual nodes but want more general metrics of the network. These will often be measures that are easy to calculate for a single node, but can also yield potentially useful information when calculated for the entire network.

### Density
This comes up in graph theory as well. It asks: of all the potential links in the network, how many actually exist? A high density means that nodes have relations with almost all the other nodes in the graph, whereas a low density can indicate that nodes have relations with a select group but not necessarily with everyone else.
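For the example network this is a one-line calculation. The sketch below assumes a directed network without self-links, so that n nodes allow n × (n − 1) potential links; other conventions (for example for undirected networks) would change the denominator.

```python
# Directed links of the example network, as (source, target) pairs.
EDGES = [
    ("Bob", "Tim"), ("Jane", "Tim"), ("Sarah", "Sam"),
    ("Sarah", "Alice"), ("Tim", "Sarah"), ("Sarah", "Tim"),
]

nodes = {name for edge in EDGES for name in edge}
n = len(nodes)

# Each of the n nodes could link to each of the other n - 1 nodes.
potential_links = n * (n - 1)
density = len(EDGES) / potential_links

print(f"{len(EDGES)} of {potential_links} potential links: density = {density}")
# 6 of 30 potential links: density = 0.2
```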

### Characteristic Path Length
Of particular importance are the shortest paths between two nodes, which often represent the closest set of connections they have. In the case of networks, we are interested in how long these shortest paths are. If we average the length of the shortest paths over all pairs of nodes (excluding pairs with no path between them), what we get is called the characteristic path length, and it can indicate something about the relative “distance” of nodes. For example, in a network of airports and airplane traffic, a long characteristic path length could indicate that if you want to fly between two arbitrary cities, you should prepare for a lot of connecting flights.
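As a rough illustration, the sketch below computes this average for the classmates network. It measures path length in hops (ignoring link weights) and averages only over ordered pairs where the second node is reachable from the first. Both choices are assumptions of this sketch, and ORA's own definition may differ in detail.

```python
from collections import deque

# Directed links of the example network, as (source, target) pairs.
EDGES = [
    ("Bob", "Tim"), ("Jane", "Tim"), ("Sarah", "Sam"),
    ("Sarah", "Alice"), ("Tim", "Sarah"), ("Sarah", "Tim"),
]

def hop_distances(source, successors):
    """Breadth-first search: hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

successors = {}
for src, dst in EDGES:
    successors.setdefault(src, []).append(dst)
nodes = sorted({name for edge in EDGES for name in edge})

# Collect the shortest-path length of every reachable ordered pair.
lengths = []
for source in nodes:
    for target, hops in hop_distances(source, successors).items():
        if target != source:
            lengths.append(hops)

characteristic_path_length = sum(lengths) / len(lengths)
print(len(lengths), sum(lengths))   # 14 reachable pairs, 26 hops in total
print(characteristic_path_length)   # 26 / 14, about 1.86
```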

### Reciprocity
This matters in directed graphs. If node a has a link going to node b, how likely is it that there is a link in the opposite direction? In essence, how likely are links to be reciprocated?
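One simple way to put a number on this is the fraction of links whose reverse link also exists. That is the definition assumed in the sketch below; other definitions (for example, counting connected pairs of nodes rather than links) give different values, so the result need not match ORA's report exactly.

```python
# Directed links of the example network, as (source, target) pairs.
EDGES = [
    ("Bob", "Tim"), ("Jane", "Tim"), ("Sarah", "Sam"),
    ("Sarah", "Alice"), ("Tim", "Sarah"), ("Sarah", "Tim"),
]

link_set = set(EDGES)

# A link a -> b is reciprocated if b -> a is also in the network.
reciprocated = [(a, b) for a, b in EDGES if (b, a) in link_set]
reciprocity = len(reciprocated) / len(EDGES)

print(reciprocated)  # [('Tim', 'Sarah'), ('Sarah', 'Tim')]
print(reciprocity)   # 2 of 6 links, about 0.33
```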

## Example in ORA

To view network characteristics, one thing we can do is to generate a report. A report is an external file generated by ORA that contains selected quantitative data about a given network. The way to do this is to select the meta-network in the sidebar and click "Generate Reports."

<img src="Screen Shot 2018-03-28 at 1.47.10 AM.png">

This should then allow you to select which meta-network you want to use. In this case, we are using our imported meta-network. We can also select the type of report. In this particular case we are selecting "All measured by category", but there are other reports that we can make use of as well. We then select "Network Measures" and then select the folder we want to save the report in.

<img src="Screen Shot 2018-03-28 at 1.47.13 AM.png" width=500px>
<img src="Screen Shot 2018-03-28 at 1.47.21 AM.png" width=500px>

The resulting report should open automatically and look something like this:

<img src="Screen Shot 2018-03-29 at 1.39.27 PM.png" width=700px>

If you scroll down further, you can see a list of network level characteristics. You can use the included search bar to search for some of the ones that I mentioned above.


# Visual Analysis

We've covered node-level measures and some network-level measures, but some means of analyzing networks are not necessarily quantitative. Often it can be useful to just look at the network visually. This lets you survey the whole network and identify trends by eye. However, the layout, coloring, and other choices in presentation can be changed to inform what kinds of discoveries you are able to make.

## Example in ORA
ORA offers some tools to help you visualize the network. We'll cover basic visualizations and the use of coloring and sizing to indicate network measures. To begin, select the meta-network in the sidebar and click "Visualize":
<img src="Screen Shot 2018-03-30 at 12.36.25 AM.png">

This should bring up a window with the visualization in it. Currently the layout is automatically generated, and you can refresh the autolayout by pressing the play icon. You can also manually move nodes around by dragging them.
<img src="Screen Shot 2018-03-29 at 3.39.29 PM.png">
We can also color and size nodes by measures, and modify link appearance by link measures. To do this we can select the Node Appearance or Link Appearance menus, which bring up different characteristics like size and color. We then choose "Set node ____ by attribute or measure" for nodes, which brings up a window prompting us to choose which attribute or measure, or "Set link ____ by link value" for links. For example, this is what happens when we set node size based on degree centrality and link color by link value.

<img src="Screen Shot 2018-03-30 at 1.35.23 AM.png">



# Further Resources

The scope of this tutorial was relatively basic and it does not cover all aspects of ORA or of network analysis. For further information on that, there is extensive help documentation in ORA under Help>Help Contents. This should bring up a help window that will allow you to browse through help pages or search for a particular topic.

<img src="Screen Shot 2018-03-29 at 2.02.33 PM.png" width=700px>

Additionally, there is an ORA Google group at https://groups.google.com/forum/#!forum/ora-google-group where you can ask further questions about the software or raise any issues that you may have. For more information about network analysis in general, one option is 08-640: Dynamic Network Analysis, a 12-unit course on network analysis offered by CMU.