# Connected: A Social Network Analysis Tutorial with NetworkX

__Presenters__: Rob Chew & Peter Baumgartner

#### Installation

``` bash
$ git clone https://github.com/rtidatascience/connected-nx-tutorial.git
$ cd connected-nx-tutorial
$ conda env create -f environment.yml
$ source activate connected
```

## Outline
- Introduction & Background
- Creating Graphs
- Visualizing Graphs
- Centrality
- Link Prediction
- Community

# What is Social Network Analysis? 

- Examples
    - Zachary's Karate Club
    - Florentine Marriages 
    - Semantic Text Network
- Definitions


## Examples
### Zachary's Karate Club Network
> The *Iris* dataset of social network analysis

<center><img src="https://upload.wikimedia.org/wikipedia/commons/2/2b/Karate_Cuneyt_Akcora.png" width="300"></center>

Wayne W. Zachary studied the social network of a karate club over three years, from 1970 to 1972. The network captures 34 club members and documents 78 pairwise links between members who interacted **outside** the club. During the study, a conflict arose that split the club in two. Using the collected data, Zachary correctly assigned all but one member to the group they actually joined after the split.

There is even a [Zachary's Karate Club CLUB](http://networkkarate.tumblr.com/), which awards a trophy to the first person at a network conference to use Zachary's Karate Club Network as an example.
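
The counts quoted above can be checked directly, since NetworkX ships this dataset as a built-in example graph. The sketch below assumes a reasonably recent NetworkX, in which each node carries a `club` attribute recording the faction that member joined after the split.

```python
import networkx as nx

# Zachary's Karate Club is bundled with NetworkX as an example graph.
G = nx.karate_club_graph()

print(G.number_of_nodes())  # 34 members
print(G.number_of_edges())  # 78 pairwise links

# Collect the distinct values of the "club" node attribute (the two factions).
factions = {data["club"] for _, data in G.nodes(data=True)}
print(factions)
```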

### 15th Century Florentine Marriages

<center><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/15th_Century_Florentine_Marriges_Data_from_Padgett_and_Ansell.pdf/page1-577px-15th_Century_Florentine_Marriges_Data_from_Padgett_and_Ansell.pdf.jpg" width="450"></center>

[[Padgett and Ansell, 1993](http://home.uchicago.edu/~jpadgett/papers/published/robust.pdf)]

The graph above is a marriage network of 16 influential Florentine families in the 1430s. At this time in Renaissance Italy, the major families were essentially an oligarchy, controlling politics and money in the region.

Based on this network, can you surmise which family ascended to power in the following decades?

By examining the right networks, we can understand which actors are the most central. In this case, the network forecasts the rise of the Medici, even though they were not the wealthiest or most politically connected family at the time.
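
One way to make "most central" concrete is to rank the families by a centrality score. The sketch below uses the Florentine families graph bundled with NetworkX and two standard measures, degree and betweenness centrality. It is an illustration rather than the analysis from the paper, and the built-in graph appears to include only families with at least one marriage tie, so it may contain fewer than the 16 families mentioned above.

```python
import networkx as nx

# Marriage network bundled with NetworkX (assumed to match the figure above).
F = nx.florentine_families_graph()

# Degree centrality: share of other families a family is directly tied to.
degree = nx.degree_centrality(F)
# Betweenness centrality: how often a family lies on shortest paths between others.
betweenness = nx.betweenness_centrality(F)

# Print the three families with the highest betweenness.
top = sorted(betweenness, key=betweenness.get, reverse=True)[:3]
for family in top:
    print(family, round(degree[family], 3), round(betweenness[family], 3))
```

On this graph the Medici rank first on both measures.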

### Semantic Text Network

<center><img src="http://noduslabs.com/wp-content/uploads/2011/12/figure-5-meaning-circulation.png" width="400"></center>

A network of words in a document, connected and weighted by the frequency of appearance within 2-word and 5-word windows.

[Paranyushkin, D. (2011). Identifying the pathways for meaning circulation using text network analysis. Berlin: Nodus Labs](http://noduslabs.com/research/pathways-meaning-circulation-text-network-analysis/)
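
To give a feel for how such a network is built, here is a minimal sketch of a word co-occurrence graph. It is a simplification and not the exact procedure from the paper: the helper `cooccurrence_graph`, the whitespace tokenization, and the sample sentence are all assumptions made for illustration. Words that fall inside the same sliding window are linked, and the edge weight counts how often that happens.

```python
import networkx as nx

def cooccurrence_graph(words, window=2):
    """Link words that appear together within a sliding window of `window` words."""
    G = nx.Graph()
    for i, word in enumerate(words):
        # Look at the next (window - 1) words after position i.
        for other in words[i + 1 : i + window]:
            if other == word:
                continue  # skip self-loops
            if G.has_edge(word, other):
                G[word][other]["weight"] += 1
            else:
                G.add_edge(word, other, weight=1)
    return G

# Hypothetical sample text, tokenized naively on whitespace.
words = "the cat sat on the mat and the cat slept".split()
T = cooccurrence_graph(words, window=2)

print(T["the"]["cat"]["weight"])  # "the cat" occurs twice, so the weight is 2
```

Passing `window=5` links each word to the four words that follow it, which produces a denser graph from the same text.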

### Definitions

<img style="float: left; margin: 10px" src="http://revolution-computing.typepad.com/.a/6a010534b1db25970b0147e0ae51b2970b-800wi" width="400">

__Network__: a pattern of interconnections among a set of things [[Source](http://www.cs.cornell.edu/home/kleinber/networks-book/networks-book-ch01.pdf)]

__Social Network__: a network where the *things* are people and the *interconnections* are social interactions

__Social Network Analysis__ (SNA): the application of _graph and network theory_ to investigate social structures.

__Graph Theory:__ the study of graphs, which are mathematical structures used to model pairwise relations between objects.

__Network Theory:__ the study of complex interacting systems that can be represented as graphs equipped with extra structure.

### Parts of Graphs

<img style="float: left;" src="http://i.imgur.com/upMNKXf.png" width="400">

__Node / Vertex__: The entity being analyzed, which participates in relationships. *Node* is used in the network context and *vertex* in the graph theory context, but the terms are often used interchangeably.

__Link / Edge / Relationship__: The connections between nodes. *Link* is used in the network context and *edge* in the graph theory context; both are used interchangeably with *relationship*.

__Attributes__: Both nodes and edges can store attributes, which contain additional data about that object.

__Weight__: A common *attribute* of edges, used to indicate *strength* or *value* of a relationship.

__Degree__: The number of edges attached to a node.
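
The terms above map directly onto NetworkX calls. The sketch below builds a tiny graph to show them; the names (`Ann`, `Bob`, `Cat`) and the `age` attribute are made up for illustration.

```python
import networkx as nx

G = nx.Graph()

# Nodes can store arbitrary attributes as keyword arguments.
G.add_node("Ann", age=34)
G.add_node("Bob", age=29)

# Edges can store attributes too; "weight" is the conventional name for strength.
G.add_edge("Ann", "Bob", weight=3)
G.add_edge("Ann", "Cat", weight=1)  # add_edge creates the missing node "Cat"

print(G.nodes["Ann"]["age"])             # node attribute: 34
print(G["Ann"]["Bob"]["weight"])         # edge weight: 3
print(G.degree("Ann"))                   # degree: 2 edges
print(G.degree("Ann", weight="weight"))  # weighted degree: 3 + 1 = 4
```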

### Types of Graphs

Graphs are typically classified by whether their edges carry weights and direction. The table below names each type:

|                | Absent     | Present  |
|----------------|------------|----------|
| __Weights__ | Unweighted | Weighted |
| __Directionality__ | Undirected | Directed |

__Additional flavors__: parallel edges, self-loops, *n*-partite graphs

In context:
> We are talking about a(n) __\[unweighted/weighted\]__ __\[undirected/directed\]__ graph (with __\[parallel edges | self loops\]__).
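
In NetworkX, these distinctions correspond to the graph class you choose and the attributes you attach to edges. The sketch below fills in the sentence template for three small graphs. The helper `describe` is a hypothetical name introduced here, and it reports whether the graph class *allows* parallel edges rather than whether any are present.

```python
import networkx as nx

def describe(G):
    """Fill in the sentence template above for a NetworkX graph."""
    weighted = "weighted" if nx.is_weighted(G) else "unweighted"
    directed = "directed" if G.is_directed() else "undirected"
    extras = []
    if G.is_multigraph():                # MultiGraph / MultiDiGraph allow parallel edges
        extras.append("parallel edges")
    if nx.number_of_selfloops(G) > 0:    # edges from a node to itself
        extras.append("self loops")
    suffix = f" (with {', '.join(extras)})" if extras else ""
    return f"{weighted} {directed} graph{suffix}"

U = nx.Graph()
U.add_edge("a", "b")

D = nx.DiGraph()
D.add_edge("a", "b", weight=0.5)

M = nx.MultiGraph()
M.add_edge("a", "b")
M.add_edge("a", "b")  # parallel edge
M.add_edge("a", "a")  # self-loop

for graph in (U, D, M):
    print(describe(graph))
```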

---
