# Math Foundations

## Graph-theoretic Concepts

#### 1 - For this exercise we will use datasets CAMPNET and ZACKAR.

1. CAMPNET. This is a network of 18 participants in a qualitative methods class. Ties are directed: a tie from ego to alter means that ego named alter as one of the three people with whom s/he spent the most time during the seminar.
2. ZACKAR. These are data collected from the members of a university karate club by Wayne Zachary. The ZACHE matrix represents the presence or absence of ties among the members of the club; the ZACHC matrix indicates the relative strength of the associations (number of situations in and outside the club in which interactions occurred).

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

%matplotlib inline

In [None]:
campnet = pd.read_csv('data/campnet.csv', index_col='ID')
zachar_e = pd.read_csv('data/ZACKE.csv', index_col='ID')
zachar_c = pd.read_csv('data/ZACKC.csv', index_col='ID')

#### 2 - Calculate overall density of the ZACKAR networks

In [None]:
zachar_e_npa = zachar_e.to_numpy()
zachar_e_graph = nx.from_numpy_array(zachar_e_npa)
plt.figure(figsize=(12,12))
nx.draw_networkx(zachar_e_graph)

In [None]:
nx.density(zachar_e_graph)

In [None]:
zachar_c_npa = zachar_c.to_numpy()
zachar_c_graph = nx.from_numpy_array(zachar_c_npa)
plt.figure(figsize=(12,12))
nx.draw_networkx(zachar_c_graph)

In [None]:
nx.density(zachar_c_graph)
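For reference, the density of an undirected graph with n nodes and m edges is 2m / (n(n-1)). The sketch below checks that formula against `nx.density` on a small made-up graph (the toy edges are an assumption for illustration, not the karate club data). Note that `nx.density` counts edges and ignores their weights, so on the valued ZACHC matrix it reports the density of the nonzero ties.

```python
import networkx as nx

# Toy undirected graph (hypothetical edges, for illustration only).
toy = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])

n = toy.number_of_nodes()
m = toy.number_of_edges()

# Undirected density: observed edges over the n*(n-1)/2 possible pairs.
manual_density = 2 * m / (n * (n - 1))

print(manual_density, nx.density(toy))
```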

#### 3 - Now calculate and interpret the density of the CAMPNET network. Does it make sense to look at the density of this dataset?

In [None]:
campnet_npa = campnet.to_numpy()
# Ties are directed, so build the DiGraph directly from the adjacency matrix
# instead of converting an undirected graph (which would symmetrize the ties).
campnet_digraph = nx.from_numpy_array(campnet_npa, create_using=nx.DiGraph)
plt.figure(figsize=(12,12))
nx.draw_networkx(campnet_digraph)

In [None]:
nx.density(campnet_digraph)
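For a directed graph the density is m / (n(n-1)), because every ordered pair of nodes can carry a tie. If every respondent names exactly three alters, as the CAMPNET description suggests, the number of ties is fixed at 3n, so the density would be set by the survey design rather than by how cohesive the group is. A quick arithmetic check under that assumption:

```python
# Assumption: each of the 18 respondents names exactly three alters.
n = 18
choices_per_person = 3

m = n * choices_per_person            # total directed ties
directed_density = m / (n * (n - 1))  # ties over ordered pairs

print(round(directed_density, 3))
```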

#### 4 - Calculate the number of weak AND strong components in the CAMPNET dataset.

In [None]:
campnet_weak = nx.number_weakly_connected_components(campnet_digraph)
campnet_weak

In [None]:
campnet_strong = nx.number_strongly_connected_components(campnet_digraph)
campnet_strong
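As a reminder of the distinction: a weak component ignores tie direction, while a strong component requires a directed path in both directions between every pair of its members. A minimal sketch on a hypothetical three-node digraph (not CAMPNET):

```python
import networkx as nx

# Hypothetical digraph: 0 -> 1 and 1 -> 0 form a cycle; 1 -> 2 is one-way.
toy = nx.DiGraph([(0, 1), (1, 0), (1, 2)])

# Ignoring direction, all three nodes hang together.
n_weak = nx.number_weakly_connected_components(toy)

# Respecting direction, node 2 cannot reach back, so it stands alone.
strong = sorted(sorted(c) for c in nx.strongly_connected_components(toy))

print(n_weak, strong)
```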

#### 5 - Trace all paths between two nodes in CAMPNET. Kudos to whoever can find the longest path between two actors in this dataset. Make sure to pick two actors who are connected by a path. Graph the network to identify some actors that have long paths between them.

In [None]:
for path in nx.all_simple_paths(campnet_digraph, source=0, target=17):
    print(path)
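One way to hunt for the longest path between a given pair is to keep the longest of the simple paths that `nx.all_simple_paths` yields. The sketch below does this on a hypothetical digraph; the helper name `longest_simple_path` is made up for illustration. Enumerating all simple paths grows quickly with graph size, so this is only practical for small networks.

```python
import networkx as nx

def longest_simple_path(graph, source, target):
    """Return the longest simple path from source to target, or None."""
    paths = nx.all_simple_paths(graph, source=source, target=target)
    # max() with default=None handles the case where no path exists.
    return max(paths, key=len, default=None)

# Hypothetical digraph with a short and a long route from 0 to 3.
toy = nx.DiGraph([(0, 3), (0, 1), (1, 2), (2, 3)])

print(longest_simple_path(toy, 0, 3))
print(longest_simple_path(toy, 3, 0))
```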

#### 6 - Run geodesic distance on CAMPNET. Choose how you would like undefined distances (pairs with no connecting path) to be recorded.

In [None]:
campnet_shortest_path = nx.shortest_path(campnet_digraph)
campnet_shortest_path
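`nx.shortest_path` returns the paths themselves. If a matrix of geodesic distances is wanted, one option is to fill a NumPy array from `nx.all_pairs_shortest_path_length`, choosing a placeholder for unreachable pairs. The sketch below uses `np.inf` as that placeholder on a hypothetical digraph. The function name `geodesic_matrix` and its `undefined` parameter are illustrative, and nodes are assumed to be labelled 0..n-1, as they are when a graph is built from an adjacency matrix.

```python
import networkx as nx
import numpy as np

def geodesic_matrix(graph, undefined=np.inf):
    """Matrix of shortest-path lengths; unreachable pairs get `undefined`.

    Assumes nodes are labelled 0..n-1.
    """
    n = graph.number_of_nodes()
    dist = np.full((n, n), undefined, dtype=float)
    for source, lengths in nx.all_pairs_shortest_path_length(graph):
        for target, d in lengths.items():
            dist[source, target] = d
    return dist

# Hypothetical digraph: 0 -> 1 -> 2, with no ties pointing back.
toy = nx.DiGraph([(0, 1), (1, 2)])
dist = geodesic_matrix(toy)
print(dist)
```

Passing a different `undefined` value (for example a large number, or `np.nan`) changes how the missing distances behave in later arithmetic.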

#### 7 - Run reachability on campnet. Try to explain how this may be useful to know in an organizational or public health setting. Think interventions…

In [None]:
campnet_reach = nx.descendants(campnet_digraph, source=5)
campnet_reach
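`nx.descendants` gives the set of nodes reachable from a single source. To get a full reachability matrix, one approach is to loop over all nodes. A sketch on a hypothetical digraph, assuming integer node labels 0..n-1; `reachability_matrix` is a made-up helper name:

```python
import networkx as nx
import numpy as np

def reachability_matrix(graph):
    """reach[i, j] = 1 if j can be reached from i by a directed path."""
    n = graph.number_of_nodes()
    reach = np.zeros((n, n), dtype=int)
    for i in graph.nodes:
        for j in nx.descendants(graph, i):
            reach[i, j] = 1
    return reach

# Hypothetical digraph: 0 -> 1 -> 2, and node 3 is isolated.
toy = nx.DiGraph([(0, 1), (1, 2)])
toy.add_node(3)

reach = reachability_matrix(toy)
print(reach)
# Row sums count how many others each node can reach.
print(reach.sum(axis=1))
```

Row sums of such a matrix give one rough indication of how far something seeded at a given node could travel along directed ties.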

## Matrix Multiplication

In this exercise, we will be using matrix multiplication.

We will be using the PADGETT dataset. This dataset contains two matrices, PADGM and PADGB. PADGM represents marriage ties and PADGB represents business ties.

In [None]:
padgm = pd.read_csv('data/padgm.csv', index_col='ID')
padgb = pd.read_csv('data/padgb.csv', index_col='ID')

#### 1 - Multiply padgm by padgb. Call the result mb.

In [None]:
mb = np.dot(padgm, padgb)
mb

#### 2 - Display the contents of mb. If there is a zero in the (2,6) cell (Albizzi to Ginori), what does that mean? Interpret the 1s, 2s and 3s as well.
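As a reminder of what a product cell counts: entry (i, j) of M·B is the number of intermediaries k such that i has an M tie to k and k has a B tie to j. The sketch below illustrates this with two small made-up binary matrices (not the Padgett data):

```python
import numpy as np

# Hypothetical symmetric relations among three actors (0, 1, 2).
m = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])   # "marriage": 0-1 and 0-2
b = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])   # "business": 1-2

mb = m @ b

# Cell (0, 2): actor 0 is married to 1, and 1 does business with 2,
# so there is exactly one marriage-then-business route from 0 to 2.
print(mb)
```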

#### 3 - Multiply padgb by padgm. Call the result bm. Yes, order matters. Think of mb and bm as two newly measured social relations among these families. What would you call these relations? What does it mean for a family to have the mb relation with another family? How is it different from having the bm relation with the other family?

In [None]:
bm = np.dot(padgb, padgm)
bm
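Matrix multiplication is not commutative, but when both input matrices are symmetric the two products are transposes of each other, since (MB)ᵀ = BᵀMᵀ = BM. Whether the Padgett matrices are symmetric is worth checking on the data; the sketch below only demonstrates the identity on made-up symmetric matrices.

```python
import numpy as np

# Hypothetical symmetric binary relations (not the Padgett data).
m = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
b = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 0]])

mb = m @ b
bm = b @ m

# Order matters: the two products differ...
print(np.array_equal(mb, bm))
# ...but for symmetric inputs one is the transpose of the other.
print(np.array_equal(bm, mb.T))
```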

#### 4 - Multiply padgm by padgm and call the result mm. Display mm. How do you interpret the values?

In [None]:
mm = np.dot(padgm, padgm)
mm
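Squaring a binary adjacency matrix counts walks of length two. For a symmetric matrix with no self-ties, the diagonal gives each node's number of ties and each off-diagonal cell counts the neighbours two nodes share. A sketch on a made-up matrix:

```python
import numpy as np

# Hypothetical symmetric relation: actor 0 is tied to 1 and 2.
a = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])

aa = a @ a

# Diagonal: number of ties each actor has (walks out and back).
print(np.diag(aa))
# Off-diagonal (1, 2): actors 1 and 2 share one neighbour, actor 0.
print(aa[1, 2])
```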

#### 5 - Switching datasets, multiply the dataset campnet by its transpose. How do you interpret the values?

In [None]:
campnet_mm = np.dot(campnet, campnet.transpose())
campnet_mm
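For a directed binary matrix A whose rows nominate columns, cell (i, j) of A·Aᵀ counts the alters that both i and j nominate, and the diagonal holds each node's number of nominations. A sketch on a made-up choice matrix (not CAMPNET):

```python
import numpy as np

# Hypothetical directed choices: rows nominate columns.
a = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])

shared = a @ a.T

# Cell (0, 1): actors 0 and 1 both nominate actor 2.
print(shared[0, 1])
# Diagonal: each actor's number of nominations (out-degree).
print(np.diag(shared))
```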

## Matrix Algebra

LINKS worksheet - by hand

## 2-mode to 1-mode

#### Importing a 2-mode dataset

In Excel, open the file called “Aom division membership.xls”. The data are from a survey of 3,324 Academy of Management members, asking them which of 23 divisions of the AOM they belonged to. Save the dataset as a matrix “membership”. 

#### Converting to 1-mode

From the Ucinet main menu, go to Data|Affiliations and put in membership as the input dataset. Choose Columns as the mode, and choose sum of cross products as the method. Finally, call the output dataset “comembership”. Results should be this:

You can verify that this matrix is constructed by pre-multiplying the membership matrix by its transpose as follows. From Ucinet’s toolbar, press the CLI button. This opens a command line area. Now type this matrix multiplication command:

->xtx = prod(transp(membership) membership)  //result will be dataset called xtx
->dsp xtx   // result should be same matrix as above
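For readers following along in Python, the same check can be sketched in NumPy. The small membership matrix here is made up for illustration; it is not the AOM data.

```python
import numpy as np

# Hypothetical 2-mode matrix: 4 people (rows) by 3 divisions (columns).
membership = np.array([[1, 1, 0],
                       [1, 0, 0],
                       [1, 1, 1],
                       [0, 0, 1]])

# Columns as the mode: divisions by divisions, sum of cross-products.
comembership = membership.T @ membership

# Diagonal cells are division sizes; off-diagonal cells are shared members.
print(comembership)
```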

#### Normalizing

Notice in the matrix that only 7 people belong to both BPS (strategic management division) and CAR (careers division). It is true that researchers interested in corporate strategy tend not to be interested in careers. But we need to be careful: by chance alone, the overlap between the divisions is likely to be small, since one of the divisions is very small (just 130 members). We should correct for the sizes of the different divisions. To do that we rerun Data|Affiliations, but this time choose Bonacich (1972) as the method. This effectively compares the observed overlaps with what you would get by chance, given the different division sizes, and assuming the choice of one division is independent of the choice of another. The results are normalized to run between 0 and 1. Before running it, make sure to change the output file name to “normcomemb”. Your result should look like the below. Now all of the numbers are comparable to each other, as the influence of group size has been filtered out.
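To make the chance baseline concrete: if memberships were independent, the expected overlap between two divisions of sizes n_i and n_j among N respondents would be n_i · n_j / N. The sketch below computes only that baseline, not Bonacich's normalization itself. The size of the second division is a made-up value, since the text gives only one division's size.

```python
# From the text: 3,324 respondents, and one division has 130 members.
n_members = 3324
small_division = 130

# Assumption for illustration only: size of the other division.
other_division = 500

# Under independence, P(in both) = P(in one) * P(in other).
expected_overlap = small_division * other_division / n_members

print(round(expected_overlap, 1))
```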

A cluster analysis of this matrix yields the following 4-cluster solution:

For those unfamiliar with these divisions, the first cluster consists of divisions that are very macro: the unit of analysis is usually the firm. The second cluster consists of super micro divisions, where the unit of analysis is the individual.