# <center>Structural Analysis and Visualization of Networks</center>

## <center>Course Project #1</center>

### <center>Student: *Nazarov Ivan*</center>

#### <hr /> General Information

**Due Date:** 17.05.2015 23:59 <br />
**Late submission policy:** -0.5 points per day <br />


Please send your reports to <mailto:leonid.e.zhukov@gmail.com> and <mailto:shestakoffandrey@gmail.com> with a message subject of the following form:<br /> **[HSE Networks 2015] *Nazarov* *Ivan* Project*1***

Support your computations with figures and comments. <br />
If you are using IPython Notebook, you may use this file as a starting point for your report.<br />
<br />
<hr />

## Description

### Data

Choose one of the following options as the dataset to analyse:
1. Real dataset (can be found [here](http://snap.stanford.edu/) or [here](http://konect.uni-koblenz.de/networks/))
2. Generated dataset. Use a more complex structure than a simple ER model. For instance, you may consider a multilevel network, where on the lower level you have several Watts-Strogatz graphs and on the upper level these graphs are represented as randomly connected nodes.
3. Your own data mined from social networks: Twitter, LiveJournal, etc.

**The order of your dataset should be no less than $10^4$ nodes**
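
The multilevel structure described in option 2 can be sketched as follows. This is only an illustration in plain Python: the function names `ring_rewired` and `multilevel_graph`, the adjacency format (a dict mapping each node to a set of neighbours) and all parameter values are assumptions, not part of the assignment.

```python
from __future__ import division
import random

def ring_rewired(n, k, p, rng):
    """Watts-Strogatz-style graph (assumes n > k): a ring lattice where each
    node is linked to its k nearest neighbours, then every lattice edge is
    rewired to a random endpoint with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if rng.random() < p and u in adj[v]:
                # Candidate endpoints: not v itself and not already adjacent
                choices = [w for w in range(n) if w != v and w not in adj[v]]
                if choices:
                    w = rng.choice(choices)
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj

def multilevel_graph(blocks, n, k, p, q, seed=0):
    """Lower level: `blocks` independent small-world graphs of n nodes each.
    Upper level: every pair of blocks is linked with probability q by one
    edge between randomly chosen members (an assumed linking rule)."""
    rng = random.Random(seed)
    adj = {}
    for b in range(blocks):
        offset = b * n
        for v, nbrs in ring_rewired(n, k, p, rng).items():
            adj[offset + v] = set(offset + u for u in nbrs)
    for a in range(blocks):
        for b in range(a + 1, blocks):
            if rng.random() < q:
                u = a * n + rng.randrange(n)
                v = b * n + rng.randrange(n)
                adj[u].add(v)
                adj[v].add(u)
    return adj

# Small demonstration graph: 20 blocks of 50 nodes each
toy = multilevel_graph(blocks=20, n=50, k=4, p=0.1, q=0.2, seed=1)
```

Calling `multilevel_graph(blocks=100, n=100, ...)` would produce the $10^4$ nodes required above.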

### Models

Consider one of the following models:
1. SIR-based epidemic model (or a variant with more than 3 compartments, e.g. SEIR)
2. Independent Cascade Model
3. Linear Threshold Model
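
For reference, here is a minimal sketch of option 2, the Independent Cascade Model: every newly activated node gets a single chance to activate each of its inactive neighbours. The function name `independent_cascade`, the dict-of-sets graph format and the single uniform activation probability `p` are assumptions made for this illustration.

```python
import random

def independent_cascade(adj, seeds, p, rng):
    """Return {node: tick of activation} for an undirected graph given as
    {node: set(neighbours)}. Assumes one uniform activation probability p."""
    tick = {v: 0 for v in seeds}
    frontier = sorted(tick)
    t = 0
    while frontier:
        t += 1
        fresh = []
        for v in frontier:
            for u in adj[v]:
                # Each active node tries each inactive neighbour exactly once
                if u not in tick and rng.random() < p:
                    tick[u] = t
                    fresh.append(u)
        frontier = fresh
    return tick

# Path graph 0 - 1 - 2 - 3
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
always = independent_cascade(path, [0], 1.0, random.Random(0))
never = independent_cascade(path, [0], 0.0, random.Random(0))
```

With `p = 1.0` the activation tick equals the distance from the seed, and with `p = 0.0` only the seed itself is active.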

### Tasks

#### Network Descriptive Analysis

Provide information on your network: Source, Descriptive Statistics, Visualization
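
As a rough illustration of what the descriptive statistics could include, the sketch below computes a few basic quantities in plain Python. The function name `describe`, the dict-of-sets graph format and the choice of statistics are assumptions; NetworkX offers equivalents such as `nx.connected_components`.

```python
from __future__ import division
from collections import deque

def describe(adj):
    """Basic statistics of an undirected graph stored as {node: set(neighbours)}."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one connected component
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        sizes.append(size)
    return {
        'nodes': len(adj),
        'edges': sum(degrees) // 2,
        'mean_degree': sum(degrees) / len(adj),
        'max_degree': max(degrees),
        'components': len(sizes),
        'largest_component': max(sizes),
    }

# Toy graph: a triangle 0-1-2 plus a separate edge 3-4
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
stats = describe(toy_graph)
```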

#### Main Task for model (1)

You are in charge of leading the vaccination campaign against an outbreak of a nonlethal disease. You can vaccinate nodes or provide medical treatment to the infected ones. However, everything has its costs:
* Vaccinating a node costs $500 \$$ and makes it immune to the disease for life. Unfortunately, you can vaccinate no more than $10\%$ of your population.
* Medical treatment costs $120\$$ per day of illness, and the illness lasts from $3$ to $7$ days.

Your task is to implement the simulation model, propose some vaccination strategies and compare them.
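
One possible shape for such a simulation is sketched below. It is a discrete-time SIR process under several assumptions that the task does not fix: a daily time step, a per-contact infection probability `beta`, an illness duration drawn uniformly from 3 to 7 days, and recovered nodes staying immune. The function name `sir_cost` and the star-graph example are hypothetical.

```python
import random

def sir_cost(adj, vaccinated, patient_zero, beta, rng,
             vaccine_cost=500.0, daily_cost=120.0):
    """Simulate the epidemic and return (total cost, total sick days)."""
    if len(vaccinated) > 0.1 * len(adj):
        raise ValueError('at most 10% of the population can be vaccinated')
    immune = set(vaccinated)   # vaccinated or recovered nodes
    days_left = {}             # infected node -> remaining days of illness
    for v in patient_zero:
        if v not in immune:
            days_left[v] = rng.randint(3, 7)
    sick_days = 0
    while days_left:
        sick_days += len(days_left)
        # Infected nodes try to infect their susceptible neighbours
        newly = set()
        for v in days_left:
            for u in adj[v]:
                if u not in immune and u not in days_left and rng.random() < beta:
                    newly.add(u)
        # One day passes; nodes whose illness is over become immune
        for v in list(days_left):
            days_left[v] -= 1
            if days_left[v] == 0:
                del days_left[v]
                immune.add(v)
        for u in newly:
            days_left[u] = rng.randint(3, 7)
    return vaccine_cost * len(vaccinated) + daily_cost * sick_days, sick_days

# Star graph: hub 0 with leaves 1..10; the disease starts at leaf 1
star = {0: set(range(1, 11))}
star.update({leaf: {0} for leaf in range(1, 11)})
cost_none, days_none = sir_cost(star, [], [1], 1.0, random.Random(0))
cost_vacc, days_vacc = sir_cost(star, [0], [1], 1.0, random.Random(0))
```

Vaccinating the hub isolates the first patient, which illustrates why degree-based strategies are a natural baseline to compare against random vaccination.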

#### Main Task for models (2-3)

You are running a marketing campaign for a brand new pocket device. Initially you can sign contracts with a few people to advertise your gadget among their neighbours. The more "famous" the person you pick, the higher the price in the contract.
* The contract cost is $300 \$ \times \text{NN}(i)$, where $\text{NN}(i)$ is the size of the neighbourhood of person $i$.
* You earn $250\$$ for each affected person.

Your task is to maximize both your influence and the profit of your campaign.
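
The trade-off between contract cost and revenue can be checked on a toy example. The helper `campaign_profit` below is hypothetical, and it assumes that the contracted people themselves do not count as paying customers.

```python
def campaign_profit(adj, seeds, affected, reward=250.0, rate=300.0):
    """Profit = reward per affected non-seed person minus rate * NN(i) per seed."""
    cost = rate * sum(len(adj[s]) for s in seeds)
    revenue = reward * len(set(affected) - set(seeds))
    return revenue - cost

# Star graph: hub 0 with five leaves
hub_graph = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
# Contracting the hub costs 300 * 5 = 1500; reaching all five leaves earns 250 * 5 = 1250
hub_profit = campaign_profit(hub_graph, [0], [0, 1, 2, 3, 4, 5])
# Contracting a leaf costs 300 * 1 = 300; reaching the other five people earns 1250
leaf_profit = campaign_profit(hub_graph, [1], [0, 1, 2, 3, 4, 5])
```

Even when the hub reaches everyone, its contract costs more than the revenue it brings, so the most connected person is not automatically the most profitable seed.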

In [None]:
import networkx as nx
import numpy as np
import re
import os

import scipy.sparse as sp

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
## Build and cache the FB graph's adjacency matrix if the cache does not exist yet.
if not os.path.exists( './data/proj01/facebook-wosn-links.npz' ) :
## `int` replaces the Python 2-only `long` for the timestamp column.
    G = nx.read_edgelist( './data/proj01/facebook-wosn-links/out.facebook-wosn-links',
        nodetype = int, data = ( ( 'flag', bool ), ( 'dttm', int ), ), comments = '%',
        create_using = nx.Graph( ) )
## NetworkX 3.0 and later provide nx.to_scipy_sparse_array( ) instead.
    A = nx.to_scipy_sparse_matrix( G )
## http://stackoverflow.com/questions/8955448/save-load-scipy-sparse-csr-matrix-in-portable-data-format
    np.savez( './data/proj01/facebook-wosn-links.npz',
        data = A.data, indices = A.indices, indptr = A.indptr, shape = A.shape )

In [None]:
## Load the cached adjacency matrix 
with np.load( './data/proj01/facebook-wosn-links.npz', 'r' ) as data :
    A = sp.csr_matrix( ( data[ 'data' ], data[ 'indices' ], data[ 'indptr' ] ), shape = data[ 'shape' ] )
# A[:, nz].tocsr( )[nz, :].tocsc( )

In [None]:
## Personal U[0, 1] threshold
# theta = np.random.uniform( size = A.shape[ 0 ] )
## Flat threshold
theta = np.full( A.shape[ 0 ], .35, dtype = float )

In [None]:
def ltm_sparse( A, affected, theta, niter = np.inf ):
    kiter = 0
## The tick array holds the simulation tick at which each vertex was activated
    tick = np.full( A.shape[ 0 ], np.inf, dtype = float )
## Premultiply the vertex degrees by the thresholds
    deg = np.asarray( A.sum( axis = 1 ), dtype = float ).ravel( )
    theta_deg = theta * deg
    while len( affected ) > 0 and kiter < niter :
## Deferred activation
        tick[ affected ] = kiter
## Get the vertices which have not been affected so far
        unaffected = np.isinf( tick ).nonzero( )[ 0 ]
## Count the affected neighbours of every unaffected vertex
        B = A[ np.isfinite( tick ), : ].tocsc( )[ :, unaffected ].sum( axis = 0 )
## Activate whenever the number of affected neighbours exceeds theta * degree,
##  i.e. their share among all neighbours exceeds the threshold
        affected = unaffected[ np.asarray( B ).ravel( ) > theta_deg[ unaffected ] ]
## Next simulation tick
        kiter += 1
## Return the tick of simulation when a vertex was activated. If the tick
##  is infinite, then the vertex has not been affected at all.
    return tick, deg

In [None]:
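def profit( tick, deg, rate = 0.0, a = 250.0, b = 300.0 ) :
    discount = 1.0 / ( 1.0 + rate )
    t, n = np.unique( tick, return_counts = True )
## Count only vertices activated after tick 0; never-activated ones (tick == inf)
##  must be masked out, because 1.0 ** inf == 1.0 when rate == 0.
    paid = np.isfinite( t ) & ( t > 0 )
## Discounted revenue minus the contract cost of the seeds (b per neighbour)
    return np.sum( a * n[ paid ] * ( discount ** t[ paid ] ) ) - b * np.sum( deg[ tick == 0 ] )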

In [None]:
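## Pick the initial seed at random. The matrix size is used instead of G,
##  because G exists only when the cache has just been rebuilt.
seed = np.random.choice( A.shape[ 0 ], size = 10000, replace = False )
tick, deg = ltm_sparse( A.tocsr( ), seed, theta )
print( tick[ np.isfinite( tick ) ] )
print( profit( tick, deg, rate = 0 ) )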

In [None]:
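## Plot the number of activations per tick, skipping never-activated vertices (tick == inf)
plt.plot( *np.unique( tick[ np.isfinite( tick ) ], return_counts = True ), color = 'black', linewidth = 2 )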

<hr/>

In [None]:
from collections import deque
## The graph's vertices must be numbered from zero.
## NOTE: `theta` and `niter` are accepted but not used yet; the function
##  currently only computes BFS distances from the seed set.
def bfs_spread( G, seed, theta = 0.5, niter = 1000 ) :
    deg = np.zeros( G.number_of_nodes( ), dtype = int )
## Initialize the array of states
    tick = np.full( G.number_of_nodes( ), np.inf, dtype = float )
## Cache deque operations for faster access (Python won't have to make lookups)
    deq = deque.popleft ; enq = deque.extend
## Initialize the queue with the seed vertices, which are affected at tick 0
    Q = deque( seed ) ; tick[ np.array( seed, dtype = int ) ] = 0
## While the (de)queue is not empty
    while Q :
## get the first vertex to have been added
        vertex = deq( Q )
## Collect its neighbours (list( ) keeps this working on Python 2 and 3)
        neighbours = np.array( list( G[ vertex ] ), dtype = int )
        deg[ vertex ] = len( neighbours )
## See which neighbours have not been affected yet...
        unaffected = neighbours[ np.isinf( tick[ neighbours ] ) ]
        if len( unaffected ) :
##  ... record their ``infection'' time (BFS distance from the seed) ...
            tick[ unaffected ] = tick[ vertex ] + 1
##  ... and add them to the queue
            enq( Q, unaffected )
## Group the vertices by the tick at which they were reached
    tock = { t: np.nonzero( tick == t )[ 0 ] for t in np.unique( tick ) }
## Unfinished draft of the actual spreading step:
#     tuck = np.zeros( G.number_of_nodes( ), np.bool )
#     tuck[ tock[ 0.0 ] ] = True
#     kiter = 1.0
#     while kiter < niter :
#         ancestors = tock[ kiter - 1.0 ]
#         kiter += 1
# np.concatenate( [ po.get( s, np.array( 0, np.float ) ) for s in po.keys( ) if s < 2.0 ] )
    return tick, tock

In [None]:
import scipy.io
## `name` must hold the path to the .mat file; it is not defined in this notebook.
data = scipy.io.loadmat( name )
## Read the boolean connectivity matrix and the associated data
A = sp.csr_matrix( data[ 'A' ], dtype = bool )