# Lecture 2: Numpy

![](../Images/l2_numpy.png) [numpy](http://www.numpy.org/) is cool for many reasons, but mostly because it is the Python module for handling big (or fairly big) arrays and applying non-trivial mathematical functions to them.

In [2]:
import numpy as np

## numpy arrays

In [4]:
cacca=np.array([[1, 2, 3], [4,5,6]])

In [5]:
cacca

array([[1, 2, 3],
       [4, 5, 6]])

#### .size

In [4]:
cacca.size

6

#### .shape

In [29]:
cacca.shape

(2, 3)
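The attributes above are related: `.size` is the product of the entries of `.shape`. A minimal sketch (the array `a` is just an illustrative copy of `cacca`) also showing `.ndim` and `.reshape()`, which rearranges the same elements into a new shape as long as the total number of elements does not change:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # same values as cacca above

print(a.ndim)   # number of axes: 2
print(a.shape)  # (2, 3)
print(a.size)   # 2 * 3 = 6

# reshape keeps the 6 elements but arranges them in 3 rows and 2 columns
b = a.reshape(3, 2)
print(b)
# [[1 2]
#  [3 4]
#  [5 6]]

# -1 lets numpy infer that dimension from the total size
flat = a.reshape(-1)
print(flat)  # [1 2 3 4 5 6]
```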

#### Accessing elements

In [30]:
cacca[0]

array([1, 2, 3])

In [31]:
cacca[0,1]

2

In [32]:
cacca[:,0]

array([1, 4])
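Slices work along every axis, with the same `start:stop:step` syntax as Python lists. A minimal sketch on a copy of `cacca` (named `a` here for illustration), which also shows that basic slices are *views* on the original data:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # same values as cacca above

second_row = a[1, :]     # [4 5 6]
last_col = a[:, -1]      # negative indices count from the end: [3 6]
tail_cols = a[:, 1:]     # columns from index 1 on: [[2 3] [5 6]]
reversed_rows = a[::-1]  # rows in reverse order: [[4 5 6] [1 2 3]]

# basic slices are views: writing into them changes the original array
first_row = a[0]
first_row[0] = 100
print(a[0, 0])           # 100
```

If an independent copy is needed, use `a[0].copy()` instead.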

#### Operations on arrays
Operations are performed element by element!

In [33]:
cacca *2.

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [34]:
cacca %2

array([[1, 0, 1],
       [0, 1, 0]])
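Element-wise operations also work between two arrays. When the shapes differ, numpy *broadcasts* the smaller one: for example, a 1D array of length 3 is applied to each row of a `(2, 3)` array. A minimal sketch, with illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # same values as cacca above

# same shape: element by element
same = a + a
print(same)       # [[ 2  4  6] [ 8 10 12]]

# shape (3,) is broadcast over both rows
row = np.array([10, 20, 30])
plus_row = a + row
print(plus_row)   # [[11 22 33] [14 25 36]]

# shape (2, 1) is broadcast over the three columns
col = np.array([[100], [200]])
times_col = a * col
print(times_col)  # [[ 100  200  300] [ 800 1000 1200]]
```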

#### Mask
You can also access the elements of a numpy array with a boolean numpy array of the same shape (a *mask*): only the elements where the mask is `True` are returned.

In [35]:
cacca %2==0

array([[False,  True, False],
       [ True, False,  True]])

In [36]:
cacca[cacca %2==0]

array([2, 4, 6])
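Masks can be combined with `&` (and), `|` (or) and `~` (not); the parentheses around each comparison are needed because these operators bind more tightly than `==` or `>`. A mask can also be used on the left-hand side of an assignment. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # same values as cacca above

even_and_big = (a % 2 == 0) & (a > 2)
print(a[even_and_big])   # [4 6]

odd_or_six = (a % 2 == 1) | (a == 6)
print(a[odd_or_six])     # [1 3 5 6]

# assignment through a mask: set every even element of a copy to 0
b = a.copy()
b[b % 2 == 0] = 0
print(b)                 # [[1 0 3] [0 5 0]]
```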

#### Data types and structured data types

In [37]:
np.array([[1, 2, 3], [4,5,6]], dtype=float)

array([[1., 2., 3.],
       [4., 5., 6.]])

In [38]:
np.array([[1., 'cacca', 42], [0,'bad',1.4]], dtype=object)

array([[1.0, 'cacca', 42],
       [0, 'bad', 1.4]], dtype=object)

In [39]:
np.array([[1., 'cacca', 42], [0,'bad',1.4]], dtype=float)

ValueError: could not convert string to float: 'cacca'
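An existing array can be converted to another dtype with `.astype()`, which returns a new array. Note that converting floats to integers truncates toward zero instead of rounding, and that strings are converted only if they actually look like numbers. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
f = a.astype(float)
print(f.dtype)      # float64

x = np.array([1.7, -1.7, 2.5])
truncated = x.astype(int)
print(truncated)    # [ 1 -1  2]: truncation toward zero, not rounding

# strings that look like numbers can be converted
s = np.array(['1.5', '42'])
parsed = s.astype(float)
print(parsed)       # [ 1.5 42. ]
```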

##### Structured data type

In [40]:
tmnt_list=['Donatello', 'Raffaello', 'Michelangelo', 'Leonardo']
tmnt_ages=[14, 15, 13, 16]

In [41]:
tmnt_np=np.array(list(zip(tmnt_ages,tmnt_list)), dtype=[('age', 'i8'), ('name','S20')])

In [42]:
tmnt_np

array([(14, b'Donatello'), (15, b'Raffaello'), (13, b'Michelangelo'),
       (16, b'Leonardo')], dtype=[('age', '<i8'), ('name', 'S20')])

In [43]:
tmnt_np['name']

array([b'Donatello', b'Raffaello', b'Michelangelo', b'Leonardo'],
      dtype='|S20')

In [44]:
tmnt_np[0]

(14, b'Donatello')
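Structured arrays can be sorted by a field with `np.sort(..., order=...)`, and each field behaves like an ordinary array in masks and arithmetic. A minimal sketch that rebuilds `tmnt_np` from the lists above:

```python
import numpy as np

tmnt_list = ['Donatello', 'Raffaello', 'Michelangelo', 'Leonardo']
tmnt_ages = [14, 15, 13, 16]
tmnt_np = np.array(list(zip(tmnt_ages, tmnt_list)),
                   dtype=[('age', 'i8'), ('name', 'S20')])

# sort the records by the 'age' field
by_age = np.sort(tmnt_np, order='age')
print(by_age['name'])  # [b'Michelangelo' b'Donatello' b'Raffaello' b'Leonardo']

# fields behave like ordinary arrays
mean_age = tmnt_np['age'].mean()
print(mean_age)        # 14.5
older = tmnt_np[tmnt_np['age'] > 14]['name']
print(older)           # [b'Raffaello' b'Leonardo']
```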

![](../Images/l2_ninja_turtle.jpg)

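###### Searching an array with a structured data type

Pay attention: the `name` field has dtype `S20`, so it stores *bytes*, not strings, and it must be compared with `b'Leonardo'`. Comparing it with the string `'Leonardo'` matches nothing and returns an empty array.

In [45]:
tmnt_np[tmnt_np['name']==b'Leonardo']

array([(16, b'Leonardo')], dtype=[('age', '<i8'), ('name', 'S20')])

In [46]:
tmnt_np[tmnt_np['name']==b'Leonardo']['age']

array([16])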

### Operations between arrays

In [47]:
cacca

array([[1, 2, 3],
       [4, 5, 6]])

#### Transpose

In [48]:
cacca.T

array([[1, 4],
       [2, 5],
       [3, 6]])

#### np.dot

In [27]:
np.dot(cacca, cacca.T)

array([[14, 32],
       [32, 77]])
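`np.dot` is the matrix (row-by-column) product; since Python 3.5 the same product can also be written with the `@` operator. Do not confuse it with `*` and `**`, which act element by element. A minimal sketch (`np.linalg.matrix_power` gives the true matrix power):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # same values as cacca above

gram = a @ a.T      # same as np.dot(a, a.T)
print(gram)         # [[14 32] [32 77]]
elem = a * a        # element-wise product
print(elem)         # [[ 1  4  9] [16 25 36]]

m = np.array([[1, 1],
              [0, 1]])
cube_elem = m ** 3                         # element-wise cube
cube_mat = np.linalg.matrix_power(m, 3)    # matrix cube m @ m @ m
print(cube_elem)    # [[1 1] [0 1]]
print(cube_mat)     # [[1 3] [0 1]]
```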

#### Reading/writing from/to file

In [50]:
adjacency_matrix=np.array([[0, 1, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 1],
       [0, 0, 1, 0]])

In [51]:
np.savetxt('something_new.txt',adjacency_matrix, fmt='%u', delimiter=',')

In [52]:
np.genfromtxt('something_new.txt', delimiter=',', dtype='i8')

array([[0, 1, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 1],
       [0, 0, 1, 0]])

There are actually several ways to read from and save to files in numpy:

| file type | save | read | pro | con |
|:---:|:---:|:---:|:---:|:---:|
| .txt, .csv | np.savetxt | np.genfromtxt, np.loadtxt | files can be read by any program (and by humans) | not memory efficient: numbers are stored as text, and the dtype is not saved |
| .npy, .npz | np.save, np.savez | np.load | memory efficient (binary format); dtype and shape are preserved | files are meant to be read only by numpy |
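A minimal sketch of the binary route from the table: `np.save` writes a single array to a `.npy` file, `np.savez` writes several arrays under keyword names to a `.npz` archive, and `np.load` gives back the same dtype and shape. The temporary directory and the file names are assumptions, used only to keep the example self-contained:

```python
import os
import tempfile

import numpy as np

adjacency_matrix = np.array([[0, 1, 0, 1],
                             [0, 0, 1, 1],
                             [1, 1, 0, 1],
                             [0, 0, 1, 0]])

with tempfile.TemporaryDirectory() as tmp:
    # single array -> .npy
    npy_path = os.path.join(tmp, 'adj.npy')
    np.save(npy_path, adjacency_matrix)
    loaded = np.load(npy_path)

    # several arrays, each stored under a keyword name -> .npz
    npz_path = os.path.join(tmp, 'graph.npz')
    np.savez(npz_path, adj=adjacency_matrix, degree=adjacency_matrix.sum(axis=1))
    with np.load(npz_path) as archive:
        degree = archive['degree']

print(loaded.dtype, loaded.shape)  # same dtype and shape as the original
print(degree)                      # [2 2 3 1]
```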


#### Note on Jupyter notebooks
Don't know what a function does, what its inputs and outputs are, or whether it has any crucial options? No worries! Press **Shift+Tab** with the cursor inside the parentheses!

In [None]:
np.genfromtxt()

### Interesting stuff and functions

#### np.zeros()

In [57]:
np.zeros(42)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0.])

In [55]:
np.zeros(42, dtype='int')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [56]:
np.zeros(42, dtype=str)

array(['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
       '', '', '', '', '', '', '', ''], dtype='<U1')

#### np.ones()

In [59]:
np.ones(42)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1.])

In [60]:
np.ones(42, dtype='int')

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [61]:
np.ones(42, dtype=str)

array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1',
       '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1',
       '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1',
       '1', '1', '1'], dtype='<U1')

#### np.arange()

In [62]:
np.arange(42)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41])

In [63]:
np.arange(4,42)

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
       21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
       38, 39, 40, 41])
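`np.arange` also accepts a step and, like `range`, it excludes the stop value. With non-integer steps it is often safer to use `np.linspace`, which takes the number of points instead of the step and includes both ends by default. A minimal sketch:

```python
import numpy as np

tens = np.arange(0, 42, 10)       # start, stop (excluded), step
countdown = np.arange(10, 0, -3)  # negative steps work too
grid = np.linspace(0, 1, 5)       # 5 evenly spaced points, both ends included

print(tens)       # [ 0 10 20 30 40]
print(countdown)  # [10  7  4  1]
print(grid)       # [0.   0.25 0.5  0.75 1.  ]
```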

#### np.unique()

In [64]:
cacca=np.array([1,2,4,1,2,4,12,42])

In [65]:
np.unique(cacca)

array([ 1,  2,  4, 12, 42])

In [66]:
np.unique(cacca, return_counts=True)

(array([ 1,  2,  4, 12, 42]), array([2, 2, 2, 1, 1]))
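For non-negative integers, such as node indices, `np.bincount` gives the counts directly: position `i` holds how many times `i` appears, including zeros for the values that never appear (which `np.unique` skips). `np.unique` can also return, with `return_inverse=True`, the position of each element in the array of unique values. A minimal sketch:

```python
import numpy as np

values = np.array([1, 2, 4, 1, 2, 4, 12, 42])  # same values as cacca above

counts = np.bincount(values)
print(len(counts))                # 43: one slot for each integer 0..42
print(counts[[1, 2, 4, 12, 42]])  # [2 2 2 1 1], as with np.unique
print(counts[0], counts[3])       # 0 0: values that never appear

# for each element, the index of its value in the array of unique values
uniq, inverse = np.unique(values, return_inverse=True)
print(inverse)                    # [0 1 2 0 1 2 3 4]
print(np.array_equal(uniq[inverse], values))  # True
```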

#### np.sum()

In [67]:
adjacency_matrix

array([[0, 1, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 1],
       [0, 0, 1, 0]])

In [68]:
np.sum(adjacency_matrix)

8

In [69]:
np.sum(adjacency_matrix, axis=0)

array([1, 2, 2, 3])

In [70]:
np.sum(adjacency_matrix, axis=1)

array([2, 2, 3, 1])
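If `adjacency_matrix` is read as a directed network with `A[i, j] = 1` when there is a link from `i` to `j` (an assumption about the orientation, which the matrix above does not state), the row sums are the out-degrees and the column sums are the in-degrees, and both must add up to the number of links. A minimal sketch:

```python
import numpy as np

adjacency_matrix = np.array([[0, 1, 0, 1],
                             [0, 0, 1, 1],
                             [1, 1, 0, 1],
                             [0, 0, 1, 0]])

k_out = adjacency_matrix.sum(axis=1)  # links leaving each node (row sums)
k_in = adjacency_matrix.sum(axis=0)   # links reaching each node (column sums)
print(k_out)  # [2 2 3 1]
print(k_in)   # [1 2 2 3]

# every link is counted once in k_out and once in k_in
print(k_out.sum() == k_in.sum() == adjacency_matrix.sum())  # True
```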

#### np.where()

In [71]:
np.where(cacca==1)

(array([0, 3]),)

In [72]:
np.where(adjacency_matrix==1)

(array([0, 0, 1, 1, 2, 2, 2, 3]), array([1, 3, 2, 3, 0, 1, 3, 2]))
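With three arguments, `np.where(condition, x, y)` does not return indices: it builds a new array taking elements from `x` where the condition is `True` and from `y` elsewhere. The one-argument form used above returns the same indices as `np.nonzero`. A minimal sketch:

```python
import numpy as np

values = np.array([1, 2, 4, 1, 2, 4, 12, 42])  # same values as cacca above

big_or_zero = np.where(values > 3, values, 0)
print(big_or_zero)  # [ 0  0  4  0  0  4 12 42]

parity = np.where(values % 2 == 0, 'even', 'odd')
print(parity)       # ['odd' 'even' 'even' 'odd' 'even' 'even' 'even' 'even']

# equivalent to np.where(values == 1)
ones = np.nonzero(values == 1)
print(ones)         # (array([0, 3]),)
```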

#### np.max

In [3]:
np.max([0,1,2,3,4])

4

#### np.min

In [4]:
np.min([0,1,2,3,4])

0
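`np.argmax` and `np.argmin` return the *position* of the maximum or minimum rather than its value, which is handy for finding, for example, the node with the largest degree. With `axis`, these functions work row-wise or column-wise, like `np.sum`. A minimal sketch with illustrative arrays:

```python
import numpy as np

degrees = np.array([2, 2, 3, 1])  # row sums of adjacency_matrix above

hub = np.argmax(degrees)
print(np.max(degrees), hub)                 # 3 2: node 2 has the largest degree
print(np.min(degrees), np.argmin(degrees))  # 1 3

m = np.array([[0, 7, 3],
              [5, 1, 9]])
row_argmax = np.argmax(m, axis=1)
col_max = np.max(m, axis=0)
print(row_argmax)  # [1 2]: column of the maximum in each row
print(col_max)     # [5 7 9]: maximum of each column
```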

### Exercise:
![](../Images/l2_zachary.jpg)
1. **load the file ./Data/zachary.txt**: it is an edge list of a monopartite undirected network.

>This is the well-known and much-used Zachary karate club network. The data was collected from the members of a university karate club by Wayne Zachary in 1977. Each node represents a member of the club, and each edge represents a tie between two members of the club. The network is undirected. An often discussed problem using this dataset is to find the two groups of people into which the karate club split after an argument between two teachers.


--[Konect website](http://konect.uni-koblenz.de/networks/)
<br/>**Pay attention!** These data were downloaded from the web and do not use Python (0-based) indexing: the first node here is 1, not 0! **Pay attention 2!** The first two lines contain other, uninteresting data about the network; you can safely skip them.
2. **calculate the degree sequence** 
3. **build the adjacency matrix**
4. **build the adjacency list**
5. **calculate the average nearest neighbours degree**

#### 1. Load the file

In [92]:
# skip the 2 header lines; subtracting 1 turns the 1-based labels into 0-based indices
z_ed=np.genfromtxt('./Data/zachary.txt', skip_header=2, dtype='i8')-1
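The same call can be tried without the file. A minimal sketch that feeds `np.genfromtxt` an in-memory text: the two header lines and the three edges are invented here to mimic the format described above, not taken from the real file. `skip_header=2` drops the header and subtracting 1 shifts the node labels to 0-based indices:

```python
import io

import numpy as np

# hypothetical file content: 2 header lines + an edge list with 1-based labels
text = """% sym unweighted
% 3 3 3
1 2
1 3
2 3
"""

edges = np.genfromtxt(io.StringIO(text), skip_header=2, dtype='i8') - 1
print(edges)
# [[0 1]
#  [0 2]
#  [1 2]]
```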

#### 2. The degree sequence

In [111]:
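# each undirected edge is listed once, so the degree of a node is
# the number of times it appears in the edge list
nodes, k=np.unique(z_ed, return_counts=True)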

In [112]:
k

array([16,  9, 10,  6,  3,  4,  4,  4,  5,  2,  3,  1,  2,  5,  2,  2,  2,
        2,  2,  3,  2,  2,  2,  5,  3,  3,  2,  4,  3,  4,  4,  6, 12, 17])

#### 3. The adjacency matrix

In [94]:
l_z=len(np.unique(z_ed))

In [107]:
adjm=np.zeros((l_z, l_z), dtype='i8')
for e in z_ed:
    # the network is undirected: set both (i, j) and (j, i)
    adjm[e[0],e[1]]=1
    adjm[e[1],e[0]]=1
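The loop can be replaced by *fancy indexing*: passing two integer arrays as indices sets all the `(row, column)` pairs at once. A minimal sketch on a small, made-up, 0-based undirected edge list (the names `edges` and `adj` are illustrative):

```python
import numpy as np

# hypothetical 0-based undirected edge list: triangle 0-1-2 plus the link 2-3
edges = np.array([[0, 1], [0, 2], [1, 2], [2, 3]])
n = edges.max() + 1

adj = np.zeros((n, n), dtype='i8')
adj[edges[:, 0], edges[:, 1]] = 1  # one direction of every edge
adj[edges[:, 1], edges[:, 0]] = 1  # and the other, since the network is undirected

print(adj)
# [[0 1 1 0]
#  [1 0 1 0]
#  [1 1 0 1]
#  [0 0 1 0]]
print((adj == adj.T).all())  # True: the matrix is symmetric
```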

#### 4. The adjacency list

In [108]:
l_z=len(np.unique(z_ed))

In [109]:
adjl=np.zeros(l_z, dtype=object)  # one slot per node, each holding an array of neighbours
for i in range(l_z):
    adjl[i]=np.where(adjm[i]==1)[0]  # columns with a 1 in row i

In [110]:
adjl

array([array([ 1,  2,  3,  4,  5,  6,  7,  8, 10, 11, 12, 13, 17, 19, 21, 31]),
       array([ 0,  2,  3,  7, 13, 17, 19, 21, 30]),
       array([ 0,  1,  3,  7,  8,  9, 13, 27, 28, 32]),
       array([ 0,  1,  2,  7, 12, 13]), array([ 0,  6, 10]),
       array([ 0,  6, 10, 16]), array([ 0,  4,  5, 16]),
       array([0, 1, 2, 3]), array([ 0,  2, 30, 32, 33]), array([ 2, 33]),
       array([0, 4, 5]), array([0]), array([0, 3]),
       array([ 0,  1,  2,  3, 33]), array([32, 33]), array([32, 33]),
       array([5, 6]), array([0, 1]), array([32, 33]), array([ 0,  1, 33]),
       array([32, 33]), array([0, 1]), array([32, 33]),
       array([25, 27, 29, 32, 33]), array([25, 27, 31]),
       array([23, 24, 31]), array([29, 33]), array([ 2, 23, 24, 33]),
       array([ 2, 31, 33]), array([23, 26, 32, 33]),
       array([ 1,  8, 32, 33]), array([ 0, 24, 25, 28, 32, 33]),
       array([ 2,  8, 14, 15, 18, 20, 22, 23, 29, 30, 31, 33]),
       array([ 8,  9, 13, 14, 15, 18, 19, 20, 22, 23, 26, 

#### 5. The Average Nearest Neighbour Degree

In [138]:
k_nn=np.zeros(l_z)
for i_adjl, aa in enumerate(adjl):
    # average degree of the neighbours aa of node i_adjl
    k_nn[i_adjl]=np.sum(k[aa])/k[i_adjl]

In [139]:
k_nn

array([ 4.3125    ,  5.77777778,  6.6       ,  7.66666667,  7.66666667,
        6.25      ,  6.25      , 10.25      , 11.8       , 13.5       ,
        7.66666667, 16.        , 11.        , 11.6       , 14.5       ,
       14.5       ,  4.        , 12.5       , 14.5       , 14.        ,
       14.5       , 12.5       , 14.5       ,  8.        ,  4.33333333,
        4.66666667, 10.5       ,  8.75      , 11.        ,  9.        ,
       10.75      ,  9.        ,  5.08333333,  3.82352941])
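Since `adjm[i, j]` is 1 exactly for the neighbours `j` of `i`, the sum of the neighbours' degrees is the matrix-vector product `adjm @ k`, so the loop collapses into one line (assuming no isolated nodes, i.e. no zero degrees). A minimal sketch on a small made-up graph, comparing it with the loop version; `np.allclose` is used because the two computations need not round identically in general:

```python
import numpy as np

# hypothetical undirected graph: triangle 0-1-2 plus the link 2-3
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])

k = adj.sum(axis=1)  # degrees: [2 2 3 1]

# loop version, as in the cell above, with an adjacency list
adjl = [np.nonzero(row)[0] for row in adj]
k_nn_loop = np.array([k[nbrs].sum() / k[i] for i, nbrs in enumerate(adjl)])

# vectorized version: (adj @ k)[i] is the sum of the degrees of i's neighbours
k_nn_vec = adj @ k / k

print(k_nn_vec)                          # approximately [2.5, 2.5, 1.667, 3.0]
print(np.allclose(k_nn_loop, k_nn_vec))  # True
```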

#### 5.bis Calculate the $k^{nn}$ from the edge list

In [158]:
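k_nn_el=np.zeros(np.max(z_ed)+1)
for n in range(np.max(z_ed)+1):
    # rows and columns of the edge list where node n appears
    cacca=np.where(z_ed==n)
    for c in range(len(cacca[0])):
        x=cacca[0][c]
        # the neighbour sits in the other column: 0 -> 1, 1 -> 0
        y=int(not cacca[1][c])
        who=z_ed[x][y]
        k_nn_el[n]+=k[who]
    k_nn_el[n]/=k[n]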

In [159]:
k_nn_el

array([ 4.3125    ,  5.77777778,  6.6       ,  7.66666667,  7.66666667,
        6.25      ,  6.25      , 10.25      , 11.8       , 13.5       ,
        7.66666667, 16.        , 11.        , 11.6       , 14.5       ,
       14.5       ,  4.        , 12.5       , 14.5       , 14.        ,
       14.5       , 12.5       , 14.5       ,  8.        ,  4.33333333,
        4.66666667, 10.5       ,  8.75      , 11.        ,  9.        ,
       10.75      ,  9.        ,  5.08333333,  3.82352941])

In [162]:
np.all(k_nn==k_nn_el)

True

In [163]:
def k_nn_edgelist():
    k_nn_el=np.zeros(np.max(z_ed)+1)
    for n in range(np.max(z_ed)+1):
        cacca=np.where(z_ed==n)
        for c in range(len(cacca[0])):
            x=cacca[0][c]
            y=int(not cacca[1][c])
            who=z_ed[x][y]
            k_nn_el[n]+=k[who]
        k_nn_el[n]/=k[n]
    return k_nn_el

In [164]:
%timeit k_nn_edgelist()

301 µs ± 3.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [165]:
def k_nn_adj():
    k_nn=np.zeros(l_z)
    for i_adjl, aa in enumerate(adjl):
        k_nn[i_adjl]=np.sum(k[aa])/k[i_adjl]
    return k_nn

In [166]:
%timeit k_nn_adj()

136 µs ± 412 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


### Other interesting functions of numpy

#### np.diag()

In [116]:
np.diag(adjacency_matrix)

array([0, 0, 0, 0])

`np.diag()` extracts the diagonal of a square matrix or

In [64]:
np.diag([1,2,3,4])

array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

builds a diagonal matrix from a 1D array passed as argument.
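Combined with the matrix product, `np.diag` gives quick checks on a network. For a symmetric 0/1 adjacency matrix without self-loops, `(A @ A)[i, i]` counts the walks of length 2 from `i` back to itself, which is exactly the degree of `i`; `np.trace` sums the diagonal, and the trace of `A @ A @ A` counts each triangle 6 times. A minimal sketch on a made-up graph:

```python
import numpy as np

# hypothetical undirected graph: triangle 0-1-2 plus the link 2-3
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])

degrees_from_diag = np.diag(adj @ adj)
print(degrees_from_diag)  # [2 2 3 1]: the degrees
print(adj.sum(axis=1))    # [2 2 3 1]: same thing, as row sums

# each triangle is counted 6 times (3 starting nodes x 2 directions)
n_triangles = np.trace(adj @ adj @ adj) // 6
print(n_triangles)        # 1
```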

#### np.vstack()

In [203]:
np.vstack((np.arange(4), np.arange(4,8)))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [202]:
np.vstack((np.arange(4), np.arange(4,8))).T

array([[0, 4],
       [1, 5],
       [2, 6],
       [3, 7]])

It stacks the arrays one on top of the other, as the rows of a new 2D array; `.T` then turns them into columns.
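For 1D arrays, `np.column_stack` builds the two-column version directly (the same result as `np.vstack(...).T`), while `np.hstack` joins the arrays end to end. A minimal sketch:

```python
import numpy as np

a = np.arange(4)
b = np.arange(4, 8)

cols = np.column_stack((a, b))
print(cols)  # [[0 4] [1 5] [2 6] [3 7]]
print(np.array_equal(cols, np.vstack((a, b)).T))  # True

joined = np.hstack((a, b))
print(joined)  # [0 1 2 3 4 5 6 7]
```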

#### np.isin()

In [66]:
cacca=np.array([0,1,3,4,5,42])

In [67]:
np.isin(cacca, np.array([0,42,3]))

array([ True, False,  True, False, False,  True])

In [68]:
cacca[np.isin(cacca, np.array([0,42,3]))]

array([ 0,  3, 42])

In [69]:
np.isin(np.array([0,42,3]),cacca)

array([ True,  True,  True])
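`np.isin` accepts `invert=True` to get the complementary mask, i.e. the elements that are *not* in the second array. Summing the mask counts the matches, which is one way to see how many neighbours two nodes share. A minimal sketch; the two neighbour lists are made up for illustration:

```python
import numpy as np

values = np.array([0, 1, 3, 4, 5, 42])  # same values as cacca above
wanted = np.array([0, 42, 3])

outside = values[np.isin(values, wanted, invert=True)]
print(outside)  # [1 4 5]

# hypothetical neighbour lists of two nodes: how many neighbours do they share?
nbrs_a = np.array([1, 2, 3, 7])
nbrs_b = np.array([0, 2, 3, 9])
shared = np.isin(nbrs_a, nbrs_b).sum()
print(shared)   # 2 (nodes 2 and 3)
```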

### Exercise: 
6. **calculate the clustering coefficient**
6.bis **describe qualitatively how the clustering depends on the degree**
7. **is the network qualitatively assortative or disassortative?**

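#### 6. The clustering coefficient

Pay attention: `**` acts element by element, so on a matrix of 0s and 1s `adjm**3` is just `adjm` again, as the following cell shows. The matrix product needed to count triangles requires `np.dot` (or the `@` operator).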

In [171]:
np.all(adjm==adjm**3)

True

In [178]:
three_cycles=np.diag(np.dot(np.dot(adjm,adjm),adjm))/2
couple_of_k=k*(k-1)/2
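The cell above computes, for each node $i$, the two ingredients of the local clustering coefficient. With $A$ the adjacency matrix and $k_i$ the degree, $(A^3)_{ii}$ counts the closed walks of length 3 starting at $i$; every triangle through $i$ is walked twice (once per direction), so

$$
T_i = \frac{(A^3)_{ii}}{2}, \qquad
c_i = \frac{T_i}{k_i(k_i-1)/2} = \frac{(A^3)_{ii}}{k_i(k_i-1)},
$$

where `three_cycles` holds $T_i$ and `couple_of_k` holds $k_i(k_i-1)/2$, the number of pairs of neighbours of $i$. The ratio is undefined when $k_i < 2$.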

In [179]:
couple_of_k

array([120.,  36.,  45.,  15.,   3.,   6.,   6.,   6.,  10.,   1.,   3.,
         0.,   1.,  10.,   1.,   1.,   1.,   1.,   1.,   3.,   1.,   1.,
         1.,  10.,   3.,   3.,   1.,   6.,   3.,   6.,   6.,  15.,  66.,
       136.])

There is a zero: node 11 has degree 1, so it has no pairs of neighbours. How does numpy handle the division?

In [183]:
three_cycles/couple_of_k

array([0.15      , 0.33333333, 0.24444444, 0.66666667, 0.66666667,
       0.5       , 0.5       , 1.        , 0.5       , 0.        ,
       0.66666667,        nan, 1.        , 0.6       , 1.        ,
       1.        , 1.        , 1.        , 1.        , 0.33333333,
       1.        , 1.        , 1.        , 0.4       , 0.33333333,
       0.33333333, 1.        , 0.16666667, 0.33333333, 0.66666667,
       0.5       , 0.2       , 0.1969697 , 0.11029412])

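NaN! Unlike plain Python, where `0/0` raises a `ZeroDivisionError`, numpy returns `nan` (not a number) and only emits a `RuntimeWarning`. A simple fix is to compute the clustering only for nodes with at least two neighbours and leave it at 0 for the others: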

In [184]:
clus=np.zeros(len(k))
for i in range(len(k)):
    # nodes with fewer than 2 neighbours have no pairs: leave their clustering at 0
    if couple_of_k[i]>0:
        clus[i]=three_cycles[i]/couple_of_k[i]

In [185]:
clus

array([0.15      , 0.33333333, 0.24444444, 0.66666667, 0.66666667,
       0.5       , 0.5       , 1.        , 0.5       , 0.        ,
       0.66666667, 0.        , 1.        , 0.6       , 1.        ,
       1.        , 1.        , 1.        , 1.        , 0.33333333,
       1.        , 1.        , 1.        , 0.4       , 0.33333333,
       0.33333333, 1.        , 0.16666667, 0.33333333, 0.66666667,
       0.5       , 0.2       , 0.1969697 , 0.11029412])
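The loop can also be avoided: `np.divide` with the `out` and `where` arguments divides only where the condition holds and keeps the pre-filled value of `out` (here 0, the same convention as the loop) elsewhere, without any warning. A minimal sketch with made-up numbers, not the karate club values:

```python
import numpy as np

# hypothetical triangle counts and degrees (the node with k = 1 has no pairs)
three_cycles = np.array([1., 0., 3., 0.])
k = np.array([3, 1, 3, 2])
couple_of_k = k * (k - 1) / 2   # [3. 0. 3. 1.]

clus = np.divide(three_cycles, couple_of_k,
                 out=np.zeros_like(three_cycles),  # value kept where we do not divide
                 where=couple_of_k > 0)
print(clus)          # approximately [0.333, 0.0, 1.0, 0.0]
print(clus.mean())   # average clustering, same as np.sum(clus)/len(k)
```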

In [186]:
np.sum(clus)/len(k)

0.5706384782076823

#### 6.bis What is qualitatively the relation between the degree and the clustering?

In [204]:
np.vstack((k, clus)).T

array([[16.        ,  0.15      ],
       [ 9.        ,  0.33333333],
       [10.        ,  0.24444444],
       [ 6.        ,  0.66666667],
       [ 3.        ,  0.66666667],
       [ 4.        ,  0.5       ],
       [ 4.        ,  0.5       ],
       [ 4.        ,  1.        ],
       [ 5.        ,  0.5       ],
       [ 2.        ,  0.        ],
       [ 3.        ,  0.66666667],
       [ 1.        ,  0.        ],
       [ 2.        ,  1.        ],
       [ 5.        ,  0.6       ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 3.        ,  0.33333333],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 5.        ,  0.4       ],
       [ 3.        ,  0.33333333],
       [ 3.        ,  0.33333333],
       [ 2.        ,  1.        ],
       [ 4.        ,  0.16666667],
       [ 3.        ,

In [207]:
np.vstack((k[np.argsort(k)],clus[np.argsort(k)])).T

array([[ 1.        ,  0.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  1.        ],
       [ 2.        ,  0.        ],
       [ 3.        ,  0.33333333],
       [ 3.        ,  0.66666667],
       [ 3.        ,  0.66666667],
       [ 3.        ,  0.33333333],
       [ 3.        ,  0.33333333],
       [ 3.        ,  0.33333333],
       [ 4.        ,  1.        ],
       [ 4.        ,  0.5       ],
       [ 4.        ,  0.66666667],
       [ 4.        ,  0.5       ],
       [ 4.        ,  0.5       ],
       [ 4.        ,  0.16666667],
       [ 5.        ,  0.5       ],
       [ 5.        ,  0.6       ],
       [ 5.        ,  0.4       ],
       [ 6.        ,  0.66666667],
       [ 6.        ,

#### 7. Assortativity

In [208]:
np.vstack((k[np.argsort(k)],k_nn[np.argsort(k)])).T

array([[ 1.        , 16.        ],
       [ 2.        ,  4.        ],
       [ 2.        , 10.5       ],
       [ 2.        , 14.5       ],
       [ 2.        , 12.5       ],
       [ 2.        , 14.5       ],
       [ 2.        , 14.5       ],
       [ 2.        , 12.5       ],
       [ 2.        , 14.5       ],
       [ 2.        , 11.        ],
       [ 2.        , 14.5       ],
       [ 2.        , 13.5       ],
       [ 3.        , 14.        ],
       [ 3.        ,  7.66666667],
       [ 3.        ,  7.66666667],
       [ 3.        , 11.        ],
       [ 3.        ,  4.33333333],
       [ 3.        ,  4.66666667],
       [ 4.        , 10.25      ],
       [ 4.        , 10.75      ],
       [ 4.        ,  9.        ],
       [ 4.        ,  6.25      ],
       [ 4.        ,  6.25      ],
       [ 4.        ,  8.75      ],
       [ 5.        , 11.8       ],
       [ 5.        , 11.6       ],
       [ 5.        ,  8.        ],
       [ 6.        ,  7.66666667],
       [ 6.        ,
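A single number can complement the qualitative look at the table: the degree assortativity coefficient, i.e. the Pearson correlation between the degrees at the two ends of each edge, is negative when hubs tend to link to low-degree nodes (disassortative) and positive in the opposite case. A minimal sketch, assuming a 0-based undirected edge list with each edge listed once; `degree_assortativity` is a hypothetical helper written here for illustration, tested on a star graph, which is perfectly disassortative:

```python
import numpy as np


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    `edges` is a 0-based undirected edge list of shape (m, 2), each edge once.
    """
    k = np.bincount(edges.ravel())  # degree of every node
    u, v = edges[:, 0], edges[:, 1]
    # count every edge in both directions so the measure is symmetric
    x = np.concatenate((k[u], k[v]))
    y = np.concatenate((k[v], k[u]))
    return np.corrcoef(x, y)[0, 1]


# hypothetical star graph: node 0 linked to 1, 2, 3 (the hub only touches leaves)
star = np.array([[0, 1], [0, 2], [0, 3]])
r = degree_assortativity(star)
print(r)  # -1.0 up to rounding: perfectly disassortative
```

On the karate club the same helper could be called as `degree_assortativity(z_ed)`.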

### Exercise:
1. **load the file ../Data/highschool.txt**: it is an edge list of a monopartite directed network.
>This directed network contains friendships between boys in a small highschool in Illinois. Each boy was asked once in the fall of 1957 and the spring of 1958. This dataset aggregates the results from both dates. A node represents a boy and an edge between two boys shows that the left boy chose the right boy as a friend.

--[Konect website](http://konect.uni-koblenz.de/networks/)
<br/>**Pay attention!** These data were downloaded from the web and do not use Python (0-based) indexing: the first node here is 1, not 0! **Pay attention 2!** The first two lines contain other, uninteresting data about the network; you can safely skip them.
2. **calculate the in- and out-degree sequences** 
3. **build the adjacency matrix**
4. **calculate the in- and out-nearest neighbours degrees**
5. **calculate the in- and out-clustering coefficients**