Version: 6 May 2019
## 39th Methods Seminar (Methodenseminar)
# Big Data Module II: Introduction to Social Network Science with Python
## Pre-Course Preparation
Congratulations, you have successfully started Anaconda and opened this Jupyter Notebook. You're now ready to do analyses with a large number of pre-installed packages.
### Getting Acquainted with Notebooks
The cell you're currently looking at is a "Markdown" cell. It is used to write text. Double-click on this cell to see the Markdown code. You will see that one or more hash signs are used to create headings.

You can display maths $a^2+b^2=c^2$ in text, equations

\begin{equation}
p(x)=Cx^{-\alpha},
\end{equation}

tables

| This | is |
|-|-|
| a | table|

and images.

<img src="images/jupyter.jpg" width="963" height="292"/>

To switch back from code to display, "run" the cell, either by clicking the "Run" button above or by pressing "Shift-Enter".

There are many useful keyboard shortcuts. Press "h" to see a list of them.

New cells are created by clicking the "+" button. By default, a new cell is a "Code" cell.

In [1]:
'Hello Anaconda'

'Hello Anaconda'

In [2]:
s = 'Hello Anaconda'

In [3]:
s

'Hello Anaconda'

In [4]:
print(s)

Hello Anaconda


### Importing Packages
The Anaconda distribution of Python is handy because it comes with many pre-installed code packages. Such modules or libraries are loaded with the ``import`` statement:

In [5]:
import math

In [6]:
math.log10(10)

1.0

Sometimes you will want to use a short name for a package:

In [7]:
import math as mt

In [8]:
mt.log10(10)

1.0

Note that you have to type the module name (``math`` or ``mt``) before each function call. You can also import a specific function from a module; the module prefix is then no longer needed:

In [9]:
from math import log10

In [10]:
log10(10)

1.0
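
As a small sketch of the point above: the three import styles only bind different names to the same function object, so the calls are interchangeable. The variable name `same_function` is purely illustrative.

```python
import math
import math as mt
from math import log10

# All three names refer to the same function object in the math module.
same_function = (math.log10 is mt.log10) and (mt.log10 is log10)
print(same_function)

# The calls therefore return identical results.
print(math.log10(1000), mt.log10(1000), log10(1000))
```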

Now that you know the basics of importing, make sure that the pre-installed packages work. NumPy, Pandas, and NetworkX are the main ones we will be using in the course.

#### NumPy
NumPy is the fundamental package for scientific computing with Python. Information and tutorials [here](http://www.numpy.org/).

In [11]:
import numpy as np

An example command:

In [12]:
a = [1, 2, 3, np.nan, 5, np.nan, np.nan, 8]
a

[1, 2, 3, nan, 5, nan, nan, 8]

In [13]:
np.mean(a)

nan
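
The result is `nan` because arithmetic involving `np.nan` propagates the missing value. If you want NumPy to skip missing entries, one option is `np.nanmean`. The following sketch reuses the list from above; the variable names are illustrative.

```python
import numpy as np

a = [1, 2, 3, np.nan, 5, np.nan, np.nan, 8]

# np.mean propagates missing values, so the result is nan.
plain_mean = np.mean(a)

# np.nanmean ignores nan entries and averages the remaining five numbers.
nan_aware_mean = np.nanmean(a)

# np.isnan flags the missing entries; summing the flags counts them.
n_missing = int(np.isnan(a).sum())

print(plain_mean, nan_aware_mean, n_missing)
```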

#### Pandas
Pandas provides data structures and data analysis tools. Information and tutorials [here](http://pandas.pydata.org/).

In [14]:
import pandas as pd

In [15]:
df = pd.Series([1, 2, 3, np.nan, 5, np.nan, np.nan, 8])
df

0    1.0
1    2.0
2    3.0
3    NaN
4    5.0
5    NaN
6    NaN
7    8.0
dtype: float64

In [16]:
np.mean(df)

3.8
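
The mean is 3.8 here rather than `nan` because Pandas reductions skip missing values by default. A brief sketch of how to make that behaviour explicit, using the same data (the Series is named `series` here for illustration):

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 2, 3, np.nan, 5, np.nan, np.nan, 8])

# Default behaviour: missing values are skipped.
mean_skipping_nan = series.mean()

# skipna=False propagates missing values, like np.mean on a plain list.
mean_with_nan = series.mean(skipna=False)

# count() returns the number of non-missing entries; isna() flags the missing ones.
n_valid = series.count()
n_missing = series.isna().sum()

print(mean_skipping_nan, mean_with_nan, n_valid, n_missing)
```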

#### NetworkX
NetworkX is a package for the creation, manipulation, and study of networks. Information and tutorials [here](https://networkx.github.io/).

In [17]:
import networkx as nx

In [18]:
g = nx.Graph()
g.add_node(3)
g.add_edge(1, 2)
print('nodes:', list(g.nodes()))
print('edges:', list(g.edges()))

nodes: [3, 1, 2]
edges: [(1, 2)]
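
Note that `add_edge` creates nodes 1 and 2 automatically, while node 3 stays isolated. The sketch below rebuilds the same graph and runs a few basic queries on it; the variable names are illustrative.

```python
import networkx as nx

g = nx.Graph()
g.add_node(3)      # an isolated node
g.add_edge(1, 2)   # adds nodes 1 and 2 implicitly, plus the edge between them

# Basic size of the network.
n_nodes = g.number_of_nodes()
n_edges = g.number_of_edges()

# Degree of each node as a plain dictionary.
degrees = dict(g.degree())

# Connected components as a list of node sets.
components = [set(c) for c in nx.connected_components(g)]

print(n_nodes, n_edges, degrees, components)
```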


### Installing Packages
Even though Anaconda comes with many pre-installed packages, some are missing. The best way to install packages is to call ``conda install package-name`` in the terminal following [this user guide](https://docs.anaconda.com/anaconda/user-guide/tasks/install-packages/). You can open a terminal in the home screen as shown in the above image. However, not every package is available through ``conda``. Many of those can be installed with ``pip install package-name`` instead. ``conda`` and ``pip`` differ in important ways, particularly in how they handle environments and dependencies. Learn more about this [here](https://www.anaconda.com/understanding-conda-and-pip/), if interested.
#### Powerlaw
Powerlaw is a package for the estimation of distribution parameters. Information and tutorials [here](https://github.com/jeffalstott/powerlaw). We'll be using this package in the course. Since it can't be installed using ``conda``, execute ``pip install powerlaw`` in the terminal. Alternatively, you can install packages from within a notebook by prefixing the command with an exclamation mark. If you want to do that, uncomment the following cell (remove the hash sign):

In [19]:
#!pip install powerlaw
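
To check whether an installation succeeded without triggering an import error, one option is the standard-library function `importlib.util.find_spec`. In the sketch below, the helper name `is_installed` and the list of package names are illustrative choices.

```python
from importlib.util import find_spec

def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return find_spec(module_name) is not None

# Report the status of the packages used in this notebook.
for name in ['numpy', 'pandas', 'networkx', 'powerlaw']:
    status = 'installed' if is_installed(name) else 'missing'
    print(name, status)
```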

#### Graph-tool
Graph-tool is a powerful module for the manipulation, statistical analysis, and visualization of networks. Information and documentation [here](https://graph-tool.skewed.de/). Graph-tool can't be installed using ``conda`` or ``pip``, but must be installed using Docker, package managers, or manual compilation, following the instructions [here](https://git.skewed.de/count0/graph-tool/wikis/installation-instructions).

In [20]:
from graph_tool.all import *

ModuleNotFoundError: No module named 'graph_tool'

In [None]:
h = Graph()
h.add_vertex(3)   # adds three vertices (indices 0, 1, 2), not a vertex labelled 3
h.add_edge(0, 1)  # connects vertex 0 and vertex 1
print('nodes:', list(h.vertex_index))
print('edges:', list(h.edge_index))
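
Because the Graph-tool import can fail, a guarded import is one way to keep a notebook running. This sketch only sets a flag that later cells could check; the name `HAVE_GRAPH_TOOL` is hypothetical and not part of the course material.

```python
# Try the import and remember whether it worked instead of stopping with an error.
try:
    import graph_tool.all as gt
    HAVE_GRAPH_TOOL = True
except ImportError:
    # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
    HAVE_GRAPH_TOOL = False

if HAVE_GRAPH_TOOL:
    print('Graph-tool is available.')
else:
    print('Graph-tool is not installed; consider using GESIS Notebooks.')
```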

### GESIS Notebooks
If you don't succeed in installing Graph-tool, you can work with [GESIS Notebooks](https://notebooks.gesis.org/), a virtual server with Anaconda and all necessary packages already installed. To do so, go to [github.com/gesiscss/methods_seminar_2019](https://github.com/gesiscss/methods_seminar_2019) and click on the "launch binder" button.

<img src="images/github.jpg" width="689" height="482"/>

<font color='red'>Note that GESIS Notebooks are only provided to users temporarily. While notebooks and figures can be saved, the virtual server will shut down after a while and all changes will be lost. Therefore, make sure to save your files frequently.</font>

### Practicing Python
It is highly recommended that you have some experience with Python when entering the course. If you don't have that yet, you can study a few sections of the *Python Data Science Handbook* by Jake VanderPlas, either by

- visiting the book's [website](https://jakevdp.github.io/PythonDataScienceHandbook/), clicking on the sections you want to study, and typing the code in a blank local notebook;
- visiting the book's [code repository](https://github.com/jakevdp/PythonDataScienceHandbook), cloning or downloading the notebooks, and opening them locally; or
- visiting the book's [code repository](https://github.com/jakevdp/PythonDataScienceHandbook) and executing the notebooks on a virtual server ("launch binder").

We propose that you practice Python by studying sections [1](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html), [2](https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html), [3](https://jakevdp.github.io/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html) on NumPy, sections [1](https://jakevdp.github.io/PythonDataScienceHandbook/03.01-introducing-pandas-objects.html), [2](https://jakevdp.github.io/PythonDataScienceHandbook/03.02-data-indexing-and-selection.html), [3](https://jakevdp.github.io/PythonDataScienceHandbook/03.03-operations-in-pandas.html) on Pandas, and sections [1](https://jakevdp.github.io/PythonDataScienceHandbook/04.01-simple-line-plots.html), [2](https://jakevdp.github.io/PythonDataScienceHandbook/04.02-simple-scatter-plots.html) on the visualization package Matplotlib.

### Recommended Textbooks
##### Networks
Newman, Mark (2018). *Networks: An Introduction*. Oxford University Press, 2nd edition. Rather complete and technical overview of Network Science.

Barabási, Albert-László (2016). *Network Science*. Cambridge University Press. Less broad but more readable account of Network Science than Newman's. Can be read [online](http://networksciencebook.com/). Data is [available](http://networksciencebook.com/translations/en/resources/data.html).

Zweig, Katharina A. (2016). *Network Analysis Literacy: A Practical Approach to the Analysis of Networks*. Springer. ...

Wasserman, Stanley & Katherine Faust (1995). *Social Network Analysis: Methods and Applications*. Cambridge University Press. Dated but still a standard on early work in Social Network Analysis.

##### Networks and Python
Caldarelli, Guido & Alessandro Chessa (2016). *Data Science & Complex Networks*. Oxford University Press. From the physics perspective. Notebooks are [online](https://github.com/datascienceandcomplexnetworks/book_code).

