Version: 27 July 2020
### 40. Methodenseminar (Online-Seminar)
## Methods of Computational Social Science
# Introduction to Social Network Science with Python

### Pre-Course Preparation
Congratulations, you have successfully installed Anaconda and opened this notebook. You're now ready to do data analyses with a large number of pre-installed packages.
#### Getting Acquainted with Jupyter Notebooks
Consult [this introduction](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html) to learn what Jupyter Notebooks are.

The text you're currently reading is written in a "Markdown" cell. Double-click on this cell to see the Markdown code. You will see that one or more hash signs (`#`) are used to create headings.

In markdown cells, you can display maths,

 $a^2+b^2=c^2$

equations,

\begin{equation}
p(x)=Cx^{-\alpha},
\end{equation}

tables,

| This | is |
|-|-|
| a | table|

and images.

<img src='images/jupyter.png'/>

To switch from Markdown code to the rendered display, "run" the cell, either by clicking the "Run" button above or by pressing "Shift-Enter".

There are many useful keyboard shortcuts. Press "h" in command mode (press "Esc" first if you are editing a cell) to see a list of shortcuts.

Say hello to Anaconda:

In [1]:
'Hello Anaconda'

'Hello Anaconda'

In [2]:
x = 'Hello Anaconda'

In [3]:
x

'Hello Anaconda'

In [4]:
print(x)

Hello Anaconda


#### Importing Packages
The Anaconda distribution of Python is handy because it comes with many pre-installed code packages. A package (also called a module or library) is loaded with the ``import`` statement:

In [5]:
import math

Now that the package is loaded, you can call one of its functions, e.g., the ``log10()`` function:

In [6]:
math.log10(10)

1.0

Sometimes you will want to use a short name for a package:

In [7]:
import math as mt

In [8]:
mt.log10(10)

1.0

Note that you have to type the package name (``math`` or ``mt``) before each function call. You can also import a specific function from a package. Then the package prefix is not necessary:

In [9]:
from math import log10

In [10]:
log10(10)

1.0
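
As a quick check, the three import styles above all give access to the same function object. The following is a minimal sketch that only uses the standard-library ``math`` module; the variable name ``same_function`` is chosen for illustration.

```python
import math
import math as mt
from math import log10

# All three names refer to the same function object.
same_function = (math.log10 is mt.log10) and (mt.log10 is log10)
print(same_function)

# Several names can be imported from a package in one statement.
from math import sqrt, pi
print(sqrt(16))
```

Importing single functions keeps calls short, but the package prefix makes it easier to see where a function comes from when several packages are in use.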

Now that you know the basics of importing, make sure that the pre-installed packages work. NumPy, Pandas, and NetworkX are the main ones we will be using in the course. Check them out.

##### NumPy
NumPy is the fundamental package for handling vectors, matrices, and tensors. Information and tutorials [here](http://www.numpy.org/).

In [11]:
import numpy as np

An example command:

In [12]:
a = [1, 2, 3, np.nan, 5, np.nan, np.nan, 8]
a

[1, 2, 3, nan, 5, nan, nan, 8]

In [13]:
np.mean(a)

nan

In [14]:
np.nanmean(a)

3.8
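
To see what ``np.nanmean()`` does with the missing values, the following sketch reproduces its result by hand, assuming the same list ``a`` as above. The names ``arr``, ``missing``, and ``valid`` are illustrative.

```python
import numpy as np

a = [1, 2, 3, np.nan, 5, np.nan, np.nan, 8]
arr = np.array(a)               # convert the list to a NumPy array

missing = np.isnan(arr)         # Boolean mask: True where the value is NaN
n_missing = int(missing.sum())  # number of missing values
valid = arr[~missing]           # keep only the non-missing values

print(n_missing)
print(valid.mean())
# The mean of the non-missing values agrees with np.nanmean().
print(np.isclose(valid.mean(), np.nanmean(arr)))
```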

##### Pandas
Pandas provides data structures and data analysis tools (it is the closest Python gets to Excel ;)). Information and tutorials [here](http://pandas.pydata.org/).

In [15]:
import pandas as pd

In [16]:
s = pd.Series(a)
s

0    1.0
1    2.0
2    3.0
3    NaN
4    5.0
5    NaN
6    NaN
7    8.0
dtype: float64

In [17]:
s.mean()

3.8
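
Unlike ``np.mean()``, the Pandas ``mean()`` method skips missing values by default. The sketch below makes this explicit with the ``skipna`` parameter and shows two related methods, ``isna()`` and ``dropna()``. The Series is rebuilt here so that the cell runs on its own; the names ``mean_skip``, ``mean_strict``, and ``s_clean`` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5, np.nan, np.nan, 8])

n_missing = int(s.isna().sum())     # count the missing entries
mean_skip = s.mean()                # NaN values are skipped by default
mean_strict = s.mean(skipna=False)  # NaN propagates, as with np.mean()
s_clean = s.dropna()                # new Series without the NaN entries

print(n_missing, mean_skip, mean_strict, len(s_clean))
```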

##### NetworkX
NetworkX is the package for the creation, manipulation, and study of networks that we use in class. Information and tutorials [here](https://networkx.github.io/).

In [18]:
import networkx as nx

In [19]:
g = nx.Graph()
g.add_edge('Peter', 'Mary')
print('nodes:', list(g.nodes()))
print('edges:', list(g.edges()))

nodes: ['Mary', 'Peter']
edges: [('Mary', 'Peter')]


In [20]:
nx.draw(g)
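
Besides drawing, a graph can be inspected numerically. The following minimal sketch extends the example by one edge and queries some basic properties; the node ``'Paul'`` is a hypothetical addition for illustration only.

```python
import networkx as nx

g = nx.Graph()                 # undirected graph
g.add_edge('Peter', 'Mary')
g.add_edge('Mary', 'Paul')     # 'Paul' is a hypothetical extra node

n_nodes = g.number_of_nodes()
n_edges = g.number_of_edges()
degrees = dict(g.degree())     # number of neighbours per node

print(n_nodes, n_edges)
print(degrees)
# In an undirected graph, the order of the two nodes does not matter.
print(g.has_edge('Mary', 'Peter'))
```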

#### Installing Packages (not needed for the course)
Even though Anaconda comes with many pre-installed packages, some may be missing. The easiest way to install packages is to use the Anaconda Navigator. You can also run ``conda install package-name`` in a terminal, following [this user guide](https://docs.anaconda.com/anaconda/user-guide/tasks/install-packages/). To do so, open a terminal from the home screen as shown in the image above. Some packages are not available through ``conda``; many of these can be installed with ``pip install package-name`` instead. There are important differences between ``conda`` and ``pip``, particularly regarding environments and dependencies. Learn more about this [here](https://www.anaconda.com/understanding-conda-and-pip/), if interested.
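
Before installing anything, it can help to check from within a notebook whether a package is already available. The sketch below is one possible way to do this with the standard-library ``importlib`` module. The helper ``package_status`` is a hypothetical name introduced here, and it assumes that a package exposes its version as ``__version__``, which is common but not guaranteed.

```python
import importlib
import importlib.util

def package_status(name):
    """Return the version of package `name`, 'installed' if it has no
    `__version__` attribute, or None if the package cannot be found."""
    if importlib.util.find_spec(name) is None:
        return None
    module = importlib.import_module(name)
    return getattr(module, '__version__', 'installed')

# Check the packages used in this notebook.
for name in ['math', 'numpy', 'pandas', 'networkx']:
    print(name, package_status(name))
```

A result of ``None`` means that the package would have to be installed first.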

#### Practicing Python
Finally, you will need to have basic knowledge of coding in Python to benefit from the course. Please do not underestimate this. There are a few ways you can get experience with coding in Python.

If you have enrolled in the 2-day [Introduction to Python for Social Scientists](https://training.gesis.org/?site=pDetails&child=full&pID=0x387DA6358EF341928F67355388570B76&subID=0xC6F60AF2989A45BFBFDB2570E677EA1E), you should be fine.

If you haven't enrolled, or if you just want to make sure you're prepared, please work through notebooks 01 to 06 in the same ILIAS folder as this notebook. These notebooks teach you the most elementary Python skills.

However, those notebooks still don't teach you about NumPy and Pandas. To get acquainted with those packages, please study the relevant sections of the *Python Data Science Handbook* by Jake VanderPlas, either by

- visiting the book's [website](https://jakevdp.github.io/PythonDataScienceHandbook/), clicking on the sections you want to study, and typing the code in a blank local notebook;
- visiting the book's [code repository](https://github.com/jakevdp/PythonDataScienceHandbook), cloning or downloading the notebooks, and opening them locally; or
- visiting the book's [code repository](https://github.com/jakevdp/PythonDataScienceHandbook) and executing the notebooks on a virtual server ("Open in Colab" or "launch binder" at the bottom of the page).

Check out the sections [Understanding Data Types in Python](https://jakevdp.github.io/PythonDataScienceHandbook/02.01-understanding-data-types.html), [The Basics of NumPy Arrays](https://jakevdp.github.io/PythonDataScienceHandbook/02.02-the-basics-of-numpy-arrays.html), and [Computation on NumPy Arrays: Universal Functions](https://jakevdp.github.io/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html) on NumPy, and the sections [Introducing Pandas Objects](https://jakevdp.github.io/PythonDataScienceHandbook/03.01-introducing-pandas-objects.html), [Data Indexing and Selection](https://jakevdp.github.io/PythonDataScienceHandbook/03.02-data-indexing-and-selection.html), [Operating on Data in Pandas](https://jakevdp.github.io/PythonDataScienceHandbook/03.03-operations-in-pandas.html), [Hierarchical Indexing](https://jakevdp.github.io/PythonDataScienceHandbook/03.05-hierarchical-indexing.html), and [Combining Datasets: Concat and Append](https://jakevdp.github.io/PythonDataScienceHandbook/03.06-concat-and-append.html) on Pandas.

### Recommended Readings
There are two schools of network analysis. Borgatti et al. (2018) teach the Social Network Analysis approach that has been practiced in the social sciences for over fifty years. Menczer et al. (2020) teach the network science approach that has been practiced in the interdisciplinary complex networks community for about 20 years. In the course, we will draw on both schools. Zweig (2016) adds important commentary and epistemological guidance for doing better network analysis. We will also make reference to the general social network theory of White (2008). A summary of this theory is available in German (Schmitt & Fuhse, 2015).

#### References
Borgatti, S.P., M.G. Everett, & J.C. Johnson. 2018. *Analyzing Social Networks*. 2nd Edition. SAGE.

White, H.C. 2008. *Identity and Control: How Social Formations Emerge*. Princeton University Press.

Schmitt, M. & J. Fuhse. 2015. *Zur Aktualität von Harrison White: Einführung in sein Werk*. Springer VS.

Menczer, F., S. Fortunato, & C.A. Davis. 2020. *A First Course in Network Science*. Cambridge University Press. Tutorials, datasets, and other material at: https://github.com/CambridgeUniversityPress/FirstCourseNetworkScience/

Zweig, K.A. 2016. *Network Analysis Literacy*. Springer.