### GESIS Fall Seminar in Computational Social Science 2022
### Introduction to Computational Social Science with Python
# Day 1-2: Setting up Your Workflow

## Overview

* Map of software tools and options
* Typical workflow
   1. Installing Python and managing libraries with **Anaconda**
   2. Writing and running Python code with **Jupyter**
   3. Version control with **git**
   4. Code sharing and cloud storage with **GitHub**
   5. Efficient and reproducible workflows with **Bash**

## Map of Software Tools and Options

![Software map](figs/software_map.png "Software map")

## IDEs

* Integrated development environment
* A software application that facilitates computer programming and software development
  * Text editor with syntax highlighting, autocompletion, and smart indentation
  * Shell with syntax highlighting
  * Popular libraries
  * (Debugger)
* For example:
  * Spyder
  * PyCharm
  * Atom + Hydrogen/Terminal
  * **Jupyter + Anaconda**

## Anaconda

![Anaconda](figs/anaconda.png "Anaconda")

* Freemium open-source cross-platform distribution of the Python and R programming languages
  * `conda` – package management system
  * `pandas`, `numpy`, `statsmodels`, `networkx`, `scikit-learn`, `matplotlib` – packages for data science 
  * Anaconda Navigator – graphical user interface
  * Jupyter Notebook – web app for creating and sharing code

## Installing Anaconda

* Go to https://www.anaconda.com/download/
* Select your OS
* **Download the Python 3.8 version**
* Follow instructions
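
Once the installer has finished, one quick way to confirm which interpreter is active is to ask Python itself. The snippet below is only a sketch: the helper name `describe_python` is made up for illustration, and the path it prints will differ on your machine.

```python
import sys


def describe_python():
    """Return the running interpreter's version and location as text."""
    major, minor, micro = sys.version_info[:3]
    return f"Python {major}.{minor}.{micro} at {sys.executable}"


# With Anaconda active, the path usually points inside the Anaconda folder.
print(describe_python())
```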

## Jupyter

![Jupyter](figs/jupyter.png "Jupyter")

* Open-source web application for creating and sharing documents with:
  * Live code
  * Equations
  * Visualizations
  * Explanatory text
* Supports more than 40 programming languages, including Python and R
* Notebook files have *.ipynb* extension and can be easily shared, e.g. on GitHub
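
A notebook file is plain JSON text, which is part of why it is easy to share and to track with Git. The sketch below builds a simplified notebook as a Python dictionary and serializes it; the cell contents are invented for illustration, and real notebooks written by Jupyter usually carry more metadata.

```python
import json

# A minimal notebook with one Markdown cell and one code cell.
# The cell contents are made up for illustration.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello, CSS!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize to the kind of text that is stored in a .ipynb file,
# then read it back and list the cell types.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)  # ['markdown', 'code']
```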

## Launching Jupyter

* Launch Anaconda Navigator and click on Jupyter Notebook icon

or 

* Open Terminal/cmd and type: 

```
> jupyter notebook
```

## Using Jupyter

* New &rarr; Notebook: Python 3
* Insert &rarr; Insert Cell Below
* Cell &rarr; Cell Type &rarr;
  * Markdown
    * Lightweight markup language
    * See cheatsheet: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
    * CTRL+ENTER to run
    * Double-click to edit
  * Code
    * CTRL+ENTER to run
* Cell &rarr; Run All
  * Code is run top-down, so you can use variables and functions defined in the cells above in the current cell
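
To see why cell order matters, the following sketch imitates a notebook with plain Python. It is an analogy only, not how Jupyter is implemented: each string stands in for one cell, and the dictionary `namespace` plays the role of the memory that all cells share.

```python
# Each string stands in for the code of one notebook cell (hypothetical contents).
cells = [
    "x = 21",
    "y = x * 2",
]

namespace = {}  # plays the role of the memory shared by all cells
for source in cells:
    exec(source, namespace)  # run the cells in order, top to bottom

print(namespace["y"])  # 42

# Running the second cell alone, with nothing defined yet, fails.
try:
    exec(cells[1], {})
except NameError as error:
    print("NameError:", error)
```

The same happens in a real notebook if you restart and run a later cell before the cells it depends on, which is why Cell &rarr; Run All is a useful check.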

## Shutting Down Jupyter

* Do not forget to save your work (Command+S / CTRL+S)!
* Jupyter is a server and closing the browser window will not shut it down
* To close a notebook:
    * File &rarr; Close and Halt
    * On Notebook Dashboard &rarr; Select notebook &rarr; Shutdown
* To shut down server:
    * On Notebook Dashboard &rarr; Quit
    * Terminal &rarr; CTRL+C &rarr; `y`

## Alternative Python Workflow

* Use another IDE

or 

* Use text editor (e.g. Atom) to create .py files
* Run files in Terminal/cmd

```
> cd Path/to/file
> python filename.py
```
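
As a sketch of what such a file might contain, here is a small script. The file name `filename.py` matches the command above, but the function `greet` and its contents are purely illustrative.

```python
# filename.py (hypothetical contents)

def greet(name):
    """Build a short greeting."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # This part runs only when the file is executed directly,
    # for example with `python filename.py`.
    print(greet("world"))
```

The `if __name__ == "__main__":` guard lets you import `greet` from another file or notebook without triggering the `print` call.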

## Version Control and Cloud Storage with GitHub

![GitHub Octocat](figs/Octocat.jpg "GitHub Octocat")

* Code hosting platform for version control and collaboration
* Based on Git
  * Version control system for tracking changes in computer files and coordinating work on those files among multiple people
  * Created in 2005 by Linus Torvalds
* Largest host of source code in the world
* Bought by Microsoft in 2018

## GitHub Lingo

* **Repository** – a space for a project/assignment
* **Clone** – a copy of the repository that lives on your computer
* **Branch** – a parallel version of the repository
* **Commit** – save changes with a short description
* **Pull request** – ask for changes to be merged
* **Merge** – incorporate changes (then delete branch)

## GitHub Workflow

![GitHub Workflow](figs/github.jpg "GitHub Workflow")

## Getting Started with GitHub

* Create personal account on https://github.com/
* Go to https://education.github.com/ and get the Student Developer Pack for some cool freebies

* Three ways to interact
  1. Browser
  2. Command line 
  3. GitHub Desktop

## Terminal = Console = Shell = Bash = Command Line = Command Prompt

(for our purposes here)

![Terminal](figs/terminal.png "Terminal")

* Efficient way to access files, run programs, and execute code
* Allows you to schedule and batch-process tasks
* Provides scripts for reproducible workflows across different operating systems

## Useful Bash Commands
* Print current working directory
```
pwd
```
* Change current working directory
```
cd Path/to/directory
```
* Go back to the parent directory of the current one
```
cd ..
```
* Go back to your home directory
```
cd ~
```
* Create a new directory
```
mkdir dirname
```
* Print a list of files and subdirectories
```
ls
```
* Launch a Python interpreter (type `exit()` to stop and go back to bash) 
```
python
```
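
The same navigation can be done from inside Python, which may be handy in a notebook. The sketch below pairs each step with the Bash command it roughly corresponds to, using only the standard library. It works in a temporary scratch folder so nothing is left behind; the folder name `dirname` simply mirrors the example above.

```python
import os
import tempfile
from pathlib import Path

start = Path.cwd()                      # pwd
print(start)

with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                   # cd Path/to/directory
    Path("dirname").mkdir()             # mkdir dirname
    listing = sorted(p.name for p in Path.cwd().iterdir())  # ls
    print(listing)                      # ['dirname']
    os.chdir(start)                     # go back before the scratch folder is removed

print(Path.cwd().parent)                # the directory that `cd ..` would move to
print(Path.home())                      # the directory that `cd ~` would move to
```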

## Change Your Default Text Editor for Git

You can use your favorite editor by customizing the Git default editor.

For example, you can use [**Nano**](https://www.nano-editor.org/). It is much easier to use than Vim: `Ctrl+o` to save and `Ctrl+x` to close.

To set Nano as the default editor for your commit messages, run the following:

```
git config --global core.editor "nano"
```

Nano comes pre-installed on most Linux distributions and on macOS. For Windows, download and install [**Nano-win**](https://github.com/mcandre/nano-win).

## Important Git Commands 

* Copy online repository
```
git clone https://github.com/gesis-css-python/materials.git
```
* Update local repository
```
git pull
```
* See the status of local repository
```
git status
```
* See the change history of local repository
```
git log
```
* Stage all changes
```
git add --all
```
* Commit staged changes
```
git commit -m "your commit message here"
```
* Upload your changes to online repository
```
git push
```

## Viewing Course Materials on GitHub (Browser)

* Syllabus and course materials at https://github.com/gesis-css-python/materials

* Answers to exercises at https://github.com/gesis-css-python/materials/answers

    * To view this repository, you need to be added to the Students team of the GitHub organization "GESIS Introduction to Computational Social Science with Python"
    * Please e-mail your GitHub username to m.tsvetkova@lse.ac.uk
    * Do not forget to **accept the invitation** to join the team and organization!


## Cloning Course Materials from GitHub (Command Line)

### \*Install and set up `git`

Follow instructions here: https://help.github.com/articles/set-up-git/

### Cloning

  `> cd Path/to/directory`

  `> git clone https://github.com/gesis-css-python/materials.git`

### Updating

  `> cd Path/to/materials`

  `> git pull`


## Storing Your Work on the Cloud (Command Line)

![Git Commands](figs/git.jpg "Git Commands")


## When Working Solo or in Small Teams (Command Line)

For your solo work or small-team projects where you have push privileges:

1. To create a local copy, **clone** the repository (Terminal)

  `> cd Path/to/directory`

  `> git clone link.git`
  
  (You can obtain the link when you click the "Clone or download" button on the GitHub page for the repository)

2. You can now make changes in the downloaded file (Jupyter)

3. To create a new version of the file, **commit** changed file 

  `> cd Path/to/directory`
  
  `> git add --all` 
  
  `> git commit -m "Submitting assignment"`
  

4. To update the online copy with the local changes you made, **push to the master branch**
  
  `> git push`


5. If someone else is also working on the file, **pull** every time before starting work to make sure there are no conflicts (Terminal)

  `> cd Path/to/directory`
  
  `> git pull`
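
If you find yourself typing the same commands at every session, the cycle above can be sketched as a small Python helper. This is only one possible arrangement, not part of Git or GitHub: the function names `before_work`, `after_work`, and `run` are hypothetical. By default the helper performs a dry run that prints the commands without executing them.

```python
import subprocess


def before_work():
    """Commands to run before you start editing (step 5)."""
    return [["git", "pull"]]


def after_work(message):
    """Commands to run once your changes are ready (steps 3 and 4)."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for command in commands:
        print(">", " ".join(command))
        if not dry_run:
            # check=True stops at the first command that fails
            subprocess.run(command, check=True)


run(before_work())                        # dry run: prints the command only
run(after_work("Submitting assignment"))  # dry run: prints the three commands
```

To execute the commands for real, call `run(..., dry_run=False)` from inside the cloned repository. Note that `git commit` reports a failure when there is nothing to commit, so the helper would stop at that point.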


## When Working in Large Teams

If you do not have push privileges or, more generally, when collaborative work is more strictly managed:

1. Fork or branch
   * **fork** means you create your own copy on the cloud; this is more commonly used by external collaborators
   * **branch** is linked to the original repository more directly; this is used by internal team members to encapsulate work on specific features
2. Make changes and add commits.
3. Open a **pull request**
4. Discuss and review commits.
5. Wait for repository admin to **merge** your branch/fork.


## Using GitHub: General Notes

* For cloud storage and sharing, do not forget to push (especially on GitHub Desktop!)
   * Check if the changes are online. If you cannot see them after refreshing, no one else can.
* Git keeps track of every version for you. Do not duplicate and rename files!


* Additional resources
   * Get started: [GitHub tutorials](https://guides.github.com/)
   * Get it done: [Git cheatsheet](https://education.github.com/git-cheat-sheet-education.pdf)