### GESIS Fall Seminar in Computational Social Science 2022
### Introduction to Computational Social Science with Python
# Day 1-3: Introduction to Programming with Python

## Overview

* Programming languages
* About Python
* Scalars: `int`, `float`, `bool`, `None`
    * Operators: arithmetic, boolean, comparison, assignment, membership
* Non-scalars: `list`, `tuple`, `str`, `set`, `dict`
    * Methods
    * Ordered vs. unordered non-scalars       
    * Mutable vs. immutable non-scalars
* Debugging


## Programming Languages

A programming language is a formal language used to specify a set of instructions for a computer to execute. It has:

* Primitive constructs – literals (characters, numbers) and operators
* Syntax – rules for putting primitives together
* Static semantics – rules for forming meaningful commands
* Semantics – the meaning of commands
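As a rough illustration of the last three items, the sketch below provokes one problem of each kind in Python. The expressions and variable names are invented for this example. Note that Python only reports a static-semantic problem, such as adding a number to a string, when the offending line runs.

```python
# Syntax error: the primitives are combined in a way the grammar does not allow.
try:
    compile('3 4 +', '<example>', 'eval')
    syntax_ok = True
except SyntaxError:
    syntax_ok = False
print(syntax_ok)  # False

# Static-semantic error: the syntax is valid, but the command is not meaningful.
try:
    3 + 'a'
    meaningful = True
except TypeError:
    meaningful = False
print(meaningful)  # False

# Semantic error: the program runs, but it does not do what was intended.
numbers = [2, 4, 6]
mean = sum(numbers) / 2  # bug: should divide by len(numbers)
print(mean)  # 6.0, not the intended 4.0
```

The third kind is the hardest to find because Python raises no error at all.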

## Markup vs. Programming Languages


|              | Markup Languages | Programming Languages   
| :----------- |:---------------- | :----------------------
|  |![Markup languages](figs/markup_lang.png "Markup languages") | ![Programming languages](figs/program_lang.png "Programming languages")
| **Examples** | TeX, HTML, XML, **Markdown**   | C, Java, JavaScript, **Python**, R            
| **Use**      | Structure and present data | Transform and generate data  
| **Execution**| Program (e.g. a browser)   | Computer hardware 
| **Structure**| Inline tags    | Primitive constructs, syntax, static semantics, semantics 

(Image sources: Wikimedia)

## Why Python?

![Python](figs/python.png "Python")

* Open-source – free and well-documented
* Simple and concise syntax
* Many useful libraries
* Cross-platform
* [Widely used in industry and science](https://youtu.be/cKzP61Gjf00)

# Objects, Data Types, and Expressions

* Computer programs manipulate data in the form of objects
* Objects have types
  * Scalar — indivisible
  * Non-scalar — with internal structure, can be ordered/unordered and mutable/immutable
* We can do things with objects
    * Use variables to associate them with names
    * Combine objects and operators to evaluate expressions
    * Call methods on objects
    * Pass objects to functions
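The four things we can do with objects might look as follows in practice. This is a minimal sketch; the variable names are chosen only for illustration.

```python
# Every object has a type, which type() reports.
print(type(3))       # <class 'int'>
print(type(3.0))     # <class 'float'>
print(type('text'))  # <class 'str'>

# Use a variable to associate an object with a name.
word = 'python'

# Combine objects and operators into an expression.
repeated = word * 2

# Call a method on an object.
shouted = word.upper()

# Pass an object to a function.
n_chars = len(word)

print(repeated, shouted, n_chars)  # pythonpython PYTHON 6
```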

## Data Types in Python


| Type     | Scalar     | Mutability | Order   
| :------: |:----------:|:----------:| :---------:
| `int`    | scalar     | immutable  |             
| `float`  | scalar     | immutable  |  
| `bool`   | scalar     | immutable  | 
| `None`   | scalar     | immutable  | 
| `str`    | non-scalar | immutable  | ordered
| `tuple`  | non-scalar | immutable  | ordered
| `list`   | non-scalar | mutable    | ordered
| `set`    | non-scalar | mutable    | unordered
| `dict`   | non-scalar | mutable    | unordered

## Scalar Data Types

* Integer
* Float
* Boolean
* NoneType

In [38]:
a = 34
b = None
print(b)

None


## Non-Scalar Data Types

* String – sequence of characters (immutable, ordered)
* List – sequence of values (mutable, ordered)
* Tuple – sequence of values (immutable, ordered)
* Set – collection of unique values (mutable, unordered)
* Dictionary – a set of key/value pairs (mutable, unordered)

In [41]:
ls = [1, 2, 3, 4]
dc = {'a': 1, 'b': 2}
tu = ((1,2), 2, 3, 4, 5)
st = {4, 1, 2, 3, 4, 5, 5, 5, 4}
print(st)

{1, 2, 3, 4, 5}


## Using Operators with Objects

* Arithmetic: `+`, `-`, `*`, `/`, `**` (exponent), `%` (modulus), `//` (floor division)
* Boolean: `and`, `or`, `not`
* Comparison: `==`, `!=` (does not equal), `>`, `<`, `>=`, `<=`
* Assignment: `=`, `+=`, `-=`
* Membership: `in`, `not in`

In [45]:
# Note that the arithmetic operators + and * have different meanings 
# depending on the types of objects with which they are used

3*'a' + '2'

'aaa2'

In [51]:
# Boolean operators return bool

False or False

False

In [57]:
# Assignment vs. test for equality

a = 2
a += 4 # same as a = a + 4
print(a == 6) # == tests for equality, = assigns

b = [1, 2, 3, 4]
5 in b # membership test

True
False
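A few more operators from the list above, as a short sketch with made-up values:

```python
# Arithmetic
print(7 / 2)   # 3.5 (true division always returns a float)
print(7 // 2)  # 3 (floor division)
print(7 % 2)   # 1 (remainder)
print(2 ** 3)  # 8 (exponent)

# Comparison operators return bool
print(3 != 4)  # True
print(3 <= 2)  # False

# Boolean operators combine bool values
is_valid = (3 < 4) and not (3 == 4)
print(is_valid)  # True

# Membership also works on strings
has_letter = 'y' in 'python'
print(has_letter)  # True
```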

## Unordered Types vs. Sequences

* Unordered types: `set`, `dict` (a `dict` keeps insertion order since Python 3.7, but it is still not indexed by position)
* Ordered types (sequences): `str`, `list`, `tuple`
  

In [58]:
st = {1, 2, 2, 'a', 'b'} # sets are unordered
print(st)

{1, 2, 'b', 'a'}


## Dictionary Operations: Indexing

* Dictionaries are indexed by keys

In [59]:
mydic = {'Howard': 'aerospace engineer', 'Leonard': 'physicist', 'Sheldon': 'physicist', 
         'Penny': 'waitress', 'Raj': 'astrophysicist'}

mydic['Penny']

'waitress'
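Indexing with a key that is not in the dictionary raises a `KeyError`. The sketch below, which uses a small hypothetical dictionary, shows two common ways to guard against that.

```python
# Hypothetical example data
professions = {'Ada': 'mathematician', 'Alan': 'computer scientist'}

# Look up a value by its key
print(professions['Ada'])  # mathematician

# .get() returns a default value instead of raising KeyError
missing = professions.get('Grace', 'unknown')
print(missing)  # unknown

# The membership operator checks the keys
print('Alan' in professions)  # True
print('Grace' in professions)  # False
```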

## Sequence Operations: Indexing and Slicing

* Lists, tuples, and strings are indexed by numbers. **Indexing in Python starts from 0!**
* Use `seq[index]` to extract individual elements
* Use `seq[start:end]` to get the sub-sequence starting at index `start` and ending at index `end-1`
* Use `seq[start:end:step]` to get the sub-sequence starting at index `start`, in steps of `step`, ending no later than index `end-1`

In [70]:
st = 'some string'
print(st[3]) # get element at index 3 (the fourth element)
print(st[::2]) # get elements with even indices
print(st[::-1]) # get elements in reverse order

e
sm tig
gnirts emos
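The slicing rules above can be checked on a short string. This is a sketch with a hypothetical example string; the same syntax works on lists and tuples.

```python
letters = 'abcdefgh'  # hypothetical example string

print(letters[0])      # a (first element)
print(letters[-1])     # h (negative indices count from the end)
print(letters[2:5])    # cde (indices 2, 3, 4)
print(letters[:3])     # abc (start defaults to 0)
print(letters[5:])     # fgh (end defaults to the length)
print(letters[1:7:2])  # bdf (indices 1, 3, 5)
```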

## 🏋️‍♀️ PRACTICE


In [None]:
# Q1: Make three new strings from the first and last, 
# second and second to last, and third and third to last letters 
# in the string below. Print the three strings.

p = 'redder'


## Evaluating Functions with Objects 

### `function(object)`

* Use the name of a type to convert values to that type
* `len()` – returns the length of the sequence or collection
* `max()` – returns the largest element
* `sum()` – returns the sum of all elements

In [86]:
len({1, 2, 3, 4, 3, 2})


4
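A brief sketch of type conversion and the other built-in functions listed above, using made-up values:

```python
# Use the name of a type to convert a value to that type
print(int('42') + 1)     # 43
print(float(3))          # 3.0
print(str(3.5) + ' kg')  # 3.5 kg
print(list('abc'))       # ['a', 'b', 'c']
print(tuple([1, 2]))     # (1, 2)

values = [3, 10, 7]  # hypothetical data
print(len(values))  # 3
print(max(values))  # 10
print(sum(values))  # 20
```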

## 🏋️‍♀️ PRACTICE


In [None]:
# Q2: Use set() and len() to count the number of unique letters
# in the string below

s = 'jackie will budget for the most expensive zoology equipment'


## Calling Methods on Objects

### `object.method()`

Use the period `.` to link the method to the object.

In [87]:
string1 = 'Hello'

string1.upper()

'HELLO'

## [String Methods](http://docs.python.org/3/library/stdtypes.html#string-methods)

* `S.upper()` – change to upper case
* `S.lower()` – change to lower case
* `S.capitalize()` – capitalize the first character and change the rest to lower case
* `S.find(S1)` – return the index of the first occurrence of S1, or -1 if S1 is not found
* `S.replace(S1, S2)` – return a copy with all instances of S1 changed to S2
* `S.strip()` – remove whitespace characters from the beginning and end of a string (useful when reading in from a file); `S.strip(S1)` removes the characters in S1 instead
* `S.split(S1)` – split the string into a list, using S1 as the separator
* `S.join(L)` – combine the strings in the sequence L into a single string, with S as the separator

In [88]:
print('Make me scream!'.upper())

x = 'make this into a proper sentence'
print(x.capitalize() + '.')

print('Find the first "i" in this sentence.'.find('i'))

MAKE ME SCREAM!
Make this into a proper sentence.
1


In [92]:
x = ' This is a long sentence that we will use as an example.\n'
x = x.strip()
x


'This is a long sentence that we will use as an example.'
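The remaining methods from the list can be tried the same way. This is a small sketch on a hypothetical sentence; note that every method returns a new string and leaves the original untouched.

```python
sentence = 'networks are everywhere'  # hypothetical example

words = sentence.split(' ')
print(words)  # ['networks', 'are', 'everywhere']

joined = '-'.join(words)
print(joined)  # networks-are-everywhere

print(sentence.replace('e', 'E'))  # nEtworks arE EvErywhErE
print(sentence.find('z'))          # -1 (not found)
print(sentence)                    # networks are everywhere
```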

## 🏋️‍♀️ PRACTICE


In [None]:
# Q3: Remove the trailing whitespace in the string below, 
# replace all double spaces with a single space, and format it as a sentence 
# with proper punctuation. Print the resulting string.

string1 = '  this  is a very badly.  formatted string -  I would  like to make it cleaner\n'


In [None]:
# Q4: Convert the string below to a list

s = "['apple', 'orange', 'pear', 'cherry']"


In [None]:
# Q5: Reverse the strings below.

s1 = 'stressed'
s2 = 'drawer'


## Set Methods

![Set operations](figs/sets.png "Set operations")

* `S1.union(S2)`, `S1|S2` — elements in S1 or S2, or both
* `S1.intersection(S2)`, `S1&S2` — elements in both S1 and S2
* `S1.difference(S2)`, `S1-S2` — elements in S1 but not in S2
* `S1.symmetric_difference(S2)`, `S1^S2` — elements in S1 or S2 but not both

In [99]:
st1 = set('homophily')
st2 = set('heterophily')

st1 & st2

{'h', 'i', 'l', 'o', 'p', 'y'}
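The other three operations on the same two sets, as a quick sketch. `sorted()` is used here only to make the printed order predictable, since sets themselves are unordered.

```python
st1 = set('homophily')
st2 = set('heterophily')

print(sorted(st1 | st2))  # ['e', 'h', 'i', 'l', 'm', 'o', 'p', 'r', 't', 'y']
print(sorted(st1 - st2))  # ['m']
print(sorted(st2 - st1))  # ['e', 'r', 't']
print(sorted(st1 ^ st2))  # ['e', 'm', 'r', 't']

# The method and the operator give the same result
print(st1.union(st2) == st1 | st2)  # True
```

The difference is the only one of the four operations where the order of the operands matters.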

## Mutability

* Immutable types: `str`, `tuple`, and all scalars
* Mutable types: `list`, `set`, `dict`

**Objects of mutable types can be modified once they are created.**

In [102]:
dic = {1:'a', 2:'b'}
ls = [5, 4, 1, 3, 2]

dic[3] = 'c'
dic[1] = 'x'
dic

ls.sort()
ls

[1, 2, 3, 4, 5]
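For contrast, here is a sketch of what happens when we try to modify an object of an immutable type. The variable names are made up for the example.

```python
word = 'cat'

# Item assignment fails on a string
try:
    word[0] = 'b'
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
    print(err)  # 'str' object does not support item assignment

# Instead, build a new string from pieces of the old one
new_word = 'b' + word[1:]
print(new_word)  # bat
print(word)      # cat (the original is unchanged)
```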

## [List Methods](http://docs.python.org/3/library/stdtypes.html#mutable-sequence-types)

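* `L.append(e)` – add the element e to the end of the list
* `L.insert(i, e)` – insert the element e at index i
* `L.remove(e)` – remove the first occurrence of e
* `L.extend(L1)` – add all elements of L1 to the end of the list
* `L.pop(i)` – remove and return the element at index i (the last element if i is omitted)
* `L.sort()` – sort the list in place
* `L.reverse()` – reverse the list in place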

In [109]:
ls1 = [1, 2, 3]
ls1.append(4)
ls1.extend((5, 6, 7))

ls2 = list(reversed(ls1))  # reversed() does not modify ls1
ls2

[7, 6, 5, 4, 3, 2, 1]

In [None]:
mylist = [4, 5, 2, 1, 3]
mylist.sort()  # Sorts in-place. It is more efficient but overwrites the input.
print(mylist)

mylist = [10, 9, 6, 8, 7]
sorted(mylist)  # Returns a new sorted list. The result is not saved here, so mylist is unchanged.
print(mylist)
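The difference between the two can be seen more directly by keeping the return values. A minimal sketch with hypothetical data:

```python
scores = [10, 9, 6, 8, 7]  # hypothetical data

ranked = sorted(scores)  # returns a new sorted list
print(ranked)  # [6, 7, 8, 9, 10]
print(scores)  # [10, 9, 6, 8, 7]

result = scores.sort()  # sorts in place and returns None
print(result)  # None
print(scores)  # [6, 7, 8, 9, 10]

print(sorted(scores, reverse=True))  # [10, 9, 8, 7, 6]
```

A common mistake is writing `scores = scores.sort()`, which replaces the list with `None`.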


## Mutability Can Be Dangerous

In [113]:
ls1 = [1, 2, 3]
ls2 = [4, 5, 6, 7]

ls1.append(ls2)
print(ls1)

ls2.extend([8, 9, 10])

print(ls1)

[1, 2, 3, [4, 5, 6, 7]]
[1, 2, 3, [4, 5, 6, 7, 8, 9, 10]]


## Aliasing vs. Cloning

![Aliasing](figs/aliasing.png "Aliasing")

In [115]:
ls1 = [1, 2, 3]
ls2 = ls1  # Aliasing: both names refer to the same list. Use ls1[:] to clone instead.

ls1.reverse()
print(ls2)

[3, 2, 1]
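For comparison, a sketch of the same experiment with an alias and two clones side by side. The names `alias`, `clone`, and `clone2` are chosen only for this example.

```python
ls1 = [1, 2, 3]
alias = ls1         # another name for the same list
clone = ls1[:]      # a new list with the same elements
clone2 = list(ls1)  # another way to clone

ls1.reverse()

print(alias)   # [3, 2, 1]
print(clone)   # [1, 2, 3]
print(clone2)  # [1, 2, 3]

# The is operator tests whether two names refer to the same object
print(alias is ls1)  # True
print(clone is ls1)  # False
```

Both cloning techniques copy only the outer list; nested lists inside it would still be shared.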


## 🏋️‍♀️ PRACTICE


**Q6**: What will the following program print?

```
ls1 = [1, 2, 3, 4, 5]
ls2 = ls1
ls2[2] = 0
print(ls1)
```

* (A) `[1, 2, 3, 4, 5]`
* (B) `[1, 0, 3, 4, 5]`
* (C) `[1, 2, 0, 4, 5]`
* (D) `0`

## 🏋️‍♀️ PRACTICE


In [None]:
# Q7: Use a list operation to create a list of ten elements, 
# each of which is '*'


In [None]:
# Q8: Assign each of the three elements in the list below 
# to three variables a, b, c
ls = [['dogs', 'cows', 'rabbits', 'cats'], 'eat', {'meat', 'grass'}]


In [None]:
# Q9: Create a new list that contains only unique elements from list x

x = [1, 5, 4, 5, 6, 2, 3, 2, 9, 9, 9, 0, 2, 5, 7]


In [None]:
# Q10: Print the second smallest and the second largest numbers 
# in this list of unique numbers

x = [2, 5, 0.7, 0.2, 0.1, 6, 7, 3, 1, 0, 0.3]


## Computer Bugs

![Computer Bug](figs/bug.jpg "Computer Bug")

The first recorded actual computer bug. On September 9, 1947, operators of the Harvard Mark II computer found this moth trapped in a relay and taped it into the logbook; Grace Hopper, later a Rear Admiral, popularized the story. (Image source: U.S. Naval Historical Center Online Library)

## What Is Computer Programming Really about?

*99 little bugs in the code,*

*99 bugs in the code,*

*1 bug fixed...run again,*

*100 little bugs in the code...*

## How to Debug: Two Options

1. **Google** the error and find an answer on **Stack Overflow**
1. Use **`print()`** systematically

## Learn from Other Programmers

![Not sure if I am a good programmer or just good at googling](figs/good_programmer.jpg "Not sure if I am a good programmer or just good at googling") 

## Debugging Systematically

1. Compare the input in successful and failing runs
1. Formulate a hypothesis
1. Design an experiment to test the hypothesis; use `print()`
1. Keep a record of your experiments
1. Repeat
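One pass through this loop might look like the sketch below. The data and the bug are invented for illustration: we expect three 'yes' answers but the program counts only two.

```python
raw = 'yes, Yes,no ,yes'  # hypothetical survey answers
answers = raw.split(',')
n_yes = answers.count('yes')
print(n_yes)  # 2, but we expected 3

# Hypothesis: stray spaces or capital letters hide one of the answers.
# Experiment: print the intermediate value.
print(answers)  # ['yes', ' Yes', 'no ', 'yes']

# The hypothesis holds, so clean the text before splitting.
cleaned = raw.lower().replace(' ', '').split(',')
n_yes_fixed = cleaned.count('yes')
print(n_yes_fixed)  # 3
```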

## After Debugging for Hours...

* Stop

* Try commenting your code or explaining it to someone else

* Sleep on it

![Best Debugger](figs/debugging_sleep.png "Best debugger")

(Image source: Reddit)