# Preliminaries

First things first, what exactly *is* Python?
* ***Programming* Language**: In fact, a *Turing complete* programming language, which means that anything a computer can possibly do you can do with Python (unlike SQL, HTML, or CSS)
* ***Interpreted* Language**: No compilation required, you can just start typing and seeing results! In older languages like C/C++, you have to *compile* your code down into a language (machine code) that the processor can understand, and you have to wait for this compilation to complete before you can run the program. Python, on the other hand, implements "on-the-fly execution", meaning that it converts the code, sends it down to the processor, gets back the result, and prints it out, all in real time (note that this "real time" may still be extremely slow). Thus we call Python an "interpreted" rather than "compiled" language.
* ***Dynamically-Typed* Language**: First, we need to understand what we mean when we say "type" in programming contexts. Any variable that you create, in any language, has a "type". For example, the number `3` has type integer (`int` in Python), the string of characters "hello" has type string (`str` in Python), the true/false value `True` has type boolean (`bool` in Python). Since these are "basic" datatypes, i.e., since they are built into Python, they are called "primitive" types. Any more complicated variable in Python is ultimately a combination of these primitive types. For example, the `pandas` data science library has a `DataFrame` type, which is essentially a spreadsheet of numbers or strings, and thus is really just a fancy combination of these two primitive types.

    Now, when we say Python is "dynamically typed", this means that any variable you create, say `day`, could refer to a *string* at one point in the code but then to an *int* at other points. This sounds scary, but it just means you need to keep track in your head of what type `day` *should* have when you're using it. For example, the following code should raise a red flag in your head (note that the "`#`" character represents a comment, so that everything after "`#`" on a line is just a note-to-self):
    
    ```python
    day = 3 # day now refers to the int 3
    # some other code...
    day = "Tuesday" # day is *changed* to refer to the string "Tuesday"
    # the rest of the code...
    ```

    So `day` *started* the program as referring to a number, but then was "dynamically typed" to refer instead to a string. You should use comments, good variable names, and other good coding practices to avoid situations where someone using your code doesn't know what type a variable is at what point in the code D:
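
To make the idea of types a bit more concrete, here is a minimal sketch using Python's built-in `type()` function, which reports the type of whatever value a variable currently refers to. The variable `day` is reused from the snippet above; `type_before` and `type_after` are names made up just for this example.

```python
# type() reports the type of the value you hand it
print(type(3))        # <class 'int'>
print(type("hello"))  # <class 'str'>
print(type(True))     # <class 'bool'>

day = 3
type_before = type(day)  # day refers to an int here
day = "Tuesday"
type_after = type(day)   # same name, but now it refers to a str

print(type_before, type_after)  # <class 'int'> <class 'str'>
```

If you ever lose track of what a variable refers to, printing its `type()` like this is a quick way to check.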
    
As a final note before we start, it's important to know that Python is sweeping a *lot* of things -- complicated things that have foisted many an emergency all-nighter onto computer scientists -- under the rug:

Assembly, the "first" language in a sense, solely allows moving 0s and 1s around in your computer's memory. Then C came and was a bit more human-readable, but still required programmers to explicitly *allocate* sectors of memory space, and *free* these sectors when they were no longer needed. Languages like Java then added a "garbage collector", which *scans* memory and automatically frees up any sectors you aren't using. Python does all of the above automatically *plus* interprets code in real time.

The point of all this is just that, depending on what you're hoping to *do* with the computer, Python may not be the best language for the task. For example, if you're trying to perform a complex estimation procedure as fast as possible, a language like C is probably better, since the compilation process *optimizes* the machine code to be as fast as possible, unlike Python which just runs it as you type it.

BUT, since we're social scientists, Python is *usually* the right language for the job, since this sweeping-under-rugs allows you to do a *ton* of cool/complex stuff, using very few lines of code. For example, I went back into my undergrad CS practice exams folder and found a problem asking us to turn this C code
<img src="c_code.png" alt="C code" style="width:50%;"/>
into assembly code, which turns out to look like this:
```assembly
sw $ra, ($sp) # store f's return address in the runtime stack
sub $sp, $sp, 12 # push return address and both parameters onto stack
sw $a0, 8($sp) # store first argument passed into location for p
sw $a1, 4($sp) # store second argument passed into location for n
lw $t0, 4($sp) # load n
mul $t0, $t0, 4 # multiply n by 4 since an int is 4 bytes
lw $t1, 8($sp) # load p
add $t0, $t0, $t1 # add subscript offset calculated above to value of p
lw $a0, ($t0) # load indicated array element
li $v0, 1
syscall # print element
add $sp, $sp, 12 # pop return address and both parameters from stack
lw $ra, ($sp) # get return address into register
jr $ra
```

So now you have a feel for how much time coding Python is saving you :P

# Installing

Now that we have some background, let's actually go and get the Python interpreter onto our computers and start coding!

This can be a daunting task in and of itself, since there are dozens of different versions to choose from. Long story short, I'm going to recommend the [Anaconda Distribution](https://www.anaconda.com/download/), which is a version of Python packaged with tons of extra "data science" libraries so that you don't have to spend time downloading these individually. If you can't use Conda for some reason, then the [latest version of Python](https://www.python.org/downloads/) (Python 3.7 at the moment) should be sufficient. 

Unfortunately, many systems are still using the old version of Python (version 2.x instead of 3.x), whose syntax differs enough that even the `print` command from Python 2 doesn't work in Python 3... So steer clear. Worst-case scenario, there is a `2to3` program that the Python developers created to automatically convert version-2 code to version-3 code, though it doesn't work perfectly (in fact, it has never worked for me...).

## The Shell

Once Conda is installed, you should be able to open up a "shell" and use Python. A "shell" is just a non-graphical interface allowing you to run various programs on your computer. On Windows you hold down the Windows Key and press R to open the "Run..." prompt, then just type "cmd" and press Enter. I've never used a Mac but Google tells me that Mac users should press CMD+Space to open "spotlight search", and then type "terminal" and press Enter.

On Windows, the shell looks like a big ugly box with ugly text in it:
<img src="win_shell.png" alt="Windows shell" style="width:50%;"/>



For your first experience in Python, start the "interactive console" by simply typing "python" (case-sensitive) in that shell, and then once Python loads type "2+2" to see the result:

<img src="win_python.png" alt="Python console" style="width:50%;"/>

As we can see, it successfully interprets `2+2` and returns the result, `4`. We can also see that a line starting with `>>>` means the console is waiting for us to type some Python code, while lines without the `>>>` are results returned by the Python interpreter.

Although this is cool, most of the time we won't be typing code directly into the Python console. Instead, we will be making text files containing our code and ending in the `.py` suffix, for example `add_numbers.py`. We will then run these files from the shell using `python <filename>`, in this case `python add_numbers.py`. If you know Stata, this is equivalent to writing a `.do` file and clicking the run button, where instead of a run button we have the `python` shell command.
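
As a rough sketch, a file like `add_numbers.py` might contain nothing more than the following. The file contents and variable names here are hypothetical, made up purely for illustration.

```python
# add_numbers.py (hypothetical contents, for illustration only)
first_number = 2
second_number = 2
total = first_number + second_number

# In a .py file, results only show up if you explicitly print them
print(total)  # prints 4
```

Running `python add_numbers.py` from the shell would then print `4`. Note that, unlike in the interactive console, a bare `2+2` line in a file displays nothing, which is why the `print()` call is there.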

So, if you're still in the Python console, type `quit()` to exit. To follow along with the next step of the tutorial, you should make a Python file called `hello_world.py`. Here I highly recommend that you create a new, clean folder where you'll do all the work in the tutorial, for example `C:\python_tutorial`. That will make it a lot easier for you to find and navigate to your Python code files when you're working in the shell.

# The Basics

## Syntax, Errors, and Figuring Things Out

The most important things to keep in mind when coding Python are
1. Everything is *case sensitive*: Just like in R, Python will treat `x` as different from `X`.
2. Almost all of Python's syntax is implemented through *spacing*: Whereas other languages are filled with curly brackets `{...}` delimiting the start and end of code blocks, in Python a code block will be any consecutive lines of code at the same level of *indentation*:

```python
z = "Start of first code block"
for x in ["This","is","the","first","code","block"]:
    x = x + "This is the start of the second code block"
    print(x)
    if x == "More blocks":
        print("Yet another level")
    print("Back to second code block")
print("End of first code block")
```

Because of how central spacing is to Python, I *highly* recommend the [Sublime Text Editor](https://www.sublimetext.com/3). It has a feature that is invaluable when coding Python: if you select all text (Ctrl+A on Windows), it shows *spaces* with a dot and *tabs* with a line. Since Python treats a tab *differently* from 4 spaces, this will save you at least several hundred headaches. For example, highlighting the code we looked at above in Sublime, we can instantly see the error that will make Python unable to interpret your code:
<img src="error.png" alt="Python error" style="width:75%;"/>

3. In Python, errors are your friends. They give you a description of the error, and the line of code where it happened. When I teach, this is the #1 hardest thing to get across to students: I guess we've been trained to think ERROR == BAD, but for me literally 80-90% of my coding time is spent finding, understanding, and fixing errors, so imo you should treat errors as a regular part of the coding process, alongside writing code, running code, and looking stuff up online. Which brings me to the fourth and final point:

4. Programmers don't memorize anything beyond the basic functions of the language. They use **Google**:
![How to fix anything](google.png)

So yeah I pretty much Google almost anything I need to do beyond... printing things. "How to load csv in Python", "Scatterplot in Python", "Python convert string to int", and so on. I'm probably pretty extreme in the not-memorizing-things dimension, but yeah I think a lot of people get scared at the beginning thinking they have to memorize everything, and that is far from the case.
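
To illustrate point 3 above, here is a small sketch of what a Python error tells you. Adding a number to a string is not allowed, so Python raises a `TypeError`. The `try`/`except` construct (not otherwise covered in this tutorial) is only used here to catch the error, so that we can print its name and description instead of having the program stop. The variable names are made up for this example.

```python
num_senators = 100
label = "Senators: "

try:
    # Bug: Python won't add an int to a str
    message = label + num_senators
except TypeError as err:
    error_name = type(err).__name__
    print(error_name)  # TypeError
    print(err)         # Python's description of what went wrong

# The fix: convert the int to a str before joining
message = label + str(num_senators)
print(message)  # Senators: 100
```

Without the `try`/`except`, Python would stop at the buggy line and print the same error name and description, along with the line number where it happened. That text is exactly what you would paste into Google.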

## Our First Program: The Obligatory "Hello World!"

Open up the folder you created at the end of the Shell section above, and then open the `hello_world.py` file you created in Sublime. On the first line, just type `print("Hello World!")` and save. Now open your shell, and let's navigate to the folder where your `.py` file is stored. To do this, on Windows, use:
* The shell command `dir` (for "**dir**ectory") to see all of the contents of your *current* folder,
* `cd ..` (for "**c**hange **d**irectory") to navigate *up* one level,
* `cd <folder name>` to navigate *down* into `<folder name>` (where you replace `<folder name>` with the name of a particular folder within the current folder), and
* `cls` to clear the screen, in case you get overwhelmed with the results of the `dir` command, for example.

In the Mac (and Linux) shell, the equivalent commands are (respectively) `ls`, `cd ..`, `cd <folder name>`, and `clear`. Also note that you can type the beginning of a folder (or file) name and press Tab to autocomplete the rest, if it is unambiguous which folder or file you're trying to type.

Once you are in the folder containing `hello_world.py`, run the program by typing `python hello_world.py` and pressing Enter. It should produce the following:
![Our first Python program](hello.png)

Very exciting stuff.

## Doing Something Less Lame

So now we know how to use Python as a calculator and how to make it print messages...
![Very fancy calculator](k.gif)

To make things a bit more fun, let's dive into the full set of language features now, building our way up to conditional checks, looping, and functions (in this section) and then more advanced data structures and datatype conversion (in the next section).

### Assigning Values to Variables

If you're familiar with R, you might have to adjust your habits a bit, because instead of using `<-` to set the value of a variable, in Python you use a single equals sign `=`:

```python
# Assigning numeric values to variables
num_senators = 100
num_moc = 435
num_dc_electors = 3
total_electors = num_senators + num_moc + num_dc_electors
print(total_electors)

# Assigning string values to variables
quavo = "raindrop"
offset = "droptop"
takeoff = quavo + offset
print(takeoff)
```

```
538
raindropdroptop
```


(By the way, you actually can use a single equals `=` to set the value of a variable in R. For example, both `x = 2 + 2` and `x <- 2 + 2` do the same thing. But `<-` makes it a *LOT* more clear what the line of code is doing, imo. In Python, sadly, you have no choice but to use `=`.)

Although in this document I'm just typing Python code in and the output is getting spit out right underneath it, in your case you should type these lines of code into a `.py` file (for example, you can edit your `hello_world.py` file), save it, and then execute `python hello_world.py` from the shell, which will then display the output.

### Lists

There is one more datatype we need to learn before we can move on: the list. A list is exactly what it sounds like, a "container" which can hold an ordered sequence of (any) Python objects. The syntax for creating a list just involves using square brackets `[` and `]` to demarcate the start and end of the list, respectively, and then placing a comma `,` between elements of the list. For example:

```python
lucky_numbers = [42,109,77,3,-5]
# or
the_goats = ["Michael Jordan", "Lebron James", "Kobe Bryant", "Arvydas Sabonis"]
```

Note that Python is not picky (for once) about whether there is a space after the commas or not. Once we've created a list, we can access its `n`th element using square brackets again, this time putting the index you want between them, after the name of the list: `listname[n]`. 

**IMPORTANT NOTE** here: Python, unlike R and unlike humans (but like C/C++/Java), counts starting from **zero** rather than one. So the *first* element of a list is accessed via `listname[0]`, the *second* element of the list is accessed via `listname[1]`, and so on. This has caused so many headaches that it even has a [Wikipedia-recognized](https://en.wikipedia.org/wiki/Off-by-one_error) name and abbreviation: the Off-By-One Error or OBOE. Even more confusingly, putting *negative* numbers inside the square brackets lets you select elements starting from the *end* of the list, with the count starting at `-1`, so that `listname[-1]` is the last element, `listname[-2]` is the second-to-last element, and so on.

So, using our code from above:

```python
print(lucky_numbers[0])
print(lucky_numbers[-2])
print(the_goats[0])
print(the_goats[2])
print(the_goats[-1])
```

```
42
3
Michael Jordan
Kobe Bryant
Arvydas Sabonis
```
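
One way to see the off-by-one issue in action: the built-in `len()` function returns the number of elements in a list, but because counting starts at zero, the last valid index is one less than that. Here is a minimal sketch, with `lucky_numbers` redefined so the snippet runs on its own (`num_elements` and `last_index` are names made up for this example):

```python
lucky_numbers = [42, 109, 77, 3, -5]

num_elements = len(lucky_numbers)  # 5 elements...
last_index = num_elements - 1      # ...so the last valid index is 4

print(lucky_numbers[last_index])   # -5, same as lucky_numbers[-1]

# lucky_numbers[num_elements] would raise an IndexError,
# since index 5 is one past the end of the list
```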


### Conditionals

Now that we know how to assign variables of different types, we move to how to make the program do *different things* based on the value of some variable(s). For example, let's modify our code above (where we set the values of numeric variables) a bit. Instead of the total number of delegates, we now keep track of the number of delegates who cast a vote for each candidate:

```python
clinton_delegates = 227
trump_delegates = 304
```

Now say we want to make the program act *differently* based on the values of these variables. For example, we want to print out the winner of the election based on who has at least 270 electoral college votes (a majority of the 538 total). The Python construct which allows us to do this is called an **if statement**. The syntax is as follows:
```python
if <condition a>:
    do a thing
elif <condition b>:
    do a different thing
else:
    do a third thing
```

The `elif` portion is optional, however -- if you only have two possible conditions you want to check for, this simplifies to:
```python
if <condition>:
    do a thing
else:
    do a different thing
```
So let's use this construct to implement our conditional-printing program. The syntax for checking if two things are equal is `==` (**Note that this is different from the SINGLE EQUALS `=` that you use for setting the value of a variable**. Many headaches await those who do not heed this call) and for checking if two things are *not* equal you use `!=`. For checking if thing `A` is greater than thing `B`, use `A > B`, and for less than use `A < B`. Finally, Python has a *negation* operator, `not`, which takes any true statement and "flips it" to false, and vice versa. For example:

```python
print(2 == 3)
print(not 2 == 3)
```

```
False
True
```
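
To see the full `if`/`elif`/`else` construct with real conditions filled in, here is a small sketch on an unrelated example. The `approval_rating` variable, its value, and the cutoffs are all hypothetical, chosen only for illustration.

```python
approval_rating = 47  # hypothetical value, in percent

if approval_rating > 50:
    verdict = "Above 50"
elif approval_rating == 50:
    verdict = "Exactly 50"
else:
    verdict = "Below 50"

print(verdict)  # Below 50
```

Python checks the conditions from top to bottom and runs only the first block whose condition is true. Since 47 is neither greater than nor equal to 50, the `else` block is the one that runs.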


**EXERCISE**: Try to use these operators to implement the conditional printing. Make a new `election.py` file and open it in Sublime. Your code should print `"Clinton Wins"` if the value of `clinton_delegates` is greater than the value of `trump_delegates`, and `"Trump Wins"` otherwise.

<a href="./election.py" target="_blank">Click here for the solution.</a>

**MORE ADVANCED EXERCISE**: Let's make our program slightly less boring. Rather than setting the `delegates` variables to particular numbers, let's randomly generate two numbers that add up to 538. In Python, to generate a random integer between `a` and `b` (inclusive), you use the `random.randint(a,b)` function. However, this is our first encounter with Python *packages*, since this function is not automatically loaded into Python by default. So, just like the `library(<package name>)` function in R, in Python you type `import <package name>` at the top of your `.py` file. In this case, `random.randint()` is a function from the `random` library, so you'll type `import random` at the top of your file. Now make a new `.py` file called `election_random.py` and use this function to randomly assign delegates to each candidate before the `if` statement, ensuring that they add up to 538. Now each time you run `election_random.py` your program will produce a non-deterministic output! Slightly less boring.

Highlight this text for a hint if you're stuck: <span style="background-color:black;">If you generate one number x between 0 and 538, you get the other number for free via 538 - x.</span>

<a href="./election_random.py" target="_blank">Click here for the solution.</a>
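
If `random.randint()` is new to you, the following sketch shows it on an unrelated problem (rolling a six-sided die), so as not to give away the exercise. Both endpoints are included, so `random.randint(1, 6)` can return 1, 6, or anything in between. The variable name `roll` is made up for this example.

```python
import random

roll = random.randint(1, 6)  # a random integer from 1 to 6, inclusive
print(roll)                  # changes from run to run

if roll == 6:
    print("Lucky roll!")
else:
    print("Try again")
```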

### Loops

We're now ready for probably the *main* Python construct you will use in your future endeavors: the **for loop**. A for loop is useful any time you want the program to iterate through a list: whether a sequence of numbers or a list of values. A for loop begins with the keyword `for` and then provides (a) the list to be looped over (`list_name` below) and (b) what you want Python to call the "current" item of the list that you're looking at (`variable_name` below). Though this may sound a bit daunting, looking at the code hopefully makes it less so:

```python
for variable_name in list_name:
    do a thing using variable_name 
```

So if the list referred to by `list_name` has `N` elements, the code block inside the loop (the "`do a thing using variable_name`" part) will be run `N` times (or `N` "iterations"), with `variable_name` being updated to point to the subsequent element in the list at the end of each iteration. Thus we can now use this construct to print out our list of GOATs:

```python
for current_goat in the_goats:
    print(current_goat)
```

```
Michael Jordan
Lebron James
Kobe Bryant
Arvydas Sabonis
```
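
Loops become more useful once the body does something other than print. As a sketch, the following combines a for loop with an `if` statement to add up only the positive entries of `lucky_numbers` (redefined here so the snippet runs on its own; `positive_total` and `current_number` are names made up for this example).

```python
lucky_numbers = [42, 109, 77, 3, -5]

positive_total = 0
for current_number in lucky_numbers:
    if current_number > 0:
        # add the current element to the running total
        positive_total = positive_total + current_number

print(positive_total)  # 231
```

Notice the two levels of indentation: the `if` line is inside the loop, and the line that updates `positive_total` is inside the `if`.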


### Dictionaries

### Type Conversion

## Data Science (BIG DATA Science)

We probably won't get to this within an hour (we probably won't get halfway through this tutorial tbh), but it's still useful to read this after to know how to start doing actual *social science stuff* in Python. On this front, the key package you need to know about is **Pandas**. The best way to think of Pandas, imo, is as a Python equivalent of Stata with everything besides the regression functions. I know you're probably thinking... that's the whole point of Stata. But hear me out: (a) you can do lots of stuff in Pandas that you can't do in Stata and (b) you can always export `dta` or `csv` files out of Pandas and into Stata by calling the `to_stata()` or `to_csv()` method on your data frame.

Building on point (a), the difference between what you can do in Pandas versus what you can do in Stata is kind of staggering: you can have *more than one* dataset open at the same time, you can mix-and-match giant text data and giant numerical data within the same data frame, you can **MACHINE LEARN THINGS WITH BIG DATA AND SCIENCE AND ALGORITHMS AND HACKING**, and so on.

TODO: Really simple Pandas example. Load in a csv, compute means, drop rows, maybe a merge at the end if there's time

## Next Steps