Preliminary skills in Python for Social Data Science 
===

Congrats on your offer for our Social Data Science program! This program will be primarily quantitative in nature although you will definitely be exposed to a diverse set of research methodologies, all of which we hope you will find insightful and useful. In order to make the best use of your time here, we believe that having some basic experience with Python would be a considerable asset. In this course we will be using Python extensively. There are many other languages being used for data science. In time you may be exposed to some of them, at least in passing. Yet a strong comfort with Python will be key.  

We will be teaching most of the skills that you will need to get started in the first term (Michaelmas Term, 2019). Then you can apply these in your option courses in Hilary Term 2020. Finally, you will be able to focus on specific skills to apply to a compelling research problem in the third term, Trinity Term 2020. 

## How to use this notebook 
This notebook is a representation of the skills that you should be familiar with. We will be starting from almost-basics in the course but having this sense of what is in the basics should help. We will cover the material on this sheet in an optional workshop on Thursday, October 10, 2019 as a part of induction week. If you are confident in this material you might want to attend the parallel session on 'Developing a Research Question' instead. 

It is okay if you do not know everything on this sheet inside out before the course starts, but these are topics that we will only explain briefly. The basics are fairly easy to grasp through some independent study. This will give us considerably more time for the tricky parts of Python, particularly as found in ```pandas```, the Python library for data analysis. Basically, we expect you to be able to open **this** notebook in Jupyter Lab on your own computer, run some commands, and understand some basic syntax.

If you want resources for practicing these skills there are many introductory texts. For a patient and clear overview, we recommend (and teach from) [Python Crash Course by Eric Matthes](https://ehmatthes.github.io/pcc/). However, you might want to start with an even brisker treatment. For that we recommend having a look at the notebooks from Jake VanderPlas's "A Whirlwind Tour of Python": https://github.com/jakevdp/WhirlwindTourOfPython. Links to the appropriate pages of the Whirlwind Tour are given in each section. 

The Whirlwind Tour URL is a GitHub repository, much like the one for this course. In it are a number of short Python notebooks for the basic skills. You can just click on the files that end in ```.ipynb``` and read them in the browser. However, to execute the code in them you will have to open them on your own computer. 

To use the GitHub repository, the simplest way is to download it by selecting the green "Clone or Download" button in the upper right header, and then "Download Zip". Once it's downloaded, unzip the archive, place it in an appropriate folder and then launch Jupyter Lab (which can be installed as a part of the Anaconda distribution for scientific Python). To install Jupyter Lab, simply install Anaconda and then click on the Anaconda Navigator to launch it. Jupyter Lab will be the first option in the upper left corner. 

**NOTE: We will be using Python 3.7. Install this version, not the 2.7 version. If you are on Windows, please consider upgrading to Windows 10. You should be able to use the 64-bit version regardless, but Windows 10 makes it easier.**

From Jupyter Lab you should be able to navigate to a folder containing the Python notebooks (notice that Jupyter Lab's side bar is also a file navigator). For what it is worth, you do not have to 'clone' or even know much about GitHub if you simply want to download and extract the files. Later, we will want to deepen your GitHub skills once you are more familiar with programming and servers. 

# Basics I: Running Python 

Python is a programming language. The code you write is read by the Python interpreter, which translates it into lower-level instructions that the computer can execute. The most direct way to make statements in Python is to type commands into a "Python console" (for example, by opening the terminal, or on Windows the "Anaconda Prompt", then typing "python"). If you already have Python installed, you will see a welcome message and a series of three chevrons, like so (the welcome message will likely differ slightly depending on your setup):

~~~ python
Python 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
~~~

Here you can enter commands. For more complex work, you will want to turn to Jupyter notebooks and Python script files (\*.py files). 

To exit the Python console, type ```exit()``` or press control-D (on Windows, control-Z followed by Enter). Then you will see the standard prompt, which will likely be either a single chevron (```>```) or a ```$```. 

Notebooks can be run by launching Jupyter Lab. If you are in a Terminal window or an Anaconda Prompt, then instead of typing "python", type "jupyter lab". It should launch your default browser and navigate to "http://localhost:8888/lab", which is the Jupyter Lab environment. Note that typing "jupyter lab" will not work in the standard Windows command prompt, but only in the "Anaconda Prompt", which you can access through the Start menu or by pressing the Windows key and typing its name in the search bar (assuming you've already installed Anaconda). 

In terms of basic Python, you will want to know: 
* How to assign a variable: ```var1 = "hello world"```
* What counts as a variable and what counts as an operator: e.g., ```var1``` versus ```+```
* How to run a command in a Jupyter notebook (using Jupyter Notebook or Jupyter Lab).
* How to make a comment in code: ```# this is a comment```
* How Python uses whitespace.
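
As a rough illustration of the points above, here is a minimal sketch that assigns a variable, uses an operator, includes comments, and relies on indentation. The names ```greeting``` and ```shout``` are made up for this example.

```python
# This line is a comment: Python ignores everything after the '#'.
var1 = "hello world"          # assign a string to the variable var1

# '+' is an operator; var1 and greeting are variables.
greeting = var1 + "!"
print(greeting)

# Whitespace matters: the indented lines belong to the if statement.
if len(greeting) > 5:
    shout = greeting.upper()
    print(shout)
```

If you remove the indentation from the last two lines, Python will report an ```IndentationError``` rather than guessing what you meant.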

For more information on basic operations, including some basics of syntax and how to assign values to variables see:
- [2. How to Run Python Code](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/01-How-to-Run-Python-Code.ipynb)
- [3. Basic Python Syntax](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/02-Basic-Python-Syntax.ipynb)
- [4. Python Semantics: Variables](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/03-Semantics-Variables.ipynb)
- [5. Python Semantics: Operators](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/04-Semantics-Operators.ipynb)

What we will cover in the course:
- We will be discussing how to use Jupyter, some advanced tips and how to install different utilities in the sidebar. We will assume you will know these basics. We will also cover some of the logic behind when and why to program. 

# Basics II: Variables and basic data structures 

## Scalar Data Types
Python stores data in variables. You can store a variety of data types in a variable, and Python will usually know what to do with them. You can also store these variables in collections. Scalar data types are types that hold a single value, such as one number. The string type has properties of both scalars and collections, but that's a matter for classification. What's more important is knowing when and how to use a string. Here are some basics about strings and numbers that are worth considering:  

## Strings 

Strings are sequences of characters, encased in quotes. Before the course starts you should know:

* How to assign text to a variable: ```newvar = "Hello"```
* What the difference is between using ```'```, ```"``` and ```'''```, e.g., ``` 'one string' + "another" + '''third''' ```
* How to access a character in a string by index: ```print(newvar[0])```
* How to transform a string variable to all upper case or all lower case: ```newvar.upper()```
* What escape characters are and how to use them: ```"\n"``` and ```"\""``` for example. 
* How to determine the length of a string: ```len(newvar)```
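
The string skills listed above can be tried out with a short sketch like the following. The variable names other than ```newvar``` are invented for illustration.

```python
newvar = "Hello"

# Single, double, and triple quotes all create strings.
# Triple quotes may also span several lines.
combined = 'one string, ' + "another, " + '''third'''
print(combined)

first_char = newvar[0]        # indexing starts at 0, so this is "H"
print(first_char)

print(newvar.upper())         # "HELLO"
print(newvar.lower())         # "hello"

# Escape characters: \n is a newline, \" is a literal double quote.
quoted = "She said \"hi\"\nand left."
print(quoted)

print(len(newvar))            # number of characters: 5
```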

## Integers and numbers

Integers are whole numbers. Floating point numbers allow decimals. 

* How to assign a number to a variable. 
* How to use the basic arithmetic operators: ```-, /, //, +, *, **``` (note that ```^``` is not exponentiation in Python; it is the bitwise XOR operator)
* What happens when you combine an integer with a floating point number?
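
To make the arithmetic concrete, here is a small sketch. The variable names ```whole```, ```decimal``` and ```mixed``` are just placeholders.

```python
whole = 7          # an integer (int)
decimal = 2.0      # a floating point number (float)

print(whole - 2)   # subtraction: 5
print(whole / 2)   # true division always gives a float: 3.5
print(whole // 2)  # floor division drops the remainder: 3
print(whole * 2)   # multiplication: 14
print(whole ** 2)  # exponentiation uses **: 49

# Combining an int with a float gives a float.
mixed = whole + decimal
print(mixed, type(mixed))
```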

This is covered in: 
- [5. Built-In Scalar Types](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/05-Built-in-Scalar-Types.ipynb)

What we will teach in class: 
- We will assume that you understand the differences between integer and floating point numbers. We will review Boolean operators. We will also discuss different kinds of string encodings, for example, issues with Unicode and emoji. 

# Basics III: Scalar Variables and Core Data Types

## Collections 
Collections are aggregates of values. These values can either be scalar data types or other collections (i.e., you can have a 'list of lists'). Three fundamental types of collections in Python are 'lists', 'dictionaries', and 'sets'. 

### Lists 

Lists are sequentially ordered collections of objects. An object is usually a variable or a set of variables collected together as some 'type'. For example, you could have a list of strings, a list of "Tweet" objects, of numbers, etc. 

~~~ python 
list1 = [ "Eggs","Bacon","Tomatoes","Mushrooms","Hash browns"]
~~~

* How to create a list: ```ListBasic = []```
* How to add to a list: ```ListBasic.append(NEWVAR)```
* How to find an element in a list: ```"Hello" in ListBasic```
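
Put together, the list operations above might look like this. It is only a sketch; ```list_basic```, ```has_eggs``` and ```has_toast``` are illustrative names.

```python
list_basic = []                      # start with an empty list
list_basic.append("Eggs")            # append adds an item to the end
list_basic.append("Bacon")

has_eggs = "Eggs" in list_basic      # membership test gives True or False
has_toast = "Toast" in list_basic

print(list_basic)
print(has_eggs, has_toast)
print(list_basic[0])                 # lists are indexed from 0, like strings
```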

### Dictionaries 

Dictionaries are collections of key-value pairs, where the key is typically a number or string and the value can be any kind of object. For example, you could have a dictionary where the keys are genres and the values are lists of artists such as: 

~~~ python 
dict1 = {"Disco":["Donna Summer","The Bee-Gees"],
         "Classical":["Mozart","Beethoven","Bach"],
         "Soul":["Aretha Franklin","Ray Charles"]
        }        
~~~

* How to add a key value pair to a dictionary: ```dict1["Country"] = ["Dolly Parton"]```
* How to query for a single value: ```print(dict1["Soul"])``` (asking for a key that is not in the dictionary, such as ```"Rock"```, raises a ```KeyError```)
* How to ask if a key is in a dictionary: ```if "Rock" in dict1: print(dict1["Rock"])```
* How to iterate over keys, values, and items: 

~~~ python 
for i in dict1.keys(): print(i)          # each key
for j in dict1.values(): print(j)        # each value
for i, j in dict1.items(): print(i, j)   # each key-value pair
~~~

### Sets

Sets are collections where there is only one copy of any element in a set. So a set could include {1,3,5,7,9} but not {1,1,1,3,5}. With sets we can ask for membership, get the union of two sets (i.e., all the elements from both) or the intersection (all the _common_ elements). 

* How to create a set from a collection: ```example_set = set([1,2,3])```
* How to get the intersection / union of two sets: ```inter_set = example_set.intersection(set_2)``` and ```union_set = example_set.union(set_2)```
* How to ask if an element is in a set: ```if x in example_set: print("Found it!")```
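
The set operations can be sketched as follows. The contents of ```set_2``` and the value of ```x``` are arbitrary choices for the example.

```python
example_set = set([1, 2, 3, 3, 3])   # duplicates are dropped
set_2 = {3, 4, 5}                    # curly braces also create a set

inter_set = example_set.intersection(set_2)   # common elements
union_set = example_set.union(set_2)          # all elements from both

print(example_set)
print(inter_set)
print(union_set)

x = 2
if x in example_set:
    print("Found it!")
```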


A further review of these can be found in 
- [6. Built-in data structures](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/06-Built-in-Data-Structures.ipynb)

What we will teach in class:
- More practice with dictionaries and lists, including how to query them and nest them inside each other. We will also discuss when it is better to use a list, a dictionary, or another data structure (particularly the Series and the DataFrame). 

# Basics IV: Control Statements 
    
To control the flow of operations, Python includes a number of control statements. So if you wanted to do some operation for every element in a list, or do such an operation only under some conditions, then this would require the use of control statements. The basic control statements to be considered are: 

## If-Else statements 

An if statement evaluates a condition. If the condition is ```True```, Python runs the instructions nested inside the if statement; otherwise it skips them. To be nested, a statement is indented like so: 

~~~ python 
newvar = 5 
if newvar > 3: 
    print("Yes, it is greater than 3") 
~~~

In that case, the print statement is indented and will run because ```newvar``` is 5 and 5 is greater than 3. 

Features to know about an if statement: 
* How to use the logical comparisons: ```==, >, >=, <, <=```
* How to use an ```else``` statement
* How to use an ```elif``` statement
* How to combine two logical comparisons using parentheses: ```if (newvar > 3) and (newvar < 10): print(newvar)```
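
Here is a minimal sketch of these features working together. The thresholds and the labels stored in ```size``` are arbitrary.

```python
newvar = 5

if newvar > 10:
    size = "large"
elif newvar > 3:          # checked only if the first condition was False
    size = "medium"
else:                     # runs if none of the conditions above were True
    size = "small"
print(size)

# Two comparisons joined with 'and'; the parentheses are optional
# here but can make the grouping easier to read.
in_range = (newvar > 3) and (newvar <= 10)
print(in_range)
```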
  
## Loops

Loops allow us to iterate over elements in a collection. The classic loop statement is a ```for``` loop that does some operation *for* each element in a collection. You should know:
* How to create a for loop: ```for i in range(10): print(i)```
* How to break out of a for loop: place ```if i > 5: break``` on an indented line inside the loop body.
* What a loop variable is (i.e., do we always have to use the letter ```i```?)
* What the ```enumerate``` function does (it pairs each element with a counter).
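
A short sketch of these loop features follows. The list ```breakfast``` and the names ```seen``` and ```pairs``` are made up for the example.

```python
breakfast = ["Eggs", "Bacon", "Tomatoes"]

# The loop variable can have any valid name, not just i.
for item in breakfast:
    print(item)

# break leaves the loop early; here the loop stops once i exceeds 5.
seen = []
for i in range(10):
    if i > 5:
        break
    seen.append(i)
print(seen)

# enumerate pairs each element with its position, starting at 0.
pairs = list(enumerate(breakfast))
for position, item in pairs:
    print(position, item)
```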

Further notes on these can be found at:
- [7. Control Flow](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/07-Control-Flow-Statements.ipynb)

What we will teach in class:
- More algorithms using things like while loops, list comprehensions, and double-loops.

# Basics V: Functions and methods

A function is a way to take input, perform some operations, and return output. We will cover how to create a function in class and when it is a good idea. But you should be able to identify what a function is and what is an argument (i.e., something that we use as input for a function). 

For example, imagine we wanted to both transform a string to lower case and get rid of all characters that are not alphanumeric. It's a weird and basic example, I admit, but imagine there's a reason. This would be two operations. The first is ```s.lower()```, which takes the string ```s``` and returns that string in lower case. The second is to go through the string one character at a time and check whether each character is alphanumeric (```if s.isalnum():```). If the character is, we keep it; if not, we throw it away. 

If you have to do this procedure once, you might just write some lines of code and disregard functions. But if you have to do it a lot in different cases and parts of the code, then you might want to write a function for that. 

To do this we first define the function, including the variable that will carry our input. Then we use that function in our work. Remember that the function definition has to be run by the Python interpreter before any command that uses it. So you should place your functions at the top of your code rather than in the middle or at the bottom. Later we will show more abstract strategies for keeping your code tidy and organised. 


Here is one example of a function to accomplish the goal above (i.e., return a lower case string that only includes letters and numbers): 
~~~ python
def loweralphanum(text): 
    text = text.lower()
    newtext = ""
    for letter in text: 
        # keep only letters and digits
        if letter.isalnum():
            # strings have no append method, so build the result with +=
            newtext += letter

    return newtext
~~~

Some things about functions you will want to know: 
* How to build a simple function using ```def``` and ```return```.
* How to pass an argument into a function.
* A function does not always return a value. Can you explain why? Look at ```list.sort()``` for an example. 
* What is the difference between a function and a method? (Admittedly, this one is a little tricky and will be covered in class.)
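
To illustrate the last two points, here is a hedged sketch. The function ```add_exclamation``` is hypothetical and exists only for this example, while ```list.sort()``` and ```sorted()``` are built into Python.

```python
def add_exclamation(text):
    """Return the text with an exclamation mark added."""
    return text + "!"

result = add_exclamation("hello")   # "hello" is the argument
print(result)

numbers = [3, 1, 2]

# list.sort() is a method: it is called on an object with a dot.
# It changes the list in place and returns None.
returned = numbers.sort()
print(numbers, returned)

# sorted() is a function: it leaves the original list alone
# and returns a new, sorted list.
original = [3, 1, 2]
new_list = sorted(original)
print(original, new_list)
```

Roughly speaking, a method is a function that belongs to an object, which is why it is written after a dot.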


~~~ python
# Try it out here: 

def loweralnum(text): 
    text = text.lower()
    newtext = ""
    for letter in text: 
        if letter.isalnum():
            newtext += letter

    return newtext

oldtext = "The OII is @1St. Gile's, Oxford."
print(loweralnum(oldtext))
~~~

Output:

~~~
theoiiis1stgilesoxford
~~~


Further notes on this can be found at: 
- [8. Defining and using functions](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/08-Defining-Functions.ipynb)

What we will teach in the course: 
- We will go into greater depth in week one on the varieties of arguments you can specify, how to group functions together as a library and what a class is (i.e., what an object is). 

# Basics VI: Fault tolerance - Debugging and Exceptions

In the above code example, I experienced a couple of different bugs when I tried different things. I kept the code simple for illustration. But in research you will encounter lots of complexity as well as lots of ways of handling errors and debugging. The classic way to deal with errors is to use an exception. For example, you can ```raise``` exceptions in your own code or catch them with ```try``` and ```except```. Here is an example: 

~~~ python
x = 0 

try: 
    # 12/x is greater than 1 only when x is between 0 and 12
    if 12/x > 1: 
        print("The value is between 0 and 12")
except ZeroDivisionError: 
    print("You cannot divide by zero") 
~~~

Some things to know about exceptions:
* How to identify what exception was thrown when the code fails. 
* How to wrap some code in a ```try``` and ```except``` statement.
* What to include inside the try/except statement: ideally only the lines that might fail, rather than the entire program.
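
The points above can be sketched as follows. The functions ```safe_ratio``` and ```check_positive``` are hypothetical helpers written for this example, not part of any library.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None if the division fails."""
    try:
        # Only the line that can fail goes inside the try block.
        return numerator / denominator
    except ZeroDivisionError as err:
        # The exception's type and message say what went wrong.
        print("Caught", type(err).__name__, "-", err)
        return None

print(safe_ratio(12, 4))
print(safe_ratio(12, 0))

# raise signals a problem from within your own code.
def check_positive(value):
    if value <= 0:
        raise ValueError("value must be positive")
    return value

try:
    check_positive(-1)
except ValueError as err:
    message = str(err)
    print(message)
```

Catching a specific exception such as ```ZeroDivisionError``` means that other, unexpected errors are still reported rather than silently hidden.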

Further notes on these:
- [9. Errors and Exceptions](https://github.com/jakevdp/WhirlwindTourOfPython/blob/6f1daf714fe52a8dde6a288674ba46a7feed8816/09-Errors-and-Exceptions.ipynb)

What we will teach in class:
- We will expect you to know what an error is and where to see it. We will teach strategies for avoiding unintentional errors in your code. 

# Extra skills

We have no problem with you coming in with extra Python skills. These really are some of the basics. They come easier with practice. In class we will provide a lecture explaining these and more complex topics. In the labs, we will help you practice using these skills with real world data. Best of luck. If anything is unclear to you or you notice a breaking bug / typo, please get in contact with me at bernie.hogan@oii.ox.ac.uk. 

# Final notes 

This course is shortened from last year and all the notes are pretty extensively rewritten. You can get a feel for the course by examining the [notebooks under 2018](https://github.com/oxfordinternetinstitute/sds-python). Don't be alarmed if it seems complicated. The students from last year still ended up learning lots and getting great grades. 
