# QMUL SBM PhD Python Workshop
## Introduction to Python Programming

This is the first part of the PhD Workshop on [Python](https://www.python.org/) programming. In this session, we introduce the basics of programming with Python, covering variable assignment and the fundamental data types (numeric, text, list, tuple, set, and dictionary). In the following sessions we will work on extracting and cleaning data and on processing textual data with natural language processing techniques.

The first set of computer lab tutorials on Python programming follows the structure and organisation of McKinney, W. 2017. Python for Data Analysis, 2nd Edition, O'Reilly, but there are many equally good online resources. We might also cover web scraping if time permits.

## Install Python and Jupyter Notebook on Your Personal Computers

You can install Python and Jupyter Notebook by downloading the most recent stable version of the [Anaconda](https://www.anaconda.com/products/individual) distribution. Different interpreters and development environments can be used with Python. We will use the [Jupyter Notebook](https://jupyter.org/), a web-browser-based application that lets you run code and insert text, equations, and graphics alongside it. The notebooks can also be opened directly on Google Colab.

After installation, you can open a Jupyter notebook in which you can write and execute code. Follow the steps on the [Jupyter Notebook website](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html) for operating-system-specific instructions, which involve running `jupyter notebook` in a terminal. You can `rename` and `save` notebooks from the `File` menu.

## Markdown and Code Cells

You can insert **code** and **markdown** cells into Jupyter notebooks. Code cells are for typing and running Python code. Markdown cells are for the text content and organizing and formatting documents. 

You can add section titles with `#`, `##`, `###`, and create numbered or bulleted lists by starting each line with the corresponding marker, e.g. `1.`, `2.`, `3.` for a numbered list or `-` for a bulleted list. You can format text with, for instance, `*italic_text*` and `**bold_text**`. Visit the [Markdown Guide](https://www.markdownguide.org/) for further details.

You can change the cell type from the drop-down menu at the top or choose the cell and press the keys <kbd>y</kbd> and <kbd>m</kbd> for the code and the markdown cells, respectively. In Google Colab, you need to press <kbd>ctrl</kbd> + <kbd>m</kbd> + <kbd>y</kbd> and <kbd>ctrl</kbd> + <kbd>m</kbd> + <kbd>m</kbd>, respectively.

Python is an interpreted language, meaning that you can freely execute different segments of code (which are cells in Jupyter notebooks) without first compiling them to machine code. This reduces development time and facilitates interactive experimentation with data, but it slows down execution, especially for computationally intensive tasks. This is why most heavy computations are implemented in other languages such as C and C++, which are then wrapped in Python.

Clicking *Run* or pressing <kbd>shift</kbd>+<kbd>enter</kbd> (or <kbd>ctrl</kbd>+<kbd>enter</kbd> in Jupyter Notebook) runs the selected cell (more options are available under *Cell*).

## Getting Started

We shall now start using Python as a simple calculator. You can use the operators `+` (add), `-` (subtract), `*` (multiply), `/` (divide), `**` (power), `%` (modulus) in *expressions*. For instance:

In [None]:
# Calculate (3 - 5 * 1.8) / 1.2
__3 __ 5 __ 1.8 __ 1.2
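
As a brief illustration that uses different numbers from the exercise above, the following sketch applies each operator in turn. The values are arbitrary and chosen only for illustration.

```python
# Arithmetic operators applied to arbitrary example values
print(7 + 2)    # addition: 9
print(7 - 2)    # subtraction: 5
print(7 * 2)    # multiplication: 14
print(7 / 2)    # division always returns a float: 3.5
print(7 ** 2)   # power: 49
print(7 % 2)    # modulus (remainder of the division): 1

# Parentheses control the order of evaluation
result = (10 - 2 * 3) / 4
print(result)   # 1.0
```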

### Variable Assignment

An assignment evaluates the expression on the right-hand side of `=` and binds the result to the variable named on the left-hand side.

In [None]:
# Assign square of 3 to x
x = __

You can print the value of the variable `x` by simply running `x` or `print(x)`. `print()` is the first *function* we are seeing and using. A function is a block of code that takes inputs through its arguments, performs some task, and potentially returns a value at the end.

In [None]:
print(__)

Some data types, such as integers, floating-point numbers, and strings, are *immutable*: their value cannot be changed in place. If you appear to alter the value, Python creates a new object and rebinds the variable name to it.

Check what results you expect to see:

In [None]:
x = 10
y = x 
x *= 2 #This is called augmented assignment and it is equivalent to x = x * 2
print('x:', x, ',y:', y) #You can print multiple values by putting a comma in between 
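
The same rebinding behaviour can be observed with strings. The sketch below uses the `is` operator, which is not introduced in this workshop text, to check whether two names refer to the same object; the variable names are illustrative only.

```python
a = "data"
b = a             # both names refer to the same str object
print(a is b)     # True

a += " science"   # builds a new str object and rebinds the name a
print(a)          # data science
print(b)          # data (the original object is unchanged)
print(a is b)     # False
```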

## Data Types

Python has several built-in data types and you can define new types by creating classes. A thorough coverage of object-oriented programming is beyond the scope of this workshop. 

You can learn the type of an object by calling the `type` function. Here we focus on the most fundamental data types. 

### Numeric Data Types
Numeric data is held as integers (`int`) and floating-point numbers (`float`).

In [None]:
x = 1
y = 2.8
# Print the types of the variables x and y
print(__(x), __(y), __(x + y)) 
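
As a small illustrative sketch with values unrelated to the exercise, the following shows how `int` and `float` interact. The floor-division operator `//` and the conversion functions `int()` and `float()` are not introduced in the text above and are included here only for illustration.

```python
n = 7
print(type(n))        # <class 'int'>
print(type(n / 2))    # <class 'float'>, because / always returns a float
print(n // 2)         # 3, floor division keeps the result an int
print(int(2.8))       # 2, conversion to int truncates toward zero
print(float(3))       # 3.0
```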

For numerical computations, the most commonly used Python package (library) is `numpy`, which most importantly provides a collection of mathematical functions and the `array` type (`ndarray`). We will look at `numpy` in more detail later but we shall now illustrate how it is used. For using functions or classes from a package, you should first `import` the package.

In [None]:
import numpy as np # np is an alias for numpy, which we will use to access numpy functions

We can now call functions and access constants from the `numpy` package, for instance `np.sqrt()`.

In [None]:
# Print the result of the expression pi + sqrt(3)
print(__.__+__.__(3))

### Text Data Type

The text data type in Python is `str`, which is immutable but iterable. You define a `str` by enclosing text in single quotes `''` or double quotes `""`.

In [None]:
# Assign your first name and last name to the respective variables
first_name = __
last_name = __

You can concatenate `str` objects by `+`: 

In [None]:
# Create your full name variable
name = first_name __ ' ' __ last_name
# Print your full name:
__

You can access the individual characters in your string variable because it is a sequence. Python is zero-indexed, meaning that the indices of the first, second, and third elements are 0, 1, and 2, and so on. Subscripting is done with `x[n]`, which returns the element at index `n` of `x`; for example, `x[3]` returns the fourth element.

In [None]:
# Print the second letter of your name
name[__]

The last element can be accessed by -1 and you can go backwards by -1, -2, -3, etc.

In [None]:
# Print the first two and last two letters of your name
print(__,__,__,__)

You can *slice* the string by giving the first (inclusive) and last (exclusive) indices of the slice. For instance, `x[1:4]` returns the elements of `x` at indices 1, 2, and 3.

In [None]:
# Return the second to fourth letters of your name, including both
__
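
The indexing and slicing rules above can be sketched on a sample string. The string `"Python"` is a hypothetical example chosen for illustration rather than the `name` variable from the exercises.

```python
word = "Python"    # hypothetical example string

print(word[0])     # P   (first character, index 0)
print(word[-1])    # n   (last character)
print(word[1:4])   # yth (indices 1, 2, 3; index 4 is excluded)
print(word[:2])    # Py  (an omitted start means "from the beginning")
print(word[-2:])   # on  (an omitted end means "to the end")
```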

Note that you cannot change individual characters of a string because `str` objects are immutable.

In [None]:
# Try the following
name[3] = 'z'
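
A minimal sketch of what happens, and of one common workaround, is given below. It uses a `try`/`except` block, which is not covered in this part of the workshop, only so that the example runs to completion; the string `"Python"` is a hypothetical example.

```python
word = "Python"

try:
    word[0] = "J"              # item assignment is not allowed on str
except TypeError as err:
    print("TypeError:", err)

# Build a new string instead of modifying the existing one
new_word = "J" + word[1:]
print(new_word)   # Jython
print(word)       # Python (the original string is unchanged)
```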

You can get the length of a `str` object (and any iterable) by calling the `len()` function:

In [None]:
# Print the number of letters in your name
__

#### Str Methods
When you create a string object, it comes equipped with a set of methods. You can get a list of the methods with `dir(str)`. Some useful methods are as follows:
- `split`: splits the string into a list of words separated by whitespace
- `count`: counts the occurrences of the provided substring in the string
- `find`: returns the index of the first occurrence of the substring (`-1` if it is not found)
- `isalpha`: returns `True` if all characters are alphabetic, `False` otherwise
- `isdigit`: returns `True` if all characters are digits, `False` otherwise
- `lower` (`upper`): returns a copy with all letters converted to lower (upper) case
- `replace`: returns a copy with all occurrences of the substring replaced by the new substring
- `join`: joins a sequence of strings, using the string it is called on as the separator
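
The methods listed above can be sketched on a short example string. The string and the substrings used here are hypothetical and differ from those in the exercises below.

```python
phrase = "Data data everywhere"         # hypothetical example string

print(phrase.split())                   # ['Data', 'data', 'everywhere']
print(phrase.count("data"))             # 1 (matching is case-sensitive)
print(phrase.lower().count("data"))     # 2 (methods can be chained)
print(phrase.find("data"))              # 5 (index where the match starts)
print(phrase.find("xyz"))               # -1 (substring not found)
print("abc".isalpha(), "abc1".isalpha())   # True False
print("2024".isdigit())                 # True
print(phrase.replace("data", "info"))   # Data info everywhere
print("-".join(["a", "b", "c"]))        # a-b-c
```

None of these calls changes `phrase` itself; each returns a new object, which is consistent with `str` being immutable.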

In [None]:
sentence = "We have all been looking forward to this day"
# split the sentence to a list of words
words = __
#print words
__

In [None]:
# Count the number of instances of the phrases 'THE' / 'the' / 'tHe' ... and also the start index of the first instance
text = "instances of THE, while the first starts at index"
print('counts of "the" ', __, ' - first "the" starts at index: ', __)

## LIST

A list is a sequence data type that provides an iterable collection of individual elements. You can create a list of any objects you want: for instance, a list of numbers, a list of strings, a list of mixed numbers and strings, a list of lists, a list of functions, and so on.

In [None]:
# list of integers 2, 7, 9
first_list = [2,7,9] 
# print the list
__

In [None]:
# create a list of strings 'apple' and 'orange'
second_list = __
# print the list 
__

An empty list is created by `x = []`, which might be needed if you populate a list through a loop.

In [None]:
x = [] #empty list

You can concatenate lists by using `+`. Note that the data types of the elements are not affected.

In [None]:
# Append second_list to the end of the first_list
__ __ __

You can convert other data types to list by the `list()` function, which is useful in some instances.

In [None]:
my_string = __
# Create a list of characters in my_string
__
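
Concatenation and conversion can be sketched as follows. The list names and values are hypothetical, and `range()` is a built-in that is not introduced in the text above; it is used here only as another example of an iterable.

```python
numbers = [1, 2]
labels = ["a", "b"]

combined = numbers + labels    # + builds a new list
print(combined)                # [1, 2, 'a', 'b']
print(numbers)                 # [1, 2] (the operands are unchanged)

print(list("abc"))             # ['a', 'b', 'c']
print(list(range(3)))          # [0, 1, 2]
```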

Lists are sequences, which means you can index and slice them just as you do strings.

In [None]:
x = [1, 4, "three", -2.8]
# Subscript the element in index 2
__

In [None]:
# Slice the list of all elements except the last one
x[__]

Lists are mutable, meaning you can change the values of individual elements, unlike strings and the basic numeric data types.

In [None]:
# Change the value of "three" to 3
__
#Print x
__

### List Aliasing 

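Lists are mutable, and assignment in Python binds a name to an object rather than copying it. When you assign a variable that holds a list to another variable, the two variables become aliases for the same list object; no copy is made, so a change made through one name is visible through the other. You must keep this in mind when working with lists.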

In [None]:
# What output do you expect from the following?
x = [3, 6, 8]
y = x
y[2] = -1
print(x, y)

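If you want a shallow copy, you can use the `list()` function. A shallow copy is a new list object, but any nested objects inside it are still shared with the original.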

In [None]:
# What output do you expect from the following?
x = [3, 6, [1, 7, 2]]
y = list(x)
y[1] = -1
print(x, y)
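
The difference between an alias, a shallow copy, and a fully independent copy can be sketched as below. The example uses `copy.deepcopy` from the standard library `copy` module, which is not mentioned in the text above, and the variable names are illustrative only.

```python
import copy

original = [1, 2, [10, 20]]
alias = original                  # same object under a second name
shallow = list(original)          # new outer list, shared inner list
deep = copy.deepcopy(original)    # new outer list and new inner list

alias[0] = "changed"              # visible through original
shallow[1] = "changed"            # affects only the shallow copy
shallow[2][0] = "shared"          # inner list is shared with original
deep[2][1] = "independent"        # affects only the deep copy

print(original)   # ['changed', 2, ['shared', 20]]
print(shallow)    # [1, 'changed', ['shared', 20]]
print(deep)       # [1, 2, [10, 'independent']]
```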

### List Methods
Lists have various methods for inserting, removing, and sorting elements. These are mutator methods, meaning that they change the list object itself. You call the method `method1` of the object `x` with `x.method1()`.
- `insert(ind, y)` inserts element `y` at index `ind` and shifts all subsequent items one position to the right
- `append(y)` adds element `y` at the end of the list
- `extend([y1, y2, y3])` appends elements `y1`, `y2`, and `y3` at the end of the list
- `pop(ind)` removes and returns the element at index `ind`; it removes the last element if no index is provided
- `sort()` sorts the list in place if its elements can be compared with one another (for example, all numbers or all strings)
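
A short sketch of these methods on a hypothetical list of integers follows; the comments trace the state of the list after each call.

```python
items = [5, 1, 4]

items.insert(1, 9)        # [5, 9, 1, 4]
items.append(7)           # [5, 9, 1, 4, 7]
items.extend([2, 3])      # [5, 9, 1, 4, 7, 2, 3]
last = items.pop()        # removes and returns 3
first = items.pop(0)      # removes and returns 5
outcome = items.sort()    # sorts in place and returns None

print(items)              # [1, 2, 4, 7, 9]
print(last, first)        # 3 5
print(outcome)            # None
```

Because mutator methods such as `sort()` return `None`, writing `items = items.sort()` would discard the list.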

In [None]:
y = [-6, 2, 'python', 3.1, [1.2, 0]]
y.insert(2, 'anaconda')
# What output do you expect?
y

In [None]:
# Add 'jupyter' at the end of the list
__
# Add [-2, 6] at the end of the list
__
# Remove the first (zeroth index) element
__
# Print y
__

In [None]:
list1 = [-6, 7, 2]
# Sort the elements
__
# Print list1
__

In [None]:
# What output do you expect?
list2 = [0, -2.2, 'accounting']
list2.sort()
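
As a hedged sketch of sorting behaviour, the example below first attempts to sort a list whose elements cannot be compared with one another, and then uses the built-in `sorted()` function, which is not mentioned in the text above and returns a new sorted list instead of modifying the original. The `try`/`except` block is included only so that the example runs to completion.

```python
mixed = [3, "b", 1.5]             # hypothetical list of mixed types
try:
    mixed.sort()                  # int and str cannot be compared
except TypeError as err:
    print("TypeError:", err)

values = [3, -1, 2]
ordered = sorted(values)          # returns a new list
print(ordered)                    # [-1, 2, 3]
print(values)                     # [3, -1, 2] (original is unchanged)
print(sorted(values, reverse=True))   # [3, 2, -1]
```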

## TUPLE
A tuple is a fixed-length, immutable sequence of Python objects. It is very similar to a list but cannot be altered after creation. You can convert any sequence or iterable to a tuple using `tuple()` (similar to `list()`). Elements can be accessed with square brackets `[]`, as for other sequence types.

In [None]:
# Create a tuple
tup = 1, 2, 3, 'a', None, [1, 6, 2]
tup

In [None]:
# Slice elements in indices 2  to 4 (inclusive)
__

In [None]:
# What do you expect?
tup[-1] = 0
tup

In [None]:
# What do you now expect?
tup[-1][1] = -2
tup
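
The distinction between rebinding a tuple element and mutating an object stored inside a tuple can be sketched as follows. The tuple contents are hypothetical, and the `try`/`except` block is used only so that the example runs to completion.

```python
record = ("firm A", [2.1, 2.4])   # hypothetical tuple holding a list

try:
    record[0] = "firm B"          # tuple elements cannot be reassigned
except TypeError as err:
    print("TypeError:", err)

record[1].append(2.7)             # the list inside is still mutable
print(record)                     # ('firm A', [2.1, 2.4, 2.7])
```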

Tuples can be unpacked:

In [None]:
a = (-1, 1)
x, y = a
# What output do you expect?
print(x, y)
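
Two further uses of unpacking are sketched below with hypothetical values: swapping two variables, and collecting the remaining elements with a starred name. The starred form is not covered in the text above.

```python
point = (3, 4)
px, py = point            # unpack into two variables
px, py = py, px           # swap without a temporary variable
print(px, py)             # 4 3

first, *rest = (1, 2, 3, 4)   # starred name collects the remainder as a list
print(first)              # 1
print(rest)               # [2, 3, 4]
```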

## SET
A set is an unordered collection of unique elements. Sets can be created with the `set` constructor or with curly brackets `{}` containing the elements (an empty `{}` creates a dictionary, so use `set()` for an empty set). They are very handy because they support set operations such as union, intersection, and difference.
- Intersection of sets `a` and `b`: `a & b` or `a.intersection(b)`
- Union of sets `a` and `b`: `a | b` or `a.union(b)`
- Difference of sets `a` and `b` (elements of `a` that are not in `b`): `a - b` or `a.difference(b)`

In [None]:
x = set([2, 2, 2, 1, 3, 3])
y = {3, 'a', 6, 1, 'a'}
# What outputs do you expect?
print(x,y)

In [None]:
a = {0, 1, 2, 3, 4}
b = set([3, 4, 5, 6, 7, 8])
# Print union of a and b
print(__)
# Print intersection of a and b
print(__)
# Print difference of a from b
print(__)
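
The set operations can be sketched on two hypothetical sets that differ from those in the exercise. Since sets are unordered, the results are passed through `sorted()` so that the printed order is predictable.

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

print(sorted(evens | small))   # union: [0, 1, 2, 3, 4, 6]
print(sorted(evens & small))   # intersection: [0, 2]
print(sorted(evens - small))   # in evens but not in small: [4, 6]
print(sorted(small - evens))   # in small but not in evens: [1, 3]
print(3 in small)              # membership test: True
```

Note that the difference is not symmetric: `evens - small` and `small - evens` give different results.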

## DICT

The `dict` structures are dictionaries of **key-value** pairs, where values can be any Python objects and keys must be hashable (for example strings, numbers, or tuples of these). Dictionaries are critical for labelling objects; this labelling provides, for instance, the basis of indices in tabular data (Pandas), as we will soon see. You can create a dictionary by putting `key:value` pairs, separated by commas, inside curly brackets `{}`.
For example, let's create a dictionary of profits (say in million pounds):

In [None]:
firms = {"firm A": 2.68, "firm B": None, "firm C": 1.13}
firms

You access values by key, i.e. `my_dict[key]`:

In [None]:
# What is the profit of firm A?
__

You can insert new key-value pairs or update existing ones by assignment: `my_dict[key]=value`

In [None]:
# Decrease the profit entry of firm A by 0.13
__

# Insert new entry for firm D that has a profit of 3.0
__

# print firms dictionary
__

You can remove an entry with the `pop()` method, as with lists, except that you pass the key rather than an index; `pop()` returns the removed value.

In [None]:
# Remove the firm B entry
__

# Print the dictionary
__

The `keys()` and `values()` methods return the keys and the values of the dictionary.

In [None]:
# Keys
print(__)

# Values
print(__)
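
The dictionary operations described above can be sketched on a hypothetical dictionary of prices. The `get()` method and the optional default argument of `pop()` are not covered in the text above and are shown only as an illustration of handling missing keys.

```python
prices = {"apple": 0.5, "pear": 0.8}    # hypothetical data

prices["apple"] = 0.6                   # update an existing entry
prices["kiwi"] = 1.2                    # insert a new entry
removed = prices.pop("pear")            # remove by key, returns 0.8
missing = prices.pop("plum", None)      # default avoids a KeyError

print(prices)                    # {'apple': 0.6, 'kiwi': 1.2}
print(list(prices.keys()))       # ['apple', 'kiwi']
print(list(prices.values()))     # [0.6, 1.2]
print(prices.get("plum", 0.0))   # 0.0 (default for a missing key)
```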

You can pair up sequences with the `zip` function to form dictionaries, i.e. `dict(zip(seq1, seq2))`:

In [None]:
countries = ('UK', 'Spain', 'Italy', 'France') # keys
capitals = ('London', 'Madrid', 'Rome', 'Paris') # values

# Create a dict that returns capital for the country chosen
my_dict = __
# Print my_dict
my_dict
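
To show what `zip` produces before it is passed to `dict`, here is a small sketch on hypothetical data that is unrelated to the exercise above. The ticker symbols and prices are made up for illustration.

```python
tickers = ["AAA", "BBB", "CCC"]     # hypothetical keys
quotes = [10.0, 20.5, 7.25]         # hypothetical values

pairs = list(zip(tickers, quotes))  # zip pairs elements by position
print(pairs)      # [('AAA', 10.0), ('BBB', 20.5), ('CCC', 7.25)]

lookup = dict(pairs)                # each pair becomes key: value
print(lookup["BBB"])                # 20.5
```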