# Introduction to Python and numeric computing

In this exercise we introduce the Python programming language and the libraries that will be used during this course.
 
- Intro to Jupyter notebooks
- Basics of Python
- Python data structures such as lists, dictionaries, and tuples
- Intro to Scientific Python (Numpy, Pandas)

# I. Python

Python is an interpreted, general-purpose, high-level language.

[Python homepage](https://www.python.org)

We will be using Python 3, which is the latest major version, though Python 2 is still in use in older code.

Notebook text cells support Markdown and LaTeX, for example $\alpha$.

## Anaconda Python Distribution

We recommend using the Anaconda Python distribution. It has an easy installer and contains everything you need to begin your scientific Python journey.

https://www.anaconda.com/distribution/



## Jupyter Notebooks

We will use the Jupyter notebook environment for the exercises in this course, as it is widely used and well suited for learning.

https://jupyter.org/

As an alternative, if you cannot or do not want to install the libraries on your laptop, you can use Google Colab notebooks instead.

https://colab.research.google.com/

In [6]:
%time

Wall time: 0 ns


In [7]:
import sys 
print("Python:", sys.version)

Python: 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]


### Magic Commands
Jupyter supports **magic commands**, special commands that help with common tasks such as timing code.

There are two types of these commands:

1. Line magics: prefixed with a single `%` and applied to one line, e.g. `%time`
2. Cell magics: prefixed with a double `%%` and applied to the whole cell, e.g. `%%script`

Here we use the `%time` line magic to time a single statement.

In [23]:
%time sum(range(1, 10000))

Wall time: 0 ns


49995000

In [21]:
%%timeit
def add(a, b):   # named add so the built-in sum is not shadowed
    return a + b

add(1, 2)

111 ns ± 2.44 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [24]:
%%script bash

for i in 1 2 3; do
   echo $i
done

Couldn't find program: 'bash'
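
The `%time` and `%%timeit` magics shown above only exist inside IPython and Jupyter. As a rough sketch of the same idea in a plain Python script, the standard-library `timeit` module can be used. The variable names and the repeat counts below are arbitrary choices made for this illustration.

```python
import timeit

# Time the same expression that was used with %time above.
# number=1000 runs per measurement, repeat=5 measurements (illustrative values).
timings = timeit.repeat("sum(range(1, 10000))", number=1000, repeat=5)

# Report the best measurement, converted to seconds per single run
best_per_run = min(timings) / 1000
print(f"best of 5: {best_per_run * 1e6:.1f} microseconds per run")
```

Taking the minimum of several measurements is a common way to reduce the influence of other processes running on the machine.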


## 1. Basic data types

### **Numerical**
Integers and floats work as you would expect from other languages.


In [25]:
# integers
x = 3

print(type(x)) # Prints "<class 'int'>"
print(x)       # Prints "3"

<class 'int'>
3


In [26]:
# numeric operators
print(x + 1)   # Addition; prints "4"
print(x - 1)   # Subtraction; prints "2"
print(x * 2)   # Multiplication; prints "6"
print(x ** 2)  # Exponentiation; prints "9"

4
2
6
9


In [27]:
# shorthand operators
x += 1
print(x)  # Prints "4"

x *= 2
print(x)  # Prints "8"

4
8


In [28]:
# floating numbers
y = 2.5

print(type(y)) # Prints "<class 'float'>"

print(y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"

<class 'float'>
2.5 3.5 5.0 6.25


### **Booleans** 
Python implements all of the usual operators for Boolean logic, but uses English words (`and`, `or`, `not`) rather than symbols (`&&`, `||`, etc.).

In [29]:
t = True
f = False

print(type(t)) # Prints "<class 'bool'>"

<class 'bool'>


In [30]:
# boolean operators
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"
print(t != f)  # Logical XOR; prints "True"

False
True
False
True


### **Strings** 

Python has built-in support for strings, including concatenation, formatting, and many helper methods.

In [31]:
hello = 'hello'    # String literals can use single quotes
world = "world"    # or double quotes; it does not matter.

print(hello)       # Prints "hello"
print(len(hello))  # String length; prints "5"

hello
5


In [32]:
# string concatenation
hw = hello + ' ' + world  # String concatenation
print(hw)  # prints "hello world"

hello world


In [33]:
# string formatting
hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
print(hw12)  # prints "hello world 12"

hello world 12
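
Besides the sprintf-style `%` formatting shown above, Python 3 offers `str.format` and, from Python 3.6 on, f-strings. The following is a minimal sketch of both; the names `hw12_format` and `hw12_fstring` are made up for this illustration.

```python
hello = 'hello'
world = "world"

# str.format fills the {} placeholders in order
hw12_format = '{} {} {}'.format(hello, world, 12)

# f-strings (Python 3.6+) evaluate the expressions inside the braces
hw12_fstring = f'{hello} {world} {12}'

print(hw12_format)   # prints "hello world 12"
print(hw12_fstring)  # prints "hello world 12"

# Format specifiers work as well, e.g. two decimal places
print(f'{3.14159:.2f}')  # prints "3.14"
```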


**Other useful string methods**

In [34]:
s = "hello"

print(s.capitalize())  # Capitalize a string; prints "Hello"
print(s.upper())       # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7))     # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another;
                                # prints "he(ell)(ell)o"
    
print('  world '.strip())  # Strip leading and trailing whitespace; prints "world"

Hello
HELLO
  hello
 hello 
he(ell)(ell)o
world


## 2. Data Structures

### **Lists**

A list is the Python equivalent of an array, but is resizeable and can contain elements of different types:

In [38]:
xs = [3, 1, 2]    # Create a list
print(xs, xs[2])  # Prints "[3, 1, 2] 2"
print(xs[-1])     # Negative indices count from the end of the list; prints "2"
xs[2] = 'foo'     # Lists can contain elements of different types
print(xs)         # Prints "[3, 1, 'foo']"
xs.append('bar')  # Add a new element to the end of the list
print(xs)         # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop()      # Remove and return the last element of the list
print(x, xs)      # Prints "bar [3, 1, 'foo']"


[3, 1, 2] 2
2
[3, 1, 'foo']
[3, 1, 'foo', 'bar']
bar [3, 1, 'foo']


In [40]:
# Slicing

nums = list(range(5))     # range is a built-in that yields integers; list() collects them into a list
print(nums)               # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])          # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])           # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])           # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])            # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])          # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9]        # Assign a new sublist to a slice
print(nums)               # Prints "[0, 1, 8, 9, 4]"

[0, 1, 2, 3, 4]
[2, 3]
[2, 3, 4]
[0, 1]
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
[0, 1, 8, 9, 4]


In [41]:
# Looping

animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)

cat
dog
monkey


In [43]:
# With Index

animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))

#1: cat
#2: dog
#3: monkey


### **List Comprehension**

In [44]:
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)   # Prints [0, 1, 4, 9, 16]

[0, 1, 4, 9, 16]


In [45]:
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)   # Prints [0, 1, 4, 9, 16]

[0, 1, 4, 9, 16]


In [47]:
# Conditions

nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"

[0, 4, 16]


### **Dictionaries**

A dictionary stores (key, value) pairs, similar to a Map in Java or an object in JavaScript. You can use it like this:

In [48]:
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"
# print(d['monkey'])  # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
del d['fish']         # Remove an element from a dictionary
print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"

cute
True
wet
N/A
wet
N/A


In [49]:
# Loops

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print('A %s has %d legs' % (animal, legs))

A person has 2 legs
A cat has 4 legs
A spider has 8 legs


In [50]:
# Item access

d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print('A %s has %d legs' % (animal, legs))

A person has 2 legs
A cat has 4 legs
A spider has 8 legs


Python also has *sets* (unordered collections of distinct elements) and *tuples* (immutable ordered sequences).
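
As a brief sketch of the sets and tuples mentioned above, the following cell shows their most common operations. The variable names (`animals`, `point`, `distances`) are chosen only for this illustration.

```python
# A set is an unordered collection of distinct elements
animals = {'cat', 'dog'}
print('cat' in animals)   # Membership test; prints "True"
animals.add('fish')       # Add an element
animals.add('cat')        # Adding an existing element does nothing
print(len(animals))       # prints "3"

# A tuple is an immutable ordered sequence; unlike a list it can be
# used as a dictionary key or as a set element
point = (3, 4)
x, y = point              # Tuples can be unpacked into variables
print(x, y)               # prints "3 4"

distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[point])   # prints "5.0"
```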

## 3. Functions

In [51]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))

negative
zero
positive


In [52]:
def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Bob') 
hello('Fred', loud=True) 

Hello, Bob
HELLO, FRED!


## Anonymous functions

In [53]:
sq = lambda x: x**2   # short functions can also be defined without def <name>

In [54]:
sq(9) # call variable sq with arg 9, same as creating a function with def sq; prints 81

81

This is really useful and widely used in data analysis, as we will see. For example, we can use this to apply the same function to all elements in a sequence.

In [59]:
my_list = [1, 3, 6, 8]
list(map(lambda x: x**3, my_list))  # map is a built-in function that applies a function to every element; prints [1, 27, 216, 512]

[1, 27, 216, 512]
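
Anonymous functions are also handy as short arguments to other built-ins. The sketch below, with made-up example data, uses a lambda as a sort key and as a filter condition.

```python
words = ['banana', 'kiwi', 'apple']

# Sort by length instead of alphabetically
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # prints "['kiwi', 'apple', 'banana']"

# Keep only the even numbers
my_list = [1, 3, 6, 8]
evens = list(filter(lambda x: x % 2 == 0, my_list))
print(evens)  # prints "[6, 8]"
```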

## 4. Classes

In [60]:
class Greeter(object):

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Ram')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Ram"
g.greet(loud=True)   # Call an instance method; prints "HELLO, RAM!"

Hello, Ram
HELLO, RAM!


# II. Numpy

Numpy is the core scientific library for Python. It provides optimized N-dimensional array data structures and functions that other libraries build on to process data.

[Numpy homepage](http://www.numpy.org)


## Installation
```
pip install numpy      # using pip
conda install numpy    # or, with Anaconda, using conda
```

## 1. Arrays

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [61]:
import numpy as np

`np` is the most widely used shorthand for numpy. We will come across similar conventional aliases for other libraries.

It is generally advised to stick with these aliases, as it makes your code easier for others to read and consistent with examples found elsewhere.

In [62]:
a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"

<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]


In [64]:
b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

(2, 3)
1 2 4
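
The rank and shape described at the start of this section can be read directly from an array's attributes. This is a small sketch using the same rank 2 array; the name `c` for the reshaped array is chosen only for illustration.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6]])

print(b.ndim)   # Number of dimensions (the rank); prints "2"
print(b.shape)  # Size along each dimension; prints "(2, 3)"
print(b.size)   # Total number of elements; prints "6"

# reshape returns an array with the same data and a new shape
c = b.reshape(3, 2)
print(c.shape)  # prints "(3, 2)"
```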


## 2. Comparison with Python Lists

In [65]:
# regular python list
a = list(range(1, 1000000))

**%%timeit** is a *cell magic* in Jupyter that runs the cell many times and reports the average running time, which lets us compare how fast different pieces of code are.

In [66]:
%%timeit

sum_a = sum(a)

20.1 ms ± 915 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [67]:
# numpy array
b = np.arange(1, 1000000)

In [68]:
%%timeit

sum_b = np.sum(b) 

371 µs ± 47.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Numpy arrays are around _50_ times faster than regular Python lists in this particular example (20.1 ms versus 371 µs).

Numpy also provides many functions to create arrays:

In [69]:
a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[0. 0.]
                      #          [0. 0.]]"

[[0. 0.]
 [0. 0.]]


In [70]:
b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[1. 1.]]"

[[1. 1.]]


In [71]:
c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[7 7]
                       #          [7 7]]" (integers, because the fill value 7 is an int)

[[7 7]
 [7 7]]


In [72]:
d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[1. 0.]
                      #          [0. 1.]]"

[[1. 0.]
 [0. 1.]]


In [73]:
e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print something like "[[0.91940167 0.08143941]
                             #                              [0.68744134 0.87236687]]"

[[0.90091763 0.43479981]
 [0.09738119 0.78615373]]


## 3. Array Indexing

In [75]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [76]:
# A single integer index selects a whole row of a rank 2 array, much like indexing a nested Python list
idx = a[0]
print(idx)

[1 2 3 4]


In [77]:
idx2 = a[2, 1]
print(idx2)

10


In [78]:
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

print(b)

[[2 3]
 [6 7]]


In [79]:
# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"

2


In [80]:
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

77
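
Because a slice is a view, it is sometimes necessary to request an independent copy explicitly. The sketch below contrasts the two cases; the names `b_view` and `b_copy` are made up for this illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

b_view = a[:2, 1:3]          # A slice: shares memory with a
b_copy = a[:2, 1:3].copy()   # An explicit copy: independent data

b_copy[0, 0] = 77            # Changing the copy leaves a untouched
print(a[0, 1])               # prints "2"

b_view[0, 0] = 77            # Changing the view changes a
print(a[0, 1])               # prints "77"

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(a, b_view), np.shares_memory(a, b_copy))  # prints "True False"
```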


## 4. Data Types

In [81]:
x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64" on most platforms; "int32" here (64-bit Windows)

int32


In [82]:
x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

float64


In [83]:
x = np.array([1.0, 2.5], dtype=np.int64)   # Force a particular datatype; 2.5 is truncated to 2
print(x.dtype)                             # Prints "int64"

int64


In [None]:
x
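
The cells above show that the datatype is either inferred or forced at creation time. As a small sketch of what forcing an integer type does to float values, and of converting an existing array with `astype`, consider the following; the names `y`, `z`, and `w` are chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 2.5], dtype=np.int64)
print(x)                  # The fractional part is dropped; prints "[1 2]"

y = np.array([1.7, -1.7])
z = y.astype(np.int64)    # astype returns a converted copy
print(z)                  # Truncates toward zero; prints "[ 1 -1]"
print(y.dtype, z.dtype)   # prints "float64 int64"

# Requesting the dtype explicitly gives the same type on every platform
w = np.array([1, 2], dtype=np.int64)
print(w.dtype)            # prints "int64"
```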

## 5. Aggregation

In [84]:
x = np.array([[1,2],[3,4]])

print(np.sum(x))          # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]


In [85]:
x = np.array([[1,2],[3,4]])

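print(np.mean(x))          # Compute mean of all elements; prints "2.5"
print(np.mean(x, axis=0))  # Compute mean of each column; prints "[2. 3.]"
print(np.mean(x, axis=1))  # Compute mean of each row; prints "[1.5 3.5]"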

2.5
[2. 3.]
[1.5 3.5]


In [86]:
np.min(x), np.max(x)

(1, 4)
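
One way to read the `axis` argument used above is that it names the dimension that gets collapsed by the aggregation. The following sketch uses a non-square array so the resulting shapes make this visible; `col_sums` and `row_sums` are names made up for this example.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

col_sums = np.sum(x, axis=0)   # axis 0 (the rows) is collapsed
row_sums = np.sum(x, axis=1)   # axis 1 (the columns) is collapsed

print(col_sums, col_sums.shape)  # prints "[5 7 9] (3,)"
print(row_sums, row_sums.shape)  # prints "[ 6 15] (2,)"

# argmax returns the position of the largest value rather than the value
print(np.argmax(x, axis=1))      # prints "[2 2]"
```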

In [87]:
np.random.rand(4,5)

array([[0.48596115, 0.25053743, 0.54895668, 0.14071102, 0.50148336],
       [0.02634721, 0.67733948, 0.6088495 , 0.06468929, 0.14784897],
       [0.28429104, 0.04220989, 0.70366856, 0.50924034, 0.54043945],
       [0.76089855, 0.27671334, 0.41569531, 0.52158513, 0.41960064]])