# Python 

* About the Python language
* Running Python code
* Python data structures
* Control structures

Some slides copied and adapted from [Software Carpentry](http://swcarpentry.github.io/python-novice-gapminder/02-variables/index.html)

# Running Python 

* Python is an interpreted language; the `python` program runs Python code
* Run `python` with no arguments to get a Python prompt
    * enter Python expressions and see the results (demo)
* Save your program in a file `code.py` and run it with a command line:
    * `python code.py`
* Use the Jupyter Notebook environment to run your code
    * enter code in blocks and click __Run__, see the result below
    

# Variables and Values

* Variables are names for values.
* In Python the `=` symbol assigns the value on the right to the name on the left.
* The variable is created when a value is assigned to it.
* Variable names
    * can only contain letters, digits, and underscore _ (typically used to separate words in long variable names)
    * cannot start with a digit
* Here, Python assigns an age to a variable `age` and a name in quotes to a variable `first_name`.

In [1]:
age = 23
first_name = 'Steve'
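The naming rules above can be checked with the built-in string method `str.isidentifier()`. This is a minimal sketch: the candidate names are made up for illustration, and `isidentifier()` also accepts non-ASCII letters, so it is slightly broader than the rule as stated on the slide.

```python
# Hypothetical candidate names, chosen only to illustrate the rules
candidates = ['age', 'first_name', 'name2', '2nd_name', 'first-name']

for name in candidates:
    # isidentifier() is True when the string is a legal Python name
    print(name, name.isidentifier())
```

The last two candidates fail: one starts with a digit and the other contains a hyphen.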

# Use print to display values.

* Python has a built-in function called `print` that prints things as text.
* Call the function (i.e., tell Python to run it) by using its name.
* Provide values to the function (i.e., the things to print) in parentheses.
* To add a string to the printout, wrap the string in single or double quotes.
* The values passed to the function are called ‘arguments’

In [2]:
print(first_name, 'is', age, 'years old')

Steve is 23 years old


# Variables must be created before they are used.

* If a variable doesn’t exist yet, or if the name has been misspelled, Python reports an error.
* Some other languages instead “guess” a default value; Python does not.

In [4]:
print(last_name)

NameError: name 'last_name' is not defined
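If it helps to see the error without stopping the program, it can be caught with `try`/`except`. This is a sketch only; the value assigned to `last_name` afterwards is a hypothetical one used for illustration, and the example assumes `last_name` has not been defined earlier in the session.

```python
try:
    print(last_name)              # last_name has not been assigned yet
except NameError as err:
    message = str(err)
    print("Python reported:", message)

# once a value is assigned, the same print call works
last_name = 'Cassidy'             # hypothetical value, used only for illustration
print(last_name)
```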

# Order of execution

* In a notebook, the order of execution is the order you run the cells
* You could define a variable early in the notebook but if the cell is not run it will not be defined
* To prevent confusion, it can be helpful to use the `Kernel` -> `Restart & Run All` option which clears the interpreter and runs everything from a clean slate going top to bottom.

# Variables can be used in calculations.

* We can use variables in calculations just as if they were values.
    * Remember, we assigned 23 to `age` a few lines ago.

In [7]:
age = age + 2
print("Age in two years is:", age)

Age in two years is: 25


# Use an index to get a single character from a string

* Square brackets after a variable name access parts of the string
* Each character position is given a number, called its index
* Indices start from 0
* So the first character is at index 0, the second at index 1, and so on

In [11]:
city = "Wuhan"
print(city[3])

a
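To see how the indices line up with the characters, one option is to print them side by side. The sketch below uses the built-in `enumerate`, which is not introduced on these slides, so treat it as an optional illustration.

```python
city = "Wuhan"

# enumerate pairs each character with its index, starting from 0
for index, letter in enumerate(city):
    print(index, letter)

# the last valid index is one less than the length of the string
last = len(city) - 1
print(city[last])
```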


# Use a slice to get a substring

* If we want to get part of a string, use a slice
* Same square bracket notation
* This time two indices: `[start:end]`
* Value is every character from `start` up to but not including `end`
* If you omit `start` or `end` then the start or end of the string is assumed

In [15]:
print(city[0:3])
print(city[2:4])
print(city[:3])
print(city[2:])

Wuh
ha
Wuh
han
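A short sketch of why the "up to but not including `end`" rule is convenient: two slices that share a boundary split the string cleanly, with no character repeated or lost. The variable names `head` and `tail` are only illustrative.

```python
city = "Wuhan"

head = city[:2]    # start omitted: from the beginning up to, not including, index 2
tail = city[2:]    # end omitted: from index 2 to the end of the string
print(head, tail)

# the two pieces join back into the original string
print(head + tail == city)

# the length of city[start:end] is end - start when both are within the string
print(len(city[1:4]))
```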


# Python Data Structures

* Understand the different data structures available in Python
* Strings, Lists, Tuples, Dictionaries
* Methods defined on each type 
* What to use for different tasks
* Packages and data structures for numerical data analysis:
    * Numpy vectors, arrays, matrices

## Strings

* Sequence of characters representing text
* In Python, strings are Unicode, so they can store any characters
* Can use single quote ', double quote " or triple quotes """ 
* Strings are objects, call methods on them
* Check out [the documentation](https://docs.python.org/3/library/string.html) for more

In [91]:
s1 = 'single quoted string might have "quotes in it"'
s2 = "double quoted string could contain it's or that's"
s3 = """triple quoted string
can contain 
newlines"""
s4 = "String containing 中文"
s3

'triple quoted string\ncan contain \nnewlines'

In [92]:
print(s3)

triple quoted string
can contain 
newlines


# Operations on Strings

In [93]:
# convert s4 to uppercase and store as s5
s5 = s4.upper()
s5

'STRING CONTAINING 中文'

In [94]:
# find the first occurrence of 'g' in s4
firstg = s4.find('g')
# get all characters after that
s4[firstg:]

'g containing 中文'
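Two details of these methods are worth a quick sketch: `find` returns -1 when the text is not present, and string methods return a new string rather than changing the original. The variable name `position` is just illustrative.

```python
s4 = "String containing 中文"

position = s4.find('z')     # 'z' does not occur in s4
print(position)

if position == -1:
    print("not found")
else:
    print(s4[position:])

# methods return a new string; the original is unchanged
s5 = s4.upper()
print(s4)
print(s5)
```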

# Lists and Tuples

* Lists are sequences of values
* Written inside square brackets, separated by commas
    * ['this', 'is', 'a', 'list', 'of', 'strings'], [3, 4, 6]
    * ['strings', 'and', 3, 9, 2]
    * ['lists', 'containing', ['another', 'list']]
* Lists can be modified, elements added, removed, replaced
* Tuples are just like lists but can't be modified
    * ('this', 'is', 'a', 'tuple')
    * sometimes it's more efficient to use a tuple
* Check out the [documentation](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range)

In [95]:
list1 = ['this', 'is', 'a', 'list', 'of', 'strings']
list2 = ['embedded', 'list', list1]
list2

['embedded', 'list', ['this', 'is', 'a', 'list', 'of', 'strings']]

In [96]:
# the 'in' operator checks whether something is in a list
'a' in list1

True

In [97]:
if 'a' in list1:
    print("found it")
else:
    print("not found")

found it
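The slide mentions that lists can be modified. A minimal sketch of adding, replacing and removing elements, using the standard list operations `append`, index assignment and `remove`; the new values are made up for illustration.

```python
list1 = ['this', 'is', 'a', 'list', 'of', 'strings']

list1.append('too')        # add an element to the end
list1[2] = 'one'           # replace the element at index 2
list1.remove('of')         # remove the first element equal to 'of'
print(list1)
```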


# Lists and Loops

In [98]:
text = "To be or not to be that is the question"
# split the string at every space character, generate a list
words = text.split()
print(words)

# create a new empty list
wordlengths = []
for word in words:
    # append the length of this word to our list
    wordlengths.append(len(word))

print(wordlengths)

['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
[2, 2, 2, 3, 2, 2, 4, 2, 3, 8]
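As an aside, the loop above can also be written as a list comprehension. This is an alternative form, not something the slides rely on, and it should produce the same list.

```python
text = "To be or not to be that is the question"
words = text.split()

# one-line equivalent of the loop: build the list of lengths directly
wordlengths = [len(word) for word in words]
print(wordlengths)
```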


# Tuples vs Lists

In [99]:
# could use a tuple to represent a data record
record = ('steve', 'cassidy', 39)
# can use some of the same operations as lists
print("Length: ", len(record))
print("Second element: ", record[1])

Length:  3
Second element:  cassidy


In [100]:
# but this can't be modified
record.append(21)

AttributeError: 'tuple' object has no attribute 'append'
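Since a tuple cannot be changed in place, the usual approach is to build a new one. The sketch below also shows unpacking a record into separate variables; the names `first`, `last`, `age` and `updated` are illustrative assumptions.

```python
record = ('steve', 'cassidy', 39)

# unpack the tuple into three separate variables
first, last, age = record
print(first, last, age)

# "changing" a tuple means building a new one; the original is untouched
updated = record + (21,)
print(updated)
print(record)
```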

# Tuples, Lists and Strings

These are all _sequence_ types and share some common operations.

In [101]:
s = ['a', 'list', 'of', 'stuff']
t = [1, 2, 3]

'a' in s         # boolean test
'x' not in s     # boolean test
s + t            # concatenate s and t
t * 3            # t repeated 3 times
s[1]             # second element
s[:2]            # the first two elements (indices 0 and 1)
s[2:4]           # all elements from 3rd to 4th
min(s)           # smallest element
max(s)           # largest element
s.count('a')     # how many times does 'a' occur

1
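The cell above only uses lists. As a sketch of the claim that these operations are shared, here are the same operations applied to a string and a tuple; the values are made up for illustration.

```python
word = "stuff"             # a string
pair = (1, 2, 3)           # a tuple

print('u' in word)         # membership test on a string
print(word + "ed")         # concatenation
print(pair * 2)            # repetition
print(word[1], pair[1])    # indexing
print(word[:2], pair[:2])  # slicing
print(min(word), max(pair))
print(word.count('f'))     # how many times 'f' occurs
```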

# Dictionaries

* Dictionaries are associative arrays
* Associate a key with a value
* Key can be any immutable type (string, number, tuple)
* Value can be anything
* O(1) access to elements (hash table)
* Compare with finding a value in a list, which is O(n) because every element may need to be checked
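A rough way to see the difference is to time a membership test on a large list and on a dictionary with the same keys. This is only a sketch: the sizes and names are arbitrary assumptions, the dictionary is built with a comprehension, and the exact timings will vary from machine to machine.

```python
import timeit

n = 100_000
numbers_list = list(range(n))
numbers_dict = {number: True for number in numbers_list}

target = n - 1    # worst case for the list: the last element

# membership in a list scans the elements one by one
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
# membership in a dictionary is a hash table lookup
dict_time = timeit.timeit(lambda: target in numbers_dict, number=100)

print("list:", list_time)
print("dict:", dict_time)
```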


# Dictionaries

In [102]:
info = dict()
info['name'] = 'Steve Cassidy'
info['age'] = 53
info['weight'] = 80
info

{'name': 'Steve Cassidy', 'age': 53, 'weight': 80}

In [103]:
info = {
    'name': 'Steve Cassidy', 
    'age': 53, 
    'weight': 80
}
info['age']

53

In [104]:
if 'age' in info:
    print("Age: ", info['age'])

Age:  53
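A few more everyday dictionary operations, sketched with the same `info` record. The `'height'` key and the new age are made-up values for illustration.

```python
info = {'name': 'Steve Cassidy', 'age': 53, 'weight': 80}

# get() returns a default instead of raising KeyError for a missing key
height = info.get('height', 'unknown')
print(height)

# assigning to an existing key replaces its value
info['age'] = 54

# del removes a key and its value
del info['weight']

# items() yields (key, value) pairs
for key, value in info.items():
    print(key, value)
```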


# An Example

Let's write some code to use strings, lists and dictionaries to solve a problem. 

Given a text, count how many times each word occurs in the text.


In [105]:
text = """Python is an easy to learn, powerful programming language. 
It has efficient high-level data structures and a simple but effective 
approach to object-oriented programming. Python’s elegant syntax 
and dynamic typing, together with its interpreted nature, make 
it an ideal language for scripting and rapid application development 
in many areas on most platforms."""

# remove punctuation and convert to lowercase
text = text.replace(',',' ').replace('.', ' ').replace('-', ' ')
text = text.lower()
print(text)


python is an easy to learn  powerful programming language  
it has efficient high level data structures and a simple but effective 
approach to object oriented programming  python’s elegant syntax 
and dynamic typing  together with its interpreted nature  make 
it an ideal language for scripting and rapid application development 
in many areas on most platforms 


In [106]:
# split text into words on whitespace
words = text.split()
# initialise a dictionary for word counts
# we will use the word as a key and store the count for each word
count = dict()
# count the words
for word in words:
    if word in count:
        count[word] += 1        # word is already in the dictionary
    else:
        count[word] = 1         # word is not in the dictionary

for word in sorted(count):
    print(word, count[word])

a 1
an 2
and 3
application 1
approach 1
areas 1
but 1
data 1
development 1
dynamic 1
easy 1
effective 1
efficient 1
elegant 1
for 1
has 1
high 1
ideal 1
in 1
interpreted 1
is 1
it 2
its 1
language 2
learn 1
level 1
make 1
many 1
most 1
nature 1
object 1
on 1
oriented 1
platforms 1
powerful 1
programming 2
python 1
python’s 1
rapid 1
scripting 1
simple 1
structures 1
syntax 1
to 2
together 1
typing 1
with 1
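The standard library offers a shortcut for this pattern: `collections.Counter`. The sketch below is an alternative to the hand-written loop, not a replacement for understanding it, and it uses a short made-up sample text rather than the paragraph above.

```python
from collections import Counter

# a short sample text, already lowercase and free of punctuation
sample = "to be or not to be that is the question"
count = Counter(sample.split())

print(count['to'])
print(count['question'])
print(count['python'])      # a missing word counts as 0 instead of raising KeyError

# the two most frequent words with their counts
print(count.most_common(2))
```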


# NumPy Data Structures

* [NumPy](https://docs.scipy.org/doc/numpy/user/index.html) is a Python module for numerical computing
* It supports a multidimensional array type (array)
* Operations on numerical data
* Faster than using Python lists to hold data
* Arrays can be one, two, three or many dimensional

# One Dimensional Arrays

In [2]:
import numpy as np

# make an array from a Python list
a1 = np.array([1, 2, 3, 2, 1])
print(a1)
# the type of the array is defined by the dtype property
print("Type is:", a1.dtype)

# all elements in the array have the same type
# so here integers are coerced to float64
a2 = np.array([1.0, 2.1, 3, 2, 1])
print(a2)
print("Type is:", a2.dtype)

# make a sequence from 0 to n-1 with arange
a3 = np.arange(10000)
print(a3)

[1 2 3 2 1]
Type is: int64
[ 1.   2.1  3.   2.   1. ]
Type is: float64
[   0    1    2 ..., 9997 9998 9999]
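To illustrate "operations on numerical data", here is a minimal sketch of element-wise arithmetic and a few aggregate methods on the same small array. It assumes NumPy is installed and imported as `np`, as in the cell above.

```python
import numpy as np

a1 = np.array([1, 2, 3, 2, 1])

# arithmetic applies to every element at once, no loop needed
doubled = a1 * 2
print(doubled)
print(a1 + a1)

# aggregate methods summarise the whole array
print(a1.sum(), a1.mean(), a1.max())

# arange also accepts start, stop and step; stop is not included
evens = np.arange(2, 10, 2)
print(evens)
```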


# NumPy Arrays are faster than Python Lists

In [5]:
import time

# make a big array of 10 million integers
bigarray = np.arange(10000000)

# time the execution of sum
start = time.time()
print(bigarray.sum())
print("Time:", time.time()-start)

49999995000000
Time: 0.011702775955200195


In [6]:
# convert to a Python list
biglist = list(bigarray)
start = time.time()
print(sum(biglist))
print("Time:", time.time()-start)

49999995000000
Time: 1.0716309547424316


# Two Dimensions

In [110]:
# Create a 2 x 3 array (2 rows, 3 columns) from nested Python lists, specify the dtype
mat = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
print(mat)

[[1 2 3]
 [4 5 6]]


In [111]:
# make a diagonal array from a Python list
diag = np.diag([1, 2, 3, 4, 5])
print(diag)

[[1 0 0 0 0]
 [0 2 0 0 0]
 [0 0 3 0 0]
 [0 0 0 4 0]
 [0 0 0 0 5]]


In [112]:
# properties of arrays
print(diag.shape) # tuple describing shape
print(diag.size)  # number of elements
print(diag.ndim)  # number of dimensions

(5, 5)
25
2


# More Dimensions

In [113]:
# make a one dimensional array of the 24 integers from 0 to 23
# then reshape it to be 2 x 3 x 4
threeD = np.arange(24).reshape(2,3,4)
threeD

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

# Axes in Multi-dimensional Arrays

* One dimensional arrays have one axis
* Two dimensional arrays have two axes (etc)
* In our 2 x 3 x 4 array the first axis has size 2
* The second has size 3 etc

In [114]:
mat

array([[1, 2, 3],
       [4, 5, 6]], dtype=int32)

In [115]:
mat.shape

(2, 3)

In [116]:
# use [] notation to get at parts of the matrix
print(mat[0])     # first row
print(mat[0,:])   # same
print(mat[:,0])   # first column

[1 2 3]
[1 2 3]
[1 4]
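Axes matter most when an operation should run along one direction only. The sketch below sums `mat` along each axis and indexes into the 2 x 3 x 4 array from the earlier slide; the names `column_totals` and `row_totals` are illustrative.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

# axis=0 runs down the rows, giving one total per column
column_totals = mat.sum(axis=0)
print(column_totals)
# axis=1 runs across the columns, giving one total per row
row_totals = mat.sum(axis=1)
print(row_totals)

threeD = np.arange(24).reshape(2, 3, 4)
# a single index along the first axis selects a 3 x 4 block
print(threeD[1].shape)
# one index per axis selects a single element
print(threeD[1, 2, 3])
```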


# More Examples

See the documentation: [NumPy Quickstart](https://docs.scipy.org/doc/numpy/user/quickstart.html), [NumPy Basics](https://docs.scipy.org/doc/numpy/user/basics.html)

# Python Functions and Control Structures

We need to know how to define and call functions and how to write loops and conditionals.

In [8]:
def count_words(text):
    """Count the words in a given text string. 
    Return a dictionary with words as keys and word counts as values."""
    
    words = text.split()
    count = dict()
    
    for word in words:
        if word in count:
            count[word] += 1        
        else:
            count[word] = 1         
    return count

c1 = count_words("this is my text")
c2 = count_words("and this is another text with this in it a few times")

print(c1)
print(c2)

{'this': 1, 'is': 1, 'my': 1, 'text': 1}
{'and': 1, 'this': 2, 'is': 1, 'another': 1, 'text': 1, 'with': 1, 'in': 1, 'it': 1, 'a': 1, 'few': 1, 'times': 1}


## Notes

* The 'def' keyword is used to introduce a function.  
* There are no type declarations for variables
    * However, Python 3.5 introduced _type hinting_
* Just after the 'def' line we can include a __Documentation String__
    * Describes the function, how it could be used
    * Becomes the online help for the function
    

In [127]:
help(count_words)

Help on function count_words in module __main__:

count_words(text)
    Count the words in a given text string. 
    Return a dictionary with words as keys and word counts as values.
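For reference, a sketch of what `count_words` might look like with type hints. The hints use `typing.Dict` so that the example also runs on older Python 3 versions, and the loop body is shortened with `dict.get`; both are choices made for this illustration rather than part of the original function.

```python
from typing import Dict

def count_words(text: str) -> Dict[str, int]:
    """Count the words in a given text string.
    Return a dictionary with words as keys and word counts as values."""
    count: Dict[str, int] = {}
    for word in text.split():
        # get() supplies 0 the first time a word is seen
        count[word] = count.get(word, 0) + 1
    return count

print(count_words("this is my text"))
# hints are stored on the function but are not enforced when it runs
print(count_words.__annotations__)
```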



# A More Complex Function

In [131]:
def make_list(n, element=0):
    """Generate a list of n elements, 
     if element is supplied use that (default 0)"""
    
    result = []
    for i in range(n):
        result.append(element)
    return result

print(make_list(3))
print(make_list(3, 1))
print(make_list(3, element=2))

[0, 0, 0]
[1, 1, 1]
[2, 2, 2]
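Two small follow-ups, sketched with the same function: keyword arguments can be passed in any order, and for simple values Python's list repetition operator builds the same list in one step.

```python
def make_list(n, element=0):
    """Generate a list of n elements,
    if element is supplied use that (default 0)"""
    result = []
    for i in range(n):
        result.append(element)
    return result

# keyword arguments can be given in any order
print(make_list(element='x', n=2))

# for simple values, list repetition builds the same list
print(make_list(3, 1) == [1] * 3)
```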


# The End