Disclaimer: This notebook has been copied from [intro-python.ipynb](http://nbviewer.jupyter.org/github/phelps-sg/python-bigdata/blob/master/src/main/ipynb/intro-python.ipynb)
I have updated some cells so that they run with Python 3.

# Introduction to Python

Copyright [Steve Phelps](http://sphelps.net) 2014


# Python is interpreted

- Python is an _interpreted_ language, in contrast to Java and C which are compiled languages.

- This means we can type statements into the interpreter and they are executed immediately.


In [None]:
5 + 5

In [None]:
x = 5
y = 'Hello There'
z = 10.5

In [None]:
x + 5

# Assignments versus equations

- In Python when we write `x = 5` this means something different from an equation $x=5$.

- Unlike variables in mathematical models, variables in Python can refer to different things as more statements are interpreted.


In [None]:
x = 1
print('The value of x is ', x)

x = 2.5
print ('Now the value of x is ', x)

x = 'hello there'
print ('Now it is ', x)

# Calling Functions

We call functions in the conventional way, with the arguments in round brackets.

In [None]:
print(round(3.14))

# Types

- Values in Python have an associated _type_.

- If we combine types incorrectly we get an error.

In [None]:
print(y)

In [None]:
y + 5
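
The failing cell above can be sketched as follows, together with one possible way to make the types compatible. The names `greeting` and `number` are illustrative and not part of the original notebook.

```python
greeting = 'Hello There'
number = 5

try:
    greeting + number              # str + int is not defined
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

# Converting the integer to a string first makes the types compatible.
combined = greeting + ' ' + str(number)
print(combined)
```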

# The type function

- We can query the type of a value using the `type` function.

In [None]:
type(1)

In [None]:
type('hello')

In [None]:
type(2.5)

In [None]:
type(True)

# Null values

- Sometimes we need to represent "no data" or "not applicable".  

- In Python we use the special value `None`.

- This corresponds to `null` in Java or `NULL` in SQL.


In [None]:
result = None

- When we fetch the value `None` in the interactive interpreter, no result is printed out.


In [None]:
result

- We can check whether there is a result or not using the `is` operator:

In [None]:
result is None

In [None]:
x = 5
x is None

# Converting values between types

- We can convert values between different types.

- To convert an integer to a floating-point number use the `float()` function.
- To convert a floating-point to an integer use the `int()` function.

In [None]:
x = 1
print(type(x))
print(x)


In [None]:
y = float(x)
print (type(y))
print (y)

In [None]:
print (int(y))
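
A short sketch of how these conversions behave on values that are not whole numbers, assuming we also want to convert from strings. Note that `int()` truncates towards zero rather than rounding.

```python
truncated = int(2.7)          # the fractional part is dropped
truncated_neg = int(-2.7)     # truncation is towards zero, not downwards
rounded = round(2.7)          # round() picks the nearest integer instead
from_string = int('42')       # strings that hold a number can be converted too
float_from_string = float('3.5')

print(truncated, truncated_neg, rounded, from_string, float_from_string)
```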

# Converting to and from ASCII values

- The functions `chr()` and `ord()` convert characters from and to their numeric codes. In Python 3 these are Unicode code points, which coincide with [ASCII](https://en.wikipedia.org/wiki/ASCII) for the values 0 to 127.


In [None]:
print (ord('a'))

In [None]:
print (chr(97))
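
As a small illustration, assuming Python 3, the same two functions also work for characters outside the ASCII range. The characters chosen here are arbitrary examples.

```python
code_point = ord('\u00e9')        # e with an acute accent, outside 7-bit ASCII
euro = chr(8364)                  # the euro sign, U+20AC
next_letter = chr(ord('a') + 1)   # arithmetic on codes, then back to a character

print(code_point)
print(euro)
print(next_letter)
```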

# Variables are not typed

- _Variables_ themselves, on the other hand, do not have a fixed type.
- It is only the values that they refer to that have a type.
- This means that the type of the value referred to by a variable can change as more statements are interpreted.


In [None]:
y = 'hello'
print ('The type of the value referred to by y is ', type(y))
y = 5.0
print ('And now the type of the value is ', type(y))

# Polymorphism

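- The meaning of an operator depends on the types we are applying it to.

- In Python 3 the `/` operator always performs true division, so `1 / 5` and `1.0 / 5.0` both evaluate to `0.2`.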



In [None]:
1 / 5

In [None]:
1.0 / 5.0

In [None]:
1 + 1

In [None]:
'a' + 'b'

In [None]:
'1' + '1'
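
A few more cases of the same idea, as a sketch for Python 3. The variable names are only for illustration.

```python
true_div = 7 / 2          # true division: the result is a float
floor_div = 7 // 2        # floor division: an int when both operands are ints
repeated = 'ab' * 3       # * applied to a string and an int repeats the string
joined = [1, 2] + [3]     # + applied to two lists concatenates them

print(true_div, floor_div, repeated, joined)
```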

# Conditional Statements and Indentation


- The syntax for control structures in Python uses _colons_ and _indentation_.

- Beware that white-space affects the semantics of Python code.



In [None]:
x = 5
if x > 0:
    print ('x is strictly positive')
    print (x)
    
print ('finished.')

In [None]:
x = 0
if x > 0:
    print ('x is strictly positive')
print (x)
    
print ('finished')
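
The same colon-and-indentation rules apply to `elif` and `else` branches. A minimal sketch, with the value of `x` and the name `sign` chosen only for illustration:

```python
x = -3

if x > 0:
    sign = 'strictly positive'
elif x == 0:
    sign = 'zero'
else:
    sign = 'strictly negative'

# This line is not indented, so it runs whichever branch was taken.
print('x is', sign)
```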

# Lists



We can use _lists_ to hold an ordered sequence of values.

In [None]:
l = ['first', 'second', 'third']
print (l)

Lists can contain values of different types, even in the same list.

In [None]:
another_list = ['first', 'second', 'third', 1, 2, 3]
print (another_list)

# Mutable Data Structures

Lists are _mutable_; their contents can change as more statements are interpreted.

In [None]:
l.append('fourth')
print (l)

# References

- Whenever we bind a variable to a value in Python we create a *reference*.

- A reference is distinct from the value that it refers to.

- Variables are names for references.


In [None]:
X = [1, 2, 3]
Y = X

- The above code creates two different references (named `X` and `Y`) to the *same* value `[1, 2, 3]`.

- Because lists are mutable, changing them can have side-effects on other variables.

- If we append something to `X` what will happen to `Y`?

In [None]:
X.append(4)
X

In [None]:
Y
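
If the side-effect is not wanted, one option is to bind the second name to a copy of the list instead. A minimal sketch, with illustrative names; note that `copy()` is shallow, so nested lists would still be shared.

```python
original = [1, 2, 3]
alias = original              # a second reference to the same list
snapshot = original.copy()    # a new list holding the same elements

original.append(4)

print(alias)      # the alias sees the change
print(snapshot)   # the copy does not
```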

# State and identity

- The state referred to by a variable is *different* from its identity.

- To compare *state* use the `==` operator.

- To compare *identity* use the `is` operator.

- When we compare identity we check equality of references.

- When we compare state we check equality of values.


In [None]:
X = [1, 2]
Y = [1]
Y.append(2)

In [None]:
X == Y

In [None]:
X is Y

In [None]:
Y.append(3)
X


In [None]:
X = Y

In [None]:
X is Y

# Iteration

- We can iterate over each element of a list in turn using a `for` loop:


In [None]:
for i in l:
    print (i)

- To perform a statement a certain number of times, we can iterate over a list of the required size.

In [None]:
for i in [0, 1, 2, 3]:
    print ("Hello!")

# For loops with the range function

- To save from having to manually write the numbers out, we can use the function `range()` to count for us.  As in Java and C, we count starting at 0.


In [None]:
range(4)

In [None]:
for i in range(4):
    print ("Hello!")
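
In Python 3 `range()` returns a lazy range object rather than a list, so the sketch below wraps it in `list()` to show the numbers it produces. It also shows the optional start and step arguments.

```python
upto_four = list(range(4))          # stop only: counts from 0
two_to_five = list(range(2, 6))     # start (inclusive) and stop (exclusive)
by_threes = list(range(0, 10, 3))   # a third argument sets the step
countdown = list(range(3, 0, -1))   # a negative step counts downwards

print(upto_four, two_to_five, by_threes, countdown)
```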

# List Indexing

- Lists can be indexed using square brackets to retrieve the element stored in a particular position.





In [None]:
print (l[0])

In [None]:
print (l[1])

# List Slicing

- We can also specify a _range_ of positions.  

- This is called _slicing_.

- The example below indexes from position 0 (inclusive) to 2 (exclusive).



In [None]:
print (l[0:2])

- If we leave out the starting index it implies the beginning of the list:



In [None]:
print (l[:2])

- If we leave out the final index it implies the end of the list:

In [None]:
print (l[2:])

# Negative Indexing

- Negative indices count from the end of the list:



In [None]:
print (l[-1])

In [None]:
print (l[:-1])
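
Slices and negative indices can be combined, and a slice may take a third value giving the step. A short sketch on a made-up list:

```python
letters = ['a', 'b', 'c', 'd', 'e']

last_two = letters[-2:]          # from the second-to-last element to the end
every_other = letters[::2]       # every second element, starting at index 0
reversed_copy = letters[::-1]    # a step of -1 walks the list backwards

print(last_two, every_other, reversed_copy)
```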

# Collections

- Lists are an example of a *collection*.

- A collection is a type of value that can contain other values.

- There are other collection types in Python:

    - `tuple`
    - `set`
    - `dict`
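
Tuples and sets are covered in the sections that follow. For `dict`, which maps keys to values, a minimal sketch might look like this; the names and ages are made up for illustration.

```python
ages = {'alice': 31, 'bob': 27}   # a dict literal: key: value pairs

ages['carol'] = 45                # add a new key, or overwrite an existing one
bob_age = ages['bob']             # look a value up by its key
has_dave = 'dave' in ages         # membership tests check the keys

for name, age in ages.items():    # iterate over key/value pairs
    print(name, age)
```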

# Tuples

- Tuples are another way to combine different values.

- The combined values can be of different types.

- Like lists, they have a well-defined ordering and can be indexed.

- To create a tuple in Python, use round brackets instead of square brackets.

In [None]:
tuple1 = (50, 'hello')
tuple1

In [None]:
tuple1[0]

In [None]:
type(tuple1)

# Tuples are immutable

- Unlike lists, tuples are *immutable*.  Once we have created a tuple we cannot add values to it.



In [None]:
tuple1.append(2)
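
Although a tuple cannot be changed in place, we can build a new tuple from an existing one, or unpack its values into separate variables. A sketch with illustrative names:

```python
pair = (50, 'hello')

extended = pair + (2,)     # concatenation creates a new tuple; pair is unchanged
number, greeting = pair    # unpacking assigns each element to a name

print(pair)
print(extended)
print(number, greeting)
```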

# Sets

- Lists can contain duplicate values.

- A set, in contrast, contains no duplicates.

- Sets can be created from lists using the `set()` function.




In [None]:
X = set([1, 2, 3, 3, 4])
X

In [None]:
type(X)

- Alternatively we can write a set literal using the `{` and `}` brackets.

In [None]:
X = {1, 2, 3, 4}
type(X)

# Sets are mutable

- Sets are mutable like lists:

In [None]:
X.add(5)
X

- Duplicates are automatically discarded: adding `5` a second time leaves the set unchanged.

In [None]:
X.add(5)
X


# Sets are unordered

- Sets do not have an ordering.

- Therefore we cannot index or slice them:



In [None]:
X[0]

# Operations on sets

- Union: $X \cup Y$


In [None]:
X = {1, 2, 3}
Y = {4, 5, 6}
X.union(Y)

- Intersection: $X \cap Y$

In [None]:
X = {1, 2, 3, 4}
Y = {3, 4, 5}
X.intersection(Y)

- Difference: $X - Y$


In [None]:
X - Y
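
The same operations are also available as operators, which some readers may find closer to the mathematical notation. A sketch using the same two sets as above:

```python
X = {1, 2, 3, 4}
Y = {3, 4, 5}

union = X | Y            # same result as X.union(Y)
intersection = X & Y     # same result as X.intersection(Y)
difference = X - Y       # elements of X that are not in Y
symmetric = X ^ Y        # elements in exactly one of the two sets

print(union, intersection, difference, symmetric)
```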

# Arrays

- Python also has fixed-length arrays which contain a single type of value

- i.e. we cannot have different types of value within the same array.   

- Arrays are provided by a separate _module_ called `numpy`.  Modules correspond to packages in e.g. Java.

- We can import the module and then give it a shorter _alias_.


In [None]:
import numpy as np

- We can now use the functions defined in this package by prefixing them with `np.`.  

- The function `array()` creates an array given a list.

In [None]:
x = np.array([0, 1, 2, 3, 4])
print (x)
print (type(x))
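
The single element type of an array is stored in its `dtype` attribute. As a sketch of what happens when the input list mixes integers and floats, numpy converts everything to one common type:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # the integers are converted to floats

print(ints.dtype)    # an integer type; the exact width depends on the platform
print(mixed.dtype)
print(mixed)
```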

# Functions over arrays

- When we use arithmetic operators on arrays, we create a new array with the result of applying the operator to each element.

In [None]:
y = x * 2
print (y)

- The same goes for functions:

In [None]:
x = np.array([-1, 2, 3, -4])
y = abs(x)
print (y)

# Populating Arrays

- To populate an array with a range of values we use the `np.arange()` function:


In [None]:
x = np.arange(0, 10)
print (x)

- We can also use floating point increments.


In [None]:
x = np.arange(0, 1, 0.1)
print (x)
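
With a floating-point step, `np.arange()` excludes the end value and rounding can make the number of elements hard to predict. An alternative, suggested here as an option rather than something the original notebook uses, is `np.linspace()`, which takes the number of points instead of the step. The name `grid` is illustrative.

```python
import numpy as np

# 11 evenly spaced points from 0 to 1, with both end points included.
grid = np.linspace(0, 1, 11)

print(grid)
print(len(grid))
```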

# Basic Plotting

- We will use a module called `matplotlib` to plot some simple graphs.

- This module provides functions which are very similar to MATLAB plotting commands.


In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

y = x*2 + 5
plt.plot(x, y)


# Plotting a sine curve

In [None]:
from numpy import pi, sin

x = np.arange(0, 2*pi, 0.01)
y = sin(x)
plt.plot(x, y)

# Plotting a histogram

- We can use the `hist()` function in `matplotlib` to plot a histogram

In [None]:
# Generate some random data
data = np.random.randn(1000)

# hist() returns the bin counts, the bin edges and the drawn patches
counts, bins, patches = plt.hist(data)

# Computing histograms as matrices

- The function `histogram()` in the `numpy` module counts frequencies into bins and returns the result as a pair of 1-dimensional arrays: the counts per bin and the bin edges.

In [None]:
np.histogram(data)
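
The two returned arrays can be unpacked into separate variables. A minimal sketch on freshly generated data; the seed, the names `sample`, `counts` and `bin_edges`, and the choice of 10 bins are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded so the sketch is repeatable
sample = rng.standard_normal(1000)

counts, bin_edges = np.histogram(sample, bins=10)

print(counts.sum())                   # every observation falls into some bin
print(len(counts), len(bin_edges))    # there is one more edge than there are bins
```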

# Defining new functions



In [None]:
def squared(x):
    return x ** 2

print (squared(5))

# Local Variables

- Variables created inside functions are _local_ to that function.

- They are not accessible to code outside of that function.

In [None]:
def squared(x):
    result = x ** 2
    return result

print (squared(5))

In [None]:
print (result)
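
Note that an earlier cell in this notebook assigns `result = None` at the top level, so the cell above may print that global value instead of failing. The sketch below uses a name that exists only inside the function, which makes the effect visible; `cubed` and `local_result` are hypothetical names.

```python
def cubed(x):
    local_result = x ** 3     # exists only while cubed() is running
    return local_result

value = cubed(2)
outcome = 'no error'

try:
    print(local_result)       # not defined outside the function
except NameError:
    outcome = 'NameError'

print(value, outcome)
```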

# Functional Programming

- Functions are first-class citizens in Python.

- They can be passed around just like any other value.

More info: https://www.python-course.eu/python3_lambda.php

In [None]:
print(squared)

In [None]:
y = squared
print (y)

In [None]:
print (y(5))

# Mapping the elements of a collection

- We can apply a function to each element of a collection using the built-in function `map()`.

- This will work with any collection: list, set, tuple or string.

- It takes as arguments _another function_ and the collection we want to apply it to.

- In Python 3 it returns a lazy iterator over the results rather than a list, so we wrap it in `list()` to see the values.

In [None]:
l = map(squared, [1, 2, 3, 4])
list(l)
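
One consequence of `map()` being lazy in Python 3 is that its result can only be consumed once. A small sketch, redefining `squared` so that it is self-contained:

```python
def squared(x):
    return x ** 2

squares = map(squared, [1, 2, 3, 4])

first_pass = list(squares)    # consumes the iterator
second_pass = list(squares)   # nothing is left the second time

print(first_pass)
print(second_pass)
```

If the values are needed more than once, it is usually simplest to keep the list produced by the first `list()` call.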

# List Comprehensions

- Because this is such a common operation, Python has a special syntax to do the same thing, called a _list comprehension_.


In [None]:
[squared(i) for i in [1, 2, 3, 4]]

- If we want a set instead of a list we can use a set comprehension

In [None]:
{squared(i) for i in [1, 2, 3, 4]}

# Cartesian product using list comprehensions



<img src="files/220px-Cartesian_Product_qtl1.svg.png">

The [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product) of two collections $X = A \times B$ can be expressed by using multiple `for` statements in a comprehension.


In [None]:
A = {'x', 'y', 'z'}
B = {1, 2, 3}
{(a,b) for a in A for b in B}

# Cartesian products with other collections

- The syntax for Cartesian products can be used with any collection type.


In [None]:
first_names = ('Steve', 'John', 'Peter')
surnames = ('Smith', 'Doe')

[(first_name, surname) for first_name in first_names for surname in surnames]
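
The standard library offers the same operation as `itertools.product()`. This is an alternative to the comprehension syntax rather than something the original notebook uses; the sketch below checks that both give the same pairs in the same order.

```python
from itertools import product

first_names = ('Steve', 'John', 'Peter')
surnames = ('Smith', 'Doe')

pairs = list(product(first_names, surnames))
comprehension = [(f, s) for f in first_names for s in surnames]

print(pairs)
print(pairs == comprehension)
```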

# Anonymous Function Literals

- We can also write _anonymous_ functions.
- These are function literals, and do not necessarily have a name.
- They are called _lambda expressions_ (after the $\lambda$-calculus).

In [None]:
l=map(lambda x: x ** 2, [1, 2, 3, 4])
list(l)

# Filtering data

- We can filter a list by applying a _predicate_ to each element of the list.

- A predicate is a function which takes a single argument, and returns a boolean value.

- `filter(p, X)` corresponds to $\{ x \in X : p(x) \}$ in set-builder notation: the elements of $X$ for which $p$ holds.


In [None]:
l=filter(lambda x: x > 0, [-5, 2, 3, -10, 0, 1])
list(l)

We can use both `filter()` and `map()` on other collections such as strings or sets.

In [None]:
l=filter(lambda x: x != ' ', 'hello world')
list(l)

In [None]:
l=map(ord, 'hello world')
list(l)

In [None]:
l=filter(lambda x: x > 0, {-5, 2, 3, -10, 0, 1})
list(l)

# Filtering using a list comprehension

- Again, because this is such a common operation, we can use simpler syntax to say the same thing.

- We can express a filter using a list-comprehension by using the keyword `if`:

In [None]:
data = [-5, 2, 3, -10, 0, 1]
[x for x in data if x > 0]

- We can also filter and then map in the same expression:

In [None]:
from numpy import sqrt
[sqrt(x) for x in data if x > 0]

# The reduce function

- The `reduce()` function cumulatively applies a two-argument function to the elements of a list from left to right, carrying the running result forward, so that the whole list is reduced to a _single_ return value.

In [None]:
# reduce() is no longer a built-in in Python 3; it lives in the functools module.
import functools
functools.reduce(lambda x, y: x + y, [0, 1, 2, 3, 4, 5])
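
`reduce()` also accepts an optional third argument giving the starting value of the running result, which matters for empty lists. A sketch with illustrative names:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

total = reduce(lambda acc, value: acc + value, numbers, 0)     # ((((0+1)+2)+3)+4)+5
product = reduce(lambda acc, value: acc * value, numbers, 1)
empty_total = reduce(lambda acc, value: acc + value, [], 0)    # the start value is returned

print(total, product, empty_total)
```

Without the starting value, calling `reduce()` on an empty list raises a `TypeError`.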

# Big Data

- The `map()` and `reduce()` functions form the basis of the map-reduce programming model.

- [Map-reduce](https://en.wikipedia.org/wiki/MapReduce) is the basis of modern highly-distributed large-scale computing frameworks.

- It is used in BigTable, Hadoop and Apache Spark. 

- See [these examples in Python](https://spark.apache.org/examples.html) for Apache Spark.
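
To give a flavour of the model, here is a toy word count written with the built-in `map()` and `functools.reduce()`. It runs in a single process on made-up input and does not use the Hadoop or Spark APIs; it is only meant to show the shape of a map step followed by a reduce step.

```python
from collections import Counter
from functools import reduce

documents = ['big data', 'big ideas', 'data data']   # made-up input

# Map step: each document is turned into its own word counts.
partial_counts = map(lambda doc: Counter(doc.split()), documents)

# Reduce step: the partial results are merged pairwise into one total.
totals = reduce(lambda a, b: a + b, partial_counts, Counter())

print(totals)
```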

# Reading Text Files

- To read an entire text file as a list of lines use the `readlines()` method of a file object.


In [None]:
# The with statement closes the file automatically, even if an error occurs.
with open('/etc/group') as f:
    result = f.readlines()


In [None]:
# Print the first line
print (result[0])

To concatenate the lines into a single string:


In [None]:
single_string = ''.join(result)

In [None]:
print(single_string)
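
A file object can also be iterated over directly, one line at a time, which avoids holding a separate list from `readlines()`. The sketch below writes a small temporary file first so that it does not depend on `/etc/group` being present; the file name and its contents are made up.

```python
import os
import tempfile

# Create a small example file to read back.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

# Iterate over the file line by line, stripping the trailing newline.
with open(path) as f:
    lines = [line.rstrip('\n') for line in f]

print(lines)
```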