# Basics of programming in Python
## Memory and memory addressing
For an analyst, Python is an excellent programming language, partly because of its efficiency. Because we are efficiency-oriented, we have to understand at least the basics of memory management and addressing in programming, especially in Python. Understanding this topic matters not only for our programs' running time but, first of all, for their correctness. This notebook presents slightly simplified and hopefully clear explanations.

Effective memory management is crucial for two reasons: **memory is slow**, and in the age of large datasets **memory is valuable**. For these reasons we will avoid copying objects and rewriting them in a different place whenever we can. You may think that RAM is "fast", because it is much faster than an HDD or even an SSD. But it is much slower than contemporary CPUs (this is why CPU cache exists; see the links at the end of this section). For the sake of our programs' efficiency, we will avoid unnecessary copying, reading and writing.

Every object stored in memory, no matter its size, has its own address. This is true both for small objects (a single int) and for large ones (an enormous dataset for analysis). An object's address is always "small", even if the object itself is very large. This is why it is much cheaper to pass an object's address than to pass the whole object, which would mean creating a copy of it in memory.
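
The difference in size between an object and its address can be sketched as follows. This assumes CPython, where the built-in id() happens to return the object's memory address (an implementation detail, not a language guarantee); the variable names small and large are only illustrative.

```python
import sys

small = 7
large = list(range(1_000_000))

# In CPython, id() returns the memory address of the object.
# Both addresses are ordinary integers, regardless of object size.
print(id(small), id(large))

# The list's own storage (not even counting its elements)
# already takes several megabytes.
print(sys.getsizeof(large))
```

Passing large to a function hands over only the address, so nothing of that size is copied.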

Look at a simple example:

In [None]:
a = 3
b = a
print(a, b)
b = 4
print(a, b)

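As you can see, changing b has no effect on a, so for a number the assignment operator "=" behaves as if it copied the object. Strictly speaking, *b = a* makes both names refer to the same integer, and *b = 4* then points b at a different object; because integers cannot be modified in place, the result looks exactly like a copy. How does this operator behave for a list?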

In [None]:
colors = ["red", "blue", "green"]
colors2 = colors
colors2.append("black")
print(colors)

After creating the colors2 variable you might expect a copied object, so that appending "black" to colors2 would have no effect on colors. However, for an object of type list the "=" operator copies only the address (a reference, or alias) of the object. After the line:

colors2 = colors

both colors and colors2 contain the address of the same list. You can think of it as writing the address of a building on two different sheets of paper. If you append "black" to the list at that address (read from the second sheet of paper, colors2) and later return to the same address (read from the first sheet of paper, colors), you will see the only list (the same building) that exists in memory.
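
One way to check that two names point to the same object is the `is` operator, or a comparison of id() values. A minimal sketch reusing the lists from the cell above:

```python
colors = ["red", "blue", "green"]
colors2 = colors

# `is` checks whether two names refer to the very same object.
print(colors2 is colors)          # True
print(id(colors2) == id(colors))  # True

colors2.append("black")
print(colors)  # ['red', 'blue', 'green', 'black']
```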

Go back to the previous example and try to understand what happens there. It shows clearly that writing code without understanding references can cause errors that are difficult to debug.

In [None]:
colors = ["red", "blue", "green"]
numbers = [4, 5, 6]

mixedList1 = colors
mixedList1.append(numbers)
print(mixedList1)

mixedList2 = []
mixedList2.append(colors)
mixedList2.append(numbers)
print(mixedList2)

If you write *mixedList1 = colors* instead of *mixedList1 = list(colors)*, the variable mixedList1 does not hold the address of a copy; it is only a second name for the old object. This is why, when you write:

mixedList2.append(colors)

the first element of the new list is the already mixed list: the earlier call mixedList1.append(numbers) modified the very list that colors refers to.

See the correct code, which shows two ways to create a new object (a copy):

In [None]:
colors = ["red", "blue", "green"]
numbers = [4, 5, 6]

mixedList1 = list(colors)
# or
mixedList1 = colors.copy()

mixedList1.append(numbers)
print(mixedList1)

mixedList2 = []
mixedList2.append(colors)
mixedList2.append(numbers)
print(mixedList2)
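
To confirm that a copy really is a separate object, you can compare it with the original using == (same content) and `is` (same object). A small sketch, with colors_copy as an illustrative name:

```python
colors = ["red", "blue", "green"]
colors_copy = colors.copy()

print(colors_copy == colors)  # True: same content
print(colors_copy is colors)  # False: a different object

colors_copy.append("black")
print(colors)       # ['red', 'blue', 'green']
print(colors_copy)  # ['red', 'blue', 'green', 'black']
```

Note that list() and copy() make shallow copies: the new list is independent, but any nested objects inside it (such as the numbers list appended above) are still shared.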

* Introduction of 64-bit CPUs is directly connected with memory addressing: https://www.youtube.com/watch?v=KgiMzKb8dD0
* For the curious, how important are levels of cache:
https://www.extremetech.com/extreme/188776-how-l1-and-l2-cpu-caches-work-and-why-theyre-an-essential-part-of-modern-chips
* For the very curious, how CPU works on the low level:
https://www.youtube.com/watch?v=cNN_tTXABUA
* If you are deeply interested in programming, you should understand the difference between objects, references and pointers.

## Flow control

### For, ranges and iterators
There is an easy way to create ranges of numbers in Python. See a few examples using "for" and the iterable "range":

In [None]:
# If you want to simply print a range of numbers, you will see a "strange" result:
print(range(4))
# The output "range(0, 4)" tells you what has been created,
# but it does not list the elements it will produce.
print("Print all elements in range(4): ")
for i in range(4):
    print(i)
# See two other examples:
print("Print all elements in range(2, 10, 2): ")
for i in range(2, 10, 2):
    print(i)

print("Print all elements in range(0, -11, -3): ")
for i in range(0, -11, -3):
    print(i)
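
If you only want to inspect the elements of a range, converting it to a list is a quick alternative to writing a loop. A short sketch using the same three ranges:

```python
print(list(range(4)))           # [0, 1, 2, 3]
print(list(range(2, 10, 2)))    # [2, 4, 6, 8]
print(list(range(0, -11, -3)))  # [0, -3, -6, -9]
```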

Iterators allow you to traverse a container (e.g. a list) and see what each element contains.

In [None]:
colors = ["red", "blue", "green"]
for color in colors:
    print(color)

If you do a similar thing for a dictionary using its items() method, the iterator returns two-element tuples containing the key-value pairs from the dictionary. In practice this is not very convenient.

If you want two separate variables instead of a tuple, you can use tuple unpacking. As you can see below, if you give as many loop variables as there are elements in a single tuple, Python unpacks each tuple into them.

In [None]:
author = {'name': 'Maciej', 'surname': 'Wilamowski', 'age': 32}
for element in author.items():
    print(element)
print("\nUnpacked tuples: ")
for key, value in author.items():
    print(key, value)
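
For comparison, here is a short sketch of the other common ways to iterate over a dictionary: looping over the dictionary itself yields only the keys, and values() yields only the values.

```python
author = {'name': 'Maciej', 'surname': 'Wilamowski', 'age': 32}

# Iterating over the dictionary itself yields only its keys.
for key in author:
    print(key, author[key])

# values() yields only the values.
for value in author.values():
    print(value)
```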

### Enumerate
In some cases you need not only the elements of a list but also their indices. enumerate(), a counting iterator, is used for this purpose:

In [None]:
for i, color in enumerate(colors):
    print(i, color)
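
By default enumerate() counts from 0. If you would rather count from another number, for example to print human-friendly positions, its optional start argument can be used. A small sketch (the name position is illustrative):

```python
colors = ["red", "blue", "green"]

# start=1 makes the counter begin at 1 instead of 0.
for position, color in enumerate(colors, start=1):
    print(position, color)
```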

### Zip
Sometimes you have two lists over which you want to iterate simultaneously. zip() joins the lists and returns their elements as tuples. The number of tuples returned by zip() is equal to the length of the shortest list.

In [None]:
colors = ["red", "blue", "green"]
numbers = [4, 5, 6, 7]
names = ["Matt", "Ben", "John", "Adam", "Jim"]

for color, number in zip(colors,numbers):
    print(color, number)

print("\nZip for 3 elements")
for color, number, name in zip(colors,numbers,names):
    print(color, number, name)
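
To see directly that zip() stops at the shortest list, you can collect its output into a list and check the length. A minimal sketch, with pairs as an illustrative name:

```python
colors = ["red", "blue", "green"]
numbers = [4, 5, 6, 7]

pairs = list(zip(colors, numbers))
print(pairs)       # [('red', 4), ('blue', 5), ('green', 6)]
print(len(pairs))  # 3, the length of the shorter list; 7 is dropped
```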

### List Comprehensions
Applying a function or operation to every element of a list is so common that there is a special syntax for it (a list comprehension), which creates a new list based on an existing one. It is a one-line for loop with the following syntax:

[what_to_do(x) for x in some_list if optional_logical_condition]

For example:

In [None]:
list1 = list(range(5))
print([x**2 for x in list1])
# You could perform this operation only for even numbers.
print([x**2 for x in list1 if x % 2 == 0])
# The operation may have more than one argument.
list2 = list(range(2, 12, 2))
print([x * y for (x, y) in zip(list1, list2)])
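
To make the "one-line for loop" idea concrete, here is a sketch of an ordinary loop that builds the same list as the second comprehension above. The name squares_of_even is illustrative.

```python
list1 = list(range(5))

# Explicit loop equivalent of [x**2 for x in list1 if x % 2 == 0]
squares_of_even = []
for x in list1:
    if x % 2 == 0:
        squares_of_even.append(x**2)

print(squares_of_even)  # [0, 4, 16]
print(squares_of_even == [x**2 for x in list1 if x % 2 == 0])  # True
```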

### If and while
There are two more basic flow control tools: if and while. They work analogously to their counterparts in other programming languages.

In [None]:
x = 3
if x < 2:
    print("Value below 2")
elif x > 10:
    print("Value above 10")
else:
    print("Value between 2 and 10, inclusive")

In [None]:
import math
# Description of other functions available in the math module:
# https://docs.python.org/3/library/math.html
math.pow(2, 3)
# Approximate e with (1 + 1/k)**k and stop once the error is at most tol.
tol = 0.1
diff = 1
k = 1
while diff > tol:
    approx = math.pow(1 + 1 / k, k)
    diff = math.e - approx
    print(k, approx, diff)
    k += 1

### Continue
Sometimes you may want to skip a loop iteration. You can use the continue statement for that. For example:

In [None]:
for i in range(11):
    if i % 3 == 0:
        continue
    else:
        print(i)

### Break
A loop (for or while) can be stopped with the break statement.

In [None]:
import math
# Description of other functions available in the math module:
# https://docs.python.org/3/library/math.html
math.pow(2, 3)
# With tol = 0 the error does not reach the tolerance within these
# iterations, so break is what ends the loop.
tol = 0
diff = 1
k = 1
while diff > tol:
    approx = math.pow(1 + 1 / k, k)
    diff = math.e - approx
    print(k, approx, diff)
    k += 1
    if k > 15:
        print("Value of tol (tolerance) is probably wrong... break.")
        break

## Error handling
When you use Python for data analysis you may run into errors relatively often. The simplest examples are missing values or division by zero. You often do not want the whole program to stop because of that.

In the code below the program raises an error in the third line and does not execute the fourth (you can check this by running the next cell).

In [None]:
a = 0
b = 4
c = b / a
d = a + b

In [None]:
d

In [None]:
a = 0
b = 4
try:
    c = b / a
# In the case of division, the only error you may expect is:
except ZeroDivisionError as e:
    print("You tried to divide by zero!")
    c = b * float('inf')
# You do not really expect an exception here.
d = a + b
print(c, d)
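
If the same handling is needed in several places, it can be wrapped in a function. The sketch below uses a hypothetical helper named safe_divide (not part of Python or any library) that mirrors the cell above by returning the numerator multiplied by infinity when the denominator is zero.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers; on division by zero fall back to +/- infinity."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Same fallback as in the cell above: b * float('inf')
        return numerator * float('inf')

print(safe_divide(4, 2))  # 2.0
print(safe_divide(4, 0))  # inf
```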

Because you want to know how to handle a given error (that is, what to do when it happens, for example assign "inf"), you should not catch all exceptions. In the cell below the error is caught but f is never assigned, so the last statement cannot print it and fails. However, in some cases, especially while writing or testing code, catching all exceptions may be useful.

In [None]:
a = 0
b = 4
try:
    f = b / a
except Exception as e:
    print(e.__doc__)

g = a + b
print(f, g)

This is why you may want to catch the error, run additional code (e.g. logging), and then stop the script anyway by re-raising the exception.

In [None]:
a = 0
b = 4
try:
    f = b / a
except Exception as e:
    print(e.__doc__)
    print("Error, stopping the script.")
    raise

g = a + b
print(f, g)
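
The "log, then stop" pattern can also be written with the standard logging module instead of print. This is only a sketch: the function name divide_or_stop and the log message are assumptions made for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def divide_or_stop(b, a):
    """Return b / a; log the failure and re-raise if a is zero."""
    try:
        return b / a
    except ZeroDivisionError:
        # logger.exception records the message together with the traceback.
        logger.exception("Division failed for b=%s, a=%s", b, a)
        raise  # re-raise the same exception so the script still stops

print(divide_or_stop(4, 2))  # 2.0
```

Calling divide_or_stop(4, 0) would write the log entry and then raise ZeroDivisionError, just as the bare raise does in the cell above.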

In practice error handling may be more advanced, but for now you do not need to know more. For the curious, here are some further links:
* https://docs.python.org/3/tutorial/errors.html
* https://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
* http://www.pythonforbeginners.com/error-handling/exception-handling-in-python
* http://eli.thegreenplace.net/2008/08/21/robust-exception-handling/