# Python the basics: A quick recap

> *DS Data manipulation, analysis and visualisation in Python*  
> *December, 2016*

> *© 2016, Joris Van den Bossche and Stijn Van Hoey  (<mailto:jorisvandenbossche@gmail.com>, <mailto:stijnvanhoey@gmail.com>). Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)*


## First steps

The obligatory...

In [635]:
print("Hello DS_course!") # python 3(!)

Hello DS_course!


Python is a calculator:

In [636]:
4*5

20

In [637]:
3**2

9

In [638]:
(3 + 4)/2, 3 + 4/2, 

(3.5, 5.0)

In [639]:
21//5, 21%5  # floor division, modulo

(4, 1)
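Floor division and modulo are also available together through the built-in `divmod`. The sketch below uses arbitrary illustrative numbers to show how both behave with a negative operand.

```python
# divmod returns the floor-division quotient and the remainder in one call
quotient, remainder = divmod(21, 5)
print(quotient, remainder)  # 4 1

# With a negative operand, // rounds down (toward negative infinity)
# and % takes the sign of the divisor
neg_quotient, neg_remainder = -21 // 5, -21 % 5
print(neg_quotient, neg_remainder)  # -5 4

# The identity a == (a // b) * b + (a % b) holds in both cases
a, b = -21, 5
identity_holds = a == (a // b) * b + (a % b)
print(identity_holds)  # True
```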

Variable assignment

In [640]:
my_variable_name = 'DS_course'
my_variable_name

'DS_course'

In [641]:
name, age = 'John', 30
print('The age of {} is {:d}'.format(name, age))

The age of John is 30


More information on string formatting with print (Python 2 vs Python 3): https://pyformat.info/
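As a complement to `str.format`, Python 3.6 and later also offer f-strings. A minimal sketch, assuming Python 3.6+ and reusing the `name` and `age` values from the cell above:

```python
name, age = 'John', 30

# str.format, as used in the cell above
with_format = 'The age of {} is {:d}'.format(name, age)

# f-string (Python 3.6+): the expressions go directly inside the braces
with_fstring = f'The age of {name} is {age:d}'

print(with_format == with_fstring)  # True

# Format specifiers work the same way in both styles
rounded = f'{3.14159:.2f}'
print(rounded)  # 3.14
```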

## Loading functionalities

In [642]:
import os

In [643]:
os.listdir()

['00-jupyter_introduction.ipynb',
 '02-control_flow.ipynb',
 'data',
 'rehears1.py',
 '.ipynb_checkpoints',
 '__pycache__',
 '05-numpy.ipynb',
 '03-functions.ipynb',
 '01-basic.ipynb',
 '04-reusing_code.ipynb',
 'rehears2.py',
 'python_rehearsel.ipynb',
 'test.py']

Loading with a defined prefix (community convention):

In [644]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Loading functions from any file/module/package:

In [645]:
%%file rehears1.py  
#this writes a file in your directory, check it(!)

"A demo module."

def print_it():
    """Dummy function to print the string it"""
    print('it')

Overwriting rehears1.py


In [646]:
from rehears1 import print_it

In [647]:
print_it()

it


In [648]:
%%file rehears2.py  
#this writes a file in your directory, check it(!)

"A demo module."

def print_it():
    """Dummy function to print the string it"""
    print('it')

def print_custom(my_input):
    """Dummy function to print the provided input"""
    print(my_input)

Overwriting rehears2.py


In [649]:
from rehears2 import print_it, print_custom

In [650]:
print_custom('DS_course')

DS_course


<div class="alert alert-danger">
    <b>DON'T</b>: `from os import *` # just don't
</div>
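One reason to avoid the star import: it can silently replace names you already rely on. The sketch below does not perform the star import itself; it only inspects which names `os` would export that also exist as built-ins.

```python
import builtins
import os

# Names that `from os import *` would bring in and that also exist as built-ins
clashes = sorted(name for name in os.__all__ if hasattr(builtins, name))
print(clashes)  # includes 'open': os.open would replace the built-in open()

# Keeping the module prefix keeps both functions available side by side
print(os.open is not builtins.open)  # True
```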

## Datatypes

### Numerical

**floats**

In [651]:
a_float = 5.

In [652]:
type(a_float)

float

**integers**

In [653]:
an_integer = 4

In [654]:
type(an_integer)

int

**booleans**

In [655]:
a_boolean = True
a_boolean

True

In [656]:
3 > 4 # results in boolean

False

### Containers

**Strings**

In [657]:
a_string = "abcde"
a_string

'abcde'

In [658]:
a_string.capitalize(), a_string.upper(), a_string.endswith('f') #,...

('Abcde', 'ABCDE', False)

In [659]:
a_string.upper().replace('B', 'A')

'AACDE'

In [660]:
a_string + a_string, a_string*5

('abcdeabcde', 'abcdeabcdeabcdeabcdeabcde')

**lists**

In [661]:
a_list = [1, 'a', 3, 4]

In [662]:
a_list.append(8.2)
a_list

[1, 'a', 3, 4, 8.2]

In [663]:
a_list.reverse()
a_list

[8.2, 4, 3, 'a', 1]

<div class="alert alert-info">
    <b>REMEMBER</b>: The list is updated in place; `a_list.reverse()` modifies the list itself and returns `None`
</div>
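The difference between in-place methods and functions that return a new object is easy to trip over. A short sketch with a hypothetical `numbers` list:

```python
numbers = [3, 1, 2]

# In-place methods modify the list and return None
result = numbers.reverse()
print(result)   # None
print(numbers)  # [2, 1, 3]

# Built-in functions such as sorted() and reversed() leave the original untouched
sorted_copy = sorted(numbers)
reversed_copy = list(reversed(numbers))
print(sorted_copy)    # [1, 2, 3]
print(reversed_copy)  # [3, 1, 2]
print(numbers)        # [2, 1, 3]
```

So `a_list = a_list.reverse()` would leave you with `None` instead of a list.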

In [668]:
a_list + ['b', 5]

[8.2, 4, 3, 'a', 1, 'b', 5]

In [669]:
[el*2 for el in a_list]  # list comprehensions...a short for-loop

[16.4, 8, 6, 'aa', 2]

In [670]:
[el for el in dir(list) if not el[0]=='_']

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

<div class="alert alert-success">
    <b>EXERCISE</b>: Rewrite the previous list comprehension by using a builtin string method to test if the element starts with an underscore
</div>

In [671]:
[el for el in dir(list) if not el.startswith('_')]

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

<div class="alert alert-success">
    <b>EXERCISE</b>: Given the sentence `the quick brown fox jumps over the lazy dog`, split the sentence into words and put all the word lengths in a list.
</div>

In [672]:
sentence = "the quick brown fox jumps over the lazy dog"
#split in words and get word lengths
[len(word) for word in sentence.split()]

[3, 5, 5, 3, 5, 4, 3, 4, 3]

**dictionaries**

In [673]:
a_dict = {'a': 1, 'b': 2}

In [674]:
a_dict['c'] = 3
a_dict['a'] = 5

In [675]:
a_dict

{'a': 5, 'b': 2, 'c': 3}

In [676]:
a_dict.keys(), a_dict.values(), a_dict.items()

(dict_keys(['a', 'c', 'b']),
 dict_values([5, 3, 2]),
 dict_items([('a', 5), ('c', 3), ('b', 2)]))

In [677]:
an_empty_dic = dict() # or just {}
an_empty_dic

{}
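A few more dictionary operations that come up often, sketched with a hypothetical `prices` dictionary (not part of the course data):

```python
prices = {'apple': 1.5, 'pear': 2.0}  # hypothetical example data

# Indexing a missing key raises KeyError; .get() returns a default instead
missing = prices.get('kiwi')
missing_with_default = prices.get('kiwi', 0.0)
print(missing, missing_with_default)  # None 0.0

# .items() yields (key, value) pairs, handy for looping
for fruit, price in prices.items():
    print(fruit, price)

# Membership tests look at the keys, not the values
print('apple' in prices, 1.5 in prices)  # True False
```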

**tuple**

In [678]:
a_tuple = (1, 2, 4)

<div class="alert alert-info">
    <b>REMEMBER</b>:  (), [], {} => depends on the datatype you want to create!
</div>

In [679]:
collect = a_list, a_dict

In [680]:
type(collect)

tuple

In [681]:
serie_of_numbers = 3, 4, 5

In [682]:
# Using tuples on the left-hand side of assignment allows you to extract fields
a, b, c = serie_of_numbers

In [683]:
print(c, b, a)

5 4 3
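Tuple packing and unpacking enable a few compact idioms. A small sketch with illustrative values:

```python
first, second = 1, 2

# Swap two variables without a temporary: the right-hand side is packed
# into a tuple first, then unpacked
first, second = second, first
print(first, second)  # 2 1

# A starred name collects the remaining items into a list
head, *rest = (3, 4, 5)
print(head, rest)  # 3 [4, 5]

# A one-element tuple needs a trailing comma; parentheses alone are not enough
print(type((3,)), type((3)))  # <class 'tuple'> <class 'int'>
```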


### Accessing container values

In [684]:
a_string[2:5]

'cde'

In [685]:
a_list[-2]

'a'

In [686]:
a_list = [0, 1, 2, 3]

In [687]:
a_list[:3]

[0, 1, 2]

In [688]:
a_list[::2]

[0, 2]

<div class="alert alert-success">
    <b>EXERCISE</b>: Reverse the `a_list` without using the built-in reverse method, but using an appropriate slicing command:
</div>

In [689]:
a_list[::-1]

[3, 2, 1, 0]

In [690]:
a_dict['a']

5

In [691]:
a_tuple[1]

2

<div class="alert alert-info">
    <b>REMEMBER</b>: [] for accessing elements
</div>

Assigning new values to items: `mutable` vs `immutable`

In [692]:
a_list[2] = 10 #element 2 changed -- mutable
a_list

[0, 1, 10, 3]

In [693]:
a_tuple[1] = 10 # cfr. a_string -- immutable
a_string[3] = 'q'

TypeError: 'tuple' object does not support item assignment
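Strings behave like tuples here: item assignment fails. The usual workaround is to build a new object from pieces of the old one, as in this sketch (values restated so the block runs on its own):

```python
a_string = "abcde"
a_tuple = (1, 2, 4)

# Item assignment on an immutable object raises TypeError
try:
    a_string[3] = 'q'
except TypeError as err:
    print('TypeError:', err)

# Instead, build a new object from slices of the old one
new_string = a_string[:3] + 'q' + a_string[4:]
new_tuple = a_tuple[:1] + (10,) + a_tuple[2:]
print(new_string)  # abcqe
print(new_tuple)   # (1, 10, 4)
```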

## Control flows

### for-loop

In [694]:
for i in [1, 2, 3, 4]:
    print(i, end='\t')

1	2	3	4	

In [695]:
for i in a_list: # anything that is a collection/container can be looped
    print(i, end='')

01103

<div class="alert alert-success">
    <b>EXERCISE</b>: Loop through the characters of the string `Hello DS` and print each character on a new line
</div>

In [696]:
for char in 'Hello DS':
    print(char)

H
e
l
l
o
 
D
S


In [697]:
for i in a_dict: # items, keys, values
    print(i)

a
c
b


In [698]:
for j, key in enumerate(a_dict.keys()):
    print(j, key)

0 a
1 c
2 b


<div class="alert alert-info">
    <b>REMEMBER</b>: When you need a counter while iterating, just use `enumerate`; you rarely need the manual `i = 0 ... i = i + 1` pattern. Check [itertools](http://pymotw.com/2/itertools/) as well...
</div>
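To make the comparison concrete, the sketch below builds the same index/element pairs with a manual counter and with `enumerate`, and adds `zip` as a related built-in. The `letters` list is an illustrative assumption.

```python
letters = ['a', 'b', 'c']

# Manual counter: works, but is more to type and easier to get wrong
i = 0
manual = []
for letter in letters:
    manual.append((i, letter))
    i = i + 1

# enumerate yields (index, element) pairs directly
with_enumerate = list(enumerate(letters))
print(manual == with_enumerate)  # True

# start= changes the first index; zip pairs up two sequences
from_one = list(enumerate(letters, start=1))
zipped = list(zip(letters, [10, 20, 30]))
print(from_one)  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(zipped)    # [('a', 10), ('b', 20), ('c', 30)]
```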

### while

In [699]:
b = 7
while b < 10:
    b+=1
    print(b)

8
9
10


### if statement

In [700]:
if 'a' in a_dict:
    print('a is in!')

a is in!


In [701]:
if 3 > 4:
    print('This is valid')

In [702]:
testvalue = False  # 0, 1, None, False, 4 > 3
if testvalue:
    print('valid')
else:
    raise Exception("Not valid!")

Exception: Not valid!
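The values listed in the comment of the previous cell are worth trying one by one. This sketch evaluates their truthiness with `bool`, plus a few container cases added here for illustration:

```python
candidates = [0, 1, None, False, 4 > 3, '', 'text', [], [0]]

# bool() shows how each value is interpreted by an if statement
for value in candidates:
    print(repr(value), '->', bool(value))

# Zero, None, False and empty containers are falsy; everything else here is truthy
truthy = [value for value in candidates if value]
print(truthy)  # [1, True, 'text', [0]]
```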

In [703]:
myvalue = 3
if isinstance(myvalue, str):
    print('this is a string')
elif isinstance(myvalue, float):
    print('this is a float')
elif isinstance(myvalue, list):
    print('this is a list')
else:
    print('no idea actually')

no idea actually
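The chain ends in the `else` branch because an `int` is not a `float`. A short sketch of a few `isinstance` details that explain this:

```python
myvalue = 3

# An int is not a float, so the float branch is skipped
is_float = isinstance(myvalue, float)
print(is_float)  # False

# A tuple of types checks for any of them
is_number = isinstance(myvalue, (int, float))
print(is_number)  # True

# bool is a subclass of int, so True also passes an int check
print(isinstance(True, int))  # True
```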


## Functions

We've been using functions the whole time...

In [704]:
len(a_list)

4

Calling a method on an object:

In [705]:
a_list.reverse()
a_list

[3, 10, 1, 0]

Defining a function:

In [706]:
def f(a, b, verbose=False):
    """custom summation function
    
    Parameters
    ----------
    a : number
        first number to sum
    b : number
        second number to sum   
    verbose: boolean
        require additional information (True) or not (False)
    
    Returns
    -------
    my_sum : number
        sum of the provided two input elements
    """
    
    if verbose:
        print('print a lot of information to the user')
        
    my_sum = a + b    
    
    return my_sum

In [707]:
f(2, 3, verbose=False)  # [3], '4'

5
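The trailing comment in the call above hints at trying other input types. Because `+` is defined for lists and strings as well, the same function body works for them. The sketch uses `add`, a shortened stand-in for `f` introduced here so the block is self-contained:

```python
def add(a, b):
    """Return a + b (a shortened stand-in for the function f above)."""
    return a + b

print(add(2, 3))      # 5
print(add([3], [4]))  # [3, 4]  (list concatenation)
print(add('3', '4'))  # 34      (string concatenation)

# Mixing incompatible types fails only when the function is called
try:
    add([3], '4')
    mixed_ok = True
except TypeError as err:
    mixed_ok = False
    print('TypeError:', err)
```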


<div class="alert alert-info">
    <b>REMEMBER</b>: () for calling functions
</div>

**Functions are objects** as well... (!)

In [708]:
def f1():
    print('this is function 1 speaking...')

def f2():
    print('this is function 2 speaking...')

In [709]:
def function_of_functions(inputfunction):
    return inputfunction()

In [710]:
function_of_functions(f1)

this is function 1 speaking...


**Anonymous functions (lambda)**

In [711]:
add_two = (lambda x: x + 2)

In [712]:
add_two(10)

12
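Lambdas are most useful as short throwaway arguments to other functions, for example as a sort key. A sketch with a hypothetical list of words:

```python
words = ['pandas', 'numpy', 'os', 'matplotlib']

# Sort by word length instead of alphabetically
by_length = sorted(words, key=lambda word: len(word))
print(by_length)  # ['os', 'numpy', 'pandas', 'matplotlib']

# The same with a named function
def word_length(word):
    return len(word)

print(sorted(words, key=word_length) == by_length)  # True
```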

## Numpy

In [713]:
import numpy as np

### Creating numpy array

In [714]:
np.array([1, 1.5, 2, 2.5])  #np.array(anylist)

array([ 1. ,  1.5,  2. ,  2.5])

In [715]:
np.arange(5, 10, 2)

array([5, 7, 9])

In [716]:
np.linspace(5, 9, 3)

array([ 5.,  7.,  9.])

In [717]:
np.zeros((5, 2)), np.ones(5)

(array([[ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.],
        [ 0.,  0.]]), array([ 1.,  1.,  1.,  1.,  1.]))

In [718]:
np.zeros((5, 2)).shape, np.zeros((5, 2)).size

((5, 2), 10)

### Slicing

In [719]:
my_array = np.random.randint(2, 11, 10)  # random_integers is deprecated; randint excludes the upper bound
my_array

array([10,  9,  8,  5, 10,  3,  2,  6,  9, 10])

In [720]:
my_array[-2:]

array([ 9, 10])

Slicing with a step:

In [721]:
my_array[0:7:2]

array([10,  8, 10,  2])

Assign new values to items:

In [722]:
my_array[:2] = 10
my_array

array([10, 10,  8,  5, 10,  3,  2,  6,  9, 10])

In [723]:
my_array = my_array.reshape(5, 2)
my_array

array([[10, 10],
       [ 8,  5],
       [10,  3],
       [ 2,  6],
       [ 9, 10]])

In [724]:
my_array[0, :]

array([10, 10])

### Element-wise operations

In [733]:
my_array = np.random.randint(2, 11, 10)

In [734]:
my_array%3  # == 0

array([2, 1, 0, 2, 1, 2, 1, 1, 1, 1])

In [735]:
np.exp(my_array), np.sin(my_array), np.max(my_array)

(array([  1.48413159e+02,   2.20264658e+04,   2.00855369e+01,
          2.98095799e+03,   2.20264658e+04,   7.38905610e+00,
          5.45981500e+01,   5.45981500e+01,   1.09663316e+03,
          5.45981500e+01]),
 array([-0.95892427, -0.54402111,  0.14112001,  0.98935825, -0.54402111,
         0.90929743, -0.7568025 , -0.7568025 ,  0.6569866 , -0.7568025 ]),
 10)

In [736]:
np.cumsum(my_array) == my_array.cumsum()

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

In [737]:
my_array.max(axis=0)

10

In [738]:
my_array*my_array  # element-wise

array([ 25, 100,   9,  64, 100,   4,  16,  16,  49,  16])

<div class="alert alert-info">
    <b>REMEMBER</b>: The operations work on all elements of the array at the same time, you don't need a <strike>`for` loop</strike>
</div>
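To see that the vectorized expression and an explicit loop compute the same thing, here is a minimal sketch on a small illustrative array (it assumes NumPy is installed, as elsewhere in this notebook):

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Explicit loop: square each element one at a time
squared_loop = np.empty_like(values)
for i, value in enumerate(values):
    squared_loop[i] = value ** 2

# Vectorized: one expression for the whole array
squared_vec = values ** 2

print(squared_vec)                                # [ 0  1  4  9 16]
print(np.array_equal(squared_loop, squared_vec))  # True
```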

In [739]:
a_list = range(1000)
%timeit [i**2 for i in a_list]

1000 loops, best of 3: 336 µs per loop


In [740]:
an_array = np.arange(1000)
%timeit an_array**2

The slowest run took 15.86 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.74 µs per loop


### boolean indexing and filtering(!)

In [741]:
row_array = np.random.randint(1, 20, 10)
row_array

array([15,  1, 12,  1, 15, 13,  1,  5,  8, 15])

Conditions can be checked (element-wise):

In [742]:
row_array > 5

array([ True, False,  True, False,  True,  True, False, False,  True,  True], dtype=bool)

In [743]:
boolean_mask = row_array > 5

You can use this as a filter to select elements of an array:

In [744]:
row_array[boolean_mask]

array([15, 12, 15, 13,  8, 15])

or to change the values in the array that satisfy the condition:

In [745]:
row_array[boolean_mask] = 20
row_array

array([20,  1, 20,  1, 20, 20,  1,  5, 20, 20])

In short, without the intermediate mask variable, setting all values equal to 20 to -20:

In [746]:
row_array[row_array == 20] = -20
row_array

array([-20,   1, -20,   1, -20, -20,   1,   5, -20, -20])
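Masks can also be combined. The sketch below uses a fixed illustrative array (not the random one above) to show `&`, `~` and `np.where`; note that the parentheses around each condition are required because `&` binds more tightly than the comparisons.

```python
import numpy as np

values = np.array([15, 1, 12, 1, 15, 13, 1, 5, 8, 15])

# Combine conditions with & (and), | (or), ~ (not)
between = (values > 5) & (values < 15)
print(values[between])   # [12 13  8]

# ~ inverts a mask
print(values[~between])  # [15  1  1 15  1  5 15]

# np.where builds a new array instead of modifying the original in place
capped = np.where(values > 10, 10, values)
print(capped)            # [10  1 10  1 10 10  1  5  8 10]
```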

This requires some practice...

In [747]:
AR = np.random.randint(0, 21, 15)
AR

array([12,  5,  3,  6,  5,  2, 13,  7, 12, 10,  4, 11, 17, 16, 20])

<div class="alert alert-success">
    <b>EXERCISE</b>: Count the number of values in AR that are larger than 10 (note: you can count with True = 1 and False = 0)
</div>

In [748]:
sum(AR > 10)

7

<div class="alert alert-success">
    <b>EXERCISE</b>: Change all even numbers of `AR` into zero-values.
</div>

In [749]:
AR[AR%2 == 0] = 0
AR

array([ 0,  5,  3,  0,  5,  0, 13,  7,  0,  0,  0, 11, 17,  0,  0])

<div class="alert alert-success">
    <b>EXERCISE</b>: Change every second element of the array `AR` (the 2nd, 4th, ... position, i.e. index 1, 3, ...) into the value 30
</div>

In [750]:
AR[1::2] = 30
AR

array([ 0, 30,  3, 30,  5, 30, 13, 30,  0, 30,  0, 30, 17, 30,  0])

In [751]:
AR2 = np.random.random(10)
AR2

array([ 0.64285626,  0.44657004,  0.71352247,  0.90797189,  0.37213776,
        0.62817756,  0.92829218,  0.13112223,  0.41866976,  0.72065112])

<div class="alert alert-success">
    <b>EXERCISE</b>: Select all values above the 75th percentile of the array AR2 and take the square root of these values
</div>

In [752]:
np.sqrt(AR2[AR2 > np.percentile(AR2, 75)])

array([ 0.95287559,  0.96347921,  0.84891173])

In [753]:
AR3 = np.array([-99., 2., 3., 6., 8, -99., 7., 5., 6., -99.])

<div class="alert alert-success">
    <b>EXERCISE</b>: Convert all values -99 of the array AR3 into NaN values (note that NaN values can be provided in float arrays as `np.nan`)
</div>

In [754]:
AR3[AR3 == -99.] = np.nan
AR3

array([ nan,   2.,   3.,   6.,   8.,  nan,   7.,   5.,   6.,  nan])
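Once an array contains NaN values, two things behave differently from what you might expect: equality tests and reductions. A minimal sketch on a small illustrative array:

```python
import numpy as np

data = np.array([np.nan, 2., 3., np.nan])

# NaN is not equal to anything, including itself, so == cannot find it
equal_mask = data == np.nan
print(equal_mask)  # [False False False False]

# np.isnan gives the mask you actually want
nan_mask = np.isnan(data)
print(nan_mask)    # [ True False False  True]

# Ordinary reductions propagate NaN; the nan* variants skip it
plain_mean = np.mean(data)
nan_aware_mean = np.nanmean(data)
print(plain_mean, nan_aware_mean)  # nan 2.5
```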

Ready for some real **Pandas!**