# MCSC1 - Course material for Computational Methods in Complex Systems I

- *Class period:* 06/08 to 21/12 ($\approx20$ classes)
- *Location:* G68 - room 11
- *Schedule:* Wednesdays, 13:30 to 17:00
- *Assessment:* to be defined
- *Textbook:* José Unpingco, *Python for Probability, Statistics, and Machine Learning*, Springer (2016).

[![Alt Text](https://images.springer.com/sgw/books/medium/9783319307152.jpg)](http://library1.org/_ads/E844412DCACEB5A9BF29267FA244E908)


## Contents

1. Getting Started with Scientific Python
2. Probability
3. Statistics
4. Machine Learning

# 01 - Getting Started with Scientific Python

## Python

>**Python is an interpreted language.** This means that Python codes run on a Python virtual machine that provides a layer of abstraction between the code and the platform it runs on, thus making codes portable across different platforms. For example, the same script that runs on a Windows laptop can also run on a Linux-based supercomputer or on a mobile phone. This makes programming easier because the virtual machine handles the low-level details of implementing the business logic of the script on the underlying platform.


>**Python is a dynamically typed language, which means that the interpreter itself figures out the representative types (e.g., floats, integers) interactively or at run-time.** This is in contrast to languages like Fortran that have compilers that study the code from beginning to end, perform many compiler-level optimizations, link intimately with the existing libraries on a specific platform, and then create an executable that is henceforth liberated from the compiler. As you may guess, the compiler’s access to the details of the underlying platform means that it can utilize optimizations that exploit chip-specific features and cache memory. Because the virtual machine abstracts away these details, **it means that the Python language does not have programmable access to these kinds of optimizations.** So, where is the balance between the ease of programming the virtual machine and these key numerical optimizations that are crucial for scientific work?


>**The balance comes from Python’s native ability to bind to compiled Fortran and C libraries.** This means that you can send intensive computations to compiled libraries directly from the interpreter. This approach has two primary advantages. First, it gives you the fun of programming in Python, with its expressive syntax and lack of visual clutter. This is a particular boon to scientists who typically want to use software as a tool as opposed to developing software as a product. The second advantage is that you can mix-and-match different compiled libraries from diverse research areas that were not otherwise designed to work together. This works because Python makes it easy to allocate and fill memory in the interpreter, pass it as input to compiled libraries, and then retrieve the output back at the interpreter.


>Moreover, **Python provides a multiplatform solution for scientific codes.** As an open-source project, Python itself is available anywhere you can build it, even though it typically comes standard nowadays, as part of many operating systems. This means that once you have written your code in Python, you can just transfer the script to another platform and run it, as long as the compiled libraries are also available there. What if the compiled libraries are absent? Building and configuring compiled libraries across multiple systems used to be a painstaking job, but as scientific Python has matured, a wide range of libraries have now become available across all of the major platforms (i.e., Windows, MacOS, Linux, Unix) as prepackaged distributions.


>Finally, scientific Python facilitates maintainability of scientific codes because **Python syntax is clean, free of semi-colon litter and other visual distractions that make code hard to read and easy to obfuscate.** Python has many built-in testing, documentation, and development tools that ease maintenance. Scientific codes are usually written by scientists unschooled in software development, so having solid software development tools built into the language itself is a particular boon.

- A good introduction to Python is presented in [Think Python 2nd Edition by Allen B. Downey](http://greenteapress.com/thinkpython2/thinkpython2.pdf).
- If you are not familiar with Python, at least read Chapter 2 of the [Scipy Lecture Notes](https://www.scipy-lectures.org/_downloads/ScipyLectures-simple.pdf).
- Or watch [A hands-on introduction to Python for beginning programmers](https://www.youtube.com/watch?v=rkx5_MRAV3A).
- **Here we give a brief introduction following the Scipy Lectures**.

## The Python language

>  Python is a **programming language**, as are C, Fortran, BASIC, PHP,
  etc. Some specific features of Python are as follows:

>  * an *interpreted* (as opposed to *compiled*) language. Contrary to e.g.
    C or Fortran, one does not compile Python code before executing it. In
    addition, Python can be used **interactively**: many Python
    interpreters are available, from which commands and scripts can be
    executed.

>  * a free software released under an **open-source** license: Python can
    be used and distributed free of charge, even for building commercial
    software.

>  * **multi-platform**: Python is available for all major operating
    systems, Windows, Linux/Unix, MacOS X, most likely your mobile phone
    OS, etc.

>  * a very readable language with clear non-verbose syntax

>  * a language for which a **large variety of high-quality packages are
    available for various applications, from web frameworks to scientific
    computing.**

>  * a language very easy to interface with other languages, in particular C
    and C++.

>  * Some other features of the language are illustrated just below. For
    example, Python is an object-oriented language, with dynamic typing
    (the same variable can contain objects of different types during the
    course of a program).

### Python 3 or Python 2?

> In 2008, Python 3 was released. It is a major evolution of the language that made a few changes. **Some old scientific code does not yet run under Python 3**. However, this is infrequent and Python 3 comes with many benefits. We advise that you install Python 3.

In [2]:
#python 2

print "Hello, world!"

SyntaxError: Missing parentheses in call to 'print' (<ipython-input-2-651ed81a069e>, line 3)

In [3]:
#python 3

print("Hello, world!")

Hello, world!


In [5]:
#A new feature of Python 3.6
#f-strings
ano = 2018
f'This course refers to the {ano} academic year'

'This course refers to the 2018 academic year'

## Installing a working environment

> Python comes in many flavors, and there are many ways to install it. However, **we recommend installing a scientific-computing distribution**, which comes with optimized versions of scientific modules.

**Under Linux**
If you have a recent distribution, most of the tools are probably packaged, and it is recommended to use your package manager.

**Other systems**
There are several fully-featured Scientific Python distributions:

- [Anaconda](https://www.continuum.io/downloads)
- [EPD](https://store.enthought.com/downloads/)

## Getting Anaconda

- Download version 3.x from https://www.anaconda.com/download/ or use the [direct link](https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh) to the .sh file
- Follow the instructions at https://docs.anaconda.com/anaconda/install/linux

Basically, run
```sh
sh Anaconda3-5.2.0-Linux-x86_64.sh
```
and follow the installer's steps.

We will use the Jupyter Notebook for most of the code. We will talk about it later. For now, let us imagine that we are using the *Python shell*.

## First steps in python

To start the interpreter, type:

```sh
python
```
in the shell.

In [6]:
a = 3
b = 2*1

In [7]:
type(b)

int

In [8]:
print(b)

2


In [9]:
a*b

6

In [10]:
b = 'hello'
type(b)

str

In [11]:
b + b

'hellohello'

In [12]:
2*b

'hellohello'

>Two variables ``a`` and ``b`` have been defined above. Note that one does
  not declare the type of a variable before assigning its value. In C,
  conversely, one should write:

```C
 int a = 3;
```
>  In addition, the type of a variable may change, in the sense that at
  one point in time it can be equal to a value of a certain type, and at a
  second point in time, it can be equal to a value of a different
  type. `b` was first equal to an integer, but it became equal to a
  string when it was assigned the value `'hello'`. Operations on
  integers (``b = 2*1``) are coded natively in Python, and so are some
  operations on strings such as additions and multiplications, which
  amount respectively to concatenation and repetition.

## Basic types

### Numerical types

In [14]:
# Integer
a = 4
type(a)

int

In [16]:
# Floats
b = 2.1
type(b)

float

In [17]:
# Complex
c = 1.5 + 0.5j
type(c)

complex

In [18]:
f"parte real: {c.real} | parte imaginária: {c.imag}"

'parte real: 1.5 | parte imaginária: 0.5'

In [27]:
# Booleans

test = 3 > 4
type(test)

bool

In [21]:
test

False

In [22]:
test==0

True

In [23]:
test==1

False

In [30]:
test is True

False
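
The comparisons `test==0` and `test==1` above behave this way because `bool` is a subclass of `int`, with `False` equal to `0` and `True` equal to `1`. A small sketch of the consequences (the name `flags` is only illustrative):

```python
# bool is a subclass of int: False behaves like 0 and True like 1
flags = [3 > 4, 2 < 5, 10 == 10]

print(issubclass(bool, int))   # True
print(sum(flags))              # 2, the number of True values
print(True + True)             # 2
```

Summing a list of booleans is a common way to count how many conditions hold.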

## Operators

**Basic arithmetic operations `+, -, *, /, **, %` (modulo) natively implemented**

In [31]:
7*9

63

In [32]:
2**10

1024

In [33]:
8%3

2

In [34]:
#Type conversion (casting):
float(1)

1.0

**The behaviour of the division operator has changed in Python 3.**

In Python 2.x

```python
>>> 3/2
1
```

In [37]:
# In Python 3
3/2

1.5

In [38]:
# The same way
3/2.

1.5

In [39]:
a=3
b=2
a/float(b)

1.5

In Python 2, you can import `division` from `__future__` to always get the Python 3 behavior:

```python
>>> from __future__ import division
>>> 3 / 2
1.5
```

If you explicitly want integer (floor) division, use `//`:

In [40]:
3.0//2.0

1.0
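
As a short, hedged illustration of `//`: it rounds toward negative infinity rather than toward zero, which matters for negative operands. The built-in `divmod` returns the quotient and the remainder together.

```python
# Floor division rounds toward negative infinity, not toward zero
print(7 // 2)      # 3
print(-7 // 2)     # -4
print(7.0 // 2)    # 3.0 (a float operand gives a float result)

# divmod returns the quotient and the remainder in one call
q, r = divmod(7, 2)
print(q, r)        # 3 1
```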

## Containers

Python provides many efficient types of containers, in which collections of objects can be stored.

### Lists

In [41]:
colors = ['red', 'blue', 'green', 'black', 'white']
type(colors)

list

Indexing: accessing individual objects contained in the list:

In [44]:
colors[0]

'red'

In [43]:
colors[-1]

'white'

**Indexing starts at 0 (as in C), not at 1 (as in Fortran or Matlab)!**

Slicing: obtaining sublists of regularly-spaced elements:

**Slicing syntax:** `colors[start:stop:stride]`

- including `start` and excluding `stop`:  `start <= i < stop`
- All slicing parameters are optional

In [46]:
colors[2:4]

['green', 'black']

In [47]:
colors[::2]

['red', 'green', 'white']

In [48]:
colors[3:]

['black', 'white']

In [49]:
colors[:3]

['red', 'blue', 'green']
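
A few more slicing cases, as a minimal sketch using the same `colors` list: negative indices count from the end, a negative stride walks backwards, and a slice that runs past the end is clipped instead of raising an error.

```python
colors = ['red', 'blue', 'green', 'black', 'white']

print(colors[-2:])     # ['black', 'white']
print(colors[::-1])    # the list in reverse order
print(colors[1:100])   # ['blue', 'green', 'black', 'white']

# Indexing out of range, unlike slicing, raises IndexError
try:
    colors[100]
except IndexError as err:
    print('IndexError:', err)
```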

### Lists are mutable objects and may hold items of different types

In [57]:
colors = ['red', 'blue', 'green', 'black', 'white']

In [58]:
colors[0] = 'yellow'

In [59]:
colors

['yellow', 'blue', 'green', 'black', 'white']

In [60]:
colors[2:4] = ['gray', 'purple']

In [61]:
colors

['yellow', 'blue', 'gray', 'purple', 'white']

In [64]:
#the 2-item slice is replaced by 3 items, so 'cyan' is inserted
colors[2:4] = ['gray', 'purple', 'cyan']

In [65]:
colors

['yellow', 'blue', 'gray', 'purple', 'cyan', 'white']

In [67]:
mixedlist = [3, -200, 'hello']
mixedlist

[3, -200, 'hello']

For collections of numerical data that all have the same type, it is often more efficient to use the array type provided by the numpy module. A NumPy array is a chunk of memory containing fixed-sized items. With NumPy arrays, operations on elements can be faster because elements are regularly spaced in memory and more operations are performed through specialized C functions instead of Python loops.

### Python offers many methods to modify lists

In [81]:
colors = ['red', 'blue', 'green', 'black', 'white']
colors.append('pink') #add item at the end
colors

['red', 'blue', 'green', 'black', 'white', 'pink']

In [82]:
colors.pop() #removes and returns the last item

'pink'

In [83]:
colors

['red', 'blue', 'green', 'black', 'white']

In [84]:
colors.pop(1) #removes and returns the item at index 1

'blue'

In [85]:
colors

['red', 'green', 'black', 'white']

In [86]:
colors = ['red', 'blue', 'green', 'black', 'white']
rcolors = colors[::-1] #reverse
rcolors

['white', 'black', 'green', 'blue', 'red']

In [87]:
rcolors.reverse() #reverse in-place

In [89]:
rcolors

['red', 'blue', 'green', 'black', 'white']
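
A few other list methods, sketched on a separate list (the name `palette` is illustrative, chosen so the `colors` list used in the cells is left untouched):

```python
palette = ['red', 'blue']

palette.extend(['green', 'black'])   # add several items at the end
palette.insert(1, 'white')           # insert 'white' before index 1
palette.remove('blue')               # remove the first matching value
position = palette.index('green')    # index of the first match

print(palette)    # ['red', 'white', 'green', 'black']
print(position)   # 2
```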

### Concatenate, repeat and sort lists

In [92]:
rcolors + colors

['red',
 'blue',
 'green',
 'black',
 'white',
 'red',
 'blue',
 'green',
 'black',
 'white']

In [93]:
rcolors*2

['red',
 'blue',
 'green',
 'black',
 'white',
 'red',
 'blue',
 'green',
 'black',
 'white']

In [94]:
sorted(rcolors) #sort and return a new list

['black', 'blue', 'green', 'red', 'white']

In [96]:
rcolors #the list is not modified

['red', 'blue', 'green', 'black', 'white']

In [97]:
rcolors.sort()  # in-place

In [98]:
rcolors #the list is modified

['black', 'blue', 'green', 'red', 'white']

In [103]:
#there are a bunch of methods
dir(rcolors)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

### Strings

Different string syntaxes (single, double or triple quotes):

In [104]:
s = 'Hello, how are you?'

s = "Hi, what's up"

s = '''Hello,
       how are you''' # tripling the quotes allows the string to span more than one line

s = """Hi,
       what's up?"""

In [105]:
'Hi, what's up?'

SyntaxError: invalid syntax (<ipython-input-105-ff7eba3d017a>, line 1)

In [120]:
'Hi, what\'s up?' # can be solved with a backslash

"Hi, what's up?"

Other uses of the backslash are, e.g., the newline character `\n` and the tab character `\t`.

In [106]:
"Hi, what's up?"

"Hi, what's up?"

In [107]:
'"test"'

'"test"'

Strings are collections like lists. Hence they can be indexed and sliced, using the same syntax and rules.

In [108]:
a = "hello"

In [109]:
a[0]

'h'

In [110]:
a[-1]

'o'

In [111]:
a = "hello, world!"
a[3:6]

'lo,'

In [112]:
a[2:10:2]

'lo o'

In [113]:
a[::3]

'hl r!'

Accents and special characters can also be handled in Unicode strings.

In [128]:
a = "Maringá"

In [126]:
a.encode(encoding='utf8')

b'Maring\xc3\xa1'

In [131]:
b'Maring\xc3\xa1'.decode("utf8")

'Maringá'

In [132]:
a.encode('ascii') 

UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 6: ordinal not in range(128)

In [133]:
a.encode('ascii','replace') 

b'Maring?'

In [134]:
a.encode('ascii','ignore') 

b'Maring'

### Converting accented characters to ASCII

In [140]:
from unidecode import unidecode

In [141]:
unidecode(a)

'Maringa'

**unidecode** is a Python module for "ASCII transliterations of Unicode text".

To install it:

```sh
pip install unidecode
```

A string is an **immutable object** and it is not possible to modify its contents. One may however create new strings from the original one.

In [142]:
a = "hello, world!"
a[2] = 'z'

TypeError: 'str' object does not support item assignment

In [144]:
a.replace('l','z',1) #replace 'l' by 'z'. 1 is the number of occurrences to be replaced.

'hezlo, world!'

In [146]:
a #doesn't change the value of a

'hello, world!'

In [147]:
a.replace('l','z')

'hezzo, worzd!'

Strings have a bunch of methods

In [149]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt

In [150]:
a.capitalize() #capitalize a

'Hello, world!'

In [151]:
a.upper() #convert a to uppercase

'HELLO, WORLD!'

In [153]:
a.find(',') #return the position of ',' in a

5

In [155]:
a.find('l') #return just the first occurrence

2

In [179]:
a.find('l',4) #return just the first occurrence at or after index 4

10

There is no built-in string method that returns all occurrences.

But we can use the `re` module:

In [163]:
import re
re.findall('l',a)

['l', 'l', 'l']

In [164]:
[m.start() for m in re.finditer('l', a)]

[2, 3, 10]

Or we can create a function for that

See https://codereview.stackexchange.com/questions/146834/function-to-find-all-occurrences-of-substring

In [166]:
def find_substring(substring, string):
    """ 
    Returns list of indices where substring begins in string

    >>> find_substring('me', "The cat says meow, meow")
    [13, 19]
    """
    indices = []
    index = -1  # Begin at -1 so index + 1 is 0
    while True:
        # Find next index of substring, by starting search from index + 1
        index = string.find(substring, index + 1)
        if index == -1:  
            break  # All occurrences have been found
        indices.append(index)
    return indices

In [167]:
find_substring('l',a)

[2, 3, 10]

In [180]:
a.find('#') #return -1 when nothing is found

-1

### String formatting

`%` operator

In [186]:
'An integer: %i; a float: %f; another string: %s' % (1, 0.1, 'string')

'An integer: 1; a float: 0.100000; another string: string'

In [195]:
'An integer: %03d; a float: %.2f; another string: %s' % (1, 0.1, 'string')

'An integer: 001; a float: 0.10; another string: string'

In [198]:
i=99
filename = 'processing_of_dataset_%03d.txt' % i
filename

'processing_of_dataset_099.txt'

`.format` method

In [200]:
name='Haroldo'
surname='Ribeiro'
'My name is {}. My surname is {}'.format(name,surname)

'My name is Haroldo. My surname is Ribeiro'

In [208]:
'My name is {:>15}.'.format(name)

'My name is         Haroldo.'

In [210]:
'My name is {:_<15}.'.format(name)

'My name is Haroldo________.'

In [211]:
'My name is {:_^15}.'.format(name)

'My name is ____Haroldo____.'

In [214]:
theta=0.014485
'file_name_{}'.format(theta)

'file_name_0.014485'

In [240]:
'file_name_{:06.3f}'.format(theta) #fill with zeros

'file_name_00.014'
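
The same format specifications also work inside f-strings (Python 3.6+), placed after a colon. A minimal sketch reusing the values from the cells above:

```python
name = 'Haroldo'
theta = 0.014485
i = 99

print(f'My name is {name:_^15}.')            # My name is ____Haroldo____.
print(f'file_name_{theta:06.3f}')            # file_name_00.014
print(f'processing_of_dataset_{i:03d}.txt')  # processing_of_dataset_099.txt
```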

### Dictionaries

A dictionary is basically an efficient table that maps keys to values. Since Python 3.7, dictionaries preserve insertion order; in earlier versions the order was not guaranteed.


In [241]:
tel = {'emmanuelle': 5752, 'sebastian': 5578}

In [242]:
tel.keys()

dict_keys(['emmanuelle', 'sebastian'])

In [243]:
tel.values()

dict_values([5752, 5578])

In [244]:
tel['sebastian']

5578

In [246]:
tel['emmanuelle'] = 5986 #dictionaries are mutable objects

In [247]:
tel

{'emmanuelle': 5986, 'sebastian': 5578}

In [248]:
tel['haroldo'] = 9999

In [249]:
tel

{'emmanuelle': 5986, 'haroldo': 9999, 'sebastian': 5578}

In [250]:
'haroldo' in tel

True

Creating a dictionary with `dict`

In [255]:
numbers = [1,2,3,4]
strs = ['a','b','c']
dict(list(zip(strs,numbers)))

{'a': 1, 'b': 2, 'c': 3}

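`zip` returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument iterables. It stops when the shortest iterable is exhausted, which is why the extra element of `numbers` is dropped above.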

In [258]:
list(zip(numbers,strs))

[(1, 'a'), (2, 'b'), (3, 'c')]
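
Some other common dictionary operations, as a small sketch (the name `phone_book` is illustrative, so that `tel` from the cells above is not overwritten):

```python
phone_book = {'emmanuelle': 5752, 'sebastian': 5578}

# .get returns a default instead of raising KeyError for a missing key
print(phone_book.get('haroldo', 'unknown'))   # unknown

# .items() yields (key, value) pairs
for name, number in phone_book.items():
    print(name, number)

# A dictionary comprehension builds a new dict from an iterable
squares = {n: n**2 for n in range(4)}
print(squares)   # {0: 0, 1: 1, 2: 4, 3: 9}
```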

### Tuples

Tuples are basically immutable lists. The elements of a tuple are written between parentheses, or just separated by commas.


In [261]:
t = 12345, 54321, 'hello!'

In [262]:
t

(12345, 54321, 'hello!')

In [263]:
t[0]

12345

In [264]:
t[1] = 'test'

TypeError: 'tuple' object does not support item assignment

## Assignment operator

Assignment statements are used to (re)bind names to values and to modify attributes or items of mutable objects.

In short, it works as follows (simple assignment):
1. an expression on the right hand side is evaluated, the corresponding object is created/obtained
2. a name on the left hand side is assigned, or bound, to the right hand side object

A single object can have several names bound to it.

In [269]:
a = [1, 2, 3]
b = a

In [270]:
a is b

True

The operators `is` and `is not` test for object identity: `x is y` is true if and only if `x` and `y` are the same object.

In [271]:
#changes in b are reflected in a
b[1]='test'

In [272]:
a

[1, 'test', 3]

The `id` function returns the identity of an object.

In [273]:
id(a)

140252684757448

In [274]:
id(b)

140252684757448

The method `.copy` makes a (shallow) copy of the object.

In [275]:
c = a.copy()

In [277]:
c[0] = 'test'

In [278]:
c,a

(['test', 'test', 3], [1, 'test', 3])

In [281]:
#copies are created when slicing.
d = a[:]

In [282]:
id(a),id(d)

(140252684757448, 140252684960456)

In [284]:
a = [1,2,3]
a,id(a)

([1, 2, 3], 140252684485320)

In [285]:
a = ['a', 'b', 'c'] # Creates another object
id(a)

140252684452424

In [286]:
a[:] = [1, 2, 3] # Modifies object in place.
id(a)

140252684452424

## **mutable vs. immutable**

- mutable objects can be changed in place
- immutable objects cannot be modified once created
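
One caveat worth sketching: `.copy()` and slicing copy only the outer list. If the list contains other mutable objects, both copies share them. The standard-library function `copy.deepcopy` copies the nested objects as well. The variable names below are illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()         # new outer list, same inner lists
deep = copy.deepcopy(nested)    # inner lists are copied too

nested[0][0] = 'changed'

print(shallow[0][0])   # 'changed': the inner list is shared
print(deep[0][0])      # 1: the deep copy is independent
```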


## Control Flow

Controls the order in which the code is executed.

## `if/elif/else`

In [287]:
if 2**2 == 4:
    print('4 equals 4')

4 equals 4


**Blocks are delimited by indentation.**

According to [PEP 8](https://www.python.org/dev/peps/pep-0008/) (Style Guide for Python Code):

- Use 4 spaces per indentation level.
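
The cell above uses only `if`. A minimal sketch of the full `if/elif/else` chain, wrapped in an illustrative helper function (`describe` is not part of the course files):

```python
def describe(a):
    """Return a short description of the number a."""
    if a == 1:
        return 'one'
    elif a == 2:
        return 'two'
    else:
        return 'a lot'

print(describe(1), describe(2), describe(10))   # one two a lot
```

Only the first branch whose condition is true is executed; `else` runs when none of them is.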

## `for/range`


`range(start, stop[, step])`

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.

In [290]:
t = range(10)

In [293]:
list(t)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Iterating with an index

In [294]:
for i in range(3):
    print(i)

0
1
2


But most often, it is more readable to iterate over values

In [295]:
for word in ('cool', 'powerful', 'readable'):
    print('Python is %s ' % word)

Python is cool 
Python is powerful 
Python is readable 


### You can iterate over any sequence (string, list, keys in a dictionary, lines in a file, ...):

In [297]:
vowels = 'aeiouy'

In [298]:
for i in 'powerful':
    if i in vowels:
        print(i)

o
e
u


In [299]:
message = "Hello how are you?"

In [300]:
message.split()

['Hello', 'how', 'are', 'you?']

In [301]:
for word in message.split():
    print(word)

Hello
how
are
you?


### Keeping track of enumeration number

A common task is to iterate over a sequence while keeping track of the item number.

In [302]:
words = ('cool', 'powerful', 'readable')

In [303]:
#the non-pythonic way
for i in range(0, len(words)):
    print((i, words[i]))

(0, 'cool')
(1, 'powerful')
(2, 'readable')


In [307]:
#the pythonic way
for index, item in enumerate(words):
    print((index,item))

(0, 'cool')
(1, 'powerful')
(2, 'readable')


`enumerate(iterable[, start])` -> iterator for index, value of iterable

Return an enumerate object.

In [308]:
test = enumerate(words)

In [309]:
test

<enumerate at 0x7f8f1f6b6ea0>

`next(iterator[, default])`

Return the next item from the iterator.

In [311]:
next(test)

(0, 'cool')

In [312]:
next(test)

(1, 'powerful')

### Looping over two or more lists with zip

In [316]:
a = [1,2,3]
b = ['a','b','c']
c = ['aa','bb','cc']

In [317]:
for l1,l2,l3 in zip(a,b,c):
    print(l1,l2,l3)

1 a aa
2 b bb
3 c cc


### Looping over a dictionary

In [320]:
d = dict({'a': 1, 'b':1.2, 'c':1j})

In [324]:
for key, val in sorted(d.items()):
    print('Key: %s has value: %s' % (key, val))

Key: a has value: 1
Key: b has value: 1.2
Key: c has value: 1j


**Before Python 3.7 the ordering of a dictionary was not guaranteed (since 3.7 it follows insertion order), so we use `sorted()`, which sorts the items by key.**

## List Comprehensions

*List comprehension is an elegant way to define and create lists based on existing lists.*

In [325]:
[i**2 for i in range(4)]

[0, 1, 4, 9]

In [326]:
[letter for letter in 'human' ]

['h', 'u', 'm', 'a', 'n']

List comprehensions can use a conditional to filter the items of an existing sequence.

In [329]:
[ x for x in range(20) if x % 2 == 0]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

The list will be populated by the items in the range 0-19 whose value is divisible by 2.

Conditions can be chained; an item is kept only if it satisfies all of them.

In [330]:
[ x for x in range(20) if x % 2 == 0 if x % 4 == 0]

[0, 4, 8, 12, 16]

We can also perform nested iteration inside a list comprehension.

In [335]:
matrix = [[1, 2],[3,4],[5,6],[7,8]]
transpose = [[row[i] for row in matrix] for i in range(2)]

transpose

[[1, 3, 5, 7], [2, 4, 6, 8]]

**Nested list comprehensions are evaluated from the outside in.** In the example above, the outer clause `for i in range(2)` runs first; for each value of `i`, the inner comprehension `[row[i] for row in matrix]` builds one row of the transpose by collecting `row[i]` from every row of `matrix`.

In [339]:
#non-pythonic way

transposed = []

for i in range(2):
    transposed_row = []
    for row in matrix:
        transposed_row.append(row[i])
    transposed.append(transposed_row)
transposed

[[1, 3, 5, 7], [2, 4, 6, 8]]

- List comprehension is an elegant way to define and create lists based on existing lists.
- A list comprehension is generally more compact, and often faster, than an explicit loop for creating a list.
- However, we should avoid writing very long list comprehensions in one line, so that the code stays readable.
- Remember, every list comprehension can be rewritten as a for loop, but not every for loop can be rewritten as a list comprehension.
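
Two related forms, as a hedged sketch: several `for` clauses inside a single comprehension read left to right, like ordinary nested loops, and the same syntax with braces builds dictionaries and sets.

```python
matrix = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Two for clauses in ONE comprehension read left to right,
# exactly like nested for loops
flat = [item for row in matrix for item in row]
print(flat)   # [1, 2, 3, 4, 5, 6, 7, 8]

# Dictionary and set comprehensions use braces
lengths = {word: len(word) for word in ('cool', 'powerful')}
remainders = {x % 3 for x in range(10)}
print(lengths)              # {'cool': 4, 'powerful': 8}
print(sorted(remainders))   # [0, 1, 2]
```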

### Exercise

Compute the decimals of Pi using the Wallis product:
$$
\pi = 2 \prod_{i=1}^\infty \frac{4i^2}{4i^2-1}
$$

In [357]:
# %load ../extras/solutions/pi_Wallis.py
pi = 3.14159265358979312

my_pi = 1.

for i in range(1, 100000):
    my_pi *= 4 * i ** 2 / (4 * i ** 2 - 1.)

my_pi *= 2

print(f'My pi is {my_pi}')
print(f'The relative error is {(pi-my_pi)/pi}')

My pi is 3.141584799578707
The relative error is 2.5000093748515706e-06
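
A minimal sketch, not part of the course files: wrapping the loop above in a function (the name `wallis_pi` is illustrative) makes it easy to see how slowly the product converges. The relative error shrinks only roughly in proportion to `1/n`.

```python
import math

def wallis_pi(n_terms):
    """Approximate pi with the first n_terms factors of the Wallis product."""
    product = 1.0
    for i in range(1, n_terms + 1):
        product *= 4 * i**2 / (4 * i**2 - 1)
    return 2 * product

# Print n, the approximation, and its relative error
for n in (10, 100, 1000):
    approx = wallis_pi(n)
    print(n, approx, (math.pi - approx) / math.pi)
```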


## Defining functions

In [358]:
def test():
    print('in test function')

In [359]:
test()

in test function


**Function blocks must be indented as other control-flow blocks.**

### Return statement

Functions can *optionally* return values.

In [360]:
def disk_area(radius):
    return 3.14 * radius * radius

In [361]:
disk_area(3)

28.259999999999998

By default, functions return `None`.

### **Note the syntax to define a function:**

- the `def` keyword;
- followed by the function’s name;
- then the arguments of the function, given between parentheses and followed by a colon;
- the function body;
- and a `return` statement for optionally returning values.

### Parameters

- Functions can take **mandatory** (positional arguments)
- or **optional** (keyword or named arguments) parameters

In [362]:
def double_it(x):
    return(x * 2)

In [363]:
double_it(2)

4

In [364]:
double_it()

TypeError: double_it() missing 1 required positional argument: 'x'

In [365]:
def double_it2(x=2):
    return(x * 2)

In [367]:
double_it2() #assumes x=2 by default

4

Keyword arguments are a very convenient feature for defining functions with a variable number of arguments, especially when default values are to be used in most calls to the function.
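
As an illustration of that point, here is a small sketch of a function with one mandatory and two optional parameters. The function `power` and its parameters are hypothetical, chosen only for this example.

```python
def power(base, exponent=2, verbose=False):
    """Raise base to exponent, optionally printing the computation."""
    result = base ** exponent
    if verbose:
        print(f'{base}**{exponent} = {result}')
    return result

print(power(3))                     # 9, both defaults are used
print(power(3, exponent=3))         # 27, one default overridden by name
print(power(verbose=True, base=2))  # keyword arguments can come in any order
```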

### Passing by value or by reference

Most languages (C, Java, ...) distinguish “passing by value” and “passing by reference”. In Python, such a distinction is somewhat artificial, and it is a bit subtle whether your variables are going to be modified or not. Fortunately, there exist clear rules.

Parameters to functions are references to objects, which are passed by value. When you pass a variable to a function, python passes the reference to the object to which the variable refers (the value). Not the variable itself.

- If the value passed in a function is immutable, the function does not modify the caller’s variable. 
- If the value is mutable, the function may modify the caller’s variable in-place:

### An example

In [377]:
def try_to_modify(x, y):
    x = 23
    y.append(42)
    print(x,y)

In [378]:
a = 58
b = [12,13]

In [379]:
try_to_modify(a, b)

23 [12, 13, 42]


In [380]:
print(a,b)

58 [12, 13, 42]


Notice that `a` hasn't changed. 
The variable `x` only exists within the function `try_to_modify`.

What happens if we run `try_to_modify(b, c)` ?

In [381]:
c = [1,2]

In [382]:
try_to_modify(b, c)

23 [1, 2, 42]


In [383]:
print(b,c)

[12, 13, 42] [1, 2, 42]


Notice that `b` hasn't changed, while `c` has a new element.
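
The difference comes from rebinding a name versus mutating an object. A minimal sketch with two illustrative functions (`rebind` and `mutate` are not part of the course files):

```python
def rebind(items):
    items = items + [42]   # builds a NEW list; the caller's list is untouched
    return items

def mutate(items):
    items.append(42)       # modifies the caller's list in place
    return items

original = [1, 2]
result = rebind(original)
print(original, result)    # [1, 2] [1, 2, 42]

mutate(original)
print(original)            # [1, 2, 42]
```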

### Global variables

Variables declared outside the function can be referenced within the function.

But a **global** variable cannot be reassigned within the function unless it is declared **global** there; a plain assignment creates a new local variable instead.

In [384]:
x = 5

In [385]:
def addx(y):
    return x + y

In [386]:
addx(10)

15

In [389]:
def setx(y):
    x = y #notice that we are trying to modify x
    print('x is %d ' % x)

In [390]:
setx(10)

x is 10 


In [391]:
x #the global x is unchanged

5

In order to work properly, we must set `x` as `global`.

In [393]:
def setx_work(y):
    global x
    x = y
    print('x is %d ' % x)

In [394]:
setx_work(10)

x is 10 


In [395]:
x

10

### Variable number of parameters

**Special forms of parameters:**
    
- `*args`: any number of positional arguments packed into a tuple
- `**kwargs`: any number of keyword arguments packed into a dictionary

In [398]:
def variable_args(*args, **kwargs):
    print('args: ', args)
    print('kwargs: ', kwargs)

In [399]:
variable_args('one', 'two', x=1, y=2, z=3)

args:  ('one', 'two')
kwargs:  {'x': 1, 'y': 2, 'z': 3}
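
The same `*` and `**` symbols also work in the other direction, at the call site. A hedged sketch with an illustrative function `collect`, which returns its arguments instead of printing them:

```python
def collect(*args, **kwargs):
    return args, kwargs   # returned instead of printed, for easy inspection

positional = ['one', 'two']
named = {'x': 1, 'y': 2}

# * unpacks a sequence into positional arguments,
# ** unpacks a dictionary into keyword arguments
args, kwargs = collect(*positional, **named)
print(args)     # ('one', 'two')
print(kwargs)   # {'x': 1, 'y': 2}
```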


### Docstrings

**Documentation about what the function does and its parameters.**

In [406]:
def ncomplex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    
    Returns:
    The complex number real+j*imag
    """
    return complex(real, imag)

In [412]:
ncomplex(2,3)

(2+3j)

### Functions are objects

They can be:

- assigned to a variable
- an item in a list (or any collection)

In [413]:
va = ncomplex

In [414]:
va(1,2)

(1+2j)

In [415]:
list_of_functions = [ncomplex,variable_args]

In [416]:
list_of_functions[0](1,2)

(1+2j)

In [417]:
list_of_functions[1](1,2,text='test')

args:  (1, 2)
kwargs:  {'text': 'test'}
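
Because functions are objects, they can also be passed as arguments to other functions or stored as dictionary values. A minimal sketch (the names `apply_twice` and `operations` are illustrative):

```python
import math

def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))

operations = {'sqrt': math.sqrt, 'abs': abs}   # functions stored as dict values

print(apply_twice(math.sqrt, 16))            # 2.0
print(operations['abs'](-3))                 # 3
print(sorted(['bb', 'a', 'ccc'], key=len))   # ['a', 'bb', 'ccc']
```

The last line shows that built-ins such as `sorted` accept a function too, here `len` as the sort key.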


### Exercise

Write a function that displays the n first terms of the Fibonacci sequence, defined by:

$$
\begin{cases}
x_0 = 0\\
x_1 = 1\\
x_{n+2} = x_{n+1}+x_{n}
\end{cases}
$$

In [429]:
# %load ../extras/solutions/fibonacci.py
def fibonacci(n):
    """Display the n first terms of the Fibonacci sequence"""
    a, b = 0, 1
    for _ in range(n):
        print(a)  # a holds the current term, starting at x_0 = 0
        a, b = b, a + b

In [430]:
fibonacci(10)

0
1
1
2
3
5
8
13
21
34


### Exercise

Modify the previous function to return a list.

In [432]:
def lfibonacci(n):
    """Return a list with the n first terms of the Fibonacci sequence"""
    a, b = 0, 1
    terms = []
    for _ in range(n):
        terms.append(a)
        a, b = b, a + b
    return terms

In [433]:
lfibonacci(10)

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

## Scripts and Modules

For longer sets of instructions it is convenient to write the code in text files (using a text editor), which we will call either scripts or modules.

- The extension for Python files is `.py`.

In [434]:
# %load ../extras/solutions/1_script.py
message = "Hello how are you?"
for word in message.split():
    print(word)

In IPython/Jupyter, the syntax to execute a script is `%run script.py`.

In [435]:
%run ../extras/solutions/1_script.py

Hello
how
are
you?


It is also possible to execute this script as a standalone program from a shell terminal (Linux/Mac console or Windows cmd console). For example, if we are in the same directory as the `1_script.py` file, we can run the following in a console:

```sh
$ python 1_script.py
Hello
how
are
you?
```

You can also create an executable file by adding the header:

`#!/usr/bin/env python`

as the first line of the `.py` file. You also need to run

```sh
$ chmod +x file.py
```
to give the user execute permission on the file.

In [438]:
!chmod +x ../extras/solutions/2_script.py

In [439]:
!../extras/solutions/2_script.py

Hello
how
are
you?


### Standalone scripts may also take command-line arguments

In [440]:
# %load ../extras/solutions/3_script.py
#!/usr/bin/env python

import sys

print(sys.argv)

In [442]:
!chmod +x ../extras/solutions/3_script.py

In [444]:
!../extras/solutions/3_script.py 1 2 test=2

['../extras/solutions/3_script.py', '1', '2', 'test=2']


Notice that the first element of `sys.argv` is the path of the script itself; the command-line arguments follow as strings.
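
Since every element of `sys.argv` is a string, numeric arguments have to be converted by hand. A minimal sketch, assuming a hypothetical helper `sum_arguments` and a hypothetical script name `sum_args.py`; an explicit list stands in for `sys.argv` so that the cell can be run inside a notebook:

```python
def sum_arguments(argv):
    """Sum the numeric command-line arguments, skipping the script name."""
    # argv[0] is the script path; the actual arguments start at index 1.
    return sum(float(arg) for arg in argv[1:])

# Stand-in for what sys.argv would hold after `python sum_args.py 1 2 3.5`.
example_argv = ['sum_args.py', '1', '2', '3.5']
total = sum_arguments(example_argv)
print(total)
```

In a real script one would `import sys` and call `sum_arguments(sys.argv)` instead.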

### There is a very convenient module for dealing with arguments
- [argparse](https://docs.python.org/2.7/library/argparse.html#module-argparse)

See https://docs.python.org/2.7/howto/argparse.html for more.

In [1]:
# %load ../extras/solutions/4_script.py
#!/usr/bin/env python

import argparse

parser = argparse.ArgumentParser(description='Double a number.')

parser.add_argument("x", help="value to double",type=float)

args = parser.parse_args()

print(2 * args.x)

In [2]:
!chmod +x ../extras/solutions/4_script.py

In [4]:
!../extras/solutions/4_script.py --help

usage: 4_script.py [-h] x

Double a number.

positional arguments:
  x           value to double

optional arguments:
  -h, --help  show this help message and exit


In [5]:
!../extras/solutions/4_script.py 10

20.0


In [7]:
!../extras/solutions/4_script.py 'test'

usage: 4_script.py [-h] x
4_script.py: error: argument x: invalid float value: 'test'
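
`argparse` also supports optional arguments with default values. The sketch below is hypothetical (the `--factor` option and the `build_parser` function are not part of the course scripts). It passes an explicit list to `parse_args`, so it can be tried in a notebook cell without touching the real command line:

```python
import argparse

def build_parser():
    """Create a parser with one positional and one optional argument."""
    parser = argparse.ArgumentParser(description='Multiply a number by a factor.')
    parser.add_argument('x', type=float, help='value to multiply')
    parser.add_argument('--factor', type=float, default=2.0,
                        help='multiplication factor (default: 2.0)')
    return parser

parser = build_parser()

# Passing a list avoids reading the real command line (sys.argv).
args_default = parser.parse_args(['10'])
args_custom = parser.parse_args(['10', '--factor', '3'])

print(args_default.x * args_default.factor)
print(args_custom.x * args_custom.factor)
```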


### Creating modules

To write larger and better organized programs (compared to simple scripts), in which objects (variables, functions, classes) are defined once and reused several times, we have to create our own modules.

In [8]:
# %load module.py
"A demo module."

def print_b():
    "Prints b."
    print('b')

def print_a():
    "Prints a."
    print('a')

c = 2
d = 2

In this file, we defined two functions print_a and print_b. Suppose we want to call the print_a
function from the interpreter. We could execute the file as a script, but since we just want to have access to the
function print_a, we are rather going to import it as a module. The syntax is as follows.

In [14]:
import module

In [15]:
module.print_a()

a


In [16]:
module.print_b()

b


In [18]:
module.c,module.d

(2, 2)

### Importing objects from modules into the main namespace using `from ... import ...`

In [19]:
from module import print_a

`whos` prints all interactive variables with some extra information about each variable.

In [22]:
whos

Variable   Type        Data/Info
--------------------------------
module     module      <module 'module' from '/h<...>s/MCSC1/aulas/module.py'>
print_a    function    <function print_a at 0x7fa6801cc2f0>


In [23]:
print_a()

a


### `__main__` and module loading

Sometimes we want code to be executed when a module is run directly, but not when it is imported by another module. The test `if __name__ == '__main__'` checks whether the module is being run directly.

In [24]:
# %load module2.py
def print_b():
    "Prints b."
    print('b')

def print_a():
    "Prints a."
    print('a')

print_b() # print_b() runs on import

if __name__ == '__main__':
    # print_a() is only executed when the module is run directly.
    print_a()

In [25]:
import module2

b


In [26]:
%run module2.py

b
a
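
The role of `__name__` can also be checked from a single cell. The following sketch writes a small throwaway file (the name `demo_main.py` is hypothetical) and executes it twice with `runpy.run_path` from the standard library, once with an ordinary module name and once with `__main__`, capturing what each run prints:

```python
import contextlib
import io
import os
import runpy
import tempfile

source = '''
print("always runs; __name__ is", __name__)
if __name__ == "__main__":
    print("only runs when executed directly")
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_main.py")
    with open(path, "w") as f:
        f.write(source)

    # Mimic an import: __name__ is set to the module name.
    imported = io.StringIO()
    with contextlib.redirect_stdout(imported):
        runpy.run_path(path, run_name="demo_main")

    # Mimic `python demo_main.py`: __name__ is set to "__main__".
    direct = io.StringIO()
    with contextlib.redirect_stdout(direct):
        runpy.run_path(path, run_name="__main__")

print(imported.getvalue())
print(direct.getvalue())
```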


### How modules are found and imported

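When the `import module` statement is executed, Python searches for `module` in a list of directories, in this order:

- the directory of the script being run (or the current directory in an interactive session)
- the directories listed in the environment variable `PYTHONPATH`
- the installation-dependent default path (e.g., /usr/lib/python)

The resulting list is available at run time as `sys.path`.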

We can modify the environment variable `PYTHONPATH` to include the directories containing the user-defined modules.

On Linux/Unix, add 

```sh
export PYTHONPATH=$PYTHONPATH:/home/emma/user_defined_modules
```
to your `~/.bashrc`

Or modify the `sys.path` variable itself within a Python script.

```python
import sys
new_path = '/home/emma/user_defined_modules'
if new_path not in sys.path:
    sys.path.append(new_path)
```
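
To see where Python would find a module, one option is `importlib.util.find_spec` from the standard library. The sketch below is only an illustration: it uses the standard `json` module as an example, and the second module name is deliberately made up so that the lookup fails.

```python
import importlib.util
import sys

# Directories are searched in the order in which they appear in sys.path.
for directory in sys.path[:3]:
    print(repr(directory))

# find_spec reports where a top-level module would be loaded from.
spec = importlib.util.find_spec('json')
print(spec.origin)

# A top-level name that cannot be found yields None instead of raising ImportError.
missing = importlib.util.find_spec('surely_not_an_installed_module_xyz')
print(missing)
```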

## Packages

A directory that contains many modules is called a package. A package is a module with submodules (which can have submodules themselves, etc.). A special file called `__init__.py` (which may be empty) tells Python that the directory is a Python package, from which modules can be imported.

More info about ["How To Package Your Python Code"](https://python-packaging.readthedocs.io/en/latest/ "How To Package Your Python Code").

In [28]:
!ls /home/hvribeiro/Dropbox/gitprojects/works/pyhvr/pyhvr/*

/home/hvribeiro/Dropbox/gitprojects/works/pyhvr/pyhvr/__init__.py
/home/hvribeiro/Dropbox/gitprojects/works/pyhvr/pyhvr/pyhvr.py

/home/hvribeiro/Dropbox/gitprojects/works/pyhvr/pyhvr/__pycache__:
__init__.cpython-36.pyc  pyhvr.cpython-36.pyc


In [31]:
!cat /home/hvribeiro/Dropbox/gitprojects/works/pyhvr/pyhvr/__init__.py

from .pyhvr import *

In [30]:
!cat /home/hvribeiro/Dropbox/gitprojects/works/pyhvr/setup.py

from setuptools import setup

setup(name='pyhvr',
      version='0.1',
      description='My set of functions for data analysis.',
      url='https://github.com/hvribeiro/pyhvr',
      author='Haroldo V. Ribeiro',
      author_email='hvr@dfi.uem.br',
      license='GPL',
      packages=['pyhvr'],
      install_requires=[
          'numpy','pandas','scipy','sklearn',
          'matplotlib','seaborn','palettable',
          'minepy','adjustText','autopep8','docrep'
      ],
      zip_safe=False)
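
The same layout (a directory with an `__init__.py` that re-exports names from a submodule) can be reproduced in miniature. This is a sketch under stated assumptions: the package `demo_pkg`, its submodule `tools`, and the function `double` are hypothetical and are created in a temporary directory only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, 'demo_pkg')
    os.mkdir(pkg_dir)

    # __init__.py marks the directory as a package and re-exports double.
    with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
        f.write('from .tools import double\n')

    # A submodule holding the actual code.
    with open(os.path.join(pkg_dir, 'tools.py'), 'w') as f:
        f.write('def double(x):\n    return 2 * x\n')

    # Make the parent directory importable for the duration of the demo.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module('demo_pkg')
        result = demo_pkg.double(21)
    finally:
        sys.path.remove(tmp)

print(result)
```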

## Input and Output

- We will use numpy and pandas methods to read and write data files; they are faster and more convenient.
- However, let us first walk through the native Python way.

In [33]:
with open('../extras/solutions/workfile.txt', 'w') as f:
    f.write('This is a test \nand another test')

In [34]:
f.closed

True

It is good practice to use the `with` keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point.

In [35]:
with open('../extras/solutions/workfile.txt','r') as f:
    read_data = f.read()

In [36]:
read_data

'This is a test \nand another test'

In [38]:
with open('../extras/solutions/workfile.txt','r') as f:
    for line in f:
        print(line)

This is a test 

and another test


**File modes**
- Read-only: r
- Write-only: w
  - Note: Create a new file or overwrite existing file.
- Append to a file: a
- Read and write: r+
- Binary mode: b (combined with another mode, e.g., rb or wb)
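
The modes listed above can be compared side by side. The following is a minimal sketch that works on a temporary file (the name `modes.txt` is arbitrary), so it does not touch the course files:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modes.txt')

    # 'w' creates the file (or truncates an existing one).
    with open(path, 'w') as f:
        f.write('first line\n')

    # 'a' keeps the existing content and writes at the end.
    with open(path, 'a') as f:
        f.write('second line\n')

    # 'r' reads text; stripping the newline avoids blank lines when printing.
    with open(path, 'r') as f:
        lines = [line.rstrip('\n') for line in f]

    # 'rb' returns bytes instead of str.
    with open(path, 'rb') as f:
        raw = f.read()

print(lines)
print(type(raw))
```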

## Standard Library

### os module: operating system functionality

"A portable way of using operating system dependent functionality."

In [44]:
import os

In [45]:
os.getcwd() #Return a unicode string representing the current working directory.

'/home/hvribeiro/Dropbox/gitprojects/works/MCSC1/aulas'

In [46]:
os.listdir(os.curdir) #Return a list containing the names of the files in the directory.

['module2.py',
 '.ipynb_checkpoints',
 '01 - Getting Started with Scientific Python.slides.html',
 'module.py',
 '01 - Getting Started with Scientific Python.ipynb',
 '__pycache__']

In [49]:
os.path.exists('module.py') #Test whether a path exists.

True

In [60]:
os.environ['PATH']

'/home/hvribeiro/anaconda3/bin:/usr/bin/hvrpack/:/home/hvribeiro/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/lib/jvm/java-8-oracle/bin:/usr/lib/jvm/java-8-oracle/db/bin:/usr/lib/jvm/java-8-oracle/jre/bin'

### glob: Pattern matching on files

"The glob module provides convenient file pattern matching."

In [61]:
import glob

In [64]:
glob.glob('../extras/solutions/*.py') #Find all files ending in .py:

['../extras/solutions/4_script.py',
 '../extras/solutions/2_script.py',
 '../extras/solutions/fibonacci.py',
 '../extras/solutions/3_script.py',
 '../extras/solutions/1_script.py',
 '../extras/solutions/pi_Wallis.py']
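
`glob` and `os.path` are often used together. The sketch below is illustrative only: it creates a few empty files with made-up names in a temporary directory, selects the `.py` files, and splits their names into stem and extension.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files to search through.
    for name in ['a.py', 'b.py', 'notes.txt']:
        with open(os.path.join(tmp, name), 'w'):
            pass

    # glob returns matches in arbitrary order, so sort them for stable output.
    matches = sorted(glob.glob(os.path.join(tmp, '*.py')))
    names = [os.path.basename(m) for m in matches]

    # os.path.splitext separates the extension from the rest of the name.
    stems = [os.path.splitext(n)[0] for n in names]

print(names)
print(stems)
```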

### sys module: system-specific information

System-specific information related to the Python interpreter.

In [66]:
import sys

In [67]:
sys.platform

'linux'

In [68]:
sys.version

'3.6.2 |Anaconda custom (64-bit)| (default, Jul 20 2017, 13:51:32) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]'

### pickle: easy persistence

Useful to store arbitrary objects to a file.

In [71]:
import pickle

In [72]:
l = [1, None, 'Stan']

In [75]:
with open('../extras/solutions/test.pkl', 'wb') as f:
    pickle.dump(l, f)

In [76]:
del l

In [77]:
l

NameError: name 'l' is not defined

In [79]:
with open('../extras/solutions/test.pkl', 'rb') as f:
    l = pickle.load(f)

In [80]:
l

[1, None, 'Stan']
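
The same round trip works in memory with `pickle.dumps` and `pickle.loads`, which is handy for quick checks. In this sketch the dictionary `record` is made-up example data:

```python
import pickle

record = {'name': 'Stan', 'scores': [1, 2.5, None], 'point': (0, 1)}

# dumps/loads work on bytes in memory instead of a file.
payload = pickle.dumps(record)
restored = pickle.loads(payload)

print(type(payload))
print(restored == record)   # same content
print(restored is record)   # but a distinct object
```

Only unpickle data from a source you trust, because loading a pickle can execute arbitrary code.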

### Exercise

Write a program to search your `PYTHONPATH` for a given module.

In [None]:
%load ../extras/solutions/py_path.py

In [85]:
# %load ../extras/solutions/py_path.py
#!/usr/bin/env python

"""Script to search the PYTHONPATH for a module"""

import os
import sys
import glob
import argparse

def find_module(module):
    result = []
    # Loop over the list of paths in sys.path
    for subdir in sys.path:
        # Join the subdir path with the module we're searching for
        pth = os.path.join(subdir, module)
        # Use glob to test whether pth exists
        res = glob.glob(pth)
        # glob returns a list, if it is not empty, the pth exists
        if len(res) > 0:
            result.append(res)
    return result

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Search in $PYTHONPATH for a module.')
    parser.add_argument("name", help="module name",type=str)
    args = parser.parse_args()

    result = find_module(args.name)
    print(result)

In [86]:
!chmod +x ../extras/solutions/py_path.py

In [87]:
!../extras/solutions/py_path.py test

[]


In [88]:
!../extras/solutions/py_path.py pyhvr

[['/home/hvribeiro/Dropbox/gitprojects/works/pyhvr/pyhvr']]


In [89]:
!../extras/solutions/py_path.py numpy

[['/home/hvribeiro/anaconda3/lib/python3.6/site-packages/numpy']]


## Exception handling in Python

Exceptions are raised by different kinds of errors that arise when executing Python code:

In [90]:
1/0

ZeroDivisionError: division by zero

In [91]:
1 + 'e'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [92]:
l=[1,2,3]
l[3]

IndexError: list index out of range

In [95]:
l.wrong_attribute()

AttributeError: 'list' object has no attribute 'wrong_attribute'

## Catching exceptions with `try/except`

In [101]:
def print_sorted(collection):
    collection.sort()
    print(collection)

In [98]:
print_sorted([5,1,0,3])

[0, 1, 3, 5]


In [99]:
print_sorted((4,1,9))

AttributeError: 'tuple' object has no attribute 'sort'

In [110]:
def print_sorted2(collection):
    try:
        collection.sort()
    except AttributeError:
        print('.sort() did not work; the input probably has no sort method')
    print(collection)

In [111]:
print_sorted2((4,1,9))

.sort() did not work; the input probably has no sort method
(4, 1, 9)
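
An alternative sketch avoids the `AttributeError` altogether by using the built-in `sorted`, which accepts any iterable, and also illustrates the optional `else` and `finally` clauses. The function `sorted_copy` is hypothetical and is not part of the course files.

```python
def sorted_copy(collection):
    """Return a sorted list built from collection, or None if it cannot be sorted."""
    try:
        result = sorted(collection)
    except TypeError as err:
        # Raised when the input is not iterable or its items cannot be compared.
        print('could not sort:', err)
        return None
    else:
        # Runs only when the try block raised no exception.
        return result
    finally:
        # Runs in every case, even when a return statement was reached above.
        print('done with', type(collection).__name__)

from_tuple = sorted_copy((4, 1, 9))
from_mixed = sorted_copy([1, 'a'])
from_number = sorted_copy(3)
print(from_tuple, from_mixed, from_number)
```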


## Object-oriented programming (OOP)

Python supports object-oriented programming (OOP). The goals of OOP are:
- to organize the code, and
- to re-use code in similar contexts

However, we will not cover this topic here.

## Numpy

Numpy is the de-facto standard for numerical arrays in Python. It arose as an effort by Travis Oliphant and others to unify the numerical arrays in Python. In this section, we provide an overview and some tips for using Numpy effectively, but for much more detail, [Travis’ book](http://web.mit.edu/dvp/Public/numpybook.pdf) is a great place to start and is available for free online.