# MCSC1 - Course material for Métodos Computacionais em Sistemas Complexos I (Computational Methods in Complex Systems I)

- *Class period:* 06/08 to 21/12 ($\approx20$ classes)
- *Location:* G68 - room 11
- *Schedule:* Wednesdays, 13:30 to 17:00
- *Assessment:* to be defined
- Textbook: José Unpingco, *Python for Probability, Statistics, and Machine Learning*, Springer (2016).
[![Cover of Python for Probability, Statistics, and Machine Learning](https://images.springer.com/sgw/books/medium/9783319307152.jpg)](http://library1.org/_ads/E844412DCACEB5A9BF29267FA244E908)


## Contents

1. Getting Started with Scientific Python
2. Probability
3. Statistics
4. Machine Learning

# 01 - Getting Started with Scientific Python

## Python

>**Python is an interpreted language.** This means that Python codes run on a Python virtual machine that provides a layer of abstraction between the code and the platform it runs on, thus making codes portable across different platforms. For example, the same script that runs on a Windows laptop can also run on a Linux-based supercomputer or on a mobile phone. This makes programming easier because the virtual machine handles the low-level details of implementing the business logic of the script on the underlying platform.

>**Python is a dynamically typed language, which means that the interpreter itself figures out the representative types (e.g., floats, integers) interactively or at run-time.** This is in contrast to a language like Fortran that has compilers that study the code from beginning to end, perform many compiler-level optimizations, link intimately with the existing libraries on a specific platform, and then create an executable that is henceforth liberated from the compiler. As you may guess, the compiler’s access to the details of the underlying platform means that it can utilize optimizations that exploit chip-specific features and cache memory. Because the virtual machine abstracts away these details, **it means that the Python language does not have programmable access to these kinds of optimizations.** So, where is the balance between the ease of programming the virtual machine and these key numerical optimizations that are crucial for scientific work?

>**The balance comes from Python’s native ability to bind to compiled Fortran and C libraries.** This means that you can send intensive computations to compiled libraries directly from the interpreter. This approach has two primary advantages. First, it gives you the fun of programming in Python, with its expressive syntax and lack of visual clutter. This is a particular boon to scientists who typically want to use software as a tool as opposed to developing software as a product. The second advantage is that you can mix-and-match different compiled libraries from diverse research areas that were not otherwise designed to work together. This works because Python makes it easy to allocate and fill memory in the interpreter, pass it as input to compiled libraries, and then retrieve the output back at the interpreter.

>Moreover, **Python provides a multiplatform solution for scientific codes.** As an open-source project, Python itself is available anywhere you can build it, even though it typically comes standard nowadays, as part of many operating systems. This means that once you have written your code in Python, you can just transfer the script to another platform and run it, as long as the compiled libraries are also available there. What if the compiled libraries are absent? Building and configuring compiled libraries across multiple systems used to be a painstaking job, but as scientific Python has matured, a wide range of libraries have now become available across all of the major platforms (i.e., Windows, MacOS, Linux, Unix) as prepackaged distributions.

>Finally, scientific Python facilitates maintainability of scientific codes because **Python syntax is clean, free of semi-colon litter and other visual distractions that make code hard to read and easy to obfuscate.** Python has many built-in testing, documentation, and development tools that ease maintenance. Scientific codes are usually written by scientists unschooled in software development, so having solid software development tools built into the language itself is a particular boon.

- A good introduction to Python is given in [Think Python 2nd Edition by Allen B. Downey](http://greenteapress.com/thinkpython2/thinkpython2.pdf).
- If you are not familiar with Python, at least read Chapter 2 of the [Scipy Lecture Notes](https://www.scipy-lectures.org/_downloads/ScipyLectures-simple.pdf).
- Or watch [A hands-on introduction to Python for beginning programmers](https://www.youtube.com/watch?v=rkx5_MRAV3A).
- **Here we give a brief introduction following the Scipy Lectures**.

## The Python language

>  Python is a **programming language**, as are C, Fortran, BASIC, PHP,
  etc. Some specific features of Python are as follows:

>  * an *interpreted* (as opposed to *compiled*) language. Contrary to e.g.
    C or Fortran, one does not compile Python code before executing it. In
    addition, Python can be used **interactively**: many Python
    interpreters are available, from which commands and scripts can be
    executed.

>  * a free software released under an **open-source** license: Python can
    be used and distributed free of charge, even for building commercial
    software.

>  * **multi-platform**: Python is available for all major operating
    systems, Windows, Linux/Unix, MacOS X, most likely your mobile phone
    OS, etc.

>  * a very readable language with clear non-verbose syntax

>  * a language for which a **large variety of high-quality packages are
    available for various applications, from web frameworks to scientific
    computing.**

>  * a language very easy to interface with other languages, in particular C
    and C++.

>  * Some other features of the language are illustrated just below. For
    example, Python is an object-oriented language, with dynamic typing
    (the same variable can contain objects of different types during the
    course of a program).

### Python 3 or Python 2?

> In 2008, Python 3 was released. It is a major evolution of the language that made a few changes. **Some old scientific code does not yet run under Python 3**. However, this is infrequent and Python 3 comes with many benefits. We advise that you install Python 3.

In [2]:
#python 2

print "Hello, world!"

SyntaxError: Missing parentheses in call to 'print' (<ipython-input-2-651ed81a069e>, line 3)

In [3]:
#python 3

print("Hello, world!")

Hello, world!


In [5]:
#A new feature of python 3.6
#f-strings
ano = 2018
f'This course refers to the academic year {ano}'

'This course refers to the academic year 2018'
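f-strings accept arbitrary expressions inside the braces, and the same format specifications used by `str.format` after a colon. A minimal sketch (the variables `pi`, `next_year` and `pi_2` are just illustrative names):

```python
ano = 2018
pi = 3.14159

# Any Python expression can go inside the braces
next_year = f'Next academic year: {ano + 1}'

# A format spec after ':' controls the output, here 2 decimal places
pi_2 = f'pi is roughly {pi:.2f}'

print(next_year)  # Next academic year: 2019
print(pi_2)       # pi is roughly 3.14
```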

## Installing a working environment

> Python comes in many flavors, and there are many ways to install it. However, **we recommend installing a scientific-computing distribution**, which comes readily with optimized versions of scientific modules.

**Under Linux**

If you have a recent distribution, most of the tools are probably packaged, and it is recommended to use your package manager.

**Other systems**

There are several fully-featured Scientific Python distributions:

- [Anaconda](https://www.continuum.io/downloads)
- [EPD](https://store.enthought.com/downloads/)

## Getting Anaconda

- Download version 3.x from https://www.anaconda.com/download/ or use the [direct link](https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh) to the .sh installer
- Follow the instructions at https://docs.anaconda.com/anaconda/install/linux

Basically, run
```sh
bash Anaconda3-5.2.0-Linux-x86_64.sh
```
and follow the installer prompts.

We will use the Jupyter Notebook for most of the code and will discuss it later. For now, let's assume we are using the *Python shell*.

## First steps in python

To start the interpreter, type:

```sh
python
```
in the shell.

In [6]:
a = 3
b = 2*a

In [7]:
type(b)

int

In [8]:
print(b)

6


In [9]:
a*b

18

In [10]:
b = 'hello'
type(b)

str

In [11]:
b + b

'hellohello'

In [12]:
2*b

'hellohello'

>Two variables ``a`` and ``b`` have been defined above. Note that one does
  not declare the type of a variable before assigning its value. In C,
  conversely, one should write:

```C
 int a = 3;
```
>  In addition, the type of a variable may change, in the sense that at
  one point in time it can be equal to a value of a certain type, and at a
  second point in time, it can be equal to a value of a different
  type. `b` was first equal to an integer, but it became equal to a
  string when it was assigned the value `'hello'`. Operations on
  integers (``b=2*a``) are coded natively in Python, and so are some
  operations on strings such as additions and multiplications, which
  amount respectively to concatenation and repetition.
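Dynamic typing does not mean that types are ignored: they are checked when each operation runs, so adding an integer to a string fails with a `TypeError` instead of being silently converted. A short sketch:

```python
b = 2
print(type(b))      # <class 'int'>
b = 'hello'
print(type(b))      # <class 'str'>

# Types are still enforced at run time
try:
    result = 3 + 'hello'
except TypeError as err:
    result = None
    print('TypeError:', err)

# An explicit conversion is required instead
joined = str(3) + 'hello'
print(joined)       # 3hello
```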

## Basic types

### Numerical types

In [14]:
# Integer
a = 4
type(a)

int

In [16]:
# Floats
b = 2.1
type(b)

float

In [17]:
# Complex
c = 1.5 + 0.5j
type(c)

complex

In [18]:
f"real part: {c.real} | imaginary part: {c.imag}"

'real part: 1.5 | imaginary part: 0.5'
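Complex numbers support the usual arithmetic as well. The modulus is given by the built-in `abs` and the complex conjugate by the `conjugate()` method, so that $c\,\bar{c} = |c|^2$. A short sketch:

```python
import math

c = 1.5 + 0.5j

modulus = abs(c)        # sqrt(1.5**2 + 0.5**2) = sqrt(2.5)
conj = c.conjugate()    # (1.5-0.5j)
product = c * conj      # |c|**2 as a complex number: (2.5+0j)

print(modulus, conj, product)
print(math.isclose(modulus**2, product.real))  # True
```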

In [27]:
# Booleans

test = 3 > 4
type(test)

bool

In [21]:
test

False

In [22]:
test==0

True

In [23]:
test==1

False

In [30]:
test is True

False
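The comparisons `test==0` and `test==1` above work because `bool` is a subclass of `int`: in arithmetic, `False` behaves like `0` and `True` like `1`. This is handy, for instance, to count how many elements satisfy a condition (the list `values` is just an example):

```python
print(isinstance(True, int))   # True
print(True + True)             # 2

values = [3, 8, 1, 9, 4]
# Each comparison yields True (1) or False (0), so sum() counts the matches
n_greater_than_3 = sum(v > 3 for v in values)
print(n_greater_than_3)        # 3
```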

## Operators

**Basic arithmetic operations `+, -, *, /, **, %` (modulo) are natively implemented**

In [31]:
7*9

63

In [32]:
2**10

1024

In [33]:
8%3

2

In [34]:
#Type conversion (casting):
float(1)

1.0

**The behaviour of the division operator has changed in Python 3.**

In Python 2.x

```python
>>> 3/2
1
```

In [37]:
# In Python 3
3/2

1.5

In [38]:
# Same result when one operand is a float
3/2.

1.5

In [39]:
a=3
b=2
a/float(b)

1.5

In Python 2, you can import the Python 3 division behavior from `__future__`:

```python
>>> from __future__ import division
>>> 3 / 2
1.5
```

If you explicitly want floor (integer) division, use `//`. With float operands the result is still a float:

In [40]:
3.0//2.0

1.0
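`//` rounds toward minus infinity, not toward zero, which matters for negative operands. It is paired with `%` so that `a == (a // b) * b + a % b` always holds, and the built-in `divmod` returns both results at once:

```python
print(7 // 2, 7 % 2)     # 3 1
print(-7 // 2, -7 % 2)   # -4 1  (rounds down, not toward zero)

q, r = divmod(-7, 2)
print(q, r)              # -4 1

# The identity a == (a // b) * b + a % b holds for these integer pairs
for a, b in [(7, 2), (-7, 2), (7, -2)]:
    assert a == (a // b) * b + a % b
```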

## Containers

Python provides many efficient types of containers, in which collections of objects can be stored.

### Lists

In [41]:
colors = ['red', 'blue', 'green', 'black', 'white']
type(colors)

list

Indexing: accessing individual objects contained in the list:

In [44]:
colors[0]

'red'

In [43]:
colors[-1]

'white'

**Indexing starts at 0 (as in C), not at 1 (as in Fortran or Matlab)!**

Slicing: obtaining sublists of regularly-spaced elements:

**Slicing syntax:** `colors[start:stop:stride]`

- including `start` and excluding `stop`:  `start <= i < stop`
- All slicing parameters are optional

In [46]:
colors[2:4]

['green', 'black']

In [47]:
colors[::2]

['red', 'green', 'white']

In [48]:
colors[3:]

['black', 'white']

In [49]:
colors[:3]

['red', 'blue', 'green']
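A few more slicing cases worth knowing: a negative stride walks backwards, out-of-range bounds are clipped instead of raising an error (unlike plain indexing), and a slice always returns a new list. A short sketch:

```python
colors = ['red', 'blue', 'green', 'black', 'white']

backwards = colors[::-1]   # ['white', 'black', 'green', 'blue', 'red']
clipped = colors[3:100]    # ['black', 'white'], no IndexError
colors_copy = colors[:]    # shallow copy of the whole list

colors_copy[0] = 'yellow'
print(colors[0])           # red: the original list is unchanged

try:
    colors[100]            # indexing out of range does raise
except IndexError as err:
    print('IndexError:', err)
```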

### Lists are mutable objects and may contain objects of different types

In [57]:
colors = ['red', 'blue', 'green', 'black', 'white']

In [58]:
colors[0] = 'yellow'

In [59]:
colors

['yellow', 'blue', 'green', 'black', 'white']

In [60]:
colors[2:4] = ['gray', 'purple']

In [61]:
colors

['yellow', 'blue', 'gray', 'purple', 'white']

In [64]:
#slice assignment can change the length of the list: 'cyan' is inserted
colors[2:4] = ['gray', 'purple', 'cyan']

In [65]:
colors

['yellow', 'blue', 'gray', 'purple', 'cyan', 'white']

In [67]:
mixedlist = [3, -200, 'hello']
mixedlist

[3, -200, 'hello']

For collections of numerical data that all have the same type, it is often more efficient to use the array type provided by the numpy module. A NumPy array is a chunk of memory containing fixed-sized items. With NumPy arrays, operations on elements can be faster because elements are regularly spaced in memory and more operations are performed through specialized C functions instead of Python loops.

### Python offers a large set of methods to modify lists

In [81]:
colors = ['red', 'blue', 'green', 'black', 'white']
colors.append('pink') #add item at the end
colors

['red', 'blue', 'green', 'black', 'white', 'pink']

In [82]:
colors.pop() #removes and returns the last item

'pink'

In [83]:
colors

['red', 'blue', 'green', 'black', 'white']

In [84]:
colors.pop(1) #removes and returns the item at index 1

'blue'

In [85]:
colors

['red', 'green', 'black', 'white']

In [86]:
colors = ['red', 'blue', 'green', 'black', 'white']
rcolors = colors[::-1] #reverse
rcolors

['white', 'black', 'green', 'blue', 'red']

In [87]:
rcolors.reverse() #reverse in-place

In [89]:
rcolors

['red', 'blue', 'green', 'black', 'white']

### Concatenate, repeat and sort lists

In [92]:
rcolors + colors

['red',
 'blue',
 'green',
 'black',
 'white',
 'red',
 'blue',
 'green',
 'black',
 'white']

In [93]:
rcolors*2

['red',
 'blue',
 'green',
 'black',
 'white',
 'red',
 'blue',
 'green',
 'black',
 'white']

In [94]:
sorted(rcolors) #sort and return a new list

['black', 'blue', 'green', 'red', 'white']

In [96]:
rcolors #the list is not modified

['red', 'blue', 'green', 'black', 'white']

In [97]:
rcolors.sort()  # in-place

In [98]:
rcolors #the list is modified

['black', 'blue', 'green', 'red', 'white']
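Both `sorted` and `list.sort` also accept a `key` function and a `reverse` flag. For example, to order the colors by name length, longest first (the sort is stable, so names of equal length keep their original order):

```python
colors = ['red', 'blue', 'green', 'black', 'white']

# key=len compares lengths instead of alphabetical order
by_length = sorted(colors, key=len, reverse=True)
print(by_length)  # ['green', 'black', 'white', 'blue', 'red']
```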

In [103]:
#there are a bunch of methods
dir(rcolors)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
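The names without underscores at the end of that list are the public methods. A quick tour of the ones not used above:

```python
colors = ['red', 'blue', 'green']

colors.extend(['black', 'white'])  # append every item of another iterable
colors.insert(1, 'pink')           # insert 'pink' before index 1
colors.remove('green')             # remove the first occurrence of a value
position = colors.index('black')   # index of the first occurrence
n_red = colors.count('red')        # number of occurrences

print(colors)            # ['red', 'pink', 'blue', 'black', 'white']
print(position, n_red)   # 3 1
```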

### Strings

Different string syntaxes (single, double or triple quotes):

In [104]:
s = 'Hello, how are you?'

s = "Hi, what's up"

s = '''Hello,
       how are you''' # tripling the quotes allows the string to span more than one line

s = """Hi,
       what's up?"""

In [105]:
'Hi, what's up?'

SyntaxError: invalid syntax (<ipython-input-105-ff7eba3d017a>, line 1)

In [120]:
'Hi, what\'s up?' # can be solved by escaping the quote with a backslash

"Hi, what's up?"

Other uses of the backslash are, e.g., the newline character `\n` and the tab character `\t`.
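For example, `\n` and `\t` are interpreted inside normal strings, while a *raw* string (prefix `r`) keeps backslashes literally, which is convenient for Windows paths and regular expressions. The path below is only an example:

```python
s = 'col1\tcol2\nvalue1\tvalue2'
print(s)          # two lines, columns separated by a tab
print(len('\n'))  # 1: the escape sequence is a single character

raw = r'C:\new\table'
print(raw)        # C:\new\table, backslashes kept as-is
print(len(raw))   # 12
```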

In [106]:
"Hi, what's up?"

"Hi, what's up?"

In [107]:
'"test"'

'"test"'

Strings are sequences, like lists. Hence they can be indexed and sliced, using the same syntax and rules.

In [108]:
a = "hello"

In [109]:
a[0]

'h'

In [110]:
a[-1]

'o'

In [111]:
a = "hello, world!"
a[3:6]

'lo,'

In [112]:
a[2:10:2]

'lo o'

In [113]:
a[::3]

'hl r!'

Accents and special characters can also be handled in Unicode strings.

In [128]:
a = "Maringá"

In [126]:
a.encode(encoding='utf8')

b'Maring\xc3\xa1'

In [131]:
b'Maring\xc3\xa1'.decode("utf8")

'Maringá'

In [132]:
a.encode('ascii') 

UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 6: ordinal not in range(128)

In [133]:
a.encode('ascii','replace') 

b'Maring?'

In [134]:
a.encode('ascii','ignore') 

b'Maring'

### Converting non-ASCII characters to ASCII

In [140]:
from unidecode import unidecode

In [141]:
unidecode(a)

'Maringa'

**unidecode** is a Python module for "ASCII transliterations of Unicode text".

To install it:

```sh
pip install unidecode
```
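If installing a third-party package is not an option, accents can often be stripped with the standard-library `unicodedata` module: NFKD normalization splits a character such as `á` into a base letter plus a combining accent, and the combining marks are then dropped. This is a sketch, not a replacement for `unidecode`: it only handles characters that decompose this way and cannot transliterate non-Latin scripts. The helper name `strip_accents` is hypothetical:

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents('Maringá'))          # Maringa
print(strip_accents('São Paulo, ação'))  # Sao Paulo, acao
```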

A string is an **immutable object** and it is not possible to modify its contents. One may however create new strings from the original one.

In [142]:
a = "hello, world!"
a[2] = 'z'

TypeError: 'str' object does not support item assignment

In [144]:
a.replace('l','z',1) #replace 'l' by 'z'. 1 is the number of occurrences to be replaced.

'hezlo, world!'

In [146]:
a #doesn't change the value of a

'hello, world!'

In [147]:
a.replace('l','z')

'hezzo, worzd!'

Strings have many methods, all documented by `help(str)`:

In [149]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  ...

In [150]:
a.capitalize() #capitalize a

'Hello, world!'

In [151]:
a.upper() #convert a to uppercase

'HELLO, WORLD!'

In [153]:
a.find(',') #return the position of ',' in a

5

In [155]:
a.find('l') #return just the first occurrence

2

In [179]:
a.find('l',4) #return just the first occurrence at or after index 4

10

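There is no built-in string method that returns all occurrences.

But we can use the `re` module (the pattern is a regular expression, so special characters must be escaped, e.g. with `re.escape`):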

In [163]:
import re
re.findall('l',a)

['l', 'l', 'l']

In [164]:
[m.start() for m in re.finditer('l', a)]

[2, 3, 10]

Or we can create a function for that

See https://codereview.stackexchange.com/questions/146834/function-to-find-all-occurrences-of-substring

In [166]:
def find_substring(substring, string):
    """ 
    Returns list of indices where substring begins in string

    >>> find_substring('me', "The cat says meow, meow")
    [13, 19]
    """
    indices = []
    index = -1  # Begin at -1 so index + 1 is 0
    while True:
        # Find next index of substring, by starting search from index + 1
        index = string.find(substring, index + 1)
        if index == -1:  
            break  # All occurrences have been found
        indices.append(index)
    return indices

In [167]:
find_substring('l',a)

[2, 3, 10]

In [180]:
a.find('#') #return -1 when nothing is found

-1
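`str.index` performs the same search as `find` but raises `ValueError` instead of returning `-1`, while `str.count` counts non-overlapping occurrences and the `in` operator is the simplest membership test. A short comparison on the same string:

```python
a = "hello, world!"

print(a.find('#'))      # -1
print(a.count('l'))     # 3
print('world' in a)     # True

try:
    a.index('#')
except ValueError as err:
    print('ValueError:', err)
```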

### Dictionaries

A dictionary is basically an efficient table that maps keys to values. Since Python 3.7, dictionaries preserve insertion order; in earlier versions their order was not guaranteed.


In [241]:
tel = {'emmanuelle': 5752, 'sebastian': 5578}

In [242]:
tel.keys()

dict_keys(['emmanuelle', 'sebastian'])

In [243]:
tel.values()

dict_values([5752, 5578])

In [244]:
tel['sebastian']

5578

In [246]:
tel['emmanuelle'] = 5986 #dictionaries are mutable

In [247]:
tel

{'emmanuelle': 5986, 'sebastian': 5578}

In [248]:
tel['haroldo'] = 9999

In [249]:
tel

{'emmanuelle': 5986, 'sebastian': 5578, 'haroldo': 9999}

In [250]:
'haroldo' in tel

True
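Other frequently used operations: `get` returns a default value instead of raising `KeyError` for a missing key, `items()` yields `(key, value)` pairs for iteration, and `del` removes an entry. A short sketch (the key `'unknown'` is just an example):

```python
tel = {'emmanuelle': 5986, 'sebastian': 5578, 'haroldo': 9999}

print(tel.get('unknown', 0))   # 0, no KeyError

for name, number in tel.items():
    print(f'{name:>12}: {number}')

del tel['sebastian']
print(tel)                     # {'emmanuelle': 5986, 'haroldo': 9999}
```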


In [255]:
numbers = [1,2,3,4]
strs = ['a','b','c']
dict(zip(strs, numbers))

{'a': 1, 'b': 2, 'c': 3}

`zip` returns an iterator of tuples, where the i-th element of each tuple comes from the i-th iterable argument. It stops at the shortest input, which is why the `4` in `numbers` is dropped.

In [258]:
list(zip(numbers,strs))

[(1, 'a'), (2, 'b'), (3, 'c')]

### String formatting

`%` operator

In [186]:
'An integer: %i; a float: %f; another string: %s' % (1, 0.1, 'string')

'An integer: 1; a float: 0.100000; another string: string'

In [195]:
'An integer: %03d; a float: %.2f; another string: %s' % (1, 0.1, 'string')

'An integer: 001; a float: 0.10; another string: string'

In [198]:
i=99
filename = 'processing_of_dataset_%03d.txt' % i
filename

'processing_of_dataset_099.txt'

`.format` method

In [200]:
name='Haroldo'
surname='Ribeiro'
'My name is {}. My surname is {}'.format(name,surname)

'My name is Haroldo. My surname is Ribeiro'

In [208]:
'My name is {:>15}.'.format(name)

'My name is         Haroldo.'

In [210]:
'My name is {:_<15}.'.format(name)

'My name is Haroldo________.'

In [211]:
'My name is {:_^15}.'.format(name)

'My name is ____Haroldo____.'

In [214]:
theta=0.014485
'file_name_{}'.format(theta)

'file_name_0.014485'

In [240]:
'file_name_{:06.3f}'.format(theta) #width 6, 3 decimals, padded with zeros

'file_name_00.014'
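Since Python 3.6 the same format specifications can be written inline in f-strings, which is usually more readable than `%` or `.format`. The assertions below check that the three styles produce identical strings for the examples above:

```python
name = 'Haroldo'
theta = 0.014485
i = 99

assert 'My name is {:_^15}.'.format(name) == f'My name is {name:_^15}.'
assert 'file_name_{:06.3f}'.format(theta) == f'file_name_{theta:06.3f}'
assert 'processing_of_dataset_%03d.txt' % i == f'processing_of_dataset_{i:03d}.txt'

print(f'file_name_{theta:06.3f}')  # file_name_00.014
```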

## Numpy

Numpy is the de facto standard for numerical arrays in Python. It arose as an effort by Travis Oliphant and others to unify the numerical arrays in Python. In this section, we provide an overview and some tips for using Numpy effectively, but for much more detail, [Travis’ book](http://web.mit.edu/dvp/Public/numpybook.pdf) is a great place to start and is available for free online.