Basic Python and native data structures
==========

In [1]:
from IPython.display import display, HTML

# Larger display 
display(HTML("<style>.container { width:75% !important; }</style>"))

In a nutshell
-------------
- Scripting language
- Multi-platform (OS X, Linux, Windows)
- Batteries included
- Lots of third-party libraries (catching up with R for computational biology)
- Lots of help available online (e.g. Stack Overflow)

**"Scripting language" means:**

- no type declaration required.
- many built-in data structures are already available: dictionaries, lists...
- no manual memory management: a garbage collector frees unused objects

**Multi-platform**

- The same code can be executed on different platforms, as long as a Python interpreter is installed.

**"Batteries included" means:**

- Many modules are already provided (e.g. to parse csv files)
- No need to install additional libraries for most simple tasks
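
As a small illustration of this point, the sketch below parses CSV text using only the standard `csv` and `io` modules. The sample text and the variable names are made up for this example, and the code assumes Python 3.

```python
import csv
import io

# Made-up CSV content; in practice it would usually come from a file.
text = "name,value\na,1\nb,2\n"

# csv.reader splits each line into a list of strings.
reader = csv.reader(io.StringIO(text))
header = next(reader)            # the first row holds the column names
rows = [row for row in reader]   # the remaining rows hold the data

print(header)
print(rows)
```

Note that every field is read as a string; converting `"1"` to an integer is left to the caller.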

All of that in a more poetic form
---------------------------------

In [3]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


Resources
----------
- https://docs.python.org
- https://docs.python.org/2/tutorial/index.html
- http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/key_differences_between_python_2_and_3.ipynb

Hello world example
-------------------

In [4]:
print("hello")

hello


About indentation
-----------------

Before starting, you need to know that in Python, code indentation is an essential part of the syntax. It is used to delimit code blocks such as loops and functions. It may seem cumbersome, but it makes all Python code consistent and readable. The following code is incorrect:

```python
>>> a = 1
>>>   b = 2
```
since the two statements are not aligned despite being part of the same block of statements (the main block). Instead, they must be indented in the same way:
```python
>>> a = 1
>>> b = 2
```
Here is another example involving a loop and a function (def):
```python
def example():
    for i in [1, 2, 3]:
        print(i)
```
In C, it may look like:
```c
void example(){
  int i;
  for (i=1; i<=3; i++){
      printf("%d\n", i);
  }
}
```
or:
```c
void example(){
int i;
for (i=1; i<=3; i++)
{
printf("%d\n", i);
}
}
```

**Note:** both tabs and spaces can be used to define the indentation, but conventionally **4 spaces** are preferred. 
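
One way to see that indentation is part of the syntax is to ask Python to compile a misaligned snippet. The sketch below is only an illustration: the helper name `compiles` is hypothetical, and it relies on the built-in `compile` function raising `IndentationError`.

```python
# Source code with an unexpected indent on the second line.
bad_source = "a = 1\n  b = 2\n"
# The same statements, aligned correctly.
good_source = "a = 1\nb = 2\n"


def compiles(source):
    """Return True if the source compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(bad_source))
print(compiles(good_source))
```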

Rules and conventions on naming variables
-------------------------------

* Variable names are unlimited in length
* Variable names start with a letter or an underscore *_*, followed by letters, digits or underscores.
* Variable names are case-sensitive
* **Variable names cannot be Python keywords (see below)**

Variable names are conventionally written in lower-case letters, with multiple words separated by underscores.

**Other rules and style conventions:** PEP8 style recommendations (https://www.python.org/dev/peps/pep-0008/)
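
The character rules above can be checked with the string method `isidentifier`, which is available in Python 3. The candidate names below are made up for illustration; note that `isidentifier` does not reject keywords, so that rule is not covered by this sketch.

```python
# Made-up candidate names: the first two follow the rules, the last two do not.
candidates = ["gene_count", "_tmp", "2fast", "my-var"]

# isidentifier() checks the "letter or underscore first" rule.
validity = {name: name.isidentifier() for name in candidates}
for name in candidates:
    print(name, validity[name])

# Names are case-sensitive: these are two distinct variables.
count = 1
Count = 2
print(count, Count)
```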

Basic numeric types
----------------

**Integers**

In [8]:
a = 10  
b = 2
a + b

12

In [9]:
# incremental operators
a = 10
a += 2    # equivalent to a = a + 2 (there is no ++ operator like in C/C++)
a

12

**Boolean**

In [17]:
test = True
if test:
    print(test)

True


In [18]:
test = False
if not test:
    print(test)

False


In [41]:
# Other types can be treated as booleans
# The main example is integers: 0 is false, any other value is true
true_value = 1
false_value = 0
if true_value:
    print(true_value)
if not false_value:
    print(false_value)

1
0
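
Integers are not the only values that can be tested this way. As a short sketch (the list of sample values is arbitrary), `bool` can be applied to a few common objects to see how they are interpreted in an `if` statement:

```python
# Arbitrary sample values of several built-in types.
values = [0, 1, 0.0, "", "text", [], [0], None]

# bool() returns the truth value that an `if` statement would use.
truthiness = [bool(value) for value in values]

for value, flag in zip(values, truthiness):
    print(repr(value), flag)
```

Zero, empty containers, empty strings and `None` are false; a list containing a zero is still true because it is not empty.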


**Integers, Long, Float and Complex**

In [10]:
long_integer = 2**63

float1 = 2.1           
float2 = 2.0
float3 = 2.

complex_value = 1 + 2j

In [13]:
long_integer

9223372036854775808L

In [15]:
float3

2.0

**Basic mathematical operators**

In [26]:
1 + 2

3

In [32]:
1 - 2

-1

In [27]:
3 * 2

6

In [31]:
3 / 2

1

In [30]:
3 // 2.

1.0

In [33]:
3 % 2

1

In [34]:
3 ** 2

9
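
The two division operators are easy to confuse. The outputs above appear to come from Python 2, where `/` between two integers truncates the result, whereas in Python 3 `/` always returns a float. The sketch below sticks to expressions that give the same result in both versions:

```python
floor_quotient = 7 // 2          # floor division: 3
remainder = 7 % 2                # remainder of the division: 1
true_quotient = 7 / 2.0          # a float operand gives a float result: 3.5
both = divmod(7, 2)              # quotient and remainder at once: (3, 1)
negative_floor = -7 // 2         # floors toward negative infinity: -4

print(floor_quotient, remainder, true_quotient, both, negative_floor)
```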

**Promotion:** when you mix numeric types in an expression, the operands are converted (or coerced) to the most general type involved; for example, an integer added to a float is first converted to a float.

In [35]:
5 + 3.1

8.1
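
The resulting type can be inspected with `type`. The following lines are a minimal sketch with arbitrary values:

```python
int_plus_float = 5 + 3.1               # int and float give a float
float_plus_complex = 2.5 + (1 + 2j)    # float and complex give a complex
bool_plus_int = True + 1               # True counts as the integer 1

print(type(int_plus_float).__name__)
print(type(float_plus_complex).__name__)
print(bool_plus_int)
```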

**Converting types: casting**

A value of one type can be converted to another type by calling the target type as a function; this is often called "casting".

In [36]:
int(3.1)

3

In [37]:
float(3)

3.0

In [38]:
bool(1)

True

In [39]:
bool(0)

False
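
A few details of these conversions are worth knowing: `int` truncates toward zero rather than rounding, and strings can be converted as long as they hold a valid number. The sketch below uses arbitrary values to show both points:

```python
truncated_positive = int(3.9)     # 3, the fractional part is dropped
truncated_negative = int(-3.9)    # -3, truncation is toward zero
from_text = int("42")             # a string of digits can be converted
as_float = float("2.5")
as_text = str(12)                 # numbers can be turned into strings

# A string that does not hold a number cannot be converted.
try:
    int("abc")
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(truncated_positive, truncated_negative, from_text, as_float, as_text)
print(conversion_failed)
```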

Keywords
------

- Keywords are reserved names that are part of the Python language.
- **A variable cannot be named after a keyword**: a SyntaxError would be raised
- The list of keywords can be obtained using the commands below (**import** and **print** are themselves keywords in the Python version used here; they will be explained later in this course)

In [45]:
import keyword
# Here we are using the "dot" operator, which gives access to the attributes and functions of an object (a variable, that is)
print(keyword.kwlist)

['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']


In [44]:
print = 1

SyntaxError: invalid syntax (<ipython-input-44-4fe15ff19304>, line 1)
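
Whether a given name is reserved can also be tested programmatically with `keyword.iskeyword`. In the sketch below, `is_assignable` is a hypothetical helper that simply tries to compile an assignment to the name:

```python
import keyword


def is_assignable(name):
    """Return True if `name = 1` compiles, False if it is a syntax error."""
    try:
        compile(name + " = 1", "<example>", "exec")
        return True
    except SyntaxError:
        return False


for name in ["for", "class", "total"]:
    print(name, keyword.iskeyword(name), is_assignable(name))
```

The result for `print` depends on the interpreter: in Python 3 it is an ordinary function rather than a keyword.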

A note about objects
-------

- Everything in Python is an object, which can be seen as an advanced version of a variable that carries its own attributes
- Objects have methods
- The **dir** built-in function allows the user to discover them

In [75]:
print(dir(bool))

['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'imag', 'numerator', 'real']
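
Most of the names listed by `dir` start with a double underscore and are used internally by Python. As a rough sketch, they can be filtered out to keep only the names meant for everyday use (the exact list depends on the Python version, so no output is shown here):

```python
# Keep only the "public" names, i.e. those without a leading underscore.
public_names = [name for name in dir(bool) if not name.startswith("_")]
print(public_names)

# hasattr checks for a single attribute by name.
has_bit_length = hasattr(bool, "bit_length")
print(has_bit_length)

# A method found with dir() is called with the dot operator.
bits_needed = (5).bit_length()   # 5 is 101 in binary, so 3 bits
print(bits_needed)
```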


Importing standard python modules
---------------

Standard Python modules are libraries that are available without installing additional software (they come with the Python interpreter). They only need to be **imported**. The **import** keyword allows us to import standard (and non-standard) Python modules. Some common ones:
- os
- math
- sys
- urllib2
- dozens of others are available. See https://docs.python.org/2/py-modindex.html

In [65]:
import os
os.listdir('.')

['[0]-Introduction_to_Jupyter_Notebook.ipynb',
 '[4]-Useful_third_party_libraries_for_data_analysis.ipynb',
 '[5]-Advanced_Jupyter.ipynb',
 '[3]-Data_visualization.ipynb',
 '.ipynb_checkpoints',
 '.keep',
 '[1]-Basic_python_and_native_python_data_structures.ipynb',
 '[2]-Advanced_data_structures-and-file-parsing.ipynb']

In [66]:
os.path.exists('data.txt')

False

In [67]:
os.path.isdir('.ipynb_checkpoints/')

True

**Import comes in different flavors**

In [68]:
import math
math.pi

3.141592653589793

In [69]:
from math import pi
pi

3.141592653589793

In [70]:
# an alias can be given to the module itself
import math as m
m.pi

3.141592653589793

In [71]:
# an alias can also be given to the imported function/variable
from math import pi as PI
PI

3.141592653589793

In [73]:
# "del pi" removes the name pi that was imported earlier.
# "from math import pi as PI" only created the name PI, not pi,
# so accessing pi now raises an error
del pi
pi

NameError: name 'pi' is not defined

In [74]:
math.sqrt(4.)

2.0

Data structures
------------


There are quite a few data structures available. The built-in data structures are:
- **lists**
- **tuples**
- **dictionaries**
- **strings**
- **sets** 

Lists, strings and tuples are **ordered sequences** of objects. Unlike strings, which contain only characters, lists and tuples can contain objects of any type. Lists and tuples are like arrays. Tuples, like strings, are **immutable**. Lists are mutable, so they can be extended or reduced at will. Sets are mutable, unordered collections of unique elements.

Lists are enclosed in brackets:

```python
l = [1, 2, "a"]
```

Tuples are enclosed in parentheses:

```python
t = (1, 2, "a")
```

Tuples are generally slightly faster than lists and consume less memory.

Dictionaries are built with curly brackets:

```python
d = {"a":1, "b":2}
```

Sets are created with the **set** built-in function. The table below summarizes these data structures:

|                    | immutable  | mutable     |
|--------------------|------------|-------------|
| ordered sequence   | string     |             |
| ordered sequence   | tuple      |  list       |
| unordered sequence |            |  set        |
| hash table         |            |  dict       |
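
The difference between mutable and immutable structures can be seen with a few operations. This is a minimal sketch; the variable names and values are chosen for illustration only:

```python
items_list = [1, 2, "a"]
items_list.append(3)              # lists can grow in place

items_tuple = (1, 2, "a")
try:
    items_tuple[0] = 10           # tuples cannot be modified
    tuple_modified = True
except TypeError:
    tuple_modified = False

unique = set([1, 2, 2, 3, 3])     # duplicates are removed

lookup = {"a": 1, "b": 2}
lookup["c"] = 3                   # dictionaries accept new keys

print(items_list)
print(tuple_modified)
print(sorted(unique))
print(lookup["c"])
```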


**Indexing** starts at 0, like in C

In [76]:
s1 = "Example"
s1[0]

'E'

In [77]:
# last index is therefore the length of the string minus 1
s1[len(s1)-1]

'e'

In [78]:
# Negative indices count from the end
s1[-1]

'e'

In [81]:
# Careful with indexing out of bounds
s1[100]

IndexError: string index out of range
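
When an index may fall outside the string, the error can be caught rather than left to stop the program. The helper `char_at` below is hypothetical and only sketches the idea:

```python
s1 = "Example"


def char_at(text, index, default=None):
    """Return text[index], or `default` when the index is out of range."""
    try:
        return text[index]
    except IndexError:
        return default


# A negative index -k refers to the same character as len(text) - k.
second_to_last = s1[-2]
same_character = s1[len(s1) - 2]

print(second_to_last, same_character)
print(char_at(s1, 0))
print(char_at(s1, 100))
print(char_at(s1, 100, default="?"))
```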

Strings and slicing
-----

There are 4 ways to represent strings:
- with single quotes
- with double quotes
- with triple single quotes
- with triple double quotes

In [82]:
"Simple string"

'Simple string'

In [83]:
'Simple string'

'Simple string'

In [84]:
# double quotes allow a single quote inside the string, and vice versa
"John's book"

"John's book"

In [85]:
# we can also escape the quote with a backslash
'John\'s book'

"John's book"

In [86]:
"""This is an example of 
a long string on several lines"""

'This is an example of \na long string on several lines'
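
Whichever quoting style is used, the resulting string object is the same. The short sketch below (with made-up sample text) compares the styles and shows how a multi-line string can be split back into lines:

```python
escaped = 'John\'s book'
double_quoted = "John's book"
same_string = (escaped == double_quoted)

multi_line = """first line
second line"""

# The line break is stored as a newline character inside the string.
has_newline = "\n" in multi_line
lines = multi_line.splitlines()

print(same_string)
print(has_newline)
print(lines)
```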

**A little bit more on the print function: formatting**

In [51]:
print('This {0} is {1} on format'.format('example', 'based'))

This example is based on format


In [52]:
print('This {0} is {1} on format, isn't it a nice {0}?'.format('example', 'based'))

SyntaxError: invalid syntax (<ipython-input-52-adf69de48897>, line 1)

In [53]:
# Notice the escaping of the quote char
print('This {0} is {1} on format, isn\'t it a nice {0}?'.format('example', 'based'))

This example is based on format, isn't it a nice example?


In [54]:
print("You can also use %s %s\n" % ('C-like', 'formatting'))

You can also use C-like formatting



In [64]:
print("You can also format integers %d\n" % (1))

You can also format integers 1



In [55]:
print("You can also specify the precision of floats: %f or %.20f\n" % (1., 1.))

You can also specify the precision of floats: 1.000000 or 1.00000000000000000000
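
The `format` method accepts named fields and format specifications as well. The examples below are a brief sketch with arbitrary values, not a full tour of the formatting mini-language:

```python
# Named fields make long templates easier to read.
named = "{name} has {count} items".format(name="cart", count=3)

# Precision of floats, as with %.3f.
rounded = "{0:.3f}".format(1.0 / 3)

# Right-align text in a field of 8 characters.
aligned = "{0:>8}|".format("abc")

# The C-like syntax supports widths too.
padded = "%5d|" % 42

print(named)
print(rounded)
print(aligned)
print(padded)
```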

