# Python tutorial
Instructor: Stan Sobolevsky  
Founding Partner at www.indatlabs.com  
Associate Professor Of Practice And Director Of Urban Complexity Lab
at New York University (CUSP NYU)
sobolevsky@indatlabs.com


# Python basics

Why Python?

* easy learning curve, intuitive syntax;  
* free open-source project with community-development model;
* rich supply of modules/libraries;
* improved programmer productivity;
* great performance for a scripting language;
* advanced integration capabilities (C/C++, Java, XML, JSON etc).

Disadvantages come as a flip side of its strengths:  
* it is an interpreted language, so performance is limited compared to compiled ones;
* intuitive syntax (e.g. dynamic typing) makes it susceptible to run-time errors;
* the community-development model, which is key to its fast development, leads to decentralization and often a lack of coordination between different modules;
  * otherwise the Python 2/3 fork would not have happened without 100% backwards compatibility.

So why still Python 2? 

Although Python 3 is regarded as the future of the language and more and more new projects rely on it, a considerable number of libraries and projects have not been ported yet. Python 2 is therefore seen as a solid and stable "safe harbor" that still provides comprehensive functionality for your data science projects. Besides, the differences in syntax are small, so it will be easy to move to Python 3 when needed. 

In the first session we will cover data types, operators, variables, flow control (loops and branching), useful built-in and user-defined functions, and data structures (lists, tuples, sets, dictionaries). We will also master some particular "Pythonic" ways of coding, like list and dictionary comprehensions.

## Jupyter (iPython) notebook basics

First, a few words about the environment we'll use for our class. Jupyter or iPython Notebook brings together code, data and textual comments. Depending on the balance between these, you can think of it as commented code or as an interactive blog/article/tutorial.

A notebook consists of **cells** as its basic building blocks. Each cell can be 
* *code* - a piece of Python code, self-contained or using results of previously executed cells
* *markdown* - textual comments with formatting symbols (standard markdown syntax https://en.wikipedia.org/wiki/Markdown is supported)
* *heading/subheading* - a special case of markdown starting with #, ##, ###, ... according to the heading level
* *raw* (like below) - content to be included unmodified for possible further conversion

Markdown basics (credits to https://en.wikipedia.org/wiki/Markdown ):
# Heading

## Sub-heading

Paragraphs are separated by a blank line.

Two spaces at the end of a line   
produces a line break.

_smth bold_
Text attributes _italic_, 
**bold**, `monospace`.

Horizontal rule:

---

Bullet list:

  * apples
  * oranges
  * pears

Numbered list:

  1. wash
  2. rinse
  3. repeat

A [link](http://example.com).

![Image](Image_icon.png)

> Markdown uses email-style > characters for blockquoting.

Inline <abbr title="Hypertext Markup Language">HTML</abbr> is supported.

If you want to include a formula, TeX syntax is supported too!
$$
S=\sum\limits_{i=1}^{100}i^2
$$

## Python Expressions

Now let's get back to Python. The simplest code cell can contain an expression that uses constant values and operators and is evaluated when the cell runs. There are 7 arithmetic operators:

| Operator | Function |
|----|---|
| +  | Addition |
| -  | Subtraction |
| *  | Multiplication |
| /  | Division |
| %  | Remainder (modulus) |
| //  | Integer (floor) division |
| **  | Power |
In [1]:
#addition
1+2

3

In [2]:
#third power of 2
2**3

8

In [3]:
#remainder of 10 divided by 3
10%3

1

In [4]:
#notice that if we want actual floating point division, at least one operand has to be a floating point number
5.0/2

2.5

**WARNING!!!** First common "Pythonic" confusion below:

In [5]:
#otherwise, if both numbers are integers, the division is also going to be integer
5/2

2

In [6]:
#integer division on the real number can be performed as
5.0//2

2.0
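
The division results above depend on the operand types. The following is a minimal sketch of alternatives that do not depend on them, assuming only built-in operators and functions: `//` always performs floor division, `float()` forces floating point division, and `divmod()` returns quotient and remainder together. The variable names are illustrative.

```python
# Floor division and remainder give the same result for integers
# regardless of how / behaves.
a, b = 5, 2
quotient = a // b            # 2
remainder = a % b            # 1

# divmod() returns both values at once as a tuple.
pair = divmod(a, b)          # (2, 1)

# Quotient and remainder always recombine into the original number.
recombined = quotient * b + remainder   # 5

# Converting one operand to float forces floating point division.
exact = float(a) / b         # 2.5
```

Writing `//` whenever integer division is intended, and `float()` whenever it is not, keeps the intent visible in the code itself.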

The same operator can work differently in different contexts, depending on the data type.

In [7]:
#e.g. for string data + is going to work as concatenation
'a'+'b'

'ab'
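
As a further sketch of type-dependent operators (the values are illustrative): `*` repeats a string when combined with an integer, and `+` needs both operands to be of a compatible type, so a number has to be converted with `str()` before it can be appended to a string.

```python
# * with a string and an integer repeats the string.
repeated = 'ab' * 3              # 'ababab'

# + adds numbers but concatenates strings.
number_sum = 1 + 2               # 3
text_sum = '1' + '2'             # '12'

# 'value: ' + 5 would raise a TypeError,
# so the number is converted to a string first.
label = 'value: ' + str(5)       # 'value: 5'
```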

For arithmetic operators Python uses the natural order of precedence: power first, then multiplication and division, then addition and subtraction. If you need to change that, use parentheses.

In [8]:
1+2*3

7

In [9]:
(1+2)*3

9
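
A few more precedence cases, as a sketch with illustrative values. They show where the power operator sits relative to the other operators and in which direction operators of equal precedence are evaluated.

```python
# ** binds tighter than * and /, which bind tighter than + and -.
a = 2 * 3 ** 2        # 2 * 9 = 18
b = (2 * 3) ** 2      # 6 ** 2 = 36

# ** also binds tighter than a leading minus sign.
c = -2 ** 2           # -(2 ** 2) = -4
d = (-2) ** 2         # 4

# Operators of equal precedence are evaluated left to right...
e = 10 - 4 - 3        # (10 - 4) - 3 = 3
# ...except **, which is evaluated right to left.
f = 2 ** 3 ** 2       # 2 ** (3 ** 2) = 2 ** 9 = 512
```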

# Data types

Although data types are not explicitly declared, it is important to be aware of them. Python has 6 basic built-in data types:

| Type     | Description           | Example |
|----------|-----------------------|---------|
| int      | Integer values        | 5     |
| float    | Floating point values | 5.1   |
| complex  | Complex values        | 2 + 2j  |
| bool     | Boolean values        | True, False    |
| str      | String values         | "abc" |
| NoneType | None value            | None    |

If you want to check the type of a value, use the `type()` function.

In [10]:
type(5)

int

In [11]:
type(5.0)

float

In [12]:
type('a')

str

In [13]:
type(False)

bool
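
Beyond `type()`, two related built-in tools are often useful. The sketch below (with illustrative variable names) uses `isinstance()` to test whether a value belongs to a type, and uses the type names themselves as conversion functions.

```python
# The remaining types from the table of data types.
complex_type = type(2 + 2j)      # complex
none_type = type(None)           # NoneType

# isinstance() checks whether a value belongs to a given type.
is_int = isinstance(5, int)          # True
is_float = isinstance(5, float)      # False

# Type names double as conversion functions.
as_float = float(5)      # 5.0
as_int = int(5.9)        # 5 (truncates towards zero, does not round)
as_str = str(5)          # '5'
from_str = int('42')     # 42
as_bool = bool(0)        # False; any non-zero number gives True
```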

# Relational and logical operators

Now that we have learned the different types, let's consider the operators that work with them. 
Relational operators producing logical values are `==` (equal), `!=` (not equal), `>`, `<`, `>=`, `<=`

In [14]:
(3+2)==(2+3)

True

In [15]:
(3-2)!=(2-3)

True

Logical operators include `and`, `or` and `not` 

In [16]:
(1<2) or (2<1)

True

In [17]:
not(2>3)

True

Logical conditions are often used in flow control, i.e. choosing which code to execute depending on whether a certain condition holds (e.g. if something is less than or equal to something else, do one thing; otherwise stop).  
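
As a preview, here is a minimal sketch of a condition used for branching. The variable `temperature`, its value and the thresholds are hypothetical and chosen only for illustration.

```python
temperature = 65.5   # hypothetical measurement

# Relational operators combined with a logical operator.
in_range = (temperature >= 60) and (temperature <= 70)   # True

# The first condition that holds decides which branch is executed.
if temperature <= 60:
    message = 'cold'
elif temperature < 70:
    message = 'mild'
else:
    message = 'warm'
```

Here `message` ends up as `'mild'`, because the first condition is false and the second one is true.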

## Variables

Variables are used to store data.

Each variable is identified by a unique name. Names have to abide by the following rules:

* A name can start with an underscore "\_" or an upper or lower case letter. The following characters can be letters, digits or underscores.
* Python is a case-sensitive language, so a variable `abc` is not the same as `ABC` or `abC`.

A common convention is to use all upper case names for global variables and all lower case names for local variables.

Also, Python's built-in keywords (e.g. `if`, `for`, `while`, `def`) cannot be used as variable names.
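
The exact set of keywords depends on the Python version. One way to inspect it is the standard `keyword` module, as in this short sketch (the variable names are illustrative):

```python
import keyword

# kwlist holds all reserved words of the running Python version.
reserved = keyword.kwlist

# iskeyword() tests a single candidate name.
for_is_reserved = keyword.iskeyword('for')     # True
abc_is_reserved = keyword.iskeyword('abc')     # False
```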

There is no need to declare Python variables upfront, like in some other programming languages. The variable is created the first time it is assigned a value.

In [18]:
A=1

In [19]:
B=2

In [20]:
#notice that each cell can contain multiple statements (including assignments) separated by ;
#assignment statements do not generate any output
C=A; D=1

In [21]:
#if the last statement is not an assignment, it will generate output
D=2; D

2

Variable types in Python are not explicitly specified, but determined upon assignment. This is quite convenient, but it may cause bugs that are sometimes hard to track down.

In [22]:
A=5

In [23]:
B=2

In [24]:
# one may expect 5/2=2.5; however, recall that if A and B are integers then / works as integer division
A/B

2

In [25]:
A=5.0; B=2; A/B

2.5
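
The sketch below illustrates how such bugs can arise: the same name can hold values of different types over time, and an unchanged expression then behaves differently. The values are illustrative.

```python
A = 5
first_type = type(A)        # int

# Re-assigning the same name silently changes its type.
A = 5.0
second_type = type(A)       # float

A = '5'
third_type = type(A)        # str

# The same expression now means something different:
# with a string, * repeats instead of multiplying.
result = A * 2              # '55', not 10
```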

In [26]:
#A handy way of assigning multiple variables at the same time is multiple assignment:
A=B=C=D=1

In [27]:
#We can also use different values in multiple assignment:
A,B,C=1,2,3
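
A common use of this form is swapping values. As a minimal sketch: the right-hand side is evaluated in full before any name is assigned, so no temporary variable is needed. The names `x` and `y` are illustrative.

```python
A, B = 1, 2

# The right-hand side is evaluated in full before anything is assigned,
# so no temporary variable is needed for a swap.
A, B = B, A           # A is now 2, B is now 1

# The same rule lets one assignment use the old values of several variables.
x, y = 3, 4
x, y = x + y, x - y   # x = 7, y = -1
```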

In [28]:
#expressions could be used in the right-hand side of the assignment
A=(1+2+3)/3; A

2

Remember some useful shortcuts that combine assignment with addition, subtraction, multiplication or division:

In [29]:
A=5

In [30]:
#increment
A+=1; A

6

In [31]:
#decrement
A-=1; A

5

In [32]:
#double
A*=2; A

10

In [33]:
#halve
A/=2; A

5
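
The same shortcut form exists for the remaining arithmetic operators, and it also works for other types. A brief sketch with illustrative values:

```python
A = 17
A //= 5      # floor division: 17 // 5 = 3
A **= 2      # power: 3 ** 2 = 9
A %= 4       # remainder: 9 % 4 = 1

# The same shortcuts work for other types, e.g. strings.
s = 'ab'
s += 'cd'    # 'abcd'
s *= 2       # 'abcdabcd'
```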

### Exercise 1. Given five temperature measurements 60.0, 63.0, 65.5, 70.0, 68.5, compute their average and standard deviation

Recall that standard deviation of measurements $x_1, x_2,..., x_n$ is defined as
$$
\sigma=\sqrt{\frac{\sum_{i=1}^n (x_i-\mu)^2}{n}},
$$
where $\mu=(\sum_{i=1}^n x_i)/n$ is the average value. Standard deviation shows how broadly the measurements are spread around their average value.

In [34]:
x1,x2,x3,x4,x5=60.0, 63.0, 65.5, 70.0, 68.5
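
One possible way to continue, sketched using only the operators covered so far. The names `n`, `mu`, `variance` and `sigma` are chosen here to mirror the formula and are not prescribed by the exercise; the square root is taken as a power of 0.5.

```python
x1, x2, x3, x4, x5 = 60.0, 63.0, 65.5, 70.0, 68.5
n = 5

# Average value (mu in the formula).
mu = (x1 + x2 + x3 + x4 + x5) / n

# Mean of the squared deviations from the average (the variance).
variance = ((x1 - mu) ** 2 + (x2 - mu) ** 2 + (x3 - mu) ** 2
            + (x4 - mu) ** 2 + (x5 - mu) ** 2) / n

# A square root is the same as raising to the power 0.5.
sigma = variance ** 0.5
```

With these measurements the average comes out at about 65.4 and the standard deviation at about 3.62.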