
## What is Python

- dynamic / strongly typed
- interpreted
- object oriented
- open source
- wide range of use cases

## Guido van Rossum

<img src="../assets/guido.jpeg" alt="drawing" width="300"/>

Python was named after Monty Python - see the [Argument Clinic](https://youtu.be/xpAvcGcEc0k?t=84).

For more on the development of Python:
- [The A-Z of Programming Languages](https://www.computerworld.com.au/article/255835/a-z_programming_languages_python/)
- [Lex Fridman interview](https://www.youtube.com/watch?v=ghwaIiE3Nd8)

## Why Python

The motivation behind Python is **programmer productivity**:

> So I set out to come up with a language that made programmers more productive, and if that meant that the programs would run a bit slower, well, that was an acceptable trade-off - Van Rossum

Python is **not** the language of choice for applications where speed is the most important concern

Python is the language of choice for data science - also popular with web developers

Central to the language is the idea of being **pythonic**

```python
#  not pythonic
for i in range(len(mylist)):
    do_something(mylist[i])

#  pythonic
for element in mylist:
    do_something(element)
```
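
The same idea extends to cases where the index really is needed, or where two lists are walked together. A minimal sketch, using made-up lists `names` and `scores` that are not part of the original notebook: `enumerate` yields the index and element together, and `zip` pairs up elements from two lists.

```python
# hypothetical example data, chosen only for illustration
names = ['ada', 'grace', 'linus']
scores = [90, 85, 70]

# enumerate gives (index, element) pairs, so no manual indexing is needed
numbered = [f'{i}: {name}' for i, name in enumerate(names)]
print(numbered)

# zip pairs up elements from two lists of the same length
paired = {name: score for name, score in zip(names, scores)}
print(paired)
```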

## The Zen of Python

Python has a philosophy built into the language

In [2]:
import this

## Running the Python interpreter

If you are using the recommended Jupyter Lab, then you can easily access a terminal in the same window as this notebook.

We can start an interactive Python **interpreter** session by typing the following:

```bash
#  the $ indicates the command is run in a shell
$ python
```

We can see where this Python executable is located on our machine using the bash program `which`:

In [3]:
#  use ! to run bash/shell code in the notebook directly
!which python

/home/stas/anaconda3/envs/dsr/bin/python


And what version of Python we are using by running Python with a command line argument:

In [4]:
#  use ! to run bash/shell code in the notebook directly
!python --version

Python 3.6.9 :: Anaconda, Inc.
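
The same information is available from inside Python itself. A small sketch using the standard library `sys` module; the values printed depend on your machine, so none are shown here.

```python
import sys

# path of the interpreter running this code (what `which python` reports
# when the same environment is active in the shell)
print(sys.executable)

# version as a tuple-like object, convenient for comparisons
print(sys.version_info)

# guard code that needs a minimum version
is_python3 = sys.version_info >= (3, 0)
print(is_python3)
```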


# Python development environment

## Which Python should I use?

Use Python 3.6

Python 3.7 adds data classes (classes that mainly hold data, similar to an improved `namedtuple`), but:
- 3.7 is not compatible with some data science libraries (e.g. TensorFlow)
- if you use these features now your colleagues will need to install a new virtual environment
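
To make the comparison concrete, here is a rough sketch of the same record written both ways. The `PointTuple` and `PointData` names and their fields are invented for illustration, and the `dataclasses` import needs Python 3.7 or newer, so this block will not run in a 3.6 environment.

```python
from collections import namedtuple
from dataclasses import dataclass  # Python 3.7+

# namedtuple: immutable, available in Python 3.6
PointTuple = namedtuple('PointTuple', ['x', 'y'])

# dataclass: mutable by default, with type annotations on the fields
@dataclass
class PointData:
    x: float
    y: float

pt = PointTuple(1.0, 2.0)
pdata = PointData(1.0, 2.0)

pdata.x = 5.0  # allowed: dataclass fields can be reassigned
print(pt, pdata)
```

Trying `pt.x = 5.0` on the namedtuple would raise an `AttributeError`, because tuples cannot be changed after creation.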

## System Python

macOS and most Unix operating systems come with a version of Python installed by default. 

This is the **system Python**, and is used by the OS.  You want to avoid using this - breaking this can be painful.

Keep your system Python as clean as possible.

You can do this using a **virtual environment**.  

## Virtual environments

Virtual environments are ignored by most beginners - using them is part of becoming an intermediate level Python programmer.  It is a best practice.

Idea = it is cheaper and simpler to copy the whole Python installation and customise it than to manage a single installation that satisfies all the requirements. It's the same advantage we have when using virtual machines, but on a smaller scale.
- similar to the idea that it is easier to do a clean install of a buggy OS than to fix it

A virtual environment allows you to **isolate** different installations of Python
- a directory (with many subdirectories) that mirrors a Python installation like the one that you can find in your operating system

Makes it easy to access different installations of Python with different packages
- managing fast moving libraries like TensorFlow, pandas
- reproducibility

Using Python virtual environments (usually one per project) is **computational hygiene**.

```bash
#  create a new environment called dsr
$ conda create --name dsr python=3.6

#  activate the dsr environment
$ conda activate dsr

$ pip install jupyterlab
```
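
Once an environment is active, you can check from Python which one you are in. A hedged sketch: `sys.prefix` is the root directory of the running Python installation, and `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation (it will be missing if conda is not in use, hence the fallback value).

```python
import os
import sys

# root directory of the Python installation that is running this code;
# inside the dsr environment this should end in .../envs/dsr
print(sys.prefix)

# name of the active conda environment, or a placeholder if conda
# has not set the variable
env_name = os.environ.get('CONDA_DEFAULT_ENV', 'no conda environment active')
print(env_name)
```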

## Anaconda

A distribution of Python ([installers here](https://www.anaconda.com/distribution/)) for data science
- precompiles a lot of the C code used in libraries like `numpy` - useful on Windows

Also has a **virtual environment manager**

```bash
$ conda info --envs

$ conda create --name dsr python=3.6

$ conda env remove -n dsr
```

## IPython

IPython = Interactive Python 
- command shell for interactive computing
- IPython is what runs in Jupyter

We can use `?` to see information about Python objects, and `??` to see the source:

In [5]:
def example_func():
    """ 
    My docstring
    """
    print('')

example_func?

Signature: example_func()
Docstring: My docstring
File:      ~/dsr/dsr-classes/python/basics/<ipython-input-5-adcd8595a29a>
Type:      function


In [6]:
example_func??

Signature: example_func()
Source:   
def example_func():
    """ 
    My docstring
    """
    print('')
File:      ~/dsr/dsr-classes/python/basics/<ipython-input-5-adcd8595a29a>
Type:      function


## Built-in functions

The Python interpreter has a number of functions and types built into it that are always available - [see the documentation for a complete list](https://docs.python.org/3/library/functions.html)

- `len`
- `any`
- `all`
- `reversed`
- `range`
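
A quick sketch of the five builtins listed above, applied to a small made-up list so the results are easy to check by hand:

```python
values = [3, 0, 7]

print(len(values))             # 3: number of elements
print(any(values))             # True: at least one element is truthy
print(all(values))             # False: 0 is falsy
print(list(reversed(values)))  # [7, 0, 3]: reversed returns an iterator
print(list(range(3)))          # [0, 1, 2]: range is lazy too
```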

## dir()

See your current **namespace** using the Python builtin `dir()`:

In [7]:
dir()

['In',
 'Out',
 '_',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_exit_code',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'example_func',
 'exit',
 'get_ipython',
 'quit',
 'this']

We can also use `dir` to see what *methods* and *attributes* an object has.

Let's look at the `dir` of a `float` object:

In [8]:
dir(float(64))

['__abs__',
 '__add__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getformat__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__le__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rmod__',
 '__rmul__',
 '__round__',
 '__rpow__',
 '__rsub__',
 '__rtruediv__',
 '__setattr__',
 '__setformat__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 'as_integer_ratio',
 'conjugate',
 'fromhex',
 'hex',
 'imag',
 'is_integer',
 'real']
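
Most of that list is double-underscore ("dunder") methods that Python calls for you. A small sketch for filtering them out so only the public names remain; the exact names returned can differ between Python versions, so no output is shown.

```python
# keep only names that do not start with an underscore
public_names = [name for name in dir(float(64)) if not name.startswith('_')]
print(public_names)

# getattr looks a name up on the object; callable tells methods
# apart from plain attributes
value = float(64)
print(callable(getattr(value, 'is_integer')), value.is_integer())
```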

## Primitives - numbers

In [9]:
float(16)

16.0

In [10]:
int(16.0)

16

We can do exponentiation:

In [11]:
float(4)**20

1099511627776.0

In [12]:
float(4)**0.5

2.0

The modulo (`%`) operator gives us the remainder after division:

In [13]:
20 % 6

2

This can be used to check if a number is even:

In [14]:
20 % 2 == 0

True

Another use case for this is only printing something at certain frequencies when iterating (such as batches or epochs when training neural nets):

In [15]:
for i in range(20):
    if i % 3 == 0:
        print(i)

0
3
6
9
12
15
18
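
The remainder pairs naturally with floor division (`//`), which gives the whole-number part of the quotient. As a worked check using the same numbers as the `20 % 6` example, the builtin `divmod` returns both at once:

```python
a, b = 20, 6

quotient = a // b    # floor division: how many whole times b fits into a
remainder = a % b    # what is left over

print(quotient, remainder)             # 3 2
print(divmod(a, b))                    # (3, 2)
print(quotient * b + remainder == a)   # True
```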


## Primitives - booleans

In [16]:
True

True

In [17]:
False

False

**Truthy, Falsy**
- values that evaluate to `True` or `False` when used in a boolean context (such as an `if` statement)

In [18]:
bool(100)

True

In [19]:
bool(0)

False

In [20]:
bool([0, 1])

True

In [21]:
bool(None)

False

In [22]:
True * True

1

In [23]:
True * False

0

In [24]:
False * False

0

In [25]:
if []:
    print('Yay')
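
Nothing is printed by the cell above because an empty list is falsy. A short sketch that tabulates the truthiness of some common values (the sample values are chosen for illustration):

```python
samples = [0, 1, -1, 0.0, '', 'a', [], [0], {}, None]

# map each value's repr to its truthiness
truthiness = {repr(value): bool(value) for value in samples}

for text, is_truthy in truthiness.items():
    print(f'{text:>6} -> {is_truthy}')
```

Zero, empty containers, the empty string and `None` come out falsy; everything else here is truthy, including `-1` and a list that contains only `0`.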

## Conditionals

In [26]:
import random

c1 = random.randint(0, 1)
c2 = random.randint(0, 1)

if c1:
    print('c1')
    
elif c2:
    print('not c1, c2')
    
else:
    print('not c1, not c2')

not c1, c2


## Comparisons

In [27]:
5 == 6

False

In [28]:
5 == 3 + 2

True

In [29]:
x = 1
y = 2

x == y  # ... x is equal to y
x != y  # ... x is not equal to y
x > y   # ... x is greater than y
x < y   # ... x is less than y
x >= y  # ... x is greater than or equal to y
x <= y  # ... x is less than or equal to y

True

## Logical operators

`and`, `or`, `not`

In [30]:
True and False

False

In [31]:
True or False

True

In [32]:
not True

False

In [33]:
not False

True
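
These operators are most often used to combine comparisons. A brief sketch with made-up variables `age`, `has_ticket` and `denominator`; note that `and` and `or` stop evaluating as soon as the result is known (short-circuiting).

```python
age = 20
has_ticket = False

# both conditions must hold
can_enter = age >= 18 and has_ticket
print(can_enter)   # False

# at least one condition must hold
either = age < 18 or not has_ticket
print(either)      # True

# short-circuit: the division is never evaluated because the left side is False
denominator = 0
safe = denominator != 0 and 10 / denominator > 1
print(safe)        # False
```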

## Variables & objects

In Python (unlike some other languages) there is a difference between **objects** and **variables**:
- object = the actual data in memory
- variable = a label that refers to an object

Objects have an identity, a type and a value.  Of these, only the value can change over time, and only for mutable objects such as lists.

In Python, variables **refer** to objects.  They are labels for objects - not the objects themselves.
- one object can have many labels
- one label = only one object

Below we create two objects

In [41]:
first = [2, 4, 8]

second = [2, 4, 8]

We can use two different operators to compare these variables.

The `==` operator checks if the two objects have the same values:

In [42]:
first == second

True

The `is` operator checks whether both variables refer to the same object:

In [43]:
first is second

False

In [44]:
third = first

first == third

True

In [45]:
first is third

True

Under the hood Python is comparing the object's `id` - a unique value for each object:

In [46]:
id(first)

140337543343752

In [47]:
id(third)

140337543343752

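Most of the time we only care about comparing values, meaning `==` is usually the operator we want.

Because `first` and `third` are labels for the same object, a change made through one label is visible through the other. This can lead to surprising effects: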

In [48]:
third.append(16)

first

[2, 4, 8, 16]
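
If an independent list is wanted, copy it rather than assigning a second label. A minimal sketch reusing the `first` list (redefined here so the block runs on its own); `list.copy()` makes a shallow copy, which is enough for a flat list of numbers. The names `alias` and `clone` are introduced only for this example.

```python
first = [2, 4, 8]

alias = first          # second label for the same object
clone = first.copy()   # new object with the same values

clone.append(16)       # changes only the copy
print(first)           # [2, 4, 8]
print(clone)           # [2, 4, 8, 16]

print(alias is first, clone is first)   # True False
```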

# Loops

### `for`

The `range` builtin provides a convenient way to create an iterable:

In [49]:
range?

Init signature: range(self, /, *args, **kwargs)
Docstring:     
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type:           type
Subclasses:     


In [50]:
for item in range(0, 3, 1):
    print(item)

0
1
2


You can control the start, stop & step:

In [51]:
for item in range(1, 6, 2):
    print(item)

1
3
5
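
`range` is lazy, so wrapping it in `list` is a handy way to inspect what it will produce. A short sketch, including a negative step for counting down:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(1, 6, 2)))    # [1, 3, 5]

# a negative step counts down; stop is still exclusive
countdown = list(range(10, 0, -3))
print(countdown)               # [10, 7, 4, 1]

# len works without building the list
print(len(range(0, 100, 5)))   # 20
```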


### `while`

A common pattern is to use a condition to break out of a loop:

In [52]:
done = False

while not done:
    done = True
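
The loop above exits after a single pass because `done` is set immediately. In practice the flag is usually set by a condition. A sketch with an invented example, halving a number until it drops below 1:

```python
value = 100.0
steps = 0
done = False

while not done:
    value = value / 2
    steps += 1
    # stop once the value has fallen below 1
    done = value < 1

print(steps, value)   # 7 0.78125
```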

## Exercises

Write a program to print out:

```
*****
  *
  *
  *
  *
  *
  *
```

It might be useful to know you can do

`'*' * 2 == '**'`

`'*' + ' ' == '* '`

Write a program to print:
```
1
22
333
4444
55555
666666
7777777
88888888
999999999
```

## Datetimes

Working with datetimes is a common part of a data scientist's workflow.

### ISO 8601

A standard for formatting datetime strings:

`2019-09-23T17:45:18+00:00`

The bit after the `+` represents the offset from UTC 

`2019-09-23T17:45:18Z` is equivalent to the above (`Z` = Zulu = UTC)
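
A sketch of producing and parsing such strings with the standard library. `isoformat` works on Python 3.6, but `datetime.fromisoformat` was only added in Python 3.7, so the parsing half assumes a newer interpreter. The variable names are illustrative.

```python
from datetime import datetime, timezone

# build a timezone-aware datetime in UTC
moment = datetime(2019, 9, 23, 17, 45, 18, tzinfo=timezone.utc)

# datetime -> ISO 8601 string
text = moment.isoformat()
print(text)   # 2019-09-23T17:45:18+00:00

# ISO 8601 string -> datetime (Python 3.7+)
parsed = datetime.fromisoformat('2019-09-23T17:45:18+00:00')
print(parsed == moment)   # True
```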

## datetime

The `datetime` module offers a class (also called `datetime`) for handling dates in Python:

In [59]:
from datetime import datetime

datetime?

Init signature: datetime(self, /, *args, **kwargs)
Docstring:     
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
File:           ~/anaconda3/envs/dsr/lib/python3.6/datetime.py
Type:           type
Subclasses:     


In [60]:
datetime(2019, 1, 1)

datetime.datetime(2019, 1, 1, 0, 0)

Also useful is the `timedelta`:

In [61]:
from datetime import timedelta

timedelta?

Init signature: timedelta(self, /, *args, **kwargs)
Docstring:     
Difference between two datetime values.

timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)

All arguments are optional and default to 0.
Arguments may be integers or floats, and may be positive or negative.
File:           ~/anaconda3/envs/dsr/lib/python3.6/datetime.py
Type:           type
Subclasses:     


These can be used together in intuitive ways:

In [62]:
datetime(2019, 1, 1) - timedelta(days=10)

datetime.datetime(2018, 12, 22, 0, 0)

In [63]:
datetime(2019, 1, 1) > datetime(2018, 1, 1)

True
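
Subtracting one `datetime` from another goes the other way and gives a `timedelta`. A small worked example using the same two dates (the name `gap` is just for illustration):

```python
from datetime import datetime, timedelta

gap = datetime(2019, 1, 1) - datetime(2018, 1, 1)

print(gap)                          # 365 days, 0:00:00
print(gap.days)                     # 365
print(gap.total_seconds())          # 31536000.0
print(gap == timedelta(days=365))   # True
```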

Attributes on the `datetime` object show us the day, year etc:

In [64]:
datetime(2019, 1, 1).year

2019

We can also use the `datetime` class to get the current time (`now` is a class method, so no instance is needed):

In [65]:
datetime.now()

datetime.datetime(2020, 6, 29, 13, 42, 5, 688774)

We can use `strftime` to format the datetime as a string ([codes for day, week etc. are given here](http://strftime.org/)):

In [66]:
datetime(2019, 1, 1).strftime('%d.%m.%Y')

'01.01.2019'
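
The reverse operation is `strptime`, which parses a string into a `datetime` using the same format codes. A brief round-trip sketch:

```python
from datetime import datetime

fmt = '%d.%m.%Y'

text = datetime(2019, 1, 1).strftime(fmt)   # datetime -> string
parsed = datetime.strptime(text, fmt)       # string -> datetime

print(text)     # 01.01.2019
print(parsed)   # 2019-01-01 00:00:00
```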

## Exercise

Create a list of datetimes on a 5 minute frequency between `2019-09-23T17:45:00+00:00` and `2019-09-25T07:05:00+00:00`

Run a **while** loop that:
- starts at 1989-09-09
- increments in 5 day increments
- stops when the date exceeds 1990-10-03
- prints the remaining time until the next 5 day point

Run a **for** loop that:
- starts at 1988-02-28
- iterates in 25 day increments
- prints the month when the month changes (only the month!), otherwise prints 'No change'
- stops when the date reaches (or exceeds) 1989-09-09
- prints the remaining time until the next 25 day point

Expected output of the **for** loop:

No change
3
4
5
6
7
No change
8
9
10
11
No change
12
1
2
3
4
No change
5
6
7
8
No change
9
16 days, 0:00:00
