# Introduction to Python

## Collaborative Software Development


---
Originally by Graeme Stewart, EP-SFT for the INSIGHTS Workshop, 2018-09-17

[GitHub](https://github.com/graeme-a-stewart/python-introduction), [CC-BY-4.0](http://creativecommons.org/licenses/by/4.0/)

# What is Python?

* Python is an open-source high-level interpreted language
* It's an easy language
  * Easy to code in, with many useful modules
  * Easy to read
* It's object oriented
* It's dynamic
* It's portable and it's popular
![Logo](images/python-logo.png)

# Python Popularity

![PYPL Language Popularity](images/python-pypl-popularity.png)

From [PopularitY of Programming Languages](https://pypl.github.io/PYPL.html)

# Python Popularity

![Google Trends in Data Science](images/python-r-cpp-googletrends-data.png)

![Google Trends in Machine Learning](images/python-r-cpp-googletrends-machinelearning.png)

(thanks to [Jim Pivarski](https://github.com/codas-hep/scientific-python-ecosystem))

# What's driving this?

All of the deep learning libraries have a Python interface,
in many cases the primary interface.

![Python ML Interfaces](images/python-ml-interfaces.png)

![Python Ecosystem](images/python-ecosystem.png)

Python has a very rich ecosystem of packages and plugins (taken from Jake VanderPlas, [*The Unexpected Effectiveness of Python in Science*](https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science) at PyCon 2017)

# But wait, an interpreted language for (big) scientific data...?

Isn't that crazy slow?

* Developer productivity is also important  
* Python is often used as a **glue** between advanced tools, e.g.
  * [NumPy](https://www.numpy.org/): Numerical calculations in Python
    * Removes much of Python's runtime overheads, to run *really* fast 
      (in many cases a lot faster than naive implementations in C or C++)
  * [Numba](https://numba.pydata.org/): Compile parts of Python
  * [ROOT](https://root.cern.ch): Everything in ROOT can be done via python.
  
* Python provides the **logic**, other libraries the computation


# Python - let's go!

<img style="float: right;" src="images/googles.jpg">

How do we get python going?

On most computers it should be simple - just execute `python`...

```py
teal:~$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("hello, world!")
hello, world!
>>>
```

Here we started python in its interpreter mode - we can then type commands and Python immediately executes them for us and gives the results (also called the *Read Evaluate Print Loop*, **REPL**)

# ipython - a better shell

The normal `python` shell is fine, but there is a better option, the `ipython` shell:

```py
teal:~$ ipython
Python 3.6.5 |Anaconda, Inc.| (default, Apr 26 2018, 08:42:37) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: print("hello, world!")
hello, world!
```

# ipython 

What's great about `ipython`?

* Getting help on anything with `?`
  * Type `?` on its own for some overview
* Jump to the code definition with `??`
* `TAB` completion for modules and methods
* Easy access to history of inputs and outputs (e.g., `_` is the output of the last command)
* Keyboard shortcuts
* Run shell commands easily, using `!cmd`
* Magic commands
  * Try `%magic` for an overview

# notebooks - ipython on steroids

<img style="float: right;" src="images/jupyter-logo-300.png">

Actually, the most useful and coolest way to run Python interactively is in a [*Jupyter Notebook*](https://jupyter.org/).

This is a web based "shell" for running Python interactively. It can do everything that `ipython` can do in a console, but it can do a lot more as well:
* Notebooks can be saved, preserving your work
* Notebooks can be shared with others
* Cells can contain markdown for better annotation of the code
* Notebooks can run lots of languages (R, C++, ROOT)
* Notebooks can be interfaces to much more powerful facilities (SWAN)

See the backup slides for some getting started links for notebooks

(This entire [presentation](https://github.com/graeme-a-stewart/python-introduction) is written as a Jupyter notebook, using the [RISE extension](https://github.com/damianavila/RISE))

# The nuts and bolts...

<img style="float: right;" src="images/nuts-and-bolts.jpg">

Like any other programming language, we need to have some understanding of the syntax of Python to be able to program in it. So let's look at some of the basic building blocks...

* Variables
  * Numbers, Strings, ...
* Compound objects
  * Lists, Dictionaries, Tuples
* Control Flow
  * Context Manager
  * Loops and Iterating
  * Conditionals
* Functions
* Classes
* Errors and Exceptions

# Variables

## Numbers

* There are two fundamental number types in Python, integers and floats.
  * These behave pretty much as you expect

In [None]:
i=7 
f=9.0
print("My integer is", i, "and my float is", f)

In [None]:
(i*3) + 2

In [None]:
(f*3) + 2

* `int` is effectively unbounded: values larger than the machine word size (usually 64 bits) just work, limited only by memory
* `float` maps to the C-type `double`, i.e., usually a 64 bit floating point type
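Both points can be checked directly. This is a minimal sketch using only the standard library, and the variable names are just illustrative:

```python
import sys

big = 2**100                       # Far beyond 64 bits, but still an exact int
print(big, big.bit_length())       # bit_length() reports how many bits are needed

print(sys.float_info.mant_dig)     # Number of mantissa bits in a float
close_enough = (0.1 + 0.2 == 0.3)  # Floats are binary approximations, so this is False
print(close_enough)
```

The last line is the classic reminder that floats should be compared with a tolerance rather than with `==`.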

## Operators

All the normal arithmetic operators are available:

In [None]:
i + 2 - 3 # Addition and subtraction <- Look - we introduced you to the Python comment character here!

In [None]:
f * 3.0 / 9.0 # Multiplication and division

In [None]:
i / 2 # Note that dividing two integers with / returns a float

In [None]:
i // 2 # But the // operator does an integer divide

In [None]:
i % 2 # Remainder for integer division

In [None]:
f**3 # Power operator (also pow(f,3) works)

## Conversions and casts

In [None]:
i * f # Mixed mode arithmetic "upcasts" to float

In [None]:
g = i * f + 0.5
int(g) # Cast the float result into an integer

In [None]:
float(i) # Cast an int into a float

"Normal" precedence rules apply: power then unary minus then mult/div then add/sub (remember, parentheses are your friends!)

In [None]:
-f**2*-1

## Complex

Complex numbers are a Python basic type too, formed of a real and imaginary floating point pair


In [None]:
2.0+21j # Compose with "j" for the complex part

In [None]:
complex(7, -9) # Or pass two arguments to the "complex" function

In [None]:
c = 1+2j
c * f

In [None]:
c.real

In [None]:
c.imag

In [None]:
abs(c)

## Strings

For storing text in Python we use *strings*, which are just immutable sequences of characters:

In [None]:
s="this is a dead parrot string"; t=str("it's Norwegian Blue") # single quotes are fine too
print(s, t)

In [None]:
s + " it has ceased to be!" # Use "+" to concatenate

Strings are unicode in Python3 (but watch out, they aren't in Python2)

In [None]:
s2=str("this parrot " + '\U0001F600' + " wouldn't go Voom! if you put a million volts through it")
print(s2)

In [None]:
long_s='''this is a long
string split over a few lines and has its own "quotes" and 'quotes'
so using the triple quote syntax is pretty useful'''
print(long_s)

## String Operations and Manipulation

In [None]:
str(3.14159) # The str() function will also convert something to a string

In [None]:
mp="the Monty Python show"
len(mp) # This is the length of the string

In [None]:
mp.upper()

In [None]:
mp.title()

In [None]:
mp.find("Python") # This gives the character index where the substring starts (or -1 if not found)

In [None]:
'   one very useful manipulation is to remove leading/trailing whitespace    '.strip()

In [None]:
'# or to see if a string starts with a particular character'.startswith("#")

## ipython help

Let's try using ipython's help and tab completion to get documentation on strings

In [None]:
str?

## Bool

Python has a built in *boolean* type as well, which can be `True` or `False`

In [None]:
t=True; f=bool(False)
print(t, f)

A boolean is the output of the comparison operator, `==`

In [None]:
print(t==f, 7==3+4)

And Python has the usual suite of Boolean operators (do use parentheses!)

In [None]:
(1==1) and (7>9)

In [None]:
(1==1) or (7>9)

In [None]:
not True

## Boolean curiosities...

Booleans will cast into the numbers 1 (`True`) and 0 (`False`)

This leads to some occasionally unexpected behaviour ...

In [None]:
print(9==True, 0.0==False) # True is equal to 1 and False to 0, so 9==True is False but 0.0==False is True

The `bool()` function will cast its argument into a truth value, but it's not really recommended to do this, e.g., although strings will cast to `True` if non-zero length, it's not really obvious or clear...

In [None]:
s="the naked truth"
print(bool(s)) # Not clear

In [None]:
print(len(s) > 0) # Much clearer

# Null Value

Python has an explicit *null* value, which can be assigned to any variable using `None`

In [None]:
not_here = None
print(not_here)

`None` is used to explicitly signal that a value is unset or missing

It's a common idiom in Python to use the fact that a `None` value is considered `False`
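A minimal sketch of that idiom (the variable names are illustrative). Empty strings, zero and empty containers are also considered `False`, so an explicit `is None` test is the safer check when those are valid values:

```python
not_here = None

if not not_here:                 # None is considered False in a condition
    status = "unset"
else:
    status = "set"
print(status)

empty = ""
print(not empty, empty is None)  # An empty string is also "falsy", but it is not None
```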

# Compound Objects

## Lists

Lists are Python's way of grouping objects together - with lists we start to see some of the power of python as a dynamic language

Define a list using square brackets and commas to separate elements:

In [None]:
my_list = [2, 3, 5, 7, 11, 13]
print(my_list)

Lists are ordered and indexed from zero

Use the [] operator to access a specific list element

In [None]:
print(my_list)

In [None]:
my_list[2] # N.B. This is the third element!

If a negative index is given, the list is accessed counting from the right, with -1 as the last element

In [None]:
my_list[-1]

In [None]:
my_list[-3] # Third element from the end

In [None]:
len(my_list) # len() gives the total number of elements in the list

Lists are also mutable, you can change elements as you like:

In [None]:
my_list[0] = 42

In [None]:
my_list[-1] = "bicycle repair man"

In [None]:
print(my_list)

Add elements to a list using `append`:

In [None]:
my_list.append(True)

In [None]:
print(my_list)

And delete them with the `del` keyword:

In [None]:
del my_list[0]
print(my_list)

As you can see, Python is more than happy to have mixed object types in a list!

## List Slices

For extracting ranges out of lists, the slice syntax `[i:j]` gets the elements of the list from `i` up to **but not including** `j`


In [None]:
lst=list(range(10))
print(lst)

In [None]:
lst[1:3]

In [None]:
lst[5:-1] # Negative indexes act as before

In [None]:
lst[:4] # Missing the first index means "start at the beginning"

In [None]:
lst[7:] # Missing the last index means "stop at the end"

In [None]:
lst[0:7:2] # A third parameter is a "stride" value

In [None]:
lst[:] # What use is this...?

The answer is that slicing a list always returns a copy, so this made a *new (shallow) copy* of the list
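A minimal sketch of the difference between a second name for the same list and a slice copy (the variable names are illustrative):

```python
original = [1, 2, 3]
alias = original          # Just another name for the same list
duplicate = original[:]   # A new list holding the same elements

original[0] = 42
print(alias[0], duplicate[0])
print(duplicate is original)
```

Changing `original` shows through `alias`, but the slice copy keeps the old value.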

## Dictionaries

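Dictionaries are used to hold mappings from *keys* to *values* (since Python 3.7 they preserve insertion order, but entries are looked up by key rather than by position)

Python dictionaries can have pretty much anything for the values; keys are restricted to hashable objects, which in practice means immutable ones such as strings and numbers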

In [None]:
d={"straight" : "Graham Chapman",
   "curved" : "John Cleese",
   "drawn" : "Terry Gilliam",
   "mild" : "Michael Palin"}
print(d)

In [None]:
d["curved"] # Accessor uses [], like lists, but with the key and returns the value

In [None]:
d["extra"] = "Graeme Stewart" # Add or mutate values just by setting them
d["curved"] = "some other guy"
print(d["extra"], "and", d["curved"])

In [None]:
del d["extra"] # Use the del keyword to remove entries
"extra" in d   # This is the notation to ask if a certain key exists in the dictionary

## Container merging

We saw how to add single items to containers, but there are also useful methods that merge containers into one another

For lists, you can `extend` one list with another

In [None]:
lst_1=["cats", "lizards", "parrots"]; lst_2=["beetles", "worms", "spiders"]
lst_1.extend(lst_2)
print(lst_1)

For dictionaries use `update` (N.B. existing keys get overwritten)

In [None]:
art={"picasso": "Guernica", "blanchard": "Mujer con abanico", "miro": "Mai 1968"}
more_art={"macdonald": "A Paradox", "pollock": "Full Fathom Five", "miro": "Miss Chicago"}
art.update(more_art)
print(art, len(art))

## Tuples

As well as lists, Python supports *tuples*, which are like lists but *immutable*

Tuples are defined by using commas to separate the different items in the tuple sequence:

In [None]:
tup = (7, "bananas", True, None)
print(tup)
tup2 = "the", "parentheses", "are", "optional"
print(tup2)

Tuples can be assigned to separate variables like this:

In [None]:
a1, a2, a3, a4 = tup
print(a2)

This is a very common way to return multiple values from functions (you have to provide the same number of variables as the length of the tuple)
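As a small sketch of that pattern (the function `min_and_max` is made up for illustration, and functions are covered properly later on):

```python
def min_and_max(values):
    '''Return the smallest and largest item as a two value tuple'''
    return min(values), max(values)

lo, hi = min_and_max([7, 3, 11, 5])   # Two names on the left, two values in the tuple
print(lo, hi)
```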

## Other Container Types

Just to mention other containers that we didn't have time to look at here:

* `set` - mutable unordered container of distinct objects
* `frozenset` - as above, but immutable

And the `collections` module defines some other containers that can be useful, like ordered dictionaries
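For a first impression of these containers, here is a brief sketch using only builtins and the standard `collections` module (the contents are illustrative):

```python
from collections import OrderedDict

seen = set(["parrot", "cat", "parrot"])   # Duplicates collapse to a single entry
seen.add("lizard")
print(len(seen), "parrot" in seen)

fixed = frozenset(seen)                   # Same contents, but no add() or remove()
print(fixed == seen)

od = OrderedDict()
od["first"] = 1
od["second"] = 2
print(list(od.keys()))                    # Keys come back in insertion order
```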


# Control Flow

## Iterators and Loops

We met container types in the last section and very often we want to have an action performed repetitively on the contents of a container, or we want to loop over some other pieces of data.

In [None]:
lst_1=["cats", "lizards", "parrots"]
for animal in lst_1:
    print("Today I was bitten by", animal)

The Pythonic idiom here is very common: `for ITEM in COLLECTION`.

But in fact it would be better to describe what the `ITEM` runs over as an **iterator**. In Python an iterator is anything that can produce a sequence of values, one at a time; e.g., for a file the values are the lines of the file

In [None]:
macbeth =  open("src/macbeth.txt")
for line in macbeth:
    print(line, end="")
macbeth.close()
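Roughly speaking, a `for` loop asks the collection for an iterator with the builtin `iter()` and then calls `next()` on it repeatedly. A minimal sketch of doing that by hand (the variable names are illustrative):

```python
animals = ["cats", "lizards", "parrots"]
it = iter(animals)          # Ask the list for an iterator
first = next(it)            # Each next() call hands back the following item
second = next(it)
print(first, second)

rest = list(it)             # Whatever has not been consumed yet
print(rest)
```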

## Context Managers

Sometimes you need a resource only for a limited time, such as an open file. Context managers automate this and make sure the resource is released at the end of the block, no matter what (even if an exception is raised)

```python
with resource_allocation() as variable:
    # use resource
    variable.something()
# resource will be freed
```

In [None]:
with open("src/macbeth.txt") as macbeth:
    print(macbeth.read())
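To see what `with` saves us from writing, here is a sketch that compares it with an explicit `try ... finally`. It writes its own scratch file in a temporary directory (an assumption made so that the example does not depend on `src/macbeth.txt`):

```python
import os
import tempfile

# Create a small scratch file to work with
path = os.path.join(tempfile.mkdtemp(), "scratch.txt")
with open(path, "w") as out:
    out.write("when shall we three meet again?\n")

# The same guarantee written out by hand
handle = open(path)
try:
    text = handle.read()
finally:
    handle.close()          # Runs whether or not read() raised an exception

print(text, end="")
print(out.closed, handle.closed)
```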

For iterating over a list (or a file) what we iterate over is clear, but what about a dictionary?

The *default* iterator on a dictionary runs over the keys:

In [None]:
for k in art:
    print(k, "painted", art[k])

But there is also a `values` iterator and a (key, value) iterator, called `items`

In [None]:
for v in art.values():
    print(v.upper(), "is a great painting")

In [None]:
for k,v in art.items():    # The return value of each iteration is a two value tuple
    print(v.upper(), "is a great painting by", k)

# A Syntactic Excursion

Now that we touched on iterators, there's another thing we should highlight, *Python's indentation syntax* that marks out *code blocks*

Unlike other languages that might use some braces, `{` and `}`, to mark pieces of code which are in the same block, python uses indentation

Any lines of code that have the same indentation are in the same block

In [None]:
l = list()
for k in art:                    # Note the use of the ":" here, also used in control flow
    l.append(k)                  # This line is in the indented code block, so it's executed each time
    l.append(art[k].swapcase())  # So is this one
print(l)                         # This one is not, so the code block ends on the previous line, this is outside

Depending on your mood you can view this as a wonderful exercise in uncluttered efficiency or as a painful nightmare where it becomes really hard to work out which lines are in the same block


The [very strong advice](https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces) is to always use spaces, never tabs; use a good editor to help

## Conditional Control Flow

Python can execute code conditionally, using an `if ... elif ... else` syntax that will not really surprise you

In [None]:
for number in range(10):
    print("Oh,", number, "- ", end='')
    if number < 3:
        print("that's small")
    elif number < 7:
        print("that's medium")
    else:
        print("that's big")

Evidently this also shows how loops and control statements are naturally nested

## Ternary operator

Python has a compact version of `if ... then ... else ...` called a *ternary operator*

In Python this has a nice natural syntax

In [None]:
st = "it's the truth, Ruth" if len(art) == 5 else "it's a lie, Sky"
print(st)

## Loop Control

You can write a C-style conditional control loop in Python with `while (CONDITION) ...`

In [None]:
i=0
while (i<5):
    print(i)
    i+=1          # Note this nice syntax for adding to a number (it's the same as "i=i+1")
                  # Also supported are "-=", "*=", "/=" - they do what you would expect

## Better Loop Control

Usually a nicer way to get control in loops is to use the keywords `continue` and `break`:
* `continue` stops this iteration and jumps back to the start to get the next value
* `break` exits the loop immediately

In [None]:
words = ['bark', 'nothing', 'roll over', 'die', 'eat']
for cmd in words:
    if cmd == 'nothing':
        continue
    if cmd == 'die':
        break
    print(cmd)

Bonus feature: Python supports an `else` clause for loops. Any idea what this will do?

In [None]:
for cmd in words:
    if cmd == "die":
        break
else:
    print("didn't die")
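For reference, the `else` block runs only when the loop finishes without hitting `break`. A small sketch that tries both cases (the word lists are illustrative):

```python
outcomes = []
for words in (['bark', 'die', 'eat'], ['bark', 'eat']):
    for cmd in words:
        if cmd == 'die':
            outcomes.append("died")
            break
    else:
        # Only reached when the inner loop finished without a break
        outcomes.append("didn't die")
print(outcomes)
```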

# Comprehensions

Python has a rather lovely syntax for generating output lists and dictionaries from other iterables

It's very commonly used and replaces many things that would otherwise require short loops with a compact single line

In [None]:
[ x**2 for x in [1, 3, 5, 7, 11, 13, 17] ]

You can read this as `OUTPUT for ITEM in ITERABLE`, and enclosing it within the `[]`s lets Python know this is a *list comprehension*

In [None]:
[ x**2 for x in range(1,100) if x%10 == 0 ]

Above we also added a condition that selected only certain elements of the list

Dictionary comprehensions are very similar to those for lists, just that the output is specified as `key: value` and the syntax for a dictionary comprehension is an expression enclosed in `{}`s

In [None]:
{ x: x**2 for x in range(1,100) if x%10 == 0 }
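To make the saving concrete, here is the filtered list comprehension from above next to the loop it replaces (a sketch; the variable names are illustrative):

```python
squares = []
for x in range(1, 100):
    if x % 10 == 0:
        squares.append(x**2)

same = [x**2 for x in range(1, 100) if x % 10 == 0]
print(squares == same)
```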

# Functions

Now we know enough of the nuts and bolts of Python to start building some more interesting things

![Meccano toy](images/meccano.jpg)

Functions are how we start to encapsulate behaviour in our programs, so that tasks can be isolated from one another and different parts of the program don't interfere

Functions normally take some inputs and give back outputs, although skipping one or the other is quite common

In Python we define a function with the `def` keyword:

In [None]:
def double_and_more(i, j):
    '''A trivial function'''
    k = i*2
    k += j
    return k

In [None]:
help(double_and_more) # This is the same as double_and_more? in ipython

In [None]:
double_and_more(7, 5) # Call a function with its name, followed by (), with any arguments inside

* The arguments are given in parentheses after the name of the function
* The string immediately after the `def` is called the *docstring* and is printed when the user asks for help
  * Excepting trivial functions, do always write a docstring
* The `return` statement exits the function, returning any values given (can be as many as you like, as a tuple)
  * If there's no return value at the end of the function it implicitly returns `None`
* Variables defined in the scope of the function block are local and not visible outside of it (this is a *good thing*)

Parameters that get passed to a function in Python are *named* and it's usually clearer if the client calls them using that name, e.g.,

In [None]:
def maths_circus(num, message):
    '''A noisy cuber'''
    print("We are shouting, '", message, "', for you", sep="")
    n = num**3
    return n
maths_circus(num=-4, message="pancakes")

This also means that parameters can be given in any order...

In [None]:
maths_circus(message="I love clowns", num=9)

Parameters can also be given default values; they can then be skipped by the client unless they wish to override the default

In [None]:
def maths_circus(num=7, message="what have the Romans ever done for us?"):
    '''A noisy cuber, with defaults'''
    print("We are shouting, '", message, "', for you", sep="")
    n = num**3
    return n
maths_circus() # Both parameters skipped - use defaults

In [None]:
maths_circus(num=-2) # message skipped - use default

In [None]:
maths_circus(message="roads, vineculture, public baths, ...") # num skipped - use default

In [None]:
maths_circus(message="confuse a cat", num=-4) # Both parameters specified - defaults overridden

## Optional Arguments

Sometimes functions need to be able to take *arbitrary* numbers of arguments, which Python can allow using the `*args` and the `**kwargs` parameters

If a function defines these special argument types then

* `args` will be a tuple of all the extra positional arguments (in the order given)
* `kwargs` will be a dictionary of the extra named arguments, with the key being the name

In [None]:
def mill(debug, *args, mesg="starting", **kwargs):
    '''Process all arguments'''
    print("debug:", debug)
    print(mesg)
    print("These are the positional arguments", args)
    print("These are the named arguments", kwargs)

mill(True, 1, 2, 4, mesg="hello", alice="good", bob="good", eve="spy")

Do not use these argument types to be lazy - it can be very difficult to debug functions that support arbitrary arguments (e.g., misspelling an argument name is a bugbear here)
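A short sketch of what actually arrives in the two catch-all parameters, and of the matching `*` and `**` syntax for unpacking a sequence or dictionary at the call site (the function `show` and its arguments are made up for illustration):

```python
def show(*args, **kwargs):
    '''Report what arrived in each of the two catch-all parameters'''
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)

print(show(1, 2, 4, alice="good", eve="spy"))

numbers = [1, 2, 4]
options = {"alice": "good"}
print(show(*numbers, **options))   # Unpack a list and a dictionary into arguments
```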

# Keyword Only Arguments

Sometimes you want to force the caller to supply an argument by name to help avoid errors

In [None]:
def keyword_only(arg, *, keyword=None):
    print(arg, keyword)
    
keyword_only("something", keyword="else")

In [None]:
keyword_only("something", "else") # Raises a TypeError, as keyword must be given by name

# Python Scripts

So far we have worked in the Python interpreter

This is a fantastic way to explore python and work interactively, but in many cases we want to work in a *hands off* manner

In this case, we would rather save our work in a file and get the Python interpreter to execute it for us

In [None]:
!cat src/hello.py

In [None]:
!python src/hello.py

In [None]:
!ls -l src/hello.py

In [None]:
!./src/hello.py

```py
#!/usr/bin/env python
print("hello, world!")
```

* Execute the script directly with python by giving it as the argument, `python hello.py`
* On Linux / OS X we can
  * Use the magic shebang `#!` at the start of the file so that the loader invokes python for us
  * Use `/usr/bin/env python` so that the version of Python is found from `PATH`
  * The script also needs to be marked as executable: `chmod a+x hello.py`


## Passing arguments to scripts

Let's look at another version of our hello script:

In [None]:
!cat src/hello-args.py

In [None]:
!python src/hello-args.py --name Brian

In [None]:
!python src/hello-args.py

In [None]:
!python src/hello-args.py --help

# Python Modules

There was a lot there! The first thing in the script was to import a Python *module*: `import argparse`

Modules are the way that Python extends functionality - it's one of the huge advantages of Python that it has such a rich set of modules that provide well written and easy to use extensions to the core language

In this case we imported the `argparse` module, which is a standard Python module provided by all Python installations


In [None]:
import argparse
argparse?

Python modules usually provide well written interfaces with additional functionality - you might write your own parser for arguments passed in to your script, but making it robust and providing functionality like the `--help` option would take a lot of time
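The contents of `src/hello-args.py` are not reproduced here, so the following is only a guess at what a script with a `--name` option might look like. The default value, help texts and function names are all assumptions:

```python
import argparse

def build_parser():
    '''Create a parser with a single optional --name argument'''
    parser = argparse.ArgumentParser(description="Say hello to someone")
    parser.add_argument("--name", default="world", help="who to greet")
    return parser

def main(argv=None):
    # argv=None makes argparse read sys.argv; a list can be passed for testing
    args = build_parser().parse_args(argv)
    return "hello, " + args.name + "!"

print(main(["--name", "Brian"]))
print(main([]))
```

Note that the `--help` option comes for free: `argparse` generates it from the description and help strings.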

The Python documentation lists the many, many [modules that are available](https://docs.python.org/3/py-modindex.html) in every standard Python installation

In addition many other modules come packaged with, e.g., the [Anaconda Python distribution](https://www.anaconda.com/) or through the standard [*PyPI*](https://pypi.org/) (Python Package Index) repository, installed with `pip`

# Importing from modules

When we import a module, by default the module name is added to the namespace and the module's functions and other members (attributes) become available to us under that name

In [None]:
import os            # Import the os module (this is a really common one as it allows many core interactions with 
                     # the underlying system)
os.environ["PATH"]   # environ is a dictionary with the current environment set, and it's now in the os part of the namespace

However, we can also import pieces of a module directly into the top level of the namespace, or import a module or member with a different name

In [None]:
from sys import executable
print(executable)

In [None]:
import math as maths # The British would have called it maths...
maths.sqrt(9)

In [None]:
from math import pi as half_tau
tau = 2.0 * half_tau
print(tau)

It's possible to import everything from a module into the top namespace in Python, using `from module import *`; this is really dangerous as it becomes extremely hard to know how the namespace was populated. **Try to avoid this**

## Writing your own modules

Of course once you know modules can be written, you'd probably like to know how to do it yourself

In [None]:
# Note this - "!" is an ipython special that allows us to execute a shell command
!cat mymod.py 

In [None]:
import mymod
print(mymod.modvar)

In [None]:
mymod.modvar+=1
print(mymod.modvar)

This is pretty easy - any python file found in the current directory can be imported as a module, then it becomes available, using the filename as the namespace entry

Actually, the files don't need to live in the current directory, `$PYTHONPATH` gets searched (from the shell), or `sys.path` inside Python itself
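The search path can be inspected, and extended, from inside Python. In this sketch the extra directory is a made-up path used purely for illustration:

```python
import sys

print(sys.path[:3])                       # The first few directories that get searched

extra = "/some/hypothetical/module/dir"   # Made-up path, for illustration only
if extra not in sys.path:
    sys.path.append(extra)                # Modules in this directory become importable
print(extra in sys.path)
```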

# Classes

Classes are at the core of all object oriented programming languages, and Python is no exception

A **class** is a logical group of properties (variables and functions)
 
 * encapsulation: lets you define the behavior of certain objects and group everything logically
 * inheritance: classes can be based on other classes and just extend/modify the behavior


Python has a very natural way of defining and expressing classes - let's look at a simple example

In [None]:
class CounterClass:
    def __init__(self):
        self.counter = 0
        
    def add(self):
        self.counter += 1
        
    def reset(self):
        self.counter = 0
        
    def get(self):
        return self.counter
    
c = CounterClass()
c.add(); c.add()
print(c.get())

In [None]:
c.reset()
print(c.get())

Some of the key features to note:
```py
class CounterClass:
    def __init__(self):
        self.counter = 0
        
    def add(self):
        self.counter += 1
...
```

* The keyword `class` introduces a class definition in its following code block
  * The class will define a new type in the current scope
* Methods are defined much like functions, using `def`
  * The first parameter is the class instance itself, by convention always called `self`
* The special method `__init__` is called when an instance of the class is created (a.k.a. a constructor)
  * (BTW, there are lots of these [special `__FOO__` attributes](https://docs.python.org/3/reference/datamodel.html#special-method-names) in Python, e.g., `__del__` is your destructor)
* All data members of the class are referenced via the object instance, `self`
  * `self.counter` is a *data member* of the class
  * `counter` would be a plain local variable (watch out!)
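The "watch out!" is worth seeing once. In this sketch (the class `BrokenCounter` is made up for illustration) the `add` method forgets `self.`, so it only ever touches a local variable:

```python
class BrokenCounter:
    def __init__(self):
        self.counter = 0

    def add(self):
        counter = 1          # Oops: a local variable, discarded when add() returns

    def get(self):
        return self.counter

b = BrokenCounter()
b.add(); b.add()
print(b.get())               # The data member was never changed
```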

Just as an aside, when we say that Python is a dynamic language, it means that even classes can be modified dynamically:

In [None]:
def set(self, n):
    self.counter = n

# Add the "set" function as a new method to the class
CounterClass.set = set
c.set(7)
print(c.get())

In [None]:
CounterClass.msg = "we just added a new data member as well"
print(c.msg)

## Classes, Scopes and Namespaces

Python implements classes as a new data type, which means that they have their own scope and namespace

To find out what attributes are defined in a scope we can use the Python builtin `dir` function

In [None]:
dir(c)

## Subclasses and Inheritance

Python classes can also inherit from other classes, becoming subclasses - this allows objects which extend or specialize the classes that they inherit from in the usual object oriented way

In [None]:
class Poly2:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

class Rectangle(Poly2):
    def area(self):
        return self.x*self.y
        
class Triangle(Poly2):
    def area(self):
        return self.x*self.y/2.0

In [None]:
rect=Rectangle(3,5)
print(rect.area())

In [None]:
tri=Triangle(10,4)
print(tri.area())

In [None]:
picasso=Poly2()
'area' in dir(picasso) # This is a way to ask the object if it has an attribute of that name

So much for extending classes, we can override methods from the base classes as well:

In [None]:
class Square(Rectangle):
    def __init__(self, x=0):
        self.x = x
        
    def area(self):
        return self.x**2

sq=Square(4)
print(sq.area())

The way that Python searches for attributes in a derived class is to search the derived class first, then any parent classes, so the derived class's definition wins out

The derived class can call methods in the parent class - have a look at [`super()`](https://docs.python.org/3/library/functions.html#super)
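A minimal sketch of `super()` in action. The classes `Shape2` and `Square2` are stand-ins invented here so that the example does not clobber the classes defined above:

```python
class Shape2:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

class Square2(Shape2):
    def __init__(self, x=0):
        super().__init__(x, x)    # Let the parent set both sides to the same length

    def area(self):
        return self.x * self.y

sq2 = Square2(4)
print(sq2.area(), sq2.y)
```

Calling the parent's `__init__` this way means that every data member the parent normally sets up is also present on the derived object.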

## Class Introspection

One point that might be becoming clear to you now is that Python is quite happy to pass *any* objects into function calls or methods

If the passed object has the right properties to work with the call, it works; if not, then something will fail (this is known in the trade as *Duck Typing*)
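A small sketch of duck typing (all the class and function names here are made up for illustration). The function never checks the type of its argument, it just tries to use it:

```python
class Duck:
    def speak(self):
        return "Quack"

class Parrot:
    def speak(self):
        return "Voom?"

def interview(animal):
    # No type check: anything with a speak() method will do
    return animal.speak()

print(interview(Duck()), interview(Parrot()))

try:
    interview(42)               # An int has no speak() method
except AttributeError as err:
    print("not a duck:", err)
```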

Two useful functions can be used to inspect a class's provenance
* `isinstance(obj, classinfo)` returns `True` if the object is an instance of the classinfo class, or of a class derived from it
* `issubclass(cls, classinfo)` returns `True` if the class `cls` is a subclass of the classinfo class

In [None]:
print(isinstance(sq, Poly2))

In [None]:
print(issubclass(Triangle, Poly2))

In [None]:
print(issubclass(Triangle, Rectangle))

While we're on the subject, note that the builtin `type` function will return an object's type

In [None]:
print(type(Triangle), type("python"), type(7))

## Class data members

Data members of Python classes are pretty exposed - they can be accessed and *modified* by clients

This would usually be rather dangerous as it would be easy to violate an invariant of the class this way

In Python there is a convention that attributes (data members or methods) whose names start with an underscore (`_`) are not to be accessed directly by clients. This is only a convention, although members called `__name` are additionally mangled by Python to prevent accidents

In [None]:
class Rectangle2:
    def __init__(self, x=0, y=0):
        self._x = x
        self._y = y
    
    def area(self):
        return self._x*self._y

rec=Rectangle2(2,8)
print(rec.area())

In [None]:
# Although this still works
rec._x
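The name mangling for double underscore members can be seen in a short sketch (the class `DeadParrot` and its attributes are made up for illustration):

```python
class DeadParrot:
    def __init__(self):
        self._status = "resting"      # Convention only: please don't touch
        self.__secret = "pining"      # Stored under the name _DeadParrot__secret

p = DeadParrot()
print(p._status)
print(hasattr(p, "__secret"), hasattr(p, "_DeadParrot__secret"))
```

So mangling protects against accidental clashes, but it is not a security feature: the attribute is still reachable under its mangled name.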

### Getters and setters

You should usually then write getters and setters for your "public" data members

In [None]:
class Rectangle3:
    def __init__(self, x=0, y=0):
        self._x = x
        self._y = y
    
    def area(self):
        return self._x*self._y
    
    def x(self):
        return self._x
    
    def set_x(self, x):
        if x >= 0.0:
            self._x = x
    
rec=Rectangle3(2,4)
print(rec.area())
rec.set_x(9)
print(rec.x(), rec.area())

Hmmm, but that `()` syntax is a bit of ugly boilerplate, right?

## Decorators and Properties

Python has a very neat way to turn getters and setters into much more natural feeling *properties* like this

In [None]:
class Rectangle4:
    def __init__(self, x=0, y=0):
        self._x = x
        self._y = y
    
    @property
    def area(self):
        return self._x*self._y
    
    @property
    def x(self):
        return self._x
    
    @x.setter
    def x(self, value):
        if value >= 0.0:
            self._x = value
    
rec=Rectangle4(2,4)
print(rec.area, rec.x)
rec.x=9
rec.x=-5
print(rec.x, rec.area)

This is the recommended way of getting and setting data members: the getter and setter functions are wrapped up so that using them feels like natural attribute access

The syntax of `@property` is what's known as a *decorator* in Python - think of it like a way of wrapping up a function to change some of its exposed interfaces or behaviours
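To give a feel for what a decorator does, here is a minimal sketch of a hand-written one. The decorator `announce` is a hypothetical example, not part of Python; `functools.wraps` is from the standard library.

```python
import functools

def announce(func):
    """Hypothetical decorator: print the function name before each call."""
    @functools.wraps(func)        # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print("calling", func.__name__)
        return func(*args, **kwargs)
    return wrapper

@announce                         # same as writing: area = announce(area)
def area(x, y):
    return x * y

print(area(2, 4))
```

The `@` line is just shorthand for passing the function through `announce` and rebinding the name to whatever comes back.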

# Errors and Exceptions

So far all of the simple examples we looked at here have worked as expected - real life isn't like that and things are definitely going to go wrong

So how do we deal with errors in Python?

When something mis-fires in Python an *exception* is raised:

In [None]:
st="bring out your dead"
st.fnd("dead")

In [None]:
x=7; y=5
print(x/(y-5))

## Handling exceptions

The way to handle exceptions in Python is to use a `try... except...` block:

In [None]:
try:
    x=7; y=5
    print(x/(y-5))
except ZeroDivisionError:
    print("please don't divide by zero")

There can be multiple `except` blocks, for handling different errors that might happen:

In [None]:
try:
    f=open("/tmp-file.txt", "w")
    print("some text", file=f)
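except PermissionError:   # most specific first: PermissionError is a subclass of OSError
    print("Not allowed to open file")
except OSError:           # in Python 3 IOError is just an alias of OSError
    print("Other failure during file handling")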

If you want to handle multiple exceptions with one piece of code, list them in a tuple; you can then use `as` to bind whichever exception was raised to a local variable

In [None]:
try:
    f=open("/tmp-file.txt", "w")
    print("some text", file=f)
except (OSError, IOError) as e:
    print("Failure during file handling:", e)

Notice also that exceptions are printable, and provide some normally helpful text

Exception handlers are code blocks and `try` blocks can be nested inside one another (or inside functions called from a `try` block), so it's possible to structure your error handling and (possible) recovery in fairly sophisticated ways

In [None]:
def divider(x, y):
    try:
        a = x/y
    except ZeroDivisionError:
        a = 0
    return a

try:
    with open("tmp-file.txt", "w") as f:   # "with" closes (and flushes) the file for us
        print(divider(7,0), file=f)
except (OSError, IOError) as e:
    print("Failure during file handling:", e)

!cat tmp-file.txt

If your own code needs to generate an exception, use `raise`:

In [None]:
def buy(price, money):
    if money < price:
        raise RuntimeError("Not enough money to buy")
    return money - price   # hand the remaining money back to the caller

buy(100.0, 50.0)

If you are writing anything other than trivial code, you will want to define exception classes for your own program, which Python makes easy to do, as you can just inherit from the built-in `Exception` class

In [None]:
class TutorialException(Exception):
    pass # pass is a very handy bit of python syntax used for supporting an empty code block
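As a sketch of how such classes might be used, a small hierarchy lets callers catch either one specific error or everything your program raises. The names `TutorialError`, `NotEnoughMoneyError` and `buy_item` are hypothetical and chosen only for this example.

```python
class TutorialError(Exception):
    """Base class for all errors raised by this (hypothetical) program."""

class NotEnoughMoneyError(TutorialError):
    """Raised when a purchase cannot be paid for."""

def buy_item(price, money):
    if money < price:
        raise NotEnoughMoneyError(f"need {price}, but only have {money}")
    return money - price          # return the change to the caller

try:
    change = buy_item(100.0, 50.0)
except TutorialError as e:        # catching the base class also catches subclasses
    change = None
    print("Purchase failed:", e)
```

A docstring is enough to form the class body, so `pass` is not needed here.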

## LBYL and EAFP

Programming life could be divided into two strategies for dealing with errors

* *Look Before You Leap* - check that things are going to be ok first
* *Easier to Ask Forgiveness than Permission* - go for it and clean up if you need to

In [None]:
my_dict={"k": "v"}

# Easier to Ask Forgiveness than Permission
try:
    x = my_dict["key"]
except KeyError:
    pass # in real life handle missing key (EAFP)

# Look Before You Leap
if "key" in my_dict:
    x = my_dict["key"]
else:
    pass # in real life handle missing key (LBYL)

In general Python prefers EAFP - there are a few advantages (like avoiding some race conditions and avoiding double data accesses) and generally the code looks rather cleaner

However, don't get so carried away that you start to use exceptions as control flow (really, keep them for *exceptional* situations)
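The race condition point is easiest to see with files. In this sketch the path is an assumption: a file name inside a fresh temporary directory, so it is known not to exist.

```python
import os
import tempfile

# A path that is assumed not to exist, inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "missing.txt")

# LBYL: the file could appear or vanish between the check and the open
if os.path.exists(path):
    with open(path) as f:
        text = f.read()
else:
    text = ""

# EAFP: a single operation, so there is no gap between checking and acting
try:
    with open(path) as f:
        text = f.read()
except FileNotFoundError:
    text = ""

print(repr(text))
```

In the LBYL version another process could still delete the file after `os.path.exists` returns, so a robust program would need the `try` block anyway.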

## The Python Ecosystem

One important aspect of python is its very healthy ecosystem of packages for different tasks. 

* [numpy](http://www.numpy.org/) - The high performance core of all serious Python numerics
* [pandas](https://pandas.pydata.org/) - Python data analysis package for importing and working with bulk data
* [scipy](https://www.scipy.org/) - Lots of common scientific routines, such as minimization
* [xarray](http://xarray.pydata.org/en/stable/why-xarray.html#core-data-structures) - Pandas for multi-dimensional structures
* [matplotlib](https://matplotlib.org/) - The most popular Python plotting package
* [seaborn](https://seaborn.pydata.org/) - Statistical data visualization
* [scikit-learn](http://scikit-learn.org/stable/index.html) - Easy to use machine learning for Python
* ...

If you think you need it, someone probably already did it: **Use the tools**

# Numpy - Numerical computations for Python

Numpy offers a general library for numeric computations

In [None]:
import numpy as np
matrix = np.arange(100, dtype=np.float64).reshape(10,10)  # np.float was removed in NumPy 1.24
matrix

In [None]:
matrix.sum(axis=1)

In [None]:
matrix[2:5,4:7] = 0
matrix

# Pandas - Data Analysis Library

High level interface to bulk data

In [None]:
import pandas as pd
df = pd.read_csv("src/small-training.csv")

df[:10]

In [None]:
df[["PRI_jet_num", "PRI_jet_leading_pt", "PRI_jet_leading_phi"]].describe()

Of course pandas can also present the data visually using the `matplotlib` library

In [None]:
%matplotlib notebook
df[df["PRI_jet_num"]>0].plot.hexbin(x="PRI_jet_leading_eta", y="PRI_jet_leading_phi", gridsize=25)

In [None]:
df_mass_mmc = df[df["DER_mass_MMC"]>0]
df_mass_mmc.plot.scatter(x="DER_mass_MMC", y="DER_mass_vis")

# Seaborn - statistical visualization library

In [None]:
import seaborn as sns
sns.jointplot(data=df_mass_mmc, x="DER_mass_MMC", y="DER_mass_vis",  # recent seaborn needs x= and y= keywords
              kind="reg", line_kws={"color": "red"})

# A Few Last Python Pointers

Let's close this tutorial session going back to core Python and picking up on a few things that are rather important, but quite easy to overlook

Obviously it takes quite a lot of practice to get really comfortable in Python, but the following slides tell you about some key features that will save you a lot of pain

## Some notable useful loop utilities

We looked at loops and how Python happily will run the same code over every item it gets back from an iterator - in general you just don't need to care how far through the sequence you are

However, what if you *do* need to know this, e.g., something special needs to happen at the beginning or the end?

The solution is `enumerate`, which produces a counter that runs along with the loop

In [None]:
for i, animal in enumerate(["dog", "cat", "giraffe", "toad"]):
    print(animal, "was number", i, "in my list")

If you have multiple lists that you want to march over in sync, then use `zip` 

In [None]:
forename=["Michael", "Terry", "Graham", "John", "Eric"]
surname=["Palin", "Gilliam", "Chapman", "Cleese", "Idle"]
alive=[True, True, False, True, True]
for fn, sn, al in zip(forename, surname, alive):
    print(fn, sn, "is", "alive" if al else "dead")

Note the clever use of the ternary operator there!
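Two details of these utilities are worth a quick sketch: `enumerate` can start counting from a number other than zero, and `zip` stops at the shortest input. The lists here are made up for the illustration.

```python
animals = ["dog", "cat", "giraffe", "toad"]
legs = [4, 4, 4]                       # deliberately one item short

# enumerate can start counting from any number, here 1 instead of 0
numbered = list(enumerate(animals, start=1))
print(numbered[0], numbered[-1])

# zip stops as soon as the shortest input runs out, so "toad" is dropped
pairs = list(zip(animals, legs))
print(pairs)

# enumerate and zip combine; note the parentheses for unpacking the pair
for i, (animal, n) in enumerate(zip(animals, legs)):
    position = "last" if i == len(pairs) - 1 else "not last"
    print(i, animal, n, position)
```

If silently dropping items would be a bug in your code, check the lengths before zipping.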

## Parameter passing - bindings and object references

In Python the `=` operator does not copy objects, instead it makes a new binding to them; this means that the same object can have multiple bindings and if the object changes both bindings will reflect that:

In [None]:
my_list = list(range(5))
your_list = my_list
your_list[2] = "stuck in the middle"
print(my_list)

This behaviour also applies to function calls:

In [None]:
def sillyMiddle(l):
    l[len(l)//2] = "silly"
    
my_list = list(range(5))
your_list = my_list
sillyMiddle(your_list)
print(my_list, your_list) # Both refer to the same, modified object

But beware of a subtlety here...

In [None]:
def setToSeven(x, s, d):
    x=7                 # We just created a *local* variable x, bound to 7; but the caller binding remains unchanged
    s="inner"           # Ditto for the string s
    d["inner"] = True   # Here the object is updated, but the binding remains the same
    print("Set to", x)
    
x=5; s="outer"; d=dict()
print("Start", x, s, d)
setToSeven(x, s, d)
print("End", x, s, d)

Python passes arguments neither by C++ style *value* nor by *reference*, but by object. If a mutable object is modified in place within a function call, the caller will see the change (as both names are still bound to the same object). Assigning to a parameter name, however, only creates a new local binding - the outside binding remains unchanged. For immutable objects, like numbers, strings and tuples, rebinding is the only option, as they cannot be changed in place.
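The distinction between rebinding a name and mutating an object can be sketched with two small functions. The names `rebind` and `mutate` are hypothetical and only serve this example.

```python
def rebind(items):
    items = items + ["new"]   # builds a new list and rebinds the local name only
    return items

def mutate(items):
    items.append("new")       # changes the object that the caller also refers to

original = [1, 2, 3]
result = rebind(original)
print(original, result is original)   # original is untouched, result is a new object

mutate(original)
print(original)                       # now the caller's list has changed
```

Note that `rebind` leaves the caller's list alone even though lists are mutable: what matters is whether the function changes the object or just points its local name somewhere else.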

If you do need to really make a new copy of an object, use the *copy* module

In [None]:
from copy import deepcopy # Deep copy copies the container and copies all objects recursively
her_list = deepcopy(my_list)
her_list.append("this is the end")
print(my_list[-1], "--", her_list[-1])

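It's worth reminding you here that slicing a list (e.g. `my_list[:]`) does produce a new list, although it is a *shallow* copy that still shares the objects it contains - if you want to modify a list while looping over it, you'd better loop over such a copy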
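As a minimal sketch of both points, the first loop below iterates over a slice copy while removing items from the original, and the second part contrasts a slice copy with `deepcopy` for nested lists. The data is made up for the illustration.

```python
from copy import deepcopy

numbers = [1, 2, 3, 4, 5, 6]
for n in numbers[:]:          # iterate over a copy, so removing items is safe
    if n % 2 == 0:
        numbers.remove(n)
print(numbers)

nested = [[1, 2], [3, 4]]
shallow = nested[:]           # new outer list, but the inner lists are shared
deep = deepcopy(nested)       # the inner lists are copied as well
nested[0].append("changed")
print(shallow[0], deep[0])
```

Looping over `numbers` directly while removing from it would skip elements, because the list shrinks underneath the loop.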

# First class functions

One of Python's great features is that functions are *first class objects*, which means that they can be generated on the fly and returned from other functions:

In [None]:
def addText(s):
    def addSomeText(f):
        f=f+s
        return f
    return addSomeText

sw=addText("swallow")
print(type(sw))
sw("good lord it's an unladen ")

In [None]:
ne=addText("newt")
print(ne("she turned me into a "))
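Being first class also means functions can be passed *into* other functions as arguments. In this sketch `apply_twice` and `shout` are hypothetical helpers, while `sorted` and its `key` parameter are standard Python.

```python
def apply_twice(func, value):
    """Call func on value, then again on the result."""
    return func(func(value))

def shout(text):
    return text.upper() + "!"

print(apply_twice(shout, "ni"))

# Functions can also be handed to builtins, e.g. as a sort key
knights = ["Lancelot", "Robin", "Galahad", "Bedevere"]
print(sorted(knights, key=len))
```

Note that `shout` and `len` are passed without parentheses: we hand over the function object itself rather than the result of calling it.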

## A little bit more about printing and formatting...

We deliberately kept our use of the `print` function quite basic so far, although we did toss in a few of its extra parameters:
* `end` - string to print at the end of the output (default, `\n`)
* `sep` - separator between output elements (default a space)
* `file` - target output file stream (default `sys.stdout`)

To format things a little more easily with Python strings, you can use the [*f-string*](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) notation, which allows you to write variable names (or any expression) in `{}`s and they will be substituted in

In [None]:
bird="swallow"
state="unladen"
print(f"Look! It's an {state} {bird}") # N.B. String is prefixed with "f"

Numbers can take format specifiers to control how they are printed

In [None]:
import math
hx=0xdeadbeef
f1=math.pi
print(f"The number is {hx} (or {hx:#0x}); we love {math.pi}, or {math.pi:.3} if we are in a hurry")
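A few of the more commonly needed format specifiers are sketched below; the variable names and values are arbitrary examples.

```python
value = 3.14159265
count = 42
name = "newt"

print(f"{value:.2f}")            # fixed point, two digits after the decimal point
print(f"{value:10.3f}|")         # same idea, padded to a total width of 10
print(f"{count:05d}")            # integer padded with zeros to width 5
print(f"{count:#x}")             # hexadecimal with the 0x prefix
print(f"{name:>8}|{name:<8}|")   # right and left aligned in 8 characters
print(f"{value:.3e}")            # scientific notation
```

The part after the colon follows the same format specification mini-language that `str.format` uses.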

# Some standard module highlights

There is an absolute wealth of useful code inside Python's own [standard modules](https://docs.python.org/3/py-modindex.html); a few of the most important and useful modules are:

* `argparse` - for parsing script options passed on the command line
* `configparser` - for reading settings from standard INI files
* `datetime`, `time` - functions for date and time manipulation
* `fnmatch`, `glob` - shell style matching of files and strings
* `logging` - powerful utility for writing log messages from programs
* `math` - maths functions (though for large amounts of data use `numpy`!)
* `os` - operating system interfaces (including the filesystem)
* `re` - regular expressions
* `sys` - system parameters and functions
* `unittest` - xUnit testing framework for Python
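To give a flavour of how these modules fit together, here is a small sketch combining `argparse` and `logging`. The option names, the logger name and the argument list passed to `parse_args` are all assumptions made for this example; in a real script you would call `parse_args()` with no arguments to read the actual command line.

```python
import argparse
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("tutorial")   # hypothetical logger name

parser = argparse.ArgumentParser(description="Scale a list of numbers")
parser.add_argument("numbers", nargs="+", type=float, help="values to scale")
parser.add_argument("--factor", type=float, default=2.0, help="scale factor")

# Passing a list here stands in for the real command line (sys.argv[1:])
args = parser.parse_args(["1", "2.5", "--factor", "3"])

scaled = [n * args.factor for n in args.numbers]
log.info("scaled %d values by %s", len(scaled), args.factor)
print(scaled)
```

`argparse` also generates a `--help` message from the descriptions given above.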

## Python2 and Python3

* Python is currently finishing a major version transition, from 2 to 3
  * Python2 support [stops quite soon](https://pythonclock.org/), on 1 January 2020
* Almost every major useful Python module now runs in Python 3, so it's the recommended way to start any new project
* Python 3...
  * Introduces a new `print()` function instead of the old Python2 `print` *statement*
  * Division of integers (`3/2`) will return a float (use `3//2` if you want floor division, which rounds down to an int)
  * Strings in Python3 are all *unicode* and pure data should be stored in `bytes` or `bytearray`
  * `range` returns a lazy sequence rather than a list and there is no `xrange` (use `list(range(...))` to get a list if you need it)
  * Exceptions are raised more consistently (`raise IOError("disk drive on fire")`)
    * And handled more easily using `as` (`except NameError as err`)
  * Oh, and Python3 is often a lot faster as well
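Several of the Python 3 behaviours listed above can be checked directly in a short sketch. The example string and values are arbitrary.

```python
print(3 / 2, 3 // 2, -3 // 2)     # true division, then floor division (rounds down)

text = "caf\u00e9"                # str holds unicode characters ("café")
data = text.encode("utf-8")       # bytes holds raw data
print(len(text), len(data))       # 4 characters, but 5 bytes in UTF-8
print(data.decode("utf-8") == text)

r = range(5)                      # lazy: no list is built until asked for
print(r, list(r))

try:
    int("not a number")
except ValueError as err:         # "as" binds the exception object
    print("Caught:", err)
```

Note that floor division of a negative number rounds towards minus infinity, so `-3 // 2` is `-2` rather than `-1`.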

## Python3 and HEP

* However... although Python3 is now the standard, in the HEP community we are a bit behind
  * You may therefore find you have to use Python2 for some HEP use cases
  * In which case you should definitely take a look at the `__future__` module that can allow you to write Python2 code using a lot of Python3 syntax in advance
  * But there are a few things you just don't have - sorry, no f-strings!

```py
lxplus015:~$ python
Python 2.7.5 (default, Jul 13 2018, 13:06:57) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print("hello, world!") # Actually this is also ok in Python2.7
hello, world!
>>> 3/2
1
>>> from __future__ import division, print_function
>>> print(3/2)
1.5
```


# Final Words

In case you didn't yet realize it, **Python is pretty amazing**

It's a highly productive language that's easy to learn and opens a world of possibility for effective and efficient programming

This introduction was as much of a taster as could be managed in the time we had, but there are now many avenues that you could explore from here:

* The Python tutorial is a nice introduction that covers a lot of the basic ground for Python
* The LHCb *StarterKit* provides grounded training for physicists with nice exercises to do on the way
* The [HEP Software Foundation](https://hepsoftwarefoundation.org/) is compiling a list of [Python resources](https://github.com/hsf-training/PyHEP-resources) for our community

You can find many more resources by searching the internet and sites like *Stack Overflow* contain a wealth of answers to common problems

Last, but not least, your colleagues and fellow Pythonistas will be a source of help, advice, fixes and, if all else fails, solace

**Enjoy Python**

# Python IDEs

Of course you can develop all this in a normal text editor, but there are highly sophisticated integrated development environments out there

* [PyCharm](https://www.jetbrains.com/pycharm/)
* [Visual Studio Code](https://code.visualstudio.com/) (yes, also under Linux and on [GitHub](https://github.com/Microsoft/vscode))
* Eclipse + PyDev
* KDevelop
* Atom, Sublime, Emacs, Vim ...

Or with a data analysis focus:
* [Spyder](https://www.spyder-ide.org/) (comes with Anaconda)
* [Rodeo](https://rodeo.yhat.com/)
* [Jupyterlab](https://jupyterlab.readthedocs.io/en/stable/index.html)

# Backup

## Getting notebooks up and running

We pointed out some of the great features of notebooks at the start, here are some pointers...

* The [Project Jupyter website](https://jupyter.org/) (see [install](https://jupyter.org/install))

The easy ways to install are through the Anaconda python distribution or using pip

Then you can clone this lecture and start the notebook server...

```
git clone ...
jupyter notebook
[I 18:55:02.119 NotebookApp] Serving notebooks from local directory: /Users/graemes/docs
[I 18:55:02.120 NotebookApp] The Jupyter Notebook is running at:
[I 18:55:02.120 NotebookApp] http://localhost:8888/?token=bd9fb3599d7b4f7bf23a53efd8987cf3cc8dc1fe4d358eb6
```

...and navigate to the notebook link given (usually it starts automatically)

# Acknowledgements

Many thanks to all my colleagues who helped provide pieces of material that got sliced, diced and re-assembled into this tutorial

Particular thanks to 

* the LHCb StarterKit team (Arthur, Chris, Violeene, Dario in particular) who wrote a great "follow-along" tutorial
* Jim Pivarski who has a deep understanding of all things Python and how they fit for HEP
* Eduardo Rodrigues, who was the driving force behind our new HSF PyHEP group
