# Python for Data Science

## Lecture 5: Matplotlib

# The programming environment we use

### Diversity of programming
- Python
- Anaconda
- Jupyter

## Jupyter


- Jupyter - A web application to create documents which can contain live code, functions, graphs 
- The file extension of Jupyter notebooks is `.ipynb`
- Notebook files can be converted to many different formats (HTML, PDF, LaTeX ...)
- The contents are organized by cells

### Cell types


1. Code cells (Python/R/Lua/... code) 
2. Text cells
3. Markdown cells: text formatted by Markdown

### Code cells

In [1]:
print("Hello world")

Hello world


The value of the last expression in a cell is usually displayed when the cell is run.

In [2]:
2 + 3
3 + 4

7

This can even be a tuple consisting of multiple values as below:

In [3]:
2 + 3, 3 + 4, "hello " + "world"

(5, 7, 'hello world')

### Markdown cells

**This is how you make bold text**

*This is how you make italics*

| This  | Here|
| --- | ---  |
| is a | spreadsheet |

You can even use LaTeX:

$$
    \frac{1}{n}\sin x=    \frac{sinx}{n}=    \frac{sixn}{n}=    six\frac{n}{n}=six=6
$$




# How to use Jupyter

## Command mode and Edit mode

1. Command mode: commands are being run in the cells without altering their contents. 
  - Selected cells are marked with blue
2. Edit mode: modifying the contents of a given cell.
  - The selected cell is marked with green.

### Changing between modes

1. Esc: From edit to command
2. Enter or double click: from command to edit

## Running a cell in edit mode

1. Ctrl + Enter: run cell
2. Shift + Enter: run cell and go to next cell
3. Alt + Enter: run cell and insert new cell

### Useful shortcuts in command mode
- enter: edit cell
- a/b  : insert cell above or below the selected cell (note: a/b for above/below)
- m/y  : change cell type: Markdown/Code
- dd   : delete cell

# Cell magic

Special commands that modify the way a cell operates

In [4]:
%%time

for x in range(1000000):
    pass

CPU times: user 44.5 ms, sys: 68 µs, total: 44.6 ms
Wall time: 43.6 ms


In [5]:
%%timeit

x = 2

10.8 ns ± 0.0476 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)


In [6]:
%%writefile hello.py

print("Hello world")

Overwriting hello.py


In [7]:
!python hello.py

Hello world


To list all available magic commands:

In [8]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

In [9]:
help()



Welcome to Python 3.8's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at https://docs.python.org/3.8/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".

help> 

You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)".  Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.


# What happens in the background:

- Every notebook has its own dedicated _Kernel_ (Python interpreter)
  - The kernel can be interrupted or restarted from the menu (interrupt/restart) 
  - **Always** run the command `Kernel -> Restart & Run All` before you submit your task or homework so you can make sure everything works as you intended!
- All cells exist in a common namespace 
- Cells can be run in any order 
- If you reopen a notebook, by default, inputs and outputs are preserved, but objects are not!

In [10]:
print("This runs first")

This runs first


In [11]:
print("Then this. Take a look at the number to the left")

Then this. Take a look at the number to the left


In [12]:
a=2

`a` now contains the integer value 2.

In [13]:
a

2

`a` still contains the value 2.

## Cell inputs and outputs can be accessed later on.

In [14]:
42

42

In [15]:
_

42

To access the output right before the last output:

In [16]:
"first"

'first'

In [17]:
"second"

'second'

In [18]:
__

'first'

In [19]:
__

'second'

To access the 'before before' output:

In [20]:
___

'second'

You can access the n-th output via the variable `_n` (for example, `_14`). This is only defined if the n-th cell actually had an output.

Here is a method to list all available outputs. You will understand the code later on.

In [21]:
list(filter(lambda x: x.startswith('_') and 
            x[1:].isdigit(), globals()))

['_2', '_3', '_8', '_13', '_14', '_15', '_16', '_17', '_18', '_19', '_20']

## Inputs can be accessed in a similar way:

Input before:

In [22]:
_i

"list(filter(lambda x: x.startswith('_') and \n            x[1:].isdigit(), globals()))"

The n-th input:

In [23]:
_i2

'2 + 3\n3 + 4'

# The Python programming language:

## History of Python


- Python started as a Dutch programmer's hobby project (Guido van Rossum)
- Python 1.0  1994
- Python 2.0  2000
  - garbage collector
  - Unicode support
- Python 3.0  2008
  - Not compatible with versions before!
- Python2 End-of-Life (EOL): January 1, 2020 
  - Thus we only work with Python 3
  - Update SAGE!

## Python community and development

- Python Software Foundation nonprofit organization (Delaware, US)
- There is a strong community working on the development
- Huge standard library available
- Extensive system for third-party modules: PyPI (Python Package Index)
- pip installer (pip = pip installs packages)



In [24]:
import antigravity

## Use cases of Python
 - Web and Internet
 - Scientific calculations
 - Education

# Basic properties of Python

### Difference between interpreted and compiled languages: 

![interpreted](https://runestone.academy/runestone/books/published/thinkcspy/_images/interpret.png)
![compiled](https://runestone.academy/runestone/books/published/thinkcspy/_images/compile.png)

In reality, many languages (including Python) use a mixed strategy. We will call Python an interpreted language for now.

###### Interpreted
 - Fast coding
 - Easy fixing
 - Usually slower running
 - The code and the program are not separated

## Whitespaces

- Python uses indentation instead of braces to delimit blocks of code
- No semicolons are needed at the end of statements

In [25]:
n = 12
if n % 2 == 0:
    print("n is even")
else:
    print("n is odd")

n is even


##  Dynamically typed language

- Types are determined at runtime, not at compile time

In [26]:
n = 2
print(type(n))

n = 2.1
print(type(n))

n = "foo"
print(type(n))

<class 'int'>
<class 'float'>
<class 'str'>


In [27]:
# implicit conversion:
i = 2
f = 1.2
s = i + f
print(type(i), type(f))
print(type(i + f))
print(s,type(s))

<class 'int'> <class 'float'>
<class 'float'>
3.2 <class 'float'>


In [28]:
# explicit conversion
print("My IQ is " + str(3.1415) + " which is quite low")

My IQ is 3.1415 which is quite low


##  Assignment

There are some differences from other languages:

- In C++, `i = 2` means that the typed variable `i` gets a copy of the value 2.
- In Python, `i = 2` means that the name `i` gets a reference to a numeric object whose value is 2.

- The right-hand side is evaluated first, then the name on the left-hand side is bound to the resulting object.

The `id` function returns the unique identifier of the given object. (Why can this cause a problem?)

In [29]:
i = 2
print(id(i))

i = 3
print(id(i))

94236758584672
94236758584704


In [30]:
i = "elte"
print(id(i))

s = i
print(id(s) == id(i))

old_id = id(s)
s += "matek"
print(id(s) == id(i))
print(old_id == id(s))

140297461067568
True
False
False


In [31]:
a = 2
b = a
print(id(a) == id(b))
a += 1
print(id(a) == id(b))

True
False


In [32]:
a=25
b=25
a is b

True

In [33]:
a=300
b=300
a is b

False

In [34]:
25 is 25

  25 is 25


True
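
The `is` results above depend on an implementation detail: CPython caches small integers, so `is` may or may not return `True` for equal numbers. A minimal sketch of the safer habit, comparing values with `==` and keeping `is` for identity checks (the variable names are only illustrative):

```python
# Two separately built lists with equal contents.
first = [1, 2, 3]
second = [1, 2, 3]
alias = first                      # a second name for the same object

same_value = first == second       # compares contents
same_object = first is second      # compares identity
alias_is_first = alias is first    # both names refer to one object

print(same_value, same_object, alias_is_first)  # True False True
```

Unlike the integer examples, this result does not depend on caching, because every list literal builds a new object.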

# Simple commands

## if, elif, else

In [35]:
#n = int(input())
n = 12

if n < 0:
    print("N is negative")
elif n > 0:
    print("N is positive")
else:
    print("N is neither negative nor positive")

N is positive


## Conditional statements

- There are `if` one-liners
- The order of the operands differs from what we are used to in C. In C, the code would look like this:

~~~C
int x = -2;
int abs_x = x>=0 ? x : -x;
~~~
- This is recommended only for very short expressions.

In Python:

`<expression1> if <condition> else <expression2>`

In [36]:
n = -2
abs_n = n if n >= 0 else -n
abs_n

2

## Lists

- This is the most commonly used built-in data structure
- Operations: Indexing, Length, Appending
- To be discussed in detail later

In [37]:
l = []  # empty list
l.append(2)
l.append(2)
l.append("foo")

len(l), l

(3, [2, 2, 'foo'])

In [38]:
l[1] = "bar"
l.extend([-1, True])
len(l), l

(5, [2, 'bar', 'foo', -1, True])

## for, range

### Iterating through a list:

In [39]:
for e in ["foo", "bar"]:
    print(e)

foo
bar


### Iterating through an integer range:

This would look like the below in C++:
~~~C++
for (int i=0; i<5; i++)
    cout << i << endl;
~~~

Unless specified otherwise, a `range` starts at 0 (zero), and the end value is excluded!

In [40]:
for i in range(5):
    print(i)

0
1
2
3
4


We can specify the start value:

In [41]:
for i in range(2, 5):
    print(i)

2
3
4


We can also specify the step size. In this case, the start value must be given as well!

In [42]:
for i in range(0, 10, 2):
    print(i)

0
2
4
6
8
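
The three forms of `range` can be compared side by side by turning them into lists. This is a small sketch; the variable names are only illustrative:

```python
# range(start, stop, step): start is included, stop is excluded.
evens = list(range(0, 10, 2))        # [0, 2, 4, 6, 8]
countdown = list(range(5, 0, -1))    # [5, 4, 3, 2, 1]
empty = list(range(5, 0))            # [] because the default step is +1

print(evens, countdown, empty)
```

A negative step counts downwards; if the stop value cannot be reached in the direction of the step, the range is simply empty.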


## while

In [43]:
i = 0
while i < 5:
    print(i)
    i += 1
i

0
1
2
3
4


5

There is no `do...while` in Python.
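
A common way to get the same effect is an endless `while True` loop that tests the condition at the end of the body and leaves with `break` (covered in the next section). A minimal sketch, with illustrative names:

```python
# Emulating do...while: the body runs once before the condition is tested.
values = []
i = 10
while True:
    values.append(i)   # loop body
    i += 1
    if not i < 5:      # the "while" condition, checked at the end
        break

print(values)  # [10]
```

A plain `while i < 5` loop would not run at all here, because the condition is already false at the start.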

## break and continue

- `break`: exits the loop immediately
- `continue`: skips the rest of the current iteration and jumps to the next one

In [44]:
for i in range(10):
    if i % 2 == 0:
        continue
    print(i)

1
3
5
7
9


In [45]:
for i in range(10):
    if i > 4:
        break
    print(i)

0
1
2
3
4


# Functions

It is important to understand that functions in programming are not the same as mathematical functions. For example, the output of a function may depend on more than just its inputs.
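
As a small illustration, the sketch below returns different results for the same argument, because it also depends on state stored outside the function. The names `call_count` and `next_ticket` are made up for this example; the `def` syntax is explained in the next section.

```python
call_count = 0   # state that lives outside the function

def next_ticket(prefix):
    """Return a new ticket label; the result depends on how often we were called."""
    global call_count
    call_count += 1
    return prefix + str(call_count)

first = next_ticket("A")
second = next_ticket("A")
print(first, second)  # A1 A2
```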

## Definition of functions

We can define a function using the `def` keyword:

In [46]:
def foo():
    print("I am a very smart function")
     
foo()

I am a very smart function


## Function arguments and parameters

1. positional arguments
2. named or keyword arguments

In a call, positional arguments must come first, followed by the keyword arguments.

In [47]:
def foo(arg1, arg2, arg3):
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)
    
foo(1, 2, "asdfs")

arg1  1
arg2  2
arg3  asdfs


In [48]:
foo(1, arg3=2, arg2=29)

arg1  1
arg2  29
arg3  2


In [49]:
arg3=4

## Default arguments

- Arguments can have default values.
- You have to put the arguments without default values first.

In [50]:
def foo(arg1, arg2, arg3=3):
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)
foo(1, 2)

arg1  1
arg2  2
arg3  3


An argument with a default value does not have to be passed in the call.

In [51]:
foo(1, 2)

arg1  1
arg2  2
arg3  3


In [52]:
foo(arg1=1, arg3=33, arg2=222)

arg1  1
arg2  222
arg3  33


You can omit any arg with a default value when providing the inputs. 

In [53]:
def foo(arg1, arg2=2, arg3=3):
    print("arg1 ", arg1)
    print("arg2 ", arg2)
    print("arg3 ", arg3)
    
foo(11, 33)
print("")
foo(11, arg3=33)

arg1  11
arg2  33
arg3  3

arg1  11
arg2  2
arg3  33


This creates a possibility for you to have functions with a huge number of arguments, without making the function difficult to use. Such functions exist in many of the libraries.

As an example, here is a function from the library `pandas`:

~~~python
 pandas.read_csv(filepath_or_buffer, sep=', ', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=False, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=False, compact_ints=False, use_unsigned=False, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)
 ~~~

# Return value and the `return` command:

- A function can have more than one return value.
  - In this case, the values are returned as a tuple.
- If the function ends without reaching a `return` statement, the return value is automatically `None`.
- An empty `return` statement also returns `None`.

In [54]:
def foo(n):
    if n < 0:
        return "negative"
    if 0 <= n < 10:
        return "positive", n
    # return None
    # return

print(foo(-2))
print(foo(3), type(foo(3)))
print(foo(12))

negative
('positive', 3) <class 'tuple'>
None
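
Since multiple return values arrive as a tuple, they can be unpacked directly into separate names. A minimal sketch; the helper `min_max` is hypothetical and not part of the lecture code:

```python
def min_max(values):
    """Return the smallest and the largest item as a tuple."""
    return min(values), max(values)

result = min_max([3, -1, 2, 11])           # a single tuple
lowest, highest = min_max([3, -1, 2, 11])  # unpacked into two names

print(result)           # (-1, 11)
print(lowest, highest)  # -1 11
```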


In [55]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [56]:
import random
def make_maze(w = 16, h = 8):
    vis = [[0] * w + [1] for _ in range(h)] + [[1] * (w + 1)]
    ver = [["|  "] * w + ['|'] for _ in range(h)] + [[]]
    hor = [["+--"] * w + ['+'] for _ in range(h + 1)]
 
    def walk(x, y):
        vis[y][x] = 1
 
        d = [(x - 1, y), (x, y + 1), (x + 1, y), (x, y - 1)]
        random.shuffle(d)
        for (xx, yy) in d:
            if vis[yy][xx]: continue
            if xx == x: hor[max(y, yy)][x] = "+  "
            if yy == y: ver[y][max(x, xx)] = "   "
            walk(xx, yy)
 
    walk(random.randrange(w), random.randrange(h))
 
    s = ""
    for (a, b) in zip(hor, ver):
        s += ''.join(a + ['\n'] + b + ['\n'])
    return s 

In [57]:

print(make_maze())

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                 |     |        |        |     |
+  +--+--+  +--+  +  +--+  +--+  +  +--+  +--+  +
|  |     |     |     |     |     |     |        |
+  +  +  +--+  +--+  +--+--+  +--+  +  +--+--+  +
|     |     |  |  |     |  |  |     |  |        |
+--+  +--+--+  +  +--+  +  +  +  +--+  +  +--+--+
|     |        |           |  |     |  |     |  |
+--+--+  +--+--+--+--+--+--+  +--+--+  +--+  +  +
|        |                 |  |        |  |     |
+  +--+  +  +--+--+--+--+  +  +  +--+--+  +--+  +
|  |     |        |  |     |     |        |     |
+  +--+--+--+--+  +  +  +--+--+--+  +--+  +  +--+
|           |     |  |     |     |  |  |     |  |
+  +--+  +--+  +--+  +--+  +  +  +  +  +--+--+  +
|     |                 |     |                 |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+




### What does it mean when we call a function with a given set of arguments?
 - A local variable is created with the parameter's name and it will point to the argument.
 - The variable points at the original object (not just a copy)!
 - A function can read names defined outside of it, but assigning to a name inside the function creates a new local variable unless the name is declared `global`.

In [58]:
l=["maths"]
def add_maths(k):
    k.append("maths")

print(l)
add_maths(l)
print(l)
add_maths(l)
print(l)

['maths']
['maths', 'maths']
['maths', 'maths', 'maths']


In [59]:
s="maths"
def add_maths(k):
    k=k+"maths"

print(s)
add_maths(s)
print(s)
add_maths(s)
print(s)

maths
maths
maths


In [60]:
s="maths"
def add_maths():
    s=s+"maths"

print(s)
add_maths()
print(s)
add_maths()
print(s)

maths


UnboundLocalError: local variable 's' referenced before assignment
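
The error occurs because the assignment makes `s` a local name inside the function, so `s + "maths"` reads the local `s` before it has a value. One way around this, sketched below, is to pass the string in and return the new one (declaring `global s` would be another option). The function `read_global` is only added here for illustration.

```python
s = "maths"

def add_maths(text):
    """Return a new string instead of rebinding a name outside the function."""
    return text + "maths"

def read_global():
    """Reading a global name works; only assignment makes a name local."""
    return s.upper()

s = add_maths(s)      # rebind the global name explicitly at the call site
print(s)              # mathsmaths
print(read_global())  # MATHSMATHS
```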

### Boolean values

- Booleans can have two values: `True` or `False` (always capitalize the first letter!!!)

In [61]:
x = True
type(x)

bool

In [62]:
True and False

False

In [63]:
True or False

True

In [64]:
not True

False

In [65]:
True and True and False

False

## Numeric types

- Three types available: `int`, `float` and `complex`
- Object type depends on the initial value

In [66]:
i = 2
f = 1.2
c = 1+2j

type(i), type(f), type(c)

(int, float, complex)

- Implicit conversion works between them when doing arithmetical operations
- The result is always the data type which causes the least information loss

In [67]:
c2 = i + c
print(c2, type(c2))

(3+2j) <class 'complex'>


### Range of numeric values

- Integers can have as large values as your computer hardware permits (no limit is coming from Python's side)
- In Python 2, this held only for the separate `long` type; a plain `int` had a fixed size.

In [68]:
2**100000

9990020930143845079440327643300335909804291390541816917715292738631458324642573483274873313324496504031643944455558549300187996607656176562908471354247492875198889629873671093246350427373112479265800278531241088737085605287228390164568691026850675923517914697052857644696801524832345475543250292786520806957770971741102232042976351205330777996897925116619870771785775955521720081320295204617949229259295623920965797873558158667525495797313144806249260261837941305080582686031535134178739622834990886357758062104606636372130587795322344972010808486369541401835851359858035603574021872908155566580607186461268972839794621842267579349638893357247588761959137656762411125020708704870465179396398710109200363934745618090601613377898560296863598558024761448933047052222860131377095958357319485898496404572383875170702242332633436894423297381877733153286944217936125301907868903603663283161502726139934152804071171914923903341874935394455896301292197256417717233543544751552379310892268182402452755752094704

In [69]:
type(2**63 + 1)

int

## float

- Floats are based on C's `double` type, thus they have a limited accuracy
- Complex numbers use two floats
- The exact limits are available in `sys.float_info`.

In [70]:
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

In [71]:
sys.int_info

sys.int_info(bits_per_digit=30, sizeof_digit=4)

## Arithmetical operators

- Addition, subtraction and multiplication work as usual:

In [72]:
i = 2
f = 4.2
c = 4.1-3j

s1 = i + f
s2 = f - c
s3 = i * c
print(s1, type(s1))
print(s2, type(s2))
print(s3, type(s3))

6.2 <class 'float'>
(0.10000000000000053+3j) <class 'complex'>
(8.2-6j) <class 'complex'>


### Division versus floor division
- `/` is the true division operator, which returns a float even when both operands are integers
- `//` is the floor division operator, which rounds the quotient down (towards negative infinity)
- This works differently in Python 2!


In [73]:
3 / 2

1.5

In [74]:
-3.0 // 2, 3 // 2 

(-2.0, 1)

In [75]:
4 / 0.8, 4.0 / 0.8, 4 // 0.8, 4.0 // 0.8 

(5.0, 5.0, 4.0, 4.0)

How can you eliminate such problems?
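
One possible answer, sketched here under the assumption that exact results matter more than speed, is to avoid binary floats altogether and use the standard library's `fractions.Fraction` or `decimal.Decimal`:

```python
from fractions import Fraction
from decimal import Decimal

# The float 0.8 is slightly larger than 4/5, so the exact quotient is just below 5.
float_result = 4 // 0.8

# Exact rational and decimal arithmetic avoid the representation error.
fraction_result = Fraction(4) // Fraction(4, 5)
decimal_result = Decimal("4") // Decimal("0.8")

print(float_result, fraction_result, decimal_result)  # 4.0 5 5
```

Note that `Decimal` is built from the string `"0.8"`; building it from the float `0.8` would carry the representation error along.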

### Comparison operators

In [76]:
x = 23
x < 24, x >= 22

(True, True)

Comparisons can also be chained in a single expression:

In [77]:
23 < x < 100

False

In [78]:
23 <= x < 100

True

### Miscellaneous operators

#### Modulo

In [79]:
5 % 3

2

#### Exponentiation

In [80]:
2 ** 3

8

In [81]:
2 ** 0.5

1.4142135623730951

#### Absolute value

In [82]:
abs(-2 - 1j), abs(1+1j)

(2.23606797749979, 1.4142135623730951)

#### Rounding

In [83]:
round(2.3456), round(2.3456, 2), round(3.4999999999999)

(2, 2.35, 3)

#### Explicit conversions

In [84]:
float(2)

2.0

In [85]:
# int() truncates towards zero (it does not round down)
int(-2.7), int(2.7)

(-2, 2)
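
The different ways of turning a float into an integer are easy to mix up, so here is a short side-by-side sketch using only built-ins and the `math` module:

```python
import math

# int() drops the fractional part (truncation towards zero).
truncated = (int(-2.7), int(2.7))              # (-2, 2)

# math.floor() rounds towards negative infinity, math.ceil() towards positive infinity.
floored = (math.floor(-2.7), math.floor(2.7))  # (-3, 2)
ceiled = (math.ceil(-2.7), math.ceil(2.7))     # (-2, 3)

# round() rounds exact halves to the nearest even integer.
halves = (round(0.5), round(1.5), round(2.5))  # (0, 2, 2)

print(truncated, floored, ceiled, halves)
```

The four functions agree for most positive values and differ only for negative numbers and exact halves.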

### The modules `math` and `cmath` contain a variety of other functions:


In [86]:
import math

math.log(16), math.log(16, 2), math.exp(2), math.exp(math.log(10))

(2.772588722239781, 4.0, 7.38905609893065, 10.000000000000002)

## Mutable vs. immutable types

- The value of a mutable object can be changed in place
- Immutable objects have the same value throughout their lifecycle
- All numeric types are immutable

In [87]:
x = 2
old_id = id(x)
x += 1
print(id(x) == old_id)

False


In [88]:
for i in range(-10, 260):
    x = i
    y = i + 1 - 1
    if x is not y:
        print(i)

-10
-9
-8
-7
-6
257
258
259


### Mutable bool values?

They are unique immutable objects, only one of each exists:

In [89]:
x = True
y = False
print(x is y)
x = False
print(x is y)

False
True


### Mutable lists?

In [90]:
l1 = [0, 1]
old_id = id(l1)
l1.append(2)
old_id == id(l1)

True

Lists can be modified in place.

## Sequence types

- All sequence types (such as lists) support the following operations:

| operation | behaviour |
| :----- | :----- |
| `x in s` | True if x equals any of the items of s, else False |
| `x not in s` | False if x equals any of the items of s, else True |
| `s + t` | Concatenation of s and t |
| `s * n` or `n * s` | Equivalent to adding s to itself n times |
| `s[i]` | The i-th item of s, counting from 0 |
| `s[i:j]` | A slice of s from i to j |
| `s[i:j:k]` | A slice of s from i to j with step k |
| `len(s)` | Length of s |
| `min(s)` | The smallest item of s |
| `max(s)` | The largest item of s |
| `s.index(x[, i[, j]])` | Index of the first occurrence of x in s (at or after index i and before index j) |
| `s.count(x)` | The number of occurrences of x in s |

[Table source](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations)
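
To see that these operations really are shared, the sketch below applies a few of them to a list, a tuple and a string in the same loop. The sample values are arbitrary.

```python
# The same operations on a list, a tuple and a string.
for s in ([1, 2, 2, 3], (1, 2, 2, 3), "abba"):
    first = s[0]
    print(type(s).__name__, first in s, s[1:3], len(s), s.count(first), s.index(first))

doubled = [0, 1] * 2           # repetition
joined = (1, 2) + (3,)         # concatenation
every_second = "abcdef"[::2]   # slice with a step
print(doubled, joined, every_second)
```

Slicing and concatenation keep the type of the original sequence: a slice of a tuple is a tuple, and a slice of a string is a string.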

### List

- Mutable array 

In [91]:
l = [1, 2, 2, 3]
l

[1, 2, 2, 3]

In [92]:
l[1]

2

In [93]:
# l[4]  # raises IndexError

In [94]:
l[-1], l[len(l)-1]

(3, 3)

### append, insert, del

In [95]:
l = [1, 2, 3]
l.append(3)
l

[1, 2, 3, 3]

In [96]:
l = [1, 2, 3]
l.insert(1, 5)
l

[1, 5, 2, 3]

In [97]:
l= [1, 2, 5, 1]
del l[2]
l

[1, 2, 1]

### Indexing, ranges

In [98]:
l = []
for i in range(20):
    l.append(2*i + 1)
l[2:5]
l[10:]

[21, 23, 25, 27, 29, 31, 33, 35, 37, 39]

In [99]:
l[-4:]

[33, 35, 37, 39]

In [100]:
for i in range(10):
    print(l[i:i+3])

[1, 3, 5]
[3, 5, 7]
[5, 7, 9]
[7, 9, 11]
[9, 11, 13]
[11, 13, 15]
[13, 15, 17]
[15, 17, 19]
[17, 19, 21]
[19, 21, 23]


In [101]:
l[2:10:3] 

[5, 11, 17]

### What will be the result below?

In [102]:
l[::-1]

[39, 37, 35, 33, 31, 29, 27, 25, 23, 21, 19, 17, 15, 13, 11, 9, 7, 5, 3, 1]

The list is mutable, the items can be changed in place:

In [103]:
l = []
old_id = id(l)

for element in range(1, 3):
    l.append(element)
    print(id(l) == old_id)
l

True
True


[1, 2]

In [104]:
l[1] = 12

In [105]:
l = [1, 2]
l.extend([3, 4, 5])
len(l), l

(5, [1, 2, 3, 4, 5])

### The operation `=` sets a reference

- No new object is created

In [106]:
l2 = l
print(l is l2)

l2.append(42)
l

True


[1, 2, 3, 4, 5, 42]
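
If an independent list is needed instead of a second reference, the list has to be copied explicitly. A minimal sketch with illustrative names; note that `copy()` is shallow, so nested mutable items would still be shared:

```python
original = [1, 2, 3]
alias = original             # same object, no copy is made
copied = original.copy()     # new list with the same items (original[:] works too)

alias.append(42)

print(original)  # [1, 2, 3, 42]
print(copied)    # [1, 2, 3]
print(alias is original, copied is original)  # True False
```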

### The items do not have to be the same type

In [107]:
l = [1, -1, "foo", 2, "bar"]
l

[1, -1, 'foo', 2, 'bar']

### You can iterate through a list with a `for` loop

In [108]:
for element in l:
    print(element)

1
-1
foo
2
bar


### enumerate

If you need the index as well, the built-in `enumerate` function iterates through index-item pairs

In [109]:
for i, element in enumerate(l):
    print(i, element)

0 1
1 -1
2 foo
3 2
4 bar


### Sorting lists

- Lists can be sorted with the built-in `sorted` function

In [110]:
l = [3, -1, 2, 11]

for e in sorted(l):
    print(e)

-1
2
3
11


The `key` argument takes a function that computes the value each item is sorted by.

In [111]:
shopping_list = [
    ["apple", 5],
    ["pear", 2],
    ["milk", 1],
    ["bread", 3],
]

for product in sorted(shopping_list, key=lambda x: -x[1]):
    print(product)

['apple', 5]
['bread', 3]
['pear', 2]
['milk', 1]


You can use the `.sort()` method to sort the list in place.

In [112]:
l=[3, -1, 2, 11]
l.sort()
l

[-1, 2, 3, 11]

### The * operator
You can 'unpack' a list by using the `*` operator. The expression `*l` passes the items of the list `l` one by one, not as a single list object. This is particularly useful when a function expects a long list of parameters.

In [113]:
l=["apple","pear","spongecake"]
print(l)
print(*l)

['apple', 'pear', 'spongecake']
apple pear spongecake


In [114]:
def summation(x,y,z):
    return x+y+z
summation(*[10,20,30])

60

### tuple

- A tuple is an immutable array

In [115]:
t = ()  # empty tuple
print(type(t), len(t))

t = ([1, 2, 3], "foo")
type(t), len(t)

<class 'tuple'> 0


(tuple, 2)

In [116]:
t

([1, 2, 3], 'foo')

Indexing works the same as it works with lists.

In [117]:
t[1], t[-1]

('foo', 'foo')

A tuple holds immutable references; however, the objects it refers to can be mutable.

In [118]:
t = ([1, 2, 3], "foo")
#t[0]= "bar"  # this raises a TypeError

In [119]:
for e in t:
    print(id(e))
    
print("\nChanging an element of t[0]\n")
t[0][1] = 11

for e in t:
    print(id(e))

print("\n", t)

140297460741056
140297542829616

Changing an element of t[0]

140297460741056
140297542829616

 ([1, 11, 3], 'foo')


### dictionary

- built-in map type
- It maps keys to values

In [120]:
d = {}  # empty dictionary, equivalent: d = dict()
d["apple"] = 12
d["plum"] = 2
d

{'apple': 12, 'plum': 2}

Another way to create it:

In [121]:
d = {"apple": 12, "plum": 2}
d

{'apple': 12, 'plum': 2}

### Deleting a key

In [122]:
del d["apple"]
d

{'plum': 2}

### Iterating through a dictionary

- Keys and values can be iterated separately or together
- Iterating keys:

In [123]:
d = {"apple": 12, "plum": 2}
for key in d.keys():
    print(key, d[key])

apple 12
plum 2


Iterating values:

In [124]:
for value in d.values():
    print(value)

12
2


Iterating them together:

In [125]:
for key, value in d.items():
    print(key, value)

apple 12
plum 2


### What happens in the background?

-  A *hash table* is created
  - Every key gets a hash value, and equal objects have equal hash values. Lookups compare the hash values first instead of the objects themselves, which is faster.
  - The drawback is that keys must be hashable, so mutable built-in types cannot be used as keys!
  - However, keys can still be of different types.
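
The link between equality and hash values can be checked directly with the built-in `hash` function. A small sketch; the flag `list_is_hashable` is only an illustrative name:

```python
# Equal objects have equal hashes, even across numeric types.
print(1 == 1.0, hash(1) == hash(1.0))  # True True

d = {}
d[1] = "int key"
d[1.0] = "float key"   # 1.0 equals 1, so this overwrites the existing entry
print(d)               # {1: 'float key'}

# Built-in mutable containers refuse to be hashed.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)  # False
```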

In [126]:
d = {}
d[1] = "a"  # numeric types are immutable
d[3+2j] = "b"
d["c"] = 1.0
d

{1: 'a', (3+2j): 'b', 'c': 1.0}

- The tuple is immutable:

In [127]:
d[("apple", 1)] = -2
d

{1: 'a', (3+2j): 'b', 'c': 1.0, ('apple', 1): -2}

- But the list is mutable!

In [128]:
# d[["apple", 1]] = 12  # raises TypeError

### Q. Can these be dictionary keys?

In [129]:
key1 = (2, (3, 4))
key2 = (2, [], (3, 4))

d = {}
d[key1] = 1
d[key2] = 2
d

TypeError: unhashable type: 'list'

### Functions revisited. `args` and `kwargs`

- Both positional and keyword arguments can be collected, with the `*` and `**` operators respectively
- Positional arguments are stored in a tuple

In [130]:
def arbitrary_positional_f(*args):
    print(type(args))
    for arg in args:
        print(arg)
        
arbitrary_positional_f(1, 2, -1)
# arbitrary_positional_f(1, 2, arg=-1)  # raises TypeError

<class 'tuple'>
1
2
-1


- Keyword arguments are stored in a dictionary.

In [131]:
def arbitrary_keyword_f(**kwargs):
    print(type(kwargs))
    for argname, value in kwargs.items():
        print(argname, value)
        
arbitrary_keyword_f(arg1=1, arg2=12)
# arbitrary_keyword_f(12, arg=12)  # TypeError

<class 'dict'>
arg1 1
arg2 12


- Usually both are collected together:

In [132]:
def arbitrary_arg_f(*args, **kwargs):
    if args:
        print("Positional arguments")
        for arg in args:
            print(arg)
    else:
        print("No positional arguments")
    if kwargs:
        print("Keyword arguments")
        for argname, value in kwargs.items():
            print(argname, value)
    else:
        print("No keyword arguments")
        
arbitrary_arg_f()
arbitrary_arg_f(12, -2, param1="foo")

No positional arguments
No keyword arguments
Positional arguments
12
-2
Keyword arguments
param1 foo


# set

- Consists of unique, hashable items (thus you cannot have mutable types in a set)
- Basic set related operations can be used:

In [133]:
s = set()
s.add(2)
s.add(3)
s.add(2)
s

{2, 3}

In [134]:
s = {2, 3, 2}  # note: braces with key: value pairs, like {'a': 2}, create a dict
type(s), s

(set, {2, 3})

### deleting elements

In [135]:
s.add(2)
s.remove(2)
# s.remove(2)  # raises KeyError, since we already removed this element
s.discard(2)  # removes if present, does not raise exception

## frozenset

-  This is an immutable variant of the set.

In [136]:
fs = frozenset([1, 2])
# fs.add(2)  # raises AttributeError

In [137]:
fs = frozenset([1, 2])
s = {1, 2}

d = dict()
d[fs] = 1
# d[s] = 2  # raises TypeError
d

{frozenset({1, 2}): 1}

## set operations

 - Set operations can be written in two ways:
  1. Methods
  2. Operators

In [138]:
s1 = {1, 2, 3, 4, 5}
s2 = {2, 5, 6, 7}

s1 & s2  # s1.intersection(s2) or s2.intersection(s1)

{2, 5}

In [139]:
s1 | s2  # s1.union(s2) or s2.union(s1)

{1, 2, 3, 4, 5, 6, 7}

In [140]:
s1 - s2, s2 - s1  # s1.difference(s2), s2.difference(s1)

({1, 3, 4}, {6, 7})

- These operations return a new `set` 

In [141]:
s3 = s1 & s2
type(s3), id(s3) == id(s1), id(s3) == id(s2)

(set, False, False)

### `issubset` and `issuperset`

In [142]:
s1 < s2, s1.issubset(s2)

(False, False)

In [143]:
s1

{1, 2, 3, 4, 5}

In [144]:
{1, 2} < s1

True

In [145]:
s1.issuperset({1, 6})

False
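
Note that `<` tests for a *proper* subset, while `issubset` corresponds to `<=`. The two only differ when the sets are equal, as this short sketch shows:

```python
s1 = {1, 2, 3, 4, 5}

proper = {1, 2} < s1          # proper subset: subset and not equal
not_proper = s1 < s1          # a set is not a proper subset of itself
subset = s1 <= s1             # same as s1.issubset(s1)
superset = s1 >= {1, 2}       # same as s1.issuperset({1, 2})

print(proper, not_proper, subset, superset)  # True False True True
```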

## Use cases of sets

- You can remove the duplicates from a list by creating a set:

In [146]:
l = [1, 2, 3, -1, 1, 2, 1, 0]
uniq = set(l)
uniq

{-1, 0, 1, 2, 3}

- Membership tests on sets and dictionaries take O(1) time on average (they are fast).
- Membership tests on lists take O(n) time.

In [147]:
import random

letters = "abcdef"
word_len = [1, 2, 3, 4, 5]
N = 10000
samples = []

for i in range(N):
    word = []
    for j in range(random.choice(word_len)):
        word.append(random.choice(letters))
    samples.append("".join(word))
    
samples = list(samples)

In [148]:
word = []
for j in range(random.choice(word_len)):
    word.append(random.choice(letters))
word = "".join(word)
print(word)
word in samples

cceec


False

### list lookup

In [149]:
%%timeit

word = []
for j in range(random.choice(word_len)):
    word.append(random.choice(letters))
word = "".join(word)
word in samples

46.4 µs ± 743 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


### set lookup

In [150]:
samples_set = set(samples)
len(samples_set), len(samples)

(3067, 10000)

In [151]:
%%timeit

word = []
for j in range(random.choice(word_len)):
    word.append(random.choice(letters))
word = "".join(word)
word in samples_set

2.28 µs ± 78.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


### Functions part 3

### Mutable default arguments

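- Be careful: a default value is evaluated only once, when the function is defined, so a mutable default is shared between calls!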

In [152]:
def insert_value(value, l=[]):
    l.append(value)
    print(l)
    
l1 = []
insert_value(12, l1)
l2 = []
insert_value(14, l2)

[12]
[14]


In [153]:
insert_value(-1)

[-1]


In [154]:
insert_value(-3)

[-1, -3]
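
The reason is that the default list is created once, when the `def` statement runs, and is stored on the function object. This can be observed through the standard `__defaults__` attribute; the helper `append_to` below is a hypothetical stand-in for `insert_value`:

```python
def append_to(value, bucket=[]):   # the default list is built once, at definition time
    bucket.append(value)
    return bucket

before = list(append_to.__defaults__[0])  # snapshot of the stored default: []
append_to(1)
append_to(2)
after = append_to.__defaults__[0]         # the very same list, now modified

print(before, after)  # [] [1, 2]
```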


- Best practice is to avoid these!

A better solution:

In [155]:
def insert_value(value, l=None):
    if l is None:
        l = []
    l.append(value)
    return l

l = insert_value(2)
l

[2]

In [156]:
insert_value(12)

[12]

### Lambda functions

- This is an unnamed (anonymous) function
- Can have parameters
- Its body is a single expression, whose value is returned

In [157]:
l = [-1, 0, -10, 2, 3]

If we want to sort this by absolute value, the built-in `sorted` function cannot do that by default.

In [158]:
for e in sorted(l):
    print(e)

-10
-1
0
2
3


But we can provide a `key` for the function:

In [159]:
for e in sorted(l, key=lambda x: abs(x)):
    print(e)

0
-1
2
3
-10


You can use any suitable function though:

In [160]:
for e in sorted(l, key=abs):
    print(e)

0
-1
2
3
-10


### Strings

- Strings are **immutable** sequences of Unicode characters.

In [161]:
single = 'ab\'c'
double = "ab\"c"
multiline = """
sdfajfklasj;
sdfsdfs
sdfsdf
"""
single == double

False

- Immutable, so it cannot be changed:

In [162]:
s = "abc"

# s[1] = "c"  # TypeError

- Every string operation creates a new object.

In [163]:
print(id(s))
s += "def"
id(s), s

140297548797296


(140297459887472, 'abcdef')

- Most functions that can be used with lists, can also be used with strings
  - Example: indexing

In [164]:
s = "abcdefghijkl"
s[::2]

'acegik'

In [167]:
s = "abc"
print(type(s))

with open("file.txt", "w") as f:
    f.write(s)
    f.write("\n")

<class 'str'>


In [168]:
with open("file.txt") as f:
    text = f.read().strip()
    
print(text)
type(text)

abc


str

## String operations

- Many methods are available:

In [169]:
"abC".upper(), "ABC".lower(), "abc".title()

('ABC', 'abc', 'Abc')

In [170]:
s = "\tabc  \n"
print("<START>" + s + "<STOP>")

<START>	abc  
<STOP>


In [171]:
s.strip()

'abc'

In [172]:
s.rstrip()

'\tabc'

In [173]:
s.lstrip()

'abc  \n'

In [174]:
"abca".strip("a")

'bc'

Since every method returns a new string, the calls can be chained together:

In [175]:
" abcd abc".strip().rstrip("c").lstrip("ab")

'cd ab'

### True-false questions

In [176]:
"abc".startswith("ab"), "abc".endswith("cd")

(True, False)

In [177]:
"abc".istitle(), "Abc".istitle()

(False, True)

In [178]:
"  \t\n".isspace()

True

In [179]:
"989".isdigit(), "1.5".isdigit()

(True, False)

### split and join

In [180]:
s = "the quick brown fox jumps over the lazy dog"
words = s.split()
words

['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [181]:
s = "R.E.M."
s.split(".")

['R', 'E', 'M', '']

In [182]:
"-".join(words)

'the-quick-brown-fox-jumps-over-the-lazy-dog'