# Introduction to Python & Jupyter

What this notebook covers:
1. Using the Notebook
2. Variables and types
3. Built-in Functions, help, and libraries
4. Collections (list, tuple, set, dictionary)
5. Conditions and If statements
6. Loops
7. Functions, lambdas, map/filter/reduce
8. Classes and Objects
9. Exception Handling
10. JSON
11. File Handling
12. Jupyter architecture

# Using the Notebook

## Modes

* If you press [`Esc`] and [`Return`] alternately, the outer border of your code cell will change from gray/blue to green.

* These are the **Command** (gray) and **Edit** (green) modes of your notebook.

* In Command mode, pressing the [`H`] key will provide a list of all the shortcut keys.

* Command mode allows you to edit notebook-level features, and Edit mode changes the content of cells.

* When in Command mode (esc/gray):
    * The [`B`] key will make a new cell below the currently selected cell.
    * The [`A`] key will make one above.
    * The [`X`] key will delete the current cell.
    * The [`Z`] key will undo your last cell deletion.

* All actions can be done using the menus, but there are lots of keyboard shortcuts to speed things up.
* If you remember the [`Esc`] and [`H`] shortcut, you will be able to find out all the rest.

* Pressing the [`Return`] key turns the border green and engages **Edit** mode, which allows you to type within the cell.

* Pressing [`Shift+Return`] together will execute the contents of the cell.

## Markdown

Notebooks can also render Markdown.

* Turn the current cell into a Markdown cell by entering **Command** mode ([`Esc`]/gray) and pressing the [`M`] key.
`In [ ]:` will disappear to show it is no longer a code cell, and you will be able to write in Markdown.

* Turn the current cell into a Code cell by entering **Command** mode ([`Esc`]/gray) and pressing the [`Y`] key.

# Variables and types




## The runtime environment
The runtime environment is essentially the IPython kernel, a REPL (read-eval-print loop).

**The order in which cells are executed matters, not the order in which they are rendered.** When you run all cells, they run from top to bottom, but when you run cells by hand, a variable holds whatever the most recently executed cell assigned to it. Let's have a look at the following example.

In [3]:
foo = [1,2,3]

In [4]:
print(foo)

[1, 2, 3]


In [5]:
foo = "I'm not [1,2,3]"
dir()

['In',
 'Out',
 '_',
 '_2',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '__vsc_ipynb_file__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'foo',
 'get_ipython',
 'open',
 'quit']

To find out which variables exist in your environment, you can use the following Python built-in functions:
- `dir()`
- `locals()`
- `globals()`

To get help about a function or type you can use `?`, for example `?dict`.

## Types

* Use the built-in function `type` to find out what type a value has

In [6]:
a = 10
type(a)

int

Numbers must be converted to strings (or vice versa) before you combine them in one operation.

`1 + '2'` would give

`TypeError: unsupported operand type(s) for +: 'int' and 'str'`

This operation is not allowed because it is ambiguous: should the result be `3` or `'12'`?

Some types can be converted to other types by using the type name as a function.

In [7]:
print(1 + int('2'))
print(str(1) + '2')

3
12


# Built-in Functions and Help

* len, print, max, min, round
* int, str, and float create a new value from an existing one

* Use the built-in function `help` to get help for a function. Every built-in function has online documentation.

In [10]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



The Jupyter Notebook has 3 ways to get help:

* Place the cursor inside the parentheses of the function, hold down [`Shift`], and press [`Tab`].
* Use the function `help`.
* Use the question mark `?` to access the docstring for quick reference on syntax.

In [8]:
str.replace?

Signature: str.replace(self, old, new, count=-1, /)
Docstring:
Return a copy with all occurrences of substring old replaced by new.

  count
    Maximum number of occurrences to replace.
    -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are
replaced.
Type:      method_descriptor

In [9]:
help(str.replace)

Help on method_descriptor:

replace(self, old, new, count=-1, /)
    Return a copy with all occurrences of substring old replaced by new.

      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.

    If the optional argument count is given, only the first count occurrences are
    replaced.



# Importing libraries

* Use import to load a library module into a program’s memory.
* Then refer to things from the module as module_name.thing_name.

* Use from ... import ... to load only specific items from a library module.

In [13]:
from math import cos, pi

print('cos(pi) is', cos(pi))

cos(pi) is -1.0


In [14]:
import math

print('pi is', math.pi)

pi is 3.141592653589793


## Aliases

* Use import ... as ... to give a library a short alias while importing it.
* Then refer to items in the library using that shortened name.

In [15]:
import math as m

print('cos(pi) is', m.cos(m.pi))

cos(pi) is -1.0


# Collections

There are four collection data types in the Python programming language:

1. List is a collection which is ordered and changeable. Allows duplicate members.
2. Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
3. Set is a collection which is unordered and unindexed. No duplicate members.
4. Dictionary is a collection which is ordered (insertion order is preserved since Python 3.7), changeable and indexed by key. No duplicate keys.

# List

A list is a collection which is ordered and changeable. In Python lists are written with square brackets.

In [16]:
thislist = ["apple", "banana", "cherry"]
print(thislist)

['apple', 'banana', 'cherry']


## Access Items

You access the list items by referring to the index number:

In [17]:
thislist = ["apple", "banana", "cherry"]
print(thislist[1])

banana


## Change Item Value

To change the value of a specific item, refer to the index number:

In [18]:
thislist[0] = "not an apple"
print(thislist)

['not an apple', 'banana', 'cherry']


## Loop Through a List

You can loop through the list items by using a for loop:

In [19]:
thislist = ["apple", "banana", "cherry"]
for x in thislist:
    print(x)

apple
banana
cherry


## Check if Item Exists
To determine if a specified item is present in a list use the in keyword:

In [20]:
thislist = ["apple", "banana", "cherry"]
if "apple" in thislist:
    print("Yes, 'apple' is in the fruits list")

Yes, 'apple' is in the fruits list


## List Length
To determine how many items a list has, use the built-in len() function, for example `len(thislist)`.
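
The original notebook gives no cell for this step, so here is a minimal sketch reusing the `thislist` name from the earlier cells (`n_items` is a name chosen only for this illustration):

```python
thislist = ["apple", "banana", "cherry"]

# len() is a built-in function that takes the list as its argument
n_items = len(thislist)
print(n_items)
```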

## Add/remove Items
To add an item to the end of the list, use the append() method.

To add an item at the specified index, use the insert() method:

There are several ways to remove items from a list:
- The remove() method removes the first occurrence of the specified item.
- The pop() method removes the item at the specified index (or the last item if no index is specified) and returns it.
- The del keyword removes the item at the specified index.

In [21]:
thislist = ["apple", "banana", "cherry"]
thislist.append("orange")
print(thislist)

['apple', 'banana', 'cherry', 'orange']


In [22]:
thislist = ["apple", "banana", "cherry"]
thislist.insert(1, "orange")
print(thislist)

['apple', 'orange', 'banana', 'cherry']


In [23]:
thislist = ["apple", "banana", "cherry", "banana"]
thislist.remove("banana")
print(thislist)

['apple', 'cherry', 'banana']


In [24]:
thislist = ["apple", "banana", "cherry"]
print(thislist.pop())
print(thislist)

cherry
['apple', 'banana']


In [25]:
thislist = ["apple", "banana", "cherry"]
del thislist[0]
print(thislist)

['banana', 'cherry']


## The list() Constructor
It is also possible to use the list() constructor to make a list.

In [26]:
thislist = list(("apple", "banana", "cherry")) # note the double round-brackets
print(thislist)

['apple', 'banana', 'cherry']


## Lists are heterogeneous

Lists may contain values of different types.

A single list may contain numbers, strings, and anything else.

In [27]:
goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']
goals

[1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

## Character strings

Character strings can be indexed like lists.

Get single characters from a character string using indexes in square brackets.

In [28]:
element = 'carbon'
print('zeroth character:', element[0])
print('third character:', element[3])

zeroth character: c
third character: b


### Character strings are immutable.

The characters in a string cannot be changed after it has been created: strings are immutable.
In contrast, lists are mutable: they can be modified in place.
Python considers the string to be a single value with parts, not a collection of values.

In [29]:
element[0] = 'C'

TypeError: 'str' object does not support item assignment

## Slicing

A slice is a part of a string (or, more generally, any list-like thing).

We take a slice by using `[start:stop]`, where start is replaced with the index of the first element we want and stop is replaced with the index of the element just after the last element we want.
Mathematically, you might say that a slice selects the half-open interval `[start, stop)`.

Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.

In [30]:
string = '0123456789'
print(string[2:6])
print(string[2:])
print(string[:6])
print(string[:])
print(string)

2345
23456789
012345
0123456789
0123456789
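
The same copy behaviour can be observed with a list, where it is easier to see because lists are mutable. This is only an illustrative sketch; the names `numbers` and `part` are not from the original notebook:

```python
numbers = [0, 1, 2, 3, 4, 5]

part = numbers[2:5]   # new list holding the items at indexes 2, 3 and 4
part[0] = 99          # modifies the copy only

print(part)
print(numbers)        # the original list is unchanged
```

Note that the copy is shallow: the new list refers to the same item objects, which only matters when the items themselves are mutable.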


The more general syntax is `[start:stop:step]`.

For example, to reverse a list you can use `[::-1]`.

Using `-n` for start or stop means the nth position counting from the end.

In [31]:
l = list(range(1,10))
print(l)

# reverse
print(l[::-1])

print(l[2:8:2])

print(l[7::-2])

# trimming first and last
print(l[1:-1:])

assert l[7::-1] == l[-2::-1], "Lists are different?!?!"
print(l[-2::-1])

assert l[-2:1:-1] == l[7:-8:-1], "Lists are different?!?!"
print(l[-2:1:-1])

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[9, 8, 7, 6, 5, 4, 3, 2, 1]
[3, 5, 7]
[8, 6, 4, 2]
[2, 3, 4, 5, 6, 7, 8]
[8, 7, 6, 5, 4, 3, 2, 1]
[8, 7, 6, 5, 4, 3]


# Tuple

A tuple is a collection which is ordered and unchangeable. In Python tuples are written with round brackets.

In [32]:
thistuple = ("apple", "banana", "cherry")
print(thistuple)

('apple', 'banana', 'cherry')


## Access Tuple Items

You can access tuple items by referring to the index number, inside square brackets:

In [33]:
thistuple = ("apple", "banana", "cherry")
print(thistuple[1])

banana


## Change Tuple Values

Once a tuple is created, you cannot change its values. Tuples are unchangeable.

In [34]:
thistuple = ("apple", "banana", "cherry")
thistuple[1] = "blackcurrant"

TypeError: 'tuple' object does not support item assignment

## Loop Through a Tuple

You can loop through the tuple items by using a for loop.

In [35]:
thistuple = ("apple", "banana", "cherry")
for x in thistuple:
    print(x)

apple
banana
cherry


## Check if Item Exists

To determine if a specified item is present in a tuple use the in keyword:

In [36]:
thistuple = ("apple", "banana", "cherry")
if "apple" in thistuple:
    print("Yes, 'apple' is in the fruits tuple")

Yes, 'apple' is in the fruits tuple


## Tuple Length

To determine how many items a tuple has, use the len() function:

In [37]:
thistuple = ("apple", "banana", "cherry")
print(len(thistuple))

3


## Add/remove items

Once a tuple is created, you cannot add items to it. Tuples are unchangeable.

For the same reason, you cannot remove items from a tuple.

In [38]:
thistuple = ("apple", "banana", "cherry")
del thistuple[0]

TypeError: 'tuple' object doesn't support item deletion

## The tuple() Constructor

It is also possible to use the tuple() constructor to make a tuple.

In [39]:
thistuple = tuple(("apple", "banana", "cherry")) # note the double round-brackets
print(thistuple)


('apple', 'banana', 'cherry')


## List vs Tuple memory footprint

Tuples need less space because they are of fixed size. Lists have one level of indirection more to add new elements. Lists also overallocate to avoid re-allocating all the time. For this reason, they also need to keep track of the filled and allocated size. 

![list](list.png)

In [40]:
from sys import getsizeof

print(getsizeof("a"))
print(getsizeof(1))

print(getsizeof(["a"]))
print(getsizeof([1]))
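print(getsizeof(("a")))  # note: ("a") is just the string "a"; a one-item tuple is ("a",)
print(getsizeof((1)))    # likewise, (1) is the int 1, not a tuple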

print(getsizeof(("a","b","c")))
print(getsizeof((1,2,3)))
print(getsizeof(["a","b","c"]))
print(getsizeof([1,2,3]))

print(getsizeof(()))
print(getsizeof([]))

50
28
64
64
50
28
64
64
80
80
40
56
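
Parentheses alone do not create a tuple; the trailing comma does. The sketch below shows the difference (the variable names are chosen for this illustration, and since sizes depend on the Python implementation, only a comparison is printed rather than absolute numbers):

```python
from sys import getsizeof

not_a_tuple = ("a")    # parentheses only group: this is the string "a"
one_tuple = ("a",)     # the trailing comma makes a one-item tuple

print(type(not_a_tuple), type(one_tuple))

# Compare the one-item tuple with the one-item list instead of
# relying on exact byte counts, which vary between Python builds
print(getsizeof(one_tuple) < getsizeof(["a"]))
```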


### Overallocation

In [41]:
a = []
print(getsizeof(a))
a.append(1)
print(getsizeof(a))
a.append(2)
print(getsizeof(a))
a.append(3)
print(getsizeof(a))

b = [1,2,3]
print(getsizeof(b))
b.append(4)
print(getsizeof(b))

print(getsizeof(list(a)))

56
88
88
88
80
112
80


# Set

A set is a collection which is unordered and unindexed. In Python sets are written with curly brackets.

In [42]:
thisset = {"apple", "banana", "cherry"}
print(thisset)

{'banana', 'apple', 'cherry'}


## Access Items

You cannot access items in a set by referring to an index: since sets are unordered, the items have no index.

But you can loop through the set items using a for loop, or ask if a specified value is present in a set, by using the in keyword.

In [43]:
thisset = {"apple", "banana", "cherry"}

for x in thisset:
    print(x)


banana
apple
cherry


In [44]:
thisset = {"apple", "banana", "cherry"}

print("banana" in thisset)

True


## Change/add/remove Items

Once a set is created, you cannot change its items, but you can add new items.

To add one item to a set use the add() method.

To add more than one item to a set use the update() method.

To remove an item from a set, use the remove() or the discard() method.

You can also use the pop() method to remove an item, but sets are unordered, so pop() removes an arbitrary item and you cannot know in advance which one.
The return value of the pop() method is the removed item.

The clear() method empties the set.

In [45]:
thisset = {"apple", "banana", "cherry"}
thisset.add("orange")
print(thisset)

{'banana', 'apple', 'orange', 'cherry'}


In [46]:
thisset = {"apple", "banana", "cherry"}
thisset.update(["orange", "mango", "grapes"])
print(thisset)

{'banana', 'orange', 'grapes', 'cherry', 'mango', 'apple'}


In [47]:
thisset = {"apple", "banana", "cherry"}
thisset.remove("banana")
print(thisset)

{'apple', 'cherry'}


In [48]:
thisset = {"apple", "banana", "cherry"}
# Note: If the item to remove does not exist, remove() will raise an error.
thisset.remove("orange")

KeyError: 'orange'

In [49]:
thisset = {"apple", "banana", "cherry"}
# Note: If the item to remove does not exist, discard() will NOT raise an error.
thisset.discard("orange")
print(thisset)

{'banana', 'apple', 'cherry'}


In [50]:
thisset = {"apple", "banana", "cherry"}
# Note: Sets are unordered, so when using the pop() method, you will not know which item gets removed.
x = thisset.pop()
print(x)
print(thisset)

banana
{'apple', 'cherry'}


In [51]:
thisset = {"apple", "banana", "cherry"}
thisset.clear()
print(thisset)

set()


## Get the Length of a Set

To determine how many items a set has, use the len() function.

In [52]:
thisset = {"apple", "banana", "cherry"}
print(len(thisset))

3


## The set() Constructor

It is also possible to use the set() constructor to make a set.

In [53]:
thisset = set(("apple", "banana", "cherry")) # note the double round-brackets
print(thisset)

{'banana', 'apple', 'cherry'}


# Dictionary

A dictionary is a collection which is changeable and indexed by key; since Python 3.7 it also preserves insertion order. In Python dictionaries are written with curly brackets, and they have keys and values.

In [54]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964}


## Accessing Items

You can access the items of a dictionary by referring to its key name, inside square brackets.

There is also a method called get() that gives the same result for a key that exists.

In [55]:
x = thisdict["model"]
print(x)
y = thisdict.get("model")
print(y)

Mustang
Mustang
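
The two access styles differ when the key is missing. A short sketch, using a `"color"` key that is deliberately absent from the dictionary:

```python
thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

print(thisdict.get("color"))             # missing key: get() returns None
print(thisdict.get("color", "unknown"))  # or the default passed as second argument

try:
    thisdict["color"]                    # square brackets raise KeyError instead
except KeyError as err:
    print("KeyError:", err)
```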


## Change Values

You can change the value of a specific item by referring to its key name:

In [56]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
thisdict["year"] = 2018
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 2018}


## Loop Through a Dictionary

You can loop through a dictionary by using a for loop.

When looping through a dictionary, the loop variable takes the keys of the dictionary, but there are methods to iterate over the values or the key-value pairs as well.

In [57]:
for x in thisdict:
    print('key =', x, '\t value =', thisdict[x])

key = brand 	 value = Ford
key = model 	 value = Mustang
key = year 	 value = 2018


In [58]:
for x in thisdict.values():
    print(x)      

Ford
Mustang
2018


In [59]:
for x, y in thisdict.items():
    print('key=', x, '\t value = ', y)

key= brand 	 value =  Ford
key= model 	 value =  Mustang
key= year 	 value =  2018


## Check if Key Exists

To determine if a specified key is present in a dictionary use the `in` keyword:

In [60]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}

if "model" in thisdict:
    print("Yes, 'model' is one of the keys in the thisdict dictionary")

Yes, 'model' is one of the keys in the thisdict dictionary


## Dictionary Length

To determine how many items (key-value pairs) a dictionary has, use the len() function.

In [61]:
print(len(thisdict))

3


## Adding/removing Items

Adding an item to the dictionary is done by assigning a value to a new key.

There are several ways to remove items from a dictionary:
- The pop() method removes the item with the specified key name and returns its value
- The popitem() method removes the last inserted item (in versions before Python 3.7 it removes an arbitrary item)
- The del keyword removes the item with the specified key name

In [62]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
thisdict["color"] = "red"
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964, 'color': 'red'}


In [63]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
thisdict.pop("model")
print(thisdict)

{'brand': 'Ford', 'year': 1964}


In [64]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
thisdict.popitem()
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang'}


In [65]:
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
del thisdict["model"]
print(thisdict)

{'brand': 'Ford', 'year': 1964}


## The dict() Constructor

It is also possible to use the dict() constructor to make a dictionary:

In [66]:
thisdict = dict(brand="Ford", model="Mustang", year=1964)
# note that keywords are not string literals
# note the use of equals rather than colon for the assignment
print(thisdict)

{'brand': 'Ford', 'model': 'Mustang', 'year': 1964}


## Defaultdict

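A defaultdict is a very handy data structure for creating dictionaries of other data structures (e.g., a dictionary of lists, of sets, or of dictionaries): when a missing key is accessed, it is created automatically with a default value. It is part of the `collections` library.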

In [67]:
from collections import defaultdict

a = defaultdict(list)

for i in range(1,10):
    if i % 2 == 0:
        a['even'].append(i)
    else:
        a['odd'].append(i)
        
a

defaultdict(list, {'odd': [1, 3, 5, 7, 9], 'even': [2, 4, 6, 8]})
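
The factory passed to `defaultdict` can be any callable that takes no arguments. As a further sketch (the word `"banana"` and the name `counts` are chosen only for illustration), `int` gives a default of `0`, which is convenient for counting:

```python
from collections import defaultdict

counts = defaultdict(int)   # a missing key starts at int() == 0

for letter in "banana":
    counts[letter] += 1     # no need to check whether the key exists

print(dict(counts))
```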

# Conditions and If statements

Python supports the usual logical conditions from mathematics:

- Equals: `a == b`
- Not Equals: `a != b`
- Less than: `a < b`
- Less than or equal to: `a <= b`
- Greater than: `a > b`
- Greater than or equal to: `a >= b`

These conditions can be used in several ways, most commonly in "if statements" and loops.

An "if statement" is written by using the if keyword.

In [68]:
a = 33
b = 200
if b > a:
    print("b is greater than a")

b is greater than a


In this example we use two variables, a and b, which are used as part of the if statement to test whether b is greater than a. As a is 33, and b is 200, we know that 200 is greater than 33, and so we print to screen that "b is greater than a".

## Indentation

Python relies on indentation, using whitespace, to define scope in the code. Other programming languages often use curly-brackets for this purpose.

In [69]:
a = 33
b = 200
if b > a:
print("b is greater than a") # you will get an error

IndentationError: expected an indented block (<ipython-input-69-4276c1871af7>, line 4)

## Elif

The elif keyword is Python's way of saying "if the previous conditions were not true, then try this condition".

In [70]:
a = 33
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")

a and b are equal


In this example a is equal to b, so the first condition is not true, but the elif condition is true, so we print to screen that "a and b are equal".

## Else

The else keyword catches anything which isn't caught by the preceding conditions.

In [71]:
a = 200
b = 33
if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")

a is greater than b


In this example a is greater than b, so the first condition is not true, and the elif condition is not true either, so we go to the else branch and print to screen that "a is greater than b".

## Short Hand If / If ... Else

If you have only one statement to execute, you can put it on the same line as the if statement. The same applies when you have one if and one else, using a conditional expression.

In [72]:
if a > b: print("a is greater than b")

a is greater than b


In [73]:
r = range(1,10)
r2 = [x * 2 if x % 2 == 0 else x for x in r]
r2

[1, 4, 3, 8, 5, 12, 7, 16, 9]

The general syntax is:

`true_exp if condition else false_exp`


In [74]:
print("true") if 1 == 1 else print("false")
a = 1 if 1 == 2 else 0
a

true


0

## and / or

The and keyword is a logical operator used to combine conditional statements: the result is true only if both conditions are true.

The or keyword is a logical operator used to combine conditional statements: the result is true if at least one condition is true.

In [75]:
a = 10
b = 5
c = 20

if a > b and c > a:
    print("Both conditions are True")

Both conditions are True


In [76]:
if a > b or a > c:
    print("At least one of the conditions is True")

At least one of the conditions is True


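`or` is also used in assignments as a shorthand for supplying a default: `n or "value"` evaluates to `n` if `n` is truthy and to `"value"` otherwise, which avoids leaving the variable set to `None`.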

In [10]:
n = None

val = n or "value"
val

'value'
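
One caveat with this idiom is that `or` tests truthiness, not `None` specifically, so other falsy values such as `0` or `""` are replaced too. The sketch below contrasts it with an explicit `is None` check; `with_default` is a hypothetical helper written for this illustration:

```python
def with_default(value, default="value"):
    # Hypothetical helper: only None triggers the default
    return default if value is None else value

print(None or "value")       # 'value'
print(0 or "value")          # also 'value', because 0 is falsy
print(with_default(0))       # 0 is kept
print(with_default(None))    # falls back to 'value'
```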

# The while Loop

With the while loop we can execute a set of statements as long as a condition is true.

The while loop requires the relevant variables to be ready before it starts. In the following example we need to define an indexing variable, i, which we set to 1.

In [77]:
i = 1
while i < 6:
    print(i)
    i += 1

1
2
3
4
5


## The break and continue statements

With the break statement we can stop the loop even if the while condition is true.

With the continue statement we can stop the current iteration, and continue with the next:

In [78]:
i = 1
while i < 6:
    print(i)
    if i == 3:
        break
    i += 1

1
2
3


In [79]:
i = 0
while i < 6:
    i += 1 
    if i == 3:
        continue
    print(i)

1
2
4
5
6


# For Loops

A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, a string, or an iterator).

This is less like the for keyword in other programming languages, and works more like an iterator method as found in other object-oriented programming languages.

With the for loop we can execute a set of statements, once for each item in a list, tuple, set etc.
    
The for loop does not require an indexing variable to be set beforehand.

In [80]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)

apple
banana
cherry


## Looping Through a String

Even strings are iterable objects, since they contain a sequence of characters:

In [81]:
for x in "banana":
    print(x)

b
a
n
a
n
a


## The break Statement

With the break statement we can stop the loop before it has looped through all the items:

In [82]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)
    if x == "banana":
        break

apple
banana


In [83]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        break
    print(x)

apple


## The continue Statement

With the continue statement we can stop the current iteration of the loop, and continue with the next:

In [84]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        continue
    print(x)

apple
cherry


## The range() Function

To loop through a set of code a specified number of times, we can use the range() function.
The range() function returns a sequence of numbers, starting from 0 by default, incrementing by 1 (by default), and ending before a specified number.

The range() function defaults to 0 as a starting value, however it is possible to specify the starting value by adding a parameter: `range(2, 6)`, which means values from 2 to 6 (but not including 6).

The range() function defaults to increment the sequence by 1, however it is possible to specify the increment value by adding a third parameter: `range(2, 30, 3)`.

In [85]:
for x in range(6):
    print(x)

0
1
2
3
4
5


In [86]:
for x in range(2, 6):
    print(x)

2
3
4
5


In [87]:
for x in range(2, 10, 3):
    print(x)

2
5
8


## Else in For Loop

The else keyword in a for loop specifies a block of code to be executed when the loop finishes normally, that is, without being interrupted by a break statement.

In [88]:
for x in range(6):
    print(x)
else:
    print("Finally finished!")

0
1
2
3
4
5
Finally finished!
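
The else block is skipped when the loop is left through break, which makes it useful for "search" loops. A minimal sketch, where `find_first_negative` is a hypothetical function written for this illustration:

```python
def find_first_negative(numbers):
    # Hypothetical helper used only for this illustration
    for n in numbers:
        if n < 0:
            result = n
            break            # leaving via break skips the else block
    else:
        result = None        # runs only when the loop was not broken
    return result

print(find_first_negative([3, -1, 4]))
print(find_first_negative([3, 1, 4]))
```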


## Nested Loops

A nested loop is a loop inside a loop.

The "inner loop" will be executed one time for each iteration of the "outer loop".

In [89]:
adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

for x in adj:
    for y in fruits:
        print(x, y)

red apple
red banana
red cherry
big apple
big banana
big cherry
tasty apple
tasty banana
tasty cherry


# Functions

A function is a block of code which only runs when it is called.

You can pass data, known as parameters, into a function.

A function can return data as a result.

## Creating and calling a function

In Python a function is defined using the def keyword.

To call a function, use the function name followed by parentheses.

In [90]:
def my_function():
    print("Hello from a function")

my_function()

Hello from a function


## Parameters

Information can be passed to functions as parameters.

Parameters are specified after the function name, inside the parentheses. You can add as many parameters as you want, just separate them with a comma.

The following example has a function with one parameter (fname). When the function is called, we pass along a first name, which is used inside the function to print a message.

In [91]:
def my_function(fname):
    print("Hello I am", fname)

my_function("Emil")
my_function("Tobias")
my_function("Linus")

Hello I am Emil
Hello I am Tobias
Hello I am Linus


Arguments can be passed positionally or by name. When arguments are passed by name, their order does not matter, whereas it does when they are passed positionally.

In [92]:
def foo(a,b,c):
    print(f"a={a}, b={b}, c={c}")

foo("a","b","c")
foo("c", "a", "b")
foo(a="a", b="b", c="c")
foo(c="c", a="a", b="b")

a=a, b=b, c=c
a=c, b=a, c=b
a=a, b=b, c=c
a=a, b=b, c=c


## Default Parameter Value

The following example shows how to use a default parameter value.

If we call the function without an argument, it uses the default value.

In [93]:
def my_function(country = "Switzerland"):
    print("I am from " + country)

my_function("Sweden")
my_function("India")
my_function()
my_function("Brazil")

I am from Sweden
I am from India
I am from Switzerland
I am from Brazil


## Return Values

To let a function return a value, use the return statement.

In [94]:
def times5(x):
    return 5 * x

print(times5(3))
print(times5(5))
print(times5(9)) 

15
25
45


## Packing

In Python you can define a function that accepts an arbitrary number of positional arguments. These arguments are then packed into a tuple and passed to the function.

In [95]:
def foo(*args):
    print(type(args))
    for arg in args:
        print(arg)
        
foo(1,2,3)
foo([1,2,3])
foo("a","b","c")

<class 'tuple'>
1
2
3
<class 'tuple'>
[1, 2, 3]
<class 'tuple'>
a
b
c


Similarly, you can also create functions that take an arbitrary number of keyword arguments. These arguments are then packed into a dictionary and passed to the function.

In [96]:
def foo(**kwargs):
    print(type(kwargs))
    for k, v in kwargs.items():
        print(f"key={k}, value={v}")
        
foo(exchange="NYSE", year=2021, stocks=["GME", "NOK", "AMC"])

<class 'dict'>
key=exchange, value=NYSE
key=year, value=2021
key=stocks, value=['GME', 'NOK', 'AMC']
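
Both forms can be combined with ordinary parameters in a single signature. In this sketch, `describe` is a hypothetical function that simply returns what it received, so the packing is visible:

```python
def describe(first, *args, **kwargs):
    # Hypothetical function: one required parameter, then "the rest"
    # args collects extra positional arguments in a tuple,
    # kwargs collects extra keyword arguments in a dict
    return first, args, kwargs

print(describe(1))
print(describe(1, 2, 3, exchange="NYSE", year=2021))
```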


## Unpacking

If you have a function that takes 3 parameters, you can pass it a list of 3 elements by unpacking the list with the `*` operator.

In [97]:
def foo(a, b, c):
    print(f"a={a}, b={b}, c={c}")
    
foo(*[1,2,3])

a=1, b=2, c=3


The same goes for named parameters and dictionaries, using the `**` operator.

In [98]:
def foo(a, b, c):
    print(f"a={a}, b={b}, c={c}")
    

foo(**{"b":1, "c":"c", "a": [1,2,3]})

a=[1, 2, 3], b=1, c=c


## Recursion

Python also accepts function recursion, which means a defined function can call itself.

Recursion is a common mathematical and programming concept. It lets you solve a problem by reducing it to smaller instances of the same problem until a base case is reached.

The developer should be very careful with recursion, as it is quite easy to write a function which never terminates, or one that uses excessive amounts of memory or processor power. However, when written correctly, recursion can be a concise and mathematically elegant approach to programming.

In [99]:
def fib(k):
    if k == 0:
        result = 0
    elif k == 1:
        result = 1
    else:
        result = fib(k-1) + fib(k-2)
    return result
    
print("Recursion Example Results")
print("fibonacci(25) =", fib(25))

Recursion Example Results
fibonacci(25) = 75025
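
The recursive `fib` above recomputes the same values many times, so its running time grows very quickly with `k`. One possible remedy, sketched here with `functools.lru_cache` from the standard library (the name `fib_cached` is chosen for this illustration), is to cache results that have already been computed:

```python
from functools import lru_cache

@lru_cache(maxsize=None)     # remember every result already computed
def fib_cached(k):
    if k < 2:                # base cases: fib(0) = 0, fib(1) = 1
        return k
    return fib_cached(k - 1) + fib_cached(k - 2)

print("fibonacci(25) =", fib_cached(25))
```

With the cache, each value is computed once, while the recursion depth is still limited by Python's recursion limit.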


# Python Lambda

A lambda function is a small anonymous function.

A lambda function can take any number of arguments, but can only have one expression.

Syntax

`lambda arguments : expression`

The expression is executed and the result is returned.

In [100]:
x = lambda a : a + 10
print(x(5))

15


In [101]:
x = lambda a, b : a * b
print(x(5, 6))

30


In [102]:
x = lambda a, b, c : a + b + c
print(x(5, 6, 2))

13


## Why Use Lambda Functions?

The power of lambda is better shown when you use them as an anonymous function inside another function.

Say you have a function definition that takes one argument, and that argument will be multiplied with an unknown number. You can then use that function definition to make a function that always doubles the number you send in, or use the same function definition to make a function that always triples the number you send in, etc.

In [103]:
def myfunc(n):
    return lambda a : a * n

doubler = myfunc(2)
tripler = myfunc(3)

doubler(11)

22
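
Another common place where a lambda is passed to another function is the `key` argument of the built-in `sorted`. The list and variable names below are made up for this sketch:

```python
fruits = ["cherry", "fig", "banana"]

# key receives each item and returns the value to sort by
by_length = sorted(fruits, key=lambda name: len(name))
by_last_letter = sorted(fruits, key=lambda name: name[-1])

print(by_length)
print(by_last_letter)
```

Because `sorted` is stable, items with equal keys (here `"cherry"` and `"banana"`, both six letters long) keep their original relative order.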

# Map, filter, reduce

Lambda functions are typically used in map, filter, and reduce. These functions facilitate a functional approach to programming.

## map

Map applies a function to all the items in an input list. Here is the blueprint:

`map(function_to_apply, list_of_inputs)`

**n.b.** In Python 3, `map` (and `filter`) return iterators, not lists as in Python 2. It is usually best to iterate over the result directly; for the sake of the examples we wrap it in the `list` constructor, which is otherwise wasteful because it materializes every element in memory.

reference: https://docs.python.org/3.0/whatsnew/3.0.html#views-and-iterators-instead-of-lists
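
To make the note above concrete, here is a small sketch showing that the object returned by `map` is a one-shot iterator; the variable names are illustrative:

```python
items = [1, 2, 3]

squares = map(lambda x: x**2, items)  # nothing is computed yet

first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # the iterator is now exhausted

print(first_pass)
print(second_pass)
```

The second call produces an empty list because the iterator has already been consumed.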

In [104]:
items = [1, 2, 3, 4, 5]

list(map(lambda x: x**2, items))

[1, 4, 9, 16, 25]

Same result with list comprehensions (python style)

In [105]:
items = [1, 2, 3, 4, 5]

[x**2 for x in items]

[1, 4, 9, 16, 25]

The list of input can also be a list of functions :)

In [106]:
def multiply(x):
    return (x*x)
def add(x):
    return (x+x)

funcs = [multiply, add]

for i in range(5):
    v = list(map(lambda x: x(i), funcs))
    print(v)


[0, 0]
[1, 2]
[4, 4]
[9, 6]
[16, 8]


## filter

Filter returns the elements for which a function returns true. As with `map`, in Python 3 the result is an iterator, so the example wraps it in `list`.

In [107]:
number_list = range(-5, 5)
less_than_zero = list(filter(lambda x: x < 0, number_list))
print(less_than_zero)

[-5, -4, -3, -2, -1]


Same result with list comprehensions (python style)

In [108]:
number_list = range(-5, 5)

[ x for x in number_list if x < 0]

[-5, -4, -3, -2, -1]

## reduce

`reduce` is a function for performing some computation on a list and returning the result. It applies a rolling computation to sequential pairs of values in a list. Similar to `fold` or `inject` in other languages.

**n.b.** `reduce` was a built-in function in Python 2 but was moved out of the built-ins in Python 3, on the grounds that an explicit for loop is more readable most of the time. You can still use it from the [functools library](https://docs.python.org/3.0/library/functools.html#functools.reduce)

reference: https://docs.python.org/3.0/whatsnew/3.0.html#builtins

For example, if you wanted to compute the product of a list of integers, you could use a basic for loop.

In [109]:
product = 1
mylist = [1, 2, 3, 4]
for num in mylist:
    product = product * num
product

24

However, a more elegant way to do this is to use the `reduce` function.

In [110]:
from functools import reduce
mylist = [1, 2, 3, 4]

# Start from 1, the neutral element for multiplication (starting from 0 would always give 0).
product = reduce((lambda x, y: x * y), mylist, 1)
product

24
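
The optional third argument of `reduce` is the initial value of the accumulator. A brief sketch of how it affects a product: for multiplication the neutral starting value is 1, whereas starting from 0 forces the result to 0. Here `operator.mul` is used as a stand-in for the lambda:

```python
from functools import reduce
from operator import mul

numbers = [1, 2, 3, 4]

product_no_init = reduce(mul, numbers)       # starts from the first element
product_init_one = reduce(mul, numbers, 1)   # 1 is the identity for multiplication
product_init_zero = reduce(mul, numbers, 0)  # 0 times anything is 0
empty_product = reduce(mul, [], 1)           # the initial value is returned for an empty list

print(product_no_init, product_init_one, product_init_zero, empty_product)
```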

# Classes and Objects


Almost everything in Python is an object, with its properties and methods.

A Class is like an object constructor, or a "blueprint" for creating objects.

## Create Classes and Objects

To create a class, use the keyword class.

Then we can use the name of the class to create objects.

In [111]:
# Create a class named MyClass, with a property named x:

class MyClass:
    x = 5
    
# Create an object named p1, and print the value of x:

p1 = MyClass()
print(p1.x)

5


## The `__init__()` function

The examples above are classes and objects in their simplest form, and are not really useful in real life applications.

To understand the meaning of classes we have to understand the built-in `__init__()` function.

All classes have a function called `__init__()`, which is always executed when the class is being instantiated (like a constructor).

Use the `__init__()` function to assign values to object properties, or other operations that are necessary to do when the object is being created.

In [112]:
# Create a class named Person, use the __init__() function to assign values for name and age:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)

print(p1.name)
print(p1.age)

John
36


**Note:** The `__init__()` function is called automatically every time the class is used to create a new object.

In [113]:
p1 = Person("John", 36)
p2 = Person() # will give an error

TypeError: __init__() missing 2 required positional arguments: 'name' and 'age'

## Methods

Objects can also contain methods. Methods are functions that belong to the object.

Let us create a method in the Person class.

In [114]:
# Insert a function that prints a greeting, and execute it on the p1 object:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def myfunc(self):
        print("Hello my name is " + self.name)

p1 = Person("John", 36)
p1.myfunc()

Hello my name is John


## The self parameter

The self parameter is a reference to the current instance of the class, and is used to access variables that belong to that instance.

It does not have to be named self, you can call it whatever you like, but it has to be the first parameter of any instance method in the class.

In [115]:
# Use the words my_object and abc instead of self:

class Person:
    def __init__(my_object, name, age):
        my_object.name = name
        my_object.age = age

    def myfunc(abc):
        print("Hello my name is " + abc.name)

p1 = Person("John", 36)
p1.myfunc()

Hello my name is John
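
One way to see what the first parameter receives is to call the method through the class and pass the instance explicitly. A minimal sketch (the `greeting` method name is illustrative, and it returns the string instead of printing it):

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greeting(self):
        return "Hello my name is " + self.name


p1 = Person("John", 36)
p2 = Person("Jane", 29)

# These two calls are equivalent: Python passes the instance as the first argument.
via_instance = p1.greeting()
via_class = Person.greeting(p1)

print(via_instance)
print(via_class)
print(p2.greeting())  # each instance carries its own attribute values
```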


## Modify and delete object properties

You can modify properties on objects using the dot notation.

You can delete properties on objects by using the del keyword. You can also delete the entire object by using the del keyword.

In [116]:
# Set the age of p1 to 40:

p1.age = 40

print(p1.age)

40


In [117]:
# Delete the age property from the p1 object:

del p1.age
print(p1.age) # will raise an error

AttributeError: 'Person' object has no attribute 'age'

In [118]:
# Delete the p1 object:

del p1
print(p1.age) # will raise an error

NameError: name 'p1' is not defined
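
When an attribute may or may not exist, the built-ins `hasattr` and `getattr` (with a default) avoid the `AttributeError`. A small sketch, using a stripped-down `Person` class for illustration:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age


p = Person("John", 36)
del p.age  # removes the attribute from this instance only

has_age = hasattr(p, "age")
age_or_default = getattr(p, "age", None)  # falls back to None instead of raising

print(has_age, age_or_default)
```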

# Iterators

An iterator is an object that contains a countable number of values.

An iterator is an object that can be iterated upon, meaning that you can traverse through all the values.

Technically, in Python, an iterator is an object which implements the iterator protocol, which consists of the methods `__iter__()` and `__next__()`.

## Iterator vs Iterable

Lists, tuples, dictionaries, and sets are all iterable objects. They are iterable containers which you can get an iterator from.

All these objects have an `__iter__()` method, which the built-in `iter()` function calls to get an iterator.

In [119]:
mytuple = ("apple", "banana", "cherry")
myit = iter(mytuple)

print(next(myit))
print(next(myit))
print(next(myit))

apple
banana
cherry


Even strings are iterable objects, and can return an iterator.

In [120]:
mystr = "banana"
myit = iter(mystr)

print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))
print(next(myit))

b
a
n
a
n
a


## Looping Through an Iterator

We can also use a for loop to iterate through an iterable object. The for loop actually creates an iterator object and calls `next()` on it for each iteration.

In [121]:
mytuple = ("apple", "banana", "cherry")
for x in mytuple:
    print(x)

apple
banana
cherry


In [122]:
mystr = "banana"
for x in mystr:
    print(x)

b
a
n
a
n
a


## Create an Iterator

To create an object/class as an iterator you have to implement the methods `__iter__()` and `__next__()` in your class (in Python 2 the latter was named `next()`).

As explained before, all classes have a function called `__init__()`, which allows you to do some initializing when the object is being created.

The `__iter__()` method acts similarly: you can do operations (initializing etc.), but it must always return the iterator object itself.

The `__next__()` method also allows you to do operations, and must return the next item in the sequence.

In [123]:
# Create an iterator that returns numbers, starting with 0 and increasing by one at each step (returning 0, 1, 2, 3, etc.):

class MyNumbers:
    def __iter__(self):
        self.a = 0
        return self
    
    def __next__(self):
        x = self.a
        self.a += 1
        return x

myclass = MyNumbers()
myiter = iter(myclass)

print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))

0
1
2
3
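
The `MyNumbers` iterator above never ends, so a `for` loop over it would run forever. The iterator protocol signals the end of the sequence by raising `StopIteration`. A hedged sketch of a bounded variant (the class name `CountUpTo` and its `limit` parameter are assumptions for illustration):

```python
class CountUpTo:
    """Iterator that yields 0, 1, ..., limit - 1 and then stops."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration  # tells for loops and list() that we are done
        value = self.current
        self.current += 1
        return value


collected = list(CountUpTo(4))
print(collected)
```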


# Exception Handling

When an error occurs, or exception as we call it, Python will normally stop and generate an error message.

These exceptions can be handled using the try statement.

The try block lets you test a block of code for errors.

The except block lets you handle the error.

The finally block lets you execute code, regardless of the result of the try- and except blocks.

**Example:** Since the try block raises an error (zzz is undefined), the except block will be executed. Without the try block, the program will crash and raise an error.

In [124]:
try:
    print(zzz)
except:
    print("An exception occurred")

An exception occurred


## Multiple exceptions

You can define multiple exception blocks to handle different types of errors.

**Example:** Print one message if the try block raises a NameError and another for other errors:

In [1]:
try:
    0 / 0
except NameError as ne:
    print(f"Variable is not defined: {ne}")
except Exception as e:
    print(f"Something else went wrong:\n{e}")

Something else went wrong:
division by zero


## Else

You can use the else keyword to define a block of code to be executed if no errors were raised.

In [126]:
try:
    print("Hello")
except:
    print("Something went wrong")
else:
    print("Nothing went wrong")

Hello
Nothing went wrong


## Finally

The finally block, if specified, will be executed regardless of whether the try block raises an error or not. It is useful for making sure that something is executed in any case (clean up resources, close a file handle, commit/abort a transaction, etc).

In [127]:
try:
    print(zzz)    
except:
    print("Something went wrong")
finally:
    print("The 'try except' is finished")

Something went wrong
The 'try except' is finished
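
The four clauses can be combined in a single statement. The sketch below records which clauses run in each case; the `safe_divide` function and its `events` list are made up for illustration:

```python
def safe_divide(a, b):
    """Return (a / b or None, list of the clauses that ran)."""
    events = []
    try:
        result = a / b
    except ZeroDivisionError:
        events.append("except")  # runs only when the division fails
        result = None
    else:
        events.append("else")    # runs only when no exception was raised
    finally:
        events.append("finally")  # runs in both cases
    return result, events


print(safe_divide(6, 3))
print(safe_divide(1, 0))
```

Catching a specific exception type, as done here, is generally preferable to a bare `except:`, which also hides unrelated errors.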


# JSON

Python has a built-in package called json, which can be used to work with JSON data.

## json => python

Parse JSON: convert from JSON to Python.

If you have a JSON string, you can parse it by using the `json.loads()` method.

When the JSON text is an object, the result will be a Python dictionary.

In [128]:
import json

# some JSON:
x =  '{ "name":"John", "age":30, "city":"New York"}'

y = json.loads(x)
type(y)

dict
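
JSON values other than objects map to other Python types, and malformed input raises `json.JSONDecodeError`. A short sketch with made-up input strings:

```python
import json

# A JSON array becomes a list; true/null become True/None.
parsed_list = json.loads('[1, 2.5, "three", true, null]')
print(parsed_list, type(parsed_list))

try:
    json.loads("{'name': 'John'}")  # single quotes are not valid JSON
    error_message = None
except json.JSONDecodeError as err:
    error_message = str(err)

print("Invalid JSON:", error_message)
```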

## python => JSON

If you have a Python object, you can convert it into a JSON string by using the json.dumps() method.

You can convert Python objects of the following types, into JSON strings:

- dict
- list
- tuple
- string
- int
- float
- True
- False
- None

In [129]:
import json

# a Python object (dict):
x = {
  "name": "John",
  "age": 30,
  "city": "New York"
}

y = json.dumps(x)
type(y)

str

In [130]:
import json

print(json.dumps({"name": "John", "age": 30}))
print(json.dumps(["apple", "bananas"]))
print(json.dumps(("apple", "bananas")))
print(json.dumps("hello"))
print(json.dumps(42))
print(json.dumps(31.76))
print(json.dumps(True))
print(json.dumps(False))
print(json.dumps(None))

{"name": "John", "age": 30}
["apple", "bananas"]
["apple", "bananas"]
"hello"
42
31.76
true
false
null


## Formatting the Result

The example above prints a JSON string, but it is not very easy to read, with no indentations and line breaks.

The json.dumps() method has parameters to make it easier to read the result.

You can use the indent parameter to define the number of spaces per indentation level and make the format more readable.

In [131]:
x = {
  "name": "John",
  "age": 30,
  "city": "New York"
}

print(json.dumps(x))
print(json.dumps(x, indent=4))

{"name": "John", "age": 30, "city": "New York"}
{
    "name": "John",
    "age": 30,
    "city": "New York"
}


You can also define the separators as an (item separator, key separator) tuple. The default value is (", ", ": ") when `indent` is `None` and (",", ": ") otherwise, which means a comma separates the items and a colon followed by a space separates keys from values.

In [132]:
print(json.dumps(x))
print(json.dumps(x, indent=4))
print(json.dumps(x, indent=4, separators=(". ", " = ")))

{"name": "John", "age": 30, "city": "New York"}
{
    "name": "John",
    "age": 30,
    "city": "New York"
}
{
    "name" = "John". 
    "age" = 30. 
    "city" = "New York"
}
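
Two other `json.dumps()` options that are often useful in practice are compact separators and `sort_keys`. A small sketch, reusing the same example dictionary:

```python
import json

x = {"name": "John", "age": 30, "city": "New York"}

compact = json.dumps(x, separators=(",", ":"))  # no spaces after separators
sorted_keys = json.dumps(x, sort_keys=True)     # keys in alphabetical order

print(compact)
print(sorted_keys)
```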


# File Handling

The key function for working with files in Python is the open() function.

The open() function takes two main parameters: the filename and the mode.

There are four different methods (modes) for opening a file:

1. "r" - Read - Default value. Opens a file for reading, error if the file does not exist

2. "a" - Append - Opens a file for appending, creates the file if it does not exist

3. "w" - Write - Opens a file for writing, creates the file if it does not exist, and empties it if it does

4. "x" - Create - Creates the specified file, returns an error if the file exists

In addition you can specify if the file should be handled as binary or text mode

- "t" - Text - Default value. Text mode

- "b" - Binary - Binary mode (e.g. images)

**Syntax**: To open a file for reading it is enough to specify the name of the file:

`f = open("demofile.txt")`

The code above is the same as:

`f = open("demofile.txt", "rt")`

Because "r" for read, and "t" for text are the default values, you do not need to specify them.

The `open()` function returns a file object, which has a `read()` method for reading the content of the file.

In [133]:
f = open("/Users/dambrosm/.profile", "r")
print(f.read())

#alias ll="ls -la"

if which jenv > /dev/null; then eval "$(jenv init -)"; fi
alias free='ruby ~/scripts/free-mem.rb'

# added by Anaconda2 5.0.1 installer
export PATH="/Users/dambrosm/anaconda2/bin:$PATH"
#export ZEPPELIN_LOG_DIR="~/.zeppelin"
#export ZEPPELIN_HOME=/usr/local/Cellar/apache-zeppelin/0.7.3



## Read Only Parts of the File

By default the `read()` method returns the whole text, but you can also specify how many characters you want to return.

In [134]:
f = open("/Users/dambrosm/.profile", "r")
print(f.read(18))

#alias ll="ls -la"


## Read Lines

You can return one line by using the `readline()` method.

In [135]:
f = open("/Users/dambrosm/.profile", "r")
print(f.readline())
print(f.readline())
print(f.readline())

#alias ll="ls -la"



if which jenv > /dev/null; then eval "$(jenv init -)"; fi



By looping through the lines of the file, you can read the whole file, line by line.

In [136]:
f = open("/Users/dambrosm/.profile", "r")
for line in f:
    print(line)

#alias ll="ls -la"



if which jenv > /dev/null; then eval "$(jenv init -)"; fi

alias free='ruby ~/scripts/free-mem.rb'



# added by Anaconda2 5.0.1 installer

export PATH="/Users/dambrosm/anaconda2/bin:$PATH"

#export ZEPPELIN_LOG_DIR="~/.zeppelin"

#export ZEPPELIN_HOME=/usr/local/Cellar/apache-zeppelin/0.7.3



You can also use the method `readlines()` which will read a file line by line, outputting into a list.

In [137]:
f = open("/Users/dambrosm/.profile", "r")
data = f.readlines() 

for line in data:
    print(line)

print(data)

#alias ll="ls -la"



if which jenv > /dev/null; then eval "$(jenv init -)"; fi

alias free='ruby ~/scripts/free-mem.rb'



# added by Anaconda2 5.0.1 installer

export PATH="/Users/dambrosm/anaconda2/bin:$PATH"

#export ZEPPELIN_LOG_DIR="~/.zeppelin"

#export ZEPPELIN_HOME=/usr/local/Cellar/apache-zeppelin/0.7.3

['#alias ll="ls -la"\n', '\n', 'if which jenv > /dev/null; then eval "$(jenv init -)"; fi\n', "alias free='ruby ~/scripts/free-mem.rb'\n", '\n', '# added by Anaconda2 5.0.1 installer\n', 'export PATH="/Users/dambrosm/anaconda2/bin:$PATH"\n', '#export ZEPPELIN_LOG_DIR="~/.zeppelin"\n', '#export ZEPPELIN_HOME=/usr/local/Cellar/apache-zeppelin/0.7.3\n']
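
Each line read from a file keeps its trailing newline, and `print` adds another one, which is why the output above is double-spaced. A small sketch of two ways around that, using a temporary file so that it does not depend on any particular path (the file contents are made up):

```python
import os
import tempfile

# Create a throwaway file with three lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

with open(path) as f:
    stripped = [line.rstrip("\n") for line in f]  # drop the newline from each line

with open(path) as f:
    for line in f:
        print(line, end="")  # or tell print not to add its own newline

os.remove(path)
print(stripped)
```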


## Write to an existing file

To write to an existing file, you must add a parameter to the `open()` function:

- "a" - Append - will append to the end of the file

- "w" - Write - will overwrite any existing content

In [138]:
f = open("demofile.txt", "a")
f.write("\none line")
f.close()

f2 = open("demofile.txt", "a")
f2.write("\nNow the file has one more line!")
f2.close()

f3 = open("demofile.txt", "r")
for line in f3:
    print(line)

Woops! I have deleted the content!

one line

Now the file has one more line!


In [139]:
f = open("demofile.txt", "w")

f.write("Woops! I have deleted the content!")
f.close()
# Note: the "w" mode will overwrite the entire file.

f2 = open("demofile.txt", "r")
for line in f2:
    print(line)

Woops! I have deleted the content!


## Create a new file

To create a new file in Python, use the `open()` method, with one of the following parameters:

- "x" - Create - will create a file, returns an error if the file exists

- "a" - Append - will create a file if the specified file does not exist

- "w" - Write - will create a file if the specified file does not exist
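
The difference between the three modes can be checked with a short experiment. The sketch below works in a temporary directory so that it leaves nothing behind; the file name `demo.txt` is arbitrary:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")

with open(path, "x") as f:      # "x" creates the file, which must not exist yet
    f.write("created\n")

try:
    open(path, "x")             # a second attempt fails because the file exists
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

with open(path, "a") as f:      # "a" keeps the content and adds to the end
    f.write("appended\n")
with open(path) as f:
    after_append = f.read()

with open(path, "w") as f:      # "w" empties the file before writing
    f.write("overwritten\n")
with open(path) as f:
    after_write = f.read()

os.remove(path)
os.rmdir(workdir)
print(second_create_failed, repr(after_append), repr(after_write))
```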

## Closing a file

When you are done working with a file, call its `close()` method (written `fh.close()` for a file object named `fh`). This closes the file and releases the resources it was using, so that the system can reuse them elsewhere.

It is important to understand that once `fh.close()` has been called, any further attempt to read from or write to the file object will fail.

Notice how we used this in several of our examples to end the interaction with a file. This is good practice.
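
A short sketch of what "will fail" means in practice: reading from a closed file object raises a `ValueError`. The temporary file and its contents are made up for the example:

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")
    path = tmp.name

fh = open(path)
content = fh.read()
fh.close()
is_closed = fh.closed  # True once close() has been called

try:
    fh.read()  # reading after close() is not allowed
    error = None
except ValueError as err:
    error = str(err)

os.remove(path)
print(is_closed, error)
```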


## With Statement

You can also work with file objects using the `with` statement. It provides cleaner syntax and more reliable exception handling when you work with resources such as files, which is why it is good practice to use the `with` statement where applicable.

One bonus of using this method is that any files opened will be closed automatically after you are done. This leaves less to worry about during cleanup. 

To use the with statement to open a file:

`with open("filename") as file:`

In [140]:
with open("demofile.txt", "r") as f:
    for line in f:
        print(line)

Woops! I have deleted the content!


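The `with` statement works with any object that provides the two methods of the following schematic class, here called `controlled_execution`:

```python
class controlled_execution:
    def __enter__(self):
        # set things up
        return self  # the "thing" that is bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # tear things down
        return False  # let any exception propagate
```

```python
with controlled_execution() as thing:
    pass  # some code
```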

When the `with` statement is executed, Python evaluates the expression, calls the `__enter__` method on the resulting value (which is called a "context manager"), and assigns whatever `__enter__` returns to the variable given by `as`. Python then executes the code body and, no matter what happens in that code, calls the context manager's `__exit__` method.
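
The order of these calls can be observed with a toy class that logs each step, including the case where the body raises an exception. The `Tracker` class is hypothetical and exists only for this demonstration:

```python
class Tracker:
    """Toy context manager that records the order in which things happen."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is what "as" binds to

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False  # do not swallow exceptions


tracker = Tracker()
try:
    with tracker as t:
        t.events.append("body")
        raise RuntimeError("something failed")
except RuntimeError:
    tracker.events.append("handled")

print(tracker.events)
```

The `exit` entry appears before `handled`, which shows that `__exit__` runs before the exception reaches the surrounding `try` block.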

## parse a json file

Here we combine the file handling part with the JSON loading part.

In [141]:
import json

with open('/Users/dambrosm/example.json') as f:
    data = json.load(f)
    print(data['body'])

Fixtures are a great way to mock data for responses to routes
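
The counterpart of `json.load()` is `json.dump()`, which writes to a file object. A self-contained round-trip sketch, using a temporary file and a made-up record rather than the `example.json` file above:

```python
import json
import os
import tempfile

record = {"title": "example", "body": "some text", "tags": ["a", "b"]}

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)  # we reopen the file by name below

with open(path, "w") as f:
    json.dump(record, f, indent=2)   # dump writes to a file object

with open(path) as f:
    loaded = json.load(f)            # load reads from a file object

os.remove(path)
print(loaded == record)
```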


# Jupyter architecture

![](http://jupyter.readthedocs.io/en/latest/_images/notebook_components.png)

## The kernel
The frontend, via the notebook server, uses the IPython kernel. The kernel is where code is executed. The kernel itself doesn’t know anything about the notebook document: it just gets sent cells of code to execute when the user runs them.
A kernel process can be connected to more than one frontend simultaneously. In this case, the different frontends will have access to the same variables.

## The notebook server
The notebook server is responsible for saving and loading notebooks, so you can edit notebooks even if you don’t have the kernel for that language—you just won’t be able to run code.

## The frontend
The Notebook frontend:
- collects the code from the cells
- ships the code to the kernel in JSON via 0MQ
- gets the result back and renders it
- stores code and output, together with markdown notes, in an editable document called a notebook.

When you save a notebook, it is sent from your browser to the notebook server, which saves it on disk as a JSON file with a .ipynb extension.


Let's use different browsers to see how variables are shared.

Let's have a look at the ipynb format.
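
As a rough sketch of what an `.ipynb` file contains, the snippet below builds a minimal notebook-like dictionary by hand and serializes it with `json.dumps`. The field names are meant to follow version 4 of the notebook format; treat the exact set of fields as an assumption and compare with a real file:

```python
import json

# Hand-written, minimal notebook structure (illustrative, not generated by Jupyter).
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["my_variable = 42"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

text = json.dumps(notebook, indent=1)
print(text)

# Reading it back is plain JSON parsing.
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```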


In [142]:
my_variable

NameError: name 'my_variable' is not defined

# Exercise 1

Implement a simple dictionary encoding which, given a list of values and a mapping between values and codes, does the encoding and the decoding.

As data use the `main_category` column from the `ks-projects-201801.csv` file. For the mapping use the one defined in `category_mapping.json`.

Use the function `sys.getsizeof` to check whether the memory footprint of the encoded format is more efficient.

*Note:* to read the CSV you could read the text file and split on the delimiter, but you would then need to handle quoting yourself. Instead you can use the csv parser as follows:

In [143]:
import csv
import itertools

with open('ks-projects-201801.csv', 'r') as f:
    csvreader = csv.reader(f, delimiter=',', quotechar='"')
    for row in itertools.islice(csvreader, 5):
        print(" | ".join(row[:5]))

ID | name | category | main_category | currency
1000002330 | The Songs of Adelaide & Abullah | Poetry | Publishing | GBP
1000003930 | Greeting From Earth: ZGAC Arts Capsule For ET | Narrative Film | Film & Video | USD
1000004038 | Where is Hank? | Narrative Film | Film & Video | USD
1000007540 | ToshiCapital Rekordz Needs Help to Complete Album | Music | Music | USD


In [144]:
from sys import getsizeof
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint of an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}

    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {tuple: iter,
                    list: iter,
                    deque: iter,
                    dict: dict_handler,
                    set: iter,
                    frozenset: iter,
                   }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o))

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)
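
The reason a recursive helper like `total_size` is needed is that `sys.getsizeof` is shallow: for a container it counts the container object itself, not the objects it refers to. A small sketch of that behaviour on CPython (the exact byte counts vary between versions and platforms, so none are shown):

```python
from sys import getsizeof

long_string = "x" * 10_000
container = [long_string]

shallow = getsizeof(container)    # size of the list object only
element = getsizeof(long_string)  # size of the string it refers to

print(shallow, element)
print(shallow < element)
```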

In [88]:
import csv

with open('ks-projects-201801.csv', 'r') as f:
    csvreader = csv.reader(f, delimiter=',', quotechar='"')
    rows = [row for row in csvreader]
    
header = rows.pop(0)
header

['ID',
 'name',
 'category',
 'main_category',
 'currency',
 'deadline',
 'goal',
 'launched',
 'pledged',
 'state',
 'backers',
 'country',
 'usd pledged',
 'usd_pledged_real',
 'usd_goal_real']

In [89]:
header_dict = {name: index for index, name in enumerate(header)}

In [90]:
main_category_raw = [ row[header_dict['main_category']] for row in rows]
main_category_raw

['Publishing',
 'Film & Video',
 'Film & Video',
 'Music',
 'Film & Video',
 'Food',
 'Food',
 'Food',
 'Design',
 'Film & Video',
 'Publishing',
 'Music',
 'Crafts',
 'Games',
 'Games',
 'Design',
 'Comics',
 'Publishing',
 'Music',
 'Food',
 'Fashion',
 'Fashion',
 'Theater',
 'Food',
 'Comics',
 'Music',
 'Crafts',
 'Film & Video',
 'Comics',
 'Film & Video',
 'Food',
 'Design',
 'Design',
 'Art',
 'Music',
 'Film & Video',
 'Music',
 'Film & Video',
 'Art',
 'Music',
 'Publishing',
 'Music',
 'Photography',
 'Games',
 'Music',
 'Film & Video',
 'Food',
 'Film & Video',
 'Games',
 'Art',
 'Film & Video',
 'Photography',
 'Art',
 'Photography',
 'Film & Video',
 'Food',
 'Publishing',
 'Film & Video',
 'Film & Video',
 'Games',
 'Film & Video',
 'Fashion',
 'Art',
 'Food',
 'Music',
 'Technology',
 'Theater',
 'Technology',
 'Design',
 'Crafts',
 'Technology',
 'Technology',
 'Music',
 'Games',
 'Food',
 'Design',
 'Fashion',
 'Theater',
 'Film & Video',
 'Games',
 'Fashion',
 'Games',
 ...]

In [91]:
import json
with open('category_mapping.json') as f:
    mapping = json.load(f)

mapping

{'Film & Video': 0,
 'Fashion': 1,
 'Theater': 2,
 'Publishing': 3,
 'Music': 12,
 'Food': 5,
 'Photography': 7,
 'Games': 13,
 'Journalism': 8,
 'Comics': 9,
 'Design': 10,
 'Crafts': 11,
 'Art': 4,
 'Technology': 6,
 'Dance': 14}

In [71]:
main_category_mapped = [ mapping[x] for x in main_category_raw ]
main_category_mapped

[3,
 0,
 0,
 12,
 0,
 5,
 5,
 5,
 10,
 0,
 3,
 12,
 11,
 13,
 13,
 10,
 9,
 3,
 12,
 5,
 1,
 1,
 2,
 5,
 9,
 12,
 11,
 0,
 9,
 0,
 5,
 10,
 10,
 4,
 12,
 0,
 12,
 0,
 4,
 12,
 3,
 12,
 7,
 13,
 12,
 0,
 5,
 0,
 13,
 4,
 0,
 7,
 4,
 7,
 0,
 5,
 3,
 0,
 0,
 13,
 0,
 1,
 4,
 5,
 12,
 6,
 2,
 6,
 10,
 11,
 6,
 6,
 12,
 13,
 5,
 10,
 1,
 2,
 0,
 13,
 1,
 13,
 0,
 12,
 1,
 9,
 3,
 2,
 5,
 12,
 13,
 9,
 2,
 12,
 12,
 0,
 0,
 0,
 6,
 5,
 0,
 12,
 3,
 10,
 6,
 5,
 0,
 0,
 1,
 1,
 12,
 13,
 6,
 5,
 12,
 12,
 4,
 0,
 12,
 0,
 3,
 10,
 14,
 13,
 5,
 7,
 13,
 6,
 4,
 0,
 4,
 0,
 12,
 6,
 3,
 6,
 13,
 12,
 3,
 4,
 7,
 0,
 4,
 10,
 1,
 6,
 13,
 3,
 12,
 12,
 13,
 13,
 6,
 2,
 6,
 7,
 0,
 0,
 3,
 13,
 12,
 13,
 1,
 6,
 6,
 12,
 0,
 3,
 4,
 0,
 6,
 12,
 12,
 3,
 0,
 10,
 12,
 10,
 0,
 10,
 0,
 13,
 12,
 0,
 10,
 3,
 10,
 6,
 12,
 3,
 0,
 4,
 13,
 12,
 6,
 7,
 13,
 8,
 1,
 13,
 10,
 7,
 10,
 5,
 1,
 7,
 0,
 0,
 9,
 2,
 10,
 1,
 10,
 4,
 13,
 0,
 2,
 13,
 3,
 13,
 0,
 13,
 4,
 12,
 3,
 11,
 5,
 13,
 9,
 ...]

In [92]:
print(f"Size mapped \t= {total_size(main_category_mapped) + total_size(mapping):,}")
print(f"Size raw \t= {total_size(main_category_raw):,}")


Size mapped 	= 3,294,978
Size raw 	= 24,672,365
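
The cells above cover the encoding direction. A minimal sketch of both directions, including decoding through the inverted mapping, is shown below. It uses a small hypothetical subset of the category mapping and a made-up list of values so that it runs without the data files; the function names `encode` and `decode` are illustrative:

```python
# Hypothetical subset of the category mapping, for illustration only.
mapping = {"Film & Video": 0, "Fashion": 1, "Theater": 2, "Publishing": 3}
raw = ["Publishing", "Film & Video", "Film & Video", "Theater"]


def encode(values, mapping):
    """Replace each value by its integer code."""
    return [mapping[value] for value in values]


def decode(codes, mapping):
    """Invert the mapping and turn codes back into values."""
    inverse = {code: value for value, code in mapping.items()}
    return [inverse[code] for code in codes]


encoded = encode(raw, mapping)
decoded = decode(encoded, mapping)

print(encoded)
print(decoded == raw)
```

Decoding this way assumes that the codes in the mapping are unique; otherwise the inverse dictionary would silently drop entries.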
