# Lecture 2

## Basic [Data Structures](#structures)

 - lists
 - tuples
 - dictionaries
 - sets
 - json
 
## Using [Modules](#modules)

 - importing modules
 - writing simple modules
---

## Data Structures<a class = 'anchor' id = 'structures'></a>

Python has four built-in general-purpose containers: lists, tuples, dictionaries and sets (the types `list`, `tuple`, `dict` and `set`).

### Lists

Python lists are one of the most used datatypes. They can contain elements of various types, which makes them very popular, but this flexibility also makes large lists memory inefficient!

In [None]:
l = [1, 2, 3, 's', 1.2]

In [None]:
print(type(l))
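To get a rough feeling for the memory cost mentioned above, the sketch below compares a list of integers with a typed array from the standard library's `array` module. The `array` module and the variable names are not part of this lecture; they are used here only as an assumed point of comparison, and the exact byte counts depend on your Python version.

```python
import sys
from array import array

n = 1000
numbers_list = list(range(1000, 1000 + n))   # a list holding n separate int objects
numbers_array = array('q', numbers_list)     # typed array of signed 64-bit integers

# A list only stores references, so the int objects themselves are counted separately.
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)
array_bytes = sys.getsizeof(numbers_array)

print('list :', list_bytes, 'bytes')
print('array:', array_bytes, 'bytes')
```

The list needs noticeably more memory because every element is a full Python object, while the typed array stores the raw numbers side by side.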

Lists can also contain other lists so lists can be *nested*. 

In [None]:
b = 'Monty Python'
l = [1, 2, ['a', b]]

In [None]:
l

The *reverse()* method reverses the list **in place**, that is, it modifies the list itself instead of returning a new one.

In [None]:
l.reverse()
l

When the element types match, lists can also be sorted, again in place.

In [None]:
l = ['f', 'a', 'z', 't']
l.sort()
l
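As a small sketch of what "in place" means for sorting, the example below contrasts the `sort()` method with the built-in `sorted()` function, and shows what happens when the element types do not match. The variable names are made up for illustration.

```python
letters = ['f', 'a', 'z', 't']

# sorted() returns a new sorted list and leaves the original untouched
letters_sorted = sorted(letters)
print(letters)
print(letters_sorted)

# sort() reorders the list itself and returns None
result = letters.sort()
print(result)
print(letters)

# Elements that cannot be compared with each other cannot be sorted
mixed = [3, 'a', 1.5]
sort_error = None
try:
    mixed.sort()
except TypeError as err:
    sort_error = err
    print('Cannot sort:', err)
```

A common mistake is writing `l = l.sort()`, which replaces the list with `None`.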

Loops and control flow are the subject of later classes, but here is how you can iterate over a list:

In [None]:
for element in l:
    print(element)

Lists play a very important role in Python. For example, they are used in loops and other flow control structures (discussed later). There are a number of convenient functions for generating sequences that can be turned into lists, for example the `range` function, whose result `list()` converts into a list:

In [None]:
start = 10
stop = 20
step = 2
list(range(start, stop, step)) # 'start' is included in the list but 'stop' is not!

In [None]:
list(range(10)) # with one integer input parameter you'll get a list from zero up to (but not including) 'stop' with an increment value of 1

Lists can be modified, so they are *mutable*.

In [None]:
l1 = [] # instantiate an empty list

In [None]:
l1.append(1)
l1.append('a')
l1.append(1.2)
l1

In [None]:
import numpy as np
l2 = ['Python', 15, np.random.rand(2,2)] # np.random.rand(2,2) creates a 2x2 matrix of random numbers drawn uniformly from the [0, 1) interval
l2

In [None]:
l = l1 + l2 # concatenate the two lists
l

In [None]:
len(l) # length

List slicing: `l[start:stop]`, where 
- indexing starts at 0
- `'start'` is included while `'stop'` is not

In [None]:
l[1:4]

In [None]:
l[-1] # the last element is indexed as -1
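The slicing rules above extend to omitted bounds, negative indices and an optional step. The following is a minimal sketch on a made-up list of letters:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_three = letters[:3]       # an omitted start means "from the beginning"
from_third = letters[2:]        # an omitted stop means "up to the end"
last_two = letters[-2:]         # negative indices count from the end
every_second = letters[::2]     # the optional third value is the step
reversed_copy = letters[::-1]   # a step of -1 gives a reversed copy

print(first_three, from_third, last_two, every_second, reversed_copy)
```

Slicing always returns a new list, so `letters` itself is unchanged after these operations.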

In [None]:
l.remove('a') # remove the first occurrence of a particular item
l

In [None]:
del l[2] # delete an item at a given position
print(l)

List items can be concatenated into a single string using the string method `join`, which is called on the separator.

In [None]:
l = ['Monty', 'Python']
print(''.join(l))
print(' '.join(l))
print(', '.join(l))

This comes in handy when writing automated SQL scripts.

In [None]:
database = 'SALES'
table = 'WEBSHOP_SALES'
month = 'April'
day = ['20', '21', '22'] # 'join()' is a string method, so the list elements can only be strings!

query = f"""
SELECT *
FROM {database}.{table}
WHERE month = '{month}'
AND day IN ({", ".join(day)})
"""

# note the parentheses (needed for the proper SQL syntax) and the curly braces (for the f-string) after the IN clause

print(query)
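If the days arrive as integers rather than strings, `join()` fails. One possible workaround, sketched below with a hypothetical `days` list, is to convert each item to a string first:

```python
days = [20, 21, 22]   # hypothetical input: integers instead of strings

join_error = None
try:
    ', '.join(days)   # join() only accepts strings
except TypeError as err:
    join_error = err
    print('join failed:', err)

# Convert every item to a string before joining
day_list = ', '.join(str(d) for d in days)
print(day_list)
```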

### Tuples

Tuples are like lists, except that they cannot be modified once created, that is they are *immutable*. 

In Python, tuples are created using the syntax `(..., ..., ...)`, or even `..., ...`:

In [None]:
height_and_weight = (165, 60)
print(height_and_weight)
print(type(height_and_weight))

In [None]:
height_and_weight = 165, 60
print(height_and_weight)
print(type(height_and_weight))

You can iterate on tuples just like on lists:

In [None]:
for parameter in height_and_weight:
    print(parameter)

Elements are accessed by index, just like in lists:

In [None]:
height_and_weight[0]

Remember, tuples are *immutable*:

In [None]:
height_and_weight[0] = 170 # this assignment raises a TypeError

We can *unpack* a tuple by assigning it to a comma-separated list of variables:

In [None]:
height, weight = height_and_weight
print(height)
print(weight)
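Unpacking has a few more uses worth knowing. The sketch below, with made-up values, shows swapping two variables, collecting the remaining items with a starred name, and the error raised when the number of variables does not match:

```python
height, weight = 165, 60

# Swap two variables without a temporary one
height, weight = weight, height
print(height, weight)

# A starred name collects the remaining items into a list
first, *rest = (1, 2, 3, 4)
print(first, rest)

# The number of variables must match the number of items
unpack_error = None
try:
    a, b = (1, 2, 3)
except ValueError as err:
    unpack_error = err
    print('Cannot unpack:', err)
```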

### Dictionaries

Dictionaries are also a little like lists, except that each element is a *key-value pair*. Dictionaries are written with curly brackets. A dictionary is a collection which is *mutable* and does not allow duplicate keys. The syntax for dictionaries is `{key1: value1, key2: value2, ...}`. Think of keys as labels of the particular values.

An *'item'* in a dictionary is a tuple of (*'key'*, *'value'*). 

Another similarity to lists is that these *values* can be any kind of objects: integers, strings, lists, functions, even other dictionaries. 

The difference from lists is that list elements are retrieved by their position (this is how we iterate over them), while dictionary values are retrieved by their labels, or *keys*. Since Python 3.7 dictionaries do preserve insertion order, but lookups never depend on it. Searching a dictionary is very fast because the key is hashed to locate the value directly, instead of scanning the elements one by one. (See [hash functions](https://en.wikipedia.org/wiki/Hash_function))

In [None]:
class_size = {
    'Java': 20,
    'C++': 23, 
    'Python': 29
}

In [None]:
print(type(class_size))
print(class_size)

Strings, numbers, and tuples work as keys, and any type can be a value. More precisely, a key must be *hashable*: immutable types such as strings, numbers and tuples (of hashable elements) qualify, while mutable types such as lists, dictionaries and sets cannot be used as keys.
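As a quick illustration of which types can serve as keys, the sketch below uses a tuple as a key and then tries the same with a list. The coordinates are rough, illustrative values only.

```python
# A tuple is immutable, so it can be used as a key
city_by_coordinates = {(47.5, 19.0): 'Budapest'}
print(city_by_coordinates[(47.5, 19.0)])

# A list is mutable, so Python refuses to use it as a key
key_error = None
try:
    bad = {[47.5, 19.0]: 'Budapest'}
except TypeError as err:
    key_error = err
    print('Invalid key:', err)
```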

Looking up a key which is not in the dict raises a `KeyError`. Use `in` to check whether the key is in the dict, or use `dict.get(key)`, which returns the value, or `None` if the key is not present (`get(key, default)` lets you specify what value to return in the not-found case).

In [None]:
class_size['Java']

In [None]:
class_size['JavaScript']

In [None]:
class_size.get('JavaScript', 0)

In [None]:
'JavaScript' in class_size

To access each element:

In [None]:
for key, value in class_size.items():
    print('The number of students in the', key, 'class is', value, '.')

In [None]:
for key in class_size.keys():
    print(key)
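Since dictionaries are mutable and keys are unique, assigning to an existing key overwrites its value rather than adding a second entry. A minimal sketch (the `'Rust'` entry and its number are invented for illustration):

```python
class_size = {'Java': 20, 'C++': 23, 'Python': 29}

class_size['Rust'] = 12            # a new key adds a new item
class_size['Python'] = 30          # an existing key is overwritten, not duplicated
del class_size['Java']             # remove an item by key
removed = class_size.pop('C++')    # remove an item and get its value back

print(class_size)
print(removed)
```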

### Sets

A Python `set` is a **collection** which is unordered and unindexed. The set itself is mutable (items can be added and removed), but its items must be immutable (hashable) values. The most important aspect of sets is that they _can't have two items of the same value_.

You can create a set by writing its items in curly brackets, or by calling the `set()` constructor on a list (or any other iterable). In the latter case `set()` removes the duplicates from the list.

In [None]:
ls_a = ['a', 'b', 'a', np.pi, 36] 

In [None]:
st_a = set(ls_a)
st_a

In [None]:
st_b = {'d', 36, 'Holy Grail', np.pi}
st_b

We can perform standard _set operations_ with sets.

In [None]:
# union
st_a | st_b

In [None]:
# intersection
st_a & st_b

In [None]:
# difference
st_a - st_b

In [None]:
st_b - st_a
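A few more set operations, sketched on smaller made-up sets: the symmetric difference, membership tests, and adding or removing items.

```python
st_a = {'a', 'b', 36}
st_b = {'d', 36, 'Holy Grail'}

# symmetric difference: items that are in exactly one of the two sets
sym_diff = st_a ^ st_b
print(sym_diff)

# sets can be modified after creation
st_a.add('c')
st_a.add('a')        # adding an existing item changes nothing
st_a.discard('b')    # discard() does not complain if the item is missing
print(st_a)

# membership tests use the same 'in' operator as lists and dictionaries
print(36 in st_a)
```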

### JSON

`JSON` is a syntax for storing and exchanging data. JSON is text, written with *JavaScript object notation*, which is a special syntax for writing data as text. In order to work with JSON in Python you need to import the `json` module.

In [None]:
import json

In [None]:
JSON_string = '{"class 0" : "intro", "class 1" : "coding basics", "class 2": "basic structures"}'

In [None]:
type(JSON_string)

Using the `loads()` function of the `json` module, the JSON string can be parsed into a dictionary.

In [None]:
class_dict = json.loads(JSON_string)

In [None]:
type(class_dict)

In [None]:
class_dict

In [None]:
class_dict.keys()

In [None]:
class_dict.values()

In [None]:
type(class_dict)

We can convert a dictionary into a **valid** JSON string using the `dumps()` function.

In [None]:
new_dict = {
    'alpha': 0,
    'beta': 'car',
    'gamma': 1.4,
    'delta': None,
    'epsilon': True
}

In [None]:
print(new_dict)

In [None]:
new_json_string = json.dumps(new_dict)

In [None]:
print(new_json_string) # Note that quotes and literals change to valid JSON: double quotes, None -> null, True -> true.

The conversion patterns are given in the [JSON documentation](https://docs.python.org/2/library/json.html#py-to-json-table).
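The conversion is not always perfectly reversible. The sketch below, using an invented `record` dictionary, writes nested data to a formatted JSON string and reads it back; note what happens to the tuple.

```python
import json

record = {
    'name': 'Monty',
    'scores': [1, 2.5],
    'active': True,
    'note': None,
    'pair': (1, 2),   # JSON has no tuple type
}

# indent and sort_keys only affect how the text looks
text = json.dumps(record, indent=2, sort_keys=True)
print(text)

restored = json.loads(text)
print(restored['pair'])   # the tuple comes back as a list
print(restored == record)
```

Because JSON only knows arrays, the tuple is written as an array and read back as a list, so the restored dictionary is not equal to the original one.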

## Modules<a class = 'anchor' id = 'modules'></a>

Most of the functionality in Python is provided by `modules`. The Python Standard Library is a large collection of modules that provide cross-platform implementations of common facilities such as access to the operating system, file I/O, string management, network communication, and much more. 

Formally, a module is a Python file with the `.py` extension that defines classes, functions, variables, or even runnable code. We can also define our own modules, which is a great way to produce reusable code and to keep our workflow organized.

The very basic Python functionalities are automatically loaded when starting Python, but most functions, methods, object types, etc. can only be used by *importing* these modules with the `import` statement. 

In [None]:
import math

This imports the whole module and makes it available for use later in the program. When using the module's functions we need to prefix the function's name with the module's name.

In [None]:
math.cos(2 * math.pi)

Alternatively, we can choose to import all symbols (functions and variables) in a module to the current namespace, so that we don't need to use the prefix `math.` every time we use something from the math module:

In [None]:
from math import *

cos(pi)

This pattern can be very convenient, but in *large programs that include many modules* it is often a good idea to keep the symbols from each module in their *own namespaces*, by using the `import math` pattern. This eliminates potentially confusing namespace collisions, that is, cases where several modules have functions or methods with the same name which perform completely different tasks.
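A name collision can be sketched with two standard library modules that both define `sqrt`. The `cmath` module (math for complex numbers) is not used elsewhere in this lecture; it is chosen here only as a convenient example.

```python
import math
import cmath

from math import sqrt
from cmath import sqrt   # silently replaces the sqrt imported from math

# The bare name now refers to the complex version
print(sqrt(-1))

# With explicit namespaces it is always clear which function is meant
print(math.sqrt(4))
print(cmath.sqrt(-4))

math_error = None
try:
    math.sqrt(-1)        # the real-valued version rejects negative input
except ValueError as err:
    math_error = err
    print('math.sqrt failed:', err)
```

Nothing warns you when the second import overwrites the first, which is why the prefixed form is safer in larger programs.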

A third way is to import the necessary functions only.

In [None]:
from numpy import ceil, floor

In [None]:
print(ceil(5.5))
print(floor(5.5))

When importing modules we often use ***aliases***, or simple abbreviations of the module names, which then serve as the namespace prefix. An alias can be any valid identifier that is not a keyword (you could alias your module as *mickeymouse*), but it makes sense to follow general conventions for practical purposes: when we find solutions to our coding problems on the web it is easier to copy and paste those solutions if we don't need to redefine those aliases.

In [None]:
import pandas as pd # 'pd' is the conventional alias for the 'pandas' module

In [None]:
data = [1,2,3,4]

series = pd.Series(data)
print(series)

As mentioned above, we can also import our own modules.

In [None]:
import sample_module as mymod

In [None]:
mymod.print_hello('Monty Python')
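The contents of `sample_module` are not shown in this lecture. As a rough sketch of how such a module could look, the example below writes a hypothetical `my_sample_module.py` file to a temporary folder and imports it. The file name, the function body and the greeting text are all assumptions made for illustration; in practice you would simply save the `.py` file next to your notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source: a single function, as a module file would contain it
module_source = '''
def print_hello(name):
    """Print a greeting and return it."""
    message = f'Hello, {name}!'
    print(message)
    return message
'''

# Save the source as a .py file in a temporary folder
module_dir = tempfile.mkdtemp()
Path(module_dir, 'my_sample_module.py').write_text(module_source)

# Python looks for modules in the folders listed in sys.path
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import my_sample_module as mymod

greeting = mymod.print_hello('Monty Python')
```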