# SESSION A1: Python (i)

By **Miquel Torrens i Dinarès**

*Barcelona School of Economics*

*Data Science Center*

January 4, 2022

## 1. Preliminaries

### 1.1 Getting started

We will be working in Google Colab throughout the course. You can find more details about Colab [here](https://colab.research.google.com/notebooks/basic_features_overview.ipynb).

There are different styles of writing code, but try to stick consistently to a (sensible) style guide. It is essential that code not only works but is also understandable by everyone. The "official" guide for `python` can be found here:

* Style guide: https://www.python.org/dev/peps/pep-0008/

Rules for writing good code are mostly common sense, just like those of natural language. You can make them your own (within reason); the important part is to be **consistent**.

### 1.2 Printing

It can be useful to *print* values to the console while the code runs; this is done with the built-in `print` function. You might have noticed that Colab notebooks automatically display the value of the last expression in a cell when you execute it, so you don't need to print that one.

In [None]:
# Colab automatically displays the value of "y" (the last expression in the cell),
# but we need to print "x" explicitly if we want to see it
x = 15 / 2
print(x)
y = x > 2
y

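We may want to insert values into a string using placeholders. There are several ways to do this. One of them is the `%` operator, where `%i` stands for an integer, `%.2f` for a float with two decimals and `%s` for the string version of any value: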

In [52]:
a = 15
b = 2
c = 2
'%i divided by %i is %.2f, and it is %s that this is greater than %i' % (a, b, a/b, (a/b) > c, c)

'15 divided by 2 is 7.50, and it is True that this is greater than 2'
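The `%` operator is the oldest of these approaches. As a minimal sketch, the same sentence can also be built with the `str.format` method or with an f-string (available since Python 3.6); the names `msg_format` and `msg_fstring` are just illustrative:

```python
a = 15
b = 2
c = 2

# str.format: each {} is filled in order; {:.2f} formats a float with 2 decimals
msg_format = "{} divided by {} is {:.2f}, and it is {} that this is greater than {}".format(
    a, b, a / b, (a / b) > c, c)

# f-string: the expressions are written directly inside the braces
msg_fstring = f"{a} divided by {b} is {a / b:.2f}, and it is {(a / b) > c} that this is greater than {c}"

print(msg_format)
print(msg_fstring)
```

Both lines print the same text as the `%` version above.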

We will go into what these objects are shortly.

## 2. Basic data types

Here we highlight the most important native operations you can apply to basic objects in `python`. We will look in more detail at how these operations are structured in the `R` session later in the afternoon.

### 2.1 Numerical values

In [None]:
# Assignment
a = 10  # 10

# Increment/Decrement
a += 1  # 11 (a = a + 1)
a -= 1  # 10 (a = a - 1)

# Operations
b = a + 1  # 11
c = a - 1  # 9

d = a * 2  # 20
e = a / 2  # 5.0 (division always returns a float)
f = a % 3  # 1 (remainder)
g = a ** 2  # 100 (exponentiation)

# Operations with other variables
d = a + b  # 21
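One detail worth checking: `/` always returns a `float`, even when the division is exact. The sketch below contrasts it with floor division `//`, an operator not listed above, and uses `type` (covered in section 2.7) to inspect the results:

```python
a = 10

e = a / 2    # true division: always a float, even if the result is exact
h = a // 3   # floor division: rounds down; an int when both operands are ints
f = a % 3    # remainder

print(e, type(e))       # 5.0 <class 'float'>
print(h, type(h))       # 3 <class 'int'>
print(a == 3 * h + f)   # True: quotient and remainder rebuild a
```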

### 2.2 String values

You can concatenate strings together with the `+` operator:

In [5]:
"Adding" + " " + "strings" + " " + "is" + " " + "pasting"

'Adding strings is pasting'
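The `+` operator only concatenates strings with other strings. A minimal sketch of what happens when a number is involved, and the usual fix of converting it with `str()` first (the variables `n` and `sentence` are illustrative):

```python
n = 3

# Concatenating a str and an int fails with a TypeError
try:
    sentence = "I have " + n + " apples"
except TypeError as err:
    print("TypeError:", err)

# Converting the number with str() first makes the concatenation work
sentence = "I have " + str(n) + " apples"
print(sentence)   # I have 3 apples
```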

The built-in function `len` can be used to find the length of a string:

In [6]:
len("four")

4
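`len` counts every character, spaces included, and it works on other containers such as lists too. For instance, applied to the sentence built above:

```python
sentence = "Adding" + " " + "strings" + " " + "is" + " " + "pasting"
print(len(sentence))          # 25: the three spaces count as characters
print(len(["a", "b", "c"]))   # 3: len also works on lists
```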

### 2.3 Logical values

We can evaluate relationships between different values in `python`. The result of such a comparison or logical operation is a boolean value: `True` or `False`. Some examples:

In [8]:
# Comparing numbers
x = (1 >= 2)  # greater or equal
y = (1 == 2)  # equal
w = (1 != 2)  # not equal

# Parentheses are not required, but they help readability
x = (1 <= 2) and (1 > 0)  # both statements are true
y = (1 >  2) or  (1 < 3)  # at least one statement is true

# To check whether a value is None, it is good practice to use "is" rather than ==:
x = None
y = x is None
z = x is not None
print(y)
print(z)

True
False
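Why prefer `is` for `None`? `==` compares *values*, while `is` checks whether two names refer to the *same object*; since there is only one `None` object, `is` is the natural test. A small illustration with lists (introduced in the next section):

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # c is a second name for the same list as a

print(a == b)   # True: same contents
print(a is b)   # False: two separate list objects
print(a is c)   # True: literally the same object
```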


### 2.4 Lists

Lists are one of the simplest multi-value objects. They are created with square brackets:

In [9]:
a_list = ["This", "is", "a", "list", "of", "strings"]

Here you can see we created a list of strings. We can also create a list of integers:

In [10]:
num_list = [1, 5, 10]

Lists in `python` need not be homogeneous; you can mix object types:

In [12]:
mix_list = ['a', 'b', 1, 2, 3, True, None]

Sometimes you want to access individual elements from a list. You can do this using square brackets together with the index of the element:

In [13]:
mix_list[0]  # first element

'a'

Notice that the first element is indexed at `0`, the second element at `1`, and so on. You can also access a contiguous range of elements:

In [15]:
mix_list[1:3]  # second item (index 1) and third item (index 2) only!

['b', 1]

You can also use negative indices to access items from the end. For example, the last item:

In [None]:
num_list[-1]
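Negative indices combine with the slicing syntax shown above. A few illustrative cases on `mix_list`: omitting the start or end of a slice means "from the beginning" or "to the end", and an optional third number sets the step:

```python
mix_list = ['a', 'b', 1, 2, 3, True, None]

print(mix_list[-1])    # None: the last element
print(mix_list[-2])    # True: the second to last
print(mix_list[-3:])   # [3, True, None]: the last three elements
print(mix_list[:2])    # ['a', 'b']: from the beginning up to index 1
print(mix_list[::2])   # ['a', 1, 3, None]: every second element
```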

You can concatenate multiple lists together with the operator `+`:

In [16]:
num_list + [40, 50, 60]

[1, 5, 10, 40, 50, 60]

And you can check for membership with the operator `in`:

In [19]:
"abc" in ["abc", "def", "ghi"]

True
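Lists are *mutable*: they can be modified in place after they are created, whereas `+` builds a new list. A short sketch using item assignment and the `append` method (methods are discussed in section 2.7); the name `longer` is illustrative:

```python
num_list = [1, 5, 10]

num_list[0] = 100      # replace the first element in place
num_list.append(40)    # add an element at the end
print(num_list)        # [100, 5, 10, 40]

# "+" builds a new list and leaves the original untouched
longer = num_list + [50, 60]
print(num_list)        # [100, 5, 10, 40]
print(longer)          # [100, 5, 10, 40, 50, 60]
```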

### 2.5 Tuples

Tuples are created with parentheses:

In [20]:
x = ("foo", 1)

They can also be created without parentheses; the comma alone makes it a tuple:

In [None]:
x = "foo", 1

Elements in the tuple are also accessed via the index (like lists).

Lists can be used in most places where a tuple is used, so the difference between the two can be confusing. Technicalities aside, the following rules of thumb can help you decide when to use a tuple and when to use a list:

* `list`: (potentially) many elements, unknown number, relatively homogeneous, mutable.
* `tuple`: few elements, fixed number, possibly heterogeneous, immutable (fixed).

The name comes from the sequence double, triple, quadruple, ..., *n*-tuple, which hints that tuples are meant to have a fixed length.
Since their length is fixed, we often use them with destructuring (also called unpacking):

In [None]:
name, num = x

Now the variable `name` contains the value `"foo"` and the variable `num` contains the value `1`.
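Tuples are immutable, so trying to change an element raises an error, and destructuring requires exactly one variable per element. A minimal sketch showing both behaviours:

```python
x = ("foo", 1)
name, num = x  # destructuring: one variable per element

# Tuples are immutable: item assignment raises a TypeError
try:
    x[0] = "bar"
except TypeError as err:
    print("TypeError:", err)

# The number of variables must match the number of elements
try:
    first, second, third = x
except ValueError as err:
    print("ValueError:", err)
```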

### 2.6 Dictionaries

Dictionaries are another basic type in `python`. They are *associative* data structures: like a standard dictionary, a `python` dictionary associates each `KEY` with a `VALUE`. They are created with curly braces, `{` and `}`:


In [21]:
player = {"name": "Jane", "score": 10000}

You can access the value via the *key*, and set it in a similar way:


In [22]:
player["name"]
player["name"] = "Jane Smith"

Each key can only have one value. In the example above, we have overwritten the value of the original `"name"` key.

In [None]:
# Create a list whose elements are of type dictionary
# Each element on the list is a player, and each player has three attributes.
players = [{"name": "John", "score": 100, "likes": ["R"]},
           {"name": "Jane", "score": 10000, "likes": ["python"]},
           {"name": "Stephen", "score": 55, "likes": ["julia"]}]
print(players[0])

# We can fetch elements of the dictionary
players[0]['name']
players[0].get('name')
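The two ways of fetching a value differ when the key does not exist: square brackets raise a `KeyError`, whereas `.get` returns `None` (or a default value passed as its second argument). A short sketch, using a hypothetical `"age"` key that the dictionary lacks:

```python
player = {"name": "Jane Smith", "score": 10000}

# Square brackets raise a KeyError for a missing key...
try:
    player["age"]
except KeyError:
    print("no 'age' key")

# ...while .get returns None, or the default value you supply
print(player.get("age"))      # None
print(player.get("age", 0))   # 0

# "in" checks the keys of a dictionary
print("score" in player)      # True

# Assigning to a new key adds it to the dictionary
player["likes"] = ["python"]
print(player)
```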

### 2.7 Properties of an object: instances, attributes and methods

Objects in `python` have *classes*. For example:

In [29]:
new_list = [1, 2, 3]  # Create a list
type(new_list)  # This function tells you the type of object

list

`new_list` is an *instance* of the class `list`. Instances can have attributes.

*Attributes* are just variables that are attached to the instance. They are accessed with dot notation:

In [40]:
# I define my own class
class invented_class:
    name = "John"
    score = 100

    def show(self):
        print(self.name)
        print(self.score)

# Create an object of this new class
new_obj = invented_class()
print(new_obj.score)

100


If the attribute happens to be a function, we call it a *method*. Methods are functions that have a special purpose: they interact with the instance itself in some way.
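The class above already defines a method, `show`, which we never called. A minimal sketch that restates the class so the cell runs on its own, then calls the method with dot notation and parentheses; built-in classes such as `str` also come with methods, e.g. `upper`:

```python
# Same class as above, restated so this cell runs on its own
class invented_class:
    name = "John"
    score = 100

    def show(self):
        # "self" is the instance the method is called on
        print(self.name)
        print(self.score)

new_obj = invented_class()
new_obj.show()           # prints John, then 100

# Built-in classes have methods too: upper() belongs to str
print("jane".upper())    # JANE
```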

## 3. Iterables and control flow

### 3.1 Iterate over elements of an object (`for`-loops)

You may want to perform an operation on each element of an *iterable* object and (possibly) store the result. We do that by *looping* over the object, applying the operation to each of its elements in turn.

In [47]:
vector = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
squared = []
for num in vector:
    squared_num = num ** 2
    squared.append(squared_num)  # add the result at the end of the list

squared

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

The structure of the loop is critical and always the same: 

In [None]:
## NOT RUN
for ELEMENT in EXISTING_OBJECT:  # Use "for", "in" and ":"
    operation_on(ELEMENT)  # Indent the body (MANDATORY; 4 spaces by convention)

# Loop ENDS when indentation is over
## END NOT RUN

`for`-loops work with any iterable object, such as a `list`, a `tuple`, a `dict` or a `str`.
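Each iterable yields different elements: looping over a dictionary visits its *keys*, and the `.items()` method yields `(key, value)` tuples that can be destructured directly in the `for` statement. A short sketch (the `player` dictionary mirrors the one in section 2.6):

```python
player = {"name": "Jane", "score": 10000}

# Looping over a dictionary visits its keys (in insertion order since Python 3.7)
keys = []
for key in player:
    keys.append(key)
print(keys)                  # ['name', 'score']

# .items() yields (key, value) tuples, destructured into two variables
for key, value in player.items():
    print(key, "->", value)  # name -> Jane, then score -> 10000

# Strings are iterable too: the loop visits one character at a time
for letter in "abc":
    print(letter)            # a, then b, then c
```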

### 3.2 Operations on lists

There are three main operations we perform on a list:
1. Aggregate (*reduce* operation)
2. Apply a function to each element (*map* operation)
3. Select elements (*filter* operation)

We illustrate these with some examples.

In [None]:
# 1. AGGREGATION
# Summing the numbers in a list: 
nums = [30, 1, 4, 3, 10.5, 100]
total = 0 
for num in nums:
    total += num
    
print(total)

# 2. APPLY A FUNCTION 
# Squaring each number in a list
nums = [30, 1, 4, 3, 10.5, 100]

# This is called a "list comprehension"
# and is the python way to apply a function to 
# every element in a list
squared_nums = [num ** 2 for num in nums]
squared_nums

# 3. FILTER
# Remove all values less than 18:
ages = [0, 3, 21, 45, 10, 97]
adults = [a for a in ages if a > 17]
adults
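`python` also ships built-in counterparts of these three operations. As an illustrative alternative to the explicit code above: `sum` aggregates, `map` applies a function and `filter` selects elements, while `functools.reduce` generalizes aggregation. Here `lambda` defines a small unnamed function inline (functions are covered in section 5):

```python
from functools import reduce

nums = [30, 1, 4, 3, 10.5, 100]
ages = [0, 3, 21, 45, 10, 97]

# 1. Aggregation: the built-in sum, or the general-purpose reduce
total = sum(nums)
total_reduce = reduce(lambda acc, num: acc + num, nums, 0)

# 2. Apply a function: map returns an iterator, so we wrap it in list()
squared_nums = list(map(lambda num: num ** 2, nums))

# 3. Filter: keeps the elements for which the function returns True
adults = list(filter(lambda a: a > 17, ages))

print(total, total_reduce)   # 148.5 148.5
print(squared_nums)
print(adults)                # [21, 45, 97]
```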

### 3.3 Control flow

Sometimes you may want to run an operation only if some criterion is satisfied, for example to make a decision depending on whether a condition holds. In these cases we use `if`-statements.

In [51]:
gender = "male"
age = 20 

# Start of if-statement
if gender == "female":
    if age >= 18:
        print("woman")
    else:
        print("girl")
elif gender == "male":
    if age >= 18:
        print("man")
    else:
        print("boy")
else:
    print("other")


man


Again, the structure of an `if`-statement is always the same, so it is essential to use it correctly:

In [None]:
## NOT RUN
if BOOLEAN:  # Use: "if", ":" and supply a boolean condition
    ACTION1a  # Indent with 4 spaces
    ACTION1b
elif BOOLEAN:  # Second layer to "if" (else-if)
    ACTION2
else:  # Rest of scenarios (make sure all contingencies are covered)
    ACTION3

## END NOT RUN

As in `for`-loops, blocks end where the indentation ends.
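When an `if`/`else` only chooses between two values, it can be written as a one-line *conditional expression*; this form appears later in `min_sum_fun`. A minimal sketch with hypothetical `label` variables:

```python
age = 20

# Full if/else statement
if age >= 18:
    label = "adult"
else:
    label = "minor"

# Equivalent conditional expression: VALUE_IF_TRUE if CONDITION else VALUE_IF_FALSE
label_short = "adult" if age >= 18 else "minor"

print(label, label_short)  # adult adult
```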

## 5. Functions

Functions take (optional) inputs and produce an output through a set of operations.

In [None]:
## NOT RUN
def function_name(inputs):  # Use "def", parentheses and ":"
    operations  # Indent the body, as in loops
    return output  # "return" sends back the output (without it, the function returns None)

## END NOT RUN

For example, here is a function that takes a number, and returns its square:

In [2]:
def squared(x):
    return x**2

squared(7)

49

Here is a more general function that takes a number and a power and returns the number to that power:

In [3]:
def power(x, n):
    return x ** n

power(4, 3)

64
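Arguments can also be given default values and passed by name. A possible variant of `power` (not part of the original example) in which `n` defaults to `2`:

```python
def power(x, n=2):
    # n defaults to 2 when the caller does not supply it
    return x ** n

print(power(4))          # 16: uses the default n=2
print(power(4, 3))       # 64: positional arguments
print(power(n=3, x=4))   # 64: keyword arguments can go in any order
```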

Here is a function that returns the minimum and the sum of a list of numbers:

In [8]:
def min_sum_fun(x):
    minx = x[0] if len(x) > 0 else None
    sumx = 0.0  
    for val in x: 
        if val < minx:
            minx = val
        sumx += val 
    return minx, sumx  # Notice multiple outputs (technically a tuple)


Now we can call that function:

In [14]:
m, s = min_sum_fun([1, 5, 0.3, -1])  # Destructuring
w = (m, s) = min_sum_fun([1, 5, 0.3, -1])  # Chained assignment: w gets the whole tuple, (m, s) unpacks it

print(m)
print(s)
print(w)
print(type(w))

-1
5.3
(-1, 5.3)
<class 'tuple'>
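A quick way to sanity-check a hand-written function is to compare it with built-ins. The sketch below restates `min_sum_fun` so it runs on its own, compares it with `min` and `sum`, and shows the empty-list case, where the conditional expression in the first line returns `None` as the minimum (the built-in `min` would raise a `ValueError` instead):

```python
def min_sum_fun(x):
    minx = x[0] if len(x) > 0 else None
    sumx = 0.0
    for val in x:
        if val < minx:
            minx = val
        sumx += val
    return minx, sumx

values = [1, 5, 0.3, -1]
m, s = min_sum_fun(values)

# The built-ins min and sum compute the same quantities
print(m == min(values))                # True
print(abs(s - sum(values)) < 1e-12)    # True: compare floats with a tolerance

# Empty list: the loop never runs, so the result is (None, 0.0)
print(min_sum_fun([]))                 # (None, 0.0)
```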


## 6. Modules and imports

Much of `python`'s functionality lives in *modules*, which must be imported before their contents can be used.
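A module is a file of `python` code whose functions and variables you can reuse after importing it. A minimal sketch of the three most common import forms, using the standard-library modules `math` and `statistics` (chosen here only as examples):

```python
# 1. Import a whole module and access its contents with dot notation
import math
print(math.sqrt(16))            # 4.0

# 2. Import specific names from a module
from statistics import mean, median
print(mean([1, 2, 3, 4]))       # 2.5
print(median([1, 2, 3, 4]))     # 2.5

# 3. Import a module under a shorter alias
import statistics as st
print(st.mean([1, 2, 3, 4]))    # 2.5
```

By convention, `numpy` and `pandas` are imported under the aliases `np` and `pd`.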

### 6.1 `numpy`

#### 6.1.1 `array` operations
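`numpy` arrays apply operations element-wise, which replaces many of the explicit loops from section 3. A minimal sketch, assuming `numpy` is installed (it is preinstalled in Colab); the array `vector` is illustrative:

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5])

# Arithmetic is applied element-wise, with no explicit loop (a vectorized "map")
print(vector ** 2)                   # [ 1  4  9 16 25]
print(vector + 10)                   # [11 12 13 14 15]

# Aggregations are available as methods of the array (a "reduce")
print(vector.sum(), vector.mean())   # 15 3.0

# A boolean condition selects elements (a vectorized "filter")
print(vector[vector > 2])            # [3 4 5]
```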

### 6.2 `pandas`
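`pandas` organizes data in tables (`DataFrame`s) whose columns are `Series`. A minimal sketch, assuming `pandas` is installed (it is preinstalled in Colab), that turns a list of dictionaries like the `players` one from section 2.6 into a table, averages a column and filters rows; the name `df` is illustrative:

```python
import pandas as pd

# A list of dictionaries, as in section 2.6 (the "likes" field is omitted here)
players = [{"name": "John", "score": 100},
           {"name": "Jane", "score": 10000},
           {"name": "Stephen", "score": 55}]

# Each dictionary becomes a row and each key becomes a column
df = pd.DataFrame(players)
print(df)

# Selecting a column gives a Series, which has methods such as mean()
print(df["score"].mean())       # 3385.0

# A boolean condition on a column filters the rows
print(df[df["score"] > 60])
```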

## 7. Structured data operations (`pandas`)

* loading, series/dataframe, iloc, alignment, filtering, missing values, subsetting, groupby, concatenation, merging

## 8. Advanced concepts

## 9. Particular topics

* Dates, web scraping, text mining?

### 9.1 Web scraping