**Introductory and intermediate computing for Data Science [Barcelona School of Economics]**

`Instructor:` Maxim Fedotov  
`Program:` M.Sc. in Data Science Methodology

# Class 1: variables and jupyter notebook basics

# Variables

As data scientists, you will be working with, not surprisingly, data. In Python, the concept of a *variable* captures what we usually mean by the word "data". A variable has a *name* and points to an associated *memory cell* where some *value* is stored. To define a variable, we assign a value (more generally, an expression) to a chosen name using the assignment symbol `=`. That is, an assignment has the form `name = expression`. The name of a variable is also called an *identifier*.

Consider the problem of computing the area of a rectangle with sides 3 and 5. First, let's define the original data in the code cell below. Try to run the cell: 1. select it with your mouse or with the arrow keys, 2. click the play button or press `control (ctrl) + enter`. Note that the hotkey may differ depending on the interface in which you run this notebook. Another useful combination is `shift + enter`, which runs the current cell and then selects the next one.

You might be asked to choose a Python interpreter, depending on the interface in which you run this notebook. If there is a "default" / "global" version, choose it; otherwise choose the latest one. If you created a separate Python environment for the course, choose the corresponding interpreter.

In [None]:
length = 3
height = 5.0

When you run the first cell, your interpreter starts an interactive session, which lets you add code on the go and run new cells to obtain new results or rewrite previous ones. Keep track of this, as it also means that previously created variables may be overwritten! This quite often leads to unintentional mistakes at the beginning of the learning path.

Now, let's try to compute the area of the rectangle in the cell below! We will discuss the arithmetic operations defined in Python further along the notebook. For the moment, do what feels intuitive.

In [2]:
area = # complete the code

There are a couple of simple ways to display the value of a variable. For instance, you can use the function `print(...)` to print the value.

In [None]:
# try to print a value of the 'area' variable


In a Jupyter notebook you can simply type the name of a variable on the last line of a cell and run it. This displays the corresponding value below the cell.

In [None]:
# try to display a value of the 'area' variable in the Jupyter manner


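We can find out the numerical address in memory to which a variable name points via the function `id(...)` (strictly speaking, `id` returns a unique identifier of the object, which in the standard CPython interpreter is its memory address). You can also get a hexadecimal representation of this address by applying the function `hex(...)`.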

In [2]:
print("A numeric ID of a memory cell to which the `length` variable points to:", id(length))
print("A HEX representation of this address is:", hex(id(length)))

A numeric ID of a memory cell to which the `length` variable points to: 4344054064
A HEX representation of this address is: 0x102ed0130


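Note that if you create another variable with the same value, it may have the same id. For small integers such as 3, this is what happens in the standard CPython interpreter, which keeps a single copy of each of them; it is not guaranteed for arbitrary values: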

In [None]:
another_length = 3
print(
    f"For the `length` variable, the id is: {id(length)}", 
    f"For the `another_length`: {id(another_length)}", 
    sep="\n"
)

For the `length` variable, the id is: 4344054064
For the `another_length`: 4344054064
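
Sharing an id between two variables is not guaranteed in general. As a minimal sketch (the variable names are made up for illustration), the cell below creates two lists with equal contents: the values compare equal, yet they are two distinct objects with different ids.

```python
# Two separately created lists with the same contents (hypothetical names).
first_sides = [3, 5.0]
second_sides = [3, 5.0]

# The values are equal ...
print(first_sides == second_sides)          # True
# ... but these are two distinct objects, so the ids differ.
print(id(first_sides) == id(second_sides))  # False
```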


Recall that it is possible to *overwrite* a previously created variable, i.e. replace the value that is assigned to its name. Let's illustrate this by running the cell below.

In [4]:
length = 6.0
print(length)

6.0


### Naming a variable

There is a conventional way to write a variable name, called *snake_case*: if a name is a combination of several words, they are separated by `_`. Preferably, letters in a name should be lower case, although this is not required. Note that names (or identifiers) in Python are case-sensitive. Also, a variable name cannot start with a digit.

Generally, any name that you introduce in a program should reflect the nature of the named entity, so try to incorporate the purpose of an entity in its name. For example, `car_speed = 50`. From this example, you can also see that it can be hard to understand code without additional comments. You can comment your code using the hash character `#`, like this:

`car_speed = 50  # note that everywhere in the program speed is measured in km/h`. 

However, if you name a constant that has a well-known conventional symbol, a short non-specific name can also work. For example,

`c = 299792458  # speed of light in m/s`.
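
The spelling rules above can be checked programmatically. The sketch below is only an illustration with made-up candidate names; it relies on the built-in string method `str.isidentifier()` and the standard `keyword` module, which flags reserved words that cannot be used as variable names.

```python
import keyword

# Candidate names to check (all hypothetical).
candidates = ["car_speed", "carSpeed", "2fast", "car speed", "class"]

for name in candidates:
    # str.isidentifier() checks the spelling rules;
    # keyword.iskeyword() flags reserved words such as `class`.
    usable = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {usable}")

# Names are case-sensitive: these are two different variables.
car_speed = 50
Car_Speed = 80
print(car_speed, Car_Speed)
```

Note that `carSpeed` is a valid name even though it does not follow the snake_case convention.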

## Data Types

Data may come in different formats, so there exists a broad concept called "data type". It is closely related to the term "data structure". Basically, a data type determines what kind of value may be stored in its instance and which operations can be performed on it.

First, let's consider some very basic *built-in* types you will deal with most of the time. 

* Boolean values: 
    * <span style="color:green"> bool </span> (boolean) - a binary value, either `True` or `False`.
* Numeric types: 
    * <span style="color:green"> int </span> (integer)
    * <span style="color:green"> float </span> (floating-point number) - a finite-precision approximation of a real number.
* Text sequence type: <span style="color:green"> str </span> (string) 
* The null object: <span style="color:green"> None </span> (its type is called `NoneType`)

In [5]:
my_boolean = True

my_integer = 10
my_float = 5.

my_string = "hey there"  # You can also use single-quotes "'".

my_no_value_object = None

You can find out the type of a variable using the function `type(...)`.

In [6]:
type(length)

float
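
As a small sketch, `type(...)` can be applied to one sample value of each basic type listed above. The last two lines illustrate that floats have finite precision, so some decimal fractions are only approximated.

```python
# One sample value per basic built-in type.
samples = [True, 10, 5.0, "hey there", None]

for value in samples:
    print(repr(value), "->", type(value).__name__)

# Floats have finite precision, so some decimal fractions are only approximated.
print(0.1 + 0.2 == 0.3)  # False
print(0.1 + 0.2)         # 0.30000000000000004
```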

### Sequence types: list, tuple, range

There are also sequence types such as <span style="color:green"> list </span>, <span style="color:green"> tuple </span> and <span style="color:green"> range </span>. The first two define collections of objects, and the last one represents an immutable sequence of integers, typically running from a start value up to (but not including) a stop value. We will discuss them in depth in Class 3, but here we shall just see how instances of these types can be defined.

In [None]:
my_list = [0.1, 0.33, 0.5]
my_tuple = (
    "Ramon Trias Fargas, 25-27", 
    "Roc Boronat, 138", 
    "Carrer de la Mercè, 12",
    "Doctor Aiguader, 80", 
    "Passeig Pujades, 1", 
    "Balmes, 132-134"
)
my_range = range(10)

Feel free to call the function `help(...)` on `range` to see more information about this data type.

If you want to go deeper into the language, it is good practice to look through the documentation on the aspects you need. For example, have a look at the [built-in types](https://docs.python.org/3/library/stdtypes.html#) page in the online Python documentation, but do not be too ambitious about it for now. The documentation is quite hard to read if you are just starting your journey, but it should become easier to understand as you accumulate experience.

#### Indexing: select elements within sequences

You may use the construction `object[integer number]` to select an element from an instance of a sequence type. Note that indexing starts at 0, unlike in R, for example.

In [8]:
my_list[0]

0.1

In [9]:
my_tuple[0]

'Ramon Trias Fargas, 25-27'

In [10]:
my_range[0]

0

Negative numbers are also allowed for indexing. In this case, the number indicates the position of an element counted from the end of the sequence, so `-1` refers to the last element.

In [11]:
my_list[-1]

0.5
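
The relation between negative and non-negative indices can be sketched as follows. The list `weights` is a made-up example; the same rule applies to the other sequence types.

```python
weights = [0.1, 0.33, 0.5]

# Index -k refers to the same element as index len(sequence) - k.
print(weights[-1] == weights[len(weights) - 1])  # True
print(weights[-3] == weights[0])                 # True

# The same rule holds for other sequence types.
print("hey there"[-1])  # e
print(range(10)[-2])    # 8

# An index outside the valid range raises IndexError.
try:
    weights[3]
except IndexError as error:
    print("IndexError:", error)
```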

#### Slicing: select multiple elements

One can also select several elements from a sequence using slicing. In contrast to simple indexing, one uses a slice within the square brackets instead of a single number. 

A slice has the form `start:stop` or `start:stop:step`. It selects every `step`-th entry, starting from the one with index `start` (included) up to index `stop` (excluded).

Note that each of the parameters may be omitted. If `step` is omitted, it is taken to be 1. For a positive `step`, an omitted `start` defaults to 0 (the beginning of the sequence), and an omitted `stop` means that the slice runs to the end of the sequence.


See some examples below, and try it yourself.

In [12]:
my_list[0:3]

[0.1, 0.33, 0.5]

In [13]:
my_list[:3]

[0.1, 0.33, 0.5]

In [14]:
my_list[1:]

[0.33, 0.5]

In [15]:
my_list[::2]

[0.1, 0.5]

In [None]:
my_list[-2:]

[0.33, 0.5]
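
A longer sequence makes the roles of `start`, `stop` and `step` easier to see. The following sketch uses a made-up list `digits`; it also shows that slicing works for strings and tuples and returns an object of the same type.

```python
digits = list(range(10))  # [0, 1, 2, ..., 9]

print(digits[2:8])    # [2, 3, 4, 5, 6, 7]
print(digits[2:8:3])  # [2, 5]
print(digits[:4])     # [0, 1, 2, 3]
print(digits[-3:])    # [7, 8, 9]

# Unlike indexing, a slice that runs past the end does not raise an error.
print(digits[7:100])  # [7, 8, 9]

# Slicing returns a new object of the same type as the original sequence.
print("hey there"[:3])    # hey
print((1, 2, 3, 4)[1:3])  # (2, 3)
```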

*Exercise: Given what you saw, try to figure out a neat way to reverse the list using only slicing.*

In [None]:
# Try it below:


### WARNING: overwriting a variable

There is an important caveat about Python.

When we "create" a new variable by assigning an existing one to it, both names refer to the same object. If that object is later modified through the original variable, the change is also visible through the new variable.

This matters for *mutable* objects, i.e. objects whose contents can be changed in place, such as lists.

This behavior comes from the fact that variables in Python are basically references to objects in memory.

Let's consider an example below.

In [18]:
a = [1]
b = a

At this point, `b` just refers to the same list as `a`.

Let's see the values that are assigned to these variables.

In [19]:
a

[1]

In [20]:
b

[1]

Now, let's try to overwrite the value in the original list, i.e. `a`.

In [21]:
a[0] = 2

We can see that the value has indeed changed in list `a`.

In [61]:
a

[2]

But so did the value in list `b`!!!

In [62]:
b

[2]
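
It may help to distinguish two situations: *modifying* the shared object and *assigning a new object* to one of the names. The sketch below is an illustration only; it uses the identity comparison `is`, which checks whether two names refer to the very same object.

```python
a = [1]
b = a            # both names refer to the same list object
print(a is b)    # True

# Mutation: the shared object changes, so both names see it.
a.append(2)
print(b)         # [1, 2]

# Rebinding: `a` now points to a brand-new list; `b` keeps the old one.
a = [10, 20]
print(a is b)    # False
print(b)         # [1, 2]
```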

## Copying a variable: shallow and deep copies

A possible solution to this kind of issue is copying.

Let's import function `copy` from module `copy`, and try to create a copy of the original list. 

In [22]:
from copy import copy

a = [1]
b = copy(a) # another way of doing a copy of the list would be: b = a[:]

Then, we can see that after changing a value in the original list, the new variable is not affected.

In [26]:
a[0] = 2
print(f"a = {a}, b = {b}")


a = [2], b = [1]


### WARNING: overwriting a variable with deep structure

The problem becomes worse when the original variable is a collection of mutable objects. A classic example is a list of lists.

Let's try to use our copying solution in this case.

In [31]:
a = [[1], [2]]
b = copy(a)
a[0][0] = 3
print(f"a = {a}, b = {b}")

a = [[3], [2]], b = [[3], [2]]


We can check the addresses of the inner lists of `a` and `b` which are located at index 0.

In [32]:
id(a[0]), id(b[0])

(4419153088, 4419153088)

They are the same, which means that the `copy` function did not copy the inner lists. It creates a *shallow* copy: a new outer list whose elements are the very same inner objects.

The `copy` module also provides the `deepcopy` function, which recursively copies the nested objects as well and thus produces a fully independent copy.

Now, let's try the `deepcopy` function on the same example and see the result.

In [33]:
from copy import deepcopy

a = [[1], [2]]
b = deepcopy(a)
a[0][0] = 3
print(f"a = {a}, b = {b}")

a = [[3], [2]], b = [[1], [2]]


Now, when we check the addresses of the inner lists of variables `a` and `b`, we see that they are different.

In [34]:
id(a[0]), id(b[0])

(4419150144, 4419148800)
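
The difference between the two kinds of copies can be summarized in one cell. This is a minimal sketch with made-up variable names, using the identity comparison `is` instead of comparing ids by eye.

```python
from copy import copy, deepcopy

original = [[1], [2]]
shallow = copy(original)
deep = deepcopy(original)

# Both copies are new outer lists ...
print(shallow is original, deep is original)  # False False
# ... but only the shallow copy shares its inner lists with the original.
print(shallow[0] is original[0])              # True
print(deep[0] is original[0])                 # False

# Appending to the outer list of the original affects neither copy.
original.append([3])
print(len(original), len(shallow), len(deep))  # 3 2 2
```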

## Introduction to expressions

In this section we shall learn what we can do with variables. An *expression* is a composite element of a program that consists of "smaller" lexical elements (identifiers, operators and so on) combined according to specific syntactic rules. Simply put, we need expressions to "express a value". Do not think too much about the definition right now; it only serves to draw an analogy with the languages we speak.

### Operations

An *operation* combined with variable identifiers is an example of an *expression* in Python. You have probably seen most of Python's *operators* before. The complete list of operators in Python is:

![image.png](attachment:image.png)
Source: [Python documentation](https://docs.python.org/3/reference/lexical_analysis.html#operators-1).

Today we are going to consider some of them.

Although the operators might seem familiar, they should be treated with care because: 

(1) Some operators are not applicable to some types, e.g. the expression `1 + "add me"` is invalid. If you apply an operator to unsupported types, you will get `TypeError: unsupported operand type(s) ...`.

(2) The effect of some operators depends on the types of the operands, e.g. `2 * 2` gives `4`; try to see what `"double" * 2` returns.

In [21]:
# try here
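
To see point (1) in action without stopping the notebook, the error can be caught with a `try` / `except` block. This construction is not covered in this class, so treat the cell below as a sketch only; the variable names are made up.

```python
number = 1
text = "add me"

try:
    number + text  # int + str is not defined
except TypeError as error:
    print("TypeError:", error)

# An explicit conversion makes the intent clear and the expression valid.
print(str(number) + text)  # 1add me
```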


### Arithmetic operations

Writing arithmetic expressions in Python is quite intuitive, especially those with addition, subtraction, multiplication and division of two numeric values. The order of the operations is as in mathematics. As usual, parentheses help to modify it if you want to.

In [35]:
print(2 * 3 + (5 - 1) / 2 )

8.0


Let's dwell on some operators that might not be that familiar. 

The *power operator* in Python is `**`. The *integer (floor) division* operation `left_operand // right_operand` returns the quotient of the left operand by the right operand, rounded down to the nearest whole number. The *modulo* operation `left_operand % right_operand` returns the remainder of the division of the left operand by the right operand.
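
As a small illustration with made-up numbers, the cell below shows how `//` and `%` relate to each other. The built-in function `divmod` returns both results at once.

```python
dividend, divisor = 17, 5

quotient = dividend // divisor   # 3
remainder = dividend % divisor   # 2
print(quotient, remainder)

# The two results always satisfy this identity.
print(dividend == quotient * divisor + remainder)  # True

# divmod returns both values at once.
print(divmod(dividend, divisor))  # (3, 2)

# With a negative operand, // rounds down (towards minus infinity).
print(-17 // 5, -17 % 5)  # -4 3

# The power operator.
print(2 ** 10)  # 1024
```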

In [47]:
# print the result of raising 4 to the power of 60:


# find the integer part of the division of the previous quantity by 4 and print it:


# see what the remainder of the division of the previous value by 2 is:


# can you find the answer to the previous question by using a composite arithmetic expression?


Some of the operators are customizable, which means that they can be programmed to work with the same syntax on objects other than numerical values. You have seen an example before: `"double" * 2`. So, let's make a short digression to introduce what `+` and `*` do with sequence objects (strings are one example of such objects).

#### Addition and multiplication: some sequence type objects

If the operands are sequence objects (or expressions that return sequence objects) then `+` represents *concatenation*.

In [36]:
"Left <-" + "*" + "-> Right"

'Left <-*-> Right'

The result is a new string, in which each string is placed right after the previous one without adding any space character.

Consider a construction of the form `sequence_object * integer`. The multiplication operator `*` concatenates the left operand (e.g. a string) with itself so that we get the original sequence object repeated `integer` times.

In [37]:
"double" * 2

'doubledouble'

For lists and tuples, the addition and multiplication operators work in a somewhat similar manner. See the examples below.

In [38]:
[1, 2] * 2

[1, 2, 1, 2]

In [40]:
["Y", "O"] + ["L", "O"]

['Y', 'O', 'L', 'O']

In [41]:
(1, 2) * 2

(1, 2, 1, 2)

In [42]:
("Just", "the", "two") + ("of", "us")

('Just', 'the', 'two', 'of', 'us')
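
One caveat connects repetition with the earlier warning about references: `*` repeats references to the elements and does not copy them. The sketch below, with made-up variable names, shows that this is harmless for a list of numbers but surprising for a list that contains another list.

```python
# Repeating a list of numbers is safe: assigning to one position
# does not touch the other positions.
zeros = [0] * 3
zeros[0] = 1
print(zeros)  # [1, 0, 0]

# Repeating a list that contains a list copies only the reference
# to the inner list, so all "rows" are the same object.
rows = [[0, 0]] * 3
rows[0][0] = 1
print(rows)                # [[1, 0], [1, 0], [1, 0]]
print(rows[0] is rows[1])  # True
```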

### Comparisons

Sometimes we need our program to perform differently depending on different conditions.

To perform numerical comparison, we can use the following operators: `>` (greater), `<` (less), `>=` (greater or equal), `<=` (less or equal), `==` (equal), `!=` (not equal).

Suppose that we are solving a tax problem. If a salary is below 1000 EUR, then no tax is applied to it. Otherwise, it is subject to 13% tax (applied only to the amount above 1000).

Try to modify the following short piece of code to check whether the input salary is below 1000:

In [32]:
salary = int(input())

# input() function here represents a user input from stdin, but it reads 
# it as a string that is why we convert it to an integer by applying 
# the function int()

print("The input salary is below 1000:", ) # fill in a logical expression after the comma

1000
The input salary is below 1000: False


To finalize the example, compute the tax amount in the cell below.

In [None]:
tax = # complete the code here

# The next line ensures that the result is printed below the code cell.
tax

There are also *identity comparisons*: `is` and `is not`. They will be quite helpful when you process real data. For example, you can check whether a variable has a value or not:

In [43]:
variable = 5

print("The variable has a value:", variable is not None)

The variable has a value: True


Last but not least are *membership test operations*, which are also considered a type of comparison in Python. They are performed with the operators `in` and `not in`.

In [44]:
1 in [1, 2, 3]

True
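
Membership tests work with any of the sequence types seen so far. The following sketch uses a made-up tuple `campus_streets`; note that for strings `in` checks for a substring.

```python
campus_streets = ("Ramon Trias Fargas", "Roc Boronat", "Balmes")

print("Balmes" in campus_streets)        # True
print("Diagonal" not in campus_streets)  # True

# For strings, `in` tests whether the left operand is a substring.
print("ell" in "hello")  # True

# A range contains its start but not its stop value.
print(5 in range(10))    # True
print(10 in range(10))   # False
```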

### Boolean operations

The logical "and" is written with the keyword `and`, the logical "or" with `or`, and logical negation with `not`. See the examples below: 

In [45]:
print(True and False)

print(True or False)

print(not True)

False
True
False


In the tax problem considered before, suppose that there is a new rule being implemented: if the tax paid this year from other sources of income than one's salary is above 300 EUR and salary before tax is taxable (i.e. exceeds 1000), then the tax deduction of 3% is applied to the taxable part of the salary.

In [None]:
salary = int(input())

other_income = int(input())

print(
    "Tax deduction should be applied:",
    # fill in a logical expression after the comma
)  

Note that objects other than booleans can also be interpreted as boolean values. For example, the following values are interpreted as false: `False`, `None`, numeric zero of all types, and empty strings and containers (including tuples, lists, dictionaries, sets and frozensets). This helps to write beautiful and concise conditional expressions that we will discuss later.
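
The built-in function `bool(...)` shows how a value is interpreted. The cell below is a sketch with made-up sample values; the last lines illustrate that `or` returns one of its operands, which gives a simple way to supply a default.

```python
values = [False, None, 0, 0.0, "", [], (), "0", [0], " "]

for value in values:
    print(repr(value), "->", bool(value))

# `and` / `or` return one of their operands, which allows simple defaults.
user_name = ""
print(user_name or "anonymous")  # anonymous
```

Note that `"0"`, `[0]` and `" "` are all interpreted as true: they are non-empty, regardless of what they contain.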

## A note on type conversion in Python

You have already seen that we used the function `int(...)` to convert an input from `str` to `int`. There are also other functions that convert a value to another type. Of course, this works only if the provided value is convertible to the requested type. The functions used for explicit type conversion have the same names as the types.

In [46]:
initial_integer = 5
integer_to_float = float(initial_integer)
float_to_string = str(integer_to_float)
string_to_float = float(float_to_string)
float_to_integer = int(string_to_float)
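
A few more conversions, sketched with made-up values, show what happens at the edges: converting a float to an integer drops the fractional part, and a string that cannot be interpreted as a number raises `ValueError`.

```python
print(int("42"))      # 42
print(float("3.5"))   # 3.5
print(str(2.0))       # 2.0
print(bool("False"))  # True: any non-empty string is interpreted as true

# Converting a float to an int drops the fractional part (no rounding).
print(int(3.9), int(-3.9))  # 3 -3

# A value that cannot be interpreted raises ValueError.
try:
    int("ten")
except ValueError as error:
    print("ValueError:", error)
```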

Will you be able to convert `float_to_string` directly to an integer?

In [None]:
# try here


There is also *implicit type conversion* in Python. Here are several numeric examples: 
* If at least one value in an expression is a complex number, then the result is converted to a complex number.
* If there are no complex numbers but there is a float, the result is converted to float.
* If there are no complex or float values but there is an integer, the result is an int (except for true division `/`, which returns a float even for two integers).
* Boolean values `True` and `False` in numerical expressions are implicitly converted to 1 and 0 respectively.

All these examples work because numeric types in Python form a hierarchy from narrower to wider: `bool`, `int`, `float`, `complex`. When values of different numeric types occur together in one expression, the interpreter converts the narrower type to the wider one.

In [47]:
print(type(4.1 + 5))
print(type(5 // 2.5))
print(type(5 // 2))
print(True * 5 + False * -1)

<class 'float'>
<class 'float'>
<class 'int'>
5
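
A few additional cases, given here only as a sketch, cover the complex type and true division. The last example uses the built-in function `sum` to show a practical consequence of `True` being treated as 1.

```python
print(type(True + 1))      # <class 'int'>
print(type(1 + 2.0))       # <class 'float'>
print(type(2.0 + 3j))      # <class 'complex'>
print(type(4 / 2), 4 / 2)  # <class 'float'> 2.0

# Because True counts as 1, summing booleans counts how many are True.
checks = [True, False, True, True]
print(sum(checks))  # 3
```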
