# **ISDI DMBA**
# Introduction to Python for Data Analysis - Self-preparation Material
---
This *notebook* was prepared as self-preparation material for students attending the DMBA Data Specialisation.

### TABLE OF CONTENTS
1. IMPORT LIBRARIES
2. CREATE AN OBJECT
3. OBJECT TYPES
4. DATA STRUCTURES
5. OPERATORS
6. (BONUS) PROGRAMMING BASICS

### Lecturer: Juan Martin Bellido (jmbelldo@isdi.education)

# IMPORT LIBRARIES
---
Python is an *open-source* and *collaborative* programming language. This means that anyone can create and share new functionalities, which are organised in *libraries* (also called *packages*). We need to install a library only once, but we must import it every time we intend to use it.

Many popular libraries are already pre-installed in our environments, so we only need to import them.

To import a library, we use the following syntax,

```
import (library) as (tag)
```


In [None]:
# we import the library "pandas" and tag it as "pd"
import pandas as pd
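
The same `import ... as ...` pattern works for any library. The following is a minimal sketch that uses `statistics` from Python's standard library (so it runs without installing anything); the tag `st` and the sample data are assumptions chosen only for this illustration.

```python
# we import the standard-library module "statistics" and tag it as "st"
import statistics as st

grades = [7, 8, 10, 6, 9]   # hypothetical sample data
average = st.mean(grades)   # functions are called through the tag: st.<function>
print(average)
```

Once a library has been imported with a tag, every function it provides is reached through that tag.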

# CREATE AN OBJECT
---
Python is also an *object-oriented* language. This means that we can define objects (variables) that are stored temporarily in our environment and can be invoked whenever needed.

*Note: objects are only lost when they are explicitly deleted or when the environment is restarted*

To define a new object, we use the following syntax,

```
new_object = X (data to be stored)
```


In [1]:
# What comes next is a simple example of defining a new object ("name") and then invoking it
name = "Martin" # we define a new object (name) and store text input ("Martin")
print("Hi " + name) # we use the function print() and combine a text input with the object "name"

Hi Martin
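
To illustrate the note about objects staying in the environment until they are deleted, here is a small sketch. The object `city` and its values are hypothetical; `del` is the Python statement that deletes an object explicitly.

```python
city = "Barcelona"        # we define a new (hypothetical) object
print("City: " + city)

city = "Madrid"           # assigning again overwrites the stored value
print("City: " + city)

del city                  # we explicitly delete the object

was_deleted = False
try:
    print(city)           # invoking a deleted object raises a NameError
except NameError:
    was_deleted = True
    print("the object 'city' no longer exists")
```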


# OBJECT TYPES
---
There are three elementary object types recognized in Python,
*  *Numeric (numbers)*, which can be either (i) *integers* or (ii) *floats* (numbers that may contain decimals)
*  *String (text)*
*  *Boolean (true/false)*

*Note: there are more object types, but they are rarely used in Data Analysis*


In [None]:
# Let's create a first numeric (integer) object
numeric_obj = 2
type(numeric_obj) # we use the type() function to check object type

int

In [None]:
# Let us now overwrite the object with a new input
numeric_obj = 1.2 # important: note that here we are overwriting the object with a new input
type(numeric_obj) # we check again the type(), in this case it contains decimals, therefore it's a float

float

In [2]:
# Next, we define a new object type string (text)
string_obj = "DMBA"
type(string_obj)

str

In [3]:
# Finally, we create an object type boolean (True or False)
bool_obj = True
type(bool_obj)

bool
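
Objects can often be converted from one elementary type to another with the built-in functions `int()`, `float()` and `str()`. The sketch below is only an illustration; the object names are hypothetical.

```python
year_text = "2021"             # a string that happens to contain only digits
year_number = int(year_text)   # int() converts it to an integer
price = float(3)               # float() converts an integer to a float
label = str(2021)              # str() converts a number to text

# we check the resulting types with type()
print(type(year_number), type(price), type(label))
```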

# DATA STRUCTURES
---
Data in Python can be stored in many different formats, each of them with unique characteristics and treatment.

There are *four* basic data structures included in base Python (presented below in order of relevance for Data Analysis),
*   *Lists*
*   *Dictionaries*
*   *Tuples*
*   *Sets*

Besides those included as part of Python's original functionalities, there are additional data structures that were incorporated through libraries. The most important data structure for Data Analysis is the DataFrame (data table), which is provided by the *Pandas* library. DataFrames are not covered in this self-study guide, but will be used throughout the course.


### Lists
---
*Lists* are essentially vectors: one-dimensional structures that can store data of multiple types. Each element (piece of data) stored in a list is *indexed*; this means that each element has a specific and unique position within the structure.

To create a list, we use the following syntax,

```
my_list = [x,y,z ... n]
```



In [4]:
# We create our first list
my_list = [1,"a",3,False,5] # we define a new object that contains a list
my_list # we invoke our new object

[1, 'a', 3, False, 5]

In [8]:
# Note: lists may contain lists as elements
my_list_v2 = [100,"one hundred",my_list] # we create a new list, which holds the first list as one of the elements stored
my_list_v2 # we invoke the object

[100, 'one hundred', [1, 'a', 3, False, 5]]

In [9]:
# The most important thing about lists is that the elements stored are indexed
## as a result, we are able to invoke specific elements by their position
## note: Python is 0-indexed; this means that element "0" is always the first one
my_list_v2[0] # we invoke the first element stored in the list

100

In [10]:
# Now we will overwrite one (only one) specific element
my_list_v2[0] = 101 # we overwrite (replace) the first element (element 0) in the list with a new number
my_list_v2 # we invoke the object

[101, 'one hundred', [1, 'a', 3, False, 5]]
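
A few more operations that build on list indexing are sketched below. The list `letters` is a hypothetical example; `len()` and `.append()` are standard Python.

```python
letters = ["a", "b", "c", "d"]   # hypothetical list

first = letters[0]     # index 0 is the first element
last = letters[-1]     # negative indexes count from the end of the list
size = len(letters)    # len() returns the number of elements stored

letters.append("e")    # .append() adds a new element at the end of the list
print(first, last, size, letters)
```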

### Dictionaries
---
*Dictionaries* are one-dimensional structures that store data as *key-value* pairs. Similar to NoSQL (non-relational) databases, elements are **not** accessed by position (they have no numeric index); instead, they are invoked using their *key*. *Keys* are always unique.

To create a new dictionary we use the following syntax,

```
my_dic = {
  key_1:value_1,
  key_2:value_2,
  key_3:value_3,
  ...
  }
```


In [11]:
# Let us create our first dictionary
my_dic = {
  "school": "ISDI",                     # this line is a "key-value" pair
  "program": "DMBA",                    # this is a second "key-value" pair
  "year": 2021,
  "this is a Python course": True,
  "assistants":["Maria","Juan","Jose"], # note that the value in this pair is actually a list
  "lecturer":"Martin"
}

my_dic # we invoke the dictionary; note that the output below lists the keys alphabetically, not in the order we defined them


{'assistants': ['Maria', 'Juan', 'Jose'],
 'lecturer': 'Martin',
 'program': 'DMBA',
 'school': 'ISDI',
 'this is a Python course': True,
 'year': 2021}

In [12]:
# We now invoke the value of one specific pair by passing its unique "key"
my_dic["program"]

'DMBA'

In [13]:
# We now replace the value of one pair in the dictionary using (again) its "key"
my_dic["assistants"] = ["Maria","Juan","Jose","Pepe"] # we are replacing the value stored under the key "assistants"
my_dic

{'assistants': ['Maria', 'Juan', 'Jose', 'Pepe'],
 'lecturer': 'Martin',
 'program': 'DMBA',
 'school': 'ISDI',
 'this is a Python course': True,
 'year': 2021}
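
Beyond reading and replacing values, keys can also be added and checked. The following is a small sketch under the assumption of a hypothetical dictionary `course`; `in`, `.keys()` and `.get()` are standard dictionary features.

```python
course = {"school": "ISDI", "program": "DMBA"}   # hypothetical dictionary

course["year"] = 2021                   # assigning to a new key adds a new pair
has_year = "year" in course             # "in" checks whether a key exists
keys = list(course.keys())              # .keys() returns all keys
city = course.get("city", "unknown")    # .get() returns a default when the key is missing

print(has_year, keys, city)
```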

### Tuples
---
*Tuples* are one-dimensional structures similar to lists, but *immutable*; this means that elements in a tuple cannot be overwritten.

```
my_tupple = (x,y,z ... n)
```


In [14]:
# We create and invoke a tuple
my_tupple = (2,1,"c","a")
my_tupple

(2, 1, 'c', 'a')

In [None]:
# Tuples are indexed, therefore we can select elements based on their position within the structure
my_tupple[1] # index 1 is the second element (Python is 0-indexed)

1

In [None]:
# As mentioned, tuples are immutable
## we cannot overwrite (replace) elements
my_tupple[1] = 3 # this will generate an execution error

TypeError: 'tuple' object does not support item assignment
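
Because a tuple cannot be modified in place, one common workaround is to build a new tuple. The sketch below is only an illustration with hypothetical object names; it catches the error with `try`/`except` so the cell keeps running.

```python
point = (2, 1, "c", "a")    # hypothetical tuple

try:
    point[1] = 3            # not allowed: tuples are immutable
except TypeError as error:
    message = str(error)
    print("Not allowed:", message)

as_list = list(point)       # convert the tuple to a list (lists can be modified)
as_list[1] = 3              # overwrite the second element
new_point = tuple(as_list)  # convert back to a new tuple

print(point, new_point)     # the original tuple is unchanged
```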

### Sets
---
*Sets* are structures similar to lists and tuples (also unidimensional), but with two big differences,
*   *sets do not allow duplicated values*;
*   *elements are not indexed*.

To define a set we use the following syntax,

```
my_set = {x,y,z ... n}
```

In [None]:
my_set = {1,10,0,5,'b','c','a'} # we define a set
my_set  # we invoke it and verify that the order of the elements differs from the one we defined

{0, 1, 10, 5, 'a', 'b', 'c'}

In [None]:
# Next, we verify that a set does not allow indexing
my_set[2] # this will generate an execution error

TypeError: 'set' object is not subscriptable
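
Since sets do not allow duplicated values, a typical use is removing duplicates from a list. This is a minimal sketch with a hypothetical list of names.

```python
names = ["Maria", "Juan", "Maria", "Jose", "Juan"]   # hypothetical list with duplicates

unique_names = set(names)           # converting to a set drops duplicated values
is_member = "Jose" in unique_names  # "in" checks whether a value is stored in the set

print(len(names), len(unique_names), is_member)
```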

# OPERATORS
---
In programming, *operators* are symbols that perform specific mathematical, relational or logical operations. 

*   *Arithmetic Operators*: perform mathematical operations
*   *Comparison Operators*: contrast objects and compare values, resulting in a boolean object (true/false)
*   *Logical Operators*: combine or negate conditions, producing a boolean object (true/false)






### Arithmetic Operators
---


| Operator 	|   Description  	|
|----------	|:--------------:	|
| +        	| addition       	|
| -        	| subtraction    	|
| *        	| multiplication 	|
| /        	| division       	|
| **  	| exponentiation 	|

In [None]:
# Arithmetic operators
## we define two numeric objects
obj_1 = 10
obj_2 = 5

In [None]:
# Test 1
obj_1/obj_2

2.0

In [None]:
# Test 2
obj_1-obj_2

5
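
The remaining operators from the table can be tried in the same way. The result names below are hypothetical; the two objects are the same ones used in the tests above.

```python
obj_1 = 10
obj_2 = 5

addition = obj_1 + obj_2         # 15
multiplication = obj_1 * obj_2   # 50
exponentiation = obj_1 ** 2      # 100 (10 squared)
division = obj_1 / obj_2         # 2.0 (division always returns a float)

print(addition, multiplication, exponentiation, division)
```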

### Comparison Operators
---

| Operator  	|        Description       	|
|-----------	|:------------------------:	|
| <         	| less than                	|
| <=        	| less than or equal to    	|
| >         	| greater than             	|
| >=        	| greater than or equal to 	|
| ==        	| exactly equal to         	|
| !=        	| not equal to             	|


In [None]:
# Comparison operators
## we define two numeric objects
obj_1 = 10
obj_2 = 5

In [None]:
# Test 1
## check if object 1 is greater than 8
obj_1 > 8

True

In [None]:
# Test 2
## check if object 1 is equal to object 2
obj_1 == obj_2

False
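
Comparison operators also work on text, and their result can be stored in a new object. A short sketch follows; the result names are hypothetical.

```python
obj_1 = 10
obj_2 = 5

is_different = obj_1 != obj_2   # True: 10 is not equal to 5
is_at_most = obj_2 <= 5         # True: 5 is less than or equal to 5
same_text = "DMBA" == "dmba"    # False: text comparison is case-sensitive

print(is_different, is_at_most, same_text)
```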

### Logical Operators
---

| Operator  	|        Description       	|
|-----------	|:------------------------:	|
| x and y    	| both conditions X and Y are met           	|
| x or y    	| at least one of conditions X or Y is met  	|
| not x     	| negates condition X   	|


In [None]:
## we define two numeric objects
obj_1 = 10
obj_2 = 5

In [None]:
# Test 1
## test if at least one of the two conditions is fulfilled
obj_1 > 8 or obj_2 > 8 # this reads: "object 1 is greater than 8 OR object 2 is greater than 8"

True

In [None]:
# Test 2
## test if both conditions are met simultaneously
obj_1 > 8 and obj_2 > 8 # this reads: "object 1 is greater than 8 AND object 2 is greater than 8"

False

In [None]:
# Test 3
## negate a condition
not obj_1 > 8 # this reads: "object 1 is NOT greater than 8"

False
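
Logical operators can be combined, and parentheses make the order of evaluation explicit. The sketch below reuses the same two objects; the result names are hypothetical.

```python
obj_1 = 10
obj_2 = 5

in_range = obj_1 > 8 and obj_1 < 20       # True: both conditions are met
neither = not (obj_1 > 8 or obj_2 > 8)    # False: at least one condition is met, then negated

print(in_range, neither)
```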

# (BONUS) PROGRAMMING BASICS
---
There are certain functionalities available in all programming languages. Their use is not strictly needed in Data Analysis, but they are particularly helpful for process automation,

*   *Conditional Expressions*
*   *Loops*

### Conditional Expressions
---
A conditional expression performs a series of logic tests with pre-defined responses. It consists of at least one condition, but it can hold as many as needed.

Basic syntax,
```
if (condition):
  response if condition is met
```

Expanded syntax,
```
if (condition):
  response if condition is met
elif (second condition):
  response if second condition is met
...
else: 
  response if none of the previous conditions are met
```

Note that when there are multiple conditions, they are tested in order; the first condition that is met triggers its response, and the remaining conditions are skipped.

In [3]:
# Let us create an IF condition that checks if an object is positive (greater than 0)
obj = 10 # create an object

if obj > 0:                         # condition 1: test if object is positive
  print("number is positive")       # print() displays text as output

## note that we have not established a response if condition is not met

number is positive


In [2]:
# Let us now expand the previous expression to also test if number is 0 or negative
obj = -10 # overwriting the object with a new number

if obj > 0:                         # condition 1: test if object is positive
  print("number is positive")    
elif obj == 0:                      # condition 2: test if object is 0
  print("number is 0")
else:                               # fallback: runs if none of the previous conditions are met
  print("number is negative")



number is negative
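
As noted earlier in this section, conditions are tested in order and only the first one that is met triggers its response. The following sketch illustrates this with a hypothetical `score` and hypothetical thresholds.

```python
score = 85   # hypothetical value

if score >= 90:        # condition 1: not met (85 is below 90)
    label = "excellent"
elif score >= 70:      # condition 2: met, so this response is triggered
    label = "good"
elif score >= 50:      # condition 3: also true for 85, but never tested
    label = "pass"
else:
    label = "fail"

print(label)
```

Reordering the conditions would change the result, which is why the most restrictive test usually goes first.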


### Loops
---
In a loop, we establish a response that will repeat multiple times, based on a given criterion.

*   *For loop*: the loop repeats once for each element stored in a given list
*   *While loop*: the loop repeats as long as a given condition remains true

One important point about loops is that we can assign object values dynamically; this means that an object's value may change on each repetition.

Basic syntax,
```
vector = [x,y,z .. n]
for i in vector:
  print(i)
```

In [6]:
my_vector = ["a","b","c","d","e"] # we create a list
for x in my_vector: # this reads: for each element (X) stored in object "my_vector"
  print(x)

# this loop has 5 repetitions, as there are 5 elements stored in list
# x gets assigned value of each element on each repetition

a
b
c
d
e
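
To show how an object's value can change on each repetition, here is a minimal sketch that accumulates a running total. The list `numbers` and the object `total` are hypothetical.

```python
numbers = [3, 5, 7]   # hypothetical list
total = 0             # object that will be updated on each repetition

for n in numbers:     # n takes the value of each element in turn
    total = total + n
    print("added", n, "running total:", total)
```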


In a *while loop* the number of repetitions is not fixed in advance; the loop lasts as long as a given condition remains true.

```
while (condition):
  response
```
Note: we need to be careful to avoid infinite (endless) loops. In that case we would have to interrupt the execution or restart the environment.


In [7]:
my_object = 10                # we create an object
while my_object <= 25:        # we establish a condition
  print(my_object)            # printing current value of object on each repetition
  my_object = my_object + 1   # adding 1 to current object value


10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
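
One way to guard against an endless loop is to add a second condition that caps the number of repetitions. This is only a sketch of that idea; the names `value`, `counter` and `max_repetitions` are hypothetical.

```python
value = 1
counter = 0
max_repetitions = 100   # safety cap: the loop can never repeat more than this

while value < 50 and counter < max_repetitions:
    value = value * 2           # doubling the value on each repetition
    counter = counter + 1       # counting how many repetitions have run

print(value, counter)
```

Here the first condition stops the loop after 6 repetitions (when `value` reaches 64), so the cap is never needed; it only matters if the main condition never becomes false.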
