# **IEB MiM&A** 
# Introduction to Python for Data Analysis 🐍📊
# *Notebook 1: Python Basics*
---
### TABLE OF CONTENTS
1. CREATE AN OBJECT
2. OBJECT TYPES
3. DATA STRUCTURES
4. OPERATORS
5. IMPORT LIBRARIES
6. CONDITIONAL EXPRESSIONS
7. ITERATIONS
8. EXERCISES

Lecturer: Juan Martin Bellido (Martin)
* [linkedin.com/in/jmartinbellido](https://www.linkedin.com/in/jmartinbellido/)
* juan.martin.bellido.arias@claustro-ieb.es

Conventions used in this notebook:
* ⚠️ Warning
* ✅ Key remark

---

# CREATE AN OBJECT
---
Python is an ***object-oriented*** language: everything we work with (numbers, text, lists...) is an object. We can store objects under a name (a variable), so that they are kept temporarily in our environment and can be invoked when needed.


> ⚠️ *Declared objects are removed when the environment (kernel) is restarted*



To declare a new object, we use the following syntax,

```
new_object_name = X (data to be stored)
```


In [None]:
# What comes next is a simple example of declaring a new object ("name") and then invoking it
name = "Martin" # we define a new object (name) and store text input ("Martin")
print("Hi " + name) # we use the function print() and combine a text input with the object "name"

Hi Martin


# OBJECT TYPES
---
Python recognizes three elementary object types,
*  *Numeric (numbers)*, which can be either (i) *integers* (whole numbers) or (ii) *floats* (numbers with decimals)
*  *String (text)*
*  *Boolean (True/False)*

> ✅ *There are more object types, but they are rarely used in Data Analysis*




In [None]:
# Let's create a first numeric (integer) object
numeric_obj = 2
type(numeric_obj) # we use the type() function to check object type

int

In [None]:
# Let us now overwrite the object with a new input
numeric_obj = 1.2 # important: note that here we are overwriting the object with a new input
type(numeric_obj) # we check again the type(), in this case it contains decimals, therefore it's a float

float

In [None]:
# Next, we define a new object type string (text)
string_obj = "IEB"
type(string_obj)

str

In [None]:
# Finally, we create an object type boolean (True or False)
bool_obj = True
type(bool_obj)

bool

# DATA STRUCTURES
---
Data in Python can be stored in many different formats, each of them with unique characteristics and treatment.

There are *four* basic data structures built into Python,
*   *Lists*
*   *Dictionaries*
*   *Tuples*
*   *Sets*



> ✅ *Besides the data structures built into Python, additional ones are provided by third-party libraries. The most important data structure for Data Analysis is the DataFrame (data table), provided by the **Pandas** library. DataFrames will not be covered in this notebook*




### Lists
---
*Lists* are essentially vectors: one-dimensional structures that can store data of multiple types. Each element (piece of data) stored in a list is *indexed*; this means that each element has a specific and unique position within the structure.

To create a list, we use the following syntax,

```
my_list = [x,y,z ... n]
```

> ✅ *The fact that a data structure is **indexed** means that Python stores not only multiple objects, but also the relative position that each object takes within the structure*

> ⚠️ *Python indexing starts at 0; this means that 0 is always the first position in an indexed structure*
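A minimal sketch of zero-based indexing, using a hypothetical list named `letters` (not part of the course material). It also shows negative indexes, which Python uses to count from the end of a list, and len(), which returns the number of elements.

```python
# Hypothetical list used only to illustrate indexing
letters = ["a", "b", "c", "d"]

first = letters[0]    # position 0 holds the first element
second = letters[1]   # position 1 holds the second element
last = letters[-1]    # negative indexes count from the end of the list
size = len(letters)   # number of elements stored in the list

# The last valid positive index is always len(letters) - 1
print(first, second, last, letters[size - 1])  # a b d d
```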


In [None]:
# We create our first list
my_list = [1,"a",3,False,5] # we define a new object that contains a list
my_list # we invoke our new object

[1, 'a', 3, False, 5]

In [None]:
# Note: lists may contain lists as elements
my_list_v2 = [100,"one hundred",my_list] # we create a new list, which holds the first list as one of the elements stored
my_list_v2 # we invoke the object

[100, 'one hundred', [1, 'a', 3, False, 5]]

In [None]:
my_list_v2[0] # we invoke the first element stored in list

100

In [None]:
# Let us now overwrite one (only one) object in structure
my_list_v2[0] = 101 # we overwrite (replace) the first element (position 0) in vector with a new number
my_list_v2 # we invoke the object

[101, 'one hundred', [1, 'a', 3, False, 5]]

### Dictionaries
---
*Dictionaries* are structures that store data as *key-value* pairs. Similar to NoSQL (non-relational) databases, elements are **not** indexed by position; instead, each value is invoked using its *key*. Keys are always unique.

To create a new dictionary we use the following syntax,

```
my_dic = {
  key_1:value_1,
  key_2:value_2,
  key_3:value_3,
  ...
  }
```
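To illustrate that keys are unique, the sketch below uses a hypothetical dictionary (`city_info`) in which the key "city" is written twice: Python keeps a single "city" key holding the last value given. It also shows that assigning to an existing key replaces its value, while assigning to a new key adds a new pair.

```python
# Hypothetical dictionary where the key "city" appears twice
city_info = {"city": "Madrid", "city": "Barcelona", "year": 2021}
print(city_info)  # {'city': 'Barcelona', 'year': 2021}

# Assigning to an existing key replaces its value...
city_info["year"] = 2022
# ...while assigning to a new key adds a new key-value pair
city_info["country"] = "Spain"

print(len(city_info))     # 3
print(city_info["city"])  # Barcelona
```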


In [None]:
# Let us create our first dictionary
my_dic = {
  "school": "ISDI",                     # this line is a "key-value" pair
  "program": "DMBA",                    # this is a second "key-value" pair
  "year": 2021,
  "this is a Python course": True,
  "assistants":["Maria","Juan","Jose"], # note that this value is actually a list
  "lecturer":"Martin"
}

my_dic # we invoke the dictionary (the display below shows keys sorted alphabetically, but since Python 3.7 a dictionary keeps its insertion order)


{'assistants': ['Maria', 'Juan', 'Jose'],
 'lecturer': 'Martin',
 'program': 'DMBA',
 'school': 'ISDI',
 'this is a Python course': True,
 'year': 2021}

In [None]:
# We now invoke one specific value by passing its unique "key"
my_dic["program"]

'DMBA'

In [None]:
# We now replace one value in the dictionary using (again) its "key"
my_dic["assistants"] = ["Maria","Juan","Jose","Pepe"] # we are replacing the value stored under the key "assistants"
my_dic

{'assistants': ['Maria', 'Juan', 'Jose', 'Pepe'],
 'lecturer': 'Martin',
 'program': 'DMBA',
 'school': 'ISDI',
 'this is a Python course': True,
 'year': 2021}

### Tuples
---
*Tuples* are one-dimensional structures similar to lists, but *immutable*; this means that elements in a tuple cannot be overwritten.

```
my_tupple = (x,y,z ... n)
```
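Because tuples are immutable, the usual way to "change" one is to build a new tuple. A minimal sketch, assuming we want to replace the element at position 1 of a hypothetical tuple named `original`: copy it into a list with list(), modify the list, and convert it back with tuple(). The original tuple stays untouched.

```python
original = (2, 1, "c", "a")

# Tuples cannot be modified in place, so we copy the elements into a list
as_list = list(original)
as_list[1] = 3  # lists are mutable, so this assignment works

# Converting back gives a new tuple; the original one is unchanged
updated = tuple(as_list)
print(original)  # (2, 1, 'c', 'a')
print(updated)   # (2, 3, 'c', 'a')
```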


In [None]:
# We create and invoke a tuple
my_tupple = (2,1,"c","a")
my_tupple

(2, 1, 'c', 'a')

In [None]:
# Tuples are indexed, therefore we can select elements based on their position within the structure
my_tupple[1]

1

⚠️ The code cell below will intentionally result in an error

In [None]:
# As mentioned, tuples are immutable
## we cannot overwrite (replace) elements
my_tupple[1] = 3 # this will generate an execution error

TypeError: 'tuple' object does not support item assignment

### Sets
---
*Sets* are structures similar to lists and tuples (also one-dimensional), but with two big differences,
*   *sets do not store duplicated values*;
*   *elements are not indexed*.

To define a set we use the following syntax,

```
my_set = {x,y,z ... n}
```
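A common use of sets is removing duplicates from a list and checking whether a value is stored. A minimal sketch with a hypothetical list named `visits`: set() converts the list into a set, sorted() returns the unique values as an ordered list (sets themselves have no fixed order), and the `in` operator tests membership.

```python
# Hypothetical list with repeated values
visits = ["Madrid", "Paris", "Madrid", "Rome", "Paris"]

unique_cities = set(visits)       # duplicates are dropped
print(len(unique_cities))         # 3
print(sorted(unique_cities))      # ['Madrid', 'Paris', 'Rome']

# "in" checks whether a value is stored in the set
print("Rome" in unique_cities)    # True
print("Lisbon" in unique_cities)  # False
```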

In [None]:
my_set = {1,10,10,0,5,'b','b','c','a'} # declaring a set
my_set  # we invoke the set; we observe only unique values are stored (display order is not guaranteed)

{0, 1, 10, 5, 'a', 'b', 'c'}

⚠️ The code cell below will intentionally result in an error

In [None]:
# We now verify that sets do not allow indexation
my_set[2]

TypeError: 'set' object is not subscriptable

# OPERATORS
---
In programming, *operators* are symbols that perform specific mathematical, relational or logical operations. 

*   *Arithmetic Operators*: perform mathematical operations
*   *Comparison Operators*: compare values, resulting in a boolean object (True/False)
*   *Logical Operators*: combine or negate logical tests, producing a boolean object (True/False)






### Arithmetic operators
---


| Operator 	|   Description  	|
|----------	|:--------------:	|
| +        	| addition       	|
| -        	| subtraction    	|
| *        	| multiplication 	|
| /        	| division       	|
| **  	| exponentiation 	|

In [None]:
# Arithmetic operators
## we define two numeric objects
obj_1 = 10
obj_2 = 5

In [None]:
# Test 1
obj_1/obj_2

2.0

In [None]:
# Test 2
obj_1-obj_2

5

### Comparison operators
---

| Operator  	|        Description       	|
|-----------	|:------------------------:	|
| <         	| less than                	|
| <=        	| less than or equal to    	|
| >         	| greater than             	|
| >=        	| greater than or equal to 	|
| ==        	| exactly equal to         	|
| !=        	| not equal to             	|
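A sketch evaluating all six comparison operators on the same two hypothetical numbers; every comparison returns a boolean. It also contrasts == (compares two values) with a single = (assigns a value to an object), a frequent source of beginner errors.

```python
x = 10
y = 5

print(x < y)   # False
print(x <= y)  # False
print(x > y)   # True
print(x >= y)  # True
print(x == y)  # False
print(x != y)  # True

# A single = assigns a value; == compares two values
z = x          # assignment: z now stores 10
print(z == x)  # True
```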


In [None]:
# Comparison operators
## we define two numeric objects
obj_1 = 10
obj_2 = 5

In [None]:
# Test 1
## check if object 1 is greater than 8
obj_1 > 8

True

In [None]:
# Test 2
## check if object 1 is equal to object 2
obj_1 == obj_2

False

### Logical operators
---

| Operator  	|        Description       	|
|-----------	|:------------------------:	|
| x and y    	| True if both conditions x and y are met         	|
| x or y    	| True if at least one of conditions x or y is met 	|
| not x     	| negates condition x   	|
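The behaviour of the three logical operators in the table above can be checked directly on boolean values. A minimal sketch; the variable names are illustrative only.

```python
both_true = True and True      # True: "and" needs both conditions
one_false = True and False     # False: one condition is not met
at_least_one = False or True   # True: "or" needs at least one condition
none_true = False or False     # False: no condition is met
negated = not True             # False: "not" flips the value

print(both_true, one_false, at_least_one, none_true, negated)
# True False True False False
```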


In [None]:
## we define two numeric objects
obj_1 = 10
obj_2 = 5

In [None]:
# Test 1
## test if at least one of the two conditions is fulfilled
obj_1 > 8 or obj_2 > 8 # this reads: "object 1 is greater than 8 OR object 2 is greater than 8"

True

In [None]:
# Test 2
## test if both conditions are met simultaneously
obj_1 > 8 and obj_2 > 8 # this reads: "object 1 is greater than 8 AND object 2 is greater than 8"

False

In [None]:
# Test 3
## negate a condition
not obj_1 > 8 # this reads: "object 1 is NOT greater than 8"

False

# IMPORT LIBRARIES
---
Python is an *open-source* and *collaborative* programming language. This means that anyone can create and share new functionalities, which are distributed as *libraries* (also called *packages* or *modules*). We need to install a library only once, and import it every time we intend to use it.

> ✅ *Libraries are installed only once in your environment, then imported on every session*

Many popular libraries are already pre-installed in our environments, therefore we only need to import them. 

To import a library, we use the following syntax,

```
import (library) as (tag)
```

> ✅ *Tags (aliases) let us change the name used to refer to a library. We are free to choose our own tags, but it is highly recommended to follow tag conventions (e.g. using "pd" for the pandas library)*


We can only import a library once it is installed in our environment. There are many ways to install one; the most popular is *pip*, the standard package installer for Python, maintained by the Python Packaging Authority (PyPA).

Syntax to install a library using pip (from a terminal):

```
pip install pandas
```

> ⚠️ *Note that pip must already be installed on our system; it is usually installed automatically along with Python. Inside a notebook cell, the command is prefixed with %pip (or !), e.g. %pip install pandas.*




In [None]:
# we import the "pandas" library and tag it as "pd"
import pandas as pd

Once the *pandas* library is imported, we are ready to use its functionalities. As an example, we will import a data table (dataset) using the pandas function *pd.read_csv()*:

```
pd.read_csv('path')
```
> ⚠️ *This will work only after importing the pandas library as pd*


In [None]:
# Importing a dataframe from a URL
df_jamesbond = pd.read_csv("https://data-wizards.s3.amazonaws.com/datasets/jamesbond.csv")

In [None]:
# Visualizing our imported dataset
df_jamesbond

# CONDITIONAL EXPRESSIONS
---

A conditional expression (in Python, an *if* statement) consists of one or more logical tests, each with a pre-defined response that runs when its test is True. Conditional expressions must include at least one condition, but can contain as many as needed.

> ✅ *In programming, conditions are always built using operators; the result can be either True or False*

Basic syntax,
```
if (condition):
  response if condition is met
```

Expanded syntax,
```
if (condition):
  response if condition is met
elif (second condition):
  response if second condition is met
...
else:
  response if none of the previous conditions are met
```

> ✅ *When there are multiple conditions, they are tested in order; only the first condition that is met triggers its response*
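Because conditions are tested in order, their order matters. A minimal sketch with a hypothetical score and hypothetical thresholds: a score of 95 also satisfies the second condition, but only the first matching branch runs.

```python
score = 95  # hypothetical value

if score >= 90:     # tested first: 95 >= 90 is True, so this branch runs
    grade = "excellent"
elif score >= 50:   # also True for 95, but never reached
    grade = "pass"
else:               # runs only if none of the conditions above are met
    grade = "fail"

print(grade)  # excellent
```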

In [None]:
# Let us create an IF condition that checks if an object is positive (greater than 0)
obj = 10 # create an object

if obj > 0:                         # condition 1: test if object is positive
  print("number is positive")       # print() displays text as output

## note that we have not established a response if condition is not met

number is positive


In [None]:
# Let us now expand the previous expression to also test if number is 0 or negative
obj = -10 # overwriting the object with a new number

if obj > 0:                         # condition 1: test if object is positive
  print("number is positive")    
elif obj == 0:                      # condition 2: test if object is 0
  print("number is 0")
else:                               # runs if none of the previous conditions are met
  print("number is negative")



number is negative


# ITERATIONS
---
Traditionally in programming, iterations are built using *loops*: sequences of instructions that are repeated according to a specific criterion.

There are two basic types of loops:

*   *For loop*: repeats once for each element stored in a given data structure (e.g. a list)
*   *While loop*: repeats as long as a given condition remains True

Python also incorporates a compact way to build iterations: *list comprehensions*


### For loops
---

Basic syntax,
```
vector = [x,y,z .. n]
for i in vector:
  print(i)
```




> ✅ *In loops, we usually assign object values dynamically; this means that an object's value may change on each repetition (see x below)*



In [None]:
my_vector = ["a","b","c","d","e"] # we create a list
for x in my_vector: # this reads: for each element (x) stored in object "my_vector"
  print(x)

# this loop has 5 repetitions, as there are 5 elements stored in list
# x gets assigned value of each element on each repetition

a
b
c
d
e


If we want to iterate a large number of times (e.g. 100 times), we can use the range() function instead of manually creating a data structure with that many elements (e.g. 100 elements). Note that range(start,end) includes start but excludes end.

```
range(start,end)
```
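A sketch that converts ranges to lists with list() so their values can be inspected. It shows that the end value is excluded, that start defaults to 0, and the optional third argument (step) of the built-in range(); the name `hundred` is illustrative.

```python
print(list(range(1, 10)))     # [1, 2, 3, 4, 5, 6, 7, 8, 9] (10 is excluded)
print(list(range(5)))         # [0, 1, 2, 3, 4] (start defaults to 0)
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8] (step of 2)

hundred = range(1, 101)       # 100 values, without typing them manually
print(len(hundred))           # 100
```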


In [None]:
for value in range(1,10):
  print(value)

1
2
3
4
5
6
7
8
9


### While loops
---
In a *while loop*, iteration continues as long as a given condition is True; it stops as soon as the condition is no longer met.

```
while (condition):
  response
```

> ✅ *The number of iterations in a while loop is not fixed in advance, as it does not depend on the number of objects stored in a data structure*

> ⚠️ *With while loops, we need to be careful to avoid infinite (endless) loops, which would force us to interrupt or restart the environment (kernel)*
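One defensive pattern against infinite loops is to add a maximum number of repetitions to the condition. A minimal sketch, assuming a hypothetical safety limit (`max_steps = 100`): the loop halves a value until it drops below 1, and would also stop if the limit were reached first.

```python
value = 40.0     # hypothetical starting value
steps = 0        # counts how many repetitions have run
max_steps = 100  # hypothetical safety limit

# Repeat while the value is at least 1 AND the safety limit has not been hit
while value >= 1 and steps < max_steps:
    value = value / 2
    steps = steps + 1

print(steps, value)  # 6 0.625
```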



In [None]:
my_object = 10                # we create an object
while my_object <= 25:        # we establish a condition
  print(my_object)            # printing current value of object on each repetition
  my_object = my_object + 1   # adding 1 to current object value


10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25


### List comprehensions
---
Python includes list comprehensions, a compact syntax that builds a new list from an iteration in only one line of code.

```
[expression(element) for element in list]
```
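A list comprehension is a compact way of writing a for loop that builds a list. A sketch showing that both versions produce the same result; the names `squares_loop` and `squares_comp` are illustrative, and append() is the built-in list method that adds an element at the end.

```python
# Version 1: a for loop that appends each result to an empty list
squares_loop = []
for n in range(1, 6):
    squares_loop.append(n ** 2)

# Version 2: the same iteration written as a list comprehension
squares_comp = [n ** 2 for n in range(1, 6)]

print(squares_loop)  # [1, 4, 9, 16, 25]
print(squares_comp)  # [1, 4, 9, 16, 25]
```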




In [None]:
# Let us build an iteration that multiplies each number from 1 to 10 by 2
[x*2 for x in range(1,11)]

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

In [None]:
[print('I love ' + food) for food in ['pizza','sushi','paella','tapas']] # print() returns None, so the comprehension also outputs a list of four None values

I love pizza
I love sushi
I love paella
I love tapas


[None, None, None, None]

# EXERCISES
---


### EX 1
--- 
Create a list that contains 10 different string objects with names of food. Use a for loop to iterate over that list and print "I love ... (each food element in list)"

> e.g. "I love pizza" "I love sushi" ...



### EX 2
--- 
The list provided below contains 5 strings that represent food names. Use a for loop that iterates through the list and prints "I love ... (element)"; exclude "onions" and "pumpkin" without changing the list.

In [None]:
food_list = ["pizza","onions","pasta","chicken","pumpkin"]