# Introduction to Python
Python is a general-purpose programming language that is **interpreted** and **high-level**. It was designed for readability and ease of writing. It isn't necessarily the fastest language, but it is often "fast enough" for most applications. Python has a huge repository of extensions and tools that are easy to find and install, which makes it easy for non-experts to get started. In fact, it is now one of the most widely used programming languages in the world.

You are looking at a notebook file, which lets you interact with code and run it in chunks. Execute the contents of a code "cell" by pressing `shift + return` on your keyboard, or by pressing the play button in the top bar or using the Run menu. Try it below:

In [None]:
print("Hello, world!")

Hello, world!


## 1. Data types

Each language has a set of built-in types that it knows how to handle. The common types we'll use in Python are
* **Numbers**
* **Text**
* **Boolean**

Variables are used to store information to be referenced and manipulated in a program. You can give your variable whatever name you want, but the best practice is to choose a name that describes what it represents, so that if someone else reads your code (or even if you read your own code years later), the meaning of the variables you declared will be clear.

### Numbers

In [None]:
# Integers
a = 1 # I can declare a variable by using =
b = -59
c = 12
d = a + b + c
print("d = "+ str(d)) # This is called 'string concatenation.' You can put strings together using the + operation. The str function converts objects into strings.
print("Its type is "+ str(type(d)))  # The type function returns the type of an object

d = -46
Its type is <class 'int'>


In [None]:
# Floating point numbers (floats)
a = 1.                             # Add a decimal after an integer to make it a float
b = -59.23923                      # Floats can have high precision
c = 1.2e3                          # Floats understand scientific notation. This is equivalent to 1.2x10^3, aka 1200
d = a + b + c
print("d = "+str(d)) # Using str again to turn d into a string so that it can be combined with the string "d = "
print("Its type is "+str(type(d)))

d = 1141.76077
Its type is <class 'float'>
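
The two number types can be mixed. The cell below is a small sketch, with variable names chosen only for illustration, of what happens when an integer meets a float and of how `int()` and `float()` convert between the two.

```python
# Illustrative names; not part of the cells above
whole_number = 7          # int
decimal_number = 2.5      # float

# Adding an int and a float gives a float
mixed_sum = whole_number + decimal_number
print(mixed_sum, type(mixed_sum))

# float() converts an int to a float; int() drops the fractional part
as_float = float(whole_number)
as_int = int(decimal_number)
print(as_float, as_int)
```

Note that `int()` truncates rather than rounds, so `int(2.5)` gives `2`.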


### Strings

Strings are sequences of characters meant to hold text. They can appear wrapped in single or double quotes (`'` or `"`). A backslash `\` "escapes" the character that follows it, so that, for example, a quote mark inside a string isn't read as the end of the string.

In [None]:
string1 = "This is a string"            # double quotes
string2 = 'This is another string'      # single quotes
string3 = "This isn't a problem"        # single quote mark within double quotes
string4 = 'Here\'s another way to do it' # escaped single quote within single quotes
# string5 = 'This won't work.'
print(string1)
print(string2)

You can get the length of a string using `len()`, and you can access a particular character by putting its position in square brackets, counting from 0. A negative index counts from the end: index `-k` refers to the same character as index `len(string) - k`.

In [None]:
a = "Howdy!"
print(a[3])
print(a[-3])
print(len(a))
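
To make the rule for negative indices concrete, here is a short sketch that uses the same text as the cell above, stored under an illustrative variable name, and checks that both ways of indexing reach the same character.

```python
greeting = "Howdy!"          # illustrative name; same text as the cell above
length = len(greeting)       # 6 characters

# A negative index counts from the end:
# greeting[-3] is the same character as greeting[length - 3]
from_end = greeting[-3]
from_start = greeting[length - 3]
print(from_end, from_start, from_end == from_start)
```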

#### String methods
Many objects in Python have functions specific to them, called methods. These can be called with dot notation: `object.method()`. Examples for strings are given below:

In [None]:
b = "I like Mary Lyon"
b.upper() # returns an uppercase version

In [None]:
b.index("Mary") #returns the index of a string inside another

In [None]:
b.split() # returns a list of smaller strings.
          # By default it splits on spaces, but you can also give it an input like b.split("y")

#### String slicing
More than one character can be extracted from a string using "slices." The notation for these is `list[start:end:stepsize]`.

In [None]:
s = "Howdy!"
print(s[1])   # Takes the single character at index 1
print(s[4])   # Takes the single character at index 4
print(s[1:4]) # Takes the characters from index 1 up to, but not including, index 4
print(s[3:])  # Takes all characters from index 3 to the end
print(s[:3])  # Takes all characters before index 3
print(s[::2]) # Takes every other character
print(s[::-1]) # Reverses the string

o
y
owd
dy!
How
Hwy
!ydwoH


### Booleans

Python has a type dedicated to the True and False values, called boolean, or `bool` for short. The only possible values are `True` and `False` (capital T and F). Comparisons return boolean values, and logical operators combine them. The comparison operators are `>`, `<`, `>=`, `<=`, `==` (equal to), and `!=` (not equal to); the logical operators are `and`, `or`, and `not`. Look at each cell below, make a guess as to what it should return, and then run the cell. Did it agree with your guess?

In [None]:
True and False

In [None]:
False and False

In [None]:
True or False

In [None]:
1 != 2

In [None]:
(1 != 2) and (5 > 5)
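
The cells above do not use `not`. As a small sketch, with variable names invented for illustration, the cell below stores boolean values in variables and combines them with `not` and `or`.

```python
# Illustrative variable names
is_raining = True
have_umbrella = False

# We stay dry if it is not raining, or if we have an umbrella
stay_dry = (not is_raining) or have_umbrella
print(stay_dry)

# The result of a comparison is itself a bool value
comparison_result = 5 >= 5
print(comparison_result, type(comparison_result))
```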

## 2. Data Structures

A data structure is a way of organizing data so that computer programs can use, manipulate, and access it efficiently. There are many types of data structures, and the most suitable one for a given program depends heavily on the characteristics of the problem at hand. This is a complex topic in computer science, so to keep things simple we will cover only three data structures.

* **Lists**
* **Sets**
* **Dictionaries**

### Lists
A list is a collection of elements where the order matters. You can access an element by its index (its position in the list, starting at 0), or find the index of an element from its value. It is also possible to get 'slices' of a list. You can add elements to the list, delete elements, compare the list with other lists, and so on.

In [None]:
#example of list
list_of_fruits = ['banana', 'watermelon', 'apple']
print('These are the elements contained in the list_of_fruits list ',
      list_of_fruits)

These are the elements contained in the list_of_fruits list  ['banana', 'watermelon', 'apple']


In [None]:
#get the first element of the list by index
print('The value that is stored in the first position of the list is',
      list_of_fruits[0])

The value that is stored in the first position of the list is banana


In [None]:
#get the index of an element by its value
print('The index of the element banana in the list is',
    list_of_fruits.index('banana'))

The index of the element banana in the list is 0


In [None]:
#add element to list at the last position using the function append()
list_of_fruits.append('pineapple')
print('The string pineapple was added in the last position', list_of_fruits)


The string pineapple was added in the last position ['banana', 'watermelon', 'apple', 'pineapple']


In [None]:
#insert element at index using the function insert()
list_of_fruits.insert(1, 'kiwi')
print('The string kiwi was added in the second position', list_of_fruits)

The string kiwi was added in the second position ['banana', 'kiwi', 'watermelon', 'apple', 'pineapple']


In [None]:
#remove element from list
list_of_fruits.remove('watermelon')
print('The string watermelon was removed', list_of_fruits)

The string watermelon was removed ['banana', 'kiwi', 'apple', 'pineapple']


In [None]:
#remove element from list by its index
list_of_fruits.pop(0)
print('The string in position 0 was removed', list_of_fruits)

The string in position 0 was removed ['kiwi', 'apple', 'pineapple']
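
Slicing and comparing lists are mentioned above but not demonstrated. The following is a minimal sketch on a separate, illustrative list; the slice notation is the same one used for strings.

```python
colors = ['red', 'green', 'blue', 'yellow']   # illustrative list

first_two = colors[:2]          # elements at index 0 and 1
reversed_colors = colors[::-1]  # a reversed copy; colors itself is unchanged
print(first_two, reversed_colors)

# Two lists are equal only if they hold the same elements in the same order
same_order = ['red', 'green'] == first_two
different_order = ['green', 'red'] == first_two
print(same_order, different_order)
```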


### Sets
A set is a collection of elements where the order does not matter. Sets don't have indexes, and each element appears at most once. As with lists, you can add elements to a set, delete elements, and compare the set with other sets; you can also compute the union and the intersection of sets, and so on.

In [None]:
#example of set
first_set_of_fruits = {'banana', 'watermelon', 'apple'}
second_set_of_fruits = {'orange', 'banana', 'pomegranate'}

In [None]:
#add element to set
first_set_of_fruits.add('kiwi')
print(first_set_of_fruits)

In [None]:
#union of sets
print(first_set_of_fruits.union(second_set_of_fruits))

In [None]:
#intersection of sets
print(first_set_of_fruits.intersection(second_set_of_fruits))

In [None]:
#difference of sets
print(first_set_of_fruits.difference(second_set_of_fruits))
print(second_set_of_fruits.difference(first_set_of_fruits))

In [None]:
#remove element from set
first_set_of_fruits.remove('apple')
print(first_set_of_fruits)
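
Comparing sets is not shown in the cells above. As a hedged sketch on illustrative data, the cell below checks equality and the subset relation, and shows that a repeated element is stored only once.

```python
# Illustrative sets; 'lemon' is written twice but stored once
citrus = {'orange', 'lemon', 'lemon'}
all_fruits = {'lemon', 'orange', 'banana'}

number_of_citrus = len(citrus)
print(number_of_citrus)

# Equality ignores the order in which the elements were written
same_elements = {'lemon', 'orange'} == citrus

# issubset() is True when every element of citrus is also in all_fruits
is_subset = citrus.issubset(all_fruits)
print(same_elements, is_subset)
```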

### Dictionaries
A dictionary stores data as key:value pairs. Each key can appear only once, so assigning to an existing key replaces its value. You look entries up by key rather than by position (since Python 3.7, dictionaries also keep their entries in insertion order). You can add entries to a dictionary and update or remove existing ones. You can see the keys of a dictionary with the method `keys()` and the values with the method `values()`.

In [None]:
#You can first create an empty dictionary and then add elements to it
#or create the dictionary with the elements already
prices_of_grocery = dict()
prices_of_grocery['eggs'] = 2
prices_of_grocery['bread'] = 1

prices_of_grocery2 = {'eggs':2, 'bread':1}

#the result is the same
print(prices_of_grocery, prices_of_grocery2)

{'eggs': 2, 'bread': 1} {'eggs': 2, 'bread': 1}


In [None]:
#See dictionary keys
print(prices_of_grocery.keys())

#See dictionary values
print(prices_of_grocery.values())

dict_keys(['eggs', 'bread'])
dict_values([2, 1])


In [None]:
#update the value of an entry in the dictionary
prices_of_grocery['eggs'] = 1.5
print(prices_of_grocery)

{'eggs': 1.5, 'bread': 1}


In [None]:
#delete dictionary entry
del prices_of_grocery['bread']
print(prices_of_grocery)
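
A few other common dictionary operations can be sketched as follows. The data and variable names are made up for illustration; `in`, `get()` and `items()` are standard parts of Python's `dict` type.

```python
opening_hours = {'bakery': 7, 'market': 8}   # illustrative data

# `in` tests whether a key exists in the dictionary
has_bakery = 'bakery' in opening_hours

# get() returns a default value instead of raising KeyError for a missing key
pharmacy_hour = opening_hours.get('pharmacy', 'unknown')
print(has_bakery, pharmacy_hour)

# items() yields (key, value) pairs
for shop, hour in opening_hours.items():
    print(shop, 'opens at', hour)

# Assigning to an existing key overwrites its value, so the key stays unique
opening_hours['market'] = 9
number_of_shops = len(opening_hours)
print(number_of_shops)
```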

## 3. Operators

Of course, variables are useful only if you can do something with them! For that, we use operators. Here are some operators that will be useful for you (there are more than these). Execute the code below to see examples of the operations.

### Arithmetic operators
*   Addition: adds the values of two variables (+)
*   Subtraction: subtracts the values of two variables (-)
*   Multiplication: multiplies the values of two variables (*)
*   Division: divides the values of two variables (/)

In [None]:
money_to_buy_eggs = 2.5
money_to_buy_apples = 1.3
print('money_to_buy_eggs + money_to_buy_apples results in', money_to_buy_eggs + money_to_buy_apples, '\n')
print('money_to_buy_eggs - money_to_buy_apples results in', money_to_buy_eggs - money_to_buy_apples, '\n')
print('money_to_buy_eggs * money_to_buy_apples results in', money_to_buy_eggs * money_to_buy_apples, '\n')
print('money_to_buy_eggs / money_to_buy_apples results in', money_to_buy_eggs / money_to_buy_apples, '\n')

money_to_buy_eggs + money_to_buy_apples results in 3.8 

money_to_buy_eggs - money_to_buy_apples results in 1.2 

money_to_buy_eggs * money_to_buy_apples results in 3.25 

money_to_buy_eggs / money_to_buy_apples results in 1.923076923076923 



### Assignment operators
*   Assignment: assigns a value to a variable (=)

In [None]:
fruit='banana'
print('The value stored in the variable fruit is',fruit, '\n')

another_fruit='watermelon'
yet_another_fruit = 'banana'

print('The value of the operation fruit == another_fruit is', fruit == another_fruit, 'because banana and watermelon are two different values.\n')
print('The value of the operation fruit == yet_another_fruit is', fruit == yet_another_fruit, 'because the two variables store the value \'banana\'.\n')

The value stored in the variable fruit is banana 

The value of the operation fruit == another_fruit is False because banana and watermelon are two different values.

The value of the operation fruit == yet_another_fruit is True because the two variables store the value 'banana'.



### Membership Operators
*   in: true only if the value is present in the data structure
*   not in: true only if the value is not present in the data structure

In [None]:
words_related_to_computers = ['mouse', 'keyboard', 'screen', 'cpu']

#check if element is in list
print('cpu' in words_related_to_computers, ': because \'cpu\' is in the list.\n')
print('key' in words_related_to_computers, ': because \'key\' is not in the list.\n')


True : because 'cpu' is in the list.

False : because 'key' is not in the list.
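
The cell above only uses `in`. As a brief sketch with illustrative data, here is `not in`, together with `in` applied to a string and to a dictionary.

```python
vowels = ['a', 'e', 'i', 'o', 'u']   # illustrative list

# not in is True when the value is absent
z_is_missing = 'z' not in vowels
print(z_is_missing)

# On strings, in tests for a substring
has_substring = 'board' in 'keyboard'

# On dictionaries, in tests the keys, not the values
key_found = 'eggs' in {'eggs': 2, 'bread': 1}
value_found = 2 in {'eggs': 2, 'bread': 1}
print(has_substring, key_found, value_found)
```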



### Comparison operators
*   Equality: check if the values of two given variables are equal (==)
*   Difference: check if the values of two given variables are different (!=)
*   Greater than: check if the value of one given variable is greater than the other (>)
*   Less than: check if the value of one given variable is less than the other (<)
*   Greater than or equal to: check if the value of one given variable is greater than or equal to the other (>=)
*   Less than or equal to: check if the value of one given variable is less than or equal to the other (<=)

In [None]:
print('money_to_buy_eggs > money_to_buy_apples is', money_to_buy_eggs > money_to_buy_apples, 'because 2.5 > 1.3 \n')
print('money_to_buy_eggs < money_to_buy_apples is', money_to_buy_eggs < money_to_buy_apples, 'because 2.5 is not less than 1.3 \n')
print('money_to_buy_eggs > money_to_buy_apples is', money_to_buy_eggs > money_to_buy_apples, '\n')
print('money_to_buy_eggs <= money_to_buy_apples is', money_to_buy_eggs <= money_to_buy_apples, '\n')
print('money_to_buy_eggs >= money_to_buy_apples is', money_to_buy_eggs >= money_to_buy_apples, '\n')
print('money_to_buy_eggs == money_to_buy_apples is', money_to_buy_eggs == money_to_buy_apples, '\n')
print('money_to_buy_eggs != money_to_buy_apples is', money_to_buy_eggs != money_to_buy_apples, '\n')

money_to_buy_eggs > money_to_buy_apples is True because 2.5 > 1.3 

money_to_buy_eggs < money_to_buy_apples is False because 2.5 is not less than 1.3 

money_to_buy_eggs > money_to_buy_apples is True 

money_to_buy_eggs <= money_to_buy_apples is False 

money_to_buy_eggs >= money_to_buy_apples is True 

money_to_buy_eggs == money_to_buy_apples is False 

money_to_buy_eggs != money_to_buy_apples is True 


In [None]:
# When comparing text, it is often useful to lowercase the words first
print("banana" == "Banana")
print("banana" == "Banana".lower()) #You can use the .lower() method

False
True


### Logical operators (also known as boolean operators)
*   and: true only if the two statements are true
*   or: true if at least one of the two statements is true
*   not: reverses the result (e.g., returns false if the result is true)

In [None]:
my_age=28
your_age=22

print('my_age >= 18 and your_age >= 18 is', my_age >= 18 and your_age >= 18, 'because the values are both above 18. \n')

print('my_age >= 23 or your_age >= 23 is', my_age >= 23 or your_age >= 23, 'because only one of the values has to be equal or greater than 23. \n')

my_age >= 18 and your_age >= 18 is True because the values are both above 18. 

my_age >= 23 or your_age >= 23 is True because only one of the values has to be equal or greater than 23. 



In [None]:
#Inverting the value of a comparison.
print('not my_age >= 18 is', not my_age >= 18, '\n')

not my_age >= 18 is False 



## 4. Control Structures
In programming languages, we use control structures to manipulate data, test conditions, iterate through values, and so on.
There are two main types of control structures you will use during this course: **if-else (conditional) statements** and **loops**.

### Conditional statements
As seen in the previous section, Python supports logical conditions between variables and data structures (e.g., >, <, ==, !=, and, or).
Conditional statements (also known as selection structures) let us test logical conditions by using the keywords **if**, **elif** and **else**.

*  An "if statement" is written by using the if keyword, followed by the condition to test
*  The elif keyword is a way of saying "if the previous conditions were not true, then try this condition"
*  The else keyword catches anything which isn't caught by the preceding conditions.


In [None]:
### Examples of conditional statements ###
banana = 'banana'
apple = 'apple'

#Testing equality of variables
if banana==apple:
  print('The variables banana and apple have the same value')
else:
  print('The variables banana and apple do not have the same value')

In [None]:
#It also works the same way to test the difference
if banana!=apple:
  print('The variables banana and apple do not have the same value')
else:
  print('The variables banana and apple have the same value')

In [None]:
carmen_height=1.65
mariana_height=1.55

if carmen_height > mariana_height:
  print('Carmen is taller than Mariana.')
elif carmen_height < mariana_height:
  print('Mariana is taller than Carmen.')
else:
  print('Mariana and Carmen have the same height.')

In [None]:
#Testing membership. Since we are not using elif and else, all conditions will be tested.
books_written_by_tolkien = ['Lord of the rings', 'Silmarillion', 'Hobbit']
if 'Lord of the rings' in books_written_by_tolkien:
  print('Lord of the rings was a book written by Tolkien')
if 'Silmarillion' in books_written_by_tolkien:
  print('Silmarillion was also a book written by Tolkien')
if 'Hobbit' in books_written_by_tolkien:
  print('Hobbit was also a book written by Tolkien')

In [None]:
#Testing the three statements simultaneously by combining the statements with the logical operator and
if 'Lord of the rings' in books_written_by_tolkien and 'Hobbit' in books_written_by_tolkien and 'Silmarillion' in books_written_by_tolkien:
  print('all these books were written by Tolkien')
#Remember that if we use the operator 'or', at least one of the statements has to be true.
if 'Lord of the rings' in books_written_by_tolkien or 'Alice in Wonderland' in books_written_by_tolkien or 'Carandiru' in books_written_by_tolkien:
  print('At least one of these books was written by Tolkien')

### Loops
A loop is a way of repeating a block of code, often while iterating through data structures (e.g., lists, dictionaries). Generally, to repeat until a certain condition is no longer satisfied, use the keyword **while**. To go through the elements of a data structure one by one, use the keyword **for**.

To use the for loop without accounting for the indexes, use the structure:
**for** element **in** list.
It is possible to use auxiliary functions like 'enumerate()' to access both the list elements and the indexes of the elements, for this use the structure:
**for** index, element **in** enumerate(list).

For while loops, simply use the structure:
**while** condition.

#### For loops

In [None]:
fruits = ['apple', 'banana', 'orange']
#iterate through the list printing the values stored in it
for fruit in fruits:
  print(fruit)

apple
banana
orange


In [None]:
#iterate through the list printing the values stored in it along with the indexes
for i, fruit in enumerate(fruits):
  print(i, fruit)

0 apple
1 banana
2 orange


In [None]:
#We can combine loops with conditionals
imdb_ratings = [8, 7, 6, 7.5, 5, 9]
for rating in imdb_ratings:
  if rating >= 8:
    print('The IMDB for this movie is', rating, 'so perhaps it is a good movie.')
  elif rating < 8 and rating >= 7:
    print('The IMDB for this movie is', rating, 'so perhaps it is an okayish movie.')
  else:
    print('The IMDB for this movie is', rating, 'so perhaps it is a bad movie.')
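
The for loops above all iterate over lists. As a sketch on made-up data, the cell below iterates over a dictionary instead, using `items()` to get each key together with its value and a conditional to pick out some of them.

```python
stock = {'apple': 3, 'banana': 0, 'orange': 12}   # illustrative data

out_of_stock = []
for fruit, quantity in stock.items():
    # Collect the fruits whose quantity is zero
    if quantity == 0:
        out_of_stock.append(fruit)

print('Out of stock:', out_of_stock)
```

Iterating over the dictionary directly (`for fruit in stock:`) would give only the keys.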

#### While loops

In [None]:
price_of_grocery_items = [1, 2.5, 5, 2, 8.5, 4, 3]
total_money_for_groceries = 10

#The variable item will help us access the elements in the list
#price_of_grocery_items. Remember that the first element has the index 0.
item=0

while total_money_for_groceries > 0:
  item_bought=price_of_grocery_items[item]
  total_money_for_groceries = total_money_for_groceries-item_bought
  print('I bought an item that costs', item_bought, 'euros. Now I have',total_money_for_groceries, 'euros.')
  #Now we increment the value of the variable item, so we access the value of the next element in the list
  item=item+1

#Notice that after we buy the fourth item (which costs 2), we have no money left.
#Since the value of total_money_for_groceries is now less than 0, the loop ends because the
#condition total_money_for_groceries > 0 is no longer True.



I bought an item that costs 1 euros. Now I have 9 euros.
I bought an item that costs 2.5 euros. Now I have 6.5 euros.
I bought an item that costs 5 euros. Now I have 1.5 euros.
I bought an item that costs 2 euros. Now I have -0.5 euros.
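
The loop above stops because the money runs out before the list does. If the list ended first, indexing past its last element would raise an `IndexError`. One possible way to guard against that, sketched here with illustrative names and values, is to add a second condition to the `while`.

```python
# Illustrative values: the budget is larger than the total of all prices
prices = [1, 2.5, 5]
budget = 20
position = 0
items_bought = 0

# Stop when the money runs out OR when there are no more items to buy
while budget > 0 and position < len(prices):
    budget = budget - prices[position]
    items_bought = items_bought + 1
    position = position + 1

print('Bought', items_bought, 'items and have', budget, 'euros left.')
```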


## 5. Functions
A function is a block of code that only runs when it is called. Programming languages have many native functions (functions that come with the language) and you will use a lot of these (e.g., functions print(), len()), but you can create your own functions.

It is best practice that a function implements a well-defined task (e.g., computing the average of  numbers in a list, getting the longest word in a dictionary).  

A function has:
*   signature: the name of the function
*   parameters: values that you pass to the function so it can perform the necessary calculations. Can be empty if no parameter is needed

Python does not keep two separate functions with the same name: if you define a second function with a name that is already in use, it replaces the first one.

Usually a function returns a value that is the output of its calculations. For that, we use the keyword **return**.
To define a function you use the keyword **def**.

You can also use the functions of auxiliary libraries, like [NLTK](https://www.nltk.org/) which you will see during this course.

In [None]:
def get_longest_word_in_list(list_of_words):
  #initially, we create a dummy value with the minimum length for a string:
  #an empty string has length 0
  longest_word = ''
  for word in list_of_words:
    #len() is a function that outputs the size of its parameter
    #(a list, the number of characters in a word, etc)
    if len(word)>len(longest_word):
      longest_word = word

  return longest_word


animals = ['cat', 'dog', 'arara', 'chicken', 'elephant', 'giraffe', 'platypus']
food= ['bread', 'tomato', 'cheese', 'cabbage', 'pineapple']
longest_word = get_longest_word_in_list(animals)
print('The longest word on the list passed as parameter is', longest_word)
longest_word = get_longest_word_in_list(food)
print('The longest word on the list passed as parameter is', longest_word)

The longest word on the list passed as parameter is elephant
The longest word on the list passed as parameter is pineapple
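
Two further sketches, using hypothetical function names: one computes the average of the numbers in a list (the example task mentioned in the introduction to this section), and one takes no parameters at all. Both rely only on the built-in functions `sum()` and `len()`.

```python
def compute_average(numbers):
    #returns the arithmetic mean of a non-empty list of numbers
    return sum(numbers) / len(numbers)


def get_greeting():
    #no parameters are needed, so the parentheses stay empty
    return 'Hello!'


average_rating = compute_average([8, 7, 6])
print('The average is', average_rating)
greeting_text = get_greeting()
print(greeting_text)
```

Note that `compute_average` assumes the list is not empty; an empty list would cause a division by zero.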


In [None]:
#Let's implement a function that receives a text (a string) as input and outputs
#a dictionary that says how many characters each word has.
def count_number_of_characters(text):
  counts=dict()
  #split() is a string method that splits a string on the separator passed as a parameter.
  #Here we are using the blank space, so we get the words.
  #Of course, if our text had punctuation or multiword expressions, this wouldn't
  #work well.
  split_text = text.split(' ')
  for word in split_text:
    if word not in counts.keys():
      counts[word] = len(word)
  return counts

text = 'Tras tres tragos y otros tres y otros tres tras los tres tragos'
other_text = 'Doña Panchívida se cortó un dévido con el cuchívido del zapatévido'
counts_text = count_number_of_characters(text)
print(counts_text)
counts_other_text = count_number_of_characters(other_text)
print(counts_other_text)

{'Tras': 4, 'tres': 4, 'tragos': 6, 'y': 1, 'otros': 5, 'tras': 4, 'los': 3}
{'Doña': 4, 'Panchívida': 10, 'se': 2, 'cortó': 5, 'un': 2, 'dévido': 6, 'con': 3, 'el': 2, 'cuchívido': 9, 'del': 3, 'zapatévido': 10}


In [None]:
#Now let's alter the function so it receives a list instead of a single string
#(a list in which each element is a sentence).
#To avoid counting the same word twice because of uppercase/lowercase
#differences (for instance, 'a' and 'A'),
#we'll convert each sentence to lowercase using the method lower().
def count_number_of_characters(sentences):
  counts=dict()
  for sentence in sentences:
    sentence = sentence.lower()
    split_text = sentence.split(' ')
    for word in split_text:
      if word not in counts.keys():
        counts[word] = len(word)
  return counts


sentences = ['No man is an island', 'Entire of itself',
'Every man is a piece of the continent','A part of the main']
counts = count_number_of_characters(sentences)
print(counts)

{'no': 2, 'man': 3, 'is': 2, 'an': 2, 'island': 6, 'entire': 6, 'of': 2, 'itself': 6, 'every': 5, 'a': 1, 'piece': 5, 'the': 3, 'continent': 9, 'part': 4, 'main': 4}
