<img src = "https://github.com/barcelonagse-datascience/academic_files/raw/master/bgsedsc_0.jpg">

# Introduction to programming with Python


<img src = "https://www.python.org/static/img/python-logo.png">

## Python brief history 

Created by Guido van Rossum, who started work on it in 1989 and released the first version in 1991. He is a Monty Python fan, hence the name and the jokes!

+ Currently there are two versions available
  + 2.7.x (2010)
  + 3.9.x (2020)

+ Python 2.7 reached end of life on 1 January 2020, so users
  should write all new code in Python 3

## The Zen of Python, by Tim Peters

Beautiful is better than ugly.  
Explicit is better than implicit.  
Simple is better than complex.  
Complex is better than complicated.  
Flat is better than nested.  
Sparse is better than dense.  
Readability counts.  
Special cases aren't special enough to break the rules.  
Although practicality beats purity.  
Errors should never pass silently.  
Unless explicitly silenced.  
In the face of ambiguity, refuse the temptation to guess.  
There should be one-- and preferably only one --obvious way to do it.  
Although that way may not be obvious at first unless you're Dutch.  
Now is better than never.  
Although never is often better than *right* now.  
If the implementation is hard to explain, it's a bad idea.  
If the implementation is easy to explain, it may be a good idea.  
Namespaces are one honking great idea -- let's do more of those!  

In [1]:
# Zen of Python can be imported anytime
import this
this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


<module 'this' from '/usr/lib/python3.6/this.py'>

In [2]:
# Check which names are defined in the current namespace
a_num = 10
dir()
# ['__builtins__' .... '__spec__', 'a_num']
 
def some_func():
    b_num = 11
    print(dir())
     
some_func()
# ['b_num'] # only available there
 
dir()
# ['__builtins__' ... '__spec__', 'a_num', 'some_func']

['b_num']


['In',
 'Out',
 '_',
 '_1',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 '_sh',
 'a_num',
 'exit',
 'get_ipython',
 'quit',
 'some_func',
 'this']

## Working with Jupyter notebook 

+ Edit cells
+ Command completion and help
+ Type of cells
+ Executing cells (ctrl+enter, shift+enter)
+ Command mode vs Edit mode (esc, enter)
+ Inserting new cells above and below (a,b)
+ Deleting cells (d d)
+ Undo deleted cell (z)
+ Undo inside code cell (ctrl+z)
+ Saving
+ Python help on commands: place cursor and type SHIFT+TAB


## Working in Google Colab

The specifics of working with Jupyter notebooks in Google Colab are described [here](https://colab.research.google.com/notebooks/basic_features_overview.ipynb).

Note that the TAB key is used for auto-completion and to explore code documentation.

CTRL+/ is used to comment/uncomment lines of code.

## Data types and Variables


Variables in Python hide what is inside of them. It is _extremely_ important that, whenever you are working with variables, you have in mind what TYPE of thing the variable holds. One way to help yourself is to use helpful names with variables (hint: don't use "x"!).

You can check the type of a variable with the built-in "type" function: 

```python
x = 5
type(x)
```

Use this to check the type of all the variables in the cells below (NOTE: Jupyter will automatically display the value of the last expression evaluated in the cell): 

In [3]:
# Numbers

x = 3
y = 3.2

In [4]:
# Strings are written between double quotes ("") or single quotes ('')

x = "om" 
y = 'om'

In [5]:
# Boolean

x = True  
x = False 

In [6]:
# Null value in Python is a NoneType, written as "None"

x = None
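
As a quick check of the cells above, the following sketch applies `type` to one value of each kind. The variable names are chosen here for illustration only; they do not appear elsewhere in the notebook.

```python
# One value of each basic type seen above (names are illustrative)
an_int = 3
a_float = 3.2
a_string = "om"
a_bool = True
nothing = None

print(type(an_int))    # <class 'int'>
print(type(a_float))   # <class 'float'>
print(type(a_string))  # <class 'str'>
print(type(a_bool))    # <class 'bool'>
print(type(nothing))   # <class 'NoneType'>
```

Giving each variable a descriptive name, as recommended above, makes it much easier to remember which type it holds.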

## Basic operations with numerical data



```python
# Assignment
a = 10      # 10

# Increment/Decrement
a += 1      # 11
a -= 1      # 10

# Operations
b = a + 1   # 11
c = a - 1   # 9

d = a * 2   # 20
e = a / 2   # 5.0 (division always returns a float)
f = a % 3   # 1   (modulo or integer remainder) 
g = a ** 2  # 100 (a to the power of 2)

# Operations with other variables
d = a + b   # 21
```
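
One detail worth checking for yourself: in Python 3 the `/` operator always produces a float, even when both operands are integers. The sketch below contrasts it with floor division (`//`), an operator not listed above, which keeps the result an integer when both operands are integers.

```python
a = 10

true_div = a / 2     # 5.0 (always a float in Python 3)
floor_div = a // 2   # 5   (floor division: the result stays an int)
remainder = a % 3    # 1

print(true_div, floor_div, remainder)    # 5.0 5 1
print(type(true_div), type(floor_div))   # <class 'float'> <class 'int'>
```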

In [7]:
a=10
a*=2
a

20

## Basic operations with strings


You can concatenate strings together with the + operator: 

```python
"Hello" + " " + "World"
```

The built-in function "len" can be used to find the length of a string: 

```python
len("foo")
```

## Printing

It can be useful to "print" what we are doing to the screen. This can be done with the built-in "print" function.

You might have noticed that Jupyter notebooks automatically display the value of the last expression in a cell when you execute it, so you don't need to print that!

In [8]:
# Printing - note how Jupyter automatically prints "y" to "out", 
# but we need to manually print "x" if we want to see it!

x = 15 / 2
print(x)
y = x > 2
y

7.5


True

## Logical operations

We can compare different pieces of data in Python.

The result of such a comparison is a boolean value (`True` or `False`).

Let's see some examples:

In [9]:
# Comparing numbers

x = 1 >= 2 
y = 1 == 2 
w = 1 != 2

In [10]:
# It is canonical to use "is" instead of == for checking for NoneType:

x = None
y = x is None
z = x is not None

In [11]:
# There is a boolean algebra to combine comparisons (and/or)
# Note: parentheses not needed here, but they help readability!


x = (1 <= 2) and (1 > 0)
y = (1 > 2) or (1 < 3)

## Indentation 

Most languages don’t care about indentation.

Most humans do. We tend to group similar things together. 

Python encourages “readable” code by enforcing indentation.
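
To make this concrete, here is a minimal sketch in which the same statement behaves differently depending only on its indentation. The variable names and values are illustrative.

```python
x = 1  # illustrative value

status = "unchanged"
if x > 5:
    print("x is big")
    status = "set inside the if"   # indented: skipped, because x > 5 is False
print(status)  # unchanged

status = "unchanged"
if x > 5:
    print("x is big")
status = "set after the if"        # not indented: always runs
print(status)  # set after the if
```

In the first half both indented lines belong to the `if` block, so neither runs. In the second half the assignment sits at the left margin, so it is no longer part of the block.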

## Making decisions on the basis of comparisons ("control flow"): if 

The structure is 

```python

if BOOLEAN: 
    ACTION 1
    ACTION 2 # note indentation! 
else: 
    ACTION 3
```

For example: 

```python

gender = "male"
age = 20 
if gender == "female":
    if age > 18:
        print("woman")
    else: 
        print("girl")
```

What will it print???

### Exercise 1

Practise control flow: 
write some code that prints "high" if the number in x is greater than 5, and "low" if the number is less than 5.

In [35]:

x = 5

# your code here! Hint: use if/else and print.
if x>5:
    print('high')
elif x<5:
    print('low')
else:
    print('equal')


equal


## Looping 





<table><tr>
<td>  **This is the process of repeating a set of operations *when an index varies within a set*! Within the loop the data used in the operations can change** </td>
<td> <img src="https://github.com/barcelonagse-datascience/academic_files/raw/master/images/dullboy.jpg"> </td>
</tr></table>


Looping is fundamental in Python! Let's see examples. 

## Lists

In order to loop, we need something to loop over! In Python, things that can be looped over are called "iterables". 

Whenever you want to loop in Python, think of everything that needs to change inside the loop and try to put that into an iterable. This might be different than you are used to in other languages!

One of the simplest iterables in Python is a list. Lists are created with square brackets: 

```python
my_list = ["She", "turned", "me", "into", "a", "newt"]
```

Here you can see we created a list of strings. We can also create a list of integers: 

```python
ages = [1,5,10]
```

But Python lists don't have to be homogeneous: you can mix types! This is most useful for including the NoneType: 

```python
ages = [1, None, 10]
```

In [13]:
# A list!
ages = [1, 2, 10, None, 100]

# A loop! (note the indentation)
for x in ages:
    print("This persons age is: ", x)
print("Done")

This persons age is:  1
This persons age is:  2
This persons age is:  10
This persons age is:  None
This persons age is:  100
Done


### Exercise 2
Repeat the above cell, but only print the age if the
 age exists (i.e. is not None). 
 
HINT: Look back at the cells on logical operations to see how to check if a value is None in Python

In [37]:

# Your code here
ages = [1, 2, 10, None, 100]

for age in ages:
    if age is not None:
        print(age)

print('hi')

1
2
10
100
hi


## Using Lists


Sometimes you want to access individual elements from a list. You can do this using square brackets together with the "index" of the element: 

```python
ages = [1,5,10,20,30]
ages[0]
```

The first element is indexed at 0, the second element at 1, etc. 

You can also access a contiguous range of elements: 

```python
ages[1:3] # second item (index 1) and third item (index 2) only!
```

You can also use negative indices to access items from the end. For example, the last item: 

```python
ages[-1]
```

You can concatenate multiple lists together with the +: 

```python
ages + [40, 50, 60]
```

And you can check for membership with "in": 

```python
"foo" in ["foo", "bar", "baz"]
```
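
The snippets above can be combined into one small runnable sketch, with the value of each expression shown as a comment. The variable names on the left are illustrative.

```python
ages = [1, 5, 10, 20, 30]

first = ages[0]                # 1
middle = ages[1:3]             # [5, 10] (the end index 3 is excluded)
last = ages[-1]                # 30
longer = ages + [40, 50, 60]   # a new list; ages itself is unchanged
has_ten = 10 in ages           # True

print(first, middle, last)     # 1 [5, 10] 30
print(longer)                  # [1, 5, 10, 20, 30, 40, 50, 60]
print(len(ages), has_ten)      # 5 True
```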

## Operations on Lists

In data science, we deal with data! Data, being a collection of many individual data points, are often stored in lists (or list-like structures). 

There are three main operations we perform with lists: 

1. Aggregate the elements into a single value (reduce)
2. Apply a function to each element (map)
3. Filter the elements

Let's look at examples to understand what these terms mean.

In [15]:
# Aggregation: 
# Summing the numbers in a list: 

nums = [30,1,4,3,10.5,100]

total = 0 

for num in nums:
    total += num
    
total

148.5

In [16]:
# Aggregation: 
# Finding the minimum number in a list of numbers: 

nums = [30,1,4,3,10.5,100]

min_num = nums[0] 

for num in nums: 
    if num < min_num:
        min_num = num
    
min_num

1

### Exercise 3

Aggregation: Count the number of NoneTypes in a list:

In [17]:

nums = [30, None, 4, 3, None, 10.5, 100]

total = 0 

for num in nums:
    # Your code here
    if num is None:
        total +=1
    
total


2

In [18]:
# Applying a function: 
# Squaring each number in a list

nums = [30,1,4,3,10.5,100]

# This is called a "list comprehension"
# and is the Pythonic way to apply a function to 
# every element in a list
squared_nums = [num**2 for num in nums]
    
squared_nums

[900, 1, 16, 9, 110.25, 10000]

In [19]:
# Applying a function: 
# Getting the length of each string in a list: 

names = ["foo", "bar", "baz", "foobarbaz"]

# Note the "len" function to get the length of a string. 
# Hint: this same function can be used to get the length of a list!
lengths = [len(name) for name in names]

lengths

[3, 3, 3, 9]

In [20]:
# Filter:
# Remove all values less than 18:

ages = [0, 3, 21, 45, 10, 97]

adults = [a for a in ages if a > 17]

adults

[21, 45, 97]

In [21]:
# Filter:
# Remove NoneTypes from a list: 

names = ["foo", "bar", None, "baz"]

only_names = [name for name in names if name is not None]

only_names

['foo', 'bar', 'baz']
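
Writing these loops by hand is good practice, but for the most common aggregations Python already ships built-in functions. The sketch below is one possible shortcut, not the only way to do it: it uses `sum`, `min`, `max` and the list method `count`, and it combines a filter and a function application in a single comprehension.

```python
nums = [30, 1, 4, 3, 10.5, 100]

# Aggregation with built-ins instead of explicit loops
print(sum(nums))   # 148.5
print(min(nums))   # 1
print(max(nums))   # 100

# Counting NoneTypes with the list method "count"
values = [30, None, 4, 3, None, 10.5, 100]
none_count = values.count(None)
print(none_count)  # 2

# Filter and apply a function in one comprehension:
# square only the numbers greater than 5
big_squares = [num ** 2 for num in nums if num > 5]
print(big_squares)  # [900, 110.25, 10000]
```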

## Tuples

Another iterable is called a "tuple". Rather than using square brackets, tuples are created with parentheses: 

```python
x = ("foo", 1)
```

But they can also be created without any parentheses, implied by the comma: 

```python
x = "foo", 1
```

Elements in the tuple are also accessed via the index (like lists): 

```python
x[0]
```

Lists can be used most places that a tuple is used, so it can be confusing what the difference is between the two. Besides technical differences that we won't go in to here, the following rules can help you decide when to use a tuple and when to use a list: 

* LIST: Potentially many elements, unknown number of elements, relatively homogeneous elements.
* TUPLE: Few elements, fixed number of elements, completely heterogeneous elements.


The name comes from here: _double, triple, quadruple, quintuple, sextuple, septuple, octuple._ Which gives a hint that they should be of fixed length! Because of this, we rarely iterate over them in a for loop like lists. 

Because they have a fixed length, we often use them with destructuring: 

```python
name,num = x
```
Now the variable "name" contains the value "foo" and the variable "num" contains the value 1. This may not seem particularly useful at the moment, but we will soon see how it can be used. 

In [22]:
# Destructuring tuples in a for loop:

# Note: a list of tuples is a useful data structure 
# when your data is a set of "pairs":

scoreboard = [("om", 100), ("nandan", 10000), ("arapakis", 55)]

for name,score in scoreboard: 
    print(f"{name} has scored {score} points") # string interpolation with f""!

om has scored 100 points
nandan has scored 10000 points
arapakis has scored 55 points


In [23]:
# Aggregating a list of tuples: 

# Challenge:
# Return the name of the top scoring teacher.
# Hint: this is an aggregation!

scoreboard = [("om", 100), ("nandan", 10000), ("arapakis", 55)]

# Your code here
max_score=0
for name,score in scoreboard:
    if score > max_score:
        max_score=score
        top_scorer=name
top_scorer

# Alternative: start from the first entry, so scores of 0 or below also work
# best_teacher, max_score = scoreboard[0]
# for t, s in scoreboard:
#     if s > max_score:
#         best_teacher, max_score = t, s

# best_teacher



'nandan'

## Dictionaries


We saw that it can be great to put our data into a tuple if it is easily represented as a pair (or a triple, quadruple, etc.). But sometimes our data is more complicated than that, and we don't want to try and remember the "order" of each distinct part (as we need in a tuple). 

Dictionaries are another basic type in Python. 

They are "associative" data structures. Like the eponymous dictionary, they associate a KEY with a VALUE and are created with the {}: 

```python
teacher = {"name": "nandan", "score": 10000}
```

You can access the value via the key:

```python
teacher["name"]
```

You can also set a value in a similar way: 

```python
teacher["name"] = "nandan rao"
```

Note that each key can ONLY HAVE ONE VALUE. In the above example, I have overwritten the original "name" key with a new value.
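
A few more everyday dictionary operations, sketched on the same `teacher` example: adding a new key, checking whether a key exists with `in`, reading a possibly missing key with `get`, and looping over key/value pairs with `items`. The "likes" and "age" keys are used here purely for illustration.

```python
teacher = {"name": "nandan", "score": 10000}

teacher["name"] = "nandan rao"      # overwrites the existing value
teacher["likes"] = ["ice cream"]    # a key that did not exist is simply added

print("score" in teacher)           # True ("in" checks the keys)
print(teacher.get("age"))           # None (get avoids a KeyError for a missing key)

# items() yields (key, value) tuples, which we destructure in the loop
for key, value in teacher.items():
    print(key, "->", value)
```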

In [24]:
teachers = [{"name": "om", "score": 100, "likes": ["statistics", "more statistics", "even more statistics"]},
            {"name": "nandan", "score": 10000, "likes": ["ice cream"]},
            {"name": "arapakis", "score": 55, "likes": ["R", "D3"]}]
teachers[0]
#Try those options to access first element of first dictionary
#teachers[0][0]
#teachers[0]['name']
#teachers[0].get('name')

{'likes': ['statistics', 'more statistics', 'even more statistics'],
 'name': 'om',
 'score': 100}

### Exercise 5

Collect all the likes of the teachers into one list: 

Hint: this is an aggregation!

In [25]:

# Notice: What is "teachers"? 
# A list of dictionaries, but each dictionary has three keys 
# and the "likes" key contains a list of strings!
teachers = [{"name": "om", "score": 100, "likes": ["statistics", "more statistics", "even more statistics"]},
            {"name": "nandan", "score": 10000, "likes": ["ice cream"]},
            {"name": "arapakis", "score": 55, "likes": ["R", "D3"]}]

# Your code here

likes=[]
for teacher in teachers:
    likes+= teacher['likes']
    #Alternative 1
#     for like in teacher['likes']:
#         likes.append(like)

# Alternative 2
# my_likes=[teacher['likes'] for teacher in teachers]
# likes=[l for like in my_likes for l in like]

# Alternative 3
#likes=[like for teacher in teachers for like in teacher['likes']]
    
likes

['statistics',
 'more statistics',
 'even more statistics',
 'ice cream',
 'R',
 'D3']

## Classes: Custom Types

We've seen how we can use the `type` function to check the type of a variable. But Python also lets us create our own "types." These are called _classes_. We can use the terms "type" and "class" interchangeably in Python. 

In [26]:
# This is the syntax for creating a class. Right now, this class does nothing. 
# We'll learn how to create more useful classes later.

class MySimpleClass():
    pass


foo = MySimpleClass()


# Check the type of `foo`!

## Instances and Attributes


In the above example, `MySimpleClass` is the class and we call `foo` an instance of the class `MySimpleClass`. To connect this to what we've seen: 

```python
bar = [1,2,3]
type(bar)
```

`bar` is an instance of the class `list`. 

Instances can have _attributes_. Attributes are just variables that are attached to the instance. They are accessed with dot notation: 

```python
foo.shape
```

Would access the hypothetical "shape" attribute of instance `foo`. If the attribute happens to be a function, we call it a _method_. 

Methods are functions that have a special purpose: they interact with the instance itself in some way.
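
As a minimal sketch of these ideas, here is a small class with two attributes and one method. The class `Teacher` and its method `add_points` are hypothetical names invented for this illustration. `__init__` is the special method Python calls when a new instance is created, and `self` refers to the instance itself.

```python
class Teacher:
    """A hypothetical class, used only to illustrate attributes and methods."""

    def __init__(self, name, score):
        # Attributes: variables attached to this particular instance
        self.name = name
        self.score = score

    def add_points(self, points):
        # A method: a function that works with the instance itself (self)
        self.score += points
        return self.score


nandan = Teacher("nandan", 10000)   # an instance of the class Teacher
print(nandan.name)                  # nandan
print(nandan.add_points(5))         # 10005
print(type(nandan).__name__)        # Teacher
```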

We'll see examples now: 

## Advanced operations with strings

Strings are actually iterables, just like lists! They can be sliced just like lists: 

```python
x = "my python string"
x[3:9]
```

You can also turn a string into a list of strings via the "split" method: 

```python
x = "my python string"
y = x.split(" ")
y == ["my", "python", "string"]
```

And the reverse is also possible via the "join" method: 

```python
space = " "
z = space.join(y)
z == x
```


You can also make everything lower (or upper!) case, replace certain substrings with other substrings, and check for the existence of a substring with "in":  

```python
z = "My Python String"
z.lower() == x
z.upper() == "MY PYTHON STRING"
w = z.replace("Python", "R")
"Python" in w
```
There are many more easy-to-use, built-in tools for working with text data in Python. You can read more here: https://docs.python.org/3/library/stdtypes.html#string-method

### Exercise 6

Count the instances of "foo" in the following text, 
 ignoring case.

In [27]:
x = "Hello, I would like a foo. Foo went for a walk. Foo bar baz. Baz to the foo."

# Your code here

count=0
for s in x.lower().replace('.',' ').split():
    if 'foo' ==s:
        count+= 1

print(count)

x.split()
#or
x.lower().count('foo')

4


4

## Data and operations in a bundle: functions

Input data (if any) --> Set of operations --> Output data (if any)

```python
def name(input):
    operations
    return output
```

For example, here is a function that takes a number and returns its square:

```python
def squared(x):
    return x**2
```

Here is a more general function that takes a number and the power, and returns the number to that power: 

```python
def power(x, n):
    return x**n
```

Here is a function that returns the minimum and the sum of a list of numbers:

```python
def minsumfun(x):
    minx = x[0] if len(x) > 0 else None
    sumx = 0.0  
    for y in x: 
        if y < minx:
            minx = y
        sumx += y 
    return minx,sumx # notice multiple outputs, technically a tuple!
```

Now we can call that function: 

```python
m,s = minsumfun([1,5,0.3,-1]) # Destructuring! Very Pythonic! :)
```

### Exercise 7

Functions are very useful for transforming data:

Let's create a function to compute the highest scores for each teacher.

In [28]:

teachers = [{"name": "om", "scores": [100, 200, 150]},
            {"name": "nandan", "scores": [10000, 9999, 99987 ]},
            {"name": "arapakis", "scores": [55, 100, 5]}]



# Fill in this function! 
# It should take a dictionary, as in the list above, 
# and it should return a 2-tuple with their name and
# their highest score: (name, score)
def highest_score(person):
    # Your code here
    max_score=0
    for score in person['scores']:
        if score>max_score:
            max_score=score
    
    return person['name'],max_score


# Apply the function to each element in the list "teachers": 

# Your code here
#test 
#highest_score({'name':'foo', 'scores':[20,10,1000,1]})

[highest_score(teacher) for teacher in teachers]


[('om', 200), ('nandan', 99987), ('arapakis', 100)]


## Reading and writing from files

We can actually think of files as a data type! Python effectively does this, so you should not be surprised that file objects have attributes and methods. 

The two most common modes are read (`'r'`, the default) and write (`'w'`).

In short, the built-in `open` function creates a Python file object, which serves as a link
to a file residing on your machine. After calling `open`, you can transfer strings of data
to and from the associated external file by calling the returned file object’s methods.

The `with` statement uses the file object as a context manager: it wraps the file-processing code in a logic layer that
ensures that the file will be closed automatically on exit.

To understand how Python works with such data, let's work with the text file *textfile.txt*


In [29]:
# Run this cell only if you are working in Google Colab
# Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

# Set the current directory (Google Colab only)
import os
os.chdir('/content/drive/My Drive/Classroom/19D031 Foundations in Data Science All Programs (except DS)/FDS_materials/intro_programming/Python')

Mounted at /content/drive


In [30]:
# Let's read the file into a variable:

with open('../../Data/textfile.txt') as f:
    content = f.read()

print(content)

Why Do People Use Python?

Because there are many programming languages available today, this is the usual first question of newcomers. Given that there are roughly 1 million Python users out there at the moment, there really is no way to answer this question with complete accuracy; the choice of development tools is sometimes based on unique constraints or personal preference.

But after teaching Python to roughly 225 groups and over 3,000 students during the last 12 years, some common themes have emerged. The primary factors cited by Python users seem to be these:

Software quality
For many, Python’s focus on readability, coherence, and software quality in general sets it apart from other tools in the scripting world. Python code is designed to be readable, and hence reusable and maintainable—much more so than traditional scripting languages. The uniformity of Python code makes it easy to understand, even if you did not write it. In addition, Python has deep support for more advanced

In [31]:
# The file object is actually an iterator, so we can do our usual tricks:

with open('../../Data/textfile.txt') as f:
    for line in f: 
        print(line)

Why Do People Use Python?



Because there are many programming languages available today, this is the usual first question of newcomers. Given that there are roughly 1 million Python users out there at the moment, there really is no way to answer this question with complete accuracy; the choice of development tools is sometimes based on unique constraints or personal preference.



But after teaching Python to roughly 225 groups and over 3,000 students during the last 12 years, some common themes have emerged. The primary factors cited by Python users seem to be these:



Software quality

For many, Python’s focus on readability, coherence, and software quality in general sets it apart from other tools in the scripting world. Python code is designed to be readable, and hence reusable and maintainable—much more so than traditional scripting languages. The uniformity of Python code makes it easy to understand, even if you did not write it. In addition, Python has deep support for more a

In [32]:
# We can also read it into a list by converting 
# the iterator into a list directly:

with open('../../Data/textfile.txt') as f:
    content = list(f)

content[0]

'Why Do People Use Python?\n'
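
The cells above only read from a file. For completeness, here is a minimal sketch of writing, assuming we are free to create a scratch file: it writes into a temporary directory (the file name `example.txt` is made up for this illustration), appends one more line, and reads the result back.

```python
import os
import tempfile

# Write to a temporary directory so the example does not touch your own files
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.txt")  # hypothetical file name

lines = ["first line", "second line"]

# Mode "w" creates the file (or overwrites it if it already exists)
with open(path, "w") as f:
    for line in lines:
        f.write(line + "\n")  # write() does not add a newline for you

# Mode "a" appends to the end instead of overwriting
with open(path, "a") as f:
    f.write("third line\n")

# Read the file back to check the result
with open(path) as f:
    content = f.read()

print(content)
```

Be careful with mode `"w"`: opening an existing file this way discards its previous contents immediately.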

## Reading and writing from URL files

Here is the equivalent code for files that we access through a URL, using the `urllib` package from the standard library. The main difference is how we read the file: `read()` returns raw bytes rather than text, so we decode them into a string.
If we want to split the text into lines, we have to code it ourselves.

In [33]:
# We load the data
import urllib.request

url = "https://raw.githubusercontent.com/barcelonagse-datascience/academic_files/master/data/textfile.txt"
file = urllib.request.urlopen(url)

txt=file.read().decode() # decode is used to convert to string format
txt

'Why Do People Use Python?\n\nBecause there are many programming languages available today, this is the usual first question of newcomers. Given that there are roughly 1 million Python users out there at the moment, there really is no way to answer this question with complete accuracy; the choice of development tools is sometimes based on unique constraints or personal preference.\n\nBut after teaching Python to roughly 225 groups and over 3,000 students during the last 12 years, some common themes have emerged. The primary factors cited by Python users seem to be these:\n\nSoftware quality\nFor many, Python’s focus on readability, coherence, and software quality in general sets it apart from other tools in the scripting world. Python code is designed to be readable, and hence reusable and maintainable—much more so than traditional scripting languages. The uniformity of Python code makes it easy to understand, even if you did not write it. In addition, Python has deep support for more 

In [34]:
# Let's recover lines using split and the target line break
txt.split('\n')[0:10]

['Why Do People Use Python?',
 '',
 'Because there are many programming languages available today, this is the usual first question of newcomers. Given that there are roughly 1 million Python users out there at the moment, there really is no way to answer this question with complete accuracy; the choice of development tools is sometimes based on unique constraints or personal preference.',
 '',
 'But after teaching Python to roughly 225 groups and over 3,000 students during the last 12 years, some common themes have emerged. The primary factors cited by Python users seem to be these:',
 '',
 'Software quality',
 'For many, Python’s focus on readability, coherence, and software quality in general sets it apart from other tools in the scripting world. Python code is designed to be readable, and hence reusable and maintainable—much more so than traditional scripting languages. The uniformity of Python code makes it easy to understand, even if you did not write it. In addition, Python has 