# Introduction to Python

This notebook aims to give you a first glimpse of Python's syntax and its possibilities. This versatile language is widely used, not only for Machine Learning and Data Science in general, but for many other applications in both industry and academia, and it will be very useful for your career.

A more in-depth introduction to base Python is given in a parallel class. In this notebook, we will quickly recall the important Python data types that you will certainly come across, as well as the syntax for the basic programming principles (loops, conditions, functions, classes). The other notebook available on moodle (`Introduction_to_python_modules.ipynb`) goes a bit more in depth into the essential Python modules (libraries) that you will need for this course (and for almost any Data Science project).

Depending on your personal programming background and skills, learning a new programming language can be a lot of new information at once. Feel free to go through these notebooks at your own pace and multiple times to really understand the concepts shown here through examples that you can run, modify, and play a bit with.

If you need further explanations about programming in Python, we can, for example, recommend:
- https://openclassrooms.com/en/courses/2304731-learn-python-basics-for-data-analysis for Python beginners,
- https://jakevdp.github.io/PythonDataScienceHandbook/ a very good book teaching Numpy, Pandas, Matplotlib,
- https://www.youtube.com/playlist?list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc an introduction to object oriented programming (OOP). Note that this goes beyond the level expected for the course.

Search engines are also a very useful tool when you are stuck. The Python community is *very* active, so in most cases you will quickly find an answer to your "*how to ...*" questions!
Discussing with other students is also highly encouraged, to help each other out and learn in groups (as long as you don't share your final answers for graded material). And of course, don't hesitate to ask the TAs when you are stuck.

## A word on IPython Jupyter notebooks

In the Anaconda environment, Python code can be run in different ways. One could use simple `*.py` scripts, which can be edited, for example, with the Spyder software (also provided in Anaconda), similarly to `*.R` scripts edited with RStudio.

Another way is to use `*.ipynb` notebooks (as the one you are looking at), which come in handy for data analysis tasks for example, as you can structure and present your projects nicely with comments and written analysis between code cells.

In notebooks, code cells are run and Markdown cells are rendered, one after the other, with `Shift` + `Enter`. You can switch a cell from code to Markdown by pressing `Esc` and then `M`, or with the menu above. Double click on a rendered Markdown cell (this one, or the title above for example) to edit it.

Jupyter notebooks are a WYSIWYG (What You See Is What You Get) environment. For example, it is possible to use **bold**, *italic*, and `verbatim`.

    This is also some verbatim code.
    $ echo "hi"

- This is a list,
- with two entries...

1. and this an enumerated one,
1. with two entries as well.

LaTeX code is also possible: $E=mc^2$,
$$
\int_{-\infty}^\infty f(t) e^{-i\omega t} dt.
$$

Feel free to spend some time getting familiar with the environment and to play a bit with the menu above to learn how to create, move, delete cells, and anything else that you would need.

## Basic data types

### Numbers, text and booleans

The syntax for the most basic types and for variable assignment is similar to that of many other (interpreted, dynamically typed) programming languages you may be familiar with. Here are some examples:

In [None]:
some_integer = 2
some_integer  # the value of the last expression in a cell is shown in "Out"

In [None]:
# Other examples printed with the "print" function
print("integers:", 4)
print("floats:", 12.46, 1.3e4)
print("strings:", "Hello World!")
print("Complex numbers:", 4 + 3j)

In [None]:
print("booleans:", True, False)
print("booleans:", (1 > 2 or 4 >= 3) )

In [None]:
a=2
print("Type of 'a' is:")
type(a)

In [None]:
example_string = "Hello"
print("First Letter of", example_string, "is", example_string[0]) # Note that indexing starts at 0 in Python
print("Last Letter of", example_string, "is", example_string[-1]) # [-1] accesses the last element of a sequence
# example_string[0] = 'B' # This does not work! Strings are "immutable"

In [None]:
combined_string = example_string + " everyone"
combined_string
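
Since strings are immutable, "changing" a letter always means building a new string. The cell below is a minimal sketch of that idea; the variable `new_string` is only an illustrative name.

```python
example_string = "Hello"

try:
    example_string[0] = "B"      # item assignment is not supported for strings
except TypeError as error:
    print("TypeError:", error)

# Build a new string from a replacement letter and a slice of the old one
new_string = "B" + example_string[1:]
print(new_string)                # Bello
print(example_string)            # Hello (the original is unchanged)
```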

### Data Containers

For sequences, the most commonly used built-in type is the list...

In [None]:
my_list = [2, 6, 4, 432, 36]
my_list

In [None]:
my_list[1:4]  # Careful: 1:4 returns the values at indices 1, 2 and 3, but NOT the one at index 4
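
Slices can also omit the start or the stop, use negative indices, or take a step. Here is a short sketch with the same list; the variable names are only illustrative.

```python
my_list = [2, 6, 4, 432, 36]

first_two = my_list[:2]      # from the start up to (not including) index 2
last_two = my_list[-2:]      # from the second-to-last element to the end
every_other = my_list[::2]   # every second element, starting at index 0

print(first_two)     # [2, 6]
print(last_two)      # [432, 36]
print(every_other)   # [2, 4, 36]
```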

In [None]:
my_list + [12] # This creates a new list, with 12 at the end

In [None]:
my_list.append(13) # This appends 13 to the list ("in place")
my_list

In [None]:
[2, 4, "sd", True, ["inner","list"]] # lists can contain different subtypes, even other lists

...and tuples (often returned by functions, as shown in a later section), which are the "immutable" version of lists:

In [None]:
my_tuple = (2, 56, 3)
my_tuple

In [None]:
not_a_tuple = (4)
my_1D_tuple = (4,)
my_1D_tuple
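
To make the two points above concrete (the trailing comma and immutability), here is a small sketch. The name `longer_tuple` is only illustrative.

```python
not_a_tuple = (4)     # parentheses alone do not make a tuple: this is the int 4
my_1D_tuple = (4,)    # the trailing comma makes it a tuple with one element

print(type(not_a_tuple).__name__)   # int
print(type(my_1D_tuple).__name__)   # tuple

my_tuple = (2, 56, 3)
try:
    my_tuple[0] = 10                # tuples do not support item assignment
except TypeError as error:
    print("TypeError:", error)

# A modified version has to be built as a new tuple
longer_tuple = my_tuple + (7,)
print(longer_tuple)                 # (2, 56, 3, 7)
```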

There are also sets (any value can appear only once):

In [None]:
my_set = {12, "w", 12}
my_set
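
Two typical uses of sets are removing duplicates and computing unions or intersections. The sketch below illustrates both; `left` and `right` are just example names, and the results are sorted before printing because sets have no guaranteed order.

```python
my_set = {12, "w", 12}            # the duplicate 12 is stored only once
print(len(my_set))                # 2

# Removing duplicates from a list
unique_values = set([3, 1, 3, 2, 1])
print(sorted(unique_values))      # [1, 2, 3]

# Usual mathematical set operations
left = {1, 2, 3}
right = {3, 4}
print(sorted(left | right))       # union: [1, 2, 3, 4]
print(sorted(left & right))       # intersection: [3]
print(2 in left)                  # membership test: True
```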

and dictionaries, that can be seen as named lists:

In [None]:
my_dict = {
    "name": "Sophie",
    "Age": 23,
    "Passions": ["Tennis", "Machine Learning"]
}
my_dict

In [None]:
my_dict["Age"]

In [None]:
# Side note: any hashable type (which includes the immutable built-in types) can be a dict key
my_weird_dict = {
    "name": "strings",
    True: "booleans",
    1: "...but treated as 0/1",
    -3: "integers",
    2.5: "floats",
    1+1j: "complex numbers",
    (1, 2): "tuples"
}
my_weird_dict
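
The output above has one entry fewer than the literal has lines: `True` and `1` compare equal and share the same hash, so they count as a single key. The following sketch isolates that behaviour and also shows what happens with a mutable (unhashable) key; the dictionary names are only illustrative.

```python
# True and 1 are treated as ONE key; the later value replaces the earlier one
small_dict = {True: "booleans", 1: "integers"}
print(len(small_dict))          # 1
print(small_dict[True])         # integers

# Mutable containers such as lists are not hashable and cannot be keys
try:
    bad_dict = {[1, 2]: "lists"}
except TypeError as error:
    print("TypeError:", error)

# A tuple with the same content works
good_dict = {(1, 2): "tuples"}
print(good_dict[(1, 2)])        # tuples
```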

## Programming principles

In Python, loops, conditions, and function definitions use colons (`:`) and indentation instead of curly braces, which are used in many other languages (like R).

### Loops

Here are some useful examples of `for` loops:

In [None]:
# Ranges start at 0 by default
for i in range(4):
    temp = i*2
    print(i, "* 2 =", temp)

In [None]:
# Start, stop, and step-size can be specified
for j in range(10, 1, -3):
    print(j)
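
A quick way to check which values a `range` produces is to convert it to a list. A small sketch (the variable names are only illustrative):

```python
default_start = list(range(4))         # start defaults to 0, stop is excluded
with_step = list(range(2, 10, 2))      # start, stop, step
counting_down = list(range(10, 1, -3)) # a negative step counts downwards

print(default_start)    # [0, 1, 2, 3]
print(with_step)        # [2, 4, 6, 8]
print(counting_down)    # [10, 7, 4]
```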


In [None]:
# Note that loop variables remain in scope after the loop ends in Python,
# e.g., the loop index and the variables defined inside the loop are still accessible afterwards:
print(i, temp)

In [None]:
list_to_enumerate_on = ["a", "b", "c"]

for el in list_to_enumerate_on:
    print(el)

for i, el in enumerate(list_to_enumerate_on):
    print(i, "-", el)

In [None]:
# while loops are useful if you don't know the number of iterations in advance
x = 20
while x > 0.1:
    x = x / 2
    print(x)

In some cases, a simple `for` loop that builds a list can be written more compactly as a "list comprehension". Here is an example:

In [None]:
my_list = [2, 6, 4]
twice_my_list = [i * 2 for i in my_list]
twice_my_list

In [None]:
## Warning: what does the following line produce?
# my_list * 2

### Conditions

`if ... else` conditions are pretty straightforward:

In [None]:
cond = 4 > 35

if cond:
    print("4 is greater than 35")
else:
    print("4 is NOT greater than 35")
        

In [None]:
a = 4
b = 35
if a > b:
    print("a is greater than b")
    
elif a == b:
    print("a equals b")
        
else:
    print("b is greater than a")

Conditions can also be added to list comprehensions. Here is a modification of the example above:

In [None]:
my_list = [2, 6, 4]
twice_my_list = [i * 2 for i in my_list if i <= 5]
twice_my_small_list = [i * 2 if (i <= 5) else "too large" for i in my_list]
twice_my_small_list
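
The two comprehensions above behave differently, which is easy to miss because only the second one is displayed. This sketch shows both results side by side; `filtered` and `replaced` are just illustrative names.

```python
my_list = [2, 6, 4]

# "if" at the END filters: elements failing the test are dropped
filtered = [value * 2 for value in my_list if value <= 5]
print(filtered)    # [4, 8]

# "if ... else" at the START chooses a value: the length stays the same
replaced = [value * 2 if value <= 5 else "too large" for value in my_list]
print(replaced)    # [4, 'too large', 8]
```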

### User defined functions

Functions are very useful for modularity and code clarity. Especially if a task is repeated more than once, a function is always better than repeating code!

One can define a Python function and use it, for example, in the following way:

In [None]:
def my_function(a, b, c):
    """
    Short description of the function 1.
    """
    result = (a + b) / c
    return result

In [None]:
fct_result = my_function(3, 7, 2)
fct_result

Multiple outputs are possible (returning tuples):

In [None]:
def my_multi_function(a, b, c):
    """
    Short description of the function 2.
    """
    result1 = (a + b) / c
    result2 = (a - b) / c
    
    return result1, result2

In [None]:
fct_result2 = my_multi_function(3, 7, 2)
fct_result2

The outputs can directly be stored in separate variables:

In [None]:
res1, res2 = my_multi_function(3, 7, 2)
print(res1)
print(res2)

Separating your code into meaningful functions is generally strongly recommended, as it makes your code clearer and cleaner, and helps avoid copy-pasting (which is a very bad coding practice, as it is very error-prone once something that has been copy-pasted needs modification).

Just as list comprehensions give a one-line form for simple loops, simple functions can be defined in one line in Python with `lambda`:

In [None]:
one_line_fct = lambda x, y: (x+y)**2
one_line_fct(2,3)
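
A `lambda` is equivalent to a small `def` function, and it is often passed directly as an argument to another function. The sketch below shows both; `squared_sum` and `words` are illustrative names, and `sorted(..., key=...)` is a built-in that sorts according to the value returned by `key`.

```python
def squared_sum(x, y):
    return (x + y) ** 2

one_line_fct = lambda x, y: (x + y) ** 2
print(one_line_fct(2, 3) == squared_sum(2, 3))   # True

# Typical use: a short throwaway function passed as an argument
words = ["banana", "kiwi", "apple"]
by_length = sorted(words, key=lambda word: len(word))
print(by_length)    # ['kiwi', 'apple', 'banana']
```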

## Python modules (libraries)

You can import Python libraries with the `import` statement. For example, the numpy module can be imported as follows:

In [None]:
import numpy

Then, the module's variables, functions, classes, ... can be accessed with the dot (.):

In [None]:
numpy.array([1,2]) # This creates a numpy array (more details in the other notebook on modules)

Module imports can be given aliases. For example, numpy is usually imported as:

In [None]:
import numpy as np

Then, for example, the same line as above becomes:

In [None]:
np.array([1,2])

Another example, with the "random" sub-module of numpy:

In [None]:
np.random.randn(2) #gaussian sample of size 2, from numpy's 'random' submodule

One can import only specific functions or submodules into the main namespace. For the same examples:

In [None]:
from numpy import array, random

In [None]:
array([1, 2])

In [None]:
random.randn(2)

It is also possible to import everything from a module into the main namespace, although it is really not recommended, especially for big modules such as numpy, as the imported names can silently replace existing ones. It is thus better to use each module in its own namespace as above (e.g. with `import numpy as np`).
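
To illustrate the kind of name clash meant here, the sketch below uses the standard-library `math` module rather than numpy (an assumption made only to keep the example dependency-free). Both Python itself and `math` provide a function called `pow`, and they do not behave identically. After `from math import *`, the bare name `pow` would refer to `math.pow`.

```python
import math

builtin_result = pow(2, 3)        # built-in pow keeps integers: 8
math_result = math.pow(2, 3)      # math.pow always returns a float: 8.0
print(builtin_result, math_result)

# The built-in accepts an optional third argument (modulus)...
print(pow(2, 3, 5))               # 3

# ...whereas math.pow does not
try:
    math.pow(2, 3, 5)
except TypeError as error:
    print("TypeError:", error)
```

Keeping the module prefix (`math.pow`) makes it explicit which of the two functions is called.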

In [None]:
# !! NOT RECOMMENDED !!

#from numpy import *

A star import can nevertheless be useful for importing functions from a file that you wrote yourself. Suppose a script named `my_functions.py`, containing some functions, sits in the same folder as this notebook. You can then import these functions with:

    from my_functions import *

The notebook `Introduction_to_python_modules.ipynb` available on moodle introduces some important Python modules that are widely used for many (if not all) data-related applications: `numpy`, `pandas`, `matplotlib` and `scikit-learn`.

## Introduction to object-oriented programming

Python is an object-oriented language. OOP is a very powerful coding paradigm that is widely used in many languages. It helps structure the code around objects and, arguably, makes the code much cleaner and clearer to read and conceptualize. It also makes it easier to model concrete real-world concepts in code. Many Python modules are centered around this principle.

Formally, in OOP, we can define what we call **classes**. A class is like a blueprint that describes an object. Such an object can have (among other things):
- properties (that we call **attributes**) which are inner variables of the objects, containing data, and
- behaviours (that we call **methods**) that are inner functions of the objects.

For example, let's say that we want to define a class that represents dogs. Then its attributes could be: its name, its age, its weight, its color, etc., and its methods could be: to bark, to run, to bathe, etc.

Once we have defined a class (i.e. the blueprint with its corresponding attributes and methods), we can create actual objects (that we call **instances**) from this "blueprint".

You are not required to know how to define classes for this course, but we believe that some OOP notions will help you better understand what you are actually doing when you use this course's required modules, like `scikit-learn` and `pandas`, that are designed around this paradigm.

For example, we could create the class dog as follows:

In [None]:
# ( This cell is a bit more advanced, you might skip the details as long as you understand the main idea )
class Dog:
    def __init__(self, name, age):
        # This method is called to initialize the instance when it is created from the class
        # Here we can define the attributes of the instance
        self.name = name
        self.age = age
        self.dirty = False
    
    # Below are the other methods of the class
    
    def bark(self):
        print(self.name, "says 'Waf, Waf!'")
    
    def run(self, nb_meters):
        # If the Dog runs 100 meters or more, it gets dirty.
        if nb_meters >= 100:
            print("I am dirty.")
            self.dirty = True
    
    def bathe(self):
        print("I am clean.")
        self.dirty = False
    
    def get_stick(self):
        # The Dog returns a wild stick
        return "stick"

We can then instantiate a dog from the class (i.e. create a dog from the blueprint):

In [None]:
dog1 = Dog(name = "Johnny", age = 8)

Wow, `dog1` is now a `Dog` called Johnny of age 8 which is clean (by default). We can then use its attributes and methods (that might modify some aspects of Johnny). Here are a few examples:

In [None]:
dog1.name

In [None]:
dog1.dirty

In [None]:
dog1.bark()

In [None]:
dog1.run(nb_meters=3)

In [None]:
dog1.dirty

In [None]:
dog1.run(nb_meters=112)

In [None]:
dog1.dirty

In [None]:
dog1.bathe()

In [None]:
dog1.dirty

In [None]:
surprise = dog1.get_stick() #Johnny got you a surprise
surprise

In [None]:
dog1.name = "Jean-Claude" #dog1 changes its name

#(defining a method change_name() would be cleaner than modifying the attribute directly like that)

In [None]:
dog1.name
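
As mentioned in the comment above, a dedicated method would be a cleaner way to rename a dog. Here is a minimal sketch of what that could look like. The class `NamedDog`, its `change_name()` method and the non-empty-name check are hypothetical illustrations, not part of the `Dog` class defined earlier.

```python
class NamedDog:
    def __init__(self, name):
        self.name = name

    def change_name(self, new_name):
        # A method is a natural place to validate the new value
        if not new_name:
            raise ValueError("A dog needs a non-empty name.")
        self.name = new_name


dog2 = NamedDog(name="Johnny")
dog2.change_name("Jean-Claude")
print(dog2.name)    # Jean-Claude

try:
    dog2.change_name("")           # rejected by the check in the method
except ValueError as error:
    print("ValueError:", error)
print(dog2.name)    # Jean-Claude (unchanged)
```

The benefit over assigning `dog2.name` directly is that every name change goes through the same check.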

The example above is simpler in comparison, but you are conceptually doing the same things when you deal with `pandas` or `sklearn`: `DataFrame` and every machine learning model are defined as classes, which you create instances of and call methods on, with the same syntax as above.

You can now read the notebook `Introduction_to_python_modules.ipynb` again, where you should be able to understand more easily what the syntax actually means in terms of attributes, methods, classes and instances, especially for the `pandas` and `sklearn` parts.

Note that, in `numpy` and `pandas` for example, some methods that change the data or data structure of a class instance (an array or a DataFrame, respectively) are not "*in place*". This means that the methods return the modified object (e.g. `reshaped_array = some_np_array.reshape(2,3)`) instead of modifying the instance itself (as in `dog1.bathe()` above). This behaviour can be controlled in some `pandas` methods with the `inplace = True/False` argument (e.g. `df.drop("row_name", inplace=True)` actually removes a row from `df`, whereas `df2 = df.drop("row_name")` stores a copy of `df` without the specified row in `df2` and leaves `df` unchanged).
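
The same distinction already exists in base Python, which makes it easy to try without any module. The sketch below uses the built-in `sorted()` function (returns a new list) and the list method `.sort()` (works in place), as an analogy to the `numpy` and `pandas` behaviour described above, not as a demonstration of those libraries.

```python
values = [3, 1, 2]

sorted_copy = sorted(values)    # returns a NEW list, values is unchanged
print(values)                   # [3, 1, 2]
print(sorted_copy)              # [1, 2, 3]

return_value = values.sort()    # sorts IN PLACE and returns None
print(values)                   # [1, 2, 3]
print(return_value)             # None
```

A common mistake is to write `values = values.sort()`, which replaces the list with `None`.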