## Lesson 3 Overview

1. List comprehension
2. Functions
3. Importing data
4. Packages

## Let's load today's lesson!

### Open Azure Notebooks library 

Go to https://notebooks.azure.com -> Sign in if needed -> Select **python-intro**

### Update lesson file to latest version

Select **New** -> **From URL** -> input https://raw.githubusercontent.com/kelvnt/python-intro/master/Lesson3.ipynb (URL is available in **Lesson3.ipynb**) -> Click outside input then select **Upload** (overwrite if needed)

### Open Jupyter lab

From your browser's bookmark or **Run** -> Change browser URL path from **/nb/tree** to **/nb/lab**

Select **Lesson3.ipynb**

In Lesson 2, you learnt about Lists, how to create and manipulate them, and how to create and update them using Loops.


# List comprehensions

`Estimation: 10 minutes`

List comprehensions provide a concise way to create lists.

A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses. The expression can be anything, so the resulting list can hold any kind of object.

The result will be a new list resulting from evaluating the expressions in the context of the `for` and `if` clauses which follow it.

The list comprehension always returns a result list.

### Syntax

A list comprehension starts with a '[' and ends with a ']', which ensures that the end result is a list.

**newList = [expression(x) for x in oldList if filter(x)]**

**newList** is the new list result.

**expression(x)** is the expression based on the variable used for each element in the old list.

**for x in oldList** is a `for` loop.

**if filter(x)** is an optional if-statement that filters out unwanted results.
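
To tie the template to something concrete, here is a minimal sketch. The names `words`, `shortUpper` and `lengths` are made up for illustration and are not part of the lesson data.

```python
# Hypothetical data for illustration only
words = ["apple", "fig", "banana", "kiwi"]

# expression(x)    -> word.upper()
# for x in oldList -> for word in words
# if filter(x)     -> if len(word) <= 4
shortUpper = [word.upper() for word in words if len(word) <= 4]
print(shortUpper)  # ['FIG', 'KIWI']

# The if clause is optional; without it every element is kept
lengths = [len(word) for word in words]
print(lengths)  # [5, 3, 6, 4]
```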

Example 1: Create a list of squares for even numbers from 0 to 9

In [None]:
# Normal loop implementation
evenSquares = []
for x in range(10):
    if x % 2 == 0:
        evenSquares.append(x**2)
        
print(evenSquares)

In [None]:
# List comprehensions implementation
[x**2 for x in range(10) if x % 2 == 0]

Example 2: Create a list of tuples (x, y) containing every combination of two different numbers, where x and y each range from 0 to 4.

In [None]:
# Normal loop implementation
combinations = []
for x in range(5):
    for y in range(5):
        if (x != y):
            combinations.append((x, y))
            
print(combinations)

In [None]:
# List comprehensions implementation
combinations = [(x, y) for x in range(5) for y in range(5) if x != y]
print(combinations)

Example 3: Given a 2-dimensional list (like a table), flatten it (to a normal list)

In [None]:
table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Normal loop implementation
flatten = []
for row in table:
    for num in row:
        flatten.append(num)

print(flatten)

In [None]:
# List comprehensions implementation
[num for row in table for num in row]

Example 4: Given a 2-dimensional list (like a table), transpose rows and columns using Nested List Comprehensions

In [None]:
# Normal loop implementation
transpose = []
for num in range(len(table[0])):
    fields = []
    for row in table:
        fields.append(row[num])
    transpose.append(fields)

print(transpose)

In [None]:
# List comprehensions implementation
[[row[num] for row in table] for num in range(len(table[0]))]

Exercise: Create a list of tuples (x, y) where x ranges from 2 to 5 and y is every number between 1 and 25 that is divisible by x. `(5 minutes)`

In [None]:
# Normal loop implementation


# List comprehensions implementation


# Expected result:
# [(2, 2), (2, 4), (2, 6), (2, 8), (2, 10), (2, 12), (2, 14), (2, 16), 
#  (2, 18), (2, 20), (2, 22), (2, 24), (3, 3), (3, 6), (3, 9), (3, 12), 
#  (3, 15), (3, 18), (3, 21), (3, 24), (4, 4), (4, 8), (4, 12), (4, 16), 
#  (4, 20), (4, 24), (5, 5), (5, 10), (5, 15), (5, 20), (5, 25)]

# Sorting

`Estimation: 5 minutes`

### Ascending - from small to big, from low to high, from A to Z, from a to z

By default, sorting is done in ascending order.

**sorted(list)** will return a new sorted list, leaving the original list unaffected.

**list.sort()** will sort the list **in-place**. It changes the original list and always returns None, so do not assign its result to a variable.
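
A short sketch of the difference in return values, using made-up sample data:

```python
numbers = [3, 1, 2]  # made-up sample data

newList = sorted(numbers)   # returns a new sorted list
print(newList)              # [1, 2, 3]
print(numbers)              # [3, 1, 2] (original unchanged)

result = numbers.sort()     # sorts the list in place
print(result)               # None
print(numbers)              # [1, 2, 3]

# sorted() accepts any iterable, including an immutable tuple,
# and still returns a list
print(sorted((3, 1, 2)))    # [1, 2, 3]
```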

### Descending - in reverse of Ascending

Both sort() and sorted() accept an additional boolean parameter named `reverse`.

**sorted(list, reverse=True)**

**list.sort(reverse=True)**

Example 1: Given a list of numbers, create a new sorted list in ascending order and in-place descending order.

In [None]:
listOfNum = [56, 34, 65, 12, 88, 54, 99, 78]

# new ascending order list named ascending
ascending = sorted(listOfNum)

# in-place descending order
listOfNum.sort(reverse=True)

print("Ascending: ", ascending)
print("Descending: ", listOfNum)

Example 2: Given a list of letters, create a new sorted list in descending order and in-place ascending order.

In [None]:
listOfAlp = ['g', 'w', 's', 'a', 'k', 'e', 'q', 'd']

# new descending order list named descending
descending = sorted(listOfAlp, reverse=True)

# in-place ascending order
listOfAlp.sort()

print("Ascending: ", listOfAlp)
print("Descending: ", descending)

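Exercise: Flatten the given lists and sort the result in ascending order. Numbers and strings cannot be compared with each other, so convert every item to a string first. `(5 minutes)`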

In [None]:
# list given
listOfNum = [76, 23, 54, 68]
listOfFloat = [54.1, 53.9, 54.0]
listOfAlp = ['e', 'x', 'p']
listOfCapAlp = ['E', 'D', 'I', 'A']
combined = [listOfNum, listOfFloat, listOfAlp, listOfCapAlp]

# flatten array


# sort by ascending order


# Expected result:
# ['23', '53.9', '54', '54.0', '54.1', '68', '76', 'A', 'D', 'E', 'I', 'e', 'p', 'x']

# Slicing

`Estimation: 5 minutes`

Slicing is used to extract a part of a List, a Tuple or a String.

### Syntax

**list[start:end:step]**

**start** by default is 0

**end** by default is the end of the list, so the last item is included. When **end** is given, the item at that index is not included.

**step** by default is 1 
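
A small sketch of how the defaults behave, and of slicing a tuple and a string. The sample values are made up for illustration.

```python
letters = ['a', 'b', 'c', 'd', 'e']  # made-up sample data

# Omitting start, end and step uses the defaults
print(letters[:])                 # ['a', 'b', 'c', 'd', 'e']
print(letters[0:len(letters):1])  # same result, defaults written out

# The item at index end is not included
print(letters[1:3])               # ['b', 'c']

# Slicing also works on tuples and strings
print((10, 20, 30, 40)[1:3])      # (20, 30)
print("Python"[0:2])              # Py
```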

Example 1: Given a list, slice list with default step.

In [None]:
items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# all items in list
print("Initial list: ", items)

# first and last item in the array/list
print("First item in list: ", items[0])
print("Last item in list: ", items[-1])

# all items except the last two
print("All items except the last two in list: ", items[:-2])

# items from 3rd to 5th array/list
print("3rd to 5th items in list: ", items[2:5])

Example 2: Slice list with different step.

In [None]:
# all items in the array/list reversed
print("All reversed: ", items[::-1])

# first 3 items reversed
print("First 3 items reversed: ", items[2::-1])

# last 3 items reversed
print("Last 3 items reversed: ", items[:-4:-1])

# all items at even indexes (0, 2, 4, ...)
print("All items at even indexes: ", items[::2])


Exercise: Clean list and print "Expedia" `(5 minutes)`

In [None]:
listOfExpedia = ['(', '$', 'a', 'i', 'd', 'e', 'p', 'x', 'E', ')']

# hint:
# convert list to string by using: ''.join(list)



# Merging

`Estimation: 1 minute`

Merging adds every item of newList to the end of existingList. The existing list is changed in place.

### Syntax

**existingList.extend(newList)**

Example 1: Given 2 lists, merge them.

In [None]:
existingFruits = ["Apple", "Banana", "Mango"]
newFruits = ["Dragon Fruits", "Kiwi"]

existingFruits.extend(newFruits)
print("Merge new fruits into existing list: ", existingFruits)
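
For comparison, here is a brief sketch of `extend()` next to the `+` operator, which builds a new list instead of changing an existing one. The lists `a` and `b` are made-up sample data.

```python
a = [1, 2]  # made-up sample data
b = [3, 4]

merged = a + b        # builds a new list; a and b are unchanged
print(merged)         # [1, 2, 3, 4]
print(a)              # [1, 2]

result = a.extend(b)  # changes a in place
print(result)         # None
print(a)              # [1, 2, 3, 4]
```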

# Function

### What is a function?
In computer programming, a function (subroutine) is a sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. (Wikipedia)

A function is basically a group of commands for the computer, and it can take parameters that modify the behaviour of these commands.

General syntax to define a function:
```Python
def functionname( parameters ):
    operations inside the function # take note of the indentation, operations and return indented belong to this function
                                   # we can put any kind of command inside the function, e.g. variable assignment, conditional, loop, etc.
    return some_value # if some_value is missing, function returns None. If return statement is missing, function also returns None
```

Syntax to execute a function:
```Python
functionname( parameters ) # we need to substitute parameters with their real values
```
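
The note about `None` in the definition syntax can be checked with a small sketch. The functions `add`, `greet` and `bare_return` are hypothetical names used only for this illustration.

```python
def add(a, b):       # hypothetical function with a return value
    return a + b

def greet(name):     # hypothetical function with no return statement
    print("Hello,", name)

def bare_return():   # return with no value
    return

print(add(2, 3))       # 5
result = greet("Ana")  # prints: Hello, Ana
print(result)          # None
print(bare_return())   # None
```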

Here are a few examples:

In [None]:
# Function to return sum of 3 numbers
def sum_of_three(num1, num2, num3):  # we define the function here
    total = num1 + num2 + num3  # named total to avoid hiding Python's built-in sum()
    return total


sum_of_2_4_5 = sum_of_three(2, 4, 5)  # we execute the function here, replace the parameters with the actual values,
                                      # then we assign sum_of_2_4_5 to the value returned by sum_of_three function
print(sum_of_2_4_5)

print(sum_of_three(10, 20, 30))  # we can also print directly the value returned by sum_of_three function

In [None]:
# Parameters are optional
def hello():
    print('hello')

In [None]:
# We need to execute the function by calling it
def hello():
    print('hello')


hello()

Let's practise writing functions:

In [None]:
# define a cube function that takes one number as a parameter and returns the cube of that number

# Start here



# Don't touch the code below
print(cube(3))  # make this work without error and print 27

In [None]:
# define a print_lower_than_10000 function that takes a list and prints the numbers in the list that are smaller than 10000. Hint: a function can contain a loop

# Start here



# Don't touch the code below
daily_steps = [11980, 10437, 17616, 24586, 16136, 13700, 39812, 9195, 12855, 11309, 23606, 11848, 6120, 6254, 8754, 6469, 8849, 9911, 7709, 534, 13465, 7341, 11230, 7878, 11029, 8790, 9006, 21942]
print_lower_than_10000(daily_steps)  # make this work without error

## What is a CSV file?

A CSV is a comma separated values file which allows data to be saved in a table structured format. A CSV file is similar to an Excel spreadsheet, though it doesn't have the style formatting and has a .csv extension instead. Traditionally they take the form of a text file containing information separated by commas, hence the name.

### Content in CSV

Starting from this lesson, we will use data about the passengers of the Titanic.

In "data/titanic.csv", you'll find a dataset containing the passengers in the fateful sinking of the Titanic, along with some demographics and details, and whether or not they survived.

*** The following sample data is from the Titanic Dataset ***

     PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked

     1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S

     2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C


A CSV file is a table, so it helps to compare it with the Excel spreadsheets we use in our daily work. The first row holds the column names, and the details of each person start from the second row.

If you open the CSV file using a text editor such as Notepad, you'll see the content as above.

If you open the CSV file using Excel, the content will have the same format as a .xls/.xlsx file.

## How to list the content of a CSV file

To read and analyse the data, we use the Python csv library. CSV stands for comma separated values, where the comma is what is known as a "delimiter."

   *** Tips: Whatever libraries you need, you should import all of them at the beginning of the code.***
    
** Let's get started! **

### How to read a CSV file in Python -- `csv.reader()` function

    1. The first thing we need to do is to import the csv library, then we can use its functions in our program.
    2. Open the file by using its path
    3. To read a CSV file, we use csv.reader() function from the csv library
    4. Use the "for" statement we learnt in Lesson 2 to read through the data row by row from a CSV file
    5. Print each row


In [None]:
# Print CSV file
import csv
with open('data/titanic.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        print(row)

This is how we read a CSV file row by row in Python.

### How to print some specific data from a CSV file

When using the csv library in Python, a row is a variable of type List with the first index (0) being the first column, the second index (1) being the second column, and so on.
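
As a rough sketch of why the csv library is used instead of splitting each line on commas, the block below parses two lines copied from the sample above. It reads them from an in-memory string through `io.StringIO` (an assumption made so the example runs without the data file).

```python
import csv
import io

# Two lines copied from the sample above, held in memory instead of a file
sample = (
    'PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n'
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
)

rows = list(csv.reader(io.StringIO(sample)))
header, first = rows[0], rows[1]

print(header[3], "->", first[3])  # Name -> Braund, Mr. Owen Harris
print(len(first))                 # 12

# A plain split breaks the quoted name into two pieces
naive = sample.splitlines()[1].split(",")
print(len(naive))                 # 13
```

The quoted name contains a comma, and `csv.reader` keeps it as one cell, so column indexes stay aligned with the column names.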

The following code block will print the PassengerId, Survived, and Pclass values of each row.

In [None]:
# Print the first three columns of each row
import csv
with open('data/titanic.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        print(row[0], row[1], row[2])

#### `with` statement

In the code above, we use the keyword `with` when opening the CSV file. What is the `with` statement used for?

The `with` statement gives you cleaner syntax and safer handling of errors (called exceptions in Python) when working with resources such as files.

It simplifies error handling by wrapping the common preparation and clean-up tasks. In particular, it closes the file automatically when the block ends, even if an error occurs while the file is being read. This guarantees that the clean-up always runs.
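
The sketch below compares `with` to the manual clean-up it roughly replaces. It writes a throwaway file with `tempfile` first (an assumption made so the example does not depend on the lesson data).

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hello\n")

# Using with: the file is closed when the block ends
with open(path) as f:
    text = f.read()
print(f.closed)  # True

# Roughly the same thing written by hand
g = open(path)
try:
    text2 = g.read()
finally:
    g.close()    # runs even if read() raises an error
print(g.closed)  # True
```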

Here is how we print the first data row (the second row of the file) of the CSV data:

In [None]:
# Print the first data row (the row right after the column names)
import csv
with open('data/titanic.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row_index, row in enumerate(readCSV):
        if row_index == 1:
            print(row)
            break

As we already know, the index of a **list** starts from 0. The same is true in most, though not all, programming languages.

A CSV is a table (or a list of lists of cells), so the indexes of its columns and rows start from 0 as well.

In the code sample above, we use `if row_index == 1` because we want to print only the first row of data, and index 0 holds the column names.

# Packages

So far everything that we've been working with has been based on Python's rich standard library. However, with it being a popular language, there are many third-party packages written by the community to expand on its capabilities. Some popular packages include:
* pandas - wrangling and munging of tabular data
* NumPy - numerical operations
* scikit-learn - machine learning algorithms
* Matplotlib - plotting
* TensorFlow - machine learning and AI

It can be tedious to manually install the extra packages that one needs (a particular package can depend on another 10 packages that have to be installed first), hence the need for a standardized packaging system that makes packages easy to install and distribute: PyPI (the Python Package Index) or Anaconda. Feel free to read more about them in the links below.

[PyPI](https://pypi.org/help/)
[Anaconda](https://www.anaconda.com/download/)

For installation on your personal PC, I'd recommend Anaconda's distribution because it makes versions easier to maintain. At work however, due to admin rights restrictions, we may have to install Python on its own and then use pip to install packages from PyPI.

More details on packaging [here](http://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/introduction.html)

As you've seen above, modules and packages can be brought in or imported with the following statement:

* import module_name

In some cases, you might see the following syntax used to import certain functions in a module/package:

* from module_name import function_name

What is the difference between these 2?
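
One way to explore the question is to try both forms yourself. The sketch below uses the standard library's `math` module, chosen here only as an example.

```python
import math              # bring in the whole module
print(math.sqrt(16))     # 4.0, accessed through the module name

from math import sqrt    # bring in one name only
print(sqrt(16))          # 4.0, no prefix needed
```

Notice which name you type in each case, and think about what happens if your own code already defines a function called `sqrt`.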

## Homework
