## Lesson 3 Overview

1. List comprehension
2. Functions
3. Importing data
4. Packages

## Let's load today's lesson!

### Open Azure Notebooks library 

Go to https://notebooks.azure.com -> Sign in if needed -> Select **python-intro**

### Update lesson file to latest version

Select **New** -> **From URL** -> input https://raw.githubusercontent.com/kelvnt/python-intro/master/lessons/Lesson3.ipynb (URL is available in **Lesson3.ipynb**) -> Click outside input then select **Upload** (overwrite if needed)

### Open Jupyter lab

From your browser's bookmark or **Run** -> Change browser URL path from **/nb/tree** to **/nb/lab**

Select **Lesson3.ipynb**

In Lesson 2, you learnt about lists and how to create and manipulate them, and about loops and how to create and update lists using loops.


# List comprehensions

`Estimation: 10 minutes`

List comprehensions provide a concise way to create lists.

A list comprehension consists of brackets containing an expression followed by a `for` clause, then zero or more `for` or `if` clauses. The expression can be anything, meaning you can put all kinds of objects in lists.

The result will be a new list resulting from evaluating the expressions in the context of the `for` and `if` clauses which follow it.

The list comprehension always returns a result list.

### Syntax

A list comprehension starts with '[' and ends with ']', which ensures that the end result is a list.

**newList = [expression(x) for x in oldList if filter(x)]**

**newList** is the new list result.

**expression(x)** is the expression based on the variable used for each element in the old list.

**for x in oldList** is a `for` loop.

**if filter(x)** is an if-statement that filters out unwanted elements.
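
To make the mapping between the template and real code concrete, here is a minimal sketch. The list `words` and the name `short_upper` are made up for illustration and are not part of the lesson data.

```python
# Hypothetical example data, only for illustration
words = ["apple", "fig", "banana", "kiwi", "plum"]

# expression(x)      -> word.upper()
# for x in oldList   -> for word in words
# if filter(x)       -> if len(word) <= 4
short_upper = [word.upper() for word in words if len(word) <= 4]

print(short_upper)  # ['FIG', 'KIWI', 'PLUM']
```

Reading it left to right: "give me `word.upper()` for every `word` in `words`, but only if the word has at most 4 letters".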

Example 1: Create a list of squares for even numbers from 0 to 9

In [None]:
# Normal loop implementation
evenSquares = []
for x in range(10):
    if x % 2 == 0:
        evenSquares.append(x**2)
        
print(evenSquares)

In [None]:
# List comprehensions implementation
[x**2 for x in range(10) if x % 2 == 0]

Example 2: Create a list of tuples containing every pair (x, y) of different numbers, where x and y each range from 0 to 4.

In [None]:
# Normal loop implementation
combinations = []
for x in range(5):
    for y in range(5):
        if (x != y):
            combinations.append((x, y))
            
print(combinations)

In [None]:
# List comprehensions implementation
combinations = [(x, y) for x in range(5) for y in range(5) if x != y]
print(combinations)

Example 3: Given a 2-dimensional list (like a table), flatten it (to a normal list)

In [None]:
table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Normal loop implementation
flatten = []
for row in table:
    for num in row:
        flatten.append(num)

print(flatten)

In [None]:
# List comprehensions implementation
[num for row in table for num in row]

Exercise: Build a list of all tuples (x, y) where the first number x ranges from 2 to 5, the second number y ranges from 1 to 25, and y is divisible by x. `(5 minutes)`

In [None]:
# Normal loop implementation

# List comprehensions implementation

# Expected result:
# [(2, 2), (2, 4), (2, 6), (2, 8), (2, 10), (2, 12), (2, 14), (2, 16), 
#  (2, 18), (2, 20), (2, 22), (2, 24), (3, 3), (3, 6), (3, 9), (3, 12), 
#  (3, 15), (3, 18), (3, 21), (3, 24), (4, 4), (4, 8), (4, 12), (4, 16), 
#  (4, 20), (4, 24), (5, 5), (5, 10), (5, 15), (5, 20), (5, 25)]

In [None]:
for i in range(10):
    print(i)
    
print(i)

# Sorting

`Estimation: 5 minutes`

### Ascending - from small to big, from low to high, from A to Z, from a to z

By default, sorting is done in ascending order.

**sorted(list)** will return a new sorted list, leaving the original list unaffected.

**list.sort()** will sort the list **in-place**. It changes the original list and returns None, so do not assign its result to a variable.

### Descending - in reverse of Ascending

Both sort() and sorted() accept an additional boolean parameter, `reverse`.

**sorted(list, reverse=True)**

**list.sort(reverse=True)**
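
A common slip is to write `my_list = my_list.sort()`, which replaces the list with None. The sketch below contrasts the two calls; the list `scores` is made up for illustration.

```python
# Hypothetical example data
scores = [3, 1, 2]

new_list = sorted(scores)   # returns a new list; scores is unchanged
print(new_list)             # [1, 2, 3]
print(scores)               # [3, 1, 2]

result = scores.sort()      # sorts scores in place and returns None
print(result)               # None
print(scores)               # [1, 2, 3]

# sorted() also accepts immutable sequences such as tuples
print(sorted((5, 4, 6), reverse=True))  # [6, 5, 4]
```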

Example 1: Given a list of numbers, create a new list sorted in ascending order, then sort the original list in-place in descending order.

In [None]:
listOfNum = [56, 34, 65, 12, 88, 54, 99, 78]

# new ascending order list named ascending
ascending = sorted(listOfNum)
print(ascending)
print(listOfNum)

# in-place descending order
listOfNum.sort(reverse=True)

print("Ascending: ", ascending)
print("Descending: ", listOfNum)

Example 2: Given a list of letters, create a new list sorted in descending order, then sort the original list in-place in ascending order.

In [None]:
listOfAlp = ['g', 'w', 's', 'a', 'k', 'e', 'q', 'd']

# new descending order list named descending
descending = sorted(listOfAlp, reverse=True)

# in-place ascending order
listOfAlp.sort()

print("Ascending: ", listOfAlp)
print("Descending: ", descending)

Exercise: Flatten the given list `(5 minutes)`

In [None]:
# list given
listOfNum = [76, 23, 54, 68]
listOfFloat = [54.1, 53.9, 54.0]
listOfAlp = ['c', 'g', 'h']
listOfCapAlp = ['C', 'A', 'R', 'E']
combined = [listOfNum, listOfFloat, listOfAlp, listOfCapAlp]

print(combined)
# flatten array

flatten = [item for nested_list in combined for item in nested_list]
print(flatten)

# Slicing

`Estimation: 5 minutes`

Slicing is used to extract a part of a List, a Tuple or a String.

### Syntax

**list[start:end:step]**

**start** by default is 0

**end** by default is the length of the list, so the slice runs through the last item (the item at index **end** itself is never included)

**step** by default is 1 
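
The examples in this section slice lists, but the same syntax works on strings and tuples. Here is a small sketch of the defaults; the values `letters` and `point` are made up for illustration.

```python
# Hypothetical example values
letters = "abcdefg"
point = (10, 20, 30, 40)

# Omitted start, end or step fall back to their defaults
print(letters[:3])    # abc   (start defaults to 0)
print(letters[3:])    # defg  (end defaults to the length)
print(letters[::2])   # aceg  (every second character)
print(point[1:3])     # (20, 30)

# The end index is exclusive, so 0:len(...) covers everything
print(letters[0:len(letters)] == letters)  # True
```

Note that slicing a string gives a string and slicing a tuple gives a tuple.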

Example 1: Given a list, slice list with default step.

In [None]:
items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# all items in list
print("Initial list: ", items)

# first and last item in the array/list
print("First item in list: ", items[0])
print("Last item in list: ", items[-1])

# all item except the last two items
print("All item except last two items in list: ", items[:-2])

# items from 3rd to 5th array/list
print("3rd to 5th items in list: ", items[2:5])

Example 2: Slice list with different step.

In [None]:
# all items in the array/list reversed
print("All reversed: ", items[::-1])

# first 3 items reversed
print("First 3 items reversed: ", items[2::-1])

# last 3 items reversed
print("Last 3 items reversed: ", items[:-4:-1])

# all items in even position
print("All items in even position: ", items[::2])


Exercise: Reverse the list and print "changi" `(5 minutes)`

In [None]:
my_list = ['i', 'g', 'n', 'a', 'h', 'c']

# hint:
# convert list to string by using: ''.join(list)
print(''.join(my_list[::-1]))

# Merging

`Estimation: 1 minute`

Merging newList into existingList.

### Syntax

**existingList.extend(newList)**

Example 1: Given 2 lists, merge them.

In [None]:
existingFruits = ["Apple", "Banana", "Mango"]
newFruits = ["Dragon Fruits", "Kiwi"]

existingFruits.extend(newFruits)
print("Merge new fruits into existing list: ", existingFruits)
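
It can help to compare `extend` with two look-alikes: the `+` operator and `append`. The sketch below uses made-up lists (`fruits`, `more`, `nested`) purely for illustration.

```python
# Hypothetical example data
fruits = ["Apple", "Banana"]
more = ["Kiwi", "Mango"]

# + builds a new list and leaves both originals unchanged
combined = fruits + more
print(combined)  # ['Apple', 'Banana', 'Kiwi', 'Mango']
print(fruits)    # ['Apple', 'Banana']

# append adds the whole list as ONE item, creating a nested list
nested = ["Apple", "Banana"]
nested.append(more)
print(nested)       # ['Apple', 'Banana', ['Kiwi', 'Mango']]
print(len(nested))  # 3
```

Use `extend` when you want to change the existing list, and `+` when you want to keep the originals as they are.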

# Function

### What is a function?
In computer programming, a function (subroutine) is a sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. (Wikipedia)

A function is basically a group of commands to the computer, and it can come with parameters to modify the behaviour of these commands.

General syntax to define a function:
```Python
def functionname( parameters ):
    operations inside the function # take note of the indentation, operations and return indented belong to this function
                                   # we can put any kind of command inside the function, e.g. variable assignment, conditional, loop, etc.
    return some_value # if some_value is missing, function returns None. If return statement is missing, function also returns None
```

Syntax to execute a function:
```Python
functionname( parameters ) # we need to substitute the parameters with their real values
```

Here are a few examples:

In [None]:
# Function to return sum of 3 numbers
def sum_of_three(num1, num2, num3):  # we define the function here
    total = num1 + num2 + num3  # named total to avoid hiding Python's built-in sum()
    return total  # hand the result back to the caller


sum_of_2_4_5 = sum_of_three(2, 4, 5)  # we execute the function here, replacing the parameters with the actual values,
                                      # then we assign the value returned by sum_of_three to sum_of_2_4_5
print(sum_of_2_4_5)

print(sum_of_three(10, 20, 30))  # we can also print the value returned by sum_of_three directly

In [None]:
# Parameters are optional
def hello():
    print('hello')

In [None]:
# We need to execute the function by calling it
def hello():
    print('hello')


hello()
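
Printing inside a function is not the same as returning a value. The sketch below shows the difference using two made-up functions, `greet` and `make_greeting`, which are not part of the lesson files.

```python
def greet(name):
    print("hello", name)       # prints, but has no return statement


def make_greeting(name):
    return "hello " + name     # gives the text back to the caller


a = greet("Ann")               # prints: hello Ann
print(a)                       # None, because greet has no return

b = make_greeting("Ann")       # prints nothing by itself
print(b)                       # hello Ann
```

If you want to reuse a result later (store it, add to it, pass it on), the function needs to `return` it.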

Let's practise with functions:

In [None]:
# define a cube function that takes one number as a parameter and returns the cube of that number

# Start here
def cube(x):
    return x**3

def cube_many(list_):
    output = []
    for i in list_:
        output.append(i**3)
    return output
    

# Don't touch the code below
print(cube(3))  # make this work without error and print 27
cube_many([1,5,123,34,23])

## What is a CSV file?

A CSV is a comma separated values file which allows data to be saved in a table structured format. A CSV file is similar to an Excel spreadsheet, though it doesn't have the style formatting and has a .csv extension instead. Traditionally they take the form of a text file containing information separated by commas, hence the name.

### Content in CSV

Starting from this lesson, we will use the data from the passengers in Titanic.

In "/data/titanic.csv", you'll find a dataset containing all the passengers in the fateful sinking of Titanic, along with some demographics and details, and whether or not they survived.

*** The following sample data is from the Titanic Dataset ***

     PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked

     1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S

     2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C


A CSV file is a table, much like the Excel spreadsheets we use in our daily work. The first row holds the column names, and the information for each person starts from the second row.

If you open the CSV file using a text Editor such as Notepad, you'll see the content as above.

If you open the CSV file using Excel, the content will have the same format as a .xls/.xlsx file.

## How to list the content of a CSV file

To read and analyse the data, we use Python's csv library. CSV stands for comma-separated values; the comma is what is known as a "delimiter."

   *** Tip: Whatever libraries you need, you should import them all at the beginning of the code. ***
    
** Let's get started! **

### How to read a CSV file in Python -- `csv.reader()` function

    1. The first thing we need to do is to import the csv library, then we can use its functions in our program.
    2. Open the file by using its path
    3. To read a CSV file, we use csv.reader() function from the csv library
    4. Use the "for" statement we learnt in Lesson 2 to read through the data row by row from a CSV file
    5. Print each row


In [None]:
# Print CSV file
import csv
with open('../data/titanic.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        print(row)

This is how we read a CSV file row by row in Python.
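
One reason to use `csv.reader` rather than splitting each line on commas yourself is that some values contain commas inside quotes, such as the passenger names. The sketch below uses one line copied from the sample data; `io.StringIO` is only a stand-in for a real file so that the example runs on its own.

```python
import csv
import io

# One line copied from the Titanic sample shown earlier
line = '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S'

# Naive approach: split on every comma, including the one inside the name
naive = line.split(',')

# csv.reader understands the quotes and keeps the name in one piece
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 13
print(len(parsed))  # 12
print(parsed[3])    # Braund, Mr. Owen Harris
```

The file has 12 columns, so only the `csv.reader` result lines up with the header row.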

### How to print some specific data from a CSV file

When using the csv library in Python, a row is a variable of type List with the first index (0) being the first column, the second index (1) being the second column, and so on.

The following code block will print the first three columns of each row: PassengerId, Survived, and Pclass.

In [None]:
# Print the first three columns of each row
import csv
with open('../data/titanic.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        print(row[0], row[1], row[2])

#### `with` statement

From the code above, we use a keyword `with` when opening the CSV file. Then what's `with`? And what's the `with` statement used for?

Using the `with` statement gives you cleaner syntax and simpler error handling (also called exception handling).

The `with` statement wraps common preparation and clean-up tasks. Here, it automatically closes the file when the block ends, even if an error occurs while the file is being read. In general, `with` is a way of ensuring that clean-up always runs.
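
To see the automatic clean-up in action, here is a minimal sketch. It writes a small throwaway file with the standard `tempfile` module so that it runs on its own; the file name `demo.txt` is made up for illustration.

```python
import os
import tempfile

# Create a throwaway file path so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("hello")
print(f.closed)   # True: the file was closed when the block ended

# Roughly what `with` does for us, written out by hand
g = open(path)
try:
    content = g.read()
finally:
    g.close()     # runs even if read() raised an error
print(content)    # hello
print(g.closed)   # True
```

The `with` version is shorter and makes it hard to forget the `close()` call.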

Here is how we print the first row of data (the second line of the CSV file, right after the column names):

In [None]:
# Print the first row of data (index 1; index 0 holds the column names)
import csv
with open('../data/titanic.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row_index, row in enumerate(readCSV):
        if row_index == 1:
            print(row)
            break

As we already know, the index of a **list** starts from 0, as it does in many other programming languages.

A CSV file is a table (or a list of lists of cells), so the row and column indexes start from 0 as well.

In the code sample above, we use `if row_index == 1` because we want to print only the first row of data; index 0 holds the column names.

#### But being able to do this isn't very useful yet, how can we store this data with the data types that we have learnt?

Can we save it into a nested list? After which, how can we manipulate it?

In [None]:
#how can we save the data?
df = []

with open('../data/titanic.csv') as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        df.append(row)
print(df)

In [None]:
# To access a row
print(df[3])

# To access a particular cell, nested list
print(df[3][3])

# To access a column
for i in range(1, len(df)):  # start at 1 to skip the header row
    print(df[i][9])
    
# To assign a new column that takes the passenger's fare and multiplies it by 2
df[0].append("fare_x2")
for i in range(1, len(df)):  # start at 1 to skip the header row
    df[i].append(float(df[i][9]) * 2)

print(df[0:2])

#### As you can see, it gets increasingly cumbersome to perform simple manipulation and calculation in such a data structure. Nested lists do not seem like a great way to store data. What alternatives do we have?

# Packages

So far everything that we've been working with has been based on Python's rich standard library. However, as Python is a popular language, there are many third-party packages written by the community to expand on its capabilities. Some popular packages include:
* pandas - wrangling and munging of tabular data
* NumPy - numerical operations
* scikit-learn - machine learning algorithms
* Matplotlib - plotting
* TensorFlow - machine learning and AI

It can be tedious to manually install the extra packages that one needs (a particular package can depend on another 10 packages that have to be installed first), hence the need for a standardized packaging system that makes packages easy to install and distribute: PyPI (the Python Package Index) or Anaconda. Feel free to read more about them in the links below.

[PyPI](https://pypi.org/help/)
[Anaconda](https://www.anaconda.com/download/)

For installation on your personal PC, I'd recommend Anaconda's distribution, as it makes maintaining versions easier. At work, however, due to admin rights restrictions, we may have to install Python on its own and then use pip to install packages from PyPI.

More details on packaging [here](http://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/introduction.html)

### As you've seen above / previous lessons, modules and packages can be brought in or imported with the following statement:

```python
import module_name
```

In some cases, you might see the following syntax used to import certain functions in a module/package:

```python
from module_name import function_name
```

What is the difference between these 2?
Run the lines below and think..

In [None]:
import math
math.sqrt(12)

In [None]:
sqrt(12)

In [None]:
math.log10(0.5)

We can also provide an alias to refer to the module_name by

In [None]:
import math as xyz
xyz.sqrt(12)
xyz.log10(0.5)

In [None]:
from math import sqrt
sqrt(12)

In [None]:
log10(0.5)

When we do an **import module_name**, we bring in all the functions from the module, and they are then called specifically by:

`module_name.function_name()`

However, when we do a **from module_name import function_name**, only the specified function_name is imported into your environment and can be called directly by the function name:

`function_name()`

### When do we use one over the other? Should we prefer one method?

_This is usually up to the individual's preference.._

But I would highly recommend you to use **import module_name** and call functions via `module_name.function_name()` as it gives you better awareness of where that function is coming from. _In an unlikely case where 2 different modules have the same function name, using the 2nd method may cause some confusion over which one you're using.._ There are of course [more factors to consider](https://stackoverflow.com/questions/710551/use-import-module-or-from-module-import) but I'll leave you with that for now.
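
Name clashes are not purely hypothetical: the standard library modules `math` and `cmath` both define `sqrt`. The sketch below shows how the second `from ... import` silently replaces the first, while the `module_name.function_name()` style stays unambiguous.

```python
import math
import cmath

from math import sqrt
from cmath import sqrt   # silently replaces the math version of sqrt

print(sqrt(4))           # (2+0j)  -> this is now cmath's sqrt
print(math.sqrt(4))      # 2.0     -> clearly the math version
print(cmath.sqrt(4))     # (2+0j)  -> clearly the cmath version
```

No error or warning is raised when the name is replaced, which is why the clash can be hard to spot.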

#### With that out of the way, let's look at how to import and work with data using Pandas!

Pandas provides a function called `read_csv` that takes the path of the CSV file as an argument.
If you want to import data from an Excel file, simply use the `read_excel` function in pandas instead.

More importantly, pandas introduces a new data type called a data frame that is great for storing tabular data.

In [None]:
import pandas as pd

df = pd.read_csv("../data/titanic.csv")

In [None]:
# Look at the first 5 rows of data
print(df.head())

# Access a particular column
print(df['Age'].head())

# Assign a new column 
df['fare_x2'] = df['Fare']*2
print(df['fare_x2'].head())

#### This is so much easier right?

I'll stop here for now, and show you how we can access and work with a pandas DataFrame in the next lesson.

Quick Recap:
You've figured out how to -
* Use list comprehensions
* Write Python functions
* Import data from CSV files
* Work with packages in Python

See you in our next lesson!

## Homework

WHAT'S A LESSON WITHOUT HW??????????????????

#### List Comprehensions

Q1: Find all of the numbers from 1-1000 that are divisible by 7

Q2: Count the number of spaces in the string below

Q3: Find all of the words in the string below that have fewer than 4 letters

Q4: Remove all of the vowels in the string below

In [None]:
#Q1
div_7 = [x for x in range(1, 1001) if x % 7 ==0]
print(div_7)

#Q2
sample_text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore"

#1st method, without list comprehensions
print(sample_text.count(' '))

#2nd method, with list comprehensions
results = [character for character in sample_text if character == ' ']
print(len(results))

#Q3
results = [word for word in sample_text.split() if len(word) < 4]
print(results)

#Q4
vowels = ['a','e','i','o','u']
results = [letter for letter in sample_text if letter.lower() not in vowels]
print(''.join(results))  # join the remaining letters back into a string

#### Functions

Q1: Write a function that takes in a path to a csv file, reads it in, prints out the first 5 rows of the data (`hint: dataframe.head()`), and asks the user whether or not it has been imported correctly.

Q2: Write a Python function to multiply all the numbers in a list

Q3: Write a Python program to reverse a string (eg. "abcdefg" to "gfedcba")

Q4*: Write a Python function to calculate the factorial of a number (a non-negative integer). The function accepts the number as an argument. 

In [None]:
#Q1
import pandas as pd

def import_check(path):
    df = pd.read_csv(path)
    print(df.head())
    check = input("Is the file imported correctly? (Y/N)")
    
    if check == "Y":
        print("File imported correctly.")
    else:
        print("Please check if the file path is correct or file was extracted correctly.")
    
    return df

df_titanic = import_check("../data/titanic.csv")

#Q2
def multiply_list(list_num):
    result = 1
    for num in list_num:
        result *= num # same as result = result * num
    return result

product = multiply_list([23,12,2,5,6])
print(product)

#Q3
def rev_string(string):
    return string[::-1]

result = rev_string(".emosewa si HGC")
print(result)

#Q4
def factorial(num):
    result = 1
    while num >= 1:
        result = result * num
        num = num - 1
    return result

result = factorial(4)
print(result)