<img align="center" src="http://sydney.edu.au/images/content/about/logo-mono.jpg">
<h1 align="center" style="margin-top:10px">Statistical Learning and Data Mining (QBUS3820)</h1>
<h2 align="center" style="margin-top:10px">Tutorial 1: Introduction to Python</h2>
<br>
The goal of this series of 13 tutorials is to teach you how to program rapidly, efficiently and correctly using Python 3.

Python is a free, portable, high-level interpreted language. It is one of the most popular languages used in industry and government.
It is the preferred programming language for many machine learning (ML) practitioners, that is, those who need to apply statistical techniques or data analysis at work. Its combination of consistent syntax, short development time and flexibility makes it well suited to developing sophisticated models and prediction engines that can plug directly into production systems.

Python is a dynamically typed language, which means that the type of an object is determined by the Python interpreter at runtime. In comparison, compiled languages like C/C++ are generally statically typed: the type of each variable has to be declared before compile time. As a high-level interpreted language, Python is generally not as fast as C++. However, it is much easier to learn, which makes it a good choice for anyone learning to program for the first time.

We use a web application, the Jupyter Notebook (formerly the IPython Notebook), to edit and run our code in a web browser. We will create documents that contain live code, text, equations and visualisations.

In Tutorial 1 we will learn about basic arithmetic, modules (libraries), data types, data structures, iteration, functions, logic and conditionals, and looping.


We will be using Python 3 (3.0 or newer) in our tutorials. The syntax of Python 3 is slightly different from that of its predecessor, Python 2. Instructions for setting up Python on your personal computer by installing the Anaconda distribution are provided here: http://www.marcelscharth.com/python/#setting-up-python

To write Python code, we will be using the **Jupyter Notebook** interface. To run a cell you can press Ctrl+Enter or click the Play button at the top.


## 1. The Basics

### 1.1. Getting Started


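To get started, you can use your notebook as a calculator. First we need to become familiar with some essential operators in Python, namely ``+``, ``-``, ``*``, ``/``, ``%``, ``//`` and ``**``.
These perform addition, subtraction, multiplication, division, modulus (the remainder after division), floor division (division rounded down to a whole number) and exponentiation, respectively.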

The following statement assigns a value to the variable x. Because the variable does not yet exist, the assignment statement creates the variable.

In [21]:
x = 5
x + 2

7

In [22]:
5//2

2

In [23]:
8%5

3

In [4]:
6%2  # there is no remainder if 6 is divided by 2

0
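
As a quick recap, the sketch below applies each of the seven operators to the same pair of numbers. The variable names ``a`` and ``b`` are chosen purely for illustration.

```python
a = 7
b = 2

print(a + b)   # addition: 9
print(a - b)   # subtraction: 5
print(a * b)   # multiplication: 14
print(a / b)   # division (always gives a float): 3.5
print(a % b)   # modulus, the remainder of 7 divided by 2: 1
print(a // b)  # floor division, rounded down to a whole number: 3
print(a ** b)  # exponentiation, 7 squared: 49
```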

We can always use the built-in function print() to display our output.

In [20]:
print('7%5 is',7%5)

7%5 is 2


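For a single value, print() seems to make no difference compared with evaluating the expression directly. It becomes very useful when we want to display a mix of numbers and strings, as in the cell above, or plain text: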

In [21]:
print('5 is my favorite number')

5 is my favorite number


### 1.2. Modules
Here we will look at some essential libraries for Machine Learning and Data Science.

Since you have already installed the scientific Python distribution Anaconda, plenty of libraries are already installed. We just need to import a library based on our needs. The main ones are:

* NumPy - scientific computing
* Pandas - data pre-processing
* Matplotlib - plotting
* Seaborn - to make plots elegant
* StatsModels - statistics
* Scikit-learn - data mining and statistical/machine learning

One of Python’s greatest assets is its extensive set of libraries.
Libraries are sets of routines and functions that are written in a given language. A robust set of libraries can make it easier for developers to perform complex tasks without rewriting many lines of code.
Machine learning is largely based upon mathematics, specifically mathematical optimization, statistics and probability. Python libraries help researchers and mathematicians with less software development experience to easily “do machine learning”.

NumPy is an extensive library for data storage and calculations. It extends Python with data structures and algorithms for efficient computation on multi-dimensional arrays and matrices.

Pandas is the Python Data Analysis Library. It gives us high-performance data structures and tools for reshaping, merging, labelling, tabulating and indexing data, handling missing data, and so on.

Matplotlib is used to visualise data. If you want more polished statistical plots with less code, consider Seaborn, which is built on top of Matplotlib.

There are two ways to import a library: either by loading the entire package (or a subset of it), or by loading a specific function from it.
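
The two import styles can be sketched as follows. The example uses the standard library module ``math`` so that it runs anywhere; the alias ``m`` is an arbitrary choice for illustration. The same pattern applies to the libraries listed above, for instance ``import numpy as np``.

```python
# Option 1: load the entire package, optionally under a shorter alias
import math as m
print(m.sqrt(16))   # 4.0

# Option 2: load only a specific function from the package
from math import sqrt
print(sqrt(16))     # 4.0
```

With the first style every function is accessed through the package name (or its alias), which makes it clear where the function comes from. The second style is shorter to type but only makes the named function available.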

### 1.3. Data structures

Generally, data structures can be divided into two categories: primitive and non-primitive data structures. The former are the simplest forms of representing data, whereas the latter are more advanced: they contain the primitive data structures within more complex data structures for special purposes.

<img src="trees.png">

Today we will primarily consider the primitive variable types: integers, floats, strings and Booleans.

#### Boolean ####

This built-in data type can take the values True and False, which are often interchangeable with the integers 1 and 0. Booleans are useful in conditional and comparison expressions, as in the following examples:

In [24]:
x = 2 > 0
print(x)

True


In [4]:
type(x)

bool

In [5]:
x = 4
y = 2
print(x == y)

False


In expressions involving numbers, a <TT>False</TT> is automatically converted to zero and a <TT>True</TT> is converted to one.  For example:

In [6]:
x = 2 > 0    # x is True
y = 2*x      # True is converted to 1, so y = 2*1
print(y)

2


#### Numerical

The basic numerical data types are integers and floats. Integers ($\mathbb{Z}$) are whole numbers, which may be negative, zero or positive. Floating point numbers are numbers written with a decimal point; they represent real numbers to finite precision.

In [7]:
x = 2
type(x)

int

In [25]:
x = 2.0
type(x)

float
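
The short sketch below shows how the two numerical types interact. The variable names are chosen for illustration only.

```python
a = 5 // 2      # floor division of two integers gives an integer
b = 5 / 2       # true division always gives a float
c = 2 + 1.5     # mixing an int and a float gives a float

print(a, type(a))   # 2 <class 'int'>
print(b, type(b))   # 2.5 <class 'float'>
print(c, type(c))   # 3.5 <class 'float'>

# Converting between the two types
print(int(2.9))     # 2 (the decimal part is dropped, not rounded)
print(float(2))     # 2.0
```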

#### String

Strings are sequences of letters, digits or other characters. In Python, you can create strings by enclosing a sequence of characters within a pair of single or double quotes. For example: 'coffee', "cookie", etc.

In [9]:
sentence = 'For truth is always strange; stranger than fiction.' 
type(sentence)

str

You can also concatenate two strings using the '+' operator.

In [27]:
x = 'coffee'
y = 'cookie'
print(x + '&' + y)

coffee&cookie


**Non-primitive** types are the sophisticated members of the data structure family. They don't just store a value, but rather a collection of values in various formats.
* Arrays
* Lists
* Files

**Arrays** in Python are a compact way of collecting basic data types; all the entries in an array must be of the same data type. However, Python's built-in arrays are not as widely used as arrays in other programming languages such as C++ or Java, so there is no need to look at them for the moment.

A **list** in Python is a sequence of values. The individual elements (items) of a list can be of any type, even another list.



In [14]:
x =[1, 'Science', True]
type(x)

list

In [15]:
lunch = ['pad thai','souvlaki', 'pizza']

print(lunch)

['pad thai', 'souvlaki', 'pizza']


In [18]:
colors = ["red", "green", "blue", "purple"]
for i in colors:
    print(i)

red
green
blue
purple
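
Because list items can be of any type, a loop over a mixed list visits values of different types. The list ``mixed`` below is a made-up example that also contains another list as its last item.

```python
mixed = [1, 'Science', True, ['pad thai', 'pizza']]

# Print each item together with its type
for item in mixed:
    print(item, type(item))

print(len(mixed))   # 4: the inner list counts as a single element
```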


In [28]:
for i in range(0,12,2):
    print(i)

0
2
4
6
8
10


In [21]:
range(5)  # Represents 0, 1, 2, 3, 4. The general form is range(start, stop, step)

range(0, 5)

In [24]:
print(range(2,8,2))

range(2, 8, 2)
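
Printing a ``range`` object shows only its definition, not its values. One way to inspect the values, sketched below, is to convert the range to a list with the built-in ``list()`` function.

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 8, 2)))   # [2, 4, 6]
print(list(range(0, 12, 2)))  # [0, 2, 4, 6, 8, 10]
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1] (a negative step counts down)
```

In every case the stop value itself is excluded.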


Here is a more complicated example using the NumPy function arange().

In [5]:
import numpy as np
numbers = np.arange(0,12, 2)
print(numbers)

[ 0  2  4  6  8 10]


Essentially, the code creates the following sequence of values, stored as a NumPy array: 0, 2, 4, 6, 8, 10.

Let's take a step back and analyse how this worked. The output consists of values starting from 0 and incrementing in steps of 2: 0, 2, 4, 6, 8, 10. The sequence stops at 10. Why? We set the stop parameter to 12. Remember, though, that np.arange() creates a sequence up to but excluding the stop value. Once arange() gets to 10, it cannot go any further: incrementing by the step value of 2 would give 12, which must be excluded because stop = 12.

Alternatively, we can loop over the array to print the same values one by one and accumulate their total.

In [6]:
total = 0
for number in numbers:
    print(number)
    total += number  # adds the number to the total, equivalent to total = total + number

print('Total =',total)   

0
2
4
6
8
10
Total = 30
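
For comparison, the same total can be obtained without an explicit loop. The sketch below uses a plain Python list and the built-in ``sum()`` function rather than a NumPy array, so that it runs without any extra library.

```python
numbers = list(range(0, 12, 2))   # [0, 2, 4, 6, 8, 10]
total = sum(numbers)              # adds up all items in the list
print('Total =', total)           # Total = 30
```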


A dictionary maps keys to values, which is useful when pairing strings with numbers. For example, here we loop over a dictionary to extract the age of each person.

In [7]:
d = {'Ann': 33, 'David': 25, 'Clio': 40} 
for i in d:
    print(i, 'corresponds to', d[i])

Ann corresponds to 33
David corresponds to 25
Clio corresponds to 40
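
Another way to write this loop, sketched below, is to use the dictionary's ``items()`` method, which yields each key together with its value. The names ``name``, ``age`` and ``oldest`` are chosen for illustration.

```python
d = {'Ann': 33, 'David': 25, 'Clio': 40}

# items() gives (key, value) pairs, unpacked here into two variables
for name, age in d.items():
    print(name, 'is', age, 'years old')

# Find the key with the largest value
oldest = max(d, key=d.get)
print('Oldest:', oldest)   # Oldest: Clio
```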


### 1.4. Defining functions

Functions are an essential part of any programming language. You might have already encountered and used some of the built-in Python functions. Defining functions makes the coding process more efficient, prevents errors, and improves readability. As a data scientist or business analyst, you will constantly need to write your own functions to solve the problems that your data poses.

Why do we need functions? A function is a piece of code written to carry out a specified task. You use functions in programming to bundle a set of instructions that you need to use repeatedly.

Let us now learn how to define a function. Here are the key steps.
1. Use the keyword ``def`` to declare the function and follow this up with the function name.
2. Add parameters to the function: they should be within the parentheses of the function. End your line with a colon.
3. Add statements that the functions should execute (optional).
4. End your function with a return statement if the function should output something. Without the return statement, your function will return an object ``None``.

In [28]:
import math

In [29]:
def area(radius):
    return math.pi*radius**2

In [31]:
area(2)

12.566370614359172
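
Step 4 above notes that a function without a return statement returns ``None``. The sketch below contrasts two hypothetical functions, ``greet()`` and ``make_greeting()``, to show the difference between printing a value and returning it.

```python
def greet(name):
    # This function prints a message but has no return statement
    print('Hello,', name)

def make_greeting(name):
    # This function returns the message instead of printing it
    return 'Hello, ' + name

result = greet('Ann')            # prints: Hello, Ann
print(result)                    # None

message = make_greeting('Ann')   # nothing is printed here
print(message)                   # Hello, Ann
```

A returned value can be stored in a variable and reused later, whereas a printed value is only displayed.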


There is a special type of function called a <TT>method</TT>, which is a function that is part of a class. We access it through an instance (object) of the class. (Note: a function does not have this restriction; it is simply a standalone piece of code. This means that all methods are functions, but not all functions are methods.)

Consider the following example, where you first define a function ``subtract()`` and then a ``Difference`` class with a ``diff()`` method:

In [35]:
# Define a function subtract()
def subtract(a, b):
    return a - b

In [36]:
subtract(2, 3)

-1

In [37]:
# Create a class called Difference
# Note: self is the first parameter of every method; it refers to the
# instance the method is called on. self.contents is an instance attribute.
class Difference(object):
    def diff(self, a, b):
        self.contents = a - b
        return self.contents

If you now want to call the ``diff`` method, which is part of the ``Difference`` class, you first need to define an instance (object) of that class. So, let us define such an object.

In [38]:
# To call diff(), instantiate Difference class
diffInstance = Difference()
diffInstance.diff(2,3)

-1
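
You have in fact already been working with objects that have methods: strings and lists are instances of the built-in classes ``str`` and ``list``. The sketch below contrasts a standalone function, ``len()``, with two methods, ``upper()`` and ``append()``.

```python
word = 'coffee'

# len() is a standalone function: the object is passed as an argument
print(len(word))        # 6

# upper() is a method of the str class: it is called on the object
print(word.upper())     # COFFEE

lunch = ['pad thai', 'souvlaki']
lunch.append('pizza')   # append() is a method of the list class
print(lunch)            # ['pad thai', 'souvlaki', 'pizza']
```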

#### Conditionals: IF, ELIF, ELSE

In [1]:
score = 50
if (score >= 50):
    print("PASS")

PASS


In [2]:
score = 50
if(score >= 50):
    print("PASS")
else:
    print("FAIL")

PASS


In [33]:
score = -1
if score < 50 and score >= 0:
    print("FAIL")
elif score < 65 and score >= 50:
    print("PASS")
elif score < 75 and score >= 65:
    print("CREDIT")
elif score < 85 and score >= 75:
    print("DISTINCTION")
elif score <= 100 and score >= 85:
    print("HIGH DISTINCTION")
else:
    print("INVALID")

INVALID
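
The grading logic above can also be wrapped in a function so that it is easy to reuse for many scores. This is only a sketch: the function name ``grade`` is hypothetical, and it uses Python's chained comparisons as a shorter way of writing the same conditions.

```python
def grade(score):
    # A chained comparison such as 0 <= score < 50 is equivalent to
    # (score >= 0 and score < 50)
    if 0 <= score < 50:
        return "FAIL"
    elif 50 <= score < 65:
        return "PASS"
    elif 65 <= score < 75:
        return "CREDIT"
    elif 75 <= score < 85:
        return "DISTINCTION"
    elif 85 <= score <= 100:
        return "HIGH DISTINCTION"
    else:
        return "INVALID"

# Try the function on a few scores, including an invalid one
for s in [-1, 50, 70, 84, 100]:
    print(s, grade(s))
```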


#### More about looping

<img src="loops.png">

Loops are important in Python or in any other programming language as they help you to execute a block of code repeatedly. You will often come face to face with situations where you would need to use a piece of code over and over but you don't want to write the same line of code multiple times.


In [11]:
# While loop
# Set the starting value
number = 2

# Condition of the while loop
while number < 5:
    print("Cat")
    # Increment the value of the variable `number` by 1
    number = number + 1

Cat
Cat
Cat


The code example above is a very simple while loop. It has three components: the ``while`` keyword, followed by a condition that evaluates to either ``True`` or ``False`` (``number < 5``), and a block of code that you want to execute repeatedly (the last two lines).

Looking at the code in detail, there is a variable ``number`` in which you store the integer ``2``. Since the value in ``number`` is smaller than 5, you print out ``"Cat"`` and increase the value of ``number`` by one. While the value in ``number`` stays smaller than 5, you continue to execute the two lines of code contained within the while loop.

You print out ``"Cat"`` two more times before the value of ``number`` is equal to ``5`` and the condition no longer evaluates to ``True``. Because the condition now evaluates to ``False``, you exit the while loop and continue with the rest of your program, if there is any. In this case, there isn't any more code, so your program stops.

The above example is a bit basic. You can also include conditionals, in other words an if statement, to make the loop more flexible. Take a look at the following example:

In [12]:
# Set the starting value
number = 2

# Condition of the while loop
while number < 5:
    # Check the remainder after dividing by 2
    if number%2 == 0:  
        print("The number "+str(number)+" is even")
    else:
        print("The number "+str(number)+" is odd")

    # Increment `number` by 1
    number = number+1

The number 2 is even
The number 3 is odd
The number 4 is even


Let us look at the difference between a for loop and a while loop. Say we want to print "Cat" 3 times.

In [13]:
for i in range(3):  
    print("Cat")

Cat
Cat
Cat


In [34]:
# Set the starting value
number = 2  

while number < 7 :
    print("Thank you")
    # Increment `number` by 1
    number = number+1

Thank you
Thank you
Thank you
Thank you
Thank you


See how easily a for loop can be converted to a while loop and vice versa. How does it work?
In a for loop, the integer passed to the range() function is the number of times the loop body is executed.
Note that the range() function starts counting from 0, not from 1. That means that, in the for loop example above, the count goes 0, 1, 2 and not 1, 2, 3. Python, like many programming languages, counts from zero. So, when designing a for loop, always keep in mind that range() counts from 0 and not from 1.
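
To make the equivalence concrete, the sketch below records the counter values visited by a for loop and by a while loop that does the same job. The list names ``for_values`` and ``while_values`` are chosen for illustration.

```python
# for loop: range(3) supplies the values 0, 1, 2
for_values = []
for i in range(3):
    for_values.append(i)
print(for_values)     # [0, 1, 2]

# Equivalent while loop: we manage the counter ourselves
while_values = []
i = 0
while i < 3:
    while_values.append(i)
    i = i + 1
print(while_values)   # [0, 1, 2]

# To count 1, 2, 3 instead, give range() an explicit start value
print(list(range(1, 4)))   # [1, 2, 3]
```

The for loop is shorter because range() handles the counter for you, while the while loop needs the counter to be initialised and incremented by hand.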
