## Quick Overview of Python

This section gives a quick overview of the Python programming language and its features, focusing on those that matter most for data analysis and machine learning. The following topics are covered.

* Declaring variables
* Conditional Statements
* Control flow statements
* Collections
* Functions
* Functional Programming
* Modules and packages

So, let's get started!

### Declaring Variables

In Python, a variable can be declared and assigned a value without specifying its data type. Python is a dynamically typed language and automatically **infers the variable type from the value assigned to it**. A variable also need not hold values of the same type throughout its lifetime: a variable initialized with a value of one type (e.g. integer) can later be re-assigned a value of a different type (e.g. string).
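
The re-assignment described above can be illustrated with a small sketch. The name *value* is a hypothetical variable chosen only for this example.

```python
# A variable is a name bound to a value; the type belongs to the value.
value = 10
first_type = type(value).__name__    # name of the type while holding an integer

# Rebinding the same name to a string is allowed.
value = "ten"
second_type = type(value).__name__   # name of the type after re-assignment

print(first_type, second_type)
```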

Python's basic built-in types include the following:

- *int* - Integer type.
- *float* - Floating point numbers.
- *bool* - Booleans are a subtype of integers and take their values from the literals *True* and *False*.
- *str* - Textual data.

Creating variables of different types.

In [1]:
var1 = 2
var2 = 5.0
var3 = True
var4 = "Machine Learning"

Printing the values of all the variables.

In [2]:
print("Value of var1 :", var1)
print("Value of var2 :", var2)
print("Value of var3 :", var3)
print("Value of var4 :", var4)

Value of var1 : 2
Value of var2 : 5.0
Value of var3 : True
Value of var4 : Machine Learning


Checking the data type of each variable.

Python provides a built-in function *type()*, which takes a variable as an argument and returns its data type.

In [3]:
type( var1 )

int

In [4]:
type( var2 )

float

In [5]:
type(var3)

bool

In [6]:
type(var4)

str

As you can see the variables' data types are appropriately chosen based on the values assigned to them.
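
The earlier remark that booleans are a subtype of integers can be checked directly. The following is a minimal sketch; the names *flag* and *total* are hypothetical.

```python
flag = True

# bool is a subclass of int, so a boolean passes both checks.
print(isinstance(flag, bool))
print(isinstance(flag, int))

# In arithmetic, True behaves like 1 and False like 0.
total = True + True
print(total)
```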

### Conditional Statements

Python supports *if-elif-else* for writing conditional statements.

The condition must be terminated by a *:* (colon) and the code block that follows must be indented. **Indentation in Python is not optional**; it is part of the syntax. The condition need not be enclosed in parentheses.

A simple example of an *if* condition is shown below.

In [None]:
# Checking a condition if the variable value is more than 1
if var1 > 1:
    print( "Bigger than 1" )

Note: A line starting with a hash (#) is a comment line in Python.

An example of a more complex if-elif-else statement that checks whether the variable x is larger than, smaller than, or equal to y.

In [None]:
x = 10
y = 12

# if x is greater than y
if x > y:
    print ("x > y")
# if x is less than y
elif x < y:
    print ("x < y")
else:
    print ("x = y")

A ternary (conditional) expression evaluates to one of two values depending on whether a condition is true or false. It lets you test a condition on a single line instead of writing a multi-line if-else.

For example, assigning *True* or *False* to a variable based on a condition check, as below.

In [None]:
# Initialize
x = 5

# Assign True if x is more than 10 or assign False using ternary operator
isGreater = True if x > 10 else False

In [None]:
isGreater
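
The ternary form is not limited to booleans. The sketch below, using the hypothetical names *size_label* and *is_greater*, picks between two strings and also shows that a comparison already yields a boolean on its own.

```python
x = 5

# Choose one of two arbitrary values depending on the condition.
size_label = "large" if x > 10 else "small"

# The comparison itself evaluates to True or False,
# so this gives the same result as the ternary in the cell above.
is_greater = x > 10

print(size_label, is_greater)
```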

### Generating Sequence Numbers

Sometimes it is necessary to create a sequence of numbers. The *range()* function generates such a sequence. It takes three parameters:

- start: Starting number of the sequence. Optional. Default is 0.
- stop: Generate numbers up to, **but not including this number**.
- step: Difference between each number in the sequence. Optional. Default is 1.

Generating a sequence of numbers from 1 to 5. In Python 3, *range()* returns a lazy range object rather than a list, so displaying *numbers* shows *range(1, 6)*; wrapping it in *list()* shows the individual values.

In [None]:
# Initializing the sequence of numbers starting from 1
# and ending (not including) with 6
numbers = range( 1, 6 )

numbers
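
To see the effect of the *step* parameter, the sketch below converts a few ranges to lists. The variable names are hypothetical and chosen only for illustration.

```python
# Every second number from 0 up to (but not including) 10.
evens = list(range(0, 10, 2))

# A negative step counts downwards; stop is still excluded.
countdown = list(range(5, 0, -1))

# With a single argument, start defaults to 0 and step to 1.
first_five = list(range(5))

print(evens)
print(countdown)
print(first_five)
```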

### Control Flow Statements

A *for loop* can be used to print the sequence of numbers by iterating through it as follows.

In [None]:
# Iterate through the collection
for i in numbers:
    print (i)

The *for* loop automatically advances the iterator through the range after every iteration and completes when the iterator reaches the end.

A *while* loop keeps executing as long as its condition is true. Printing the integers from 1 to 5 using a *while loop* is shown below.

In [None]:
# Initialize i to 1
i = 1

# Check the value of i to decide whether the loop continues
while i <= 5:

    print(i)
    # Increment the value of i.
    i = i + 1

# Print after the loop has finished
print('Done')

In a *while* loop, the state of i has to be managed explicitly. If i is never incremented, the condition never becomes false and the loop runs forever. Beware!
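
One common way to keep the exit condition visible is to leave the loop explicitly with *break*. This is only a sketch of that pattern, not the only way to write such a loop; the names *count* and *collected* are hypothetical.

```python
count = 0
collected = []

while True:
    count += 1
    # Leave the loop explicitly once the limit is passed.
    if count > 5:
        break
    collected.append(count)

print(collected)
```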

### Functions

Functions are one of the most important parts of a language.

- Functions are created **using the *def* keyword**.
- The function signature contains the function name followed by the input parameters enclosed in parentheses, and must end with a colon (:).
- **Parameters are optional if they are given a default value in the definition**.
- The code block inside the function must be **indented**.
- A function typically **ends with a return statement**. A function with no return statement returns *None*, which is comparable to the *void* return type in languages like C, C++ or Java.

The data types of the parameters and the return value are determined at runtime.

An example of a function that takes two parameters and returns the sum of their values using the '+' operator.

In [None]:
def addElements( a, b ):
    return a + b

A few examples of invoking the function are shown below.

Invoking the function with two integer values.

In [None]:
result = addElements( 2, 3 )

result

Invoking the function with two float values.

In [None]:
result = addElements( 2.3, 4.5 )

result

Invoking the function with two strings.

In [None]:
result = addElements( "python", "workshop" )

result

It can be observed that the data types of the parameters and the return value are determined automatically from the values passed to the function. If two strings are passed, they are concatenated, as the + operator is also overloaded for string concatenation.

The default value for the parameters can be defined in function signatures. This makes the parameter optional.

Defining the function *addElements* with parameter b initialized to 4.

In [None]:
def addElements( a, b = 4 ):
    return a + b

Invoking the function with only one parameter.

In [None]:
addElements( 2 )

Invoking the function with both the parameters.

In [None]:
addElements( 2, 5 )

In the last example, the default value of b is overridden with the value passed.
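
Two of the points from the list at the start of this section can be sketched in a few lines: a function without a return statement returns *None*, and a parameter with a default value can also be passed by name. The functions *log_message* and *add_with_default* are hypothetical helpers written only for this illustration.

```python
def log_message(text):
    # No return statement, so the call evaluates to None.
    print("LOG:", text)

def add_with_default(a, b=4):
    return a + b

outcome = log_message("training started")
print(outcome is None)

# Arguments can be passed by name, which overrides the default of b.
by_keyword = add_with_default(a=2, b=10)
print(by_keyword)
```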

### Working with Collections

Collections are containers or data structures used to store and manipulate groups of homogeneous or heterogeneous elements. We will discuss the following collections in this section.

- List
- Tuple
- Set
- Dictionary


#### List



Lists are like arrays, but can contain **heterogeneous items**, i.e. a single list can contain items of type integer, float, string or any other object. A list is also **not a unique set of items**, i.e. the values can repeat. Lists are mutable and are generally initialized either with a list of values specified inside square brackets or as an empty list.

In [None]:
## Create an empty list
emptyList = []

Creating a list of batsmen in the Indian cricket team in batting order.

In [None]:
batsmen = ['Rohit', 'Dhawan', 'Kohli', 'Rahane', 'Rayudu', 'Dhoni']

List indexes start at 0, and an item in the list can be accessed by its index as below.

In [None]:
batsmen[0]

A slice of the list can be obtained using an index range separated by a colon(:). A range [0:2] means starting with index 0 until index 2, but not including 2.

In [None]:
## Slicing a list
batsmen[0:2]

To find the last batsman, an index value of -1 can be used.

In [None]:
## Accessing the last element
batsmen[-1]

To find out number of elements in the list, the list can be passed to a function called *len()*.

In [None]:
# how many elements in the list
len( batsmen )

Two separate lists can be concatenated into one list using + operator. 

In [None]:
bowlers = ['Bumrah', 'Shami', 'Bhuvi', 'Kuldeep', 'Chahal']

In [None]:
all_players = batsmen + bowlers

In [None]:
all_players

To find whether an item exists in a list, the *in* operator can be used. It returns True if the item exists and False otherwise.

In [None]:
'Bumrah' in bowlers

In [None]:
'Rayudu' in bowlers

Finding the index of an item in the list.

In [None]:
all_players.index( 'Dhoni' )

The items in a list can be arranged in reverse order by calling the *reverse()* method on the list. Note that *reverse()* modifies the list in place and returns *None*.

In [None]:
all_players.reverse()

all_players
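
Because lists are mutable, they can be changed after creation. The sketch below uses a hypothetical list named *squad* to show a few common in-place operations, and contrasts them with slicing, which returns a new list.

```python
squad = ['Rohit', 'Dhawan', 'Kohli']

# Add an item at the end of the list.
squad.append('Rahane')

# Replace the item at index 1.
squad[1] = 'Dhoni'

# Remove and return the last item.
removed = squad.pop()

# Slicing with a step of -1 returns a reversed copy and leaves squad unchanged.
reversed_copy = squad[::-1]

print(squad, removed, reversed_copy)
```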

#### Tuples

A tuple is similar to a list, but it is immutable: once a tuple has been created, it cannot be modified. For example, create a tuple that contains the name of a cricketer and the year of his ODI debut.

In [None]:
odiDebut = ( 'Kohli', 2008 )

In [None]:
odiDebut

Tuple indexes also start at 0.

In [None]:
odiDebut[0]

Tuple elements cannot be changed. For example, trying to change the year in the tuple raises a *TypeError*.

In [None]:
odiDebut[1] = 2009

An existing list can be converted into a tuple. Converting the list *all_players* into a tuple so that it cannot be modified anymore.

In [None]:
players = tuple( all_players )

In [None]:
players
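
The immutability of tuples can be observed without stopping the notebook by catching the error. This is a minimal sketch; the name *debut* is a hypothetical stand-in for the tuple used earlier.

```python
debut = ('Kohli', 2008)

try:
    # Item assignment is not supported on tuples.
    debut[1] = 2009
except TypeError as err:
    error_message = str(err)
    print("Cannot modify tuple:", error_message)

# The tuple is unchanged and can be unpacked into separate variables.
name, year = debut
print(name, year)
```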

#### Set

A set is a collection of **unique elements**, i.e. the values cannot repeat. A set can be initialized with a list of items enclosed in curly brackets.

In [None]:
setOfNumbers = {6,1,1,2,4,5}

A set automatically removes duplicates, so it contains only the unique numbers.

In [None]:
setOfNumbers

Set supports operations like union, intersection and difference. 

To understand these operations, let's create two sets with the batsmen who played for India in the 2011 and 2015 world cup teams.

In [None]:
wc2011 = {"Dhoni", "Sehwag", "Tendulkar", "Gambhir", "Kohli", "Raina", "Yuvraj", "Yusuf"}
wc2015 = {"Dhoni", "Dhawan", "Rohit", "Rahane", "Kohli", "Raina", "Rayudu", "Jadeja"}

To find all the batsmen who played in either the 2011 or the 2015 world cup, we can take the union of the above two sets.

In [None]:
wc2011.union( wc2015 )

To find the batsmen who played in both the 2011 and 2015 world cups, we can take the intersection of the two sets wc2011 and wc2015.

In [None]:
wc2011.intersection( wc2015 )

To find the new batsmen who were not part of the 2011 world cup but played in the 2015 world cup, we can take the difference of wc2015 and wc2011, i.e. the elements of wc2015 that are not in wc2011.

In [None]:
wc2015.difference( wc2011 )
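
The same set operations are also available as operators. The sketch below uses two small hypothetical sets, *team_a* and *team_b*, and shows that the difference depends on the order of the operands.

```python
team_a = {"Dhoni", "Kohli", "Raina", "Sehwag"}
team_b = {"Dhoni", "Kohli", "Raina", "Rohit"}

either = team_a | team_b     # union
both = team_a & team_b       # intersection
only_b = team_b - team_a     # in team_b but not in team_a
only_a = team_a - team_b     # in team_a but not in team_b

print(both, only_b, only_a)
```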

#### Dictionary

A dictionary is **a collection of key-value pairs**. All the **keys in a dictionary are unique**.

For example, a dictionary that contains the ICC ODI World Cup winners from 1975 to 2011, where the key is the year of the tournament and the value is the name of the winning country.

In [None]:
wcWinners = { 1975: "West Indies", 
              1979: "West Indies", 
              1983: "India",
              1987: "Australia",
              1992: "Pakistan",
              1996: "Sri Lanka",
              1999: "Australia",
              2003: "Australia",
              2007: "Australia",
              2011: "India"}

The value of a specific dictionary element can be accessed by key. For example, to find the winning country in a specific year. 

In [None]:
wcWinners[1983]

List of all winning countries.

In [None]:
wcWinners.values()

The above list has repeated names of certain countries, as they have won multiple times. To find the unique list of countries, the above list can be converted to a set.

In [None]:
set(wcWinners.values())

Adding a new key value pair to the dictionary.

In [None]:
wcWinners[2015] = 'Australia'

In [None]:
wcWinners
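
A dictionary can also be iterated over and queried safely. The sketch below uses a small hypothetical dictionary named *winners* (a subset of the data above) to count wins per country with *items()* and *get()*, and to check for a key with the *in* operator.

```python
winners = {1983: "India", 1987: "Australia", 2011: "India"}

# Count how many times each country appears as a value.
win_counts = {}
for year, country in winners.items():
    # get() returns the default (0) when the key is not present yet.
    win_counts[country] = win_counts.get(country, 0) + 1

# The in operator checks for the presence of a key.
has_2019 = 2019 in winners

print(win_counts, has_2019)
```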

### Dealing with Strings

A string in Python can be initialized with single or double quotes.

In [None]:
string0 = 'python'
string1 = "machine learning"

If multiline, the string can be initialized with triple quotes as below.

In [None]:
string2 = """This is a 
 multiline string"""

Converting a string to upper or lower case.

In [None]:
# Converting to upper case
string0.upper()
# Similarly string.lower() can be used to convert to lower case.
# string0.lower()

Splitting the string into a list of words or tokens separated by space.

In [None]:
tokens = string1.split(' ')
tokens
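
When the text contains repeated spaces, *split(' ')* and *split()* behave differently. The following sketch, with hypothetical variable names, shows the difference and joins the tokens back together with *join()*.

```python
spaced = "machine  learning"   # two spaces between the words

# Splitting on a single space keeps an empty token for the extra space.
by_space = spaced.split(' ')

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = spaced.split()

# join() is the inverse operation: it glues tokens with a separator.
rejoined = " ".join(by_whitespace)

print(by_space, by_whitespace, rejoined)
```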

### Functional Programming

Functional programming supports **functions being passed as parameters to other functions, just like variables**. This makes it possible to create higher order functions. One core benefit of functional programming in data analysis is that transformations or filters can be applied to a set of records or columns more concisely than with plain looping.

#### Example 1: Map

Let's say we have a list named *intList* that contains integers as defined below.

In [None]:
intList = [1,2,3,4,5,6,7,8,9]

And we want to create another list named *squareList*, which will contain the squared values of all elements in *intList*. The typical approach to accomplish this is to write a *for loop* as below.

In [None]:
# Create an empty list.
squareList = []

# Loop through the intList, square every item and append to result list squareList.
for x in intList:
    squareList.append( pow( x, 2 ) )

print( squareList )    

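The above code is quite verbose: it manages the loop and the result list by hand. Using a functional programming approach, the same transformation can be written more concisely by stating only what should happen to each element, as described below in steps. Note that Python's built-in *map()* also processes the items one after another; the gain here is clarity rather than parallel execution.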

Step 1:

- Define a function *square_me()* that takes an integer and returns the square value of it. 

In [None]:
def square_me( x ):
    return x * x

Step 2:

- The function *square_me* and the list of integers can be passed to a higher order function map(). map() iterates through the list and transforms each element using the function.

In [None]:
squareList = map( square_me, intList )

Printing the result as a list.  

In [None]:
list(squareList)

The square function *square_me()* we used is just one line of code and can actually be written as an **anonymous function**. An anonymous function is a function without a name and is defined using the *lambda* keyword.

Rewriting the above map using an anonymous function.

In [None]:
squareList = map(lambda x: x*x, intList)
list(squareList)
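
One detail worth knowing is that in Python 3 *map()* returns a lazy iterator, which is why the result is wrapped in *list()* above. The sketch below, with hypothetical names, shows that such an iterator can be consumed only once.

```python
int_list = [1, 2, 3]
squares = map(lambda x: x * x, int_list)

# The first conversion walks through the iterator and collects the results.
first_pass = list(squares)

# The iterator is now exhausted, so a second conversion yields an empty list.
second_pass = list(squares)

print(first_pass, second_pass)
```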

#### Example 2: Filter

Similarly, filters can also be applied using functional programming, for example to select only the even numbers from the list *intList*.

The higher order function *filter()* takes a function as its first argument, which must return True or False for each element. If it returns False, the element is filtered out. To check whether an integer is even, we can use the condition *x % 2 == 0*.

In [None]:
evenInts = filter( lambda x : x % 2 == 0, intList )

In [None]:
list( evenInts )
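
The same map and filter steps are often written as list comprehensions in Python. This is an alternative idiom, not something the section above relies on; the sketch below uses hypothetical names and also chains *filter()* and *map()* for comparison.

```python
int_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Comprehension equivalent of map(): square every element.
squares = [x * x for x in int_list]

# Comprehension equivalent of filter(): keep only the even elements.
evens = [x for x in int_list if x % 2 == 0]

# filter() and map() can be chained: square only the even numbers.
even_squares = list(map(lambda x: x * x,
                        filter(lambda x: x % 2 == 0, int_list)))

print(squares, evens, even_squares)
```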

Functional programming is an important aspect of Python programming and will be used extensively during data analysis. During data analysis we will mostly deal with collections of records, so for tasks like transformation or filtering, functional programming can be very handy in place of plain looping.

### Modules and Packages

In Python, a module is a file that consists of **functions, classes and variables**. A **set of modules under a namespace (mostly a directory) is called a package**. Modules and packages can be imported into another module using the *import* statement. For example, to use mathematical functions, Python's *math* module can be imported.

In [None]:
import math

## Taking square root of a value
math.sqrt(16)

Using *from..import* syntax, a specific module or object (e.g. class or function) can be imported from a package. For example to import only *sample()* function from *random* module the following import style can be used.  

In [None]:
from random import sample

Another example is picking a random set of numbers from a range, e.g. range(0, 11). *random.sample()* takes the sequence to sample from and the number of items to pick as parameters. The code below picks 3 distinct random numbers between 0 and 10 (both inclusive).

In [None]:
sample( range(0, 11), 3)
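
Random results change on every run, which can make a notebook hard to reproduce. One option, shown here only as a sketch, is to fix the seed with *random.seed()* before sampling; the seed value 42 and the variable names are arbitrary choices for this example.

```python
import random

# Fixing the seed makes the pseudo-random sequence repeatable.
random.seed(42)
picked = random.sample(range(0, 11), 3)

# Resetting the same seed reproduces the same sample.
random.seed(42)
picked_again = random.sample(range(0, 11), 3)

print(picked, picked_again)
```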

### Returning multiple values from a function using Tuple

It may be necessary to return multiple values from a function. This can be achieved by returning a tuple. This is an important feature that we will be using in the subsequent chapters.

For example, define a function that returns a mean and a median of a list of numbers generated randomly.

In [None]:
import random 

randomList = random.sample( range(0, 100), 20)

randomList

In [None]:
from statistics import mean, median

def getMeanAndMedian( listNum ):
    return mean(listNum), median(listNum)

getMeanAndMedian() returns a tuple with two elements, which are unpacked into two separate variables during invocation.

In [None]:
# Use names that do not shadow the imported mean() and median() functions
meanValue, medianValue = getMeanAndMedian( randomList )

Printing the mean and median values from the list.

In [None]:
print( "Mean: ", meanValue, " Median: ", medianValue)
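
To make the tuple behind a multiple return visible, the sketch below first keeps the result as a single value and then unpacks it. The function *get_min_and_max* is a hypothetical helper written only for this illustration.

```python
def get_min_and_max(numbers):
    # The comma-separated values are packed into a single tuple.
    return min(numbers), max(numbers)

# Without unpacking, the result is one tuple object.
result = get_min_and_max([7, 2, 9, 4])
print(type(result).__name__, result)

# With unpacking, each element gets its own variable.
lowest, highest = result
print(lowest, highest)
```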

### Conclusion:

In this section, we have given you a **crash course on the Python language and its features**. This quick overview is enough to get you started with machine learning using Python. For a more detailed overview and a deeper dive into the language and its features, you should refer to a resource dedicated to Python programming. We suggest the following links as Python programming references.

- https://www.python.org/doc/
- https://docs.python.org/3/tutorial/index.html