*Stanislav Borysov [stabo@dtu.dk], DTU Management*
# Advanced Business Analytics

## Refreshing Python and Machine Learning: Part 1 - Python basics

*Based on the notebooks from 42184 Data Science for Mobility E19 / 42577 Introduction to Business Analytics E19*

<p>
<a href="https://www.python.org/">Python</a> is one of the <a href="http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-introductory-teaching-language-at-top-us-universities/fulltext"><b><u>most popular</u></b></a> programming languages in scientific computing today. Python provides built-in data structures that make data manipulation easy. Furthermore, Python provides a simple approach to object-oriented programming, which in turn allows for intuitive programming.
</p>
<p>
Python is an "interpreted" language. In the standard implementation (CPython), your code is compiled to bytecode, which is then executed by an interpreter that is itself written in C. C and C++ are very fast and powerful programming languages, but writing programs in these languages can be difficult. We can think of Python as a friendly interface to this powerful computing backend. Python code is therefore often slower to run than the equivalent C/C++ code, but it is usually easier to understand and quicker to write.
</p>
<p>
While you shouldn't worry about this detail here, much of the speed gap can be closed by using Python packages whose core routines are written in C. A package is a collection of "modules": prewritten code that you can import into your own code and use. Packages exist for vector algebra (<a href="http://www.numpy.org/">numpy</a>), scientific computing and statistics (<a href="http://www.scipy.org/">scipy</a>), and plotting (<a href="http://matplotlib.org/">matplotlib</a>). We will be using these packages.
</p>
<p>
We will spend the first recitation becoming acquainted with basic Python and some of the packages we will be using. Here are some additional Python tutorials that you might want to read:
</p>
<ul>
    <li>
        <a href="https://docs.python.org/2/tutorial/">Python's Tutorial</a>
    </li>
    <li>
        <a href="https://developers.google.com/edu/python/">Google's Python Tutorial</a>
    </li>
</ul>

<p>
The **Jupyter Notebook** is an open-source web browser application that allows you to create and share documents that contain live code, equations, videos, visualizations and explanatory text.
</p>
<p>
IPython Notebook is a tool for interactively writing and executing Python code. It allows the programmer to easily write and test code by allowing snippets of code and their results to be displayed side-by-side. Each snippet of code is called a "cell". We will see later that even plots are possible.
</p>

There are two central kinds of cells: **Code** and **Markdown**. The cell type can be set using the menu above.

The **Code** cells simply contain the Python code that we want to run each time.

The **Markdown** cells contain text (explanations, sections, etc). The text is written in Markdown. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML). You can read more about it here:

http://daringfireball.net/projects/markdown/

## You can create sections

* Write words in **bold** and *italics*
* Create lists
* Establish a [hyperlink](https://en.wikipedia.org/wiki/Hyperlink)

(just double click on this cell to see how!   :-) )

### Before we start 

These notebooks are meant for interactive learning. We present a concept, give a few examples in code, and you try them. You will learn nothing if you ONLY do this! Don't be afraid of:
- Messing around with our examples. Don't worry, you can always get the original file back! ;-)
- Looking on the internet for other code, explanations, even solutions to the exercises!
- Asking us in class, by email, or in the corridor

We organized it into groups of topics, ending with small exercises, to let you assimilate the concepts. After you finish a module, don't move on if you feel insecure about what you've learned. Take a break, ask us, go back and try again.  

## Our pledge

If you do follow these notebooks and our guidance, we promise you'll feel very happy with yourself in a month or two! :-) You'll love Python, and feel ready to become a Data Scientist.

But, the learning curve is steep, so bear with us! :-)

# Python Data Structures

#### 1. Python as a Calculator: Numeric Types and Their Methods

In [None]:
# Firstly, this is a single-line comment.

# The '#' simply indicates a line of code that is ignored during execution.
# Comments are helpful for explaining your code both as a reminder to 
# yourself and to others viewing your code (like a grader!!!).
# Try executing this cell.

* Integers (int) are a numerical data-type. 

In [None]:
print(1+2)
print(3-2)
print(5*6)

* **Floats** are another numeric type that allows for fractions.
     A decimal point indicates the number is a float.

    For example,
     2 is an **int**,
     2.0 is a **float**.

In [None]:
print(2)
print(2.0)
print(1.1+.5)
print(3.2-1.9)
print(6*5.2)
print(7.1/2)
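If you are unsure whether a value is an int or a float, the built-in `type()` function will tell you. The short sketch below also shows how the two division operators behave in Python 3; the variable names are only for illustration.

```python
# type() reports the type of a value.
int_type = type(2)        # the int type
float_type = type(2.0)    # the float type
print(int_type, float_type)

# In Python 3, "/" always gives a float, while "//" rounds down to a whole number.
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)

# Mixing an int and a float gives a float.
mixed = 2 + 1.5
print(mixed, type(mixed))
```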

Other mathematical functions are provided in various packages. 

Python has a **`math`** package that contains many useful mathematical functions. 

But we need to **`import`** pre-existing packages to use the code.

In [None]:
import math
print(math.pi)
print(math.sin(math.pi/2))

The period in **`math.<some math function>`** indicates which package to look into for the borrowed code, but it can get annoying.

In [None]:
from math import *
print(pi)
print(sin(pi/2))

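To keep your namespace tidy, you may wish to **`import`** only the functions you know you will use rather than every name in the module. Python still loads the whole module behind the scenes, so this does not save memory, but it avoids accidentally overwriting your own variables with imported names.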

In [None]:
from math import sin,pi
print(pi)
print(sin(pi/2))

Furthermore, you may wish to rename imported functions to make better sense with your own code.

In [None]:
from math import sin as S
print(S(pi/2))

#### 2. Variables and Strings

Let's define our own **variable**.

In [None]:
x = 1.0
print(x)
print(x*2)

Variable names are case-sensitive.

In [None]:
x=1
X=2.0
print(x)
print(X)

Variables can be mixed in strings, by using the function str()

In [None]:
print("The value of x is "+str(x))

...or by using C-style format specifiers (`%d` for an integer, `%f` for a float)

In [None]:
print("The value of x is %d and the value of X is %f"%(x,X))

The following retrieves the value stored in x, adds 2 to it, and stores the result in x.

In [None]:
x+=2  # same as x=x+2
print(x)
x-=10 # same as x=x-10
print(x)
x=x+2
print(x)  #just to make sure you get the point!  ;-)

A **string** is a sequence of characters. It allows Python to store and manipulate words.

In [None]:
print("Hello World.")

Let's store a string in a variable called **"s"**.

Note that ' and " are interchangeable when defining strings.

In [None]:
s = 'This is a string.'
print(s)

We can access individual characters from the string.

In [None]:
print(s[0]) # the first character of the string has index 0
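Beyond a single index, strings also support negative indices, slices and `len()`. The following is a small sketch on the same example sentence; the names `first`, `last`, `word` and `length` are only illustrative.

```python
s = 'This is a string.'

first = s[0]       # first character
last = s[-1]       # last character (negative indices count from the end)
word = s[0:4]      # characters at positions 0, 1, 2 and 3
length = len(s)    # number of characters in the string

print(first, last, word, length)
```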

You can add strings together

In [None]:
s = s + " Another string."
print(s)
s+=" A third string."
print(s)

You can concatenate (combine) strings with numbers in several different ways

In [None]:
s1="Yes, this is number" + str(20) + "and this is" + str(10)   # Notice that when you use "+", 
s2="Yes, this is number " + str(20) + " and this is " + str(10) # you have to put the spaces between the components yourself.
s3="Yes, this is number %d and this is %d" % (20, 10)    # Now, we use the % format specifiers
s4="Yes, this is number {} and this is {}".format(20, 10)    # Now, we use the format method
print(s1)
print(s2)
print(s3)
print(s4)
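A further option, assuming you are running Python 3.6 or newer, is the formatted string literal (often called an "f-string"). The sketch below uses hypothetical variables `a` and `b` in place of the literal numbers above.

```python
a = 20
b = 10

# Expressions inside {} are evaluated and inserted into the string.
s5 = f"Yes, this is number {a} and this is {b}"
print(s5)

# A format specifier can follow a colon, here two decimal places.
ratio = a / 3
s6 = f"{a} divided by 3 is about {ratio:.2f}"
print(s6)
```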

#### 3. True or False?

**Booleans**, or bools, are our last native data type. They can hold only two possible values: **True** or **False**.

There are several functions that act on booleans. Let x and y be variables storing booleans.

* "not x" switches the value of x. If x is True, then "not x" is False.
* "x and y" returns True only if both x and y are True.
* "x or y" returns True if at least one of x and y is True.

In [None]:
x = True
y = False
print(x)
print(not x)
print(x and y)
print(x and not y)
print(x or y)
print(not x or y)

**'=='** tests for equality

In [None]:
print(1 == 2)
print(1 == 1)

**'!='** tests for inequality

In [None]:
print(1 != 2)

Notice the difference between "=" and "=="

In [None]:
x=1
print(x==1)
y=2
print(y==1)

## If statements: Quizzing Your Code

Booleans are most often used in if-statements.

If-statements execute a section of code if a given bool evaluates to True. Note that the sections of code under if-statements need to be indented.

Using a **":"** indicates that an indented section of code follows. 

(to understand this VERY important point, try playing with the indentation to see what happens! ;-) ). 

In [None]:
flag = True
x = 0
if flag:
    x = 1
    print("Flag is True.")
else:
    x = 2
    print("Flag is False.")
print(x)

We can check for other cases as well. Controlling the execution of code like this is referred to as "flow of control".

In [None]:
if x == 0:
    print("A")
elif x == 1:
    print('B')
else:
    print("C")

***Exercise***: Write a small Python script to add 'ing' at the end of a given string if its length is at least 3. If the string already ends with 'ing' then add 'ly' instead. If the string length of the given string is less than 3, leave it unchanged.

In [None]:
# ...

## Storing data in Python: Lists and Dictionaries

### LISTS

> Lists are a data structure designed for easy storage and access to data. They are initialized by using "[]" to enclose a comma-separated sequence of values. These values can be anything.

In [None]:
L1 = [] # an empty list
x = 5
L2 = [1,2.0,'a',"abcd",True,x] # a list containing different values

> Lists can be built dynamically (aka on the fly) using **`append`** and **`extend`**

In [None]:
L1.append(1)
L1.append(2)
print(L1)
L3 = ['a','b','c']
L1.extend(L3)
print(L1)

> * Values stored in lists are accessible by their index in the list. 
> * Lists maintain the ordering in which values were stored in them.
> * We use "[i]" to retrieve the i-th element in a list.

> Note that the first element in a list in Python has index 0.

In [None]:
L = ['a','b','c','d','e']
print(L[0])
print(L[1])

# We can access from the ends of lists as well.
print(L[-1])
print(L[-2])

# We can access chunks of a list to produce sub-lists.
print(L[:2])
print(L[2:4])

# There is a useful function for producing sequences of numbers.
# In Python 3, range() is lazy, so we wrap it in list() to see the values.
print(list(range(10)))
print(range(10)[0])
print(range(10)[1])
print(range(10)[9])

# The length of a list (or a range) can be calculated using "len()"
print(len(range(10)))

### A quick note...

<p>
<img src="https://i.stack.imgur.com/IC6Xm.png"/>
</p>

The above image illustrates how references work in Python. 2 is an object, a and b are names. We can have different names pointing at the same object. And objects can exist without any name.

Assigning a variable only attaches a nametag. And deleting a variable only removes a nametag. If you keep this idea in mind, then the Python object model will never surprise you again.
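The nametag idea can be checked directly with the `is` operator, which tests whether two names refer to the same object, as opposed to `==`, which compares values. A minimal sketch, using small lists chosen only for illustration:

```python
a = [1, 2, 3]
b = a            # b is a second name for the same list object
c = [1, 2, 3]    # c is a different object with equal contents

same_object = a is b        # both names point at one object
equal_values = a == c       # the contents are equal
different_object = a is c   # but they are two separate objects

print(same_object, equal_values, different_object)
```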

Let's see another example...

In [None]:
B=L

In [None]:
B.append(9)
print(L)

confused? :-P Don't forget to play with the code yourself! 

Try to actually copy a list (such that, if you change the values in one, the other will NOT be affected). Feel free to search the web to find the solution.

In [None]:
# This post about it is excellent: https://stackoverflow.com/questions/2612802/how-to-clone-or-copy-a-list

#First way to copy a list
L_copy = L[:]  #this is equivalent:  L_copy = list(L) 

#Second way to copy a list, using the copy module
import copy
L_copy2 = copy.copy(L)

#Third way: it copies the list, and any sublists (or dictionaries, or whatever) that are inside!
L_copy3 = copy.deepcopy(L)

#Let's make some changes:
L_copy.append(10)

# Can you append another list?
L_copy2.append([7, 8, 9])

#or is it this that we want?
L_copy3.extend([7, 8, 9])


print("original =\t\t\t\t\t", L)   #the \t is just to put a "tab", to align the output...
print("used [:] and appended 10 =\t\t\t", L_copy)
print("used copy.copy and appended [7, 8, 9] =\t\t", L_copy2)
print("used copy.deepcopy and extended [7, 8, 9] =\t",L_copy3)


#Note that all three copies are ordinary Python lists. If you are wondering about the difference between a list and an array, follow the link below:
# http://pythoncentral.io/the-difference-between-a-list-and-an-array/
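The difference between `copy.copy` and `copy.deepcopy` only shows up when the list contains other mutable objects. The sketch below uses a hypothetical nested list to illustrate it: a shallow copy shares the inner list with the original, while a deep copy does not.

```python
import copy

nested = [1, 2, [3, 4]]
shallow = copy.copy(nested)      # new outer list, but the inner list is shared
deep = copy.deepcopy(nested)     # new outer list and a new inner list

nested[2].append(5)              # change the inner list of the original

print("original:", nested)
print("shallow: ", shallow)      # the shared inner list changed too
print("deep:    ", deep)         # the deep copy is unaffected
```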

Keep playing with lists

Suggestion: Try summing two lists

In [None]:
# ...

Suggestion: Try multiplying a list by a number (an integer and a float)

In [None]:
# ...

Suggestion: Try changing an element in a list

In [None]:
# ...

### DICTIONARIES

> **Dictionaries**, called "dicts" for short, allow you to store values by providing identifying keys. 

> Dicts are initialized using "{}".

JSON is essentially the same format, so if you understand dictionaries, you'll understand JSON!  ;-)

In [None]:
D = {} # an empty dict
D2 = {'key1':1,'key2':"moose",4:5}
print(D2)
# Key-value pairs can also be defined like this
D2[6] = False
print(D2)
# values can be retrieved using their keys.
print(D2['key1'])
print(D2[6])

if not D2[6]:
    print("Dicts are fun.")
else:
    print("Dicts are not that fun.")
    
# The keys and values of dicts can be converted to lists.
# (In Python 3, keys() and values() return view objects, so we wrap them in list().)
print("keys: "+str(list(D2.keys())))
print("values: "+str(list(D2.values())))

You can nest dictionaries.

In [None]:
D1={2800:"Kongens Lyngby", 2840: "Holte", 2000:"Frederiksberg", 2100:"Copenhagen Ø", 2150:"Nordhavn", 2200: "Copenhagen N"}
D={"dict1": D1, "dict2":D2}
print(D)
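To read a value out of a nested dictionary, chain the keys one after another. The sketch below uses a small hypothetical dictionary (`postcodes`) rather than the ones above, and also shows two common ways of handling keys that may be missing.

```python
postcodes = {2800: "Kongens Lyngby", 2840: "Holte"}   # hypothetical small example
nested = {"cities": postcodes}

# Chain the keys to reach a value in a nested dict.
lyngby = nested["cities"][2800]
print(lyngby)

# "in" checks whether a key exists.
has_holte = 2840 in postcodes

# get() returns a default value instead of raising a KeyError.
missing = postcodes.get(9999, "unknown")
print(has_holte, missing)
```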

## For Statements

For-loops allow you to execute a section of code for several values in a list.

The syntax is like the syntax of if-statements.

In [None]:
X = list(range(5))  # list() turns the lazy range into an actual list
print(X)

for x in X:
    print(x)

for i in range(len(X)):
    # doubles the list element
    print(X[i]*2)
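When both the index and the element are needed, the built-in `enumerate()` is a common alternative to `range(len(...))`. A minimal sketch, with a hypothetical list `letters`:

```python
letters = ['a', 'b', 'c']
pairs = []

# enumerate() yields (index, element) pairs, starting at index 0.
for i, letter in enumerate(letters):
    print(i, letter)
    pairs.append((i, letter))
```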

### `Break` and `continue`  Statements

You might face a situation in which you need to exit a loop completely when some condition is triggered. There may also be situations in which you want to skip the rest of the current iteration and move on to the next one.

Python provides break and continue statements to handle such situations and to have good control on your loop.

The **`break`** statement in Python terminates the current loop and resumes execution at the next statement, just like the traditional break found in C.

In [None]:
for letter in 'Python':
    if letter == 'h':
        break
    print('Current Letter: %c'%letter)

The **`continue`** statement in Python returns control to the beginning of the current loop (either a `for` or a `while` loop). It skips all the remaining statements in the current iteration and moves on to the next iteration.

In [None]:
for letter in 'Python':   
    if letter == 'h':
        continue
    print('Current Letter: %c'%letter)
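Both statements work the same way inside a `while` loop. The following sketch, with illustrative names `n` and `kept`, counts upwards, skips multiples of 3 with `continue`, and leaves the loop with `break` once the counter passes 10.

```python
n = 0
kept = []
while True:
    n += 1
    if n > 10:
        break          # leave the loop entirely
    if n % 3 == 0:
        continue       # skip multiples of 3 and go back to the top of the loop
    kept.append(n)

print(kept)
```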

## EXERCISES

1) Write a Python program to sum all the items in a list.

In [None]:
# ...

2) Write a Python function that takes a list of words and returns the length of the longest one.

In [None]:
# ...

3) Write a Python program to get the largest number from a list.

In [None]:
# ...

4) Write a Python program to print the numbers of a specified list after removing even numbers from it.

In [None]:
# ...

## Modules - Functions

Functions allow a programmer to write reusable code: you write a piece of logic once, give it a name, and call it whenever you need it.

Functions are defined using the keyword **"def"**.

* Consider this example:

> First choose an initial value for x.

In [None]:
x = 0
for i in range(100):
    x+=i
print(x)

> What if we do this for a new initial value for x? 

> What if we use a different number instead of 100?

> We don't want to rewrite this for loop every time. 

> **Let's define a function.**

> We need to use the keyword def

In [None]:
def ForSum(x,y):
    for i in range(y):
        x+=i
    # "return" indicates what values to output
    return x

In [None]:
# Same calculation from above
print(ForSum(0,100))
print(ForSum(10,50))

Interestingly, variables can refer to functions, just as they refer to any other object. This means that functions can be inputs to other functions.

In [None]:
F = ForSum
print(F(0,100))

def execute(funct,x):
    return funct(x,100)

print(execute(F,10))

# Now, just for fun:
print(F(F(F(10,100),50),1000))
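Passing a function as an argument can be illustrated with a smaller example. In this sketch, `apply_twice` and `add_three` are hypothetical helpers introduced only for illustration.

```python
def apply_twice(func, value):
    # Call func on value, then call func again on the result.
    return func(func(value))

def add_three(n):
    return n + 3

result = apply_twice(add_three, 10)   # same as add_three(add_three(10))
print(result)
```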

#### Let's look at calculating an average using a Python list.
 


In [None]:
import time

def avg(X, pr=False): 
    total = 0.0   # avoid the name "sum", which would shadow the built-in function
    for x in X:
        total += x
    if pr:
        print("...and the average is...", total/len(X))
    return total/len(X)

X = list(range(1000000)) # [0, 1, 2, 3, ..., 999999]
startTime = time.perf_counter()   # time.clock() was removed in Python 3.8
Y = avg(X)
wallTime1 = time.perf_counter() - startTime     # time taken to average a list of 1,000,000 numbers with a Python loop
print(str(wallTime1)+" seconds using Python list.")
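For comparison, the same timing pattern can be applied to a version that relies on the built-in `sum()`. This is only a sketch: `avg_loop`, `avg_builtin` and `data` are illustrative names, the list is kept smaller so it runs quickly, and the measured times will differ from machine to machine (the built-in version is usually, but not necessarily, the faster one).

```python
import time

def avg_loop(values):
    # Average computed with an explicit Python loop.
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def avg_builtin(values):
    # Average computed with the built-in sum().
    return sum(values) / len(values)

data = list(range(100000))

start = time.perf_counter()
result_loop = avg_loop(data)
time_loop = time.perf_counter() - start

start = time.perf_counter()
result_builtin = avg_builtin(data)
time_builtin = time.perf_counter() - start

print(result_loop, result_builtin)
print("loop:", time_loop, "seconds; built-in sum:", time_builtin, "seconds")
```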

You should understand what the pr=False in the def statement means

***It provides a default value (False) to the parameter pr.***

 If you don't even mention it, it will assume it is False

In [None]:
avg(X)

You can also pass a value for it explicitly, which overrides the default

In [None]:
avg(X, True)
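Arguments can also be passed by name, which makes calls easier to read and lets you give them in any order. A minimal sketch, using a hypothetical function `power` with one default value:

```python
def power(base, exponent=2):
    # exponent defaults to 2 when it is not given.
    return base ** exponent

squared = power(3)                    # uses the default exponent
cubed = power(3, 3)                   # positional arguments
named = power(base=2, exponent=5)     # keyword arguments
swapped = power(exponent=3, base=2)   # keyword arguments in a different order

print(squared, cubed, named, swapped)
```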

### Defining Functions of your Own

In [None]:
#Function with parameter called in main

def happyBirthday(person):
    print("Happy Birthday to you!")
    print("Happy Birthday to you!")
    print("Happy Birthday, dear " + person + ".")
    print("Happy Birthday to you!")

def main():
    happyBirthday('Emily')
    happyBirthday('Andre')

main()

Let's see what will happen if the input variable has a default value...

In [None]:
#Function with parameter called in main

def happyBirthday(person = 'Emily'):
    print("Happy Birthday to you!")
    print("Happy Birthday to you!")
    print("Happy Birthday, dear " + person + ".")
    print("Happy Birthday to you!")

In [None]:
def main():
    happyBirthday()
    happyBirthday('Andre')

What would you expect to see???

In [None]:
main()

### `*args` and `**kwargs` in python 

`*args` is used to send a non-keyworded variable length argument list to the function. Here’s an example to help you get a clear idea:

In [None]:
def test_var_args(f_arg, *argv):
    print("first normal arg:", f_arg)
    for arg in argv:
        print("another arg through *argv :"+str(arg))

test_var_args('yasoob','python','eggs','test')

`**kwargs` allows you to pass keyworded variable length of arguments to a function. You should use `**kwargs` if you want to handle named arguments in a function. Here is an example to get you going with it:

In [None]:
def greet_me(**kwargs):
    if kwargs is not None:
        for key, value in kwargs.items():
            print("%s == %s" %(key,value))

In [None]:
greet_me(name="yasoob")

In [None]:
def test_args_kwargs(arg1, arg2, arg3):
    print("arg1:", arg1)
    print("arg2:", arg2)
    print("arg3:", arg3)

In [None]:
# first with *args
args = ("two", 3,5)
test_args_kwargs(*args)

In [None]:
# now with **kwargs:
kwargs = {"arg3": 3, "arg2": "two","arg1":5}
test_args_kwargs(**kwargs)
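A single function can accept regular arguments, `*args` and `**kwargs` at the same time, as long as they appear in that order in the definition. The sketch below uses a hypothetical function `describe` and assumes Python 3.7 or newer, where keyword arguments keep the order in which they were passed.

```python
def describe(title, *args, **kwargs):
    # title is a regular argument, args is a tuple, kwargs is a dict.
    parts = [title]
    for value in args:
        parts.append(str(value))
    for key, value in kwargs.items():
        parts.append(f"{key}={value}")
    return ", ".join(parts)

summary = describe("order", 1, 2, size="large", paid=True)
print(summary)
```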

### Exercises

1) Write a Python function to find the Max of three numbers

In [None]:
# ...

2) Write a Python function that receives a "day of week" (0=Sunday, 6=Saturday), and returns whether it's a weekday or a weekend day

In [None]:
# ...

3) Write a Python function that receives two lists (representing two vectors) and returns their inner (dot) product.

In [None]:
# ...

4) Write a Python function that receives a list, and returns its average and standard deviation

In [None]:
# ...