# Introduction

**Python** is a general-purpose programming language that is increasingly popular for **data science**. Companies worldwide use Python to extract insights from their data and gain a competitive edge. This module focuses on Python specifically for data science.

What you will learn:
- Python, specifically for data science
- How to store data
- How to manipulate data
- Tools for data analysis

Official site: https://www.python.org

## Python Versions

When this module was written, two versions of Python were supported: 2.7 and 3.6. Python 3.x introduced many backwards-incompatible changes to the language, so code written for 2.7 may not work under 3.x and vice versa. Python 2.7 has since reached end of life (January 2020). For this class all code will use Python 3.6.
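
Because code written for one major version may not run under the other, it can help to confirm which interpreter a notebook is running on. The following is a minimal sketch using the standard `sys` module; the 3.6 threshold simply mirrors the version chosen for this class.

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial)
print(sys.version_info.major, sys.version_info.minor)

# Tuples compare element by element: major first, then minor
is_supported = sys.version_info >= (3, 6)
print(is_supported)
```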

https://wiki.python.org/moin/Python2orPython3

"Python 2.x is legacy, Python 3.x is the present and future of the language"


## Basic data types

Like most languages, Python has a number of basic types including integers, floats, booleans, and strings. These data types behave in ways that are familiar from other programming languages.



In [None]:
x = 3
print(type(x)) # Prints "<class 'int'>"
print(x)       # Prints "3"
print(x + 1)   # Addition; prints "4"
print(x - 1)   # Subtraction; prints "2"
print(x * 2)   # Multiplication; prints "6"
print(x ** 2)  # Exponentiation; prints "9"
x += 1
print(x)       # Prints "4"
x *= 2
print(x)       # Prints "8"
y = 2.5
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"

Note that unlike many languages, **Python** does not have unary increment **(x++)** or decrement **(x--)** operators. Python also has a built-in type for complex numbers; you can find all of the details in the [documentation](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-long-complex).
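
As a small illustration of the two points above: augmented assignment stands in for the missing increment and decrement operators, and complex literals use the `j` suffix. This is only a sketch; the variable names are illustrative.

```python
counter = 0
counter += 1           # Python's replacement for counter++
counter -= 1           # ... and for counter--
print(counter)         # Prints "0"

z = 3 + 4j             # Complex literal: real part 3, imaginary part 4
print(type(z))         # Prints "<class 'complex'>"
print(z.real, z.imag)  # Prints "3.0 4.0"
print(abs(z))          # Magnitude; prints "5.0"
```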

**Booleans**: Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (&&, ||, etc.):

In [None]:
t = True
f = False
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"
print(t != f)  # Logical XOR; prints "True" 

**Strings**: Python has great support for strings:


In [None]:
hello = 'hello'           # String literals can use single quotes
world = "world"           # or double quotes; it does not matter.
print(hello)              # Prints "hello"
print(len(hello))         # String length; prints "5"
hw = hello + ' ' + world  # String concatenation
print(hw)                 # Prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12)  # sprintf-style string formatting
print(hw12)               # Prints "hello world 12"
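
The `%` formatting above has two common alternatives: the `str.format()` method and f-strings, which are available from Python 3.6 onwards. A brief sketch of both, producing the same string as `hw12`:

```python
hello = 'hello'
world = 'world'

# str.format(): placeholders are filled in order
hw12_format = '{} {} {}'.format(hello, world, 12)

# f-string (Python 3.6+): expressions are embedded directly in the literal
hw12_fstring = f'{hello} {world} {12}'

print(hw12_format)   # Prints "hello world 12"
print(hw12_fstring)  # Prints "hello world 12"

# Format specifiers work in both styles, e.g. two decimal places
print(f'{2.5:.2f}')  # Prints "2.50"
```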

String objects have many useful methods; you can find a list of all string methods in the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).





In [None]:
s = "hello"
print(s.capitalize())  # Capitalize a string; prints "Hello"
print(s.upper())       # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7))     # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another;
                                # prints "he(ell)(ell)o"
print('  world '.strip())       # Strip leading and trailing whitespace; prints "world"

## Lists

A list is the Python equivalent of an array, but it is resizable and can contain elements of different types. You can read all about lists in the [documentation](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists).

In [None]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out areas
print(areas)

# Print out the type of areas
print(type(areas))

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Print out the area of the living room
print(areas[-5])

# Add two new elements to the end of the list
areas.append("laundry")
areas.append(8.75)

# Print out the new list
print(areas)

# Remove the first two elements of list
del areas[0:2]
print(areas)

# Remove the last two elements of list
del areas[-2:]
print(areas)
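
The `del` statements above rely on slice notation, `list[start:end]`, where `start` is included and `end` is excluded. Here is a short sketch of how slices select elements, using a fresh list (named `rooms` purely for illustration) so that `areas` is left untouched:

```python
rooms = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0]

print(rooms[0:2])   # First two elements; prints "['hallway', 11.25]"
print(rooms[-2:])   # Last two elements; prints "['living room', 20.0]"
print(rooms[:4])    # An omitted start means "from the beginning"
print(rooms[1::2])  # Every second element from index 1; prints "[11.25, 18.0, 20.0]"

# Slicing returns a new list, so the original is unchanged
print(len(rooms))   # Prints "6"
```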

**Loops**: You can loop over the elements of a list like this:

In [None]:
for area in areas:
    print(area)
    
# If you want access to the index of each element within the body of a loop,
# use the built-in enumerate function
for idx, area in enumerate(areas):
    print('#%d: %s' % (idx + 1, area))

# Keep only the numeric areas, which sit at the odd indices
areas_value = [area for idx, area in enumerate(areas) if idx % 2 == 1]
print(areas_value) 

In [None]:
# Take care when copying lists.
# Watch out! list() and [:] only make a shallow copy, so nested objects inside the list are still shared.
# Please see http://www.python-course.eu/deep_copy.php

print(areas_value)

# Plain assignment does not copy: y is just a second name for the same list
y = areas_value     # For an independent copy, use y = list(areas_value)
y[1] = 10.0
print(areas_value, y)  # Both names show the change
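
To make the warning above concrete, the sketch below contrasts three ways of "copying" a nested list. Plain assignment only adds a second name. `list()` and `[:]` create a shallow copy: the outer list is new, but the inner lists are still shared. `copy.deepcopy` from the standard library copies the inner lists too. The `nested` data is made up for illustration.

```python
import copy

nested = [["kitchen", 18.0], ["bedroom", 10.75]]

alias = nested                 # Same object under a second name
shallow = list(nested)         # New outer list, shared inner lists
deep = copy.deepcopy(nested)   # Fully independent copy

nested[0][1] = 99.0            # Change an inner list in place

print(alias[0][1])    # Prints "99.0" (same object)
print(shallow[0][1])  # Prints "99.0" (inner list is shared)
print(deep[0][1])     # Prints "18.0" (independent)
```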


## Functions and Packages

To leverage the code that brilliant Python developers have written, you'll learn about using functions, methods and packages. This will help you to reduce the amount of code you need to solve challenging problems!


In [None]:
# Maybe you already know the name of a Python function, but you still have to figure out how to use it. 
# Ironically, you have to ask for information about a function with another function: help(). 
# In IPython specifically, you can also use ? before the function name, as in ?max.
# To get help on the max() function, for example, you can use this call:

help(max)

In [None]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print out the index of the element 20.0
print(areas.index(20.0))

# Print out how often 14.5 appears in areas
print(areas.count(14.5))

# Reverse the orders of the elements in areas
areas.reverse()

# Print out areas
print(areas)

In [None]:
# Python functions are defined using the "def" keyword. For example:

def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))
# Prints "negative", "zero", "positive"

In [None]:
# We will often define functions to take optional keyword arguments, like this:

def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Ivanovitch') # Prints "Hello, Ivanovitch"
hello('Silva', loud=True)  # Prints "HELLO, SILVA!"
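
Functions can also return a value and carry a docstring, which is the text that `help()` displays. In this small sketch, `bmi` and its `digits` parameter are hypothetical names chosen for illustration.

```python
def bmi(weight, height, digits=1):
    """Return weight / height ** 2, rounded to `digits` decimal places."""
    return round(weight / height ** 2, digits)

print(bmi(50.3, 1.6))                # Prints "19.6"
print(bmi(50.3, 1.6, digits=3))      # Prints "19.648"
print(bmi(height=1.6, weight=50.3))  # Keyword arguments can be given in any order
print(bmi.__doc__)                   # The docstring that help(bmi) displays
```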


**Modules** are pieces of code that other people have written to fulfill common tasks, such as generating random numbers, performing mathematical operations, etc.

The basic way to use a module is to add **import module_name** at the top of your code, and then use **module_name.var** to access functions and values with the name **var** in the module.

For example, as a data scientist, some notions of geometry never hurt. Let's refresh some of the basics. For a fancy clustering algorithm, you want to find the circumference $C$ and area $A$ of a circle. When the radius of the circle is $r$, you can calculate $C$ and $A$ as:

\begin{align*}
C &= 2 \pi r \\
A &= \pi r^{2}
\end{align*}

In [None]:
# Import the math package
import math

# Definition of radius
r = 0.43

# Calculate C
C = 2 * math.pi * r

# Calculate A
A = math.pi * r ** 2

# Build printout
print("Circumference: " + str(C))
print("Area: " + str(A))

There is another kind of **import** that can be used if you only need certain functions from a module. These take the form **from module_name import var**, and the **var** can be used as if it were defined normally in your code. For example, to import only the **pi** constant from the previous example:

In [None]:
# Import only the pi constant from the math module
from math import pi

print(pi)

You can import a module or object under a different name using the **as** keyword. This is mainly used when a module or object has a long or confusing name. For example:

In [None]:
from math import sqrt as square_root

print(square_root(10))

## Numpy

Numpy (Numerical Python) is a Python package for efficient numerical computing, a core part of data science. Learn to work with the Numpy array, a faster and more powerful alternative to the list, and take your first steps in data exploration.

- Alternative Python List: Numpy Array
- Calculations over entire array
- Easy and fast

You can read all about numpy datatypes in the [documentation](https://docs.scipy.org/doc/numpy/reference/).

In [None]:
height = [1.60, 1.90, 1.77]
weight = [50.30, 88.00, 101.00]
ratio = weight/height ** 2    # Limitation of lists: raises TypeError, because lists do not support element-wise arithmetic

In [None]:
import numpy as np

np_height = np.array(height)
np_weight = np.array(weight)
np_ratio = np_weight/np_height**2
print(np_ratio)
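
For comparison, the same element-wise calculation with plain lists needs an explicit loop. This minimal sketch uses a list comprehension and `zip`, which pairs up the two lists:

```python
height = [1.60, 1.90, 1.77]
weight = [50.30, 88.00, 101.00]

# zip() yields (weight, height) pairs; the comprehension applies the formula to each
ratio = [w / h ** 2 for w, h in zip(weight, height)]

# Rounded for readability; prints "[19.65, 24.38, 32.24]"
print([round(r, 2) for r in ratio])
```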

In [None]:
# Numpy: Remarks
# First of all, Numpy arrays cannot contain elements with different types.
# If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array.
# This is known as type coercion.

x = np.array([1, "natal", False])  # All elements are converted to strings
print(x)

# Booleans are treated as 1 and 0 in arithmetic; prints "[4 5 2]"
print(np.array([True, 1, 2]) + np.array([3, 4, False]))

# + on lists concatenates them
y = height + weight
print(y)

# + on Numpy arrays adds element by element
w = np_height + np_weight
print(w)
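
One way to see type coercion at work is to inspect an array's `dtype` attribute. The sketch below checks `dtype.kind`, a one-letter code for the general type family, because the exact dtype name (for example the string width or the integer size) can differ between platforms.

```python
import numpy as np

mixed = np.array([1, "natal", False])
print(mixed.dtype.kind)    # "U": every element was converted to a Unicode string

numbers = np.array([True, 1, 2])
print(numbers.dtype.kind)  # "i": the boolean was converted to an integer

floats = np.array([1, 2.5])
print(floats.dtype)        # Prints "float64": the integer was converted to a float
```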


Boolean logic is the foundation of decision-making in your Python programs. Learn about different comparison operators, how you can combine them with boolean operators and how to use the boolean outcomes in control structures.
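
Here is a minimal sketch of those three ingredients in plain Python: a comparison produces a boolean, `and` combines two of them, and `if`/`elif`/`else` acts on the result. The `area` value and the size thresholds are made up for illustration.

```python
area = 14.5

print(area > 10)                # Comparison; prints "True"
print(area > 10 and area < 12)  # Combined with and; prints "False"

# Control structure driven by boolean outcomes
if area >= 18:
    size = "large"
elif area >= 12 and area < 18:
    size = "medium"
else:
    size = "small"

print(size)  # Prints "medium"
```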

In [None]:
#Numpy: Subsetting
print(np_height)

print(np_height > 1.7)

print(np_height[np_height > 1.7])

In [1]:
# 2D Numpy array 
# Create baseball, a list of lists (height,weight)
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D Numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball
print(np_baseball.shape)

# Print out the numpy 2D array
print(np_baseball)

<class 'numpy.ndarray'>
(4, 2)
[[ 180.    78.4]
 [ 215.   102.7]
 [ 210.    98.5]
 [ 188.    75.2]]


In [None]:
# Subsetting 2D Numpy Arrays
# If your 2D Numpy array has a regular structure, i.e. each row and column has a fixed number of values,
# complicated ways of subsetting become very easy.

# Print out the 4th row of np_baseball
print(np_baseball[3])

# Print out the entire second column of np_baseball
print(np_baseball[:,1])

# Print out height of 3rd player
print(np_baseball[2,0])
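
Once a column has been selected, Numpy functions can summarise it in a single call. This sketch uses `np.mean` and `np.max` on the same data, rebuilt here so the block runs on its own:

```python
import numpy as np

np_baseball = np.array([[180, 78.4],
                        [215, 102.7],
                        [210, 98.5],
                        [188, 75.2]])

heights = np_baseball[:, 0]  # First column: heights
weights = np_baseball[:, 1]  # Second column: weights

print(np.mean(heights))      # Average height; prints "198.25"
print(np.max(weights))       # Largest weight; prints "102.7"

# Boolean subsetting also selects whole rows: players taller than 200
print(np_baseball[heights > 200])
```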

# Exercises

### Exercise 1

Using comparison operators, generate boolean arrays that answer the following questions:

- Which areas in my_house are greater than or equal to 18?
- You can also compare two Numpy arrays element-wise. Which areas in my_house are smaller than the ones in your_house?
   
Make sure to wrap both commands in a print() call, so that you can inspect the output.

In [None]:
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than or equal to 18
# TODO

# my_house less than your_house
# TODO

### Exercise 2

Before, comparison operators like **<** and **>=** worked with **Numpy arrays** out of the box. Unfortunately, this is not true for the boolean operators **and**, **or**, and **not**.

To use these operators with Numpy, you will need **np.logical_and()**, **np.logical_or()** and **np.logical_not()**. Here's an example on the your_house array from before to give you an idea:

np.logical_and(your_house > 13, 
               your_house < 15)
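
Put together as a runnable sketch, the example above looks like this. The last lines show why plain `and` is not an option: Python asks the whole array for a single truth value, which Numpy refuses with a `ValueError`.

```python
import numpy as np

your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Element-wise AND of two boolean arrays
between = np.logical_and(your_house > 13, your_house < 15)
print(between)  # Prints "[ True False  True False]"

# The built-in and operator cannot combine arrays element-wise
try:
    (your_house > 13) and (your_house < 15)
except ValueError as err:
    print("ValueError:", err)
```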

  
Generate boolean arrays that answer the following questions:

 - Which areas in my_house are greater than 18.5 or smaller than 10?
    
 - Which areas are smaller than 11 in both my_house and your_house? 
    
Make sure to wrap both commands in a print() call, so that you can inspect the output.

In [None]:
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
# TODO

# Both my_house and your_house smaller than 11
# TODO

### Exercise 3

- Adapt the for loop in the sample code to use **enumerate()**. 
- On each run, a line of the form **"room x: y"** should be printed, where **x** is the index of the list element and **y** is the actual list element, i.e. the area. Make sure to print out this exact string, with the correct spacing.



In [None]:
# areas list
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# TODO: Change for loop to use enumerate()
for a in areas:
    print(a)

### Exercise 4

Write a for loop that goes through each sublist of house and prints out the **x is y sqm**, where **x** is the name of the room and **y** is the area of the room.

In [None]:
# house list of lists
house = [["hallway", 11.25], 
         ["kitchen", 18.0], 
         ["living room", 20.0], 
         ["bedroom", 10.75], 
         ["bathroom", 9.50]]
         
# Build a for loop from scratch
# TODO