## The basics

- Indentation, not braces. Python uses whitespace to structure code instead of the braces used in many other languages, such as R.

- A colon denotes the start of an indented code block

- Python statements do not need to be terminated by semicolons. Semicolons can, however, be used to separate multiple statements on a single line.

- Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own "box", which is referred to as a Python object.

- Comments start with the hash mark `#`; everything after it on the line is ignored by the interpreter.

- Functions are called using parentheses and passing arguments

- Almost every object in Python has methods that have access to the object's internal contents. 

- Functions can take both positional and keyword arguments

- Assignment creates a reference, not a copy. After `b = a`, both names point to the same object, so any in-place change made through `a` is also visible through `b`.

In [3]:
a = [1,2,3]

b=a

a.append(4)

b

[1, 2, 3, 4]
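When an independent copy is wanted instead of a second reference, one common option is to build a new list with `list(a)`. The sketch below contrasts the two cases; the names `c`, `same_object`, and `copy_is_separate` are only illustrative and do not appear in the notes above.

```python
a = [1, 2, 3]
b = a            # b is a second name for the same list object
c = list(a)      # c is a new list holding the same elements (a shallow copy)

a.append(4)      # modifies the one object shared by a and b

same_object = b is a        # True: both names refer to one object
copy_is_separate = c is a   # False: c is a different object

# b now shows the appended element, while c still holds the original three
```

Note that `list(a)` makes a shallow copy: the new list is separate, but any mutable objects stored inside it are still shared.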

In [4]:
# To check whether an object is an instance of a particular type, use isinstance

a=5
isinstance(a,int)

True

In [9]:
a = 'foo'

a.format

<function format>

- A **module** is simply a .py file containing function and variable definitions along with such things imported from other .py files.

- Most objects in Python are **mutable**, i.e. the object or values that they contain can be changed. Lists, dicts, NumPy arrays, and most user-defined types (classes) are mutable. Tuples and strings are not.

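- VERY IMPORTANT (Python 2 only). Dividing two integers with `/` performs integer division by default, so `3 / 2` gives `1`. To get floating point division, include the line `from __future__ import division` at the top of the file. In Python 3, `/` always performs floating point division.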
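As a small illustration of the division point, the sketch below uses two forms that behave the same way with or without the `__future__` import: `//` always performs floor division, and `/` performs floating point division as soon as one operand is a float. The variable names are illustrative only.

```python
whole = 7 // 2     # floor division: the result is the integer 3
exact = 7 / 2.0    # one operand is a float, so the result is 3.5
```

Writing `//` explicitly when integer division is intended also makes the code behave identically on Python 2 and Python 3.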

In [11]:
c = """ 
This is a longer line that spans
multiple
lines"""

c

' \nThis is a longer line that spans\nmultiple\nlines'

Numbers can be converted to strings using the `str()` function.

In [15]:
a = 5.6
s = str(a)

s

'5.6'

The backslash character `\` is an escape character, used to specify special characters like newlines or Unicode characters. To write a literal backslash, escape it with a second backslash.

In [19]:
s = '12\\3'
print s

12\3
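For strings with many backslashes, a raw string literal (prefix `r`) is an alternative to doubling every backslash. A minimal sketch, with illustrative variable names:

```python
escaped = '12\\3'   # backslash escaped by doubling it
raw = r'12\3'       # raw string: the backslash is taken literally

same = (escaped == raw)   # both spell the same four-character string
length = len(raw)         # the characters are 1, 2, a backslash, and 3
```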


In [20]:
# Add two strings together (concatenation)
a= 'this is the first half '
b='and this is the 2nd half'

a+b

'this is the first half and this is the 2nd half'

In [21]:
type(a+b)

str

### Exception Handling

Handling Python errors or exceptions gracefully is an important part of building robust programs. In apps, many functions only work on certain kinds of input. 

For example, suppose we want a version of `float` that fails gracefully, returning the input argument unchanged. We can do this by writing a function that encloses the call to `float` in a `try/except` block. The code in the `except` part of the block is executed only if `float(x)` raises an exception.

In [24]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x
    
print attempt_float('hah')
print attempt_float(212)

hah
212.0
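A bare `except:` catches every exception, including ones unrelated to the conversion. A possible refinement, sketched below, is to catch only the two exception types that `float` raises for bad input: `ValueError` for strings that are not numbers and `TypeError` for unsupported types such as tuples. The result variable names are illustrative.

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        # ValueError: a string like 'hah' that is not a number
        # TypeError: an unsupported type such as a tuple
        return x

from_string = attempt_float('1.2345')   # converted to a float
not_a_number = attempt_float('hah')     # returned unchanged
wrong_type = attempt_float((1, 2))      # returned unchanged
```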


- A **tuple** is a 1D, fixed-length, immutable sequence of Python objects. The easiest way to create one is with a comma-separated sequence of values. When defining tuples in more complicated expressions, it's often necessary to enclose the values in parentheses.

In [31]:
tup = 4,5,6
print tup

tup2 = (4,5,6), (7,8)
print tup2

# convert objects to tuple using tuple

print tuple([4,0,2])

(4, 5, 6)
((4, 5, 6), (7, 8))
(4, 0, 2)
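To make "immutable" concrete, the sketch below shows that assigning to a tuple slot raises a `TypeError`, while a mutable object stored inside the tuple can still be changed in place. It also shows unpacking a tuple into separate names. All names here are illustrative.

```python
tup = ('foo', [1, 2], True)

try:
    tup[2] = False              # tuples do not support item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True

tup[1].append(3)                # the list inside the tuple is still mutable

a, b, c = tup                   # unpacking: one name per element
```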


- A **list** is variable-length and its contents are mutable. Lists are defined using square brackets [] or using the `list` type function.

- Elements can be added to the end of the list with the `append` method. Using `insert` we can insert an element at a specific location (index) in the list

- The opposite operation to `insert` is `pop`, which removes and returns an element at a specific index. Elements can be removed by value using `remove`, which locates the first such value and removes it from the list.

In [33]:
b_list = list(('foo','bar','baz'))

b_list[1] = 'peekaboo'

print b_list

['foo', 'peekaboo', 'baz']


In [35]:
b_list.append('dwarf')

b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [37]:
b_list.insert(0,'red')

b_list

['red', 'foo', 'peekaboo', 'baz', 'dwarf']

In [40]:
b_list.pop(3)
b_list

['red', 'foo', 'peekaboo', 'dwarf']

In [42]:
b_list.remove('red')
b_list

['foo', 'peekaboo', 'dwarf']

In [43]:
'dwarf' in b_list

True

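Using **extend** to append multiple elements to an existing list is preferable to concatenating lists with `+`, especially when building a large list, because concatenation creates a new list and copies the objects over each time.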

In [49]:
list_of_list = ((1,2,3,4,5),(1,2,3))

everything = []
for chunk in list_of_list:
    everything.extend(chunk)
    
everything

[1, 2, 3, 4, 5, 1, 2, 3]

**enumerate** allows us to keep track of the index of the current item. When indexing data, a useful pattern that uses enumerate is computing a `dict` mapping the values of a sequence to their locations in the sequence

In [50]:
some_list = ['foo','bar','baz']

mapping = dict((v,i) for i, v in enumerate(some_list))
mapping

{'bar': 1, 'baz': 2, 'foo': 0}

In [52]:
mapping = dict((i,v) for i,v in enumerate(some_list))
mapping

{0: 'foo', 1: 'bar', 2: 'baz'}

**zip** pairs up the elements of a number of lists, tuples, or other sequences to create a list of tuples. The number of elements it produces is determined by the shortest sequence.

In [53]:
seq1 = ['foo','bar','baz']
seq2 = ['one','two','three']

zip(seq2, seq1)

[('one', 'foo'), ('two', 'bar'), ('three', 'baz')]

In [54]:
seq3 = [False, True]

zip(seq2, seq1, seq3)

[('one', 'foo', False), ('two', 'bar', True)]
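A common use of `zip` is iterating over several sequences at once, possibly combined with `enumerate`. The sketch below collects formatted strings into a list named `lines` (an illustrative name) rather than printing them, so it runs the same way on Python 2 and Python 3.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']

lines = []
for i, (a, b) in enumerate(zip(seq1, seq2)):
    # i is the position; a and b are the paired elements
    lines.append('%d: %s, %s' % (i, a, b))
```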

Given a "zipped" sequence, `zip` can also be applied to "unzip" it, converting a list of rows into a list of columns.

The `*` in a function call unpacks a sequence into separate positional arguments, so `zip(*seq)` is equivalent to the following:

`zip(seq[0], seq[1], ..., seq[len(seq) - 1])`

In [58]:
pitchers = [('Nolan','Ryan'), ('Roger', 'Clemens'), ('Schilling','Curt')]

first_names, last_names = zip(*pitchers)

first_names

('Nolan', 'Roger', 'Schilling')

In [75]:
# reversed iterates over the elements of a sequence in reverse order

list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

### Dict

A dict is also known as a hash map or associative array. It's a collection of key-value pairs where key and value are Python objects. One way to create one is by using curly brackets {} and using colons to separate keys and values.

- Elements can be accessed, inserted, or set using the same syntax as accessing elements of a list. You can check whether a dict contains a key with `in`.

- The `keys` and `values` methods give you lists of the keys and values.

- Keys of a dict have to be hashable, which in practice means immutable objects like scalars (int, float, string) or tuples whose elements are all immutable.

In [60]:
d1 = {'a' : 'some value', 'b' : [1,2,3,4], 7: 'an int'}

d1

{7: 'an int', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [71]:
d1.keys()

['a', 'b', 7]

In [62]:
print d1[7]
print d1['a']

an int
some value


In [63]:
7 in d1

True

In [65]:
d1[5] = 'some value'

d1

{5: 'some value', 7: 'an int', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [69]:
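d1['dummy'] = 'another value'

del d1[5]  # delete the key 5 that was added in the previous cell

ret = d1.pop('dummy')  # pop removes the key and returns its value
ret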

'another value'

In [70]:
d1

{7: 'an int', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [73]:
# Add to a dictionary or update existing keys/values in place using the update method

d1.update({'b':[1,2,3], 'new': 'haha'})

d1

{7: 'an int', 'a': 'some value', 'b': [1, 2, 3], 'new': 'haha'}
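Whether an object can serve as a dict key can be checked with the built-in `hash` function. The helper `is_hashable` below is a hypothetical convenience written for this sketch, not something from the notes above.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key (hypothetical helper)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

string_ok = is_hashable('a string')
tuple_ok = is_hashable((1, 2, (2, 3)))
tuple_with_list = is_hashable((1, 2, [2, 3]))   # fails because of the inner list

d = {}
d[tuple([1, 2, 3])] = 5   # convert a list to a tuple to use it as a key
```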

#### Default Values

It's very common to look up a key in a dict and fall back to a default value if the key does not exist:

In [None]:
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
    
# The same lookup, written more concisely with the get method
value = some_dict.get(key, default_value)

In [77]:
words = ['apple','bat','bar','atom','book']

by_letter = {}

for word in words: # for each word
    letter=word[0] # letter is the first letter of the word
    if letter not in by_letter: # if the first letter is not yet a key in the dict
        by_letter[letter] = [word] # start a new list for that letter containing the word
    else:
        by_letter[letter].append(word) # otherwise append the word to the letter's existing list
        
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

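The `setdefault` dict method exists for exactly this purpose: it returns the value stored under the key, first inserting the given default (here an empty list `[]`) if the key is missing. The loop above can then be rewritten as: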

In [79]:
# Rewritten

words = ['apple','bat','bar','atom','book']

by_letter = {}

for word in words: # For each word
    letter=word[0] # letter is the first letter of the word
    by_letter.setdefault(letter, []).append(word)
    
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
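Another option for the same grouping task is `collections.defaultdict` from the standard library, which creates the default value automatically the first time a missing key is accessed. A minimal sketch, where `result` is an illustrative name:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

by_letter = defaultdict(list)   # missing keys start out as an empty list
for word in words:
    by_letter[word[0]].append(word)

result = dict(by_letter)        # plain dict, for display or comparison
```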

### Set

A set is an unordered collection of unique elements. A set can be created via the `set` function or using a set literal with curly braces { }. Sets support mathematical set operations like union, intersection, difference, and symmetric difference.

In [80]:
set ([2,2,2,1,1,1,1,3,3,3,3,4])

{1, 2, 3, 4}

In [81]:
{2,2,2,2,3,3,3,3,4,4,4,4,1}

{1, 2, 3, 4}

In [82]:
a = {1,2,3,4,5}
b= {3,4,5,6,7,8}

a | b # union (or)

{1, 2, 3, 4, 5, 6, 7, 8}

In [83]:
a & b # intersection

{3, 4, 5}

In [84]:
a - b # difference

{1, 2}

In [85]:
a ^ b # symmetric difference (xor)

{1, 2, 6, 7, 8}
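Sets also provide methods for testing how two sets relate to each other. The sketch below uses `issubset` and `issuperset`, and shows that two sets compare equal when their contents match, regardless of order. Variable names are illustrative.

```python
a_set = {1, 2, 3, 4, 5}

subset_check = {1, 2, 3}.issubset(a_set)       # every element is in a_set
superset_check = a_set.issuperset({1, 2, 3})   # a_set contains every element
equal_check = {1, 2, 3} == {3, 2, 1}           # order does not matter
```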

### List, Set, and Dict Comprehensions

**List comprehensions** are one of the most popular Python language features. They allow us to form a new list by filtering the elements of a collection and transforming the elements that pass the filter in one statement. They take the form:

`[expr for val in collection if condition]`

In [None]:
# Equal to:

result = []
for val in collection:
    if condition:
        result.append(expr)

In [87]:
# The filter condition can be omitted, leaving only the expression.
# Here, given a list of strings, we filter out strings with length 2 or less
# and convert the remaining ones to uppercase

strings = ['a','as','bat','car','dove','python']

[x.upper() for x in strings if len(x) > 2]  # uppercase each string whose length is greater than 2

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dict comprehensions are a natural extension, producing sets and dicts in a similar way instead of lists. A dict comprehension looks like this:

`dict_comp = {key-expr: value-expr for value in collection if condition}`

In [93]:
# dict comp

loc_mapping = {index : val for index, val in enumerate(strings)}  # enumerate includes index and string value

loc_mapping

{0: 'a', 1: 'as', 2: 'bat', 3: 'car', 4: 'dove', 5: 'python'}
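A set comprehension looks like a list comprehension with curly braces instead of square brackets. As a small sketch, the code below collects the distinct string lengths found in the same `strings` list; `unique_lengths` is an illustrative name.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# 'bat' and 'car' both have length 3, which appears only once in the set
unique_lengths = {len(x) for x in strings}
```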

In [97]:
# Nested list comprehensions 

all_data = [['Tom','Billy','Jeff','Andy'],
            ['Susie','Laura','Annie','Jess']]

names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >=1]  # get names that have at least one 'e'
    names_of_interest.extend(enough_es)
    
names_of_interest

# Two levels of iteration: the outer for loop walks over each sublist,
# and the list comprehension walks over the names within that sublist

['Jeff', 'Susie', 'Annie', 'Jess']
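The loop plus comprehension above can also be collapsed into a single nested list comprehension. The `for` clauses are written in the same order as the equivalent nested loops, with the filter condition at the end. A sketch using the same data:

```python
all_data = [['Tom', 'Billy', 'Jeff', 'Andy'],
            ['Susie', 'Laura', 'Annie', 'Jess']]

# outer loop over sublists first, inner loop over names second
names_of_interest = [name for names in all_data for name in names
                     if name.count('e') >= 1]
```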

### Functions

Functions are the primary and most important method of code organization and reuse in Python. Most programmers doing data analysis don't write enough functions. Functions are declared using the `def` keyword, and the `return` keyword is used to return a value from them.

- Can have multiple return statements
- If the end of a function is reached without encountering a return statement, None is returned. 
- Keyword arguments must follow any positional arguments of a function. Keyword arguments themselves can be given in any order.
- Any variables assigned within a function, by default, are part of the local namespace. After the function is finished, the local namespace is destroyed.
- Assigning global vars within a function is possible, but those vars must be declared as a global using the `global` keyword
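The points about positional and keyword arguments can be sketched with a small function. Here `my_function` and the result names are hypothetical, chosen only for illustration: `x` and `y` are positional arguments, and `z` is a keyword argument with a default value.

```python
def my_function(x, y, z=1.5):
    # multiple return statements are allowed
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

r1 = my_function(5, 6, z=0.7)     # keyword argument after the positional ones
r2 = my_function(3.14, 7, 3.5)    # z can also be passed by position
r3 = my_function(y=6, x=5)        # keywords in any order; z uses its default
```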

In [105]:
a = []  # a global list; func below never touches it

def func():  # upon calling func()
    a = []  # a new, local empty list is created
    for i in range(5):  # 5 elements are appended to the local list
        a.append(i)
    # the local a is destroyed when the function exits

print func()
print a

None
[]


In [106]:
a = []
def func():
    for i in range(5):
        a.append(i)
        
print a  # func() has not been called yet, so a is still empty; after func() it would be [0, 1, 2, 3, 4]

[]


In [114]:
# Global vars

a = None

def bind_a_var():
    global a
    a = ['boo']
    
print bind_a_var()
print a

None
['boo']


In [121]:
# Returning multiple values

def f():
    a = 5
    b = 6
    c = 7
    return a,b,c # returns a tuple

print f() # a tuple
a,b,c = f()

print b

(5, 6, 7)
6


In [123]:
# Return a dictionary

def f():
    a = 5
    b = 6
    c = 7
    return {'a':a,'b':b,'c':c}

print f()

{'a': 5, 'c': 7, 'b': 6}


#### Functions as objects

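Because Python functions are objects, they can be stored in lists and passed to other functions. This is handy for data cleaning: anyone who's worked with user-submitted survey data can expect messy results that need whitespace stripping, punctuation removal, proper capitalization, and so on.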

In [127]:
import re # import regular expression module

def remove_punctuation(value):
    return re.sub('[!#?]', '',value) # replace each !, #, or ? with an empty string

clean_ops = [str.strip, remove_punctuation, str.title]  # title = capitalize each word

def clean_strings(strings, ops):
    result =[]
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

comments = ['  Good', ' Bad!', ' #OkAy dokey']

clean_strings(comments, clean_ops)

['Good', 'Bad', 'Okay Dokey']

This is a more functional pattern, which enables us to easily modify how strings are transformed. The `clean_strings` function is also more reusable. We can use functions as arguments to other functions.

We can use the built-in `map` function, which applies a function to each element of a collection:

In [129]:
print map(remove_punctuation, comments)
print map(str.strip, comments)

['  Good', ' Bad', ' OkAy dokey']
['Good', 'Bad!', '#OkAy dokey']
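A list comprehension is a common alternative to `map`. The sketch below shows both forms side by side. Note that on Python 3 `map` returns a lazy iterator rather than a list, so the result is wrapped in `list()` here to make the code behave the same on both versions. The names `via_map` and `via_comprehension` are illustrative.

```python
import re

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

comments = ['  Good', ' Bad!', ' #OkAy dokey']

via_map = list(map(remove_punctuation, comments))   # list() is needed on Python 3
via_comprehension = [remove_punctuation(x) for x in comments]
```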


### Lambda Functions

Lambda functions are really just simple functions consisting of a single expression, the result of which is the return value. They are defined using the `lambda` keyword, which has no meaning other than "we are declaring an unnamed (anonymous) function".

They are very convenient in data analysis because there are many cases where data transformation functions take functions as arguments. It is often clearer, and less typing, to pass a lambda than to write a full function declaration.

In [131]:
def short_func(x):
    return x * 2

# equivalent

a = lambda x: x*2
a(2)

4

In [133]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4,0,1,5,6]
apply_to_list(ints, lambda x: x * 2) # for each element in list, apply x * 2

[8, 0, 2, 10, 12]

In [139]:
# We want to sort a collection of strings by the number of distinct letters in each string

strings = ['foo','card','bar','aaaa','abab','abab']
print list(strings)
print set(list(strings))
strings.sort(key=lambda x: len(set(list(x))))

print strings

['foo', 'card', 'bar', 'aaaa', 'abab', 'abab']
set(['aaaa', 'abab', 'foo', 'bar', 'card'])
['aaaa', 'foo', 'abab', 'abab', 'bar', 'card']
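When the original list should stay untouched, the built-in `sorted` function accepts the same `key` argument and returns a new sorted list. A small sketch with an illustrative `by_distinct` name; `set(x)` is applied directly to each string, since a string is already a sequence of characters.

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

# sort by the number of distinct letters; ties keep their original order
by_distinct = sorted(strings, key=lambda x: len(set(x)))
```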
