# Python Language Essentials

A brief overview of the Python language, from basic syntax to functions and object-oriented concepts.

Overview:
* The Python interpreter
* The basics
* Data structures and sequences
* Functions
* Files and the operating system

# The Python interpreter

Python is an interpreted language. The Python interpreter runs a program by executing
one statement at a time.

# The Basics

## Language Semantics
- Python uses indentation instead of braces to delimit blocks, and a statement does not need a semicolon at the end of the line. A semicolon is only needed to separate multiple statements written on the same line:

In [3]:
a = 1; b = 2
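To make the indentation rule concrete, here is a small sketch (the function name `classify` is just an illustration): the colon at the end of each `def`, `for`, `if`, and `else` line opens a block, and that block consists of the lines indented beneath it.

```python
def classify(values):
    # Every line indented under "def" belongs to the function body.
    labels = []
    for v in values:
        # This deeper level is the body of the for loop.
        if v > 0:
            labels.append('positive')
        else:
            labels.append('non-positive')
    # Back at the function-body level: runs once, after the loop ends.
    return labels

print(classify([3, -1, 0]))  # ['positive', 'non-positive', 'non-positive']
```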

- Everything is an object: numbers, strings, data structures, functions, classes, and modules are all Python objects.
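A quick way to see this: numbers, strings, and even functions all have a type, and a function can be stored in a list and called from there like any other value. The `shout` function and the `handlers` list below are made-up examples.

```python
def shout(text):
    return text.upper() + '!'

# Each value, including the function itself, is an object with a type.
print(type(5))
print(type('hello'))
print(type(shout))

# Functions can be put in containers and called from there.
handlers = [shout, len]
results = [h('hi') for h in handlers]
print(results)  # ['HI!', 2]
```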

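- Mutable objects, such as lists and dicts, can be modified in place. Assigning a list to another variable does not copy it; both names refer to the same list: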

In [12]:
a = [1, 2, 3]
b = a

In [13]:
b[2] = 5
a

[1, 2, 5]

- Immutable objects, such as strings and tuples, cannot be modified after they are created.
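A short sketch of what immutability means in practice: item assignment on a tuple or a string raises a `TypeError`, and "changing" a string really means building a new one.

```python
tup = (1, 2, 3)
try:
    tup[0] = 10          # tuples do not support item assignment
except TypeError as e:
    tuple_error = str(e)

s = 'python'
try:
    s[0] = 'P'           # neither do strings
except TypeError as e:
    string_error = str(e)

# "Changing" an immutable object really means creating a new one.
s_capitalized = 'P' + s[1:]
print(s_capitalized)  # Python
print(s)              # python (the original is untouched)
```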

- Passing a variable to a function passes a reference to the object, not a copy of its value, so a function can modify a mutable argument in place:

In [14]:
def append_element(some_list, element):
    some_list.append(element)
    
data = [1, 2, 3]
append_element(data, 4)
data

[1, 2, 3, 4]

- Strongly typed: unlike JavaScript and some other languages, Python does not implicitly convert between unrelated types such as numbers and strings, so adding an int to a str raises a TypeError:

In [15]:
5 + '5'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

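Use **isinstance** to check an object's type; it also accepts a tuple of types and returns True if the object matches any of them:

In [22]: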
isinstance(1, (int, float, str))

True
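Since Python will not guess, conversions are done explicitly with `int()`, `float()`, or `str()`. Mixing an int with a float is the one common case where a numeric promotion happens automatically, as this sketch shows:

```python
as_number = int('5') + 5     # explicit string-to-int conversion: 10
as_text = str(5) + '5'       # explicit int-to-string conversion: '55'
mixed = 5 + 2.5              # the int is promoted to float: 7.5

print(as_number)                 # 10
print(as_text)                   # 55
print(mixed)                     # 7.5
print(isinstance(mixed, float))  # True
```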

## Scalar Types

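* numbers: int and float (Python 2 also has long for arbitrarily large integers)

* string

A string can be converted to a list of its characters with **list**, and a substring can be taken with slice notation: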

In [24]:
s = "dadasdajveuc"
list(s)

['d', 'a', 'd', 'a', 's', 'd', 'a', 'j', 'v', 'e', 'u', 'c']

In [30]:
s[:10]

'dadasdajve'

* boolean: True and False

* datetime: the built-in datetime module provides the datetime, date, and time types

In [31]:
from datetime import datetime, date, time

In [32]:
dt = datetime(2011, 10, 29, 20, 30, 21)
dt

datetime.datetime(2011, 10, 29, 20, 30, 21)

The **strftime** method formats a datetime as a string:

In [35]:
dt.strftime('%d/%m/%Y %H:%M')

'29/10/2011 20:30'

To parse a string into a datetime object, use **strptime**:

In [37]:
datetime.strptime('20091031', '%Y%m%d')

datetime.datetime(2009, 10, 31, 0, 0)
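Date arithmetic works through `timedelta`: adding one to a datetime shifts it, and subtracting two datetimes gives one back. The sketch below also uses `date()`, `time()`, and `replace()`, which are standard `datetime` methods not shown above.

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)

later = dt + timedelta(days=1, hours=2)  # shift forward by 26 hours
print(later)                             # 2011-10-30 22:30:21

delta = later - dt                       # the difference is a timedelta
print(delta.total_seconds())             # 93600.0

print(dt.date())                         # 2011-10-29
print(dt.time())                         # 20:30:21
print(dt.replace(minute=0, second=0))    # 2011-10-29 20:00:00
```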

## Control flow

- if, elif, and else

In [40]:
x = -1
if x < 0:
    print "It's negative"

It's negative


In [42]:
if x < 0:
    print "It's negative"
elif x == 0:
    print 'Equal to zero'
elif 0 < x < 5:
    print 'Positive but smaller than 5'
else:
    print 'Positive and larger than or equal to 5'

It's negative


- for loops iterate over a collection; **continue** skips the rest of the current iteration:

In [44]:
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value
total

12

- while loops repeat their block as long as the condition is true, or until the loop is exited with **break**
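A small sketch of a while loop (the numbers are arbitrary): it keeps halving `x` and adding it to a running total, and `break` ends the loop early once the total passes 500.

```python
x = 256
total = 0
while x > 0:
    if total > 500:
        break          # leave the loop early once the total passes 500
    total += x
    x = x // 2         # integer division halves x; the loop also ends if x hits 0
print(total)  # 504
print(x)      # 4
```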

- pass

pass is the “no-op” statement in Python. It can be used in blocks where no action is to
be taken; it is only required because Python uses whitespace to delimit blocks, and a block cannot be empty.
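For instance, `pass` can hold the place of a branch that has no logic yet. In this sketch (the function `describe` is hypothetical), the `x == 0` branch does nothing, so the function simply falls off the end and returns None.

```python
def describe(x):
    if x < 0:
        return 'negative'
    elif x == 0:
        # Nothing to do yet; an empty block is a syntax error, so pass fills it.
        pass
    else:
        return 'positive'

print(describe(-2))  # negative
print(describe(0))   # None
print(describe(3))   # positive
```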

- Exception handling: code that may raise an error goes in a **try** block, and the error is handled in an **except** block
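A minimal sketch of try/except, assuming we want to convert values to float but fall back to the original value when conversion fails. The second function (a hypothetical `safe_divide`) adds a `finally` clause, whose body runs whether or not an exception occurred.

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        # float('something') raises ValueError; float((1, 2)) raises TypeError
        return x

print(attempt_float('1.5'))        # 1.5
print(attempt_float('something'))  # something

cleanup_log = []

def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None
    finally:
        # Runs on every call, even after a return inside try or except.
        cleanup_log.append('done')

print(safe_divide(6, 3))
print(safe_divide(1, 0))  # None
```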

- range and xrange: in Python 2, range returns a list of evenly spaced integers

In [45]:
range(10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [48]:
range(0, 20, 2)

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

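For large ranges, **xrange** is recommended: instead of building the entire list in memory, it returns a lazy sequence object that produces each integer only when the loop asks for it. (In Python 3, range behaves this way and xrange no longer exists.)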

In [50]:
total = 0
for i in xrange(10000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i
total

23331668

- Ternary expressions: `value = true-expr if condition else false-expr`

In [54]:
a = 4
result = 1 if a > 3 else a
result

1

# Data Structures and Sequences

Python’s data structures are simple, but powerful. Mastering their use is a critical part
of becoming a proficient Python programmer.

## Tuple

A tuple is a **one-dimensional**, **fixed-length**, **immutable** sequence of Python objects.

In [55]:
tup = 4, 5, 6
tup

(4, 5, 6)

In [57]:
nested_tup = (4,5,6), (7,8)
nested_tup

((4, 5, 6), (7, 8))

Unpacking a tuple assigns each element to a variable, including the elements of nested tuples:

In [58]:
tup = (2, 3, (4,5))
a,b,(c,d) = tup

In [59]:
d

5
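Unpacking also gives a concise way to swap variables and to iterate over sequences of tuples. A short sketch (the data is arbitrary):

```python
a, b = 1, 2
b, a = a, b      # swap without a temporary variable
print(a)  # 2
print(b)  # 1

# Unpacking works directly in a for loop over a sequence of tuples.
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
sums = []
for x, y, z in seq:
    sums.append(x + y + z)
print(sums)  # [6, 15, 24]
```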

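Tuple methods: because tuples are immutable, they have few methods. **count** returns the number of occurrences of a value: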

In [61]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

## List

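Unlike tuples, lists are **variable-length** and **mutable**: their contents can be modified in place. A list can be written with square brackets or created from another sequence with **list**: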

In [62]:
a_list = [2, 3, 7, None]

In [63]:
tup = ('foo', 'bar', 'baz')

In [65]:
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

Adding elements: **append** adds an element to the end of a list, and **insert** adds one at a specific position:

In [66]:
b_list.append('dwarf')
b_list

['foo', 'bar', 'baz', 'dwarf']

In [67]:
b_list.insert(3, "more")
b_list

['foo', 'bar', 'baz', 'more', 'dwarf']
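Elements can also be removed: `pop` removes and returns the element at an index, while `remove` deletes the first occurrence of a value. This sketch continues from the list above and also uses `in` and `extend`:

```python
b_list = ['foo', 'bar', 'baz', 'more', 'dwarf']

removed = b_list.pop(2)     # remove and return the element at index 2
print(removed)              # baz

b_list.remove('foo')        # remove the first occurrence of a value
print(b_list)               # ['bar', 'more', 'dwarf']

print('dwarf' in b_list)    # True

b_list.extend([7, 8])       # append several elements at once
print(b_list)               # ['bar', 'more', 'dwarf', 7, 8]
```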

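Sorting: **sort** sorts a list in place. The optional key argument is a function used to compute the sort key for each element, for example sorting strings by length: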

In [70]:
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [72]:
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']

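Binary search and maintaining a sorted list: the built-in **bisect** module works on lists that are already sorted. **bisect.bisect** returns the position where a value should be inserted to keep the list sorted: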

In [73]:
import bisect

In [74]:
c = [1, 2, 2, 2, 3, 4, 7]

In [75]:
bisect.bisect(c, 2)

4

In [76]:
bisect.bisect(c, 5)

6
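`bisect.bisect` only reports a position; `bisect.insort` actually inserts the value there. Note that the bisect functions do not check whether the list is sorted, so calling them on an unsorted list silently gives meaningless results.

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]
bisect.insort(c, 6)   # insert 6 where bisect.bisect(c, 6) says it belongs
print(c)              # [1, 2, 2, 2, 3, 4, 6, 7]
```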

Slicing: `seq[start:stop]` selects the elements from index start up to, but not including, index stop:

In [79]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]
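Either bound can be omitted, negative indices count from the end, and an optional third value sets the step. A sketch on the same list:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

print(seq[:5])    # [7, 2, 3, 7, 5]   start omitted: from the beginning
print(seq[3:])    # [7, 5, 6, 0, 1]   stop omitted: to the end
print(seq[-4:])   # [5, 6, 0, 1]      negative indices count from the end
print(seq[::2])   # [7, 3, 5, 0]      a step of 2 takes every other element
print(seq[::-1])  # [1, 0, 6, 5, 7, 3, 2, 7]  a step of -1 reverses

seq_copy = seq[:]         # a full slice makes a shallow copy
seq_copy[3:4] = [6, 3]    # assigning to a slice replaces that section
print(seq_copy)           # [7, 2, 3, 6, 3, 5, 6, 0, 1]
```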

## Built-in Sequence Functions

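### enumerate

**enumerate** yields (index, value) pairs while iterating over a sequence. Here it is used to build a dict mapping each value to its position: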

In [80]:
some_list = ['foo', 'bar', 'baz']
mapping = dict((v, i) for i, v in enumerate(some_list))
mapping

{'bar': 1, 'baz': 2, 'foo': 0}
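The more common use of enumerate is directly in a for loop, to get the index alongside each value without keeping a counter by hand. An optional second argument sets the starting number:

```python
some_list = ['foo', 'bar', 'baz']

pairs = []
for i, value in enumerate(some_list):
    pairs.append((i, value))
print(pairs)     # [(0, 'foo'), (1, 'bar'), (2, 'baz')]

numbered = list(enumerate(some_list, 1))  # start counting from 1
print(numbered)  # [(1, 'foo'), (2, 'bar'), (3, 'baz')]
```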

### sorted
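**sorted** returns a new sorted list from the elements of any sequence, leaving the original untouched (unlike the in-place `list.sort`). It accepts the same `key` and `reverse` arguments as `sort`:

```python
print(sorted([7, 1, 2, 6, 0, 3, 2]))  # [0, 1, 2, 2, 3, 6, 7]
print(sorted('horse race'))           # [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

words = ['saw', 'small', 'He']
longest_first = sorted(words, key=len, reverse=True)
print(longest_first)  # ['small', 'saw', 'He']
print(words)          # ['saw', 'small', 'He'] (the original list is unchanged)
```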

### zip

zip “pairs” up the elements of a number of lists, tuples, or other sequences to create
a list of tuples (in Python 3, zip returns a lazy iterator instead of a list):

In [81]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zip(seq1, seq2)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
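Two further details, sketched below with made-up data: zip stops at the shortest input, and passing a list of pairs back into zip with `*` "unzips" it into separate sequences. The `list(...)` wrapper makes the result a list on Python 3 as well.

```python
short = list(zip([1, 2, 3], ['a', 'b']))
print(short)  # [(1, 'a'), (2, 'b')]  (the extra 3 is dropped)

pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]
first_names, last_names = zip(*pitchers)  # each pair becomes a separate argument
print(first_names)  # ('Nolan', 'Roger', 'Schilling')
print(last_names)   # ('Ryan', 'Clemens', 'Curt')
```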

### reversed
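**reversed** iterates over the elements of a sequence in reverse order. It returns a lazy iterator, so wrap it in `list` (or join it, for strings) to see the result:

```python
backwards = list(reversed(range(5)))
print(backwards)        # [4, 3, 2, 1, 0]

word = 'stressed'
print(''.join(reversed(word)))  # desserts
```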

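## Dict

A dict (dictionary) is a mutable mapping of keys to values, written with curly braces and colon-separated key-value pairs: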

In [82]:
d1 = {'a': 'some_value', 'b':[1,2,3,4]}
d1

{'a': 'some_value', 'b': [1, 2, 3, 4]}

In [83]:
d1[7] = "value"

In [84]:
d1

{7: 'value', 'a': 'some_value', 'b': [1, 2, 3, 4]}

In [85]:
'a' in d1

True

In [86]:
d1.values()

['some_value', [1, 2, 3, 4], 'value']

In [87]:
d1.keys()

['a', 'b', 7]
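Keys can be removed with `del` or `pop` (which also returns the value), `update` merges in another dict, and `get` looks up a key with a fallback instead of raising `KeyError`. A sketch starting from the same contents as `d1`:

```python
d1 = {'a': 'some_value', 'b': [1, 2, 3, 4], 7: 'value'}

d1['dummy'] = 'another value'
del d1['dummy']                     # remove a key with del

ret = d1.pop('b')                   # remove a key and return its value
print(ret)                          # [1, 2, 3, 4]

d1.update({'b': 'foo', 'c': 12})    # merge another dict, overwriting existing keys

value = d1.get('missing', 'default')  # no KeyError for an absent key
print(value)                        # default
```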

Creating a dict from sequences: **dict** accepts a sequence of key-value pairs, such as the output of zip:

In [94]:
mapping = dict(
    zip(
        range(5), 
        reversed(range(5))
    )            
)
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

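Default values: a common task is grouping values into lists keyed by some property, here the first letter of each word. Each key needs an empty list the first time it is seen: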

In [95]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
words

['apple', 'bat', 'bar', 'atom', 'book']

In [96]:
by_letter = {}

In [101]:
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
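The if/else in that loop is common enough that Python offers two shortcuts: the dict method `setdefault`, and `defaultdict` from the standard `collections` module, which creates the empty list automatically. Both produce the same grouping:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# setdefault returns the existing list for a key, or inserts [] and returns it.
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# defaultdict(list) calls list() to create a value for any missing key.
by_letter2 = defaultdict(list)
for word in words:
    by_letter2[word[0]].append(word)

print(by_letter['a'])                 # ['apple', 'atom']
print(dict(by_letter2) == by_letter)  # True
```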

## Valid dict key types

While the values of a dict can be any Python object, the keys have to be immutable
objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to
be immutable, too). The technical term here is **hashability**; whether an object is hashable can be checked with the **hash** function:

In [102]:
hash('string')

-9167918882415130555

In [103]:
hash((1, 2, (2, 3)))

1097636502276347782

In [104]:
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'
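To use the contents of a list as a key, convert it to a tuple first. A minimal sketch:

```python
d = {}
d[tuple([1, 2, 3])] = 5      # the tuple (1, 2, 3) is hashable, so it works as a key
print(d)                     # {(1, 2, 3): 5}

try:
    d[[1, 2, 3]] = 5         # a list key is rejected
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
print(list_key_accepted)     # False
```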

## Set

A set is an unordered collection of unique elements. You can think of them like dicts,
but keys only, no values.

In [105]:
set([2,2,2,1,3,3])

{1, 2, 3}

In [106]:
a = {1,2,3,4,5}
b = {3,4,5,6,7,8}

In [112]:
a | b # union (or)

{1, 2, 3, 4, 5, 6, 7, 8}

In [113]:
a & b # intersection (and)

{3, 4, 5}

In [114]:
a - b # difference

{1, 2}

In [115]:
a ^ b # symmetric difference (xor)

{1, 2, 6, 7, 8}

Check whether one set is a subset of another with **issubset**:

In [116]:
a_set = {1, 2, 3, 4, 5}

In [117]:
{1,2,3}.issubset(a_set)

True
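Sets are mutable: `add` and `remove` change their contents, and there are companion checks such as `issuperset` and `isdisjoint`. A brief sketch:

```python
a_set = {1, 2, 3, 4, 5}
a_set.add(6)          # add an element
a_set.remove(1)       # remove an element (KeyError if it is absent)
print(sorted(a_set))  # [2, 3, 4, 5, 6]

print(a_set.issuperset({2, 3}))    # True: a_set contains both 2 and 3
print({1, 2}.isdisjoint({3, 4}))   # True: no elements in common
```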

## List, Set, and Dict Comprehensions

List comprehensions are one of the most-loved Python language features. They allow
you to concisely form a new list by filtering the elements of a collection and transforming
the elements passing the filter in one concise expression. They take the basic form:

```[expr for val in collection if condition]```

This is equivalent to the following for loop:
```
result = []
for val in collection:
    if condition:
        result.append(expr)
```

In [124]:
strings = ['a', 'love', 'dog', 'film', 'dataset']
[x.upper() for x in strings if len(x) > 2]

['LOVE', 'DOG', 'FILM', 'DATASET']
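Set and dict comprehensions (available since Python 2.7) use the same form with curly braces, and comprehensions can be nested, with the `for` clauses in the same order as the equivalent nested loops. A sketch reusing the strings above:

```python
strings = ['a', 'love', 'dog', 'film', 'dataset']

unique_lengths = {len(x) for x in strings}  # set comprehension
print(sorted(unique_lengths))               # [1, 3, 4, 7]

loc_mapping = {val: index for index, val in enumerate(strings)}  # dict comprehension
print(loc_mapping['dog'])                   # 2

some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]  # outer loop comes first
print(flattened)                            # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```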

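# Functions

Functions are declared with the **def** keyword and return a value with **return**. Arguments can be positional or keyword; keyword arguments (like z below) have a default value and must come after the positional ones: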

In [129]:
def add(x, y, z = 1.5):
    if z > 1:
        return z * (x + y)
    else:
        return x + y 
add(4,5)

13.5

There is no issue with having **multiple return** statements. If the end of a function is
reached without encountering a return statement, **None** is returned.
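A sketch of both points, re-declaring the same `add` function: keyword arguments can override a default or be passed in any order, and a function without a `return` (the hypothetical `no_return`) gives back None.

```python
def add(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return x + y

print(add(4, 5, z=0.5))  # 9    (z <= 1, so x + y is returned)
print(add(y=5, x=4))     # 13.5 (keyword arguments in any order)

def no_return():
    x = 1                # no return statement anywhere

print(no_return() is None)  # True
```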

## Namespaces, Scope, and Local Functions
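
Functions can access variables in two different scopes: **global** and **local**. An alternate and more descriptive name for a variable scope in Python is a **namespace**. Any variable assigned inside a function is, by default, placed in the function's local namespace, which is created when the function is called and discarded when it returns: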

In [134]:
def func():
    a = []
    for i in range(5):
        a.append(i)
    return a
func()

[0, 1, 2, 3, 4]

Assigning global variables within a function is possible, but those variables must be
declared as global using the **global** keyword:

In [136]:
a = None
def func():
    global a
    a = 'value'
    return a
func()
print a

value


Avoid overusing the **global** keyword: relying on many global variables is usually a sign that some object-oriented design is in order.

Functions can be declared anywhere, and there is no problem with having local functions
that are dynamically created when a function is called:

In [137]:
def outer_function():
    def inner_function():
        pass
    pass

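## Returning Multiple Values

A function can return several values at once. It actually returns a single tuple, which the caller can unpack into separate variables: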

In [140]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c
a,b,c = f()

In [143]:
result = f()
type(result)

tuple
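An alternative some people prefer is to return a dict, so the caller looks values up by name rather than by position. A sketch of the same `f`:

```python
def f():
    a = 5
    b = 6
    c = 7
    return {'a': a, 'b': b, 'c': c}

result = f()
print(result['b'])  # 6
```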

## Functions Are Objects

Since Python functions are objects, many constructs can be easily expressed that are
difficult to do in other languages. Suppose we were doing some data cleaning and
needed to apply a bunch of transformations to the following list of strings:

In [144]:
states = ['    Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 'south   carolina##', 'West virginia?']

Messy data like this is common when working with user-submitted survey data.

In [146]:
import re  # regular expressions

In [151]:
def cleaning_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)  # remove punctuation
        value = value.title()
        result.append(value)
    return result
cleaning_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

An alternate approach that you may find useful is to make a list of the operations you
want to apply to a particular set of strings:

In [153]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

In [154]:
clean_ops = [str.strip, remove_punctuation, str.title]

In [168]:
def clean_strings(strings, ops):
    result = []
    for value in strings:
        for op in ops:
            value = op(value)
        result.append(value)
    return result
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

The built-in **map** function applies a function to each element of a sequence. In Python 2 it returns a list:

In [170]:
map(remove_punctuation, states)

['    Alabama ',
 'Georgia',
 'Georgia',
 'georgia',
 'FlOrIda',
 'south   carolina',
 'West virginia']

## Anonymous (lambda) Functions

Python has support for so-called anonymous or **lambda functions**, which are really just
simple functions consisting of a single expression, the result of which is the return value:

In [175]:
def square(x):
    return x * x
square_lambda = lambda x: x * x  # equivalent to square

Lambda functions are especially convenient in data analysis because many data
transformation functions take functions as arguments. It is often less typing (and
clearer) to pass a lambda function than to write a full function declaration or even
to assign the lambda function to a local variable. For example, consider this simple
example:

In [176]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]
ints = [4,0,1,5,6]
apply_to_list(ints, lambda x: x*2)

[8, 0, 2, 10, 12]
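Another typical use is the `key` argument of `sort`. In this sketch (arbitrary data), strings are ordered by how many distinct letters they contain; the sort is stable, so ties keep their original order.

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
# Key: number of distinct letters (foo -> 2, card -> 4, bar -> 3, aaaa -> 1, abab -> 2)
strings.sort(key=lambda x: len(set(x)))
print(strings)  # ['aaaa', 'foo', 'abab', 'bar', 'card']
```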

## Closures: Functions that Return Functions

A closure is any dynamically generated function
returned by another function. The key property is that the returned function has access
to the variables in the local namespace where it was created, even after the enclosing function has returned.

In [177]:
def make_closure(a):
    def closure():
        print('I know the secret: %d' % a)
    return closure

In [179]:
closure = make_closure(5)
closure()

I know the secret: 5
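A closure can also keep state between calls by mutating an object that lives in the enclosing function's namespace. This sketch (`make_watcher` is a hypothetical name) returns a function that remembers every value it has been called with:

```python
def make_watcher():
    have_seen = {}   # lives in make_watcher's namespace and is shared with the closure

    def has_been_seen(x):
        if x in have_seen:
            return True
        have_seen[x] = True   # mutating the dict needs no special keyword
        return False

    return has_been_seen

watcher = make_watcher()
vals = [5, 6, 1, 5, 1, 6, 3, 5]
seen = [watcher(v) for v in vals]
print(seen)  # [False, False, False, True, True, True, False, True]
```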


## Extended Call Syntax with *args, **kwargs

Extra positional arguments are collected into the tuple **args**, and extra keyword arguments into the dict **kwargs**:

In [183]:
def say_hello_then_call(f, *args, **kwargs):
    print 'args is', args
    print 'kwargs is', kwargs
    print("Hello! Now I'm going to call %s" % f)
    return f(*args, **kwargs)
def g(x, y, z=1):
    return (x + y)/z

say_hello_then_call(g, 1, 2, z=1)

args is (1, 2)
kwargs is {'z': 1}
Hello! Now I'm going to call <function g at 0x7fda2c5d29b0>

3
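The `*` and `**` syntax works in both directions: in a definition it collects extra arguments, and in a call it spreads a tuple and a dict out into separate arguments. A sketch with two hypothetical functions, `collect` and `combine`:

```python
def collect(*args, **kwargs):
    # args is a tuple of extra positional arguments,
    # kwargs is a dict of extra keyword arguments.
    return args, kwargs

print(collect(1, 2, z=3))  # ((1, 2), {'z': 3})

def combine(x, y, z=1):    # hypothetical helper for the call-side example
    return (x + y) * z

values = (1, 2)
options = {'z': 3}
print(combine(*values, **options))  # 9, same as combine(1, 2, z=3)
```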

## Currying: Partial Argument Application

Currying is a fun computer science term which means deriving new functions from
existing ones by partial argument application. For example, suppose we had a trivial
function that adds two numbers together:

In [184]:
def add_number(x, y):
    return x + y

In [187]:
add_five = lambda y: add_number(5, y)
add_five

<function __main__.<lambda>>

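Here the first argument of add_number is fixed to 5, which produces a new function of the single remaining argument y. The **partial** function from the functools module does the same thing without writing a lambda: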

In [188]:
from functools import partial
add_five = partial(add_number, 5)
add_five

<functools.partial at 0x7fda2c5bfa48>
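Both versions behave the same when called. A partial object also records the function and the arguments it fixed, in its `func` and `args` attributes:

```python
from functools import partial

def add_number(x, y):
    return x + y

add_five_lambda = lambda y: add_number(5, y)
add_five_partial = partial(add_number, 5)

print(add_five_lambda(10))     # 15
print(add_five_partial(10))    # 15
print(add_five_partial.args)   # (5,)
```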


## Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a file,
is an important Python feature. This is accomplished by means of the iterator protocol,
a generic way to make objects iterable. For example, iterating over a dict yields the
dict keys:

In [190]:
some_dict = {'a': 1, 'b': 2, 'c': 3}

In [191]:
for key in some_dict:
    print key

a
c
b


When you write `for key in some_dict`, the Python interpreter first attempts to create
an iterator out of **some_dict**:

In [192]:
dict_iterator = iter(some_dict)

In [193]:
dict_iterator

<dictionary-keyiterator at 0x7fda2c5bfd60>

In [194]:
list(dict_iterator)

['a', 'c', 'b']

A generator is a simple way to construct a new iterable object. Whereas normal functions
execute and return a single value, generators return a sequence of values lazily,
pausing after each one until the next one is requested.

In [197]:
def squares(n=10):
    print 'Generating squares from 1 to %d' % (n ** 2)
    for i in xrange(1, n + 1):
        yield i ** 2

gen = squares()

In [198]:
gen

<generator object squares at 0x7fda2c5eca50>

In [199]:
for x in gen:
    print x

Generating squares from 1 to 100
1
4
9
16
25
36
49
64
81
100
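Calling a generator function runs none of its body; each `next()` call resumes it until the next `yield`, and once the body finishes, `next()` raises `StopIteration` (which a for loop handles silently). A sketch with a smaller `n`:

```python
def squares(n=10):
    print('Generating squares from 1 to %d' % (n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)     # nothing is printed yet: the body has not started
first = next(gen)    # prints the message, then pauses at the first yield
second = next(gen)
third = next(gen)
try:
    next(gen)        # the loop has finished, so the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True

print([first, second, third])  # [1, 4, 9]
print(exhausted)               # True
```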


## Generator expressions

A simple way to make a generator is by using a generator expression. This is the generator
analogue of list, dict, and set comprehensions; to create one, enclose what would otherwise
be a list comprehension in parentheses instead of brackets:

In [210]:
gen = (x ** 2 for x in xrange(100))
gen

<generator object <genexpr> at 0x7fda2c5ecbe0>

This is equivalent to the following, more verbose generator function:

In [211]:
def _make_gen():
    for x in xrange(100):
        yield x ** 2

gen = _make_gen()
gen

<generator object _make_gen at 0x7fda2c5eca00>

Generator expressions can be used inside any Python function that will accept a generator:

In [209]:
sum(x ** 2 for x in xrange(100))

328350

In [212]:
dict((i, i **2) for i in xrange(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

## itertools module

The standard library **itertools** module has a collection of generators for many common
data algorithms. For example, **groupby** takes any sequence and a function; it groups
consecutive elements in the sequence by the return value of the function. Here’s an example:

In [215]:
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

In [216]:
for letter, group in itertools.groupby(names, first_letter):
    print letter, list(group)  # group is a lazy iterator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
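Because groupby only merges *consecutive* elements, 'Albert' ends up in a second 'A' group above. Sorting by the same key first puts all matching elements next to each other, as in this sketch:

```python
import itertools

first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

grouped = {}
for letter, group in itertools.groupby(sorted(names, key=first_letter), first_letter):
    grouped[letter] = list(group)  # materialize each lazy group before moving on

print(grouped['A'])  # ['Alan', 'Adam', 'Albert']
print(grouped['W'])  # ['Wes', 'Will']
```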
