# Python Basics

Recap of [Python@Codecademy](https://www.codecademy.com/en/tracks/python) in case you missed it.

# Frequently Asked Questions

## Why Python

* Easy to learn, develop, and extend
* Fewer noise characters
    * Semicolons at line ends
    * `(){}`
* Indentation-based block syntax
* Community support
* Works (pretty well) for command-line scripts, web backends, and data analysis
* The best glue language!
* High-quality documentation (well-formed, handy)

## Python 2 or 3

7th year in the transition. 

Python 3 provides ...

* New built-in Modules
* `str`, `bytes` (Python 3) = `unicode`, `str` (Python 2)
* New Concepts
    * Generators and other lazy iterables instead of lists (e.g. `range`, `map`, `zip`)
    * Extends `yield` with `asyncio`
* Function changes (incompatible)
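
As a rough illustration of the `str` / `bytes` split listed above, here is a minimal sketch that assumes a Python 3 interpreter. The variable names are made up for this example.

```python
# Python 3 only: text (str) and binary data (bytes) are distinct types.
text = "夜露死苦"              # str holds Unicode code points
data = text.encode("utf-8")   # bytes holds the encoded octets

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 12
print(data.decode("utf-8") == text)     # True
```

Each of these characters takes three bytes in UTF-8, which is why the two lengths differ.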

Python 2 and 3 can coexist on the same host.

## IDE

* An IDE is nice, as long as you know how to work with a text editor when one is not available.

* IDLE
    * Built-in Python GUI on some distributions
    * Almost shell + python
* IPython
    * Written in Python, IPython provides an interactive interface that supports Python, Julia, Haskell, Cython, R, Octave, Bash, Perl, Ruby, ... but the team certainly focuses on Python.
    * And much more...

In [1]:
# Global configs & imports
import gzip, csv, os, os.path

PATH = {
    "data": "../data"
}

# Language Syntax

## Defining Blocks

Unlike C, Python relies on indentation to define program blocks, and the indentation has to be exactly the same for every line within a block. Mixing tabs (`\t`) and spaces is one of the most common sources of syntax errors.

In [2]:
def hello_world(s=None):
    print(s or "Hello World!")

In [3]:
hello_world()

Hello World!


In [4]:
hello_world("夜露死苦！")

夜露死苦！


## Some Naming Conventions

* `CamelCase` for class names, and `snake_case` otherwise. But it's your call!
* Follow the standard (PEP-8): 
    * 4 spaces, no tab
    * Cases, newlines, ...

## Pass

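`pass` has no effect; it is a placeholder for places where a statement is syntactically required but nothing needs to be done.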
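
A minimal sketch of where `pass` typically shows up. The class and function names below are hypothetical, chosen only for illustration.

```python
class NotImplementedYet(object):
    pass  # class body intentionally left empty


def todo():
    pass  # function body to be written later


for _ in range(3):
    pass  # a loop that does nothing

result = todo()
print(result)  # a function without a return statement returns None
```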

# Variables

All values in Python are objects (there is no boxing / unboxing as in Java). Built-in types are:

* *None*
* *bool*
* *iterator* and *generator*
* Numeric Types
    * *int*
    * *long* (built-in support for BigInt)
    * *float*
    * *complex*
* Sequence Types
    * *str*, *unicode* (for Python2)
    * *tuple*
    * *list*
    * *bytearray*
    * *buffer*
    * *xrange*
* *set* and *frozenset*
* *dict*

In [5]:
print(type(3), int(3).bit_length())

(<type 'int'>, 2)


In [6]:
print(
    type(3) is int, 
    isinstance(3, int))

(True, True)


## str

A string. In Python2, `str` is a byte string (ASCII by default); in Python3, `str` holds Unicode text.

In [7]:
s = "Hello Bar"  # single quotes work as well

In [8]:
print(s.replace("Hello", "Foo"))
print(s.lower())
print(s.upper())
print(s.startswith("H"))
print(s.endswith("ar"))

Foo Bar
hello bar
HELLO BAR
True
True


In [9]:
print("hello" in "Hello world")
print("hello" in "Hello world".lower())
print("Hello" + " World")
print("Abc" * 3)

False
True
Hello World
AbcAbcAbc


Strings can also be defined with three single or double quotes, which makes multi-line strings easier to write.

In [10]:
s = """Hello World!
And Bar"""
print(s)

Hello World!
And Bar


Since a string literal alone is a valid statement, it is also used as a comment at times. We'll talk about this use case later.

## List

* `list(...)` creates a shallow copy of `...`, which can be any sequence type or generator
* Shorthand: `[]`

In [11]:
l = [0,1,2,3,4,5] # A List with 6 elements
print(l)
print(len(l))

[0, 1, 2, 3, 4, 5]
6


In [12]:
print(list(l)) # Shallow copy a list
print(list(s))

[0, 1, 2, 3, 4, 5]
['H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '!', '\n', 'A', 'n', 'd', ' ', 'B', 'a', 'r']


## Tuple

An immutable type; elements in a `tuple` are ordered, as in a `list`.
Defined with `tuple()` or `()`.

In [13]:
print((1,2,3,4,5)) # Shorthand
print(tuple(l)) # Converts a list to tuple
print(tuple([1,2,3])) # Long syntax

(1, 2, 3, 4, 5)
(0, 1, 2, 3, 4, 5)
(1, 2, 3)


In [14]:
print(tuple()) # Empty tuple
print((1,)) # Extra comma for 1-element tuple

()
(1,)


### Variable Assignment

Python converts comma-separated variables to tuples (well, at times), and allows assignment this way as long as the lengths on both sides match. The same technique also works for `list` elements.

In [15]:
a = 3
b = 6
a, b = b, a
print("a=%d, b=%d" % (a, b))

a=6, b=3
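
The remark that the same technique works for `list` elements can be sketched as follows. The variable names are illustrative only, and the last part shows what happens when the lengths do not match.

```python
# Unpacking a list into separate names
first, second, third = [10, 20, 30]
print(first, second, third)

# Unpacking a tuple works the same way
point = (1, 2)
x, y = point

# Mismatched lengths raise ValueError
try:
    p, q = [1, 2, 3]
except ValueError as e:
    error = e
    print("ValueError: %s" % e)
```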


## Indexing

Indexing retrieves an element from a `list` or `dict`, or extracts a character (a `str` of length 1) from a `str`. Python uses 0-based indexing, so the index of the first element in a `list` is 0.

In [16]:
# l = [0,1,2,3,4,5]
print(l[0])
print(l[1])

0
1


Indexing also works with negative integers, which count from the back. Counting from the back starts at -1, because there is no -0 for integers. Note: floating-point numbers do have both +0 and -0.

In [17]:
print(l[-1])

5


## Slicing

Slicing works pretty much like indexing, but the indices refer to the gaps between elements, and the result has the same type as the original.

In [18]:
# s = "Hello World!\nAnd Bar" (reassigned in In [10])
print(s[0]) # H
print(s[0:2]) # He
print(s[1:2]) # e

H
He
e


In [19]:
# s = "Hello World!\nAnd Bar" (reassigned in In [10])
print(s[:2]) # He
print(s[:-2]) # everything but the last two characters
print(s[2:]) # everything from index 2 onward

He
Hello World!
And B
llo World!
And Bar


In [20]:
# l = [0, 1, 2, 3, 4, 5]
print(l[:])
print(l[0:3])
print(l[3:])

[0, 1, 2, 3, 4, 5]
[0, 1, 2]
[3, 4, 5]


In [21]:
l[0:2] = [6, 7]
print(l)

[6, 7, 2, 3, 4, 5]


## Dict

`dict` takes a hashable type as key, and can hold any Python object as its value. 
Elements in `dict` are **unordered** in Python2 and in Python3 before 3.7; since Python 3.7, a `dict` preserves insertion order.

In [22]:
d = {"name": "John", "email": "john@example.com"}

In [23]:
d['name']

'John'

In [24]:
d['name'] = "John Doe"
print(d)

{'name': 'John Doe', 'email': 'john@example.com'}


In [25]:
'name' in d # Check whether key `name` exists

True

In [26]:
try:
    print(d['no_such_key'])
except Exception as e:
    print(type(e))

<type 'exceptions.KeyError'>


In [27]:
d.get('no_such_key', 'Not Exist')

'Not Exist'
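
To illustrate the hashable-key requirement stated at the top of this section, here is a small sketch. The `grid` dictionary is a made-up example.

```python
grid = {}
grid[(0, 1)] = "tuple keys are fine"           # tuples of hashables are hashable
grid[frozenset([1, 2])] = "so are frozensets"  # the immutable sibling of set

try:
    grid[[0, 1]] = "lists are not"             # lists are mutable, hence unhashable
except TypeError as e:
    message = str(e)
    print("TypeError: %s" % message)
```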

## Set

Takes only hashable types as its members, and keeps each member only once.

In [28]:
set([1,2,2,3,3,3])

{1, 2, 3}

In [29]:
{1,2,2,3,3,3}

{1, 2, 3}

## Trailing Comma in Builtin Collections

Python allows a trailing comma (`,`) in tuple, list, dict, ... declarations.

In [30]:
[1,2,3] == [1,2,3,]

True

In [31]:
users = [
    {
        "name": "John",
        "age": 13,
    },
    {
        "name": "Eric",
        "age": 31,
    },
]

## str Formatting

Strings can be formatted with the `%` operator in two different ways: with positional arguments (a `tuple`) or with named arguments (a `dict`).

In [32]:
print("User `%s` spent %03.2f on %d services" %
      ("Nobody", 2.4321, 6.0))

User `Nobody` spent 2.43 on 6 services


In [33]:
print("The %(order_literal)s method is %(more_or_less)s flexible" % {
        'order_literal': 'second',
        'more_or_less': 'more',
        'extra_variable': 'not_used',
    })

The second method is more flexible


## Variable Scope

There are 4 tiers of variable scope, and they are searched in order:

* Local
* Enclosing (Non-local)
* Global (Module)
* Builtin

### Masking Outer Values
Assignments in an inner scope mask outer values within that scope.

In [34]:
value = "outmost"

def test_outer():
    def test_inner():
        value = "inner"
    
    print("before inner: %s" % value)
    test_inner()
    print("after inner: %s" % value)

test_outer()
print("Eventually: %s" % value)

before inner: outmost
after inner: outmost
Eventually: outmost


### Changing Outer Variables

To change values in outer scopes, declare the variable with the keyword `nonlocal` (Python3 only, for enclosing function scopes) or `global` (Python2 and 3, for module scope), depending on where the variable lives.

In [35]:
value = "outmost"

def test_outer():
    def test_inner():
        global value  # makes all the difference
        value = "inner"
        
    print("before inner: %s" % value)
    test_inner()
    print("after inner: %s" % value)

test_outer()
print("Eventually: %s" % value)

before inner: outmost
after inner: inner
Eventually: inner


Alternatively, you can host the value in a mutable container (`dict`, `list`, ...), and manipulate that container object via its methods.
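
A minimal sketch of the mutable-container approach, next to the `nonlocal` variant for comparison. The function names are hypothetical, and the second function assumes Python3 because `nonlocal` does not exist in Python2.

```python
def make_counter():
    state = {"count": 0}      # mutable container living in the enclosing scope

    def increment():
        state["count"] += 1   # mutates the dict; nothing is rebound, so no keyword is needed
        return state["count"]

    return increment


def make_counter_nonlocal():
    count = 0

    def increment():
        nonlocal count        # Python3 only: rebind the variable of the enclosing function
        count += 1
        return count

    return increment


counter = make_counter()
print(counter(), counter(), counter())
```

Both counters keep their state between calls; the container version is the one that also runs on Python2 once the `nonlocal` function is left out.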

## Operators

### Assignment

* `=`
* `+=`, `-=`, `*=`, `/=`, `%=`, `**=`

### Equality

* `==`
* `!=`

### Numerical

* `+`, `-`, `*`, `%`
* `/` (division)
    * **In Python2**, `/` means floor division if both operands are `int`, and true division otherwise. `//` is always floor division, and the return value is `int` if and only if both operands are `int`.
    * **In Python3**, `/` always means true division and returns `float`, while `//` is floor division and returns `int` only if both operands are `int`.
* `**`: Power
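
The division rules are easier to see with concrete numbers. This sketch assumes Python3 semantics; the `results` list exists only to collect the values.

```python
# Python3 semantics
results = [
    7 / 2,     # 3.5   true division
    7 // 2,    # 3     floor division, int operands give int
    7.0 // 2,  # 3.0   floor division, a float operand gives float
    -7 // 2,   # -4    rounds toward negative infinity, not toward zero
    2 ** 10,   # 1024  power
]
for r in results:
    print(r)
```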

### Comparison

* `>`, `>=`, `<`, `<=`
* Can be chained, eg., `3 >= x >= 0`

### Boolean

* `and`: If Left is truthy, return Right; else return Left
* `or`: If Left is truthy, return Left; else return Right
* `not`: higher precedence than `and`, which in turn is higher than `or`
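
Because `and` and `or` return one of their operands rather than a plain `bool`, they are often used to pick a fallback value. A small sketch with illustrative variable names:

```python
fallback = 0 or "default"        # left is falsy, so the right operand is returned
kept = "value" or "default"      # left is truthy, so it is returned as-is
short_circuit = 0 and "never"    # left is falsy, so it is returned; right is not evaluated
passed_on = 1 and "reached"      # left is truthy, so the right operand is returned
negated = not 0                  # `not` always returns a bool

print(fallback, kept, short_circuit, passed_on, negated)
```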

### Misc

* `in`: Searches whether value exists in collection (`list`, `set`), substring exists in `str`, or key exists in `dict`.
* `is`: Identity comparison (whether both sides are the same object), commonly `var is None` or `type(var) is int`.
* `... if ... else ...`: Mimics the ternary operator ( ... ? ... : ...) in C. 
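
A short sketch of the three operators above; the variable names are made up for this example. Note in particular how `is` differs from `==`.

```python
a = [1, 2]
b = [1, 2]
c = a

equal_value = (a == b)    # True: same contents
same_object = (a is b)    # False: two distinct list objects
alias = (a is c)          # True: both names refer to one object

found = (2 in a, "ell" in "Hello", "name" in {"name": "John"})

x = 7
parity = "even" if x % 2 == 0 else "odd"

print(equal_value, same_object, alias, found, parity)
```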

In [36]:
# Precedence matters
not True or True and False

False

# Functions

A function is how we group related operations together, so they can be reused effectively and are easier for humans to read. **SOURCE CODE IS FOR HUMANS**. Like variables, functions are also objects and thus have their own properties and methods.

In [37]:
def func():
    """
    Function docstring, usually used for doctests or automatic documentation.
    """
    
    def inner():
        """
        A nested function.
        """
        print("Inner function called")
    
    return inner

ret_val = func()
print(type(ret_val))
ret_val()

print(ret_val.__doc__)

<type 'function'>
Inner function called

        A nested function.
        


## Arguments

When defining a function (with keyword `def`), you declare its parameters, and the caller can pass each argument either by position or by name. Only parameters with default values are optional, and they must appear after the required ones.

In [38]:
def func(var1, var2, named_var1, named_var2=None):
    print("Variables: var1=%s, var2=%s, named_var1=%s, named_var2=%s" % (var1, var2, named_var1, named_var2))

In [39]:
func(1,2,named_var1=3)

Variables: var1=1, var2=2, named_var1=3, named_var2=None


In [40]:
func(1, 2, 3, 4)

Variables: var1=1, var2=2, named_var1=3, named_var2=4


In [41]:
func(1, 2, named_var2=4, named_var1=3)

Variables: var1=1, var2=2, named_var1=3, named_var2=4


### Gotcha

Use only immutable types for default values. Mutable ones are initialized only once, when the function is defined, and may surprise you.

In [42]:
def func(new_value, old_list=[]):
    old_list.append(new_value)
    print("new_value: %s, list: %s" % (new_value, old_list))

func(1)
func(2)
func(3, [])
func(4)

new_value: 1, list: [1]
new_value: 2, list: [1, 2]
new_value: 3, list: [3]
new_value: 4, list: [1, 2, 4]


As an alternative, use `None` in the declaration, and replace it with your default value in the function body.

In [43]:
def func(new_value, old_list=None):
    if old_list is None:  # an explicit check keeps a caller-supplied empty list
        old_list = []
    old_list.append(new_value)
    print("new_value: %s, list: %s" % (new_value, old_list))

func(1)
func(2)
func(3, [])
func(4)

new_value: 1, list: [1]
new_value: 2, list: [2]
new_value: 3, list: [3]
new_value: 4, list: [4]


## Packing Arguments

Python function syntax allows `*` and `**` in front of parameter names. As a common practice, we like to define `function(*args, **kwargs)`. When invoked, `*args` holds the positional arguments and `**kwargs` the named ones, except for those values that map to other parameters of that function.

In [44]:
def test(*args, **kwargs):
    print("args: %s, kwargs: %s" % (args, kwargs))

In [45]:
test(1,2,3)

args: (1, 2, 3), kwargs: {}


In [46]:
test(1,2,3,a="alpha", b="beta")

args: (1, 2, 3), kwargs: {'a': 'alpha', 'b': 'beta'}


In [47]:
def test(number, *args, **kwargs):
    print("number: %d, \nargs: %s, \nkwargs: %s" % (number, args, kwargs))

In [48]:
test(3,4,5)

number: 3, 
args: (4, 5), 
kwargs: {}


## Unpacking Arguments

Vice versa, we can prepend `*` or `**` to a `list` and a `dict` respectively, and it works as if we had passed their elements as individual positional and named arguments in that function call.

In [49]:
def test(name, subject, score):
    print("name: %s, subject: %s, score: %.2f" % (name, subject, score))

In [50]:
test("John", "History", 0)

name: John, subject: History, score: 0.00


In [51]:
test(*["Cliff", "Math", 100.1])

name: Cliff, subject: Math, score: 100.10


In [52]:
test(**{'name': "Paris", 'subject': "IQ", 'score': 0})

name: Paris, subject: IQ, score: 0.00


In [53]:
test(*["Confucius", "Confucianism"], **{"score": 47.0})

name: Confucius, subject: Confucianism, score: 47.00


## Some Builtin Functions

* `len`, `type`
* `print`
    * function in Python3, statement in Python2
* `max`, `min`, `abs`
* `sum`
* `all`, `any`
* `map`, `reduce` (`functools.reduce` in Python3)
* `filter`
* `ascii` (Python3 only), `ord`
* `isinstance`, `issubclass`
* `input` (`raw_input` in Python2)
* `zip`

Type `help(func)` in IPython to access the help page.
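
A quick tour of several of the built-ins listed above, as a sketch on made-up sample data. `reduce` is imported from `functools`, which works on both Python2 and Python3.

```python
from functools import reduce  # built-in in Python2, lives in functools in Python3

scores = [72, 88, 95]
names = ["Ann", "Bob", "Cid"]

total = sum(scores)                                    # 255
highest = max(scores)                                  # 95
all_passed = all(s >= 60 for s in scores)              # True
any_perfect = any(s == 100 for s in scores)            # False
paired = list(zip(names, scores))                      # pairs each name with its score
doubled = list(map(lambda s: s * 2, scores))           # [144, 176, 190]
over_80 = list(filter(lambda s: s > 80, scores))       # [88, 95]
product = reduce(lambda a, b: a * b, [1, 2, 3, 4])     # 24

print(total, highest, all_passed, any_perfect)
print(paired, doubled, over_80, product)
```

The `list(...)` wrappers matter on Python3, where `zip`, `map`, and `filter` return lazy iterators rather than lists.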

# Flow Control

## if ... elif ... else

In [54]:
if 3 > 5:
    print("3 > 5")
elif 3 < 5:
    print("3 < 5")
else:
    print("3 == 5")

3 < 5


In [55]:
if 5 > 3:
    print("5 > 3")
elif 4 > 3:
    print("4 > 3")
else:
    print("Nah")

5 > 3


## for ... in ...

In [56]:
for i in [1,2,3]:
    print(i)

1
2
3


In [57]:
for i in range(3):
    print(i)

0
1
2


In [58]:
d = {"name": "John", "email": "john@example.com"}
for key in d:
    print(key)

name
email


In [59]:
for key, value in d.items():
    print("%s => %s" % (key, value))

name => John
email => john@example.com


In [60]:
for i in range(5):
    pass
else:
    print("Finished iteration")

Finished iteration


In [61]:
for i in range(5):
    if i < 2:
        continue
    elif i > 3:
        break
    print(i)
else:
    print("Finished iteration")

2
3


## while ...

In [62]:
i = 0
while i < 3:
    print(i)
    i += 1
else:
    print("Finished iteration")

0
1
2
Finished iteration


In [63]:
i = 0
while i < 10:
    print(i)
    i += 1
    if i > 3:
        break
else:
    print("Finished iteration")

0
1
2
3


# List Comprehensions

* Shorthand representations of `list`, `dict` and `set`

In [64]:
[(i, j) for i in range(3) for j in range(3,5)]

[(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4)]

In [65]:
[i for i in range(10) if i % 2 == 0]

[0, 2, 4, 6, 8]

In [66]:
users = [
    {
        "name": "John",
        "age": 13,
    },
    {
        "name": "Eric",
        "age": 31,
    },
]

print({u['name']: u['age'] for u in users})


{'John': 13, 'Eric': 31}
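
The cells above cover the `list` and `dict` forms. The `set` form uses curly braces without a `key: value` pair, as in this small sketch on a made-up word list:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Set comprehension: duplicates collapse automatically
initials = {w[0] for w in words}

# A set has no defined order, so sort it for display
print(sorted(initials))
```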


## Gotcha

Nested list comprehensions can be slow.

In [78]:
# Load sample data with 1M lines (gzipped); text mode "rt" requires Python 3.3+

filename = os.path.join(PATH['data'], "large-one-column.csv.gz")
with gzip.open(filename, "rt") as fp:
    haystack = list(csv.DictReader(fp))

In [69]:
def nested_iter(haystack):
    """
    Iterates over haystack once to collect the distinct titles, then once more per distinct title
    """
    return {
        title: [item for item in haystack if item['title'] == title] 
        for title in 
            {item['title'] for item in haystack}
    }

%timeit nested_iter(haystack)

In [None]:
def direct_iter(haystack):
    output = {}
    
    for item in haystack:
        output.setdefault(item['title'], []).append(item)
    
    return output

%timeit direct_iter(haystack)
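
Since the sample CSV is not part of this text, the comparison can be reproduced with synthetic data. The sketch below assumes a made-up `haystack` of 2,000 rows spread over 50 titles and uses the standard `timeit` module instead of the `%timeit` magic; actual timings depend on the machine.

```python
import timeit

# Synthetic stand-in for the CSV rows (an assumption, not the original data)
haystack = [{"title": "title-%d" % (i % 50), "id": i} for i in range(2000)]


def nested_iter(haystack):
    # One full pass over haystack per distinct title
    return {
        title: [item for item in haystack if item["title"] == title]
        for title in {item["title"] for item in haystack}
    }


def direct_iter(haystack):
    # A single pass, grouping as it goes
    output = {}
    for item in haystack:
        output.setdefault(item["title"], []).append(item)
    return output


same_result = nested_iter(haystack) == direct_iter(haystack)
nested_time = timeit.timeit(lambda: nested_iter(haystack), number=5)
direct_time = timeit.timeit(lambda: direct_iter(haystack), number=5)

print("same result: %s" % same_result)
print("nested: %.4fs, direct: %.4fs" % (nested_time, direct_time))
```

Both functions build the same grouping; the nested version simply rescans the whole list for every distinct title.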

# File Operation

## `open`

# Class and Objects

Python is an object-oriented language; that is, everything in Python is an object. Variables, modules, and functions are all objects. Every object inherits from `object`. 

Python classes have no access modifiers such as public, protected, or private. 

Methods are *bound* functions: for functions defined as instance or class methods, Python automatically passes the proper reference (the instance or the class) as the first positional argument.

In [79]:
class SampleObject(object):
    """
    Inherits from `object`
    """
    
    # Holds static (class) attributes
    
    def __init__(self):
        """
        self means the instance. 
        """
        
        # Define instance attributes here
        
        print(self)
    
    @classmethod
    def class_method(cls):
        """
        cls is SampleObject, a `type` object
        """
        print(cls)
    
    @staticmethod
    def static_method():
        """
        No access to object instance / class
        """
        pass

so = SampleObject()
so.class_method()

<__main__.SampleObject object at 0x0000000004108CF8>
<class '__main__.SampleObject'>


In [None]:
class AnotherObject(SampleObject):
    pass

ao = AnotherObject()
ao.class_method()
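
To make the "bound function" remark above more concrete, here is a minimal sketch with a hypothetical `Greeter` class. It shows that calling a method on an instance is equivalent to calling the function on the class with the instance passed explicitly.

```python
class Greeter(object):
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, %s!" % self.name

    @classmethod
    def kind(cls):
        return cls.__name__


g = Greeter("John")
bound = g.greet                 # bound method: remembers `g`
via_instance = bound()          # `g` is passed as `self` automatically
via_class = Greeter.greet(g)    # same call with the instance passed explicitly

print(via_instance)
print(via_class)
print(bound.__self__ is g)      # the bound method keeps a reference to its instance
print(Greeter.kind())           # the class itself is passed as `cls`
```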

# Modules & Packages

In Python, most functionality is organized in modules and packages. Modules are Python files (.py), while packages are folders that contain at least an `__init__.py`.

* chef/
    * metadata.rb
* doc/
    * install.md
    * network.md
* src/
    * lib/
        * network/
            * \_\_init\_\_.py
            * tcp.py
        * \_\_init\_\_.py
        * s3copy.py
    * s3copy.py
* tests/

For a project with such a folder structure, if cwd = **src** and we run "python s3copy.py", then `lib` and `lib.network` are packages visible to the Python process, and `lib.network.tcp` is a valid module.

When you `import lib.network.tcp`, Python opens "lib/network/tcp.py", executes its content, and wraps it into a module object, available in the current (s3copy.py) scope as `lib.network.tcp`. Alternatively, you can use `import lib.network.tcp as tcp` to bind it to a different name (`tcp` instead of `lib.network.tcp`).

Similarly, you can import `lib.network`, and Python will read and execute "lib/network/\_\_init\_\_.py".

## Search Path

Check `sys.path` for your search path; Python searches its entries in order, treating each as if it were the cwd. 

Python projects usually append the root (src) directory to the search path (this only affects that running Python process), or change the CWD to src upon execution, to minimize search-path-related issues.
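
The import and search-path behaviour can be tried without touching a real project. The sketch below rebuilds the `lib/network/tcp.py` part of the layout in a temporary directory and puts that directory on `sys.path`. The content of `tcp.py` (`DEFAULT_PORT`) is invented for this example, and `importlib.invalidate_caches()` assumes Python 3.3 or later.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway copy of the lib/network/tcp.py layout in a temp directory
root = tempfile.mkdtemp()
network_dir = os.path.join(root, "lib", "network")
os.makedirs(network_dir)
for package_dir in (os.path.join(root, "lib"), network_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(network_dir, "tcp.py"), "w") as fp:
    fp.write("DEFAULT_PORT = 80  # hypothetical module content\n")

sys.path.insert(0, root)         # the temp directory plays the role of "src"
importlib.invalidate_caches()    # make sure the freshly written files are noticed

import lib.network.tcp as tcp    # found via sys.path, bound to the short name `tcp`

print(tcp.__name__)
print(tcp.DEFAULT_PORT)
```

Inserting at position 0 makes the temporary directory win over any other entry on the search path, which mirrors how the script's own directory is searched first.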