# Contents
* Profiling
* Debugger
* Passing by reference vs passing by value
* Import, python path and package layout
* Namespace and scope
* Object lifecycle
* Serialization

# Profiling

## Tools for profiling
* time
* timeit
* (builtin) cProfile
* line_profiler (third party)
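
Outside IPython, `timeit` and `cProfile` can be driven directly from the standard library. The sketch below is only an illustration: `cos_min_py` is a hypothetical pure-Python stand-in that uses `math.cos` instead of NumPy so that the example has no third-party dependency.

```python
import cProfile
import io
import math
import pstats
import timeit


def cos_min_py(values):
    """Pure-Python stand-in for cos_min (hypothetical helper for this sketch)."""
    min_element = math.cos(values[0])
    for x in values:
        if math.cos(x) < min_element:
            min_element = math.cos(x)
    return min_element


data = list(range(10000))

# timeit: total wall-clock time of 5 calls, in seconds
elapsed = timeit.timeit(lambda: cos_min_py(data), number=5)
print(f"5 runs took {elapsed:.4f} s")

# cProfile: per-function call counts and cumulative time
profiler = cProfile.Profile()
profiler.enable()
result = cos_min_py(data)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The `%prun` and `%time` magics used in the cells below wrap the same machinery.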

In [None]:
import numpy as np

def cos_min(lst):
    min_element = np.cos(lst[0])
    for x in lst:
        if np.cos(x) < min_element:
            min_element = np.cos(x)
    return min_element

In [None]:
%prun cos_min(np.arange(10000))

In [None]:
#! pip install line_profiler
%load_ext line_profiler
%lprun -f cos_min cos_min(np.arange(10000))

In [None]:
#!pip install numba
import numba
@numba.jit
def cos_min_fast(lst):
    min_element = np.cos(lst[0])
    for x in lst:
        if np.cos(x) < min_element:
            min_element = np.cos(x)
    return min_element

In [None]:
%time cos_min(np.arange(1000000))

In [None]:
# The first call includes JIT compilation time; run the cell twice for a fair comparison
%time cos_min_fast(np.arange(1000000))

---

# Debugger

In [None]:
def cos_min(lst):
    min_element = np.cos(lst[0])
    
    # Put this statement in the code to start python debugger
    import pdb; pdb.set_trace()
    for x in lst:
        if np.cos(x) < min_element:
            min_element = np.cos(x)
    return min_element

In [None]:
cos_min([2,3,4,5,6])

---

# Passing by reference vs passing by value
## All arguments in Python are passed by object reference: the function receives the same object, not a copy

In [None]:
a = 90
print(id(a))
def test_arg_id(x):
    print(id(x))
test_arg_id(a)

## If a mutable argument is modified in place, the change is visible to the caller

In [None]:
def append(lst, element):
    lst.append(element)
lst = [1, 2]
append(lst, 3)
print(lst)

## Are the values of these arguments changed?

In [None]:
def f(x):
    x = 9
x = None
f(x)
print(x)

In [None]:
def f(x=None):
    if x is None:
        x = list('default')
    else:
        x = list(x)
    x[0] = 10
    
x = [1]
f(x)
print(x)

## `=` rebinds a name to an object; it never changes the object the name referred to before

In [None]:
def f(x):
    x = 9
    print(id(x))
    
x = None
print(id(x))
f(x)
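
The difference between rebinding a parameter and mutating the object it refers to can be checked with `id()`. This is a minimal sketch; `rebind` and `mutate` are hypothetical helper names.

```python
def rebind(seq):
    seq = [0]          # rebinds the local name only; the caller's list is untouched
    return id(seq)


def mutate(seq):
    seq.append(0)      # modifies the shared object in place
    return id(seq)


data = [1, 2]
original_id = id(data)

rebound_id = rebind(data)
print(data)            # [1, 2]

mutated_id = mutate(data)
print(data)            # [1, 2, 0]

print(rebound_id == original_id, mutated_id == original_id)   # False True
```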

### Python is a strongly typed programming language. Once an object is created, its type is fixed: assignment binds the name to another object instead of overwriting the object's type at runtime.

---

# import, python runtime and package layout
```
import this
```

References
1. https://www.python.org/dev/peps/pep-0402/
2. https://www.python.org/dev/peps/pep-0382/
3. https://www.python.org/dev/peps/pep-0328/
4. http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html

## Paths where a package may be installed to
* (linux) `/usr/lib/python3.*/site-packages/`
* (linux) `/usr/local/lib/python3.*/site-packages/`
* (mac) `/usr/local/Cellar/python/3.*/Frameworks/Python.framework/Versions/3.*/lib/python3.*/site-packages`
* `(virtual-env-path)/lib/python3.*/site-packages/`
* (linux) `~/.local/lib/python3.*/site-packages/`
* (mac) `~/Library/Python/3.7/lib/python/site-packages/`

## Python runtime paths

```
sys.path

['/usr/lib/python3.8',
'/usr/lib/python3.8/lib-dynload',
'/usr/local/lib/python3.8/dist-packages',
'/usr/lib/python3/dist-packages'
...]
```

Python can load any package located in one of the paths listed in `sys.path` on the fly. A library does not have to exist before the interpreter starts: you can install libraries (into a directory on `sys.path`) or update `sys.path` at runtime.

In [None]:
import runtime_lib_demo  # fails with ModuleNotFoundError: the module does not exist yet

In [None]:
!mkdir -p templib && cd templib && touch runtime_lib_demo.py

In [None]:
import os
import sys

sys.path.append(os.path.abspath('templib'))
import runtime_lib_demo
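
The same experiment can be written without shell commands. This is a sketch under the assumption that a throwaway module named `runtime_lib_sketch` (a made-up name) is acceptable; it is created in a temporary directory.

```python
import importlib
import os
import sys
import tempfile

# Create a module file after the interpreter has already started
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, "runtime_lib_sketch.py"), "w") as fh:
    fh.write("GREETING = 'loaded at runtime'\n")

# The directory is not on sys.path yet, so the import fails
try:
    importlib.import_module("runtime_lib_sketch")
    found_before = True
except ModuleNotFoundError:
    found_before = False

# Extending sys.path at runtime makes the module importable
sys.path.append(lib_dir)
importlib.invalidate_caches()
module = importlib.import_module("runtime_lib_sketch")

print(found_before, module.GREETING)   # False loaded at runtime
```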

## PYTHONPATH and pth file

### Python also searches for packages in the paths listed in the environment variable `PYTHONPATH`

```
$ PYTHONPATH=/tmp/abc/def python -c 'import sys; print(sys.path)'
```
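
The effect of `PYTHONPATH` can also be observed from Python by starting a child interpreter with a modified environment. This is a sketch; the temporary directory stands in for `/tmp/abc/def`.

```python
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()

# Start a child interpreter whose environment contains PYTHONPATH
env = dict(os.environ, PYTHONPATH=extra_dir)
completed = subprocess.run(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)

child_paths = completed.stdout.splitlines()
in_child = any(
    os.path.realpath(path) == os.path.realpath(extra_dir) for path in child_paths
)
print(in_child)
```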

### pth file in site directories

```
$ cat /usr/local/lib/python3.8/site-packages/easy-install.pth
/tmp/abc/def

$ python -c 'import sys; print(sys.path)'
```
A `.pth` file is often generated by the command `pip install -e`. Each path listed in it is appended to `sys.path`, which makes a custom package visible through the system site-packages directory.


## How to make a Python package / sub-package

Placing an `__init__.py` file in a folder turns the folder into a Python package / sub-package.

### namespace package (for sub-packages)

By adding paths to the variable `__path__` (usually in `__init__.py`), modules from other directories can be imported as sub-modules of the package.

```
layout:
/tmp/foo/bar/
  __init__.py
  spam.py

./baz/
 __init__.py
```

```
$ cat /tmp/foo/bar/spam.py
print('hello world')

$ cat baz/__init__.py
__path__ = ['/tmp/foo/bar']

$ python -c 'from baz import spam'
hello world
```

When developing a Python package, `PYTHONPATH`, `.pth` files (through `pip install -e`) and `__path__` are commonly used to install the package temporarily without polluting the default site-packages directories.

## Relative import
```
# layout:
#   foo/
#     __init__.py
#     bar/
#       __init__.py
#       baz.py
#       where_we_are.py
# cat where_we_are.py
from . import baz       # == from foo.bar import baz
from ..bar import baz   # == from foo.bar import baz
from ... import foo     # == import foo (valid only if foo itself lives inside a parent package)
from .... import applicable_but_smells_bad
```

## How does Python import a package
Two steps:
* Load the module (`importlib.import_module`)
* Bind the module (or the requested attribute) to a name in the importing namespace

```
from foo import bar as ham

# equivalent to
import foo
ham = foo.bar
del foo
```
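
The two steps can be carried out by hand with `importlib`. The sketch below uses the standard `json` module purely as a convenient example.

```python
import importlib
import sys

# Step 1: load the module (or fetch it from sys.modules if already loaded)
module = importlib.import_module("json")

# Step 2: bind the requested attribute to a name in the current namespace
ham = module.dumps

# The from-import statement produces the very same object
from json import dumps as ham_via_from

print(ham is ham_via_from)             # True
print(sys.modules["json"] is module)   # True
```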

## Circular import
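Difference between `from foo import bar` and `import foo` in the circular case: `import foo` only needs the (possibly half-initialized) module object, while `from foo import spam` needs the name `spam` to exist already.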


* works

```
# cat foo.py
import bar
spam = 1

# cat bar.py
import foo
ham = 2
```


* does not work (`import foo` raises `ImportError`, because `spam` is not defined yet when `bar` asks for it)

```
# cat foo.py
import bar
spam = 1

# cat bar.py
from foo import spam
ham = 2
```
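
Both layouts can be reproduced in a temporary directory. This is a sketch: the module names `ok_foo`, `ok_bar`, `bad_foo` and `bad_bar` are made up to avoid clashing with real modules.

```python
import importlib
import os
import sys
import tempfile

sources = {
    # circular, but each side only does a plain "import"
    "ok_foo.py": "import ok_bar\nspam = 1\n",
    "ok_bar.py": "import ok_foo\nham = 2\n",
    # circular, and one side needs a name that is not defined yet
    "bad_foo.py": "import bad_bar\nspam = 1\n",
    "bad_bar.py": "from bad_foo import spam\nham = 2\n",
}

demo_dir = tempfile.mkdtemp()
for name, text in sources.items():
    with open(os.path.join(demo_dir, name), "w") as fh:
        fh.write(text)
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()

ok_foo = importlib.import_module("ok_foo")
print(ok_foo.spam, ok_foo.ok_bar.ham)   # 1 2

try:
    importlib.import_module("bad_foo")
    error = None
except ImportError as exc:
    error = exc
print(type(error).__name__)             # ImportError
```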

```
print(sys.path)
[..., '/tmp/foo',
'/tmp/foo/bar', ...]

from bar import baz
from foo.bar import baz

print(sys.modules)
'bar.baz': <module 'bar.baz' from '/tmp/foo/bar/baz.pyc'>,
'foo.bar.baz': <module 'foo.bar.baz' from '/tmp/foo/bar/baz.pyc'>,
```

## Each module is imported only once
* Modules present in `sys.modules` will not be imported again
* Avoid double imports
  - A double import is the same module being imported twice under different names
  - How does it happen? When the path of a sub-package is also added to `sys.path`
  - Python has no built-in way to guarantee a single import

### double import
```
/foo/bar/baz/ham.py
PYTHONPATH=/foo/bar/baz:/foo/bar
import ham
import baz.ham
```
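
A double import is easy to reproduce. In this sketch the package `dbl_baz` and the module `dbl_ham` are hypothetical names created in a temporary directory; both the package's parent directory and the package directory itself are put on `sys.path`.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "dbl_baz")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "dbl_ham.py"), "w") as fh:
    fh.write("state = []\n")

# Both the package directory and its parent are importable roots
sys.path[:0] = [pkg_dir, root]
importlib.invalidate_caches()

top_level = importlib.import_module("dbl_ham")
nested = importlib.import_module("dbl_baz.dbl_ham")

print(top_level is nested)               # False: two module objects
print(top_level.state is nested.state)   # False: module-level state is duplicated
```

Because the file is executed twice, any module-level state (caches, registries, singletons) exists in two independent copies.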

---

# Namespace and scope
<img src="https://freecontent.manning.com/wp-content/uploads/namespacing-with-python_01.png" style="width: 700px;"/>

## Namespace and scope
* global namespace (module level)
  - `dir()` or `globals()`
  
* class namespace
  - `dir(<class name>)` or `<class name>.__dict__`
  
* local namespace (inside functions)
  - `dir()` or `locals()`
  
* closure
  - a nested function can read the variables of its enclosing function that it references (its free variables), and these appear in its `locals()`; its `globals()` still points to the regular global namespace.

In [None]:
class A:
    var = 1
    def print_me(self):
        print(dir())

In [None]:
A().print_me()

In [None]:
dir(A)

In [None]:
# Variables in different namespaces can have the same name
x = "I'm global"
def func():
    global x
    x = "I'm local"
    another_local_var = 'xyz'
    def func_closure():
        global x #NO
        x = "I'm local in closure"
        y = another_local_var
        print(dir()) # globals()
        print(x)
    # func_closure.x_in_func = x # x in local namespace of func
    return func_closure
fclosure = func()
print(x)
fclosure()
print(dir(fclosure))
#print(fclosure.x_in_func)
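
The roles of plain assignment, `nonlocal` and `global` can be separated in a smaller example. This is a sketch with made-up names, meant to be run at module level.

```python
label = "global"


def outer():
    label = "local to outer"           # new local name; the global is untouched

    def reads_enclosing():
        return label                   # free variable, resolved in outer()

    def rebinds_enclosing():
        nonlocal label
        label = "changed by closure"   # rebinds outer()'s variable

    def rebinds_global():
        global label
        label = "changed global"       # rebinds the module-level name

    before = reads_enclosing()
    rebinds_enclosing()
    after = reads_enclosing()
    rebinds_global()
    return before, after, reads_enclosing.__code__.co_freevars


before, after, free_vars = outer()
print(before, after, free_vars, label, sep=" | ")
```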

## Variable bindings in closure

In [None]:
funs = []
for key in range(3):
    def plus_key(n, key=key):
        return key + n
    funs.append(plus_key)
    
print(funs[0](10))
print(funs[1](10))
print(funs[2](10))
print(funs)
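
The `key=key` default argument matters because a closure looks up its free variables when it is called, not when it is defined. A minimal sketch comparing both variants (`make_adders` is a hypothetical helper):

```python
def make_adders():
    late, bound = [], []
    for key in range(3):
        late.append(lambda n: key + n)             # looks up `key` at call time
        bound.append(lambda n, key=key: key + n)   # stores the current value as a default
    return late, bound


late, bound = make_adders()
print([f(10) for f in late])    # [12, 12, 12]
print([f(10) for f in bound])   # [10, 11, 12]
```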

---

# Python object lifecycle

* Variables defined in modules: they live as long as the module stays in memory
* Local variables in functions: typically destroyed when the function returns
* Attributes of objects: destroyed when the object is destroyed
* Closures: they extend the lifetime of the variables of the enclosing function that they reference. These variables are destroyed when the closure is destroyed.
* Implicit (temporary) objects in a function call: same lifecycle as local variables in functions. They are destroyed when the function finishes executing. A `del` statement inside the function does not destroy them immediately
```
def func(x):
    del x
func(' '*100)
```


## Object lifecycle in IPython

* Outputs/return values are cached in the In/Out history. This sometimes causes large memory usage. Use the magic `%reset` to clear the history cache

## Reference counting
* CPython uses automatic reference counting to manage the lifecycle of an object
* When the reference count of an object drops to 0, the object is destroyed immediately; a separate cyclic garbage collector (`gc`) handles reference cycles
* The reference count decreases when a name is reassigned, when a `del` statement runs, and when a function returns
```
import sys
obj = None
sys.getrefcount(obj)
```

In [None]:
import sys
sys.getrefcount(None)
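
Absolute reference counts depend on the interpreter version, so the sketch below only looks at how the count changes (CPython assumed).

```python
import sys

payload = ["some", "data"]
base = sys.getrefcount(payload)         # includes the temporary reference held by the call

alias = payload                         # a second name for the same object
with_alias = sys.getrefcount(payload)

del alias                               # removes the name, not the object
after_del = sys.getrefcount(payload)

print(with_alias - base, after_del - base)   # 1 0
```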

## `__del__` method

* Similar to a C++ destructor. It can be used to release resources (e.g. files, sockets)
* If a class defines `__del__`, the method is called when the object is about to be destroyed, which in CPython normally means its reference count has dropped to 0
* The statement `del obj` does not call `__del__` directly. It only decreases the reference count of the object
* `__del__` is useful, but relying on it is generally not recommended because of the weaknesses of the reference counting algorithm (see the "Circular reference" section)
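
The relation between `del` and `__del__` can be observed directly. This sketch assumes CPython, where an object is finalized as soon as its last reference disappears; `Resource` is a hypothetical class.

```python
events = []


class Resource:
    def __del__(self):
        events.append("finalized")


first = Resource()
second = first          # two names, one object

del first               # one reference remains: __del__ does not run
after_first_del = list(events)

del second              # last reference gone: CPython runs __del__ now
after_second_del = list(events)

print(after_first_del, after_second_del)   # [] ['finalized']
```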

## Circular reference

```
class A:
    pass
a = A()
a.var = a
```

```
class B:
    def __init__(self, var):
        self.var = var
a = A()
b = B(a)
a.var = b
```

Circular references may hide deep in the code. https://bugs.python.org/issue12836 is a circular reference bug in the Python standard library (ctypes) that was reported 9 years ago and has not been fixed so far.

## Problems caused by circular reference
* Memory leaks
* The `__del__` method may never be called, so certain resources are never released.
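
A self-referencing object shows why reference counting alone is not enough. In this sketch the cyclic collector is switched off temporarily so that its effect is visible; `Node` is a hypothetical class and CPython is assumed.

```python
import gc
import weakref


class Node:
    pass


gc_was_enabled = gc.isenabled()
gc.disable()                      # keep the cycle collector out of the way for the demo

node = Node()
node.var = node                   # the object references itself
probe = weakref.ref(node)         # observe the object without keeping it alive

del node                          # the self-reference keeps the count above 0
alive_after_del = probe() is not None

gc.collect()                      # the cycle collector finds and frees the cycle
alive_after_collect = probe() is not None

if gc_was_enabled:
    gc.enable()

print(alive_after_del, alive_after_collect)   # True False
```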


## Solutions for circular reference

1. Manual garbage collection
```
import gc; gc.collect()
```

2. Use a context manager rather than the `__del__` method

3. Weakref library
https://docs.python.org/3/library/weakref.html#example

4. Define a `close` method and call it explicitly for destruction. However, be careful with exceptions raised in the program: they may skip the statements that call `close`.
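
Solutions 2 to 4 can be combined in a few lines. This is a sketch with hypothetical classes (`Connection`, `Parent`, `Child`): the context manager guarantees that `close` runs even when an exception is raised, and the weak back-reference avoids creating a cycle in the first place.

```python
import weakref


class Connection:
    """Hypothetical resource with an explicit close() method."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()       # runs on normal exit and when an exception propagates
        return False       # do not swallow the exception


conn = Connection()
try:
    with conn:
        raise RuntimeError("failure in the middle of the work")
except RuntimeError:
    pass
print(conn.closed)         # True


class Parent:
    def __init__(self):
        self.children = []


class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)   # weak back-reference: no cycle


parent = Parent()
child = Child(parent)
parent.children.append(child)
del parent                                  # no strong reference is left
print(child.parent() is None)               # True in CPython
```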

---

# Serialization

* `pickle.dumps` / `pickle.loads` record and replay the construction of an object in the Python interpreter
  - Insecure: never unpickle data from untrusted sources, because unpickling can execute arbitrary code
  - Performance: moderate
  - Compatibility: Python-specific format
* `json.dumps` / `json.loads`
  - Text based, human readable, cross-platform
  - Performance: slow
  - Limited support for Python data structures
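
The difference in data-structure support shows up with tuples and sets. A minimal sketch with made-up sample data:

```python
import json
import pickle

record = {"point": (1, 2), "tags": {"a", "b"}}

# pickle round-trips Python-specific types unchanged
restored = pickle.loads(pickle.dumps(record))
print(restored == record)                 # True

# json rejects sets ...
try:
    json.dumps(record)
    json_error = None
except TypeError as exc:
    json_error = exc
print(type(json_error).__name__)          # TypeError

# ... and silently turns tuples into lists
via_json = json.loads(json.dumps({"point": (1, 2)}))
print(via_json)                           # {'point': [1, 2]}
```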

## Picklable
* built-in constants, types and functions
  (None, booleans, numbers, str, tuple, list, set, dict, ...)
* functions and classes defined at the top level of an importable module
* instances of custom classes whose `__dict__` is picklable or whose `__getstate__()` returns a picklable state

## Unpicklable
* I/O objects, file objects, sockets and closures cannot be pickled
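
The failures are easy to trigger. The exact exception type varies between Python versions, so this sketch catches the usual candidates; `make_adder` is a hypothetical helper.

```python
import os
import pickle


def make_adder(offset):
    def add(n):            # closure over `offset`, defined inside another function
        return n + offset
    return add


failures = {}
with open(os.devnull) as handle:
    for label, obj in [("closure", make_adder(1)), ("file", handle)]:
        try:
            pickle.dumps(obj)
        except (pickle.PicklingError, AttributeError, TypeError) as exc:
            failures[label] = type(exc).__name__

print(sorted(failures))    # ['closure', 'file']

payload = pickle.dumps(len)   # a built-in function pickles fine (by name)
```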

## Multiprocessing
* Multiprocessing relies on pickle.
* Objects sent to a child process (arguments, return values, queue items) are pickled in one process and unpickled in the other, so only picklable objects can cross the process boundary.
* Unpicklable objects may be missing in child processes. If your program relies on unpicklable objects, it may terminate silently or fail with errors that are hard to understand.

### Rule of thumb: whenever possible, prefer multithreading to multiprocessing
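
Threads share the interpreter's memory, so nothing has to be pickled. The sketch below passes a closure, which could not be sent to a worker process, to a thread pool; `make_scaler` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor


def make_scaler(factor):
    def scale(n):          # closure: usable from threads without pickling
        return n * factor
    return scale


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(make_scaler(3), range(5)))

print(results)    # [0, 3, 6, 9, 12]
```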

# Conclusion
* Python is great, but imperfect
* Think like a Python interpreter
* Read more, write more