# Python Tricks - The Book
## Assertions
- used to inform developers about **unrecoverable** errors in a program, **NOT** to signal expected error conditions (like a `FileNotFoundError`, where the user can simply retry)!
- Therefore, `assert` is a *debugging aid*, not a mechanism for handling runtime errors
### Pitfalls
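- Assertions can be globally disabled (e.g. by starting the interpreter with the `-O` command-line switch), so **do not use them for data validation**: the checks would silently be skipped
- wrapping the condition and the message in parentheses passes a single tuple to `assert`; since non-empty tuples are truthy, such an assertion always passes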

In [6]:
# Always passes:
assert (1 == 42, "This should fail but doesn't")

# Correct way:
assert 1 == 42, "This actually fails"

AssertionError: This actually fails
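
A minimal sketch of the rule above, using a hypothetical `apply_discount` function (not from the original notes): validating the input uses an explicit `raise`, which keeps working under `python -O`, while the `assert` only guards an internal invariant that should never fail if the code itself is correct.

```python
def apply_discount(price, discount):
    """Hypothetical helper: return `price` reduced by `discount` (0.0 to 1.0)."""
    # Data validation: an explicit check that also runs under `python -O`
    if not 0.0 <= discount <= 1.0:
        raise ValueError(f"invalid discount: {discount!r}")

    new_price = price * (1.0 - discount)

    # Debugging aid: internal invariant, stripped when assertions are disabled
    assert 0 <= new_price <= price, "discounted price out of range"
    return new_price


print(apply_discount(100, 0.25))  # 75.0

try:
    apply_discount(100, 1.5)
except ValueError as err:
    print(err)  # invalid discount: 1.5
```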

## Comma Placement for Lists
- to make diffs in version control systems easier to read, spread lists over multiple lines (one item per line)
- when adding an item, instead of appending a comma to the previous last line and then adding another line, **always end every line with a comma, even the last one**; the diff then shows only the single added line

In [8]:
names = [
    "Alice",
    "Bob",
    "Dilbert",      # Note the trailing comma
]
print(names)

['Alice', 'Bob', 'Dilbert']
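
Consistent trailing commas also guard against a subtle bug: if a comma is forgotten between two string literals, Python silently concatenates the adjacent literals (implicit string literal concatenation) instead of raising an error. A small sketch:

```python
names_missing_comma = [
    "Alice",
    "Bob"       # <- comma missing here...
    "Dilbert",  # ...so "Bob" and "Dilbert" are merged into one string
]
print(names_missing_comma)       # ['Alice', 'BobDilbert']
print(len(names_missing_comma))  # 2
```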


## Underscores
1. Leading Underscore `_var`
    - `_var` is – **by convention** – a (private) variable or method for *internal* use; the interpreter does not enforce this
    - when importing with a wildcard (`from module import *`), names starting with `_` are not imported unless the module lists them in `__all__` (but wildcard imports should be avoided anyway)
2. Trailing Underscore `class_`
    - used – **by convention** – to avoid naming conflicts with Python keywords (e.g. `class_`) or built-in names (e.g. `list_`)
3. Double Leading Underscore `__var`
    - A double underscore prefix causes the Python interpreter to rewrite the attribute name in order to avoid naming conflicts in subclasses (name mangling)

In [13]:
class Test:
    def __init__(self):
        self.foo = 11
        self._bar = 23
        self.__baz = 23  # name-mangled to self._Test__baz

t = Test()
print(dir(t))

class ExtendedTest(Test):
    def __init__(self):
        super().__init__()
        self.foo = 'overridden'
        self._bar = 'overridden'
        self.__baz = 'overridden'  # name-mangled to self._ExtendedTest__baz

print("== EXTENDED CLASS ==")
et = ExtendedTest()
print(et.foo)
print(et._bar)
print(et.__baz)  # no mangling outside a class body -> AttributeError

['_Test__baz', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_bar', 'foo']
== EXTENDED CLASS ==
overridden
overridden


AttributeError: 'ExtendedTest' object has no attribute '__baz'

*Note* that `__baz` is automatically renamed to `_Test__baz` (and to `_ExtendedTest__baz` inside `ExtendedTest`), so the subclass attribute does not collide with the one defined in `Test`
> Double underscores in Python are often called "dunders"
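
To make the renaming visible, the following sketch (a trimmed-down variant of the classes above, with a hypothetical `get_baz` method added) accesses both mangled names directly. Because the subclass attribute is stored under a different name, the method defined in `Test` still reads the base-class value:

```python
class Test:
    def __init__(self):
        self.__baz = 23  # stored as self._Test__baz

    def get_baz(self):
        return self.__baz  # compiled as self._Test__baz


class ExtendedTest(Test):
    def __init__(self):
        super().__init__()
        self.__baz = "overridden"  # stored as self._ExtendedTest__baz


et = ExtendedTest()
print(et._Test__baz)          # 23
print(et._ExtendedTest__baz)  # overridden
print(et.get_baz())           # 23: Test's method still sees its own attribute
```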

4. Double Leading and Trailing Underscores `__init__`
    - These **are not touched by name mangling**
    - used for dunder methods (aka magic methods) such as `__init__` or `__call__`
    - avoid this pattern for your own names, since they may collide with names added to the language in the future

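5. Single Underscore `_`
    - used as the name for temporary or insignificant ("don't care") variables
    - is a valid variable name, though; in the interactive interpreter it also holds the result of the last evaluated expression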
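
Two typical uses of `_` as a "don't care" name, sketched with made-up data (the `car` tuple is only an illustration): ignoring values while unpacking, and a loop whose counter is never read.

```python
car = ("red", "auto", 12, 3812.4)

# Only color and mileage are of interest; "_" marks the ignored values
color, _, _, mileage = car
print(color, mileage)  # red 3812.4

# "_" as a throwaway loop variable
greetings = []
for _ in range(3):
    greetings.append("Hello")
print(greetings)  # ['Hello', 'Hello', 'Hello']
```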

## Functions as First Class Citizens
- functions can be passed as arguments to other functions and returned from them
- functions can be assigned to variables and stored in data structures such as lists and dicts
- Objects can be made callable by implementing the `__call__(self, ...)` method
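
A compact sketch of all three points; the names `yell`, `greet`, `make_adder` and `Adder` are made up for illustration. It passes a function as an argument, returns one from another function (a closure), stores functions in a list, and makes instances callable via `__call__`:

```python
def yell(text):
    return text.upper() + "!"


def greet(func):
    # Functions can be passed to other functions as arguments...
    return func("hi there")


def make_adder(n):
    # ...and returned from them (add() remembers n: a closure)
    def add(x):
        return x + n
    return add


funcs = [str.lower, str.capitalize, yell]  # functions stored in a list
print([f("hey") for f in funcs])  # ['hey', 'Hey', 'HEY!']
print(greet(yell))                # HI THERE!


class Adder:
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        # Makes instances callable like functions
        return x + self.n


plus_3 = Adder(3)
print(plus_3(4), make_adder(3)(4), callable(plus_3))  # 7 7 True
```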

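## `__str__()` and `__repr__()` Dunder methods
- `__str__` (used by `str()` and `print()`) should be readable, aimed at end users
- `__repr__` (used by `repr()`, the interactive interpreter and container types) should be unambiguous, aimed at developers; if `__str__` is missing, Python falls back to `__repr__`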
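
A minimal sketch with a hypothetical `Car` class: `__repr__` returns something that looks like the constructor call (the `!r` conversion quotes strings), while `__str__` returns a friendly description. Note that a list of cars uses `__repr__` for its elements even when printed:

```python
class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        # Unambiguous, for developers
        return f"{self.__class__.__name__}({self.color!r}, {self.mileage!r})"

    def __str__(self):
        # Readable, for end users
        return f"a {self.color} car"


my_car = Car("red", 37281)
print(my_car)        # a red car
print(repr(my_car))  # Car('red', 37281)
print([my_car])      # [Car('red', 37281)]
```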

## Custom Exception Classes
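- when writing your own module, define custom exception classes for its error conditions
- also define a custom base exception class for your module and have all other exceptions inherit from it (derive it from `Exception` or a fitting built-in such as `ValueError`, not from `BaseException`)
    - callers can then catch a specific exception or, with a single `except` clause, every exception your module raises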

In [45]:
class MyModuleBaseValidationError(ValueError):
    pass

class NameTooShortError(MyModuleBaseValidationError):
    pass

def validate(name):
    if len(name) < 10:
        raise NameTooShortError(name)

try:
    validate("joe")
except NameTooShortError:
    print("Name too short!")
except MyModuleBaseValidationError:
    print("Error in MyModule")

Name too short!
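
To show why the common base class pays off, here is a sketch extending the example above with a hypothetical second error, `NameTooLongError`, and different length limits chosen just for illustration. A single `except MyModuleBaseValidationError` clause handles both:

```python
class MyModuleBaseValidationError(ValueError):
    """Base class for every validation error raised by this module."""


class NameTooShortError(MyModuleBaseValidationError):
    pass


class NameTooLongError(MyModuleBaseValidationError):  # hypothetical second error
    pass


def validate(name):
    if len(name) < 3:
        raise NameTooShortError(name)
    if len(name) > 10:
        raise NameTooLongError(name)


caught = []
for name in ["jo", "bartholomew", "joe"]:
    try:
        validate(name)
    except MyModuleBaseValidationError as err:
        # One handler covers all of the module's errors
        caught.append(type(err).__name__)
print(caught)  # ['NameTooShortError', 'NameTooLongError']
```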


## Class Variables VS Instance Variables
### Class Variables
- Not tied to an instance, but stored on the class itself
- is shared among *all instances of the class*
- assigning `ClassName.class_var = VALUE` changes the value for all instances (except those that shadow it, see next point)
- assigning `instance1.class_var = VALUE` instead creates an instance variable on `instance1` that shadows the class variable; the class and all other instances are unaffected

### Instance Variables
- created separately for each individual instance; changing one instance's variable does not affect the others

In [51]:
class Dog:
    num_legs = 4  # <- class variable

    def __init__(self, name):
        self.name = name  # <- instance variable

jack = Dog("Jack")
jill = Dog("Jill")

print(jack.name, jill.name)
print(jack.num_legs)
print(Dog.num_legs)  # class variable

jack.num_legs = 6  # creates an instance variable on jack that shadows the class variable

print(jack.name, jill.name)
print(jack.num_legs)
print(jill.num_legs)
print(Dog.num_legs)  # class variable is unchanged

Jack Jill
4
4
Jack Jill
6
4
4
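
Sharing has a flip side when the class variable is mutable: *mutating* it (as opposed to assigning to it) affects every instance, because no shadowing instance variable is created. A sketch with a hypothetical `tricks` list, followed by the usual fix of creating the list per instance in `__init__`:

```python
class Dog:
    tricks = []  # class variable: ONE list shared by all instances

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        # Mutates the shared list in place; no assignment, so nothing is shadowed
        self.tricks.append(trick)


fido = Dog("Fido")
buddy = Dog("Buddy")
fido.add_trick("roll over")
print(buddy.tricks)  # ['roll over']: buddy "learned" it too


class FixedDog:
    def __init__(self, name):
        self.name = name
        self.tricks = []  # instance variable: a new list for every instance

    def add_trick(self, trick):
        self.tricks.append(trick)


rex = FixedDog("Rex")
max_ = FixedDog("Max")
rex.add_trick("sit")
print(max_.tricks)  # []
```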


## Stock Python Data Types
### Dictionary
- average-case $O(1)$ for lookup, insert, delete
- **Special forms** (in `collections`): `OrderedDict`, `defaultdict`, `ChainMap`
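
A quick sketch of the three specialized dictionaries from `collections`, with made-up data. (Since Python 3.7 plain dicts also keep insertion order; `OrderedDict` adds extras such as `move_to_end()`.)

```python
from collections import ChainMap, OrderedDict, defaultdict

# defaultdict: missing keys are created on first access via a factory
word_lengths = defaultdict(list)
for word in ["apple", "bob", "cherry", "dan"]:
    word_lengths[len(word)].append(word)
print(dict(word_lengths))  # {5: ['apple'], 3: ['bob', 'dan'], 6: ['cherry']}

# ChainMap: searches several dicts in order, as if they were one mapping
defaults = {"color": "red", "size": "M"}
overrides = {"size": "L"}
settings = ChainMap(overrides, defaults)
print(settings["size"], settings["color"])  # L red

# OrderedDict: remembers insertion order and can reorder keys
od = OrderedDict(a=1, b=2, c=3)
od.move_to_end("a")
print(list(od))  # ['b', 'c', 'a']
```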
### Array-like
#### List
- Mutable, dynamic array
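- usable as a stack (`append()` and `pop()` at the end are fast)
- slow when imitating a queue: inserting or removing at the front is $O(n)$ because all remaining elements have to be shifted (`collections.deque` is the better fit)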
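
A short sketch contrasting a list used as a stack with `collections.deque` used as a queue (the strings are placeholder data):

```python
from collections import deque

# List as a stack: append() and pop() work at the end (amortized O(1))
stack = []
stack.append("eat")
stack.append("sleep")
print(stack.pop())  # sleep (last in, first out)

# Deque as a queue: appends and pops are O(1) at both ends
queue = deque()
queue.append("eat")
queue.append("sleep")
print(queue.popleft())  # eat (first in, first out)

# A list would also work as a queue, but list.pop(0) shifts every remaining element: O(n)
```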
#### Tuples
- immutable container
#### str
- immutable sequence of Unicode code points
#### bytes
- immutable sequence of single bytes (integers from 0 to 255)

### Sets
#### Set
- average-case $O(1)$ membership tests; union, intersection etc. run in $O(n)$, i.e. linear in the sizes of the sets involved
#### Multiset (`collections.Counter`)
- allows multiple occurrences of an element and counts how often each one occurs
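
A small sketch of both set types with made-up data: fast membership tests and set operations on a `set`, and occurrence counting with `Counter`:

```python
from collections import Counter

vowels = {"a", "e", "i", "o", "u"}
letters = set("alice")
print("e" in vowels)             # True (average O(1) membership test)
print(sorted(letters & vowels))  # ['a', 'e', 'i']

inventory = Counter()
inventory.update(["bread", "butter", "bread"])
print(inventory["bread"], inventory["jam"])  # 2 0 (missing items count as 0)
print(inventory.most_common(1))              # [('bread', 2)]
```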

## Generators and Iterators
- Iterators let custom objects be used directly in pythonic `for`-loops
- constructing an iterator for a custom class requires a lot of boilerplate code (implementing `__iter__()` and `__next__()`, raising `StopIteration`)
- Generators (functions that `yield` results instead of `return`ing them) achieve the same result with much less code
    - simple generator functions can also be written as *generator expressions*, which look like list comprehensions  
$\Rightarrow$ both produce values lazily, one at a time, instead of building the whole result in memory, which makes them ***very memory efficient***
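
To illustrate the boilerplate difference, a sketch of the same behaviour written twice: once as a class implementing the iterator protocol and once as a generator function. The names `Repeater` and `repeater` are just illustrative:

```python
class Repeater:
    """Class-based iterator: produces `value` exactly `max_repeats` times."""

    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration  # signals the end of the iteration
        self.count += 1
        return self.value


def repeater(value, max_repeats):
    """Generator version: Python creates the protocol methods for us."""
    for _ in range(max_repeats):
        yield value


print(list(Repeater("Hi", 3)))  # ['Hi', 'Hi', 'Hi']
print(list(repeater("Hi", 3)))  # ['Hi', 'Hi', 'Hi']
```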


In [53]:
square_generator = (x**2 for x in range(10))  # Use round parentheses instead of square brackets or curly braces
print(square_generator)

for e in square_generator:
    print(e)

<generator object <genexpr> at 0x1296c1dd0>
0
1
4
9
16
25
36
49
64
81
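
To make the memory claim concrete, a small sketch comparing a list comprehension with the equivalent generator expression via `sys.getsizeof`. That function measures only the container object itself, not the integers it references, and the exact byte counts depend on the Python version and platform, so no numbers are shown. It also demonstrates that a generator is exhausted after one pass, just like `square_generator` above after the loop:

```python
import sys

squares_list = [x**2 for x in range(1_000_000)]  # all values stored at once
squares_gen = (x**2 for x in range(1_000_000))   # values computed on demand

print(sys.getsizeof(squares_list))  # grows with the number of elements
print(sys.getsizeof(squares_gen))   # small and independent of the range size

total_matches = sum(squares_gen) == sum(squares_list)
print(total_matches)  # True

leftover = list(squares_gen)
print(leftover)  # [] (a generator can only be consumed once)
```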


## Pythonic Productivity Hacks
### Using `dir()` to examine the structure of a module/class/object

In [63]:
import pandas as pd
dir(pd)

['BooleanDtype',
 'Categorical',
 'CategoricalDtype',
 'CategoricalIndex',
 'DataFrame',
 'DateOffset',
 'DatetimeIndex',
 'DatetimeTZDtype',
 'ExcelFile',
 'ExcelWriter',
 'Float64Index',
 'Grouper',
 'HDFStore',
 'Index',
 'IndexSlice',
 'Int16Dtype',
 'Int32Dtype',
 'Int64Dtype',
 'Int64Index',
 'Int8Dtype',
 'Interval',
 'IntervalDtype',
 'IntervalIndex',
 'MultiIndex',
 'NA',
 'NaT',
 'NamedAgg',
 'Period',
 'PeriodDtype',
 'PeriodIndex',
 'RangeIndex',
 'Series',
 'SparseDtype',
 'StringDtype',
 'Timedelta',
 'TimedeltaIndex',
 'Timestamp',
 'UInt16Dtype',
 'UInt32Dtype',
 'UInt64Dtype',
 'UInt64Index',
 'UInt8Dtype',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__docformat__',
 '__file__',
 '__getattr__',
 '__git_version__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_config',
 '_hashtable',
 '_lib',
 '_libs',
 '_np_version_under1p14',
 '_np_version_under1p15',
 '_np_version_under1p16',
 '_np_version_under1p17',
 '_np_version_under1p

In [64]:
dir(pd.DataFrame)

['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '__abs__',
 '__add__',
 '__and__',
 '__annotations__',
 '__array__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__reduce__',
 '__reduce_e
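
Since `dir()` on a large package produces a very long list, filtering it is often handy. A small sketch, using the standard library's `datetime` module as a stand-in for pandas so it runs without extra dependencies (the exact names listed depend on the Python version):

```python
import datetime

# dir() lists everything; keep only the public names
public_names = [name for name in dir(datetime) if not name.startswith("_")]
print(public_names)

# Search for attributes containing a keyword
iso_names = [name for name in dir(datetime.date) if "iso" in name]
print(iso_names)
```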

### Using `help()`

In [69]:
help(pd.DataFrame.join)

Help on function join in module pandas.core.frame:

join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False) -> 'DataFrame'
    Join columns of another DataFrame.
    
    Join columns with `other` DataFrame either on index or on a key
    column. Efficiently join multiple DataFrame objects by index at once by
    passing a list.
    
    Parameters
    ----------
    other : DataFrame, Series, or list of DataFrame
        Index should be similar to one of the columns in this one. If a
        Series is passed, its name attribute must be set, and that will be
        used as the column name in the resulting joined DataFrame.
    on : str, list of str, or array-like, optional
        Column or index level name(s) in the caller to join on the index
        in `other`, otherwise joins index-on-index. If multiple
        values given, the `other` DataFrame must have a MultiIndex. Can
        pass an array as the join key if it is not already contained in
        the calli