# Working with Data (Part 2)

## Classes

### Use property to future-proof

- Initially, use a plain attribute; don't implement a property unless you have to
- When the need arises, implement the getter
- Only implement the setter when absolutely needed
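
A minimal sketch of that progression, using hypothetical `CircleV1` and `CircleV2` classes which are not part of the original notes. The point is that callers use the same `circle.radius` syntax before and after the switch to a property:

```python
class CircleV1(object):
    """First version: radius is a plain attribute."""

    def __init__(self, radius):
        self.radius = radius


class CircleV2(object):
    """Later version: radius became a property, callers are unchanged."""

    def __init__(self, radius):
        self.radius = radius      # Goes through the setter below

    @property
    def radius(self):
        # The getter, added only when the need arose
        return self._radius

    @radius.setter
    def radius(self, value):
        # The setter, added only because validation became necessary
        if value < 0:
            raise ValueError('radius must not be negative')
        self._radius = value


for circle in (CircleV1(2), CircleV2(2)):
    circle.radius = 3             # Same syntax for attribute and property
    print(circle.radius)
```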

### Additional Thoughts

- Use property to implement read-only attributes: just omit the setter
- If not absolutely needed, do not alter the object's state (attributes)
- Instead of changing the attributes, consider creating a method which duplicates the object with a new set of attributes
- For example, it is better for the object to provide a `get_refreshed_state` method which returns a new object than to provide a `refresh` method which updates the current object with new attributes (I am guilty of this).
- A class's `__init__` is an initializer, not a constructor. That means that by the time `__init__` runs, the object has already been created. The real constructor is `__new__`
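
The read-only and "return a new object" ideas can be sketched together. The `Sensor` class and its attribute names are hypothetical; only the `get_refreshed_state` name comes from the notes above:

```python
class Sensor(object):
    """Hypothetical example: state is fixed once the object is created."""

    def __init__(self, name, reading):
        self._name = name
        self._reading = reading

    @property
    def name(self):
        return self._name          # No setter, so the attribute is read-only

    @property
    def reading(self):
        return self._reading

    def get_refreshed_state(self, new_reading):
        # Return a new object instead of mutating this one
        return Sensor(self._name, new_reading)


old = Sensor('thermometer', 20)
new = old.get_refreshed_state(25)
print(old.reading)                 # The original object is untouched
print(new.reading)

try:
    old.reading = 30               # Assigning to a setter-less property fails
except AttributeError:
    print('reading is read-only')
```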

### `__str__` vs. `__repr__`

In general, `__repr__` (pronounced "repper") shows the object in its raw state, whereas `__str__` shows a more polished output, good for UI. PyCharm uses `__repr__` to display variables in its debugging window. Here are a few tips:

* Use `__str__` for UI display
* Use `__repr__` for internal work such as debugging
  * Identifiable: It should show an object's type
  * Distinguishable: If an object has many attributes, `__repr__` should display just enough attributes to distinguish one object from the next.
  * Debugging Note: Because one of its uses is debugging, it should return a single line instead of multiple lines of text. At the same time, the output should not be so long that it loses its significance

In [1]:
class Person(object):
    def __init__(self, first, last):
        self.first = first
        self.last = last
        
    def __str__(self):
        return '{person.first} {person.last}'.format(person=self)
    
    def __repr__(self):
        return 'Person({0!r}, {1!r})'.format(self.first, self.last)
    
singer = Person('Sylvie', 'Vartan')
print 'str:', singer         # <<< __str__ used
print 'repr:', repr(singer)  # <<< __repr__

str: Sylvie Vartan
repr: Person('Sylvie', 'Vartan')
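
One consequence worth seeing: a container shows its items using `__repr__`, even when the container itself is converted with `str()`. The sketch below repeats a small `Person` class so that it is self-contained, and uses single-argument `print(...)` calls so that it runs on Python 2 or 3:

```python
class Person(object):
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __str__(self):
        return '{0.first} {0.last}'.format(self)

    def __repr__(self):
        return 'Person({0!r}, {1!r})'.format(self.first, self.last)


singer = Person('Sylvie', 'Vartan')
as_str = '{}'.format(singer)       # Plain formatting ends up using __str__
as_repr = '{!r}'.format(singer)    # The !r conversion forces __repr__
in_list = str([singer])            # A list shows its items with __repr__

print(as_str)
print(as_repr)
print(in_list)
```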


## Context Manager

A context manager is an object (or a function wrapped by `contextlib.contextmanager`) which provides a context for the code within a `with` block. Note that a context is not the same as a scope. We can divide the execution flow of a context manager into three parts:

1. Prepare the context (optional)
2. Provide the context to the caller (by means of the `yield` statement in a function-based context manager, or the return value of `__enter__` in a class-based one)
3. Remove the context (optional)
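
The three parts can be made visible with a small sketch. The `managed_context` function and the `events` list are hypothetical names, used here only to record the order in which things happen:

```python
from contextlib import contextmanager

events = []   # Hypothetical log, used only to make the order visible


@contextmanager
def managed_context():
    events.append('prepare')        # 1. Prepare the context
    try:
        yield 'the context'         # 2. Provide it to the with block
    finally:
        events.append('remove')     # 3. Remove it, even after an error


with managed_context() as context:
    events.append('using ' + context)

print(events)
```

The `try`/`finally` around the `yield` is a design choice: without it, an exception inside the `with` block would skip the third part.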

### Common Context Managers

The most popular context manager is the file object returned by `open`, which is in the book. Others include the temporary files in the `tempfile` module.

In [2]:
from tempfile import TemporaryFile
with TemporaryFile() as f:
    f.write('Hello, world')
    f.seek(0)
    print f.read()
# After we exit the with block, the temp file will be deleted automatically

Hello, world


#### Example: HTML tag

A simple example which surrounds a block of output with an HTML tag. First, we implement the context manager using a class:

In [3]:
class Tag:
    def __init__(self, name):
        self.name = name
    
    def __enter__(self):
        print '<%s>' % self.name
        
    def __exit__(self, exc_type, exc_value, traceback):
        print '</%s>' % self.name
        
with Tag('p'):
    print 'Lorem ipsum dolor sit amet,'
    print 'consectetur adipiscing elit.'


<p>
Lorem ipsum dolor sit amet,
consectetur adipiscing elit.
</p>


Next, we implement the same context manager as a function:

In [4]:
from contextlib import contextmanager

@contextmanager
def tag(name):
    print '<%s>' % name
    yield
    print '</%s>' % name

with tag('p'):
    print 'Lorem ipsum dolor sit amet,'
    print 'consectetur adipiscing elit.'


<p>
Lorem ipsum dolor sit amet,
consectetur adipiscing elit.
</p>


#### Example: Redirect stdout

How do I test a function which writes output to stdout? The answer is to temporarily redirect the output to a file/buffer, run the function, restore the redirection and verify the output.

In [5]:
from contextlib import contextmanager
from StringIO import StringIO
import sys

@contextmanager
def redirect_stdout():
    old_stdout = sys.stdout
    sys.stdout = StringIO()
    try:
        yield sys.stdout
    finally:
        sys.stdout = old_stdout

# ==============================================================================

print 'Redirect stdout example'                   # Output to stdout

with redirect_stdout() as output:
    print 'Yabadabado'                            # Output redirected

print 'Captured: {!r}'.format(output.getvalue())  # Output restored to stdout


Redirect stdout example
Captured: 'Yabadabado\n'
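
To tie this back to the testing question, here is a sketch of a unit test built on the same idea. The `greet` function and the `GreetTest` class are hypothetical; `redirect_stdout` is repeated so that the block is self-contained, and the `StringIO` import falls back so that it runs on Python 2 or 3:

```python
import sys
import unittest
from contextlib import contextmanager

try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3


@contextmanager
def redirect_stdout():
    old_stdout = sys.stdout
    sys.stdout = StringIO()
    try:
        yield sys.stdout
    finally:
        sys.stdout = old_stdout


def greet(name):
    """Hypothetical function under test: it only writes to stdout."""
    print('Hello, {}!'.format(name))


class GreetTest(unittest.TestCase):
    def test_greet_writes_greeting(self):
        with redirect_stdout() as captured:
            greet('Fred')
        self.assertEqual(captured.getvalue(), 'Hello, Fred!\n')
```

In a script, calling `unittest.main()` would discover and run `GreetTest`.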


#### Example: pushd

Similar to the terminal's `pushd` command, the `pushd` context manager temporarily changes the working directory, then changes back.

In [6]:
import os
import contextlib


@contextlib.contextmanager
def pushd(new_dir):
    original_dir = os.getcwd()
    os.chdir(new_dir)
    try:
        yield
    finally:
        # Restore the original directory even if the with block raises
        os.chdir(original_dir)
    

print 'Before pushd:', os.getcwd()

with pushd('/'):
    print 'In the with block:', os.getcwd()

print 'After pushd: ', os.getcwd()

Before pushd: /Users/hvu/projects/idiomatic
In the with block: /
After pushd:  /Users/hvu/projects/idiomatic


### Context Manager Usages

Context managers are useful in many cases, including:

- Set up, do action, clean up
- Acquire resource, do action, release resource
- Go to a tab, do action, return to previous tab (See `temporarily_goto_sheet()` in our code tree)
- Set up, do action, verify
- Set up, do action, handle exceptions
- pushd, do action, popd
- Redirect output to a file, do action, restore redirection
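
As an illustration of the "set up, do action, handle exceptions" case, here is a sketch of a class-based context manager which swallows selected exceptions. The `Suppress` class is a hypothetical helper written for this example (Python 3.4 and later ship a similar `contextlib.suppress`):

```python
class Suppress(object):
    """Hypothetical helper that swallows the given exception types."""

    def __init__(self, *exception_types):
        self.exception_types = exception_types
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and issubclass(exc_type, self.exception_types):
            self.caught = exc_value
            return True    # A true value tells Python the error was handled
        return False       # Anything else propagates to the caller


with Suppress(KeyError) as guard:
    {}['missing']          # Raises KeyError, which the manager swallows

print('Caught: {!r}'.format(guard.caught))
```

The return value of `__exit__` is what decides whether an exception raised inside the `with` block is swallowed or re-raised.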


## Generators and Generator Expression

* A generator function is a function which contains `yield`; calling it returns a generator object
* A generator expression's syntax is similar to a list comprehension's, but with parentheses instead of square brackets
* A generator object gives one item at a time
* Code won't get executed until `next(generator_object)` is called
* A familiar object which behaves the same way: the file object returned by `open()`, which gives one line at a time (strictly speaking it is a lazy iterator, not a generator)
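
A short sketch of the two syntaxes side by side, next to a list comprehension for comparison. The `squares` function is a hypothetical example:

```python
def squares(numbers):
    """A generator function: calling it returns a generator object."""
    for number in numbers:
        yield number * number


as_list = [n * n for n in range(5)]      # List comprehension: built at once
as_genexp = (n * n for n in range(5))    # Generator expression: lazy
as_generator = squares(range(5))         # Generator function: also lazy

print(as_list)
print(next(as_genexp))       # Items are produced one at a time
print(next(as_genexp))
print(list(as_generator))    # list() drains whatever is left
```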

#### Demo: generators are lazy

The code in a generator function will not get executed until `next()` is called:

In [7]:
def make_bread():
    """ This kitchen prepares only two loaves a day """
    print '>>> Prepare kitchen'
    for loaf in ['Sour dough', 'Cinnamon raisin']:
        yield loaf
        print '>>> Clean up'
    print '>>> Close kitchen'
    
breads = make_bread()  # This function returns a generator object
print breads

<generator object make_bread at 0x10636b9b0>


The first `next()` will get the generator cranking:

In [8]:
print next(breads)

>>> Prepare kitchen
Sour dough


Notice in the call above, the code after the `yield` statement is not yet executed. Another call to the `next()` function will get it cranking again:

In [9]:
print next(breads)

>>> Clean up
Cinnamon raisin


Since the generator only yields two loaves, if we call `next()` again, the rest of the function runs and then we get a `StopIteration` error:

In [10]:
print next(breads)

>>> Clean up
>>> Close kitchen


StopIteration: 

This is how a `for` loop works: it keeps calling `next()` until a `StopIteration` error is raised:

In [11]:
for loaf in make_bread():
    print loaf

>>> Prepare kitchen
Sour dough
>>> Clean up
Cinnamon raisin
>>> Clean up
>>> Close kitchen

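Spelled out by hand, a `for` loop is roughly equivalent to the sketch below. The `loop_over` function is a hypothetical name; it collects the items instead of printing them so that the result is easy to inspect:

```python
def loop_over(iterable):
    """Spell out what a for loop does behind the scenes."""
    collected = []
    iterator = iter(iterable)        # A for loop first asks for an iterator
    while True:
        try:
            item = next(iterator)    # Then it keeps asking for the next item
        except StopIteration:
            break                    # StopIteration ends the loop quietly
        collected.append(item)       # This line stands in for the loop body
    return collected


print(loop_over(['Sour dough', 'Cinnamon raisin']))
```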

#### Generators and Generator Expression Are More Efficient and Use Less Memory

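Consider a "find first" problem using a list comprehension and a generator expression. The list comprehension has to traverse the whole list, even after locating the target. The generator expression, which looks similar to a list comprehension, stops as soon as the target is found. In this demo the target sits near the front of the list, so the gap is large; the closer the target is to the end, the smaller the gap becomes.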

In [12]:
import timeit

sheets = ['Sheet {}'.format(i) for i in range(10000)]
target = 'Sheet 5'

list_comp_timing = timeit.timeit(
    setup="from __main__ import sheets, target",
    stmt="[sheet for sheet in sheets if sheet == target][0]",
    number=10,
)
print 'Timing for list comprehension:   {:.6f}'.format(list_comp_timing)

generator_expr_timing = timeit.timeit(
    setup="from __main__ import sheets, target",
    stmt="next((sheet for sheet in sheets if sheet == target))",
    number=10,
)
print 'Timing for generator expression: {:.6f}'.format(generator_expr_timing)

print 'List comprehension is about {:.0f} times slower'.format(list_comp_timing / generator_expr_timing)


Timing for list comprehension:   0.018306
Timing for generator expression: 0.000053
List comprehension is about 344 times slower
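
The memory side of the claim can be sketched with `sys.getsizeof`. Note that it measures only the container itself, not the items it refers to, and the exact numbers vary by Python version and platform:

```python
import sys

numbers_list = [n for n in range(10000)]    # Holds references to all items
numbers_gen = (n for n in range(10000))     # Holds only its own state

list_size = sys.getsizeof(numbers_list)
gen_size = sys.getsizeof(numbers_gen)

print('List size in bytes:      {}'.format(list_size))
print('Generator size in bytes: {}'.format(gen_size))
```

The size of the generator stays the same no matter how many items it will eventually produce, while the list grows with its length.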
