# Intermediate Python for Engineers - Day 2

# Setup

Here are some setup instructions:

## Browser

Please use Chrome, Firefox, or Safari. *Preferably not IE or Edge*.

## Jupyter notebook server

1. Go to [https://hub.pythoncharmers.com/](https://hub.pythoncharmers.com/)
2. Your username is your email address
3. Your password is the same one you use to log in to Zulip; it is in the setup email.

## Load up your notebook

- It will be called "Yourname Day 2.ipynb".
- Save it (Control-S or File | Save Notebook).

We'll give more instructions on getting started with the notebooks as we start the course.

## Zulip chat server

1. Go to [https://pythoncharmers.zulipchat.com](https://pythoncharmers.zulipchat.com)
2. **Test:** Please send a message to our stream (#**485 Python for Engineers**) saying Hi!

## Resources

- Please download the course materials PDF (in your home folder)
- You may also download the `.zip` file containing the Jupyter Notebooks of the course materials

## Trainer notebook

We suggest that you load up the trainer's notebook from the Trainer folder in a separate browser tab and reload when needed.

(To do this, right-click the file, then choose "Open in New Browser Tab". Then hit F5 or Command-R whenever you want to see the latest version of the trainer's notebook.)

# Session 2: Further language features

Session 2 covers further Python language features and explains important distinctions and “gotchas” that matter when developing robust, efficient systems in Python:

- Lazy sequences (continued)
- When & how to use decorators
    - Worked example: writing a disk-backed caching decorator
    - Properties, class methods
- Context managers
    - Managing state; streamlining exception handling
    - Worked example: elegant code beyond PEP8 with context managers
- Managing memory
    - Mutability vs immutability
    - Memory profiling; garbage collection


In [3]:
import math
from calendar import isleap
def leap_years(start, end=None):
    print('Starting function ...')
    if end is None:
        end = math.inf
    year = start - 1
    while year < end:
        year += 1
        if isleap(year):
            print('About to yield ...')
            yield year
            print('Resuming function ...')
    print('Ending function')

In [4]:
years = leap_years(2000, 2040)
years

<generator object leap_years at 0x7f0b946f84a0>

In [5]:
next(years)

Starting function ...
About to yield ...


2000

In [6]:
list(years)

Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
About to yield ...
Resuming function ...
Ending function


[2004, 2008, 2012, 2016, 2020, 2024, 2028, 2032, 2036, 2040]
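
Because `leap_years` is a generator, its `end=None` default (an unbounded sequence) is safe as long as only a finite number of values is consumed. A minimal sketch using `itertools.islice` from the standard library; the function is repeated here without the `print` calls so that the block stands alone:

```python
import math
from calendar import isleap
from itertools import islice

def leap_years(start, end=None):
    """Yield leap years from start up to and including end (unbounded if end is None)."""
    if end is None:
        end = math.inf
    year = start - 1
    while year < end:
        year += 1
        if isleap(year):
            yield year

# Take only the first five values; the generator body never runs past them.
first_five = list(islice(leap_years(2000), 5))
print(first_five)
```

Calling `list(leap_years(2000))` without `islice` would never finish, since the loop condition `year < math.inf` is always true.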

In [10]:
def weekDays():
    yield 'Monday'
    yield 'Tuesday'
    yield 'Wednesday'
    yield 'Thursday'
    yield 'Friday'

In [11]:
w = weekDays()

In [13]:
next(w)

'Tuesday'
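
Each call to `next()` resumes the generator at the statement after the previous `yield`. Once the last `yield` has run, a further `next()` raises `StopIteration` unless a default is passed as the second argument. A short sketch (`week_days` is a stand-in name for the `weekDays` generator above):

```python
def week_days():
    yield from ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday')

w = week_days()
consumed = [next(w) for _ in range(5)]   # drains the generator

# With a default, next() returns it instead of raising StopIteration
after_end = next(w, None)

try:
    next(w)
    raised = False
except StopIteration:
    raised = True
```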

In [26]:
import toolz as tz
import os
from pathlib import Path
data_folder = Path('/Data')
paths = data_folder.glob('*')
files = list(filter(os.path.isfile, paths))
    
def get_ext(path):
    return path.suffix
def get_filesize(path):
    return path.stat().st_size
paths_by_ext = tz.groupby(get_ext, files)

def count_size_by_extension(paths_by_ext: dict) -> dict:
    return {extension: sum(get_filesize(file) for file in files)
            for extension, files in paths_by_ext.items()}

In [27]:
count_size_by_extension(paths_by_ext)

{'.txt': 4810857,
 '.gexf': 120238,
 '.hdf': 476544,
 '.pdf': 13947496,
 '.zip': 5274816,
 '.csv': 19871826,
 '.bz2': 10575422,
 '.xlsx': 1658671,
 '.sqlite': 12931072,
 '.jpg': 489445,
 '.xz': 138236,
 '.png': 2255125,
 '.h5': 25877534,
 '.parquet': 4156633,
 '.xls': 3543040,
 '.hdf5': 176120,
 '.json': 207544,
 '.gz': 9334755,
 '': 9,
 '.bin': 12288,
 '.ipynb': 11733,
 '.mat': 61832,
 '.npz': 1436}
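
The same grouping can be done in a single pass with only the standard library. The following is a sketch, not the course solution: `size_by_extension` is a hypothetical name, and the demonstration runs on a throwaway directory with made-up file names rather than on `/Data`.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

def size_by_extension(folder):
    """Return {suffix: total size in bytes} for the regular files directly in folder."""
    totals = defaultdict(int)
    for path in Path(folder).iterdir():
        if path.is_file():
            totals[path.suffix] += path.stat().st_size
    return dict(totals)

# Demonstration on a temporary directory (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'a.txt').write_bytes(b'12345')
    (Path(tmp) / 'b.txt').write_bytes(b'678')
    (Path(tmp) / 'c.csv').write_bytes(b'x,y\n')
    (Path(tmp) / 'README').write_bytes(b'hi')
    sizes = size_by_extension(tmp)
```

A file without an extension has the suffix `''`, which is why the output above contains an empty-string key.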

In [41]:
import hashlib

def get_hash(filename):
    """Read the given file's contents and return the MD5 checksum.
    
    If the file doesn't exist, raise an exception.
    """
    path = Path(filename)
    data = path.read_bytes()
    hasher = hashlib.md5(data)   # pass a byte-string
    return hasher.hexdigest()

# Group the files by checksum, then keep only checksums shared by more than one file
files_by_hash = tz.groupby(get_hash, files)
result = tz.valfilter(lambda paths: len(paths) > 1, files_by_hash)

In [43]:
result

{'d41d8cd98f00b204e9800998ecf8427e': [PosixPath('/Data/Airports.sqlite'),
  PosixPath('/Data/response.xlsx')],
 '79adb24ba37966d25928bcfcab8ae81e': [PosixPath('/Data/big-mac-2020-01-14.xls'),
  PosixPath('/Data/big-mac.xls')],
 '9b630a5faeef79758df24f372cc1a7ca': [PosixPath('/Data/melbourne-temp.csv'),
  PosixPath('/Data/bundoora-temp.csv')],
 '5c5729dadb801b8d478634a237e1b4bd': [PosixPath('/Data/forbes.csv'),
  PosixPath('/Data/forbes1964.csv')]}
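
The first key, `d41d8cd98f00b204e9800998ecf8427e`, is the MD5 digest of empty input, so the two files listed under it are presumably both empty rather than meaningful duplicates. Separately, `get_hash` reads each file fully into memory. For large files, one option is to feed the hasher in chunks; a sketch, where `get_hash_chunked` and its `chunk_size` default are illustrative choices:

```python
import hashlib
import tempfile
from pathlib import Path

def get_hash_chunked(filename, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading chunk_size bytes at a time."""
    hasher = hashlib.md5()
    with open(filename, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / 'empty.bin'
    empty.write_bytes(b'')
    data_file = Path(tmp) / 'data.bin'
    data_file.write_bytes(b'abc' * 1000)
    empty_digest = get_hash_chunked(empty)
    data_digest = get_hash_chunked(data_file, chunk_size=64)
```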

In [50]:
def make_bold(func):
    def inner(*args):
        print(type(args), args)
        return '<b>' + func(*args) + '</b>'
    # Important:
    return inner

In [51]:
def hello(name):
    return f'Hello {name}!'

In [52]:
bold_hello = make_bold(hello)

In [53]:
bold_hello('Ed')


<class 'tuple'> ('Ed',)


'<b>Hello Ed!</b>'
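
Writing `bold_hello = make_bold(hello)` by hand is what the `@` syntax does automatically. One side effect of wrapping is that the returned `inner` function hides the original's name and docstring; `functools.wraps` copies them across. A sketch of both points (the docstring on `hello` is added here for illustration):

```python
from functools import wraps

def make_bold(func):
    @wraps(func)               # copy __name__, __doc__ etc. onto inner
    def inner(*args, **kwargs):
        return '<b>' + func(*args, **kwargs) + '</b>'
    return inner

@make_bold                     # same as: hello = make_bold(hello)
def hello(name):
    """Return a greeting."""
    return f'Hello {name}!'

greeting = hello('Ed')
```

Without `@wraps(func)`, `hello.__name__` would be `'inner'`.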

In [66]:
import time

def timeme(func):
    def inner(*args, **kwargs):
        print(kwargs)
        start_time = time.time()
        result = func(*args, **kwargs)        
        end_time = time.time()
        print(f"Time passed {end_time - start_time}")
        return result
    return inner

In [67]:
@timeme
def hello(text, sleep=1):
    time.sleep(sleep)
    return text

In [68]:
hello('Rob', sleep=2)

{'sleep': 2}
Time passed 2.0020434856414795


'Rob'
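
`time.time()` follows the wall clock, which the system can adjust while the function runs. `time.perf_counter()` is the standard-library clock intended for measuring intervals. A variant of the decorator as a sketch; the `try`/`finally` is an addition so that the timing is reported even when the wrapped function raises:

```python
import time
from functools import wraps

def timeme(func):
    @wraps(func)
    def inner(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returns or raises
            elapsed = time.perf_counter() - start
            print(f'{func.__name__} took {elapsed:.4f} s')
    return inner

@timeme
def hello(text, sleep=0.01):
    time.sleep(sleep)
    return text

result = hello('Rob', sleep=0.02)
```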

In [85]:

def bounded(lower, upper):
    def decorator(func):
        def inner(*args, **kwargs):
            for arg in args:
                if not lower <= arg <= upper:
                    raise ValueError(f'arg must be between {lower} and {upper}')
            return func(*args, **kwargs)
        return inner
    return decorator


In [86]:
@bounded(3, 4)
def square(x):
    return x**2
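
A decorator that takes arguments needs three levels of nesting: `bounded(3, 4)` is called first and returns `decorator`, which is then applied to `square` and returns `inner`. The following sketch repeats the definitions so that it runs on its own and shows both the accepted and the rejected case:

```python
def bounded(lower, upper):
    def decorator(func):
        def inner(*args, **kwargs):
            for arg in args:
                if not lower <= arg <= upper:
                    raise ValueError(f'arg must be between {lower} and {upper}')
            return func(*args, **kwargs)
        return inner
    return decorator

@bounded(3, 4)
def square(x):
    return x**2

in_range = square(3.5)

try:
    square(5)
except ValueError as e:
    message = str(e)
```

Note that only positional arguments are checked: `square(x=5)` passes through `kwargs` unvalidated and returns 25.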

In [87]:
class Simple:
    def hello(self):
        print('Hello')

In [88]:
from contextlib import contextmanager   # decorator you apply to a gen function

@contextmanager
def context_hello():
    # __enter__ block
    print('__enter__')
    
    try:
        yield Simple()
    except Exception as e:
        print(e)
        
    # __exit__ block
    print('__exit__')

In [89]:
with context_hello() as obj:
    1 / 0
    obj.hello()

__enter__
division by zero
__exit__
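
The output shows that the exception raised inside the `with` block is thrown into the generator at the `yield`, where the `except` clause handles it; `obj.hello()` is never reached. The same protocol can be written as a class with `__enter__` and `__exit__` methods. A sketch, where `ContextHello` and its `log` list are hypothetical names used so that the behaviour can be inspected:

```python
class ContextHello:
    """Class-based counterpart of a generator-based context manager."""
    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append('__enter__')
        return self              # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            self.log.append(str(exc_value))
        self.log.append('__exit__')
        # A true return value suppresses the exception, mirroring an
        # "except Exception" clause around the yield.
        return exc_type is not None and issubclass(exc_type, Exception)

with ContextHello() as ctx:
    1 / 0
    ctx.log.append('never reached')
```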


In [101]:
import os

@contextmanager
def move_directory(path):
    old_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    except Exception as e:
        print(e)
    finally:
        # Always restore the original directory, even if the body raised
        os.chdir(old_cwd)

In [102]:
import pandas as pd

print(os.getcwd())
with move_directory('/Data'):
    print(os.getcwd())
    forbes = pd.read_csv('forbes.csv')

display(forbes[:3])
    
print(os.getcwd())

/
/home/data/Data


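|   | rank | name | country | sales | profits | assets | marketvalue |
|---|------|------|---------|-------|---------|--------|-------------|
| 0 | 1 | Citigroup | United States | 94.71 | 17.85 | 1264.03 | 255.3 |
| 1 | 2 | General Electric | United States | 134.19 | 15.59 | 626.93 | 328.54 |
| 2 | 3 | American Intl Group | United States | 76.66 | 6.46 | 647.66 | 194.87 |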


/


In [103]:
with move_directory('Data'):
    forbes = pd.read_csv('forbes.csv')

forbes[:3]

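|   | rank | name | country | sales | profits | assets | marketvalue |
|---|------|------|---------|-------|---------|--------|-------------|
| 0 | 1 | Citigroup | United States | 94.71 | 17.85 | 1264.03 | 255.3 |
| 1 | 2 | General Electric | United States | 134.19 | 15.59 | 626.93 | 328.54 |
| 2 | 3 | American Intl Group | United States | 76.66 | 6.46 | 647.66 | 194.87 |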
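
`move_directory` prints and then suppresses any exception raised in the `with` block, which can hide failures from the caller. An alternative design, sketched here under the hypothetical name `working_directory`, restores the directory in a `finally` clause and lets the exception propagate:

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily change the current working directory."""
    old_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the directory whether or not the body raised
        os.chdir(old_cwd)

before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        with working_directory(tmp):
            was_inside = os.path.samefile(os.getcwd(), tmp)
            raise RuntimeError('simulated failure')
    except RuntimeError as e:
        caught = str(e)
after = os.getcwd()
```

From Python 3.11 onwards, the standard library provides `contextlib.chdir` for the same purpose.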


In [104]:
pd.DataFrame.from_dict({'A': [1, 2, 3], 'B': [4, 5, 6]}, orient='index')


|   | 0 | 1 | 2 |
|---|---|---|---|
| A | 1 | 2 | 3 |
| B | 4 | 5 | 6 |


In [169]:
import json
from pathlib import Path

class BankAccount:
    def __init__(self, balance=0):
        self.__balance = balance
        
    def withdraw(self, amount):
        self.__balance -= amount
        
    def to_json(self, filename):
        print('Serializing to disk')
        # Write the balance (not the filename) to the file as JSON
        Path(filename).write_text(json.dumps(self.__balance))

    @classmethod
    def from_json(cls, filename):
        print('Deserializing from disk')
        # Read the file's contents and parse them as JSON
        balance = json.loads(Path(filename).read_text())
        account = cls(balance)   # BankAccount or a subclass
        return account

In [170]:
account = BankAccount(10**5)

In [171]:
account.to_json('account.json')


Serializing to disk


In [172]:
# BankAccount.from_json('account.json')
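
A class method receives the class itself as `cls`, so an alternative constructor such as `from_json` returns an instance of whichever class it was called on. A sketch of a full round trip through a file, assuming the balance is stored as a small JSON object; the `balance` property and the `SavingsAccount` subclass are hypothetical additions for illustration:

```python
import json
import tempfile
from pathlib import Path

class BankAccount:
    def __init__(self, balance=0):
        self._balance = balance

    @property
    def balance(self):
        """Read-only view of the balance."""
        return self._balance

    def withdraw(self, amount):
        self._balance -= amount

    def to_json(self, filename):
        Path(filename).write_text(json.dumps({'balance': self._balance}))

    @classmethod
    def from_json(cls, filename):
        data = json.loads(Path(filename).read_text())
        return cls(data['balance'])   # cls is BankAccount or a subclass

class SavingsAccount(BankAccount):
    pass

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'account.json'
    account = SavingsAccount(10**5)
    account.withdraw(250)
    account.to_json(path)
    restored = SavingsAccount.from_json(path)
```

Because `balance` is a property without a setter, `restored.balance = 0` raises `AttributeError`.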

In [185]:
!cp -r Materials/08*  Materials/extras/BeyondPEP8/* ./

In [186]:
%cd

/home/jovyan


In [187]:
!cp -r Materials/08*  Materials/extras/BeyondPEP8/* ./

In [188]:
# %load rt_ugly.py
#!/usr/bin/env python
"""
Ugly script to iterate through a routing table and print routes.

Lightly modified version of code from Raymond Hettinger's PyCon 2015 talk:

    "Beyond PEP8 -- Best practices for beautiful intelligible code"

This code was originally transliterated from Java into Python. In several
respects it still looks like Java code. In his talk, Hettinger explains how
it can be made much more Pythonic.

Quote from the talk:

> Pythonic means "coding beautifully in harmony with
>   the language to get the maximum benefits from Python".  Learn to recognize
>   non-pythonic APIs and to recognize good code.  Don't get distracted by PEP 8.
>   Focus first on Pythonic versus NonPythonic (P vs NP).  When needed, write
>   an adapter class to convert from the former to the latter.

    * Avoid unnecessary packaging in favor of simpler imports
    * Create custom exceptions
    * Use properties instead of getter methods
    * Create a context manager for recurring set-up and teardown logic
    * Use magic methods:
          __len__ instead of getSize()
          __getitem__ instead of getRouteByIndex()
          make the table iterable
    * Add good __repr__ for better debuggability
"""

from __future__ import print_function

import jnettool.tools.elements
from jnettool.tools import RoutingTable

ne=jnettool.tools.elements.NetworkElement( '171.0.2.45' )
try:
    routing_table=ne.getRoutingTable()  # fetch table

except jnettool.tools.elements.MissingVar:
  # Record table fault
  logging.exception( '''No routing table found''' )
  # Undo partial changes
  ne.cleanup( '''rollback''' )

else:
    num_routes=routing_table.getSize()   # determine table size
    for RToffset in range ( num_routes ):
           route=routing_table.getRouteByIndex( RToffset )
           name=route.getName()       # route name
           ipaddr=route.getIPAddr()          # ip address
           print("%15s -> %s"% (name,ipaddr)) # format nicely
finally:
    ne.cleanup( '''commit''' ) # lockin changes
    ne.disconnect()


In [191]:
!python rt_ugly.py

Creating NetworkElement(171.0.2.45)
         route1 -> 25.21.1.6
         route2 -> 25.21.1.7
Messages: commit
Disconnecting ...


In [190]:
!tree jnettool

jnettool
├── __init__.py
├── __pycache__
│   └── __init__.cpython-38.pyc
└── tools
    ├── elements
    │   ├── __init__.py
    │   └── __pycache__
    │       └── __init__.cpython-38.pyc
    ├── __init__.py
    └── __pycache__
        └── __init__.cpython-38.pyc

5 directories, 6 files


In [193]:

from __future__ import print_function

import jnettool.tools.elements
from jnettool.tools import RoutingTable

ne=jnettool.tools.elements.NetworkElement( '171.0.2.45' )
try:
    routing_table=ne.getRoutingTable()  # fetch table

except jnettool.tools.elements.MissingVar:
  # Record table fault
  logging.exception( '''No routing table found''' )
  # Undo partial changes
  ne.cleanup( '''rollback''' )

else:
    num_routes=routing_table.getSize()   # determine table size
    for RToffset in range ( num_routes ):
           route=routing_table.getRouteByIndex( RToffset )
           name=route.getName()       # route name
           ipaddr=route.getIPAddr()          # ip address
           print("%15s -> %s"% (name,ipaddr)) # format nicely
finally:
    ne.cleanup( '''commit''' ) # lockin changes
    ne.disconnect()


Creating NetworkElement(171.0.2.45)
         route1 -> 25.21.1.6
         route2 -> 25.21.1.7
Messages: commit
Disconnecting ...



In [196]:
!python rt_elegant.py

Creating NetworkElement(171.0.2.45)
         route1 -> 25.21.1.6
         route2 -> 25.21.1.7
Messages: commit
Disconnecting ...
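
The contents of `rt_elegant.py` are not shown here. The sketch below illustrates the refactoring ideas listed in the `rt_ugly.py` docstring (a property instead of a getter, `__len__` and `__getitem__`, a good `__repr__`, and a context manager for set-up and teardown). Every class and the `connect` function are hypothetical stand-ins, since the `jnettool` package is not reproduced in this notebook; the `events` list exists only so that the teardown order can be checked.

```python
from contextlib import contextmanager

class Route:
    def __init__(self, name, ipaddr):
        self.name = name
        self.ipaddr = ipaddr

    def __repr__(self):
        return f'Route({self.name!r}, {self.ipaddr!r})'

class RoutingTable:
    def __init__(self, routes):
        self._routes = list(routes)

    def __len__(self):                 # instead of getSize()
        return len(self._routes)

    def __getitem__(self, index):      # instead of getRouteByIndex(); also enables iteration
        return self._routes[index]

class NetworkElement:
    def __init__(self, address):
        self.address = address
        self.events = []               # records teardown calls for the demo

    @property
    def routing_table(self):           # property instead of a getter method
        return RoutingTable([Route('route1', '25.21.1.6'),
                             Route('route2', '25.21.1.7')])

    def cleanup(self, mode):
        self.events.append(mode)

    def disconnect(self):
        self.events.append('disconnect')

@contextmanager
def connect(address):
    """Set-up and teardown in one place: roll back on error, commit otherwise."""
    ne = NetworkElement(address)
    try:
        yield ne
    except Exception:
        ne.cleanup('rollback')
        raise
    else:
        ne.cleanup('commit')
    finally:
        ne.disconnect()

with connect('171.0.2.45') as ne:
    lines = [f'{route.name:>15s} -> {route.ipaddr}' for route in ne.routing_table]
print('\n'.join(lines))
```

With the teardown logic moved into `connect`, the calling code shrinks to the `with` statement and a loop over the table.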
