# Pythonic Code
## We Code Fest 2019 - Valladolid, Spain
Hey there! This Jupyter notebook is part of a live coding talk called "Pythonic Code", held at WeCodeFest 2019. It compares Python scripts with a pythonic version of the same code. My intention is to show good practices and tricks in the Python programming language.

## File management
Files must be opened to perform any I/O operation on them, and they must be closed afterwards. It's quite a common mistake to forget the second step, leaving the file descriptor open. This can lead to errors once the process reaches its limit of open file descriptors. How does Python help prevent this situation? With **context managers**. Let's see an example.

In [1]:
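myfile = open('test.txt', 'w')
myfile.write('test')
myfile.close()  # never reached if write() raises an exception

In [2]:
with open('test.txt', 'w') as myfile:
    myfile.write('test')  # the file is closed automatically when the block ends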
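To see what the `with` statement buys us, the sketch below raises an exception inside the block and checks that the file still gets closed. It then builds a small custom context manager with `contextlib.contextmanager`. The helper `opened_for_append` is a hypothetical name used only for illustration, and the code writes `test.txt` in the current directory, like the cells above.

```python
from contextlib import contextmanager

# The with statement closes the file even when the block raises an exception.
try:
    with open("test.txt", "w") as myfile:
        myfile.write("test")
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass
print(myfile.closed)  # True


@contextmanager
def opened_for_append(path):
    """Hypothetical helper: open a file for appending and always close it."""
    handle = open(path, "a")
    try:
        yield handle      # the body of the with block runs here
    finally:
        handle.close()    # runs whether the body finished normally or raised


with opened_for_append("test.txt") as appended:
    appended.write(" more")
print(appended.closed)  # True

with open("test.txt") as reader:
    content = reader.read()
print(content)  # test more
```

Roughly, a `with` block behaves like a `try`/`finally` that runs the cleanup code, which is exactly what the generator-based helper spells out.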

## Command line arguments
It is very common to use `sys.argv` and `getopt` to parse command-line arguments. That works, but Python provides an easier way: the **argparse** module.

The `getopt` module documentation has the following note at the top of the page:
> The `getopt` module is a parser for command line options whose API is designed to be familiar to users of the C `getopt()` function. Users who are unfamiliar with the C `getopt()` function or who would like to write less code and get better help and error messages should consider using the `argparse` module instead. 

In [7]:
import getopt
import sys
from os.path import basename

argv = ["getopt_example.py", "--list", "--group=default"]


def usage():
    msg = """
    {0} [ -l [ -g group_id ] ]
    Usage:
    \t-l                                    # List all groups
    \t-l -g group_id                        # List agents in group

    Params:
    \t-l, --list
    \t-g, --group
    """.format(basename(argv[0]))
    print(msg)


arguments = {'n_args': 0, 'n_actions': 0, 'group': None, 'list': False}
try:
    # Short options: -l, -g <value>, -h. Long options: --list, --group=<value>, --help.
    opts, args = getopt.getopt(argv[1:], "lg:h", ["list", "group=", "help"])
    arguments['n_args'] = len(opts)
except getopt.GetoptError as err:
    print(str(err) + "\n" + "Try '--help' for more information.")
    sys.exit(1)

for o, a in opts:
    if o in ("-l", "--list"):
        arguments['list'] = True
        arguments['n_actions'] += 1
    elif o in ("-g", "--group"):
        arguments['group'] = a
    elif o in ("-h", "--help"):
        usage()
    else:
        print("Invalid options.\nTry '--help' for more information.\n")

arguments

{'n_args': 2, 'n_actions': 1, 'group': 'default', 'list': True}

In [1]:
import argparse
parser = argparse.ArgumentParser(description='List groups or all agents inside a group.')
parser.add_argument('-l', '--list', help="List all groups", dest='list', action='store_true')
parser.add_argument('-g', '--group', help="List all agents in a group", dest='group', type=str)
parser.parse_args(['--list', '--group=default'])

Namespace(group='default', list=True)
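The `getopt` documentation promises better help and error messages with `argparse`. A minimal sketch of both, assuming we want `-g/--group` to be accepted only together with `-l/--list` (a rule suggested by the usage text of the `getopt` example, but enforced by neither version above). The program name `agent_groups` and the helper `parse` are hypothetical.

```python
import argparse

# Same options as above; without dest=..., argparse derives "list" and "group" from the long flags.
parser = argparse.ArgumentParser(prog="agent_groups",
                                 description="List groups or all agents inside a group.")
parser.add_argument("-l", "--list", help="List all groups", action="store_true")
parser.add_argument("-g", "--group", help="List all agents in a group", type=str)


def parse(argv):
    """Hypothetical helper: parse argv and reject -g without -l."""
    args = parser.parse_args(argv)
    if args.group is not None and not args.list:
        # parser.error() prints the usage line and the message to stderr, then exits with status 2.
        parser.error("-g/--group requires -l/--list")
    return args


args = parse(["--list", "--group=default"])
print(args.list, args.group)  # True default

help_text = parser.format_help()  # built from the description and the help= strings
print(help_text)
```

Running such a script with `-h` or `--help` prints that same help text automatically, and an unknown option produces a usage message and exit status 2 without any extra code.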

## Iterators
### List/Dict/Set comprehensions
We all know what a for loop is, and we have all used one at some point. Python has them too, but it also offers many other ways to iterate.

The most basic ones are list comprehensions. A list comprehension builds a list in a single expression. Let's compare creating a list the "traditional" way with creating one using a list comprehension.

In [35]:
%%time
my_list = []
for i in range(10000000):
    my_list.append(i)

Wall time: 10.6 s


In [36]:
%%time
my_list = [i for i in range(10000000)]

Wall time: 1.74 s


In addition to list comprehensions, there are set comprehensions and dict comprehensions too!

In [2]:
%%time
import random
random_items = random.sample(range(1000000000), k=10000000)

Wall time: 20.1 s


In [3]:
%%time
my_dict = {}
for i in random_items:
    my_dict[i] = i

Wall time: 6.43 s


In [4]:
%%time
my_dict = {i:i for i in random_items}

Wall time: 4.72 s


In [7]:
%%time
my_set = set()
for i in random_items:
    my_set.add(i)

Wall time: 4.45 s


In [8]:
%%time
my_set = {i for i in random_items}

Wall time: 3.22 s


In [9]:
%%time
my_set = set(random_items)

Wall time: 2.35 s


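As you can see, building a list/set/dict with a comprehension is generally faster than building it with a traditional for loop, and for the set, the built-in `set()` constructor is faster still. Why? There are a few reasons:
* A comprehension adds each item with a dedicated bytecode instruction (`LIST_APPEND`, `SET_ADD` or `MAP_ADD`), while the loop version has to look up and call `my_list.append`/`my_set.add` (or perform a generic subscript assignment for the dict) on every iteration. `set(random_items)` goes further: its whole loop runs in precompiled C code.
* Local name lookups are cheaper than attribute or global lookups, so calling a function bound to a local name is faster than calling a function from an external module on every iteration.
* If you only iterate over the values once, a generator expression avoids building an unnecessary list in memory.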
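A sketch that makes the first and last points concrete. It compiles a loop and a comprehension and collects the bytecode instructions each one uses (exact `dis` output differs between Python versions, but list comprehensions have used the `LIST_APPEND` opcode throughout Python 3), then compares the memory footprint of a list and a generator expression with `sys.getsizeof`. The helper `opnames` is hypothetical.

```python
import dis
import sys
import types


def opnames(code):
    """Hypothetical helper: opcode names used by a code object and its nested code objects.

    Before Python 3.12 a comprehension compiles to its own nested code object,
    so co_consts is walked recursively to find it.
    """
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


loop_src = """
my_list = []
for i in range(10):
    my_list.append(i)
"""
comp_src = "my_list = [i for i in range(10)]"

loop_ops = opnames(compile(loop_src, "<loop>", "exec"))
comp_ops = opnames(compile(comp_src, "<comp>", "exec"))
print("LIST_APPEND" in comp_ops)  # True: the comprehension has a dedicated opcode
print("LIST_APPEND" in loop_ops)  # False: the loop calls my_list.append instead

# A generator expression produces values lazily instead of storing them all.
squares_list = [i * i for i in range(1_000_000)]
squares_gen = (i * i for i in range(1_000_000))
list_bytes = sys.getsizeof(squares_list)  # the list object alone takes megabytes
gen_bytes = sys.getsizeof(squares_gen)    # a small generator object, whatever the range
print(list_bytes > gen_bytes)  # True
print(sum(squares_gen) == sum(squares_list))  # True
```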

## Unpacking tuples
To improve readability when iterating through any list/set/generator of tuples, it's possible to give each item of the tuple a name. For example, imagine we're iterating through the results of an SQL query, where each row contains the following fields:
* Agent name
* Agent ID
* Agent IP
* Agent OS name

In [1]:
database_rows = [('my_manager', 0, '172.10.0.34', 'Ubuntu Bionic'),
                 ('my_1st_agent', 1, '183.24.64.1', 'CentOS 7'),
                 ('my_2nd_agent', 2, '193.142.144.12', 'Solaris 10')]

for row in database_rows:
    if row[1] > 0:
        print(row[3])

CentOS 7
Solaris 10


To know what `row[1]` and `row[3]` stand for, you have to check the database's column headers. That makes the code harder to read and understand. In Python, however, it is possible to ***unpack*** tuples:

In [2]:
for a_name, a_id, a_ip, a_os in database_rows:
    if a_id > 0:
        print(a_os)

CentOS 7
Solaris 10


This code is much easier to read, but we're only using two of the columns, so why store the rest in variables? Let's just _ignore_ them by assigning them to `_`, a name that by convention means "unused value":

In [3]:
for _, a_id, _, a_os in database_rows:
    if a_id > 0:
        print(a_os)

CentOS 7
Solaris 10
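Two related tricks, sketched with the same sample rows: extended unpacking (`*_` collects any number of columns we want to skip) and `collections.namedtuple`, which lets us refer to columns by name instead of by position. The `Agent` type and its field names are assumptions that mirror the column list above; the database does not return them by itself.

```python
from collections import namedtuple

database_rows = [('my_manager', 0, '172.10.0.34', 'Ubuntu Bionic'),
                 ('my_1st_agent', 1, '183.24.64.1', 'CentOS 7'),
                 ('my_2nd_agent', 2, '193.142.144.12', 'Solaris 10')]

# Extended unpacking: *_ swallows every column between the first and the last one.
first_and_last = [(name, os_name) for name, *_, os_name in database_rows]

# Hypothetical Agent type: one named field per column, in the same order.
Agent = namedtuple("Agent", ["name", "id", "ip", "os"])
agents = [Agent(*row) for row in database_rows]
non_manager_os = [agent.os for agent in agents if agent.id > 0]
print(non_manager_os)  # ['CentOS 7', 'Solaris 10']
```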


## Control flow
In Python, as in many other languages, there are _control flow_ keywords such as `break` and `continue`.

The `continue` statement works exactly as in C: it skips the rest of the current iteration and jumps to the next one, so it can be used for _"filtering"_ inside a loop:

In [23]:
for i in range(2, 10):
    if i % 2 == 0:
        print(f"{i} is an even number.")
        continue
    print(f"{i} is not an even number.")

2 is an even number.
3 is not an even number.
4 is an even number.
5 is not an even number.
6 is an even number.
7 is not an even number.
8 is an even number.
9 is not an even number.


Personally, though, when only the matching items are needed, I prefer the built-in `filter` function:

In [20]:
for i in filter(lambda x: x % 2 == 0, range(2, 10)):
    print(f"{i} is an even number.")

2 is an even number.
4 is an even number.
6 is an even number.
8 is an even number.
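The same filter can be written as a comprehension with an `if` clause, and `itertools.filterfalse` gives the complementary items. A short sketch; the named predicate `is_even` is a hypothetical helper that replaces the `lambda` so it can be reused:

```python
from itertools import filterfalse


def is_even(x):
    """Hypothetical predicate used by all three variants."""
    return x % 2 == 0


evens = [i for i in range(2, 10) if is_even(i)]     # comprehension with an if clause
evens_filter = list(filter(is_even, range(2, 10)))  # same result with filter()
odds = list(filterfalse(is_even, range(2, 10)))     # items for which the predicate is False
print(evens, odds)  # [2, 4, 6, 8] [3, 5, 7, 9]
```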


The `break` statement stops the loop early. Often, it's important to know whether the loop stopped because of a `break` statement or ran to completion. The most straightforward way is to use a flag:

In [30]:
is_broken = False
for i in range(2, 10):
    for n in range(2, i):
        if i % n == 0:
            print(f"{i} = {n} * {i // n}.")
            is_broken = True
            break
    if not is_broken:
        print(f"{i} is a prime number.")
    else:
        is_broken = False

2 is a prime number.
3 is a prime number.
4 = 2 * 2.
5 is a prime number.
6 = 2 * 3.
7 is a prime number.
8 = 2 * 4.
9 = 3 * 3.


But there's a more pythonic way of doing this: an `else` clause. Yes, Python supports `else` clauses on loops. The code in the `else` clause runs when the loop terminates through exhaustion of the iterable (with `for`) or when the condition becomes false (with `while`), but **not when the loop is terminated by a `break` statement**. Therefore, the previous example looks like this:

In [31]:
for i in range(2, 10):
    for n in range(2, i):
        if i % n == 0:
            print(f"{i} = {n} * {i // n}.")
            break
    else:
        # Runs only if the inner loop finished without hitting break.
        print(f"{i} is a prime number.")

2 is a prime number.
3 is a prime number.
4 = 2 * 2.
5 is a prime number.
6 = 2 * 3.
7 is a prime number.
8 = 2 * 4.
9 = 3 * 3.
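The same clause works on `while` loops. Below is a sketch of a hypothetical helper, `smallest_divisor`, that uses `while`/`else` and only tests divisors up to the square root of `n`. Note also that, in the `for` version above, `2` is reported as prime because `range(2, 2)` is empty: a loop that never runs also reaches its `else` clause.

```python
def smallest_divisor(n):
    """Hypothetical helper: smallest divisor of n (n >= 2) greater than 1, or None if n is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            break
        d += 1
    else:
        # Reached only when the while condition became false without a break.
        return None
    return d


results = {n: smallest_divisor(n) for n in range(2, 10)}
print(results)  # {2: None, 3: None, 4: 2, 5: None, 6: 2, 7: None, 8: 2, 9: 3}
```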


### Itertools module
The `itertools` module in Python includes lots of functions for working with iterables easily. Many of them are inspired by functional languages such as Haskell (compare [Haskell's list API](https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-List.html)), which makes `itertools` a huge help when doing functional programming in Python.

Let's see a few examples where `itertools` can greatly improve the code.
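A handful of illustrative examples (picked for this rewrite, not taken from the original talk), with the Haskell function each one resembles noted in the comments:

```python
import itertools

# chain: iterate over several iterables as if they were one (like Haskell's concat).
merged = list(itertools.chain([1, 2], (3, 4), range(5, 7)))  # [1, 2, 3, 4, 5, 6]

# islice: take the first items of a possibly infinite iterator (like Haskell's take).
first_squares = list(itertools.islice((n * n for n in itertools.count(1)), 5))  # [1, 4, 9, 16, 25]

# accumulate: running totals (like Haskell's scanl1 (+)).
running = list(itertools.accumulate([1, 2, 3, 4]))  # [1, 3, 6, 10]

# groupby: group *consecutive* items with the same key (like Haskell's groupBy);
# sort by the key first if equal keys may be scattered.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = {k: list(g) for k, g in itertools.groupby(words, key=lambda w: w[0])}

# takewhile: stop at the first item that fails the predicate (like Haskell's takeWhile).
small = list(itertools.takewhile(lambda n: n < 10, itertools.count(1, 3)))  # [1, 4, 7]
```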

## Real world problems
Let's solve some real problems, going from a basic solution to a more pythonic one.

#### Advent of Code 2017 - Day 4: High-Entropy Passphrases

_A new system policy has been put in place that requires all accounts to use a passphrase instead of simply a password. A passphrase consists of a series of words (lowercase letters) separated by spaces._

_To ensure security, a valid passphrase must contain no duplicate words._

_For example:_

- _`aa bb cc dd ee` is valid._
- _`aa bb cc dd aa` is not valid - the word aa appears more than once._
- _`aa bb cc dd aaa` is valid - aa and aaa count as different words._

_The system's full passphrase list is available as your puzzle input. How many passphrases are valid?_

Attempt 1: keep an auxiliary list where all previously seen words are stored.

In [35]:
def pass_checker(passphrase):
    already_seen_words = []
    for word in passphrase:
        if word in already_seen_words:
            return False
        already_seen_words.append(word)
    return True

In [36]:
assert pass_checker(['aa', 'bb', 'cc', 'dd', 'ee'])
assert not pass_checker(['aa', 'bb', 'cc', 'dd', 'aa'])
assert pass_checker(['aa', 'bb', 'cc', 'dd', 'aaa'])

In [79]:
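import string
import random
# Despite its name, this is a list: 12,500 distinct words ('a', 'aa', 'bbb', ...) sampled from 26 * 499 candidates.
# It contains no duplicates, so it is the worst case: every checker has to examine every word.
pass_generator = random.sample([letter*i for letter in string.ascii_lowercase for i in range(1,500)], k=500*25)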

In [80]:
%%timeit
pass_checker(pass_generator)

808 ms ± 13.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


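Attempt 2: use a set instead of a list. Checking `word in already_seen_words` takes O(1) time on average for a set, but O(n) for a list, which has to be scanned element by element.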

In [89]:
def pass_checker2(passphrase):
    already_seen_words = set()
    for word in passphrase:
        if word in already_seen_words:
            return False
        already_seen_words.add(word)
    return True

In [90]:
assert pass_checker2(['aa', 'bb', 'cc', 'dd', 'ee'])
assert not pass_checker2(['aa', 'bb', 'cc', 'dd', 'aa'])
assert pass_checker2(['aa', 'bb', 'cc', 'dd', 'aaa'])

In [91]:
%%timeit
pass_checker2(pass_generator)

1.08 ms ± 2.59 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Attempt 3: Using `itertools.groupby`.

In [92]:
import itertools


def pass_checker3(passphrase):
    # After sorting, equal words are adjacent, so groupby() puts duplicates in the same group.
    return all(len(list(group)) == 1 for _, group in itertools.groupby(sorted(passphrase)))

In [93]:
assert pass_checker3(['aa', 'bb', 'cc', 'dd', 'ee'])
assert not pass_checker3(['aa', 'bb', 'cc', 'dd', 'aa'])
assert pass_checker3(['aa', 'bb', 'cc', 'dd', 'aaa'])

In [94]:
%%timeit
pass_checker3(pass_generator)

9.54 ms ± 77.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [95]:
%%timeit
sorted(pass_generator)

5.36 ms ± 43.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


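Why is attempt 2 faster than attempt 3? Because it is simpler and does less work: it doesn't need to sort (sorting alone takes 5.36 ms, more than half of attempt 3's 9.54 ms), and it goes through the list of words only once, stopping as soon as it finds a duplicate.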
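The puzzle actually asks how many passphrases are valid. A minimal sketch, assuming the puzzle input is a string with one space-separated passphrase per line; `pass_checker4` and `count_valid` are hypothetical names, and the set-size comparison is another common idiom that was not benchmarked above. Unlike attempt 2, it always builds the whole set instead of stopping at the first duplicate.

```python
def pass_checker4(passphrase):
    """Hypothetical checker: a set drops duplicates, so the sizes differ iff a word repeats."""
    return len(set(passphrase)) == len(passphrase)


def count_valid(puzzle_input, checker=pass_checker4):
    """Count valid passphrases, assuming one space-separated passphrase per line."""
    # True counts as 1 and False as 0, so sum() returns the number of valid lines.
    return sum(checker(line.split()) for line in puzzle_input.splitlines() if line.strip())


sample = """aa bb cc dd ee
aa bb cc dd aa
aa bb cc dd aaa"""
print(count_valid(sample))  # 2
```

Any of the checkers above can be passed as `checker`, since they all take a list of words and return a boolean.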