<img src= '../images/software_eng.jpg'>

<img src= '../images/doc.jpg'>

# Leveraging documentation

When writing code for data science, you'll inevitably need to install and use someone else's code. You'll quickly learn that this is much more pleasant when the author followed good software engineering practices. In particular, good documentation makes the right way to call a function obvious. In this exercise, you'll use Python's built-in `help()` function to view a function's documentation so you can work out how to call an unfamiliar method correctly.

In [4]:
# Load the Counter class into our environment
from collections import Counter

# View the documentation for Counter.most_common
help(Counter.most_common)

Help on function most_common in module collections:

most_common(self, n=None)
    List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.
    
    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]
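
As a hedged aside, the text that `help()` displays comes from the docstring stored on the object, so the same documentation can be read programmatically. The sketch below reuses the string from the docstring example above; the variable names `letter_counts`, `top_two`, and `summary_line` are illustrative and not part of the exercise.

```python
from collections import Counter

# Count the letters in the same string used in the docstring example
letter_counts = Counter('abcdeabcdabcaba')

# The signature most_common(self, n=None) shows that n is optional
top_two = letter_counts.most_common(2)
print(top_two)

# The docstring shown by help() is also available as plain text
summary_line = Counter.most_common.__doc__.strip().splitlines()[0]
print(summary_line)
```

Reading the signature first (`n=None`) is what tells you that calling `most_common()` with no argument is valid and lists every element.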



## PEP 8 in documentation
So far we've focused on how **PEP 8** affects functional pieces of code. There are also rules to help make comments and documentation more readable. In this exercise, you'll be fixing various types of comments to be **PEP 8** compliant.

In [5]:
def print_phrase(phrase, polite=True, shout=False):
    if polite:  # It's generally polite to say please
        phrase = 'Please ' + phrase

    if shout:  # All caps looks like a written shout
        phrase = phrase.upper() + '!!'

    print(phrase)

In [6]:
# Politely ask for help
print_phrase('help me', polite=True)

# Shout about a discovery
print_phrase('eureka', shout=True)

Please help me
PLEASE EUREKA!!
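
To make the comment rules concrete, here is a minimal sketch of the PEP 8 conventions most relevant to this exercise: block comments are full sentences indented to the level of the code they describe, and inline comments are separated from the statement by at least two spaces and start with `# `. The function `format_phrase` is a hypothetical variant of `print_phrase` that returns the string instead of printing it, so the result is easy to check.

```python
def format_phrase(phrase, polite=True, shout=False):
    """Return a phrase, optionally made polite and/or shouted"""
    # Block comment: a full sentence, indented like the code it
    # describes, with each line starting with '# '.
    if polite:
        phrase = 'Please ' + phrase

    if shout:
        phrase = phrase.upper() + '!!'  # Inline: two spaces before '#'

    return phrase


# polite defaults to True, so it must be switched off explicitly
print(format_phrase('eureka', polite=False, shout=True))
```

Note that `polite` defaults to `True`, which is why the shouted call in the cell above prints `PLEASE EUREKA!!` rather than `EUREKA!!`.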


## Writing docstrings

We just covered some of the benefits of docstrings. In this exercise, you'll practice writing docstrings that a documentation generator such as Sphinx can use.

In [51]:
import re

# Complete the function's docstring
def tokenize(text, regex=r'[a-zA-Z]+'):
    """Split text into tokens using a regular expression

    :param text: text to be tokenized
    :param regex: regular expression used to match tokens using re.findall
    :return: a list of resulting tokens

    >>> tokenize('the rain in spain')
    ['the', 'rain', 'in', 'spain']
    """
    return re.findall(regex, text, flags=re.IGNORECASE)

# Print the docstring
help(tokenize)

Help on function tokenize in module __main__:

tokenize(text, regex='[a-zA-Z]+')
    Split text into tokens using a regular expression
    
    :param text: text to be tokenized
    :param regex: regular expression used to match tokens using re.findall 
    :return: a list of resulting tokens
    
    >>> tokenize('the rain in spain')
    ['the', 'rain', 'in', 'spain']



In [52]:
print(tokenize('the rain in spain'))

['the', 'rain', 'in', 'spain']
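
Sphinx-style docstrings can also record types with the `:type:` and `:rtype:` fields. The sketch below applies them to a hypothetical helper, `count_tokens`, which is not part of the exercise, and uses `inspect.getdoc` to read the docstring with its indentation cleaned up.

```python
import inspect
import re


def count_tokens(text, regex=r'[a-zA-Z]+'):
    """Count the tokens matched by a regular expression

    :param text: text to be tokenized
    :type text: str
    :param regex: pattern passed to re.findall
    :type regex: str
    :return: number of tokens found
    :rtype: int

    >>> count_tokens('the rain in spain')
    4
    """
    return len(re.findall(regex, text))


# First line of the docstring, with indentation removed
summary_line = inspect.getdoc(count_tokens).splitlines()[0]
print(summary_line)
print(count_tokens('the rain in spain'))
```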


## Using good function names
A good function name goes a long way toward helping both users and maintainers understand your code. A good name is descriptive: it says what the function does.

Give the function the best possible name from the following options: `do_stuff`, `hypotenuse_length`, `square_root_of_leg_a_squared_plus_leg_b_squared`, `pythagorean_theorem`.

In [55]:
import math

def do_stuff(leg_a, leg_b):
    """Find the length of a right triangle's hypotenuse

    :param leg_a: length of one leg of triangle
    :param leg_b: length of other leg of triangle
    :return: length of hypotenuse
    
    >>> hypotenuse_length(3, 4)
    5
    """
    return math.sqrt(leg_a**2 + leg_b**2)

In [56]:
# Print the length of the hypotenuse with legs 6 & 8
print(do_stuff(6, 8))

10.0
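
Of the four options, `hypotenuse_length` is arguably the best fit: it names the result without being vague (`do_stuff`), overly long, or naming the method rather than the value (`pythagorean_theorem`). A minimal sketch with that name follows. It assumes this is the intended answer, and it writes the doctest's expected value as `5.0` because `math.sqrt` returns a float.

```python
import math


def hypotenuse_length(leg_a, leg_b):
    """Find the length of a right triangle's hypotenuse

    :param leg_a: length of one leg of triangle
    :param leg_b: length of other leg of triangle
    :return: length of hypotenuse

    >>> hypotenuse_length(3, 4)
    5.0
    """
    return math.sqrt(leg_a**2 + leg_b**2)


# Print the length of the hypotenuse with legs 6 & 8
print(hypotenuse_length(6, 8))
```

With the function name matching the name used in the docstring example, the example can actually be run by `doctest`.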


## Using good variable names
Just like function names, descriptive variable names can make your code much more readable. In this exercise, you'll write some code using good variable naming practices.

There isn't always a single best name for a variable, but this exercise is written so that one of the provided options is clearly the best choice.

In [58]:
from statistics import mean

Choose the best variable name to hold the sample of pupil diameter measurements in millimeters from the following choices: `d`, `diameter`, `pupil_diameter`, or `pupil_diameter_in_millimeters`.

In [None]:
# Sample measurements of pupil diameter in mm
your_option = [3.3, 6.8, 7.0, 5.4, 2.7]

Take the mean of the measurements and assign it to a variable. Choose the best variable name to hold this mean from the following options: `m`, `mean`, `mean_diameter`, or `mean_pupil_diameter_in_millimeters`.

In [60]:
# Average pupil diameter from sample
mean_diameter = mean([3.3, 6.8, 7.0, 5.4, 2.7])
print(mean_diameter)

5.04
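
Here is a sketch of how the two choices might read together. It assumes `pupil_diameter` is picked for the sample (the cell above leaves that answer as `your_option`) and keeps `mean_diameter` for the average.

```python
from statistics import mean

# Sample measurements of pupil diameter in mm
pupil_diameter = [3.3, 6.8, 7.0, 5.4, 2.7]

# Average pupil diameter from sample
mean_diameter = mean(pupil_diameter)
print(mean_diameter)
```

Both names say what is measured without repeating the unit, which the comment already records.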


<img src= '../images/modularity.jpg'>

# Python modularity in the wild
You'll use a class and a method from the popular package NumPy.

## Refactoring for readability
Refactoring longer functions into smaller units can help with both readability and modularity. In this exercise, you will refactor a function into smaller units. The function you will be refactoring is shown below. Note that the exercise omits docstrings to save space; in a real application, you should include documentation!

```python
def polygon_area(n_sides, side_len):
    """Find the area of a regular polygon
    :param n_sides: number of sides
    :param side_len: length of polygon sides
    :return: area of polygon
    >>> round(polygon_area(4, 5))
    25
    """
    perimeter = n_sides * side_len
    apothem_denominator = 2 * math.tan(math.pi / n_sides)
    apothem = side_len / apothem_denominator
    return perimeter * apothem / 2
```

In [61]:
def polygon_perimeter(n_sides, side_len):
    return n_sides * side_len

def polygon_apothem(n_sides, side_len):
    denominator = 2 * math.tan(math.pi / n_sides)
    return side_len / denominator

def polygon_area(n_sides, side_len):
    perimeter = polygon_perimeter(n_sides, side_len)
    apothem = polygon_apothem(n_sides, side_len)
    return perimeter * apothem / 2

In [63]:
# Print the area of a hexagon with sides of length 10
polygon_area(n_sides=6, side_len=10)

259.8076211353316
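
One hedged way to confirm that a refactoring preserved behavior is to keep the original around under a temporary name and compare results. In the sketch below, `polygon_area_original` is a hypothetical name for the unrefactored function shown earlier, and `math.isclose` is used so the comparison tolerates floating-point rounding.

```python
import math


def polygon_area_original(n_sides, side_len):
    # Unrefactored version, kept only for comparison
    perimeter = n_sides * side_len
    apothem_denominator = 2 * math.tan(math.pi / n_sides)
    apothem = side_len / apothem_denominator
    return perimeter * apothem / 2


def polygon_perimeter(n_sides, side_len):
    return n_sides * side_len


def polygon_apothem(n_sides, side_len):
    denominator = 2 * math.tan(math.pi / n_sides)
    return side_len / denominator


def polygon_area(n_sides, side_len):
    perimeter = polygon_perimeter(n_sides, side_len)
    apothem = polygon_apothem(n_sides, side_len)
    return perimeter * apothem / 2


# Compare both versions on a few regular polygons: (n_sides, side_len)
cases = [(3, 2), (4, 5), (6, 10)]
all_match = all(
    math.isclose(polygon_area(n, s), polygon_area_original(n, s))
    for n, s in cases
)
print(all_match)
```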

<img src= '../images/testing.jpg'>

In [68]:
import doctest
from collections import Counter

In [69]:
def sum_counters(counters):
    # Sum the inputted `counters`
    return sum(counters, Counter())

def sum_counters(counters):
    """Aggregate collections.Counter objects by summing counts

    :param counters: list/tuple of counters to sum
    :return: aggregated counters with counts summed

    >>> d1 = text_analyzer.Document('1 2 fizz 4 buzz fizz 7 8')
    >>> d2 = text_analyzer.Document('fizz buzz 11 fizz 13 14')
    >>> sum_counters([d1.word_counts, d2.word_counts])
    Counter({'buzz': 2, 'fizz': 4})
    """
    return sum(counters, Counter())

In [70]:
doctest.testmod()

**********************************************************************
File "__main__", line 10, in __main__.do_stuff
Failed example:
    hypotenuse_length(3, 4)
Exception raised:
    Traceback (most recent call last):
      File "C:\Users\vilieri.i\AppData\Local\Continuum\anaconda3\lib\doctest.py", line 1329, in __run
        compileflags, 1), test.globs)
      File "<doctest __main__.do_stuff[0]>", line 1, in <module>
        hypotenuse_length(3, 4)
    NameError: name 'hypotenuse_length' is not defined
**********************************************************************
File "__main__", line 11, in __main__.sum_counters
Failed example:
    d1 = text_analyzer.Document('1 2 fizz 4 buzz fizz 7 8')
Exception raised:
    Traceback (most recent call last):
      File "C:\Users\vilieri.i\AppData\Local\Continuum\anaconda3\lib\doctest.py", line 1329, in __run
        compileflags, 1), test.globs)
      File "<doctest __main__.sum_counters[0]>", line 1, in <module>
        d1 = text_analyze

TestResults(failed=4, attempted=5)
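
The first failure reported above is a `NameError`: the example calls a name that is not defined in the namespace where the doctest runs. One hedged way to avoid that class of failure is to keep examples self-contained. The sketch below builds its inputs from plain `Counter` objects and compares against an expected `Counter` instead of a printed representation. It runs only this function's examples using `doctest.DocTestFinder` and `doctest.DocTestRunner`; the sample strings are made up for illustration.

```python
import doctest
from collections import Counter


def sum_counters(counters):
    """Aggregate collections.Counter objects by summing counts

    :param counters: list/tuple of counters to sum
    :return: aggregated counters with counts summed

    >>> total = sum_counters([Counter('aab'), Counter('bcc')])
    >>> total == Counter({'a': 2, 'b': 2, 'c': 2})
    True
    """
    return sum(counters, Counter())


# Names the docstring examples are allowed to use
namespace = {'Counter': Counter, 'sum_counters': sum_counters}

# Collect and run the examples from this one docstring
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(sum_counters, name='sum_counters', globs=namespace):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

Passing an explicit `globs` dictionary makes it clear which names each example depends on, which is the detail the failing examples above were missing.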

<img src= '../images/git.png'>