##Answer to part 3

####Do this: re-run the test suite for the new function and see if we've broken anything

In [2]:
def find_common_ngrams(text, cutoff, n):
    
    # first generate a dict of n-gram counts
    words = text.lower().split(' ')
    ngram_counts = {}
    for start in range(len(words) +1 - n):
        ngram = ' '.join(words[start:start+n])
        current_count = ngram_counts.get(ngram, 0)
        ngram_counts[ngram] = current_count + 1
    
    # now return just the ngrams with count >= cutoff
    result = []
    for ngram, count in ngram_counts.items():
        if count >= cutoff:
            result.append(ngram)
        
    return result

text = "it was the best of times it was the worst of times"

# test different cutoffs for the same n
assert find_common_ngrams(text, 2, 1) == ['it', 'was', 'the', 'of', 'times']
assert find_common_ngrams(text, 1, 1) == ['it', 'was', 'the', 'best', 'of', 'times', 'worst']

# test different n with the same cutoff
assert find_common_ngrams(text, 2, 3) == ['it was the']
assert find_common_ngrams(text, 1, 12) == ['it was the best of times it was the worst of times']

# test crazy values of n
assert find_common_ngrams(text, 2, 0) == []
assert find_common_ngrams(text, 2, -4) == []

AssertionError: 

Looks like something is wrong - the new version fails at the very first test. Good job we tested it before starting to use it in real code. 

But the output isn't very helpful. Knowing only that a test failed doesn't give us any clues about *why* it's failing.

Let's switch to a better tool...

##Testing with `nose`

`nose` is a testing framework that does all the stuff we've seen, but better. 

###Naming conventions for `nose`

- install with `pip install nose` if you haven't already done so
- files that contain test code should start with `test` 
- tests should be in functions that start with `test`
- instead of using a bare `assert`, use `assert_equal()` from `nose.tools`
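
To see why the last point matters, here is a minimal sketch comparing the two kinds of failure message. It assumes `nose` may not be installed, so it borrows `assertEqual` from the standard library's `unittest` as a stand-in (the helpers in `nose.tools` are built on the same `unittest` methods). The `actual` list is made up for illustration and is not real output from our function.

```python
import unittest

# Stand-in for nose.tools.assert_equal, taken from the standard library
assert_equal = unittest.TestCase().assertEqual

expected = ['it', 'was', 'the']
actual = ['it', 'the', 'was']  # made-up result: right words, wrong order

# A bare assert only tells us that the comparison was false
try:
    assert actual == expected
except AssertionError as error:
    bare_message = str(error)

# assert_equal also describes how the two values differ
try:
    assert_equal(actual, expected)
except AssertionError as error:
    detailed_message = str(error)

print(repr(bare_message))
print(detailed_message)
```

The second message names the first element that differs and shows both lists, which is usually enough to start debugging.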

Here's our test suite for the old version of the function as a set of `nose` tests:

In [3]:
from nose.tools import assert_equal

def find_common_ngrams(text, cutoff, n):
    if n < 1:
        return []
    words = text.lower().split(' ')
    result = []
    for start in range(len(words) +1 - n):
        ngram = ' '.join(words[start:start+n])
        if text.count(ngram) >= cutoff and ngram not in result:
            result.append(ngram)
            
    return result

text = "it was the best of times it was the worst of times"

def test_single_words():
    assert_equal(find_common_ngrams(text, 2, 1),['it', 'was', 'the', 'of', 'times'])

def test_all_words():
    assert_equal(find_common_ngrams(text, 1, 1),['it', 'was', 'the', 'best', 'of', 'times', 'worst'])

def test_small_n():
    assert_equal(find_common_ngrams(text, 2, 3),['it was the'])

def test_large_n():
    assert_equal(find_common_ngrams(text, 1, 12),['it was the best of times it was the worst of times'])

def test_zero_n():
    assert_equal(find_common_ngrams(text, 2, 0),[])

def test_negative_n():
    assert_equal(find_common_ngrams(text, 2, -4),[])

To run them, save the code above as *test_ngrams.py*, then run

`nosetests`

in the same folder and look at the output:

```
$ nosetests
......
----------------------------------------------------------------------
Ran 6 tests in 0.002s

OK
```

As expected, all tests run OK. 

####Do this: edit the *test_ngrams.py* file, replace the old function definition with the new one, and re-run `nosetests`.

[click here for part 5](Testing for scientists part 5.html)
