## Lab exam 2020

## Rules

* **Read the rules carefully before you begin.**

<div class="alert alert-danger">
    
* This submission must be 100% your own work. Do not discuss, share or collaborate on any part of this exam with any other person. 
    
</div>

* This is an **open book** exam. You **may** consult external references (books, Stack Overflow, etc.) but you **must not** copy and paste code verbatim; nor may you reveal or discuss any aspect of the exam with anyone in an online (or offline) medium.
* You must submit the correct file on Moodle by the posted deadline. There are no extensions.
* You may not import *any* libraries beyond those already imported for you. You do not need to use a library just because it is imported!

---

### Marking
* This exam is marked out of 80.
* The division of marks is listed by each task.
* The three parts of this exam are *independent*. If you cannot complete one, this will not affect your ability to complete the others.
* The parts increase in difficulty. Remember: partial solutions will get credit. You do not have to have completely working code, or implement all of the features requested, to get most of the marks. Make an attempt if you can.
* You are warned that spending excess time will not likely increase your grade but will increase your stress levels. 
   

## Important
Please enter, into the cell marked **[STATEMENT BELOW]** below, the following statement:

> I, [your name], have read and understand the rules governing this lab exam and I will abide by them.
    
(double-click a cell to edit it).

I, Iliyan Kalphov, have read and understand the rules governing this lab exam and I will abide by them.
[STATEMENT HERE]

---

# Tasks

In [5]:
import numpy as np
import math, random, functools

# Background
A **word search puzzle** is a square grid of letters, where some sequences of letters make up words, and the rest are random letters. The puzzle is to find a list of known words in the grid. Words may be written horizontally, and sometimes vertically or even diagonally. In some variations, words can be reversed. 

The puzzle below contains the words `block`, `spain`, `sugar`, `beans`:

        S V B S B
        P Q L U E
        A I O G A
        I C C A N
        N W K R S

## Part 1: Reading words

You are building a system to help design word search puzzles. To do this, you have been asked to process some dictionary data to produce suitable word sets.

### (a) Read in words

Write a function `read_lines(fname)`. This should read a file named `fname` and return a list of lines, which should be stripped of newlines. There are 32 lines in the file `test.txt` and 19438 lines in the file `dictionary.txt`, and 0 lines in `empty.txt`. Write tests to make sure this function works. 

**[5 marks]**

In [2]:
# Solution goes here
def read_lines(fname):
    with open(fname) as file:
        list_of_lines = [line.strip('\n') for line in file]
    return list_of_lines

assert read_lines("empty.txt") == []
assert len(read_lines("test.txt")) == 32
assert len(read_lines("dictionary.txt")) == 19438

### (b) Clean up
You want to read the file `dictionary.txt` using the function above. 

Unfortunately, the dictionary has been transcribed poorly. Each line has several words in it, separated by one or more spaces. The case of letters is sometimes upper or lower case.  The words are in random order. Occasionally a page number appears mixed in with the words, which must be ignored. Words never contain line numbers; they are always separated by a space.

For example, part of `dictionary.txt` reads:

    Jags dunks
    uncoated CHAUFFEUSE drudgery
    249 ACCELERATIONS
    alabama
    Mellows sealed
    
The 249 is a page number and should be ignored. Valid words for our purposes only consist of alphabetic letters. These would be, in this case,

    jags
    dunks
    uncoated
    chauffeuse
    drudgery
    accelerations
    alabama
    mellows
    sealed
    
Any contractions or hyphenated words, like `don't` or `can't` or `topsy-turvy` should *also* be removed, as they have non-alphabetic characters.
  
* Write a function `clean_line(line)` that will take *one* line and return the valid words in it, all lowercase, **as a list**.
* Write tests to make sure `clean_line(line)` works correctly.
* Write a function `clean_all_lines(lines)` that will take a list like the return value of `read_lines` and return a list of words, cleaned by `clean_line` and **sorted into alphabetic order.**
* Write tests to make sure `clean_all_lines` works correctly.
* The words in `dictionary_txt` begin "a" and end "zwolle". There are 65152 words. 

**[12 marks]**

In [3]:
# Solution goes here
def clean_line(line):
    return [word.lower() for word in line.split() if word.isalpha()]

assert clean_line("")==[]
assert clean_line("ABC abC") == ["abc","abc"]
assert clean_line("123 abC") == ["abc"]
assert clean_line("123 ABc don't 32") == ["abc"]
assert clean_line("aBc topsy-turvy") == ["abc"]
assert clean_line("abc   sqrt 12") == ["abc","sqrt"]

def clean_all_lines(lines):
    word_list = []
    for line in lines:
        word_list.extend(clean_line(line))
    word_list.sort()
    return word_list

dictionary = read_lines("dictionary.txt")
assert len(clean_all_lines(dictionary))==65152
assert clean_all_lines(dictionary)[0] == "a"
assert clean_all_lines(dictionary)[len(clean_all_lines(dictionary)) - 1] == "zwolle"

### (c) Choose good words
Dictionary words aren't all good choices for a word search. 

The file `common_words.txt` has the 1000 most common words in English in it, one word per line. 

* Use the functions you defined above to read this file. All words are 1 to 13 characters long.
* Write tests to validate you have read it correctly.

#### Filtering the good words
* Write a function `good_words(dictionary, common_words)` that will take the dictionary words from `clean_all_lines()` and the common words you just loaded, and apply all of the following rules to select words from the dictionary that are good for a word search. Return the "good" words as a list, sorted in order of word length, shortest word first. A word is good if it:
    * is not a common word;
    * is three to eight characters long;
    * has at least one vowel;
    * is not equal to itself reversed (e.g. `naan` reversed is `naan` and would be excluded);
    * if two words in the dictionary are the same *except* one has an `s` at the end, the word without the `s` should be kept, and the other discarded. For examples, "cats" and "cat" should become just "cat"; "burger" and "burgers" should become "burger".
* Write tests to check `good_words` works OK  

For the last two marks, make your solution *reasonably* efficient -- in particular, it should avoid executing in O(N^2) time, where N is the number of words in the dictionary. As a very rough guide, an efficient implementation might take 100-2000ms running the timing test on most machines, but not *much* more.

There are around 20000-40000 good words. `yourself`, `zucchini`, `galaxies`, `ant`, `ape`, `ark` are all "good words".

**[20 marks]**

In [4]:
# Solution goes here
common_words = read_lines("common_words.txt")

#common_words assertions
assert len(common_words) == 1000
for word in common_words:
    assert len(word)>=1 and len(word)<=13

def good_words(dictionary_words, common_words):
    dictionary = {}
    l = []
    for word in dictionary_words:
        if not word in common_words:
            dictionary[word] = True
    for word in dictionary:
        flag = True
        if len(word)<3 or len(word)>8:
            flag = False
        if len([ch for ch in word if ch in "aeiou"])==0:
            flag = False
        if word == word[::-1]:
            flag = False
        if word.endswith('s'):
            if word[:len(word)-1] in dictionary:
                flag = False
        if flag:
            l.append(word)
    return sorted(l,key = lambda x: len(x))
    
#good_words asserstions
assert good_words(["cat","the","turtle","from"], common_words) == ["cat","turtle"]
assert good_words(["cat","postcodes"], common_words) == ["cat"]
assert good_words(["psst","krr","turtle"], common_words) == ["turtle"]
assert good_words(["prosecution","shovel","bob"], common_words) == ["shovel"]
assert good_words(["cat","cats","dog","dogs"], common_words) == ["cat","dog"]

dictionary = clean_all_lines(read_lines("dictionary.txt"))

assert len(good_words(dictionary, common_words)) >=20_000 and len(good_words(dictionary, common_words)) <=40_000
for word in ["yourself","zucchini","galaxies","ant","ape","ark"]:
    assert word in good_words(dictionary, common_words)

In [5]:
%%timeit -n 10 -r 2
## Timing test
good_words(dictionary, common_words)

1.15 s ± 9.04 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)


## Part 2: A word search finder

Word search puzzles use random letters to hide the letter patterns.

For example: find the words "spleen, matrix, coding, lambda, basil" in the 8x8 grid below:

    G S P L E E N D
    S P G H C F R M
    T P Z L O C A B
    I L D Z D V J A
    M A T R I X B S
    A J C G N X D I
    A A B S G R Z L
    V L A M B D A L
    
The solution can be seen in the lower case characters below:

    G s p l e e n D
    S P G H c F R M
    T P Z L o C A b
    I L D Z d V J a
    m a t r i x B s
    A J C G n X D i
    A A B S g R Z l
    V l a m b d a L
    
### Find words
Write a function `find_words(puzzle, words)` that takes a word search puzzle as a string, and a list of potential words, and returns each of the words found in the puzzle in a list. This should find words hidden horizontally (left-to-right) or vertically (top-to-bottom). The search should ignore case, and always return the found words in lower case. It should only ever return a detected word at most *once*. It should ignore any blank lines.

Note: if you choose to only detect horizontal words, you will lose five of the possible marks.

[**18 marks**]


In [8]:
# Solution goes here
def find_words(puzzle, words):
    puzzle_rows = puzzle.strip().lower().split("\n")
    array = np.array([row.split(" ") for row in puzzle_rows])
    new_l = []
    for word in words:
        word = word.lower()
        for row in array:
            row_string = "".join(row)
            if not word in new_l: 
                if word in row_string:
                    new_l.append(word)
        for column in range(array.shape[1]):
            column_string = "".join(array[:,column])
            if not word in new_l: 
                if word in column_string:
                    new_l.append(word)
    return new_l

In [9]:
## Tests

search = """
G S P L E E N D
S P G H C F R M
T P Z L O C A B
I L D Z D V J A
M A T R I X B S
A J C G N X D I
A A B S G R Z L
V L A M B D A L
"""

# compare two lists, ignoring order
def unordered_test(a,b):
    return sorted(a) == sorted(b)

assert unordered_test(find_words(search, ["lada", "bail"]), [])
assert unordered_test(find_words(search, []),[])
assert unordered_test(find_words(search, ["sbs"]),[])
assert unordered_test(find_words(search, ["matrix"]), ["matrix"])
assert unordered_test(find_words(search, ["lambda", "basil"]),["lambda", "basil"])
assert unordered_test(find_words(search, ["LAMBDA", "basil"]),["lambda", "basil"])
assert unordered_test(find_words(search, ["lAmbda", "BaSiL"]),["lambda", "basil"])
    
search_2 = """
F B S B s
G L P E u
B O A A g
P C I N a
O K N S r
"""

assert unordered_test(find_words(search_2, ["lada", "bail"]),[])
assert unordered_test(find_words(search_2, ["spain", "bail"]),["spain"])
assert unordered_test(find_words(search_2, ["spain", "sugar"]),["spain", "sugar"])
assert unordered_test(find_words(search_2, ["spainish"]),[])

search_3 = """
B S B S S S
E P E U U P
A A A G G A
N I N A A I
S N S R R N
S U G A R R
"""

assert unordered_test(find_words(search_3, ["lada", "bail"]),[])
assert unordered_test(find_words(search_3, ["spain"]),["spain"])
assert unordered_test(find_words(search_3, ["spain", "beans", "sugar"]),["spain", "beans", "sugar"])

## Adapt for reversed words

Adapt your solution to re-define a new version of `find_words(puzzle, words)` that also detects **reversed** words (both horizontal and vertical). For example

    M P F A O
    A I R J L
    P E O A L
    S C M M E
    Z E P Z H


contains "hello", but written bottom-to-top, and "pez" written right-to-left, as well as "jam" written top-to-bottom and "air" written left-to-right. The return value 
of `find_word(search, ["hello", "pez", "jam", "air"])` should be `["hello", "pez", "jam", "air"]` (in any order).

[**5 marks**]

In [12]:
# Solution goes here
def find_words(puzzle, words):
    puzzle_rows = puzzle.strip().lower().split("\n")
    array = np.array([row.split(" ") for row in puzzle_rows])
    new_l = []
    for word in words:
        word = word.lower()
        for row in array:
            row_string = "".join(row)
            if word in row_string or word in row_string[::-1]:
                if not word in new_l: 
                    new_l.append(word)
        for column in range(array.shape[1]):
            column_string = "".join(array[:,column])
            if word in column_string or word in column_string[::-1]:
                if not word in new_l: 
                    new_l.append(word)
    return new_l

In [13]:
search_4 = """
M P F A O
A I R J L
P E O A L
S C M M E
Z E P Z H
"""

assert find_words(search_4, [])==[]
assert find_words(search_4, ["lada", "bail"])==[]
assert find_words(search_4, ["spain", "bail"])==[]
assert find_words(search_4, ["hello"]) == ["hello"]
assert unordered_test(find_words(search_4, ["hello", "spam", "from", "me", "cats"]), ["hello", "spam", "from", "me"])
assert unordered_test(find_words(search_4, ["hello", "jam", "air", "pez"]), ["hello", "jam", "air", "pez"])

## Part 3: Generating a word search

Write a function `generate_word_search(n, words, horizontal=True, vertical=False, reversed=False)` that will generate a word search of size `n x n` containing all the words in `words` and return it as a string, one row per line. **If it is impossible to fit the words in the grid because they are too long, an error should occur.**

You do not need to consider the case where there are too many words to fit into a word search. You do not have to deal with overlapping words, but you must make sure that every word in `words` appears correctly in the final puzzle.

Each parameter `horizontal`, `vertical` should enable or disable embedding words in that orientation -- **at least one of these must be True, or an error should occur**; `reversed` will allow reversed words in all enabled orientations. If multiple directions are enabled, the direction of each word should be set randomly.

Every entry in the output word search should be an upper case letter. Passing an input word with a non-letter **should cause an error**.

Note:

* `random.choice(l)` chooses a random element of a list `l`
* `random.randint(0, n)` chooses a random number between 0 and (including) `n`

Note: the easiest (but not only) way to approach this is to initialise a grid of random letters and make *random attempts* to place words in it.



Every letter in a row in the returned string should be separated by spaces, as in the examples.

For example, 

    grid = generate_word_search(5, ["hello", "from", "me"])   

might produce:

    N I U R I
    Y X F J I
    H E L L O
    F R O M A
    H I M E Z
    
(note that the "background" letters are chosen *randomly*)
    
and
    
    grid = generate_word_search(8, ["hello", "from", "my", "secret", "lair"], horizontal=True, vertical=True)   
    
might produce:

    S C D D L G C R
    E E F H A M J J
    C T Q E I H F Q
    R K Y L R H E A
    E W C L T H H X
    T D A O A S K G
    Q P F R O M E P
    M Y L Y K C R G
    
while 

    grid = generate_word_search(4, ["hello", "from", "my", "secret", "lair"], horizontal=True, vertical=True)   
    
would produce an error (`secret` is more than 4 characters long), as would:

    grid = generate_word_search(5, ["hello", "from", "CS1P"])   

(as 1 is not a letter).


**[20 marks]**

In [15]:
# Solution goes here
def generate_word_search(n, words, horizontal = True, vertical = False, reversed = False):
    elements = [chr(random.randint(ord("A"),ord("Z"))) for i in range(n*n)]
    array = np.array(elements).reshape(n,n)
    for word in words:
        assert len(word) <= n, "{0} is larger than the grid".format(word)
            
        for elt in word:
            assert elt.isalpha(), "{0} contains non-letter characters".format(word)
            
        if reversed == True:
            reverse = random.choice(["reversed","not_reversed"])
            if reverse == "reversed":
                word = word[::-1]
                
        if horizontal == False and vertical == False:
            assert horizontal != False and vertical != False, "At least one of the directions have to be True" 
        
        if horizontal == True and vertical == True:
            direction = random.choice(["horizontal","vertical"])
            if direction == "horizontal":
                generate_words_on_row(array,word)
            else:
                generate_words_on_column(array,word)
        elif horizontal == True and vertical == False:
            generate_words_on_row(array,word)
        elif horizontal == False and vertical == True:
            generate_words_on_column(array,word)

    return "\n".join(" ".join(elt for elt in row) for row in array)
    
def generate_words_on_row(array, word):
    word = word.upper()
    random_row = random.choice(array)
    random_row_index = random.randint(0,len(random_row)-1)

    while len(random_row) - random_row_index < len(word):
        random_row = random.choice(array)
        random_row_index = random.randint(0,len(random_row)-1)

    for i in range(random_row_index, random_row_index + len(word)):
        random_row[i] = word[i - random_row_index]
        
    return array

def generate_words_on_column(array, word):
    word = word.upper()
    random_column = array[:,random.randint(0,array.shape[1]-1)]
    random_column_index = random.randint(0,len(random_column)-1)

    while len(random_column) - random_column_index < len(word):
        random_column = array[:,random.randint(0,array.shape[1]-1)]
        random_column_index = random.randint(0,len(random_column)-1)

    for i in range(random_column_index, random_column_index + len(word)):
        random_column[i] = word[i - random_column_index]
        
    return array        
    
print(generate_word_search(6, ["quizzz", "lambda", "matrix", "spleen", "basil"]))


U R T O I G
X C V Z M Q
B A S I L Z
M A T R I X
L D I W W N
S P L E E N


In [20]:
print(generate_word_search(8, ["coding", "lambda", "matrix", "spleen", "basil"]))
print()
print(generate_word_search(8, ["coding", "lambda", "matrix", "spleen", "basil"], horizontal=False, vertical=True))
print()
print(generate_word_search(8, ["coding", "lambda", "matrix", "spleen", "basil"],  vertical=True))
print()

S P L E E N N N
M D H E H C B L
O L D F F G W M
O Z C D J R Z G
L A M A T R I X
Z M I Y Z F F P
B A S I L O P U
K C O G S D X O

Y X G U L Z F Q
I X K A A R B M
Q A C P M T A Y
D V O I B E S E
R D D C D J I X
O U I E A J L V
H O N L C Z E H
G Y G Q H U N B

Z L A M B D A L
V L J X V B Z B
V O S P L E E N
Z S D H L N L Y
X C O D I N G U
J M A T R I X C
E O F G X Q F I
X I W B A S I L



In [28]:
print(generate_word_search(4, ["beta", "pram", "nimo"]))


A B X B
P R A M
N I M O
B E T A


In [37]:
## Tests

def fails(expr):
    try:
        expr()
    except:
        return True
    return False

def n_lines(s):
    return len(s.splitlines())

def elt_check(s, n):    
    return all(len(line.split())==n for line in s.splitlines())

assert n_lines(generate_word_search(8, ["matrix", "spleen", "basil"]))==8
assert n_lines(generate_word_search(6, ["matrix", "spleen", "basil"]))==6
assert n_lines(generate_word_search(12, ["matrix", "spleen", "basil"]))==12

assert elt_check(generate_word_search(12, ["matrix", "spleen", "basil"]), 12)
assert elt_check(generate_word_search(8, ["matrix", "spleen", "basil"]), 8)
assert elt_check(generate_word_search(6, ["matrix", "spleen", "basil"]), 6)
assert elt_check(generate_word_search(3, ["tea", "bee", "elk"]), 3)
assert n_lines(generate_word_search(3, ["tea", "bee", "elk"])) ==  3
assert n_lines(generate_word_search(6, []))==6
assert elt_check(generate_word_search(6, []),6)


assert (lambda :generate_word_search(8, ["127", "9", "matrix", "spleen", "basil"]))

assert fails(lambda :generate_word_search(4, ["coding", "lambda", "matrix", "spleen", "basil"])) and not fails(lambda:generate_word_search(4, ["code", "lamb", "matt"]))
assert fails(lambda :generate_word_search(8, ["127", "9", "matrix", "spleen", "basil"]))
assert fails(lambda :generate_word_search(8, ["back{}", "span's", "matrix", "spleen", "'''"]))
assert fails(lambda :generate_word_search(8, ["coding", "lambda", "matrix", "spleen", "basil"], horizontal=False)) and not fails(lambda :generate_word_search(8, ["coding", "lambda", "matrix", "spleen", "basil"]))


# END OF EXAM

Please:

* Take a break.
* Make sure you read each question carefully. The number one reason to lose marks is to not read the question!
* Check that each of your cells run as you expect. Try `Kernel/Restart and Run All` to make sure.
* Submit your solution on Moodle 
* And then relax :) 

---

