# Question 1 Word processing
Download this corpus of 10,000 common English words and, given a list of words, write the following functions:

- Compute the average length of the words (get_average_word_length(words)).
- What is the longest word (get_longest_word(words))?
- What is the longest word that starts with a given letter (get_longest_words_startswith(words, start))?
- What is the most common starting letter (get_most_common_start(words))?
- What is the most common ending letter (get_most_common_end(words))?
For testing, you can use this bit of code to download the words from the corpus:

from urllib.request import urlopen
u='https://storage.googleapis.com/class-notes-181217.appspot.com/google-10000-english-no-swears.txt'
response = urlopen(u)
words = [i.strip().decode('utf8') for i in response.readlines()]
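One possible sketch of the five functions follows; it is an illustration, not the reference solution. It assumes that ties are broken in favor of the word or letter that appears first in the list (which is what the functional tests below expect) and that get_longest_words_startswith returns an empty string when no word matches. It relies on max() returning the first maximal item and on Counter.most_common() ordering equal counts by first appearance.

```python
from collections import Counter

def get_average_word_length(words):
    """Mean number of characters per word."""
    return sum(len(word) for word in words) / len(words)

def get_longest_word(words):
    """Longest word; max() keeps the first one encountered on ties."""
    return max(words, key=len)

def get_longest_words_startswith(words, start):
    """Longest word beginning with `start`, or '' if no word does."""
    candidates = [word for word in words if word.startswith(start)]
    return max(candidates, key=len) if candidates else ""

def get_most_common_start(words):
    """Most frequent first letter; ties go to the letter seen first."""
    return Counter(word[0] for word in words if word).most_common(1)[0][0]

def get_most_common_end(words):
    """Most frequent last letter; ties go to the letter seen first."""
    return Counter(word[-1] for word in words if word).most_common(1)[0][0]
```

With the corpus loaded as words above, get_average_word_length(words) and the other calls can be run directly on the downloaded list.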

**Validation Tests** <br>
Check corner cases and constraints on the inputs, and list all cases used for testing.

In [None]:
# checks shared by all five functions
assert isinstance(words, list), "words must be a list"
assert len(words) > 0, "words cannot be an empty list"
assert all(isinstance(word, str) for word in words), "words must include only string-type objects"
# additional check for get_longest_words_startswith(words, start)
assert isinstance(start, str) and len(start) == 1 and start.isalpha() and start.islower(), "start must be a lower-case letter"
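The checks above refer to words and start as they appear inside the functions, so they cannot run on their own at the top level. A minimal sketch for exercising them, assuming they are bundled into hypothetical helpers validate_words and validate_start (these names are not part of the assignment), is to feed in one bad input per constraint and confirm that each is rejected. Note that assert statements are skipped when Python runs with -O.

```python
def validate_words(words):
    """Hypothetical helper: input checks shared by all five word functions."""
    assert isinstance(words, list), "words must be a list"
    assert len(words) > 0, "words cannot be an empty list"
    assert all(isinstance(word, str) for word in words), "words must include only string-type objects"

def validate_start(start):
    """Hypothetical helper: extra check for get_longest_words_startswith."""
    assert isinstance(start, str) and len(start) == 1 and start.isalpha() and start.islower(), \
        "start must be a lower-case letter"

def raises_assertion(check, value):
    """Return True if check(value) rejects the value with an AssertionError."""
    try:
        check(value)
    except AssertionError:
        return True
    return False

# One corner case per constraint; every value here should be rejected.
bad_words = ["not a list", [], ["word", 3]]
bad_starts = [None, "", "ab", "A", "1"]

for value in bad_words:
    print(repr(value), raises_assertion(validate_words, value))
for value in bad_starts:
    print(repr(value), raises_assertion(validate_start, value))
```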

**Functional Tests** <br>
Check that the function output matches the expected result, and list all cases used for testing.

*get_average_word_length(words)*

In [None]:
data = ["worda", ""]
assert 2.5 == get_average_word_length(data), "checking for lists including empty words (ideally impossible)"

In [None]:
data = ["worda", "wordb", "wordc", "wordd"]
assert 5 == get_average_word_length(data), "checking for lists in which all words are of equal length"

*get_longest_word(words)*

In [None]:
data = ["worda", "wordb", "wordc"]
assert "worda" == get_longest_word(data), "checking for the tie situation"

In [None]:
data = ['aaaaa', 'bb', 'ccc', 'eeeee']
assert "aaaaa" == get_longest_word(data), "checking for the tie situation"

*get_longest_words_startswith(words, start)*

In [None]:
words = ["aaaaa", "aword", "bword"]
start = "a"
assert "aaaaa" == get_longest_words_startswith(words, start), "checking for the tie situation"

In [None]:
words = ["aaaaa", "aword", "bword"]
start = "c"
assert "" == get_longest_words_startswith(words, start), "checking the case where no word starts with the given letter"

*get_most_common_start(words)*

In [None]:
data = ["aword", "bword", "cword"]
assert "a" == get_most_common_start(data), "checking for the tie situation"

In [None]:
data = ['aa', 'bb', 'ccc', 'ddd', 'eeee']
assert "a" == get_most_common_start(data), "checking the case where each word is made up of a single repeated letter"

*get_most_common_end(words)*

In [None]:
data = ["worda", "wordb", "wordc"]
assert "a" == get_most_common_end(data), "checking for the tie situation"

In [None]:
data = ["a", "worda", "wordb", "wordbb"]
assert "a" == get_most_common_end(data), "checking if function counts the correct position of each word"

In [None]:
data = ['a', 'bb', 'ccc', 'ddd', 'eeee']
assert "a" == get_most_common_end(data), "checking the case where each word is made up of a single repeated letter"

# Question 2 Interval objects
Problem: Interval arithmetic
Using Python object-oriented programming, write a class called Interval that represents a one-dimensional open interval on the real line. The main purpose of this class is to simplify overlapping continuous intervals. The code below should get you started, but there are a lot of missing pieces that you will have to figure out.

The API should take a pair of integers as input and respond to the + operator such that

    >>> a = Interval(1,3)
    >>> b = Interval(2,4)
    >>> c = Interval(5,10)
    >>> a + b
    Interval(1,4)
    >>> b + c
    [Interval(2,4), Interval(5,10)]

Note that in the case of non-overlapping intervals, the output should be a list of constituent Intervals. Keep in mind that these are open intervals. Specifically,

    >>> Interval(2,3) + Interval(1,2)
    [Interval(2,3), Interval(1,2)]

Note that these do not produce a single interval because each interval is open (not closed). The interval endpoints can also be negative (e.g., Interval(-10,-3) is valid). The output does not have to be sorted.
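Stated compactly (our own formulation of the rule the examples above follow, not text from the assignment), adding two open intervals $(a_1,b_1)$ and $(a_2,b_2)$ gives

$$
(a_1,b_1) + (a_2,b_2) =
\begin{cases}
\big(\min(a_1,a_2),\ \max(b_1,b_2)\big) & \text{if } \max(a_1,a_2) < \min(b_1,b_2),\\
\big[(a_1,b_1),\ (a_2,b_2)\big] & \text{otherwise.}
\end{cases}
$$

For example, for Interval(2,3)+Interval(1,2) we have $\max(2,1) = 2$, which is not less than $\min(3,2) = 2$, so the intervals share no interior point and the result is the list [Interval(2,3), Interval(1,2)].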

It's up to you to write the dunder functions for your object. If you do this right, you will have a very general solution to this problem.

This is where good object-oriented design pays off.

Note: Make sure to implement the \_\_eq\_\_ method below to pass all the test cases in the grader.

Starter Code:
    # fill out the necessary methods shown below and add others if need be.

    class Interval(object):
        def __init__(self, a, b):
            """
            :a: integer
            :b: integer
            """
            # check the types first so a < b is only evaluated on integers
            assert isinstance(a, int)
            assert isinstance(b, int)
            assert a < b
            self._a = a
            self._b = b
        def __repr__(self):
            pass
        def __eq__(self, other):
            pass
        def __lt__(self, other):
            pass
        def __gt__(self, other):
            pass
        def __ge__(self, other):
            pass
        def __le__(self, other):
            pass
        def __add__(self, other):
            pass
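One way to fill in the starter code is sketched below; it is not the official solution. Two choices are our own assumptions: the comparison methods order intervals lexicographically by (left endpoint, right endpoint), which the assignment does not specify, and \_\_add\_\_ returns [self, other] when the intervals do not overlap, matching the Interval(2,3)+Interval(1,2) example. \_\_hash\_\_ is added because in Python 3 a class that defines \_\_eq\_\_ without \_\_hash\_\_ becomes unhashable.

```python
class Interval(object):
    """Open interval (a, b) on the real line with integer endpoints."""

    def __init__(self, a, b):
        assert isinstance(a, int) and isinstance(b, int), "endpoints must be integers"
        assert a < b, "a must be smaller than b"
        self._a = a
        self._b = b

    def __repr__(self):
        # format expected by the tests: no space after the comma
        return 'Interval(%d,%d)' % (self._a, self._b)

    def __eq__(self, other):
        if not isinstance(other, Interval):
            return NotImplemented
        return (self._a, self._b) == (other._a, other._b)

    def __hash__(self):
        return hash((self._a, self._b))

    # Assumed ordering: by left endpoint, then by right endpoint.
    def __lt__(self, other):
        return (self._a, self._b) < (other._a, other._b)

    def __le__(self, other):
        return (self._a, self._b) <= (other._a, other._b)

    def __gt__(self, other):
        return (self._a, self._b) > (other._a, other._b)

    def __ge__(self, other):
        return (self._a, self._b) >= (other._a, other._b)

    def __add__(self, other):
        # Open intervals merge only if they share interior points;
        # touching endpoints such as (1,2) and (2,3) stay separate.
        if max(self._a, other._a) < min(self._b, other._b):
            return Interval(min(self._a, other._a), max(self._b, other._b))
        return [self, other]
```

With \_\_eq\_\_ defined on the elements, comparing the returned list against an expected list of Intervals works element by element, which is what the functional tests below rely on.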

**Validation Tests** <br>
Check corner cases and constraints on the inputs, and list all cases used for testing.

In [None]:
# checks performed in Interval.__init__
assert isinstance(a, int), "a must be an integer"
assert isinstance(b, int), "b must be an integer"
assert a < b, "a must be smaller than b"

**Functional Tests** <br>
Check that the function output matches the expected result, and list all cases used for testing.

*\_\_repr\_\_*

In [None]:
a, b = 1, 2
assert "Interval(1,2)" == Interval(a, b).__repr__(), "checking for repr format"

*\_\_eq\_\_*

In [None]:
a1, b1, a2, b2 = 1, 2, 1, 2
assert Interval(a1, b1) == Interval(a2, b2), "checking for true equality"

In [None]:
a1, b1, a2, b2 = 1, 2, 3, 4
assert Interval(a1, b1) != Interval(a2, b2), "checking for false equality"

In [None]:
a1, b1, a2, b2 = 1, 2, 1, 3
assert Interval(a1, b1) != Interval(a2, b2), "checking for false equality"

In [None]:
a1, b1, a2, b2 = -1, 3, 1, 3
assert Interval(a1, b1) != Interval(a2, b2), "checking for false equality"

*\_\_add\_\_*

In [None]:
a1, b1, a2, b2 = 1, 2, 3, 4
assert [Interval(1, 2), Interval(3, 4)] == Interval(a1, b1) + Interval(a2, b2), \
    "checking for intervals without intersection (case 1: a1 < b1 < a2 < b2)"

In [None]:
a1, b1, a2, b2 = 1, 2, -4, -3
assert [Interval(1, 2), Interval(-4, -3)] == Interval(a1, b1) + Interval(a2, b2), \
    "checking for intervals without intersection (case 2: a2 < b2 < a1 < b1)"

In [None]:
a1, b1, a2, b2 = 1, 3, 2, 4
assert Interval(1, 4) == Interval(a1, b1) + Interval(a2, b2), \
    "checking for intervals with intersection (case 3: a1 < a2 < b1 < b2)"

In [None]:
a1, b1, a2, b2 = 1, 4, 2, 3
assert Interval(1, 4) == Interval(a1, b1) + Interval(a2, b2), \
    "checking for intervals with intersection (case 4: a1 < a2 < b2 < b1)"

In [None]:
a1, b1, a2, b2 = 2, 3, 1, 4
assert Interval(1, 4) == Interval(a1, b1) + Interval(a2, b2), \
    "checking for intervals with intersection (case 5: a2 < a1 < b1 < b2)"

In [None]:
a1, b1, a2, b2 = 2, 4, 1, 3
assert Interval(1, 4) == Interval(a1, b1) + Interval(a2, b2), \
    "checking for intervals with intersection (case 6: a2 < a1 < b2 < b1)"

In [None]:
a1, b1, a2, b2 = -3, -1, -4, -3
assert [Interval(-3, -1), Interval(-4, -3)] == Interval(a1, b1) + Interval(a2, b2), \
    "checking for open intervals that only touch at an endpoint (case 7: a2 < b2 == a1 < b1)"

# Question 3 Random slash (no pictures)
Random Image Slash
A 6x6 black-and-white image is represented as a Numpy array x as follows:

    >>> import numpy as np
    >>> x = np.eye(6)

Note that this is a black-and-white image, not a grayscale or color image (a color image would have three dimensions, e.g., 6 x 6 x 3). It can easily be visualized using Matplotlib's imshow function, as in the following:

    >>> from matplotlib.pylab import subplots, cm
    >>> fig, ax = subplots()
    >>> ax.imshow(x, cmap=cm.gray_r)
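Because solutions may only use NumPy, it helps to build and inspect both orientations without Matplotlib. This sketch uses np.eye for the backslash and np.fliplr to mirror it into a forward slash; the variable names are illustrative only.

```python
import numpy as np

back = np.eye(6, dtype=int)           # backslash: ones where row == col
forward = np.fliplr(back)             # forward slash: ones where row + col == 5
shifted = np.eye(6, k=1, dtype=int)   # backslash moved one column right (5 pixels)

# Printing the arrays is enough to inspect them without Matplotlib.
print(back)
print(forward)
print(shifted)
```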

To debug an image processing algorithm, you have to generate a large number of exemplar training images that consist of such Numpy arrays. Each image should represent a forward-leaning or backward-leaning slash symbol (the backward-leaning one is shown above). Each symbol must consist of at least two non-zero pixels and be contiguous (i.e., no gaps). For example, the longest possible slash symbol that is representable in a 6x6 image is the 6-pixel diagonal shown above (or its forward-leaning counterpart).

The assignment is to write a function that produces a uniformly random forward-slash or backslash image (i.e., a Numpy array) with at least two non-zero pixels. The following code draws a 3x3 grid of such images:

    from matplotlib.pylab import subplots, cm

    fig, axs = subplots(3, 3, sharex=True, sharey=True)
    for ax in axs.flatten():
        ax.imshow(gen_rand_slash(), cmap=cm.gray_r)


Here is the function signature: gen_rand_slash(m=6,n=6,direction='back'). The direction keyword argument can be either 'back' or 'forward'. m is the number of rows and n is the number of columns in the image.

Note: Don't import matplotlib in your solution. The only allowed external library is numpy.
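A minimal NumPy-only sketch of gen_rand_slash follows. It assumes "uniformly random" means every valid slash of the requested direction (any contiguous diagonal segment of at least two pixels that fits in the m x n grid) is equally likely; the assignment does not define this precisely. It enumerates every backslash segment as (start row, start column, length), picks one uniformly, and mirrors the columns for a forward slash. Mirroring is a one-to-one mapping between backslashes and forward slashes, so the choice stays uniform.

```python
import numpy as np

def gen_rand_slash(m=6, n=6, direction='back'):
    """Return an m x n int array containing one random slash of >= 2 pixels."""
    assert isinstance(m, int) and isinstance(n, int), "m and n must be integers"
    assert m > 1 and n > 1, "m and n must be at least 2"
    assert direction in ('back', 'forward'), "direction can only be back or forward"

    # Every backslash segment as (start row, start col, length).
    # A segment starting at (r, c) can extend at most min(m - r, n - c) pixels.
    segments = [(r, c, k)
                for r in range(m)
                for c in range(n)
                for k in range(2, min(m - r, n - c) + 1)]

    # Pick one segment uniformly at random.
    r, c, k = segments[np.random.randint(len(segments))]

    img = np.zeros((m, n), dtype=int)
    steps = np.arange(k)
    img[r + steps, c + steps] = 1      # down-right diagonal: a backslash

    if direction == 'forward':
        # Mirror the columns to turn the backslash into a forward slash.
        img = np.fliplr(img)
    return img
```

Enumerating all segments is cheap for small images like 6x6 and makes the uniformity assumption explicit; for very large images, sampling the segment directly would avoid building the list.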


**Validation Tests** <br>
Check corner cases and constraints on the inputs, and list all cases used for testing.

In [None]:
# checks performed at the top of gen_rand_slash
assert isinstance(m, int), "m must be an integer"
assert isinstance(n, int), "n must be an integer"
assert isinstance(direction, str), "direction must be a string"
assert m > 1 and n > 1, "m and n must be at least 2"
assert direction == "back" or direction == "forward", "direction can only be back or forward"

In [None]:
import numpy as np
op = gen_rand_slash()
assert(isinstance(op, np.ndarray)), \
        "output must be a numpy array"

**Functional Tests** <br>
Check that the function output matches the expected result, and list all cases used for testing.

In [None]:
import numpy as np
def count_slash_area(slash):
    """Counts the black area of slash which is marked as 1"""
    return len(np.where(slash > 0)[0])

In [None]:
# get 100 random slashes in back direction
data = [gen_rand_slash(6, 8) for i in range(100)]
assert all(2 <= count_slash_area(slash) <= 6 for slash in data), \
    "slash area must be at least 2 and not more than the length of the shorter side (case 1: back direction and m < n)"

In [None]:
# get 100 random slashes in back direction
data = [gen_rand_slash(8, 6) for i in range(100)]
assert all(2 <= count_slash_area(slash) <= 6 for slash in data), \
    "slash area must be at least 2 and not more than the length of the shorter side (case 2: back direction and m > n)"

In [None]:
# get 100 random slashes in forward direction
data = [gen_rand_slash(6, 8, "forward") for i in range(100)]
assert all(2 <= count_slash_area(slash) <= 6 for slash in data), \
    "slash area must be at least 2 and not more than the length of the shorter side (case 3: forward direction and m < n)"

In [None]:
# get 100 random slashes in forward direction
data = [gen_rand_slash(8, 6, "forward") for i in range(100)]
assert all(2 <= count_slash_area(slash) <= 6 for slash in data), \
    "slash area must be at least 2 and not more than the length of the shorter side (case 4: forward direction and m > n)"

In [None]:
def check_if_contiguous(arr, mode):
    """
    Takes as input an array produced by gen_rand_slash() and verifies that all 1's
    lie on a single diagonal in the given direction and are contiguous (no gaps).
    :param arr: numpy array with 1's
    :type arr: numpy array
    :param mode: string indicating 'forward' or 'back'
    :type mode: str

    returns True if no assertion error is raised

    Author: Chaitanya
    """

    import numpy as np

    assert isinstance(arr, np.ndarray)
    assert isinstance(mode, str) and mode in ('forward', 'back')

    rows, cols = np.where(arr)
    assert len(rows) >= 2, "a slash needs at least two non-zero pixels"

    if mode == 'forward':
        # forward slash (/): row + col is constant along an anti-diagonal
        assert np.all(rows + cols == rows[0] + cols[0])
    else:
        # backslash, e.g. np.eye: row - col is constant along a diagonal
        assert np.all(rows - cols == rows[0] - cols[0])

    # one pixel per row on a single diagonal, so consecutive rows mean no gaps
    assert np.all(np.diff(rows) == 1)
    return True

In [None]:
# slash pixels must lie on one diagonal and be contiguous, in both directions
for mode in ("forward", "back"):
    data = [gen_rand_slash(8, 6, mode) for i in range(100)]
    for arr in data:
        assert check_if_contiguous(arr, mode), \
            "1's in numpy array not contiguous"
