## Problem: Write data columns

Using the following data,

    data=[5,4,6,1,9,0,3,9,2,7,10,8,4,7,1,2,7,6,5,2,8,2,0,1,1,1,2,10,6,2]

write a function `write_columns` that writes the following three calculated columns to a user-specified comma-separated file:

    data_value, data_value**2, (data_value+data_value**2)/3.
    
Your written floating-point values should be formatted to the hundredths place (two decimal places). Your function should accept only lists of integers/floats as input.
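As a rough illustration of the formatting requirement, the sketch below builds one output row for a single value. The helper name `format_row` and the `', '` separator are assumptions made for this example only; they are not part of the required `write_columns` interface.

```python
# Hypothetical helper name; not part of the assignment's required interface.
def format_row(data_value):
    """Return the three columns for one value as a comma-separated string."""
    squared = data_value**2
    average = (data_value + squared) / 3
    # '.2f' rounds to two decimal places (the hundredths place)
    return ', '.join([str(data_value), str(squared), format(average, '.2f')])

print(format_row(5))   # 5, 25, 10.00
print(format_row(4))   # 4, 16, 6.67
```

Only the third column is a float for integer input, so it is the only one that needs the `'.2f'` format here.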

In [1]:
def write_columns(data,fname):
    '''
    Given data as a list, write three columns to fname.
    
    :param: data 
    :type : list
    :param: fname
    :type: str
    '''

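    assert isinstance(data, list)
    assert all(isinstance(i, (int, float)) for i in data)
    assert isinstance(fname, str)
    
    def fmt(value):
        # floats are written to the hundredths place; ints are written as-is
        return format(value, '.2f') if isinstance(value, float) else str(value)
    
    # build and write one row per data value
    with open(fname, 'w') as f:
        for data_value in data:
            row = [data_value, data_value**2, (data_value+data_value**2)/3]
            f.write(', '.join(fmt(j) for j in row)+'\n')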

# data=[5,4,6,1,9,0,3,9,2,7,10,8,4,7,1,2,7,6,5,2,8,2,0,1,1,1,2,10,6,2]
# write_columns(data,"out.txt")


## Problem: Text Processing

Download this [corpus of 10,000 common English words](https://storage.googleapis.com/class-notes-181217.appspot.com/google-10000-english-no-swears.txt) and write the indicated functions that answer the following questions:

- What is the longest word?
- What is the longest word that starts with a particular character (e.g., `s`)?
- What is the most common starting letter?
- What is the most common ending letter?

Your functions should take only a list of words as input (plus the starting character, where the question requires one).
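One possible way to approach these questions, sketched on a small made-up list rather than the downloaded corpus, is to use `max` with `key=len` for the longest word and `collections.Counter` for the letter tallies. The `sample` list is hypothetical and chosen so that there are no ties.

```python
from collections import Counter

# Hypothetical sample list, used here instead of the downloaded corpus.
sample = ['search', 'sustainability', 'time', 'some', 'pages']

# max() with key=len returns the first word of maximal length
longest = max(sample, key=len)

# Counter tallies first and last letters; most_common(1) gives the top (letter, count) pair
first_letter, first_count = Counter(word[0] for word in sample).most_common(1)[0]
last_letter, last_count = Counter(word[-1] for word in sample).most_common(1)[0]

print(longest)                    # sustainability
print(first_letter, first_count)  # s 3
print(last_letter, last_count)    # e 2
```

When several words or letters tie, `max` and `most_common` return the one encountered first, so a tie-breaking rule has to be chosen explicitly if the result must be deterministic regardless of input order.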

In [1]:
# you can use this bit of code to download the words from the corpus
from urllib.request import urlopen

u='https://storage.googleapis.com/class-notes-181217.appspot.com/google-10000-english-no-swears.txt'
response = urlopen(u)
words = [i.strip().decode('utf8') for i in response.readlines()]
# print(words)

# write a function to compute the longest word
def get_longest_word(words):
    '''
    Given data as a list, return the longest word.
    
    :param: words 
    :type : list
    '''
    assert isinstance(words, list)
    
    # find the maximum length, then break ties by taking the alphabetically last word
    max_len = max(len(word) for word in words)
    return max(word for word in words if len(word) == max_len)
    
def get_longest_words_startswith(words,starts):
    '''
    Given data as a list, return the longest word that starts with the given character.
    
    :param: words 
    :type : list
    :param: starts 
    :type : str
    '''
    assert isinstance(words, list)
    assert isinstance(starts, str)
    assert len(starts)==1
    
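    # startswith() is safe on empty strings, unlike word[0]
    candidates = [word for word in words if word.startswith(starts)]
    assert candidates, 'no word starts with the given character'
    
    # find the maximum length, then break ties by taking the alphabetically last word
    max_len = max(len(word) for word in candidates)
    return max(word for word in candidates if len(word) == max_len)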

def get_most_common_start(words):
    '''
    Given data as a list, return the most common starting character.
    
    :param: words 
    :type : list
    '''
    assert isinstance(words, list)
    
    # count how often each character starts a word
    dic = {}
    for word in words:
        char = word[0]
        dic[char] = dic.get(char, 0) + 1
            
    # compute the highest count once; ties go to the alphabetically last character
    highest = max(dic.values())
    return max(key for key, value in dic.items() if value == highest)

def get_most_common_end(words):
    '''
    Given data as a list, return the most common ending character.
    
    :param: words 
    :type : list
    '''
    assert isinstance(words, list)
    
    # count how often each character ends a word
    dic = {}
    for word in words:
        char = word[-1]
        dic[char] = dic.get(char, 0) + 1
            
    # compute the highest count once; ties go to the alphabetically last character
    highest = max(dic.values())
    return max(key for key, value in dic.items() if value == highest)

# a = ['a', 'bb', 'ccc', 'ddd']
# print(get_most_common_end(words))

In [3]:
### BEGIN  TESTS
assert get_longest_words_startswith(words,'s')=='sustainability'
### END  TESTS

In [4]:
### BEGIN  TESTS
assert get_most_common_end(words)=='s'
### END  TESTS

In [5]:
assert len(get_longest_word.__doc__)>1
assert len(get_most_common_start.__doc__)>1
assert len(get_most_common_end.__doc__)>1
assert len(get_longest_words_startswith.__doc__)>1

## Problem: Write chunks of five

Using the same [corpus of 10,000 common English words](https://storage.googleapis.com/class-notes-181217.appspot.com/google-10000-english-no-swears.txt) as before,
create a new file that consists of each
consecutive non-overlapping sequence of five lines merged into one line. Here
are the first 10 lines:

    the of and to a
    in for is on that
    by this with i you
    it not or be are
    from at as your all
    have new more an was
    we will home can us
    about if page my has
    search free but our one
    other do no information time

If the last group has fewer than five words, just write out that group as it is.
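The grouping itself can be sketched with list slicing in steps of five. The helper name `chunk_lines` and the `sample` list are hypothetical, and the function only builds the output lines without writing a file.

```python
# Hypothetical helper name; it only builds the output lines and writes no file.
def chunk_lines(words, size=5):
    """Merge each consecutive, non-overlapping run of `size` words into one line."""
    return [' '.join(words[i:i + size]) for i in range(0, len(words), size)]

sample = ['the', 'of', 'and', 'to', 'a', 'in', 'for']
print(chunk_lines(sample))   # ['the of and to a', 'in for']
```

Because a slice that runs past the end of a list is simply shorter, the final partial group needs no special handling.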

In [2]:
def write_chunks_of_five(words,fname):
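    '''
    Given words as a list, write them to fname five per line, separated by spaces.
    
    :param: words
    :type: list
    :param: fname
    :type: str
    '''
    assert isinstance(words, list)
    assert isinstance(fname, str)
    
    # open the file once in write mode so that reruns overwrite rather than append
    with open(fname, 'w') as f:
        for i in range(0, len(words), 5):
            # join avoids a trailing space; the last chunk may be shorter than five
            f.write(' '.join(words[i:i+5])+'\n')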

# write_chunks_of_five(words,'out3.txt')