## Python challenges

This notebook includes a series of challenges to test your Python coding skills. If you get stuck, try googling for answers. If you don't understand *why* a particular answer works, try searching for the answer to that question. Revisit old tutorials from this class as needed, and finally, turn to the course chatroom for help. Best of luck.

In [None]:
# required software
# conda install numpy pandas toyplot requests -c conda-forge

In [1]:
import requests
import numpy as np
import pandas as pd
import toyplot

### Challenge 1: 
Execute the code cell below to see an example of how it works. 
Use markdown in the cell after the code-block to describe the function `random_words_api`. Try to be descriptive about what each step of code in this function does, and why it works. 

In [5]:
def random_words_api(nwords=10):
    "no docstring"
    URL = "https://random-word-api.herokuapp.com/word"
    response = requests.get(url=URL, params={"number": nwords})
    return response.json()

# demonstration
random_words_api(5)

['snatchers', 'ideally', 'circulator', 'environ', 'spilth']

**ANSWER Description:** The `random_words_api` function takes an integer, `nwords`, that sets how many random words to draw; if no argument is given, it defaults to 10. The function sends a GET request to the random-word REST API at `URL`, passing the requested count as the `number` query parameter through `params`. The API responds with a JSON array of words, and `response.json()` parses that response body into a Python list of strings, which the function returns.
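As a rough illustration of the two steps described above, the sketch below uses only the standard library to mimic what `requests` handles for us: encoding `params` into the query string, and decoding a JSON array into a Python list. The helper name `build_request_url` and the `sample_body` string are assumptions made for this example; no network call is made.

```python
import json
from urllib.parse import urlencode

URL = "https://random-word-api.herokuapp.com/word"

def build_request_url(nwords=10):
    """Return the URL that a GET request with params={'number': nwords} targets."""
    return f"{URL}?{urlencode({'number': nwords})}"

# hypothetical response body, shaped like the demonstration output above
sample_body = '["snatchers", "ideally", "circulator"]'
# a JSON array decodes to a Python list of strings
words = json.loads(sample_body)

print(build_request_url(5))
print(type(words).__name__, words)
```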

### Challenge 2: 
Use the `random_words_api` function to get 50 random words and store the result as a variable. Write a function that takes the list of words as input and returns a dictionary with the longest word as the key and the length of the longest word as a value. If there is a tie in the length of words then have it return additional words as keys with their lengths as values.

In [42]:
# get 50 random words and store result as variable
randomwords = random_words_api(50)

In [54]:
# defining function
def max_word_length(words):
    """Takes a list of words as input and returns a dictionary mapping each
    longest word to its length (ties return multiple keys)."""
    # find the length of the longest word (0 if the list is empty)
    max_length = max((len(word) for word in words), default=0)
    # keep only the words whose length matches the maximum
    return {word: len(word) for word in words if len(word) == max_length}

# demo with 50 random words generated above
max_word_length(randomwords)

{'dexterousnesses': 15, 'psychobiography': 15, 'overcentralized': 15}

### Challenge 3: 
Write a function to take the list of words as input and trim all words to be at most 5 characters in length, and return as a list.

In [69]:
def trim_word_length(randomwords):
    """Takes list of words as input and trim all words to be at 
    most 5 characters in length, and return as a list."""
    # list to store result
    trimmed_list = []
    # for each word, if length > 5, trim word to only 5 characters and then append to list
    # else, directly append the entire word to the list
    for word in randomwords:
        if len(word) > 5:
            new_word = word[0:5]
            trimmed_list.append(new_word)
        else:
            trimmed_list.append(word)
    return(trimmed_list)

# demo with 50 random words generated above
trim_word_length(randomwords)

['metal',
 'oxido',
 'endos',
 'basem',
 'songl',
 'refur',
 'canep',
 'exclu',
 'dexte',
 'canon',
 'sirup',
 'psych',
 'marsh',
 'devel',
 'regre',
 'scuff',
 'outfo',
 'undra',
 'unive',
 'sacro',
 'recra',
 'mucid',
 'forfe',
 'fiber',
 'bioto',
 'tumul',
 'terra',
 'fogie',
 'dispe',
 'bizon',
 'mediu',
 'pheno',
 'hurtf',
 'utter',
 'menia',
 'corre',
 'rante',
 'depor',
 'taper',
 'pauci',
 'malar',
 'overc',
 'domic',
 'infla',
 'diplo',
 'excom',
 'monop',
 'behav',
 'unchi',
 'legal']
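Slicing past the end of a string does not raise an error in Python, so the length check is not strictly required. A more compact sketch of the same idea, using a list comprehension (the name `trim_words` and the `max_length` parameter are assumptions for this example):

```python
def trim_words(words, max_length=5):
    """Return a new list with every word cut to at most max_length characters."""
    # word[:max_length] returns the whole word when it is already short enough
    return [word[:max_length] for word in words]

sample_words = ["metal", "oxidoreductase", "cat", ""]
print(trim_words(sample_words))
```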

### Challenge 4: 
Write a function to take a list of words as input and to count the occurrence of all letters in every word and return as a dictionary mapping letters to integers, e.g., `{'a': 10, 'b': 3, 'c': 5, ...}`.

In [79]:
def count_characters(randomwords):
    """Takes list of words as input and counts the occurrence 
    of all letters in every word and returns a dictionary
    mapping letters to integers"""
    # create dictionary with all lowercase letters with values as 0 first
    import string
    all_char = string.ascii_lowercase
    final_count = {}
    for char in all_char:
        final_count[char] = 0
    
    # loop through each word and each character and count characters
    # the default of 0 avoids a TypeError for characters outside a-z (e.g., hyphens)
    for word in randomwords:
        for char in word:
            final_count[char] = final_count.get(char, 0) + 1

    return final_count

# demo with 50 random words generated above
count_characters(randomwords)

{'a': 32,
 'b': 9,
 'c': 24,
 'd': 22,
 'e': 64,
 'f': 10,
 'g': 7,
 'h': 10,
 'i': 40,
 'j': 0,
 'k': 2,
 'l': 23,
 'm': 19,
 'n': 26,
 'o': 34,
 'p': 14,
 'q': 0,
 'r': 36,
 's': 41,
 't': 31,
 'u': 25,
 'v': 5,
 'w': 0,
 'x': 5,
 'y': 9,
 'z': 3}
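An alternative sketch uses `collections.Counter` from the standard library, which handles the counting for us. Note one difference from the function above: letters that never appear are simply absent from the result rather than mapped to 0. The name `count_letters` and the sample words are assumptions for this example.

```python
from collections import Counter

def count_letters(words):
    """Count how often each character appears across all words."""
    counts = Counter()
    for word in words:
        # updating with a string adds one count per character
        counts.update(word)
    return dict(counts)

sample_words = ["metal", "canon", "legal"]
print(count_letters(sample_words))
```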

### Challenge 5:
Use [toyplot](https://toyplot.readthedocs.io/en/stable/tutorial.html) to create a barplot of the occurrences of each letter in your dictionary from the previous challenge. This will represent a histogram of the letters. Play with the size and color of the figure to try to make it look nice.

In [81]:
barplot_data = count_characters(randomwords)

In [218]:
import toyplot
import numpy as np

# create list of the occurrences of each letter in the dictionary from the previous challenge
letter_occurence_list = list(barplot_data.values())
print(letter_occurence_list)

[32, 9, 24, 22, 64, 10, 7, 10, 40, 0, 2, 23, 19, 26, 34, 14, 0, 36, 41, 31, 25, 5, 0, 5, 9, 3]


In [219]:
# plot one bar per letter with toyplot, labeling the x-axis ticks with the letters
canvas = toyplot.Canvas(width=500, height=300)
axes = canvas.cartesian(label="Letter occurrences in 50 random words", xlabel="Letter", ylabel="Number of occurrences")
mark = axes.bars(letter_occurence_list, color="blue")
axes.x.ticks.locator = toyplot.locator.Explicit(labels=list(barplot_data))

### Challenge 6: 
Using numpy create a new variable called `arr` with 1000 random samples from a normal distribution. Use the numpy `.histogram` function to bin these values into 20 bins, and then plot the histogram using a barplot from toyplot. Color the bars of the histogram orange.

In [124]:
arr = np.random.normal(size=1000)

In [220]:
# bin the values into 20 bins with np.histogram and plot the result as orange bars with toyplot
canvas = toyplot.Canvas(width=500, height=300)
axes = canvas.cartesian(label="Histogram of random normal distribution", xlabel="Random normal distribution values", ylabel="Frequency")
mark = axes.bars(np.histogram(arr, 20), color="orange")
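To make the binning step less of a black box, here is a plain-Python sketch of equal-width binning. It is not NumPy's implementation, only an illustration of the idea: a list of counts plus a list of bin edges that is one element longer. The function name `histogram` and the seed are assumptions for this example.

```python
import random

def histogram(values, bins=20):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    edges = [low + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for value in values:
        # the largest value belongs to the last bin, which is closed on the right
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    return counts, edges

random.seed(123)  # seeded so the run is repeatable
samples = [random.gauss(0, 1) for _ in range(1000)]
counts, edges = histogram(samples, bins=20)
print(sum(counts), len(counts), len(edges))
```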

### Challenge 7: 
Write a `while` statement to continue running code in a loop until a condition is met, and then call `break` to end the loop. Inside of the loop, randomly draw a single value from a uniform distribution between 0 and 100. If the value is less than 25 and greater than 22 then break the loop, otherwise, continue the loop until a value meeting this condition is sampled. Use a variable to keep track of how many iterations of the loop are run, and print this value after calling `break`. 


In [148]:
# counter to track how many iterations of the loop are run
counter = 0

# keep sampling from a uniform distribution between 0 and 100 until 22 < value < 25
while True:
    counter += 1
    value = np.random.uniform(low=0, high=100)
    if 22 < value < 25:
        break
print("Number of iterations:", counter)

Number of iterations: 17
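Each draw lands between 22 and 25 with probability 3/100, so the number of iterations follows a geometric distribution with an expected value of about 1/0.03 ≈ 33. The sketch below checks this by repeating the loop many times with the standard-library `random` module; the function name `draws_until_hit`, the seed, and the number of trials are assumptions for this example.

```python
import random

def draws_until_hit(rng, low=22, high=25):
    """Draw uniform values between 0 and 100 until one falls strictly between low and high."""
    counter = 0
    while True:
        counter += 1
        value = rng.uniform(0, 100)
        if low < value < high:
            break
    return counter

rng = random.Random(42)  # seeded so the run is repeatable
trials = [draws_until_hit(rng) for _ in range(2000)]
average = sum(trials) / len(trials)
print(f"average iterations over {len(trials)} runs: {average:.1f}")
```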


### Challenge 8: 
Use pandas to load a CSV file from https://eaton-lab.org/data/iris-data-dirty.csv and save as a dataframe. Add custom names to the columns in the dataframe, based on the type of values in them (e.g., numeric versus strings). You can come up with any column names you want for these.

In [163]:
import pandas as pd
# import csv file and label with custom column names
irisdata = pd.read_csv("https://eaton-lab.org/data/iris-data-dirty.csv", 
                       names=['leaf length (numeric)', 'leaf width (numeric)', 'petal length (numeric)', 'petal width (numeric)', 'Species name (string)'])
# display first few rows of new data frame
irisdata.head()

| | leaf length (numeric) | leaf width (numeric) | petal length (numeric) | petal width (numeric) | Species name (string) |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### Challenge 9: 
Calculate the mean value of the data in the left-most column for all data where the right-most column matches the value "Iris-setosa". 

In [175]:
# subset out all data that matches the value "Iris-setosa"
subset = irisdata[(irisdata["Species name (string)"] == "Iris-setosa")]
subset.head()

| | leaf length (numeric) | leaf width (numeric) | petal length (numeric) | petal width (numeric) | Species name (string) |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [176]:
subset.tail()

| | leaf length (numeric) | leaf width (numeric) | petal length (numeric) | petal width (numeric) | Species name (string) |
|---|---|---|---|---|---|
| 45 | 4.8 | 3.0 | 1.4 | 0.3 | Iris-setosa |
| 46 | 5.1 | 3.8 | 1.6 | 0.2 | Iris-setosa |
| 47 | 4.6 | 3.2 | 1.4 | 0.2 | Iris-setosa |
| 48 | 5.3 | 3.7 | 1.5 | 0.2 | Iris-setosa |
| 49 | 5.0 | 3.3 | 1.4 | 0.2 | Iris-setosa |


In [177]:
# find mean of left most column in subsetted data
mean = subset["leaf length (numeric)"].mean()
print(mean)

5.010204081632653


### Challenge 10:
Create a copy of your iris data dataframe and name it `df2`. Sort the rows of this dataframe based on the values in the first (leftmost) column so that the lowest values are first, and the highest values at the bottom. After sorting, reset the index of the dataframe so that the index is once again ordered. Once you get this to work, try to rewrite it in a simpler form by chaining multiple function calls together to accomplish the goal in one line, by calling code that looks a bit like this, but with the correct function calls: `df.function().function().function()`

In [199]:
# combine all function calls together
df2 = irisdata.copy().sort_values(by=["leaf length (numeric)"]).reset_index(drop=True)
df2

# individual function calls below for reference
#df2 = irisdata.copy()
#df2 = df2.sort_values(by=["leaf length (numeric)"])
#df2 = df2.reset_index(drop=True)

| | leaf length (numeric) | leaf width (numeric) | petal length (numeric) | petal width (numeric) | Species name (string) |
|---|---|---|---|---|---|
| 0 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa |
| 1 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 4 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 7.7 | 2.8 | 6.7 | 2.0 | Iris-virginica |
| 146 | 7.7 | 2.6 | 6.9 | 2.3 | Iris-virginica |
| 147 | 7.7 | 3.8 | 6.7 | 2.2 | Iris-virginica |
| 148 | 7.7 | 3.0 | 6.1 | 2.3 | Iris-virginica |


### Challenge 11:
Write a function that uses string formatting (curly braces) to create a [mad lib](https://en.wikipedia.org/wiki/Mad_Libs) containing at least 4 words that will be filled in. The returned object of your function should be a string where the missing words are filled by randomly sampled words from the `random_words_api()` function. The sentence or paragraph of your mad lib can be anything you wish, be creative. 

In [217]:
def mad_lib(num_words=4):
    """Fill in mad lib sentence with random words (requires num_words >= 4)."""
    randomwords = random_words_api(num_words)
    sentence = "The weather is {} today and I plan to go to {} with {} and spend the day doing {}."
    # unpack the first four words into the four placeholders
    return sentence.format(*randomwords[:4])

# demo with 4 words
mad_lib(4)

'The weather is fuzztones today and I plan to go to oafishly with greenier and spend the day doing viscoses.'
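String formatting also accepts named placeholders, which can make a longer mad lib easier to read. The sketch below is one possible variant that takes the word list as an argument instead of calling the API, so it can be tried offline; the function name `fill_mad_lib` and the placeholder names are assumptions for this example.

```python
def fill_mad_lib(words):
    """Fill a four-blank template with the first four words of a list."""
    if len(words) < 4:
        raise ValueError("need at least 4 words to fill the template")
    template = (
        "The weather is {adjective} today and I plan to go to {place} "
        "with {person} and spend the day doing {activity}."
    )
    return template.format(
        adjective=words[0], place=words[1], person=words[2], activity=words[3]
    )

# words taken from the demo output above
sample_words = ["fuzztones", "oafishly", "greenier", "viscoses"]
print(fill_mad_lib(sample_words))
```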

<div class="alert alert-success">
After completing all challenges in this notebook, save and download the .ipynb file to your computer. Move the file to your hack-program repo and put it in a folder called notebooks. Add/stage this file and folder and commit the change, and push to GitHub. The assignment is due by end of day on 3/7/2021.  
</div>