# Pre-requisites quiz

The course assumes that you are comfortable using Python and familiar with basic mathematics, statistics and the use of `numpy`.

For example, you should be able to complete the following quiz.

## Basic Python

1. Generate two collections of 1,000 random 3-character words (ASCII lowercase, only use a-z). 

```python
# Begin by importing the needed packages
import string
import random

# Create lists to hold the words
word_collection_1 = []
word_collection_2 = []

# Get the list of lowercase letters a-z
letters = list(string.ascii_lowercase)

# Create 1,000 words in each list. random.choices draws with replacement,
# so a letter may repeat within a word (random.sample never gives e.g. 'aab')
for _ in range(1000):
    word_collection_1.append(''.join(random.choices(letters, k=3)))
    word_collection_2.append(''.join(random.choices(letters, k=3)))
```

2. Count the number of unique words found in both collections.

```python
# Sets keep only unique elements, so the size of the set built from a
# collection equals the number of unique words in it
def word_count(words):
    count = len(set(words))
    print(count)
```

```python
# Display the unique-word count of each collection
word_count(word_collection_1)
word_count(word_collection_2)
```

```
964
965
```
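The question can also be read as asking how many distinct words appear in *both* collections. A minimal sketch of that reading uses a set intersection; the helper name `words_in_both` and the seed are illustrative choices, not part of the original solution, and the collections are assumed to be built the same way as above.

```python
import random
import string

def words_in_both(collection_1, collection_2):
    """Return the number of distinct words that occur in both collections."""
    # The & operator intersects two sets, keeping only the shared elements
    return len(set(collection_1) & set(collection_2))

# Hypothetical example data; the seed only makes the run repeatable
random.seed(0)
letters = string.ascii_lowercase
collection_1 = [''.join(random.choices(letters, k=3)) for _ in range(1000)]
collection_2 = [''.join(random.choices(letters, k=3)) for _ in range(1000)]

print(words_in_both(collection_1, collection_2))
```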


3. Find the most frequently occurring word(s) in the combined collection.

```python
from collections import Counter

# Create the combined collection
combined = word_collection_1 + word_collection_2

# Show the 5 most frequent words with their counts
Counter(combined).most_common(5)
```

```
[('itx', 3), ('yja', 3), ('vol', 3), ('deh', 3), ('jvi', 3)]
```
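`most_common(5)` always returns exactly five entries, so when more than five words tie for the highest count some are cut off, and when fewer do, lower-count words are included. A sketch that returns every word sharing the maximum count (the function name `most_frequent_words` is hypothetical):

```python
from collections import Counter

def most_frequent_words(words):
    """Return all words tied for the highest count, together with that count."""
    counts = Counter(words)
    if not counts:
        return [], 0
    top = max(counts.values())
    # Keep every word whose count equals the maximum, sorted for stable display
    return sorted(w for w, c in counts.items() if c == top), top

print(most_frequent_words(['abc', 'xyz', 'abc', 'xyz', 'def']))
```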

4. Count the number of words in the first collection that consist only of vowels.

```python
# Iterate over all of the words, then over the letters of each word
def vowel_word_count(word_collection):
    # Create a set of vowels
    vowels = set('aeiou')
    num_vowel_words = 0

    for word in word_collection:
        # all() returns True only if every letter in the word is a vowel
        if all(letter in vowels for letter in word):
            num_vowel_words += 1

    print(f"The number of vowel words is: {num_vowel_words}.")
```

```python
# Display the vowel-word count
vowel_word_count(word_collection_1)
```

```
The number of vowel words is: 4.
```
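As a rough sanity check on this count, the expected number of all-vowel words among 1,000 can be worked out directly. The sketch below assumes every word of the chosen kind is equally likely and computes the expectation both for letters drawn with replacement (`random.choices`) and for three distinct letters (`random.sample`); `count_vowel_words` is a hypothetical one-line alternative using a set-subset test.

```python
VOWELS = set('aeiou')
N_WORDS = 1000

# With replacement: each of the 3 letters is a vowel with probability 5/26
expected_with = N_WORDS * (5 / 26) ** 3
# Distinct letters: 5*4*3 all-vowel words out of 26*25*24 possible words
expected_without = N_WORDS * (5 * 4 * 3) / (26 * 25 * 24)

def count_vowel_words(words):
    # set(word) <= VOWELS is True when every letter of the word is a vowel
    return sum(set(word) <= VOWELS for word in words)

print(round(expected_with, 2), round(expected_without, 2))
print(count_vowel_words(['aei', 'abc', 'uuo', 'xyz']))
```

Either expectation is only a few words, so small observed counts like the one above are typical.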


5. Write a function that takes as input a list of words and builds a new string, starting from the first word, according to the following rule:

If the next word begins with the same character that the current "first" word ends with, form a new "first" word by concatenating the next word onto it; otherwise, discard the next word. Return the final string. Test this on collections 1 and 2.

For example, given the input
```python
['abc', 'def', 'cde', 'def', 'efg']
```

the function returns `abccdeefg`.

```python
# Create a function to build words according to the specification
def word_builder(word_list):
    first_word = word_list[0]  # Start from the initial word

    for next_word in word_list[1:]:  # Visit the remaining words in order
        # Concatenate if the next word starts with the letter the current
        # string ends with; otherwise discard the next word
        if first_word[-1] == next_word[0]:
            first_word = first_word + next_word
    return first_word

# Check against the worked example from the question
assert word_builder(['abc', 'def', 'cde', 'def', 'efg']) == 'abccdeefg'
```

```python
# Build the final string for both collections
final_word_1 = word_builder(word_collection_1)
final_word_2 = word_builder(word_collection_2)
```

```python
# Check every join: each appended 3-letter word must start with the letter
# that ended the string it was appended to
def word_checker(final_word):
    for i in range(3, len(final_word), 3):
        assert final_word[i] == final_word[i - 1]

word_checker(final_word_1)
word_checker(final_word_2)
```
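The same rule can also be written as a left fold with `functools.reduce`, where the accumulator is the string built so far. This is a sketch of an alternative formulation, assuming the end-to-start matching rule shown by the worked example (`abc` + `cde` + `efg`); the name `word_builder_fold` is hypothetical.

```python
from functools import reduce

def word_builder_fold(word_list):
    def step(built, next_word):
        # Append the word only if it starts with the last letter built so far
        return built + next_word if built[-1] == next_word[0] else built

    # Start the fold from the first word and process the rest in order
    return reduce(step, word_list[1:], word_list[0])

print(word_builder_fold(['abc', 'def', 'cde', 'def', 'efg']))
```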

## Using `numpy`

Only use `numpy` to complete the following exercises.

1. Set the random seed in `numpy` to 123

```python
import numpy as np

np.random.seed(123)
```

2. Create $X_1$, a $10 \times 5$ matrix of numbers drawn from a $N(\mu=10, \sigma=5)$ distribution

```python
# If Z is standard normal, 5 * Z + 10 has mean 10 and standard deviation 5
X1 = 5 * np.random.randn(10, 5) + 10

print(X1, np.mean(X1), np.std(X1))
```

```
[[ 4.57184698 14.98672723 11.41489249  2.46852643  7.10699874]
 [18.25718269 -2.13339622  7.85543686 16.32968129  5.66629799]
 [ 6.60556924  9.52645516 17.45694813  6.80549002  7.7800902 ]
 [ 7.82824362 21.02965041 20.93393044 15.02026949 11.930932  ]
 [13.68684288 17.45366014  5.32083066 15.87914522  3.73059666]
 [ 6.81124249 14.53552598  2.8565965   9.2996564   5.69122552]
 [ 8.72190315 -3.99294553  1.14233448  6.50061383 14.63731216]
 [ 9.13182159 10.01422958 13.44111356  5.60231828 11.41813662]
 [ 5.97316741  1.36165253  8.04550103 12.86902931 11.69294525]
 [ 9.94084753 21.96182633 12.0645608  14.89368003 21.19071669]] 10.066357205418926 5.94811407014073
```
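`np.random.normal` accepts the mean and standard deviation directly. A quick check, assuming NumPy's legacy global seeding as used here, that it produces the same draws as shifting and scaling `randn`:

```python
import numpy as np

np.random.seed(123)
A = 5 * np.random.randn(10, 5) + 10

# Reset the seed so both calls consume the same underlying random stream
np.random.seed(123)
B = np.random.normal(loc=10, scale=5, size=(10, 5))

print(np.allclose(A, B))
```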


3. Create $X_2$ by scaling the rows of $X_1$ so that they have zero mean and unit standard deviation

```python
# Standardise each row with that row's own mean and standard deviation.
# keepdims=True keeps the statistics as (10, 1) columns, so they broadcast
# across the 5 entries of each row.
row_means = X1.mean(axis=1, keepdims=True)
row_stds = X1.std(axis=1, keepdims=True)
X2 = (X1 - row_means) / row_stds

# Each row should now have mean 0 and standard deviation 1
print(np.allclose(X2.mean(axis=1), 0), np.allclose(X2.std(axis=1), 1))
```

```
True True
```
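To see what row-wise standardisation does, here is a small hand-checkable example (the matrix `A` is made up for illustration): each row is centred on its own mean and divided by its own standard deviation, with `keepdims=True` letting the per-row statistics broadcast across the columns.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])

# Per-row mean and (population) standard deviation, kept as shape (2, 1)
mu = A.mean(axis=1, keepdims=True)
sigma = A.std(axis=1, keepdims=True)
Z = (A - mu) / sigma

print(Z)
print(Z.mean(axis=1), Z.std(axis=1))
```

Both rows become $(-\sqrt{3/2}, 0, \sqrt{3/2})$: standardising removes each row's location and scale, so rows that differ only by those become identical.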


4. Create $X_3$ by extracting the odd rows of $X_1$

```python
# Get the odd rows counting from 1 (1st, 3rd, 5th, ...), i.e. 0-based indices 0, 2, 4, ...
X3 = X1[::2]
print(X3)
```

```
[[ 4.57184698 14.98672723 11.41489249  2.46852643  7.10699874]
 [ 6.60556924  9.52645516 17.45694813  6.80549002  7.7800902 ]
 [13.68684288 17.45366014  5.32083066 15.87914522  3.73059666]
 [ 8.72190315 -3.99294553  1.14233448  6.50061383 14.63731216]
 [ 5.97316741  1.36165253  8.04550103 12.86902931 11.69294525]]
```
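Whether "odd rows" means the 1st, 3rd, 5th, ... rows (counting from 1) or the rows with odd 0-based index changes the slice. A small illustration on stand-in row indices:

```python
import numpy as np

rows = np.arange(10)  # stand-in for the 10 row indices of X1

# 1st, 3rd, 5th, ... rows when counting from 1 (0-based indices 0, 2, 4, ...)
print(rows[::2])
# Rows with an odd 0-based index: 1, 3, 5, ...
print(rows[1::2])
```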


5. Create $X_4$ by scaling the columns of $X_3$ so that each column sums to 1

```python
# Scale the columns: divide each column by its sum
# (keepdims=True keeps the sums as a 1 x 5 row that broadcasts over the rows)
X4 = X3 / X3.sum(axis=0, keepdims=True)
print(X4)
```

```
[[ 0.11556937  0.38099702  0.26313414  0.05544409  0.15811622]
 [ 0.1669788   0.24218437  0.40241457  0.15285403  0.17309113]
 [ 0.34598268  0.44371212  0.12265488  0.35665195  0.08299816]
 [ 0.22047652 -0.10150984  0.0263329   0.14600639  0.32565032]
 [ 0.15099263  0.03461633  0.18546351  0.28904354  0.26014417]]
```
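A quick way to confirm the scaling, assuming `X1` is regenerated exactly as above with seed 123, is to check that every column of `X4` sums to 1. A negative entry keeps its sign after the division, so `X4` is column-normalised but not a probability (stochastic) matrix.

```python
import numpy as np

np.random.seed(123)
X1 = 5 * np.random.randn(10, 5) + 10
X3 = X1[::2]
X4 = X3 / X3.sum(axis=0, keepdims=True)

# Each column total should be 1 up to floating-point rounding
print(np.allclose(X4.sum(axis=0), 1))
# A negative entry in X3 stays negative in X4
print((X4 < 0).any())
```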


6. What is the eigenvector of $X_4$ with eigenvalue 1? Hint: be careful when checking equality for floats.

```python
# Get eigenvalues and eigenvectors; eig returns the eigenvectors as the
# columns of the second array, so eigenvectors[:, k] pairs with eigenvalues[k]
eigenvalues, eigenvectors = np.linalg.eig(X4)

# Floats rarely compare exactly equal, so test closeness to 1 instead of ==
eigen_value_one = np.isclose(eigenvalues, 1)

# Select the matching column; the eigenvalue is real, so its eigenvector
# has a zero imaginary part that can be dropped
eigenvector_one = eigenvectors[:, eigen_value_one].real

print(eigenvector_one)
```
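An eigenvalue of 1 is guaranteed here: because each column of $X_4$ sums to 1, the all-ones vector satisfies $\mathbf{1}^T X_4 = \mathbf{1}^T$, so 1 is an eigenvalue of $X_4^T$, and a matrix and its transpose have the same eigenvalues. A sketch checking both facts numerically, assuming `X4` is rebuilt from seed 123 as above:

```python
import numpy as np

np.random.seed(123)
X1 = 5 * np.random.randn(10, 5) + 10
X4 = X1[::2] / X1[::2].sum(axis=0, keepdims=True)

ones = np.ones(5)
# Column sums of 1 mean ones @ X4 == ones (a left eigenvector, eigenvalue 1)
print(np.allclose(ones @ X4, ones))

# The matching right eigenvector v satisfies X4 @ v == v
eigenvalues, eigenvectors = np.linalg.eig(X4)
k = np.argmin(np.abs(eigenvalues - 1))  # index of the eigenvalue nearest 1
v = eigenvectors[:, k].real
print(np.isclose(eigenvalues[k], 1), np.allclose(X4 @ v, v))
```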


7. Create $X_5$ by replacing all negative values in $X_2$ by 0

```python
# Keep non-negative entries and replace negative ones with 0
X5 = np.where(X2 < 0, 0, X2)
print(X5)
```
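Replacing negative entries with 0 can be written in several equivalent ways in NumPy. A small check on a made-up array that three of them agree:

```python
import numpy as np

a = np.array([[-1.5, 0.0, 2.0],
              [3.0, -0.25, -4.0]])

via_where = np.where(a < 0, 0, a)   # pick 0 wherever the entry is negative
via_maximum = np.maximum(a, 0)      # element-wise max with 0
via_clip = np.clip(a, 0, None)      # lower bound 0, no upper bound

print(via_maximum)
print(np.array_equal(via_where, via_maximum), np.array_equal(via_where, via_clip))
```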


8. Print the matrix $X_1$ such that each value has only 3 significant digits

```python
# np.round(X1, 3) keeps 3 decimal places, not 3 significant digits.
# Format each float with '.3g' instead; the setting only applies inside the with-block.
with np.printoptions(formatter={'float_kind': lambda x: f"{x:.3g}"}):
    print(X1)
```
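The difference between rounding to decimal places and to significant digits shows up clearly on single values copied from `X1`:

```python
# Values copied from the printed X1 above
values = [18.25718269, 4.57184698, 14.98672723, -2.13339622]

for x in values:
    # round(x, 3): 3 digits after the decimal point
    # f"{x:.3g}": 3 significant digits, with trailing zeros dropped
    print(round(x, 3), f"{x:.3g}")
```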


9. Suppose you are given observations $y = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)^T$. Find the least squares solution $\hat\beta$ to $X_1 \beta = y$.

```python
XT = np.transpose(X1)
# y as a 10 x 1 column vector
y = np.transpose(np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]))

# Least squares via the normal equations: beta = (X1^T X1)^{-1} X1^T y
beta = np.linalg.inv(XT @ X1) @ XT @ y

print(beta)
```

```
[[-0.02284738]
 [ 0.03630393]
 [-0.15890427]
 [ 0.1685749 ]
 [ 0.49194246]]
```
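Forming $(X_1^T X_1)^{-1}$ explicitly works for this small, well-conditioned problem, but `np.linalg.lstsq` solves the same least-squares problem without an explicit inverse and is the usual choice. A sketch comparing the two, assuming `X1` is regenerated with seed 123 as above:

```python
import numpy as np

np.random.seed(123)
X1 = 5 * np.random.randn(10, 5) + 10
y = np.arange(1, 11, dtype=float).reshape(-1, 1)  # (1, ..., 10) as a column

# Normal equations with an explicit inverse
beta_normal = np.linalg.inv(X1.T @ X1) @ X1.T @ y

# lstsq returns (solution, residuals, rank, singular values); rcond=None
# selects NumPy's default cutoff for small singular values
beta_lstsq, residuals, rank, sv = np.linalg.lstsq(X1, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq), rank)
```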


10. What is the vector in the column space of $X_1$ closest to $y$?

```python
# The closest vector to y in the column space of X1 is the orthogonal
# projection of y onto that space: y_hat = X1 @ beta, with beta from above
y_hat = X1 @ beta
print(y_hat)

# Check: the residual y - y_hat is orthogonal to every column of X1
print(np.allclose(XT @ (y - y_hat), 0))
```
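Because every column of $X_1$ lies in its column space, the projection $\hat y = X_1\hat\beta$ can be no farther from $y$ than any single column. A sketch checking this on the same seed-123 data:

```python
import numpy as np

np.random.seed(123)
X1 = 5 * np.random.randn(10, 5) + 10
y = np.arange(1, 11, dtype=float).reshape(-1, 1)

beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta  # orthogonal projection of y onto the column space

dist_projection = np.linalg.norm(y - y_hat)
dist_columns = np.linalg.norm(X1 - y, axis=0)  # distance from y to each column

print(dist_projection, dist_columns.min())
print(dist_projection <= dist_columns.min())
```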
