**Q1** Compute the Euclidean length of $\langle 3,4,5\rangle$, rounded to an integer.

In [1]:
import numpy as np
import pandas as pd
from scipy.spatial import distance

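a = (3, 4, 5)
# distance from the origin, passed as an explicit vector instead of the scalar 0;
# rounded to an integer the result is 7
distance.euclidean((0, 0, 0), a)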

7.0710678118654755
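
As a cross-check that does not depend on SciPy, the length can also be computed directly from the definition $\sqrt{3^2+4^2+5^2}$ and then rounded, as the question asks. This is only a minimal sketch; the helper name `euclidean_length` is made up for illustration.

```python
import math

def euclidean_length(v):
    """Square root of the sum of the squared components of v."""
    return math.sqrt(sum(x * x for x in v))

length = euclidean_length((3, 4, 5))   # sqrt(9 + 16 + 25) = sqrt(50)
rounded = round(length)                # nearest integer
print(length, rounded)
```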

**Q2** Compute the cosine similarity of $\langle 3,4,5\rangle$ and $\langle 6,3,2\rangle$.

In [2]:
from scipy.spatial import distance

dataSetI = [3, 4, 5]
dataSetII = [6, 3, 2]
result = 1 - distance.cosine(dataSetI, dataSetII)
result

0.80812203564176865
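
The same value can be reproduced from the definition of cosine similarity, the dot product divided by the product of the two lengths. The sketch below uses only the standard library; `cosine_similarity` is a hypothetical helper name, not a SciPy function.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# dot product is 40, lengths are sqrt(50) and 7
similarity = cosine_similarity([3, 4, 5], [6, 3, 2])
print(similarity)
```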

**Q3** Adomavicius 2005 gives, in formula 10b, a definition of $r_{c,s}$ (the estimated rating of item $s$ for user $c$) in which the ratings of the neighbours of $c$ (collected in the set $C$) are weighted by their similarity to $c$. Give the exact formula. Do not forget to normalize.

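The similarity-weighted sum of the neighbours' ratings, normalized by the sum of the similarities:

$$r_{c,s} = \frac{\sum_{c' \in C} \mathrm{sim}(c,c') \cdot r_{c',s}}{\sum_{c' \in C} \left|\mathrm{sim}(c,c')\right|}$$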

**Q4**  Write a function `shorten(text, n)` to process a text, omitting the n most frequently occurring words of the text. 
* How readable is it?
* Give the steps, and write your program completely declaratively, with one line of code for each step

1. tokenize
2. use the `Counter` class to count the tokens
3. use the `most_common(n)` method of Counter object
4. remove unwanted words using list comprehension
5. turn into a text again by `' '.join()`

In [3]:
import nltk
from collections import Counter

def shorten(text, n):
    # tokenize
    tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
    tokens = tokenizer.tokenize(text.lower())
    
    # Counter & most_common
    countdict = Counter(tokens)
    popular = countdict.most_common(n)
    
    # remove unwanted words (a set gives fast membership tests)
    delete = {key for key, _ in popular}
    result = [i for i in tokens if i not in delete]
    
    # return text
    return " ".join(result)

text = "xd xd xd xd xd ur xd xd xd xd mom xd xd xd xd xd xd gay xd xd xd xd xd xd lmao"
shorten(text,1)

'ur mom gay lmao'
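
The five steps can also be sketched without NLTK, assuming that a plain `\w+` regular expression is an acceptable tokenizer. The name `shorten_stdlib` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def shorten_stdlib(text, n):
    tokens = re.findall(r"\w+", text.lower())                         # 1. tokenize
    frequent = {word for word, _ in Counter(tokens).most_common(n)}   # 2-3. count, take top n
    return " ".join(t for t in tokens if t not in frequent)           # 4-5. filter and join

sample = "the cat and the dog and the bird"
shortened = shorten_stdlib(sample, 1)   # drops "the", the single most frequent word
print(shortened)
```

Like the version above, this lowercases the text and discards punctuation, so the result is not a faithful copy of the original minus the frequent words.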

**Q5** Write a function that takes a list of words (containing duplicates) and returns a list of words (with no duplicates) sorted by decreasing frequency. 

E.g. if the input list contained 10 instances of the word table and 9 instances of the word chair, then table would appear before chair in the output list.

In [4]:
words1 = ["table","table","table","table","table","table","table","table","table","table","chair","chair","chair","chair","chair","chair","chair","chair","chair"]
words = ["hello", "apple", "banana", "apple", "hello", "apple"]

def wordfreq(words):
    freqdict = Counter(words)
    # most_common() without an argument returns all words, most frequent first
    freq = freqdict.most_common()
    return [key for key, value in freq]

wordfreq(words)

['apple', 'hello', 'banana']
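
The question does not say how words with equal frequency should be ordered. In Python 3.7 and later, `most_common` keeps such words in order of first occurrence. If a fixed order is wanted, one option (an assumption, not something the question asks for) is to break ties alphabetically; `wordfreq_sorted` is a hypothetical name.

```python
from collections import Counter

def wordfreq_sorted(words):
    counts = Counter(words)
    # sort by decreasing count, then alphabetically for equal counts
    return sorted(counts, key=lambda w: (-counts[w], w))

ranked = wordfreq_sorted(["pear", "fig", "pear", "fig", "kiwi"])
print(ranked)
```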

**Q6** The following code from the CI book actually programs the function from **Q3**. Refactor this code into much more readable declarative code. You may assume you have a function `sim(p1,p2)` which computes the similarity between two persons, and a function `neighbours(p)` which picks out the closest neighbours of `p`. You may also assume that `prefs` is a global variable.


```
# Gets recommendations for a person by using a weighted average
# of every other user's rankings
def getRecommendations(prefs,person,similarity=sim_pearson):
  totals={}
  simSums={}
  for other in prefs:
    # don't compare me to myself
    if other==person: continue
    sim=similarity(prefs,person,other)

    # ignore scores of zero or lower
    if sim<=0: continue
    for item in prefs[other]:
	    
      # only score movies I haven't seen yet
      if item not in prefs[person] or prefs[person][item]==0:
        # Similarity * Score
        totals.setdefault(item,0)
        totals[item]+=prefs[other][item]*sim
        # Sum of similarities
        simSums.setdefault(item,0)
        simSums[item]+=sim

  # Create the normalized list
  rankings=[(total/simSums[item],item) for item,total in totals.items()]

  # Return the sorted list
  rankings.sort()
  rankings.reverse()
  return rankings
```

In [5]:
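def rating(c, s):
    # neighbours of c that have actually rated item s (avoids a KeyError on prefs[d][s])
    C = [d for d in neighbours(c) if s in prefs[d]]
    simsum = sum(sim(c, d) for d in C)
    weighted_estimate = sum(sim(c, d) * prefs[d][s] for d in C)
    # normalize by the summed similarities; None if no neighbour rated s
    return weighted_estimate / simsum if simsum else None

def getRecommendations(person):
    # 'other' instead of 'person', so the parameter is not shadowed
    all_items = {item for other in prefs for item in prefs[other]}
    # only estimate items the person has not rated yet
    unrated = [s for s in all_items if prefs[person].get(s, 0) == 0]
    estimates = [(rating(person, s), s) for s in unrated]
    return sorted((pair for pair in estimates if pair[0] is not None), reverse=True)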
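
To see the refactored functions run, `prefs`, `sim` and `neighbours` need concrete definitions. The block below is a self-contained sketch under made-up assumptions: the ratings are toy data, `sim` is a simple inverse-distance similarity (not the Pearson similarity of the book), `neighbours` returns every other person with positive similarity, and `recommendations` is a hypothetical stand-in for `getRecommendations`.

```python
# Toy data: names and ratings are invented for illustration only.
prefs = {
    "ann": {"alien": 4.0, "brazil": 2.0},
    "bob": {"alien": 5.0, "brazil": 1.0, "casablanca": 4.0},
    "cat": {"alien": 1.0, "brazil": 5.0, "casablanca": 2.0, "dune": 3.0},
}

def sim(p1, p2):
    """Hypothetical similarity: 1 / (1 + Euclidean distance over shared items)."""
    shared = [s for s in prefs[p1] if s in prefs[p2]]
    if not shared:
        return 0.0
    dist = sum((prefs[p1][s] - prefs[p2][s]) ** 2 for s in shared) ** 0.5
    return 1.0 / (1.0 + dist)

def neighbours(p):
    """Hypothetical neighbourhood: every other person with positive similarity."""
    return [d for d in prefs if d != p and sim(p, d) > 0]

def rating(c, s):
    """Similarity-weighted mean of the ratings that neighbours of c gave to s."""
    raters = [d for d in neighbours(c) if s in prefs[d]]
    simsum = sum(sim(c, d) for d in raters)
    return sum(sim(c, d) * prefs[d][s] for d in raters) / simsum

def recommendations(person):
    """Unrated items with their estimated rating, best estimate first."""
    items = {s for d in prefs for s in prefs[d]}
    unrated = [s for s in items if s not in prefs[person]]
    return sorted(((rating(person, s), s) for s in unrated), reverse=True)

recs = recommendations("ann")
print(recs)
```

With this data, ann is more similar to bob than to cat, so her estimate for `casablanca` lies between their ratings of 4 and 2, closer to bob's 4. Only cat rated `dune`, so that estimate is simply cat's rating.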

**Q7** Here we test your recursive thinking. Consider the following code

```
def boom(n):  # n must be an integer
    if n==0:
        return []
    else:
        return [boom(n-1),boom(n-1)]
```

1. Give the exact output produced by 
```
for n in range(4):
    print(boom(n))
```

In [6]:
def boom(n):
    if n == 0:
        return []
    else:
        return [boom(n-1),boom(n-1)]
    
for n in range(4):
    print(boom(n))

[]
[[], []]
[[[], []], [[], []]]
[[[[], []], [[], []]], [[[], []], [[], []]]]


Number of empty lists in `boom(n)`:

* n==0: 1
* n==1: 2
* n==2: 4
* n==3: 8

In general, `boom(n)` contains $2^n$ empty lists.
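
The counts listed above can be checked by walking the nested structure and counting its empty lists. This is a small sketch; `count_empty` is a helper name introduced here.

```python
def boom(n):
    if n == 0:
        return []
    return [boom(n - 1), boom(n - 1)]

def count_empty(tree):
    """Count the empty lists (the leaves) inside a nested list."""
    if not tree:
        return 1
    return sum(count_empty(child) for child in tree)

counts = [count_empty(boom(n)) for n in range(4)]
print(counts)
```

Each level doubles the number of leaves, which is why the count for `boom(n)` equals `2 ** n`.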


3. Rewrite the definition of `boom(n)` in such a way that it does not use recursion

In [7]:
def boom_it(n):
    boom = []
    # wrap the current result in a pair n times;
    # range(0) is empty, so n == 0 needs no special case
    for i in range(n):
        boom = [boom, boom]
    return boom

for n in range(5):
    print(boom_it(n))

boom_it(4)

[]
[[], []]
[[[], []], [[], []]]
[[[[], []], [[], []]], [[[], []], [[], []]]]
[[[[[], []], [[], []]], [[[], []], [[], []]]], [[[[], []], [[], []]], [[[], []], [[], []]]]]


[[[[[], []], [[], []]], [[[], []], [[], []]]],
 [[[[], []], [[], []]], [[[], []], [[], []]]]]
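
One difference from the recursive version is easy to miss: `[boom, boom]` puts the same list object in both positions, whereas the recursive `boom` builds two independent sublists. The printed output is identical, but mutating one half of the iterative result also changes the other half. The sketch below shows this and one possible way around it with `copy.deepcopy`; the names `boom_shared` and `boom_copied` are made up for illustration.

```python
import copy

def boom(n):
    if n == 0:
        return []
    return [boom(n - 1), boom(n - 1)]

def boom_shared(n):
    result = []
    for _ in range(n):
        result = [result, result]                  # both halves are the same object
    return result

def boom_copied(n):
    result = []
    for _ in range(n):
        result = [result, copy.deepcopy(result)]   # second half is an independent copy
    return result

shared = boom_shared(2)
copied = boom_copied(2)
print(shared == boom(2) == copied)    # equal in value
print(shared[0] is shared[1])         # halves share one object
print(copied[0] is copied[1])         # halves are distinct objects
```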

**Q8** Give examples of both intrinsic and extrinsic evidence that search engines use to rank their results.

See the slides of week 7.

In [8]:
def boom2(n):
    boom = []
    if n == 0:
        pass
    else:
        for i in range(n):
            boom = [boom,boom]
    return boom
        
for i in range(5):
    print(boom2(i))

[]
[[], []]
[[[], []], [[], []]]
[[[[], []], [[], []]], [[[], []], [[], []]]]
[[[[[], []], [[], []]], [[[], []], [[], []]]], [[[[], []], [[], []]], [[[], []], [[], []]]]]
