# Problems

---
### Comparing lists
Write a function that returns `True` if two lists contain the same elements and `False` otherwise.

1. First, ignore how many times each element occurs. For example, the following two lists are considered the same:
```python
a = [1, 2, 3, 1]
b = [3, 2, 1]
```
2. Next, take repetitions into account: two lists are the same only if every element occurs the same number of times in both. Under this rule, the lists above are no longer the same.
3. Design a third program, which first counts the number of occurrences of each element in each list, and then compares the counts.
    - You might use `defaultdict` from the `collections` module and
    - implement two separate functions: `count_elements` and `compare_lists`. Here is the docstring for the first one.

```python
def count_elements(x: list) -> dict:
    """Count the number of occurrences of each element in the list.
    The algorithm is O(n).

    Examples:
        >>> count_elements([1, 2, 1, 3, 2])
        defaultdict(<class 'int'>, {1: 2, 2: 2, 3: 1})

        >>> count_elements([])
        defaultdict(<class 'int'>, {})

        >>> count_elements(["a", "b", "a", "c"])
        defaultdict(<class 'int'>, {'a': 2, 'b': 1, 'c': 1})
    """
```

In [1]:
def compare(x: list, y: list) -> bool:
    """Check whether the lists contain the same elements, ignoring repetitions.
    
    Examples:
        >>> compare([], [])
        True
        >>> compare([1, 2, 3], [3, 2, 2, 1])
        True
        >>> compare([1, 2, 3], [3, 2, 5, 1])
        False
    """
    return set(x) == set(y)

In [2]:
def compare2(x: list, y: list) -> bool:
    """Compare two lists, taking repetitions into account.
    The algorithm is O(n*log(n)) because of the sorting; the comparison itself is only O(n).

    Examples:
        >>> compare2([], [])
        True
        >>> compare2([1, 2, 3], [3, 2, 1])
        True
        >>> compare2([1, 2, 3], [3, 2, 2, 1])
        False
    """
    return sorted(x) == sorted(y)
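
One caveat of the sorting approach is that it only works when the elements can be ordered against each other. The following is a minimal sketch of that limitation; the helper name `can_sort` is made up for this illustration and is not part of the exercise.

```python
def can_sort(x: list) -> bool:
    """Return True if sorted(x) succeeds, False if the elements cannot be ordered."""
    try:
        sorted(x)
        return True
    except TypeError:
        return False

print(can_sort([3, 1, 2]))    # True
print(can_sort([1, "a", 2]))  # False: int and str cannot be compared with <
```

A counting-based comparison does not need an ordering, but it does require the elements to be hashable.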


In [3]:
from collections import defaultdict

def count_elements(x: list) -> dict:
    """Count the number of occurrences of each element in the list.
    The algorithm is O(n).

    Examples:
        >>> count_elements([1, 2, 1, 3, 2])
        defaultdict(<class 'int'>, {1: 2, 2: 2, 3: 1})

        >>> count_elements([])
        defaultdict(<class 'int'>, {})

        >>> count_elements(["a", "b", "a", "c"])
        defaultdict(<class 'int'>, {'a': 2, 'b': 1, 'c': 1})
    """
    count = defaultdict(int)
    for a in x:
        count[a] += 1
    return count

def compare_lists(x: list, y: list) -> bool:
    """Compare two lists based on the counts of their elements.
    The algorithm is O(n).

    Examples:
        >>> compare_lists([1, 2, 1], [2, 1, 1])
        True
        >>> compare_lists([1, 2, 1], [1, 2, 2])
        False
        >>> compare_lists([], [])
        True
        >>> compare_lists(["apple", "banana", "apple"], ["banana", "apple", "apple"])
        True
    """
    return count_elements(x) == count_elements(y)
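
As a side note, the standard library already ships a counting dictionary, `collections.Counter`. A possible shorter variant of the same idea is sketched below; the function name `compare_lists_counter` is chosen here only for illustration.

```python
from collections import Counter

def compare_lists_counter(x: list, y: list) -> bool:
    """Compare two lists by the counts of their elements, using Counter."""
    return Counter(x) == Counter(y)

print(compare_lists_counter([1, 2, 1], [2, 1, 1]))  # True
print(compare_lists_counter([1, 2, 1], [1, 2, 2]))  # False
```

Writing `count_elements` by hand, as the exercise asks, is still worthwhile because it shows what the counting step actually does.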

To compare the performance, we can randomly generate two large lists of integers and time the two functions `compare2` and `compare_lists`.

In [4]:
from random import randint
import timeit

l1 = [randint(1, 100) for _ in range(int(1e5))]
l2 = [randint(1, 100) for _ in range(int(1e5))]

# run each function 100 times per measurement and compare the times
%timeit -n 100 compare2(l1, l2)
%timeit -n 100 compare_lists(l1, l2)

14.4 ms ± 208 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.27 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
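
The `%timeit` magic only exists in IPython and Jupyter. In a plain Python script, a rough equivalent can be sketched with the standard `timeit` module. The two functions are repeated here so that the snippet runs on its own; the list size and the number of repetitions are arbitrary choices, smaller than in the cell above to keep the run short.

```python
import timeit
from collections import defaultdict
from random import randint

def compare2(x: list, y: list) -> bool:
    return sorted(x) == sorted(y)

def count_elements(x: list) -> dict:
    count = defaultdict(int)
    for a in x:
        count[a] += 1
    return count

def compare_lists(x: list, y: list) -> bool:
    return count_elements(x) == count_elements(y)

l1 = [randint(1, 100) for _ in range(10_000)]
l2 = [randint(1, 100) for _ in range(10_000)]

repeats = 20  # arbitrary number of calls per measurement
t_sort = timeit.timeit(lambda: compare2(l1, l2), number=repeats)
t_count = timeit.timeit(lambda: compare_lists(l1, l2), number=repeats)

# timeit.timeit returns the total time in seconds for all calls
print(f"compare2:      {t_sort / repeats * 1e3:.3f} ms per call")
print(f"compare_lists: {t_count / repeats * 1e3:.3f} ms per call")
```

The absolute numbers depend on the machine and the Python version, so only the relative difference between the two functions is meaningful.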


---
### Creating a text with a great Poe-ness
An [n-gram](https://en.wikipedia.org/wiki/N-gram) is a contiguous sequence of n items from a given sample of text or speech. Here the items are characters.
1. Count how often each n-gram occurs in a text. Here is an outline:
```python
# create empty dictionary using defaultdict

for row in open('poe.txt'):
    # save n-grams from line into dictionary 
```
2. Sort them by frequency.
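
To make the two steps concrete before working with a whole file, here is a small sketch on short toy strings. The helper name `char_ngrams` and the sample words are made up for this illustration.

```python
from collections import defaultdict

def char_ngrams(text: str, n: int) -> list:
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("raven", 3))  # ['rav', 'ave', 'ven']

# step 1: count the 2-grams of a toy word
counts = defaultdict(int)
for gram in char_ngrams("abracadabra", 2):
    counts[gram] += 1

# step 2: sort by descending frequency
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(ranked[:3])  # [('ab', 2), ('br', 2), ('ra', 2)]
```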

You can use, for example, the text [here](https://ipnp.cz/strelecek/supplementary/23ZS/poe.txt) and read it line by line:

In [5]:
from collections import defaultdict

n = 3
kgrams = defaultdict(int)

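# assuming the file is UTF-8 encoded
with open('poe.txt', encoding='utf-8') as file:
    for row in file:
        # split the line into n-grams; each row ends with '\n', so stopping
        # at len(row) - n skips the n-grams that would contain the newline
        for i in range(len(row)-n):
            kgrams[row[i:i+n]] += 1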

# create pairs (frequency, n-gram)
pairs = [ (frequency, gram) for gram, frequency in kgrams.items() ]

# sort the pairs from most to least frequent and print them
for frequency, gram in sorted(pairs, reverse=True):
    print(frequency, gram)

1794  ne
1783  se
1762  po
1737 em 
1644  a 
1621 ...
1607  js
1480 se 
1475 sem
1429 jse
1386  je
1291  na
1214  př
999  do
989 ost
971  pr
953  – 
945 ak 
944 ch 
943 ho 
938 na 
868 ou 
811 ně 
810 to 
791 že 
781 la 
775  by
765  to
761 jak
757 pro
717  za
709  že
707 al 
697  ja
689  ta
676  v 
673 ní 
659 val
658 m s
656 pře
650 byl
642 l j
640  st
633 , k
608 , ž
606 le 
602 ova
591 il 
588 , a
585 e s
569  vy
561 kte
559 ter
551 tak
546 a p
541 li 
536 lo 
529 sta
522 mi 
516 e p
514 e v
510 , j
508 roz
493 ého
493  kt
466 ako
462 l, 
458 e n
457  ro
452 ých
452 e j
451 o s
442 en 
437 nou
434 o p
428 ím 
426 a n
425 né 
423  pa
422 a s
421 je 
420 do 
407 e m
406  od
405  sv
400 u, 
398 sti
396 m, 
395 o n
394 e t
392 pod
387 při
382 si 
380  ně
380  mi
378 e, 
372  ná
370 pří
366  si
363 ech
361 nos
360  mn
359 ce 
354  ob
353 a t
351 ly 
348 i, 
347  vš
347  ve
346 tel
345 řed
342 a v
339  ji
334 , n
325 ale
325  z 
325  te
321 ko 
318 o v
318 i p
314 ne 
314 ení
314 by 
313

# Problematic problems

---
### Generating some nonsense
Reuse the previous code to generate text. *Choose the first n-gram randomly, then repeatedly choose the next letter at random from the list of letters that follow the current n-gram (the last n characters generated so far) in the source text. For random selection, you can use*
```python
import random
random.choice(your_list_of_letters)
```
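
To see what "the list of letters that follow the n-gram" looks like, the sketch below builds it for a toy string. The string `"abracadabra"` and the name `followers` are chosen only for this illustration; the real exercise uses the text file.

```python
import random
from collections import defaultdict

text = "abracadabra"  # toy input, stands in for the real text
n = 2

# map every n-gram to the list of letters that follow it in the text
followers = defaultdict(list)
for i in range(len(text) - n):
    followers[text[i:i + n]].append(text[i + n])

print(dict(followers))
# {'ab': ['r', 'r'], 'br': ['a', 'a'], 'ra': ['c'], 'ac': ['a'], 'ca': ['d'], 'ad': ['a'], 'da': ['b']}

# one generation step: extend a starting n-gram by a randomly chosen follower
start = "ab"
next_letter = random.choice(followers[start])
print(start + next_letter)  # always 'abr' here, because 'ab' is only ever followed by 'r'
```

Because a letter is stored once per occurrence, `random.choice` picks frequent continuations more often than rare ones.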

In [10]:
from collections import defaultdict
import random

def build_continuation_model(text: str, n: int) -> defaultdict:
    """
    Creates a dictionary where keys are n-grams (sequences of n characters)
    and values are lists of characters that follow those n-grams in the text.

    Args:
        text (str): The text to build the model from.
        n (int): The length of the n-grams.

    Returns:
        defaultdict(list): The continuation model dictionary.
    """
    model = defaultdict(list)
    for i in range(len(text) - n):
        ngram = text[i : i + n]
        following_char = text[i + n]
        model[ngram].append(following_char)
    return model

def generate_text(model: defaultdict, seed_length: int = 3, length: int = 300) -> str:
    """
    Generates text using the continuation model.

    Args:
        model (defaultdict(list)): The continuation model.
        seed_length (int, optional): The length of the initial seed n-gram. Defaults to 3.
        length (int, optional): The desired length of the generated text. Defaults to 300.

    Returns:
        str: The generated text.
    """
    seed = random.choice(list(model))  # start from a random n-gram of the model
    generated_text = seed

    while len(generated_text) < length:
        ngram = generated_text[-seed_length:]
        possible_continuations = model.get(ngram)

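        if possible_continuations:
            next_char = random.choice(possible_continuations)
            generated_text += next_char
        else:
            # dead end: nothing follows this n-gram in the source text, so mark
            # the break with '/' and restart from a random n-gram
            generated_text += '/' + random.choice(list(model))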

    return generated_text

# Load and preprocess text (assuming the file is UTF-8 encoded)
with open("poe.txt", "r", encoding="utf-8") as file:
    text = file.read().replace("\n", " ")  # replace newlines with spaces

# Main execution
n_gram_size = 3
model = build_continuation_model(text, n_gram_size)
# seed_length must match the n-gram size used to build the model
generated_poem = generate_text(model, seed_length=n_gram_size)
print(generated_poem)

ní? Jestný každé ost k pánové požadat. Má že mnou v panedosálemi." "Ne, jazy tváři toment pootenu, pro mezi neboť měl jsem si, nejvíc tak jstevřít ztrácenězů, jako stě.“ Havratrnéhost, že bez usadil, aby jedek – zku jsem ita veno zlat. Mosky za menu bys se o vznení mohambral jsem množskonnosně – nu
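
The generated text differs on every run. If a repeatable result is wanted, for example to compare different n-gram sizes, the random generator can be seeded first. A minimal sketch of the effect, using a made-up list of letters and an arbitrary seed value:

```python
import random

letters = ["a", "b", "c", "d"]  # toy data for illustration

random.seed(42)  # arbitrary seed value
first = [random.choice(letters) for _ in range(10)]

random.seed(42)  # same seed again
second = [random.choice(letters) for _ in range(10)]

print(first == second)  # True: the same seed gives the same sequence of choices
```

Calling `random.seed` once before `generate_text` would have the same effect on the generated text.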
