# Problems

---
### Comparing lists
Write a function that returns `True` if two lists contain the same elements and `False` otherwise.

1. First, ignore repetitions: only the set of distinct elements matters, so the following two lists are considered the same.
```python
a = [1, 2, 3, 1]
b = [3, 2, 1]
```
2. Now take repetitions into account: two lists are the same only if every element occurs the same number of times in both. Under this rule, the lists above are no longer the same.
3. Design a third program that first counts the number of occurrences of each element in each list and then compares the counts.
    - You might use `defaultdict` from the `collections` module and
    - implement two separate functions: `count_elements` and `compare_lists`. Here is the docstring for the first one.

```python
def count_elements(x: list) -> dict:
    """Count the number of occurrences of each element in the list.
    The algorithm is O(n).

    Examples:
        >>> count_elements([1, 2, 1, 3, 2])
        defaultdict(<class 'int'>, {1: 2, 2: 2, 3: 1})

        >>> count_elements([])
        defaultdict(<class 'int'>, {})

        >>> count_elements(["a", "b", "a", "c"])
        defaultdict(<class 'int'>, {'a': 2, 'b': 1, 'c': 1})
    """
```

In [3]:
def compare(x: list, y: list) -> bool:
    """Check whether the lists contain the same elements, ignoring repetitions.
    
    Examples:
        >>> compare([], [])
        True
        >>> compare([1, 2, 3], [3, 2, 2, 1])
        True
        >>> compare([1, 2, 3], [3, 2, 5, 1])
        False
    """
    return set(x) == set(y)

In [10]:
def compare2(x: list, y: list) -> bool:
    """Compare two lists.
    The algorithm is O(n*log(n)) because of the sorting; the comparison itself is only O(n).

    Examples:
        >>> compare2([], [])
        True
        >>> compare2([1, 2, 3], [3, 2, 1])
        True
        >>> compare2([1, 2, 3], [3, 2, 2, 1])
        False
    """
    return sorted(x) == sorted(y)



In [15]:
from collections import defaultdict

def count_elements(x: list) -> dict:
    """Count the number of occurrences of each element in the list.
    The algorithm is O(n).

    Examples:
        >>> count_elements([1, 2, 1, 3, 2])
        defaultdict(<class 'int'>, {1: 2, 2: 2, 3: 1})

        >>> count_elements([])
        defaultdict(<class 'int'>, {})

        >>> count_elements(["a", "b", "a", "c"])
        defaultdict(<class 'int'>, {'a': 2, 'b': 1, 'c': 1})
    """
    cetnosti = defaultdict(int)
    for a in x:
        cetnosti[a] += 1
    return cetnosti

def compare_lists(x: list, y: list) -> bool:
    """Compare two lists based on the counts of their elements.
    The algorithm is O(n).

    Examples:
        >>> compare_lists([1, 2, 1], [2, 1, 1])
        True
        >>> compare_lists([1, 2, 1], [1, 2, 2])
        False
        >>> compare_lists([], [])
        True
        >>> compare_lists(["apple", "banana", "apple"], ["banana", "apple", "apple"])
        True
    """
    return count_elements(x) == count_elements(y)
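
As a cross-check, the standard library's `collections.Counter` does the same counting in a single call. The sketch below is an alternative to the hand-written `count_elements`, not part of the exercise; the name `compare_lists_counter` is made up for this example.

```python
from collections import Counter

def compare_lists_counter(x: list, y: list) -> bool:
    """Compare two lists as multisets using collections.Counter."""
    # Counter(x) maps each element to its number of occurrences,
    # which is the same information count_elements collects.
    return Counter(x) == Counter(y)

print(compare_lists_counter([1, 2, 1], [2, 1, 1]))  # True
print(compare_lists_counter([1, 2, 1], [1, 2, 2]))  # False
```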

To compare the performance, we can randomly generate two large lists of integers and time the two functions.

In [19]:
from random import randint
import timeit

l1 = [randint(1, 100) for _ in range(int(1e5))]
l2 = [randint(1, 100) for _ in range(int(1e5))]

# do 100 iterations for each function and compare the time
%timeit -n 100 compare2(l1, l2)
%timeit -n 100 compare_lists(l1, l2)

13.9 ms ± 265 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.28 ms ± 67.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
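
`%timeit` is an IPython magic, so it only works in a notebook or an IPython shell. In a plain Python script, a similar measurement can be sketched with `timeit.timeit`. The block below defines its own compact versions of the two approaches so that it runs on its own; the names `compare_sorted` and `compare_counts`, the list size and the number of repetitions are arbitrary choices for this illustration.

```python
import timeit
from collections import defaultdict
from random import randint

def compare_sorted(x: list, y: list) -> bool:
    # same idea as compare2: O(n log n) because of the sorting
    return sorted(x) == sorted(y)

def compare_counts(x: list, y: list) -> bool:
    # same idea as compare_lists: O(n) counting with a dictionary
    def count(items: list) -> dict:
        counts = defaultdict(int)
        for item in items:
            counts[item] += 1
        return counts
    return count(x) == count(y)

l1 = [randint(1, 100) for _ in range(10_000)]
l2 = [randint(1, 100) for _ in range(10_000)]

# timeit.timeit accepts a callable; number says how many times it is called
repeats = 20
t_sorted = timeit.timeit(lambda: compare_sorted(l1, l2), number=repeats)
t_counts = timeit.timeit(lambda: compare_counts(l1, l2), number=repeats)
print(f"sorting:  {t_sorted / repeats * 1e3:.3f} ms per call")
print(f"counting: {t_counts / repeats * 1e3:.3f} ms per call")
```

The absolute numbers depend on the machine and on the data, so they will differ from the notebook output above.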


---
### Creating a text with a great Poe-ness
An [n-gram](https://en.wikipedia.org/wiki/N-gram) is a contiguous sequence of n items from a given sample of text or speech. Here the items are characters.
1. Count the number of n-grams in a text. Here is an outline:
```python
# create empty dictionary using defaultdict

for row in open('poe.txt'):
    # save n-grams from line into dictionary 
```
2. Sort them by frequency.

You can use, for example, the text [here](https://ipnp.cz/strelecek/supplementary/23ZS/poe.txt) and read it line by line:

In [None]:
from collections import defaultdict

n = 3
kgrams = defaultdict(int)

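with open('poe.txt') as f:
    for row in f:
        row = row.rstrip('\n')  # do not count the line break as a character
        # slide a window of length n over the line;
        # the last window starts at index len(row) - n
        for i in range(len(row) - n + 1):
            kgrams[row[i:i+n]] += 1

# create pairs (frequency, n-gram)
pairs = [(frequency, gram) for gram, frequency in kgrams.items()]

# sort the pairs from most to least frequent and print them
for frequency, gram in sorted(pairs, reverse=True):
    print(frequency, gram)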
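
To try the counting without downloading the text, here is a minimal sketch that works on a string in memory. It uses `collections.Counter` instead of `defaultdict`, and the helper name `count_ngrams` and the sample string are assumptions made for this example.

```python
from collections import Counter

def count_ngrams(text: str, n: int) -> Counter:
    """Count all character n-grams of length n in text."""
    # a text of length L contains L - n + 1 windows of length n
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

counts = count_ngrams("abababc", 2)
# most_common() returns (n-gram, count) pairs sorted by decreasing count
for gram, frequency in counts.most_common():
    print(frequency, gram)
# 3 ab
# 2 ba
# 1 bc
```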

---
### Parsing the input
Parse the input below into a list of lists of dictionaries. Each element of the outer list corresponds to one input line, each element of an inner list to one of the semicolon-separated parts of that line, and each dictionary maps a color to its count.
```
first line: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green
second line: 1 blue, 2 green; 3 green, 4 blue, 1 red; 1 green, 1 blue
third line: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green; 5 green, 1 red
```

The resulting object should look like
```python
[
    [{'blue': 3, 'red': 4}, 
     {'blue': 6, 'red': 1, 'green': 2}, 
     {'green': 2}],
    ...
 ]
```

In [None]:
colors = []
with open("colors.txt") as f:
    for l in f:
        # drop the label before the colon, then split the rest at semicolons
        line = l.split(":")[1].rstrip("\n").split(";")
        linelist = []
        for i in line:
            linedict = {}
            for elem in i.split(","):
                num, col = elem.split()
                linedict[col] = int(num)  # store the count as an integer
            linelist.append(linedict)
        colors.append(linelist)

print(colors[0])

[{'blue': 3, 'red': 4}, {'red': 1, 'green': 2, 'blue': 6}, {'green': 2}]
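
The same parsing can be wrapped in a function that works on a string, which makes it easy to test without a `colors.txt` file. This is only a sketch: the helper name `parse_line` is hypothetical, and the counts are converted to integers so that the result matches the target structure shown in the task.

```python
def parse_line(line: str) -> list:
    """Parse one line such as 'first line: 3 blue, 4 red; 2 green'."""
    # drop the label before the colon and keep the rest
    _, _, body = line.partition(":")
    parts = []
    for part in body.split(";"):
        counts = {}
        for item in part.split(","):
            number, color = item.split()
            counts[color] = int(number)  # store counts as integers
        parts.append(counts)
    return parts

text = """first line: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green
second line: 1 blue, 2 green; 3 green, 4 blue, 1 red; 1 green, 1 blue"""

parsed = [parse_line(line) for line in text.splitlines()]
print(parsed[0])
# [{'blue': 3, 'red': 4}, {'red': 1, 'green': 2, 'blue': 6}, {'green': 2}]
```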


In [None]:
def reverse_lines(filename: str, outputname: str) -> None:
    """Write the lines of a file into outputname in reverse order."""
    with open(filename) as infile:
        # strip the newlines so that the last line, which may lack one,
        # needs no special treatment
        lines = [line.rstrip('\n') for line in infile]
    with open(outputname, 'w') as outfile:
        outfile.write('\n'.join(reversed(lines)))

def reverse_lines_and_words(filename: str, outputname: str) -> None:
    """Write the lines of a file into outputname in reverse order,
    reversing the order of the words in each line as well."""
    def reverse_line(line: str) -> str:
        # split() also removes the trailing newline
        return ' '.join(reversed(line.split()))

    with open(filename) as infile:
        lines = [reverse_line(line) for line in infile]
    with open(outputname, 'w') as outfile:
        outfile.write('\n'.join(reversed(lines)))

reverse_lines('input.txt', 'output.txt')
reverse_lines_and_words('input.txt', 'output2.txt')
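
The expected behaviour can be checked on a string in memory before touching any files. The helper `reverse_text` below is a hypothetical stand-in for the two file-based functions, written only for this illustration.

```python
def reverse_text(text: str, reverse_words: bool = False) -> str:
    """Return text with its lines in reverse order.

    If reverse_words is True, the words within each line are reversed too.
    """
    lines = text.splitlines()
    if reverse_words:
        # split() without arguments also collapses repeated whitespace
        lines = [' '.join(reversed(line.split())) for line in lines]
    return '\n'.join(reversed(lines))

sample = "one two\nthree four five\nsix"
print(reverse_text(sample))
print(reverse_text(sample, reverse_words=True))
```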

# Problematic problems

---
### Generating some nonsense
Use the previous code for generating text. *Choose the first n-gram randomly, then randomly choose the next letter from the list of letters that follow the n-gram. For random selection, you can use*
```python
import random
random.choice(your_list_of_letters)
```

In [26]:
from collections import defaultdict
import random

def build_continuation_model(text: str, n: int) -> defaultdict:
    """
    Creates a dictionary where keys are n-grams (sequences of n characters)
    and values are lists of characters that follow those n-grams in the text.

    Args:
        text (str): The text to build the model from.
        n (int): The length of the n-grams.

    Returns:
        defaultdict(list): The continuation model dictionary.
    """
    model = defaultdict(list)
    for i in range(len(text) - n):
        ngram = text[i : i + n]
        following_char = text[i + n]
        model[ngram].append(following_char)
    return model

def generate_text(model: defaultdict, seed_length: int = 3, length: int = 300) -> str:
    """
    Generates text using the continuation model.

    Args:
        model (defaultdict(list)): The continuation model.
        seed_length (int, optional): The length of the initial seed n-gram; it must equal the n used to build the model. Defaults to 3.
        length (int, optional): The desired length of the generated text. Defaults to 300.

    Returns:
        str: The generated text.
    """
    seed = random.choice(list(model))  
    generated_text = seed

    while len(generated_text) < length:
        ngram = generated_text[-seed_length:]
        possible_continuations = model.get(ngram)

        if possible_continuations:
            next_char = random.choice(possible_continuations)
            generated_text += next_char
        else:  
            generated_text += '/' + random.choice(list(model)) 

    return generated_text

# Load and preprocess text
with open("poe.txt", "r") as file:
    text = file.read().replace("\n", " ")  # Replace newlines with spaces

# Main execution
n_gram_size = 3
model = build_continuation_model(text, n_gram_size)
# the seed length has to match the n-gram size of the model
generated_poem = generate_text(model, seed_length=n_gram_size)
print(generated_poem)

Poznávrhnu vzpouzen, zakládal výpočínala, že zvran 72 prosi to jsem a pocitlitelné očívají dušil dávka, kterozumokrajem na pohli –" ,,Ledek, ó jakov. Pootevřeligránovalo hled. B. Cítil. Třebaženujícímu z Rozbolehlo napodním a znikdo do počáte na zepři teď nemocně úzko víc ji než na nedů ještěně ztrá
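
To experiment without `poe.txt`, the same idea can be sketched on a short string in memory. The sample sentence, the name `build_model`, the n-gram size and the fixed random seed are all assumptions chosen for this illustration; unlike `generate_text` above, this sketch simply stops when it reaches an n-gram with no known continuation.

```python
import random
from collections import defaultdict

def build_model(text: str, n: int) -> defaultdict:
    """Map every n-gram in text to the list of characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - n):
        model[text[i:i + n]].append(text[i + n])
    return model

random.seed(0)  # fixed seed so that repeated runs give the same text
sample = "the cat sat on the mat and the rat sat on the hat "
n = 2
model = build_model(sample, n)

generated = random.choice(list(model))
while len(generated) < 40:
    # .get() does not insert missing keys into the defaultdict
    followers = model.get(generated[-n:])
    if not followers:
        # dead end: this n-gram has no recorded continuation
        break
    generated += random.choice(followers)
print(generated)
```

Because every appended character followed the same n-gram somewhere in the sample, each window of n + 1 characters in the result also occurs in the sample.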
