<a href="https://colab.research.google.com/github/konstantint/ai-auto-olympiad/blob/main/informatics/15liners/simple_dna_segments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

AI plays informatics olympiad with itself

# Initialization boilerplate

In [None]:
!pip install -q -U google-generativeai

In [None]:
#@title Imports & api key loading
import google.generativeai as genai
import os
from google.colab import userdata
import ipywidgets as widgets # Import ipywidgets
from IPython.display import display # Import display to show widgets

# --- Configuration ---

# Fetch the API key from Colab secrets
try:
    api_key = userdata.get('GOOGLE_API_KEY')
    if not api_key:
        raise ValueError("API key not found. Please add it to Colab secrets under the name 'GOOGLE_API_KEY'.")
    genai.configure(api_key=api_key)
except Exception as e:
    print(f"Error configuring Gemini API: {e}")
    # Re-raise so the cell stops here; calling exit() would end the kernel session.
    raise


# Part 1: AI creates an olympiad task

First, the olympiad task author comes up with an olympiad task along with a set of tests for it.

In [None]:
#@title Prompt
task_author_prompt = widgets.Textarea(
    value="""You are an experienced author of informatics olympiad problems.

Your task is to come up with a perfect warm-up programming task for the Baltics Olympiad in Informatics.
Informatics olympiad warm-up problems must have the following properties:
  - Their statement reads like a real-life situation or a story where some character needs to solve some
    kind of a problem they are facing.
  - The solution to the problem involves the use of algorithmic coding. Participants will write a program
    in Python that must read the input data via stdin and output the result via stdout.
  - The participants usually need to come up with a clever combination of one or more classical
    algorithms in order to solve the problem.
  - The problem must allow a range of solutions using various algorithmic techniques, with different
    algorithmic complexities (e.g. a brute force solution might be O(n^3) while a smart use of dynamic programming
    could allow a linear solution, or the like).
  - The optimal solution to the problem could be found, implemented and tested by an olympiad participant (who
    is experienced in competitive programming) within about 30 minutes.
  - A Python solution of less than 15 lines of code must exist.
  - The problem must be accompanied by a number of tests, where each test consists of a text file with problem input data
    and a text file with correct output.

You are trying to develop a new olympiad problem. Please, proceed step by step as follows:

# Step 1.

List the most common algorithms or algorithmic techniques that are usually involved in informatics olympiad problems.

# Step 2.

Using the list of algorithms produced in Step 1, come up with five potential problems that require a combination of two or three
algorithmic techniques listed in Step 1 to be solved.

# Step 3.

For each of the problems you produced in Step 2, verify that it indeed requires a clever combination of algorithmic techniques to be solved
and that it can be solved using a range of approaches with different solutions having different algorithmic complexities.
Remove problems that do not fit these criteria.

# Step 4.

Write the optimal solution for each of the problems remaining after Step 3 in Python.
Keep only those that can be solved in at most 15 lines of code.

# Step 5.

Come up with a potential storyline for each of the problems remaining after step 4. Assess the interestingness of the storyline
on a scale from 1 (boring) to 5 (exciting).

Leave only the problems that scored at least 4.

# Step 6.

Finally, pick one of the problems from step 5 and:

  - List the possible algorithmic solutions and their algorithmic complexities.
    For each such algorithmic complexity, create a test case (i.e. a pair of input and output files)
    that could be solved using a Python solution with such algorithmic complexity within 0.1 seconds,
    but would not be solvable by code that has worse complexity.

  - List all the tricky corner cases the problem may have and for each corner case create a test case that would fail
    unless the solution code takes that corner case into account.

# Step 7.

Output the produced problem in the following format:

<problem>
<statement>
.. problem statement in Markdown format ...
</statement>
<solutions>
  <solution complexity="linear/quadratic/...">
    ... solution in Python ...
  </solution>
  <solution ...>
    ... solutions for other complexities ...
  </solution>
</solutions>
<tests>
  <test comment="..explanation of the test case">
    <input>
      .. input text data ...
    </input>
    <output>
    .. expected output text data...
    </output>
  </test>
  ... other tests as required
</tests>
""",
    description='Prompt:',
    disabled=True,
    layout=widgets.Layout(height='800px', width='auto'), # Adjust height/width
    style={'description_width': 'initial'}
)
task_author_prompt

Textarea(value='You are an experienced author of informatics olympiad problems.\n\nYour task is to come up wit…

In [None]:
#@title Generate task
%%time
task_author = genai.GenerativeModel(
  model_name="gemini-2.5-pro-exp-03-25",
  generation_config= {
      "temperature": 0.5,
      "top_p": 1,
      "top_k": 50,
  },
)
generated_task = task_author.generate_content(task_author_prompt.value)

CPU times: user 1.44 s, sys: 188 ms, total: 1.63 s
Wall time: 4min 6s


### Full thinking process (collapsible)

In [None]:
print(generated_task.text)

Okay, let's create the warm-up problem following the specified steps.

**Step 1: List Common Algorithms/Techniques**

Here's a list of common algorithms and techniques relevant to olympiad problems, particularly those suitable for warm-ups:

*   **Sorting:** Comparison sorts (Merge Sort, TimSort), Counting Sort.
*   **Searching:** Binary Search, Linear Search.
*   **Greedy Algorithms:** Activity selection, Huffman coding basics, simple interval problems.
*   **Dynamic Programming (DP):** Simple 1D DP (e.g., max subarray sum), basic 2D DP (e.g., LCS, Knapsack - maybe too complex for warm-up), Digit DP (unlikely for warm-up).
*   **Data Structures:**
    *   Arrays/Lists
    *   Stacks/Queues
    *   Heaps/Priority Queues
    *   Hash Maps/Dictionaries/Sets
    *   Prefix Sums
*   **Graph Algorithms:** Basic Traversal (BFS, DFS). (Shortest paths, MST etc. usually beyond warm-up).
*   **Two Pointers / Sliding Window:** Efficiently processing subarrays/subsequences.
*   **Basic Math/Number

## Generated problem

In [None]:
#@title Extract data
# (it's not really parsable XML)
import re
from IPython import display

problem_data = generated_task.text[generated_task.text.index("<problem>")+len("<problem>"):]
problem_data = problem_data[:problem_data.index("</problem>")]
statement = re.findall("<statement>(.+?)</statement>", problem_data, re.DOTALL)[0].strip()
solutions = re.findall("<solution complexity=\"(.+?)\">(.+?)</solution>", problem_data, re.DOTALL)
tests = re.findall("<test comment=\"(.+?)\">.*?<input>(.+?)</input>.*?<output>(.+?)</output>.*?</test>", problem_data, re.DOTALL)
tests = [(cmt, inp.strip(), out.strip()) for cmt, inp, out in tests]
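
To illustrate how the extraction above behaves, here is a minimal, self-contained sketch run on a made-up response. The `sample_response` text and the `extract_problem` and `demo_*` names are assumptions for illustration only and are not actual model output; the regular expressions mirror the ones in the cell above.

```python
import re

# Hypothetical model response, invented for illustration only.
sample_response = """Some reasoning text...
<problem>
<statement>
# Toy problem
Print the sum of two integers.
</statement>
<solutions>
  <solution complexity="constant">
print(sum(map(int, input().split())))
  </solution>
</solutions>
<tests>
  <test comment="small numbers">
    <input>
      1 2
    </input>
    <output>
    3
    </output>
  </test>
</tests>
</problem>
Trailing remarks."""

def extract_problem(text):
    """Pull statement, solutions and tests out of the pseudo-XML response."""
    start = text.index("<problem>") + len("<problem>")
    body = text[start:text.index("</problem>")]
    statement = re.findall(r"<statement>(.+?)</statement>", body, re.DOTALL)[0].strip()
    solutions = re.findall(r'<solution complexity="(.+?)">(.+?)</solution>', body, re.DOTALL)
    tests = re.findall(
        r'<test comment="(.+?)">.*?<input>(.+?)</input>.*?<output>(.+?)</output>.*?</test>',
        body, re.DOTALL)
    # Strip the indentation and newlines that surround each input/output payload.
    tests = [(cmt, inp.strip(), out.strip()) for cmt, inp, out in tests]
    return statement, solutions, tests

demo_statement, demo_solutions, demo_tests = extract_problem(sample_response)
print(demo_tests)  # [('small numbers', '1 2', '3')]
```

Regular expressions are used instead of an XML parser because the embedded Python code may contain unescaped `<` or `&` characters, which a strict XML parser would reject.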

In [None]:
#@title Problem statement
display.display(display.Markdown(statement))

# Simple DNA Segments

A young biologist, Alex, is studying genetic sequences. A DNA sequence can be represented as a sequence of integers, where each integer corresponds to a specific nucleotide type. Alex is particularly interested in finding "simple" segments within a long DNA sequence.

A segment is considered "simple" if it contains **at most K** different types of nucleotides (i.e., at most K distinct integer values). Alex needs your help to find the length of the **longest contiguous simple segment** in a given DNA sequence.

**Input**

The first line contains two integers, N and K (0 <= K <= N <= 200,000), representing the length of the DNA sequence and the maximum number of distinct nucleotide types allowed in a simple segment.

The second line contains N integers `a_1, a_2, ..., a_N` (-1,000,000,000 <= a_i <= 1,000,000,000), representing the DNA sequence.

**Output**

Output a single integer, the length of the longest contiguous subarray (segment) of the sequence `a` that contains at most K distinct integers.

**Example 1**

*Input:*
```
7 2
1 2 1 3 1 2 4
```

*Output:*
```
3
```

*Explanation:*
The subarrays are:
[1] (1 distinct) - len 1
[1, 2] (2 distinct) - len 2
[1, 2, 1] (2 distinct) - len 3
[1, 2, 1, 3] (3 distinct) - len 4 (Too many)
[2] (1 distinct) - len 1
[2, 1] (2 distinct) - len 2
[2, 1, 3] (3 distinct) - len 3 (Too many)
[1] (1 distinct) - len 1
[1, 3] (2 distinct) - len 2
[1, 3, 1] (2 distinct) - len 3
[1, 3, 1, 2] (3 distinct) - len 4 (Too many)
...
The longest subarrays with at most K=2 distinct elements are [1, 2, 1] and [1, 3, 1], both of length 3.

**Example 2**

*Input:*
```
10 3
1 2 3 4 1 2 3 1 2 1
```

*Output:*
```
5
```

*Explanation:*
The subarray [4, 1, 2, 3, 1] (from index 3 to 7, 0-based) has elements {4, 1, 2, 3}. 4 distinct elements > K=3.
The subarray [1, 2, 3, 1, 2] (from index 4 to 8) has elements {1, 2, 3}. 3 distinct elements <= K=3. Length is 5. This is the longest possible.

**Example 3**

*Input:*
```
5 5
-10 20 -10 30 20
```

*Output:*
```
5
```

*Explanation:*
The entire array has distinct elements {-10, 20, 30}, which is 3 distinct elements. Since 3 <= K=5, the whole array is a simple segment. Length is 5.
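
The three examples can be checked independently with an exhaustive search. The sketch below is not part of the generated problem; the helper name `longest_simple_brute` is hypothetical, and the approach is only practical for tiny inputs like these.

```python
def longest_simple_brute(a, k):
    """Exhaustively check every contiguous segment; fine for tiny inputs only."""
    best = 0
    for i in range(len(a)):
        for j in range(i, len(a)):
            # A segment is "simple" if it has at most k distinct values.
            if len(set(a[i:j + 1])) <= k:
                best = max(best, j - i + 1)
    return best

examples = {
    "Example 1": ([1, 2, 1, 3, 1, 2, 4], 2),
    "Example 2": ([1, 2, 3, 4, 1, 2, 3, 1, 2, 1], 3),
    "Example 3": ([-10, 20, -10, 30, 20], 5),
}
for name, (a, k) in examples.items():
    print(name, longest_simple_brute(a, k))
# Example 1 3
# Example 2 6
# Example 3 5
```

For Example 2 the search reports 6 (the segment `[1, 2, 3, 1, 2, 1]` at 0-based indices 4 to 9) rather than the stated 5, which agrees with the actual outputs recorded in the test tables further down.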

In [None]:
#@title Sample solutions
for i, (cpl, code) in enumerate(solutions):
  display.display(display.HTML(f"<h2>Solution #{i} ({cpl})</h2>"))
  display.display(display.Markdown(code))


```python
import sys

def solve():
    n, k = map(int, sys.stdin.readline().split())
    if n == 0:
        print(0)
        return
    a = list(map(int, sys.stdin.readline().split()))

    counts = {}
    max_len = 0
    left = 0
    for right in range(n):
        # Add element a[right] to the window
        counts[a[right]] = counts.get(a[right], 0) + 1

        # Shrink window from left if it violates the K distinct elements condition
        while len(counts) > k:
            counts[a[left]] -= 1
            if counts[a[left]] == 0:
                del counts[a[left]]
            left += 1

        # Update max_len with the length of the current valid window
        # The window A[left..right] is guaranteed to be valid here
        max_len = max(max_len, right - left + 1)

    print(max_len)

solve()
```
  


```python
import sys

def solve():
    n, k = map(int, sys.stdin.readline().split())
    if n == 0:
        print(0)
        return
    a = list(map(int, sys.stdin.readline().split()))

    max_len = 0
    for i in range(n):
        distinct_elements = set()
        for j in range(i, n):
            distinct_elements.add(a[j])
            if len(distinct_elements) <= k:
                max_len = max(max_len, j - i + 1)
            else:
                # Once we exceed K distinct elements for a fixed start i,
                # extending j further won't help
                break
    print(max_len)

solve()
```
  


```python
import sys

def solve():
    n, k = map(int, sys.stdin.readline().split())
    if n == 0:
        print(0)
        return
    a = list(map(int, sys.stdin.readline().split()))

    max_len = 0
    for i in range(n):
        for j in range(i, n):
            # Extract subarray and count distinct elements
            subarray = a[i : j+1]
            distinct_elements = set(subarray)
            if len(distinct_elements) <= k:
                max_len = max(max_len, len(subarray)) # or j - i + 1

    print(max_len)

solve()
```
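
Step 6 of the prompt asks for tests that separate solutions by complexity, but a response can hardly spell out 200,000 numbers. One possible alternative is to generate such inputs in code. The sketch below is an assumption about how that could look; `make_large_test` and its parameters are hypothetical names, not part of the notebook.

```python
import random

def make_large_test(n, k, distinct_values, seed=0):
    """Return input text in the statement's format: 'N K', then N integers."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    values = [rng.randint(-10**9, 10**9) for _ in range(distinct_values)]
    seq = [rng.choice(values) for _ in range(n)]
    return f"{n} {k}\n{' '.join(map(str, seq))}\n"

# With K >= the number of distinct values, no window ever exceeds K, so the
# early `break` in the set-based solution never fires and it performs roughly
# N^2 / 2 set insertions, while the sliding window stays linear.
large_input = make_large_test(n=200_000, k=3, distinct_values=3)

# The whole sequence is a simple segment here, so the answer is N itself.
large_expected = "200000"
```

Choosing parameters for which the answer is known in advance avoids having to trust any particular solution when producing the expected output.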
  

In [None]:
#@title Tests
for cmt, inp, out in tests:
  display.display(display.HTML(f"<h3>{cmt}</h3><pre>{inp}</pre><h4>Out</h4><pre>{out}</pre>"))
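
Generated tests are not guaranteed to be well formed: the tracebacks in the testing section show `ValueError: invalid literal for int() with base 10: '...'` and `IndexError: list index out of range`, which suggests that some inputs contain a `...` placeholder or fewer numbers than the declared N. A minimal sketch of a pre-check follows; `is_well_formed` and `candidate_tests` are hypothetical names, and the sample tuples are invented for illustration.

```python
def is_well_formed(inp):
    """Check for 'N K' on the first line and exactly N integers after it."""
    lines = inp.strip().split("\n")
    try:
        n, k = map(int, lines[0].split())
        values = [int(tok) for line in lines[1:] for tok in line.split()]
    except ValueError:
        # Raised for non-numeric tokens such as '...' or a malformed first line.
        return False
    # The K <= N bound from the statement is deliberately not enforced, since
    # some of the listed corner cases (e.g. "5 10") exceed it.
    return n >= 0 and k >= 0 and len(values) == n

candidate_tests = [
    ("ok", "7 2\n1 2 1 3 1 2 4", "3"),
    ("elided data", "200000 2\n1 2 ... 1 2", "200000"),
    ("too few numbers", "5 1\n1 2 3", "1"),
    ("empty sequence", "0 5", "0"),
]
usable = [t for t in candidate_tests if is_well_formed(t[1])]
print([cmt for cmt, _, _ in usable])  # ['ok', 'empty sequence']
```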

## Test sample solutions

In [None]:
#@title Testing code
import textwrap

def test_solution(sol, inp):
  try:
    with open("/tmp/sol.py", "w") as f:
      f.write(sol)
    with open("/tmp/test.in", "w") as f:
      f.write(inp)
    !cat /tmp/test.in | python /tmp/sol.py > /tmp/test.out
    with open("/tmp/test.out") as f:
      return f.read().strip()
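  except Exception:
    # Only errors in this helper itself land here; a crashing solution
    # just leaves empty or partial output in /tmp/test.out.
    return "Failed"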

def eval_solution(sol_cmt, sol, tests):
  display.display(display.HTML(f"<h2>Testing solution: {sol_cmt}</h2>"))
  sol = sol.strip().replace("```python", "").replace("```", "")
  sol = textwrap.dedent(sol)
  # Fix commented out part in second solution.
  sol = sol.replace("# solve_fw", "solve_fw")
  table = ["<table><tr><th>Test</th><th>Input</th><th>Exp. out</th><th>Act. out</th></tr>"]
  for cmt, inp, out in tests:
    test_out = test_solution(sol, inp)
    table.append(f"<tr style='background-color: {'lightgreen' if out == test_out else '#ffaaaa'}'><td>{cmt}</td><td>{inp}</td><td>{out}</td><td>{test_out}</td></tr>")
  table.append("</table>")
  display.display(display.HTML("".join(table)))

for sol_cmt, sol in solutions:
  eval_solution(sol_cmt, sol, tests)

Traceback (most recent call last):
  File "/tmp/sol.py", line 31, in <module>
    solve()
  File "/tmp/sol.py", line 16, in solve
    counts[a[right]] = counts.get(a[right], 0) + 1
                                  ~^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/tmp/sol.py", line 31, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 31, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 31, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys

| Test | Input | Exp. out | Act. out |
|---|---|---|---|
| Example 1 from statement | 7 2 1 2 1 3 1 2 4 | 3 | 3.0 |
| Example 2 from statement | 10 3 1 2 3 4 1 2 3 1 2 1 | 5 | 6.0 |
| Example 3 from statement | 5 5 -10 20 -10 30 20 | 5 | 5.0 |
| Corner Case: K=0 | 5 0 1 2 1 3 2 | 0 | 0.0 |
| Corner Case: K=0, N=0 | 0 0 | 0 | 0.0 |
| Corner Case: N=0 | 0 5 | 0 | 0.0 |
| Corner Case: K is large enough | 5 10 1 2 1 3 2 | 5 | 5.0 |
| Corner Case: All elements are the same, K=1 | 10 1 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are the same, K > 1 | 10 3 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are distinct | 10 3 1 2 3 4 5 6 7 8 9 10 | 3 | 3.0 |


Traceback (most recent call last):
  File "/tmp/sol.py", line 24, in <module>
    solve()
  File "/tmp/sol.py", line 15, in solve
    distinct_elements.add(a[j])
                          ~^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/tmp/sol.py", line 24, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 24, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 24, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
   

| Test | Input | Exp. out | Act. out |
|---|---|---|---|
| Example 1 from statement | 7 2 1 2 1 3 1 2 4 | 3 | 3.0 |
| Example 2 from statement | 10 3 1 2 3 4 1 2 3 1 2 1 | 5 | 6.0 |
| Example 3 from statement | 5 5 -10 20 -10 30 20 | 5 | 5.0 |
| Corner Case: K=0 | 5 0 1 2 1 3 2 | 0 | 0.0 |
| Corner Case: K=0, N=0 | 0 0 | 0 | 0.0 |
| Corner Case: N=0 | 0 5 | 0 | 0.0 |
| Corner Case: K is large enough | 5 10 1 2 1 3 2 | 5 | 5.0 |
| Corner Case: All elements are the same, K=1 | 10 1 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are the same, K > 1 | 10 3 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are distinct | 10 3 1 2 3 4 5 6 7 8 9 10 | 3 | 3.0 |


Traceback (most recent call last):
  File "/tmp/sol.py", line 22, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 22, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 22, in <module>
    solve()
  File "/tmp/sol.py", line 9, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'


| Test | Input | Exp. out | Act. out |
|---|---|---|---|
| Example 1 from statement | 7 2 1 2 1 3 1 2 4 | 3 | 3.0 |
| Example 2 from statement | 10 3 1 2 3 4 1 2 3 1 2 1 | 5 | 6.0 |
| Example 3 from statement | 5 5 -10 20 -10 30 20 | 5 | 5.0 |
| Corner Case: K=0 | 5 0 1 2 1 3 2 | 0 | 0.0 |
| Corner Case: K=0, N=0 | 0 0 | 0 | 0.0 |
| Corner Case: N=0 | 0 5 | 0 | 0.0 |
| Corner Case: K is large enough | 5 10 1 2 1 3 2 | 5 | 5.0 |
| Corner Case: All elements are the same, K=1 | 10 1 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are the same, K > 1 | 10 3 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are distinct | 10 3 1 2 3 4 5 6 7 8 9 10 | 3 | 3.0 |
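
The testing cell relies on the `!cat ... | python ...` shell escape, which lets tracebacks spill into the cell output and has no time limit. As a possible alternative, the sketch below runs a solution with `subprocess.run`, captures stderr, and enforces a timeout. The `run_solution` name, its return convention, and the 5-second default are assumptions for illustration, not the notebook's method.

```python
import os
import subprocess
import sys
import tempfile

def run_solution(source, inp, timeout=5.0):
    """Run `source` as a Python script, feeding `inp` on stdin.

    Returns (stdout, error), where error is None on success.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            input=inp, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "", "timeout"
    finally:
        os.unlink(path)  # clean up the temporary script in every case
    if proc.returncode != 0:
        # Keep only the last traceback line, e.g. "IndexError: list index out of range".
        err_lines = proc.stderr.strip().splitlines()
        error = err_lines[-1] if err_lines else "nonzero exit status"
        return proc.stdout.strip(), error
    return proc.stdout.strip(), None

print(run_solution("print(sum(map(int, input().split())))", "1 2\n"))  # ('3', None)
print(run_solution("int('...')", "")[1])  # a one-line ValueError message
```

Returning the error separately makes it possible to show a short failure reason in the results table instead of an empty cell.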


# Part 2: AI solves the task

In [None]:
#@title Prompt
task_solver_prompt = widgets.Textarea(
    value="""You are a world-class competitive programming sportsman.

You will be given the problem statement for an olympiad problem and will need to
come up with a Python program that solves this problem. Your program will
have to read input data from stdin and write the answer to stdout.

Please, proceed step by step:

Step 1.
Analyze the provided problem, and propose an algorithmic approach for solving it.
Assess the algorithmic time and space complexity of your solution.

Step 2.
Write the Python code for your solution.

Step 3.
Make sure your code covers all corner cases. If it doesn't, fix it.

Step 4.
Output the Python code for your solution between the <solution></solution> tags.
Make sure your code is well commented and clear.

If your code fails to execute, you may be asked to fix it
in follow-up requests.
""",
    description='Prompt:',
    disabled=False,
    layout=widgets.Layout(height='400px', width='auto'), # Adjust height/width
    style={'description_width': 'initial'}
)
task_solver_prompt

Textarea(value="You are a world-class competitive programming sportsman.\n\nYou will be given the problem stat…

In [None]:
#@title Solve problem
%%time
task_solver = genai.GenerativeModel(
  model_name="gemini-2.5-pro-exp-03-25",
  generation_config= {
      "temperature": 0.5,
      "top_p": 1,
      "top_k": 50,
  },
)
solved_task = task_solver.generate_content(task_solver_prompt.value + """

PROBLEM STATEMENT
""" + statement)

CPU times: user 523 ms, sys: 73.4 ms, total: 596 ms
Wall time: 1min 29s
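
The prompt mentions follow-up requests for fixing failing code, although the notebook sends only a single request. A minimal sketch of such a feedback loop is shown below. Everything in it is hypothetical: `solve_with_retries`, the `ask` and `run_tests` callables, and the wording of the follow-up message are assumptions, and the model is replaced by a stub so the example runs offline.

```python
def solve_with_retries(ask, run_tests, first_prompt, max_rounds=3):
    """Ask for a solution, run it, and feed failures back until the tests pass.

    `ask(prompt) -> str` and `run_tests(code) -> list[str]` are hypothetical
    callables supplied by the caller; neither is part of the notebook.
    """
    prompt = first_prompt
    code = ""
    for _ in range(max_rounds):
        code = ask(prompt)
        failures = run_tests(code)
        if not failures:
            return code, True
        # Report the failures back and ask for a corrected version.
        prompt = "Your code failed:\n" + "\n".join(failures) + "\nPlease fix it."
    return code, False

# Stub "model" that answers wrongly first and correctly on the second attempt.
answers = iter(["print(1)", "print(2)"])
fake_ask = lambda prompt: next(answers)
fake_run = lambda code: [] if code == "print(2)" else ["expected 2, got 1"]
print(solve_with_retries(fake_ask, fake_run, "Solve it"))  # ('print(2)', True)
```

In the notebook's setting, `ask` could wrap a chat session so that the model sees its earlier attempts, and `run_tests` could reuse the extracted `tests`.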


### Full thinking process (collapsible)

In [None]:
print(solved_task.text)

```python
import sys
from collections import Counter # Can use Counter or standard dict

def solve():
    """
    Solves the Simple DNA Segments problem using a sliding window approach.
    Reads input N, K, and the sequence a.
    Finds the length of the longest contiguous subarray of a
    containing at most K distinct elements.
    Prints the result to stdout.
    """
    
    # Read N and K from the first line of input
    line1 = sys.stdin.readline().split()
    n = int(line1[0])
    k = int(line1[1])

    # --- Handle Edge Cases ---

    # If the sequence length N is 0, the longest segment is of length 0.
    if n == 0:
        print(0)
        return

    # Read the DNA sequence from the second line of input
    a = list(map(int, sys.stdin.readline().split()))

    # If K is 0, we are looking for segments with at most 0 distinct elements.
    # Only an empty segment satisfies this. The maximum length is 0.
    # Note: This case is also implicitly handled if N=0 by the check abov

## Solution evaluation

In [None]:
#@title Extract data
# (it's not really parsable XML)
import re
import textwrap
from IPython import display

if "<solution>" in solved_task.text:
  solution = textwrap.dedent(re.findall("<solution>(.+?)</solution>", solved_task.text, re.DOTALL)[0]).strip()
else:
  solution = solved_task.text
solution = solution.replace("```python", "").replace("```", "")

In [None]:
print(solution)


import sys
from collections import Counter # Can use Counter or standard dict

def solve():
    """
    Solves the Simple DNA Segments problem using a sliding window approach.
    Reads input N, K, and the sequence a.
    Finds the length of the longest contiguous subarray of a
    containing at most K distinct elements.
    Prints the result to stdout.
    """
    
    # Read N and K from the first line of input
    line1 = sys.stdin.readline().split()
    n = int(line1[0])
    k = int(line1[1])

    # --- Handle Edge Cases ---

    # If the sequence length N is 0, the longest segment is of length 0.
    if n == 0:
        print(0)
        return

    # Read the DNA sequence from the second line of input
    a = list(map(int, sys.stdin.readline().split()))

    # If K is 0, we are looking for segments with at most 0 distinct elements.
    # Only an empty segment satisfies this. The maximum length is 0.
    # Note: This case is also implicitly handled if N=0 by the check above.
    # 

In [None]:
eval_solution("Proposed solution", solution, tests)

Traceback (most recent call last):
  File "/tmp/sol.py", line 99, in <module>
    solve()
  File "/tmp/sol.py", line 60, in solve
    num_right = a[right]
                ~^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/tmp/sol.py", line 99, in <module>
    solve()
  File "/tmp/sol.py", line 26, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 99, in <module>
    solve()
  File "/tmp/sol.py", line 26, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '...'
Traceback (most recent call last):
  File "/tmp/sol.py", line 99, in <module>
    solve()
  File "/tmp/sol.py", line 26, in solve
    a = list(map(int, sys.stdin.readline().split()))
        ^^^^^

| Test | Input | Exp. out | Act. out |
|---|---|---|---|
| Example 1 from statement | 7 2 1 2 1 3 1 2 4 | 3 | 3.0 |
| Example 2 from statement | 10 3 1 2 3 4 1 2 3 1 2 1 | 5 | 6.0 |
| Example 3 from statement | 5 5 -10 20 -10 30 20 | 5 | 5.0 |
| Corner Case: K=0 | 5 0 1 2 1 3 2 | 0 | 0.0 |
| Corner Case: K=0, N=0 | 0 0 | 0 | 0.0 |
| Corner Case: N=0 | 0 5 | 0 | 0.0 |
| Corner Case: K is large enough | 5 10 1 2 1 3 2 | 5 | 5.0 |
| Corner Case: All elements are the same, K=1 | 10 1 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are the same, K > 1 | 10 3 5 5 5 5 5 5 5 5 5 5 | 10 | 10.0 |
| Corner Case: All elements are distinct | 10 3 1 2 3 4 5 6 7 8 9 10 | 3 | 3.0 |
