# core

> Core functions for dialogify

In [None]:
#| default_exp core

In [None]:
#| hide
from nbdev.showdoc import *

## Introduction

[The Python Standard Library](https://docs.python.org/3/library/index.html) documentation is very helpful for learning Python. So is [Solveit](https://solve.it.com/)! Solveit is a Jupyter notebook plus AI with superpowers, and learning programming with AI is both fun and productive. So I wanted to convert the HTML pages of the Python documentation into Solveit dialogs: small note and code messages under appropriate headings, which can be extracted from each page's table of contents.

How it works:

- First, we fetch the HTML of a Python documentation page.
- We turn it into `(msg_type, element)` tuples, where `msg_type` is `note` or `code` and `element` is a soup element.
- We turn each `element` into an appropriate Solveit message for the dialog.

The goal is to use `#` for the title, `##` for subheading, and `###` for each function definition from the docs.
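
The heading convention can be written down as a small lookup table. This is only an illustrative sketch: `HEADING_PREFIX` and `as_heading` are hypothetical names that are not part of dialogify, and the sketch ignores any extra formatting applied to signatures.

```python
# Hypothetical lookup: which markdown prefix each HTML tag maps to.
HEADING_PREFIX = {'h1': '#', 'h2': '##', 'dt': '###'}

def as_heading(tag, text):
    "Prefix `text` with the markdown heading marker for `tag`, if it has one"
    prefix = HEADING_PREFIX.get(tag)
    return f"{prefix} {text}" if prefix else text

print(as_heading('h1', 'random'))
print(as_heading('h2', 'Bookkeeping functions'))
print(as_heading('dt', 'random.seed(a=None, version=2)'))
```

Tags without an entry, such as `p`, pass through unchanged.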

In [None]:
#| export
from dialoghelper import *
from dialoghelper.capture import *
from fastcore.utils import *

from IPython.display import Markdown, display
from bs4 import BeautifulSoup, NavigableString
import httpx

from itertools import groupby as igroupby


First, we fetch the HTML of a documentation page and parse it into `soup`.

In [None]:
doc_url = 'https://docs.python.org/3/library/random.html'
doc_html = httpx.get(doc_url).text
doc_html[:600]

'<!DOCTYPE html>\n\n<html lang="en" data-content_root="../">\n  <head>\n    <meta charset="utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="viewport" content="width=device-width, initial-scale=1" />\n<meta property="og:title" content="random — Generate pseudo-random numbers" />\n<meta property="og:type" content="website" />\n<meta property="og:url" content="https://docs.python.org/3/library/random.html" />\n<meta property="og:site_name" content="Python documentation" />\n<meta property="og:description" content="Source code: Lib/random.py This module imple'

In [None]:
soup = BeautifulSoup(doc_html, 'html.parser')

## Some helpful utilities

Here are some utility functions for getting the main content, cleaning text, getting the title, and so on.

In [None]:
#| export
def get_main(soup):
    "Extract the main content section from Python docs soup"
    return soup.select_one('div.body > section')

In [None]:
ms = get_main(soup); str(ms)[:300]

'<section id="module-random">\n<span id="random-generate-pseudo-random-numbers"></span><h1><code class="xref py py-mod docutils literal notranslate"><span class="pre">random</span></code> — Generate pseudo-random numbers<a class="headerlink" href="#module-random" title="Link to this heading">¶</a></h1'

In [None]:
#| export
def clean_txt(el):
    "Clean element text by removing paragraph signs and extra whitespace"
    return el.get_text().replace('¶', '').strip()

In [None]:
#| export
def get_title(section):
    "Extract the h1 title from a section"
    if (h1 := section.find('h1')): return clean_txt(h1)

In [None]:
get_title(ms)

'random — Generate pseudo-random numbers'

Before converting the `soup` to markdown, we split the main content into its sections, as `(title, section)` tuples.

In [None]:
#| export
def get_sections(main):
    "Get all direct child sections as (title, section_element) tuples"
    return [(clean_txt(s.find('h2')), s) for s in main.find_all('section', recursive=False) if s.find('h2')]

In [None]:
len(get_sections(ms))

12
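
To see why only top-level sections are counted, here is a minimal sketch of the same filtering on a tiny hand-written fragment. It uses the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulSoup (an assumption made so the example has no third-party dependencies), and the fragment itself is made up. In ElementTree, `findall('section')` looks at direct children only, which plays the role of `recursive=False`, and `.//h2` searches descendants.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: one nested section and one section without an h2.
fragment = '''<section id="main"><h1>random</h1>
<section id="a"><h2>Bookkeeping functions</h2><section id="a1"><h3>Nested</h3></section></section>
<section id="b"><p>No heading here</p></section>
<section id="c"><h2>Functions for bytes</h2></section>
</section>'''

main = ET.fromstring(fragment)
# Direct child sections only, skipping any that have no h2 heading.
titles = [s.find('.//h2').text for s in main.findall('section') if s.find('.//h2') is not None]
print(titles)  # ['Bookkeeping functions', 'Functions for bytes']
```

The nested section `a1` is not listed on its own, and section `b` is dropped because it has no `h2`.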

We can grab the sections and pick out the bookkeeping section:

In [None]:
sts = get_sections(ms)
bk = sts[0][1]
str(bk)[:300]

'<section id="bookkeeping-functions">\n<h2>Bookkeeping functions<a class="headerlink" href="#bookkeeping-functions" title="Link to this heading">¶</a></h2>\n<dl class="py function">\n<dt class="sig sig-object py" id="random.seed">\n<span class="sig-prename descclassname"><span class="pre">random.</span><'

Let's preview the result to check that it looks right.

In [None]:
#| export
def preview_msgs(msgs):
    """Preview message tuples as rendered markdown"""
    for msg_type, content in msgs:
        display(Markdown(f"**[{msg_type}]**\n\n{content}"))

In [None]:
preview_msgs(get_sections(ms)[:2])

**[Bookkeeping functions]**

<section id="bookkeeping-functions">
<h2>Bookkeeping functions<a class="headerlink" href="#bookkeeping-functions" title="Link to this heading">¶</a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="random.seed">
<span class="sig-prename descclassname"><span class="pre">random.</span></span><span class="sig-name descname"><span class="pre">seed</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">a</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">version</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">2</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#random.seed" title="Link to this definition">¶</a></dt>
<dd><p>Initialize the random number generator.</p>
<p>If <em>a</em> is omitted or <code class="docutils literal notranslate"><span class="pre">None</span></code>, the current system time is used.  If
randomness sources are provided by the operating system, they are used
instead of the system time (see the <a class="reference internal" href="os.html#os.urandom" title="os.urandom"><code class="xref py py-func docutils literal notranslate"><span class="pre">os.urandom()</span></code></a> function for details
on availability).</p>
<p>If <em>a</em> is an int, its absolute value is used directly.</p>
<p>With version 2 (the default), a <a class="reference internal" href="stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a>, <a class="reference internal" href="stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a>, or <a class="reference internal" href="stdtypes.html#bytearray" title="bytearray"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytearray</span></code></a>
object gets converted to an <a class="reference internal" href="functions.html#int" title="int"><code class="xref py py-class docutils literal notranslate"><span class="pre">int</span></code></a> and all of its bits are used.</p>
<p>With version 1 (provided for reproducing random sequences from older versions
of Python), the algorithm for <a class="reference internal" href="stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a> and <a class="reference internal" href="stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a> generates a
narrower range of seeds.</p>
<div class="versionchanged">
<p><span class="versionmodified changed">Changed in version 3.2: </span>Moved to the version 2 scheme which uses all of the bits in a string seed.</p>
</div>
<div class="versionchanged">
<p><span class="versionmodified changed">Changed in version 3.11: </span>The <em>seed</em> must be one of the following types:
<code class="docutils literal notranslate"><span class="pre">None</span></code>, <a class="reference internal" href="functions.html#int" title="int"><code class="xref py py-class docutils literal notranslate"><span class="pre">int</span></code></a>, <a class="reference internal" href="functions.html#float" title="float"><code class="xref py py-class docutils literal notranslate"><span class="pre">float</span></code></a>, <a class="reference internal" href="stdtypes.html#str" title="str"><code class="xref py py-class docutils literal notranslate"><span class="pre">str</span></code></a>,
<a class="reference internal" href="stdtypes.html#bytes" title="bytes"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytes</span></code></a>, or <a class="reference internal" href="stdtypes.html#bytearray" title="bytearray"><code class="xref py py-class docutils literal notranslate"><span class="pre">bytearray</span></code></a>.</p>
</div>
</dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="random.getstate">
<span class="sig-prename descclassname"><span class="pre">random.</span></span><span class="sig-name descname"><span class="pre">getstate</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="headerlink" href="#random.getstate" title="Link to this definition">¶</a></dt>
<dd><p>Return an object capturing the current internal state of the generator.  This
object can be passed to <a class="reference internal" href="#random.setstate" title="random.setstate"><code class="xref py py-func docutils literal notranslate"><span class="pre">setstate()</span></code></a> to restore the state.</p>
</dd></dl>
<dl class="py function">
<dt class="sig sig-object py" id="random.setstate">
<span class="sig-prename descclassname"><span class="pre">random.</span></span><span class="sig-name descname"><span class="pre">setstate</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">state</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#random.setstate" title="Link to this definition">¶</a></dt>
<dd><p><em>state</em> should have been obtained from a previous call to <a class="reference internal" href="#random.getstate" title="random.getstate"><code class="xref py py-func docutils literal notranslate"><span class="pre">getstate()</span></code></a>, and
<a class="reference internal" href="#random.setstate" title="random.setstate"><code class="xref py py-func docutils literal notranslate"><span class="pre">setstate()</span></code></a> restores the internal state of the generator to what it was at
the time <a class="reference internal" href="#random.getstate" title="random.getstate"><code class="xref py py-func docutils literal notranslate"><span class="pre">getstate()</span></code></a> was called.</p>
</dd></dl>
</section>

**[Functions for bytes]**

<section id="functions-for-bytes">
<h2>Functions for bytes<a class="headerlink" href="#functions-for-bytes" title="Link to this heading">¶</a></h2>
<dl class="py function">
<dt class="sig sig-object py" id="random.randbytes">
<span class="sig-prename descclassname"><span class="pre">random.</span></span><span class="sig-name descname"><span class="pre">randbytes</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">n</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#random.randbytes" title="Link to this definition">¶</a></dt>
<dd><p>Generate <em>n</em> random bytes.</p>
<p>This method should not be used for generating security tokens.
Use <a class="reference internal" href="secrets.html#secrets.token_bytes" title="secrets.token_bytes"><code class="xref py py-func docutils literal notranslate"><span class="pre">secrets.token_bytes()</span></code></a> instead.</p>
<div class="versionadded">
<p><span class="versionmodified added">Added in version 3.9.</span></p>
</div>
</dd></dl>
</section>

`html_to_md` converts HTML to markdown, handling links, inline code, emphasis, and list items.

In [None]:
#| export
def html_to_md(el, in_link=False):
    """Recursively convert HTML element to markdown string"""
    if isinstance(el, NavigableString): return str(el)

    children_md = html_to_md_children(el, in_link=el.name=='a')

    match el.name:
        case 'a': return f"[{children_md}]({el.get('href', '')})"
        case 'code': return children_md if in_link else f"`{children_md}`"
        case 'em': return f"*{children_md}*"
        case 'strong': return f"**{children_md}**"
        case 'li': return f"- {children_md}"
        case _: return children_md

In [None]:
#| export
def html_to_md_children(el, in_link=False): return ''.join(html_to_md(child, in_link) for child in el.children)

In [None]:
print(html_to_md(bk))


Bookkeeping functions[¶](#bookkeeping-functions)


random.seed(*a=None*, *version=2*)[¶](#random.seed)
Initialize the random number generator.
If *a* is omitted or `None`, the current system time is used.  If
randomness sources are provided by the operating system, they are used
instead of the system time (see the [os.urandom()](os.html#os.urandom) function for details
on availability).
If *a* is an int, its absolute value is used directly.
With version 2 (the default), a [str](stdtypes.html#str), [bytes](stdtypes.html#bytes), or [bytearray](stdtypes.html#bytearray)
object gets converted to an [int](functions.html#int) and all of its bits are used.
With version 1 (provided for reproducing random sequences from older versions
of Python), the algorithm for [str](stdtypes.html#str) and [bytes](stdtypes.html#bytes) generates a
narrower range of seeds.

Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.


Changed in version 3.11: The *seed* m
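
The one subtle rule in `html_to_md` is that `code` directly inside a link is not wrapped in backticks, which is why `os.urandom()` appears as a plain link label above. The sketch below reproduces the tag rules on a small fragment. It is not the exported function: it uses `xml.etree.ElementTree` from the standard library instead of BeautifulSoup, so text is read from `.text` and `.tail` rather than from `NavigableString` children, and `et_to_md` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def et_to_md(el, in_link=False):
    "Convert an ElementTree element to markdown using the same tag rules as `html_to_md`"
    # Children of an <a> are told they are inside a link; each child's tail text follows it.
    inner = (el.text or '') + ''.join(et_to_md(c, el.tag == 'a') + (c.tail or '') for c in el)
    if el.tag == 'a': return f"[{inner}]({el.get('href', '')})"
    if el.tag == 'code': return inner if in_link else f"`{inner}`"
    if el.tag == 'em': return f"*{inner}*"
    if el.tag == 'strong': return f"**{inner}**"
    if el.tag == 'li': return f"- {inner}"
    return inner

p = ET.fromstring('<p>See <a href="os.html#os.urandom"><code>os.urandom()</code></a> and <code>None</code>, <em>a</em>.</p>')
print(et_to_md(p))
# See [os.urandom()](os.html#os.urandom) and `None`, *a*.
```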

## `soup` to `(msg_type, el)`

Solveit has four message types: `Code`, `Note`, `Prompt`, and `Raw`. To build dialogs we only need `note` and `code`. Once the `soup` is a list of `(msg_type, el)` tuples, converting each one into a Solveit message with markdown content is straightforward.

In [None]:
#| export
def has_cls(el, cls): return cls in ' '.join(el.get('class', []))
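
Note that `has_cls` does a substring test on the joined class string, not an exact class match. The sketch below shows the difference. Plain dicts stand in for soup elements (both support `.get('class', [])`), and the class lists are assumptions about what the Python docs markup looks like.

```python
def has_cls(el, cls): return cls in ' '.join(el.get('class', []))

# Plain dicts stand in for soup elements; the class lists are assumed examples.
code_div = {'class': ['highlight-python3', 'notranslate']}
title_p  = {'class': ['admonition-title']}
plain_p  = {}

print(has_cls(code_div, 'highlight'))    # True: substring of 'highlight-python3'
print(has_cls(title_p, 'admonition'))    # True: substring of 'admonition-title'
print(has_cls(plain_p, 'highlight'))     # False: no class attribute at all
print('highlight' in code_div['class'])  # False: an exact match would miss this div
```

The substring test lets one check cover class names that carry a suffix, at the cost of also matching unrelated classes that happen to contain the same text.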

`dt` gets its own message type because the Python docs use it for function and class signatures.

In [None]:
#| export
def get_msg_type(el):
    "Return `(msg_type, el)` for a recognised element, or `None` otherwise"
    match el.name:
        case 'h1': return ('note', el)
        case 'h2': return ('note', el)
        case 'div' if has_cls(el, 'highlight'): return ('code', el)
        case 'div' if has_cls(el, 'admonition'): return ('note', el)
        case 'p': return ('note', el)
        case 'ul': return ('note', el)
        case 'dt': return ('dt', el)

In [None]:
#| export
def collect_msgs(el):
    "Walk `el` depth-first and collect the outermost recognised elements as `(msg_type, el)` tuples"
    res = get_msg_type(el)
    if res: return [res]
    msgs = []
    for o in el.children:
        if not isinstance(o, NavigableString):
            msgs.extend(collect_msgs(o))
    return msgs
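
`collect_msgs` stops descending as soon as it reaches a recognised element, so a `p` nested inside a `ul` is not collected a second time. A minimal sketch of that traversal follows. It uses `xml.etree.ElementTree` as a stand-in for BeautifulSoup, a simplified tag set that leaves out the `div` class checks, and a hypothetical name `collect_tags`.

```python
import xml.etree.ElementTree as ET

RECOGNISED = {'h1', 'h2', 'p', 'ul', 'dt'}  # simplified: no div/class handling

def collect_tags(el):
    "Return the tags of the outermost recognised elements, in document order"
    if el.tag in RECOGNISED: return [el.tag]  # stop here, do not look inside
    return [t for child in el for t in collect_tags(child)]

tree = ET.fromstring('<dl><dt>f(x)</dt><dd><p>Intro</p><ul><li><p>item</p></li></ul></dd></dl>')
print(collect_tags(tree))  # ['dt', 'p', 'ul']
```

The `p` inside the list item does not appear in the result because the enclosing `ul` was already collected as a whole.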

In [None]:
#| export
def format_msg(msg_type, el):
    "Convert `el` to markdown and return it as a `(msg_type, content)` tuple"
    def r(ct): return (msg_type, ct)
    match el.name:
        case 'h1': return r(f"# {clean_txt(el)}")
        case 'h2': return r(f"## {clean_txt(el)}")
        case 'dt': return r(f"### `{clean_txt(el)}`")
        case 'div' if has_cls(el, 'admonition'):
            t = el.select_one('p.admonition-title')
            cts = '\n>\n> '.join([html_to_md(o) for o in t.find_next_siblings()])
            return r(f"> **{html_to_md(t)}**: {cts}")
        case _: return r(html_to_md(el))
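
The admonition branch builds a markdown blockquote: the title in bold, then each following paragraph, separated by an empty quoted line. Here is the string-building step on its own, as a sketch with a hypothetical helper `admonition_md` that takes text that has already been converted to markdown.

```python
def admonition_md(title, paragraphs):
    "Format a title and its paragraphs as a single markdown blockquote"
    body = '\n>\n> '.join(paragraphs)  # a bare '>' line keeps the paragraphs in one quote
    return f"> **{title}**: {body}"

print(admonition_md('Warning', ['Do not use this for security.', 'Use secrets instead.']))
# > **Warning**: Do not use this for security.
# >
# > Use secrets instead.
```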

Some functions and classes in the docs have multiple signatures. In that case, the `dt`s need to be merged into a single heading message.

In [None]:
#| export
def merge_dt(msgs):
    "Merge each run of consecutive `dt` messages into a single `note` message"
    res = []
    for t, grp in igroupby(msgs, first):
        if t == 'dt': res.append(('note', '\n'.join(o[1] for o in grp)))
        else: res.extend(grp)
    return res
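
The grouping step can be tried on plain tuples. The sketch below is a standard-library variant that uses `operator.itemgetter(0)` as the key in place of fastcore's `first`. The name `merge_dt_sketch` and the sample messages are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

def merge_dt_sketch(msgs):
    "Join each run of consecutive 'dt' messages into one 'note' message"
    res = []
    for t, grp in groupby(msgs, key=itemgetter(0)):  # groups only adjacent equal types
        if t == 'dt': res.append(('note', '\n'.join(ct for _, ct in grp)))
        else: res.extend(grp)
    return res

msgs = [('dt', '### `f(a)`'), ('dt', '### `f(a, b)`'), ('note', 'Does a thing.'),
        ('code', 'f(1)'), ('dt', '### `g()`')]
merged = merge_dt_sketch(msgs)
for m in merged: print(m)
```

Because `groupby` only groups adjacent items, the two `f` signatures become one message while the later `g()` signature stays separate.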

In [None]:
#| export
def format_msgs(el): return merge_dt([format_msg(t, e) for t, e in collect_msgs(el)])

Let's try it on the `bytearray` class from https://docs.python.org/3.12/library/functions.html.

In [None]:
bytearray_html = '''<dl class="py class" id="func-bytearray">
<dt class="sig sig-object py">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">bytearray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">source</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">b''</span></span></em><span class="sig-paren">)</span></dt>
<dt class="sig sig-object py">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">bytearray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">source</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">encoding</span></span></em><span class="sig-paren">)</span></dt>
<dt class="sig sig-object py">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">bytearray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">source</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">encoding</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">errors</span></span></em><span class="sig-paren">)</span></dt>
<dd><p>Return a new array of bytes.</p>
<p>The optional <em>source</em> parameter can be used to initialize the array:</p>
<ul class="simple">
<li><p>If it is a <em>string</em>, you must also give the <em>encoding</em>.</p></li>
<li><p>If it is an <em>integer</em>, the array will have that size.</p></li>
</ul>
<p>Without an argument, an array of size 0 is created.</p>
</dd></dl>'''

In [None]:
ba_soup = BeautifulSoup(bytearray_html, 'html.parser')
preview_msgs(collect_msgs(ba_soup.dl))

**[dt]**

<dt class="sig sig-object py">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">bytearray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">source</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">b''</span></span></em><span class="sig-paren">)</span></dt>

**[dt]**

<dt class="sig sig-object py">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">bytearray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">source</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">encoding</span></span></em><span class="sig-paren">)</span></dt>

**[dt]**

<dt class="sig sig-object py">
<em class="property"><span class="k"><span class="pre">class</span></span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">bytearray</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">source</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">encoding</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">errors</span></span></em><span class="sig-paren">)</span></dt>

**[note]**

<p>Return a new array of bytes.</p>

**[note]**

<p>The optional <em>source</em> parameter can be used to initialize the array:</p>

**[note]**

<ul class="simple">
<li><p>If it is a <em>string</em>, you must also give the <em>encoding</em>.</p></li>
<li><p>If it is an <em>integer</em>, the array will have that size.</p></li>
</ul>

**[note]**

<p>Without an argument, an array of size 0 is created.</p>

In [None]:
ba_msgs = format_msgs(ba_soup)
ba_msgs

[('note',
  "### `class bytearray(source=b'')`\n### `class bytearray(source, encoding)`\n### `class bytearray(source, encoding, errors)`"),
 ('note', 'Return a new array of bytes.'),
 ('note',
  'The optional *source* parameter can be used to initialize the array:'),
 ('note',
  '\n- If it is a *string*, you must also give the *encoding*.\n- If it is an *integer*, the array will have that size.\n'),
 ('note', 'Without an argument, an array of size 0 is created.')]

In [None]:
merge_dt(ba_msgs)

[('note',
  "### `class bytearray(source=b'')`\n### `class bytearray(source, encoding)`\n### `class bytearray(source, encoding, errors)`"),
 ('note', 'Return a new array of bytes.'),
 ('note',
  'The optional *source* parameter can be used to initialize the array:'),
 ('note',
  '\n- If it is a *string*, you must also give the *encoding*.\n- If it is an *integer*, the array will have that size.\n'),
 ('note', 'Without an argument, an array of size 0 is created.')]

In [None]:
preview_msgs(format_msgs(ba_soup))

**[note]**

### `class bytearray(source=b'')`
### `class bytearray(source, encoding)`
### `class bytearray(source, encoding, errors)`

**[note]**

Return a new array of bytes.

**[note]**

The optional *source* parameter can be used to initialize the array:

**[note]**


- If it is a *string*, you must also give the *encoding*.
- If it is an *integer*, the array will have that size.


**[note]**

Without an argument, an array of size 0 is created.

Looks good! We can use `add_msg` to create Solveit messages.

In [None]:
add_msg??


```python
def add_msg(
    content:str, # Content of the message (i.e the message prompt, code, or note text)
    placement:str='add_after', # Can be 'add_after', 'add_before', 'at_start', 'at_end'
    id:str=None, # id of message that placement is relative to (if None, uses current message; note: each add_msg updates "current" to the newly created message)
    msg_type: str='note', # Message type, can be 'code', 'note', or 'prompt'
    output:str='', # Prompt/code output; Code outputs must be .ipynb-compatible JSON array
    time_run: str | None = '', # When was message executed
    is_exported: int | None = 0, # Export message to a module?
    skipped: int | None = 0, # Hide message from prompt?
    i_collapsed: int | None = 0, # Collapse input?
    o_collapsed: int | None = 0, # Collapse output?
    heading_collapsed: int | None = 0, # Collapse heading section?
    pinned: int | None = 0, # Pin to context?
    dname:str='' # Dialog to get info for; defaults to current dialog
):
    """Add/update a message to the queue to show after code execution completes.
    If `dname` is None, the current dialog is used. If it is an open dialog, it will be updated interactively with real-time updates to the browser. If it is a closed dialog, it will be updated on disk. Dialog names must be paths relative to the solveit root directory (if starting with `/`) or relative to the current dialog (if not starting with `/`), and should *not* include the .ipynb extension."""
    _diff_dialog(placement not in ('at_start','at_end') and not id, "`id` or `placement='at_end'`/`placement='at_start'` must be provided when target dialog is different")
    if placement not in ('at_start','at_end') and not id: id = find_msg_id()
    res = call_endp(
        'add_relative_', dname, content=content, placement=placement, id=id, msg_type=msg_type, output=output,
        time_run=time_run, is_exported=is_exported, skipped=skipped, pinned=pinned,
        i_collapsed=i_collapsed, o_collapsed=o_collapsed, heading_collapsed=heading_collapsed)
    set_var('__msg_id', res)
    return res
```

**File:** `/usr/local/lib/python3.12/site-packages/dialoghelper/core.py`

In [None]:
#| export
def create_msgs(doc_tuples, dname='', **kwargs):
    """Create solveit messages from list of (msg_type, content) tuples"""
    # Forward `dname` so messages land in the target dialog, not only the placement choice
    for msg_type, ct in doc_tuples: add_msg(content=ct, msg_type=msg_type, placement='at_end' if dname else 'add_after', dname=dname, **kwargs)

In [None]:
# create_msgs(format_msgs(ms))
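
Since `add_msg` writes to a real dialog, it can help to do a dry run first. The sketch below is a hypothetical variant, `create_msgs_dry`, that takes a stand-in callable instead of calling `add_msg`, so the chosen placement can be inspected outside Solveit. It forwards `dname` to the stand-in so that the target dialog shows up in the log.

```python
def create_msgs_dry(doc_tuples, dname='', add=None, **kwargs):
    "Same loop as `create_msgs`, but calls `add` (a stand-in for `add_msg`) so nothing is written"
    for msg_type, ct in doc_tuples:
        add(content=ct, msg_type=msg_type, placement='at_end' if dname else 'add_after', dname=dname, **kwargs)

log = []
def record(**kw): log.append(kw)  # hypothetical recorder used in place of add_msg

create_msgs_dry([('note', '# random'), ('code', 'import random')], dname='dialogify/testing', add=record)
for call in log: print(call['msg_type'], call['placement'], call['dname'])
```

With a `dname`, every message is appended with `at_end`. Without one, each message is placed with `add_after`.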

Now we can make dialogs.

In [None]:
#| export
def mk_dialog(url, dname=''):
    """Fetch Python docs URL and create a solveit dialog from it"""
    if dname and not (p := Path(f'{dname}.ipynb')).exists(): p.write_json({"cells":[],"metadata":{},"nbformat":4,"nbformat_minor":5})
    html = httpx.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    main = get_main(soup)
    create_msgs(format_msgs(main), dname=dname)
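
The first line of `mk_dialog` creates an empty notebook file when the target dialog does not exist yet. Here is a standard-library sketch of that step, using `json` and `pathlib` instead of fastcore's `write_json`. The helper name `ensure_notebook` and its `root` parameter are hypothetical. Unlike the line above, it also creates missing parent folders, which is an added assumption about what is wanted.

```python
import json, tempfile
from pathlib import Path

# Minimal empty notebook skeleton, same fields as in `mk_dialog`.
EMPTY_NB = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

def ensure_notebook(dname, root='.'):
    "Create `<root>/<dname>.ipynb` as an empty notebook if it does not exist; return its path"
    p = Path(root) / f'{dname}.ipynb'
    if not p.exists():
        p.parent.mkdir(parents=True, exist_ok=True)  # assumption: create folders as needed
        p.write_text(json.dumps(EMPTY_NB))
    return p

with tempfile.TemporaryDirectory() as tmp:
    p = ensure_notebook('dialogify/testing', root=tmp)
    print(p.name, json.loads(p.read_text())['nbformat'])
```

An existing file is left untouched, so calling it again for the same dialog does not wipe earlier messages.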

Here are some examples that create Solveit dialogs:

In [None]:
# mk_dialog('https://docs.python.org/3.12/library/functions.html', dname='dialogify/testing')

In [None]:
# mk_dialog('https://docs.python.org/3.12/howto/regex.html#regex-howto', dname='dialogify/regex_howto')

In [None]:
# mk_dialog('https://docs.python.org/3.12/howto/regex.html#regex-howto')

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()