# Markdown to HTML Converter

Developed by Rui Gonçalves (A101759) on 2024-02-16

## Description

Convert a Markdown file to HTML, specifically:

- Bold
- Italic
- Links
- Images

Headings and horizontal rules were also implemented.

## Implementation

In [1]:
import re
from IPython.display import display, HTML

### Processing

Heading lines start with the `#` character, repeated up to six times (`######`); each additional `#` moves the heading one level deeper.

In [2]:
def convert_title(text):
    # One to six leading '#' give the heading level; longer runs are left as-is
    return re.sub(r'^(#{1,6})(?!#)(.*)', lambda t: f'<h{len(t[1])}>{t[2]}</h{len(t[1])}>', text)
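
As a quick check of the heading rule, the sketch below uses a hypothetical `heading_to_html` helper that is not part of the notebook. It assumes two extra behaviours that may or may not be wanted: whitespace around the title is trimmed, and a `#` run without a following space is not treated as a heading.

```python
import re


def heading_to_html(line):
    """Hypothetical helper: turn '# Title' .. '###### Title' into <h1> .. <h6>."""
    # Assumption: at least one space must separate the '#' run from the title
    match = re.fullmatch(r'(#{1,6})\s+(.*?)\s*', line)
    if match is None:
        return line  # not a heading (or more than six '#'): leave unchanged
    level = len(match[1])
    return f'<h{level}>{match[2]}</h{level}>'


for sample in ['# Lorem Ipsum', '###### Six', '####### Seven', 'plain text']:
    print(heading_to_html(sample))
```

Only the first two samples are rewritten; the other two come back unchanged.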

Horizontal rules are written as `___`, `---` or `***` on a line of their own, so a single `re.sub` is enough.

In [3]:
def convert_hr(text):
    # Group the alternatives so that ^ and $ apply to all three markers
    return re.sub(r'^(?:___|---|\*\*\*)$', '<hr>', text)
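
Because `|` has the lowest precedence in a regular expression, an anchor only binds to the alternative it sits next to unless the alternatives are grouped. The sketch below compares an ungrouped and a grouped pattern; the constant names are illustrative only.

```python
import re

# Parsed as (^___) | (---) | (\*\*\*$): '---' can match anywhere in the line
UNGROUPED = r'^___|---|\*\*\*$'
# The non-capturing group makes ^ and $ apply to every alternative
GROUPED = r'^(?:___|---|\*\*\*)$'

line = 'a --- b'
print(re.sub(UNGROUPED, '<hr>', line))  # a <hr> b
print(re.sub(GROUPED, '<hr>', line))    # a --- b
print(re.sub(GROUPED, '<hr>', '***'))   # <hr>
```

With the grouped form, only a line consisting solely of the marker is replaced.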

Links are written as `[text](href)`.

In [4]:
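def convert_link(text):
    # The negative lookbehind skips images (![alt](href)) without consuming a
    # character, so a link at the very start of a line is converted too
    return re.sub(r'(?<!!)\[([^\]]*)\]\(([^)]*)\)', lambda l: f'<a href="{l[2]}">{l[1]}</a>', text)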
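
A link must be told apart from an image, which differs only by a leading `!`. One way to do that, sketched here with a hypothetical `link_to_html` helper, is a negative lookbehind: it checks the preceding character without consuming it, so a link in the first column of a line still matches.

```python
import re

# (?<!!) means "not preceded by '!'"; it matches at the start of the string too
LINK = re.compile(r'(?<!!)\[([^\]]*)\]\(([^)]*)\)')


def link_to_html(text):
    """Hypothetical helper: convert [text](href) but leave ![alt](href) alone."""
    return LINK.sub(lambda m: f'<a href="{m[2]}">{m[1]}</a>', text)


print(link_to_html('[start](https://www.example.com) of line'))
print(link_to_html('an ![image](https://www.example.com/x.png) stays as is'))
```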

Images are similar to links, written as `![alt](href)`.

In [5]:
def convert_img(text):
    return re.sub(r'!\[([^\]]*)\]\(([^\)]*)\)', lambda l: f'<img alt="{l[1]}" src="{l[2]}" />', text)
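
The alternative text and the URL are placed directly inside HTML attributes, so a double quote in either would end the attribute early. The following sketch, built around a hypothetical `img_to_html` helper, shows one possible hardening step using the standard library's `html.escape`; whether escaping is wanted here is an assumption.

```python
import re
from html import escape


def img_to_html(text):
    """Hypothetical helper: convert ![alt](src), escaping attribute values."""
    def render(m):
        # quote=True also escapes double and single quotes
        alt = escape(m[1], quote=True)
        src = escape(m[2], quote=True)
        return f'<img alt="{alt}" src="{src}" />'

    return re.sub(r'!\[([^\]]*)\]\(([^)]*)\)', render, text)


print(img_to_html('![Image](https://via.placeholder.com/150)'))
print(img_to_html('![a "quoted" alt](x.png)'))
```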

Finally, bold and italic are written as `**bold**`, `__bold__`, `*italic*` and `_italic_`; they can be combined as `***bold and italic***` or `___bold and italic___`.

In [6]:
def open_close_build(starts):
    """Classify each delimiter as opening (False) or closing (True).

    A delimiter closes when it equals the one on top of the stack;
    otherwise it opens a new nesting level. For example:
        ['*', '**', '**', '*']  ->  [False, False, True, True]
    """
    final = []
    stack = []

    for part in starts:
        is_close = bool(stack) and stack[-1] == part
        final.append(is_close)

        # A closing delimiter pops its opener; an opening one is pushed
        if is_close:
            stack.pop()
        else:
            stack.append(part)

    return final


def write_tag(tag, close):
    close = '/' if close else ''
    return f'<{close}{tag}>'


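def convert_decoration(text):
    # Find every run of 1-3 asterisks or underscores, in order of appearance
    starts = re.findall(r'\*{1,3}|_{1,3}', text)
    final = open_close_build(starts)

    for m, c in zip(starts, final):
        # One character is italic, two is bold, three is both
        if len(m) == 1:
            tag = 'i'
        elif len(m) == 2:
            tag = 'b'
        else:
            tag = 'ib'

        # Closing tags are emitted in reverse order to keep the nesting valid
        if c:
            tag = tag[::-1]

        tag = ''.join(write_tag(t, c) for t in tag)

        # Earlier delimiters are already replaced, so the first match is this one
        text = text.replace(m, tag, 1)

    return text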
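
The stack-based pairing can also be done in a single `re.sub` pass. The sketch below is a hypothetical variant (`convert_decoration_single_pass` is not part of the notebook) that assumes the same rule: a delimiter closes when it equals the most recent unclosed one.

```python
import re

# Delimiter length -> tag names, in opening order
TAGS = {1: 'i', 2: 'b', 3: 'ib'}


def convert_decoration_single_pass(text):
    """Hypothetical variant: pair and replace delimiters in one pass."""
    stack = []

    def replace(match):
        delim = match.group(0)
        names = TAGS[len(delim)]
        if stack and stack[-1] == delim:
            stack.pop()
            # Close in reverse order so the nesting stays valid
            return ''.join(f'</{n}>' for n in reversed(names))
        stack.append(delim)
        return ''.join(f'<{n}>' for n in names)

    return re.sub(r'\*{1,3}|_{1,3}', replace, text)


print(convert_decoration_single_pass('Nullam quis **_ultricies_** nisi.'))
# Nullam quis <b><i>ultricies</i></b> nisi.
```

Like the two-step version, this treats every `*` or `_` as a delimiter, so underscores inside URLs or identifiers would also be rewritten.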

The conversion is applied line by line.

In [7]:
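def text2md(text):
    """Convert Markdown text to HTML, one line at a time."""
    def convert_line(line):
        # An empty line becomes an explicit line break
        if not line:
            line = '<br>'

        # Rules run before decoration so that '***' is not read as emphasis
        line = convert_title(line)
        line = convert_hr(line)
        line = convert_link(line)
        line = convert_img(line)
        line = convert_decoration(line)

        return line

    lines = text.splitlines()
    lines = [convert_line(line) for line in lines]

    return ''.join(lines)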
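
The per-line approach amounts to an ordered pipeline of small converters. As a rough illustration, the sketch below chains two simplified converters with `functools.reduce`; the names `to_heading`, `to_rule`, `PIPELINE` and `convert_lines` are hypothetical, and joining with a newline (rather than an empty string) is an assumption made to keep the output readable.

```python
import re
from functools import reduce


def to_heading(line):
    return re.sub(r'^(#{1,6})\s+(.*)',
                  lambda m: f'<h{len(m[1])}>{m[2]}</h{len(m[1])}>', line)


def to_rule(line):
    return re.sub(r'^(?:___|---|\*\*\*)$', '<hr>', line)


# Order matters: each step receives the output of the previous one
PIPELINE = [to_heading, to_rule]


def convert_lines(text):
    out = []
    for line in text.splitlines():
        if not line:
            out.append('<br>')  # blank line becomes a line break
        else:
            out.append(reduce(lambda acc, step: step(acc), PIPELINE, line))
    return '\n'.join(out)


print(convert_lines('# A\n\n---'))
```

Adding a new rule then only requires appending a function to the list at the right position.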

### Example

In [8]:
example_text = """
# Lorem Ipsum

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed **mauris** velit, *fringilla* id tincidunt ut, [elementum](https://www.example.com) vitae magna.

## Nullam Quis

Nullam quis **_ultricies_** nisi. Proin vestibulum sapien et velit viverra, vitae vestibulum metus tincidunt. [Click here](https://www.example.com) for more information.

### Fusce In

Fusce in arcu nec turpis vehicula fringilla. Curabitur [malesuada](https://www.example.com) orci vel metus laoreet, vel convallis tortor suscipit.

#### Ut Eros

Ut eros felis, consectetur eget [libero](https://www.example.com) nec, aliquam ullamcorper libero. Sed vitae enim eu ligula auctor volutpat.

##### Aliquam Erat

Aliquam erat volutpat. Sed [imperdiet](https://www.example.com) dapibus odio, a cursus mauris consequat in. 

###### Suspendisse Tristique

Suspendisse tristique, ipsum vel efficitur [sodales](https://www.example.com), elit ex fringilla arcu, vel rhoncus ligula tortor nec libero.

---

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum [ullamcorper](https://www.example.com) dignissim fringilla. ![Image](https://via.placeholder.com/150)

---

*Lorem ipsum* dolor sit amet, **consectetur *adipiscing* elit**. Aenean ut tortor eu [nibh](https://www.example.com) convallis luctus. **Pellentesque habitant** morbi tristique senectus et netus et malesuada fames ac turpis egestas.

---

In hac habitasse platea dictumst. [**Nunc**](https://www.example.com) scelerisque, eros sit amet varius volutpat, quam justo euismod metus, eu hendrerit mi tortor nec est.
"""

In [9]:
t = text2md(example_text)
print(t)

<br><h1> Lorem Ipsum</h1><br>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed <b>mauris</b> velit, <i>fringilla</i> id tincidunt ut, <a href="https://www.example.com">elementum</a> vitae magna.<br><h2> Nullam Quis</h2><br>Nullam quis <b><i>ultricies</i></b> nisi. Proin vestibulum sapien et velit viverra, vitae vestibulum metus tincidunt. <a href="https://www.example.com">Click here</a> for more information.<br><h3> Fusce In</h3><br>Fusce in arcu nec turpis vehicula fringilla. Curabitur <a href="https://www.example.com">malesuada</a> orci vel metus laoreet, vel convallis tortor suscipit.<br><h4> Ut Eros</h4><br>Ut eros felis, consectetur eget <a href="https://www.example.com">libero</a> nec, aliquam ullamcorper libero. Sed vitae enim eu ligula auctor volutpat.<br><h5> Aliquam Erat</h5><br>Aliquam erat volutpat. Sed <a href="https://www.example.com">imperdiet</a> dapibus odio, a cursus mauris consequat in. <br><h6> Suspendisse Tristique</h6><br>Suspendisse tristique, ipsum ve

In [10]:
display(HTML(t))