## Testing how well various libraries convert Markdown to HTML

## `markdown` library in Python

In [1]:
import markdown

First test, with a basic string.

In [2]:
text = "# Hello world\n\nThis is an *italics*. This is a **bold**.\n\n\t-This is a tab."

In [5]:
html_output = markdown.markdown(text=text)
print(html_output)

<h1>Hello world</h1>
<p>This is an <em>italics</em>. This is a <strong>bold</strong>.</p>
<pre><code>-This is a tab.
</code></pre>


Now, load an external markdown document and see whether it is parsed correctly.

In [6]:
with open('./sample.md', 'r') as f:
    markdown_doc = f.read()
    
markdown_doc

"# This is a header\n\n## This is a subheader\n\nHere is some normal text. I'll also include *italics* and **bold**.\n\nHow about unordered lists:\n\n- Item 1\n- Item 2\n- Item 3\n\nAnd ordered lists:\n\n1. Ordered Item 1\n2. Ordered Item 2\n3. Ordered Item 3\n\nNow a table:\n\n| Column 1 | Column 2 | Column 3 |\n|----------|----------|----------|\n| Row 1    | Row 2    | Row 3    |\n| Row 4    | Row 5    | Row 6    | \n\nFinally, a picture:\n\n![example picture](./sample_pic.png)"

In [7]:
html_doc = markdown.markdown(markdown_doc)

print(html_doc)

<h1>This is a header</h1>
<h2>This is a subheader</h2>
<p>Here is some normal text. I'll also include <em>italics</em> and <strong>bold</strong>.</p>
<p>How about unordered lists:</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
<p>And ordered lists:</p>
<ol>
<li>Ordered Item 1</li>
<li>Ordered Item 2</li>
<li>Ordered Item 3</li>
</ol>
<p>Now a table:</p>
<p>| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| Row 1    | Row 2    | Row 3    |
| Row 4    | Row 5    | Row 6    | </p>
<p>Finally, a picture:</p>
<p><img alt="example picture" src="./sample_pic.png" /></p>


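`markdown` handles most of the document but does not convert the table: with no extensions enabled, the pipe-table syntax passes through as an ordinary paragraph. The picture comes out as an `<img>` tag wrapped in a `<p>`, which looked odd at first but is the usual Markdown treatment of an image that sits on its own line.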
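The table result is probably down to configuration rather than a limitation of the library. Python-Markdown ships a `tables` extension that is off by default. The sketch below compares the default call with one that enables it. The `table_md` string is a stand-in copied from the table in `sample.md`, not the full file, and the variable names are only for illustration.

```python
import markdown  # Python-Markdown, the same package imported earlier

# Stand-in for the table portion of sample.md (assumed, copied from the output above).
table_md = (
    "| Column 1 | Column 2 | Column 3 |\n"
    "|----------|----------|----------|\n"
    "| Row 1    | Row 2    | Row 3    |\n"
    "| Row 4    | Row 5    | Row 6    |\n"
)

# Default call: pipe tables are not core Markdown syntax, so this stays a <p>.
plain_html = markdown.markdown(table_md)

# Same text with the bundled "tables" extension enabled.
table_html = markdown.markdown(table_md, extensions=["tables"])

print(table_html)
```

If this behaves the same way on the full document, passing `extensions=["tables"]` to the earlier `markdown.markdown(markdown_doc)` call should be enough to get a real `<table>` element.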

## `mrkdwn_analysis` library

In [8]:
from mrkdwn_analysis import MarkdownAnalyzer

In [13]:
analyzer = MarkdownAnalyzer(file_path='./sample.md')

headers = analyzer.identify_headers()
paragraphs = analyzer.identify_paragraphs()
links = analyzer.identify_links()
tables = analyzer.identify_tables()

In [10]:
print(headers)

{'Header': [{'line': 1, 'level': 1, 'text': 'This is a header'}, {'line': 3, 'level': 2, 'text': 'This is a subheader'}]}


In [11]:
print(paragraphs)

{'Paragraph': ["Here is some normal text. I'll also include *italics*,  **bold**, and a [link](https://www.google.com)", 'How about unordered lists:']}


In [12]:
print(links)

{'Text link': [{'line': 5, 'text': 'link', 'url': 'https://www.google.com'}]}


In [14]:
print(tables)

{}


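This one is interesting because it returns structured data (headers, paragraphs, links) rather than HTML, but `identify_tables()` returned an empty dict, so it did not capture the table. The paragraph output also lists only the first two paragraphs of the document.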
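To check that the table in the sample is detectable at all, here is a small standard-library sketch that looks for pipe tables directly in the raw text. It is not how `mrkdwn_analysis` works internally. `find_pipe_tables`, `split_row`, and the `sample` string are hypothetical names made up for this check, and the pattern only covers the simple header / separator / rows layout used in `sample.md`.

```python
import re

# A separator row such as |---|:---:|---| sits directly below the table header.
SEPARATOR = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*$")


def split_row(line):
    """Split a pipe-delimited row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def find_pipe_tables(text):
    """Return a list of tables as {'header': [...], 'rows': [[...], ...]}."""
    lines = text.splitlines()
    tables = []
    i = 0
    while i + 1 < len(lines):
        is_header = "|" in lines[i]
        is_separator = "|" in lines[i + 1] and SEPARATOR.match(lines[i + 1])
        if is_header and is_separator:
            header = split_row(lines[i])
            rows = []
            j = i + 2
            # Consume body rows until the first line without a pipe.
            while j < len(lines) and "|" in lines[j]:
                rows.append(split_row(lines[j]))
                j += 1
            tables.append({"header": header, "rows": rows})
            i = j
        else:
            i += 1
    return tables


# Assumed excerpt of sample.md, including the trailing space after the last row.
sample = (
    "Now a table:\n\n"
    "| Column 1 | Column 2 | Column 3 |\n"
    "|----------|----------|----------|\n"
    "| Row 1    | Row 2    | Row 3    |\n"
    "| Row 4    | Row 5    | Row 6    | \n\n"
    "Finally, a picture:\n"
)

tables = find_pipe_tables(sample)
print(tables)
```

A non-empty result here would suggest the table syntax in the file is fine and the empty `tables` dict comes from how the library was called or how it matches tables, which is worth checking against its documentation.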