Lightweight Markdown Parser with GUI and Live Preview

Overview

This project is a lightweight Markdown tokenizer, parser, and renderer built in Python. It converts raw Markdown (.md) input into structured tokens, parses them into HTML, and displays a real-time preview of the rendered content.

Initially designed as a command-line tool for educational purposes—to demonstrate lexical analysis and parsing similar to that used in compilers—it has since been expanded with a graphical user interface (GUI) using tkinter, and a live HTML preview window powered by pywebview.

Image Preview

How It Works

Markdown Input – Users type or paste Markdown into the GUI.

Tokenization – The MakeToken module scans the Markdown and generates tokens representing Markdown elements.

Parsing – The DriverParser processes these tokens into corresponding HTML.

Rendering – The HTML is styled using a custom theme (Theme) and rendered in a pywebview window.

Live Feedback – Every time the user clicks "Convert & Render", the preview updates instantly.

How tokenization works

Line-based Tokenization:

Detects headers (lines starting with #)
Detects unordered list items (lines starting with -)
Detects ordered list items (lines starting with 1. ) numbering is autometically handled
Detects code block (line starting with ```, and ending with ```)
Defaults to plain text if no special formatting is found

Block-Level Token Types

Block Type	Syntax Pattern	Tokenizer Used
Header	`#`, `##`, etc.	`HeaderTokenizer`
Unordered List	`- item` or `-- item`	`ListTokenizer`
Ordered List	`1. item`, `2. item`, etc.	`ListTokenizer`
Code Block	Between triple backticks ```	`CodeTokenizer`
Plain Text	Any other line	`TextTokenizer`

Will produce:

A header token with inline tokens
A text token with multiple inline tokens
A list item (unordered)
A list item (ordered)
A code block token with the Python function as content

After block-level parsing, the value of each token is further scanned for inline formatting using InlineTokenizer

Inline Tokenization:

The InlineTokenizer parses inline formatting in a given string and converts recognized styles into structured tokens. Supported formats include: Supported Syntax:

Format Type	Syntax Example	Token Type
Bold	`bold` or `__bold__`	`InlineType.BOLD`
Italic	`italic` or `_italic_`	`InlineType.ITALIC`
`Code`	`inline code`	`InlineType.CODE`
`Code`	spaced code	`InlineType.CODE`

Example Input:

This is **bold**, *italic*, and `code`.

Output Tokens:

   [
             InlineToken(InlineType.TEXT, "This is "),
             InlineToken(InlineType.BOLD, "bold"),
             InlineToken(InlineType.TEXT, ", "),
             InlineToken(InlineType.ITALIC, "italic"),
             InlineToken(InlineType.TEXT, ", and "),
             InlineToken(InlineType.CODE, "code"),
             InlineToken(InlineType.TEXT, ".")
      ]

Token Representation:

Each token has a type and value
Complex lines may yield nested tokens (e.g., header + inline bold text)

Tokenization Rules

The following rules govern how Markdown elements are tokenized:

#, ##, ... ###### → Header tokens (levels 1 to 6)
- at the beginning of a line → List item token (unordered)
1. at the beginning of a line → List item token (ordered)
**text** → Bold token
__text__ → Bold token
_text_ → Italic token
*text* → Italic token
`code` → Inline code token
```code``` → Block code token
Default → Plain text token

inline markdown parser rules

Markdown	Token Type	InlineType
`italic`	Inline	`ITALIC`
`_italic_`	Inline	`ITALIC`
`bold`	Inline	`BOLD`
`__bold__`	Inline	`BOLD`
`code`	Inline	`CODE`
Regular text	Inline	`TEXT`

For each line:

Check if it’s a header (#)
Check if it's an unordered list (-)
Check if it's an ordered list (1., 2., etc.)
Check if it's a code block (```)
Else: treat as normal text

For all:

Tokenize block-level
Then tokenize inline
Append either: (block_token, [inline_tokens]) or block_token (if no inline)
convert to html contents using token types

Topics Covered & Lessons Learned

This project covers a variety of topics in both programming and system design:

Programming Concepts

Object-Oriented Programming (OOP)
File I/O
Regular Expressions (Regex)
Python modules and packages
Test-driven development (TDD) structure

System Design Concepts

Tokenization & Lexical Analysis
Parser Architecture
Modularity and Separation of Concerns
Reusability and Extensibility in Software Design

This makes the project a strong starting point for those interested in building interpreters, compilers, or static analyzers.

How to Run

Prerequisites

Make sure you have Python 3 installed. You’ll also need the following Python packages:

pip install pywebview

| Note: tkinter comes pre-installed with most Python distributions. If not, you may need to install it manually depending on your OS.

Running the App

Clone the Repository on your computer:
```
git clone https://github.com/ethanux/Python-Markdown-Parser.git
cd Python-Markdown-Parser
```
| Note:if not using command line please change directories to the main/root directory of the project
Run the Main File:

on your teminlal run the following command and make sure u have python3 or heigher installed
```
python main.py
```
| Note : if you dont know ho to use the terminal just run the main.py file but double clicking on it
This will:

Launch the GUI for Markdown input
Open a new window for HTML preview

Test It Out

Try entering this example into the input box:

# Hello World

This is a paragraph with **bold**, *italic*, and `inline code`.

## List

- Item 1
- Item 2

1. First
2. Second

Then click “Convert & Render” to see the output in real time!

License

This project is open-source and available under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
config		config
core		core
html		html
parsers		parsers
render		render
tests		tests
tokenizer		tokenizer
utils		utils
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Lightweight Markdown Parser with GUI and Live Preview

Overview

Initially designed as a command-line tool for educational purposes—to demonstrate lexical analysis and parsing similar to that used in compilers—it has since been expanded with a graphical user interface (GUI) using tkinter, and a live HTML preview window powered by pywebview.

Image Preview

How It Works

How tokenization works

Line-based Tokenization:

Block-Level Token Types

Will produce:

Inline Tokenization:

Example Input:

Token Representation:

Tokenization Rules

inline markdown parser rules

For each line:

For all:

Topics Covered & Lessons Learned

Programming Concepts

System Design Concepts

How to Run

Prerequisites

Running the App

Test It Out

License

About

Uh oh!

Releases

Packages

Languages

ethanux/Python-Markdown-Parser

Folders and files

Latest commit

History

Repository files navigation

Lightweight Markdown Parser with GUI and Live Preview

Overview

Initially designed as a command-line tool for educational purposes—to demonstrate lexical analysis and parsing similar to that used in compilers—it has since been expanded with a graphical user interface (GUI) using tkinter, and a live HTML preview window powered by pywebview.

Image Preview

How It Works

How tokenization works

Line-based Tokenization:

Block-Level Token Types

Will produce:

Inline Tokenization:

Example Input:

Token Representation:

Tokenization Rules

inline markdown parser rules

For each line:

For all:

Topics Covered & Lessons Learned

Programming Concepts

System Design Concepts

How to Run

Prerequisites

Running the App

Test It Out

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages