python-cf-html

Note

This package is currently in the development stage.

A Python package for parsing and generating data in the CF_HTML clipboard format. Windows applications use CF_HTML to store HTML content on the clipboard together with metadata such as fragment boundaries and selection information.

Installation

This package requires Python 3.10 or higher.

Note: This package is not yet available on PyPI. To use it, install directly from GitHub:

pip install git+https://github.com/privet-kitty/python-cf-html.git

Quick Start

Parsing CF_HTML Data

from cf_html import CfHtml

cf_html_str = (
    "Version:1.0\r\n"
    "StartHTML:0000000105\r\n"
    "EndHTML:0000000197\r\n"
    "StartFragment:0000000141\r\n"
    "EndFragment:0000000161\r\n"
    "<html>\r\n"
    "<body>\r\n"
    "<!--StartFragment--><p>Hello, World!</p><!--EndFragment-->\r\n"
    "</body>\r\n"
    "</html>"
)
cf_html = CfHtml.loads(cf_html_str)

# Access the fragment content
fragment = cf_html.fragment
print(fragment)  # <p>Hello, World!</p>
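
For readers curious what the offsets mean at the byte level, here is a minimal standard-library sketch that reads the header offsets and slices the fragment out by hand. It is not this package's implementation; the helper name `read_offsets` and the pattern `OFFSET_LINE` are made up for illustration.

```python
import re

# Same CF_HTML payload as in the example above, encoded to bytes because
# the header offsets count bytes, not characters.
data = (
    "Version:1.0\r\n"
    "StartHTML:0000000105\r\n"
    "EndHTML:0000000197\r\n"
    "StartFragment:0000000141\r\n"
    "EndFragment:0000000161\r\n"
    "<html>\r\n"
    "<body>\r\n"
    "<!--StartFragment--><p>Hello, World!</p><!--EndFragment-->\r\n"
    "</body>\r\n"
    "</html>"
).encode("utf-8")

# One header line of the form "Keyword:digits" (illustrative pattern).
OFFSET_LINE = re.compile(
    rb"(StartHTML|EndHTML|StartFragment|EndFragment"
    rb"|StartSelection|EndSelection):(-?\d+)"
)


def read_offsets(payload: bytes) -> dict[str, int]:
    """Hypothetical helper: collect the offset lines of the description header."""
    offsets: dict[str, int] = {}
    for line in payload.splitlines():
        match = OFFSET_LINE.fullmatch(line)
        if match:
            key = match.group(1).decode("ascii")
            offsets[key] = int(match.group(2).decode("ascii"))
        elif not line.startswith(b"Version:"):
            break  # first non-header line: the HTML context begins here
    return offsets


offsets = read_offsets(data)
# Slice with the inclusive start and exclusive end taken from the header.
fragment = data[offsets["StartFragment"]:offsets["EndFragment"]].decode("utf-8")
print(fragment)  # <p>Hello, World!</p>
```

Slicing the bytes rather than the decoded string matters as soon as the HTML contains non-ASCII characters, because byte and character positions then diverge.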

Generating CF_HTML from Context

from cf_html import CfHtml

html_context = (
    "<html>\r\n"
    "<body>\r\n"
    "<!--StartFragment--><p>Hello, World!</p><!--EndFragment-->\r\n"
    "</body>\r\n"
    "</html>"
)

cf_html = CfHtml.load_contexts(html_context)
print(str(cf_html))  # This outputs the following CF_HTML
# Version:1.0
# StartHTML:0000000105
# EndHTML:0000000197
# StartFragment:0000000141
# EndFragment:0000000161
# <html>
# <body>
# <!--StartFragment--><p>Hello, World!</p><!--EndFragment-->
# </body>
# </html>
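
The offsets in this output can be reproduced with the standard library alone. The following is a rough sketch of the arithmetic, assuming a fixed five-line header with 10-digit zero-padded offsets and UTF-8 encoding; `build_cf_html` and `HEADER_TEMPLATE` are illustrative names, not part of this package.

```python
HEADER_TEMPLATE = (
    "Version:1.0\r\n"
    "StartHTML:{:010d}\r\n"
    "EndHTML:{:010d}\r\n"
    "StartFragment:{:010d}\r\n"
    "EndFragment:{:010d}\r\n"
)
START_MARKER = b"<!--StartFragment-->"
END_MARKER = b"<!--EndFragment-->"


def build_cf_html(context: str) -> bytes:
    """Hypothetical helper: prepend a CF_HTML header to an HTML context."""
    body = context.encode("utf-8")
    # Every offset is padded to 10 digits, so the header length is known
    # before the real offsets are filled in.
    header_len = len(HEADER_TEMPLATE.format(0, 0, 0, 0).encode("ascii"))
    # The fragment starts right after the start marker and ends right
    # before the end marker (markers excluded).
    frag_start = body.index(START_MARKER) + len(START_MARKER)
    frag_end = body.index(END_MARKER, frag_start)
    header = HEADER_TEMPLATE.format(
        header_len,               # StartHTML
        header_len + len(body),   # EndHTML
        header_len + frag_start,  # StartFragment
        header_len + frag_end,    # EndFragment
    )
    return header.encode("ascii") + body


html_context = (
    "<html>\r\n"
    "<body>\r\n"
    "<!--StartFragment--><p>Hello, World!</p><!--EndFragment-->\r\n"
    "</body>\r\n"
    "</html>"
)
result = build_cf_html(html_context)
print(result.decode("utf-8"))
```

Padding the offsets to a fixed width is what lets the header length be computed before the offsets themselves are known.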

Notes

Behavior of Fragment Boundaries

In the CF_HTML data actually produced by Microsoft applications such as Microsoft Teams and Microsoft Edge, the range given by StartFragment and EndFragment does not include the <!--StartFragment--> and <!--EndFragment--> comment markers themselves; it covers only the HTML content between them. This package follows this real-world behavior.

However, the official HTML Clipboard Format specification states that StartFragment stores the "offset (in bytes) from the beginning of the clipboard to the start of the fragment" (emphasis added). According to the following BNF syntax provided in the specification, "fragment" seems to refer to content that includes both the <!--StartFragment--> and <!--EndFragment--> comment markers:

<cf-html>                ::= <description-header> <context>
<context>                ::= [<preceding-context>] <fragment> [<trailing-context>]
<description-header>     ::= "Version:" <version> <br> ( <header-offset-keyword> ":" <header-offset-value> <br> )*
<header-offset-keyword>  ::= "StartHTML" | "EndHTML" | "StartFragment" | "EndFragment" | "StartSelection" | "EndSelection"
<header-offset-value>    ::= { Base 10 (decimal) integer string with optional *multiple* leading zero digits (see "Offset syntax" below) }
<version>                ::= "0.9" | "1.0"
<fragment>               ::= <fragment-start-comment> <fragment-text> <fragment-end-comment>
<fragment-start-comment> ::= "<!--StartFragment -->"
<fragment-end-comment>   ::= "<!--EndFragment -->"
<preceding-context>      ::= { Arbitrary HTML }
<trailing-context>       ::= { Arbitrary HTML }
<fragment-text>          ::= { Arbitrary HTML }
<br>                     ::= "\r" | "\n" | "\r\n"
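
To make the difference between the two readings concrete, the sketch below computes both candidate offset pairs for the Quick Start payload: one that excludes the comment markers (the real-world behavior this package follows) and one that includes them (the literal reading of the BNF). The function name `fragment_offsets` is hypothetical, and the markers are written without the space that appears in the BNF, matching the examples in this README.

```python
START_MARKER = b"<!--StartFragment-->"
END_MARKER = b"<!--EndFragment-->"

payload = (
    "Version:1.0\r\n"
    "StartHTML:0000000105\r\n"
    "EndHTML:0000000197\r\n"
    "StartFragment:0000000141\r\n"
    "EndFragment:0000000161\r\n"
    "<html>\r\n"
    "<body>\r\n"
    "<!--StartFragment--><p>Hello, World!</p><!--EndFragment-->\r\n"
    "</body>\r\n"
    "</html>"
).encode("utf-8")


def fragment_offsets(data: bytes, include_markers: bool) -> tuple[int, int]:
    """Hypothetical helper: (start, end) byte offsets under either reading."""
    start_marker_at = data.index(START_MARKER)
    end_marker_at = data.index(END_MARKER, start_marker_at)
    if include_markers:
        # Literal BNF reading: the fragment spans both comment markers.
        return start_marker_at, end_marker_at + len(END_MARKER)
    # Observed behavior: only the content between the markers.
    return start_marker_at + len(START_MARKER), end_marker_at


exclusive = fragment_offsets(payload, include_markers=False)
inclusive = fragment_offsets(payload, include_markers=True)
print(exclusive)  # (141, 161)
print(inclusive)  # (121, 179)
```

Only the marker-exclusive pair matches the StartFragment and EndFragment values in the payload's own header.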

The following example demonstrates this package's behavior:

Version:1.0
StartHTML:0000000105
EndHTML:0000000193
StartFragment:0000000139  ← Points to (inclusive) start of "<p>Hello, World!</p>"
EndFragment:0000000159    ← Points to (exclusive) end of "<p>Hello, World!</p>"
<html>
<body>
<!--StartFragment--><p>Hello, World!</p><!--EndFragment-->
</body>
</html>
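
The offsets in this example (139, 159, 193) differ from those in the Quick Start (141, 161, 197) even though the HTML looks the same. Because offsets count bytes, the difference is consistent with the HTML lines ending in LF here and in CRLF in the Quick Start, with a 105-byte CRLF-terminated header in both cases. The sketch below checks that arithmetic; `offsets_for` is an illustrative name, and the line-ending explanation is an inference from the numbers rather than something the example states.

```python
HEADER_LEN = 105  # byte length of the five CRLF-terminated header lines

HTML_LINES = [
    "<html>",
    "<body>",
    "<!--StartFragment--><p>Hello, World!</p><!--EndFragment-->",
    "</body>",
    "</html>",
]
START_MARKER = b"<!--StartFragment-->"
END_MARKER = b"<!--EndFragment-->"


def offsets_for(newline: str) -> tuple[int, int, int]:
    """Hypothetical helper: (StartFragment, EndFragment, EndHTML) for a line ending."""
    body = newline.join(HTML_LINES).encode("utf-8")
    start = body.index(START_MARKER) + len(START_MARKER)
    end = body.index(END_MARKER, start)
    return HEADER_LEN + start, HEADER_LEN + end, HEADER_LEN + len(body)


lf_offsets = offsets_for("\n")
crlf_offsets = offsets_for("\r\n")
print(lf_offsets)    # (139, 159, 193)
print(crlf_offsets)  # (141, 161, 197)
```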

Copyright

Copyright (c) 2025 Hugo Sansaqua.
