New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to Insert HTML into Docx? #352
Comments
I had a similar issue however our HTML is way more complex and I couldn't find any direct method, so we ended up translating HTML fragments into a run map and then something like: |
Wow thank you for the reply! Any way you could provide the link for that documentation? |
Have a look at docx.text.run import re
from docx import Document
test_items = [
"Trailing <i>tag</i>",
"<i>Leading</i> tag",
"This <i>time a tag</i> is in the middle",
]
class HTMLHelper(object):
""" Translates some html into word runs. """
def __init__(self):
self.get_tags = re.compile("(<[a-z,A-Z]+>)(.*?)(</[a-z,A-Z]+>)")
def html_to_run_map(self, html_fragment):
""" breakes an html fragment into a run map """
ptr = 0
run_map = []
for match in self.get_tags.finditer(html_fragment):
if match.start() > ptr:
text = html_fragment[ptr:match.start()]
if len(text) > 0:
run_map.append((text, "plain_text"))
run_map.append((match.group(2), match.group(1)))
ptr = match.end()
if ptr < len(html_fragment) - 1:
run_map.append((html_fragment[ptr:], "plain_text"))
return run_map
def insert_runs_from_html_map(self, paragraph, run_map):
""" inserts some runs into a paragraph object. """
for run_item in run_map:
run = paragraph.add_run(run_item[0])
if run_item[1] == "<i>":
run.italic = True
doc = Document()
html_helper = HTMLHelper()
for test_item in test_items:
run_map = html_helper.html_to_run_map(test_item)
print "------------------------------\nTest item:", test_item
print "\nRun map: ", run_map
par = doc.add_paragraph()
print "\n XML before:\n", par._element.xml
html_helper.insert_runs_from_html_map(par, run_map)
print "\n XML after:\n", par._element.xml
doc.save("test_run_mapping.docx") hope it helps |
Thank you so much for spending time to do that! It looks great. I am going to try to implement it now. |
I am getting a type error while trying to make the run fragments. if match.start > ptr: Any Ideas on how to fix this? |
I would expect it to be sorted by now, but for the sake of consistency: |
@electron378 |
I prepare something like this... This is enough for me..
|
Hey, @realsby can you please share full code of python-docx html tags text parser link? |
Is it possible to insert HTML into a Document using python-docx with styling applied?
The only thing I need to work are italics.
For example how to insert "Today is <i>Saturday</i>" with Saturday actually being inserted with italics?
Thanks!
The text was updated successfully, but these errors were encountered: