Changing Text without editing the formatting #285

powahftw · 2017-05-09T10:49:27Z

To edit some text, I currently use paragraphs and runs and re-apply all the styles I'm interested in. It would be useful to have a way to change just the text of existing text, leaving the formatting untouched.

scanny · 2017-05-09T19:19:51Z

@powahftw Character formatting (font characteristics) is specified at the Run level. A Paragraph object contains one or more (usually more) runs. When assigning to Paragraph.text, all the runs in the paragraph are replaced with a single new run. This is why the text formatting disappears: the runs that contained that formatting are removed.

Although it would not work for all cases one might want, a useful behavior would be to replace the text in a paragraph, retaining the formatting present in the first run. This could be accomplished like this:

def replace_paragraph_text_retaining_initial_formatting(paragraph, new_text):
    p = paragraph._p  # the lxml element containing the `<a:p>` paragraph element
    # remove all but the first run
    for idx, run in enumerate(paragraph.runs):
        if idx == 0:
            continue
        p.remove(run._r)
    paragraph.runs[0].text = new_text

paragraph = textframe.paragraphs[0]  # or wherever you get the paragraph from
new_text = 'foobar'
replace_paragraph_text_retaining_initial_formatting(paragraph, new_text)

I haven't tested this, so please report back any mistakes if you try it out, but I think it gives the gist.

This would be roughly how such a feature would be implemented.

elmundio87 · 2017-08-22T14:16:37Z

@scanny Just tried your suggested function and it works perfectly - thanks!

bashsebbash · 2018-03-26T20:54:40Z

wow... this is perfect. I had written a loop of try/except that stored the attributes of the first run and then re-applied them after changing the text. Feel like a barbarian.

tekbj80 · 2019-04-23T12:13:21Z

Hi guys,

Thanks @scanny for the snippet! It was really helpful.
I did encounter something strange though. For some unknown reason, the text in some cells in my file, despite looking like a single line of text, was split into a number of runs.

So, I had to modify your snippet into this

...
# join without a separator so that words split across runs are not broken apart
whole_text = "".join(r.text for r in paragraph.runs)
whole_text = re.sub(replacement_string, new_text, whole_text)
p = paragraph._p
# remove all but the first run
for run in paragraph.runs[1:]:
    p.remove(run._r)
paragraph.runs[0].text = whole_text
...

Hope this helps... or well, I just wanted to share the solution to the whole morning of frustration...
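
For readers who want the fragment above as a standalone function, here is a minimal sketch. The name `replace_in_paragraph` is hypothetical, and it relies only on the `runs`, `_p`, and `_r` attributes already used in this thread. It assumes that runs splitting a visually continuous line imply no separator between them, and it leaves the paragraph alone when the pattern does not match, so multi-run formatting is only collapsed when a replacement actually happens.

```python
import re


def replace_in_paragraph(paragraph, pattern, new_text):
    """Regex-replace across all runs, keeping only the first run's formatting.

    Returns True if the paragraph text was changed, False otherwise.
    """
    runs = paragraph.runs
    if not runs:
        return False  # nothing to replace, and no first run to write into
    whole_text = "".join(run.text for run in runs)
    replaced = re.sub(pattern, new_text, whole_text)
    if replaced == whole_text:
        return False  # no match: leave the runs and their formatting untouched
    p = paragraph._p
    # remove all but the first run, then put the full text into that run
    for run in runs[1:]:
        p.remove(run._r)
    runs[0].text = replaced
    return True
```

Note that `pattern` is treated as a regular expression, so a literal search string would need `re.escape()` first.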

lokesh1729 · 2019-05-15T09:56:33Z

This really helps us.... thank you very much @scanny

franz-see · 2020-10-08T01:52:39Z

Hi @scanny ,

Thanks for the snippet! Question though - is changing the text on the paragraph really supposed to clear the formatting or is that a bug?

Thanks

scanny · 2020-10-08T18:17:44Z

@franz-see Paragraph.text is a convenience property. There is no general-case way to change the text while preserving the formatting. For example, if you wanted to replace:

The quick, brown fox.

with:

The lazy yellow dog.

How would you do that with something like Paragraph.text? So assigning text to Paragraph.text replaces all the runs in the paragraph with a single run containing the assigned text with no special formatting.

Character formatting provided by the paragraph-style is preserved, and generally this produces the best possible result. If you need to apply "inline" character formatting, then you need to do it yourself, run-by-run.

One way to do this is to assign "" to paragraph.text to "clear" the existing text and then add runs to the paragraph to suit.
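
A minimal sketch of that clear-then-rebuild approach follows. The helper name `set_formatted_runs` and the choice of bold as the only formatting are illustrative assumptions; it uses only `paragraph.text`, `paragraph.add_run()`, and `run.font.bold`.

```python
def set_formatted_runs(paragraph, segments):
    """Replace the paragraph's content with one run per (text, bold) segment.

    `bold` may be True, False, or None; None leaves the run to inherit
    its weight from the paragraph style.
    """
    paragraph.text = ""  # "clear" the existing text, as described above
    added = []
    for text, bold in segments:
        run = paragraph.add_run()
        run.text = text
        if bold is not None:
            run.font.bold = bold
        added.append(run)
    return added
```

For the fox/dog example above, a call might look like `set_formatted_runs(paragraph, [("The ", None), ("lazy", True), (" yellow dog.", None)])`, where the choice of which word is bold is purely illustrative.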

finrodfelagund13 · 2020-11-17T15:13:43Z

That's really useful, thanks @scanny.

Is there a way to keep the formatting when replacing text in tables (cells), too?
To be honest, I don't fully understand what your above code does, so I'd appreciate any help.

def replace_text(self, replacements: dict, shapes: list):
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            new_text = cell.text.replace(str(match), str(replacement))
                            cell.text = new_text

from: https://stackoverflow.com/questions/37924808/python-pptx-power-point-find-and-replace-text-ctrl-h

finrodfelagund13 · 2020-11-18T12:42:46Z

I think I solved it. It was just a matter of finding how to access the run level for cells. I'll leave the solution here in case someone encounters a similar problem.

Thank you for this great module!

    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:                                                        
                            for paragraph in cell.text_frame.paragraphs:
                               for run in paragraph.runs:
                                   p = paragraph._p  # the lxml element containing the `<a:p>` paragraph element
                                   # remove all but the first run
                                   for idx, run in enumerate(paragraph.runs):
                                       if idx == 0:
                                           continue
                                       p.remove(run._r)
                                   cur_text = run.text
                                   new_text = cur_text.replace(str(match), str(replacement))
                                   run.text = new_text

scanny · 2020-11-18T19:11:33Z

Same code, slightly refactored for length and indent level. I think there was a bug in there too: you deleted runs before capturing the text they contain.

def iter_table_cells(shapes):
    for shape in shapes:
        if not shape.has_table:
            continue
        for row in shape.table.rows:
            for cell in row.cells:
                yield cell


for cell in iter_table_cells(shapes):
    for match, replacement in replacements.items():
        for paragraph in cell.text_frame.paragraphs:
            if match not in paragraph.text:
                continue
            orig_text = paragraph.text
            # --- the lxml element containing the `<a:p>` paragraph element ---
            p = paragraph._p
            # --- remove all but the first run ---
            for run in paragraph.runs[1:]:
                p.remove(run._r)
            run = paragraph.runs[0]
            run.text = orig_text.replace(str(match), str(replacement))
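
The same generator idea can be extended so that one loop covers both ordinary text shapes and table cells. This is only a sketch: `iter_paragraphs` is a hypothetical name, it uses `getattr` defensively in case a shape type lacks one of the `has_*` properties, and it does not descend into group shapes.

```python
def iter_paragraphs(shapes):
    """Yield every paragraph found in text-frame shapes and in table cells."""
    for shape in shapes:
        if getattr(shape, "has_text_frame", False):
            yield from shape.text_frame.paragraphs
        if getattr(shape, "has_table", False):
            for row in shape.table.rows:
                for cell in row.cells:
                    yield from cell.text_frame.paragraphs
```

The replacement loop above could then iterate over `iter_paragraphs(slide.shapes)` instead of going through cells explicitly.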

finrodfelagund13 · 2020-11-20T09:40:21Z

You are very kind. Thanks again.

neilmario70 · 2021-05-19T09:26:19Z

I am trying to highlight a specific word in red color in a pptx file using the below function.

def highlight_word_in_text(paragraph,highlight_word):
    p = paragraph._p
    paratext = p.text
    if highlight_word in paratext:
        for idx, run in enumerate(paragraph.runs):
            if idx == 0:
                continue
            p.remove(run._r)
        paragraph.runs[0].text = paratext[0:paratext.index(highlight_word)]
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word):paratext.index(highlight_word)+len(highlight_word)]
        run.font.color.rgb = RGBColor(255, 0, 0)
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word)+len(highlight_word):]

While it works, it loses the formatting and also adds some unusual characters like '_x000B' to some words where it finds the highlighted word. Could you please tell me what I am missing?

Full code below:

from pptx import Presentation
from pptx.dml.color import RGBColor

def highlight_word_in_text(paragraph,highlight_word):
    p = paragraph._p
    paratext = p.text
    if highlight_word in paratext:
        for idx, run in enumerate(paragraph.runs):
            if idx == 0:
                continue
            p.remove(run._r)
        paragraph.runs[0].text = paratext[0:paratext.index(highlight_word)]
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word):paratext.index(highlight_word)+len(highlight_word)]
        run.font.color.rgb = RGBColor(255, 0, 0)
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word)+len(highlight_word):]
        
prs2 = Presentation('test2.pptx')
for slide in prs2.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            highlight_word_in_text(paragraph,'business')
            
prs2.save('test3.pptx')
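
One likely cause of the `_x000B` text, offered as an assumption rather than a confirmed diagnosis: python-pptx reports each line break in a paragraph as a vertical-tab character (`"\v"`) in the paragraph text, and a run cannot hold a line break, so a `"\v"` assigned to `run.text` is escaped as `_x000B_`. The sketch below works around that by splitting on `"\v"` and calling `paragraph.add_line_break()` between pieces. The helper names are hypothetical. It rebuilds the paragraph with `paragraph.clear()`, so it does not preserve run-level formatting; the new runs only inherit from the paragraph style.

```python
def write_text_with_line_breaks(paragraph, text, color=None):
    """Append `text` to `paragraph`, turning each vertical tab into a line break.

    Returns the runs that were added. If `color` is given, it is assigned
    to each added run's font.color.rgb.
    """
    added = []
    for i, piece in enumerate(text.split("\v")):
        if i > 0:
            paragraph.add_line_break()
        if piece:
            run = paragraph.add_run()
            run.text = piece
            if color is not None:
                run.font.color.rgb = color
            added.append(run)
    return added


def highlight_word(paragraph, word, color):
    """Rebuild `paragraph` with the first occurrence of `word` colored."""
    text = paragraph.text
    start = text.find(word)
    if start < 0:
        return False
    end = start + len(word)
    paragraph.clear()  # removes runs and line breaks, keeps paragraph properties
    write_text_with_line_breaks(paragraph, text[:start])
    write_text_with_line_breaks(paragraph, word, color)
    write_text_with_line_breaks(paragraph, text[end:])
    return True
```

In the script above, the call would become something like `highlight_word(paragraph, 'business', RGBColor(255, 0, 0))`.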

scanny added the text label May 9, 2017

HReynaud mentioned this issue May 15, 2024

Replace text with formatting #684

Open
