Changing Text without editing the formatting #285

Open
powahftw opened this issue May 9, 2017 · 12 comments

powahftw commented May 9, 2017

To edit some text, I currently use paragraphs and runs and re-apply all the styles I'm interested in. It would be useful to have a way to change just the text of an existing paragraph, leaving the formatting untouched.

scanny added the text label May 9, 2017

scanny (Owner) commented May 9, 2017

@powahftw Character formatting (font characteristics) is specified at the Run level. A Paragraph object contains one or more (usually more) runs. Assigning to Paragraph.text replaces all the runs in the paragraph with a single new run. This is why the text formatting disappears: the runs that carried that formatting are gone.

Although it would not work for all cases one might want, a useful behavior would be to replace the text in a paragraph, retaining the formatting present in the first run. This could be accomplished like this:

def replace_paragraph_text_retaining_initial_formatting(paragraph, new_text):
    p = paragraph._p  # the lxml `<a:p>` element behind this paragraph
    # remove all but the first run
    for run in paragraph.runs[1:]:
        p.remove(run._r)
    # the surviving first run keeps its character formatting
    paragraph.runs[0].text = new_text

paragraph = textframe.paragraphs[0]  # or wherever you get the paragraph from
new_text = 'foobar'
replace_paragraph_text_retaining_initial_formatting(paragraph, new_text)

I haven't tested this, so please report back any mistakes if you try it out, but I think it gives the gist.

This would be roughly how such a feature would be implemented.
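
One gap in the snippet above is that `paragraph.runs[0]` raises an `IndexError` when the paragraph has no runs at all, for example an empty paragraph. A minimal sketch of a guarded variant follows. It assumes only the `runs`, `_p` and `add_run()` members already used in this thread, and the name `set_text_keep_first_run_format` is made up for illustration; it is not part of python-pptx.

```python
def set_text_keep_first_run_format(paragraph, new_text):
    """Replace paragraph text, keeping the first run's character formatting.

    Hypothetical helper, not a python-pptx API.
    """
    runs = paragraph.runs
    if not runs:
        # Nothing to inherit formatting from, so add a plain run instead.
        paragraph.add_run().text = new_text
        return
    p = paragraph._p  # the lxml `<a:p>` element behind this paragraph
    for run in runs[1:]:
        p.remove(run._r)  # drop every run except the first
    runs[0].text = new_text
```

With a run present the behavior is the same as the original function; the only difference is the fallback for run-less paragraphs.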

@elmundio87

@scanny Just tried your suggested function and it works perfectly - thanks!

@bashsebbash

wow... this is perfect. I had written a loop of try/except that stored the attributes of the first run and then re-applied them after changing the text. Feel like a barbarian.

tekbj80 commented Apr 23, 2019

Hi guys,

Thanks @scanny for the snippet! It was really helpful.
I did encounter something strange though. For some unknown reason, some cells in my file were split into a number of runs despite looking like a single line of text.

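So I had to modify your snippet into this:

...
# needs `import re`; note that re.sub() treats replacement_string as a regex
# pattern, so wrap it in re.escape() if it should match literally
# join with "" rather than " ": a run boundary can fall in the middle of a word
whole_text = "".join(r.text for r in paragraph.runs)
whole_text = re.sub(replacement_string, new_text, whole_text)
p = paragraph._p
for run in paragraph.runs[1:]:
    p.remove(run._r)
paragraph.runs[0].text = whole_text
...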

Hope this helps... or well, I just wanted to share the solution to the whole morning of frustration...
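
The situation described here, where the text to replace straddles several runs, can be sketched as a small helper. This is an illustration rather than an official recipe. `replace_in_paragraph` is a hypothetical name, and it does a literal `str.replace()` instead of a regex substitution. The run texts are joined with an empty string, on the assumption that each run already carries its own spaces.

```python
def replace_in_paragraph(paragraph, old, new):
    """Replace `old` with `new` even when `old` spans run boundaries.

    Hypothetical helper. Returns True if a replacement was made.
    """
    runs = paragraph.runs
    if not runs:
        return False
    # Concatenate with "" so a word split across runs is rejoined intact.
    whole_text = "".join(run.text for run in runs)
    if old not in whole_text:
        return False  # leave non-matching paragraphs and their runs alone
    p = paragraph._p
    for run in runs[1:]:
        p.remove(run._r)
    # Only the first run survives, so only its formatting is retained.
    runs[0].text = whole_text.replace(old, new)
    return True
```

Returning early when there is no match avoids collapsing the runs of paragraphs that do not need any change.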

@lokesh1729

This really helps us.... thank you very much @scanny

@franz-see

Hi @scanny,

Thanks for the snippet! Question though - is changing the text on the paragraph really supposed to clear the formatting or is that a bug?

Thanks

scanny (Owner) commented Oct 8, 2020

@franz-see Paragraph.text is a convenience property. There is no general-case way to change the text while preserving the formatting. For example, if you wanted to replace:

The quick, brown fox.

with:

The lazy yellow dog.

How would you do that with something like Paragraph.text? So assigning text to Paragraph.text replaces all the runs in the paragraph with a single run containing the assigned text with no special formatting.

Character formatting provided by the paragraph-style is preserved, and generally this produces the best possible result. If you need to apply "inline" character formatting, then you need to do it yourself, run-by-run.

One way to do this is to assign "" to paragraph.text to "clear" the existing text and then add runs to the paragraph to suit.
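
The clear-then-rebuild approach just described could look roughly like the sketch below. `rebuild_paragraph` and the `segments` structure are made up for illustration. The sketch assumes that assigning `""` to `paragraph.text` removes the existing runs, as stated above, and that the attribute names passed in (such as `bold` or `italic`) are valid font attributes.

```python
def rebuild_paragraph(paragraph, segments):
    """Replace paragraph content with one run per (text, font_attrs) pair.

    Hypothetical helper. `font_attrs` maps font attribute names, such as
    "bold" or "italic", to the values to assign.
    """
    paragraph.text = ""  # "clear" the existing text and its runs
    for text, font_attrs in segments:
        run = paragraph.add_run()
        run.text = text
        for name, value in font_attrs.items():
            setattr(run.font, name, value)  # inline formatting, run by run
    return paragraph
```

For the fox-and-dog example, a call such as `rebuild_paragraph(paragraph, [("The ", {}), ("lazy", {"bold": True}), (" yellow dog.", {})])` would put "lazy" in its own bold run and leave the rest to the paragraph style.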

finrodfelagund13 commented Nov 17, 2020

That's really useful, thanks @scanny.

Is there a way to keep the formatting when replacing text in tables (cells), too?
To be honest, I don't fully understand what your above code does, so I'd appreciate any help.

def replace_text(self, replacements: dict, shapes: list):
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            new_text = cell.text.replace(str(match), str(replacement))
                            cell.text = new_text

from: https://stackoverflow.com/questions/37924808/python-pptx-power-point-find-and-replace-text-ctrl-h

finrodfelagund13 commented Nov 18, 2020

I think I solved it. It was just a matter of finding out how to access the run level for cells. I'll leave the solution here in case someone encounters a similar problem.

Thank you for this great module!

    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            for paragraph in cell.text_frame.paragraphs:
                                for run in paragraph.runs:
                                    p = paragraph._p  # the lxml element containing the `<a:p>` paragraph element
                                    # remove all but the first run
                                    for idx, run in enumerate(paragraph.runs):
                                        if idx == 0:
                                            continue
                                        p.remove(run._r)
                                    cur_text = run.text
                                    new_text = cur_text.replace(str(match), str(replacement))
                                    run.text = new_text

scanny (Owner) commented Nov 18, 2020

Same code, slightly refactored for length and indent level. I think there was a bug in there too: you deleted runs before capturing the text they contain.

def iter_table_cells(shapes):
    for shape in shapes:
        if not shape.has_table:
            continue
        for row in shape.table.rows:
            for cell in row.cells:
                yield cell


for cell in iter_table_cells(shapes):
    for match, replacement in replacements.items():
        for paragraph in cell.text_frame.paragraphs:
            if match not in paragraph.text:
                continue
            orig_text = paragraph.text
            # --- the lxml element containing the `<a:p>` paragraph element ---
            p = paragraph._p
            # --- remove all but the first run ---
            for run in paragraph.runs[1:]:
                p.remove(run._r)
            run = paragraph.runs[0]
            run.text = orig_text.replace(str(match), str(replacement))
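
The generator above covers table cells only. A sketch of a broader iterator that also yields paragraphs from ordinary text shapes might look like this. `iter_paragraphs` is a hypothetical name, and `getattr` with a default is used because the sketch does not assume every shape type exposes both `has_text_frame` and `has_table`. Shapes nested inside group shapes are not visited.

```python
def iter_paragraphs(shapes):
    """Yield every paragraph found in text shapes and in table cells.

    Hypothetical helper; does not descend into group shapes.
    """
    for shape in shapes:
        if getattr(shape, "has_text_frame", False):
            yield from shape.text_frame.paragraphs
        elif getattr(shape, "has_table", False):
            for row in shape.table.rows:
                for cell in row.cells:
                    yield from cell.text_frame.paragraphs
```

The replacement loop can then run over `iter_paragraphs(slide.shapes)` without caring where each paragraph came from.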

@finrodfelagund13

You are very kind. Thanks again.

@neilmario70

I am trying to highlight a specific word in red color in a pptx file using the below function.

def highlight_word_in_text(paragraph,highlight_word):
    p = paragraph._p
    paratext = p.text
    if highlight_word in paratext:
        for idx, run in enumerate(paragraph.runs):
            if idx == 0:
                continue
            p.remove(run._r)
        paragraph.runs[0].text = paratext[0:paratext.index(highlight_word)]
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word):paratext.index(highlight_word)+len(highlight_word)]
        run.font.color.rgb = RGBColor(255, 0, 0)
        run = paragraph.add_run()
        run.text = paratext[paratext.index(highlight_word)+len(highlight_word):]

While it works, it loses the formatting and also adds some unusual characters like '_x000B' to some words where it finds the highlighted word. Could you please tell me what I am missing?

Full code below:

from pptx import Presentation
from pptx.dml.color import RGBColor

# highlight_word_in_text() is defined exactly as shown above
        
prs2 = Presentation('test2.pptx')
for slide in prs2.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            highlight_word_in_text(paragraph,'business')
            
prs2.save('test3.pptx')
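
Two things seem likely to explain the symptoms, though neither is confirmed in this thread. First, runs created with `add_run()` start without any character formatting of their own, so only the first run keeps the original look. Second, the python-pptx documentation describes a line break in a paragraph as appearing as a vertical tab (`"\v"`) in the paragraph text, and a vertical tab written into a run's text being escaped as `_x000B_`. The sketch below works around both under those assumptions. `highlight_word` is a hypothetical name, only four font attributes are copied, and paragraphs containing a line break are skipped rather than handled.

```python
def highlight_word(paragraph, word, rgb):
    """Color the first occurrence of `word`; return True if highlighted.

    Hypothetical helper. `rgb` is a color value such as
    pptx.dml.color.RGBColor(255, 0, 0).
    """
    runs = paragraph.runs
    text = paragraph.text
    # Assumption: a "\v" here stands for a line break and would be escaped
    # as `_x000B_` if written back into a run, so such paragraphs are skipped.
    if not runs or word not in text or "\v" in text:
        return False
    start = text.index(word)
    end = start + len(word)
    first = runs[0]
    p = paragraph._p
    for run in runs[1:]:
        p.remove(run._r)
    first.text = text[:start]
    for segment, is_match in ((text[start:end], True), (text[end:], False)):
        if not segment:
            continue  # nothing follows the word, so no trailing run needed
        run = paragraph.add_run()
        run.text = segment
        # New runs start unformatted; copy a few attributes from the first run.
        for attr in ("name", "size", "bold", "italic"):
            setattr(run.font, attr, getattr(first.font, attr))
        if is_match:
            run.font.color.rgb = rgb
    return True
```

Formatting that differed between the original runs is still lost, since everything is rebuilt from the first run.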
