Inserting OMML into Text Frame or Paragraph #528

Open
kencomputes opened this issue Jul 11, 2019 · 3 comments

Comments

@kencomputes

Hello,

I am trying to build a pipeline to convert existing MathML to OMML and insert it into a text frame in PPT.

I came across a very useful post regarding inserting MathML into a Word doc with python-docx.
It involves performing an XSL transformation on the plain MML using Office's "MML2OMML.XSL", then appending that etree object to a new paragraph. (python-openxml/python-docx#320)

Here's an example that works for me with python-docx:

from docx import Document
from docx.shared import Inches
from lxml import etree

doc = Document()

# Convert MathML (MML) into Office MathML (OMML) using an XSLT stylesheet
tree = etree.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mstyle displaystyle="true"><mfrac><mrow><mrow><mo>&#8722;</mo><mi>b</mi></mrow><mo>&#177;</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>&#8722;</mo><mrow><mn>4</mn><mo>&#8290;</mo><mi>a</mi><mo>&#8290;</mo><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mo>&#8290;</mo><mi>a</mi></mrow></mfrac></mstyle></math>')
xslt = etree.parse('C:/Program Files/Microsoft Office/root/Office16/MML2OMML.XSL')

transform = etree.XSLT(xslt)
new_dom = transform(tree)

p = doc.add_paragraph()
p._element.append(new_dom.getroot())

doc.save('testDoc.docx')

I tried something similar using python-pptx. It runs without errors and creates the specified file, but the resulting presentation contains no equation.

Here's my attempt with python-pptx:

from pptx import Presentation
from pptx.util import Inches, Pt
from lxml import etree

prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(blank_slide_layout)

left = top = width = height = Inches(1)
txBox = slide.shapes.add_textbox(left, top, width, height)
tf = txBox.text_frame

# Convert MathML (MML) into Office MathML (OMML) using an XSLT stylesheet
tree = etree.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mstyle displaystyle="true"><mfrac><mrow><mrow><mo>&#8722;</mo><mi>b</mi></mrow><mo>&#177;</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>&#8722;</mo><mrow><mn>4</mn><mo>&#8290;</mo><mi>a</mi><mo>&#8290;</mo><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mo>&#8290;</mo><mi>a</mi></mrow></mfrac></mstyle></math>')
xslt = etree.parse('C:/Program Files/Microsoft Office/root/Office16/MML2OMML.XSL')

transform = etree.XSLT(xslt)
new_dom = transform(tree)

p = tf.add_paragraph()
p._element.append(new_dom.getroot())

prs.save('testDoc.pptx')

Clearly I'm doing something wrong here, but I'm not quite sure what.

I simplified my MathML to <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>, manually inserted the resulting OMML into a blank PPT slide, and then compared slide1.xml between an empty slide and the slide with the formula.

Here's the XML that was added to the slide when I inserted the formula:

<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
  <mc:Choice xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" Requires="a14">
    <p:sp>
      <p:nvSpPr>
        <p:cNvPr id="2" name="Rectangle 1">
          <a:extLst>
            <a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}">
              <a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{63784C15-6834-4CF0-BA6B-62B8B3ACA648}" />
            </a:ext>
          </a:extLst>
        </p:cNvPr>
        <p:cNvSpPr />
        <p:nvPr />
      </p:nvSpPr>
      <p:spPr>
        <a:xfrm>
          <a:off x="4388007" y="3244334" />
          <a:ext cx="367985" cy="369332" />
        </a:xfrm>
        <a:prstGeom prst="rect">
          <a:avLst />
        </a:prstGeom>
      </p:spPr>
      <p:txBody>
        <a:bodyPr wrap="none">
          <a:spAutoFit />
        </a:bodyPr>
        <a:lstStyle />
        <a:p>
          <a14:m>
            <m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
              <m:oMathParaPr>
                <m:jc m:val="centerGroup" />
              </m:oMathParaPr>
              <m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
                <m:r>
                  <a:rPr lang="en-US" i="1">
                    <a:latin typeface="Cambria Math" panose="02040503050406030204" pitchFamily="18" charset="0" />
                  </a:rPr>
                  <m:t>𝑥</m:t>
                </m:r>
              </m:oMath>
            </m:oMathPara>
          </a14:m>
          <a:endParaRPr lang="en-US" dirty="0" />
        </a:p>
      </p:txBody>
    </p:sp>
  </mc:Choice>
  <mc:Fallback>
    <p:sp>
      <p:nvSpPr>
        <p:cNvPr id="2" name="Rectangle 1">
          <a:extLst>
            <a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}">
              <a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{63784C15-6834-4CF0-BA6B-62B8B3ACA648}" />
            </a:ext>
          </a:extLst>
        </p:cNvPr>
        <p:cNvSpPr>
          <a:spLocks noRot="1" noChangeAspect="1" noMove="1" noResize="1" noEditPoints="1" noAdjustHandles="1" noChangeArrowheads="1" noChangeShapeType="1" noTextEdit="1" />
        </p:cNvSpPr>
        <p:nvPr />
      </p:nvSpPr>
      <p:spPr>
        <a:xfrm>
          <a:off x="4388007" y="3244334" />
          <a:ext cx="367985" cy="369332"/>
        </a:xfrm>
        <a:prstGeom prst="rect">
          <a:avLst />
        </a:prstGeom>
        <a:blipFill>
          <a:blip r:embed="rId2" />
          <a:stretch>
            <a:fillRect />
          </a:stretch>
        </a:blipFill>
      </p:spPr>
      <p:txBody>
        <a:bodyPr />
        <a:lstStyle />
        <a:p>
          <a:r>
            <a:rPr lang="en-US">
              <a:noFill />
            </a:rPr>
            <a:t> </a:t>
          </a:r>
        </a:p>
      </p:txBody>
    </p:sp>
  </mc:Fallback>
</mc:AlternateContent>

Does anyone have any clever ideas for how I might go about inserting a formula into a blank slide/text field/paragraph?

@scanny
Owner

scanny commented Jul 13, 2019

The first step here is to add an equation to PowerPoint by hand, using the equation editor, and then examine the XML that produces, since that XML is known to work. It helps a lot to make the example presentation as simple as possible: one slide with one shape. Then you can find the XML in question with:

$ opc browse my-example.pptx slide1.xml
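
If the opc command is not at hand, the same inspection can be sketched with the standard library, since a .pptx file is a ZIP archive of XML parts. The helper names below (find_math_paragraphs, dump) are hypothetical, and the default part name is an assumption that fits a one-slide deck; the namespace URIs are the ones that appear in the XML quoted in this thread.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the XML quoted in this thread.
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "a14": "http://schemas.microsoft.com/office/drawing/2010/main",
    "m": "http://schemas.openxmlformats.org/officeDocument/2006/math",
}


def find_math_paragraphs(pptx_file, part_name="ppt/slides/slide1.xml"):
    """Return the <a:p> elements of one slide part that have an <a14:m> child.

    pptx_file may be a path or a binary file-like object. The default
    part_name is an assumption that fits a deck with a single slide.
    """
    with zipfile.ZipFile(pptx_file) as archive:
        root = ET.fromstring(archive.read(part_name))
    paragraph_tag = "{%s}p" % NS["a"]
    return [p for p in root.iter(paragraph_tag) if p.find("a14:m", NS) is not None]


def dump(element):
    """Serialize an element to text so it can be compared with generated XML."""
    return ET.tostring(element, encoding="unicode")
```

Calling dump() on each paragraph returned for the hand-made file and for the generated file gives two strings that can be compared side by side.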

@kencomputes
Author

Thank you for the support @scanny!

I was able to append the math element with the following code:

from pptx import Presentation
from pptx.util import Inches, Pt
from lxml import etree

prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(blank_slide_layout)

left = top = width = height = Inches(1)
txBox = slide.shapes.add_textbox(left, top, width, height)
tf = txBox.text_frame

# Convert MathML (MML) into Office MathML (OMML) using an XSLT stylesheet
tree = etree.fromstring('<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi><mo>=</mo><mstyle displaystyle="true"><mfrac><mrow><mrow><mo>&#8722;</mo><mi>b</mi></mrow><mo>&#177;</mo><msqrt><msup><mi>b</mi><mn>2</mn></msup><mo>&#8722;</mo><mrow><mn>4</mn><mo>&#8290;</mo><mi>a</mi><mo>&#8290;</mo><mi>c</mi></mrow></msqrt></mrow><mrow><mn>2</mn><mo>&#8290;</mo><mi>a</mi></mrow></mfrac></mstyle></math>')
xslt = etree.parse('C:/Program Files/Microsoft Office/root/Office16/MML2OMML.XSL')

wrapper = etree.fromstring('<a14:m xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"><m:oMathPara xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"></m:oMathPara></a14:m>')

transform = etree.XSLT(xslt)
new_dom = transform(tree)

# wrapper[0] is the <m:oMathPara> element (getchildren() is deprecated in lxml)
wrapper[0].append(new_dom.getroot())

p = tf.add_paragraph()
p._element.append(wrapper)

prs.save('testDoc.pptx')

If there are text nodes to be inserted before or after the math content, that can be accomplished like this:

textWrapOpen = '<a:r xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"><a:t>'
textWrapClose = '</a:t></a:r>'
string = "Some string "
textTree = etree.fromstring(textWrapOpen + string + textWrapClose)
p._element.append(textTree)
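
One caveat with building the run by string concatenation: text containing &, < or > would make the XML unparseable. A minimal sketch of an escaping variant follows, using only the standard library; run_xml is a hypothetical helper name, not part of python-pptx or lxml.

```python
from xml.sax.saxutils import escape

# DrawingML namespace, as used in the snippet above.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def run_xml(text):
    """Return the XML for an <a:r> text run, escaping &, < and > in text."""
    return '<a:r xmlns:a="%s"><a:t>%s</a:t></a:r>' % (A_NS, escape(text))
```

The returned string can be passed to etree.fromstring() and appended to p._element exactly as in the snippet above.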

I'm still working out some kinks, but this should be a good starting point for whoever might be attempting this in the future.

@Arya-Programmer

Hey there, this was really helpful, thanks. It saved me so much time.
But I am facing another issue: I am importing a formula from a Word document and inserting it into a PowerPoint slide. It gets the work done, but it inserts gibberish into the slide.

import re

import latex2mathml.converter
from pptx import Presentation
from docx import Document
from docx.shared import Inches
import docxlatex as latex
from lxml import etree


def latex_to_word(latex_input, for_ppt=False):
    mathml = latex2mathml.converter.convert(latex_input, display="block")
    
    tree = etree.fromstring(mathml)
    xslt = etree.parse(
        # Raw string so the backslashes are not treated as escape sequences
        r'D:\Programming\python\pptxMathVisualize\src\MML2OMML.XSL'
    )
    transform = etree.XSLT(xslt)
    new_dom = transform(tree)
    if for_ppt:
        wrapper = etree.fromstring(
            '''<a14:m xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main">
                <m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
                </m:oMath>
            </a14:m>''')
        # wrapper[0] is the first child element (getchildren() is deprecated in lxml)
        wrapper[0].append(new_dom.getroot())
        
        return wrapper
        
    return new_dom.getroot()


def getLatexEquations(text):
    return re.findall(r'\$([^$].*?)\$', text)


document = Document('demo.docx')
latexDoc = latex.Document('demo.docx')
prs = Presentation()

title_slide_layout = prs.slide_layouts[6]
slide = prs.slides.add_slide(title_slide_layout)
left = top = width = height = Inches(1)

equations = getLatexEquations(latexDoc.get_text())

for i, equation in enumerate(equations):
    p = document.add_paragraph()
    p._element.append(latex_to_word(equation))

    # Offset each text box vertically by its position in the list;
    # enumerate() stays correct when the same equation appears twice
    txBoxNum = i + 1.5
    txBox = slide.shapes.add_textbox(left, Inches(1 * txBoxNum), width, height)
    tf = txBox.text_frame
    txBoxP = tf.add_paragraph()
    txBoxP._element.append(latex_to_word(equation, for_ppt=True))

prs.save('test.pptx')
document.save('test1.docx')

Sorry, I haven't had time to refactor, so let me explain the code a little. There are two document variables: latexDoc is for importing the formulas from the docx, and document is for editing the docx file. The prs variable is self-explanatory. I extract the equations with a regex to separate them from the text, and then put them in text boxes on the pptx slide.

Here are the original formulas:
[Screenshot: demo.docx open in Word]

This is the output:
[Screenshot: test.pptx open in PowerPoint]
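
One thing that can be checked without opening PowerPoint is how the math elements are nested. The working snippet earlier in this thread puts the stylesheet output inside m:oMathPara, while latex_to_word() puts it inside m:oMath. Assuming the root of the stylesheet output is itself m:oMath, as in the slide XML quoted earlier, the result is an m:oMath directly inside another m:oMath. Whether that explains the garbled rendering has not been confirmed in this thread. A minimal sketch for counting that nesting in serialized paragraph XML, using only the standard library (nested_omath_count is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

# Office Math namespace, as it appears in the XML quoted in this thread.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
OMATH = "{%s}oMath" % M_NS


def nested_omath_count(xml_text):
    """Count <m:oMath> elements whose direct parent is also <m:oMath>."""
    root = ET.fromstring(xml_text)
    return sum(
        1
        for parent in root.iter(OMATH)  # every <m:oMath>, including the root
        for child in parent             # its direct children only
        if child.tag == OMATH
    )
```

The input string could come from lxml's etree.tostring(txBoxP._element, encoding="unicode"). A count of zero matches the structure PowerPoint wrote for the hand-made equation.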
