Support for parsing Equations #892

Open
AM-ash-OR-AM-I opened this issue Jun 5, 2023 · 1 comment

AM-ash-OR-AM-I commented Jun 5, 2023

Issue #528 showed how to insert Office Math ML (equation) text, but is there any way to parse or extract that text? #706 seemed to handle it, but extraction still doesn't work for all text.
For example, in the excerpt from slide.xml below, "We factorise it as" inside the `<a:r>` tag is extracted, but "𝑥" inside the `<a14:m>` tag is not.

```xml
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main">
  <mc:Choice Requires="a14">
    <p:sp>
      <p:nvSpPr>
        <p:cNvPr id="8" name="TextBox 7" />
        <p:cNvSpPr txBox="1" />
        <p:nvPr />
      </p:nvSpPr>
      <p:spPr>
        <a:xfrm>
          <a:off x="1422400" y="4460458" />
          <a:ext cx="4528458" cy="682046" />
        </a:xfrm>
        <a:prstGeom prst="rect">
          <a:avLst />
        </a:prstGeom>
      </p:spPr>
      <p:txBody>
        <a:bodyPr wrap="square" lIns="0" tIns="0" rIns="0" bIns="0" rtlCol="0" anchor="t">
          <a:spAutoFit />
        </a:bodyPr>
        <a:lstStyle />
        <a:p>
          <a:pPr>
            <a:lnSpc>
              <a:spcPts val="5725" />
            </a:lnSpc>
          </a:pPr>
          <a:r>
            <a:rPr lang="en-IN" sz="4000">
              <a:solidFill>
                <a:schemeClr val="bg1" />
              </a:solidFill>
            </a:rPr>
            <a:t>We factorise it as </a:t>
          </a:r>
          <a14:m>
            <m:oMath xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
              <m:r>
                <a:rPr lang="en-US" sz="4000" i="1" spc="-229">
                  <a:solidFill>
                    <a:srgbClr val="FFC000" />
                  </a:solidFill>
                  <a:latin typeface="Cambria Math" />
                  <a:ea typeface="Cambria Math" panose="02040503050406030204" pitchFamily="18"
                    charset="0" />
                </a:rPr>
                <m:t>𝑥</m:t>  <!-- Doesn't parse this -->
              </m:r>
            </m:oMath>
          </a14:m>
          <a:r>
            <a:rPr lang="en-IN" sz="4000">
              <a:solidFill>
                <a:schemeClr val="bg1" />
              </a:solidFill>
            </a:rPr>
            <a:t> =</a:t>
          </a:r>
          <a:endParaRPr lang="en-US" sz="4000" spc="-229" dirty="0">
            <a:solidFill>
              <a:schemeClr val="bg1" />
            </a:solidFill>
            <a:latin typeface="+mj-lt" />
          </a:endParaRPr>
        </a:p>
      </p:txBody>
    </p:sp>
  </mc:Choice>
  <mc:Fallback xmlns="">
    <p:sp>
      <p:nvSpPr>
        <p:cNvPr id="8" name="TextBox 7" />
        <p:cNvSpPr txBox="1">
          <a:spLocks noRot="1" noChangeAspect="1" noMove="1" noResize="1" noEditPoints="1"
            noAdjustHandles="1" noChangeArrowheads="1" noChangeShapeType="1" noTextEdit="1" />
        </p:cNvSpPr>
        <p:nvPr />
      </p:nvSpPr>
      <p:spPr>
        <a:xfrm>
          <a:off x="1422400" y="4460458" />
          <a:ext cx="4528458" cy="682046" />
        </a:xfrm>
        <a:prstGeom prst="rect">
          <a:avLst />
        </a:prstGeom>
        <a:blipFill>
          <a:blip r:embed="rId4" />
          <a:stretch>
            <a:fillRect l="-6729" t="-13393" r="-1211" b="-43750" />
          </a:stretch>
        </a:blipFill>
      </p:spPr>
      <p:txBody>
        <a:bodyPr />
        <a:lstStyle />
        <a:p>
          <a:r>
            <a:rPr lang="en-US">
              <a:noFill />
            </a:rPr>
            <a:t> </a:t>
          </a:r>
        </a:p>
      </p:txBody>
    </p:sp>
  </mc:Fallback>
</mc:AlternateContent>
```
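To see why the equation is lost, here is a minimal sketch (Python, standard-library `xml.etree.ElementTree`) of an extractor that only reads `<a:t>` inside direct `<a:r>` children of `<a:p>`, which mimics the behaviour described above. The trimmed paragraph and the helper name `runs_only_text` are illustrative assumptions, not any library's actual code.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in (or implied by) the slide.xml excerpt.
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "a14": "http://schemas.microsoft.com/office/drawing/2010/main",
    "m": "http://schemas.openxmlformats.org/officeDocument/2006/math",
}

# Trimmed copy of the paragraph from the excerpt (formatting elements removed).
PARAGRAPH = """
<a:p xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
     xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"
     xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math">
  <a:r><a:t>We factorise it as </a:t></a:r>
  <a14:m><m:oMath><m:r><m:t>𝑥</m:t></m:r></m:oMath></a14:m>
  <a:r><a:t> =</a:t></a:r>
</a:p>
"""


def runs_only_text(p):
    """Hypothetical run-only extractor: joins <a:t> of direct <a:r> children."""
    return "".join(t.text or "" for t in p.findall("a:r/a:t", NS))


p = ET.fromstring(PARAGRAPH.strip())
print(repr(runs_only_text(p)))  # 'We factorise it as  ='
```

The `<m:t>` element sits under `<a14:m>` in the Office Math namespace rather than under `<a:r>`, so a walk that only looks at runs never reaches it.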

Is there any way to extract this text by walking the XML tree?
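One possible approach, sketched under assumptions: walk the children of each `<a:p>` in document order and, besides `<a:r>`, descend into `<a14:m>` to collect every `<m:t>`, while skipping `<mc:Fallback>` (which in the excerpt holds only a picture of the equation plus a blank run). It uses only the Python standard library; the function names `paragraph_text` and `slide_paragraphs` and the trimmed test XML are hypothetical, and it has not been checked against every structure PowerPoint can emit.

```python
import xml.etree.ElementTree as ET

# Clark-notation namespace prefixes used in slide XML.
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
A14 = "{http://schemas.microsoft.com/office/drawing/2010/main}"
M = "{http://schemas.openxmlformats.org/officeDocument/2006/math}"
MC = "{http://schemas.openxmlformats.org/markup-compatibility/2006}"


def paragraph_text(p):
    """Text of one <a:p>, keeping ordinary runs and equations in document order."""
    parts = []
    for child in p:
        if child.tag in (A + "r", A + "fld"):
            # Ordinary run or field: its text is in a direct <a:t> child.
            parts.append(child.findtext(A + "t", default=""))
        elif child.tag == A + "br":
            parts.append("\n")
        elif child.tag == A14 + "m":
            # Equation: gather every <m:t> below <a14:m>, in document order.
            parts.extend(t.text or "" for t in child.iter(M + "t"))
    return "".join(parts)


def slide_paragraphs(root):
    """Yield the text of each <a:p>, ignoring <mc:Fallback> branches."""
    def walk(el):
        if el.tag == MC + "Fallback":
            return  # prefer the mc:Choice content over its fallback
        if el.tag == A + "p":
            yield paragraph_text(el)
            return
        for child in el:
            yield from walk(child)
    yield from walk(root)


# Trimmed version of the excerpt, with the p: namespace declared so it parses.
SLIDE = """<mc:AlternateContent
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <mc:Choice Requires="a14"><p:sp><p:txBody><a:p>
    <a:r><a:t>We factorise it as </a:t></a:r>
    <a14:m><m:oMath><m:r><m:t>𝑥</m:t></m:r></m:oMath></a14:m>
    <a:r><a:t> =</a:t></a:r>
  </a:p></p:txBody></p:sp></mc:Choice>
  <mc:Fallback><p:sp><p:txBody><a:p>
    <a:r><a:t> </a:t></a:r>
  </a:p></p:txBody></p:sp></mc:Fallback>
</mc:AlternateContent>"""

print(list(slide_paragraphs(ET.fromstring(SLIDE))))  # ['We factorise it as 𝑥 =']
```

This flattens each equation to its linear character sequence, so structure such as fractions or superscripts is lost (a superscript 2 on 𝑥 would come out as `𝑥2`). If you need the math structure, you would have to interpret the `m:` elements yourself rather than just concatenating `<m:t>`.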

AM-ash-OR-AM-I changed the title from "Support for parsing Office ML text" to "Support for parsing Office Math ML text" on Jun 5, 2023.
AM-ash-OR-AM-I changed the title from "Support for parsing Office Math ML text" to "Support for parsing Equations" on Jun 5, 2023.
@bennettbrowniowa

Also see issue #947. I'm interested in this because I want to extract the text of professors' slides for a platform that crowdsources contributions, revisions, and reviews of teaching materials in quantum information.
