PptxConverter: TypeError "can only concatenate str (not NoneType) to str" when shape/notes text is None #1808

@NikiforovAll

Summary

PptxConverter.convert performs unguarded + concatenation on shape.text and notes_frame.text. When python-pptx yields None for a text frame (e.g. an <a:r> with no <a:t>, or certain third-party-generated decks), the whole conversion fails with:

PptxConverter threw TypeError with message: can only concatenate str (not "NoneType") to str

One malformed shape fails the entire file.

Affected lines (v0.1.5, also present on main)

packages/markitdown/src/markitdown/converters/_pptx_converter.py:

  • L137 — md_content += "# " + shape.text.lstrip() + "\n"
  • L139 — md_content += shape.text + "\n"
  • L154 — md_content += notes_frame.text
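For reference, here is a minimal sketch in plain Python (no python-pptx needed) of what each of the three expressions does when the text is None. The names `text`, `try_expr`, and `append_notes` are stand-ins made up for illustration, not names from the converter. Assuming the lines run as written above, the exact exception depends on which line is hit: the title line would fail inside `.lstrip()` before any concatenation, while the other two raise TypeError with different messages.

```python
def try_expr(fn):
    """Run fn and return its result, or 'ExceptionName: message' on failure."""
    try:
        return fn()
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


text = None  # stand-in for shape.text / notes_frame.text yielding None


def append_notes():
    # Mirrors: md_content += notes_frame.text
    md_content = ""
    md_content += text
    return md_content


results = {
    # Mirrors: md_content += "# " + shape.text.lstrip() + "\n"
    "title": try_expr(lambda: "# " + text.lstrip() + "\n"),
    # Mirrors: md_content += shape.text + "\n" (right-hand side is evaluated first)
    "body": try_expr(lambda: "" + (text + "\n")),
    "notes": try_expr(append_notes),
}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

This would explain why the message in the summary and the one in the reproduction differ: they come from different lines.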

Reproduction

import pptx.text.text
from markitdown import MarkItDown

# Keep the original getter so unaffected text frames behave normally.
_orig = pptx.text.text.TextFrame.text.fget

# Simulate python-pptx yielding None: any text frame whose full text is
# exactly "World" now reports None instead.
pptx.text.text.TextFrame.text = property(
    lambda self: None if _orig(self) == "World" else _orig(self)
)

MarkItDown().convert("sample.pptx")  # any pptx with a text frame whose text is exactly "World"
# markitdown._exceptions.FileConversionException:
#  - PptxConverter threw TypeError with message: unsupported operand type(s) for +: 'NoneType' and 'str'

Real-world trigger: a .pptx from a third-party generator with a text-frame run missing its <a:t> child, or a chart/SmartArt element whose title text is None.

Expected

Missing or None text should be treated as an empty string. One bad shape shouldn't fail the entire file.

Proposed fix

md_content += "# " + (shape.text or "").lstrip() + "\n"
md_content += (shape.text or "") + "\n"
md_content += (notes_frame.text or "")
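The same guard could also live in one small helper so the three call sites stay readable. This is only a sketch: `text_or_empty` and `render_shape` are hypothetical names, and the `SimpleNamespace` objects stand in for python-pptx shapes. The real converter does more than this.

```python
from types import SimpleNamespace


def text_or_empty(obj):
    """Return obj.text, treating None (or any falsy value) as an empty string."""
    return obj.text or ""


def render_shape(shape, is_title=False):
    """Mimic the two shape branches of the converter with the guard applied."""
    if is_title:
        return "# " + text_or_empty(shape).lstrip() + "\n"
    return text_or_empty(shape) + "\n"


# Stand-ins for python-pptx shapes: one normal, one whose text is None.
good = SimpleNamespace(text="  Hello")
bad = SimpleNamespace(text=None)

md_content = ""
md_content += render_shape(good, is_title=True)
md_content += render_shape(bad)    # contributes only a newline
md_content += text_or_empty(bad)   # a notes frame with None text adds nothing

print(repr(md_content))
```

With the guard in place, the shape with None text no longer raises and the rest of the slide is still converted.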

Happy to open a PR.

Environment

  • markitdown 0.1.5 (markitdown[all])
  • python-pptx 1.0.2
  • Python 3.12, Ubuntu 24.04
