<a href="https://colab.research.google.com/github/stancsz/notebook-scripts/blob/main/powerpoint_to_markdown.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PowerPoint to Markdown Converter
This script, pptx_to_markdown.py, converts a PowerPoint (.pptx) presentation into a markdown (.md) file that contains the text of each slide and references to the extracted images. It is useful for documentation, for sharing presentations in a text-readable format, or simply for content extraction.

## Features
- Text Extraction: Extracts the text of every shape on each slide and writes it under a per-slide heading in the markdown file.
- Image Extraction: Saves the images from the slides as separate files and references them in the markdown file.

## Prerequisites
Before you run the script, ensure you have the following:

- Python 3.x installed on your system.
- The python-pptx library, which can be installed via pip:

```
pip install python-pptx
```

## Usage
To use the script, follow these steps:

1. Place your PowerPoint file: make sure the .pptx file is accessible to the script, preferably in the same directory for ease of use.
2. Edit the script: change the pptx_to_markdown('your_presentation.pptx', 'output_folder') call so that it points to your PowerPoint file and to the folder where the output should go.
3. Run the script: execute it in your preferred Python environment. It writes a markdown file named output.md to the output folder and saves the extracted images in that same folder.
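
Before running the conversion, it can help to confirm that the path really points to a PowerPoint file. The sketch below is not part of the original script: `check_pptx_path` is a hypothetical helper that uses only the standard library and relies on the fact that a .pptx file is a ZIP package.

```python
import zipfile
from pathlib import Path

def check_pptx_path(pptx_file):
    """Return the resolved path if pptx_file looks like a .pptx package, else raise.

    Hypothetical pre-flight helper; it does not validate the slide contents.
    """
    path = Path(pptx_file)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    if path.suffix.lower() != ".pptx":
        raise ValueError(f"Expected a .pptx file, got: {path.name}")
    # A .pptx file is a ZIP archive, so this catches renamed or truncated files.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"Not a valid ZIP package: {path.name}")
    return path.resolve()
```

The returned path could then be passed as the first argument of `pptx_to_markdown`. Legacy .ppt files would be rejected here, which matches python-pptx, as it only reads the .pptx format.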

In [None]:
!pip install python-pptx

Collecting python-pptx
  Downloading python_pptx-0.6.23-py3-none-any.whl (471 kB)
Collecting XlsxWriter>=0.5.7 (from python-pptx)
  Downloading XlsxWriter-3.1.9-py3-none-any.whl (154 kB)
Installing collected packages: XlsxWriter, python-pptx
Successfully installed XlsxWriter-3.1.9 python-pptx-0.6.23


In [None]:
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
import os

def pptx_to_markdown(pptx_file, output_folder):
    # Create the output folder if it doesn't exist
    os.makedirs(output_folder, exist_ok=True)

    prs = Presentation(pptx_file)
    markdown_content = ""

    for slide_number, slide in enumerate(prs.slides, start=1):
        markdown_content += f"## Slide {slide_number}\n\n"
        image_count = 0
        for shape in slide.shapes:
            # Shapes with a text frame (titles, text boxes, placeholders)
            if shape.has_text_frame and shape.text_frame.text.strip():
                markdown_content += f"{shape.text_frame.text}\n\n"
            # Picture shapes carry the embedded image bytes and file extension
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE and hasattr(shape, "image"):
                image = shape.image
                image_count += 1
                # Number images per slide so they don't overwrite each other,
                # and keep the image's real extension (png, jpg, ...)
                image_filename = f"slide_{slide_number}_image_{image_count}.{image.ext}"
                with open(os.path.join(output_folder, image_filename), "wb") as f:
                    f.write(image.blob)
                # output.md is written to output_folder, so link relative to it
                markdown_content += f"![Image](./{image_filename})\n\n"

    # Write the markdown content to a file in the output folder
    with open(os.path.join(output_folder, "output.md"), "w", encoding="utf-8") as md_file:
        md_file.write(markdown_content)

# Usage
# pptx_to_markdown('your_presentation.pptx', 'output_folder')

In [None]:
# Change the first path to your PowerPoint file and the second to the folder where the output should be written.
pptx_to_markdown('/content/drive/MyDrive/megan_ppt.pptx', '/content/drive/MyDrive/megan_output')
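
To sanity-check the result, the images written to the output folder can be compared with the media files stored inside the presentation. The sketch below is not part of the original script: `list_embedded_media` is a hypothetical helper that opens the .pptx as a ZIP archive and lists the entries under `ppt/media/`, the folder where PowerPoint conventionally keeps embedded media. It assumes that convention holds for the file at hand.

```python
import zipfile

def list_embedded_media(pptx_file):
    """List media entries stored in the .pptx package (hypothetical helper)."""
    with zipfile.ZipFile(pptx_file) as package:
        # Keep only file entries under ppt/media/, skipping directory entries.
        return sorted(
            name
            for name in package.namelist()
            if name.startswith("ppt/media/") and not name.endswith("/")
        )
```

The two counts need not match exactly: a media file can be reused on several slides, and images that live only in layouts or masters are not visited by the slide loop.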