Documentation: Updates for more file support.
Also adds a little more info to rag.rst.
jamie-lemon committed May 24, 2024
1 parent f26b673 commit 8ec2407
Showing 7 changed files with 174 additions and 1 deletion.
44 changes: 44 additions & 0 deletions docs/about-feature-matrix.rst
@@ -39,6 +39,22 @@
   :width: 0
   :height: 0

.. image:: images/icons/icon-docx.svg
   :width: 0
   :height: 0

.. image:: images/icons/icon-pptx.svg
   :width: 0
   :height: 0

.. image:: images/icons/icon-xlsx.svg
   :width: 0
   :height: 0

.. image:: images/icons/icon-hangul.svg
   :width: 0
   :height: 0

.. raw:: html


@@ -145,6 +161,26 @@
background-size: 40px 40px;
}
#feature-matrix .icon.docx {
background: url("_images/icon-docx.svg") 0 0 transparent no-repeat;
background-size: 40px 40px;
}
#feature-matrix .icon.pptx {
background: url("_images/icon-pptx.svg") 0 0 transparent no-repeat;
background-size: 40px 40px;
}
#feature-matrix .icon.xlsx {
background: url("_images/icon-xlsx.svg") 0 0 transparent no-repeat;
background-size: 40px 40px;
}
#feature-matrix .icon.hangul {
background: url("_images/icon-hangul.svg") 0 0 transparent no-repeat;
background-size: 40px 40px;
}
</style>


@@ -172,6 +208,12 @@
<span class="icon svg"><cite>SVG</cite></span>
<span class="icon txt"><cite>TXT</cite></span>
<span class="icon image"><cite id="transFM3">Image</cite></span>
<hr/>
<span class="icon docx"><cite>DOCX</cite></span>
<span class="icon xlsx"><cite>XLSX</cite></span>
<span class="icon pptx"><cite>PPTX</cite></span>
<span class="icon hangul"><cite>HWPX</cite></span>
<span class=""><cite>See <a href="#note">note</a></cite></span>
</td>
<td>
<span class="icon pdf"><cite>PDF</cite></span>
@@ -579,3 +621,5 @@


<br/>

<div id="note"></div>
38 changes: 38 additions & 0 deletions docs/about.rst
@@ -21,6 +21,44 @@ The following table illustrates how |PyMuPDF| compares with other typical soluti
.. include:: about-feature-matrix.rst


----

.. image:: images/icons/icon-docx.svg
   :width: 40
   :height: 40

.. image:: images/icons/icon-xlsx.svg
   :width: 40
   :height: 40

.. image:: images/icons/icon-pptx.svg
   :width: 40
   :height: 40


.. image:: images/icons/icon-hangul.svg
   :width: 40
   :height: 40



.. note::

   A note about **Office** document types (DOCX, XLSX, PPTX) and **Hangul** documents (HWPX): these documents can be loaded into |PyMuPDF|, and you will receive a :ref:`Document <Document>` object.

   There are some caveats:


   - We convert the input to **HTML** to lay out the content.
   - Because of this, the original page separation is lost.

   When saving the result, a faithful representation of the original layout cannot be expected.

   These input files are therefore mostly useful for text extraction.
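
The loading behaviour described in the note can be sketched as follows. This is a minimal illustration rather than part of the commit: the helper names ``is_reflowed_input`` and ``extract_text`` are hypothetical, and the sketch assumes a PyMuPDF release recent enough to open Office and HWPX files. The ``import pymupdf`` name is used where available, with the older ``fitz`` name as a fallback.

```python
from pathlib import Path

# File types named in the note; PyMuPDF lays these out via HTML,
# so the original page breaks are not preserved.
REFLOWED_SUFFIXES = {".docx", ".xlsx", ".pptx", ".hwpx"}


def is_reflowed_input(filename: str) -> bool:
    """Return True if the note's caveats apply to this file type (hypothetical helper)."""
    return Path(filename).suffix.lower() in REFLOWED_SUFFIXES


def extract_text(filename: str) -> str:
    """Open a document with PyMuPDF and return the plain text of all pages."""
    try:
        import pymupdf  # import name in recent PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # legacy import name

    doc = pymupdf.open(filename)  # returns a Document object
    try:
        # Page boundaries here come from the HTML layout, not the source file.
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()
```

Calling ``extract_text("report.docx")`` (a placeholder filename) would return the document text. Because of the caveats above, the page count and page breaks seen by |PyMuPDF| should not be assumed to match those of the original application.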


----

.. _About_Performance:

Performance
19 changes: 19 additions & 0 deletions docs/images/icons/icon-docx.svg
35 changes: 35 additions & 0 deletions docs/images/icons/icon-hangul.svg
19 changes: 19 additions & 0 deletions docs/images/icons/icon-pptx.svg
18 changes: 18 additions & 0 deletions docs/images/icons/icon-xlsx.svg
2 changes: 1 addition & 1 deletion docs/rag.rst
@@ -10,7 +10,7 @@ Integrating |PyMuPDF| into your :title:`Large Language Model (LLM)` framework an

There are a few well known :title:`LLM` solutions which have their own interfaces with |PyMuPDF| - it is a fast growing area, so please let us know if you discover any more!

If you need to export to :title:`Markdown`:
If you need to export to :title:`Markdown` or obtain a :title:`LlamaIndex` Document from a file:

.. raw:: html

