In [1]:
!git clone --depth=1 --branch 1.25.1 https://github.com/ArtifexSoftware/mupdf

Cloning into 'mupdf'...
remote: Enumerating objects: 1224, done.
remote: Counting objects: 100% (1224/1224), done.
remote: Compressing objects: 100% (1073/1073), done.
remote: Total 1224 (delta 215), reused 543 (delta 130), pack-reused 0 (from 0)
Receiving objects: 100% (1224/1224), 38.29 MiB | 12.27 MiB/s, done.
Resolving deltas: 100% (215/215), done.
Note: switching to '5d61c9df9c5d72520335898ce53046aa53a4e8a7'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false



In [5]:
from pathlib import Path

from langchain_community.document_loaders import TextLoader

source_dir = Path('./mupdf')

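def load_cpp_files(source_dir):
    """Load every C/C++ source and header file under source_dir.

    Returns a list of lists: TextLoader.load() returns a list of documents
    for each file, and the lists are flattened before splitting.
    """
    docs = []
    suffixes = ['.c', '.h', '.cc', '.cpp']
    for suffix in suffixes:
        for filepath in source_dir.glob(f'**/*{suffix}'):
            # latin1 maps every byte to a character, so decoding never fails
            loader = TextLoader(filepath, encoding='latin1')
            docs.append(loader.load())
    return docs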


cpp_docs = load_cpp_files(source_dir)
print(len(cpp_docs))

521
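
The count above says how many files were loaded, but not how they break down by type. A minimal sketch of a sanity check, using only the standard library, is shown below. The helper name `count_by_suffix` is hypothetical and not part of LangChain; it walks the tree once and tallies files per suffix, which should roughly mirror what the glob patterns in the loader pick up.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root, suffixes=('.c', '.h', '.cc', '.cpp')):
    """Count regular files under root, grouped by suffix (hypothetical helper)."""
    counts = Counter()
    for path in Path(root).rglob('*'):
        # Only count real files whose final suffix is one we care about
        if path.is_file() and path.suffix in suffixes:
            counts[path.suffix] += 1
    return counts
```

Calling `count_by_suffix('./mupdf')` returns a `Counter`, so `sum(counts.values())` can be compared against the number of loaded files.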


In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

chunk_size = 1000
chunk_overlap = 200
def split_docs(docs):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return text_splitter.split_documents(docs)

# Flatten the per-file lists returned by load_cpp_files into one list of documents
docs_list = [item for sublist in cpp_docs for item in sublist]
chunked_docs = split_docs(docs_list)
print(len(chunked_docs))

19740
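
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-window chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as blank lines and then merges pieces), so its chunk boundaries and counts will differ. The function name `sliding_chunks` is hypothetical; it only illustrates that consecutive chunks share `chunk_overlap` characters.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the values used in this notebook, each window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next.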


In [4]:
!rm -rf cpp_code_index

In [8]:
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = FAISS.from_documents(chunked_docs, embeddings)
vectorstore.save_local("cpp_code_index")
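
Conceptually, the index stores one embedding vector per chunk, and retrieval embeds the query and returns the chunks whose vectors are closest to it. The sketch below illustrates that idea with toy two-dimensional vectors and cosine similarity in plain Python. It is an illustration under assumptions: the names `cosine` and `top_k` are hypothetical, the value `k=4` is chosen for the example, and the real index may rank by a different metric (for example L2 distance) over much higher-dimensional vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, indexed, k=4):
    """Return the texts of the k (text, vector) pairs most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The quality of the answers later in the notebook depends on this ranking step: if the nearest chunks are unrelated to the question, the model has little relevant context to work from.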

In [13]:
from langchain.chains import RetrievalQA
from langchain_ollama import ChatOllama

# Set up the QA chain
model = 'llama3.1'
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOllama(model=model, temperature=0.0),
    retriever=vectorstore.as_retriever(),
)

# Ask a question
query = "how to create a new PDF using mupdf library?"
answer = qa_chain.invoke(query)
print(answer)

{'query': 'how to create a new PDF using mupdf library?', 'result': 'Based on the provided context, here\'s an example of how you can create a new PDF using the MuPDF library:\n\n```c\n#include "mupdf/fitz/display-list.h"\n#include "mupdf/pdf/document.h"\n\nint main() {\n    fz_context *ctx = fz_new_context(NULL, NULL, FZ_STORE_DEFAULT);\n    if (!ctx) {\n        fprintf(stderr, "Could not create global context.\\n");\n        return EXIT_FAILURE;\n    }\n\n    /* Register the document handlers (only really need PDF, but this is\n     * the simplest way. */\n    fz_register_document_handlers(ctx);\n\n    fz_try(ctx) {\n        /* Create a new PDF document with one page. */\n        pdf_document *pdf = pdf_new_document(ctx, 1);\n        if (!pdf) {\n            fprintf(stderr, "Could not create PDF document.\\n");\n            return EXIT_FAILURE;\n        }\n\n        /* Get the first page of the document. */\n        pdf_page *page = pdf_get_page(ctx, pdf, 0);\n\n        /* Create a n

In [14]:
print(answer['result'])

Based on the provided context, here's an example of how you can create a new PDF using the MuPDF library:

```c
#include "mupdf/fitz/display-list.h"
#include "mupdf/pdf/document.h"

int main() {
    fz_context *ctx = fz_new_context(NULL, NULL, FZ_STORE_DEFAULT);
    if (!ctx) {
        fprintf(stderr, "Could not create global context.\n");
        return EXIT_FAILURE;
    }

    /* Register the document handlers (only really need PDF, but this is
     * the simplest way. */
    fz_register_document_handlers(ctx);

    fz_try(ctx) {
        /* Create a new PDF document with one page. */
        pdf_document *pdf = pdf_new_document(ctx, 1);
        if (!pdf) {
            fprintf(stderr, "Could not create PDF document.\n");
            return EXIT_FAILURE;
        }

        /* Get the first page of the document. */
        pdf_page *page = pdf_get_page(ctx, pdf, 0);

        /* Create a new font object for the page. */
        pdf_font *font = pdf_new_font(ctx, "Helvetica", 12);
       

In [15]:
query = "how to create a new PDF with an image inside using mupdf library?"
answer = qa_chain.invoke(query)
print(answer)

{'query': 'how to create a new PDF with an image inside using mupdf library?', 'result': 'Unfortunately, the provided code snippet does not include any functionality for creating a new PDF document or adding images to it. However, based on the MuPDF documentation and other sources, I can guide you through the process.\n\nTo create a new PDF with an image inside using the MuPDF library, you would need to:\n\n1. Initialize the MuPDF context.\n2. Create a new PDF document object.\n3. Add an image to the document\'s page.\n4. Save the document to a file.\n\nHere is some sample code that demonstrates these steps:\n```c\n#include <mupdf/fitz.h>\n#include <stdio.h>\n\nint main(int argc, char **argv)\n{\n    fz_context *ctx = fz_new_context(NULL);\n    if (!ctx) {\n        return 1;\n    }\n\n    // Create a new PDF document object.\n    fz_document *doc = fz_new_document(ctx, NULL, 0);\n    if (!doc) {\n        fz_drop_context(ctx);\n        return 1;\n    }\n\n    // Add an image to the firs

In [17]:
print(answer['result'])

Unfortunately, the provided code snippet does not include any functionality for creating a new PDF document or adding images to it. However, based on the MuPDF documentation and other sources, I can guide you through the process.

To create a new PDF with an image inside using the MuPDF library, you would need to:

1. Initialize the MuPDF context.
2. Create a new PDF document object.
3. Add an image to the document's page.
4. Save the document to a file.

Here is some sample code that demonstrates these steps:
```c
#include <mupdf/fitz.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    fz_context *ctx = fz_new_context(NULL);
    if (!ctx) {
        return 1;
    }

    // Create a new PDF document object.
    fz_document *doc = fz_new_document(ctx, NULL, 0);
    if (!doc) {
        fz_drop_context(ctx);
        return 1;
    }

    // Add an image to the first page of the document.
    fz_page *page = fz_add_page(doc, 1);
    if (!page) {
        fz_drop_document(doc);
      

In [18]:
query = "is there a c++ binding for mupdf library?"
answer = qa_chain.invoke(query)
print(answer['result'])

Yes, based on the provided context, it appears that there is a C++ binding for the MuPDF library. The `MuOfficeLib_run` function takes a `void (*fn)(fz_context *ctx, void *arg)` parameter, which suggests that there are functions available in the MuPDF library that can be called from C++. Additionally, the `JMETHOD` macros and the use of `j_common_ptr` and other types suggest that the MuPDF library is being used with a Java Native Interface (JNI) to interact with Java code.

However, it's worth noting that the provided context does not explicitly state that there is a C++ binding for the MuPDF library. It only shows some function declarations and macros related to the MuOfficeLib and MuPDF libraries.

If you're looking for more information or a specific C++ API documentation for the MuPDF library, I would recommend searching online or checking the official MuPDF website for more details.
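
When an answer hedges like this, it helps to see which files the retrieved chunks came from. One option is to build the chain with `return_source_documents=True`, in which case the result dict should also carry a `source_documents` list. The sketch below is a hypothetical helper, `summarize_sources`, that tallies those chunks per file. It assumes each document exposes a `metadata` dict with a `source` key, as documents loaded by `TextLoader` do; the stand-in objects in the usage example are made up for illustration.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_sources(answer):
    """Count retrieved chunks per source file in a chain result (hypothetical helper)."""
    docs = answer.get('source_documents', [])
    return Counter(doc.metadata.get('source', '<unknown>') for doc in docs)


# Usage with stand-in objects that mimic the shape of retrieved documents
fake_answer = {
    'result': '...',
    'source_documents': [
        SimpleNamespace(metadata={'source': 'a.h'}, page_content='...'),
        SimpleNamespace(metadata={'source': 'a.h'}, page_content='...'),
        SimpleNamespace(metadata={'source': 'b.c'}, page_content='...'),
    ],
}
print(summarize_sources(fake_answer).most_common())
```

If most retrieved chunks come from files unrelated to the question, the retrieval step rather than the language model is the likely place to look first.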


In [20]:
query = "how to use the c++ binding for mupdf library?"
answer = qa_chain.invoke(query)
print(answer['result'])

The MuPDF C++ binding is not explicitly documented in the provided code snippet. However, based on the context and the `MuOfficeLib_run` function signature, it appears that you can use the MuPDF C++ binding by calling the `MuOfficeLib_run` function with a pointer to a MuPDF function and some opaque data.

Here's an example of how you might use the MuPDF C++ binding to render a single page from a PDF document:

```cpp
int main(int argc, char **argv)
{
    // Initialize MuPDF context
    fz_context *ctx = fz_new_context(NULL);
    if (!ctx) {
        return 1;
    }

    // Load the PDF document
    char *input = argv[1];
    float zoom = atof(argv[3]);
    int rotate = atoi(argv[4]);
    int page_number = atoi(argv[2]);

    fz_document *doc = fz_load_from_file(ctx, input, NULL);
    if (!doc) {
        return 1;
    }

    // Get the first page
    fz_page *page = fz_get_page(doc, page_number - 1);

    // Create a pixmap to render the page into
    int width, height;
    fz_pixmap *pi