# Introduction to MMAPIS Project

Welcome to our comprehensive guide to the MMAPIS project. MMAPIS, a multi-modal automated academic paper interpretation system, helps you quickly grasp the key points of a target article, addressing the scientific community's need to keep pace with research in a rapidly evolving digital landscape.

This Jupyter notebook is a step-by-step tutorial on using MMAPIS and its APIs to extract and interpret key information from academic papers. We cover the following topics:

- Preprocessing
- 2-stage Summarization
- Downstream Applications

## Prerequisites

Before diving in, ensure you have the following prerequisites:

1. Follow the [Readme/# 🚀How to Run](../Readme.md) instructions to install the required packages and set up the environment.
2. Configure your environment in the [config](../config) directory.

## Objectives

In this module, we aim to:

1. Introduce the MMAPIS project.
2. Demonstrate how to use MMAPIS APIs for extracting and interpreting key information from academic papers.
3. Start up your server:

   - If running locally, follow:
     ```bash
     cd path/to/MMAPIS
     uvicorn backend:app --reload --port <your port>
     ```
     Example:
     ```bash
     uvicorn backend:app --reload --port 8000
     ```

   - If running on a server:
     ```bash
     uvicorn backend:app --host 0.0.0.0 --port <your port> --reload
     ```
     Example:
     ```bash
     uvicorn backend:app --host 0.0.0.0 --port 5010 --reload
     ```

Once the backend is running, the FastAPI interactive API docs list the detailed parameters of every endpoint at http://127.0.0.1:8000/docs (for a local backend on port 8000).
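By default, FastAPI also serves the machine-readable schema behind these docs at `/openapi.json`. The sketch below, assuming the default FastAPI setup and a backend at `http://127.0.0.1:8000`, lists the available endpoints; `fetch_openapi` and `list_endpoints` are hypothetical helpers, not part of MMAPIS.

```python
import json
import urllib.request

# HTTP methods allowed as keys of an OpenAPI path item; other keys are metadata.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str) -> dict:
    """Download the OpenAPI schema that FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/openapi.json") as resp:
        return json.load(resp)


def list_endpoints(spec: dict) -> list:
    """Return sorted (HTTP method, path) pairs declared in an OpenAPI schema."""
    pairs = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                pairs.append((method.upper(), path))
    return sorted(pairs)


# Usage (requires a running backend):
# for method, path in list_endpoints(fetch_openapi("http://127.0.0.1:8000")):
#     print(method, path)
```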

## Preprocessing

### initialize the library

In [1]:
import requests
import sys
import os
# make sure the MMAPIS package is importable
sys.path.append(os.path.abspath("../../"))
from MMAPIS.tools import extract_zip_from_bytes,download_pdf,get_pdf_name
from MMAPIS.config.config import GENERAL_CONFIG, OPENAI_CONFIG, ARXIV_CONFIG, NOUGAT_CONFIG,INTEGRATE_PROMPTS,SECTION_PROMPTS,ALIGNMENT_CONFIG,LOGGER_MODES,TTS_CONFIG, APPLICATION_PROMPTS
# backend address: this assumes the backend runs locally on port 8000;
# if it runs elsewhere, set url to http://<your ip>:<your port>
url = "http://127.0.0.1:8000"

MMAPIS_Dir = os.path.abspath("../")
MMAPIS_Dir

INFO:root:Best GPU: 0. Batch size: 4
INFO:root:Loading logging file from d:\git\MMAPIS\config\logging.ini


'd:\\git\\MMAPIS'

### Crawl from Arxiv

You can search for your target articles of interest using parameters similar to those on [arxiv](https://arxiv.org/search/?query=&searchtype=all&abstracts=show&order=-announced_date_first&size=50).

If your request body is invalid, you will receive an error message like the following:
```json
{
    "status": "request error",
    "message": "input params error: error 1:type: bool_parsing, location: request body, param return_md input:5, msg: Input should be a valid boolean, unable to interpret input"
}
```
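Every endpoint in this notebook replies with a `status` field and a `message` field. A small helper such as the hypothetical `unwrap` below turns an error reply into a Python exception instead of a silently wrong value. It assumes that `status` equals `"success"` on success, as in the outputs shown here, and works on the dict returned by `response.json()`.

```python
class MMAPISRequestError(RuntimeError):
    """Raised when an MMAPIS endpoint reports a non-success status (hypothetical)."""


def unwrap(payload: dict):
    """Return payload["message"] if payload["status"] is "success", otherwise raise.

    Assumes the {"status": ..., "message": ...} envelope seen in this notebook.
    """
    status = payload.get("status")
    if status != "success":
        raise MMAPISRequestError(f"{status}: {payload.get('message')}")
    return payload["message"]


# Usage (requires a running backend):
# links = unwrap(requests.post(url + "/get_links/", json=params).json())
```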
The following examples use [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf), the paper that introduced the Transformer model.

In [2]:
import requests
arxiv_url = url + "/get_links/"
params = {
    "key_word":"attention is all you need",
    "max_return":5,
    "return_md":False,
    "order":"announced_date_first"
}

arxiv_response = requests.post(arxiv_url,json=params)
arxiv_response

<Response [200]>

In [3]:
arxiv_response.json()

{'status': 'success',
 'message': [{'pdf_url': 'https://arxiv.org/pdf/1706.03762',
   'title': 'Attention Is All You Need',
   'author': 'Authors:\nAshish Vaswani, \n      \n      Noam Shazeer, \n      \n      Niki Parmar, \n      \n      Jakob Uszkoreit, \n      \n      Llion Jones, \n      \n      Aidan N. Gomez, \n      \n      Lukasz Kaiser, \n      \n      Illia Polosukhin',
   'abstract': 'Abstract:\n      \n        …are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurr…\n        '},
  {'pdf_url': 'https://arxiv.org/pdf/1905.13497',
   'title': 'Attention Is (not) All You Need for Commonsense Reasoning',
   'author': 'Authors:\nTassilo Klein, \n      \n      Moin Nabi',
   'abstract': 'Abstract:\n      \n        …stro

Download the PDF for subsequent processing:

In [4]:

pdf_ls = arxiv_response.json()["message"]
transformer_url = pdf_ls[0]["pdf_url"]
flag, transformer_pdf_path = download_pdf(transformer_url,save_dir=os.path.join(MMAPIS_Dir,"res"))
transformer_pdf_path

WindowsPath('d:/git/MMAPIS/res/1706_03762/1706_03762.pdf')

### nougat

Nougat converts a PDF file into the corresponding Markdown file.

Because of how HTTP works, uploaded files must be sent as form data. A FastAPI path operation can declare several `File` and `Form` parameters, but it cannot also declare `Body` fields that are expected as JSON, because the request body is then encoded as `multipart/form-data` instead of `application/json`. For details, see the [FastAPI Documentation on Request Forms and Files](https://fastapi.tiangolo.com/tutorial/request-forms-and-files/#define-file-and-form-parameters).

Consequently, parameters for this endpoint must be sent as form data (the `data=` argument of `requests.post`), not as JSON. You can either provide the URL of a PDF file or upload the PDF's bytes.
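Form fields are flat key/value strings, so a nested object (such as the `summarizer_params` dict used later) has to be serialized to a JSON string before it is sent and decoded again on the server. The sketch below uses the standard library's `urllib.parse` to illustrate this round trip for a URL-encoded form; it illustrates the encoding only, is not MMAPIS code, and the parameter values are made up.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical form parameters: a plain flag plus a nested dict.
summarizer_params = {"rpm_limit": 3, "num_processes": 2}
form = {
    "markdown": "true",
    # Nested structures must be flattened into a single string field.
    "summarizer_params": json.dumps(summarizer_params),
}

body = urlencode(form)  # the URL-encoded form body
print(body)

# Server side: parse the form, then decode the JSON field back into a dict.
decoded = parse_qs(body)
restored = json.loads(decoded["summarizer_params"][0])
print(restored)
```

The same rule applies to `requests.post(..., data=...)`: a raw dict passed as a form value is not sent as JSON (requests iterates over it and sends only its keys), which is why the blog and unified-API cells wrap `summarizer_params` in `json.dumps`.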

#### provide url

In [5]:
import json
# you can only use pdf_url to process the pdf
nougat_url = url + "/pdf2md/"
params = {
    'pdf':transformer_url,
    "markdown":True
}
nougat_response = requests.post(nougat_url,data=params)
nougat_response

<Response [200]>

In [6]:
nougat_response.json()

{'status': 'success',
 'message': [{'file_name': '1706_03762',
   'text': '# Attention Is All You Need\n\n Ashish Vaswani\n\nGoogle Brain\n\navaswani@google.com\n\n&Noam Shazeer1\n\nGoogle Brain\n\nnoam@google.com\n\n&Niki Parmar1\n\nGoogle Research\n\nnikip@google.com\n\n&Jakob Uszkoreit1\n\nGoogle Research\n\nusz@google.com\n\n&Llion Jones1\n\nGoogle Research\n\nllion@google.com\n\n&Aidan N. Gomez1\n\nUniversity of Toronto\n\naidan@cs.toronto.edu\n\n&Lukasz Kaiser1\n\nGoogle Brain\n\nlukaszkaiser@google.com\n\n&Illia Polosukhin1\n\nillia.polosukhin@gmail.com\n\nEqual contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in n

#### upload bytes data of pdf

In [7]:
# Alternatively, upload the PDF content as bytes. To process several files,
# add their paths to pdf_path; each becomes one "pdf_content" part.
nougat_url = url + "/pdf2md/"
pdf_path = [transformer_pdf_path]
files = []
for i in pdf_path:
    with open(i,"rb") as f:
        # read the bytes now so that no file handle is left open
        files.append(("pdf_content",(os.path.basename(i), f.read(), "application/pdf")))

params = {
    "markdown":True
}
nougat_response = requests.post(nougat_url,files=files,data=params)
nougat_response

<Response [200]>

In [8]:
if nougat_response.status_code == 200:
    article_ls = nougat_response.json()["message"]

article_ls

[{'file_name': '1706_03762',
  'text': '# Attention Is All You Need\n\n Ashish Vaswani\n\nGoogle Brain\n\navaswani@google.com\n\n&Noam Shazeer1\n\nGoogle Brain\n\nnoam@google.com\n\n&Niki Parmar1\n\nGoogle Research\n\nnikip@google.com\n\n&Jakob Uszkoreit1\n\nGoogle Research\n\nusz@google.com\n\n&Llion Jones1\n\nGoogle Research\n\nllion@google.com\n\n&Aidan N. Gomez1\n\nUniversity of Toronto\n\naidan@cs.toronto.edu\n\n&Lukasz Kaiser1\n\nGoogle Brain\n\nlukaszkaiser@google.com\n\n&Illia Polosukhin1\n\nillia.polosukhin@gmail.com\n\nEqual contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, 

In [9]:
from pathlib import Path
transformer_path = Path(transformer_pdf_path)
transformer_md_path = transformer_path.with_suffix(".md")
with open(transformer_md_path,"w") as f:
    f.write(article_ls[0]['text'])
    print(f"Markdown file has been saved in {transformer_md_path}")
transformer_md_path

Markdown file has been saved in d:\git\MMAPIS\res\1706_03762\1706_03762.md


WindowsPath('d:/git/MMAPIS/res/1706_03762/1706_03762.md')

### alignment


To align text with the images of a paper, pass the text to align as `text` and the PDF from which images are extracted, either as a URL via `pdf` or as bytes via `pdf_content`. Supplying the paper's raw Markdown as `raw_md_text` is recommended: it helps identify the document's structure and so improves alignment accuracy.

#### alignment with pdf url

In [10]:
align_url = url + "/alignment/"
markdown_file = transformer_md_path
pdf_file = transformer_pdf_path

with open(markdown_file,"r") as f:
    text = f.read()

params = {
    "text":text,
    "raw_md_text":text,
    "pdf": transformer_url
}

align_response = requests.post(align_url,data=params)
align_response

<Response [200]>

#### align with pdf bytes data

In [11]:
align_url = url + "/alignment/"
markdown_file = transformer_md_path
pdf_file = transformer_pdf_path
files = []

with open(markdown_file,"r") as f:
    text = f.read()

with open(pdf_file,"rb") as f:
    files.append(("pdf_content",f))
    params = {
        "text":text,
        "raw_md_text":text
    }

    align_response = requests.post(align_url,files=files,data=params)
align_response

<Response [200]>

In [12]:
extract_dir = os.path.join(os.path.dirname(markdown_file), "alignment_raw")
os.makedirs(extract_dir,exist_ok=True)
if align_response.status_code == 200:
    extract_zip_from_bytes(align_response.content,extract_dir)
    print(f"Alignment files has been saved in {extract_dir}")

Alignment files has been saved in d:\git\MMAPIS\res\1706_03762\alignment_raw
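The alignment endpoint returns its results as a ZIP archive in the response body, which `extract_zip_from_bytes` unpacks. If you want to inspect such a payload without MMAPIS installed, a minimal stand-in built on the standard library's `zipfile` could look like the sketch below; `extract_zip_bytes` is a hypothetical name, and this is not the MMAPIS implementation.

```python
import io
import os
import zipfile


def extract_zip_bytes(data: bytes, extract_dir: str) -> list:
    """Unpack an in-memory ZIP archive into extract_dir and return its member names."""
    os.makedirs(extract_dir, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(extract_dir)
        return archive.namelist()


# Usage: extract_zip_bytes(align_response.content, extract_dir)
```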


## 2-stage Summarization

### section level summary

To generate section-level summaries, configure the OpenAI model settings and pass the text of your article. The FastAPI interactive API docs on the backend describe each parameter in detail.

Because the summarizer sends several requests in parallel, set `rpm_limit` to match your API key's rate limit: if your key is limited to three requests per minute, set `rpm_limit` to 3; if your key has no rate limit, set `rpm_limit` to 0 for faster responses.
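To make `rpm_limit` concrete: one simple way to respect a limit of 3 requests per minute is to space calls at least 60 / 3 = 20 seconds apart, while 0 disables throttling. The sketch below is a hypothetical client-side throttle illustrating that arithmetic; it is an assumption for illustration, not how MMAPIS schedules requests internally.

```python
import time


def min_interval(rpm_limit: int) -> float:
    """Seconds between requests for a requests-per-minute limit; 0 disables throttling."""
    if rpm_limit < 0:
        raise ValueError("rpm_limit must be >= 0")
    return 0.0 if rpm_limit == 0 else 60.0 / rpm_limit


class Throttle:
    """Sleep just long enough to keep evenly spaced calls under rpm_limit per minute."""

    def __init__(self, rpm_limit: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval(rpm_limit)
        self.clock = clock  # injectable for testing
        self.sleep = sleep
        self._last = None  # time of the previous call

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None and self.interval > 0:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now
```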

In [13]:
import requests
section_summry_url = url + "/section_level_summary/"
file_path = transformer_md_path
with open(file_path,"r") as f:
    text = f.read()


params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "article_text":text,
    "init_grid":3,
    "max_grid":4,
    "summary_prompts":SECTION_PROMPTS,
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    }
}


section_summry_response = requests.post(section_summry_url,json=params)
section_summry_response

<Response [200]>

In [14]:
section_summry_response.json()

{'status': 'success',
 'message': '# Attention Is All You Need\n\n- Authors: Ashish Vaswani avaswani@google.com &Noam Shazeer1 noam@google.com &Niki Parmar1 nikip@google.com &Jakob Uszkoreit1 &Llion Jones1 llion@google.com &Aidan N. Gomez1 &Lukasz Kaiser1 lukaszkaiser@google.com illia.polosukhin@gmail.com\n\n- Affiliations: Google Brain Google Brain Google Research Google Research usz@google.com Google Research University of Toronto aidan@cs.toronto.edu Google Brain &Illia Polosukhin1 Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, implemented, tuned and evaluated countless mo

In [15]:
if section_summry_response.status_code == 200:
    section_summry = section_summry_response.json()["message"]
    section_summary_path = Path(file_path).with_name("section_summary.md")
    with open(section_summary_path,"w") as f:
        f.write(section_summry)
        print(f"Section summary has been saved in {section_summary_path}")

Section summary has been saved in d:\git\MMAPIS\res\1706_03762\section_summary.md


You can also align the section summaries with the figures of the corresponding PDF, supplied by URL or as a file, for a multimodal reading of the paper.

In [16]:
text = section_summry
pdf_file = transformer_pdf_path
files = []

raw_md_path = transformer_md_path
with open(raw_md_path,"r") as f:
    raw_md_text = f.read()

with open(pdf_file,"rb") as f:
    files.append(("pdf_content",f))
    params = {
        "text":text,
        "raw_md_text":raw_md_text
    }
    section_align_response = requests.post(align_url,files=files,data=params)
section_align_response

<Response [200]>

In [17]:
extract_dir = os.path.join(os.path.dirname(transformer_path), "alignment_section")
if not os.path.exists(extract_dir):
    os.makedirs(extract_dir)
if section_align_response.status_code == 200:
    extract_zip_from_bytes(section_align_response.content,extract_dir)
    print(f"Alignment files has been saved in {extract_dir}")

Alignment files has been saved in d:\git\MMAPIS\res\1706_03762\alignment_section


### Document Level Summary

This stage improves the readability and coherence of the section-level summaries while keeping them consistent and informative. To build a document-level summary, pass the previously generated section summaries as `section_summaries`. To customize the result, edit the prompts in `params` (here `integrate_prompts`); omit them to use the defaults.

In [18]:
document_summary_url = url + "/document_level_summary/"
text = section_summry

params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "section_summaries":text,
    "integrate_prompts":INTEGRATE_PROMPTS,
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    }
}

document_summary_response = requests.post(document_summary_url,json=params)
document_summary_response


<Response [200]>

In [19]:
if document_summary_response.status_code == 200:
    document_level_summary = document_summary_response.json()["message"]
    with open(Path(file_path).with_name("document_summary.md"),"w") as f:
        f.write(document_level_summary)
        print(f"Document summary has been saved in {Path(file_path).with_name('document_summary.md')}")

Document summary has been saved in d:\git\MMAPIS\res\1706_03762\document_summary.md


In [20]:
text = document_level_summary
pdf_file = transformer_pdf_path
files = []
with open(pdf_file,"rb") as f:
    files.append(("pdf_content",f))

    params = {
        "text":text,
        "raw_md_text":raw_md_text
    }
    document_align_response = requests.post(align_url,files=files,data=params)
document_align_response

<Response [200]>

In [21]:
extract_dir = os.path.join(os.path.dirname(transformer_path), "alignment_document")
if not os.path.exists(extract_dir):
    os.makedirs(extract_dir)
if document_align_response.status_code == 200:
    extract_zip_from_bytes(document_align_response.content,extract_dir)
    print(f"Alignment files has been saved in {extract_dir}")

Alignment files has been saved in d:\git\MMAPIS\res\1706_03762\alignment_document


## Application

The Application phase covers four downstream applications: blog generation, speech generation, recommendation, and multimodal question answering (QA). Each application has its own endpoint, and a unified `/app/` endpoint gives access to all of them.

### blog generation

The generated blog is written to read like human-authored content, though it may be less detailed than the section-level and document-level summaries.

As in the previous steps, supply the PDF (as a URL or as bytes) together with the raw Markdown via `raw_md_text` to improve alignment accuracy. You must also provide the results of the two-stage summarization so that the blog text is coherent and engaging. Because this endpoint takes form data, structured parameters such as `blog_prompts` and `summarizer_params` are sent as JSON strings with `json.dumps`.

In [22]:
import json
blog_url = url + "/blog_generation/"

params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "raw_md_text":raw_md_text,
    "document_level_summary":document_level_summary,
    "section_summary":section_summry,
    "blog_prompts":json.dumps(APPLICATION_PROMPTS["blog_prompts"]),
    "init_grid":ALIGNMENT_CONFIG["init_grid"],
    "max_grid":ALIGNMENT_CONFIG["max_grid"],
    "threshold":ALIGNMENT_CONFIG["threshold"],
    "file_name":"blog",
    "summarizer_params": json.dumps(
        {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    })
}

files = []
with open(transformer_pdf_path,"rb") as f:
    files.append(("pdf_content",f))
    blog_response = requests.post(blog_url,data=params,files=files)
blog_response



<Response [200]>

In [23]:
if blog_response.status_code == 200:
    extract_dir = os.path.join(os.path.dirname(transformer_path), "alignment_blog")
    if not os.path.exists(extract_dir):
        os.makedirs(extract_dir)
    extract_zip_from_bytes(blog_response.content,extract_dir)
    print(f"Alignment files has been saved in {extract_dir}")

Alignment files has been saved in d:\git\MMAPIS\res\1706_03762\alignment_blog


#### Unified API

With the unified `/app/` endpoint, you can access all four applications by changing only the `usage` parameter. The request is redirected to an HTML page, whose address is available as `response.url`, for a more dynamic and interactive display.
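The `usage` values used in this notebook are `blog`, `recommend`, `speech`, and `qa`, and because `/app/` takes form data, `summarizer_params` must be serialized with `json.dumps`. A hypothetical helper such as `build_app_form` can enforce both rules in one place; its field names follow the cells in this notebook, but the helper itself is not part of MMAPIS.

```python
import json
from typing import Optional

APP_USAGES = {"blog", "recommend", "speech", "qa"}  # values used in this notebook


def build_app_form(usage: str, api_key: str, base_url: str,
                   document_level_summary: str, raw_md_text: str,
                   summarizer_params: dict,
                   section_summary: Optional[str] = None) -> dict:
    """Assemble the form fields for POST /app/ (hypothetical helper)."""
    if usage not in APP_USAGES:
        raise ValueError(f"unknown usage {usage!r}; expected one of {sorted(APP_USAGES)}")
    form = {
        "api_key": api_key,
        "base_url": base_url,
        "usage": usage,
        "document_level_summary": document_level_summary,
        "raw_md_text": raw_md_text,
        # Form fields are flat strings, so nested settings travel as JSON text.
        "summarizer_params": json.dumps(summarizer_params),
    }
    if section_summary is not None:
        form["section_summary"] = section_summary
    return form
```

It would be used as `requests.post(url + "/app/", data=build_app_form("blog", ...), files=files)`.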

In [24]:
app_url = url + "/app/"
params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "raw_md_text":raw_md_text,
    "document_level_summary":document_level_summary,
    "section_summary":section_summry,
    "usage":"blog",
    "summarizer_params": json.dumps(
        {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    })
}
files = []
with open(transformer_pdf_path,"rb") as f:
    files.append(("pdf_content",f))
    app_blog_response = requests.post(app_url,data=params,files=files)

app_blog_response

<Response [200]>

In [25]:
app_blog_response.url

'http://127.0.0.1:8000/index/blog_html/8a782298-e7c5-49ac-b96a-11be07cda7b1.html'

In [26]:
import webbrowser
if app_blog_response.status_code == 200:
    webbrowser.open_new_tab(app_blog_response.url)

### recommendation generation


This endpoint rates the paper based on the raw article text and the document-level summary. It returns a list of six assessment dimensions, each with a `title`, a `score`, and `comments`, for a varied assessment and easy comparison.

In [27]:
recommendation_url = url + "/recommendation_generation/"
raw_text_path = transformer_md_path
with open(raw_text_path,"r") as f:
    raw_text = f.read()

params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "document_level_summary":document_level_summary,
    "raw_text":raw_text,
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]
    }
}

recommendation_response = requests.post(recommendation_url,json=params)
recommendation_response


<Response [200]>

In [28]:
recommendation_response.json()

{'status': 'success',
 'message': [{'title': 'Clarity of Objectives and Central Theme',
   'score': 9,
   'comments': 'The objectives and central theme of the paper are presented with exceptional clarity. The paper aims to introduce the Transformer model, based solely on attention mechanisms, as a superior alternative to recurrent or convolutional neural networks for sequence transduction tasks. The emphasis on translation quality, parallelizability, and training time sets clear objectives for the research.'},
  {'title': 'Appropriateness and Accuracy of Methods',
   'score': 8,
   'comments': 'The methods employed in the paper are well-suited to the research question of comparing the Transformer model to existing recurrent or convolutional models in machine translation tasks. The use of attention mechanisms, self-attention, multi-head attention, position-wise feed-forward networks, embeddings, and softmax functions is appropriate for achieving the research objectives. The detailed des

In [29]:
recommendation = recommendation_response.json()["message"]
recommendation_text = ""
for item in recommendation:
    # one bullet per dimension, with its score and comments nested below it
    recommendation_text += f"- {item['title']}\n"
    recommendation_text += f"  - {item['score']}\n"
    recommendation_text += f"  - {item['comments']}\n"


with open(Path(raw_text_path).with_name("recommendation.md"),"w") as f:
    f.write(recommendation_text)
    print(f"Recommendation has been saved in {Path(raw_text_path).with_name('recommendation.md')}")
    

Recommendation has been saved in d:\git\MMAPIS\res\1706_03762\recommendation.md


#### Unified API

In [30]:
app_url = url + "/app/"
params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "raw_md_text":raw_md_text,
    "document_level_summary":document_level_summary,
    "section_summary":section_summry,
    "usage":"recommend",
    "summarizer_params": json.dumps(
        {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    })
}

app_recommend_response = requests.post(app_url,data=params)

app_recommend_response

<Response [200]>

In [31]:
webbrowser.open_new_tab(app_recommend_response.url)

True

### TTS generation

This step produces a broadcast-style script, written to be clear and easy to follow when spoken, and then converts it to speech as an MP3 file. The result is well suited to formats such as a morning broadcast.

In [32]:
broadcast_url = url + "/broadcast_generation/"

params = {
    "llm_api_key":OPENAI_CONFIG["api_key"],
    "llm_base_url":OPENAI_CONFIG["base_url"],
    "document_level_summary":document_level_summary,
    "section_summaries":section_summry,
    "broadcast_prompts": APPLICATION_PROMPTS["broadcast_prompts"],
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]
    }
}

broadcast_response = requests.post(broadcast_url,json=params)
broadcast_response



<Response [200]>

In [33]:
broadcast_response.json()

{'status': 'success',
 'message': 'Welcome to our discussion. Today, we will introduce a paper titled "Attention Is All You Need," with primary authors including Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. This paper presents the Transformer, a novel network architecture that relies solely on attention mechanisms and eliminates the need for recurrent or convolutional neural networks.\n\nThe authors demonstrate the superiority of the Transformer model in terms of translation quality, parallelizability, and training time. They achieve state-of-the-art results on machine translation tasks, improving the existing best results by over 2 BLEU on the WMT 2014 English-to-German translation task and establishing a new single-model state-of-the-art BLEU score of 41.8 on the WMT 2014 English-to-French translation task.\n\nThe Transformer model also generalizes well to other tasks, such as English constituency parsin

In [34]:
broadcast_script_content = broadcast_response.json()["message"]
with open(Path(raw_text_path).with_name("broadcast_script.md"),"w") as f:
    f.write(broadcast_script_content)
    print(f"Broadcast script has been saved in {Path(raw_text_path).with_name('broadcast_script.md')}")
    

Broadcast script has been saved in d:\git\MMAPIS\res\1706_03762\broadcast_script.md


In [35]:
tts_url = url + "/tts/"

params = {
    "text":broadcast_script_content,
    "tts_api_key":TTS_CONFIG["api_key"],
    "tts_base_url":TTS_CONFIG["base_url"],
    "app_secret":TTS_CONFIG["app_secret"],
}

tts_response = requests.post(tts_url,json=params)
tts_response

<Response [200]>

In [36]:
import time
if tts_response.status_code != 200:
    print(f"Error in TTS: {tts_response.text}")

else:
    print("TTS generated successfully")
    save_dir = os.path.dirname(transformer_path)
    bytes_data = tts_response.content
    millis = int(round(time.time() * 1000))
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    file_path = os.path.join(save_dir, str(millis) + ".mp3")
    with open(file_path, 'wb') as fo:
        fo.write(bytes_data)
    print(f"File saved at {file_path}")

TTS generated successfully
File saved at d:\git\MMAPIS\res\1706_03762\1711428846472.mp3


#### Unified API

In [37]:
app_url = url + "/app/"
params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "raw_md_text":raw_md_text,
    "document_level_summary":document_level_summary,
    "section_summary":section_summry,
    "usage":"speech",
    "summarizer_params": json.dumps(
        {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    })
}

app_speech_response = requests.post(app_url,data=params)

app_speech_response

<Response [200]>

In [38]:
webbrowser.open_new_tab(app_speech_response.url)

True

### Multimodal QA Generation

The multimodal question answering (QA) system consists of three agents: the User Intent Identifying Agent, the General QA Answering Agent, and the Multimodal QA Answering Agent. The User Intent Identifying Agent first analyzes your question to determine whether it refers to a specific section or image index. Questions such as "What is the main point in chapter 3?", "What is figure 3 about?", or, more precisely, "What is figure 0 in chapter 2 about?" help it understand your query. Based on the identified intent, the question is then routed either to the General QA Answering Agent for broad questions or to the Multimodal QA Answering Agent for questions about a specific section or image index.

Sections and images are indexed as follows:
```plaintext
# Title (designated as chapter 0)

## SubTitle 1 (referred to as chapter 1)
[img 0] (identifiable as either figure 0 or figure 0 in chapter 1) 

[img 1] (can be referenced as figure 1 or figure 1 in chapter 1)

## SubTitle 2 (denoted as chapter 2)
[img 2] (can be inquired about as figure 2 or figure 0 in chapter 2)    

[img 3] (is queryable as figure 3 or figure 1 in chapter 2)
...

```
This scheme lets you refer to any section or image precisely, which helps the system retrieve the right information when answering your questions.
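To see how global and per-chapter figure numbers relate, the sketch below walks a document in the schematic layout above: `# ` starts chapter 0, each `## ` heading starts the next chapter, and each image is a line beginning with `[img`. The `[img N]` marker is taken from the schematic, not from the Markdown MMAPIS actually produces, and `index_figures` is a hypothetical helper.

```python
def index_figures(markdown: str) -> list:
    """Return (global figure index, chapter index, figure index within chapter) per image.

    Assumes '# ' marks the title (chapter 0), each '## ' heading opens the next
    chapter, deeper headings are ignored, and each image occupies a line
    starting with '[img'.
    """
    chapter = 0
    per_chapter = 0
    figures = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            chapter += 1
            per_chapter = 0
        elif stripped.startswith("# "):
            chapter = 0
            per_chapter = 0
        elif stripped.startswith("[img"):
            figures.append((len(figures), chapter, per_chapter))
            per_chapter += 1
    return figures
```

For the layout above, `[img 2]` maps to `(2, 2, 0)`, that is, figure 2 overall and figure 0 in chapter 2.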

#### Unified API


In [39]:
import json
qa_url = url + "/app/"
params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "document_level_summary":document_level_summary,
    "raw_md_text":raw_text,
    "usage": "qa",
    # form data cannot carry a nested dict, so serialize it as JSON text
    "summarizer_params": json.dumps(
        {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]
    })
}

Multimodal_qa_response = requests.post(qa_url,data=params)
Multimodal_qa_response


<Response [200]>

In [40]:
Multimodal_qa_response.url

'http://127.0.0.1:8000/index/qa_html/9f0491ee-b8ce-43fd-8bd7-b63d9901de42.html'

In [41]:
import webbrowser
webbrowser.open(Multimodal_qa_response.url)

True