# Introduction to MMAPIS Project

Welcome to our comprehensive guide on using the MMAPIS project. MMAPIS, a multi-modal automated academic paper interpretation system, is designed to help you quickly understand the key points of a target article. It provides a solution to address the critical needs of the scientific community in our rapidly evolving digital landscape.

This Jupyter notebook offers a step-by-step tutorial on leveraging the capabilities of MMAPIS and the designed APIs to extract and interpret key information from academic papers. We will cover the following topics:

- Preprocessing
- 2-stage Summarization
- Downstream Applications

## Prerequisites

Before diving in, ensure you have the following prerequisites:

1. Follow the [Readme/# 🚀How to Run](../Readme.md) instructions to install the required packages and set up the environment.
2. Configure your environment in the [config](../config) directory.

## Objectives

In this module, we aim to:

1. Introduce the MMAPIS project.
2. Demonstrate how to use MMAPIS APIs for extracting and interpreting key information from academic papers.
3. Start up your server:

   - If running locally, follow:
     ```bash
     cd path/to/MMAPIS
     uvicorn backend:app --reload --port <your port>
     ```
     Example:
     ```bash
     uvicorn backend:app --reload --port 8000
     ```

   - If running on a server:
     ```bash
     uvicorn backend:app --host 0.0.0.0 --port <your port> --reload
     ```
     Example:
     ```bash
     uvicorn backend:app --host 0.0.0.0 --port 5010 --reload
     ```

After successfully starting your backend, you can find detailed parameters for any designed API in the FastAPI Interactive API docs at http://127.0.0.1:8000/docs (if running locally and your port is 8000).
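Besides the interactive page, FastAPI serves a machine-readable schema at `/openapi.json` by default. The sketch below assumes the backend keeps that default and shows one way to list the available endpoints from the schema. The `list_endpoints` helper and the sample schema are illustrative and not part of MMAPIS.

```python
def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    pairs = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # skip non-operation keys such as "parameters" or "summary"
            if method in http_methods:
                pairs.append((method.upper(), path))
    return sorted(pairs)


# Hypothetical excerpt; the real schema would come from
# requests.get(url + "/openapi.json").json()
sample_schema = {
    "paths": {
        "/pdf2md/": {"post": {}},
        "/get_links/": {"post": {}},
    }
}
endpoints = list_endpoints(sample_schema)
print(endpoints)
```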

## Preprocessing

### initialize the library

In [1]:
import requests
import sys
import os
# make sure we can import the MMAPIS module
sys.path.append(os.path.abspath("../../"))
from MMAPIS.tools import extract_zip_from_bytes, download_pdf, get_pdf_name
from MMAPIS.config.config import GENERAL_CONFIG, OPENAI_CONFIG, ARXIV_CONFIG, NOUGAT_CONFIG, INTEGRATE_PROMPTS, SECTION_PROMPTS, ALIGNMENT_CONFIG, LOGGER_MODES, TTS_CONFIG, APPLICATION_PROMPTS
# if you run locally and your port is set to 8000
# if your backend is running elsewhere, the url should be http://<your ip>:<your port>
url = "http://127.0.0.1:8000"

MMAPIS_Dir = os.path.abspath("../")
MMAPIS_Dir

INFO:root:Best GPU: 0. Batch size: 0
INFO:root:Loading logging file from d:\git\MMAPIS\config\logging.ini


'd:\\git\\MMAPIS'

### Crawl from Arxiv

You can search for your target articles of interest using parameters similar to those on [arxiv](https://arxiv.org/search/?query=&searchtype=all&abstracts=show&order=-announced_date_first&size=50).

If your request body is invalid, you will receive a reminder like the following:
```json
{
    "status": "request error",
    "message": "input params error: error 1:type: bool_parsing, location: request body, param return_md input:5, msg: Input should be a valid boolean, unable to interpret input"
}
```
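Since responses carry a `status` and a `message` field, a small helper can surface such errors early. This is a minimal sketch, assuming every JSON endpoint shares this envelope. `unwrap_message` is a hypothetical name, not an MMAPIS function.

```python
def unwrap_message(payload):
    """Return payload["message"], or raise if the backend reported an error."""
    if payload.get("status") != "success":
        raise ValueError(f"MMAPIS request failed: {payload.get('message')}")
    return payload["message"]


# Usage with already-decoded responses, e.g. unwrap_message(response.json())
ok = unwrap_message({"status": "success", "message": [1, 2]})

try:
    unwrap_message({"status": "request error", "message": "input params error"})
    failed = False
except ValueError:
    failed = True
```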
The following example code is based on [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf), which introduces the Transformer model.

In [2]:
import requests
arxiv_url = url + "/get_links/"
params = {
    "key_word":"attention is all you need",
     "max_return":5,
     "return_md":False,
     "order":"announced_date_first"
}

arxiv_response = requests.post(arxiv_url,json=params)
# parse the JSON body; eval() on response text is unsafe and fails on true/false/null
arxiv_response.json()


{'status': 'success',
 'message': [{'pdf_url': 'https://arxiv.org/pdf/1706.03762',
   'title': 'Attention Is All You Need',
   'author': 'Authors:\nAshish Vaswani, \n      \n      Noam Shazeer, \n      \n      Niki Parmar, \n      \n      Jakob Uszkoreit, \n      \n      Llion Jones, \n      \n      Aidan N. Gomez, \n      \n      Lukasz Kaiser, \n      \n      Illia Polosukhin',
   'abstract': 'Abstract:\n      \n        …are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurr…\n        '},
  {'pdf_url': 'https://arxiv.org/pdf/1905.13497',
   'title': 'Attention Is (not) All You Need for Commonsense Reasoning',
   'author': 'Authors:\nTassilo Klein, \n      \n      Moin Nabi',
   'abstract': 'Abstract:\n      \n        …stro
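The `author` field in the output above still carries the `Authors:` label and the whitespace of the search page. A minimal sketch for turning it into a clean list of names follows. `clean_authors` is a hypothetical helper, and the input string is copied from the second result shown above.

```python
def clean_authors(raw):
    """Split a raw 'Authors:' string into a list of stripped names."""
    # drop the leading label if present, then split on commas
    text = raw.split("Authors:", 1)[-1]
    return [name.strip() for name in text.split(",") if name.strip()]


raw = "Authors:\nTassilo Klein, \n      \n      Moin Nabi"
authors = clean_authors(raw)
print(authors)
```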

Download the PDF for subsequent processing.

In [3]:

pdf_ls = arxiv_response.json()["message"]
transformer_url = pdf_ls[0]["pdf_url"]
flag, pdf_path = download_pdf(transformer_url,save_dir=os.path.join(MMAPIS_Dir,"res"))

In [4]:
transformer_pdf_path = pdf_path
transformer_pdf_path

WindowsPath('d:/git/MMAPIS/res/1706_03762/1706_03762.pdf')

### nougat

Nougat converts a PDF file to its corresponding Markdown file. File uploads are sent with the body encoded as form data. In a FastAPI path operation you can declare multiple `File` and `Form` parameters, but you cannot also declare `Body` fields that are expected as JSON, because the request body is encoded as `multipart/form-data` instead of `application/json`.

This limitation is noted in the FastAPI documentation: https://fastapi.tiangolo.com/tutorial/request-forms-and-files/#define-file-and-form-parameters.

You can either simply provide the PDF URL or pass the byte data of the PDF.
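The two options can be sketched as a helper that builds the keyword arguments for `requests.post`. The field names `pdf`, `pdf_content`, and `markdown` are taken from the cells in this section. The helper itself and the rule that exactly one input must be given are assumptions for illustration.

```python
def build_pdf2md_kwargs(pdf_url=None, pdf_files=None, markdown=True):
    """Build keyword arguments for requests.post(url + "/pdf2md/", **kwargs).

    pdf_files maps a file name to the raw bytes of that PDF.
    """
    if (pdf_url is None) == (pdf_files is None):
        raise ValueError("pass exactly one of pdf_url or pdf_files")
    # plain values travel as form fields, never as a JSON body
    kwargs = {"data": {"markdown": markdown}}
    if pdf_url is not None:
        kwargs["data"]["pdf"] = pdf_url
    else:
        # one multipart part per file, all under the same field name
        kwargs["files"] = [
            ("pdf_content", (name, content)) for name, content in pdf_files.items()
        ]
    return kwargs


by_url = build_pdf2md_kwargs(pdf_url="https://arxiv.org/pdf/1706.03762")
by_bytes = build_pdf2md_kwargs(pdf_files={"paper.pdf": b"%PDF-"})
```

Either result could then be passed on as `requests.post(nougat_url, **by_url)`.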

In [5]:
import json
# the PDF URL alone is enough to process the PDF
nougat_url = url + "/pdf2md/"
params = {
    'pdf':transformer_url,
    "markdown":True
}
nougat_response = requests.post(nougat_url,data=params)
nougat_response

<Response [200]>

In [6]:
nougat_response.json()

{'status': 'success',
 'message': {'article_ls': [{'file_name': '1706_03762',
    'text': '# Attention Is All You Need\n\n Ashish Vaswani\n\nGoogle Brain\n\navaswani@google.com\n\n&Noam Shazeer1\n\nGoogle Brain\n\nnoam@google.com\n\n&Niki Parmar1\n\nGoogle Research\n\nnikip@google.com\n\n&Jakob Uszkoreit1\n\nGoogle Research\n\nusz@google.com\n\n&Llion Jones1\n\nGoogle Research\n\nllion@google.com\n\n&Aidan N. Gomez1\n\nUniversity of Toronto\n\naidan@cs.toronto.edu\n\n&Lukasz Kaiser1\n\nGoogle Brain\n\nlukaszkaiser@google.com\n\n&Illia Polosukhin1\n\nillia.polosukhin@gmail.com\n\nEqual contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other pers

In [7]:
# Alternatively, pass the PDF content as bytes (to process multiple files, append more paths to pdf_path).
nougat_url = url + "/pdf2md/"
pdf_path = [transformer_pdf_path]
files = []
for i in pdf_path:
    # read the bytes inside the with-block so every file handle is closed
    with open(i,"rb") as f:
        files.append(("pdf_content",(os.path.basename(i),f.read())))

params = {
    "markdown":True
}
nougat_response = requests.post(nougat_url,files=files,data=params)
nougat_response

<Response [200]>

In [8]:
if nougat_response.status_code == 200:
    article_ls = nougat_response.json()["message"]["article_ls"]

article_ls

[{'file_name': '1706_03762',
  'text': '# Attention Is All You Need\n\n Ashish Vaswani\n\nGoogle Brain\n\navaswani@google.com\n\n&Noam Shazeer1\n\nGoogle Brain\n\nnoam@google.com\n\n&Niki Parmar1\n\nGoogle Research\n\nnikip@google.com\n\n&Jakob Uszkoreit1\n\nGoogle Research\n\nusz@google.com\n\n&Llion Jones1\n\nGoogle Research\n\nllion@google.com\n\n&Aidan N. Gomez1\n\nUniversity of Toronto\n\naidan@cs.toronto.edu\n\n&Lukasz Kaiser1\n\nGoogle Brain\n\nlukaszkaiser@google.com\n\n&Illia Polosukhin1\n\nillia.polosukhin@gmail.com\n\nEqual contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, 

In [9]:
from pathlib import Path
transformer_path = Path(transformer_pdf_path)
transformer_md_path = transformer_path.with_suffix(".md")
with open(transformer_md_path,"w") as f:
    f.write(article_ls[0]['text'])
    print(f"Markdown file has been saved in {transformer_md_path}")
transformer_md_path

Markdown file has been saved in d:\git\MMAPIS\res\1706_03762\1706_03762.md


WindowsPath('d:/git/MMAPIS/res/1706_03762/1706_03762.md')

### alignment

To align text with images, you need to provide at least the text to be aligned and the corresponding PDF from which the images are parsed.

In [10]:
align_url = url + "/alignment/"
markdown_file = transformer_md_path
pdf_file = transformer_pdf_path
files = []

with open(markdown_file,"r") as f:
    text = f.read()

with open(pdf_file,"rb") as f:
    files.append(("pdf_content",f))
    params = {
        "text":text,
        "raw_md_text":text
    }

    align_response = requests.post(align_url,files=files,data=params)
align_response

<Response [200]>

In [11]:
extract_dir = os.path.join(os.path.dirname(markdown_file), "alignment_raw")
os.makedirs(extract_dir, exist_ok=True)
if align_response.status_code == 200:
    extract_zip_from_bytes(align_response.content,extract_dir)
    print(f"Alignment files have been saved in {extract_dir}")

Alignment files have been saved in d:\git\MMAPIS\res\1706_03762\alignment_raw
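The alignment endpoint returns a zip archive in the response body, which `extract_zip_from_bytes` unpacks. For readers without the MMAPIS tools at hand, a standard-library stand-in might look like the sketch below. It is not the MMAPIS implementation, and the archive content used here is made up for the demonstration.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def unzip_bytes(data, extract_dir):
    """Extract an in-memory zip archive and return the names it contained."""
    target = Path(extract_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(target)
        return archive.namelist()


# Build a tiny archive in memory to stand in for align_response.content
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("aligned.md", "# demo")

with tempfile.TemporaryDirectory() as tmp:
    names = unzip_bytes(buffer.getvalue(), tmp)
    extracted_text = (Path(tmp) / "aligned.md").read_text()
```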


## 2-stage Summarization

### section level summary

To generate a sectional summary of the file text, you need to configure the OpenAI model and provide your article text. The detailed meaning of each parameter can be viewed in the FastAPI Interactive API docs.
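The same `summarizer_params` block appears in every summarization and application request in this tutorial. A small helper can build it once from the configuration. This is a sketch: `build_summarizer_params` is a hypothetical name, and the demo values are placeholders rather than recommended settings.

```python
def build_summarizer_params(openai_config):
    """Map OPENAI_CONFIG-style keys to the summarizer_params request field."""
    return {
        "rpm_limit": openai_config["rpm_limit"],
        # note: the config key is singular, the request key is plural
        "ignore_titles": openai_config["ignore_title"],
        "num_processes": openai_config["num_processes"],
        "prompt_ratio": openai_config["prompt_ratio"],
        "gpt_model_params": openai_config["model_config"],
    }


# Placeholder values for illustration only
demo_config = {
    "rpm_limit": 3,
    "ignore_title": ["references"],
    "num_processes": 2,
    "prompt_ratio": 0.8,
    "model_config": {"temperature": 0.5},
}
summarizer_params = build_summarizer_params(demo_config)
```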

In [12]:
import requests
url = "http://127.0.0.1:8000"
section_summry_url = url + "/section_level_summry/"
file_path = transformer_md_path
with open(file_path,"r") as f:
    text = f.read()


params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "article_text":text,
    "init_grid":3,
    "max_grid":4,
    "summary_prompts":SECTION_PROMPTS,
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    }
}


section_summry_response = requests.post(section_summry_url,json=params)
section_summry_response

<Response [200]>

In [13]:
section_summry_response.json()

{'status': 'success',
 'message': {'section_summary': '# Attention Is All You Need\n\n- Authors: Ashish Vaswani avaswani@google.com &Noam Shazeer1 noam@google.com &Niki Parmar1 nikip@google.com &Jakob Uszkoreit1 &Llion Jones1 llion@google.com &Aidan N. Gomez1 &Lukasz Kaiser1 lukaszkaiser@google.com illia.polosukhin@gmail.com\n\n- Affiliations: Google Brain Google Brain Google Research Google Research usz@google.com Google Research University of Toronto aidan@cs.toronto.edu Google Brain &Illia Polosukhin1 Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, implemented, tuned and ev

In [14]:
if section_summry_response.status_code == 200:
    section_summry = section_summry_response.json()["message"]["section_summary"]
    section_summary_path = Path(file_path).with_name("section_summary.md")
    with open(section_summary_path,"w") as f:
        f.write(section_summry)
        print(f"Section summary has been saved in {section_summary_path}")

Section summary has been saved in d:\git\MMAPIS\res\1706_03762\section_summary.md


You can also align the section summary with the corresponding PDF for multi-modal interpretation.

In [15]:
text = section_summry
pdf_file = transformer_pdf_path
files = []

raw_md_path = transformer_md_path
with open(raw_md_path,"r") as f:
    raw_md_text = f.read()

with open(pdf_file,"rb") as f:
    files.append(("pdf_content",f))
    params = {
        "text":text,
        "raw_md_text":raw_md_text
    }
    section_align_response = requests.post(align_url,files=files,data=params)
section_align_response

<Response [200]>

In [16]:
extract_dir = os.path.join(os.path.dirname(transformer_path), "alignment_section")
os.makedirs(extract_dir, exist_ok=True)
# check the response of this alignment request, not the earlier raw alignment
if section_align_response.status_code == 200:
    extract_zip_from_bytes(section_align_response.content,extract_dir)
    print(f"Alignment files have been saved in {extract_dir}")

Alignment files have been saved in d:\git\MMAPIS\res\1706_03762\alignment_section


### Document Level Summary

In [17]:
document_summary_url = url + "/document_level_summary/"
text = section_summry

params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "section_summaries":text,
    "integrate_prompts":INTEGRATE_PROMPTS,
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    }
}

document_summary_response = requests.post(document_summary_url,json=params)
document_summary_response


<Response [200]>

In [18]:
if document_summary_response.status_code == 200:
    document_level_summary = document_summary_response.json()["message"]
    with open(Path(file_path).with_name("document_summary.md"),"w") as f:
        f.write(document_level_summary)
        print(f"Document summary has been saved in {Path(file_path).with_name('document_summary.md')}")

Document summary has been saved in d:\git\MMAPIS\res\1706_03762\document_summary.md


In [19]:
text = document_level_summary
pdf_file = transformer_pdf_path
files = []
with open(pdf_file,"rb") as f:
    files.append(("pdf_content",f))

    params = {
        "text":text,
        "raw_md_text":raw_md_text
    }
    document_align_response = requests.post(align_url,files=files,data=params)
document_align_response

<Response [200]>

In [20]:
extract_dir = os.path.join(os.path.dirname(transformer_path), "alignment_document")
os.makedirs(extract_dir, exist_ok=True)
if document_align_response.status_code == 200:
    extract_zip_from_bytes(document_align_response.content,extract_dir)
    print(f"Alignment files have been saved in {extract_dir}")

Alignment files have been saved in d:\git\MMAPIS\res\1706_03762\alignment_document


## Application

### blog generation

The generated blog is usually more readable and closer to a human-written post, but it may be less detailed than the section-level and document-level summaries.
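The blog endpoint also receives the PDF as an upload, so the request in this section is sent as form data and nested values such as `blog_prompts` and `summarizer_params` are serialized with `json.dumps`. A minimal sketch of that conversion follows. `to_form_fields` is a hypothetical helper, not part of MMAPIS.

```python
import json


def to_form_fields(params):
    """JSON-encode dict and list values so they survive form encoding."""
    return {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in params.items()
    }


fields = to_form_fields(
    {"file_name": "blog", "summarizer_params": {"rpm_limit": 3}}
)
print(fields)
```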

In [21]:
import json
blog_url = url + "/blog_generation/"

params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "raw_md_text":raw_md_text,
    "document_level_summary":document_level_summary,
    "section_summary":section_summry,
    "blog_prompts":json.dumps(APPLICATION_PROMPTS["blog_prompts"]),
    "init_grid":ALIGNMENT_CONFIG["init_grid"],
    "max_grid":ALIGNMENT_CONFIG["max_grid"],
    "threshold":ALIGNMENT_CONFIG["threshold"],
    "file_name":"blog",
    "summarizer_params": json.dumps(
        {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]

    })
}

files = []
with open(transformer_pdf_path,"rb") as f:
    files.append(("pdf_content",f))
    blog_response = requests.post(blog_url,data=params,files=files)
blog_response



<Response [200]>

In [23]:
if blog_response.status_code == 200:
    extract_dir = os.path.join(os.path.dirname(transformer_path), "alignment_blog")
    os.makedirs(extract_dir, exist_ok=True)
    extract_zip_from_bytes(blog_response.content,extract_dir)
    print(f"Alignment files have been saved in {extract_dir}")

Alignment files have been saved in d:\git\MMAPIS\res\1706_03762\alignment_blog


### recommendation generation


Generate a recommendation score based on the raw article text and document-level summary.

In [24]:
recommendation_url = url + "/recommendation_generation/"
raw_text_path = transformer_md_path
with open(raw_text_path,"r") as f:
    raw_text = f.read()

params = {
    "api_key":OPENAI_CONFIG["api_key"],
    "base_url":OPENAI_CONFIG["base_url"],
    "document_level_summary":document_level_summary,
    "raw_text":raw_text,
    "score_prompts":APPLICATION_PROMPTS["score_prompts"],
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]
    }
}

recommendation_response = requests.post(recommendation_url,json=params)
recommendation_response


<Response [200]>

In [25]:
recommendation = recommendation_response.json()["message"]
with open(Path(raw_text_path).with_name("recommendation.md"),"w") as f:
    f.write(recommendation)
    print(f"Recommendation has been saved in {Path(raw_text_path).with_name('recommendation.md')}")
    

Recommendation has been saved in d:\git\MMAPIS\res\1706_03762\recommendation.md


### TTS generation

Generate a broadcast script and its accompanying MP3 audio for speech applications.

In [26]:
broadcast_url = url + "/broadcast_generation/"

params = {
    "llm_api_key":OPENAI_CONFIG["api_key"],
    "llm_base_url":OPENAI_CONFIG["base_url"],
    "document_level_summary":document_level_summary,
    "section_summaries":section_summry,
    "broadcast_prompts": APPLICATION_PROMPTS["broadcast_prompts"],
    "summarizer_params": {
        "rpm_limit":OPENAI_CONFIG["rpm_limit"],
        "ignore_titles":OPENAI_CONFIG["ignore_title"],
        "num_processes":OPENAI_CONFIG["num_processes"],
        "prompt_ratio":OPENAI_CONFIG["prompt_ratio"],
        "gpt_model_params": OPENAI_CONFIG["model_config"]
    }
}

broadcast_response = requests.post(broadcast_url,json=params)
broadcast_response



<Response [200]>

In [27]:
broadcast_response.json()

{'status': 'success',
 'message': {'broadcast_script': 'Welcome to our discussion, today we will introduce a paper titled "Attention Is All You Need", with primary authors including Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. The paper presents a novel network architecture called the Transformer for sequence transduction tasks. Unlike traditional models that rely on recurrent or convolutional neural networks, the Transformer is based solely on attention mechanisms.\n\nThe Transformer model consists of an encoder and a decoder, both composed of stacked layers. The encoder maps input sequences to continuous representations, while the decoder generates output sequences based on the encoder\'s representations. The attention mechanism used in the Transformer is called Scaled Dot-Product Attention, which computes weights on values using dot products of queries and keys. The model also incorporates Multi-Head At

In [28]:
broadcast_script_content = broadcast_response.json()["message"]["broadcast_script"]
with open(Path(raw_text_path).with_name("broadcast_script.md"),"w") as f:
    f.write(broadcast_script_content)
    print(f"Broadcast script has been saved in {Path(raw_text_path).with_name('broadcast_script.md')}")
    

Broadcast script has been saved in d:\git\MMAPIS\res\1706_03762\broadcast_script.md


In [29]:
tts_url = url + "/tts/"

params = {
    "text":broadcast_script_content,
    "tts_api_key":TTS_CONFIG["api_key"],
    "tts_base_url":TTS_CONFIG["base_url"],
    "app_secret":TTS_CONFIG["app_secret"],
}

tts_response = requests.post(tts_url,json=params)
tts_response

<Response [200]>

In [31]:
import time
save_dir = os.path.dirname(transformer_path)
bytes_data = tts_response.content
millis = int(round(time.time() * 1000))
if not os.path.exists(save_dir):
    os.makedirs(save_dir)
file_path = os.path.join(save_dir, str(millis) + ".mp3")
with open(file_path, 'wb') as fo:
    fo.write(bytes_data)
print(f"File saved at {file_path}")

File saved at d:\git\MMAPIS\res\1706_03762\1708359477993.mp3
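The saving step above can be wrapped in a reusable helper. The sketch below uses `pathlib` and refuses to write an empty payload. `save_audio` is a hypothetical name, and the bytes in the demo are a stand-in for `tts_response.content`.

```python
import tempfile
import time
from pathlib import Path


def save_audio(data, save_dir, suffix=".mp3"):
    """Write audio bytes to save_dir under a millisecond-timestamp file name."""
    if not data:
        raise ValueError("empty audio payload")
    target_dir = Path(save_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    file_path = target_dir / f"{int(time.time() * 1000)}{suffix}"
    file_path.write_bytes(data)
    return file_path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_audio(b"ID3", tmp)  # stand-in bytes, not real audio
    saved_name = saved.name
    saved_size = saved.stat().st_size
```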
