Signed-off-by: Tiep Le <tiep.le@intel.com>
> The proposed architecture involves the creation of two megaservices.
> - The first megaservice functions as the core pipeline, comprising four microservices: embedding, retriever, reranking, and LVLM. This megaservice exposes a MMRagBasedVisualQnAGateway, allowing users to query the system via the `/v1/mmrag_visual_qna` endpoint.
> - The second megaservice manages user data storage in VectorStore and is composed of a single microservice, embedding. This megaservice provides a MMRagDataIngestionGateway, enabling user access through the `/v1/mmrag_data_ingestion` endpoint.
Can this be an enhanced microservice based on the existing data ingestion?
@hshen14 Thanks for your comments. The short answer is yes: we will enhance and reuse the existing data ingestion microservice. The current data ingestion microservice supports only TextDoc. In our proposal, we will extend its interface to accept a MultimodalDoc, which can be a TextDoc, ImageDoc, ImageTextPairDoc, etc. If the input is of type TextDoc, we will divert execution to the current microservice/functions.
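To illustrate the type-based dispatch described above, here is a minimal sketch. The document class names mirror the RFC; `ingest_text` and `ingest_multimodal` are hypothetical stand-ins for the existing text path and the new multimodal path, not actual OPEA functions:

```python
from typing import Union


# Hypothetical document types mirroring the RFC's MultimodalDoc union.
class TextDoc:
    def __init__(self, text: str):
        self.text = text


class ImageDoc:
    def __init__(self, image_path: str):
        self.image_path = image_path


MultimodalDoc = Union[TextDoc, ImageDoc]


def ingest_text(doc: TextDoc) -> str:
    # Stand-in for the existing TextDoc ingestion microservice/functions.
    return f"text:{doc.text}"


def ingest_multimodal(doc: MultimodalDoc) -> str:
    # Stand-in for the new multimodal ingestion path.
    return f"multimodal:{type(doc).__name__}"


def ingest(doc: MultimodalDoc) -> str:
    """Divert TextDoc inputs to the existing path; route other types to the new one."""
    if isinstance(doc, TextDoc):
        return ingest_text(doc)
    return ingest_multimodal(doc)
```

The point of the sketch is only the routing decision: the enhanced interface accepts the wider union type while the existing TextDoc behavior is preserved unchanged.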
> The proposed architecture involves the creation of two megaservices.
> - The first megaservice functions as the core pipeline, comprising four microservices: embedding, retriever, reranking, and LVLM. This megaservice exposes a MMRagBasedVisualQnAGateway, allowing users to query the system via the `/v1/mmrag_visual_qna` endpoint.
> - The second megaservice manages user data storage in VectorStore and is composed of a single microservice, embedding. This megaservice provides a MMRagDataIngestionGateway, enabling user access through the `/v1/mmrag_data_ingestion` endpoint.
> - The third megaservice functions as a helper to extract a list of frame-transcript pairs from videos using audio-to-text models (e.g., BLIP2) for transcripting or an LVLM model (e.g., LLAVA) for captioning. This megaservice is composed of 2 microservices: transcripting and LVLM. This megaservice provides a MMRagVideoprepGateway, enabling user access through the `/v1/mmrag_video_prep` endpoint.
You mentioned either audio-to-text models for transcription or an LVLM model for captioning, while you also mentioned they need to be composed. If composing is not mandatory, does it make sense to make it a microservice?
@hshen14 Thanks for your comment. We were proposing that transcripting and LVLM each be a microservice; their composition is not mandatory. We believe that when we ingest a video, it is better to include both the frames' transcripts and the frames' captions as metadata for inference (LVLM) after retrieval. However, in the megaservice we will give users options to choose whether they want transcript only, caption only, or both. The composition here is optional. Hope this is clear.
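A hedged sketch of that optional composition, as a per-frame mode switch. `VideoPrepMode` and the `transcribe`/`caption` callables are hypothetical stand-ins for the transcripting and LVLM microservices, not names from the RFC:

```python
from enum import Enum
from typing import Callable, Dict


class VideoPrepMode(str, Enum):
    TRANSCRIPT_ONLY = "transcript"
    CAPTION_ONLY = "caption"
    BOTH = "both"


def prepare_frame_metadata(
    frame_id: int,
    mode: VideoPrepMode,
    transcribe: Callable[[int], str],
    caption: Callable[[int], str],
) -> Dict[str, object]:
    """Build per-frame metadata, invoking only the services the user selected."""
    meta: Dict[str, object] = {"frame_id": frame_id}
    if mode in (VideoPrepMode.TRANSCRIPT_ONLY, VideoPrepMode.BOTH):
        meta["transcript"] = transcribe(frame_id)  # audio-to-text microservice call
    if mode in (VideoPrepMode.CAPTION_ONLY, VideoPrepMode.BOTH):
        meta["caption"] = caption(frame_id)  # LVLM captioning microservice call
    return meta
```

With `BOTH` as the default the megaservice would produce the richer metadata the reply recommends, while `TRANSCRIPT_ONLY`/`CAPTION_ONLY` skip the unneeded service entirely.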
> #### 2.1 Embeddings
> - Interface `MultimodalEmbeddings` that extends the interface `langchain_core.embeddings.Embeddings` with an abstract method:
>   ```python
>   embed_multimodal_document(self, doc: MultimodalDoc) -> List[float]
>   ```
The overall RFC looks good to me; just a minor comment here. I think this interface is implementation-specific, right? My point is that you don't need to mention it; you just need to tell users what the standard input and output are, as you did in the Data Classes section. That's enough.
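For context, a minimal self-contained sketch of the quoted abstract method. Only the method signature comes from the RFC; the `MultimodalDoc` placeholder, the standalone ABC (rather than subclassing `langchain_core.embeddings.Embeddings`), and the dummy implementation are assumptions for illustration:

```python
from abc import ABC, abstractmethod
from typing import List


class MultimodalDoc:
    """Placeholder for the RFC's MultimodalDoc union (TextDoc, ImageDoc, ...)."""


class MultimodalEmbeddings(ABC):
    """Sketch of the interface; the RFC extends langchain_core.embeddings.Embeddings,
    shown here as a plain ABC to stay self-contained."""

    @abstractmethod
    def embed_multimodal_document(self, doc: MultimodalDoc) -> List[float]:
        """Return a single embedding vector for a text/image/pair document."""


class DummyMultimodalEmbeddings(MultimodalEmbeddings):
    """Trivial concrete implementation, for demonstration only."""

    def embed_multimodal_document(self, doc: MultimodalDoc) -> List[float]:
        return [0.0, 0.0, 0.0]
```

This also illustrates the reviewer's point: callers only depend on the input type (`MultimodalDoc`) and output type (`List[float]`), so the interface itself is an implementation detail.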
The RFC file naming convention follows this rule: yy-mm-dd-[OPEA Project Name]-[index]-title.md
For example, 24-04-29-GenAIExamples-001-Using_MicroService_to_implement_ChatQnA.md
@tileintel could you pls update this file name to be consistent with other PRs?
Can this be closed?
Yes, will close it soon |
ftian1 left a comment
@tileintel could you pls update this file name to be consistent with other PRs? then I can merge this one
We submit our RFC for Multimodal-RAG based Visual QnA.
@ftian1 @hshen14: Could you please help provide feedback?