feat: add multimodal document #1335
jina/types/document/multimodal.py
Outdated
"""Each :class:`MultimodalDocument` should have at least 2 chunks (represent as :class:`DocumentSet`)
and len(set(doc.chunks.modality)) == len(doc.chunk)
"""
def __init__(self, document = None,
Maybe there should be the possibility to build it from N documents with different modalities and to merge them into one multimodal document?
Would that also be useful?
Another useful interface would be to extract the embedding or content by modality (or the chunk) given a modality name.
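A minimal sketch of what such a lookup could look like. The `Chunk` dataclass and the `extract_from_modality` helper are hypothetical stand-ins for illustration only, not the jina classes or the API added in this PR:

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk document (not the jina class)."""
    modality: str
    content: Any = None
    embedding: Optional[List[float]] = None


def extract_from_modality(chunks: List[Chunk], modality: str, field: str = 'content') -> Any:
    """Return the `content` or `embedding` of the chunk with the given modality."""
    if field not in ('content', 'embedding'):
        raise ValueError(f'unsupported field: {field}')
    for chunk in chunks:
        if chunk.modality == modality:
            return getattr(chunk, field)
    raise KeyError(f'no chunk with modality {modality!r}')


chunks = [
    Chunk(modality='visual', content='img.png', embedding=[0.1, 0.2]),
    Chunk(modality='textual', content='hello', embedding=[0.3, 0.4]),
]
text = extract_from_modality(chunks, 'textual')
visual_embedding = extract_from_modality(chunks, 'visual', field='embedding')
```

The lookup only works unambiguously if every chunk has a distinct modality, which is the invariant stated in the class docstring.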
We will need that interface; otherwise we do not remove any heavy lifting from the user when creating this, no? Or we will need some kind of MultiModalDocumentBuilder?
We need to decide what experience we want to offer when building a document like this.
I think we can offer a Builder interface, or design this class as a Builder, and delegate the correctness checks to the last step of the build.
I had the same thought before; it should be something like an observer. The correctness check should happen when we call chunks.add or chunks.append on DocumentSet or ChunkSet. But meanwhile I'm afraid it's a bit "over engineering". Still, it's good to have some discussion about that.
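To make the trade-off concrete, here is a rough sketch of validating at append time. `Chunk` and `UniqueModalityChunks` are hypothetical names for illustration; this is not how DocumentSet or ChunkSet behave in jina:

```python
from typing import Any, List, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for a chunk document (not the jina class)."""
    modality: str
    content: Any


class UniqueModalityChunks:
    """A container that rejects a chunk whose modality is already present."""

    def __init__(self) -> None:
        self._chunks: List[Chunk] = []

    def append(self, chunk: Chunk) -> None:
        # The uniqueness rule can be enforced eagerly, one chunk at a time.
        if any(existing.modality == chunk.modality for existing in self._chunks):
            raise ValueError(f'duplicate modality: {chunk.modality!r}')
        self._chunks.append(chunk)

    def __len__(self) -> int:
        return len(self._chunks)


chunk_set = UniqueModalityChunks()
chunk_set.append(Chunk('visual', 'img.png'))
chunk_set.append(Chunk('textual', 'hello'))
```

Note that the "at least 2 chunks" rule cannot be checked this way, because the first append always leaves the container with a single chunk. That part of the invariant has to be checked later, for example at the end of a build step.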
For now I would be happy just by adding the interface to build directly from chunks, so that some boilerplate can be added. What we would need to know is maybe the granularities to be assigned, and so on.
The interface was added as from_chunks. For the correctness check, I'm now using an internal method called _validate, and chunks are validated in two places:
1. If we build a MultimodalDocument using from_chunks or the constructor, the chunks are validated inside the constructor.
2. The chunks are validated for modality_content_mapping inside _build_modality_content_mapping, since modality_content_mapping is the common entry point for the other methods and properties.
This way we avoid overkill and do not change DocumentSet.add and ChunkSet.append.
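The two validation points described above can be sketched as follows. This is a simplified illustration under stated assumptions: `Chunk` and `MultimodalSketch` are hypothetical stand-ins, and the real class in this PR may differ in structure and error handling:

```python
from typing import Any, Dict, Iterable, List, NamedTuple, Optional


class Chunk(NamedTuple):
    """Hypothetical stand-in for a chunk document (not the jina class)."""
    modality: str
    content: Any


class MultimodalSketch:
    def __init__(self, chunks: Optional[Iterable[Chunk]] = None) -> None:
        self.chunks: List[Chunk] = list(chunks or [])
        self._modality_content_mapping: Dict[str, Any] = {}
        if self.chunks:
            self._validate()  # validation point 1: construction

    @classmethod
    def from_chunks(cls, chunks: Iterable[Chunk]) -> 'MultimodalSketch':
        return cls(chunks=chunks)

    def _validate(self) -> None:
        # Reads self.chunks directly instead of taking them as an argument.
        modalities = [chunk.modality for chunk in self.chunks]
        if len(modalities) < 2:
            raise ValueError('a multimodal document needs at least 2 chunks')
        if len(set(modalities)) != len(modalities):
            raise ValueError('each chunk must have a distinct modality')

    @property
    def modality_content_mapping(self) -> Dict[str, Any]:
        if not self._modality_content_mapping:
            self._validate()  # validation point 2: chunks may have changed since construction
            self._modality_content_mapping = {c.modality: c.content for c in self.chunks}
        return self._modality_content_mapping


doc = MultimodalSketch.from_chunks([Chunk('visual', 'img.png'), Chunk('textual', 'hello')])
mapping = doc.modality_content_mapping
```

Validating again in the mapping property covers the case where a document is created empty and chunks are attached afterwards, without touching the container classes.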
a10465a to c73e7f5
Codecov Report
@@            Coverage Diff             @@
##           master    #1335      +/-   ##
==========================================
+ Coverage   83.52%   83.58%   +0.06%
==========================================
  Files         103      104       +1
  Lines        6792     6861      +69
==========================================
+ Hits         5673     5735      +62
- Misses       1119     1126       +7
Before merging, I would like to have an interface exposed that lets the user build a MultiModalDocument from different data, like:
d = MultiModalDocument({'modalitya': contentA, 'modalityB': contentB})
or
d = MultiModalDocument.from_content_or_embedding({'modalitya': contentA, 'modalityB': contentB}) (think about better naming)
Also, I think this class should take better care of setting the right granularity parameters for the MultiModalDocument and the chunks. (Not so important, but needed for the sake of coherence.)
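A possible shape for such a constructor, including the granularity handling, is sketched below. `Doc` and `build_from_mapping` are hypothetical names, and placing each chunk one granularity level below its parent is an assumption made for illustration, not something this thread confirms:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document (not the jina class)."""
    modality: Optional[str] = None
    content: Any = None
    granularity: int = 0
    chunks: List['Doc'] = field(default_factory=list)


def build_from_mapping(mapping: Dict[str, Any]) -> Doc:
    """Build a root document with one chunk per modality in the mapping."""
    if len(mapping) < 2:
        raise ValueError('a multimodal document needs at least 2 modalities')
    root = Doc()
    for modality, content in mapping.items():
        # Assumption: a chunk sits one granularity level below its parent.
        root.chunks.append(
            Doc(modality=modality, content=content, granularity=root.granularity + 1)
        )
    return root


d = build_from_mapping({'visual': 'img.png', 'textual': 'hello'})
```

Because dictionary keys are unique, building from a mapping guarantees distinct modalities by construction, so only the minimum chunk count needs an explicit check.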
jina/types/document/multimodal.py
Outdated
self._modality_content_mapping = {}
if chunks:
self._validate(chunks)
self.chunks.clear()
Isn't it weird to clear something that has not been initialized?
jina/types/document/multimodal.py
Outdated
self._build_modality_content_mapping()
return self._modality_content_mapping

def extract_content_by_modality(self, modality: str) -> DocumentContentType:
I would rather use from-modality, since "by" seems like we are grouping.
@JoanFM how about MultimodalDocument.from_content_modality_mapping({'visual': xxx, 'textual': xxx}), where xxx could be content or embedding? This classmethod has the same naming convention as the property.
This is what I meant, yes. The problem is that when the content is a numpy array, it is not easy to tell whether it is an embedding or content. So we need a flag to say that.
Agreed, I'll create a getter & setter in
Well, I thought about it twice and no, I think it is better to assume they are created from content, so you assume the chunks are filled by document. But make sure it is documented that if one chunk is created from embeddings, they will need to create it from the other interface.
LGTM👍
Still missing the handling of granularity level, but looks really good!
jina/types/document/multimodal.py
Outdated
else chunk.content
self._validate(chunks=self.chunks)

def _validate(self, chunks: List[Document]):
Just being a little picky, but we either make it static, or we do not pass self.chunks and extract them inside the function, right? I would vote for the second.
* feat: add multimodal set
* test: move conftest
* test: fix unit test for multimodal driver
Add MultimodalDocument to primitive types, and apply changes to MultimodalDriver.
Example usage of a MultimodalDocument?
I personally think from_modality_content_mapping should be discussed further (keep or not), since it brings a bit of inconsistency for extracting embedding or content.