Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add multimodal document #1335

Merged
merged 16 commits into from
Dec 2, 2020
Merged

feat: add multimodal document #1335

merged 16 commits into from
Dec 2, 2020

Conversation

bwanglzu
Copy link
Member

@bwanglzu bwanglzu commented Nov 23, 2020

Add MultimodalDocument to primitive types, and apply changes to MultimodalDriver.

Example usage of a MultimodalDocument?

# With a list of documents (chunks)
md = MultimodalDocument(chunks=[chunk1, chunk2])
md = MultimodalDocument.from_chunks(chunks=[chunk1, chunk2])
# With modality content mapping (dict representation of modality and document content)
mapping = {'visual': 'visual content', 'textual': 'textual content'}
md = MultimodalDocument(modality_content_mapping=mapping)
md = MultimodalDocument.from_modality_content_mapping(modality_content_mapping=mapping)
# Exposed the modality content mapping as a property
md.modality_content_mapping
>>> {'visual': 'visual content', 'textual': 'textual content'}
# Exposed modalities (all modalities from chunk level) as a property
md.modalities
>>> ['visual', 'textual']
# Implemented a method to extract content from modality (used in driver)
md.extract_content_from_modality('visual')
>>> 'visual content'

I personally think from_modality_content_mapping should be discussed further (keep or not) since it brings a bit inconsistency for extracting embedding or content.

@jina-bot jina-bot added size/S area/core This issue/PR affects the core codebase area/helper This issue/PR affects the helper functionality component/driver component/type labels Nov 23, 2020
"""Each :class:`MultimodalDocument` should have at least 2 chunks (represent as :class:`DocumentSet`)
and len(set(doc.chunks.modality)) == len(doc.chunk)
"""
def __init__(self, document = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there should be the possibility to build it from N documents with different modalities and to merge them into one multimodal document?

Would that also be useful?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another useful interface would be to extract the embedding or content by modality (or the chunk) given a modality name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need that interface, otherwise we do not remove any heavylifting from the user when creating this, no?. Or we will need some kind of MultiModalDocumentBuilder?

We need to decide what experience we want to offer when building a document like this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can offer a Builder interface or design this class as a Builder. And delegate the checks of correctness at the last step of the build step

Copy link
Member Author

@bwanglzu bwanglzu Nov 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought before, it should be something like an observer. The assurance of the correctness should happens at when we call chunks.add or chunks.append at DocumentSet or ChunkSet. But meanwhile I'm afraid it's a bit "over engineering". But it's good to have some discussion over that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I would be happy just by adding thr interface to build directly from chunks. So that some boilerplatr can be added. What we would need to know is maybe about the granularities to be assigned and so on

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the interface add as from_chunks. for check of the correctness, now I'm using an internal method called _validate, and chunks were validated at 2 places: 1. if we build MultimodalDocument using from_chunks or constructor, it will be validated inside the constructor. 2. Validate the chunks at modality_content_mapping inside _build_modality_content_mapping, since modality_content_mapping is the common entrance of other methods and properties. and we avoid overkill/change DocumentSet.add and ChunkSet.append

@bwanglzu bwanglzu self-assigned this Nov 23, 2020
CHANGELOG.md Outdated Show resolved Hide resolved
jina/drivers/multimodal.py Show resolved Hide resolved
jina/types/document/multimodal.py Outdated Show resolved Hide resolved
jina/drivers/multimodal.py Outdated Show resolved Hide resolved
@jina-bot jina-bot added size/M area/testing This issue/PR affects testing and removed size/S labels Nov 25, 2020
@codecov
Copy link

codecov bot commented Nov 25, 2020

Codecov Report

Merging #1335 (c38e755) into master (6a77950) will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1335      +/-   ##
==========================================
+ Coverage   83.52%   83.58%   +0.06%     
==========================================
  Files         103      104       +1     
  Lines        6792     6861      +69     
==========================================
+ Hits         5673     5735      +62     
- Misses       1119     1126       +7     
Impacted Files Coverage Δ
jina/drivers/multimodal.py 91.42% <100.00%> (-3.93%) ⬇️
jina/excepts.py 100.00% <100.00%> (ø)
jina/types/document/__init__.py 97.36% <100.00%> (+0.05%) ⬆️
jina/types/document/multimodal.py 100.00% <100.00%> (ø)
jina/types/sets/document_set.py 98.85% <100.00%> (+0.13%) ⬆️
jina/peapods/grpc_asyncio.py 76.53% <0.00%> (-4.09%) ⬇️
jina/logging/sse.py 91.93% <0.00%> (-3.23%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a77950...c38e755. Read the comment docs.

@bwanglzu bwanglzu marked this pull request as ready for review November 25, 2020 22:42
@bwanglzu bwanglzu requested a review from a team as a code owner November 25, 2020 22:42
@bwanglzu bwanglzu requested review from imsergiy and deepankarm and removed request for imsergiy November 25, 2020 22:42
@bwanglzu bwanglzu changed the title feat: draft multimodal document feat: add multimodal document Nov 25, 2020
Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before merging I would like to have exposed an interface that lets the user build a MultiModalDocument from different data like:

d = MultiModalDocument({'modalitya': contentA, 'modalityB': contentB})

or

d = MultiModalDocument.from_content_or_embedding({'modalitya': contentA, 'modalityB': contentB}) (think about better naming)

Also I think that this class should have better care of setting the right granularity parameters for the MultiModalDocument and the chunks. (Not so important but needed for the sake of coherence)

self._modality_content_mapping = {}
if chunks:
self._validate(chunks)
self.chunks.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it weird to clear something that has not been initialized?

self._build_modality_content_mapping()
return self._modality_content_mapping

def extract_content_by_modality(self, modality: str) -> DocumentContentType:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather use from-modality since by seems like we are grouping

@bwanglzu
Copy link
Member Author

@JoanFM how about

MultimodalDocument.from_content_modality_mapping({'visual': xxx, 'textual': xxx}) 

Where xxx could be content or embedding. this classmethod has the same naming convention as the property MultimodalDocument.content_modality_mapping

@JoanFM
Copy link
Member

JoanFM commented Nov 29, 2020

@JoanFM how about

MultimodalDocument.from_content_modality_mapping({'visual': xxx, 'textual': xxx}) 

Where xxx could be content or embedding. this classmethod has the same naming convention as the property MultimodalDocument.content_modality_mapping

This is what I meant yes. The problem we have is that is not easy to tell if the content when is a numpy array if it is embedding or content. So we need a flag to say that.

@bwanglzu
Copy link
Member Author

Also I think that this class should have better care of setting the right granularity parameters for the MultiModalDocument and the chunks. (Not so important but needed for the sake of coherence)

Agreed, I'll create a getter & setter in :class:Document to expose granularity, now we can only access the property from :class:DocumentProto

@JoanFM
Copy link
Member

JoanFM commented Nov 29, 2020

@JoanFM how about

MultimodalDocument.from_content_modality_mapping({'visual': xxx, 'textual': xxx}) 

Where xxx could be content or embedding. this classmethod has the same naming convention as the property MultimodalDocument.content_modality_mapping

This is what I meant yes. The problem we have is that is not easy to tell if the content when is a numpy array if it is embedding or content. So we need a flag to say that.

Well i thought it twice and no, I think is better to assume they arr created from content so u assume the chunks are filled by document. But make sure it is documented that if one chunk is created from embeddings they will need to create from the other interface

nan-wang
nan-wang previously approved these changes Dec 1, 2020
Copy link
Member

@nan-wang nan-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM👍

jina/types/document/multimodal.py Show resolved Hide resolved
Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still missing the handling of granularity level, but looks really good!

else chunk.content
self._validate(chunks=self.chunks)

def _validate(self, chunks: List[Document]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just being a little picky, but we either make it static or we do not pass self.chunks and extract them inside the function right? I would vote for the second

bwanglzu and others added 3 commits December 2, 2020 16:36
* feat: add multimodal set

* test: move conftest

* test: fix unit test for multimodal driver
@JoanFM JoanFM merged commit e68c7ad into master Dec 2, 2020
@JoanFM JoanFM deleted the feat-add-multimodal-document branch December 2, 2020 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core This issue/PR affects the core codebase area/helper This issue/PR affects the helper functionality area/testing This issue/PR affects testing component/driver component/type size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants