Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add manifest documentation #103

Merged
merged 2 commits into from
May 9, 2023
Merged

Add manifest documentation #103

merged 2 commits into from
May 9, 2023

Conversation

RobbeSneyders
Copy link
Member

Fixes #62

Please check if there's anything I forgot to explain :)

@RobbeSneyders RobbeSneyders added the documentation Improvements or additions to documentation label May 9, 2023
@RobbeSneyders RobbeSneyders added this to the Alpha milestone May 9, 2023
@RobbeSneyders RobbeSneyders self-assigned this May 9, 2023
docs/manifest.md Outdated
subset is added or modified.

Each subset contains two properties:
- A location which points to the location where the underlying data is stored
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth mentioning that the location is relative to the base path

docs/manifest.md Outdated

### Subsets

Each subset is a different view on the data contained in the dataset. Different subsets could
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't say it's a different view, it's just a part of the dataset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded into "different features".

Copy link
Contributor

@NielsRogge NielsRogge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding!

}
```

All subsets should contain at least the data points defined by the index.
Copy link
Contributor

@PhilippeMoussalli PhilippeMoussalli May 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question regarding this point, say that we have an image subset and we gathered different subsets linked to it (captions, aesthetic score, ...) where each subset was probably obtained in a separate component. Then we extend the images subset (e.g. laion retrieval) and the index along with it.

seed_images (image_subset)-> caption_component (add caption subset) -> aesthetic_filter (add aesthetic score subset) -> image retrieval (add index and extend image_subset)

How do we handle the missing entries in the other subsets (captions, aesthetic score, ...) if we only plan on including them using subsequent components?

... -> image retrieval (add index and extend image_subset) -> caption_component -> filter_component

I think this is not possible with the current framework? This question might have already been addressed or I could have missed it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this is not supported. There's two ways around this:

  • Reorder the components in your pipeline (eg. in your example, move the image retrieval to the 2nd place)
  • Work with non-linear DAGs when this is supported in the future (eg. in your example, the 2nd part of the pipeline could branch off and later merge again)

@RobbeSneyders RobbeSneyders merged commit 283685d into main May 9, 2023
@RobbeSneyders RobbeSneyders deleted the feature/manifest-docs branch May 15, 2023 16:28
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
Fixes #62

Please check if there's anything I forgot to explain :)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create 'Manifest' documentation
3 participants