Merging all the codes of dev to main branch. #9

Merged
64 commits merged on Oct 19, 2022

dhrubo-os
Collaborator

Description

Merges all code from the dev branch into the main branch.

Issues Resolved

Merges all code from the dev branch into the main branch.

--signoff::dhrubo@amazon.com

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

benwtrent and others added 29 commits October 18, 2022 16:25
This adds some more definite types for our NLP tasks and tokenization configurations.

This is the first step in allowing users to more easily import their own transformer models via something other than Hugging Face.

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
Adds support for `question_answering` NLP models within the pytorch model uploader.

Related: elastic/elasticsearch#85958
Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
This improves the user-facing functions and classes for PyTorch NLP model upload to Elasticsearch.

Previously it was difficult to wrap your own module for uploading to Elasticsearch.

This commit splits some classes out, adds new ones, and adds tests showing how to wrap some simple modules.

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
…es (#465)

This switches our sklearn.DecisionTreeClassifier serialization logic to account for multi-valued leaves in the tree.

The key difference between our inference and DecisionTreeClassifier is that we run a softmax over the leaf, whereas sklearn simply normalizes the results.

This means that the "probabilities" we return will differ from sklearn's.
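
To illustrate why the two sets of "probabilities" differ, here is a minimal sketch in plain Python. The leaf values are hypothetical, and what exactly is stored in a serialized leaf is an assumption; the sketch only contrasts plain normalization with a softmax over the same numbers.

```python
import math


def normalize(values):
    """Divide each value by the total (plain normalization)."""
    total = sum(values)
    return [v / total for v in values]


def softmax(values):
    """Exponentiate each value, then normalize (shifted by the max for stability)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-class values held by a single leaf (assumption, not from the PR).
leaf = [3.0, 1.0]

print("normalized:", normalize(leaf))
print("softmax:   ", softmax(leaf))
```

Both functions return values that sum to 1 and rank the classes in the same order, so the predicted class is unchanged; only the reported magnitudes differ.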

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
Co-authored-by: Seth Michael Larson <seth.larson@elastic.co>
Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
… NLP task type from model config (#475)

For many model types, we don't need to require the task requested. We can infer the task type based on the model configuration and architecture.

This commit makes the `task-type` parameter optional for the model upload script and adds logic for auto-detecting the task type based on the 🤗 model.
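
As a rough sketch of the idea (not the code from this commit), the task type could be inferred from the `architectures` list found in a Hugging Face model config. The function name `infer_task_type`, the suffix table, and every task name other than `question_answering` are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical mapping from architecture-name suffix to task type.
# Only "question_answering" is named in this PR; the rest are assumed.
ARCHITECTURE_SUFFIX_TO_TASK = {
    "ForQuestionAnswering": "question_answering",
    "ForTokenClassification": "ner",
    "ForSequenceClassification": "text_classification",
    "ForMaskedLM": "fill_mask",
}


def infer_task_type(config: dict, requested: Optional[str] = None) -> str:
    """Return the requested task type, or infer one from the model config."""
    if requested is not None:
        # An explicitly supplied task type always wins.
        return requested
    for architecture in config.get("architectures") or []:
        for suffix, task in ARCHITECTURE_SUFFIX_TO_TASK.items():
            if architecture.endswith(suffix):
                return task
    raise ValueError("could not infer the task type; pass it explicitly")


# Example: a config dict shaped like a Hugging Face config.json.
print(infer_task_type({"architectures": ["BertForQuestionAnswering"]}))
```

Keeping the explicit parameter as an override means the script can still handle models whose architecture does not map cleanly to a single task.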

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
…n elastic vs open search

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
LEFTA98 and others added 12 commits October 18, 2022 16:25
Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
…arning

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
@ylwu-amzn
Collaborator

I see several files that contain "elastic" in their names, such as .ci/jobs/elastic+eland+7.x.yml. How about removing these files?

* Added support for XGBoost 1.6 (`#458`_)
* Added support for ``question_answering`` NLP tasks (`#457`_)

.. _#457: https://github.com/elastic/eland/pull/457
Collaborator

Should we keep this changelog file? Most of its content seems to be about eland.

@@ -57,3 +57,100 @@ If you discover a potential security issue in this project we ask that you notif
## Licensing

See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
**Repository:** <https://github.com/elastic/eland>
Collaborator

- Support for Docker
- Support for continuous integration
- Regenerating Sphinx docs
- Creating tutorials for `opensearch-py-ml` in both notebook and video form
Collaborator

How about following the README template of https://github.com/opensearch-project/opensearch-py and adding "Code of Conduct", "License", and "Copyright" sections?

Collaborator Author

Yeah, I was planning to add those in the next PR.

@@ -0,0 +1,2319 @@
{
Collaborator

@ylwu-amzn ylwu-amzn Oct 19, 2022

I'm not sure how a community user would use this demo notebook. Does it depend on a dataset? How should the environment be prepared to run it? Explaining this in the README or another doc would help.

@@ -0,0 +1,1213 @@
{
Collaborator

The file name is not intuitive: we have "demo.ipynb" and this "demo_notebook.ipynb". How about renaming it to reflect what it demonstrates?

Collaborator Author

In fact, I'm planning to remove both of those and then add a detailed README to follow.

Collaborator

@ylwu-amzn ylwu-amzn left a comment

Approved now; you can address the comments in the next PR.

@dhrubo-os dhrubo-os merged commit ba53ce1 into main Oct 19, 2022
@dhrubo-os dhrubo-os deleted the dev branch February 8, 2023 18:01
9 participants