Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature and feature set labels, for metadata #536

Merged
merged 25 commits into from Apr 30, 2020

Conversation

imjuanleonard
Copy link
Contributor

@imjuanleonard imjuanleonard commented Mar 16, 2020

What this PR does / why we need it:
-> Extension for #463

Update (@suwik):

  • Addressed previous review comments
  • Added labels on the feature set level
  • Removed the python SDK changes related to labels implemented inside field.py (30d63fa)
  • Made a small unit test refactoring on the way.

Which issue(s) this PR fixes:
Fixes #463

Does this PR introduce a user-facing change?:

Feature spec and Feature set spec will both have a new field called labels

@feast-ci-bot
Copy link
Collaborator

Hi @imjuanleonard. Thanks for your PR.

I'm waiting for a gojek member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@imjuanleonard
Copy link
Contributor Author

Hello @woop I cut the branch from the #499 and latest of #508 and implemented the SDK for the same
When this request gets merge, we can close #499 and #508 altogether

@khorshuheng
Copy link
Collaborator

/ok-to-test

@imjuanleonard
Copy link
Contributor Author

Implementation Detail
-> Cut Branch from #499
-> Cut Branch from #508
-> Implement the new python SDK for labels

@imjuanleonard
Copy link
Contributor Author

Demo List Features

from feast import Client,Entity, ValueType, FeatureSet, Feature
from pprint import pprint

client = Client(core_url="localhost:6565", serving_url="localhost6566")
client.set_project("test")

client_features = client.list_features()
pprint(vars(client_features['city']))

Output

{'_dtype': <ValueType.STRING: 2>,
 '_labels': {'key_test': 'value_test'},
 '_name': 'city'}

@imjuanleonard
Copy link
Contributor Author

Demo List Feature Set

Requirements
Add this function so it can prety print the feast/feature.py

    def __repr__(self):
        return "<Test name:%s dtype:%s labels:%s>" % (self._name, self.dtype, self.labels)

Create the python function to list feature set and print the features

from feast import Client,Entity, ValueType, FeatureSet, Feature
client = Client(core_url="localhost:6565", serving_url="localhost6566")
client.set_project('juan')
test = client.list_feature_sets(project="juan",name="driver",version="1")
for featureset in test:
    print(featureset.features)

Result:

[<Test name:city dtype:ValueType.STRING labels:{'key_test': 'value_test'}>]

@imjuanleonard
Copy link
Contributor Author

Example Set Labels

from google.protobuf.duration_pb2 import Duration
from feast import Client,Entity, ValueType, FeatureSet, Feature

client = Client(core_url="localhost:6565", serving_url="localhost6566")
client.set_project('juan')

feature1=Feature(name='total_transaction4',dtype=ValueType.INT64, labels=dict())
feature1.set_label("test","tost")
feature1.set_label("tast","tust")
feature1.set_label("bost","bust")

customer_fs = FeatureSet("customer_transactions",
                         entities=[Entity(name='customer_id4', dtype=ValueType.INT64, labels=dict())],
                         features=[feature1],
                         max_age=Duration(seconds=432000))
client.apply(customer_fs)

Result

Feature set updated/created: "customer_transactions:"

DB

feature_set_id          | juan/customer_transactions:5
bool_domain             |
domain                  |
float_domain            |
group_presence          |
image_domain            |
int_domain              |
labels                  | {"tast":"tust","test":"tost","bost":"bust"}
mid_domain              |
name                    | total_transaction4
natural_language_domain |
presence                |
project                 | juan
shape                   |
string_domain           |
struct_domain           |
time_domain             |
time_of_day_domain      |
type                    | INT64
url_domain              |
value_count             |
version                 | 5

@imjuanleonard
Copy link
Contributor Author

/retest

sdk/python/feast/client.py Outdated Show resolved Hide resolved
class Field:
"""
High level field type. This is the parent type to both entities and
features.
"""

def __init__(self, name: str, dtype: ValueType):
def __init__(self, name: str, dtype: ValueType, labels=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt labels have a type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is of type dictionary it is handled on the initiation below, this is to make the labels Optional so the function signature doesn't change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use Optional[Dict] there I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If what you mean is typing.Optional

I have tried to use it before, it is not behaving as optional parameter instead according to the documentation it is behaving as optional value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not entirely clear what you mean. Can you please refer to this explanation of Optional type hints: https://stackoverflow.com/questions/51710037/how-should-i-use-the-optional-type-hint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I guess after python 3.5 they are using the Optional aforementioned for type hinting. Have added it

@woop
Copy link
Member

woop commented Mar 17, 2020

@imjuanleonard we absolutely need tests in order to sign this off. Can you please add them in the same PR?

@imjuanleonard
Copy link
Contributor Author

#536 (comment)

The test has been added

sdk/python/feast/feature.py Outdated Show resolved Hide resolved
@@ -46,6 +50,14 @@ def dtype(self) -> ValueType:
"""
return self._dtype

@property
def labels(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why some code is on the feature and some of it is on the field? Can we label entities? If not, why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am following the current discussion that the use case people need are for the feature currently. So I don't want to assume that people need for the entities as well, Do we need metadata for the entity?

Copy link
Member

@woop woop Mar 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so much about entity vs feature as it is feature vs field. Entities inherit from Fields. If the Field can have a label then an Entity can have a label. So we should either limit this entirely to features, or we should add labels to both Entities and Features, which may happen through the Field class.

Copy link
Collaborator

@suwik suwik Apr 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the .proto files on this branch, I can see that labels field is only applied to Features (not Entities or Fields). Not sure how to proceed with this one as I'm not familiar with requirements/use cases of the Python client. Any suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually Fields are the carrier of (feature and entity) values—actual data, not the metadata of registry specifications. I think these changes to field.py should probably be backed out.

From discussion Friday, it sounds like we're in agreement to leave entity labeling out of scope for now. Useful thing to have someday (perhaps when #405 gets focus), but it wasn't initially planned for Feast 0.5. So I think we should proceed with adding map<string, string> labels to FeatureSetSpec as we discussed and providing the API for that.

We might have to defer Python SDK at least in part to the Gojek team, we don't currently use it. It's the primary means of verifying end-to-end functional integration in the project, but maybe as a compromise we can demonstrate the implementation using grpc_cli examples as a test plan, and Python SDK could be finished in a second PR.

@woop Does this sound reasonable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw @suwik have a look through tests/e2e directory in the root of the project for examples of Python SDK, useful for learning the system. They're kicked off for CI by scripts in infra/scripts (used to be in .prow/scripts at the point of our internal fork).

Copy link
Collaborator

@suwik suwik Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestion @ches. I'm thinking it would be good to add some logic testing the labels to the e2e tests (and later we could take that to our repo too). I tried to follow the installation instructions in the readme, but after running docker-compose up none of the mentioned ports are open. Any suggestion how to get my environment running for this repo so that I'm able to run the e2e tests?

EDIT: ok, seems like I had COMPOSE_FILE set to our internal fork file and that's why it wasn't starting all the services that I was expecting to.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the python SDK changes related to labels (implemented inside field.py). I did that in a separate commit so it should be easy to rollback in case we actually need them.

@ches
Copy link
Member

ches commented Mar 22, 2020

I was a bit misled by the PR title and trying to follow the history from #499—this PR is not only Python SDK, it's complete implementation of feature labels.

Conflicts need to be fixed now (mostly generated files that have been removed from Git)—if in the process you could rebase and squash some of the false-start commits like list_features functionality that was removed, and typos, formatting, and comment updates, that would be fantastic. Not absolutely essential for you to do it, but being a significant PR, I'd love to be able to review it commit-by-commit.

@@ -47,6 +48,10 @@
@Column(name = "project")
private String project;

// Labels that this field belongs to
@Column(name = "labels", columnDefinition = "text")
private String labels;
Copy link
Member

@ches ches Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so I guess we're going to want to move this to the Feature entity class in #655… this was mentioned before but I guess I wasn't understanding the implications until I saw #655 reduced from #612.

Creates a bit of a conundrum for us since I thought the labels feature could be a straightforward backport (internal, assuming Feast isn't going to backport features). That's not going to be the case if this has to wait on #655 and substantial SQL schema migrations are going to be required for it.

@woop
Copy link
Member

woop commented Apr 29, 2020

/test test-end-to-end-batch

@woop
Copy link
Member

woop commented Apr 29, 2020

Looks good to me. Will you let me know when its ready to be merged?

@woop
Copy link
Member

woop commented Apr 29, 2020

Thanks for being so patient here @ches. I guess the conversation around Field, Entity, and Feature has temporarily been answered, although I personally don't think we have reached our destination yet. I do feel like the door is left more open with our current approach.

Map<String, String> featureSetLabels =
new HashMap<>() {
{
put("description", "My precious feature set");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my precious

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot! ;)

LABEL_KEY = "my"
LABEL_VALUE = "label"


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good, but possibly add a TODO to migrate this over to the python sdk once that has been updated?

@suwik
Copy link
Collaborator

suwik commented Apr 30, 2020

/test test-end-to-end

return get_feature_set_response.feature_set

@pytest.mark.timeout(45)
@pytest.mark.run(order=9)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @woop uses these to group similar tests together (10- - so maybe your tests can start from 50?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this stuff can be tweaked when changing the tests to use SDK in #663

@ches ches changed the title Add labels as metadata to features Add feature and feature set labels, for metadata Apr 30, 2020
@suwik
Copy link
Collaborator

suwik commented Apr 30, 2020

/test test-end-to-end

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: imjuanleonard, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@woop
Copy link
Member

woop commented Apr 30, 2020

/lgtm

@feast-ci-bot feast-ci-bot merged commit ea4e2b0 into feast-dev:master Apr 30, 2020
ches pushed a commit to agoda-com/feast that referenced this pull request May 23, 2020
feast-ci-bot pushed a commit that referenced this pull request May 23, 2020
* Add feature and feature set labels

Backports #536 to v0.3

* Fix feature labels test for round trip

It appears that this test was only exercising the test setup code, not
covering that labels work when applied.

This change should be forward-ported to master.

Co-authored-by: Suwinski, Krzysztof (Agoda) <Krzysztof.Suwinski@agoda.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend feature set and/or feature metadata
8 participants