
GetFeastCoreVersion failed with code "StatusCode.UNIMPLEMENTED" #318

Closed
NicholaiStaalung opened this issue Nov 22, 2019 · 28 comments

NicholaiStaalung commented Nov 22, 2019

Expected Behavior

Feast Core and Serving should be connected in the Python SDK when running feast version, as shown by the following output (from https://github.com/gojek/feast/blob/master/docs/getting-started/install-feast.md):

{
  "sdk": {
    "version": "feast 0.3.0"
  },
  "core": {
    "url": "192.168.99.100:32090",
    "version": "0.3",
    "status": "connected"
  },
  "serving": {
    "url": "192.168.99.100:32091",
    "version": "0.3",
    "status": "connected"
  }
}

Current Behavior

When running feast version

GetFeastCoreVersion failed with code "StatusCode.UNIMPLEMENTED"
Method feast.core.CoreService/GetFeastCoreVersion is unimplemented
{"sdk": {"version": "feast 0.3.0"}, "core": {"url": "192.168.39.232:32090", "version": "", "status": "not connected"}, "serving": {"url": "192.168.39.232:32091", "version": "0.3", "status": "connected"}}

Steps to reproduce

Follow https://github.com/gojek/feast/blob/master/docs/getting-started/install-feast.md steps 0-2 for minikube (local) installation.

Then I ran:
pip3 install -e ${FEAST_HOME_DIR}/sdk/python --user
feast config set core_url ${FEAST_CORE_URL}
feast config set serving_url ${FEAST_SERVING_URL}
feast version
This is where the problem occurred.

Specifications

  • Version: Master (0.3)
  • Platform: Localhost (Ubuntu 18.04)
  • Subsystem: python 3.6.8, helm 2.16.0, kubectl client 1.16.1, kubectl server 1.15.5, minikube 1.5.2

Possible Solution

I'm not sure.
I did, however, notice something strange when the pods are starting up. In the picture below, a number of restarts occur for the core and serving services in the cluster. Before a restart occurs, the pods always go from 'ContainerCreating' to 'Running' to 'Error' to 'CrashLoopBackOff'. This happens in loops until it finally just says 'Running' after 5-6 minutes, and it happens every time I do a clean (maybe unclean) installation. My best guess is that the core service has a bug with the connection, but it could be in the Python SDK as well for all I know.
[screenshot: kubectl get pods output showing restarts of the core and serving pods]

woop commented Nov 22, 2019

Hey @NicholaiStaalung. Thanks for filing this issue :)

Question: Did it eventually come up for you? Was everything working as expected?

@NicholaiStaalung

Hi @woop

Hmm, from your question I guess it wasn't working as expected. What do you mean by "it"?

woop commented Nov 22, 2019

I wasn't clear: Did Feast core eventually come up and work for you?

The restarts are part of this "full scale" installation. Serving will restart until Core is up, Core will wait/restart until Kafka is up, and Kafka waits for Zookeeper. We are still trying to reduce the total time this installation takes, as well as the number of dependencies in this system.

@NicholaiStaalung

I see. In the output of kubectl get pods the status is "Running", which it remains after 5-6 minutes. This makes sense according to your description of the flow. I don't know of any other way to test whether the core service is running properly. Any magic command you want to share? :)

woop commented Nov 23, 2019

You can just try feast feature-sets list and see if it gives you an empty list.

@NicholaiStaalung

@woop

I get an almost empty list.

From feast feature-sets list:

NAME VERSION

woop commented Nov 23, 2019

Yip, that's empty :)

NicholaiStaalung commented Nov 24, 2019

Is it then supposed to be working correctly?

I tried running feast feature-sets create passengers, but it gave me a traceback which doesn't give much information, really. Maybe you can understand it better? I see something about core.py at the start, so I guess it could be related to the core service issue, as it still says "not connected" when I run feast version.

Traceback (most recent call last):
  File "/home/nrs/.local/bin/feast", line 11, in <module>
    load_entry_point('feast', 'console_scripts', 'feast')()
  File "/home/nrs/.local/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/nrs/.local/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/nrs/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nrs/.local/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/nrs/.local/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/nrs/.local/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/nrs/misc/FeastKubeflow-test/feast/sdk/python/cli.py", line 157, in create
    feast_client.apply(FeatureSet(name=name))
  File "/home/nrs/misc/FeastKubeflow-test/feast/sdk/python/feast/client.py", line 203, in apply
    self._apply_feature_set(feature_set)
  File "/home/nrs/misc/FeastKubeflow-test/feast/sdk/python/feast/client.py", line 215, in _apply_feature_set

woop commented Nov 24, 2019

Hey @NicholaiStaalung

Yea the behavior of list is as expected. The stack trace is not normal though.

Is that the complete stack trace, or did any messages get trimmed at the bottom?

@NicholaiStaalung

Complete stack trace. It's not very informative, no :)

woop commented Nov 24, 2019

I will add better exception handling there to see if we can improve the response.

In the meantime, can you try to run the same thing from Python as a library (instead of the CLI)?

You should probably also confirm that your configuration is correctly applied, meaning you should have a file ~/.feast/config.toml with a core and a serving URL.
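For reference, running the same thing from Python as a library could look roughly like the sketch below. It is only a sketch based on the 0.3 SDK layout visible in the traceback above (feast/client.py and a FeatureSet class); the exact import paths, the core_url/serving_url constructor arguments, and the version()/list_feature_sets() method names are assumptions rather than confirmed API.

# Rough sketch of using the Feast 0.3 Python SDK as a library instead of the CLI.
# Import paths and method names are assumptions inferred from the traceback above.
from feast.client import Client           # feast/client.py appears in the traceback
from feast.feature_set import FeatureSet  # FeatureSet(name=...) appears in the traceback

# Point the client directly at the minikube NodePort endpoints instead of relying
# on ~/.feast/config.toml (the addresses are the ones from the feast version output above).
client = Client(core_url="192.168.39.232:32090", serving_url="192.168.39.232:32091")

print(client.version())                      # should report core and serving as "connected"

client.apply(FeatureSet(name="passengers"))  # same call the CLI makes in cli.py line 157
print(client.list_feature_sets())            # assumed listing method; CLI equivalent is feast feature-sets list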

NicholaiStaalung commented Nov 24, 2019

@woop
The core service is quite unstable. It keeps restarting. I noticed because I successfully created a feature set by following this guide: https://github.com/gojek/feast/blob/feast-0-3-python-sdk-doc/sdk/python/docs/readme.md. I could list the feature set through the Python SDK and the terminal sporadically. Then I figured that the Core Service must have been unstable, did a watch kubectl get pods, and saw that it kept restarting. This behaviour started immediately after I created a feature set with a single feature and a single entity. As you can see, the service has been running for quite some time.
[screenshot: kubectl get pods output showing the feast-core pod restarting]

NicholaiStaalung commented Nov 24, 2019

Yeah, I tried it. And I do have the config file. See my comment above.

woop commented Nov 24, 2019

Can you do kubectl logs [feast-core-pod-name] to see the logs of feast-core?

NicholaiStaalung commented Nov 24, 2019

Here you go:
core_logs.txt

I don't think it's the full log. It only shows about 20 seconds. I guess the logs are discarded when it restarts.

woop commented Nov 24, 2019

It seems like this is the problem: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call.

You are using Minikube right? Are you sure that Kafka is accessible?

The way that I got it to work in Minikube was to expose the Kafka cluster to the outside and then connect from feast-core directly to the external address.

This guide illustrates it: https://github.com/gojek/feast/blob/master/docs/getting-started/install-feast.md

Can you run kubectl get services and see which port your Kafka is exposed on, and then run minikube ip to see what your minikube external IP is? Then try to list the topics using kafkacat

docker run -it --network=host edenhill/kafkacat:1.5.0 -b $(minikube ip):31090 -L

You might have to change that port based on the one that your Kafka is listening on.

The above command should print out information about the topics in your Kafka deployment. If it doesn't, then that is a problem.
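As an aside, if running kafkacat through Docker is inconvenient, a roughly equivalent topic check can be done from Python with the kafka-python package. This is an assumption and not part of the Feast guide; install it with pip install kafka-python and substitute your own minikube IP and Kafka NodePort.

# Alternative Kafka reachability/topic check using the kafka-python package (assumed dependency).
from kafka import KafkaConsumer

# Use the minikube IP and the NodePort Kafka is exposed on (31090 in the install guide).
consumer = KafkaConsumer(bootstrap_servers="192.168.39.220:31090")
print(consumer.topics())   # should include the 'feast' topic if the broker is reachable
consumer.close()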

NicholaiStaalung commented Nov 26, 2019

Hi @woop,

Sorry, I was away. Thank you for helping with this issue.

I tried your commands

From kubectl get services it looks like Kafka is exposed on 9092, which I figure is correct?
[screenshot: kubectl get services output]

minikube ip gets me 192.168.39.220

docker run -it --network=host edenhill/kafkacat:1.5.0 -b 192.168.39.220:31090 -L gets me
[screenshot: kafkacat output listing the 'feast' topic]

It printed out the topic 'feast', which I assume is the correct output. However, I'm not able to tell whether this gets me any closer to fixing the issue.

woop commented Nov 26, 2019

Thanks for the detailed response!

Those look fine. Would you mind showing me your values.yaml file that you used to do this install?

NicholaiStaalung commented Nov 27, 2019

Here you go

####REMOVED LINK

woop commented Nov 27, 2019

I think you pasted the wrong link. It's taking me back to this issue.

NicholaiStaalung commented Nov 27, 2019

Weird :)

my-feast-values.yaml.txt

woop commented Nov 27, 2019

@NicholaiStaalung I am not sure how familiar you are with Kubernetes, but what I would do next to debug this is to confirm whether Feast Core can actually communicate with Kafka.

This would require you to SSH into the feast-core pod.

Something like

kubectl exec -it [podname] /bin/bash
curl  192.168.39.220:31090

It should say something like connection reset by peer
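A lower-level variant of that check is a plain TCP connect, for example from Python, which answers the same question: is the Kafka port reachable from inside the pod? The address below is just the one used earlier in this thread and should be replaced with your own minikube IP and Kafka NodePort.

# Minimal TCP reachability check, equivalent in spirit to the curl command above.
import socket

sock = socket.create_connection(("192.168.39.220", 31090), timeout=5)  # raises OSError if unreachable
print("Kafka port is reachable")
sock.close()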

NicholaiStaalung commented Nov 27, 2019

Okay, so I ssh'ed into the pod and did a curl on the IP and port as you described. I did get a connection reset by peer (I had to change the IP as I redeployed minikube).
[screenshot: curl from inside the feast-core pod returning "connection reset by peer"]

And Kubernetes is definitely not a strength of mine :)

woop commented Nov 28, 2019

Your connectivity seems fine here. Are you seeing any other logs on any other pods like Kafka?

Perhaps it would be easier if you use Docker Compose here. We are in the process of adding it to the project; it should be ready next week. #328

@NicholaiStaalung

Okay. Thanks for your assistance. I will close the issue if Docker Compose works for me.

woop commented Nov 28, 2019

Sure

@NicholaiStaalung

Docker Compose did indeed work. Closing, as I have it running locally now.

woop commented Dec 9, 2019

Thank you @NicholaiStaalung :)
