xds relay fetches incorrect eds resources #147

Open
jyotimahapatra opened this issue Sep 17, 2020 · 4 comments
@jyotimahapatra
Contributor

jyotimahapatra commented Sep 17, 2020

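Below are snapshots of the version info on the upstream envoymanager and on xds relay.
The cache keys in the xds relay cache do not map to the correct responses. For example, the v3-pyexample2workers-staging-iad_eds key holds the kitchensink endpoints at kitchensink's version.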

➜  ~ em exec --stdin --tty envoymanager-main-754f8c6b74-ck2pr -n envoymanager-staging  --container envoymanager-service-gojson -- curl -s "0:6070/entry_dump?key=pyexample2_staging_eds" | head
[
{
  "key": "pyexample2_staging_eds",
  "version": "93efeb417a01422a7f856a1d14d70f2cbeefde0d",
  "resource":
{
  "clusterName": "pyexample2",
  "endpoints": [
    {
      "locality": {
➜  ~ em exec --stdin --tty envoymanager-main-754f8c6b74-ck2pr -n envoymanager-staging  --container envoymanager-service-gojson -- curl -s "0:6070/entry_dump?key=kitchensink_staging_eds" | head
[
{
  "key": "kitchensink_staging_eds",
  "version": "23878f15191ae03cdc018a012f2e5ddce2c2db40",
  "resource":
{
  "clusterName": "kitchensink",
  "endpoints": [
    {
      "locality": {
➜  ~ em exec --stdin --tty envoymanager-main-754f8c6b74-ck2pr -n envoymanager-staging  --container envoymanager-service-gojson -- curl -s "0:6070/entry_dump?key=pyexample2workers_staging_eds" | head
[
{
  "key": "pyexample2workers_staging_eds",
  "version": "d507920d74ddae6b002c24a899313aa656a78756",
  "resource":
{
  "clusterName": "pyexample2workers",
  "endpoints": [
    {
      "locality": {
➜  ~ em exec --stdin --tty xdsrelay-main-7bfc54dd8f-xjbqt -n xdsrelay-staging  --container xdsrelay-service-gojson -- curl -s 0:6070/cache/v3-pyexample2workers-staging-iad_eds | head
{
  "Cache": [
    {
      "Key": "v3-pyexample2workers-staging-iad_eds",
      "Resp": {
        "VersionInfo": "23878f15191ae03cdc018a012f2e5ddce2c2db40",
        "Resources": {
          "Endpoints": [
            {
              "cluster_name": "kitchensink",
➜  ~ em exec --stdin --tty xdsrelay-main-7bfc54dd8f-xjbqt -n xdsrelay-staging  --container xdsrelay-service-gojson -- curl -s 0:6070/cache/v3-pyexample2-staging-iad_eds | head
{
  "Cache": [
    {
      "Key": "v3-pyexample2-staging-iad_eds",
      "Resp": {
        "VersionInfo": "d507920d74ddae6b002c24a899313aa656a78756",
        "Resources": {
          "Endpoints": [
            {
              "cluster_name": "pyexample2workers",
➜  ~ em exec --stdin --tty xdsrelay-main-7bfc54dd8f-xjbqt -n xdsrelay-staging  --container xdsrelay-service-gojson -- curl -s 0:6070/cache/v3-kitchensink-staging-iad_eds | head
{
  "Cache": [
    {
      "Key": "v3-kitchensink-staging-iad_eds",
      "Resp": {
        "VersionInfo": "93efeb417a01422a7f856a1d14d70f2cbeefde0d",
        "Resources": {
          "Endpoints": [
            {
              "cluster_name": "pyexample2",
➜  ~
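
To make the mismatch explicit, the versions in the dumps above can be cross-referenced. This is a minimal sketch: the two dictionaries are copied by hand from the output above, and the helper name relay_entry_sources is hypothetical, not part of xds relay or envoymanager.

```python
# Versions reported by the upstream envoymanager, keyed by its entry key.
upstream_versions = {
    "pyexample2_staging_eds": "93efeb417a01422a7f856a1d14d70f2cbeefde0d",
    "kitchensink_staging_eds": "23878f15191ae03cdc018a012f2e5ddce2c2db40",
    "pyexample2workers_staging_eds": "d507920d74ddae6b002c24a899313aa656a78756",
}

# (VersionInfo, cluster_name) found under each xds relay cache key.
relay_cache = {
    "v3-pyexample2workers-staging-iad_eds": (
        "23878f15191ae03cdc018a012f2e5ddce2c2db40", "kitchensink"),
    "v3-pyexample2-staging-iad_eds": (
        "d507920d74ddae6b002c24a899313aa656a78756", "pyexample2workers"),
    "v3-kitchensink-staging-iad_eds": (
        "93efeb417a01422a7f856a1d14d70f2cbeefde0d", "pyexample2"),
}


def relay_entry_sources(upstream, relay):
    """Map each relay cache key to the upstream key whose version it holds."""
    by_version = {version: key for key, version in upstream.items()}
    return {
        relay_key: by_version.get(version)
        for relay_key, (version, _cluster) in relay.items()
    }


sources = relay_entry_sources(upstream_versions, relay_cache)
for relay_key, upstream_key in sources.items():
    print(f"{relay_key} holds {upstream_key}")
```

Every relay key ends up paired with a different service's upstream entry, which matches the cluster_name values in the relay dumps.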
@jyotimahapatra
Contributor Author

Found the cause. The aggregation rule in our Lyft-specific private repo had a bug: EDS requests were cached on the service name alone. So when svcA requested EDS, every response landed under the same cache key; the last EDS response won and overwrote the EDS entries for all previous services.
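
A toy model of the overwrite, assuming the buggy rule derived the cache key from the requesting service alone. The key format mimics the keys in the dumps above, but buggy_key, the cluster names, and the order of the responses are illustrative assumptions, not the actual rule or traffic.

```python
def buggy_key(service):
    """Cache key built from the requesting service only (no resource name)."""
    return f"v3-{service}-staging-iad_eds"


cache = {}

# svcA requests EDS for two different clusters; both map to the same key,
# so the second response silently replaces the first.
for service, cluster in [("svcA", "clusterX"), ("svcA", "clusterY")]:
    cache[buggy_key(service)] = {"cluster_name": cluster}

print(cache)
```

Only one entry survives, and it belongs to whichever cluster responded last.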

@jyotimahapatra
Contributor Author

After adding rules that put the resource name into the cache key for EDS, the cache maps keys to the correct responses.

  - rules:
    - match:
        request_type_match:
          types:
            - "type.googleapis.com/envoy.api.v2.RouteConfiguration"
            - "type.googleapis.com/envoy.config.route.v3.RouteConfiguration"
            - "type.googleapis.com/envoy.api.v2.ClusterLoadAssignment"
            - "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment"
      result:
        resource_names_fragment:
          element: 0
          action: { exact: true }
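
Roughly, the rule above says: for RDS and EDS requests, take element 0 of the request's resource names and use it verbatim as a cache-key fragment. A sketch of that idea follows. The functions resource_names_fragment and cache_key are hypothetical stand-ins, not xds relay's implementation, and the "_" separator, the key layout, and the handling of a missing element are assumptions.

```python
def resource_names_fragment(resource_names, element=0):
    """Return the chosen resource name unchanged (the 'exact' action)."""
    # Assumption: a request with too few resource names yields no fragment.
    if element >= len(resource_names):
        return None
    return resource_names[element]


def cache_key(base_key, resource_names):
    """Append the resource-name fragment to a base key, if there is one."""
    fragment = resource_names_fragment(resource_names)
    return base_key if fragment is None else f"{base_key}_{fragment}"


key_a = cache_key("v3-svcA-staging-iad_eds", ["clusterX"])
key_b = cache_key("v3-svcA-staging-iad_eds", ["clusterY"])
print(key_a, key_b)
```

With the resource name in the key, two EDS requests from the same service no longer share a cache entry.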

@jyotimahapatra
Contributor Author

An important point here is that these rules may well apply to all users of the project.

@eapolinario
Contributor

In the Envoy Slack we discussed two alternatives for fixing this:

  • define an aggregation rules checker, similar to Envoy's router check tool.
  • implicitly add the resource name to the cache key when it is not already present.
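
The first alternative could look something like the sketch below: feed sample requests through a key function and report any key that distinct resource-name sets collapse onto. Everything here (find_key_collisions, the request dictionaries, both key functions) is hypothetical and only illustrates the idea; no such checker exists in the project as far as this thread shows.

```python
from collections import defaultdict


def find_key_collisions(key_fn, sample_requests):
    """Return cache keys that more than one distinct resource-name set maps to."""
    seen = defaultdict(set)
    for request in sample_requests:
        seen[key_fn(request)].add(tuple(request["resource_names"]))
    return {key: names for key, names in seen.items() if len(names) > 1}


def service_only_key(request):
    """Key that ignores the resource name, like the buggy rule."""
    return f"{request['service']}_eds"


def service_and_resource_key(request):
    """Key that includes the first resource name."""
    return f"{request['service']}_{request['resource_names'][0]}_eds"


samples = [
    {"service": "svcA", "resource_names": ["clusterX"]},
    {"service": "svcA", "resource_names": ["clusterY"]},
]

print(find_key_collisions(service_only_key, samples))
print(find_key_collisions(service_and_resource_key, samples))
```

A checker like this would have flagged the service-only key before deployment, while the key that includes the resource name reports no collisions for the same samples.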
