When reading numeric strings from an Elasticsearch store with --es.tags-as-fields.all, the string is parsed as a number #1236

Closed
ScottKaye opened this issue Nov 27, 2018 · 10 comments · Fixed by #1618

@ScottKaye

ScottKaye commented Nov 27, 2018

Requirement - what kind of business use case are you trying to solve?

There seems to be a bug similar to jaegertracing/jaeger-ui#146, which may now relate to #1018. I am using Elasticsearch as my span store, with ES_TAGS_AS_FIELDS_ALL=true. This stores spans like this:

{
    "traceID": "bb62383fdf565f55",
    "spanID": "bb62383fdf565f55",
    "operationName": "operation",
    "references": [],
    "flags": 1,
    "startTime": 1543345090571024,
    "startTimeMillis": 1543345090571,
    "duration": 420980,
    "tag": {
        "sampler@type": "const",
        "sampler@param": "True",
        "identifier": "000045"
    },
    "logs": [],
    "process": {
        "serviceName": "ui",
        "tag": {
            "jaeger@version": "CSharp-0.2.2.0",
            "hostname": "local",
            "ip": "10.1.1.1"
        }
    }
}

Here's the relevant part of the index mapping, showing that identifier is mapped as a keyword (string), rather than a number.

{
    "span": {
        "properties": {
            "tag": {
                "properties": {
                    "identifier": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "sampler@param": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "sampler@type": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }
        }
    }
}

Problem - what in Jaeger blocks you from solving the requirement?

However, when I open this span in Jaeger UI, the string is parsed as a number, so the leading zeroes (padding) are dropped. Users who copy and paste the identifier value into another tool to search for it don't get the correct results until they manually pad the number.

Proposal - what do you suggest to solve the problem or improve the existing situation?

If there is any way to read the Elasticsearch type mapped to the value at query time, a value of "keyword" would represent a string. The mapping can be pulled with GET localhost:9200/jaeger-span*/_mapping/span/field/tag.identifier, which returns

{
    "jaeger-span-2018-11-27": {
        "mappings": {
            "span": {
                "tag.identifier": {
                    "full_name": "tag.identifier",
                    "mapping": {
                        "identifier": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
        }
    }
}
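
As a rough illustration of this proposal, the sketch below parses the field-mapping response shown above and reports the Elasticsearch type of `tag.identifier`. It is only a sketch under stated assumptions: `esFieldType`, `fieldMapping`, `indexMappings`, and `sampleResponse` are hypothetical names, the response body is hard-coded instead of being fetched over HTTP, and nothing here is taken from Jaeger's code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fieldMapping mirrors one entry of the field-mapping response:
// field path -> {"full_name": ..., "mapping": {leafName: {"type": ...}}}.
type fieldMapping struct {
	FullName string `json:"full_name"`
	Mapping  map[string]struct {
		Type string `json:"type"`
	} `json:"mapping"`
}

// indexMappings mirrors index -> "mappings" -> doc type -> field path.
type indexMappings struct {
	Mappings map[string]map[string]fieldMapping `json:"mappings"`
}

// sampleResponse is the response body quoted in the issue.
const sampleResponse = `{
  "jaeger-span-2018-11-27": {
    "mappings": {
      "span": {
        "tag.identifier": {
          "full_name": "tag.identifier",
          "mapping": {
            "identifier": {"type": "keyword", "ignore_above": 256}
          }
        }
      }
    }
  }
}`

// esFieldType (hypothetical helper) returns the mapped type of fieldPath
// from the first index in the response that contains it.
func esFieldType(body []byte, docType, fieldPath string) (string, bool) {
	var resp map[string]indexMappings
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", false
	}
	for _, idx := range resp {
		fm, ok := idx.Mappings[docType][fieldPath]
		if !ok {
			continue
		}
		// The "mapping" object holds a single entry keyed by the leaf name.
		for _, m := range fm.Mapping {
			return m.Type, true
		}
	}
	return "", false
}

func main() {
	typ, ok := esFieldType([]byte(sampleResponse), "span", "tag.identifier")
	if !ok {
		fmt.Println("field not found in mapping")
		return
	}
	// A "keyword" or "text" mapping means the value was indexed as a string.
	isString := typ == "keyword" || typ == "text"
	fmt.Printf("tag.identifier is mapped as %q, treat as string: %v\n", typ, isString)
}
```

A lookup like this would only help if the mapping differs between string and numeric tags, which depends on how the index template is set up.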

Prefixing the identifiers with anything non-numeric keeps the string format and works, though users have to now remove the extra character before searching.

@tiffon
Member

tiffon commented Nov 28, 2018

Thanks for reporting this!

🪲

@tiffon
Member

tiffon commented Nov 28, 2018

@ScottKaye Can you post the JSON for that span as returned by the endpoint /api/traces/bb62383fdf565f55 (or for a different span showing the problem)?

As long as we still have valid type values for the tags, this should be straightforward.

Here's an example of what we're after:

{
    "traceID": "7de903e7b7d7d057",
    "spanID": "5f2dded4b5a3d28f",
    "flags": 1,
    "operationName": "HTTP GET /route",
    "references": [
        {
            "refType": "CHILD_OF",
            "traceID": "7de903e7b7d7d057",
            "spanID": "162d9e5bd66b12a1"
        }
    ],
    "startTime": 1543351493775595,
    "duration": 30768,
    "tags": [
        {
            "key": "span.kind",
            "type": "string",
            "value": "server"
        },
        {
            "key": "http.method",
            "type": "string",
            "value": "GET"
        },
        {
            "key": "http.url",
            "type": "string",
            "value": "/route?dropoff=728%2C326&pickup=172%2C31"
        },
        {
            "key": "component",
            "type": "string",
            "value": "net/http"
        },
        {
            "key": "http.status_code",
            "type": "int64",
            "value": 200
        }
    ]
}

The endpoint will return the JSON for the full trace, but we only need a span that has the issue.

@ScottKaye
Author

ScottKaye commented Nov 28, 2018

Sure! Looks like identifier is being mapped as int64.

{
    "traceID": "16947132f854f889",
    "spanID": "16947132f854f889",
    "flags": 1,
    "operationName": "operation",
    "references": [],
    "startTime": 1543355742002377,
    "duration": 384997,
    "tags": [
        {
            "key": "sampler.type",
            "type": "string",
            "value": "const"
        },
        {
            "key": "sampler.param",
            "type": "string",
            "value": "True"
        },
        {
            "key": "identifier",
            "type": "int64",
            "value": 45
        }
    ]
}

Confirming with the index shows that it is indeed still stored as "000045". I'm using contrived examples - my identifier in reality is a timestamp in the format of hhmmss which means 3:10am ends up as 031000. Unfortunately I don't have the ability to change our identifier format, otherwise I would just give it any letter prefix and call it a day.
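
To make the effect concrete, here is a small sketch of what reading a digit-only string as an integer does to the hhmmss identifier, and of the manual re-padding users currently have to do. The helper names `asNumber` and `padHHMMSS` are made up for this illustration, and the fixed width of 6 is an assumption that only holds for the hhmmss format described above.

```go
package main

import (
	"fmt"
	"strconv"
)

// asNumber reads a digit-only string as a base-10 integer,
// which silently discards any leading zeroes.
func asNumber(s string) (int64, error) {
	return strconv.ParseInt(s, 10, 64)
}

// padHHMMSS restores the six-digit width of an hhmmss identifier.
// The width is an assumption specific to this identifier format.
func padHHMMSS(n int64) string {
	return fmt.Sprintf("%06d", n)
}

func main() {
	n, err := asNumber("031000")
	if err != nil {
		fmt.Println("not a number:", err)
		return
	}
	fmt.Println(n)            // 31000
	fmt.Println(padHHMMSS(n)) // 031000
}
```

Re-padding only works when the original width is known, which is why preserving the string type end to end is the better fix.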

@tiffon
Member

tiffon commented Nov 30, 2018

Thanks for the details. Can you verify that the tag value is being set as a string, or with its type set to string? It seems it must be, since otherwise it could not have the leading "0" in storage, but I just want to confirm.

This seems like it might be a bug in core.

My main recourse for the issue is to respect the type as passed to the UI and, in our effort to parse JSON objects and arrays for enhanced representation, make sure we don't clobber strings into other types. But that's blocked if the value is being passed to the UI as an int.

@ScottKaye
Author

Looks like this is related to ES_TAGS_AS_FIELDS. When I write a span with this option disabled, these are the written tags:

"tags": [
    {
        "key": "sampler.type",
        "type": "string",
        "value": "const"
    },
    {
        "key": "sampler.param",
        "type": "bool",
        "value": "true"
    },
    {
        "key": "identifier",
        "type": "string",
        "value": "000045"
    }
]

When I open this span in Jaeger UI, the value appears as a string correctly ("000045"). Writing the same span with ES_TAGS_AS_FIELDS=true renders 45.

I should have checked this sooner, sorry about that. Is there any way to move this issue to the core repo?

@pavolloffay pavolloffay transferred this issue from jaegertracing/jaeger-ui Dec 4, 2018
@pavolloffay
Member

pavolloffay commented Dec 4, 2018

Here's the relevant part of the index mapping, showing that identifier is mapped as a keyword (string), rather than a number.

This is correct: at the moment we store everything as keyword. We would need to map different types to the same field, which is not known in advance.

@ScottKaye how do you report spans to Jaeger?

  • What client library are you using?
  • Do you have an example of how you create the span with the identifier tag?

By the way, an int status code reported from jaeger-query is mapped correctly to int:
SPAN_STORAGE_TYPE=elasticsearch go run -tags ui ./cmd/all-in-one/main.go --es.tags-as-fields.all=true
Index mapping

"tag" : {
            "properties" : {
              "component" : {
                "type" : "keyword",
                "ignore_above" : 256
              },
              "http@status_code" : {
                "type" : "keyword",
                "ignore_above" : 256
              }...
            }
          },

index data

"tag" : {
   "http@status_code" : 200
}

UI json:

{
  "key": "http@status_code",
   "type": "int64",
    "value": 200
}

@ScottKaye
Author

I am using the Jaeger C# library with this simple test script:

var tracer = new Jaeger.Tracer.Builder("ui")
	.WithSampler(new Jaeger.Samplers.ConstSampler(true))
	.Build();

using (var scope = tracer.BuildSpan("operation").StartActive(true))
{
	scope.Span.SetTag("identifier", "000045");
	scope.Span.Log("test-event");
}

This is how I'm starting up Jaeger on Windows (heh):

@echo off
SET SPAN_STORAGE_TYPE=elasticsearch
SET ES_SERVER_URLS=http://localhost:9200
SET ES_TAGS_AS_FIELDS_ALL=true

.\jaeger-1.8.0-windows-amd64\jaeger-all-in-one.exe
pause

No special configuration is set on Elasticsearch. I deleted everything in my local cluster, ran my test script, then opened Jaeger UI. I get 45 parsed as a number. Here's the full record as saved in Elasticsearch:

{
    "traceID": "4d1b1124185fa8d0",
    "spanID": "4d1b1124185fa8d0",
    "flags": 1,
    "operationName": "operation",
    "references": [],
    "startTime": 1543957826297233,
    "startTimeMillis": 1543957826297,
    "duration": 5001,
    "tags": [],
    "tag": {
        "identifier": "000045",
        "sampler@param": true,
        "sampler@type": "const"
    },
    "logs": [
        {
            "timestamp": 1543957826301251,
            "fields": [
                {
                    "key": "event",
                    "type": "string",
                    "value": "test-event"
                }
            ]
        }
    ],
    "process": {
        "serviceName": "ui",
        "tags": [],
        "tag": {
            "hostname": "local",
            "ip": "10.1.1.1",
            "jaeger@version": "CSharp-0.2.2.0"
        }
    }
}

Running the same procedure without SET ES_TAGS_AS_FIELDS_ALL=true results in Jaeger UI displaying the value correctly. I love what ES_TAGS_AS_FIELDS_ALL does structurally though, thank you for adding this feature!
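
One observation that may help when debugging: the flattened `tag` object no longer has an explicit `type` field, but the JSON value itself still distinguishes strings, booleans, and numbers. The sketch below decodes such an object with Go's standard `encoding/json` and reports the JSON kind of each value. `decodeTagMap` and `jsonKind` are hypothetical helpers written for this example, and the `http@status_code` entry is borrowed from the earlier comment, not from this record.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// jsonKind reports which JSON type a decoded value had.
func jsonKind(v interface{}) string {
	switch v.(type) {
	case string:
		return "string"
	case bool:
		return "bool"
	case json.Number:
		return "number"
	default:
		return "other"
	}
}

// decodeTagMap decodes a flattened "tag" object. UseNumber keeps numeric
// literals as json.Number instead of converting them to float64.
func decodeTagMap(raw string) (map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.UseNumber()
	var tags map[string]interface{}
	if err := dec.Decode(&tags); err != nil {
		return nil, err
	}
	return tags, nil
}

func main() {
	raw := `{"identifier": "000045", "sampler@param": true, "sampler@type": "const", "http@status_code": 200}`
	tags, err := decodeTagMap(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	// Print in a fixed order, since Go map iteration order is random.
	for _, k := range []string{"identifier", "sampler@param", "sampler@type", "http@status_code"} {
		fmt.Printf("%s: %v (%s)\n", k, tags[k], jsonKind(tags[k]))
	}
}
```

In this sketch `identifier` comes back as the string "000045", so a reader that trusts the decoded JSON type would not need to guess.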

@pavolloffay
Member

Elasticsearch performs type coercion automatically: https://www.elastic.co/guide/en/elasticsearch/reference/current/coerce.html. If you store the string 5000, it will be converted to an int. I wonder what happens if you disable it on the index.

The discussion around types also happened here #906 (comment)

@pavolloffay
Member

Type coercion happens for sure, but in your case the identifier is stored as a string. The conversion to int happens here https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/spanstore/dbmodel/to_domain.go#L165.
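
For readers who do not want to open the link, the following is a simplified, hypothetical sketch of the general failure mode, not a copy of Jaeger's `to_domain.go`. It contrasts a converter that guesses the type from the value's text with one that keeps the type the decoded JSON value already has. The `tag` struct and both function names are invented for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// tag is a simplified stand-in for the key/type/value triples in the UI JSON.
type tag struct {
	Key   string
	Type  string
	Value interface{}
}

// guessFromText infers the type from the textual form of the value.
// Any digit-only string becomes an int64, so "000045" turns into 45.
func guessFromText(key, text string) tag {
	if i, err := strconv.ParseInt(text, 10, 64); err == nil {
		return tag{Key: key, Type: "int64", Value: i}
	}
	if f, err := strconv.ParseFloat(text, 64); err == nil {
		return tag{Key: key, Type: "float64", Value: f}
	}
	return tag{Key: key, Type: "string", Value: text}
}

// fromJSONValue keeps whatever type the decoded JSON value already has,
// so a JSON string stays a string even when it contains only digits.
func fromJSONValue(key string, v interface{}) tag {
	switch val := v.(type) {
	case string:
		return tag{Key: key, Type: "string", Value: val}
	case bool:
		return tag{Key: key, Type: "bool", Value: val}
	case int64:
		return tag{Key: key, Type: "int64", Value: val}
	case float64:
		return tag{Key: key, Type: "float64", Value: val}
	default:
		// Fall back to the printed form for anything unexpected.
		return tag{Key: key, Type: "string", Value: fmt.Sprint(val)}
	}
}

func main() {
	fmt.Printf("%+v\n", guessFromText("identifier", "000045"))
	fmt.Printf("%+v\n", fromJSONValue("identifier", "000045"))
}
```

The first call yields an int64 tag with value 45, matching the symptom reported here; the second keeps the string "000045".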

@StarpTech

StarpTech commented Jan 18, 2019

Hi, I'm facing the same issue. The documents are stored correctly, but all fields are of type string rather than number, etc. Here is an example of the opentracing status_code tag.

image

image
