
Better storage of _source #9034

Closed
jpountz opened this issue Dec 22, 2014 · 17 comments
Labels
>enhancement · high hanging fruit · :Search Foundations/Mapping · Team:Search Foundations

Comments

@jpountz
Contributor

jpountz commented Dec 22, 2014

Today we store the _source as a single big binary stored field. While this is great for simplicity, it has the unfortunate side-effect of encouraging users to store fields individually in order to save some JSON parsing when there are a couple of large field values and we are only interested in some short values. Maybe we could try to be a bit smarter and store the _source across several stored fields so that this would no longer be an issue?

Random idea: given a document that looks like:

{
  "title": "short_string",
  "body": "very_very_very_very_long_string",
  "array": [2, 3, 10],
  "foo": {
    "foo": 42,
    "bar": "baz"
  }
}

we could, for instance, store each top-level field in its own stored field:

| Field | Value |
|-------|-------|
| title | "short_string" |
| body | "very_very_very_very_long_string" |
| array | [2, 3, 10] |
| foo | {"foo": 42, "bar": "baz"} |

or maybe even each value individually (but it becomes more complicated with arrays of objects):

| Field | Value |
|-------|-------|
| title | "short_string" |
| body | "very_very_very_very_long_string" |
| array | [2, 3, 10] |
| foo.foo | 42 |
| foo.bar | "baz" |

Then we would have to make _source filtering aware of the way fields are stored: for instance, if we store only top-level fields in their own stored field, we could translate an include rule like foo.* to "retrieve field foo", and foo.bar.* to "retrieve field foo and keep only what is under bar".
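A minimal sketch of the splitting step, assuming Jackson and Lucene are used directly (the real implementation would go through Elasticsearch's XContent layer; names here are illustrative): each top-level value is copied verbatim into a stored field named after it. Reassembly is the reverse: write a START_OBJECT, then copy each stored value back under its field name.

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

class SourceSplitter {
    private static final JsonFactory FACTORY = new JsonFactory();

    /** Copy each top-level field of the JSON source into its own stored field. */
    static void addSplitSource(Document doc, byte[] sourceJson) throws IOException {
        try (JsonParser parser = FACTORY.createParser(sourceJson)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                throw new IllegalArgumentException("source must be a JSON object");
            }
            while (parser.nextToken() == JsonToken.FIELD_NAME) {
                String field = parser.getCurrentName();
                parser.nextToken(); // advance to the value: scalar, array, or object
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                try (JsonGenerator gen = FACTORY.createGenerator(out)) {
                    gen.copyCurrentStructure(parser); // copy the value verbatim
                }
                doc.add(new StoredField("_source." + field, out.toByteArray()));
            }
        }
    }
}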

@clintongormley
Contributor

As you say, arrays complicate things. One of the problems we had with the fields parameter was not knowing whether the original field in the _source was a scalar or a one-element array, or a null vs. an empty array.

@clintongormley
Contributor

@jpountz just reread your original description and realised that your first suggestion handles the issue of representing things like 1 vs [1], null vs missing vs [] vs [null] etc.

I like this idea a lot. One concern: is there extra overhead if we have thousands of small top-level fields, e.g. 10,000 ints? If so, could we possibly group these small fields?

@nik9000
Member

nik9000 commented May 12, 2015

Wasn't the issue with storing everything in different fields that the extra lookups were time consuming? This wouldn't help there unless you used it only for fields that are large, kind of like PostgreSQL's TOAST mechanism.

BTW - I've always thought it'd be nice to be able to load portions of string fields. Something like String stringAtOffset(name, startOffset, endOffset). Even if you used a compression algorithm that has to start decompressing from the beginning of the string, you could still get a win by not decompressing the whole thing. This probably comes from my focus on highlighting and my experience with megabyte-sized documents. And I rarely see source loading actually come up in stack traces, so it's probably silly.
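For what it's worth, a hypothetical shape for that API (every name below is invented for illustration, not an existing interface):

/**
 * Hypothetical API for reading a slice of a stored string field without
 * materializing the whole value; offsets are character offsets into the field.
 */
interface PartialStoredFields {
    String stringAtOffset(String name, int startOffset, int endOffset);
}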

@clintongormley
Contributor

Wasn't the issue with storing everything in different fields that the extra lookups were time consuming? This wouldn't help there unless you used it only for fields that are large, kind of like PostgreSQL's TOAST mechanism.

With Lucene 4 and above, this is no longer the case - having 1 large field vs several small fields no longer matters.

@nik9000
Member

nik9000 commented May 12, 2015

With Lucene 4 and above, this is no longer the case - having 1 large field vs several small fields no longer matters.

Ah cool. I suspect that instinct is left over from Elasticsearch 0.90 days. Cheers.

@jpountz
Contributor Author

jpountz commented Mar 14, 2018

cc @elastic/es-search-aggs

@jpountz
Contributor Author

jpountz commented Jun 5, 2019

I wanted to check how much it would save, so I played with the following patch, which stores every top-level JSON field in its own stored field as described in the issue description:

diff --git a/server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java b/server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java
index a3e86ab..f496275 100644
--- a/server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java
+++ b/server/src/main/java/org/elasticsearch/index/engine/LuceneChangesSnapshot.java
@@ -236,9 +236,7 @@ final class LuceneChangesSnapshot implements Translog.Snapshot {
             return null;
         }
         final long version = parallelArray.version[docIndex];
-        final String sourceField = parallelArray.hasRecoverySource[docIndex] ? SourceFieldMapper.RECOVERY_SOURCE_NAME :
-            SourceFieldMapper.NAME;
-        final FieldsVisitor fields = new FieldsVisitor(true, sourceField);
+        final FieldsVisitor fields = new FieldsVisitor(true, parallelArray.hasRecoverySource[docIndex]);
         leaf.reader().document(segmentDocID, fields);
         fields.postProcess(mapperService);
 
diff --git a/server/src/main/java/org/elasticsearch/index/fieldvisitor/FieldsVisitor.java b/server/src/main/java/org/elasticsearch/index/fieldvisitor/FieldsVisitor.java
index 462f8ce..a0eeb69 100644
--- a/server/src/main/java/org/elasticsearch/index/fieldvisitor/FieldsVisitor.java
+++ b/server/src/main/java/org/elasticsearch/index/fieldvisitor/FieldsVisitor.java
@@ -49,30 +54,49 @@ import static org.elasticsearch.common.util.set.Sets.newHashSet;
  * Base {@link StoredFieldVisitor} that retrieves all non-redundant metadata.
  */
 public class FieldsVisitor extends StoredFieldVisitor {
+
     private static final Set<String> BASE_REQUIRED_FIELDS = unmodifiableSet(newHashSet(
             IdFieldMapper.NAME,
             RoutingFieldMapper.NAME));
 
     private final boolean loadSource;
-    private final String sourceFieldName;
+    private final boolean useRecoverySource;
     private final Set<String> requiredFields;
-    protected BytesReference source;
     protected String type, id;
     protected Map<String, List<Object>> fieldsValues;
 
+    private BytesStreamOutput sourceBytes;
+    private XContentGenerator sourceGenerator;
+    protected BytesReference source;
+
     public FieldsVisitor(boolean loadSource) {
-        this(loadSource, SourceFieldMapper.NAME);
+        this(loadSource, false);
     }
 
-    public FieldsVisitor(boolean loadSource, String sourceFieldName) {
+    public FieldsVisitor(boolean loadSource, boolean useRecoverySource) {
         this.loadSource = loadSource;
-        this.sourceFieldName = sourceFieldName;
+        this.useRecoverySource = useRecoverySource;
         requiredFields = new HashSet<>();
         reset();
     }
 
+    private XContentGenerator getSourceGenerator() throws IOException {
+        if (sourceGenerator == null) {
+            sourceBytes = new BytesStreamOutput();
+            sourceGenerator = JsonXContent.jsonXContent.createGenerator(sourceBytes);
+            sourceGenerator.writeStartObject();
+        }
+        return sourceGenerator;
+    }
+
     @Override
     public Status needsField(FieldInfo fieldInfo) throws IOException {
+        if (fieldInfo.name.equals(SourceFieldMapper.NAME) || fieldInfo.name.startsWith(SourceFieldMapper.NAME_PREFIX)) {
+            return loadSource && useRecoverySource == false ? Status.YES : Status.NO;
+        } else if (fieldInfo.name.equals(SourceFieldMapper.RECOVERY_SOURCE_NAME)) {
+            return loadSource && useRecoverySource ? Status.YES : Status.NO;
+        }
+
         if (requiredFields.remove(fieldInfo.name)) {
             return Status.YES;
         }
@@ -94,6 +118,11 @@ public class FieldsVisitor extends StoredFieldVisitor {
         if (mapper != null) {
             type = mapper.type();
         }
+        if (loadSource && source == null && sourceGenerator == null &&
+                mapper.metadataMapper(SourceFieldMapper.class).enabled()) {
+            // can happen if the source is split and the document has no fields
+            source = new BytesArray("{}");
+        }
         for (Map.Entry<String, List<Object>> entry : fields().entrySet()) {
             MappedFieldType fieldType = mapperService.fullName(entry.getKey());
             if (fieldType == null) {
@@ -109,8 +138,12 @@ public class FieldsVisitor extends StoredFieldVisitor {
 
     @Override
     public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException {
-        if (sourceFieldName.equals(fieldInfo.name)) {
+        if (SourceFieldMapper.RECOVERY_SOURCE_NAME.equals(fieldInfo.name)
+                || SourceFieldMapper.NAME.equals(fieldInfo.name)) {
             source = new BytesArray(value);
+        } else if (fieldInfo.name.startsWith(SourceFieldMapper.NAME_PREFIX)) {
+            String fieldName = fieldInfo.name.substring(SourceFieldMapper.NAME_PREFIX.length());
+            getSourceGenerator().writeRawField(fieldName, new ByteArrayInputStream(value), XContentType.JSON);
         } else if (IdFieldMapper.NAME.equals(fieldInfo.name)) {
             id = Uid.decodeId(value);
         } else {
@@ -120,31 +153,58 @@ public class FieldsVisitor extends StoredFieldVisitor {
 
     @Override
     public void stringField(FieldInfo fieldInfo, byte[] bytes) throws IOException {
+        assert fieldInfo.name.startsWith(SourceFieldMapper.NAME_PREFIX) == false;
         final String value = new String(bytes, StandardCharsets.UTF_8);
         addValue(fieldInfo.name, value);
     }
 
     @Override
     public void intField(FieldInfo fieldInfo, int value) throws IOException {
+        assert fieldInfo.name.startsWith(SourceFieldMapper.NAME_PREFIX) == false;
         addValue(fieldInfo.name, value);
     }
 
     @Override
     public void longField(FieldInfo fieldInfo, long value) throws IOException {
+        assert fieldInfo.name.startsWith(SourceFieldMapper.NAME_PREFIX) == false;
         addValue(fieldInfo.name, value);
     }
 
     @Override
     public void floatField(FieldInfo fieldInfo, float value) throws IOException {
+        assert fieldInfo.name.startsWith(SourceFieldMapper.NAME_PREFIX) == false;
         addValue(fieldInfo.name, value);
     }
 
     @Override
     public void doubleField(FieldInfo fieldInfo, double value) throws IOException {
+        assert fieldInfo.name.startsWith(SourceFieldMapper.NAME_PREFIX) == false;
         addValue(fieldInfo.name, value);
     }
 
     public BytesReference source() {
+        if (type == null) {
+            throw new IllegalStateException("Call postProcess first");
+        }
+        if (source != null && sourceGenerator != null) {
+            throw new IllegalStateException("Documents should have a single source");
+        }
+        if (sourceGenerator != null) {
+            try {
+                sourceGenerator.writeEndObject();
+                sourceGenerator.close();
+            } catch (IOException e) {
+                throw new RuntimeException("cannot happen: in-memory stream", e);
+            }
+            source = sourceBytes.bytes();
+            sourceBytes = null;
+            sourceGenerator = null;
+        }
         return source;
     }
 
@@ -180,9 +240,6 @@ public class FieldsVisitor extends StoredFieldVisitor {
         id = null;
 
         requiredFields.addAll(BASE_REQUIRED_FIELDS);
-        if (loadSource) {
-            requiredFields.add(sourceFieldName);
-        }
     }
 
     void addValue(String name, Object value) {
diff --git a/server/src/main/java/org/elasticsearch/index/get/ShardGetService.java b/server/src/main/java/org/elasticsearch/index/get/ShardGetService.java
index f77fc07..b22a1e0 100644
--- a/server/src/main/java/org/elasticsearch/index/get/ShardGetService.java
+++ b/server/src/main/java/org/elasticsearch/index/get/ShardGetService.java
@@ -198,6 +198,7 @@ public final class ShardGetService extends AbstractIndexShardComponent {
             } catch (IOException e) {
                 throw new ElasticsearchException("Failed to get type [" + type + "] and id [" + id + "]", e);
             }
+            fieldVisitor.postProcess(mapperService);
             source = fieldVisitor.source();
 
             if (!fieldVisitor.fields().isEmpty()) {
diff --git a/server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java b/server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
index 0242585..62f84a1 100644
--- a/server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
+++ b/server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
@@ -50,6 +57,7 @@ import java.util.function.Function;
 public class SourceFieldMapper extends MetadataFieldMapper {
 
     public static final String NAME = "_source";
+    public static final String NAME_PREFIX = NAME + ".";
     public static final String RECOVERY_SOURCE_NAME = "_recovery_source";
 
     public static final String CONTENT_TYPE = "_source";
@@ -241,8 +249,28 @@ public class SourceFieldMapper extends MetadataFieldMapper {
                 builder.close();
                 source = bStream.bytes();
             }
-            BytesRef ref = source.toBytesRef();
-            fields.add(new StoredField(fieldType().name(), ref.bytes, ref.offset, ref.length));
+
+            try (XContentParser sourceParser = XContentFactory.xContent(context.sourceToParse().getXContentType())
+                    .createParser(NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, source.streamInput())) {
+                if (sourceParser.nextToken() != Token.START_OBJECT) {
+                    throw new IllegalArgumentException("Documents must start with a START_OBJECT, got " + sourceParser.currentToken());
+                }
+                while (sourceParser.nextToken() == Token.FIELD_NAME) {
+                    sourceParser.nextToken();
+                    String fieldName = sourceParser.currentName();
+                    BytesStreamOutput os = new BytesStreamOutput();
+                    try (XContentGenerator generator = JsonXContent.jsonXContent.createGenerator(os)) {
+                        generator.copyCurrentStructure(sourceParser);
+                    }
+                    fields.add(new StoredField(NAME + "." + fieldName, os.bytes().toBytesRef()));
+                }
+                if (sourceParser.currentToken() != Token.END_OBJECT) {
+                    throw new IllegalArgumentException("Documents must end with a END_OBJECT, but found a " + sourceParser.currentToken());
+                }
+                if (sourceParser.nextToken() != null) {
+                    throw new IllegalArgumentException("Documents must end with a END_OBJECT, but found a " + sourceParser.currentToken() + " after the end");
+                }
+            }
         } else {
             source = null;
         }

It doesn't pass all tests but it's enough to experiment with. The approach is pretty conservative: the order of fields is preserved, and values are stored exactly as they were provided in the original source; at most it drops extra spaces, line feeds or comments.

I indexed the geonames dataset and ran the disk usage tool on it.

| Branch | Codec | Disk usage for stored fields | Difference |
|--------|-------|-----------------------------:|-----------:|
| master | BEST_SPEED | 199,501,741 | |
| patch | BEST_SPEED | 163,518,144 | -18.0% |
| master | BEST_COMPRESSION | 115,161,885 | |
| patch | BEST_COMPRESSION | 102,128,485 | -11.3% |

Disk usage reduction looks interesting, but I think I am even more interested in how this could help improve the simplicity and efficiency of other APIs:

  • we could remove the store option on fields in mappings
  • source filtering wouldn't have to load the entire source in memory anymore

Furthermore, we could further improve disk usage if the user agreed to apply the same accuracy trade-offs to the _source as we already apply to indexed values and doc values (this would have to be opt-in, I suppose); see the sketch after this list:

  • we could store the number of milliseconds since the epoch for date fields instead of the string representation of the date (the date format that was used is lost)
  • for scaled_float fields we could store the underlying long
  • for geo-points we could store a 64-bit long like we do for doc values
  • etc.
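A rough sketch of what those lossy encodings could look like (illustrative only; the geo-point packing mirrors the layout Lucene doc values use):

import org.apache.lucene.geo.GeoEncodingUtils;

import java.time.Instant;

class LossySourceEncodings {
    // date: keep only the millis since the epoch; the original date format is lost
    static long encodeDate(String iso8601) {
        return Instant.parse(iso8601).toEpochMilli();
    }

    // scaled_float: store the underlying long, e.g. 12.34 at scaling_factor=100 -> 1234
    static long encodeScaledFloat(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    // geo-point: pack quantized latitude and longitude into a single 64-bit long
    static long encodeGeoPoint(double lat, double lon) {
        return (((long) GeoEncodingUtils.encodeLatitude(lat)) << 32)
                | (GeoEncodingUtils.encodeLongitude(lon) & 0xFFFFFFFFL);
    }
}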

@freakingid

@jpountz et al., I know this is a really old issue, but I'm trying to figure out why my customer's fetch phase is slow even when using source filtering or _source_includes.

Is _source still stored as one binary field, in Elasticsearch 6.x and 7.x?

If so, I assume that when my customer's query asks for a small field from _source, ES still has to fetch the whole _source from disk before parsing out the field they want.

@mayya-sharipova
Contributor

@freakingid You explained it right. _source is still stored as one binary field in Elasticsearch 6.x and 7.x, and ES fetches the whole _source from disk before parsing out the fields a user requested. Moreover, on the Lucene side all stored fields of a single document are stored together, so there is no efficient way to fetch the stored values of one particular field across all documents.

We have plans to improve the situation both on the ES side (as described in this issue) and on the Lucene side.

@xjtushilei
Contributor

@mayya-sharipova Since _source is stored as one binary field, does the partial update API have a big impact on update performance?

I find that update performance is quite bad, so I want to find some ways to improve it.

@jpountz
Contributor Author

jpountz commented Sep 16, 2019

Elasticsearch needs to rebuild the entire _source for updates by design, even partial updates. If this is the bottleneck of your workload, there isn't much that can be done.
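For context, even a minimal partial update like the sketch below (hypothetical index and id) is a read-modify-write of the whole document: the stored _source is loaded, the change applied in memory, and the full document reindexed.

POST my-index/_update/1
{
  "doc": {
    "title": "new title"
  }
}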

@xjtushilei
Contributor

Elasticsearch needs to rebuild the entire _source for updates by design, even partial updates. If this is the bottleneck of your workload, there isn't much that can be done.

thanks!

@rjernst added the Team:Search label May 4, 2020
@jtibshirani
Contributor

I played around with the idea a bit and had more notes to add.

  • On experiments with the geonames dataset I saw an increase in indexing latency of ~20% with this strategy. This is unfortunate; I wonder if it's possible to mitigate it?

    Baseline: _source enabled.
    Contender: _source disabled, mark every top-level field as store: true. Note there are around 20 stored fields.

    |                       Metric |         Task |    Baseline | Contender |   Unit |
    |-----------------------------:|-------------:|------------:|----------:|-------:|
    | Median Throughput            | index-append |     87040.7 |   75482.7 | docs/s |
    | 50th percentile service time | index-append |     321.022 |   395.966 |     ms |
    | 90th percentile service time | index-append |     538.993 |   875.26  |     ms |
    
  • The reduction in disk usage seems to come from the fact we avoid storing the top-level field names. On data with a few levels of object fields like the metricbeat benchmark, I didn't see a noticeable change.

Separately from performance, I wonder if this strategy would sufficiently address users' concerns around _source loading. It only applies to top-level fields, so it wouldn't cover cases where the stored field was part of an object field. It adds some complexity to the mental model -- to debug an issue around source loading performance, a user needs to understand that we only split apart top-level fields (which are not given special status in other APIs/operations).

@jpountz
Contributor Author

jpountz commented Sep 22, 2020

Contender: _source disabled, mark every top-level field as store: true. Note there are around 20 stored fields.

A problem with disabling _source is that it disables some optimizations due to the need to wrap readers at merging time to remove the _recovery_source field. Maybe try the same benchmark without disabling _source to see whether it makes any difference?

I wonder if this strategy would sufficiently address users' concerns around _source loading. It only applies to top-level fields, so it wouldn't cover cases where the stored field was part of an object field. It adds some complexity to the mental model -- to debug an issue around source loading performance, a user needs to understand that we only split apart top-level fields (which are not given special status in other APIs/operations).

Agreed this is a tricky trade-off. Making _source storage more complex would also make long-term backward compatibility more challenging. And even if we implemented that change to only parse the parts we need from _source, we'd still need to decompress the entire document (if not several entire documents) to access a single field given how stored fields work today. It's not clear to me how much of the retrieval time can be attributed to parsing JSON, but if that's small enough, maybe we should just close this issue. That would also mean that we could stop encouraging storing large fields on their own, and maybe even remove support for store: true in mappings entirely.

@jtibshirani
Contributor

jtibshirani commented Oct 22, 2020

A problem with disabling _source is that it disables some optimizations due to the need to wrap readers at merging time to remove the _recovery_source field.

I had forgotten about _recovery_source! I tried the same comparison, but for the contender I applied the patch instead of disabling _source and adding stored fields. The performance hit looked similar:

|                       Metric |         Task |    Baseline |   Contender |     Diff |   Unit |
|-----------------------------:|-------------:|------------:|------------:|---------:|-------:|
|            Median Throughput | index-append |     57593.2 |     48992.1 | -8601.09 | docs/s |
| 50th percentile service time | index-append |     558.303 |     681.072 |  122.769 |     ms |
| 90th percentile service time | index-append |     1206.89 |     1312.22 |  105.337 |     ms |

It's not clear to me how much of the retrieval time can be attributed to parsing JSON, but if that's small enough, maybe we should just close this issue.

I ran the metricbeat track with a single field, system.process.cgroup.memory.id, set to store: true, and compared loading the field through source filtering (baseline) vs. stored_fields (contender). To make differences detectable, I made sure all documents contained the field and set size: 100. This is a good test for JSON parsing overhead because each document has ~200 fields. There was only a small difference (pretty consistent across runs; the two request shapes are sketched below the table):

|                       Metric |         Task |    Baseline |   Contender |     Diff |   Unit |
|-----------------------------:|-------------:|------------:|------------:|---------:|-------:|
| 50th percentile service time |   load-field |     13.1914 |     11.9647 |   1.2266 |     ms |
| 90th percentile service time |   load-field |     14.6916 |      12.729 |   1.9626 |     ms |
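Roughly, the compared requests look like this (the index pattern is an assumption based on the metricbeat track):

# baseline: source filtering
GET metricbeat-*/_search
{
  "size": 100,
  "_source": ["system.process.cgroup.memory.id"]
}

# contender: stored fields
GET metricbeat-*/_search
{
  "size": 100,
  "stored_fields": ["system.process.cgroup.memory.id"],
  "_source": false
}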

@jpountz
Contributor Author

jpountz commented May 24, 2022

I wonder if we should close this issue in favor of synthetic source (#86603). While synthetic source is a different feature, mappings could be configured to mark every field as stored and enable synthetic source, which would give the ability to load a subset of the fields without loading and parsing an entire JSON document in memory?

The downside compared to the proposal on this issue is that you would lose the JSON structure, but maybe it's ok?
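For reference, a sketch of the mapping this would imply, roughly as synthetic source later shipped in 8.x (treat the exact syntax as approximate):

PUT my-index
{
  "mappings": {
    "_source": { "mode": "synthetic" },
    "properties": {
      "title": { "type": "keyword", "store": true }
    }
  }
}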

@nik9000
Member

nik9000 commented May 24, 2022

The downside compared to the proposal on this issue is that you would lose the JSON structure, but maybe it's ok?

I think synthetic source could be a fairly big thing that will eventually morph into covering more cases. I wouldn't be surprised if we ended up with something more like this one day, built on top of the synthetic source infrastructure. We're already talking about support for stored fields in synthetic source. It's not too much further to get here.

But in terms of things we're doing in the short term for source storage, I think synthetic source is it. In that sense I think we can close this, yeah.

@jpountz closed this as completed May 24, 2022
@javanna added the Team:Search Foundations label and removed the Team:Search label Jul 16, 2024