[BUG] Reindex is break can not copy all documents from source #7878

Closed
Hailong-am opened this issue Jun 2, 2023 · 4 comments · Fixed by #7967
Labels
bug Something isn't working Search Search query, autocomplete ...etc

Comments


Hailong-am commented Jun 2, 2023

Describe the bug
Reindex copies only part of the source documents to the destination. This seems to happen only for the index opensearch_dashboards_sample_data_logs, which OpenSearch Dashboards (OSD) provides as a sample dataset.

To Reproduce
Steps to reproduce the behavior:

  1. Use docker-compose.yml to spin up OpenSearch and OpenSearch Dashboards.

  2. Import the sample web logs data from OpenSearch Dashboards.

  3. Reindex the index opensearch_dashboards_sample_data_logs:

POST _reindex
{
  "source": {
    "index": "opensearch_dashboards_sample_data_logs"
  },
  "dest": {
    "index": "reindex-logs-20230602"
  }
}
  4. Check the result: the total is 14074 records, but reindex copies only 3227 documents.
{
  "took": 871,
  "timed_out": false,
  "total": 14074,
  "updated": 0,
  "created": 3227,
  "deleted": 0,
  "batches": 4,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": []
}
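As a quick sanity check on a response like the one above, the Python sketch below compares `total` with the number of documents reindex actually handled. The helper name `check_reindex` is hypothetical, and treating `created + updated + deleted + noops + version_conflicts` as "handled" is a heuristic, not an OpenSearch definition. It also assumes the default reindex scroll batch size of 1000 (used when `source.size` is not set), which is consistent with the reported `"batches": 4` for 3227 documents.

```python
import json
import math

# Reindex response copied from the report above (retries/throttling fields omitted).
RESPONSE = """{"took": 871, "timed_out": false, "total": 14074, "updated": 0,
"created": 3227, "deleted": 0, "batches": 4, "version_conflicts": 0,
"noops": 0, "failures": []}"""

# Assumption: default scroll batch size used by reindex when source.size is unset.
DEFAULT_BATCH_SIZE = 1000


def check_reindex(resp: dict, batch_size: int = DEFAULT_BATCH_SIZE) -> dict:
    """Compare the docs the source search reported with the docs reindex handled."""
    handled = (resp["created"] + resp["updated"] + resp["deleted"]
               + resp["noops"] + resp["version_conflicts"])
    return {
        # Docs the source reported but reindex never wrote, skipped, or rejected.
        "missing": resp["total"] - handled,
        "failures": len(resp["failures"]),
        # Batches needed to move only the handled docs at the given batch size.
        "batches_for_handled": math.ceil(handled / batch_size),
        "batches_reported": resp["batches"],
    }


result = check_reindex(json.loads(RESPONSE))
print(result)
```

For this response, 10847 documents are unaccounted for with no reported failures, and 4 batches is exactly what 3227 documents need at 1000 per batch, which suggests the scroll simply stopped returning hits early rather than failing.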

Expected behavior
Reindex should copy all documents from the source index to the destination; that is, all 14074 documents should be reindexed.

Plugins
using docker-compose.yml


Host/Environment:

  • OS: Amazon Linux
  • Version 2.8, main

Additional context

OpenSearch 2.7 does not have this issue: reindexing the same index there works as expected.

To see the segments of the index:

GET _cat/segments/opensearch_dashboards_sample_data_logs?v

index                                  shard prirep ip         segment generation docs.count docs.deleted    size size.memory committed searchable version compound
opensearch_dashboards_sample_data_logs 0     p      172.17.0.3 _9               9       2500            0   1.6mb           0 true      true       9.6.0   true
opensearch_dashboards_sample_data_logs 0     p      172.17.0.3 _b              11        574            0 419.1kb           0 true      true       9.6.0   true
opensearch_dashboards_sample_data_logs 0     p      172.17.0.3 _c              12      11000            0   5.7mb           0 true      true       9.6.0   false
opensearch_dashboards_sample_data_logs 0     r      172.17.0.4 _a              10       2347            0   1.5mb           0 true      true       9.6.0   true
opensearch_dashboards_sample_data_logs 0     r      172.17.0.4 _b              11       3227            0     2mb           0 true      true       9.6.0   true
opensearch_dashboards_sample_data_logs 0     r      172.17.0.4 _c              12       8500            0   4.4mb           0 true      true       9.6.0   false
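The segment listing can be cross-checked against the reindex result. The Python sketch below (the helper names are hypothetical, not part of any OpenSearch client) sums `docs.count` per shard copy and looks for a segment holding exactly the 3227 documents reindex created. It looks columns up by header name, so the full `_cat/segments?v` output can be pasted in place of the trimmed copy used here.

```python
from collections import defaultdict

# `_cat/segments?v` output from the report, trimmed to the columns used here.
CAT_SEGMENTS = """\
index shard prirep ip segment generation docs.count
opensearch_dashboards_sample_data_logs 0 p 172.17.0.3 _9 9 2500
opensearch_dashboards_sample_data_logs 0 p 172.17.0.3 _b 11 574
opensearch_dashboards_sample_data_logs 0 p 172.17.0.3 _c 12 11000
opensearch_dashboards_sample_data_logs 0 r 172.17.0.4 _a 10 2347
opensearch_dashboards_sample_data_logs 0 r 172.17.0.4 _b 11 3227
opensearch_dashboards_sample_data_logs 0 r 172.17.0.4 _c 12 8500
"""


def parse_rows(cat_output: str) -> list[dict]:
    """Turn whitespace-aligned `_cat` output into dicts keyed by header name."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def docs_per_copy(rows: list[dict]) -> dict:
    """Sum docs.count per shard copy, keyed by (shard, prirep, ip)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["shard"], row["prirep"], row["ip"])] += int(row["docs.count"])
    return dict(totals)


def segments_with_count(rows: list[dict], count: int) -> list[tuple]:
    """Return (ip, segment) for every segment holding exactly `count` docs."""
    return [(r["ip"], r["segment"]) for r in rows if int(r["docs.count"]) == count]


rows = parse_rows(CAT_SEGMENTS)
print(docs_per_copy(rows))
print(segments_with_count(rows, 3227))
```

Both copies hold all 14074 documents (2500 + 574 + 11000 on the primary, 2347 + 3227 + 8500 on the replica), and the only segment whose size equals the reindexed count is `_b` on the replica at 172.17.0.4.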

Index mapping:

{
  "properties": {
    "@timestamp": {
      "path": "timestamp",
      "type": "alias"
    },
    "agent": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "bytes": {
      "type": "long"
    },
    "clientip": {
      "type": "ip"
    },
    "event": {
      "type": "object",
      "properties": {
        "dataset": {
          "type": "keyword"
        }
      }
    },
    "extension": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "geo": {
      "type": "object",
      "properties": {
        "coordinates": {
          "type": "geo_point"
        },
        "dest": {
          "type": "keyword"
        },
        "src": {
          "type": "keyword"
        },
        "srcdest": {
          "type": "keyword"
        }
      }
    },
    "host": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "index": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "ip": {
      "type": "ip"
    },
    "machine": {
      "type": "object",
      "properties": {
        "os": {
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          },
          "type": "text"
        },
        "ram": {
          "type": "long"
        }
      }
    },
    "memory": {
      "type": "double"
    },
    "message": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "phpmemory": {
      "type": "long"
    },
    "referer": {
      "type": "keyword"
    },
    "request": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "response": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "tags": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "timestamp": {
      "type": "date"
    },
    "url": {
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      },
      "type": "text"
    },
    "utc_time": {
      "type": "date"
    }
  }
}

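It seems that only one segment is searched: segment _b (generation 11) on the replica at 172.17.0.4, whose docs.count of 3227 exactly matches the number of documents reindexed.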

@Hailong-am Hailong-am added bug Something isn't working untriaged labels Jun 2, 2023
@Hailong-am Hailong-am changed the title [BUG] Reindex is break can't not copy all documents from source [BUG] Reindex is break can not copy all documents from source Jun 2, 2023
Author

Hailong-am commented Jun 2, 2023

@gashutos this seems related to #7244.

Contributor

gashutos commented Jun 2, 2023

This is most likely not the cause. If you have the setup, can you quickly verify by reverting this commit?
The commit only changes the order in which segments are read; all segments are still searched... @Hailong-am
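The argument that reordering segments alone cannot drop documents, as long as every segment is still visited, can be illustrated with a toy model. The Python sketch below treats each segment as a plain list of doc IDs sized like the replica's segments; it is not Lucene or OpenSearch code, and `collect_all` is a hypothetical helper. It shows that a full visit in forward and reverse order returns the same documents. The thread does not identify the actual root cause; it only reports that reverting #7244 made the issue go away.

```python
def collect_all(segments: list[list[int]], reverse: bool = False) -> list[int]:
    """Visit every segment, optionally in reverse order, and collect all doc IDs."""
    order = reversed(segments) if reverse else segments
    hits = []
    for seg in order:
        hits.extend(seg)
    return hits


# Toy segments sized like the replica copy: 2347, 3227 and 8500 docs.
sizes = [2347, 3227, 8500]
segments, start = [], 0
for n in sizes:
    segments.append(list(range(start, start + n)))
    start += n

forward = collect_all(segments)
backward = collect_all(segments, reverse=True)
print(len(forward), len(backward), sorted(forward) == sorted(backward))
```

Because a full visit is order-independent, a result of 3227 documents means some segments were not visited at all, which is why the reporter's observation pointed at something in the changed code path beyond the ordering itself.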

Author

Hailong-am commented Jun 2, 2023


Reverting this segment-order change makes the issue go away. @gashutos @nknize, do you have any insight into why the segment order affects query results?

andrross added a commit to andrross/OpenSearch that referenced this issue Jun 2, 2023
…verse segment read (opensearch-project#7244)"

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.
andrross added a commit to andrross/OpenSearch that referenced this issue Jun 2, 2023
…verse segment read (opensearch-project#7244)"

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <andrross@amazon.com>
andrross added a commit that referenced this issue Jun 2, 2023
…verse segment read (#7244)" (#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <andrross@amazon.com>
opensearch-trigger-bot bot pushed a commit that referenced this issue Jun 2, 2023
…verse segment read (#7244)" (#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit bb26536)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross added a commit that referenced this issue Jun 2, 2023
…verse segment read (#7244)" (#7893)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <andrross@amazon.com>
reta pushed a commit that referenced this issue Jun 2, 2023
…tion through re… (#7895)

* Revert "Time series based workload desc order optimization through reverse segment read (#7244)" (#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in #7878.

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit bb26536)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Remove unused imports

Signed-off-by: Andrew Ross <andrross@amazon.com>

---------

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
@andrross andrross added the Search Search query, autocomplete ...etc label Jun 5, 2023
@sejli sejli removed the untriaged label Jun 7, 2023
Member

sejli commented Jun 7, 2023

Fixed by reverting in #7892, closing.

@sejli sejli closed this as completed Jun 7, 2023
gaiksaya pushed a commit to gaiksaya/OpenSearch that referenced this issue Jun 26, 2023
…tion through re… (opensearch-project#7895)

* Revert "Time series based workload desc order optimization through reverse segment read (opensearch-project#7244)" (opensearch-project#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit bb26536)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Remove unused imports

Signed-off-by: Andrew Ross <andrross@amazon.com>

---------

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Andrew Ross <andrross@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this issue Apr 25, 2024
…verse segment read (opensearch-project#7244)" (opensearch-project#7892)

This reverts commit 4c98b3d.

Reverting due to issue reported in opensearch-project#7878.

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Projects
Status: Done
4 participants