Storage of Inverted Index Lists in Elasticsearch #106742

Open
yangyujieqqcom opened this issue Mar 26, 2024 · 2 comments

Comments

yangyujieqqcom commented Mar 26, 2024

An inverted index, which maps each term to the list of documents that contain it (the postings list), is a common data structure. In Elasticsearch (ES), the document identifier (_id) is a string; when it is auto-generated, it is by default a UUID-like, Base64-encoded string rather than a number. At the underlying level, however, every document also has a numeric identifier: the Lucene document ID (doc ID). It is this number, not _id, that postings lists store, and it is what makes optimization techniques such as delta compression, Roaring bitmaps (RBM), and bitsets applicable.

Specifically, this numeric identifier is not a 128-bit GUID and is not derived from _id. Lucene assigns each document a 32-bit integer that is unique only within its segment and is allocated densely starting from 0; it can change when segments are merged. (Older Elasticsearch versions had a _uid metadata field, but it was a string combining the mapping type and _id, not a number.)

Because the doc IDs in a postings list are small, dense integers kept in sorted order, they compress well. For instance, a postings list can store the gaps (deltas) between consecutive doc IDs and bit-pack them, and a cached filter result can be held as a bitset or a Roaring bitmap. This helps reduce storage space and improves query performance.

In summary, while the document identifier (_id) in Elasticsearch defaults to a UUID-like string, the inverted index refers to documents by per-segment numeric doc IDs, and these are what the various compression algorithms and optimization techniques operate on.
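
To illustrate why numeric document identifiers in a postings list lend themselves to compression, here is a minimal sketch of gap (delta) encoding followed by a variable-length byte encoding. It is a simplified illustration, not Lucene's or Elasticsearch's actual codec; the function names and the sample doc IDs are made up for the example.

```python
# Illustrative only: gap-encode a sorted postings list, then pack each gap
# into as few bytes as possible (7 payload bits per byte, high bit = "more").
# Function names and sample data are hypothetical, not a real Lucene API.

def delta_encode(doc_ids):
    """Replace each doc ID with its distance from the previous one."""
    gaps, prev = [], -1
    for doc_id in doc_ids:
        if doc_id <= prev:
            raise ValueError("doc IDs must be non-negative and strictly increasing")
        gaps.append(doc_id - prev if prev >= 0 else doc_id)
        prev = doc_id
    return gaps

def delta_decode(gaps):
    """Rebuild the doc IDs as a running sum of the gaps."""
    doc_ids, total = [], 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids

def varint_encode(values):
    """Encode non-negative integers using 7 bits per byte."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            v >>= 7
        out.append(v)                      # last byte, continuation bit clear
    return bytes(out)

def varint_decode(data):
    """Inverse of varint_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values

postings = [3, 7, 8, 150, 1000]            # sorted doc IDs for one term
gaps = delta_encode(postings)              # [3, 4, 1, 142, 850]
encoded = varint_encode(gaps)              # 7 bytes instead of 5 * 4 = 20
restored = delta_decode(varint_decode(encoded))
```

The gaps are smaller than the doc IDs themselves, so most of them fit in a single byte. A long string identifier such as _id offers no comparable ordering to exploit, which is why the index works with numeric doc IDs internally.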

elasticsearchmachine added the needs:triage label Mar 26, 2024
gmarouli added the >enhancement and :Search/Search labels Mar 26, 2024
elasticsearchmachine added the Team:Search label and removed the needs:triage label Mar 26, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

javanna added the :StorageEngine/Metrics label and removed the :Search/Search and Team:Search labels Mar 26, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)


4 participants