Skip to content

meysambagheri96/IoTHub.Database

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IoTHub.Database

Term-Based Search Algorithms:

Inverted Index: The inverted index is a data structure that creates a mapping of each term in the corpus to the documents that contain that term. It is a popular algorithm used in search engines like Lucene and Elasticsearch. The inverted index allows for fast term-based search by precomputing the document-term relationship and avoiding the need to scan the entire corpus for each search query.

Bloom Filters: Bloom filters are probabilistic data structures that can efficiently test whether an element is a member of a set. They can be used to reduce the number of disk reads required for term-based search by precomputing the set of documents that contain a term using a bloom filter. This allows for faster search since only the documents that are likely to contain the term need to be retrieved.

Term Partitioning: Term partitioning is the process of dividing the index into smaller subsets based on the terms. This is done to reduce the search space for each query and speed up the search process. By partitioning the index based on the terms, the search engine can quickly identify the subset of documents that contain the query terms, reducing the number of disk reads required.

Distributed Search: Distributed search is the process of distributing the search workload across multiple nodes or machines. This approach allows for parallel processing of the search queries and can significantly reduce the search time for large-scale search applications.

Wildcard-Based Search Algorithms:

Trie-based algorithm: This algorithm uses a trie data structure to store the indexed strings and their corresponding documents. To perform a wildcard search, the trie is traversed recursively, and all the words matching the wildcard expression are returned. The advantages of this algorithm are that it is fast, supports prefix and suffix wildcard searches, and can handle large datasets. However, it requires a significant amount of memory to store the trie, and it may not be suitable for high-dimensional data.

Regular expression-based algorithm: This algorithm uses regular expressions to match the wildcard expression against the indexed strings. The advantages of this algorithm are that it is flexible, supports multiple types of wildcard expressions, and can handle complex patterns. However, it can be slow for large datasets and complex expressions, and it may not be suitable for real-time applications.

N-gram-based algorithm: This algorithm breaks the indexed strings into n-grams (substrings of length n) and stores them in a data structure, such as a hash table or a tree. To perform a wildcard search, the n-grams of the wildcard expression are generated, and the matching n-grams in the data structure are retrieved. The advantages of this algorithm are that it can handle partial matching and misspellings, and it can be used for fuzzy matching. However, it can be slow for large datasets and complex expressions, and it may require a lot of memory to store the n-grams.

Bitap algorithm: This algorithm is a bitwise algorithm that uses a bit vector to perform a fuzzy search. To perform a wildcard search, the bit vector is generated for the wildcard expression, and the bit vector of each indexed string is compared with the wildcard bit vector. The advantages of this algorithm are that it is fast, memory-efficient, and can handle misspellings and transpositions. However, it requires exact matching of the wildcard expression and may not be suitable for complex patterns or real-time applications.

About

IoTHub Scalable In-Memory Database

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages