Releases: quickwit-oss/tantivy

Tantivy v0.22

12 Apr 05:05
17d5869

What's Changed

Tantivy 0.22 will be able to read indices created with Tantivy 0.21.

Bugfixes

  • Fix null byte handling in JSON paths (null bytes in json keys caused panic during indexing) #2345(@PSeitz)
  • Fix bug that can cause get_docids_for_value_range to panic. #2295(@fulmicoton)
  • Avoid single-document indices by increasing the minimum indexing memory to 15MB #2176(@PSeitz)
  • Fix merge panic for JSON fields #2284(@PSeitz)
  • Fix bug occurring when merging JSON objects indexed with positions. #2253(@fulmicoton)
  • Fix empty DateHistogram gap bug #2183(@PSeitz)
  • Fix range query end check (fields with fewer than 1 value per doc are affected) #2226(@PSeitz)
  • Handle exclusive out of bounds ranges on fastfield range queries #2174(@PSeitz)
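
To illustrate the kind of edge case behind the exclusive out-of-bounds range fix, here is a minimal sketch in plain Rust. It is not tantivy's implementation: `to_inclusive` is a hypothetical helper, and the choice of `u64` values is an assumption made for the example. It shows that an exclusive bound sitting at the edge of the value domain (for example "greater than `u64::MAX`") has to be treated as an empty range, because a naive `bound + 1` would overflow.

```rust
use std::ops::Bound;

/// Hypothetical helper (not part of tantivy): normalizes a pair of bounds
/// into an inclusive `(low, high)` pair, or `None` if the range is empty.
fn to_inclusive(lower: Bound<u64>, upper: Bound<u64>) -> Option<(u64, u64)> {
    let low = match lower {
        Bound::Included(v) => v,
        // Nothing is strictly greater than u64::MAX, so the range is empty.
        Bound::Excluded(v) => v.checked_add(1)?,
        Bound::Unbounded => u64::MIN,
    };
    let high = match upper {
        Bound::Included(v) => v,
        // Nothing is strictly smaller than 0, so the range is empty.
        Bound::Excluded(v) => v.checked_sub(1)?,
        Bound::Unbounded => u64::MAX,
    };
    if low <= high {
        Some((low, high))
    } else {
        None
    }
}

fn main() {
    // An ordinary exclusive range shrinks by one on each side.
    println!("{:?}", to_inclusive(Bound::Excluded(3), Bound::Excluded(10)));
    // Exclusive bounds at the domain edges yield an empty range, not a panic.
    println!("{:?}", to_inclusive(Bound::Excluded(u64::MAX), Bound::Unbounded));
    println!("{:?}", to_inclusive(Bound::Unbounded, Bound::Excluded(0)));
}
```

Using `checked_add` and `checked_sub` makes the empty case explicit instead of relying on wrapping or panicking arithmetic.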

Breaking API Changes

Features/Improvements

New Contributors

Full Changelog: 0.21...v0.22.0

Tantivy v0.21

01 Sep 12:01
49448b3

Bugfixes

  • Fix tracking of fast field memory consumption, which led to higher memory consumption than the budget allowed during indexing #2148#2147(@PSeitz)
  • Fix a regression from 0.20 where sort index by date wasn't working anymore #2124(@PSeitz)
  • Fix getting the root facet on the FacetCollector. #2086(@adamreichold)
  • Align numerical type priority order of columnar and query. #2088(@fmassot)

Breaking Changes

Features/Improvements

New Contributors

Full Changelog: 0.20.2...0.21

0.20.1

12 Jun 03:26
7220df8

What's Changed

Tantivy v0.20

09 Jun 13:11
e3eacb4

What's Changed

Bugfixes

  • Fix phrase queries with slop (slop now supports transpositions, and the algorithm carries the slop used so far when there are more than 2 terms) #2031#2020(@PSeitz)
  • Handle error for exists on MMapDirectory #1988 (@PSeitz)
  • Aggregation
    • Fix min doc_count empty merge bug #2057 (@PSeitz)
    • Fix: Sort order for term aggregations (sort order on key was inverted) #1858 (@PSeitz)

Features/Improvements

  • Add PhrasePrefixQuery #1842 (@trinity-1686a)
  • Add coerce option for text and numbers types (convert the value instead of returning an error during indexing) #1904 (@PSeitz)
  • Add regex tokenizer #1759(@mkleen)
  • Move tokenizer API to a separate crate. Having a separate crate with a stable API will allow us to use tokenizers with different tantivy versions. #1767 (@PSeitz)
  • Columnar crate: New fast field handling (@fulmicoton @PSeitz) #1806#1809
    • Support for fast fields with optional values. Previously tantivy supported only single-valued and multi-value fast fields. The encoding of optional fast fields is now very compact.
    • Fast field support for JSON (schemaless fast fields). Support multiple types on the same column. #1876 (@fulmicoton)
    • Unified access for fast fields over different cardinalities.
    • Unified storage for typed and untyped fields.
    • Move fastfield codecs into columnar. #1782 (@fulmicoton)
    • Sparse dense index for optional values #1716 (@PSeitz)
    • Switch to nanosecond precision in DateTime fastfield #2016 (@PSeitz)
  • Aggregation
    • Add date_histogram aggregation (only fixed_interval for now) #1900 (@PSeitz)
    • Add percentiles aggregations #1984 (@PSeitz)
    • [breaking] Drop JSON support on intermediate agg result (we use postcard as format in quickwit to send intermediate results) #1992 (@PSeitz)
    • Set memory limit in bytes for aggregations after which they abort (Previously there was only the bucket limit) #1942#1957(@PSeitz)
    • Add support for u64,i64,f64 fields in term aggregation #1883 (@PSeitz)
    • Add count, min, max, and sum aggregations #1794 (@guilload)
    • Switch to Aggregation without serde_untagged => better deserialization errors. #2003 (@PSeitz)
    • Switch to ms in histogram for date type (ES compatibility) #2045 (@PSeitz)
    • Reduce term aggregation memory consumption #2013 (@PSeitz)
    • Reduce agg memory consumption: Replace generic aggregation collector (which has a high memory requirement per instance) in aggregation tree with optimized versions behind a trait.
    • Split term collection count and sub_agg (Faster term agg with less memory consumption for cases without sub-aggs) #1921 (@PSeitz)
    • Schemaless aggregations: In combination with stacker, tantivy now supports schemaless aggregations via the JSON type.
    • Perf: Fetch blocks of vals in aggregation for all cardinality #1950 (@PSeitz)
  • Searcher with disabled scoring via EnableScoring::Disabled #1780 (@shikhar)
  • Enable tokenizer on json fields #2053 (@PSeitz)
  • Enforcing "NOT" and "-" queries consistency in UserInputAst #1609 (@denis Bazhenov)
  • Faster indexing
  • Faster search
    • Work in batches of docs on the SegmentCollector (Only for cases without score for now) #1937 (@PSeitz)
    • Faster fast field range queries using SIMD #1954 (@fulmicoton)
    • Improve fast field range query performance #1864 (@PSeitz)
  • Make BM25 scoring more flexible #1855 (@alexcole)
  • Switch fs2 to fs4 as it is now unmaintained and does not support illumos #1944 (@Toasterson)
  • Made BooleanWeight and BoostWeight public #1991 (@fulmicoton)
  • Make index compatible with virtual drives on Windows #1843 (@yukun Guo)
  • Auto downgrade index record option, instead of vint error #1857 (@PSeitz)
  • Enable range query on fast field for u64 compatible types #1762 (@PSeitz) [#1876]
  • sstable
  • Add separate tokenizer manager for fast fields #2019 (@PSeitz)
  • Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. #1756 (@adamreichold)
  • Added support for madvise when opening an mmaped Index #2036 (@fulmicoton)
  • Rename DatePrecision to DateTimePrecision #2051 (@guilload)
  • Query Parser
    • Quotation mark can now be used for phrase queries. #2050 (@fulmicoton)
    • PhrasePrefixQuery is supported in the query parser via: field:"phrase ter"* #2044 (@adamreichold)
  • Docs

New Contributors


Tantivy v0.19.2

10 Feb 04:26

Fixes an issue in the skip list deserialization, which deserialized the byte start offset incorrectly as u32.
get_doc will fail for any docs that live in a block with start offset larger than u32::MAX (~4GB).
This causes index corruption if a segment with a doc store file larger than 4GB is merged. (@PSeitz)
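
The failure mode described above, a 64-bit byte offset read back as `u32`, can be sketched in plain Rust. This is an illustration only, not the tantivy skip list code: `truncate_offset` and `checked_offset` are hypothetical helpers, and the 5 GB offset is an arbitrary example value above `u32::MAX`.

```rust
/// Hypothetical helper: mimics reading a 64-bit start offset into a 32-bit value.
/// An `as` cast silently keeps only the low 32 bits (wraps modulo 2^32).
fn truncate_offset(start_offset: u64) -> u32 {
    start_offset as u32
}

/// Hypothetical checked variant: reports offsets that do not fit in a `u32`
/// instead of silently wrapping.
fn checked_offset(start_offset: u64) -> Option<u32> {
    u32::try_from(start_offset).ok()
}

fn main() {
    let small: u64 = 1_000_000; // well below u32::MAX, survives the cast
    let large: u64 = 5_000_000_000; // above u32::MAX (about 4.29e9)

    println!("small: {} -> {}", small, truncate_offset(small));
    // The large offset wraps around and points at the wrong bytes.
    println!("large: {} -> {}", large, truncate_offset(large));
    println!("checked large: {:?}", checked_offset(large));
}
```

Offsets below `u32::MAX` round-trip unchanged, which is consistent with the note that only blocks starting beyond roughly 4GB are affected.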

Tantivy v0.19.1

13 Jan 05:56

Hotfix on handling user input for get_docid_for_value_range (@PSeitz)

Tantivy v0.19

09 Jan 16:05
2c50b02

What's Changed

Bugfixes

  • Fix missing fieldnorms for u64, i64, f64, bool, bytes and date #1620 (@PSeitz)
  • Fix interpolation overflow in linear interpolation fastfield codec #1480 (@PSeitz @fulmicoton)

Features/Improvements

New Contributors

Full Changelog: 0.18...0.19

Tantivy v0.18.1

20 Oct 01:38

Tantivy v0.18

26 May 09:32
  • For date values chrono has been replaced with time (@uklotzde) #1304
  • Add histogram aggregation (@PSeitz)
  • Add support for fastfield on text fields (@PSeitz)
  • Add terms aggregation (@PSeitz)
  • Add support for zstd compression (@kryesh)

Tantivy v0.17

09 Mar 01:02
  • LogMergePolicy now triggers merges if the ratio of deleted documents reaches a threshold (@shikhar @fulmicoton) #115
  • Adds a searcher Warmer API (@shikhar @fulmicoton)
  • Change to non-strict schema. Ignore fields in data which are not defined in schema. Previously this returned an error. #1211
  • Facets are necessarily indexed. Existing indices with indexed facets should work out of the box. Indices with facets marked with index: false will be broken (but they were already broken in a sense). (@fulmicoton) #1195
  • Fix a bug that could in theory impact durability on some filesystems #1224
  • Schema now offers not indexing fieldnorms (@lpouget) #922
  • Reduce the number of fsync calls #1225
  • Fix opening bytes index with dynamic codec (@PSeitz) #1278
  • Added an aggregation collector compatible with Elasticsearch (@PSeitz)
  • Added a JSON schema type @fulmicoton #1251
  • Added support for slop in phrase queries @halvorboe #1068