Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
m1guelpf committed Jul 4, 2023
1 parent ebc2c74 commit f3479eb
Showing 1 changed file with 7 additions and 10 deletions.
17 changes: 7 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,32 +11,29 @@

## Features
- __Tiny__: It's in the name. It's literally just an axum server. Extremely easy to customize, around 600 lines of code.
- __Fast__: Tinyvector will have comparable speed to advanced vector databases when it comes to speed on small to medium datasets.
- __Fast__: Tinyvector should have comparable speed to advanced vector databases when it comes on small to medium datasets, and slightly better accuracy.
- __Vertically Scales__: Tinyvector stores all indexes in memory for fast querying. Very easy to scale up to 100 million+ vector dimensions without issue.
- __Open Source__: MIT Licensed, free forever.

### Soon
- __Integrated Models__: Soon you won't have to bring your own vectors, just generate them on the server automaticaly. Will support SBert, Hugging Face models, OpenAI, Cohere, etc.
- __Python/JS Client__: We'll add a comprehensive Python and Javascript package for easy integration with tinyvector in the next two weeks.

## We're better than ...
## Why use tinyvector?

In most cases, most vector databases are overkill for something simple like:
1. Using embeddings to chat with your documents. Most document search is nowhere close to what you'd need to justify accelerating search speed with [HNSW](https://github.com/nmslib/hnswlib) or [FAISS](https://github.com/facebookresearch/faiss).
2. Doing search for your website or store. Unless you're selling 1,000,000 items, you don't need Pinecone.
3. Performing complex search queries on a very large database. Even if you have 2 million embeddings, this might still be the better option due to vector databases struggling with complex filtering. Tinyvector doesn't support metadata/filtering just yet, but it's very easy for you to add that yourself.
Most vector databases are overkill for simple setups. For example:
- Using embeddings to chat with your documents. Most document search is nowhere close to what you'd need to justify accelerating search speed with [HNSW](https://github.com/nmslib/hnswlib) or [FAISS](https://github.com/facebookresearch/faiss).
- Doing search for your website or store. Unless you're selling 1,000,000 items, you don't need Pinecone.

## Embeddings?

What are embeddings?

> As simple as possible: Embeddings are a way to compare similar things, in the same way humans compare similar things, by converting text into a small list of numbers. Similar pieces of text will have similar numbers, different ones have very different numbers.
Embeddings are a way to compare similar things, in the same way humans compare similar things, by converting text into a small list of numbers. Similar pieces of text will have similar numbers, different ones have very different numbers.

Read OpenAI's [explanation](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).

## Acknowledgements

- Will Depue's [tinyvector](https://twitter.com/willdepue/status/1675796236304252928) (python+sqlite+numpy) inspired this project (and the name). Will also contributed plenty of ideas to optimize performance.
- Will Depue's [tinyvector](https://twitter.com/willdepue/status/1675796236304252928) (python+sqlite+numpy) inspired me to build a vector database from scratch (and borrow the name). Will also contributed plenty of ideas to optimize performance.

## License

Expand Down

0 comments on commit f3479eb

Please sign in to comment.