Update README.md

commit 52c7a9eb26b702138b6e46a09fbbf8babb45140e 1 parent 67b072d
@jankotek authored
Showing with 139 additions and 61 deletions.
+139 −61 README.md
@@ -1,23 +1,15 @@
-JDBM provides HashMap, TreeMap, HashSet, TreeSet and LinkedList backed up by disk storage.
-Now you can insert more than a billion records into collection without running out of memory.
-JDBM is probably the fastest pure Java database.
+JDBM provides TreeMap, HashMap and other collections backed by disk storage.
+Now you can handle billions of items without ever running out of memory.
+JDBM is probably the fastest and the simplest pure Java database.
-JDBM is tiny with its standalone jar only taking 160 KB (80KB if stripped). Yet it is packed with
-impressive features such as: space effective serialization, transactions, MRU/soft/weak instance cache,
-defragmentation, transparent encryption and compression.
+JDBM is tiny (160KB no-deps jar), but packed with features such as transactions,
+instance cache and space-efficient serialization.
+It also has outstanding performance: 1 million inserts per second and 10 million fetches per second (disk based!).
+It is tightly optimized and has minimal overhead.
+It scales well from an Android phone to multi-terabyte data sets.
-JDBM has outstanding performance; it can insert a million records per second and read them even faster.
-Unlike most of its competitors, it keeps high performance even with a huge number of records (1e12).
-Our tests show that it is 4 times faster than BerkleyDB Java Edition and only 2 times slower than
-Tokyo Cabinet (probably the fastest db written in C++).
-
-Its performance is a side effect of the author's obsession with minimal overhead.
-JDBM runs great even on systems with limited memory and poor Garbage Collector (Android).
-Data are read directly without copying, simply by using mapped memory buffers.
-Most of the logic is implemented using primitive numbers and arrays. JDBM hardly ever creates new object
-instances for its internal use. It also uses advanced techniques to minimise consumed disk space.
-
-Last but not least, JDBM is opensource and free-as-beer under Apache 2.0 license.
+JDBM is open source and free-as-beer under the Apache license.
+There is no catch and no strings attached.
News
====
@@ -28,92 +20,177 @@ News
18th Jan 2012 - Alpha1 released, [announcement](http://kotek.net/blog/jdbm_3.0_alpha_1_released) and
[binary jar](https://github.com/downloads/jankotek/JDBM3/JDBM-3.0-alpha-1.jar)
+Features
+========
+* B*Tree with `ConcurrentNavigableMap` interface
+ * Very fast for sequential read/write.
+ * Small values stored inside tree nodes, large values lazily fetched.
+ * Self-balancing, great performance even with 1e12 items.
+ * Delta compression on keys
+ * Custom comparators
+* H*Tree with `ConcurrentMap` interface
+ * Optimized for random reads/writes
+ * Small values stored inside tree nodes, large values lazily fetched.
+ * Self-balancing, great performance even with 1e12 items.
+* TreeSet and HashSet, which use B*Tree and H*Tree without values
+* LinkedList, which will implement a bounded BlockingDeque (not implemented yet)
+* Multi-core scalability (currently under testing)
+ * Everything is thread safe
+ * Reads should scale linearly with the number of cores (as long as data fit into the cache)
+ * All collections implement `Concurrent` interfaces
+ * Writes get some multi-core scalability thanks to `ReentrantReadWriteLock`.
+* Instance cache
+ * If data fits into cache, reads are almost as fast as in-memory collections.
+ * Minimal overhead, works well even with 16MB heap.
+ * Scales well into 64GB RAM heaps.
+ * Various yet simple tuning options
+* Transactions
+ * Single transaction per store, which avoids concurrent-modification issues
+ * Transactions are ACID (limited to a single concurrent transaction)
+ * Option to disable transactions for fast inserts/updates
+* Low level key-value store
+ * Various options for on-disk store (NIO, RAF, locking...)
+ * Write performance not affected by store fragmentation
+ * In-memory store option
+ * Can read data from zip file with reasonable performance
+ * Can read data from classpath resource, database is deployable over Java Web Start
+ * Advanced defragmentation
+ * Print store statistics
+ * Transparent data encryption
+ * Only 9 bytes of overhead per record (a B*Tree node, for example)
+* Space efficient serialization
+ * Custom code for most `java.util` and `java.lang` classes. For example, `Long(0)` takes only a single byte
+ * Very small POJO serialization overhead, typically only 3 bytes per class + 1 byte for each field
+ * Mimics Java serialization: fields can be `transient`, and all classes need to implement the `Serializable` interface
+ * Supports `Externalizable`
+ * Possible to plug your own `Serializer`
+* Performance
+ * Blazing fast: 1 million inserts and 10 million reads per second (on my 5GHz machine, but you should easily get 300,000 inserts per second)
+ * Tightly optimized code
+ * Uses NIO features you read about but rarely see in action
+ * Minimal heap usage, prevents `java.lang.OutOfMemoryError: GC overhead limit exceeded`
+ * Most logic done with primitives or arrays, minimal stack usage
+
+
+
+Introduction
+============
+All classes are contained in the package `net.kotek.jdbm`. There are only two important classes: `DBMaker` is a builder which configures and opens the database; `DB` is the database itself, which opens collections and controls transactions. Collections in JDBM mimic their `java.util` counterparts: TreeMap uses an on-disk ordered auto-balanced B*Tree index, LinkedList is stored as self-referencing entries, and so on. Everything should be thread safe (currently under testing).
+
+Maven Dependency
+----------------
+
+JDBM is not currently in any Maven repository. TODO: we should soon have a custom repo with nightly builds.
+
+Quick example
+-------------
-10 seconds intro
-================
- import net.kotek.jdbm.*;
-
- /** create (or open existing) database using builder pattern*/
- String fileName = "helloWorld";
- DB db = DBMaker.openFile(fileName).make();
-
- /** Creates TreeMap which stores data in database. */
- SortedMap<Integer,String> map = db.createTreeMap("mapName");
+    import net.kotek.jdbm.*;
- /** add some stuff to map*/
- map.put(1, "One");
- map.put(2, "Two");
- map.put(3, "Three");
+    //Open database using builder pattern.
+    //All options are available with code autocompletion.
+    DB db = DBMaker.openFile("test")
+            .deleteAfterClose()
+            .enableEncryption("password",false)
+            .make();
+
+    //Open a collection; TreeMap has better performance than HashMap.
+    SortedMap<Integer,String> map = db.createTreeMap("collectionName");
- System.out.println(map.keySet()); // > [1, 2, 3]
+ map.put(1,"one");
+ map.put(2,"two");
+ //map.keySet() is now [1,2] even before commit
- /** Map changes are not persisted yet, commit them (save to disk) */
- db.commit();
+ db.commit(); //persist changes into disk
- System.out.println(map.keySet()); // > [1, 2, 3]
+ map.put(3,"three");
+ //map.keySet() is now [1,2,3]
+ db.rollback(); //revert recent changes
+ //map.keySet() is now [1,2]
- /** Delete one record. Changes are not commited yet, but are visible. */
- map.remove(2);
+ db.close();
- System.out.println(map.keySet()); // > [1, 3]
+A few quick tricks
+------------------
+* Disabling transactions increases write performance about 6x. Do it with `DBMaker.disableTransactions()`. Do not forget to close the store correctly in this case (see the sketch after this list)!
+* When transactions are enabled, all uncommitted instances are stored in memory. Make sure you commit on time; this is the most common cause of `OutOfMemoryError`.
+* JDBM does not try to reclaim unused space after a massive delete; you must call `DB.defrag(false)` yourself.
+* TreeMap usually has better performance than HashMap.
+* JDBM uses an instance cache with limited size by default. If you have enough memory and a large store, use the unbounded cache: `DBMaker.enableHardCache()`
+* JDBM is optimized for small records: 16 bytes is recommended, 32KB is a reasonable maximum, 8MB is the hard limit.
+* JDBM scales well up to 1e12 records. An overnight batch insert can create a multi-terabyte store.
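+
+A minimal sketch of the fast batch insert described above, using only the `DBMaker` and `DB` methods shown elsewhere in this README (the file name, collection name and loop are illustrative):
+
+    import net.kotek.jdbm.*;
+    import java.util.SortedMap;
+
+    //Transactions disabled: ~6x faster writes, but the store MUST be closed cleanly.
+    DB db = DBMaker.openFile("batchStore")
+            .disableTransactions()
+            .make();
+    SortedMap<Integer,String> map = db.createTreeMap("items");
+    for (int i = 0; i < 1000000; i++) {
+        map.put(i, "value" + i);
+    }
+    db.close(); //mandatory when transactions are disabled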
- /** Did not like the change. Rollback to last commit (undo record remove). */
- db.rollback();
+DBMaker
+-------
- /** Key 2 was recovered */
- System.out.println(map.keySet()); // > [1, 2, 3]
+TODO
- /** close record manager */
- db.close();
+DB
+--
+TODO
-10 minutes intro
-================
+Collections
+-----------
-JDBM package exports only two important classes: `DBMaker` is builder which configures and opens database. `DB` is database itself, it creates and opens collections, and controls transactions. Other public classes are just utilities which you may find usefull. Collections in JDBM mimic their `java.util` counter parts, so TreeMap uses an on-disk ordered auto-balanced B*Tree index, LinkedList is stored as self referencing entries and so on. DB and collections should be thread safe (currently under testing).
+TODO
Instance cache
--------------
+
JDBM caches created instances in a similar way to Hibernate and other ORM frameworks. This greatly reduces serialization overhead and speeds up the database. There are five cache types, each configurable with a method on the `DBMaker` builder:
-* **Most Recently Used** (MRU) cache. It has fixed size and stores newest entries. This cache is on by default and is recommended for systems with limited memory. You can configure its size, default size is 2048.
+* **Most Recently Used** (MRU) cache. It has a fixed size and stores the newest entries. This cache is on by default. You can configure its size; the default is 2048. This cache has the lowest GC overhead and may be surprisingly faster than the other cache types.
* **No cache**. You may disable instance cache by using `DBMaker.disableCache()`
+* **Hard reference cache**. All instances fetched by JDBM are stored in the cache until released. Good with large memory heaps. The `Hard` cache is recommended over `soft` and `weak` as it has smaller overhead. Use `DBMaker.enableHardCache()` to enable it.
* **Weak reference cache**. Instances are referenced using `WeakReference`. When an item is no longer referenced by other instances, it can be discarded by GC. Use `DBMaker.enableWeakCache()` to enable it.
* **Soft reference cache**. Instances are referenced using `SoftReference`. Similar to `WeakReference`, but entries are held longer, until the system starts running out of memory. Use `DBMaker.enableSoftCache()` to enable it.
-* **Hard reference cache**. All instances fetched by JDBM are stored in cache until released. GC has no power to discard them. Use `DBMaker.enableHardCache()` to enable it.
-MRU cache is on by default. If you have enought memory (>256M) use Hard reference cache as it has smallest overhead (faster GC and no reference queue).
+With the Weak/Soft/Hard cache JDBM starts a background cleanup thread. It also checks memory usage every 10 seconds; if free memory is below 25%, it clears the cache. Our tests show that GC alone is not fast enough to prevent `OutOfMemoryError`. This may be disabled with `DBMaker.disableCacheAutoClear()`.
-With Weak/Soft/Hard cache JDBM checks memory usage every 10 seconds and if free memory is bellow 25%, it clears cache. Our tests shows that GC may not be fast enought to prevent out of memory exceptions. You may also clear cache manually using `DB.clearCache()`, when moving from one type of data to other.
+You may also clear the cache manually using `DB.clearCache()`. This is useful after a massive delete, or when you are moving from one type of data to another.
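+
+A short sketch of cache configuration, using only the `DBMaker` and `DB` methods named above (the file name is illustrative):
+
+    //Hard cache: good with large heaps; a background thread auto-clears it
+    //when free memory drops below 25% (unless disableCacheAutoClear() is used).
+    DB db = DBMaker.openFile("cachedStore")
+            .enableHardCache()
+            .make();
+    //...heavy reads benefit from cached instances...
+    db.clearCache(); //e.g. after a massive delete
+    db.close();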
Transactions
------------
-JDBM supports single transaction per store. It does not have multiple concurrent transactions with row/table locks, pessimistic locking and similar stuff. This trade off greatly simplifies design and speeds up operations.
-Transaction implementation is sound and solid. Uncommited data are stored in memory. During commit, data are appended to end of transaction log file. It is safe, as append operation hardly ever corrupts file. After commit is finished, data are replayed from transaction log file into main storage file. If users calls rollback, transaction log file is discarded.
+JDBM supports a single transaction per store. It does not have multiple concurrent transactions with row/table locks, pessimistic locking and similar features. This trade-off greatly simplifies the design and speeds up operations. Transactions are still ACID, but in a limited way.
-Keeping transaction log file brings some overhead. It is possible to disable transaction and write changes directly into main storage file. In this case no effort is made to protect file from corruption, all is sacrificed for maximal speed. It is absolutely necessary to properly close storage before exit. You may disable transactions by using `DBMaker.disableTransactions()`.
+The transaction implementation is sound and solid. Uncommitted data are stored in memory and, during commit, appended to the end of the transaction log file. This is safe, as an append operation hardly ever corrupts a file. After the commit is finished, data are replayed from the transaction log file into the main storage file. If the user calls rollback, the transaction log file is discarded.
-Uncommited data are stored in memory and flushed to disk during commit. So with large transactions you may run out of memory easily. With disabled transactions data are stored in 10 MB memory buffer and flushed to main storage file when buffer is filled.
+Keeping the transaction log file brings some overhead. It is possible to disable transactions and write changes directly into the main storage file; this makes inserts and updates about 6x faster. In that case no effort is made to protect the file from corruption; everything is sacrificed for maximal speed, so it is absolutely necessary to properly close the storage before exit. You may disable transactions by using `DBMaker.disableTransactions()`.
+
+Uncommitted instances are stored in memory and flushed to disk during commit, so with large transactions you may easily run out of memory. With transactions disabled, data are stored in a 10 MB memory buffer and flushed to the main storage file when the buffer fills up.
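+
+Because uncommitted data live in memory, large bulk operations should commit in batches. A sketch, assuming `db` and `map` are opened as in the quick example (the batch size of 100000 is an arbitrary illustration):
+
+    for (int i = 0; i < 10000000; i++) {
+        map.put(i, "value" + i);
+        if (i % 100000 == 0)
+            db.commit(); //flush uncommitted data to disk, freeing heap
+    }
+    db.commit(); //commit the final partial batch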
Serialization
-------------
-JDBM has its own space-efficient serialization which tries to mimic standard implementation. It mimics java serialization, so you still have to implement `Serializable` interface. It also handles cyclic references and some other advanced stuff.
+
+JDBM has its own space-efficient serialization which tries to mimic the standard implementation. All classes must implement the `Serializable` interface. You may exclude a field from serialization with the `transient` keyword. Our serialization also handles cyclic references and some other advanced cases. You may use your own binary format via the `Externalizable` interface or a custom `Serializer`.
JDBM has custom serialization code for most classes in the `java.lang` and `java.util` packages. For `Date` JDBM writes only 9 bytes: a 1-byte serialization header and an 8-byte timestamp. For `true`, `String("")` or `Long(3)` JDBM writes only a single-byte serialization header. For `ArrayList` and other collections JDBM writes a serialization header, a packed size and the data. Custom serializers have maximal space efficiency and low overhead.
-Standard java serialization stores class structure data (field names, types...) with record data. This generates huge overhead which multiplies with number of records. JDBM serialization stores class structure data in single space and record data only contains reference. So space overhead with POJOs is typically only 3 bytes per class + 1 byte for each field.
+Standard Java serialization stores class structure data (field names, types...) with the record data. This generates huge overhead which multiplies with the number of records. JDBM serialization stores class structure data in a single place, and record data only contain a reference to it. So the space overhead for POJOs is typically only 3 bytes per class + 1 byte for each field.
-Our serialization is designed to be very fast on small chunks of data (a few POJOs glued together). With couple of thousands nodes in object tree it becomes slow. This affects single key or value only, and does not apply to JDBM collections. Maximal record size in JDBM is 8 MB anyway, so it is good practise to store only small key/value and use filesystem for larger data.
+Our serialization is designed to be very fast on small chunks of data (a few POJOs glued together). With a couple of thousand nodes in the object tree it becomes slow (N^2 scalability). The maximal record size in JDBM is 8 MB anyway, so it is good practice to store only small keys/values in the database. You should always use the filesystem for data larger than 500KB.
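+
+A small sketch of a JDBM-friendly value class following the rules above (`Person` is a hypothetical example, not part of JDBM):
+
+    //Stored via JDBM's own serialization: Serializable is required,
+    //transient fields are skipped, and the overhead is typically
+    //~3 bytes per class + 1 byte per field.
+    public class Person implements java.io.Serializable {
+        String name;             //serialized
+        long born;               //serialized
+        transient String cached; //excluded from the stored record
+    }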
Defragmentation
---------------
+The store gets fragmented over time. JDBM is well designed, so this does not slow down write/update/delete operations. But fragmentation slows down read operations, as more data needs to be read from disk. JDBM does not do any sort of magic to reclaim unused space; it relies on the user to call `DB.defrag` periodically or after massive updates/deletes/inserts.
+
+Defrag can be called at runtime, but the `DB.defrag` method blocks other reads/writes until it finishes. Defrag basically copies data from one store to a second store, then deletes the first store and renames the second.
+
+Defragmentation has two modes, controlled by the `DB.defrag(boolean fullDefrag)` parameter:
+
+**Quick defrag** is designed to be as fast as possible. It only reclaims unused space (compacts the store), but does not reorganize data inside the store. It copies all data from one store to the other, without empty spaces between records. It is very fast, limited only by sequential disk write speed. Call it with `DB.defrag(false)`.
+
+**Full defrag** is designed to make the store as fast as possible. It reorganizes the data layout so that nodes from a single collection are stored close to each other. This makes future reads from the store faster, as less data needs to be read. Full defrag is much slower than quick defrag, as it traverses and copies all collections non-sequentially. Call it with `DB.defrag(true)`.
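+
+A quick sketch of both modes, matching the `DB.defrag(boolean fullDefrag)` signature above (assuming `db` and `map` from the quick example):
+
+    map.clear();      //a massive delete leaves unused space in the store
+    db.defrag(false); //quick defrag: compact the store, sequential copy
+    db.defrag(true);  //full defrag: also reorganize the layout for faster reads (slower)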
Troubleshooting
@@ -129,8 +206,8 @@ JDBM keeps uncommited data in memory, so you may need to commit more often. If y
**OutOfMemoryError: GC overhead limit exceeded**
Your app is creating new object instances faster than GC can collect them. When using the Soft/Weak cache, switch to the Hard cache to reduce GC overhead (it is auto-cleared when free memory is low). There is a JVM parameter to disable this assertion.
-**OverlappingFileLockException**
-You are trying to open file already opened by another JDBM. Make sure that you `DB.close()` store correctly, operating system may leave lock after JVM is terminated. You may try `DBMaker.useRandomAccessFile()` which is slower, but does not use such aggressive locking. In read-only mode you can also open store multiple times.
+**File locking, OverlappingFileLockException, some IOError**
+You are trying to open a file already opened by another JDBM instance. Make sure that you `DB.close()` the store correctly; the operating system may leave a lock behind after the JVM is terminated. You may try `DBMaker.useRandomAccessFile()`, which is slower but does not use such aggressive locking. In read-only mode you can also open a store multiple times. You may also disable file locks completely with `DB.disableFileLock()` (at your own risk, of course).
**InternalError, Error, AssertionFailedError, IllegalArgumentException, StackOverflowError and so on**
There was a problem in JDBM. It is possible that the file store was corrupted due to an internal error or disk failure. Disabling the cache with `DBMaker.disableCache()` may work around the problem. Please submit a bug report to GitHub.
@@ -142,3 +219,4 @@ Special thanks to EJ-Technologies for donating us excellent
+