**NOTE: This project is in maintenance mode (bug fixes only). I have redirected my effort to JDBM4, which should provide better concurrent scalability.**

JDBM provides TreeMap, HashMap and other collections backed by disk storage.
Now you can handle billions of items without ever running out of memory.
JDBM is probably the fastest and the simplest pure Java database.

JDBM is tiny (a 160KB jar with no dependencies), but packed with features such as transactions,
an instance cache and space-efficient serialization.
It also has outstanding performance: 1 million inserts per second and 10 million fetches per second (disk based!).
It is tightly optimized and has minimal overhead.
It scales well from Android phones to multi-terabyte data sets.

JDBM is open source and free as in beer, under the Apache license.
There is no catch and no strings attached.

News
====
4th Sep 2012 - JDBM3 alpha4 was released. [Just bugfixes](https://groups.google.com/forum/?fromgroups=#!topic/jdbm/yBB4dLW54Pk)

18th Aug 2012 - The first version of JDBM4 is available on [GitHub](https://github.com/jankotek/JDBM4)

30th Apr 2012 - JDBM3 [may soon become part of the Apache Foundation](https://groups.google.com/forum/?fromgroups#!topic/jdbm/pb4LWr6qTxM). This will not affect the GitHub site, but the package may be renamed in a few days (done).

10th Apr 2012 - Alpha3 was just released. Get the [binary jar](https://github.com/downloads/jankotek/JDBM3/JDBM-3.0-alpha3.jar) and [read some notes](http://groups.google.com/group/jdbm/browse_thread/thread/db1f0ed52ce5fb3c)

24th Feb 2012 - Alpha2 released with tons of bugfixes. Get the [binary jar](https://github.com/downloads/jankotek/JDBM3/JDBM-3.0-alpha2.jar)

18th Jan 2012 - Alpha1 released, [announcement](http://kotek.net/blog/jdbm_3.0_alpha_1_released) and
[binary jar](https://github.com/downloads/jankotek/JDBM3/JDBM-3.0-alpha-1.jar)

Features
========
* B*Tree with `ConcurrentNavigableMap` interface
    * Very fast for sequential read/write.
    * Small values stored inside tree nodes, large values lazily fetched.
    * Self-balancing, great performance even with 1e12 items.
    * Delta compression on keys
    * Submaps (aka cursors) to view limited collection subsets
    * Custom comparators
* H*Tree with `ConcurrentMap` interface
    * Optimized for random reads/writes
    * Small values stored inside tree nodes, large values lazily fetched.
    * Self-balancing, great performance even with 1e12 items.
* TreeSet and HashSet, which use BTree and HTree without values
* LinkedList, which implements bounded BlockingDeque (not implemented yet)
* Multi-core scalability (currently under testing)
    * Everything is thread safe
    * Reads should scale linearly with the number of cores (as long as the data fits into the cache)
    * All collections implement `Concurrent` interfaces
    * Some multi-core scalability with `ReentrantReadWriteLock`.
* Instance cache
    * If data fits into the cache, reads are almost as fast as in-memory collections.
    * Minimal overhead, works well even with a 16MB heap.
    * Scales well into 64GB RAM heaps.
    * Various yet simple tuning options
* Transactions
    * Single transaction per store, avoids concurrent modification issues
    * Transactions are ACID (limited to a single concurrent transaction)
    * Option to disable transactions for fast inserts/updates
* Low level key-value store
    * Various options for on-disk store (NIO, RAF, locking...)
    * Write performance not affected by store fragmentation
    * In-memory store option
    * Can read data from a zip file with reasonable performance
    * Can read data from a classpath resource, so the database is deployable over Java Web Start
    * Advanced defragmentation
    * Print store statistics
    * Transparent data encryption
    * Only 9 bytes overhead per record (for example a BTree node)
* Space-efficient serialization
    * Custom code for most `java.util` and `java.lang` classes. For example, Long(0) takes only a single byte
    * Very small POJO serialization overhead, typically only 3 bytes per class + 1 byte for each field.
    * Mimics Java serialization: fields can be `transient`, and all classes need to implement the `Serializable` interface
    * Supports `Externalizable`
    * Possible to plug in your own `Serializer`
* Performance
    * Blazing fast: 1 million inserts / 10 million reads per second (on my 5GHz machine; you should easily get 300,000 inserts per second)
    * Tightly optimized code
    * Uses the NIO stuff you read about but never see in action.
    * Minimal heap usage, prevents `java.lang.OutOfMemoryError: GC overhead limit exceeded`
    * Most logic done with primitives or arrays. Minimal stack usage.
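
The "delta compression on keys" item is assumed here to mean prefix compression of sorted keys: each key records how many leading characters it shares with the previous key, plus the remaining suffix. The sketch below illustrates that idea in plain Java; `PrefixCompression` is a hypothetical class, and JDBM's actual on-disk key encoding may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of prefix ("delta") compression for String keys. */
final class PrefixCompression {

    /** One compressed key: length of the prefix shared with the previous key, plus the new suffix. */
    static final class Entry {
        final int sharedPrefix;
        final String suffix;

        Entry(int sharedPrefix, String suffix) {
            this.sharedPrefix = sharedPrefix;
            this.suffix = suffix;
        }
    }

    /** Encodes each key relative to its predecessor; sorted input gives the longest shared prefixes. */
    static List<Entry> compress(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the original keys by re-attaching each suffix to the previous key's prefix. */
    static List<String> decompress(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.sharedPrefix) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For the keys `apple`, `applet`, `apply`, only `t` and `y` need to be stored for the second and third key, together with the shared-prefix lengths 5 and 4.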

Introduction
============
All classes are contained in the package `org.apache.jdbm`. There are only two important classes: `DBMaker` is a builder that configures and opens the database. `DB` is the database itself; it opens collections and controls transactions. Collections in JDBM mimic their `java.util` counterparts. TreeMap uses an on-disk, ordered, self-balancing B*Tree index, LinkedList is stored as self-referencing entries, and so on. Everything should be thread safe (currently under testing).

Maven Dependency
----------------

JDBM is not currently in any Maven repository. TODO: We should soon have a custom repo with nightly builds.

Quick example
-------------

    import org.apache.jdbm.*;
    import java.util.SortedMap;

    public class QuickExample {
        public static void main(String[] args) {
            // Open the database using the builder pattern.
            // All options are available with code autocompletion.
            DB db = DBMaker.openFile("test")
                    .deleteFilesAfterClose()
                    .enableEncryption("password", false)
                    .make();

            // Open a collection; TreeMap has better performance than HashMap.
            SortedMap<Integer, String> map = db.createTreeMap("collectionName");

            map.put(1, "one");
            map.put(2, "two");
            // map.keySet() is now [1,2] even before commit

            db.commit(); // persist changes to disk

            map.put(3, "three");
            // map.keySet() is now [1,2,3]
            db.rollback(); // revert recent changes
            // map.keySet() is now [1,2]

            db.close();
        }
    }

A few quick tricks
------------------
* Disabling transactions increases write performance about 6x. Do it with `DBMaker.disableTransactions()`. Do not forget to close the store correctly in this case!
* When transactions are enabled, all uncommitted instances are stored in memory. Make sure you commit on time; this is the most common cause of `OutOfMemoryError`.
* JDBM does not try to reclaim unused space after a massive delete; you must call `DB.defrag(false)` yourself.
* TreeMap usually has better performance than HashMap.
* JDBM uses an instance cache with limited size by default. If you have enough memory and a large store, use an unbounded cache: `DBMaker.enableHardCache()`
* JDBM is optimized for small records: 16 bytes is recommended, 32KB is a reasonable maximum, and 8MB is the hard limit.
* JDBM scales well up to 1e12 records. An overnight batch insert can create a multi-terabyte store.
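
Since uncommitted instances are held in memory until commit, large imports should commit in batches. A minimal sketch of that loop follows. `TransactionalStore` is a hypothetical stand-in for JDBM's `DB` (only its `commit()` call matters here), so the example compiles without the JDBM jar; the batch size is an assumption you would tune to your heap.

```java
import java.util.Map;

/** Hypothetical stand-in for JDBM's DB: only the commit() call is needed here. */
@FunctionalInterface
interface TransactionalStore {
    void commit();
}

final class BatchLoader {
    /**
     * Copies every entry of source into target, committing after each batchSize puts
     * and once more for the final partial batch, so at most batchSize uncommitted
     * entries are pending at any time. Returns the number of commits performed.
     */
    static <K, V> int loadInBatches(Map<K, V> source, Map<K, V> target,
                                    TransactionalStore store, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;
        int commits = 0;
        for (Map.Entry<K, V> e : source.entrySet()) {
            target.put(e.getKey(), e.getValue());
            if (++pending == batchSize) {
                store.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the last, smaller batch
            store.commit();
            commits++;
        }
        return commits;
    }
}
```

Because `TransactionalStore` has a single method, a method reference such as `db::commit` on a JDBM `DB` instance should be usable as the `store` argument.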

DBMaker
-------

TODO

DB
--

TODO

Collections
-----------

TODO

Instance cache
--------------

JDBM caches created instances in a similar way to Hibernate and other ORM frameworks. This greatly reduces serialization overhead and speeds up the database. There are five cache types, each configurable with a method on the `DBMaker` builder:

* **Most Recently Used** (MRU) cache. It has a fixed size and stores the newest entries. This cache is on by default. You can configure its size; the default size is 2048. This cache has the lowest GC overhead and may be surprisingly faster than the other cache types.

* **No cache**. You may disable the instance cache by using `DBMaker.disableCache()`

* **Hard reference cache**. All instances fetched by JDBM are stored in the cache until released. Good with large memory heaps. The `Hard` cache is recommended over `soft` and `weak`, as it has smaller overhead. Use `DBMaker.enableHardCache()` to enable it.

* **Weak reference cache**. Instances are referenced using `WeakReference`. When an item is no longer referenced by other instances, it can be discarded by the GC. Use `DBMaker.enableWeakCache()` to enable it.

* **Soft reference cache**. Instances are referenced using `SoftReference`. Similar to `WeakReference`, but entries are held longer, until the system starts running out of memory. Use `DBMaker.enableSoftCache()` to enable it.
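
To make the MRU policy concrete, here is a minimal fixed-size cache built on `java.util.LinkedHashMap` in access order: it keeps recently used entries and evicts the least recently used one once the size limit is exceeded. This illustrates the policy only and is not JDBM's internal class; `MruCache` is a hypothetical name, and a size of 2048 would mirror JDBM's default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical fixed-size cache that keeps the most recently used entries. */
final class MruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    MruCache(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the "newest" end
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize; // drop the least recently used entry when over capacity
    }
}
```

A fixed-size map like this creates no extra reference objects per entry, which is consistent with the low GC overhead claimed for the MRU cache above.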

With the Weak/Soft/Hard cache, JDBM starts a background cleanup thread. It also checks memory usage every 10 seconds; if free memory is below 25%, it clears the cache. Our tests show that the GC alone is not fast enough to prevent `OutOfMemoryError`. This may be disabled with `DBMaker.disableCacheAutoClear()`.
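
The auto-clear rule above (check every 10 seconds, clear when free memory drops below 25%) can be sketched with `Runtime` and a `ScheduledExecutorService`. How JDBM measures "free memory" internally is an assumption here: this version counts both the free part of the allocated heap and the headroom not yet allocated up to `-Xmx`. `MemoryWatch` and its methods are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a "clear the cache when free heap drops below 25%" watchdog. */
final class MemoryWatch {
    static final double THRESHOLD = 0.25; // from the text: clear when free memory < 25%

    /** Fraction of the maximum heap that is still free (unused allocated heap + unallocated headroom). */
    static double freeFraction(long maxMemory, long totalMemory, long freeMemory) {
        long used = totalMemory - freeMemory;
        return (double) (maxMemory - used) / maxMemory;
    }

    static boolean shouldClearCache(long maxMemory, long totalMemory, long freeMemory) {
        return freeFraction(maxMemory, totalMemory, freeMemory) < THRESHOLD;
    }

    /** Runs the check every 10 seconds on a daemon thread and calls clearCache when needed. */
    static ScheduledExecutorService start(Runnable clearCache) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-memory-watch");
            t.setDaemon(true); // must not keep the JVM alive on its own
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            Runtime rt = Runtime.getRuntime();
            if (shouldClearCache(rt.maxMemory(), rt.totalMemory(), rt.freeMemory())) {
                clearCache.run();
            }
        }, 10, 10, TimeUnit.SECONDS);
        return timer;
    }
}
```

The pure `shouldClearCache(max, total, free)` overload keeps the decision testable without depending on the real heap state.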

You may clear the cache manually using `DB.clearCache()`. This is useful after a massive delete, or when you are moving from one type of data to another.

Transactions
------------

JDBM supports a single transaction per store. It does not have multiple concurrent transactions with row/table locks, pessimistic locking and similar machinery. This trade-off greatly simplifies the design and speeds up operations. Transactions are still 'ACID', but in a limited way.

The transaction implementation is sound and solid. Uncommitted data are stored in memory. During commit, they are appended to the end of the transaction log file. This is safe, as an append operation hardly ever corrupts a file. After the commit is finished, the data are replayed from the transaction log file into the main storage file. If the user calls rollback, the transaction log file is discarded.
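
The commit protocol described above can be modelled in memory to show the order of operations: pending writes stay in memory, commit appends them to a log and then replays the log into the main store, and rollback drops whatever has not been replayed. This is a simplified sketch, not JDBM's code: the real implementation works on files and must also handle deletes and crash recovery, and `LoggedStore` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of the commit protocol described above.
 * Maps and lists stand in for JDBM's files; deletes and crash recovery are omitted.
 */
final class LoggedStore {
    private final Map<Long, byte[]> mainStore = new HashMap<>();          // "main storage file"
    private final List<Map.Entry<Long, byte[]>> log = new ArrayList<>();  // "transaction log file"
    private final Map<Long, byte[]> pending = new HashMap<>();            // uncommitted data in memory

    /** Data must be non-null (Map.entry rejects null values). */
    void put(long recid, byte[] data) {
        pending.put(recid, data);
    }

    /** Uncommitted writes are visible to readers before commit. */
    byte[] get(long recid) {
        byte[] p = pending.get(recid);
        return p != null ? p : mainStore.get(recid);
    }

    void commit() {
        // 1. Append pending records to the end of the log (append-only).
        for (Map.Entry<Long, byte[]> e : pending.entrySet()) {
            log.add(Map.entry(e.getKey(), e.getValue()));
        }
        pending.clear();
        // 2. Replay the log into the main store, then discard the log.
        for (Map.Entry<Long, byte[]> e : log) {
            mainStore.put(e.getKey(), e.getValue());
        }
        log.clear();
    }

    /** Drops uncommitted data; anything already replayed into the main store stays. */
    void rollback() {
        pending.clear();
        log.clear();
    }
}
```

The key property is that the main store is touched only after the complete change set has been appended to the log. A crash during the append therefore leaves the main store in its previous, consistent state.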

Keeping a transaction log file brings some overhead. It is possible to disable transactions and write changes directly into the main storage file. This makes inserts and updates about 6x faster. In this case, no effort is made to protect the file from corruption; everything is sacrificed for maximal speed. It is absolutely necessary to properly close the storage before exit. You may disable transactions by using `DBMaker.disableTransactions()`.

Uncommitted instances are stored in memory and flushed to disk during commit, so with large transactions you may easily run out of memory. With transactions disabled, data are stored in a 10 MB memory buffer and flushed to the main storage file when the buffer is full.


Serialization
-------------

JDBM has its own space-efficient serialization, which tries to mimic the standard implementation. All classes must implement the `Serializable` interface. You may exclude a field from serialization with the `transient` keyword. Our serialization also handles cyclic references and some other advanced cases. You may use your own binary format with the `Externalizable` interface or a custom `Serializer`.

JDBM has custom serialization code for most classes in the `java.lang` and `java.util` packages. For `Date`, JDBM writes only 9 bytes: a 1-byte serialization header and an 8-byte timestamp. For `true`, `String("")` or `Long(3)`, JDBM writes only a single-byte serialization header. For `ArrayList` and other collections, JDBM writes the serialization header, the packed size and the data. Custom serializers have maximal space efficiency and low overhead.
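
The "packed size" written for collections is a variable-length integer. A common scheme, shown below, stores 7 bits per byte and uses the high bit as a continuation flag, so any size below 128 costs a single byte. JDBM's exact byte layout is not documented here, so treat this hypothetical `PackedLong` class as an illustration of the technique rather than a reader for JDBM files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/** Hypothetical 7-bits-per-byte variable-length encoding for non-negative longs. */
final class PackedLong {
    static void write(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value: " + value);
        }
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits + "more bytes follow" flag
            value >>>= 7;
        }
        out.write((int) value); // last byte: continuation flag clear
    }

    static long read(ByteArrayInputStream in) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("truncated input");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("malformed packed long");
    }
}
```

With this scheme a list of 3 elements spends one byte on its size, and 300 elements need two bytes instead of a fixed four-byte `int`.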

Standard Java serialization stores class structure data (field names, types...) with the record data. This generates huge overhead, which multiplies with the number of records. JDBM serialization stores class structure data in a single place, and record data contain only a reference to it. So the space overhead for POJOs is typically only 3 bytes per class + 1 byte for each field.
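
The cost of storing the class structure with every record can be measured with the JDK alone. The sketch below serializes a small two-field POJO with `ObjectOutputStream`, which writes the class descriptor (class name, `serialVersionUID`, field names and types) alongside the data, and compares it with writing only the two `int` fields. It does not use JDBM; the class names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical comparison of standard serialization vs. field data only. */
final class SerializationOverhead {
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Standard serialization: stream header + class descriptor + field values. */
    static int standardSize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.size();
    }

    /** Field values only, as if the class structure were stored once elsewhere. */
    static int dataOnlySize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        }
        return bytes.size();
    }
}
```

For this `Point`, the data-only form is 8 bytes, while standard serialization needs several dozen bytes for a single record; the exact figure depends on the class name length.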

Our serialization is designed to be very fast on small chunks of data (a few POJOs glued together). With a couple of thousand nodes in the object tree, it becomes slow (N^2 scalability). The maximal record size in JDBM is 8 MB, so it is good practice to store only small keys/values in the database. You should always use the filesystem for data larger than 500KB.

Defragmentation
---------------

The store gets fragmented over time. JDBM is designed so that this does not slow down write/update/delete operations, but fragmentation slows down read operations, as more data needs to be read from disk. JDBM does not do any sort of magic to reclaim unused space. It relies on the user to call `DB.defrag` periodically or after massive updates/deletes/inserts.

Defrag can be called at runtime, but the `DB.defrag` methods block other reads/writes until they finish. Defrag basically copies data from one store to a second store, then deletes the first store and renames the second.

Defragmentation has two modes, controlled by the `DB.defrag(boolean fullDefrag)` parameter:

**Quick defrag** is designed to be as fast as possible. It only reclaims unused space (compacts the store), but does not reorganize data inside the store. It copies all data from one store to the other, without empty space between records. It is very fast, limited only by disk sequential write speed. Call it with `DB.defrag(false)`.

**Full defrag** is designed to make the store as fast as possible. It reorganizes the data layout, so nodes from a single collection are stored close to each other. This makes future reads from the store faster, as less data needs to be read. Full defrag is much slower than quick defrag, as it traverses and copies all collections non-sequentially. Call it with `DB.defrag(true)`.
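
The two defrag modes can be modelled as layout functions over the live records. In the sketch below (hypothetical `DefragModel` and `Slot` types, sizes in bytes), quick defrag keeps the existing physical order and only removes the gaps, while full defrag first groups records by collection so that each collection ends up contiguous. Real JDBM full defrag traverses collections on disk; sorting by a collection id is a simplification.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of quick vs. full defrag as record-layout functions. */
final class DefragModel {
    /** A live record: id, owning collection, byte offset in the store, and size in bytes. */
    static final class Slot {
        final long recid;
        final int collection;
        final long offset;
        final int length;

        Slot(long recid, int collection, long offset, int length) {
            this.recid = recid;
            this.collection = collection;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Quick defrag: keep the current physical order, just remove the gaps. */
    static List<Slot> quick(List<Slot> liveInFileOrder) {
        return pack(liveInFileOrder);
    }

    /** Full defrag: group records by collection (stable sort keeps order within a collection). */
    static List<Slot> full(List<Slot> liveInFileOrder) {
        List<Slot> order = new ArrayList<>(liveInFileOrder);
        order.sort(Comparator.comparingInt((Slot s) -> s.collection));
        return pack(order);
    }

    /** Writes records back to back from offset 0, as a sequential copy into a new store would. */
    private static List<Slot> pack(List<Slot> order) {
        List<Slot> out = new ArrayList<>();
        long next = 0;
        for (Slot s : order) {
            out.add(new Slot(s.recid, s.collection, next, s.length));
            next += s.length;
        }
        return out;
    }
}
```

Both modes yield the same total store size. They differ only in record order, which is why full defrag pays off in later reads rather than in reclaimed space.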


Troubleshooting
===============

Please report bugs to the GitHub issue tracker. There is a [mail-group](mailto:jdbm@googlegroups.com) if you have questions; you may also browse the [group archive](http://groups.google.com/group/jdbm).

JDBM uses chained exceptions, so users do not have to write try/catch blocks. `IOException` is usually wrapped in `IOError`, which is unchecked. So please always check the first (root) exception in the chain.
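
Because I/O failures arrive wrapped (for example an `IOException` inside an `IOError`, possibly inside further runtime exceptions), a small helper that walks `Throwable.getCause()` to the innermost exception makes the real problem easy to log. This is a generic JDK sketch; `RootCause` is a hypothetical name.

```java
/** Hypothetical helper: walks a chain of wrapped exceptions down to the innermost cause. */
final class RootCause {
    static Throwable of(Throwable t) {
        Throwable root = t;
        // Throwable.getCause() returns null at the end of the chain
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        return root;
    }
}
```

For example, `RootCause.of(new RuntimeException(new IOError(new IOException("disk full"))))` returns the `IOException` whose message is `disk full`.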

**OutOfMemoryError**
JDBM keeps uncommitted data in memory, so you may need to commit more often. If your memory is limited, use the MRU cache (on by default). You may increase the heap size by starting the JVM with an extra parameter such as `-Xmx500m`.

**OutOfMemoryError: GC overhead limit exceeded**
Your app is creating new object instances faster than the GC can collect them. If you are using the Soft/Weak cache, switch to the Hard cache to reduce GC overhead (it is cleared automatically when free memory is low). There is also a JVM parameter to disable this check (`-XX:-UseGCOverheadLimit` on HotSpot JVMs).

**File locking, OverlappingFileLockException, some IOError**
You are trying to open a file that is already opened by another JDBM instance. Make sure that you `DB.close()` the store correctly; the operating system may leave a lock after the JVM is terminated. You may try `DBMaker.useRandomAccessFile()`, which is slower but does not use such aggressive locking. In read-only mode, you can also open a store multiple times. You may also disable file locks completely with `DB.disableFileLock()` (at your own risk, of course).

**InternalError, Error, AssertionFailedError, IllegalArgumentException, StackOverflowError and so on**
There was a problem in JDBM. It is possible that the file store was corrupted due to an internal error or disk failure. Disabling the cache with `DBMaker.disableCache()` may work around the problem. Please submit a bug report on GitHub.

---
Special thanks to EJ-Technologies for donating the excellent
[JProfiler](http://www.ej-technologies.com/products/overview.html) to us.