linqdb
C#
Latest commit 78258b0 Jan 18, 2017 @ren85 search bug fix

README.md

Linqdb


Linqdb is efficient embedded database for .net with linq-style api. It is based on Leveldb.
Linqdb is a balance between nosql and relational database giving you the most control while allowing normalised relational data.
By knowing how it works (see sections implementation and performance hints) you can write efficient and clean code and have clear data model.
For example of this check out linqdb's benchmark using stackoverflow data.
For a more sophisticated example check out full text search engine built on top of linqdb: stackse.

Get it

Windows 32-bit version: Linqdb_32.zip

Windows 64-bit version: Linqdb_64.zip

Linux version

Copy this folder in your solution and add a reference to LinqDb.dll.

Creating new entities

An entity or a table is just a class. The class must have int Id property which if 0 indicates new entity and if not - the entity to be updated. Supported data types are: int, double, DateTime (and their nullables), byte[] and string. (Note bool is not supported, use int instead). For example, we’ll use:
public class SomeData
{
    public int Id { get; set; }
    public int PeriodId { get; set; }
    public int? ObjectId { get; set; }
    public double Value { get; set; }
    public double? Normalized { get; set; }
    public int? PersonId { get; set; }
    public DateTime? Date { get; set; }
    public string Name { get; set; }
}
        
Then to save new record we would (table will be created automatically if not there based on type's name):
using LinqDb;
...
var db = new Db("DATA");

var d = new SomeData() { Id = 0, Normalized = 1.2, PeriodId = 5 }; db.Table<SomeData>().Save(d); db.Dispose();

To save bulk of data more efficiently:
var list = new List<SomeData>();
...
db.Table<SomeData>().SaveBatch(list);
        
The more data will be packed in list the more efficient save will be. However 32 bit version only supports ~2Gb ram so care must be taken.
Same goes to Delete and Update methods (the more ids to delete or update the more efficient - and ram hungry - it is).

Querying

Select
Select everything from table:
List<SomeData> res = db.Table<SomeData>()
                       .SelectEntity();
        
To select only some columns (this is faster, so if possible only select what’s needed):
var res = db.Table<SomeData>()
            .Select(f => new 
            { 
                Id = f.Id,
                Normalized = f.Normalized
            });
        
Note that only anonymous types are supported in this case. res is List<AnonymousType>


Where
Linqdb supports Where clause:
var res = db.Table<SomeData>()
            .Where(f => f.Normalized == 2.3 || f.Date > DateTime.Now && f.PersonId == 5)
            .Select(f => new 
            { 
                Id = f.Id,
                Normalized = f.Normalized
            });
        
Where supports these operators: && || == >= > < <= !=. Also on the left hand side of operator there must be some property without any expression (i.e. f.PersonId % 2 == 0 won‘t work), on the right hand side - constant/variable/expression. It's important to note that types on both sides must match exactly (i.e. int != int?) and casting is only possible on the right side.
Note that if you can get away with other operators like .Intersect .Between .Search .Or - do that since they are all are likely to be more efficient than .Where.


Between
To evaluate something like .Where(f => f.Normalized > 3 && f.Normalized < 10) Linqdb will iterate index twice: first it will find all values > 3, then all values < 10 and finally return it’s intersection. So if table is large it’s faster to use .Between(f => f.Normalized, 3, 10, BetweenBoundaries.BothExclusive) which will only scan index once: from value 3 to 10.
var res = db.Table<SomeData>()
            .Between(f => f.Normalized, 3, 10, BetweenBoundaries.BothExclusive)
            .Select(f => new 
            { 
                Id = f.Id,
                Normalized = f.Normalized
            });
        
Intersect
Finds intersection of values in table with given set:
var res = db.Table<SomeData>()
            .Intersect(f => f.Normalized, new HashSet<double>() { 10, 20, 30 })
            .Select(f => new
            {
                Id = f.Id,
                Normalized = f.Normalized
            });
        
Order, skip, take
Ordering (by one column) is supported:
var res = db.Table<SomeData>()
            .OrderBy(f => f.Normalized)
            .Skip(10)
            .Take(10)
            .Select(f => new
            {
                Id = f.Id,
                Normalized = f.Normalized
            });
        
There are also overloads of Select and SelectEntity that take out argument total. This argument will be assigned the number of records that satisfied condition. (Handy with .Skip and .Take when the total number is also needed).


Search
Linqdb supports full text search on string columns. For example,
var res = db.Table<SomeData>()
            .Search(f => f.Name, "some text")
            .Select(f => new
            {
                Id = f.Id,
                Name = f.Name
            });
        
will find rows where column Name contains both „some“ and „text“. If you don't need search functionality on string you can store them more efficiently as byte.
var res = db.Table<SomeData>()
            .Search(f => f.Name, "black", true)
            .Select(f => new
            {
                Id = f.Id,
                Name = f.Name
            });
        
this will find everything that starts with "black".
.Search takes optional prameters start_step and steps which enable partial search. This is handy when you have lots of data and want to return some results as fast as possible. The search will only happen on slice of data, for example, start_step = 0, steps = 1 means only first 1000 documents will be searched, start_step = 10, steps = 5 - 5000 documents will be searched starting from document 10000. You can get total number of steps using LastStep function.


Or
By default when statements like .Where or .Search go together they imply logical "and" to the results. It is possible to have "or" on neighbouring statements like so:
var res = db.Table<SomeData>()
            .Search(f => f.Name, "some text").Or().Search(f => f.Name, "something else")
            .Where(f => f.Id > 100)
            .Select(f => new
            {
                Id = f.Id,
                Name = f.Name
            });
        
In this case only one of search need to satisfy to return the result. More than one .Or could be used.


Intermediate results
All the work happens in .Select or .SelectEntity statement, so things like these are ok:
    var tmp = db.Table<SomeData>()
                .Where(f => f.Normalized == 5);
    if (person_id != null)
    {
        tmp.Where(f => f.PersonId == person_id); //no need to assign to tmp
    }
    var res = tmp.SelectEntity(); //goes to disk
        
Also as you can see multiple .Where, .Between, .Intersect, .Search statements can be used (but only one OrderBy / OrderByDescending).


Count
To obtain table count:
Db.Table<SomeData>().Count();

Also works when conditions applied:
Db.Table<SomeData>().Where(f => f.Id < 50).Count();


GetIds
GetIds() method allows to get list of id's satisfying conditions without having to select them. Select uses either scan through entire value range or disk seeks, either of which is expensive, so if you can avoid it by using GetIds() - do it.

Updating

As mentioned above Save(item); will update row with item’s Id using item’s properties, given that Id is not 0. If it is 0, new item will be created and new Id will be assigned to the object’s Id property.
To update column of multiple rows you would need to construct Dictionary<int, T> where T is column’s type and:
var dic = new Dictionary<int, int?>();
dic[2] = 8;
dic[3] = 11;
db.Table<SomeData>().Update(f => f.PeriodId, dic);
        
This will update column PeriodId of rows with Ids 2 and 3 with respective values 8 and 11.

Deleting

To delete row(s) you would need to construct HashSet<int> of ids to be deleted and:
db.Table<SomeData>().Delete(new HashSet() { 2, 3 });
        

Transactions

Linqdb supports transactions with the limitation that one table can only be modified once during the transaction.
Transaction is used like this:
using (var transaction = new LinqdbTransaction())
{
    var d = new SomeData()
    {
        Id = 1,
        Normalized = 1.2,
        PeriodId = 5
    };
    db.Table<SomeData>(transaction).Save(d); //note that .Table takes transaction as a parameter
    var d2 = new BinaryData()
    {
        Id = 1,
        Data = new List<byte>() { 1, 2, 3 }.ToArray()
    };
    db.Table<BinaryData>(transaction).Save(d2);
    transaction.Commit(); //all writes happen here, if it fails - nothing gets modified (all or nothing)
}
        
if .Commit is not called, nothing is modified. LinqdbTransaction constructor takes int that indicates how many seconds will be waited in order to acquire write lock for a table (default is 30 sec). This is to avoid deadlocks from blocking indefinetely. Another thing to note is that once table is modified inside a transaction it becomes write-locked for the rest of the transaction (no one outside transaction is able to modify it until the transaction is over). Therefore it is important to keep transaction as short as possible: prepare everything before transaction starts and make it quick. Transaction must be created used and destroyed in the same thread.

Init \ dispose

Before using Linqdb one would need to create it:
var db = new Db("DATA");
        
The argument is path to the database.
The db needs to be created once at the application startup.
Before exiting application it is a good practice to dispose the db:

db.Dispose();

Replication

To make a copy of a database programatically:
db.Replicate("PATH_TO_COPY");
        
If directory exists - it will be removed before copying. Database can still be read/written while replication is in progress. However it is most efficient to make replication when the database is least used.

Guarantees

When saving, updating or deleting data it is guaranteed that row will be fully saved/updated/deleted or not touched at all. It is also guaranteed that all rows will be either saved/updated/deleted or none touched at all in the batch scenario. It is guaranteed that data will be correct under heavy multithreaded usage.

Limitations \ known caveats

  • Linqdb is one process database, that is only one process can access it at a time, even if it’s just for reading. That means, for example, that if your app is on desktop windows you need to turn off things like file indexing and antivirus scanning for database folder, so that no other process is using Leveldb's files.
  • You cannot change type of a column. If you need that – create a new column with required type and copy data there.
  • And after you modify type after it has been created - say added new column - you may want to manually fill it with whatever default value (i.e. if it is a nullable type – it won’t automatically satisfy == null after creation). In other words, type changes don’t automatically change data (but new inserts/updates/deletes will work).
If you are using Linqdb in a web application hosted by IIS, you would need to ensure that apppools won't intersect by setting "Disabled overlapped recycling" to true and allowing only 1 worker process.

Implementation

Linqdb uses Leveldb under the hood. (Leveldb’s license is New BSD License). The structure of Linqdb column is index (as Leveldb's rows with keys "i:table_number:column_number:linqdb_value:id" and empty value) and value (as Leveldb's rows with keys "table_number:column_number:id" and values containing linqdb_values). The locking happens with writes and locks table (while reading can go on) for short period. Consisting reading is ensured with Leveldb's snapshots. Consisting writing - with table locks and Leveldb's write batches. Index is searched by key "i:table_number:column_number:linqdb_value:" (Leveldb's values are sorted and so this will find either exact match by value or the closest one) and iteration from there is used afterwards.

Performance hints

Manual indexing
It turns out that if you have lots of data (millions of records) search index works much more efficiently than ordinary index described above. So, for example, if you have
.Between(f => f.Date, some_date1, some_date2)
and condition is satisfied for millions of records, it would be much faster to store Date as string (for example to the day level) and do something like this
var tmp = db.Table<SomeData>().Search(f => f.Date_string, some_date1.ToString("yyyyMMdd"));
for(var i=some_date1.AddDays(1); i<=some_date2; i = i.AddDays(1))
{
    tmp.Or().Search(f => f.Date_string, i.ToString("yyyyMMdd"));
}
var res = tmp.SelectEntity();
These queries are not identical, since the latter doesn't account for time, but the idea should be clear.
It's hard to built such index automatically since it's not clear what is the best interval. But you know your data and if you have lots of it, you might want to optimise queries this way.
If you have int column with few distinct values, for example, int value respresenting some type - then it's also a good candidate to be saved as string and searched upon.

Select statement
As noted above .Select or .SelectEntities takes resulting ids and decides whether to make a full scan of values (when the number of ids is substantial portion of entire data set) or make individual disk seeks (when it is not substantial portion). Either way this is expensive. So in scenarios when you need to aggregate over a large number of records selecting them first might be a bad idea. In such scenarios it is recommended to keep dictionary in memory where the keys would be ids and values - values to be aggregated. Then you could use .GetIds method and using ids access this dictionary.

Benchmark


License

Copyright (c) 2016 The Linqdb Authors. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Support

Google support group