Architecture web page is cluttered with grammar/spelling mistakes. #6

regexident · 2013-09-16T12:12:49Z

I made an attempt to fix some of the grammar/spelling issues in your architecture documentation. You should get a native speaker to proof-read your stuff before releasing it. Otherwise you're just shooting yourself (and your own work) in the foot.

PREFACE

Sophia (adj. sophisticated) database and its ~~it's~~ architecture was born as a result of research and rethinking primary alghorithmical constraints associated with ~~getting a popular Log-file based data structure, such as LSM-tree, its it's variations based on Fractional Cascading ideas and a B-Tree.~~ (this sentence is ungrammatical and hard to make sense of)

Those limitations are slow read's, meta-data bloat and unexpected latency hops.

Most ~~of a~~ log-based databases (or LSM-alike specifically) trend to organize their file storage as a collection of sorted files which are periodically merged ~~merges~~. Thus, without using some sort of key filtering scheme (like bloom-filter), to find a key, it has ~~have~~ to traverse all files which can ~~and could~~ take up to O(files_count * log(file_size)) in the worst case.

Sophia was specifically is designed ~~the way~~ to improve the situation and get a fast reads while still benefit from append-only design.

DESIGN

Sophia's ~~Sophia~~ architecture combines a region in-memory index with *a * in-memory key index.

A region ~~Region~~ index is represented as an ordered range of regions with their min and max keys and a latest on-disk reference. Regions ~~are~~ never overlap.

These regions have the same semantical meaning as the B-Tree pages, but are designed differently. They do not have a tree structure or , any internal page-to-page relationships and thus no ~~, thus, a~~ meta-data overhead (specifically to append-only B-Tree).

A single region on-disk holds keys with values. And as a B-tree page, region has it's maximum key count. Regions are uniquely identified by region id number, by which they can be tracked in a future.

A key ~~Key~~ index is very similar to LSM zero-level (memtable), but has a ~~have~~ different key lifecycle. All modifications first ~~get into there~~ (needs rephrasing) and hold until ~~in till~~ they will be explicitly removed by merger.

The database ~~Database~~ update lifecycle is organized in terms of epochs ~~Epoch's~~. Epoch lifetime is determined in terms of key updates. When the update counter reaches an(/the) epoch's ~~epoch~~ watermark number then the Rotation event happen.

Each epoch, depending to its ~~it's~~ state, is associated with a single log file or database file. Before getting added added to the in-memory index, ~~every~~ modifications are first written to the epoch's write-ahead log.

On each rotation event:
a. current epoch, which is called 'live', is marked as 'transfer' and a new 'live' epoch is created (new log file)
b. create new and swap current in-memory key index
c. merger thread is being woken up

The merger ~~Merger~~ thread is the core part that is responsible for region merging and garbage collecting of a old regions and older epochs ~~epoch's~~. On wakeup**,** the merger thread iterates through list of epochs marked as 'transfer' and starts the ~~start~~ merge procedure.

The merge procedure has the following steps:

create new database file for the latest 'transfer' epoch

fetch any keys from the in-memory index that associated with a single destination region

for each ~~foreach~~ fetched key ~~keys~~ and origin region start the merge and write a new region to the database file

on each completed region (current merged key count is less or equal to max region key count):
a. allocate new split region for region index, set min and max
b. first region is always has id of origin destination region
c. link region and schedule for future commit

on origin region update complete:
a. update destination region index file reference to the current epoch and insert split regions
b. remove keys from key index

start step (2) as long as (sure you didn't mean until?) there are ~~while there is~~ no updates left

start garbage collector

database synced with disk and if everything went well, remove all 'transfer' epochs (log files) and gc'ed databases

free index

The garbage ~~Garbage~~ collector has a simple design.

All that is needed ~~what is need~~ is to track an/the epoch's ~~epoch~~ total region count and a count of transfered regions during merge procedure. Thus, if some older epoch database has fewer than 70% (or any other changeable factor) live regions ~~less then 70% (or other changeable factor)~~ they just get copied to current epoch database file and the old one is being deleted.

On database recovery ~~recover,~~ Sophia tracks and builds an index of pages from the youngest epochs (biggest numbers ~~number's~~) down to the oldest. Log files are being replayed and epochs ~~epoch's~~ are marked as 'transfer'.

Sophia has been ~~maybe~~ evaluated as having the following complexity (in terms of disk accesses):

set: worst case is a O(1) append-only key write + in-memory index insert

get: worst case is a O(1) random region read, which itself do amortized O(log region_size) key compares + in-memory key index search + in-memory region search

range: range queries are very fast due to the fact that each iteration needs to compare no more than ~~is only need to compare only~~ two keys without searching ~~search~~ them, and access through mmaped database. Roughly complexity can be equally evaluated as having to sequentially read a mmaped file.

pmwkaa · 2013-09-16T12:48:27Z

That is a huge, thank you very much and shame to me and my poor engrish ;)

regexident · 2013-09-16T14:34:04Z

That is a huge, thank you very much and shame on to me and my poor english ~~engrish~~ ;)

Okay, I'll stop now, promised. ;)

Anyway, …you're welcome. :)
Figured a promising project deserved some clean docs.
And while we're at it: you might also want to re-consider your font choice, which looks nice but is quite hard to read.

$ softlimit -a 860000000 ./batch Segmentation fault (core dumped) (gdb) bt #0 ss_pageradd (p=0x55aa8f8d8590) at sophia/std/ss_pager.c:59 #1 0x000055aa8f733675 in ss_pagerpop (p=0x55aa8f8d8590) at sophia/std/ss_pager.c:71 pmwkaa#2 0x000055aa8f7336fb in ss_sagrow (s=0x55aa8f8d8878) at sophia/std/ss_slaba.c:32 pmwkaa#3 ss_slabamalloc (a=0x55aa8f8d8870, size=<optimized out>) at sophia/std/ss_slaba.c:89 pmwkaa#4 0x000055aa8f741ef2 in ss_malloc (size=65, a=<optimized out>) at sophia/std/ss_a.h:45 pmwkaa#5 sx_valloc (v=0x55aa9ab5cdf0, asxv=<optimized out>) at sophia/transaction/sx_v.h:28 pmwkaa#6 sx_set (t=t@entry=0x7f81028ad05c, index=index@entry=0x7f81028c3c8c, version=0x55aa9ab5cdf0) at sophia/transaction/sx.c:285 pmwkaa#7 0x000055aa8f751841 in se_txwrite (t=0x7f81028ad024, o=<optimized out>, flags=<optimized out>) at sophia/environment/se_tx.c:65 pmwkaa#8 0x000055aa8f753f4e in se_recoverlog (log=0x55aa8f8f46c0, e=<optimized out>) at sophia/environment/se_recover.c:111 pmwkaa#9 se_recoverlogpool (e=0x55aa8f8d8010) at sophia/environment/se_recover.c:150 pmwkaa#10 se_recover (e=e@entry=0x55aa8f8d8010) at sophia/environment/se_recover.c:172 pmwkaa#11 0x000055aa8f75a850 in se_open (o=0x55aa8f8d8010) at sophia/environment/se.c:56 pmwkaa#12 0x000055aa8f75a988 in sp_open (ptr=<optimized out>) at sophia/sophia/sophia.c:75 pmwkaa#13 0x000055aa8f712d19 in main (argc=<optimized out>, argv=<optimized out>) at batch.c:31

pmwkaa closed this as completed Sep 16, 2013

njaard mentioned this issue May 14, 2014

Deadlock when opening two cursors #50

Open

This was referenced Jun 25, 2014

crash in cursor iterator #53

Open

crash with multiple threads #54

Open

This was referenced Sep 5, 2015

check for mmap return value instead of p Safari77/sophia#1

Merged

check for mmap return value instead of p #92

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Architecture web page is cluttered with grammar/spelling mistakes. #6

Architecture web page is cluttered with grammar/spelling mistakes. #6

regexident commented Sep 16, 2013

PREFACE

DESIGN

pmwkaa commented Sep 16, 2013

regexident commented Sep 16, 2013

Architecture web page is cluttered with grammar/spelling mistakes. #6

Architecture web page is cluttered with grammar/spelling mistakes. #6

Comments

regexident commented Sep 16, 2013

PREFACE

DESIGN

pmwkaa commented Sep 16, 2013

regexident commented Sep 16, 2013