Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Architecture web page is cluttered with grammar/spelling mistakes. #6

Closed
regexident opened this issue Sep 16, 2013 · 2 comments
Closed

Comments

@regexident
Copy link

I made an attempt to fix some of the grammar/spelling issues in your architecture documentation. You should get a native speaker to proof-read your stuff before releasing it. Otherwise you're just shooting yourself (and your own work) in the foot.

PREFACE

Sophia (adj. sophisticated) database and its it's architecture was born as a result of research and rethinking primary alghorithmical constraints associated with getting a popular Log-file based data structure, such as LSM-tree, its it's variations based on Fractional Cascading ideas and a B-Tree. (this sentence is ungrammatical and hard to make sense of)

Those limitations are slow read's, meta-data bloat and unexpected latency hops.

Most of a log-based databases (or LSM-alike specifically) trend to organize their file storage as a collection of sorted files which are periodically merged merges. Thus, without using some sort of key filtering scheme (like bloom-filter), to find a key, it has have to traverse all files which can and could take up to O(files_count * log(file_size)) in the worst case.

Sophia was specifically is designed the way to improve the situation and get a fast reads while still benefit from append-only design.

DESIGN

Sophia's Sophia architecture combines a region in-memory index with *a * in-memory key index.

A region Region index is represented as an ordered range of regions with their min and max keys and a latest on-disk reference. Regions are never overlap.

These regions have the same semantical meaning as the B-Tree pages, but are designed differently. They do not have a tree structure or , any internal page-to-page relationships and thus no , thus, a meta-data overhead (specifically to append-only B-Tree).

A single region on-disk holds keys with values. And as a B-tree page, region has it's maximum key count. Regions are uniquely identified by region id number, by which they can be tracked in a future.

A key Key index is very similar to LSM zero-level (memtable), but has a have different key lifecycle. All modifications first get into there (needs rephrasing) and hold until in till they will be explicitly removed by merger.

The database Database update lifecycle is organized in terms of epochs Epoch's. Epoch lifetime is determined in terms of key updates. When the update counter reaches an(/the) epoch's epoch watermark number then the Rotation event happen.

Each epoch, depending to its it's state, is associated with a single log file or database file. Before getting added added to the in-memory index, every modifications are first written to the epoch's write-ahead log.

On each rotation event:
a. current epoch, which is called 'live', is marked as 'transfer' and a new 'live' epoch is created (new log file)
b. create new and swap current in-memory key index
c. merger thread is being woken up

The merger Merger thread is the core part that is responsible for region merging and garbage collecting of a old regions and older epochs epoch's. On wakeup**,** the merger thread iterates through list of epochs marked as 'transfer' and starts the start merge procedure.

The merge procedure has the following steps:

  1. create new database file for the latest 'transfer' epoch
  2. fetch any keys from the in-memory index that associated with a single destination region
  3. for each foreach fetched key keys and origin region start the merge and write a new region to the database file
  4. on each completed region (current merged key count is less or equal to max region key count):
    a. allocate new split region for region index, set min and max
    b. first region is always has id of origin destination region
    c. link region and schedule for future commit
  5. on origin region update complete:
    a. update destination region index file reference to the current epoch and insert split regions
    b. remove keys from key index
  6. start step (2) as long as (sure you didn't mean until?) there are while there is no updates left
  7. start garbage collector
  8. database synced with disk and if everything went well, remove all 'transfer' epochs (log files) and gc'ed databases
  9. free index

The garbage Garbage collector has a simple design.

All that is needed what is need is to track an/the epoch's epoch total region count and a count of transfered regions during merge procedure. Thus, if some older epoch database has fewer than 70% (or any other changeable factor) live regions less then 70% (or other changeable factor) they just get copied to current epoch database file and the old one is being deleted.

On database recovery recover, Sophia tracks and builds an index of pages from the youngest epochs (biggest numbers number's) down to the oldest. Log files are being replayed and epochs epoch's are marked as 'transfer'.

Sophia has been maybe evaluated as having the following complexity (in terms of disk accesses):

set: worst case is a O(1) append-only key write + in-memory index insert

get: worst case is a O(1) random region read, which itself do amortized O(log region_size) key compares + in-memory key index search + in-memory region search

range: range queries are very fast due to the fact that each iteration needs to compare no more than is only need to compare only two keys without searching search them, and access through mmaped database. Roughly complexity can be equally evaluated as having to sequentially read a mmaped file.

@pmwkaa
Copy link
Owner

pmwkaa commented Sep 16, 2013

That is a huge, thank you very much and shame to me and my poor engrish ;)

@pmwkaa pmwkaa closed this as completed Sep 16, 2013
@regexident
Copy link
Author

That is a huge, thank you very much and shame on to me and my poor english engrish ;)

Okay, I'll stop now, promised. ;)

Anyway, …you're welcome. :)
Figured a promising project deserved some clean docs.
And while we're at it: you might also want to re-consider your font choice, which looks nice but is quite hard to read.

This was referenced Jun 25, 2014
Safari77 added a commit to Safari77/sophia that referenced this issue Sep 5, 2015
$ softlimit -a 860000000 ./batch 
Segmentation fault (core dumped)

(gdb) bt
#0  ss_pageradd (p=0x55aa8f8d8590) at sophia/std/ss_pager.c:59
#1  0x000055aa8f733675 in ss_pagerpop (p=0x55aa8f8d8590) at sophia/std/ss_pager.c:71
pmwkaa#2  0x000055aa8f7336fb in ss_sagrow (s=0x55aa8f8d8878) at sophia/std/ss_slaba.c:32
pmwkaa#3  ss_slabamalloc (a=0x55aa8f8d8870, size=<optimized out>) at sophia/std/ss_slaba.c:89
pmwkaa#4  0x000055aa8f741ef2 in ss_malloc (size=65, a=<optimized out>) at sophia/std/ss_a.h:45
pmwkaa#5  sx_valloc (v=0x55aa9ab5cdf0, asxv=<optimized out>) at sophia/transaction/sx_v.h:28
pmwkaa#6  sx_set (t=t@entry=0x7f81028ad05c, index=index@entry=0x7f81028c3c8c, version=0x55aa9ab5cdf0) at sophia/transaction/sx.c:285
pmwkaa#7  0x000055aa8f751841 in se_txwrite (t=0x7f81028ad024, o=<optimized out>, flags=<optimized out>) at sophia/environment/se_tx.c:65
pmwkaa#8  0x000055aa8f753f4e in se_recoverlog (log=0x55aa8f8f46c0, e=<optimized out>) at sophia/environment/se_recover.c:111
pmwkaa#9  se_recoverlogpool (e=0x55aa8f8d8010) at sophia/environment/se_recover.c:150
pmwkaa#10 se_recover (e=e@entry=0x55aa8f8d8010) at sophia/environment/se_recover.c:172
pmwkaa#11 0x000055aa8f75a850 in se_open (o=0x55aa8f8d8010) at sophia/environment/se.c:56
pmwkaa#12 0x000055aa8f75a988 in sp_open (ptr=<optimized out>) at sophia/sophia/sophia.c:75
pmwkaa#13 0x000055aa8f712d19 in main (argc=<optimized out>, argv=<optimized out>) at batch.c:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants