# A dummy file system using a key-value store

An imaginary "app" has a key-value store as its only persistence layer.

The app needs to behave like a file system, which means users can:

- create project 
- create folder
- move project (into folder)
- move folder (into folder, i.e. nesting)
- close project (archive)
- recover project
- leave project
- add project member
- remove project member
- checkout project (git clone)

The first thing I'll need is a dummy file system so the user can organise data.


I'll manage the directories for the user with Linux path notation and a single dictionary.

For this I'll use `path.split('/')` which will look like this:

```python
>>> '/1/2/project A'.split('/') 
['', '1', '2', 'project A']
```

The first empty string is to be thrown away and the last item is the pointer to the project. To avoid indexing and guessing, I'll use the `*`-notation when unpacking from `split`:

```python
>>> _, *route, target = "/1/2/project A".split('/')
>>> print(route, target)
['1', '2'] project A
```

As the `start, *middle, end` notation is somewhat specific to Python, here is how it works: `_` is a throw-away variable for the first item, `target` receives the last item, and `*route` collects everything in between as a list.
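A short illustration of how the star-unpacking behaves for paths of different depths. The helper name `split_path` is made up for this example and is not part of the app:

```python
def split_path(path):
    """Split an absolute path into (route, target) using star-unpacking."""
    _, *route, target = path.split('/')
    return route, target

# the starred name always receives a list, even when it is empty
print(split_path('/1/2/project A'))  # (['1', '2'], 'project A')
print(split_path('/project A'))      # ([], 'project A')
print(split_path('/1/2/'))           # (['1', '2'], '') - a trailing slash gives an empty target
```

The last case is worth remembering: a trailing `/` produces an empty final item, which is why some of the functions below treat it specially.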

First, however, I need something to verify against, so here's my folder structure for now:

In [1]:
d = {
    '1.1': {
        '2.1': {
            '3.1': 'inode 1',  # equivalent to a folder structure /1.1/2.1/ where 3.1 points to the project.
            '3.2': 'inode 2',
            '3.3': 'inode 3'
            }
        }, 
    '1.2': 'inode 4',
    '1.3': {
        '2.2': 'inode 5'
        }
    }

In [2]:
# prefix components:
space =  '    '
branch = '│   '
# pointers:
tee =    '├── '
last =   '└── '

def tree(d: dict, prefix: str=''):
    pointers = [tee] * (len(d)-1) + [last]
    for pointer, name, content in zip(pointers, d.keys(), d.values()):
        suffix = "" if isinstance(content, dict) else f": {content}"
        print(prefix + pointer + name + suffix)
        if isinstance(content, dict):
            extension = branch if pointer == tee else space
            tree(content, prefix=prefix+extension)
        
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       └── 3.3: inode 3
├── 1.2: inode 4
└── 1.3
    └── 2.2: inode 5


This seems acceptable.
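Because `tree` prints directly, its output is awkward to check automatically. One possible variant, sketched here under the hypothetical name `tree_lines`, returns the lines as a list so they can be compared in a test:

```python
def tree_lines(d, prefix=''):
    """Like tree(), but returns the lines instead of printing them."""
    lines = []
    items = list(d.items())
    for index, (name, content) in enumerate(items):
        is_last = index == len(items) - 1
        pointer = '└── ' if is_last else '├── '
        if isinstance(content, dict):
            lines.append(prefix + pointer + name)
            # continue the vertical bar only if more siblings follow
            extension = '    ' if is_last else '│   '
            lines.extend(tree_lines(content, prefix + extension))
        else:
            lines.append(f"{prefix}{pointer}{name}: {content}")
    return lines

sample = {'1.2': 'inode 4', '1.3': {'2.2': 'inode 5'}}
print("\n".join(tree_lines(sample)))
```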

The next task is to add the support functions to manipulate the file system:

- cd
- ls
- mkdir
- rmdir
- link path to file
- move


In [3]:
def cd(d, path):
    """ change directory"""
    if path.endswith('/'):
        path = path.rstrip('/')
    _, *route = path.split('/')
    for subdir in route:
        d = d[subdir]
    return d

In [4]:
cd(d, '/1.1/2.1')

{'3.1': 'inode 1', '3.2': 'inode 2', '3.3': 'inode 3'}

In [5]:
tree(cd(d, '/1.1/2.1'))

├── 3.1: inode 1
├── 3.2: inode 2
└── 3.3: inode 3


This also seems reasonable.
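Note that `cd` raises a bare `KeyError` for a missing directory and returns the inode itself when the path points at a file. A minimal sketch of a stricter variant, assuming the built-in `FileNotFoundError` and `NotADirectoryError` are acceptable error types here (the name `cd_checked` is hypothetical):

```python
def cd_checked(d, path):
    """Variant of cd() that reports which part of the path is invalid."""
    _, *route = path.rstrip('/').split('/')
    for subdir in route:
        if not isinstance(d, dict) or subdir not in d:
            raise FileNotFoundError(f"no such directory: {subdir!r} in {path!r}")
        d = d[subdir]
    if not isinstance(d, dict):
        raise NotADirectoryError(f"{path!r} is a file, not a directory")
    return d

sample_fs = {'1.1': {'2.1': {'3.1': 'inode 1'}}, '1.2': 'inode 4'}
print(cd_checked(sample_fs, '/1.1/2.1'))  # {'3.1': 'inode 1'}
```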

In [6]:
def ls(d, path):
    d = cd(d,path)
    max_length = max((len(k) for k in d.keys()), default=0)  # default avoids an error on empty directories
    for k,v in d.items():
        if not isinstance(v, dict):
            print(k.ljust(max_length, " "), v)
        else:
            print(k.ljust(max_length, " "), "dir")

In [7]:
ls(d, '/')
print("-" * 8)
ls(d, '/1.1')
print("-" * 8)
ls(d, '/1.1/2.1')

1.1 dir
1.2 inode 4
1.3 dir
--------
2.1 dir
--------
3.1 inode 1
3.2 inode 2
3.3 inode 3


This looks reasonable for a `ls` function. 

Let's continue with `mkdir`

In [8]:
def mkdir(d, path):
    """ make directory """
    _, *route = path.split('/')
    for i in route:
        if i not in d:
            d[i] = {}
        d = d[i]

In [9]:
mkdir(d, '/1.4')
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       └── 3.3: inode 3
├── 1.2: inode 4
├── 1.3
│   └── 2.2: inode 5
└── 1.4


Yes. Folder `1.4` was added.

In [10]:
mkdir(d, '/1.1/2.1/3.4')
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       ├── 3.3: inode 3
│       └── 3.4
├── 1.2: inode 4
├── 1.3
│   └── 2.2: inode 5
└── 1.4


Also good: Folder `/1.1/2.1/3.4` was added.

Let's remove it.

In [11]:
def rmdir(d, path):
    _, *route, target = path.split('/')
    for i in route:
        d = d[i]
    del d[target]

In [12]:
rmdir(d, '/1.1/2.1/3.4')
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       └── 3.3: inode 3
├── 1.2: inode 4
├── 1.3
│   └── 2.2: inode 5
└── 1.4


Folder `/1.1/2.1/3.4` was removed as expected.
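As written, `rmdir` deletes whatever the path names, including files and non-empty folders. A possible stricter version, closer to how `rmdir` behaves on Linux, could look like this (the name `rmdir_strict` and the choice of exceptions are assumptions for illustration):

```python
def rmdir_strict(d, path):
    """Remove a directory only if it is a directory and it is empty."""
    _, *route, target = path.rstrip('/').split('/')
    for name in route:
        d = d[name]
    entry = d[target]
    if not isinstance(entry, dict):
        raise NotADirectoryError(f"{path!r} is a file")
    if entry:  # a non-empty dict means the folder still has content
        raise OSError(f"{path!r} is not empty")
    del d[target]

sample_fs = {'a': {'b': {}, 'c': {'f': 'inode 1'}}, 'x': 'inode 2'}
rmdir_strict(sample_fs, '/a/b')
print(sample_fs)  # {'a': {'c': {'f': 'inode 1'}}, 'x': 'inode 2'}
```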

The next item is to put some project pointer into the folder. At OS level I believe it's called "linking" of `inode`s to the directory.

The link types must then be:

- `ln`: creates a link between an existing file (elsewhere on the filesystem) and a given directory.
- `touch`: creates a new file and links it immediately.

In [13]:
def ln(d, path, inode):
    """ links project to path. """
    if path.endswith('/'): raise ValueError("path must be a filename in an existing directory.")
    _, *route, filename = path.split('/')
    for i in route:
        if i not in d:
            d[i] = {}
        d = d[i]
    d[filename] = inode

In [14]:
ln(d, '/1.4/project blue', 'inode 6')
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       └── 3.3: inode 3
├── 1.2: inode 4
├── 1.3
│   └── 2.2: inode 5
└── 1.4
    └── project blue: inode 6


This also seems to work as expected.
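The `touch` variant listed above is not implemented in this walkthrough. A minimal sketch of what it might look like, assuming inode ids come from a simple counter (the `_inode_counter` name and the `"inode N"` label format are assumptions, and `ln` is repeated in compact form so the block runs on its own):

```python
import itertools

_inode_counter = itertools.count(1)  # hypothetical stand-in for an inode id allocator

def ln(d, path, inode):
    """Compact copy of ln() from above."""
    if path.endswith('/'):
        raise ValueError("path must be a filename in an existing directory.")
    _, *route, filename = path.split('/')
    for name in route:
        d = d.setdefault(name, {})  # create missing folders on the way
    d[filename] = inode

def touch(d, path):
    """Create a new inode and link it at path in one step."""
    inode = f"inode {next(_inode_counter)}"
    ln(d, path, inode)
    return inode

scratch = {}
touch(scratch, '/1.4/project green')
print(scratch)  # {'1.4': {'project green': 'inode 1'}}
```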

Let's remove the file.

In [15]:
def rm(d, path):
    """ removes the file at path. """
    if path.endswith('/'): raise ValueError("path must be a filename in an existing directory.")
    _, *route, filename = path.split('/')
    for i in route:
        d = d[i]  # a missing directory raises KeyError instead of being created
    del d[filename] 

In [16]:
rm(d, '/1.4/project blue')
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       └── 3.3: inode 3
├── 1.2: inode 4
├── 1.3
│   └── 2.2: inode 5
└── 1.4


Yep. It's clean.

The final exercise is now to move parts of the directory structure.

In [17]:
def mv(d, old, new):
    """ moves old to reside in new """
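    if not new.endswith('/'): raise ValueError(f"{new} is not a directory")

    if old.endswith('/'):  # trailing slash: move the contents of the directory
        _, *route = old.rstrip('/').split('/')
        filename = ''
    else:  # move the entry (file or folder) named by the last path component
        _, *route, filename = old.split('/')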

    d2 = d
    for i in route:
        d2 = d2[i]

    _, *new_route, _ = new.split('/')
    d3 = d
    for i in new_route:
        if i not in d3:
            d3[i] = {}
        d3 = d3[i]

    if old.endswith('/'):  # is dir
        d3.update(d2.copy())  # handle names later.
        d2.clear()
    else:
        d3[filename] = d2[filename]
        del d2[filename]

In [18]:
mv(d, '/1.1/2.1', '/1.3/')
tree(d)

├── 1.1
├── 1.2: inode 4
├── 1.3
│   ├── 2.2: inode 5
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       └── 3.3: inode 3
└── 1.4


Yes. All the nodes were moved correctly.

Let's reverse it:

In [19]:
mv(d, '/1.3/2.1', '/1.1/')  # move folder 2.1 from /1.3/ back into /1.1/
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       └── 3.3: inode 3
├── 1.2: inode 4
├── 1.3
│   └── 2.2: inode 5
└── 1.4


Let's also move `inode 5` to `/1.1/2.1/`:

In [20]:
mv(d, '/1.3/2.2', '/1.1/2.1/')
tree(d)

├── 1.1
│   └── 2.1
│       ├── 3.1: inode 1
│       ├── 3.2: inode 2
│       ├── 3.3: inode 3
│       └── 2.2: inode 5
├── 1.2: inode 4
├── 1.3
└── 1.4


At this point I have a decent dummy file system for my app.
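One loose end is the "handle names later" note in `mv`: if the target directory already holds an entry with the same name, it is silently overwritten. A possible way to guard against that, sketched under the hypothetical name `mv_entry` and assuming a `FileExistsError` is the desired reaction:

```python
def mv_entry(d, old, new_dir):
    """Move the entry at `old` into `new_dir`, refusing to overwrite."""
    _, *route, name = old.rstrip('/').split('/')
    src = d
    for part in route:
        src = src[part]
    _, *new_route = new_dir.rstrip('/').split('/')
    dst = d
    for part in new_route:
        dst = dst.setdefault(part, {})  # create missing target folders
    if name in dst:
        raise FileExistsError(f"{name!r} already exists in {new_dir!r}")
    dst[name] = src.pop(name)

sample_fs = {'1.1': {'2.1': {'3.1': 'inode 1'}}, '1.3': {'2.2': 'inode 5'}}
mv_entry(sample_fs, '/1.1/2.1', '/1.3/')
print(sample_fs)  # {'1.1': {}, '1.3': {'2.2': 'inode 5', '2.1': {'3.1': 'inode 1'}}}
```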

Now comes the part where I'll arrange the rest.

To imitate the key-value store I'll have a simple dictionary, where the keys will have to be meaningful, so they don't get lost.

I will also need a number of IDs for quick access:

In [21]:
store = {}

# reserved key prefixes:
# 'key:'
# 'user:'
# 'emails:'
# 'fs:'
# 'projects:'
# 'settings:'

def _new_key(name):
    if not name.startswith('key:'): raise ValueError
    store[name] = store.get(name,0) + 1
    return store[name]

def _new_user_id() -> int:
    return _new_key('key:max user id')

def _new_inode_id() -> int:
    return _new_key('key:max inode id')

def _new_folder_id() -> int:
    return _new_key('key:max folder id')


You may notice I use `:` as the separator. That's on purpose, as it doesn't collide with `.` and `/` from the file system. More on that later.
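To make the separator convention concrete, here is a small sketch of building and parsing such keys. The helpers `make_key` and `parse_key` are hypothetical and only illustrate the idea; the rest of the code builds keys with plain f-strings:

```python
def make_key(prefix, ident):
    """Build a store key such as 'user:1' from a prefix and an identifier."""
    if ':' in prefix or ':' in str(ident):
        raise ValueError("':' is reserved as the key separator")
    return f"{prefix}:{ident}"

def parse_key(key):
    """Split a store key back into (prefix, identifier) at the first ':'."""
    prefix, _, ident = key.partition(':')
    return prefix, ident

print(make_key('user', 1))           # user:1
print(parse_key('key:max user id'))  # ('key', 'max user id')
```

Since `.` and `/` never act as separators, identifiers such as email addresses or paths can appear after the `:` without ambiguity.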

Next is to add users. I'll need some globally unique identifier, so why not stick to email addresses?

In [22]:

def add_user(email):
    if ":" in email: raise ValueError("invalid email.")
    if f'emails:{email}' in store:
        raise ValueError("email already in use.")
    new_id = _new_user_id()
    store[f'user:{new_id}'] = email
    store[f'emails:{email}'] = str(new_id)  # reverse lookup.
    store[f'fs:{new_id}'] = {'root': {}, 'archive': {}}  # user's file system
    store[f'projects:{new_id}'] = {}  # {inode id : project URL}
    store[f'settings:{new_id}'] = {}  # other user settings.

These keys allow me to construct the target key directly as long as I know the user id.

Here are a couple of examples:

In [23]:
def users():
    return {k:v for k,v in store.items() if k.startswith('user')}


def home(user_id):
    return store[f"fs:{user_id}"]  # the user's file system

def _keys():
    return {k:v for k,v in store.items() if k.startswith('key')}

def view_user(email):
    user_id = store[f"emails:{email}"]
    d = {}
    for k,v in store.items():
        if ":" in k:
            key, uid = k.split(":")
            if uid in {user_id, email}:
                if isinstance(v, dict):
                    d[k] = v.copy()
                else:
                    d[k] = v
    return d


Let's try this out...

In [24]:
alice = 'alice@example.com'

In [25]:
add_user(alice)

try:
    add_user(alice)
    assert False, "adding the email twice is not good."
except ValueError:
    assert True


In [26]:
for k, v in view_user(alice).items():
    print(k,v)


user:1 alice@example.com
emails:alice@example.com 1
fs:1 {'root': {}, 'archive': {}}
projects:1 {}
settings:1 {}


In [27]:
store

{'key:max user id': 1,
 'user:1': 'alice@example.com',
 'emails:alice@example.com': '1',
 'fs:1': {'root': {}, 'archive': {}},
 'projects:1': {},
 'settings:1': {}}

As the user is linking projects to a directory structure, we obviously need to be able to add projects and put them into the file system.

In [28]:
def projects(user_id):
    return store[f"projects:{user_id}"]

def new_project(user_id, name, url, path="/root"):
    d = store[f'projects:{user_id}']
    nid = _new_inode_id()
    d[nid] = url  # inode id -> project URL
    fs = store[f'fs:{user_id}']  # the user's file system
    ln(fs, f'{path}/{name}', nid)  # link the inode under path (default: /root)


In [29]:
alice_id = store[f"emails:{alice}"]
print(projects(alice_id))
# there should be no projects

{}


Let's add a project:

In [30]:
new_project(alice_id, "project 1", url="http://example.com/project1")

In [31]:
fs = store[f'fs:{alice_id}']
tree(fs)

├── root
│   └── project 1: 1
└── archive


In the printout above we see that "project 1" was added to the root of the file system and below we see the contents of the key-value store:

In [32]:
store

{'key:max user id': 1,
 'user:1': 'alice@example.com',
 'emails:alice@example.com': '1',
 'fs:1': {'root': {'project 1': 1}, 'archive': {}},
 'projects:1': {1: 'http://example.com/project1'},
 'settings:1': {},
 'key:max inode id': 1}

At this point it works, but there are some shortcomings:

- The file system has no rights management, so a user can technically delete the `archive` branch, even though this is unintended.
- The user may change email (for example after a name change), but this shouldn't corrupt the store or deny Alice access to her files.
- If the store can't take a Python `dict` as a value, I would have to serialise it with `json.dumps` first. That's a minor thing.
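The last point can be sketched quickly. The example below assumes a store that only accepts strings and wraps it with two hypothetical helpers, `put` and `get`; the dictionary `json_store` stands in for the real store:

```python
import json

json_store = {}  # stand-in for a key-value store that only holds strings

def put(key, value):
    """Serialise the value to a JSON string before storing it."""
    json_store[key] = json.dumps(value)

def get(key):
    """Load the JSON string back into Python objects."""
    return json.loads(json_store[key])

put('fs:1', {'root': {'project 1': 1}, 'archive': {}})
put('projects:1', {1: 'http://example.com/project1'})

print(get('fs:1'))        # {'root': {'project 1': 1}, 'archive': {}}
print(get('projects:1'))  # {'1': 'http://example.com/project1'} - the int key came back as a str
```

One thing to watch: JSON object keys are always strings, so the integer inode ids used as keys in `projects:{uid}` do not survive the round trip unchanged. Using string ids throughout, or converting on load, would be one way around that.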

I think it's time to pause and ponder real-world restrictions...

(to be continued)