Hierarchy, folders, authorization and nested (parent-child) relationships #18

Closed
joepio opened this issue Nov 2, 2020 · 5 comments

joepio commented Nov 2, 2020

This is going to be a mess of various thoughts regarding hierarchies - sorry for the lack of structure

On desktops, we generally use folders and sub-folders for various goals:

  1. Identification: the path of a file is often also its ID - it tells the machine where the file can be found
  2. Authorization: setting rights - whether a user or program has read / write access
  3. Categorization: grouping related stuff together.
  4. Disk space management: calculating size / identifying data usage culprits.
  5. Navigation & Intuition: useful for humans for finding some file about some subject.

If Atomic Data is to be just as useful as a Unix filesystem or a Google Drive, we need to find solutions for the five goals mentioned above, too.

  1. Identification: The ID is obviously the Subject of the resource. This is the URL. There are no real restrictions on Atomic URLs - as long as they resolve using HTTP(S) or IPFS. If a resource moves and changes its URL, the owner must redirect to the new location.
  2. Authorization: Doing authorization without some hierarchical model seems almost impossible, although a multi-parent model could work fine (if it's additive).
  3. Categorization: We could use a tag-like system for this.
  4. Disk space management: When counting things
  5. Navigation & Intuition: useful for humans for finding some file about some subject.

So let's check out some approaches to hierarchy

Approaches to hierarchy

Classic Unix-style file/folder model

This is the one we're most familiar with. Each file has a path, and only one parent.
The five goals in the intro describe what we use these paths for.
That's a lot of responsibilities for a single string, but that has some merits:

  • Simple mental model. It's easy to reason about where a file is and who has rights.

But it also causes issues:

  • Changing IDs. If we move a file, we can no longer find it by its ID. That doesn't fare well on the web, where we want Cool URIs. Google Drive solves this by using UUIDs in URLs, instead of paths. This leads to ugly, nontransparent URLs, though.
  • Having a single parent limits how we can organize a file. For example, you might want to have a picture inside your personal vacation 2019 folder, but also on your public timeline. You could copy it, but that wastes space and makes it harder to manage items from one place.

Nested tagging

In this approach, resources can be linked to Tags. A Tag is like a parent folder, but it's a 1-n relationship: every resource can have multiple Tags, contrary to how folders work in most systems. Google Drive does use a tag-like model, though: a folder can be placed in multiple places.

Tags can be nested, like folders. Should circular tags be possible? If they are, then implementations need to be very much aware of this, to prevent getting stuck in some loop.

These tags could be easily used for authorization. If you want to find out if an agent can read something, check the tag of the resource. Then, check the agents-read-access property (an array of Agents), and check if the requester is present in that array. If that is not the case, check the groups-read-access property (an array of Groups), and check if the requester is present in that group.
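
The check described above could look roughly like the sketch below. It assumes tags and groups are plain dictionaries, that agents-read-access and groups-read-access hold lists of URLs, and that a group lists its agents under a hypothetical `members` key. None of these shapes are settled, and the example.com URLs are made up.

```python
def can_read(tag, requester, groups):
    """Return True if `requester` may read resources that carry `tag`."""
    # Step 1: is the agent listed directly on the tag?
    if requester in tag.get("agents-read-access", []):
        return True
    # Step 2: is the agent a member of any group listed on the tag?
    for group_id in tag.get("groups-read-access", []):
        group = groups.get(group_id, {})
        if requester in group.get("members", []):  # `members` is an assumed key
            return True
    return False


# Hypothetical sample data.
ALICE = "https://example.com/agents/alice"
BOB = "https://example.com/agents/bob"
CAROL = "https://example.com/agents/carol"
FAMILY = "https://example.com/groups/family"

groups = {FAMILY: {"members": [BOB]}}
tag = {"agents-read-access": [ALICE], "groups-read-access": [FAMILY]}

print(can_read(tag, ALICE, groups))  # listed directly on the tag
print(can_read(tag, BOB, groups))    # listed through the family group
print(can_read(tag, CAROL, groups))  # not listed anywhere
```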

We could use an additive rights model for authorization. Check all the tags of a resource (and all parent tags of these tags) to find out whether the user has the correct rights. If the correct rights are present in any of the tags, you're good to go.
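
A minimal sketch of such an additive check over nested tags, assuming each tag stores its parent tags under a hypothetical `tags` key and its grants as plain lists of agents. The `seen` set is one way to stay safe if circular tags turn out to be allowed.

```python
def has_right(resource_tags, agent, right, tags):
    """Additive check: True if any tag, or any ancestor tag, grants `right`."""
    to_visit = list(resource_tags)
    seen = set()  # guards against circular tags
    while to_visit:
        tag_id = to_visit.pop()
        if tag_id in seen:
            continue
        seen.add(tag_id)
        tag = tags.get(tag_id, {})
        if agent in tag.get(right, []):
            return True  # a single grant anywhere is enough
        to_visit.extend(tag.get("tags", []))  # continue with the parent tags
    return False


# Hypothetical tags; "vacation-2019" and "personal" point at each other on purpose.
tags = {
    "vacation-2019": {"tags": ["personal"], "read": []},
    "personal": {"tags": ["vacation-2019"], "read": ["alice"]},
    "timeline": {"tags": [], "read": ["bob"]},
}

print(has_right(["vacation-2019"], "alice", "read", tags))
print(has_right(["vacation-2019"], "bob", "read", tags))
print(has_right(["vacation-2019", "timeline"], "bob", "read", tags))
```

Because the model is additive, the loop can stop at the first grant it finds and never needs to look for a "deny" further up.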

Folders are often also used for calculating disk space. This might be a bit harder with tags - you don't want to count items with multiple tags in each parent. So how do you decide how big a tag is, in bytes? One solution is to have a 'main' tag, which perhaps is simply the first tag. Only that one is counted.
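
To illustrate the 'main tag' idea, here is a small sketch that attributes each resource's size to its first tag only. The `bytes` and `tags` keys are assumptions for the example, untagged resources are simply skipped, and rolling sizes up into parent tags is left out.

```python
from collections import defaultdict


def size_per_tag(resources):
    """Sum sizes per tag, counting each resource only under its first ('main') tag."""
    totals = defaultdict(int)
    for resource in resources:
        resource_tags = resource.get("tags", [])
        if resource_tags:
            main_tag = resource_tags[0]  # assumption: the first tag is the main one
            totals[main_tag] += resource["bytes"]
    return dict(totals)


# Hypothetical resources.
resources = [
    {"subject": "photo", "bytes": 2000, "tags": ["vacation-2019", "timeline"]},
    {"subject": "note", "bytes": 500, "tags": ["timeline"]},
    {"subject": "draft", "bytes": 100, "tags": []},
]

print(size_per_tag(resources))
```

The photo has two tags but is counted once, under vacation-2019, so the per-tag totals never add up to more than the tagged data actually stored.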

joepio commented Feb 17, 2021

I've been working on atomic-data-browser for a couple of weeks, and playing around with that made me feel even more hungry for some hierarchy. Some things that I want:

  • I want to be able to find stuff by navigating a hierarchy
  • When I open a resource I want to know where it is. I want to see its parent, see where it lives. I want a breadcrumb bar.
  • When I open my app, I want to quickly see the most important types of content. I currently ordered this by generating a set of collections, which are fully dynamic and class dependent, but this fails whenever I make a new type of class.
  • I want to share a bunch of stuff (a space / a folder / a tag) with other people. Not just for viewing, but also for editing.
  • I want to constrain some parts of the hierarchy to only contain certain things
  • I want to see all my stuff in a notion-like sidebar, where I can collapse irrelevant items.
  • I want to spend as little time as possible on thinking about hierarchy when creating something new.
  • I want to see which folders are eating up most of my storage space
  • I want to be able to follow a thing (parent) and all its sub-things (children), and receive notifications when anything changes.

Approaches

Every item has a parent, only tags (folders) can be parents

  • Simple to understand, will feel very familiar to anyone who has used an OS / Dropbox / Google Drive
  • Can be easily mapped to existing filesystems (for Dropbox / Drive-like synchronization with desktops)
  • Easy to implement consistent views for

Every item has a parent, any resource can be a parent

  • Every time you make a new resource, you have a decent default parent: the currently shown item!
  • Means that in terms of views, we kind of need every type of view to see children, or constantly have the ability to see children.
  • Allows for using non-folder parents, which can be very useful. For example, comments may be children of some News Article, even though that news article is not a folder.
  • Mapping to traditional filesystems may be hard. Maybe create a folder and add a file in there with the same name, or index to act as the 'main' resource?
  • Creating

Parents are optional

  • Will lead to items that may be unfindable, which can be difficult when finding disk usage culprits, for example.
  • Easier to create content - no need to specify / think about parent.

joepio commented May 3, 2021

I think that making every item parentable is a pretty clean solution. Doing this means that every Atomic Data Server needs a root node. As a term, I feel like root is a bit too technical. I prefer drive, as a root node should feel like a hard drive.

But how should we order the rest of the content? Maybe the Drive has children, which are calculated dynamically, and we use the existing collections as children?

joepio commented May 3, 2021

Rights and grants (read / write access)

We'll definitely use the Hierarchy model to provide (or possibly restrict) write and read access to resources. How should this work?

Additive model

I think we'll need an additive model, which means that parents can only add permissive rights to resources. This helps to keep performance decent - if anywhere in a tree we see that the correct grant is present, we can execute the command. We don't have to evaluate the entire tree. This does, however, mean that children cannot be more restrictive than their parents. There are situations where this will feel awkward, for example when dealing with comments on something. We might think of comments as being children of some parent comment / thread, but we generally don't allow the OP to get edit rights to all of its comments.

Properties

  • Have a read and edit array containing Agents (maybe groups later on). These can be present on any resource. They are checked before returning a resource or applying a commit.
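
As a sketch of how this could work with single parents: walk from the resource up its parent chain and stop at the first resource whose read or edit array contains the agent. The in-memory `store` dictionary and the `parent` key are assumptions for illustration, not the actual server code.

```python
def is_authorized(subject, agent, right, store):
    """Walk from `subject` up through its parents and stop at the first grant."""
    seen = set()  # protects against accidental parent loops
    current = subject
    while current is not None and current not in seen:
        seen.add(current)
        resource = store.get(current)
        if resource is None:
            return False  # unknown resource or parent: no grant found
        if agent in resource.get(right, []):
            return True  # additive: one grant in the chain is enough
        current = resource.get("parent")
    return False


# Hypothetical hierarchy: drive -> article -> comment.
store = {
    "drive": {"read": ["alice"], "edit": ["alice"]},
    "article": {"parent": "drive", "read": ["bob"]},
    "comment": {"parent": "article", "edit": ["carol"]},
}

print(is_authorized("comment", "carol", "edit", store))  # granted on the comment itself
print(is_authorized("comment", "alice", "edit", store))  # inherited from the drive
print(is_authorized("comment", "bob", "edit", store))    # bob only has read on the article
```

The second call shows the awkward case: alice can edit carol's comment purely because of her rights on the drive, and the comment has no way to opt out.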

joepio commented May 3, 2021

Folders

Even though any resource can be a Parent, we might want to use folders, too. These might contain items that are foreign, so they are not necessarily strictly children of the Folder.

joepio commented Jun 30, 2021

I've been thinking about bidirectionality. For now, children describe their parents - but parents do not have to list their children. Parents can add rights, so it only makes sense for children to describe their parents, not the other way around. Otherwise, we could have some new resource that maliciously pretends to be the parent of a child, and then gains write rights to something. Not good, so we need children to be in charge of who their parents are. Yay, children!

But... Sometimes, we need to find children by navigating parents. For example, when opening the navigation sidebar in the Atomic Data Browser. Currently, we calculate the children at runtime, basically by querying the database. This has some performance cost, so at the moment we only do this for Drive resources (which are the top level items that appear in the sidebar). This, of course, does not cover all use cases for finding children, because all resources can have children!

I think we have a couple of solutions:

Use a collection / endpoint for finding children

If the front-end needs to know who the children are, simply create a Collection and perform the query. This means front-ends will need to have logic for doing this.
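
Conceptually, that query boils down to "all resources whose parent is X". The sketch below stands in for a Collection with a linear scan over a hypothetical in-memory `store`; a real server would presumably answer this from an index rather than scanning everything.

```python
def find_children(parent_subject, store):
    """Return the subjects of all resources that name `parent_subject` as their parent."""
    return sorted(
        subject
        for subject, resource in store.items()
        if resource.get("parent") == parent_subject
    )


# Hypothetical store: only children know their parent, parents list nothing.
store = {
    "drive": {},
    "article": {"parent": "drive"},
    "notes": {"parent": "drive"},
    "comment": {"parent": "article"},
}

print(find_children("drive", store))
print(find_children("article", store))
print(find_children("comment", store))
```

This keeps children in charge of the relationship: the parent resource is never edited, and the list of children is always derived.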

Only allow Drives / specific resources to have children

This means - treat Drives like folders in a Filesystem. Only these Drives / Folders can have children. The /commit handler should check if the parent is a Drive, and if it is not, it will error.
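
A rough sketch of what that check could look like. The `CommitError` type, the `is-a` key, the Drive class URL and the shape of a commit are all made up for illustration, and the sketch assumes the check only applies when a commit actually sets a parent.

```python
DRIVE_CLASS = "https://example.com/classes/Drive"  # hypothetical class URL


class CommitError(Exception):
    """Raised when a commit is rejected (hypothetical error type)."""


def check_parent_is_drive(commit, store):
    """Reject a commit whose parent exists but is not a Drive."""
    parent_subject = commit.get("parent")
    if parent_subject is None:
        return  # assumption: commits that set no parent are not checked here
    parent = store.get(parent_subject)
    if parent is None:
        raise CommitError(f"Parent {parent_subject} not found")
    if DRIVE_CLASS not in parent.get("is-a", []):
        raise CommitError(f"Parent {parent_subject} is not a Drive")


# Hypothetical store with one Drive and one regular resource.
store = {
    "my-drive": {"is-a": [DRIVE_CLASS]},
    "article": {"is-a": ["https://example.com/classes/Article"], "parent": "my-drive"},
}

check_parent_is_drive({"subject": "note", "parent": "my-drive"}, store)  # accepted

try:
    check_parent_is_drive({"subject": "comment", "parent": "article"}, store)
except CommitError as error:
    print(f"Rejected: {error}")
```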

This is kind of limiting, because it often feels logical to have a parent relationship to something that's not a folder. Although, to be honest, I can't really think of important use cases at this moment. Hmm.
