Meta: Future file structure #340

Open
christianlupus opened this issue Oct 9, 2020 · 19 comments
Labels
dependent · documentation (Missing, unclear, or outdated documentation) · enhancement (New feature or request) · question (Further information is requested)

Comments

@christianlupus
Collaborator

christianlupus commented Oct 9, 2020

Currently, each recipe is stored in its own folder within the user's files. This folder contains one recipe.json and at most two images.

I suggest extending this file/folder structure to allow more information to be stored, while keeping to the main motivation of saving all information as files for easier backup:

I suggest adding one optional file, meta.json, to each folder. This file can hold different options that relate to other currently open issues here (e.g. #311). The exact structure of this meta file needs to be discussed. As it is only internal to the cookbook app, we are relatively free in the data to be stored there.

This issue should serve as a discussion basis.


Depends on #1126

@christianlupus christianlupus added the enhancement and question labels Oct 9, 2020
@christianlupus christianlupus added this to To do in Codebase refactory via automation Oct 9, 2020
@seyfeb
Collaborator

seyfeb commented Oct 22, 2020

Thanks for opening the thread @christianlupus.

Let me try to summarize the discussion on this topic from #120. Both issues should be considered together as sharing is an important functionality (Nextcloud is developing into a collaboration platform). So when deciding for a storage solution it should probably be in that spirit.

Wishes

  • Performant recipe searches (full text)
  • Simple backup (w/o requiring database dumps)
  • Accessibility with external tools (e.g., for rendering recipes if there is no connection to the server possible)
  • Sharing recipes
    • as a common recipe, edits can be seen by both
    • as a copy
    • to the public/to other NC users

Suggested storage solutions

Pure database storage of recipes

  • storing all data in RDB tables / fields for fast querying
  • storing (schema.org-compatible?) JSON in special JSON column, additional data in additional fields

Pure (json-)file-based storage

  • storing recipe data only in JSON files
  • single or multiple JSON files possible (e.g., one following schema.org standard, second one with NC Cookbook extensions)

Mixed file and database storage

  • database is used for searching recipes (more performant than search through files)
  • database needs to be refreshed when something changes in the files or vice versa
  • files could only be a dump of the database as an additional backup

Some arguments

Pro database

Pro file-storage

  • easy backups
  • access possible without Nextcloud (external tools)
  • current solution

@christianlupus
Collaborator Author

Thank you @seyfeb for the summary!

Considering the wishes you formulated, the pure DB solution seems to be the worst candidate. Neither access from external programs nor easy backup is really given.

Given my experience with reading files directly, the only file-based option I see is a central JSON file with an index, or something along those lines. Iterating through all files in the recipes folder takes about 30 ms per recipe on my test machine (SSD). That does not look like much, but once your cookbook grows to a few hundred recipes it becomes significant, especially as it would be required for almost every HTTP request.
In fact, we are simulating the typical use case of a DB. So I suggest using software that was written for that sole purpose: a database system. In addition to the currently stored values, I suggest storing all relevant data in the DB. As soon as a change is made, it can be applied to both the file structure and the DB (#301).
A few changes are needed to the DB schema, as it is suboptimally designed at the moment. This is more related to sharing and the like and is better suited to #300 or a separate issue.

So, if we want to stick with our requirements/wishes, I fear the only way is to go with a DB plus separate files if there are no completely new ideas.


However we do things, I highly recommend to insert an abstraction layer if we start designing a new file interface. I am thinking of an abstract class that defines the CRUD operations on a complete recipe/file. Then we can later do migrations and changes in the file structure more easily without affecting the other parts of the app.
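A minimal Python sketch of what such an abstraction layer could look like (the app itself is written in PHP, so this only illustrates the shape). The names `RecipeStorage` and `FolderRecipeStorage` and the method signatures are assumptions made for this example, not existing code:

```python
import json
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class RecipeStorage(ABC):
    """Hypothetical interface: CRUD operations on a complete recipe."""

    @abstractmethod
    def create(self, name: str, recipe: dict) -> None: ...

    @abstractmethod
    def read(self, name: str) -> dict: ...

    @abstractmethod
    def update(self, name: str, recipe: dict) -> None: ...

    @abstractmethod
    def delete(self, name: str) -> None: ...


class FolderRecipeStorage(RecipeStorage):
    """Stores each recipe as <root>/<name>/recipe.json."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _file(self, name: str) -> Path:
        return self.root / name / "recipe.json"

    def create(self, name: str, recipe: dict) -> None:
        # Fails with FileExistsError if the recipe folder already exists.
        (self.root / name).mkdir(parents=True, exist_ok=False)
        self._file(name).write_text(json.dumps(recipe, indent=2), encoding="utf-8")

    def read(self, name: str) -> dict:
        return json.loads(self._file(name).read_text(encoding="utf-8"))

    def update(self, name: str, recipe: dict) -> None:
        if not self._file(name).is_file():
            raise FileNotFoundError(name)
        self._file(name).write_text(json.dumps(recipe, indent=2), encoding="utf-8")

    def delete(self, name: str) -> None:
        # Removes the whole recipe folder, including images.
        shutil.rmtree(self.root / name)
```

The rest of the app would only talk to `RecipeStorage`, so a later change of the on-disk layout would mean writing a second implementation plus a migration, without touching the callers.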

The current structure is to use the name of the recipe as a folder name. Inside this folder (in the configured recipe folder search path) there are a few files so far:

  • full.jpg: The full-scaled image
  • recipe.json: The JSON-encoded data of the recipe
  • thumb.jpg: A small image used as a thumbnail

There are more things that need to be stored with a recipe

This list is not necessarily complete. There might be more things that need to be considered in the future. So, we should define a generic structure that allows for such extensions. I am thinking of having a resources folder xor an instructions folder (#324). The resources' filenames themselves could be set to an integer or use a hash-based approach. One would have to think about the URLs stored in the JSON then, but that's a different story.

For metadata I suggest adding a file meta.json that could be arbitrarily formatted. The content has to be defined separately (carefully trying to minimize the amount of information there).
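As a rough illustration of the hash-based naming, here is a Python sketch. Using an MD5 digest of the file content and keeping the original file extension are both assumptions for this example; `resource_filename` is a hypothetical helper:

```python
import hashlib
from pathlib import Path


def resource_filename(source: Path) -> str:
    """Return '<hex digest of the content><lower-cased suffix>'."""
    # Assumption: the name is derived from the content only, so identical
    # files always map to the same resource name.
    digest = hashlib.md5(source.read_bytes()).hexdigest()
    return digest + source.suffix.lower()
```

A content-based name avoids collisions between uploads that share a filename, at the cost of having to rewrite the URLs stored in the JSON.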

@christianlupus
Collaborator Author

Ahh, and I forgot one thing. Just a quick addition: It might be a good point in time to consider extending to multiple cookbooks per person. Using that abstraction layer in place, one could have multiple folders representing multiple different cookbooks.

Regarding sharing that would allow to share a whole cookbook (ro and rw with add/delete) or per-recipe. Adding the feature to copy/move a recipe to another cookbook would allow to move from the shared one (or the shared with me cookbook for single recipe shares) to an own one for modification/... That would solve the link vs copy issue discussed in #120.

@seyfeb
Collaborator

seyfeb commented Oct 23, 2020

I like the idea of having multiple cookbooks as this would give the user more flexibility. However, this might not solve the problem as you expect.

What if a user wants to have a recipe appear in multiple cookbooks? Then again the question arises: Are all recipes linked (and therefore changes of the recipe in a single cookbook propagate through all books), or do I create a copy of a recipe when adding to a different cookbook?

I think the second case should be easily solvable by copying the recipe folder and is represented in the design you proposed. The first use case is probably more difficult to solve. One possibility could be to create a folder for this recipe containing a single file representing a “recipe node” which contains information about the recipe location:

  • local in the same folder
  • remote (i.e., in a different folder)

I might be constructing a use case that nobody needs. Not sure about that, but we should at least actively decide not to support this ;)

Anyways, I would propose using node files as an alternative to the meta.json file. This is similar, but different, to inodes (we probably wouldn’t store owner and permissions data there). This file could contain all metadata and references to other files, such as the schema.org JSON and extensions. As @christianlupus suggested, the content itself should be stored separately.

Regarding the database question: The consensus seems to be to have both, a database and file storage, so we should probably settle on this. All advantages of having easily accessible data, simple backups, and fast searches seem to be possible.

@christianlupus
Collaborator Author

I am thinking of this in analogy to how files and folders are handled in NC. The relation is obviously Folder<->Cookbook and File<->Recipe. Whether we allow cookbooks in cookbooks is another open question. Maybe for later.
If user A shares a folder with user B, user B can access all files (and folders) within that folder. Changes made by A are propagated to B's files. If B has write permission, any changes done by B will also affect the files A sees. At first sight, the same holds true for sharing files instead of folders. If user B decides to make a (local) copy of the file, he has full rights to it and his changes no longer affect A's data.
Applying the same interpretation to the cookbook app would mean that we are linking the recipes during sharing (in fact, we are using the very same data sets). What would be missing is a feature to

  1. clone a recipe to a new name in the same cookbook
  2. copy/move a recipe from one cookbook to another.

Regarding the meta.json: This was intended for cookbook-app-related extensions to the main JSON that are not covered by the schema.org standard. I am not sure if it makes sense to save the sharing information here, since the notion of a user is lost anyway if the database is lost.

Another point is the following: If A shares a cookbook/recipe with B this sharing is represented just by a link in the DB. I was thinking that way we could optimize storage (each recipe is stored only once), run time (recipes need to be indexed only once not once per user) and code simplicity (just use the folder from another user internally). This however causes the issue that for user B the main files app of NC does not know anything of these shared files. Thus in the local file structure synced by the client the recipes are not going to be visible. It might be possible to register the files to be shared in the main NC app but I would have to look this up to be sure.

The alternative would be to share the corresponding cookbook folder from the files app (so the file management is taken away from the cookbook app similarly to the current state but for multiple cookbooks possible). As we are basing on the files app, all users would have the same set of files (recipes) during normal file sync.
It would then be the responsibility of B (the receiver of the share) to move the share to an appropriate location in his file system (which is just some DB change done by files app) and register a new cookbook within the cookbook app and/or reindex the named cookbook.
As we do not know about the manual steps involved here (moving the shared file around), the cookbook app cannot help much and an intelligent algorithm to decide which recipes should be reindexed when is hard to formulate.

@seyfeb
Collaborator

seyfeb commented Oct 23, 2020

Actually, I’m totally with you. The confusion probably comes from the fact that I was talking about a single user and you were talking about sharing between users. :)

Reusing recipes in multiple cookbooks

What I was trying to illustrate was the case when somebody wants to have the identical recipe in two different cookbooks. Two cookbooks means having two folders - one for each cookbook. Storing the recipe in one of the cookbooks can be handled as discussed (create a recipe folder and store all related data (JSON files, images, vids, etc) in the respective folder). Only how to link (not copy) the recipe to the second cookbook in a way that can be backed up (i.e., does not need entries in the database) would be less clear to me.

Recipe node

The idea of the node file was not to handle sharing between users. I totally agree that this should be done as you said - using the Nextcloud internals of file sharing. The idea was rather to have one file for each recipe that contains/links to all relevant information. For example, it determines the local location, i.e.: Is the recipe data located in the same folder or in a different folder (it would be something like a symlink then)? But the node file could also contain all data as you proposed for the meta.json and reference the schema.org JSON file.

Maybe the meta.json and the node JSON are actually kind of the same. I’m not sure about this right now.

Obviously, this linking part does require some implementation work: what if the original recipe is deleted from a cookbook? -> A popup must inform the user that the recipe is used in cookbooks X, Y, Z and ask whether it should be deleted from all cookbooks or only from a selection. Depending on the response, the original data might then need to be moved to a different location.

In a first shot, the feature does not need to be there. But at least we would have a possible way to build such a feature later.

Reindexing Location of cookbooks

I agree that reindexing cookbooks is difficult when the user is free to move them around his system. What if we require the user to have all his cookbooks located in a single folder? As a first step we could require a Cookbooks/ folder on the top level of the file system. In a future implementation we could allow the user to set a custom path for his cookbooks.

This would allow reindexing without the requirement to iterate over the complete Nextcloud content and look for recipe data.

@christianlupus
Collaborator Author

OK, I see we have two different approaches here in mind.

Inode-like approach

We have a set of cookbooks and a set of recipes. These are unrelated at first. Then associations are added that define which cookbooks contain which recipes. This is similar to the data storage structure in most Linux file systems (see inodes).

The clear benefit of this approach is that the association defines the access rights, thus sharing between users is done easily once we have it running for one user. It is then merely a UI issue.

I doubt that the NC files app supports this file structure itself. So we would need to implement this inside the cookbook app. This has the additional drawback that external programs can no longer work on the synced file structure, as the files are only saved outside that structure. The files app does not recognize these things. [1]

The data itself could be stored in one of several locations:

  1. In the DB, as every NC user has the rights to access it (using an authorized app)
  2. External to the common file structure, similar to how e.g. the group folders are stored separately; all users are treated equally then. Inclusion into the file tree might be possible, but with quite some work from our side
  3. Inside one of the users' folders; the recipe might easily be lost, as the single source of truth might get removed/corrupted if the user plays around in his file system, and file handling is necessary for us [2]

Folder-based approach

The main idea behind this approach is to consider a folder (in the user's file tree) to be a cookbook. Details can be discussed (e.g. should cookbooks be allowed to contain other cookbooks?). There are in fact two subtypes.

The benefit of this is a clear notion of the owner of a recipe.

Sharing the files/folders using official file app's process

Files and folders are shared using the NC internal functionality.
In a first step, the sharing could be done manually, as some users already practice it today. Later, this could be simplified by having the cookbook app configure the receiving side automatically during sharing.

This approach has the drawback that linking a recipe into multiple cookbooks of the same user will not be possible. When allowing cookbooks in cookbooks, this impact could be reduced by nesting the cookbooks in a matching fashion. [3]

As everything is done with the knowledge of the files app, usage of external programs on local files is fulfilled trivially.

Sharing internally in the cookbook app

Here the files are saved in the file structure of the owner. The other apps do not know anything about the shared files as the sharing is mainly done in the cookbook app.

A clear drawback is that the other users do not see the files in their synced local file tree. It might be possible to overcome this by additionally sharing via the files app. However, this is just an educated guess.

As we are not restricted to the possibilities of the files app, more sophisticated sharing might be possible (more tailored towards cookbook's use cases).

Footnotes

[1] The only exception to this rule I see would be to define a virtual dummy user that holds all files, with these files shared across the NC instance. However, I feel this would break many architectural decisions made by the core team. So, let's forget that quickly.

[2] This is especially true as we have no say in changes made to the files through the files app. The user might decide to remove the recipe from his account, thereby rendering it unreachable from all other cookbooks (his and other users').

[3] Something like a cookbook Christmas bakery could be included in the cookbook Baking. If the containing cookbook (Baking) contains the union of all sub-cookbooks, a recipe in Christmas bakery would also appear in Baking. Obviously, this is only possible in a tree-like structure and is thus a certain restriction.

@seyfeb
Collaborator

seyfeb commented Oct 24, 2020

So, I might not have been clear enough on this^^ I really would want to stick with the approach of having all data available in the filesystem (for the requested backup and offline-editing solution) just as you propose. What about this approach:

General setup

All cookbooks are located in /Recipes/ or /Cookbooks/, assuming / is the root of the user's NC data. A single directory for all books allows easier iteration over the cookbook data, e.g., for updating or rebuilding the database. Regarding the folders below the Cookbook directory we have

  • Folders (with no recipe data) == Cookbooks
  • Folders (with recipe data) == Recipe

Sharing of whole cookbooks and recipes between users is handled using the NC file app by sharing the respective folder. The functionality for sharing can be exposed in the Cookbook’s interface with setting up the receiving-user’s directory later. I think at this point our approaches are identical and we should probably settle on such an approach.

Internal sharing between cookbooks (not users)

Now for the internal sharing between cookbooks of the same user. The node/meta.json file of a single recipe could contain information if it has to be synced with another recipe.

For illustration: Assume we have four cookbooks, Baking/Christmas Bakery/, Baking/Favorite Cookies/, Desserts/, and Soul Food/, and want to have the recipe Ginger Cookies in every cookbook. The first three of them should be in sync; the fourth should only be a copy. Editing the recipe in one of the first three books should update the other two as well. Editing the fourth should only change the fourth.

Given this setup, the file Baking/Christmas Bakery/Ginger Cookies/meta.json could contain an array of linked recipes

{
	"synced_recipes" : ["Baking/Favorite Cookies/Ginger Cookies", "Desserts/Ginger Cookies"]
}

while Soul Food/Ginger Cookies/meta.json contains an empty array.

To prevent data loss (if the user accidentally deletes the "single source of truth") we could still have copies of the data (images, etc.) in all of the folders. Also, if a recipe is deleted from one cookbook (i.e., it is not available in the file system anymore) the entry from the synced recipes’ arrays can be deleted. The data of the linked recipes is available in the respective folders anyway.

Updating a recipe via the Cookbook app could automatically update all synced recipes. If a recipe is edited only locally (e.g., offline) without updating the recipes in sync, an update via the cookbook app would have to be done manually (or periodically). Which recipe has been updated could be checked by looking at the timestamp of the node/meta.json.
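The update step described above could be sketched in Python roughly as follows. This assumes the `synced_recipes` entries are paths relative to the cookbooks root and that only recipe.json is copied; `propagate` is a hypothetical helper, not part of the app:

```python
import json
import shutil
from pathlib import Path


def propagate(root: Path, recipe: str) -> list[str]:
    """Copy recipe.json of `recipe` to every folder listed in its meta.json.

    Returns the relative paths that were updated. Listed folders that no
    longer exist in the file system are skipped.
    """
    source = root / recipe
    meta = json.loads((source / "meta.json").read_text(encoding="utf-8"))
    updated = []
    for target in meta.get("synced_recipes", []):
        folder = root / target
        if not folder.is_dir():
            continue  # deleted from that cookbook; entry could be dropped
        shutil.copy2(source / "recipe.json", folder / "recipe.json")
        updated.append(target)
    return updated
```

A real implementation would also have to copy images and other resources and decide which side is newer before overwriting anything.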

@christianlupus
Collaborator Author

OK, so it seems we are settling on a consensus to use the stock files app to save and share the recipes. The details can be discussed, most probably in #120.

Regarding the issue with removing the files by the user, I rethought and maybe you are right: If the user is advanced (or silly) enough to fiddle around with the internal file structure (in destructive ways), he might be on his own. We cannot provide a second, third and fourth backup system in the app.

Assuming you have the cookbooks A, B, and C. The recipe should be located in A (just arbitrarily). Then what about the following structure?: In A you have a meta.json with

{
   "remote_clones": ["B", "C"],
   ...
}

And in B and C both only a meta.json with

{
   "clone": "A"
}

The only implication I see here is that a shared recipe would only be in A in the offline cookbook folder. The others would be just arbitrary JSON files as the linking/cloning is not understood.
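To make the lookup concrete, a small Python sketch of resolving such a placeholder. It assumes the recipe folder has the same name in every cookbook and that `"clone"` names the cookbook holding the real data; `resolve_recipe` is a hypothetical helper:

```python
import json
from pathlib import Path


def resolve_recipe(cookbooks: Path, cookbook: str, recipe: str) -> Path:
    """Return the folder that actually holds recipe.json for a recipe."""
    folder = cookbooks / cookbook / recipe
    if (folder / "recipe.json").is_file():
        return folder  # this cookbook holds the original data
    # Placeholder folder: meta.json is assumed to look like {"clone": "A"}.
    meta = json.loads((folder / "meta.json").read_text(encoding="utf-8"))
    origin = cookbooks / meta["clone"] / recipe
    if not (origin / "recipe.json").is_file():
        raise FileNotFoundError(f"clone target {origin} has no recipe.json")
    return origin
```

External tools that do not know this convention would only see the meta.json in B and C, which is exactly the limitation described above.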

Or we do a periodic sync of all recipe.json files. Here (especially with offline editing) we might get into real trouble if we get a split-brain/conflict scenario. No normal user will want to debug a JSON file manually. Should we then simply overwrite the changes to reflect the one in the original cookbook (A)? What happens if the change was made on B and C? Which one will win?
Therefore I'd rather plead for a single recipe.json file being visible in the synced files.

@seyfeb
Collaborator

seyfeb commented Oct 24, 2020

There are multiple things.

Your example

If I understand you correctly, your example reflects what I was trying to propose earlier - having the files in a single location and having placeholder/links in the other locations.

As you said (and after thinking about this I would second the concern), only having the meta.json that tells you that a recipe is a clone of "A" and not having any data in the folder would collide with the approach of using the NC files app for sharing. If I share a complete cookbook with a second user and the cookbook contains clone references the user who receives the shared cookbook has no access to the recipe data.

Periodic sync

I did also see the problem of a "split-brain/conflict scenario". That is why I suggested to sync recipes based on the timestamp of the meta.json. But I guess one would have to make a complete diff of the recipe folder to find differences in recipe.json, images, videos, etc. If there are changes in only one of the folders, syncing might be doable.

It gets tricky if there are changes in both directories. I only see two solutions to this scenario: (a) Show a dialog with the differences to the user and let him decide which ones to keep; (b) unlink the clones, keep all as separate instances and maybe inform the user about this.
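A minimal Python sketch of how the one-sided and two-sided cases could be told apart. It assumes a digest of the folder content is recorded at the last successful sync (where it would be stored, e.g. in meta.json, is left open); `folder_digest` and `sync_state` are hypothetical helpers:

```python
import hashlib
from pathlib import Path


def folder_digest(folder: Path) -> str:
    """Hash the relative paths and contents of all files below `folder`."""
    h = hashlib.sha256()
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        h.update(path.relative_to(folder).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def sync_state(base: str, a: str, b: str) -> str:
    """Classify two clones against the digest stored at the last sync."""
    if a == b:
        return "in_sync"
    if a == base:
        return "copy_b_to_a"  # only B changed since the last sync
    if b == base:
        return "copy_a_to_b"  # only A changed since the last sync
    return "conflict"         # both changed: ask the user or unlink
```

Comparing digests avoids relying on timestamps alone and works for binary files too, but it only detects that a conflict exists; resolving it would still need option (a) or (b).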

Single recipe.json

Having a single recipe.json, a single source for images, etc., as you said, would be ideal. However, I guess, this might be a decision against having linked clones.

Sidenote

I just tried to create a hard link on the file system level, share the link with a user and edit the shared file. The hard link did not survive ;)

@christianlupus
Collaborator Author

If I understand you correctly, your example reflects what I was trying to propose earlier - having the files in a single location and having placeholder/links in the other locations.

Yes, mainly.
Although, the information might as well go just into the DB solely.

As you said (and after thinking about this I would second the concern), only having the meta.json that tells you that a recipe is a clone of "A" and not having any data in the folder would collide with the approach of using the NC files app for sharing. If I share a complete cookbook with a second user and the cookbook contains clone references the user who receives the shared cookbook has no access to the recipe data.

As I wrote, we must distinguish between sharing between users and linking between cookbooks with one user's data.

For the sharing with other users, the whole original folder (with all images, resources and recipe.json file) must be shared. On the receiver's side this file must be moved to the corresponding location for the cookbook app to find and detect it.

For sharing internally, I see no real chance. A single recipe.json will cause only a single file to be visible on the locally synced files for the user. Otherwise we might need to tweak the files similar to the groups folder app.

It gets tricky if there are changes in both directories. I only see two solutions to this scenario: (a) Show a dialog with the differences to the user and let him decide which ones to keep; (b) unlink the clones, keep all as separate instances and maybe inform the user about this.

That is exactly the split-brain scenario (two independent evolutions of the data).
I fear that doing a diff on these files might be quite involved. For the JSON it might be possible with quite some work, but for binary files we would need to be very careful.

I just tried to create a hard link on the file system level, share the link with a user and edit the shared file. The hard link did not survive ;)

Yes, this is to be expected, as there is an additional level of abstraction. This will cause really strange effects, as the NC core almost certainly does not expect hard links to be present. You might end up overwriting the inode and thus changing files "behind the back of NC", opening a complete can of worms.

Having a single recipe.json, a single source for images, etc., as you said, would be ideal. However, I guess, this might be a decision against having linked clones.

Not necessarily. We could keep the links within the web view (stored in DB or meta.json or both). Additionally, when we allow cookbooks in cookbooks and associate any recipe in a sub-cookbook with the parent cookbook recursively, the JSON is at least buried somewhere in the folder structure of the downloaded folder.

To get this issue finished, I suggest to get back on track. This whole discussion might well be better located in #120 as it is mainly the discussion of the requirements and wishes regarding sharing.

This issue was more related to the file structure within a single recipe. I see the main concerns of this issue as: What files should be saved in which location? What information should be stored where?

@seyfeb
Collaborator

seyfeb commented Oct 28, 2020

You’re right, we got a little off track. Still, this specific use case might have (had) an influence on the structure of the file storage, so I guess that’s fine. Some last comments:

Not necessarily. We could keep the links within the web view (stored in DB or meta.json or both). Additionally, when we allow cookbooks in cookbooks and associate any recipe in a sub-cookbook with the parent cookbook recursively, the JSON is at least buried somewhere in the folder structure of the downloaded folder.

Recursive access of recipes - definitely. The only question which will arise is how to represent this sensibly in the UI. But that’s not for now.

Linkings in the web-view are fine. That’s what I was going for anyway. Let’s just remember: All information that we only keep in the database won’t be available for the requested “easy backup”.

BTT

I think we are closing the circle here. Based on the current file structure, we can simply add a meta.json which should have a structure that allows arbitrary extensions to the recipe.json. Features/data which we want to support in Cookbook but are not supported in the schema.org standard could go there. Additional files such as videos and images may be stored in the recipe’s subfolder.

File system level:

  • recipes are represented by folders which can contain
    • recipe.json (schema.org-compatible data)
    • meta.json (extendable file with data that is not compatible with schema.org)
    • data (images, videos, etc, ...)
  • cookbooks are represented by folders which can contain either
    • additional sub-cookbooks (nesting)
    • recipes
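The folder rules above could be walked like this in Python. The sketch assumes that the presence of recipe.json is what marks a folder as a recipe and that every other folder is a (possibly nested) cookbook; `is_recipe` and `recipes_in` are hypothetical helpers:

```python
from pathlib import Path


def is_recipe(folder: Path) -> bool:
    """Assumption: a folder containing recipe.json is a recipe."""
    return (folder / "recipe.json").is_file()


def recipes_in(cookbook: Path) -> list[Path]:
    """All recipe folders in a cookbook, including nested sub-cookbooks."""
    found = []
    for child in sorted(cookbook.iterdir()):
        if not child.is_dir():
            continue
        if is_recipe(child):
            found.append(child)
        else:
            found.extend(recipes_in(child))  # treat as sub-cookbook
    return found
```

Because the walk is recursive, a recipe in a sub-cookbook is reported for every cookbook above it, which is what use case 5.3 relies on.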

Use cases:

  1. Performant recipe searches (full text)
    • ✔︎ (possible if data is also stored in the database and data is kept in sync)
  2. Simple backup (w/o requiring database dumps)
    • ✔︎
  3. Accessibility with external tools (e.g., for rendering recipes if there is no connection to the server possible)
    • ✔︎
  4. Sharing recipes between users (further discussion in [Feature Request] Share recipe with other nextcloud users #120)
    4.1. as a common recipe, edits can be seen by both
      • ✔︎ (possible, via NC files app)
    4.2. as a copy
      • ✔︎ (possible, via NC files app)
    4.3. to the public/to other NC users
      • ✔︎ (possible, via NC files app)
  5. Sharing recipes between cookbooks of single user
    5.1. as a linked recipe, edits available in both cookbooks
      • ❓ (to be discussed, maybe in [Feature Request] Share recipe with other nextcloud users #120, maybe in a different (new) issue)
    5.2. as a copy
      • ✔︎ (possible by copying the folder)
    5.3. as a linked recipe in nested cookbooks
      • ✔︎ (possible by the nested cookbook folder structure, resolving necessary in the UI)

Number 4 as described above does not consider recipes stored as links (see 5.1). This would require further (future) investigation.

For syncing database/folders, it might be helpful to have a single access point in the file structure. For example, in the NC user root under Recipes/ or a user-defined location.

Please correct me if I missed something or got something wrong.

@christianlupus
Collaborator Author

A new issue #364 came up recently. It should be implementable in a straightforward way with what I have in mind: We could have a subfolder versions that contains all versions. The versions themselves are just JSON file clones. To organize this, I suggest the following example structure:

.
└── Baked Beans
    ├── full.jpg
    ├── meta.json
    ├── recipe.json
    ├── resources
    │   ├── 7a29e5455a048f33a77dfa2e7b4e659b.jpg
    │   ├── 8cc80ec7e26764f216d4830999dc2e5f.jpg
    │   ├── 996c4aa87369f1c8fcc11fcd60950ad3.jpg
    │   └── ab0b90a8e9dc6cfde7c7d5c8b858146f.jpg
    ├── thumb.jpg
    └── versions
        ├── Dad's special
        │   ├── full.jpg
        │   ├── meta.json
        │   ├── recipe.json
        │   └── thumb.jpg
        └── From reference cookbook
            ├── full.jpg
            ├── meta.json
            ├── recipe.json
            └── thumb.jpg

I think this matches with the latest summary from you, @seyfeb. Any comments/enhancements?

Regarding the single access point: I'd for now not restrict that. Let's keep it in mind but see how the implementation works out.

@christianlupus
Collaborator Author

christianlupus commented Nov 3, 2020

OK, after some discussion in #364 I think I might need to reconsider my structure a bit. I will try to give a bit of structure.

Recipe folders

Such a folder contains exactly one recipe. The proposed structure is as follows (similar to above):

.
└── Baked Beans
    ├── full.jpg
    ├── meta.json
    ├── recipe.json
    ├── resources
    │   ├── 7a29e5455a048f33a77dfa2e7b4e659b.jpg
    │   ├── 8cc80ec7e26764f216d4830999dc2e5f.jpg
    │   ├── 996c4aa87369f1c8fcc11fcd60950ad3.jpg
    │   └── ab0b90a8e9dc6cfde7c7d5c8b858146f.jpg
    ├── thumb.jpg
    ├── version.json
    └── versions
        ├── dad-s-special
        │   ├── full.jpg
        │   ├── meta.json
        │   ├── recipe.json
        │   ├── thumb.jpg
        │   └── version.json
        └── original
            ├── full.jpg
            ├── meta.json
            ├── recipe.json
            ├── thumb.jpg
            └── version.json

All files related to a recipe (recipe.json, full.jpg, thumb.jpg as well as resources) are copied to all version subfolders (here dad-s-special and original). That way this folder structure can be synced with the main files app.

New files are the /version.json and /versions/*/version.json. The files /versions/*/version.json should contain the version's human-readable name (e.g. Dad's special for Christmas). Additionally, there should be a history hash (see below) and optionally more data (TBD).

Cookbook folder

A cookbook is represented by a cookbook folder. Such a folder is special in the sense that it contains a hidden folder .cookbook. The .cookbook folder will contain all internal data and is described below.
Apart from that (hidden) folder, there are other folders. These are either recipe folders or other (nested) cookbook folders. Recipes that are part of a nested cookbook are considered part of the parent cookbook as well.

Here is an example without the content of the folders:

.
└── Recipes
    ├── .cookbook
    ├── Recipe A
    └── Recipe B

Cookbook storage folder .cookbook

The .cookbook folder contains a meta.json with a human-readable description and other metadata. The data needs a clear specification.

Additionally, there is a folder history. It contains multiple folders that define the history of the recipes in the cookbook. The name of each of these version folders is a hash of some data (details are TBD). The version.json in the recipe folder above points to the corresponding hash to identify it quickly.
One possible optimization would be to use subfolders for all possible first characters (e.g. [0-9a-f] as a regexp) and thus have fewer folders to parse during reading.

The version can be stored fully or incrementally.

Fully stored version

This is especially needed for the very first version of a recipe. Each version contains the resources (including full.jpg and thumb.jpg) as well as the complete recipe.json and meta.json. If a file is not present, it is considered not part of the version.

The version.json here is different from the version.json in the recipe folder. It allows specifying the parent version (as a hash). Additionally, it needs to specify that the current version is to be considered a full one.

Incrementally stored version

When new versions are stored, it might be better to store only the changes (especially when the JSON changes often while the image stays constant). A corresponding setting in the version.json must be set.
The JSON files (recipe.json and meta.json) might be diffed in a sensible way to generate a JSON diff. Alternatively, we could just replace the JSON with a new one.
Any other files are considered new or changed resource files. Removed resources need to be registered in the version.json.

That way we get a layered structure where one (virtual) layer reuses the files from the layers below as long as a file was not changed (think of stacked file systems).
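The layering idea can be sketched in a few lines of Python. This is only an illustration under assumptions: history entries are modelled as in-memory dicts, and the keys `parent`, `full`, `files` and `removed` are hypothetical names, not something the proposal has fixed. Changed files are assumed to be stored as whole files, not as diffs.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory model of one history entry; all key names are assumptions.
Commit = Dict[str, Any]


def resolve_files(history: Dict[str, Commit], commit_hash: str) -> Dict[str, str]:
    """Return {file name: hash of the layer that provides it} for one version."""
    # Collect the chain from the requested version down to the nearest full one.
    chain = []
    current: Optional[str] = commit_hash
    while current is not None:
        commit = history[current]
        chain.append((current, commit))
        if commit.get("full"):
            break  # a full version inherits nothing from its parent
        current = commit.get("parent")

    visible: Dict[str, str] = {}
    # Apply the layers from oldest to newest so that newer layers win.
    for layer_hash, commit in reversed(chain):
        for name in commit.get("removed", []):
            visible.pop(name, None)
        for name in commit.get("files", []):
            visible[name] = layer_hash
    return visible


history = {
    "HashA": {"parent": None, "full": True,
              "files": ["recipe.json", "meta.json", "full.jpg", "thumb.jpg"]},
    "HashB": {"parent": "HashA", "full": False,
              "files": ["recipe.json"], "removed": ["thumb.jpg"]},
}
print(resolve_files(history, "HashB"))
```

In this example the second version takes recipe.json from its own layer, reuses meta.json and full.jpg from the first one, and no longer contains thumb.jpg.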

Changed recipes

Once a recipe has been changed (for whatever reason, detected by whatever method), a new version in the .cookbook/history folder should be generated. The changes since the last version should be saved according to the above rules, and the version hash of the recipe in the recipe folder should be updated accordingly.

Regarding the hashed data for a version, I suggest something depending on

  • parent hash
  • hashes of all involved files (or all existing files on virtual layer)
  • potentially date/user

.
└── .cookbook
    ├── meta.json
    └── history
        ├── <HashA>
        │   ├── full.jpg
        │   ├── meta.json
        │   ├── recipe.json
        │   ├── thumb.jpg
        │   └── version.json
        └── <HashB>
            ├── full.jpg
            ├── meta.json-diff
            ├── recipe.json-diff
            └── version.json
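As a rough sketch of how such a version hash could be derived from the parent hash, the file hashes and optional date/user data: the use of SHA-256, the separator byte and the function names below are assumptions made for illustration only, since the details are still TBD.

```python
import hashlib
from typing import Dict, Optional


def file_hash(content: bytes) -> str:
    """Hash of a single file's content (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(content).hexdigest()


def version_hash(parent: Optional[str], file_hashes: Dict[str, str],
                 user: str = "", timestamp: str = "") -> str:
    """Combine parent hash, file hashes and optional user/date into one hash."""
    digest = hashlib.sha256()
    digest.update((parent or "").encode("utf-8"))
    # Sort by file name so the result does not depend on the dict ordering.
    for name in sorted(file_hashes):
        digest.update(b"\0" + name.encode("utf-8"))
        digest.update(b"\0" + file_hashes[name].encode("utf-8"))
    digest.update(b"\0" + user.encode("utf-8"))
    digest.update(b"\0" + timestamp.encode("utf-8"))
    return digest.hexdigest()


files = {"recipe.json": file_hash(b'{"name": "Baked Beans"}')}
root = version_hash(None, files, user="alice", timestamp="2020-11-03T12:00:00Z")
child = version_hash(root, files, user="alice", timestamp="2020-11-04T08:00:00Z")
print(root, child)
```

Because the parent hash is part of the input, two versions with identical files but different ancestry still get different hashes.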

Finally, we should create a documentation of the file structure after the process to have a reference manual 🙄.

@seyfeb
Collaborator

seyfeb commented Nov 3, 2020

These are interesting ideas. The proposal contains two new concepts:

  1. An automatic history of any recipe, which is appended with each recipe change.
    • The history of all recipes which are direct members of a cookbook are stored within the hidden .cookbook folder.
  2. Specifically tagged versions of a recipe.
    • These are stored in the versions subfolder of a given recipe and listed in the version.json.

Incremental vs. full storage

I would prefer incremental storage of the different versions to prevent too much duplicate data from cluttering the system, especially when large or many images are used (e.g., if feature requests requiring more than a single image per recipe are implemented) or when video files are present.

However, I tend towards storing changed files as a whole and not as a diff. This might be comparable to something like Docker’s layered file system. Some files like images can’t be stored sensibly as diffs anyway, and textual files are pretty small. This would have the advantage that each file is readable as a whole and potential errors when merging diffs won’t be a problem.

Notes

Some questions which were not immediately clear to me. I think the answers are contained in your post, but it may be helpful to have them stated as clearly as possible. I'll give it a try:

  • Is the complete recipe data repeated in each named (tagged) recipe or do they only reference the data in the .cookbook history?
    • Are the named versions linked to the history versions? Yes, in the version.json.
    • Can I retrieve the info how named recipe versions depend on each other? Yes, by iterating through the version.json files of the recipes, which contain the parent recipe’s hash. Should we also record child hashes?
  • Can one build trees of recipe versions (different branches developing in different directions)? Possibly yes. Checking out an old version and making changes should create a different branch.
  • Is there a special commit message or comment required or possible for
    • any edit
    • tagged recipes?
  • What happens if I move a recipe to a different cookbook? The recipe folders must be moved to the cookbook folder. All folders of the recipe in .cookbook/history/ must be moved to the new cookbooks .cookbook/history/ folder.
  • How are recipes identified? By their name (which might change for different versions)? By an identifier created upon their first import or saving? Is the proposed history related to the recipe versions or is this an independent concept?
  • Is the history folder structured further? No it contains folders for all recipes which are direct members of the cookbook (not nested cookbooks).

Is this correct?

Documentation

Finally, we should create a documentation of the file structure after the process to have a reference manual 🙄.

I think you have already created a good starting point, although you got me confused for a second with the Baked Beans cookbook :D

@christianlupus
Collaborator Author

One night later, I think it makes sense to rename the files /.cookbook/history/<hash>/version.json to commit.json, just to distinguish them from the version.json in the recipe folder.

About incremental vs. full storage, you are perfectly right. I would suggest going with incremental as much as possible. Just the very first commit must be full. Especially the binary data I would store completely every time anyway.

  • Is the complete recipe data repeated in each named (tagged) recipe or do they only reference the data in the .cookbook history?

They are repeated (sort of, see below [1]). This allows read/usage in 3rd party apps on the synced files.

  • Are the named versions linked to the history versions? Yes, in the version.json.

Correct

  • Can I retrieve the info how named recipe versions depend on each other? Yes, by iterating through the version.json files of the recipes, which contain the parent recipe’s hash. Should we also record child hashes?

I am not yet sure about a doubly linked list (aka child commits). This might speed things up when iterating over the whole tree but requires careful bookkeeping, especially if a branch is cut off (deleted). We would have to keep everything in sync.
As this is something only related to the version.json/commit.json, we can add it later if the need arises.

  • Can one build trees of recipe versions (different branches developing in different directions)? Possibly yes. Checking out an old version and making changes should create a different branch.

Exactly that was the idea behind the structure

  • Is there a special commit message or comment required or possible for
    • any edit
    • tagged recipes?

Yes, in the version.json/commit.json both commit message and comment can be written for any edit/commit.

I suggest to make at least a message required for human readability if commits are made manually [2].
Comments can be attached later.

For the branches I'd say yes as well. In the /versions/<versionid>/version.json file we can add a readable name and a description for that branch/variant of the recipe.
I do not call it a tag [3].

  • What happens if I move a recipe to a different cookbook? The recipe folders must be moved to the cookbook folder. All folders of the recipe in .cookbook/history/ must be moved to the new cookbooks .cookbook/history/ folder.

Yes that is right unless the recipe should start a new history from scratch.
The same holds true btw when sharing single recipes.

Anyways, we will need a regular garbage collection to remove old entries in the history similar to the current database reindex approach.
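A garbage collection of this kind could work like a simple mark phase: start at the latest commit of every branch/variant, follow the parent links, and treat everything not visited as removable. The following is a minimal sketch under the assumption that the parent links have already been read into a dict; the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def reachable_commits(parents: Dict[str, Optional[str]],
                      heads: Iterable[str]) -> Set[str]:
    """Collect every commit reachable from the given heads via parent links."""
    keep: Set[str] = set()
    for head in heads:
        current: Optional[str] = head
        # Stop at the root or as soon as we hit an already visited commit.
        while current is not None and current not in keep:
            keep.add(current)
            current = parents.get(current)
    return keep


def unreferenced_commits(parents: Dict[str, Optional[str]],
                         heads: Iterable[str]) -> Set[str]:
    """History entries that no branch/variant refers to anymore."""
    return set(parents) - reachable_commits(parents, heads)


# commit hash -> parent hash (None marks the first commit of a recipe)
parents = {"a": None, "b": "a", "c": "b", "x": "a"}
print(unreferenced_commits(parents, heads=["c"]))
```

Here the entry x was once branched off from a, but no variant points to it anymore, so it is the only candidate for removal.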

  • How are recipes identified? By their name (which might change for different versions)? By an identifier created upon their first import or saving? Is the proposed history related to the recipe versions or is this an independent concept?

Here we need to differentiate between the recipe folder and a recipe itself.

The recipe folder might be identified by a simplified name of the latest recipe version (replace all special chars with dashes or so to avoid issues with the file name). The recipe is per definition inside the recipe folder. This makes manual identification of the recipes easy for those syncing and running 3rd party code.
I'd say the name and thus identification can change over time (if no conflicts happen at the time of the change). As long as the history entries do not have a clear back-reference to their recipes (aka git branches), I see no problem there.
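Deriving such a simplified folder name could look like the sketch below. The exact rules (lowercasing, which characters count as special, the fallback name) are assumptions for illustration; in particular, this naive version also replaces non-ASCII letters such as umlauts with dashes, which a real implementation might want to handle differently.

```python
import re


def folder_name(recipe_name: str) -> str:
    """Turn a human-readable name into a file-system-friendly folder name."""
    lowered = recipe_name.strip().lower()
    # Replace every run of characters other than a-z and 0-9 with a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")
    # Hypothetical fallback if nothing usable is left.
    return slug or "recipe"


print(folder_name("Dad's special"))  # dad-s-special
```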

The history is referenced by the hashes. Thus these serve well for identification.

[3]: I'd say (purely linguistically): A version or commit is a change of a recipe that has evolved over time.
On the other hand, a branch/variant/head identifies a special instance of the recipe, allowing a certain state to be saved.
I tend to mix these up when writing quickly, so I try to write commit and branch to make explicit what I am writing about.
I do not call it a tag as a tag does not change over time. It is set once and kept forever. Maybe a user will pose a feature request for that later, but this should not be that hard to realize.

  • Is the history folder structured further? No it contains folders for all recipes which are direct members of the cookbook (not nested cookbooks).

Yes, any recipes in nested cookbooks have their history in the nested one. So we need to identify first the closest cookbook and then look up the history files there.
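Identifying the closest cookbook could be done by walking up the folder hierarchy until a folder with a .cookbook subfolder is found. This is a minimal sketch operating on the local file system with pathlib; the function name is hypothetical, and inside the app the Nextcloud file API would be used instead.

```python
from pathlib import Path
from typing import Optional


def closest_cookbook(recipe_folder: Path) -> Optional[Path]:
    """Return the nearest ancestor folder that contains a .cookbook folder."""
    # Path.parents lists the ancestors from the direct parent up to the root.
    for candidate in recipe_folder.parents:
        if (candidate / ".cookbook").is_dir():
            return candidate
    return None
```

For a recipe in a nested cookbook this returns the nested cookbook folder, so the history would be looked up in its .cookbook/history folder.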

The only deeper folder structure I suggest is the one I already mentioned:
history/
├── 0
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 1
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 2
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 3
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 4
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 5
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 6
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 7
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 8
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── 9
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── a
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── b
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── c
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── d
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
├── e
│   ├── 0
│   ├── 1
│   ├── 2
│   ├── 3
│   ├── 4
│   ├── 5
│   ├── 6
│   ├── 7
│   ├── 8
│   ├── 9
│   ├── a
│   ├── b
│   ├── c
│   ├── d
│   ├── e
│   └── f
└── f
    ├── 0
    ├── 1
    ├── 2
    ├── 3
    ├── 4
    ├── 5
    ├── 6
    ├── 7
    ├── 8
    ├── 9
    ├── a
    ├── b
    ├── c
    ├── d
    ├── e
    └── f
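Mapping a commit hash to its place in this two-level structure could be sketched as follows. It is an assumption here that the innermost folder carries the full hash as its name (instead of only the remaining characters) and that hashes are lowercase hex strings; the function name is hypothetical.

```python
from pathlib import PurePosixPath

HEX_DIGITS = "0123456789abcdef"


def history_path(commit_hash: str) -> PurePosixPath:
    """Path of a history entry, sharded by the first two hash characters."""
    if len(commit_hash) < 2 or any(c not in HEX_DIGITS for c in commit_hash):
        raise ValueError("expected a lowercase hex hash with at least 2 characters")
    return PurePosixPath(".cookbook", "history",
                         commit_hash[0], commit_hash[1], commit_hash)


print(history_path("7a29e5455a048f33a77dfa2e7b4e659b"))
```

With 16 x 16 = 256 buckets, each folder only has to hold a small fraction of all history entries.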

Some more notes:

[1] The history should be considered fixed, once the commit has been made. Thus, by diffing the latest history with the work copy under the recipe folder, we can detect if there was a change made (just the detection, the next steps to be taken are not yet defined). Here it comes into play that we need to decide if any change should automatically generate a commit or only if the user clicks on a button or

[2] If we do auto-commits the commit messages might need to be something automatically generated. If the user uses the internal frontend, we can ask for a commit message during saving but for the 3rd party changes I see no chance to do so at all.

I think you have already created a good starting point, although you got me confused for a second with the Baked Beans cookbook :D

Oops. I updated it.

@seyfeb
Collaborator

seyfeb commented Nov 4, 2020

So for our current requirements, I see most problems solved with the proposed structure. Problems may arise in the future when trying to implement recipe sharing (#120 and discussion above).

One night later I think it makes sense to name the files in /.cookbook/history/<hash>/version.json maybe better commit.json. Just to distinguish from the version.json in the recipe folder.

That’s a good idea. I also like your term “variant” for the different branches. Maybe we should name the folders like that - "variants" instead of "versions".

[1] The history should be considered fixed, once the commit has been made. Thus, by diffing the latest history with the work copy under the recipe folder, we can detect if there was a change made [...]

What I don’t get is, what determines the "main recipe" in the recipe folder? Is this not just one of multiple variants? I guess you propose this structure to comply with the old standard and to not break existing Cookbook-viewer clients?

The latest commit is based on one of the variants. When you say "the work copy under the recipe folder", are you talking about the corresponding variant?

Just to make sure: The recipe in the versions/variants folder is the status at the tip of a branch? What happens if I switch to an older commit of that branch and want to make that the one referenced in the folder? What if I add a new change/commit from there? What about the abandoned commits? Probably at some point this requires some UI to manage the tree.

Open questions

As far as I see, the open questions to be answered for the file structure are

  • Which data determines the recipe commit hash?
  • What is the content of the commit.json?
  • What is the content of the version.json/variant.json?
  • What is the content of the .cookbook/meta.json?
  • What is the structure of the recipe’s meta.json?

Questions that don’t need to be answered in this thread

  • When are commits created?
  • If there are automatically created recipe commits: Is a commit message required for automatic commits?
  • How does sharing recipes work? Especially considering the versioning? (I.e., is the history shared, too?) (see also [Feature Request] Share recipe with other nextcloud users #120 and discussion above).
  • Should a double-linked list be used for easy traversal of recipe versions?

@christianlupus
Collaborator Author

[1] The history should be considered fixed, once the commit has been made. Thus, by diffing the latest history with the work copy under the recipe folder, we can detect if there was a change made [...]

What I don’t get is, what determines the "main recipe" in the recipe folder? Is this not just one of multiple variants? I guess you propose this structure to comply with the old standard and to not break existing Cookbook-viewer clients?

I think there is a small misunderstanding here.

The history can be seen mostly similar to a git history with the tree-like structure (no merges). This is only the history folder in .cookbook.

Unlike the classical git approach, where there is only one working copy that is switched between branches, I am voting for having all branches checked out at the same time. The intention is to allow 3rd party apps to access all branches, not only one of them. How else should changes be realized? Checking out another branch might cause trouble when syncing etc.

Therefore my intention was for a recipe folder (see this comment's structure for a recipe folder) to guarantee a main variant workspace (think of git's master branch, here Baked Beans/recipe.json). All other branches' workspaces are in the variants (good point to rename as well 👍) folders.
For 3rd party apps these are just three independent, nested folders that contain three complete recipes. Only the cookbook app has the notion of different variants.

What I meant by my statement with the diffing: For each variant (be it named or main) we know the latest commit state from the history. By comparing these history states with the current working copies, we can detect if a change to the workspace files has been made. Further steps might or might not be needed.
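The comparison itself could be as simple as hashing the working copy and comparing the result with the file hashes recorded for the latest commit. This sketch makes several assumptions: SHA-256 as the hash, the variant subfolders (versions or variants) being skipped when scanning the main workspace, and hypothetical function names.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Set


def snapshot(folder: Path,
             skip: Iterable[str] = ("versions", "variants")) -> Dict[str, str]:
    """Map each file (relative path) in a workspace to the hash of its content."""
    skipped = set(skip)
    result: Dict[str, str] = {}
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(folder)
        if relative.parts[0] in skipped:
            continue  # other variants are separate workspaces
        result[relative.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def changed_files(committed: Dict[str, str], working: Dict[str, str]) -> Set[str]:
    """Files that were added, removed or modified since the committed state."""
    names = set(committed) | set(working)
    return {name for name in names if committed.get(name) != working.get(name)}
```

An empty result means the workspace still matches the latest commit; anything else only signals that a change happened, and what to do next remains open.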

We could move the main branch to the variants folder, but that would break current paths for 3rd party apps.

Just to make sure: The recipe in the versions/variants folder is the status at the tip of a branch? What happens if I switch to an older commit of that branch and want to make that the one referenced in the folder? What if I add a new change/commit from there? What about the abandoned commits? Probably at some point this requires some UI to manage the tree.

I think this is not 100% fixed yet. Of course, some UI will be needed. For the NC app the following is valid: I was a bit inspired by Onshape (a CAD program I use from time to time for 3D printing). There you can open a side panel where a branch structure of the commits is depicted. I'd suggest adding a link that allows a user to view each individual version and to create a new branch from it. For all branches there needs to be a way to edit each one. But this is only the UI side of the whole thing.

So from a backend perspective, I suggest to allow commits only to branches (no dangling ones). I assume this is what you mean by abandoned commits?

Open questions

  • What is the content of the commit.json?

Here are a few suggestions (might be changed again):

  • parent commit hash
  • NC user id (committer)
  • timestamp
  • description
  • type of commit (full vs diffed)
  • list of removed files (only in diffed commit)
  • manual commit or autocommit
  • optional array of comments on version (can be defined later)
  • optional array of reviews (?, defined later)

  • What is the content of the version.json/variant.json?
    • latest commit hash
    • human-readable variant name
    • optional description

  • What is the content of the .cookbook/meta.json?
    • version of the cookbook (to allow distinguishing later changes)
    • human-readable description/name

  • What is the structure of the recipe’s meta.json?

Keep it empty for now, ready for any extensions where we might need additional data.

  • Which data determines the recipe commit hash?

Most probably the required fields in the corresponding commit.json. We might want to think about the date, but without it we could run into issues with duplicate commit hashes.
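To make the suggested field lists a bit more tangible, they could be modelled as small data classes that serialize to JSON. All attribute names, types and defaults below are assumptions for illustration; the suggestions above only name the kind of data, not the keys.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Commit:
    """Possible content of a commit.json (hypothetical key names)."""
    parent: Optional[str]   # parent commit hash, None for the first commit
    user: str               # NC user id of the committer
    timestamp: str          # e.g. an ISO 8601 string
    description: str
    full: bool              # True: full commit, False: diffed commit
    auto: bool = False      # autocommit (True) or manual commit (False)
    removed_files: List[str] = field(default_factory=list)  # diffed commits only
    comments: List[str] = field(default_factory=list)       # optional


@dataclass
class Variant:
    """Possible content of a version.json/variant.json (hypothetical key names)."""
    head: str               # latest commit hash
    name: str               # human-readable variant name
    description: str = ""   # optional


def to_json(obj) -> str:
    """Serialize one of the data classes above with stable key order."""
    return json.dumps(asdict(obj), indent=2, sort_keys=True)


first = Commit(parent=None, user="alice", timestamp="2020-11-07T10:00:00Z",
               description="Initial import", full=True)
print(to_json(first))
print(to_json(Variant(head="abc123", name="Dad's special")))
```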

Questions that don’t need to be answered in this thread

  • Should a double-linked list be used for easy traversal of recipe versions?

This is the one question that might really be answered during implementation. But it is purely an optimization issue for the backend.

@christianlupus christianlupus moved this from To do to In progress in Codebase refactory Nov 7, 2020
@seyfeb seyfeb changed the title Meta: Future file sturcture Meta: Future file structure Jan 23, 2021
@christianlupus christianlupus moved this from In progress to To do in Codebase refactory Jan 18, 2022
@christianlupus christianlupus moved this from To do to In progress in Codebase refactory Jan 18, 2022
@christianlupus christianlupus moved this from In progress to To do in Codebase refactory Jan 18, 2022
@github-actions

This PR/issue depends on:

@christianlupus christianlupus added the documentation Missing, unclear, or outdated documentation label Aug 8, 2022