Make `.storage/` more VCS-friendly #370

OnFreund · 2020-04-22T14:47:18Z

Context

(I realize this is a contentious topic, but hopefully we can have a productive discussion based on the merits of this specific proposal.)

The recent post about the future of YAML sparked a lot of heated debates. There is one topic in particular, though, that I believe can be resolved in a way that alleviates the concerns raised by the community, while keeping in line with forward progress and UI-based config for devices/services.

Many people use git (or any other VCS) for two main purposes: keeping track of history and sharing configuration (some people also use it for backup, but I believe there are better ways to do that, so I'll leave that out). Currently the .storage/ folder is not very VCS-friendly, as it mixes configuration and state. I want to propose a combination of guidelines and product changes that will resolve this.

When it comes to tracking history, looking at my own instance, I see several files that change frequently without configuration changes:

auth - this file mixes configuration (users, groups) with state (refresh_tokens)
core.config_entries - this file is definitely configuration (even the name says so :) ), but several integrations (e.g. Spotify) write semi-transient tokens to it.
core.entity_registry - this file is also mostly static, but I've witnessed two types of changes: icons changing, and capabilities switching from null to {}. The latter (hopefully) might be a one-time transition, but the former is a serious concern. It also mixes a UI concern into backend configuration.
core.restore_state - looks like it's purely cache and can be excluded from VCS
mobile_app - looks like it's purely cache and can be excluded from VCS

Additionally, the secrets stored in the first two files prevent them from being shared.

Proposal

I propose a combination of several ideas to address this:

1. Create a dedicated documentation page with guidelines for VCS tracking.
2. Within that page, recommend which files to exclude (e.g. core.restore_state, mobile_app).
3. Split the auth file by extracting the refresh_tokens section into a new file (and recommend excluding the new file from VCS).
4. Remove icons from the entity registry. Static icons can be configured via customizations, and dynamic icons (based on state) should be handled by the UI.
5. Create a permanent storage mechanism for integrations outside of config entries. This would allow the Spotify integration, for example, to keep updating its token without updating the config entry.
6. Create a !secret-like mechanism for config entries, so that a config entry can be shared without exposing secrets. The auth file can also use that mechanism for credentials. This is required for sharing, not for history tracking, and it's possible that whatever is left after extracting secrets is not useful by itself, so this needs careful consideration.
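Suggestion 3 can be sketched in a few lines of Python. This is a hypothetical illustration, not Home Assistant code: it assumes a simplified store layout (`{"version": ..., "key": ..., "data": {...}}` with `refresh_tokens` as one key under `data`), and the `.state` suffix for the split-out file as well as the names `split_store` and `to_vcs_json` are made up for this example.

```python
import json
from typing import Any, Dict, Tuple

# Keys treated as runtime state rather than configuration
# (assumption: only refresh_tokens, as proposed in suggestion 3).
STATE_KEYS = frozenset({"refresh_tokens"})


def split_store(store: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a (hypothetical) store dict into a config part and a state part.

    The input is not modified; both results are new dicts.
    """
    data = store.get("data", {})
    config_data = {k: v for k, v in data.items() if k not in STATE_KEYS}
    state_data = {k: v for k, v in data.items() if k in STATE_KEYS}
    config = {**store, "data": config_data}
    # Hypothetical naming: the state file gets a ".state" suffix.
    state = {**store, "key": f"{store['key']}.state", "data": state_data}
    return config, state


def to_vcs_json(obj: Any) -> str:
    """Serialize with sorted keys and fixed indentation so diffs stay stable."""
    return json.dumps(obj, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    auth = {
        "version": 1,
        "key": "auth",
        "data": {
            "users": [{"id": "u1", "name": "Owner"}],
            "groups": [],
            "refresh_tokens": [{"id": "t1", "user_id": "u1"}],
        },
    }
    config, state = split_store(auth)
    print(to_vcs_json(config))  # users and groups only: candidate for tracking
    print(to_vcs_json(state))   # refresh tokens: candidate for exclusion
```

Stable serialization matters as much as the split itself: a writer that reorders keys would produce noisy diffs even when nothing meaningful changed.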

All of these steps, I believe, will work great together, but some of them can also be implemented independently to make the situation more VCS-friendly.

Consequences

The main consequence is that managing the .storage folder in a VCS becomes much friendlier. The downside is potentially added complexity with suggestions 3-6, but I believe it would be minimal. It's also going to take some time until integrations adapt to suggestions 4-6.


iantrich · 2020-04-22T15:22:10Z

Nice write up and finding what I think is an agreeable compromise. I don't have a horse in this race, but the merits of the proposal seem sound to me.

balloob · 2020-04-22T17:50:43Z

Yeah, this sounds good. We can split the folders into .config and .storage. This will be a pain for rollbacks, so preferably we merge all changes within one release cycle.

auth: Refresh tokens are not config, but maybe long lived ones are?
mobile_app: apps register the attributes and capabilities of a sensor in advance, after that only send over state. So this is config except for state, which can be dropped.
entity_registry: we should not remove icons, it's part of the entity config. We use these to restore an entity if it hasn't been registered. Name + icon are used by users to identify entities. By default, icons are static and are made dynamic on the frontend based on device class + state.

Jc2k · 2020-04-22T18:27:32Z

The architecture nerd in me is all for a separation like this, but maybe we should think about the use cases some more?

This wouldn't be a backup, as you would likely need encryption keys or tokens relegated to .storage. So what is the purpose of the Git repository?

Rolling back changes made in the UI? It's all hypothetical, but I can see lots of murky cases, like what if the change causes a corresponding change in .storage, such as a new token being issued? Or I pair something, roll back my .config with git, then re-pair. But I only reverted .config, and the integration left non-config state in .storage that conflicts with the new pairing attempt.
Sharing? I don't think our JSON storage files will ever be shareable, will they? Would we pull out our security tokens etc. to make them shareable? If we did that, would they live in .storage in that case, or somewhere else? For me the more important question is: would we really encourage users to hand-merge example files on GitHub into their own, like they do with configuration YAML?

If we can discount these two for now, then I'd like to know how @OnFreund is using Git and how the existing .storage makes that hard.

Coming at this from the other angle, are there features we could add to HA based on this use of VCS that would help all users? Like an undo buffer? Or event log entries for changes?

Jc2k · 2020-04-22T18:33:32Z

(BTW for the sharing use case, maybe something like I suggested here and here would work better?)

OnFreund · 2020-04-22T19:17:38Z

@balloob

long lived tokens - for history tracking, having them as config is good enough, but for sharing it isn't. Depends on how far we want to go with this. I'm not convinced that there's enough to "share" in auth and core.entity_registry once you take out all the secrets, but I'm probably not the best person to advocate on behalf of the people who use VCS for sharing purposes.
mobile_app - it's not just state, it's also icon and attributes. Maybe there are others as well, but those are the ones I found.
I'm not sure I understand the reasoning for why the icon is not a UI concern, but as long as it's static it's ok. As for dynamic icons, I agree that the desired behavior should be calculated in the frontend based on device class + state, but right now you'll see dynamic icons in core.entity_registry. The main culprit once again seems to be the mobile app - for example the activity sensor can change its icon from mdi:walk to mdi:run, the battery level sensor can change its icon from mdi:battery-20 to mdi:battery-charging-30, etc...
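Deriving a dynamic icon from device class and state at render time, instead of persisting it in core.entity_registry, can be illustrated with a small sketch. It is written in Python for consistency, although in practice this logic would presumably live in the frontend; the function name, the rounding to steps of 10 and the exact thresholds are assumptions, with icon names following the `mdi:battery-20` / `mdi:battery-charging-30` pattern mentioned above.

```python
from typing import Optional


def battery_icon(level: Optional[float], charging: bool = False) -> str:
    """Hypothetical presentation-layer helper: compute a battery icon from state.

    Because the icon is derived from the current state each time it is
    rendered, nothing needs to be written back to the entity registry.
    """
    if level is None:
        return "mdi:battery-unknown"
    level = max(0.0, min(100.0, float(level)))  # clamp to 0-100
    step = int((level + 5) // 10 * 10)  # round to nearest 10 (halves round up)
    if charging:
        if step == 0:
            return "mdi:battery-charging-outline"
        return f"mdi:battery-charging-{step}"
    if step == 0:
        return "mdi:battery-outline"
    if step == 100:
        return "mdi:battery"
    return f"mdi:battery-{step}"
```

With this split, only static inputs (the entity and its device class) need to be stored; the icon becomes a pure function of the current state.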

BTW another advantage I just thought of for the !secret like mechanism is that it could simplify replacements, i.e. when having to remove a config entry and add it again.
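A !secret-like reference for config entries could work roughly as follows. This is a hypothetical sketch: config entries are stored as plain JSON, which has no YAML tags, so the `"!secret <name>"` string convention, the `SECRET_PREFIX` constant and `resolve_secrets` are all assumptions made for illustration. The entry keeps only the secret's name, while the value lives in a separate mapping that stays out of VCS, so a removed and re-added entry can simply point at the same name again.

```python
from typing import Any, Mapping

# Hypothetical marker: JSON has no tags, so a string prefix stands in for !secret.
SECRET_PREFIX = "!secret "


def resolve_secrets(value: Any, secrets: Mapping[str, Any]) -> Any:
    """Return a copy of `value` with every "!secret name" string replaced.

    Dicts and lists are walked recursively; the input is not modified.
    Raises KeyError if a referenced secret is missing.
    """
    if isinstance(value, str) and value.startswith(SECRET_PREFIX):
        name = value[len(SECRET_PREFIX):].strip()
        if name not in secrets:
            raise KeyError(f"secret {name!r} not found")
        return secrets[name]
    if isinstance(value, dict):
        return {k: resolve_secrets(v, secrets) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_secrets(v, secrets) for v in value]
    return value
```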

@Jc2k I personally use history tracking mainly to go back and understand my reasoning as I'm making changes. I've had several cases where I almost made the same stupid change to an automation condition, for example, only to look through history and realize my mistake. Comments could help (AFAIK they can't be used in automations, as the UI editor would overwrite them), but they represent a snapshot in time, so they're no replacement for a full history. Another use case is when I need to replace a device - it might take some time between removing the config entry and adding a new one, and having the history with all of my settings (config entries are not just secrets) makes the transition smoother. Other users who track the history have other use cases. I'm sure each and every one of these cases has a solution that can be developed in HA, but:

It's a lot of work to identify all of these use cases
It's a lot of work to implement the solutions
VCS are already good at tracking history and cover all of these use cases
Code not written does not have bugs :)

As for sharing, I scrub my files before sharing anyway, so I'm not the right person to ask, but I can see the benefits just by browsing through the forums and seeing both a lot of configuration shared, and a lot of people requesting sharing ability for .storage. (I would also love to feel safer when scrubbing).

As for your Kubernetes inspired suggestion, I really think it's awesome, and I would love to have it. It still, at least for me, won't be a replacement for version history, and I would like to store that file you're suggesting in a VCS :).

Jc2k · 2020-04-22T20:55:25Z

Thanks for the reply @OnFreund, it was really helpful 👍

I agree: if I had manifests, I would definitely put them in Git! It's good to put them in Git. I would probably put them in my self-hosted GitLab CD and deploy on commit (after CI...).

I should say I'm not advocating against version tracking; as a developer I absolutely rely on it constantly. But I can't help but think tracking .storage is maybe the wrong way around: tracking an output rather than an input.

I think the proposed solution actually means you can't use the full power of Git to manage your config. You can't really use it to roll back config changes, because those changes might involve state data that we have banished to a different folder. You can't use merging, because you are only supposed to write via HA's APIs. So it's like having a partial backup with a really good mechanism for looking at the diffs and a way to record the rationale (in git commit messages). I don't mean to seem too negative here; I can still see why this would be a really powerful and useful tool, but can you see why I feel like we might be able to do better too?

I think the manifests approach works really well for your VCS use cases, from what you've said. I really need to find some time to make a prototype! You'd be tracking the source for your changes, not an output of your changes. You'd be able to roll back changes by reverting to an old manifest and applying it over the top. You'd be able to add comments to the changes you were making (it's YAML again). There's no state, so you can share them. If you wanted to share them and they had secrets in them, you could use a template system without adding complexity to HA. If you were keeping them in your own private self-hosted GitLab (guilty), then you could just commit them as is. Heck, you could maintain a public repo of manifests, accept pull requests against it, merge them and apply them against your own install, with no trying to merge JSON or recreate the change via the UI.

On the sharing question: if we remove secrets and state from .storage and rename it .config, the point I'm really trying to make is, will it be a solution for sharing? Certainly it will be "better", but is it a solution? I'm not arguing that we shouldn't support sharing; I'm arguing that sharing .storage isn't a good way of supporting it.

RE:

Code not written does not have bugs :)

Took the words out of my mouth, I live by this rule! But I felt I needed to say I think this cleanup of .storage will be quite a few code changes too 😉 😛

balloob · 2020-04-22T21:19:39Z

Wow wait a second, we should never share .storage. It's the part of the config that is secret and your data.

If you want to separate state and config a bit more, I can understand that. But to make it shareable, no. It should not be shared.

Jc2k · 2020-04-22T21:32:15Z

@balloob I totally 100% agree. But apparently people do edit and share parts of them! I managed to find a couple in GitHub. Nothing especially bad - lovelace config stood out.

One of the use cases / motivations for the original request is to make it safer to share (see item 6 in the original proposal). I don't think that's the right direction to go in either.

But what do you think? If there were a .config with all keys etc. stripped from it, and users could share it, would that be sensible? I don't think it would be. And that's why I'm babbling here :-)

nickrout · 2020-04-22T21:33:55Z

From reading the never ending "future of yaml" thread on the forum, there is a lot of confusion between backup (via VCS) and sharing. Or should I say that people talk about git in both contexts and the conversation gets confused.

OnFreund · 2020-04-23T05:29:54Z

As I mentioned, I too am skeptical that it would be useful to share .storage, but a lot of people do want that and I think their voices should be heard.
I'm putting sharing aside, and the rest of this comment is solely about history tracking:

@Jc2k I completely agree RE tracking inputs rather than outputs, and I really do think your manifest approach could be awesome, and can't wait for it. We're completely aligned on the "perfect world" solution, but it's still far off, and in the meantime there is a series of steps we can take, many of which have other benefits, and I think the external impact is mostly minimal:

Suggestions 1 & 2 are documentation changes and would be relevant even in the perfect world you're describing.
Suggestion 3 is relatively simple, and is a one time change with trivial maintenance cost.
Suggestion 5 is also relatively simple, and while it does have a maintenance cost associated with it, I think it's minimal. I also think it could help and/or simplify some integrations even without the VCS motivation.
Suggestion 4 (handling of dynamic icons in the UI, rather than the entity registry), is, I believe, the most challenging from a maintenance perspective, but I believe it's the right approach regardless of VCS. Backend should be about state, and frontend about the presentation.
The suggestion @balloob added about removing the state from the mobile_app file sounds like it also is not going to have any maintenance cost, and will only simplify code.

So overall, a very limited impact on the codebase, but a huge impact on the ability to track history. I'm obviously biased, but I think it's worth it :)

Jc2k · 2020-04-23T16:50:45Z

After shelving the sharing element of your request (thank you), I don't disagree that this change would be architecturally nice, and I would support it simply because a good separation is cleaner. Should we update the ticket accordingly (remove the stuff about sharing)?

I don't think we could advertise in a release "and now artifacts managed by Home Assistant are git friendly", it implies too much about hand editing them doesn't it? Can it be re-framed as a code clean up task only?

Please don't underestimate the developer cost here. The change might be worth doing but it involves data migrations, they are not free. There are integrations that you don't use that need to be changed too if uniformly applying the rules. It might have a light cost on a fresh install, and it might have a light maintenance cost after it is fully complete. The migrations need writing, and then the code needs to support both until enough people have upgraded. Developers need to be mindful of both in the meantime. And when the old code paths are removed, inevitably some people's stuff will stop working (the "long tail").

I have started work on the perfect world by the way. I am fed up with HA tickets about YAML, and maybe this will stop some of them. It's changing hourly at the moment. It's quite cute running the manifest on the command line and seeing the UI update live, HA really is cool. Please do ticket some of the items of config you find VCS most useful for and I will prioritise those tasks. If you genuinely want a tool like this, do encourage me because it's not something I need for myself.

balloob · 2020-04-23T17:50:21Z

a lot of people do want that and I think their voices should be heard

A lot of people also wanted us to keep supporting Python 2. It's an opinion and does not have to be honored. In the end, we need to be able to maintain Home Assistant and that's impossible if we listen to everyone.

So yes, let's drop the sharing aspect; if that is what drives this ticket, I suggest we go ahead and close it.

If this is about splitting config and runtime info, there's some merit to achieve that, but it will also be an enormous operation that in the end will benefit users how exactly compared to spending all that time on improving Home Assistant for everyone?

OnFreund · 2020-04-23T17:55:07Z

@Jc2k

I don't think we could advertise in a release "and now artifacts managed by Home Assistant are git friendly", it implies too much about hand editing them doesn't it?

I agree, that's why the title says "more friendly". It's a spectrum and there is no end state after which we say it is friendly, and it'll never be as friendly as the old YAML files, and that's ok. I don't think the title needs any changing.

Please don't underestimate the developer cost here. The change might be worth doing but it involves data migrations, they are not free.

I'm definitely not underestimating them, but eventually in software, maintenance cost dominates upfront cost, at least in most projects. I understand that there's effort involved in implementing these suggestions, but I do believe that their maintenance costs are low.

There are integrations that you don't use that need to be changed too if uniformly applying the rules

I believe that my suggestions are mostly backwards compatible. The only integration that needs changes right away is the mobile app, and it's because it's the culprit of many of these issues, not because of backwards compatibility concerns. Unless I'm missing anything here, I don't think the scenario you're describing for migration is needed.

I have started work on the perfect world by the way ... It's quite cute running the manifest on the command line and seeing the UI update live, HA really is cool.

I'm really excited by this, and yes, 100% agree that HA really is cool :)

Please do ticket some of the items of config you find VCS most useful for and I will prioritise those tasks

Personally, I like a history of every configuration that I change. In terms of re-applying changes, I think the most common use case is having to remove an integration and re-add it (for various reasons - malfunctioning hardware, etc...). This entails renaming all of the created entities as well (both entity id and friendly name), renaming devices and putting them in areas, etc... This is also true for hub-like integrations (e.g. z-wave), where you might need to replace a node. In the hub case, the integration itself doesn't change, but the devices and entities do. In any case, maybe this deserves to be in a separate ticket?

If you genuinely want a tool like this, do encourage me because it's not something I need for myself.

I wouldn't want you to develop this for me. There's an entire community that's craving for something like this.

OnFreund · 2020-04-23T18:03:59Z

@balloob

A lot of people also wanted us to keep supporting Python 2. It's an opinion and does not have to be honored. In the end, we need to be able to maintain Home Assistant and that's impossible if we listen to everyone.

I'm not saying it should be honored, I'm saying it should be heard. Eventually, if the decision is to not support sharing, I, personally, am totally fine with it.

So yes, let's drop the sharing aspect; if that is what drives this ticket, I suggest we go ahead and close it.

Most of this ticket is not about sharing - it's about separation of configuration and state, with history tracking being an important motivation for it.
My personal opinion, for what it's worth, is that suggestions 1-5 offer a good balance between effort and benefit, and most importantly, have minimal maintenance costs, so there's no "tax" that we have to keep on paying with future development.
You mention that this would be an enormous operation, and @Jc2k also supports that view. Looking at my suggestions, I fail to see the enormity (with suggestion 4 being a possible exception). What am I missing?

balloob · 2020-04-23T18:08:04Z

Things are easy if you design a new system, but we'll need to move everything to a new folder. But where would that folder be? It can't be the config folder, as it's state. So now we're introducing a new folder, but people don't have that mapped in their Docker container, so we won't persist it?

Also, your suggestions are limited to storage. What about the database, Z-Wave and other integrations that store their stuff in config? Oh, and then there are the integrations that use storage but are missing from your list, like ZHA.

You'll need to audit and migrate every integration.

OnFreund · 2020-04-23T18:12:52Z

Moving state to a new folder is not part of my suggestions. The config folder always mixed configuration and state, but each file was only one of them (with the exception, perhaps, of things like the z-wave xml file, but that's external and beyond our control). If we drop the requirement for separate folders, most integrations are unaffected, and my suggestions are mostly backwards compatible.

I'm not looking for a perfect solution - I'm looking for a better solution.

balloob · 2020-04-23T19:29:35Z

But to what end? If it is not to completely separate config and state there is no purpose.

OnFreund · 2020-04-24T07:04:55Z

As long as each individual file is either configuration or state, but not both, you can track only the configuration files using your VCS.
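If every file is purely configuration or purely state, deciding what to track becomes a simple per-file rule. A minimal sketch follows; the set of state files is an assumption based on this thread (`core.restore_state` and `mobile_app` from the original proposal, plus a hypothetical `auth.state` file holding split-out refresh tokens), and the helper names are made up.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Assumed classification of files treated as pure state/cache.
# "auth.state" is a hypothetical name for split-out refresh tokens.
STATE_FILES: FrozenSet[str] = frozenset(
    {"core.restore_state", "mobile_app", "auth.state"}
)


def partition(
    filenames: Iterable[str], state_files: FrozenSet[str] = STATE_FILES
) -> Tuple[List[str], List[str]]:
    """Return (tracked, ignored) file names, sorted and de-duplicated."""
    tracked: List[str] = []
    ignored: List[str] = []
    for name in sorted(set(filenames)):
        (ignored if name in state_files else tracked).append(name)
    return tracked, ignored


def gitignore_lines(ignored: Iterable[str], folder: str = ".storage") -> str:
    """Render one .gitignore pattern per ignored file."""
    return "".join(f"{folder}/{name}\n" for name in ignored)
```

`mobile_app` appears in the set only because the original proposal suggests excluding it; as noted earlier in the thread, it also holds icons and attributes, so its classification is itself debatable.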

Jc2k · 2020-04-24T10:29:33Z

We aren't saying it's hard because we are lazy or something.

From a clean solution point of view, if you value knowing which files are state and which are config, it is obviously better to put them in separate folders, as it decreases the mental tax of knowing which files are for config and which are not. I don't want to have to use a cheat sheet when developing to remember what is and isn't OK in each file; I want a piece of code that writes to the config store and a piece of code that writes to the state store. So if we do this, it should be separate folders.

And from my point of view the location of the file doesn't make it hard exactly, even if it doesn't help. It's splitting the data between files, writing a migration to do that, having to maintain code to read from both places during the deprecation period. Fielding the user tickets when people upgrade after the deprecation period and wonder why it broke.

If we adopt this as an architecture thing and make it a rule for all integrations to follow, that does imply looking at the integrations you don't use as well. So you might have identified a handful of cases, but that's not the extent of Home Assistant, is it?

Each separate integration might be putting refresh tokens in its config entry; that's not something that's controlled in one place. It might be one file that you care about, but each integration is responsible for its own config entry. That means reviewing each integration that uses config entries and understanding enough to know that the encryption keys in e.g. homekit_controller are long-term, but the ones in some other integration aren't. There are 115 integrations that use the config flow framework, and no way to tell without checking them all.

BTW, right now the config flow code is really nice, it's a joy to develop against. If we wanted it to stay that way, we'd probably want to build in support for having an integration's state linked to its config entry. That would be needed for all sorts of reasons - things like making sure that it was removed when the config entry was removed. If we didn't bake it into the framework, each integration would have to do it for itself. Even if not every integration needed to do it, we could still end up with 20 different implementations of the same thing, each with its own quirks and bugs.
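What "baking it into the framework" might look like can be sketched as a single shared store for per-entry runtime state, kept in its own file and cleared when an entry is removed. This is a hypothetical illustration, not an existing Home Assistant API: the class name, file location and method names are assumptions, and a real implementation would presumably be async and hooked into the config entry lifecycle.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


class EntryStateStore:
    """Hypothetical per-config-entry state store, kept in its own file."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._data: Dict[str, Dict[str, Any]] = {}
        if path.exists():
            self._data = json.loads(path.read_text())

    def get(self, entry_id: str, key: str, default: Any = None) -> Any:
        return self._data.get(entry_id, {}).get(key, default)

    def set(self, entry_id: str, key: str, value: Any) -> None:
        # e.g. a refreshed access token: written here, not to the config entry.
        self._data.setdefault(entry_id, {})[key] = value
        self._save()

    def remove_entry(self, entry_id: str) -> None:
        # Intended to be called when the config entry is removed,
        # so no orphaned state is left behind.
        if self._data.pop(entry_id, None) is not None:
            self._save()

    def _save(self) -> None:
        # Write to a temp file in the same folder, then atomically replace,
        # so a crash never leaves a half-written file.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=self._path.parent)
        with os.fdopen(fd, "w") as f:
            json.dump(self._data, f, indent=2, sort_keys=True)
        os.replace(tmp, self._path)
```

Whether state should be deleted together with its entry is itself debated further down this thread, since re-adding an integration would then lose it.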

Icons getting written to the entity state file? Well the problem there is each integration is responsible for its own icons. So it might "just" be batteries and the mobile app, but there are at least 17 battery implementations each handling their own icon. Maybe some other (non-battery) integrations change their icons and none of us in this ticket use those integrations so we don't know.

There are also in the region of 40-50 pieces of HA that have their own file in .storage, how many of those don't affect you? But if we are enforcing a rule as a point of our architecture, don't we have to check them too?

pvizeli · 2020-04-24T10:37:30Z

As long as each individual file is either configuration or state, but not both, you can track only the configuration files using your VCS.

That will never be possible, since .storage works like a database: each file is a table whose IDs reference other data. A config change can affect multiple files at the same time, and because the writes happen asynchronously in the background, you can't be sure that a snapshot captures just a single change.

Whatever you try to achieve with VCS and storage is not possible. The logic of HA has grown; it is not a flat system anymore. The only way is snapshot/restore, as we use in the Supervisor. A selective rollback is not possible, or would need to be implemented as a logic abstraction on top of the storage handler, which is also not VCS-style.
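The cross-file references described here are easy to demonstrate. The sketch below assumes simplified layouts (config entries listed under `data.entries` with an `entry_id`, entity registry entries under `data.entities` carrying a `config_entry_id`) and a hypothetical function name; it shows how restoring only one file from a VCS can leave entities pointing at config entries that no longer exist.

```python
from typing import Any, Dict, List


def find_dangling(
    entity_registry: Dict[str, Any], config_entries: Dict[str, Any]
) -> List[str]:
    """Return sorted entity_ids whose config_entry_id is set but unknown."""
    known = {
        e["entry_id"] for e in config_entries.get("data", {}).get("entries", [])
    }
    dangling = []
    for ent in entity_registry.get("data", {}).get("entities", []):
        entry_id = ent.get("config_entry_id")
        # Entities without a config entry (None) are not references at all.
        if entry_id is not None and entry_id not in known:
            dangling.append(ent["entity_id"])
    return sorted(dangling)
```

A check like this can detect an inconsistent snapshot, but it cannot repair one, which is the crux of the argument that selective rollback would need logic on top of the storage layer.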

OnFreund · 2020-04-24T11:04:04Z

@pvizeli

That will never be possible

I disagree. I believe it is possible, and I showed the steps required to achieve it in my proposal. If you think I missed anything, you're more than welcome to show me what it is.

frenck · 2020-04-24T11:21:47Z

I'm sorry, I have to agree with @pvizeli, @OnFreund.

While a separation of states/config would help in maintaining a configuration in a version control system, it is still not meant to be controlled from a repository.

Considering that, and the dynamic environment Home Assistant has, there will always be changes happening in both state and config.

IMHO it is an illusion to say: I can restore my configuration by checking out from git. Even with the proposal set in this architectural issue.

OnFreund · 2020-04-24T11:24:49Z

@Jc2k hold on a second, I think you're reading too much into my proposal.

First of all, I definitely do not think you're lazy (what gave you that impression??), but it goes both ways - I invested time and effort in due diligence here, to come up with suggestions that would maximize the positive impact, while minimizing the undesired impact.

if you value knowing which files are state and which are config, it is obviously better to put them in separate folders

I agree, but software engineering is an art of compromise. Specifically, my suggestions achieve a lot of the desired benefits, while reducing the cost, yet still keeping us on the path to full separation if we want to go there in the future. I can understand this argument if my proposal were to take us in the other direction (i.e. if it made a future split of folders more difficult), but that's not the case. Additionally, the config folder already mixes configuration files with state files, but it was good enough for history tracking.

I don't want to have to use a cheat sheet when developing to remember what is and isn't OK in each file

You don't - you should never write directly to files. There are APIs for that. You don't need to know that the config entries you write get stored in core.config_entries, and that's going to stay that way.

And from my point of view the location of the file doesn't make it hard exactly, even if it doesn't help

That's a very black and white approach. What I'm suggesting does help, quite a lot, and is on the path to full separation if we want to achieve it in the future.

It's splitting the data between files, writing a migration to do that, having to maintain code to read from both places during the deprecation period. Fielding the user tickets when people upgrade after the deprecation period and wonder why it broke.

I specifically crafted my suggestions to be backwards compatible for integrations, so that none of this is needed outside of core functions. Also, I'm pretty sure that people upgrading after a deprecation period would have worse problems on their mind than losing refresh tokens (the only permanent thing I'm suggesting we move).

If we adopt this as an architecture thing and make it a rule for all integrations to follow, that does imply looking at the integrations you don't use as well. So you might have identified a handful of cases, but that's not the extent of Home Assistant, is it?
...
There are 115 integrations that use the config flow framework, and no way to tell without checking them all.

I agree, but part of the point of the discussion here is to expose it. Additionally, since everything I'm suggesting is backwards compatible, these integrations aren't going to break - they simply are going to continue to write state to configuration files, and people who are affected by it will flag it and fix it through issues and PRs, with no lost functionality in between.

right now the config flow code is really nice, it's a joy to develop against.

I partially agree - it's a joy to write the flow itself, but really hard to test against, and testing is an important part of developing. I've already seen different integrations test completely differently, and the tests are all more verbose and indirect. I'm not complaining, and I understand this is work in progress, but I don't think we should discount testing here.

That would be needed for all sorts of reasons - things like making sure that it was removed when the config entry was removed.

I'm not sure that's a good idea. If removing an integration and re-adding it (unfortunately, it happens more often than I'd like), required a lot of reconfiguration, it would be hell.

If we didn't bake it into the framework each integration would have to do it for themselves. Even if not every integration did need to do it, we could still end up with 20 different implementations of the same thing, each having its own quirks and bugs.

But that's exactly what I'm suggesting in 5 - create a storage mechanism that integrations can use. Is your critique that implementing my suggestion would make it harder to implement it again? :)

Icons getting written to the entity state file? Well the problem there is each integration is responsible for its own icons. So it might "just" be batteries and the mobile app, but there are at least 17 battery implementations each handling their own icon. Maybe some other (non-battery) integrations change their icons and none of us in this ticket use those integrations so we don't know.

All the more reason to have this logic centralized. Once again, since everything is backwards compatible, no integration breaks, and the community can gradually transition integrations to use that logic.

There are also in the region of 40-50 pieces of HA that have their own file in .storage, how many of those don't affect you? But if we are enforcing a rule as a point of our architecture, don't we have to check them too?

I'm sure there are a lot that don't affect me. But once again, everything is backwards compatible - we set the guidelines, and the community can gradually get there. There are also many device/service integrations that still use YAML, but we have the guidelines to transition away from it.

OnFreund · 2020-04-24T11:26:00Z

@frenck

IMHO it is an illusion to say: I can restore my configuration by checking out from git. Even with the proposal set in this architectural issue.

You're putting words in my mouth - I never said this would be possible. I said that tracking history is important by itself, even without the ability to restore, and my suggestions get us there.

frenck · 2020-04-24T11:31:43Z

If the goal is just history tracking, that is technically already possible. But sure, splitting that would make it better.

So that leaves us with the sole reasons to do this:

Splitting because we feel that more static / less static things should not be combined
Splitting for the sake of tracking history in a version control system (which is not actually version control... as there is just version history, no control).

However, this still applies / are cons:

Still not meant for public sharing
Implementation time
An extra layer of things to take into account from a developer perspective. Which goes where?
Migration of the current things (which should not be underestimated, this takes a lot of time and effort).

Which makes me wonder if the juice is worth the squeeze, as it provides a small use case for a smaller group of people (who in general are not looking for version tracking, but for actual version control).

🤷

OnFreund · 2020-04-24T11:39:23Z

Yes, there are cons, but I think the juice is definitely worth the squeeze here. First of all, reading through the forums, I think you're underestimating the size of the group.
Second, I think you're overestimating the cost - I feel like a broken record, but I'll repeat it once again: these changes are backwards compatible for integrations - every integration can make the transition at its own pace, and only integrations that mix configuration and state need to make the transition at all. Once we create the framework, I believe the community will take care of transitioning the integrations to use it, and if an integration is left unchanged, it's a sign that it's either not mixing state and configuration, or not enough to bother anyone. This is very similar to the process we're taking for transition from YAML to config flows, only simpler.
I do agree that some backend code (specifically the auth piece that's using refresh tokens), will have to migrate, but I think the overall cost/benefit ratio is small.

frenck · 2020-04-24T11:46:03Z

You don't have to repeat yourself on every response. I was just giving my opinion on the matter.

I don't underestimate the forums, we have some fine stats now based on the topics lately. And even considering those responses, a large part of them are not looking for history tracking, but version control. These are different things.

I think it is a fair question to ask if the juice is worth the squeeze. As for my answer to that question: it is not.

Considering it would not provide a restore path, and it is not made for sharing, it leaves history tracking.

That raises my personal curiosity: Why track the history of this? Is it auditing?

OnFreund · 2020-04-24T11:52:58Z

It's definitely a fair question to ask whether the juice is worth the squeeze, and that's the point of this discussion, but it's important that we understand what the squeeze actually is. I was repeating because I feel like the point is not getting across, and that the "squeeze" is assumed to be something way bigger than what I'm suggesting.

As for why track history - I answered this earlier:

I personally use history tracking mainly to go back and understand my reasoning as I'm making changes. I've had several cases where I almost made the same stupid change to an automation condition, for example, only to look through history and realize my mistake. Comments could help (AFAIK they can't be used in automations, as the UI editor would overwrite them), but they represent a snapshot in time, so they're no replacement for a full history. Another use case is when I need to replace a device - it might take some time between removing the config entry and adding a new one, and having the history with all of my settings (config entries are not just secrets) makes the transition smoother. Other users who track the history have other use cases.

frenck · 2020-04-24T11:53:48Z

So isn't the thing you are looking for: Audit logging?

OnFreund · 2020-04-24T11:56:30Z

I'm less concerned with naming and more with functionality. What is your specific suggestion?

Jc2k · 2020-04-24T13:11:32Z

It's that kind of detail that I'd like us to get into, so that we can reflect on the "squeeze" that may (or may not!) be lurking under the surface. Without looking at the details of OAuth2Session, the idea of updating it to support reading from both locations, and to only write to the new one, is appealing indeed.

I don't think the storage mechanism in (5) needs to be hard; it can just be a Store() that uses a different folder (or even just a different file), can't it? But then again, as @pvizeli said, it might be JSON on disk, but to HA it's used like a database. So do we need to maintain referential integrity here!?

What i mean is, how do you see the life cycle of the refresh token being managed? If it lives on the config entry right now that means it gets removed when the integration does. How do we make sure that these OAuth refresh tokens get removed when the integration is removed? Is there an API to link state like this to a config entry so that HA can clean it up by itself? Or is it up to each integration to manage clean up of their own state? Does OAuth2Session already have a hook like that or would we have to modify each integration to add one?

OnFreund · 2020-04-24T13:59:16Z

I don't think the storage mechanism in (5) needs to be hard; it can just be a Store() that uses a different folder (or even just a different file), can't it?

Agreed. A different file with a reference to the config entry is what I had in mind. This is also how the entity and device registries are implemented.

What i mean is, how do you see the life cycle of the refresh token being managed?

"Refresh tokens" is an overloaded term. I use it to mean the tokens that are named as such in the auth file (in a section that I propose breaking off to a new file), so let's use the term "oauth tokens". As for their lifecycle - I once again propose we look at the device and entity registry. IIRC, when an integration is removed, the respective devices and entities are kept until HA is restarted (in case you want to immediately add the integration again. At least I remember an arch topic discussing this, but maybe this isn't the current implementation). The store can have the same life cycle, or whatever the current implementation for the registries is.
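Here is a sketch of that lifecycle: state is kept per config entry, and it is cleaned up around a restart rather than at the moment of removal. It is minimal and synchronous, and it rests on a few assumptions. `EntryStateStore` and its methods are hypothetical names, not an existing Home Assistant API (HA's real `Store` helper is async). The purge-on-load rule is one reading of the registry behaviour recalled above, and the author notes that recollection may not match the current implementation.

```python
import json
from pathlib import Path
from typing import Any


class EntryStateStore:
    """Hypothetical store (not a Home Assistant API) for volatile
    per-config-entry values such as OAuth tokens, kept in their own file."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._data: dict[str, dict[str, Any]] = {}

    def load(self, live_entry_ids: set[str]) -> list[str]:
        """Load from disk, purging state of entries that no longer exist.

        Meant to run at startup, so data for a just-removed entry survives
        until the next restart. Returns the purged entry IDs.
        """
        raw = json.loads(self._path.read_text()) if self._path.exists() else {}
        purged = sorted(k for k in raw if k not in live_entry_ids)
        self._data = {k: v for k, v in raw.items() if k in live_entry_ids}
        if purged:
            self._save()
        return purged

    def set(self, entry_id: str, key: str, value: Any) -> None:
        self._data.setdefault(entry_id, {})[key] = value
        self._save()

    def get(self, entry_id: str, key: str, default: Any = None) -> Any:
        return self._data.get(entry_id, {}).get(key, default)

    def _save(self) -> None:
        # Sorted, indented JSON keeps diffs readable if the file is tracked.
        self._path.write_text(json.dumps(self._data, indent=2, sort_keys=True))
```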

OnFreund · 2020-04-24T14:12:08Z

As for API, once again looking at the OAuth implementation, instead of calling:

self.hass.config_entries.async_update_entry(self.config_entry, data=...)

It could simply call:

self.hass.config_entries.async_update_entry_store(self.config_entry, key, value)

The store mechanism will completely abstract away the fact that it's written to another file, its format, etc...

Reading would be just as simple:

value = self.hass.config_entries.async_read_from_entry_store(self.config_entry, key)

The read implementation can also fallback on the config entry, or we leave that to the client to decide.
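With read-with-fallback, integrations could migrate one at a time. New writes go only to the separate store, while reads still find values that an unmigrated integration left in the config entry. This is a minimal synchronous sketch. `ConfigEntry` here is a stripped-down stand-in. `update_entry_store` and `read_from_entry_store` are hypothetical counterparts of the proposed `async_update_entry_store` and `async_read_from_entry_store`, which do not exist in Home Assistant.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConfigEntry:
    """Stand-in for a config entry; only the fields this sketch needs."""
    entry_id: str
    data: dict[str, Any] = field(default_factory=dict)


class EntryStoreAPI:
    """Hypothetical helpers mirroring the proposed entry-store calls."""

    def __init__(self) -> None:
        # Stands in for the separate file: entry ID -> key -> value.
        self._store: dict[str, dict[str, Any]] = {}

    def update_entry_store(self, entry: ConfigEntry, key: str, value: Any) -> None:
        # Writes only ever go to the store; entry.data is never touched.
        self._store.setdefault(entry.entry_id, {})[key] = value

    def read_from_entry_store(self, entry: ConfigEntry, key: str, default: Any = None) -> Any:
        entry_store = self._store.get(entry.entry_id, {})
        if key in entry_store:
            return entry_store[key]
        # Fallback for integrations that have not migrated yet: the value
        # may still live in the config entry itself.
        return entry.data.get(key, default)
```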

TaakoMagnusen · 2020-10-23T15:47:55Z

I want to share my support for this, and it's disappointing that there hasn't been any movement on this thread since the first two days after its creation.

I think architecturally it makes sense to impose two top-level directories, state and config, each with two subdirectories (maybe more as needed).

Within config there will be public and private:

public contains configuration you expect to share via VCS.
private contains configurations that you would not share such as secrets.yaml.

Within state there will be static and dynamic:

static contains state that is written automatically and you don't expect to change very often (or ever), such as zigbee node database.
dynamic will be used for home assistant to write stuff that changes often such as logging (startup logs, OZW logs, etc).
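Here is a sketch of how storage keys might map onto `config/{public,private}` and `state/{static,dynamic}`. The classification table is entirely hypothetical, and deciding it would be the real design work. `zigbee_node_db` is a made-up key standing in for the Zigbee node database. Only `config/public` is treated as something you would share via VCS.

```python
from pathlib import PurePosixPath

# Hypothetical classification; the assignments are illustrative only.
CLASSIFICATION: dict[str, tuple[str, str]] = {
    "core.entity_registry": ("config", "public"),
    "core.device_registry": ("config", "public"),
    "lovelace": ("config", "public"),
    "core.config_entries": ("config", "private"),  # may hold tokens/secrets
    "auth": ("config", "private"),
    "zigbee_node_db": ("state", "static"),  # made-up key for a node database
    "core.restore_state": ("state", "dynamic"),
}

# Unknown keys default to frequently changing state.
DEFAULT = ("state", "dynamic")


def target_path(storage_key: str) -> PurePosixPath:
    """Path (relative to the HA config dir) where a storage key would live."""
    top, sub = CLASSIFICATION.get(storage_key, DEFAULT)
    return PurePosixPath(top, sub, storage_key)


def is_shared(storage_key: str) -> bool:
    """Only config/public is meant to be shared through VCS."""
    return CLASSIFICATION.get(storage_key, DEFAULT) == ("config", "public")
```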

In addition to the above, I think we would be better off eliminating .storage entirely. All it does is add confusion. Why does the Lovelace UI generated by the GUI live in storage as opposed to with all the other configuration files? Why can't it live next to the other .yaml files? It could even live side by side with ui-lovelace.yaml and just have a separate name so both could be loaded.

gsdevme · 2020-12-14T09:30:19Z

I have raised a somewhat related type issue in that .storage could support an external database solution which (at least for me) would remove this as a problem. #472

That will never be possible, since .storage works like a database: each file is a table, with IDs that reference other data.

My suggestion is to support a "real" database.
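The "files as tables" point is easy to see in the registry files. Below is a sketch of a referential-integrity check between two of them. It assumes that `core.config_entries` stores `data.entries[*].entry_id` and that `core.entity_registry` stores `data.entities[*].entity_id` and `config_entry_id`. The format is internal and versioned, so check those field names against your own files.

```python
import json
from pathlib import Path


def dangling_entity_refs(storage_dir: Path) -> list[str]:
    """Return entity IDs whose config_entry_id points at no existing entry.

    Assumed layout (verify against your install):
      core.config_entries:  data.entries[*].entry_id
      core.entity_registry: data.entities[*].entity_id / .config_entry_id
    """
    entries = json.loads((storage_dir / "core.config_entries").read_text())
    registry = json.loads((storage_dir / "core.entity_registry").read_text())
    known = {e["entry_id"] for e in entries["data"]["entries"]}
    return sorted(
        ent["entity_id"]
        for ent in registry["data"]["entities"]
        # Entities not created from a config entry have no reference to check.
        if ent.get("config_entry_id") is not None
        and ent["config_entry_id"] not in known
    )
```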

OnFreund · 2020-12-15T13:08:27Z

@gsdevme sounds like an interesting idea, although I think it's solving a different problem than my suggestions here.

gsdevme · 2020-12-15T15:19:34Z

Somewhat, though if the non-VCS-friendly data were simply moved out into a real database, it would solve this, @OnFreund.

OnFreund · 2020-12-15T15:34:33Z

But that's only useful if you can move away the state without the config, and splitting them is what this ticket is about.

laurensV · 2021-09-28T07:24:05Z

I also want to share my support for this approach. Like @OnFreund said: it's not a perfect solution, but it is a better solution. The perfect-world scenario (basically, where we can put the input in code with something like a manifest; @Jc2k started a repo for this but unfortunately never finished it) would be best, but that would take way longer. So an intermediate solution where we can do better VCS for the output (files in .storage) would be a good start, as the whole architecture of .storage is horribly designed for VCS at the moment. It would be even better if we didn't have to put any output under VCS and could just track the input. With the manifest approach: wouldn't it be best if even the UI config wrote to such a file first and then applied that file? That way you could still use VCS for UI changes without having to write YAML (while keeping the option to write YAML manually).

Jc2k · 2021-09-28T08:07:11Z

You pinged me, so I have to add: One of the main reasons I didn't drive that manifest idea forward was the toxicity around the YAML/No-YAML conversation (not related to this ticket). I have no great desire to involve myself in that any more. I do still think that approach would be an easy way forward for the people who want to manage HA this way. But I don't want to build something I won't use myself for a small number of people who would only be begrudgingly using it.

If I had an interest in using it (and had worked on it at all since I last participated in YAML-wars) it would be usable by now. I think it is inaccurate to say it would take "way longer". It is just no one is working on any of the solutions, and no one has found a solution that the core team wants to be responsible for maintaining.

To reply to your ideas:

"Intermediate solution" is always a hard NO from me. HA is a large project, both in terms of code size and in terms of users. An intermediate solution means doing some work, then doing more work to undo it, then doing even more work to do it the way it should have been done from the start. And at all times we have to consider the impact on our 86k+ users (which complicates the "undo" part).
The manifest idea is about using the same API's the UI uses, but providing a YAML based format and a CLI tool as an alternative to the UI. There's no particular need for the UI to write to a YAML file first with this scenario. In fact that would complicate the idea and make it take longer to implement.
I think separating out data vs config etc. is a good thing, but it needs doing properly. We can't half-ass it.

OnFreund · 2021-09-28T08:13:01Z

"Intermediate solution" is always a hard NO from me

Software engineering is always about intermediate solutions, and HA has its fair share of examples. You rarely know what the "full solution" is in advance, and even if you do, it's usually impossible to get there in one step.

laurensV · 2021-09-28T09:39:15Z

One of the main reasons I didn't drive that manifest idea forward was the toxicity around the YAML/No-YAML conversation (not related to this ticket). I have no great desire to involve myself in that any more. I do still think that approach would be an easy way forward for the people who want to manage HA this way. But I don't want to build something I won't use myself for a small number of people who would only be begrudgingly using it.

Totally understandable. Toxicity can kill motivation, which is too bad, as I still believe the manifest idea could be a great alternative to the UI (without having to make changes to the core of Home Assistant).

There's no particular need for the UI to write to a YAML file first with this scenario. In fact that would complicate the idea and make it take longer to implement.
...
I think separating out data vs config etc. is a good thing, but it needs doing properly. We can't half-ass it.

I agree that it is important and shouldn't be half-assed, but it is hard to find the right solution. Ideally, I feel a system where you could use both code and the UI to edit (the same) configuration would be best, but it might indeed be complicated and take longer to implement. But I disagree that there is no need for the UI to write to a config file (whether it be YAML or something else that can be put in version control and could also be changed manually without the UI). Right now, the UI writes config to JSON files stored in .storage (which is used as a database, and databases are hard to use in version control). In my opinion, the best would be a system where the UI does not write directly to the database (the .storage folder, in HA's case), but first writes the config to configuration files that can simply be put in git, which you could then sync with your database (the .storage folder).

Home Assistant is not the only (open source) system that is struggling with this. I had been involved in the past with the open-source CMS Drupal, and they had similar problems, where the UI is making config changes directly in the database. They developed a great system which they call "Features", where you can basically export your database config to files that you can put in version control and sync those files again with the database and see the differences between the code configuration and the database configuration. You can read more about it here, maybe it can give some inspiration: https://www.drupal.org/node/1585750

Jc2k · 2021-09-28T10:48:21Z

Thanks for the great reply @laurensV

The manifest idea is probably pretty good for allowing the UI and manifest files to manage the same configuration, and I don't think it would take a long time for the properly motivated person. After all, the API's exist, otherwise the UI wouldn't work! Just like making a change on the app on your phone is immediately visible on your HA instance on :8123, when the manifest client makes a change using those very same API's it too will be immediately visible in the HA user interface. And no restarts or partial reloads needed like when you manually edit YAML.

Because of this, the storage of that configuration is completely opaque. As long as the UI and manifest client use the same API's, it doesn't matter if you put the storage in a database, left it in JSON, or something else entirely. It means we can leave .storage alone and still have YAML manifests that are managed in VCS.

This is how the idea can be implemented without needing much help from the HA team.

                                                        ┌──────────────────────────────┐
                                                        │                              │
                               ┌────────────────────────┤      HA app on iPhone        │
                               │                        │                              │
                               │                        └──────────────────────────────┘
                               │
┌────────────┐           ┌─────▼──────┐                 ┌──────────────────────────────┐
│            │           │            │                 │                              │
│  .storage  │◄──────────┤   HA API   │◄────────────────┤  homeassistant.local:8123    │
│            │           │            │                 │                              │
└────────────┘           └─────▲──────┘                 └──────────────────────────────┘
                               │                                                                     ┌───────────────┐
                               │                        ┌──────────────────────────────┐             │               │◄─────┐
                               │                        │                              │◄────────────┤dashboard.yaml │      │
                               └────────────────────────┤    manifest apply -f *.yaml  │             │               │    ┌─┴───────┐
                                                        │                              │             └───────────────┘    │         │
                                                        └──────────────────────────────┘                                  │   VCS   │
                                                                                    ▲                ┌───────────────┐    │         │
                                                                                    └────────────────┤               │    └─┬───────┘
                                                                                                     │   hue.yaml    │      │
                                                                                                     │               │◄─────┘
                                                                                                     └───────────────┘

The expectation is that you'd be able to diff your config against the live system without actually applying any changes too.

I believe the manifest tool could be used to export particular objects. The whole concept was modelled on Kubernetes, where one can run kubectl get objtype objname -o yaml to get a manifest of any object in the system.

Also you'll be able to choose which parts to manage with manifests and which parts to manage with the UI, it's not all or nothing.
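A dry-run diff and "manage only some parts" fit together: if a diff compares only the keys present in a manifest, everything else stays UI-managed. Here is a minimal sketch over plain dictionaries. `diff_objects` is a hypothetical helper; no existing manifest tool with this interface is implied.

```python
from typing import Any


def diff_objects(desired: dict[str, Any], live: dict[str, Any], prefix: str = "") -> list[str]:
    """Describe the changes needed to make `live` match `desired`, without applying them.

    Only keys present in the manifest (`desired`) are compared, so anything
    the manifest doesn't mention is left to the UI.
    """
    changes: list[str] = []
    for key in sorted(desired):
        path = f"{prefix}{key}"
        want, have = desired[key], live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            changes.extend(diff_objects(want, have, prefix=f"{path}."))
        elif key not in live:
            changes.append(f"+ {path}: {want!r}")
        elif want != have:
            changes.append(f"~ {path}: {have!r} -> {want!r}")
    return changes
```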

Because this all happens via the API, it can run on a different system from your actual HA. It doesn't need access to the file system. It doesn't need to SSH in. E.g. when I first started using HA, I used GitLab to deploy it. It would SSH into the HA VM and run docker-compose commands there. Setting up the SSH part and running docker-compose over SSH was quite annoying (docker-compose didn't have SSH tunnel support back then, so it was entirely manual). With the manifest client, it can run from any GitLab runner that can reach the Home Assistant API port: there's no SSH key and no shared file system, it just needs an API token. Much, much simpler for CD setups like that.
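From a CI runner, all you need is the instance URL and a long-lived access token. The sketch below uses the standard library against Home Assistant's REST API. `GET /api/config` with a `Bearer` token is a real endpoint, but the helper names are made up. A real manifest client would probably also need the WebSocket API, since much of what the UI edits (config entries, registries) goes through it rather than REST.

```python
import urllib.request


def build_api_request(base_url: str, token: str, path: str = "/api/config") -> urllib.request.Request:
    """Build an authenticated GET request for the Home Assistant REST API.

    Needs only network access to the instance and an access token:
    no SSH key and no shared file system.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="GET",
    )


def fetch(base_url: str, token: str, path: str = "/api/config") -> bytes:
    """Perform the request and return the raw response body."""
    with urllib.request.urlopen(build_api_request(base_url, token, path), timeout=10) as resp:
        return resp.read()
```

In a pipeline, the URL and token would come from CI variables. For example, with hypothetical variables `HA_URL` and `HA_TOKEN`, you would call `fetch(os.environ["HA_URL"], os.environ["HA_TOKEN"])`.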

Hopefully this explains why I don't think the UI needs to be directly aware of the storage format of .storage, and neither do the manifests themselves. In fact, the manifests would probably hide some of the complexity of the storage format where feasible; e.g. we wouldn't want to hardcode device UUIDs in automations like the storage format does.
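A manifest could avoid hardcoded device IDs by referring to devices by name and resolving them at apply time. When a name is unknown or ambiguous, it should refuse to guess. This is a sketch. The `device_name` key is a hypothetical manifest convention, and the `devices` list stands in for whatever the device registry returns.

```python
from typing import Any


def resolve_device_refs(manifest: dict[str, Any], devices: list[dict[str, str]]) -> dict[str, Any]:
    """Replace a hypothetical `device_name` key with the matching `device_id`.

    Raises ValueError when a name matches zero or several devices.
    """
    by_name: dict[str, list[str]] = {}
    for dev in devices:
        by_name.setdefault(dev["name"], []).append(dev["id"])

    def walk(node: Any) -> Any:
        if isinstance(node, list):
            return [walk(item) for item in node]
        if not isinstance(node, dict):
            return node
        out = {k: walk(v) for k, v in node.items() if k != "device_name"}
        if "device_name" in node:
            ids = by_name.get(node["device_name"], [])
            if len(ids) != 1:
                raise ValueError(f"device {node['device_name']!r} matched {len(ids)} devices")
            out["device_id"] = ids[0]
        return out

    return walk(manifest)
```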

But I do think all of this is orthogonal to this ticket, and I don't think this ticket is an intermediate step towards manifests. The manifest idea is quite unrelated.

Unfortunately, at the moment the core team isn't convinced by this ticket, and AIUI that hasn't changed since Apr 2020. I don't think I can do everyone's concerns justice, so I won't repeat them, and I don't want to put words in people's mouths if I end up having to defend their positions. But at the moment it does seem unlikely to move forward.

Jc2k · 2021-09-28T11:06:13Z

"Intermediate solution" is always a hard NO from me

Software engineering is always about intermediate solutions, and HA has its fair share of examples. You rarely know what the "full solution" is in advance, and even if you do, it's usually impossible to get there in one step.

You are not wrong. It is definitely possible to define agile as a way of delivering successive intermediate solutions until a user story is satisfied, but that's not really where I was going. When A and B are completely unrelated (which in this case I think that they are), calling A an intermediate solution for B does not move the conversation forward, it just clouds it.

To be clear: I do reject this ticket as an intermediate solution for "B" (where B is my manifests idea), because it is not one. But that is distinct from rejecting or not rejecting this ticket. I do think that if the concerns raised could be resolved to everyone's satisfaction, it would be able to stand on its own two feet without being conflated with anything else.

OnFreund · 2021-09-28T11:14:32Z

I agree with that - these two solutions have very little in common, namely that there's some overlap in the requirements they satisfy. However, neither of them is a step towards the other, and both have their merit even if the other solution is already implemented.

gsdevme · 2021-09-28T11:22:47Z

One thing I'm currently doing is running my own container image based on Home Assistant, where the entrypoint writes out .storage/ from templates and then mutates it. Naturally it's brittle to change, but it has meant I can move my Home Assistant around and even use docker-compose locally to quickly test things before I deploy to "Production".

For example I commit auth_provider.homeassistant in a /storage folder (non dot)

{
    "version": 1,
    "key": "auth_provider.homeassistant",
    "data": {
        "users": [
            {
                "username": "user",
                "password": "not_a_real_password"
            },
            {
                "username": "another_user",
                "password": "not_a_real_password"
            }
        ]
    }
}

Then on boot, the entrypoint copies what are basically my ".dist" files into place and runs:

hass --script auth -c /config change_password user $HOME_ASSISTANT_PASSWORD_USER
hass --script auth -c /config change_password another_user $HOME_ASSISTANT_PASSWORD_ANOTHER_USER

This allows me to use my Kubernetes secrets to hold the passwords; they are applied when the container boots, and it's "stateless" enough that I don't need to worry about migrating my .storage.
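Here is the same bootstrap flow rendered in Python for illustration; the original is a shell entrypoint script. Assumptions: the templates live in `/storage` and HA's config in `/config`, as described above, and the copy only happens when `.storage` doesn't exist yet. `PASSWORD_ENV` is a hypothetical mapping from usernames to the environment variables holding their passwords.

```python
import os
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping: HA username -> env var holding its password
# (e.g. injected from Kubernetes secrets).
PASSWORD_ENV = {
    "user": "HOME_ASSISTANT_PASSWORD_USER",
    "another_user": "HOME_ASSISTANT_PASSWORD_ANOTHER_USER",
}


def bootstrap_storage(templates: Path, config_dir: Path) -> bool:
    """Copy committed templates into .storage on first boot only.

    Returns True on first boot; an existing .storage is never overwritten.
    """
    storage = config_dir / ".storage"
    if storage.exists():
        return False
    shutil.copytree(templates, storage)
    return True


def password_commands(config_dir: Path, env: dict[str, str]) -> list[list[str]]:
    """Build the `hass --script auth ... change_password` invocations."""
    return [
        ["hass", "--script", "auth", "-c", str(config_dir), "change_password", user, env[var]]
        for user, var in PASSWORD_ENV.items()
    ]


def main() -> None:
    """Entrypoint body: bootstrap, then set passwords from the environment."""
    config_dir = Path("/config")
    if bootstrap_storage(Path("/storage"), config_dir):
        for cmd in password_commands(config_dir, dict(os.environ)):
            subprocess.run(cmd, check=True)
```

`main()` is not invoked automatically. An entrypoint would call it before starting Home Assistant.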

This, however, is a total hack around the unsolved problem and has its own limitations, as some of the JSON is difficult to reason about. I also benefit from (almost) every integration being MQTT, so auto-discovery means I don't need to drive any manual UI elements.

So I can run my Home assistant from my computer in "dev mode" basically and I can run it from my production k8s server without too much hassle.

laurensV · 2021-09-28T12:50:16Z

@gsdevme Even though it is a total hack, could you share your container image with me? Would love to have a similar setup as you where I can run Home Assistant with docker-compose on my computer in "dev mode" and on my k8s server in prod mode!

Hopefully this explains why I don't think the UI needs to be directly aware of the storage format of .storage and neither do the manifests themselves.

@Jc2k Great schema, that definitely explains it! I am quite new to HA, so it is hard for me to judge how much work it would take to make the manifest a fully functional alternative to the UI. I would love to see it happen at some point, but like you said, more people probably need to be on board first. With your schema in mind: what I was thinking before was putting configuration files between the HA API and the .storage folder, so that whether a change comes from the website UI or the mobile app, it always gets stored in configuration (YAML) files, which you can then track in VCS. That is probably really hard to achieve, while your solution can be achieved without even touching the core of HA. My ideal setup would look like this, though it is probably impossible to achieve:

                                                                                 ┌──────────────────────────────┐
                                                                                 │                              │
                                                        ┌────────────────────────┤      HA app on iPhone        │
                                                        │                        │                              │
                                                        │                        └──────────────────────────────┘
                                                        │
┌────────────┐           ┌────────────┐           ┌─────▼──────┐                 ┌──────────────────────────────┐
│            │           │            │           │            │                 │                              │
│  .storage  │◄──────────│*config.yaml│◄──────────┤   HA API   │◄────────────────┤  homeassistant.local:8123    │
│            │           │            │           │            │                 │                              │
└────────────┘           └────────────┘           └─────▲──────┘                 └──────────────────────────────┘
                                ▲                       │                                                        
                                │                       │                        ┌──────────────────────────────┐
                                │                       │                        │                              │
                         ┌──────┴─────┐                 └────────────────────────┤    custom UIs/Others         │
                         │            │                                          │                              │
                         │     VCS    │                                          └──────────────────────────────┘
                         │            │                                                                          
                         └────────────┘

PS. completely unrelated question: How did you make that text based schema?

gsdevme · 2021-09-28T13:08:08Z

@laurensV Sure, it's here: https://github.com/gsdevme/home-assistant

https://github.com/gsdevme/home-assistant/blob/3a95ed609c03c18281899d91fde4cbb2a26e0ef6/Dockerfile#L13-L17

https://github.com/gsdevme/home-assistant/blob/3a95ed609c03c18281899d91fde4cbb2a26e0ef6/entrypoint.sh#L4-L14

So when running locally, it's just:

➜  home-assistant git:(master) docker-compose up
Creating network "home-assistant_default" with the default driver
Creating home-assistant_ha_1 ... done
Attaching to home-assistant_ha_1
ha_1  | First boot of .storage (<---- writes the bootstrap to skip the manual GUI options)
ha_1  | Password changed
ha_1  | Password changed
ha_1  | Testing configuration at /config/
ha_1  | [s6-init] making user provided files available at /var/run/s6/etc...exited 0.
ha_1  | [s6-init] ensuring user provided files have correct perms...exited 0.
ha_1  | [fix-attrs.d] applying ownership & permissions fixes...
ha_1  | [fix-attrs.d] done.
ha_1  | [cont-init.d] executing container initialization scripts...
ha_1  | [cont-init.d] done.
ha_1  | [services.d] starting services
ha_1  | [services.d] done.

frenck · 2023-05-11T13:24:43Z

This architecture issue is old, stale, and possibly obsolete. Things changed a lot over the years. Additionally, we have been moving to discussions for these architectural discussions.

For that reason, I'm going to close this issue.

../Frenck

laurensV · 2023-05-11T14:48:09Z

@frenck it seems to me that this issue (configuration not being VCS-friendly) is still a problem; it hasn't changed in newer versions of Home Assistant. Probably too hard to fix, but definitely not obsolete. Do you know if there is already a discussion in the new discussions about where to store configuration? If not, I might make a new topic there.

frenck · 2023-05-11T15:02:34Z

There has been no activity on this issue in years. It has been closed for that reason.

../Frenck

parautenbach · 2023-05-11T16:10:59Z

I only became aware of this initiative as part of today's clean-up. I'd be interested in a solution to this – discussed here or elsewhere. :-)

OnFreund · 2023-05-14T08:41:19Z

Created a discussion: #901

frenck · 2023-05-14T21:06:33Z

Please let's not make duplicates. Just asking was an option too....

I'll migrate and reopen as a discussion once I'm back at my desk.

../Frenck

Jc2k mentioned this issue Jun 19, 2020

Allow contributors to (optionally) add YAML configuration on device Integrations #399

Closed

nitobuendia mentioned this issue Jun 20, 2020

Create a CLI to import existing and future YAML configs into .storage entities #401

Closed

gsdevme mentioned this issue Dec 14, 2020

Support external RDBMS or NoSQL DB for the use case of .storage #472

Closed

frenck closed this as not planned May 11, 2023

home-assistant locked and limited conversation to collaborators May 15, 2023

frenck converted this issue into discussion #902 May 15, 2023

This issue was moved to a discussion.
