Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: changing zone leaves system in inconsistent state until restart of the node #20

Open
gerner opened this issue Nov 22, 2023 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@gerner
Copy link
Collaborator

gerner commented Nov 22, 2023

Contact Details

nick.gerner@gmail.com

Version

v2.0 - v2.8

System Type

AREDN node

What happened?

meshchatsync loads its zone name from config once at startup. changing the zone name doesn't restart meshchatsync so it's syncing the old zone. this makes the client very confusing.

What browsers are you seeing the problem on?

Chrome

@gerner gerner added bug Something isn't working needs triage labels Nov 22, 2023
@gerner
Copy link
Collaborator Author

gerner commented Nov 22, 2023

I think the issue is stuff like this:

messages_db_file = messages_db_file_orig .. "." .. zone_name()

there are global variables that get initialized once when a file is loaded (e.g. when meshchatsync starts) that include the zone name (and I suppose other config parameters). When the config changes, those variables are still holding on to stale state.

It's even worse because some state is loaded fresh when it's needed. When that happens we might, for example, write messages from one zone into a messages db file for a different zone.

When a user loads meshchat, that's going to try loading a different messagedb file which is not being updated by meshchatsync.

Maybe meshchatsync should get restarted when the config changes. But how to manage that? I guess we could also make meshchat sync stateless, so every loop it does refreshes all its state. Trying to hunt down every piece of state and call that touches config sounds error prone to me.

@hickey
Copy link
Owner

hickey commented Nov 22, 2023

A couple of question to clarify the above and think about a better solution.

First agree that the config is read once at startup of meshchatsync. In the second paragraph you state that some of the state is loaded as needed. I have not looked at the code yet and uncertain what you are referring to. Can you provide an example or two? I suspect that there is a DB version variable written to a file. Is this what you are referring to?

Just on the surface I could see where the config is read each time a sync starts. It would be great if iNotify libs are installed under OpenWRT and even better if there is a LUA interface for accessing them. Not certain of either at this moment. But it really is not much on the resources to re-read the config file each time during the sync. It would just be nicer to read it when it is really necessary.

But then the bigger question I have is what to do when the zone does change. The existing message database is no effectively junk and should be discarded. Should this be automatically done? What if the zone name change was an accident, should the message DB be wiped? Or should it be a procedure for changing the zone name where the admin manually needs to go wipe the message DB?

My gut says that it should be a procedure for the admin to perform, but it would be nice if MeshChat just did the appropriate things and made it a trivial change for the admin.

@gerner
Copy link
Collaborator Author

gerner commented Nov 23, 2023

Here's an example where config is NOT re-loaded. Notice that zone_name(), which does read the config file, is called when this file is pulled in, and used to construct the message db file name:

messages_db_file = messages_db_file_orig .. "." .. zone_name()

And an example of where it IS re-read. zone_name() is called when evaluating neighbors for nodes in the same zone. So it's going to re-read the config file every time it gets called:
https://github.com/hickey/meshchat/blob/master/meshchatlib.lua#L195

for reference, here's zone_name(): https://github.com/hickey/meshchat/blob/master/meshchatlib.lua#L58-L67

Regarding zone change and the message db: great news! the database file is named WITH the zone name. So if you change zone name, (and if we fix this bug), a new message db file will be created. But the old one gets saved, so if you switch back, you'll just re-open that old file, all your messages are there waiting for you.

And, at least on my nodes, the message file lives on a temp partition so they all get wiped on node restart.

Regarding a fix: broadly I think the simpler solution of just re-loading config every time meshchat sync hits its sync loop is good. I've never used iNotify and it sounds like too many moving parts IMHO. The bigger issue is that "loading the config" is spread all over the code. Functions are just opening (possibly several) config file(s) when they need to. Maybe a refactor where the config file(s) is only read in a single place and we can just kick that load_config() method or something like that. I guess a hacky workaround would be to re-load meshchatlib.lua if that's a thing in lua.

@hickey
Copy link
Owner

hickey commented Nov 23, 2023

I was not thinking about the message DB having the zone name appended to it. So good points and that would relieve the problem of trying to untangle the DB after the change of the zone name.

Actually I am not certain if LUA would actually reload the config file as it is being imported instead of read as a file. There is a possibility that it keeps an internal state that keeps from re-importing a file that is has already imported. The other option is to use the config API of MeshChat to read and update the config values.

@gerner
Copy link
Collaborator Author

gerner commented Nov 24, 2023

So here's my thought: we should not have lots of different variables in the global scope holding on to config info (e.g. everything in meshchatconfig.lua, messages_db_file, there might be more). At least, we should not have those things sitting in global state without a way to refresh them.

Your idea about using the config API is nice, but there's so much code shared by meshchatsync and the server code that I'm worried it'll be impractical to hunt down the use cases between the two and it'll be hard to maintain that separation without duplicating a lot of code.

I did a quick look around and it's possible the only global state vars we have are in meshchatconfig.lua and the one variable messages_db_file in meshchatlib.lua.

How about this idea:

  • We move messages_db_file to meshchatconfig.lua. It's actually already defined there, the lib code just redefines it (which is crazy cause you might require those modules in the opposite order!) It kinda looks like a hack that it's living there.
  • Make config reloadable on the fly. We could move config to a json/ini/yaml file or something instead of in a lua module. And we have meshchatconfig.lua load that, with a method to reload that config. I'm open to other ways of accomplishing this point.
  • We have meshchatsync reload config at the start of every sync loop.
  • Avoid global variables in the future, at least ones that are holding config-like data.

That fixes this bug, gives us a pattern to hot-reload the config if we need it in the future, and places a style guideline to avoid this kind of bug in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants