Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow separate mail and state paths #33

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

robn
Copy link
Contributor

@robn robn commented Jun 9, 2022

Add config items mail_dir and state_dir to allow these to be explicitly set in config, and modifies --path (-C) to set their defaults correctly depending on whether or not a directory or a file is offered:

  • directory: config file is mujmap.toml in the given dir, mail_dir and state_dir default to the given dir (that is, same as existing behaviour)
  • file: config file is as given, mail_dir defaults to notmuch's mail_root, state_dir defaults to the system-standard state/cache dir.

This covers points (1) and (2) in my comment on #29 and sets a base for (3).

@elizagamedev
Copy link
Owner

Ty for this. I'm unfortunately busy this weekend so I'm going to have to review this Monday at the earliest.

mujmap.toml.example Outdated Show resolved Hide resolved
Copy link
Owner

@elizagamedev elizagamedev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Run rustfmt on the codebase
  • Update CHANGELOG.md

Good stuff so far :)

mujmap.toml.example Outdated Show resolved Hide resolved
src/config.rs Outdated Show resolved Hide resolved
src/local.rs Outdated
.ok()
.filter(|rel| rel.components().count() > 0)
.map_or(
"path:**".to_string(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably just be an empty string tbh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a bit of rework elsewhere to do so because we append and lastmod:... in Local::all_emails_since and with empty-string it produces an invalid query.

(probably, that rework is for the allmail query to be an Option)

// notmuch mail root, then search under the relative maildir path (allowing multiple
// maildirs per notmuch dir). If not, assume this is the only maildir for the notmuch dir,
// and use a global query.
let all_mail_query = mail_dir
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user types an absolute path for the mail_dir config option but misspells part of the path, causing strip_prefix to fail, they could experience some nasty consequences. We should keep/revise the MailDirNotASubdirOfNotmuchRoot error here in that case instead of swallowing. We can still support the config option being a relative path by checking if its is_absolute() returns false and handling that slightly differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's a lot we can do really; how do we know if its a error if the path is valid?

The best I can think to do is for a configured mail_dir, make sure it exists and bail out if not. Of course its still possible to typo, but at least it won't plow on regardless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've added a commit that does just that if mail_dir is provided (if we compute it, it doesn't have to exist beforehand).

The more I think about the more I think this will be rare, because most of the time I expect that people would mess with their config until its set up how they like, and would then walk away, so its unlikely that they're going to get a path wrong in a system they've already set up. Its still good to have a safety for an obviously-wrong situation though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this isn't quite what I meant. I think I understand why you changed it like this now, to specially handle the case for if mail_dir and the notmuch root directory are the same, is that right? According to the docs, it seems like strip_prefix will yield an empty PathBuf rather than Err for this case, so there's actually no need to specially handle this case.

https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.strip_prefix

But the consequences of removing the MailDirNotASubdirOfNotmuchRoot error as this commit currently exists would result in something like the following:

  • The notmuch root directory is ~/Maildir and mujmap is configured so that its mail dir is ~/Maildir/mujmap. So far so good.
  • The user decides they want to move their maildir to an external hard drive, and move it to /mnt/maildrive/Maildir, but forget to update mujmap, which still points to ~/Maildir
  • The user runs mujmap. Both directories are valid, but because strip_prefix fails here, mujmap silently both decides that it owns the entire mail database, and insists on placing mail files in the old ~/Maildir and trying to add them to the notmuch database. Potentially destructive chaos ensues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really following your scenario, and I'm not sure if that's because its imprecisely described, or if its because we're not quite in agreement about what we're doing here and so talking past a little bit. I will try and write down what I think is happening here, and hopefully I will understand by the end.

I'm seeing mail_dir (and state_dir and cache_dir) as a kind of override; if you set it, it will be used over any other value.

So in your scenario, if they've set mail_dir to something ~/Maildir/mujmap then they get what they ask for, regardless of whether or not its a subdir of the notmuch root. This has to be this way to support "side-by-side" configs (like XDG).

If on the other hand they haven't set mail_dir explicitly and its implied (by "directory" mode), then I don't see how mujmap could know this. The paths haven't changed, just the files were moved out without its knowledge.

Going back to the code, what I think I'm doing is that if the maildir is under the notmuch root, then the query becomes path:xxx/** where xxx is the path of the maildir relative to the notmuch root, but if its outside the notmuch root, we use the global path:** because what else can we do? The code is effectively converting the empty PathBuf and Err to mean the same thing: use a global query.

Ahh, maybe that's it. Are you talking about the case where the user has explicitly set mail_dir to be under the notmuch root? And then they move it later, which will change the path prefix. That could be a problem (I have not thought hard about this yet). I hadn't considered that someone could deliberately set things up this way because I only know of yours and my use cases! Hmm.

I wonder if that's the real problem here, that the path prefix is kind of baking too much info about the existing paths into the notmuch database, and if they change we don't know how to hook it up properly?

If so I'm not sure we can fix this here. Some possible solutions (off the top of my head):

  • require that if the maildir is under the notmuch dir, you have to move them together so the prefixes match
  • find some other way to match message ids than by filename (maybe notmuch properties? otherwise hard, secondary index or something)
  • force notmuch's mail_rootto follow mujmap's mail_dir so we can use the global path query only (breaks anyone that has multiple maildirs in a single notmuch index)

Something else might be to do a sanity check. If there's a bunch of stuff in the index, but none of the paths exist on disk, abort and warn the user. Hmm, and thinking more on that, maybe the answer is that if a message is not on disk, but is in notmuch, and is on the server, re-download it.

I'm not really arguing against the need for MailDirNotASubdirOfNotmuchRoot as such, I just don't know when it should fire. But honestly, I still wonder if this is a thing. Is moving things around a common thing that notmuch people like to do? (I am only a new notmuch person myself). If its not, then maybe noticing something and off and forcing a resync is the right way.

(I wrote this over the course of a busy and distracted day so I feel like everything has fallen out of my head even more than the original plan fell out of my head in the last couple of months; I'm really struggling. I'll try to think about this more over the weekend).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in your scenario, if they've set mail_dir to something ~/Maildir/mujmap then they get what they ask for, regardless of whether or not its a subdir of the notmuch root. This has to be this way to support "side-by-side" configs (like XDG).

I think maybe this is where the confusion stems. notmuch doesn't actually support this configuration; everything has to be a subdirectory of database.mail_root. database.path used to refer to the same thing as database.mail_root, but is now independent and only refers to the actual database files if both of these options are specified (see here). If you've somehow managed to index files outside of the root, I think that's actually undefined behavior and probably not good. 😕 But it seems like the best way to prevent unintended behavior is to catch this discrepancy between the two configurations as early as possible, hence MailDirNotASubdirOfNotmuchRoot.

My example was mostly just me trying to address what I thought your misconception was rather than your actual example, rather than me trying to necessarily specifically prevent these kinds of database migration issues, but I do agree that a sanity check for a really large unexpected change like this would be a great to have feature.

(I wrote this over the course of a busy and distracted day so I feel like everything has fallen out of my head even more than the original plan fell out of my head in the last couple of months; I'm really struggling. I'll try to think about this more over the weekend).

That's okay, me too 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe this is where the confusion stems. notmuch doesn't actually support this configuration; everything has to be a subdirectory of database.mail_root. database.path used to refer to the same thing as database.mail_root, but is now independent and only refers to the actual database files if both of these options are specified (see here). If you've somehow managed to index files outside of the root, I think that's actually undefined behavior and probably not good. confused

I agree that's where the confusion comes from; if using NOTMUCH_PROFILE these things can all be separated, and indeed, that's a large part of the reason I'm even doing this work!

My current notmuch config with profiles, fyi:

$ env NOTMUCH_PROFILE=home notmuch config list
...
database.backup_dir=/home/robn/.local/share/notmuch/home/backups
database.hook_dir=/home/robn/.local/share/notmuch/home/hooks
database.mail_root=/home/robn/.local/share/mujmap/home
database.path=/home/robn/.local/share/notmuch/home

And then I run mujmap with this PR as: env NOTMUCH_PROFILE=home mujmap --path /home/robn/.config/mujmap/home.toml. mail_dir is not set there, so its being taken from notmuch.

I did consider that maybe the intent of notmuch's profile support was still that the database was under the maildir, and maybe I was indeed getting accidental behaviour. The config doc is not clear on this, though perhaps that's just it not answering a question that no one asked. However, the tests for split-path configs seems to imagine that the database path and mail dir can be in very different locations, as perhaps does the commit that introduced it, and a code read seems to support that too.

I think I don't have any more to add. I (of course) think I'm right about what notmuch is supposed to be able to do here; if you're not persuaded then I'm not sure what we do next.

src/config.rs Outdated Show resolved Hide resolved
src/sync.rs Outdated Show resolved Hide resolved
src/sync.rs Outdated Show resolved Hide resolved
src/sync.rs Outdated Show resolved Hide resolved
src/cache.rs Show resolved Hide resolved
src/local.rs Outdated
.config(ConfigKey::MailRoot)
.map_or(
db.path().into(),
|root| PathBuf::from(root)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need to clone into owned types here since we call canonicalize() on the result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.path() returns &Path, root is a &str. An owned PathBuf is the only way I can see to make it all live long enough using this particular construction.

Cheap alternative; move canonicalize into both arms:

diff --git src/local.rs src/local.rs
index b347283..e40d2cc 100644
--- src/local.rs
+++ src/local.rs
@@ -15,7 +15,7 @@ use std::collections::HashMap;
 use std::collections::HashSet;
 use std::fs;
 use std::io;
-use std::path::PathBuf;
+use std::path::{Path, PathBuf};
 
 const ID_PATTERN: &'static str = r"[-A-Za-z0-9_]+";
 const MAIL_PATTERN: &'static str = formatcp!(r"^({})\.({})(?:$|:)", ID_PATTERN, ID_PATTERN);
@@ -95,8 +95,9 @@ impl Local {
         // that key (ancient version of notmuch?), use the database path.
         let mail_root = db
             .config(ConfigKey::MailRoot)
-            .map_or(db.path().into(), |root| PathBuf::from(root))
-            .canonicalize()
+            .map_or(db.path().canonicalize(), |root| {
+                Path::new(&root).canonicalize()
+            })
             .context(CanonicalizeSnafu {})?;
 
         // Figure out our maildir. Either the configured thing, or notmuch's mail root. Which in

@robn
Copy link
Contributor Author

robn commented Aug 3, 2022

Hi! Long time! Thanks for the review, I've finally got some clear air to work on this.

rustfmt, good tip. Never used in anger, and while I'm not sure I agree with all its opinions I appreciate its opinions at all!

@robn robn force-pushed the separate-paths branch 6 times, most recently from 7558af0 to 80c01bc Compare August 8, 2022 02:30
@robn
Copy link
Contributor Author

robn commented Aug 8, 2022

I pushed a new commit on top that ensures all "configured" paths are absolute.

(By "configured" I mean whatever is in the Config struct, not necessarily what's listed in the config file).

This should mean that its not necessary to consider what to do with relative paths, since we can't ever have them.

@robn robn force-pushed the separate-paths branch 3 times, most recently from fbfe22c to 7ced295 Compare August 8, 2022 07:18
@robn
Copy link
Contributor Author

robn commented Aug 8, 2022

@elizagamedev apart from the two outstanding comments above, I believe this is done!

Copy link
Owner

@elizagamedev elizagamedev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, much improved, ty. Still a few things left.

  • Run rustfmt again after further changes. I ought to make this a pre-merge Github actions check or something.

mujmap.toml.example Outdated Show resolved Hide resolved
// notmuch mail root, then search under the relative maildir path (allowing multiple
// maildirs per notmuch dir). If not, assume this is the only maildir for the notmuch dir,
// and use a global query.
let all_mail_query = mail_dir
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this isn't quite what I meant. I think I understand why you changed it like this now, to specially handle the case for if mail_dir and the notmuch root directory are the same, is that right? According to the docs, it seems like strip_prefix will yield an empty PathBuf rather than Err for this case, so there's actually no need to specially handle this case.

https://doc.rust-lang.org/std/path/struct.PathBuf.html#method.strip_prefix

But the consequences of removing the MailDirNotASubdirOfNotmuchRoot error as this commit currently exists would result in something like the following:

  • The notmuch root directory is ~/Maildir and mujmap is configured so that its mail dir is ~/Maildir/mujmap. So far so good.
  • The user decides they want to move their maildir to an external hard drive, and move it to /mnt/maildrive/Maildir, but forget to update mujmap, which still points to ~/Maildir
  • The user runs mujmap. Both directories are valid, but because strip_prefix fails here, mujmap silently both decides that it owns the entire mail database, and insists on placing mail files in the old ~/Maildir and trying to add them to the notmuch database. Potentially destructive chaos ensues.

src/config.rs Show resolved Hide resolved
@elizagamedev
Copy link
Owner

By the way, once this is merged and once I finish fixing #26 (I've done about half of it) I'm planning on releasing v2.1 in response to the bearer token changes.

If set, overrides whatever mujmap chooses. Which, for now, is just the
config dir as normal.
notmuch has an idea of where the mail dir is, which is the "root" for
its path queries. If mujmap's mail dir is under mail root, then compute a
path query relative to it. Otherwise, if mujmap's mail dir is the same
as the mail root, or somewhere outside it entirely, use a "global" path
query.
For state_dir, that's whatever the best state dir for the system is, or
the cache dir if that's not available. For mail_dir, that's whatever
notmuch says.

Of course this doesn't change anything in "directory" mode, because the
defaults are implied to be the config dir there.
For dir, as before, the mail and state dirs default to the config dir.
For file, the defaults will be computed per previous commit.
If they're not provided in config, we need to select appropriate
defaults for the OS. Move this into the config setup itself so its all
done in once place up front.
Relative paths leaking through into the sync code proper has potential
to do strange things. Rather than trying to fix this up at point of use,
instead ensure that all configured paths are absolute.

Note that we can't just canonicalize them all, as they might not exist
yet.
This is a small protection against typos in this config item. Its
possible that you actually wanted a subdir of the notmuch mail_root, but
typed it wrong. That would lead notmuch to miscompute the notmuch path
prefix, which could result in mujmap having the wrong idea about what
mail exists locally or on the server, with a risk of mail being deleted
from one or both ends.

Its hard to say what the odds are of a mistyped path matching a valid
directory, but it seems reasonably unlikely, so hopefully this will
prevent the most common case.
@robn
Copy link
Contributor Author

robn commented Aug 11, 2022

  • Run rustfmt again after further changes. I ought to make this a pre-merge Github actions check or something.

cargo fmt is running clean for me; did you spot something or is this just a "don't forget!"?

@robn
Copy link
Contributor Author

robn commented Aug 11, 2022

By the way, once this is merged and once I finish fixing #26 (I've done about half of it) I'm planning on releasing v2.1 in response to the bearer token changes.

Ack, sounds great! I will try my best to have it all back to you within the next day.

@robn
Copy link
Contributor Author

robn commented Aug 12, 2022

By the way, once this is merged and once I finish fixing #26 (I've done about half of it) I'm planning on releasing v2.1 in response to the bearer token changes.

Given my unending confusion and my weekend being full, I suggest not waiting for this PR to get a release out. I don't want to be the one holding up the auth fixes.

@elizagamedev
Copy link
Owner

cargo fmt is running clean for me; did you spot something or is this just a "don't forget!"?

It's more of a "don't forget" kind of thing, I didn't notice anything wrong.

Given my unending confusion and my weekend being full, I suggest not waiting for this PR to get a release out. I don't want to be the one holding up the auth fixes.

There's no need to rush, I haven't been active the past few days after all. I don't want to put pressure on you as a volunteer so I won't block the release for this PR in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants