Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a "string-like" class to represent filepaths #2326

Open
Tracked by #1344
Minoru opened this issue Jan 30, 2023 · 4 comments
Open
Tracked by #1344

Add a "string-like" class to represent filepaths #2326

Minoru opened this issue Jan 30, 2023 · 4 comments
Labels
refactoring This issue describes a way in which some particular part of the code could be improved rust-port

Comments

@Minoru
Copy link
Member

Minoru commented Jan 30, 2023

This is an offshoot of #1344, and it blocks that issue.

We currently represent paths with std::string. What we want is to represent them with some other class which wraps std::string.

I think the basic interface for that class should be:

  • construction from std::vector<std::uint8_t> or std::vector<char>. std::string is up to debate; if it's provided, it should definitely be marked explicit or done through a helper method with a descriptive name
  • infallible conversion to UTF-8, for logging. I think it could try to interpret the bytes as UTF-8, and where that fails, represent each byte in hexadecimal like \x80

Then this new class should be used where filepaths are used. I'd start with ConfigContainer: some of the settings contain filepaths, start with those. Also look through the utils module, as it has some filepath-related utilities like resolve_tilde.

Finally, Rust FFI should be fixed to pass filepaths as &[u8], not &str (because str is UTF-8).

This issue is up for grabs, but might not be suitable for those who see Newsboat for the first time. Don't hesitate to ask questions though!

@Minoru Minoru added enhancement This issue proposes a new feature or an improvement to an existing one rust-port refactoring This issue describes a way in which some particular part of the code could be improved and removed enhancement This issue proposes a new feature or an improvement to an existing one labels Jan 30, 2023
@bogdasar1985
Copy link
Contributor

@Minoru I wanna try to solve this issue, but I confess that I don’t quite understand it completely, so in order not to write a bunch of unnecessary code, I’ll write here a wrapper draft, questions and etc.

infallible conversion to UTF-8, for logging.

As I understand, in case of invalid UTF-8 symbol class should:

  1. Set invalid bytes in string to \x80
  2. Log it by LOG::INFO or LOG::DEBUG
  3. Signal to user about it (two variants):
    2.1 set bool flag inside class to false and let user to check it (but it broke an invariant of class and user may just forget to check!)
    OR
    2.2 throwing an exception from constructor contains string with invalid bytes set to \x80
class FilePathString {
public:
	// Constructors
	// check is UTF8 valid and signal to user as described above
	FilePathString(const std::vector<std::uint8_t>& path);

	FilePathString(const std::string& path);

	std::string data() const
	{
		return str;
	}

	// Other common string methods if necessary
private:
	std::string str;
};

Am I on the right way at all, or am I missing something?

@Minoru
Copy link
Member Author

Minoru commented Sep 2, 2023

You kind of are, but not quite. Your example seem to hint that the path will get converted to UTF-8 and stored as UTF-8, but that's exactly what we need to avoid. The class should store the path as bytes, with no assumptions about its encoding. There should be a method that returns UTF-8, we're going to use it when writing the path to logs.

Also, the class should not have the common string methods. Just treat the path as a vector of bytes, with no knowledge about what it contains. Maybe take a look at Rust's Path and PathBuf, and what methods they provide -- our class should be pretty much the same.

I actually started working on this a while ago. Take a look at feature/2326-filepath branch in this repo. I wrote the skeleton of the class and converted some classes and functions to it. The branch is quite dirty and unfinished, but hopefully it nudges you in the right direction. If you decide to pick it up, the next step would be to finish porting test helpers to the new class. Just change some path-like std::string to Filepath and fix all the compilation errors; repeat until there are no other places to change. Then start removing implicit conversions from include/filepath.h, and finally remove the ENABLE_IMPLICIT_FILEPATH_CONVERSIONS define.

The skeleton probably doesn't have all the necessary methods. They should be added as more code is converted and you see that the methods are actually needed. Keep them as similar to Path/PathBuf as possible.

I should warn you that this task is a slog :) But if you make even a singe commit, please publish your work so the next person starts a bit closer to the goal.

@bogdasar1985
Copy link
Contributor

@Minoru

#2561 (comment)

Okay! Thank you very much for the work. I'll merge and rebase it now, then create new issues and write an update on #2326.

So, where is update and what's next?

@Minoru
Copy link
Member Author

Minoru commented Dec 29, 2023

Sorry, I dropped the ball on this one! Here goes.


In #2561, @bogdasar1985 ported all interfaces and storage to the new Filepath class. I added some docs and tests, and rebased the result onto current master branch; it's available in the feature/2326-filepath branch now.

This doesn't provide any type-safety yet, because Filepath is implicitly convertible to and from std::string. These conversions, along with a few more helpers, are hidden behind an ENABLE_IMPLICIT_FILEPATH_CONVERSIONS define. The final task is to remove those.

I think the best approach to this would be to go through all .cpp files, and for each put #undef ENABLE_IMPLICIT_FILEPATH_CONVERSIONS right after includes and then fix all the compilation errors in that file. Once we treated all .cpp files like this, we can remove #undefs along with the define itself, fix a few compilation errors that might pop up in .h files, do the final rebase onto the master, and finally merge this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring This issue describes a way in which some particular part of the code could be improved rust-port
Projects
None yet
Development

No branches or pull requests

2 participants