Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AsRef<Path> instead of &str. #80

Merged
merged 5 commits into from Oct 19, 2022

Conversation

npatsakula
Copy link
Contributor

Master PR: #76
Blocked by: #79

Motivation

  1. Not every FS path is a valid utf-8 string.
  2. AsRef<Path> is strictly more acceptable with same level of correctness:
    you can use &str, Path and PathBuf.

Implementation

  • Use AsRef<Path> instead of &str.
  • Remove unnecessary type casts and unwraps from tests.

@veta666, this one is also on you.

@npatsakula npatsakula changed the title Use path Use AsRef<Path> instead of &str. Sep 12, 2022
Copy link
Owner

@guillaume-be guillaume-be left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @npatsakula for this PR.

I would be in favor of actually moving from &str to Path instead of AsRef<Path> for 2 reasons:

  1. As you mention, not every filesystem path is a valid utf-8 string. If we decide to take the path of correctness, it makes sense to expect a Path. The user can still convert the &str it may use to Path before passing it to the tokenizers and vocabs.
  2. Using Option with generics with AsRef leads to awkward None arguments, requiring the user to pass None::<Path> instead of simply None. I feel this is a high price to pay for allowing (possibly incorrects) &str as path representation.

What do you think? The next few pull requests will introduce breaking changes anyway, so I think using Path makes sense here.

I believe this would be ready to merge before #79 (I am not fully bought in the idea of snafu) so it may make sense to merge this one first and revert the error handling commits on this branch.

@npatsakula
Copy link
Contributor Author

Thank you @npatsakula for this PR.

I would be in favor of actually moving from &str to Path instead of AsRef<Path> for 2 reasons:

1. As you mention, not every filesystem path is a valid utf-8 string. If we decide to take the _path_ of correctness, it makes sense to expect a `Path`. The user can still convert the `&str` it may use to `Path` before passing it to the tokenizers and vocabs.

2. Using `Option` with generics with `AsRef` leads to awkward `None` arguments, requiring the user to pass `None::<Path>` instead of simply `None`. I feel this is a high price to pay for allowing (possibly incorrects) `&str` as path representation.

What do you think? The next few pull requests will introduce breaking changes anyway, so I think using Path makes sense here.

I believe this would be ready to merge before #79 (I am not fully bought in the idea of snafu) so it may make sense to merge this one first and revert the error handling commits on this branch.

  1. Yes, but &str is valid subset of filesystem paths (the reverse is not true). This mean that we obtain richer interface for no cost.
  2. We can use pre-turbofished interface for beginner rust programmers (link).

Now I replaced all AsRef<Path> with Path and broke all comment documentation + you can see how noisy it looks like.

@guillaume-be
Copy link
Owner

Hello @npatsakula ,

You are right I got confused into assuming that Path was a subset of &str, it does make sense to keep a richer interface. As the library now leverages the specialized new_with_special_token_mapping_path, the issue with the awkward None turbofish becomes even less of an issue.

I apologize - but you convinced me that your original choice was, indeed, the right one. I would be happy to go with AsRef<Path> (after causing some extra work on your end). If you're happy to do so I'd go back to your original implementation with AsRef<Path> but without the error handling rework for now.

I'm happy to help our revert the changes if you'd like me to do so, please let me know.

@npatsakula
Copy link
Contributor Author

Hello @guillaume-be! I reverted to AsRef<Path> (doc-comments too).

@guillaume-be
Copy link
Owner

Hello @guillaume-be! I reverted to AsRef<Path> (doc-comments too).

Thanks a lot! looks good to merge - sorry again for the unneeded iterations

@guillaume-be guillaume-be merged commit 4d8aa3e into guillaume-be:master Oct 19, 2022
@npatsakula npatsakula mentioned this pull request Oct 19, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants