Incorporate Path::Tiny #229

Closed
schwern opened this Issue Feb 19, 2013 · 16 comments

Projects

None yet

3 participants

@schwern
Contributor
schwern commented Feb 19, 2013

@dhorne suggested solving our file issues (#25, #26, #4, #140) by using Path::Tiny.

Pros...

  • @dagolden knows his stuff
  • It throws exceptions (autodie exceptions even)
  • It does a lot of what we want
  • If it doesn't have a feature we want, we can add or subclass it
  • We don't have to maintain it

Cons...

  1. Its very new
  2. It probably doesn't work on VMS
  3. It has some minor bugs
  4. It doesn't distinguish between files and directories

Rebuts...

  1. So will anything we write
  2. We're struggling to work on Windows, much less VMS
  3. So will anything we write
  4. If you're not on VMS that doesn't matter

I haven't done a detailed examination, but IMO it is Good Enough, I trust David Golden, and is certainly better than nothing at all. File handling has been a gaping hole in the perl5i interface. If it doesn't work out or something better comes along there's always perl5i::3.

Thoughts?

@ghost
ghost commented Feb 19, 2013

Without prior context (but thanks to Github for notifying me anyway), a couple thoughts:

  • Mental model is "paths" i.e. strings with some convenience behavior attached. That's why there is no file/directory distinction. Perl's built-ins don't care until you call something on them. Neither does Path::Tiny. Contrast to Path::Class, which will happily let you do 'dir("/etc/passwd")->is_dir' and think that's true.
  • It's a class so you can call methods on a string, but it's almost intentionally designed not to be easily subclassable
  • Minor bugs? If so, please open issues for them.
  • Happy to add little features if it can stay Tiny.
Contributor
schwern commented Feb 19, 2013

We're more than happy to submit features back, but we're likely going to want to subclass the hell out of it to add features that aren't tiny. Why is it a feature to be difficult to subclass?

I remember some more concerns I had. Its not UTF8 by default and has all sorts of icky foo_utf8 methods. Feels very 1995 both to have special Unicode methods and for a brand new file handling system not be UTF8 by default. IMO raw is now the exception. More pragmatically, perl5i has a "UTF8 everywhere" policy. We can include Unicode::UTF8 so performance is not an issue. If its possible to construct an object with a default binmode then we can take care of this in a subclass. Happy to patch that in.

Ubiquitous file locking concerns me. It seems like the right thing to do, but I'm concerned its going to cause mysterious deadlocks. I see the open*() methods don't lock and that's probably good enough.

@ghost
ghost commented Feb 19, 2013

Part of what makes Path::Class, et all so damn slow is all the indirection so that everything is a method call and subclasses can customize behavior. For the sake of speed, I didn't let that constraint affect my coding. It's not pathologically not subclassable, but I intentionally didn't give it any consideration.

UTF8... well... I see your point, sort of. I could conceivably refactor it a bit so the real work method is private and then a public method can set up args goto it. Or maybe I can pick up a default from the same hints that open.pm uses. (If it doesn't happen anyway. I haven't tested it.)

The flocking happens on all the slurp/lines/spew/append stuff. The pure open stuff just gives handles and you have enough rope to hang yourself. And the spew exclusive locks the tempfile that it writes to before atomically renaming. (Which is sort of overkill -- flocking a tempfile -- but correct.) So I think that should avoid most of the deadlock risk. Append has some risk, but that's sort of inherent in an append operation and you want to flock that anyway or you're crazy.

Contributor
schwern commented Feb 19, 2013

On 2/19/13 12:53 PM, David Golden wrote:

UTF8... well... I see your point, sort of. I could conceivably refactor
it a bit so the real work method is private and then a public method can
set up args goto it.

I was thinking giving objects a binmode attribute which is used as the
default if the argument is not given.

Speaking of attributes... array ref objects? skeptical look Can I
talk you out of that with some patches and benchmarks?

Or maybe I can pick up a default from the same
hints that open.pm uses. (If it doesn't happen anyway. I haven't
tested it.)

It likely does not. That's non-trivial to fix, but I've done it before.

The flocking happens on all the slurp/lines/spew/append stuff. The pure
open stuff just gives handles and you have enough rope to hang yourself.
And the spew exclusive locks the /tempfile/ that it writes to before
atomically renaming. (Which is sort of overkill -- flocking a tempfile
-- but correct.) So I think that should avoid most of the deadlock risk.
Append has some risk, but that's sort of inherent in an append operation
and you want to flock that anyway or you're crazy.

Upon looking at what things flock and what don't, I'm ok with it. It
was wise to separate methods into "raw operations" like open*() and "let
me take care of everything for you" methods like append().

Contributor

I have been trying out Path::Tiny and like it quite a lot.

I think default utf8 would be good, with ways to get non-utf8 available.

I encountered, what to me was a rather huge problem with it, though. Undefined path strings default to ".". ie path(undef) is the same as path('.'). This caused me a problem on the weekend when I had a test which called code which creates a bunch of files. The test checked that the expected files ->exists and then ->removed them. Unfortunately a bug in my test meant that I deleted all of my code. I lost my uncommitted changes, which was annoying. I reckon it would be better if calling methods on path($empty_str) and path(undef) just died. I have meant to report this, but hadn't got around to it yet.

@ghost
ghost commented Feb 19, 2013

On Mon, Feb 18, 2013 at 10:59 PM, myfwhite notifications@github.com wrote:

I encountered, what to me was a rather huge problem with it, though.
Undefined path strings default to ".". ie path(undef) is the same as
path('.'). This caused me a problem on the weekend when I had a test which
called code which creates a bunch of files. The test checked that the
expected files ->exists and then ->removed them. Unfortunately a bug in my
test meant that I deleted all of my code. I lost my uncommitted changes,
which was annoying. I reckon it would be better if calling methods on
path($empty_str) and path(undef) just died. I have meant to report this,
but hadn't got around to it yet.

I could be convinced that something like that makes sense. Though possibly
just path(undef) would be fatal, while letting path() be cwd for
convenience sake since it's a such a common thing. I'm on the fence about
path('')... it seems likely that someone passing that has a bug, so perhaps
makes sense to be fatal.

David Golden dagolden@cpan.org
Take back your inbox!http://www.bunchmail.com/
Twitter/IRC: @xdg

@ghost
ghost commented Feb 19, 2013

@schwern, https://gist.github.com/dagolden/4984147

Also, I've played a bit with caller and picking up open.pm hints. Do-able, at least for perl 5.10+, so I'll likely include that sometime soonish once I have time for proper tests and figure out ${^OPEN}, which appears weirdly orthogonal. See

$ perl -CD -we 'use open ":encoding(UTF-8)"; sub foo { use Data::Dumper; print Dumper((caller(0))[10]); print "${^OPEN}\n"} foo(); print "${^OPEN}\n"'
$VAR1 = {
      'open>' => ':encoding(UTF-8)',
      'open<' => ':encoding(UTF-8)'
    };
:utf8:utf8
:utf8:utf8

I'm perplexed, given that open.pm supposedly sets ${^OPEN}.

@ghost
ghost commented Feb 19, 2013

Path::Tiny 0.011 just shipped to CPAN with open pragma support.

I also tried out a rather hacky subclass: https://github.com/dagolden/path-tiny-utf8

Frankly, I'd rather see you just using the open pragma or having people write "slurp_utf8" or "slurp_raw". While those are longer to type, they are self-documenting.

I can release this if you really want, but I'd prefer not to if what I've done already with the open pragma for Path::Tiny will suffice.

Contributor
schwern commented Feb 19, 2013

On 2/19/13 6:53 PM, David Golden wrote:

I could be convinced that something like that makes sense. Though possibly
just path(undef) would be fatal, while letting path() be cwd for
convenience sake since it's a such a common thing. I'm on the fence about
path('')... it seems likely that someone passing that has a bug, so perhaps
makes sense to be fatal.

Deja vu all over again.
http://perl5.git.perl.org/perl.git/commit/35ae6b54cb5a8d6a9ce74c7f818c7d62df4ac621

If cwd is a common thing, don't conflate and complicate the most
commonly used function, add Path::Tiny->cwd.

Put this in as a Path::Tiny issue and I'll fix it up now.
dagolden/Path-Tiny#15

Contributor
schwern commented Feb 20, 2013

@dagolden I'm sure arrays are slightly faster than hashes, I actually can't make heads nor tails of the Dumbbench output.

array: Ran 1473 iterations (455 outliers).
array: Rounded run time per iteration: 0e-1 +/- 0.0e+00 (0.0%)
hash: Ran 166243 iterations (65790 outliers).
hash: Rounded run time per iteration: 0e-1 +/- 0.0e+00 (0.0%)

But does that make a squirt of difference to Path::Tiny? I seriously doubt it.

And if Path::Tiny honors the open pragma then that should do it for our purposes, thanks!

Contributor
schwern commented Feb 22, 2013

Path::Tiny appears to have patched up all of our concerns with one exception being error message location and that's close to being fixed.

I propose once dagolden/Path-Tiny#13 is fixed we can incorporate Path::Tiny pretty much as is.

Any further concerns?

Contributor
schwern commented Mar 18, 2013

It sounds like Path::Tiny is the thing. Switching this over to DO IT status!

This will fill a huge hole in perl5i. I'm super excited!

Contributor
schwern commented Mar 23, 2013

I tried to do this and found it increases startup time by about 25%. Even after putting Path::Tiny on a diet its about 20%. Even though we use autodie, you pay for it in every package.

A work around is to have our path load Path::Tiny on demand.

@ghost
ghost commented Mar 23, 2013

Just be sure you benchmark how many extra "require" calls outweigh that startup cost. path() called 1e3 time? 1e4 times? 1e5 times? etc. At some point when does overhead on "normal" workload outweigh startup diet?

Keep in mind that almost any real-world code is going to have deps that load some of Path::Tiny's deps, too, so the real world startup overhead is less than the standalone overhead.

Contributor
schwern commented Mar 24, 2013

I did. It added about 1/6 to the runtime cost so I made it only load once.

The startup cost of perl5i is already 200ms on my Macbook, noticeable by the user. Any slower to load and it precludes its use in command line programs. The user should not have to decide between good performance and a better language. And they shouldn't pay for what they don't use.

Contributor
schwern commented Jun 11, 2013

$file->path has been added with #247.

@schwern schwern closed this Jun 11, 2013
@schwern schwern referenced this issue Jul 23, 2014
Closed

v2.13.0 #276

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment