GitHub - hercynium/Pod-Stupid: The simplest, stupidest 'pod parser' possible

NAME
    Pod::Stupid - The simplest, stupidest 'pod parser' possible

VERSION
    version 0.005

SYNOPSIS
      use Pod::Stupid;
  
      my $file = shift; # '/some/file/with/pod.pl';
      my $original_text = do { local( @ARGV, $/ ) = $file; <> }; # slurp
  
      my $ps = Pod::Stupid->new();
  
      # in scalar context, returns a reference to an array of hashes (AoH).
      my $pieces = $ps->parse_string( $original_text );
  
      # get your text sans all POD
      my $stripped_text = $ps->strip_string( $original_text );
  
      # reconstruct the original text from the pieces...
      substr( $stripped_text, $_->{start_pos}, 0, $_->{orig_txt} )
          for grep { $_->{is_pod} } @$pieces;
  
      print $stripped_text eq $original_text ? "ok - $file\n" : "not ok - $file\n";

DESCRIPTION
    This module was written to do one simple thing: Given some text as
    input, split it up into pieces of POD "paragraphs" and non-POD
    "whatever" and output an AoH describing each piece found, in order.

    The end user can do whatever s?he wishes with the output AoH. It is
    trivially simple to reconstruct the input from the output, and hopefully
    I've included enough information in the inner hashes that one can easily
    perform just about any other manipulation desired.

INDESCRIPTION
    There are a bunch of things this module will NOT do:

    *   Create a "parse tree"

    *   Pod validation (it either parses or not)

    *   Pod cleanup

    *   "Handle" encoded text (but it *should* still parse)

    *   Feed your cat

    However, it may make it easier to do any of the above, with a lot less
    time and effort spent trying to grok many of the other POD parsing
    solutions out there.

    A particular design decision I've made is to avoid needing to save any
    state. This means there's no need or advantage to instantiating an
    object, except for your own preferences. You can use any method as
    either an object method or a class method and it will work the same way
    for both. This design should also discourage me from trying to bloat
    Pod::Stupid with every feature that tickles my fancy (or yours!) but
    still, I encourage any feature requests!
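
    As a rough illustration of that stateless design, here is a minimal
    sketch of a class whose method behaves the same whether it is called
    on the class or on an object. The package "My::Stateless" and its
    "count_commands" method are hypothetical stand-ins invented for this
    example; this is not Pod::Stupid's actual source.

```perl
use strict;
use warnings;

# Hypothetical stand-in package; this is NOT Pod::Stupid's source.
package My::Stateless;

# The constructor keeps no state: the object is an empty blessed hash.
sub new {
    my $class = shift;
    return bless {}, ref($class) || $class;
}

# The invocant (class name or object) is accepted but never consulted,
# so both calling styles give the same result.
sub count_commands {
    my ( $invocant, $text ) = @_;
    my $count = () = $text =~ /^=\w+/mg;
    return $count;
}

package main;

my $text = "=head1 NAME\n\nSome text\n\n=cut\n";

my $as_class  = My::Stateless->count_commands($text);
my $as_object = My::Stateless->new->count_commands($text);

print "class: $as_class, object: $as_object\n";
```

    Because the invocant is ignored, nothing can leak between calls,
    which is the property the paragraph above describes.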

METHODS
  new
    The most basic object constructor possible. It currently takes no
    options because the object neither has nor needs to keep any state.

    This is only here if you want to use this module with an OO interface.

  parse_string
    Given a string, parses for pod and, in scalar context, returns an AoH
    describing each pod paragraph found, as well as any non-pod.

      # typical usage
      my $pieces = $ps->parse_string( $text );
  
      # to separate pod and non-pod
      my @pod_pieces     = grep { $_->{is_pod}  } @$pieces;
      my @non_pod_pieces = grep { $_->{non_pod} } @$pieces;

  strip_string
    Given a string or string ref, and (optionally) an array of pod pieces,
    returns a copy of the string with all pod stripped out and an AoH
    containing the pod pieces. If passed a string ref, that string is
    modified in-place. Either way, the stripped string and the array of
    pod pieces are still available as return values.

      # most typical usage
      my $txt_nopod = $ps->strip_string( $text );
  
      # pass in a ref to change string in-place...
      $ps->strip_string( \$text );   # $text no longer contains any pod
  
      # if you need the pieces...
      my ( $txt_nopod, $pieces ) = $ps->strip_string( $text );
  
      # if you already have the pod pieces...
      my $txt_nopod = $ps->strip_string( $text, $pod_pieces );

KNOWN LIMITATIONS
    *   Currently only works on files with unix-style line endings.
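
    One possible workaround, sketched here under the assumption that you
    are free to modify a copy of the input, is to convert line endings to
    unix style before parsing. The helper name "normalize_newlines" is
    made up for this example and is not part of Pod::Stupid.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Pod::Stupid): convert CRLF and lone CR
# line endings to LF and return the converted copy.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $dos_text  = "=head1 NAME\r\n\r\nFoo\r\n\r\n=cut\r\n";
my $unix_text = normalize_newlines($dos_text);

print $unix_text =~ /\r/ ? "still has CR\n" : "unix line endings only\n";
```

    Note that any offsets computed afterwards refer to the converted
    text, not to the bytes of the original file.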

TODO
    This is only what I've thought of... suggestions *very* welcome!!!

    *   Fix aforementioned limitation

    *   More comprehensive tests

    *   A utility module to do common things with the output

CREDITS
    Uri Guttman for giving me the task that led to my shaving this
    particular yak

SEE ALSO
    *   Pod::Simple

    *   Pod::Parser

    *   Pod::Stripper

    *   Pod::Escapes

    *   perlpod

    *   perlpodspec

    *   perldoc

    *   and about a million other things...

POD TERMINOLOGY FOR DUMMIES (aka: me)
  paragraphs
    In Pod, everything is a paragraph. A paragraph is simply one or more
    consecutive lines of text. Multiple paragraphs are separated from each
    other by one or more blank lines.

    Some paragraphs have special meanings, as explained below.
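
    The definition above can be sketched as a small splitting routine.
    This is only an illustration, not how Pod::Stupid itself works; the
    helper name "split_paragraphs" is hypothetical, and treating lines
    that contain only spaces or tabs as blank is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into paragraphs on runs of blank lines.
# A "blank" line here is empty or contains only spaces and tabs.
sub split_paragraphs {
    my ($text) = @_;
    $text =~ s/\A(?:[ \t]*\n)+//;    # drop leading blank lines
    $text =~ s/\s+\z//;              # drop trailing whitespace
    return grep { length } split /\n(?:[ \t]*\n)+/, $text;
}

my $text = join "\n",
    '=head1 NAME', '', 'Foo - does foo', '', '',
    '  my $x = 1;', '  my $y = 2;', '', '=cut', '';

my @paragraphs = split_paragraphs($text);
printf "%d paragraphs\n", scalar @paragraphs;
print "[$_]\n" for @paragraphs;
```

    The leading whitespace of an indented paragraph is preserved, which
    matters for telling verbatim paragraphs apart later on.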

  command
    A command (aka directive) is a paragraph whose first line begins with a
    character sequence matching the regex m/^=([a-zA-Z]\S*)/

    I've actually been a bit more generous, matching m/^=(\w+)/ instead.
    Don't rely on that though. I may have to change it to be closer to the
    spec someday.

    In the above regex, the type of command would be in $1. Different types
    of commands have different semantics and validation rules yadda yadda.
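
    To make the difference between the two regexes concrete, the sketch
    below runs both against a few sample lines. The helper names
    "command_type_spec" and "command_type_generous" and the sample lines
    are invented for this example.

```perl
use strict;
use warnings;

# Return the command type captured by the spec-style regex, or undef.
sub command_type_spec {
    my ($line) = @_;
    return $line =~ m/^=([a-zA-Z]\S*)/ ? $1 : undef;
}

# Return the command type captured by the more generous regex, or undef.
sub command_type_generous {
    my ($line) = @_;
    return $line =~ m/^=(\w+)/ ? $1 : undef;
}

for my $line ( '=head1 NAME', '=for-html stuff', '=1up', 'plain text' ) {
    my $spec     = command_type_spec($line)     // '(no match)';
    my $generous = command_type_generous($line) // '(no match)';
    printf "%-18s spec: %-12s generous: %s\n", $line, $spec, $generous;
}
```

    The two agree on "=head1 NAME". They differ on "=for-html" (the spec
    regex captures "for-html", the generous one stops at "for") and on
    "=1up" (only the generous regex matches, because \w allows a leading
    digit).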

    Currently, the following command types (directives) are described in the
    Pod Spec <http://perldoc.perl.org/perlpodspec.html> and technically, a
    proper Pod parser should consider anything else an error. (I won't
    though)

    *   head[\d] (\d is a number from 1-4)

    *   pod

    *   cut

    *   over

    *   item

    *   back

    *   begin

    *   end

    *   for

    *   encoding

  directive
    Ostensibly a synonym for a command paragraph, I consider it a subset of
    that, specifically the "command type" as described above.

  verbatim paragraph
    This is a paragraph where each line begins with whitespace.

  ordinary paragraph
    This is a paragraph where each line does not begin with whitespace.
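
    A minimal sketch of telling the three kinds of paragraph apart
    follows. It looks only at the start of the paragraph, which is an
    assumption based on my reading of perlpodspec, and it ignores data
    paragraphs entirely. The function name "classify_paragraph" is
    hypothetical.

```perl
use strict;
use warnings;

# Hypothetical classifier: decide the paragraph kind from its first line.
# Assumption: checking only the start of the paragraph is sufficient.
sub classify_paragraph {
    my ($para) = @_;
    return 'command'  if $para =~ m/\A=[a-zA-Z]/;
    return 'verbatim' if $para =~ m/\A[ \t]/;
    return 'ordinary';
}

for my $para ( "=head1 NAME", "  my \$x = 1;\n  my \$y = 2;", "Just some text." ) {
    my ($first_line) = split /\n/, $para;
    printf "%-10s %s\n", classify_paragraph($para), $first_line;
}
```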

  data paragraph
    This is a paragraph that is between a pair of "=begin identifier" ...
    "=end identifier" directives where "identifier" does not begin with a
    literal colon (":")

    I do not plan on handling this type of paragraph in any special way.

  block
    A Pod block is a series of paragraphs beginning with any directive
    except "=cut" and ending with the first occurrence of a "=cut" directive
    or the end of the input, whichever comes first.
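
    The block definition can be sketched as a simple line scanner. This
    is an illustration only, not Pod::Stupid's implementation: the name
    "pod_blocks" is made up, and the sketch simplifies by not requiring
    a blank line before a directive.

```perl
use strict;
use warnings;

# Hypothetical helper: return each Pod block found in $text as a string.
# A block starts at any directive other than =cut and runs through the
# first =cut line, or to the end of the input.
sub pod_blocks {
    my ($text) = @_;
    my ( @blocks, $current );
    for my $line ( split /^/m, $text ) {
        if ( !defined $current ) {
            # Outside a block: wait for a directive that is not =cut.
            next unless $line =~ /^=[a-zA-Z]/ && $line !~ /^=cut\b/;
            $current = '';
        }
        $current .= $line;
        if ( $line =~ /^=cut\b/ ) {
            push @blocks, $current;
            undef $current;
        }
    }
    push @blocks, $current if defined $current;    # unterminated block
    return @blocks;
}

my $text = "code\n=head1 A\n\ntext\n\n=cut\nmore\n=pod\n\ntail\n";
my @blocks = pod_blocks($text);
printf "found %d blocks\n", scalar @blocks;
```

    The second block in the sample has no closing "=cut", so it runs to
    the end of the input, matching the "whichever comes first" rule.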

  piece
    This is a term I'm introducing myself. A piece is just a hash
    containing info on a parsed piece of the original string. Each piece is
    either pod or not pod. If it's pod it describes the kind of pod. If it's
    not, it contains a 'non_pod' entry. All pieces also include the start
    and end offsets into the original string (starting at 0) as 'start_pos'
    and 'end_pos', respectively.
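
    The sketch below uses hand-built pieces to show why the
    reconstruction in the SYNOPSIS works. The piece data is illustrative
    and not real parser output; it uses only the 'is_pod', 'orig_txt' and
    'start_pos' keys mentioned in this document, and computes the offsets
    with index() rather than calling Pod::Stupid.

```perl
use strict;
use warnings;

my $original = "print 1;\n\n=head1 NAME\n\n=cut\n\nprint 2;\n";

# Hand-built pod pieces (illustrative data, not real parser output).
my @pod_pieces = map {
    +{ is_pod => 1, orig_txt => $_, start_pos => index( $original, $_ ) }
} ( "=head1 NAME\n\n", "=cut\n\n" );

# Strip: remove pieces from last to first so earlier offsets stay valid.
my $stripped = $original;
substr( $stripped, $_->{start_pos}, length( $_->{orig_txt} ), '' )
    for reverse @pod_pieces;

# Rebuild: insert pieces from first to last; each insertion restores the
# offsets that the following pieces expect.
my $rebuilt = $stripped;
substr( $rebuilt, $_->{start_pos}, 0, $_->{orig_txt} ) for @pod_pieces;

print $rebuilt eq $original ? "ok\n" : "not ok\n";
```

    Order matters: 'start_pos' is an offset into the original string, so
    reinserting in ascending order puts every earlier piece back before
    a later offset is used.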

AUTHOR
    Stephen R. Scaffidi <sscaffidi@cpan.org>

COPYRIGHT AND LICENSE
    This software is copyright (c) 2010 by Stephen R. Scaffidi.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.