Skip to content
any-dl: generic mediathek-downloader ("generic" means, you also can call it "scrapertool")
OCaml Standard ML Other
Failed to load latest commit information.
COPYING rename to meet normal naming conventions
Makefile yojson for JSON-parsing added to Makefile.
OCamlMakefile Current OCamlMakefile from github.
PKGBUILD ocaml-yojson added as dependency
README select-command now also works on Document's and Document_array's.
TODO storematch now works. initial_referrer set to empty-string, instead of "-". "-" created pro… bug corrected: this function does not need a channel as parameter.
evaluate.mli Added mli-Files for strengthening the interfaces. Invalid_argument and Not_found caught arond the eval-loop, with messa…
main.mli Added mli-Files for strengthening the interfaces. Redirection is not error; default is to follow it. new function: remove_fragment_from_url
parsers.mli new function: remove_fragment_from_url result_to_string now has optvalue 'details'. With details = true: num…
rc-file.adl parser mp3bulk removed. The new mp3-parser should be able do that tas…
scriptlexer.mll '{' and '}' now also possible in match strings, when using backslash …
scriptparser.mly csv_read-read command implemented.
testparsers.adl Parsing html-table to Match_result works, but some problems need to b… function "print_pcre_error" added - print short err-message to stderr
tools.mli new function: remove_fragment_from_url


any-dl, a tool for downloading Mediathek video-files

! This stuff is becoming more and a more mature, nevertheless
! it is under development.
! Be aware, that the parser-definition-language is not frozen so far
! and may change in the future!
! For users who just want to use any-dl, without caring about
! the internals of it or it's parser-defs, this does not matter.


  The tool any-dl is inspired by, and has it's name derived from tools like
  youtube-dl, arte-dl, dctp-dl, zdf-dl, ...

  These tools are specialized downloading tools for videos
  of youtube, as well as tv-broadcasting companies.

  All these tools do download video files, and for accomplishing this task,
  they need to download and analyze webpages, via which thos cvideos are
  presented to the viewer.

  All these small tools are only programmed to work with certain video archives,
  and a lot of work is going into these kind of tools.

  any-dl is intended to be generic enough to allow downloads of videos from all
  these platforms, and for this case, probviding a Domain Specific Langage
  (DSL) which defines how the videos of a certain server can be downloaded.

  The DSL is designed to allow defining parsers, which say, how to scrape
  the archives.

  This language will explained below.

  But as normal user, you normally will not need to know this parser definition
  You just need to know how t use any-dl, and this is pretty simple.
  So, before the parser definition language will be explained,
  the usage of any-dl will be explained.

  any-dl provides a program with a certain language,
  that allows doing the parsing stuff of websites with
  focus on video download.

  If you miss a parser for a certain site or if you have written
  one by your own, please let me know.

  As any-dl does delegate the stream-downloads to certain tools,
  it will make sense to have the following tools also installed:

  - rtmpdump
  - mplayer
  - ffmpeg      (for rtsp)
  - vlc         (for rtsp)
  - youtube-dl  (for youtube/vimeo, any-dl has now a parser that is just a wrapper here)

Compilation / Installation / Setup

  When you read this README file from within the directory
  you downloaded (via git or elsewhere), then you
  already have unpacked it.

  You need to compile and install / setup the tool.

  You will need to have OCaml installed, as well as
  some libraries.

  For the libraries in use, under Debian
  the following packages for each library are listed:

  (I'm not a Debian package expert, so please send me
  corrections, if necessary.)

      netstring, netsys, netclient:




  To compile any-dl, just type "make" at the shell then.

    $  make

  The the file "any-dl" should be in the current directory.
  You can then copy it to $HOME/bin if your PATH-variable
  points to it.
  Or possible you may copy it to /usr/lovcal/bin or /usr/bin
  depending on the Linux-/Unix-system you are using, and
  the filesystem-standard it is using.

  The file "rc-file.adl" does contain needed parser-definitions
  for any-dl to work as expected.

  This file must be available in one of three places:

    - /etc/any-dl.rc

    - $XDG_CONFIG_HOME/any-dl.rc  ( default: $HOME/.config/any-dl.rc )

    - $HOME/.any-dl.rc

  So, please copy from the local dir, where you built any-dl
  the file "rc-file.adl" to one of the three places, mentioned above.

  This command would do it for the default of the XDG_CONFIG_HOME environment variable:

   $ cp rc-file.adl  $HOME/.config/any-dl.rc     # copy the config file to the XDG-default-dir

  But if the XDG_CONFIG_HOME-environment variable is set,
  it's better to use it:

   $ cp rc-file.adl  $XDG_CONFIG_HOME/any-dl.rc      # copy the config file to the XDG-dir

        ( For command-line-newbies:
           The "$" symbol in the mentioned command lines, which you have to
           type do represent the prompt of the shell; don't type it. )

  If you want to add your own parser-definitions, it would make sense
  to save the file "rc-file.adl"  in the  /etc/-directory, as mentioned above
  and then add your own parser-definitions in one of those places,
  where a any-dl config-file can be placed inside your HOME directory.

  This would have the advantage, that the config-file coming with any-dl will
  always be placed in the /etc/ directory, and your local parser-definitions
  will be saved in your local configs in $HOME.


  So, if you want to add your ADDITIONAL parsers to those that are already
  existing in the file inside "rc-file.adl" ( e.g. copied to /etc/any-dl.rc ),
  this can be done by just editing the config files in your $HOME-dir
  and write your own definitions into these files.
  It's not necessary (and also not recommended) to have there a copy of the
  "rc-file.adl", in which you added your parsers.
  Just write ONLY your own parsers into your local files.

  So, in other words: if you want to add your own parser-definitions place them
  solely in one of the default places fo config files in $HOME.
  Don't copy the stuff from "rc-file.adl" stored in /etc/any-dl.rc.

  ( If you place the config file "rc-file.adl" in $XDG_CONFIG_HOME/any-dl.rc
    you can add your parsers in $HOME/.any-dl.rc )

  Of course you can also add your additional parsers in the place,
  where all parsers from "rc-file.adl" are stored... but when you update
  to a newer version of any-dl you maybe by accident overwrite your own
  stuff with the new "rc-file.adl" coming with a newer version of any-dl.)

  IF YOU WISH TO USE OTHER CONFIG-FILES, you can specify them
  with the -f option of any-dl.
  If you use the -f option, you can give a filename-path, which is used
  as config file then.

  If you wish to add more than one config file, you can do it by just
  using the -f option more than once.


  You need to provide the url from the video archive,
  and give it to any-dl as a command line argument.
  Very often you have to quote the url inside of " and "
  so that certain symbols are not interpreted by the shell,
  from which you start any-dl.

  For example on ARTE mediathek, there is
  a telecast "Frankreichs mythische Orte", and the URL
  of it is:

  If you want  to download the video of it, at the shell it will look like this:

    $ any-dl ""

  Then any-dl would download the video. :-)
  That's all :-)

  The same principle holds true for any other archives, for which
  a parser definition already is provided.
  If there is no such parser defined, any-dl will tell you with an

  You then may ask, if there already is a parser for it available,
  written by the author of any-dl, or by any other persons.
  Or you could learn the parser definition language and program your
  own parser for that archive.
  If you send the parser you wrote to the author of any-dl,
  then in a newer release of any-dl, other people could use it also.

  By the way: there are already also some parser definitions, that are not
  focussed on certain video archives.
  There is the parser "linkextract" as well as "linkextract_xml".
  You can use them to pick out html-hyperreferences (typically called "links"
  or "references") or links in xml-files.

  To pick a certain parser can be done with the command-line switch "p":

    $ any-dl -p linkextract ""

  will print out all href's of the document (and they should all appear as absolute URLs).

  The names of all defined/available parsers can be displayed with the "l"-switch:

    $ any-dl -l

  If a parser as URLs, on which it will be invoked as default (when not using -p)
  it is also displayed with -l as switch.

  If you want to write your own parser-definitions, you need the list of
  commands. You can get it with the -c switch:

    $ any-dl -c 

  will print a list of all keywords that the lexer/scanner does accept.

  That's enough for an introduction.

  And here now follows a brief introduction into the parser definition language.

Parser-Definition Language: Intro

  Here is a simple parser definition, that allows to pick out all
  html-hyper-references from a webpage and print them.

  parsername "linkextract": ( "" )

  As you can see, the definition allows to give a parsername
  to the definition of the parser, an inbetween of "start" and "end"
  the commands that define the parser, are listed.

  A get-command that downloads the url
  (which is given via the command line) is done

  Then the commands "linkextract" and "print"
  are executed.
  So, all links from the document, referred to by the URL
  are printed.

  The part with the parantehses and quoting-symbols allows to bind certain
  URL's to this parser, so that a parser can be selected automatically
  via the URL. So, a parser, dedicated to a certain URL will be invoked
  to work on the document, that has a certain URL.

  Via command line arguments, it is possible, to select a different parser,
  to do it differently than using the defaults.

  As an example see at the parser, that does look-up for the
  video-files of the NDR-TV-broadcaster in germany:

    # Example-URL:
    parsername "ndr_mediathek_get": ( "" )
      match( "http://.*?mp4" );

      # download the video
      # ------------------
      paste("wget ",  $url );

  There you can see, that the parser-name is set to
  "ndr_mediathek_get", and the URL, to which this parser is bound by
  default is "".
  This does mean, that any URLs, that start with ""
  will be parsed with the "ndr_mediathek_get" parser.

  If you give an URL like the one in the example (shown above the parser)
  as command line argument to any-dl, then the parser "ndr_mediathek_get"
  is invoked to look for the video file.

  Again, an implicit get is invoked.
  Because the first doeument must be downloaded in any case,
  the first get is done implicitly.
  It's obvious that the first document must be downloaded,
  and it makes writing the parsers easier.

Stack and named variables

  This language is somehow special, that uses a mix of
  a stack-based language and one that allows named variables.
  The stack has a size of one value.
  Most functions use the stack. They can get their argument from there,
  as well as puttin gtheir results to the stack.
  A one-value-stack, which is used to read arguments from and save
  results to, does behave like a pipe in unix-environment.
  Something is written to a pipe by someone, and the same thing is read from a pipe by someone.
  So, the stack emulates something like a pipe.

    (Another analogy would be Perl's built in variable $_
     but a Pipe analogy does fit the picture better. I think.)

  Because this behaviour sometimes is not providing enough complexity,
  any-dl also allows to store data/results in named variables.

The NDR-example explained

  The first command does a MATCH with regular expressions
  on the contents of the first document.
  It does the match on the document, which was downloaded by the implicit
  GET-command. This document was put onto the 1-valued-stack.
  The match command reads the argument (the document) from the stack,
  tries to match for the certain regular expression, and puts the result
  onto the stack.

  Then from the result (a match is a 2D-matrix, meaning an array of an array",
  the first row (index == 0) is selected with ROWSELECT.
  The resulting selection holds an array.
  This selection-result is put to the 1-valued-stack.
  The stack-value (selection-result) is stored in the named variable "url" for later use
  via the STORE-command.

    To come back to the pipe-analogy, it's like a pipe that would look like this
      GET(<start-url>) | MATCH(<regular_expression>) | ROWSELECT( <index> ) | STORE( <varname> ) | ....

  The paste-command pastes the literal string and the contents of the named variable "url"
  together, and places the  result on the 1-valued stack.
  The system command tries to use the system() command (which you may know
  from other programming languages, the shell or the system-API)
  and as argument uses the value from the stack.

  So, if the variable "url" contanins the video-url,
  the system()-call would look like this one:

      system("wget <video-url>");

  That is the parser language explained by example.

  I hope, this example shows you, what there is all about the input language
  (parser definition language).

  It's comparingly easy (IMHO), and in this way it will be possible to have easy access
  to a lot of different video archives, all with the same tool.
  So, it is not necessary to look for tool-updates, when some URLs and how they
  are connected together, on a video-archive-page, do change.

  If something changes in the way a video url is presented on one of these
  video/archives / Mediatheken, then only the according parser-definition
  needs to be updated. The tool any-dl itself does not needed to be changed.

  Also, all the different tools that provide video-download-functionality,
  with all
  their seperated effort of the programmer (many programmers), done to make only
  certain archive be accessed, can be freed to make just the basic analyzing of
  the webpages that provide the videos, and save effort to program a tool.
  So, one tool and many archives, instead of many tools for some archives.

  So, I think the advantage may be obvious to you.

  Now, details about the language will follow.

Language Features:


  parsername "<parser-name>": ( <list-of-urls> )

  Example: see above.

  <list-of-urls> is a comma-seperated list of strings.

  Commands all end with a semicolon ( ';' ).
  Commands / functions, that do not have parameters, will be used without
  parenatheses ( '(' and ')' ).
  Only when a command / function will need arguments,
  these will be passed inside parenatheses ( '(' and ')' )
  which follow the name of the command/function.

  Some commands are available with and without parantheses.
  An example is the print-command/function.

  Stringquoting at the moment has three dfferent styles:

    String-Quoting:  "    "
    String-Quoting:  >>>  <<<
    String-Quoting:  _*_  _*_

  The language offers a stack of size 1.
  That means, that results from one command / function
  can be passed as input for the next command/function
  and this is default behaviour.
  Not all commands / functions do need the stack
  for input, and not all do leave something there
  as result (and input for following functions/commands).

  But if there is the need for transfering a result,
  normally no additional variables are needed.
  Most often, the data can be transferred from one function/command
  to the next one via the 1-valued-stack.

  But in certain cases, this is not enough.
  For these cases there are named variables also.

  To store the current data from the default-stack
  under a certain name, the command
  will be used.

  To copy (restore/recall) the value of the named variable back to the default stack,
  the command 
  can be used.

  In the paste()-command/function, it is possible, to access
  named variables via the $-notation, that you might know from
  other programming languages, like Perl for example.

  In the NDR-parser, it looks like this:
      paste("wget ",  $url );

  This does paste together the literal string "wget "
  and the contents of the named variable "url".
  The result of paste is stored at the one-valued stack.
  And the system-command uses this value as it's argument
  (and therefore downloads a file with the wget-tool).


The document(-url) given via command line is loaded automatically.
The loaded document is automatically saved as a named variable (name: "BASEDOC").

Command Line Options:

    -l list parser-definitions and related URLs
    -p <parsername>  selects a certain parser, to be used for all urls.
                     The names that can be selected can be listed with
                     the -l option, or one can look into the rc-file.
    -f filename for rc-file
    -v  verbose output
    -vv very verbose output
    -c show commands of parserdef-language
    -v verbose     
    -s safe: no download via system invoked
    -i interactive: interactive features enabled

    -a     auto-try: try all parsers
    -as    auto-try-stop: try all parsers; stop after first success

    -u     set the user-agent-string manually
    -ir    set the initial referrer from '-' to custom value 
    -ms    set a sleep-time in a (bulk-) get-command in milli-seconds
            => sleeps only for bulk-get-commands (get that would call a list of documents,
               not for single get-commands)
    -sep   set seperator-string, which is printed between parser-calls

    -help  Display this list of options
    --help  Display this list of options


  1.: Print html-links of a webpage:

    If you want to print the href-links of html,
    use any-dl with the predefined parser for link-extraction:

      $ any-dl -p linkextract   <url_list>

List of commands/keywords and a short-description of them:

    Appends tmpvar to a named variable.
    If that variable does not exist already, it's internally created as empty match-result
    (which means the appendto-command then creates the varable itself with the new data.)

    "appendto" only works on match-results.

    Two match results will be concatenated this way, and be saved in the
    named variable automatically. (No store-command is needed after append.)

    The itmes will be concatenated as Rows, so adding two matchres'
    will add the second matchres as appending it's rows to the matchres in the
    named variable.

    creates the basename of an url or filename;
    the leading filename or URL-path is removed

    call a macro.
    The macro is working like textual insertion of the commands of the
    macro at the place where the "call"-command is used.

    csv_read reads in a file as csv-file.
    The result is placed in tmpvar as Match_result.

    csv_save_as does save a *match_result* to a csv-file.
    All data is transformed to have equal number of columns in each row.
    Arguments of csv_save_as() are appended into a resulting filename.

    csv_save does save a *match_result* to a csv-file.
    All data is transformed to have equal number of columns in each row.
    The filename is derived from the used STARTURL.
    The charcater set is shrinked down to a subset of ASCII.
    ".csv" is appended automatically.

    selects columns from a match-result
        # Example:
        # --------

    deletes / removes a variable.
    It is not accessible anymore then.
    This means: accessing it can result in an error,
    because it's like accessing a variable that was not
    defined at all.

    downloads an entity and storing to a file.
        # Examples:
        # ---------
        download( $filename );

    drops a column from a match-result

    drops a row from a match-result

    just a dummy command (something like a NOP of processors)

    dump a html-page: deparses the tags, prints tags and data
    annotated; data is indented and an underline prepended.
    The underline is a multitude (defaults to 2) of the deepness
    of the nesting in the parse-tree.
    Means: the deeper something is wrapped in tags, the higher the indentation.

    dump a the data-part of a html-page: deparses the tags, prints data part,
    and NOT the tags.
    Works like un-tag html, or like a html-2-text.

    just a dummy command (something like a NOP of processors),
    but gives back Empty as tmpvar

    end-keyword for the parser-definition

    extract matching elements from data

    extract non-matching elements from data
    (grepv: grep -v)

    exit's a parse of one parser.
    This means, that the URL that is currently tried to be parsed and
    worked on, will not be further investigated.
    But if there are more than one URl given via command-line,
    then the next url will be investigated.
    This means: even if by accident your parser for one url is
    exited (e.g. you are developing the parser for that URL),
    the next one will be worked on.

    gets a document like html or xml page.
    Could also be a file, but not a stream so far.

    Decodes the HTML-Quotings like &#034; and such stuff
    back into "normal" characters.

    this is an interactive selectmatch. ("i" for interactive).
    Without the "-i" switch on the command line, it behaves
    like selectmatch().
    But when the "-i" switch is set via command line,
    then an interactive menue will be displayed, so that
    the user can select an option; this option will allow
    to select the row by the selected column-index interactively.
    The user selects a number (beginning from 0).
    The corresponding column of the selected number
    will be used for selection of the row.
    If the input is not valid, a default value will be used.
    The default value is the value, that is the second arg of
    iselectmatch(). It would be the same as a hard coded selection
    of a selectmatch().
    So, in most cases it would make sense to use iselectmatch()
    instead of selectmatch().
        # Example:
        # --------
        iselectmatch( <col_idx>, <matchpat>, <default_pattern>);

    extracts href-links from html-pages;
    relative links will tried to be converted into absolute links.

    extracting href-items of an xml-document

    displays all named variables.
    Prints variable-name only.
    (show_variables does also print the contents of the variables)

    tries to make an url from a string

    tries to match to the used pattern.
    PCRE-matches are used.
    The result is a matrix, containing of

    Please note:
      For real matches: Col 0 is the whole match, all others are the groups of a match.
      For match_results, thatare just "arrays of arrays" (not coming from a match,
      this obviously does not hold.
      If you do a match, and want only the selected groups to appear in your
      result, use
      to kick out the whole-match.

        # examples:
        match(>>>another "Regex"-String<<<);

    a multiple-select, like select, but the result will be an array
    of items (Strings or URls) not a single element.
        # Example:
        # --------

    this keywords starts the definition of a parser.

    the paste()-command creates a string from strings and variable-names (-notation).
    paste() accepts a list of items, seperated by commas (",").
        # Example:
        # --------
        paste( "literal string", $varname, "foo", $bar );

    post does make a post-request (instead of get-request) to a webserver.

    The post-data is stored in named variables; the names of the variables
    will be given to post as arguments, e.g.:
      post( "name_1", "name_2" );
    and the values will be looked up internally.
    For that purpose, the post-data has to be stored in named variables,
    before the post-command is called, so that the value for a variable
    can be looked up by the post-command.

    The URL for the post-command is taken from tmpvar.

        # Example:
        # --------
        post("valname_1", "valname_2", "valname_3"); # the values must be set as named variables before.

    print invoked without parantheses prints the value on the one-val-stack.
    print() with parantheses prints strings and variables (denoted by $-notatation),
    which means it accepts the same parameters as paste() but does not change the
    print() used on an empty string does end the line automatically.
    This means, a new line will be used for further commands.
    If you wish to print only a certain string, without line-endlings added,
    you need to use print_string()

    accepts only one string-argument and prints it.
    It prints the plain string, and does not add line-ending automatically.

    wraps the one-val-stack value with '"' and '"'.
    needed for arguments that are given to other tools,
    which will be invoked bia system() (which is invoking
    a shell).

    reads one line from stdin / console.
    Without arguments, the input is stored in the TMPVAR,
    With argument, the argument is used as variable-name,
    and the input is stored in this named variable.

        # Examples:
        # ---------

    get a named value and store it on the one-val-stack.
        # Example:
        # --------

    selects a certain row from a match-result.
        # Example:
        # --------

    saves a document to a file.
    The filename is derived from the url of the document.
    The charcater set is shrinked down to a subset of ASCII.

    saves a document to a file with filename as argument.

    selects ONE part of a tmpvar.

    Examples: select(0);

    For rows and columns:

      0 selects the document,
      1 selects the url of the document

      any other value  selects the document too

    selects document with index (starting at 0)
      selects ONE ELEMENT from a row or a column.
      The row/column must already have been selected with rowselect() or colselect().
      select() does NOT allow matches on match-results (which are a matrix internally).
        # Example:
        # --------

    allows to select a row from a match-result, by specifiying
    a column-index and a string-matching-pattern for this certain
    So, this is a more advanced rowselect() with additional matching capabilities.

    shows a match-result in a certain way; this command is
    intended to display matchese in a way, wher they can be read easily.
    Most often will be used in parser-development.
    But can of course also be used for informing the user
    on the steps that any-dl has done (e.,g. just be verbose and
    display the matches). But normally, rather developers
    will be interested in these details.

    just shows the "type" of the value in the one-val-stack.

    displays all named variables.
    Prints variable-name and contents of the variable.
    (list_variables does only print the names of the variables)

    this keyword indicates the start of the keywords section
    of a parser definition.

    store the value from the one-val-stack as named variable.
    (use recall() for getting it back to the one-val-stack, or
     $-notation in some of the commnds that accept this notatiom).
       # Example:

    Stores the tmpvar (must be match-result) to a named variable,
    with Row- and Column-Indexes as part of the name:
      storematch("MyName"); # stores matchresult as  MyName.(col).(row)
      (for all col's and row's as indexes of the match-result)

    Uses Pcre.replace internally.
        # Example:
        # --------
        subst("pattern", "subst-string");

    calls the system() command with the string that is hold in
    the one-val-stack.

  table_to_matchres  (expermental feature so far)
    converts a html-table to a match-result.
    This conversion works for single tables.
    So, a selection of a table should be as specific as possible, so that
    only one table will be seleted with tagselect.
    Then the conversion works.
    If more than one table has been extracted by tagselect, then
    they all will becoerced into ONE mathc-result.
    If that's, what is wanted, anything is fine. Otherwise, seperate table-selection will be necessary.

    Use tagselect with "htmlstring"-extractor, like this:

       # Example:
       # --------
         tagselect("table"."id"="foobar" | htmlstring );

    selects tags and "subtags" from a document tree and gives back
    data accordingly.
    selection can be a *list* of tags, and optionally the argument
    "args" or the argument "arg" with a key-parameter (of a key-value pair)
    that selects the certain argument.
    See above in the command-examples for syntax details.

    Selection list does do a selection on the firt selector-specification.
    Then the resulting stuff is again selected, and so on.

        tagselect("table", "a", "img"."align"="top"| dump);

      The document is first scanned for table's.
      The outermost match is selected. So if a table is inside a table,
      the outer tag will be selected, and the whole outer table be selected.
      The inner table would just be content of the first one.
      No in-depth selection is done.

      All found table's then are scanned for <a ...> tags, which
      should be the <a href="..."> stuff.
      From the found <a ...>-tags any img-tags inside these a-tags
      will be selected, if they also are top-aligned.

      The result then is dumped to screen/console.

    tagselect selects elements from the document tree,
    so that a selection picks that certain tag and all it's descenmdants.
    That means for example, that a data-slurp-extraction will show all data from the descendants.
    But all other extractors ONLY LOOK UP THE TOPMOST element.
    (And not the desendants)
    The reason is: that the selected element normaly is what needs to be analyzed,
    not necessarily the descendants.

    With the "anytag" selector in tagselect (e.g. 'tagselect( anytags, argpairs );' )
    ANY tag is selected, so ALL tags are TOPMOST tags, because any descendant also
    is edetected as a new tag.
    This is a depth-first selection, with each element being a top-element.
    This way you can access all descendants and analyse them, fr example
    extract all argpairs from all the tags of the whole document.

        # Examples, showing the allowed syntax:
        # -------------------------------------
        tagselect( "a"| dump );    # dumps all <a ...> tags
        tagselect( "br"| dump );   # dumps all <br>-tags
        tagselect( "table", "a"| dump ); # <a ...> inside tables will be dumped
        tagselect( "img"."src"| dump );  # <img src="...">   wil be dumped

        tagselect("table", "a", "img"."align"="top"| dump); # all img-tags with "align"="top" will be selected,
                                                            # if they appear inside a table; the stuff is dumped to screen

        tagselect( "table", "a" | argpairs );      # extract argpairs from the stuff that was selected
        tagselect( "table", "a" | arg("href") );   # extract value for the arg with key/name "href" from the stuff that was selected

        # the pair-extratcors ( "argpairs", "argkeys", "argvals" ) can be used as single-extractor-arguments
        # the other selectros select one item only (not pairs) and can be given as list, like this:

        tagselect("img"."src" | arg("src"), arg("alt") );

        # tagselect used with "anytags"-selector 
        # --------------------------------------
        # the "anytags"-selector selects ANY tags,
        # which means that ALL tags from the document are
        # picked up in depth-first manner.
        # without anytags, a match does pick a tag with all descendants.
        # But these descendants will not be extracted with a extractor-pattern!
        # --------------------------------------
        tagselect( anytags | argpairs ); # shows argpairs of ANY / ALL tags found (depth-first)

    extracts the contents from the <title>-tag of a webpage
    and puts the resutlt to the one-val-stack.

    converts the value of the one-val-stack to a string-representation.

    converts the value of the one-val-stack to a value of the same type,
    that a match-operation gives as result.
    This is an a row-column "array" and show_match-command could show
    details about it.
    Useful for later selecting/rowselecting values from this matchres-typed

    transposition of a Match_result (which is an array of arrays, or a "matrix").
    This exchanges rows and columns.

    Sorts entries. (only match-result's so far.)

    From the tmpvar it removes entries with same contents.
    Works like "uniq" from unix-toolbox, but not limited to neighbouring lines;
    or like "sort -u" without a sort (not changing the order of the entries).

    Please note: on match-results, uniq works on ROWs. This means, that multiple
    rows will be discarded.
    But multiple Columns in a row will not be touched.
    If you want to remove multiple columns of a row, you need to first transpose,
    then use uniq, and then transpose again.

    So, for uniq-ing columns (remove multiple equal columns), you need to do it this way:

New feature (as of december 2014): assignments

  It's now also possible to use assignments.

      varname = COMMAND(...);

  assigns the tmpvar that was created by the command
  (internally via Store-command) to a named variable.

  Using assignments is not different to calling a command
  and then do store(<varname>);

  It's just syntactic sugar,
  maybe helpful in situations, where many store/recall comands would be

  The tmpvar will be placed on the tmpvar-stack,
  so a previous stack-value will be replaced by the new one.

Control Structures:

  if( ... )
  then ...
  else ...

    Here the "..." stands for statements that can be used there.
    (statement-list / command-list; each command needs to be followed by a semicolon)

    Instead of "if" also "ifnotempty" and "ifne" can be used.
    All three behave the same way, but some people may prefer more
    specific keywords.
    So, if the tmpvar that the statements-list inside the "if"-command (in the paranteheses)
    lefts behind, is not empty, the condition is true.
    (Thats like in otherlanguages like C or Perl).

  while( ... )

    Here the "..." stands for statements that can be used there.
    (statement-list / command-list; each command needs to be followed by a semicolon)

    Instead of "while" also "whilenotempty" and "whilene" can be used.
    All three behave the same way, but some people may prefer more
    specific keywords.
    So, if the tmpvar that the statements-list inside the "if"-command (in the paranteheses)
    lefts behind, is not empty, the condition is true and the statementlist
    between "do" and "done" will be evaluated.
    This is repeated, as long as the statement-list between the while-parantheses
    evaluates to the empty value.

Prefedined Variables:

    The URL that is given via cli and investigated by the corresponding
    parser can access the url via this named variable.
    (recall("STARTURL") or $STARTURL)

    The document, that is retrieved via $STARTURL is avaiable as
    names variable BASEDOC.

    Cookies received from a webserver will be stored here.

    Cookies which should be send to the webserver are stored here.

    Unix-Timestamp (seconds 00h00m00s GMT 01.01.1970) as string.

Using Cookies:
  If a server sends cookies, they will be stroed in the named variable
  If you want to send them back to the server, you manually need to copy
  the contents of "COOKIES.RECEIVED" to "COOKIES.SEND".
  Just add these two commands between the commands which get cookies and which
  should send cookies:

    recall("COOKIES.RECEIVED"); # recall the cookies from named variable and put them on the one-var-stack
    store("COOKIES.SEND");      # store the cookies from the one-var-stack in the variable "COOKIES.SEND"

  This handling seems a bit unconvenient; cookies could be received and sent automagically.
  But this manual handling gives you more control over the process.
  Be aware, that the contents of "COOKIES.SEND" is not changed automatically.
  So, wzhat is stored in that variable will be used again and again in next calls to the webserver.
  So, you need to keep track of the right cookies by using the above mentioned way of
  setting "COOKIES.SEND", or delete that variable.

  A new feature was added in end of march 2015: macros.

  It's now posible to define macros, so that repeating sequences don't need to
  be coded again and again. Instead a macro can be defined that allows
  factoring-out common command sequences into macros, and then call these
  macros with the "call"-command. (See call-command above, in the
  parserdef-language description.)

  Macros can be defined anywhere in a rc-file.
  They don't need to be defined before they are used.

  The syntax of macro definitons can be seen in this example:

  defmacro "FooBar":
    tagselect( "a"."href" | arg("href") );

  This macro would be called with the command

This macro would be called with the command

Something went wrong with that request. Please try again.