diff --git a/docs/pdds/draft/pdd10_embedding.pod b/docs/pdds/draft/pdd10_embedding.pod index a3ac691339..2d2ddc923a 100644 --- a/docs/pdds/draft/pdd10_embedding.pod +++ b/docs/pdds/draft/pdd10_embedding.pod @@ -1,167 +1,155 @@ # Copyright (c) 2001-2010, Parrot Foundation. -=head1 [DRAFT] PDD10: Embedding and Extending +=head1 PDD10: Embedding =head2 Abstract -What we believe people will do when embedding and extending Parrot, why they -do it, and how. - -{{ NOTE: some of this will later move into pdds 11 & 12, but for now -just want to get the stub checked in. }} +Parrot, more precisely libparrot, can be embedded into applications to provide +a dynamic language runtime. A perfect example of this embedding is in the +Parrot executable, which is a thin wrapper around libparrot. =head2 Version -$Revision$ +Version 1 =head2 Description -Why embed: - -=over 4 - -=item * access to special features/libraries/languages Parrot provides - -=item * need an interpreter for a DSL or existing language - -=item * want to run Parrot on another platform or environment (dedicated -hardware, in a web server, et cetera) - -=back - -Why extend: - -=over 4 - -=item * need something NCI doesn't provide - -=item * writing a custom PMC - -=back - -Philosophical rules: - -=over 4 - -=item * only ever use opaque pointers - -=item * should be able to communicate through PMCs - -=item * minimize conversions to and from C data +=head3 Difference Between Embedding and Extending + +Embedding and Extending (PDD 11) are similar concepts. In both, we write code +that interfaces with libparrot. In an embedding situation we write an +application which loads and calls libparrot. In an extending situation, +libparrot loads and calls your module. + +Extending gives libparrot more features, and allows your code to execute from +inside libparrot. From that location, the extending application has full +access to the available power and features of libparrot. 
This includes +knowledge about internal structure definitions, and internal-only functions +and subsystems. + +Embedding, on the other hand, has much more limited access to libparrot. All +embedding applications must use the official embedding API, which is +limited and abstracted by design. Embedding applications must treat all +pointers and structures returned from the API as being opaque. + +=head3 The Embedding API + +The Embedding API is a special set of functions found in the C +directory. These functions may not be used internally by libparrot, and embedding +applications may not use any other functions. Breaking either of these rules +can have serious implications for application stability. + +Prior to the implementation of the new API, when libparrot had an unhandled +exception it would call the C C library function to close the +application. This is undesirable because embedding applications want the +ability to handle errors and recover from problems in libparrot. The new API +provides error handling capabilities for cases of unhandled exceptions, +including both expected EXCEPT_exit and other types of error-related +exceptions. + +The embedding API also makes sure certain details are in place, including +stack markers for the GC. Calling into libparrot without setting a valid +stack marker could cause serious (and difficult to diagnose) errors. + +The embedding API provides relatively limited interaction with libparrot, at +least from the point of view of an internals developer or an extension +developer. There are many reasons for this. First and foremost, the full power +of libparrot is almost always available through the runcore. If you want to +do something with Parrot, it is almost always easier and preferred to write +your code in a language which targets Parrot, compile it down to bytecode, and +load that bytecode into Parrot to execute. 
Almost all applications of +libparrot will involve bytecode execution at some level, and this is where +most operations become possible. + +The API also provides a powerful abstraction layer between the libparrot +internals developers and the embedding application developers. The API is +abstracted and detached enough that even large changes to the +internals of libparrot are unlikely to require any changes in the embedding +application. For instance, libparrot could completely change its entire +object model implementation and not cause a change to the API at all. + +While limited, the API is not static. If embedders need new features or +functionality, those can usually be added with relative ease. + +=head2 Using the Embedding API + +Using the Embedding API brings with it some rules that the embedding +developer must follow, and some conventions that the embedding developer +should follow unless it's unreasonable to do so. =over 4 -=item * perhaps macros; Ruby does this fairly well and Perl 5 does this -poorly +=item * The embed API operates mostly on the 4 core Parrot data types: +Parrot_PMC, Parrot_String, Parrot_Int, and Parrot_Float. The first two of +these are pointers and should be treated as opaque. -=item * minimize the number of necessary functions - -=item * probably can follow core Parrot code to some extent, but beware the -Perl 5 problem - -=over 4 - -=item * do not expose Parrot internals that may change - -=over 4 - -=item * minimize the number of headers used - -=item * minimize the number of Parrot types exposed - -=item * follow boundaries similar to those of PIR where possible - -=back - -=item * probably includes vtable functions on PMCs - -=back - -=back - -=back - -Gotchas: - -=over 4 +=item * PMCs are the primary data item. Anything more complicated than an +integer or string will be passed as a PMC. -=item * who handles signals? +=item * The number of API functions will stay relatively small. 
The purpose of +the API is not to provide the most efficient use of libparrot, but instead +the most general and abstracted one. -=item * who owns file descriptors and other Unix resources? +=item * Calls into libparrot carry a performance overhead because we have to +do error handling, stack manipulation, data marshalling, etc. It is best to +do less work through the API, and more work through bytecode and the runcore. -=item * is there an exception boundary? +=item * The embed API uses a single header file: L. +Embedding applications should use only this header file and no other header +files from Parrot. Embedding applications should NOT use +L or L, or any other files. -=item * namespace issues -- especially key related +=item * libparrot does little to no signal handling. Signals are typically the +responsibility of the embedder. -=item * probably a continuation/control flow boundary +=item * File descriptors and resource handles are typically owned by whoever +opens them first. If the embedding application tells libparrot to open a file +with a FileHandle PMC, libparrot will keep and manage that file descriptor. +Functionality may be provided to import and export sharable resources like +these. -=item * packfiles and subroutines probably too much information for -either +=item * Resources such as allocated memory are managed by whoever creates +them. If the embedding application allocates a structure and passes it in to +libparrot, the embedding application is in charge of managing and freeing that +structure. If libparrot allocates data, it will be in charge of managing and +freeing it. In many cases, data passed to or from libparrot through the API +will be copied to a new memory buffer. 
-=item * do not let MMD and other implementation details escape - -=item * okay to require some PBC/PIR/PASM for handling round-trip data - -=item * Parrot should not spew errors to STDERR when embedded - -=item * who allocates and deallocates resources passed through the boundary -level? - -=item * should be access to Parrot's event loop when embedded - -=item * passing var args to Parrot subs likely painful - -=over 4 - -=item * perhaps macros/functions to add parameters to call - -=item * build up a call signature somehow? - -=item * some abstraction for a call frame? - -=back - -=item * compiling code from a string should return the PMC Sub entry point -(:main) - -=item * are there still directory path, loading, and deployment issues? - -=item * how do dynamic oplibs and custom PMCs interact? - -=item * what's the best way to handle character sets and Unicode? - -=back - -=head2 Definitions - -Embedding - using libparrot from within another program, likely with a -C/NCI/FFI interface - -Extending - writing Parrot extensions, likely through C or another language - -In practice, there is little difference between the two; mostly in terms of -who has control. The necessary interfaces should stay the same. +=item * libparrot will not output error information to C unless +specifically requested to. Instead, libparrot will gather all error +information and make it available to the user through function calls. =head2 Implementation -Implementation details. +The embedding API has two goals: To allow access to libparrot as a dynamic +language runtime and bytecode interpreter, and to encapsulate implementation +details internal to libparrot from the embedding application. -Simplicity is the main goal; it should be almost trivial to embed Parrot in an -existing application. It must be trivial to do the right thing; the APIs must -make it so much easier to work correctly than to make mistakes. 
This means, -in particular, that: +There are several guidelines for the embedding API implementation that +developers of it should follow: =over 4 -=item * it should never be possible to crash or corrupt the interpreter when -following the interface as documented +=item * It should never be possible to crash or corrupt the interpreter when +following the interface as documented. The interpreter should be able to be +used and reused until it is explicitly destroyed. + +=item * It should never be possible for libparrot to crash, corrupt, or +forcibly exit the embedding application. Also, libparrot should never use +resources which haven't been assigned to it, such as standard IO handles +C, C, and C. -=item * each API call or element should have a single purpose +=item * Each API function should have a single purpose, and should avoid +duplication of functionality as much as possible. A coarse-grained API is +preferable to a fine-grained one, even if some performance must be +sacrificed. -=item * names must be consistent in the API documentation and the examples +=item * Names must be consistent in the API documentation and the examples. +All API functions are named C. -=item * it I be possible to embed Parrot I Parrot through NCI, -as a test both of the sanity of the external interface as well as NCI +=item * The return value of every API function should be an integer value. The +return value should be 1 on success, and 0 on failure. No other results should +be returned. =back @@ -169,88 +157,61 @@ as a test both of the sanity of the external interface as well as NCI It is the external code's duty to create, manage, and destroy interpreters. -C returns an opaque pointer to a new interpreter: - - Parrot_Interp Parrot_new(Parrot_Interp parent); - -C can be NULL for the I interpreter created. All subsequent -calls to this function should pass an existing interpreter. - -I - -C destroys an interpreter and frees its resources. 
- - void Parrot_destroy(Parrot_Interp); - -I - -=head3 Working with Source Code and PBC Files - -Perhaps the most common case for working with code is loading it from an -external file. This may often be PBC, but it must also be possible to load -code with any registered compiler. This I be a single-stage operation: - - Parrot_PMC Parrot_load_bytecode( Parrot_Interp, const char *filepath ); - - Parrot_PMC Parrot_load_hll_code( Parrot_Interp, const char *compiler, - const char *filepath ); - -The PMC returned will be the Sub PMC representing the entry point into the -code. That is, it will be the PMC representing the C<:main> subroutine, if -one exists, or the first subroutine in the file. - -If there is an error -- such that the file does not exist, the compiler is -unknown, or there was a compilation or invalid bytecode error -- the PMC -should be an Exception PMC instead. +C returns an opaque pointer to a new interpreter, +with some options set in it. The definition of C +is as follows: -I + Parrot_Int + Parrot_api_make_interpreter(Parrot_PMC parent, Parrot_Int flags, + Parrot_Init_Args *args, Parrot_PMC * interp); -I and C exposes -the details of packfiles to the external API and uses two operations to -perform a single logical operation.> +A common usage pattern for making an interpreter is: -I can load PBC, PIR, and PASM files without having a -compiler named explicitly.> + Parrot_PMC interp = NULL; + Parrot_Init_Args *args = NULL; + GET_INIT_ARGS(args); + if (!Parrot_api_make_interpreter(NULL, 0, args, &interp)) { + fprintf(stderr, "Could not create interpreter"); + exit(EXIT_FAILURE); + } -Compiling source code generated or read from the host application is also -possible: +C can be NULL for the I interpreter created, or where the +interpreter does not have a logical parent. If a parent is provided, the new +interpreter will have a child/parent relationship with the parent interp. 
- Parrot_PMC Parrot_compile_string( Parrot_Interp, Parrot_String compiler, - const char *code, - Parrot_String error ); +The C parameter contains a bit-wise combination of certain startup +flags that govern interpreter creation. It is safe to set this to 0 unless +special needs require it to be otherwise. -The potential return values are the same as for loading code from disk. +The C parameter is a structure containing a series of options that must +be set on the interpreter during initialization. These options, many of which +deal with the memory subsystem and other deep internals, can typically be +ignored. C can be C if no special options need to be set. -I to F.> +The new interpreter PMC is returned in the last parameter. -=head3 Working with PMCs +C destroys an interpreter and frees +its resources. -TBD. + Parrot_Int Parrot_api_destroy_interpreter(Parrot_Interp); -=head3 Calling Functions +It is a good idea to destroy child interpreters before destroying their +parents. -TBD. - -=head3 Calling Opcodes - -TBD. - -=head2 Language Notes - -It should be possible to register a compiler for an HLL with an interpreter -such that it is possible to load source code written in that language or pass -source code to an interpreter successfully. +=head3 Working with Source Code and PBC Files -=head2 References +libparrot natively executes .pbc bytecode files. These are manipulated in +Parrot through a PMC interface. PBC PMCs can be obtained in a number of ways: +they can be returned from a compiler, they can be loaded from a .pbc file, or they can +be constructed on the fly. -None. +I -=cut +Once a PBC PMC is obtained, several things can be done with it: It can be +loaded into libparrot as a library and individual calls can be made into it. +It can also be executed directly as an application, which will trigger the +C<:main> function, if any. The PMC can also be written out to a .pbc file for +later use. -__END__ -Local Variables: - fill-column:78 -End: