Super basic documentation.
$Id$
CTABLES - A Tcl interface for defining tables at the "C" level, generating
a Tcl C extension to create, access and manipulate those tables, compiling
that extension, linking it as a shared library, and making it loadable on
demand as a C shared library via Tcl's "package require" mechanism.
Why CTables?
------------
Tcl is not well-known for its ability to represent complex data structures.
Yes, it has lists and associative arrays and, in Tcl 8.5, dicts. Yes,
object systems such as Incr Tcl provide a way to create somewhat complex
data structures and yes, the BLT toolkit, among probably others, has created
certain more efficient ways to represent data (a vector type, for instance)
than are available by default and, yes, you can play games with "upvar" and
namespaces to create relatively complex structures.
However, there are three typical problems with rolling your own complex
data structures using lists, arrays, and upvar, or Itcl or Otcl, etc:
One is that they are memory-inefficient. Tcl objects use substantially
more memory than native C. For example, an integer stored as a Tcl
object has the integer and all the overhead of a Tcl object, 24 bytes
minimum and often way more. When constructing stuff into lists, there is
an overhead to making those lists, and the list structures themselves
consume memory, sometimes a surprising amount since Tcl tries to avoid
allocating memory on the fly by often allocating more than you need, and
sometimes much more than you need. (It is not uncommon to see ten or
twenty times the space consumed by the data itself used up by the
Tcl objects, lists, array, etc, used to hold them. Even on a modern
machine, using 20 gigabytes of memory to store a gigabyte of data is
at a minimum kind of gross and at worst makes the solution unusable.)
(Tcl arrays also store the field name along with each value, which is
inherently necessary given their design but is yet another example of the
inefficiency of this approach.)
The second problem with rolling your own complex structures is that they
are computationally inefficient. Constructing complicated structures out
of lists, arrays, etc, and traversing and updating them, is relatively
expensive in CPU time.
Finally, such approaches are often clumsy and obtuse. A combination of
upvar and namespaces and lists and arrays to represent a complex structure,
for example, creates a relatively opaque way of expressing and manipulating
that structure, making the code twisted, hard to follow, hard to teach, hard
to modify, and hard to hand off.
CTables reads a structure definition and emits C code to define that structure.
We generate a full-fledged C extension that manages rows as native C structs,
and emit subroutines for manipulating those rows in an efficient
manner. Memory efficiency is high because we have very little per-row
overhead (only a hashtable entry beyond the size of the struct itself).
Computational efficiency is high because we are reasonably clever about
storing and fetching those values, particularly when populating from
PostgreSQL database query results, reading them from a Tcl channel containing
tab-separated data, writing them tab-separated, locating them, updating
them, counting them, as well as importing and exporting by other means.
We also maintain a "null value" bit per field and provide ways to distinguish
between null values and non-null values, similar to SQL databases, providing
a ready bridge between those databases and our tables.
Data Types
----------
The following data types are available. Other types can be fairly easily
added, and how to do that is documented in the README file.
boolean - a single 0/1 bit
varstring - a variable-length string
fixedstring - a fixed-length string
short - a short integer
int - a machine native integer
long - a machine native long
wide - a 64-bit wide integer (Tcl Wide)
float - a floating point number
double - a double precision floating point number
char - a single character
mac - an ethernet MAC address
inet - an internet IP address
tclobj - a Tcl object, more on this powerful capability later
Example
-------
Let's define a ctable:
package require ctable
CExtension cable 1.1 {
    CTable cable_info {
        inet ip
        mac mac
        varstring name
        varstring address
        varstring addressNumber
        varstring geos
        int i
        int j
        long ij
        tclobj extraStuff
    }
}
Everything is defined in the CExtension. This will generate a C extension
named Cable, compile it and link it. Multiple tables can be defined in
one CExtension definition.
The name of the C extension follows the CExtension keyword, followed by
the version number, and then a code body containing table definitions.
When the above extension is sourced, a C source file will be generated,
compiled, and linked as a shared library.
After sourcing in the above definition, you can do a "package require Cable"
or "package require Cable 1.1" and it will load the extension.
For efficiency's sake, we detect whether or not the C extension has been
altered since the last time it was generated as a shared library, and
avoid the compilation and linking when it isn't necessary.
Sourcing the above code body and doing a "package require Cable" will
create one new command, cable_info, corresponding to the defined table.
We call this command a metatable.
cable_info create x - creates a new object, x, that is a Tcl command that
will manage and manipulate zero or more rows of the cable_info table.
You can create additional instances of the table using the metatable's
"create" method that will be independent of this one.
You can also say...
set obj [cable_info create #auto]
...to create a new instance of the table (containing, at first, zero rows),
without having to generate a unique name for it.
Additional metatable methods are, for example:
cable_info info - which currently does nothing (boring)
cable_info null_value \\N - which sets the default null value for all
fields to, in this case, \N -- note eventually we will be able to
override this within specific instances of the table -- for now,
though, it is global to all tables of the specified type.
cable_info method foo bar - this will register a new method named
"foo" and then invoke the proc "bar" with the arguments being
the name of the object followed by whatever arguments were passed.
For example, if after executing "cable_info method foo bar" and
creating an instance of the cable_info table named x, if you
executed
x foo a b c d
...then proc "bar" would be called with the arguments "x a b c d".
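For instance, a minimal sketch of a proc matching that calling convention
(the proc body here is hypothetical):

    proc bar {table args} {
        # $table is the ctable instance name; $args holds the caller's arguments
        puts "bar called on $table with $args"
    }

With that in place, "x foo a b c d" prints "bar called on x with a b c d".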
NOTE - Tcl appears to examine a shared library name and stop at the first
numeric digit in an apparently somewhat inadequate attempt to make sure it
doesn't include shared library version numbers in the expected *_Init and
*_SafeInit function names for the library being generated. Consequently
when you're defining a C extension via the CExtension command, do not
include any digits in your C extension's name.
Where Stuff Is Built
--------------------
The generated C source code, .o object file, and shared library are written
in a directory called "build" underneath the directory that's current
at the time the CExtension is sourced, unless a build path
is specified. For example, after the "package require ctable" and outside
of and prior to the CExtension definition, if you invoke
CTableBuildPath /tmp
...then those files will be generated in the /tmp directory. (It's a bad
idea to use /tmp on a multiuser machine, of course, but could be OK for
an appliance or something like that.)
Note that the specified build path is appended to the Tcl library search
path variable, auto_path, if it isn't already in there.
Methods for Manipulating CTables
--------------------------------
Now the nitty gritty... The following built-in methods are available as
arguments to each instance of a ctable:
get, set, array_get, array_get_with_nulls, exists, delete, count, search,
search+, foreach, incr, type, import, import_postgres_result, export,
fields, fieldtype, needs_quoting, names, reset, destroy, statistics,
write_tabsep, read_tabsep, index
For the examples, assume we have done a "cable_info create x"
set
x set key field value ?field value...?
or
x set key keyValueList
The key is required and it must be unique. It can contain anything you
want. It is not also stored as a field in the table. We may change this in the
future to make it possible to have tables that do not require any keys
(there is already a provision for this, though incomplete) and also to
allow more than one key. But for now, lame or not, this is how it works,
and as Peter says, for more than one key, you can always create some kind
of compound key.
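For example, a sketch of a compound key built from two hypothetical
variables:

    x set "$lastName:$firstName" name "$firstName $lastName" i 501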
% x set peter ip 127.0.0.1 name "Peter da Silva" i 501
In the above example, we create a row in the cable_info table named "x"
with an index of "peter", an ip value of 127.0.0.1, a name of "Peter
da Silva", and an "i" value of 501. All fields in the row that have
not been set will be marked as null. (Any field set to the
null value will likewise be marked as null.)
% set values [list ip 127.0.0.1 name "Peter da Silva" i 501]
% x set peter $values
In this example, we specify the value as a list of key-value pairs.
This is a natural way to pull an array into a ctable:
% x set key [array get dataArray]
fields
"fields" returns a list of defined fields, in the order they were defined.
% x fields
ip mac name address addressNumber geos i j ij
get
Get fields. Get specified fields, or all fields if none are specified,
returning them as a Tcl list.
% x get peter
127.0.0.1 {} {Peter da Silva} {} {} {} 501 {} {}
% x get peter ip name
127.0.0.1 {Peter da Silva}
array_get
Get specified fields, or all fields if none are specified, in
"array get" (key-value pair) format. Note that if a field is null,
it will not be fetched.
% x array_get peter
ip 127.0.0.1 name {Peter da Silva} i 501
% x array_get peter ip name mac
ip 127.0.0.1 name {Peter da Silva}
array_get_with_nulls
Get specified fields, or all fields if none are specified, in "array get"
(key-value pair) format. If a field contains the null value, it is
fetched anyway. (Yes this should probably be an option switch to array_get
instead of its own method.)
% x array_get_with_nulls peter
ip 127.0.0.1 mac {} name {Peter da Silva} address {} addressNumber {} geos {} i 501 j {} ij {}
% x array_get_with_nulls peter ip name mac
ip 127.0.0.1 name {Peter da Silva} mac {}
Note that if a null value has been set, that value will be returned
rather than the default null value of an empty Tcl object.
% cable_info null_value \\N
% x array_get_with_nulls peter
ip 127.0.0.1 mac \N name {Peter da Silva} address \N addressNumber \N geos \N i 501 j \N ij \N
% x array_get_with_nulls peter ip name mac
ip 127.0.0.1 name {Peter da Silva} mac \N
exists
Return 1 if the specified key exists, 0 otherwise.
% x exists peter
1
% x exists karl
0
delete
Delete the specified row from the table. Returns 1 if the row existed, 0
if it did not.
% x delete karl
0
% x set karl
% x delete karl
1
% x delete karl
0
count
Return a count of the number of rows in the table.
% x count
1
search
Search for matching rows and take actions on them, with optional sorting.
Search is a powerful element of the ctables tool that can be leveraged
to do many things traditionally done with database systems that incur
much more overhead.
Search can perform brute force multivariable searches on a ctable and
take actions on matching records, without Tcl interpretation for
each row. On modern (circa 2006) Intel and AMD machines, search can do,
for example, unanchored string match searches at a rate of three million
rows per CPU second (330 nanoseconds per row), with a lot of optimization
still not attempted.
$ctable search ?-sort {?-?field..}? ?-fields fieldList? ?-glob pattern? ?-regexp pattern? ?-compare list? ?-countOnly 0|1? ?-offset offset? ?-limit limit? ?-code codeBody? ?-write_tabsep channel? ?-key keyVar? ?-get varName? ?-array_get varName? ?-array_get_with_nulls varName? ?-include_field_names 0|1?
-sort
Sort results based on the specified field or fields. If multiple
fields are specified, earlier fields take precedence; in other
words, the first field is the primary sort key.
If you want to sort a field in descending order, put a dash
in front of the field name.
-fields
Restrict search results to the specified fields.
If you have a lot of fields in your table and only need a few,
using -fields to restrict retrieval to the specified fields will
provide a nice performance boost.
Fields that are used for sorting and/or for comparison expressions
do not need to be included in -fields unless you want those fields
retrieved.
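A quick sketch using the cable_info table from earlier (the -key, -get
and -code options are described below):

    x search -fields {ip name} -key k -get row -code {
        puts "$k -> $row"
    }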
-glob pattern
Perform a glob-style comparison on the key, excluding the examination
of rows not matching.
-regexp pattern
Not currently implemented.
-countOnly 0|1
If 1, counts matching rows and returns the count rather than
taking any other action on them.
-offset offset
If specified, begins actions on search results at the "offset"
row found. For example, if offset is 100, the first 100 matching
records are bypassed before the search action begins to be
taken on matching rows.
-limit limit
If specified, limits the number of rows matched to "limit".
Even if used with -countOnly, -limit still works so if, for example,
you want to know if there are at least 10 matching records in the
table but you don't care what they contain or if there are more than
that many, you can search with -countOnly 1 -limit 10 and it will
return 10 if there are ten or more records.
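For example, a sketch of that "at least ten rows" test, using the "i"
field of the cable_info table from earlier:

    set n [x search -compare {{> i 100}} -countOnly 1 -limit 10]
    # n is 10 if ten or more rows match, otherwise the exact match count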
-write_tabsep channel
Matching records are written tab-separated to the file or socket
(or PostgreSQL database handle) "channel".
-include_field_names 1
If you are doing -write_tabsep, -include_field_names 1 will cause
the first line emitted to be a tab-separated list of field names.
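For example, a sketch dumping matching rows with a leading header line
(the file name is hypothetical):

    set fp [open /tmp/dump.tsv w]
    x search -write_tabsep $fp -include_field_names 1
    close $fp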
-key keyVar
-get listVar
-array_get listVar
-array_get_with_nulls listVar
-code codeBody
Run code on matching rows.
If -key is specified, the key value of each matching row is written
into the variable specified as the argument to -key.
If -get is specified, the fields of the matching row are written
into the variable specified as the argument to -get. If -fields
is specified, you get those fields in the same order. If -fields
is not specified, you get all the fields in the order they were
defined. If you have any question about the order of the fields,
just ask the ctable with "$table fields".
-array_get works like -get except that the field names and field
values are written into the specified variable in a manner that
"array set" can load into an array. I call this "array set"
format. Fields that are null are not retrieved with -array_get.
-array_get_with_nulls pulls all the fields. Note it is a common
bug to use -array_get in a -code loop and not unset the array
before resuming the loop, resulting in stale values: if field x
had a value in the previous row fetch and is null in the current
row, and you haven't unset your array before doing "array set"
with the new result, the previous value of x will still be there.
So either unset the array each time through or use
-array_get_with_nulls.
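A sketch of the safe pattern, using the cable_info table from earlier:

    x search -key key -array_get data -code {
        catch {unset row}
        array set row $data
        # only non-null fields are present in the row array
        puts "$key: [array names row]"
    }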
-compare list
Perform a comparison to select rows.
Compare expressions are specified as a list of lists. Each list
consists of an operator and one or more arguments.
Upon search, all of the expressions are evaluated left to right
and form a logical "and". That is, if any of the expressions fail,
the row is skipped.
Here's an example:
$ctable search -compare {{> coolness 50} {> hipness 50}} ...
In this case you're selecting every row where coolness is greater
than 50 and hipness is greater than 50.
Here are the available expressions:
{false field}
Expression compares true if field is false.
{true field}
Expression compares true if field is true.
{null field}
Expression compares true if field is null.
{notnull field}
Expression compares true if field is not null.
{< field value}
Expression compares true if field is less than value.
This works with both strings and numbers, and yes, it
compares numbers as numbers, not as strings.
{<= field value}
Expression compares true if field is less than or equal to value.
{= field value}
Expression compares true if field is equal to value.
{!= field value}
Expression compares true if field is not equal to value.
{>= field value}
Expression compares true if field is greater than or equal to value.
{> field value}
Expression compares true if field is greater than value.
{match field expression}
Expression compares true if field matches glob expression.
The match is case-insensitive.
{match_case field expression}
Expression compares true if field matches glob expression,
case-sensitive.
{range field low hi}
Expression compares true if field is within the range of
low <= field < hi.
Broken or inop under "search" but working with "search+" as of
12/6/06.
Examples:
Write everything in the table tab-separated to channel $channel
$ctable search -write_tabsep $channel
Write everything in the table with coolness > 50 and hipness > 50
$ctable search -write_tabsep $channel -compare {{> coolness 50} {> hipness 50}}
Run some code on everything in the table matching the above comparison
$ctable search -compare {{> coolness 50} {> hipness 50}} -key key -array_get data -code {
    puts "key -> $key, data -> $data"
}
search+
Search for matching rows and take actions on them, exploiting skiplists,
with optional sorting. (Sorting is currently broken.)
$ctable search+ ?-sort {field1 {field2 desc}}? ?-fields fieldList? ?-compare list? ?-countOnly 0|1? ?-offset offset? ?-limit limit? ?-code codeBody? ?-write_tabsep channel?
Skip lists have significantly higher insert overhead -- it takes about 7 microseconds per row inserting a million two-varchar field rows, including allocating, adding the hashtable entry, and adding the skip list node, versus about 2.3 microseconds per row just to allocate the space and do the hashtable insert without the skip list insert.
Skip lists also can't get to the "key" that the row was inserted with. Skip lists point more toward a future where there isn't an external key -- that is, what would have been the external key exists as a normal field in the row.
The one huge win of skip lists is the "range" comparison method. We are seeing speedups of 200,000% when a "range" can be used on an indexed field versus search's brute force scanning.
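A sketch of the difference, using the "i" field of the cable_info
example (the index method is documented below):

    x index create i
    # brute force scan over every row:
    x search -compare {{>= i 100} {< i 200}} -countOnly 1
    # skip list range scan, much faster on the indexed field:
    x search+ -compare {{range i 100 200}} -countOnly 1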
foreach DEPRECATED (use "search" instead)
Iterate over all of the rows in the table, or just the rows in the table
matching a string match wildcard, executing tcl code on each of them.
% x foreach key {
    puts $key
    puts ""
}
If you want to do something with the keys, like access data in the row,
use get, array_get or array_get_with_nulls, etc, within the code body.
% x foreach key {
    catch {unset data}
    array set data [x array_get $key]
    puts "$key:"
    parray data
    puts ""
}
x foreach varName ?pattern? codeBody - an optional match pattern in
"string match" format will restrict what is presented to the code body.
The normal Tcl semantics for loops are followed; that is, you can
execute "continue" and "break" to resume the code with the next
row and break out of the foreach loop, respectively.
incr
Increment the specified numeric fields, returning a list of the
new incremented values.
% x incr $key a 4 b 5
...will increment $key's a field by 4 and b field by 5, returning
a list containing the new incremented values of a and b.
type
Return the "type" of the object, i.e. the name of the object-creating
command that created it.
% x type
cable_info
import_postgres_result
x import_postgres_result pgTclResultHandle
Given a Pgtcl result handle, import_postgres_result will iterate over
all of the result rows and create corresponding rows in the table.
This is extremely fast as it does not do any intermediate Tcl evaluation
on a per-row basis.
How you use it is, first, execute some kind of query:
set res [pg_exec $connection "select * from mytable"]
(You can also use pg_exec_prepared or even the asynchronous Pgtcl
commands pg_sendquery and pg_sendquery_prepared in association with
pg_getresult -- see the Pgtcl documentation for more info.)
Check for an error...
if {[pg_result $res -status] != "PGRES_RESULT_OK"} {...}
...and then do...
x import_postgres_result $res
On a 2 GHz AMD64 we are able to import about 200,000 10-element rows
per CPU second, i.e. around 5 microseconds per row.
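Putting the pieces together, a minimal sketch ($connection and the
query are assumptions):

    set res [pg_exec $connection "select * from mytable"]
    if {[pg_result $res -status] != "PGRES_TUPLES_OK"} {
        error [pg_result $res -error]
    }
    x import_postgres_result $res
    pg_result $res -clear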
fieldtype
Return the datatype of the named field.
foreach field [x fields] {
    puts "$field type is [x fieldtype $field]"
}
ip type is inet
mac type is mac
name type is varstring
address type is varstring
addressNumber type is varstring
geos type is varstring
i type is int
j type is int
ij type is long
needs_quoting
Given a field name, return 1 if it might need quoting. For example,
varstrings and strings may need quoting as they can contain any
characters, while integers, floats, IP addresses, MAC addresses, etc,
do not, as their contents are predictable.
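For example, with the cable_info table (the results follow from the
needs_quoting array shown in the generated code later in this document):

    % x needs_quoting name
    1
    % x needs_quoting ip
    0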
names
Return a list of all of the keys in the table. This is fine for small
tables but can be inefficient for large tables as it generates a list
containing each key, so a 650K table will generate a list containing
650K elements -- in such a case we recommend that you use foreach or
export instead.
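For example, with just the "peter" row from earlier:

    % x names
    peter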
reset
Clear everything out of the table. This deletes all of the rows in
the table, freeing all memory allocated for the rows, the rows'
hashtable entries, etc.
% x count
652343
% x reset
% x count
0
destroy
Delete all the rows in the table, free all of the memory, and destroy
the object.
% x destroy
% x asdf
invalid command name "x"
statistics
Report hashtable statistics on the object.
% x statistics
0 entries in table, 4 buckets
number of buckets with 0 entries: 4
number of buckets with 1 entries: 0
number of buckets with 2 entries: 0
number of buckets with 3 entries: 0
number of buckets with 4 entries: 0
number of buckets with 5 entries: 0
number of buckets with 6 entries: 0
number of buckets with 7 entries: 0
number of buckets with 8 entries: 0
number of buckets with 9 entries: 0
number of buckets with 10 or more entries: 0
average search distance for entry: nan
write_tabsep DEPRECATED (use search -write_tabsep)
x write_tabsep channel ?-glob pattern? ?-nokeys? ?field...?
Write the table tab-separated to a channel, with the names of desired
fields specified, or all fields if none are specified.
set fp [open /tmp/output.tsv w]
x write_tabsep $fp
close $fp
If the glob pattern is specified and the key of a row does not match
the glob pattern, the row is not written.
The first field written will be the key, whether you like it or not,
unless -nokeys is specified, in which case the key value is not written
to the destination.
NOTE - We do not currently quote any tabs that occur in the data, so
if there are tab characters in any of the strings in a row, that row
will not be read back in properly. In fact, we will generate an error
when attempting to read such a row.
read_tabsep
x read_tabsep channel ?-glob pattern? ?-nokeys? ?field...?
Read tab-separated entries from a channel, with a list of fields
specified, or all fields if none are specified.
set fp [open /tmp/output.tsv r]
x read_tabsep $fp
close $fp
The first field is expected to be the key (unless -nokeys is specified)
and is not included in the list of fields. So if you name five fields,
for example, each row in the input file (or socket or whatever) should
contain six elements.
It's an error if the number of fields read doesn't match the number
expected.
If the glob pattern is defined it's applied to the key (first field in
the row) and if it doesn't match, the row is not inserted.
If -nokeys is specified, the first field of each row is not used as
the key -- rather, the key is automatically created as an ascending
integer starting from 0.
If you subsequently do another read_tabsep, the key will again begin at
0.
If you later want to insert at the end of the table, you can do a set
with the key being the table's count.
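For example, a sketch (assuming the keys are the ascending integers
generated by a prior read_tabsep -nokeys):

    x set [x count] name "appended row"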
read_tabsep stops when it reaches end of file OR when it reads an
empty line. Since you must have a key and at least one field, this
is safe. However it might not be safe with -nokeys.
The nice thing about it is you can indicate end of input with an
empty line and then do something else with the data that follows.
index
Index is used to create skip list indexes on fields in a table.
x index create foo
...creates a skip list index on field "foo". It will index any existing
rows in the table and any future rows that are added. Also if a "set",
"read_tabsep", etc, causes a row's indexed value to change, its index
will be updated.
If there is already an index present on that field, does nothing.
x index drop foo
...drops the skip list index on field "foo". If there is no such index,
does nothing.
x index dump foo
...dumps the skip list for field "foo". This can be useful to help
understand how they work and possibly to look for problems.
x index count foo
...returns a count of the skip list for field "foo". This number should
always match the row count of the table (x count). If it doesn't,
there's a bug in index handling.
The C Code Generated And C Routines Made Available
--------------------------------------------------
(There is a better interface than this. You can interact with any table
regardless of its composition with standardized C calls made through
the ctable table and ctable creator table structures. It's not documented
yet but you can study ctable_search.c, where it is used extensively, and
ctable.h where those structures are defined.)
For the cable_info table defined above, the following C struct is
created:
struct cable_info {
    TAILQ_ENTRY(cable_info) _link;
    struct in_addr ip;
    struct ether_addr mac;
    char *name;
    int _nameLength;
    char *address;
    int _addressLength;
    char *addressNumber;
    int _addressNumberLength;
    char *geos;
    int _geosLength;
    int i;
    int j;
    long ij;
    struct Tcl_Obj *extraStuff;
    unsigned int _ipIsNull:1;
    unsigned int _macIsNull:1;
    unsigned int _nameIsNull:1;
    unsigned int _addressIsNull:1;
    unsigned int _addressNumberIsNull:1;
    unsigned int _geosIsNull:1;
    unsigned int _iIsNull:1;
    unsigned int _jIsNull:1;
    unsigned int _ijIsNull:1;
    unsigned int _extraStuffIsNull:1;
};
Note that varstrings are char * pointers. We allocate the space for whatever
string is stored and store the address of that allocated space. Fixed-length
strings are generated inline.
The null field bits and booleans are all generated together and should be
stored efficiently by the compiler.
You can examine the C code generated. It's quite readable. If you didn't
know better, you might think it was written by a person rather than a program.
Each table-defining command created has a StructHeadTable associated with it,
for example
struct cable_infoStructHeadTable {
    Tcl_HashTable *registeredProcTablePtr;
    long unsigned int nextAutoCounter;
};
The registered proc table is how we handle registering methods and the
nextAutoCounter is how we can generate unique names for instances of the
table when using "#auto".
Each instance of the table created with "create" has a StructTable
associated with it, for instance:
struct cable_infoStructTable {
    Tcl_HashTable *registeredProcTablePtr;
    Tcl_HashTable *keyTablePtr;
    Tcl_Command commandInfo;
    long count;
    TAILQ_HEAD (cable_infoHead, cable_info) rows;
};
This contains a pointer to the head table's registered proc table, a hash
table that we use to store and fetch keys, a command info struct that
we use to delete our created command when it's told to destroy itself,
the row count, and a TAILQ structure that is currently not used but is
there for eventually creating keyless tables that will be navigable with
cursors, or even multi-keyed tables or keyed tables but allowing ordered
traversal rather than the pseudorandom ordering provided by foreach, names,
write_tabsep, etc, due to the unpredictable order of hashtable traversal.
Next, the number of fields is defined, along with the field names as an
array of pointers to character strings and an enumerated type definition
of the fields:
#define CABLE_INFO_NFIELDS 10
static CONST char *cable_info_fields[] = {
    "ip",
    "mac",
    "name",
    "address",
    "addressNumber",
    "geos",
    "i",
    "j",
    "ij",
    "extraStuff",
    (char *) NULL
};

enum cable_info_fields {
    FIELD_CABLE_INFO_IP,
    FIELD_CABLE_INFO_MAC,
    FIELD_CABLE_INFO_NAME,
    FIELD_CABLE_INFO_ADDRESS,
    FIELD_CABLE_INFO_ADDRESSNUMBER,
    FIELD_CABLE_INFO_GEOS,
    FIELD_CABLE_INFO_I,
    FIELD_CABLE_INFO_J,
    FIELD_CABLE_INFO_IJ,
    FIELD_CABLE_INFO_EXTRASTUFF
};
The types of each field are emitted as an array, along with an array
indicating whether or not each field needs quoting:
enum ctable_types cable_info_types[] = {
    CTABLE_TYPE_INET,
    CTABLE_TYPE_MAC,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_VARSTRING,
    CTABLE_TYPE_INT,
    CTABLE_TYPE_INT,
    CTABLE_TYPE_LONG,
    CTABLE_TYPE_TCLOBJ
};

int cable_info_needs_quoting[] = {
    0,
    0,
    1,
    1,
    1,
    1,
    0,
    0,
    0,
    1
};
A setup routine is defined that is automatically run once when the extension
is loaded, for example, cable_info_setup creates some Tcl objects containing
the names of all of the fields and stuff like that.
An init routine, for example, cable_info_init, is defined that will set a
newly malloc'ed row to default values (Defaults can be specified for most
fields. If a field does not have a default, its null bit is set to true.)
For efficiency's sake, we have a base copy that we initialize the first time
the init routine is called and then for subsequent calls we merely do a
structure copy to copy that base copy to the pointer to the row passed.
A delete routine is defined, for instance, cable_info_delete, that will take
a pointer to the defined structure and free it. The thing here is that it
has to delete any varstrings defined within the row prior to freeing the row
itself.
*_find takes a pointer to the StructTable corresponding to the ctable,
for instance, cable_infoStructTable and a char * containing the key to be
looked up, and returns a pointer to the struct (in the example, a
struct cable_info *) containing the matching row, or NULL if none is found.
*_find_or_create takes a pointer to the StructTable, a char * containing
the key to be looked up or created, and a pointer to an int. If the
key is found, a pointer to its row is returned and the pointed-to int
is set to zero. If it is not found, a new entry for that name is created,
an instance of the structure is allocated and initialized, the pointed-to
int is set to one, and the pointer to the new row is returned.
A *_obj_is_null routine is defined, for instance cable_info_obj_is_null
that will return a 1 if the passed Tcl object contains a null value and
zero otherwise.
*_genlist (cable_info_genlist), given a pointer to a Tcl interpreter and
a pointer to a row of the corresponding structure type will generate a
list of all of the fields in the table into the Tcl interpreter's result
object.
*_gen_keyvalue_list does the same thing except includes the names of all the
fields paired with the values.
*_gen_nonnull_keyvalue_list does the same thing as *_gen_keyvalue_list except
that any null values do not have their key-value pair emitted.
*_set (cable_info_set) can be used from your own C code to set values in
a row. It takes a Tcl interpreter pointer, a pointer to a Tcl object containing
the value you want to set, a pointer to the corresponding structure, and
a field number from the enumerated list of fields.
It handles detecting and setting the null bit as well.
*_set_fieldobj is like *_set except the field name is contained in a Tcl object
and that field name is extracted and looked up from the field list to determine
the field number used by *_set.
*_set_null takes a row pointer and a field number and sets the null bit for
that field to true. Note there is no way to set it to false except to set
a value into a field as simply clearing the bit would be an error unless
some value was written into the corresponding field.
*_get fetches a field from a table entry and returns a Tcl object containing
that field. It takes a pointer to the Tcl interpreter, a pointer to a row
of the structure, and a field number. If the null bit is set, the null
value is returned.
Even though it is returning Tcl objects, it's pretty efficient as it passes
back the same null object over and over for null values and uses the correct
Tcl_New*Obj for the corresponding data type, hence ints are generated with
Tcl_NewIntObj, varstrings with Tcl_NewStringObj, etc.
*_get_fieldobj works like *_get except the field name is contained in the
passed-in field object and looked up.
*_lappend_fieldobj and *_lappend_field_and_nameobj append, respectively,
the specified field's value from the pointed-to row, and the field name
(via a continually reused name object) followed by the value.
*_lappend_nonull_field_and_nameobj works just like *_lappend_field_and_nameobj
except that it doesn't append anything when the specified field in the
pointed-to row is null.
*_get_string - This is particularly useful for the C coder. It takes a
pointer to an instance of the structure, a field number, a pointer to
an integer, and a pointer to a Tcl object and returns a string representation
of the requested field. The Tcl object is used for certain conversions
and hence can be considered a reusable utility object. The length of the
string returned is set into the pointed-to integer.
Example:
CONST char *cable_info_get_string (struct cable_info *cable_info_ptr, int field, int *lengthPtr, Tcl_Obj *utilityObj) {...}
For fixed strings and varstrings, no copying is performed.
*_delete_all_rows - given a pointer to the StructTable for an instance,
deletes all the rows.
Interfacing with Ctables From C
--------------------------------
At the time of this writing, no C code has been written to use any of these
routines that is not part of the CTable code itself. We envision providing
a way to write C code inline within the CTable definition and, for more
complicated code writing, to provide a way to compile and link your C
code with the generated C code. This will require generating an include
file containing the structure definition, function definitions for the
C routines you'd be calling, and many other things currently going
straight into the C code. These changes are fairly straightforward, however,
and are on the "to do" list.
Using Copy In For Super Fast Ctable-to-PostgreSQL Transfers
-----------------------------------------------------------
Here's the PostgreSQL syntax for copying from a file (or stdin) to a table:
COPY tablename [ ( column [, ...] ) ]
    FROM { 'filename' | STDIN }
    [ [ WITH ]
          [ BINARY ]
          [ OIDS ]
          [ DELIMITER [ AS ] 'delimiter' ]
          [ NULL [ AS ] 'null string' ]
          [ CSV [ HEADER ]
                [ QUOTE [ AS ] 'quote' ]
                [ ESCAPE [ AS ] 'escape' ]
                [ FORCE NOT NULL column [, ...] ] ] ]
Here's an example of taking a ctable and copying it to a PostgreSQL table.
package require Pgtcl
source cpescan.ct
package require Cpe_scan
cpe_scan null_value \\N
cpe_scan create cpe
set fp [open junk]
#cpe read_tabsep $fp ip_address status ubr k_docsifcmstatustxpower k_docsifdownchannelpower k_docsifsigqsignalnoise k_docsifsigqmicroreflections
#cpe read_tabsep $fp -glob *:e1 ip_address status ubr
#cpe read_tabsep $fp -glob *:e1
cpe read_tabsep $fp
close $fp
puts [cpe count]
set db [pg_connect www]
#
# note double-backslashing on the null value and that we set the null value
# to match the null_value set with the ctable.
#
set res [pg_exec $db "copy kfoo from stdin with delimiter as '\t' null as '\\\\N'"]
#
# after you've started it, you expect the postgres response handle's status
# to be PGRES_COPY_IN
#
if {[pg_result $res -status] != "PGRES_COPY_IN"} {
    puts "[pg_result $res -status] - bailing"
    puts "[pg_result $res -error]"
    exit
}
#
# next you use the write_tabsep method of the ctable to write
# TO THE DATABASE HANDLE
#
#cpe write_tabsep $db ip_address status ubr
cpe write_tabsep $db
#
# then send a special EOF sequence.
#
puts $db "\\."
#
# the result handle previously returned will now have magically changed
# its status to the normal PGRES_COMMAND_OK response.
#
puts [pg_result $res -status]
NOTE that all the records must be accepted, i.e. not violate any constraints, etc, or none of them will be.
Karl Lehenbauer
7/19/06