DBIx::TableLoader - Easily load a database table from a data set
version 1.101
my $dbh = DBI->connect(@connection_args);
DBIx::TableLoader->new(dbh => $dbh, data => $data)->load();
# interact with new database table full of data in $dbh
This module tries to provide a fast and simple (but very configurable) interface for taking a set of data and loading it into a database table.
Common uses would be to take data from a file (like a CSV) and load it into a SQLite table. (For that specific case see DBIx::TableLoader::CSV.)
In most cases simply calling load()
is sufficient,
but all methods are documented below for completeness.
Create a new instance. Accepts a hash or hashref of options.
This module is very configurable but tries to use good defaults in the hopes that you won't need to configure too much in most cases.
Most likely needed options:
dbh
- A database handle as returned byDBI->connect()
data
- An arrayref of arrayrefs of data (which will be the input records)
See "OPTIONS" for the full list.
Returns a hashref of the options defined by the base class and their default values.
Returns a hashref of additional options defined by a subclass.
my $columns = $loader->columns;
# [ ['column1', 'data type'], ['column two', 'data type'] ]
Returns an arrayref of the columns. Each element is an arrayref of column name and column data type.
my $column_names = $loader->column_names;
# ['column1', 'column two']
Returns an arrayref of the column names.
Executes a CREATE TABLE
SQL statement on the database handle.
Generates the opening of the CREATE TABLE
statement
(everything before the column specifications).
Defaults to "CREATE $table_type TABLE $quoted_name ("
.
Generates the SQL for the CREATE TABLE
statement
by concatenating "create_prefix",
the column definitions,
and "create_suffix".
Can be overridden in the constructor.
Generates the closing of the CREATE TABLE
statement
(everything after the column specifications).
Defaults to ")"
.
Returns the default (base) name for the table.
This is mostly for subclasses where a useful table name
can be determined from the input (like a filename).
In this module it defaults to 'data'
.
This gets concatenated together with "name_prefix" and "name_suffix" in "name".
Columns that have not been given an explicit data type
will be defined using the default_column_type
.
You can pass a value explicitly to the constructor, or it will try to determine an appropriate (string) type based on the database driver (using "default_sql_data_type").
If all else fails it will default to text
(which works for SQLite, PostgreSQL, MySQL, and some others).
Passed to "type_info" in DBI to query the database driver for an appropriate default column type.
Defaults to DBI::SQL_LONGVARCHAR
.
This method goes through the columns
and converts any scalar
column name to an arrayref of column name and default_column_type
.
It modifies itself and returns nothing.
It is called automatically from the constructor.
Execute the DROP TABLE
statement on the database handle.
Returns the portion of the SQL statement before the table name.
Defaults to DROP TABLE
.
Generates the SQL for the DROP TABLE
statement
by concatenating "drop_prefix", "quoted_name", and "drop_suffix".
Alternatively drop_sql
can be set in the constructor
if you need something more complex.
Returns the portion of the SQL statement after the table name.
Nothing by default.
Subclasses will override this method according to the input data format.
This is called from "get_row" to retrieve the next row of raw data.
It should return undef
when there are no more rows.
my $row = $loader->get_row();
Returns a single row of data at a time (as an arrayref).
This method will be called repeatedly until it returns undef
.
The returned arrayref will be flattened and passed to "execute" in DBI.
This is called from "get_row" when a row is determined to be invalid (when "validate_row" throws an error).
If handle_invalid_row
was not specified in the constructor
this method is a no-op:
the original row will be returned (and eventually passed to "execute" in DBI).
Possible values for the handle_invalid_row
option:
-
die
- Callsdie()
with the error message -
warn
- Callswarn()
with the error message and returns the row unmodified -
code ref
If it's a subroutine reference it is called as a method, receiving the loader object, the error message, and the row:
$handler->($loader, $error, $row);
The handler should either
die
to cease processing, return false to skip this row and get the next one, or return a (possibly modified) row that will be passed to "execute" in DBI.This allows you to, for example, write to a log when a bad row is found without aborting your transaction:
handle_invalid_row => sub { my ($self, $error, $row) = @_; $logger->log(['Bad row: %s: %s', $error, $row]); return; # return false to skip this row and move to the next one }
Generate the INSERT
SQL statement that will be passed to "prepare" in DBI.
Execute an INSERT
statement on the database handle for each row of data.
It will call "prepare" in DBI using "insert_sql"
and then call "execute" in DBI once for each row returned by "get_row".
my $number_of_rows = $loader->load();
Load data into database table.
This is a wrapper that does the most commonly needed things
in a single method call.
If the transaction
setting is true (default)
the actions will be wrapped in a transaction.
- "drop" (if configured)
- "create" (if configured)
- "insert_all"
Returns the number of rows inserted.
Returns the full table name
(concatenation of name_prefix
, name
, and name_suffix
).
This method is called from "new" after the object is blessed (obviously). Any preparation work specific to the type of data should be done here.
This is mostly a hook for subclasses and does very little in this module.
Returns the full, quoted table name.
Passes catalog
, schema
, and name
attributes to "quote_identifier" in DBI.
my $quoted_names = $loader->quoted_column_names();
# ['"column1"', '"column two"']
Returns an arrayref of column names quoted by the database driver.
Called from "get_row" to check that the provided row is valid.
It may die
for any error
which will be caught in "get_row"
and the error will be passed to "handle_invalid_row".
The return value works like that of "handle_invalid_row": On success, the valid row (possibly modified) should be returned. If a false value is returned "get_row" will attempt to get another row.
Currently this only checks that the number of fields in the row matches the number of columns expected, however other checks may be added in the future. Subclasses can overwrite this to define their own validations (though calling the original (superclass method) is recommended).
This module is very [excessively] configurable. In most cases the default values will be sufficient, but you should be able to customize the object to fit your needs.
Frequently Used Options:
-
columns
- Arrayref of column definitionsEach element can be an arrayref of column name and data type or just a string for the column name and "default_column_type" will be used. If not passed in the first row of
data
will be assumed to be column names.columns => ['first_name', 'last_name', ['last_seen', 'date']]
-
dbh
- A database handle as returned byDBI->connect()
This module probably isn't useful without one.
-
data
- An arrayref of arrayrefs of data to populate the table;Subclasses may define more appropriate options and ignore this parameter. If you're using this base class, you'll probably need this (unless you provide your own
get_row
coderef).data => [ ['polar', 'bear', '2010-08-15'], ['blue', 'duck', '2009-07-30'] ]
Less common options that are available when you desire extra tweaking power:
-
create
- Boolean; Whether or not to perform theCREATE TABLE
statementDefaults to true.
-
default_column_type
- Default data type for each columnThis will be used for each column that does not explicitly define a data type. The default will be determined from the database driver using
default_sql_data_type
. See "default_column_type".default_column_type => 'CHAR(50)'
-
drop
- Boolean to execute aDROP TABLE
statement beforeCREATE TABLE
Defaults to false. Set it to true if the named table already exists and you want to recreate it.
-
get_row
- A sub (coderef) that will override "get_raw_row"You can use this if your input data is in a different format than the module expects (to split a string into an arrayref, for instance). This is called like a method (the object will be
$_[0]
). The return value will be passed tomap_rows
if both are present.# each record is a line from a log file; # use the m// operator in list context to capture desired fields get_row => sub { my $s = <$io>; [ $s =~ m/^(\d+)\s+"([^"]+)"\s+(\S+)$/ ] }
NOTE
: If you useget_row
and don't passdata
you will probably want to passcolumns
(otherwise columns will be taken from the first call toget_row
). -
grep_rows
- A sub (coderef) to determine if a row should be used or skippedNamed after the built in
grep
function. It will receive the row as an arrayref in$_[0]
. (The row will also be available in$_
for consistency with the built ingrep
.) The object will be passed as$_[1]
in case you want it. If it returns a true value the row will be used. If it returns false the next row will be fetched and the process will repeat (until all rows have been exhausted).grep_rows => sub { $_->[1] =~ /something/ } # accept the row if it matches grep_rows => sub { my ($row, $obj) = @_; do_something(); } # 2 variables
-
handle_invalid_row
- How to handle invalid rows.Can be
die
,warn
, or a sub (coderef). See "handle_invalid_row" for more details. Default is to ignore (in which case DBI will likely error). -
map_rows
- A sub (coderef) to filter/mangle a row before useNamed after the built in
map
function. It will receive the row as an arrayref in$_[0]
. (The row will also be available in$_
for consistency with the built inmap
.) The object will be passed as$_[1]
in case you want it. It should return an arrayref (which will be used as the row).map_rows => sub { [ map { uc $_ } @$_ ] } # uppercase all the fields map_rows => sub { my ($row, $obj) = @_; do_something(); } # 2 variables
-
name
- Table nameDefaults to
'data'
. Subclasses may provide a more useful default. -
table_type
- String that will go beforeTABLE
in "create_prefix"A useful value might be
TEMPORARY
orTEMP
. This is probably database driver dependent, so use an appropriate value. -
transaction
- BooleanAll the operations in "load" will be wrapped in a transaction by default. Set this option to false to disable this.
Options that will seldom be necessary but are available for completeness and/or consistency:
-
catalog
- Table catalogPassed to "quote_identifier" in DBI to get the full, quoted table name. None by default.
-
create_prefix
- The opening of the SQL statementSee "create_prefix". Overwrite if you need something more complex.
-
create_sql
- TheCREATE TABLE
statementSee "create_sql". Overwrite if you need something more complex.
-
create_suffix
- The closing of the SQL statementSee "create_suffix". Overwrite if you need something more complex.
-
default_sql_data_type
- Default SQL standard data typeIf
default_column_type
is not supplied it will be determined by asking the database driver for a type corresponding toDBI::SQL_LONGVARCHAR
. Alternate values can be passed (DBI::SQL_VARCHAR()
for instance). See "default_sql_data_type". -
drop_prefix
- The opening of the SQL statementSee "drop_prefix". Overwrite if you need something more complex.
-
drop_sql
- TheDROP TABLE
statementWill be constructed if not provided. See "drop_sql".
-
drop_suffix
- The closing of the SQL statementSee "drop_suffix". Overwrite if you need something more complex.
-
name_prefix
- String prepended to table nameProbably mostly useful in subclasses where
name
is determined automatically. -
name_suffix
- String appended to table nameProbably mostly useful in subclasses where
name
is determined automatically. -
quoted_name
- Full table name, properly quotedOnly necessary if you need something more complicated than
$dbh->quote_identifier($catalog, $schema, $table)
(see "quote_identifier" in DBI). -
schema
- Table schemaPassed to "quote_identifier" in DBI to get the full, quoted table name. None by default.
This module was designed to be subclassed for use with specific data input formats.
DBIx::TableLoader::CSV is a prime example. It is the entire reason this base module was designed.
Subclasses will likely want to override the following methods:
- "defaults" - a hashref of additional acceptable options (and default values)
- "default_name" - if you can determine a good default name from the input
- "get_raw_row" - to return the next row of data
- "prepare_data" - to initialize your object/data (open the file, etc.)
Be sure to check out the code for DBIx::TableLoader::CSV.
Also see a very simple example in t/subclass.t
.
It seemed frequent that I would find a data set that was difficult to view/analyze (CSV, log file, etc.) and would prefer to load it into a database for its powerful, familiar processing abilities.
I once chose to use MySQL because its built in LOAD DATA
command
read the malformed CSV I was given and the .import
command in SQLite did not.
I wrote this module so that I'd never have to make such a choice again. I wanted to be able to use the power of Text::CSV to make sure I could take any CSV I ever got and load it into SQLite easily.
I tried to make this module a base class to be able to handle various formats.
This is more of a list of ideas than features that are planned.
- Allow a custom column name transformation sub to be passed in
- Use "decamelize" in String::CamelCase by default?
- Allow extra columns (like
id
) to be added and/or generated - Option to scan the data to guess appropriate data types for each column
- Make a SQLite function so that you could call this from a
dbish
command line? - Allow
UPDATE
statements and specify the key columns (for theWHERE
clause)
You can find documentation for this module with the perldoc command.
perldoc DBIx::TableLoader
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
-
MetaCPAN
A modern, open-source CPAN search engine, useful to view POD in HTML format.
Please report any bugs or feature requests by email to bug-dbix-tableloader at rt.cpan.org
, or through
the web interface at https://rt.cpan.org/Public/Bug/Report.html?Queue=DBIx-TableLoader. You will be automatically notified of any
progress on the request by the system.
https://github.com/rwstauner/DBIx-TableLoader
git clone https://github.com/rwstauner/DBIx-TableLoader.git
Randy Stauner rwstauner@cpan.org
- Randy Stauner randy@magnificent-tears.com
- Randy Stauner randy@r4s6.net
This software is copyright (c) 2011 by Randy Stauner.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.