Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI utility for inserting binary files into SQLite #122

Closed
simonw opened this issue Jul 26, 2020 · 10 comments
Closed

CLI utility for inserting binary files into SQLite #122

simonw opened this issue Jul 26, 2020 · 10 comments
Labels
enhancement New feature or request research

Comments

@simonw
Copy link
Owner

simonw commented Jul 26, 2020

SQLite BLOB columns can store entire binary files. The challenge is inserting them, since they don't neatly fit into JSON objects.

It would be great if the sqlite-utils CLI had a trick for helping with this.

Inspired by simonw/datasette-media#14

@simonw simonw added enhancement New feature or request research labels Jul 26, 2020
@simonw
Copy link
Owner Author

simonw commented Jul 26, 2020

One idea: sqlite-utils insert-files

It could work something like this:

sqlite-utils insert-files files.db /tmp/blah.jpg /tmp/foo.gif \
  --table files \
  -c key:filename -c hash:sha256 -c body:content \
  --pk key

This would insert those two image files into the database in a table called files with a schema that looks something like this:

CREATE TABLE files (
    key text primary key,
    hash text,
    body blob
);

The -c key:filename options here are the most interesting: they let you create the table with a specific layout. The bit before the : is the column name. The bit after the : can be a range of different things:

  • filename - just the filename
  • filepath - the full filepath (provided on the command-line)
  • absolutepath - the filepath expanded to start with /home/... or whatever
  • sha256 - the SHA256 of the contents
  • md5 - the MD5
  • content - the binary content itself
  • mtime - the mtime (floating point timestamp)
  • ctime - the ctime (floating point timestamp)
  • mtime_iso - the mtime as an ISO datetime
  • ctime_iso - the mtime as an ISO datetime
  • size - the size of the file in bytes

@simonw
Copy link
Owner Author

simonw commented Jul 26, 2020

The command also accepts one or more directories, in which case it will recursively scan them for all files that they contain.

@simonw
Copy link
Owner Author

simonw commented Jul 26, 2020

Related: #123 (--raw option)

@simonw
Copy link
Owner Author

simonw commented Jul 26, 2020

Maybe support --replace for replacing images with an existing primary key.

@simonw
Copy link
Owner Author

simonw commented Jul 26, 2020

I should consider easy compatibility with https://www.sqlite.org/sqlar.html

An SQLite Archive is an ordinary SQLite database file that contains the following table as part of its schema:

CREATE TABLE sqlar(
  name TEXT PRIMARY KEY,  -- name of the file
  mode INT,               -- access permissions
  mtime INT,              -- last modification time
  sz INT,                 -- original file size
  data BLOB               -- compressed content
);

Each row of the SQLAR table holds the content of a single file. The filename (the full pathname relative to the root of the archive) is in the "name" field. The "mode" field is an integer which is the unix-style access permissions for the file. "mtime" is the modification time of the file in seconds since 1970. "sz" is the original uncompressed size of the file. The "data" field contains the file content. The content is usually compressed using Deflate, though not always. If the "sz" field is equal to the size of the "data" field, then the content is stored uncompressed.

@simonw
Copy link
Owner Author

simonw commented Jul 26, 2020

I'm trying to play with sqlite3 -A on my Mac.

sqlite3 -A tells me that it's an unknown option - but I used brew info sqlite to find my homebrew installed version and it turns out this works:

 % /usr/local/Cellar/sqlite/3.32.1/bin/sqlite3 -A
Wrong number of arguments.  Usage:
.archive ...             Manage SQL archives
   Each command must have exactly one of the following options:
     -c, --create               Create a new archive
     -u, --update               Add or update files with changed mtime
     -i, --insert               Like -u but always add even if unchanged
     -t, --list                 List contents of archive
     -x, --extract              Extract files from archive
   Optional arguments:
     -v, --verbose              Print each filename as it is processed
     -f FILE, --file FILE       Use archive FILE (default is current db)
     -a FILE, --append FILE     Open FILE using the apndvfs VFS
     -C DIR, --directory DIR    Read/extract files from directory DIR
     -n, --dryrun               Show the SQL that would have occurred
   Examples:
     .ar -cf ARCHIVE foo bar  # Create ARCHIVE from files foo and bar
     .ar -tf ARCHIVE          # List members of ARCHIVE
     .ar -xvf ARCHIVE         # Verbosely extract files from ARCHIVE
   See also:
      http://sqlite.org/cli.html#sqlar_archive_support

@simonw
Copy link
Owner Author

simonw commented Jul 26, 2020

This seems to work in creating a SQLite archive containing all .gif files in the current directory:

/usr/local/Cellar/sqlite/3.32.1/bin/sqlite3 archive.db -A -c *.gif

Then listing files like this:

$ /usr/local/Cellar/sqlite/3.32.1/bin/sqlite3 archive.db -A -t
copyable.gif
debug-allow.gif
flash.gif
table-md.gif

Here's the schema:

$ sqlite3 archive.db .schema                                  
CREATE TABLE sqlar(
  name TEXT PRIMARY KEY,  -- name of the file
  mode INT,               -- access permissions
  mtime INT,              -- last modification time
  sz INT,                 -- original file size
  data BLOB               -- compressed content
);

@simonw
Copy link
Owner Author

simonw commented Jul 27, 2020

Test command:

sqlite-utils insert-files gifs.db *.gif \
    -c filename:filename \
    -c filepath:filepath \
    -c absolutepath:absolutepath \
    -c sha256:sha256 \
    -c md5:md5 \
    -c content:content \
    -c mtime:mtime \
    -c ctime:ctime \
    -c mtime_iso:mtime_iso \
    -c ctime_iso:ctime_iso \
    -c size:size \
    --pk absolutepath

@simonw
Copy link
Owner Author

simonw commented Jul 27, 2020

Inserting files by piping them in should work - but since a filename cannot be derived this will need a --name blah.gif option.

cat blah.gif | sqlite-utils insert-files files.db files - --name=blah.gif

simonw added a commit that referenced this issue Jul 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request research
Projects
None yet
Development

No branches or pull requests

1 participant