New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checksum and other CLI functions as "Tools" #1435

Open
tlhackque opened this Issue Jun 18, 2018 · 10 comments

Comments

Projects
None yet
3 participants
@tlhackque
Copy link

tlhackque commented Jun 18, 2018

Describe the new feature

sqlitebrowers is a GUI that seems to aim to replace sqlite3.

One thing that I noticed missing is the .checksum [--schema --sha3-nnn] command.
This is handy when packing up a database for archiving and/or distribution.
It's described here: https://www.sqlite.org/cli.html#sha3sum

The database integrity checks described on the same page, would also be good candidates for the "Tools" menu discussed in #1434 ...

See also the pragmas integrity_check https://www.sqlite.org/pragma.html#pragma_integrity_check, quick_check, and foreign_key_check on the same page

Additional info

Please answer these questions before submitting your feature request.

Is your feature request related to an issue? Please include the issue number.

yes, vaguely. See above

Does this feature exist in another product or project? Please provide a link.

Don't know

Do you have a screenshot? Please add screenshots to help explain your idea.

No

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Jun 18, 2018

Good thinking @tlhackque.

Hadn't known the sha3sum command even existed in the CLI. Sounds useful. 😄

On the other hand, I'm not sure how effective the integrity_check pragma really is. From memory of testing it last year briefly, when purposely corrupting databases (random blocks of data written into the file interior) the integrity check never spotted problems. Didn't seem to be a very comprehensive check.

Still, I added it to our server side part of things at the time, just in case. After all "it shouldn't make things worse". 😉

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Jun 18, 2018

Only semi-related, but this is kind of nifty:

    https://www.sqlite.org/locrsf.html

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Jun 18, 2018

Oh wow, DB4S is even mentioned directly in the referenced US Library of Congress page:

    https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml#sustainability

Under "Adoption".

@tlhackque

This comment has been minimized.

Copy link
Author

tlhackque commented Jun 18, 2018

I agree that the integrity checks can provide false confidence - but what they can do, they should.

Randomly throwing data at a database is surprisingly unlikely to be detected. A well functioning database will be mostly data (especially as the page size increases) - and the integrity checks are for metadata. (well, except for the one where the user has to supply code).

What they should detect are cases where indexs, page free counts, and the like go wrong. When they pass, you may retrieve the wrong data, but it will be well organized. :-)

It's sort of like throwing space junk at the planet. The odds are it will hit water (~75% of the planet's surface) - and if it hits land, it's still not likely to be inhabited. Despite all the stuff that's fallen uncontrolled from orbit, I believe only one person has been it - and fortunately, not seriously injured.

That's for testing. For production, O'Toole's observation on Murphy's Law applies.[1]

If you want to cause corruption, aim low - or high. Typlically a database will have configuration parameters, the root of indicies and storage maps in the first few pages. (Or with some, as with .ZIP files, the last few.) The beginning and the end of a file are the only two places that are easy to find.

Seriously, metadata checks provide the assurance that you should not have sqlite panic or hang following its metadata. User level consisency is a different issue. My observation was merely that SQLite offers the tool, so it would be good to expose it in the browser. Improving the tool's quality/applicability is more an issue for the SQLite developers...

There's a note in the description of the checksum command that it uses an extension - it's actually clever in that the checksum is impervious to VACUUM, but does protect the data and schema. You can probably bundle the same extension into your code - but be sure to keep it up-to-date :-)

Happy coding.

--
[1] Murphy: "Whatever can go wrong, will."
Murphy's corollaries: "at the worst possible time", "with the worst possible audience", "for the maximum destructive effect" ...

O'Toole's observation: "That's all true. But ... Murphy was an optimist."

@tlhackque

This comment has been minimized.

Copy link
Author

tlhackque commented Aug 8, 2018

As I noted parenthetically in #1481 - another good candidate for "Tools" would be "Backup Database".

SQLite has an on-line Backup API that makes this pretty easy. See [the doc] here(https://sqlite.org/backup.html) and here.

I've exposed it (via the Perl DBD::SQLite wrapper) in several of my applications - it's pretty fast & convenient -- and it doesn't take the database offline. It also lets you save or populate a temporary database.

Making backup functions easily available is always a good thing.

Making it easy to snapshot a production database is a particularly good thing when you provide functions that might modify it in unexpected ways.

And this is one more sqlite3 function that you can make convenient. (More than one if you separate Backup & Restore - though I think Backup is the more important.)

A side note on the coding examples - as you're a GUI, you'll be tempted to use the second coding example to provide progress and aliveness for large databases. That's fine, but I would not use the example's 5 page quantum. Instead, run one backup_step, and pick a quantum that's a reasonable fraction of backup_pagecount. A small pagecount is inefficient - and creates many more opportunities for a write by another process to restart the backup operation. Too large, and the progress is chunky & other processes may time out. Off the top of my head, I'd start with something like 1-5% of backup_pagecount, with a floor of 10 pages... That said, in a low contention environment, backing up a 1/2GB database in one step (same spindle) takes less than 10 seconds. So you might elect to go that route.

@tlhackque tlhackque referenced this issue Aug 8, 2018

Closed

Preferences hiding #1434

1 of 8 tasks complete
@tlhackque

This comment has been minimized.

Copy link
Author

tlhackque commented Aug 23, 2018

Still, I added it to our server side part of things at the time, just in case. After all "it shouldn't make things worse"

The linked commit seems to be related to a web server version of DB4S, which I've never had reason to use.

I think that "integrity check" (and the quick/foreign keys checks) should be available on demand (from the "Tools" menu) for any open database in the standalone browser. One may wish to run them after a crash/power event - or before submitting a SQLite bug report.

Whenever enough round TUITs accumulate...

@mgrojo

This comment has been minimized.

Copy link
Contributor

mgrojo commented Sep 4, 2018

I've thought that we don't need anything fancy for executing the integrity and the foreing key checks. When the user clicks the menu option, we open a new tab in Execute SQL, insert the corresponding pragma and execute the query. The result would be this:
imagen

And for foreign key check (we can use a comment in the SQL for explaining the user the pragma result):
imagen

How do you all find this simple approach?

@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Sep 4, 2018

The linked commit seems to be related to a web server version of DB4S ...

Ahhh yeah. Started creating an optional cloud based storage solution for SQLite databases (checksummed, version controlled, traceability-enabled, etc) so people can share data sets and collaborate on them. Most of the basic bits are in place, and DB4S can send/receive data from it, but it still needs work before it'll be ready for "real" production usage.

That being said it is online and works. It just has... some rough edges. I haven't put time into it recently, but intend to in a while after getting some higher personal priority stuff taken care of.

@mgrojo mgrojo self-assigned this Sep 5, 2018

@tlhackque

This comment has been minimized.

Copy link
Author

tlhackque commented Sep 6, 2018

How do you all find this simple approach?

Using an Execute SQL tab seems like a good idea. Cryptic output with references to tech documentation seems at odds with the DB4S philosophy of simplicity:

This program is not a visual shell for the sqlite command line tool. It does not require
familiarity with SQL commands. It is a tool to be used both by developers and by end
users, and it must remain as simple to use as possible in order to achieve its goals.

I don't think you need the explanation of the function in the results window - it can be the "What IS" text for the item on the tools menu.

The integrity check has a default limit of reporting up to 100 errors; If you get 100, you should note that more errors may be unreported. My application allows the user to specify the limit.

I also expand the foreign key check results to make it easier for the user to comprehend. I think you already have the foreign_key_list results in the data base structure tab, so reporting violations more helpfully wouldn't be hard.

Doesn't take much code either way...

Below is the (Perl) code that I use - I think it should be readable even if you don't speak Perl.

sub verify {
    my $db = shift;

    my $dbh = $db->{dbh};

    $db->log( "Basic integrity test starting" );
    $db->{options}{update}($db) if( $db->{options}{update} );

    my $limit = 100;
    my $log = $dbh->selectall_arrayref( << "SQL" );
pragma integrity_check( $limit )
SQL

    foreach my $row (@$log) {           # Single column, error string
        $db->log( " %s\n", $row->[0] );
    }
    if( @$log >= $limit ) {
        $db->log( " ... limit reached, additional errors may be present\n" );
    }

    $db->log( "Cross-reference test starting" );
    $db->{options}{update}($db) if( $db->{options}{update} );

    $log = $dbh->selectall_arrayref( << "SQL" );
pragma foreign_key_check
SQL
    if( @$log ) {
        $db->log( " Cross-reference errors detected:\n" );
        my %tblcon;
        foreach my $row (@$log) {
            my( $fromtbl, $fromrow, $totbl, $rule ) = @$row;
            unless( $tblcon{$totbl} ) { # Rules for target table
                my $rules = $dbh->selectall_arrayref( << "SQL" );
pragma foreign_key_list("$fromtbl")
SQL
                foreach my $con (@$rules) {
                    $tblcon{$totbl}[$con->[1]] =  # Seq
                      {
                       table  => $con->[2],       # Table name
                       fcol   => $con->[3],       # Column making ref
                       tcol   => $con->[4],       # Target column
                       onupd  => $con->[5],       # On Update action
                       ondel  => $con->[6],       # On Delete action
                       match  => $con->[7] };     #
                }
            }
            $db->log(
             " Reference from \"%s\" in row %u of %s to \"%s\" in %s fails constraint\n ",
                     $tblcon{$totbl}[$rule]{fcol}, $fromrow, $fromtbl,
                     $tblcon{$totbl}[$rule]{tcol}, $totbl );
        }
    } else {
        $db->log( "ok\n" );
    }
    $db->log( "Test complete\n" );
}

You probably already have the rules - which means your version would be shorter. OTOH, I don't deal with "without_rowid" tables (the reported row is NULL in that case).

$db->log writes to the UI status window.
The {update} just refreshes the screen (this runs in a callback of an Tk applications).

I suppose there should be a limit on how many foreign key ("cross reference") errors are reported.

mgrojo added a commit that referenced this issue Sep 16, 2018

Enhancement #1435: pragmas invoked from Tools menu
Pragmas integrity_check, quick_check, foreign_key_check and optimize are
integrated in the Tools menu. The actions open a confirmation dialog
(with Help button that opens the corresponding reference page in
sqlite.org) and, after confirmation, opens a new SQL editor tab and runs
there the pragma, showing the possible results.

Sender in executeQuery is null in this case, so a check is added.

Added accelerator to Tools menu entry, which lacked one.
@justinclift

This comment has been minimized.

Copy link
Member

justinclift commented Jan 19, 2019

@tlhackque Some of your suggested Tools have been added by @mgrojo (a while back). They should be both in the nightly builds, and in the 3.11.0 betas.

If you have time to try them out and let us know good/bad/weirdness, etc, that would be nifty. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment