
Truncated history revision for revisions #941

Closed
invenio-developers opened this Issue · 14 comments

3 participants

@invenio-developers
Collaborator

Originally by adeiana (@Osso) on 2012-03-12

We have a problem here at INSPIRE: some revisions of our records are truncated because they are longer than 65535 bytes, which is the maximum length of a BLOB column in MySQL.
I would suggest converting the BLOB column to MEDIUMBLOB.

```python
>>> import zlib
>>> from invenio.dbquery import run_sql
>>> r = run_sql('SELECT marcxml FROM hstRECORD WHERE id_bibrec=%s AND job_date=%s', ('1083313', '20120106133541'))
>>> len(r[0][0])
65535
>>> zlib.decompress(r[0][0])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
zlib.error: Error -5 while decompressing data
```
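The failure can be reproduced without a database. The sketch below assumes the MySQL server was running without strict SQL mode, in which case an oversize value is silently truncated to the column size (with a warning) instead of being rejected. It compresses a payload whose zlib stream is longer than 65535 bytes, keeps only the first 65535 bytes as a BLOB column would, and tries to decompress the result. The helper `decompress_error` is hypothetical and exists only for this illustration.

```python
import random
import zlib

BLOB_MAX = 2**16 - 1  # largest value a MySQL BLOB column can hold: 65535 bytes


def decompress_error(blob):
    """Return the zlib error message for blob, or None if it decompresses cleanly."""
    try:
        zlib.decompress(blob)
    except zlib.error as exc:
        return str(exc)
    return None


# 100,000 pseudo-random bytes barely compress, so the zlib stream exceeds BLOB_MAX.
payload = random.Random(941).getrandbits(8 * 100000).to_bytes(100000, "big")
compressed = zlib.compress(payload)
stored = compressed[:BLOB_MAX]  # what survives silent truncation to a BLOB column

print(len(compressed) > BLOB_MAX)
print(decompress_error(compressed))
print(decompress_error(stored))
```

The truncated copy should fail with the same Error -5 as in the traceback above: the stream is cut off before its end marker and Adler-32 trailer, so zlib reports an incomplete stream rather than invalid data.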

@tiborsimko
Owner

Originally on 2012-03-12

I fully agree, of course. Will you prepare a patch against master?

We may want to check other BLOB occurrences as well, e.g. firerole or job details:

```
$ git grep -Hni blob modules/miscutil/sql/ | grep -Ev '(mediumblob|longblob)'
modules/miscutil/sql/tabcreate.sql:2950:  password blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:2952:  settings blob default NULL,
modules/miscutil/sql/tabcreate.sql:2989:  firerole_def_ser blob NULL,
modules/miscutil/sql/tabcreate.sql:3004:  data blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:3149:  reply_order_cached_data blob NULL default NULL,
modules/miscutil/sql/tabcreate.sql:3199:  reply_order_cached_data blob NULL default NULL,
modules/miscutil/sql/tabcreate.sql:3636:  marcxml blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:3641:  job_details blob NOT NULL,
modules/miscutil/sql/tabcreate.sql:3662:  job_details blob NULL default NULL,
```
@invenio-developers
Collaborator

Originally by adeiana (@Osso) on 2012-03-13

I will prepare a patch.

@invenio-developers
Collaborator

Originally by adeiana (@Osso) on 2012-03-13

I was originally planning to add a check when creating the revision.
However, I do not think we need it anymore: I used LONGBLOB, which is the
same column type used by bibfmt, so we will never have a record longer
than that unless something really weird is going on.
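The reasoning rests on the capacities of the MySQL BLOB family. Per the MySQL storage-requirements documentation, a value of length L must satisfy L < 2^8, 2^16, 2^24 and 2^32 bytes for TINYBLOB, BLOB, MEDIUMBLOB and LONGBLOB respectively. A minimal sketch that picks the smallest type able to hold a given size (the helper name `smallest_blob_type` is hypothetical):

```python
# Maximum byte lengths of the MySQL BLOB column types (L < 2**n per the manual).
BLOB_TYPE_LIMITS = [
    ("TINYBLOB", 2**8 - 1),
    ("BLOB", 2**16 - 1),
    ("MEDIUMBLOB", 2**24 - 1),
    ("LONGBLOB", 2**32 - 1),
]


def smallest_blob_type(nbytes):
    """Return the smallest BLOB type that can hold nbytes, or None if none can."""
    for name, limit in BLOB_TYPE_LIMITS:
        if nbytes <= limit:
            return name
    return None


print(smallest_blob_type(65535))  # the truncated revision above only just fits BLOB
print(smallest_blob_type(65536))
```

MEDIUMBLOB already allows about 16 MiB per revision and LONGBLOB about 4 GiB. In practice the effective LONGBLOB limit is also bounded by the server's max_allowed_packet setting, so that setting is worth checking alongside the column type.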

@tiborsimko
Owner

Originally on 2012-03-13

Thanks. Can you please do the following:

  • update the update-db-from-v1.0-to-v1.1 target in Makefile.am with the appropriate ALTER TABLE statement;

  • turn your clean_history_tables script into a new CLI option of the bibedit tool, such as bibedit --check-revisions [recid], that would detect bad revisions for a given record (or for all records if given an asterisk, say), and that would not only list the troublesome revisions but perhaps also interactively ask whether to delete them. (BTW, when checking, beware that there may be bibupload processes running.) See bibedit --help for the other existing options regarding record revisions; you can emulate what these do and edit bibeditcli.py accordingly.
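A minimal sketch of the detection and confirmation steps such an option could perform. It assumes the revisions arrive as hypothetical `(recid, job_date, blob)` tuples, for example rows read from hstRECORD, and treats any zlib error as a bad revision. The function names are illustrative, not part of bibeditcli.py, and the sketch does not address the concurrency concern about running bibupload processes.

```python
import zlib


def find_corrupted_revisions(revisions):
    """Return (recid, job_date, reason) for every revision whose blob fails to decompress.

    `revisions` is assumed to be an iterable of (recid, job_date, blob) tuples.
    """
    corrupted = []
    for recid, job_date, blob in revisions:
        try:
            zlib.decompress(blob)
        except zlib.error as exc:
            corrupted.append((recid, job_date, str(exc)))
    return corrupted


def confirm_deletion(recid, job_date, ask=input):
    """Ask interactively whether one corrupted revision should be deleted."""
    answer = ask("Delete revision %s of record %s? [y/N] " % (job_date, recid))
    return answer.strip().lower() in ("y", "yes")


good = zlib.compress(b"<record><controlfield tag=\"001\">1083313</controlfield></record>")
revisions = [
    (1083313, "20120106133541", good[:10]),  # truncated stream
    (1083313, "20120107090000", good),       # intact revision
]
for recid, job_date, reason in find_corrupted_revisions(revisions):
    print(recid, job_date, reason)
```

Passing `ask` as a parameter keeps the prompt testable. A non-interactive "delete all" mode could simply skip `confirm_deletion`.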

@invenio-developers
Collaborator

Originally by adeiana (@Osso) on 2012-03-15

I made those changes.

@invenio-developers
Collaborator

Originally by adeiana (@Osso) on 2012-11-15

The new version is here: adeiana/941-hst

@kaplun
Collaborator

Originally on 2012-11-15

Hi Alessio,

it would be cool if:

  • you indeed give users the possibility to delete corrupted histories interactively (or globally), e.g. with an additional --delete-corrupted-history flag, as Tibor was mentioning in #comment:8;
  • print errors as in

    ```python
    import sys
    print >> sys.stderr, "ERROR: foo"
    ```

    to make it clear that they are errors;

  • (minor) you can wrap the results of run_sql in intbitset, as in:

    ```python
    for i, recid in enumerate(intbitset(run_sql("SELECT id FROM bibrec"))):
        pass  # check the revisions of record recid here
    ```

    to keep memory usage compact (instead of having a list of one million integers);

  • you can actually guess all the potentially corrupted blobs quickly: just look for the blobs whose length is exactly the maximum size of the BLOB column, i.e. 2^16 - 1 = 65535 bytes, if I understand the reference at http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html correctly. In this way you could in principle speed up the overall check and plug it into the end of the post_upgrade() hook :-)
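The length-based shortcut amounts to a single query. The sketch below runs it against an in-memory SQLite stand-in for hstRECORD with a hypothetical subset of its columns. SQLite's LENGTH() returns the byte count of a BLOB, as MySQL's LENGTH() does, so the same WHERE clause should carry over, but that portability is an assumption to verify on the real server.

```python
import sqlite3
import zlib

BLOB_MAX = 2**16 - 1  # 65535 bytes, the capacity of a MySQL BLOB column


def suspicious_revisions(conn, max_len=BLOB_MAX):
    """Return (id_bibrec, job_date) of revisions whose blob is exactly max_len bytes."""
    cur = conn.execute(
        "SELECT id_bibrec, job_date FROM hstRECORD WHERE LENGTH(marcxml) = ?",
        (max_len,),
    )
    return cur.fetchall()


# In-memory stand-in for hstRECORD (hypothetical subset of the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hstRECORD (id_bibrec INTEGER, job_date TEXT, marcxml BLOB)")
conn.executemany(
    "INSERT INTO hstRECORD VALUES (?, ?, ?)",
    [
        (1, "20120101000000", zlib.compress(b"<record/>")),
        (1083313, "20120106133541", b"\x00" * BLOB_MAX),  # stands in for a truncated blob
    ],
)
print(suspicious_revisions(conn))
```

This only flags candidates. A blob that legitimately compresses to exactly 65535 bytes would also match, and corruption that does not change the length is missed.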
@invenio-developers
Collaborator

Originally by adeiana (@Osso) on 2012-11-15

I agree with the changes and updated the patch accordingly.

I would only use sys.stderr for a script error; finding corrupted revisions is the normal behavior of the script, so to me that output belongs on stdout.
So I left that change out.

You got it right: I went for the speedy procedure in the Invenio upgrade. For bibeditcli I kept the decompression-based check because it is generic: it catches decompression errors, which can be caused by other kinds of corruption and not just by truncation.
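The trade-off between the two checks can be illustrated with a blob that keeps its length but is still damaged. In this sketch (helper names are hypothetical), the last byte, part of the zlib Adler-32 trailer, is flipped. The length heuristic does not notice, while the decompression check fails on the checksum.

```python
import zlib

BLOB_MAX = 2**16 - 1  # capacity of a MySQL BLOB column


def length_check(blob):
    """Quick heuristic: a blob of exactly BLOB_MAX bytes was probably truncated."""
    return len(blob) == BLOB_MAX


def decompress_check(blob):
    """Generic check: any zlib error means the revision is unusable."""
    try:
        zlib.decompress(blob)
    except zlib.error:
        return True
    return False


good = zlib.compress(b"<record><controlfield tag=\"001\">1</controlfield></record>")
# Flip the last byte, which belongs to the Adler-32 trailer: same length, bad checksum.
damaged = good[:-1] + bytes([good[-1] ^ 0xFF])

print(length_check(damaged), decompress_check(damaged))
```

The length query is cheap enough to run on every upgrade, while the full decompression pass is slower but reports any corruption zlib can detect.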

@invenio-developers
Collaborator

Originally by Alessio Deiana alessio.deiana@cern.ch on 2012-11-27

In c6f2a15:

#CommitTicketReference repository="" revision="c6f2a15d7e48e18d99edf966dc7ed91a587a271a"
installation: bigger hstRECORD.marcxml size

- Updates table structure of the history table from blob to longblob.
  (fixes #941)

- Adds a new cli option to bibedit `--check-revisions` to check for
  invalid revisions.

Co-authored-by: Tibor Simko <tibor.simko@cern.ch>