
Database dump failure #248

Closed
traviscb opened this Issue · 11 comments

5 participants

@traviscb
Collaborator

Originally on 2010-08-16

As I mentioned on the last EVO, I'm taking some dumps of the db on dev and beta. Tibor suggested that max_allowed_packet was the culprit for the unclean dumps of the db.

This seems true for dev: I finally took a snapshot of the db there with no errors after raising max_allowed_packet in /etc/my.cnf to 500M. I'm still loading that dump on my machine here to make sure I am right.

However, on beta, I set max_allowed_packet to 500M and then 1G (on pcudssw1502) and still have a problem:

2010-08-16 20:15:01 --> Task #6337 started.
2010-08-16 20:15:01 --> Reading parameters started
2010-08-16 20:15:01 --> Reading parameters ended
2010-08-16 20:15:01 --> Database dump started
2010-08-16 20:15:01 --> ... writing /opt/tbrooks/data-dumps//inspirehep-dbdump-2010-08-16_20:15:01.sql
2010-08-16 20:24:53 --> ERROR: mysqldump exit code is 768.
2010-08-16 20:24:54 --> Task #6337 finished. [ERROR]

I can't find anything obvious about 768 error codes from MySQL.

This is not the same error that dev gave when its dumps failed, so it may be a different issue, but note that the dump file gets to about 15G and ends with:


--
-- Dumping data for table `rnkCITATIONDATA`
--

LOCK TABLES `rnkCITATIONDATA` WRITE;
/*!40000 ALTER TABLE `rnkCITATIONDATA` DISABLE KEYS */;
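
A side note on the 768, in case it helps: if the number in the task log is the raw wait status (as os.system() and friends return it) rather than mysqldump's own exit code, then 768 is just the real exit code shifted up by eight bits, i.e. 768 >> 8 = 3. A minimal decoding sketch, under that assumption:

import os

raw_status = 768  # the number reported in the dbdump task log

if os.WIFEXITED(raw_status):
    # Normal exit: recover the real mysqldump exit code (768 >> 8 == 3).
    print('mysqldump exit code: %d' % os.WEXITSTATUS(raw_status))
elif os.WIFSIGNALED(raw_status):
    # Killed by a signal (e.g. the out-of-memory killer).
    print('mysqldump killed by signal %d' % os.WTERMSIG(raw_status))

If that guess is right, searching for "768" as a MySQL error number will never turn anything up; the code to look up would be mysqldump's exit code 3.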
@tiborsimko
Owner

Originally on 2010-08-16

You seem to have changed one value only; the [mysqldump] section value
still stayed low.

PCUDSSW1502> grep max_a /etc/my.cnf
max_allowed_packet = 1G
max_allowed_packet = 160M
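
For reference, /etc/my.cnf usually carries both a [mysqld] section for the server and a [mysqldump] section for the client tool, each with its own max_allowed_packet, and a bare grep does not show which hit belongs to which section. A minimal Python 3 sketch that reports the section of each hit, assuming a plain INI-style my.cnf without !include directives:

from configparser import ConfigParser

config = ConfigParser(allow_no_value=True, strict=False)
config.read('/etc/my.cnf')

for section in config.sections():
    if config.has_option(section, 'max_allowed_packet'):
        # [mysqld] and [mysqldump] have to be raised independently.
        print('[%s] max_allowed_packet = %s'
              % (section, config.get(section, 'max_allowed_packet')))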
@traviscb
Collaborator

Originally on 2010-08-16

Thanks, good point, I had missed that. But even after fixing it:


[tbrooks@pcudssw1502 ~]$ grep packet /etc/my.cnf
max_allowed_packet = 1G
max_allowed_packet = 1G

I still get:

2010-08-16 20:15:01 --> Task #6337 started.
2010-08-16 20:15:01 --> Reading parameters started
2010-08-16 20:15:01 --> Reading parameters ended
2010-08-16 20:15:01 --> Database dump started
2010-08-16 20:15:01 --> ... writing /opt/tbrooks/data-dumps//inspirehep-dbdump-2010-08-16_20:15:01.sql
2010-08-16 20:24:53 --> ERROR: mysqldump exit code is 768.
2010-08-16 20:24:54 --> Task #6337 finished. [ERROR]
@tiborsimko
Owner

Originally on 2010-08-16

Apparently fixed by setting max_allowed_packet on the client side
as well; a fresh DB dump is available at
/opt/cds-invenio/var/log/inspirehep-dbdump-2010-08-16_23:28:32.sql.
Closing the ticket; let's hold off on further analysis unless the problem
reappears.

@tiborsimko tiborsimko closed this
@traviscb
Collaborator

Originally on 2010-08-17

For future reference, note that this is set on the client side: not in the args to mysqldump, but in /etc/my.cnf on the application server (1506, as opposed to 1502, the db server):

[tbrooks@pcudssw1506:tbrooks/data-dumps]$ more /etc/my.cnf  

<snip>

[mysqldump]
max_allowed_packet = 1G
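
A possible hardening, sketched here only as an idea and not as what dbdump actually does: pass the limit straight on the mysqldump command line, so the dump no longer depends on the [mysqldump] section of the application server's /etc/my.cnf. The host, database name and output path below are illustrative, and credentials are assumed to come from a defaults file.

import subprocess

dump_path = '/opt/tbrooks/data-dumps/inspirehep-dbdump.sql'  # illustrative path
with open(dump_path, 'w') as dump_file:
    status = subprocess.call(
        ['mysqldump',
         '--max_allowed_packet=1G',   # client-side limit, made explicit
         '--host=pcudssw1502',        # db server, per this thread
         'inspirehep'],               # database name is a guess
        stdout=dump_file)

# Unlike a raw wait status, subprocess.call() returns the decoded exit code.
print('mysqldump exit code: %d' % status)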
@tiborsimko
Owner

Originally on 2010-08-17

Yes, that's exactly what I did.

@kaplun
Collaborator

Originally on 2011-02-03

Hmm, I have the same issue in OpenAIRE. I tried to fix it by setting every instance of max_allowed_packet to 1G, without success. Moreover, it is the first time that this error happens (I think), and OpenAIRE has only 3 records :-) So I think it's an error unrelated to max_allowed_packet.

Could it be due to AFS?

P.S. I found this ticket by Googling around for MySQL exit code 768... We seem to be almost the only ones on the web hitting this issue :-)

@tiborsimko
Owner

Originally on 2011-04-01

After further adjustments of MySQL parameters, the dumper has been
performing well for the past few weeks, so I'm closing this ticket.

@jalavik
Collaborator

Originally on 2011-08-12

This issue seems to be back on Inspire, as last night's dbdump failed in exactly this way. Does anyone have any further information about it?

@ludmilamarian
Collaborator

Originally on 2011-08-12

dbdump failed on CDS as well last night, but with a different exit code, 1280, which I don't recognize and for which I did not find any useful hints by googling.
On the other hand, 2 nights ago, dbdump failed on CDS with exit code 768 (as the original ticket reports). I modified my.cnf (as suggested above), but I am not convinced this was causing the problem... it might be AFS playing tricks on us :(

@tiborsimko
Owner

Originally on 2011-08-15

Replying to [comment:13 lmarian]:

I modified my.cnf (as suggested above), but I am not convinced this was causing the problem... it might be AFS playing tricks on us :(

I had increased max_allowed_packet on INSPIRE boxes because of the big citation dictionaries, where ~1GB of blob data is stored in one row. This definitely helped to remove the dumping blocker at the time, although the dump error re-appeared from time to time afterwards.

On CDS, the dumps were working fine in the past even with a smaller max_allowed_packet value, because there are no big blobs such as the citation dictionaries. So it should not be necessary to tweak this particular parameter. Still, some of the tables are huge, so we may need to tweak some other buffer parameters.

I'll try to look at this problem more closely now that it recurs more frequently. FWIW, dumping to the local file system also caused trouble on INSPIRE when I was testing this a year ago, although that may have been before the max_allowed_packet and buffer tweaks. I'll have a look.

@tiborsimko
Owner

Originally on 2011-08-16

So I've had a look, and both the CDS and INSPIRE dumping problems were due to a disk space issue. The dump files are bigger than 33G now, so there was no longer enough room on the dump partition to (temporarily) hold three dumps. I cleaned up the disk space last night and the dump completed successfully on both CDS and INSPIRE; see for example RT ticket 143960.

To be safer in the future, I have increased the AFS quota for the dbdump volume; see RT ticket 144096. Moreover, dbdump will soon check for available space before it runs, so that it won't stop the queue unnecessarily; see #578.
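
For reference, a rough sketch of what such a pre-flight space check could look like; the paths and the "at least as big as the previous dump" heuristic are assumptions, not what #578 actually implements. On an AFS volume the quota, rather than the partition, may be the real limit, so fs listquota might be needed instead of statvfs.

import os

def enough_space_for_dump(dump_dir, previous_dump, safety_factor=1.1):
    """Return True if dump_dir can hold a dump about the size of the previous one."""
    stats = os.statvfs(dump_dir)
    free_bytes = stats.f_bavail * stats.f_frsize
    needed_bytes = os.path.getsize(previous_dump) * safety_factor
    return free_bytes >= needed_bytes

# Hypothetical paths, for illustration only.
if not enough_space_for_dump('/opt/cds-invenio/var/log',
                             '/opt/cds-invenio/var/log/previous-dbdump.sql'):
    raise SystemExit('Not enough free space for a new DB dump; skipping run.')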

I'm therefore closing this concrete ticket here again.
