Segfault when using --rows on large dbs with 0.12 #848
@Dansthunder
Ok, easy enough. Eyeballing it, it appears to be dying on a table with binary data?!? I tested with --hex-blob to see if that made a difference, and it didn't. Thanks again!
@Dansthunder I need you to run:
then upload the file with the core dump located in /tmp/ under the template name:
Oh god, I have just realized of
I tried to simulate your scenario with this:
And executing mydumper like this:
But it is working, no segfault.
Can you tell from my logs which table exactly it's failing on? Maybe I can get an explain or something? Delete some data, optimize, recreate? I'd guess there are a couple of hundred tables, and we're guessing it's the table with binary data in it. The difference between v0.11 and v0.12 of mydumper is too large for me to guess why one works and the other doesn't. The easy solution for now is for me to just stick with v0.11, assuming it's doing everything correctly. Both compile with no issues, and I'm doing a cmake . -DWITH_SSL=OFF and make. I'm running a mysqlcheck -o on the test box and letting it optimize every single table, which will take a bit, to say the least. If there's any change after the optimize, I'll let you know. Thanks!
Hi @Dansthunder, https://www.ringerhq.com/i/mydumper/mydumper is an alternative to fix these kinds of issues faster.
@davidducos I deleted over 100 million rows out of two of the tables last night, from 302 million down to 194 million; it still segfaults. Full optimization of the entire database didn't help, other than saving some disk space. We've reverted back to v0.11.6 and look forward to future releases. Thanks for your time and effort!
@Dansthunder
It didn't segfault immediately like before; it ran for a smidge, then died. Mydumper v0.11.6 has been working for the past few weeks, no issues. ./amx-test.sh: line 26: 2525 Segmentation fault (core dumped) /usr/src/scripts/mydumper-0.13.0-1/mydumper --database=$DB_NAME --host=$DB_HOST --user=$DB_USER --password=$DB_PASS --outputdir=$DB_DUMP --rows=20000 --threads=8 --build-empty-files --compress --triggers --events --routines --logfile=/var/log/backups/$DOW.dw-db.$DB_NAME.log --verbose=3
@Dansthunder can you run this on an x86-64 architecture, please?
None of our servers are x86-64. We're running 100% on aarch64 Graviton2 or Graviton3 C6GN or C7G Amazon EC2 instances. The performance gain in MariaDB from moving to Graviton3, in addition to the cost savings, prohibits migrating backwards to x86-64, either Intel or AMD. We haven't touched x86-64 in about 3+ years now. But, out of curiosity... why not spin this up, move the data volume over, and see wtf happens? Same exact thing. [root@rhel9-x86 tmp]# uname -a [root@rhel9-x86 ~]# ./amx-test.sh
Hi @Dansthunder,
Up to frame 7 is OK; the lower frames are bulls%&t. Can you check again with master? I merged #866, which might solve this issue.
Same. I get ~280 MB dumped; it does fine on the smaller tables, and when it finally hits the big one, it segfaults. Same amount dumped as prior to this master version.
Hi @Dansthunder,
Why am I pointing that out? Well, in frame 7 we are in function process_integer_chunk, which is only called from thd_JOB_DUMP if dbt->chunk_type is INTEGER. However, dbt->chunk_type is CHAR, which makes no sense. I think that you are sending me core dumps that I'm not able to review properly. So I will ask you one more thing; I need you to execute this:
And then we might be able to understand what is going on. |
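To make the contradiction above concrete, here is a hypothetical Python rendering of the dispatch being described (not mydumper's actual C code; apart from process_integer_chunk and thd_JOB_DUMP, the names are illustrative):

```python
from enum import Enum, auto

class ChunkType(Enum):
    NONE = auto()
    INTEGER = auto()
    CHAR = auto()

def thd_job_dump(chunk_type):
    # process_integer_chunk is only reachable when the chunk type is INTEGER,
    # so seeing chunk_type == CHAR while inside that frame suggests the core
    # dump is corrupted or is being read against mismatched symbols.
    if chunk_type is ChunkType.INTEGER:
        return "process_integer_chunk"
    if chunk_type is ChunkType.CHAR:
        return "process_char_chunk"   # hypothetical name
    return "dump_whole_table"         # hypothetical name
```

Under this model, a backtrace showing process_integer_chunk with a CHAR chunk type is self-contradictory, which is why the core dumps looked unusable.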
@Dansthunder I think that I found the issue. Just to be 100% sure, can you share the table structure of the table amx.rtx_pages, or of the last table that is failing? I think that it has a VARCHAR primary key.
Reading symbols from ./mydumper... I was trying to keep things the same... compiling from source, since I'd have to do that on aarch64 anyway. Since SOMEONE doesn't provide rpms for, errr, the right flavor. :) Explain attached.
Hahaha... I think that SOMEONE is not going to help you anymore! Hahaha... For sure, we need help to push #835 to have those packages available ASAP. Ok, about the issue, yeah... I know what the problem is... I'm going to move chunking by CHAR to
Ok, I have been working on it during the weekend, and after understanding where the issue was, I was able to fix it. I changed A LOT of things, so be careful.
Hi @Dansthunder, do you have any feedback?
Yeah. So, not getting a segfault. The backup runs, but on my EC2 image, the x86 box has 16 GB of RAM and 32 GB of swap, and it eats all of its memory and swap and hard locks. The first time I ran it, I thought my ssh session had died. So give me a bit here. From what I saw before it ran away, it looked like mariadb was the one using all the memory, not mydumper, which would be odd. Nothing changed in my.cnf.
Grrr... it takes forever to reboot a hung EC2. Switched back over to the ARM EC2. Running the backup right now on the ARM EC2 instance, 32 GB RAM / 64 GB swap, to see if it barfs here. May have overwhelmed the AMD x86-64 8-core / 16 GB box. Running on the ARM box, no runaway memory issues; the box is sitting with 9 GB of RAM free. It's running. But is it chunking vs using rows? The backup is now 15+ mins in, with 5.2 GB done. Here's the v11.5.3 backup time for the AMX DB: 11 mins.
FYI: Same test EC2, older version: Linux dw-rhel9.va.test.box 5.14.0-70.13.1.el9_0.aarch64 #1 SMP Thu Apr 14 12:36:51 EDT 2022 aarch64 aarch64 aarch64 GNU/Linux. The new mydumper ran off a fresh boot, and the old version was obviously run after it, so MariaDB could be running off of cache. 25 mins vs 15 mins vs 11 mins production. But no segfault! :)
Hi @Dansthunder, |
I have been working on another approach, and I found that --rows will not be the kind of parameter that we need for 'character' splitting. I will be adding --char-deep and --char-chunks, where deep is based on the new strategy of how we split a table. We start with a table with MIN and MAX at deep 0, and thread 1 (t1) is going to take that job. Then, when t2 arrives, t1 will change its max to MIN + (MAX-MIN)/2, which is going to be the min for t2. Now we have 2 jobs, [ MIN , MIN + (MAX-MIN)/2 ] and [ MIN + (MAX-MIN)/2 , MAX ], both at deep 1. Every time that a job is split, the deep increases. --char-deep will limit the deep, which in the end means how many times a chunk can be split.
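The splitting strategy described above can be sketched roughly like this (a simplified Python model, not mydumper's implementation; the tuple layout and function name are made up for illustration):

```python
def split_job(job, char_deep):
    """Split a (min, max, deep) job in half, as when a second thread arrives.

    The original thread keeps the lower half and the arriving thread takes
    the upper half, both one level deeper. Once a job reaches the depth
    limit (--char-deep), it is no longer split.
    """
    lo, hi, deep = job
    if deep >= char_deep:
        return [job]                      # depth limit reached: keep as-is
    mid = lo + (hi - lo) // 2             # halve the [lo, hi] range
    return [(lo, mid, deep + 1), (mid, hi, deep + 1)]

# t1 starts with the whole [MIN, MAX] range at deep 0; when t2 arrives,
# the job splits into two halves at deep 1:
jobs = split_job((0, 1000, 0), char_deep=8)
```

Under this model, --char-deep caps how many times any single chunk can be halved, so the initial range would end up in at most 2^char_deep pieces.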
I am also running into this segmentation fault. I looked in the schema and saw a table that is using a VARCHAR as a primary key. I tried to exclude this table but received another segmentation error. This is what I am running, and the version of the software (mydumper --version). It dies at around 21 GB of dumped data each time. (./script.sh: line 32: 2905 Segmentation fault)
@christianjoun why don't you test with master, compiling it yourself?
Ok, so instead of --rows, now I use --char-chunks and, say, --char-deep=8?
@Dansthunder you need to use --rows!! --char-chunks and --char-deep are for tuning; you can leave the default values.
Ok, using the same exact script with --rows; I didn't add --char-chunks or --char-deep. So we're back to the exact same perf as v0.11.6: 20 mins. This is just a default GP3 drive on this test box, not the max perf that we run in production. mydumper 0.12.8, built against MySQL 10.8.5, with GZIP. Started dump at: 2022-11-09 19:50:44
@Dansthunder ok, this is good, but I'm wondering if we are chunking by CHAR and how it is being split. Is it possible for you to share the log? I need to check the WHERE clause per chunk.
@Dansthunder try again with --rows=1000:100000:10000000
The number of rows doesn't make a difference; I was capped on EC2 drive speed. I maxed out the drive spec and changed threads from 8 to 16, 24, 32, and 64. I was running another shell watching iotop, and no matter what I did, I was flatlining mariadb read speeds. Let's try and break this thing and give DavidDucos something to fix. (Never crashed, no matter how many threads -- aawww shucks.) This is a 16-core arm64 Amazon EC2 c7g.4xlarge, maxing out a GP3 drive at 16,000 IOPS and 1000 MiB/s. 32 threads dropped the previous 20-minute run down to 10, and 64 threads down to 8, with rows=50000. There's one smaller table at the end that's holding up this 8-minute run, which could probably benefit from a smaller rows=xxxx, but I don't feel like waiting again. Started dump at: 2022-11-10 00:52:18. How we settled on 8 threads: it used to be 12, then 8. Obviously, prod only reboots when something bad happens. So after maybe 3-6 months of backups, mariadb would segfault during a backup -- out of memory or whatever. This is going back year(s). After reducing it down to 8, no issues; the box lives, no out-of-memory crashes. So while 16+ threads work on the test box, I seriously doubt I'd trust it, just so a backup runs 2 mins faster.
Hi @Dansthunder, |
With --rows=20000, --threads=8, w/zstd. I had to go digging through the past comments to figure out how to enable it. :) With --threads=8, that's just on the verge of maxing out disk IO read limits at EC2 / GP3 max settings. I can only read so fast from MariaDB. I can go --threads=10, 12, 16, etc.; it's instantly maxed out, and no diff in time. We'd have to move outta the "normal" NVMe GP3 drives and step up to IO2 or IO2 Block Express for insane speeds. The boss ain't paying for that. zstd is the way to go. :)
Hello,
I've just updated mydumper and ran into some issues. Am I missing a flag or doing something wrong?
Results in: ./test.sh: line 24: 4865 Segmentation fault (core dumped) /usr/src/scripts/mydumper/mydumper --database=$DB_NAME --host=$DB_HOST --user=$DB_USER --password=$DB_PASS --outputdir=$DB_DUMP --threads=12 --rows=20000 --compress --triggers --events --routines --verbose=3
On my latest test server, RHEL 9.0, Linux 5.14.0-70.13.1.el9_0.aarch64
[root@rhel9 mydumper]# ./mydumper -V
mydumper 0.12.8, built against MySQL 10.8.5-MariaDB
I'm passing the following, and this works, but it takes forever: since it's using chunks, each large table in the DB effectively gets dumped single-threaded. Tables are ~79GB in size.
--database=$DB_NAME
--host=$DB_HOST
--user=$DB_USER
--password=$DB_PASS
--outputdir=$DB_DUMP
--threads=12
--chunk-filesize=50
--compress
--triggers
--events
--routines
--verbose=3
This USED to work, (and still does on the last of the 11 series -- mydumper 0.11.6, built against MySQL 10.8.5)
--database=$DB_NAME
--host=$DB_HOST
--user=$DB_USER
--password=$DB_PASS
--outputdir=$DB_DUMP
--rows=20000
--compress
--threads=12
--triggers
--events
--routines
--verbose=3
Process of elimination: removing triggers, events, routines, verbose, threads, etc. did nothing. Only adding chunk-filesize solves it. It's ONLY on this large database with very large tables; it works fine on all the other smaller databases with smaller tables. v0.12.5-3 had memory-leak issues and silently died after eating all the swap on the server. v0.12.7-1, -2, -3 & v0.12.8 segfault with --rows= on a DB with large tables.
I've had to roll back v0.11.6 for now to regain --rows capability.
--chunk-filesize=20 with mydumper 0.12.8:
real 99m8.901s
user 35m52.131s
sys 0m27.496s

--rows=20000 with mydumper 0.11.6:
real 8m39.193s
user 36m11.394s
sys 0m33.158s
Thanks!