gridcoinresearch client not shutting down via File/Exit, resulting in corrupted database and wallet. #1293

Closed
kingbeowulf opened this issue Sep 9, 2018 · 10 comments

@kingbeowulf

With v3.7.15.0, stopping and restarting gridcoinresearch (no server) was not an issue. After upgrading to v3.7.16.0, I noticed yesterday that the gridcoinresearch client's green check mark had been replaced with the sync icon: the client had lost sync and wasn't downloading blocks. After File/Exit, the client would not start up and reported corrupt database and wallet errors. I was able to restore and back up the wallet and reload the snapshot.zip files. The client started normally and was back up to date in about an hour. I asked about this issue on #gridcoin (IRC/Discord). After another shutdown, we noticed that gridcoinresearch was still running and writing to the log. After I killed the process, gridcoinresearch started up normally. Several hours later, I shut down gridcoinresearch, killed the dangling process, and got this error on startup:

$ gridcoinresearch
terminate called after throwing an instance of 'std::runtime_error'
  what():  init_blockindex(): error opening database environment IO error: /home/beowulf/.GridcoinResearch/txleveldb/MANIFEST-115390: No such file or directory
Aborted

I deleted blk0001.dat and txleveldb/*, then reloaded snapshot.db. gridcoinresearch synced with no wallet errors. The only other changes I can find after the v3.7.16.0 installation (Tuesday) are that I upgraded to Linux kernel 4.4.153 (more Spectre/Meltdown mitigations, new i7-6850K firmware) and nvidia-390.87 (Thursday). The issues happened, or were noticed, 2 days later. System info:

Slackware64 14.2 
http://www.slackware.com
changelog: ftp://ftp.osuosl.org/pub/slackware/slackware64-14.2/ChangeLog.txt

Current kernel: 4.4.153

build env: clean qemu-2.12.0 Slackware64 14.2 guest VM
build date: 04-SEP-2018 with kernel 4.4.144, other updated current.
buildscript: http://www.slackbuilds.org/repository/14.2/academic/Gridcoin-Research/
source: https://github.com/gridcoin-community/Gridcoin-Research/archive/3.7.16.0/Gridcoin-Research-3.7.16.0.tar.gz
build: QRENC=yes UPNP=yes ./Gridcoin-Research.SlackBuild

I've attached the IRC transcript, db.log and debug.log from around the time of the errors.
db-1st_error.log
db-2nd_error_.log
grc_debuglog1.txt
grc_debuglog2.txt
grc_debuglog3.txt
grc_irc_debug_transcript-edited.txt

@jamescowens
Member

The log shows clear evidence of at least two gridcoinresearch processes running simultaneously. It appears the first instance only partially shut down and another instance tried to access the database, which led to corruption. I helped with the initial troubleshooting on Discord last night and requested that an issue be created.

I was able to determine that a normal kill signal finishes off the residual process that remains after File/Exit is executed. I cannot reproduce this issue on my Linux platforms.

@iFoggz
Member

iFoggz commented Sep 9, 2018

  1. From your first database log, it appears a failed or improper shutdown occurred and the database is not passing its checks. If you did not move your wallet.dat and it's the same one, then something shut down before the database finished. If your wallet was running while you updated the kernel and was forced to shut down, it may not have finished what it needed to do for a safe shutdown.

  2. Your grc_debuglog1.txt has an interesting error about deserialization or I/O, which could mean corruption on your disk or file system. I would run an integrity check on your system, as that may have caused one or more of your issues.

  3. Your grc_debuglog2.txt shows you got stuck with the txPrev bug (which will be addressed in a future release). However, this would not break the blockchain database to the extent you are experiencing.

Please check the integrity of your disk. As for the wallet, try a previous backup and see if it loads without any problems. If transactions are missing, shut down the wallet and restart it with -rescan so it rescans for the missing transactions. The wallet backs itself up daily if it is left running for at least a day.

@iFoggz
Member

iFoggz commented Sep 9, 2018

Keep this issue up to date with your progress and let's figure this out.

@dopeshitnetworks-irc-dopeshit-net

I had the same or a similar issue on a dedicated laptop wallet (no crunching, just the wallet, no normal computer use), and we assumed it was due to my 100-ish wallet addresses. I had the client running on the 3.7.15.0 series release and had not noticed for 2 days that it was complaining about a corrupt database. I had created wallet addresses A-Z, AA-ZZ and 1-50 and would use a new one each time with a faucet, so each UTXO held only 0.02-1 GRC, versus roughly 25k GRC in the main address.
So what about the client option "detach database on exit/shutdown"?
I had to move mine to a new machine and just migrate my wallet.dat and gridcoinresearchd.conf.
Something I noticed: I currently have a node running on Slackware 14.2, installed without SlackBuild. For BOINC I just had to compile and install geoclue, webkitgtk, wxGTK3 and libwebp, then clone the BOINC source from GitHub and compile it.
I did a normal compile and install with Gridcoinresearch too. I try to avoid installing anything that someone has modified or changed from the original source code. So, IMO, maybe drop the SlackBuild, compile from the source code our devs provide on GitHub, and see if that makes any difference. It compiles from the native source; no modifications, SlackBuild, slapt-get or slackpkg are needed.
Although at this point I would temporarily move it to a Windows box, clean up your client, and then move it back to Slackware. Just a thought and my opinion. IMO Gridcoinresearch seems to like running on Slackware, although 192 GB of RAM helps.

@kingbeowulf
Author

Well, I think I figured out why the wallet doesn't always shut down when I think it should. I did check the drive integrity: no SMART errors, no relevant messages/syslog/dmesg entries, and e2fsck is clean.

First, some background. My main Slackware box is configured with Xfce 4.12 and 2 independent X screens:
GTX 1060 monitor0 screen0
GTX 660 monitor1 screen1
Screen0 is the main screen, with panels and systray/notifications, used to run the GRC wallet in a second workspace; Screen1 has just a simple launcher panel. I noticed that when running on Screen1, programs do not show icons in the Screen0 systray (this is correct behavior). Combining this with the option "minimize to tray instead of taskbar" was the issue.

Your wallet has "File - Exit" and the X in the upper corner, but no "File - Close". In both GTK (BOINC) and Qt (GRC), these are fully configurable. With the GRC wallet, I expected the X to mean "File - Exit", not "File - Close". "File - Close" is usually set to minimize to the taskbar or systray (see BOINC Manager), as it is on my system (I checked with some other programs as well). Thus, when I clicked the X, the GRC wallet did not exit but minimized to the systray. Since Screen1 has no systray, the GRC wallet was still in memory when I launched it again (on either X screen).

Recommendations:

  1. Keep "File - Exit" as a full wallet shutdown, with a message as some other wallets do.
  2. Map the window X to "File - Exit", since the current "close" is ambiguous.
  3. Add "File - Close" with behavior defined by the user in "Options - Window", defaulting to "File - Exit".
  4. On launch, check whether the wallet is already running.

This will alleviate some confusion among users (i.e. me) running multiple screens and various DE/WM configurations. Thanks.

@tomasbrod
Member

[x] closes the window, as it should.
Gridcoin wallet requires systray. If you do not have systray, it will not work correctly.

In which DE does the systray not show applications from other screens? It is an interesting issue, because systray apps are not on any particular screen, since they do not have a window.

Gridcoin locks its data directory and keeps it locked during runtime, so a second instance should not start. If a second instance does start and corrupts data when multiple instances run, a separate issue specific to that should be opened.
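The data-directory lock described above can be sketched as follows. This is a minimal Python illustration for Linux/Unix only. It assumes an advisory `flock()` lock on a hypothetical `.lock` file inside the data directory and is not Gridcoin's actual C++ code. A second instance that tries to take the same lock fails immediately instead of opening the databases.

```python
import fcntl
import os
import tempfile


def try_lock_datadir(datadir):
    """Try to take an exclusive, non-blocking lock on <datadir>/.lock.

    Assumption: the ".lock" file name is illustrative. Returns the open file
    object on success (it must stay open for the whole process lifetime to
    keep holding the lock), or None if another instance already holds it.
    """
    path = os.path.join(datadir, ".lock")
    fh = open(path, "a")  # creates the lock file if it does not exist yet
    try:
        # LOCK_NB makes the call fail at once instead of waiting for the holder.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()  # another open file description holds the lock
        return None
    return fh


if __name__ == "__main__":
    datadir = tempfile.mkdtemp()  # stand-in for a real data directory
    first = try_lock_datadir(datadir)
    second = try_lock_datadir(datadir)
    print("first instance locked:", first is not None)    # True
    print("second instance locked:", second is not None)  # False
    first.close()  # closing the file releases the lock
```

Because `flock()` locks belong to the open file, closing the losing instance's handle does not release the winner's lock. A dangling process that still holds its handle keeps the lock, so a new instance is refused rather than sharing the database files.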

@kingbeowulf
Author

Since I moved the GRC wallet to the other X screen, which runs a systray, I can tell whether it's "really" shut down and not just "closed" (verified with htop), and it seems stable. It is concerning that there does not seem to be a startup check to make sure the wallet is not already in memory, or to prevent the new copy from accessing the files.

/off-topic
@tomasbrod, even for most Linux users, my multi-GPU, multi-monitor X Window configuration is a bit unusual. It is not, however, unusual to run without an optional systray. If the GRC wallet requires one, that's on me, assuming it's mentioned in the documentation somewhere. Suffice to say, right now boincmanager, mumble and hexchat, for example, all use a systray if available but do not stay in memory when I click the [x]. That [x] is to close the window AND exit the program from memory. Other wallets behave correctly in this manner, AND pop up a shutdown message as feedback.
/

@denravonska denravonska added the bug label Oct 9, 2018
@denravonska
Member

denravonska commented Oct 9, 2018

I agree that the [X] should close the wallet if there is no systray. I'd assume Qt abstracts this, but I'll have a look.

Edit: From the Bitcoin source, it looks like Bitcoin handles the [X] the same way we do: it either minimizes to tray if the option is enabled, or it quits. Are you sure it works in other wallets?
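The [X] handling discussed here comes down to a small decision. The sketch below is a hypothetical Python illustration, not the wallet's Qt code. It assumes a boolean "minimize on close" option and a boolean tray-availability check; in Qt, the static `QSystemTrayIcon.isSystemTrayAvailable()` reports the latter. It encodes the behavior proposed in this thread: hide to the tray only when the option is on and a tray actually exists, and otherwise quit.

```python
from enum import Enum


class CloseAction(Enum):
    HIDE_TO_TRAY = "hide_to_tray"
    QUIT = "quit"


def on_window_close(minimize_on_close: bool, tray_available: bool) -> CloseAction:
    """Decide what clicking [X] should do (hypothetical helper)."""
    # Hiding is only safe if the user can bring the window back from a tray icon.
    if minimize_on_close and tray_available:
        return CloseAction.HIDE_TO_TRAY
    # No tray, or the option is off: shut down fully so no hidden process
    # is left running and holding the data directory.
    return CloseAction.QUIT
```

With a multi-screen setup like the reporter's, the tray check may not reflect the screen the window is on, so the check is only as good as what the desktop reports.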

@jamescowens
Member

Is this still an issue? BTW there IS a startup check to see if the files are already locked. I am going to close this if there is no response after a few days.

@philipswift

Ages ago we had a caveat saying 'before extensive support/helpdesk becomes available, the end user running Windows needs to...'

  1. Install the latest patches and service packs, excluding 'optional' updates (which can be buggy).
  2. Run CMD as administrator and run chkdsk /f (deferring to James' 'check disk health' comment).
  3. Run CMD as administrator and run sfc /SCANNOW, which scans the integrity of all protected [operating] system files and repairs problem files when possible.


7 participants