Feature request - Provide warning indicator when the miner is dying #630

Open
anto opened this issue Aug 29, 2015 · 3 comments

anto commented Aug 29, 2015

This feature request is currently relevant only to the Antminer U3. However, the feature could also be useful for other miners that cannot report their own status.

As of BFGMiner 5.2.0, BFGMiner does not actually know the status of the Antminer U3. It keeps running and expects the Antminer U3 to keep hashing even after it has crashed. From what I have observed, when an Antminer U3 is dying, the all-time effective average hash rate keeps decreasing. All the other hash rate figures usually increase slightly, so they cannot be used as a reliable indicator.

However, from what I observed today, the all-time effective average hash rate is not reliable either. Still, I believe the algorithm used to calculate it could be used to generate a warning that an Antminer U3 is dying, so that users do not have to physically check the hashing LED on the Antminer U3 to see whether it is still hashing.
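A minimal sketch of that idea, in Python rather than BFGMiner's C. It assumes the effective rate is derived from the difficulty of accepted shares (each difficulty-1 share corresponds to about 2^32 hashes), but measures it over a sliding window instead of all time, so a stalled device shows up within one window rather than being diluted by hours of history. The class name, window length, and warning fraction are hypothetical, not BFGMiner settings.

```python
from collections import deque

HASHES_PER_DIFF1_SHARE = 2 ** 32  # expected hashes behind one difficulty-1 share


class EffectiveRateMonitor:
    """Hypothetical sketch: effective hash rate over a sliding time window."""

    def __init__(self, expected_ghs, window_s=600.0, warn_fraction=0.5, start=0.0):
        self.expected_ghs = expected_ghs    # hash rate the device normally achieves, GH/s
        self.window_s = window_s            # length of the sliding window, seconds
        self.warn_fraction = warn_fraction  # warn below this fraction of expected_ghs
        self.start = start                  # time monitoring began, seconds
        self.shares = deque()               # (timestamp, difficulty) of accepted shares

    def record_share(self, t, difficulty):
        self.shares.append((t, difficulty))

    def effective_ghs(self, now):
        # Forget shares that have fallen out of the window.
        while self.shares and self.shares[0][0] < now - self.window_s:
            self.shares.popleft()
        elapsed = min(self.window_s, now - self.start)
        if elapsed <= 0:
            return 0.0
        total_difficulty = sum(d for _, d in self.shares)
        return total_difficulty * HASHES_PER_DIFF1_SHARE / elapsed / 1e9

    def is_dying(self, now):
        # Too little history to judge until one full window has elapsed.
        if now - self.start < self.window_s:
            return False
        return self.effective_ghs(now) < self.warn_fraction * self.expected_ghs
```

Using a window trades noise for responsiveness: a shorter window reacts faster to a crash but also warns more often during ordinary stretches of bad luck.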

luke-jr (Owner) commented Aug 30, 2015

I'm not sure there's a way to reliably detect failure of the U3. Maybe we can request clock speed info when no nonces have been found in a while...?

anto (Author) commented Aug 30, 2015

Hi Luke,

Yes, perhaps that could work. The question is how long it should wait without receiving nonces from the miner before declaring it dead.
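One way to pick that wait time, assuming nonces arrive as a Poisson process: a device hashing at H hashes/s and returning shares of difficulty d finds about H / (d · 2^32) nonces per second, so the chance that a healthy device stays silent for t seconds by pure bad luck is exp(-t · H / (d · 2^32)). Solving for t gives a timeout with a chosen false-alarm probability. The function name and example numbers are illustrative assumptions; the difficulty at which the U3 actually returns nonces is not stated here.

```python
import math


def nonce_timeout_seconds(hashrate_hs, share_difficulty, false_alarm_prob):
    """Hypothetical helper: seconds without a nonce after which a healthy device
    would still be silent only with probability false_alarm_prob."""
    # Expected nonces per second: a difficulty-d share takes d * 2**32 hashes on average.
    nonce_rate = hashrate_hs / (share_difficulty * 2 ** 32)
    # Poisson model: P(no nonce in t seconds) = exp(-nonce_rate * t); solve for t.
    return math.log(1.0 / false_alarm_prob) / nonce_rate


if __name__ == "__main__":
    # About 50 GH/s per U3 (two U3s gave ~101 GH/s in total); difficulty 1 is an assumption.
    print(f"{nonce_timeout_seconds(50e9, 1, 1e-9):.2f} s")
```

At roughly 50 GH/s and difficulty 1 this gives about 1.8 seconds for a one-in-a-billion false alarm; the timeout grows in direct proportion to d, so a device returning higher-difficulty shares needs a proportionally longer wait.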

As I mentioned in my previous issue report, I am not a programmer, so I have no idea how to implement this feature. However, below is what I have observed so far, which I hope will give you some ideas if you decide to implement it.

Yesterday, I started using a single BFGMiner process to manage two of my Antminer U3s. I used voltage=x750, clock=x0782 and timing=0.0175, and got total hash rate figures of around 101/101/100 GH/s. This morning at around 06:00 UTC (according to the Eligius hash rate graph below), the first Antminer U3 (AMU 0) crashed, but BFGMiner did not detect it. When I checked at around 11:00 UTC, the hashing LED of the first miner was no longer lit. As the screenshot below shows, the all-time average hash rates of all AMU 0 processors had increased to above 12.63 GH/s, while their all-time average effective hash rates had decreased to below 9 GH/s.

I initially thought I could set a trigger to automatically power cycle the miners based on the difference between the all-time average hash rate and the all-time average effective hash rate, as follows:

a = all-time average hash rate
b = all-time average effective hash rate
Initiate power cycle when (a - b) > 10
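The trigger above is a single comparison. A sketch in Python, where the function name is hypothetical and the threshold's unit (not stated in the original) is left as a parameter; writing it out makes the built-in assumption visible, namely that the rule can only fire when a exceeds b.

```python
def should_power_cycle(avg_rate, effective_rate, threshold=10.0):
    """Hypothetical helper for the power-cycle rule.

    avg_rate (a): all-time average hash rate
    effective_rate (b): all-time average effective hash rate
    Both rates and the threshold must be in the same unit.
    """
    # Fires only when a exceeds b by more than the threshold. If b > a,
    # a - b is negative and the rule can never fire.
    return (avg_rate - effective_rate) > threshold
```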

This rule only works if a > b, but I found that b is not reliable. With some sets of voltage, clock and timing parameters, a > b, but with other sets, a < b. I have no idea which parameters affect b, although I have a clear idea of how to control a. I think I will open a separate issue for this.

It would be great if we could simply query the status of the miners via the RPC API instead of trying to work it out myself as above.

Cheers,

Anto

[Screenshot: 1st Antminer U3 crashed]

[Graph: Eligius hash rate graph of the Antminer U3 crash]

jstefanop (Collaborator) commented

I'm working on a solution to this issue so that BFGMiner can detect dead ASIC devices. The only reliable way to do it is by tracking nonce responses from each ASIC, most likely with a default timeout of 120 seconds. If you have an ASIC set to an abnormally high difficulty for whatever reason, I could add a command-line option to set how many seconds without a nonce must pass before the ASIC is marked SICK.
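A minimal sketch of that timeout, in Python rather than BFGMiner's C, tracking the time of the last nonce per device. The 120-second default comes from the comment above; the class and method names, the ALIVE/SICK strings, and the injectable clock are assumptions for illustration, not BFGMiner's actual code or command-line options.

```python
import time


class NonceWatchdog:
    """Hypothetical sketch: mark a device SICK when it has sent no nonce
    for longer than sick_after_s seconds."""

    def __init__(self, sick_after_s=120.0, clock=time.monotonic):
        self.sick_after_s = sick_after_s  # default 120 s; raise it for high-difficulty setups
        self.clock = clock                # injectable so the logic can be tested
        self.last_nonce = {}              # device id -> time of last nonce (or registration)

    def register(self, device_id):
        # Start the timer when the device comes up, so a device that
        # never returns a nonce is also caught.
        self.last_nonce[device_id] = self.clock()

    def on_nonce(self, device_id):
        self.last_nonce[device_id] = self.clock()

    def status(self, device_id):
        idle = self.clock() - self.last_nonce[device_id]
        return "SICK" if idle > self.sick_after_s else "ALIVE"

    def sick_devices(self):
        return [d for d in self.last_nonce if self.status(d) == "SICK"]
```

Using a monotonic clock avoids false alarms (or missed ones) when the system wall clock is adjusted while the miner is running.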
