Minerdash runs but then falls over #50

Open
drak42 opened this issue May 13, 2014 · 24 comments

@drak42

drak42 commented May 13, 2014

Hi,

I got it running and have started adding hosts.

After a while it falls over with this:

2014-05-13T21:21:45.183Z - info: 222.154.249.121 - GET / HTTP/1.1
Error: EMFILE, readdir '/opt/miner-dashboard/frontend/views/partials'
glob error { [Error: EMFILE, readdir '/opt/miner-dashboard/frontend/views/partials']
errno: 20,
code: 'EMFILE',
path: '/opt/miner-dashboard/frontend/views/partials' }

In the browser I see this:
Error: EMFILE, readdir '/opt/miner-dashboard/frontend/views/partials'

Also graphs don't seem to be working.

Chris

@drak42
Author

drak42 commented May 13, 2014

2014-05-13T23:53:12.733Z - info: miner2 - error fetching data code=EMFILE, errno=EMFILE, syscall=connect
2014-05-13T23:53:12.734Z - info: miner9 - error fetching data code=EMFILE, errno=EMFILE, syscall=connect
2014-05-13T23:53:12.735Z - info: miner3 - error fetching data code=EMFILE, errno=EMFILE, syscall=connect
2014-05-13T23:53:12.736Z - info: miner5 - error fetching data code=EMFILE, errno=EMFILE, syscall=connect
2014-05-13T23:53:12.736Z - info: miner6 - error fetching data code=EMFILE, errno=EMFILE, syscall=connect
2014-05-13T23:53:12.738Z - info: miner7 - error fetching data code=EMFILE, errno=EMFILE, syscall=connect

@selaux selaux added the bug label May 14, 2014
@selaux
Owner

selaux commented May 14, 2014

Just create the /opt/miner-dashboard/frontend/views/partials directory, it doesn't seem to be included in the zip.

@selaux selaux added this to the 0.4.0 milestone May 14, 2014
@drak42
Author

drak42 commented May 14, 2014

Hi,

Done that. The app still runs for a while, then seems to start losing its connection to the devices.

2014-05-14T08:56:26.549Z - info: miner10 - error fetching data code=ETIMEDOUT, errno=ETIMEDOUT, syscall=connect
2014-05-14T08:56:26.549Z - info: miner2 - error fetching data code=ETIMEDOUT, errno=ETIMEDOUT, syscall=connect
2014-05-14T08:56:26.550Z - info: miner9 - error fetching data code=ETIMEDOUT, errno=ETIMEDOUT, syscall=connect
2014-05-14T08:56:26.550Z - info: miner5 - error fetching data code=ETIMEDOUT, errno=ETIMEDOUT, syscall=connect
2014-05-14T08:56:26.551Z - info: miner8 - error fetching data code=ETIMEDOUT, errno=ETIMEDOUT, syscall=connect

They are all on the same network, though across two segments. The only difference is that I reference them by the firewall's external IP and port, which I then NAT internally to each miner; that lets me add remote boxes in the same way.

@selaux
Owner

selaux commented May 14, 2014

Can you try one of the example scripts of cgminer (e.g. https://github.com/ckolivas/cgminer/blob/master/api-example.py) to check whether it is an issue with miner-dashboard or the miner is just not reachable anymore?

@drak42
Author

drak42 commented May 14, 2014

Bit of a script kiddie here...

How would I call this to run it and what would I need to change?

@selaux
Owner

selaux commented May 14, 2014

Download the script to the host where miner-dashboard is running, then run it (replace the IP and port if necessary):

python2 api-example.py summary 10.153.210.1 4028

This should give you an output like

{u'STATUS': [{u'STATUS': u'S', u'Msg': u'Summary', u'Code': 11, u'When': 1400059950, u'Description': u'cgminer 4.3.0'}], u'id': 1, u'SUMMARY': [{u'Difficulty Accepted': 112591.0, u'Pool Rejected%': 0.0053, u'Found Blocks': 0, u'Difficulty Rejected': 6.0, u'MHS 15m': 2987.56, u'Device Rejected%': 0.0053, u'Pool Stale%': 0.0843, u'Work Utility': 41.26, u'Rejected': 2, u'Elapsed': 163798, u'Hardware Errors': 975, u'Accepted': 31079, u'Network Blocks': 374, u'Local Work': 522033, u'Get Failures': 4, u'Difficulty Stale': 95.0, u'Total MH': 489078591.0, u'Device Hardware%': 0.8581, u'Discarded': 319983, u'Stale': 25, u'MHS av': 2985.86, u'Getworks': 5937, u'MHS 5s': 3199.98, u'Best Share': 58023, u'MHS 1m': 3018.63, u'MHS 5m': 2992.61, u'Last getwork': 1400059950, u'Remote Failures': 0, u'Utility': 11.38}]}

@drak42
Author

drak42 commented May 14, 2014

Here you go

sudo python api-example.py summary x.x.x.x 4035
{u'STATUS': [{u'STATUS': u'S', u'Msg': u'Summary', u'Code': 11, u'When': 1400060131, u'Description': u'cgminer 3.7.2'}], u'id': 1, u'SUMMARY': [{u'Difficulty Accepted': 17885.913479819999, u'Pool Rejected%': 2.0135999999999998, u'Found Blocks': 0, u'Difficulty Rejected': 367.55075749999997, u'Device Rejected%': 46.941299999999998, u'Pool Stale%': 0.0, u'Work Utility': 13.199999999999999, u'Rejected': 16, u'Elapsed': 3558, u'Hardware Errors': 117, u'Accepted': 767, u'Network Blocks': 52, u'Local Work': 1659, u'Get Failures': 0, u'Difficulty Stale': 0.0, u'Total MH': 1359.1251999999999, u'Device Hardware%': 13.0, u'Discarded': 836, u'Stale': 0, u'MHS av': 0.38, u'Getworks': 417, u'MHS 5s': 0.38, u'Best Share': 921033, u'Remote Failures': 0, u'Utility': 12.93}]}

@drak42
Author

drak42 commented May 14, 2014

As I said, npm start works fine for a while, then suddenly seems to lose connectivity.

@drak42
Author

drak42 commented May 14, 2014

had no issues with version 2 though...

@selaux
Owner

selaux commented May 14, 2014

Hm, I didn't change anything having to do with polling the miner status from 0.2.0 to 0.3.0.

Now to get some more information:

  • How long does it take until the timeout errors happen?
  • Can you issue the api-example commands while the timeout errors happen?
  • Please run lsof -p 3614 while the errors happen and paste the output, where 3614 is the PID of the node app process (the second column when you execute ps aux | grep "node app")

@selaux
Owner

selaux commented May 14, 2014

PS: The issue might have been there before, the logging is a new thing.

@selaux
Owner

selaux commented May 14, 2014

Another thing: Do all connections fail? Do you get any updated timestamps in the dashboard?

@drak42
Author

drak42 commented May 15, 2014

I will get those details for you shortly. Yes, all connections fail. It runs perfectly for a few minutes, then seems to lose all connections; sometimes a few come back, then they drop off again too.

@drak42
Author

drak42 commented May 17, 2014

Hi,

Got some time to do a few tests.

  1. It took about 2 minutes to fall over and lose connections to all devices.
  2. Issuing the API command to a device still returns data with no problems
  3. Here is some output for you:
    36 (SYN_SENT)
    node 3610 root 1013u IPv4 51342215 0t0 TCP BlackBOX.fritz.box:50053->x.86.204.y.static.snap.net.nz:44036 (SYN_SENT)
    node 3610 root 1014u IPv4 51342216 0t0 TCP BlackBOX.fritz.box:46318->x.86.204.y.static.snap.net.nz:44037 (SYN_SENT)
    node 3610 root 1015u IPv4 51342217 0t0 TCP BlackBOX.fritz.box:46319->x.86.204.y.static.snap.net.nz:44037 (SYN_SENT)
    node 3610 root 1016u IPv4 51342218 0t0 TCP BlackBOX.fritz.box:46320->x.86.204.y.static.snap.net.nz:44037 (SYN_SENT)
    node 3610 root 1017u IPv4 51342604 0t0 TCP BlackBOX.fritz.box:36358->x.86.204.y.static.snap.net.nz:44038 (SYN_SENT)
    node 3610 root 1018u IPv4 51342605 0t0 TCP BlackBOX.fritz.box:36359->x.86.204.y.static.snap.net.nz:44038 (SYN_SENT)
    node 3610 root 1019u IPv4 51342606 0t0 TCP BlackBOX.fritz.box:36360->x.86.204.y.static.snap.net.nz:44038 (SYN_SENT)
    node 3610 root 1020u IPv4 51342607 0t0 TCP BlackBOX.fritz.box:47004->x.86.204.y.static.snap.net.nz:44039 (SYN_SENT)
    node 3610 root 1021u IPv4 51342608 0t0 TCP BlackBOX.fritz.box:47005->x.86.204.y.static.snap.net.nz:44039 (SYN_SENT)
    node 3610 root 1022u IPv4 51342609 0t0 TCP BlackBOX.fritz.box:47006->x.86.204.y.static.snap.net.nz:44039 (SYN_SENT)
    node 3610 root 1023u IPv4 51342306 0t0 TCP BlackBOX.fritz.box:40078->.86.204.y.static.snap.net.nz:44031 (SYN_SENT)
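
Editorial aside: every row pasted above is in SYN_SENT, meaning the connect was sent but nothing came back. For longer lsof dumps, a per-state tally makes that pattern easier to see. The following is a minimal sketch, not part of miner-dashboard; the function name countTcpStates is hypothetical, and it assumes the TCP state is the trailing parenthesised token on each row, as in the output above.

```javascript
// Tally TCP socket states from pasted `lsof -p <pid>` output.
// Assumption: each TCP row ends with "(STATE)", as in the rows in this thread.
function countTcpStates(lsofText) {
  const counts = {};
  for (const line of lsofText.split('\n')) {
    // Skip anything that is not a TCP row (regular files, pipes, headers).
    if (!/\bTCP\b/.test(line)) continue;
    const match = line.match(/\(([A-Z_0-9]+)\)\s*$/);
    const state = match ? match[1] : 'NO_STATE';
    counts[state] = (counts[state] || 0) + 1;
  }
  return counts;
}

// Two rows copied from the output above.
const sample = [
  'node 3610 root 1021u IPv4 51342608 0t0 TCP BlackBOX.fritz.box:47005->x.86.204.y.static.snap.net.nz:44039 (SYN_SENT)',
  'node 3610 root 1022u IPv4 51342609 0t0 TCP BlackBOX.fritz.box:47006->x.86.204.y.static.snap.net.nz:44039 (SYN_SENT)',
].join('\n');

console.log(countTcpStates(sample)); // { SYN_SENT: 2 }
```

A tally dominated by SYN_SENT would suggest that packets are being dropped somewhere on the path rather than refused by the miner.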

As I mentioned, I am doing port NATing at the firewall level to access units in different networks. They all exist behind my public IP.

Thanks

@drak42
Author

drak42 commented May 17, 2014

Example of my configs:
id: 'miner1',
module: 'miners/bfgminer',
title: 'Rock Solid Miner 1 - Dual Sappihre R9 270x',
host: '203.86.204.25',
port: 44030

Port 44030 on my firewall NATs to port 4030 on an internal device in the 192.168.1.x range.

@selaux
Owner

selaux commented May 18, 2014

Can you try the current master? I tried a fix.

@drak42
Author

drak42 commented May 19, 2014

I did a git pull update, hope that's ok.
Same thing, runs perfectly for a while then falls over.

Get these errors in log when trying to refresh the browser when it fails.

2014-05-19T06:23:39.705Z - info: 192.168.1.102 - GET / HTTP/1.1
Error: EMFILE, open '/opt/miner-dashboard/frontend/views/index.hbs'
glob error { [Error: EMFILE, readdir '/opt/miner-dashboard/frontend/views']
errno: 20,
code: 'EMFILE',
path: '/opt/miner-dashboard/frontend/views' }
2014-05-19T06:23:40.396Z - info: miner1 - error fetching miner data Error: connect EMFILE
2014-05-19T06:23:40.399Z - info: miner10 - error fetching miner data Error: connect EMFILE

In the browser I get this:

Error: EMFILE, open '/opt/miner-dashboard/frontend/views/index.hbs'

I will try a clean installation, also I get the following in npm update/install. Not sure if they mean anything.

npm WARN engine hawk@0.10.2: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.28","npm":"1.4.9"})

npm WARN unmet dependency /opt/miner-dashboard/node_modules/grunt-browserify requires async@'~0.7.0' but will load
npm WARN unmet dependency /opt/miner-dashboard/node_modules/async,
npm WARN unmet dependency which is version 0.8.0
npm WARN unmet dependency /opt/miner-dashboard/node_modules/grunt/node_modules/js-yaml/node_modules/argparse requires underscore.string@'~2.3.1' but will load
npm WARN unmet dependency /opt/miner-dashboard/node_modules/grunt/node_modules/underscore.string,
npm WARN unmet dependency which is version 2.2.1
npm WARN unmet dependency /opt/miner-dashboard/node_modules/handlebars/node_modules/uglify-js requires async@'~0.2.6' but will load
npm WARN unmet dependency /opt/miner-dashboard/node_modules/async,
npm WARN unmet dependency which is version 0.8.0

npm WARN engine cryptiles@0.1.3: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.28","npm":"1.4.9"})
npm WARN engine sntp@0.1.4: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.28","npm":"1.4.9"})
npm WARN engine hoek@0.7.6: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.28","npm":"1.4.9"})
npm WARN engine boom@0.3.8: wanted: {"node":"0.8.x"} (current: {"node":"v0.10.28","npm":"1.4.9"})

npm WARN optional dep failed, continuing fsevents@0.2.0

@selaux
Owner

selaux commented May 19, 2014

I'm out of ideas. It looks like the connections to cgminer cannot be opened or closed correctly and you eventually run out of file descriptors that you are allowed to open. You could increase the limit via ulimit, but that will just delay the errors. I'll leave this open, maybe I'll get some ideas in the future.
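
Editorial aside: EMFILE means the process has hit its limit on open file descriptors, and the lsof output earlier in this thread shows descriptor numbers up to 1023, which would fit a limit of 1024 (an assumption; the actual limit was never posted). To watch descriptors pile up before the errors start, something like the sketch below could be logged periodically. It is Linux-only because it reads /proc, the helper names are hypothetical, and the count is approximate since reading the directory briefly uses a descriptor itself.

```javascript
const fs = require('fs');

// Count the file descriptors currently open in a process (Linux only).
// Returns null where /proc is unavailable or the process cannot be inspected.
function countOpenFds(pid = 'self') {
  try {
    return fs.readdirSync(`/proc/${pid}/fd`).length;
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: true once usage reaches a fraction of the limit.
function isNearFdLimit(openFds, limit, fraction = 0.8) {
  return openFds !== null && openFds >= limit * fraction;
}

const openNow = countOpenFds();
console.log('open descriptors:', openNow);
// 1024 is an assumed limit; check the real one with `ulimit -n`.
console.log('near an assumed limit of 1024:', isNearFdLimit(openNow, 1024));
```

A count that climbs steadily between polls would point to sockets that are never closed, which raising the limit only postpones.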

@selaux selaux modified the milestones: 0.5.0, 0.4.0 May 19, 2014
@selaux
Owner

selaux commented May 19, 2014

All right, it didn't let me go. After some tests with your miners 😉, I think we have two issues:

  • If a host does not respond at all (DROP in iptables), the connection is kept open for an infinite amount of time (this explains the huge amount of open connections). This should be fixed by the time you read this (2b67cda).
  • The second issue might be something with your firewall. At the moment the dashboard is running fine on my machine, but I had some issues where none of the miners would respond in time. When this happens, executing api-example.py does not yield any results either, and restarting miner-dashboard does not help (it recovers after a while without requests). Do you have any rate limiting enabled? If so, it needs to be set to (numberOfMiners * 3) / interval.
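
Editorial aside: the first point, a connection that stays open forever when the host never answers, can be illustrated with a hard deadline around each request. This is a sketch of the general technique only, not the code in commit 2b67cda. The function name queryMiner is hypothetical, and it assumes the miner API accepts a JSON command over TCP, replies with JSON followed by a NUL byte, and then closes the connection, as the api-example.py session above suggests.

```javascript
const net = require('net');

// Query a cgminer-style API with a hard deadline, so that a host which
// silently drops packets cannot hold a file descriptor open indefinitely.
function queryMiner(host, port, command, timeoutMs) {
  return new Promise((resolve, reject) => {
    const socket = net.connect({ host, port });
    const chunks = [];
    let settled = false;
    let timer = null;

    function finish(err, value) {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      socket.destroy(); // always release the descriptor, on success or failure
      if (err) reject(err);
      else resolve(value);
    }

    // The deadline covers connecting, sending, and receiving.
    timer = setTimeout(() => {
      finish(new Error(`timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    socket.on('connect', () => socket.write(JSON.stringify({ command })));
    socket.on('data', (chunk) => chunks.push(chunk));
    socket.on('error', (err) => finish(err));
    socket.on('end', () => {
      // Assumption: the reply is JSON with trailing NUL bytes to strip.
      const text = Buffer.concat(chunks).toString('utf8').replace(/\0+$/, '');
      try {
        finish(null, JSON.parse(text));
      } catch (err) {
        finish(err);
      }
    });
  });
}

// Example call (needs a reachable miner, so it is left commented out):
// queryMiner('10.153.210.1', 4028, 'summary', 5000).then(console.log, console.error);
```

The important part is that every path, including the timeout, ends in socket.destroy(); without that, each unanswered poll leaves a socket in SYN_SENT until the operating system gives up.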

Let me know if there is any news with current master.

NB: With that many miners you might want to increase the polling interval (the default is every second, to keep the frontend responsive). I think something around 5 seconds would be better (less traffic, almost the same value).
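
Editorial aside: the rate-limit formula above, (numberOfMiners * 3) / interval, is easy to evaluate for both polling intervals mentioned. The sketch below assumes the interval is meant in seconds (passed here in milliseconds) and uses 10 miners, a figure taken only from the miner10 entries in the logs; the real count may be higher. The function name is hypothetical.

```javascript
// New connections per second implied by (numberOfMiners * 3) / interval.
// Assumption: the formula's interval is in seconds; this takes milliseconds.
function connectionsPerSecond(numberOfMiners, intervalMs) {
  return (numberOfMiners * 3) / (intervalMs / 1000);
}

console.log(connectionsPerSecond(10, 1000)); // 30
console.log(connectionsPerSecond(10, 5000)); // 6
```

Under those assumptions, moving from a 1 second to a 5 second interval cuts the connection rate the firewall has to allow from 30 to 6 per second.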

@drak42
Author

drak42 commented May 20, 2014

Hey, thanks for all the help!

Been running for 10 minutes now so looking good :)

@drak42
Author

drak42 commented May 20, 2014

Spoke too soon...

I'm redoing the network next week to eliminate the firewall, I'll get back to you after that. Issue may be there then

@selaux
Owner

selaux commented Jun 28, 2014

Any update?

@drak42
Author

drak42 commented Jul 18, 2014

Hey mate, just getting back to this.
The problem still exists. I'm wondering if it's something related to the antminers...

@drak42
Author

drak42 commented Jul 29, 2014

I changed the default time value of 1000 in bfgminer.js to 5000 and it seems to be working
