Allow per-pool algorithm/nfactor and switching kernel at runtime (ORIG: Allow more per-pool configuration values) #134

Humancell · 2014-03-05T07:48:53Z

I found myself tonight wanting to set up a .conf file with multiple pools ... but realized that one of the pools is using scrypt-n.

It would be valuable if some of the configuration items that are pool specific could be added to the pools section in the .conf file.

e.g. adding "nfactor" : 11 in only some of the pools sections.

Humancell · 2014-03-05T19:42:40Z

As I did more looking at this, I see that due to historical reasons the json of the .conf was never really designed thinking of multiple algorithms for different coins.

What I am seeing now is that the "pools" section contains some pool specific information, but that the rest of the .conf contains a mixture of GPU/Hardware related settings and coin/pool specific values.

I wonder if there is a way to modify the way that sgminer parses the .conf so that the format could remain as is ... but allow for "pool specific overrides" to be added to the pools section of the .conf?

This would provide backwards compatibility, and allow newer files to have the pool specific settings.

I do realize that this means that the code would have to be updated to allow some of these settings to be changed by sgminer when pool switches occur ... a whole other issue that would have to be explored.

veox · 2014-03-10T09:49:11Z

I do realize that this means that the code would have to be updated to allow some of these settings to be changed by sgminer when pool switches occur ...

This indeed would be very nice, and nothing impossible. Switching kernels at runtime and an overhaul of configuration parsing.

IMO (EDIT: implementing this in the current codebase) is just postponing the inevitable rewrite, though.

mrbrdo · 2014-03-17T17:56:46Z

@veox shouldn't be too hard I imagine. Also I wouldn't just dismiss the benefits of "battle-tested" cgminer codebase. Sure, future development would probably be much faster on Python, but it would probably take quite a while before it would work rock solid. I wouldn't mind a Python rewrite at all but I think it will take months before it's stable enough for regular use. You'd probably have to write tests too and that takes time as well.

One important use case of this for me is that I could give my rig up for Scrypt renting and mine Vert in the meantime.
I think even just having algorithm/nfactor settings per-pool could be enough, I use identical settings for scrypt and vertminer on my R9 280Xs. Haven't tried on sgminer but definitely planning to.

mrbrdo · 2014-03-20T04:16:02Z

Anyone wanna try this?
https://github.com/mrbrdo/sgminer/tree/poolalgo
Don't forget it's the "poolalgo" branch. If you wanna download zip: https://github.com/mrbrdo/sgminer/archive/poolalgo.zip

Example config:

{
"pools" : [
        {
                "url" : "stratum+tcp://myscryptpool",
                "user" : "y",
                "pass" : "x",
                "pool-algorithm" : "scrypt",
                "pool-nfactor" : "10"
        },
        {
                "url" : "stratum+tcp://myvertcoinpool",
                "user" : "y",
                "pass" : "x",
                "pool-algorithm" : "adaptive-n-factor",
                "pool-nfactor" : "11"
        }
]
...

I can only test it on my Macbook Air at the moment and the hashrate is so low I can't make any real conclusions, but it seems like it's working. If everything is well I can clean it up and make a PR, but I don't know if this is the best possible approach to solve this. But hey, if it works, it's good enough for me :)

mrbrdo · 2014-03-21T03:14:42Z

I got the chance to try it out... It doesn't crash, but after switching to a pool with a different algo, the hashrate stays really low, and I can't figure out why... When switching from scrypt to adaptive, I also got "GPU0: invalid nonce - HW error" errors, but when switching from adaptive to scrypt, I didn't get those. In both cases the hashrate was really low, though. Any idea @veox ? I think someone more familiar with the codebase might immediately know what I'm missing here..
The invalid nonce sounds like it's still using the old kernel, but initCl does get called and the kernel gets compiled for sure.

mrbrdo · 2014-03-22T05:00:25Z

Success. Sgminer was not freeing up OpenCL buffers in thread_shutdown, that's why it slowed down after restarting, it works now. I still have to fix something probably related to if there are multiple gpus/threads, but on my test rig where I have just one thread it works pretty nicely.

veox · 2014-03-22T14:10:55Z

@mrbrdo sorry for the communication lag, busy working AFK this week.

The poolalgo branch code looks pretty good. Kernel switching at runtime is much desired. The pool-* variables are a nuisance, sure, but that's an issue with the config-parsing code mostly.

Haven't tested yet, will probably have time next week.

mrbrdo · 2014-03-22T15:29:46Z

Cool. Like I said there is still something to fix as it doesn't work with multiple gpu threads yet (segfault after startup).

mrbrdo · 2014-03-22T18:01:43Z

Got it.. Just a stupid mini mistake. It works now, but I have to test it a bit more, not sure if I get same hashrate now... Have to test.

I have 2 options with different problems:

if I pause the threads and reinit OpenCL (should be also a little faster), then the problem is that on start, it seems the threads might not always get to the point where they are paused (call mt_disable), and then it just wait for that indefinitely
if I cancel the threads and then completely reinit (reinit_gpu), that doesn't have the above issue, but then sometimes I get "GPUx failure, disabling!", I guess it thinks something is wrong with the thread even though it's not

mrbrdo · 2014-03-23T02:52:24Z

Ok I eventually got to a better approach, switching kernels after it already gets work for the new pool. It works pretty well now, but there is still some memory probably leaking in OpenCL/on GPU, because hashrate drops a bit and I also got a CL_MEM_OBJECT_ALLOCATION_FAILURE from clEnqueueNDRangeKernel at one point.
Btw, why do we set algorithm name when it's irrelevant? I mean adaptive-n-factor is still scrypt... And the name parameter is not used anywhere except for displaying, but we could just display the nfactor that is used. After all, adapter-n-factor with nfactor 10 is just normal scrypt.

PS: Sorry for the weird indentation, I'll fix it before I do a pull req.

mrbrdo · 2014-03-23T06:37:00Z

I think I finally nailed it. No hashrate drop, after initial testing I don't see any issues. There could still possibly be corner cases (like switching pools very fast) that need to be handled, and I'm currently using a pretty lame synchronisation "workaround", but it does seem to work now. So at least we can go from here now. I can clean up a little bit and make a pull request.

I chose to switch the kernel inside the hash_sole_work function (which seems to be always used for scrypt anyway, possibly we can remove hash_driver_work and hash_queued_work), main reason being that switch_pools seems to happen more often than actual switching of pools (I guess for checking pools periodically), and also since I have to make all the threads stop processing, which I can usually do by pausing them, but that won't work when there is no work yet (primarily on startup), because the thread is stuck in get_work (blocking), then the only option would be to kill it off, but don't like that idea since I think reinit_gpu has some memory leaks (which also impact performance).

The only issue that I do see is that when mining nfactor 11, one of my cards (tried on a rig with 4 GPU and rig with 6 GPU, same in both cases) has a bit lower hashrate (280 instead of 350). Which card it is seems to be random, when I switch pool to nf10 and back, it will be a different card :) Weird stuff. Always just one as far as I've tried.

veox · 2014-03-23T13:24:15Z

Btw, why do we set algorithm name when it's irrelevant?

There's been a lot of requests for "scrypt-jane" (Keccak, actually), which uses the same N factor as Litecoin Scrypt, but a different hashing function. There's a lot more CCs covered by sph-sgminer that use yet other functions.

Scrypt has so many possible combinations of parameters/functions that IMO it's best to have "one way to define a parameter set" from the very beginning. Perhaps it should also have network algorithm aliases, too ("litecoin", "vertcoin", etc.).

veox · 2014-03-23T13:32:12Z

The only issue that I do see is that when mining nfactor 11, one of my cards (tried on a rig with 4 GPU and rig with 6 GPU, same in both cases) has a bit lower hashrate (280 instead of 350). Which card it is seems to be random, when I switch pool to nf10 and back, it will be a different card :) Weird stuff. Always just one as far as I've tried.

Are you using two host-side GPU threads (gpu-threads setting) on these cards? Could it be that one of the threads gets "stuck"? (Just an OTTOMH guess.)

mrbrdo · 2014-03-23T17:45:46Z

Yeah, two gpu-threads. I thought about that, but not sure since then I would expect the hashrate to be about half, no? It's kind of weird that it happened every time when I switched to adaptive, and never when I switched to scrypt :)
I had a similar issue with sgminer (on adaptive-n-scrypt) before, but on just one of my rigs, the hashrate on one of the cards was exactly the same, but it was always the first card (GPU0). I then tried removing cards from the rigs, and no matter which/how many cards I removed, GPU0 was always having a lower hashrate (but sometimes it went back up for a bit and then back down).
So what I am suspecting is, that the card that is slower now, is the first one that does the switch of the kernel, since it happens in no particular order. And that the bug is somewhere else, but this maybe makes it more apparent.

mrbrdo · 2014-03-24T01:04:18Z

Funny, if I restart that GPU (via ncurses cli interface for example), then starts working normally :) I'll try to see if I can make the threads restart fully instead of just reinitializing OpenCL, perhaps that will solve the issue.

mrbrdo · 2014-03-24T03:15:21Z

@veox fully working now, no more issues. I also moved the switching to get_work, seems like the best place to put it.

https://github.com/veox/sgminer/pull/163

veox · 2014-04-07T19:57:06Z

There is also a way to use algorithm in the JSON config instead of pool-algorithm. It won't work with CLI args, since it relies on json_array_index. I'm not terribly concerned about that, since --pool-algorithm looks better on the command line anyway.

Other than this (a way to change the default using the same config option name), there's nothing else to wish for with functionality. (EDIT: except maybe the API...) I'll test it a bit more in case I introduced any bugs with my changes, and merge into master.

@mrbrdo, thank you for taking this and implementing it from start to finish.

veox · 2014-04-07T20:23:31Z

Merged into master.

mrbrdo · 2014-04-07T21:33:01Z

@veox thanks veox for maintaining this project. Now let's just hope Scrypt-N coins gain profitability so I will actually have use for this again :P

veox · 2014-04-07T23:38:29Z

You just wait a twoweeks. ;)

veox added the enhancement label Mar 10, 2014

veox added this to the 5.0.0 milestone Mar 10, 2014

veox added configuration labels Mar 17, 2014

veox changed the title ~~Allow more per-pool configuration values~~ Allow per-pool algorithm/nfactor and switching kernel at runtime (ORIG: Allow more per-pool configuration values) Mar 22, 2014

veox modified the milestones: 4.2.0, 5.0.0 Mar 22, 2014

veox mentioned this issue Mar 22, 2014

Abstract support for multiple algorithms #126

Closed

veox closed this as completed Apr 7, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow per-pool algorithm/nfactor and switching kernel at runtime (ORIG: Allow more per-pool configuration values) #134

Allow per-pool algorithm/nfactor and switching kernel at runtime (ORIG: Allow more per-pool configuration values) #134

Humancell commented Mar 5, 2014

Humancell commented Mar 5, 2014

veox commented Mar 10, 2014

mrbrdo commented Mar 17, 2014

mrbrdo commented Mar 20, 2014

mrbrdo commented Mar 21, 2014

mrbrdo commented Mar 22, 2014

veox commented Mar 22, 2014

mrbrdo commented Mar 22, 2014

mrbrdo commented Mar 22, 2014

mrbrdo commented Mar 23, 2014

mrbrdo commented Mar 23, 2014

veox commented Mar 23, 2014

veox commented Mar 23, 2014

mrbrdo commented Mar 23, 2014

mrbrdo commented Mar 24, 2014

mrbrdo commented Mar 24, 2014

veox commented Apr 7, 2014

veox commented Apr 7, 2014

mrbrdo commented Apr 7, 2014

veox commented Apr 7, 2014

Allow per-pool algorithm/nfactor and switching kernel at runtime (ORIG: Allow more per-pool configuration values) #134

Allow per-pool algorithm/nfactor and switching kernel at runtime (ORIG: Allow more per-pool configuration values) #134

Comments

Humancell commented Mar 5, 2014

Humancell commented Mar 5, 2014

veox commented Mar 10, 2014

mrbrdo commented Mar 17, 2014

mrbrdo commented Mar 20, 2014

mrbrdo commented Mar 21, 2014

mrbrdo commented Mar 22, 2014

veox commented Mar 22, 2014

mrbrdo commented Mar 22, 2014

mrbrdo commented Mar 22, 2014

mrbrdo commented Mar 23, 2014

mrbrdo commented Mar 23, 2014

veox commented Mar 23, 2014

veox commented Mar 23, 2014

mrbrdo commented Mar 23, 2014

mrbrdo commented Mar 24, 2014

mrbrdo commented Mar 24, 2014

veox commented Apr 7, 2014

veox commented Apr 7, 2014

mrbrdo commented Apr 7, 2014

veox commented Apr 7, 2014