NQueensFAF is an independent research project dedicated to the development of highly optimized or generally new algorithms for solving the N-Queens problem. It is platform independent and the included solvers can be tested through the GUI or CLI of the demo application or embedded in your project (πInstallation). Built with Java 21.
Included:
- a solver for CPUs using Java Threads, supports multi-threading
- a solver for GPUs using OpenCL, supports automated distribution among multiple GPUs (multi-GPU)
- a simple, recursive, single-threaded solver for comparison purposes.
Currently work in progress:
- a completely new solving method with outstanding performance and excellent scaling (πNews)
- a server-client system for distributed computing on heterogeneous GPUs and CPUs (πDistributed Computing)
Description | GUI | CLI |
---|---|---|
adjust the number of CPU threads | β | β |
adjust the number of pre-placed queens | β | β |
view, select and configure available GPUs | β | β |
automatically determine the weight of all available GPUs (for multi-GPU) |
β | |
save&restore the progress of a solver run (includes auto-save) | β | β |
save/load solver configurations to/from files | β | |
see a history of all finished runs during the current session | β | |
see the records for a certain N (record = shortest duration for finishing a run) |
β | |
copy the solver configuration of a record or history entry | β |
Note: While the GPU-Solver is tested successfully for a range of NVIDIA and Intel Integrated GPUs, it remains not working on AMD GPUs and untested for Intel Arc GPUs. If you happen to have an Intel Arc GPU, feel free to test it and let us know if it worked or not :)
During the time we have spent developing NQueensFAF, we have been able to continuously expand our available hardware. Especially the newer graphics cards show the potential of our program.
GPUs
Board size N | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
---|---|---|---|---|---|---|---|---|
RTX 3080 FE | 0.03s | 0.77s | 5.85s | 0:48m | 6:56m | 1:02h | 9:45h | 4d 7h |
RTX 3060 Ti FE | 0.10s | 1.26s | 10.18s | 1:23m | 12:10m | 1:49h | 17:50h | 7d 2h |
GTX 1650 Ti | 0.40s | 3.62s | 29.08s | 4:02m | 35:21m | not measured | not measured | not measured |
Intel UHD 770 | 4.71s | 32.98s | 4:18m | 36:13m | not measured | not measured | not measured | not measured |
RX 6650 XT | 0.28s | 2.00s | 16.60s | 2:13m | 19:14m | 3:03h | not measured | not measured |
CPUs
Board size N | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
---|---|---|---|---|---|---|---|
i5 - 12600k single | 1.12s | 7.04s | 49.92s | 6:21m | 57:47m | not measured | not measured |
i5 - 12600k multi | 0.203s | 0.79s | 4.91s | 37.1s | 4:59m | 42:20m | 6:09h |
i5 - 9300h single | 1.32s | 8.95s | 1:05m | 8:20m | 1:10h | not measured | not measured |
i5 - 9300h multi | 0.25s | 1.75s | 12.5s | 1:35m | 13:05m | 1:52h | 16:18h |
Ryzen 5800X single | 0.91s | 6.09s | 44.3s | 5:38m | 45:24m | mot measured | not measured |
Ryzen 5800X multi | 0.28s | 0.70s | 4.06s | 30.3s | 4:04m | 33:53m | not measured |
Single stands for single-threaded and multi for multi-threaded with the maximum number of threads. The CPUs and the GPUs are used with stock settings.
Note: Your graphics card may go into another power state when running the program. To check this and to avoid this, you can use a tool such as "nvidiainfo".
Java21 (or a newer version) needs to be installed on your computer.
The demo application can be downloaded from the Releases page.
There are two artifacts potentially useful for external projects:
nqueensfaf-impl
: contains the classes representing the solvers; depends onnqueensfaf-core
nqueensfaf-core
: simplifies the implementation of a new solver algorithm and its usage
Their jar's can be downloaded from the Releases page to be added to the classpath of your project.
The GUI is self-explanatory. If you do have a question though, feel free to ask.
Command format:
nqueensfaf-demo (-n=<N> | -r=<path_to_save_file>) [<general_options>] <solver_command>
If you just want to get started, take a look at the examples in the πCPU-Solver section and the πGPU-Solver section.
Specify exactly one of them:
-n=<N>
βΆ specify the board size to start a new computation-r=<path_to_save_file>
βΆ specify the path to a save-file generated using_auto-save to continue an old computation
-s=<percentage>
βΆ auto-save percentage interval in decimal, for example -s=0.05 for auto-saving in 5% intervals-u=<interval>
βΆ update duration, solutions and progress after each <interval> milliseconds-h
βΆ print device specific help message
Note: You must re-enable auto-saving again each time you resume from a save-file.
Command format:
cpu [-t=<threadcount>] [-p=<pre_queens>] [-h]
-t=<threads>
βΆ number of threads that should be used-p=<pre_queens>
βΆ number of pre-placed queens, default is 6. A higher number means more but smaller tasks by placing additional queens before starting the worker threads. Most of the time 6 is the best option.-h
βΆ print CPU-Solver specific help message
- N=16 on CPU with 1 thread
nqueensfaf-demo -n=16 cpu
- N=18 on CPU with 8 threads
nqueensfaf-demo -n=18 cpu -t=8
- N=20 with 8 threads and auto-saves in 5% steps
nqueensfaf-demo -n=20 -s=0.05 cpu -t=8
- continue the solution of the 20 queens problem from the save-file 20-queens.faf and auto-save in 5% steps
nqueensfaf-demo -s=0.05 -r=./20-queens.faf cpu -t=8
Command format:
gpu [-p=<pre_queens>] [-h]
-0
βΆ use the default GPU-p=<pre_queens>
βΆ see πCPU-Solver-h
βΆ print GPU-Solver specific help message
When -0
is not specified, the πselection of GPUs is done interactively when executing the command.
- compute N=18 on the default GPU
nqueensfaf-demo -n=18 gpu -0
- compute N=20 and select the GPUs interactively
nqueensfaf-demo -n=20 gpu
- compute N=20, select the GPUs interactively and do auto-saves each 10%
nqueensfaf-demo -n=20 -s=0.1 gpu
When starting the GPU-Solver, you will see a list of all available GPUs provided with indices. To select a GPU, you simply enter its index and its configuration, separated by commata. Multiple GPUs are selected one after another.
Selection / Configuration format:
<index>[,wg=<workgroup_size>][,mw=<weight>]
wg
βΆ workgroup size on the GPU, default value 64 is best for NVIDIA GPUs. A value of 24 has proven best for Integrated Intel GPUs.mw
βΆ the weight, for multi-GPU, represents the portion of the workload that the respective GPU should process. In other words, it ideally should represent the GPU's performance relative to the other selected GPUs. Default value is 1.
- select GPU 0 (default GPU) and set its workgroup size to 128
0,wg=128
- select GPUs 0 and 1, GPU 0 should use workgroup size 24 and GPU 1 should do twice the work of GPU 0
0,mw=1,wg=24
β
1,mw=2
- select GPUs 0, 1 and 2 and let them all contribute equally (use default weight 1)
0
β
1
β
2
CpuSolver cs = new CpuSolver();
cs.onInit(() -> System.out.println("Starting Solver for board size " + cs.getN() + "..."))
cs.onFinish(() -> System.out.println("Found " + cs.getSolutions() + " solutions in " + cs.getDuration() + " ms"))
cs.setN(16)
cs.solve();
GpuSolver gs = new GpuSolver();
List<Gpu> availableGpus = gs.getAvailableGpus();
gs.gpuSelection().choose(availableGpus.get(0));
gs.setN(18);
gs.solve();
For more examples, take a look at the examples module.
The abstract class AbstractSolver
provides a good structure and handy features for implementing your own solver. Just extend it and fill the abstract methods with your code.
The documentation of nqueensfaf-core
can help you here.
A subproject of NQueensFAF is the development and administration of a distributed computing system in the context of the N-Queens problem. We aim to deploy an easy-to-use client program that supports Windows and Linux as well as macOS, while also keeping the setup-process to a minimum, so that anybody with a computer can contribute.
The goals are:
- Solve N=27 and confirm the results of the TU Dresden.
- Solve N=28 and set the new world record.
Further updates on this are expected for summer 2025.
- We are currently developing a new solver which is based on a completely new method. Solving N=22 on the 12600k (single-threaded) takes only 2h25min, which corresponds to a speedup factor of more than 40 compared to the present solver. Additionally, the method possesses much better scaling. At N=24 the speedup factor is already 100. This is still a work in progress and there are lots of optimizations that have to be implemented, so lets see how far we can go. The new solver will be included in the repository as soon as it is finished. However, this may take some time.
- We are excited to announce that we have successfully verified the number of solutions for the 26-Queens problem.
The computation was performed using 3 GPUs (2x3070, 1x3060ti) and it took slightly more than 3 weeks to finish.
The CPU-Solver and the GPU-Solver are based on following concepts:
- using bits to represent the occupancy of the board; based on the implementation by Jeff Somers
- calculating start constellations, in which the borders of the board are already occupied by 3 or 4 queens; based on the implementation by the TU Dresden (click here for a very good description fo this method)
- GPU: remember board-leaving diagonals when going to the next row, so that they can be reinserted when we go backwards. This has also been done in Ping Che Chen's implementation of the N Queens Problem for GPU's
- Volker Berndt, for introducing us to the chess world
- Vincent Hindriksen, for mentioning NQueensFAF in his blog at StreamHPC