fhstoica/N_Body_Simulation
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This project contains the source code needed for simulating the evolution of a star cluster using the Verlet algorithm. See for example: http://en.wikipedia.org/wiki/Verlet_integration Since the complexity is O(N^2) where N is the number of particles, it is essential to take advantage of multi-core processors. It is a multi-threaded simulation using barrier syncronization and the number of threads can be specified by the user via the "MaxThreads" parameter. Under Linux CPU affinity can also be specified, by uncommenting the -DLINUX preprocessor option in the The particles are split in equal groups per thread, each thread computing the updated positions and velocities of each particle in its assigned group. The algorithm uses adaptive time step, and writes out the values only when enough evolution time has accumulated. At this point the cumulative time is reset, and the process is repeated. This way the updated particle positions are written out at regular time intervals, even though the number of simulation steps can change. To avoid delays due to I/O operations the intermediate results are accumulated in a vector which is written to file only at the end of the simulation. The threads use barrier synchronization, but the last thread to reach the barrier much check if enough time has accumulated to write the current particle positions. This is determined by having each thread set a bit on a global 32-bit unsigned integer, so we are limited to 32 threads. Locking is needed so that the integer is not modified between the time the thread sets its bit and the time checks the resulting value. The total number of simulation steps is specified in the beginning, but the resulting number of frames depends on the actual evolution of the cluster. The largest disadvantage is the formation of a close binary pair; the particles move at very large speed with very small separation, so the timestep will be always very small. As mentioned in the beginning, the treads can have user-specified CPU affinity, so that any thread can run on any CPU, or each thread can be forced to run on a single CPU. Affinity is set via the function: pthread_setaffinity_np(p_thread[i], sizeof(cpu_set_t), &cpuset); It works only in Linux as "_np" stands for non-portable. If a thread is created on the wrong CPU it is migrated. This is not very costly, as it is done only once at the beginning of the run. To enable this, uncomment the -DLINUX preprocessor option in the Makefile. The user can leave these two lines in the main() function: CPU_ZERO(&cpuset); CPU_SET(i, &cpuset); uncommented, and this will force each thread to run on a single CPU. Commenting them out will allow any thread to run on any CPU. Leaving -DLINUX commented-out will allow the program to compile in Windows using MinGV. However, it will not work in Cygwin, as barriers are not supported. To build the program use: make star_cluster.srl and then simply run it with: ./star_cluster.srl One also needs the intial data for the cluster, generated by the Gen_Init_Cond.py python script. Simply edit the script to set the correct number of particles (the same as in star_cluster.cpp) and then run with: python Gen_Init_Cond.py 99.54719 It creates the initial_data.dat file used by star_cluster.cpp. The number given as argument is the seed for the random number generator. One thing to keep in mind is that if you increase the number of particles you'll have to modify the velocity scale "vscale" and the position (inter-particle distance) scale "pscale" to obtain a well-behaved cluster. That is because the potential energy is negative (attractive) and increases as O(N^2) with the number of particles. Failing to do so will result in a cluster of very fast-moving particles, which will force the time step to be very small, resulting in very little evolution time. Visualising the data: The data can be visualised using the Show_Cluster_Motion.cpp program. The number of particles must be set manually to the correct value (same as in star_cluster.cpp). const GLint N = 4; To build the program simply use make -f OpenGL_Makefile Show_Cluster_Motion.srl or in Windows make -f OpenGL_Makefile_Win Show_Cluster_Motion.srl You may need to edit the makefiles so that they point to the correct directories containing the gl.h, glu.h and glut.h. Then use it as: ./Show_Cluster_Motion.srl It reads the "positions.txt" generated by the simulation. Click on the window and move the mouse around to rotate the entire cluster while the particles are moving. Internally it creates a vector of display lists and loops over it calling one list at a time until the users presses ESC. The cluster simulation above can make a pretty good benchmark for processor speed on multi-threaded tasks. However, due to the N^2 scaling, it becomes impractical to run it for clusters larger than 128 particles. Past this point we enter the realm of GPU programming. The interesting part is to see for each number of particles what is optimal: to allow any thread on any CPU core, or to force each thread to run on a sigle CPU. Happy hacking! Update 07-Dec-2014: The code has been updated to take advantage of the SSE instructions available on modern processors. The -DSSE compiler flag selects the SSE code. The run time drops to a bout a half of the non-optimized version. The compiler flags for tuning to a specific architecture are set for Athlon 64, but they can be easily changed to other architectures.
About
Multithreaded N-body (star cluster) simulation using barrier synchronization, and OpenGL code to visualize the results.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published