Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers #6283

Closed
williamstein opened this issue Jun 14, 2009 · 21 comments

Comments

@williamstein
Copy link
Contributor

The top of SAGE_ROOT/makefile is

# How many threads should be used when doing parallel testing (and
# sometime in the future, parallel building)?
NUM_THREADS=20

I've many times accidently done "make ptest" and with extremely unpleasant results the next day.

Component: doctest coverage

Author: John Palmieri

Reviewer: Dan Drake, Minh Van Nguyen

Merged: Sage 4.1.2.alpha4

Issue created by migration from https://trac.sagemath.org/ticket/6283

@williamstein williamstein added this to the sage-4.1.2 milestone Jun 14, 2009
@jhpalmieri
Copy link
Member

Attachment: trac_6283-numthreads.patch.gz

apply to scripts repository. depends on #2624.

@jhpalmieri
Copy link
Member

patch for SAGE_ROOT/makefile

@jhpalmieri
Copy link
Member

Attachment: makefile.patch.gz

Attachment: makefile.gz

the file SAGE_ROOT/makefile

@jhpalmieri
Copy link
Member

comment:1

I'm not sure what "intelligently" means, but here is a patch which sets it to be the number of processors, as detected by multiprocessing.cpu_count(). This also changes the command sage -tp N <files>: if N is 0, then replace it with the number of processors.

This depends on the patch at #2624.

@jhpalmieri
Copy link
Member

Author: John Palmieri

@dandrake
Copy link
Contributor

comment:2

I really want this patch in, since I am very very tired of editing the makefile on every tarball I unpack. One question: do we know what multiprocessing.cpu_count() returns on something like the Sun T2 machine? That machine has many "threads" but not terribly many cores, and I'm not sure what cpu_count does. Also, does this work in Cygwin or Windows? We are trying to revive the Cygwin port.

@dandrake
Copy link
Contributor

comment:3

Hrm, I just checked on T2, and multiprocessing.cpu_count() returns 128 on that machine, which perhaps is not ideal. The whole issue of cpus/cores/threads is really complicated -- see our own drkirkby on comp.os.solaris. I think we can ignore this issue for the moment, since Sage doesn't even really build on T2.

@jhpalmieri
Copy link
Member

comment:4

Replying to @dandrake:

Hrm, I just checked on T2, and multiprocessing.cpu_count() returns 128 on that machine, which perhaps is not ideal. The whole issue of cpus/cores/threads is really complicated -- see our own drkirkby. I think we can ignore this issue for the moment, since Sage doesn't even really build on T2.

I'm happy to ignore this. If it returns the "wrong" number, that seems like a bug in Python.

I don't have access to many different kinds of machines. Should we ask people on sage-devel to test multiprocessing.cpu_count()?

@dandrake
Copy link
Contributor

comment:5

After the discussion on sage-devel, I think we should merge this. The only concrete example of a machine where this doesn't work is t2.math, and anyone can fix it by editing the makefile -- which is what everyone has been doing all along anyway!

Release manager: merge attachment: trac_6283-numthreads.patch and also patch the makefile in $SAGE_ROOT with attachment: makefile.patch.

@dandrake
Copy link
Contributor

Reviewer: Dan Drake

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 25, 2009

Attachment: trac_6283-warning.patch.gz

adds warning about setting the number of threads to 0

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 25, 2009

comment:6

The file SAGE_ROOT/makefile has been manually patched to include the changes in makefile.patch. I have also included a warning about using the default value of zero, i.e. having NUM_THREADS=0 which is the default:

# NUM_THREADS is the number of threads to use for parallel testing              
# (and sometime in the future, parallel building).  If this is 0, then          
# later it gets set to the number of processors -- see sage-ptest.              
# The detection of number of processors might not be reliable on some           
# platforms. On a Sun SPARC T5240 (t2.math), the reported number of processors  
# might not correspond to the actual number of processors. See ticket #6283.    
#                                                                               
# WARNING: Unless you are certain that you want to use all the cores/processors
# on your system for parallel doctesting, change the value of NUM_THREADS to    
# a (sensible) positive integer. The default value is zero.                     
NUM_THREADS=0  # default is zero

I think people should be made aware that having a value of zero means that all the cores/processors on their system would be devoted to parallel doctesting. On a personal machine, that's OK and any damage done would be localized to the person's own machine. But on a multi-user machine such as the machines making up the Sage cluster, bsd.math, etc., the default value of zero can be dangerous because on sage.math, say, this would use 24 cores for parallel doctesting. Most of the time, people are running very long jobs on sage.math, mod.math, and geom.math. Using all 24 cores on any of these machines can have unexpected consequences such as having doctest failures due to segfaults, out of memory error, and even bringing those machines offline. The patch trac_6283-warning.patch also adds a warning to this effect to the file sage-ptest. If you want to use the file SAGE_ROOT/makefile for parallel doctesting, be absolute sure that you have set a sensible positive integer for NUM_THREADS.

@dandrake
Copy link
Contributor

comment:7

Your warnings are a good idea. Perhaps we can make it part of release manager boot camp to teach release managers to do doctests with ./sage -tp <sensible positive int> devel/sage or make ptest NUM_THREADS=<s.p.i> instead of just doing make ptest. (Similar comments for ptestlong.)

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 25, 2009

comment:8

Merged these patches in the script repository:

  1. trac_6283-numthreads.patch
  2. trac_6283-warning.patch

Manually patched the file SAGE_ROOT/makefile using makefile.patch, together with the above warning message.

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 25, 2009

Merged: Sage 4.1.2.alpha3

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 25, 2009

Changed reviewer from Dan Drake to Dan Drake, Minh Van Nguyen

@sagetrac-mvngu sagetrac-mvngu mannequin closed this as completed Sep 25, 2009
@sagetrac-drkirkby
Copy link
Mannequin

sagetrac-drkirkby mannequin commented Sep 25, 2009

comment:9

I realise this has got positive review, but I would have personally not given it that. I would have personally not allowed the default to exceed 8, since for 99% of machines with 8 or more 'CPUs/threads', they are mutli-user servers, not personal workstations. As such, people should not be exploiting them to the full.

On 't2' currently this would not cause an issue, since it is not used a lot. But it would be an issue once there are many users. I doubt it would be sensible on sage.math to exploit it to the full.

Dave

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 25, 2009

comment:10

Replying to @sagetrac-drkirkby:

I realise this has got positive review, but I would have personally not given it that. I would have personally not allowed the default to exceed 8, since for 99% of machines with 8 or more 'CPUs/threads', they are mutli-user servers, not personal workstations. As such, people should not be exploiting them to the full.

Agreed.

On 't2' currently this would not cause an issue, since it is not used a lot. But it would be an issue once there are many users. I doubt it would be sensible on sage.math to exploit it to the full.

I think a value of 1 would be sensible. No matter if one runs doctest using SAGE_ROOT/makefile on a personal machine such as a Pentium 4, a dual core PC or sage.math, it would still use one thread as default. After manually merging changes to SAGE_ROOT/makefile, I did:

make ptestlong

only to realize that it spawned 24 cores on mod.math to parallel doctest. In panic mode, I hastily killed all of my threads. So how about opening another ticket to make NUM_THREADS=1?

@dandrake
Copy link
Contributor

comment:11

Replying to @sagetrac-mvngu:

So how about opening another ticket to make NUM_THREADS=1?

I have a different idea, which we'll discuss at #7011.

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 27, 2009

Changed merged from Sage 4.1.2.alpha3 to Sage 4.1.2.alpha4

@sagetrac-mvngu
Copy link
Mannequin

sagetrac-mvngu mannequin commented Sep 27, 2009

comment:12

There is no 4.1.2.alpha3. Sage 4.1.2.alpha3 was William Stein's release for working on making the notebook a standalone package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants