This notebook illustrates multithreading and quality up.

# Parallel Runs

Almost all computers today are parallel machines with multiple cores. We first ask how many cores are available.

In [1]:
from phcpy.dimension import get_core_count
nbcores = get_core_count()
nbcores

PHCv2.4.88 released 2023-12-26 works!


32

In the experiment we use the cyclic 7-roots problem.

In [2]:
from phcpy.families import cyclic
c7 = cyclic(7)
for pol in c7:
    print(pol)

x0 + x1 + x2 + x3 + x4 + x5 + x6;
x0*x1 + x1*x2 + x2*x3 + x3*x4 + x4*x5 + x5*x6 + x6*x0;
x0*x1*x2 + x1*x2*x3 + x2*x3*x4 + x3*x4*x5 + x4*x5*x6 + x5*x6*x0 + x6*x0*x1;
x0*x1*x2*x3 + x1*x2*x3*x4 + x2*x3*x4*x5 + x3*x4*x5*x6 + x4*x5*x6*x0 + x5*x6*x0*x1 + x6*x0*x1*x2;
x0*x1*x2*x3*x4 + x1*x2*x3*x4*x5 + x2*x3*x4*x5*x6 + x3*x4*x5*x6*x0 + x4*x5*x6*x0*x1 + x5*x6*x0*x1*x2 + x6*x0*x1*x2*x3;
x0*x1*x2*x3*x4*x5 + x1*x2*x3*x4*x5*x6 + x2*x3*x4*x5*x6*x0 + x3*x4*x5*x6*x0*x1 + x4*x5*x6*x0*x1*x2 + x5*x6*x0*x1*x2*x3 + x6*x0*x1*x2*x3*x4;
x0*x1*x2*x3*x4*x5*x6 - 1;
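
The structure of the system printed above can be made explicit with a few lines of plain Python. The sketch below is not how `phcpy.families.cyclic` is implemented; `cyclic_polynomials` is a hypothetical helper, written only to show that polynomial k sums the n cyclic products of k consecutive variables, and that the last polynomial is the product of all variables minus one.

```python
def cyclic_polynomials(n):
    """Return the cyclic n-roots system as a list of strings (illustrative helper)."""
    variables = [f"x{i}" for i in range(n)]
    system = []
    for degree in range(1, n):
        # Each term is a product of `degree` consecutive variables,
        # with indices wrapping around modulo n.
        terms = []
        for start in range(n):
            factors = [variables[(start + k) % n] for k in range(degree)]
            terms.append("*".join(factors))
        system.append(" + ".join(terms) + ";")
    # The last equation fixes the product of all variables to one.
    system.append("*".join(variables) + " - 1;")
    return system


for pol in cyclic_polynomials(7):
    print(pol)
```

For n = 7 the strings have the same form as the output of `cyclic(7)` shown above.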


In [3]:
from phcpy.volumes import mixed_volume
mixed_volume(c7)

924

For this problem, the number of solutions equals the mixed volume.

In [4]:
from phcpy.solver import solve

To measure the speedup, we need the time that elapses between the start and the end of each run. The most honest measurement is the *wall clock time*: the time that passes on the clock on the wall, as its name suggests. The timers provided by Python do not measure the CPU time of the compiled code executed by the solver.

In [5]:
from datetime import datetime

In [6]:
timestart = datetime.now()
s = solve(c7)
timestop = datetime.now()
elapsed_onecore = timestop - timestart
print('elapsed wall clock time on 1 core :', elapsed_onecore)

elapsed wall clock time on 1 core : 0:00:04.097170
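
The start/stop pattern used in the cell above recurs in every timed run of this notebook. One way to avoid repeating it is a small wrapper. This is only a sketch: `timed` is a hypothetical helper, not part of phcpy, and it uses the same `datetime.now()` wall clock as the cells here.

```python
from datetime import datetime


def timed(function, *args, **kwargs):
    """Call function(*args, **kwargs); return its result and the elapsed wall clock time."""
    start = datetime.now()
    result = function(*args, **kwargs)
    stop = datetime.now()
    return result, stop - start  # the difference is a datetime.timedelta


# Stand-in workload, so the sketch runs without phcpy installed.
total, elapsed = timed(sum, range(1000))
print('elapsed wall clock time :', elapsed)
```

With phcpy loaded, the same wrapper could be applied as `s, elapsed_onecore = timed(solve, c7)`.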


We check whether we have as many solutions as the mixed volume.

In [7]:
len(s)

924

Now we solve again, using all available cores.

In [8]:
timestart = datetime.now()
s = solve(c7, tasks=nbcores)
timestop = datetime.now()
elapsed_manycores = timestop - timestart
print('elapsed wall clock time on', nbcores, 'cores:', elapsed_manycores)

elapsed wall clock time on 32 cores: 0:00:00.384301


In [9]:
len(s)

924

Observe the reduction in the elapsed wall clock time.

In [10]:
speedup = elapsed_onecore/elapsed_manycores
speedup

10.66135659287902
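
A speedup is easier to judge relative to the number of cores used. The sketch below computes the parallel efficiency, defined as the speedup divided by the core count. To keep the block self-contained, the two timings reported above are hard-coded as `timedelta` values; in the notebook itself one would reuse `elapsed_onecore`, `elapsed_manycores`, and `nbcores` directly.

```python
from datetime import timedelta

# Timings copied from the runs above (hard-coded for this sketch only).
elapsed_onecore = timedelta(seconds=4, microseconds=97170)
elapsed_manycores = timedelta(microseconds=384301)
nbcores = 32

# Dividing two timedelta objects yields a plain float.
speedup = elapsed_onecore / elapsed_manycores
efficiency = speedup / nbcores

print(f"speedup    : {speedup:.2f}")
print(f"efficiency : {efficiency:.1%}")
```

On these timings the efficiency is about one third, so the 32 cores are far from fully used on a problem that takes only a few seconds on one core.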

## Quality Up

Can multithreading compensate for the overhead of double double arithmetic?
If we can afford the time for a sequential run, by how much can we increase
the precision in a multithreaded run in the same time or less?

In [11]:
timestart = datetime.now()
s = solve(c7, tasks=nbcores, precision='dd')
timestop = datetime.now()
elapsed = timestop - timestart
print('elapsed wall clock time on', nbcores, 'cores :', elapsed)

elapsed wall clock time on 32 cores : 0:00:03.663632


Again, we check whether we have as many solutions as the mixed volume.

In [12]:
len(s)

924

In [13]:
elapsed < elapsed_onecore

True

The multicore run compensates for the cost overhead of double double arithmetic: the elapsed wall clock time on 32 cores in double double precision is less than the elapsed wall clock time on one core in double precision.
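
The comparison can be put into numbers. The sketch below hard-codes the three timings reported in this notebook (an assumption made only to keep the block self-contained) and computes two ratios: the cost of double double relative to double arithmetic on the same 32 cores, and the time of the multithreaded double double run relative to the sequential double run.

```python
from datetime import timedelta

# Timings copied from the runs in this notebook (hard-coded for this sketch only).
elapsed_onecore = timedelta(seconds=4, microseconds=97170)   # 1 core, double
elapsed_manycores = timedelta(microseconds=384301)           # 32 cores, double
elapsed_dd = timedelta(seconds=3, microseconds=663632)       # 32 cores, double double

# Overhead of double double arithmetic, measured on the same number of cores.
dd_overhead = elapsed_dd / elapsed_manycores

# Fraction of the sequential time needed by the multithreaded double double run.
time_ratio = elapsed_dd / elapsed_onecore

print(f"overhead of double double on 32 cores : {dd_overhead:.2f}")
print(f"double double time / sequential time  : {time_ratio:.2f}")
```

A ratio below one on the second line is what quality up means here: higher precision in no more time than the sequential run in double precision.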