
clone parallel docs to parallelz

minrk committed Jan 24, 2011
1 parent 8e2ebb2 commit a4b0811049c0b85cc0a1dffd20afb88ebb712626
@@ -20,6 +20,7 @@ Contents
+ parallelz/index.txt
@@ -0,0 +1,19 @@
+.. _parallelz_index:
+Using IPython for parallel computing (ZMQ)
+.. toctree::
+ :maxdepth: 2
+ parallel_intro.txt
+ parallel_process.txt
+ parallel_multiengine.txt
+ parallel_task.txt
+ parallel_mpi.txt
+ parallel_security.txt
+ parallel_winhpc.txt
+ parallel_demos.txt
@@ -0,0 +1,282 @@
+Parallel examples
+In this section we describe two more involved examples of using an IPython
+cluster to perform a parallel computation. In these examples, we will be using
+IPython's "pylab" mode, which enables interactive plotting using the
+Matplotlib package. IPython can be started in this mode by typing::
+ ipython -p pylab
+at the system command line. If this prints an error message, you will
+need to install the default profiles from within IPython by running:
+.. sourcecode:: ipython
+ In [1]: %install_profiles
+and then restarting IPython.
+150 million digits of pi
+In this example we would like to study the distribution of digits in the
+number pi (in base 10). While it is not known whether pi is a normal number (a
+number is normal in base 10 if every finite digit sequence occurs with the
+frequency expected of uniformly random digits, so 0-9 each occur equally
+often), numerical investigations suggest that it is. We will begin with a
+serial calculation on
+10,000 digits of pi and then perform a parallel calculation involving 150
+million digits.
+In both the serial and parallel calculation we will be using functions defined
+in the :file:`` file, which is available in the
+:file:`docs/examples/kernel` directory of the IPython source distribution.
+These functions provide basic facilities for working with the digits of pi and
+can be loaded into IPython by putting :file:`` in your current
+working directory and then running:
+.. sourcecode:: ipython
+ In [1]: run
+Serial calculation
+For the serial calculation, we will use SymPy ( to
+calculate 10,000 digits of pi and then look at the frequencies of the digits
+0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
+SymPy is capable of calculating many more digits of pi, our purpose here is to
+set the stage for the much larger parallel calculation.
+In this example, we use two functions from :file:``:
+:func:`one_digit_freqs` (which calculates how many times each digit occurs)
+and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
+Here is an interactive IPython session that uses these functions with SymPy:
+.. sourcecode:: ipython
+ In [7]: import sympy
+ In [8]: pi = sympy.pi.evalf(40)
+ In [9]: pi
+ Out[9]: 3.141592653589793238462643383279502884197
+ In [10]: pi = sympy.pi.evalf(10000)
+ In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits
+ In [12]: run # load one_digit_freqs/plot_one_digit_freqs
+ In [13]: freqs = one_digit_freqs(digits)
+ In [14]: plot_one_digit_freqs(freqs)
+ Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]
+The resulting plot of the single digit counts shows that each digit occurs
+approximately 1,000 times, but that with only 10,000 digits the
+statistical fluctuations are still rather large:
+.. image:: single_digits.*
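As a rough illustration of the counting step, the following minimal sketch tallies single-digit frequencies for the 39 decimal digits shown in ``Out[9]`` above. The name ``count_single_digits`` is hypothetical. It is not the :func:`one_digit_freqs` function from the examples directory, which may be implemented differently.

```python
def count_single_digits(digits):
    """Return a list where entry d is how often digit d occurs."""
    freqs = [0] * 10
    for d in digits:
        freqs[int(d)] += 1
    return freqs


# The 40 significant digits from the session above; [2:] drops the leading "3.".
pi_text = "3.141592653589793238462643383279502884197"
freqs = count_single_digits(pi_text[2:])
for digit, count in enumerate(freqs):
    print(digit, count)
```

With so few digits the counts are far from uniform, which is the same effect the 10,000-digit plot shows on a smaller scale.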
+It is clear that to reduce the relative fluctuations in the counts, we need
+to look at many more digits of pi. That brings us to the parallel calculation.
+Parallel calculation
+Calculating many digits of pi is a challenging computational problem in itself.
+Because we want to focus on the distribution of digits in this example, we
+will use pre-computed digits of pi from the website of Professor Yasumasa
+Kanada at the University of Tokyo ( These
+digits come in a set of text files (
+that each have 10 million digits of pi.
+For the parallel calculation, we have copied these files to the local hard
+drives of the compute nodes. A total of 15 of these files will be used, for a
+total of 150 million digits of pi. To make things a little more interesting we
+will calculate the frequencies of all two-digit sequences (00-99) and then plot
+the result using a 2D matrix in Matplotlib.
+The overall idea of the calculation is simple: each IPython engine will
+compute the two-digit counts for the digits in a single file. Then, in a final
+step the counts from each engine will be added up. To perform this
+calculation, we will need two top-level functions from :file:``:
+.. literalinclude:: ../../examples/kernel/
+ :language: python
+ :lines: 34-49
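The count-then-sum pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: ``two_digit_counts`` and ``add_counts`` are hypothetical names, the pairs are assumed to overlap, and the builtin ``map`` stands in for the parallel map over engines. The functions included from the examples directory may differ.

```python
def two_digit_counts(digits):
    """Count overlapping two-digit sequences; entry 10*a + b counts "ab"."""
    counts = [0] * 100
    for a, b in zip(digits, digits[1:]):
        counts[10 * int(a) + int(b)] += 1
    return counts


def add_counts(all_counts):
    """Element-wise sum of several 100-entry count lists."""
    return [sum(column) for column in zip(*all_counts)]


# Two short chunks stand in for the per-engine digit files (assumption).
chunks = ["1415926535", "8979323846"]
per_chunk = list(map(two_digit_counts, chunks))  # serial stand-in for a parallel map
total = add_counts(per_chunk)
print(total[14], total[79])
```

Because each chunk is counted independently, a pair that straddles two chunks (here "58") is not counted in this sketch. For files of 10 million digits each, that is at most one pair per file boundary.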
+We will also use the :func:`plot_two_digit_freqs` function to plot the
+results. The code to run this calculation in parallel is contained in
+:file:`docs/examples/kernel/`. This code can be run in parallel
+using IPython by following these steps:
+1. Copy the text files with the digits of pi
+ ( to the working directory of the
+ engines on the compute nodes.
+2. Use :command:`ipcluster` to start 15 engines. We used an 8-core cluster (2
+ quad-core CPUs) with hyperthreading enabled, which makes the 8 cores look
+ like 16 to the OS (1 controller + 15 engines). However, the maximum
+ speedup we can observe is still only 8x.
+3. With the file :file:`` in your current working directory, open
+ up IPython in pylab mode and type ``run``.
+When run on our 8-core cluster, we observe a speedup of 7.7x. This is slightly
+less than linear scaling (8x) because the controller is also running on one of
+the cores.
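To put the speedup figure in context, parallel efficiency is the observed speedup divided by the number of physical cores. The helper name ``parallel_efficiency`` below is hypothetical, and the inputs are the 7.7x and 8 cores quoted above.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / cores


efficiency = parallel_efficiency(7.7, 8)
print(f"efficiency = {efficiency:.4f}")
```

An efficiency a little above 96% means that less than 4% of the ideal throughput is lost, which fits the explanation that the controller shares a core with the engines.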
+To emphasize the interactive nature of IPython, we now show how the
+calculation can also be run by simply typing the commands from
+:file:`` interactively into IPython:
+.. sourcecode:: ipython
+ In [1]: from IPython.kernel import client
+ 2009-11-19 11:32:38-0800 [-] Log opened.
+ # The MultiEngineClient allows us to use the engines interactively.
+ # We simply pass MultiEngineClient the name of the cluster profile we
+ # are using.
+ In [2]: mec = client.MultiEngineClient(profile='mycluster')
+ 2009-11-19 11:32:44-0800 [-] Connecting [0]
+ 2009-11-19 11:32:44-0800 [Negotiation,client] Connected: ./ipcontroller-mec.furl
+ In [3]: mec.get_ids()
+ Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
+ In [4]: run
+ In [5]: filestring = 'pi200m-ascii-%(i)02dof20.txt'
+ # Create the list of files to process.
+ In [6]: files = [filestring % {'i':i} for i in range(1,16)]
+ In [7]: files
+ Out[7]:
+ ['pi200m-ascii-01of20.txt',
+ 'pi200m-ascii-02of20.txt',
+ 'pi200m-ascii-03of20.txt',
+ 'pi200m-ascii-04of20.txt',
+ 'pi200m-ascii-05of20.txt',
+ 'pi200m-ascii-06of20.txt',
+ 'pi200m-ascii-07of20.txt',
+ 'pi200m-ascii-08of20.txt',
+ 'pi200m-ascii-09of20.txt',
+ 'pi200m-ascii-10of20.txt',
+ 'pi200m-ascii-11of20.txt',
+ 'pi200m-ascii-12of20.txt',
+ 'pi200m-ascii-13of20.txt',
+ 'pi200m-ascii-14of20.txt',
+ 'pi200m-ascii-15of20.txt']
+ # This is the parallel calculation using the method
+ # which applies compute_two_digit_freqs to each file in files in parallel.
+ In [8]: freqs_all =, files)
+ # Add up the frequencies from each engine.
+ In [8]: freqs = reduce_freqs(freqs_all)
+ In [9]: plot_two_digit_freqs(freqs)
+ Out[9]: <matplotlib.image.AxesImage object at 0x18beb110>
+ In [10]: plt.title('2 digit counts of 150m digits of pi')
+ Out[10]: <matplotlib.text.Text object at 0x18d1f9b0>
+The resulting plot generated by Matplotlib is shown below. The colors indicate
+which two-digit sequences are more (red) or less (blue) likely to occur in the
+first 150 million digits of pi. We clearly see that the sequence "41" is
+most likely and that "06" and "07" are least likely. Further analysis would
+show that the relative size of the statistical fluctuations has decreased
+compared to the 10,000 digit calculation.
+.. image:: two_digit_counts.*
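The expected size of the fluctuations can be estimated with a simple model. The sketch below assumes the digits behave like independent, uniformly distributed samples, so each bin count is binomial. That is an idealization, since overlapping pairs are not strictly independent, and ``relative_fluctuation`` is a hypothetical helper rather than part of the example code.

```python
import math


def relative_fluctuation(n_samples, n_bins):
    """Std. deviation of one bin's count divided by its expected count."""
    p = 1.0 / n_bins
    return math.sqrt((1.0 - p) / (n_samples * p))


single = relative_fluctuation(10_000, 10)        # one-digit counts, serial run
double = relative_fluctuation(150_000_000, 100)  # two-digit counts, parallel run
print(single, double)
```

Under this model the relative fluctuation drops from about 3% for the 10,000-digit single-digit counts to about 0.08% for the 150-million-digit two-digit counts.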
+Parallel options pricing
+An option is a financial contract that gives the buyer of the contract the
+right to buy (a "call") or sell (a "put") a secondary asset (a stock for
+example) at a particular date in the future (the expiration date) for a
+pre-agreed price (the strike price). For this right, the buyer pays the
+seller a premium (the option price). There is a wide variety of flavors of
+options (American, European, Asian, etc.) that are useful for different
+purposes: hedging against risk, speculation, etc.
+Much of modern finance is driven by the need to price these contracts
+accurately based on what is known about the properties (such as volatility) of
+the underlying asset. One method of pricing options is to use a Monte Carlo
+simulation of the underlying asset price. In this example we use this approach
+to price both European and Asian (path dependent) options for various strike
+prices and volatilities.
+The code for this example can be found in the :file:`docs/examples/kernel`
+directory of the IPython source. The function :func:`price_options` in
+:file:`` implements the basic Monte Carlo pricing algorithm using
+the NumPy package and is shown here:
+.. literalinclude:: ../../examples/kernel/
+ :language: python
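For readers without the source tree at hand, here is a minimal sketch of Monte Carlo pricing for a European call and an arithmetic-average Asian call. It is not the :func:`price_options` function included above. It assumes geometric Brownian motion, a one-year maturity, and daily averaging, and it uses only the standard library instead of NumPy so that it is self-contained. The name ``mc_call_prices`` and all parameters are hypothetical.

```python
import math
import random


def mc_call_prices(S0, K, sigma, r, days, paths, seed=0):
    """Return (European call, Asian call) price estimates for a 1-year maturity."""
    rng = random.Random(seed)
    dt = 1.0 / days                       # one year split into `days` steps
    drift = (r - 0.5 * sigma ** 2) * dt   # per-step drift of log-price
    vol = sigma * math.sqrt(dt)           # per-step volatility of log-price
    euro_sum = 0.0
    asian_sum = 0.0
    for _path in range(paths):
        log_s = math.log(S0)
        running_total = 0.0
        for _step in range(days):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            running_total += math.exp(log_s)
        euro_sum += max(math.exp(log_s) - K, 0.0)        # payoff on final price
        asian_sum += max(running_total / days - K, 0.0)  # payoff on path average
    discount = math.exp(-r)               # discount over the one-year maturity
    return discount * euro_sum / paths, discount * asian_sum / paths


euro, asian = mc_call_prices(S0=100.0, K=100.0, sigma=0.25, r=0.05,
                             days=260, paths=2000)
print(euro, asian)
```

Each call is independent of the others for different strikes and volatilities, which is what makes the problem easy to distribute as separate tasks.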
+To run this code in parallel, we will use IPython's :class:`TaskClient` class,
+which distributes work to the engines using dynamic load balancing. This
+client can be used alongside the :class:`MultiEngineClient` class shown in
+the previous example. The parallel calculation using :class:`TaskClient` can
+be found in the file :file:``. The code in this file creates a
+:class:`TaskClient` instance and then submits a set of tasks using
+:meth:`` that calculate the option prices for different
+volatilities and strike prices. The results are then plotted as a 2D contour
+plot using Matplotlib.
+.. literalinclude:: ../../examples/kernel/
+ :language: python
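The submit-then-collect pattern over a grid of strikes and volatilities can be sketched with the standard library's ``concurrent.futures`` as a stand-in. This is not IPython's :class:`TaskClient` API. ``toy_price`` is a hypothetical placeholder for a real pricing function, and the thread pool only mimics the way idle workers pick up the next pending task.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_price(strike, sigma):
    """Hypothetical placeholder for a real pricing function."""
    return max(100.0 - strike, 0.0) + 10.0 * sigma


def price_grid(strikes, sigmas, workers=4):
    """Submit one task per (strike, sigma) pair and gather results in a grid."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each submit() queues one task; free workers take tasks as they finish.
        futures = {(i, j): pool.submit(toy_price, k, s)
                   for i, k in enumerate(strikes)
                   for j, s in enumerate(sigmas)}
        grid = [[0.0] * len(sigmas) for _ in strikes]
        for (i, j), future in futures.items():
            grid[i][j] = future.result()  # blocks until that task is done
    return grid


grid = price_grid([90.0, 100.0, 110.0], [0.1, 0.2])
print(grid)
```

Keying the futures by grid index keeps the results in the right cell regardless of the order in which tasks complete, which matters when task run times vary.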
+To use this code, start an IPython cluster using :command:`ipcluster`, open
+IPython in pylab mode with the file :file:`` in your current
+working directory, and then type:
+.. sourcecode:: ipython
+ In [7]: run
+ Submitted tasks: [0, 1, 2, ...]
+Once all the tasks have finished, the results can be plotted using the
+:func:`plot_options` function. Here we make contour plots of the Asian
+call and Asian put options as a function of the volatility and strike price:
+.. sourcecode:: ipython
+ In [8]: plot_options(sigma_vals, K_vals, prices['acall'])
+ In [9]: plt.figure()
+ Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>
+ In [10]: plot_options(sigma_vals, K_vals, prices['aput'])
+These results are shown in the two figures below. On an 8-core cluster the
+entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
+took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
+to the speedup observed in our previous example.
+.. image:: asian_call.*
+.. image:: asian_put.*
+To conclude these examples, we summarize the key features of IPython's
+parallel architecture that have been demonstrated:
+* Serial code can often be parallelized with only a few extra lines of code.
+ We have used the :class:`MultiEngineClient` and :class:`TaskClient` classes
+ for this purpose.
+* The resulting parallel code can be run without ever leaving IPython's
+ interactive shell.
+* Any data computed in parallel can be explored interactively through
+ visualization or further numerical calculations.
+* We have run these examples on a cluster running Windows HPC Server 2008.
+ IPython's built-in support for the Windows HPC job scheduler makes it
+ easy to get started with IPython's parallel capabilities.