
clone parallel docs to parallelz

minrk committed Jan 24, 2011
1 parent 8e2ebb2 commit a4b0811049c0b85cc0a1dffd20afb88ebb712626
@@ -20,6 +20,7 @@ Contents
install/index.txt
interactive/index.txt
parallel/index.txt
+ parallelz/index.txt
config/index.txt
development/index.txt
api/index.txt
@@ -0,0 +1,19 @@
+.. _parallelz_index:
+
+==========================================
+Using IPython for parallel computing (ZMQ)
+==========================================
+
+.. toctree::
+ :maxdepth: 2
+
+ parallel_intro.txt
+ parallel_process.txt
+ parallel_multiengine.txt
+ parallel_task.txt
+ parallel_mpi.txt
+ parallel_security.txt
+ parallel_winhpc.txt
+ parallel_demos.txt
+
+
@@ -0,0 +1,282 @@
+=================
+Parallel examples
+=================
+
+In this section we describe two more involved examples of using an IPython
+cluster to perform a parallel computation. In these examples, we will be using
+IPython's "pylab" mode, which enables interactive plotting using the
+Matplotlib package. IPython can be started in this mode by typing::
+
+ ipython -p pylab
+
+at the system command line. If this prints an error message, you will
+need to install the default profiles from within IPython by running:
+
+.. sourcecode:: ipython
+
+ In [1]: %install_profiles
+
+and then restarting IPython.
+
+150 million digits of pi
+========================
+
+In this example we would like to study the distribution of digits in the
+number pi (in base 10). While it is not known whether pi is a normal number
+(roughly, a number is normal in base 10 if every digit sequence of a given
+length occurs with equal likelihood), numerical investigations suggest that it
+is. We will begin with a serial calculation on
+10,000 digits of pi and then perform a parallel calculation involving 150
+million digits.
+
+In both the serial and parallel calculation we will be using functions defined
+in the :file:`pidigits.py` file, which is available in the
+:file:`docs/examples/kernel` directory of the IPython source distribution.
+These functions provide basic facilities for working with the digits of pi and
+can be loaded into IPython by putting :file:`pidigits.py` in your current
+working directory and then doing:
+
+.. sourcecode:: ipython
+
+ In [1]: run pidigits.py
+
+Serial calculation
+------------------
+
+For the serial calculation, we will use SymPy (http://www.sympy.org) to
+calculate 10,000 digits of pi and then look at the frequencies of the digits
+0-9. Out of 10,000 digits, we expect each digit to occur 1,000 times. While
+SymPy is capable of calculating many more digits of pi, our purpose here is to
+set the stage for the much larger parallel calculation.
+
+In this example, we use two functions from :file:`pidigits.py`:
+:func:`one_digit_freqs` (which calculates how many times each digit occurs)
+and :func:`plot_one_digit_freqs` (which uses Matplotlib to plot the result).
+Here is an interactive IPython session that uses these functions with
+SymPy:
+
+.. sourcecode:: ipython
+
+ In [7]: import sympy
+
+ In [8]: pi = sympy.pi.evalf(40)
+
+ In [9]: pi
+ Out[9]: 3.141592653589793238462643383279502884197
+
+ In [10]: pi = sympy.pi.evalf(10000)
+
+ In [11]: digits = (d for d in str(pi)[2:]) # create a sequence of digits
+
+ In [12]: run pidigits.py # load one_digit_freqs/plot_one_digit_freqs
+
+ In [13]: freqs = one_digit_freqs(digits)
+
+ In [14]: plot_one_digit_freqs(freqs)
+ Out[14]: [<matplotlib.lines.Line2D object at 0x18a55290>]
+
+The resulting plot of the single digit counts shows that each digit occurs
+approximately 1,000 times, but that with only 10,000 digits the
+statistical fluctuations are still rather large:
+
+.. image:: single_digits.*
+
+It is clear that to reduce the relative fluctuations in the counts, we need
+to look at many more digits of pi. That brings us to the parallel calculation.
+
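As a rough guide to how many digits are needed, the expected size of these fluctuations can be estimated with a simple binomial model. The sketch below is not part of :file:`pidigits.py`; the function name ``relative_fluctuation`` is hypothetical, and the model assumes that every sample lands in one of several equally likely bins independently of the others (only approximately true for overlapping digit pairs).

```python
import math


def relative_fluctuation(n_samples, n_bins):
    """Expected relative standard deviation of a single bin's count.

    Assumption: each of the n_samples falls into one of n_bins equally
    likely bins, independently of the others (a binomial model).
    """
    p = 1.0 / n_bins
    mean = n_samples * p
    std = math.sqrt(n_samples * p * (1.0 - p))
    return std / mean


# 10,000 digits in 10 single-digit bins: roughly 3% per bin.
print(relative_fluctuation(10000, 10))

# 150 million digits in 100 two-digit bins: roughly 0.08% per bin.
print(relative_fluctuation(150000000, 100))
```

Under this model the relative fluctuation falls as one over the square root of the number of digits, which is why the larger calculation gives a much smoother picture.
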
+Parallel calculation
+--------------------
+
+Calculating many digits of pi is a challenging computational problem in itself.
+Because we want to focus on the distribution of digits in this example, we
+will use pre-computed digits of pi from the website of Professor Yasumasa
+Kanada at the University of Tokyo (http://www.super-computing.org). These
+digits come in a set of text files (ftp://pi.super-computing.org/.2/pi200m/)
+that each have 10 million digits of pi.
+
+For the parallel calculation, we have copied these files to the local hard
+drives of the compute nodes. A total of 15 of these files will be used, for a
+total of 150 million digits of pi. To make things a little more interesting we
+will calculate the frequencies of all two-digit sequences (00-99) and then plot
+the result using a 2D matrix in Matplotlib.
+
+The overall idea of the calculation is simple: each IPython engine will
+compute the two digit counts for the digits in a single file. Then in a final
+step the counts from each engine will be added up. To perform this
+calculation, we will need two top-level functions from :file:`pidigits.py`:
+
+.. literalinclude:: ../../examples/kernel/pidigits.py
+ :language: python
+ :lines: 34-49
+
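For readers without the source tree at hand, the per-file counting step and the final summation can be sketched as follows. This is not the code from :file:`pidigits.py`: the names ``count_two_digit_pairs`` and ``sum_counts`` are hypothetical, the digits are assumed to be available as a plain string, and pairs are assumed to overlap (so ``"1415"`` contributes 14, 41 and 15).

```python
def count_two_digit_pairs(digits):
    """Count overlapping two-digit sequences in a string of digits.

    Returns a list of 100 counts, indexed by the pair's value (0-99).
    """
    counts = [0] * 100
    for first, second in zip(digits, digits[1:]):
        counts[10 * int(first) + int(second)] += 1
    return counts


def sum_counts(all_counts):
    """Add up several 100-element count lists element by element."""
    return [sum(column) for column in zip(*all_counts)]


# Two small chunks stand in for two digit files. Pairs that would span
# the boundary between chunks are not counted in this sketch.
chunks = ["141592", "653589"]
total = sum_counts([count_two_digit_pairs(chunk) for chunk in chunks])
print(total[14], total[65], sum(total))
```

Because each chunk is counted independently and the results are only combined at the end, the counting step can be handed to separate engines without any communication between them.
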
+We will also use the :func:`plot_two_digit_freqs` function to plot the
+results. The code to run this calculation in parallel is contained in
+:file:`docs/examples/kernel/parallelpi.py`. This code can be run in parallel
+using IPython by following these steps:
+
+1. Copy the text files with the digits of pi
+ (ftp://pi.super-computing.org/.2/pi200m/) to the working directory of the
+ engines on the compute nodes.
+2. Use :command:`ipcluster` to start 15 engines. We used an 8 core (2 quad
+ core CPUs) cluster with hyperthreading enabled, which makes the 8 cores
+ look like 16 (1 controller + 15 engines) to the OS. However, the maximum
+ speedup we can observe is still only 8x.
+3. With the file :file:`parallelpi.py` in your current working directory, open
+ up IPython in pylab mode and type ``run parallelpi.py``.
+
+When run on our 8 core cluster, we observe a speedup of 7.7x. This is slightly
+less than linear scaling (8x) because the controller is also running on one of
+the cores.
+
+To emphasize the interactive nature of IPython, we now show how the
+calculation can also be run by simply typing the commands from
+:file:`parallelpi.py` interactively into IPython:
+
+.. sourcecode:: ipython
+
+ In [1]: from IPython.kernel import client
+ 2009-11-19 11:32:38-0800 [-] Log opened.
+
+ # The MultiEngineClient allows us to use the engines interactively.
+ # We simply pass MultiEngineClient the name of the cluster profile we
+ # are using.
+ In [2]: mec = client.MultiEngineClient(profile='mycluster')
+ 2009-11-19 11:32:44-0800 [-] Connecting [0]
+ 2009-11-19 11:32:44-0800 [Negotiation,client] Connected: ./ipcontroller-mec.furl
+
+ In [3]: mec.get_ids()
+ Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
+
+ In [4]: run pidigits.py
+
+ In [5]: filestring = 'pi200m-ascii-%(i)02dof20.txt'
+
+ # Create the list of files to process.
+ In [6]: files = [filestring % {'i':i} for i in range(1,16)]
+
+ In [7]: files
+ Out[7]:
+ ['pi200m-ascii-01of20.txt',
+ 'pi200m-ascii-02of20.txt',
+ 'pi200m-ascii-03of20.txt',
+ 'pi200m-ascii-04of20.txt',
+ 'pi200m-ascii-05of20.txt',
+ 'pi200m-ascii-06of20.txt',
+ 'pi200m-ascii-07of20.txt',
+ 'pi200m-ascii-08of20.txt',
+ 'pi200m-ascii-09of20.txt',
+ 'pi200m-ascii-10of20.txt',
+ 'pi200m-ascii-11of20.txt',
+ 'pi200m-ascii-12of20.txt',
+ 'pi200m-ascii-13of20.txt',
+ 'pi200m-ascii-14of20.txt',
+ 'pi200m-ascii-15of20.txt']
+
+ # This is the parallel calculation using the MultiEngineClient.map method
+ # which applies compute_two_digit_freqs to each file in files in parallel.
+ In [8]: freqs_all = mec.map(compute_two_digit_freqs, files)
+
+ # Add up the frequencies from each engine.
+ In [9]: freqs = reduce_freqs(freqs_all)
+
+ In [10]: plot_two_digit_freqs(freqs)
+ Out[10]: <matplotlib.image.AxesImage object at 0x18beb110>
+
+ In [11]: plt.title('2 digit counts of 150m digits of pi')
+ Out[11]: <matplotlib.text.Text object at 0x18d1f9b0>
+
+The resulting plot generated by Matplotlib is shown below. The colors indicate
+which two-digit sequences occur more (red) or less (blue) often in the
+first 150 million digits of pi. We clearly see that the sequence "41" occurs
+most often and that "06" and "07" occur least often. Further analysis would
+show that the relative size of the statistical fluctuations has decreased
+compared to the 10,000 digit calculation.
+
+.. image:: two_digit_counts.*
+
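The map-then-reduce pattern used in the session above does not depend on IPython. As a minimal sketch, the same shape can be reproduced with the standard library, where ``concurrent.futures.ThreadPoolExecutor`` stands in for the engines; this is a substitute chosen for illustration, not the IPython API, and threads in a single process will not give the speedup of separate engines. The names ``count_digits`` and ``parallel_digit_counts`` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_digits(chunk):
    """Count how often each digit character occurs in one chunk."""
    return Counter(chunk)


def parallel_digit_counts(chunks, workers=4):
    """Map count_digits over the chunks, then add up the partial counts."""
    # Map step: each chunk is handed to a worker thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_digits, chunks))
    # Reduce step: merge the per-chunk counters into one total.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


chunks = ["14159", "26535", "89793"]
totals = parallel_digit_counts(chunks)
print(sorted(totals.items()))
```

The map step corresponds to the ``mec.map`` call in the session and the merge loop to ``reduce_freqs``.
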
+
+Parallel options pricing
+========================
+
+An option is a financial contract that gives the buyer of the contract the
+right to buy (a "call") or sell (a "put") a secondary asset (a stock for
+example) at a particular date in the future (the expiration date) for a
+pre-agreed price (the strike price). For this right, the buyer pays the
+seller a premium (the option price). There are a wide variety of flavors of
+options (American, European, Asian, etc.) that are useful for different
+purposes: hedging against risk, speculation, etc.
+
+Much of modern finance is driven by the need to price these contracts
+accurately based on what is known about the properties (such as volatility) of
+the underlying asset. One method of pricing options is to use a Monte Carlo
+simulation of the underlying asset price. In this example we use this approach
+to price both European and Asian (path dependent) options for various strike
+prices and volatilities.
+
+The code for this example can be found in the :file:`docs/examples/kernel`
+directory of the IPython source. The function :func:`price_options` in
+:file:`mcpricer.py` implements the basic Monte Carlo pricing algorithm using
+the NumPy package and is shown here:
+
+.. literalinclude:: ../../examples/kernel/mcpricer.py
+ :language: python
+
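To make the idea concrete without the source file, here is a minimal, standard-library-only sketch of Monte Carlo pricing. It is not the contents of :file:`mcpricer.py`: the function name ``mc_option_prices``, its parameters and defaults, the one-year horizon, and the geometric Brownian motion model are all assumptions made for illustration. The ``acall`` and ``aput`` keys follow the names used later in this section; ``ecall`` and ``eput`` are assumed by analogy.

```python
import math
import random


def mc_option_prices(s0=100.0, strike=100.0, sigma=0.25, rate=0.05,
                     steps=50, paths=2000, seed=0):
    """Estimate European and Asian option prices by simulation.

    Assumptions: one-year horizon, geometric Brownian motion for the
    asset price, arithmetic average of the path for the Asian payoff.
    """
    rng = random.Random(seed)
    dt = 1.0 / steps
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    sums = {"ecall": 0.0, "eput": 0.0, "acall": 0.0, "aput": 0.0}
    for _ in range(paths):
        price = s0
        running_sum = 0.0
        for _ in range(steps):
            price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            running_sum += price
        average = running_sum / steps
        # European payoffs depend only on the final price.
        sums["ecall"] += max(price - strike, 0.0)
        sums["eput"] += max(strike - price, 0.0)
        # Asian payoffs depend on the average price along the path.
        sums["acall"] += max(average - strike, 0.0)
        sums["aput"] += max(strike - average, 0.0)
    discount = math.exp(-rate)
    return {name: discount * total / paths for name, total in sums.items()}


prices = mc_option_prices()
print(prices)
```

Each call is independent of every other call, so pricing a grid of strike prices and volatilities splits naturally into one task per grid point.
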
+To run this code in parallel, we will use IPython's :class:`TaskClient` class,
+which distributes work to the engines using dynamic load balancing. This
+client can be used alongside the :class:`MultiEngineClient` class shown in
+the previous example. The parallel calculation using :class:`TaskClient` can
+be found in the file :file:`mcdriver.py`. The code in this file creates a
+:class:`TaskClient` instance and then submits a set of tasks using
+:meth:`TaskClient.run` that calculate the option prices for different
+volatilities and strike prices. The results are then plotted as a 2D contour
+plot using Matplotlib.
+
+.. literalinclude:: ../../examples/kernel/mcdriver.py
+ :language: python
+
+To use this code, start an IPython cluster using :command:`ipcluster`, open
+IPython in the pylab mode with the file :file:`mcdriver.py` in your current
+working directory and then type:
+
+.. sourcecode:: ipython
+
+ In [7]: run mcdriver.py
+ Submitted tasks: [0, 1, 2, ...]
+
+Once all the tasks have finished, the results can be plotted using the
+:func:`plot_options` function. Here we make contour plots of the Asian
+call and Asian put options as a function of the volatility and strike price:
+
+.. sourcecode:: ipython
+
+ In [8]: plot_options(sigma_vals, K_vals, prices['acall'])
+
+ In [9]: plt.figure()
+ Out[9]: <matplotlib.figure.Figure object at 0x18c178d0>
+
+ In [10]: plot_options(sigma_vals, K_vals, prices['aput'])
+
+These results are shown in the two figures below. On an 8-core cluster the
+entire calculation (10 strike prices, 10 volatilities, 100,000 paths for each)
+took 30 seconds in parallel, giving a speedup of 7.7x, which is comparable
+to the speedup observed in our previous example.
+
+.. image:: asian_call.*
+
+.. image:: asian_put.*
+
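The timing figures quoted above can be turned into an implied serial run time and a parallel efficiency with a little arithmetic. The helper names below are hypothetical, and the inputs are the rounded numbers from the text, so the results are only approximate.

```python
def implied_serial_seconds(parallel_seconds, speedup):
    """Serial run time implied by a parallel run time and a speedup."""
    return parallel_seconds * speedup


def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling that was achieved."""
    return speedup / cores


# 30 seconds in parallel at 7.7x implies roughly 231 seconds serially.
print(implied_serial_seconds(30.0, 7.7))

# 7.7x on 8 cores is roughly 96% of linear scaling.
print(parallel_efficiency(7.7, 8))
```
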
+Conclusion
+==========
+
+To conclude these examples, we summarize the key features of IPython's
+parallel architecture that have been demonstrated:
+
+* Serial code can often be parallelized with only a few extra lines of code.
+ We have used the :class:`MultiEngineClient` and :class:`TaskClient` classes
+ for this purpose.
+* The resulting parallel code can be run without ever leaving IPython's
+ interactive shell.
+* Any data computed in parallel can be explored interactively through
+ visualization or further numerical calculations.
+* We have run these examples on a cluster running Windows HPC Server 2008.
+ IPython's built-in support for the Windows HPC job scheduler makes it
+ easy to get started with IPython's parallel capabilities.