# Software I : Anaconda, AstroPy, and libraries

We will make extensive use of Python and various associated libraries, so the first thing we need to ensure is that we all have a common setup and are using the same software. The Python distribution that we have decided to use is <i>Anaconda</i>, which can be downloaded from <a href="http://continuum.io/downloads">here</a> (although we hope that you have already done this prior to the school). Make sure that you have installed the Python 2.7 version for your operating system (there is nothing wrong with Python 3.x, but its syntax is slightly different). Python 2 will be supported until 2020. We encourage you to write code that supports Python 2.7 and 3.x simultaneously; the <i>future</i> package, explained <a href="http://python-future.org">here</a>, helps with this.

To install Anaconda on **Linux**, follow this <a href="https://www.continuum.io/downloads#linux">link</a>.

To install Anaconda on **Mac**, follow this <a href="https://www.continuum.io/downloads#osx">link</a>.

To install Anaconda on **Windows**, follow this <a href="https://www.continuum.io/downloads#windows">link</a>.

We will follow the LSST Data Management Style Guide, explained <a href="https://developer.lsst.io/coding/python_style_guide.html#pep-8-is-the-baseline-coding-style">here</a>.

## Installing packages

One of the advantages of the <i>Anaconda</i> distribution is that it comes with many of the most commonly used Python packages, such as <a href="http://www.numpy.org">numpy</a>, <a href="http://www.scipy.org">scipy</a>, and <a href="http://scikit-learn.org">scikit-learn</a>, preinstalled. However, if you do need to install a new package, it is very straightforward: you can use either the Anaconda installation tool <i>conda</i> or the generic Python tool <i>pip</i> (each uses a different registry of available packages, and sometimes a particular package will not be available via one tool but will be via the other).

For example, <a href="https://github.com/bwlewis/irlbpy">irlbpy</a> is a superfast algorithm for finding the largest eigenvalues (and corresponding eigenvectors) of very large matrices. We can try to install it first with <i>conda</i>:

<code>conda install irlbpy</code>

but this will not find it:

<code>Fetching package metadata: ....
Error: No packages found in current osx-64 channels matching: irlbpy

You can search for this package on Binstar with <br/>
    binstar search -t conda irlbpy
</code>

so instead we try with <i>pip</i>:

**<code>pip install irlbpy</code>**

In the event that both fail, you can always download the package source code and then install it manually with:

<code>python setup.py install</code>

in the appropriate source directory.
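
Whichever route you use, a quick way to confirm that Python can see a package is to try importing it. The sketch below is only an illustration and is not part of the Anaconda tooling: the helper name `package_version` and the list of package names are assumptions made for this example. Note that the import name can differ from the name used to install a package (scikit-learn is imported as `sklearn`).

```python
import importlib


def package_version(name):
    """Return the version string of an importable package, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every package defines __version__, so fall back to a placeholder
    return getattr(module, "__version__", "unknown")


# Hypothetical list of packages to check; edit it to match your needs
for name in ["numpy", "scipy", "sklearn", "pandas", "astropy"]:
    version = package_version(name)
    status = "not installed" if version is None else "version {}".format(version)
    print("{:10s} {}".format(name, status))
```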

We'll now take a brief look at a few of the main Python packages. 

## Python interpreter

The standard way to use the Python programming language is to run Python code with the Python interpreter. The interpreter is a program that reads and executes the Python code in files passed to it as arguments. At the command prompt, the command <code>python</code> invokes the interpreter.

For example, to run a file my-program.py that contains Python code from the command prompt, use:

$ python my-program.py
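
As an illustration, my-program.py could contain something like the following. The contents are hypothetical (the function `mean` and the sample numbers are made up for this example); the point is the overall shape of a script that can be both run directly and imported.

```python
# my-program.py: a minimal script (hypothetical contents, for illustration only)


def mean(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / float(len(values))


if __name__ == "__main__":
    # This part runs only when the file is executed directly,
    # not when it is imported from another module
    print("mean = {}".format(mean([1, 5, 3, 4, 2])))
```

Calling `print` with a single formatted string, as done here, behaves the same way under Python 2.7 and 3.x.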

We can also start the interpreter by simply typing python at the command line, and then type Python code into it interactively.

<img src="images/python-screenshot.jpg" width="600">

This is often how we want to work when developing scientific applications or doing small calculations. However, the standard Python interpreter is not very convenient for this kind of work, due to a number of limitations.

## IPython

IPython is an interactive shell that addresses the limitations of the standard Python interpreter, and it is a workhorse for scientific use of Python. It provides an interactive prompt to the Python interpreter with greatly improved user-friendliness.

<img src="images/ipython-screenshot.jpg" width="600">

Some of the many useful features of IPython include:

<ul>
<li> Command history, which can be browsed with the up and down arrows on the keyboard.</li>
<li> Tab auto-completion.</li>
<li> In-line editing of code.</li>
<li> Object introspection, and automatic extraction of documentation strings from Python objects like classes and functions.</li>
<li> Good interaction with the operating system shell.</li>
<li> Support for multiple parallel back-end processes, which can run on computing clusters or cloud services like Amazon EC2.</li>
</ul>
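
Object introspection is also possible, if less conveniently, in plain Python. The sketch below uses the standard `inspect` module and the built-in `dir` to retrieve roughly the kind of information that IPython shows when you append `?` to an object name or press the tab key. The function `to_radians` is a made-up example.

```python
import inspect


def to_radians(degrees):
    """Convert an angle from degrees to radians."""
    return degrees * 3.141592653589793 / 180.0


# The documentation string of the function
print(inspect.getdoc(to_radians))

# Names of string attributes beginning with "up", similar to what
# tab completion would offer after typing "a string".up
print([name for name in dir("a string") if name.startswith("up")])
```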

## Jupyter notebook

<a href="http://ipython.org/notebook.html">Jupyter notebook</a> is an HTML-based notebook environment for Python, similar to Mathematica or Maple. It is based on the IPython shell, but provides a cell-based environment with great interactivity, where calculations can be organized and documented in a structured way.

<img src="images/ipython-notebook-screenshot.jpg" width="600">

Although they use a web browser as the graphical interface, Jupyter notebooks are usually run locally, on the same computer that runs the browser. To start a new Jupyter notebook session, run the following command:

$ jupyter-notebook

from a directory where you want the notebooks to be stored. This will open a new browser window (or a new tab in an existing window) with an index page where existing notebooks are shown and from which new notebooks can be created. Usually, the URL for the Jupyter notebook is <http://localhost:8888>.

## NumPy

<a href="http://www.numpy.org">NumPy</a> is the main Python package for working with N-dimensional arrays. Any list of numbers can be recast as a NumPy array:

In [10]:
import numpy as np
x = np.array([1, 5, 3, 4, 2])
x

array([1, 5, 3, 4, 2])

Arrays have a number of useful methods associated with them:

In [12]:
print x.min(), x.max(), x.sum(), x.argmin(), x.argmax() 

1 5 15 0 1


and NumPy functions can act on arrays in an elementwise fashion: 

In [9]:
np.sin(x * np.pi / 180.)

array([ 0.01745241,  0.08715574,  0.05233596,  0.06975647,  0.0348995 ])

Ranges of values are easily produced:

In [13]:
np.arange(1, 10, 0.5)

array([ 1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,
        6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])

In [14]:
np.linspace(1, 10, 5)

array([  1.  ,   3.25,   5.5 ,   7.75,  10.  ])

In [16]:
np.logspace(1, 3, 5)

array([   10.        ,    31.6227766 ,   100.        ,   316.22776602,
        1000.        ])
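
The three range functions differ in how they treat their arguments, which is a common source of off-by-one surprises. The short sketch below, with arbitrarily chosen values, contrasts them: <i>arange</i> takes a step and excludes the stop value, <i>linspace</i> takes a number of points and includes the stop value by default, and <i>logspace</i> takes exponents of 10 rather than the values themselves.

```python
import numpy as np

# arange takes a step and excludes the stop value
a = np.arange(0, 1, 0.25)

# linspace takes a number of points and includes the stop value by default
b = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well
c = np.linspace(0, 1, 4, endpoint=False)

# logspace takes the exponents: this runs from 10**1 to 10**3
d = np.logspace(1, 3, 3)

print(a)
print(b)
print(c)
print(d)
```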

Random numbers are also easily generated in the half-open interval [0, 1):

In [17]:
np.random.random(10)

array([ 0.32236496,  0.21506812,  0.43010248,  0.00518381,  0.76868494,
        0.40007316,  0.54393627,  0.47369813,  0.84379927,  0.64993354])

or from one of the large number of statistical distributions provided:

In [18]:
np.random.normal(loc = 2.5, scale = 5, size = 10)

array([ 0.0923285 ,  7.42538429,  1.76331778, -3.53594575,  8.68638027,
        2.71567963,  3.7712319 , -1.98054188, -0.82563295,  0.08646263])
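
Because these numbers are random, the output changes on every run. When results need to be repeatable, one option is to draw from a generator created with a fixed seed. The sketch below uses `np.random.RandomState`; the seed value 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same seed produce identical sequences
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

sample_a = rng_a.normal(loc=2.5, scale=5, size=10)
sample_b = rng_b.normal(loc=2.5, scale=5, size=10)

# The two samples agree element by element
print(np.array_equal(sample_a, sample_b))
```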

Another useful function is <i>where</i>, which identifies the elements that satisfy a particular condition:

In [19]:
x = np.random.normal(size = 100)
np.where(x > 3.)

(array([26]),)
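
Called with only a condition, <i>where</i> returns a tuple containing one index array per dimension, which is why the result above is wrapped in parentheses. The sketch below, using a small made-up array, shows how to get the indices out of that tuple, how a boolean mask selects the values directly, and how the three-argument form chooses between two alternatives.

```python
import numpy as np

x = np.array([0.5, -1.2, 3.4, 0.0, 2.1])

# With a condition only, where returns a tuple of index arrays (one per dimension)
indices = np.where(x > 1.0)[0]

# A boolean mask selects the matching values directly
values = x[x > 1.0]

# With three arguments, where picks elementwise between two alternatives:
# here negative entries are replaced by zero
clipped = np.where(x < 0, 0.0, x)

print(indices)
print(values)
print(clipped)
```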

Of course, all of these work equally well with multidimensional arrays.

In [25]:
x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
np.sin(x)

array([[ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849, -0.54402111]])
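
With more than one dimension, methods such as <i>sum</i> accept an <i>axis</i> argument that selects the direction along which they operate, and indexing takes one entry per dimension. A brief sketch using the same array:

```python
import numpy as np

x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

shape = x.shape              # (number of rows, number of columns)
total = x.sum()              # sum over all elements
column_sums = x.sum(axis=0)  # collapse the rows: one value per column
row_sums = x.sum(axis=1)     # collapse the columns: one value per row
part = x[1, 2:4]             # second row, third and fourth columns

print(shape)
print(total)
print(column_sums)
print(row_sums)
print(part)
```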

Data can also be automatically loaded from a file into a NumPy array via the <i>loadtxt</i> or <i>genfromtxt</i> functions:

In [None]:
data = np.loadtxt("somefile.csv", delimiter = ",", skiprows = 3)
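
Since somefile.csv is only a placeholder, the following self-contained sketch reads from an in-memory text buffer instead, which <i>loadtxt</i> accepts in place of a file name. The column names and values are invented for illustration.

```python
import io

import numpy as np

# An in-memory stand-in for a CSV file with three header lines
text = u"""Example catalogue
Generated for illustration
ra,dec,mag
10.5,41.2,17.3
11.0,40.9,18.1
"""

# skiprows drops the three header lines; unpack=True returns one array per column
ra, dec, mag = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=3, unpack=True)

print(ra)
print(dec)
print(mag.mean())
```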

## SciPy

<a href="http://www.scipy.org">SciPy</a> provides a number of subpackages that deal with common operations in scientific computing, such as numerical integration, optimization, interpolation, Fourier transforms and linear algebra.

In [33]:
f = lambda x: np.cos(-x ** 2 / 9.)

x = np.linspace(0, 10, 11)
y = f(x)

from scipy.interpolate import interp1d
f1 = interp1d(x, y)
f2 = interp1d(x, y, kind = 'cubic')

from scipy.integrate import quad
print quad(f1, 0, 10)
print quad(f2, 0, 10)
print quad(f, 0, 10)

(1.6346035509274763, 1.1580617576001373e-08)
(1.2743983238992254, 1.1301087878068563e-08)
(1.4332524555959525, 5.534144099065977e-11)
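
Each call to <i>quad</i> returns a tuple: the value of the integral and an estimate of its absolute error. That error estimate refers to the integration of the function actually passed in, so it says nothing about how well an interpolant matches the original function. The sketch below repeats the cubic interpolation with more sample points to show the integral approaching the direct result; the sample counts are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import interp1d


def f(x):
    return np.cos(-x ** 2 / 9.)


# Unpack the integral and the estimate of its absolute error
exact, abserr = quad(f, 0, 10)

errors = []
for n in (11, 41, 161):
    x = np.linspace(0, 10, n)
    f_cubic = interp1d(x, f(x), kind='cubic')
    # limit raises the maximum number of subintervals that quad may use
    approx, _ = quad(f_cubic, 0, 10, limit=200)
    errors.append(abs(approx - exact))
    print("{:4d} points: integral = {:.6f}, difference = {:.1e}".format(n, approx, errors[-1]))
```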


## Scikit-learn

<a href="http://scikit-learn.org">scikit-learn</a> provides algorithms for machine learning tasks, such as classification, regression, and clustering, as well as associated operations, such as cross-validation and feature normalization. A related module is <a href="http://www.astroml.org">astroML</a> which is a wrapper around a lot of the scikit-learn routines but also offers some additional functionality and faster/alternate implementations of some methods.
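
As a minimal sketch of how these pieces fit together, the example below chains feature normalization and a classifier, then evaluates them with cross-validation on the small iris data set bundled with scikit-learn. The choice of classifier, the number of neighbours, and the number of folds are all arbitrary, and the `sklearn.model_selection` module assumes scikit-learn 0.18 or later.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A small built-in data set: 150 flowers, 4 features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Chain feature normalization and a classifier so that the scaling
# is refit on the training part of every cross-validation fold
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validation; each score is the accuracy on one held-out fold
scores = cross_val_score(model, X, y, cv=5)
print("accuracy: {:.3f} +/- {:.3f}".format(scores.mean(), scores.std()))
```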

## Pandas

<a href="http://pandas.pydata.org/index.html">pandas</a> offers data structures, particularly data frames, and operations for manipulating numerical tables and time series, such as fancy indexing, reshaping and pivoting, and merging, as well as a number of analysis tools. Although similar functionality already exists in numpy, pandas is highly optimized for performance and large data sets.
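
The following sketch illustrates a few of these operations on a tiny data frame. The table contents (two objects observed in two bands) are invented for this example.

```python
import pandas as pd

# A small table of made-up measurements: two objects observed in two bands
df = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "band": ["g", "r", "g", "r"],
    "mag": [17.2, 16.8, 19.1, 18.4],
})

# Boolean indexing: keep the rows brighter than magnitude 18
bright = df[df["mag"] < 18.0]

# Pivoting: one row per object, one column per band
wide = df.pivot(index="name", columns="band", values="mag")

# A derived column, computed for all rows at once
wide["g-r"] = wide["g"] - wide["r"]

print(bright)
print(wide)
```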

## AstroPy

<a href="http://www.astropy.org">AstroPy</a> aims to provide a core set of subpackages to specifically support astronomy. These include methods to work with image and table data formats, e.g., FITS, VOTable, etc., along with astronomical coordinate and unit systems, and cosmological calculations.

In [38]:
from astropy import units as u
from astropy.coordinates import SkyCoord

c = SkyCoord(ra = 10.625 * u.degree, dec = 41.2 * u.degree, frame = 'icrs')
print c.to_string('hmsdms')
print c.galactic

00h42m30s +41d12m00s
<SkyCoord (Galactic): (l, b) in deg
    (121.12334339, -21.6403587)>


In [39]:
from astropy.cosmology import WMAP9 as cosmo
print cosmo.comoving_distance(1.25), cosmo.luminosity_distance(1.25)

3944.5841858 Mpc 8875.31441806 Mpc


In [None]:
from astropy.io import fits
hdulist = fits.open('someimage.fits')
hdulist.info()

In [None]:
from astropy.io.votable import parse
votable = parse('sometable.xml')
table = votable.get_first_table()
data = table.array
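
The file names in the two cells above are placeholders. For a FITS example that runs without any external data, the sketch below first writes a small synthetic image to a temporary directory and then reads it back; the file name example.fits and the array contents are made up.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

# Write a small synthetic image so that there is a file to open
path = os.path.join(tempfile.mkdtemp(), "example.fits")
image = np.arange(12, dtype=np.float32).reshape(3, 4)
fits.PrimaryHDU(data=image).writeto(path)

# The with-statement closes the file when the block ends
with fits.open(path) as hdulist:
    header = hdulist[0].header
    # Copy the pixel data so that it stays valid after the file is closed
    data = hdulist[0].data.copy()

# NAXIS1 is the fastest-varying axis, i.e. the last index of the NumPy array
print("NAXIS1 = {}, NAXIS2 = {}".format(header["NAXIS1"], header["NAXIS2"]))
print(data.shape)
```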

A useful affiliated package is <a href="https://astroquery.readthedocs.org">Astroquery</a>, which provides tools for querying astronomical web forms and databases. This is not part of the regular AstroPy distribution and needs to be installed separately. Whereas many data archives have standardized VO interfaces to support data access, Astroquery mimics a web browser and provides access via an archive's form interface. This can be useful, as not all provided information is necessarily available via the VO.

For example, the <a href="http://ned.ipac.caltech.edu">NASA Extragalactic Database</a> is a very useful human-curated resource for extragalactic objects. However, a lot of the information that is available via the web pages is not available through an easy programmatic API. Let's say that we want to get the list of object types associated with a particular source:

In [41]:
from astroquery.ned import Ned
co = SkyCoord(ra = 56.38, dec = 38.43, unit = (u.deg, u.deg))
result = Ned.query_region(co, radius = 0.07 * u.deg)
set(result.columns['Type'])

{'G', 'RadioS', 'UvS'}

## Other libraries

For some of the other lectures or projects this week, you might also need to install the following Python packages:

<ul>
<li> photutils
<li> glue
</ul>