# 1.2 Exploring Data

## Finding jobs

In [section one](signac_101_Getting_Started.ipynb) of this tutorial, we evaluated the ideal gas equation and stored the results in the *job document* and in a file called `V.txt`.
Let's now have a look at how we can explore our data space for basic and advanced analysis.

We already saw how to iterate over the *complete* data space using the "`for job in project`" expression.
This is a short-hand notation for "`for job in project.find_jobs()`", meaning: "find **all** jobs".

Instead of finding all jobs, we can also find a subset using *filters*.

Let's get started by getting a handle on our project using the `get_project()` function.
We don't need to initialize the project again, since we already did that in section 1.

In [None]:
import signac
project = signac.get_project('projects/tutorial')

Next, suppose we want to find all jobs where *p=10.0*. For this, we can use the `find_jobs()` method, which takes a dictionary of parameters as its filter argument.

In [None]:
for job in project.find_jobs({'p': 10.0}):
    print(job.statepoint())

In this case, that is of course only a single job.
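
To make the filter semantics concrete, here is a minimal sketch in plain Python that does not use **signac** at all. It assumes that a filter matches a job when every key in the filter has an equal value in the job's state point. The state point values and the helper names `matches()` and `find()` are hypothetical and only serve as an illustration; they are not part of the **signac** API.

```python
# Hypothetical state points standing in for the jobs of a project.
statepoints = [
    {"p": 0.1, "kT": 1.0, "N": 1000},
    {"p": 1.0, "kT": 1.0, "N": 1000},
    {"p": 10.0, "kT": 1.0, "N": 1000},
]

_MISSING = object()  # sentinel, so that a missing key never equals a filter value


def matches(statepoint, filter_):
    """Return True if every key in filter_ has an equal value in statepoint."""
    return all(
        statepoint.get(key, _MISSING) == value for key, value in filter_.items()
    )


def find(statepoints, filter_=None):
    """Return all state points matching filter_; an empty filter matches all."""
    filter_ = filter_ or {}
    return [sp for sp in statepoints if matches(sp, filter_)]


selected = find(statepoints, {"p": 10.0})  # analogous to find_jobs({'p': 10.0})
everything = find(statepoints)             # analogous to find_jobs()
```

With these assumed state points, `selected` contains exactly one entry, which mirrors the single job found above, and `everything` contains all three.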

You can execute the same kind of find operation on the [command line](signac_105_Command_Line_Interface.ipynb) with `$ signac find`, as will be shown later.

While the filtering method is optimized for a simple dissection of the data space, it is possible to construct more complex query routines, for example using [list comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions).

This is an example of how to select all jobs where the pressure *p* is greater than 0.1:

In [None]:
jobs_p_gt_0_1 = [job for job in project if job.sp.p > 0.1]
for job in jobs_p_gt_0_1:
    print(job.statepoint(), job.document)

Finding jobs by certain criteria requires an index of the data space.
In all previous examples, this index was created implicitly. However, depending on the size of the data space, it may make sense to create the index explicitly so that it can be reused for multiple find operations. This is shown in the next section.

## Indexing

An index is a complete record of the data and its associated metadata within our project’s data space. To generate an index for our project's data space, use the `index()` method:

In [None]:
for doc in project.index():
    print(doc)

Using an [index](http://signac.readthedocs.io/en/latest/indexing.html) to operate on data is particularly useful in later stages of a computational investigation, where data may come from different projects and the actual storage location of files is less important.

You can store the index wherever it may be useful, e.g., a file, a database, or even just in a variable for repeated find operations within one script.
The **signac** framework provides the `Collection` class, which can be used to manage indices in memory and on disk.

In [None]:
index = signac.Collection(project.index())

for doc in index.find({'statepoint.p': 10.0}):
    print(doc)
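
The key `'statepoint.p'` in the query above uses dotted notation to address a nested field. The following plain-Python sketch illustrates one way such a lookup can work. It assumes that each index document stores the state point under a nested `statepoint` key, which is inferred from the query key; the document contents and the helpers `get_nested()` and `find_docs()` are hypothetical and do not reproduce how `Collection` is actually implemented.

```python
_MISSING = object()  # sentinel for keys that are absent from a document


def get_nested(doc, dotted_key, default=None):
    """Resolve a key such as 'statepoint.p' within a nested dictionary."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find_docs(docs, filter_):
    """Return all documents in which every dotted key equals its filter value."""
    return [
        doc
        for doc in docs
        if all(get_nested(doc, key, _MISSING) == value for key, value in filter_.items())
    ]


# Hypothetical index documents; real documents carry additional fields.
index_docs = [
    {"_id": "a", "statepoint": {"p": 0.1}},
    {"_id": "b", "statepoint": {"p": 1.0}},
    {"_id": "c", "statepoint": {"p": 10.0}},
]

hits = find_docs(index_docs, {"statepoint.p": 10.0})
```

Keeping such a list of documents in a variable is the simplest form of storing an index for repeated find operations within one script.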

## Views

Sometimes we want to examine our data on the file system directly. However, the file paths within the workspace are obfuscated by the *job id*. The solution is to use *views*, which are human-readable, maximally compact hierarchical links to our data space.

To create a linked view, we simply execute the `create_linked_view()` method within Python or the `$ signac view` command on the [command line](signac_105_Command_Line_Interface.ipynb).

In [None]:
project.create_linked_view(prefix='projects/tutorial/view')
%ls projects/tutorial/view

The view paths only contain parameters which actually vary across the different jobs.
In this example, that is only the pressure *p*.
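
The idea of keeping only the varying parameters can be sketched in plain Python. The example below is not the algorithm used by `create_linked_view()`; it is a simplified illustration that assumes all state points share the same keys and that reproduces the `p/1.0/job` layout seen in this tutorial. The state point values and the helpers `varying_keys()` and `view_path()` are hypothetical.

```python
import posixpath

# Hypothetical state points: only p differs between jobs.
statepoints = [
    {"p": 0.1, "kT": 1.0, "N": 1000},
    {"p": 1.0, "kT": 1.0, "N": 1000},
    {"p": 10.0, "kT": 1.0, "N": 1000},
]


def varying_keys(statepoints):
    """Return the sorted keys that take more than one distinct value."""
    keys = sorted({key for sp in statepoints for key in sp})
    return [
        key for key in keys if len({repr(sp.get(key)) for sp in statepoints}) > 1
    ]


def view_path(statepoint, keys):
    """Build a relative path of the form key/value/.../job."""
    parts = []
    for key in keys:
        parts.extend([key, str(statepoint[key])])
    return posixpath.join(*parts, "job")


keys = varying_keys(statepoints)
paths = [view_path(sp, keys) for sp in statepoints]
```

Because `kT` and `N` are constant across the assumed state points, they are dropped from the paths, which keeps the directory hierarchy as shallow as possible.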

Restricting the paths to the varying parameters allows us to examine the data with highly compact, human-readable path names:

In [None]:
%ls 'projects/tutorial/view/p/1.0/job/'
%cat 'projects/tutorial/view/p/1.0/job/V.txt'

**NOTE: Update your view after adding or removing jobs by executing the view command for the same prefix again!**

Tip: Consider creating a linked view for large data sets on an [**in-memory** file system](https://en.wikipedia.org/wiki/Tmpfs) for best performance!

The [next section](signac_103_A_Basic_Workflow.ipynb) will demonstrate how to implement a basic, but complete workflow for more expensive computations.