# 1.2 Exploring Data

## Finding jobs

In [section one](signac - 1.1 Getting started.ipynb) of this tutorial, we evaluated the ideal gas equation and stored the results in the *job document* and in a file called `V.txt`.
Let's now have a look at how we can explore our data space for basic and advanced analysis.

We already saw how to iterate over the *complete* data space using the `find_jobs()` method.
Instead of finding all jobs, we can also find a subset using *filters*.

Let's get started by getting a handle on our project using the `get_project()` function.
We don't need to initialize the project again, since we already did that in section 1.

In [None]:
import signac
project = signac.get_project('../../projects/tutorial')

Next, suppose we would like to find all jobs where *p=10.0*. For this, we can use the `find_jobs()` method, which takes a dictionary of parameters as a filter:

In [None]:
for job in project.find_jobs({'p': 10.0}):
    print(job.statepoint())

In this case, that is only a single job.
You can execute the same kind of filtering method on the command line with:

    $ signac find '{"p": 10.0}'
    5a456c131b0c5897804a4af8e77df5aa
    
Use the `signac statepoint` command to inspect the state point of that result:

    $ signac find '{"p": 10.0}' | xargs signac statepoint
    {"p": 10.0, "kT": 1.0, "N": 1000}

The filtering method is optimized for simple selections from the data space, such as the exact match on a state point parameter shown above.
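To make the matching idea concrete, here is a minimal pure-Python sketch of how a dictionary filter can select state points. It is only an illustration under stated assumptions and not signac's implementation: the `statepoints` list and the helper names `matches` and `select` are hypothetical, and the values are chosen to resemble the tutorial's data.

```python
# Hypothetical state points resembling those of the tutorial (illustrative values).
statepoints = [
    {"p": 0.1, "kT": 1.0, "N": 1000},
    {"p": 1.0, "kT": 1.0, "N": 1000},
    {"p": 10.0, "kT": 1.0, "N": 1000},
]


def matches(statepoint, filter_):
    """Return True if every key in the filter is present with an equal value."""
    return all(
        key in statepoint and statepoint[key] == value
        for key, value in filter_.items()
    )


def select(statepoints, filter_=None):
    """Return all state points matching the filter; no filter matches everything."""
    filter_ = filter_ or {}
    return [sp for sp in statepoints if matches(sp, filter_)]


print(select(statepoints, {"p": 10.0}))
```

A filter with several keys selects only the state points that agree on all of them, which is why this style works well for exact matches but not for conditions such as "greater than".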

We can also construct more complex queries, for example with list comprehensions.
This is an example of how to select all jobs where the pressure *p* is greater than 0.1:

In [None]:
jobs_p_gt_0_1 = [job for job in project.find_jobs() if job.statepoint()['p'] > 0.1]
for job in jobs_p_gt_0_1:
    print(job.statepoint(), job.document)
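When several conditions need to be combined, it can help to wrap them in small predicate functions. The following sketch works on plain dictionaries rather than on signac jobs; the names `in_range` and `select_where` and the example values are assumptions made for illustration.

```python
# Hypothetical state points; in the tutorial these would come from job.statepoint().
statepoints = [
    {"p": 0.1, "kT": 1.0, "N": 1000},
    {"p": 1.0, "kT": 1.0, "N": 1000},
    {"p": 10.0, "kT": 1.0, "N": 1000},
]


def in_range(key, low, high):
    """Build a predicate that is True when low < statepoint[key] <= high."""
    def predicate(statepoint):
        return low < statepoint[key] <= high
    return predicate


def select_where(statepoints, *predicates):
    """Keep the state points for which every predicate holds."""
    return [sp for sp in statepoints if all(pred(sp) for pred in predicates)]


selected = select_where(
    statepoints,
    in_range("p", 0.1, 10.0),
    lambda sp: sp["N"] == 1000,
)
print([sp["p"] for sp in selected])
```

The same pattern should carry over to jobs by applying the predicates to each job's state point inside the list comprehension.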

Finding jobs by certain criteria requires an index of the data space.
In the previous examples, this index was created implicitly. Depending on the size of the data space, however, it may make sense to create the index explicitly and reuse it. This is shown in the next section.

## Indexing

An index is a complete record of the data and its associated metadata within our project’s data space. To create an index, we need to crawl through the project’s data space, for example by calling the `index()` method:

In [None]:
for doc in project.index():
    print(doc)

Each index document contains at least the state point parameters and the contents of the *job document*.

You can generate an index on the command line with:

    $ signac index
    
You can store the index, for example, in a variable, a file, or a database.

In [None]:
# Create the index once
index = list(project.index())

# Use it multiple times
for job in project.find_jobs({'p': 10.0}, index=index):
    print(job.statepoint())
    
for job in project.find_jobs({'p': 1.0}, index=index):
    print(job.statepoint())
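Storing the index in a file could look roughly like the sketch below, which writes one JSON document per line and reads the documents back for a later query. The document layout (`_id`, `statepoint`, `V`), the ids, and the file name `index.jsonl` are assumptions for illustration; the documents produced by `project.index()` may contain different or additional keys.

```python
import json
import os
import tempfile

# Hypothetical index documents: an id, the state point, and job document contents.
index = [
    {"_id": "job-a", "statepoint": {"p": 1.0, "kT": 1.0, "N": 1000}, "V": 1000.0},
    {"_id": "job-b", "statepoint": {"p": 10.0, "kT": 1.0, "N": 1000}, "V": 100.0},
]


def save_index(docs, path):
    """Write one JSON document per line."""
    with open(path, "w") as file:
        for doc in docs:
            file.write(json.dumps(doc) + "\n")


def load_index(path):
    """Read the documents back, skipping blank lines."""
    with open(path) as file:
        return [json.loads(line) for line in file if line.strip()]


# Create the index file once, then reuse the restored documents for queries.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.jsonl")
    save_index(index, path)
    restored = load_index(path)

hits = [doc["_id"] for doc in restored if doc["statepoint"]["p"] == 10.0]
print(hits)
```

A line-oriented format keeps each document independent, so the file can be appended to or streamed without loading everything at once.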

## Views

Sometimes we want to examine our data on the file system directly. However, the file paths within the workspace are obfuscated by the *job id*. The solution is to use *views*, which are human-readable, maximally compact hierarchical links to our data space.

To create a linked view, we simply execute the `create_linked_view()` method within Python or the `signac view` command on the command line:

In [None]:
# Create an empty view directory
% rm -rf ../../projects/tutorial/view
% mkdir ../../projects/tutorial/view

# Create the linked view within the view directory
project.create_linked_view(prefix='../../projects/tutorial/view')
% ls ../../projects/tutorial/view

The view paths only contain parameters which actually vary across the different jobs.
In this example, that is only the pressure *p*.
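The idea of keeping only the varying parameters can be sketched in a few lines of plain Python. This is not signac's actual algorithm; the helper names `varying_keys` and `view_path` and the `key_value` naming scheme are assumptions modeled on the paths shown in this section.

```python
# Hypothetical state points resembling those of the tutorial (illustrative values).
statepoints = [
    {"p": 0.1, "kT": 1.0, "N": 1000},
    {"p": 1.0, "kT": 1.0, "N": 1000},
    {"p": 10.0, "kT": 1.0, "N": 1000},
]


def varying_keys(statepoints):
    """Return the sorted keys whose values differ between state points."""
    keys = sorted({key for sp in statepoints for key in sp})
    return [
        key for key in keys
        if len({repr(sp.get(key)) for sp in statepoints}) > 1
    ]


def view_path(statepoint, keys):
    """Join 'key_value' components for the given keys into a relative path."""
    return "/".join(f"{key}_{statepoint[key]}" for key in keys)


keys = varying_keys(statepoints)
paths = [view_path(sp, keys) for sp in statepoints]
print(keys, paths)
```

Because `kT` and `N` are identical for all state points here, they drop out and only `p` appears in the path names.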

This allows us to examine the data with highly compact, human-readable path names:

In [None]:
% ls '../../projects/tutorial/view/p_1.0/job/'
% cat '../../projects/tutorial/view/p_1.0/job/V.txt'

Tip: Consider creating a linked view for large data sets on an **in-memory** file system for best performance!

The [next section](signac - 1.3 A Basic Workflow.ipynb) will demonstrate how to implement a basic but complete workflow for more expensive computations.