# How do I get a list of all files that _match_ a particular metadata _property_?

### Overview
Here we focus on listing all files within a single project that **match** a particular metadata property. One _use-case_ which will benefit greatly from this is:

 * I have _hundreds_ of files in my project
 * I want to run one or more tasks that use only _type X_ files
 * I want to query all _type X_ files with one call

Our prior examples of doing this (e.g. the _Organizing files into a Cohort_ cells [here](https://github.com/sbg/okAPI/blob/advanced_access/Tutorials/CGC/batch_SAMtoolsView.ipynb)) followed this general strategy:

 1. List all the files (n = _N_)
 2. Loop through the list
 3. Split off the file extension and check whether the file type is _feasible_
 4. Get the metadata of any feasible files (as [here](files_detailOne.ipynb))
 5. If the property matches, add it to a _list_ of files to process
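
For comparison, the five steps above can be sketched roughly as follows. This is an illustration rather than the code from the linked notebook: the function name `match_files_one_by_one`, the `extensions` parameter, and the call counter are hypothetical, and the sketch assumes that `api.files.get(id=...)` returns a file whose `metadata` behaves like a dictionary.

```python
import os


def match_files_one_by_one(api, project, metadata_to_match, extensions=('.bam',)):
    """Naive strategy: list every file, then fetch details for each candidate.

    Hypothetical helper for illustration. Returns the matched files and an
    approximate count of API calls (the listing is counted as a single call,
    ignoring pagination).
    """
    matched = []
    api_calls = 1  # steps 1-2: list all files and loop through them

    for f in api.files.query(project=project).all():
        # Step 3: cheap local filter on the file extension
        _, ext = os.path.splitext(f.name)
        if ext.lower() not in extensions:
            continue

        # Step 4: one extra API call per feasible file to get its metadata
        details = api.files.get(id=f.id)
        api_calls += 1
        file_metadata = details.metadata or {}

        # Step 5: keep the file only if every requested property matches
        if all(file_metadata.get(key) == value
               for key, value in metadata_to_match.items()):
            matched.append(details)

    return matched, api_calls
```

The per-file detail request in step 4 is what makes the number of calls grow with the number of files.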

This works, but results in up to **_N_+1 API calls**. Here we show how to do this with only **one API call**, which is considerably faster. If you run this code in the next ten minutes, we'll include the **special bonus** of searching for files using a list of names!
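
A name-based search could look something like the sketch below. It assumes that `api.files.query` accepts a `names` argument holding a list of file names; the helper name `find_files_by_name` is hypothetical and used only for illustration.

```python
def find_files_by_name(api, project, file_names):
    """Return the files in `project` whose names appear in `file_names`.

    Hypothetical helper; assumes the `names` keyword of `api.files.query`.
    """
    query = api.files.query(project=project, names=list(file_names))
    # .all() iterates over every page of results, not just the first one
    return list(query.all())
```

Once an `api` object and a project are available, a call such as `find_files_by_name(api, my_project, ['sample_1.bam', 'sample_2.bam'])` (file names invented here) would return the matching file objects, if any exist.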

### Prerequisites
 1. You need your _authentication token_ and the API needs to know about it. See <a href="Setup_API_environment.ipynb">**Setup_API_environment.ipynb**</a> for details.
 2. You understand how to <a href="projects_listAll.ipynb" target="_blank">list</a> projects you are a member of (we will just use that call directly and pick one here).
 3. You have already cloned the Public Project _Cancer Cell Line Encyclopedia (CCLE)_.
 
## Imports
We import the official sevenbridges-python bindings below; they provide the _Api_ class.

In [None]:
import sevenbridges as sbg

## Initialize the object
The `Api` object needs to know your **auth\_token** and the correct API endpoint. Here we assume you are using the credentials file in your home directory. For other options, see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.

In [None]:
# [USER INPUT] specify platform {cgc, sbpla, etc}
prof = 'sbpla'


config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)

## Search by metadata
This is the **optimal** way to query files matching particular metadata properties. Here, we check two (hard-coded) metadata properties, but you can check as many as you'd like. We are going to use _Copy of Cancer Cell Line Encyclopedia (CCLE)_, which is a nice big project with **2555** files.

In [None]:
# [USER INPUT] Set metadata properties and values here; set project name
# Note that you can have multiple apps or projects with the same name. It is best practice to reference entities by ID.
project_name = 'Copy of Cancer Cell Line Encyclopedia (CCLE)'
metadata_to_match = {'experimental_strategy': 'WXS',
                     'platform':'Illumina'}

# Find project
my_project = [p for p in api.projects.query(limit=100).all()
              if p.name == project_name]

if not my_project:  # an empty list is False, a non-empty list is True
    print('Target project ({}) not found, please check spelling'.format(project_name))
    raise KeyboardInterrupt
else:
    my_project = my_project[0]
    my_project = api.projects.get(id=my_project.id)

# How many files do we have?
my_files = api.files.query(project=my_project)
print('There are {} files in your project'.format(my_files.total))

# Query by metadata
my_matched_files = api.files.query(
    project=my_project, limit=100, 
    metadata=metadata_to_match)
 
print("""
There are {} files matching the metadata criteria.
This is {:.1f} percent of the dataset.
""".format(my_matched_files.total,
           100.0 * my_matched_files.total / my_files.total))
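
The query above reports the total number of matches but holds only one page of results (up to `limit` files). To actually work with every matched file, one option is to iterate with `.all()`, as was done for the project list. The helper below is a minimal sketch; the name `collect_matching_files` and the `page_size` parameter are hypothetical.

```python
def collect_matching_files(api, project, metadata_to_match, page_size=100):
    """Return (id, name) pairs for every file matching the metadata.

    Hypothetical helper for illustration; `page_size` is passed as `limit`.
    """
    query = api.files.query(project=project,
                            limit=page_size,
                            metadata=metadata_to_match)
    # .all() walks through every page, so the result is not capped at page_size
    return [(f.id, f.name) for f in query.all()]
```

For example, `collect_matching_files(api, my_project, metadata_to_match)` would give a list that can be passed on to later task-creation steps.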

## Additional Information
Detailed documentation of this REST API request is available [here](http://docs.cancergenomicscloud.org/docs/list-files-in-a-project).