# How do I get a list of all files (including subfolders) that _match_ a particular metadata _property_?

### Overview

This recipe shows how to walk through a project's folder structure and find files of interest. Here we focus on listing all files within a single project that match a particular metadata property, but the same approach can be applied to other attributes, such as file name. One _use-case_ which will benefit greatly from this is:

 * I have _hundreds_ of files in my project
 * I want to run one or more tasks which only use _type X_ files
 * I want to query all _type X_ files with one call
 * I want to traverse the subdirectories

The basic strategy is:

 1. List all the files and folders (n = _N_)
 2. Loop through the list 
 3. If the item is a file, get its metadata (as [here](files_detailOne.ipynb)); if the property matches, add the file to a _list_ of files to process
 4. Otherwise, if the item is a folder, repeat from step 1 for that folder
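
The steps above can be sketched without touching the API. This is only a minimal illustration: it assumes a hypothetical in-memory tree of dicts (keys `name`, `metadata`, `children`) in place of real project items, and the helper names `find_matching` and `has_metadata` are invented here, not part of the sevenbridges library.

```python
def find_matching(items, matches):
    """Recursively collect files for which matches(item) is true.

    items   -- list of dicts; a folder has a 'children' list, a file does not
               (hypothetical structure, for illustration only)
    matches -- predicate applied to each file
    """
    found = []
    for item in items:                  # step 2: loop through the list
        if 'children' in item:          # step 4: folder, start over inside it
            found.extend(find_matching(item['children'], matches))
        elif matches(item):             # step 3: file, keep it if it matches
            found.append(item)
    return found


def has_metadata(wanted):
    """Build a predicate that is true when every wanted key/value is present."""
    return lambda item: all(
        item.get('metadata', {}).get(k) == v for k, v in wanted.items())


# Hypothetical project contents (step 1 would normally come from the API)
project = [
    {'name': 'a.bam', 'metadata': {'experimental_strategy': 'WXS'}},
    {'name': 'b.bam', 'metadata': {'experimental_strategy': 'RNA-Seq'}},
    {'name': 'sub', 'children': [
        {'name': 'c.bam', 'metadata': {'experimental_strategy': 'WXS'}},
    ]},
]

wxs_files = find_matching(project, has_metadata({'experimental_strategy': 'WXS'}))
wxs_names = [f['name'] for f in wxs_files]
```

Because the match is expressed as a predicate, the same traversal works for other attributes; for example, `lambda f: f['name'].endswith('.bam')` would select by file name instead.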


### Prerequisites
 1. You need your _authentication token_ and the API needs to know about it. See <a href="Setup_API_environment.ipynb">**Setup_API_environment.ipynb**</a> for details.
 2. You understand how to <a href="projects_listAll.ipynb" target="_blank">list</a> projects you are a member of (we will just use that call directly and pick one here).
 3. You have already cloned the Public Project _Cancer Cell Line Encyclopedia (CCLE)_.
 
## Imports
We import the _Api_ class from the official sevenbridges-python bindings below.

In [None]:
import sevenbridges as sbg

## Initialize the object
The `Api` object needs to know your **auth\_token** and the correct path. Here we assume you are using the credentials file in your home directory. For other options see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.

In [None]:
# [USER INPUT] specify credentials file profile {cgc, sbg, default}
prof = 'default'

config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)

## Search by metadata
The function below lets the API filter by metadata in one location at a time (the project root, or a folder), then lists everything in that location and repeats the search in each subfolder it finds.


In [None]:
# [USER INPUT] Set metadata properties and values here; set project ID
project_id = 'my-name/my-project'
metadata_to_match = {'experimental_strategy': 'WXS',
                     'platform':'Illumina'}


def find_files_by_metadata(project_id, metadata_to_match, parent=None):
    """
    Recursively collect all files whose metadata matches metadata_to_match.

    If parent is set, it is the id of the folder to search in;
    otherwise the search starts at the project root.
    """
    # A query is scoped either to a parent folder or to the whole project root
    scope = {'parent': parent} if parent else {'project': project_id}

    # Files directly in this location that match the metadata
    matched_files = list(api.files.query(
        limit=100, metadata=metadata_to_match, **scope).all())

    # Look at everything in this location and recurse into each subfolder
    for item in api.files.query(limit=100, **scope).all():
        if item.is_folder():
            matched_files.extend(
                find_files_by_metadata(project_id, metadata_to_match, item.id))
    return matched_files
            
    
matched_files = find_files_by_metadata(project_id, metadata_to_match)
print("Total matched files {}".format(len(matched_files)))
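
To inspect the result rather than just count it, a small helper along these lines can turn the matches into readable lines. It is a sketch, not part of the library: `describe_matches` is a hypothetical name, and the code only assumes each item exposes `name` and `id` attributes, so stand-in objects are used here in place of real API results.

```python
from types import SimpleNamespace


def describe_matches(files):
    """Return one 'name (id)' line per file, sorted by file name."""
    return ["{} ({})".format(f.name, f.id)
            for f in sorted(files, key=lambda f: f.name)]


# Stand-in objects for illustration; real results would come from the query above
example = [SimpleNamespace(name='b.bam', id='2'),
           SimpleNamespace(name='a.bam', id='1')]
lines = describe_matches(example)
```

With the list built earlier, `print("\n".join(describe_matches(matched_files)))` would print one line per matched file.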

## Additional Information
Detailed documentation of the underlying REST API request is available [here](http://docs.cancergenomicscloud.org/docs/list-files-in-a-project).