# `eido` command line usage

To use the command line application, pass two required paths as arguments to the `eido` command:

- a path to a project configuration file (`-p`/`--pep` option)
- a path to a YAML-formatted schema (`-s`/`--schema` option)

For this tutorial, let's grab a PEP from a public example repository that describes a few PRO-seq test samples:


In [12]:
git clone git@github.com:databio/ppqc.git

Cloning into 'ppqc'...
remote: Enumerating objects: 78, done.[K
remote: Counting objects: 100% (78/78), done.[K
remote: Compressing objects: 100% (56/56), done.[K
remote: Total 78 (delta 38), reused 61 (delta 22), pack-reused 0[K
Receiving objects: 100% (78/78), 36.38 KiB | 0 bytes/s, done.
Resolving deltas: 100% (38/38), done.
Checking connectivity... done.


In [4]:
cd ppqc

First, let's use `eido` to validate this project against the generic PEP schema. The only inputs needed are the path to the project config file and the schema.

In [7]:
eido -p ppqc_config.yaml -s http://schema.databio.org/PEP/pep.yaml

Reading sample annotations sheet: '/home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_annotation_revised.csv'
Storing sample table from file '/home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_annotation_revised.csv'
Reading subannotations: /home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_subannotation.csv
Validation successful

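
The schema above is passed as a URL. Since `-s` is described as taking a path to a YAML schema, a local copy can be used as well, for example when working offline. The following is a minimal sketch, assuming `curl` and `eido` are both on the `PATH`; `validate_with_local_schema` is a hypothetical helper name, not part of `eido`.

```shell
# Sketch: download a schema once, then validate against the local copy.
# Assumes `curl` and `eido` are installed; the function name is hypothetical.
validate_with_local_schema() {
    local config="$1"                  # path to the PEP config file
    local schema_url="$2"              # URL of the YAML schema
    local schema_file="${3:-schema.yaml}"  # where to store the local copy

    # -f: fail on HTTP errors, -sS: quiet but still report errors,
    # -L: follow redirects, -o: write the body to the given file
    curl -fsSL "$schema_url" -o "$schema_file" || return 1

    # Validate the project against the downloaded schema
    eido -p "$config" -s "$schema_file"
}

# Example call (not executed here):
#   validate_with_local_schema ppqc_config.yaml \
#       http://schema.databio.org/PEP/pep.yaml pep.yaml
```

Keeping the download and the validation in one function means a failed download stops the run before `eido` is called on a missing or partial file.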

Any PEP should validate against that schema, which describes the generic PEP format. We can go one step further and validate it against the PEPPRO schema, which describes PRO-seq projects specifically for this pipeline:

In [8]:
eido -p ppqc_config.yaml -s http://schema.databio.org/pipelines/ProseqPEP.yaml

Reading sample annotations sheet: '/home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_annotation_revised.csv'
Storing sample table from file '/home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_annotation_revised.csv'
Reading subannotations: /home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_subannotation.csv
Validation successful


This project would *not* validate against a different pipeline's schema:

In [10]:
eido -p ppqc_config.yaml -s http://schema.databio.org/pipelines/bedmaker.yaml

Reading sample annotations sheet: '/home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_annotation_revised.csv'
Storing sample table from file '/home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_annotation_revised.csv'
Reading subannotations: /home/nsheff/code/eido/docs_jupyter/ppqc/ppqc_subannotation.csv
Traceback (most recent call last):
  File "/home/nsheff/.local/bin/eido", line 8, in <module>
    sys.exit(main())
  File "/home/nsheff/.local/lib/python3.7/site-packages/eido/eido.py", line 189, in main
    validate_project(p, args.schema, args.exclude_case)
  File "/home/nsheff/.local/lib/python3.7/site-packages/eido/eido.py", line 118, in validate_project
    _validate_object(project_dict, _preprocess_schema(schema_dict), exclude_case)
  File "/home/nsheff/.local/lib/python3.7/site-packages/eido/eido.py", line 103, in _validate_object
    raise e
  File "/home/nsheff/.local/lib/python3.7/site-packages/eido/eido.py", line 100, in _validate_object
    jsonschema.validate(object, schema)
  File "

     'Sample_series_id': 'GSE63872',
     'Sample_source_name_ch1': 'HelaS3_cells',
     'Sample_status': 'Public on Jan 16 2015',
     'Sample_submission_date': 'Dec 04 2014',
     'Sample_supplementary_file_1': 'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1558nnn/GSM1558746/suppl/GSM1558746_GRO-seq_signal_minus.bedGraph.gz',
     'Sample_supplementary_file_2': 'ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1558nnn/GSM1558746/suppl/GSM1558746_GRO-seq_signal_plus.bedGraph.gz',
     'Sample_taxid_ch1': '9606',
     'Sample_title': 'HelaS3_GRO-seq',
     'Sample_type': 'SRA',
     'cell_type': 'HelaS3',
     'derived_cols_done': [],
     'genome': 'hg38',
     'gsm_id': 'GSM1558746',
     'merged': True,
     'merged_cols': {'SRR': 'SRR1693611 SRR1693612',
                     'SRX': 'SRX796411 SRX796411',
                     'read1': '/project/shefflab/data/sra_bam//SRR1693611.bam '
                              '/project/shefflab/data/sra_bam//SRR1693612.bam',
                     'read1_ke

: 1
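
The three runs above can be combined into one loop that reports which schemas a project satisfies. This is a minimal sketch, assuming `eido` exits with a non-zero status when validation fails (as the uncaught exception above implies) and with zero on success. `validate_against_schemas` is a hypothetical helper name. The `-e`/`--exclude-case` flag, listed in the help output, is used to keep error messages short.

```shell
# Sketch: validate one PEP against several schemas and summarize the results.
# Assumes `eido` returns a non-zero exit status on validation failure.
validate_against_schemas() {
    local config="$1"   # path to the PEP config file
    shift               # remaining arguments are schema paths or URLs
    local failed=0 schema

    for schema in "$@"; do
        # -e keeps the error report to the human-readable message
        if eido -e -p "$config" -s "$schema"; then
            echo "PASS  $schema"
        else
            echo "FAIL  $schema"
            failed=1
        fi
    done

    # Non-zero if at least one schema did not validate
    return "$failed"
}

# Example call (not executed here):
#   validate_against_schemas ppqc_config.yaml \
#       http://schema.databio.org/PEP/pep.yaml \
#       http://schema.databio.org/pipelines/ProseqPEP.yaml \
#       http://schema.databio.org/pipelines/bedmaker.yaml
```

Because the function itself returns a non-zero status when any schema fails, it can be used directly in a script or CI step to stop on an invalid project.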

Optionally, to validate only the config part of the PEP or only a specific sample, use the `-c`/`--just-config` or `-n`/`--sample-name` argument, respectively. The two options are mutually exclusive. Please refer to the help for more details:
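
As an illustration of these two options, here is a minimal sketch of two thin wrappers. The function names are hypothetical, and the sketch assumes `eido` is on the `PATH`. The usage line in the help (`[-n SAMPLE_NAME | -c]`) indicates that only one of the two flags can be given per call, so each wrapper passes exactly one.

```shell
# Sketch: validate only the project-level config, leaving samples out.
validate_config_only() {
    local config="$1" schema="$2"
    eido -p "$config" -s "$schema" -c
}

# Sketch: validate a single sample, selected by name or index.
validate_one_sample() {
    local config="$1" schema="$2" sample="$3"
    eido -p "$config" -s "$schema" -n "$sample"
}

# Example calls (not executed here); replace SAMPLE with a sample name
# or index from your own project:
#   validate_config_only ppqc_config.yaml http://schema.databio.org/PEP/pep.yaml
#   validate_one_sample  ppqc_config.yaml http://schema.databio.org/PEP/pep.yaml SAMPLE
```

The complete list of options is printed by the help command.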

In [11]:
eido -h

version: 0.0.6
usage: eido [-h] [--version] -p PEP -s SCHEMA [-e] [-n SAMPLE_NAME | -c]
            [--silent] [--verbosity V] [--logdev]

eido - validate project metadata against a schema

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -p PEP, --pep PEP     PEP configuration file in yaml format.
  -s SCHEMA, --schema SCHEMA
                        PEP schema file in yaml format.
  -e, --exclude-case    Whether to exclude the validation case from an error.
                        Only the human readable message explaining the error
                        will be raised. Useful when validating large PEPs.
  -n SAMPLE_NAME, --sample-name SAMPLE_NAME
                        Name or index of the sample to validate. Only this
                        sample will be validated.
  -c, --just-config     Whether samples should be excluded from the
                        validation.
  --silent          