[BREAKING] KeyError: '_samples' Can't validate schema #33

G-kodes · 2021-10-29T10:33:28Z

I am using Eido as part of snakemake to enforce the PEP metadata format declaration for my work. I have written the required schemas by following the tutorial; however, when I try to run validation, I get the following error:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.9/site-packages/snakemake/__init__.py", line 593, in snakemake
    workflow.include(
  File "/opt/homebrew/lib/python3.9/site-packages/snakemake/workflow.py", line 1182, in include
    exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
  File "/Users/g-kodes/Documents/Pharmacogenetic-Analysis-Pipeline/workflow/Snakefile", line 31, in <module>
    # DEFINE CONTEXT-VARIABLES:
  File "/opt/homebrew/lib/python3.9/site-packages/snakemake/workflow.py", line 1267, in pepschema
    eido.validate_project(project=pep, schema=schema, exclude_case=True)
  File "/opt/homebrew/lib/python3.9/site-packages/eido/validation.py", line 45, in validate_project
    _validate_object(project_dict, preprocess_schema(schema_dict), exclude_case)
  File "/opt/homebrew/lib/python3.9/site-packages/eido/schema.py", line 32, in preprocess_schema
    "items" in schema_dict[PROP_KEY]["_samples"]
KeyError: '_samples'

I have tried importing my PEP using peppy, the indicated Python package, and it imports fine there. When I try to validate manually using the eido CLI, I get an error with the same cause, so I don't think this is a snakemake or peppy issue:

Traceback (most recent call last):
  File "/opt/homebrew/bin/eido", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.9/site-packages/eido/cli.py", line 89, in main
    validate_project(p, args.schema, args.exclude_case)
  File "/opt/homebrew/lib/python3.9/site-packages/eido/validation.py", line 45, in validate_project
    _validate_object(project_dict, preprocess_schema(schema_dict), exclude_case)
  File "/opt/homebrew/lib/python3.9/site-packages/eido/schema.py", line 32, in preprocess_schema
    "items" in schema_dict[PROP_KEY]["_samples"]
KeyError: '_samples'


stolarczyk · 2021-10-29T13:04:45Z

Could you post your PEP and the schema here? Also, what versions of peppy and eido are you using?

G-kodes · 2021-11-16T10:41:18Z

Peppy: v0.31.1
Eido: v0.1.5

PEP:

description: "Schema for PEP sample declaration information for this workflow"
pep_version: "2.1.0"
sample_table: "samples.csv"
subsample_table: "partitions.csv"

PEP Schema:

name: "My Pipeline"
description: My Description
imports:
  - http://schema.databio.org/pep/2.0.0.yaml
properties:
  sample_name:
    type: string
    description: Identifier
  dataset:
    type: string
    description: Dataset Name
  file:
    type: string
    description: Filename
  reference_genome:
    type: string
    description: reference ID
required:
  - sample_name
  - dataset
  - file
  - reference_genome
files:
  - file

nsheff · 2021-11-16T13:30:11Z

I can confirm this also happens on eido v0.1.6-dev

nsheff · 2021-11-16T13:38:51Z

Hi @G-kodes, your schema doesn't reflect the structure of the PEP correctly. A PEP has both config and samples sections, and per-sample attributes belong under samples. Update your schema to this:

name: "My Pipeline"
description: My Description
imports:
  - http://schema.databio.org/pep/2.0.0.yaml
properties:
  samples:
    type: array
    items:
      type: object
      properties:
        sample_name:
          type: string
          description: Identifier
        dataset:
          type: string
          description: Dataset Name
        file:
          type: string
          description: Filename
        reference_genome:
          type: string
          description: reference ID
      required:
        - sample_name
        - dataset
        - file
        - reference_genome
      files:
        - file
required: 
  - samples

That should work. You can find more example schemas here: https://github.com/databio/schema.databio.org/tree/master/pipelines
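The nesting matters because the schema applies checks at two levels. `required: [samples]` applies to the project as a whole. The `required` list under `items` applies to each individual sample. The sketch below imitates that two-level check in plain Python on an already-parsed schema. It is a simplified illustration only, not eido's implementation (eido hands the schema to a JSON Schema validator). The helper name `missing_sample_attributes` is hypothetical.

```python
# Simplified illustration of how the corrected schema is applied.
# Hypothetical helper; eido itself delegates to a JSON Schema validator.

SCHEMA = {
    "properties": {
        "samples": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["sample_name", "dataset", "file", "reference_genome"],
            },
        }
    },
    "required": ["samples"],
}


def missing_sample_attributes(project: dict, schema: dict) -> dict:
    """Return {sample_index: [missing attribute names]} for a parsed project."""
    # Project level: the "samples" section itself must be present.
    for key in schema.get("required", []):
        if key not in project:
            raise ValueError(f"project is missing required section: {key!r}")
    # Sample level: each item must carry every attribute listed in items.required.
    item_required = schema["properties"]["samples"]["items"].get("required", [])
    problems = {}
    for i, sample in enumerate(project["samples"]):
        missing = [attr for attr in item_required if attr not in sample]
        if missing:
            problems[i] = missing
    return problems


project = {
    "samples": [
        {"sample_name": "test1", "dataset": "test", "file": "test", "reference_genome": "test"},
        {"sample_name": "test2", "dataset": "test"},
    ]
}
print(missing_sample_attributes(project, SCHEMA))  # {1: ['file', 'reference_genome']}
```

With the original flat schema there is no `samples` entry to look inside, so the per-sample attributes are never associated with the samples at all.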

nsheff · 2021-11-16T14:55:50Z

Your schema wasn't what you intended, but eido also wasn't interpreting it correctly: as written, the schema should have required those properties on the main (project-level) item. So the error should have been different from the one shown, and a more accurate message might have pointed you toward the fix for your schema.

I've fixed the error message, so it will be more informative if a future user tries the same thing. The fix will show up in the next release (0.1.6).
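For readers who land here later, the traceback points at a lookup of `schema_dict[PROP_KEY]["_samples"]`. The preprocessing step appears to move the schema's `samples` property to an internal `_samples` key before inspecting it. The sketch below is a simplified, hypothetical reconstruction of that step, not eido's source, and the function names are illustrative. It shows why a schema with no `samples` property raises `KeyError: '_samples'`, and it contrasts that with a check that fails with a clearer message.

```python
# Simplified, hypothetical reconstruction of the schema preprocessing step;
# not eido's actual source.
PROP_KEY = "properties"


def preprocess_naive(schema: dict) -> bool:
    """Mimics the failing lookup: assumes a 'samples' property exists."""
    props = schema[PROP_KEY]
    if "samples" in props:
        # Move the user-facing key to the internal name used later on.
        props["_samples"] = props.pop("samples")
    # Raises KeyError('_samples') when the schema never defined 'samples'.
    return "items" in props["_samples"]


def preprocess_checked(schema: dict) -> bool:
    """Same lookup, but fails with a message that points at the real problem."""
    props = schema.get(PROP_KEY, {})
    samples = props.get("samples", props.get("_samples"))
    if not isinstance(samples, dict):
        raise ValueError(
            "Schema has no usable 'samples' property; sample attributes must be "
            "nested under properties -> samples -> items -> properties."
        )
    return "items" in samples


# A "flat" schema like the one originally posted: attributes sit at the top level.
flat = {"properties": {"sample_name": {"type": "string"}}, "required": ["sample_name"]}

try:
    preprocess_naive(flat)
except KeyError as err:
    print("KeyError:", err)  # KeyError: '_samples'

try:
    preprocess_checked(flat)
except ValueError as err:
    print("ValueError:", err)
```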

G-kodes · 2021-11-19T07:46:30Z

@nsheff thank you very much for the help! I guess I can chalk this up to not understanding the documentation properly, so I'm feeling a bit dumb on this side XD.

I have updated my schema as you suggested, but unfortunately I am still getting an error:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.9/site-packages/snakemake/__init__.py", line 593, in snakemake
    workflow.include(
  File "/opt/homebrew/lib/python3.9/site-packages/snakemake/workflow.py", line 1182, in include
    exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
  File "/Users/g-kodes/Documents/Pharmacogenetic-Analysis-Pipeline/workflow/Snakefile", line 37, in <module>
  File "/opt/homebrew/lib/python3.9/site-packages/snakemake/workflow.py", line 1267, in pepschema
    eido.validate_project(project=pep, schema=schema, exclude_case=True)
  File "/opt/homebrew/lib/python3.9/site-packages/eido/validation.py", line 45, in validate_project
    _validate_object(project_dict, preprocess_schema(schema_dict), exclude_case)
  File "/opt/homebrew/lib/python3.9/site-packages/eido/schema.py", line 32, in preprocess_schema
    "items" in schema_dict[PROP_KEY]["_samples"]
TypeError: argument of type 'NoneType' is not iterable

Am I just missing something basic here?

nsheff · 2021-11-19T12:43:28Z

Hmm. I can't reproduce this with either eido 0.1.5 or 0.1.6. Could you provide your complete PEP (you posted the YAML, but I also need the CSVs)? ~~This error appears to be indicating that you don't have any samples in your PEP, but that's confusing to me.~~ It looks like the samples array in your schema isn't being recognized.

Are you sure you copied the schema correctly?

nsheff · 2021-11-19T12:47:29Z

Also, you might try the latest dev version (0.1.6), which is almost released:

pip install https://github.com/pepkit/eido/archive/refs/heads/dev.zip

You can check the version with eido --version.
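You can also check versions from Python, for example inside a Snakefile. The standard library's `importlib.metadata` (Python 3.8+) reports installed distribution versions. The minimal sketch below uses a hypothetical helper name, `installed_version`, and assumes the distributions are named `eido`, `peppy` and `snakemake` as on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("eido", "peppy", "snakemake"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

This reads the metadata of the environment running the code. If Snakemake runs under a different interpreter than your shell, the results can differ from `eido --version`.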

G-kodes · 2021-11-23T07:12:11Z

Hi! My apologies, this is turning into a support ticket!

My eido --version output is: eido 0.1.5

and my complete PEP file is:

description: "Schema for PEP sample declaration information for this workflow"
pep_version: "2.1.0"
sample_table: "samples.csv"

I have tried the newer Eido version you indicated above, but it appears to depend on an unreleased version of peppy (>=0.32.0). I can only find versions up to 0.31.2 online?

nsheff · 2021-11-23T12:37:49Z

If samples.csv doesn't exist, you should be getting this exception:

peppy.exceptions.SampleTableFileException: Could not read table: samples.csv. Caught exception: FileNotFoundError(2, 'No such file or directory')

If I do have a CSV file with at least 1 sample, I can validate that PEP with eido 0.1.5:

cat samples.csv

sample_name,attr,dataset,file,reference_genome
test1,test2,test,test,test

eido --version
> eido 0.1.5
eido validate pep.yaml -s pep_schema.yaml 
> Validation successful
cat pep.yaml

Result:

description: "Schema for PEP sample declaration information for this workflow"
pep_version: "2.1.0"
sample_table: "samples.csv"

And for the schema:

cat pep_schema.yaml

Result:

name: "My Pipeline"
description: My Description
imports:
  - http://schema.databio.org/pep/2.0.0.yaml
properties:
  samples:
    type: array
    items:
      type: object
      properties:
        sample_name:
          type: string
          description: Identifier
        dataset:
          type: string
          description: Dataset Name
        file:
          type: string
          description: Filename
        reference_genome:
          type: string
          description: reference ID
      required:
        - sample_name
        - dataset
        - file
        - reference_genome
      files:
        - file
required:
  - samples
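The table-loading part of this reproduction can be sketched in plain Python. The sketch reads `samples.csv`, fails clearly if the file is missing or has no rows, and checks the schema's required columns. It uses only the standard `csv` module and is a hypothetical stand-in for what peppy and eido do, not their code. The helper name `load_sample_table` is illustrative.

```python
import csv
import tempfile
from pathlib import Path

# Per-sample attributes required by the schema above.
REQUIRED = ["sample_name", "dataset", "file", "reference_genome"]


def load_sample_table(path: Path) -> list:
    """Read a sample table into a list of dicts; raise if missing or empty."""
    if not path.is_file():
        raise FileNotFoundError(f"Could not read table: {path}")
    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"{path} contains no samples")
    return rows


# Recreate the samples.csv shown above in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "samples.csv"
    table.write_text(
        "sample_name,attr,dataset,file,reference_genome\n"
        "test1,test2,test,test,test\n"
    )
    samples = load_sample_table(table)
    missing = [col for col in REQUIRED if col not in samples[0]]
    print(len(samples), missing)  # 1 []
```

Extra columns such as `attr` are simply carried along. Only the columns listed as required are checked.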

Really, eido 0.1.5 should work for you. But if you want to upgrade anyway, peppy 0.32.0 is still pending release, sorry about that. You can install peppy 0.32.0 the same way:

pip install https://github.com/pepkit/peppy/archive/refs/heads/dev.zip

G-kodes · 2022-01-11T06:25:59Z

OK so... it's a new year, I have finally taken enough of a break and reset to fix this issue once and for all, and it was a stupid mistake XD.

My issue was indentation (or the lack thereof) in my schema file. I had messed it up, so I had:

name: "My Pipeline"
description: My Description
imports:
  - http://schema.databio.org/pep/2.1.0.yaml
properties:
  samples:
  type: array
  items:
    ...<insert rest here>

when I needed to add indents like so:

name: "My Pipeline"
description: My Description
imports:
  - http://schema.databio.org/pep/2.1.0.yaml
properties:
  samples:
    type: array
    items:
      ...<insert rest here>
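The two layouts parse to different structures. The sketch writes them out as the Python dicts a YAML loader such as PyYAML would be expected to produce; this is assumed here, since parsing YAML needs a third-party package. In the mis-indented version, `samples:` has no nested value, so it becomes null, and `type` and `items` become its siblings. Checking that null value for `items` then raises the `TypeError` seen in the second traceback. The helper `samples_schema_problem` is hypothetical.

```python
# Parsed form of the mis-indented schema: "samples:" has no nested value.
broken = {
    "properties": {
        "samples": None,
        "type": "array",
        "items": {"type": "object"},
    }
}

# Parsed form of the correctly indented schema.
fixed = {
    "properties": {
        "samples": {
            "type": "array",
            "items": {"type": "object"},
        }
    }
}


def samples_schema_problem(schema: dict):
    """Return a description of what is wrong with properties.samples, or None."""
    samples = schema.get("properties", {}).get("samples")
    if samples is None:
        return "properties.samples is empty or missing; check its indentation"
    if not isinstance(samples, dict) or "items" not in samples:
        return "properties.samples should be an array schema with an 'items' entry"
    return None


# The membership test from the traceback, applied to the broken structure.
try:
    has_items = "items" in broken["properties"]["samples"]
except TypeError as err:
    print("TypeError:", err)

print(samples_schema_problem(broken))
print(samples_schema_problem(fixed))  # None
```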

The moral of the story, if you can't fix it and you're getting frustrated, take a break 😂

nsheff added the bug (Something isn't working) label Nov 16, 2021

nsheff added the likely-solved label Nov 16, 2021

G-kodes closed this as completed Jan 11, 2022

G-kodes mentioned this issue Jan 11, 2022

[FEATURE] | Ratify a CI implementation Tuks-ICMM/Pharmacogenetic-Analysis-Pipeline#19

Open
