I'm probably doing something wrong.... #1577

Open
dbolser opened this issue Sep 27, 2023 · 4 comments
@dbolser

dbolser commented Sep 27, 2023

The JSON Schema is here:
https://www.encodeproject.org/profiles/experiment#raw

I'm calling:
datamodel-codegen --input experiment.json --input-file-type jsonschema --output models/experiment.py --output-model-type pydantic_v2.BaseModel

The model is loaded perfectly by pydantic, but my specific data fails to validate:

$ python validate.py 
Validation failed!
165 validation errors for Experiment
biosample_ontology
  Input should be a valid string [type=string_type, input_value={'status': 'released', 's... 'K-562', 'K-562 cell']}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
analyses.0
  Input should be a valid string [type=string_type, input_value={'documents': ['/document...'ENCODE4 v1.2.1 GRCh38'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
analyses.1
  Input should be a valid string [type=string_type, input_value={'documents': ['/document...ENCODE4 v1.15.0 GRCh38'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
...

Here is the code of my simple validator script:

import json
from pydantic import ValidationError

from models.experiment import Experiment

def validate_json(json_data):
    try:
        Experiment(**json_data)
        print("Validation successful!")
    except ValidationError as e:
        print("Validation failed!")
        print(e)

if __name__ == "__main__":
    # Load JSON data to be validated
    with open('experiment-ENCSR545YBD.json', 'r') as f:
        data_json = json.load(f)
    
    # Validate
    validate_json(data_json)

The experiment JSON can be downloaded here:
https://www.encodeproject.org/experiments/ENCSR545YBD/?format=json

I guess it's to do with the complex data types not being strings, and being represented as $refs, but I don't know what I'm doing 😅

@dbolser
Author

dbolser commented Sep 27, 2023

I have the feeling that I need to cross-reference several other objects somehow...

@koxudaxi
Owner

koxudaxi commented Oct 4, 2023

Thank you for creating the issue.
I have checked the data and the schema.
The schema declares biosample_ontology as a string, but the actual value in your data is an object:

{
    "title": "Biosample ontology",
    "description": "An embeded property for linking to biosample type which describes the ontology of the biosample.",
    "comment": "See biosample_type.json for available identifiers.",
    "type": "string",
    "linkTo": "BiosampleType"
}
 "biosample_ontology": {"status": "released", 
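One possible way to reconcile the two, sketched under assumptions: the schema's `linkTo` properties expect a string identifier, while the downloaded JSON embeds the full linked object. Assuming each embedded object carries its own path in an `@id` key (an assumption about the ENCODE payload, not something confirmed in this thread), the embedded dicts could be collapsed back to that string before validation. The helper names `collapse_embedded` and `collapse_links` are hypothetical, and the sample values are made up.

```python
from typing import Any


def collapse_embedded(value: Any) -> Any:
    """Replace any dict that carries an "@id" with that "@id" string.

    ASSUMPTION: embedded (linked) objects expose their identifier under "@id".
    """
    if isinstance(value, dict):
        if "@id" in value:
            return value["@id"]
        # A plain nested object without "@id": keep it, but check its children.
        return {key: collapse_embedded(child) for key, child in value.items()}
    if isinstance(value, list):
        return [collapse_embedded(item) for item in value]
    return value


def collapse_links(record: dict) -> dict:
    """Collapse embedded objects below the top level; keep the record itself."""
    return {key: collapse_embedded(child) for key, child in record.items()}


# Made-up sample shaped like the failing fields in the error output.
sample = {
    "@id": "/experiments/EXAMPLE/",
    "biosample_ontology": {"@id": "/biosample-types/example/", "status": "released"},
    "analyses": [{"@id": "/analyses/example-1/", "documents": []}],
    "accession": "EXAMPLE",
}
collapsed = collapse_links(sample)
print(collapsed["biosample_ontology"])
print(collapsed["analyses"])
```

With something like this, `Experiment(**collapse_links(data_json))` would be the call to try in the validator script. Properties that are genuinely objects in the schema but also happen to carry an `@id` would be collapsed too, so this is a starting point rather than a complete fix.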

@dbolser
Author

dbolser commented Oct 4, 2023 via email

@dbolser
Author

dbolser commented Oct 5, 2023

BTW, do I need to worry about these warnings?

$ datamodel-codegen --input experiment.json --input-file-type jsonschema --output models/experiment.py --output-model-type pydantic_v2.BaseModel
/me/.venv/lib/python3.10/site-packages/datamodel_code_generator/parser/jsonschema.py:334: UserWarning: format of 'accession' not understood for 'string' - using default
  warn(f'format of {format__!r} not understood for {type_!r} - using default' '')
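The warning text itself says the generator falls back to the default for 'string', which suggests the field is simply typed as a plain string with no format-specific checks. To see how many such non-standard formats a schema contains, a small walker like the one below might help. This is only a sketch: `STANDARD_FORMATS` is an illustrative subset chosen here, not the list datamodel-code-generator actually uses, and `iter_formats` and `custom_formats` are hypothetical helper names.

```python
from typing import Any, Iterator

# ASSUMPTION: illustrative subset of common JSON Schema formats,
# not the generator's real list of understood formats.
STANDARD_FORMATS = {
    "date-time", "date", "time", "email", "hostname",
    "ipv4", "ipv6", "uri", "uuid", "regex",
}


def iter_formats(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, format) for every string-valued "format" key in the schema."""
    if isinstance(node, dict):
        fmt = node.get("format")
        if isinstance(fmt, str):
            yield (path or "/", fmt)
        for key, child in node.items():
            yield from iter_formats(child, f"{path}/{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_formats(child, f"{path}/{index}")


def custom_formats(schema: dict) -> list[str]:
    """Return the sorted set of formats not in STANDARD_FORMATS."""
    return sorted({fmt for _, fmt in iter_formats(schema) if fmt not in STANDARD_FORMATS})


# Made-up fragment shaped like the schema property named in the warning.
schema_fragment = {
    "properties": {
        "accession": {"type": "string", "format": "accession"},
        "date_created": {"type": "string", "format": "date-time"},
    }
}
print(custom_formats(schema_fragment))
```

Running `custom_formats(json.load(open("experiment.json")))` on the real schema would list every format the generator is likely to warn about; if stricter checking of those fields is wanted, it would have to be added separately.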

2 participants