JsonValidationService.readSchema(InputStream is) doesn't correctly resolve http $ref #6

Open
chadlankford opened this issue Feb 27, 2019 · 18 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@chadlankford

chadlankford commented Feb 27, 2019

A valid schema reference is built by taking the value of $ref and applying it relative to the domain of the current schema. This yields a valid URL of the form http://mydomain/myschemafile.json, which resolves. However, I get this error message:

org.leadpony.justify.api.JsonValidatingException: [line:6,column:65] The schema reference "/myschemafile.json"(http://mydomain/myschemafile.json) cannot be resolved.

Since a valid URL is built, I'm curious why it can't be resolved. Could you please support this mechanism for referencing an externally hosted file in $ref? FYI, I also tried putting the full URL in the $ref rather than a domain-relative path, with the same result.

thanks in advance.
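
For reference, the URL-building step described in this report can be reproduced with java.net.URI alone. The following is a minimal sketch, assuming a hypothetical schema location of http://mydomain/schemas/root.json; only the $ref value and the resulting URL come from the error message, and the class and method names are made up for illustration.

```java
import java.net.URI;

final class RefResolution {

    /** Resolves a "$ref" value against the URI of the schema that contains it. */
    static URI resolveRef(String baseUri, String ref) {
        return URI.create(baseUri).resolve(ref);
    }

    public static void main(String[] args) {
        // Hypothetical location of the referencing schema.
        String base = "http://mydomain/schemas/root.json";
        // A domain-relative reference keeps the scheme and host of the base.
        System.out.println(resolveRef(base, "/myschemafile.json"));
        // A full URL is returned unchanged.
        System.out.println(resolveRef(base, "http://mydomain/myschemafile.json"));
    }
}
```

Both calls yield http://mydomain/myschemafile.json. Building the identifier is a separate step from fetching the document behind it, which is the part discussed in the rest of this thread.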

@chadlankford
Author

chadlankford commented Feb 28, 2019

I do realize the draft-07 docs give validator implementations some latitude in resolving $ref, per the following:

"Even though the value of a $ref is a URI, it is not a network locator, only an identifier. This means that the schema doesn’t need to be accessible at that URI, but it may be. It is basically up to the validator implementation how external schema URIs will be handled, but one should not assume the validator will fetch network resources indicated in $ref values."

However, if the resulting URI reference uses the http scheme, I think the validator should resolve it over the network. Besides, it would be super convenient if it worked that way.

@chadlankford
Author

I ended up finding and using the mechanism in your api to implement my own JsonSchemaResolver. That works for me.

@leadpony
Owner

Hello @chadlankford. Thank you for using this small library.
I am happy to hear you are on the right track. Please also see Schema Resolver in Justify Examples.
Thank you.

@chadlankford
Author

One other question: is there a way to output the effective schema after the $refs are resolved?

@leadpony
Owner

leadpony commented Feb 28, 2019

Do you mean you would like to merge the referencing schema and the referenced schemas into one large schema and output it to a file?
No, currently no such way is provided.
For example, how would you obtain an effective schema without "$ref"s from the following one?

{
    "$id": "http://example.org/example.schema.json",
    "type": "object",
    "properties": {
        "foo": {
            "$ref": "http://example.org/example.schema.json"
        }
    }
}

@leadpony leadpony added the question Further information is requested label Feb 28, 2019
@chadlankford
Author

Yeah, I didn't think about the recursive reference situation. I suppose you would just have to decide to stop resolving references for the effective schema once a recursive situation is detected, and maybe inject a $comment into the effective schema to explain why a reference was not resolved.
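
The stop-on-recursion idea can be sketched without any JSON library. This is only an illustration under several assumptions: schemas are modelled as plain java.util.Map and List values, every "$ref" is already an absolute id, and `registry` is a hypothetical lookup table from id to schema. None of this is part of the Justify API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EffectiveSchemaSketch {

    /** Expands the schema registered under id, treating that id as already in progress. */
    static Object expand(String id, Map<String, Map<String, Object>> registry) {
        Deque<String> active = new ArrayDeque<>();
        active.push(id);
        return inline(registry.get(id), registry, active);
    }

    private static Object inline(Object node, Map<String, Map<String, Object>> registry,
                                 Deque<String> active) {
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(inline(item, registry, active));
            }
            return copy;
        }
        if (!(node instanceof Map)) {
            return node; // strings, numbers, booleans and null are copied as they are
        }
        Map<?, ?> schema = (Map<?, ?>) node;
        Object ref = schema.get("$ref");
        if (ref instanceof String) {
            String id = (String) ref;
            Map<String, Object> target = registry.get(id);
            if (target == null || active.contains(id)) {
                // Keep the reference and record why it was not expanded.
                Map<String, Object> kept = new LinkedHashMap<>();
                kept.put("$ref", id);
                kept.put("$comment", target == null
                        ? "reference not expanded: unknown schema"
                        : "reference not expanded: recursive");
                return kept;
            }
            active.push(id); // ids on this stack are the ones currently being expanded
            try {
                return inline(target, registry, active);
            } finally {
                active.pop();
            }
        }
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : schema.entrySet()) {
            copy.put(String.valueOf(entry.getKey()), inline(entry.getValue(), registry, active));
        }
        return copy;
    }
}
```

Applied to the self-referencing example above, the "foo" property would keep its "$ref" and gain a "$comment" instead of being expanded forever.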

@leadpony leadpony added the enhancement New feature or request label Mar 1, 2019
@leadpony
Owner

leadpony commented Mar 1, 2019

Here is a referencing schema.

{
    "$id": "https://example.org/a.schema.json",
    "type": "object",
    "properties": {
        "foo": {
            "$ref": "b.schema.json"
        }
    }
}

And this is the second schema referenced from the first one.

{
    "$id": "https://example.org/b.schema.json",
    "type": "integer",
    "minimum": 0
}

Then you can merge the second schema into the first one using the definitions keyword as follows.

{
    "$id": "https://example.org/a.schema.json",
    "type": "object",
    "properties": {
        "foo": {
            "$ref": "b.schema.json"
        }
    },
    "definitions": {
        "name-whatever-you-like": {
            "$id": "https://example.org/b.schema.json",
            "type": "integer",
            "minimum": 0
        }
    }
}

In practice, why do you need to merge multiple schema files into one?
Your JsonSchemaResolver implementation can easily resolve referenced schemas whose "$id" uses the "http:" or "https:" scheme from your local file system, which is a very common approach when using multiple schemas.
Is it just for debugging purposes?
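
One way to picture the local-file approach mentioned above is a small mapping from a schema "$id" to a file in a known directory. This is a rough sketch only: the directory name and the rule "use the last path segment as the file name" are assumptions made for illustration, and a resolver would still have to read the resulting file with a schema reader.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalSchemaLocator {

    /** Maps an http(s) schema id to a file of the same name inside schemaDir. */
    static Path toLocalPath(Path schemaDir, URI id) {
        String path = id.getPath();
        if (path == null) {
            throw new IllegalArgumentException("URI has no path: " + id);
        }
        // Keep only the last segment, e.g. "/b.schema.json" becomes "b.schema.json".
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        return schemaDir.resolve(fileName);
    }

    public static void main(String[] args) {
        // "schemas" is a hypothetical local directory holding copies of the schema files.
        Path local = toLocalPath(Paths.get("schemas"), URI.create("https://example.org/b.schema.json"));
        System.out.println(local);
    }
}
```

With such a mapping the "$id" stays a pure identifier, and no network access is needed while validating.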

@chadlankford
Author

chadlankford commented Mar 1, 2019

yes, the utility of this is strictly for debugging in the cases of complex object graphs whose schemas make heavy use of $ref. Sometimes it can be helpful.

@chadlankford
Author

chadlankford commented Mar 1, 2019

Another question for you...

I have implemented a custom problem handler which basically outputs the problem parameters map. The output is pretty good, except it would be nice if there were more context to the problem. For example, if the problem says a property foo is required, it would be nice to see at least which object the property is supposed to belong to. The same goes for value validation problems: right now they just indicate the actual and expected value definition maps, but they should also indicate the property, with context.

To me, the best solution is to include the entire object path to the property at the center of the problem, using either a JSON pointer or some sort of JSON path notation, e.g. "foo.bar.key".
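
To make the JSON pointer option concrete, here is a small sketch of how a path of property names could be rendered in RFC 6901 syntax. The class and method names are made up for illustration and do not come from the Justify API.

```java
import java.util.Arrays;
import java.util.List;

final class JsonPointerSketch {

    /** Joins path segments into a JSON Pointer, escaping "~" and "/" as RFC 6901 requires. */
    static String toPointer(List<String> segments) {
        StringBuilder pointer = new StringBuilder();
        for (String segment : segments) {
            // "~" must be escaped before "/" so that the "~" in "~1" is not escaped again.
            pointer.append('/').append(segment.replace("~", "~0").replace("/", "~1"));
        }
        return pointer.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPointer(Arrays.asList("foo", "bar", "key"))); // prints /foo/bar/key
    }
}
```

Array elements fit the same scheme by using the index as a segment, for example "/response/0/phones/0".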

Like the effective schema issue I mentioned, this is also something that is very useful for deep/complex object validations.

Does that make sense?

@leadpony
Owner

leadpony commented Mar 2, 2019

Thank you @chadlankford. Now I've got it. Please see issue #5 because it is related to your second request.

@rconnacher

Hello @chadlankford,
Would you be willing to post your implementation of JsonSchemaResolver? I'm scratching my head on how to resolve remote references.
Thanks!

@leadpony
Owner

leadpony commented May 4, 2019

Hello @rconnacher
Thank you for contacting me.
It is not strictly for remote schemas, but have you seen the code sample?
Another code sample available here also connects to schemas on a local web server via HTTP.
Thank you.

@rconnacher

Thanks, and Hi @leadpony,
In my use case the schemas will be remote, so I reviewed your second example (AbstractConformanceTest)

In my experiment I define a JsonSchemaReaderFactory as in your example, and create a JsonSchemaReader using StringReader over the schema text. (I'm coding in Groovy, so I had to port your example.)
Using 'http://bmeta.berkeley.edu/common/apiResponseSchemaV1.json' as the schemaUri, and

{
  "correlationId": "a56b96cd-d2f7-49e6-9094-8cfdf38850d6",
  "foo": "bar",
  "response": [
    {
      "identifiers": [
        { "type": "campus-uid", "id": "10746" },
        { "type": "calnet-id",  "id": "russellc" }
      ],
      "names": [
        { 
            "type": { "code": "PRI", "description": "Primary" },
            "familyName": "Connacher",
            "givenName": "Russell"
        }
      ],
      "phones": [
        {
            "foo": "bar"
        }
      ],
      "emails": [
        {
            "type": { "code": "BUSN", "description": "Business"  },
            "emailAddress": "russellc@berkeley.edu",
            "primary": true
        }
      ]
    }
  ]
}

as the jsonData I run this:

import javax.json.JsonReader
import org.leadpony.justify.api.JsonSchema
import org.leadpony.justify.api.JsonSchemaReader
import org.leadpony.justify.api.JsonSchemaReaderFactory
import org.leadpony.justify.api.JsonSchemaResolver
import org.leadpony.justify.api.JsonValidationService
import org.leadpony.justify.api.Problem
import org.leadpony.justify.api.ProblemHandler

new JsonResolutionTest().validate(schemaUri, jsonData)

class JsonResolutionTest  {

    private static JsonValidationService service
    private static JsonSchemaReaderFactory schemaReaderFactory

    private static void validate(String schemaUri, String jsonData) {
        try {
            service = JsonValidationService.newInstance()
            schemaReaderFactory = service.createSchemaReaderFactoryBuilder()
                    .withSchemaResolver(JsonResolutionTest::resolveSchema)
                    .build()

            def schemaData = schemaUri.toURL().text
            JsonSchemaReader schemaReader = schemaReaderFactory.createSchemaReader(new StringReader(schemaData))
            JsonSchema schema = schemaReader.read()

            List<Problem> problems = new ArrayList()
            ProblemHandler handler = problems::addAll

            JsonReader jsonReader = service.createReader(new StringReader(jsonData), schema, handler)
            jsonReader.readValue()

            println( problems.toString() )

        } catch (Exception e) {
            println( e.getMessage() )
            e.getStackTrace().each {
                println( it.toString() )
            }
        }

    }

    private static JsonSchemaResolver resolveSchema (URI id) {
        try {
            InputStream stream = id.toURL().openStream()
            JsonSchemaReader reader = schemaReaderFactory.createSchemaReader(stream)
            return reader.read() as JsonSchemaResolver
        } catch (Exception e) {
            println(e.getMessage())
            e.getStackTrace().each {
                println(it.toString())
            }
            return null
        }
    }
}

But when I try to read the schema, I get a casting exception:

BasicSchema$None1_groovyProxy cannot be cast to org.leadpony.justify.api.JsonSchema
com.sun.proxy.$Proxy11.resolveSchema(Unknown Source)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.resolveSchema(AbstractBasicSchemaReader.java:246)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.dereferenceSchema(AbstractBasicSchemaReader.java:230)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.resolveAllReferences(AbstractBasicSchemaReader.java:214)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.postprocess(AbstractBasicSchemaReader.java:188)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.readSchema(AbstractBasicSchemaReader.java:93)
org.leadpony.justify.internal.schema.io.AbstractSchemaReader.read(AbstractSchemaReader.java:49)
org.leadpony.justify.internal.schema.io.AbstractProbeSchemaReader.readSchema(AbstractProbeSchemaReader.java:48)
org.leadpony.justify.internal.schema.io.AbstractSchemaReader.read(AbstractSchemaReader.java:49)
org.leadpony.justify.api.JsonSchemaReader$read.call(Unknown Source)

That last call, "AbstractBasicSchemaReader.resolveSchema" is here:

    private JsonSchema resolveSchema(URI id) {
        JsonSchema schema = (JsonSchema)this.idSchemaMap.get(id);
        if (schema != null) {
            return schema;
        } else {
            Iterator var3 = this.resolvers.iterator();

            do {
                if (!var3.hasNext()) {
                    return null;
                }

                JsonSchemaResolver resolver = (JsonSchemaResolver)var3.next();
                schema = resolver.resolveSchema(id);     // exception is thrown here
            } while(schema == null);

            return schema;
        }
    }

I'm using Groovy 3.0.0, where method references (like ".withSchemaResolver(AbstractConformanceTest::resolveSchema)") are pretty new. Is this possibly the problem? Can you suggest any way of creating a reader that can resolve remote references without that recursive sort of configuration?

Thanks very much!

@chadlankford
Author

chadlankford commented May 5, 2019

@leadpony, @rconnacher

Below, you will find a rough outline of what I did. Basically, the NetworkJsonSchemaResolver just retrieves the remote file and reads it in with a new SchemaReaderFactory, on which it sets itself as the resolver. The getSchemaJson method is just your favorite way to pull the contents of a remote file over HTTP as a String.

Now, I did this quickly because I knew all my references were to network locations. I am basically making the assumption that every reference is a URL. If I were trying to make this more robust, I would implement the NetworkJsonSchemaResolver more generically as a GenericJsonSchemaResolver, which would inspect the URI and use the protocol (scheme), if any, as a hint for how to load it. For example, http or https would indicate a network location. If the protocol is file or classpath, the resolver could handle the reference as a fully qualified local location. And if all else fails, it could treat the reference as this library does by default.
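
The scheme inspection described above might look roughly like the following. It is a sketch of the dispatch decision only: the enum and its names are hypothetical, "classpath" is not a scheme the JDK knows how to open by itself, and actually loading the schema is left out.

```java
import java.net.URI;
import java.util.Locale;

final class SchemeDispatchSketch {

    /** Where a generic resolver would look for a referenced schema. */
    enum Source { NETWORK, LOCAL_FILE, CLASSPATH, DEFAULT }

    /** Uses the URI scheme, if any, as a hint for how the schema should be loaded. */
    static Source classify(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            return Source.DEFAULT; // relative reference: fall back to the library's behavior
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
            case "https":
                return Source.NETWORK;
            case "file":
                return Source.LOCAL_FILE;
            case "classpath":
                return Source.CLASSPATH;
            default:
                return Source.DEFAULT;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(URI.create("https://example.org/b.schema.json"))); // prints NETWORK
    }
}
```

A resolver built on this would return null for DEFAULT, which, judging by the resolveSchema loop quoted earlier in this thread, lets the reader move on to its next resolver.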

Hope this helps.

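import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.leadpony.justify.api.JsonSchema;
import org.leadpony.justify.api.JsonSchemaResolver;
import org.leadpony.justify.api.JsonValidationService;

// Rough outline: assumes a "logger" field and a getSchemaJson(String) helper defined elsewhere.
public class SchemaLoader {
    private final JsonValidationService service = JsonValidationService.newInstance();
    private final JsonSchemaResolver resolver = new NetworkJsonSchemaResolver();

    public JsonSchema loadSchema(String url) {
        try {
            String schemaJson = getSchemaJson(url);
            return service.createSchemaReaderFactoryBuilder()
                    .withSchemaResolver(resolver)
                    .build()
                    .createSchemaReader(
                            // Encode explicitly so the bytes do not depend on the platform charset.
                            new ByteArrayInputStream(schemaJson.getBytes(StandardCharsets.UTF_8))
                    )
                    .read();
        } catch (Exception e) {
            logger.error("", e);
        }
        return null;
    }

    // Treats every reference as a URL and fetches it over the network.
    class NetworkJsonSchemaResolver implements JsonSchemaResolver {
        @Override
        public JsonSchema resolveSchema(URI uri) {
            try {
                String schemaJson = getSchemaJson(uri.toString());
                // The nested reader gets the same resolver, so references inside
                // the fetched schema are followed as well.
                return service.createSchemaReaderFactoryBuilder()
                        .withSchemaResolver(resolver)
                        .build()
                        .createSchemaReader(
                                new ByteArrayInputStream(schemaJson.getBytes(StandardCharsets.UTF_8))
                        )
                        .read();
            } catch (Exception e) {
                logger.error("", e);
            }
            return null;
        }
    }
}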

@leadpony
Owner

leadpony commented May 5, 2019

Hello @rconnacher and @chadlankford
Thank you very much.
@rconnacher, the method JsonResolutionTest.resolveSchema() in the code above seems to return an instance of JsonSchemaResolver instead of JsonSchema. Is this the correct example?

@rconnacher

@chadlankford
Thanks! Your example generalizes and reinforces what I've learned from @leadpony's AbstractConformanceTest.

@rconnacher

rconnacher commented May 7, 2019

@leadpony
You're right. Casting to a JsonSchemaResolver was a mistake on my part.

Mine's not quite working yet (mixing reader methods resulted in random "Unexpected char 0" parsing errors thrown by the glassfish JsonParser when the JsonSchemaReader looks for a next event). I'll report back once I figure it out.

Thanks again!

@leadpony
Owner

leadpony commented May 7, 2019

Hello @rconnacher
A JsonParser in the JSON Processing API that is created with a single InputStream parameter automatically detects the character encoding of the given stream among UTF-8, UTF-16, and UTF-32, with or without a BOM. If the character encoding of your remote schema is none of these, e.g. ISO 8859-1, the parser will fail to work correctly. You can explicitly specify the character encoding of the remote schema as the second parameter of JsonSchemaReaderFactory.createSchemaReader().
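
The effect of decoding with the wrong charset can be seen with the JDK alone. The following sketch uses a made-up one-line schema containing a non-ASCII character and a hypothetical helper named readAll; it does not call the JSON Processing API or Justify.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetSketch {

    /** Decodes the whole stream with an explicitly chosen charset. */
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String schema = "{\"title\": \"caf\u00e9\"}";
        // Simulates a server that sends the schema encoded as ISO 8859-1.
        byte[] latin1 = schema.getBytes(StandardCharsets.ISO_8859_1);
        // Decoding with the right charset restores the text; decoding as UTF-8 does not.
        System.out.println(readAll(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1).equals(schema));
        System.out.println(readAll(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8).equals(schema));
    }
}
```

The first comparison is true and the second is false, because the single byte used for the accented letter in ISO 8859-1 is not a valid UTF-8 sequence. Passing the charset explicitly, as suggested above, avoids relying on detection.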
