Expose API for building JSON schemas from Tapir Schemas #2873

Merged
merged 12 commits on May 18, 2023

@kciesielski
Member

kciesielski commented May 8, 2023

Solves #2838

This PR exposes a method that converts a Tapir schema to an ASchema (JSON schema). It uses a mechanism similar to the internals of SchemasForEndpoints.

Proposal for the new API:

import sttp.apispec.circe._
import sttp.apispec.{ReferenceOr, Schema => ASchema, SchemaType => ASchemaType}
import sttp.tapir._
import sttp.tapir.docs.apispec.schema._
import sttp.tapir.generic.auto._

object Childhood {
  case class Child(age: Int, height: Option[Int])
}
case class Parent(innerChildField: Child, childDetails: Childhood.Child)
case class Child(childName: String) // to illustrate unique name generation
val tSchema = implicitly[Schema[Parent]]

val jsonSchema: ReferenceOr[ASchema] = TapirSchemaToJsonSchema(
  tSchema,
  markOptionsAsNullable = true,
  metaSchema = MetaSchemaDraft04, // default
  schemaName = defaultSchemaName // default
)

To encode the schema:

import io.circe.Printer
import io.circe.syntax._
import sttp.apispec.circe._

// ReferenceOr[ASchema] is an Either, so fall back to a null-typed schema if a reference is returned
val schemaAsJson = jsonSchema.getOrElse(ASchema(ASchemaType.Null)).asJson
println(Printer.spaces2.print(schemaAsJson.deepDropNullValues))

which gives:

{
  "$schema" : "https://json-schema.org/draft-04/schema#",
  "required" : [
    "innerChildField",
    "childDetails"
  ],
  "type" : "object",
  "properties" : {
    "innerChildField" : {
      "$ref" : "#/$defs/Child"
    },
    "childDetails" : {
      "$ref" : "#/$defs/Child1"
    }
  },
  "$defs" : {
    "Child" : {
      "required" : [
        "childName"
      ],
      "type" : "object",
      "properties" : {
        "childName" : {
          "type" : "string"
        }
      }
    },
    "Child1" : {
      "required" : [
        "age"
      ],
      "type" : "object",
      "properties" : {
        "age" : {
          "type" : "integer",
          "format" : "int32"
        },
        "height" : {
          "type" : [
            "integer",
            "null"
          ],
          "format" : "int32"
        }
      }
    }
  }
}
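
The Child and Child1 keys in the output above result from two case classes sharing the same short name. As a rough illustration of the idea only (this is not Tapir's actual implementation, and UniqueNames and uniqueKeys are hypothetical names), suffix-based disambiguation could be sketched like this:

```scala
// Hypothetical sketch: assign unique schema keys to types whose short names collide.
// This is NOT Tapir's implementation, only an illustration of numeric suffixing.
object UniqueNames {

  /** Maps each fully qualified name to a key derived from its short name.
    * The first type with a given short name keeps it; later ones get a numeric suffix.
    * Limitation: a generated key such as "Child1" could still clash with a real type named Child1.
    */
  def uniqueKeys(fullNames: List[String]): Map[String, String] = {
    val start = (Map.empty[String, Int], Map.empty[String, String])
    val (_, keys) = fullNames.distinct.foldLeft(start) {
      case ((counts, acc), fullName) =>
        val short = fullName.split('.').last          // "a.b.Child" becomes "Child"
        val seen = counts.getOrElse(short, 0)         // how often this short name was used already
        val key = if (seen == 0) short else s"$short$seen"
        (counts.updated(short, seen + 1), acc.updated(fullName, key))
    }
    keys
  }
}
```

Calling UniqueNames.uniqueKeys(List("example.Child", "example.Childhood.Child")) maps the first name to Child and the second to Child1, mirroring the keys in the generated document (the example package name is made up).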

Checklist

  • The public API
  • Tests
  • Documentation

@kciesielski
Member Author

@kamilkloch could you take a look at the proposed API in the PR description? Would such a way of obtaining JSON schemas be useful for use cases like the one you had?

@kciesielski kciesielski added the enhancement (New feature or request) and openapi labels May 8, 2023
@kamilkloch
Contributor

@kciesielski Thank you! Looking into it.

@kamilkloch
Contributor

kamilkloch commented May 9, 2023

We might want to expose a slightly modified API, similar to what sttp.apispec.openapi.OpenAPI is doing. In the end, the user is probably interested not in a ListMap[SchemaId, ReferenceOr[ASchema]] but rather in a means to generate an aggregated JSON schema (comprising all the $refs) for a given list of tapir schemas. Just like there exists an io.circe.Encoder[OpenAPI], we could have an io.circe.Encoder[OpenAPIJsonSchema].

Other remarks:

val intListSchema = implicitly[Schema[List[Int]]]
val result = TapirSchemaToJsonSchema(List(intListSchema), markOptionsAsNullable = false)
println(result.keys)

currently prints List(). Ultimately, OpenAPIJsonSchema(List(intListSchema)).asJson should probably look similar to this:

{
  "$schema" : "http://json-schema.org/draft-04/schema#",
  "type" : "array",
  "items" : {
    "type" : "integer"
  }
}

Also, for

 import sttp.apispec.circe._

 case class Child(childName: String)
 val childSchema = implicitly[Schema[Child]]
 val result = TapirSchemaToJsonSchema(List(childSchema), markOptionsAsNullable = false)
 println(result("Child").value.asJson.deepDropNullValues)

the schema is missing the $schema field (probably to be added as a configuration parameter to OpenAPIJsonSchema):

{
  "required" : [
    "childName"
  ],
  "type" : "object",
  "properties" : {
    "childName" : {
      "type" : "string"
    }
  }
}

EDIT: import sttp.apispec.circe._ requires "com.softwaremill.sttp.apispec" %% "jsonschema-circe" % Versions.sttpApispec % Test

@kciesielski
Copy link
Member Author

Thanks a lot for your input @kamilkloch! What you call OpenAPIJsonSchema in your example would then be a wrapper around a ListMap[SchemaId, ReferenceOr[ASchema]], right? Then it could be created with .apply(tapirSchemas: List[TSchema[_]], params), where params are similar to what was proposed in TapirSchemaToJsonSchema, but with an additional defaultMetaSchema that would set the $schema field on all the members?

@kamilkloch
Contributor

That might be a nice approach, yes :) Points to consider:

  1. defaultMetaSchema: the meta schema has an impact on the generated JSON Schema: https://json-schema.org/specification-links.html. If we are to support multiple meta schemas, we probably need a sealed AST for that parameter and a custom generator for each one?
  2. We could follow the sttp.apispec.openapi.OpenAPI approach and have OpenAPIJsonSchema#toJson: String, implemented in a similar fashion as toYaml:
    def toYaml: String = {
      import sttp.apispec.openapi.circe._
      Printer(dropNullKeys = true, preserveOrder = true).pretty(openAPI.asJson)
    }
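
A toJson counterpart to the toYaml snippet above might look roughly like the following. This is only a sketch under assumptions: it needs circe-core on the classpath, JsonRendering is a hypothetical name that exists in neither Tapir nor sttp-apispec, and it takes an already-encoded circe Json value rather than a schema type.

```scala
import io.circe.{Json, Printer}

// Hypothetical helper, not part of Tapir or sttp-apispec.
object JsonRendering {
  // Two-space indentation; keys whose value is null are omitted when printing.
  private val printer: Printer = Printer.spaces2.copy(dropNullValues = true)

  /** Renders an already-encoded schema (a circe Json value) as a String. */
  def toJson(schemaJson: Json): String = printer.print(schemaJson)
}
```

With io.circe.syntax._ and sttp.apispec.circe._ in scope, a call such as JsonRendering.toJson(schema.asJson) should then be enough to render a schema, and the caller would not need to drop null values separately.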

@kciesielski
Member Author

kciesielski commented May 11, 2023

If we are to support multiple meta schemas, we probably need to have a sealed AST for that parameter and a custom generator for each one?

We might want to start with one metaSchema and consider support for switching them in the future, if there's demand. You mentioned draft-04 in your code, is this just an arbitrary example? The latest version is 2020-12 and I'm wondering if this is a good default for us. This is my first experience with JSON schemas.

OpenAPIJsonSchema#toJson: String, implemented in a similar fashion as toYaml

Sounds good! One more thought, regarding the hardest part, which is always naming things ;) As far as I understand, the specification is maintained outside of the OpenAPI community, so OpenAPIJsonSchema may not be a good name. The Schema type is in the apispec-doc tapir module, separated from openapi-doc, probably for the same reason. (@adamw ?)
If that's the case then we could call it just JsonSchemas().

@kamilkloch
Contributor

Agreed on both (fixed meta schema for now and JsonSchema as a proposed name). Which meta schema? Probably the one we are currently compatible with, though I do not know which one that is... 🤔

@kciesielski
Member Author

@kamilkloch Thanks again for your priceless input :)
Here's a proposal for the API after some updates:

  1. To create the representation, we call
    case class Parent(innerChildField: Child)
    case class Child(childNames: List[String])
    val tSchema = implicitly[Schema[Parent]]

    val jsonSchemas: JsonSchemas = JsonSchemas(
      List(tSchema),
      markOptionsAsNullable = true,
      metaSchema = Some(MetaSchema202012), // default = None
      schemaName = defaultSchemaName // default
    )
  2. The underlying representation is a ListMap[SchemaId, ASchema]. I think the identifiers can be useful, so I wouldn't remove them.

  3. To generate schemas as a JSON array, one can call

import sttp.apispec.circe._

val schemasAsJson = jsonSchemas.schemas.values.asJson

which gives

[
  {
    "$schema" : "https://json-schema.org/draft/2020-12/schema",
    "required" : [
      "innerChildField"
    ],
    "type" : "object",
    "properties" : {
      "innerChildField" : {
        "$ref" : "#/components/schemas/Child"
      }
    }
  },
  {
    "$schema" : "https://json-schema.org/draft/2020-12/schema",
    "required" : [
      "childName"
    ],
    "type" : "object",
    "properties" : {
      "childName" : {
        "type" : "string"
      }
    }
  }
]
  4. Top-level simple schemas like the example you have (Schema[List[Int]]) should now work, but I don't know when that would be useful in practice.

  5. References may be problematic, because they represent OpenAPI document references, like #/components/schemas/Child. Since we're no longer in the OpenAPI context, one might prefer references like https://some.ref.url/myschema.json. However, for this to be achievable, we need a schema.$id field matching this name, and there is no schema.$id in our model at the moment. Adding such a field to the apispec module breaks binary compatibility and I don't think we want to go that far right now.

  6. Serializing to JSON with a simple JsonSchema.toJson would require the entire thing to have an additional accompanying circe module. I guess that's also a stretch we might not want to go for right now, especially since converting to JSON can be done easily with sttp.apispec.circe, as I mentioned in point 3. This will also be shown in the Tapir documentation.
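
To make the reference concern concrete: a pointer such as #/components/schemas/Child only resolves inside an OpenAPI document, whereas a standalone JSON Schema needs a pointer into its own definitions, such as the #/$defs/ prefix shown in the PR description's output. A minimal sketch of such a rewrite, with hypothetical names (RefRewrite, toLocalRef) that are not part of Tapir or sttp-apispec:

```scala
// Hypothetical helper for illustration; not part of Tapir or sttp-apispec.
object RefRewrite {
  private val OpenApiPrefix = "#/components/schemas/"
  private val DefsPrefix = "#/$defs/"

  /** Rewrites an OpenAPI-style pointer to a document-local one.
    * Any other reference (for example an absolute URL) is returned unchanged.
    */
  def toLocalRef(ref: String): String =
    if (ref.startsWith(OpenApiPrefix)) DefsPrefix + ref.stripPrefix(OpenApiPrefix)
    else ref
}
```

Such a rewrite only helps if the referenced schema is actually embedded under the matching key of the same document. URL-style references would still need the $id support discussed above.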

WDYT?

@kamilkloch
Contributor

kamilkloch commented May 12, 2023

@kciesielski Thank you for your work! I feel a bit awkward only commenting on your code and not writing the code myself.

That said:

which gives

[
  {
    "$schema" : "https://json-schema.org/draft/2020-12/schema",
    "required" : [
      "innerChildField"
    ],
    "type" : "object",
    "properties" : {
      "innerChildField" : {
        "$ref" : "#/components/schemas/Child"
      }
    }
  },
  {
    "$schema" : "https://json-schema.org/draft/2020-12/schema",
    "required" : [
      "childName"
    ],
    "type" : "object",
    "properties" : {
      "childName" : {
        "type" : "string"
      }
    }
  }
]

I am not sure if:

  • this is actually a valid JSON Schema (it does not validate: https://www.jsonschemavalidator.net/)
  • this is what the user wants. A typical use case would be to generate a schema for some (nested) case class. The result should be more like:
{
   "$schema":"http://json-schema.org/draft-04/schema#",
   "type":"object",
   "additionalProperties":false,
   "properties":{
      "innerChildField":{
         "$ref":"#/definitions/Child"
      }
   },
   "required":[
      "innerChildField"
   ],
   "definitions":{
      "Child":{
         "type":"object",
         "additionalProperties":false,
         "properties":{
            "childName":{
               "type":"string"
            }
         },
         "required":[
            "childName"
         ]
      }
   }
}

Since the output (JSON schema) should be one JSON object, I am now not sure if accepting List[TSchema[_]] in the constructor even makes sense. Internally, this one TSchema may produce a list of ASchemas (is that actually true?), and later this list of ASchemas is rendered into a single JSON object schema with embedded references, just like in the case of OpenAPI.

EDIT: That is why I am not sure if jsonSchemas.schemas.values.asJson is an acceptable outcome:

  • the result is not a valid Json Schema,
  • individual schemas contain broken references.
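
The shape argued for here, one root object whose nested types live in a definitions map and are reached through document-local references, can be sketched with a toy model. Everything below (Shape, Obj, Rendered, SingleDocument) is hypothetical and unrelated to Tapir's or sttp-apispec's real types. It only illustrates the traversal, and it does not handle recursive types.

```scala
// Hypothetical mini-model for illustration only; not Tapir's or sttp-apispec's types.
sealed trait Shape
case object Str extends Shape
final case class Obj(name: String, fields: List[(String, Shape)]) extends Shape

// Rendered form of a property: an inline type or a pointer into the definitions map.
sealed trait Rendered
final case class Inline(tpe: String) extends Rendered
final case class Ref(pointer: String) extends Rendered
final case class ObjectDef(properties: List[(String, Rendered)])

object SingleDocument {

  /** Returns the root object definition plus every nested object, keyed by name. */
  def render(root: Obj): (ObjectDef, Map[String, ObjectDef]) = {
    // Renders one field type; nested objects are registered once and then referenced.
    def go(shape: Shape, defs: Map[String, ObjectDef]): (Rendered, Map[String, ObjectDef]) =
      shape match {
        case Str => (Inline("string"), defs)
        case o: Obj if defs.contains(o.name) => (Ref(s"#/definitions/${o.name}"), defs)
        case o: Obj =>
          val (objDef, nested) = objectDef(o, defs)
          (Ref(s"#/definitions/${o.name}"), nested.updated(o.name, objDef))
      }

    // Renders all fields of an object, threading the definitions map through.
    def objectDef(o: Obj, defs: Map[String, ObjectDef]): (ObjectDef, Map[String, ObjectDef]) = {
      val start = (List.empty[(String, Rendered)], defs)
      val (props, outDefs) = o.fields.foldLeft(start) {
        case ((acc, ds), (fieldName, shape)) =>
          val (rendered, ds2) = go(shape, ds)
          (acc :+ (fieldName -> rendered), ds2)
      }
      (ObjectDef(props), outDefs)
    }

    objectDef(root, Map.empty)
  }
}
```

For a Parent object with a single innerChildField of type Child, render returns a root whose property points at #/definitions/Child, plus a one-entry definitions map holding Child. That is the structure of the draft-04 example above.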

@kciesielski
Member Author

kciesielski commented May 12, 2023

Thanks for clarifying, I think I misunderstood some crucial points, which seem much clearer now. Indeed we should just be fine with a single Tapir schema converter, but one which builds a structure of the type definition plus all referenced types in the definitions part.
BTW regarding the metaschema, it turns out OpenAPI uses its own flavor, where some fields have a different meaning and some are unsupported. See https://swagger.io/docs/specification/data-models/keywords/
I guess this means we need to hardcode https://raw.githubusercontent.com/OAI/OpenAPI-Specification/main/schemas/v3.0/schema.json (or 3.1) as the metaschema.
(Sorry, it turns out that the schema I mentioned is the OpenAPI spec's metaschema, not the OpenAPI-flavored JSON metaschema. I'll try to search for such a metaschema.) But even if it's available somewhere, the question remains: would such an encoding actually be useful? For example, in your case?

@kamilkloch
Contributor

would such encoding be actually useful? For example, in your case?

Hm, it would probably be nice to strive for compatibility with Json meta schema, so that clients (IDE, for example), can use this schema for json input validation. Example - https://www.schemastore.org/json/ used within IDEs. Not sure how it will fly with OpenAPI standard.

@kciesielski
Member Author

kciesielski commented May 15, 2023

@kamilkloch Thanks again for your time spent looking at this!
After some more investigation I think we may actually end up with a pretty simple solution. The apispec.Schema type represents basically what is needed. It has a $defs field, which can store referenced schemas (what you called definitions in your example), and it looks like the final JSON does indeed validate against the draft-04 schema.
The new iteration of the proposed API then becomes:

object Childhood {
  case class Child(age: Int, height: Option[Int])
}
case class Parent(innerChildField: Child, childDetails: Childhood.Child)
case class Child(childName: String) // to illustrate unique name generation
val tSchema = implicitly[Schema[Parent]]

val jsonSchema: ASchema = JsonSchemas(
  tSchema,
  markOptionsAsNullable = true,
  metaSchema = MetaSchemaDraft04, // default
  schemaName = defaultSchemaName // default
)

import io.circe.Printer
import io.circe.syntax._
import sttp.apispec.circe._

val schemaAsJson = jsonSchema.asJson
println(Printer.spaces2.print(schemaAsJson.deepDropNullValues))

and this gives

{
  "$schema" : "https://json-schema.org/draft-04/schema#",
  "required" : [
    "innerChildField",
    "childDetails"
  ],
  "type" : "object",
  "properties" : {
    "innerChildField" : {
      "$ref" : "#/$defs/Child"
    },
    "childDetails" : {
      "$ref" : "#/$defs/Child1"
    }
  },
  "$defs" : {
    "Child" : {
      "required" : [
        "childName"
      ],
      "type" : "object",
      "properties" : {
        "childName" : {
          "type" : "string"
        }
      }
    },
    "Child1" : {
      "required" : [
        "age"
      ],
      "type" : "object",
      "properties" : {
        "age" : {
          "type" : "integer",
          "format" : "int32"
        },
        "height" : {
          "type" : [
            "integer",
            "null"
          ],
          "format" : "int32"
        }
      }
    }
  }
}

I checked some other cases with https://www.jsonschemavalidator.net/ and I couldn't generate a failing example so far.

@kamilkloch
Contributor

kamilkloch commented May 15, 2023

@kciesielski Wow, that looks great, thank you! Back to the most difficult problem ;): do we want to keep the name JsonSchemas, now that it is no longer plural?

@kciesielski
Member Author

I thought about it too, the name no longer sounds appropriate. Maybe we should get back to TapirSchemaToJsonSchema :)
I'd like @adamw to review the PR, and we'll think of a name.

val schemaIds = keysToSchemas.map { case (k, v) => k -> ((keysToIds(k), v)) }

val nestedKeyedSchemas = (schemaIds.values)
// TODO proper handling of ref input schema

Member Author

thought: Should we just make the method return a ReferenceOr[ASchema] and let the user handle this?

@kciesielski kciesielski requested a review from adamw May 15, 2023 10:47
@kciesielski kciesielski marked this pull request as ready for review May 15, 2023 10:48
@kciesielski kciesielski merged commit c0e0ec3 into master May 18, 2023
15 checks passed
@mergify mergify bot deleted the feature/json-schema-public branch May 18, 2023 20:28
@adamw
Member

adamw commented May 19, 2023

Great that you found the right API :) One thing missing: mention TapirSchemaToJsonSchema somewhere in the docs?
