Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement "inline schemas": ability to add type hints into the type providers' source documents #1447

Merged
merged 11 commits into from
Aug 8, 2022

Conversation

mlaily
Copy link
Contributor

@mlaily mlaily commented Jun 4, 2022

Hello,

I know I should probably have asked about such a feature beforehand, but I was playing with the project and got a bit carried away... sorry!
Let's at least talk about it now!

Does it seem like a welcome addition?
I can see myself using it with the json and xml providers at least.
What do you think?

High level overview:

When enabled (after setting the new InferenceMode static parameter on JsonProvider, XmlProvider, and HtmlProvider to ValuesAndInlineSchemasHints or ValuesAndInlineSchemasOverrides — default value of the enum should be backward compatible), it's now possible to define inline schemas: string values that are used as hints or overrides for type inference in the providers.

Using ValuesAndInlineSchemasHints, inline schemas are about equivalent to setting values interpreted as these types, but easier and with more control over the result.

The ValuesAndInlineSchemasOverrides mode is probably the most intuitive to use:

Using a inline schema once for a value repeated in a sample list or collection is enough to override value-based inference and indicate to the provider we know better.

Example use case:
I have a sample with Code properties whose values look something like "000" "123" "4E5" or "ABC". This list would be inferred as an heterogeneous list of floats or strings, but I know using these as floats makes no sense, so I only want the string version.

Before inline schemas, I would have to either disable type inference (undesirable because I want to infer other values), or replace all the Code properties in my xml/json sample with something that is unambiguously not a float. This is annoying, especially in large samples.

Note it's also possible to provide a measure of unit! :)

Example:

image
(Old screenshot. The enum names have changed)

I added typeof{int{metre}} and typeof{datetimeoffset} in the sample list, so the type provider generates the necessary members to access the value with these types.

Syntax

  • typeof<typeName>
  • typeof{typeName}
  • typeof<typeName<measureName>>
  • typeof{typeName{measureName}}

List of type names allowed: int, int64, bool, float, decimal, date, datetimeoffset, timespan, guid, string, int option, int64 option, bool option, float option, decimal option, date option, datetimeoffset option, timespan option, guid option and string option.

This is the same list as the one already supported by the CsvProvider in its schemas, minus the nullable and optional types.

(I started by allowing optional type definitions, but it led to weird bugs with weird generated types when the parent is also optional, so I decided to only allow "normal" types, and continue relying only on structural inference to generate optionals)

Units of measure allowed: the ones already supported by the default IUnitsOfMeasureProvider in the project. I believe this is the default SI units.

Either <> or {} can be used, so it's easy to define an inline schema both in the JsonProvider and the XmlProvider and HtmlProvider.

Remarks

I don't know if this is the best approach, but I did my best to stay backward compatible.
I created an enum for the public api (because we need to be able to create constant values for the compiler to accept them on a provider, and because there is a special default value for backward compat). This public enum is then mapped on an internal DU used to represent valid cases in an unambiguous way (enums are just vaguely typed ints, DU enforce the list of possible values).

There are currently 4 distinct modes of inference:

  1. no inference
  2. infer from values as before
  3. infer from values and inline schemas as hints
  4. infer from values and inline schemas as overrides (If we didn't have to care about backward compat, I would set this as the default)

After experimenting a bit, I feel like this is all the choices I would want. What do you think?

Is there any way to mark a provided static parameter obsolete? It does not seem to be implemented (and I tried to implement it but the attribute didn't seem to be picked up by VS Code at least)

I realized after seeing the broken docs that I may have changed public apis I should have been more careful not to change, but I don't know what is considered as the public api that shouldn't change too much, and what is fair game. Please advise.

✨ I made an effort to have clean commits with minimum overlap. It should hopefully be easy to review them independently ✨


This PR should fix #1323 and provide a better workaround for #1418 and #1221

@mlaily mlaily mentioned this pull request Jun 9, 2022
@mlaily mlaily marked this pull request as ready for review June 12, 2022 14:19
@cartermp
Copy link
Collaborator

@mlaily you might need to rebase here now that some stuff has merged.

@mlaily
Copy link
Contributor Author

mlaily commented Jun 20, 2022

@mlaily you might need to rebase here now that some stuff has merged.

Thanks for the heads up!

mlaily added 11 commits July 9, 2022 20:42
…ering types from inline schema definitions

Ensure backward compatibility + add doc
This means changing inferPrimitiveType to take a UnitsOfMeasureProvier parameter,
and return InferedType instead of Type so we can set the units on the
returned values in a way that will work with the rest of the code.
where inline schemas types override value-inferred types
instead of just adding new types to the list of inferred types.

This requires adding metadata on InferedType.Primitive to know if the type
should replace other types when merging them.

This also requires adding metadata on InferedType.Heterogeneous
to keep track of optionality inside an heterogeneous type.
(Since we want to be able to transform a heterogeneous type back into a primitive type
when the primitive type is supposed to override other primitive types)

Preserving optionality in heterogeneous types effectively makes nulls
and inline schemas both being considered at the same level of importance when merging.
Not sure why the assert was there, but looking at the similar ConversionsGenerator
gives me confidence it shouldn't be here anymore now that we can have
units of measure in json...
The existing ones changed to take the BackwardCompatible inference mode.
All Json and Xml cases that had value inference enabled have been copied
to 3 new cases: one for each value of InferenceMode.

The expected content is always the same: enabling inline schemas
when no schema is defined in the source file should not change anything to the output.
@mlaily
Copy link
Contributor Author

mlaily commented Jul 9, 2022

Hello,

Any tips on how I could help this PR move forward?

@cartermp
Copy link
Collaborator

cartermp commented Aug 4, 2022

So sorry for the delay, let me review it now.

Copy link
Collaborator

@cartermp cartermp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a little while, but I think this looks good.

As a follow-up PR @mlaily, could you add an example in the docs that explains the feature and how to turn it on?

@mlaily mlaily mentioned this pull request Aug 7, 2022
@mlaily
Copy link
Contributor Author

mlaily commented Aug 7, 2022

@cartermp thanks a lot for the review.

I opened a new PR for the documentation (#1456). Let me know if it's ok for you.

@cartermp
Copy link
Collaborator

cartermp commented Aug 8, 2022

Awesome, thanks!

@cartermp cartermp merged commit d0a5f47 into fsprojects:main Aug 8, 2022
@mlaily
Copy link
Contributor Author

mlaily commented Aug 8, 2022

Woohoo! thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Type overrides for XMLProvider
2 participants