Support type coercion for inconsistently represented properties #792

Closed
ReubenFrankel opened this issue Jul 5, 2022 · 4 comments
Labels
kind/Feature New feature or request valuestream/SDK

Comments

@ReubenFrankel
Contributor

ReubenFrankel commented Jul 5, 2022

Overview

An API can return records in which the same property is typed, formatted or structured differently. The SDK could provide a property type coercion mechanism to handle this.

Example 1

Record A (string)

{
    "scope": "profile email connect"
}

Record B (array of strings)

{
    "scope": [
        "profile",
        "email",
        "connect"
    ]
}

Example 2

Record A (string)

{
    "error": "something went wrong"
}

Record B (object containing string)

{
    "error": {
        "message": "something went wrong"
    }
}

Example 3

Record A (string)

{
    "hours": "137"
}

Record B (number)

{
    "hours": 137
}

Proposed Solution(s)

Define a callable mapping function that is applied to specific properties whenever its argument type matches the property value type. This function should have the type

Callable[[S], T]

where S is the Source (argument) type and T is the Target (return) type.
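As a rough illustration of "applied whenever its argument type matches", here is a minimal sketch in plain Python. The helper name `coerce_if_matches` and its explicit `source_type` parameter are assumptions made for this example and are not part of the SDK.

```python
from typing import Any, Callable, Type, TypeVar

S = TypeVar("S")  # Source (argument) type
T = TypeVar("T")  # Target (return) type


def coerce_if_matches(value: Any, source_type: Type[S], func: Callable[[S], T]) -> Any:
    """Hypothetical helper: apply func only when value is an instance of source_type."""
    if isinstance(value, source_type):
        return func(value)
    # The value is not of the source type, so leave it untouched.
    return value


# Record A holds a string, Record B already holds a list.
print(coerce_if_matches("profile email connect", str, str.split))
print(coerce_if_matches(["profile", "email", "connect"], str, str.split))
```

Both calls yield the same list, so downstream code only ever sees the array representation.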

Extension to Python-defined schema properties

Pass a callable as a coerce keyword argument to JSONTypeHelper classes. The function's return type can be inferred from the expected property schema (e.g. th.ArrayType(th.StringType) would infer a return type of List[str]).

Example implementations

# string to array (list)
th.Property("scope", th.ArrayType(th.StringType, coerce=str.split))
#th.Property("scope", th.ArrayType(th.StringType, coerce=lambda value: value.split()))

# array (list) to string
th.Property("scope", th.StringType(coerce=' '.join))
#th.Property("scope", th.StringType(coerce=lambda value: ' '.join(value)))

# string to object (dict)
th.Property("error", th.ObjectType(coerce=lambda value: { 'message': value }))

# object (dict) to string
th.Property("error", th.StringType(coerce=lambda value: value.get('message')))

# string to number (int)
th.Property("hours", th.IntegerType(coerce=int))
#th.Property("hours", th.IntegerType(coerce=lambda value: int(value)))

# number (int) to string
th.Property("hours", th.StringType(coerce=str))
#th.Property("hours", th.StringType(coerce=lambda value: str(value)))
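The idea of inferring the target type from the property schema could look roughly like the sketch below. It does not use the SDK: `JSON_TO_PYTHON` and `coerce_to_schema` are hypothetical names, and the schemas are hand-written dicts whose `type` is assumed to be a single string.

```python
from typing import Any, Callable, Dict

# Assumed mapping from JSON Schema type names to Python types.
JSON_TO_PYTHON = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def coerce_to_schema(value: Any, schema: Dict[str, Any], coerce: Callable[[Any], Any]) -> Any:
    """Hypothetical helper: coerce value only if it does not already fit the schema type."""
    expected = JSON_TO_PYTHON[schema["type"]]
    if isinstance(value, expected):
        return value  # already in the target representation
    result = coerce(value)
    # Check the coerced result against the type inferred from the schema.
    if not isinstance(result, expected):
        raise TypeError(f"coerce returned {type(result).__name__}, expected {schema['type']}")
    return result


scope_schema = {"type": "array", "items": {"type": "string"}}
print(coerce_to_schema("profile email connect", scope_schema, str.split))
print(coerce_to_schema("137", {"type": "integer"}, int))
```

One caveat of this simple check: `bool` is a subclass of `int` in Python, so a boolean would pass as an `integer` here.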

RestStream class property

Map each JSONPath to a callable in a jsonpath_coerce dictionary.

Example implementations

# A dict cannot hold two entries for the same key, so the reverse direction
# for each path is shown commented out.
jsonpath_coerce = {
    "scope": str.split,                             # string to array (list)
    # "scope": ' '.join,                            # array (list) to string
    "error": lambda value: { 'message': value },    # string to object (dict)
    # "error": lambda value: value.get('message'),  # object (dict) to string
    "hours": int,                                   # string to number (int)
    # "hours": str,                                 # number (int) to string
}
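To show how such a dictionary might be consumed, here is a minimal sketch. It assumes every path is a plain top-level key (no real JSONPath evaluation) and that each callable guards its own input type; `apply_coercions` is a hypothetical name, not an SDK function.

```python
from typing import Any, Callable, Dict

# Each callable checks the input type itself, so values that are already
# in the target representation pass through unchanged.
jsonpath_coerce: Dict[str, Callable[[Any], Any]] = {
    "scope": lambda v: v.split() if isinstance(v, str) else v,
    "error": lambda v: {"message": v} if isinstance(v, str) else v,
    "hours": lambda v: int(v) if isinstance(v, str) else v,
}


def apply_coercions(record: Dict[str, Any], coercions: Dict[str, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of record with coercions applied to keys that exist."""
    coerced = dict(record)
    for key, func in coercions.items():
        if key in coerced:
            coerced[key] = func(coerced[key])
    return coerced


print(apply_coercions({"scope": "profile email connect", "hours": "137"}, jsonpath_coerce))
print(apply_coercions({"error": {"message": "something went wrong"}}, jsonpath_coerce))
```

The guards matter: an unguarded `str.split` raises `TypeError` when handed a list, and an unguarded wrapping lambda would nest an existing object inside another one.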

Implementation Considerations

@aaronsteers on Slack:

Performance and implementation-wise there are still some things to figure out, such as whether this should run over every record or only if validation fails. And adding validation on each record/node would have some performance implications also, although perhaps this would be minimal. I could see this being 'validation exception handlers' per node (run only if data doesn't fit) or as pre-processors (run on every record). A few other things to work out also, such as whether these should run only if data exists at the node's path, or if they would run regardless.
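The two strategies mentioned above (pre-processor versus validation exception handler) can be contrasted with a small sketch. Everything here is hypothetical: `is_valid` is a stand-in for real schema validation, and the `calls` counter exists only to show how often the coercion runs.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
calls = {"fix": 0}  # counts how often the coercion is invoked


def is_valid(record: Record) -> bool:
    """Stand-in for schema validation: 'hours' must be an int when present."""
    return "hours" not in record or isinstance(record["hours"], int)


def fix_hours(record: Record) -> Record:
    """Coerce 'hours' to int, but only if data exists at that key."""
    calls["fix"] += 1
    fixed = dict(record)
    if "hours" in fixed:
        fixed["hours"] = int(fixed["hours"])
    return fixed


def run_as_preprocessor(records: List[Record], fix: Callable[[Record], Record]) -> List[Record]:
    # Runs the coercion on every record, valid or not.
    return [fix(r) for r in records]


def run_as_exception_handler(records: List[Record], fix: Callable[[Record], Record]) -> List[Record]:
    # Runs the coercion only for records that fail the validity check.
    return [r if is_valid(r) else fix(r) for r in records]


records = [{"hours": "137"}, {"hours": 137}, {}]
print(run_as_preprocessor(records, fix_hours), calls["fix"])
calls["fix"] = 0
print(run_as_exception_handler(records, fix_hours), calls["fix"])
```

Both variants produce the same records. The handler variant invokes the coercion less often but pays for a validity check on every record, which is the trade-off raised in the quote.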


Slack discussions:

@ReubenFrankel
Contributor Author

Note that type coercion can currently be achieved with RESTStream.post_process, although this method can quickly become complex when there are many properties and other, unrelated post-processing logic to factor in.
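For reference, the post_process approach might look roughly like this for the three examples in the issue. This is a standalone function written to resemble the method's shape (a row and an optional context in, a row out), so it runs without the SDK; in a real tap it would be a method on the stream class.

```python
from typing import Any, Dict, Optional


def post_process(row: Dict[str, Any], context: Optional[dict] = None) -> Optional[dict]:
    """Standalone stand-in for a stream's post_process method (assumed shape)."""
    # scope: string to array (list)
    scope = row.get("scope")
    if isinstance(scope, str):
        row["scope"] = scope.split()

    # error: string to object (dict)
    error = row.get("error")
    if isinstance(error, str):
        row["error"] = {"message": error}

    # hours: string to number (int)
    hours = row.get("hours")
    if isinstance(hours, str):
        row["hours"] = int(hours)

    # Any unrelated post-processing would be interleaved here as well.
    return row


print(post_process({"scope": "profile email connect", "error": "something went wrong", "hours": "137"}))
```

Each additional property adds another guarded block to the same method, which is how it becomes hard to follow.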

@stale

stale bot commented Jul 18, 2023

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

@stale stale bot added the stale label Jul 18, 2023
@ReubenFrankel
Contributor Author

Still relevant IMO.

@stale stale bot removed the stale label Jul 18, 2023
@ReubenFrankel
Contributor Author

I think the implementations I proposed here either overlap too much with schema logic or are a reinvention of stream maps. The best way forward for this kind of issue would be the ability to leverage stream maps as a developer, so closing as a duplicate of #1231. Feel free to reopen otherwise.
