Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InsertOpts() Metadata #165

Open
atoi opened this issue Jan 19, 2024 · 5 comments · May be fixed by #329
Open

InsertOpts() Metadata #165

atoi opened this issue Jan 19, 2024 · 5 comments · May be fixed by #329
Assignees

Comments

@atoi
Copy link

atoi commented Jan 19, 2024

It seems that Metadata is ignored when provided as a part JobArgsWithInsertOpts.InsertOpts().
When specified as a part of third param to Client.Insert they are applied just fine.

The problem I see is that insertParamsFromArgsAndOptions neglects JobArgsWithInsertOpts meta and always takes what was passed as a function arg. Actually I do agree that the function arg should have higher priority and override what was specified in JobArgsWithInsertOpts.InsertOpts(), but when the arg is empty I'd expect job's meta to take effect.

Important pieces below:

if insertOpts == nil {
	insertOpts = &InsertOpts{}
}

var jobInsertOpts InsertOpts
if argsWithOpts, ok := args.(JobArgsWithInsertOpts); ok {
	jobInsertOpts = argsWithOpts.InsertOpts()
}
metadata := insertOpts.Metadata
if len(metadata) == 0 {
	metadata = []byte("{}")
}
@bgentry bgentry self-assigned this Jan 20, 2024
@bgentry
Copy link
Contributor

bgentry commented Jan 20, 2024

Hi @atoi, this is actually intentional, although it should have been documented. I pushed up a commit that demonstrates this behavior and documents it here.

I'm wondering what use case you have in mind though. My initial thought was metadata would always be set at insertion time, but it could be possible to set some of that at the worker level. At most we could maybe change the behavior so that the default metadata specified in the JobArgsWithInsertOpts gets merged with / overridden by the metadata from the insert opts, rather than fully overwritten by it. So if your JobArgsWithInsertOpts had metadata {"a": "foo", "b": "bar"} and the Insert() level opts specified {"b": "override, "c": "something_else"}, the resulting metadata would be {"a": "foo", "b": "override", "c": "something_else"}. We might even be able to do that without pulling in an extra dependency like gjson because Postgres itself could do the merge.

I would like to hear more about the intended use case here though to know whether or not this makes sense. Please tell me more about how you'd like to use it 😄

@atoi
Copy link
Author

atoi commented Jan 22, 2024

Hi @bgentry!

My usecase is pretty simple. I need a place to store some job-specific data.
I cannot put that data into args, as that will interfere with job uniqueness check.
As the data is job-specific, I first tried to keep it close to my job definion and used JobArgsWithInsertOpts. I just naively assumed I can use all the options InsertOpts struct offers. After realizing that some options are more equal than others I was able to do what I wanted by utilizing an arg to Insert() function.

I suppose such a behavior should be mentioned in the documentation.

@bgentry
Copy link
Contributor

bgentry commented Jan 22, 2024

Ah, I’m glad I asked because this jogged my memory on something we are missing here. It would be a bit of a misuse of metadata it was used for job-specific data merely to work around the uniqueness check.

What if instead of that, we made the uniqueness check configurable so that you could choose which specific top level keys in the args field you wanted to be considered for the uniqueness check? Would that satisfy your use case by allowing you to choose which part of your args to factor in and excluding the rest? This is something we had always planned to do with unique jobs but haven’t gotten around to yet.

You are right to point out the unequal behavior of metadata in the insertopts here. That field was only exposed recently and I overlooked the fact that the behavior is different here compared to any of the other options. That being said I doing think that making it’s behavior the same (where an insertion time value completely overrides a job level default) would actually make much sense here either as opposed to something like merging the attributes—you simply wouldn’t be able to do much with that while also using any of the features which will use metadata.

In the end I think we will either want to go with the merging behavior and document it appropriately, or fork the type for default insert opts so that it doesn’t have this option which doesn’t make sense to use in this way.

Tagging in @brandur to get his thoughts as well.

@atoi
Copy link
Author

atoi commented Jan 22, 2024

TLDR: Implementing metadata merge will make me happy.

Hmm. Yes, technically the configurable uniqueness check would solve my case. Conceptually though, I'm not so sure.

See, I look at args as "task function parameters" or "things a job needs to get started".
That extra info I'm storing is related to how the job is processed, not started. I store job progress plus some data samples that job collects. I do not need to store that data permanently in "real" tables, as it is needed purely for better user experience and does not hold business value. So I decided to put that data closer to the job that collects it and metadata seemed to be a nice place for it.

I've been told already that metadata is reserved for future usage and it is better not to mess with it. At the same time it should be fine if I "reserve" a conflict-resistant namespace there for my app purposes.

@brandur
Copy link
Contributor

brandur commented Jan 27, 2024

Spoke to @bgentry offline about this one, but broad thoughts:

  • Uniqueness based on configurable specific args keys is probably something we should do, although even if it solves the proximate problem here, I'd expect this metadata issue to come up again. Metadata issues always come up again, heh.
  • I think merging metadata from InsertOpts rather than a full blown overwrite makes sense, although I suspect that we may want to eventually make this configurable in some way. i.e. InsertOpts can pass a flag or boolean indicating whether its metadata should merge with metadata below, or full on replace it. Based on my experience with metadata in past jobs, the appetite for doing creative things with metadata hashes knows no bounds and will eventually grow to desire all possible edge cases to be covered.

brandur added a commit that referenced this issue May 2, 2024
Here, attempt to resolve #165 by better defining how metadata that may
be inserted via `InsertOpts` on `Insert` will interact with metadata
that may be inserted via a job implementing `InsertOpts()`. Currently, a
job's metadata is always ignored.

The behavior to reconcile two metadata objects has some nuance, and I
don't think any one solution will fit everyone's desires. Here, I
propose that we let users defined their own preferred behavior by
implementing a new `InsertOptsMetadataReconcile` option:

    type InsertOpts struct {
        ...

        // MetadataReconcile defines how, in the presence of multiple InsertOpts
        // like one from the job itself and another one from a Client.Insert param,
        // how metadata will be reconciled between the two. For example, two
        // metadata can be merged together, or one can exclude the other.
        //
        // Defaults to MetadataReconcileExclude.
        MetadataReconcile MetadataReconcile

The possible reconciliation modes:

* `MetadataReconcileExclude`: The more specific metadata (from `Insert`
  params) completely excludes the less specific metadata (from a job'
  opts), and the latter is discarded.

* `MetadataReconcileMerge`: Carries out a shallow merge. In case of
  conflict, keys from the more specific metadata win out.

* `MetadataReconcileMergeDeep`: Carries out a deep merge, recursing into
  every object to see if we can use it as a `map[string]any`. In case of
  conflict, keys from the more specific metadata win out.

The default setting is `MetadataReconcileExclude`. This is partly a
safety feature because it is possible to create oddly formed metadata
that might have trouble merging, and partly a performance feature.
Exclude doesn't require trying to parse any of the metadata so that we
can try to merge it.

A note on typing: I tried to marshal/unmarshal maps as `map[any]any`
instead of `map[string]any` so we could support all kinds of wild key
types simultaneously, but `encoding/json` is pretty strict about working
with only one type of key at a time, and failed when I tried. We may
have to keep it as an invariant that if you want to use one of the merge
modes, your metadata must be parseable as `map[string]any`. This
shouldn't be a problem because a struct with `json` tags on it will
always produce a compatible object. I've tried to document this.

Fixes #165.
brandur added a commit that referenced this issue May 2, 2024
Here, attempt to resolve #165 by better defining how metadata that may
be inserted via `InsertOpts` on `Insert` will interact with metadata
that may be inserted via a job implementing `InsertOpts()`. Currently, a
job's metadata is always ignored.

The behavior to reconcile two metadata objects has some nuance, and I
don't think any one solution will fit everyone's desires. Here, I
propose that we let users defined their own preferred behavior by
implementing a new `InsertOptsMetadataReconcile` option:

    type InsertOpts struct {
        ...

        // MetadataReconcile defines how, in the presence of multiple InsertOpts
        // like one from the job itself and another one from a Client.Insert param,
        // how metadata will be reconciled between the two. For example, two
        // metadata can be merged together, or one can exclude the other.
        //
        // Defaults to MetadataReconcileExclude.
        MetadataReconcile MetadataReconcile

The possible reconciliation modes:

* `MetadataReconcileExclude`: The more specific metadata (from `Insert`
  params) completely excludes the less specific metadata (from a job'
  opts), and the latter is discarded.

* `MetadataReconcileMerge`: Carries out a shallow merge. In case of
  conflict, keys from the more specific metadata win out.

* `MetadataReconcileMergeDeep`: Carries out a deep merge, recursing into
  every object to see if we can use it as a `map[string]any`. In case of
  conflict, keys from the more specific metadata win out.

The default setting is `MetadataReconcileExclude`. This is partly a
safety feature because it is possible to create oddly formed metadata
that might have trouble merging, and partly a performance feature.
Exclude doesn't require trying to parse any of the metadata so that we
can try to merge it.

A note on typing: I tried to marshal/unmarshal maps as `map[any]any`
instead of `map[string]any` so we could support all kinds of wild key
types simultaneously, but `encoding/json` is pretty strict about working
with only one type of key at a time, and failed when I tried. We may
have to keep it as an invariant that if you want to use one of the merge
modes, your metadata must be parseable as `map[string]any`. This
shouldn't be a problem because a struct with `json` tags on it will
always produce a compatible object. I've tried to document this.

Fixes #165.
brandur added a commit that referenced this issue May 3, 2024
Here, attempt to resolve #165 by better defining how metadata that may
be inserted via `InsertOpts` on `Insert` will interact with metadata
that may be inserted via a job implementing `InsertOpts()`. Currently, a
job's metadata is always ignored.

The behavior to reconcile two metadata objects has some nuance, and I
don't think any one solution will fit everyone's desires. Here, I
propose that we let users defined their own preferred behavior by
implementing a new `InsertOptsMetadataReconcile` option:

    type InsertOpts struct {
        ...

        // MetadataReconcile defines how, in the presence of multiple InsertOpts
        // like one from the job itself and another one from a Client.Insert param,
        // how metadata will be reconciled between the two. For example, two
        // metadata can be merged together, or one can exclude the other.
        //
        // Defaults to MetadataReconcileExclude.
        MetadataReconcile MetadataReconcile

The possible reconciliation modes:

* `MetadataReconcileExclude`: The more specific metadata (from `Insert`
  params) completely excludes the less specific metadata (from a job'
  opts), and the latter is discarded.

* `MetadataReconcileMerge`: Carries out a shallow merge. In case of
  conflict, keys from the more specific metadata win out.

* `MetadataReconcileMergeDeep`: Carries out a deep merge, recursing into
  every object to see if we can use it as a `map[string]any`. In case of
  conflict, keys from the more specific metadata win out.

The default setting is `MetadataReconcileExclude`. This is partly a
safety feature because it is possible to create oddly formed metadata
that might have trouble merging, and partly a performance feature.
Exclude doesn't require trying to parse any of the metadata so that we
can try to merge it.

A note on typing: I tried to marshal/unmarshal maps as `map[any]any`
instead of `map[string]any` so we could support all kinds of wild key
types simultaneously, but `encoding/json` is pretty strict about working
with only one type of key at a time, and failed when I tried. We may
have to keep it as an invariant that if you want to use one of the merge
modes, your metadata must be parseable as `map[string]any`. This
shouldn't be a problem because a struct with `json` tags on it will
always produce a compatible object. I've tried to document this.

Fixes #165.
brandur added a commit that referenced this issue May 4, 2024
Here, attempt to resolve #165 by better defining how metadata that may
be inserted via `InsertOpts` on `Insert` will interact with metadata
that may be inserted via a job implementing `InsertOpts()`. Currently, a
job's metadata is always ignored.

The behavior to reconcile two metadata objects has some nuance, and I
don't think any one solution will fit everyone's desires. Here, I
propose that we let users defined their own preferred behavior by
implementing a new `InsertOptsMetadataReconcile` option:

    type InsertOpts struct {
        ...

        // MetadataReconcile defines how, in the presence of multiple InsertOpts
        // like one from the job itself and another one from a Client.Insert param,
        // how metadata will be reconciled between the two. For example, two
        // metadata can be merged together, or one can exclude the other.
        //
        // Defaults to MetadataReconcileExclude.
        MetadataReconcile MetadataReconcile

The possible reconciliation modes:

* `MetadataReconcileExclude`: The more specific metadata (from `Insert`
  params) completely excludes the less specific metadata (from a job'
  opts), and the latter is discarded.

* `MetadataReconcileMerge`: Carries out a shallow merge. In case of
  conflict, keys from the more specific metadata win out.

* `MetadataReconcileMergeDeep`: Carries out a deep merge, recursing into
  every object to see if we can use it as a `map[string]any`. In case of
  conflict, keys from the more specific metadata win out.

The default setting is `MetadataReconcileExclude`. This is partly a
safety feature because it is possible to create oddly formed metadata
that might have trouble merging, and partly a performance feature.
Exclude doesn't require trying to parse any of the metadata so that we
can try to merge it.

A note on typing: I tried to marshal/unmarshal maps as `map[any]any`
instead of `map[string]any` so we could support all kinds of wild key
types simultaneously, but `encoding/json` is pretty strict about working
with only one type of key at a time, and failed when I tried. We may
have to keep it as an invariant that if you want to use one of the merge
modes, your metadata must be parseable as `map[string]any`. This
shouldn't be a problem because a struct with `json` tags on it will
always produce a compatible object. I've tried to document this.

Fixes #165.
brandur added a commit that referenced this issue May 4, 2024
Here, attempt to resolve #165 by better defining how metadata that may
be inserted via `InsertOpts` on `Insert` will interact with metadata
that may be inserted via a job implementing `InsertOpts()`. Currently, a
job's metadata is always ignored.

The behavior to reconcile two metadata objects has some nuance, and I
don't think any one solution will fit everyone's desires. Here, I
propose that we let users defined their own preferred behavior by
implementing a new `InsertOptsMetadataReconcile` option:

    type InsertOpts struct {
        ...

        // MetadataReconcile defines how, in the presence of multiple InsertOpts
        // like one from the job itself and another one from a Client.Insert param,
        // how metadata will be reconciled between the two. For example, two
        // metadata can be merged together, or one can exclude the other.
        //
        // Defaults to MetadataReconcileExclude.
        MetadataReconcile MetadataReconcile

The possible reconciliation modes:

* `MetadataReconcileExclude`: The more specific metadata (from `Insert`
  params) completely excludes the less specific metadata (from a job'
  opts), and the latter is discarded.

* `MetadataReconcileMerge`: Carries out a shallow merge. In case of
  conflict, keys from the more specific metadata win out.

* `MetadataReconcileMergeDeep`: Carries out a deep merge, recursing into
  every object to see if we can use it as a `map[string]any`. In case of
  conflict, keys from the more specific metadata win out.

The default setting is `MetadataReconcileExclude`. This is partly a
safety feature because it is possible to create oddly formed metadata
that might have trouble merging, and partly a performance feature.
Exclude doesn't require trying to parse any of the metadata so that we
can try to merge it.

A note on typing: I tried to marshal/unmarshal maps as `map[any]any`
instead of `map[string]any` so we could support all kinds of wild key
types simultaneously, but `encoding/json` is pretty strict about working
with only one type of key at a time, and failed when I tried. We may
have to keep it as an invariant that if you want to use one of the merge
modes, your metadata must be parseable as `map[string]any`. This
shouldn't be a problem because a struct with `json` tags on it will
always produce a compatible object. I've tried to document this.

Fixes #165.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants