Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement optional map[string]string hstore handler. #35

Closed
flowchartsman opened this issue Sep 10, 2014 · 16 comments
Closed

Implement optional map[string]string hstore handler. #35

flowchartsman opened this issue Sep 10, 2014 · 16 comments

Comments

@flowchartsman
Copy link
Contributor

It would be nice to be able to avoid a lot of the boilerplate Scan code and simply do something like this:

type SomeType map[string]string
func (s SomeType) Scan(vr *pgx.ValueReader) (err error) {
    //can't do this until hstore Oid is standardized
    //if vr.Type().DataType != pointOid {
    //}
    s, err = pgx.hstoreMapHandler(vr.ReadString(vr.Len()))
    return
}

And just end up with a populated map.

I understand the utility of writing your own (t Type) Scan function, and I still do that in a lot of cases, but I think the hstore -> map[string]string is a common enough use case that it merits a useful handler.

Also, currently I can

SELECT hstore_to_array(hstorecol) FROM TABLE;

And just use a []string, but I still think a generic handler would be a nice addition for those that understand the cost.

@flowchartsman
Copy link
Contributor Author

@flowchartsman
Copy link
Contributor Author

Just realized, I need a check on encode if the HS is valid, but the map is empty. Easy enough to do. Let me know if this approach interests you and I'll fix it up some, write some tests, and issue a pull request.

@jackc
Copy link
Owner

jackc commented Sep 11, 2014

Looks good. I added some notes to the commit.

I didn't notice any handling for null values -- not null hstores but null values within the hstore. I think we would either want to either use a map[string]NullString or catch any null values and return an error. Or maybe have separate types for whether or not you allow null values. Not sure what is best here.

I'm also investigating ways to figure out the oid of hstore (or any other custom/extension type). If I can figure out a good way to do this then we can reinstate the oid check -- and eventually even look into the binary format.

@flowchartsman
Copy link
Contributor Author

I'll get to work on the escaping and I'll mull over the null values. An error doesn't seem quite right because the spec allows explicit NULLs after all. map[string]NullString seems idiomatic given this framework, but it would be more pain to use. I suspect that two types would be nice for those that want convenience and can guarantee that their values aren't NULL. I also considered a special type that would have a set method and then only update or remove those keys you'd changed.

To my knowledge, the only way to get the hstore OID is a DB call at runtime:

SELECT oid FROM pg_type WHERE typname='hstore'

And even then, I'm not sure if it differs between the different nodes of a cluster. I'd hope not, but it should be looked into. If it could possibly differ, there's really no sane way to add support for an OID check until it's standardized.

@jackc
Copy link
Owner

jackc commented Sep 11, 2014

For the oid problem I was considering running this query at connection time and storing a map of type name -> oid.

select t.oid, ns.nspname, t.typname from pg_type t join pg_namespace ns on t.typnamespace=ns.oid where t.typtype='b';

Then the pgx code could work in type names instead of oids. One problem is it is possible to define two types with the same name in separate schemas :(

@flowchartsman
Copy link
Contributor Author

They don't make this particular problem easy. Unless there's cross-schema OID overlap I'd probably say to fully-qualify the type names and call out ambiguity? I can't remember, are all types visible always, or just public ones?

If so, you could build the list and give each non-public type two entries in the map:
types["type"] and types["schema.type"] and then, if you find a duplicate, delete types["type"] or mark it with a special number that throws an error/warning about type ambiguity.

@jackc
Copy link
Owner

jackc commented Sep 13, 2014

I found a possible solution. Take a look at 263c4d8.

When checking by oid with vr.Type().DataType is not possible, then check by name with vr.Type().DataTypeName. So we should just be able to check for column type of hstore. This is enabled by reading in all the types at connect time. The oid -> name mapping is per connection, so even if one program reads hstores from two different servers with different hstore oids it will still work.

On further investigation of the type name collision problem, oids will be unique, even if the names in different schemas are the same, so I don't think it is a problem.

This also led me to change the way binary format support is registered to be by name instead of oid. This entailed the replacement of one public package variable, so it technically is a breaking change -- but the previous way was completely undocumented so I think it is safe to do this.

I leaving this on a branch for the moment, until we're sure this should solve the extension issues.

@flowchartsman
Copy link
Contributor Author

Cool. In that case, I guess name collision is only an issue if you want to try and make it easier to refer to a type without it's full name. Did we want to allow that, or should users always call out the full name? Maybe only in non-public data types?

Almost done with the new parse code. Switching it up to a saner format that lets me consume more than one character at a time since "NULL" being a special keyword longer than two characters and also containing a repeated character makes the current scheme a no-go.

While I'm thinking about it, do you see any use for this kind of parser elsewhere? If so, I could make it more generic and base it off of Rob Pike's concurrent parser/lexer architecture.

@jackc
Copy link
Owner

jackc commented Sep 15, 2014

In practice, name collision should never be a problem. I don't know why someone would install the hstore extension and then go in a different schema and create another type that isn't an hstore, but name it hstore. But I worry about it, because I want to be correct even in a case which shouldn't occur. The thing is, there is no way to make it fool-proof. Just like the oid can vary from server to server, the extension can be installed to any schema. So we can't rely on it always being in the public schema. The only thing we are guaranteed is that the type is named hstore. So in light of that, checking that the type being read or written is named hstore is as good a check as we are going to get.

As far as making the parser more generic, my general rule of thumb is to wait until you have 3 uses to extract/abstract code. Then you have more context to extract it correctly.

@flowchartsman
Copy link
Contributor Author

Messed up some branch names. Reforked, and here are the changes supporting both hstore -> map[string]string (no NULLs allowed) and NullHstore (NULL of entire column or of individual values allowed)
#36

@jackc
Copy link
Owner

jackc commented Sep 17, 2014

Looks good. I have a few suggestions I'd like to discuss, and depending on your thoughts we'll either merge this in as-as, or with some minor changes.

I'm unsure about directly mapping map[string]string to hstore vs. type Hstore map[string]string. When PostgreSQL 9.4 is released, jsonb should offer a superset of hstore's features at a similar performance level. It wouldn't surprise me if in a few years jsonb is more popular even for values that could/should be represented in hstore. If we use Scanner and Encoder on an Hstore type then we don't box ourselves in down the road. Then again, a similar argument could be made about slice to array types and they are treated natively... Hmm...

Are there any tests? If not, I don't mind adding them - I have a selection of test cases from an hstore parser I wrote in Ruby I could easily port.

I think I'd prefer ParseHstore to be private -- it's easy enough to make public if there's demand. But until then I'd like to avoid increasing the public API of pgx as much as possible.

What do you think?

@flowchartsman
Copy link
Contributor Author

I have no problem making it a special type. Will do.

I haven't made any tests yet; I was going to ask you to take a look and
tell me where you thought they should go and what form they should take.
My only concern would be making sure to exercise every possible syntax
error.

As to making ParseHstore private, no problem. I'll make a couple checkins
later tonight.

On Wed, Sep 17, 2014 at 5:55 PM, Jack Christensen notifications@github.com
wrote:

Looks good. I have a few suggestions I'd like to discuss, and depending on
your thoughts we'll either merge this in as-as, or with some minor changes.

I'm unsure about directly mapping map[string]string to hstore vs. type
Hstore map[string]string. When PostgreSQL 9.4 is released, jsonb should
offer a superset of hstore's features at a similar performance level. It
wouldn't surprise me if in a few years jsonb is more popular even for
values that could/should be represented in hstore. If we use Scanner and
Encoder on an Hstore type then we don't box ourselves in down the road.
Then again, a similar argument could be made about slice to array types and
they are treated natively... Hmm...

Are there any tests? If not, I don't mind adding them - I have a selection
of test cases from an hstore parser I wrote in Ruby I could easily port.

I think I'd prefer ParseHstore to be private -- it's easy enough to make
public if there's demand. But until then I'd like to avoid increasing the
public API of pgx as much as possible.

What do you think?


Reply to this email directly or view it on GitHub
#35 (comment).

@flowchartsman
Copy link
Contributor Author

Actually, I think all these changes are good, except I would like to petition to keep ParseHstore public. The thing that got me going down the road to this commit in the first place was a custom type that's basically a map[string]time.Time. Granted, I got a bit sidetracked and... general along the way, but having this functionality available would allow me to just output that directly without having to instantiate another map first.

@jackc
Copy link
Owner

jackc commented Sep 18, 2014

Sure. If you've got an actual use case for it we can leave it public.

@flowchartsman
Copy link
Contributor Author

Don't know if you get notifications for it (haven't had anyone fork or attempt to push to one of my projects yet, but I've made a push changing the native map to pgx.Hstore. Give it a look-see?

@jackc
Copy link
Owner

jackc commented Sep 19, 2014

I merged in your pull request. I added a bunch of tests, all of them passed except the empty hstore (empty is different than null). That was solved simply by a guard clause at the beginning of the parser.

Thanks a bunch!

@jackc jackc closed this as completed Sep 19, 2014
jackc added a commit that referenced this issue Dec 4, 2021
Make it possible to scan destination of *interface{} type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants