Some FromSQLValue instances are very difficult to write as a result of limitations in node-postgres #58

Open
Quelklef opened this issue Aug 3, 2021 · 9 comments

Comments

@Quelklef commented Aug 3, 2021

Hello!

I am currently writing my own set of ps/pg bindings for a personal project. Mason Mackaman suggested possibly using postgresql-client instead. (He also mentioned that @paluh had expressed frustration about yet another set of bindings being written).

postgresql-client and my bindings are very similar in design and both exhibit a particular pitfall. I want to document this pitfall here and also propose a potential solution. Should this pitfall be fixed in postgresql-client, I don't really have a good reason to continue developing my bindings. Should the pitfall be deemed a WONTFIX or a non-problem, I may continue developing my bindings and eventually publish them.

Let me detail the pitfall now.

It seems that postgresql-client and my bindings have the same method for handling returned SQL rows: we ask destination types to instantiate FromSQLValue; then, when an SQL row is returned from node-postgres, we use fromSQLValue to turn it into a Purescript value.
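
Concretely, both sets of bindings revolve around a class shaped roughly like this (modulo exact names), applied to each value that node-postgres hands back:

class FromSQLValue a where
  fromSQLValue :: Foreign -> Either String a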

While node-postgres generally displays pretty reasonable behaviour, atypical cases are not handled so well. For instance, on the query SELECT 'fst', 'snd';, which has row type TEXT, TEXT (a size-two row of text), node-postgres will produce ["fst", "snd"]. However, for the query SELECT ('fst', 'snd');, with row type (TEXT, TEXT) (a size-one row containing a 2-tuple of text), node-postgres will produce ["(fst,snd)"] [0].

This means that instantiating FromSQLValue for the row type (TEXT, TEXT) is of notable complexity. The programmer needs to properly handle SQL string syntax, which includes accounting for both quoted and unquoted strings, escape codes, etc [1]. postgresql-client currently does not provide any tools for dealing with this and other complex cases, essentially throwing the programmer into the deep end.

I have two designs which I believe cater better to these complex cases of FromSQLValue while retaining support for the existing ones.

(1) The first design is as follows.

Forgo the parsing from node-postgres entirely. Have FromSQLValue parse strings instead, i.e. class FromSQLValue a where fromSQLValue :: String -> Either String a. Then, provide a module containing SQL parser combinators for use when implementing fromSQLValue.

Many existing FromSQLValue instances will be trivial to update: for instance, parsing BOOL goes from Foreign.readBool to parseBool. Others, such as those with container types, will be less trivial: Foreign.readArray >=> traverse Foreign.readInt will have to become something like parseArrayOf parseInt [2]. However, I believe that all existing cases will be accounted for, as well as additionally accounting for the complex cases previously discussed.
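
In code, the migration would look roughly like this (parseBool, parseInt, and parseArrayOf are hypothetical combinators from the proposed parser module, not existing functions):

-- trivial migration: BOOL
instance FromSQLValue Boolean where
  fromSQLValue = parseBool                -- was: Foreign.readBool

-- less trivial migration: arrays
instance FromSQLValue (Array Int) where
  fromSQLValue = parseArrayOf parseInt    -- was: Foreign.readArray >=> traverse Foreign.readInt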

(2) The second design is as follows.

Ask instances of FromSQLValue to provide two things: a description of the expected SQL row type, and then a converter fromSQLValue :: Foreign -> Either String a. The row type description will be used to turn the returned row into a canonical Foreign value, and then the converter will be used as it is currently.

This design is more complex, but also more backward-compatible, as most fromSQLValue implementations will not need to be changed. Additionally, this second design may work better depending on how regular or irregular the SQL value formatting is. For instance, if a container's syntax changes depending on the values it contains, then this second design will probably handle those cases better.

End second design.

The pitfall outlined in this GitHub issue creates a schism wherein some FromSQLValue instances are easy to write, and some are very difficult, and the divide is not obvious; this strikes me as a significant issue. However, I recognize that (A) these complex cases likely make up a very, very small portion of FromSQLValue instances [3], and (B) this proposal is a fairly notable shift in design. For these reasons, I understand if there is hesitance towards accepting it.

Please let me know your thoughts.


Post-script. It's likely worth mentioning that a similar thought process might apply to ToSQLValue as well, but I have not yet thought it through fully.

[0]: The fundamental issue, as far as I can tell, is that node-postgres uses type OIDs to determine how to parse the strings that postgres produces. For instance, if postgresql produces the string t with type OID 18 (CHAR), then node-postgres will parse this to the js value 't'. However, if postgresql produces the string t with type OID 16 (BOOL), then node-postgres will parse this to the js value true. It seems, however, that not all types have OIDs known to node-postgres, and if a value is returned with an unrecognized OID then node-postgres will return it unchanged. This is why SELECT (true, false); became "(t,f)": because (BOOL, BOOL) has no OID. Or, at least, node-postgres does not know of any.

I would like to admit, quickly, that I am not an expert on Postgres interop / communication, so it's possible that I am mistaken about something with regards to type OIDs.

[1]: Example difficult-to-parse result: the row produced by

select ('easy', E'with\nnewline', 'with " quote', 'with , comma');

is returned by node-postgres as

[ '(easy,"with\nnewline","with "" quote","with , comma")' ]
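
To give a sense of the bookkeeping involved, here is a rough sketch (the names are mine, not from any library) of just the field-splitting step for such a composite body; it deliberately ignores escaped quotes (""), backslash escapes, and NULLs, all of which a real instance must also handle:

import Prelude
import Data.Array (snoc)
import Data.Foldable (foldl)
import Data.String.CodeUnits (fromCharArray, toCharArray)

-- Split the body of a composite literal (outer parens already stripped) into
-- raw fields, tracking whether we are inside a double-quoted section.
splitCompositeBody :: String -> Array String
splitCompositeBody body = finish (foldl step init (toCharArray body))
  where
  init = { inQuotes: false, current: [], fields: [] }

  step st c
    | c == '"' = st { inQuotes = not st.inQuotes }  -- naive: drops quotes, mishandles ""
    | c == ',' && not st.inQuotes =
        st { current = [], fields = st.fields `snoc` fromCharArray st.current }
    | otherwise = st { current = st.current `snoc` c }

  finish st = st.fields `snoc` fromCharArray st.current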

[2]: FWIW, the reason that "array of int" needs a nontrivial change is that while Foreign.readArray can read the array structure without reading its contents, one cannot parse an array from a string without parsing its contents.

[3]: Unfortunately for me, my current use-case is one of these very, very few cases =(

@Quelklef (Author) commented Aug 4, 2021

Note: implementing strategy (2) might take some care in order to make sure that all possible SQL return values are typeable. For instance, one must handle the case of an SQL query returning a set of rows where two or more rows differ in type. This is exemplified by

SELECT 'this is text', ('bla', 'bla', 'bla') UNION ALL SELECT 'this is numbers', (100, 101);

@paluh (Collaborator) commented Aug 14, 2021

@Quelklef Sorry for my late response.

I think that we want to tackle all the problems you describe. I have already encountered and tried to "patch" a specific problematic datetime case, but that was just a "hot fix".

I'm not sure if I understand your second proposition. On the other hand, your first design seems to be really clear and easy to follow, so I like it for sure :-) I think that an additional possible benefit of this design is that we are possibly moving away from node-postgres on the parsing layer and starting to build a layer of parsers / serializers which are backend agnostic and can possibly be used for postgres from any other language backend. If I understand correctly, we can even imagine in the future separating this layer out into a backend-agnostic lib (like purescript-postgresql-parsers or something). I wonder how hard it is to skip the unnecessary node-postgres parsing and get String input. Could you please tell me if you have investigated / solved this part already?

@Quelklef what is your preference - do you prefer your first or second design? Do you think that I should go and dive into the first proposition once again (I have something like a computer detox for the next three months, so I'm hardly ever online :-P)?

P.S.
@Kamirus, @JordanMartinez do you have any strong opinions on this issue and its solutions?

@JordanMartinez (Contributor)

I don't really have one. I lean towards value-based codecs, so I'd prefer not even using FromSQLValue in the first place.

@Quelklef (Author) commented Aug 16, 2021

This turned out to be a very long comment. I have provided a summary below. Alternatively, please feel free to skim or skip this comment.

My recommendation is for an implementation that looks like this:

class FromSQLValue c a | a -> c where
  sqlTypedesc :: SQLTypedesc c  -- value that tells how to go from an SQL expression to a c
  fromCanonical :: c -> Either String a  -- go from a c to an a

-- Secretly they're just parsers
newtype SQLTypedesc c = SQLTypedesc (Parser c)

-- How fromSQLValue would be written with this
fromSQLValue :: forall c a. FromSQLValue c a => String -> Either String a
fromSQLValue = runParser sqlTypedesc >=> fromCanonical  -- or something like that

-- Used like this
newtype Box = Box { width :: Number, height :: Number }
instance FromSQLValue (Number /\ Number) Box where
  sqlTypedesc = pgTuple (pgDouble /\ pgDouble) :: SQLTypedesc (Number /\ Number)
  fromCanonical (width /\ height) = Right (Box { width, height })
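
As an aside, the combinators used above are assumed, not existing functions. Taking Parser to mean Text.Parsing.Parser.Parser String, and parseDouble to be a stand-in for a real number parser, they could be built roughly like this (note that this naive pgTuple ignores the quoting rules from footnote [1], which a real implementation must handle):

import Text.Parsing.Parser.String (char)
import Data.Tuple.Nested (type (/\), (/\))

pgDouble :: SQLTypedesc Number
pgDouble = SQLTypedesc parseDouble

pgTuple :: forall a b. SQLTypedesc a /\ SQLTypedesc b -> SQLTypedesc (a /\ b)
pgTuple (SQLTypedesc pa /\ SQLTypedesc pb) = SQLTypedesc do
  _ <- char '('
  a <- pa
  _ <- char ','
  b <- pb
  _ <- char ')'
  pure (a /\ b)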

I believe this design provides a nice API, admits a reasonable implementation, and is likely to work well with your goal of backend-agnosticism.

Oh, and fully suppressing node-postgres is easy (but undocumented): require('pg').types.getTypeParser = () => x => x;.

The rest of this comment provides more detail and documents how I concluded that this design is good.


@paluh No sweat on the time frame! In fact, in the time between first posting this GitHub issue and your response, I have been working on the problem on my own and have more thoughts.

"I'm not sure if I understand your second proposition."

Oops! I should explain it a bit more. The idea is to have FromSQLValue request (1) an abstract description of the expected type of the SQL expression, which will be used to turn the expression into a Foreign; then (2) a converter to turn that Foreign into a value.

In code,

class FromSQLValue a where
  exprType :: SQLTypedesc
  fromForeign :: Foreign -> Either String a

data SQLTypedesc
  = Pg_Int
  | Pg_Double
  | Pg_Text
  | Pg_Array SQLTypedesc
  | {- ... -}

The SQLTypedesc would be used to turn the string that PostgreSQL returns into a canonical Foreign value to be passed into fromForeign. For instance, I might have the following instance:

newtype Box = Box { width :: Number, height :: Number }

instance FromSQLValue Box where
  exprType = Pg_Tuple (Pg_Double /\ Pg_Double)  -- expected type is (DOUBLE, DOUBLE)
  fromForeign f = {- readArray then readNumber twice -}

Because I gave an exprType, my fromForeign function has knowledge that the incoming Foreign value has the shape [number, number].

The idea behind this design is as follows. Design (1) just asks FromSQLValue instances for a parser fromSQLValue :: String -> Either String a. This would work fine. However, it doesn't reflect the fact that the incoming Strings are known to be PostgreSQL expression syntax. This design, design (2), tries to reflect that by pulling the heavy lifting of parsing the PostgreSQL syntax behind the scenes and exposing SQL values more pleasantly as Foreigns.

A third design: design (3)

An interesting feature of design (2) is that if exprType is known statically, then we should, in theory, be able to type the processed row to something more specific than a Foreign. If exprType = Pg_Text, then we know the Foreign is actually a string.

We can reify this with the typesystem by parameterizing SQLTypedesc, so that an SQLTypedesc a will parse an SQL expression into an a. The following code, for instance, uses GADTs to achieve this.

data SQLTypedesc a where
  Pg_Int :: SQLTypedesc Int
  Pg_Double :: SQLTypedesc Number
  Pg_Text :: SQLTypedesc String
  Pg_Array :: forall a. SQLTypedesc a -> SQLTypedesc (Array a)
  {- ... -}

class FromSQLValue canonical rich where
  exprType :: SQLTypedesc canonical  -- turn the SQL row into some canonical Purescript form
  fromCanonical :: canonical -> Either String rich  -- turn that into a rich value

newtype Box = Box { width :: Number, height :: Number }

instance FromSQLValue (Number /\ Number) Box where
  exprType = Pg_Tuple (Pg_Double /\ Pg_Double)
  fromCanonical (width /\ height) = Right (Box { width, height })

Unfortunately, Purescript does not currently support GADTs.

Fortunately, GADTs are not fundamentally required for this solution. As long as we have

a. an opaque type SQLTypedesc :: Type -> Type,
b. a way to construct SQLTypedescs, and
c. a function to actually use them, parse :: SQLTypedesc a -> String -> Either String a,

that's enough.

I believe this can be achieved, in theory, in a number of ways. For instance:

  • Design (3.1)
    • Make SQLTypedesc be an arity-1 type with a phantom parameter
    • Create values of SQLTypedesc with values rather than constructors (e.g. pgDouble :: SQLTypedesc Number)
    • Then parse interprets the type description, using casts under the hood to make the types work out (a sketch of these casts appears after the two stubs below)

Implementation stub:

data SQLTypedesc a
  = Pg_Int
  | Pg_Double
  | {- ... -}

pgInt :: SQLTypedesc Int
pgInt = Pg_Int

pgDouble :: SQLTypedesc Number
pgDouble = Pg_Double

{- ... -}

parse :: SQLTypedesc a -> String -> Either String a
parse Pg_Int = parseInt
parse Pg_Double = parseDouble
{- ... -}

  • Design (3.2)
    • Make SQLTypedesc ≈ Text.Parsing.Parser (Parser) (in some way, one could see a Parser as a "description" of the expected format of an incoming string)
    • Values of SQLTypedesc would be thin wrappers over SQL expression parsers
    • Then parse would just run the parser

Implementation stub:

newtype SQLTypedesc a = SQLTypedesc (Parser a)

pgInt :: SQLTypedesc Int
pgInt = SQLTypedesc $ parseInt

{- ... -}

parse :: SQLTypedesc a -> String -> Either String a
parse (SQLTypedesc parser) str = runParser parser str # lmap showErr  -- or something like that
  where showErr = {- ... -}

Again, designs (3.1) and (3.2) are definitely not the only ways to achieve design (3). But they are a couple of reasonable ways.
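
For completeness, the "casts under the hood" mentioned for design (3.1) would look something like the following, reusing that stub's constructors and assumed parsers (unsafeCoerce is just my guess at how the phantom parameter gets reconciled; each branch is only safe because the constructor fixes what a must be):

import Unsafe.Coerce (unsafeCoerce)

parse :: forall a. SQLTypedesc a -> String -> Either String a
parse Pg_Int str = unsafeCoerce (parseInt str :: Either String Int)
parse Pg_Double str = unsafeCoerce (parseDouble str :: Either String Number)
{- ... -}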

"your first design seems to be really clear [...] I think that additional possible benefit of this design is that we are possibly [...] starting to build a layer of parsers / serializers which are backand agnostic"

I don't know exactly what you have in mind for this backend-agnostic future; however, in my mind design (3) would work best for it. The issue with design (1) is that it exposes the fact that parsing SQL strings is happening, meaning that client code might inadvertently write something which is PostgreSQL-only. By contrast, because in design (3) SQLTypedesc is an opaque type, the fact that parsing happens behind the scenes is essentially an implementation detail.

"what is your preference"

I'm personally in favor of design (3.2). I don't like that design (1) allows the client code to think about SQL strings; design (2) makes below-optimal use of the typesystem; and design (3.1) seems comparable to (3.2) but more complex.

The only issue I can see with design (3.2) is that it would likely be less performant than design (1). Using design (3.2), parsing SQL expressions is a two-step process: first, turn them into a canonical value via sqlTypedesc; then, turn that into a client type via fromCanonical. By contrast, design (1) folds all parsing into a single step.

In my personal bindings, I have been using design (1) with reasonable success so far. (Which is to say, success writing code; I have yet to run any).

"I wonder how hard is to skip unnecessery node-postgres parsing and get String input"

Not difficult! As it turns out, it's easy to suppress node-postgres involvement in both the pg→js direction (parsing) and the js→pg direction (printing).

To disable parsing, use the following:

require('pg').types.getTypeParser = () => x => x;

Note that this relies on undocumented code details, so upgrading node-postgres can very well break it.

An API-compliant way to do this, and perhaps the more obvious solution, is to set every OID parser to s => s. However, this doesn't suppress some parsing behaviour such as array parsing.

As for disabling printing (js→pg), nothing has to be done. As it turns out, one can pass parameter bindings to node-postgres as arrays of PostgreSQL expressions. 👌

"I lean towards value-based codecs"

@JordanMartinez What do you mean by this?

@Quelklef (Author)

Oh, value-based as in

fromSQLValue :: Parser a -> String -> Either Err a

instead of

fromSQLValue :: FromSQLValue a => String -> Either Err a

@Quelklef (Author)

I continued work on my personal bindings. As it turns out, a number of the designs I suggested have not-so-obvious issues.

I have landed upon a design which:

  • Does not rely on node-postgres for parsing (the original point of this GitHub issue)
  • Keeps all expression-parsing behind the scenes, which should make it easier to turn the library backend-agnostic in the future
  • Actually seems to work: I have successfully compiled it and am using it for a personal project
  • Is a touch awkward

The idea is basically as follows. Start with

class FromSQLValue a where
  fromSQLValue :: String -> Either String a  -- parse a PostgreSQL expression

instance FromSQLValue String where {- ... -}
-- ... other instances

Because the client code shouldn't be thinking about parsing at all, wrap this in an opaque newtype:

class FromSQLValue a where
  fromSQLValueImplementation :: Impl a

-- type is exported, constructor is not
newtype Impl a = Impl (String -> Either String a)

instance FromSQLValue String where {- ... -}
-- ... other instances

But, because the Impl constructor isn't exported, how are clients supposed to create instances? They use:

mkImpl :: forall a r. FromSQLValue a => (a -> Either String r) -> Impl r

This allows instances like the following:

newtype Box = Box { height :: Number, width :: Number }
instance FromSQLValue Box where
  fromSQLValueImplementation = mkImpl $
    \(width /\ height) ->
      if width < 0.0 || height < 0.0
      then Left "Box with negative dimensions"
      else Right (Box { width, height })

This implementation uses the built-in FromSQLValue instance for Number in order to define its own for Box.

Note that mkImpl accepts as input not an expression parser, but a value parser. This means that the client code never sees the actual PostgreSQL expression. This is nice because (1) we are lowering cognitive load on the end-programmer; and (2) never exposing the actual expressions leaves more room for going backend-agnostic in the future.

The main entrypoint is

fromPg :: FromSQLValue a => String -> Either String a

Please see my current implementation here. (Note that I name the class FromPg instead of FromSQLValue)
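
For intuition, mkImpl essentially runs the existing Impl for a and then applies the given function to its result; modulo details, the relationship looks something like this (my paraphrase; the linked implementation may differ):

mkImpl :: forall a r. FromSQLValue a => (a -> Either String r) -> Impl r
mkImpl refine = case (fromSQLValueImplementation :: Impl a) of
  Impl parseExpr -> Impl \expr -> parseExpr expr >>= refine

fromPg :: forall a. FromSQLValue a => String -> Either String a
fromPg expr = case (fromSQLValueImplementation :: Impl a) of
  Impl parseExpr -> parseExpr expr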

@JordanMartinez (Contributor)

@paluh (Collaborator) commented Nov 9, 2021

Hi @Quelklef,

Thanks a lot for all the description and the links to your reference implementation. It seems that I won't be able to dive into your solution: it would in general be a major change, and I just don't have time before the upcoming release of this lib to do that kind of revolution :-(

I think that we want to provide a better solution for the parsing layer and composite type handling in the future, so I'm going to leave this issue open for now and maybe attach an appropriate tag to it.

Once again thanks and sorry for this lame feedback from my side :-(

@Quelklef (Author)

Hey @paluh!

Completely understandable! It's a pretty large ask to finalize a design (especially given my slew of open-ended suggestions) and then implement it. No hard feelings that there aren't the resources to do this right now.

With any luck, I will continue thinking about this on my own time, and either fork purescript-postgresql-client or release my own client. But we'll see =)
