Arrays #33

Closed
wants to merge 11 commits

5 participants

@solidsnack

Sorry for such a large patch. This integrates some of Bas's work on Postgres arrays. The parser and pretty-printer are much expanded.

@joeyadams

Nice work!

However, I wonder if, instead of hanging the FromField and ToField instances directly on Seq, we should use a newtype wrapper instead, e.g.:

newtype PGArray a = PGArray [a]

instance (FromField a, Typeable a) => FromField (PGArray a) where

Because:

  • Due to the multidimensionality constraint and other quirks of PostgreSQL arrays, they don't really have the same semantics as Seq a, [a], or even Array i a.

  • If someone wants to overload Seq to support a more consistent representation of arrays (e.g. JSON), they won't be able to.

  • If we want to support explicit dimensions like '[3:5]={3,4,5}'::int[] in the future, we can extend PGArray, but we can't extend Seq.

  • This avoids an unnecessary conversion to Seq a.
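Roughly, I'm imagining something like the sketch below. It leans on the array parser from this patch, and none of it is code from the pull request; treat the names as illustrative.

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings  #-}

import Control.Applicative ((<$>))
import Data.Attoparsec.Char8 (parseOnly)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable)

import Database.PostgreSQL.Simple.Arrays (ArrayFormat(..), array, fmt)
import Database.PostgreSQL.Simple.FromField
import Database.PostgreSQL.Simple.Internal (Field(..), TypeInfo(..))

-- A dedicated wrapper keeps PostgreSQL array quirks away from Seq, [] and Vector.
newtype PGArray a = PGArray [a]
  deriving (Eq, Show, Typeable)

-- Multidimensional arrays would be handled by stacking the constructor,
-- e.g. PGArray (PGArray Int) for int[][].
instance (FromField a, Typeable a) => FromField (PGArray a) where
  fromField f mdat =
      case parseOnly (array ',') (fromMaybe "" mdat) of
        Left err    -> returnError ConversionFailed f err
        Right items -> PGArray <$> mapM parseItem items
    where
      tInfo = typeinfo f
      -- Leaf elements are parsed with the element type's info; nested
      -- arrays keep the array type's info, mirroring fromArray in this patch.
      fElem = f { typeinfo = TypeInfo (fromMaybe (typ tInfo) (typelem tInfo)) Nothing }
      parseItem item@(Array _) = fromField f     (Just (fmt ',' item))
      parseItem item           = fromField fElem (Just (fmt ',' item))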

@solidsnack

Well, with my approach it is possible to parse [a], [[a]] and so on up to six braces deep. We would need to do something very different for PGArray ... to be able to handle that kind of structure.

I do not think it is unreasonable for Postgres arrays with explicit indices to fail to parse for Seq, Vector or []. An absolutely sound, reliable mapping of Postgres types to Haskell types is too much to ask for -- and would not reduce one's programming effort, in any event (we always have to handle the not-Ok case).

@joeyadams

Good point. But can't we just have PGArray fill the same role as Seq currently does here? To support multidimensional arrays, stack PGArray type constructors like we stack Seq. Or is Seq used for its performance characteristics rather than just its name?

In either case, I guess it's not a big deal.

@solidsnack

We're using Vector internally, in fact; and I meant to push a commit which used Vector. If we weren't compelled to use Seq by the names/instances issue, then we'd likely have used []; but it certainly is a happy marriage of performant data-type and clean instances. It is probably just as well not to make a new collection type.

@lpsmith
Owner

Yeah, I would like to see instances for Vector rather than Seq. What's the name/instances issue that you are referring to?

Also, how hard would it be to handle the non-comma separator issue that Joey Adams pointed out on the mailing list?

I'm also thinking that the BuiltinTypes module should be redesigned: probably renamed to TypeInfo, with the interface changed a bit. What we really need is:

  1. A TypeInfo constant for a number of datatypes, accessible directly rather than through a function from the BuiltinTypes enumeration to TypeInfo

  2. A function that maps TypeOids to TypeInfos

I'm not so sure we really need an enumeration type like there currently is.
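Roughly, the shape of interface I have in mind is something like this (just a sketch; none of the names or fields here are final):

{-# LANGUAGE OverloadedStrings #-}

module Database.PostgreSQL.Simple.TypeInfo
    ( TypeInfo(..)
    , int4, text, bytea      -- point 1: constants, one per supported datatype
    , typeInfoByOid          -- point 2: map a type oid to its TypeInfo, if known
    ) where

import Data.ByteString (ByteString)
import qualified Database.PostgreSQL.LibPQ as PQ

data TypeInfo = TypeInfo
    { typoid  :: !PQ.Oid
    , typname :: !ByteString
    , typelem :: !(Maybe PQ.Oid)   -- element type oid, for array types
    }

int4, text, bytea :: TypeInfo
int4  = TypeInfo (PQ.Oid 23) "int4"  Nothing
text  = TypeInfo (PQ.Oid 25) "text"  Nothing
bytea = TypeInfo (PQ.Oid 17) "bytea" Nothing

-- | Look up a built-in type by oid without a round trip to the server.
typeInfoByOid :: PQ.Oid -> Maybe TypeInfo
typeInfoByOid (PQ.Oid 23) = Just int4
typeInfoByOid (PQ.Oid 25) = Just text
typeInfoByOid (PQ.Oid 17) = Just bytea
typeInfoByOid _           = Nothing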

I'm perfectly willing to do this overhaul myself, though.

@solidsnack

> Yeah, I would like to see instances for Vector rather than Seq.

I've pushed these instances to the array branch and they should end up as part of this pull request.

> What's the name/instances issue that you are referring to?

That an instance for [a] would overlap with String.
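A toy illustration of the clash, using a stand-in class (nothing here is code from the branch):

{-# LANGUAGE FlexibleInstances #-}

-- A toy stand-in for FromField, just to show the overlap.
class Render a where
  render :: a -> String

-- The instance we would want for "array of any element type":
instance Render a => Render [a] where
  render _ = "<array>"

-- The instance that conceptually already exists for text columns,
-- since String is just [Char]:
instance Render String where
  render s = s

-- Both declarations are accepted, but any use such as `render "hello"` is
-- then rejected as "Overlapping instances for Render [Char]" unless overlap
-- extensions are enabled. A FromField [a] instance would collide with every
-- String column in exactly the same way.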

> Also, how hard would it be to handle the non-comma separator issue that Joey Adams pointed out on the mailing list?

The parser accepts the delimiter as a parameter.

https://github.com/solidsnack/postgresql-simple/blob/arrays/src/Database/PostgreSQL/Simple/Arrays.hs#L36

For nested arrays, commas are used; but when it comes time to parse the values, the delimiter that's passed in is used. In the instance definition, the delimiter is hardcoded to a comma, but the type information could be used instead.

https://github.com/solidsnack/postgresql-simple/blob/arrays/src/Database/PostgreSQL/Simple/FromField.hs#L251

Each datatype has its own delimiter character.
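For illustration, here is a hypothetical reworking of the Vector instance's body that reads the delimiter from the field's type information instead of hardcoding a comma. typdelimOf is invented for the sketch (TypeInfo does not carry typdelim in this patch), and it assumes fromArray is exported from FromField.

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<$>))
import Data.Attoparsec.Char8 (parseOnly)
import Data.ByteString (ByteString)
import Data.Typeable (Typeable)
import Data.Vector (Vector)
import qualified Data.Vector as V

import Database.PostgreSQL.Simple.FromField
import Database.PostgreSQL.Simple.Internal (Field)
import Database.PostgreSQL.Simple.Ok (Ok)

-- Placeholder for reading pg_type.typdelim out of the field's TypeInfo;
-- this helper does not exist in the patch.
typdelimOf :: Field -> Char
typdelimOf _ = ','

vectorFromField :: (FromField a, Typeable a)
                => Field -> Maybe ByteString -> Ok (Vector a)
vectorFromField f dat =
    either (returnError ConversionFailed f)
           (V.fromList <$>)
           (parseOnly (fromArray delim f) (maybe "" id dat))
  where
    delim = typdelimOf f   -- e.g. ';' for box[], ',' for most other types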

I think support for arrays of atomic types -- uuid, int4, bytea and many others -- would be fine in a feature release. Getting the GIS types right could be quite a bit more work, as you suggest.

I wonder if there is some way to make the database handle all this stuff. For example, registering a prepared statement that executes any SQL at all and translates the result to JSON.

@lpsmith
Owner

Well, this is going to be a feature release; I'll call it 0.3, and hopefully we can have it ready to go within a few weeks :) Speaking of which, @sopvop, if you could fix up the SqlError type as you see fit, that would also be a nice thing to add to 0.3. (And your beginnings of a module to interpret SqlError, too.)

Thanks for all the work!

@sopvop

Arrays are the feature I'm looking forward to! (Composite types would also be awesome.)

I should have time for the SqlError type next week.
As for error parsing, I don't know how other types of errors can be parsed for information useful at runtime. Trigger actions and plpgsql exceptions are useful at runtime, but their message bodies are user-defined; all other errors are about fixing client code. Maybe naming the module 'Errors' was not a good idea, considering its limited scope.

@lpsmith
Owner

Ok, I pushed a re-engineered BuiltinTypes module to my arrays branch, though it isn't hooked into anything else yet.

https://github.com/lpsmith/postgresql-simple/blob/arrays/src/Database/PostgreSQL/Simple/TypeInfo.hs

@lpsmith closed this
@lpsmith reopened this
@solidsnack

Added a comment to the TypeInfo module -- not sure how you receive notifications for that. It stands to reason that TypeInfo should include typarray explicitly, so one can go up as well as down...

@lpsmith
Owner

I did get a notification for your comment, actually. When I looked at the pg_type table, I thought about going crazy and pulling in a lot of information, but then decided to keep to things that we have a specific use case for.

So, out of curiosity, do you have a specific use case in mind for this information?

@lpsmith
Owner

I didn't think about compatibility across different PostgreSQL versions until this morning, though. I guess I have two issues:

  1. How static is this data, really? (And what are the consequences of getting this data wrong? Clearly that depends on the column: getting the OID wrong would almost certainly cause problems, though small changes in the typcategory might not matter so much.)

  2. Also, the pg_type table has evolved (mostly additions, by the looks of it). For example, typcategory looks like it was added in 8.4, while typarray looks like it was added in 8.3.

Unfortunately I want to support versions all the way back to 8.1... Though it's no longer supported by the community, 8.1 is still supported by Red Hat and there are still a number of software packages that depend on it. (Actually, I don't personally care about 8.3 or 8.2, so I'm not going to be testing with these versions, though they are something I'd like to support.)

typarray is easy enough to emulate, as the information is already available in the type table, though not as directly in older versions. It appears that typcategory is genuinely new information, though... On the other hand, I'm not using any types in 8.1 that aren't in the static type table, so maybe this isn't actually an issue that impacts me personally, and the level of compatibility that this change would currently provide would be good enough.
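(For the record, the emulation could be a query along these lines; this is a sketch, not code in the branch. Array types have always pointed back at their element via typelem and are conventionally named with a leading underscore.)

{-# LANGUAGE QuasiQuotes #-}

import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.SqlQQ (sql)
import qualified Database.PostgreSQL.LibPQ as PQ

-- Find the array type for a given element oid without using typarray,
-- which only exists in 8.3 and later.
arrayOidFor :: Connection -> PQ.Oid -> IO [Only PQ.Oid]
arrayOidFor conn elemOid =
    query conn
      [sql| SELECT a.oid
              FROM pg_type a, pg_type e
             WHERE e.oid = ?
               AND a.typelem = e.oid
               AND a.typname = '_' || e.typname
          |] (Only elemOid)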

Also, adding typarray would require pulling more types into the static table, which is fine by me but does bring up the issue of more things that can go wrong.

Perhaps we should add a utility function in Internal or somewhere that sanity-checks the static TypeInfo table? Or maybe this belongs as part of the test suite...
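Something like this, say (a sketch against the NamedOid type in this pull request; a real version would enumerate the whole static table rather than a couple of entries):

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (forM)
import Data.ByteString (ByteString)
import qualified Database.PostgreSQL.LibPQ as PQ

import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.Internal (NamedOid(..))

-- A couple of well-known entries standing in for the full static table.
staticTypes :: [NamedOid]
staticTypes = [ NamedOid (PQ.Oid 23) "int4"
              , NamedOid (PQ.Oid 25) "text"
              ]

-- Ask the server about every statically-known oid and report any typname
-- that disagrees with our table.
checkStaticTypes :: Connection -> IO [(PQ.Oid, ByteString, ByteString)]
checkStaticTypes conn =
    fmap concat . forM staticTypes $ \t -> do
        rows <- query conn "SELECT typname FROM pg_type WHERE oid = ?"
                           (Only (typoid t))
        return [ (typoid t, typname t, serverName)
               | Only serverName <- rows
               , serverName /= typname t ]   -- report mismatches only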

@sopvop

@lpsmith how about merging all the pull requests into the 0.3 branch, without releasing it on Hackage?

@lpsmith
Owner

@sopvop: I just did that, sorry about the delay.

@lpsmith
Owner

@solidsnack, do you have any opinions on whether or not the Database.PostgreSQL.Simple.Arrays module should be part of the public interface? I'm also interested in hearing from other interested parties.

@solidsnack

@lpsmith I believe Database.PostgreSQL.Simple.Arrays should indeed be part of the public interface. There's nothing unsafe about it, and it will be helpful to people wanting to display arrays in diagnostic messages or to debug the library should it misbehave.

I realize this code is rather old now. The new stuff is here: https://github.com/erudify/postgresql-simple/tree/arrays

I would be interested to know more about your strategy for getting this merged. I guess there are a bunch of features planned for 0.3, but isn't this orthogonal to them? It would be of immediate value to most people using Postgres, I expect.

@lpsmith
Owner

This pull request has already been merged into the 0.3 branch. I didn't see the updates in erudify's branch until now, but a quick perusal suggests that not much has changed.

Unfortunately, the FromField instance in the 0.3 branch is currently broken; I broke it in a rush to allow FromField instances to perform IO actions and thus stop pre-computing the typnames. I added a few failing test cases to the test suite to document the problem, but this does need to be fixed, and is one of a few small tasks blocking the release of 0.3.

The work I did ~2 months ago got the 0.3 branch a great deal closer to a release-ready state, but I'll also admit this particular release isn't a personal priority at the moment. I did write a comment for Bas's benefit that describes the major task I'd still like to accomplish before the release here.

@solidsnack

Is a feature release -- say 0.2.5 -- for just the arrays out of the question?

@lpsmith
Owner

No, but I'm not sure how much work that would take versus completing what's there.

@solidsnack

Maybe not much... #56

@solidsnack closed this
2  postgresql-simple.cabal
@@ -21,6 +21,7 @@ Library
hs-source-dirs: src
Exposed-modules:
Database.PostgreSQL.Simple
+ Database.PostgreSQL.Simple.Arrays
Database.PostgreSQL.Simple.BuiltinTypes
Database.PostgreSQL.Simple.FromField
Database.PostgreSQL.Simple.FromRow
@@ -86,6 +87,7 @@ test-suite test
, OverloadedStrings
, Rank2Types
, RecordWildCards
+ , PatternGuards
build-depends: base
, base16-bytestring
56 src/Database/PostgreSQL/Simple.hs
@@ -4,6 +4,7 @@
{-# LANGUAGE PatternGuards #-}
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE ViewPatterns #-}
+{-# LANGUAGE QuasiQuotes #-}
------------------------------------------------------------------------------
-- |
@@ -140,6 +141,7 @@ import Database.PostgreSQL.Simple.ToRow (ToRow(..))
import Database.PostgreSQL.Simple.Types
( Binary(..), In(..), Only(..), Query(..), (:.)(..) )
import Database.PostgreSQL.Simple.Internal as Base
+import Database.PostgreSQL.Simple.SqlQQ (sql)
import qualified Database.PostgreSQL.LibPQ as PQ
import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T
@@ -549,19 +551,19 @@ finishQuery conn q result = do
PQ.TuplesOk -> do
ncols <- PQ.nfields result
let unCol (PQ.Col x) = fromIntegral x :: Int
- typenames <- V.generateM (unCol ncols)
+ typeinfos <- V.generateM (unCol ncols)
(\(PQ.Col . fromIntegral -> col) -> do
- getTypename conn =<< PQ.ftype result col)
+ getTypeInfo conn =<< PQ.ftype result col)
nrows <- PQ.ntuples result
ncols <- PQ.nfields result
forM' 0 (nrows-1) $ \row -> do
- let rw = Row row typenames result
+ let rw = Row row typeinfos result
case runStateT (runReaderT (unRP fromRow) rw) 0 of
Ok (val,col) | col == ncols -> return val
| otherwise -> do
vals <- forM' 0 (ncols-1) $ \c -> do
v <- PQ.getvalue result row c
- return ( typenames V.! unCol c
+ return ( typeinfos V.! unCol c
, fmap ellipsis v )
throw (ConversionFailed
(show (unCol ncols) ++ " values: " ++ show vals)
@@ -963,24 +965,36 @@ fmtError msg q xs = throw FormatError {
-- wrong results. In such cases, write a @newtype@ wrapper and a
-- custom 'Result' instance to handle your encoding.
-getTypename :: Connection -> PQ.Oid -> IO ByteString
-getTypename conn@Connection{..} oid =
+getTypeInfo :: Connection -> PQ.Oid -> IO TypeInfo
+getTypeInfo conn@Connection{..} oid =
case oid2typname oid of
- Just name -> return name
+ Just name -> return $! TypeInfo { typ = NamedOid oid name
+ , typelem = Nothing
+ }
Nothing -> modifyMVar connectionObjects $ \oidmap -> do
case IntMap.lookup (oid2int oid) oidmap of
- Just name -> return (oidmap, name)
+ Just typeinfo -> return (oidmap, typeinfo)
Nothing -> do
- names <- query conn "SELECT typname FROM pg_type WHERE oid=?"
- (Only oid)
- name <- case names of
- [] -> return $ throw SqlError {
- sqlNativeError = -1,
- sqlErrorMsg = "invalid type oid",
- sqlState = ""
- }
- [Only x] -> return x
- _ -> fail "typename query returned more than one result"
- -- oid is a primary key, so the query should
- -- never return more than one result
- return (IntMap.insert (oid2int oid) name oidmap, name)
+ names <- query conn
+ [sql| SELECT p.oid, p.typname, c.oid, c.typname
+ FROM pg_type AS p LEFT OUTER JOIN pg_type AS c
+ ON c.oid = p.typelem
+ WHERE p.oid = ?
+ |] (Only oid)
+ typinf <- case names of
+ [] -> return $ throw SqlError {
+ sqlNativeError = -1,
+ sqlErrorMsg = "invalid type oid",
+ sqlState = ""
+ }
+ [(pOid, pTypName, mbCOid, mbCTypName)] ->
+ return $! TypeInfo { typ = NamedOid pOid pTypName
+ , typelem = do
+ cOid <- mbCOid
+ cTypName <- mbCTypName
+ return $ NamedOid cOid cTypName
+ }
+ _ -> fail "typename query returned more than one result"
+ -- oid is a primary key, so the query should
+ -- never return more than one result
+ return (IntMap.insert (oid2int oid) typinf oidmap, typinf)
94 src/Database/PostgreSQL/Simple/Arrays.hs
@@ -0,0 +1,94 @@
+{-# LANGUAGE PatternGuards #-}
+
+------------------------------------------------------------------------------
+-- |
+-- Module: Database.PostgreSQL.Simple.Arrays
+-- Copyright: (c) 2012 Leon P Smith
+-- License: BSD3
+-- Maintainer: Leon P Smith <leon@melding-monads.com>
+-- Stability: experimental
+-- Portability: portable
+--
+-- A Postgres array parser and pretty-printer.
+------------------------------------------------------------------------------
+
+module Database.PostgreSQL.Simple.Arrays where
+
+import Control.Applicative (Applicative(..), Alternative(..), (<$>))
+import Data.ByteString.Char8 (ByteString)
+import qualified Data.ByteString.Char8 as B
+import Data.Monoid
+import Data.Attoparsec.Char8
+
+
+-- | Parse one of three primitive field formats: array, quoted and plain.
+arrayFormat :: Char -> Parser ArrayFormat
+arrayFormat delim = Array <$> array delim
+ <|> Plain <$> plain delim
+ <|> Quoted <$> quoted
+
+data ArrayFormat = Array [ArrayFormat]
+ | Plain ByteString
+ | Quoted ByteString
+ deriving (Eq, Show, Ord)
+
+array :: Char -> Parser [ArrayFormat]
+array delim = char '{' *> option [] (arrays <|> strings) <* char '}'
+ where
+ strings = sepBy1 (Quoted <$> quoted <|> Plain <$> plain delim) (char delim)
+ arrays = sepBy1 (Array <$> array delim) (char ',')
+ -- NB: Arrays seem to always be delimited by commas.
+
+-- | Recognizes a quoted string.
+quoted :: Parser ByteString
+quoted = char '"' *> option "" contents <* char '"'
+ where
+ esc = char '\\' *> (char '\\' <|> char '"')
+ unQ = takeWhile1 (notInClass "\"\\")
+ contents = mconcat <$> many (unQ <|> B.singleton <$> esc)
+
+-- | Recognizes a plain string literal, not containing quotes or brackets and
+-- not containing the delimiter character.
+plain :: Char -> Parser ByteString
+plain delim = takeWhile1 (notInClass (delim:"\"{}"))
+
+-- Mutually recursive 'fmt' and 'delimit' separate out value formatting
+-- from the subtleties of delimiting.
+
+-- | Format an array format item, using the delimiter character if the item is
+-- itself an array.
+fmt :: Char -> ArrayFormat -> ByteString
+fmt = fmt' False
+
+-- | Format a list of array format items, inserting the appropriate delimiter
+-- between them. When the items are arrays, they will be delimited with
+-- commas; otherwise, they are delimited with the passed-in-delimiter.
+delimit :: Char -> [ArrayFormat] -> ByteString
+delimit _ [] = ""
+delimit c [x] = fmt' True c x
+delimit c (x:y:z) = fmt' True c x `B.snoc` c' `mappend` delimit c (y:z)
+ where
+ c' | Array _ <- x = ','
+ | otherwise = c
+
+-- | Format an array format item, using the delimiter character if the item is
+-- itself an array, optionally applying quoting rules. Creates copies for
+-- safety when used in 'FromField' instances.
+fmt' :: Bool -> Char -> ArrayFormat -> ByteString
+fmt' quoting c x =
+ case x of
+ Array items -> '{' `B.cons` delimit c items `B.snoc` '}'
+ Plain bytes -> B.copy bytes
+ Quoted q | quoting -> '"' `B.cons` esc q `B.snoc` '"'
+ | otherwise -> B.copy q
+ -- NB: The 'snoc' and 'cons' functions always copy.
+
+-- | Escape a string according to Postgres double-quoted string format.
+esc :: ByteString -> ByteString
+esc = B.concatMap f
+ where
+ f '"' = "\\\""
+ f '\\' = "\\\\"
+ f c = B.singleton c
+ -- TODO: Implement easy performance improvements with unfoldr.
+
21 src/Database/PostgreSQL/Simple/FromField.hs
@@ -53,15 +53,19 @@ import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as B
import Data.Int (Int16, Int32, Int64)
import Data.List (foldl')
+import Data.Maybe (fromMaybe)
import Data.Ratio (Ratio)
import Data.Time ( UTCTime, ZonedTime, LocalTime, Day, TimeOfDay )
import Data.Typeable (Typeable, typeOf)
+import Data.Vector (Vector)
+import qualified Data.Vector as V
import Data.Word (Word64)
import Database.PostgreSQL.Simple.Internal
import Database.PostgreSQL.Simple.BuiltinTypes
import Database.PostgreSQL.Simple.Ok
import Database.PostgreSQL.Simple.Types (Binary(..), Null(..))
import Database.PostgreSQL.Simple.Time
+import Database.PostgreSQL.Simple.Arrays
import qualified Database.PostgreSQL.LibPQ as PQ
import System.IO.Unsafe (unsafePerformIO)
import qualified Data.ByteString as SB
@@ -241,6 +245,21 @@ instance (FromField a, FromField b) => FromField (Either a b) where
fromField f dat = (Right <$> fromField f dat)
<|> (Left <$> fromField f dat)
+instance (FromField a, Typeable a) => FromField (Vector a) where
+ fromField f dat = either (returnError ConversionFailed f)
+ (V.fromList <$>)
+ (parseOnly (fromArray ',' f) (maybe "" id dat))
+
+fromArray :: (FromField a) => Char -> Field -> Parser (Ok [a])
+fromArray delim f = sequence . (parseIt <$>) <$> array delim
+ where
+ fElem = f{ typeinfo = TypeInfo tElem Nothing }
+ tInfo = typeinfo f
+ tElem = fromMaybe (typ tInfo) (typelem tInfo)
+ parseIt item = (fromField f' . Just . fmt delim) item
+ where f' | Array _ <- item = f
+ | otherwise = fElem
+
newtype Compat = Compat Word64
mkCompats :: [BuiltinType] -> Compat
@@ -270,7 +289,7 @@ doFromField :: forall a . (Typeable a)
=> Field -> Compat -> (ByteString -> Ok a)
-> Maybe ByteString -> Ok a
doFromField f types cvt (Just bs)
- | Just typ <- oid2builtin (typeOid f)
+ | Just typ <- oid2builtin (typoid $ typ $ typeinfo f)
, mkCompat typ `compat` types = cvt bs
| otherwise = returnError Incompatible f "types incompatible"
doFromField f _ _ _ = returnError UnexpectedNull f ""
6 src/Database/PostgreSQL/Simple/FromRow.hs
@@ -68,13 +68,13 @@ class FromRow a where
fieldWith :: FieldParser a -> RowParser a
fieldWith fieldP = RP $ do
let unCol (PQ.Col x) = fromIntegral x :: Int
- Row{..} <- ask
+ r@Row{..} <- ask
column <- lift get
lift (put (column + 1))
let ncols = nfields rowresult
if (column >= ncols)
then do
- let vals = map (\c -> ( typenames ! (unCol c)
+ let vals = map (\c -> ( typenames r ! (unCol c)
, fmap ellipsis (getvalue rowresult row c) ))
[0..ncols-1]
convertError = ConversionFailed
@@ -85,7 +85,7 @@ fieldWith fieldP = RP $ do
\convert and number in target type"
lift (lift (Errors [SomeException convertError]))
else do
- let typename = typenames ! unCol column
+ let typeinfo = typeinfos ! unCol column
result = rowresult
field = Field{..}
lift (lift (fieldP field (getvalue result row column)))
23 src/Database/PostgreSQL/Simple/Internal.hs
@@ -52,9 +52,20 @@ import System.IO.Unsafe (unsafePerformIO)
data Field = Field {
result :: !PQ.Result
, column :: {-# UNPACK #-} !PQ.Column
- , typename :: !ByteString
+ , typeinfo :: !TypeInfo
}
+data NamedOid = NamedOid { typoid :: !PQ.Oid
+ , typname :: !ByteString
+ } deriving Show
+
+data TypeInfo = TypeInfo { typ :: !NamedOid
+ , typelem :: !(Maybe NamedOid)
+ } deriving Show
+
+typename :: Field -> ByteString
+typename = typname . typ . typeinfo
+
name :: Field -> Maybe ByteString
name Field{..} = unsafePerformIO (PQ.fname result column)
@@ -71,12 +82,11 @@ format :: Field -> PQ.Format
format Field{..} = unsafePerformIO (PQ.fformat result column)
typeOid :: Field -> PQ.Oid
-typeOid Field{..} = unsafePerformIO (PQ.ftype result column)
-
+typeOid = typoid . typ . typeinfo
data Connection = Connection {
connectionHandle :: {-# UNPACK #-} !(MVar PQ.Connection)
- , connectionObjects :: {-# UNPACK #-} !(MVar (IntMap.IntMap ByteString))
+ , connectionObjects :: {-# UNPACK #-} !(MVar (IntMap.IntMap TypeInfo))
}
data SqlType
@@ -301,10 +311,13 @@ newNullConnection = do
data Row = Row {
row :: {-# UNPACK #-} !PQ.Row
- , typenames :: !(V.Vector ByteString)
+ , typeinfos :: !(V.Vector TypeInfo)
, rowresult :: !PQ.Result
}
+typenames :: Row -> V.Vector ByteString
+typenames = V.map (typname . typ) . typeinfos
+
newtype RowParser a = RP { unRP :: ReaderT Row (StateT PQ.Column Ok) a }
deriving ( Functor, Applicative, Alternative, Monad )
10 src/Database/PostgreSQL/Simple/ToField.hs
@@ -39,6 +39,8 @@ import qualified Data.ByteString.Lazy as LB
import qualified Data.Text as ST
import qualified Data.Text.Encoding as ST
import qualified Data.Text.Lazy as LT
+import Data.Vector (Vector)
+import qualified Data.Vector as V
import qualified Database.PostgreSQL.LibPQ as PQ
import Database.PostgreSQL.Simple.Time
@@ -221,6 +223,14 @@ instance ToField Date where
toField = Plain . inQuotes . dateToBuilder
{-# INLINE toField #-}
+instance (ToField a) => ToField (Vector a) where
+ toField xs = Many $
+ Plain (fromByteString "ARRAY[") :
+ (intersperse (Plain (fromChar ',')) . map toField $ V.toList xs) ++
+ [Plain (fromChar ']')]
+ -- Because the ARRAY[...] input syntax is being used, it is possible
+ -- that the use of type-specific separator characters is unnecessary.
+
-- | Surround a string with single-quote characters: \"@'@\"
--
-- This function /does not/ perform any other escaping.