This repository has been archived by the owner on Dec 22, 2020. It is now read-only.

GridFS Support #90

Open
apocolipse opened this issue Apr 9, 2015 · 1 comment


@apocolipse

I'm curious whether you've looked into GridFS support. GridFS is split across two consistently named collections (fs.files and fs.chunks), and there's a standalone adapter for fetching GridFS files (by filename or id), so I think it merits its own functionality rather than just mapping both collections to Postgres and trying to reassemble the files on that side. I did some preliminary testing to see whether it could work, using a special '$gridfs' source to trigger GridFS handling and then using the original document to fetch the GridFS file by id.

I'm currently running into some encoding issues, however: some imports succeed (large plaintext files, some PDFs), while others fail partway through with:

# in transform_to_copy()
'join': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)

My modification of fetch_special_source():

def fetch_special_source(db, ns, obj, source, original)
  case source
  when "$timestamp"
    Sequel.function(:now)
  when "$gridfs"
    dbname, collection = ns.split(".", 2)
    # Only documents in fs.files have a GridFS file to fetch;
    # for any other collection this branch returns nil.
    if collection == 'fs.files'
      grid = Grid.new(db)
      file = grid.get(original["_id"])
      Sequel::SQL::Blob.new(file.read)
    end
  when /^\$exists (.+)/
    # We need to look in the cloned original object, not in the version that
    # has had some fields deleted.
    fetch_exists(original, $1)
  else
    raise SchemaError.new("Unknown source: #{source}")
  end
end

(I also tried various combinations of hex transforms and UTF-8 encoding, but I still eventually got that ASCII-8BIT error. For reference, the column I'm inserting into is of type BYTEA.)

Also, to create the GridFS object instance in fetch_special_source(), I had to add db adapter arguments to every method in the call chain from import_collections() in streamer.rb down to fetch_special_source() in schema.rb. This seems bad; any recommendation for where to put it?
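One possible way to avoid threading the db handle through every method is to hand the schema object a fetcher once, at construction time. The sketch below is only an illustration of that idea: SketchSchema, blob_fetcher, and the Hash standing in for GridFS are all hypothetical names, and MoSQL's real schema class has a different constructor and method signatures.

```ruby
# Stand-in for the schema class (hypothetical; not MoSQL's actual class).
class SketchSchema
  # blob_fetcher: any object responding to #call(ns, id) and returning raw
  # bytes. It is injected once instead of being passed down the call chain.
  def initialize(blob_fetcher: nil)
    @blob_fetcher = blob_fetcher
  end

  def fetch_special_source(ns, source, original)
    case source
    when "$gridfs"
      raise ArgumentError, "no blob fetcher configured" unless @blob_fetcher
      @blob_fetcher.call(ns, original["_id"])
    else
      raise ArgumentError, "Unknown source: #{source}"
    end
  end
end

# In real use the lambda would wrap the GridFS lookup; a Hash stands in here.
fake_store = { 1 => "\x00\x01".b }
schema = SketchSchema.new(blob_fetcher: ->(_ns, id) { fake_store.fetch(id) })
bytes = schema.fetch_special_source("db.fs.files", "$gridfs", { "_id" => 1 })
puts bytes.bytesize
```

With this shape, only the code that builds the schema needs to know about the database connection; the methods in between keep their existing signatures.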

@nelhage
Contributor

nelhage commented Apr 9, 2015

Hey – I haven't looked at implementing GridFS support, since I don't use it anywhere.

I agree that adding support might be useful, and I'd consider a PR. It'd probably be easier to review a strawman PR than try to speculate about the code via a description.

Hex-encoding the binary data is probably the way forward to fix the encoding issue, but I'd try to replicate it in a test and then add a bunch of debug prints or thereabouts to understand what's going on.
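As a starting point for such a test, here is a minimal sketch in plain Ruby (no MoSQL or Mongo involved) that reproduces the same class of error and shows why hex-encoding sidesteps it. The helper bytea_hex_for_copy is a hypothetical name, not an existing MoSQL method, and the doubled backslash assumes the row is headed for PostgreSQL's COPY text format, where a bytea hex literal such as \xffd8 has to be written as \\xffd8. It may also hint at why only some files fail: Ruby only refuses to join a UTF-8 string with an ASCII-8BIT one when both contain non-ASCII bytes.

```ruby
# Hypothetical helper (not part of MoSQL): render raw bytes as a PostgreSQL
# bytea hex literal, with the backslash doubled for COPY's text format.
def bytea_hex_for_copy(bytes)
  "\\\\x" + bytes.unpack1("H*")
end

binary = "\xFF\xD8\xFF\xE0".b   # raw bytes, tagged ASCII-8BIT
text   = "caf\u00E9"            # UTF-8 text containing a non-ASCII character

# Joining the two directly raises, because neither string is ASCII-only.
begin
  [text, binary].join("\t")
rescue Encoding::CompatibilityError => e
  puts "join failed: #{e.message}"
end

# Hex-encoding first yields a pure-ASCII string that joins cleanly.
row = [text, bytea_hex_for_copy(binary)].join("\t")
puts row
```

If the failing rows in the real import turn out to be the ones that mix non-ASCII text columns with binary file contents, that would be consistent with this behavior, but that is a guess until it is replicated against the actual data.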
