Skip to content
This repository has been archived by the owner on Feb 12, 2023. It is now read-only.

Increase collected images accessibility / Store media files into database and server file system #164

Open
getodk-bot opened this issue Nov 28, 2017 · 18 comments

Comments

@getodk-bot
Copy link
Member

Issue by mathieubossaert
Friday Aug 04, 2017 at 15:26 GMT
Originally opened as getodk/getodk#1313 (6 comment(s))


Good evening,
It's so easy to collect images with Collect that it's a little bit frustrating not to be able easily access it once they are in Aggregate.
It would be a great enhancement if Aggragete could store medias files into the database and into the file system or if Aggregate could purpose a way to access stored files by unique url.

@getodk-bot
Copy link
Member Author

Comment by nilesr
Friday Aug 04, 2017 at 16:16 GMT


Sync client has a way of pulling down all the instance files for all of your submissions down to a folder on your computer, so you can view them however you like. I haven't tried it though, and it might be ODK2 only.

The relevant method is

getAllTableInstanceFilesFromUri(uri, appId, tableId, schemaETag, dirToSaveDataTo);

@getodk-bot
Copy link
Member Author

Comment by kusaasira
Sunday Oct 01, 2017 at 09:57 GMT


@mathieubossaert Was this resolved?

@getodk-bot
Copy link
Member Author

Comment by mathieubossaert
Monday Oct 02, 2017 at 06:35 GMT


No because we run ODK 1 and I did not have time to take a look at the method... I'll do it in e few weeks.
Thanks a lot.

@getodk-bot
Copy link
Member Author

Comment by kusaasira
Monday Oct 02, 2017 at 06:46 GMT


Okay then. Let me know when you attempt to.

You're welcome.

@getodk-bot
Copy link
Member Author

Comment by mathieubossaert
Wednesday Oct 18, 2017 at 14:35 GMT


@kusaasira Thanks to @DamienFRAZZONI we are close to solve this in PostgreSQL. This article help us to transform bytea field to "large object" and then take advantage of lo_export() function :

https://gbatumbya.wordpress.com/2011/06/15/create-a-large-object-lo-from-binary-data-bytea-on-server-side/

@getodk-bot
Copy link
Member Author

Comment by mathieubossaert
Wednesday Oct 18, 2017 at 15:16 GMT


@kusaasira Here is an example.

Let's create the CAST as in the page discoverd by @DamienFRAZZONI :
https://gbatumbya.wordpress.com/2011/06/15/create-a-large-object-lo-from-binary-data-bytea-on-server-side/

CREATE EXTENSION lo;
CREATE OR REPLACE FUNCTION blob_write(lbytea bytea)
   RETURNS oid
   VOLATILE
   LANGUAGE plpgsql as
$f$
   DECLARE
      loid oid;
      lfd INTEGER;
      lsize INTEGER;
BEGIN
   IF(lbytea IS NULL) THEN
      RETURN NULL;
   END IF;
 
   loid := lo_create(0);
   lfd := lo_open(loid,131072);
   lsize := lowrite(lfd,lbytea);
   PERFORM lo_close(lfd);
   RETURN loid;
END;
$f$;
CREATE CAST (bytea AS oid) WITH FUNCTION blob_write(bytea) AS ASSIGNMENT;

PostgrSQL needs the oid of the object so we can't use a view, then we add a column to the "..._BLB" table and a trigger to populate the "lo" column :

ALTER TABLE odk."FORM_SAISIE_OBS_ORTHO_V1_1_PHOTO_BLB" ADD COLUMN lo oid;
UPDATE odk."FORM_SAISIE_OBS_ORTHO_V1_1_PHOTO_BLB" SET lo="VALUE"

SELECT lo_export("VALUE"::oid, '/home/my_repo/'|| to_char(blb."_LAST_UPDATE_DATE", 'YYYYMMDDHH24MISS') || '_' || "UNROOTED_FILE_PATH")
FROM odk."FORM_SAISIE_OBS_ORTHO_V1_1_PHOTO_BN" bn
JOIN odk."FORM_SAISIE_OBS_ORTHO_V1_1_PHOTO_REF" ref ON bn."_URI" = ref."_DOM_AURI" 
JOIN odk."FORM_SAISIE_OBS_ORTHO_V1_1_PHOTO_BLB" blb on blb."_URI" = ref."_SUB_AURI";

In fact we should populate the lo column, write the file on disk and then set to null the lo column.

It works great ! But in PostgreSQL.

@yanokwa yanokwa removed the Aggregate label Nov 28, 2017
@RemiDesgrange
Copy link

Hi,

what do you think about storing image to the disk, or, even better to an object store (amazon S3, minio, whatever...). I'm really not a big fan of storing images as blob in the database (even if large object are a great solution in postgresql).

@ggalmazor
Copy link
Contributor

Hi @RemiDesgrange! I agree with you on storing blobs in the database... I also don't like the idea and I don't know what consequences could this have in such a limited context as free-tier AppEngine that Aggregate needs to support.

Since we can neither rely on writing files to a local filesystem for compatibility reasons with AppEngine, your suggestion of using an external object store like S3 is a great idea :)

@RemiDesgrange
Copy link

RemiDesgrange commented Dec 7, 2017

And you can have abstractions around it. I didn't look at the code and my Java is a bit rusty but if I find the time I'll make a PR for file writing. S3 might require more work since you often want a presigned url for your image (secret bucket).

Do it exist an equivalent of S3 in google cloud world ?

@yanokwa
Copy link
Member

yanokwa commented Dec 7, 2017

@RemiDesgrange Yup. There's Google Cloud Storage. Pretty much all the cloud providers have some S3 like storage now.

@RemiDesgrange
Copy link

Yes ;-) I looked a bit at the code, where this kind of feature should go ? common ? And add a storage.properties file ?

@mathieubossaert
Copy link

Please let the choice. One of the forces of ODK is the possibility to set an independant, full aggregate / collect data solution outside the internet and without a great powerfull internet access.
Cloud services are not the same. From my point of view, too much cloud services to link make the solution more complex and tricky to maintain and more dependant

@RemiDesgrange
Copy link

My point was :

  • ability to store photo on the filesystem
  • since you can provide an abstraction around object store, also give the ability to store in it. In my first post I talk about minio. Minio is a S3 like, compatible with S3 API, object store.

My first point was : storing images in postgres is not the best idea when you deal with a lot of pictures, filesystem/objectstore are best candidate for that. Cloud provider provide it but you can have on premise for free.

@mathieubossaert
Copy link

Sorry I was focus on S3... I did not know minio and in fact such a tool can leave us autonomous :-)

@mathieubossaert
Copy link

mathieubossaert commented Jun 14, 2019

Hi to all. I reactivate this zombie discussion ;-)
Thanks to @RemiDesgrange I discovered and tried minio and I installed it to serve files from Aggregate. I still have some difficulties to avoid public bucket listing but it is a detail.
I Just create an automated and generic process (cron task) to query Aggregate about binary media columns from form tables and generate files from "large objects" :-).
Certainly it could be optimized !

https://framagit.org/mathieubossaert/sql_divers/snippets/3620

Image scalling down option will be hepful to avoid numerous too heavy file submission
https://docs.opendatakit.org/form-question-types/?highlight=select%20one#scaling-down-images

@kusaasira
Copy link

@mathieubossaert thanks to Google TranslateI managed to go through the link you provided up there. Anyway, I realise it is more specific to PostgreSQL. Let me try to rebuild a MySQL replica.

On the other hand, insteada of a cron job, a trigger could do for this case I guess.

You know how ODK Aggregate generates multiple tables for different forms dynamically? This must be specific per table. Otherwise kudos for the solution so far.

@mathieubossaert
Copy link

Thanks @kusaasira , I'm sorry you didn't saw that text is also in english.
And yes it is a PostgreSQL workaround to get files from blob fields.

Cron task is a better solution for me because I don't want to write a new trigger for each new form.

@derekmorr
Copy link

Has there been any progress on this issue? We are running into this problem with our ODK instance -- the PostgreSQL database keeps running out of memory when we try to load the ODK web UI as a data viewer. I definitely agree that storing images in a filesystem or object store such as S3 is the best approach. If they must be stored in PostgreSQL, then using a large object would be better than bytea as it would let us stream the results out and reduce memory usage.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants