GH-2: PostgreSQL database support #6

Merged
merged 44 commits into from
Feb 16, 2022

Conversation


@klown klown commented Dec 17, 2020

WIP

This code has two main Infusion components:

  • components to handle basic requests to add, retrieve, and modify records in a PostgreSQL database.
  • components to handle higher level operations supporting the data model as documented in "Future Data Model"
    • Phase I: support retrieval from the model data tables.

@CLAassistant

CLAassistant commented Dec 17, 2020

CLA assistant check
All committers have signed the CLA.


#### Operations API

##### `createOneTable(tableDef, deleteExisting)`
Member

The advantage of switching to PostgreSQL is the flexibility of SQL, which can join any number of tables. For example, to verify an access token associated with an OAuth client and a key, the SQL looks like:

```sql
SELECT authorizations.accessToken
FROM authorizations, keys, clientCredentials
WHERE keys.id = $inputKey
AND clientCredentials.oauth2ClientId = $inputOauth2ClientId
AND clientCredentials.oauth2ClientSecret = $inputOauth2ClientSecret
AND keys.id = authorizations.key
AND clientCredentials.oauth2ClientId = authorizations.clientCredentialId
```

When updating the preferences of a given key, the SQL could look like:

```sql
UPDATE prefsSafes
SET preferences = $preferences
FROM keys
WHERE keys.id = $inputKey
AND keys.prefsSafeId = prefsSafes.id
```

What I'm thinking is, instead of providing detailed APIs for creating/updating/inserting etc., it might be better to provide a simple API that takes a SQL statement as input and outputs the success/fail status along with the SQL result, such as the retrieved records. This would save API users from translating SQL into our API structure, and save us from maintaining these APIs.
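Such a pass-through API could be sketched as follows (a hypothetical illustration of the proposal, not code from this PR; `executeSql` and its result shape are assumptions):

```javascript
// Hypothetical sketch of a single-entry-point API: accept any SQL
// statement plus optional parameters, and report success/failure along
// with any retrieved records. The pool argument is assumed to be a
// node-postgres Pool, or anything exposing a compatible query() method.
async function executeSql(pool, statement, params) {
    try {
        const result = await pool.query(statement, params || []);
        return { ok: true, rows: result.rows };
    } catch (error) {
        return { ok: false, error: error.message };
    }
}
```

A caller would then run, e.g., `executeSql(pool, "SELECT 1 AS n")` and branch on the `ok` flag rather than handling rejections directly.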

Member

I'm not sure what the standard process is for this, so please take my comments with a grain of salt. I just worry that passing in the SQL directly wouldn't be RESTful, if that's something we'd want to maintain. Unless you mapped SQL commands to HTTP methods.

Also, how does security/permissions work? I'd imagine you'd want to lock down modifying the data, and for reading you may also want a system that can restrict portions of your preferences, e.g. only share my font size setting with my social media service.

Member

I think that Cindy is more talking about architectural API structure rather than necessarily anything that is exposed in a free-form API to the outside world. I agree that directly mapping to Postgres' relational model is preferable since I think we are all clear who won the 1980s battle between object-oriented thinkers and relational ones : P

Note that if we really did want to take this further and expose more free-form structures to the outside world, GraphQL is available as an adaptation of relational queries to an HTTP transport, but this would be unnecessarily heavyweight for our system.

I agree with Cindy that we should have a minimum number of architectural layers between the HTTP transport and the query level - people who work with our system will thank us for it later! (And I know that this is at odds with ways we've approached such problems in the past, and also with still-current ideas about abstraction of storage - but as our lengthy technology evaluation in the GPII demonstrated, one's choice of storage engine is actually in practice one of the least abstractable parts of a system, and it is perfectly valid for us at this point to say that our implementation is pretty closely tied to the choice of Postgres because we have devoted so much effort to analysing all the possible consequences of that choice in different contexts).

Member

> I'm not sure what the standard process is for this, so please take my comments with a grain of salt. I just worry that passing in the SQL directly wouldn't be RESTful, if that's something we'd want to maintain. Unless you mapped SQL commands to HTTP methods.

@jobara, Antranig has answered this part. These are cloud-side internal APIs that will be called by the server scripts behind the firewall. They are not exposed to the public.

> Also, how does security/permissions work? I'd imagine you'd want to lock down modifying the data, and for reading you may also want a system that can restrict portions of your preferences, e.g. only share my font size setting with my social media service.

The lock-down when modifying the data is automatically controlled by Postgres for concurrent data access. See Postgres explicit locking. In terms of accessing a portion of the preferences, Postgres supports the JSON and JSONB data types, so a portion of a JSON document can be retrieved with an expression such as `preferences -> 'fontSize'`. This is one of the reasons that Postgres was chosen.
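The JSONB retrieval just described could be sketched like so (hypothetical names: `getSinglePreference`, the `prefsSafes` table, and the `preferences` column follow the data model discussed in this PR, and `pool` is assumed to be a node-postgres Pool):

```javascript
// Illustrative sketch, not the merged implementation: fetch one
// preference out of a JSONB "preferences" column by key.
function getSinglePreference(pool, prefsSafeId, prefName) {
    // "->" returns the value as JSONB; "->>" would return it as text.
    // Both the id and the preference key travel as query parameters.
    return pool.query(
        'SELECT preferences -> $2 AS value FROM "prefsSafes" WHERE id = $1',
        [prefsSafeId, prefName]
    ).then(function (result) {
        return result.rows.length ? result.rows[0].value : undefined;
    });
}
```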

Member

@cindyli sorry, my mistake about the level that this is referring to. @amb26 thanks for the clarification. I agree, it would simplify the process for communicating with the database.

@cindyli the security question is probably irrelevant as I was thinking about this at a different level.

Member Author

@klown klown Feb 18, 2021

@cindyli @amb26

Oh. I was trying to shield users from having to deal with raw SQL. The idea behind the various API calls here was to capture frequent requests as a way of mitigating having to write complex SQL. Or even simple SQL for that matter. And I realize that is a gamble trying to guess the most frequent or popular such calls.

That said, if it's better to have an interface that permits any SQL statement, I can live with that. Here are a few ways forward.

The code currently uses Sequelize to interface between the Postgres database and node. Sequelize provides a raw query function. I'll check if it can handle your join example. If so, that might be a way to proceed.

A second approach is to use node's postgres module since it is a JavaScript wrapper for Postgres.

A third approach is some other node postgres module that I'm unaware of. If you can recommend a good one, I can look into it.

Member Author

@klown klown Feb 18, 2021

@jobara At the risk of beating a dead horse -- @amb26 and @cindyli have already answered, and you've acknowledged them -- users of the preferences server would not interact directly with the database. They would make RESTful GET or POST requests such as https://preferences.org/preferences/userid/retrieve?prefsSet=uio, or something along those lines. The details of that RESTful API are up for discussion in GH-5. Yes, the access needs to be secure, and that would be handled by OAuth2 and/or login credentials.

Member

> A second approach is to use node's postgres module since it is a JavaScript wrapper for Postgres.

I also looked at node-postgres, which is the pg module on npmjs. It looks promising to me.

Member

Node's postgres module ("pg") looks like an appropriate technology, especially since Sequelize is advertising that it is short of maintainers. I don't think we will expose an interface which expects external users to write arbitrary SQL, but exposing intermediate operations which just abstractly create "a table" is a level of abstraction that they won't usually benefit from either. We will write the SQL ourselves, and then package it behind a domain-appropriate API.

Member Author

node-postgres it is then, @cindyli @amb26.

Modified the fluid.postgresdb.request component and its tests as appropriate.

Modified:
- fluid.postgresdb.request for batch table definition,
- migrated the table definition files from Sequelize to SQL,
- unit tests for the request component,
- unit tests for the table definitions.

Renamed the function that bulk-creates tables, `createTables()`, to the generic `bulkQuery()`.

Changed the table definition files to be one line instead of an array of lines, and updated tests as appropriate.

Basic operations CREATE, INSERT, UPDATE, DELETE, and TRUNCATE
@klown klown marked this pull request as ready for review March 11, 2021 20:31
@klown
Member Author

klown commented Mar 11, 2021

@cindyli @amb26 @jobara

Here is the conversion of the postgres access from Sequelize to node-postgres -- ready for review.

Note that the data model is still in flux, as discussed at today's meeting. Still, I did try to remove all the extraneous material from the documentation and fulfill the "ToDos" from the last time it was edited. The result is at dataModel.md. The test data for these models needs work to bring it into line with this documentation, but since the data model is likely changing, I have not worked on that yet.

If you feel brave, I would like to know if `npm test` works on a fresh machine. But be forewarned: it runs a shell script that will download a Docker image of Postgres 13 and use that for the unit tests. The Docker image is about 160 MB. Also, I've only tested on my Mac laptop -- I have no idea if it will work on a Windows machine.

```shell
PGPORT=${PGPORT:-5432}
PGUSER=${PGUSER:-"admin"}

PGDATABASE="prefs_testdb"
```
Member

Looking at package.json where the test task is defined: since the PGDATABASE variable is configurable via an environment variable, it should probably be defined as `PGDATABASE=${PGDATABASE:-"prefs_testdb"}` instead of being hardcoded.

Member Author

Done. Also, the issue with running `export PGDATABASE=prefs_testdb PGPORT=5433; npm run startTestServer` should be fixed.

> You probably are aware that tests are missing for some APIs, such as bulkQuery, loadFromJSON, etc.

The tests are/were there. The test for loadFromJSON() was at line 217, and the one for bulkQuery() was at line 168. I've changed the test names to make this more obvious in the latest code.

> Can you add a section to the README about how to run the tests, and warn that running them will download a postgres docker image, as you did in your comment above?

Sure.

@cindyli
Member

cindyli commented Mar 12, 2021

I hit this error when running `export PGDATABASE=prefs_testdb PGPORT=5433; npm run startTestServer`:

```
Status: Downloaded newer image for postgres:13.1-alpine
ab9705fb9d34800d0caa2d1f7c0b6f1268a2d4432f5a86b3b758dc018c41d891
2021-03-12 16:38:32 - Checking that PostgresDB is ready...
/var/run/postgresql:5433 - no response
/var/run/postgresql:5433 - accepting connections
2021-03-12 16:38:35 - Creating prefs_testdb database ...
2021-03-12 16:38:35 - createdb: error: could not connect to database template1: FATAL:  role "admin" does not exist
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! preferencesServer@0.1.0 startTestServer: `./src/database/scripts/startDockerPostgres.sh`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the preferencesServer@0.1.0 startTestServer script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/cindyli/.npm/_logs/2021-03-12T21_38_35_247Z-debug.log
```


#### Operations API

##### `query(queryString)` (Invoker)
Member

The name query gives the impression of handling only SELECT statements, but this function is used for executing all kinds of SQL statements. Do you think it would be better to call it executeSQL()?

Member Author

The name is from the node-postgres library itself. Its query() method is used for any SQL statement, not just SELECT. Around the third time that I had typed that.pool.query(), I decided that a wrapper for it would simplify the situation; hence that.query().

With the move to using JavaScript classes, the method is now inherited. However, since I think it's a good idea to attach a rejection handler for logging errors, I've renamed the wrapper to runSql().

The returned `Promise` is configured with a rejection handler that logs any
error.
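The wrapper plus logging rejection handler described above might be sketched roughly like this (an illustration under stated assumptions, not the merged implementation; the `pool` argument stands in for the component's node-postgres Pool):

```javascript
// Sketch of a runSql() wrapper: forward the statement to the pool and
// attach a rejection handler that logs errors. The error is re-thrown so
// that callers still observe the failure on the returned promise.
function runSql(pool, statement) {
    return pool.query(statement).catch(function (error) {
        console.error("SQL error: ", error.message);
        throw error;
    });
}
```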

##### `bulkQuery(queryArray)` (Invoker)
Member

Probably rename the function to executeSQLsFromArray()?

Member Author

Okay, how about runSqlArray()?

Member

runSqlArray() sounds good.
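For illustration, a minimal `runSqlArray()` along the lines agreed here might look like the following (a sketch, not the merged code; everything beyond the name is an assumption):

```javascript
// Run each statement in order against a node-postgres pool and resolve
// with the array of results. Sequential execution is assumed so that,
// for example, a CREATE TABLE statement completes before an INSERT that
// depends on the table.
async function runSqlArray(pool, statements) {
    const results = [];
    for (const statement of statements) {
        results.push(await pool.query(statement));
    }
    return results;
}
```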

@cindyli
Member

cindyli commented Mar 12, 2021

You probably are aware that tests are missing for some APIs, such as bulkQuery, loadFromJSON, etc.

function, but where each element is quoted with a single quote. Also, if
a value is an object, it is stringified as well as quoted.

##### `fluid.postgresdb.maybeStringify(aValue)` (Function)
Member

What's this API for? Questions I have about the two data types it stringifies:

  1. Object: when an object is given, would it be possible that the user wants to insert it as a JSON data type rather than a string type?
  2. Array: what is the intention of replacing the square brackets with curly brackets?

Member Author

This API was part of the loadFromJSON() machinery, and is no longer used given @amb26's comment/suggestion below. As was said there, it's not advisable to create these strings by hand. I've replaced the JSON-to-SQL translation code with node-postgres-format.

Still, the purpose was to get the quotation marks and other special symbols correct for JSONB and Array column types. The reason for the curly brackets is that this is the way to INSERT into an array column, for example: `INSERT INTO "tableName" (roles) VALUES ('{user,admin}');`. Here the roles column is an array and the values are specified by '{user,admin}'. Stringifying alone gives '[user,admin]', which is incorrect SQL. However, that's all moot since node-postgres-format does the conversion now.
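For comparison, here is a parameterised sketch that avoids hand-building the array literal entirely (hypothetical: `insertRoles` and the table/column names are assumptions for the example; node-postgres serialises a JavaScript array value into Postgres array syntax when it is passed as a query parameter):

```javascript
// Insert into an array-typed column by passing the JS array as a query
// parameter. Identifiers cannot be parameterised, so tableName is
// assumed to be trusted (not user input).
function insertRoles(pool, tableName, roles) {
    return pool.query(
        'INSERT INTO "' + tableName + '" (roles) VALUES ($1)',
        [roles]
    );
}
```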

```javascript
 * errors.
 *
 * @param {Object} pool - node-postgres Pool instance to use for querying.
 * @param {String} queryString - SQL query.
```
Member

Be careful with using "SQL query" when what you mean is "SQL statement". As mentioned earlier, the word "query" gives the impression of a SELECT statement.

This description could probably be changed to: "A string that contains one or more SQL statements."

This comment applies to other places where "SQL query" is used. Thanks.

Member Author

Right, I've adjusted the comments as appropriate.

@cindyli
Member

cindyli commented Mar 12, 2021

Can you add a section in README about how to run tests and warn about running test will download a postgres docker image, as what you did in the comment?

```javascript
fluid.postgresdb.loadFromJSON = function (that, tableName, jsonArray) {
    var insertions = [];
    fluid.each(jsonArray, function (aRecord) {
        var columns = fluid.postgresdb.stringifyJoinKeys(Object.keys(aRecord));
```
Member

As the node-postgres docs explain, raw string-bashing to assemble queries like this is highly dangerous. You should apply a parameterised query as documented here: https://node-postgres.com/features/queries#parameterized-query

As @cindyli has suggested, the various "stringify" functions below are problematic and should be removed.
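A parameterised version of the insertion loop might look roughly like this (a hypothetical sketch following the node-postgres parameterized-query documentation, not the merged code; column names are still interpolated as quoted identifiers and are assumed to be trusted):

```javascript
// For each record, build a column list and $1..$n placeholders, and send
// the record's values as query parameters instead of concatenating them
// into the SQL string.
function loadFromJSON(pool, tableName, jsonArray) {
    return Promise.all(jsonArray.map(function (aRecord) {
        const columns = Object.keys(aRecord);
        const columnList = columns.map(function (c) { return '"' + c + '"'; }).join(", ");
        const placeholders = columns.map(function (c, i) { return "$" + (i + 1); }).join(", ");
        return pool.query(
            'INSERT INTO "' + tableName + '" (' + columnList + ") VALUES (" + placeholders + ")",
            columns.map(function (c) { return aRecord[c]; })
        );
    }));
}
```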

```javascript
 * @param {Any} aValue - Value to stringify if necessary.
 * @return {Any|String} - The value as is, or stringified.
 */
fluid.postgresdb.maybeStringify = function (aValue) {
```
Member

Conditional conversion functions like this are a code smell and likely an injection risk. It should be the responsibility of the caller to convert data into a form which is already suitable to be inserted into an SQL table rather than relying on some implicit expectations.

@jobara
Member

jobara commented Mar 13, 2021

@klown I wasn't able to get the tests to run locally on my Mac. I've sent you a PR to update linting to fluid-lint-all, add CI configuration, and clean up the dependencies.


```javascript
fluid.registerNamespace("fluid.postgresdb");

fluid.postgresdb.tableDefinitions = `
```
Member

Could/should these be stored in plain .sql text files rather than be embedded in code?

Member Author

They could, and it should be supported. Storing them in .js files makes them easy to load using require(), and allows JavaScript template strings. But I can see the need for inputting raw SQL from a .sql file.

I've looked through node-postgres and can't find anything like a readSqlFromFile(), so I'll add that as a method. It should be short: a call to fs.readFile() or fs.readFileSync(). Maybe better is a runSqlFromFile() method that loads the SQL and then executes it.

@amb26 amb26 merged commit f9a5c87 into fluid-project:main Feb 16, 2022