GH-2: PostgreSQL database support #6

Merged
merged 44 commits into from
Feb 16, 2022

Conversation


@klown klown commented Dec 17, 2020

WIP

This code has two main Infusion components:

  • components to handle basic requests to add, retrieve, and modify records in a PostgreSQL database.
  • components to handle higher level operations supporting the data model as documented in "Future Data Model"
    • Phase I: support retrieval from the model data tables.

@CLAassistant

CLAassistant commented Dec 17, 2020

CLA assistant check
All committers have signed the CLA.


#### Operations API

##### `createOneTable(tableDef, deleteExisting)`
Member

The advantage of switching to PostgreSQL is the flexibility of SQL, which can join any number of tables. For example, to verify an access token associated with an OAuth client and a key, the SQL looks like:

```sql
SELECT authorizations.accessToken
FROM authorizations, keys, clientCredentials
WHERE keys.id = $inputKey
AND clientCredentials.oauth2ClientId = $inputOauth2ClientId
AND clientCredentials.oauth2ClientSecret = $inputOauth2ClientSecret
AND keys.id = authorizations.key
AND clientCredentials.oauth2ClientId = authorizations.clientCredentialId
```

When updating the preferences of a given key, the SQL could look like:

```sql
UPDATE prefsSafes
SET preferences = $preferences
FROM keys
WHERE keys.id = $inputKey
AND keys.prefsSafeId = prefsSafes.id
```

What I'm thinking is, instead of providing detailed APIs for creating/updating/inserting etc., it might be better to provide a simple API that takes a SQL statement as input and outputs the success/fail status along with the SQL result, such as the retrieved records. This would save API users from translating SQL into our API structure, and save us from maintaining these APIs.
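Such a pass-through API could be sketched as follows (a hypothetical illustration of the proposal, not code from this PR; `executeSql` and its result shape are assumptions):

```javascript
// Hypothetical sketch of a single-entry-point API: accept any SQL
// statement plus optional parameters, and report success/failure along
// with any retrieved records. The pool argument is assumed to be a
// node-postgres Pool, or anything exposing a compatible query() method.
async function executeSql(pool, statement, params) {
    try {
        const result = await pool.query(statement, params || []);
        return { ok: true, rows: result.rows };
    } catch (error) {
        return { ok: false, error: error.message };
    }
}
```

A caller would then run, e.g., `executeSql(pool, "SELECT 1 AS n")` and branch on the `ok` flag rather than handling rejections directly.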

Member

I'm not sure what the standard process is for this, so please take my comments with a grain of salt. I just worry that passing in the SQL directly wouldn't be RESTful, if that's something we'd want to maintain. Unless you mapped SQL commands to HTTP methods.

Also, how does security/permissions work? I'd imagine you'd want to lock down modifying the data, and for reading you may also want a system that can restrict portions of your preferences, e.g. only share my font size setting with my social media service.

Member

I think that Cindy is more talking about architectural API structure rather than necessarily anything that is exposed in a free-form API to the outside world. I agree that directly mapping to Postgres' relational model is preferable since I think we are all clear who won the 1980s battle between object-oriented thinkers and relational ones : P

Note that if we really did want to take this further and expose more free-form structures to the outside world, GraphQL is available as an adaptation of relational queries to an HTTP transport, but this would be unnecessarily heavyweight for our system.

I agree with Cindy that we should have a minimum number of architectural layers between the HTTP transport and the query level - people who work with our system will thank us for it later! (And I know that this is at odds with ways we've approached such problems in the past, and also with still-current ideas about abstraction of storage - but as our lengthy technology evaluation in the GPII demonstrated, one's choice of storage engine is actually in practice one of the least abstractable parts of a system, and it is perfectly valid for us at this point to say that our implementation is pretty closely tied to the choice of Postgres because we have devoted so much effort to analysing all the possible consequences of that choice in different contexts).

Member

> I'm not sure what the standard process is for this, so please take my comments with a grain of salt. I just worry that passing in the SQL directly wouldn't be RESTful, if that's something we'd want to maintain. Unless you mapped SQL commands to HTTP methods.

@jobara, Antranig has answered this part. These are cloud-side internal APIs that will be called by the server scripts behind the firewall. They are not exposed to the public.

> Also, how does security/permissions work? I'd imagine you'd want to lock down modifying the data, and for reading you may also want a system that can restrict portions of your preferences, e.g. only share my font size setting with my social media service.

The lock-down when modifying the data is automatically controlled by Postgres for concurrent data access. See Postgres explicit locking. In terms of accessing a portion of the preferences, Postgres supports the JSON and JSONB data types, so a portion of a JSON document can be retrieved with an expression such as `preferences -> 'fontSize'`. This is one of the reasons that Postgres was chosen.
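The JSONB retrieval just described could be sketched like so (hypothetical names: `getSinglePreference`, the `prefsSafes` table, and the `preferences` column follow the data model discussed in this PR, and `pool` is assumed to be a node-postgres Pool):

```javascript
// Illustrative sketch, not the merged implementation: fetch one
// preference out of a JSONB "preferences" column by key.
function getSinglePreference(pool, prefsSafeId, prefName) {
    // "->" returns the value as JSONB; "->>" would return it as text.
    // Both the id and the preference key travel as query parameters.
    return pool.query(
        'SELECT preferences -> $2 AS value FROM "prefsSafes" WHERE id = $1',
        [prefsSafeId, prefName]
    ).then(function (result) {
        return result.rows.length ? result.rows[0].value : undefined;
    });
}
```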

Member

@cindyli sorry, my mistake about the level that this is referring to. @amb26 thanks for the clarification. I agree, it would simplify the process for communicating with the database.

@cindyli the security question is probably irrelevant as I was thinking about this at a different level.

Member Author

@klown klown Feb 18, 2021

@cindyli @amb26

Oh. I was trying to shield users from having to deal with raw SQL. The idea behind the various API calls here was to capture frequent requests as a way of mitigating having to write complex SQL. Or even simple SQL for that matter. And I realize that is a gamble trying to guess the most frequent or popular such calls.

That said, if it's better to have an interface that permits any SQL statement, I can live with that. Here are a few ways forward.

The code currently uses Sequelize to interface between the Postgres database and node. Sequelize provides a raw query function. I'll check if it can handle your join example. If so, that might be a way to proceed.

A second approach is to use node's postgres module since it is a JavaScript wrapper for Postgres.

A third approach is some other node postgres module that I'm unaware of. If you can recommend a good one, I can look into it.

Member Author

@klown klown Feb 18, 2021

@jobara At the risk of beating a dead horse -- @amb26 and @cindyli have already answered, and you've acknowledged them -- users of the preferences server would not interact directly with the database. They would make RESTful GET or POST requests such as https://preferences.org/preferences/userid/retrieve?prefsSet=uio, or something along those lines. The details of that RESTful API are up for discussion in GH-5. Yes, the access needs to be secure, and that would be handled by OAuth2 and/or login credentials.

Member

> A second approach is to use node's postgres module since it is a JavaScript wrapper for Postgres.

I also looked at node-postgres, which is the pg module on npmjs. It looks promising to me.

Member

Node's postgres module ("pg") looks like an appropriate technology, especially since Sequelize is advertising that it is short of maintainers. I don't think we will expose an interface which expects external users to write arbitrary SQL, but exposing intermediate operations which just abstractly create "a table" is a level of abstraction that they won't usually benefit from either. We will write the SQL ourselves, and then package it behind a domain-appropriate API.

Member Author

node-postgres it is then, @cindyli @amb26.

Modified the fluid.postgresdb.request component and its tests as appropriate.

Modified:
- fluid.postgresdb.request for batch table definition,
- migrated the table definition files from Sequelize to SQL,
- unit tests for the request component,
- unit tests for the table definitions.

Renamed the function that bulk-creates tables, `createTables()`, to the generic `bulkQuery()`.

Changed the table definition files to be one line instead of an array of lines, and updated tests as appropriate.

Basic operations CREATE, INSERT, UPDATE, DELETE, and TRUNCATE
@klown klown marked this pull request as ready for review March 11, 2021 20:31
@klown
Member Author

klown commented Mar 11, 2021

@cindyli @amb26 @jobara

Here is the conversion of the postgres access from Sequelize to node-postgres -- ready for review.

Note that the data model is still in flux, as discussed at today's meeting. Still, I did try to remove all the extraneous material from the documentation and fulfill the "ToDos" from the last time it was edited. The result is at dataModel.md. The test data for these models needs work to bring it into line with this documentation, but since the data model is likely changing, I have not worked on that yet.

If you feel brave, I would like to know if `npm test` works on a fresh machine. But be forewarned: it runs a shell script that will download a Docker image of Postgres 13 and use that for the unit tests. The Docker image is about 160 MB. Also, I've only tested on my Mac laptop -- I have no idea if it will work on a Windows machine.

```shell
PGPORT=${PGPORT:-5432}
PGUSER=${PGUSER:-"admin"}

PGDATABASE="prefs_testdb"
```
Member

Looking at package.json where the test task is defined: since the PGDATABASE variable is configurable via an environment variable, it should probably be defined as `PGDATABASE=${PGDATABASE:-"prefs_testdb"}` instead of being hardcoded.

Member Author

Done. Also, the issue with running `export PGDATABASE=prefs_testdb PGPORT=5433; npm run startTestServer` should be fixed.

> You probably are aware that tests are missing for some APIs, such as bulkQuery, loadFromJSON, etc.

The tests are/were there. The test for loadFromJSON() was at line 217, and the one for bulkQuery() was at line 168. I've changed the test names to make this more obvious in the latest code.

> Can you add a section to the README about how to run the tests, and warn that running them will download a postgres docker image, as you did in your comment above?

Sure.

@cindyli
Member

cindyli commented Mar 12, 2021

I hit this error when running `export PGDATABASE=prefs_testdb PGPORT=5433; npm run startTestServer`:

```
Status: Downloaded newer image for postgres:13.1-alpine
ab9705fb9d34800d0caa2d1f7c0b6f1268a2d4432f5a86b3b758dc018c41d891
2021-03-12 16:38:32 - Checking that PostgresDB is ready...
/var/run/postgresql:5433 - no response
/var/run/postgresql:5433 - accepting connections
2021-03-12 16:38:35 - Creating prefs_testdb database ...
2021-03-12 16:38:35 - createdb: error: could not connect to database template1: FATAL:  role "admin" does not exist
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! preferencesServer@0.1.0 startTestServer: `./src/database/scripts/startDockerPostgres.sh`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the preferencesServer@0.1.0 startTestServer script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/cindyli/.npm/_logs/2021-03-12T21_38_35_247Z-debug.log
```


#### Operations API

##### `query(queryString)` (Invoker)
Member

The name query gives the impression of handling only SELECT statements, but this function is used for executing all kinds of SQL statements. Do you think it would be better to call it executeSQL()?

Member Author

The name is from the node-postgres library itself. Its query() method is used for any SQL statement, not just SELECT. Around the third time that I had typed that.pool.query(), I decided that a wrapper for it would simplify the situation; hence that.query().

With the move to using JavaScript classes, the method is now inherited. However, since I think it's a good idea to attach a rejection handler for logging errors, I've renamed the wrapper to runSql().

The returned `Promise` is configured with a rejection handler that logs any
error.
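The wrapper plus logging rejection handler described above might be sketched roughly like this (an illustration under stated assumptions, not the merged implementation; the `pool` argument stands in for the component's node-postgres Pool):

```javascript
// Sketch of a runSql() wrapper: forward the statement to the pool and
// attach a rejection handler that logs errors. The error is re-thrown so
// that callers still observe the failure on the returned promise.
function runSql(pool, statement) {
    return pool.query(statement).catch(function (error) {
        console.error("SQL error: ", error.message);
        throw error;
    });
}
```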

##### `bulkQuery(queryArray)` (Invoker)
Member

Probably rename the function to executeSQLsFromArray()?

Member Author

Okay, how about runSqlArray()?

Member

runSqlArray() sounds good.
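For illustration, a minimal `runSqlArray()` along the lines agreed here might look like the following (a sketch, not the merged code; everything beyond the name is an assumption):

```javascript
// Run each statement in order against a node-postgres pool and resolve
// with the array of results. Sequential execution is assumed so that,
// for example, a CREATE TABLE statement completes before an INSERT that
// depends on the table.
async function runSqlArray(pool, statements) {
    const results = [];
    for (const statement of statements) {
        results.push(await pool.query(statement));
    }
    return results;
}
```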

@cindyli
Member

cindyli commented Mar 12, 2021

You probably are aware that tests are missing for some APIs, such as bulkQuery, loadFromJSON, etc.

function, but where each element is quoted with a single quote. Also, if
a value is an object, it is stringified as well as quoted.

##### `fluid.postgresdb.maybeStringify(aValue)` (Function)
Member

What's this API for? Questions I have about the two data types it stringifies:

  1. Object: when an object is given, would it be possible that the user wants to insert it as a JSON data type rather than a string type?
  2. Array: what is the intention of replacing the square brackets with curly brackets?

Member Author

This API was part of the loadFromJSON() machinery, and is no longer used given @amb26's comment/suggestion below. As was said there, it's not advisable to create these strings by hand. I've replaced the JSON-to-SQL translation code with node-postgres-format.

Still, the purpose was to get the quotation marks and other special symbols correct for JSONB and Array column types. The reason for the curly brackets is that this is the way to INSERT into an array column, for example: `INSERT INTO "tableName" (roles) VALUES ('{user,admin}');`. Here the roles column is an array and the values are specified by '{user,admin}'. Stringifying alone gives '[user,admin]', which is incorrect SQL. However, that's all moot since node-postgres-format does the conversion now.
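For comparison, here is a parameterised sketch that avoids hand-building the array literal entirely (hypothetical: `insertRoles` and the table/column names are assumptions for the example; node-postgres serialises a JavaScript array value into Postgres array syntax when it is passed as a query parameter):

```javascript
// Insert into an array-typed column by passing the JS array as a query
// parameter. Identifiers cannot be parameterised, so tableName is
// assumed to be trusted (not user input).
function insertRoles(pool, tableName, roles) {
    return pool.query(
        'INSERT INTO "' + tableName + '" (roles) VALUES ($1)',
        [roles]
    );
}
```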

```javascript
 * errors.
 *
 * @param {Object} pool - node-postgres Pool instance to use for querying.
 * @param {String} queryString - SQL query.
```
Member

Be careful with using "SQL query" when what you mean is "SQL statement". As mentioned earlier, the word "query" gives the impression of a SELECT statement.

This description could probably be changed to: "A string that contains one or more SQL statements."

This comment applies to other places where "SQL query" is used. Thanks.

Member Author

Right, I've adjusted the comments as appropriate.

@cindyli
Member

cindyli commented Mar 12, 2021

Can you add a section in README about how to run tests and warn about running test will download a postgres docker image, as what you did in the comment?

```javascript
fluid.postgresdb.loadFromJSON = function (that, tableName, jsonArray) {
    var insertions = [];
    fluid.each(jsonArray, function (aRecord) {
        var columns = fluid.postgresdb.stringifyJoinKeys(Object.keys(aRecord));
```
Member

As the node-postgres docs explain, raw string-bashing to assemble queries like this is highly dangerous. You should apply a parameterised query as documented here: https://node-postgres.com/features/queries#parameterized-query

As @cindyli has suggested, the various "stringify" functions below are problematic and should be removed.
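A parameterised version of the insertion loop might look roughly like this (a hypothetical sketch following the node-postgres parameterized-query documentation, not the merged code; column names are still interpolated as quoted identifiers and are assumed to be trusted):

```javascript
// For each record, build a column list and $1..$n placeholders, and send
// the record's values as query parameters instead of concatenating them
// into the SQL string.
function loadFromJSON(pool, tableName, jsonArray) {
    return Promise.all(jsonArray.map(function (aRecord) {
        const columns = Object.keys(aRecord);
        const columnList = columns.map(function (c) { return '"' + c + '"'; }).join(", ");
        const placeholders = columns.map(function (c, i) { return "$" + (i + 1); }).join(", ");
        return pool.query(
            'INSERT INTO "' + tableName + '" (' + columnList + ") VALUES (" + placeholders + ")",
            columns.map(function (c) { return aRecord[c]; })
        );
    }));
}
```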

```javascript
 * @param {Any} aValue - Value to stringify if necessary.
 * @return {Any|String} - The value as is, or stringified.
 */
fluid.postgresdb.maybeStringify = function (aValue) {
```
Member

Conditional conversion functions like this are a code smell and likely an injection risk. It should be the responsibility of the caller to convert data into a form which is already suitable to be inserted into an SQL table rather than relying on some implicit expectations.

@jobara
Member

jobara commented Mar 13, 2021

@klown I wasn't able to get the tests to run locally on my Mac. I've sent you a PR to update linting to fluid-lint-all, add CI configuration, and clean up the dependencies.


```javascript
fluid.registerNamespace("fluid.postgresdb");

fluid.postgresdb.tableDefinitions = `
```
Member

Could/should these be stored in plain .sql text files rather than be embedded in code?

Member Author

They could, and it should be supported. Storing them in .js files makes them easy to load using require(), and allows JavaScript template strings. But I can see the need for inputting raw SQL from a .sql file.

I've looked through node-postgres and can't find anything like a readSqlFromFile(), so I'll add that as a method. It should be short: a call to fs.readFile() or fs.readFileSync(). Maybe better is a runSqlFromFile() method that loads the SQL and then executes it.

@amb26 amb26 merged commit f9a5c87 into fluid-project:main Feb 16, 2022