Skip to content
This repository has been archived by the owner on Oct 22, 2020. It is now read-only.

feat: store data in individual database tables #497

Merged
merged 23 commits into from
Sep 6, 2018
Merged

Conversation

tamlyn
Copy link
Contributor

@tamlyn tamlyn commented Aug 18, 2018

Background

  • Store manuscripts, teams and files in their own tables (keep users in entities)
  • Use knex for query building
  • Change some columns to arrays instead of JSONB due to a quirk in pg's query formatting.
  • Use , instead of . in column names (e.g. meta,title) due to dots being special in knex

Any relevant tickets

Partially addresses #473 (users and identities still to do)

How has this been tested?

Existing tests updated, new unit tests written.

@diversemix
Copy link
Contributor

diversemix commented Aug 28, 2018

Well after a long day of trying to get the tests to work... there seems to be a few issues that need to be sorted.

  • The queries do not work if the tables they are joining on are empty
  • There are "random" errors that a given table does not exist.
  • There are "random" errors that a given relations do not exist.
  • deadlocks are detected

Ideas:

  • In the middle of trying to create a framework for data-access.js
  • Improve errors lots of them have the stack:
     at Connection.Object.<anonymous>.Connection.parseE (node_modules/pg/lib/connection.js:553:11)
     at Connection.Object.<anonymous>.Connection.parseMessage (node_modules/pg/lib/connection.js:378:19)
     at Socket.<anonymous> (node_modules/pg/lib/connection.js:119:22)

@diversemix
Copy link
Contributor

I think I've found the problem (at 11:33pm!) ... needs a fix in create-tables.js like:

      let deleteSQL = ""
      await Promise.all(
        // TODO this is dangerous, change it
        rows.map(row => deleteSQL += `DROP TABLE "${row.tablename}" CASCADE;`),
      )
      await db.query(deleteSQL)

or something a bit more elegant!

files: [{ filename: 'manuscript.pdf' }],
})
})
// it.skip('extracts title from PDF', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warnings don't fail the build, so you don't need to comment this out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it does stop you making a commit ;P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't realise, I may not be running the commit hook locally... I suggest deleting the test and addressing it in #493, rather than commenting it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good plan!

@@ -28,10 +32,12 @@ describe('Submission', () => {
const user = await User.save(userData)
userId = user.id
mailer.clearMails()
initializing = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to have an initializing parameter. Jest will wait for the async function to complete before running the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see details here and here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right we shouldn't. However I'd like to leave them in - that way if anyone introduces code that makes the tests fail because of async init or tear down then we will easily know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth using a promise to make it clearer that we're waiting for the database to be initialised. Using an assertion says to me that we're testing the database initialisation.

i.e.

 let initializeDb;

 const initialize = async () => {
    replaySetup('success')
    await Promise.all([
      fs.remove(config.get('pubsweet-server.uploads')),
       createTables(true),
     ])
    const user = await User.save(userData)
    userId = user.id
    mailer.clearMails()
 }

 beforeEach(async () => {
    initializeDb = initialize();
 });

 ...

 it('Returns empty object when there are no manuscripts in the db', async () => {
      await initializeDb();
      const manuscript = await Query.currentSubmission({}, {}, { user: userId })
      expect(manuscript).toBe(null)
 })

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo yeah I like that 👍
.. you want to add it? @damnhipster

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This initializing stuff is unnecessary noise. All it is asserting is that the beforeEach function completed before the test started but this is guaranteed by Jest already.

If a bug in the initialisation logic is introduced (e.g. forgetting to await a statement) this won't catch it because the beforeEach still completed and set initializing back to false. The most likely way that assertion could fail is if an exception is thrown in the beforeEach. But in that case it's confusing because the initialisation failed so initializing should really be false.

The promise solution is bypassing Jest's automatic waiting in favour of explicit manual waiting. Surely that's likely to lead to more bugs, not fewer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to leave it in there... it helped me find the issue in pubsweet with the asynchronous drop tables. So it would catch issues with pubsweet? Please correct if I am wrong/you disagree.

pool.testDbName = mockTestDbName

return pool
})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this stuff means we can no longer parallelise test runs (need to run jest with --runInBand to avoid errors on multi-core machines). But we have so few tests that for the time being it's only a couple of seconds faster when running in parallel. The gain in simplicity is probably worth the loss in performance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't the simplicity ... it was the fact that jest.mock doesn't work. There are three places db.js is used in the project and the jest mock would only work for one of them. I did try to explicitly mock out the paths but to no avail. Do you know how to ensure that jest mocks everything under node_modules ? @tamlyn

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just had a look too and gave up. It's a hacky solution which is good when it works but hard to debug when it doesn't. That's what I means about simplicity. At least it's easier to reason about what's going on when you have a single database.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, just a shame it doesn't work. I just saw your commit (e615fc1) to put in the "runInBand" flag, looks good. I was wondering where is the best place to put in the tests for selectById (and status) to test the queries.... At the data-access level it gets tested to make sure it returns the correct object id ... but it doesn't check exactly what is returned. I'm thinking this should go in the resolve.test ...? ... what's the thinking @damnhipster @tamlyn ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Hem's help I realised that the root of the problem here was that pubsweet-server was present multiple times in the dependency tree. This is a problem because each one gets its own database pool and it introduces subtle problems as seen here. Ensuring that all pubsweet dependencies are always upgraded together avoids this.

I've re-enabled parallel test execution for now. It will become more valuable as the number of tests increases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow! That is fantastic! So how was the dependency tree fixed ... we'll talk tomorrow - great work again 👍

Use DISTINCT to limit files and teams returned in queries.
Recursively delete linked files and teams when deleting a manuscript.
Force Jest to run tests in serial.
Reduce verbosity of data access by dynamically building SQL queries.
Use comma as separator for nested columns because dot is special in Knex and colon is special in Objection.
return { ...entity, [columnName]: value }
}, {})

module.exports = { rowToEntity, entityToRow }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions look like they're better off living under server/xpub/entities/util.js, as they're specifically for entities.

Update teams instead of replacing them.
Add some tests.
@tamlyn tamlyn changed the title WIP feat: store data in individual database tables feat: store data in individual database tables Sep 4, 2018
]

const dataAccess = {
async selectById(id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diversemix In response to your comment about testing selectById, given that it's a function on the exported dataAccess object, it should be tested in dataAccess.test.js

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... the issue with that is we need to create files and teams within that script. But I guess this is a reflection on the hierarchy of objects. Maybe we directly use the corresponding entity's data-access functions to do that?

@diversemix
Copy link
Contributor

diversemix commented Sep 6, 2018

I'm trying to run the tests on this branch ... and I started with a fresh database and node_modules and I keep seeing #539

Is anyone else seeing this?

const lodash = require('lodash')
const dataAccess = require('./data-access')

const empty = { teamMembers: [] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug - fixing as part of the testing changes going it.

@diversemix
Copy link
Contributor

@tamlyn I've updated with the tests and made that minor fix to the definition of empty for File

Copy link
Contributor

@diversemix diversemix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm


describe('FileAccessLayer', () => {
beforeEach(async () => {
testId = await initializeDatabase()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're already assigning testId to the outer variable in initializeDatabase.

@diversemix diversemix merged commit 6f73ff1 into develop Sep 6, 2018
@diversemix diversemix deleted the 423-sql-schema branch September 6, 2018 16:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants