Knex + Plugins (Multiple vs Single Database) #1598

andrewthauer · 2020-07-13T13:13:36Z

The knex database connections currenty requires a different database for each plugin (catalog & auth so far). Is this the intended design or will a single database be supported?

Here are some considerations:

A single database would make it easier to provide a single DATABASE_URL environment variable (e.g. postgresql://user:password@host/db_name) and feed it into the knex connection property.
Multiple databases keep things isolated and avoids collisions for other plugins
Multiple databases could make secret management more complicated
Backup strategy for single database would likely be less complex
Backup strategy for multiple databases could be more flexible and secure
Multiple databases requires different databases that may need to be created ahead of time (if the app does not have sufficient permissions to create new databases)

My current solution for dealing with this now is:

// per plugin connection
const baseUrl = process.env.DATABASE_URL; // `postgresql://user:password@host/backstage`
const databaseUrl = `${baseUrl}_${plugin}`;
const dbConfig = { client: 'pg', connection: databaseUrl };
// use dbConfig in knex

This is convention based and seems error prone. It would break for instance if additional query params where adding to the url.

The text was updated successfully, but these errors were encountered:

Rugvip · 2020-07-13T13:22:49Z

Duplicate of #1034, but happy to continue the discussion here tbh as you're bringing up some good points.

andrewthauer · 2020-07-13T19:09:43Z

Sorry didn't notice there was already an issue for this :(

Rugvip · 2020-07-14T16:25:50Z

My thinking is that we would default to requiring a separate DB for each backend plugin, but for DBs that support multi-db + multi-schema, we could support splitting the plugins by schema instead of DB. I outlined my reasoning for that here: #1527 (comment)

Regarding the administration overhead I think a decent standard pattern could be to provide a single DBMS account that has permissions to create DBs, and let the backend create and migrate all databases for each plugin. Then for a more hardened and isolated deployment it would be possible to create the DBs ahead of time and use separate accounts for each backend plugin if desired. Even splitting into multiple DBMS instances if needed.

andrewthauer · 2020-07-15T10:52:53Z

@Rugvip - This seems like a pragmatic approach. As discussed in #1527, it would be good to work towards defining some standard recommendations and patterns to address some of these considerations. I'm getting close to having a deployable instance up and running internally, so I'll loop back once I have some further feedback.

andrewthauer · 2020-08-10T11:24:12Z

#1882 seems somewhat relevant to this as well.

lowjoel · 2020-09-14T09:22:55Z

@Rugvip I've got some time this week to work on this -- is your statement

Regarding the administration overhead I think a decent standard pattern could be to provide a single DBMS account that has permissions to create DBs, and let the backend create and migrate all databases for each plugin. Then for a more hardened and isolated deployment it would be possible to create the DBs ahead of time and use separate accounts for each backend plugin if desired. Even splitting into multiple DBMS instances if needed.

still accurate today? Should I put in effort to implement the code for this? (I don't know how much effort it would be to massage the existing database code though.) If I should, then it'll be a case of using the main admin account to generate databases, users, and their corresponding passwords and shipping them off to plugins. At the same time that would be rather messy since we'd need to generate the passwords and store them somehow (unless we don't use passwords and generate a reproducible one for each of them? which seems bad.)

Are we still going in the direction of multiple schemas (for databases like Postgres)? It would make using Sqlite or MySQL more difficult.

andrewthauer · 2020-09-14T13:26:42Z

@lowjoel - Part of this is already implemented in regards to creating the plugin db connections, postgres connection string handling, etc. The backend-common package's database module has some utilities that could help.

I think it might make sense to figure out what is still required and then create new issues and/or PR's focused on specific use cases. For instance:

Support Multi Schema Database - As you pointed out, do we want to support a single schema'd database only in dbs that support it. My gut feeling is this will introduce a lot of complexity that might not be worth it.

Creating Plugin Databases - Are we trying to solve creating new plugin databases the first time the app starts automatically? I see 3 approaches to this. Use something like pgtools, use custom raw knex commands, or knex migrations. All of these are going to have some database specific aspects. So likely any solution will have to be opt in at the backend app level. The existing database module could support helping with some of this. I actually started to implement a migrations module, but got side tracked (not complete or tested).

Common Migration API - Do we want to introduce a lower level unifying migrations API? Right now each plugin implements migrations internally and there is not much as far as shared code. Would it make sense to have some common code / patterns for plugins to use when dealing with migrations? Also, it might be useful to have a registry of migrations so you could run migrations for all plugins at the same time (e.g. a pre-deploy step).

freben · 2020-09-16T10:02:19Z

One of the reasons we have databases separated in the first place is that as far as I know, knex doesn't support multiple separate migration "families" within a single database. As in, if the migration tracking table just had an extra column with a key that the plugin could define, then it could keep track of several parallel migrations (that hopefully don't trample on each other's toes...) and keep all the tables in one db. This would be convenient for the very most basic, least security conscious scenarios. But I don't think it's an option.

The second level up would be to have two users in app-config with their respective passwords: the master user that can create databases, and the regular user that can't. The backend starts doing initial work using the master user at startup, and then demotes itself to using only the regular user after that in the pools that are given to plugins. The auto-created databases will be getting this single regular user in its ACLs. This provides physical isolation, but doesn't strictly prohibit plugins from doing stuff across databases if they somehow learned the regular user's password.

Agreed that maybe the multi schema route is not necessary at this point, but can be pushed to a possible future feature addition.

@Rugvip any additional input?

Rugvip · 2020-09-16T18:32:01Z

Support Multi Schema Database - Agree with the above points, I'd say wait and see if there's a demand. If there is, maybe add it if it's really simple, like just switching out a knex create method.

@freben I like the idea of stepping down to a user with less access in the case where we're creating the databases. Not something required in the initial implementation though, but nice to have.

@andrewthauer I think moving the migration step to the backend would be sane, that would leave it up to each backend to decide when and how they want to run migrations. Maybe a bit nicely packaged in an interface. We should also consider whether we should and can expose knex a bit more as-is, so that we don't build a layer that forces ppl to have to build custom cli -> our layer -> knex api -> migrations, simply allowing knex-cli -> migrations.

The database creating I'd imagine is a step implemented with pgtools and happens inside the backend before we pass on database connections to the plugins. It would basically ensure that the database exists and create it if it doesn't, I don't think we even need to make it configurable tbh, probably just a method call or flag to a method call.

lowjoel · 2020-09-29T10:28:38Z

@andrewthauer @freben @Rugvip I'm planning on adding logic in backend-common/database for both Postgres and Sqlite to create a database if it doesn't exist.

For Postgres I can do this using knex without a default database (or the postgres database), and create a database per plugin, with the username and password to be that of the plugin. In other words, we will have the backstage_plugin_catalog database, with the catalog username and catalog password. Then I'll return a connection to the lower privileged user with the default database set. Question: what does "lower privileged user" mean? Owner of the database and all its tables? So it can run its own migrations but not of other plugins?

I don't yet know exactly how Sqlite would do this, probably just be different database files? There's no notion of username/password last I checked.

What do you think? We're spinning up plenty of test environments (one per branch!) and needing to manually seed the database is getting frustrating.

andrewthauer · 2020-09-29T20:49:02Z

@lowjoel - I'm thinking it might be worth looking at this from the perspective of what privileges the user has. For instance, one user could just have permissions to create databases while the other cannot create/drop databases, but can read/write to all the tables. One question we might want to consider, is should standard migrations also be run under a different user? If so, then the user running migrations and/or creating dbs could be an admin db account, while the other is a app user account. Anyways my 2 cents.

As far as sqlite3 is there much point putting effort here to support files? I'd be curious to know if anyone is really using this beyond a quick demo, in which the in memory version might be sufficient.

lowjoel · 2020-09-30T01:58:48Z

@andrewthauer We agree significantly 😄.

In practice however, I've not (yet, in my limited experience) seen a true distinction between maintainer/user permissions in Postgres. Most installations I've seen at best would have a "root" user creating databases then delegating permissions to another user to maintain that database (which would typically be the owner of that database). This should be fairly doable with the state of the codebase now, and would be a significant win in the setup experience.

Having this delegation work in three levels -- "root" creates database to delegate to "plugin root", then "plugin root" delegates to "plugin user" -- would be the next step and separate from the initial implementation. I think the knex migration code (which you contributed!) would need additional work if we wanted to segregate permissions at that level, and that would commit the project into a standard way of performing upgrades (which may be a good thing to do?)

I have not seen how sqlite would work with this paradigm yet. Let me try to cook something up that implements the second paragraph and we can go from there? I don't think the third paragraph's implementation would be blocked by it (and I think it would actually be a stepping stone.)

lowjoel · 2020-09-30T10:12:26Z

@andrewthauer @Rugvip @freben I've just submitted #2674, and I see that #2670 has also some of the same intentions. I haven't really tested my PR apart from just booting the backend up and doing some simple sanity tests, but let me know what you think.

It currently uses the provided Postgres credentials to bootstrap the databases and the corresponding credentials.

Rugvip · 2020-09-30T14:53:57Z

@andrewthauer Right now in-memory sqlite should be just fine, but that's somewhat because we don't really need to persist any data for local dev. If we end up with that need somehow we'd prolly have to revisit, but for now I think file-based sqlite has little value.

lowjoel · 2020-10-01T05:12:44Z

Continuing from #2674, where database creation is managed by the backend (entry point), we'd need to significantly redesign the database/plugin API interface to support:

configurations where credentials per plugin is different
creating the database for a plugin that requires it
preventing plugins from accessing other plugins' data

Going back to the main moving parts in this design, this would be:

config (shape/parsing/management), because this is coupled with the rest of the items below
database connection (knex integration)
database-specific behaviours (sqlite = noop, pg = roles/databases)
plugin integration (do plugins get to obtain a handle from any other plugin?)

I'd hate coupling the code implementation with the app-config.yaml, but maybe that's one way around this problem: We can specify in the backend database configuration the credentials, and the corresponding database factory implementation. So a simple database factory would not manage credentials nor manage databases (pass through). We can have one that takes admin credentials and uses it for managing databases and just dishing them out to plugins (pass through credentials, but manages databases). We can have one that does databases and roles (uses provided credentials to make new roles, then creates databases using those newly created roles). But this means that the database specific code would need to implement some interface that the database factory expects from database drivers, and the config shape would be coupled with the database factory.

So in this proposed design:

Database configuration (the backend.database configuration dictionary) will be provided to a DatabaseConfiguration object. This interface will have a getDatabaseConfig(pluginID: string) and getAdminConfig() methods, both returning the same shape. This would be the only place of coupling with the app-config.yaml. An implementation for the current behaviour will always return the same backend.database object; more sophisticated implementations can do more.
PluginDatabaseManager is an interface that will accept a DatabaseConfiguration object. It will have a getDatabaseFactory(pluginID: string) method, meant to be called by the main function just like createDatabaseClient() is today.
this getDatabaseFactory method will create objects implementing the interface PluginDatabaseFactory that have a getDatabase(database: string?): Promise<knex> method. These objects will be handed over to plugins who can further decide if they want a single, or multiple databases (leave undefined to get the "main" database, if provided this would be an additional suffix). The logic in this factory would add necessary prefixes per plugin (perhaps the pluginID as a prefix?) to prevent collisions between plugins. It can use the DatabaseConfiguration object directly (to share credentials), or create new roles (potentially saving them to Vault? but I won't define the method in the signature yet for now). Only when getDatabase is called will databases be created (via the PluginDatabaseManager type, and is private to the implementation of the PluginDatabaseManager).
All of these means that now database implementations must adhere to some interface as well. This probably would mean that I would need to aggregate the Sqlite and Postgres database implementation into objects.

Thoughts? I have not yet considered how extensive the change of async functions are at this point, maybe that's significant? I know for a fact that all existing scaffolded applications would definitely need a skeleton update.

lowjoel · 2020-10-01T06:41:04Z

Something like this would preserve the current behaviour:

class SimpleDatabaseConfiguration implements DatabaseConfiguration {
  private readonly config: Config;

  constructor(config: Config) {
    this.config = config
  }

  getDatabaseConfig(_pluginId: string): Config {
    return this.config;
  }

  getAdminConfig(): Config {
    return this.config;
  }
}

class SimpleDatabaseManager implements PluginDatabaseManager {
  private readonly config: DatabaseConfiguration;

  constructor(config: DatabaseConfiguration) {
    this.config = config
  }

  getDatabaseFactory(pluginId: string): PluginDatabaseFactory {
    return new SimplePluginDatabaseFactory(this, pluginId);
  }

  async getDatabase(pluginId: string, suffix?: string): Promise<knex> {
    const config = this.getDatabaseConfig(pluginId, suffix);
    await this.ensureDatabase(config.connection.database)

    return knex(config);
  }

  private getDatabaseConfig(pluginId: string, suffix?: string) {
    const dbSuffix = suffix ? `_${suffix}` : '';
    return merge(
      this.config.getDatabaseConfig(pluginId),
      {
        connection: {
          database: `backstage_plugin_${pluginId}${dbSuffix}`
        }
      }
    )
  }

  private async ensureDatabase(database: string) {
    // get pg/sqlite adapter
    const config = this.config.getAdminConfig();
    const adapter = getAdapter(config.client);

    await adapter.ensureDatabase(this.config, database);
  }
}

class SimplePluginDatabaseFactory implements PluginDatabaseFactory {
  private manager: SimpleDatabaseManager;
  private readonly pluginId: string;

  constructor(manager: SimpleDatabaseManager, pluginId: string) {
    this.manager = manager;
    this.pluginId = pluginId;
  }

  getDatabase(database?: string): Promise<knex> {
    return this.manager.getDatabase(this.pluginId, database);
  }
}

function makeCreateEnv(loadedConfigs: AppConfig[]) {
  const config = ConfigReader.fromConfigs(loadedConfigs);

  const databaseManager = new SimpleDatabaseManager(new SimpleDatabaseConfiguration(config.getConfig('backend.database')));

  return (plugin: string): PluginEnvironment => {
    const logger = getRootLogger().child({ type: 'plugin', plugin });
    const databaseFactory = databaseManager.getDatabaseFactory(plugin);
    const discovery = SingleHostDiscovery.fromConfig(config);
    return { logger, databaseFactory, config, discovery };
  };
}

Implementing the case of managing databases with the different credentials would then mean overriding DatabaseConfiguration.getAdminConfig, and overriding PluginDatabaseManager.getDatabase to create the role and create the database if it does not exist.

marcuseide · 2020-10-13T08:48:56Z

@andrewthauer Hey, just checking in and asking if you feel that this issue can be closed now or if there are more discussions to be had?

andrewthauer · 2020-10-13T12:02:30Z

@marcuseide - I'm good with closing this issue and handling any remaining items in separate issues.

andrewthauer added the help wanted Help/Contributions wanted from community members label Jul 13, 2020

Rugvip mentioned this issue Jul 13, 2020

Be able to reuse the database setup in multiple backends #1034

Closed

shmidt-i mentioned this issue Jul 24, 2020

Create app backend with PG support & E2E Test #1726

Merged

ZaradarBH mentioned this issue Aug 10, 2020

Create Postgress database dfds/backstage_originaldfdsfork#22

Closed

3 tasks

freben mentioned this issue Aug 12, 2020

Add shared database connection helpers #1915

Merged

6 tasks

Rugvip mentioned this issue Sep 17, 2020

feat(create-app): add dev script and docker-compose workflow #2417

Closed

7 tasks

andrewthauer mentioned this issue Sep 30, 2020

contrib: add basic database creator #2670

Closed

6 tasks

lowjoel mentioned this issue Sep 30, 2020

Auto-create plugin databases #2674

Merged

6 tasks

lowjoel mentioned this issue Oct 1, 2020

refactor: Expose types to handle Database management #2697

Merged

6 tasks

This was referenced Oct 5, 2020

Version Packages #2654

Merged

Version Packages andrewthauer/backstage#13

Closed

Version Packages aasifshaikh/backstage#67

Open

Version Packages cooldashing24/backstage#247

Open

Version Packages jakeswenson/backstage#101

Open

andrewthauer closed this as completed Oct 13, 2020

This was referenced Oct 14, 2020

Version Packages desktophero/backstage#1

Open

Version Packages mateusmarquezini/backstage#1

Open

Version Packages GoWind/backstage#1

Open

Version Packages RoadieHQ/backstage#45

Closed

This was referenced Oct 27, 2020

Version Packages themousepotato/backstage#6

Open

Version Packages o-farooq/backstage#1

Open

github-actions bot mentioned this issue Jan 11, 2021

Version Packages mcalus3/backstage#1

Open

bomkamp mentioned this issue Oct 5, 2021

Add plugin pg schema override #7453

Closed

5 tasks

begrossi mentioned this issue Aug 29, 2022

Knex+Backstage: Add support for 1 database/schema due to production restriction. #13415

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Knex + Plugins (Multiple vs Single Database) #1598

Knex + Plugins (Multiple vs Single Database) #1598

andrewthauer commented Jul 13, 2020 •

edited

Loading

Rugvip commented Jul 13, 2020

andrewthauer commented Jul 13, 2020 •

edited

Loading

Rugvip commented Jul 14, 2020

andrewthauer commented Jul 15, 2020

andrewthauer commented Aug 10, 2020

lowjoel commented Sep 14, 2020 •

edited

Loading

andrewthauer commented Sep 14, 2020 •

edited

Loading

freben commented Sep 16, 2020

Rugvip commented Sep 16, 2020

lowjoel commented Sep 29, 2020 •

edited

Loading

andrewthauer commented Sep 29, 2020

lowjoel commented Sep 30, 2020

lowjoel commented Sep 30, 2020

Rugvip commented Sep 30, 2020

lowjoel commented Oct 1, 2020 •

edited

Loading

lowjoel commented Oct 1, 2020

marcuseide commented Oct 13, 2020

andrewthauer commented Oct 13, 2020

Knex + Plugins (Multiple vs Single Database) #1598

Knex + Plugins (Multiple vs Single Database) #1598

Comments

andrewthauer commented Jul 13, 2020 • edited Loading

Rugvip commented Jul 13, 2020

andrewthauer commented Jul 13, 2020 • edited Loading

Rugvip commented Jul 14, 2020

andrewthauer commented Jul 15, 2020

andrewthauer commented Aug 10, 2020

lowjoel commented Sep 14, 2020 • edited Loading

andrewthauer commented Sep 14, 2020 • edited Loading

freben commented Sep 16, 2020

Rugvip commented Sep 16, 2020

lowjoel commented Sep 29, 2020 • edited Loading

andrewthauer commented Sep 29, 2020

lowjoel commented Sep 30, 2020

lowjoel commented Sep 30, 2020

Rugvip commented Sep 30, 2020

lowjoel commented Oct 1, 2020 • edited Loading

lowjoel commented Oct 1, 2020

marcuseide commented Oct 13, 2020

andrewthauer commented Oct 13, 2020

andrewthauer commented Jul 13, 2020 •

edited

Loading

andrewthauer commented Jul 13, 2020 •

edited

Loading

lowjoel commented Sep 14, 2020 •

edited

Loading

andrewthauer commented Sep 14, 2020 •

edited

Loading

lowjoel commented Sep 29, 2020 •

edited

Loading

lowjoel commented Oct 1, 2020 •

edited

Loading