Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knex + Plugins (Multiple vs Single Database) #1598

Closed
andrewthauer opened this issue Jul 13, 2020 · 18 comments
Closed

Knex + Plugins (Multiple vs Single Database) #1598

andrewthauer opened this issue Jul 13, 2020 · 18 comments
Labels
help wanted Help/Contributions wanted from community members

Comments

@andrewthauer
Copy link
Collaborator

andrewthauer commented Jul 13, 2020

The knex database connections currenty requires a different database for each plugin (catalog & auth so far). Is this the intended design or will a single database be supported?

Here are some considerations:

  • A single database would make it easier to provide a single DATABASE_URL environment variable (e.g. postgresql://user:password@host/db_name) and feed it into the knex connection property.
  • Multiple databases keep things isolated and avoids collisions for other plugins
  • Multiple databases could make secret management more complicated
  • Backup strategy for single database would likely be less complex
  • Backup strategy for multiple databases could be more flexible and secure
  • Multiple databases requires different databases that may need to be created ahead of time (if the app does not have sufficient permissions to create new databases)

My current solution for dealing with this now is:

// per plugin connection
const baseUrl = process.env.DATABASE_URL; // `postgresql://user:password@host/backstage`
const databaseUrl = `${baseUrl}_${plugin}`;
const dbConfig = { client: 'pg', connection: databaseUrl };
// use dbConfig in knex

This is convention based and seems error prone. It would break for instance if additional query params where adding to the url.

@andrewthauer andrewthauer added the help wanted Help/Contributions wanted from community members label Jul 13, 2020
@Rugvip
Copy link
Member

Rugvip commented Jul 13, 2020

Duplicate of #1034, but happy to continue the discussion here tbh as you're bringing up some good points.

@andrewthauer
Copy link
Collaborator Author

andrewthauer commented Jul 13, 2020

Sorry didn't notice there was already an issue for this :(

@Rugvip
Copy link
Member

Rugvip commented Jul 14, 2020

My thinking is that we would default to requiring a separate DB for each backend plugin, but for DBs that support multi-db + multi-schema, we could support splitting the plugins by schema instead of DB. I outlined my reasoning for that here: #1527 (comment)

Regarding the administration overhead I think a decent standard pattern could be to provide a single DBMS account that has permissions to create DBs, and let the backend create and migrate all databases for each plugin. Then for a more hardened and isolated deployment it would be possible to create the DBs ahead of time and use separate accounts for each backend plugin if desired. Even splitting into multiple DBMS instances if needed.

@andrewthauer
Copy link
Collaborator Author

@Rugvip - This seems like a pragmatic approach. As discussed in #1527, it would be good to work towards defining some standard recommendations and patterns to address some of these considerations. I'm getting close to having a deployable instance up and running internally, so I'll loop back once I have some further feedback.

@andrewthauer
Copy link
Collaborator Author

#1882 seems somewhat relevant to this as well.

@lowjoel
Copy link
Contributor

lowjoel commented Sep 14, 2020

@Rugvip I've got some time this week to work on this -- is your statement

Regarding the administration overhead I think a decent standard pattern could be to provide a single DBMS account that has permissions to create DBs, and let the backend create and migrate all databases for each plugin. Then for a more hardened and isolated deployment it would be possible to create the DBs ahead of time and use separate accounts for each backend plugin if desired. Even splitting into multiple DBMS instances if needed.

still accurate today? Should I put in effort to implement the code for this? (I don't know how much effort it would be to massage the existing database code though.) If I should, then it'll be a case of using the main admin account to generate databases, users, and their corresponding passwords and shipping them off to plugins. At the same time that would be rather messy since we'd need to generate the passwords and store them somehow (unless we don't use passwords and generate a reproducible one for each of them? which seems bad.)

Are we still going in the direction of multiple schemas (for databases like Postgres)? It would make using Sqlite or MySQL more difficult.

@andrewthauer
Copy link
Collaborator Author

andrewthauer commented Sep 14, 2020

@lowjoel - Part of this is already implemented in regards to creating the plugin db connections, postgres connection string handling, etc. The backend-common package's database module has some utilities that could help.

I think it might make sense to figure out what is still required and then create new issues and/or PR's focused on specific use cases. For instance:

Support Multi Schema Database - As you pointed out, do we want to support a single schema'd database only in dbs that support it. My gut feeling is this will introduce a lot of complexity that might not be worth it.

Creating Plugin Databases - Are we trying to solve creating new plugin databases the first time the app starts automatically? I see 3 approaches to this. Use something like pgtools, use custom raw knex commands, or knex migrations. All of these are going to have some database specific aspects. So likely any solution will have to be opt in at the backend app level. The existing database module could support helping with some of this. I actually started to implement a migrations module, but got side tracked (not complete or tested).

Common Migration API - Do we want to introduce a lower level unifying migrations API? Right now each plugin implements migrations internally and there is not much as far as shared code. Would it make sense to have some common code / patterns for plugins to use when dealing with migrations? Also, it might be useful to have a registry of migrations so you could run migrations for all plugins at the same time (e.g. a pre-deploy step).

@freben
Copy link
Member

freben commented Sep 16, 2020

One of the reasons we have databases separated in the first place is that as far as I know, knex doesn't support multiple separate migration "families" within a single database. As in, if the migration tracking table just had an extra column with a key that the plugin could define, then it could keep track of several parallel migrations (that hopefully don't trample on each other's toes...) and keep all the tables in one db. This would be convenient for the very most basic, least security conscious scenarios. But I don't think it's an option.

The second level up would be to have two users in app-config with their respective passwords: the master user that can create databases, and the regular user that can't. The backend starts doing initial work using the master user at startup, and then demotes itself to using only the regular user after that in the pools that are given to plugins. The auto-created databases will be getting this single regular user in its ACLs. This provides physical isolation, but doesn't strictly prohibit plugins from doing stuff across databases if they somehow learned the regular user's password.

Agreed that maybe the multi schema route is not necessary at this point, but can be pushed to a possible future feature addition.

@Rugvip any additional input?

@Rugvip
Copy link
Member

Rugvip commented Sep 16, 2020

Support Multi Schema Database - Agree with the above points, I'd say wait and see if there's a demand. If there is, maybe add it if it's really simple, like just switching out a knex create method.

@freben I like the idea of stepping down to a user with less access in the case where we're creating the databases. Not something required in the initial implementation though, but nice to have.

@andrewthauer I think moving the migration step to the backend would be sane, that would leave it up to each backend to decide when and how they want to run migrations. Maybe a bit nicely packaged in an interface. We should also consider whether we should and can expose knex a bit more as-is, so that we don't build a layer that forces ppl to have to build custom cli -> our layer -> knex api -> migrations, simply allowing knex-cli -> migrations.

The database creating I'd imagine is a step implemented with pgtools and happens inside the backend before we pass on database connections to the plugins. It would basically ensure that the database exists and create it if it doesn't, I don't think we even need to make it configurable tbh, probably just a method call or flag to a method call.

@lowjoel
Copy link
Contributor

lowjoel commented Sep 29, 2020

@andrewthauer @freben @Rugvip I'm planning on adding logic in backend-common/database for both Postgres and Sqlite to create a database if it doesn't exist.

For Postgres I can do this using knex without a default database (or the postgres database), and create a database per plugin, with the username and password to be that of the plugin. In other words, we will have the backstage_plugin_catalog database, with the catalog username and catalog password. Then I'll return a connection to the lower privileged user with the default database set. Question: what does "lower privileged user" mean? Owner of the database and all its tables? So it can run its own migrations but not of other plugins?

I don't yet know exactly how Sqlite would do this, probably just be different database files? There's no notion of username/password last I checked.

What do you think? We're spinning up plenty of test environments (one per branch!) and needing to manually seed the database is getting frustrating.

@andrewthauer
Copy link
Collaborator Author

@lowjoel - I'm thinking it might be worth looking at this from the perspective of what privileges the user has. For instance, one user could just have permissions to create databases while the other cannot create/drop databases, but can read/write to all the tables. One question we might want to consider, is should standard migrations also be run under a different user? If so, then the user running migrations and/or creating dbs could be an admin db account, while the other is a app user account. Anyways my 2 cents.

As far as sqlite3 is there much point putting effort here to support files? I'd be curious to know if anyone is really using this beyond a quick demo, in which the in memory version might be sufficient.

@lowjoel
Copy link
Contributor

lowjoel commented Sep 30, 2020

@andrewthauer We agree significantly 😄.

In practice however, I've not (yet, in my limited experience) seen a true distinction between maintainer/user permissions in Postgres. Most installations I've seen at best would have a "root" user creating databases then delegating permissions to another user to maintain that database (which would typically be the owner of that database). This should be fairly doable with the state of the codebase now, and would be a significant win in the setup experience.

Having this delegation work in three levels -- "root" creates database to delegate to "plugin root", then "plugin root" delegates to "plugin user" -- would be the next step and separate from the initial implementation. I think the knex migration code (which you contributed!) would need additional work if we wanted to segregate permissions at that level, and that would commit the project into a standard way of performing upgrades (which may be a good thing to do?)

I have not seen how sqlite would work with this paradigm yet. Let me try to cook something up that implements the second paragraph and we can go from there? I don't think the third paragraph's implementation would be blocked by it (and I think it would actually be a stepping stone.)

@lowjoel
Copy link
Contributor

lowjoel commented Sep 30, 2020

@andrewthauer @Rugvip @freben I've just submitted #2674, and I see that #2670 has also some of the same intentions. I haven't really tested my PR apart from just booting the backend up and doing some simple sanity tests, but let me know what you think.

It currently uses the provided Postgres credentials to bootstrap the databases and the corresponding credentials.

@Rugvip
Copy link
Member

Rugvip commented Sep 30, 2020

@andrewthauer Right now in-memory sqlite should be just fine, but that's somewhat because we don't really need to persist any data for local dev. If we end up with that need somehow we'd prolly have to revisit, but for now I think file-based sqlite has little value.

@lowjoel
Copy link
Contributor

lowjoel commented Oct 1, 2020

Continuing from #2674, where database creation is managed by the backend (entry point), we'd need to significantly redesign the database/plugin API interface to support:

  • configurations where credentials per plugin is different
  • creating the database for a plugin that requires it
  • preventing plugins from accessing other plugins' data

Going back to the main moving parts in this design, this would be:

  • config (shape/parsing/management), because this is coupled with the rest of the items below
  • database connection (knex integration)
  • database-specific behaviours (sqlite = noop, pg = roles/databases)
  • plugin integration (do plugins get to obtain a handle from any other plugin?)

I'd hate coupling the code implementation with the app-config.yaml, but maybe that's one way around this problem: We can specify in the backend database configuration the credentials, and the corresponding database factory implementation. So a simple database factory would not manage credentials nor manage databases (pass through). We can have one that takes admin credentials and uses it for managing databases and just dishing them out to plugins (pass through credentials, but manages databases). We can have one that does databases and roles (uses provided credentials to make new roles, then creates databases using those newly created roles). But this means that the database specific code would need to implement some interface that the database factory expects from database drivers, and the config shape would be coupled with the database factory.

So in this proposed design:

  • Database configuration (the backend.database configuration dictionary) will be provided to a DatabaseConfiguration object. This interface will have a getDatabaseConfig(pluginID: string) and getAdminConfig() methods, both returning the same shape. This would be the only place of coupling with the app-config.yaml. An implementation for the current behaviour will always return the same backend.database object; more sophisticated implementations can do more.
  • PluginDatabaseManager is an interface that will accept a DatabaseConfiguration object. It will have a getDatabaseFactory(pluginID: string) method, meant to be called by the main function just like createDatabaseClient() is today.
  • this getDatabaseFactory method will create objects implementing the interface PluginDatabaseFactory that have a getDatabase(database: string?): Promise<knex> method. These objects will be handed over to plugins who can further decide if they want a single, or multiple databases (leave undefined to get the "main" database, if provided this would be an additional suffix). The logic in this factory would add necessary prefixes per plugin (perhaps the pluginID as a prefix?) to prevent collisions between plugins. It can use the DatabaseConfiguration object directly (to share credentials), or create new roles (potentially saving them to Vault? but I won't define the method in the signature yet for now). Only when getDatabase is called will databases be created (via the PluginDatabaseManager type, and is private to the implementation of the PluginDatabaseManager).
  • All of these means that now database implementations must adhere to some interface as well. This probably would mean that I would need to aggregate the Sqlite and Postgres database implementation into objects.

Thoughts? I have not yet considered how extensive the change of async functions are at this point, maybe that's significant? I know for a fact that all existing scaffolded applications would definitely need a skeleton update.

@lowjoel
Copy link
Contributor

lowjoel commented Oct 1, 2020

Something like this would preserve the current behaviour:

class SimpleDatabaseConfiguration implements DatabaseConfiguration {
  private readonly config: Config;

  constructor(config: Config) {
    this.config = config
  }

  getDatabaseConfig(_pluginId: string): Config {
    return this.config;
  }

  getAdminConfig(): Config {
    return this.config;
  }
}

class SimpleDatabaseManager implements PluginDatabaseManager {
  private readonly config: DatabaseConfiguration;

  constructor(config: DatabaseConfiguration) {
    this.config = config
  }

  getDatabaseFactory(pluginId: string): PluginDatabaseFactory {
    return new SimplePluginDatabaseFactory(this, pluginId);
  }

  async getDatabase(pluginId: string, suffix?: string): Promise<knex> {
    const config = this.getDatabaseConfig(pluginId, suffix);
    await this.ensureDatabase(config.connection.database)

    return knex(config);
  }

  private getDatabaseConfig(pluginId: string, suffix?: string) {
    const dbSuffix = suffix ? `_${suffix}` : '';
    return merge(
      this.config.getDatabaseConfig(pluginId),
      {
        connection: {
          database: `backstage_plugin_${pluginId}${dbSuffix}`
        }
      }
    )
  }

  private async ensureDatabase(database: string) {
    // get pg/sqlite adapter
    const config = this.config.getAdminConfig();
    const adapter = getAdapter(config.client);

    await adapter.ensureDatabase(this.config, database);
  }
}

class SimplePluginDatabaseFactory implements PluginDatabaseFactory {
  private manager: SimpleDatabaseManager;
  private readonly pluginId: string;

  constructor(manager: SimpleDatabaseManager, pluginId: string) {
    this.manager = manager;
    this.pluginId = pluginId;
  }

  getDatabase(database?: string): Promise<knex> {
    return this.manager.getDatabase(this.pluginId, database);
  }
}

function makeCreateEnv(loadedConfigs: AppConfig[]) {
  const config = ConfigReader.fromConfigs(loadedConfigs);

  const databaseManager = new SimpleDatabaseManager(new SimpleDatabaseConfiguration(config.getConfig('backend.database')));

  return (plugin: string): PluginEnvironment => {
    const logger = getRootLogger().child({ type: 'plugin', plugin });
    const databaseFactory = databaseManager.getDatabaseFactory(plugin);
    const discovery = SingleHostDiscovery.fromConfig(config);
    return { logger, databaseFactory, config, discovery };
  };
}

Implementing the case of managing databases with the different credentials would then mean overriding DatabaseConfiguration.getAdminConfig, and overriding PluginDatabaseManager.getDatabase to create the role and create the database if it does not exist.

@marcuseide
Copy link
Collaborator

@andrewthauer Hey, just checking in and asking if you feel that this issue can be closed now or if there are more discussions to be had?

@andrewthauer
Copy link
Collaborator Author

@marcuseide - I'm good with closing this issue and handling any remaining items in separate issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Help/Contributions wanted from community members
Projects
None yet
Development

No branches or pull requests

5 participants