Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DbDataSource, dependency injection and multi-tenancy #4523

Open
roji opened this issue Jun 20, 2022 · 8 comments
Open

DbDataSource, dependency injection and multi-tenancy #4523

roji opened this issue Jun 20, 2022 · 8 comments
Assignees
Milestone

Comments

@roji
Copy link
Member

roji commented Jun 20, 2022

#4503 will add a new Npgsql package that integrates it nicely into DI, thanks to DbDataSource. The scope there is only to register an NpgsqlDataSource (typically as singleton) and an NpgsqlConnection (typically as scoped), which users can then get injected with and use (note that the generic DbDataSource and DbConnection will also be registered, for database-agnostic patterns).

We could go further and provide some tools to help with multitenancy:

  • The user could provide a "tenant key generator" lambda (which gets us the tenant ID from somewhere in the service provider), plus a "data source creator" lambda (which can create the data source given a tenant ID).
  • We register some singleton "multitenant data source registry".
  • We also register a scoped data source service, retrieved via code which gets the tenant ID, and gets the corresponding data source from the singleton registry (or creating it if it doesn't yet exist).
  • For database-per-tenant scenarios, we have truly distinct data sources, each pointing to a different database.
  • For schema-per-tenant (and possibly some other multitenancy schemes):
    • The user typically wants to set some connection state. For example, in PG there's the search_path state parameter, similar to PATH: it determines which schemas to traverse when resolving an unqualified table name.
    • We'd therefore facilitate having scoped data source wrappers, which actually all use the same singleton data source behind the scenes (because it's the same database), but execute some SQL each time a connection is returned from the pool.
    • For extra bonus performance points, we could prepend/defer that SQL, so that instead of adding a roundtrip for that, the SQL gets delivered when the user executes a real command (see #4522). This would make this free in terms of perf.
  • We'd need to give special thought to multiplexing, where there's no connection; the connection-less NpgsqlDataSource.CreateCommand would also need to prepend each time.
Multi-tenancy with discriminator column

Note: the above can even support the tenant ID as a discriminator column. PostgreSQL allows defining arbitrary connection variables, and then using them in queries:

DROP TABLE IF EXISTS data;
CREATE TABLE data (
    id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    number int
);

INSERT INTO data (number) VALUES (8), (9);

SET foo.bar = 8;
SELECT * FROM data WHERE number=CURRENT_SETTING('foo.bar')::int;

/cc @ajcvickers @JeremyLikness

@luanmm
Copy link

luanmm commented Oct 25, 2023

Hello, @roji.

Regarding this subject, how can I setup a multi-tenant strategy in Npgsql 7? I'm strugling here to do it, trying mixing Microsoft samples like this one and DbDataSource.

My multi-tenancy is with database separation, so I need a connection string for every tenant (my NpgsqlDataSource can't have a singleton lifetime).

To achieve it, like in the referenced sample, I'm trying to use an IDbContextFactory that resolves the connection string for the "current tenant" and it should create the DbContext accordingly.

The problem is that I would have something like that:

    public TenantDbContext CreateDbContext()
    {
        var context = _pooledFactory.CreateDbContext(); // Here the "OnConfiguring" method from DbContext was already been called, throwing an exception (because the "DataSource" is null, as it is being set below)

        var tenant = _tenantContextAccessor.TenantContext?.Tenant;

        var connectionString = tenant?.ConnectionString;

        if (string.IsNullOrEmpty(connectionString))
        {
            if (tenant == null)
                throw new ArgumentNullException(nameof(tenant), "Database connection is not configured (tenant could not be identified).");

            throw new ArgumentNullException(nameof(tenant), "Database connection is not configured.");
        }

        var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
        dataSourceBuilder.MapEnum<...>(); // I need some custom enums to be registered (that are being configured in "OnModelCreating", with "HasPostgresEnum" extension method)
        dataSourceBuilder.MapEnum<...>();
        dataSourceBuilder.MapEnum<...>();
        dataSourceBuilder.UseNetTopologySuite();

        context.DataSource = dataSourceBuilder.Build(); // This (definately) is not a good way to pass the DataSource to DbContext, but I'm stuck here (here, the "OnConfiguring" has already been called, as stated above)

        return context;
    }

The "OnConfiguring" is something like this:

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseLazyLoadingProxies()
            .UseNpgsql(DataSource!, options => options // DataSource is always null here
                .EnableRetryOnFailure()
                .MigrationsAssembly(typeof(TenantDbContext).Assembly.FullName)
                .UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery))
            .UseSnakeCaseNamingConvention();

I think this scenario is not so trivial, but multi-tenancy commonly use multiple databases, so I think it will be positive to support (if it isn't yet).

Thanks in advance for the attention.

And congratulations for all the work here.

@roji
Copy link
Member Author

roji commented Oct 26, 2023

@luanmm each NpgsqlDataSource works with exactly one connection string; so if each tenant of yours has a different connection string, that means you need a data source per tenant. It's important that you don't create a data source instead for each EF DbContext; since each data source represents its own connection pool, doing that would effectively disable connection pooling, with pretty bad performance implications.

All this means that you want to have a "registry" of data sources, keyed on the tenant ID, and probably registered as a singleton service in DI; when a request comes in for some tenant, you either get its data source from the registry (if it was already seen), or create a new data source and store it there.

Then, you initialize the DbContext with that data source. This is conceptually the same as initializing it with the tenant-specific connection string (once you have the data source instance from the singleton registry).

Hopefully the above is enough to get you on the right path - if not let me know. This definitely should be made a bit simpler (or we should at least have some docs on it online).

@luanmm
Copy link

luanmm commented Oct 26, 2023

Thank you very much for the tips, @roji!

I was able to address the issue for a common DbContext creation scenario:

public class DataSourceRegistry
{
    private readonly ILogger<DataSourceRegistry> _logger;

    private readonly IDictionary<string, DbDataSource> _entries = new Dictionary<string, DbDataSource>();

    public DataSourceRegistry(ILogger<DataSourceRegistry> logger)
    {
        _logger = logger;
    }

    public DbDataSource GetOrCreate(string key, Func<DbDataSource> factory)
    {
        if (!_entries.TryGetValue(key, out var dataSource))
        {
            _logger.LogInformation($"Creating (and caching) a new data source for identifier \"{key}\".");

            dataSource = factory();

            _entries.Add(key, dataSource);
        }

        return dataSource;
    }
}

In service registration:

        services.TryAddSingleton<DataSourceRegistry>();

        services.AddDbContext<TenantDbContext>((sp, options) =>
        {
            var tenant = sp.GetService<ITenantContextAccessor>()?.TenantContext?.Tenant;
            if (tenant == null)
                throw new ArgumentNullException(nameof(tenant), "Database connection is not configured (tenant could not be identified).");

            var dataSourceRegistry = sp.GetRequiredService<DataSourceRegistry>();

            var dataSource = dataSourceRegistry.GetOrCreate(tenant.Identifier, () =>
            {
                var dataSourceBuilder = new NpgsqlDataSourceBuilder(tenant.ConnectionString);
                dataSourceBuilder.MapEnum<...>();
                dataSourceBuilder.MapEnum<...>();
                dataSourceBuilder.MapEnum<...>();
                dataSourceBuilder.UseNetTopologySuite();

                return dataSourceBuilder.Build();
            });

            options
                .UseLazyLoadingProxies()
                .UseNpgsql(dataSource, options => options
                    .EnableRetryOnFailure()
                    .MigrationsAssembly(typeof(TenantDbContext).Assembly.FullName)
                    .UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery))
                .UseSnakeCaseNamingConvention();
        });

The only thing is that I could not make it work using AddDbContextPool, to use pooled contexts, but is a more advanced use case and not (currently) a requirement. But, if I achieve it at some point, I update here with my solution.

Best regards.

@NinoFloris
Copy link
Member

NinoFloris commented Oct 26, 2023

Using keyed services is another option if you know all your tenants at startup.

#5134

Docs for keyed services:
https://learn.microsoft.com/en-us/aspnet/core/fundamentals/dependency-injection?view=aspnetcore-8.0#keyed-services

Just to note, your registry is registered as a singleton but uses a normal dictionary without a lock.

Singletons can be accessed from multiple threads so better to use a ConcurrentDictionary or to wrap all your code accessing the dictionary in a lock!

@luanmm
Copy link

luanmm commented Oct 26, 2023

@NinoFloris thank you very much for the tips!

I wasn't thinking about concurrency, indeed. It is really a good point.

I will change to a ConcurrentDictionary for now, but will try with keyed services after .NET 8 release next month.

Thank you all for the help!

@onurkanbakirci
Copy link

onurkanbakirci commented May 24, 2024

Following is an implementation with ConcurrentDictionary instead of Dictionary

public class DataSourceRegistry
{
    private readonly ConcurrentDictionary<string, DbDataSource> _entries = new ConcurrentDictionary<string, DbDataSource>();

    public DbDataSource GetOrCreate(string key, Func<DbDataSource> factory)
    {
        return _entries.GetOrAdd(key, _ => factory());
    }
}

@ReneKochITTitans

This comment has been minimized.

@roji

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants