Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sql): Add an SQL permission repository #838

Merged
merged 10 commits into from
Mar 3, 2021

Conversation

deverton
Copy link
Contributor

@deverton deverton commented Feb 26, 2021

This PR maintains backwards compatibility and does not modify existing controllers.

Implements a permission repository backed by SQL with both MySQL and Postgres supported.

Enabling SQL as the permission repository is done as follows:

sql:
  enabled: true
  connectionPools:
    default:
      jdbcUrl: jdbc:mysql://.....
      user: ...
      password: ...
      default: true
  migration:
      jdbcUrl: jdbc:mysql://.....
      user: ...
      password: ...

permissionsRepository:
  redis:
    enabled: false
  sql:
    enabled: true

Integration tests have been updated to enable testing with the SQL provider (using testcontainers), but still default to Redis only as that's faster.

Depends on #835 a bit for testcontainer usage.

This PR maintains backwards compatibility and does not modify existing
controllers.

Implements a permission repository backed by SQL. No migration is
provided as the data is recreated on the next role sync or at startup.

Enabling SQL as the permission repository

```yaml
sql:
  enabled: true
  connectionPools:
    default:
      jdbcUrl: jdbc:mysql://.....
      user: ...
      password: ...
      default: true
  migration:
      jdbcUrl: jdbc:mysql://.....
      user: ...
      password: ...

permissionRepository:
  redis:
    enabled: false
  sql:
    enabled: true
```

Both MySQL and Postgres are supported.

Integration tests have been updated to enable testing with the SQL
provider (using testcontainers), but still default to Redis only
as that's faster.
@deverton deverton marked this pull request as ready for review February 26, 2021 06:33
Copy link
Member

@robzienert robzienert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no fiat expert, but what you have here makes sense to me from a general technical perspective. Will defer to @jonsie and @cfieber for actual implementation bits.

Have you built and validated this change in any environment (e.g. load tests)?

Great work, thanks for taking this on.

extensionResources.add(resource);
}
});
resources.forEach(this::addResource);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Good improvement.

@deverton
Copy link
Contributor Author

deverton commented Mar 2, 2021

It's been subjected to some load testing in our staging environment. For the workload there we see performance is roughly the same. Results with the current set of patches and with one fiat instance running 4 vCPU and 8 GB.

Elasticache

Amazon ElastiCache running cache.m6g.2xlarge

Action Min Max Average
get 35 ms 109 ms 57 ms
put 4 ms 16 ms 6 ms
getAllById 319 ms 915 ms 510 ms

SQL

Amazon RDS db.r6g.large running 5.7.mysql_aurora.2.09.1

Action Min Max Average
get 28 ms 39 ms 34 ms
put 32 ms 63 ms 37 ms
getAllById 102 ms 130 ms 111 ms

@deverton
Copy link
Contributor Author

deverton commented Mar 2, 2021

Once I get the data migration logic in place we'll try rolling this in our production environment soonish.

Copy link
Collaborator

@jonsie jonsie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Very curious to see how this performs in your production environment. I may take a shot at running this in our preprod env too.

@deverton
Copy link
Contributor Author

deverton commented Mar 3, 2021

Just ran in to an issue with how the dual repository looks up the beans from the context which I'm hoping to fix today.

@jonsie
Copy link
Collaborator

jonsie commented Mar 3, 2021

@deverton Fiat should rebuild all permissions on boot (and on a schedule) so I'm not sure the dual repository is necessary anyways.

@deverton
Copy link
Contributor Author

deverton commented Mar 3, 2021

It might be how we've got things configured, but new instances of Fiat don't seem to role sync immediately. There's typically roughly 10 minutes before the first scheduled sync runs and there's only a single instance in this case.

{"@timestamp":"2021-03-03T04:45:02.602+00:00","@version":1,"message":"Server is now HEALTHY. Hooray!","logger_name":"com.netflix.spinnaker.fiat.config.ResourceProvidersHealthIndicator","thread_name":"http-nio-0.0.0.0-7003-exec-2","level":"INFO","level_value":20000}
...
{"@timestamp":"2021-03-03T04:53:27.007+00:00","@version":1,"message":"Acquired Lock Lock{name='fiat.userrolessyncer', ownerName='spin-fiat-6954658dd9-nrhmj', leaseDurationMillis=10000, successIntervalMillis=600000, failureIntervalMillis=600000, version=1614747207004, ownerSystemTimestamp=1614747207004, attributes=''}.","logger_name":"com.netflix.spinnaker.kork.jedis.lock.RedisLockManager","thread_name":"scheduler-3","level":"INFO","level_value":20000}

@deverton
Copy link
Contributor Author

deverton commented Mar 3, 2021

I've dropped the dual repository code out of this PR for now. I'll do some more testing locally to see if the issues on cutover persist for us.

Would you like me to squash these commits up before merge?

@jonsie
Copy link
Collaborator

jonsie commented Mar 3, 2021

Yeah, 10 minutes is the default of syncDelayMs so perhaps you are using the default here? We have that dropped down to 2 minutes in our config.

No need to squash commits, I'll squash them on merge.

@jonsie jonsie merged commit 1be70ba into spinnaker:master Mar 3, 2021
@deverton
Copy link
Contributor Author

deverton commented Mar 8, 2021

@jonsie so rolled this out in production and it seems to be working fine. Performance for bulk operations like getAllById() and getAllByRole() are much faster. The get() operation is also faster for our data set. The big downside is that put() is much, much slower, nearly 10x as much (50 ms vs 500 ms).

Overall, this means that the user role sync process is still faster than when we were on Redis but is still too slow. The majority of our sync time was spent in getAllById() on Redis so the slower put() is made up for by that.

The put() operation was implemented pretty naively by just deleting and re-inserting records so there's obviously room for improvement there.

@jonsie
Copy link
Collaborator

jonsie commented Mar 9, 2021

@deverton Yeah I was wondering about that delete/insert in the put operation but I decided not to say anything since it's +/- the same behavior in the Redis permission repository.

I think all in all this is a good sign though. We're running this in our test environment right now and things look good, I will promote this to our staging environment soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants