Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Redis as alternative storage for container images repository cache #2815

Closed
wants to merge 10 commits into from

Conversation

prometherion
Copy link
Contributor

@prometherion prometherion commented Feb 3, 2020

Still a WIP for the feature filled in #2814: allowing redis as an alternative for the container images repository cache.

Providing new flags (--cache-{backend,request-timeout}) in order to allow backend selection and abstract the backend requests timeout (deprecating --memcached-timeout providing backward compatibility along with deprecation notice).

This first contribution aims to scaffold the main file also for future backend cache storages but keeping high backward compatibility to don't break things.

TODO:

  • Redis unit tests
  • Redis integration tests
  • Docs update
  • README.md changes
  • Scaffolding Helm Chart to allow Redis
  • Align fluxctl install to allow the Redis experimental feature
  • e2e tests

@prometherion prometherion changed the title [WIP] Redis [WIP] Redis as alternative storage for container images repository cache Feb 3, 2020
@squaremo
Copy link
Member

squaremo commented Feb 5, 2020

👋 This is a pretty substantial contribution in the making, thank you for getting involved!

May I ask, what are the advantages of Redis over memcached here? I have never been very convinced of memcached's suitability (it was something of an arbitrary choice), so I am interested to hear your take on it.

@prometherion
Copy link
Contributor Author

Honestly, there are any personal preferences over memcached or redis: both got their pros and cons, advantages or trade-off. I like redis because of low memory footprint but, in the end, it's just something personal, as preferred flavor :)

@prometherion prometherion changed the title [WIP] Redis as alternative storage for container images repository cache Redis as alternative storage for container images repository cache Feb 6, 2020
@2opremio
Copy link
Contributor

2opremio commented Feb 7, 2020

@prometherion Thanks a lot for the effort! I will try to review this early next week. In the meantime, would you mind adding an e2e test? extending the fluxctl list-images and image_release tests to execute both on memcached and redis would be a good start.

@prometherion
Copy link
Contributor Author

I'm working on the refactoring to allow test with multiple cache backend but I need to perform first another refactoring regarding the files generated by fluxctl install.

Since I followed the actual implementation, we end up with the following file naming:

  • {redis,memcache}-dep.yaml
  • {redis,memcache}-svc.yaml

To allow bats testing we would need various fixtures for each backend because kustomize doesn't allow to use wildcards on resources names

A possible workaround could be output just cache-{dep,svc}.yaml instead of specifying the backend kind.

@2opremio
Copy link
Contributor

2opremio commented Feb 7, 2020

The Kustomize files in the tests allow templating, in case that helps

.circleci/config.yml Outdated Show resolved Hide resolved
@squaremo
Copy link
Member

@prometherion Do you mind squashing some of the fixup commits? I'm thinking of Wrong package name and the like.

@prometherion prometherion force-pushed the issues/2814 branch 2 times, most recently from a1a5c47 to 59768a9 Compare February 12, 2020 17:19
@prometherion
Copy link
Contributor Author

sure @squaremo: just done!

@Morriz
Copy link

Morriz commented Feb 18, 2020

Really hoping to see this one land! So we can get redis pubsub on new registry images...yay!

@prometherion
Copy link
Contributor Author

@2opremio so, in the end, agreed on discharge the idea of running parallel CircleCI jobs.

Actually, implemented testing of both backing services (redis and memcached):

  • 14_release_image-{memcached,redis}
  • 15_fluxctl_install-{memcached,redis}

If it's ok I'm going to squash commits to keep it cleaner.

@2opremio
Copy link
Contributor

2opremio commented Feb 28, 2020

Sorry to be picky, but I would factor out the test code and call it using a different backends

@prometherion
Copy link
Contributor Author

prometherion commented Feb 28, 2020 via email

@prometherion
Copy link
Contributor Author

@2opremio just pushed: I moved the test logic inside the lib/{release_image,fluxctl_list}.bash files and just calling them from the single BATS files.

I'm still using prefix backend cache service name strategy since I have to ensure to perform the teardown also in case of errors of a single test.

I would write something like (pseudo-code)

@test "Fluxctl list-workloads on memcached" {
  setup 'memcached'
  warmup
  # Test fluxctl list-services
  run fluxctl --k8s-fwd-ns "${FLUX_NAMESPACE}" list-workloads --namespace "${DEMO_NAMESPACE}"
  list_workloads "${status}" "${output}"
  teardown 'memcached'
}

@test "Fluxctl list-workloads on redis" {
  setup 'redis'
  warmup
  # Test fluxctl list-services
  run fluxctl --k8s-fwd-ns "${FLUX_NAMESPACE}" list-workloads --namespace "${DEMO_NAMESPACE}"
  list_workloads "${status}" "${output}"
  teardown 'redis'
}

But honestly, doesn't look good to me since I wouldn't be able to execute tests per backing service (e.g.: E2E_TESTS='*-redis*' make e2e)

@Morriz
Copy link

Morriz commented Mar 17, 2020

Any updates here?

@ianmuscat
Copy link

ianmuscat commented Mar 31, 2020

Since Redis offers authentication out of the box, perhaps it might be worth considering allowing users to have the ability to configure Flux to use password-based authentication with Redis. Right now it seems that this value is hardcoded to an empty string

Password: "", // no password set

@2opremio
Copy link
Contributor

2opremio commented Apr 9, 2020

Unfortunately I don't have time for it, sorry.

@prometherion
Copy link
Contributor Author

NP, who's in charge or available to take a look at this?

Copy link
Member

@squaremo squaremo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits, but the main change I'm requesting is to remove support for changing the cache from fluxctl install; it's not really meant to contain all options, and it would make this PR simpler*.

If you are (understandably) fatigued by this PR, then no worries -- I still think it is a solid contribution 💯 and I will probably end making these few adjustments and rebasing at some point.

*Though you might have to be arrange the end-to-end tests a little differently.

.circleci/config.yml Outdated Show resolved Hide resolved
chart/flux/README.md Outdated Show resolved Hide resolved
chart/flux/README.md Outdated Show resolved Hide resolved
chart/flux/README.md Show resolved Hide resolved
chart/flux/templates/NOTES.txt Outdated Show resolved Hide resolved
@@ -1,4 +1,4 @@
{{- if and (eq .Values.memcached.enabled true) (eq .Values.registry.disableScanning false) }}
{{- if and (and (eq .Values.memcached.enabled true) (eq .Values.redis.enabled false)) (eq .Values.registry.disableScanning false) }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something about go templating that means you have to compare values to booleans, rather than just evaluating them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just followed the same comparison applied previously, but we change simplify it if you wish: just tried to don't make too many big diffs to make PR easier to review

@@ -52,6 +52,7 @@ fluxctl install --git-url 'git@github.com:<your username>/flux-get-started' --gi
"do not scan container image registries to fill in the registry cache")
cmd.Flags().StringVarP(&opts.outputDir, "output-dir", "o", "", "a directory in which to write individual manifests, rather than printing to stdout")
cmd.Flags().BoolVar(&opts.AddSecurityContext, "add-security-context", true, "Ensure security context information is added to the pod specs. Defaults to 'true'")
cmd.Flags().StringVar(&opts.CacheBackend, "cache-backend", "memcached", "Select the docker images repository cache backend. Defaults to 'memcached'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the fact that other random flags have found their way in here (I'm looking at you, --add-security-context), I don't think it's necessary for the cache backend to be an option here, especially not while it's experimental.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from an end-user perspective, if I want to give a try to a different cache service, I would really enjoy this flag: furthermore: at first glance, it looks like a Helm alternative and I get your point, probably flux install should just provide a battle-tested installation without so many parameters.

If we want to remove the cache-backend flag, I guess the same fate has to be reserved for add-security-context.

cmd/fluxd/main.go Outdated Show resolved Hide resolved
@@ -60,7 +59,7 @@ const (

// This is used as the "burst" value for rate limiting, and
// therefore also as the limit to the number of concurrent fetches
// and memcached connections, since these in general can't do any
// and cache backend connections, since these in general can't do any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's used for Redis though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, but we could use the same option for the cache.RedisConfig MaxConns property: WDYT?

apiVersion: v1
kind: Service
metadata:
name: redis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better if this corresponded with the default value for the service name? That way there is less to accidentally get wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(notwithstanding I advise above to remove this from fluxctl install entirely)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep in mind that removing from fluxctl install will require additional time since I'll have to find a better solution on how to hack the e2e tests to evaluate also the Redis server.

prometherion added a commit to prometherion/flux that referenced this pull request May 5, 2020
prometherion added a commit to prometherion/flux that referenced this pull request May 5, 2020
prometherion added a commit to prometherion/flux that referenced this pull request May 5, 2020
Allowing redis as an alternative for the container images repository cache.
Providing new flags (`--cache-{backend,request-timeout}`) in order to
allow backend selection and abstract the backend requests timeout
(deprecating `--memcached-timeout` providing backward compatibility
along with deprecation notice).
This first contribution aims to scaffold the main file also for future
backend cache storages but keeping high backward compatibility to
don't break things.

TODO:
- Redis unit tests
- Redis integration tests
- README.md changes
- Kustomize changes
- Docs update
Refactoring has been requested to avoid the import cycle, also to take
adavantage of `integration_test` suite in order to test same cache
warmer for each implementations.

TODO:
- find out how to setup the properly integration-tests: Makefile task
  seems pointing to a missing bash script or binary
Documenting the additional variables: actually, didn't provided an
abstraction since it's just an experimental feature and we don't want to
break backward compatibility for users overriding variables.
- Updated the docs for new feature
- Updated how to use Redis
- Updated fluxctl install command to allow experimental feature
- Switching to `cache-{dep,svc}.yaml` for fluxctl install output
- Allowing tests per cache backend (using `CACHE_BACKEND` env variable)
Instead of running all tests in parallel, since we got few of them that
really requires to interact with the image scanner component,
duplicating them instead of putting a parallel build step on CircleCI.
@squaremo
Copy link
Member

squaremo commented Jun 4, 2020

Rebased this. @prometherion Are you OK if I remove the fluxctl install flag, and possibly make other, minor amendments?

@prometherion
Copy link
Contributor Author

Rebased this. @prometherion Are you OK if I remove the fluxctl install flag, and possibly make other, minor amendments?

I can take care of it because I totally get your point and probably we have to hack a bit tests with kustomize, although that could increase the test complexity :(

@Morriz
Copy link

Morriz commented Jul 8, 2020

Bumping this as it would help us a lot

@Morriz
Copy link

Morriz commented Aug 11, 2020

@prometherion can you maybe take a look at this again? Progression here would help a lot

@prometherion
Copy link
Contributor Author

prometherion commented Aug 25, 2020

@prometherion can you maybe take a look at this again? Progression here would help a lot

Should we proceed with this?

I mean, it seems Flux will be deprecated in favor of the GitOps Toolkit, not sure this PR is still relevant.

@squaremo could you give us feedback?

@kingdonb
Copy link
Member

I see a lot of work has gone into this PR and I think there is some recognition in the discussion here that it is now a longshot for inclusion in a next release of Flux v1.

In the new image reflector controller, if I understand correctly image data are maintained in the ImageRepository custom resource and memcached is no longer used, I am not certain I understand how this scales as usage grows, but it seems it will no longer be a concern of redis vs. memcached. Will that moot the entire issue, or should we continue the discussion here?

@kingdonb
Copy link
Member

This possibly related discussion has been opened: fluxcd/flux2#802

Deprecate "select latest by build time" for image automation

I have not yet evaluated the design of image-reflector-controller for scalability, but I think the folks who are more deeply embedded than myself have chosen this design to address scalability concerns exposed by flux v1. Especially if you have an opinion about this, if you have time to look, we would appreciate your comment there!

@kingdonb
Copy link
Member

In the interest of reducing the number of open issues not directly related to supporting Flux v1 in maintenance mode, and respecting you may have moved on already, I will go ahead and close out this issue for now.

If you have a use case for Flux that isn't covered well in the new Flux v2 (which is a total rewrite), we want to hear about it.

If you've been following our development efforts then of course we hope you are able to upgrade, here's more info on how to find support with that: https://fluxcd.io/support/

It looks like the decision has been made, and heavyweight image metadata caches in Flux are going to be a thing of the past soon, (for many users in production this is a thing of the past already!) Thanks for making this contribution. Closing.

@kingdonb kingdonb closed this Feb 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants