Investigate caching layer for kubeapps plug-ins #3032
Just a thought: I don't know that we need to go that far, and even if we did, it doesn't solve the fact that such a CRD won't necessarily change when the corresponding index.yaml is updated. With flux it will, aiui, flux itself will handle keeping the cached index.yaml up to date with some set frequency, and I guess you're planning to watch that resource for changes and update the extracted content of the redis cache when that happens. But I wonder if a more general approach, which will work for any requirements, would be for a plugin to be able to provide a function that gets called on some condition, whether that's a time or a watch. Note also that I don't think a separate controller/deployment is required... any goroutine could run on startup as part of the existing kubeappsaips service to watch resources and/or do things at certain times... might be worth starting as a goroutine there, but see what you think. I guess one other thing that may become essential here (if it's a general service provided to all plugins) is that the process that watches is going to need explicit RBAC to whatever it's watching (unlike our API endpoints which can run synchronously as the user). If it's a goroutine running in the kubeappsapis service, then I'm OK with the burden of responsibility being on the person configuring (or the chart) to bind the read-only RBAC for the configured plugins to the kubeappsapis service account. Looks like an interesting problem!
* bump fluxv2 version for local dev env
* step 2
* step 3
* step 4
* step 5
* step 6
* step 7
* step 8
* step 9
* step 10
* step 11
* step 12
* step 13
* step 14
* undo un-intended changes from a messed up merge
* step 15
* step 15
* test cleanup
* Michael's feedback
* Michael's comments #2
* step 16
* step 17
* step 18
* revert unintended change
* step 19
* Michael's feedback #3
* Michael's feedback #4
* small change to force CI test run
* small change to add a name to AvailablePackageDetail response
* Michael's feedback #5
* add a couple of debug statements to help diagnose CI failures
* Add --set redis.auth.password=password in CI
* Fix chart deps version

Co-authored-by: Antonio Gamez Diaz <agamez@vmware.com>
The first step was pushed. My next item is to improve the efficiency and robustness of the cache based on watches.
Possibly related to the above note: at the moment I am always setting expiration (time-to-live) to 0 for objects in the cache, which means objects never expire. I am wondering if I should give the caller (a plug-in using the cache) the option to set the TTL to something other than 0 and, if so, give them some means to either periodically refresh the cache contents or do it on a cache miss.
That may help with another aspect I mentioned above: some plugins will need items to be updated regardless of whether a related k8s resource changes. For example, AppRepository CRs in our existing helm support just point to a URL, and currently each has a cron-job to ensure it's checked periodically.

Also, as per our discussion on your PR, I'd be keen that we verify that using Redis as the backend for cache data will enable plugins to filter/query cached data (eg. filtering available packages by various fields, paginating results without duplicates etc.), as if the query possibilities are not strong enough, we may need to use postgresql (with json fields) as we do for the current helm support. Interested to see what you find.
That sounds good. I can look at that real soon. It would help to see some unit tests in the helm plug-in that exercise filter/query options.
On second thought, using k8s.io/client-go/tools/cache may not be relevant for us. Upon closer inspection, k8s.io/client-go/tools/cache is "a client-side caching mechanism". In other words, it is something you might use in the absence of server-side caching. We have server-side caching with redis.
copied from slack thread:
Each of (1), (2) and (3) are separated by a network hop, so the farther down
Yep, as you said, they're designed and optimized to store a value (only), not a structured document with its own fields and subfields. Both SQL DBs and document DBs are optimized for storing and querying structured data (postgresql just happens to do both normalized DB tables and document DB functionality).
Yeah, I wondered about this - it makes perfect sense if you want to cache a simple value (the content of an index.yaml, as you're doing), but it may mean more work to query or use finer granularity at the cache layer. Worth thinking whether we can store something general but shared, such as the JSON for an AvailablePackage, for example (though no idea if that would work with Redis). ie. if plugins can cache and query AvailablePackages, though perhaps there's other things that would be needed... look forward to hearing more of your thoughts/investigations there.
Interesting! Regarding your options, I'd personally think that (3) is the best option if the backend can support it (this is what we're doing with the current helm support), but (2b) is less ideal but OK if the backend does not support it. I don't personally think (2a) is an option - the aggregate API should be aggregating results from the plugin API calls of the same signature (I'll post a separate comment about the pagination point that came up on slack). Regarding option (3) with Redis, I'd definitely avoid creating our own indexing system (your 3.1 which you also identify as a nightmare :) ), but using the existing (or nascent) RedisSearch indexing functionality to which you've pointed could be useful?
It's not clear to me from their docs yet whether a collection of data should be stored as one document (as in their books example: I wonder if it's worth investigating creating a doc per available package detail, then indexing as needed. This would mean we could provide a trivial API for each plugin to cache and query packages. Not sure though, keen to see what you find out. But to the larger question, yes, if you do go with Redis, I think we'll definitely be depending on these new RedisJSON and RediSearch capabilities. The only non-OSS part seems to be the Active:Active support, which isn't something we'd need anyway?
Dimitri said:
I don't think this is true. If individual plugins provide pagination then we can aggregate them easily enough, I think. (Note that the page token used in the API is opaque and so can easily include offsets for each plugin. The only sticking point is that offset needs to be to a certain item in the result, rather than a page of the result, which would be ambiguous as we may have included only half of a page in the aggregated results).
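The point about the opaque page token carrying per-plugin offsets could be sketched as below. The token shape and function names here are hypothetical, purely to show how an aggregate API can resume each plugin at an item (not a page) boundary without duplicates.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// pageToken is a hypothetical shape for the opaque NextPageToken: the
// aggregate API records, per plugin, the index of the next item to return,
// so a follow-up request can resume each plugin mid-page.
type pageToken struct {
	Offsets map[string]int `json:"offsets"`
}

func encodePageToken(offsets map[string]int) string {
	b, _ := json.Marshal(pageToken{Offsets: offsets})
	return base64.StdEncoding.EncodeToString(b)
}

func decodePageToken(tok string) (map[string]int, error) {
	b, err := base64.StdEncoding.DecodeString(tok)
	if err != nil {
		return nil, err
	}
	var pt pageToken
	if err := json.Unmarshal(b, &pt); err != nil {
		return nil, err
	}
	return pt.Offsets, nil
}

func main() {
	// e.g. 7 items already returned from helm, 3 from fluxv2
	tok := encodePageToken(map[string]int{"helm": 7, "fluxv2": 3})
	offsets, _ := decodePageToken(tok)
	fmt.Println(offsets["helm"], offsets["fluxv2"]) // prints "7 3"
}
```

Because the token is opaque to the client, its internal shape can change later without breaking the API contract.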
I'm very keen for the aggregate API to support exactly the same signatures as the individual plugins (ie. filtering, pagination etc.)
FYI, we had a team meeting today where I described the full context of the current situation with the caching layer to Pepe. He mentioned that there is a strong goal to have a kubeapps release mid-September which should include the 'direct helm' and 'flux' plug-ins. Something working that we can demo.

Given that, it may be more important at this time to focus on bringing the flux plug-in up to date to have feature parity with the helm plug-in, and not invest 2-3 weeks right now into a promising but not yet proven technology (RediSearch/RedisJSON), particularly given the fact they are in private preview mode at this time. What may look good on a blog post MAY turn out to be less than perfect. I've been around long enough to know that :-).

I told Pepe this would be fine, and the best option to accomplish this seems like option 2b I described above: filtering done on the flux plug-in side. That way, I can keep what I already have working (some very basic caching with redis, where I can use redis to filter by keys (repo names)) and implement other filtering options in the flux plug-in itself, which shouldn't be too hard in the short term. I should have it done in 3 days or so. This would be a more iterative approach.

Longer term, we will go back and revisit the RediSearch and RedisJSON modules and see if we can make them do the hard work. The changes would be contained in one plug-in only and wouldn't affect other layers of the code. The helm direct plugin would, for the time being, continue to use the postgres DB as its cache, as the caching layer I am working on isn't really finished. Antonio also suggested it may be worthwhile to look into a graph DB like Neo4j if redis doesn't deliver what we need, when we get enough time to look into what they offer with these new modules. This approach is fine with me. Pepe promised to discuss with Michael on the upcoming stand-up and let me know.
To be absolutely clear, I still believe that a lightweight, in-memory DB like redis is a better answer for caching purposes than postgres. We'll just table the unknowns for a little while to make progress with other features, and get back to it when we have time.
Yep, happy to move forward with your 2b option if that means 3 days vs 3 weeks, especially given the new nature of RediSearch/RedisJSON. We can postpone the other plugins using a shared caching layer for now.
It may turn out that way, with RedisJSON and RediSearch. Let's see (in the future).
Yep, fine with me.
Since the usage of RedisJSON and RediSearch is not allowed due to license restrictions, and kubeapps plugins are showing standard response times in our testing, IMO we should close this issue and reopen it only if we really do need to add a caching layer for plugins, after testing in more demanding scenarios.
Sounds good. I've not done in-real-life tests of the flux plugin (ie. via the UI), but I think Greg has done lots of testing (probably without the UI, but it should be the same). If we can add the bitnami repo and install charts without issue, then we don't need any more investigation here for now. Let's make that the review task to move this to done. I'm keen to try it out anyway (probably next week). Thanks!
Great, thanks Greg. That's good to know. And yes, we can work to improve them as we move forward; we just want to ensure it's comparable, which it is. I'll still aim to check out where we're at using this via the UI on Monday.
Ah, got to the bottom of it. The culprit had nothing to do with filtering. As a matter of fact, filtering took 0 time :-). The culprit was unmarshalling bytes from redis cache into
Excellent, nice work Greg.
I bet you can too! The question is whether you can then improve the helm one afterwards too :P
Yep, a simple switch from json to gob encoding (a few lines of code) and look at the difference :-)

The only other thing I wanted to say about this issue is this: at some point back there was talk of writing a generic caching module that all plug-ins could use. I tried my best to do that with the cache I wrote. As things stand today, someone who wants both the helm plug-in and the flux plug-in will have a postgresql installation as well as a redis installation in their cluster. Kind of a heavy footprint. As I stated before, I strongly feel redis (an in-memory lightweight cache) is a better fit for our needs than postgresql (a full-blown relational database that pulls data off disk as needed, granted with many more features than a KV store). Anyway, part of me was hoping that at some point the helm plug-in might be re-factored to use redis as a cache as well. It doesn't have to be now and it doesn't have to be part of this issue, I just wanted to mention it so we don't forget about it.

Having said all that, feel free to close this issue as you see fit.
Awesome Greg!
Yes, I think last time we discussed this we were saying that as long as we can have the same functionality (querying and searching via json queries, which both postgres and mongo have, or something that otherwise enables the same functionality when indexing) then we'd be keen. I don't remember exactly, but I thought that led to those extra modules for redis which we can't include. Either way, it'd be excellent if we can do that eventually.
Well, clear communication is very important, we've said that time and again. In that spirit, I'd like to lay out all the cards (or more like concerns) on the table. Not 100% sure what I am about to say below belongs in a public forum, let me know if not, will be happy to move it to a private forum if needed. Regarding
Let's switch gears for a second. Am I right in thinking that the scenario where a customer has two plug-ins (helm and flux) enabled, and therefore both postgresql and redis deployed, is unacceptable even for an MVP? If I were a customer, I would certainly ask why two different caching solutions are used and why my maintenance/administration costs have doubled. If the answer is "not acceptable", this is an existential threat, an underground mine that's just waiting to explode at some future point. We shouldn't just ignore the issue; it'll come up and will be much more painful to deal with later. If the answer is "yes, it's acceptable for the MVP", then OK, we can proceed and drop this topic for a while.
Catching up on this thread. I agree with Greg that a decision must be made to either switch caching over to redis or stay on postgres. We cannot have both solutions (except during a transition phase). In general redis should be better suited to caching than postgres. Greg has shown that there is no performance issue from performing searching/filtering in code. We should be able to upgrade to using redis, and we can always use RedisJSON/RediSearch if/when it becomes possible.
There was an earlier comment from Michael about the TTL for CRDs that do not have update events:
While it may be true that we can support this, in the case of AppRepository I think we should instead fix the AppRepository itself. Currently, the AppRepository does not have a status that provides conditions of its state, as is common in k8s. In particular, the AppRepository should have a condition recording the last time the sync was executed, whether it failed, and also whether the index.yaml changed. There is another benefit from having status conditions: they make it easier to provide status feedback in the UI. I believe today we do not show any status, and the only way to find out about issues is to look at the cron job (which a user may not have access to).
Agree 100% on both points above: (1) decide on ONE base caching technology (redis or postgresql). Finally, as I've already stated, the filtering/searching limitations of core redis are a non-issue for the datasets we use in practice.
Sheesh, quite a thread to come back to after the weekend. Just want to re-iterate, I'm trying to say "Awesome [work] Greg", and on the following discussion to agree that yes, as long as we can show that we have something that provides the same functionality to the user in the browser (whether that's by having access to the same json document search or something else that enables the same functionality to be exposed when indexing repos), I'm happy for us to consider updating the helm plugin to use the same at some point. The response reads like something else has been understood. Replying in-line:
Right, I'm not involved there.
I understand you to mean here that the plugin itself does the filtering (still server side, but yes, client-side to redis/postgres/whatever) rather than getting the backend technology (redis, postgres, mongo whatever) to do the filtering. That's great and I have no problem with that given how quick you've shown it to be. I think you've mentioned that for a paginated request at the end of a large index, this is then scanning the whole index to return the last page, but that this is still much quicker and won't be an issue for memory. Great, let's start testing that end-to-end as a user of Kubeapps. I plan to setup Kubeapps with the plugin and give it a whirl this week.
I'm a little sad about the tone being used here. I think we've chatted about this numerous times now and each time I've explained the history of how we ended up using postgresql's documentDB support here. It's not due to any fondness of mine.
It's not unacceptable for trying out the flux functionality in Kubeapps, as I'm going to do this week. But for an MVP, I'm not sure that we can recommend enabling both plugins (Helm+Flux) at the same time, given that the Helm plugin will be picking up any helm installation, whether created by the Helm CLI, Kubeapps's helm plugin, or flux, rather than leaving the ones created by flux for the flux plugin to handle. We'd already talked about not currently supporting having both Helm+Flux plugins enabled at the same time. It'd make more sense to have Helm+Carvel or Flux+Carvel enabled (when available), but only ever one Helm-based plugin.

If we're not supporting a scenario where both Helm and Flux are enabled, we can similarly update the chart so that postgres isn't deployed if the helm plugin is not enabled. If we later find that people do have a need to have both plugins enabled (Helm+Flux) we could do something about that, but I'm not sure we'll need/want to. Longer term, if we do end up continuing to support both plugins, I'd still be keen to invest in unifying them, for sure, but it could be (for example) that we retire the helm-direct plugin at some point in the future.

On the other hand, if updating the helm plugin to use redis is something really important to you right now, I really don't mind. No one wants to stop that from happening. Dimitri wrote:
Hi Dimitri. See my reply above and let me know what you think. Regarding the details of the AppRepository controller, can we please move that to a more relevant issue (one that exists already, or create one if not). Just re-reading my earlier comment before submitting this one, I see that the words I used, "or something that otherwise enables the same functionality when indexing", could be read to mean that the technology itself had to enable json search capabilities for me to consider it for some reason, rather than meaning that the choice of technology just needs to enable the same functionality for the user (the ability to filter the data). I'm sorry that wasn't clearer. Just for the future, please know that I have no interest in blocking solutions that work well.
Yes, that's how I read it, hence my comments. Apologies for the tone. Please don't take it to heart, I mean well. Not supporting helm+flux enabled at the same time, ever? That's a new one, first time I am hearing anything like that. That does change the picture a bit.
Initially the caching layer is specific to the flux plugin, but the end goal is to generalise it to provide caching to all plugins as needed.
Here is the plan