Proposal: Adding mount options to functions. #320
Comments
This allows functions to mount volumes and other directories. It uses the same configuration as is used by Docker. Fixes openfaas#320 Signed-off-by: Ali Al-Shabibi <alshabibi.ali@gmail.com>
Derek add label: proposal |
Personally I think this is a fantastic idea. @alshabib , do you have any specific use-cases in mind that would necessitate this functionality? I'm thinking maybe a database (like MongoDB) as a function...? |
Thanks!
For example, for a process that cleans large amounts of data, it is simpler for a function to read the raw data from a volume and write the processed data back to the same or another volume.
The database use case is an interesting one, especially if you want to scale the number of readers to a database easily.
|
We might revisit this in the future but I think it is an anti-pattern for functions which are short-lived and stateless. This will encourage stateful behavior and assumptions. |
I agree that this feature may encourage bad behavior, but then again you cannot stop people from shooting themselves in the foot. I also agree that functions should be stateless and short-lived, but that does not mean they will not consume or emit large amounts of data, and there is no reason why the volume of this data should be limited by the HTTP session. This is simply an alternative method of providing input to a function. Would you prefer an option in OpenFaaS that would disable this feature, rather than not providing it at all? |
I am writing a Ruby function that finds an IP address in configuration files (firewall, proxy, BigIP, etc.). That's a good use case for a volume mount functionality, no? |
Feedback (take it or leave it): This is the only missing feature that prevented me from deploying this system for functions that handle batches of file pulls/pushes, translations, scraping. It's really an amazing framework, but I often have to deal with very large flows of handling files gathered from abc protocols due to xyz legal or contractual obligations. A serverless function system like this with the ability to have any kind of volume support would be pretty helpful. I really want full-blown Compose functionality with regards to volumes and network. File access is way more reasonable for this kind of thing. I hope you reconsider implementing it. I would be interested in what workarounds I can apply to achieve the result of bind-mounting a specific fixed directory on all swarm workers in the cluster. I might just deploy with the related patch to solve my problem. Is there any other way to solve it? A service on the same network maybe? |
Hi, I'd like to know more about your use case. Do you have any specifics? |
So, I'll give you one such function, say you have to transcode/transcribe a proprietary audio format designed for storing call center interactions. You need to extract from it the audio payload, make it into something that can be understood by a voice transcription engine, as well as extract the other data and store it in a way that it can be digested for its analytics value, and place both of those things in a place where they can be picked up by another process later. I do a lot of batch information pulling/pushing/translating/feeding/scraping across many files for a lot of clients, but occasionally patterns emerge where I'd love to be able to develop a nice parallel function that I can feed variables and a list of files to do work like decrypting these multiple batches of 50000 audio files, dropped on some FTP site on storage we own and can export to our container host machines... I'd rather not do such a thing through a layer like S3, I need something at least slightly faster, and NFS is simple. |
I think that in general the advantage of such a feature would not be to allow a function to store state, but rather to enable a function to process larger data volumes. Of course, it would be hard to prevent a user from doing the former, but I guess you can only put up so many guard rails.
|
I ask that you give us just enough rope to hang ourselves if we so choose. We understand the spirit of your project, we just want a way out of its limitations that is easy for us to use. |
+1
|
Using object storage is simple and fast. I'd encourage anyone on this thread to try our recommended approach before pushing back and insisting on volume mounting. Minio / Ceph are both super easy to set up on Docker or Kubernetes: Once you have the daemon running you can access it in the cluster and create buckets / push/fetch objects via a client library. Here's the Python library for instance: https://docs.minio.io/docs/python-client-quickstart-guide I'm planning on providing a small sample function but in the meantime there's our colorisebot which you can check out for inspiration. |
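To make the suggested workflow concrete, here is a minimal sketch of a function body using the minio-go v7 client (the quickstart linked above is for Python). The endpoint, credentials, bucket and object names are placeholders for illustration, not part of any official OpenFaaS sample.

```go
// Package function: a sketch of the "object storage instead of volumes" pattern.
package function

import (
	"context"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

// processViaObjectStore downloads the input object to local scratch space,
// lets the function work on it as an ordinary file, then uploads the result.
func processViaObjectStore(ctx context.Context) error {
	// Connect to an in-cluster Minio endpoint (placeholder address and keys).
	client, err := minio.New("minio.openfaas.svc:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: false,
	})
	if err != nil {
		return err
	}

	// Fetch the raw input into the function's writable /tmp filesystem.
	if err := client.FGetObject(ctx, "incoming", "batch-0001.tar.gpg",
		"/tmp/batch-0001.tar.gpg", minio.GetObjectOptions{}); err != nil {
		return err
	}

	// ... decrypt / transcode / transform the local file here ...

	// Push the processed output back so a downstream process can pick it up.
	_, err = client.FPutObject(ctx, "processed", "batch-0001.csv",
		"/tmp/batch-0001.csv", minio.PutObjectOptions{})
	return err
}
```

The pattern keeps the function itself stateless: durable data lives in the bucket, and /tmp only holds scratch copies for the duration of one invocation.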
It's an appealing solution, and I appreciate it for sure; the only problem with it is the preexisting infrastructure and scripts that depend on a volume being there. I will definitely take a look in any case, but I won't have time to refactor the many jobs to use S3 when they currently use volumes. Many of us are trying to use newer tools like this to simplify older systems, and while I might be able to sell my boss on the philosophy of why S3 might be better, there is simply too much work to do to scale such a mountain of technical debt just to appease a design preference. Unfortunately, it appears Fission isn't better in this regard. I might have to kludge something stupid together with Jenkins to kick off runs, I guess... Any other input is welcome, as volumes are a must at least in the interim. Thank you for your work even if we couldn't come together on this problem. It's a good project. |
@alexellis So if I deploy Minio on K8S and have NFS as a PersistentVolume for Minio and then store files in Minio from Functions, I will essentially be able to access NFS from the Functions. Is that correct? And would it be a good idea to do this? |
@raw I'm not sure you've read the new blog on using Minio & OpenFaaS? |
@Toggi3 If you have a binary that reads a file from the filesystem, then copy the file with Minio's |
@alexellis I'm sorry, which blog are you talking about? The Getting started OpenFaas on minikube? |
All - I've written a blog post on how to use object storage (S3) with OpenFaaS - just like you have to do with AWS Lambda, Azure Functions or similar. You have client libraries available for most programming languages including a binary client for bash. It's very easy to use and setup: https://blog.alexellis.io/openfaas-storage-for-your-functions/ The performance is also good. This powers Colorisebot on Twitter. |
What if that thing is a gpg encrypted archive that is >10GB that has to be decrypted then untar'd, dumps out a ton of proprietary audio files from a call interaction system that have to be transcribed into digestible pcm wav and csv metadata by another process and stored back on the volume for another process to pick up? I have to first wait for a copy operation from s3, do my operations, then copy it back? Too much time. I already have to sometimes pull these things from remote SFTP, S3, google drive, locations over the internet and I am targeting 24 hour turnaround for jobs like these every day, end-to-end. We don't choose how our payloads are constructed, or even necessarily how they are delivered, because we aren't the producers of them. Some of these payloads are not nice to work with at all. |
@Toggi3 you'd have the same problem(s)/issue(s) with an underlying NFS filesystem. Moving a 10GB file around on the network is a very specialist problem. |
Over the weekend I might try to do as you suggest and compare performance. I agree I have a very specialist problem, for which I have been seeking out specialist solutions like docker functions... |
So a persistent posix compatible volume introduces too much state in the system, but a persistent object store does not? That doesn't even make any sense. State is state.

But let's say that's a valid argument for argument's sake. There is also the fact that there's not an object store out there that can compete with the performance of distributed parallel filesystems designed for high performance computing. Or that very few real world applications can easily or efficiently interact with an object store. Not everyone is working with brand new shiny applications. Very few people are. Most of us have to deal with legacy applications, and work to slowly change them over time while we dream of rewriting it in the always distant "someday." Most of us also rely in some way on 3rd party libraries and apps, and again most of those cannot easily or efficiently interact with an object store.

Copy the file down and back up? If functions are indeed supposed to be short-lived, then suggesting that they should spend 2/3 of their runtime performing network transfers is rather silly. It's also a massive waste of CPU resources and time. Now we're required to have a substantial amount of additional resources in order to perform the same amount of tasks in the same time as we could if we could just grab the data off of a volume.

But let's just say we're fine with copying the file down and back up. What about operations where we need large amounts of disk space. Let's take video transcoding for example. We may need several hundred GBs or more of disk scratch space to perform the operation. We probably want to be able to run more than one function at a time on each server. And we're unlikely to have servers sitting around with several TB of local disk attached to each one, especially in the cloud. It's just cost-prohibitive. But we are probably more likely or inclined to have a large high performing distributed filesystem mounted on each one. Here's an example where we want the mount not for state at all (remember we are assuming here that we're fine with copying a massive file down and back up), but just for temp/scratch space in order to carry out the function.

Don't get me wrong, I'm a big fan of this project. And I can admire your dedication to the principles of the project. But the world isn't as black and white, and there are a whole host of people that you're shutting the door to the project on because they can't do something as simple as bind a mount. The door is shut on anyone with any kind of legacy application that they want to start to use something like this for. The door is shut on anyone with an application that has "a very specialist problem." The world is very specialized, and there are a whole lot of specialized applications. You're excluding a lot of people from benefiting from this project over what is such a small request. It's your project, so do what you will, but at the end of the day nobody is asking for anything that docker doesn't already do. All that is being asked is that people be able to utilize an existing basic feature of docker.

Let's go back to the beginning (state) for fun. Functions can connect to any external service they want, databases for example. Is a mount point really going to encourage stateful behavior more than a database connection does? I don't really think so. You don't prevent a function from interacting with the world outside of it -- most of which is stateful. I don't see how a volume is fundamentally any different. |
I would like to follow up on this feature request. Honestly, it's something that is causing me to hit a wall with introducing Openfaas into our current pipeline and offering a migration path from our current VMs and shared process manager approaches to deploying small arbitrary services and event handlers for users. While I understand that it is considered an anti-pattern to rely on mounted volumes for state and configuration in containers, it is also very limiting for cases where it is needed.
Sure it would be great if all of our code were updated to pull configs from consul, could be 100% packaged as standalone in a container, and do any filesystem data transformations through an object-store api. But we aren't there yet and the transition would be slow. We definitely want to get to this point though. Furthermore, Openfaas states that it officially also supports long-running microservice workloads in addition to short-lived functions. So to say that Openfaas only focuses on faas patterns doesn't seem to align with that extended support? I feel it would be ideal to enable users to solve their problems, even if it means they have to enable the feature and that there are warnings and notes around the pattern as being less than ideal. In our case, it would really help transition our 15+ year old pipeline.

It seems Fission supports Volumes now in their pod spec: But honestly, I want to use Openfaas. I've already prototyped custom templates for a facility private template store. I have written some patches to

As a semi-related anecdote, I maintain the build and deploy system for our code at my studio. It happens to be an extension of the Waf build system. Now the maintainer of the Waf build system is extremely opinionated about what should and should not be allowed in the build process for a user, which has led to some feature requests or pull requests being denied. In which case, they end up being an extension added to our build system instead, because we need to enable users to solve their problems. There may not be something directly provided as a 1st class concept in our build system layer, but then we still enable users enough flexibility to do what they need to do to solve their problems. They may need to opt into a feature that is documented with caveats or opinions. |
Actually, I had a similar request before: #1232.
Yes @feiniao0308 it sounds like my situation as well, where we have a studio with tons and tons of library and application versioned deployments to various nfs mounts. They are deployed frequently by many teams. It is currently not feasible for us to fully package them into a container as we aren't 100% able to trace all the dependencies. Some libraries link against other libraries, so you have to resolve the entire dependency chain, even looking at the RPATH of linked libraries, etc, etc. |
Exactly. I hope OpenFaaS could expose the mount option. When I searched for the mount keyword in the issues, I did see many similar requests.
I've confirmed on each of their slack channels that both Fission and Nuclio support full expression of volume mounting in their yaml specs. It would be really awesome if Openfaas would match the support. |
Not sure if OpenFaaS will support exposing the mount option and letting the function owner make the decision. It would make OpenFaaS more flexible if the function had this option exposed. @alexellis @justinfx
@feiniao0308 I've already got NFS mounts working in the PodSpec, in Fission.io functions. |
@justinfx do you add extra steps to update the function pod spec after it's deployed? How do you make it work?
Not to get too off topic about another project, but you just use their |
@justinfx thanks for the info. I'll check that project. Thanks! |
just +1'ing this as a feature that would be great to have. I've read everyone's arguments in this issue as well as this one #1178 and I think it's fair to say this would be a great option.
I was reading about Hashicorp Nomad and the integration with OpenFaas via the faas-nomad provider. On the topic of volume mounts... |
@pyramation what is your specific use-case, and have you tried object storage yet? |
@justinfx can you detail what your function does? It is likely that you can use object storage such as Minio or build data into the container image. |
Hi @alexellis . I work at a large visual effects studio, with many years of legacy code making up the pipeline. In addition to traditional core and pipeline developers we have hundreds of artists with some level of coding skills that are capable of scripting tooling around their primary digital content creation packages (Autodesk Maya, Foundry Nuke, SideFX Houdini, ...). Our common way of interacting with project data is through a complex abstraction on top of NFS mounts and filers. Layers are built on layers, with applications and libraries that write to the file system.
@justinfx that's interesting! So if I understand correctly, some assets are so large that it's better to have functions dynamically attach themselves to read from (and potentially write to) the drives to perform an operation, vs having to download them over the wire each time. (p.s. I used to work for SESI w/Houdini) @alexellis my number one use case right now is developer experience for creating openfaas functions, particularly hot-loading. I'm using Kubernetes and OpenFaaS and if, during development, I could hot-load my code, I would save quite a bit of time that I normally spend building the Docker images for every code change. In the case of Node.js it would save me up to a minute per code change. Even when using Python, it can feel like a larger-than-needed compile step for any code change, whereas with a hot-loading volume the changes take milliseconds - Simon did write something for docker-compose https://gitlab.com/MrSimonEmms/openfaas-functions/-/blob/master/docker-compose.yaml#L12 but it would be great if there was a solution for k8s. |
@pyramation yes I should have put a little more effort into focusing on the data question from @alexellis. We generate lots of data. It would not be uncommon for a simulation to generate 1TB of temporary or intermediate data. Our pipelines are about transforming data until ultimately we produce final pictures for a movie. So the idea of using functions in our pipeline would be to respond to async events and perform transformations on arbitrary data. Some work would be too time consuming and need far too many resources to be done in a function invocation, in which case we would just use functions to trigger jobs in our render farm. But there is plenty of work to be done in event handlers where we need access to image data, simulation data, scene files, and applications and libraries that may have no support for an object storage API. We need the flexibility to support these workflows, even if ultimately it would be better to do what we can through an object storage api that maybe proxies to our nfs file system. |
@alexellis have you had time to consider my last replies to your question as to why a Minio solution would not be sufficient? I would like to know if your position is firm on this and we cannot expect Openfaas to ever allow any kind of mounts (NFS, hostPath, config map). Or if maybe, with the amount of support for this feature request, your position has softened to where it could be an opt-in configuration option in the deployment of Openfaas? I feel that there have been enough replies to your request for justification of the feature that it warrants some kind of support, bringing this project in line with the same offering in other frameworks. |
Hey folks. I wanted to bring this up again because by not allowing access to volumes, functions are not able to communicate with the GPIO pins using /dev/mem. This is a problem for me, and the workaround is to run your containers in privileged mode, which feels like an even worse idea than possibly allowing state in containers. Given the Pi is an explicit deploy target and IoT is an explicit use case, this seems like a miss. Is there a workaround here that I'm missing? |
@funkymonkeymonk it seems clear that Openfaas has a hard stance against allowing mounts. But a workaround for this limitation is to use mutating webhooks in Kubernetes, which would let you do something like an annotation declaring a need for a mount; the webhook can then mutate the spec and add the volumes. You could either write and deploy a mutating webhook manually, or implement it in something like Open Policy Agent. |
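As a rough illustration of that workaround, here is a sketch of the patch-building part of such a mutating webhook, assuming Kubernetes admission/v1 and core/v1 types. The annotation name, NFS server, and paths are invented for the example and are not defined by OpenFaaS or this thread.

```go
// Package webhook: sketch of an admission handler that adds an NFS volume to
// function Pods carrying an opt-in annotation.
package webhook

import (
	"encoding/json"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
)

// mutate inspects the Pod under review and, when the opt-in annotation is set,
// returns a JSON patch that appends an NFS volume plus a matching volumeMount.
func mutate(review *admissionv1.AdmissionReview) (*admissionv1.AdmissionResponse, error) {
	var pod corev1.Pod
	if err := json.Unmarshal(review.Request.Object.Raw, &pod); err != nil {
		return nil, err
	}

	resp := &admissionv1.AdmissionResponse{UID: review.Request.UID, Allowed: true}
	if pod.Annotations["openfaas.example.com/nfs"] != "true" {
		return resp, nil // not opted in: admit the Pod unchanged
	}

	// JSON-patch operations; "add" with a ".../-" path appends to an existing
	// array, so a real webhook would also handle an absent spec.volumes list.
	patch := []map[string]interface{}{
		{
			"op":   "add",
			"path": "/spec/volumes/-",
			"value": corev1.Volume{
				Name: "shared-data",
				VolumeSource: corev1.VolumeSource{
					NFS: &corev1.NFSVolumeSource{Server: "nfs.example.com", Path: "/export/data"},
				},
			},
		},
		{
			"op":    "add",
			"path":  "/spec/containers/0/volumeMounts/-",
			"value": corev1.VolumeMount{Name: "shared-data", MountPath: "/data"},
		},
	}

	raw, err := json.Marshal(patch)
	if err != nil {
		return nil, err
	}
	pt := admissionv1.PatchTypeJSONPatch
	resp.Patch = raw
	resp.PatchType = &pt
	return resp, nil
}
```

The webhook would still need to be registered via a MutatingWebhookConfiguration and served over TLS; only the mutation logic is sketched here.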
Thanks for the thought. Unfortunately I am using faasd to avoid having to run k3s, so Kubernetes-based solutions require a full rethink, and honestly if I'm going that route I'll likely look at alternatives instead. |
I also really wish that volumes were exposed.... |
I also have a use case that requires applying --volume to the container to allow USB access, so the container can connect to the TPU device that is attached to my Pi. I am not sure if there is a way other than --volume to achieve this. |
I own a Coral edge TPU, so find out exactly what is required and copy and paste the pod spec here. We will not be enabling privileged mode for functions, which I saw you request a day or two ago. I sent you some examples on the issue with devices etc. Did you try them? |
Thank you, Alex. I got your point. I am going to document how I gave functions TPU, GPU, etc access and will share it here later. |
Finally we searched for another technology... We really need big volumes attached to our functions. We tried Fission. It does provide a new volume for each new function, but unfortunately it doesn't scale back to 0. We ended up using Gitfaas.... It creates one pod per request and you have complete access to the deployment's specs. So we have a clean volume created each time. I know that OpenFaaS has chosen to fork processes to gain speed. But as leaders on this subject I don't like it when they force people into thinking that FaaS must always be short-lived and fast. It's just the road they chose to go down. Each piece of tech has its strengths and weaknesses. |
The proposed change would allow functions to mount volumes and other directories through the normal docker configuration. This would allow a function to process relatively large amounts of data without having to pass it through http/stdin.
Any design changes
Add the Docker mount struct to the CreateFunctionRequest struct and pass it along in the create-function handler.
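A hedged sketch of what that change could look like, reusing Docker's own mount types; the surrounding fields are abridged for illustration and the exact placement in the real openfaas/faas code may differ.

```go
// Package requests: abridged sketch of the deployment request with a Mounts field.
package requests

import "github.com/docker/docker/api/types/mount"

// CreateFunctionRequest (abridged). The proposed Mounts field reuses Docker's
// own mount schema, so bind and volume mounts are expressed exactly as they
// are for docker run / Compose.
type CreateFunctionRequest struct {
	Service string `json:"service"`
	Image   string `json:"image"`
	Network string `json:"network"`
	// ... other existing fields elided ...

	// Mounts would be passed through unchanged by the deploy handler into the
	// container spec it creates (e.g. ContainerSpec.Mounts for the Swarm provider).
	Mounts []mount.Mount `json:"mounts,omitempty"`
}
```

Because the field reuses Docker's schema, a deployment request could carry, for example, a bind mount with Type "bind", Source "/mnt/data" and Target "/data", and the handler would only need to copy the slice into the spec it already builds.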
Pros + Cons
Pros:
Cons:
Effort required
Little, it's a two line change.