Allow ignoring of library files too #112

Open
declension opened this issue Nov 10, 2016 · 11 comments
Comments

@declension
Contributor

declension commented Nov 10, 2016

First, thanks for a great project, been very useful.

Proposal

For small lambdas in particular, it would be useful to either:

  • allow a mode that auto-ignores distribution / library files (e.g. --minimal or something) that are generally not needed (see below)
  • or allow the filters specified in lambda.json to apply not just to the source but to the virtualenv-related inclusions too, before packaging into a zip.

Naturally this should be optional as virtualenv is necessary if you're not running on the standard Amazon setup, etc.

Example

A small Python Lambda function consisting of a few source files was uploading very quickly.

After I added a single pip requirement (itself, as it happens, very small) and rebuilt, the size of the upload suddenly shot up by ~1000% (many megabytes). On further examination, this was due to the zip including:

  • Compiled versions of all library files (*.pyc)
  • Various Windows binaries e.g. gui.exe which will never be useful on AMIs
  • setuptools, wheel, pip etc and their tests
  • CA Certs (357k) - pulled in by requests (for pip)
    etc
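
To make the proposal concrete, here is a minimal sketch of applying one list of ignore patterns to every candidate path, source and site-packages alike, before zipping. The function name `filter_paths`, the pattern list, the sample paths, and the choice of `re.search` are all assumptions for illustration; this is not lambda-uploader's actual implementation.

```python
import re

# Hypothetical patterns covering the kinds of files listed above.
MINIMAL_IGNORES = [
    r"\.pyc$",                          # compiled bytecode
    r"\.exe$",                          # Windows binaries
    r"(^|/)(pip|setuptools|wheel)/",    # packaging tools
    r"(^|/)tests?/",                    # bundled test suites
]


def filter_paths(paths, ignore_patterns):
    """Return the paths that match none of the ignore patterns."""
    compiled = [re.compile(p) for p in ignore_patterns]
    return [p for p in paths if not any(rx.search(p) for rx in compiled)]


# Made-up archive-relative paths, for demonstration only.
candidates = [
    "handler.py",
    "pylms/__init__.py",
    "pylms/server.pyc",
    "pip/__init__.py",
    "setuptools/gui.exe",
    "wheel/test/test_basic.py",
]
kept = filter_paths(candidates, MINIMAL_IGNORES)
print(kept)
```

Keeping such a list opt-in (for example behind the suggested `--minimal` mode) would leave the default behaviour unchanged for anyone who relies on those packages.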

Thanks!

@martinb3
Contributor

martinb3 commented Nov 10, 2016

@declension I'm surprised by all the items you list as being included in your zip file. How is, e.g., Python itself getting into the virtualenv's site-packages directory? Maybe you could show us an example?

(I should note that you can also always skip the automatic virtualenv and maintain the necessary dependency files yourself, to control exactly what goes into the zip file.)

@declension
Contributor Author

Thanks @martinb3. Perhaps it's a misconfiguration on my part (trying to find an older example now). And yes, listing Python itself was a mistake (will update above), sorry.

And agree, could go for manual maintenance of dependencies, but I like pip a lot 😄 ... so I guess it'd be nice to keep using this whilst shedding upload weight especially on my slow connection.

Either way I patched in my fork and it's proving useful (for me)...

@declension
Contributor Author

declension commented Nov 10, 2016

To follow up (and I may still be doing something wrong), I recreated a cut-down version of that project (no virtualenv), with one dependency (pylms FWIW) and ran:

$ lambda-uploader --no-upload --no-virtualenv

Here is a listing of the lambda_function.zip contents.
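
For anyone wanting to produce a similar listing, a rough sketch using Python's standard `zipfile` module is below. It totals uncompressed and compressed bytes per top-level entry so the heaviest inclusions stand out. The helper name `summarize_zip` and the in-memory demo archive are assumptions for illustration (Python 3); for a real build, pass the path to `lambda_function.zip` instead.

```python
import io
import zipfile
from collections import defaultdict


def summarize_zip(zip_file):
    """Return {top-level entry: (uncompressed bytes, compressed bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            top = info.filename.split("/", 1)[0]
            totals[top][0] += info.file_size
            totals[top][1] += info.compress_size
    return {name: tuple(sizes) for name, sizes in totals.items()}


# Demo on an in-memory archive with made-up contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("handler.py", "def lambda_handler(event, context):\n    return 'ok'\n")
    zf.writestr("pip/__init__.py", "x = 1\n" * 1000)
    zf.writestr("pip/req.py", "y = 2\n" * 500)
buf.seek(0)

summary = summarize_zip(buf)
# Largest uncompressed entries first.
for name, (raw, packed) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{raw:>8} {packed:>8} {name}")
```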

HTH

@martinb3
Contributor

@declension Could you also share your lambda.json? I'm surprised you need packages like pip and wheel, but perhaps there's a dependency chain somewhere that doesn't make sense?

@declension
Contributor Author

@martinb3 interesting, thanks. Maybe PyLMS itself is the problem then - I guess those entries are not normally there?

Maybe I could try a test with a "less unusual" package, but meanwhile here's the latest (anonymised) lambda.conf:

{
  "name": "lambda-tester",
  "description": "Test for lambda-uploader",
  "region": "eu-west-1",
  "handler": "handler.lambda_handler",
  "role": "arn:aws:iam::900000000000:role/service-role/lambda-uploader-test",
  "ignore": [
    ".git",
    ".idea/",
    "metadata/"
  ],
  "timeout": 7,
  "memory": 128
}

@martinb3
Contributor

martinb3 commented Nov 14, 2016

@declension given what you've told me so far, I tried to reproduce what you're seeing by creating a lambda.json file with your snippet above:

$ cat requirements.txt
pylms

$ cat lambda.json
{
  "name": "lambda-tester",
  "description": "Test for lambda-uploader",
  "region": "eu-west-1",
  "handler": "handler.lambda_handler",
  "role": "arn:aws:iam::900000000000:role/service-role/lambda-uploader-test",
  "ignore": [
    ".git",
    ".idea/",
    "metadata/"
  ],
  "timeout": 7,
  "memory": 128
}

$ lambda-uploader --no-upload -c lambda.json
λ Building Package
λ Fin

$ du -sh lambda_function.zip
3.3M    lambda_function.zip

As you can see, the resulting zip file is only 3.3 MB. Do you have a fully fledged example we could clone from GitHub and try to reproduce? Otherwise I'm unable to reproduce what you're seeing.

@declension
Contributor Author

@martinb3 - great, that's actually exactly what I'm seeing too - a 3.3MB ZIP (~9MB uncompressed).

The trimmed one I was using came in at just a few KB (uncompressed: ~28k of pylms and a few more for the test source itself), as it didn't need any Python 2.7, pip, setuptools, etc. to work on Lambda.

@martinb3
Contributor

@declension I didn't realize this at first, but virtualenv itself requires those packages apparently.

Sometimes people do depend on those, so I'm not sure there's an obvious fix to filter those out (without breaking others' use of this tool). --no-site-packages still keeps those few there.

@jarosser06 thoughts? Or should @declension just ignore them manually if he doesn't want them?

@jarosser06
Contributor

We just ignore the package size, since a few megabytes isn't a big issue for us. It is a pretty trivial change to apply the ignore list to the site-packages copy as well; however, I want to consider the potential problems before making a PR.

If the short term goal is to just have pylms packaged and you have no other need for anything else, then I guess you can call lambda-uploader with --no-virtualenv and then add pylms using the extra files flag (-x). This should give you your basic lambda package with the pylms library in it.

For example:

lambda-uploader --no-virtualenv -x ~/.virtualenvs/<virtenv>/lib/python2.7/site-packages/pylms
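
The same idea (your own source plus exactly one library directory, nothing else) can also be sketched by hand with the standard library. This is only an illustration of the resulting package layout, not what lambda-uploader does internally; the helper name `build_minimal_zip`, the placeholder files, and the choice to skip `*.pyc` are assumptions (Python 3).

```python
import os
import tempfile
import zipfile


def build_minimal_zip(zip_path, source_files, package_dir):
    """Zip the given source files plus one package directory, skipping *.pyc."""
    pkg_parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in source_files:
            # Source files go at the archive root.
            zf.write(src, arcname=os.path.basename(src))
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue
                full = os.path.join(root, name)
                # Keep the package's own directory name inside the archive.
                zf.write(full, arcname=os.path.relpath(full, pkg_parent))
    return zip_path


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    handler = os.path.join(tmp, "handler.py")
    pkg = os.path.join(tmp, "site-packages", "pylms")
    os.makedirs(pkg)
    for path in (handler,
                 os.path.join(pkg, "__init__.py"),
                 os.path.join(pkg, "__init__.pyc")):
        with open(path, "w") as fh:
            fh.write("# placeholder\n")
    out = build_minimal_zip(os.path.join(tmp, "lambda_function.zip"), [handler], pkg)
    with zipfile.ZipFile(out) as zf:
        names = sorted(zf.namelist())

print(names)
```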

@declension
Contributor Author

Thanks; my short-term goal was fixed a while ago (I patched my fork locally and am using that).

Especially given virtualenv's behaviour and pip's size, I assumed this would be useful for other users. Interestingly, I notice that the exact change I made (apart from the separate, earlier changes I made for #114) already sits on SimpleHQ's fork...

@redblacktree

+1

AWS Lambda size limits can become a problem pretty quickly. For the vast majority of uses, pip, setuptools, and wheel are going to be cruft. My contention is that if your Lambda function requires any of those (why?), you should explicitly include them in requirements.txt or lambda.conf.
