Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-141545: Request - Reduce package dependencies chaos #284

Closed
manugarri opened this issue Feb 27, 2020 · 34 comments · Fixed by #725 or #845
Closed

SNOW-141545: Request - Reduce package dependencies chaos #284

manugarri opened this issue Feb 27, 2020 · 34 comments · Fixed by #725 or #845
Assignees
Labels

Comments

@manugarri
Copy link

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using (python --version)?
    python 3.7

  2. What operating system and processor architecture are you using (python -c 'import platform; print(platform.platform())')?
    Linux-5.3.0-40-generic-x86_64-with-debian-buster-sid

  3. What are the component versions in the environment (pip list)?

Package                    Version   
-------------------------- ----------
ansible                    2.8.6     
asn1crypto                 1.2.0     
attrs                      19.3.0    
awscli                     1.16.265  
azure-common               1.1.23    
azure-storage-blob         2.1.0     
azure-storage-common       2.1.0     
backcall                   0.1.0     
bleach                     3.1.0     
blis                       0.4.1     
bokeh                      1.4.0     
boto                       2.49.0    
boto3                      1.10.50   
botocore                   1.13.50   
certifi                    2019.11.28
cffi                       1.13.1    
chardet                    3.0.4     
Click                      7.0       
cloudpickle                1.2.2     
colorama                   0.4.1     
conda                      4.8.0     
conda-package-handling     1.6.0     
credsman                   0.0.1     
croniter                   0.3.31    
cryptography               2.7       
cymem                      2.0.2     
dask                       2.6.0     
datasketch                 1.4.10    
dateparser                 0.7.0     
decorator                  4.4.0     
defusedxml                 0.6.0     
distributed                2.6.0     
docutils                   0.15.2    
dxtrack                    0.6.1     
elasticsearch              7.1.0     
entrypoints                0.3       
fastjsonschema             2.14.1    
fastparquet                0.3.2     
fsspec                     0.6.2     
future                     0.18.1    
gspread                    3.1.0     
HeapDict                   1.0.1     
idna                       2.8       
ijson                      2.5.1     
importlib-metadata         0.23      
ipykernel                  5.1.3     
ipython                    7.8.0     
ipython-genutils           0.2.0     
ipywidgets                 7.5.1     
jedi                       0.15.1    
jellyfish                  0.7.2     
Jinja2                     2.10.3    
jmespath                   0.9.4     
joblib                     0.14.0    
json5                      0.8.5     
jsonschema                 3.1.1     
jupyter-client             5.3.3     
jupyter-core               4.5.0     
jupyterlab                 1.1.4     
jupyterlab-server          1.0.6     
llvmlite                   0.30.0    
locket                     0.2.0     
MarkupSafe                 1.1.1     
metaflow                   2.0.0     
mistune                    0.8.4     
more-itertools             7.2.0     
msgpack                    0.6.2     
murmurhash                 1.0.2     
nbconvert                  5.6.0     
nbformat                   4.4.0     
nl-core-news-sm            2.2.1     
notebook                   6.0.1     
numba                      0.46.0    
numpy                      1.17.3    
oscrypto                   1.1.0     
packaging                  19.2      
pandas                     0.25.3    
pandocfilters              1.4.2     
parso                      0.5.1     
partd                      1.0.0     
pexpect                    4.7.0     
pickleshare                0.7.5     
Pillow                     6.2.1     
pip                        19.3.1    
plac                       0.9.6     
preshed                    3.0.2     
prometheus-client          0.7.1     
prompt-toolkit             2.0.10    
psutil                     5.6.5     
ptyprocess                 0.6.0     
pyarrow                    0.15.1    
pyasn1                     0.4.7     
PyAthena                   1.8.0     
pycosat                    0.6.3     
pycparser                  2.19      
pycryptodomex              3.9.0     
Pygments                   2.4.2     
PyJWT                      1.7.1     
pyOpenSSL                  19.0.0    
pyparsing                  2.4.4     
pyrsistent                 0.15.4    
PySocks                    1.7.1     
python-dateutil            2.8.0     
python-rapidjson           0.8.0     
pytz                       2019.3    
PyYAML                     5.1.2     
pyzmq                      18.1.0    
recordlinkage              0.14      
redis                      3.3.11    
regex                      2019.12.9 
requests                   2.22.0    
requests-aws4auth          0.9       
rsa                        3.4.2     
ruamel-yaml                0.15.71   
s3fs                       0.4.0     
s3transfer                 0.2.1     
scikit-learn               0.21.3    
scipy                      1.3.1     
Send2Trash                 1.5.0     
setuptools                 41.4.0    
six                        1.12.0    
sklearn                    0.0       
smart-open                 1.9.0     
snowconn                   3.6.0     
snowflake-connector-python 2.1.3     
snowflake-sqlalchemy       1.1.17    
sortedcontainers           2.1.0     
spacy                      2.2.1     
SQLAlchemy                 1.3.10    
srsly                      0.1.0     
tblib                      1.5.0     
tenacity                   6.0.0     
terminado                  0.8.2     
testpath                   0.4.2     
thinc                      7.1.1     
thrift                     0.11.0    
toolz                      0.10.0    
tornado                    6.0.3     
tqdm                       4.36.1    
traitlets                  4.3.3     
tzlocal                    2.0.0     
Unidecode                  1.1.1     
urllib3                    1.25.6    
wasabi                     0.2.2     
wcwidth                    0.1.7     
webencodings               0.5.1     
wheel                      0.32.3    
widgetsnbextension         3.5.1     
zict                       1.0.0     
zipp                       0.6.0  
  1. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    i manage our etl and reporting frameworks so i maintain different snowflake-connector-python versions at all times
  2. What did you expect to see?
    docker builds not failling every week because of snowflake-connector-python crazy and nonsensical requirements that should be set up as external optional dependencies
  3. What did you see instead?
    package errors when importing snowflake-connector-python
  4. Can you set logging to DEBUG and collect the logs?
    since when this happens i have to fix it i cant provide the logs, will do next time it breaks
import logging
import os

for logger_name in ['snowflake.sqlalchemy', 'snowflake.connector', 'botocore']: 
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)

Basically I think the chaotic behavior of snowflake connector package dependency could be mitigated if most dependencies are moved as optional, such as azure, crypto, etcetera.

@github-actions github-actions bot changed the title Request - Reduce package dependencies chaos SNOW-141545: Request - Reduce package dependencies chaos Feb 27, 2020
@MarkAWard
Copy link

Adding my two cents where many others already have as well, using snowflake-connector-python with its vast number of requirements and restrictions is quite a nuisance when integrating with large projects

@manugarri
Copy link
Author

Maybe it would be possible to reorganize the code a bit?

For example, I see the azure package seems to be used only in the [https://github.com/snowflakedb/snowflake-connector-python/blob/dc31c582be713d73707331092a304928459f8411/file_transfer_agent.py[ file transfer agent., maybe we can remove the dependency and if someone wants to use azure then import it (or fail)?

Also, the biggest pain seems to always be boto3 and botocore restrictive dependencies:

  'boto3>=1.4.4,<1.12',
        'botocore>=1.5.0,<1.15'

Is there a reason why such a restriction is mandatory?

@jacopoch
Copy link

This is a major headache and we'd love it to be addressed 👍

@akdor1154
Copy link

urgh, there is a crapton of stuff pulled in here that should be optional. Why do I need to pull in boto, azure, s3transfer? I'm just firing off some selects in a notebook.
It would be really great if you could turn all the cloud libraries into optional dependency sets.
That way customers can do
pip install snowflake-connector-python[aws] or snowflake-connector-python[azure] and only get the stuff they need.
You could even have a
pip install snowflake-connector-python[kitchensink] listed in your how-to page if you want to tutorial to be bulletproof.

Would you accept a PR to this effect?

@manugarri
Copy link
Author

@akdor1154 that would be the solution i proposed indeed, if you do a PR i will check it (im not a maintainer but im sure this PR would be appreciated).

While you are at that it would be good to do some cleanup, this package has basically no organization (c extensions and modules all over the root level of the repository)

@sfc-gh-mkeller
Copy link
Collaborator

sfc-gh-mkeller commented May 13, 2020

It's not that simple, testing infrastructure changes are necessary for this, which were ignored by all of the proposed PRs, or error handling of missing optional dependencies.
I know it doesn't look like it, but I am working on this.
Please give me some faith and time, big changes are coming to this repository soon.

@akdor1154
Copy link

Thanks for the update. My offer for a PR still stands - would this be accepted from the community? Or does it really need to come from a snowflake employee due to the infra requirements?

@sfc-gh-mkeller
Copy link
Collaborator

Of course you can. I can do the infra changes, but please don't start a new one. See: #260
I provided genuine feedback here and the PR has a good foundation, it was just abandoned by the creator and I didn't have time yet to finish it.

@davebelais
Copy link

I want to second manuggari's critique. Dependencies which are this specific are inappropriate for a library—it's not a requirements.txt file :-). Moreover—it's the inclusion of "<" and "<=" in requirements which is so problematic. You should always be able to simply use "~=" (with a minor-version reference only) if the library correctly uses semantic versioning, and in the cases where there is an issue with a specific patch version—please use "!=" instead (presume updates will correct the issue), and raise an issue with the author of the library to make sure it is so... When I don't have complex dependencies in an application—this lib works excellent, and I appreciate attempts here to ensure compatibility—but you've got to relinquish a little control sometimes :-).

@jacopoch
Copy link

jacopoch commented Aug 10, 2020

Good morning, gentle reminder to try to address this issue, the package broke again today when the Azure dependencies got an update.
Please do something about it so we can stop pinning all sorts of packages we don't even need, maybe a cloud service provider specific package?

@manugarri
Copy link
Author

Im not sure this is the right approach, but I just released a snowflake-connector package that is basically, this package but with the least amount of dependencies. https://pypi.org/project/snowflake-connector-python-lite/

Would love to hear everyone's thoughts.

@jacopoch
Copy link

jacopoch commented Sep 28, 2020

Guys please fix this, it broke again, can't even count the occurrences anymore. It's really a pain.

@sfc-gh-mkeller
Copy link
Collaborator

Guys please fix this, it broke again, can't even count the occurrences anymore. It's really a pain.

What is the issue specifically?

@manugarri
Copy link
Author

@jacopoch check this alternative package instead https://pypi.org/project/snowflake-connector-python-lite/

@jacopoch
Copy link

@jacopoch check this alternative package instead https://pypi.org/project/snowflake-connector-python-lite/

@manugarri we're now evaluating either forking this package and fixing it a bit or testing if we can use nowflake-connector-python-lite. Of course an official distribution would be preferred 👍

@manugarri
Copy link
Author

@jacopoch hey feel free to submit PRs to snowflake-connector-python-lite, i did the least amount of changes to reduce unnecessary package dependencies, but would love to get more eyes on it!.

BTW, I also released snowflake-sqlalchemy-lite and snowconn-lite following the same principles.

@Alonreznik
Copy link

Hi There. Any progress in this case?
If we just want a connector to snowflake for running SQLs, why should I install Azure, AWS, Google, Pandas, and Payarrow?
This is something that just makes us delay our upgrading of the connector because it breaks our service time after time.

@manugarri
Copy link
Author

@Alonreznik check this alternative package instead https://pypi.org/project/snowflake-connector-python-lite/

@Alonreznik
Copy link

Alonreznik commented Oct 15, 2020

@Alonreznik check this alternative package instead https://pypi.org/project/snowflake-connector-python-lite/
Thanks!
Do you have any GitHub/Lab/BitBucket for this repo?

@manugarri
Copy link
Author

yeah , I will add it to the pypy page. https://github.com/manugarri/snowflake-connector-python-lite

@Alonreznik
Copy link

@manugarri Amazing!
Thank you!

Do you know have any aim to update it to the latest connector version?
Is this something can be done automatically?

I would love to see how to make it without breaking the original repo.

Thanks!
Alon

@manugarri
Copy link
Author

hmm yeah that is an issue, right now i ported things manually. You could definitely help doing a PR and i will redeploy

@sfc-gh-mkeller
Copy link
Collaborator

It's nice to see the community helping each other out 😄

If I could make a recommendation @manugarri I'd encourage you to use GitHub actions to do build and releases, as building our C extensions for different OSes requires all 3 OSes to build separate wheels. And getting the right OS setups is also non-trivial.
You could use the following as a template: https://github.com/snowflakedb/snowflake-connector-python/blob/master/.github/workflows/build_test.yml
I made sure that packages built by the build jobs in this Yaml file are releasable.

@Alonreznik
Copy link

Alonreznik commented Oct 18, 2020

@sfc-gh-mkeller
Thanks for the advice. However, we as SF partners are expected to use the official connector in SF compliances and best-practices. Therefore using external libs (which are terrific, I must say), isn't an option in the end, and we want to rely on SF's official python connector. So Please please please push it forward.
Every upgrade of the connector makes us using some sanity tests, after upgrading or downgrading our AWS, GCP, Azure libs, and adding extra libs such as pyarrow.
We can't manage our projects like that and this is not how it supposed to work with Python libraries 👍

Thank you in advanced

Alon

@frsann
Copy link

frsann commented Jun 8, 2021

@sfc-gh-mkeller, any update on this?

We ran into a problem whit a project that only requires the dialect parts of the snowflake-sqlalchemy package. We noticed that the tox-step started to take forever to run. The problem is probably partly due to changes to the dependency resolver in Pip, but this project is really the root cause of the problem. We solved the problem by copy/pasting the dialect related code from snowflake-sqlalchemy, and commented out any reference to the connector (fortunately there weren't that many). So we're now in a situation where we (paying customers of Snowflake) have to hack our way around the connector in order to build and test our projects 👍

Please fix this mess.

@keller00
Copy link

keller00 commented Jun 8, 2021

Hi @frsann
I'm very sorry that you're having difficulties, please reach out to our support team so that we can collect the specific information necessary to fix your issue!

@frsann
Copy link

frsann commented Jun 8, 2021

Ok, I'm a bit offended by you response to be honest. You and I both know technical support wont be fixing the dependency mess in this project.

@jacopoch
Copy link

jacopoch commented Jun 8, 2021

Hi @frsann
I'm very sorry that you're having difficulties, please reach out to our support team so that we can collect the specific information necessary to fix your issue!

Sorry in what sense? The package dependency chaos has been highlighted over a year ago by multiple customers at different times here and the connector hasn't been really refactored yet, are there any plans to make the situation better? We haven't been able to get any conclusive answer from our contacts at Snowflake in the past year and we're every now and then wondering what is being done to address this problem if anything at all :)

It's one of the most complex dependencies we have currently

@sfc-gh-mkeller
Copy link
Collaborator

@frsann Support is a integral part of triaging and investigation of issues. If they can't help you out on their own they will help you collect the necessary information.

@jacopoch We understand that this is a major headache, but please be patient and understand that this is a non-trivial change and it has been actively worked on. We have been evaluating different ways to mitigate the SDK problems we have had (there were many). In the meantime we have relaxed our dependencies where possible.
#725 is what you want to look out for

@jacopoch
Copy link

jacopoch commented Jun 8, 2021

@frsann Support is a integral part of triaging and investigation of issues. If they can't help you out on their own they will help you collect the necessary information.

@jacopoch We understand that this is a major headache, but please be patient and understand that this is a non-trivial change and it has been actively worked on. We have been evaluating different ways to mitigate the SDK problems we have had (there were many). In the meantime we have relaxed our dependencies where possible.
#725 is what you want to look out for

I think on our end it's naturally quite hard to understand what's being worked on and what are the plans (unfortunately your Jira is not reachable for customers) and I think part of the issue here is actually not referencing a work item with a breakdown of the scope/plan of work on this repository, this issue can be solved easily from your end (without overcommitting / overpromising).

If #725 is supposed to fix this issue, it wasn't referenced here before even if the commits are already from the end of March 👍

@manugarri
Copy link
Author

@sfc-gh-mkeller good to know, really looking forward to leaving this issue behind after a year and a half advocating for it to be fixed :)

As a side note, on #725 I dont see any mention to the package requirements being dropped in the commit description (only the name of the commit hints at it). Given how this was a major source of conflicts within snowflake customers it would have been good if it was addressed there.

Also i dont see any change to setup.py on that PR (the cloud requirements still exist as required in master as of today). How does the pr reduces dependencies?

@sfc-gh-mkeller
Copy link
Collaborator

@manugarri for 1 release we’ll keep around the old code path and the dependencies just in case something were to go wrong given the extent of these changes. Once we confirm all is working as expected, we will remove these dependencies completely.
The ticket got closed via automation when I did the merge but our intent was to keep it open till everything is done. Reopening for now and will close once everything is complete.

@sfc-gh-mkeller sfc-gh-mkeller linked a pull request Oct 6, 2021 that will close this issue
6 tasks
@sfc-gh-mkeller
Copy link
Collaborator

The plan in the short term is to keep reevaluating all the rest of our dependencies, but #845 has removed our largest dependencies for now

@manugarri
Copy link
Author

@sfc-gh-mkeller , the PR looks great, thanks a ton!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
9 participants