Redefine software installation #31

Closed
nj1973 opened this issue Nov 2, 2023 · 9 comments · Fixed by #43 or #54

@nj1973
Collaborator

nj1973 commented Nov 2, 2023

We need to spend some time understanding how a customer moves from a cloned GitHub repo to a working installation.

We have already added a Makefile recipe to create a package containing many required artefacts.

make package

The tarball contains:

  • An OFFLOAD_HOME containing log, run, setup, conf, lib and bin directories
  • The gluentlib egg file in the lib directory which we include in PYTHONPATH in offload.env
  • offload.env templates
  • version file
  • Contents of setup and bin

What we do not include is:

  1. An Oracle client, which we no longer bundle. If we switch to python-oracledb then customers probably won't need a client; for now I think it has to be a documented prerequisite.
  2. A Python virtual environment with required packages

It is item 2 above that this issue concerns. We need to think about how we should ensure the customer has the correct Python and packages.
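One way to frame "correct Python and packages" is a small pre-flight check that runs before any of the entry-level scripts. The sketch below is only an illustration: the function name `check_environment`, the minimum version and the package pins passed in are all hypothetical, not the project's real requirements.

```python
import sys
from importlib import metadata

# Assumption: illustrative minimum only, not the project's documented requirement.
MIN_PYTHON = (3, 8)


def check_environment(required, min_python=MIN_PYTHON):
    """Return a list of problems; an empty list means the environment looks usable.

    `required` maps a distribution name to an exact version string,
    or to None when any installed version is acceptable.
    """
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or later is required")
    for name, wanted_version in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if wanted_version is not None and installed != wanted_version:
            problems.append(f"{name} {installed} installed, {wanted_version} expected")
    return problems
```

A wrapper script could call this with the pinned requirements and refuse to start if the returned list is not empty.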

@nj1973
Collaborator Author

nj1973 commented Nov 20, 2023

Notes from team chat

Goals for this issue are:

  • Migrate from Python setup.py (egg file) to pyproject.toml (wheel file)
  • Look at whether building with pdm is a good direction
  • Look at whether PyInstaller or Nuitka are good solutions for bundling a Python executable

Notes:

@nj1973 nj1973 self-assigned this Nov 20, 2023
@nj1973 nj1973 added the initial release Required for initial open source effort label Nov 20, 2023
@nj1973
Collaborator Author

nj1973 commented Nov 29, 2023

@nj1973
Collaborator Author

nj1973 commented Nov 29, 2023

More notes.

Good package structure?

bin/         # Replaces scripts
src/goe/     # Replaces gluentlib/gluentlib
tests/
docs/
pyproject.toml
LICENCE.txt
README.txt

gluent.py should not be in bin, it should go somewhere else inside src/goe, away from the entry level scripts.

I think transport and spark-listener should go into a tools subdirectory.
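While the restructure is in progress, a quick check of the top-level layout might be useful. This is a minimal sketch, assuming the entries listed in the proposed structure; `missing_entries` is a hypothetical helper, and the tools subdirectory is left out because its location is not decided here.

```python
from pathlib import Path

# Top-level entries taken from the proposed package structure.
EXPECTED_ENTRIES = [
    "bin",
    "src/goe",
    "tests",
    "docs",
    "pyproject.toml",
    "LICENCE.txt",
    "README.txt",
]


def missing_entries(repo_root):
    """Return the expected entries that do not exist under repo_root."""
    root = Path(repo_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Running `missing_entries(".")` from the repo root would list whatever has not been moved into place yet.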

@nj1973 nj1973 linked a pull request Nov 29, 2023 that will close this issue
@nj1973
Collaborator Author

nj1973 commented Nov 29, 2023

Re-opening because the PR was part 1 of 3 changes:

  1. Switch setup.py to pyproject.toml
  2. Restructure the repo in a more standard way (not yet)
  3. Bundle a Python executable of some kind with the final package (not yet)

1 down, 2 to go

@nj1973 nj1973 reopened this Nov 29, 2023
@nj1973
Collaborator Author

nj1973 commented Dec 7, 2023

Memo

I tried to install the goe wheel on Debian 5.10 with Python 3.9 and failed with error:

Complete output (22 lines):
  /bin/sh: 1: krb5-config: not found

And also:

  gssapi/raw/misc.c:51:10: fatal error: Python.h: No such file or directory

Solutions:

sudo apt-get -y install libkrb5-dev gcc
sudo apt-get -y install python-dev python3-dev
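Both errors come from building the gssapi extension from source, so they could be caught before running pip. The sketch below checks for the same three things the errors point at; `missing_build_prereqs` is a hypothetical helper, and looking for `gcc` by name is an assumption (another C compiler may be configured on some systems).

```python
import shutil
import sysconfig
from pathlib import Path


def missing_build_prereqs():
    """Return the build prerequisites that cannot be found on this machine."""
    missing = []
    # krb5-config comes from libkrb5-dev on Debian; gcc is needed to compile the C sources.
    for tool in ("krb5-config", "gcc"):
        if shutil.which(tool) is None:
            missing.append(tool)
    # Python.h is shipped by the Python development headers package.
    include_dir = Path(sysconfig.get_paths()["include"])
    if not (include_dir / "Python.h").is_file():
        missing.append("Python.h")
    return missing
```

An empty list suggests the wheel's source dependencies should at least get past the two failures above.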

@nj1973 nj1973 linked a pull request Dec 8, 2023 that will close this issue
@nj1973 nj1973 closed this as completed in #54 Dec 8, 2023
@nj1973
Collaborator Author

nj1973 commented Dec 8, 2023

Parts 1 & 2 of the plan are complete:

  1. Switch setup.py to pyproject.toml
  2. Restructure the repo in a more standard way
  3. Bundle a Python executable of some kind with the final package (not yet)

Part 3 needs some thought. Currently PyInstaller is my primary plan.

@nj1973 nj1973 reopened this Dec 8, 2023
@nj1973
Collaborator Author

nj1973 commented Dec 15, 2023

Capturing some thoughts I had recently:

  • Some of the packages we require pull down a lot of dependencies, an example being the Snowflake client
  • If we know we are an Oracle/GCP shop, it would be nice not to need to install packages related to Snowflake, Synapse or Hadoop
  • I wondered if we could do this with sections in the pyproject.toml file. For example, the basic installation only installs core dependencies. We then additionally install optional-dependencies for the distributions we are interested in.

e.g.:

[project.optional-dependencies]
oracle = [
    "cx-Oracle==7.3.0",
]
gcp = [
    "google-cloud-bigquery==3.4.2",
    "google-cloud-kms==2.14.1",
]
hadoop = [
    "hdfs==2.6.0",
    "impyla==0.17.0",
    "thrift-sasl==0.4.3",
]
... etc ...
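To make the effect of those groups concrete, here is a minimal sketch of how a selection of extras expands into a requirement list. It mirrors the groups above in a plain dict rather than parsing pyproject.toml, and `requirements_for` is a hypothetical helper, not part of any packaging tool. Core dependencies are passed in by the caller because they are not listed here.

```python
# Mirrors the example [project.optional-dependencies] table above.
OPTIONAL_DEPENDENCIES = {
    "oracle": ["cx-Oracle==7.3.0"],
    "gcp": ["google-cloud-bigquery==3.4.2", "google-cloud-kms==2.14.1"],
    "hadoop": ["hdfs==2.6.0", "impyla==0.17.0", "thrift-sasl==0.4.3"],
}


def requirements_for(extras, core=()):
    """Return core requirements plus those of each requested extra, without duplicates."""
    unknown = [extra for extra in extras if extra not in OPTIONAL_DEPENDENCIES]
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    requirements = list(core)
    for extra in extras:
        for requirement in OPTIONAL_DEPENDENCIES[extra]:
            if requirement not in requirements:
                requirements.append(requirement)
    return requirements
```

With pip the same selection would be requested using the extras syntax, e.g. `pip install "goe[oracle,gcp]"`, assuming the distribution is published under the name goe.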

@nj1973
Collaborator Author

nj1973 commented Dec 20, 2023

Another option for packaging Python:

"Hatch dropped a new release yesterday that may make things so much easier. It's got built in support for bundling python distros and installing them."
https://github.com/pypa/hatch/releases/tag/hatch-v1.8.0

@nj1973
Collaborator Author

nj1973 commented Jan 4, 2024

I've spun the remaining tasks from this issue into two new ones and am closing this one.

@nj1973 nj1973 closed this as completed Jan 4, 2024