We need to answer the question - what is "building" a package #74

Closed
lwasser opened this issue Apr 3, 2023 · 27 comments · Fixed by #101
Labels
documentation, help wanted, sprintable

Comments

@lwasser
Member

lwasser commented Apr 3, 2023

i think it would be super helpful on the intro page of the package chapter to include a high level overview of the parts of a python package and what the build / publish steps mean / entail.

This doesn't have to be super involved but high level - perhaps a diagram with an explanation.

"these are the bare-min things that you need to do to create a python package" to really break down and describe the entire process.

@ucodery
Collaborator

ucodery commented Jun 9, 2023

I attempted to define what "packaging" is in one of my talks, which seems to be the same question as here. My answer was simply that packaging is the act of taking a project and turning it into 1-N distributions. I then went on to further define these terms, but here we have precedent, as the PyPA defines both project and distribution (but neither packaging nor building) (https://packaging.python.org/en/latest/glossary/).

From this definition, I think that publishing is a distinct action that is not packaging, and I really don't think it fits inside building, but is still an important step for an author to understand and be able to perform if they want to share their code.

Is this what you had in mind for the intro page? I'm sure diagrams could be made, it is a simple flow.

@merwok

merwok commented Jun 9, 2023

I have made the same distinction in previous talks.
Packaging as a topic can be divided into separate actions: building (code into a distributable artifact), uploading (artifact to repository), installing (artifact from repository), and I think the last one was deploying (an app with dependencies).

@ucodery
Collaborator

ucodery commented Jun 9, 2023

Indeed the package's journey continues past the act of packaging and even publishing. Another important step is solving, which determines if installing == deploying as you put it, or if deploying involves installing a tree of distributions. Solving is also responsible for choosing the specific distribution that is installed for a release, even after a version has been selected.

@merwok

merwok commented Jun 9, 2023

No, I meant that deploying is sending a full application somewhere. It’s separate entirely from uploading (sending a library somewhere so that people can make other libraries or applications).

@lwasser
Member Author

lwasser commented Jul 3, 2023

hey there - so sorry i missed this. so @jagerber48 asked this question in the python discourse.

i'm very open to a PR - i can also help with the diagram graphics (we could make a nice diagram in google drawings even).

@merwok welcome to pyopensci!! how did you find us?

the goal of this guide is to be very beginner friendly.

so yes i could see something like this as a diagram:

write code / tests --> build into sdist / wheel --> publish to pypi / conda-forge

something that really helps newer users understand what packaging is all about and then what that building step means without drowning them in too much detail if that makes sense?

@lwasser added the help wanted, documentation, and sprintable labels Jul 3, 2023
@merwok

merwok commented Jul 3, 2023

This repo was linked from a discussion in the Python packaging forum! (discuss.python.org)

@ucodery
Collaborator

ucodery commented Jul 5, 2023

I would be happy to help out with a diagram, but ISTM it would be largely (maybe entirely) a linear path. I think well-defined but approachable definitions are the more important point, and that those definitions should sit right there with any graphic or description of the end-to-end process.

@ucodery
Collaborator

ucodery commented Jul 5, 2023

This is essentially my view of "what is packaging" which is a superset of "what is building"

flowchart TD;
    A[Source] --->|build| B
    B[Distributions] --->|upload| C
    C[PyPI Project] --->|install| D[Package]

@lwasser
Member Author

lwasser commented Jul 27, 2023

@ucodery very cool graphic - how did you make it animated like that? i just added a note to add a paragraph with a graphic to this page of the guide in our google doc outline

@jagerber48 also pinging you as I know this was a topic brought up by you (i think??) on the Python discourse.

@lwasser
Member Author

lwasser commented Jul 27, 2023

i really like the idea of having a graph (horizontal would be better for our guide) that provides an overview of the process. high level. and we could also enhance it to add the elements of that "src" box. it would fit nicely on our build distribution type page at the top because that page defines wheel vs SDist

@ucodery
Collaborator

ucodery commented Jul 28, 2023

@lwasser I was using mermaid, in part because I thought it would allow better collaboration on the diagram. But I now realize you can't see the source of my comment! (if it was a checked-in md file you would see it render, and have the option to view raw)
This is what I wrote to get that image:

```mermaid
flowchart TD;
    A[Source] --->|build| B
    B[Distributions] --->|upload| C
    C[PyPI Project] --->|install| D[Package]
```

I would also prefer a horizontal flow, but I don't think mermaid gives that kind of control.

@NickleDave
Contributor

NickleDave commented Jul 28, 2023

I am definitely not a mermaid expert but I think writing flowchart LR instead of flowchart TD will give you a left-to-right graph layout

That is,

```mermaid
flowchart LR;
    A[Source] --->|build| B
    B[Distributions] --->|upload| C
    C[PyPI Project] --->|install| D[Package]
```

gives you

flowchart LR;
    A[Source] --->|build| B
    B[Distributions] --->|upload| C
    C[PyPI Project] --->|install| D[Package]

(edit: sorry, I forgot how to nest code blocks)

(edit edit: and yes we could do this in a MyST doc -- https://mystmd.org/guide/diagrams)

@ucodery
Collaborator

ucodery commented Jul 28, 2023

Ah, very helpful! I am still learning most of mermaid myself. I also learned that you can click the copy button to get the mermaid text out of an image. But it is still not simple to collab on a thread of diagrams - you still have to paste it somewhere just to view it.

@jagerber48

@lwasser yes, I did ask the question! Thanks for remembering!

It seems like a lot of this is covered here, but here's what I've learned so far. When I asked the question I had less experience deploying code but now I have a tiny bit more experience.

First off, I've encountered build to mean two things. (1) It can mean bundling source code (and maybe some other stuff) into a zip-like file so it can be easily shared. But (2) building also means constructing an environment to test code, including installing the code in question. This is a pernicious overloading because the first definition involves bundling code and the second definition probably involves unbundling the same code. But here we're talking about packaging, so the first definition.

So far it seems that "building is the process of bundling code into a zip-like file for easier distribution" is pretty much correct. The next question, then, is why is it so complicated? I can't answer that better than in the packaging guide, but here's my quick stab. Basically, for python, there are a number of tools for both bundling and unbundling code and even, historically, different formats for the bundles. The bundle/unbundle tools need to be configured. There are a lot of configuration options and bundlers seem to need to be a little bit aware of how the unbundlers operate and maybe vice-versa (I know less about unbundlers, like pip or something I guess?). But anyways, it seems the complexity surrounding "building" is related to this multitude of tools with overlapping and non-overlapping feature sets and configuration options.

So yeah, the answer "building is bundling code into an easily distributable format" is a little bit unsatisfactory, because I wonder: why don't I just zip up my source code? Or write a script to do that? A satisfactory answer would somehow explain how that's the basic idea, but in practice there's more complexity and you need tools for reasons X, Y, and Z, and, since there are many tools there ends up being additional complexity.

@ucodery
Collaborator

ucodery commented Aug 7, 2023

Thanks for this additional context @jagerber48.

> First off, I've encountered build to mean two things.

Definitely, most terms in Python packaging are overloaded, sometimes with official definitions! I can see the second definition meant when one talks about "building" but it seems like a synecdoche more than a definition as one can't install a package without first building it, generally.

Python packages are certainly bundled into zip files, but they are also bundled into tarball files and the current best practice is to bundle the same code both ways, so I'd leave the specific format out of the definition of building. If I could build on your definition I'd say building is "the process of bundling all code and metadata necessary for the accurate installation of the library or application". Typically, at least some of that code or metadata will need to be generated during the build process, which is what complicates the process. The other messy part is that the question "what code from the source is necessary for the user and where does it go at the install location?" is not something that can be fully known except by the human author.
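A minimal sketch of that definition, assuming a hypothetical one-module project called `example_pkg`: the same source is bundled twice, once as a gzipped tarball and once as a zip, and a metadata file that does not exist in the source tree is generated along the way. The layout and names are illustrative only; the output is not a valid sdist or wheel.

```python
import io
import tarfile
import zipfile

# Hypothetical project: one module plus the facts needed to generate metadata.
SOURCE = {"example_pkg/__init__.py": b"__version__ = '0.1.0'\n"}
NAME, VERSION = "example_pkg", "0.1.0"


def generate_metadata() -> bytes:
    """Metadata is produced at build time; it is not a file in the source tree."""
    return f"Name: {NAME}\nVersion: {VERSION}\n".encode()


def bundle_files() -> dict[str, bytes]:
    """Everything that goes into a bundle: the source plus generated metadata."""
    files = dict(SOURCE)
    files["METADATA"] = generate_metadata()
    return files


def build_tarball() -> bytes:
    """Bundle the files into an in-memory .tar.gz archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, data in bundle_files().items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def build_zip() -> bytes:
    """Bundle the same files into an in-memory .zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        for path, data in bundle_files().items():
            archive.writestr(path, data)
    return buffer.getvalue()


def tarball_names(blob: bytes) -> list[str]:
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        return sorted(archive.getnames())


def zip_names(blob: bytes) -> list[str]:
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return sorted(archive.namelist())
```

Both archives end up holding the same two files, which is why the archive format itself is left out of the definition.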

I have typically referred to "unbundlers" as "installers". There is less diversity here than there is for build tools. Pip is by far and away the most common, but we also have installer. Build/bundling tools should not have to know what installer will be used, or what its settings are. In the case of wheels, the install process is fully laid out in PEPs, so there should be little to no questions or divergence between different tools.

> why don't I just zip up my source code? Or write a script to do that?

You could just zip your own code and certainly others in the past have done only this. But you would have to write both the bundler and unbundler yourself. And then there is the problem of how do you get your custom unbundler onto the target machine, and why didn't you use that technique to get your original package there? While the difference between a .whl and a .zip is nothing more than an imposed structure on the contained files, that structure is what tells any installer what to do with the package, whether it is compatible at all with the current system, and a lot of other metadata that is "necessary for the accurate installation".
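To make the "imposed structure" point concrete, here is a rough sketch of how much an installer can learn from a wheel's file name alone, before even opening the archive. The field order follows the wheel file name convention (`name-version[-build]-python-abi-platform.whl`). The `parse_wheel_name` and `is_pure_python` helpers and the `example_pkg` file names are made up for illustration; the `packaging` library offers a maintained parser for real use.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated fields (hypothetical helper)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag present.
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(name, version, build, python_tag, abi_tag, platform_tag)


def is_pure_python(wheel: WheelName) -> bool:
    """An ABI tag of 'none' and a platform tag of 'any' mean no platform-specific code."""
    return wheel.abi_tag == "none" and wheel.platform_tag == "any"
```

A plain `.zip` carries none of this, so an installer would have nothing to check against the current system.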

@lwasser
Member Author

lwasser commented Aug 7, 2023

this is great @jagerber48 do you have any interest in helping us with the guide pages as we write them? it would be great to have another eye on this to catch these types of questions and to guide what we write. we will be publishing tutorials as well. @ucodery and others have been helping already!! The more eyes and help the merrier as far as i'm concerned!! i feel like collectively we will create a more accurate and useful guide. if you are interested in that and also potentially joining us in slack can you email me so i can invite you? leah at pyopensci.org ?

essentially what i'd like to do is capture the important pieces in this thread - in our guide. Jeremiah has already pointed out some areas that might be confusing in there. i'd love to improve things more, together.

@lwasser
Member Author

lwasser commented Aug 8, 2023

oh also @merwok same for you - i'm sorry i didn't mean to only invite one person in this thread. it seems like everyone here might want to help in some way? we are working now in a google doc just for the initial pages. and soon it will be online via PRs that you can review. everyone gets credit for contributing regardless of whether it's in the google doc or not - i just have a hard time starting a collaborative writing process on github.

@lwasser
Member Author

lwasser commented Aug 9, 2023

hey y'all - small update - i started working on this locally and pushed a sample diagram to discourse here. this diagram shows the big picture of the packaging workflow (feedback welcome). Then i thought we could use and expand upon parts of the diagram to explain parts of the process.

For instance, that right hand section of it is akin to what Jeremiah posted above - we could expand upon that part when we explain "building". I'm thinking we don't need to have those details about installers and such. rather keeping it simpler but explaining why it's a bit more than just zipping things up, is important for our users and community.

[Image: screenshot of the packaging workflow diagram, 2023-08-09]

please feel free to comment on the full diagram in discourse. and if not when i get more text pulled together from this discussion i'll share it here for comments

@lwasser
Member Author

lwasser commented Aug 9, 2023

text is here if anyone wants to review / comment / edit. @jagerber48 i'd especially appreciate your eye in terms of what questions you still have. but really i want EVERYONE's feedback! i'll add you as an author after comments are made here. this text is designed to go at the top of this page in our guide

@jagerber48

@ucodery

> Definitely, most terms in Python packaging are overloaded, sometimes with official definitions! I can see the second definition meant when one talks about "building" but it seems like a synecdoche more than a definition as one can't install a package without first building it, generally.

For my second definition I'm talking about the process when some automation tool (like git actions or something) installs python in some environment and then installs a package into the environment then runs tests. In cases I've seen this process involves pip installing the package. Which means it is taking a bundled (built) version of the code and installing it. So in the first definition of "build" bundled code is the output, but in the second, bundled code is an input. Maybe you're suggesting (with your synecdoche comment) that "building" refers to the whole process from having source code on machine A to having source code on machine B? In which case both bundling and unbundling would be part of building?

> If I could build on your definition I'd say building is "the process of bundling all code and metadata necessary for the accurate installation of the library or application". Typically, at least some of that code or metadata will need to be generated during the build process, which is what complicates the process. The other messy part is that the question "what code from the source is necessary for the user and where does it go at the install location?" is not something that can be fully known except by the human author.

This is a helpful refinement. The "for the accurate installation" bit is key to help me understand why it's messier than I'd naively think.

> I have typically referred to "unbundlers" as "installers". There is less diversity here than there is for build tools. Pip is by far and away the most common, but we also have installer. Build/bundling tools should not have to know what installer will be used, or what its settings are. In the case of wheels, the install process is fully laid out in PEPs, so there should be little to no questions or divergence between different tools.

This is helpful info

> While the difference between a .whl and a .zip is nothing more than an imposed structure on the contained files, that structure is what tells any installer what to do with the package, whether it is compatible at all with the current system, and a lot of other metadata that is "necessary for the accurate installation".

Helpful clarification

@lwasser thanks for the invitations to help. I'll e-mail you and I'll have a look at the materials you've shared. Hopefully I can be helpful by continuing to ask questions!

@ucodery
Collaborator

ucodery commented Aug 10, 2023

@jagerber48

> In cases I've seen this process involves pip installing the package. Which means it is taking a bundled (built) version of the code and installing it. So in the first definition of "build" bundled code is the output, but in the second, bundled code is an input. Maybe you're suggesting (with your synecdoche comment) that "building" refers to the whole process from having source code on machine A to having source code on machine B? In which case both bundling and unbundling would be part of building?

Yes, pip can install a python package from source, sdist, wheel, git URL, and even others. Because it can install formats other than wheels, it is categorized as a build "front-end". Pip cannot transform arbitrary .py files into a package, but it can build any other build tool's package. When pip encounters a format that is not a wheel, it first builds it into a wheel and then installs that wheel into the environment. This is what I am suggesting with my synecdoche comment. Building is one step in a multi-step process that only looks like a single action because github/ pip make it look simple.

I maintain that building is going from source to a bundled format. Installing is going from a bundled format to an importable/ executable module. The confusing bit is that most installers we use for Python, primarily pip, sometimes do both things in order to fulfill a user request.
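A toy model of that distinction, written as a sketch rather than a description of pip's internals: the hypothetical `plan_install` function below returns the steps an installer would have to take for a given input. Only a wheel can be installed directly; anything else is built into a wheel first. The function name, step labels, and file names are assumptions made for illustration.

```python
def plan_install(requested: str) -> list[str]:
    """Return the ordered steps needed to install `requested` (illustrative only)."""
    if requested.endswith(".whl"):
        # Already a built distribution: there is nothing left to build.
        return ["install wheel"]
    # Source tree, sdist, or git URL: build a wheel first, then install it.
    return ["build wheel", "install wheel"]


if __name__ == "__main__":
    for item in ("example_pkg-0.1.0-py3-none-any.whl", "example_pkg-0.1.0.tar.gz"):
        print(item, "->", plan_install(item))
```

From the user's side both cases look like one "install" command, which is where the two meanings of "build" get blurred.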

@lwasser
Member Author

lwasser commented Aug 10, 2023

so this exact point that you make @ucodery is something that we discussed early on with folks like @ocefpaf . when Filipe explained that pip point to me - it was super confusing! it's kind of like the confusion around setuptools vs Build vs other tools like flit (back end vs front end conflation).

> The confusing bit is that most installers we use for Python, primarily pip, sometimes do both things in order to fulfill a user request.

I think for our guide we don't need to get into that detail because it will confuse folks more. i think our goal should be that scientists understand generally how to create a package (how to create that sdist and wheel), what those files / archives are / represent and then how to install.

that is what my gut is telling me about the depth of our guide. We could note in a breakout that pip actually can do both. BUT it's really confusing to think about that and more info than most might need to be successful in creating a package!

@ucodery
Collaborator

ucodery commented Aug 10, 2023

The details of how pip does its work are probably outside the bounds of what any packaging guide needs to have. But I would be careful to never refer to pip as a build tool. One of its jobs could be defined as a build front-end, but even that label I would avoid giving pip unless it came up directly. Some projects use pip as the means of building the wheels that they upload to PyPI, but that is not advised.

Pip wears a lot of hats but I would refer to it primarily as a package manager. Its ability to act as a build front-end is only to make it a more versatile installer, not to assist package authors in an upload workflow.

@lwasser
Member Author

lwasser commented Aug 10, 2023

that sounds great!

  1. let's make sure the text about building in the google doc above is correct / accurate and answers that general question about building.
  2. then maybe we can go back into the build package tools page and adjust language for consistency. i could even move it into a hackmd if need be.

i'm mostly focused on new pages now around testing, ci data etc and would also love help there as those sections are empty.

BUT really, whatever you want to work on, please feel free to do so!!! as i appreciate any and all help and just want our users to have some clarity in this ecosystem! ✨

@ucodery
Collaborator

ucodery commented Aug 10, 2023

@lwasser back on the diagram I find the edge "Create SDist/ Wheel distributions" going to the node "Build package/ create SDist & Wheel" a bit confusing/ redundant. I think, given the definitions I originally gave in this thread I would have the edge pointing out from the Repository to be "publishing" and point at a new group of tasks, one of which is building and one of which is publishing (so no change in the nodes).

@jagerber48

All of this info about pip/building/installing etc. is very helpful and informative for me. But I do feel my point about the term "build" being overloaded has been missed a little bit.

> I maintain that building is going from source to a bundled format.

I agree that this is one definition of "build", and it is the most relevant one for packaging. But someone learning about packaging will come across the term "build" in many other contexts. Just look at how many times the word "build" appears in @lwasser's diagram. And I don't think all of those usages refer to "bundling source into a bundled format".

In my own workflow there are at least two examples.

  1. When I push code to my repo or add tags my readthedocs page has automation that causes it to automatically do "builds" of my documentation so that the documentation webpage is updated. I don't think this has anything to do with "bundling source code".
  2. When I push code to my repo, under some circumstances, github actions that I have configured will perform "builds" that run linting and tests on my code. I also don't think this involves "bundling source code".

@lwasser
Member Author

lwasser commented Aug 21, 2023

ok please review this pr y'all. hoping this provides sufficient information for those newer to packaging! the goal will be to merge by labor day in the us (september 5th) !!

5 participants