We need to answer the question - what is "building" a package #74
I attempted to define what "packaging" is in one of my talks, which seems to be the same question as here. My answer was simply that packaging is the act of taking a project and turning it into 1-N distributions. I then went on to further define these terms, but here we have precedent, as the PyPA defines both project and distribution (but neither packaging nor building) (https://packaging.python.org/en/latest/glossary/). From this definition, I think that publishing is a distinct action that is not packaging, and I really don't think it fits inside building, but it is still an important step for an author to understand and be able to perform if they want to share their code. Is this what you had in mind for the intro page? I'm sure diagrams could be made; it is a simple flow. |
I have made the same distinction in previous talks. |
Indeed the package's journey continues past the act of packaging and even publishing. Another important step is solving, which determines if installing == deploying as you put it, or if deploying involves installing a tree of distributions. Solving is also responsible for choosing the specific distribution that is installed for a release, even after a version has been selected. |
No, I meant that deploying is sending a full application somewhere. It’s separate entirely from uploading (sending a library somewhere so that people can make other libraries or applications). |
hey there - so sorry i missed this. so @jagerber48 asked this question in the python discourse.
i'm very open to a PR - i can also help with the diagram graphics (we could make a nice diagram in google drawings even). @merwok welcome to pyopensci!! how did you find us? the goal of this guide is to be very beginner friendly. so yes i could see something like this as a diagram: write code / tests --> build into sdist/wheel --> publish to pypi / conda-forge. something that really helps newer users understand what packaging is all about, and then what that building step means, without drowning them in too much detail, if that makes sense? |
This repo was linked from a discussion in the Python packaging forum! (discuss.python.org) |
I would be happy to help out with a diagram, but ISTM it would be largely (maybe entirely) a linear path. I think well-defined but approachable definitions are the more important point, and those definitions should sit right next to any graphic or description of the end-to-end process. |
This is essentially my view of "what is packaging", which is a superset of "what is building":
flowchart TD;
A[Source] --->|build| B
B[Distributions] --->|upload| C
C[PyPI Project] --->|install| D[Package]
|
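The linear flow in the diagram above can also be written down as plain data. This is only a sketch: the `Stage` tuple and the project name `mypkg` are hypothetical, and the commands shown (`python -m build`, `twine upload`, `pip install`) are one common choice for each step, not the only tools that can fill these roles.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    action: str    # edge label in the diagram
    produces: str  # node the edge points at
    example: str   # one common command for this step (an assumption, not the only option)


# "mypkg" is a hypothetical project name used only for illustration.
PACKAGING_FLOW = [
    Stage("build", "Distributions", "python -m build"),
    Stage("upload", "PyPI Project", "twine upload dist/*"),
    Stage("install", "Package", "python -m pip install mypkg"),
]


def describe(flow, start="Source"):
    """Render the flow as one left-to-right line of text."""
    parts = [start]
    for stage in flow:
        parts.append(f"--{stage.action}--> {stage.produces}")
    return " ".join(parts)


print(describe(PACKAGING_FLOW))
```

Only the first stage is "building" in the sense discussed in this thread; the upload and install stages are separate actions that happen to follow it.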
@ucodery very cool graphic - how did you make it animated like that? i just added a note to add a paragraph with a graphic to this page of the guide in our google doc outline @jagerber48 also pinging you as I know this was a topic brought up by you (i think??) on the Python discourse. |
i really like the idea of having a graph (horizontal would be better for our guide) that provides an overview of the process. high level. and we could also enhance it to add the elements of that "src" box. it would fit nicely on our build distribution type page at the top because that page defines wheel vs sdist |
@lwasser I was using mermaid, in part because I thought it would allow better collaboration on the diagram. But I now realize you can't see the source of my comment! (if it was a checked-in md file you would see it render, and have the option to view raw)
I would also prefer a horizontal flow, but I don't think mermaid gives that kind of control. |
I am definitely not a mermaid expert, but I think writing flowchart LR instead of flowchart TD gives you a horizontal (left-to-right) layout. That is:
flowchart LR;
A[Source] --->|build| B
B[Distributions] --->|upload| C
C[PyPI Project] --->|install| D[Package]
(edit: sorry, I forgot how to nest code blocks) (edit edit: and yes we could do this in a MyST doc -- https://mystmd.org/guide/diagrams) |
Ah, very helpful! I am still learning most of mermaid myself. I also learned that you can click the copy button to get the mermaid text out of an image. But it is still not simple to collaborate on a thread of diagrams - you still have to paste it somewhere just to view it. |
@lwasser yes, I did ask the question! Thanks for remembering! It seems like a lot of this is covered here, but here's what I've learned so far. When I asked the question I had less experience deploying code but now I have a tiny bit more experience.
First off, I've encountered build to mean two things. (1) It can mean bundling source code (and maybe some other stuff) into a zip-like file so it can be easily shared. But (2) building also means constructing an environment to test code, including installing the code in question. This is a pernicious overloading because the first definition involves bundling code and the second definition probably involves unbundling the same code. But here we're talking about packaging, so the first definition. So far it seems that "building is the process of bundling code into a zip-like file for easier distribution" is pretty much correct.
The next question, then, is why is it so complicated? I can't answer that better than in the packaging guide, but here's my quick stab. Basically, for python, there are a number of tools for both bundling and unbundling code and even, historically, different formats for the bundles. The bundle/unbundle tools need to be configured. There are a lot of configuration options and bundlers seem to need to be a little bit aware of how the unbundlers operate and maybe vice-versa (I know less about unbundlers, like pip or something I guess?). But anyways, it seems the complexity surrounding "building" is related to this multitude of tools with overlapping and non-overlapping feature sets and configuration options.
So yeah, the answer "building is bundling code into an easily distributable format" is a little bit unsatisfactory, because I wonder: why don't I just zip up my source code? Or write a script to do that?
A satisfactory answer would somehow explain how that's the basic idea, but in practice there's more complexity and you need tools for reasons X, Y, and Z, and, since there are many tools there ends up being additional complexity. |
Thanks for this additional context @jagerber48.
Definitely, most terms in Python packaging are overloaded, sometimes with official definitions! I can see the second definition being meant when one talks about "building", but it seems like a synecdoche more than a definition, as one generally can't install a package without first building it. Python packages are certainly bundled into zip files, but they are also bundled into tarball files, and the current best practice is to bundle the same code both ways, so I'd leave the specific format out of the definition of building. If I could build on your definition I'd say building is "the process of bundling all code and metadata necessary for the accurate installation of the library or application". Typically, at least some of that code or metadata will need to be generated during the build process, which is what complicates the process. The other messy part is that the question "what code from the source is necessary for the user and where does it go at the install location?" is not something that can be fully known except by the human author. I have typically referred to "unbundlers" as "installers". There is less diversity here than there is for build tools. Pip is far and away the most common, but we also have the installer project. Build/bundling tools should not have to know which installer will be used, or what its settings are. In the case of wheels, the install process is fully laid out in PEPs, so there should be little to no question or divergence between different tools.
You could just zip your own code, and certainly others in the past have done only this. But you would have to write both the bundler and the unbundler yourself. And then there is the problem of how you get your custom unbundler onto the target machine, and why you didn't use that technique to get your original package there. While the difference between a |
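To make the "why not just zip it?" point concrete, here is a small sketch using only the standard library. It builds three in-memory archives: a naive zip of the code, a wheel-shaped zip that also carries a `.dist-info/METADATA` file, and an sdist-shaped tarball that carries `PKG-INFO`. The package name `demo_pkg` and the helper names are made up for illustration, and the archives are deliberately incomplete (a real wheel also needs `WHEEL` and `RECORD` files, for example), so treat this as a picture of "code plus metadata" rather than a working build tool.

```python
import io
import tarfile
import zipfile

CODE = {"demo_pkg/__init__.py": "VERSION = '0.1.0'\n"}
# Hypothetical, minimal metadata; real build back-ends generate much more.
METADATA = "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 0.1.0\n"


def make_zip(files):
    """Return the bytes of a zip archive holding {name: text} entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()


def make_tar_gz(files):
    """Return the bytes of a gzipped tarball holding {name: text} entries."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, text in files.items():
            data = text.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def zip_has_wheel_metadata(blob):
    """True if the zip contains a <name>-<version>.dist-info/METADATA file."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return any(n.endswith(".dist-info/METADATA") for n in zf.namelist())


def tar_has_sdist_metadata(blob):
    """True if the tarball contains a <name>-<version>/PKG-INFO file."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tf:
        return any(n.endswith("/PKG-INFO") for n in tf.getnames())


naive = make_zip(CODE)
wheel_like = make_zip({**CODE, "demo_pkg-0.1.0.dist-info/METADATA": METADATA})
sdist_like = make_tar_gz({
    "demo_pkg-0.1.0/demo_pkg/__init__.py": CODE["demo_pkg/__init__.py"],
    "demo_pkg-0.1.0/PKG-INFO": METADATA,
})

print(zip_has_wheel_metadata(naive))       # no metadata for an installer to read
print(zip_has_wheel_metadata(wheel_like))
print(tar_has_sdist_metadata(sdist_like))
```

The naive zip has the same code, but nothing that tells an installer the project's name, version, or dependencies, which is the "metadata necessary for the accurate installation" part of the definition above.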
this is great @jagerber48 do you have any interest in helping us with the guide packages as we write them? it would be great to have another eye on this to catch these types of questions and to guide what we write. we will be publishing tutorials as well. @ucodery and others have been helping already!! The more eyes and help the merrier as far as i'm concerned!! i feel like collectively we will create a more accurate and useful guide. if you are interested in that and also potentially joining us in slack can you email me so i can invite you? leah at pyopensci.org ? essentially what i'd like to do is capture the important pieces in this thread - in our guide. Jeremiah has already pointed out some areas that might be confusing in there. i'd love to improve things more, together. |
oh also @merwok same for you - i'm sorry i didn't mean to only invite one person in this thread. it seems like everyone here might want to help in some way? we are working now in a google doc just for the initial pages. and soon it will be online via pr's that you can review. everyone gets credit for contributing regardless of whether it's in the google doc or not - i just have a hard time starting a collaborative writing process on github. |
hey y'all - small update - i started working on this locally and pushed a sample diagram to discourse here . this diagram shows the big picture of the packaging workflow (feedback welcome). Then i thought we could use and expand upon parts of the diagram to explain parts of the process. For instance, that right hand section of it is akin to what Jeremiah posted above - we could expand upon that part when we explain "building". I'm thinking we don't need to have those details about installers and such. rather keeping it simpler but explaining why it's a bit more than just zipping things up, is important for our users and community. please feel free to comment on the full diagram in discourse. and if not when i get more text pulled together from this discussion i'll share it here for comments |
text is here if anyone wants to review / comment / edit. @jagerber48 i'd especially appreciate your eye in terms of what questions you still have. but really i want EVERYONE's feedback! i'll add you as an author after comments are made here. this text is designed to go at the top of this page in our guide |
For my second definition I'm talking about the process where some automation tool (like GitHub Actions or something) installs Python in some environment, then installs a package into that environment, then runs tests. In the cases I've seen, this process involves pip installing the package, which means it is taking a bundled (built) version of the code and installing it. So in the first definition of "build" bundled code is the output, but in the second, bundled code is an input. Maybe you're suggesting (with your synecdoche comment) that "building" refers to the whole process from having source code on machine A to having source code on machine B? In which case both bundling and unbundling would be part of building?
This is a helpful refinement. The "for the accurate installation" bit is key to help me understand why it's messier than I'd naively think.
This is helpful info
Helpful clarification @lwasser thanks for the invitations to help. I'll e-mail you and I'll have a look at the materials you've shared. Hopefully I can be helpful by continuing to ask questions! |
Yes, pip can install a python package from source, sdist, wheel, git URL, and even others. Because it can install formats other than wheels, it is categorized as a build "front-end". Pip cannot transform arbitrary I maintain that building is going from source to a bundled format. Installing is going from a bundled format to an importable/ executable module. The confusing bit is that most installers we use for Python, primarily pip, sometimes do both things in order to fulfill a user request. |
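The build/install split can be sketched as a small classifier over the kinds of targets mentioned above. This is not pip's actual logic; `needs_build_step` is a hypothetical helper, and it only looks at the shape of the string to say whether an installer would first have to build a wheel before it could install anything.

```python
from typing import Optional


def needs_build_step(target: str) -> Optional[bool]:
    """Guess whether installing `target` requires building a wheel first.

    Returns False for a wheel, True for source-like targets, and None when
    the answer depends on which files the package index offers.
    """
    if target.endswith(".whl"):
        return False  # already a built distribution: install only
    if target.endswith((".tar.gz", ".zip")):
        return True  # looks like an sdist: build a wheel, then install it
    if target.startswith("git+"):
        return True  # version-control URL: fetch the source, build, install
    if target == "." or target.startswith(("./", "../", "/")):
        return True  # local source tree: build, then install
    return None  # bare project name: wheel or sdist, depending on the index


for example in [
    "demo_pkg-0.1.0-py3-none-any.whl",
    "demo_pkg-0.1.0.tar.gz",
    "git+https://example.invalid/demo_pkg.git",
    ".",
    "demo-pkg",
]:
    print(example, "->", needs_build_step(example))
```

Under this view, "building" is everything that happens before a wheel exists and "installing" is everything after, even when a single `pip install` command performs both.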
so this exact point that you make @ucodery is something that we discussed early on with folks like @ocefpaf . when filipe explained that pip point to me - it was super confusing! it's kind of like the confusion around setuptools vs Build vs other tools like flit (back end vs front end conflation).
I think for our guide we don't need to get into that detail because it will confuse folks more. i think our goal should be that scientists understand generally how to create a package (how to create that sdist and wheel), what those files / archives are / represent and then how to install. that is what my gut is telling me about the depth of our guide. We could note in a breakout that pip actually can do both. BUT it's really confusing to think about that and more info than most might need to be successful in creating a package! |
The details of how pip does its work are probably outside the bounds of what any packaging guide needs to have. But I would be careful to never refer to pip as a build tool. One of its jobs could be defined as a build front-end, but even then I would avoid labeling pip that way unless it came up directly. Some projects use pip as the means of building the wheels that they upload to PyPI, but that is not advised. Pip wears a lot of hats, but I would refer to it primarily as a package manager. Its ability to act as a build front-end is only there to make it a more versatile installer, not to assist package authors in an upload workflow. |
that sounds great!
i'm mostly focused on new pages now around testing, ci data etc and would also love help there as those sections are empty. BUT really, whatever you want to work on, please feel free to do so!!! as i appreciate any and all help and just want our users to have some clarity in this ecosystem! ✨ |
@lwasser back on the diagram I find the edge "Create SDist/ Wheel distributions" going to the node "Build package/ create SDist & Wheel" a bit confusing/ redundant. I think, given the definitions I originally gave in this thread I would have the edge pointing out from the Repository to be "publishing" and point at a new group of tasks, one of which is building and one of which is publishing (so no change in the nodes). |
All of this info about pip/building/installing etc. is very helpful and informative for me. But I do feel my point about the term "build" being overloaded has been missed a little bit.
I agree that this is one definition of "build", and it is the most relevant one for packaging. But someone learning about packaging will come across the term "build" in many other contexts. Just look at how many times the word "build" appears in @lwasser's diagram. And I don't think all of those usages refer to "bundling source into a bundled format". In my own workflow there are at least two examples.
|
ok please review this pr y'all. hoping this provides sufficient information for those newer to packaging! the goal will be to merge by labor day in the us (september 5th) !! |
i think it would be super helpful on the intro page of the package chapter to include a high level overview of the parts of a python package and what the build/ publish steps means / entails.
This doesn't have to be super involved but high level - perhaps a diagram with an explanation.
"these are the bare-min things that you need to do to create a python package" to really break down and describe the entire process.