Developing Nim's stdlib and a Nim distribution #173
Comments
I want to elaborate on why I think point 4 is very important
The process for contributing to the standard library as an outside contributor is currently too difficult in some areas. A high-quality stdlib is important, and a lot of people rely on it. A tough code review process is not a bad thing in principle, but in practice the current situation has scared off some valuable contributors and hindered Nim's development.
There are issues when a stdlib module has no clear "owner" who can properly moderate and make executive decisions. Such an owner needs reasonable domain knowledge, familiarity with the code base, and time to stay up to date with all code review comment threads. Without a clear leader, making a PR can turn into a yak-shaving exercise of trying to convince several strong personalities who may never agree on fundamental issues, leaving the PR to languish and turning off potential contributors. The other possibility is that, due to lack of manpower, a PR sits with no progress (not even a comment) for a long period of time. Even a close with "won't accept" would be welcome here. Any feedback at all.
These situations are not the same as issues or bug reports. These are people who have spent their valuable time writing code for the project. They deserve timely feedback. This isn't theoretical: these scenarios have actually happened several times here in Nim land.
If each module is its own repo, the repo owner can make final decisions quickly, making the process of contributing much easier. The Nim core team is small. They don't have the manpower or domain expertise to do this for a huge standard library with many modules. This proposal allows the core team to focus on what they are good at and spread some responsibility to the community. The core team can focus on being trusted curators and leave the domain-specific expertise to the domain experts. This is exactly how Linux distros work. It has its own challenges, but it's a fairly successful model IMO.
The community has already done a good job of filling in gaps in the standard library, creating better alternatives to some stdlib modules. This Proposal leverages what is already happening in the ecosystem, and allows everyone to benefit from it in a more formal way. |
My god, please no.
How many existing packages fit this definition? What you'll end up doing with this proposal is encouraging people to artificially call their packages v1.0.0 to get them included in "the Nim distribution". I propose alternative solutions to the problems that you say this proposal solves:
This is what tags in Nimble are for, designate a tag and mark packages which are "production ready" with it. Then just give them a link: https://nimble.directory/search?query=nim-production-approved
Maybe there is more to this, but if users are behind a firewall then they are more often than not offered a proxy which Nimble supports by the way. I don't understand this problem at all. Maybe the problem is actually different? Are there companies which do not allow installation of packages and/or need pre-approval for each new component that is installed?
This is something that should be resolved with a proper, official and centralised, package website where users can publish their packages (i.e. upload them). You should pour resources into that instead of creating this distribution. |
Yes there are. Many companies in the financial sector work this way. |
That is not mutually exclusive to this proposal. This proposal has a huge extra benefit: In theory, that set of packages and their dependencies has been tested together, so that you know they won't interfere with each other. It's the same way Linux distros work. You trust that if you Correct me if I'm wrong about my assumptions here. |
In what way could they possibly interfere with each other? What is the proposal for testing that they are compatible? The only advantage this will have is perhaps compatibility with the Nim version that the packages are bundled with. Honestly though, we don't even have good test coverage for the stdlib, it should really be a priority to improve that instead of creating a distribution and supposedly testing it. |
In that case I doubt a distribution where packages are voted on will solve this problem for them. There will always be packages that they wish to use for which they will need to gain approval. Also, I assume that all of the packages in the distribution will need to be audited. Doing so for all packages that the community deems appropriate to include in the distribution will be a much larger burden than doing so for a select few packages that the financial institution requires for their software. |
Here is what the distribution would avoid: You have modules A and B, A depends on C version 1, B depends on C version 2 (incompatible with version 1). A list of Nimble packages doesn't achieve the same. |
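The A/B/C situation described above can be sketched as a small check that a distribution curator might run before admitting packages. This is only an illustration: the `requirements` mapping and the `find_conflicts` helper are hypothetical, versions are reduced to a single major number, and nothing here reflects how Nimble actually resolves dependencies.

```python
from collections import defaultdict


def find_conflicts(requirements):
    """Return the dependencies that are required at more than one major version.

    `requirements` maps a package name to {dependency name: required major version}.
    The result maps each conflicting dependency to {major version: [requiring packages]}.
    """
    wanted = defaultdict(lambda: defaultdict(list))
    for package, deps in requirements.items():
        for dep, major in deps.items():
            wanted[dep][major].append(package)
    # A dependency is a conflict only if two or more distinct majors are requested.
    return {
        dep: dict(by_major)
        for dep, by_major in wanted.items()
        if len(by_major) > 1
    }


# The example from the discussion: A needs C version 1, B needs C version 2.
requirements = {
    "A": {"C": 1},
    "B": {"C": 2},
}
conflicts = find_conflicts(requirements)
print(conflicts)  # {'C': {1: ['A'], 2: ['B']}}
```

A plain list of packages carries no such check, which is the difference the comment points at: a distribution can refuse the combination up front.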
So you'll remove any packages that have this incompatibility? That could make what's included in the distribution quite volatile. The real solution to this problem is to implement support for it in the compiler. C v1 should be considered a separate package by the compiler to C v2 somehow. |
No, they won't be added in the first place, only one version of C can make it into the distribution.
I don't agree, it's not the compiler's job to enable a solution that is at best a momentary unfortunate situation and at worst caused by incompetence. |
All this is true. |
Yes, but the problem is that dependencies change as packages are updated. You will find that either removing a package altogether or keeping them frozen in an old state is the only way to keep compatibility. That is what I meant by "volatility". |
Of course, but Araq isn't proposing an NPM-style model. He's proposing an official distribution which the community chooses. My point is that no financial institution will be happy with this because each will have different requirements. We should work towards an NPM-style solution, with an ability to allow these institutions to create self-hosted registries. This is the sustainable way forward and may actually help Nim be financially sustainable (of course, NPM is on a whole different scale... but on the other hand they are hugely profitable AFAIK) |
Again, these are not mutually exclusive.
The community distribution is simply an "example" so to speak, that has certain high standards for compatibility. Again, this is exactly what Linux distros do. It is not a radical idea. |
How so? Previously they accepted nim-1.0.0.tar.xz as a trusted thing programmers are allowed to use, afterwards they do the same for nim-distribution.tar.xz, which fulfills the same standards that nim-1.0.0.tar.xz did. The point is that it's easier to get permission for 1 software package than for 10 different dependencies. |
Yes, but they each take time, and Nim as a whole has limited resources. We should be spending those resources wisely, on a solution that works long-term and for more use cases. |
Each financial institution will have a different set of packages that they want.
So as long as we put some packages together and call it a distribution the financial institution will just happily trust us? I find that hard to believe. |
I agree, but we disagree on where those resources should be spent. See my initial post about contributing to the stdlib today. |
Everything you suggested so far is more expensive than my proposal. |
My suggestions also solve more problems and are effectively inevitable. You might as well put resources into them because you will have to do so eventually anyway.
This is where we disagree. I strongly don't think this will provide much benefit to our users, even in the short-term, much less in the long-term. |
No, they solve different problems and fail to see that there are valid reasons behind the quite common stance, "I won't use X, it's a dependency". |
It is already super easy to create a self-hosted registry. Steps:
[PackageList]
name = "CustomPackages"
url = "http://mydomain.org/packages.json" |
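As a rough sketch of the other half of such a setup, the file served at that URL could be generated by a small script. The field names below (`name`, `url`, `method`, `tags`, `description`, `license`, `web`) follow the layout commonly seen in the public Nimble package list, but treat them as an assumption and check them against your Nimble version. The package name and URL are hypothetical.

```python
import json


def make_entry(name, url, description, license_name, tags=()):
    """Build one package entry; the field layout is an assumption (see lead-in)."""
    return {
        "name": name,
        "url": url,
        "method": "git",
        "tags": list(tags),
        "description": description,
        "license": license_name,
        "web": url,
    }


def render_package_list(entries):
    """Serialize entries as a JSON array, rejecting duplicate package names."""
    names = [entry["name"].lower() for entry in entries]
    if len(names) != len(set(names)):
        raise ValueError("duplicate package name in list")
    ordered = sorted(entries, key=lambda entry: entry["name"].lower())
    return json.dumps(ordered, indent=2)


# Hypothetical in-house package hosted on a private git server.
entries = [
    make_entry(
        "internaltool",
        "https://git.example.org/internaltool",
        "Hypothetical in-house package",
        "MIT",
        tags=["internal"],
    ),
]
packages_json = render_package_list(entries)
print(packages_json)
```

The output would be written to the location the `url` setting points at, for example a static file on an internal web server.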
I agree completely with your assessment @andreaferretti. But the big factor here is "big institutions", these will have enough scale to maintain something like this. But if I'm a small startup I don't want to mess with this, I just want a hosted solution that works automagically for me. Indeed, we should reuse this functionality for our custom NPM-like solution, or at least evaluate it to see its limitations. |
Just because something is part of the "Nim distribution" doesn't change the fact that the package is still a dependency... |
But it does change that! Just like today's (rather silly IMO) stdlib's |
I'm with @dom96 in the sense that the resources we have are limited, and we should focus on smaller steps. For example, I would propose keeping important packages under the nim-lang org on GitHub and giving maintainers collaborator access. This way, if a maintainer disappears, another person can take over maintaining package X without needing to fork it and leave the old one behind. If you don't want to make those packages look "officially supported", just create another org. But I would solve the problems @Araq has listed one by one instead of trying a big solution like this distribution. |
Many people don't want to do "nimble install foo" even if that foo is https://github.com/nim-lang/zip/. Putting popular packages under "nim-lang" wouldn't solve that. They either have restrictions on their machines or firewalls (i.e. financial institutions and believe me, there might be tug of war between devs and either sysadmins or IT security), or they want the "genuine" default experience. The solution is that once one package becomes popular, people propose to be co-maintainers. Possibly we could have say "nim-webdev", "nim-science", "nim-games" organizations if a domain becomes quite big but that should be happening organically. What the Nim community might do is maybe allow subforums in the Nim forum so that people have a privileged place to discuss that. |
I do not consider it a "big solution", I consider it the "smallest solution that could possibly work". The only thing that bothers me is that it adds maintenance costs. However, every other solution also adds maintenance costs of some sort; it's inevitable. And hopefully people chime in to mitigate this cost. |
A little wrinkle that hasn't been included in this discussion is package-level documentation. One of my packages of interest, https://github.com/c-blake/cligen in its simplest usage style requires "definitely some" because it's a rare approach with a couple rare features, but "not very much" documentation because usage "scales down" so very far. For a user already committed to learning some things this is not an issue. For "fly by" potential users, I've hoped that I could keep their attention for at least 4..6 paragraphs and 1 code snippet at the top of the README.md. This probably seems like I am arguing against the proposal which I am not, really. It's just a "property" of the proposal. The usual Nim distro has an integrated documentation system. Maybe some thought about how this interacts with that is warranted. I am fine with Another reason unmentioned so far (but perhaps vaguely related to @mratsim's genuine default experience) for avoiding dependencies is that "System" package managers like { If someone has some other ideas to make that mess smaller, it may be a good topic for a related but distinct RFC and/or nim/nimble issue. My best idea is some kind of "generator" for the usually <12 system package managers from language package descriptions and engaging with the people who maintain system package repos. } |
You make it sound like 80%+ of us don't want to install packages via Nimble, which I seriously doubt is the case. I don't doubt that there are people working for financial institutions who have these restrictions, but I want to hear from them, and I would ask you to stop exaggerating how many of these people exist.
As I mentioned previously, whatever restrictions the users have with regards to firewalls can be worked around by proxies. If IT needs to sign off on every single package then I don't see a reason why they wouldn't need to sign off on every single package inside a Nim distribution. A "genuine" default experience is nice, it works well for anaconda for example. But I seriously doubt we've got anywhere near enough mature packages that could be bundled up into a useful distribution. |
Just a vote, not a survey, but I hate having to install packages via This would allow, among other things, packages not written in Nim to depend upon, say, a command line utility written in Nim, or programs written in Nim to depend upon things not written in Nim, such as certain versions of C libraries that could be auto-installed as dependencies via the system package manager request to install some Nim program. |
@dom96 One example: https://irclogs.nim-lang.org/30-03-2018.html#08:08:48
Also I did work in financial institutions (4 years) on the ops side and basically the discussion with developers was "don't do that" or "disclaimer: all damages to the company due to this unapproved code will be supported by your department budget". And this was for unapproved unzip library on AS400. What a financial institution is looking into before choosing a package is Obviously given Nim size, they will probably start with an internal team, but as the reliance on Nim grows, they will be moved to the core business / value addition of the company (the financial institution proprietary algorithms) and the financial institution will look into offloading dependency support to an external provider for various reasons:
Now I agree that a tool to "generate your own Nim distro" would be great, but I think it's best to start with a proof-of-concept "Nim important packages" distro, get it out, and see how people like it. This would make it very easy for people to try Nim in ix.io or in their own Docker with as little friction as possible, and to write non-trivial useful programs that need more than the standard library (say npegs or SDL2 or bigint or crypto or Arraymancer). Then we can write the "generate your own distro" layer, so that someone who wants a "Nim distro for finance" or "Nim distro for science" can contract a company to maintain it. Alternatively, instead of being by domain, those distributions could be by security properties, say code that has been audited, code that only uses safe features of Nim, or no-warranty code, similar to Ada Spark. Now, that said, assuming we only had one person with the choice of working on either nimble or the distribution, the priority should be on making nimble better over the distribution aspect, because I agree that the vast majority of users will happily use a package manager. For scalable usage, nimble needs in my opinion:
However, from a time and resource perspective, it's not a choice between nimble and a distribution. @Araq works on Nim full-time, and can probably provide a distribution in less time than you or anyone who wants to tackle lock files as a project on the side of a full-time job. Furthermore, while we could also ask @Araq to work on Nimble instead, I would argue that he should be the last one working on it, as he doesn't use any packages: everything is in the standard library, so he doesn't have to deal daily with nimble's limitations. |
The reason I support Araq's proposal is that I prefer to have a small subset of high-quality libraries instead of thousands of low-quality packages, and to concentrate support efforts on the libraries in the distribution. When Nimble improves with features to identify valuable packages, and when the community is larger, the initial goal of distributions disappears. Look at Debian: there are hundreds of thousands of DEB packages on the Web, but only 59,000 are included in the latest Buster release. When you stick with packages from the distribution, you have the assurance that they will work together without trouble. This type of assurance is important for companies and Nim beginners. |
@pmetras but thats not how third party ecosystems work: popular ecosystems lead to people working on their own packages or even having choice between many quality packages for the same thing, it's a bit like free market vs a big state imo (i know this metaphor is overused): its hard to expect a very minimal team that already has huge amount of work on the language to somehow maintain a huge library suite as well. I think the idea of distro is good in principle, but look at Go, C++, Python, Ruby, Java etc: how often do you see distributions except for niche cases like python science? my point is that having an active ecosystem is much more critical than having distributions, and that it seems its not a problem for many of those much more popular languages/ecosystems (correct me if i am wrong) |
@pmetras sorry, i now realized you argue about something similar, and distributions mostly in the beginning, i agree with that in a way, but i still want to point out that making the ecosystem bigger is more important: not sure how a distribution applies tho, maybe it helps |
Honestly, our package ecosystem is so immature that you won't be able to create a stable distribution that's useful to anyone. Never mind creating a distribution that's useful for a financial institution!
@mratsim I disagree, @Araq has worked on Nimble and should work more on it. The creator of Nim avoiding such a major aspect of the language doesn't do our users any favours. |
It's a long read, so I decided to write it only once; this is how I'm planning on doing a distribution generator: It should fix my personal |
What?! We have nim-regex that's better than our stdlib packages, better packages to do Pegs, better packages to do serialization, a couple of useful UI libraries, ORMs, ...
IMHO a better Nimble cannot solve the inherent fragility of a distributed system. But I've said it before, the points I raised are not solved by a perfect package manager. |
You cannot argue that a decentralized solution is bad when there is no language that can be used without third-party dependencies. It is easy: if you develop your language plus its package dependencies yourself, you are, how many, 2 developers? 5-10 if you get a lot of help? If you count the people who actually contribute to any Nim package, you have more than 100 people. I would bet every resource we have on making the Nim community stronger. I just see this distribution as a specific feature for financial companies, not for the future of the language (and that scares me). |
Arguable. Python with its batteries included surely is/was useful without third party deps. Plenty of people use C++ or C without external dependencies, of course it depends on the application domains.
Fair enough I guess. |
I think there are multiple understandings of what "distribution" means. I'll try to summarize what I understand by it:
For instance, if we have a distribution about data science and machine learning, I expect to find libraries for dataframes, machine learning and statistical algorithms, and graphing. For a data structures and algorithms distribution, I expect to have classical container data structures (trees, hashmaps, etc.) and algorithms (sorts, hashes, etc.). Another one about languages could have parser and lexer libraries. One can imagine medical, education or scientific distributions. I don't care whether it's included under the Nim umbrella or not, whether it is centralized or distributed, in a container or not. I don't need to create a personal distribution, and I don't care whether it is based on Nimble or not. What I want, to ease development for beginners and attract some types of companies or governments, is that when Nim compiler v1.3 is published, I can easily get the data science v1.3 and stdlib v1.3 distributions, for instance. There is then no barrier to my becoming efficient in my domain of interest immediately. I don't need to spend time finding packages with Nimble and debugging them or writing the documentation... |
Yes, many companies including "FAANGs" prohibit pip/nimble/npm/tarball installs but provide blanket approval for linux/bsd distributions for many reasons. Edit: A summary around supply chain attacks and how to mitigate them: https://drewdevault.com/2022/05/12/Supply-chain-when-will-we-learn.html and previously https://arxiv.org/pdf/2005.09535.pdf |
Here is my suggestion: create periodic "snapshot" lists of compiler version + library names + library versions that are trusted, tested and known to work together, and sign them. The snapshot itself is just a list of package names and versions. Such a list is then used:
|
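A minimal sketch of what such a snapshot list could look like in practice, assuming a JSON layout that is not part of the suggestion itself. The package names and versions are placeholders. Note that the SHA-256 digest below only detects tampering or accidental edits; it is not a signature, and real signing would need a key-based tool on top.

```python
import hashlib
import json


def build_snapshot(compiler_version, packages):
    """Build a snapshot: one compiler version plus pinned package versions."""
    return {
        "compiler": compiler_version,
        "packages": dict(sorted(packages.items())),
    }


def snapshot_digest(snapshot):
    """Hash a canonical JSON form so that key order does not change the digest."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(snapshot, expected_digest):
    """Check a snapshot against a digest published through a trusted channel."""
    return snapshot_digest(snapshot) == expected_digest


# Placeholder names and versions, purely for illustration.
snapshot = build_snapshot("1.0.2", {"pkg_b": "0.9.1", "pkg_a": "2.3.0"})
digest = snapshot_digest(snapshot)

tampered = build_snapshot("1.0.2", {"pkg_b": "0.9.1", "pkg_a": "2.3.1"})
print(verify(snapshot, digest), verify(tampered, digest))
```

Because the snapshot is only names and versions, it stays small enough to review by hand, which matches the intent of the suggestion.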
One other wrinkle here is that "work together" here can be a bit more than a single-bit value. I put only |
I'm actually proud that nimterop (which depends on cligen) supports and is tested with 0.19.6, 0.20.2, 1.0.2 and devel x Win, Lin and OSX. This provides users with a package that they can rely on for an extended period of time without being forced to upgrade. Even with this test matrix, nimterop only supports Nim for the last year's worth of releases - 0.19.0 came out in Sep 2018. That doesn't seem like much when you go beyond hobby projects. |
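To make the size of that test matrix concrete, here is a small sketch that enumerates it. The version and platform lists are taken from the comment above; the job layout itself is an assumption and does not describe nimterop's actual CI configuration.

```python
from itertools import product

# Nim versions and platforms as listed in the comment above.
nim_versions = ["0.19.6", "0.20.2", "1.0.2", "devel"]
platforms = ["Windows", "Linux", "macOS"]

# One CI job per (Nim version, platform) pair.
matrix = list(product(nim_versions, platforms))

for nim_version, platform in matrix:
    print(f"test on {platform} with Nim {nim_version}")

print(len(matrix))  # 12 jobs for a single package


def jobs_for(package_count):
    """Jobs needed if every package is tested alone against the full matrix."""
    return package_count * len(matrix)
```

Even this per-package count grows linearly with the number of packages in a distribution, and testing packages against each other's versions would grow much faster.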
@c-blake using git tags or hashes can ensure the correct code to use for a given distribution version. They are more reliable than version ranges in Nimble. |
I think my point may have been misunderstood. I was just following up on @FedericoCeratto's "tested known to work together". Of course, I/any other package author could (probably) go to a (variable amount of) effort with |
@c-blake if you are suggesting implementing a feature matrix between Nim and each package, this is out of scope for this issue: in the end we need a boolean decision [distribute|not distribute]. Also, implementing and running tests for all M^N combinations is exceedingly onerous. |
Well, it could be in scope to use weaker language about promises (and this issue is to discuss scope -- I think you may be jumping the gun on concluding anything about it). |
So we've had a cooling-off period on this RFC during which time I believe I've addressed most of the blockers on @mratsim's list. The thinking on stdlib evolution has, uh, evolved a bit since this RFC was written, but I'm curious to hear if the opinions on distribution have changed, as I'm just about ready to implement something. |
Er ... what? Any links please? |
https://github.com/disruptek/nimph Nimph doesn't add anything to the tasks situation, which really doesn't feel like it needs a language-specific solution. The other goals are easily achieved with Nimph's support for hierarchies and git-native architecture, as well as the compiler's support for Bare (and shallow) git repositories seem well-suited to use as the fundamental building block of distributions. They are trivially validated, contain a wealth of useful metadata, and are easily consumed by other tools. Distributions are already here and work with both compiler and package manager, today. |
Pulling whole repositories does not seem optimal. I just created a proposal for source packages in #179 |
Good, but my RFC is about one official distribution (!), not about having the tools to build distributions. |
As a much trimmed down version of this idea we are trying https://github.com/nim-lang/fusion and see where it gets us. Anything beyond that seems to be beyond our current resources hence I close this RFC. |
Design guidelines for Nim's stdlib
We plan to create a "Nim distribution" which consists of the Nim compiler, Nimble and a selected/curated set of Nimble packages. The idea is to have the best of both worlds:
A Nim with a stdlib that can be maintained effectively by the Nim core developers and yet also something that has "batteries included". Nevertheless there still is a stdlib and it remains part of Nim's core.
Can the Nim compiler itself depend on a curated Nimble package? In the future yes, in the beginning, it shouldn't. We have to be conservative with Nim's core. This brings us to our first requirement:
(1) What the compiler itself needs must be part of the stdlib.
This is probably a temporary requirement until the "Nim distribution" has been implemented and tested successfully for a couple of months.
The second requirement should be uncontroversial:
(2) Vocabulary types must be part of the stdlib.
These are types most packages need to agree on for better interoperability, for example `Option[T]`. This rule also covers the existing collections like `Table`, `CountTable` etc. "Sorted" containers based on a tree-like data structure are still missing and should be added.
Time handling, especially the `Time` type, is also covered by this rule.
(3) Existing, battle-tested modules stay
Reason: There is no benefit in moving them around just to fulfill some design fashion as in "Nim's core MUST BE SMALL". If you don't like an existing module, don't import it. If a compilation target (e.g. JS) cannot support a module, document this limitation.
This covers modules like os, osproc, strscans, strutils, strformat, etc.
And finally:
(4) New stdlib modules do not start as stdlib modules
Nim distribution
I imagine the "Nim distribution" to work like this: We have the usual Nim tarball with a `dist/` directory that contains the set of selected packages we agreed on.
Adding a package
Every package in there must be voted into the distribution. The majority decides about whether to include the package or not.
A package must be at version 1 or later in order to be considered for inclusion. Ideally we can use the master branch of the github repository for inclusion.
After the decision to add it was made, a review process should start. The review should be done by the distribution maintainers.
The review process should focus on:
It should not focus on:
Removing a package
Ideally packages are not removed. It's a package others depend on. We should fork the package to ensure it stays online. It's acceptable if the development on a package has stopped. If the community decides that the package A has been superseded by a different package B the distribution can start to include B and deprecate A.
Keeping the packages up to date
CIs ensure the tests are green all the time. The distribution itself will be version controlled and the packages are tied to a specific git commit that has been reviewed. There is a tension here between "use what is known to be stable" and "use the latest" and probably we should support both, default is "stable", and "latest" not only means "latest" but also "unsupported and not reviewed".
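The "stable" versus "latest" distinction could be modelled roughly as follows. This is only a sketch under stated assumptions: the `Pin` record, the `resolve` function and the commit identifiers are all hypothetical, and the RFC does not specify any such data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """One package in the distribution (hypothetical record layout)."""
    name: str
    reviewed_commit: str  # commit that passed the distribution review
    latest_commit: str    # current head of the package's repository


def resolve(pins, channel="stable"):
    """Map each package to (commit to use, whether that commit was reviewed)."""
    if channel not in ("stable", "latest"):
        raise ValueError(f"unknown channel: {channel}")
    resolved = {}
    for pin in pins:
        if channel == "stable":
            resolved[pin.name] = (pin.reviewed_commit, True)
        else:
            # "latest" is only reviewed if head has not moved since the review.
            reviewed = pin.latest_commit == pin.reviewed_commit
            resolved[pin.name] = (pin.latest_commit, reviewed)
    return resolved


# Placeholder package names and commit ids.
pins = [
    Pin("pkg_a", reviewed_commit="a1b2c3", latest_commit="d4e5f6"),
    Pin("pkg_b", reviewed_commit="0f9e8d", latest_commit="0f9e8d"),
]
print(resolve(pins))            # default channel is "stable"
print(resolve(pins, "latest"))
```

Defaulting to "stable" mirrors the text above; "latest" returns commits flagged as unreviewed whenever they differ from the reviewed pin.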
Packages can be updated individually via some command like `koch update xyz`.
Benefits
Disadvantages