Rustpkg #4610

Closed
wants to merge 16 commits into from

z0w0
Contributor

@z0w0 z0w0 commented Jan 24, 2013

Rustpkg is a revamp of Cargo for the Rust suite that I've been working on for the past week. It's a purely functional package manager that has no central sources of any kind,
but rather installs via URLs. It's similar to how Go's go get tool works, except
rustpkg requires a central metadata file (package.rs) in the repository, archive or folder
in order to figure out how to build the package. This is a side effect of rustpkg
allowing multiple crates to be defined in a single package (i.e. a package is defined
as a set of crates rather than a package being exactly the same as one crate).

The metadata is written in Rust. This is both good and bad. One con is that
Rust's syntax is going to be moderately unstable until v1.0 (as I was told),
so compiling an old package might spring up weird compiling errors. On the other hand,
you get to write the build process in Rust itself which is so incredibly meta and fun
and you get to use an awesome language to build projects in that same awesome language.

Rustpkg also doubles as a powerful build system that gives you two ways to
describe the build process of projects: declarative and imperative. The declarative
syntax allows you to declare the build process using Rust's attribute syntax.
A simplistic declarative package script (package.rs) would be something
like the following (Servo doesn't actually use rustpkg, it's just an example):

#[pkg(id = "org.mozilla.servo",
      vers = "0.5.6")];

#[pkg_crate(file = "src/servo.rc")];
#[pkg_crate(file = "src/servo-gfx.rc")];

The imperative API is for when you need a more powerful build process, such
as probing the system for important configuration, running shell scripts (or
even autotools, which is a planned builtin feature that I want to add). A simple
example using the imperative API is as follows (rustpkg is automatically included):

#[pkg(id = "org.mozilla.servo",
      vers = "0.5.6")];

#[pkg_do(build)]
fn build() {
   let platform = if os::is_toaster() { ~"platform=toaster" } else { ~"platform=alien" };
   let crate = rustpkg::Crate(~"src/servo.rc").cfg(platform).flag(~"-g");

   rustpkg::build(~[crate]);
}

When a package's crates are installed, they are stored under a unique name (Rust's library crates work like this out of the box) in ~/.rustpkg/bin or ~/.rustpkg/lib. This allows packages to coexist in a purely functional manner. However, if you want to easily
use a binary crate that has been installed from the ~/.rustpkg/bin path (after
adding it to your $PATH) then rustpkg provides "preferring" functionality
to symlink a generic name for the crate into the binary directory (which
voids the purely functional label, which is why it's explicitly optional). For example, if you had a crate with the name servo and it's uniquely installed to ~/.rustpkg/bin/servo-<hash>-0.5.6 then rustpkg prefer servo would link it to ~/.rustpkg/bin/servo which you can then easily run as just servo if the binary directory is in your $PATH.
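To make the naming scheme concrete, here is a minimal sketch in present-day Rust (not the 2013 dialect used above) of how the unique install path and the generic "preferred" path relate. The function names are hypothetical and not part of rustpkg, and the hash is simply passed in, since how it is derived isn't covered here.

```rust
use std::path::{Path, PathBuf};

/// Path of the uniquely named binary, e.g. `<bin_dir>/servo-<hash>-0.5.6`.
/// `hash` is supplied by the caller; this sketch does not compute it.
fn unique_bin_path(bin_dir: &Path, name: &str, hash: &str, vers: &str) -> PathBuf {
    bin_dir.join(format!("{}-{}-{}", name, hash, vers))
}

/// Path of the generic link that `rustpkg prefer <name>` would create
/// next to the uniquely named binaries.
fn preferred_link_path(bin_dir: &Path, name: &str) -> PathBuf {
    bin_dir.join(name)
}
```

Preferring would then amount to linking `preferred_link_path(..)` to `unique_bin_path(..)`; several uniquely named versions can sit side by side, while only one generic name exists at a time.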

There's also rustpkg do <cmd> which allows you to create scripts callable by the user in the package script:

// ...

#[pkg_do(configure)]
fn configure() {
    io::println(~"let us configure things");
}

$ rustpkg do configure
let us configure things

There's more detailed per-command info in rustpkg --help.

Preferring is currently pretty rusty, along with some of the other features. But that's to
be expected due to this being the initial implementation and I plan to perfect it
over time (including documentation). This pull request also adds std::semver, which was originally written by @erickt; I've ported it to the latest syntax and added it to std with his permission.

I still really want to add the following but they're not important for the initial pull request:

  • Improvements to the user interface based on feedback
  • More features of the imperative API
  • Use the JIT compiler for package scripts (JIT segfaults for me at the moment)
  • Use std::workcache for only compiling things that have changed
  • Add wrappers around autotools that also use workcache

@brson
Contributor

brson commented Jan 24, 2013

Thanks, z0w0! This will take some time to review.

@graydon
Contributor

graydon commented Jan 25, 2013

First thoughts: \o/ \o/ I totally want this to land.

Subsequent thoughts focus on the remainder of the conversation we were having over in the gist, when tav was asking for ever-fewer moving parts (from the user's side), and I sympathize with that. I like where you've gone with this but I think there might still be a few opportunities to file down extra stuff users have to do in the default case, without sacrificing the things you, me, pcwalton and others (rightly) want to keep possible in more-special cases.

In particular, the two remaining "extra bits" I think we can remove by default are the id name being different from a URL name, and package.rs separate from a crate-root source file (by default). I'm sorry to be a broken record about it but hear me out a moment; these are subtle but the payoff for "letting the user do the least work" is I think quite important:

  • Concerning URLs: It's the case now and I think for the foreseeable future that a DNS-associated name-prefix controlled by a user is a feature of the development landscape. Every user is likely to have a host.com/user prefix, and given this it seems fine to make host.com/user/pkg the naming scheme for packages as in Go. I particularly like that -- without a protocol -- it maps equally well to a filesystem path and a public URL. As tav pointed out, this means you can develop your packages locally in directories called github.com/z0w0/foo so long as that's put in your rustpkg path, then write extern mod foo = "github.com/z0w0/foo"; in neighbouring packages that use it and have them work both locally and against the internet when other people download them. It requires the tool to have a map of URL prefixes and inferred protocol access methods, but I think that is likely to work well in reality. It seems to for Go and on this count I think they really did the right thing (it was what I wanted to do at first too, I just hadn't worked out as many details as they did)
  • Concerning package.rs: it's definitely the case that if you have custom build logic you're going to want that to be in a different file than the entire crate, since you want to build it and run it to perform the build tasks (very slick idea with rustpkg do <cmd> btw. I wouldn't have thought of that). But when you don't have custom build logic, the only reasons for requiring an additional package.rs separate from the crate root that I can see are:
    • To make it easier to find; for this I think the convention that "the shallowest *.rs file in the repo is the crate root" is likely to work fine, or maybe "the shallowest mod.rs file". We already have the problem of having to name the source file associated with a directory module.
    • To handle multiple crates in the same package. A valid concern! But also one I think that can be handled declaratively if we make one minor assumption: that a package provides one or more crates and that the root crate is the one you "get" if you try to link-against the crate by its URL-name. Other crates may come along for the ride as extra installs (either local versioned-together dependencies, build-dependencies, binaries or the like) but they aren't extern mod-able by URL-name. The extras could be indicated declaratively still by either scanning the crate for direct dependencies (extern mod other = "other/mod.rs";) or by looking for attributes in the crate root indicating auxiliary installation artifacts: #[pkg_aux("other_cmd/mod.rs")]
  • One final caveat concerning package.rs as a filename is that given the pkg abbreviation on attributes and on the tool name, possibly pkg.rs is a more symmetric name.
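A rough sketch of the URL-name resolution described in the first bullet, written in present-day Rust. Everything here is an assumption made for illustration: the `Source` type, the `resolve` function, the shape of the prefix-to-protocol map, and the `exists` callback (used in place of real filesystem access so the logic can be exercised in isolation).

```rust
use std::path::{Path, PathBuf};

/// Where a package named like `github.com/z0w0/foo` was found (hypothetical type).
#[derive(Debug, PartialEq)]
enum Source {
    /// A directory under one of the local search roots.
    Local(PathBuf),
    /// A remote URL built from an inferred protocol.
    Remote(String),
}

/// Try each local search root first; otherwise fall back to the first
/// matching entry in a (prefix, protocol) map. Returns `None` when the
/// prefix is unknown and no local copy exists.
fn resolve(
    pkg_id: &str,
    search_roots: &[PathBuf],
    protocols: &[(&str, &str)],
    exists: impl Fn(&Path) -> bool,
) -> Option<Source> {
    for root in search_roots {
        let candidate = root.join(pkg_id);
        if exists(&candidate) {
            return Some(Source::Local(candidate));
        }
    }
    protocols
        .iter()
        .find(|(prefix, _)| pkg_id.starts_with(*prefix))
        .map(|(_, scheme)| Source::Remote(format!("{}{}", scheme, pkg_id)))
}
```

The same package name works for a local checkout and for a download, which is the property being argued for above.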

All these are, as I say, just minor concerns. I only raise them at this point in order to try to nudge things that may be harder to nudge the further down the road we get. This is definitely the road I want to be going down though, in general. It's great code and I'm glad to see it; did a quick read through all the files and would be happy to see this landing.

@z0w0
Contributor Author

z0w0 commented Jan 26, 2013

@graydon your points are really good, but I haven't done it exactly the way you suggest for a few reasons. I'm completely open to having the URL be the unique ID. But I am really not happy at all with having it find crates to be used/installed just by walking down the directory. That really doesn't sound right to me. I like a central metadata format, because it allows us to have a central place to store data that might be needed in the future. I also think having the central metadata declare where the files are is a better way to do it, because there's no ambiguity in the logic of finding the files, whereas walking the directory is vulnerable to things like localized filesystem sorting methods, if I'm not mistaken. Plus, if there are any external tools that need to detect a package (Travis CI, for example), it is much easier for them to be sure that there will always be a pkg.rs file.

I'm 100% dedicated to this, so if there's a solid concept to tweak towards, then I'm committed to changing it :). But I think this system also works really well. I guess what is really important right now is not to worry about the concept as much, but rather to let the users of the language try it out and see what they think could be better.

Also good point about pkg.rs! Changed it.

@brson
Contributor

brson commented Jan 26, 2013

@z0w0: do you have a recommendation for how to test this? is there perhaps an example set up? I don't understand from the pull request how to establish dependencies to remote crates or packages.

@steveklabnik
Member

Just a note as per the whole 'local vs remote' thing: in Ruby world, bundler has a config option that manages this: http://gembundler.com/v1.2/whats_new.html#local-git-repos

The structure is different, of course, but the idea is that on a per-project basis, you can say "Please override this with a local version instead" so that you can try out your local changes rather than the remote ones, without actually touching the file, which would cause issues when pushing it public.

That said, this might be too different from the way things work now, I haven't looked into this. Packaging is really important, though, so it's mega important that Rust get this right. :)

@z0w0
Contributor Author

z0w0 commented Jan 27, 2013

@Kimundi
Member

Kimundi commented Jan 27, 2013

How about doing build logic lookup this way:

  • If there is a top-level pkg.rs, use it
  • Else if there is exactly one top-level *.rc file, use it. If more, error with 'ambiguous package repository, blame maintainer'
  • Else if there is exactly one top-level *.rs file, use it. If more or none, error with 'ambiguous package repository, blame maintainer'

This allows three flavors of laziness in publishing a repo:

  • 'Clearly defining everything in pkg.rs' (aka 'not lazy at all')
  • 'Simply putting up a crate'
  • 'Throwing some code on github and have it magically work'

And it also allows the maintainer to 'upgrade' his crate definition without code breaking, so he can for example go from

'a giant *.rs file containing everything' over 'code neatly separated into modules' to 'I need to use custom build logic'
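The lookup order proposed in this comment could be sketched roughly as follows, in present-day Rust. The `Lookup` type and `find_build_root` function are made up for illustration, and the input is assumed to be just the file names at the top level of the repository.

```rust
/// Outcome of the lookup (hypothetical type, not part of rustpkg).
#[derive(Debug, PartialEq)]
enum Lookup {
    /// A top-level pkg.rs was found and takes priority.
    PkgScript(String),
    /// Exactly one top-level crate file was found.
    CrateFile(String),
    /// 'ambiguous package repository, blame maintainer'
    Ambiguous,
}

/// Apply the three rules in order: pkg.rs, then a single *.rc, then a single *.rs.
fn find_build_root(top_level_files: &[&str]) -> Lookup {
    if top_level_files.contains(&"pkg.rs") {
        return Lookup::PkgScript("pkg.rs".to_string());
    }
    for ext in [".rc", ".rs"] {
        let matches: Vec<&str> = top_level_files
            .iter()
            .copied()
            .filter(|f| f.ends_with(ext))
            .collect();
        match matches.len() {
            0 => continue, // nothing with this extension, try the next rule
            1 => return Lookup::CrateFile(matches[0].to_string()),
            _ => return Lookup::Ambiguous,
        }
    }
    // Neither a pkg.rs nor any crate file was found.
    Lookup::Ambiguous
}
```

Because the result depends only on the set of names and not on their order, it is unaffected by how the filesystem happens to sort directory entries.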

@z0w0
Contributor Author

z0w0 commented Jan 27, 2013

Sounds like sane logic to me. I'll add it later.

@graydon
Contributor

graydon commented Jan 27, 2013

The scenario I'm hoping to streamline is the "throw a library on github that consists of a handful of source files and nothing terribly special about building them". So if you imagine creating a library github.com/user/foo with files src/foo.rs, src/util.rs and src/tests.rs, then you could picture the thing being buildable with a search rule like:

  • If a file pkg.rs exists in the root, use it
  • if not, take the URL stem foo and look for a file called foo.rs in the repo. If there's exactly one, use it.
  • Otherwise error

Would that be stable / deterministic enough?
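For illustration, the search rule above might look something like this in present-day Rust. `find_crate_root` is a hypothetical name, and the repository is assumed to be given as a list of `/`-separated relative paths.

```rust
/// Pick the crate root for a package named by URL, e.g. `github.com/user/foo`.
fn find_crate_root<'a>(pkg_url: &str, repo_files: &[&'a str]) -> Result<&'a str, String> {
    // Rule 1: a pkg.rs in the repository root wins.
    if repo_files.contains(&"pkg.rs") {
        return Ok("pkg.rs");
    }
    // Rule 2: take the URL stem (`foo`) and look for exactly one `foo.rs`.
    let stem = pkg_url
        .trim_end_matches('/')
        .rsplit('/')
        .next()
        .unwrap_or(pkg_url);
    let target = format!("{}.rs", stem);
    let matches: Vec<&'a str> = repo_files
        .iter()
        .copied()
        .filter(|f| f.rsplit('/').next() == Some(target.as_str()))
        .collect();
    match matches.len() {
        1 => Ok(matches[0]),
        // Rule 3: otherwise error.
        0 => Err(format!("no pkg.rs and no {} found", target)),
        n => Err(format!("{} candidates named {}", n, target)),
    }
}
```

With the files `src/foo.rs`, `src/util.rs` and `src/tests.rs` from the example, this selects `src/foo.rs`; the outcome depends only on which paths exist, not on traversal order.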

The main thing is to derive as much as possible through convention or information implied in the existing structure of a pile of source files. Including the inter-package dependency: an extern mod foo = "github.com/user/foo"; directive in someone else's code should be enough for rustpkg to find and build the external library.

Anyway, as you say this is somewhat academic. I'm fine with the rest of the code. It's moving in the right direction, r+

(Also @Kimundi, keep in mind, .rc files are going away; we only maintain support for that file stem right now out of sloth / compatibility with existing stuff we haven't renamed / reorganized yet.)

@brson
Contributor

brson commented Jan 27, 2013

We still use .rc files now because that is cargo's convention for locating the crate.

@brson
Contributor

brson commented Jan 27, 2013

Here are my initial notes from using rustpkg:

The UI is very nice.

Output is now in ~/.rustpkg, not ./.rustpkg, and I don't see an option to use the local directory. Using the local directory by default is something I like about cargo. I don't want my four Rust workspaces all competing over resources under ~/.rustpkg, and I don't want to wipe out all my builds globally when something goes wrong in one project.

Trying to run hunter:

  • When I rustpkg build inside the rustpkg-test2 directory I don't know what happened to the binary.
  • I tried to rustpkg prefer hunter but 'package not found'
  • Then I rustpkg install and that did something different
  • Then rustpkg prefer hunter and finally my binary moves to ~/.rustpkg/bin

There are three commands involved here and the relationship isn't clear to me. build and install both went and did some work behind the scenes, but neither gave me a binary to run. What is 'build' for? Can 'install' give me the binary by default?

rustpkg install will 'install from the cwd' but rustpkg uninstall won't 'uninstall from the cwd'.

rustpkg uninstall rabbit caused a segfault. Now my database is locked. After removing the lock the database doesn't parse (db.json is 0KB).

@brson
Contributor

brson commented Jan 27, 2013

This design is committed to the approach of compiling a program package.rs to compile your program, but it also augments that program with some syntax extensions, apparently injecting some implicit declarations and calls into the source code.

How much does the design depend on this approach and will adding further features to the compiler require adding more syntax transformations? This sort of code can be frustrating to maintain and I don't want it getting out of control. At the least please make sure that all functions and types referenced by your generated code live in the same module: dedicate rustpkg::rt (or similar) to 'things that rustpkg-generated code might call' (it can start as just a bunch of pub use). Don't generate calls into random library code because the breakage when that code changes is hard to track down.

Also please design these syntax transformations with the intent that they all fit into a syntax extension or compiler pass. package.rs should not be 'a rustpkg file'; it is just a Rust source file.

@brson
Contributor

brson commented Jan 27, 2013

@z0w0: Can you describe exactly what transformations rustpkg does to package.rs?

@brson
Contributor

brson commented Jan 27, 2013

@z0w0: How do you propose to test rustpkg? Cargo had very few tests and occasionally broke without anybody noticing.

@z0w0
Contributor Author

z0w0 commented Jan 28, 2013

The transformations it does are as follows:

  1. Generate a __pkg module
  2. Add a function listeners that returns all the rustpkg::Listeners in the package script
  3. Add a main function that is called when the program starts, it runs rustpkg::run(listeners());
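As a rough picture of the shape of the result, here is a sketch in present-day Rust rather than the 2013 dialect. `Listener` and `run` below are hypothetical stand-ins for `rustpkg::Listener` and `rustpkg::run`, not the real definitions. As simplifications for the sketch, the command name is passed in explicitly, callbacks return their output instead of printing it, and the generated entry point is placed inside `__pkg`.

```rust
/// Stand-in for `rustpkg::Listener`: a command name plus the function
/// registered for it with `#[pkg_do(<cmd>)]`.
pub struct Listener {
    pub cmd: &'static str,
    pub callback: fn() -> String,
}

/// Stand-in for `rustpkg::run`: call the listener registered for `cmd`,
/// if there is one, and hand back its output.
pub fn run(listeners: Vec<Listener>, cmd: &str) -> Option<String> {
    listeners
        .iter()
        .find(|l| l.cmd == cmd)
        .map(|l| (l.callback)())
}

// What the package author writes in the package script.
fn configure() -> String {
    "let us configure things".to_string()
}

// What the transformation might add around it.
mod __pkg {
    use super::{configure, run, Listener};

    /// Step 2: collect every listener declared in the package script.
    pub fn listeners() -> Vec<Listener> {
        vec![Listener { cmd: "configure", callback: configure }]
    }

    /// Step 3: the generated entry point hands the listeners to `run`.
    pub fn main(cmd: &str) -> Option<String> {
        run(listeners(), cmd)
    }
}
```

In this sketch, `__pkg::main("configure")` reaches the author's `configure` function, and a command with no registered listener yields `None`.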

It's really hard to test a package manager. I'd suggest testing it by having a standard set of test projects that all the commands are run over to ensure they're still working.

@brson
Contributor

brson commented Jan 28, 2013

We talked about this some on IRC but here's my reason for wanting to put everything in ./.rustpkg:

Servo is a package with dependencies on a number of other packages. I maintain 4 different servo workspaces. In each workspace I might make conflicting changes to the subpackages that don't modify the version hash. If each workspace installs these different libraries to a global store then the versioning would need to be very precise to not cause conflicts.

@z0w0
Contributor Author

z0w0 commented Jan 28, 2013

Well we don't need to put everything in .rustpkg, just the building necessities (including dependencies). I'd prefer to still install everything globally in ~/.rustpkg. It plays nicer.

@graydon
Contributor

graydon commented Jan 28, 2013

I think the scheme @z0w0 proposes will work: rustpkg build deposits in ./.rustpkg, then rustpkg install installs to ~/.rustpkg and rustpkg prefer switches among the same-named things installed in ~/.rustpkg. Then the user can set PATH=./.rustpkg/bin:~/.rustpkg/bin:$PATH and similar for LD_LIBRARY_PATH and off they go.

The key is for builds to be "purely functional" in the sense of capturing all dependencies, which means the semvers fed in from local workspaces with their own diffs from a release tag should include a "patch level" encoding the git rev of the local workspace, because they're different sources: they produce a different artifact that the user might want to switch to using. Might also need to consider the toolchain used to build as a dependency. And probably a few other things too. Pure functional package management needs to be quite careful.
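One way to picture "capturing all dependencies" is an artifact key derived from every input to the build, sketched here in present-day Rust. The `BuildInputs` fields and the `artifact_key` function are assumptions made for illustration. The standard library's `DefaultHasher` is used only to keep the sketch short; its output is not guaranteed to stay the same across Rust releases, so a real store would want a stable content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Everything that can change the built artifact (hypothetical field set).
#[derive(Hash)]
struct BuildInputs<'a> {
    pkg_id: &'a str,
    version: &'a str,
    /// Revision of the workspace the sources came from.
    source_rev: &'a str,
    /// Identifier of the compiler used for the build.
    toolchain: &'a str,
    /// Keys of the already-built dependencies.
    dep_keys: Vec<u64>,
}

/// Hash all inputs together, so that changing any one of them
/// (including the local revision) names a different artifact.
fn artifact_key(inputs: &BuildInputs<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hash(&mut hasher);
    hasher.finish()
}
```

Two workspaces at the same release version but different revisions would get different keys under this scheme, so their artifacts could coexist in one store.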

@z0w0
Contributor Author

z0w0 commented Jan 28, 2013

@graydon yeah, I think it will work out well. I'll work on that now then. I don't quite understand the second part of your comment (although I get the gist - automatically append a generated patch thing to the semver tag), so I'll get you to elaborate next time we meet on #rust.

@pcwalton
Contributor

I'm very opposed to putting stuff in ~/.rustpkg by default. This is one of the things cargo and npm do right: they install to the current directory.

Everyone is just going to reinvent something like RVM if we don't do this properly.

@catamorphism
Contributor

As a data point, Cabal (the Haskell package manager) installs stuff into ~/.cabal by default -- or it did at one point -- and lots of people strongly disliked that.

@graydon
Contributor

graydon commented Jan 28, 2013

@pcwalton there are two things being conflated here. one is using ~/.rustpkg as a cache for built artifacts identified by hash, the other is registering things by symbolic name in say ~/.rustpkg/bin/ such that a user can use a program they just installed without being in the same directory as it.

Can you elaborate on what you object to? In very precise terms. Outline the failure scenario. I can see a bunch of possible concerns and I'm not sure which you're worried about.

@pcwalton
Contributor

Hmm, well, I guess I'm worried about two things:

  1. I write a program foo which depends on version 0.1 of libbar and I have a program bar which depends on incompatible version 0.2 of libbar. Is it the case that I can have two libbar installed simultaneously? If I have version 0.1 of libbar and I execute rustpkg upgrade libbar to upgrade to 0.2, is version 0.1 removed?
  2. I've written a Rust tool mytool. Project A requires mytool 0.1 as part of a build process. Project B requires the incompatible version 0.2 of mytool. Can I get project A to use version 0.1 and project B to use version 0.2?

@graydon
Contributor

graydon commented Jan 28, 2013

Multiple version coexistence (down to the source hash) is a necessary part of anything we do here. The only question I'm wondering is whether you're concerned that ... say, the mechanisms for managing multiple versions might be so fragile that users wind up wanting to isolate their work in subdirs just to work around the package manager being buggy about making multiple versions coexist.

@brson
Contributor

brson commented Jan 28, 2013

@graydon I am surprised you think we can extend the versioning scheme down to the patch level. That is even more difficult than the current scheme, which already doesn't work reliably. Each library is going to be dynamically linked to the exact binary that it encountered at build time. We might as well not have dlls at all. I anticipate many months of pain and frequently deleting all my rust libraries when something goes wrong.

@brson
Contributor

brson commented Jan 28, 2013

@graydon To be clear, I do think you are suggesting that the built libraries of dependent packages live in ~/.rustpkg. So when I build servo, I get copies of cairo, servo-gfx, and all the libraries it depends on in ~/.rustpkg.

@pcwalton
Contributor

For development, @brson raises a good point. It sounds wrong to install development libraries for Servo in ~/.rustpkg. For the record, Servo has about two dozen libraries and will surely grow more.

@z0w0
Contributor Author

z0w0 commented Jan 29, 2013

The built binaries and dependencies will be built into .rustpkg. They
will not be uniquely named, so it works exactly like Cargo. Upon installing,
their dependencies are uniquely named and placed in ~/.rustpkg. I hope
that answers your worries. It's the best of both worlds.

For the record, I've never heard anyone complain about Cabal or Rubygems.

@graydon
Contributor

graydon commented Jan 29, 2013

We all clearly have strong feelings about this and are talking at cross purposes. Nobody wants to do anything that results in absurdity or breakage.

Let's not block this pull (which includes such pedestrian matters as "renaming cargo to rustpkg and starting a new code lineage") on the issue of version collision, considering almost nothing else in the remaining infrastructure (from library naming to build caching) exists or works properly yet.

My feeling is this is good enough to land (cargo didn't solve versions either) and we can continue to discuss. r+ from me, is anyone else too worried to let it proceed?

@z0w0
Contributor Author

z0w0 commented Jan 29, 2013

I agree with @graydon. Most of the issues voiced here involve the backend. What's important is the frontend is right because the backend can easily be changed. I will definitely work on the ~/.rustpkg and .rustpkg thing and push it. Up to you guys whether or not you want to merge it before that.

@brson
Contributor

brson commented Jan 30, 2013

@graydon No, go ahead.

@z0w0
Contributor Author

z0w0 commented Feb 3, 2013

What's the status of this, @graydon? Not to rush you or anything, just wondering if everything is still cool.

@graydon
Contributor

graydon commented Feb 4, 2013

Rebased a number of times in my workspace, keep getting pulled away before landing. There were a number of minor errors. I'm hoping to land it this week.

@graydon graydon mentioned this pull request Feb 5, 2013
@graydon
Contributor

graydon commented Feb 5, 2013

Continued over in #4799 -- closing this one. Thanks!

@graydon graydon closed this Feb 5, 2013
bors added a commit that referenced this pull request Feb 16, 2013
Taking over where #4610 left off. Much rebasing and tidying.