Format of Pipfile #46

Closed
pradyunsg opened this issue Jan 11, 2017 · 26 comments

@pradyunsg
Member

#10 organically grew into a discussion on which format should be used in Pipfile. Since that is a completely different question from whether Pipfile should be executable, I think that it deserves a dedicated issue.

You can obviously see comments there for the (brief) history on this.

@kennethreitz
Contributor

We're pretty set on the current implementation. It's good.

@pradyunsg
Member Author

pradyunsg commented Jan 11, 2017

As of now, here's what I see in the contenders for the Pipfile format:

DSL
  + Python-esque
  + Looks good and neat for nearly all use cases
  + Familiar syntax to everyone expected to use this file
  + Good support for optional arguments
  - Will require effort to support from tools
    . can be supported by pure-Python tools easily - use same library as pip
  - Involves a "new" learning curve (kinda)
  - Differences between various Python versions
    - Would need own separate parser implementation.

TOML
  + Already a part of future ecosystem - PEP518 starts the process
  + Has a (really nice) specification
  + Used by Rust community for their packaging needs
  + Has existing packages in various languages
  - Can get non Python-esque (and ugly) in complex cases

TOML with an optional executable Python Pipfile.in
  Everything in TOML
  + Easier for programmatically generating the final Pipfile
  - Another extra file; can add confusion.

Every other format has been eliminated for the same reasons as in PEP 518.


If I could do whatever I wanted, I would yank out a spec for something like YAML and ask everyone to just use it where it makes sense in their domain.

More realistically though, as @dstufft puts it:

So far every attempt I've seen at this comes down to a single trade off, Ease of Implementation for tool authors versus ease of use for end users.

TOML

To me, the trade-offs seem most "balanced" with TOML. I just like how the picture comes together with it. There's a pyproject.toml and a Pipfile. Both use the same format. Files of that format can be easily generated by some tool from some other syntax that the end-user fancies, because it's a standard format.

DSL

As I weighed up all of these options, I realized that the Python DSL is actually really nice. But there are some really bad issues with it:

  • It's a DSL

    It's a custom language. Tools will have to be written to support it. This can be largely handled if the support library pip would use is designed with support for editing the file in mind. That, it so happens, is also a valid use-case to support even for pip as --save is something we want.

    Even with this support library, this DSL would be restricted to Python-only tools. If it's deemed worth it, then we could add some standardized Intermediate Structure that the support library would produce so that non-Python tools can work with it. It'll take effort but it can be done.

  • Python is a moving target

    The thing is, Python, the language itself, is constantly evolving. The most recent major release, 3.6.0, saw the addition of multiple new syntax features. If the DSL implementation uses the AST, these features are not going to be available on older versions, and the file won't be parsable on them if it uses these features.

    This means that if the DSL needs to be consistent across Python versions, it would have to be implemented in a manner independent of the Python version used - not using the stdlib provided AST but a dedicated parser.

    A dedicated parser is a lot of work and, while being Python-like, the DSL would be completely independent from Python, the language.

At which point, this leads to the following decisions to be made:

  • How important is the consistency across Python versions?
  • If it's important, does the effort on maintaining a custom declarative DSL have justified rewards in usability?

The answers to these questions lead me to lean toward TOML.
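To make the AST concern concrete, here is a minimal sketch of what an AST-based reader could look like. It is not the actual Pipfile implementation: the directive names (source, package, dev_package) and the helper read_dsl are assumptions made up for illustration. The point is that ast.parse only accepts whatever syntax the running interpreter accepts, so the same file may parse on one Python version and fail on another.

```python
import ast

# Hypothetical DSL text; the directive names are assumptions for illustration.
PIPFILE_DSL = """
source('https://pypi.org/', verify_ssl=True)
package('requests', extras=['socks'])
package('Django', '>1.10')
dev_package('nose')
"""

ALLOWED_CALLS = {"source", "package", "dev_package"}


def read_dsl(text):
    """Collect (directive, args, kwargs) for each top-level call without executing it."""
    entries = []
    # ast.parse uses the grammar of the interpreter running this code,
    # which is where the version dependence comes from.
    for node in ast.parse(text).body:
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)):
            raise ValueError("only bare directive calls are allowed")
        call = node.value
        if not (isinstance(call.func, ast.Name) and call.func.id in ALLOWED_CALLS):
            raise ValueError("unknown directive")
        if any(kw.arg is None for kw in call.keywords):
            raise ValueError("**kwargs unpacking is not allowed")
        # literal_eval only accepts literals, so no arbitrary code runs.
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        entries.append((call.func.id, args, kwargs))
    return entries
```

Anything that is not a plain call with literal arguments is rejected, which avoids exec but still leaves the parser tied to the host interpreter's grammar.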

Honestly though, if we settle on anything other than TOML, I think it's worth discussing (on distutils-sig) whether we should then switch the syntax for pyproject.toml to the same as well. (I wrote this assuming that YAML was still on the cards; it applies to the DSL less so.)

Just to restate an important thing, this issue doesn't need to hamper the ongoing development of a prototype. It just needs to be finalized before this makes it into pip.

@pradyunsg
Member Author

pradyunsg commented Jan 11, 2017

@kennethreitz For a prototype, a POC, I don't mind the exec call in the implementation.

I don't think anyone would want an exec making it into the final thing that gets vendored into pip.

@flying-sheep

flying-sheep commented Jan 11, 2017

could you please explain what you mean by “We're pretty set on the current implementation”?

i don’t think it’s a good idea to remain with it if you mean what i assume: i don’t think one should choose the initial prototype without having it prove itself in an objective comparison.

and the most upvoted comments in the other thread – that talk about the format itself, not its executability – are the following: (i extracted the parts suggesting actual formats)

@HolgerPeters commented on 22 Nov 2016

I strongly suggest not to make Pipfile a Python file, nor a subset of Python, parsed by a specific DSL-parser […] I am convinced that Pypa should look into Rust's Cargo.toml as an inspiration. Rust's packaging infrastructure is not "right by accident", but by careful research and design. Then, either use, TOML, YAML or ConfigParser formats

👍 25

@flying-sheep commented on 21 Nov 2016

[…] For me, TOML is by far the best option, and it being little known is easily offset by maturity, and simplicity in use and scope

👍 9

@jondot commented on 22 Nov 2016

I would suggest considering the element of least surprise. What format would be least surprising both for developers and IDEs/tooling? […] If you want to welcome people in (and personally I think there's a ton of weight on a package manager for that), use a boring, omnipresent format. Use json, or yaml. If you want to keep people out, use Python or any new DSL which can be easy to build with a simple peg parser.

👍 7

@ncoghlan commented on 22 Nov 2016

There are only two real options on the table for the abstract dependency declarations: the Python-based DSL and TOML. […] As pretty as I think the Python-based DSL is, I'm struggling to see it ending well given the other factors at play […]

👍 7

(there’s also a 👍 20 comment by @domenkozar, which however addresses executability)

so of 4 comments:

Format 👍 👎 Why those scores?
TOML 3 0 Unsure if @jondot thinks TOML niche enough to recommend against, so I put him at ±0
YAML 1 @ncoghlan thinks only TOML and the PyDSL should be considered (-1), and i think it’s OK (+½)
PyDSL 0 4 all 4 comments are explicitly against it

@sirex

sirex commented Jan 11, 2017

I would add another minus against DSL. Too much flexibility can grow into something monstrous.

@dstufft
Member

dstufft commented Jan 11, 2017

Sorry that I've been kind of AWOL on Pipfile discussions, looking for and starting a new job tend to be a bit of a tax on time.

The proof of concept is likely to go forward with the Python DSL if only because @kennethreitz has already written (most?) of the code to handle it. It's important to stress that this is only a proof of concept though and part of that is going to be hashing out how the idea in general works and how pip integrates with it as well as testing any issues with whatever format we're using.

That being said, I was originally very pro Python DSL, but I've since started wavering more towards the TOML side of things. For two reasons:

  • I think the Python DSL is much nicer in complex cases like our examples tend to be, but that the TOML file tends to be more succinct in the simpler, more likely to be common cases. For an example, something that just lists some dependencies.
  • I am concerned about the question of "What is Python", as others have pointed out, the definition of what is and isn't valid Python changes between releases. We could hopefully narrow this down by heavily restricting ourselves only to syntax that is 2.x/3.x compatible and is unlikely to ever change, but the issue is still there.

@dstufft
Member

dstufft commented Jan 11, 2017

To be clear, I am not sure of the right direction yet, and hopefully experience with the PoC will give more insight.

@kennethreitz
Contributor

Can someone please paste the example from the readme in an ideal TOML format here?

@flying-sheep

flying-sheep commented Jan 11, 2017

not sure what would be ideal, as you had time to iterate.

here’s an ad-hoc attempt. everyone feel free to improve it.

[[source]]
url = 'https://pypi.org/'
verify_ssl = true

[requirements]
python = '2.7'

[packages]
requests = { extras = ['socks'] }
Django = '>1.10'
pinax = { git = 'git://github.com/pinax/pinax.git', ref = '1.4', editable = true }

[dev-packages]
nose = '*'

alternatively you can expand inline tables

[packages.pinax]
git = 'git://github.com/pinax/pinax.git'
ref = '1.4'
editable = true

also possible: quoted names or named sources:

[[source]]
'https://pypi.org/' = { verify_ssl = true }
[source.pypi]
# url for PyPI is known
verify_ssl = true

i follow cargo here, which means that using a string as the value is a shortcut for { version = 'the-string' }.
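As a rough sketch of how a tool might apply that shortcut after parsing, assuming the [packages] table has already been loaded into a plain dict by some TOML library (the helper name normalize is hypothetical):

```python
def normalize(packages):
    """Expand the string shorthand into the full table form."""
    return {
        # A bare string is treated as shorthand for {"version": <string>}.
        name: {"version": spec} if isinstance(spec, str) else dict(spec)
        for name, spec in packages.items()
    }


# What a TOML parser would plausibly return for part of the [packages] table.
packages = {
    "requests": {"extras": ["socks"]},
    "Django": ">1.10",
}
normalized = normalize(packages)
```

After this step every entry is a table, so later code only has to handle one shape.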

@ncoghlan
Member

ncoghlan commented Jan 12, 2017

As @flying-sheep notes, any chosen TOML syntax would need iteration to figure out a table structure that gave nice readable files in common cases, but these are updated versions of a couple of draft examples I came up with in #10 (comment):

[Sources]
pypi = "https://pypi.org"
internal = {url="https://internal/", verify_ssl=false}

[Packages]
django = ">= 1.10.1"
dateutils = {}
mything = {group="development", source="internal"}

[Packages.myotherthing]
group="development"
source="internal"

And showing some nesting examples:

[Groups.development.Packages]
# group="development" implied by nesting
mything = {source="internal"}

[Sources.internal.Packages]
# source="internal" implied by nesting
mything = {group="development"}

[Groups.development.Sources.internal.Packages]
# source="internal" implied by nesting
# group="development" implied by nesting
mything = {}

The convention I've adopted there is that the object categories defined by the file format (Groups, Sources, Packages) start with a capital letter (similar to Python classes), while the user-supplied values (group names, source names, package names) and the component attributes (group, source, url, verify_ssl) are conventionally all lowercase.

Without some kind of convention along those lines, nested examples like the last one become really hard to read, as the category names don't stand out from the names of the specific instance within that category and the whole line dissolves into a kind of alphabet soup.

Relative to the DSL, the main readability feature lost in going to TOML is breaking the link between visual nesting (indentation) and structural nesting (dotted table names). However, losing that is also the main benefit from a tool development perspective, since it removes the sensitivity to leading whitespace, as well as much of the context dependence of attribute assignments (only the immediately preceding table header, if any, matters).
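To illustrate how the "implied by nesting" rule could be resolved by a tool, here is a small sketch that walks an already-parsed document (plain nested dicts) and attaches the implied group and source to each package. The function name flatten and the treatment of a bare string as a version specifier are assumptions for illustration, not part of any agreed format.

```python
def flatten(table, group=None, source=None):
    """Yield (package_name, settings) with group/source implied by nesting."""
    for name, spec in table.get("Packages", {}).items():
        # Assumption: a bare string is a version specifier.
        settings = {"version": spec} if isinstance(spec, str) else dict(spec)
        # Explicit attributes win; nesting only fills in what is missing.
        if group is not None:
            settings.setdefault("group", group)
        if source is not None:
            settings.setdefault("source", source)
        yield name, settings
    for group_name, sub in table.get("Groups", {}).items():
        yield from flatten(sub, group=group_name, source=source)
    for source_name, sub in table.get("Sources", {}).items():
        if isinstance(sub, dict):  # skip sources given as a plain URL string
            yield from flatten(sub, group=group, source=source_name)


# Equivalent of [Groups.development.Sources.internal.Packages] with mything = {}
parsed = {
    "Groups": {
        "development": {"Sources": {"internal": {"Packages": {"mything": {}}}}}
    }
}
resolved = dict(flatten(parsed))
```

Because only the table path matters, the walker needs no knowledge of indentation or of the order in which tables appear in the file.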

Folks that aren't using automated tools to manipulate their TOML files could optionally inject visual structure via comments if they chose to do so:

# WITH group = development
        # FROM source = internal
                [Groups.development.Sources.internal.Packages]
                mything = {}

As with anything based on comments though, the only defence against the comments and the actual configuration getting out of sync would be code review (and potentially a commenting-convention-aware linter)

@pradyunsg
Member Author

could you please explain what you mean by “We're pretty set on the current implementation”?

I took it to mean that in this PoC version, there is no need to change the format. And with the current format, we can experiment with the PyDSL, which can't hurt.

@sirex

sirex commented Jan 12, 2017

I took some time and tried to assemble an ideal TOML pipfile example.

[sources]
pypi = "https://pypi.org"
legacy = {url="http://pypi.python.org/", verify_ssl=false}

# "main" is the default group.
[packages.main]
django = "~= 1.10"

[packages.py27]
django = {version="~= 1.9", source="legacy"}

[packages.py36]
django = "~= 1.11"

[packages.pypy]
django = "~= 1.8"

[packages.linux]
linux-specific-pkg = "1.0"

# Group parameters.
[groups]
py27 = {version="~= 2.7"}
py36 = {version="~= 3.6"}
pypy = {python="pypy"}

# Pinned versions, updated automatically by pip.
[versions.main]
django = "1.10.0"

[versions.py27]
django = "1.10.0"

[versions.py36]
django = "1.11.1"

[versions.pypy]
django = "1.9.7"

All version specifiers here follow PEP 440.

All packages should be defined in [packages.<group>] section, where main is the default group.

Group parameters can be defined in [groups] section.

In addition, I think it is a good idea to save pinned versions in the same single TOML file. This would work by identifying [versions.<group>] sections and updating values in an automated way. Automated pinned version updates should leave all other sections, with their comments, untouched.

[groups] section parameters would have these options: version (a Python version specifier) and python (a Python implementation: cpython, pypy, ...).

I looked at several example projects and wrote TOML files for them:

Sanic

https://github.com/channelcat/sanic

[packages.main]
uvloop = ">= 0.5.3"
httptools = ">= 0.0.9"
ujson = ">= 1.35"
aiofiles = ">= 0.3.0"
multidict = ">= 2.0"

[packages.dev]
httptools = ""
ujson = ""
uvloop = ""
aiohttp = ""
aiocache = ""
pytest = ""
coverage = ""
tox = ""
gunicorn = ""
bottle = ""
kyoukai = ""
falcon = ""
tornado = ""
aiofiles = ""

Ansible

https://github.com/ansible/ansible

[packages.main]
paramiko = ""
jinja2 = ""
PyYAML = ""
setuptools = ""
pycrypto = ">= 2.6"

[groups]
main = {version="~= 2.7"}

Pandas

https://github.com/pandas-dev/pandas

[packages.main]
python-dateutil = ""
pytz = ">= 2011k"
numpy = ">= 1.7.0"


[packages.py3]
python-dateutil = ">= 2"

[groups]
py3 = {version="~= 3"}

Fabric

https://github.com/fabric/fabric

[packages.main]
paramiko = ">= 1.10, <3.0"

[packages.py26]
paramiko = ">= 1.10, <1.13"

[groups]
py26 = {version="< 2.6"}

@kennethreitz
Contributor

kennethreitz commented Jan 12, 2017

there are again only two groups at this time (a-la composer): default and development.

@kennethreitz
Contributor

I am not a fan of the TOML syntax presented so far — I think it's far from intuitive, and would require someone to copy/paste from an example every time they go to use it.

@kennethreitz
Contributor

kennethreitz commented Jan 12, 2017

This is the best example I've seen so far, and what I asked for:

[[source]]
url = 'https://pypi.org/'
verify_ssl = true

[requires]
python = '2.7'

[packages]
requests = { extras = ['socks'] }
Django = '>1.10'
pinax = { git = 'git://github.com/pinax/pinax.git', ref = '1.4', editable = true }

[dev-packages]
nose = '*'

I think that is doable.

@kennethreitz
Contributor

kennethreitz commented Jan 12, 2017

now the question is, how, in this example, would i specify 'requests' without it being equal to anything?

requests = '*'

?

@kennethreitz
Contributor

I see, = '*' is appropriate. Okay, I will work on a branch that uses this functionality.
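For illustration, this is roughly how a tool might turn such entries into pip-style requirement strings, treating '*' as "no version constraint". It is only a sketch under stated assumptions: the helper name to_requirement is hypothetical, entries are assumed to be already parsed into Python values, and VCS entries (git, ref, editable) are deliberately not handled.

```python
def to_requirement(name, spec):
    """Build a requirement string from a package entry; '*' means any version."""
    if isinstance(spec, str):
        spec = {"version": spec}
    extras = spec.get("extras", [])
    version = spec.get("version", "*")
    requirement = name
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    if version != "*":
        # Anything other than '*' is passed through as a version specifier.
        requirement += version
    return requirement


requirements = [
    to_requirement("requests", {"extras": ["socks"]}),
    to_requirement("Django", ">1.10"),
    to_requirement("nose", "*"),
]
```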

kennethreitz added a commit that referenced this issue Jan 12, 2017
kennethreitz added a commit that referenced this issue Jan 12, 2017
@kennethreitz
Contributor

Got the basics working in the new toml branch/PR. Check out the new README for details:

https://github.com/pypa/pipfile/blob/4230ddf91f25e5ef33eed88d12f0bca672818214/README.rst

@kennethreitz
Contributor

Merged the TOML branch.

kennethreitz added a commit that referenced this issue Jan 12, 2017
* basics of TOML working

#46

* update readme

#46

* cleanup parser

* cleanup

* TODO
@kennethreitz
Contributor

Pipfile will now use TOML. I like it.

@kennethreitz
Contributor

Check out the main README now for details.

@FFX01

FFX01 commented Jan 12, 2017

I'm happy with the way this worked out. Looking at the example in the README, I feel that it is very intuitive and concise. I think this format will be simple for beginners to grasp and remember.

@sirex

sirex commented Jan 12, 2017

@kennethreitz in your version, the [requires] section is global for all package groups. But it is quite a common case that you want a different set of packages for different Python versions.

Here is a slightly modified version of your example that addresses this case:

[[source]]
url = 'https://pypi.org/'
verify_ssl = true

[requires]
python = '~= 2.7, ~3'

[packages]
requests = { extras = ['socks'] }
Django = '>1.10'
pinax = { git = 'git://github.com/pinax/pinax.git', ref = '1.4', editable = true }
pathlib = { python = '>= 2.7, < 3.5' }

[dev-packages]
nose = '*'

In this case, pathlib will be installed only for python 2.7..3.4. Python 3.5 and later have pathlib built-in.
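One possible way to implement such a per-package python key is to translate it into an environment marker of the kind PEP 508 defines. The sketch below is only an assumption about how that mapping could work (the helper name python_marker is hypothetical, and only simple comparison clauses are supported):

```python
import re


def python_marker(specifier):
    """Translate e.g. '>= 2.7, < 3.5' into a python_version marker expression."""
    clauses = []
    for part in specifier.split(","):
        # Two-character operators are listed first so '>=' is not read as '>'.
        match = re.fullmatch(r"\s*(>=|<=|==|!=|<|>)\s*([\w.]+)\s*", part)
        if match is None:
            raise ValueError(f"unsupported clause: {part!r}")
        operator, version = match.groups()
        clauses.append(f'python_version {operator} "{version}"')
    # Comma-separated clauses must all hold, so they are joined with 'and'.
    return " and ".join(clauses)


marker = python_marker(">= 2.7, < 3.5")
requirement = f"pathlib; {marker}"
```

With the values from the example above, the resulting requirement restricts pathlib to interpreters from 2.7 up to, but not including, 3.5.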

And if section names were defined using TOML dotted table syntax:

[packages.default]

[packages.dev]

Then exactly the same schema can be used for the freeze file too. There would be no need to support two different schemas.

@sirex

sirex commented Jan 12, 2017

Packages without version can be specified like this:

nose = {}

@ncoghlan
Member

@flying-sheep @kennethreitz Good call on using "-packages" as a suffix to denote alternative package groups, as well as starting small with just the single predefined "dev-packages".

@pradyunsg
Member Author

pradyunsg commented Jan 14, 2017

@flying-sheep @kennethreitz Good call on using "-packages" as a suffix to denote alternative package groups, as well as starting small with just the single predefined "dev-packages".

I agree. But if support for custom groups is added, something like what @sirex says will be the way to go; right?


7 participants