Try to figure out the markup type of extensionless README files #3

avar opened this Issue Feb 26, 2010 · 26 comments



avar commented Feb 26, 2010

It's somewhat common, especially for Perl projects, to have POD markup in a README file without a .pod extension; here's one example of this.

The simplest way to support this would be to shell out to file(1) for README files:

v rakudo (master) $ file README 
README: Perl POD document text
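As a rough illustration of the file(1) idea, here is a minimal sketch in Python. It is not how GitHub does anything; the function names and the hint table are hypothetical, and only the POD description string comes from the output shown above.

```python
import subprocess

# Hypothetical mapping from substrings of file(1) descriptions to markup
# names. Only the POD entry is taken from the output above; anything else
# would have to be added after checking what file(1) actually reports.
FILE_HINTS = {
    "Perl POD document": "pod",
}


def markup_from_description(description):
    """Return a markup name for a file(1) description, or None if unknown."""
    for hint, markup in FILE_HINTS.items():
        if hint in description:
            return markup
    return None


def guess_readme_markup(path="README"):
    """Shell out to file(1) and map its description to a markup name."""
    # --brief drops the leading "README: " prefix from the output.
    result = subprocess.run(
        ["file", "--brief", path],
        capture_output=True, text=True, check=True,
    )
    return markup_from_description(result.stdout.strip())
```

Keeping the mapping separate from the subprocess call means an unrecognised description simply yields None, so the README would fall back to plain text.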

Ciantic commented Mar 5, 2010

Instead of figuring it out, add a select box where one can choose it, perhaps in the project's Admin panel.

  1. Auto guess the markup from extension, set the select option in Admin panel to this default guessed value.
  2. By having this select box, one can override the guessed value if it turns out badly.

Here is one example of why valid files turn out badly:

Some rST files require additional directives etc. which are not defined within GitHub, and so are rendered very badly; see e.g. my Sphinkydoc README: perfectly valid Sphinx documentation rST cannot be rendered in GitHub. So I have the option of removing the extension to render it as plain text. But that is a crude option; there should be a way to override it even if my file were named README.rst.

kneath commented Jul 25, 2010

I am almost certain we will not add a select box. This clutters settings and removes portability (forks, cloning & re-pushing, etc) and gains nothing over just adding the correct extension.

If there is something wrong with the rST parser, please feel free to have a go at fixing it. I don't think any of us regularly author documents using rST, so it's hard for us to know what quirks our implementation has.

avar commented Jul 25, 2010

Yeah, a select box is the wrong way to go, for the complexity and unportability reasons you mention.

why not just ln -s README.ext README? that way you have a default "README", but also a file that explicitly says what the extension is.
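To make the symlink idea concrete: a tool that sees an extensionless README could follow the link and read the format off the target's extension. The following is only a sketch under that assumption; the function name and the extension table are hypothetical.

```python
import os

# Hypothetical extension-to-markup table, for illustration only.
EXTENSIONS = {
    ".markdown": "markdown",
    ".md": "markdown",
    ".rst": "rst",
    ".pod": "pod",
}


def markup_via_symlink(path):
    """If path is a symlink, infer the markup from its target's extension."""
    if not os.path.islink(path):
        return None
    target = os.readlink(path)  # e.g. "README.markdown"
    _, ext = os.path.splitext(target)
    return EXTENSIONS.get(ext.lower())
```

A regular file, or a link to a target with an unknown extension, gives None, so nothing changes for projects that don't use the trick.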

mgorny commented Aug 22, 2010

How about looking for a vi-like modeline at the end of the file?

For an example:

@infinity0: I tried doing that; github is ignoring the symlink. E.G. I created a file named README.txt containing markdown text, symlinked to it with ln -s README.txt README.markdown, then did git add README* && git commit -m "blahblah" && git push github --all. Afterwards that branch was displaying README.txt.

I also tried symlinking from README.txt to somefile, with no other README* files in the project root; this causes github to think there is no README file. I guess they're not following symlinks, maybe for security reasons.

obviously if you have README.ext1 and README.ext2 github will get confused.

this thread is about extensionless README files, and my comment was about that. what you tried has nothing to do with what i was talking about. ln -s README.ext README works fine.

@infinity0: As I understand it, the markup hooks will choose e.g. README.markdown, or README.rst over README.txt or README; this would seem to be necessary in order to enable projects to give their documentation file a standard name. If you want to confirm this, you can check out the test repo I made. README and README.txt are, I think, equivalent in the eyes of the github markup system.

It looks like the README markup detection just ignores a file if it's a symlink: check the test repo's branch with a symlinked README.markdown.

So if README.markdown is itself a symlink, to README.txt or README, it won't get detected, and github will just display the README/README.txt on your repo page. I realize (now) that this wasn't the point of your original comment; you were suggesting the opposite approach of putting markdown in README.markdown and symlinking to it from README.
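The selection behaviour described above, as observed from the test repo and not taken from GitHub's source, could be sketched roughly like this. The function name and the set of markup extensions are assumptions made for illustration.

```python
import os

# Assumed set of extensions that count as "markup"; hypothetical.
MARKUP_EXTENSIONS = {".markdown", ".md", ".rst", ".pod"}


def pick_readme(entries):
    """Pick the README to display from (name, is_symlink) pairs.

    Mirrors the observed behaviour: symlinks are ignored, and a README with
    a markup extension wins over README or README.txt.
    """
    candidates = [
        name for name, is_symlink in entries
        if not is_symlink and os.path.splitext(name)[0] == "README"
    ]
    # False sorts before True, so files with a markup extension come first.
    candidates.sort(
        key=lambda n: os.path.splitext(n)[1].lower() not in MARKUP_EXTENSIONS
    )
    return candidates[0] if candidates else None
```

With a real README.txt and a symlinked README.markdown this returns README.txt, and with only a symlinked README.txt it returns None, matching the two experiments described earlier.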

This works pretty well, except that since not all systems support symlinks, it's better to have the actual file be the one with the standard name, which may be required by packaging tools, git-aware applications, or other entities not intelligent enough to just read the README.markdown file instead.

It also means that even if your system does support symlinks, if your project's packaging system needs to have README or README.txt as an actual file, you have to add a build step to make that happen.

Also, because of the way that the github markup system ignores symlinked files, this approach actually works by hiding the extensionless README file from it, and is therefore not related to the topic of this thread.

um, i'm not sure what you mean by a "standard" name, as applied to readme files with many possible extensions. and why would you ever want to symlink README.markdown to README.txt? isn't that completely confusing?

OTOH, you might want to offer your readmes in different formats, then symlink the plain README file to the default format.

i think you're way overrating the symlink problem. all modern systems work fine with symlinks, even NTFS. if a packaging system really requires something be an actual file rather than a symlink, that would normally be treated as a bug, except in highly specialised and unusual situations.

avar commented Sep 22, 2010

As the original reporter I just want to point out that this entire symlink discussion is completely off-topic for this bug. For that I suggest this incoherent bug report filed yesterday.

This bug is specifically about how GitHub should render extensionless README files. There are a lot of projects that use this style.

Most free software projects don't use a README file with an extension, and those projects usually aren't willing to add one just to appease a platform like GitHub, especially since fixing that should be trivial.

I'd offer to do it, but as far as I can see that part of the source isn't available.

This discussion of symlinks is really a subtopic of a subtopic, since the thread earlier diverged (in the first comment) into a discussion of the problems inherent to automated detection of filetype, and a practical way to avoid and/or override the detected type.

I understood that @infinity0 brought up the topic of symlinks because symlinking to or from a file with a format-specific extension is effectively a way to indicate the format of an extensionless filename or one with a generic extension like .txt.

If I recall correctly, Windows Explorer won't, by default, open extensionless files, and for many users it won't know what to do with a file with a .markdown or .rst extension. However, it will open .txt files in Notepad or an equivalent editor. Since markdown, rST, and other text-based formats are text files, it makes sense to put structured text into a file with a .txt extension. Indeed, that's a big part of why markdown et al are useful.

So someone might want to symlink from a file with a .markdown extension as a way to indicate to both humans and automated systems like the github markup processor that the README.txt file is actually in markdown format. This is better than the converse because the symlink isn't guaranteed to work cross-platform. So this way the greatest common denominator (windows users without decent text editors installed) can still open the .txt file smoothly, but users and systems which prefer markdown and have symlink support can readily discover and access the markdown format. Meanwhile the maintainer of the project doesn't have to deal with the confusion of keeping the same information in two files.

@avar: Since the discussion of symlinks is fairly tangential at this point, I'm glad you've cross-linked to the allegedly incoherent bug report that you evidently managed to understand at least the basic idea of. I also tried to collect various alternative ways to indicate the file format of a README without having to duplicate it as a second file.

A lack of support for symlinks under Windows is the underlying reason for this issue with msysgit, entitled "Preservation of Symbolic links in windows", being flagged "Won't Fix".

kneath commented Sep 22, 2010

Most free software projects don't use a README file with an extension, and those projects usually aren't willing to add one just to appease a platform like GitHub, especially since fixing that should be trivial.

Maybe most free software projects should start using an extension. Extensions are universally accepted as a way to distinguish file types. Why is there so much backlash against this? I end ruby files with .rb, php files with .php, perl files with .pl, jpeg files with .jpg, word docs with .doc, text files with .txt... why should READMEs break this long-lived tradition?

There is almost no chance we will be auto-detecting file-types based on the content. But we're looking into adding support for symlinks, so that issue should fix a lot of people's comments in this thread (ones that keep readme in different locations not at project root)

avar commented Sep 22, 2010

Why is there so much backlash against this? [...] why should READMEs break this long-lived tradition?

Calling it a backlash implies that people are actively refusing to add file extensions. That isn't the case; just as executable script files usually have an extension, README files usually don't.

why should READMEs break this long-lived tradition?

Most projects have files called README, COPYING, INSTALL, not README.txt, COPYING.txt or INSTALL.txt. Maybe not new projects on GitHub, but existing projects like perl and even git itself.

There's simply a different convention for these files than script files like .pl and .rb. These conventions have even made it into some coding standards.

There is almost no chance we will be auto-detecting file-types based on the content.

Fine, GitHub can do what it wants. But there will always be projects that use extensionless README files, and sometimes these files use one of the incidental markup formats like POD or Markdown.

In those cases auto-detecting the format would give these projects a much better GitHub landing page, since they'd have clickable links and markup.

mgorny commented Sep 23, 2010

Extensions are universally accepted as a way to distinguish file types.

Yeah, in the early 90s.

I end ruby files with .rb, php files with .php, perl files with .pl,

And nothing's gonna happen if ruby and perl scripts won't start with shebang, or php scripts won't use <?php.

why should READMEs break this long-lived tradition?

Because they follow an even earlier tradition which you Windows kids have never heard of. A README file is supposed to be named README, not README.whatever.

The only reason I care is that markups like Markdown support writing docs which are quite readable unparsed too. Users of my program don't care what markup github uses, or what wannabe-extension it wants for it; they just want to have that README file like most software does.

There is almost no chance we will be auto-detecting file-types based on the content.

I don't care about auto-detecting. I'd be happy to see github supporting the most common modelines and looking them up for syntax declarations. Just like Python looks for encoding declarations (PEP-0263). Or, maybe Python should use your awesome extensions too?
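For illustration, modeline lookup could be as small as the sketch below. It assumes a vim-style modeline such as "vim: set ft=markdown:" somewhere in the last five lines; the pattern is deliberately simpler than vim's own rules, and the function name and line count are hypothetical.

```python
import re

# Matches vim-style modelines such as "vim: set ft=markdown:" or
# "vi: filetype=rst". Simplified on purpose; vim accepts more variants.
MODELINE = re.compile(r"(?:^|\s)(?:vi|vim|ex):.*?\b(?:ft|filetype)=(\w+)")


def markup_from_modeline(text, lines_to_check=5):
    """Return the filetype named in a modeline near the end of text, if any."""
    for line in text.splitlines()[-lines_to_check:]:
        match = MODELINE.search(line)
        if match:
            return match.group(1)
    return None
```

Only the tail of the file is inspected, so a modeline-looking string in the body of a README is not picked up by accident.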

But we're looking into adding support for symlinks, so that issue should fix a lot of people's comments in this thread (ones that keep readme in different locations not at project root)

And once again you're requiring github users to commit fancy hacks which have nothing to do with the real code.

Something that's important to keep in mind is that anything that github does in this regard needs to be very careful not to get in the way. Any false positive is going to potentially be a major annoyance for someone who doesn't know and doesn't care about this system at all, and might potentially go unnoticed by repo maintainers for some time, making their project homepages look pretty crappy in the meantime and making them quite unsatisfied clients upon its discovery.

The nice thing about using symlinks is that there's no chance that some file will just happen to contain the magical bits of text that make github think it's something that it's not, and no chance that someone's repo homepage will fall victim to a tricky bug in the detection code. So this ends up being less of a hassle for github staff and for potentially numerous users. Symlinks also have the advantages of living in the repo, as opposed to as metadata in github's DB, and actually doing something useful outside of the context of github's markup-rendering system. Plus it's pretty obvious how they work, whereas with something like modelines people would have to learn some fairly obscure microformat. I've been coding exclusively in vim for a couple of years; I've almost never used, and rarely seen, a modeline.

The only downside to symlinks that I can think of is a lack of universal support for them. But since this is just a limitation of the usefulness of the proposed solution, as opposed to a potential problem it would deliver upon unsuspecting users, it doesn't seem like much of an argument against it. Also you'd pretty much just have to get somebody running another OS to commit the symlink once and then just not commit any changes to that file.

I guess that for people who are bothered by the presence of a second filename containing the same info, that's a downside. I think compared to the uncertainty of whether, for example, your rST file is going to be detected as markdown, both now and in future commits, this is not likely to be a big issue for people. Especially considering that, again, the worst-case scenario with symlinks is that you just don't get a fancy-looking repo page.

lol i don't understand how there has been so much discussion on this simple issue in the space of 24 hours. the solution i gave in post 4 works fine, works right now on github, doesn't interfere with existing traditions, and is really simple to do...

avar commented Sep 23, 2010

We're aware of the workaround in post #4. But a workaround is not a solution as I and mgorny have pointed out.

As has already been pointed out, some systems don't support symlinks, and it's important to have a standard filename (i.e. those used by "existing traditions") available as a regular file. If you want to contribute to the thread, please take the time to actually read what others have written, rather than just laughing at the volume of it.

@avar: sorry, but i can't see any point (from you or mgorny) about why my method is just a "workaround" rather than a "solution". maybe i am missing something, in which case please quote from one of the previous posts.

@intuited: i already replied to those points. it looks like you should follow your own advice. also, lol, did my post really offend you that much that you feel the need to reprimand me for typing "lol"? grow some skin.

as for the point at hand, the only significant reason raised (indirectly) so far for disliking my method is that it won't work on windows XP. i don't see it as that much of a problem, because (-) README is a *nix tradition, most of the time it's README.txt on windows (-) windows XP is on the way out (-) the symlink will be created as a regular file with the contents "README.ext", and it's not exactly rocket science to work out what this is supposed to mean.

kneath commented Sep 23, 2010

This is way out of control. We are aware of the issue, and we may or may not provide solutions for it. We shipped symlinked readmes yesterday — no further discussions are necessary (no matter how big your beard is).

@infinity0: A few posts back I explained my objections this way: "[By symlinking to README.txt,] the greatest common denominator (windows users without decent text editors installed) can still open the .txt file smoothly, but users and systems which prefer markdown and have symlink support can readily discover and access the markdown format." This was the response I was referring to.

I don't think it's safe to assume that a windows user is going to get a copy of a symlink. If I roll up a zip of my project, and somebody unzips it on Windows, what happens to a symlink README? Presumably it depends on the archiver that I use and the unzipper they use. Anyway the point is that I don't want to have to research this or take your word for it, I just want to let github, and some relatively savvy end-users, know that my README.txt file is in markdown format with the fewest possible potential complications.

It's really not a huge deal but it is important to have the standard documentation file be available in the standard way, regardless of what OS people are using and how they got the file. It's a lot easier to make github's markup formatter check for symlinks than it is to do a comprehensive survey of all the possible permutations of operating system (including iPhoneOS, Symbian, etc.), git client (including hg2git, msysgit, various pure-$LANGUAGE implementations etc.), (un-)packaging tools (assuming you're not rewriting the file as part of the package build procedure), etc. to make sure that people are actually going to get the README file if it's in a symlink.

Assuming that the symlink README doesn't come through, probably a human is going to find and open the markdown file (barring, again, uninformed windows users). However, an automated system isn't necessarily going to be that smart. For example: if starts up, and they do the same README markup processing thing as github, with the same no-symlinks limitation, except that they don't look for README.markdown, your project's readme doesn't get displayed there. It's a what-if, but there's pretty much an infinite number of what-ifs, so it makes sense to implement the snazzy extra, rather than the core functionality, using the unreliable feature.

@kneath: I guess I tripped over my metaphorical beard there; I started writing that post before you posted yours. Glad to hear that symlink support is online, thanks.

mgorny commented Sep 24, 2010

Symlinks are a workaround, not a solution. They require the repository to implement fancy hacks for code hosting, and even if they are just links, they are in effect equivalent to keeping two copies of the same file.

I don't want my users thinking 'why the heck is this file symlinked to the README?' as I thought once 'why the heck does this README file end with .md? Is that README for some "md" component?' Yep, appending '.something' to a doc is quite popular whenever the doc describes some other code part.

I still think modeline support would be the best possible solution here. And AFAICS bitbucket does something similar; never heard anyone having problems because of it.

kneath commented Sep 24, 2010

If I need to delete this issue, I will.

mgorny commented Sep 24, 2010

That's what you call a professional approach? Delete a feature request because you don't like it?

weiss commented Nov 18, 2013

It seems the symlink workaround stopped working recently? I.e., if is a symlink that points to README, GitHub no longer renders the README contents as Markdown?

mhozza pushed a commit to mhozza/biopython that referenced this issue Feb 6, 2014

gabipurcaru referenced this issue in martanne/vis Jan 2, 2015


Fixed README #2

This issue was closed.
