Skip to content
This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

About commiting HTML files #159

Closed
rgaiacs opened this issue Feb 5, 2015 · 21 comments
Closed

About commiting HTML files #159

rgaiacs opened this issue Feb 5, 2015 · 21 comments

Comments

@rgaiacs
Copy link
Contributor

rgaiacs commented Feb 5, 2015

The discussion of this issue happened at our mailing list, see http://lists.software-carpentry.org/pipermail/discuss_lists.software-carpentry.org/2015-February/002573.html.

@jiffyclub asked

what's the protocol for generating the HTML pages? Do we ask people making PRs to do it, or does a maintainer do that periodically?

@gvwilson Where can we write the answer for this question? README.md?

@wking
Copy link
Contributor

wking commented Feb 5, 2015

On Thu, Feb 05, 2015 at 01:19:17PM -0800, Raniere Silva wrote:

what's the protocol for generating the HTML pages? Do we ask
people making PRs to do it, or does a maintainer do that
periodically?

@gvwilson Where can we write the answer for this question?
README.md?

This is information for contributors, not all users, so I'd recommend
CONTRIBUTING.md.

@gvwilson
Copy link
Contributor

gvwilson commented Feb 5, 2015 via email

@karinlag
Copy link

karinlag commented Feb 6, 2015

On 05/02/15 22:31, Greg Wilson wrote:

I think CONTRIBUTING.md is the right place.

I have been thinking about this, since I sat down last weekend to work
on the sql material and suddenly discovered that I had no clue what to
do about the html files.

I think it has to be up front straight away somehow, even if that just
means that the repo has a heading somewhere at the top where it points
to where you can figure things out. A contributor should not have to
look (too) hard to figure out what to do.

karin

@wking
Copy link
Contributor

wking commented Feb 6, 2015

On Fri, Feb 06, 2015 at 08:19:30AM -0800, karinlag wrote:

I think it has to be up front straight away somehow, even if that
just means that the repo has a heading somewhere at the top where it
points to where you can figure things out. A contributor should not
have to look (too) hard to figure out what to do.

GitHub posts a nice, highlighted link to CONTRIBUTING.md at the top
whenever you open an issue or pull request 1. A link from the
README would be good to (“For information about contributing to this
project, see CONTRIBUTING.md.”).

@gvwilson
Copy link
Contributor

gvwilson commented Feb 6, 2015

GitHub posts a nice, highlighted link to CONTRIBUTING.md at the top
whenever you open an issue or pull request [1].

Yup - but if people aren't seeing it, they aren't seeing it, so we have
to help.

A link from the
README would be good to (“For information about contributing to this
project, see CONTRIBUTING.md.”).
+1.

G

@andreww
Copy link

andreww commented Feb 7, 2015

For the record, @ChristinaLK has a version of what's needed in the lessons at swcarpentry/shell-novice#68. Once that's settled down I guess it should be copied here and and made into a non-specific template (with a note somewhere in the instructions for creating a new lesson saying that the file will need updating).

Where should instructions for contributing to this repository go?

@karinlag
Copy link

karinlag commented Feb 8, 2015

On 06/02/15 18:50, Greg Wilson wrote:

GitHub posts a nice, highlighted link to CONTRIBUTING.md at the top
whenever you open an issue or pull request [1].

Yup - but if people aren't seeing it, they aren't seeing it, so we have
to help.

This is after you have opened an issue or a pull request. You have to
be fairly brave/knowledgeable to just get to that point.

A link from the
README would be good to (“For information about contributing to this
project, see CONTRIBUTING.md.”).
+1.

If you see right away that all you have to do is to fix the md, it
lowers the bar a lot IMO. +1 to having it in the readme.

K

@gvwilson
Copy link
Contributor

gvwilson commented Feb 8, 2015 via email

@wking
Copy link
Contributor

wking commented Feb 8, 2015

On Sun, Feb 08, 2015 at 04:17:57AM -0800, karinlag wrote:

06/02/15 18:50, Greg Wilson:

GitHub posts a nice, highlighted link to CONTRIBUTING.md at the
top whenever you open an issue or pull request 1.

Yup - but if people aren't seeing it, they aren't seeing it, so we
have to help.

This is after you have opened an issue or a pull request. You have
to be fairly brave/knowledgeable to just get to that point.

Then maybe we need to be more welcoming, since “there's a typo on page
$X, let me create an issue” shouldn't require much bravery or
knowledge beyond 1. Maybe the README text should be:

We welcome contributions of any kind. For more information see
CONTRIBUTING.md.

@karinlag
Copy link

karinlag commented Feb 9, 2015

On 08/02/15 17:32, W. Trevor King wrote:

Then maybe we need to be more welcoming, since “there's a typo on page
$X, let me create an issue” shouldn't require much bravery or
knowledge beyond 1. Maybe the README text should be:

We welcome contributions of any kind. For more information see
CONTRIBUTING.md.

Hm, I might not have dug enough, but I could not find a template README
to add this to. Should we have one?

K

@wking
Copy link
Contributor

wking commented Feb 9, 2015

On Mon, Feb 09, 2015 at 11:28:34AM -0800, karinlag wrote:

08/02/15 17:32, W. Trevor King:

… Maybe the README text should be:

We welcome contributions of any kind. For more information see
CONTRIBUTING.md.

1: https://help.github.com/articles/creating-an-issue/

Hm, I might not have dug enough, but I could not find a template README
to add this to. Should we have one?

I expect it should go somewhere like this 1.

@rgaiacs
Copy link
Contributor Author

rgaiacs commented Feb 13, 2015

I'm closing this since now we have CONTRIBUTING.md.

@jdblischak
Copy link
Contributor

This model of only committing the source files for a PR works when someone is proposing a small change. But what about when someone creates many changes? It has been my experience that most instructors (non-trainees that is) make changes when they are preparing for a workshop. What is the protocol then? Should they always do separate commits whenever they make a change (one for the source and one for the html)? And then when they submit a PR, they'll have to somehow cherrypick only the source commits? Or if they think ahead, they could make all the source commits in a separate branch and then merge them into gh-pages for their workshop (but that would take a lot of foresight on their part since we don't have any instructions that they should do that).

As a concrete example, see this PR to r-novice-inflammation. Because of all the recent development, there are tons of merge conflicts. But we can't separate out the Rmd from the html and md because the changes were committed at the same time.

@wking
Copy link
Contributor

wking commented Feb 16, 2015

On Mon, Feb 16, 2015 at 09:36:53AM -0800, John Blischak wrote:

Should they always do separate commits whenever they make a change
(one for the source and one for the html)?

I think so. In other projects, folks often have separate master and
gh-pages branches, and that makes this distinction easy (via a
‘*.html’ in master's .gitignore). For a gh-pages-only workflow, it's
just going to be a discipline issue to keep the changes in separate
commits.

And then when they submit a PR, they'll have to somehow cherrypick
only the source commits?

Yes, and “somehow” is probably just going to be “if I'd realized I
would be submitting these changes upstream, which commits would I have
put in feature branch $x? In feature branch $y? …”. You can also
remove any committed HTML at cherry-pick time with something like:

$ git checkout -b my-feature origin/gh-pages # or whatever the upstream lesson is
$ git cherry-pick abcd123
$ git checkout HEAD^ -- *.html
$ git commit -a --amend

which will adjust you cherry-pick to restore the version of the HTML
from before your cherry-picked commit (which will remove any HTML
edits from your commit. You might need some extra ‘git rm …’ calls
if you added a new HTML file in the cherry-picked commit).

Or if they think ahead, they could make all the source commits in a
separate branch and then merge them into gh-pages for their workshop
(but that would take a lot of foresight on their part since we don't
have any instructions that they should do that).

This is easier than reconstructing those feature branches after the
fact using cherry-picks. You can recover from the lack of foresight
(see above), but it's good practice to train folks to do all the work
that they intend to share in source-only feature branches. If you are
doing work that you don't intend to share, and then realize that you
have a sharable idea, you should:

$ git add -p
$ git commit

all the unsharable work, then

$ git stash
$ git checkout -b my-feature …
$ git stash pop
$ git commit

the shareable work to a new feature branch based on the upstream
lesson. Create a pull request from that (no need to wait until after
your workshop), and then:

$ git checkout gh-pages
$ git merge my-feature

to pull those source-only commits back into your per-workshop lesson
branch.

That's not a very complicated workflow, but it requires a good deal
more comfort with branching than our current workflows or Git lessons.
Personally, I expect all of our contributors are intelligent enough to
grasp a mental model of a directed, acyclic graph with multiple tips.
Combining that with liberal use of a graphical history viewer (like
gitg) or:

$ git log --graph --topo-order --oneline --decorate --all

(I keep most of that in an alias 1) should be sufficient to keep
folks oriented throughout a branch-y period.

@jdblischak
Copy link
Contributor

Thanks for the advice, @wking. I agree these would be good ways to develop the material if everyone had a decent understanding of Git. But the reality is that most of our instructors do not, and we have no requirement of knowledge of branching as part of instructor training. I want to write better documentation to make my life easier when contributors send PRs to the repo I maintain, but I'm not going to be able to write documentation that can also teach them how to use Git in the first place.

@gvwilson
Copy link
Contributor

On 2015-02-16 11:28 PM, John Blischak wrote:

Thanks for the advice, @wking https://github.com/wking. I agree
these would be good ways to develop the material if everyone had a
decent understanding of Git. But the reality is that most of our
instructors do not, and we have no requirement of knowledge of
branching as part of instructor training. I want to write better
documentation to make my life easier when contributors send PRs to the
repo I maintain, but I'm not going to be able to write documentation
that can also teach them how to use Git in the first place.

+1. @wking, once again, please accept that our workflow is going to be
built around minimal Git knowledge for the foreseeable future.

@jdblischak
Copy link
Contributor

But Greg, my comment was in response to your answer to the thread that started this Issue. You want development in only one branch, but then you want PRs to only include changes to the source code. Then any changes someone makes to improve the lesson for use in their workshop will be lost unless we use one of the strategies that @wking described above (because they will also have to generate the html for their workshop). There is a disconnect here. I want to improve the documentation for contributing to the R lesson, but I can't do that when even I am unclear of what the official process is (before this thread I was having contributors include the generated content in their PR).

@wking
Copy link
Contributor

wking commented Feb 17, 2015

On Tue, Feb 17, 2015 at 07:18:20AM -0800, John Blischak wrote:

… Then any changes someone makes to improve the lesson for use in
their workshop will be lost unless we use one of the strategies that
@wking described above (because they will also have to generate the
html for their workshop). There is a disconnect here.

For folks who aren't ready to use a feature-branch workflow locally,
all we need is a pointer to their gh-pages branch and a summary of the
changes they think should be upstreamed. Then anyone who's motivated
by the changes and has sufficient Git competence can go through and
cherry-pick the changes out of their branch to reconstruct feature
branches for pull-requests.

I want to improve the documentation for contributing to the R
lesson, but I can't do that when even I am unclear of what the
official process is (before this thread I was having contributors
include the generated content in their PR).

This still works, it just has a higher risk of conflicts in the
auto-generated HTML. If you have low enough churn in your lesson, it
probably won't be a problem. You only have to have a branch-capable
Git user involved when you do have conflicts, and that user could be
the lesson maintainer or a third-party volunteer. If you find
yourself getting many conflicting PRs from a single source, it might
be more time-effective to help that contributor figure out a
feature-branch workflow than it would be to go through and massage
each of their PRs, but for one-off contributors it probably doesn't
matter.

@jdblischak
Copy link
Contributor

@wking , I just don't see how this approach scales. SWC is growing larger by the day. We're increasing both the number of distinct lesson repos and instructors. For SWC to grow into a large, distributed, open-source project, we need reliable documentation that any new collaborator can read and then start contributing. Determining how to merge PRs on a case-by-case basis is too time-consuming.

P.S. Perhaps this problem is only particularly acute for the R lessons. Because we write in R Markdown and thus re-run the code when we build a file, there tend to be lots of changes to the html (e.g. because different versions of R have slightly different wordings of error messages). This of course creates many unnecessary merge conflicts.

@wking
Copy link
Contributor

wking commented Feb 19, 2015

On Thu, Feb 19, 2015 at 07:07:21AM -0800, John Blischak wrote:

@wking, I just don't see how this approach scales.

The approach that scales is getting contributors to create source-only
pull requests. If you're instructions are:

  1. Don't make commits that touch both the source and auto-generated
    content (‘git add …’ and ‘git commit’ without ‘-a’ are useful
    here).
  2. In fact, you probably don't need to commit the auto-generated
    content at all until you're done polishing your source changes.

That's the basic workflow that gets maintainers something they can
cherry-pick. For bonus points, contributors could:

3a. ‘git checkout -b my-feature’ to collect your source changes in
feature branches, so it's easy to submit them upstream as
targetted pull requests. After making the feature change, ‘git
checkout gh-pages && git merge my-feature’ to integrate it with
your other changes.

Once they get comfortable with 3a, they can improve to 3b:

3b. ‘git checkout -b my-feature upstream/gh-pages’ to root your source
changes in a feature branch based on the upstream lesson content.
The rest of the workflow is like 3a, but you have a higher risk of
merge conflicts if the upstream lesson content has diverged from
your local gh-pages.

we need reliable documentation that any new collaborator can read
and then start contributing.

I don't think my suggestions one and two are beyond anyone's
capabilities. Folks who are very new to Git may take some time before
they pick up suggestion three though.

Determining how to merge PRs on a case-by-case basis is too
time-consuming.

No matter how good your docs are, some folks will fall through the
cracks and create ugly, mashed-together, catch-all commits ;). If
their changes are exciting enough, I'm just saying that anyone can go
in and refactor their ugly commit into something that isolated the
content-change in a mergable way. That's going to take some
familiarity with massaging history, but (a) it's pretty easy to
massage history once you've been working with Git for a while, (b)
this should be only for the small subset of contributors who didn't
find, read, or remember the docs, and (c) if nobody has the time to
fix it, you can just not merge the PR, or go through a conflicty merge
and handle the resolution like you would for human-generated
conflicts.

@kynan
Copy link

kynan commented Feb 21, 2015

I tend to agree with @wking. For one imho one of our goals should be to make people more comfortable with git to a level they can follow the process outlined above. I also agree that it's not too hard to rebase someone else's pr to make it conform to these rules. I have done similar git surgery many times and I assume so have many of the more senior SWC contributors.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants