Providing a definition for distributed (vs centralised) version control #757

Open
KateCourt opened this issue Sep 21, 2020 · 5 comments

Comments

@KateCourt

KateCourt commented Sep 21, 2020

Some brief history of version control is provided to introduce students to contemporary version control. This includes a reference to distributed vs centralised systems. This could be expanded with a definition that does not rely on the student understanding the phrase 'meaning that they do not need a centralised server to host the repository' and instead explains what distributed means. Something along the lines of 'meaning that users each hold a copy of the code base on their own machines rather than code being held on a centralised server. This means code can be worked on simultaneously rather than only one person be able to work on a section of code at a time.'

(part of checkout process)

@markmatney
Contributor

markmatney commented Oct 10, 2020

I agree that this part of the lesson should be developed. As-is, I don't think it's very convincing for learners. Currently, it seems to depend on learners accepting that Git opens up more possibilities for collaboration:

These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

I don't dispute that statement, but I don't think it's the best way to motivate distributed vs. centralized. Since learners may have had success using a centralized VCS (Google Docs, Box.com, Wikipedia, WordPress, etc.) in the past, and since remotes and collaboration aren't covered until much later, this feels like "trust me, it's better, I'll explain why later" and may not go down easily.

I think the motivation would be stronger if there were an explanation of the "single point of failure" problem of centralized VCSs, and of how distributed systems address this by making many copies of the revision history. An instructor might give learners a tour of some centralized VCSs and then pose some questions:

"What happens if there's a network outage?"
"Or if the disk on the server gets corrupted?" (hardware issues happen, even in the "cloud")
"And you need to use an earlier revision of your work before the submission deadline this evening?" 😱

The fact that distributed VCSs (1) lower the risk of data loss and (2) protect against network outages (since each copy includes a full backup of the entire revision history, with no copy being more "authoritative" than another) is a stronger selling point IMO.

All that is to say, more than just lacking a complete definition of "distributed", I feel the lesson lacks a strong motivation for distributed as an alternative to centralized.

@kekoziar
Contributor

kekoziar commented Aug 8, 2021

I think there are two challenges presented here.

The first challenge is - as with all Carpentries lessons - to not add extra content or time to the lesson. The issue suggested replacing the phrase

meaning that they do not need a centralised server to host the repository

with the expanded

meaning that users each hold a copy of the code base on their own machines rather than code being held on a centralised server. This means code can be worked on simultaneously rather than only one person be able to work on a section of code at a time.

In context, that would result in

More modern systems, such as Git and Mercurial, are distributed, meaning that users each hold a copy of the code base on their own machines rather than code being held on a centralised server. This means code can be worked on simultaneously rather than only one person be able to work on a section of code at a time. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

which repeats itself and content already covered above.

@KateCourt What do you think about only revising the phrase?

More modern systems, such as Git and Mercurial, are distributed, meaning that a full repository can be copied to local computers, instead of only existing on a centralized server. (emphasis mine) These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.

@kekoziar
Contributor

kekoziar commented Aug 8, 2021

The second challenge is to not overburden learners with advanced knowledge that doesn't meet the objectives of the lesson. The lesson isn't motivating learners to choose between a centralized and distributed VC workflow model; it's motivating them to put their work under some type of version control. Most learners will not have been introduced to the topic of different models of VCS, so an expanded section on the differences between CVCS and DVCS may be interesting to advanced users and computer scientists - I will admit, I fell into a nice rabbit-hole on the history of Git, which happens to coincide with the theory and application of centralized vs distributed systems and workflows - but it's tangential to the lesson and the objectives of the episode.

@markmatney Do you think some of your suggestions might fit into the current exercise, or a new exercise that fits within the episode's existing learning objectives?

Although, since Google Docs, Box, and Dropbox all allow offline work, I'm not sure that "What happens if there's a network outage?" is a good motivator.

@kekoziar
Contributor

kekoziar commented Aug 8, 2021

As an aside:
I really wish we could integrate the phrase "distributed merging," because I think that's really the essence of DVCS. It's not that all repositories are equal; they can't be equal while maintaining a functional development environment. The distributed workflow section of the Pro Git book includes a "blessed repository" as the "canonical official" repository. Distributed merging simply allows merging from local source repos. More interestingly, this distributed merging allows workflows to be adapted to the project/company, while the centralized model only allows one gatekeeper workflow. But this truly is an advanced topic (the earlier referenced section is in chapter 5 of the Pro Git book; our lesson only covers content in chapters 1-2 of Pro Git) and IMO a poor introduction to Git for novice learners.

@markmatney
Contributor

markmatney commented Aug 10, 2021

@kekoziar after re-reading my earlier comment, I see that for some reason I was conflating "distributed" with some qualities that aren't unique to DVCSs (file type agnostic, and enabling offline work). Thanks for pointing that out.

After thinking about this more, I would actually advocate for removing any more than a passing mention of distributed vs. centralized from the lesson; I think even the call-out box in question has too much info.
