Code Versioning
Introduction

In this notebook, we'll explore the basics of code versioning using Git, a widely used version control system. Version control is essential for managing changes to your codebase, collaborating with others, and maintaining a history of your project. We'll cover the fundamental concepts of Git, how to perform common version control tasks, and best practices for organizing your code repository.
Table of Contents

    What is Version Control?
    Introduction to Git
    Basic Git Commands
    Working with Branches
    Collaborating with Git
    Step-by-Step Example
    Exercise

1. What is Version Control? <a name="1"></a>

Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. It allows you to:

    Track Changes: Keep a history of modifications.
    Collaborate: Work with multiple people on the same project.
    Backup: Maintain copies of your codebase.
    Branching and Merging: Experiment with new features without affecting the main codebase.

2. Introduction to Git <a name="2"></a>

Git is a distributed version control system, which means every developer has a complete history of the project on their local machine. It is fast, scalable, and supports non-linear development through branching and merging.
Installation

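To install Git, follow the instructions on the official Git website (https://git-scm.com). Once installed, running git --version in a terminal confirms that Git is available.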
3. Basic Git Commands <a name="3"></a>

Here are some fundamental Git commands to get you started:

    Initialize a repository: git init
    Clone a repository: git clone <repository-url>
    Check status: git status
    Add changes: git add <file-or-directory>
    Commit changes: git commit -m "commit message"
    View history: git log
    Push changes: git push
    Pull changes: git pull

Example Workflow

    Initialize a repository:

git init

    Add files and commit:

git add .
git commit -m "Initial commit"

    Check status:

git status

    View history:

git log
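
The workflow above can be tried end to end in a throwaway directory. The following is a minimal sketch rather than a prescribed procedure: the file name notes.txt and the user name and email are placeholders that do not appear in the original example, and the temporary directory is there only so that no real project is touched.

```shell
set -eu

# Work in a temporary directory so no existing project is affected.
repo="$(mktemp -d)"
cd "$repo"

git init -q                               # create an empty repository

# Placeholder identity, stored only in this repository's configuration.
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > notes.txt                  # hypothetical file to track
git status --short                        # lists notes.txt as untracked (??)

git add .                                 # stage everything in the directory
git commit -q -m "Initial commit"         # record the staged snapshot

git status --short                        # prints nothing: the working tree is clean
git log --oneline                         # one line per commit
```

Running git status before and after the commit makes the difference between untracked, staged, and committed files visible.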

4. Working with Branches <a name="4"></a>

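Branches allow you to work on different versions of your project simultaneously. The default branch is usually called main. Git itself has traditionally named the first branch master, so older repositories, and new ones created with git init where init.defaultBranch is not set, may still use that name.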
Creating and Switching Branches

    Create a new branch: git branch <branch-name>
    Switch to a branch: git checkout <branch-name>
    Create and switch: git checkout -b <branch-name>
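
The commands in this list can be exercised in a scratch repository. The sketch below also shows git switch, which Git 2.23 and later offer as a dedicated alternative to git checkout for changing branches. The branch names and the placeholder identity are assumptions made for illustration, and git init -b requires Git 2.28 or later.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q -b main                       # -b requires Git 2.28+
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

# A branch can only be created once at least one commit exists.
git commit -q --allow-empty -m "Initial commit"

git branch feature/demo                   # create a branch, stay on main
git checkout -q feature/demo              # switch to it
git checkout -q -b feature/other          # create and switch in one step

git switch -q main                        # Git 2.23+: switch branches
git switch -q -c feature/third            # Git 2.23+: create and switch

git branch --list                         # the current branch is marked with *
current="$(git rev-parse --abbrev-ref HEAD)"
echo "current branch: $current"
```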

Merging Branches

To merge changes from one branch into another, use the git merge command:

    Switch to the branch you want to merge into:

git checkout main

    Merge the other branch:

git merge <branch-name>
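
As a rough illustration of these two steps, the sketch below creates a branch, commits to it, and merges it back in a temporary repository. The branch name feature/demo, the file names, and the identity are hypothetical, and git init -b assumes Git 2.28 or later.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q -b main                       # -b requires Git 2.28+
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"

# Do some work on a separate branch (hypothetical name).
git checkout -q -b feature/demo
echo "print('hello')" > hello.py
git add hello.py
git commit -q -m "Add hello script"

# Switch to the branch that should receive the changes, then merge.
git checkout -q main
git merge -q feature/demo

git log --oneline --graph                 # main now contains both commits
git branch --merged main                  # branches fully contained in main
```

Because main had no new commits of its own here, Git simply moves main forward to the tip of the feature branch instead of creating a separate merge commit.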

5. Collaborating with Git <a name="5"></a>

When working with a team, Git allows you to collaborate efficiently. Here are some common tasks:
Cloning a Repository

To start working on a project, clone the repository:

git clone <repository-url>

Pushing and Pulling Changes

    Push changes: git push
    Pull changes: git pull

Resolving Conflicts

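Conflicts occur when two branches change the same part of a file in different ways. Git then pauses the merge, marks the affected region in the file with <<<<<<<, =======, and >>>>>>> markers, and waits for you to edit the file to the intended content, stage it with git add, and complete the merge with git commit.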
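
A conflict is easiest to understand by producing one on purpose. The sketch below is one possible way to do so in a temporary repository: two branches change the same line of a hypothetical file config.txt, the merge stops, and the conflict is resolved by hand. All names and values are illustrative assumptions, and git init -b requires Git 2.28 or later.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q -b main                       # -b requires Git 2.28+
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "threshold = 1" > config.txt         # hypothetical file
git add config.txt
git commit -q -m "Add config"

# Change the line on a feature branch...
git checkout -q -b feature/tune
echo "threshold = 2" > config.txt
git commit -q -am "Set threshold to 2"

# ...and change the same line differently on main.
git checkout -q main
echo "threshold = 3" > config.txt
git commit -q -am "Set threshold to 3"

# git merge exits with a non-zero status when it hits a conflict.
if git merge feature/tune >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"
fi
echo "merge result: $merge_result"

cat config.txt                            # shows both versions between conflict markers

# Resolve by writing the intended content, then stage and commit.
echo "threshold = 3" > config.txt
git add config.txt
git commit -q -m "Merge feature/tune, keeping threshold 3"
```

Running git merge --abort instead of resolving would return the repository to its state before the merge started.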
Using Remote Repositories

Remote repositories allow you to share your work. Common remote operations include:

    Add a remote: git remote add <name> <url>
    View remotes: git remote -v
    Fetch changes: git fetch
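
The remote operations above can be tried without a hosting service by letting a local bare repository stand in for the remote. This is a sketch under that assumption: the directory names shared.git, alice, and bob are hypothetical, as is the identity, and git init -b requires Git 2.28 or later.

```shell
set -eu

work="$(mktemp -d)"

# A bare repository plays the role of the hosted remote.
git init -q --bare -b main "$work/shared.git"

# First collaborator: create a repository and publish it.
git init -q -b main "$work/alice"
cd "$work/alice"
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"

git remote add origin "$work/shared.git"  # register the remote under the name origin
git remote -v                             # lists fetch and push URLs
git push -q -u origin main                # -u records origin/main as the upstream

# Second collaborator: clone the shared repository.
git clone -q "$work/shared.git" "$work/bob"

# The first collaborator publishes another commit.
echo "More text" >> README.md
git commit -q -am "Extend README"
git push -q

# The second collaborator fetches, inspects, then integrates.
cd "$work/bob"
git fetch -q                              # updates origin/main, leaves local main untouched
behind="$(git rev-list --count HEAD..origin/main)"
echo "local main is $behind commit(s) behind origin/main"
git pull -q --ff-only                     # fetch plus fast-forward of local main
```

The split between git fetch and git pull is the point of the last three commands: fetch only downloads, so the incoming commits can be inspected before they are integrated.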

6. Step-by-Step Example <a name="6"></a>

We'll now walk through a detailed example of using Git for version control in a data science project.
Step 1: Initialize a Repository

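Navigate to your project directory and initialize a Git repository whose first branch is named main. The -b option requires Git 2.28 or later; on older versions, run git init and rename the branch after the first commit with git branch -M main:

cd my-data-science-project
git init -b main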

Step 2: Add and Commit Files

Create some initial files and commit them:

echo "# My Data Science Project" > README.md
git add README.md
git commit -m "Initial commit with README"

Step 3: Create a Branch

Create a new branch for developing a feature:

git checkout -b feature/data-cleaning

Step 4: Make Changes and Commit

Make changes to your project and commit them:

echo "import pandas as pd" > data_cleaning.py
git add data_cleaning.py
git commit -m "Add data cleaning script"

Step 5: Merge Changes

Switch back to the main branch and merge your feature branch:

git checkout main
git merge feature/data-cleaning

Step 6: Push to Remote Repository

If you have a remote repository, push your changes:

git remote add origin <repository-url>
git push -u origin main

7. Exercise <a name="7"></a>
Task

You are provided with a simple data science project structure. Your task is to:

    Initialize a Git repository.
    Create a main branch and make an initial commit.
    Create a new branch called feature/analysis.
    Add a script for data analysis in this branch.
    Commit the changes and merge the branch back into main.

Requirements

    The initial commit should include a README.md file.
    The feature/analysis branch should contain a script data_analysis.py.
    The final main branch should have all changes merged.

Solution

    Initialize a repository with main as the initial branch (the -b option requires Git 2.28 or later):

cd my-data-science-project
git init -b main

    Add and commit initial files:

echo "# My Data Science Project" > README.md
git add README.md
git commit -m "Initial commit with README"

    Create and switch to a new branch:

git checkout -b feature/analysis

    Add a script for data analysis (printf is used because echo does not reliably expand \n into a line break):

printf 'import pandas as pd\nimport numpy as np\n' > data_analysis.py
git add data_analysis.py
git commit -m "Add data analysis script"

    Merge the branch back into main:

git checkout main
git merge feature/analysis
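
One way to confirm that a repository meets the three requirements is a small check script. The sketch below is an illustration only: check_exercise is a hypothetical helper, not a Git command, and the script rebuilds the exercise repository in a temporary directory with a placeholder identity so that it can run on its own. It assumes Git 2.28 or later for git init -b.

```shell
set -eu

# check_exercise is a hypothetical helper, not part of Git.
# It returns 0 only if the repository at $1 meets the three requirements.
check_exercise() {
    target="$1"
    # Requirement 1: the first (root) commit on main contains README.md.
    root="$(git -C "$target" rev-list --max-parents=0 main | tail -n 1)"
    [ -n "$root" ] || return 1
    git -C "$target" cat-file -e "$root:README.md" 2>/dev/null || return 1
    # Requirement 2: feature/analysis contains data_analysis.py.
    git -C "$target" cat-file -e "feature/analysis:data_analysis.py" 2>/dev/null || return 1
    # Requirement 3: every commit on feature/analysis is reachable from main.
    git -C "$target" merge-base --is-ancestor feature/analysis main || return 1
}

# Rebuild the exercise repository in a temporary directory.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main                       # -b requires Git 2.28+
git config user.name "Example User"       # placeholder identity
git config user.email "user@example.com"

echo "# My Data Science Project" > README.md
git add README.md
git commit -q -m "Initial commit with README"

git checkout -q -b feature/analysis
printf 'import pandas as pd\nimport numpy as np\n' > data_analysis.py
git add data_analysis.py
git commit -q -m "Add data analysis script"

git checkout -q main
git merge -q feature/analysis

if check_exercise "$repo"; then
    result="pass"
else
    result="fail"
fi
echo "exercise check: $result"
```

To check your own attempt, call check_exercise with the path to your repository instead of the temporary one.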