# Git

>[Git](https://en.wikipedia.org/wiki/Git) is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development.

This notebook shows how to load text files from a `Git` repository.

## Load existing repository from disk

In [None]:
!pip install GitPython python-dotenv

In [15]:
import os

from dotenv import load_dotenv
from git import Repo

# Read GIT_REPO_URL and GIT_REPO_PATH from a local .env file
load_dotenv()

url = os.getenv("GIT_REPO_URL")
repo_path = os.getenv("GIT_REPO_PATH")

if not os.path.exists(repo_path):
    # Clone into repo_path so that GitLoader below finds the repository there
    repo = Repo.clone_from(url=url, to_path=repo_path)
else:
    # Reuse the existing local clone
    repo = Repo(repo_path)

# GitLoader expects the branch name as a string
branch = repo.active_branch.name
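The check above treats any existing path as a usable clone. A slightly stricter test, sketched below with the standard library only, looks for the `.git` entry that a normal clone contains. The helper name `is_git_checkout` is hypothetical and is not part of GitPython or LangChain.

```python
import os
import tempfile


def is_git_checkout(path):
    """Return True if `path` looks like the root of a Git working tree.

    Hypothetical helper: it only checks for a `.git` entry, which is a
    directory in a normal clone and a file in worktrees or submodules.
    It also returns False for None, e.g. when an environment variable is unset.
    """
    if not path or not os.path.isdir(path):
        return False
    return os.path.exists(os.path.join(path, ".git"))


# Usage: an empty directory is not a checkout until it gains a .git directory
with tempfile.TemporaryDirectory() as tmp:
    print(is_git_checkout(tmp))  # False
    os.mkdir(os.path.join(tmp, ".git"))
    print(is_git_checkout(tmp))  # True
```

In the cell above, `if not is_git_checkout(repo_path):` could then stand in for the `os.path.exists` check, so a leftover non-repository directory triggers a clone attempt instead of a failing `Repo(repo_path)`.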

In [16]:
from langchain.document_loaders import GitLoader

In [17]:
loader = GitLoader(repo_path=repo_path, branch=branch)

In [18]:
data = loader.load()

In [19]:
len(data)

1415
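With over a thousand documents it can help to see which kinds of files were loaded. The sketch below tallies documents by file extension. It assumes each document exposes a `metadata` dict whose `source` entry holds the file path, and it uses a small stand-in `Doc` dataclass (hypothetical, mirroring `page_content` and `metadata`) so it runs without LangChain.

```python
import os
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (assumption: page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def count_by_extension(docs):
    """Count documents per file extension taken from metadata["source"]."""
    counts = Counter()
    for doc in docs:
        _, ext = os.path.splitext(doc.metadata.get("source", ""))
        counts[ext or "<none>"] += 1  # files such as "Makefile" have no extension
    return counts


docs = [
    Doc("print('hi')", {"source": "libs/app.py"}),
    Doc("# Title", {"source": "README.md"}),
    Doc("x = 1", {"source": "libs/util.py"}),
    Doc("all:", {"source": "Makefile"}),
]
print(count_by_extension(docs).most_common())
# [('.py', 2), ('.md', 1), ('<none>', 1)]
```

On the real result, passing `data` in place of `docs` would give the breakdown for the loaded repository.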

In [25]:
doc = data[100]
print(doc.metadata)
print(doc.page_content[:500])
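Source files can be long, so printing a whole document is rarely useful. A hypothetical `preview` helper that shows the file path and the first few characters of `page_content` on one line is sketched below. `Doc` is a stand-in for LangChain's `Document` (assumption: it carries `page_content` and a `metadata` dict with a `source` key) so the block runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview(doc, max_chars=80):
    """Return the source path and a truncated, single-line view of the content."""
    text = " ".join(doc.page_content.split())  # collapse newlines and runs of spaces
    if len(text) > max_chars:
        text = text[: max_chars - 3] + "..."
    return f"{doc.metadata.get('source', '<unknown>')}: {text}"


doc = Doc("def add(a, b):\n    return a + b\n", {"source": "src/math_utils.py"})
print(preview(doc))
# src/math_utils.py: def add(a, b): return a + b
```

With real results, `print(preview(data[100]))` would show a single readable line for that document.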

## Clone repository from url

In [5]:
from langchain.document_loaders import GitLoader

In [2]:
# Clone clone_url into repo_path and check out the master branch
loader = GitLoader(
    clone_url="https://github.com/hwchase17/langchain",
    repo_path="./example_data/test_repo2/",
    branch="master",
)
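Because `repo_path` is where the clone is written, giving each remote its own directory keeps repeated runs from mixing repositories. The hypothetical helper `repo_dir_for` below derives such a directory from an HTTPS clone URL using only the standard library; the `./example_data` base directory is an assumption that mirrors the paths used in this notebook.

```python
import posixpath
from urllib.parse import urlparse


def repo_dir_for(clone_url, base_dir="./example_data"):
    """Derive a local checkout directory from an HTTPS clone URL.

    Hypothetical helper: it takes the last path segment of the URL and drops
    a trailing ".git", e.g. ".../langchain.git" -> "<base_dir>/langchain".
    """
    path = urlparse(clone_url).path.rstrip("/")
    name = posixpath.basename(path)
    if name.endswith(".git"):
        name = name[: -len(".git")]
    if not name:
        raise ValueError(f"cannot derive a directory name from {clone_url!r}")
    return posixpath.join(base_dir, name)


print(repo_dir_for("https://github.com/hwchase17/langchain"))
# ./example_data/langchain
```

The result could be passed as `repo_path=repo_dir_for(url)` when constructing `GitLoader` with a `clone_url`. `posixpath` is used so the returned path looks the same on every platform.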

In [10]:
data = loader.load()

In [11]:
len(data)

1074

## Filtering files to load

In [12]:
from langchain.document_loaders import GitLoader

# e.g., load only Python files
loader = GitLoader(
    repo_path="./example_data/test_repo1/",
    file_filter=lambda file_path: file_path.endswith(".py"),
)
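`file_filter` accepts a callable that receives a file path string and returns `True` to keep the file. Predicates covering several extensions, or skipping whole directories, can be built with a small factory such as the hypothetical `make_file_filter` below. It uses only the standard library, assumes POSIX-style paths, and the `tests` directory name is just an example.

```python
from pathlib import PurePosixPath


def make_file_filter(extensions, exclude_dirs=()):
    """Build a predicate suitable for GitLoader's `file_filter` argument.

    Keeps files whose suffix is in `extensions` (case-insensitive) and that do
    not sit under any directory named in `exclude_dirs`.
    """
    exts = {e.lower() for e in extensions}
    excluded = set(exclude_dirs)

    def keep(file_path):
        path = PurePosixPath(file_path)
        if path.suffix.lower() not in exts:
            return False
        # parts[:-1] are the directory components; the last part is the file name
        return not excluded.intersection(path.parts[:-1])

    return keep


keep = make_file_filter({".py", ".md"}, exclude_dirs={"tests"})
print(keep("./example_data/test_repo1/libs/app.py"))        # True
print(keep("./example_data/test_repo1/tests/test_app.py"))  # False
print(keep("./example_data/test_repo1/setup.cfg"))          # False
```

The predicate would then be passed as `GitLoader(repo_path="./example_data/test_repo1/", file_filter=keep)`. A named factory is easier to test and reuse than a growing inline `lambda`.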