## Git API

This API lets you clone a Git repository, check out new branches to develop a feature, and push your changes to a remote.

## Scope

* Configure a git provider
* Clone a repository
* Create a new branch
* Make a change
* Check git status
* Check out changes
* Commit a change
* Push changes
* Pull changes

In [1]:
import hopsworks

## Connect to the cluster

In [2]:
# Connect to your cluster. Use this form when running inside Jupyter or in jobs on the cluster.
connection = hopsworks.connection()

Connected. Call `.close()` to terminate connection gracefully.


In [3]:
# Uncomment when connecting to the cluster from an external environment.
# connection = hopsworks.connection(project='my_project', host='my_instance', port=443, api_key_value='apikey')
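When connecting from an external environment, an API key written directly into a notebook is easy to leak. The following is a minimal sketch, assuming the settings are stored in environment variables. The variable names (`HOPSWORKS_PROJECT`, `HOPSWORKS_HOST`, `HOPSWORKS_PORT`, `HOPSWORKS_API_KEY`) and the helper `connection_kwargs_from_env` are hypothetical choices for this example; only the keyword names come from the commented call above.

```python
import os


def connection_kwargs_from_env(env=None):
    """Build keyword arguments for hopsworks.connection() from environment variables.

    The environment variable names are hypothetical, chosen for this example.
    """
    env = os.environ if env is None else env
    required = ("HOPSWORKS_PROJECT", "HOPSWORKS_HOST", "HOPSWORKS_API_KEY")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "project": env["HOPSWORKS_PROJECT"],
        "host": env["HOPSWORKS_HOST"],
        # Fall back to 443, the port used in the commented example above
        "port": int(env.get("HOPSWORKS_PORT", "443")),
        "api_key_value": env["HOPSWORKS_API_KEY"],
    }
```

With the variables set, the external connection could then be opened as `connection = hopsworks.connection(**connection_kwargs_from_env())`.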

## Get Project

In [4]:
# Get the project object. Inside a Hopsworks cluster, this returns the current project.
project = connection.get_project()

In [5]:
# Uncomment to get a specific project
# project = connection.get_project('my_project')

## Get the API

In [6]:
git_api = project.get_git_api()

## Configure a provider with your access token

In [7]:
PROVIDER="GitHub"

In [8]:
# Configure your GitHub provider. All actions on GitHub repositories will use this token.
git_api.set_provider(PROVIDER, "my_user", "my_token")

In [9]:
git_api.get_providers()

[GitProvider('my_user', 'GitHub')]

In [10]:
provider = git_api.get_provider(PROVIDER)

In [11]:
git_api.get_providers()

[GitProvider('my_user', 'GitHub')]

In [12]:
provider.delete()

## Clone repository

In [13]:
REPO_URL="https://github.com/logicalclocks/hops-examples.git" # git repository
HOPSWORKS_FOLDER="Resources" # path in hopsworks filesystem to clone to
BRANCH="master" # optional branch to clone

In [14]:
# Clone the repository into hopsworks filesystem
examples_repo = git_api.clone(REPO_URL, HOPSWORKS_FOLDER, PROVIDER, branch=BRANCH)

2022-04-12 12:58:13,334 INFO: Running command CLONE, current status Initializing
2022-04-12 12:58:18,485 INFO: Running command CLONE, current status Initializing
2022-04-12 12:58:23,597 INFO: Running command CLONE, current status Running
2022-04-12 12:58:28,710 INFO: Running command CLONE, current status Running
2022-04-12 12:58:33,854 INFO: Running command CLONE, current status Running
2022-04-12 12:58:38,984 INFO: Running command CLONE, current status Running
2022-04-12 12:58:44,164 INFO: Running command CLONE, current status Running
2022-04-12 12:58:49,351 INFO: Running command CLONE, current status Running
2022-04-12 12:58:54,532 INFO: Running command CLONE, current status Running
2022-04-12 12:58:59,677 INFO: Running command CLONE, current status Running
2022-04-12 12:59:04,882 INFO: Running command CLONE, current status Running
2022-04-12 12:59:10,051 INFO: Running command CLONE, current status Running
2022-04-12 12:59:15,163 INFO: Running command CLONE, current status Running
...

In [15]:
# List all available git repos in the project
git_api.get_repos()

[GitRepo('hops-examples', 'admin@hopsworks.ai', 'GitHub', '/Projects/demo_ml_meb10000/Resources/hops-examples')]
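The path in the `GitRepo` output above suggests that the clone lands in a directory named after the repository, under the chosen folder of the project. As a rough sketch, the expected location can be derived from the clone arguments. The helper `expected_clone_path` is hypothetical, and the `/Projects/<project>/<folder>/<repo>` layout is an assumption inferred from that single output, not a documented guarantee.

```python
import posixpath
from urllib.parse import urlparse


def expected_clone_path(repo_url, project_name, folder):
    """Guess where a cloned repository ends up in the project filesystem.

    Assumes the layout /Projects/<project>/<folder>/<repo name>.
    """
    url_path = urlparse(repo_url).path.rstrip("/")
    repo_name = posixpath.basename(url_path)
    # Drop the conventional ".git" suffix from the last URL segment
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return posixpath.join("/Projects", project_name, folder, repo_name)


print(
    expected_clone_path(
        "https://github.com/logicalclocks/hops-examples.git",
        "demo_ml_meb10000",
        "Resources",
    )
)
```

The value to rely on in practice is the `path` attribute of the repository object returned by `clone`, which later cells in this notebook use.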

## Create new branch

In [16]:
branch = "my_new_branch"
# Create new branch
examples_repo.checkout_branch(branch, create=True)

2022-04-12 13:00:48,518 INFO: Running command CREATE_CHECKOUT, current status Initializing
2022-04-12 13:00:53,660 INFO: Running command CREATE_CHECKOUT, current status Initializing
2022-04-12 13:00:58,810 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:03,987 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:09,178 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:14,304 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:19,514 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:24,734 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:29,944 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:35,141 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:40,345 INFO: Running command CREATE_CHECKOUT, current status Running
2022-04-12 13:01:45,540 INFO: Running comman

## Check status and check out files

In [17]:
# Make a modification
dataset_api = project.get_dataset_api()
dataset_api.remove(examples_repo.path + "/tools")

In [18]:
# Check file status
status = examples_repo.status()
status

2022-04-12 13:02:06,893 INFO: Running command STATUS, current status Initializing
2022-04-12 13:02:12,030 INFO: Running command STATUS, current status Initializing
2022-04-12 13:02:17,286 INFO: Running command STATUS, current status Running
2022-04-12 13:02:22,415 INFO: Running command STATUS, current status Running
2022-04-12 13:02:27,578 INFO: Running command STATUS, current status Running
2022-04-12 13:02:32,735 INFO: Running command STATUS, current status Running
2022-04-12 13:02:37,867 INFO: Running command STATUS, current status Running
2022-04-12 13:02:43,056 INFO: Running command STATUS, current status Running
2022-04-12 13:02:48,151 INFO: Git command STATUS finished


[GitFileStatus('tools/maven/suppressions.xml', 'D', ''),
 GitFileStatus('tools/maven/checkstyle.xml', 'D', '')]
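With many changed files it can help to group the status entries by their status code. The sketch below works on plain tuples that mirror the representation printed above (file path, status code, third field), because the attribute names of `GitFileStatus` are not shown in this notebook. The helper `group_by_status` is hypothetical, and reading `D` as "deleted" assumes the codes follow Git's short status format, which is consistent with the `tools` directory having just been removed.

```python
from collections import defaultdict

# Stand-in data copied from the status output above: (file path, status code, third field)
entries = [
    ("tools/maven/suppressions.xml", "D", ""),
    ("tools/maven/checkstyle.xml", "D", ""),
]


def group_by_status(entries):
    """Map each status code to the list of file paths that carry it."""
    grouped = defaultdict(list)
    for path, code, _extra in entries:
        grouped[code].append(path)
    return dict(grouped)


for code, paths in group_by_status(entries).items():
    print(code, len(paths), paths)
```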

In [19]:
# Check out the changed files, restoring them to their last committed state
# Pass a list of GitFileStatus objects or a list of file paths, e.g. ["model_design_doc.md"]
examples_repo.checkout_files(status)

2022-04-12 13:02:48,793 INFO: Running command CHECKOUT_FILES, current status Initializing
2022-04-12 13:02:53,927 INFO: Running command CHECKOUT_FILES, current status Initializing
2022-04-12 13:02:59,073 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:04,198 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:09,311 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:14,451 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:19,563 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:24,691 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:29,882 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:35,064 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:40,388 INFO: Running command CHECKOUT_FILES, current status Running
2022-04-12 13:03:45,528 INFO: Running command CHECKOUT_

In [20]:
status = examples_repo.status()
status

2022-04-12 13:04:06,529 INFO: Running command STATUS, current status Initializing
2022-04-12 13:04:11,728 INFO: Running command STATUS, current status Initializing
2022-04-12 13:04:16,855 INFO: Running command STATUS, current status Running
2022-04-12 13:04:22,015 INFO: Running command STATUS, current status Running
2022-04-12 13:04:27,164 INFO: Running command STATUS, current status Running
2022-04-12 13:04:32,336 INFO: Running command STATUS, current status Running
2022-04-12 13:04:37,470 INFO: Running command STATUS, current status Running
2022-04-12 13:04:42,574 INFO: Git command STATUS finished
2022-04-12 13:04:42,576 INFO: Nothing to commit, working tree clean


## Commit a change

In [21]:
# Make a modification
dataset_api = project.get_dataset_api()
dataset_api.remove(examples_repo.path + "/tools")

In [22]:
# all: automatically stage files that have been modified or deleted; new (untracked) files are not affected
# files: list of new files to add and commit
examples_repo.commit("test commit", all=True)

2022-04-12 13:04:43,238 INFO: Running command COMMIT, current status Initializing
2022-04-12 13:04:48,343 INFO: Running command COMMIT, current status Initializing
2022-04-12 13:04:53,495 INFO: Running command COMMIT, current status Running
2022-04-12 13:04:58,637 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:03,770 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:08,934 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:14,074 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:19,212 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:24,369 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:29,543 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:34,749 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:39,883 INFO: Running command COMMIT, current status Running
2022-04-12 13:05:45,001 INFO: Running command COMMIT, current stat

In [23]:
examples_repo.get_commits(branch)

[GitCommit('Admin Admin', 'test commit', '0263939396275f966a76142ed970e2261085a923'),
 GitCommit('GitHub', '[HOPSWORKS-2543] AMEND: make stream as property to FG (#303)\n\n', '55b795b832ed98016f48a06627572b773c151f46'),
 GitCommit('GitHub', '[HOPSWORKS-3051] Add new Python extra to HSFS for python engine to replace hive (#302)\n\n', '558aed852459c886a9834d2c7fa1c85943580d36'),
 GitCommit('GitHub', 'Complex feature (#301)\n\n', '1d1a8bd65f14f01a7075785f8f382123634d9076'),
 GitCommit('GitHub', '[HOPSWORKS-2543] add support to insert_stream() to ingest data into offline fs (#293)\n\n', 'f59e89a9cdcfba776fa7656c049ee3587cda988f'),
 GitCommit('GitHub', 'Favicon hopsworks examples version 0 (#300)\n\n', 'a0de4c081ec6d2f2c2b6821cb44b337e57b5387f'),
 GitCommit('GitHub', '[HOPSWORKS-2918] Register builtin transformation functions in the backend (#298)\n\n', '2f8b7f32a76e96e6c54b0c3db9ef2b751d231433'),
 GitCommit('Javier de la Rúa Martínez', '[HOPSWORKS-2828][fix] Fix end_to_end_sklearn notebook
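The commit list above is hard to scan because each entry carries the full message and the 40-character hash. A small sketch of a one-line summary follows. It takes the three values in the order they appear in the printed representation (author, message, hash) as plain arguments, since the attribute names of `GitCommit` are not shown here; the helper `summarize_commit` is hypothetical.

```python
def summarize_commit(author, message, commit_hash, hash_length=7):
    """Return '<short hash> <author>: <first line of the message>'."""
    lines = message.strip().splitlines()
    subject = lines[0] if lines else ""
    return f"{commit_hash[:hash_length]} {author}: {subject}"


# Values copied from the commit list above
print(summarize_commit("Admin Admin", "test commit", "0263939396275f966a76142ed970e2261085a923"))
print(summarize_commit("GitHub", "Complex feature (#301)\n\n", "1d1a8bd65f14f01a7075785f8f382123634d9076"))
```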

## Push/Pull

In [24]:
# Push branch to remote repository
#examples_repo.push(branch)

In [25]:
# Pull changes from remote repository
#examples_repo.pull(branch)

## Cleanup

In [26]:
examples_repo.delete_branch(branch)

2022-04-12 13:06:47,299 INFO: Running command DELETE, current status Initializing
2022-04-12 13:06:52,503 INFO: Running command DELETE, current status Initializing
2022-04-12 13:06:57,703 INFO: Git command DELETE finished


In [27]:
examples_repo.delete()
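If a cell fails between creating the branch and the cleanup, the temporary branch is left behind. One way to guard against that is a context manager that always runs the cleanup. This is a minimal sketch: `temporary_branch` is a hypothetical helper, it uses only the `checkout_branch` and `delete_branch` calls shown in this notebook in the same order, and `FakeRepo` is a stand-in that records calls so the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def temporary_branch(repo, branch):
    """Create and check out `branch`, then delete it on exit, even if the body raises."""
    repo.checkout_branch(branch, create=True)
    try:
        yield repo
    finally:
        repo.delete_branch(branch)


class FakeRepo:
    """Stand-in for a repository object; it only records the calls made on it."""

    def __init__(self):
        self.calls = []

    def checkout_branch(self, branch, create=False):
        self.calls.append(("checkout_branch", branch, create))

    def delete_branch(self, branch):
        self.calls.append(("delete_branch", branch))


repo = FakeRepo()
try:
    with temporary_branch(repo, "my_new_branch"):
        raise RuntimeError("simulated failure while working on the branch")
except RuntimeError:
    pass

# The branch is deleted although the body raised
print(repo.calls)
```

With a real repository object, the body of the `with` block would hold the status, commit, and push steps from the sections above.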