This repository has been archived by the owner on Oct 13, 2023. It is now read-only.

6 Local Training #9

Merged
merged 45 commits into from
Oct 13, 2020
45 commits
eba3160
Create Dockerfile and trainer.sh
Oct 5, 2020
6c29fbd
Install dependencies and execute a specified project
Oct 5, 2020
783a5b6
Modify trainer.sh to take ENV variable from docker environment
Oct 5, 2020
0746a59
Change base docker image from python3.8:buster to ubuntu:20.04
Oct 5, 2020
cd5f5d6
Implement local execution option
Oct 5, 2020
3ef357c
Add print Python 3.5 compatability
Oct 5, 2020
f11b878
Change project name locating to model path
Oct 6, 2020
ce61092
Fix not finding script location issue with relative path
Oct 6, 2020
93cf6f5
Design CLI tool to run in github repo
Oct 6, 2020
f3d618e
Fix git dependency fail
Oct 6, 2020
e28cee2
Adjust remote cloning based on commit sha
Oct 6, 2020
a8a385f
Log docker output into logs/docker folder in a time stamped log file
Oct 7, 2020
a3ddcd1
Allow user to define custom branch
Oct 7, 2020
f8779d7
Add fast local run time option and refactor out github repo checks
Oct 7, 2020
8d4120a
Modify path to store logs
Oct 7, 2020
275d14f
Remove branch sepecific clone for now
Oct 7, 2020
594134a
Add conda environment.yml dependency install support
Oct 7, 2020
a27a707
Add click argument for option of training script
Oct 8, 2020
efaadb6
Modify Github Token checking to test equality with None
Oct 8, 2020
7fb9fe3
Fix wrong variable name
Oct 8, 2020
66ada80
Fix docker build image issue (not building in hydra package folder
Oct 8, 2020
400792c
Add explanation to script
Oct 8, 2020
a145c91
Report exception for non-implemented parts
Oct 8, 2020
e680d85
Add tests for utility functions
Oct 8, 2020
5493b0a
Refactor json to string
Oct 8, 2020
671bb50
Use gitpython to get branch name, refactoring
Oct 8, 2020
b1194ed
Fix minor extra arg issue
Oct 8, 2020
72cb1ac
Refactor git check code for easier testing
Oct 8, 2020
0dbda0c
Add tests for github functions
Oct 8, 2020
7578468
Add test for CLI
Oct 9, 2020
fb93254
Update requirement
Oct 9, 2020
e1057a9
Check coverage percentage
Oct 9, 2020
937c6fa
Add tests for GitRepo class
Oct 9, 2020
cb871e2
Refactor GitRepo class into its single file
Oct 9, 2020
d2308ca
Refactor out running procedures into classes
Oct 9, 2020
a85e221
Add local platform training procedure
Oct 9, 2020
7adb47e
Add Google Cloud class basic outline
Oct 9, 2020
071be9d
Order dict items for python 3.5 compat to avoid flaky tests
Oct 10, 2020
64820af
Minor name change
Oct 10, 2020
b74e4cd
Fix OrderDict import
Oct 10, 2020
81f6fd2
Add test coverage to github workflow
Oct 13, 2020
05d5397
Add help for option flag
Oct 13, 2020
6270069
Revert breaking change to workflow
Oct 13, 2020
6cb8fa2
Workflow add test coverage
Oct 13, 2020
6489642
Add dependency in github workflow
Oct 13, 2020
2 changes: 2 additions & 0 deletions .coveragerc
@@ -0,0 +1,2 @@
[run]
omit = tests/*, setup.py
4 changes: 4 additions & 0 deletions .github/workflows/python-package.yml
@@ -27,6 +27,7 @@ jobs:
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        pip install pytest-cov
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
@@ -37,3 +38,6 @@ jobs:
    - name: Test with pytest
      run: |
        pytest
    - name: Display test coverage
      run: |
        pytest --cov=. tests/
6 changes: 6 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,6 @@
FROM continuumio/miniconda3

ADD executor.sh /home
WORKDIR /home

ENTRYPOINT ["sh", "executor.sh"]
9 changes: 9 additions & 0 deletions docker/executor.sh
@@ -0,0 +1,9 @@
mkdir project
cd project

git clone https://$OAUTH_TOKEN:x-oauth-basic@$GIT_URL .
git checkout $COMMIT_SHA

conda env create -f environment.yml

conda run -n hydra $PREFIX_PARAMS python3 $MODEL_PATH
21 changes: 21 additions & 0 deletions docker/local_execution.sh
@@ -0,0 +1,21 @@

DIR="$( dirname "$0" )"

# Timestamp used as the log file name
LOG_NAME=$(date +'%Y_%m_%d_%H_%M_%S')

cd "$DIR"
docker build -t hydra_image .

docker run \
  -e GIT_URL="$1" \
  -e COMMIT_SHA="$2" \
  -e OAUTH_TOKEN="$3" \
  -e MODEL_PATH="$4" \
  -e PREFIX_PARAMS="$5" \
  hydra_image:latest 2>&1 | tee "${LOG_NAME}.log"

# Move the log file to the directory the program was called from
cd -
mkdir -p tmp/hydra
mv "${DIR}/${LOG_NAME}.log" tmp/hydra/
42 changes: 34 additions & 8 deletions hydra/cli.py
@@ -1,18 +1,44 @@
import os
import click
from hydra.utils import *
from hydra.cloud.local_platform import LocalPlatform
from hydra.cloud.fast_local_platform import FastLocalPlatform
from hydra.version import __version__


@click.group()
@click.version_option(__version__)
def cli():
    pass

@click.command()
@click.argument('name')
def hello(name):
    click.echo('Hello %s!' % name)

@cli.command()
@click.option('--project_name')
@click.option('--model_name')
@click.option('--cpu')
@click.option('--memory')
@click.option('--options')
def train(project_name, model_name, cpu, memory, options):
    click.echo("This is the training command")
@click.option('-m', '--model_path', required=True, type=str)
@click.option('-c', '--cpu', default=16, type=click.IntRange(0, 128), help='Number of CPU cores required')
@click.option('-r', '--memory', default=8, type=click.IntRange(0, 128), help='GB of RAM required')
@click.option('--cloud', default='local', required=True, type=click.Choice(['fast_local','local', 'aws', 'gcp', 'azure'], case_sensitive=False))
@click.option('--github_token', envvar='GITHUB_TOKEN') # Takes either an option or environment var
@click.option('-o', '--options', default='{}', type=str, help='Environmental variables for the script')
def train(model_path, cpu, memory, github_token, cloud, options):
    prefix_params = json_to_string(options)

    if cloud == 'fast_local':
        platform = FastLocalPlatform(model_path, prefix_params)
        platform.train()

        return 0

    check_repo(github_token)
    git_url = get_repo_url()
    commit_sha = get_commit_sha()

    if cloud == 'local':
        platform = LocalPlatform(model_path, prefix_params, git_url, commit_sha, github_token)
        platform.train()

        return 0

    raise Exception("Reached parts of Hydra that are not yet implemented.")
Empty file added hydra/cloud/__init__.py
Empty file.
12 changes: 12 additions & 0 deletions hydra/cloud/abstract_platform.py
@@ -0,0 +1,12 @@


class AbstractPlatform():
    def __init__(self, model_path, prefix_params):
        self.model_path = model_path
        self.prefix_params = prefix_params

    def train(self):
        raise Exception("Not Implemented: Please implement this function in the subclass.")

    def serve(self):
        raise Exception("Not Implemented: Please implement this function in the subclass.")
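The base class above leaves `train` and `serve` to its subclasses. A minimal sketch of that pattern follows, assuming a trimmed copy of the base class and a hypothetical `EchoPlatform` subclass that is not part of this PR; it returns the command string instead of running it.

```python
class AbstractPlatform:
    """Trimmed stand-in mirroring the base class in this diff."""

    def __init__(self, model_path, prefix_params):
        self.model_path = model_path
        self.prefix_params = prefix_params

    def train(self):
        raise Exception("Not Implemented: Please implement this function in the subclass.")


class EchoPlatform(AbstractPlatform):
    """Hypothetical subclass, used only for illustration."""

    def train(self):
        # Build the command string a local run would use,
        # but return it instead of executing anything.
        return " ".join([self.prefix_params, "python3", self.model_path])


command = EchoPlatform("train.py", "epoch=88").train()
print(command)
```

Calling `train()` on the base class itself still raises, which is how an unimplemented platform would surface at runtime.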
13 changes: 13 additions & 0 deletions hydra/cloud/fast_local_platform.py
@@ -0,0 +1,13 @@
import os
from hydra.cloud.abstract_platform import AbstractPlatform

class FastLocalPlatform(AbstractPlatform):
    def __init__(self, model_path, prefix_params):
        super().__init__(model_path, prefix_params)

    def train(self):
        os.system(" ".join([self.prefix_params, 'python3', self.model_path]))
        return 0

    def serve(self):
        pass
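`FastLocalPlatform.train` relies on the shell to interpret the `KEY=VALUE` prefix as environment variables. As a rough alternative sketch (not what this PR does), the same effect can be had without a shell by parsing the prefix into a mapping and passing it to `subprocess.run`. The helper names `parse_prefix_params` and `run_with_env` are hypothetical, and the parser assumes values contain no spaces.

```python
import os
import subprocess
import sys


def parse_prefix_params(prefix_params):
    """Turn 'KEY=VALUE KEY2=VALUE2' into a dict (values must not contain spaces)."""
    pairs = (item.split("=", 1) for item in prefix_params.split())
    return {key: value for key, value in pairs}


def run_with_env(prefix_params, argv):
    """Run argv with the prefix parameters added to the current environment."""
    env = dict(os.environ)
    env.update(parse_prefix_params(prefix_params))
    return subprocess.run(argv, env=env).returncode


# The child process exits with 0 only if it can see the variable.
child_code = "import os, sys; sys.exit(0 if os.environ.get('epoch') == '88' else 1)"
exit_code = run_with_env("epoch=88", [sys.executable, "-c", child_code])
print(exit_code)
```

Unlike `os.system`, this form also makes the child's exit status available to the caller, so a failed training script does not have to be reported as success.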
14 changes: 14 additions & 0 deletions hydra/cloud/google_cloud.py
@@ -0,0 +1,14 @@
from hydra.cloud.abstract_platform import AbstractPlatform

class GoogleCloud(AbstractPlatform):
    def __init__(self, model_path, prefix_params, git_url, commit_sha, github_token):
        self.git_url = git_url
        self.commit_sha = commit_sha
        self.github_token = github_token
        super().__init__(model_path, prefix_params)

    def train(self):
        pass

    def serve(self):
        pass
21 changes: 21 additions & 0 deletions hydra/cloud/local_platform.py
@@ -0,0 +1,21 @@
import os
import subprocess
from hydra.cloud.abstract_platform import AbstractPlatform

class LocalPlatform(AbstractPlatform):
    def __init__(self, model_path, prefix_params, git_url, commit_sha, github_token):
        self.git_url = git_url
        self.commit_sha = commit_sha
        self.github_token = github_token
        super().__init__(model_path, prefix_params)

    def train(self):
        execution_script_path = os.path.join(os.path.dirname(__file__), '../../docker/local_execution.sh')
        command = ['sh', execution_script_path, self.git_url, self.commit_sha,
                   self.github_token, self.model_path, self.prefix_params]

        subprocess.run(command)
        return 0

    def serve(self):
        pass
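To make the argument order that `LocalPlatform.train` hands to `local_execution.sh` easier to see, here is a small sketch that only builds the command list. `build_local_command` and the example directory are hypothetical, and `posixpath` is used so the result is the same on any host.

```python
import posixpath


def build_local_command(module_dir, git_url, commit_sha, github_token, model_path, prefix_params):
    """Assemble the argv list in the order the script reads $1..$5."""
    script = posixpath.normpath(
        posixpath.join(module_dir, "../../docker/local_execution.sh")
    )
    return ["sh", script, git_url, commit_sha, github_token, model_path, prefix_params]


command = build_local_command(
    "/opt/hydra/hydra/cloud",          # assumed install location, illustration only
    "github.com/example/project.git",  # $1 GIT_URL
    "abc1234",                         # $2 COMMIT_SHA
    "not-a-real-token",                # $3 OAUTH_TOKEN
    "train.py",                        # $4 MODEL_PATH
    "epoch=88 lr=0.1",                 # $5 PREFIX_PARAMS
)
print(command[1])
```

Because the command is a list, `subprocess.run` passes `prefix_params` as a single argument even when it contains spaces; whether it stays intact afterwards depends on how the shell script expands its positional parameters.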
21 changes: 21 additions & 0 deletions hydra/git_repo.py
@@ -0,0 +1,21 @@

class GitRepo():
    def __init__(self, repo):
        self.repo = repo

    def is_empty(self):
        return self.repo.bare

    def is_untracked(self):
        return len(self.repo.untracked_files) > 0

    def is_modified(self):
        return len(self.repo.index.diff(None)) > 0

    def is_uncommitted(self):
        return len(self.repo.index.diff("HEAD")) > 0

    def is_unsynced(self):
        branch_name = self.repo.active_branch.name
        count_unpushed_commits = len(list(self.repo.iter_commits('origin/{}..{}'.format(branch_name, branch_name))))
        return count_unpushed_commits > 0
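Since `GitRepo` only reads attributes from the object it wraps, it can be exercised without a real repository. The sketch below assumes a trimmed copy of the class (two checks only) and a hypothetical stand-in object built with `types.SimpleNamespace`; it is not the test code from this PR.

```python
from types import SimpleNamespace


class GitRepo:
    """Trimmed copy of the wrapper in this diff (two checks only)."""

    def __init__(self, repo):
        self.repo = repo

    def is_empty(self):
        return self.repo.bare

    def is_untracked(self):
        return len(self.repo.untracked_files) > 0


# Hypothetical stand-in for a git.Repo object; only the attributes read above exist.
fake_repo = SimpleNamespace(bare=False, untracked_files=["notes.txt"])
wrapped = GitRepo(fake_repo)

print(wrapped.is_empty(), wrapped.is_untracked())
```

The remaining checks (`is_modified`, `is_uncommitted`, `is_unsynced`) call methods on the wrapped object, so a stand-in for those would need callables such as `unittest.mock.Mock` instead of plain attributes.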
55 changes: 55 additions & 0 deletions hydra/utils.py
@@ -0,0 +1,55 @@
import re
import os
import git
import json
import warnings
import subprocess
from collections import OrderedDict
from hydra.git_repo import GitRepo


def json_to_string(packet):
    od = json.loads(packet, object_pairs_hook=OrderedDict)

    params = ""
    for key, value in od.items():
        params += key + "=" + str(value) + " "

    return params.strip()


def get_repo_url():
    git_url = subprocess.check_output("git config --get remote.origin.url", shell=True).decode("utf-8").strip()
    git_url = re.compile(r"https?://(www\.)?").sub("", git_url).strip().strip('/')
    return git_url


def get_commit_sha():
    commit_sha = subprocess.check_output("git log --pretty=tformat:'%h' -n1 .", shell=True).decode("utf-8").strip()
    return commit_sha


def check_repo(github_token, repo=None):
    if github_token is None:
        raise Exception("GITHUB_TOKEN not found in environment variable or as argument.")

    if repo is None:
        repo = git.Repo(os.getcwd())
    repo = GitRepo(repo)

    if repo.is_empty():
        raise Exception("Hydra is not being called in the root of a git repo.")

    if repo.is_untracked():
        warnings.warn("Some files are not tracked by git.", UserWarning)

    if repo.is_modified():
        raise Exception("Some modified files are not staged for commit.")

    if repo.is_uncommitted():
        raise Exception("Some staged files are not committed.")

    if repo.is_unsynced():
        raise Exception("Some commits are not pushed to the remote repo.")

    return 0
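For reference, a self-contained sketch of what the two string helpers above produce. `json_to_string` is rewritten here in an equivalent compact form, and `strip_scheme` is a hypothetical name for the regex step inside `get_repo_url`, taking the URL as an argument so that no git call is needed; the example URL is made up.

```python
import json
import re
from collections import OrderedDict


def json_to_string(packet):
    """Convert a JSON object string into space-separated KEY=VALUE pairs."""
    od = json.loads(packet, object_pairs_hook=OrderedDict)
    return " ".join("{}={}".format(key, value) for key, value in od.items())


def strip_scheme(git_url):
    """Drop the http(s):// prefix and surrounding slashes, as get_repo_url does."""
    return re.sub(r"https?://(www\.)?", "", git_url).strip().strip("/")


params = json_to_string('{"epoch": 88, "lr": 0.1}')
url = strip_scheme("https://github.com/example/project.git/")

print(params)  # epoch=88 lr=0.1
print(url)     # github.com/example/project.git
```

Note that `json.loads` requires double-quoted keys, so a value such as `{'epoch': 88}` passed through `--options` would be rejected with a `json.JSONDecodeError`.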
2 changes: 1 addition & 1 deletion hydra/version.py
@@ -1 +1 @@
__version__ = '0.1.0'
__version__ = '0.1.0'
5 changes: 4 additions & 1 deletion requirements.txt
@@ -1 +1,4 @@
click==7.1.2
click==7.1.2
pytest==6.1.1
pytest_mock==3.3.1
GitPython==3.1.9
56 changes: 56 additions & 0 deletions tests/test_cil.py
@@ -0,0 +1,56 @@
import pytest
from hydra.cli import *
from click.testing import CliRunner

VALID_MODEL_PATH = "d3bug.py"
VALID_REPO_URL = "https://georgian.io/"
VALID_COMMIT_SHA = "m1rr0r1ng"
VALID_FILE_PATH = "ones/and/zer0es"
VALID_GITHUB_TOKEN = "Georgian"
VALID_PREFIX_PARAMS = "{'epoch': 88}"

def test_hello_world():
    runner = CliRunner()
    result = runner.invoke(hello, ['Peter'])
    assert result.exit_code == 0
    assert result.output == 'Hello Peter!\n'

def test_train_local(mocker):
    def stub(dummy):
        pass

    mocker.patch(
        "hydra.cli.check_repo",
        stub
    )
    mocker.patch(
        "hydra.cli.get_repo_url",
        return_value=VALID_REPO_URL
    )
    mocker.patch(
        "hydra.cli.get_commit_sha",
        return_value=VALID_COMMIT_SHA
    )
    mocker.patch(
        "hydra.cli.os.path.join",
        return_value=VALID_FILE_PATH
    )
    mocker.patch(
        "hydra.cli.json_to_string",
        return_value=VALID_PREFIX_PARAMS
    )

    mocker.patch(
        'hydra.cli.subprocess.run',
    )

    runner = CliRunner()
    result = runner.invoke(train, ['--model_path', VALID_MODEL_PATH, '--cloud', 'local', '--github_token', VALID_GITHUB_TOKEN])


    subprocess.run.assert_called_once_with(
        ['sh', VALID_FILE_PATH,
         VALID_REPO_URL, VALID_COMMIT_SHA, VALID_GITHUB_TOKEN,
         VALID_MODEL_PATH, VALID_PREFIX_PARAMS])

    assert result.exit_code == 0
4 changes: 0 additions & 4 deletions tests/test_dummy.py

This file was deleted.
