Maintain model name #16

Merged: 2 commits, Jun 2, 2023
10 changes: 3 additions & 7 deletions Makefile
@@ -27,19 +27,15 @@ test: test_3_9 test_3_10 test_3_11 ## Test all container versions

.PHONY: test_3_9
test_3_9: build_3_9 ## Test Python 3.9 pickle
docker run -i --rm --volume /tmp:/tmp ${NAME}_py_3_9:latest --model https://huggingface.co/pschork/hello-numerai-models/resolve/main/model_3.9.pkl

.PHONY: test_colab
test_colab: build_3_9 ## Test Python 3.9 pickle colab export
docker run -i --rm --volume /tmp:/tmp ${NAME}_py_3_9:latest --model https://huggingface.co/pschork/hello-numerai-models/resolve/main/colab_3.9.16.pkl
docker run -i --rm -v ${PWD}:${PWD} -v /tmp:/tmp ${NAME}_py_3_9:latest --model ${PWD}/tests/models/model_3_9.pkl

.PHONY: test_3_10
test_3_10: build_3_10 ## Test Python 3.10 pickle
docker run -i --rm --volume /tmp:/tmp ${NAME}_py_3_10:latest --model https://huggingface.co/pschork/hello-numerai-models/resolve/main/model_3.10.pkl
docker run -i --rm -v ${PWD}:${PWD} -v /tmp:/tmp ${NAME}_py_3_10:latest --model ${PWD}/tests/models/model_3_10.pkl

.PHONY: test_3_11
test_3_11: build_3_11 ## Test Python 3.11 pickle
docker run -i --rm --volume /tmp:/tmp ${NAME}_py_3_11:latest --dataset /tmp/v4.1/live.parquet --model https://huggingface.co/pschork/hello-numerai-models/resolve/main/model_3.11.pkl
docker run -i --rm -v ${PWD}:${PWD} -v /tmp:/tmp ${NAME}_py_3_11:latest --model ${PWD}/tests/models/model_3_11.pkl

.PHONY: release
release: release_3_9 release_3_10 release_3_11 ## Push all container tagged releases
20 changes: 12 additions & 8 deletions README.md
@@ -1,24 +1,28 @@
# Numerai Model Prediction Docker Environment

## Local testing of pickle models
You can run a local pickle model via
```
docker run -i --rm -v "$PWD:$PWD" numerai_predict_py_3_9:latest --model $PWD/model.pkl --dataset v4.1/live.parquet --debug
```

## Presigned S3 URLs
Presigned GET and POST urls are used to ensure that only the specified model is downloaded during execution
and that model prediction uploads from other models are not accessed or tampered with.

## Presigned S3 model URL
The `--model` arg is designed to accept a pre-signed S3 GET URL generated via boto3
```
params = dict(Bucket='numerai-user-models', Key='integration_test')
params = dict(Bucket='numerai-pickled-user-models',
Key='5a5a8da7-05a4-41bf-9c2b-7f61bab5b89b/model-Kc5pT9r85SRD.pkl')
presigned_model_url = s3_client.generate_presigned_url("get_object", params, ExpiresIn=600)
```

## Presigned S3 post URL
The `--post_url` and `--post_data` args are designed to accept a pre-signed S3 POST URL + urlencoded data dictionary
generated via boto3
```
presigned_post = s3_client.generate_presigned_post(Bucket='numerai-user-models', Key='integration_test', ExpiresIn=600)
presigned_post = s3_client.generate_presigned_post(Bucket='numerai-pickled-user-models-live-output',
Key='5a5a8da7-05a4-41bf-9c2b-7f61bab5b89b/live_predictions-b7446fc4cc7e.csv',
ExpiresIn=600)
post_url = presigned_post['url']
post_data = urllib.parse.urlencode(presigned_post['fields'])
```

## Docker container shell
You can get a shell in the container via `docker run -it --entrypoint /bin/bash numerai_predict_py3.9`
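
A note on the `post_data` line in the README snippet above: `urllib.parse.urlencode` flattens the `fields` mapping into a single string, and it can be decoded back with `urllib.parse.parse_qsl`. The sketch below illustrates only that round trip. The `presigned_post` dictionary is a hypothetical stand-in with made-up values, not a real boto3 response, and how `predict.py` decodes `--post_data` is not shown in this diff.

```python
import urllib.parse

# Hypothetical stand-in for the dict returned by generate_presigned_post().
# A real response has "url" and "fields" keys; these values are made up.
presigned_post = {
    "url": "https://example-bucket.s3.amazonaws.com/",
    "fields": {
        "key": "some-prefix/live_predictions.csv",
        "policy": "ZXhhbXBsZQ==",
    },
}

post_url = presigned_post["url"]

# Same call as in the README: the fields become one URL-encoded string
# that can be passed as a single command-line argument.
post_data = urllib.parse.urlencode(presigned_post["fields"])

# Decoding the string recovers the original field mapping.
decoded_fields = dict(urllib.parse.parse_qsl(post_data))
```

Characters such as `/` and `=` are percent-encoded in `post_data`, so the value should survive shell quoting as long as it is wrapped in quotes.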

13 changes: 10 additions & 3 deletions predict.py
@@ -54,7 +54,7 @@ def parse_args():
return args


def predict(args):
def main(args):
if args.model.lower().startswith("http"):
truncated_url = args.model.split("?")[0]
logging.info(f"Downloading model {truncated_url}")
@@ -63,7 +63,8 @@ def predict(args):
logging.error(f"{response.reason} {response.text}")
sys.exit(1)

model_pkl = os.path.join(args.output_dir, f"model-{secrets.token_hex(6)}.pkl")
model_name = truncated_url.split("/")[-1]
model_pkl = os.path.join(args.output_dir, model_name)
logging.info(f"Saving model to {model_pkl}")
with open(model_pkl, "wb") as f:
shutil.copyfileobj(response.raw, f)
@@ -119,6 +120,12 @@ def predict(args):
except TypeError as e:
logging.error(f"Pickle function is invalid - {e}")
sys.exit(1)
except Exception as e:
if args.debug:
logging.exception(e)
else:
logging.error(e)
sys.exit(1)

logging.info(f"Generated {len(predictions)} predictions")
logging.debug(predictions)
@@ -141,4 +148,4 @@ def predict(args):


if __name__ == "__main__":
predict(parse_args())
main(parse_args())
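
The new `except Exception` branch in the hunk above logs a full traceback only when `args.debug` is set, and a one-line error otherwise. Below is a minimal sketch of that pattern in isolation. `run_and_report` and its `debug` parameter are hypothetical names for illustration, and the sketch returns a boolean where `predict.py` calls `sys.exit(1)`.

```python
import logging


def run_and_report(func, debug: bool) -> bool:
    """Call func(); log any failure and return False instead of raising."""
    try:
        func()
        return True
    except Exception as e:
        if debug:
            # logging.exception logs at ERROR level and appends the traceback.
            # It must be called from inside an exception handler.
            logging.exception(e)
        else:
            # Only the exception message, without the traceback.
            logging.error(e)
        return False
```

With `debug=False` the log carries just the message (for example `division by zero`). With `debug=True` the same message is followed by the traceback.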
Binary file added tests/models/model_3_10.pkl
Binary file not shown.
Binary file added tests/models/model_3_11.pkl
Binary file not shown.
Binary file added tests/models/model_3_9.pkl
Binary file not shown.
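
For reference, the model-name change in `predict.py` amounts to dropping the query string and keeping the last path segment of the URL. The sketch below isolates that logic. `model_name_from_url` is a hypothetical helper name, and the first example URL is made up.

```python
def model_name_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    # Drop everything after the first "?", e.g. presigned-URL signature parameters.
    truncated_url = url.split("?")[0]
    # The file name is whatever follows the final "/".
    return truncated_url.split("/")[-1]


# Made-up presigned-style URL, used only for illustration.
example_url = (
    "https://example-bucket.s3.amazonaws.com/"
    "some-prefix/model-abc123.pkl?X-Amz-Signature=placeholder&X-Amz-Expires=600"
)
model_name = model_name_from_url(example_url)

# URL taken from the Makefile lines in this diff.
hf_name = model_name_from_url(
    "https://huggingface.co/pschork/hello-numerai-models/resolve/main/model_3.9.pkl"
)
```

One edge case: a URL whose path ends in `/` produces an empty name under this approach, so callers may want a fallback in that situation.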