unify organization of all example scripts under 1 repo #176
Pull Request Overview
Adds a new example Jupyter notebook demonstrating downloading Numerai data, training a LightGBM model, and exporting a pickled predict function for upload.
Key changes:
- Introduces example_model.ipynb with data download, model training, and serialization steps.
- Includes environment setup cell installing dependencies.
- Defines a predict function and exports it via cloudpickle.
| ], | ||
| "source": [ | ||
| "# Install dependencies\n", | ||
| "!pip install -q --upgrade numerapi pandas pyarrow matplotlib lightgbm scikit-learn scipy cloudpickle==3.1.1" |
Copilot
AI
Oct 8, 2025
Unpinned bulk upgrading of multiple core libraries is causing the dependency conflicts shown in the cell output (pandas, scipy, scikit-learn, pyarrow). Recommend pinning compatible versions (or using a requirements.txt) and avoiding --upgrade unless necessary to ensure reproducible, conflict-free environments.
Suggested change:
- "!pip install -q --upgrade numerapi pandas pyarrow matplotlib lightgbm scikit-learn scipy cloudpickle==3.1.1"
+ "!pip install -q numerapi==0.7.9 pandas==2.2.2 pyarrow==14.0.2 matplotlib==3.7.1 lightgbm==4.3.0 scikit-learn==1.3.0 scipy==1.11.4 cloudpickle==3.1.1"
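One way to confirm that a runtime actually matches the pins is to compare installed versions against them after the install cell runs. The sketch below copies the pins from the suggested install line; whether that set is mutually compatible is not verified here, and the helper name `find_mismatches` is hypothetical.

```python
from importlib import metadata

# Pins copied from the suggested install line; not independently verified.
PINNED = {
    "numerapi": "0.7.9",
    "pandas": "2.2.2",
    "pyarrow": "14.0.2",
    "matplotlib": "3.7.1",
    "lightgbm": "4.3.0",
    "scikit-learn": "1.3.0",
    "scipy": "1.11.4",
    "cloudpickle": "3.1.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that differ from their pin.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running this in a cell right after the install step would surface any package that pip left at a different version.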
| "napi.download_dataset(f\"{DATA_VERSION}/features.json\")\n", | ||
| "\n", | ||
| "# Load data\n", | ||
| "feature_metadata = json.load(open(f\"{DATA_VERSION}/features.json\"))\n", |
Copilot
AI
Oct 8, 2025
File is opened without a context manager; this can leak file descriptors if an exception occurs. Prefer using: with open(f"{DATA_VERSION}/features.json") as f: feature_metadata = json.load(f).
Suggested change:
- "feature_metadata = json.load(open(f\"{DATA_VERSION}/features.json\"))\n",
+ "with open(f\"{DATA_VERSION}/features.json\") as f:\n",
+ "    feature_metadata = json.load(f)\n",
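If the notebook loads more than one JSON file, the same pattern could be wrapped in a small helper. This is a minimal sketch; the function name `load_feature_metadata` and the explicit UTF-8 encoding are assumptions, not part of the notebook.

```python
import json
from pathlib import Path


def load_feature_metadata(data_version):
    """Load <data_version>/features.json; the file is closed even if parsing fails."""
    path = Path(data_version) / "features.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

In the notebook this would be called as `feature_metadata = load_feature_metadata(DATA_VERSION)`.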
| "# Define predict function\n", | ||
| "def predict(\n", | ||
| " live_features: pd.DataFrame,\n", | ||
| " live_benchmark_models: pd.DataFrame\n", | ||
| " ) -> pd.DataFrame:\n", | ||
| " live_predictions = model.predict(live_features[features])\n", | ||
| " submission = pd.Series(live_predictions, index=live_features.index)\n", | ||
| " return submission.to_frame(\"prediction\")\n", |
Copilot
AI
Oct 8, 2025
The parameter live_benchmark_models is accepted but never used, which can confuse users of the example. Either remove it or document its intended future use.
| "except:\n", | ||
| " pass" |
Copilot
AI
Oct 8, 2025
Bare except swallows all errors (including unexpected runtime issues) making debugging harder. Narrow this to except ImportError (or specific exceptions) and optionally log a brief message.
Suggested change:
- "except:\n",
- "    pass"
+ "except ImportError:\n",
+ "    print('google.colab not available; skipping file download.')"
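A slightly stricter variant keeps only the import inside the `try`, so that an error raised by the download itself is not mistaken for a missing module. This is a sketch that assumes the original cell calls `google.colab.files.download`; the helper name `download_in_colab` and the file name in the usage note are hypothetical.

```python
def download_in_colab(path):
    """Trigger a browser download when running in Colab; return True if attempted."""
    try:
        # Only importable inside Google Colab.
        from google.colab import files
    except ImportError:
        print("google.colab not available; skipping file download.")
        return False
    # Outside the try block: failures here propagate instead of being swallowed.
    files.download(path)
    return True
```

Calling `download_in_colab("predict.pkl")` outside Colab prints the message and returns `False`.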
| "execution_count": 1, | ||
| "metadata": { | ||
| "colab": { | ||
| "base_uri": "https://localhost:8080/" | ||
| }, | ||
| "id": "Ekw8Z93ljC3v", | ||
| "outputId": "bdd16698-2ad0-4423-b090-c5ce55fe3053" | ||
| }, | ||
| "outputs": [ | ||
| { | ||
| "output_type": "stream", | ||
| "name": "stdout", | ||
| "text": [ | ||
| "Python 3.11.13\n" | ||
| ] | ||
| } | ||
| ], | ||
| "source": [ | ||
| "!python --version" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", |
Copilot
AI
Oct 8, 2025
[nitpick] Kernel metadata specifies Python 3.10.12 while the executed environment reports Python 3.11.13, creating reproducibility ambiguity. Align the kernelspec version with the actual runtime or remove the explicit version display cell.
| "execution_count": 1, | |
| "metadata": { | |
| "colab": { | |
| "base_uri": "https://localhost:8080/" | |
| }, | |
| "id": "Ekw8Z93ljC3v", | |
| "outputId": "bdd16698-2ad0-4423-b090-c5ce55fe3053" | |
| }, | |
| "outputs": [ | |
| { | |
| "output_type": "stream", | |
| "name": "stdout", | |
| "text": [ | |
| "Python 3.11.13\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!python --version" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", |
| "name": "python", | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython3", | ||
| "version": "3.10.12" |
Copilot
AI
Oct 8, 2025
[nitpick] Kernel metadata specifies Python 3.10.12 while the executed environment reports Python 3.11.13, creating reproducibility ambiguity. Align the kernelspec version with the actual runtime or remove the explicit version display cell.
Suggested change:
- "version": "3.10.12"
+ "version": "3.11.13"
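The mismatch can also be detected programmatically rather than by eye. The sketch below reads the declared version from the notebook's `metadata.language_info.version` field and compares it with the running interpreter; the helper name `kernel_version_mismatch` is hypothetical.

```python
import platform


def kernel_version_mismatch(notebook, runtime_version=None):
    """Return (declared, actual) if the notebook's declared Python version
    differs from the runtime version, else None."""
    declared = notebook.get("metadata", {}).get("language_info", {}).get("version")
    actual = runtime_version or platform.python_version()
    if declared is not None and declared != actual:
        return declared, actual
    return None
```

For a notebook on disk, `notebook` would be the dict returned by `json.load` on the `.ipynb` file.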
Pull Request Overview
Copilot reviewed 1 out of 6 changed files in this pull request and generated 2 comments.
| "output_type": "stream", | ||
| "name": "stdout", | ||
| "text": [ | ||
| "Python 3.11.13\n" |
Copilot
AI
Oct 8, 2025
The runtime reports Python 3.11.13 while the notebook metadata declares 3.10.12; this mismatch can cause confusion or environment re-creation issues. Update the kernelspec/language_info version to match the actual runtime or remove the hard-coded version if managed externally.
| "def predict(\n", | ||
| " live_features: pd.DataFrame,\n", | ||
| " live_benchmark_models: pd.DataFrame\n", | ||
| " ) -> pd.DataFrame:\n", | ||
| " live_predictions = model.predict(live_features[features])\n", | ||
| " submission = pd.Series(live_predictions, index=live_features.index)\n", | ||
| " return submission.to_frame(\"prediction\")\n", |
Copilot
AI
Oct 8, 2025
The parameter 'live_benchmark_models' is never used inside predict; if required only for interface compliance, prefix it with '_' (e.g. _live_benchmark_models) or add a brief comment to clarify intent. Otherwise remove it to reduce noise.
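The underscore-prefix option might look like the sketch below. The `_MeanModel` class and the `features` list are hypothetical stand-ins for the notebook's trained LightGBM model and feature set. The sketch also assumes the caller passes both arguments positionally, which this thread does not confirm; a caller that passes `live_benchmark_models=` by keyword would break after the rename.

```python
import pandas as pd


class _MeanModel:
    """Hypothetical stand-in for the trained LightGBM model."""

    def predict(self, frame: pd.DataFrame):
        return frame.mean(axis=1).to_numpy()


features = ["feature_a", "feature_b"]  # placeholder feature names
model = _MeanModel()


def predict(
    live_features: pd.DataFrame,
    _live_benchmark_models: pd.DataFrame,
) -> pd.DataFrame:
    """Return one prediction per row of live_features.

    _live_benchmark_models is intentionally unused; it is kept (assumption)
    only so the function still accepts two positional arguments.
    """
    live_predictions = model.predict(live_features[features])
    submission = pd.Series(live_predictions, index=live_features.index)
    return submission.to_frame("prediction")
```

If keyword calls are possible, keeping the original name and adding a one-line comment is the safer of the two options.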
No description provided.