Merged
45 changes: 31 additions & 14 deletions CONTRIBUTING.md
@@ -19,7 +19,7 @@ local disk:
$ cd openml-python
```

3. Swith to the ``develop`` branch:
3. Switch to the ``develop`` branch:

```bash
$ git checkout develop
@@ -31,7 +31,8 @@ local disk:
$ git checkout -b feature/my-feature
```

Always use a ``feature`` branch. It's good practice to never work on the ``master`` or ``develop`` branch! To make the nature of your pull request easily visible, please perpend the name of the branch with the type of changes you want to merge, such as ``feature`` if it contains a new feature, ``fix`` for a bugfix, ``doc`` for documentation and ``maint`` for other maintenance on the package.
Always use a ``feature`` branch. It's good practice to never work on the ``master`` or ``develop`` branch!
To make the nature of your pull request easily visible, please prepend the name of the branch with the type of changes you want to merge, such as ``feature`` if it contains a new feature, ``fix`` for a bugfix, ``doc`` for documentation and ``maint`` for other maintenance on the package.

4. Develop the feature on your feature branch. Add changed files using ``git add`` and then ``git commit`` files:

@@ -59,7 +60,15 @@ We recommended that your contribution complies with the
following rules before you submit a pull request:

- Follow the
[pep8 style guilde](https://www.python.org/dev/peps/pep-0008/).
[pep8 style guide](https://www.python.org/dev/peps/pep-0008/).
With the following exceptions or additions:
- The max line length is 100 characters instead of 80.
- When creating a multi-line expression with binary operators, break before the operator.
- Add type hints to all function signatures.
(note: not all functions have type hints yet; this is work in progress.)
- Use [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format) rather than [`printf`](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)-style formatting.
E.g. use `"{} {}".format('hello', 'world')`, not `"%s %s" % ('hello', 'world')`.
(note: old code may still use `printf`-style formatting; this is work in progress.)

- If your pull request addresses an issue, please use the pull request title
to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is
@@ -105,18 +114,18 @@ tools:
$ pytest --cov=. path/to/tests_for_package
```

- No pyflakes warnings, check with:
- No style warnings, check with:

```bash
$ pip install pyflakes
$ pyflakes path/to/module.py
$ pip install flake8
$ flake8 --ignore E402,W503 --show-source --max-line-length 100
```

- No PEP8 warnings, check with:
- No mypy (typing) issues, check with:

```bash
$ pip install pep8
$ pep8 path/to/module.py
$ pip install mypy
$ mypy openml --ignore-missing-imports --follow-imports skip
```

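Putting the style rules above together (lines under 100 characters, breaks before binary operators, type hints on signatures, `str.format` rather than `printf`-style formatting), a short illustrative sketch — a hypothetical helper, not code from openml-python:

```python
from typing import List


def describe(name: str, values: List[float]) -> str:
    """Summarize a list of values, following the style conventions above."""
    # Multi-line expression: break *before* the binary operator (W503 is ignored).
    spread = (
        max(values)
        - min(values)
    )
    # Prefer str.format over printf-style formatting.
    return "{}: n={}, spread={}".format(name, len(values), spread)


print(describe("accuracy", [1.0, 3.0, 6.0]))
```

A function written this way passes both the `flake8` and `mypy` invocations shown above.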
Filing bugs
@@ -151,8 +160,8 @@ following rules before submitting:
New contributor tips
--------------------

A great way to start contributing to scikit-learn is to pick an item
from the list of [Easy issues](https://github.com/openml/openml-python/issues?q=label%3Aeasy)
A great way to start contributing to openml-python is to pick an item
from the list of [Good First Issues](https://github.com/openml/openml-python/labels/Good%20first%20issue)
in the issue tracker. Resolving these issues allows you to start
contributing to the project without much prior knowledge. Your
assistance in this area will be greatly appreciated by the more
@@ -175,6 +184,14 @@ information.

For building the documentation, you will need
[sphinx](http://sphinx.pocoo.org/),
[matplotlib](http://matplotlib.org/), and
[pillow](http://pillow.readthedocs.io/en/latest/).
[sphinx-bootstrap-theme](https://ryan-roemer.github.io/sphinx-bootstrap-theme/)
[sphinx-bootstrap-theme](https://ryan-roemer.github.io/sphinx-bootstrap-theme/),
[sphinx-gallery](https://sphinx-gallery.github.io/)
and
[numpydoc](https://numpydoc.readthedocs.io/en/latest/).
```bash
$ pip install sphinx sphinx-bootstrap-theme sphinx-gallery numpydoc
```
When dependencies are installed, run
```bash
$ sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY
```
3 changes: 3 additions & 0 deletions ci_scripts/flake8_diff.sh
@@ -1,4 +1,7 @@
#!/bin/bash

# Update /CONTRIBUTING.md if these commands change.
# The reason for not advocating using this script directly is that it
# might not work out of the box on Windows.
flake8 --ignore E402,W503 --show-source --max-line-length 100 $options
mypy openml --ignore-missing-imports --follow-imports skip
13 changes: 10 additions & 3 deletions doc/contributing.rst
@@ -90,12 +90,19 @@ The package source code is available from
git clone https://github.com/openml/openml-python.git


Once you cloned the package, change into the new directory ``python`` and
execute
Once you cloned the package, change into the new directory.
If you are a regular user, install with

.. code:: bash

python setup.py install
pip install -e .

If you are a contributor, you will also need to install test dependencies

.. code:: bash

pip install -e ".[test]"


Testing
=======
2 changes: 2 additions & 0 deletions doc/progress.rst
@@ -12,6 +12,8 @@ Changelog
0.9.0
~~~~~

* MAINT #596: Fewer dependencies for regular pip install.
* MAINT #652: Numpy and Scipy are no longer required before installation.
* ADD #560: OpenML-Python can now handle regression tasks as well.
* MAINT #184: Dropping Python2 support.

68 changes: 29 additions & 39 deletions openml/datasets/functions.py
@@ -1,7 +1,6 @@
import io
import os
import re
import warnings
from typing import List, Dict, Union

import numpy as np
@@ -10,11 +9,6 @@

import xmltodict
from scipy.sparse import coo_matrix
# Currently, importing oslo raises a lot of warning that it will stop working
# under python3.8; remove this once they disappear
with warnings.catch_warnings():
warnings.simplefilter("ignore")
from oslo_concurrency import lockutils
from collections import OrderedDict

import openml.utils
@@ -29,8 +23,7 @@
from ..utils import (
_create_cache_directory,
_remove_cache_dir_for_id,
_create_cache_directory_for_id,
_create_lockfiles_dir,
_create_cache_directory_for_id
)


@@ -334,6 +327,7 @@ def get_datasets(
return datasets


@openml.utils.thread_safe_if_oslo_installed
def get_dataset(dataset_id: Union[int, str], download_data: bool = True) -> OpenMLDataset:
""" Download the OpenML dataset representation, optionally also download actual data file.

@@ -361,38 +355,34 @@ def get_dataset(dataset_id: Union[int, str], download_data: bool = True) -> Open
raise ValueError("Dataset ID is neither an Integer nor can be "
"cast to an Integer.")

with lockutils.external_lock(
name='datasets.functions.get_dataset:%d' % dataset_id,
lock_path=_create_lockfiles_dir(),
):
did_cache_dir = _create_cache_directory_for_id(
DATASETS_CACHE_DIR_NAME, dataset_id,
)
did_cache_dir = _create_cache_directory_for_id(
DATASETS_CACHE_DIR_NAME, dataset_id,
)

try:
remove_dataset_cache = True
description = _get_dataset_description(did_cache_dir, dataset_id)
features = _get_dataset_features(did_cache_dir, dataset_id)
qualities = _get_dataset_qualities(did_cache_dir, dataset_id)

arff_file = _get_dataset_arff(description) if download_data else None

remove_dataset_cache = False
except OpenMLServerException as e:
# if there was an exception,
# check if the user had access to the dataset
if e.code == 112:
raise OpenMLPrivateDatasetError(e.message) from None
else:
raise e
finally:
if remove_dataset_cache:
_remove_cache_dir_for_id(DATASETS_CACHE_DIR_NAME,
did_cache_dir)

dataset = _create_dataset_from_description(
description, features, qualities, arff_file
)
try:
remove_dataset_cache = True
description = _get_dataset_description(did_cache_dir, dataset_id)
features = _get_dataset_features(did_cache_dir, dataset_id)
qualities = _get_dataset_qualities(did_cache_dir, dataset_id)

arff_file = _get_dataset_arff(description) if download_data else None

remove_dataset_cache = False
except OpenMLServerException as e:
# if there was an exception,
# check if the user had access to the dataset
if e.code == 112:
raise OpenMLPrivateDatasetError(e.message) from None
else:
raise e
finally:
if remove_dataset_cache:
_remove_cache_dir_for_id(DATASETS_CACHE_DIR_NAME,
did_cache_dir)

dataset = _create_dataset_from_description(
description, features, qualities, arff_file
)
return dataset
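The `@openml.utils.thread_safe_if_oslo_installed` decorator added above replaces the explicit `lockutils.external_lock` blocks; its definition is not part of this diff. A minimal sketch of the idea — assumed behavior, with illustrative names and lock details rather than the actual `openml.utils` implementation:

```python
import functools
import tempfile


def thread_safe_if_oslo_installed(func):
    """Take an oslo external lock around `func` when oslo.concurrency is
    importable; otherwise return `func` unchanged (oslo is now optional)."""
    try:
        from oslo_concurrency import lockutils
    except ImportError:
        # oslo is not installed: no locking, transparent pass-through.
        return func

    @functools.wraps(func)
    def safe_func(*args, **kwargs):
        # Lock name and path are illustrative; the real package derives
        # them from the entity id and its own lockfiles directory.
        with lockutils.external_lock(name=func.__name__,
                                     lock_path=tempfile.gettempdir()):
            return func(*args, **kwargs)

    return safe_func


@thread_safe_if_oslo_installed
def get_answer(x: int) -> int:
    return x + 1
```

Either way the decorated function behaves identically to the undecorated one; only inter-process locking is added when oslo is present.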


8 changes: 2 additions & 6 deletions openml/flows/functions.py
@@ -5,7 +5,6 @@
import re
import xmltodict
from typing import Union, Dict
from oslo_concurrency import lockutils

from ..exceptions import OpenMLCacheException
import openml._api_calls
@@ -70,6 +69,7 @@ def _get_cached_flow(fid: int) -> OpenMLFlow:
"cached" % fid)


@openml.utils.thread_safe_if_oslo_installed
def get_flow(flow_id: int, reinstantiate: bool = False) -> OpenMLFlow:
"""Download the OpenML flow for a given flow ID.

@@ -87,11 +87,7 @@ def get_flow(flow_id: int, reinstantiate: bool = False) -> OpenMLFlow:
the flow
"""
flow_id = int(flow_id)
with lockutils.external_lock(
name='flows.functions.get_flow:%d' % flow_id,
lock_path=openml.utils._create_lockfiles_dir(),
):
flow = _get_flow_description(flow_id)
flow = _get_flow_description(flow_id)

if reinstantiate:
flow.model = flow.extension.flow_to_model(flow)
1 change: 1 addition & 0 deletions openml/runs/functions.py
@@ -466,6 +466,7 @@ def get_runs(run_ids):
return runs


@openml.utils.thread_safe_if_oslo_installed
def get_run(run_id):
"""Gets run corresponding to run_id.

58 changes: 24 additions & 34 deletions openml/tasks/functions.py
@@ -2,13 +2,6 @@
import io
import re
import os
import warnings

# Currently, importing oslo raises a lot of warning that it will stop working
# under python3.8; remove this once they disappear
with warnings.catch_warnings():
warnings.simplefilter("ignore")
from oslo_concurrency import lockutils
import xmltodict

from ..exceptions import OpenMLCacheException
@@ -300,6 +293,7 @@ def get_tasks(task_ids, download_data=True):
return tasks


@openml.utils.thread_safe_if_oslo_installed
def get_task(task_id: int, download_data: bool = True) -> OpenMLTask:
"""Download OpenML task for a given task ID.

@@ -324,34 +318,30 @@ def get_task(task_id: int, download_data: bool = True) -> OpenMLTask:
raise ValueError("Dataset ID is neither an Integer nor can be "
"cast to an Integer.")

with lockutils.external_lock(
name='task.functions.get_task:%d' % task_id,
lock_path=openml.utils._create_lockfiles_dir(),
):
tid_cache_dir = openml.utils._create_cache_directory_for_id(
TASKS_CACHE_DIR_NAME, task_id,
)
tid_cache_dir = openml.utils._create_cache_directory_for_id(
TASKS_CACHE_DIR_NAME, task_id,
)

try:
task = _get_task_description(task_id)
dataset = get_dataset(task.dataset_id, download_data)
# List of class labels availaible in dataset description
# Including class labels as part of task meta data handles
# the case where data download was initially disabled
if isinstance(task, OpenMLClassificationTask):
task.class_labels = \
dataset.retrieve_class_labels(task.target_name)
# Clustering tasks do not have class labels
# and do not offer download_split
if download_data:
if isinstance(task, OpenMLSupervisedTask):
task.download_split()
except Exception as e:
openml.utils._remove_cache_dir_for_id(
TASKS_CACHE_DIR_NAME,
tid_cache_dir,
)
raise e
try:
task = _get_task_description(task_id)
dataset = get_dataset(task.dataset_id, download_data)
# List of class labels available in dataset description
# Including class labels as part of task meta data handles
# the case where data download was initially disabled
if isinstance(task, OpenMLClassificationTask):
task.class_labels = \
dataset.retrieve_class_labels(task.target_name)
# Clustering tasks do not have class labels
# and do not offer download_split
if download_data:
if isinstance(task, OpenMLSupervisedTask):
task.download_split()
except Exception as e:
openml.utils._remove_cache_dir_for_id(
TASKS_CACHE_DIR_NAME,
tid_cache_dir,
)
raise e

return task
