Merged
45 changes: 31 additions & 14 deletions CONTRIBUTING.md
@@ -19,7 +19,7 @@ local disk:
$ cd openml-python
```

3. Swith to the ``develop`` branch:
3. Switch to the ``develop`` branch:

```bash
$ git checkout develop
@@ -31,7 +31,8 @@ local disk:
$ git checkout -b feature/my-feature
```

Always use a ``feature`` branch. It's good practice to never work on the ``master`` or ``develop`` branch! To make the nature of your pull request easily visible, please perpend the name of the branch with the type of changes you want to merge, such as ``feature`` if it contains a new feature, ``fix`` for a bugfix, ``doc`` for documentation and ``maint`` for other maintenance on the package.
Always use a ``feature`` branch. It's good practice to never work on the ``master`` or ``develop`` branch!
To make the nature of your pull request easily visible, please prepend the name of the branch with the type of changes you want to merge, such as ``feature`` if it contains a new feature, ``fix`` for a bugfix, ``doc`` for documentation and ``maint`` for other maintenance on the package.

4. Develop the feature on your feature branch. Add changed files using ``git add`` and then ``git commit`` files:

@@ -59,7 +60,15 @@ We recommended that your contribution complies with the
following rules before you submit a pull request:

- Follow the
[pep8 style guilde](https://www.python.org/dev/peps/pep-0008/).
[pep8 style guide](https://www.python.org/dev/peps/pep-0008/).
With the following exceptions or additions:
- The max line length is 100 characters instead of 80.
- When creating a multi-line expression with binary operators, break before the operator.
- Add type hints to all function signatures.
(note: not all functions have type hints yet; this is work in progress.)
- Use [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format) rather than [`printf`](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)-style formatting.
E.g. use `"{} {}".format('hello', 'world')`, not `"%s %s" % ('hello', 'world')`.
(note: old code may still use `printf`-style formatting; this is work in progress.)

- If your pull request addresses an issue, please use the pull request title
to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is
@@ -105,18 +114,18 @@ tools:
$ pytest --cov=. path/to/tests_for_package
```

- No pyflakes warnings, check with:
- No style warnings, check with:

```bash
$ pip install pyflakes
$ pyflakes path/to/module.py
$ pip install flake8
$ flake8 --ignore E402,W503 --show-source --max-line-length 100
```

- No PEP8 warnings, check with:
- No mypy (typing) issues, check with:

```bash
$ pip install pep8
$ pep8 path/to/module.py
$ pip install mypy
$ mypy openml --ignore-missing-imports --follow-imports skip
```

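Putting the style rules above together (lines under 100 characters, breaks before binary operators, type hints on signatures, `str.format` rather than `printf`-style formatting), a short illustrative sketch — a hypothetical helper, not code from openml-python:

```python
from typing import List


def describe(name: str, values: List[float]) -> str:
    """Summarize a list of values, following the style conventions above."""
    # Multi-line expression: break *before* the binary operator (W503 is ignored).
    spread = (
        max(values)
        - min(values)
    )
    # Prefer str.format over printf-style formatting.
    return "{}: n={}, spread={}".format(name, len(values), spread)


print(describe("accuracy", [1.0, 3.0, 6.0]))
```

A function written this way passes both the `flake8` and `mypy` invocations shown above.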
Filing bugs
@@ -151,8 +160,8 @@ following rules before submitting:
New contributor tips
--------------------

A great way to start contributing to scikit-learn is to pick an item
from the list of [Easy issues](https://github.com/openml/openml-python/issues?q=label%3Aeasy)
A great way to start contributing to openml-python is to pick an item
from the list of [Good First Issues](https://github.com/openml/openml-python/labels/Good%20first%20issue)
in the issue tracker. Resolving these issues allows you to start
contributing to the project without much prior knowledge. Your
assistance in this area will be greatly appreciated by the more
@@ -175,6 +184,14 @@ information.

For building the documentation, you will need
[sphinx](http://sphinx.pocoo.org/),
[matplotlib](http://matplotlib.org/), and
[pillow](http://pillow.readthedocs.io/en/latest/).
[sphinx-bootstrap-theme](https://ryan-roemer.github.io/sphinx-bootstrap-theme/)
[sphinx-bootstrap-theme](https://ryan-roemer.github.io/sphinx-bootstrap-theme/),
[sphinx-gallery](https://sphinx-gallery.github.io/)
and
[numpydoc](https://numpydoc.readthedocs.io/en/latest/).
```bash
$ pip install sphinx sphinx-bootstrap-theme sphinx-gallery numpydoc
```
When dependencies are installed, run
```bash
$ sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY
```
3 changes: 3 additions & 0 deletions ci_scripts/flake8_diff.sh
@@ -1,4 +1,7 @@
#!/bin/bash

# Update /CONTRIBUTING.md if these commands change.
# The reason for not advocating using this script directly is that it
# might not work out of the box on Windows.
flake8 --ignore E402,W503 --show-source --max-line-length 100 $options
mypy openml --ignore-missing-imports --follow-imports skip
13 changes: 10 additions & 3 deletions doc/contributing.rst
@@ -90,12 +90,19 @@ The package source code is available from
git clone https://github.com/openml/openml-python.git


Once you cloned the package, change into the new directory ``python`` and
execute
Once you cloned the package, change into the new directory.
If you are a regular user, install with

.. code:: bash

python setup.py install
pip install -e .

If you are a contributor, you will also need to install test dependencies

.. code:: bash

pip install -e ".[test]"


Testing
=======
2 changes: 2 additions & 0 deletions doc/progress.rst
@@ -12,6 +12,8 @@ Changelog
0.9.0
~~~~~

* MAINT #596: Fewer dependencies for regular pip install.
* MAINT #652: Numpy and Scipy are no longer required before installation.
* ADD #560: OpenML-Python can now handle regression tasks as well.
* MAINT #184: Dropping Python2 support.

68 changes: 29 additions & 39 deletions openml/datasets/functions.py
@@ -1,7 +1,6 @@
import io
import os
import re
import warnings
from typing import List, Dict, Union

import numpy as np
@@ -10,11 +9,6 @@

import xmltodict
from scipy.sparse import coo_matrix
# Currently, importing oslo raises a lot of warning that it will stop working
# under python3.8; remove this once they disappear
with warnings.catch_warnings():
warnings.simplefilter("ignore")
from oslo_concurrency import lockutils
from collections import OrderedDict

import openml.utils
@@ -29,8 +23,7 @@
from ..utils import (
_create_cache_directory,
_remove_cache_dir_for_id,
_create_cache_directory_for_id,
_create_lockfiles_dir,
_create_cache_directory_for_id
)


@@ -334,6 +327,7 @@ def get_datasets(
return datasets


@openml.utils.thread_safe_if_oslo_installed
def get_dataset(dataset_id: Union[int, str], download_data: bool = True) -> OpenMLDataset:
""" Download the OpenML dataset representation, optionally also download actual data file.

@@ -361,38 +355,34 @@ def get_dataset(dataset_id: Union[int, str], download_data: bool = True) -> Open
raise ValueError("Dataset ID is neither an Integer nor can be "
"cast to an Integer.")

with lockutils.external_lock(
name='datasets.functions.get_dataset:%d' % dataset_id,
lock_path=_create_lockfiles_dir(),
):
did_cache_dir = _create_cache_directory_for_id(
DATASETS_CACHE_DIR_NAME, dataset_id,
)
did_cache_dir = _create_cache_directory_for_id(
DATASETS_CACHE_DIR_NAME, dataset_id,
)

try:
remove_dataset_cache = True
description = _get_dataset_description(did_cache_dir, dataset_id)
features = _get_dataset_features(did_cache_dir, dataset_id)
qualities = _get_dataset_qualities(did_cache_dir, dataset_id)

arff_file = _get_dataset_arff(description) if download_data else None

remove_dataset_cache = False
except OpenMLServerException as e:
# if there was an exception,
# check if the user had access to the dataset
if e.code == 112:
raise OpenMLPrivateDatasetError(e.message) from None
else:
raise e
finally:
if remove_dataset_cache:
_remove_cache_dir_for_id(DATASETS_CACHE_DIR_NAME,
did_cache_dir)

dataset = _create_dataset_from_description(
description, features, qualities, arff_file
)
try:
remove_dataset_cache = True
description = _get_dataset_description(did_cache_dir, dataset_id)
features = _get_dataset_features(did_cache_dir, dataset_id)
qualities = _get_dataset_qualities(did_cache_dir, dataset_id)

arff_file = _get_dataset_arff(description) if download_data else None

remove_dataset_cache = False
except OpenMLServerException as e:
# if there was an exception,
# check if the user had access to the dataset
if e.code == 112:
raise OpenMLPrivateDatasetError(e.message) from None
else:
raise e
finally:
if remove_dataset_cache:
_remove_cache_dir_for_id(DATASETS_CACHE_DIR_NAME,
did_cache_dir)

dataset = _create_dataset_from_description(
description, features, qualities, arff_file
)
return dataset
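The `@openml.utils.thread_safe_if_oslo_installed` decorator added above replaces the explicit `lockutils.external_lock` blocks; its definition is not part of this diff. A minimal sketch of the idea — assumed behavior, with illustrative names and lock details rather than the actual `openml.utils` implementation:

```python
import functools
import tempfile


def thread_safe_if_oslo_installed(func):
    """Take an oslo external lock around `func` when oslo.concurrency is
    importable; otherwise return `func` unchanged (oslo is now optional)."""
    try:
        from oslo_concurrency import lockutils
    except ImportError:
        # oslo is not installed: no locking, transparent pass-through.
        return func

    @functools.wraps(func)
    def safe_func(*args, **kwargs):
        # Lock name and path are illustrative; the real package derives
        # them from the entity id and its own lockfiles directory.
        with lockutils.external_lock(name=func.__name__,
                                     lock_path=tempfile.gettempdir()):
            return func(*args, **kwargs)

    return safe_func


@thread_safe_if_oslo_installed
def get_answer(x: int) -> int:
    return x + 1
```

Either way the decorated function behaves identically to the undecorated one; only inter-process locking is added when oslo is present.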


8 changes: 2 additions & 6 deletions openml/flows/functions.py
@@ -5,7 +5,6 @@
import re
import xmltodict
from typing import Union, Dict
from oslo_concurrency import lockutils

from ..exceptions import OpenMLCacheException
import openml._api_calls
@@ -70,6 +69,7 @@ def _get_cached_flow(fid: int) -> OpenMLFlow:
"cached" % fid)


@openml.utils.thread_safe_if_oslo_installed
def get_flow(flow_id: int, reinstantiate: bool = False) -> OpenMLFlow:
"""Download the OpenML flow for a given flow ID.

@@ -87,11 +87,7 @@ def get_flow(flow_id: int, reinstantiate: bool = False) -> OpenMLFlow:
the flow
"""
flow_id = int(flow_id)
with lockutils.external_lock(
name='flows.functions.get_flow:%d' % flow_id,
lock_path=openml.utils._create_lockfiles_dir(),
):
flow = _get_flow_description(flow_id)
flow = _get_flow_description(flow_id)

if reinstantiate:
flow.model = flow.extension.flow_to_model(flow)
1 change: 1 addition & 0 deletions openml/runs/functions.py
@@ -466,6 +466,7 @@ def get_runs(run_ids):
return runs


@openml.utils.thread_safe_if_oslo_installed
def get_run(run_id):
"""Gets run corresponding to run_id.

58 changes: 24 additions & 34 deletions openml/tasks/functions.py
@@ -2,13 +2,6 @@
import io
import re
import os
import warnings

# Currently, importing oslo raises a lot of warning that it will stop working
# under python3.8; remove this once they disappear
with warnings.catch_warnings():
warnings.simplefilter("ignore")
from oslo_concurrency import lockutils
import xmltodict

from ..exceptions import OpenMLCacheException
@@ -300,6 +293,7 @@ def get_tasks(task_ids, download_data=True):
return tasks


@openml.utils.thread_safe_if_oslo_installed
def get_task(task_id: int, download_data: bool = True) -> OpenMLTask:
"""Download OpenML task for a given task ID.

@@ -324,34 +318,30 @@ def get_task(task_id: int, download_data: bool = True) -> OpenMLTask:
raise ValueError("Dataset ID is neither an Integer nor can be "
"cast to an Integer.")

with lockutils.external_lock(
name='task.functions.get_task:%d' % task_id,
lock_path=openml.utils._create_lockfiles_dir(),
):
tid_cache_dir = openml.utils._create_cache_directory_for_id(
TASKS_CACHE_DIR_NAME, task_id,
)
tid_cache_dir = openml.utils._create_cache_directory_for_id(
TASKS_CACHE_DIR_NAME, task_id,
)

try:
task = _get_task_description(task_id)
dataset = get_dataset(task.dataset_id, download_data)
# List of class labels availaible in dataset description
# Including class labels as part of task meta data handles
# the case where data download was initially disabled
if isinstance(task, OpenMLClassificationTask):
task.class_labels = \
dataset.retrieve_class_labels(task.target_name)
# Clustering tasks do not have class labels
# and do not offer download_split
if download_data:
if isinstance(task, OpenMLSupervisedTask):
task.download_split()
except Exception as e:
openml.utils._remove_cache_dir_for_id(
TASKS_CACHE_DIR_NAME,
tid_cache_dir,
)
raise e
try:
task = _get_task_description(task_id)
dataset = get_dataset(task.dataset_id, download_data)
# List of class labels available in dataset description
# Including class labels as part of task meta data handles
# the case where data download was initially disabled
if isinstance(task, OpenMLClassificationTask):
task.class_labels = \
dataset.retrieve_class_labels(task.target_name)
# Clustering tasks do not have class labels
# and do not offer download_split
if download_data:
if isinstance(task, OpenMLSupervisedTask):
task.download_split()
except Exception as e:
openml.utils._remove_cache_dir_for_id(
TASKS_CACHE_DIR_NAME,
tid_cache_dir,
)
raise e

return task
