support exclusion of directories from scanning for pipeline specs (#143)

* support exclusion of directories from scanning for pipeline specs
* update specs exclusion using .dpp_spec_ignore file
* lint
* fix find_specs to support pipelines-specs in default root_dir
* limit click dependency to prevent latest click 7.0 which is incompatible with tabulator click requirement
* Fixed in upstream tabulator
* v2.0.0
* Pattern for wrapping flows as a processor, extend stdout redirect to all processors
* add_computed_field, add_metadata -> dataflows
* lint
* Fix flows test
* Refactor concatenate processor
* Refactor delete_fields, duplicate
* Refactor filter
* Refactor find_replace processor
* Refactor set-types, sort-rows
* Refactor unpivot
* lint
* Refactor dump_to_sql
* Remove resource-matcher code
* lint
* Consolidate stat sources for flows
* Relax click dependency
* dump.to_sql -> dump_to_sql
* dump.to -> dump_to, add deprecation warnings on old forms
* add_metadata -> update_package
* Add 'load' processor
* Add 'printer' processor
* Remove 'no printing' restriction
* Documentation, missing file
* lint
* typo
* Fix dump_to_zip tests
* Fixing more tests
* Fixing more tests
* Fixing more tests
* Save datapackage location in stats for dump_to_path
* Fixing more tests
* Fixing more tests
* Avoid relative imports in processors
* bump dataflows
* Fix concatenate API for backward compatibility
* bump dataflows
* Fix tests
* bump dep versions
* Stdout redirect for high-level API as well
* Stdout redirect for high-level API as well
* Update README.md
frictionlessdata · Nov 12, 2018 · 8a13e44
1 parent a3c6745
commit 8a13e44

Showing 6 changed files with 434 additions and 2 deletions.
diff --git a/Makefile b/Makefile
@@ -29,6 +29,7 @@ test:
 	tests/cli/test_cli_exit_codes.sh &&\
 	tests/cli/test_cli_logs.sh &&\
 	tests/cli/test_custom_formatters.sh &&\
+	tests/cli/test_exclude_dirnames.sh &&\
 	tests/cli/test_flow.sh
 
 version:

diff --git a/README.md b/README.md
@@ -163,6 +163,20 @@ This name is, in fact, the name of a Python script containing the processing cod
 - If no processor was found until this point, it will try to search for this processor in the processor search path. The processor search path is taken from the environment variable `DPP_PROCESSOR_PATH`. Each of the `:` separated paths in the path is considered as a possible starting point for resolving the processor.
 - Finally, it will try to find that processor in the Standard Processor Library which is bundled with this package.
 
+### Excluding directories from scanning for pipeline specs
+
+By default, `.*` directories are excluded from scanning. To exclude additional directories,
+create a `.dpp_spec_ignore` file at the project root. This file uses a syntax similar to
+`.gitignore` and excludes directories from scanning based on glob pattern matching.
+
+For example, the following file ignores `test*` directories at any depth, including inside
+subdirectories, while the `/docs` directory is ignored only at the project root:
+
+```
+test*
+/docs
+```
+
 ### Caching
 
 By setting the `cached` property on a specific pipeline step to `True`, this step's output will be stored on disk (in the `.cache` directory, in the same location as the `pipeline-spec.yaml` file).
@@ -1025,7 +1039,6 @@ _Parameters_:
 - `updated_id_column` - Optional name of a column that will be added to the spewed data and contain the id of the updated row in DB.
 
 ### ***`dump_to_path`***
-
 Saves the datapackage to a filesystem path.
 
 _Parameters_:
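
The exclusion rules described in the README hunk above (default `.*`, unanchored patterns such as `test*`, root-anchored patterns such as `/docs`) can be illustrated with a small standard-library sketch. This is not the `dirtools` implementation used by the commit; the helper names `load_patterns`, `is_excluded`, and `walk_with_excludes` are hypothetical, and the matching is a simplified approximation of `.gitignore`-style globbing built on `fnmatch` and `os.walk`.

```python
import fnmatch
import os


def load_patterns(root_dir, ignore_file='.dpp_spec_ignore', defaults=('.*',)):
    """Return the default patterns plus non-empty lines of the ignore file (hypothetical helper)."""
    patterns = list(defaults)
    path = os.path.join(root_dir, ignore_file)
    if os.path.exists(path):
        with open(path) as f:
            patterns.extend(line.strip() for line in f if line.strip())
    return patterns


def is_excluded(relpath, patterns):
    """Check a directory path, relative to the root and '/'-separated, against the patterns."""
    name = relpath.rsplit('/', 1)[-1]
    for pattern in patterns:
        if pattern.startswith('/'):
            # Anchored pattern: compare against the full path from the root.
            if fnmatch.fnmatch(relpath, pattern[1:]):
                return True
        elif fnmatch.fnmatch(name, pattern):
            # Unanchored pattern: compare against the directory name at any depth.
            return True
    return False


def walk_with_excludes(root_dir, patterns):
    """Yield (dirpath, dirnames, filenames) like os.walk, skipping excluded directories."""
    for dirpath, dirnames, filenames in os.walk(root_dir):
        rel = os.path.relpath(dirpath, root_dir)
        kept = []
        for d in sorted(dirnames):
            child = d if rel == '.' else os.path.join(rel, d).replace(os.sep, '/')
            if not is_excluded(child, patterns):
                kept.append(d)
        # Editing dirnames in place tells os.walk not to descend into pruned directories.
        dirnames[:] = kept
        yield dirpath, dirnames, filenames
```

With an ignore file containing `test*` and `/docs`, this sketch would skip `tests/`, `sub/test_data/`, `docs/`, and any dot-directory, while still descending into `sub/docs/`, which mirrors the behaviour the README text describes.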

diff --git a/datapackage_pipelines/specs/specs.py b/datapackage_pipelines/specs/specs.py
@@ -3,6 +3,7 @@
 
 import yaml
 from datapackage_pipelines.status import status_mgr
+from datapackage_pipelines.utilities import dirtools
 
 from .resolver import resolve_executor
 from .errors import SpecError
@@ -37,7 +38,11 @@ def process_schedules(spec: PipelineSpec):
 
 
 def find_specs(root_dir='.') -> PipelineSpec:
-    for dirpath, _, filenames in os.walk(root_dir):
+    for dirpath, dirnames, filenames in dirtools.Dir(root_dir,
+                                                     exclude_file='.dpp_spec_ignore',
+                                                     excludes=['.*']).walk():
+        relpath = os.path.relpath(dirpath, root_dir)
+        dirpath = os.path.join(root_dir, relpath) if relpath != '.' else '.'
         if dirpath.startswith(os.path.join(root_dir, '.')):
             continue
         for filename in filenames: