support exclusion of directories from scanning for pipeline specs (#143)
* support exclusion of directories from scanning for pipeline specs

* update specs exclusion using .dpp_spec_ignore file

* lint

* fix find_specs to support pipeline specs in default root_dir

* limit click dependency to exclude the latest click 7.0, which is incompatible with tabulator's click requirement

* Fixed in upstream tabulator

* v2.0.0

* Pattern for wrapping flows as a processor, extend stdout redirect to all processors

* add_computed_field, add_metadata -> dataflows

* lint

* Fix flows test

* Refactor concatenate processor

* Refactor delete_fields, duplicate

* Refactor filter

* Refactor find_replace processor

* Refactor set-types, sort-rows

* Refactor unpivot

* lint

* Refactor dump_to_sql

* Remove resource-matcher code

* lint

* Consolidate stat sources for flows

* Relax click dependency

* dump.to_sql -> dump_to_sql

* dump.to -> dump_to, add deprecation warnings on old forms

* add_metadata -> update_package

* Add 'load' processor

* Add 'printer' processor

* Remove 'no printing' restriction

* Documentation, missing file

* lint

* typo

* Fix dump_to_zip tests

* Fixing more tests

* Fixing more tests

* Fixing more tests

* Save datapackage location in stats for dump_to_path

* Fixing more tests

* Fixing more tests

* Avoid relative imports in processors

* bump dataflows

* Fix concatenate API for backward compatibility

* bump dataflows

* Fix tests

* bump dep versions

* Stdout redirect for high-level API as well

* Stdout redirect for high-level API as well

* Update README.md
OriHoch authored and akariv committed Nov 12, 2018
1 parent a3c6745 commit 8a13e44
Showing 6 changed files with 434 additions and 2 deletions.
1 change: 1 addition & 0 deletions Makefile
```diff
@@ -29,6 +29,7 @@ test:
 tests/cli/test_cli_exit_codes.sh &&\
 tests/cli/test_cli_logs.sh &&\
 tests/cli/test_custom_formatters.sh &&\
+tests/cli/test_exclude_dirnames.sh &&\
 tests/cli/test_flow.sh
 
 version:
```
15 changes: 14 additions & 1 deletion README.md
@@ -163,6 +163,20 @@ This name is, in fact, the name of a Python script containing the processing cod
- If no processor was found until this point, it will try to search for this processor in the processor search path. The processor search path is taken from the environment variable `DPP_PROCESSOR_PATH`. Each of the `:` separated paths in the path is considered as a possible starting point for resolving the processor.
- Finally, it will try to find that processor in the Standard Processor Library which is bundled with this package.

### Excluding directories from scanning for pipeline specs

By default, directories matching `.*` (hidden directories) are excluded from scanning. You can exclude
additional directory patterns by creating a `.dpp_spec_ignore` file at the project root. This file uses
a syntax similar to `.gitignore`: directories are excluded from scanning based on glob pattern matching.

For example, the following file ignores `test*` directories at any depth, including inside subdirectories,
while the `/docs` directory is ignored only at the project root:

```
test*
/docs
```

### Caching

By setting the `cached` property on a specific pipeline step to `True`, this step's output will be stored on disk (in the `.cache` directory, in the same location as the `pipeline-spec.yaml` file).
@@ -1025,7 +1039,6 @@ _Parameters_:
- `updated_id_column` - Optional name of a column that will be added to the spewed data and contain the id of the updated row in DB.

### ***`dump_to_path`***

Saves the datapackage to a filesystem path.

_Parameters_:
7 changes: 6 additions & 1 deletion datapackage_pipelines/specs/specs.py
```diff
@@ -3,6 +3,7 @@
 
 import yaml
 from datapackage_pipelines.status import status_mgr
+from datapackage_pipelines.utilities import dirtools
 
 from .resolver import resolve_executor
 from .errors import SpecError
```
```diff
@@ -37,7 +38,11 @@ def process_schedules(spec: PipelineSpec):
 
 
 def find_specs(root_dir='.') -> PipelineSpec:
-    for dirpath, _, filenames in os.walk(root_dir):
+    for dirpath, dirnames, filenames in dirtools.Dir(root_dir,
+                                                     exclude_file='.dpp_spec_ignore',
+                                                     excludes=['.*']).walk():
+        relpath = os.path.relpath(dirpath, root_dir)
+        dirpath = os.path.join(root_dir, relpath) if relpath != '.' else '.'
         if dirpath.startswith(os.path.join(root_dir, '.')):
             continue
         for filename in filenames:
```
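The exclusion behaviour described in the README can be approximated with the standard library alone. This is a sketch, not the `dirtools.Dir(...).walk()` implementation that `find_specs` now uses; it assumes one glob pattern per line in `.dpp_spec_ignore`, treats patterns starting with `/` as anchored to the project root, and matches all other patterns against a directory's own name at any depth. The helper names `load_patterns`, `is_excluded` and `find_spec_files` are hypothetical.

```python
import fnmatch
import os
import posixpath


def load_patterns(ignore_path):
    """Read one glob pattern per line from an ignore file (blank lines skipped).

    Assumption: this mirrors the .dpp_spec_ignore format only approximately.
    """
    if not os.path.isfile(ignore_path):
        return []
    with open(ignore_path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


def is_excluded(rel_dir, patterns):
    """Return True if a directory should be skipped.

    rel_dir is the directory path relative to the scan root, using '/' separators.
    """
    parts = rel_dir.split('/')
    for pattern in patterns:
        if pattern.startswith('/'):
            # Anchored pattern: compare component by component against the whole
            # relative path, so '/docs' matches only 'docs' directly under the root.
            pat_parts = pattern.strip('/').split('/')
            if len(parts) == len(pat_parts) and all(
                    fnmatch.fnmatchcase(p, q) for p, q in zip(parts, pat_parts)):
                return True
        elif fnmatch.fnmatchcase(parts[-1], pattern):
            # Unanchored pattern: match the directory's own name at any depth.
            return True
    return False


def find_spec_files(root_dir='.', ignore_name='.dpp_spec_ignore',
                    spec_name='pipeline-spec.yaml', default_excludes=('.*',)):
    """Return paths of spec files under root_dir, skipping excluded directories."""
    patterns = list(default_excludes) + load_patterns(
        os.path.join(root_dir, ignore_name))
    found = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        rel = os.path.relpath(dirpath, root_dir).replace(os.sep, '/')
        # Prune dirnames in place so os.walk never descends into excluded dirs.
        dirnames[:] = [
            d for d in dirnames
            if not is_excluded(d if rel == '.' else posixpath.join(rel, d), patterns)
        ]
        if spec_name in filenames:
            found.append(os.path.join(dirpath, spec_name))
    return found


if __name__ == '__main__':
    # Hypothetical usage: list spec files below the current directory.
    for path in find_spec_files('.'):
        print(path)
```

Pruning `dirnames` in place is what stops `os.walk` from descending into an excluded directory, so nothing below it is visited at all. A check on `dirpath` after the fact, like the existing `startswith` test in `find_specs`, still walks the whole subtree and only skips its files.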