- `setup.cfg` allows switching the default entry point
- Generate multiple notebook products from a single task (#708)
- `NotebookRunner` uploads a partially executed notebook if it fails and a client is configured
- Adds support for choosing environment in `cloud.yaml`
- Fixes error when running `python -m ploomber.onboard` on Linux
- Moving email prompt to onboarding tutorial (#800)
- Adds onboarding command: `python -m ploomber.onboard`
- Updating `pipeline.yaml` if `ploomber nb --format {fmt}` command changes extensions (#755)
- Adds documentation for `pipeline.yaml` `meta` section
- Adds many inline examples
- Improved docs for `pipeline.yaml` `grid`
- `ploomber task` prints a confirmation message upon successful execution
- `DAG.close_clients()` only calls `.close()` on each client once
- Fixes `dag.plot()` error when dag needs rendering
- Fixes incompatibility with nbconvert 5 (#741)
- Improved error messages when the network fails while hitting the cloud build API
- Hiding posthog error logs (#744)
- `ploomber plot` uses the D3 backend if `pygraphviz` is not installed
- Request email (optional) after running `ploomber examples` for the first time
- Changes to `ploomber cloud`
- Compatibility with click `7.x` and `8.x` (#719)
- Deprecates casting for boolean `static_analysis` flag (#586)
- Support for `env.yaml` composition via `meta.import_from` (#679)
- Support for `webpdf` for notebook conversion (#675)
- `SQLAlchemyClient` accepts URL object in the constructor (#699)
- Better error message when product has an incorrect extension (#472)
- Better error when `pipeline.yaml` in root directory `/` (#497)
- Better error message when `NotebookRunner` initialized with a `str` (#705)
- Error message when missing placeholder in `env.yaml` includes path to offending file
- Fixes error when expanding complex args in `env.yaml` (#709)
- Validating object returned by `import_tasks_from` (#686)
- [FEATURE] Custom name for products generated by `grid` tasks (#647)
- [FEATURE] Execute notebooks/scripts without generating an output notebook via `ScriptRunner` (#614)
- [FEATURE] More robust "Did you mean?" suggestions for product and task class typos in `pipeline.yaml`
- [BUGFIX] `ploomber nb --remove` works with `.ipynb` files (#692)
- [BUGFIX] Use `grid` and `params` in `pipeline.yaml` (#522)
- [DOC] Adds versioning user guide
- [DOC] Adds cloud user guide
- Better error message when failing to deepcopy a DAG (#670)
- Improvements to the
{{git}}
placeholder feature (#667) - Replaces DAG colors in
ploomber plot
with their RGB values for better compatibility - Pinning
jinja2
to preventnbconvert
from failing
- Style improvements to DAG plot (#650)
- DAG plot only includes task names by default (#393)
- `ploomber plot --include-products/-p` generates plots with task names and products
- `DAG.plot(include_products=True)` generates plots with task names and products
- Fixes error when replacing file on Windows (#333)
- Fixes error message when config file does not exist (#652)
- Fixes typo in nb command (#665)
- Using UTF-8 for reading and writing in notebook tasks (#334)
- Clearer error message when DAG deepcopy fails
- Beta release of cloud pipeline monitoring
- More robust suggestions when invoking a non-existing command
- CLI loading performance improvements
- Prints message before starting to load the pipeline for better user feedback
- Displaying community link when DAG fails to render or build
- Improved documentation in "ploomber nb --help" (#623)
- Fixed a few errors in the basic concepts tutorial
- More informative error when task does not generate some products
- Better error when all the code is in the parameters cell
- Improves error message when `source` in a task spec is a string without an extension (#619)
- Fixes error that caused `dag.render(force=True)` to download remote metadata
- Simplify traceback when calling Ploomber task (#605)
- Emitting warning when `resources_` points to large files (#609)
- Adds auto-completion steps to documentation (#612)
- Updates documentation to reflect new default format (`py:percent`) (#564)
- Showing a message when a new version of Ploomber is available (#558)
- Cleaner tracebacks when DAG fails to build or render
- Automatically adding a parameters cell to scripts and notebooks if it's missing
- `NotebookRunner` `static_analysis` behaves differently: it's less picky now; the old (strict) default behavior can be turned on by passing `strict`, and it can be turned off by passing `disable` (#566)
- Improves many error messages for clarity
- `ploomber install` installs dependencies in the current virtual environment by default
- `ploomber install` works in systems where `python` links to Python 2.7 (#435)
- `ploomber install` uses lock files by default if they exist
- `ploomber install` has options to customize its behavior
- `ploomber scaffold` accepts one positional argument (#484)
- Fixes an issue that caused `ploomber nb` to hide the traceback when it failed to load the pipeline (#468)
- Fixed error when parsing cell magics with inline Python
- Fixed misspelling in `pygraphviz` error message (#575)
- Sets minimum networkx version (#536)
- Updates documentation links to the new domain (#549)
- Suggests adding the appropriate `pygraphviz` version depending on the Python version (#539)
- Improved error message when `pipeline.yaml` does not exist (#517)
- Fixes error when scaffolding functions
- Adds SQL runtime parameters
- `SQLScript` and `SQLDump` display source code when `client.execute` fails
- Clearer error message when `NotebookRunner` fails to initialize
- `cli_endpoint` decorator hides traceback when raising `BaseException` errors
- `DAGSpec` and `TaskSpec` errors raised as `DAGSpecInitializationError`
- Less verbose `ploomber examples` output
- Better user feedback after running `ploomber nb --inject`
- Fixed `ploomber nb --inject` when `pipeline.yaml` has `.ipynb` files
- Adds `ploomber nb --single-click/--single-click-disable` to enable/disable opening `.py` files as notebooks with a click on Jupyter
- `ploomber nb` no longer requires a valid entry point if the selected option doesn't need one
- Better error message when `Pool` in the `Serial` executor raises `RuntimeError`
- Notebook static analysis: better support for IPython magics, support for inline shell (`! echo hi`). Closes #478
- Documents `S3Client` and `GCloudStorageClient`
- Updates to the telemetry module
- Fixes error message when failing to load dotted paths
- `ploomber scaffold` now supports `.R` and `.Rmd` files (#476)
- Fixes an error that caused `ploomber scaffold` to ignore the location of existing packages (#459)
- Better error message when running `ploomber execute/run` (suggests `ploomber build`)
- Better error message when passing positional arguments to `ploomber build` (suggests `ploomber task`)
- Fixes an error in the telemetry module
- Improved anonymous user statistics
- `PLOOMBER_STATS_ENABLED` environment variable can be used to disable stats
- Improved error message when a dotted path fails to load (#410)
- `ploomber scaffold` creates missing modules when adding functions (#332, @fferegrino)
- `NotebookRunner` creates the product's parent directories before running (#460)
- Adds `ploomber nb` command for integration with VSCode, PyCharm, Spyder, etc.
- Adds methods for saving and removing injected cells to `NotebookSource`
- Adds methods for pairing and syncing to `NotebookSource`
- Fixes #448: `SQLUpload` ignoring `io_handler`
- Fixes #447: `pipeline.yaml` supports passing custom init parameters to `executor`
- Adds optional anonymous user statistics
- Fixes `{{root}}` expansion when `path_to_here` is different than the current working directory
- Better error message when initializing `MetaProduct` with non-products
- Adds refactoring section (`soorgeon`) to the user guide
- Adds shell scripts user guide
- `Commander` allows `jinja2.Environment` customization
- `GenericSource` supports extracting upstream
- Fixes an error that caused `copy.deepcopy` to fail on `SourceLoader`
- Adds `{{now}}` (current timestamp in ISO 8601 format) to default placeholders (see the sketch below)
- Adds `--output/-o` to `ploomber examples` to change output directory
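A minimal sketch of using the `{{now}}` default placeholder in a `pipeline.yaml` (task source and paths are hypothetical; assumes `{{now}}` works like the other default placeholders):

```yaml
tasks:
  - source: scripts/report.py
    # {{now}} expands to the current timestamp in ISO 8601 format
    product: products/report-{{now}}.html
```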
- Adds `--log-file/-F` option to CLI to log to a file
- Clearer error message when a task in a `pipeline.yaml` has `grid` and `params`
- Right bar highlighting fixed
- `normalize_python` returns input if passed non-Python code
- Better error message when requesting an unknown example in `ploomber examples`
- Better error message when `ploomber examples` fails due to an unexpected error
- Fixes an error in `ploomber examples` that caused the `--branch/-b` argument to be ignored
- Adds support for using `grid` and task-level hooks in spec API
- Allow serialization of a subset of params (#338)
- `NotebookRunner` `static_analysis` turned on by default
- `NotebookRunner` `static_analysis` ignores IPython magics
- Improved error message when `NotebookRunner` `static_analysis` fails
- Support for collections in `env.yaml`
- Adds `unpack` argument to `serializer`/`unserializer` decorators to allow a variable number of outputs (see the sketch below)
- General CSS documentation improvements
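A minimal sketch of the `serializer`/`unserializer` decorators (assuming they are importable from `ploomber.io`; the JSON format and function names are illustrative):

```python
import json
from pathlib import Path

from ploomber.io import serializer, unserializer

@serializer()
def my_serializer(obj, product):
    # writes the object returned by a function task to its product path
    Path(product).write_text(json.dumps(obj))

@unserializer()
def my_unserializer(product):
    # loads an upstream product back into memory
    return json.loads(Path(product).read_text())
```

Per the entry above, passing `unpack=True` to the decorators should allow a task to return a variable number of outputs.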
- Mobile-friendly docs
- Add table explaining each documentation section
- Adds hooks, serialization, debugging, logging, and parametrization cookbook
- Adds FAQ on tasks with a variable number of outputs
- Auto-documenting methods/attributes for classes in the Python API section
- Documents `io` module
- Refactors scripts/notebooks `static_analysis` feature
- Shows warning if using default value in scripts/notebooks `static_analysis` parameter
- Better error message when `DAG` has duplicated task names
- Adds more info to the files generated by `ploomber scaffold`
- Better error when trying to initialize a task from a path with an unknown extension
- Support for dag-level hooks in Spec API
- Better error message when invalid extension in `NotebookRunner` product
- Fixes an error when loading nested templates on Windows
- Task hooks (e.g., `on_finish`) accept custom args
- Fixes lookup of conda root when running `ploomber install` when the conda binary is inside the `Library` directory (Windows)
- No longer looking up pip inside conda when running `ploomber install` and `setup.py` does not exist
- Adds `--use-lock/-l` option to `ploomber install` to install using lock files
- Simplifies serializer and unserializer creation with new decorators
- Adds guide on serializer/unserializer decorators to the documentation
- Clearer error message when failing to import function
- Better error message when `tasks` in YAML spec is not a list
- Fixes an issue that caused dag plot to fail when using `.svg`
- Fixes duplicated log entries when viewing a file in Jupyter
- Fixes cell injection when using the `--notebook-dir` option during Jupyter initialization
- Reduces verbosity in Jupyter logs (#314)
- Adds `tasks[*].params.resources_` to track changes in external files (see the sketch below)
- Minor bug fixes
- Lazy load for `serializer`, `unserializer`, DAG clients, Task clients, Product clients, and task hooks, which allows the Jupyter plugin to work even if the Jupyter process does not have the dependencies required to import such dotted paths
- CLI `--help` message shows if the `ENTRY_POINT` environment variable is defined
- `ploomber scaffold` now takes a `-e/--entry-point` optional argument
- Fixes error that caused the `{{here}}` placeholder not to work if an `env.yaml` exists
- Adds `--empty` option to `ploomber scaffold` to create a `pipeline.yaml` with no tasks
- Allowing `pipeline.yaml` at project root if `setup.py` exists but `src/*/pipeline.yaml` is missing
- Fixes bug in `EnvDict.find` that caused the `{{here}}` placeholder to point to the `env.yaml` file instead of its parent
- `DAGSpec._find_relative` returns relative path to spec
- Fixes error that missed `env.yaml` loading when initializing `DAGSpecPartial`
- Changes the logic that determines project root: only considers `pipeline.yaml` and `setup.py` (instead of `environment.yml` or `requirements.txt`)
- Adds configuration and scaffold user guides
- Updates Jupyter user guide
- Deletes conda user guide
- Renames internal modules for consistency (this should not impact end-users)
- Fixes error that caused `File`s generated from `TaskGroup`s in the spec API not to resolve to their absolute values
- Fixes error that caused metadata not to be deleted when saving files in Jupyter if using a source in more than one task
- `DAGSpec` loads an `env.{name}.yaml` file when loading a `pipeline.{name}.yaml` if one exists
- `ploomber plot` saves to `pipeline.{name}.png`
- Override the `env.yaml` to load using the `PLOOMBER_ENV_FILENAME` environment variable
- `EnvDict` init no longer searches recursively; that logic moved to `EnvDict.find`. The `with_env` decorator now uses the latter to prevent breaking the API
- `PostgresCopyFrom` compatible with `psycopg>=2.9`
- `jupyter_hot_reload=True` by default
- `PythonCallableSource` finds the location of a dotted path without importing any of the submodules
- Jupyter integration lazily loads DAGs (no need to import callable tasks)
- CLI no longer shows `env.yaml` parameters when initializing from directory or pattern
- Task's `metadata.params` stores `null` if any parameter isn't serializable
- Task status ignores `metadata.params` if they are `null`
- Fixes unserialization when an upstream task produces a `MetaProduct`
- Adds `remote` parameter to `DAG.render` to check status against remote storage
- `NotebookSource` no longer includes the injected cell in its `str` representation
- `Metadata` uses task params to determine task status
- Support for wildcards when building dag partially
- Support to skip upstream dependencies when building partially
- Faster `File` remote metadata downloads using multi-threading during `DAG.render`
- Faster upstream dependencies parallel download using multi-threading during `Task.build`
- Suppresses papermill `FutureWarning` due to importing a deprecated `pyarrow` module
- Fixes error that caused a warning due to unused env params when using `import_tasks_from`
- Other bug fixes
- `DAGSpec.find` exposes `starting_dir` parameter
- `ploomber install` supports `pip`'s `requirements.txt` files
- `ploomber install` supports non-packages (i.e., no `setup.py`)
- `ploomber scaffold` flags to use conda (`--conda`) and create package (`--package`)
- `ParamGrid` supports initialization from a list
- Adds `tasks[*].grid` to generate multiple tasks at once (see the sketch below)
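A hedged `grid` sketch (task source, parameters, and the product naming scheme are hypothetical; exact naming rules may differ):

```yaml
tasks:
  - source: scripts/fit.py
    # generates one task per parameter combination (2 x 2 = 4 tasks)
    name: fit-
    product: products/report-[[n_estimators]]-[[learning_rate]].html
    grid:
      n_estimators: [10, 100]
      learning_rate: [0.01, 0.1]
```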
- Support for using wildcards to declare dependencies (e.g., `task-*`)
- Fixes to `ploomber scaffold` and `ploomber install`
- `PythonCallable` creates parent directories before execution
- Support for the parallel executor in Spec API
- `DagSpec.find` exposes `lazy_import` argument
- `TaskGroup` internal API changes
- `GCloudStorageClient` loads credentials relative to the project root
- Adds `ploomber install`
- Adds `S3Client`
- `DAGSpec` warns if parameter declared in env but unused
- Implements `{SQLDump, NotebookRunner, PythonCallable}.load()` (see the sketch below)
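A minimal sketch of loading a task's product interactively (entry point and task name are hypothetical):

```python
from ploomber.spec import DAGSpec

# build the DAG object from the spec, then pull one product into memory
dag = DAGSpec('pipeline.yaml').to_dag()
dag.render()
df = dag['dump'].load()  # e.g., a SQLDump task loads its product as a data frame
```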
- `File.client` downloads during task execution instead of render
- Adds `ploomber.OnlineModel`, which provides a simpler API than `OnlineDAG` for models that implement a `.predict()` method
- Adds function to find package name if using standard layout
- Changes `extract_product` default in spec API to False
- Tasks get a default name equal to the filename without extension (e.g., plot.py -> plot)
- `File` saves metadata in a `.{filename}.metadata` file instead of `{filename}.source`
- Adds `ploomber examples` command
- Adds Deployment guide to documentation
- `EnvDict` loads `env.yaml` and uses values as defaults when passing a custom dict
- Simpler repr for SQL products
- Improved Spec API docs
- Adds `ploomber.tasks.TaskGroup.from_params` to create multiple tasks at once
- Changes a lot of error messages for clarity
- Clearer `__repr__` for `Placeholder`, `File`, and `MetaProduct`
- Default placeholders can be used in `pipeline.yaml` without defining `env.yaml`
- Better formatting for displaying DAG build and render errors
- Spec API initializes task spec as `SQLDump` if product has suffix `.csv` or `.parquet`
- Coloring CLI error traceback
- Spec API skips `SourceLoader` if passing an absolute path
- `DAG.clients` validates keys (using `DAGClients`)
- `params` available as hook argument
available as hook argument- Rewritten Spec API documentation
- Better display of errors when building or rendering a DAG (layout and colors)
File
implements theos.PathLike
interface (this works now:pandas.read_parquet(File('file.parquet'))
)- Several error messages refactored for clarity
- Adds
DAGSpec.find()
to automatically findpipeline.yaml
- Adds
OnlineDAG
to convertDAG
objects for in-memory inference - Spec API (
pipeline.yaml
) supports DAG-level and Task-levelserializer
andserializer
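A minimal sketch of a spec with DAG-level serialization (dotted paths are hypothetical):

```yaml
serializer: my_project.io.my_serializer
unserializer: my_project.io.my_unserializer

tasks:
  # function tasks return objects; the serializer persists them to the product path
  - source: my_project.tasks.clean
    product: products/clean.parquet
```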
- CLI looks for `src/{pkg}/pipeline.yaml` if `pipeline.yaml` doesn't exist
- Adds `{{cwd}}` placeholder for `env.yaml` that expands to current working directory
- Support for Python 3.9
- `SQLAlchemyClient` now accepts an argument to pass custom parameters to `sqlalchemy.create_engine`
- Temporarily pins papermill version due to an incompatibility with jupytext and nbformat (jupytext does not support cell ids yet)
- Adds `--on-finish/-of` to `ploomber task` to execute the `on_finish` hook
- DAGs with R notebooks can render even if the IRkernel is not installed
- `File` now supports a `client` argument to upload products to cloud storage
- Adds `GCloudStorageClient`
- Fixes error that caused jupyter to fail to initialize the dag when adding a function to a module already included in the YAML spec
- Fixes IPython namespace errors when using `ploomber interact`
- Adds `ploomber.testing.sql.assert_no_duplicates_in_column` to check for record duplicates and optionally show duplicate statistics (see the sketch below)
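A minimal sketch of the testing helper used as a task hook (argument names follow the hook conventions; the column name is hypothetical):

```python
from ploomber.testing.sql import assert_no_duplicates_in_column

def check_no_dups(client, product):
    # raises if the customer_id column in the product relation contains duplicates
    assert_no_duplicates_in_column(client, 'customer_id', product)
```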
- Deprecates a few internal methods: `Table.save`, `DAG.to_dict()`, `Task.to_dict()`
- Improvements to SQL static analyzer to warn when relations created by a SQL script do not match `Product`
- A few changes to `Metadata` (internal API) to cover some edge cases
- Warning when `Product` metadata is corrupted
- Adds new `meta.import_tasks_from` option in YAML specs to import tasks from another file (see the sketch below)
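A minimal sketch of `import_tasks_from` (file names hypothetical):

```yaml
meta:
  # tasks declared in this file are added to the list below
  import_tasks_from: common_tasks.yaml

tasks:
  - source: scripts/report.py
    product: products/report.html
```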
- Deprecates `ploomber new` and `ploomber add`
- Adds `ploomber scaffold`
- Jupyter plugin now exports functions as notebooks using `jupyter_functions_as_notebooks` in `pipeline.yaml`
- `ploomber add` generates template tasks and functions if they don't exist
- Jupyter plugin now shows `PythonCallable` tasks as notebooks
- Documentation tutorials re-organization and CSS fixes
- Improvements to the `InMemoryDAG` API
- Minor bug fixes
- `File.__repr__` shows a relative path whenever possible
- Adds support for passing glob-like patterns in `ploomber build` (via `DAGSpec.from_directory`)
- Full Windows compatibility
- Adds documentation to show how to customize notebook output using `nbconvert`
- Improvements to introductory tutorials
- Adds `--debug/-d` option to `ploomber build` to drop a debugger if an exception happens
- Ensuring all dag-level, task-level, and product-level clients are closed after `dag.build()` is done
- Minor bug fixes
- Removes `matplotlib` from dependencies, now using `IPython.display` for inline plotting
- Fixes bug that caused custom args to `{PythonCallable, NotebookRunner}.develop(args="--arg=value")` not to be sent correctly to the subprocess
- `NotebookRunner` (initialized from ipynb) only considers the actual code as its source, ignores the rest of the JSON contents
- Fixes bug when `EnvDict` was initialized from another `EnvDict`
- `PythonCallableSource` can be initialized with dotted paths
- `DAGSpec` loads `env.yaml` when initialized with a YAML spec and there is an `env.yaml` file in the spec parent folder
- `DAGSpec` converts relative paths in sources to be relative to the project's root folder
- Adds `lazy_import` to `DAGspec` to avoid importing `PythonCallable` sources (passes the dotted paths as strings instead)
- `ploomber interact` allows switching DAG parameters, just like `ploomber build`
- Adds `PythonCallable.develop()` to develop Python functions interactively
- `NotebookRunner.develop()` now also works with Jupyter lab
- Dropping support for Python 3.5
- Removes `DAGSpec.from_file`; loading from a file is now handled directly by the `DAGSpec` constructor
- Performance improvements: DAG does not fetch metadata when it doesn't need to
- Factory functions: Bool parameters with default values are now represented as flags when called from the CLI
- CLI arguments to replace values from `env.yaml` are now built with double hyphens instead of double underscores
- `NotebookRunner` creates parent folders for output file if they don't exist
- Bug fixes
- NotebookRunner.develop accepts passing arguments to jupyter notebook
- Spec API now supports PythonCallable (by passing a dotted path)
- Upstream dependencies of PythonCallables can be inferred via the `extract_upstream` option in the Spec API
- Faster `DAG.render(force=True)` (avoids checking metadata when possible)
- Faster notebook rendering when using the extension thanks to the improvement above
- `data_frame_validator` improvement: `validate_schema` can now validate optional columns' dtypes
- Bug fixes
- Improved `__repr__` methods in `PythonCallableSource` and `NotebookSource`
- Improved output layout for tables
- Support for nbconvert>=6
- Docstrings are parsed from notebooks and displayed in DAG status table (#242)
- Jupyter extension now works for DAGs defined via directories (via the `ENTRY_POINT` env variable)
- Adds Jupyter integration guide to documentation
- Several bug fixes
- Improved support for R notebooks (`.Rmd`)
- New section for the `testing.sql` module in the documentation
- New guides: parametrized pipelines, SQL templating, pipeline testing and debugging
- `NotebookRunner.debug(kind='pm')` for post-mortem debugging
- Fixes bug in Jupyter extension when the pipeline has a task whose source is not a file (e.g. SQLDump)
- Fixes a bug in the CLI custom arg parser that caused dynamic params not to show up
- `DAGspec` now supports `SourceLoader`
- Docstring (from dotted path entry point) is shown in the CLI summary
- Customized sphinx build to execute guides from notebooks
- Support for R
- Adding section on R pipeline to the documentation
- Construct pipeline from a directory (no need to write a `pipeline.yaml` file)
- Improved error messages when DAG fails to initialize (jupyter notebook app)
- Bug fixes
- CLI accepts factory function parameters as positional arguments; types are inferred using type hints and displayed when calling `--help`
- CLI accepts env variables (if any), displayed when calling `--help`
- Simplified CLI (breaking changes)
- Refactors internal API for notebook conversion, adds tests for common formats
- Metadata is deleted when saving a script from the Jupyter notebook app to make sure the task runs in the next pipeline build
- SQLAlchemyClient now supports custom tokens to split source
- Adding `--log` option to CLI commands
- Fixes a bug that caused the `dag` variable not to be exposed during interactive sessions
- Fixes `ploomber task` forced run
- Adds SQL pipeline tutorial to get started docs
- Minor CSS changes to docs
- Support for `env.yaml` in `pipeline.yaml`
- Improved CLI: adds `plot`, `report` and `task` commands
- Changes `pipeline.yaml` default (`extract_product: True`)
- Documentation re-design
- Simplified `ploomber new` generated files
- Ability to define `product` in SQL scripts
- Products are resolved to absolute paths to avoid ambiguity
- Bug fixes
- Adds Jupyter notebook extension to inject parameters when opening a task
- Improved CLI: `ploomber new`, `ploomber add` and `ploomber entry`
- Spec API documentation additions
- Support for `on_finish`, `on_failure` and `on_render` hooks in spec API (see the sketch below)
- Improved validation for DAG specs
- Several bug fixes
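A minimal sketch of a spec-level hook (dotted path and function are hypothetical):

```yaml
tasks:
  - source: scripts/clean.py
    product: products/clean.csv
    # dotted path to a function executed after the task builds successfully
    on_finish: hooks.validate_clean
```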
- Reduces the number of required dependencies
- A new option in DBAPIClient to split source with a custom separator
- Adds CLI
- New spec API to instantiate DAGs using YAML files
- NotebookRunner.debug() for debugging and .develop() for interactive development
- Bug fixes
- PythonCallable.debug() now works in Jupyter notebooks
- PythonCallable.debug() now uses IPython debugger by default
- Improvements to Task.build() public API
- Moves hook triggering logic to Task to simplify executors implementation
- Adds DAGBuildEarlyStop exception to signal DAG execution stop
- New option in Serial executor to turn warnings and exceptions capture off
- Adds Product.prepare_metadata hook
- Implements hot reload for notebooks and python callables
- General clean-ups for old `__str__` and `__repr__` in several modules
- Refactored `ploomber.sources` module and `ploomber.placeholders` (previously `ploomber.templates`)
- Adds NotebookRunner.debug() and NotebookRunner.develop()
- NotebookRunner: now has an option to run static analysis on render
- Adds documentation for DAG-level hooks
- Bug fixes
- Bug fixes #88, #89, #90, #84, #91
- Modifies Env API: Env() is now Env.load(), Env.start() is now Env()
- New advanced Env guide added to docs
- Env can now be used with a context manager
- Improved DAGConfigurator API
- Deletes logger configuration in executors constructors, logging is available via DAGConfigurator
- Dependencies cleanup
- Removed (numpydoc) as dependency, now optional
- A few bug fixes: #79, #71
- All warnings are captured and shown at the end (Serial executor)
- Moves differ parameter from DAG constructor to DAGConfigurator
- Cleaned up some modules, deprecated some rarely used functionality
- Improves documentation aimed to developers looking to extend ploomber
- Introduces DAGConfigurator for advanced DAG configuration [Experimental API]
- Adds task to upload files to S3 (ploomber.tasks.UploadToS3), requires boto3
- Adds DAG-level on_finish and on_failure hooks
- Support for enabling logging in entry points (via `--logging`)
- Support for starting an interactive session using entry points (via `python -i -m`)
- Improved support for database drivers that can only send one query at a time
- Improved repr for SQLAlchemyClient, shows URI (but hides password)
- PythonCallable now validates signature against params at render time
- Bug fixes
- Faster Product status checking, now performed at rendering time
- New products: GenericProduct and GenericSQLRelation for Products that do not have a specific implementation (e.g. you can use Hive with the DBAPI client + GenericSQLRelation)
- Improved DAG build reports, subselect columns, transform to pandas.DataFrame and dict
- Parallel executor now returns build reports, just like the Serial executor
- DAG parallel executor
- Interact with pipelines from the command line (entry module)
- Bug fixes
- Refactored access to Product.metadata
- New Quickstart and User Guide section in documentation
- DAG rendering and build now continue until no more tasks can render/build (instead of failing at the first exception)
- New `@with_env` and `@load_env` decorators for managing environments (see the sketch below)
- Env expansion (`{{user}}` expands to the current user; `{{git}}` and `{{version}}` are also available)
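A minimal sketch of `@with_env` (assumes the decorator is importable from the top-level package and accepts a dict or a path to an `env.yaml`):

```python
from ploomber import with_env

@with_env({'path': {'data': '/tmp/data'}})
def make(env):
    # the decorator injects the expanded environment as the first argument
    print(env.path.data)

make()
```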
- `Task.name` is now optional when `Task` is initialized with a source that has a `__name__` attribute (Python functions) or a `name` attribute (like Placeholders returned from `SourceLoader`)
- New `Task.on_render` hook
- Bug fixes
- A lot of new tests
- Now compatible with Python 3.5 and higher
- Adds integration with pdb via PythonCallable.debug
- Env.start now accepts a filename to look for
- Improvements to data_frame_validator
- Simplifies installation
- Deletes BashCommand; use ShellScript instead
- More examples added
- Refactored env module
- Renames SQLStore to SourceLoader
- Improvements to SQLStore
- Improved documentation
- Renamed PostgresCopy to PostgresCopyFrom
- SQLUpload and PostgresCopy now have the same API
- A few fixes to PostgresCopy (#1, #2)
- First release