schedule run notebook example #324
Comments
So papermill doesn't do scheduling by itself. Instead, think of it as a tool for executing notebooks that's easy to pass information into. In the provided examples, alpha and ratio are just the names of the inputs for that particular situation. You can pass any parameter name with any value into the notebook. Say you wanted to execute a notebook and pass the current date into it. You might call (assuming you're on Linux or Mac)
This would inject a variable called "today" into your notebook with a value of 20190226 (as of writing this post). To schedule this execution you can try following these directions on using crontab. This will show you how to run the script above on a schedule. To run on the first day of every month you'd add this to your crontab:
Hope that helps!
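For reference, the date-parameter idea from the comment above can also be sketched with papermill's Python API instead of the shell. This is a minimal sketch, not the commenter's original command: the helper names `build_parameters` and `run_notebook` and the notebook paths in the usage comment are hypothetical, and the parameter name `today` is taken from the comment.

```python
from datetime import date


def build_parameters(run_date=None):
    """Return the parameters to inject, keyed by the variable name used in the notebook."""
    run_date = run_date or date.today()
    # Format as YYYYMMDD (e.g. 20190226) and pass it as an integer.
    return {"today": int(run_date.strftime("%Y%m%d"))}


def run_notebook(input_path, output_path, run_date=None):
    """Execute the notebook with the date parameter injected."""
    # Imported lazily so build_parameters can be tried without papermill installed.
    import papermill as pm

    return pm.execute_notebook(
        input_path,
        output_path,
        parameters=build_parameters(run_date),
    )


# Hypothetical usage; replace the paths with your own notebook:
# run_notebook("report.ipynb", "report.out.ipynb")
```

Any scheduler that can start a Python script on the first day of the month could then call `run_notebook`. In cron syntax, `0 0 1 * *` means midnight on day 1 of every month.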
Thank you for your comments! It makes more sense now. I'm using a Windows laptop, and I heard that crontab is not available for Windows. Could you suggest another method?
https://stackoverflow.com/questions/132971/what-is-the-windows-version-of-cron links a few options depending on your OS version.
I use a combination of Apache Airflow (https://airflow.apache.org/) and Papermill for very complex scheduled tasks, and it works REALLY well. You'll need to write your own handler; an example could be:

import os
from datetime import datetime, timedelta

import papermill as pm
from airflow import DAG
# Airflow 1.x import path; Airflow 2 moved this to airflow.operators.python.
from airflow.operators.python_operator import PythonOperator


def execute_python_notebook_task(**context):
    """Run a notebook with papermill, using the paths and parameters from op_kwargs."""
    notebook_path = context['notebook_path']
    out_path = context['out_path']
    out_dir = os.path.dirname(out_path)
    statement_parameters = context.get('statement_parameters')

    # Create the output directory only if the path has one
    # (dirname is '' for a bare filename, and os.makedirs('') raises).
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    # Parameters may be a dict, or a callable that builds one from the context.
    if callable(statement_parameters):
        statement_parameters = statement_parameters(context)

    pm.execute_notebook(
        notebook_path,
        out_path,
        parameters=statement_parameters
    )


# Start the DAG at midnight, seven days ago.
seven_days_ago = datetime.combine(
    datetime.today() - timedelta(7),
    datetime.min.time()
)

default_args = {
    'owner': 'airflow',
    'start_date': seven_days_ago,
    'provide_context': True,
}

dag_name = 'runnin_notebooks_yo'
schedule_interval = '@monthly'

with DAG(dag_name, default_args=default_args, schedule_interval=schedule_interval) as dag:
    run_some_notebook_task = PythonOperator(
        task_id='run_some_notebook_task',
        python_callable=execute_python_notebook_task,
        op_kwargs={
            'notebook_path': 'path_to_some_notebook.ipynb',
            'out_path': 'path_to_some_notebook.out.ipynb',
            'statement_parameters': {
                'parameter_1': 'some_value'
            }
        }
    )

Please note, Airflow is a pretty full-featured tool that includes running branching dependencies of tasks. It may be overkill for what you want, but it is a pretty good tool for handling this sort of scheduling.
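The handler above also accepts a callable for `statement_parameters`, which it calls with the task context. A minimal sketch of such a callable follows. It assumes the context exposes the run's `execution_date`, as Airflow 1.x does when `provide_context` is set. The name `monthly_parameters` and the `today` parameter are hypothetical.

```python
from datetime import datetime


def monthly_parameters(context):
    """Build notebook parameters from the task context (hypothetical helper)."""
    # Assumption: the context carries the run's `execution_date` (Airflow 1.x behaviour).
    run_date = context["execution_date"]
    return {"today": int(run_date.strftime("%Y%m%d"))}


# Stand-in context for trying the helper outside Airflow.
fake_context = {"execution_date": datetime(2019, 3, 1)}
print(monthly_parameters(fake_context))  # {'today': 20190301}
```

To use it, pass `'statement_parameters': monthly_parameters` in `op_kwargs` in place of the literal dict, so each monthly run gets its own date.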
@mbrio @otterotter408 If I'm not mistaken, Apache Airflow is a pain to install on Windows.
I can attest that the Bash on Windows approach works quite well for 99% of tasks (though I haven't tried Airflow explicitly with this) :D
Airflow now has a PapermillOperator :)
I need to schedule my notebook scripts to run on the first day of every month. I tried to follow the instructions on parameters but still could not work it out. How should I set up the parameters in my case?
The example provided only mentions "alpha" and "ratio". What do they mean? Do I need to stick with these two variables for scheduling, and how do I make them represent "first day of every month"?
Thank you for your time.