Latest luigi #13

Open
wants to merge 27 commits into base: real-latest-luigi

Conversation

thisiscab

testing

hugobast and others added 27 commits April 19, 2018 15:31
I think it would be neat if luigi.cfg could interpolate some environment
variables. I think this is the minimum required amount of work to make
this happen but I'm looking for guidance.
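
As a rough illustration of the idea (not the change made in this commit), environment variables can be expanded on top of Python's standard `configparser` by subclassing its interpolation class. The `EnvInterpolation` name, the `[datadog]` section and the `api_key` option below are hypothetical and only serve the example.

```python
import configparser
import os


class EnvInterpolation(configparser.BasicInterpolation):
    """Expand $VAR / ${VAR} references after the usual %(name)s interpolation."""

    def before_get(self, parser, section, option, value, defaults):
        value = super().before_get(parser, section, option, value, defaults)
        # Unknown variables are left untouched by os.path.expandvars.
        return os.path.expandvars(value)


def load_config(text):
    """Parse configuration text with environment variable expansion enabled."""
    parser = configparser.ConfigParser(interpolation=EnvInterpolation())
    parser.read_string(text)
    return parser
```

With `EXAMPLE_API_KEY` set in the environment, a line such as `api_key = ${EXAMPLE_API_KEY}` would then resolve to the variable's value when read through `parser.get`.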
We're currently integrating our solutions with DataDog. We wanted to
add the ability to send metrics to that service from our
pipeline.

Doing so will allow us to monitor the status of our pipeline by
looking at statistics based on the metrics it sends.

At the moment, only one event is supported, but as the
feature progresses, it's easy to see that we could support many
more.

My first implementation was fairly basic, but after
browsing the existing PRs against the official Luigi repo I
discovered that there is an ongoing implementation of exactly what we were
trying to achieve, but for a similar service. Thanks to chrispalmer, I've
been able to reuse his original work to implement ours.

spotify#2044
Added more meaningful names and values to the event data.
This will add an event of type "error" in DataDog telling us that a task
has failed.
When a task is disabled, it means that it has failed multiple times
within a certain window.

That type of event is worth knowing about, so sending it
to DataDog will allow us to keep a permanent record of it.
This will let us keep track of how many tasks have been started, failed, or
disabled so that we can graph them and alert if things go wrong.
Some users may want to namespace their metrics differently. We're
allowing that to be configured in the Luigi configuration file.
Task parameters will now be displayed as tags in DataDog.
Fix issue where an already completed task would trigger another event.

We're hooking in at a point where, if the task was already completed in
a previous run, it would still call our DataDog tracking event. When that
happens, we don't want to log an event again because we've already
logged it.
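
One way to picture that guard, as a hedged sketch only: emit the event when a task transitions to done, and skip it when the task was already done beforehand. The `EventCollector` class, the `DONE` constant and the `handle_task_done` signature are hypothetical and are not taken from Luigi or from this PR; the collector records events in a list instead of calling DataDog.

```python
DONE = "DONE"


class EventCollector:
    """Hypothetical collector that records events instead of sending them."""

    def __init__(self):
        self.sent = []

    def handle_task_done(self, task_id, previous_status):
        # A task that was already DONE before this run has been reported
        # once; reporting it again would count it twice.
        if previous_status == DONE:
            return False
        self.sent.append(("task.done", task_id))
        return True
```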
We were calling the wrong event for disabled tasks. This
caused problems since the method signature is different, which crashed
the task.
Double empty lines
The name of the task shouldn't have a dimension in it; the dimension
should go in the tags section.
We can now improve the implementation quite a bit by being aware that
metrics can also have tags.

[ch8202] [ch8215]
This removes code duplication and reads much better.
Before adding this feature, we would have the metric namespaced with
different values. For instance, we would have `luigi_production` for
production metrics and `luigi_staging` for staging metrics. This caused
a whole lot of problems.

Having this new feature will allow us to set up monitoring on multiple
different environments in parallel, thus effectively reducing the number
of metrics we have to manage.
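
A minimal sketch of the difference, assuming a metric name of the form `<namespace>.task.<status>` (that layout and the `metric_call` helper are illustrative assumptions, not the names used in the PR): the environment moves out of the metric name and into a tag, so every environment shares one metric.

```python
def metric_call(status, environment, namespace="luigi"):
    """Return the (metric name, tags) pair for a task status counter."""
    # The name no longer depends on the environment ...
    name = "{}.task.{}".format(namespace, status)
    # ... because the environment travels as a tag instead.
    tags = ["environment:{}".format(environment)]
    return name, tags
```

Production and staging then report to the same metric name and can be separated or combined by filtering on the `environment` tag.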
This is not necessary as we have a default value if set to nothing.
DataDog requires the use of `:` instead of `=`.
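
For illustration, task parameters could be turned into `key:value` tags along these lines. The `params_to_tags` helper and the sample parameter names are hypothetical.

```python
def params_to_tags(params):
    """Turn a mapping of task parameters into `key:value` tag strings."""
    # Sorting keeps the tag order stable between runs.
    return ["{}:{}".format(name, value) for name, value in sorted(params.items())]
```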
Instead of using the None value, we use an empty string. The check that we
do before using the value will also return False if we detect an empty
string.
Unfortunately, the datadog python library doesn't follow their own
convention of being configurable from a `datadog.conf` file.

We have to manually set those objects as configuration in Luigi and set
the relevant properties when initializing the `statsd` specialized class
of DataDog.
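
A sketch of what passing those settings through might look like, under stated assumptions: the `DatadogSettings` fields are hypothetical option names, and the keyword names `host`, `port`, `namespace` and `constant_tags` are intended to line up with the `DogStatsd` constructor of the datadog library, which should be checked against the installed version. The block only builds the keyword arguments, so it runs without the library.

```python
from dataclasses import dataclass


@dataclass
class DatadogSettings:
    # Hypothetical option names; the PR's real configuration keys are not shown.
    statsd_host: str = "localhost"
    statsd_port: int = 8125
    metric_namespace: str = "luigi"
    default_tags: str = ""  # comma-separated, e.g. "team:data,region:us"


def statsd_kwargs(settings):
    """Build keyword arguments for a DogStatsd-style client."""
    kwargs = {
        "host": settings.statsd_host,
        "port": settings.statsd_port,
        "namespace": settings.metric_namespace,
    }
    # An empty string is falsy, so no separate None check is needed.
    if settings.default_tags:
        kwargs["constant_tags"] = [
            tag.strip() for tag in settings.default_tags.split(",") if tag.strip()
        ]
    return kwargs
```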
Previously, we had named that variable `default_event_tags`, but it was
confusing because we were sending those tags for both events and
metrics.

Renaming that variable and adding a sane default value will clarify the
goal of that variable.

[ch9820]
In our current implementation of this pipeline, we are already sending
the `environment` parameter in all of our tasks.

In our DD contrib, we log all the parameters that are passed to a task
as tags. That means the environment tag is already set in our case.

If we were to use `env`, then we would have both tags sent to DD for
every metric / event, which would duplicate the tags that we
truly want.

DD is clever enough that if a tag already exists, it won't
duplicate it, so by default we want to log `environment`: if
it doesn't exist, it will be created; otherwise the one that we
pass to the task itself will be used.
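
The commit relies on DD to handle a repeated tag. A client-side equivalent, shown here purely as an assumption and not as what this PR implements, would drop identical tags before sending. The `merge_tags` helper is hypothetical.

```python
def merge_tags(default_tags, task_tags):
    """Combine tag lists, keeping the first occurrence of each identical tag."""
    # dict.fromkeys preserves insertion order, so duplicates collapse in place.
    return list(dict.fromkeys(list(default_tags) + list(task_tags)))
```

Note that this only collapses tags that are identical strings; two different values for the same key would both be kept.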

[ch9704]
This would yield an annoying warning telling us that we're passing an
INT when it was expecting a STRING.

This wasn't affecting anything, but today is the day that I'm removing
this warning.
@kcaputo12

kcaputo12 commented Jan 28, 2019 via email

@thisiscab thisiscab changed the base branch from master to real-latest-luigi January 28, 2019 21:12
@thisiscab
Author

We're working on removing you! :) Sorry for the annoying pings in the meantime. You can always unsubscribe from these notifications!

@kcaputo12

kcaputo12 commented Jan 28, 2019 via email

3 participants