Target not available #690

allwyn-pradip · 2022-03-25T07:12:31Z

This should fix the priority not being available when creating a plan.

* change cache plan delete log level (linkedin#619) * skip error logging if iris bot fails to send messages to a channel it is not in (linkedin#621) * skip error logging if iris bot fails to send messages to a channel it is not in * removed redundant slack api call and added a warning * added a return statement to keep the event from being counted as an error in the metrics Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz> * rackspace webhook support (linkedin#405) * Start rackspace webhook impl by copying alertmanager * Plumb plan name through URL parameter for rackspace webhook * add test for rackspace webhook (WIP) * fix rackspace webhook test data * rackspace webhook should parse plan inside its class This detail is only relevant to the rackspace webhook class and doesn't belong in api.py. This just reverts the changes in api.py and adds support for parsing the plan in the rackspace webhook class. * deduplicate webhook code with class inheritance authored-by: Patrick Baxter <pb@coreos.com> * Add support for custom Slack formatting via attachments/blocks (linkedin#624) * Update iris_slack.py * Update iris_slack.py * bumnp version * fix tests and incident id insert in slack message * remove extra arguments (linkedin#626) * added twillio number override mechanism (linkedin#627) * skip error logging if iris bot fails to send messages to a channel it is not in * removed redundant slack api call and added a warning * added a return statement to keep the event from being counted as an error in the metrics * added twillio number override mechanism * minor change in case application_override_mapping is not defined in the default config Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz> * remove experimental-sender changes from master * Update __init__.py * check for active mailing_list target (linkedin#630) * use app's category default if user's not defined * Iris-message-processor cluster management * incorporate suggestions * remove 
redundant line * remove internal app allowlist * Update __init__.py * remove non-inclusive language (linkedin#639) * remove non-inclusive language (linkedin#635) * master -> leader in config (linkedin#637) * remove non-inclusive language * master -> leader in config * restore sphinx required master_doc (linkedin#638) * remove non-inclusive language * master -> leader in config * restore sphinx required master_doc Co-authored-by: ddurruty-li <85372760+ddurruty-li@users.noreply.github.com> Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz> Co-authored-by: Patrick Baxter <patrickbx@gmail.com> Co-authored-by: Luke Young <bored-engineer@users.noreply.github.com>

* return device ids with target contact * support multi-recipient msgs in external sender * build messages for dynamic plans * remove unused variables * link ui to external sender if enabled * fix wording of comment * add external sender into incident target search * forward twilio deliver status to external sender * handle external message responses * set X-IRIS-INCIDENT header

* change cache plan delete log level (linkedin#619) * skip error logging if iris bot fails to send messages to a channel it is not in (linkedin#621) * skip error logging if iris bot fails to send messages to a channel it is not in * removed redundant slack api call and added a warning * added a return statement to keep the event from being counted as an error in the metrics Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz> * rackspace webhook support (linkedin#405) * Start rackspace webhook impl by copying alertmanager * Plumb plan name through URL parameter for rackspace webhook * add test for rackspace webhook (WIP) * fix rackspace webhook test data * rackspace webhook should parse plan inside its class This detail is only relevant to the rackspace webhook class and doesn't belong in api.py. This just reverts the changes in api.py and adds support for parsing the plan in the rackspace webhook class. * deduplicate webhook code with class inheritance authored-by: Patrick Baxter <pb@coreos.com> * Add support for custom Slack formatting via attachments/blocks (linkedin#624) * Update iris_slack.py * Update iris_slack.py * bumnp version * fix tests and incident id insert in slack message * remove extra arguments (linkedin#626) * added twillio number override mechanism (linkedin#627) * skip error logging if iris bot fails to send messages to a channel it is not in * removed redundant slack api call and added a warning * added a return statement to keep the event from being counted as an error in the metrics * added twillio number override mechanism * minor change in case application_override_mapping is not defined in the default config Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz> * remove experimental-sender changes from master * Update __init__.py * check for active mailing_list target (linkedin#630) * use app's category default if user's not defined * remove internal app allowlist * Update __init__.py * remove non-inclusive 
language (linkedin#639) * remove non-inclusive language (linkedin#635) * master -> leader in config (linkedin#637) * remove non-inclusive language * master -> leader in config * restore sphinx required master_doc (linkedin#638) * remove non-inclusive language * master -> leader in config * restore sphinx required master_doc * db setup instruction (linkedin#646) the `schema_0.sql` contains `drop table` statement which could be not obvious. I experienced this issue when using a docker image with `DOCKER_DB_BOOTSTRAP=1` (which does a schema and dummy data), and wondering why I lost all my custom plans and templates, after iris container restart. Additionally provide alternative way to removing ONLY_FULL_GROUP_BY. This is useful when running mysql docker images. I.e. from oracla/mysql or bitnami mariadb images. This is easier than modifying the config file, or changing global server configuration when running in the container (i.e. docker or kubernetes). * use DB port from the config in image entrypoint (linkedin#650) * until now, 3306 was hardcoded * port in config file stays optional for backward compatibility Co-authored-by: Michal Zubac <michal.zubac@inuits.eu> * rebasing container image to ubuntu:20.04 (linkedin#649) * rebasing container image to ubuntu:20.04 * switching to Python v3 * which was driven by rsa>=3.1.4 -> oauth2client==1.4.12, it no longer supports Python v2 * adding ops/packer Makefile * to automate steps only found in README.md * done just for the docker image build * instructing packer to clear ENTRYPOINT from original ubuntu image * CMD was not working with the /bin/sh in place of ENTRYPOINT * fixes to actually make it work with Python 3.8 * correct plugin references for uwsgi * fix of execv inputs to match validation, which changed in Python 3.6 * added missing mysql client package, which is used in the entrypoint * not using 'gevent' parameter for uwsgi * it causes "DAMN ! worker N (pid: NNN) died :( trying respawn ..." 
* not sure why this happens for uwsgi+python3+gevent Co-authored-by: Michal Zubac <michal.zubac@inuits.eu> * add max len to webhook context too long response (linkedin#653) * Update __init__.py * Create __init__.py * Support multiple custom incident handler hooks (linkedin#664) Refactor handling of custom incident handlers to support multiple handlers. Retain compatibility with previous config syntax * Calculate incident priority for incident creation hook - Generate priority by looking for the highest severity priority across all of the plan notifications - Set this as a new priority field within the incident details as passed to the process_create() hook * Make application name in hook incident data support overwritten app name When creating test incidents within the API, the application given to the incident creation hooks is "iris" rather than the application being used. Fix this such that the hooks receive the overwritten application name. Also: move definition of `app` to the parent block in the function so it is more obvious this variable is used later on. * Don't prepend https:// to proxy hostnames (linkedin#668) Historically, this approach has always worked because http forward proxies generally only listen on http:// (not https://) and urllib3 has not supported connecting to a http proxy via https:// so it has always ignored the scheme. However, as of urllib3 >= 1.26 or so, urllib3 does support and attempt connecting to proxies via https:// (if this schem is provided) and it raises an exception if the proxy only listens on http:// Fix this by no longer enforcing a http:// prefix to proxy hostnames. If the user desires connecting to a https:// proxy, this prefix can be provided within Iris's configuration. 
* remove old custom_incident_handler_dispatcher * process ldap and oncall syncs concurrently (linkedin#677) Co-authored-by: ddurruty-li <85372760+ddurruty-li@users.noreply.github.com> Co-authored-by: Damian Durruty <ddurruty@ddurruty-mn2.linkedin.biz> Co-authored-by: Patrick Baxter <patrickbx@gmail.com> Co-authored-by: Luke Young <bored-engineer@users.noreply.github.com> Co-authored-by: Witold Baryluk <witold.baryluk+github@gmail.com> Co-authored-by: mighq <contact@mighq.net> Co-authored-by: Michal Zubac <michal.zubac@inuits.eu> Co-authored-by: Joe Gillotti <joe@u13.net> Co-authored-by: Joe Gillotti <jgillotti@linkedin.com>

* use locks with api cache * add ability to filter incidents by claimed * use gevent lock directly * import gevent.lock explicitly in cache * use gevent BoundedSemaphore instead of Semaphore * init api cache immediately * set iris client init log to debug

@allwyn-pradip

The api makes use of [gevent], a coroutine-based networking library which relies heavily on monkey patching the stdlib. From the [gevent.monkey] docs:

> Warning: Patching too late can lead to unreliable behaviour
> (for example, some modules may still use blocking sockets) or even errors.

This appears to have happened here. Thanks to @allwyn-pradip for pointing me at the right file in PR #690.

Resolves #686, #699, #644.

Blog on gevent: https://eng.lyft.com/what-the-heck-is-gevent-4e87db98a8

> In the case of gevent — monkey patching has to be the absolute first thing a process does

[gevent]: https://www.gevent.org/index.html
[gevent.monkey]: https://www.gevent.org/api/gevent.monkey.html
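To illustrate why patch order matters, here is a minimal stdlib-only sketch. It is not gevent and not the iris code: `make_fake_stdlib`, `fake_patch_all` and `import_consumer` are hypothetical names standing in for a stdlib module, a monkey-patching call, and a module that does `from module import name` at import time.

```python
from types import SimpleNamespace


def make_fake_stdlib():
    """Build a hypothetical stand-in for a stdlib module with one blocking call."""
    def blocking_sleep():
        return "blocking"
    return SimpleNamespace(sleep=blocking_sleep)


def fake_patch_all(module):
    """Hypothetical stand-in for a monkey-patching call: swap in a cooperative function."""
    def cooperative_sleep():
        return "cooperative"
    module.sleep = cooperative_sleep


def import_consumer(module):
    """Mimic `from module import sleep`: the name is bound once, at import time."""
    sleep = module.sleep

    def do_work():
        return sleep()
    return do_work


# Case 1: the consumer is imported before patching, so it keeps the old function.
stdlib = make_fake_stdlib()
late_worker = import_consumer(stdlib)
fake_patch_all(stdlib)
patched_too_late = late_worker()

# Case 2: patching runs first, so the consumer binds the cooperative function.
stdlib = make_fake_stdlib()
fake_patch_all(stdlib)
early_worker = import_consumer(stdlib)
patched_first = early_worker()

print(patched_too_late, patched_first)
```

In a real gevent application the equivalent of `fake_patch_all` is `gevent.monkey.patch_all()`, called at the top of the entry module before other imports. Any module imported earlier may hold references to the unpatched blocking functions, as the first case shows.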

bilbof · 2022-04-11T14:00:13Z

@allwyn-pradip you should be able to close this PR now that #703 is in. Thanks - this PR showed how to fix the issue.

diegocepedaw and others added 30 commits May 28, 2021 16:05

Create endpoint to build and render messages from OOB or Incident not…

51a85ef

…ification (linkedin#618) * create endpoint to build and render messages * update sender tests

change cache plan delete log level (linkedin#620)

7da69e5

Iris-message-processor cluster management

dc9ab79

incorporate suggestions

853418f

remove redundant line

56b3cd0

replace sender_cache with direct role lookup

f4f9a10

forward notifications to external sender

cfe6bf7

skip target reprioritization if no cache

9f1d1bd

include custom sender addresses for build_message (linkedin#645)

bc0c0ee

* include custom sender addresses for build_message * add endpoint to fetch plan aggregation settings

return device ids with target contact

8aff9c6

add support for multirecipient messages in InternalBuildMessages (lin…

9151da8

…kedin#656) * return device ids with target contact * support multi-recipient msgs in external sender

support building messages from dynamic plans (linkedin#658)

0005cbd

* return device ids with target contact * support multi-recipient msgs in external sender * build messages for dynamic plans * remove unused variables

still display incidents if sender can't be reached (linkedin#662)

bb8c698

Experimental sender (linkedin#663)

e51ef04

* still display incidents if sender can't be reached * add ecternal sender peer count endpoint * flake8

Don't prepend https:// to proxy hostnames (linkedin#668)

6134ea5

Merge branch 'master' of https://github.com/allwyn-pradip/iris

a8bec94

split external incident & notification handling into different configs (

b6e8232

linkedin#671) * still display incidents if sender can't be reached * add ecternal sender peer count endpoint * flake8 * split external incident & notification hadling cfg

do not load configs from util

ab7a706

do not load configs from util

89fde4d

Merge remote-tracking branch 'upstream/experimental-sender' into expe…

6aa410a

…rimental-sender

create external_sender_incident_dryrun category

a2dc45a

remove comment

66bb126

restore auth req

8b134a1

create external_sender_incident_dryrun category

ef7bce5

diegocepedaw and others added 20 commits March 29, 2022 10:48

use locks with api cache (linkedin#693)

1f2bbb4

* use locks with api cache * add ability to filter incidents by claimed * use gevent lock directly

import gevent.lock explicitly in cache (linkedin#694)

9a25f95

* use locks with api cache * add ability to filter incidents by claimed * use gevent lock directly * import gevent.lock explicitly in cache

use gevent BoundedSemaphore instead of Semaphore (linkedin#695)

a0a861d

* use locks with api cache * add ability to filter incidents by claimed * use gevent lock directly * import gevent.lock explicitly in cache * use gevent BoundedSemaphore instead of Semaphore

init api cache immediately (linkedin#696)

8ebb321

* use locks with api cache * add ability to filter incidents by claimed * use gevent lock directly * import gevent.lock explicitly in cache * use gevent BoundedSemaphore instead of Semaphore * init api cache immediately

correctly handle no target match

f781e55

correctly handle no target match (linkedin#698)

ff2fe98

add render jinja endpoint

a3ab552

Merge remote-tracking branch 'upstream/experimental-sender' into expe…

e25e7e5

…rimental-sender

make renderjinja endpoint internal_allowlist_only

6af4ab4

don't override iris context

56c85bf

fill in dummy iris meta data for OOB msgs

e7e064e

small formatting change

0f26492

don't add application to subject

026228d

Merge branch 'experimental-sender' of https://github.com/linkedin/iris …

f71e957

…into experimental-sender

Merge pull request #14 from diegocepedaw/experimental-sender

e6fbb6f

Experimental sender

webhook update

66e8142

sender import update

16f756d

Merge branch 'experimental-sender-fw' into iris-master-fw

a0dc575

Merge branch 'linkedin:master' into iris-master-fw

51b9b7e

bilbof mentioned this pull request Apr 11, 2022

iris Plan issue #644

Closed

update uwsgi to python3

3f6155b

Sender import fix

e085697

bilbof mentioned this pull request Apr 11, 2022

Call gevent.monkey.patch_all immediately on import #703

Merged

diegocepedaw closed this Apr 11, 2022

allwyn-pradip mentioned this pull request Apr 12, 2022

Call gevent.monkey.patch_all immediately on import (#703) allwyn-pradip/iris#17

Merged

allwyn-pradip mentioned this pull request Apr 20, 2022

Error while sending message or call to target #669

Closed
