## Alerts Data Parsing

Parse all the alerts data of multiple sources downloaded from pagerduty and merge them to form an alerts dataset.
Finally create a dataset with the columns as alert identifier, alert id, incident number, affected services, alert condition, etc.

Use alerts data from the following sources in the final dataset:
1. Adobe Sign SRE
2. Adobe Sign DBA
3. Adobe Sign FedRAMP SRE
4. Splunk
5. Sign - Pingdom
6. Datadog-Prod
7. eda-alerts-prod
6. Sign MS Techops Oncall

In [1]:
import os
import json
import pandas as pd

In [2]:
directory = './Alerts_data/pagerduty_data'

In [3]:
final_json = []
for alert_file in os.listdir(directory):
    if alert_file == '.ipynb_checkpoints':
        continue
    else:
        with open(directory + '/' + alert_file, 'r') as f:
            data = json.load(f)
        final_json += data['incidents']
with open('./Alerts_data/all_alerts.json', 'w') as f:
    json.dump(final_json,f)

In [3]:
f = json.load(open('./Alerts_data/all_alerts.json', 'r'))
df_normalised = pd.json_normalize(f)

In [6]:
t = pd.to_datetime('2020-10-01')

In [10]:
df_normalised[df_normalised['created_at'] > t.strftime('%Y-%m-%dT%XZ')]['created_at'].sort_values()

39468    2020-10-01T00:20:19Z
39467    2020-10-01T01:51:27Z
39466    2020-10-01T02:06:25Z
31244    2020-10-01T03:55:19Z
39465    2020-10-01T04:09:19Z
                 ...         
1219     2022-07-19T07:44:22Z
1218     2022-07-19T07:47:25Z
1217     2022-07-19T08:28:51Z
1216     2022-07-19T09:11:17Z
1200     2022-07-19T09:38:04Z
Name: created_at, Length: 43833, dtype: object

In [5]:
print('total', len(df_normalised['incident_number']), 'alerts')

total 58255 alerts


In [6]:
services = list(set(list(df_normalised['service.summary'])))

In [7]:
services

['DC SRE Manager On-Call',
 'DCSRE-Manager-OnCall-Escalation',
 'eda-alerts-stage',
 'eda-alerts-dev',
 'Adobe Sign Dev',
 'DataDog-Stage-Pythian',
 'Adobe Sign FedRAMP SRE',
 'Cortex-AdobeSign',
 'Adobe Sign Splunk',
 'Adobe Sign DBA',
 'Sign - Pingdom',
 'Splunk',
 'Datadog-Prod-Pythian',
 'Sign MS Techops Oncall',
 'eda-alerts-prod',
 'Hystrix-Dev',
 'Datadog-Prod',
 'Adobe Sign SRE',
 'Sean Test',
 'CSO Launch']

In [8]:
service_ab = ['sre', 'dcsreoc', 'cso', 'dba' , 'fsre' , 'splunk' , 'ddg' , 'dcsreoce' , 'pgdm' , 'eda' , 'smto' , 'dev']
service_dict = {
    'Adobe Sign SRE' : 'sre',
    'DC SRE Manager On-Call' : 'dcsreoc',
    'CSO Launch' : 'cso',
    'Adobe Sign DBA' : 'dba',
    'Adobe Sign FedRAMP SRE' : 'fsre',
    'Splunk' : 'splunk',
    'Adobe Sign Splunk' : 'splunk',
    'Datadog-Prod' : 'ddg',
    'DCSRE-Manager-OnCall-Escalation' : 'dcsreoce',
    'Sign - Pingdom' : 'pgdm',
    'eda-alerts-prod' : 'eda',
    'Sign MS Techops Oncall' : 'smto',
    'Adobe Sign Dev' : 'dev'
}

In [9]:
df_dict = {}
for service in service_dict:
    print(service)
    df_dict[service_dict[service]] = df_normalised[df_normalised['service.summary'] == service].astype(str)

Adobe Sign SRE
DC SRE Manager On-Call
CSO Launch
Adobe Sign DBA
Adobe Sign FedRAMP SRE
Splunk
Adobe Sign Splunk
Datadog-Prod
DCSRE-Manager-OnCall-Escalation
Sign - Pingdom
eda-alerts-prod
Sign MS Techops Oncall
Adobe Sign Dev


In [10]:
parsed_dict = {}

## SRE

In [11]:
parsed_dict['sre'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
for i, inc_num in enumerate(df_dict['sre']['incident_number'].values):
    # print(i)
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    try:
        val = ('alertname' not in (df_dict['sre']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[0].lower()))
        if (not val):
            firing = df_dict['sre']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')
            temp['incident_number'] = inc_num
            # print('hey')
            temp['created_at'] = df_dict['sre']['created_at'].iloc[i]
            # print('ho')
            temp['alert_identifier'] = firing[1].split('=')[1].lower()
            temp['node_name'] = temp['alert_identifier']
            temp['service'] = df_dict['sre']['first_trigger_log_entry.service.summary'].iloc[i]
            count = 0
            for j in firing:
                if 'metric_value' in j and count < 2:
                    count += 1
                    temp['metric_value'] = j.split('=')[1]
                if 'alertcondition' in j and count < 2:
                    count += 1
                    temp['alert_condition'] = j.split('=')[1]
            temp['description'] = df_dict['sre']['first_trigger_log_entry.event_details.description'].iloc[i]
    except:
        print(i, inc_num, df_dict['sre']['title'].iloc[i].lower())
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['sre'][e].append(temp[e])
        

1813 509370 join active cso
3060 1448915 join cso 17662 - adobe sign europe 1 - critical systems notification
3799 337213 please join preemptive cso
6512 144999 test mail. please ignore.
7326 589759 sorry testing - ack in slack and resolve
7733 13843 testing 123
8982 406435 cso 11993 - cemt na3 (azure) - critical systems notification
9088 740175 fw: cso 15663 - adobe sign europe 1 - critical systems notification
9538 1413640 fw: cso 17536 - default services - critical systems notification
10000 517801 please join active cso
11360 932026 serving cupcakes for document cloud - adobe sign - cupcake
16268 414735 re: cso 12041 - document cloud - adobe sign - critical systems notification
20920 765361 this is a critical systems notification for eoa migration service.


In [12]:
try:
    pd.DataFrame(parsed_dict['sre'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['sre']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 21510 alerts


## DBA

In [13]:
parsed_dict['dba'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
for i, inc_num in enumerate(df_dict['dba']['incident_number'].values):
    # print(i)
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    try:
        val = ('alertname' not in (df_dict['dba']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[0].lower()))
        if (not val):
            temp['service'] = df_dict['dba']['first_trigger_log_entry.service.summary'].iloc[i]
            temp['description'] = df_dict['dba']['first_trigger_log_entry.event_details.description'].iloc[i]
            firing = df_dict['dba']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')
            temp['incident_number'] = (inc_num)
            # print('hey')
            temp['created_at'] = (df_dict['dba']['created_at'].iloc[i])
            # print('ho')
            temp['alert_identifier'] = (df_dict['dba']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[1].lower())
            temp['node_name'] = (temp['alert_identifier'])
            count = 0
            flag1 = True
            flag2 = True
            for j in firing:
                if 'metric_value' in j and count < 2:
                    count += 1
                    temp['metric_value'] = (j.split('=')[1])
                    flag1 = False
                if 'alertcondition' in j and count < 2:
                    count += 1
                    temp['alert_condition'] = (j.split('=')[1])
                    flag2 = False
            if flag1:
                temp['metric_value'] = ('NA')
            if flag2:
                temp['alert_condition'] = ('NA')
                
    except Exception as e:
        # print(i, inc_num, df_dict['dba']['title'].iloc[i].lower())
        if ('prod' in df_dict['dba']['title'].iloc[i].lower()):
            temp['incident_number'] = (inc_num)
            temp['created_at'] = (df_dict['dba']['created_at'].iloc[i])
            temp['service'] = (df_dict['dba']['first_trigger_log_entry.service.summary'].iloc[i])
            temp['description'] = (df_dict['dba']['first_trigger_log_entry.event_details.description'].iloc[i])
            try:
                ls = list(map(str.strip,df_dict['dba']['title'].iloc[i].split('-')))
                ind = ls.index('prod') + 1
                rem = ''
                for j in range(len(ls)):
                    rem += ls[j] + ' '
                temp['alert_identifier'] = (rem)
            except:
                print(i,inc_num, 'prod nhi mila')
            
            if (' on ' not in temp['alert_identifier'] and ' for ' not in temp['alert_identifier']):
                print(i,inc_num, 'none', temp['alert_identifier'], df_dict['dba']['title'].iloc[i])
            else:
                if(' on ' in temp['alert_identifier']):
                    ind = temp['alert_identifier'].find(' on ')
                    temp['node_name'] = (temp['alert_identifier'][:ind])
                else:
                    if(' for ' in temp['alert_identifier']):
                        ind = temp['alert_identifier'].find(' for ')
                        temp['node_name'] = (temp['alert_identifier'][:ind])
                    else:
                        temp['node_name'] = (temp['alert_identifier'])
            metric_text = df_dict['dba']['first_trigger_log_entry.channel.details.body'].iloc[i].lower().split('\n')
            for j in metric_text:
                if 'metric value' in j:
                    temp['metric_value'] = (j.split(':')[1])
            temp['alert_condition'] = (metric_text[2])
            # print(i, temp['alert_identifier'], temp['node_name'])
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['dba'][e].append(temp[e])

In [14]:
try:
    pd.DataFrame(parsed_dict['dba'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['dba']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 9416 alerts


## FSRE

In [15]:
parsed_dict['fsre'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
for i, inc_num in enumerate(df_dict['fsre']['incident_number'].values):
    # print(i)
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    try:
        val = ('alertname' not in (df_dict['fsre']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[0].lower()))
        if (not val):
            firing = df_dict['fsre']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')
            temp['incident_number'] = (inc_num)
            temp['created_at'] = (df_dict['fsre']['created_at'].iloc[i])
            temp['alert_identifier']  = (df_dict['fsre']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[1].lower())
            temp['node_name'] = (temp['alert_identifier'])
            temp['service'] = (df_dict['fsre']['first_trigger_log_entry.service.summary'].iloc[i])
            count = 0
            flag1 = True
            flag2 = True
            for j in firing:
                if 'metric_value' in j and count < 2:
                    count += 1
                    temp['metric_value'] = (j.split('=')[1])
                    flag1 = False
                if 'alertcondition' in j and count < 2:
                    count += 1
                    temp['alert_condition'] = (j.split('=')[1])
                    flag2 = False
            if flag1:
                temp['metric_value'] = ('NA')
            if flag2:
                temp['alert_condition'] = ('NA')
            temp['description'] = (df_dict['fsre']['first_trigger_log_entry.event_details.description'].iloc[i])
    except:
        print(i, inc_num, df_dict['fsre']['title'].iloc[i].lower())
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['fsre'][e].append(temp[e])

0 1171775 alert received for govprod
1 1171774 alert received for govprod
2 1169150 alert received for govprod
3 1168887 alert received for govprod
4 1189584 alert received for govprod
5 1092661 alert received for govprod
6 1176763 alert received for govprod
7 1072745 alert received for govprod
8 1072426 alert received for govprod
9 1072427 alert received for govprod
10 1072428 alert received for govprod
11 1072401 alert received for govprod
12 1072399 alert received for govprod
13 1072400 alert received for govprod
14 1072395 alert received for govprod
15 1072393 alert received for govprod
16 1072394 alert received for govprod
17 1072387 alert received for govprod
18 1072388 alert received for govprod
19 1072389 alert received for govprod
20 1071708 alert received for govprod
21 1060740 alert received for govprod
22 1060741 alert received for govprod
23 1060742 alert received for govprod
24 1060743 alert received for govprod
25 1060294 alert received for govprod
26 1060017 alert recei

In [16]:
try:
    pd.DataFrame(parsed_dict['fsre'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['fsre']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 18 alerts


## Pingdom

In [17]:
parsed_dict['pgdm'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
for i, inc_num in enumerate(df_dict['pgdm']['incident_number'].values):
    # print(i)
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    try:
        val = ('pingdom' not in (df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].lower()))
        if (not val):
            temp['service'] = (df_dict['pgdm']['first_trigger_log_entry.service.summary'].iloc[i])
            
            temp['metric_value'] = ('NA')
            
            temp['description'] = (df_dict['pgdm']['first_trigger_log_entry.event_details.description'].iloc[i])
            
            temp['alert_condition'] = (df_dict['pgdm']['first_trigger_log_entry.channel.details.long_description'].iloc[i])
            
            temp['incident_number'] = (inc_num)
            temp['created_at'] = (df_dict['pgdm']['created_at'].iloc[i])
            
            rem_string = ''
            for j in df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].split('-')[1:]:
                rem_string = rem_string + j
            # print(rem_string)
            # + ' in ' +  df_dict['pgdm']['first_trigger_log_entry.channel.details.full_url'].iloc[i] + df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].split('-')[1:])
            temp['alert_identifier'] = (df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].split('-')[0] + ' in ' +  df_dict['pgdm']['first_trigger_log_entry.channel.details.full_url'].iloc[i] + rem_string)
            temp['node_name'] = (df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].split('-')[0] + df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].split('-')[1])
        else:
            print(df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].lower())
    except:
        print(i, inc_num, df_dict['pgdm']['first_trigger_log_entry.channel.cef_details.message'].iloc[i].lower())
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['pgdm'][e].append(temp[e])

In [18]:
try:
    pd.DataFrame(parsed_dict['pgdm'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['pgdm']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 1098 alerts


## Splunk

In [19]:
parsed_dict['splunk'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
for i, inc_num in enumerate(df_dict['splunk']['incident_number'].values):
    # print(i)
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    try:
        # val = ('alertname' not in (df_dict['splunk']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[0].lower()))
        # if (not val):
        temp['incident_number']=(inc_num)
        # print('hey')
        temp['created_at']=(df_dict['splunk']['created_at'].iloc[i])
        # print('ho')
        temp['alert_identifier']=(df_dict['splunk']['title'].iloc[i])
        temp['node_name']=(temp['alert_identifier'])
        temp['service']=(df_dict['splunk']['first_trigger_log_entry.service.summary'].iloc[i])
        temp['metric_value']=('NA')
        temp['description']=(df_dict['splunk']['first_trigger_log_entry.event_details.description'].iloc[i])
        temp['alert_condition']=('NA')
    except:
        print(i, inc_num, df_dict['splunk']['title'].iloc[i].lower())
    
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['splunk'][e].append(temp[e])

In [20]:
try:
    pd.DataFrame(parsed_dict['splunk'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['splunk']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 2827 alerts


## Datadog

In [21]:
parsed_dict['ddg'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
count = 0
for i, inc_num in enumerate(df_dict['ddg']['incident_number'].values):
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    check = False
    try:
        if ('prod' in df_dict['ddg']['title'].iloc[i].lower()):
            try:
                ls = list(map(str.strip,df_dict['ddg']['title'].iloc[i].lower().split('-')))
                ind = ls.index('prod') + 1
                rem_string = ''
                for j in range(ind,len(ls)):
                    rem_string += ls[j] + ' '
                # print(rem_string)
                flag = True
            except:
                print(i,inc_num, 'prod nhi mila',df_dict['ddg']['title'].iloc[i])
                flag = False
            if flag:
                temp['alert_identifier']=(rem_string)
                temp['incident_number']=(inc_num)
                temp['created_at']=(df_dict['ddg']['created_at'].iloc[i])
                temp['description']=(df_dict['ddg']['first_trigger_log_entry.channel.description'].iloc[i])
                temp['service']=(df_dict['ddg']['first_trigger_log_entry.service.summary'].iloc[i])
                if (' on ' not in temp['alert_identifier'] and ' for ' not in temp['alert_identifier'] and flag):
                    temp['node_name']=(ls[ind])
                else:
                    if(' on ' in temp['alert_identifier']):
                        ind = temp['alert_identifier'].find(' on ')
                        temp['node_name'] = (temp['alert_identifier'][:ind])
                    else:
                        if(' for ' in temp['alert_identifier']):
                            ind = temp['alert_identifier'].find(' for ')
                            temp['node_name']=(temp['alert_identifier'][:ind])
                metric_text = df_dict['ddg']['first_trigger_log_entry.channel.details.body'].iloc[i].lower().split('\n')
                # print(metric_text, '\n')
                check = True
                try:
                    for j in metric_text:
                        if 'metric value' in j:
                            check = False
                            temp['metric_value']=(j.split(':')[1])
                    temp['alert_condition']=(metric_text[2])
                except:
                    count += 1
                    temp['metric_value']=('NA')
                    temp['alert_condition']=('NA')
                if check:
                    print(i, inc_num, metric_text)
        else:
            print(i,'prod hi nhi')
        # val = ('alertname' not in (df_dict['dba']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[0].lower()))
        # if (not val):
        #     parsed_dict['dba']['incident_number'].append(inc_num)
        #     # print('hey')
        #     parsed_dict['dba']['created_at'].append(df_dict['dba']['created_at'].iloc[i])
        #     # print('ho')
        #     parsed_dict['dba']['alert_identifier'].append(df_dict['dba']['first_trigger_log_entry.channel.cef_details.details.firing'].iloc[i].split('\n')[1].split('=')[1].lower())
        #     parsed_dict['dba']['node_name'].append(parsed_dict['dba']['alert_identifier'][-1])
    except Exception as e:
        if check:
            print('yes', metric_text)
        print(i, inc_num, df_dict['ddg']['title'].iloc[i].lower())
        
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['ddg'][e].append(temp[e])

360 530963 ['tungsten connector process is down for isc-a1.in.eu1dc2.echosign.com @slack-adobesignmainapp-sign-alerts-prod  @pagerduty-datadog-prod  ', '', 'procs critical: 4 processes found for tungsten_connector']
361 530960 ['tomcat catalina process is down for app-b4.in.na2dc2.echosign.com @slack-adobesignmainapp-sign-alerts-prod  @pagerduty-datadog-prod   @webhook-haas_adobesign_datadog_alert', '', 'procs critical: 0 processes found for apache_catalina']
364 530858 ['tungsten connector process is down for isc-b6.in.na4dc2.echosign.com @slack-adobesignmainapp-sign-alerts-prod  @pagerduty-datadog-prod  ', '', 'procs critical: 5 processes found for tungsten_connector']
368 530729 ['tomcat catalina process is down for app-b4.in.na2dc2.echosign.com @slack-adobesignmainapp-sign-alerts-prod  @pagerduty-datadog-prod   @webhook-haas_adobesign_datadog_alert', '', 'procs critical: 0 processes found for apache_catalina']
369 530724 ['tomcat catalina process is down for app-b12.in.na1dc1.echos

In [22]:
try:
    pd.DataFrame(parsed_dict['ddg'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['ddg']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 15348 alerts


## EDA

In [23]:
parsed_dict['eda'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
for i, inc_num in enumerate(df_dict['eda']['incident_number'].values):
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    temp['incident_number'] = (inc_num)
    temp['created_at'] = (df_dict['eda']['created_at'].iloc[i])
    if ('eda' in df_dict['eda']['title'].iloc[i].lower()):
        temp['alert_identifier'] = (df_dict['eda']['title'].iloc[i])
        temp['node_name'] = ('EDA ' + df_dict['eda']['title'].iloc[i].split(':')[1])
    else:
        temp['alert_identifier'] = (df_dict['eda']['title'].iloc[i].split(',')[0])
        temp['node_name'] = (df_dict['eda']['title'].iloc[i].split(',')[0])
    temp['service'] = df_dict['eda']['first_trigger_log_entry.service.summary'].iloc[i]
    temp['metric_value'] = 'NA'
    temp['description'] = 'NA'
    temp['alert_condition'] = 'NA'
    
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['eda'][e].append(temp[e])

In [24]:
try:
    pd.DataFrame(parsed_dict['eda'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['eda']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 536 alerts


## Sign MS Techops On Call

In [25]:
parsed_dict['smto'] = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'service':[], 'metric_value':[], 'description':[], 'alert_condition':[]}
for i, inc_num in enumerate(df_dict['smto']['incident_number'].values):
    temp = {'incident_number':None, 'created_at':None, 'alert_identifier':None, 'node_name':None, 'service':None, 'metric_value':None, 'description':None, 'alert_condition':None}
    if ('splunk alert' in df_dict['smto']['title'].iloc[i].lower()):
        temp['incident_number']=(inc_num)
        temp['created_at']=(df_dict['smto']['created_at'].iloc[i])
        temp['alert_identifier']=(df_dict['smto']['title'].iloc[i])
        temp['node_name']=('Splunk Alert ' + df_dict['smto']['title'].iloc[i].split(':')[1])
        temp['service'] = df_dict['smto']['first_trigger_log_entry.service.summary'].iloc[i]
        temp['metric_value'] = 'NA'
        temp['description'] = 'NA'
        temp['alert_condition'] = 'NA'
        
    flag = True
    for e in temp:
        if temp[e] is None:
            flag = False
            break
    if flag:
        for e in temp:
            parsed_dict['smto'][e].append(temp[e])

In [26]:
try:
    pd.DataFrame(parsed_dict['pgdm'])
    print('Parsing did well')
    print('Parsed', len(parsed_dict['smto']['incident_number']), 'alerts')
except:
    print('Not all arrays of same size')

Parsing did well
Parsed 166 alerts


## Merging All

In [27]:
parsed_merged = {'incident_number':[], 'created_at':[], 'alert_identifier':[], 'node_name':[], 'metric_value':[], 'service':[], 'description':[], 'alert_condition':[]}
service_final = ['sre', 'dba' , 'fsre' , 'splunk' , 'pgdm', 'eda', 'ddg', 'smto']
for service in service_final:
    for key in parsed_merged:
        parsed_merged[key] = parsed_merged[key] + parsed_dict[service][key]
parsed_merged['alert_identifier'] = list(map(str.strip, parsed_merged['alert_identifier']))
parsed_merged['node_name'] = list(map(str.strip, parsed_merged['node_name']))

## Finding Unique alerts and creating a Mapping

In [28]:
uniq_alerts = list(set(parsed_merged['node_name']))      ## order remains the same

In [29]:
len(uniq_alerts)

937

In [30]:
alerts_df_dict = {'alert_name':[], 'id':[]}
for i, alert in enumerate (uniq_alerts):
    alerts_df_dict['alert_name'].append(alert)
    alerts_df_dict['id'].append(i)
pd.DataFrame(alerts_df_dict).to_csv('./Alerts_data/alert_id_mapping.csv')

In [31]:
alert_df = pd.read_csv('./Alerts_data/alert_id_mapping.csv')
alerts_mapping = {}
for i, alert in enumerate (alert_df['alert_name'].values):
    alerts_mapping[alert] = i

In [32]:
parsed_merged['alert_id'] = []
for i,alert in enumerate(parsed_merged['node_name']):
    parsed_merged['alert_id'].append(alerts_mapping[alert])

parsed_merged_df = pd.DataFrame(parsed_merged)
parsed_merged_df.to_csv('./Alerts_data/all_alerts_parsed.csv')
parsed_merged_df.head()

Unnamed: 0,incident_number,created_at,alert_identifier,node_name,metric_value,service,description,alert_condition,alert_id
0,635303,2021-08-18T03:18:20Z,instance_disk_volume_usage_high_alert,instance_disk_volume_usage_high_alert,90.5047382227446,Adobe Sign SRE,[FIRING:1] instance_disk_volume_usage_high_ale...,(100 - (sum(node_filesystem_avail_bytes{tiern...,726
1,635155,2021-08-18T01:25:51Z,high_qpid_queuesize_failed_callbackoutboundeve...,high_qpid_queuesize_failed_callbackoutboundeve...,4177.0,Adobe Sign SRE,[FIRING:1] high_qpid_queuesize_failed_callback...,min(qpid_queuedepthmessages{queuename,833
2,635137,2021-08-18T01:12:02Z,high_qpid_oldestmessageage_workflowsignatureq_...,high_qpid_oldestmessageage_workflowsignatureq_...,643673.0,Adobe Sign SRE,[FIRING:1] high_qpid_oldestmessageage_workflow...,min(qpid_oldestmessageage{queuename,612
3,635061,2021-08-18T00:23:49Z,high_qpid_oldestmessageage_agreementexpiration...,high_qpid_oldestmessageage_agreementexpiration...,651633.0,Adobe Sign SRE,[FIRING:1] high_qpid_oldestmessageage_agreemen...,min(qpid_oldestmessageage{queuename,871
4,634681,2021-08-17T19:25:49Z,high_qpid_queuesize_agreementeventpropagationq...,high_qpid_queuesize_agreementeventpropagationq...,707.0,Adobe Sign SRE,[FIRING:1] high_qpid_queuesize_agreementeventp...,min(qpid_queuedepthmessages{queuename,163
