# Codebusters KPIs and Other Usefull Patterns

## import libraries

In [3]:
import pandas as pd
import numpy as np

Import and clean `all_tickets_2022` data set 
- remove unnecessary columns
- rename columns

# Prepare data with all tickets

### Import all service request

Tickets are taken from the following jira filter:<br/> `project = CI AND issuetype in (standardIssueTypes(), "Expense Delivery") AND "Epic Link" is EMPTY AND "Case Number/s" is not EMPTY AND cf[14125] in ("Service Request (SR)") AND resolved is not EMPTY AND resolutiondate >= 2022-12-19 and resolutiondate <= 2023-01-08`

In [15]:
all_sr_origin = pd.read_csv('../../DataSets/KPIs/MainTickets/all_service_requests.csv')
all_sr_origin

Unnamed: 0,Summary,Issue key,Issue id,Issue Type,Status,Project key,Project name,Project type,Project lead,Project description,...,Comment.18,Comment.19,Comment.20,Comment.21,Comment.22,Comment.23,Comment.24,Comment.25,Comment.26,Comment.27
0,Rehau - Add EDI number Airplus,CI-8574,468635,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
1,Numiga VFL Wolfsburg - Remove a receipt,CI-8568,468149,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
2,oebb: correct perdiem/EMS,CI-8560,467735,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
3,Hensoldt - Update Approval notification mail,CI-8554,467243,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
4,vw-cso: activate bing map in itinerary section,CI-8552,467085,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309,Aquila Capital - Expense issue with export and...,CI-6926,393799,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,04/Aug/22 10:36 AM;gmalarski;Export path added...,27/Jan/23 7:54 PM;raphose;SR is closed since A...,,,,,,,,
310,VW - Splitting of approval for expense statements,CI-6911,393585,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
311,Porsche - Adjustment of approval process for n...,CI-6901,393357,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
312,Hugo Boss - add 3 new meal receipt types for t...,CI-6825,389774,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,


### Select from the dataset only columns necessary for the further analysis

In [5]:
all_sr = all_sr_origin[['Issue key', 'Issue id', 'Status', 'Reporter', 'Created', 'Due Date', 'Resolved', 'Custom field (Customer/s)','Custom field (Type of Request)']]
all_sr.rename({"Custom field (Customer/s)": "Customer", "Custom field (Type of Request)":"Type of Request"},axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


In [11]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Issue key        314 non-null    object
 1   Issue id         314 non-null    int64 
 2   Status           314 non-null    object
 3   Reporter         314 non-null    object
 4   Created          314 non-null    object
 5   Due Date         314 non-null    object
 6   Resolved         314 non-null    object
 7   Customer         314 non-null    object
 8   Type of Request  314 non-null    object
dtypes: int64(1), object(8)
memory usage: 22.2+ KB


In [12]:
all_sr.isnull().sum()

Issue key          0
Issue id           0
Status             0
Reporter           0
Created            0
Due Date           0
Resolved           0
Customer           0
Type of Request    0
dtype: int64

### Convert date table to `datetime` object

In [13]:
all_sr["Created"] = pd.to_datetime(all_sr["Created"])
all_sr["Resolved"] = pd.to_datetime(all_sr["Resolved"])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_sr["Created"] = pd.to_datetime(all_sr["Created"])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_sr["Resolved"] = pd.to_datetime(all_sr["Resolved"])


In [14]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Issue key        314 non-null    object        
 1   Issue id         314 non-null    int64         
 2   Status           314 non-null    object        
 3   Reporter         314 non-null    object        
 4   Created          314 non-null    datetime64[ns]
 5   Due Date         314 non-null    object        
 6   Resolved         314 non-null    datetime64[ns]
 7   Customer         314 non-null    object        
 8   Type of Request  314 non-null    object        
dtypes: datetime64[ns](2), int64(1), object(6)
memory usage: 22.2+ KB


**Summary**
- there is not SR ticket without `resolved` date
- there is no `null` value in any of the column
- date type columns converted to `datetime` object in order to make calculucations on these columns

# Prepare data with individuals tasks for service requests

### Import and clean `all_individual_tasks` data set 
- we are not able to downaload full data becuase of jira limitation, therefore we need to combine data from differen periods
- remove unnecessary columns
- replace missing `date` values with 0
- drop rows with "null" values

In [449]:
individual_part_1 = pd.read_csv('../../DataSets/from2022/from20220101_till_2022_05_31.csv')
individual_part_2 = pd.read_csv('../../DataSets/from2022/from_20220601_till_20220831.csv')
individual_part_3 = pd.read_csv('../../DataSets/from2022/from_20220901_till_20221231.csv')
individual_part_4 = pd.read_csv('../../DataSets/from2022/from20230101_till_20230310.csv')
individual_part_5 = pd.read_csv('../../DataSets/from2022/from20230311_till_20230417.csv')

all_individual_tasks_combined = pd.concat([individual_part_1, individual_part_2, individual_part_3, individual_part_4, individual_part_5], axis=0)
all_individual_tasks_combined

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


Unnamed: 0,Summary,Issue key,Issue id,Parent id,Issue Type,Status,Project key,Project name,Project type,Project lead,...,Custom field (cytric Stability Findings),Comment,Comment.1,Component/s,Custom field (Customer/s),Sprint.35,Sprint.36,Custom field (Priority Index),Comment.2,Comment.3
0,Import in Production,CI-10217127,394675,373630,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
1,Communicate,CI-10217126,394671,389490,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
2,Communicate,CI-10217125,394670,393625,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
3,Configure,CI-10217121,394664,389982,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
4,Configure,CI-10217120,394663,370941,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1197,Verify,CI-10235131,455228,447787,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
1198,Import in Production,CI-10235129,455216,441299,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
1199,Import to Customer Test System,CI-10235128,455211,452464,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
1200,Review,CI-10235127,455210,446917,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,


Select only columns needed for further calculations

In [450]:
all_individual_tasks_2022_origin = all_individual_tasks_combined[['Summary','Issue id', 'Parent id', 'Created', 'Resolved']]
all_individual_tasks_2022 = all_individual_tasks_2022_origin

Drop columns with missing values

In [451]:
# all_individual_tasks_2022.dropna(inplace=True)

### Convert date like columns to `datetime` object

In [452]:
all_individual_tasks_2022["Created"] = pd.to_datetime(all_individual_tasks_2022["Created"])
all_individual_tasks_2022["Resolved"] = pd.to_datetime(all_individual_tasks_2022["Resolved"])
all_individual_tasks_2022.info()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_2022["Created"] = pd.to_datetime(all_individual_tasks_2022["Created"])


<class 'pandas.core.frame.DataFrame'>
Int64Index: 14409 entries, 0 to 1201
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Summary    14409 non-null  object        
 1   Issue id   14409 non-null  int64         
 2   Parent id  14409 non-null  int64         
 3   Created    14409 non-null  datetime64[ns]
 4   Resolved   13953 non-null  datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(1)
memory usage: 675.4+ KB


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_2022["Resolved"] = pd.to_datetime(all_individual_tasks_2022["Resolved"])


In [453]:
all_individual_tasks_2022['Summary'].value_counts()

Review                            2571
Configure                         2539
Verify                            2253
Communicate                       2048
Import to Test System             1859
Import in Production              1756
Import to Customer Test System     639
Explain                            350
Import to Staging System           161
Clarify                            155
Do                                  65
Maciej Training                      2
Michał Training                      2
Marcin Training                      2
Training - Maciej                    1
Bartosz Training                     1
configure 1                          1
configure 2                          1
Training - Marcin                    1
Import to Test/Staging               1
CLONE - Review                       1
Name: Summary, dtype: int64

In [454]:
all_individual_tasks_2022.isnull().sum()

Summary        0
Issue id       0
Parent id      0
Created        0
Resolved     456
dtype: int64

### Daterminate how long each ticket was in particular individual task

- calculate time for eacch subtask
- calculate how long the ticket was in review, configure, verify, import.., communicate steps
- calcualte the total eg. sum all time spend in verify for one ticket


In [455]:
all_individual_tasks_2022

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved
0,Import in Production,394675,373630,2022-05-30 22:30:00,2022-06-21 13:17:00
1,Communicate,394671,389490,2022-05-30 20:27:00,2022-05-31 09:59:00
2,Communicate,394670,393625,2022-05-30 20:27:00,2022-05-31 09:04:00
3,Configure,394664,389982,2022-05-30 18:01:00,2022-05-31 14:22:00
4,Configure,394663,370941,2022-05-30 17:43:00,2022-06-17 13:58:00
...,...,...,...,...,...
1197,Verify,455228,447787,2023-03-13 07:41:00,2023-03-13 07:42:00
1198,Import in Production,455216,441299,2023-03-13 05:38:00,2023-04-04 16:32:00
1199,Import to Customer Test System,455211,452464,2023-03-13 04:54:00,2023-03-15 07:48:00
1200,Review,455210,446917,2023-03-13 04:14:00,2023-03-13 10:41:00


#### Calculate time from creation till ticket was resolved

In [456]:
all_individual_tasks_2022_time_calculation = all_individual_tasks_2022
all_individual_tasks_2022_time_calculation["time_in"] = all_individual_tasks_2022_time_calculation["Resolved"] - all_individual_tasks_2022_time_calculation["Created"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_2022_time_calculation["time_in"] = all_individual_tasks_2022_time_calculation["Resolved"] - all_individual_tasks_2022_time_calculation["Created"]


In [457]:
all_individual_tasks_2022_time_calculation.loc[all_individual_tasks_2022_time_calculation["Parent id"] == 438382]

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
25,Review,438383,438382,2022-12-29 14:47:00,2023-01-04 11:44:00,5 days 20:57:00
2070,Import in Production,439047,438382,2023-01-04 11:44:00,NaT,NaT
2071,Configure,439046,438382,2023-01-04 11:44:00,2023-01-04 11:44:00,0 days 00:00:00
289,Communicate,463676,438382,2023-04-06 15:26:00,2023-04-06 15:26:00,0 days 00:00:00


In [458]:
all_individual_tasks_2022_time_calculation["Summary"].value_counts()

Review                            2571
Configure                         2539
Verify                            2253
Communicate                       2048
Import to Test System             1859
Import in Production              1756
Import to Customer Test System     639
Explain                            350
Import to Staging System           161
Clarify                            155
Do                                  65
Maciej Training                      2
Michał Training                      2
Marcin Training                      2
Training - Maciej                    1
Bartosz Training                     1
configure 1                          1
configure 2                          1
Training - Marcin                    1
Import to Test/Staging               1
CLONE - Review                       1
Name: Summary, dtype: int64

### Calculate time ticket was in review step

In [459]:
all_individual_tasks_reivew_time = all_individual_tasks_2022[all_individual_tasks_2022['Summary'] == 'Review']
all_individual_tasks_reivew_time

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
5,Review,394662,370941,2022-05-30 17:42:00,2022-05-30 17:43:00,0 days 00:01:00
49,Review,394479,394478,2022-05-30 10:37:00,2022-06-01 15:58:00,2 days 05:21:00
70,Review,394391,394390,2022-05-30 04:46:00,2022-05-31 08:28:00,1 days 03:42:00
77,Review,394271,394270,2022-05-27 14:42:00,2022-05-27 16:12:00,0 days 01:30:00
89,Review,394220,394219,2022-05-27 13:03:00,2022-05-27 13:59:00,0 days 00:56:00
...,...,...,...,...,...,...
1145,Review,455595,455594,2023-03-13 18:39:00,2023-03-14 08:33:00,0 days 13:54:00
1147,Review,455586,455585,2023-03-13 17:31:00,2023-03-15 09:33:00,1 days 16:02:00
1179,Review,455382,455381,2023-03-13 13:50:00,2023-03-29 11:17:00,15 days 21:27:00
1192,Review,455294,449351,2023-03-13 09:23:00,2023-04-13 17:13:00,31 days 07:50:00


In [460]:
all_individual_tasks_reivew_time.isnull().sum()

Summary        0
Issue id       0
Parent id      0
Created        0
Resolved     155
time_in      155
dtype: int64

Drop rows with `null` in `resolved` columns

In [461]:
all_individual_tasks_reivew_no_null = all_individual_tasks_reivew_time.dropna(subset=['Resolved'])
all_individual_tasks_reivew_no_null.isnull().sum()

Summary      0
Issue id     0
Parent id    0
Created      0
Resolved     0
time_in      0
dtype: int64

Group rows by `Parent id` to get all subtaks for one ticekt

In [462]:
all_individual_tasks_reivew_no_null.groupby("Parent id")["time_in"].sum()

Parent id
174802     0 days 00:01:00
329036     0 days 00:00:00
336670     0 days 00:01:00
338553   238 days 13:36:00
345462     8 days 01:23:00
                ...       
464933     0 days 17:32:00
464936     0 days 18:52:00
464974     0 days 00:00:00
464979     0 days 00:08:00
465008     0 days 00:00:00
Name: time_in, Length: 2117, dtype: timedelta64[ns]

In [463]:
type(all_individual_tasks_reivew_no_null)

pandas.core.frame.DataFrame

In [464]:
all_individual_tasks_reivew_no_null.rename({"time_in": "time_in_review"},axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


In [465]:
all_individual_tasks_reivew_no_null[['Parent id', 'time_in_review']]

Unnamed: 0,Parent id,time_in_review
5,370941,0 days 00:01:00
49,394478,2 days 05:21:00
70,394390,1 days 03:42:00
77,394270,0 days 01:30:00
89,394219,0 days 00:56:00
...,...,...
1145,455594,0 days 13:54:00
1147,455585,1 days 16:02:00
1179,455381,15 days 21:27:00
1192,449351,31 days 07:50:00


In [466]:
all_tickets_with_review_time = pd.merge(all_tickets_without_null, all_individual_tasks_reivew_no_null[['Parent id', 'time_in_review']], how="left",left_on = "Issue id",right_on = "Parent id")
all_tickets_with_review_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),464979.0,0 days 00:08:00
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),464220.0,0 days 01:17:00
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),463947.0,1 days 04:09:00
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),463886.0,3 days 23:42:00
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),462678.0,1 days 22:09:00
...,...,...,...,...,...,...,...,...,...,...,...
152,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),432002.0,17 days 19:53:00
153,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),431986.0,4 days 01:37:00
154,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),431942.0,0 days 02:40:00
155,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),431819.0,4 days 04:50:00


In [467]:
all_tickets_with_review_time.shape

(157, 11)

### Determinate time spent on codebusters

- calculate the total time ticket was in subtasks "Configure", "Review", "mport to Test System", "Import in Production", "Import to Customer Test System"
- apply the results to new dataset based on `all_tickets_2022`

In [468]:
all_individual_with_codebusters_time = all_individual_tasks_2022_time_calculation.loc[all_individual_tasks_2022_time_calculation["Summary"].isin(["Configure", "Review", "Import to Test System", "Import in Production", "Import to Customer Test System", "Import to Test", "Import to Staging System"])]

Summarize time for each codebusters subtask 

In [469]:
codebusters_time = all_individual_with_codebusters_time.groupby("Parent id")["time_in"].sum()

In [470]:
codebusters_time.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)

In [471]:
codebusters_time

Parent id
174802    0 days 00:01:00
327573   16 days 00:25:00
329036   53 days 01:03:00
330489    0 days 01:04:00
330742   14 days 03:15:00
               ...       
464936    0 days 18:52:00
464974    0 days 00:08:00
464979    0 days 01:47:00
465008    2 days 21:36:00
465050    0 days 00:00:00
Name: time_in, Length: 2336, dtype: timedelta64[ns]

In [472]:
codebusters_time.isnull().sum()

0

In [473]:
all_tickets.shape

(166, 9)

In [474]:
all_tickets.isnull().sum()

Issue key           0
Issue id            0
Status              0
Reporter            0
Created             0
Due Date            0
Resolved           36
Customer            0
Type of Request     0
dtype: int64

Merge total time on codebusters with all jira ticekts dataset

In [475]:
all_tickets_with_codebusters_time = pd.merge(all_tickets_without_null, codebusters_time, how="left",left_on = "Issue id",right_on = "Parent id")

In [476]:
all_tickets_with_codebusters_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,time_in
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),0 days 01:47:00
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),2 days 17:20:00
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),1 days 04:09:00
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),4 days 02:10:00
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),2 days 04:40:00
...,...,...,...,...,...,...,...,...,...,...
125,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),73 days 20:50:00
126,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),8 days 14:11:00
127,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),0 days 02:41:00
128,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),5 days 04:12:00


In [477]:
all_tickets_with_codebusters_time.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)
all_tickets_with_codebusters_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,time_on_codebusters
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),0 days 01:47:00
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),2 days 17:20:00
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),1 days 04:09:00
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),4 days 02:10:00
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),2 days 04:40:00
...,...,...,...,...,...,...,...,...,...,...
125,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),73 days 20:50:00
126,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),8 days 14:11:00
127,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),0 days 02:41:00
128,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),5 days 04:12:00


In [478]:
all_tickets_with_codebusters_time.notnull().sum()

Issue key              130
Issue id               130
Status                 130
Reporter               130
Created                130
Due Date               130
Resolved               130
Customer               130
Type of Request        130
time_on_codebusters    130
dtype: int64

### Determinate time spent outside codebusters

- calculate the total time ticket was in one of the subtask "Explain", "Verify", "Clarify", "Communicate", "Import in Production", "Import to Customer Test System"
- apply the results to new dataset based on `all_tickets_2022`

In [479]:
all_individual_with_others_time = all_individual_tasks_2022_time_calculation.loc[all_individual_tasks_2022_time_calculation["Summary"].isin(["Explain", "Verify", "Clarify", "Communicate"])]

In [480]:
all_individual_with_others_time

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
1,Communicate,394671,389490,2022-05-30 20:27:00,2022-05-31 09:59:00,0 days 13:32:00
2,Communicate,394670,393625,2022-05-30 20:27:00,2022-05-31 09:04:00,0 days 12:37:00
6,Communicate,394656,370941,2022-05-30 17:01:00,2022-05-30 17:02:00,0 days 00:01:00
8,Verify,394654,370941,2022-05-30 17:01:00,2022-05-30 17:01:00,0 days 00:00:00
10,Verify,394652,370941,2022-05-30 17:01:00,2022-05-30 17:01:00,0 days 00:00:00
...,...,...,...,...,...,...
1181,Communicate,455378,446917,2023-03-13 13:33:00,2023-03-22 00:36:00,8 days 11:03:00
1187,Verify,455361,443331,2023-03-13 12:29:00,2023-03-13 12:29:00,0 days 00:00:00
1193,Communicate,455291,453008,2023-03-13 09:21:00,2023-03-13 09:49:00,0 days 00:28:00
1194,Verify,455269,453010,2023-03-13 08:57:00,2023-03-13 14:22:00,0 days 05:25:00


Summarize time for each subtask outside codeubsters

In [481]:
others_time = all_individual_with_others_time.groupby("Parent id")["time_in"].sum()

Summarize time for each subtask outside codeubsters

Merge `all_tickets` with time outisde codebusters column

In [482]:
all_tickets_with_outside_codebusters_time = pd.merge(all_tickets_without_null, others_time, how="left",left_on = "Issue id",right_on = "Parent id")

In [483]:
all_tickets_with_outside_codebusters_time.rename({"time_in": "time_outside_codebusters"},axis=1, inplace=True)
all_tickets_with_outside_codebusters_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,time_outside_codebusters
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),3 days 06:53:00
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),0 days 01:56:00
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),0 days 00:00:00
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),0 days 21:08:00
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),3 days 19:17:00
...,...,...,...,...,...,...,...,...,...,...
125,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),0 days 00:21:00
126,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),10 days 11:55:00
127,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),0 days 00:19:00
128,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),2 days 18:57:00


### Merge all dataframes together

Merge df containing "review" time with df containg "time on codebusters" 

In [484]:
all_tickets_with_time = pd.merge(all_tickets_with_review_time, all_tickets_with_codebusters_time[['Issue id','time_on_codebusters']],  how="left",left_on = "Issue id",right_on = "Issue id")

Merge df containing "review" and "time on codebusters" time with df containg "time outside codebusters" 

In [485]:
all_tickets_with_time = pd.merge(all_tickets_with_time, all_tickets_with_outside_codebusters_time[['Issue id','time_outside_codebusters']],  how="left",left_on = "Issue id",right_on = "Issue id")

In [486]:
all_tickets_with_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review,time_on_codebusters,time_outside_codebusters
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),464979.0,0 days 00:08:00,0 days 01:47:00,3 days 06:53:00
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),464220.0,0 days 01:17:00,2 days 17:20:00,0 days 01:56:00
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),463947.0,1 days 04:09:00,1 days 04:09:00,0 days 00:00:00
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),463886.0,3 days 23:42:00,4 days 02:10:00,0 days 21:08:00
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),462678.0,1 days 22:09:00,2 days 04:40:00,3 days 19:17:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
152,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),432002.0,17 days 19:53:00,73 days 20:50:00,0 days 00:21:00
153,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),431986.0,4 days 01:37:00,8 days 14:11:00,10 days 11:55:00
154,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),431942.0,0 days 02:40:00,0 days 02:41:00,0 days 00:19:00
155,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),431819.0,4 days 04:50:00,5 days 04:12:00,2 days 18:57:00


### Convert `time in..` columns to minutes

Convert `timedelta` columns to float

In [487]:
all_tickets_with_time_converted = all_tickets_with_time
all_tickets_with_time_converted.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 157 entries, 0 to 156
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype          
---  ------                    --------------  -----          
 0   Issue key                 157 non-null    object         
 1   Issue id                  157 non-null    int64          
 2   Status                    157 non-null    object         
 3   Reporter                  157 non-null    object         
 4   Created                   157 non-null    datetime64[ns] 
 5   Due Date                  157 non-null    object         
 6   Resolved                  157 non-null    datetime64[ns] 
 7   Customer                  157 non-null    object         
 8   Type of Request           157 non-null    object         
 9   Parent id                 152 non-null    float64        
 10  time_in_review            152 non-null    timedelta64[ns]
 11  time_on_codebusters       157 non-null    timedelta64[ns]
 12  time_out

Convert `time in..` columns to minutes

In [488]:
all_tickets_with_time_converted['time_in_review'] = all_tickets_with_time_converted['time_in_review'].dt.total_seconds().div(60)
all_tickets_with_time_converted

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review,time_on_codebusters,time_outside_codebusters
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),464979.0,8.0,0 days 01:47:00,3 days 06:53:00
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),464220.0,77.0,2 days 17:20:00,0 days 01:56:00
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),463947.0,1689.0,1 days 04:09:00,0 days 00:00:00
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),463886.0,5742.0,4 days 02:10:00,0 days 21:08:00
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),462678.0,2769.0,2 days 04:40:00,3 days 19:17:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
152,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),432002.0,25673.0,73 days 20:50:00,0 days 00:21:00
153,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),431986.0,5857.0,8 days 14:11:00,10 days 11:55:00
154,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),431942.0,160.0,0 days 02:41:00,0 days 00:19:00
155,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),431819.0,6050.0,5 days 04:12:00,2 days 18:57:00


In [489]:
all_tickets_with_time_converted['time_on_codebusters'] = all_tickets_with_time_converted['time_on_codebusters'].dt.total_seconds().div(60)
all_tickets_with_time_converted

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review,time_on_codebusters,time_outside_codebusters
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),464979.0,8.0,107.0,3 days 06:53:00
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),464220.0,77.0,3920.0,0 days 01:56:00
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),463947.0,1689.0,1689.0,0 days 00:00:00
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),463886.0,5742.0,5890.0,0 days 21:08:00
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),462678.0,2769.0,3160.0,3 days 19:17:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
152,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),432002.0,25673.0,106370.0,0 days 00:21:00
153,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),431986.0,5857.0,12371.0,10 days 11:55:00
154,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),431942.0,160.0,161.0,0 days 00:19:00
155,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),431819.0,6050.0,7452.0,2 days 18:57:00


In [490]:
all_tickets_with_time_converted['time_outside_codebusters'] = all_tickets_with_time['time_outside_codebusters'].dt.total_seconds().div(60)
all_tickets_with_time_converted

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review,time_on_codebusters,time_outside_codebusters
0,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),464979.0,8.0,107.0,4733.0
1,CI-8458,464220,Closed,ssanz,2023-04-11 14:18:00,12/Apr/23 12:00 AM,2023-04-14 09:34:00,TUI AG,Incident Request (IR),464220.0,77.0,3920.0,116.0
2,CI-8453,463947,Closed,ssanz,2023-04-10 12:13:00,14/Apr/23 12:00 AM,2023-04-11 16:22:00,Allianz,Service Request (SR),463947.0,1689.0,1689.0,0.0
3,CI-8452,463886,Closed,kgostynska,2023-04-07 15:59:00,11/Apr/23 12:00 AM,2023-04-12 15:17:00,Porsche,Incident Request (IR),463886.0,5742.0,5890.0,1268.0
4,CI-8423,462678,Closed,nsanchez,2023-04-04 10:51:00,10/Apr/23 12:00 AM,2023-04-10 10:48:00,Allianz,Service Request (SR),462678.0,2769.0,3160.0,5477.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
152,CI-7795,432002,Closed,nsanchez,2022-12-01 13:05:00,15/Dec/22 12:00 AM,2023-02-13 10:17:00,Amadeus,Service Request (SR),432002.0,25673.0,106370.0,21.0
153,CI-7794,431986,Closed,skarja,2022-12-01 12:38:00,08/Dec/22 12:00 AM,2022-12-20 14:44:00,Deutschlandradio,Service Request (SR),431986.0,5857.0,12371.0,15115.0
154,CI-7792,431942,Closed,ssanz,2022-12-01 11:33:00,08/Dec/22 12:00 AM,2022-12-01 14:33:00,Allianz,Incident Request (IR),431942.0,160.0,161.0,19.0
155,CI-7789,431819,Closed,yutaka,2022-12-01 05:06:00,08/Dec/22 12:00 AM,2022-12-09 04:15:00,Ineos,Service Request (SR),431819.0,6050.0,7452.0,4017.0


In [491]:
all_tickets_with_time_converted.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   5
time_in_review              5
time_on_codebusters         0
time_outside_codebusters    1
dtype: int64

In [492]:
all_tickets_with_time_converted.dropna(subset=['time_in_review'], how="any", inplace=True)
all_tickets_with_time_converted.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_in_review              0
time_on_codebusters         0
time_outside_codebusters    0
dtype: int64

## Save data for KPIs purposes

In [493]:
all_tickets_with_time_converted.to_csv('all_partners.csv', index=False)

### Add a column to all tasks with the latest communicate step date
- count "real" close time, bease on the date when ticket was moved to `communicate` step

In [494]:
all_individual_tasks_2022

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
0,Import in Production,394675,373630,2022-05-30 22:30:00,2022-06-21 13:17:00,21 days 14:47:00
1,Communicate,394671,389490,2022-05-30 20:27:00,2022-05-31 09:59:00,0 days 13:32:00
2,Communicate,394670,393625,2022-05-30 20:27:00,2022-05-31 09:04:00,0 days 12:37:00
3,Configure,394664,389982,2022-05-30 18:01:00,2022-05-31 14:22:00,0 days 20:21:00
4,Configure,394663,370941,2022-05-30 17:43:00,2022-06-17 13:58:00,17 days 20:15:00
...,...,...,...,...,...,...
1197,Verify,455228,447787,2023-03-13 07:41:00,2023-03-13 07:42:00,0 days 00:01:00
1198,Import in Production,455216,441299,2023-03-13 05:38:00,2023-04-04 16:32:00,22 days 10:54:00
1199,Import to Customer Test System,455211,452464,2023-03-13 04:54:00,2023-03-15 07:48:00,2 days 02:54:00
1200,Review,455210,446917,2023-03-13 04:14:00,2023-03-13 10:41:00,0 days 06:27:00


In [495]:
summary_duplicates = all_individual_tasks_2022.duplicated(['Summary', 'Parent id'], keep=False)
all_individual_tasks_2022[summary_duplicates].sort_values("Parent id")

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
448,Verify,392575,327573,2022-05-18 13:20:00,2022-05-18 13:20:00,0 days 00:00:00
2393,Verify,395426,327573,2022-06-03 13:45:00,2022-06-03 13:45:00,0 days 00:00:00
2866,Configure,372862,329036,2022-03-02 13:49:00,2022-04-20 13:03:00,48 days 23:14:00
1200,Communicate,386568,329036,2022-04-25 10:56:00,2022-04-25 10:57:00,0 days 00:01:00
1251,Import in Production,386221,329036,2022-04-21 11:00:00,2022-04-25 10:56:00,3 days 23:56:00
...,...,...,...,...,...,...
165,Import to Test System,464549,464220,2023-04-12 16:19:00,2023-04-12 16:19:00,0 days 00:00:00
229,Verify,464267,464220,2023-04-11 15:37:00,2023-04-11 15:54:00,0 days 00:17:00
227,Configure,464273,464220,2023-04-11 15:54:00,2023-04-12 16:19:00,1 days 00:25:00
13,Verify,465057,464226,2023-04-14 11:56:00,2023-04-14 12:33:00,0 days 00:37:00


In [496]:
all_individual_tasks_2022_communicate = all_individual_tasks_2022[all_individual_tasks_2022["Summary"] == "Communicate"]
all_individual_tasks_2022_communicate.sort_values("Parent id")

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
914,Communicate,432840,316466,2022-12-05 15:32:00,NaT,NaT
3250,Communicate,370798,324525,2022-02-18 20:16:00,NaT,NaT
3251,Communicate,370797,326547,2022-02-18 20:14:00,2022-02-18 20:16:00,0 days 00:02:00
2391,Communicate,395428,327573,2022-06-03 13:45:00,2022-06-03 13:45:00,0 days 00:00:00
1200,Communicate,386568,329036,2022-04-25 10:56:00,2022-04-25 10:57:00,0 days 00:01:00
...,...,...,...,...,...,...
41,Communicate,465003,464220,2023-04-14 09:09:00,2023-04-14 09:34:00,0 days 00:25:00
89,Communicate,464826,464675,2023-04-13 14:21:00,2023-04-17 10:12:00,3 days 19:51:00
81,Communicate,464838,464816,2023-04-13 14:49:00,NaT,NaT
47,Communicate,464986,464974,2023-04-14 04:39:00,NaT,NaT


In [497]:
all_individual_tasks_2022_communicate.nunique()

Summary         1
Issue id     2048
Parent id    1862
Created      1865
Resolved     1743
time_in      1185
dtype: int64

Drop rows with duplicated `Communicate` subtask and keep only the newest one

In [498]:
all_individual_tasks_2022_communicate.sort_values(["Parent id", "Created"])
all_individual_tasks_2022_communicate_no_duplicates = all_individual_tasks_2022_communicate.drop_duplicates(['Summary', 'Parent id'], keep="last")
# all_individual_tasks_2022_communicate_no_duplicates = all_individual_tasks_2022_communicate
all_individual_tasks_2022_communicate_no_duplicates

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
2,Communicate,394670,393625,2022-05-30 20:27:00,2022-05-31 09:04:00,0 days 12:37:00
22,Communicate,394578,394215,2022-05-30 15:27:00,2022-06-03 09:53:00,3 days 18:26:00
27,Communicate,394570,393200,2022-05-30 15:09:00,2022-05-30 17:27:00,0 days 02:18:00
29,Communicate,394565,394093,2022-05-30 15:05:00,2022-06-17 03:13:00,17 days 12:08:00
30,Communicate,394559,392974,2022-05-30 14:43:00,2022-06-07 01:58:00,7 days 11:15:00
...,...,...,...,...,...,...
1137,Communicate,455610,455603,2023-03-14 02:12:00,2023-03-15 00:20:00,0 days 22:08:00
1150,Communicate,455580,450086,2023-03-13 17:05:00,2023-03-17 10:11:00,3 days 17:06:00
1180,Communicate,455379,451967,2023-03-13 13:35:00,2023-03-13 14:13:00,0 days 00:38:00
1181,Communicate,455378,446917,2023-03-13 13:33:00,2023-03-22 00:36:00,8 days 11:03:00


In [499]:
all_tickets_with_time_and_cummunicate_date = pd.merge(all_tickets_with_time_converted, all_individual_tasks_2022_communicate_no_duplicates[['Parent id','Created']], how="left",left_on = "Issue id",right_on = "Parent id").drop(columns=["Parent id"])
all_tickets_with_time_and_cummunicate_date.rename({"Created_y": "comminicate_created"},axis=1, inplace=True)
all_tickets_with_time_and_cummunicate_date

KeyError: "['Parent id'] not found in axis"

Drop rows without `communicate_create`, we are taking in to consideration only completed tickets

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.isnull().sum()

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.nunique()

In [None]:
500 - 145

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.dropna(subset=['comminicate_created'], how="any", inplace=True)

In [None]:
all_tickets_2022_with_time_and_cummunicate_date

**Summary:**
- rows with null value are droped
- we take into consideration only completed tickets

### Save data to csv file

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.to_csv('all_tickets_combined.csv', index=False)

In [None]:
all_individual_tasks_2022_communicate_no_duplicates.nunique()

## Data Analysis

### Compare the size of the created data sets
- compare the size of each datasests, in order to verify v

In [None]:
all_individual_tasks_2022_origin[all_individual_tasks_2022_origin['Parent id'] == 450084]

In [None]:
all_individual_tasks_2022.info()

**Summary:**
- We should receive inside `all_individual_tasks_2022_communicate_no_duplicates` 1372 rows not 1297

# To Do:
- invesitigate discrepency between datasests: `codebusters_time`(1798 rows),  `others_time` (1672 rows), `all_tickets_2022` (1699 rows)

In [None]:
duplucated_communicate = all_individual_tasks_2022_communicate.duplicated(['Summary', 'Parent id'])
all_individual_tasks_2022_communicate[duplucated_communicate]

In [None]:
codebusters_time.shape

In [None]:
others_time.shape

In [None]:
all_tickets.shape

## Investigate different columns with the tickets age

- check the useful KPIs

In [None]:
all_tickets_2022_updated.head(-5)

In [None]:
all_tickets_2022_updated[all_tickets_2022_updated['Issue key'] == 'CI-7689']

In [None]:
all_tickets_2022_updated['age_till_communicate_hours'].describe()

In [None]:
# all_tickets_2022_updated.to_csv('all_tickets_2022_updated.csv')