# Codebusters KPIs and Other Useful Patterns

## Import libraries

In [91]:
import pandas as pd
import numpy as np

Import and clean `all_tickets_2022` data set 
- remove unnecessary columns
- rename columns

# Prepare data with all tickets

### Import all service requests

Tickets are taken from the following Jira filter:<br/> `project = CI AND issuetype in (standardIssueTypes(), "Expense Delivery") AND "Epic Link" is EMPTY AND "Case Number/s" is not EMPTY AND cf[14125] in ("Service Request (SR)") AND resolved is not EMPTY AND resolutiondate >= 2022-12-19 and resolutiondate <= 2023-01-08`

In [92]:
all_sr_origin = pd.read_csv('../../DataSets/KPIs/MainTickets/all_service_requests.csv')
all_sr_origin

Unnamed: 0,Summary,Issue key,Issue id,Issue Type,Status,Project key,Project name,Project type,Project lead,Project description,...,Comment.18,Comment.19,Comment.20,Comment.21,Comment.22,Comment.23,Comment.24,Comment.25,Comment.26,Comment.27
0,Rehau - Add EDI number Airplus,CI-8574,468635,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
1,Numiga VFL Wolfsburg - Remove a receipt,CI-8568,468149,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
2,oebb: correct perdiem/EMS,CI-8560,467735,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
3,Hensoldt - Update Approval notification mail,CI-8554,467243,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
4,vw-cso: activate bing map in itinerary section,CI-8552,467085,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309,Aquila Capital - Expense issue with export and...,CI-6926,393799,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,04/Aug/22 10:36 AM;gmalarski;Export path added...,27/Jan/23 7:54 PM;raphose;SR is closed since A...,,,,,,,,
310,VW - Splitting of approval for expense statements,CI-6911,393585,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
311,Porsche - Adjustment of approval process for n...,CI-6901,393357,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
312,Hugo Boss - add 3 new meal receipt types for t...,CI-6825,389774,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,


### Select from the dataset only columns necessary for the further analysis

In [93]:
# Copy the selected columns so that later changes do not act on a slice of all_sr_origin
all_sr = all_sr_origin[['Issue key', 'Issue id', 'Status', 'Reporter', 'Created', 'Due Date', 'Resolved', 'Custom field (Customer/s)','Custom field (Type of Request)']].copy()
all_sr = all_sr.rename(columns={"Custom field (Customer/s)": "Customer", "Custom field (Type of Request)": "Type of Request"})


In [94]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Issue key        314 non-null    object
 1   Issue id         314 non-null    int64 
 2   Status           314 non-null    object
 3   Reporter         314 non-null    object
 4   Created          314 non-null    object
 5   Due Date         314 non-null    object
 6   Resolved         314 non-null    object
 7   Customer         314 non-null    object
 8   Type of Request  314 non-null    object
dtypes: int64(1), object(8)
memory usage: 22.2+ KB


In [95]:
all_sr.isnull().sum()

Issue key          0
Issue id           0
Status             0
Reporter           0
Created            0
Due Date           0
Resolved           0
Customer           0
Type of Request    0
dtype: int64

### Convert date columns to `datetime` objects

In [96]:
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
all_sr = all_sr.assign(Created=pd.to_datetime(all_sr["Created"]), Resolved=pd.to_datetime(all_sr["Resolved"]))


In [97]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Issue key        314 non-null    object        
 1   Issue id         314 non-null    int64         
 2   Status           314 non-null    object        
 3   Reporter         314 non-null    object        
 4   Created          314 non-null    datetime64[ns]
 5   Due Date         314 non-null    object        
 6   Resolved         314 non-null    datetime64[ns]
 7   Customer         314 non-null    object        
 8   Type of Request  314 non-null    object        
dtypes: datetime64[ns](2), int64(1), object(6)
memory usage: 22.2+ KB
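
The conversion above lets pandas infer the date layout. A minimal sketch of the same step with an explicit format is shown below. The format string is an assumption derived from the sample values visible in this notebook (for example `15/May/23 1:22 PM`), and `JIRA_DATE_FORMAT` is a hypothetical name, not part of the original analysis. Month abbreviations are matched against the active locale, so this assumes an English locale.

```python
import pandas as pd

# Sample values copied from the export shown in this notebook.
raw = pd.Series(["15/May/23 1:22 PM", "26/Apr/23 6:11 PM", "08/May/23 6:17 PM"])

# Assumed layout: day / abbreviated month / two-digit year, 12-hour clock with AM/PM.
JIRA_DATE_FORMAT = "%d/%b/%y %I:%M %p"

# With an explicit format, a value in a different layout raises an error
# instead of being silently parsed with a guessed layout.
parsed = pd.to_datetime(raw, format=JIRA_DATE_FORMAT)
print(parsed)
```

The `info()` output above shows that `Due Date` is still an `object` column. If due dates are needed later, the same call would probably apply, since the values use the same layout.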


In [98]:
all_sr.shape

(314, 9)

**Summary**
- no SR ticket is missing a `Resolved` date
- no column contains `null` values
- date columns are converted to `datetime` objects so that calculations can be made on them

# Prepare data with individual tasks for service requests

## Import and clean `all_individual_tasks` data set
- we are not able to download the full data because of a Jira limitation, so we need to combine data from different periods
- remove unnecessary columns
- replace missing `date` values with 0
- drop rows with "null" values

In [99]:
individual_part_1 = pd.read_csv('DataSets/IndividualTasks/from_20220901_till_20221231.csv')
individual_part_2 = pd.read_csv('DataSets/IndividualTasks/from_20230101_till_20230331.csv')
individual_part_3 = pd.read_csv('DataSets/IndividualTasks/from_20230401_till_20230516.csv')



all_individual_tasks_combined = pd.concat([individual_part_1, individual_part_2, individual_part_3], axis=0)
all_individual_tasks_combined



Unnamed: 0,Summary,Issue key,Issue id,Parent id,Issue Type,Status,Project key,Project name,Project type,Project lead,...,Custom field (When did it happen?),Custom field (When was it solved?),Custom field (Which customers are affected?),Custom field (arctic Dev Findings),Custom field (arctic QA Findings),Custom field (cytric Stability Findings),Comment,Comment.1,Comment.2,Comment.3
0,Review,CI-10239939,472135,472134,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
1,Review,CI-10239935,472112,472111,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
2,Import to Test System,CI-10239926,472054,470604,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
3,Configure,CI-10239925,472053,470604,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
4,Import in Production,CI-10239908,472007,470369,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14,Review,CI-10239444,470370,470369,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
15,Review,CI-10239415,470330,470329,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
16,Communicate,CI-10238747,468182,454791,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
17,Communicate,CI-10238742,468160,467976,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,


### Reset the index of the combined dataset
After concatenation each part keeps its own row labels, so the labels repeat and no longer match the row positions. We apply the `reset_index` method so that the index runs from 0 to the number of rows minus one.

In [100]:
all_individual_tasks_combined.reset_index(inplace=True)
all_individual_tasks_combined

Unnamed: 0,index,Summary,Issue key,Issue id,Parent id,Issue Type,Status,Project key,Project name,Project type,...,Custom field (When did it happen?),Custom field (When was it solved?),Custom field (Which customers are affected?),Custom field (arctic Dev Findings),Custom field (arctic QA Findings),Custom field (cytric Stability Findings),Comment,Comment.1,Comment.2,Comment.3
0,0,Review,CI-10239939,472135,472134,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
1,1,Review,CI-10239935,472112,472111,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
2,2,Import to Test System,CI-10239926,472054,470604,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
3,3,Configure,CI-10239925,472053,470604,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
4,4,Import in Production,CI-10239908,472007,470369,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6104,14,Review,CI-10239444,470370,470369,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
6105,15,Review,CI-10239415,470330,470329,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
6106,16,Communicate,CI-10238747,468182,454791,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
6107,17,Communicate,CI-10238742,468160,467976,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
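
The concatenate-then-reset steps above can also be done in a single call. The sketch below uses two small stand-in frames (`part_1` and `part_2` are hypothetical names with values taken from the tables above), not the real exports.

```python
import pandas as pd

# Two stand-in frames; in the notebook these would be the per-period CSV exports.
part_1 = pd.DataFrame({"Issue id": [472135, 472112], "Summary": ["Review", "Review"]})
part_2 = pd.DataFrame({"Issue id": [470370, 470330], "Summary": ["Review", "Review"]})

# ignore_index=True discards each part's own 0..n-1 labels
# and numbers the combined rows 0..N-1.
combined = pd.concat([part_1, part_2], ignore_index=True)
print(combined.index.tolist())
```

Unlike `reset_index(inplace=True)` without `drop=True`, this does not leave the old labels behind as an extra `index` column.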


In [101]:
all_individual_tasks_combined.to_csv("all_individual_tasks_combined.csv")

### Filter columns and select only the ones needed for further calculations

In [102]:
all_individual_tasks = all_individual_tasks_combined[['Summary','Issue id', 'Parent id', 'Created', 'Resolved']]
all_individual_tasks

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved
0,Review,472135,472134,15/May/23 1:22 PM,15/May/23 2:01 PM
1,Review,472112,472111,15/May/23 12:39 PM,15/May/23 12:39 PM
2,Import to Test System,472054,470604,15/May/23 10:14 AM,15/May/23 10:14 AM
3,Configure,472053,470604,15/May/23 10:14 AM,15/May/23 10:14 AM
4,Import in Production,472007,470369,15/May/23 8:54 AM,15/May/23 1:16 PM
...,...,...,...,...,...
6104,Review,470370,470369,08/May/23 6:17 PM,15/May/23 8:54 AM
6105,Review,470330,470329,08/May/23 3:47 PM,15/May/23 11:12 AM
6106,Communicate,468182,454791,26/Apr/23 6:11 PM,15/May/23 5:50 AM
6107,Communicate,468160,467976,26/Apr/23 4:10 PM,15/May/23 5:56 AM


In [103]:
all_individual_tasks[all_individual_tasks['Parent id'] == 465210]

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved
658,Explain,466478,465210,20/Apr/23 8:22 AM,26/Apr/23 10:33 AM
854,Review,465211,465210,17/Apr/23 9:43 AM,20/Apr/23 8:22 AM
5126,Explain,466478,465210,20/Apr/23 8:22 AM,26/Apr/23 10:33 AM
5322,Review,465211,465210,17/Apr/23 9:43 AM,20/Apr/23 8:22 AM
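
The lookup above returns every subtask of parent `465210` twice (rows 658 and 5126, rows 854 and 5322), which suggests that the combined export contains the same tasks more than once. A minimal sketch of a de-duplication step is shown below. It assumes that `Issue id` uniquely identifies a subtask; the variable names are hypothetical.

```python
import pandas as pd

# Rows copied from the lookup above: each subtask of parent 465210 appears twice.
tasks = pd.DataFrame(
    {
        "Summary": ["Explain", "Review", "Explain", "Review"],
        "Issue id": [466478, 465211, 466478, 465211],
        "Parent id": [465210, 465210, 465210, 465210],
    },
    index=[658, 854, 5126, 5322],
)

# Assumption: "Issue id" is unique per subtask, so a repeated id is a duplicate row.
duplicated_count = int(tasks.duplicated(subset="Issue id").sum())

# Keep the first occurrence of every subtask and renumber the rows.
deduplicated = tasks.drop_duplicates(subset="Issue id", keep="first").reset_index(drop=True)
print(duplicated_count, len(deduplicated))
```

Without such a step, any sum of durations per `Parent id` counts the repeated subtasks twice.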


### Convert date-like columns to `datetime` objects

In [104]:
# assign() returns a new DataFrame, which avoids the SettingWithCopyWarning
all_individual_tasks = all_individual_tasks.assign(Created=pd.to_datetime(all_individual_tasks["Created"]), Resolved=pd.to_datetime(all_individual_tasks["Resolved"]))
all_individual_tasks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6109 entries, 0 to 6108
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Summary    6109 non-null   object        
 1   Issue id   6109 non-null   int64         
 2   Parent id  6109 non-null   int64         
 3   Created    6109 non-null   datetime64[ns]
 4   Resolved   6109 non-null   datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(1)
memory usage: 238.8+ KB

In [105]:
all_individual_tasks['Summary'].value_counts()

Review                            1088
Configure                         1047
Verify                             950
Communicate                        922
Import to Test System              790
Import in Production               634
Import to Customer Test System     402
Explain                            183
Do                                  51
Clarify                             42
Name: Summary, dtype: int64

### Check if there are any `null` values

In [106]:
all_individual_tasks.isnull().sum()

Summary      0
Issue id     0
Parent id    0
Created      0
Resolved     0
dtype: int64

### Drop all rows with null values

In [107]:
all_individual_tasks_without_null = all_individual_tasks.dropna()
all_individual_tasks_without_null.isnull().sum()

Summary      0
Issue id     0
Parent id    0
Created      0
Resolved     0
dtype: int64

**Summary:**

- We are interested only in completed tickets, so we have dropped rows that contain `null` in the `Resolved` column
- The data is filtered and cleaned, so we are ready for the analysis part

# Data Analysis 

# Determine how long each ticket was in a particular individual task

- calculate time for each subtask
- calculate how long each ticket was in review
- calculate how long each ticket was "on codebusters"
- calculate how long each ticket was "outside codebusters"


## Calculate time spent in each subtask

In [108]:
all_individual_tasks_time_calculation = all_individual_tasks_without_null
all_individual_tasks_time_calculation["time_in"] = all_individual_tasks_time_calculation["Resolved"] - all_individual_tasks_time_calculation["Created"]
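
Subtracting two `datetime` columns, as in the cell above, yields a `timedelta` column. The sketch below repeats the calculation on two rows taken from the Review tasks in this notebook and adds a numeric column in hours; `time_in_hours` is a hypothetical helper column, not part of the original analysis.

```python
import pandas as pd

tasks = pd.DataFrame(
    {
        "Created": pd.to_datetime(["2023-05-15 13:22", "2023-05-12 13:50"]),
        "Resolved": pd.to_datetime(["2023-05-15 14:01", "2023-05-12 14:45"]),
    }
)

# Elapsed time between creation and resolution, as a timedelta column.
tasks["time_in"] = tasks["Resolved"] - tasks["Created"]

# Hypothetical helper column: the same duration as a float number of hours,
# which is easier to average or plot than a timedelta.
tasks["time_in_hours"] = tasks["time_in"].dt.total_seconds() / 3600
print(tasks)
```

Note that this is elapsed calendar time: nights, weekends and holidays between `Created` and `Resolved` are included.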

### List all available statuses of the CI ticket subtask

In [109]:
all_individual_tasks_time_calculation["Summary"].value_counts()

Review                            1088
Configure                         1047
Verify                             950
Communicate                        922
Import to Test System              790
Import in Production               634
Import to Customer Test System     402
Explain                            183
Do                                  51
Clarify                             42
Name: Summary, dtype: int64

## Calculate time ticket was in review step

In [110]:
# Keep only the Review subtasks; .copy() avoids working on a slice of the source frame
all_individual_tasks_reivew = all_individual_tasks_time_calculation[all_individual_tasks_time_calculation['Summary'] == 'Review'].copy()
all_individual_tasks_reivew = all_individual_tasks_reivew.rename(columns={"time_in": "time_in_review"})
all_individual_tasks_reivew = all_individual_tasks_reivew.reset_index()
all_individual_tasks_reivew


Unnamed: 0,index,Summary,Issue id,Parent id,Created,Resolved,time_in_review
0,0,Review,472135,472134,2023-05-15 13:22:00,2023-05-15 14:01:00,0 days 00:39:00
1,1,Review,472112,472111,2023-05-15 12:39:00,2023-05-15 12:39:00,0 days 00:00:00
2,13,Review,471790,471789,2023-05-12 13:50:00,2023-05-12 14:45:00,0 days 00:55:00
3,15,Review,471776,471775,2023-05-12 13:27:00,2023-05-12 14:45:00,0 days 01:18:00
4,16,Review,471771,471770,2023-05-12 13:18:00,2023-05-12 14:44:00,0 days 01:26:00
...,...,...,...,...,...,...,...
1083,6100,Review,470620,470619,2023-05-09 14:19:00,2023-05-15 11:28:00,5 days 21:09:00
1084,6101,Review,470612,470611,2023-05-09 14:05:00,2023-05-15 11:29:00,5 days 21:24:00
1085,6102,Review,470605,470604,2023-05-09 13:50:00,2023-05-15 10:14:00,5 days 20:24:00
1086,6104,Review,470370,470369,2023-05-08 18:17:00,2023-05-15 08:54:00,6 days 14:37:00


# Investigation starts

In [161]:
all_individual_tasks_reivew[all_individual_tasks_reivew["Parent id"] == 438360]

Unnamed: 0,index,Summary,Issue id,Parent id,Created,Resolved,time_in_review


In [162]:
all_sr[all_sr["Issue id"] == 438360]

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request
196,CI-7938,438360,Closed,nsanchez,2022-12-29 11:30:00,30/Dec/22 12:00 AM,2022-12-29 15:44:00,Ineos,Service Request (SR)


In [160]:
null_rows

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review
196,CI-7938,438360,Closed,nsanchez,2022-12-29 11:30:00,30/Dec/22 12:00 AM,2022-12-29 15:44:00,Ineos,Service Request (SR),,NaT
197,CI-7936,438267,Closed,nsanchez,2022-12-28 10:25:00,01/Jan/23 12:00 AM,2022-12-29 09:48:00,Daimler (MFTBC),Service Request (SR),,NaT
199,CI-7933,438238,Closed,ssanz,2022-12-27 16:12:00,03/Jan/23 12:00 AM,2022-12-28 14:14:00,Allianz,Service Request (SR),,NaT
201,CI-7928,438007,Closed,yutaka,2022-12-23 04:20:00,26/Dec/22 12:00 AM,2023-01-06 01:30:00,Eberspacher,Service Request (SR),,NaT
203,CI-7925,437929,Closed,nsanchez,2022-12-22 14:58:00,02/Jan/23 12:00 AM,2023-01-05 15:29:00,Porsche,Service Request (SR),,NaT
...,...,...,...,...,...,...,...,...,...,...,...
308,CI-7029,398756,Closed,skarja,2022-06-23 14:32:00,30/Jun/22 12:00 AM,2023-01-27 19:55:00,TeamBank AG,Service Request (SR),,NaT
309,CI-6926,393799,Closed,aklabisz,2022-05-25 12:42:00,12/Jul/22 12:00 AM,2023-01-27 19:53:00,Aquila Capital,Service Request (SR),,NaT
310,CI-6911,393585,Closed,aklabisz,2022-05-24 15:11:00,11/Aug/22 12:00 AM,2023-01-24 10:33:00,VW,Service Request (SR),,NaT
311,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),,NaT


# Investigation ends

In [159]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Issue key        314 non-null    object        
 1   Issue id         314 non-null    int64         
 2   Status           314 non-null    object        
 3   Reporter         314 non-null    object        
 4   Created          314 non-null    datetime64[ns]
 5   Due Date         314 non-null    object        
 6   Resolved         314 non-null    datetime64[ns]
 7   Customer         314 non-null    object        
 8   Type of Request  314 non-null    object        
dtypes: datetime64[ns](2), int64(1), object(6)
memory usage: 22.2+ KB


### Group tickets by the `Parent id` and sum total time spent in `review` subtask


In [114]:
all_individual_tasks_reivew_sum = all_individual_tasks_reivew.groupby(by=["Parent id"], dropna=False)["time_in_review"].sum()
all_individual_tasks_reivew_sum = all_individual_tasks_reivew_sum.to_frame()
all_individual_tasks_reivew_sum.reset_index(inplace=True)
all_individual_tasks_reivew_sum

Unnamed: 0,Parent id,time_in_review
0,331914,566 days 10:00:00
1,338034,501 days 19:04:00
2,349721,0 days 00:02:00
3,351500,1036 days 07:16:00
4,356993,388 days 17:33:00
...,...,...
708,471770,0 days 02:52:00
709,471775,0 days 02:36:00
710,471789,0 days 01:50:00
711,472111,0 days 00:00:00


In [115]:
all_individual_tasks_reivew_sum[all_individual_tasks_reivew_sum["Parent id"] == 465210]

Unnamed: 0,Parent id,time_in_review
592,465210,5 days 21:18:00


In [116]:
all_individual_tasks_reivew_sum.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 713 entries, 0 to 712
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype          
---  ------          --------------  -----          
 0   Parent id       713 non-null    int64          
 1   time_in_review  713 non-null    timedelta64[ns]
dtypes: int64(1), timedelta64[ns](1)
memory usage: 11.3 KB




### Check on a particular ticket whether the total time is summed correctly

In [118]:
all_individual_tasks_reivew.groupby("Parent id").count()

Unnamed: 0_level_0,index,Summary,Issue id,Created,Resolved,time_in_review
Parent id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
331914,1,1,1,1,1,1
338034,1,1,1,1,1,1
349721,1,1,1,1,1,1
351500,2,2,2,2,2,2
356993,1,1,1,1,1,1
...,...,...,...,...,...,...
471770,2,2,2,2,2,2
471775,2,2,2,2,2,2
471789,2,2,2,2,2,2
472111,3,3,3,3,3,3


In [119]:
all_individual_tasks_reivew.loc[all_individual_tasks_reivew["Parent id"] == 345462]

Unnamed: 0,index,Summary,Issue id,Parent id,Created,Resolved,time_in_review


In [120]:
type(all_individual_tasks_reivew_sum)

pandas.core.frame.DataFrame

In [121]:
all_individual_tasks_reivew_sum_df = all_individual_tasks_reivew_sum
all_individual_tasks_reivew_sum_df.reset_index(inplace=True)
all_individual_tasks_reivew_sum_df

Unnamed: 0,index,Parent id,time_in_review
0,0,331914,566 days 10:00:00
1,1,338034,501 days 19:04:00
2,2,349721,0 days 00:02:00
3,3,351500,1036 days 07:16:00
4,4,356993,388 days 17:33:00
...,...,...,...
708,708,471770,0 days 02:52:00
709,709,471775,0 days 02:36:00
710,710,471789,0 days 01:50:00
711,711,472111,0 days 00:00:00


In [122]:
all_individual_tasks_reivew_sum_df.loc[all_individual_tasks_reivew_sum_df["Parent id"] == 345462]

Unnamed: 0,index,Parent id,time_in_review


In [155]:
all_individual_tasks_reivew_sum_df.isnull().sum()

index             0
Parent id         0
time_in_review    0
dtype: int64



**Summary:**
- Some tickets have been in the `Review` step more than once
- After grouping and summing the tickets, the case with `Parent id` `345462` has a total time of more than `8 days`, which is the sum of the time spent in its two `Review` subtasks

### Merge `all_sr` dataset with `all_individual_tasks_reivew_sum_df`

In [157]:
all_sr_review_time = pd.merge(all_sr, all_individual_tasks_reivew_sum_df[['Parent id', 'time_in_review']], how="left",left_on = "Issue id",right_on = "Parent id")
all_sr_review_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635.0,0 days 00:48:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149.0,3 days 22:32:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735.0,13 days 16:48:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243.0,0 days 00:00:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085.0,13 days 19:46:00
...,...,...,...,...,...,...,...,...,...,...,...
309,CI-6926,393799,Closed,aklabisz,2022-05-25 12:42:00,12/Jul/22 12:00 AM,2023-01-27 19:53:00,Aquila Capital,Service Request (SR),,NaT
310,CI-6911,393585,Closed,aklabisz,2022-05-24 15:11:00,11/Aug/22 12:00 AM,2023-01-24 10:33:00,VW,Service Request (SR),,NaT
311,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),,NaT
312,CI-6825,389774,Closed,aklabisz,2022-05-11 11:16:00,16/May/22 12:00 AM,2023-01-24 12:32:00,Hugo Boss,Service Request (SR),,NaT


In [126]:
all_sr_review_time.isnull().sum()

Issue key           0
Issue id            0
Status              0
Reporter            0
Created             0
Due Date            0
Resolved            0
Customer            0
Type of Request     0
Parent id          92
time_in_review     92
dtype: int64
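
The 92 missing values above are SR tickets for which the left merge found no Review subtask. One way to make this explicit is the `indicator` parameter of `merge`, sketched below on two tickets taken from the output above (`CI-8574` has a review time, `CI-7938` has none). The frame names are hypothetical.

```python
import pandas as pd

tickets = pd.DataFrame({"Issue key": ["CI-8574", "CI-7938"], "Issue id": [468635, 438360]})
review_sum = pd.DataFrame(
    {"Parent id": [468635], "time_in_review": [pd.Timedelta(minutes=48)]}
)

# indicator=True adds a "_merge" column that says where each row came from:
# "both" for matched tickets, "left_only" for tickets without any Review subtask.
merged = tickets.merge(
    review_sum, how="left", left_on="Issue id", right_on="Parent id", indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", "Issue key"].tolist()
print(unmatched)
```

The unmatched rows are also the reason `Parent id` is shown as a float (`468635.0`) in the merged output: the missing values force the integer column to `float64`.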

### Investigate `null` rows after merging main SR tickets with individual subtasks

In [127]:
null_rows = all_sr_review_time[all_sr_review_time["time_in_review"].isnull()]
null_rows.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 92 entries, 196 to 312
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype          
---  ------           --------------  -----          
 0   Issue key        92 non-null     object         
 1   Issue id         92 non-null     int64          
 2   Status           92 non-null     object         
 3   Reporter         92 non-null     object         
 4   Created          92 non-null     datetime64[ns] 
 5   Due Date         92 non-null     object         
 6   Resolved         92 non-null     datetime64[ns] 
 7   Customer         92 non-null     object         
 8   Type of Request  92 non-null     object         
 9   Parent id        0 non-null      float64        
 10  time_in_review   0 non-null      timedelta64[ns]
dtypes: datetime64[ns](2), float64(1), int64(1), object(6), timedelta64[ns](1)
memory usage: 8.6+ KB


In [128]:
null_rows

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review
196,CI-7938,438360,Closed,nsanchez,2022-12-29 11:30:00,30/Dec/22 12:00 AM,2022-12-29 15:44:00,Ineos,Service Request (SR),,NaT
197,CI-7936,438267,Closed,nsanchez,2022-12-28 10:25:00,01/Jan/23 12:00 AM,2022-12-29 09:48:00,Daimler (MFTBC),Service Request (SR),,NaT
199,CI-7933,438238,Closed,ssanz,2022-12-27 16:12:00,03/Jan/23 12:00 AM,2022-12-28 14:14:00,Allianz,Service Request (SR),,NaT
201,CI-7928,438007,Closed,yutaka,2022-12-23 04:20:00,26/Dec/22 12:00 AM,2023-01-06 01:30:00,Eberspacher,Service Request (SR),,NaT
203,CI-7925,437929,Closed,nsanchez,2022-12-22 14:58:00,02/Jan/23 12:00 AM,2023-01-05 15:29:00,Porsche,Service Request (SR),,NaT
...,...,...,...,...,...,...,...,...,...,...,...
308,CI-7029,398756,Closed,skarja,2022-06-23 14:32:00,30/Jun/22 12:00 AM,2023-01-27 19:55:00,TeamBank AG,Service Request (SR),,NaT
309,CI-6926,393799,Closed,aklabisz,2022-05-25 12:42:00,12/Jul/22 12:00 AM,2023-01-27 19:53:00,Aquila Capital,Service Request (SR),,NaT
310,CI-6911,393585,Closed,aklabisz,2022-05-24 15:11:00,11/Aug/22 12:00 AM,2023-01-24 10:33:00,VW,Service Request (SR),,NaT
311,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),,NaT


In [129]:
all_sr_review_time_without_null = all_sr_review_time.dropna()
all_sr_review_time_without_null.isnull().sum()

Issue key          0
Issue id           0
Status             0
Reporter           0
Created            0
Due Date           0
Resolved           0
Customer           0
Type of Request    0
Parent id          0
time_in_review     0
dtype: int64

**Summary**:
- For tickets like `CI-8488` the review time equals 0, so we need to drop them, because they can lead to wrong KPI calculations
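
Note that `dropna()` removes only rows with a missing review time (`NaT`); rows where the review time is exactly zero are kept. A minimal sketch of a filter that drops both is shown below, using values from the merged output above. The names `has_review_time` and `usable` are hypothetical.

```python
import pandas as pd

merged = pd.DataFrame(
    {
        "Issue key": ["CI-8574", "CI-8554", "CI-7938"],
        # 48 minutes, exactly zero, and missing (NaT)
        "time_in_review": pd.to_timedelta(["0 days 00:48:00", "0 days 00:00:00", None]),
    }
)

# Keep only tickets with a known, strictly positive review time.
has_review_time = merged["time_in_review"].notna() & (
    merged["time_in_review"] > pd.Timedelta(0)
)
usable = merged.loc[has_review_time].reset_index(drop=True)
print(usable)
```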

# TO DO
- update the latest dataset

## Determine time spent on codebusters

- calculate the total time ticket was in subtasks "Configure", "Review", "Import to Test System", "Import in Production", "Import to Customer Test System"
- apply the results to all tickets dataset

In [130]:
all_individual_with_codebusters_time = all_individual_tasks_time_calculation.loc[all_individual_tasks_time_calculation["Summary"].isin(["Configure", "Review", "Import to Test System", "Import in Production", "Import to Customer Test System", "Import to Test", "Import to Staging System"])]

Summarize time for each codebusters subtask 

In [131]:
all_individual_with_codebusters_time.groupby("Parent id").count()

Unnamed: 0_level_0,Summary,Issue id,Created,Resolved,time_in
Parent id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
331914,1,1,1,1,1
338034,1,1,1,1,1
349721,4,4,4,4,4
351500,2,2,2,2,2
356993,4,4,4,4,4
...,...,...,...,...,...
471770,2,2,2,2,2
471775,2,2,2,2,2
471789,2,2,2,2,2
472111,3,3,3,3,3


In [132]:
all_individual_with_codebusters_time[all_individual_with_codebusters_time['Parent id'] == 329036]

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in


In [133]:
all_individual_with_codebusters_time_sum = all_individual_with_codebusters_time.groupby("Parent id")["time_in"].sum()
all_individual_with_codebusters_time_sum_df = all_individual_with_codebusters_time_sum.to_frame()
all_individual_with_codebusters_time_sum_df.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)
all_individual_with_codebusters_time_sum_df.reset_index(inplace=True)
all_individual_with_codebusters_time_sum_df

Unnamed: 0,Parent id,time_on_codebusters
0,331914,566 days 10:00:00
1,338034,501 days 19:04:00
2,349721,129 days 23:09:00
3,351500,1036 days 07:16:00
4,356993,406 days 21:42:00
...,...,...
777,471770,0 days 02:52:00
778,471775,0 days 02:36:00
779,471789,0 days 01:50:00
780,472111,0 days 00:00:00


In [134]:
# all_individual_tasks_reivew_sum_df = all_individual_tasks_reivew_sum.to_frame()
# all_individual_tasks_reivew_sum_df.reset_index(inplace=True)
# all_individual_tasks_reivew_sum_df

### Merge total time on codebusters with all Jira tickets dataset

In [135]:
all_sr_codebusters_time = pd.merge(all_sr, all_individual_with_codebusters_time_sum_df, how="inner",left_on = "Issue id",right_on = "Parent id")
all_sr_codebusters_time 

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_on_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,0 days 01:00:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,5 days 22:46:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,13 days 16:48:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,0 days 13:56:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,25 days 13:00:00
...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,102 days 23:41:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,0 days 00:00:00
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,485 days 13:30:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,0 days 00:00:00


### Investigate `null` rows after merging main SR tickets with individual subtasks

In [136]:
all_sr_codebusters_time.isnull().sum()

Issue key              0
Issue id               0
Status                 0
Reporter               0
Created                0
Due Date               0
Resolved               0
Customer               0
Type of Request        0
Parent id              0
time_on_codebusters    0
dtype: int64

In [137]:
all_sr_codebusters_time_without_null = all_sr_codebusters_time.dropna()
all_sr_codebusters_time_without_null.isnull().sum()

Issue key              0
Issue id               0
Status                 0
Reporter               0
Created                0
Due Date               0
Resolved               0
Customer               0
Type of Request        0
Parent id              0
time_on_codebusters    0
dtype: int64

In [138]:
all_sr_codebusters_time_without_null

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_on_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,0 days 01:00:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,5 days 22:46:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,13 days 16:48:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,0 days 13:56:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,25 days 13:00:00
...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,102 days 23:41:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,0 days 00:00:00
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,485 days 13:30:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,0 days 00:00:00


**Summary:**

## Determine time spent outside codebusters

- Calculate the total time each ticket spent in one of the subtasks "Explain", "Verify", "Clarify" or "Communicate".
- Apply the results to the dataset of all SR tickets.

In [139]:
# Subtasks that count as time spent outside codebusters
outside_subtasks = ["Explain", "Verify", "Clarify", "Communicate"]
all_individual_others_time = all_individual_tasks_time_calculation.loc[
    all_individual_tasks_time_calculation["Summary"].isin(outside_subtasks)
]
all_individual_others_time

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
7,Verify,471871,466458,2023-05-12 15:49:00,2023-05-12 15:49:00,0 days 00:00:00
9,Verify,471869,466458,2023-05-12 15:49:00,2023-05-12 15:49:00,0 days 00:00:00
14,Explain,471783,465493,2023-05-12 13:41:00,2023-05-15 05:58:00,2 days 16:17:00
17,Communicate,471768,471138,2023-05-12 13:03:00,2023-05-12 15:03:00,0 days 02:00:00
18,Communicate,471767,471134,2023-05-12 13:03:00,2023-05-12 15:03:00,0 days 02:00:00
...,...,...,...,...,...,...
6098,Explain,471783,465493,2023-05-12 13:41:00,2023-05-15 05:58:00,2 days 16:17:00
6099,Communicate,470747,455636,2023-05-09 16:30:00,2023-05-15 10:59:00,5 days 18:29:00
6106,Communicate,468182,454791,2023-04-26 18:11:00,2023-05-15 05:50:00,18 days 11:39:00
6107,Communicate,468160,467976,2023-04-26 16:10:00,2023-05-15 05:56:00,18 days 13:46:00


Sum the time of all subtasks outside codebusters for each parent ticket

In [140]:
all_individual_others_time_sum = all_individual_others_time.groupby("Parent id")["time_in"].sum()
all_individual_others_time_sum_df = all_individual_others_time_sum.to_frame()
all_individual_others_time_sum_df.rename({"time_in": "time_outside_codebusters"},axis=1, inplace=True)
all_individual_others_time_sum_df.reset_index(inplace=True)
all_individual_others_time_sum_df

Unnamed: 0,Parent id,time_outside_codebusters
0,331914,0 days 00:00:00
1,338034,17 days 05:13:00
2,349721,9 days 11:08:00
3,351500,0 days 00:00:00
4,356993,4 days 00:28:00
...,...,...
832,471086,3 days 08:46:00
833,471134,0 days 07:14:00
834,471138,0 days 04:04:00
835,471236,0 days 01:02:00
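
The aggregation above can also be written as a single method chain. The following is a minimal sketch on toy data; the frame `subtasks` and its values are hypothetical and are not taken from the Jira export.

```python
import pandas as pd

# Toy subtask data (hypothetical values, not from the real export).
subtasks = pd.DataFrame({
    "Parent id": [101, 101, 102],
    "Summary": ["Explain", "Verify", "Communicate"],
    "time_in": pd.to_timedelta(
        ["2 days 16:17:00", "0 days 00:00:00", "0 days 02:00:00"]
    ),
})

# Sum the subtask durations per parent ticket and rename the result
# column in one chain, without any in-place operations.
time_outside = (
    subtasks.groupby("Parent id", as_index=False)["time_in"]
    .sum()
    .rename(columns={"time_in": "time_outside_codebusters"})
)
print(time_outside)
```

Chaining avoids the intermediate `to_frame()` and `inplace=True` steps, but the result should be the same as in the cell above.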


In [141]:
# all_individual_with_codebusters_time_sum = all_individual_with_codebusters_time.groupby("Parent id")["time_in"].sum()
# all_individual_with_codebusters_time_sum_df = all_individual_with_codebusters_time_sum.to_frame()
# all_individual_with_codebusters_time_sum_df.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)
# all_individual_with_codebusters_time_sum_df.reset_index(inplace=True)
# all_individual_with_codebusters_time_sum_df

### Merge `all_sr` with `all_individual_others_time_sum_df` dataset

In [142]:
all_sr_outside_codebusters_time = pd.merge(
    all_sr,
    all_individual_others_time_sum_df,
    how="inner",
    left_on="Issue id",
    right_on="Parent id",
)
all_sr_outside_codebusters_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,20 days 23:42:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,10 days 06:04:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,12 days 11:22:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,31 days 04:48:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,8 days 08:36:00
...,...,...,...,...,...,...,...,...,...,...,...
271,CI-7029,398756,Closed,skarja,2022-06-23 14:32:00,30/Jun/22 12:00 AM,2023-01-27 19:55:00,TeamBank AG,Service Request (SR),398756,556 days 02:34:00
272,CI-6911,393585,Closed,aklabisz,2022-05-24 15:11:00,11/Aug/22 12:00 AM,2023-01-24 10:33:00,VW,Service Request (SR),393585,112 days 22:14:00
273,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,229 days 03:33:00
274,CI-6825,389774,Closed,aklabisz,2022-05-11 11:16:00,16/May/22 12:00 AM,2023-01-24 12:32:00,Hugo Boss,Service Request (SR),389774,195 days 23:52:00


In [143]:
all_sr_outside_codebusters_time.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_outside_codebusters    0
dtype: int64
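
Because the merge uses `how="inner"`, tickets without a matching subtask row are dropped before the null check runs, so the count of zero nulls does not show how many tickets were lost. One possible way to inspect them, sketched here on hypothetical stand-ins for `all_sr` and `all_individual_others_time_sum_df`, is a left merge with `indicator=True`:

```python
import pandas as pd

# Hypothetical stand-ins for the real frames.
tickets = pd.DataFrame({
    "Issue id": [1, 2, 3],
    "Issue key": ["CI-1", "CI-2", "CI-3"],
})
outside = pd.DataFrame({
    "Parent id": [1, 3],
    "time_outside_codebusters": pd.to_timedelta(["1 days", "3 hours"]),
})

# A left merge keeps every ticket; `indicator=True` adds a `_merge`
# column telling whether a matching row existed on the right side.
checked = pd.merge(
    tickets, outside,
    how="left", left_on="Issue id", right_on="Parent id",
    indicator=True,
)

# Tickets that have no subtask row at all.
unmatched = checked.loc[checked["_merge"] == "left_only", "Issue key"].tolist()
print(unmatched)
```

With a left merge the unmatched tickets show up as `NaT` in `time_outside_codebusters`, which makes the subsequent `isnull().sum()` check informative.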

**Summary:**

## Combine all created datasets
- In this step we merge all previously created datasets: `all_sr_outside_codebusters_time`, `all_sr_codebusters_time` and `all_sr_review_time`

Merge the df containing "time outside codebusters" with the df containing "time on codebusters"

In [144]:
all_tickets_time_on_and_outside_codebusters = pd.merge(
    all_sr_outside_codebusters_time,
    all_sr_codebusters_time[["Issue id", "time_on_codebusters"]],
    how="inner",
    on="Issue id",
)
all_tickets_time_on_and_outside_codebusters

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters,time_on_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,20 days 23:42:00,0 days 01:00:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,10 days 06:04:00,5 days 22:46:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,12 days 11:22:00,13 days 16:48:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,31 days 04:48:00,0 days 13:56:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,8 days 08:36:00,25 days 13:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,0 days 00:01:00,102 days 23:41:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,76 days 21:56:00,0 days 00:00:00
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,0 days 00:00:00,485 days 13:30:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,229 days 03:33:00,0 days 00:00:00


In [145]:
all_tickets_time_on_and_outside_codebusters.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_outside_codebusters    0
time_on_codebusters         0
dtype: int64
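
A merge on `Issue id` silently multiplies rows if either side contains the same id more than once. As a hedged sketch (toy frames with hypothetical values), the `validate` argument of `pd.merge` can be used to fail early in that case:

```python
import pandas as pd

# Hypothetical frames; the right-hand side contains a duplicated key.
outside = pd.DataFrame({
    "Issue id": [1, 2],
    "time_outside_codebusters": pd.to_timedelta(["1 days", "2 days"]),
})
on_codebusters = pd.DataFrame({
    "Issue id": [1, 2, 2],
    "time_on_codebusters": pd.to_timedelta(["1 hours", "2 hours", "2 hours"]),
})

# `validate="one_to_one"` raises MergeError when a key is not unique
# on either side, instead of returning a frame with extra rows.
try:
    pd.merge(outside, on_codebusters, on="Issue id",
             how="inner", validate="one_to_one")
    duplicate_keys_found = False
except pd.errors.MergeError:
    duplicate_keys_found = True

print(duplicate_keys_found)
```

Whether the real datasets contain duplicated ids is not confirmed here; the check is only a safeguard.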

### Merge `all_tickets_time_on_and_outside_codebusters` with `all_sr_review_time` dataset

In [146]:
all_tickets_with_time = pd.merge(
    all_tickets_time_on_and_outside_codebusters,
    all_sr_review_time[["Issue id", "time_in_review"]],
    how="inner",
    on="Issue id",
)
all_tickets_with_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters,time_on_codebusters,time_in_review
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,20 days 23:42:00,0 days 01:00:00,0 days 00:48:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,10 days 06:04:00,5 days 22:46:00,3 days 22:32:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,12 days 11:22:00,13 days 16:48:00,13 days 16:48:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,31 days 04:48:00,0 days 13:56:00,0 days 00:00:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,8 days 08:36:00,25 days 13:00:00,13 days 19:46:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,0 days 00:01:00,102 days 23:41:00,102 days 23:41:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,76 days 21:56:00,0 days 00:00:00,NaT
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,0 days 00:00:00,485 days 13:30:00,485 days 13:30:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,229 days 03:33:00,0 days 00:00:00,NaT


Check for null values

In [147]:
all_tickets_with_time.isnull().sum()

Issue key                    0
Issue id                     0
Status                       0
Reporter                     0
Created                      0
Due Date                     0
Resolved                     0
Customer                     0
Type of Request              0
Parent id                    0
time_outside_codebusters     0
time_on_codebusters          0
time_in_review              27
dtype: int64

Drop rows that contain null values

In [148]:
all_tickets_with_time_without_null = all_tickets_with_time.dropna()
all_tickets_with_time_without_null.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_outside_codebusters    0
time_on_codebusters         0
time_in_review              0
dtype: int64
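
Only `time_in_review` contains nulls here, so the drop can be limited to that column, and the affected tickets can be kept aside for inspection. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical tickets; CI-2 has no review time.
tickets = pd.DataFrame({
    "Issue key": ["CI-1", "CI-2", "CI-3"],
    "time_on_codebusters": [
        pd.Timedelta(hours=1), pd.Timedelta(0), pd.Timedelta(days=2),
    ],
    "time_in_review": [
        pd.Timedelta(minutes=48), pd.NaT, pd.Timedelta(days=2),
    ],
})

# Keep the rows that would be dropped, so they can be reviewed later.
missing_review = tickets[tickets["time_in_review"].isna()]

# Drop only rows where the review time is missing.
with_review = tickets.dropna(subset=["time_in_review"])

print(missing_review["Issue key"].tolist())
print(len(with_review))
```

Using `subset` makes it explicit which column drives the removal, so nulls appearing later in other columns would not silently remove more rows.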

### Convert `time in..` columns to minutes

Inspect the column dtypes before converting the `timedelta` columns to float

In [149]:
all_tickets_with_time_converted = all_tickets_with_time_without_null
all_tickets_with_time_converted.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 222 entries, 0 to 248
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype          
---  ------                    --------------  -----          
 0   Issue key                 222 non-null    object         
 1   Issue id                  222 non-null    int64          
 2   Status                    222 non-null    object         
 3   Reporter                  222 non-null    object         
 4   Created                   222 non-null    datetime64[ns] 
 5   Due Date                  222 non-null    object         
 6   Resolved                  222 non-null    datetime64[ns] 
 7   Customer                  222 non-null    object         
 8   Type of Request           222 non-null    object         
 9   Parent id                 222 non-null    int64          
 10  time_outside_codebusters  222 non-null    timedelta64[ns]
 11  time_on_codebusters       222 non-null    timedelta64[ns]
 12  time_in_

Convert `time in..` columns to minutes

In [150]:
# Convert each timedelta column to minutes (float)
all_tickets_with_time_converted['time_in_review'] = all_tickets_with_time_converted['time_in_review'].dt.total_seconds().div(60)
all_tickets_with_time_converted['time_on_codebusters'] = all_tickets_with_time_converted['time_on_codebusters'].dt.total_seconds().div(60)
all_tickets_with_time_converted['time_outside_codebusters'] = all_tickets_with_time_converted['time_outside_codebusters'].dt.total_seconds().div(60)

all_tickets_with_time_converted.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 222 entries, 0 to 248
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Issue key                 222 non-null    object        
 1   Issue id                  222 non-null    int64         
 2   Status                    222 non-null    object        
 3   Reporter                  222 non-null    object        
 4   Created                   222 non-null    datetime64[ns]
 5   Due Date                  222 non-null    object        
 6   Resolved                  222 non-null    datetime64[ns]
 7   Customer                  222 non-null    object        
 8   Type of Request           222 non-null    object        
 9   Parent id                 222 non-null    int64         
 10  time_outside_codebusters  222 non-null    float64       
 11  time_on_codebusters       222 non-null    float64       
 12  time_in_review        

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_tickets_with_time_converted['time_in_review'] = all_tickets_with_time_converted['time_in_review'].dt.total_seconds().div(60)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_tickets_with_time_converted['time_on_codebusters'] = all_tickets_with_time_converted['time_on_codebusters'].dt.total_seconds().div(60)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/st
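
The warning above is pandas' `SettingWithCopyWarning`: the frame being modified was derived from another frame via `dropna()`, so pandas cannot tell whether the assignment should affect the original. A common way to avoid it, sketched here on hypothetical data, is to take an explicit `.copy()` before assigning:

```python
import pandas as pd

# Hypothetical tickets; CI-2 has no review time and will be dropped.
tickets = pd.DataFrame({
    "Issue key": ["CI-1", "CI-2"],
    "time_in_review": [pd.Timedelta(minutes=48), pd.NaT],
    "time_on_codebusters": [pd.Timedelta(hours=1), pd.Timedelta(0)],
})

# `.copy()` makes the filtered frame independent of `tickets`,
# so the assignments below are unambiguous.
converted = tickets.dropna().copy()

# Convert each timedelta column to minutes as float.
for column in ["time_in_review", "time_on_codebusters"]:
    converted[column] = converted[column].dt.total_seconds().div(60)

print(converted.dtypes)
```

In the notebook this would correspond to creating `all_tickets_with_time_converted` from `all_tickets_with_time_without_null.copy()`; the converted values themselves should not change.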

In [151]:
all_tickets_with_time_converted.reset_index(inplace=True)
all_tickets_with_time_converted

Unnamed: 0,index,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters,time_on_codebusters,time_in_review
0,0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,30222.0,60.0,48.0
1,1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,14764.0,8566.0,5672.0
2,2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,17962.0,19728.0,19728.0
3,3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,44928.0,836.0,0.0
4,4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,12036.0,36780.0,19906.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
217,242,CI-7458,416956,Closed,ssanz,2022-09-26 11:00:00,03/Oct/22 12:00 AM,2023-01-24 15:34:00,Endress Hauser,Service Request (SR),416956,0.0,185.0,185.0
218,243,CI-7229,407702,Closed,ssanz,2022-08-10 12:43:00,17/Aug/22 12:00 AM,2023-01-17 15:55:00,Vaillant,Service Request (SR),407702,18713.0,22955.0,19913.0
219,244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,1.0,148301.0,148301.0
220,246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,0.0,699210.0,699210.0
