# Codebusters KPIs and Other Usefull Patterns

## import libraries

In [521]:
import pandas as pd
import numpy as np

Import and clean `all_tickets_2022` data set 
- remove unnecessary columns
- rename columns

# Prepare data with all tickets

### Import all service request

Tickets are taken from the following jira filter:<br/> `project = CI AND issuetype in (standardIssueTypes(), "Expense Delivery") AND "Epic Link" is EMPTY AND "Case Number/s" is not EMPTY AND cf[14125] in ("Service Request (SR)") AND resolved is not EMPTY AND resolutiondate >= 2022-12-19 and resolutiondate <= 2023-01-08`

In [522]:
all_sr_origin = pd.read_csv('../../DataSets/KPIs/MainTickets/all_service_requests.csv')
all_sr_origin

Unnamed: 0,Summary,Issue key,Issue id,Issue Type,Status,Project key,Project name,Project type,Project lead,Project description,...,Comment.18,Comment.19,Comment.20,Comment.21,Comment.22,Comment.23,Comment.24,Comment.25,Comment.26,Comment.27
0,Rehau - Add EDI number Airplus,CI-8574,468635,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
1,Numiga VFL Wolfsburg - Remove a receipt,CI-8568,468149,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
2,oebb: correct perdiem/EMS,CI-8560,467735,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
3,Hensoldt - Update Approval notification mail,CI-8554,467243,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
4,vw-cso: activate bing map in itinerary section,CI-8552,467085,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309,Aquila Capital - Expense issue with export and...,CI-6926,393799,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,04/Aug/22 10:36 AM;gmalarski;Export path added...,27/Jan/23 7:54 PM;raphose;SR is closed since A...,,,,,,,,
310,VW - Splitting of approval for expense statements,CI-6911,393585,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
311,Porsche - Adjustment of approval process for n...,CI-6901,393357,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
312,Hugo Boss - add 3 new meal receipt types for t...,CI-6825,389774,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,


### Select from the dataset only columns necessary for the further analysis

In [523]:
all_sr = all_sr_origin[['Issue key', 'Issue id', 'Status', 'Reporter', 'Created', 'Due Date', 'Resolved', 'Custom field (Customer/s)','Custom field (Type of Request)']]
all_sr.rename({"Custom field (Customer/s)": "Customer", "Custom field (Type of Request)":"Type of Request"},axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


In [524]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Issue key        314 non-null    object
 1   Issue id         314 non-null    int64 
 2   Status           314 non-null    object
 3   Reporter         314 non-null    object
 4   Created          314 non-null    object
 5   Due Date         314 non-null    object
 6   Resolved         314 non-null    object
 7   Customer         314 non-null    object
 8   Type of Request  314 non-null    object
dtypes: int64(1), object(8)
memory usage: 22.2+ KB


In [525]:
all_sr.isnull().sum()

Issue key          0
Issue id           0
Status             0
Reporter           0
Created            0
Due Date           0
Resolved           0
Customer           0
Type of Request    0
dtype: int64

### Convert date table to `datetime` object

In [526]:
all_sr["Created"] = pd.to_datetime(all_sr["Created"])
all_sr["Resolved"] = pd.to_datetime(all_sr["Resolved"])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_sr["Created"] = pd.to_datetime(all_sr["Created"])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_sr["Resolved"] = pd.to_datetime(all_sr["Resolved"])


In [527]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Issue key        314 non-null    object        
 1   Issue id         314 non-null    int64         
 2   Status           314 non-null    object        
 3   Reporter         314 non-null    object        
 4   Created          314 non-null    datetime64[ns]
 5   Due Date         314 non-null    object        
 6   Resolved         314 non-null    datetime64[ns]
 7   Customer         314 non-null    object        
 8   Type of Request  314 non-null    object        
dtypes: datetime64[ns](2), int64(1), object(6)
memory usage: 22.2+ KB


In [585]:
all_sr.shape

(314, 9)

**Summary**
- there is not SR ticket without `resolved` date
- there is no `null` value in any of the column
- date type columns converted to `datetime` object in order to make calculucations on these columns

# Prepare data with individuals tasks for service requests

## Import and clean `all_individual_tasks` data set 
- we are not able to downaload full data becuase of jira limitation, therefore we need to combine data from differen periods
- remove unnecessary columns
- replace missing `date` values with 0
- drop rows with "null" values

In [528]:
individual_part_1 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from_20220901_till_20221231.csv')
individual_part_2 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from_20230101_till_20230331.csv')
individual_part_3 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from_20230401_till_20230515.csv')



all_individual_tasks_combined = pd.concat([individual_part_1, individual_part_2, individual_part_3], axis=0)
all_individual_tasks_combined

  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,


Unnamed: 0,Summary,Issue key,Issue id,Parent id,Issue Type,Status,Project key,Project name,Project type,Project lead,...,Custom field (When did it happen?),Custom field (When was it solved?),Custom field (Which customers are affected?),Custom field (arctic Dev Findings),Custom field (arctic QA Findings),Custom field (cytric Stability Findings),Comment,Comment.1,Comment.2,Comment.3
0,Review,CI-10239939,472135,472134,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
1,Review,CI-10239935,472112,472111,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
2,Import to Test System,CI-10239926,472054,470604,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
3,Configure,CI-10239925,472053,470604,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
4,Import in Production,CI-10239908,472007,470369,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14,Review,CI-10239444,470370,470369,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
15,Review,CI-10239415,470330,470329,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
16,Communicate,CI-10238747,468182,454791,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
17,Communicate,CI-10238742,468160,467976,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,


### Reset indexex of a combined dataset </br>
Indexes of a new dataset are not appropriate, we need to apply `reset_index` method in order to match them with the number of rows

In [529]:
all_individual_tasks_combined.reset_index(inplace=True)
all_individual_tasks_combined

Unnamed: 0,index,Summary,Issue key,Issue id,Parent id,Issue Type,Status,Project key,Project name,Project type,...,Custom field (When did it happen?),Custom field (When was it solved?),Custom field (Which customers are affected?),Custom field (arctic Dev Findings),Custom field (arctic QA Findings),Custom field (cytric Stability Findings),Comment,Comment.1,Comment.2,Comment.3
0,0,Review,CI-10239939,472135,472134,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
1,1,Review,CI-10239935,472112,472111,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
2,2,Import to Test System,CI-10239926,472054,470604,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
3,3,Configure,CI-10239925,472053,470604,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
4,4,Import in Production,CI-10239908,472007,470369,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6104,14,Review,CI-10239444,470370,470369,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
6105,15,Review,CI-10239415,470330,470329,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
6106,16,Communicate,CI-10238747,468182,454791,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,
6107,17,Communicate,CI-10238742,468160,467976,Individual Task,Done,CI,codebusters,software,...,,,,,,,,,,


In [587]:
all_individual_tasks_combined.to_csv("all_individual_tasks_combined.csv")

### Filter columns and select only the ones needed for further calculations

In [530]:
all_individual_tasks = all_individual_tasks_combined[['Summary','Issue id', 'Parent id', 'Created', 'Resolved']]
all_individual_tasks

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved
0,Review,472135,472134,15/May/23 1:22 PM,15/May/23 2:01 PM
1,Review,472112,472111,15/May/23 12:39 PM,15/May/23 12:39 PM
2,Import to Test System,472054,470604,15/May/23 10:14 AM,15/May/23 10:14 AM
3,Configure,472053,470604,15/May/23 10:14 AM,15/May/23 10:14 AM
4,Import in Production,472007,470369,15/May/23 8:54 AM,15/May/23 1:16 PM
...,...,...,...,...,...
6104,Review,470370,470369,08/May/23 6:17 PM,15/May/23 8:54 AM
6105,Review,470330,470329,08/May/23 3:47 PM,15/May/23 11:12 AM
6106,Communicate,468182,454791,26/Apr/23 6:11 PM,15/May/23 5:50 AM
6107,Communicate,468160,467976,26/Apr/23 4:10 PM,15/May/23 5:56 AM


In [531]:
all_individual_tasks[all_individual_tasks['Parent id'] == 465210]

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved
658,Explain,466478,465210,20/Apr/23 8:22 AM,26/Apr/23 10:33 AM
854,Review,465211,465210,17/Apr/23 9:43 AM,20/Apr/23 8:22 AM
5126,Explain,466478,465210,20/Apr/23 8:22 AM,26/Apr/23 10:33 AM
5322,Review,465211,465210,17/Apr/23 9:43 AM,20/Apr/23 8:22 AM


### Convert date like columns to `datetime` object

In [532]:
all_individual_tasks["Created"] = pd.to_datetime(all_individual_tasks_2022["Created"])
all_individual_tasks["Resolved"] = pd.to_datetime(all_individual_tasks_2022["Resolved"])
all_individual_tasks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6109 entries, 0 to 6108
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Summary    6109 non-null   object        
 1   Issue id   6109 non-null   int64         
 2   Parent id  6109 non-null   int64         
 3   Created    6052 non-null   datetime64[ns]
 4   Resolved   6052 non-null   datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(1)
memory usage: 238.8+ KB


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks["Created"] = pd.to_datetime(all_individual_tasks_2022["Created"])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks["Resolved"] = pd.to_datetime(all_individual_tasks_2022["Resolved"])


In [533]:
all_individual_tasks['Summary'].value_counts()

Review                            1088
Configure                         1047
Verify                             950
Communicate                        922
Import to Test System              790
Import in Production               634
Import to Customer Test System     402
Explain                            183
Do                                  51
Clarify                             42
Name: Summary, dtype: int64

### Check if there are any `null` values

In [534]:
all_individual_tasks.isnull().sum()

Summary       0
Issue id      0
Parent id     0
Created      57
Resolved     57
dtype: int64

### Drop all rows with null value

In [535]:
all_individual_tasks_without_null = all_individual_tasks.dropna()
all_individual_tasks_without_null.isnull().sum()

Summary      0
Issue id     0
Parent id    0
Created      0
Resolved     0
dtype: int64

**Summary:**

- We are interested only in completed tickets so we have droped rows which contain `null` in `resolved` columns
- Data is filtered and cleaned we are ready for analysis part

# Data Analysis 

# Daterminate how long each ticket was in particular individual task

- calculate time for eacch subtask
- calculate how long each ticket was in review
- calculate how long each ticket was "on codebusters"
- calculate how long each ticket was "outside codebusters"


## Calculate time spent in each of the subtask

In [536]:
all_individual_tasks_time_calculation = all_individual_tasks_without_null
all_individual_tasks_time_calculation["time_in"] = all_individual_tasks_time_calculation["Resolved"] - all_individual_tasks_time_calculation["Created"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_time_calculation["time_in"] = all_individual_tasks_time_calculation["Resolved"] - all_individual_tasks_time_calculation["Created"]


### List all available statuses of the CI ticket subtask

In [537]:
all_individual_tasks_time_calculation["Summary"].value_counts()

Review                            1070
Configure                         1041
Verify                             943
Communicate                        915
Import to Test System              783
Import in Production               629
Import to Customer Test System     399
Explain                            180
Do                                  50
Clarify                             42
Name: Summary, dtype: int64

## Calculate time ticket was in review step

In [599]:
all_individual_tasks_reivew = all_individual_tasks_without_null[all_individual_tasks_without_null['Summary'] == 'Review']
all_individual_tasks_reivew.rename({"time_in": "time_in_review"},axis=1, inplace=True)
all_individual_tasks_reivew.reset_index(inplace=True)
all_individual_tasks_reivew

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


Unnamed: 0,index,Summary,Issue id,Parent id,Created,Resolved,time_in_review
0,0,Review,472135,472134,2022-05-30 22:30:00,2022-06-21 13:17:00,21 days 14:47:00
1,1,Review,472112,472111,2022-05-30 20:27:00,2022-05-31 09:59:00,0 days 13:32:00
2,13,Review,471790,471789,2022-05-30 16:28:00,2022-05-31 14:21:00,0 days 21:53:00
3,15,Review,471776,471775,2022-05-30 15:47:00,2022-06-01 16:10:00,2 days 00:23:00
4,16,Review,471771,471770,2022-05-30 15:47:00,2022-05-30 15:47:00,0 days 00:00:00
...,...,...,...,...,...,...,...
1065,6100,Review,470620,470619,2022-07-13 14:06:00,2022-07-13 14:06:00,0 days 00:00:00
1066,6101,Review,470612,470611,2022-07-13 12:39:00,2022-07-19 08:47:00,5 days 20:08:00
1067,6102,Review,470605,470604,2022-07-13 11:45:00,2022-08-04 13:41:00,22 days 01:56:00
1068,6104,Review,470370,470369,2022-07-13 11:25:00,2022-07-21 12:32:00,8 days 01:07:00


In [607]:
all_individual_tasks_reivew[all_individual_tasks_reivew["Parent id"] == 465210]

Unnamed: 0,index,Summary,Issue id,Parent id,Created,Resolved,time_in_review
135,854,Review,465211,465210,2022-05-09 08:08:00,2022-05-09 08:08:00,0 days 00:00:00
914,5322,Review,465211,465210,2022-08-11 12:33:00,2022-08-22 09:37:00,10 days 21:04:00


In [613]:
all_individual_tasks_reivew[all_individual_tasks_reivew["Parent id"] == 465162]

Unnamed: 0,index,Summary,Issue id,Parent id,Created,Resolved,time_in_review
136,858,Review,465163,465162,2022-05-09 04:58:00,2022-05-12 13:53:00,3 days 08:55:00
915,5326,Review,465163,465162,2022-08-11 11:49:00,2022-08-11 11:50:00,0 days 00:01:00


In [612]:
null_rows

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review
13,CI-8489,465210,Closed,kgostynska,2023-04-17 09:43:00,24/Apr/23 12:00 AM,2023-04-26 10:33:00,ArcelorMittal,Service Request (SR),,NaT
14,CI-8488,465162,Closed,yutaka,2023-04-17 03:43:00,24/Apr/23 12:00 AM,2023-04-18 05:37:00,CMS Hasche Sigle,Service Request (SR),,NaT
15,CI-8487,465050,Closed,nsanchez,2023-04-14 11:16:00,27/Apr/23 12:00 AM,2023-04-28 11:25:00,VW,Service Request (SR),,NaT
16,CI-8486,465008,Closed,yutaka,2023-04-14 09:23:00,21/Apr/23 12:00 AM,2023-05-08 02:17:00,Amadeus,Service Request (SR),,NaT
17,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),,NaT
...,...,...,...,...,...,...,...,...,...,...,...
88,CI-8239,452166,Closed,yutaka,2023-03-01 02:16:00,06/Mar/23 12:00 AM,2023-03-29 09:54:00,Endress Hauser,Service Request (SR),,NaT
129,CI-8118,446512,Closed,skarja,2023-02-06 20:05:00,13/Feb/23 12:00 AM,2023-04-19 01:24:00,TeamBank AG,Service Request (SR),,NaT
263,CI-7788,431813,Closed,yutaka,2022-12-01 01:54:00,08/Dec/22 12:00 AM,2023-04-05 06:54:00,Amadeus,Service Request (SR),,NaT
285,CI-7550,420892,Closed,yutaka,2022-10-12 17:05:00,19/Oct/22 12:00 AM,2023-04-04 16:34:00,Hugo Boss,Service Request (SR),,NaT


### Grup tickets by the `Parent id` and sum total time spent in `review` subtask


In [611]:
all_individual_tasks_reivew_time.groupby("Parent id").groups

{174802: Int64Index([1564], dtype='int64'),
 329036: Int64Index([1255], dtype='int64'),
 336670: Int64Index([9090], dtype='int64'),
 338553: Int64Index([3306], dtype='int64'),
 345462: Int64Index([2906, 2994], dtype='int64'),
 347539: Int64Index([4588], dtype='int64'),
 349721: Int64Index([12274], dtype='int64'),
 353363: Int64Index([4551], dtype='int64'),
 357612: Int64Index([4491], dtype='int64'),
 357942: Int64Index([4772], dtype='int64'),
 358397: Int64Index([3932, 3936], dtype='int64'),
 358945: Int64Index([4592], dtype='int64'),
 359493: Int64Index([3433, 3658], dtype='int64'),
 359638: Int64Index([3026], dtype='int64'),
 359731: Int64Index([4250], dtype='int64'),
 359734: Int64Index([2733], dtype='int64'),
 359849: Int64Index([4305], dtype='int64'),
 359961: Int64Index([4581], dtype='int64'),
 359968: Int64Index([4047], dtype='int64'),
 360164: Int64Index([4049], dtype='int64'),
 360277: Int64Index([4661], dtype='int64'),
 360335: Int64Index([4480], dtype='int64'),
 360424: Int6

In [617]:
all_individual_tasks_reivew_sum = all_individual_tasks_reivew_time.groupby(by=["Parent id"], dropna=False)["time_in_review"].sum()
all_individual_tasks_reivew_sum = all_individual_tasks_reivew_sum.to_frame()
all_individual_tasks_reivew_sum.reset_index(inplace=True)
all_individual_tasks_reivew_sum

TypeError: groupby() got an unexpected keyword argument 'dropna'

In [618]:
!pip install -U pandas

Collecting pandas
  Downloading pandas-2.0.1-cp38-cp38-win_amd64.whl (10.8 MB)
Collecting python-dateutil>=2.8.2
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting tzdata>=2022.1
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Collecting numpy>=1.20.3; python_version < "3.10"
  Downloading numpy-1.24.3-cp38-cp38-win_amd64.whl (14.9 MB)
Installing collected packages: python-dateutil, tzdata, numpy, pandas
  Attempting uninstall: python-dateutil

ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'C:\\Users\\gmalarski\\Anaconda3\\Lib\\site-packages\\~umpy\\core\\_multiarray_tests.cp38-win_amd64.pyd'
Consider using the `--user` option or check the permissions.




    Found existing installation: python-dateutil 2.8.1
    Uninstalling python-dateutil-2.8.1:
      Successfully uninstalled python-dateutil-2.8.1
  Attempting uninstall: numpy
    Found existing installation: numpy 1.18.5
    Uninstalling numpy-1.18.5:
      Successfully uninstalled numpy-1.18.5


In [621]:
pd.__version__

'1.0.5'

In [598]:
all_individual_tasks_reivew_sum.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2061 entries, 0 to 2060
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype          
---  ------          --------------  -----          
 0   Parent id       2061 non-null   int64          
 1   time_in_review  2061 non-null   timedelta64[ns]
dtypes: int64(1), timedelta64[ns](1)
memory usage: 32.3 KB


In [597]:
all_individual_tasks_reivew_sum[all_individual_tasks_reivew_sum["Parent id"] == 465210]

Unnamed: 0,Parent id,time_in_review


### Check on the particular ticket if the total time is summed correctly

In [605]:
all_individual_tasks_reivew.groupby("Parent id").count()

Unnamed: 0_level_0,index,Summary,Issue id,Created,Resolved,time_in_review
Parent id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
331914,1,1,1,1,1,1
338034,1,1,1,1,1,1
349721,1,1,1,1,1,1
351500,2,2,2,2,2,2
356993,1,1,1,1,1,1
...,...,...,...,...,...,...
471770,2,2,2,2,2,2
471775,2,2,2,2,2,2
471789,2,2,2,2,2,2
472111,3,3,3,3,3,3


In [602]:
more_than_one = all_individual_tasks_reivew[all_individual_tasks_reivew.groupby("Parent id").count()['Summary'] > 2]

  more_than_one = all_individual_tasks_reivew[all_individual_tasks_reivew.groupby("Parent id").count()['Summary'] > 2]


IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

In [541]:
all_individual_tasks_reivew.loc[all_individual_tasks_reivew["Parent id"] == 345462]

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in_review


In [542]:
all_individual_tasks_reivew_sum_df = all_individual_tasks_reivew_sum.to_frame()
all_individual_tasks_reivew_sum_df.reset_index(inplace=True)
all_individual_tasks_reivew_sum_df

Unnamed: 0,Parent id,time_in_review
0,174802,0 days 00:01:00
1,329036,0 days 00:00:00
2,336670,0 days 00:01:00
3,338553,238 days 13:36:00
4,345462,8 days 01:23:00
...,...,...
2056,471134,0 days 18:55:00
2057,471138,0 days 19:37:00
2058,471236,0 days 00:21:00
2059,471244,0 days 00:01:00


In [588]:
all_individual_tasks_reivew_sum_df.loc[all_individual_tasks_reivew_sum_df["Parent id"] == 345462]

Unnamed: 0,Parent id,time_in_review


In [544]:
all_individual_tasks_reivew_sum_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2061 entries, 0 to 2060
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype          
---  ------          --------------  -----          
 0   Parent id       2061 non-null   int64          
 1   time_in_review  2061 non-null   timedelta64[ns]
dtypes: int64(1), timedelta64[ns](1)
memory usage: 32.3 KB


In [545]:
all_individual_tasks_reivew_sum_df

Unnamed: 0,Parent id,time_in_review
0,174802,0 days 00:01:00
1,329036,0 days 00:00:00
2,336670,0 days 00:01:00
3,338553,238 days 13:36:00
4,345462,8 days 01:23:00
...,...,...
2056,471134,0 days 18:55:00
2057,471138,0 days 19:37:00
2058,471236,0 days 00:21:00
2059,471244,0 days 00:01:00


**Summary:**
- There are some tickets which have been more than once in the `Review` step
- We can notice that after gruping and summing tickets, case with 'Parent id' `345462` has total time equals more than `8 days`, which is the sum of the time spend in previous two `Review` subtasks

### Merge `all_sr` dataset with `all_individual_tasks_reivew_sum_df`

In [546]:
all_sr_review_time = pd.merge(all_sr, all_individual_tasks_reivew_sum_df[['Parent id', 'time_in_review']], how="left",left_on = "Issue id",right_on = "Parent id")
all_sr_review_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635.0,0 days 00:24:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149.0,1 days 23:16:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735.0,6 days 20:24:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243.0,0 days 00:00:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085.0,6 days 21:53:00
...,...,...,...,...,...,...,...,...,...,...,...
309,CI-6926,393799,Closed,aklabisz,2022-05-25 12:42:00,12/Jul/22 12:00 AM,2023-01-27 19:53:00,Aquila Capital,Service Request (SR),393799.0,38 days 18:52:00
310,CI-6911,393585,Closed,aklabisz,2022-05-24 15:11:00,11/Aug/22 12:00 AM,2023-01-24 10:33:00,VW,Service Request (SR),393585.0,24 days 12:21:00
311,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357.0,28 days 01:38:00
312,CI-6825,389774,Closed,aklabisz,2022-05-11 11:16:00,16/May/22 12:00 AM,2023-01-24 12:32:00,Hugo Boss,Service Request (SR),389774.0,8 days 14:12:00


In [547]:
all_sr_review_time.isnull().sum()

Issue key           0
Issue id            0
Status              0
Reporter            0
Created             0
Due Date            0
Resolved            0
Customer            0
Type of Request     0
Parent id          66
time_in_review     66
dtype: int64

### Investigate `null` rows after merging  main SR tickets with individual subtasks

In [548]:
null_rows = all_sr_review_time[all_sr_review_time["time_in_review"].isnull()]
null_rows.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 66 entries, 13 to 288
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype          
---  ------           --------------  -----          
 0   Issue key        66 non-null     object         
 1   Issue id         66 non-null     int64          
 2   Status           66 non-null     object         
 3   Reporter         66 non-null     object         
 4   Created          66 non-null     datetime64[ns] 
 5   Due Date         66 non-null     object         
 6   Resolved         66 non-null     datetime64[ns] 
 7   Customer         66 non-null     object         
 8   Type of Request  66 non-null     object         
 9   Parent id        0 non-null      float64        
 10  time_in_review   0 non-null      timedelta64[ns]
dtypes: datetime64[ns](2), float64(1), int64(1), object(6), timedelta64[ns](1)
memory usage: 6.2+ KB


In [549]:
null_rows

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_in_review
13,CI-8489,465210,Closed,kgostynska,2023-04-17 09:43:00,24/Apr/23 12:00 AM,2023-04-26 10:33:00,ArcelorMittal,Service Request (SR),,NaT
14,CI-8488,465162,Closed,yutaka,2023-04-17 03:43:00,24/Apr/23 12:00 AM,2023-04-18 05:37:00,CMS Hasche Sigle,Service Request (SR),,NaT
15,CI-8487,465050,Closed,nsanchez,2023-04-14 11:16:00,27/Apr/23 12:00 AM,2023-04-28 11:25:00,VW,Service Request (SR),,NaT
16,CI-8486,465008,Closed,yutaka,2023-04-14 09:23:00,21/Apr/23 12:00 AM,2023-05-08 02:17:00,Amadeus,Service Request (SR),,NaT
17,CI-8485,464979,Closed,yutaka,2023-04-14 02:51:00,21/Apr/23 12:00 AM,2023-04-17 11:31:00,Enercon,Service Request (SR),,NaT
...,...,...,...,...,...,...,...,...,...,...,...
88,CI-8239,452166,Closed,yutaka,2023-03-01 02:16:00,06/Mar/23 12:00 AM,2023-03-29 09:54:00,Endress Hauser,Service Request (SR),,NaT
129,CI-8118,446512,Closed,skarja,2023-02-06 20:05:00,13/Feb/23 12:00 AM,2023-04-19 01:24:00,TeamBank AG,Service Request (SR),,NaT
263,CI-7788,431813,Closed,yutaka,2022-12-01 01:54:00,08/Dec/22 12:00 AM,2023-04-05 06:54:00,Amadeus,Service Request (SR),,NaT
285,CI-7550,420892,Closed,yutaka,2022-10-12 17:05:00,19/Oct/22 12:00 AM,2023-04-04 16:34:00,Hugo Boss,Service Request (SR),,NaT


In [550]:
all_sr_review_time_without_null = all_sr_review_time.dropna()
all_sr_review_time_without_null.isnull().sum()

Issue key          0
Issue id           0
Status             0
Reporter           0
Created            0
Due Date           0
Resolved           0
Customer           0
Type of Request    0
Parent id          0
time_in_review     0
dtype: int64

**Summary**:
- Ror tickets like `CI-8488`revew time equals 0, so we need to drop it, because it can leads to wrong KPIs calculations

# TO DO
- update the lates dateset

## Determinate time spent on codebusters

- calculate the total time ticket was in subtasks "Configure", "Review", "mport to Test System", "Import in Production", "Import to Customer Test System"
- apply the results to all tickets dataset

In [551]:
all_individual_with_codebusters_time = all_individual_tasks_time_calculation.loc[all_individual_tasks_time_calculation["Summary"].isin(["Configure", "Review", "Import to Test System", "Import in Production", "Import to Customer Test System", "Import to Test", "Import to Staging System"])]

Summarize time for each codebusters subtask 

In [552]:
all_individual_with_codebusters_time.groupby("Parent id").count()

Unnamed: 0_level_0,Summary,Issue id,Created,Resolved,time_in
Parent id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
331914,1,1,1,1,1
338034,1,1,1,1,1
349721,4,4,4,4,4
351500,2,2,2,2,2
356993,4,4,4,4,4
...,...,...,...,...,...
471770,2,2,2,2,2
471775,2,2,2,2,2
471789,2,2,2,2,2
472111,3,3,3,3,3


In [553]:
all_individual_with_codebusters_time[all_individual_with_codebusters_time['Parent id'] == 329036]

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in


In [554]:
all_individual_with_codebusters_time_sum = all_individual_with_codebusters_time.groupby("Parent id")["time_in"].sum()
all_individual_with_codebusters_time_sum_df = all_individual_with_codebusters_time_sum.to_frame()
all_individual_with_codebusters_time_sum_df.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)
all_individual_with_codebusters_time_sum_df.reset_index(inplace=True)
all_individual_with_codebusters_time_sum_df

Unnamed: 0,Parent id,time_on_codebusters
0,331914,0 days 00:02:00
1,338034,0 days 00:01:00
2,349721,4 days 08:13:00
3,351500,15 days 10:33:00
4,356993,17 days 14:49:00
...,...,...
776,471770,1 days 02:10:00
777,471775,2 days 21:39:00
778,471789,1 days 10:33:00
779,472111,1 days 06:48:00


In [555]:
# all_individual_tasks_reivew_sum_df = all_individual_tasks_reivew_sum.to_frame()
# all_individual_tasks_reivew_sum_df.reset_index(inplace=True)
# all_individual_tasks_reivew_sum_df

### Merge total time on codebusters with all jira ticekts dataset

In [556]:
all_sr_codebusters_time = pd.merge(all_sr, all_individual_with_codebusters_time_sum_df, how="inner",left_on = "Issue id",right_on = "Parent id")
all_sr_codebusters_time 

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_on_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,8 days 00:48:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,61 days 06:56:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,10 days 13:28:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,22 days 06:07:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,71 days 23:58:00
...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,2 days 23:15:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,0 days 00:00:00
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,0 days 15:22:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,3 days 18:32:00


### Investigate `null` rows after merging  main SR tickets with individual subtasks

In [557]:
all_sr_codebusters_time.isnull().sum()

Issue key              0
Issue id               0
Status                 0
Reporter               0
Created                0
Due Date               0
Resolved               0
Customer               0
Type of Request        0
Parent id              0
time_on_codebusters    0
dtype: int64

In [558]:
all_sr_codebusters_time_without_null = all_sr_codebusters_time.dropna()
all_sr_codebusters_time_without_null.isnull().sum()

Issue key              0
Issue id               0
Status                 0
Reporter               0
Created                0
Due Date               0
Resolved               0
Customer               0
Type of Request        0
Parent id              0
time_on_codebusters    0
dtype: int64

In [559]:
all_sr_codebusters_time_without_null

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_on_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,8 days 00:48:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,61 days 06:56:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,10 days 13:28:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,22 days 06:07:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,71 days 23:58:00
...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,2 days 23:15:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,0 days 00:00:00
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,0 days 15:22:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,3 days 18:32:00


**Summary:**

## Determinate time spent outside codebusters

- calculate the total time ticket was in one of the subtask "Explain", "Verify", "Clarify", "Communicate"
- apply results to all SR tickets dataset

In [560]:
all_individual_others_time = all_individual_tasks_time_calculation.loc[all_individual_tasks_time_calculation["Summary"].isin(["Explain", "Verify", "Clarify", "Communicate"])]
all_individual_others_time 

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
7,Verify,471871,466458,2022-05-30 17:01:00,2022-05-30 17:01:00,0 days 00:00:00
9,Verify,471869,466458,2022-05-30 17:01:00,2022-05-30 17:01:00,0 days 00:00:00
14,Explain,471783,465493,2022-05-30 16:18:00,2022-05-30 18:01:00,0 days 01:43:00
17,Communicate,471768,471138,2022-05-30 15:47:00,2022-05-30 15:47:00,0 days 00:00:00
18,Communicate,471767,471134,2022-05-30 15:47:00,2022-05-30 15:47:00,0 days 00:00:00
...,...,...,...,...,...,...
6098,Explain,471783,465493,2022-07-13 14:06:00,2022-09-29 15:35:00,78 days 01:29:00
6099,Communicate,470747,455636,2022-07-13 14:06:00,2022-07-13 14:06:00,0 days 00:00:00
6106,Communicate,468182,454791,2022-07-13 11:16:00,2022-07-21 11:28:00,8 days 00:12:00
6107,Communicate,468160,467976,2022-07-13 11:11:00,2022-07-13 14:30:00,0 days 03:19:00


Summarize time for each subtask outside codeubsters

In [561]:
all_individual_others_time_sum = all_individual_others_time.groupby("Parent id")["time_in"].sum()
all_individual_others_time_sum_df = all_individual_others_time_sum.to_frame()
all_individual_others_time_sum_df.rename({"time_in": "time_outside_codebusters"},axis=1, inplace=True)
all_individual_others_time_sum_df.reset_index(inplace=True)
all_individual_others_time_sum_df

Unnamed: 0,Parent id,time_outside_codebusters
0,331914,0 days 00:48:00
1,338034,2 days 15:35:00
2,349721,1 days 03:12:00
3,351500,160 days 01:15:00
4,356993,0 days 17:16:00
...,...,...
831,471086,62 days 20:16:00
832,471134,4 days 02:39:00
833,471138,8 days 15:05:00
834,471236,20 days 10:28:00


In [562]:
# all_individual_with_codebusters_time_sum = all_individual_with_codebusters_time.groupby("Parent id")["time_in"].sum()
# all_individual_with_codebusters_time_sum_df = all_individual_with_codebusters_time_sum.to_frame()
# all_individual_with_codebusters_time_sum_df.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)
# all_individual_with_codebusters_time_sum_df.reset_index(inplace=True)
# all_individual_with_codebusters_time_sum_df

### Merge `all_sr` with `all_individual_others_time_sum_df` dataset

In [563]:
all_sr_outside_codebusters_time = pd.merge(all_sr, all_individual_others_time_sum_df, how="inner",left_on = "Issue id",right_on = "Parent id")
all_sr_outside_codebusters_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,31 days 21:34:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,5 days 12:03:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,144 days 19:06:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,8 days 09:39:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,33 days 11:41:00
...,...,...,...,...,...,...,...,...,...,...,...
270,CI-7029,398756,Closed,skarja,2022-06-23 14:32:00,30/Jun/22 12:00 AM,2023-01-27 19:55:00,TeamBank AG,Service Request (SR),398756,12 days 16:29:00
271,CI-6911,393585,Closed,aklabisz,2022-05-24 15:11:00,11/Aug/22 12:00 AM,2023-01-24 10:33:00,VW,Service Request (SR),393585,0 days 00:18:00
272,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,0 days 16:28:00
273,CI-6825,389774,Closed,aklabisz,2022-05-11 11:16:00,16/May/22 12:00 AM,2023-01-24 12:32:00,Hugo Boss,Service Request (SR),389774,0 days 16:39:00


In [564]:
all_sr_outside_codebusters_time.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_outside_codebusters    0
dtype: int64

**Summary:**

## Combine all created datasets
- In this step we will merge all created erlier datasets: `all_sr_outside_codebusters_time`, `all_sr_codebusters_time` and `all_sr_review_time` 

Merge df containing "review" time with df containg "time on codebusters" 

In [565]:
all_tickets_time_on_and_outside_codebusters = pd.merge(all_sr_outside_codebusters_time, all_sr_codebusters_time[['Issue id','time_on_codebusters']],  how="inner",left_on = "Issue id",right_on = "Issue id")
all_tickets_time_on_and_outside_codebusters

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters,time_on_codebusters
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,31 days 21:34:00,8 days 00:48:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,5 days 12:03:00,61 days 06:56:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,144 days 19:06:00,10 days 13:28:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,8 days 09:39:00,22 days 06:07:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,33 days 11:41:00,71 days 23:58:00
...,...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,14 days 09:43:00,2 days 23:15:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,4 days 00:03:00,0 days 00:00:00
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,0 days 18:51:00,0 days 15:22:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,0 days 16:28:00,3 days 18:32:00


In [566]:
all_tickets_time_on_and_outside_codebusters.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_outside_codebusters    0
time_on_codebusters         0
dtype: int64

### Merge `all_tickets_time_on_and_outside_codebusters` with `all_sr_review_time` dataset

In [567]:
all_tickets_with_time = pd.merge(all_tickets_time_on_and_outside_codebusters, all_sr_review_time[['Issue id','time_in_review']],  how="inner",left_on = "Issue id",right_on = "Issue id")
all_tickets_with_time

Unnamed: 0,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters,time_on_codebusters,time_in_review
0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,31 days 21:34:00,8 days 00:48:00,0 days 00:24:00
1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,5 days 12:03:00,61 days 06:56:00,1 days 23:16:00
2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,144 days 19:06:00,10 days 13:28:00,6 days 20:24:00
3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,8 days 09:39:00,22 days 06:07:00,0 days 00:00:00
4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,33 days 11:41:00,71 days 23:58:00,6 days 21:53:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,14 days 09:43:00,2 days 23:15:00,103 days 12:49:00
245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,4 days 00:03:00,0 days 00:00:00,15 days 16:28:00
246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,0 days 18:51:00,0 days 15:22:00,6 days 01:59:00
247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,0 days 16:28:00,3 days 18:32:00,28 days 01:38:00


Check for null values

In [568]:
all_tickets_with_time.isnull().sum()

Issue key                    0
Issue id                     0
Status                       0
Reporter                     0
Created                      0
Due Date                     0
Resolved                     0
Customer                     0
Type of Request              0
Parent id                    0
time_outside_codebusters     0
time_on_codebusters          0
time_in_review              66
dtype: int64

Drop rows which contain null value

In [569]:
all_tickets_with_time_without_null = all_tickets_with_time.dropna()
all_tickets_with_time_without_null.isnull().sum()

Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_outside_codebusters    0
time_on_codebusters         0
time_in_review              0
dtype: int64

### Convert `time in..` columns to minutes

Convert `timedelta` columns to float

In [570]:
all_tickets_with_time_converted = all_tickets_with_time_without_null
all_tickets_with_time_converted.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 183 entries, 0 to 248
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype          
---  ------                    --------------  -----          
 0   Issue key                 183 non-null    object         
 1   Issue id                  183 non-null    int64          
 2   Status                    183 non-null    object         
 3   Reporter                  183 non-null    object         
 4   Created                   183 non-null    datetime64[ns] 
 5   Due Date                  183 non-null    object         
 6   Resolved                  183 non-null    datetime64[ns] 
 7   Customer                  183 non-null    object         
 8   Type of Request           183 non-null    object         
 9   Parent id                 183 non-null    int64          
 10  time_outside_codebusters  183 non-null    timedelta64[ns]
 11  time_on_codebusters       183 non-null    timedelta64[ns]
 12  time_in_

Convert `time in..` columns to minutes

In [571]:
all_tickets_with_time_converted['time_in_review'] = all_tickets_with_time_converted['time_in_review'].dt.total_seconds().div(60)
all_tickets_with_time_converted['time_on_codebusters'] = all_tickets_with_time_converted['time_on_codebusters'].dt.total_seconds().div(60)
all_tickets_with_time_converted['time_outside_codebusters'] = all_tickets_with_time['time_outside_codebusters'].dt.total_seconds().div(60)

all_tickets_with_time_converted.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 183 entries, 0 to 248
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Issue key                 183 non-null    object        
 1   Issue id                  183 non-null    int64         
 2   Status                    183 non-null    object        
 3   Reporter                  183 non-null    object        
 4   Created                   183 non-null    datetime64[ns]
 5   Due Date                  183 non-null    object        
 6   Resolved                  183 non-null    datetime64[ns]
 7   Customer                  183 non-null    object        
 8   Type of Request           183 non-null    object        
 9   Parent id                 183 non-null    int64         
 10  time_outside_codebusters  183 non-null    float64       
 11  time_on_codebusters       183 non-null    float64       
 12  time_in_review        

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_tickets_with_time_converted['time_in_review'] = all_tickets_with_time_converted['time_in_review'].dt.total_seconds().div(60)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_tickets_with_time_converted['time_on_codebusters'] = all_tickets_with_time_converted['time_on_codebusters'].dt.total_seconds().div(60)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/st

In [581]:
all_tickets_with_time_converted.reset_index(inplace=True)
all_tickets_with_time_converted

Unnamed: 0,index,Issue key,Issue id,Status,Reporter,Created,Due Date,Resolved,Customer,Type of Request,Parent id,time_outside_codebusters,time_on_codebusters,time_in_review
0,0,CI-8574,468635,Closed,yutaka,2023-04-28 03:47:00,01/May/23 12:00 AM,2023-05-08 16:08:00,Rehau,Service Request (SR),468635,45934.0,11568.0,24.0
1,1,CI-8568,468149,Closed,aruizrobles,2023-04-26 15:30:00,03/May/23 12:00 AM,2023-05-04 17:55:00,Numiga,Service Request (SR),468149,7923.0,88256.0,2836.0
2,2,CI-8560,467735,Closed,nsanchez,2023-04-25 11:57:00,08/May/23 12:00 AM,2023-05-08 14:02:00,OBB,Service Request (SR),467735,208506.0,15208.0,9864.0
3,3,CI-8554,467243,Closed,yutaka,2023-04-24 06:44:00,08/May/23 12:00 AM,2023-05-10 04:06:00,Hensoldt,Service Request (SR),467243,12099.0,32047.0,0.0
4,4,CI-8552,467085,Closed,nsanchez,2023-04-21 14:58:00,27/Apr/23 12:00 AM,2023-05-08 13:46:00,VW,Service Request (SR),467085,48221.0,103678.0,9953.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
178,244,CI-7227,407574,Closed,skarja,2022-08-10 00:39:00,30/Sep/22 12:00 AM,2023-01-24 11:52:00,Rheinmetall,Service Request (SR),407574,20743.0,4275.0,149089.0
179,245,CI-7226,407535,Closed,ssaharoy,2022-08-09 15:05:00,16/Aug/22 12:00 AM,2023-01-18 06:29:00,Porsche,Service Request (SR),407535,5763.0,0.0,22588.0
180,246,CI-7205,406532,Closed,nsanchez,2022-08-04 11:54:00,02/May/23 12:00 AM,2023-04-10 08:47:00,Hugo Boss,Service Request (SR),406532,1131.0,922.0,8759.0
181,247,CI-6901,393357,Closed,aklabisz,2022-05-23 13:21:00,30/May/22 12:00 AM,2023-03-21 13:06:00,Porsche,Service Request (SR),393357,988.0,5432.0,40418.0


In [573]:
all_tickets_with_time_converted.dropna(subset=['time_in_review'], how="any", inplace=True)
all_tickets_with_time_converted.isnull().sum()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_tickets_with_time_converted.dropna(subset=['time_in_review'], how="any", inplace=True)


Issue key                   0
Issue id                    0
Status                      0
Reporter                    0
Created                     0
Due Date                    0
Resolved                    0
Customer                    0
Type of Request             0
Parent id                   0
time_outside_codebusters    0
time_on_codebusters         0
time_in_review              0
dtype: int64

## Save data for KPIs purposes

In [574]:
all_tickets_with_time_converted.to_csv('all_partners.csv', index=False)

### Add a column to all tasks with the latest communicate step date
- count "real" close time, bease on the date when ticket was moved to `communicate` step

In [575]:
all_individual_tasks_2022

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
0,Import in Production,394675,373630,2022-05-30 22:30:00,2022-06-21 13:17:00,21 days 14:47:00
1,Communicate,394671,389490,2022-05-30 20:27:00,2022-05-31 09:59:00,0 days 13:32:00
2,Communicate,394670,393625,2022-05-30 20:27:00,2022-05-31 09:04:00,0 days 12:37:00
3,Configure,394664,389982,2022-05-30 18:01:00,2022-05-31 14:22:00,0 days 20:21:00
4,Configure,394663,370941,2022-05-30 17:43:00,2022-06-17 13:58:00,17 days 20:15:00
...,...,...,...,...,...,...
14067,Explain,465501,465496,2023-04-18 04:31:00,2023-04-18 04:31:00,0 days 00:00:00
14068,Review,465500,465499,2023-04-18 04:31:00,2023-04-18 04:36:00,0 days 00:05:00
14069,Review,465497,465496,2023-04-18 04:16:00,2023-04-18 04:31:00,0 days 00:15:00
14070,Review,465494,465493,2023-04-18 02:16:00,2023-05-09 18:08:00,21 days 15:52:00


In [576]:
summary_duplicates = all_individual_tasks_2022.duplicated(['Summary', 'Parent id'], keep=False)
all_individual_tasks_2022[summary_duplicates].sort_values("Parent id")

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
448,Verify,392575,327573,2022-05-18 13:20:00,2022-05-18 13:20:00,0 days 00:00:00
7173,Verify,395426,327573,2022-06-03 13:45:00,2022-06-03 13:45:00,0 days 00:00:00
1251,Import in Production,386221,329036,2022-04-21 11:00:00,2022-04-25 10:56:00,3 days 23:56:00
1254,Configure,386188,329036,2022-04-21 09:08:00,2022-04-21 11:00:00,0 days 01:52:00
1278,Communicate,386023,329036,2022-04-20 13:04:00,2022-04-20 13:04:00,0 days 00:00:00
...,...,...,...,...,...,...
13246,Review,471321,470324,2023-05-11 11:03:00,2023-05-11 11:39:00,0 days 00:36:00
13346,Verify,470938,470827,2023-05-10 10:53:00,2023-05-10 10:53:00,0 days 00:00:00
13348,Verify,470936,470827,2023-05-10 10:53:00,2023-05-10 10:53:00,0 days 00:00:00
13290,Verify,471153,471086,2023-05-10 17:39:00,2023-05-10 19:54:00,0 days 02:15:00


In [577]:
all_individual_tasks_2022_communicate = all_individual_tasks_2022[all_individual_tasks_2022["Summary"] == "Communicate"]
all_individual_tasks_2022_communicate.sort_values("Parent id")

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
3251,Communicate,370797,326547,2022-02-18 20:14:00,2022-02-18 20:16:00,0 days 00:02:00
7171,Communicate,395428,327573,2022-06-03 13:45:00,2022-06-03 13:45:00,0 days 00:00:00
1278,Communicate,386023,329036,2022-04-20 13:04:00,2022-04-20 13:04:00,0 days 00:00:00
1200,Communicate,386568,329036,2022-04-25 10:56:00,2022-04-25 10:57:00,0 days 00:01:00
1459,Communicate,384549,330489,2022-04-12 10:15:00,2022-04-12 10:42:00,0 days 00:27:00
...,...,...,...,...,...,...
13646,Communicate,468640,468635,2023-04-28 04:17:00,2023-05-08 16:08:00,10 days 11:51:00
13498,Communicate,469841,469376,2023-05-04 18:01:00,2023-05-05 16:27:00,0 days 22:26:00
13502,Communicate,469828,469474,2023-05-04 17:12:00,2023-05-05 16:37:00,0 days 23:25:00
13229,Communicate,471406,471236,2023-05-11 14:06:00,2023-05-11 14:37:00,0 days 00:31:00


In [578]:
all_individual_tasks_2022_communicate.nunique()

Summary         1
Issue id     1914
Parent id    1743
Created      1741
Resolved     1691
time_in      1162
dtype: int64

Drop rows with duplicated `Communicate` subtask and keep only the newest one

In [579]:
all_individual_tasks_2022_communicate.sort_values(["Parent id", "Created"])
all_individual_tasks_2022_communicate_no_duplicates = all_individual_tasks_2022_communicate.drop_duplicates(['Summary', 'Parent id'], keep="last")
# all_individual_tasks_2022_communicate_no_duplicates = all_individual_tasks_2022_communicate
all_individual_tasks_2022_communicate_no_duplicates

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
2,Communicate,394670,393625,2022-05-30 20:27:00,2022-05-31 09:04:00,0 days 12:37:00
22,Communicate,394578,394215,2022-05-30 15:27:00,2022-06-03 09:53:00,3 days 18:26:00
27,Communicate,394570,393200,2022-05-30 15:09:00,2022-05-30 17:27:00,0 days 02:18:00
29,Communicate,394565,394093,2022-05-30 15:05:00,2022-06-17 03:13:00,17 days 12:08:00
30,Communicate,394559,392974,2022-05-30 14:43:00,2022-06-07 01:58:00,7 days 11:15:00
...,...,...,...,...,...,...
14008,Communicate,465745,421830,2023-04-18 11:40:00,2023-04-18 11:40:00,0 days 00:00:00
14025,Communicate,465695,464772,2023-04-18 10:32:00,2023-04-18 11:15:00,0 days 00:43:00
14029,Communicate,465690,464444,2023-04-18 10:31:00,2023-04-18 11:15:00,0 days 00:44:00
14039,Communicate,465592,422099,2023-04-18 09:43:00,2023-04-18 09:43:00,0 days 00:00:00


In [580]:
all_tickets_with_time_and_cummunicate_date = pd.merge(all_tickets_with_time_converted, all_individual_tasks_2022_communicate_no_duplicates[['Parent id','Created']], how="left",left_on = "Issue id",right_on = "Parent id").drop(columns=["Parent id"])
all_tickets_with_time_and_cummunicate_date.rename({"Created_y": "comminicate_created"},axis=1, inplace=True)
all_tickets_with_time_and_cummunicate_date

KeyError: "['Parent id'] not found in axis"

Drop rows without `communicate_create`, we are taking in to consideration only completed tickets

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.isnull().sum()

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.nunique()

In [None]:
500 - 145

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.dropna(subset=['comminicate_created'], how="any", inplace=True)

In [None]:
all_tickets_2022_with_time_and_cummunicate_date

**Summary:**
- rows with null value are droped
- we take into consideration only completed tickets

### Save data to csv file

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.to_csv('all_tickets_combined.csv', index=False)

In [None]:
all_individual_tasks_2022_communicate_no_duplicates.nunique()

## Data Analysis

### Compare the size of the created data sets
- compare the size of each datasests, in order to verify v

In [None]:
all_individual_tasks_2022_origin[all_individual_tasks_2022_origin['Parent id'] == 450084]

In [None]:
all_individual_tasks_2022.info()

**Summary:**
- We should receive inside `all_individual_tasks_2022_communicate_no_duplicates` 1372 rows not 1297

# To Do:
- invesitigate discrepency between datasests: `codebusters_time`(1798 rows),  `others_time` (1672 rows), `all_tickets_2022` (1699 rows)

In [None]:
duplucated_communicate = all_individual_tasks_2022_communicate.duplicated(['Summary', 'Parent id'])
all_individual_tasks_2022_communicate[duplucated_communicate]

In [None]:
codebusters_time.shape

In [None]:
others_time.shape

In [None]:
all_tickets.shape

## Investigate different columns with the tickets age

- check the useful KPIs

In [None]:
all_tickets_2022_updated.head(-5)

In [None]:
all_tickets_2022_updated[all_tickets_2022_updated['Issue key'] == 'CI-7689']

In [None]:
all_tickets_2022_updated['age_till_communicate_hours'].describe()

In [None]:
# all_tickets_2022_updated.to_csv('all_tickets_2022_updated.csv')