# Codebusters KPIs and Other Usefull Patterns

## import libraries

In [1]:
import pandas as pd
import numpy as np

Import and clean `all_tickets_2022` data set 
- remove unnecessary columns
- rename columns

# Prepare data with all tickets

### Import all service request

Tickets are taken from the following jira filter:<br/> `project = CI AND issuetype in (standardIssueTypes(), "Expense Delivery") AND "Epic Link" is EMPTY AND "Case Number/s" is not EMPTY AND cf[14125] in ("Service Request (SR)") AND resolved is not EMPTY AND resolutiondate >= 2022-12-19 and resolutiondate <= 2023-01-08`

In [2]:
all_sr_origin = pd.read_csv('../../DataSets/KPIs/MainTickets/all_service_requests.csv')
all_sr_origin

Unnamed: 0,Summary,Issue key,Issue id,Issue Type,Status,Project key,Project name,Project type,Project lead,Project description,...,Comment.18,Comment.19,Comment.20,Comment.21,Comment.22,Comment.23,Comment.24,Comment.25,Comment.26,Comment.27
0,Rehau - Add EDI number Airplus,CI-8574,468635,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
1,Numiga VFL Wolfsburg - Remove a receipt,CI-8568,468149,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
2,oebb: correct perdiem/EMS,CI-8560,467735,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
3,Hensoldt - Update Approval notification mail,CI-8554,467243,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
4,vw-cso: activate bing map in itinerary section,CI-8552,467085,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309,Aquila Capital - Expense issue with export and...,CI-6926,393799,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,04/Aug/22 10:36 AM;gmalarski;Export path added...,27/Jan/23 7:54 PM;raphose;SR is closed since A...,,,,,,,,
310,VW - Splitting of approval for expense statements,CI-6911,393585,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
311,Porsche - Adjustment of approval process for n...,CI-6901,393357,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,
312,Hugo Boss - add 3 new meal receipt types for t...,CI-6825,389774,Expense Delivery,Closed,CI,codebusters,software,rohkamm,,...,,,,,,,,,,


### Select from the dataset only columns necessary for the further analysis

In [3]:
all_sr = all_sr_origin[['Issue key', 'Issue id', 'Status', 'Reporter', 'Created', 'Due Date', 'Resolved', 'Custom field (Customer/s)','Custom field (Type of Request)']]
all_sr.rename({"Custom field (Customer/s)": "Customer", "Custom field (Type of Request)":"Type of Request"},axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_sr.rename({"Custom field (Customer/s)": "Customer", "Custom field (Type of Request)":"Type of Request"},axis=1, inplace=True)


In [4]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Issue key        314 non-null    object
 1   Issue id         314 non-null    int64 
 2   Status           314 non-null    object
 3   Reporter         314 non-null    object
 4   Created          314 non-null    object
 5   Due Date         314 non-null    object
 6   Resolved         314 non-null    object
 7   Customer         314 non-null    object
 8   Type of Request  314 non-null    object
dtypes: int64(1), object(8)
memory usage: 22.2+ KB


In [5]:
all_sr.isnull().sum()

Issue key          0
Issue id           0
Status             0
Reporter           0
Created            0
Due Date           0
Resolved           0
Customer           0
Type of Request    0
dtype: int64

### Convert date table to `datetime` object

In [6]:
all_sr["Created"] = pd.to_datetime(all_sr["Created"])
all_sr["Resolved"] = pd.to_datetime(all_sr["Resolved"])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_sr["Created"] = pd.to_datetime(all_sr["Created"])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_sr["Resolved"] = pd.to_datetime(all_sr["Resolved"])


In [7]:
all_sr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 314 entries, 0 to 313
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Issue key        314 non-null    object        
 1   Issue id         314 non-null    int64         
 2   Status           314 non-null    object        
 3   Reporter         314 non-null    object        
 4   Created          314 non-null    datetime64[ns]
 5   Due Date         314 non-null    object        
 6   Resolved         314 non-null    datetime64[ns]
 7   Customer         314 non-null    object        
 8   Type of Request  314 non-null    object        
dtypes: datetime64[ns](2), int64(1), object(6)
memory usage: 22.2+ KB


**Summary**
- there is not SR ticket without `resolved` date
- there is no `null` value in any of the column
- date type columns converted to `datetime` object in order to make calculucations on these columns

# Prepare data with individuals tasks for service requests

## Import and clean `all_individual_tasks` data set 
- we are not able to downaload full data becuase of jira limitation, therefore we need to combine data from differen periods
- remove unnecessary columns
- replace missing `date` values with 0
- drop rows with "null" values

In [9]:
individual_part_1 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from20220101_till_2022_05_31.csv')
individual_part_2 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from_20220601_till_20220831.csv')
individual_part_3 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from_20220901_till_20221231.csv')
individual_part_4 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from20230101_till_20230310.csv')
individual_part_5 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from20230311_till_20230417.csv')
individual_part_5 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from20230418_till_20230511.csv')


all_individual_tasks_combined = pd.concat([individual_part_1, individual_part_2, individual_part_3, individual_part_4, individual_part_5], axis=0)
all_individual_tasks_combined

  individual_part_1 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from20220101_till_2022_05_31.csv')
  individual_part_2 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from_20220601_till_20220831.csv')
  individual_part_3 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from_20220901_till_20221231.csv')
  individual_part_4 = pd.read_csv('../../DataSets/KPIs/IndividualTasks/from20230101_till_20230310.csv')


Unnamed: 0,Summary,Issue key,Issue id,Parent id,Issue Type,Status,Project key,Project name,Project type,Project lead,...,Custom field (arctic QA Findings),Custom field (cytric Stability Findings),Comment,Comment.1,Component/s,Custom field (Customer/s),Sprint.35,Sprint.36,Custom field (Priority Index),Custom field (Remaining Story Points)
0,Import in Production,CI-10217127,394675,373630,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
1,Communicate,CI-10217126,394671,389490,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
2,Communicate,CI-10217125,394670,393625,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
3,Configure,CI-10217121,394664,389982,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
4,Configure,CI-10217120,394663,370941,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
860,Explain,CI-10237898,465501,465496,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
861,Review,CI-10237897,465500,465499,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
862,Review,CI-10237896,465497,465496,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,
863,Review,CI-10237895,465494,465493,Individual Task,Done,CI,codebusters,software,rohkamm,...,,,,,,,,,,


### Reset indexex of a combined dataset </br>
Indexes of a new dataset are not appropriate, we need to apply `reset_index` method in order to match them with the number of rows

In [12]:
all_individual_tasks_combined.reset_index(inplace=True)
all_individual_tasks_combined

  all_individual_tasks_combined.reset_index(inplace=True)


Unnamed: 0,level_0,index,Summary,Issue key,Issue id,Parent id,Issue Type,Status,Project key,Project name,...,Custom field (arctic QA Findings),Custom field (cytric Stability Findings),Comment,Comment.1,Component/s,Custom field (Customer/s),Sprint.35,Sprint.36,Custom field (Priority Index),Custom field (Remaining Story Points)
0,0,0,Import in Production,CI-10217127,394675,373630,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
1,1,1,Communicate,CI-10217126,394671,389490,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
2,2,2,Communicate,CI-10217125,394670,393625,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
3,3,3,Configure,CI-10217121,394664,389982,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
4,4,4,Configure,CI-10217120,394663,370941,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14067,14067,860,Explain,CI-10237898,465501,465496,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
14068,14068,861,Review,CI-10237897,465500,465499,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
14069,14069,862,Review,CI-10237896,465497,465496,Individual Task,Done,CI,codebusters,...,,,,,,,,,,
14070,14070,863,Review,CI-10237895,465494,465493,Individual Task,Done,CI,codebusters,...,,,,,,,,,,


### Filter columns and select only the ones needed for further calculations

In [14]:
all_individual_tasks_2022 = all_individual_tasks_combined[['Summary','Issue id', 'Parent id', 'Created', 'Resolved']]

Drop columns with missing values

In [None]:
# all_individual_tasks_2022.dropna(inplace=True)

### Convert date like columns to `datetime` object

In [15]:
all_individual_tasks_2022["Created"] = pd.to_datetime(all_individual_tasks_2022["Created"])
all_individual_tasks_2022["Resolved"] = pd.to_datetime(all_individual_tasks_2022["Resolved"])
all_individual_tasks_2022.info()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_2022["Created"] = pd.to_datetime(all_individual_tasks_2022["Created"])


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14072 entries, 0 to 14071
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Summary    14072 non-null  object        
 1   Issue id   14072 non-null  int64         
 2   Parent id  14072 non-null  int64         
 3   Created    14072 non-null  datetime64[ns]
 4   Resolved   13635 non-null  datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(1)
memory usage: 549.8+ KB


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_2022["Resolved"] = pd.to_datetime(all_individual_tasks_2022["Resolved"])


In [16]:
all_individual_tasks_2022['Summary'].value_counts()

Configure                         2501
Review                            2496
Verify                            2191
Communicate                       2009
Import to Test System             1822
Import in Production              1688
Import to Customer Test System     629
Explain                            333
Import to Staging System           161
Clarify                            158
Do                                  73
Michał Training                      2
Marcin Training                      2
Maciej Training                      2
Training - Marcin                    1
Training - Maciej                    1
Import to Test/Staging               1
Bartosz Training                     1
CLONE - Review                       1
Name: Summary, dtype: int64

### Check if there are any `null` values

In [17]:
all_individual_tasks_2022.isnull().sum()

Summary        0
Issue id       0
Parent id      0
Created        0
Resolved     437
dtype: int64

In [22]:
all_individual_tasks_2022['Resolved'].dropna(inplace=True)
all_individual_tasks_2022.isnull().sum()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_2022['Resolved'].dropna(inplace=True)


Summary        0
Issue id       0
Parent id      0
Created        0
Resolved     437
time_in      437
dtype: int64

**Summary:**
- there are `437` rows with null value in `resolved` column, but we will match on

## Daterminate how long each ticket was in particular individual task

- calculate time for eacch subtask
- calculate how long the ticket was in review, configure, verify, import.., communicate steps
- calcualte the total eg. sum all time spend in verify for one ticket


### Calculate time spent for each of the subtask

In [19]:
all_individual_tasks_2022_time_calculation = all_individual_tasks_2022
all_individual_tasks_2022_time_calculation["time_in"] = all_individual_tasks_2022_time_calculation["Resolved"] - all_individual_tasks_2022_time_calculation["Created"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  all_individual_tasks_2022_time_calculation["time_in"] = all_individual_tasks_2022_time_calculation["Resolved"] - all_individual_tasks_2022_time_calculation["Created"]


In [20]:
all_individual_tasks_2022_time_calculation.loc[all_individual_tasks_2022_time_calculation["Parent id"] == 438382]

Unnamed: 0,Summary,Issue id,Parent id,Created,Resolved,time_in
7276,Review,438383,438382,2022-12-29 14:47:00,2023-01-04 11:44:00,5 days 20:57:00
13089,Import in Production,439047,438382,2023-01-04 11:44:00,NaT,NaT
13090,Configure,439046,438382,2023-01-04 11:44:00,2023-01-04 11:44:00,0 days 00:00:00


In [None]:
all_individual_tasks_2022_time_calculation["Summary"].value_counts()

### Calculate time ticket was in review step

In [None]:
all_individual_tasks_reivew_time = all_individual_tasks_2022[all_individual_tasks_2022['Summary'] == 'Review']
all_individual_tasks_reivew_time

In [None]:
all_individual_tasks_reivew_time.isnull().sum()

Drop rows with `null` in `resolved` columns

In [None]:
all_individual_tasks_reivew_no_null = all_individual_tasks_reivew_time.dropna(subset=['Resolved'])
all_individual_tasks_reivew_no_null.isnull().sum()

Group rows by `Parent id` to get all subtaks for one ticekt

In [None]:
all_individual_tasks_reivew_no_null.groupby("Parent id")["time_in"].sum()

In [None]:
type(all_individual_tasks_reivew_no_null)

In [None]:
all_individual_tasks_reivew_no_null.rename({"time_in": "time_in_review"},axis=1, inplace=True)

In [None]:
all_individual_tasks_reivew_no_null[['Parent id', 'time_in_review']]

In [None]:
all_tickets_with_review_time = pd.merge(all_tickets_without_null, all_individual_tasks_reivew_no_null[['Parent id', 'time_in_review']], how="left",left_on = "Issue id",right_on = "Parent id")
all_tickets_with_review_time

In [None]:
all_tickets_with_review_time.shape

### Determinate time spent on codebusters

- calculate the total time ticket was in subtasks "Configure", "Review", "mport to Test System", "Import in Production", "Import to Customer Test System"
- apply the results to new dataset based on `all_tickets_2022`

In [None]:
all_individual_with_codebusters_time = all_individual_tasks_2022_time_calculation.loc[all_individual_tasks_2022_time_calculation["Summary"].isin(["Configure", "Review", "Import to Test System", "Import in Production", "Import to Customer Test System", "Import to Test", "Import to Staging System"])]

Summarize time for each codebusters subtask 

In [None]:
codebusters_time = all_individual_with_codebusters_time.groupby("Parent id")["time_in"].sum()

In [None]:
codebusters_time.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)

In [None]:
codebusters_time

In [None]:
codebusters_time.isnull().sum()

In [None]:
all_tickets.shape

In [None]:
all_tickets.isnull().sum()

Merge total time on codebusters with all jira ticekts dataset

In [None]:
all_tickets_with_codebusters_time = pd.merge(all_tickets_without_null, codebusters_time, how="left",left_on = "Issue id",right_on = "Parent id")

In [None]:
all_tickets_with_codebusters_time

In [None]:
all_tickets_with_codebusters_time.rename({"time_in": "time_on_codebusters"},axis=1, inplace=True)
all_tickets_with_codebusters_time

In [None]:
all_tickets_with_codebusters_time.notnull().sum()

### Determinate time spent outside codebusters

- calculate the total time ticket was in one of the subtask "Explain", "Verify", "Clarify", "Communicate", "Import in Production", "Import to Customer Test System"
- apply the results to new dataset based on `all_tickets_2022`

In [None]:
all_individual_with_others_time = all_individual_tasks_2022_time_calculation.loc[all_individual_tasks_2022_time_calculation["Summary"].isin(["Explain", "Verify", "Clarify", "Communicate"])]

In [None]:
all_individual_with_others_time

Summarize time for each subtask outside codeubsters

In [None]:
others_time = all_individual_with_others_time.groupby("Parent id")["time_in"].sum()

Summarize time for each subtask outside codeubsters

Merge `all_tickets` with time outisde codebusters column

In [None]:
all_tickets_with_outside_codebusters_time = pd.merge(all_tickets_without_null, others_time, how="left",left_on = "Issue id",right_on = "Parent id")

In [None]:
all_tickets_with_outside_codebusters_time.rename({"time_in": "time_outside_codebusters"},axis=1, inplace=True)
all_tickets_with_outside_codebusters_time

### Merge all dataframes together

Merge df containing "review" time with df containg "time on codebusters" 

In [None]:
all_tickets_with_time = pd.merge(all_tickets_with_review_time, all_tickets_with_codebusters_time[['Issue id','time_on_codebusters']],  how="left",left_on = "Issue id",right_on = "Issue id")

Merge df containing "review" and "time on codebusters" time with df containg "time outside codebusters" 

In [None]:
all_tickets_with_time = pd.merge(all_tickets_with_time, all_tickets_with_outside_codebusters_time[['Issue id','time_outside_codebusters']],  how="left",left_on = "Issue id",right_on = "Issue id")

In [None]:
all_tickets_with_time

### Convert `time in..` columns to minutes

Convert `timedelta` columns to float

In [None]:
all_tickets_with_time_converted = all_tickets_with_time
all_tickets_with_time_converted.info()

Convert `time in..` columns to minutes

In [None]:
all_tickets_with_time_converted['time_in_review'] = all_tickets_with_time_converted['time_in_review'].dt.total_seconds().div(60)
all_tickets_with_time_converted

In [None]:
all_tickets_with_time_converted['time_on_codebusters'] = all_tickets_with_time_converted['time_on_codebusters'].dt.total_seconds().div(60)
all_tickets_with_time_converted

In [None]:
all_tickets_with_time_converted['time_outside_codebusters'] = all_tickets_with_time['time_outside_codebusters'].dt.total_seconds().div(60)
all_tickets_with_time_converted

In [None]:
all_tickets_with_time_converted.isnull().sum()

In [None]:
all_tickets_with_time_converted.dropna(subset=['time_in_review'], how="any", inplace=True)
all_tickets_with_time_converted.isnull().sum()

## Save data for KPIs purposes

In [None]:
all_tickets_with_time_converted.to_csv('all_partners.csv', index=False)

### Add a column to all tasks with the latest communicate step date
- count "real" close time, bease on the date when ticket was moved to `communicate` step

In [None]:
all_individual_tasks_2022

In [None]:
summary_duplicates = all_individual_tasks_2022.duplicated(['Summary', 'Parent id'], keep=False)
all_individual_tasks_2022[summary_duplicates].sort_values("Parent id")

In [None]:
all_individual_tasks_2022_communicate = all_individual_tasks_2022[all_individual_tasks_2022["Summary"] == "Communicate"]
all_individual_tasks_2022_communicate.sort_values("Parent id")

In [None]:
all_individual_tasks_2022_communicate.nunique()

Drop rows with duplicated `Communicate` subtask and keep only the newest one

In [None]:
all_individual_tasks_2022_communicate.sort_values(["Parent id", "Created"])
all_individual_tasks_2022_communicate_no_duplicates = all_individual_tasks_2022_communicate.drop_duplicates(['Summary', 'Parent id'], keep="last")
# all_individual_tasks_2022_communicate_no_duplicates = all_individual_tasks_2022_communicate
all_individual_tasks_2022_communicate_no_duplicates

In [None]:
all_tickets_with_time_and_cummunicate_date = pd.merge(all_tickets_with_time_converted, all_individual_tasks_2022_communicate_no_duplicates[['Parent id','Created']], how="left",left_on = "Issue id",right_on = "Parent id").drop(columns=["Parent id"])
all_tickets_with_time_and_cummunicate_date.rename({"Created_y": "comminicate_created"},axis=1, inplace=True)
all_tickets_with_time_and_cummunicate_date

Drop rows without `communicate_create`, we are taking in to consideration only completed tickets

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.isnull().sum()

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.nunique()

In [None]:
500 - 145

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.dropna(subset=['comminicate_created'], how="any", inplace=True)

In [None]:
all_tickets_2022_with_time_and_cummunicate_date

**Summary:**
- rows with null value are droped
- we take into consideration only completed tickets

### Save data to csv file

In [None]:
all_tickets_2022_with_time_and_cummunicate_date.to_csv('all_tickets_combined.csv', index=False)

In [None]:
all_individual_tasks_2022_communicate_no_duplicates.nunique()

## Data Analysis

### Compare the size of the created data sets
- compare the size of each datasests, in order to verify v

In [None]:
all_individual_tasks_2022_origin[all_individual_tasks_2022_origin['Parent id'] == 450084]

In [None]:
all_individual_tasks_2022.info()

**Summary:**
- We should receive inside `all_individual_tasks_2022_communicate_no_duplicates` 1372 rows not 1297

# To Do:
- invesitigate discrepency between datasests: `codebusters_time`(1798 rows),  `others_time` (1672 rows), `all_tickets_2022` (1699 rows)

In [None]:
duplucated_communicate = all_individual_tasks_2022_communicate.duplicated(['Summary', 'Parent id'])
all_individual_tasks_2022_communicate[duplucated_communicate]

In [None]:
codebusters_time.shape

In [None]:
others_time.shape

In [None]:
all_tickets.shape

## Investigate different columns with the tickets age

- check the useful KPIs

In [None]:
all_tickets_2022_updated.head(-5)

In [None]:
all_tickets_2022_updated[all_tickets_2022_updated['Issue key'] == 'CI-7689']

In [None]:
all_tickets_2022_updated['age_till_communicate_hours'].describe()

In [None]:
# all_tickets_2022_updated.to_csv('all_tickets_2022_updated.csv')