#Gaps In Time 
##Purpose
The purpose of this code is to take in a CSV file containing call data and create a new CSV file that also records the breaks taken between calls. The new file has a row for each call and a row for each break, with the start and end time of each and its duration in minutes.

##Dependencies
This code uses the pandas library to read in the CSV file and manipulate the data.

##Input
The input is a CSV file containing call data. The file must have at least three columns: "TimeStart2", "TimeStart3", and "Duration". "TimeStart2" should contain the date the call started (mm/dd/yyyy), "TimeStart3" the time it started (12-hour clock with AM/PM), and "Duration" the length of the call in the format "hh:mm:ss". The first two lines of the file are skipped when reading, so the column headers are expected on the third line.
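
As a minimal sketch of how these columns are interpreted, the snippet below parses a two-row, made-up sample using the same date format string the script uses. The sample values and the variable names (`sample`, `start`, `end`) are illustrative assumptions, not data from the real file.

```python
import pandas as pd

# Made-up sample rows; the second call has a missing duration.
sample = pd.DataFrame({
    "TimeStart2": ["01/02/2023", "01/02/2023"],
    "TimeStart3": ["09:08:42 AM", "09:12:45 AM"],
    "Duration": ["00:02:52", None],
})

# Parse "hh:mm:ss" strings; a missing duration becomes zero.
duration = pd.to_timedelta(sample["Duration"]).fillna(pd.Timedelta(0))

# Combine the date and time columns into one datetime.
start = pd.to_datetime(
    sample["TimeStart2"] + " " + sample["TimeStart3"],
    format="%m/%d/%Y %I:%M:%S %p",
)
end = start + duration

print(pd.DataFrame({"start": start, "end": end}))
```

A row with a missing duration ends at the same moment it starts, which is how zero-length calls can appear in the output.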

##Output
The output is a new CSV file with a row for each call and a row for each break. Its columns are "start_datetime", "end_datetime", "Type" (either "Call" or "Break"), and "Duration_minutes" (the duration of the call or break, rounded to the nearest minute). Two further columns, "Duration_Call" and "Duration_Break", hold the same duration for calls only and for breaks only, with 0 in the rows of the other type.
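
To illustrate this layout, here is a small sketch that derives the two type-specific columns from "Type" and "Duration_minutes". The three sample rows are made up, and `Series.where` is used here as a compact alternative; the script itself fills these columns with `.loc` assignments followed by `fillna(0)`.

```python
import pandas as pd

# Made-up rows in the shape of the output file.
out = pd.DataFrame({
    "Type": ["Call", "Break", "Call"],
    "Duration_minutes": [3, 4, 6],
})

# Keep the duration where the row type matches, otherwise use 0.
out["Duration_Call"] = out["Duration_minutes"].where(out["Type"] == "Call", 0)
out["Duration_Break"] = out["Duration_minutes"].where(out["Type"] == "Break", 0)

print(out)
```

Because every row carries a 0 in the column that does not apply to it, summing "Duration_Call" and "Duration_Break" gives total call time and total break time directly.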

##Functionality
1. Read in the CSV file, convert the "Duration" column to a timedelta object, and fill any NA values with a timedelta of zero.
2. Build the "start_datetime" column by combining "TimeStart2" and "TimeStart3" into a datetime object, and compute "end_datetime" as the start time plus the duration.
3. Sort the dataframe by start time.
4. Create a column for the time gap between the current call and the previous call.
5. Create a column to indicate if there was a break before the current call, that is, a gap of more than one minute.
6. Create a column for the end time of the break.
7. Create a column for the start time of the break.
8. Create a column for the duration of the break.
9. Create a new dataframe to hold the rows for each call and the break.
10. Iterate over the rows of the original dataframe and add a row for the current call.
11. If there was a break before the current call, add a row for the break directly after the row for that call.
12. Round the duration of calls and breaks to the nearest minute.
13. Create two new columns to hold the duration in minutes for calls only and breaks only.
14. Fill any missing values with 0.
15. Convert columns to integer.
16. Save the new dataframe to a CSV file.
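
The break-detection steps (3 to 8) can also be sketched without a row loop. The helper name `find_breaks` and its `threshold` parameter are hypothetical; the sketch assumes a dataframe that already has "start_datetime" and "end_datetime" columns, and it returns only the break rows rather than the interleaved call and break table the script builds. The three sample calls are made up for illustration.

```python
import pandas as pd

def find_breaks(calls: pd.DataFrame,
                threshold: pd.Timedelta = pd.Timedelta(minutes=1)) -> pd.DataFrame:
    """Return one row per gap longer than `threshold` between consecutive calls."""
    calls = calls.sort_values("start_datetime")
    prev_end = calls["end_datetime"].shift(1)   # end of the previous call
    gap = calls["start_datetime"] - prev_end    # NaT for the first call
    is_break = gap > threshold                  # NaT compares as False
    breaks = pd.DataFrame({
        "start_datetime": prev_end[is_break],
        "end_datetime": calls.loc[is_break, "start_datetime"],
        "Type": "Break",
        "Duration_minutes": (gap[is_break].dt.total_seconds() / 60).round().astype(int),
    })
    return breaks.reset_index(drop=True)

# Made-up sample: three calls with two gaps longer than one minute.
calls = pd.DataFrame({
    "start_datetime": pd.to_datetime(
        ["2023-01-02 09:04:16", "2023-01-02 09:08:42", "2023-01-02 09:12:45"]),
    "end_datetime": pd.to_datetime(
        ["2023-01-02 09:04:16", "2023-01-02 09:11:34", "2023-01-02 09:18:22"]),
})
breaks = find_breaks(calls)
print(breaks)
```

Each break starts where the previous call ended and ends where the next call starts, so the gap itself is the break duration.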

In [102]:
filename = 'mimi.csv'
exportfilename = 'call_log_with_breaks_mimi.csv'

In [103]:
import pandas as pd

# read in the data
df = pd.read_csv(filename, skiprows=2)

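# convert the "Duration" column to a timedelta object, then fill any NA values with a timedelta of zero
# (filling after the conversion avoids mixing the integer 0 with "hh:mm:ss" strings)
df['Duration'] = pd.to_timedelta(df['Duration']).fillna(pd.Timedelta(0))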

# build "start_datetime" from the date and time columns, and compute "end_datetime" as start plus duration
df['start_datetime'] = pd.to_datetime(df['TimeStart2'] + ' ' + df['TimeStart3'], format='%m/%d/%Y %I:%M:%S %p')
df['end_datetime'] = df['start_datetime'] + df['Duration']

# sort the dataframe by start time
df = df.sort_values(by='start_datetime')

# create a column for the time gap between the current call and the previous call
df['gap'] = df['start_datetime'] - df['end_datetime'].shift(1)

# create a column to indicate if there was a break before the current call
df['break'] = df['gap'] > pd.Timedelta(minutes=1)

# create a column for the end time of the break (the start of the current call); NaT where there was no break
df['break_end_time'] = df['start_datetime'].where(df['break'])

# create a column for the start time of the break
df['break_time'] = df['end_datetime'].shift(1)[df['break']]

# create a column for the duration of the break (the gap itself); NaT where there was no break
df['break_duration'] = df['gap'].where(df['break'])

# collect the rows for each call and each break in a list
# (building the dataframe once is much faster than calling pd.concat inside the loop)
rows = []

# iterate over the rows of the original dataframe
for i, row in df.iterrows():
    # add a row for the current call
    rows.append({'start_datetime': row['start_datetime'], 'end_datetime': row['end_datetime'], 'Type': 'Call', 'Duration_minutes': row['Duration'].total_seconds() / 60})

    # if there was a break before the current call, add a row for the break
    if row['break']:
        break_duration = row['break_duration']
        if pd.isna(break_duration):
            break_duration = pd.Timedelta(0)
        rows.append({'start_datetime': row['break_time'], 'end_datetime': row['start_datetime'], 'Type': 'Break', 'Duration_minutes': break_duration.total_seconds() / 60})

# create the new dataframe holding the rows for each call and each break
new_df = pd.DataFrame(rows, columns=['start_datetime', 'end_datetime', 'Type', 'Duration_minutes'])
        
# round to the nearest minute
new_df['Duration_minutes'] = new_df['Duration_minutes'].round().astype(int)

# Create a new column 'Duration_Call' to hold the duration in minutes for calls only
new_df.loc[new_df['Type'] == 'Call', 'Duration_Call'] = new_df['Duration_minutes']

# Create a new column 'Duration_Break' to hold the duration in minutes for breaks only
new_df.loc[new_df['Type'] == 'Break', 'Duration_Break'] = new_df['Duration_minutes']

# Fill any missing values with 0
new_df.fillna(0, inplace=True)

# Convert columns to integer
new_df[['Duration_Call', 'Duration_Break']] = new_df[['Duration_Call', 'Duration_Break']].astype(int)

# Save the new dataframe to a CSV file
new_df.to_csv(exportfilename)

# display the new dataframe
new_df

Unnamed: 0,start_datetime,end_datetime,Type,Duration_minutes,Duration_Call,Duration_Break
0,2023-01-02 09:04:16,2023-01-02 09:04:16,Call,0,0,0
1,2023-01-02 09:08:42,2023-01-02 09:11:34,Call,3,3,0
2,2023-01-02 09:04:16,2023-01-02 09:08:42,Break,4,0,4
3,2023-01-02 09:12:45,2023-01-02 09:18:22,Call,6,6,0
4,2023-01-02 09:11:34,2023-01-02 09:12:45,Break,1,0,1
...,...,...,...,...,...,...
11173,2023-03-29 13:52:52,2023-03-29 13:53:27,Call,1,1,0
11174,2023-03-29 13:51:38,2023-03-29 13:52:52,Break,1,0,1
11175,2023-03-29 13:54:17,2023-03-29 13:54:17,Call,0,0,0
11176,2023-03-29 13:54:31,2023-03-29 13:55:05,Call,1,1,0
