## Problem Statement : 
The excel below contains data on the work done by AGI bots who can multi-task and do multiple types of work. They record the start and end time of each task that they undertake along with the name of the activity in this excel file.

[Time Series](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a5d30693-0a11-417a-a01c-078ea10bea91/Untitled.xlsx)

Since the bots can multi-task and therefore can be doing multiple tasks in parallel, it is not possible to directly determine when they were working and when they were idle. You are required to:

1. Perform transformations on the data to output continuous periods (start, end) of work done by each bot and aggregate the activities done during such periods as an array against each period.
2. Solve the problem using python and SQL both

### Reading the File : 

In [1]:
import pandas as pd 
df = pd.read_excel('Time Series.xlsx')

In [2]:
df.head(10)

Unnamed: 0,Name,Start,End,Activity
0,Priyanka,2023-10-10 22:43:52.620,2023-07-20 03:31:52.620,Inspection
1,Jyoti,2023-08-24 05:55:52.620,2023-05-17 20:19:52.620,Remote Inspection
2,Jyoti,2023-06-08 08:19:52.620,2023-04-08 05:55:52.620,Updates
3,Priyanka,2023-09-21 15:31:52.620,2023-05-27 10:43:52.620,Reporting
4,Priyanka,2023-10-07 03:31:52.620,2023-04-30 13:07:52.620,Reply to Customers
5,Priyanka,2023-05-19 03:31:52.620,2023-05-11 20:19:52.620,Updates
6,Deepti,2023-08-02 13:07:52.620,2023-04-25 10:43:52.620,Fund raising
7,Jyoti,2023-05-04 22:43:52.620,2023-08-17 20:19:52.620,Business Development
8,Priyanka,2023-08-15 03:31:52.620,2023-07-02 13:07:52.620,Reporting
9,Sharan,2023-06-08 13:07:52.620,2023-09-24 01:07:52.620,Reply to Customers


In [3]:
df.isna().sum()

Name        0
Start       0
End         0
Activity    0
dtype: int64

In [4]:
df['Name'].value_counts()

Sharan      2241
Ravi        2201
Priyanka    2165
Jyoti       2140
Deepti      2138
Name: Name, dtype: int64

In [12]:
from pandasql import sqldf

In [5]:
pd.set_option('display.max_rows', None)

In [3]:
df_priyanka = df[df['Name'] == 'Priyanka']

In [4]:
df_priyanka_sorted = df_priyanka.sort_values('Start',ascending=True)
df_priyanka_sorted

Unnamed: 0,Name,Start,End,Activity
2743,Priyanka,2023-03-29 15:31:52.620,2023-05-31 15:31:52.620,Business Development
10663,Priyanka,2023-03-29 17:55:52.620,2023-06-25 17:55:52.620,Remote Inspection
4962,Priyanka,2023-03-29 22:43:52.620,2023-08-19 10:43:52.620,Podcast
904,Priyanka,2023-03-30 01:07:52.620,2023-09-01 08:19:52.620,Call
6884,Priyanka,2023-03-30 01:07:52.620,2023-06-16 01:07:52.620,Fund raising
...,...,...,...,...
1531,Priyanka,2023-10-15 03:31:52.620,2023-08-21 10:43:52.620,Call
6227,Priyanka,2023-10-15 05:55:52.620,2023-08-01 15:31:52.620,Call
766,Priyanka,2023-10-15 13:07:52.620,2023-09-11 05:55:52.620,Send Email
2013,Priyanka,2023-10-15 15:31:52.620,2023-07-29 20:19:52.620,Call


In [10]:
work_periods = []
current_period = None
i = 0

while i < len(df_priyanka_sorted):
    row = df_priyanka_sorted.iloc[i]
    if current_period is None:
        # If current_period is empty we will start a new period and then update the current_period to time of the row .
        current_period = [row["Start"], row["End"], {row["Activity"]}]
    elif row["Start"] <= current_period[1]:
        # If the start time of the current row lies in the current period, we extend the end time of current period to 
        # acommodate the duration of the activities
        current_period[1] = row["End"]
        current_period[2].add(row["Activity"])
    else:
        # If the start time of the current row is outside the current period,
        # append the current period to the list of work periods and start a new period
        work_periods.append(current_period)
        current_period = [row["Start"], row["End"], {row["Activity"]}]
    i += 1


work_periods.append(current_period)

# Create a new DataFrame for the result
result = pd.DataFrame(columns=["Start", "End", "Activities"])

# Convert the work periods list to rows in the result DataFrame
for period in work_periods:
    result = result.append({
        "Start": period[0],
        "End": period[1],
        "Activities": ", ".join(period[2])
    }, ignore_index=True)

# Print the result DataFrame
result.shape[0]

1057

In [13]:
# Continuous Period for Bot Priyanka:
result['Name'] = 'Priyanka'
result.head(5)

Unnamed: 0,Start,End,Activities,Name
0,2023-03-29 15:31:52.620,2023-03-31 10:43:52.620,"Business Development, Podcast, Send Email, Upd...",Priyanka
1,2023-04-04 05:55:52.620,2023-04-02 08:19:52.620,"Podcast, Send Email, Updates, Reporting, Remot...",Priyanka
2,2023-04-06 08:19:52.620,2023-04-04 20:19:52.620,"Business Development, Podcast, Send Email, Upd...",Priyanka
3,2023-04-09 01:07:52.620,2023-04-02 13:07:52.620,"Send Email, Inspection, Reply to Customers, Po...",Priyanka
4,2023-04-09 17:55:52.620,2023-04-04 05:55:52.620,"Business Development, Podcast, Send Email, Upd...",Priyanka


In [18]:
df['Name'].unique()

array(['Priyanka', 'Jyoti', 'Deepti', 'Sharan', 'Ravi'], dtype=object)

In [22]:
# Similary Doing it for other bots inside a function:
def calculate_work_periods(df, name):
    df_filtered = df[df['Name'] == name]
    df_sorted = df_filtered.sort_values('Start', ascending=True)

    work_periods = []
    current_period = None
    i = 0

    while i < len(df_sorted):
        row = df_sorted.iloc[i]
        if current_period is None:
            current_period = [row["Start"], row["End"], {row["Activity"]}]
        elif row["Start"] <= current_period[1]:
            current_period[1] = row["End"]
            current_period[2].add(row["Activity"])
        else:
            work_periods.append(current_period)
            current_period = [row["Start"], row["End"], {row["Activity"]}]
        i += 1

    work_periods.append(current_period)

    result = pd.DataFrame(columns=["Start", "End", "Activities"])

    for period in work_periods:
        result = result.append({
            "Start": period[0],
            "End": period[1],
            "Activities": ", ".join(period[2])
        }, ignore_index=True)

    result['Name'] = name
    return result




names = df['Name'].unique()

final_result = pd.DataFrame(columns=["Start", "End", "Activities", "Name"])

for name in names:
    result = calculate_work_periods(df, name)
    final_result = final_result.append(result)

print(final_result.head(5))

                    Start                     End  \
0 2023-03-29 15:31:52.620 2023-03-31 10:43:52.620   
1 2023-04-04 05:55:52.620 2023-04-02 08:19:52.620   
2 2023-04-06 08:19:52.620 2023-04-04 20:19:52.620   
3 2023-04-09 01:07:52.620 2023-04-02 13:07:52.620   
4 2023-04-09 17:55:52.620 2023-04-04 05:55:52.620   

                                          Activities      Name  
0  Business Development, Podcast, Send Email, Upd...  Priyanka  
1  Podcast, Send Email, Updates, Reporting, Remot...  Priyanka  
2  Business Development, Podcast, Send Email, Upd...  Priyanka  
3  Send Email, Inspection, Reply to Customers, Po...  Priyanka  
4  Business Development, Podcast, Send Email, Upd...  Priyanka  


In [24]:
final_result.sort_values('Name')

Unnamed: 0,Start,End,Activities,Name
622,2023-08-31 17:55:52.620,2023-07-20 20:19:52.620,"Remote Inspection, Inspection",Deepti
362,2023-07-25 20:19:52.620,2023-05-05 03:31:52.620,Reporting,Deepti
361,2023-07-25 20:19:52.620,2023-05-22 03:31:52.620,Podcast,Deepti
360,2023-07-25 13:07:52.620,2023-06-26 13:07:52.620,"Send Email, Reporting, Fund raising",Deepti
359,2023-07-25 10:43:52.620,2023-06-10 01:07:52.620,Updates,Deepti
...,...,...,...,...
783,2023-09-06 22:43:52.620,2023-04-20 22:43:52.620,Fund raising,Sharan
784,2023-09-07 01:07:52.620,2023-09-01 10:43:52.620,Updates,Sharan
785,2023-09-07 03:31:52.620,2023-07-14 22:43:52.620,Remote Inspection,Sharan
769,2023-09-05 13:07:52.620,2023-05-18 05:55:52.620,Remote Inspection,Sharan
