<a href="https://colab.research.google.com/github/faisal-ba-systems/ML-course-documents/blob/main/EDA_on_Teams_APA_on_SBP_2025_Business_Automation_Ltd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analysis on SBP 2025
### Number of Goals: 6
### Number of Targets: 35
### Number of Teams: 12


## Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.subplots as sp
import seaborn as sns

## Import Dataset

In [None]:
!pip install -q gdown
!gdown 1Hdo9UyKUdRQXUsHlXTU8Usurwg_UKO0do1vE2q1OuvA

Downloading...
From (original): https://drive.google.com/uc?id=1Hdo9UyKUdRQXUsHlXTU8Usurwg_UKO0do1vE2q1OuvA
From (redirected): https://docs.google.com/spreadsheets/d/1Hdo9UyKUdRQXUsHlXTU8Usurwg_UKO0do1vE2q1OuvA/export?format=xlsx
To: /content/SBP  Master Data.xlsx
0.00B [00:00, ?B/s]230kB [00:00, 6.23MB/s]


In [None]:
excel_path ='/content/SBP  Master Data.xlsx'
APA_status_df = pd.read_excel(excel_path,sheet_name='Team APA Status')

print("Shape of dataset:", APA_status_df.shape)
APA_status_df

Shape of dataset: (12, 3)


Unnamed: 0,# SL No.,Name,Team APA Status
0,1,Project Operation,Verfied and Confirmed
1,2,Implementation & ITS,Verfied and Confirmed
2,3,Mobile Apps & Games,Verfied and Confirmed
3,4,Supply Chain,Verfied and Confirmed
4,5,Finance & Logistics,Verfied and Confirmed
5,6,Webcrafter,Verfied and Confirmed
6,7,InnovX,Submitted
7,8,Application,Submitted
8,9,Business Development,Submitted
9,10,Industry 4.0,Not Submitted


## Statistical Dataset Analysis

In [None]:
def report_data_types_uniques_check(df):
    col = []
    d_type = []
    uniques = []
    n_uniques = []

    for i in df.columns:
        col.append(i)
        d_type.append(df[i].dtypes)
        uniques.append(df[i].unique()[:5])
        n_uniques.append(df[i].nunique())

    return pd.DataFrame({'Column': col, 'd_type': d_type, 'unique_sample': uniques, 'n_uniques': n_uniques})

report_data_types_uniques_check(APA_status_df)

Unnamed: 0,Column,d_type,unique_sample,n_uniques
0,# SL No.,int64,"[1, 2, 3, 4, 5]",12
1,Name,object,"[Project Operation, Implementation & ITS, Mobi...",12
2,Team APA Status,object,"[Verfied and Confirmed, Submitted, Not Submitted]",3


### SBP - Team APA Status

In [None]:
all_teams_business_automation = list(APA_status_df['Name'].unique())
print("Number of teams:", len(all_teams_business_automation))

Number of teams: 12


In [None]:
# Count the occurrences of each APA Status
status_counts = APA_status_df['Team APA Status'].value_counts().reset_index()
status_counts.columns = ['APA Status', 'Count']

# Create a pie chart using Plotly
fig = px.pie(status_counts,
             names='APA Status',
             values='Count',
             color='APA Status',
             color_discrete_map={
                'Verfied and Confirmed': 'green',
                'Submitted': 'gold',
                'Not Submitted': 'red'
            },
             title='Team APA Status Distribution')

fig.show()

In [None]:
# Add a counter column (1) to use for count aggregation
APA_status_df['Count'] = 1

# Create bar chart
fig = px.bar(
    APA_status_df,
    x='Name',
    y='Count',
    color='Team APA Status',
    color_discrete_map={
        'Verfied and Confirmed': 'green',
        'Submitted': 'gold',
        'Not Submitted': 'red'
    },
    title='APA Submission Status by Team',
    labels={'Count': 'Number of Entries'},
)

# Rotate x-axis labels and hide Y-axis
fig.update_layout(
    xaxis_title='Team Name',
    yaxis_title=None,
    xaxis_tickangle=-45,
    barmode='group',
    plot_bgcolor='white',
    yaxis=dict(showticklabels=False, showgrid=False, zeroline=False)  # Hide Y-axis labels and grid
)

fig.show()

In [None]:
df = pd.read_excel(excel_path,sheet_name='Final Master Data')

print("Shape of dataset:", df.shape)
df.head()

Shape of dataset: (209, 9)


Unnamed: 0,Goals,Targets,#SL No. (Team Task),Team Activities,Team,Deadline,Status,Unnamed: 7,Unnamed: 8
0,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.4 : Launch pilot products informed by...,1,Assign 01 resource from the PO (Project Operat...,Project Operation,2025-12-25,To-Do,,
1,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.1 : Foster a culture of innovation by...,2,Reduce 50% time for writing project documents ...,Project Operation,2025-06-25,In-Progress,,
2,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.2 : Allocate time and resources for e...,3,Organize and manage an annual hackathon event ...,Project Operation,2025-12-25,To-Do,,
3,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.2 : Allocate time and resources for e...,4,Allocate 2 resources—1 from Project Operations...,Project Operation,2025-12-25,In-Progress,,Aligned with Target\nCountable & measurealble\...
4,Goal 1: Customer-Centric and Sustainable Innov...,"Target 1.3 : Scale successful, customer-driven...",5,We will scale up our products to 05 new local ...,Project Operation,2025-12-25,In-Progress,,


In [None]:
# Use regex to separate the goal number and name
df[['Goal', 'Goal Name']] = df['Goals'].str.extract(r'(Goal \d+):?\s*(.+)')
# Extract 'Target Number' and 'Target Name'
df[['Target', 'Target Name']] = df['Targets'].str.extract(r'(Target \d+\.\d+)\s*:?\s*(.+)')
df.head()

Unnamed: 0,Goals,Targets,#SL No. (Team Task),Team Activities,Team,Deadline,Status,Unnamed: 7,Unnamed: 8,Goal,Goal Name,Target,Target Name
0,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.4 : Launch pilot products informed by...,1,Assign 01 resource from the PO (Project Operat...,Project Operation,2025-12-25,To-Do,,,Goal 1,Customer-Centric and Sustainable Innovation,Target 1.4,Launch pilot products informed by customer ins...
1,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.1 : Foster a culture of innovation by...,2,Reduce 50% time for writing project documents ...,Project Operation,2025-06-25,In-Progress,,,Goal 1,Customer-Centric and Sustainable Innovation,Target 1.1,Foster a culture of innovation by establishing...
2,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.2 : Allocate time and resources for e...,3,Organize and manage an annual hackathon event ...,Project Operation,2025-12-25,To-Do,,,Goal 1,Customer-Centric and Sustainable Innovation,Target 1.2,Allocate time and resources for employees to w...
3,Goal 1: Customer-Centric and Sustainable Innov...,Target 1.2 : Allocate time and resources for e...,4,Allocate 2 resources—1 from Project Operations...,Project Operation,2025-12-25,In-Progress,,Aligned with Target\nCountable & measurealble\...,Goal 1,Customer-Centric and Sustainable Innovation,Target 1.2,Allocate time and resources for employees to w...
4,Goal 1: Customer-Centric and Sustainable Innov...,"Target 1.3 : Scale successful, customer-driven...",5,We will scale up our products to 05 new local ...,Project Operation,2025-12-25,In-Progress,,,Goal 1,Customer-Centric and Sustainable Innovation,Target 1.3,"Scale successful, customer-driven, and sustain..."
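
The two extraction patterns above can be checked in isolation. The sketch below runs them on made-up strings shaped like the `Goals` and `Targets` columns; the sample values are illustrative assumptions, and only the regular expressions come from this notebook. Labels that do not match come back as NaN, so counting nulls is a quick way to spot malformed entries.

```python
import pandas as pd

# Hypothetical sample rows shaped like the 'Goals' / 'Targets' columns
sample = pd.DataFrame({
    'Goals': ['Goal 1: Customer-Centric and Sustainable Innovation',
              'Goal2 - malformed label'],
    'Targets': ['Target 1.4 : Launch pilot products',
                'Target 2.1: Improve delivery'],
})

# Same patterns as in the cell above
goal_parts = sample['Goals'].str.extract(r'(Goal \d+):?\s*(.+)')
target_parts = sample['Targets'].str.extract(r'(Target \d+\.\d+)\s*:?\s*(.+)')
goal_parts.columns = ['Goal', 'Goal Name']
target_parts.columns = ['Target', 'Target Name']

# Rows where the pattern did not match are NaN in every extracted column
unmatched_goals = int(goal_parts['Goal'].isna().sum())

print(goal_parts)
print(target_parts)
print("Unmatched goal labels:", unmatched_goals)
```

Running the same null count on the real `df['Goal']` and `df['Target']` columns would show whether any row of the master sheet failed to parse.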


In [None]:
target_df = pd.read_excel(excel_path,sheet_name='Target List')

print("Shape of dataset:", target_df.shape)
target_df.head()

Shape of dataset: (35, 2)


Unnamed: 0,# Target Serial,Name
0,1.1,Foster a culture of innovation by establishing...
1,1.2,Allocate time and resources for employees to w...
2,1.3,"Scale successful, customer-driven, and sustain..."
3,1.4,Launch pilot products informed by customer ins...
4,1.5,Collaborate with academic institutions and res...


In [None]:
# Keep the targets in target_df whose 'Name' never appears in df['Target Name'], and create a copy
remaining_df = target_df[~target_df['Name'].isin(df['Target Name'])].copy()

# Add the 'targets' column safely
remaining_df['targets'] = 'Target ' + remaining_df['# Target Serial'].astype(str)
remaining_df

Unnamed: 0,# Target Serial,Name,targets
20,4.4,Implement remote working strategies to decreas...,Target 4.4
21,4.5,Track and report on the environmental impact o...,Target 4.5


# Analysis of Single Features
- Number of Activities VS Team
- Number of Targets VS Team
- Analysis Status Count
- Analysis Deadline Distribution
- Analysis Distribution of Goals

## Number of Activities VS Team

In [None]:
# Group by team and count activities
activity_counts = df.groupby('Team').size().reset_index(name='Activity Count')

# Create interactive bar chart
fig = px.bar(
    activity_counts,
    x='Team',
    y='Activity Count',
    color='Activity Count',
    color_continuous_scale='RdYlGn',  # Red for lower, green for higher
    hover_data=['Activity Count'],
    labels={'Activity Count': 'Activity Count', 'Team': 'Team Name'},
    title='Total number of Activities per Team'
)

fig.show()

## Number of Targets VS Team

In [None]:
# Group by team and count targets
target_counts = df.groupby('Team')['Target'].nunique().reset_index(name='Target Count')

# Create interactive bar chart
fig = px.bar(
    target_counts,
    x='Team',
    y='Target Count',
    color='Target Count',
    color_continuous_scale='RdYlGn',  # Red for lower, green for higher
    hover_data=['Target Count'],
    labels={'Target Count': 'Target Count', 'Team': 'Team Name'},
    title='Number of Targets Boosted by Team'
)

fig.show()


## Analysis Status Count

In [None]:
status_counts = df['Status'].value_counts().reset_index()
status_counts.columns = ['Status', 'Count']
status_color_map = {
    'Done': 'green',
    'In-Progress': 'olive',
    'To-Do': 'slateblue',
    'Skipped': 'darkred'
}
# Plot using Plotly Express (horizontal bar)
fig = px.bar(
    status_counts,
    x='Count',
    y='Status',
    orientation='h',
    title='Activity Wise Status Distribution',
    color='Status',
    color_discrete_map=status_color_map
)

fig.update_layout(
    xaxis_title='Count',
    yaxis_title='Status',
    showlegend=False,
    plot_bgcolor='white'
)

fig.show()

## Analysis Deadline Distribution

In [None]:
# Parse 'Deadline' into a datetime 'Timeline' column; unparseable values become NaT
df['Timeline'] = pd.to_datetime(df['Deadline'], errors='coerce')

# Extract the month from 'Timeline' column
df['Month'] = df['Timeline'].dt.month_name()  # Get month name (January, February, etc.)

# Group by month and count the occurrences
timeline_counts = df['Month'].value_counts().reset_index()
timeline_counts.columns = ['Month', 'Count']

# Sort the months in chronological order
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
timeline_counts['Month'] = pd.Categorical(timeline_counts['Month'], categories=month_order, ordered=True)
timeline_counts = timeline_counts.sort_values('Month')

# Plot using Plotly Express (vertical bar with Month on x-axis and Count on y-axis)
fig = px.bar(
    timeline_counts,
    x='Month',
    y='Count',
    title='Activity Distribution by Month',
    color='Month',
    color_discrete_sequence=px.colors.qualitative.Dark2
)

fig.update_layout(
    xaxis_title='Month',
    yaxis_title='Activity Count',
    showlegend=False,
    plot_bgcolor='white'
)

fig.show()
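
Two details of the month count are easy to miss: months with no deadlines drop out of `value_counts()`, and deadlines that fail to parse are silently excluded. The sketch below, using hypothetical deadline strings, keeps all twelve months by reindexing and reports how many values could not be parsed.

```python
import pandas as pd

# Hypothetical deadlines; the real values come from the 'Deadline' column
deadlines = pd.to_datetime(
    pd.Series(['2025-06-25', '2025-12-25', '2025-12-25', 'not a date']),
    errors='coerce',
)

month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Count per month name, then reindex so empty months appear with a count of 0
month_counts = (
    deadlines.dt.month_name()
    .value_counts()
    .reindex(month_order, fill_value=0)
)

# Values that could not be parsed are NaT and are not counted above
unparsed = int(deadlines.isna().sum())

print(month_counts)
print("Unparseable deadlines:", unparsed)
```

Because the reindexed result is already in calendar order, it can be plotted without the categorical sort step.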

## Analysis Distribution of Goals

In [None]:
# Group and count goals
goal_counts = df['Goals'].value_counts().reset_index()
goal_counts.columns = ['Goals', 'Count']

# Plot using Plotly Express
fig = px.bar(
    goal_counts,
    x='Count',
    y='Goals',
    orientation='h',
    title='Distribution of Goals by Activities Count',
    color='Goals',
    color_discrete_sequence=px.colors.qualitative.Dark2
)

fig.update_layout(
    xaxis_title='Count',
    yaxis_title='Goals',
    showlegend=False,
    plot_bgcolor='white'
)

fig.show()

# Analysis of Multiple Features
- Goal Wise Activity Count by Status
- Goal vs Team with Activity Count
- Analysis Target Distribution with Activity Count by Status
- Activity Analysis With Team


### Goal Wise Activity Count by Status

In [None]:
# Create the grouped bar chart
fig = px.histogram(
    df,
    x='Goal',
    color='Status',
    color_discrete_map=status_color_map,
    barmode='group',
    title='Total Activities per Goal by Status'

)

# Rotate x-axis labels and improve layout
fig.update_layout(
    xaxis_title='Goal',
    yaxis_title='Number of Activities',
    bargap=0.2,
    xaxis_tickangle=45,
    plot_bgcolor='white'
)

fig.show()

### Goal vs Team with Activity Count


In [None]:

df['Count'] = 1
goal_team_counts = df.groupby(['Team', 'Goal']).size().reset_index(name='Activity Count')

fig = px.bar(
    goal_team_counts,
    x='Team',
    y='Activity Count',
    color='Goal',
    title='Goal vs Team with Activity Count',
    labels={'Activity Count': 'Activity Count', 'Goal': 'Goal', 'Team': 'Team'},
    text='Activity Count',
)

# Update layout for better presentation
fig.update_layout(
    barmode='group',
    xaxis_tickangle=-45,
    plot_bgcolor='white'
)

fig.show()

## Analysis Target Distribution with Activity Count by Status

In [None]:
# Create the interactive grouped bar chart
fig = px.histogram(
    df,
    x='Target',
    color='Status',
    barmode='group',
    color_discrete_map=status_color_map,
    title='Activities per Target by Status',
    # category_orders={"Target": sorted(df['Target'].unique())}
)

# Customize layout and rotate x-axis labels
fig.update_layout(
    xaxis_title='Target',
    yaxis_title='Activity Count',
    bargap=0.2,
    xaxis_tickangle=45,
    plot_bgcolor='white'
)

fig.show()

### Top 10 Targets by Activity Count

In [None]:
import plotly.express as px
import pandas as pd
import textwrap

# Function to wrap text with <br> every N characters
def wrap_hover_text(text, width=50):
    return '<br>'.join(textwrap.wrap(text, width=width))

# Prepare top 10 targets
top_targets = df[['Target', 'Target Name']].value_counts().reset_index(name='Activity_Count')
top_targets = top_targets.head(10)

# Wrap long 'Targets' text for better hover display
top_targets['Target Name'] = top_targets['Target Name'].apply(lambda x: wrap_hover_text(x, 100))

# Create bar chart
fig = px.bar(
    top_targets,
    x='Activity_Count',
    y='Target',
    orientation='h',
    color='Target',
    text='Target',
    hover_name='Target',
    hover_data={'Target Name': True},
    title='Top 10 Targets by Activity Count',
    color_discrete_sequence=px.colors.qualitative.Dark2
)

# Adjust layout
fig.update_traces(insidetextanchor='start')

fig.update_layout(
    xaxis_title='Activity Count',
    yaxis_title='',
    yaxis=dict(showticklabels=False),
    showlegend=False,
    plot_bgcolor='white',
    margin=dict(l=5, r=40, t=60, b=40),
    height=600
)

fig.show()


## Targets not boosted by any Team

In [None]:
import textwrap

# Wrap long text in the 'Name' column for hover display
def wrap_hover_text(text, width=40):
    return '<br>'.join(textwrap.wrap(text, width))

# Create a new column for wrapped hover text
remaining_df['HoverText'] = remaining_df['Name'].apply(lambda x: wrap_hover_text(x, 100))

# Add a dummy value column to display bars
remaining_df['Value'] = 1

# Create horizontal bar chart with yellow bars
fig = px.bar(
    remaining_df,
    x='Value',
    y='targets',
    orientation='h',
    text='Name',  # Show Name inside the bar
    title='Targets not boosted by any Team',
    color_discrete_sequence=['gold']  # Yellow
)

# Place text inside the bars and add custom hovertemplate
fig.update_traces(
    textposition='inside',
    insidetextanchor='start',
    hovertemplate='%{customdata[0]}<extra></extra>',
    customdata=remaining_df[['HoverText']]
)

# Layout tweaks for smaller size
fig.update_layout(
    xaxis=dict(visible=False),
    yaxis_title='Targets',
    showlegend=False,
    plot_bgcolor='white',
    margin=dict(l=20, r=30, t=40, b=30),
    height=200
)

fig.show()


### Most Common Target by Team

In [None]:
most_common_target = top_targets.iloc[0]['Target']
top_targets_by_team = df[df['Target'] == most_common_target]['Team'].value_counts().reset_index()
top_targets_by_team.columns = ['Team', 'Count']
px.bar(top_targets_by_team,
       x='Team',
       y='Count',
       title=f'Activity Distribution of Teams on the highest target ({most_common_target})',
       color='Team').show()

## Activity Analysis With Team

In [None]:
# Plot: Status vs Team
fig = px.histogram(
    df,
    x='Status',
    color='Team',
    barmode='group',
    title='Activity Count vs Activity Status by Team',
)

# Layout adjustments
fig.update_layout(
    xaxis_title='Status',
    yaxis_title='Count of Activities',
    bargap=0.2
)

fig.show()

In [None]:
# Plot: Team vs Status
fig = px.histogram(
    df,
    x='Team',
    color='Status',
    color_discrete_map=status_color_map,
    barmode='group',
    title='Activity Count vs Team by Activity Status',
)

# Layout adjustments
fig.update_layout(
    xaxis_title='Team',
    yaxis_title='Count of Status',
    bargap=0.2
)
# Rotate x-axis labels
fig.update_xaxes(tickangle=-45)

fig.show()

## Summary APA - 2025

In [None]:
import plotly.express as px
import pandas as pd

# Summary findings
summary = [
    'Budget is needed for the Innovation Lab, AI/ML implementation, and other initiatives',
    'More focus is needed on Operational Excellence, where there is large scope for improvement',
    'Logs need to be ensured for all teams involved in delivery and project-related work',
    'Two targets with no activities need to be reconsidered',
    'There has not been any significant activity centered on GovStack; all teams should increase their activities on the GovStack roadmap',
    'Minor changes are required in SBP-2025 to improve efficiency and alignment with goals'
]

# Create DataFrame and reverse the order
df_summary = pd.DataFrame({
    # 'Serial': list(range(len(summary), 0, -1)),  # Serial numbers reversed
    'Serial': list(range(1, len(summary) + 1)),  # 1 to N
    'Summary': summary[::-1],                   # Summary reversed
    'Value': [1] * len(summary)
})

# Plot horizontal bar chart
fig = px.bar(
    df_summary,
    x='Value',
    y='Serial',
    orientation='h',
    text='Summary',
    color='Summary',
    color_discrete_sequence=px.colors.qualitative.Pastel
)

# Style and layout
fig.update_traces(
    textposition='inside',
    insidetextanchor='start',
    textfont=dict(size=14),
    hoverinfo='none'  # Disable hover info
)

fig.update_layout(
    showlegend=False,
    xaxis=dict(visible=False),
    yaxis=dict(visible=False),
    margin=dict(l=80, r=80, t=80, b=80),
    plot_bgcolor='white',
    title='Findings on APA on SBP - 2025',
)

fig.show()

# Team Wise Filtering

In [None]:
# Import required libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from IPython.display import display, HTML


#####################################
# SECTION 1: TEAM ANALYSIS SECTION
#####################################
print("\n" + "="*50)
print("SECTION 1: TEAM ANALYSIS")
print("="*50)

# Display available teams for reference
# team_names = sorted(df['Team'].unique().tolist())
team_names = df['Team'].unique().tolist()
print(f"\nAvailable teams: {team_names}")

# Team selection dropdown
selected_team = 'Webcrafter' # @param ["", "Application", "Business Development", "CIRT & Infra", "HR,Admin & GSD", "Implementation & ITS", "Industry 4.o", "InnovX", "Mobile Apps & Games", "Project Operation", "Webcrafter"] {type:"string"}

# Function to analyze team-specific data with interactive plots
def analyze_team(team_name):
    if not team_name:
        print("\nNo team selected. Please select a team from the dropdown.")
        return None

    try:
        team_data = df[df['Team'] == team_name]
        if len(team_data) == 0:
            print(f"\nTeam '{team_name}' not found in the dataset.")
            return None

        print(f"\n=== {team_name} Analysis ===")
        print(f"Total activities: {len(team_data)}")

        # Status breakdown
        status_counts = team_data['Status'].value_counts()
        print("\nStatus distribution:")
        for status, count in status_counts.items():
            print(f"{status}: {count} ({count/len(team_data)*100:.1f}%)")

        # Goal focus
        goal_counts = team_data['Goal'].value_counts()
        print("\nGoal distribution:")
        for goal, count in goal_counts.items():
            print(f"{goal}: {count} ({count/len(team_data)*100:.1f}%)")

        # Timeline distribution
        timeline_counts = team_data['Timeline'].value_counts().sort_index()
        print("\nTimeline distribution:")
        for timeline, count in timeline_counts.items():
            print(f"{timeline}: {count}")

        # Create interactive visualizations for the selected team

        # 1. Status distribution pie chart
        status_df = pd.DataFrame({'Status': status_counts.index, 'Count': status_counts.values})
        fig_status = px.pie(
            status_df,
            values='Count',
            names='Status',
            color='Status',
            title=f'Status Distribution for {team_name}',
            # color_discrete_sequence=px.colors.qualitative.Set3,
            color_discrete_map=status_color_map,
            hole=0.3
        )
        fig_status.update_traces(textposition='inside', textinfo='percent+label')
        fig_status.show()

        # 2. Goal distribution bar chart
        goal_df = pd.DataFrame({'Goal': goal_counts.index, 'Count': goal_counts.values})
        goal_df = goal_df.sort_values('Count', ascending=False)
        fig_goals = px.bar(
            goal_df,
            x='Goal',
            y='Count',
            title=f'Goal Distribution for {team_name}',
            color='Goal',
            text='Count'
        )
        fig_goals.update_layout(xaxis_title='Goal', yaxis_title='Number of Activities')
        fig_goals.show()

        # 3. Timeline activity count
        timeline_df = pd.DataFrame({'Timeline': timeline_counts.index, 'Count': timeline_counts.values})
        fig_timeline = px.line(
            timeline_df,
            x='Timeline',
            y='Count',
            title=f'Timeline Activity Count for {team_name}',
            markers=True
        )
        fig_timeline.update_layout(xaxis_title='Timeline', yaxis_title='Number of Activities')
        fig_timeline.update_traces(line=dict(width=3))
        fig_timeline.show()

        # 4. Status by Goal heatmap
        status_by_goal = pd.crosstab(team_data['Goal'], team_data['Status'])
        fig_heatmap = px.imshow(
            status_by_goal,
            text_auto=True,
            aspect="auto",
            title=f'Status by Goal for {team_name}',
            labels=dict(x="Status", y="Goal", color="Count"),
            color_continuous_scale="YlGnBu"
        )
        fig_heatmap.update_layout(height=400)
        fig_heatmap.show()

        # Return data for potential further analysis
        return team_data

    except Exception as e:
        print(f"Error analyzing team: {e}")
        return None

# Run team analysis if a team is selected
if selected_team:
    team_data = analyze_team(selected_team)



SECTION 1: TEAM ANALYSIS

Available teams: ['Project Operation', 'Webcrafter', 'Implementation & ITS', 'Mobile Apps & Games', 'InnovX', 'Application', 'Supply Chain', 'Finance & Logistics', 'Business Development']

=== Webcrafter Analysis ===
Total activities: 35

Status distribution:
To-Do: 18 (51.4%)
In-Progress: 13 (37.1%)
Done: 4 (11.4%)

Goal distribution:
Goal 2: 10 (28.6%)
Goal 5: 8 (22.9%)
Goal 1: 7 (20.0%)
Goal 4: 5 (14.3%)
Goal 6: 4 (11.4%)
Goal 3: 1 (2.9%)

Timeline distribution:
2025-02-25 00:00:00: 2
2025-03-25 00:00:00: 1
2025-04-25 00:00:00: 1
2025-05-25 00:00:00: 1
2025-06-25 00:00:00: 3
2025-07-25 00:00:00: 4
2025-08-25 00:00:00: 2
2025-09-25 00:00:00: 3
2025-10-25 00:00:00: 4
2025-11-25 00:00:00: 6
2025-12-25 00:00:00: 8


# Goal Wise Filtering

In [None]:
#####################################
# SECTION 2: GOAL ANALYSIS SECTION
#####################################
print("\n" + "="*50)
print("SECTION 2: GOAL ANALYSIS")
print("="*50)

# Display available goals for reference
# goal_names = sorted(df['Goal'].unique().tolist())
goal_names = df['Goal'].unique().tolist()
print(f"\nAvailable goals: {goal_names}")

# Goal selection dropdown
selected_goal = 'Goal 2' # @param ["", "Goal 1", "Goal 2", "Goal 3", "Goal 4", "Goal 5", "Goal 6"] {type:"string"}

# Function to analyze goal-specific data with interactive plots
def analyze_goal(goal_name):
    if not goal_name:
        print("\nNo goal selected. Please select a goal from the dropdown.")
        return None

    try:
        goal_data = df[df['Goal'] == goal_name]
        if len(goal_data) == 0:
            print(f"\nGoal '{goal_name}' not found in the dataset.")
            return None

        print(f"\n=== {goal_name} Analysis ===")
        print(f"Total activities: {len(goal_data)}")

        # Status breakdown
        status_counts = goal_data['Status'].value_counts()
        print("\nStatus distribution:")
        for status, count in status_counts.items():
            print(f"{status}: {count} ({count/len(goal_data)*100:.1f}%)")

        # Team contribution
        team_counts = goal_data['Team'].value_counts()
        print("\nTeam contribution:")
        for team, count in team_counts.items():
            print(f"{team}: {count} ({count/len(goal_data)*100:.1f}%)")

        # Timeline distribution
        timeline_counts = goal_data['Timeline'].value_counts().sort_index()
        print("\nTimeline distribution:")
        for timeline, count in timeline_counts.items():
            print(f"{timeline}: {count}")

        # Create interactive visualizations for the selected goal

        # 1. Status distribution pie chart
        status_df = pd.DataFrame({'Status': status_counts.index, 'Count': status_counts.values})
        fig_status = px.pie(
            status_df,
            values='Count',
            names='Status',
            color='Status',
            title=f'Status Distribution for {goal_name}',
            # color_discrete_sequence=px.colors.qualitative.Pastel,
            color_discrete_map=status_color_map,
            hole=0.3
        )
        fig_status.update_traces(textposition='inside', textinfo='percent+label')
        fig_status.show()

        # 2. Team contribution bar chart
        team_df = pd.DataFrame({'Team': team_counts.index, 'Count': team_counts.values})
        team_df = team_df.sort_values('Count', ascending=False)
        fig_teams = px.bar(
            team_df,
            x='Team',
            y='Count',
            title=f'Team Contribution for {goal_name}',
            color='Team',
            text='Count'
        )
        fig_teams.update_layout(xaxis_title='Team', yaxis_title='Number of Activities')
        fig_teams.update_xaxes(tickangle=45)
        fig_teams.show()

        # 3. Timeline activity count
        timeline_df = pd.DataFrame({'Timeline': timeline_counts.index, 'Count': timeline_counts.values})
        fig_timeline = px.line(
            timeline_df,
            x='Timeline',
            y='Count',
            title=f'Timeline Activity Count for {goal_name}',
            markers=True
        )
        fig_timeline.update_layout(xaxis_title='Timeline', yaxis_title='Number of Activities')
        fig_timeline.update_traces(line=dict(width=3))
        fig_timeline.show()

        # 4. Status by Team heatmap
        status_by_team = pd.crosstab(goal_data['Team'], goal_data['Status'])
        fig_heatmap = px.imshow(
            status_by_team,
            text_auto=True,
            aspect="auto",
            title=f'Status by Team for {goal_name}',
            labels=dict(x="Status", y="Team", color="Count"),
            color_continuous_scale="Viridis"
        )
        fig_heatmap.update_layout(height=400)
        fig_heatmap.show()

        # Return data for potential further analysis
        return goal_data

    except Exception as e:
        print(f"Error analyzing goal: {e}")
        return None

# Run goal analysis if a goal is selected
if selected_goal:
    goal_data = analyze_goal(selected_goal)






SECTION 2: GOAL ANALYSIS

Available goals: ['Goal 1', 'Goal 2', 'Goal 3', 'Goal 4', 'Goal 5', 'Goal 6']

=== Goal 2 Analysis ===
Total activities: 60

Status distribution:
To-Do: 33 (55.0%)
In-Progress: 24 (40.0%)
Done: 3 (5.0%)

Team contribution:
Project Operation: 10 (16.7%)
Webcrafter: 10 (16.7%)
Implementation & ITS: 8 (13.3%)
Mobile Apps & Games: 7 (11.7%)
Application: 7 (11.7%)
Business Development: 6 (10.0%)
InnovX: 5 (8.3%)
Supply Chain: 4 (6.7%)
Finance & Logistics: 3 (5.0%)

Timeline distribution:
2025-02-25 00:00:00: 3
2025-03-25 00:00:00: 1
2025-05-25 00:00:00: 1
2025-06-25 00:00:00: 10
2025-07-25 00:00:00: 5
2025-09-25 00:00:00: 7
2025-10-25 00:00:00: 2
2025-11-25 00:00:00: 1
2025-12-25 00:00:00: 30


# Team and Goal Combined Filtering

In [None]:
#####################################
# SECTION 3: COMBINED INSIGHTS
#####################################
print("\n" + "="*50)
print("SECTION 3: COMBINED INSIGHTS")
print("="*50)

# Team selection dropdown
selected_team = 'Webcrafter' # @param ["", "Application", "Business Development", "CIRT & Infra", "HR,Admin & GSD", "Implementation & ITS", "Industry 4.o", "InnovX", "Mobile Apps & Games", "Project Operation", "Webcrafter"] {type:"string"}
# Goal selection dropdown
selected_goal = 'Goal 1' # @param ["", "Goal 1", "Goal 2", "Goal 3", "Goal 4", "Goal 5", "Goal 6"] {type:"string"}
# Only run this section if both team and goal are selected
if selected_team and selected_goal:
    # Filter data for the selected team and goal
    combined_data = df[(df['Team'] == selected_team) & (df['Goal'] == selected_goal)]

    if len(combined_data) > 0:
        print(f"\n=== Combined Analysis for Team '{selected_team}' and Goal '{selected_goal}' ===")
        print(f"Total activities: {len(combined_data)}")

        # Status breakdown
        status_counts = combined_data['Status'].value_counts()
        print("\nStatus distribution:")
        for status, count in status_counts.items():
            print(f"{status}: {count} ({count/len(combined_data)*100:.1f}%)")

        # Timeline distribution
        timeline_counts = combined_data['Timeline'].value_counts().sort_index()
        print("\nTimeline distribution:")
        for timeline, count in timeline_counts.items():
            print(f"{timeline}: {count}")

        # Create interactive visualizations for the combined data

        # 1. Status distribution pie chart
        status_df = pd.DataFrame({'Status': status_counts.index, 'Count': status_counts.values})
        fig_combined_status = px.pie(
            status_df,
            values='Count',
            names='Status',
            color='Status',
            title=f'Status Distribution for {selected_team} on {selected_goal}',
            # color_discrete_sequence=px.colors.qualitative.Bold,
            color_discrete_map=status_color_map,
            hole=0.4
        )
        fig_combined_status.update_traces(textposition='inside', textinfo='percent+label')
        fig_combined_status.show()

        # 2. Timeline activity count
        if len(timeline_counts) > 1:  # Only show if there's more than one timeline point
            timeline_df = pd.DataFrame({'Timeline': timeline_counts.index, 'Count': timeline_counts.values})
            fig_combined_timeline = px.bar(
                timeline_df,
                x='Timeline',
                y='Count',
                title=f'Timeline Activities for {selected_team} on {selected_goal}',
                color='Count',
                text='Count'
            )
            fig_combined_timeline.update_layout(xaxis_title='Timeline', yaxis_title='Number of Activities')
            fig_combined_timeline.show()

        # Manually map status labels to colors
        colors = [status_color_map.get(status, 'gray') for status in status_df['Status']]

        # 3. Combined summary in a single view
        fig_summary = make_subplots(
            rows=1, cols=2,
            specs=[[{"type": "pie"}, {"type": "bar"}]],
            subplot_titles=("Status Distribution", "Activity Timeline"),
            horizontal_spacing=0.1
        )

        # Add pie chart (a single trace; one go.Pie already covers every status)
        fig_summary.add_trace(
            go.Pie(
                labels=status_df['Status'],
                values=status_df['Count'],
                name="Status",
                marker=dict(colors=colors),
                hole=0.4,
                textinfo='percent+label'
            ),
            row=1, col=1
        )

        # Add timeline bar chart if there's more than one timeline point
        if len(timeline_counts) > 1:
            fig_summary.add_trace(
                go.Bar(
                    x=timeline_df['Timeline'],
                    y=timeline_df['Count'],
                    name="Activities",
                    text=timeline_df['Count'],
                    textposition='auto'
                ),
                row=1, col=2
            )

        fig_summary.update_layout(
            title_text=f"Summary for {selected_team} on {selected_goal}",
            height=500,
            showlegend=False
        )
        fig_summary.show()
    else:
        print(f"\nNo activities found for Team '{selected_team}' working on Goal '{selected_goal}'.")

print("\nAnalysis complete! Review the interactive visualizations above for insights.")


SECTION 3: COMBINED INSIGHTS

=== Combined Analysis for Team 'Webcrafter' and Goal 'Goal 1' ===
Total activities: 7

Status distribution:
In-Progress: 4 (57.1%)
To-Do: 2 (28.6%)
Done: 1 (14.3%)

Timeline distribution:
2025-04-25 00:00:00: 1
2025-06-25 00:00:00: 1
2025-07-25 00:00:00: 1
2025-09-25 00:00:00: 1
2025-10-25 00:00:00: 1
2025-11-25 00:00:00: 2



Analysis complete! Review the interactive visualizations above for insights.
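
The combined filter inspects one team and goal pair at a time. To see every pair at once, a cross-tabulation is one option. The sketch below uses a small hypothetical activity table (the rows are made up; only the `Team` and `Goal` column names match this notebook).

```python
import pandas as pd

# Hypothetical activity rows; the real table is the 'Final Master Data' sheet
activities = pd.DataFrame({
    'Team': ['Webcrafter', 'Webcrafter', 'InnovX', 'InnovX', 'InnovX'],
    'Goal': ['Goal 1', 'Goal 2', 'Goal 1', 'Goal 1', 'Goal 3'],
})

# Activity count for every Team x Goal pair, with row and column totals
team_goal = pd.crosstab(
    activities['Team'], activities['Goal'],
    margins=True, margins_name='Total',
)

print(team_goal)
```

Zero cells in such a table point to team and goal combinations with no planned activity, which the single-pair view only reports one at a time.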
