# Exploring All Geounit Revenue from Submitted Tickets

This notebook analyzes month-by-month revenue for all Geounits, based on the tickets submitted in the system and recorded in the global tickets WLES operations data.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

%matplotlib inline

# Set Seaborn style
sns.set_style("whitegrid")  # You can choose any Seaborn style such as "darkgrid", "white", "dark", "ticks"


## Data Loading and Initial Exploration

Let's load the data and take a look at its structure and content.

In [None]:
# Load the data
df = pd.read_csv('../raw_data/global_tickets_wles_ops_data.csv')

# Display the first few rows and data info
print(df.head())
print("\nDataset Info:")
df.info()

## Exploring Job Type Codes

In this section, we'll analyze the 'Job Type code' column to better understand the distribution of different job types in our dataset. This analysis will help us:

1. Identify the unique job types present in the data
2. Determine the frequency of each job type
3. Calculate the percentage distribution of job types
4. Visualize the distribution with a bar plot and a pie chart

This exploration shows which job types the dataset contains and how common each one is. That helps characterize the work being performed and reveals any imbalance across job types.

In [None]:
# Explore unique values in Job Type code column

# Get unique Job Type codes
job_types = df['Job Type code'].unique()

# Count occurrences of each Job Type code
job_type_counts = df['Job Type code'].value_counts()

print(f"Number of unique Job Type codes: {len(job_types)}")
print("\nUnique Job Type codes:")
print(job_types)

print("\nJob Type code counts:")
print(job_type_counts)

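# Calculate percentage of each Job Type code relative to all rows.
# value_counts() skips missing codes, so the percentages sum to less than 100
# if any rows have no Job Type code.
job_type_percentages = (job_type_counts / len(df) * 100).round(2)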

print("\nJob Type code percentages:")
print(job_type_percentages)

# Create a bar plot of Job Type code counts
plt.figure(figsize=(12, 6))
job_type_counts.plot(kind='bar')
plt.title('Count of Job Type Codes')
plt.xlabel('Job Type Code')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

# Create a pie chart of Job Type code percentages
plt.figure(figsize=(10, 10))
plt.pie(job_type_percentages, labels=job_type_percentages.index, autopct='%1.1f%%')
plt.title('Distribution of Job Type Codes')
plt.axis('equal')
plt.show()

## Data Preprocessing

Now, let's preprocess the data to prepare it for our analysis:
1. Convert date columns to datetime format
2. Extract month and year from the end date, applying the following rule:
   - If the date is from the 1st to the 25th (inclusive), use that month
   - If the date is from the 26th onwards, use the subsequent month
3. Group the data by Geounit and Month-Year, summing the revenue
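
The cutoff rule in step 2 can be sketched as follows. This is a minimal illustration only: the helper name `adjust_to_reporting_month` and its `cutoff_day` parameter are assumptions made for this example, and the actual logic lives in `rdp.preprocess_tickets_data`, which may be implemented differently.

```python
import pandas as pd

def adjust_to_reporting_month(dates: pd.Series, cutoff_day: int = 25) -> pd.Series:
    """Hypothetical helper: map each date to the first day of its reporting month."""
    dates = pd.to_datetime(dates)
    # First day of the calendar month each date falls in
    month_start = dates.dt.to_period("M").dt.to_timestamp()
    # First day of the following month, used for dates after the cutoff
    next_month_start = month_start + pd.DateOffset(months=1)
    # Keep the current month up to and including the cutoff day, otherwise roll forward
    return month_start.where(dates.dt.day <= cutoff_day, next_month_start)

sample = pd.Series(pd.to_datetime(["2024-01-25", "2024-01-26", "2024-12-31"]))
adjusted = adjust_to_reporting_month(sample)
print(adjusted.dt.strftime("%Y-%m-%d").tolist())
# ['2024-01-01', '2024-02-01', '2025-01-01']
```

Note that a date late in December rolls into January of the following year, so the year has to be adjusted along with the month.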

In [None]:
# Data Preprocessing

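# Import the preprocessing module and reload it so local edits are picked up
import importlib
import utils.revenue_data_preprocessing as rdp
importlib.reload(rdp)
# List the module's contents to confirm the expected functions are available
print(dir(rdp))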

# Apply the preprocessing function
df = rdp.preprocess_tickets_data(df)

# Step 3: group the data by Geounit and month (month start), summing the revenue
grouped_data = (
    df.groupby(['Sl Geounit (Code)', pd.Grouper(key='Adjusted Date', freq='MS')])
    ['Field Ticket USD net value']
    .sum()
    .reset_index()
)

# Rename columns for clarity
grouped_data.columns = ['Geounit', 'Date', 'Revenue']

# Sort the data by Geounit and Date
grouped_data = grouped_data.sort_values(['Geounit', 'Date'])

# Display the first few rows of the processed data
print(grouped_data.head(10))

# Display summary statistics
print("\nSummary Statistics:")
print(grouped_data.describe())

# Check for any missing values
print("\nMissing Values:")
print(grouped_data.isnull().sum())

## Visualization: Monthly Revenue by Geounit

Let's create an interactive line plot of the monthly revenue trend for each Geounit. Select one or more Geounits in the widget to compare them on the same axes.

In [None]:
import ipywidgets as widgets
from ipywidgets import interact
from IPython.display import display

# Function to plot revenue over time for selected geounits
def plot_revenue_over_time(selected_geounits):
    plt.figure(figsize=(15, 8))
    
    for geounit in selected_geounits:
        data = grouped_data[grouped_data['Geounit'] == geounit]
        plt.plot(data['Date'], data['Revenue'], label=geounit)
    
    plt.title('Revenue Over Time by Geounit', fontsize=16)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Revenue (USD)', fontsize=12)
    plt.legend(title='Geounit', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

# Get unique geounits
geounits = sorted(grouped_data['Geounit'].unique())

# Create multi-select widget
geounit_selector = widgets.SelectMultiple(
    options=geounits,
    value=[geounits[0]],  # Default to first geounit
    description='Geounits:',
    disabled=False
)

# Create interactive plot
interact(plot_revenue_over_time, selected_geounits=geounit_selector)

# Display summary statistics for all geounits
print("Summary Statistics by Geounit:")
summary_stats = grouped_data.groupby('Geounit')['Revenue'].agg(['mean', 'median', 'min', 'max']).round(2)
summary_stats.columns = ['Mean Revenue', 'Median Revenue', 'Min Revenue', 'Max Revenue']
display(summary_stats)

## Analysis

From the plot above, we can observe the following:

1. Different Geounits have varying levels of revenue.
2. Some Geounits show more volatility in their monthly revenue than others.
3. There might be seasonal patterns or trends for certain Geounits.

To further analyze this data, we could:
- Calculate and compare the average monthly revenue for each Geounit
- Identify the top-performing Geounits
- Analyze the revenue trends over time for specific Geounits of interest
- Investigate any correlation between revenue and other variables in the dataset
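
As a starting point for the first two follow-ups, the sketch below ranks Geounits by average monthly revenue. It assumes a frame shaped like `grouped_data` (columns `Geounit`, `Date`, `Revenue`). The helper `rank_geounits` and the toy values are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for grouped_data; codes and values are made up for illustration
toy = pd.DataFrame({
    "Geounit": ["AAA", "AAA", "BBB", "BBB"],
    "Date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01", "2024-02-01"]),
    "Revenue": [100.0, 300.0, 50.0, 70.0],
})

def rank_geounits(data: pd.DataFrame, top_n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: per-Geounit revenue stats, highest average first."""
    stats = data.groupby("Geounit")["Revenue"].agg(
        avg_monthly="mean",   # average monthly revenue
        total="sum",          # total revenue over the period
        volatility="std",     # month-to-month spread (sample standard deviation)
    )
    return stats.sort_values("avg_monthly", ascending=False).head(top_n)

ranking = rank_geounits(toy)
print(ranking)
```

In the notebook itself, calling `rank_geounits(grouped_data)` would give the same table for the real data. The `volatility` column offers a rough numeric check on the volatility observation above.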