# Notebook 05 ‚Äî Report Assembly & Export (Time-Agnostic)

**Purpose**: Create the KPI page, visuals (non-time-based), tables, anomaly & recommendation pages, and export a combined PDF that includes the original reference PDF.

**Deliverables**:
- KPI indicator cards
- Platform/store comparison charts
- Stream composition analysis
- Top performers tables (HTML export)
- Anomaly summary and recommendations
- Combined PDF report with original reference PDF appended
- Deliverable package (ZIP)

In [18]:
# Cell 1 ‚Äî Imports and Load Previously Saved Artifacts
import pandas as pd
import numpy as np
from pathlib import Path
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from PyPDF2 import PdfReader, PdfWriter
import warnings
warnings.filterwarnings('ignore')

# Setup paths
DATA_DIR = Path('../data')
out_dir = Path('../outputs')
advanced_dir = out_dir / 'advanced'
fig_dir = out_dir / 'figures'
reports_dir = out_dir / 'reports'
cleaned_dir = out_dir / 'cleaned'

fig_dir.mkdir(parents=True, exist_ok=True)
reports_dir.mkdir(parents=True, exist_ok=True)

print("Loading artifacts from previous notebooks...\n")

# Load cleaned / computed CSVs from Notebook 03 and 04 outputs
top_artists = None
top_albums = None
top_tracks = None

# Try Notebook 04 outputs first, then Notebook 03
if (advanced_dir / 'top_artists.csv').exists():
    top_artists = pd.read_csv(advanced_dir / 'top_artists.csv')
    top_albums = pd.read_csv(advanced_dir / 'top_albums.csv')
    top_tracks = pd.read_csv(advanced_dir / 'top_tracks.csv')
    print("‚úÖ Loaded top performers from Notebook 04 (advanced/)")
elif (fig_dir / 'top_50_artists.csv').exists():
    top_artists = pd.read_csv(fig_dir / 'top_50_artists.csv')
    top_albums = pd.read_csv(fig_dir / 'top_50_albums.csv')
    top_tracks = pd.read_csv(fig_dir / 'top_50_tracks.csv')
    print("‚úÖ Loaded top performers from Notebook 03 (figures/)")
else:
    print("‚ö†Ô∏è Top performers CSVs not found. Please run Notebook 03 or 04 first.")

# Load clusters if available
clusters = None
if (advanced_dir / 'top_tracks_with_clusters.csv').exists():
    clusters = pd.read_csv(advanced_dir / 'top_tracks_with_clusters.csv')
    print("‚úÖ Loaded clustering data from Notebook 04")

# Load anomalies if available
anomalies = None
if (advanced_dir / 'anomalies_cross_section.csv').exists():
    anomalies = pd.read_csv(advanced_dir / 'anomalies_cross_section.csv')
    print("‚úÖ Loaded anomaly data from Notebook 04")

# Load main combined dataframe
if (cleaned_dir / 'df_all.parquet').exists():
    df_all = pd.read_parquet(cleaned_dir / 'df_all.parquet')
    print("‚úÖ Loaded cleaned combined data from Notebook 02")
else:
    print("‚ö†Ô∏è Loading raw CSVs (cleaned data not found)...")
    try:
        airtel = pd.read_csv(DATA_DIR / 'airtel-report.csv', encoding='utf-8')
        jio = pd.read_csv(DATA_DIR / 'jiosaavn-report.csv', encoding='utf-8')
        wynk = pd.read_csv(DATA_DIR / 'wynk-report.csv', encoding='utf-8')
        df_all = pd.concat([airtel, jio, wynk], ignore_index=True, sort=False)
        df_all.columns = df_all.columns.str.strip().str.lower().str.replace(' ', '_')
        print("‚úÖ Loaded raw CSVs")
    except Exception as e:
        print(f"‚ùå Error loading data: {e}")
        df_all = pd.DataFrame()

print(f"\nDataframe shape: {df_all.shape}")
print(f"Columns: {list(df_all.columns[:10])}..." if len(df_all.columns) > 10 else f"Columns: {list(df_all.columns)}")

Loading artifacts from previous notebooks...

‚úÖ Loaded top performers from Notebook 04 (advanced/)
‚úÖ Loaded clustering data from Notebook 04
‚úÖ Loaded anomaly data from Notebook 04
‚úÖ Loaded cleaned combined data from Notebook 02

Dataframe shape: (8882, 12)
Columns: ['source', 'activity_period', 'year_month', 'store_name', 'country', 'artist', 'album', 'track', 'revenue', 'stream_count']...


In [19]:
# Cell 2 ‚Äî KPI Cards (Plotly Indicators)
print("Creating KPI indicator cards...\n")

# Determine revenue column name
revenue_col = None
for col in ['revenue', 'rev', 'royality', 'royalty', 'income']:
    if col in df_all.columns:
        revenue_col = col
        break

if revenue_col is None:
    print("‚ö†Ô∏è Revenue column not found in dataframe")
    total_rev = 0
else:
    # Ensure numeric
    df_all[revenue_col] = pd.to_numeric(df_all[revenue_col], errors='coerce').fillna(0)
    total_rev = df_all[revenue_col].sum()

# Get top artist info
if top_artists is not None and not top_artists.empty:
    top_artist_name = top_artists.iloc[0]['artist']
    top_artist_rev = float(top_artists.iloc[0]['total_revenue'])
else:
    top_artist_name = "N/A"
    top_artist_rev = 0

# Get track count
num_tracks = len(top_tracks) if top_tracks is not None else 0

# Create indicator figure
fig = make_subplots(
    rows=1, cols=3, 
    specs=[[{'type':'indicator'}, {'type':'indicator'}, {'type':'indicator'}]],
    subplot_titles=['Total Revenue', f'Top Artist: {top_artist_name}', 'Unique Top Tracks']
)

fig.add_trace(
    go.Indicator(
        mode='number',
        value=total_rev,
        number={'prefix': '‚Çπ', 'valueformat': ',.2f'},
        title={'text': 'Total Revenue', 'font': {'size': 16}}
    ),
    row=1, col=1
)

fig.add_trace(
    go.Indicator(
        mode='number',
        value=top_artist_rev,
        number={'prefix': '‚Çπ', 'valueformat': ',.2f'},
        title={'text': f'Top Artist<br>{top_artist_name}', 'font': {'size': 14}}
    ),
    row=1, col=2
)

fig.add_trace(
    go.Indicator(
        mode='number',
        value=num_tracks,
        number={'valueformat': ','},
        title={'text': 'Unique Top Tracks', 'font': {'size': 16}}
    ),
    row=1, col=3
)

fig.update_layout(
    height=300,
    title_text='Key Performance Indicators',
    title_font_size=20,
)

try:
    fig.write_image(str(fig_dir / 'kpi_cards.png'), scale=2, width=1200, height=300)
    print("‚úÖ Saved kpi_cards.png\n")
except Exception as e:
    print(f"‚ö†Ô∏è Could not save KPI PNG (kaleido issue): {e}")
    try:
        fig.write_html(str(fig_dir / 'kpi_cards.html'))
        print("‚úÖ Saved kpi_cards.html instead\n")
    except Exception as e2:
        print(f"‚ö†Ô∏è Could not save KPI HTML fallback: {e2}")

fig.show()

Creating KPI indicator cards...

‚ö†Ô∏è Could not save KPI PNG (kaleido issue): 
Image export using the "kaleido" engine requires the Kaleido package,
which can be installed using pip:

    $ pip install --upgrade kaleido

‚úÖ Saved kpi_cards.html instead



In [20]:
# Cell 3 ‚Äî DSP Bar Chart (Store Comparison)
print("Creating platform/store revenue comparison...\n")

# Find source/store column
source_col = None
for col in ['source', 'store_name', 'platform', 'store']:
    if col in df_all.columns:
        source_col = col
        break

if revenue_col and source_col:
    dsp = df_all.groupby(source_col, as_index=False)[revenue_col].sum().sort_values(revenue_col, ascending=False)
    dsp.columns = ['Platform', 'Revenue']
    
    fig = px.bar(
        dsp, 
        x='Revenue', 
        y='Platform', 
        orientation='h',
        title='Revenue by Platform/Store',
        labels={'Revenue': 'Revenue (‚Çπ)', 'Platform': 'Platform'},
        color='Revenue',
        color_continuous_scale='Blues',
        text='Revenue'
    )
    
    fig.update_traces(texttemplate='‚Çπ%{text:.2s}', textposition='outside')
    fig.update_layout(height=400, showlegend=False)
    
    try:
        fig.write_image(str(fig_dir / 'dsp_revenue.png'), scale=2, width=1000, height=400)
        print("‚úÖ Saved dsp_revenue.png\n")
    except Exception as e:
        print(f"‚ö†Ô∏è Could not save PNG: {e}")
        fig.write_html(str(fig_dir / 'dsp_revenue.html'))
        print("‚úÖ Saved dsp_revenue.html instead\n")    
        print('‚ö†Ô∏è Store or revenue column not found; please check column names.')

    else:

        display(dsp)    
    fig.show()

Creating platform/store revenue comparison...



‚ö†Ô∏è Could not save PNG: 
Image export using the "kaleido" engine requires the Kaleido package,
which can be installed using pip:

    $ pip install --upgrade kaleido

‚úÖ Saved dsp_revenue.html instead

‚ö†Ô∏è Store or revenue column not found; please check column names.


In [21]:
# Cell 4 ‚Äî Streaming Composition Pie (Unit Type)
print("Creating stream/unit composition analysis...\n")

# Attempt to find stream-type columns
pie_series = None

if 'ad_supported_streams' in df_all.columns or 'subscription_streams' in df_all.columns:
    ad = pd.to_numeric(df_all.get('ad_supported_streams', 0), errors='coerce').sum()
    sub = pd.to_numeric(df_all.get('subscription_streams', 0), errors='coerce').sum()
    pie_series = pd.Series({'Ad-supported': ad, 'Subscription': sub})
    title = 'Stream Type Distribution'
elif 'unit_type' in df_all.columns:
    pie_series = df_all.groupby('unit_type')[revenue_col].sum()
    title = 'Revenue by Unit Type'
else:
    # Fallback: distribution by source
    if source_col:
        pie_series = df_all.groupby(source_col)[revenue_col].sum()
        title = 'Revenue Distribution by Source'
    else:
        pie_series = pd.Series({'Total': total_rev})
        title = 'Total Revenue (No Breakdown Available)'

fig = px.pie(
    values=pie_series.values, 
    names=pie_series.index, 
    title=title,
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(height=500)

try:
    fig.write_image(str(fig_dir / 'stream_pie.png'), scale=2, width=800, height=500)
    print("‚úÖ Saved stream_pie.png\n")
except Exception as e:

    print(f"‚ö†Ô∏è Could not save PNG: {e}")fig.show()

    fig.write_html(str(fig_dir / 'stream_pie.html'))display(pd.DataFrame({'Category': pie_series.index, 'Value': pie_series.values}))

    print("‚úÖ Saved stream_pie.html instead\n")

SyntaxError: invalid syntax (1718078985.py, line 39)

In [None]:
# Cell 5 ‚Äî Caller Tune Overview (if data exists) or Reference Note
print("Checking for caller tune data...\n")

# Check for caller tune fields
caller_cols = [c for c in df_all.columns if 'caller' in c.lower() or 'tune' in c.lower() or 'crbt' in c.lower()]

if caller_cols:
    print(f"Found caller tune column(s): {caller_cols}\n")
    caller_dist = df_all.groupby(caller_cols[0], as_index=False)[revenue_col].sum().sort_values(revenue_col, ascending=False).head(20)
    caller_dist.to_csv(fig_dir / 'caller_tune_distribution.csv', index=False)
    print("‚úÖ Saved caller_tune_distribution.csv\n")
    display(caller_dist)
else:
    print("‚ÑπÔ∏è No caller-tune columns found in data.")
    print("Original reference PDF will be attached to the final report for caller-tune visuals.\n")
    
    # Create a note file
    with open(reports_dir / 'caller_tune_note.txt', 'w') as f:
        f.write("Caller Tune Analysis\n")
        f.write("===================\n\n")
        f.write("No caller tune data was found in the provided datasets.\n")
        f.write("Please refer to the original Dashboard PDF (appended to final report) for caller tune insights.\n")
    
    print("‚úÖ Created caller_tune_note.txt")

Checking for caller tune data...

‚ÑπÔ∏è No caller-tune columns found in data.
Original reference PDF will be attached to the final report for caller-tune visuals.

‚úÖ Created caller_tune_note.txt


In [None]:
# Cell 6 ‚Äî Country Map (if country column exists)
print("Creating country/regional analysis...\n")

if 'country' in df_all.columns:
    country_rev = df_all.groupby('country', as_index=False)[revenue_col].sum().sort_values(revenue_col, ascending=False)
    
    # Try choropleth
    try:
        fig = px.choropleth(
            country_rev, 
            locations='country', 
            locationmode='country names', 
            color=revenue_col,
            title='Revenue by Country',
            color_continuous_scale='Greens',
            labels={revenue_col: 'Revenue (‚Çπ)'}
        )
        fig.update_geos(showcountries=True, showcoastlines=True)
        fig.update_layout(height=500)
        
        try:
            fig.write_image(str(fig_dir / 'country_map.png'), scale=2, width=1000, height=500)
            print("‚úÖ Saved country_map.png\n")
        except Exception as e:
            print(f"‚ö†Ô∏è Could not save PNG: {e}")
            fig.write_html(str(fig_dir / 'country_map.html'))
            print("‚úÖ Saved country_map.html instead\n")
        
        fig.show()
    except Exception as e:
        print(f"‚ö†Ô∏è Could not create choropleth: {e}")
        print("Saving country revenue as CSV instead.\n")

        print('‚ÑπÔ∏è No country column found. Country map skipped.')

    # Always save CSVelse:

    country_rev.to_csv(fig_dir / 'country_revenue.csv', index=False)    display(country_rev)
    print("‚úÖ Saved country_revenue.csv\n")

SyntaxError: invalid syntax (2640188111.py, line 38)

In [None]:
# Cell 7 ‚Äî Top Tables Export (HTML for PDF embedding)
print("Exporting top performers tables to HTML...\n")

if top_artists is not None:
    # Add styling to HTML tables
    html_style = """
    <style>
        table { border-collapse: collapse; width: 100%; font-family: Arial, sans-serif; }
        th { background-color: #4ECDC4; color: white; padding: 12px; text-align: left; }
        td { padding: 10px; border-bottom: 1px solid #ddd; }
        tr:hover { background-color: #f5f5f5; }
    </style>
    """
    
    # Top Artists
    with open(fig_dir / 'top_artists_table.html', 'w') as f:
        f.write("<h2>Top 20 Artists by Revenue</h2>")
        f.write(html_style)
        f.write(top_artists.head(20).to_html(index=False, float_format='%.2f'))
    print("‚úÖ Saved top_artists_table.html")
    
    # Top Albums
    if top_albums is not None:
        with open(fig_dir / 'top_albums_table.html', 'w') as f:
            f.write("<h2>Top 20 Albums by Revenue</h2>")
            f.write(html_style)
            f.write(top_albums.head(20).to_html(index=False, float_format='%.2f'))
        print("‚úÖ Saved top_albums_table.html")
    
    # Top Tracks
    if top_tracks is not None:
        with open(fig_dir / 'top_tracks_table.html', 'w') as f:
            f.write("<h2>Top 20 Tracks by Revenue</h2>")
            f.write(html_style)
            f.write(top_tracks.head(20).to_html(index=False, float_format='%.2f'))
        print("‚úÖ Saved top_tracks_table.html")
    
    print(f"\nTables exported to {fig_dir}/\n")
    
    # Display preview
    print("Preview - Top 10 Artists:")
    display(top_artists.head(10))
else:
    print("‚ö†Ô∏è Top performers data not available. Please run Notebook 03 or 04 first.")

Exporting top performers tables to HTML...

‚úÖ Saved top_artists_table.html
‚úÖ Saved top_albums_table.html
‚úÖ Saved top_tracks_table.html

Tables exported to ../outputs/figures/

Preview - Top 10 Artists:


Unnamed: 0,artist,total_revenue,total_streams,avg_revenue_per_stream
0,F A Sumon,9318.861616,165875.0,0.05618
1,Kaushik Chakraborty,998.393974,17704.0,0.056394
2,Arijit Singh,755.228074,13443.0,0.05618
3,S.P. Venkatesh,677.85,9038.0,0.075
4,Pratik Sen,616.961729,8722.0,0.070736
5,Shreya Ghoshal,570.957592,10163.0,0.05618
6,Iman Chakraborty,496.481979,8836.0,0.056189
7,Rupam Islam,479.578736,6762.0,0.070923
8,Anupam Roy,475.938564,8404.0,0.056632
9,Rabindranath Tagore,422.4,5632.0,0.075


In [None]:
# Cell 8 ‚Äî Anomaly & Recommendation Summary
print("Creating anomaly and recommendation summary...\n")

summary_lines = []
summary_lines.append("=" * 80)
summary_lines.append("MUSIC ROYALTY ANALYSIS - EXECUTIVE SUMMARY")
summary_lines.append("=" * 80)
summary_lines.append("")

# Anomaly summary
n_anomalies = 0
if anomalies is not None and not anomalies.empty:
    n_anomalies = anomalies.shape[0]
    anomalies_sample = anomalies.head(50)
    anomalies_sample.to_csv(reports_dir / 'anomalies_sample_for_report.csv', index=False)
    
    summary_lines.append("ANOMALY DETECTION RESULTS")
    summary_lines.append("-" * 80)
    summary_lines.append(f"Flagged {n_anomalies:,} suspicious records by cross-sectional rules:")
    summary_lines.append("  - Extremely high revenue (top 0.5%)")
    summary_lines.append("  - Extremely high streams (top 0.5%)")
    summary_lines.append("  - Extremely high revenue-per-stream ratio (top 0.5%)")
    summary_lines.append("")
    summary_lines.append(f"See detailed report: anomalies_sample_for_report.csv")
    summary_lines.append("")
    print(f"‚úÖ Saved anomalies_sample_for_report.csv ({len(anomalies_sample)} rows)")
else:
    summary_lines.append("ANOMALY DETECTION RESULTS")
    summary_lines.append("-" * 80)
    summary_lines.append("No cross-sectional anomalies file found.")
    summary_lines.append("Ensure Notebook 04 has been run with anomaly detection enabled.")
    summary_lines.append("")

# Key findings
summary_lines.append("KEY FINDINGS")
summary_lines.append("-" * 80)
if top_artists is not None and not top_artists.empty:
    summary_lines.append(f"1. Top Artist: {top_artists.iloc[0]['artist']} (‚Çπ{top_artists.iloc[0]['total_revenue']:,.2f})")
    summary_lines.append(f"2. Total Revenue: ‚Çπ{total_rev:,.2f}")
    summary_lines.append(f"3. Unique Artists: {len(top_artists):,}")
    summary_lines.append(f"4. Unique Tracks: {len(top_tracks) if top_tracks is not None else 'N/A':,}")
else:
    summary_lines.append("Run Notebook 03 or 04 to generate key findings.")
summary_lines.append("")

# Recommendations
summary_lines.append("RECOMMENDATIONS")
summary_lines.append("-" * 80)
recommendations = [
    "1. Data Quality: Audit flagged anomaly rows for ingestion duplicates or reporting errors.",
    "2. Revenue Focus: Prioritize artists/albums with concentrated revenue for marketing efforts.",
    "3. Attribution: If caller-tune or promo info is missing, request those fields for better analysis.",
    "4. Platform Strategy: Analyze platform-specific pricing and monetization strategies.",
    "5. Artist Development: Consider development programs for emerging talent.",
    "6. Catalog Management: Optimize catalog management for long-tail monetization.",
    "7. Cross-Platform: Encourage multi-platform distribution for better revenue diversification.",
    "8. Fraud Prevention: Implement automated fraud detection based on IsolationForest findings."
]
summary_lines.extend(recommendations)
summary_lines.append("")
summary_lines.append("=" * 80)
summary_lines.append(f"Report generated on: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}")
summary_lines.append("=" * 80)

# Write to file
with open(reports_dir / 'report_recommendations.txt', 'w') as f:
    f.write('\n'.join(summary_lines))

print("‚úÖ Saved report_recommendations.txt\n")
print('\n'.join(summary_lines))

Creating anomaly and recommendation summary...

‚úÖ Saved anomalies_sample_for_report.csv (47 rows)
‚úÖ Saved report_recommendations.txt

MUSIC ROYALTY ANALYSIS - EXECUTIVE SUMMARY

ANOMALY DETECTION RESULTS
--------------------------------------------------------------------------------
Flagged 47 suspicious records by cross-sectional rules:
  - Extremely high revenue (top 0.5%)
  - Extremely high streams (top 0.5%)
  - Extremely high revenue-per-stream ratio (top 0.5%)

See detailed report: anomalies_sample_for_report.csv

KEY FINDINGS
--------------------------------------------------------------------------------
1. Top Artist: F A Sumon (‚Çπ9,318.86)
2. Total Revenue: ‚Çπ29,700.78
3. Unique Artists: 20
4. Unique Tracks: 50

RECOMMENDATIONS
--------------------------------------------------------------------------------
1. Data Quality: Audit flagged anomaly rows for ingestion duplicates or reporting errors.
2. Revenue Focus: Prioritize artists/albums with concentrated revenue for 

In [None]:
# Cell 9 ‚Äî Assemble PDF Pages Programmatically
print("Assembling final PDF report...\n")

try:
    import img2pdf
    
    # Collect all PNG images
    image_files = sorted(fig_dir.glob('*.png'))
    pngs = [str(p) for p in image_files]
    
    if pngs:
        print(f"Found {len(pngs)} PNG images to include in report:")
        for png in pngs:
            print(f"  - {Path(png).name}")
        
        # Create PDF from images
        generated_pdf = reports_dir / 'final_report_generated.pdf'
        with open(generated_pdf, 'wb') as f:
            f.write(img2pdf.convert(pngs))
        print(f"\n‚úÖ Created generated report: {generated_pdf.name}\n")
        
        # Append original reference PDF if it exists
        reference_pdf = DATA_DIR / 'Dashboard - Overview (1).pdf'
        final_pdf = reports_dir / 'final_report_combined.pdf'
        
        writer = PdfWriter()
        
        # Add generated pages
        if generated_pdf.exists():
            reader = PdfReader(str(generated_pdf))
            for page in reader.pages:
                writer.add_page(page)
            print(f"Added {len(reader.pages)} pages from generated report")
        
        # Add reference PDF if it exists
        if reference_pdf.exists():
            reader = PdfReader(str(reference_pdf))
            for page in reader.pages:
                writer.add_page(page)
            print(f"Added {len(reader.pages)} pages from reference PDF")
        else:
            print(f"‚ö†Ô∏è Reference PDF not found: {reference_pdf}")
        
        # Write final combined PDF
        with open(final_pdf, 'wb') as f:
            writer.write(f)
        
        print(f"\nüéâ Final combined PDF created: {final_pdf}")
        print(f"   Location: {final_pdf.resolve()}")
        
    else:
        print("‚ö†Ô∏è No PNG images found to create PDF")
        print("   Please run previous cells to generate visualizations")
        
except ImportError:
    print("‚ö†Ô∏è img2pdf not installed. Installing...")
    import subprocess
    subprocess.check_call(['pip', 'install', 'img2pdf'])
    print("‚úÖ Installed img2pdf. Please re-run this cell.")
except Exception as e:
    print(f"‚ùå Error creating PDF: {e}")
    print("\nAlternative: Use 'File > Export Notebook As > PDF' in Jupyter to export this notebook.")

Assembling final PDF report...

Found 7 PNG images to include in report:
  - dsp_revenue_share.png
  - monthly_revenue_trend.png
  - platform_performance_dashboard.png
  - platform_revenue_comparison.png
  - revenue_by_country.png
  - stream_type_share.png
  - top_15_artists.png


Image contains an alpha channel. Computing a separate soft mask (/SMask) image to store transparency in PDF.
Image contains an alpha channel. Computing a separate soft mask (/SMask) image to store transparency in PDF.
Image contains an alpha channel. Computing a separate soft mask (/SMask) image to store transparency in PDF.
Image contains an alpha channel. Computing a separate soft mask (/SMask) image to store transparency in PDF.
Image contains an alpha channel. Computing a separate soft mask (/SMask) image to store transparency in PDF.
Image contains an alpha channel. Computing a separate soft mask (/SMask) image to store transparency in PDF.
Image contains an alpha channel. Computing a separate soft mask (/SMask) image to store transparency in PDF.



‚úÖ Created generated report: final_report_generated.pdf

Added 7 pages from generated report
‚ö†Ô∏è Reference PDF not found: ../data/Dashboard - Overview (1).pdf

üéâ Final combined PDF created: ../outputs/reports/final_report_combined.pdf
   Location: /home/parambrata-ghosh/Development/Personal/internship-assignment/reresumesubmissionparambrataghosh/outputs/reports/final_report_combined.pdf


In [None]:
# Cell 10 ‚Äî Package Deliverables into ZIP
print("Creating deliverable package (ZIP)...\n")

import shutil
from datetime import datetime

# Create timestamp for unique filename
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
zip_name = f'deliverable_package_{timestamp}'

# Create ZIP archive
try:
    shutil.make_archive(
        str(out_dir / zip_name), 
        'zip', 
        root_dir=str(out_dir),
        base_dir='.'
    )
    
    zip_path = out_dir / f'{zip_name}.zip'
    zip_size_mb = zip_path.stat().st_size / (1024 * 1024)
    
    print(f"‚úÖ Deliverable package created: {zip_path.name}")
    print(f"   Location: {zip_path.resolve()}")
    print(f"   Size: {zip_size_mb:.2f} MB")
    print("\nPackage contents:")
    print("  üìÅ cleaned/      - Cleaned and normalized datasets")
    print("  üìÅ figures/      - Visualization PNGs and tables")
    print("  üìÅ reports/      - Final PDF and recommendation summary")
    print("  üìÅ advanced/     - Advanced analysis outputs (if Notebook 04 was run)")
    
except Exception as e:
    print(f"‚ùå Error creating ZIP: {e}")

## Deliverables Summary

**Generated Files**:

### Visualizations (outputs/figures/)
- `kpi_cards.png` - Key performance indicator dashboard
- `dsp_revenue.png` - Platform/store revenue comparison
- `stream_pie.png` - Stream composition breakdown
- `country_map.png` - Geographic revenue distribution (if applicable)
- Various other charts from Notebooks 03 & 04

### Tables (outputs/figures/)
- `top_artists_table.html` - Top 20 artists by revenue
- `top_albums_table.html` - Top 20 albums by revenue
- `top_tracks_table.html` - Top 20 tracks by revenue
- `country_revenue.csv` - Revenue by country breakdown

### Reports (outputs/reports/)
- `final_report_combined.pdf` - Complete report with visualizations + reference PDF
- `report_recommendations.txt` - Executive summary and recommendations
- `anomalies_sample_for_report.csv` - Flagged suspicious records

### Package
- `deliverable_package_[timestamp].zip` - Complete deliverable archive

**Next Steps**:
1. Review the final PDF report
2. Share the ZIP package with stakeholders
3. Implement recommended actions
4. Schedule regular reporting cadence

**Notes**:
- All analyses are time-agnostic (no date dependencies)
- Cross-sectional insights focus on current state
- Scenario projections replace time-series forecasts
- Original reference PDF appended for completeness