CONTEXT:
I am a hospital data architect evaluating a robotic surgery platform. I need to
generate a synthetic dataset of 500 surgical cases to analyze whether robotic
surgery performance varies by patient BMI. This dataset will be used for
statistical analysis in a separate analysis phase.

ROLE:
You are a Senior Hospital Data Analyst with expertise in surgical outcomes
research and Python-based synthetic data generation.

TASK:
Generate a complete, executable Python script that creates a realistic synthetic
dataset of 500 surgical cases and saves it as a CSV file for download.

DATASET SPECIFICATIONS:

Required Columns:
1. Patient_ID: Unique identifier (format: "STD-0001" for standard, "ROB-0001" for robotic)
2. BMI: Body Mass Index in kg/m²
3. Surgery_Type: Either "Standard" or "Robot" (250 cases each)
4. Duration_Minutes: Surgical duration in minutes
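
For reference, a minimal sketch of how the identifier and surgery-type columns could be built, assuming IDs are numbered from 1 within each group and zero-padded to four digits. The function name build_case_labels is illustrative and not part of the specification.

```python
import pandas as pd

def build_case_labels(n_per_group=250):
    """Return Patient_ID and Surgery_Type columns for both surgery groups."""
    rows = []
    for surgery_type, prefix in (("Standard", "STD"), ("Robot", "ROB")):
        for i in range(1, n_per_group + 1):
            # Zero-padded counter gives IDs such as "STD-0001" and "ROB-0250".
            rows.append({"Patient_ID": f"{prefix}-{i:04d}",
                         "Surgery_Type": surgery_type})
    return pd.DataFrame(rows)

labels = build_case_labels()
```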

BMI Distribution:
- Normal distribution centered at mean = 30 kg/m²
- Standard deviation = 6 kg/m²
- Range: 18 to 50 kg/m² (truncated normal distribution)
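
One possible way to draw the truncated BMI values is rejection sampling with NumPy: draw from the untruncated normal and keep only values inside the range. This is a sketch under that assumption; the function name and the seed are illustrative choices.

```python
import numpy as np

def sample_truncated_bmi(n, mean=30.0, sd=6.0, low=18.0, high=50.0, seed=42):
    """Draw n values from N(mean, sd) restricted to [low, high]."""
    rng = np.random.default_rng(seed)
    values = np.empty(0)
    while values.size < n:
        draws = rng.normal(mean, sd, size=2 * n)
        # Keep only draws inside the allowed range, discard the rest.
        draws = draws[(draws >= low) & (draws <= high)]
        values = np.concatenate([values, draws])
    return values[:n]

bmi = sample_truncated_bmi(500)
```

Rejection sampling preserves the shape of the normal curve inside the range. Clipping out-of-range values to 18 or 50 instead would pile cases up at the two boundaries.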

CRITICAL CONSTRAINT - THE INTERACTION EFFECT:

Standard Surgery Duration:
- Baseline: 120 minutes average
- BMI effect: Slight linear increase with higher BMI (+0.5 min per BMI point above 30)
- Variability: ±15 minutes (standard deviation)
- Pattern: Relatively consistent across all BMI ranges

Robotic Surgery Duration (⚠️ BMI-DEPENDENT PERFORMANCE):

IF BMI < 35 (Non-obese to Class I Obesity):
  • Baseline: 90 minutes (30 minutes FASTER than standard)
  • BMI effect: Minimal (+0.2 min per BMI point)
  • Variability: ±12 minutes (standard deviation)
  • Rationale: Better visualization, precision, tremor elimination

IF BMI ≥ 35 (Class II/III Obesity):
  • Baseline: 150 minutes (30 minutes SLOWER than standard)
  • BMI effect: Steep increase (+1.5 min per BMI point above 35)
  • Variability: ±20 minutes (standard deviation; higher due to technical challenges)
  • Rationale: Port placement difficulty, limited instrument reach,
               workspace constraints, loss of haptic feedback
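
The duration rules above can be sketched as a piecewise mean plus normal noise. Two readings here are assumptions, not part of the specification: the Standard slope is applied only above BMI 30 (no adjustment below it), and the Robot "+0.2 min per BMI point" is measured from the lower BMI bound of 18. All function names are illustrative.

```python
import numpy as np

def expected_duration(bmi, surgery_type):
    """Mean duration in minutes under one reading of the rules above."""
    bmi = np.asarray(bmi, dtype=float)
    if surgery_type == "Standard":
        # ASSUMPTION: +0.5 min per BMI point applies only above BMI 30.
        return 120.0 + 0.5 * np.clip(bmi - 30.0, 0.0, None)
    if surgery_type == "Robot":
        # ASSUMPTION: the +0.2 slope is measured from the lower bound, BMI 18.
        below = 90.0 + 0.2 * (bmi - 18.0)
        above = 150.0 + 1.5 * (bmi - 35.0)
        return np.where(bmi < 35.0, below, above)
    raise ValueError(f"Unknown surgery type: {surgery_type!r}")

def duration_sd(bmi, surgery_type):
    """Standard deviation of the noise term for each case."""
    bmi = np.asarray(bmi, dtype=float)
    if surgery_type == "Standard":
        return np.full_like(bmi, 15.0)
    return np.where(bmi < 35.0, 12.0, 20.0)

def simulate_duration(bmi, surgery_type, seed=0):
    """Mean duration plus normally distributed noise."""
    rng = np.random.default_rng(seed)
    mean = expected_duration(bmi, surgery_type)
    noise = rng.normal(0.0, 1.0, size=np.shape(mean))
    return mean + noise * duration_sd(bmi, surgery_type)
```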

OUTPUT FORMAT:

Provide complete Python code that:
1. Installs required libraries (pandas, numpy)
2. Generates 500 surgical cases with the specifications above
3. Creates a pandas DataFrame with the 4 required columns
4. Saves the DataFrame as 'robot_surgery_data.csv' in /tmp/ directory
5. Downloads the CSV file to my computer via files.download()
6. Displays summary statistics showing the interaction effect

The code must be ready to copy/paste into Google Colab and run immediately.
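
The save-and-download step might look like the sketch below. It assumes the Colab helper google.colab.files is only importable inside Colab, so the import is guarded and the function falls back to returning the file path elsewhere. The function name is illustrative.

```python
import os
import pandas as pd

def save_and_download(df, path="/tmp/robot_surgery_data.csv"):
    """Write the DataFrame to CSV and trigger a browser download in Colab."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    df.to_csv(path, index=False)
    try:
        from google.colab import files  # available only inside Colab
    except ImportError:
        return path  # outside Colab: the file is saved, nothing to download
    files.download(path)
    return path
```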

SUCCESS CRITERIA:
The generated CSV should demonstrate a clear interaction effect when analyzed:
- Low BMI patients: Robot faster than Standard
- High BMI patients: Robot slower than Standard
- BMI threshold around 35 where crossover occurs

CONTEXT:
I have a CSV file called 'robot_surgery_data.csv' containing 500 surgical cases
comparing robotic vs. standard surgery techniques. The dataset includes Patient_ID,
BMI, Surgery_Type, and Duration_Minutes. I suspect there is a BMI-dependent
interaction effect where robotic surgery performance varies based on patient obesity
level, but I need statistical analysis and visualization to confirm this.

ROLE:
You are a Senior Biostatistician and Data Visualization Expert specializing in
surgical outcomes research and interaction effect analysis.

TASK:
Generate a complete, executable Python script that:
1. Prompts me to UPLOAD the CSV file 'robot_surgery_data.csv'
2. Loads and validates the data
3. Performs comprehensive statistical analysis
4. Creates professional visualizations revealing the interaction effect
5. Generates an executive summary with clinical recommendations

REQUIRED ANALYSIS COMPONENTS:

1. DATA UPLOAD & VALIDATION:
   • Use files.upload() to prompt for CSV upload
   • Load CSV with pandas
   • Validate columns: Patient_ID, BMI, Surgery_Type, Duration_Minutes
   • Check for missing values
   • Display data summary (first 10 rows, descriptive statistics)
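
A minimal sketch of the load-and-validate step, assuming the uploaded file arrives as raw bytes (in Colab, files.upload() returns a dict mapping each filename to its bytes). The function name load_and_validate is illustrative.

```python
import io
import pandas as pd

REQUIRED_COLUMNS = ["Patient_ID", "BMI", "Surgery_Type", "Duration_Minutes"]

def load_and_validate(csv_bytes):
    """Parse CSV bytes, check required columns, and count missing values."""
    try:
        df = pd.read_csv(io.BytesIO(csv_bytes))
    except (pd.errors.ParserError, pd.errors.EmptyDataError,
            UnicodeDecodeError) as exc:
        raise ValueError(f"File could not be read as CSV: {exc}") from exc
    missing_columns = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing_columns:
        raise ValueError(f"Missing required columns: {missing_columns}")
    missing_values = df[REQUIRED_COLUMNS].isna().sum()
    return df, missing_values

# In Colab the bytes would come from the upload widget, for example:
#   from google.colab import files
#   uploaded = files.upload()
#   df, missing = load_and_validate(next(iter(uploaded.values())))
#   print(df.head(10)); print(df.describe()); print(missing)
```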

2. DESCRIPTIVE STATISTICS:
   • Overall statistics by Surgery_Type
   • BMI-stratified analysis (split at BMI = 35)
   • Compare robot vs. standard within each BMI category
   • Calculate mean duration differences
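
The stratified comparison could be computed along these lines. This is a sketch that assumes the column names listed above and a single split at BMI = 35; the function name and the group labels are illustrative.

```python
import numpy as np
import pandas as pd

def stratified_mean_difference(df, threshold=35.0):
    """Mean duration per BMI group and surgery type, plus Robot minus Standard."""
    low_label = f"BMI < {threshold:g}"
    high_label = f"BMI >= {threshold:g}"
    grouped = df.assign(
        BMI_Group=np.where(df["BMI"] < threshold, low_label, high_label)
    )
    means = grouped.pivot_table(index="BMI_Group", columns="Surgery_Type",
                                values="Duration_Minutes", aggfunc="mean")
    # Negative values mean the robot is faster in that BMI group.
    means["Robot_minus_Standard"] = means["Robot"] - means["Standard"]
    return means
```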

3. INTERACTION EFFECT DETECTION:
   • Perform linear regression with interaction term:
     Duration_Minutes ~ Surgery_Type + BMI + (Surgery_Type × BMI)
   • Report coefficients and p-values
   • Explain whether interaction is statistically significant
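
As an illustration of the interaction model, the sketch below fits it by ordinary least squares using NumPy only and reports coefficients, standard errors, and t-statistics. It does not compute p-values; those need a t-distribution, which a statistics package such as statsmodels would normally supply. The function name and the row labels are illustrative.

```python
import numpy as np
import pandas as pd

def fit_interaction_model(df):
    """OLS fit of Duration_Minutes on Robot indicator, BMI, and their product."""
    robot = (df["Surgery_Type"] == "Robot").to_numpy(dtype=float)
    bmi = df["BMI"].to_numpy(dtype=float)
    y = df["Duration_Minutes"].to_numpy(dtype=float)
    # Design matrix columns: intercept, Robot, BMI, Robot x BMI.
    X = np.column_stack([np.ones_like(bmi), robot, bmi, robot * bmi])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return pd.DataFrame(
        {"coef": beta, "std_err": std_err, "t_stat": beta / std_err},
        index=["Intercept", "Robot", "BMI", "Robot:BMI"],
    )
```

The "Robot:BMI" row is the interaction term: it estimates how much steeper (or flatter) the BMI slope is for robotic cases than for standard ones.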

4. VISUALIZATIONS (Create 2-panel figure):

   Panel 1: Scatter Plot with Fitted Curves
   • X-axis: BMI (18-50)
   • Y-axis: Duration_Minutes
   • Color-coded points: Standard (blue) vs Robot (red)
   • Fitted polynomial curves for each surgery type
   • Vertical line at BMI = 35 (critical threshold)
   • Annotations showing "Robot Faster" and "Robot Slower" regions
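
The fitted curves for Panel 1 could be obtained with a polynomial least-squares fit per surgery type. The degree is not stated above, so degree 2 is an assumption here, as are the function name and the 200-point grid.

```python
import numpy as np

def fit_curve(bmi, duration, degree=2, grid=None):
    """Fit a polynomial to (bmi, duration) and evaluate it on a BMI grid."""
    if grid is None:
        grid = np.linspace(18.0, 50.0, 200)
    coeffs = np.polyfit(bmi, duration, deg=degree)
    return grid, np.polyval(coeffs, grid)
```

Each surgery type would get its own call to fit_curve, with the returned grid and fitted values passed to a line plot on the scatter axes, and a vertical line drawn at BMI = 35.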
   
   Panel 2: Grouped Box Plots
   • X-axis: BMI categories (Normal <25, Overweight 25 to <30, Obese I 30 to <35,
             Obese II 35 to <40, Obese III ≥40)
   • Y-axis: Duration_Minutes
   • Grouped boxes: Standard vs Robot within each BMI category
   • Shows crossover pattern clearly
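
Assigning the BMI categories for Panel 2 can be sketched with pandas.cut. The bins are assumed to be half-open and closed on the left, so that a BMI of exactly 35 lands in Obese II, in line with the BMI ≥ 35 threshold used elsewhere. The function and constant names are illustrative.

```python
import numpy as np
import pandas as pd

BMI_BINS = [-np.inf, 25, 30, 35, 40, np.inf]
BMI_LABELS = ["Normal", "Overweight", "Obese I", "Obese II", "Obese III"]

def add_bmi_category(df):
    """Return a copy of df with an ordered BMI_Category column."""
    out = df.copy()
    # right=False makes each bin include its lower edge and exclude its upper.
    out["BMI_Category"] = pd.cut(out["BMI"], bins=BMI_BINS,
                                 labels=BMI_LABELS, right=False)
    return out
```

The resulting column is an ordered categorical, so a grouped box plot keyed on BMI_Category with Surgery_Type as the grouping variable keeps the categories in clinical order.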

5. EXECUTIVE SUMMARY (Print to console):
   • Key finding: "Robot is X min faster for BMI <35, Y min slower for BMI ≥35"
   • Statistical significance of interaction effect
   • Clinical recommendation: Which patients should receive robotic surgery?
   • Financial implication: Percentage of patients who benefit vs. those who do not

6. OUTPUT FILES:
   • Save visualization as 'surgery_interaction_analysis.png' (300 DPI)
   • Download visualization automatically
   • Option to save statistical results as 'analysis_results.txt'

OUTPUT FORMAT:

Provide complete Python code ready to copy/paste into Google Colab that:
1. Clearly labels each analysis section with print statements
2. Uses matplotlib/seaborn for high-quality visualizations
3. Includes comprehensive comments explaining each step
4. Handles potential errors (e.g., wrong file format, missing columns)
5. Produces publication-ready figures
6. Generates actionable clinical insights

CRITICAL REQUIREMENTS:
- The code must START by prompting for file upload (not assume file is present)
- Must detect and quantify the interaction effect statistically
- Must create visualizations that clearly show crossover at BMI ≈ 35
- Must provide clinical interpretation, not just statistical output

SUCCESS CRITERIA:
After running this code, I should be able to:
- Confirm whether the interaction effect exists (statistical test)
- Visualize exactly where the crossover occurs (BMI threshold)
- Make evidence-based decisions about patient selection for robotic surgery
- Present findings to hospital leadership with professional figures