<a href="https://colab.research.google.com/github/eoinleen/Biophysics-general/blob/main/FIDA_Binding_Kd_fit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
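
The docstring in the code cell below lists four binding models. As a quick orientation, here is a minimal sketch of them as plain NumPy functions, with a numerical check of the stated equivalence between the Rh and Prism parameterizations. The function names and parameter values are illustrative assumptions, not necessarily those used elsewhere in this notebook.

```python
import numpy as np


def single_site_rh(L, rh_free, rh_bound, kd):
    """Model 1: Rh_obs = Rh_free + (Rh_bound - Rh_free) * [L] / (Kd + [L])."""
    L = np.asarray(L, dtype=float)
    return rh_free + (rh_bound - rh_free) * L / (kd + L)


def prism_one_site(x, bmax, kd, background):
    """Model 2: Y = Bmax * X / (Kd + X) + Background."""
    x = np.asarray(x, dtype=float)
    return bmax * x / (kd + x) + background


def prism_nonspecific(x, bmax, kd, ns, background):
    """Model 3: model 2 plus a linear non-specific term NS * X."""
    x = np.asarray(x, dtype=float)
    return bmax * x / (kd + x) + ns * x + background


def hill_rh(L, rh_free, rh_bound, kd, n):
    """Model 4: Hill equation with cooperativity parameter n."""
    L = np.asarray(L, dtype=float)
    return rh_free + (rh_bound - rh_free) * L**n / (kd**n + L**n)


# Made-up values for illustration only (concentration in µM, Rh in nm)
conc = np.array([0.0, 2.5, 5.0, 10.0])
rh_free, rh_bound, kd = 3.4, 3.9, 1.0

model1 = single_site_rh(conc, rh_free, rh_bound, kd)

# Conversion from the docstring: Background = Rh_free, Bmax = Rh_bound - Rh_free
model2 = prism_one_site(conc, bmax=rh_bound - rh_free, kd=kd, background=rh_free)
models_agree = np.allclose(model1, model2)

# With n = 1 the Hill equation reduces to the single-site model
hill_reduces = np.allclose(hill_rh(conc, rh_free, rh_bound, kd, n=1.0), model1)

print(f"Models 1 and 2 agree: {models_agree}")
print(f"Hill with n=1 equals model 1: {hill_reduces}")
```

Because models 1 and 2 describe the same curve, fitting both is mainly useful for reporting parameters in whichever convention a reader expects.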

In [29]:
"""
================================================================================
FIDA (Flow-Induced Dispersion Analysis) Binding Data Analysis Tool v3.0
================================================================================

EXPECTED INPUT DATA FORMAT:
---------------------------
The script expects an Excel file (.xlsx) with specific column naming:
• Concentration column: 'protein_conc', 'concentration', 'ligand_conc', etc.
• Rh measurement columns: 'Rh_1', 'Rh_2', 'Rh_3', etc. (technical replicates)
• Units: Concentration in μM, Rh values in nm
• Missing data: Leave blank or use NaN (automatically handled)

Example structure:
| protein_conc | Rh_1  | Rh_2  | Rh_3  |
|--------------|-------|-------|-------|
| 10.0         | 3.78  | 3.89  | 3.72  |
| 5.0          | 3.77  | 3.89  | 3.71  |
| 2.5          | 3.75  | 3.89  | NaN   |
| 0.0          | 3.40  | 3.41  | 3.38  |

BINDING MODELS AND EQUATIONS:
=============================

1. Single-Site Binding (Rh Format):
   Rh_obs = Rh_free + (Rh_bound - Rh_free) × [L] / (Kd + [L])

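   This is the standard 1:1 binding isotherm describing the equilibrium between
   the free and the complexed species. [L] is the concentration of the titrated
   binding partner (the concentration column); the equation assumes its free
   concentration is close to the total added. As [L] increases, the observed Rh
   approaches Rh_bound asymptotically.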

2. GraphPad Prism One-Site Binding:
   Y = Bmax × X / (Kd + X) + Background

   Mathematically identical to model 1, but parameterized differently for
   compatibility with GraphPad Prism. Background = Rh_free, Bmax = binding amplitude.

3. Prism with Non-Specific Binding:
   Y = Bmax × X / (Kd + X) + NS × X + Background

   Accounts for linear non-specific interactions that don't saturate. The NS term
   represents non-specific binding that increases linearly with concentration.

4. Hill Equation (Cooperative Binding):
   Rh_obs = Rh_free + (Rh_bound - Rh_free) × [L]^n / (Kd^n + [L]^n)

   Includes cooperativity parameter (n). When n > 1, binding shows positive
   cooperativity (sigmoidal curve); n < 1 indicates negative cooperativity.

FITTING ALGORITHM COMPARISON:
=============================
GraphPad Prism uses iterative least-squares fitting with user-defined constraints
and starting values. This Python script uses scipy.optimize.curve_fit, which:

• Uses the Levenberg-Marquardt algorithm by default (non-linear least squares);
  if parameter bounds are supplied, SciPy switches to a trust-region method
• Is given starting parameters that the script estimates from the data
• Allows up to 10,000 function evaluations (maxfev=10000); this is an upper
  limit, not a fixed number of iterations
• Returns a covariance matrix, from which parameter standard errors are derived
• Solves the same least-squares problem as Prism, so results should agree
  closely when the model, data, and weighting are the same

The key advantage: starting values are chosen automatically, so most fits
converge without manual intervention.

STATISTICAL OUTPUTS:
===================
• R²: Coefficient of determination (>0.95 excellent, >0.90 good, >0.80 acceptable)
• AIC: Akaike Information Criterion for model selection (lower = better)
• Parameter uncertainties: Standard errors from covariance matrix
• Model comparison: Statistical ranking of all fitted models

================================================================================
"""


# =============================================================================
# EXCEL EXPORT AND ZIP CREATION
# =============================================================================

print("Creating Excel export with all analysis results...")

# Prepare experimental data for export
if 'experimental_data' in locals():
    # Convert experimental_data to proper format with consistent lengths
    exp_export_data = []
    max_len = len(experimental_data.get('concentration_experimental', []))

    # Create rows for each experimental point
    for i in range(max_len):
        row = {'concentration': experimental_data['concentration_experimental'][i]}
        for rh_col in rh_columns:
            exp_col_name = f'{rh_col}_experimental'
            if exp_col_name in experimental_data and i < len(experimental_data[exp_col_name]):
                row[rh_col] = experimental_data[exp_col_name][i]
            else:
                row[rh_col] = np.nan
        exp_export_data.append(row)

    # Convert to DataFrame
    exp_df = pd.DataFrame(exp_export_data) if exp_export_data else None
else:
    # Fallback: recreate experimental data in long format
    exp_export_rows = []
    for rh_col in rh_columns:
        results = all_results[rh_col]
        if results is not None:
            conc_clean, rh_clean = results['clean_data']
            for conc, rh in zip(conc_clean, rh_clean):
                exp_export_rows.append({
                    'dataset': rh_col,
                    'concentration': conc,
                    'rh_value': rh
                })
    exp_df = pd.DataFrame(exp_export_rows) if exp_export_rows else None

# Create Excel file with multiple sheets
excel_filename = f'FIDA_analysis_{filename.split(".")[0]}.xlsx'
with pd.ExcelWriter(excel_filename, engine='xlsxwriter') as writer:

    # Sheet 1: Original Data
    df.to_excel(writer, sheet_name='Original_Data', index=False)

    # Sheet 2: Experimental Data (cleaned)
    if exp_df is not None and not exp_df.empty:
        exp_df.to_excel(writer, sheet_name='Experimental_Data', index=False)

    # Sheet 3: Fit Curves
    if export_data:
        fit_df = pd.DataFrame(export_data)
        fit_df.to_excel(writer, sheet_name='Fit_Curves', index=False)

    # Sheet 4: Parameter Summary
    if summary_data:
        summary_df = pd.DataFrame(summary_data)
        summary_df.to_excel(writer, sheet_name='Parameters', index=False)

    # Sheet 5: Model Comparison
    if comparison_df is not None and not comparison_df.empty:
        comparison_df.to_excel(writer, sheet_name='Model_Comparison', index=False)

    # Sheet 6: Parameter Conversions
    if conversion_data:
        conversion_df = pd.DataFrame(conversion_data)
        conversion_df.to_excel(writer, sheet_name='Parameter_Conversions', index=False)

print(f"Excel file '{excel_filename}' created successfully!")

# Create CSV files for external plotting
csv_filename = f'FIDA_fit_curves_{filename.split(".")[0]}.csv'
if export_data:
    fit_df = pd.DataFrame(export_data)
    fit_df.to_csv(csv_filename, index=False)
    print(f"CSV file '{csv_filename}' created successfully!")

exp_csv_filename = f'FIDA_experimental_data_{filename.split(".")[0]}.csv'
if exp_df is not None and not exp_df.empty:
    exp_df.to_csv(exp_csv_filename, index=False)
    print(f"Experimental data CSV '{exp_csv_filename}' created successfully!")
else:
    print("No experimental data available for CSV export")

# Save the main combined plot
main_plot_filename = f'FIDA_combined_analysis_{filename.split(".")[0]}.png'
fig.savefig(main_plot_filename, dpi=300, bbox_inches='tight',
           facecolor='none', edgecolor='none', transparent=True)
print(f"Main plot '{main_plot_filename}' saved successfully!")

# Create comprehensive statistics file
stats_filename = f'FIDA_statistics_{filename.split(".")[0]}.txt'
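# UTF-8 is set explicitly because the report contains non-ASCII characters (×, •, ²)
with open(stats_filename, 'w', encoding='utf-8') as f: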
    f.write("FIDA BINDING ANALYSIS - STATISTICAL SUMMARY\n")
    f.write("="*60 + "\n\n")

    f.write(f"Analysis Date: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
    f.write(f"Input File: {filename}\n")
    f.write(f"Datasets Analyzed: {len(rh_columns)}\n\n")

    # Model explanations
    f.write("BINDING MODELS FITTED:\n")
    f.write("-" * 30 + "\n")
    f.write("1. Single-site binding (Rh format):\n")
    f.write("   Rh_obs = Rh_free + (Rh_bound - Rh_free) × [L] / (Kd + [L])\n\n")
    f.write("2. Prism one-site binding:\n")
    f.write("   Y = Bmax × X / (Kd + X) + Background\n\n")
    f.write("3. Prism with non-specific binding:\n")
    f.write("   Y = Bmax × X / (Kd + X) + NS × X + Background\n\n")
    f.write("4. Hill equation (cooperative binding):\n")
    f.write("   Rh_obs = Rh_free + (Rh_bound - Rh_free) × [L]^n / (Kd^n + [L]^n)\n\n")

    # Statistical interpretation
    f.write("STATISTICAL INTERPRETATION:\n")
    f.write("-" * 30 + "\n")
    f.write("• R² > 0.95: Excellent fit\n")
    f.write("• R² > 0.90: Good fit\n")
    f.write("• R² > 0.80: Acceptable fit\n")
    f.write("• Lower AIC values indicate better models\n")
    f.write("• Hill coefficient (n):\n")
    f.write("  - n = 1: Non-cooperative binding\n")
    f.write("  - n > 1: Positive cooperativity\n")
    f.write("  - n < 1: Negative cooperativity\n\n")

    # Results summary
    if comparison_df is not None and not comparison_df.empty:
        f.write("MODEL COMPARISON RESULTS:\n")
        f.write("-" * 30 + "\n")
        f.write(comparison_df.to_string(index=False))
        f.write("\n\n")

        # Recommendations
        f.write("RECOMMENDATIONS:\n")
        f.write("-" * 15 + "\n")
        for dataset in comparison_df['Dataset'].unique():
            recommended = comparison_df[(comparison_df['Dataset'] == dataset) &
                                     (comparison_df['Recommended'] == True)]
            if not recommended.empty:
                best_model = recommended.iloc[0]['Model']
                best_r2 = recommended.iloc[0]['R²']
                best_aic = recommended.iloc[0]['AIC']
                f.write(f"{dataset}: {best_model} (R² = {best_r2:.4f}, AIC = {best_aic:.2f})\n")

    # Parameter summary
    if summary_data:
        f.write("\n\nDETAILED PARAMETERS:\n")
        f.write("-" * 20 + "\n")
        summary_df = pd.DataFrame(summary_data)
        f.write(summary_df.to_string(index=False))

print(f"Statistics file '{stats_filename}' created successfully!")

# Create README file
readme_filename = f'FIDA_README_{filename.split(".")[0]}.txt'
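# UTF-8 is set explicitly because the README contains non-ASCII characters (•, ↔)
with open(readme_filename, 'w', encoding='utf-8') as f: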
    f.write("FIDA BINDING ANALYSIS RESULTS PACKAGE\n")
    f.write("="*50 + "\n\n")
    f.write("This package contains the complete analysis results from the FIDA\n")
    f.write("(Flow-Induced Dispersion Analysis) binding data analysis.\n\n")

    f.write("FILES INCLUDED:\n")
    f.write("-" * 15 + "\n")
    f.write(f"• {excel_filename} - Complete analysis in Excel format\n")
    f.write(f"• {csv_filename} - Fit curves for external plotting\n")
    f.write(f"• {exp_csv_filename} - Experimental data points\n")
    f.write(f"• {main_plot_filename} - Combined analysis plots\n")

    # Add individual plot files if they exist
    if 'individual_figures' in locals() and individual_figures:
        for fig_name in individual_figures:
            f.write(f"• {fig_name} - Individual model plots\n")

    f.write(f"• {stats_filename} - Statistical summary and interpretation\n")
    f.write(f"• {readme_filename} - This file\n\n")

    f.write("EXCEL SHEETS EXPLAINED:\n")
    f.write("-" * 22 + "\n")
    f.write("• Original_Data: Your input data\n")
    f.write("• Experimental_Data: Cleaned data used for fitting\n")
    f.write("• Fit_Curves: Smooth curves for all fitted models\n")
    f.write("• Parameters: All fitted parameters with uncertainties\n")
    f.write("• Model_Comparison: Statistical comparison of all models\n")
    f.write("• Parameter_Conversions: Rh ↔ Prism format conversions\n\n")

    f.write("FOR GRAPHPAD PRISM USERS:\n")
    f.write("-" * 25 + "\n")
    f.write("Use the 'Parameter_Conversions' sheet to get parameters in\n")
    f.write("GraphPad Prism format. The conversion formulas are:\n")
    f.write("• Background = Rh_free\n")
    f.write("• Bmax = Rh_bound - Rh_free\n")
    f.write("• Kd = Kd (unchanged)\n\n")

    f.write("CITATION:\n")
    f.write("-" * 9 + "\n")
    f.write("If you use this analysis in your research, please cite:\n")
    f.write("FIDA Binding Data Analysis Tool v3.0 (2025)\n")
    f.write("Generated by Claude (Anthropic)\n")

print(f"README file '{readme_filename}' created successfully!")

# =============================================================================
# CREATE AND DOWNLOAD ZIP FILE
# =============================================================================

print("\nCreating ZIP package for download...")

zip_filename = f'FIDA_Analysis_Package_{filename.split(".")[0]}.zip'

# Collect all files to include in ZIP
files_to_zip = [
    excel_filename,
    stats_filename,
    readme_filename
]

# Add main plot
if os.path.exists(main_plot_filename):
    files_to_zip.append(main_plot_filename)

# Add CSV files if they were created successfully
if os.path.exists(csv_filename):
    files_to_zip.append(csv_filename)
if os.path.exists(exp_csv_filename):
    files_to_zip.append(exp_csv_filename)

# Add individual plot files if they exist
if 'individual_figures' in locals() and individual_figures:
    files_to_zip.extend([f for f in individual_figures if os.path.exists(f)])

# Create ZIP file
with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
    for file in files_to_zip:
        if os.path.exists(file):
            zipf.write(file)
            print(f"  Added {file} to ZIP")
        else:
            print(f"  Warning: {file} not found, skipping")

print(f"\nZIP package '{zip_filename}' created successfully!")
print(f"Package contains {len([f for f in files_to_zip if os.path.exists(f)])} files")

# Download the ZIP file
print("\nDownloading analysis package...")
files.download(zip_filename)

print("\n" + "="*60)
print("ANALYSIS COMPLETE!")
print("="*60)
print(f"✓ Excel analysis file: {excel_filename}")
print(f"✓ CSV data files: {csv_filename}, {exp_csv_filename}")
print(f"✓ Publication-quality plots: {main_plot_filename}")
if 'individual_figures' in locals() and individual_figures:
    print(f"✓ Individual model plots: {len(individual_figures)} files")
print(f"✓ Statistical summary: {stats_filename}")
print(f"✓ Complete package: {zip_filename}")
print("\nAll files have been packaged and downloaded!")
print("Check your downloads folder for the ZIP file.")

# Clean up individual files (optional)
cleanup_choice = input("\nClean up individual files? (ZIP contains everything) [y/n]: ").strip().lower()
if cleanup_choice == 'y':
    for file in files_to_zip:
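        try:
            if os.path.exists(file):
                os.remove(file)
        except OSError as err:
            # Report files that could not be deleted instead of failing silently
            print(f"  Warning: could not remove {file}: {err}")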
    print("Individual files cleaned up. ZIP package retained.")

print("\nThank you for using FIDA Binding Analysis Tool v3.0!")

Creating Excel export with all analysis results...
Excel file 'FIDA_analysis_20250724 (24).xlsx' created successfully!
CSV file 'FIDA_fit_curves_20250724 (24).csv' created successfully!
Experimental data CSV 'FIDA_experimental_data_20250724 (24).csv' created successfully!
Main plot 'FIDA_combined_analysis_20250724 (24).png' saved successfully!
Statistics file 'FIDA_statistics_20250724 (24).txt' created successfully!
README file 'FIDA_README_20250724 (24).txt' created successfully!

Creating ZIP package for download...
  Added FIDA_analysis_20250724 (24).xlsx to ZIP
  Added FIDA_statistics_20250724 (24).txt to ZIP
  Added FIDA_README_20250724 (24).txt to ZIP
  Added FIDA_combined_analysis_20250724 (24).png to ZIP
  Added FIDA_fit_curves_20250724 (24).csv to ZIP
  Added FIDA_experimental_data_20250724 (24).csv to ZIP
  Added FIDA_individual_models_1 Rh (nm)_20250724 (24).png to ZIP
  Added FIDA_individual_models_2 Rh (nm)_20250724 (24).png to ZIP

ZIP package 'FIDA_Analysis_Package_20250


ANALYSIS COMPLETE!
✓ Excel analysis file: FIDA_analysis_20250724 (24).xlsx
✓ CSV data files: FIDA_fit_curves_20250724 (24).csv, FIDA_experimental_data_20250724 (24).csv
✓ Publication-quality plots: FIDA_combined_analysis_20250724 (24).png
✓ Individual model plots: 2 files
✓ Statistical summary: FIDA_statistics_20250724 (24).txt
✓ Complete package: FIDA_Analysis_Package_20250724 (24).zip

All files have been packaged and downloaded!
Check your downloads folder for the ZIP file.

Clean up individual files? (ZIP contains everything) [y/n]: y
Individual files cleaned up. ZIP package retained.

Thank you for using FIDA Binding Analysis Tool v3.0!
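
The fitting and statistics described in the docstring above (scipy.optimize.curve_fit, standard errors from the covariance matrix, R², AIC) can be sketched on synthetic data as follows. This is a minimal illustration, not the notebook's own fitting code: the data are simulated, the helper names are hypothetical, the starting-value heuristic is an assumption, and the AIC uses the common least-squares form n·ln(RSS/n) + 2k, which may differ from the formula the tool applies.

```python
import numpy as np
from scipy.optimize import curve_fit


def single_site_rh(L, rh_free, rh_bound, kd):
    """Single-site model: Rh_free + (Rh_bound - Rh_free) * [L] / (Kd + [L])."""
    return rh_free + (rh_bound - rh_free) * L / (kd + L)


def fit_stats(y, y_fit, n_params):
    """Return (R², AIC); AIC in the least-squares form n*ln(RSS/n) + 2k."""
    rss = float(np.sum((y - y_fit) ** 2))
    tss = float(np.sum((y - np.mean(y)) ** 2))
    n = len(y)
    r2 = 1.0 - rss / tss
    aic = n * np.log(rss / n) + 2 * n_params
    return r2, aic


# Simulated titration (µM) with small Gaussian noise on Rh (nm)
rng = np.random.default_rng(0)
conc = np.array([0.0, 0.16, 0.31, 0.63, 1.25, 2.5, 5.0, 10.0])
true_params = (3.40, 3.90, 0.50)  # Rh_free, Rh_bound, Kd (made up)
rh_obs = single_site_rh(conc, *true_params) + rng.normal(0.0, 0.01, conc.size)

# Starting values estimated from the data (an assumed heuristic):
# lowest and highest observed Rh, and the median concentration for Kd
p0 = [rh_obs.min(), rh_obs.max(), np.median(conc)]

# No bounds are passed, so curve_fit uses Levenberg-Marquardt;
# maxfev caps the number of function evaluations
popt, pcov = curve_fit(single_site_rh, conc, rh_obs, p0=p0, maxfev=10000)

# Standard errors are the square roots of the covariance diagonal
perr = np.sqrt(np.diag(pcov))

r2, aic = fit_stats(rh_obs, single_site_rh(conc, *popt), n_params=len(popt))

for name, value, err in zip(("Rh_free", "Rh_bound", "Kd"), popt, perr):
    print(f"{name} = {value:.3f} ± {err:.3f}")
print(f"R² = {r2:.4f}, AIC = {aic:.2f}")
```

AIC values are only comparable between models fitted to the same data points, so each model should be ranked within a single dataset rather than across datasets.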
