# Disease Outbreak Predictor
Course: CPE 3018 – Numerical Methods <br>
Project: Data-Driven Disease Outbreak Predictor <br>
Language: Python (Jupyter Notebook) <br>

Overview <br>
This notebook implements a disease outbreak predictor based on Newton's Divided Difference Method. It:
1. Uses a small number of time-based data points (days vs. cases/deaths).
2. Builds an approximating polynomial using divided differences.
3. Performs:
 - Interpolation (estimating missing data within the time range).
 - Extrapolation (estimating future cases beyond the time range).
4. Provides plots that show the:
 - Observed data
 - Interpolated curve
 - Extrapolated future predictions

## 0. Project Setup

In [1]:
# Import libraries
import numpy as np                  # For numerical computations, arrays, math
import pandas as pd                 # For tabular display and basic data manipulation
import matplotlib.pyplot as plt     # For plotting and visualization
import warnings

# Suppress non-critical warnings
warnings.filterwarnings('ignore')

# Improve output readability
np.set_printoptions(precision=6, suppress=True)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

## 1. Class Implementations

### 1.1. Divided Difference Method Implementation
 - This class implements Newton's divided difference method for polynomial interpolation <br>

Contains the following methods: <br>
> **_comp_dd**: Constructs the divided differences table <br>
> **evaluate**: Evaluates the resulting interpolating polynomial for any x (interpolation or extrapolation) <br>
> **show_ddTable**: Prints the divided differences table as a labeled DataFrame <br>
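For reference, these are the standard formulas the class relies on. `_comp_dd` applies the recurrence below column by column, with entry `dd_table[i, j]` corresponding to $f[x_i, \dots, x_{i+j}]$. `evaluate` computes Newton's form by nested multiplication, using the top row `dd_table[0, :]` as the coefficients $a_j$.

$$
f[x_i] = y_i, \qquad
f[x_i, \dots, x_{i+j}] = \frac{f[x_{i+1}, \dots, x_{i+j}] - f[x_i, \dots, x_{i+j-1}]}{x_{i+j} - x_i}
$$

$$
P(x) = \sum_{j=0}^{n-1} a_j \prod_{k=0}^{j-1} (x - x_k)
     = a_0 + (x - x_0)\Big[a_1 + (x - x_1)\big[a_2 + \cdots\big]\Big], \qquad a_j = f[x_0, \dots, x_j]
$$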

In [2]:
class NewtonDividedDifference:
    """
    Newton's Divided Difference Interpolation Method

    Attributes:
        x_data (np.array): Time points (independent variable)
        y_data (np.array): Case/death counts (dependent variable)
        n (int): Number of data points
        dd_table (np.array): 2D array of divided differences
    """

    def __init__(self, x_data, y_data):
        """
        Args:
            x_data (list/array): Time values (e.g., days)
            y_data (list/array): Case or death counts
        """
        self.x_data = np.array(x_data, dtype=float)
        self.y_data = np.array(y_data, dtype=float)
        self.n = len(self.x_data)

        # Basic input validation
        if len(self.x_data) != len(self.y_data):
            raise ValueError("x_data and y_data must have the same length")
        if self.n < 2:
            raise ValueError("At least 2 data points are required")

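        # Sort data by x-values to maintain proper order
        sorted_indices = np.argsort(self.x_data)
        self.x_data = self.x_data[sorted_indices]
        self.y_data = self.y_data[sorted_indices]

        # Repeated x-values would make a divided-difference denominator zero
        if np.any(np.diff(self.x_data) == 0):
            raise ValueError("x_data values must be distinct")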

        # Compute the divided differences table once at initialization
        self.dd_table = self._comp_dd()

    def _comp_dd(self):
        """
        Compute the divided differences table.

        The table is n x n, where:
        - Column 0: f[x_i] (the original function values)
        - Column 1: f[x_i, x_{i+1}] (first-order divided differences)
        - Column 2: f[x_i, x_{i+1}, x_{i+2}] (second-order)
        - ... up to column n-1 (order n-1); entries with i + j > n-1 are unused and stay 0

        Returns:
            np.array: (n x n) divided differences table
        """
        # Output variable
        diff_table = np.zeros((self.n, self.n))       # Initialize with zeros
        diff_table[:, 0] = self.y_data                                  # Put the data values in the first column

        # Fill the rest of the table using the divided difference formula
        for j in range(1, self.n):          # column (order of difference)
            for i in range(self.n - j):     # row
                numerator = diff_table[i+1, j-1] - diff_table[i, j-1]
                denominator = self.x_data[i+j] - self.x_data[i]
                diff_table[i, j] = numerator / denominator

        return diff_table

    def evaluate(self, x):
        """
        Evaluate the Newton interpolating polynomial at one or more points.

        Uses a nested multiplication scheme for evaluation:
        P(x) = a_0 + (x-x_0)[a_1 + (x-x_1)[a_2 + ...]]

        Args:
            x (float/array): Point(s) at which to evaluate P(x)

        Returns:
            float/np.array: Interpolated (or extrapolated) values
        """
        # Input handling: remember whether a scalar was passed in
        is_scalar = np.ndim(x) == 0
        x = np.atleast_1d(np.asarray(x, dtype=float))

        # Output variable
        results = np.zeros_like(x, dtype=float)

        # Evaluate the polynomial at each x-value
        for idx, x_val in enumerate(x):
            # Start with the highest-order coefficient
            result = self.dd_table[0, self.n - 1]

            # Horner-like scheme going backward through columns
            for j in range(self.n - 2, -1, -1):
                result = result * (x_val - self.x_data[j]) + self.dd_table[0, j]

            # Final result saved to array
            results[idx] = result

        return results[0] if is_scalar else results      # Scalar in -> scalar out; array/list in -> array out (even if empty)

    def show_ddTable(self):
        """
        Print the divided differences table as a labeled DataFrame.

        Column j holds the j-th order differences f[x_i, ..., x_{i+j}];
        entries with i + j > n-1 are unused and shown as 0.
        """
        # Creating the table with a DataFrame, one row per node
        df = pd.DataFrame(self.dd_table,
                          index=[f'x_{i}={x:g}' for i, x in enumerate(self.x_data)])

        # Generating the headers: f[x_i], f[x_i..x_i+1], f[x_i..x_i+2], ...
        df.columns = ['f[x_i]'] + [f'f[x_i..x_i+{j}]' for j in range(1, self.n)]

        print(df)

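The class stores the full n x n table, which is convenient for display. If only the Newton coefficients are needed, they can be computed in place in a single 1-D array. The sketch below is a standalone illustration, not part of the notebook's classes. The function names `newton_coefficients` and `newton_eval` are hypothetical. The evaluation step uses the same nested multiplication as `evaluate`, vectorized over all x-values at once.

```python
import numpy as np


def newton_coefficients(x, y):
    """Return [f[x_0], f[x_0,x_1], ..., f[x_0..x_{n-1}]], the top row of the divided-difference table."""
    x = np.asarray(x, dtype=float)
    coef = np.array(y, dtype=float)  # copy, so the caller's data is not modified
    n = len(x)
    for j in range(1, n):
        # The right-hand side is evaluated in full before assignment, so every entry
        # uses order j-1 values: coef[i] <- (coef[i] - coef[i-1]) / (x[i] - x[i-j]) for i >= j
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (x[j:] - x[:n - j])
    return coef


def newton_eval(coef, x_nodes, x):
    """Evaluate P(x) = a_0 + (x-x_0)[a_1 + (x-x_1)[a_2 + ...]] by nested multiplication."""
    x_nodes = np.asarray(x_nodes, dtype=float)
    x = np.asarray(x, dtype=float)
    result = np.full_like(x, coef[-1])       # start from the highest-order coefficient
    for j in range(len(coef) - 2, -1, -1):   # then work back toward a_0
        result = result * (x - x_nodes[j]) + coef[j]
    return result


# Usage: P(x) = 2x^2 - 3x + 1 sampled at x = 0, 1, 3 gives y = 1, 0, 10
coef = newton_coefficients([0, 1, 3], [1, 0, 10])   # Newton coefficients 1, -1, 2
print(newton_eval(coef, [0, 1, 3], [2.0, 5.0]))     # P(2) = 3, P(5) = 36
```

Storing only the top row needs O(n) memory instead of O(n^2). The trade-off is that the intermediate columns are lost, so `show_ddTable` could not be built from it.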

### 1.2. Disease Outbreak Predictor Class
 - This is the main user-facing class. It wraps one data series (e.g. cases or deaths) and generates new data points, either filling in missing days by interpolation or predicting future days by extrapolation.

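Contains the following methods (the observed days and data are passed to the constructor): <br>
> **interpolate**: Fills in missing days inside the observed range <br>
> **extrapolate**: Predicts values for days after the last observation <br>
> **show_dataTable**: Prints the observed and generated data, optionally excluding some sources <br>
> **show_ddTable**: Prints the internally generated divided differences table <br>
> **summary**: Prints the number of data points and the observed ranges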

In [3]:
class DiseaseOutbreakPredictor:
    """ Data-Driven Disease Outbreak Predictor """
    def __init__(self, days, data_name, data):
        self.days = np.array([], dtype=float)
        self.data = np.array([], dtype=float)
        self.data_name = None
        self.approximator = None
        self.table = None

        # List of days must be provided
        if days is None or len(days) == 0:
            raise ValueError("List of days must be provided!")
        self.days = np.array(days, dtype=float)

        # data_name should be a string
        if not isinstance(data_name, str) or data_name.strip() == "":
            raise ValueError("Data name must be a non-empty string")
        self.data_name = data_name

        # Input handling for list of data
        if data is None or len(data) != len(self.days):
            raise ValueError(f"{data_name} and days are not of the same size!")
        self.data = np.array(data, dtype=float)

        # Fill data to table
        try:
            self.table = pd.DataFrame({
                'Day': self.days.astype(int),
                self.data_name : self.data.astype(float)
            })
            self.table[f"{self.data_name} (Rounded)"] = np.ceil(self.data).astype(int)
            self.table["Source"] = "Observed"
        except Exception as e:
            raise Exception("Table was not properly filled up!") from e

        # Create the approximator only when there are at least 2 data points
        if self.data.size >= 2:
            self.approximator = NewtonDividedDifference(self.days, self.data)

    def _approximate(self, x_days):
        """Shared private method for interpolation and extrapolation."""
        
        if self.approximator is None:
            raise ValueError(f"Not enough data points to create an approximator for {self.data_name}")
        
        x_arr = np.array(x_days, dtype=float)
        predictions = np.maximum(self.approximator.evaluate(x_arr), 0)

        return predictions

    def interpolate(self, fine = 0):
        """Interpolate values for given days inside the known range."""

        # fine should be 0 or a positive integer
        if not (isinstance(fine, (int, np.integer)) and fine >= 0):
            raise ValueError("fine should be 0 or a positive integer!")
        
        if fine == 0:   # Default: fill in the missing integer days between the first and last observed day
            observed_days_int = set(np.array(self.days, dtype=int))
            full_days = np.arange(int(self.days.min()), int(self.days.max()) + 1)
            days_to_interpolate = [int(d) for d in full_days if int(d) not in observed_days_int]
        else:           # Otherwise, fine is the number of evenly spaced points across the observed range
            days_to_interpolate = np.linspace(self.days.min(), self.days.max(), fine)

        res = self._approximate(days_to_interpolate).astype(float)

        insert_data = pd.DataFrame({
            'Day' : days_to_interpolate,
            self.data_name : res
        })
        insert_data[f"{self.data_name} (Rounded)"] = np.ceil(res).astype(int)
        insert_data["Source"] = "Interpolated" if (fine == 0) else "Interpolated (detailed)"

        self.table = pd.concat([self.table, insert_data], ignore_index=True)
        self.table = self.table.sort_values('Day').reset_index(drop=True)

    def extrapolate(self, fine = 0, extend = 1.5):
        """Extrapolate values for days outside the observed data range."""
        
        # fine should be 0 or a positive integer
        if not (isinstance(fine, (int, np.integer)) and fine >= 0):
            raise ValueError("fine should be 0 or a positive integer!")
        
        # extend must be greater than 1 so that the forecast reaches past the last observed day
        if extend <= 1:
            raise ValueError("extend should be greater than 1!")
        
        # Determine the last observed day and the last day to predict
        max_day = int(np.max(self.days))
        end_day = int(np.ceil(extend * max_day))
        
        if fine == 0:   # Default: every integer day after the last observation, up to end_day
            future_days = list(np.arange(max_day + 1, end_day + 1))
        else:           # Otherwise, fine evenly spaced points from the last observed day to end_day
            future_days = np.linspace(self.days.max(), end_day, fine)

        res = self._approximate(future_days).astype(float)

        insert_data = pd.DataFrame({
            'Day' : future_days,
            self.data_name : res
        })
        insert_data[f"{self.data_name} (Rounded)"] = np.ceil(res).astype(int)
        insert_data["Source"] = "Extrapolated" if (fine == 0) else "Extrapolated (detailed)"

        self.table = pd.concat([self.table, insert_data], ignore_index=True)
        self.table = self.table.sort_values('Day').reset_index(drop=True)

    def show_dataTable(self, filter=None):
        """Display the data table, excluding rows whose 'Source' is listed in `filter`."""
        if filter is None:
            filter = []
        if not isinstance(filter, list):
            raise ValueError("filter should be a list of source values to exclude")

        if len(filter) == 0:
            display_table = self.table.copy()
        else:
            display_table = self.table[~self.table['Source'].isin(filter)].copy()

        print(display_table.to_string(index=False))
        print()

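    def show_ddTable(self):
        """Print the divided differences table for this data series, if available."""
        if self.approximator is None:
            raise ValueError(f"Not enough data points to build a divided differences table for {self.data_name}")
        self.approximator.show_ddTable()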
    
    def summary(self):
        """Summary of the input data state."""
        try:
            print( pd.DataFrame([{
                'Number of data points': int(self.days.size),
                'Range of days': (f"Days {self.days.min():.0f} to {self.days.max():.0f}" if self.days.size > 0 else 'N/A'),
                f'Range of {self.data_name}': (f"{self.data.min():.0f} to {self.data.max():.0f}" if self.data.size > 0 else 'N/A'),
            }]).to_string(index=False))
            print()
        except Exception as e:
            print(f"No {self.data_name} data available to display")
            print(f"[{e}]")

## 2. Start of Data Analysis
 - To demonstrate the algorithm in action, a sample outbreak dataset is provided.
 - Modify values as desired, and when finished, click "Run cell and below" to recalculate from new data.

In [9]:
days_observed = [1, 3, 5, 7, 10]
cases_observed = [50, 120, 280, 650, 1800]
deaths_observed = [5, 4, 5, 7, 8]

try:
    cases = DiseaseOutbreakPredictor(days_observed, "Cases", cases_observed)
except Exception as e:
    print(f"[{e}]")

try:
    deaths = DiseaseOutbreakPredictor(days_observed, "Deaths", None)
except Exception as e:
    print(f"[{e}]")

[Deaths and days are not of the same size!]


### Input data in tabular form:

In [5]:
try:
    cases.show_dataTable()
except Exception as e:
    print("No case data available to display")
    print(f"[{e}]")

 Day  Cases  Cases (Rounded)   Source
   1   50.0               50 Observed
   3  120.0              120 Observed
   5  280.0              280 Observed
   7  650.0              650 Observed
  10 1800.0             1800 Observed



In [6]:
try:
    deaths.show_dataTable()
except Exception as e:
    print("No death data available to display")
    print(f"[{e}]")

No death data available to display
[name 'deaths' is not defined]


### Summary of input data:

In [8]:
try:
    cases.summary()
except Exception as e:
    print("No case data available to display")
    print(f"[{e}]")

try:
    deaths.summary()
except Exception as e:
    print("No death data available to display")
    print(e)

 Number of data points Range of days Range of Cases
                     5  Days 1 to 10     50 to 1800

No death data available to display
name 'deaths' is not defined


## 3. Divided Differences Analysis
 - We inspect the divided difference tables for cases and deaths (if available).
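As a hand check, the first three observed case points $(1, 50)$, $(3, 120)$, and $(5, 280)$ give the entries below. Divided differences depend only on the points they involve, so these same values appear in the first three columns of row $x_0$ of the cases table, and $f[x_1, x_2]$ appears in row $x_1$. Row $x_0$ holds the Newton coefficients that `evaluate` uses.

$$
\begin{aligned}
f[x_0, x_1] &= \frac{120 - 50}{3 - 1} = 35, \qquad
f[x_1, x_2] = \frac{280 - 120}{5 - 3} = 80, \\
f[x_0, x_1, x_2] &= \frac{80 - 35}{5 - 1} = 11.25, \\
P_2(x) &= 50 + 35(x - 1) + 11.25(x - 1)(x - 3).
\end{aligned}
$$

For example, $P_2(5) = 50 + 140 + 90 = 280$ reproduces the observation at day 5.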

### Divided Differences Table for Cases:

In [None]:
try:
    cases.show_ddTable()
except Exception as e:
    print("No case data available to display")
    print(f"[{e}]")

### Divided Differences Table for Deaths:

In [None]:
try:
    deaths.show_ddTable()
except Exception as e:
    print("No death data available to display")
    print(f"[{e}]")

## 4. Interpolation: Estimating Missing Values
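 - We first determine which days between the first and last observation are missing from the input data.
 - The polynomial is then evaluated on those days to "fill in the gaps" where no data was recorded.
 - Because these days lie inside the observed range, interpolation is generally much more reliable than extrapolation. A high-degree polynomial through widely spaced points can still oscillate between them, however.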
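Finding the gap days amounts to a set difference between the full integer range and the observed days. The minimal sketch below assumes integer day labels and uses a hypothetical helper, `missing_days`, built on NumPy's `np.setdiff1d`. It selects the same kind of days as the default `fine=0` branch of `interpolate`, which uses a Python set instead.

```python
import numpy as np


def missing_days(observed_days):
    """Integer days between the first and last observation that have no data point (hypothetical helper)."""
    observed = np.asarray(observed_days, dtype=int)
    full_range = np.arange(observed.min(), observed.max() + 1)
    return np.setdiff1d(full_range, observed)  # returns sorted, unique values


print(missing_days([1, 3, 5, 7, 10]))  # days 2, 4, 6, 8, 9
```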

### Approximated cases data:

In [None]:
try:
    # Adds the missing days to the table (re-running this cell appends them again)
    cases.interpolate()
    # Show only the newly interpolated rows
    cases.show_dataTable(filter=['Observed', 'Extrapolated'])
except Exception as e:
    print("Cannot interpolate cases:", e)

In [None]:
try:
    # Adds the missing days to the table (re-running this cell appends them again)
    deaths.interpolate()
    # Show only the newly interpolated rows
    deaths.show_dataTable(filter=['Observed', 'Extrapolated'])
except Exception as e:
    print("Cannot interpolate deaths:", e)

## 5. Extrapolation: Future Predictions
 - The same Newton polynomial can be evaluated beyond the last observed day to predict future cases and deaths.
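The default forecast horizon and the non-negativity rule can be sketched on their own. In the minimal sketch below, `forecast_days` and `clip_counts` are hypothetical helpers. The first follows the `fine=0` branch of `extrapolate`, predicting every integer day from the last observation up to `ceil(extend * last_day)`. The second mirrors the `np.maximum(..., 0)` clipping in `_approximate`.

```python
import numpy as np


def forecast_days(observed_days, extend=1.5):
    """Integer days after the last observation, up to ceil(extend * last_day) (hypothetical helper)."""
    if extend <= 1:
        raise ValueError("extend must be greater than 1")
    last_day = int(np.max(observed_days))
    end_day = int(np.ceil(extend * last_day))
    return np.arange(last_day + 1, end_day + 1)


def clip_counts(predictions):
    """Case and death counts cannot be negative, so negative polynomial values become 0."""
    return np.maximum(np.asarray(predictions, dtype=float), 0.0)


print(forecast_days([1, 3, 5, 7, 10]))   # days 11 through 15
print(clip_counts([-3.2, 4.0]))          # 0 and 4
```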

In [None]:
# Forecast horizon: by default, extrapolate() predicts up to 1.5x the last observed day
max_day = int(np.max(days_observed))
end_day = int(np.ceil(1.5 * max_day))
print(f"Extrapolating from day {max_day + 1} to day {end_day}")

In [None]:
try:
    # Adds days after the last observation, up to ceil(1.5 * last day) (re-running appends them again)
    cases.extrapolate(extend=1.5)
    # Show only the newly extrapolated rows
    cases.show_dataTable(filter=['Observed', 'Interpolated'])
except Exception as e:
    print("Cannot extrapolate cases:", e)


In [None]:
try:
    # Adds days after the last observation, up to ceil(1.5 * last day) (re-running appends them again)
    deaths.extrapolate(extend=1.5)
    # Show only the newly extrapolated rows
    deaths.show_dataTable(filter=['Observed', 'Interpolated'])
except Exception as e:
    print("Cannot extrapolate deaths:", e)

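 - Predictions close to the last observed day are the most trustworthy.
 - Uncertainty grows quickly further out. The interpolation error contains the factor (x - x_0)(x - x_1)...(x - x_{n-1}), which grows rapidly once x leaves the observed range.
 - Real outbreaks also depend on interventions, behavior, and many other factors that this heavily simplified model does not account for.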
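The growth of the error can be seen on synthetic data where the "truth" is known. This sketch is an illustration under stated assumptions. The exponential `true_curve` is a made-up stand-in for a real outbreak. It also swaps in NumPy's `np.polyfit` with degree n-1 instead of the notebook's class: a degree n-1 polynomial through n points is the unique interpolating polynomial, so it matches Newton's form up to rounding. The nodes reuse the sample dataset's days.

```python
import numpy as np


def true_curve(t):
    """Hypothetical smooth growth curve standing in for the unknown real outbreak."""
    return np.exp(0.3 * t)


nodes = np.array([1.0, 3.0, 5.0, 7.0, 10.0])  # same day layout as the sample dataset
# Degree n-1 through n points: the unique interpolating polynomial (equal to Newton's form)
coeffs = np.polyfit(nodes, true_curve(nodes), deg=len(nodes) - 1)


def abs_error(day):
    """Absolute difference between the interpolating polynomial and the true curve."""
    return abs(np.polyval(coeffs, day) - true_curve(day))


for day in [2.0, 8.5, 12.0, 15.0, 20.0]:
    where = "inside " if nodes.min() <= day <= nodes.max() else "outside"
    print(f"day {day:4.1f} ({where} observed range): |error| = {abs_error(day):.4f}")
```

Inside the range, the node factor stays small, for example |(2-1)(2-3)(2-5)(2-7)(2-10)| = 120 at day 2. At day 15 the same product is 14 · 12 · 10 · 8 · 5 = 67200, which is why the printed errors jump once the day leaves the observed range.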

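## 6. Combined Results
 - Observed, interpolated, and extrapolated values side by side, sorted by day.

In [None]: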
try:
    # The predictor keeps observed, interpolated and extrapolated rows in one table, sorted by day
    cases.show_dataTable()
except Exception as e:
    print("Failed to display merged cases data:", e)

In [None]:
try:
    # The predictor keeps observed, interpolated and extrapolated rows in one table, sorted by day
    deaths.show_dataTable()
except Exception as e:
    print("Failed to display merged deaths data:", e)

## 7. Data Visualization
This section generates plots:
 - Cases (observed, interpolation, extrapolation)
 - Deaths (observed, interpolation, extrapolation)


In [None]:
try:
    table = cases.table
    plt.figure(figsize=(10,6))

    # Integer days as points in the plot
    marker_map = {'Observed': 'o', 'Interpolated': 'D', 'Extrapolated': '^'}
    color_map = {'Observed': 'C0', 'Interpolated': 'C1', 'Extrapolated': 'C2'}
    size_map = {'Observed': 140, 'Interpolated': 100, 'Extrapolated': 100}

    for src in ['Observed', 'Interpolated', 'Extrapolated']:
        subset = table[table['Source'] == src]
        plt.scatter(subset['Day'], subset['Cases'],
                    marker=marker_map[src],
                    s=size_map[src],
                    c=color_map[src],
                    edgecolor='k',
                    linewidth=0.9,
                    label=f'{src}',
                    zorder=6)

    # Dense grids between and after the observed days for smooth curves
    first_day, last_day = cases.days.min(), cases.days.max()
    end_day = np.ceil(1.5 * last_day)
    x_interp = np.linspace(first_day, last_day, 300)
    x_extrap = np.linspace(last_day, end_day, 300)

    # Evaluate the Newton polynomial directly (clipped at 0) so no rows are added to the table
    y_smooth_interp = np.maximum(cases.approximator.evaluate(x_interp), 0)
    plt.plot(x_interp, y_smooth_interp, label='Interpolated (smooth)', linestyle='--', c='C1', linewidth=2, zorder=3)

    y_smooth_extrap = np.maximum(cases.approximator.evaluate(x_extrap), 0)
    plt.plot(x_extrap, y_smooth_extrap, label='Extrapolated (smooth)', linestyle='--', c='C2', linewidth=2, zorder=2)

    plt.xlabel('Day')
    plt.ylabel('Cases')
    plt.title('Cases Data')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.5)
    plt.tight_layout()
    plt.show()
except Exception as e:
    print('Plotting failed:', e)

In [None]:
try:
    table = deaths.table
    plt.figure(figsize=(10,6))

    # Integer days as points in the plot
    marker_map = {'Observed': 'o', 'Interpolated': 'D', 'Extrapolated': '^'}
    color_map = {'Observed': 'C0', 'Interpolated': 'C1', 'Extrapolated': 'C2'}
    size_map = {'Observed': 140, 'Interpolated': 100, 'Extrapolated': 100}

    for src in ['Observed', 'Interpolated', 'Extrapolated']:
        subset = table[table['Source'] == src]
        plt.scatter(subset['Day'], subset['Deaths'],
                    marker=marker_map[src],
                    s=size_map[src],
                    c=color_map[src],
                    edgecolor='k',
                    linewidth=0.9,
                    label=f'{src}',
                    zorder=6)

    # Dense grids between and after the observed days for smooth curves
    first_day, last_day = deaths.days.min(), deaths.days.max()
    end_day = np.ceil(1.5 * last_day)
    x_interp = np.linspace(first_day, last_day, 300)
    x_extrap = np.linspace(last_day, end_day, 300)

    # Evaluate the Newton polynomial directly (clipped at 0) so no rows are added to the table
    y_smooth_interp = np.maximum(deaths.approximator.evaluate(x_interp), 0)
    plt.plot(x_interp, y_smooth_interp, label='Interpolated (smooth)', linestyle='--', c='C1', linewidth=2, zorder=3)

    y_smooth_extrap = np.maximum(deaths.approximator.evaluate(x_extrap), 0)
    plt.plot(x_extrap, y_smooth_extrap, label='Extrapolated (smooth)', linestyle='--', c='C2', linewidth=2, zorder=2)

    plt.xlabel('Day')
    plt.ylabel('Deaths')
    plt.title('Deaths Data')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.5)
    plt.tight_layout()
    plt.show()
except Exception as e:
    print('Plotting failed:', e)

## Comprehensive Results Summary
This code cell prints a textual report summarizing results.
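The report's case fatality rate is total observed deaths divided by total observed cases, times 100. The minimal sketch below wraps this in a hypothetical helper, `case_fatality_rate`, and adds a guard against zero cases. Summing both series is only meaningful if both are daily new counts. For cumulative series, the last values would be compared instead.

```python
import numpy as np


def case_fatality_rate(deaths, cases):
    """Percentage of observed cases that ended in death: 100 * sum(deaths) / sum(cases)."""
    total_cases = float(np.sum(cases))
    if total_cases <= 0:
        raise ValueError("total cases must be positive to compute a fatality rate")
    return 100.0 * float(np.sum(deaths)) / total_cases


# With the sample lists from Section 2: 29 deaths / 2900 cases
print(f"{case_fatality_rate([5, 4, 5, 7, 8], [50, 120, 280, 650, 1800]):.2f}%")
```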

In [None]:
# Comprehensive results summary
# -----------------------------
# Reads the rows generated by the interpolation/extrapolation cells above from `cases.table`;
# deaths are reported only if the `deaths` predictor was created.

table = cases.table
interpolated = table[table['Source'] == 'Interpolated']
extrapolated = table[table['Source'] == 'Extrapolated']

print("\n" + "="*70)
print("COMPREHENSIVE OUTBREAK PREDICTION REPORT")
print("="*70)

print("\n1. DATA SUMMARY")
print("-"*70)
print(f"Total observed data points: {len(cases.days)}")
print(f"Time period: Days {cases.days.min():.0f} to {cases.days.max():.0f}")
print(f"Total cases (observed): {cases.data.sum():.0f}")
try:
    print(f"Total deaths (observed): {deaths.data.sum():.0f}")
    print(f"Overall Case Fatality Rate: {(deaths.data.sum() / cases.data.sum() * 100):.2f}%")
except NameError:
    print("Total deaths (observed): N/A (no death data available)")

print("\n2. INTERPOLATION RESULTS")
print("-"*70)
print(f"Days interpolated: {len(interpolated)}")
print(f"Estimated total cases in interpolated days: {interpolated['Cases'].sum():.0f}")

print("\n3. EXTRAPOLATION RESULTS")
print("-"*70)
print(f"Days extrapolated: {len(extrapolated)}")
if len(extrapolated) > 0:
    peak = extrapolated.loc[extrapolated['Cases'].idxmax()]
    print(f"Predicted total cases in future days: {extrapolated['Cases'].sum():.0f}")
    print(f"Maximum predicted daily cases: {peak['Cases']:.0f}")
    print(f"Day of maximum predicted cases: Day {peak['Day']:.0f}")

print("\n4. QUALITATIVE RECOMMENDATIONS")
print("-"*70)
print("✓ Interpolation Inside Observed Range (High Confidence)")
print("  - Use for estimating missing days between observed data points.")
print("  - Errors are usually small between data points, but high-degree polynomials can oscillate.")

print("\n⚠ Short-Term Extrapolation (Moderate Confidence)")
print("  - Reasonable only a few days beyond the last observed day.")
print("  - The error grows rapidly with distance from the observed range; use for rough forecasting only.")

print("\n✗ Long-Term Extrapolation (Low Confidence)")
print("  - Not recommended far beyond the observed time range.")
print("  - Real outbreaks change dynamically and are not purely polynomial.")

print("\n" + "="*70)
print("END OF REPORT")
print("="*70)