## **Data Simulation**  
### *By Jyreneah Angel and Nicole Grace Joligon* 

## **IMPORT LIBRARIES**

In [None]:
import numpy as np
import pandas as pd

## **FUNCTION DEFINITION**

In [41]:
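def simulate_refill_history(n_individuals=1000, obs_period=720):
    """Simulate refill dates for n_individuals over obs_period days.

    Returns a long-format DataFrame with one row per refill date.
    """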
    # Define group probabilities (e.g., groups 1 and 6 are 10% each)
    group_probs = [0.1, 0.2, 0.2, 0.2, 0.2, 0.1]
    groups = np.random.choice(np.arange(1,7), size=n_individuals, p=group_probs)

    data_list = []
    for i in range(n_individuals):
        group = groups[i]
        refill_dates = [0]  # start at day 0
        current_day = 0

        while current_day < obs_period:
            refill_duration = np.random.choice([30, 60, 90])

            # Group-specific behavior
            if group == 3:
                extra_delay = np.random.randint(0, 11) * len(refill_dates)
            elif group == 2:
                extra_delay = np.random.randint(0, 21)
            elif group == 6 and len(refill_dates) > 1:
                # Non-persistence: stop after the first refill
                break
            else:
                extra_delay = 0

            current_day += refill_duration + extra_delay
            if current_day <= obs_period:
                refill_dates.append(current_day)
            else:
                break

        # Create a DataFrame for this individual
        individual_df = pd.DataFrame({
            'individual': i+1,
            'group': group,
            'refill_date': refill_dates
        })
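        # Gap in days since the previous refill (supply plus any extra delay);
        # the first row of each individual is NaN because nothing precedes day 0
        individual_df['refill_duration'] = individual_df['refill_date'].diff()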
        data_list.append(individual_df)

    simulated_data = pd.concat(data_list, ignore_index=True)
    return simulated_data

## Purpose
This function simulates refill histories for a population of individuals over a fixed observation period. Individuals are split into six groups that differ in how they refill.

## Input Parameters
- **n_individuals** (default 1000): The number of individuals to simulate.
- **obs_period** (default 720): The length of the observation period in days.

## Simulation Process

### 1. Group Assignment
- Each individual is assigned to one of six groups based on specified probabilities.
  - Group probabilities are as follows:
    - Groups 1 and 6 have 10% probability each.
    - Groups 2, 3, 4, and 5 have 20% probability each.
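
The group assignment described above can be checked with a short sketch. It reuses the probabilities from the function; the seed and the sample size of 100,000 are assumptions made only so the example is repeatable and the shares are stable.

```python
import numpy as np

# Hypothetical seed, chosen only to make this sketch repeatable
np.random.seed(0)

group_probs = [0.1, 0.2, 0.2, 0.2, 0.2, 0.1]
groups = np.random.choice(np.arange(1, 7), size=100_000, p=group_probs)

# Empirical share of each group; each should be close to its probability
labels, counts = np.unique(groups, return_counts=True)
shares = counts / counts.sum()
for label, share in zip(labels, shares):
    print(f"group {label}: {share:.3f}")
```

With the default of 1000 individuals the shares will fluctuate more, so group sizes in a single run will only roughly match 10% and 20%.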

### 2. Refill Dates
- For each individual:
  - Refill history starts at day 0.
  - Each refill's duration is drawn uniformly at random from 30, 60, or 90 days.
  
### 3. Group-Specific Behavior
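  - **Group 3**: A random delay of 0 to 10 days, multiplied by the number of refill dates recorded so far, is added to each refill, so delays tend to grow over time.
  - **Group 2**: A random extra delay of 0 to 20 days is added to each refill.
  - **Group 6**: At most one refill after day 0 is recorded, simulating non-persistence in medication adherence.
  - **Groups 1, 4, and 5**: No extra delay is added.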
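
As a rough illustration of the delay rules above, the sketch below isolates them in a helper. The name `extra_delay_for` is hypothetical and does not appear in the notebook; the stopping rule for Group 6 is not a delay and is left out here.

```python
import numpy as np

def extra_delay_for(group, n_refill_dates):
    """Hypothetical helper mirroring the delay rules described above."""
    if group == 3:
        # 0 to 10 days, scaled by the number of refill dates recorded so far
        return np.random.randint(0, 11) * n_refill_dates
    if group == 2:
        # 0 to 20 days, independent of refill history
        return np.random.randint(0, 21)
    # All other groups refill without extra delay
    return 0

np.random.seed(1)  # assumed seed, only for repeatability
group3_delays = [extra_delay_for(3, k) for k in range(1, 6)]
group2_delays = [extra_delay_for(2, k) for k in range(1, 6)]
print("group 3:", group3_delays)
print("group 2:", group2_delays)
```

Note that `np.random.randint` excludes its upper bound, which is why the calls use 11 and 21 to get ranges of 0 to 10 and 0 to 20 days.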

### 4. Refill Days Calculation
  - Each refill day is the previous refill day plus the refill duration and any group-specific delay. The process repeats until the next refill would fall after the end of the observation period; that refill is not recorded.
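
The calculation can be followed with a small worked example. The supply and delay values below are illustrative, not taken from a simulation run, and the running total is written with `np.cumsum` instead of the loop used in the function.

```python
import numpy as np

obs_period = 720
supply_days = np.array([30, 90, 60, 90])   # illustrative draws from {30, 60, 90}
extra_delays = np.array([5, 0, 12, 0])     # illustrative group-specific delays

# Each refill day is the running total of supply plus delay, starting at day 0
refill_dates = np.concatenate(([0], np.cumsum(supply_days + extra_delays)))

# Keep only refills that fall within the observation period
refill_dates = refill_dates[refill_dates <= obs_period]
print(refill_dates.tolist())  # [0, 35, 125, 197, 287]
```

Because every increment is positive, filtering on `refill_dates <= obs_period` gives the same result as stopping the loop at the first refill that overshoots the period.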

## Output
The output is a DataFrame containing:
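- **individual**: The individual ID, numbered from 1.
- **group**: The group (1 to 6) to which the individual belongs.
- **refill_date**: The day of each refill, counted from day 0. Every individual's first row is day 0.
- **refill_duration**: The number of days since the individual's previous refill, including any extra delay. It is missing (NaN) on each individual's first row.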
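
A minimal sketch of how the last column behaves, using made-up refill dates for a single individual: `diff()` has nothing to subtract from on the first row, so that row is NaN and the column becomes floating point.

```python
import pandas as pd

# Made-up refill dates for one individual, in the same layout as the output
df = pd.DataFrame({"individual": 1, "group": 2, "refill_date": [0, 35, 125, 197]})
df["refill_duration"] = df["refill_date"].diff()
print(df)
```

Analyses that average `refill_duration` can rely on pandas skipping NaN by default, but row counts per individual include the day 0 row.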

## Use Case
This simulated data could be used to analyze refill patterns, group behaviors, or persistence in medication adherence.
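
One possible way to start such an analysis is sketched below. The rows are hand-made toy data in the same layout as the simulated output, not results from the simulation, and the two summaries (mean gap per group, refill count per individual) are just example choices.

```python
import pandas as pd

# Hand-made toy rows in the same layout as the simulated output (not real results)
toy = pd.DataFrame({
    "individual":  [1, 1, 1, 2, 2, 3, 3, 3],
    "group":       [2, 2, 2, 6, 6, 4, 4, 4],
    "refill_date": [0, 40, 110, 0, 30, 0, 60, 150],
})
toy["refill_duration"] = toy.groupby("individual")["refill_date"].diff()

# Mean gap between refills per group (NaN first rows are skipped by mean)
mean_gap = toy.groupby("group")["refill_duration"].mean()

# Number of refills after day 0, per individual
n_refills = toy.groupby("individual")["refill_date"].count() - 1

print(mean_gap)
print(n_refills)
```

The same `groupby` calls should work unchanged on the DataFrame returned by `simulate_refill_history()`, since it has the same column names.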


## **GENERATING SIMULATED DATA**

In [60]:
# Generate simulated data with the default settings
export_df = simulate_refill_history()

# Display the last few rows of the simulated data
export_df.tail()

| | individual | group | refill_date | refill_duration |
|---|---|---|---|---|
| 10625 | 1000 | 3 | 377 | 60.0 |
| 10626 | 1000 | 3 | 419 | 42.0 |
| 10627 | 1000 | 3 | 477 | 58.0 |
| 10628 | 1000 | 3 | 531 | 54.0 |
| 10629 | 1000 | 3 | 624 | 93.0 |
