In [1]:
# Automatically reload local modules when their source changes
%load_ext autoreload
%autoreload 2

### Import

In [2]:
from concordance_for_github import calc_concordance_with_ratio 
import pandas as pd

### Sample Data

The input DataFrame should have the following structure:

1. **Patient ID Column**:  
   A column containing a unique identifier for each patient (e.g., `Masked_patient_id`). It links each row to a specific patient.

2. **Date Column**:  
   A column with the dates when clinical activities were recorded (e.g., `Date`). The date should be in one of the following formats:  
   - `YYYY-MM-DD` (string)  
   - `datetime` format

3. **Indicator Columns**:  
   One or more binary columns representing the indicators of interest (e.g., `BP`, `Weight`, `eGFR`). Each column indicates whether the corresponding clinical activity was performed on the given date.  
   - **Values**:  
     - `1`: The clinical activity was performed.  
     - `0`: The clinical activity was not performed.

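The structure described above can be checked before any scores are calculated. The following is a minimal sketch; `check_input_frame` is a hypothetical helper written for illustration and is not part of `concordance_for_github`. It only assumes the three column types listed above.

```python
import pandas as pd

def check_input_frame(df, patient_col, date_col, indicators):
    """Hypothetical helper: list structural problems; an empty list means none were found."""
    required = [patient_col, date_col, *indicators]
    missing = [col for col in required if col not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    # errors="coerce" turns unparseable dates into NaT instead of raising
    if pd.to_datetime(df[date_col], errors="coerce").isna().any():
        problems.append(f"{date_col}: missing or unparseable dates")
    # Indicator columns must be strictly binary
    for col in indicators:
        if not df[col].isin([0, 1]).all():
            problems.append(f"{col}: contains values other than 0 and 1")
    return problems

# Illustrative frame; the value 2 in 'Weight' is deliberately invalid
frame = pd.DataFrame({
    "Masked_patient_id": ["A1", "A2"],
    "Date": ["2021-10-15", "2022-01-01"],
    "BP": [1, 1],
    "Weight": [1, 2],
})
print(check_input_frame(frame, "Masked_patient_id", "Date", ["BP", "Weight"]))
```
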

In [86]:
# Sample data for two patients
data = {
    'Masked_patient_id': ['A1', 'A1', 'A1', 'A1', 'A2', 'A2'],
    'Date': ['2020-05-02', '2021-01-18', '2021-10-15', '2022-01-15', '2021-07-15', '2022-01-01'],
    'BP': [1, 1, 1, 1, 1, 1],
    'Weight': [0, 0, 1, 0, 1, 0],
    'eGFR': [1, 1, 1, 1, 0, 1],
}

# Create Pandas dataframe
df = pd.DataFrame(data)

# Convert date column into datetime format
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# Display the dataframe
display(df)

Unnamed: 0,Masked_patient_id,Date,BP,Weight,eGFR
0,A1,2020-05-02,1,0,1
1,A1,2021-01-18,1,0,1
2,A1,2021-10-15,1,1,1
3,A1,2022-01-15,1,0,1
4,A2,2021-07-15,1,1,0
5,A2,2022-01-01,1,0,1


### Calculate Concordance Scores

The `calc_concordance_with_ratio` function calculates the concordance score for each clinical indicator using the ratio model:

$$
\text{Concordance Score} = \frac{\text{Number of Concordant Days in Evaluation Period}}{\text{Total Number of Days in Evaluation Period}}
$$

The **total concordance score** is calculated by averaging the concordance scores across all individual indicators.

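To make the ratio concrete, here is a minimal sketch of one way to read the formula. It is an interpretation, not the implementation inside `calc_concordance_with_ratio`. It assumes that each performed activity keeps the patient concordant from its date until the validity period runs out, that overlapping periods are counted once, and that only time inside the evaluation period counts. The helper name `concordance_ratio` is hypothetical. The dates are patient A1's BP and eGFR dates from the sample data, with validity periods of 182.5 and 365 days.

```python
from datetime import datetime, timedelta

def concordance_ratio(activity_dates, validity_days, start, length_days):
    """Hypothetical helper: fraction of the evaluation period covered by valid activities."""
    eval_start = datetime.strptime(start, "%Y-%m-%d")
    eval_end = eval_start + timedelta(days=length_days)

    # Each activity covers [date, date + validity); clip it to the evaluation period
    intervals = []
    for text in sorted(activity_dates):
        begin = datetime.strptime(text, "%Y-%m-%d")
        end = begin + timedelta(days=validity_days)
        begin, end = max(begin, eval_start), min(end, eval_end)
        if begin < end:
            intervals.append((begin, end))

    # Merge overlapping intervals so that no time span is counted twice
    covered = timedelta(0)
    cur_begin = cur_end = None
    for begin, end in intervals:
        if cur_end is None or begin > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_begin
            cur_begin, cur_end = begin, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_begin

    return covered / (eval_end - eval_start)

# Dates on which patient A1 had BP and eGFR recorded (indicator value 1)
a1_dates = ["2020-05-02", "2021-01-18", "2021-10-15", "2022-01-15"]
bp = concordance_ratio(a1_dates, 182.5, "2022-01-01", 365)
egfr = concordance_ratio(a1_dates, 365, "2022-01-01", 365)
total = (bp + egfr) / 2  # total score: mean of the indicator scores
print(round(bp, 3), round(egfr, 3), round(total, 3))  # 0.538 1.0 0.769
```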
---

#### Parameters
- **`df_` (DataFrame)**:  
  The DataFrame containing clinical activity data, including (masked) patient IDs, dates, and activity types.  

- **`evaluation_start_date` (str)**:  
  The start date of the evaluation period in `'YYYY-MM-DD'` format.

- **`evaluation_length` (int)**:  
  The duration (in days) of the evaluation period.

- **`indicators` (list of str or str)**:  
  The clinical indicators for which concordance scores should be calculated.  
  Example: `['eGFR', 'HbA1c']` or `'eGFR'`.

- **`patient_col` (str)**:  
  Name of the column containing unique patient identifiers.

- **`date_col` (str)**:  
  Name of the column containing the dates on which activities were recorded.

- **`validity_periods` (dict, optional)**:  
  A dictionary where keys are indicators and values are their validity durations (in days).  
  If `None`, validity periods are loaded from `validity_periods_file`.  
  **Default**: `None`

- **`validity_periods_file` (str, optional)**:  
  Path to a YAML file specifying validity periods.  
  **Default**: `'validity_periods.yml'`

---

#### Returns
- **`df_score` (DataFrame)**:  
  A DataFrame containing concordance scores for all requested indicators (columns) for each patient (rows).

---

#### Example Usage
In the following example: 

1. The evaluation period starts on **1 January 2022** and spans **365 days**.
2. Concordance scores are calculated for the indicators **eGFR** and **BP** (blood pressure).
3. Validity periods are 182.5 days for BP and 365 days for eGFR.

```python
df_concordance = calc_concordance_with_ratio(
    df_=df,
    evaluation_start_date="2022-01-01",
    evaluation_length=365,
    indicators=["eGFR", "BP"],
    patient_col="Masked_patient_id",
    date_col="Date",
    validity_periods={"BP": 182.5, "eGFR": 365}
)
```

In [87]:
# Display the sample data as specified above
print("Input DataFrame:")
display(df)

# Calculate the concordance scores using the ratio model
# Note: The concordance score for 'Weight' is not calculated, as it is not specified in the indicators list.
df_concordance = calc_concordance_with_ratio(
    df_=df, 
    evaluation_start_date='2022-01-01',
    evaluation_length=365,
    indicators=['BP', 'eGFR'],
    patient_col='Masked_patient_id',
    date_col='Date', 
    validity_periods={'eGFR': 365, 'BP': 182.5}  
)

# Display the resulting concordance scores
print("\nResulting Concordance Scores:")
display(df_concordance.round(3))

Input DataFrame:


Unnamed: 0,Masked_patient_id,Date,BP,Weight,eGFR
0,A1,2020-05-02,1,0,1
1,A1,2021-01-18,1,0,1
2,A1,2021-10-15,1,1,1
3,A1,2022-01-15,1,0,1
4,A2,2021-07-15,1,1,0
5,A2,2022-01-01,1,0,1



Resulting Concordance Scores:


Unnamed: 0,Masked_patient_id,concordance_BP,concordance_eGFR,concordance_total
0,A1,0.538,1.0,0.769
1,A2,0.5,1.0,0.75


### Using your own data
To use your own data, ensure that your DataFrame includes the required columns as described in the **'Sample Data'** section above. 

An example CSV file is available in the GitHub repository to illustrate the expected format. It also contains additional columns with test results; these are ignored, because concordance scores depend only on whether an activity was performed, not on its result. The extra columns show that larger DataFrames can be used as input as long as they include the required columns.

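If your own data holds only test results and no binary indicator columns, the indicators can be derived first. The sketch below assumes that a missing result means the activity was not performed on that date. This holds for the rows of the example CSV shown in this notebook, but it may not hold for your data. The mapping `result_to_indicator` is hypothetical, and the three rows are copied from the example file for illustration.

```python
import pandas as pd

# Three rows in the style of the example CSV, without indicator columns
raw = pd.DataFrame({
    "Masked_patient_id": ["A1", "A1", "A2"],
    "Date": ["15/12/20", "15/01/22", "10/06/21"],
    "BP_systolic": [118.0, 120.0, 140.0],
    "Weight_kg": [None, 68.0, 72.0],
})
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%y")

# Hypothetical mapping from result column to indicator column
result_to_indicator = {"BP_systolic": "BP", "Weight_kg": "Weight"}

# Assumption: a recorded result means the activity was performed (1), a missing one that it was not (0)
for result_col, indicator_col in result_to_indicator.items():
    raw[indicator_col] = raw[result_col].notna().astype(int)

print(raw[["Masked_patient_id", "Date", "BP", "Weight"]])
```
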
In [89]:
# Load CSV file
df2 = pd.read_csv('sample_data.csv')

# Convert the date column to datetime (adjust the column name and format to match your file)
df2['Date'] = pd.to_datetime(df2['Date'], format='%d/%m/%y')

# Display the loaded data
print("Input DataFrame:")
display(df2)

# Calculate the concordance scores using the ratio model
df_concordance = calc_concordance_with_ratio(
    df_=df2, 
    evaluation_start_date='2022-01-01',
    evaluation_length=365,
    indicators=['BP', 'Weight', 'eGFR'],
    patient_col='Masked_patient_id',
    date_col='Date', 
    validity_periods_file='validity_periods.yml' 
)

# Display the resulting concordance scores
print("\nResulting Concordance Scores:")
display(df_concordance.round(2))

Input DataFrame:


Unnamed: 0,Masked_patient_id,Date,BP,Weight,eGFR,BP_systolic,Weight_kg,eGFR_mLmin
0,A1,2020-12-15,1,0,1,118.0,,87.0
1,A1,2022-01-15,1,1,1,120.0,68.0,89.0
2,A1,2022-03-20,1,0,1,125.0,,88.0
3,A1,2022-07-10,1,1,1,122.0,70.0,90.0
4,A1,2023-04-01,1,0,0,118.0,,
5,A2,2021-06-10,1,1,0,140.0,72.0,
6,A2,2022-02-01,1,1,1,138.0,72.0,85.0
7,A2,2022-05-15,1,0,1,135.0,,83.0
8,A2,2022-09-20,1,0,0,132.0,,
9,A2,2023-02-10,0,1,1,,75.0,80.0



Resulting Concordance Scores:


Unnamed: 0,Masked_patient_id,concordance_BP,concordance_Weight,concordance_eGFR,concordance_total
0,A1,0.96,0.96,0.96,0.96
1,A2,0.92,0.5,0.92,0.78
2,A3,0.93,0.88,0.17,0.66
3,A4,0.8,0.53,0.8,0.71
4,A5,0.8,0.5,1.0,0.77
