## Cascadia RSV A/B swab dates and serum collection date plots
**The goal of this script is to create a single plot for each patient that contains:**
- a timeline of the dates that they tested POSITIVE for RSV A or RSV B (see computational_notebooks/gjuviler/rsv_imprinting/01-data/Imprinting_Sera2 - column name: 'oa_rsv_b' or 'oa_rsv_a'. A 1 denotes a positive swab.)
- a timeline of the dates that serum was collected that WE HAVE RIGHT NOW (see computational_notebooks/gjuviler/rsv_imprinting/Bloom_Simonich_CASCADIA_Oct2025_Samples.xlsx)
- the date and outcome of pre-F binding antibody tests (see computational_notebooks/gjuviler/rsv_imprinting/Imprinting_Sera2 - column name: 'ar_rsv_pre_f')
    - not sure if this should be a number or convert the number to a simple yes/no if binding occurred or not
    - currently on Teagan's plots, this is on a timeline of patient visit (redcap repeat instance), but we would rather have a date
- the date and outcome of neutralization titer assays (see computational_notebooks/gjuviler/rsv_imprinting/Imprinting_Sera2 - column name: 'ar_rsva_nd50')
    - again, this is currently on a patient visit timeline, but we want the date
    - there is only rsva neut data

**Previous work**
- Teagan has two notebooks in her comp notebook (see computational_notebooks/tmcmahon/2025/RSV_imprinting/02_notebooks) that create plots
- These notebooks are extremely long and it's hard to tell exactly what is going on. The outputs are located in the output folder, and are fairly useful. However, the changes mentioned above need to be made
- There is a data dictionary for the swab data in the sample_info folder in 01-data in Teagan's comp notebook.
- There is a data dictionary for the serum data in the 01-data/sample_info/CASCADIA_DataMart_dd_shared_20240830 folder in Teagan's comp notebook.


# Next steps
The next step is getting the binding and titer data plotted by date. According to their email, ar_sars_msd_sc_date is the collection date and ar_date is the date the assay was run. There are also ar_sars_msd_sc_date_mmwr and ar_sars_msd_sc_date_epi (which appear to be in year-week format). In the ar_sars_msd_sc_date column, which should be the collection date, the numbers are unclear. For example, for ptid 20000401, the dates are 23332 and 22909. I have no idea what dates these are. I might need to dig around Teagan's notebook to see if there is an original file; maybe the dates got messed up by Excel somehow?

I will also need to double check which rsv column I should be using to determine the rsv positive swabs. 

### serum dates: 
- according to the data dictionary (tmcmahon/2025/RSV_imprinting/01-data/sample_info/CASCADIA_DataMart_dd_shared_20240830), the dates are in SAS format, which is the number of days between January 1, 1960, and the specified date. If that is true:
    - 23332 = 18 Nov. 2023, corresponds to 2023 week 46 which makes sense
    - 22909 = 21 Sep. 2022, corresponds to 2022 week 38 which also makes sense

This SAS format appears to be correct - next step is to create a script that converts it to a readable date format (YYYY-MM-DD)
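
A quick way to check the SAS interpretation is to convert the two example values by hand. This is only a sketch: `sas_days_to_date` is a hypothetical helper name, and the week number printed is the ISO week, which I am assuming lines up with the week in the `_mmwr` / `_epi` columns for these two dates.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this day

def sas_days_to_date(sas_days):
    """Convert a SAS date value (days since 1960-01-01) to a datetime.date."""
    return SAS_EPOCH + timedelta(days=int(sas_days))

for sas_days in (23332, 22909):
    d = sas_days_to_date(sas_days)
    iso_year, iso_week, _ = d.isocalendar()
    # print the raw value, the calendar date, and the ISO year/week
    print(sas_days, d.isoformat(), f"{iso_year} week {iso_week}")
```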

# Important data from each spreadsheet with column names
The three main spreadsheets (Imprinting_Sera2, Imprinting_Swab2, and Bloom_Simonich_CASCADIA_Oct2025_Samples) have a lot of info that we don't need. 
This cell breaks down the important info that needs to be plotted, along with the column names and associated dates. Many of the dates are in different formats.
All files contain **ptid** (patient identifier), which is important for keeping track of which samples we have. 
This information was mostly found in the following data dictionaries:
- tmcmahon/2025/RSV_imprinting/01-data/sample_info/CASCADIA_DataMart_dd_shared_20240830
- tmcmahon/2025/RSV_imprinting/01-data/sample_info/CASCADIA_Swab results_Data dictionary

### Imprinting_Sera2
- Neutralization titer assay results: **'ar_rsva_nd50'**
    - date that the assay was performed (NOT collection date): **'ar_date'**
- RSV pre-F binding antibody data: **'ar_rsv_pre_f'**
    - date that the serum was collected for this test: **'ar_sars_msd_sc_date'**
        - listed in SAS format (days since Jan. 1, 1960), so it must be converted

### Imprinting_Swab2
- RSV A or B swab data. This data is in multiple columns (listed below), and I'm not totally sure which to use:
    - **rsv_a** / **rsv_b** : CRI/CRIMP data - 1 = positive, 0 = negative
    - **oa_rsv_a** / **oa_rsv_b** : Open Array data - 1 = positive, 0 = negative
    - **rsv_a_all** / **rsv_b_all** : combines rsv and oa_rsv columns - 1 = positive, 0 = negative
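
To help decide which column set to use, one option is to check whether the `_all` columns really are the logical OR of the CRI/CRIMP and Open Array columns. The sketch below runs on a made-up toy table (not real CASCADIA data), `find_all_mismatches` is a hypothetical helper, and it assumes a missing value should count as negative.

```python
import pandas as pd

def find_all_mismatches(df, subtype):
    """Return rows where rsv_<subtype>_all differs from (rsv_<subtype> OR oa_rsv_<subtype>)."""
    cri = df[f"rsv_{subtype}"].fillna(0).astype(int)
    oa = df[f"oa_rsv_{subtype}"].fillna(0).astype(int)
    combined = df[f"rsv_{subtype}_all"].fillna(0).astype(int)
    expected = cri | oa  # 1 if either assay flagged the swab
    return df[combined != expected]

toy = pd.DataFrame({
    "ptid": [1, 1, 2, 2],
    "rsv_a": [1, 0, 0, None],
    "oa_rsv_a": [0, 0, 1, 1],
    "rsv_a_all": [1, 0, 1, 0],  # last row is deliberately inconsistent
})
mismatches = find_all_mismatches(toy, "a")
print(mismatches)
```

If the real swab sheet returns no rows from a check like this, filtering on the `_all` columns would be equivalent to filtering on either assay being positive.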

### Bloom_Simonich_CASCADIA_Oct2025_Samples
- Originally, this was one file with two sheets, one for adults and one for children. I broke it into 3 files (1 adult, 1 children, 1 with both)
- The aliquot IDs for samples that we have on hand: **aliquot_id**
    - these don't show up in the sera sheet because they have not been tested. The important thing is that we know where ours fall in relation to the ones that have been tested. 

### Import necessary components

In [None]:
import math
import os
import altair as alt
import numpy as np
import pandas as pd
from pathlib import Path
from datetime import datetime, timedelta

#os.chdir('..')
print(os.getcwd())


/fh/fast/bloom_j/computational_notebooks/gjuviler/rsv_imprinting


### Read the current data

In [20]:


selected_child = pd.read_csv('01-data/Bloom_Simonich_CASCADIA_Oct2025_Samples_children.csv')
selected_adult = pd.read_csv('01-data/Bloom_Simonich_CASCADIA_Oct2025_Samples_adult.csv')

sera = pd.read_csv('01-data/Imprinting_Sera2.csv')       #sera data
swab = pd.read_csv('01-data/Imprinting_swab2.csv')       #swab data
sera_swab = '01-data/Imprinting_sera_swab'      #path for the new dataframe into which I will import the necessary information

### input names for saved chart/timelines

In [48]:
date = '26.02.25'

sera_filtered_converted_date = '01-data/Imprinting_Sera2_converted_dates_to_plot.csv'       #save the filtered sera dataframe with dates converted to YYYY-MM-DD from SAS
swab_aliquot = '01-data/swab+aliquots_to_plot.csv'      #save the combined swab data with our selected aliquots concatenated in

aliquot_swab_timeline = f'03-results/{date}.selected_aliquot_swab_timelines.html'    #save the swab+aliquot timelines as an html

### First, let's create a new dataframe of serum that contains only the patient ids that we have on hand, then filters for ptid, aliquot id, rsva/b outcome, ar_sars_msd_sc_date, ar_rsv_pre_f, ar_rsva_nd50

In [49]:
selected_child['ptid'] = pd.to_numeric(selected_child['ptid'], downcast='integer', errors='coerce')     #convert the ptid column to int

df_aliquots_child = selected_child[['household_id', 'ptid', 'aliquot_id', 'collect_dt']]    #creates a new df with just these 4 columns

#list of the patient ids (as ints) that we have on hand, used to select the matching sera data
#rows with EMPTY in the ptid column were coerced to NaN by pd.to_numeric above, so drop them here
selected_ptids = df_aliquots_child['ptid'].dropna().astype(int).unique().tolist()

sera_filtered_ptid = sera[sera['ptid'].isin(selected_ptids)].copy()
#.copy() so that adding columns later does not trigger SettingWithCopyWarning
sera_filtered = sera_filtered_ptid[['ptid', 'aliquot_id', 'rsv_a_outcome', 'rsv_b_outcome', 'ar_sars_msd_sc_date', 'ar_rsv_pre_f', 'ar_date', 'ar_rsva_nd50']].copy()
sera_filtered


Unnamed: 0,ptid,aliquot_id,rsv_a_outcome,rsv_b_outcome,ar_sars_msd_sc_date,ar_rsv_pre_f,ar_date,ar_rsva_nd50
303,20000401,3704633g,1,1,23332.0,126881.8,,
304,20000401,0027014g,1,1,,,23294.0,
305,20000401,8189674g,1,1,22909.0,293.3,,
306,20000401,0027014g,1,1,,,23201.0,422.0
307,20000401,1665540g,1,1,,,23485.0,886.0
...,...,...,...,...,...,...,...,...
637,20074272,4210123g,0,1,23172.0,750.0,,
638,20074272,7047035g,0,1,,,23397.0,465.0
639,20074272,3883693g,0,1,,,23470.0,
640,20074272,3883693g,0,1,,,23484.0,1474.0
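
Because `pd.to_numeric(..., errors='coerce')` turns the `EMPTY` placeholders into `NaN`, a later string comparison against `'EMPTY'` would not filter anything; dropping missing values is what removes those rows. A small sketch on made-up ptid strings (toy values, not real data):

```python
import pandas as pd

raw = pd.Series(["20000401", "20000533", "EMPTY", "20000401"], name="ptid")  # toy values

numeric = pd.to_numeric(raw, errors="coerce")  # 'EMPTY' becomes NaN
print(numeric.isna().sum())                    # number of placeholder rows

# NaN is not equal to the string 'EMPTY', so this check keeps every row
kept_by_string_check = [p for p in numeric if p != "EMPTY"]

# dropna() removes the placeholders; unique() collapses repeated ptids
clean_ptids = numeric.dropna().astype(int).unique().tolist()

print(len(kept_by_string_check), clean_ptids)
```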


### Convert the sas date format (days after Jan. 1, 1960) in ar_sars_msd_sc_date column to YYYY-MM-DD format

In [None]:
def sas_to_ymd(sas_date):
    if pd.isna(sas_date):
        return None
    sas_epoch = datetime(1960, 1, 1)
    converted_date = sas_epoch + timedelta(days=int(sas_date))
    return converted_date.strftime("%Y-%m-%d")

sera_filtered['collect_dt_ymd_binding'] = sera_filtered['ar_sars_msd_sc_date'].map(sas_to_ymd)
sera_filtered['dt_ymd_neut'] = sera_filtered['ar_date'].map(sas_to_ymd)

df_sera = sera_filtered     #df_sera is the sera data that will be plotted
sera_filtered








Unnamed: 0,ptid,aliquot_id,rsv_a_outcome,rsv_b_outcome,ar_sars_msd_sc_date,ar_rsv_pre_f,ar_date,ar_rsva_nd50,collect_dt_ymd_binding,dt_ymd_neut
303,20000401,3704633g,1,1,23332.0,126881.8,,,2023-11-18,
304,20000401,0027014g,1,1,,,23294.0,,,2023-10-11
305,20000401,8189674g,1,1,22909.0,293.3,,,2022-09-21,
306,20000401,0027014g,1,1,,,23201.0,422.0,,2023-07-10
307,20000401,1665540g,1,1,,,23485.0,886.0,,2024-04-19
...,...,...,...,...,...,...,...,...,...,...
637,20074272,4210123g,0,1,23172.0,750.0,,,2023-06-11,
638,20074272,7047035g,0,1,,,23397.0,465.0,,2024-01-22
639,20074272,3883693g,0,1,,,23470.0,,,2024-04-04
640,20074272,3883693g,0,1,,,23484.0,1474.0,,2024-04-18
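
As an alternative to mapping `sas_to_ymd` over each value, pandas can do the same conversion in one vectorized call through the `unit` and `origin` arguments of `pd.to_datetime`. A minimal sketch on a toy frame (the two SAS values appear in the table above; the frame itself is made up), which also takes an explicit `.copy()` of the column subset so that adding a column does not trigger `SettingWithCopyWarning`:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "ptid": [20000401, 20000401, 20000401],
    "ar_sars_msd_sc_date": [23332.0, np.nan, 22909.0],
    "other": ["x", "y", "z"],
})

# .copy() makes the subset an independent frame rather than a slice of `toy`
subset = toy[["ptid", "ar_sars_msd_sc_date"]].copy()

# Interpret the numbers as days counted from 1960-01-01; NaN becomes NaT
subset["collect_dt"] = pd.to_datetime(
    subset["ar_sars_msd_sc_date"], unit="D", origin=pd.Timestamp("1960-01-01")
)
print(subset)
```

This keeps the column as a real datetime dtype, so the later `pd.to_datetime(...)` round trip from strings would not be needed.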


# Filtering Swab Data

### Create a dataframe that contains only our selected ptids, aliquot ids, and dates from the selected_child spreadsheet and convert the dates to YYYY-MM-DD

In [51]:
filtered_aliquots = selected_child[['ptid', 'aliquot_id', 'collect_dt']].copy()

def dmy_to_ymd(date_str):
    """Parse a date string (e.g. '9/21/22' or '27-Sep-22') into a pandas Timestamp; 'EMPTY' becomes None."""
    if date_str == 'EMPTY':
        return None
    #format='mixed' (pandas >= 2.0) infers the format of each value individually
    return pd.to_datetime(date_str, format='mixed')

filtered_aliquots['collect_dt_ymd_aliquot'] = filtered_aliquots['collect_dt'].map(dmy_to_ymd)
filtered_aliquots



Unnamed: 0,ptid,aliquot_id,collect_dt,collect_dt_ymd_aliquot
0,20000401.0,3001262g,9/21/22,2022-09-21
1,20000401.0,6125192g,11/18/23,2023-11-18
2,20000533.0,8129615g,8/19/22,2022-08-19
3,20000533.0,3614113g,8/21/23,2023-08-21
4,20001363.0,2140292g,7/24/22,2022-07-24
5,20001363.0,4446778g,9/10/23,2023-09-10
6,20002793.0,9555599g,9/3/22,2022-09-03
7,20002793.0,8242954g,9/23/23,2023-09-23
8,20002903.0,4027265g,8/22/22,2022-08-22
9,20002903.0,4636768g,9/2/23,2023-09-02


### And then we can create a filtered dataset for when our ptids test positive for rsv a or b from the swab csv and convert the dates to YYYY-MM-DD
I am filtering by the oa_rsv_a and oa_rsv_b columns, but based on Teagan's plots, she seems to be filtering by something else. She has 3 positive RSV A events for ptid 20001363, but my filtering finds only two; the swab from week 2022_46 is the one that is missing. I'll have to ask Cassie which column is actually correct. It's possible that I should be filtering by the rsv_a and rsv_b columns instead. There is also an rsv_a_all column that I think combines the two.

In [52]:
swab_filtered = swab[swab['ptid'].isin(selected_ptids)].copy()     #filter swab data to contain only our ptids
#build the mask from swab_filtered itself (not swab) so the boolean index matches the frame being filtered
swab_positive = swab_filtered[(swab_filtered['oa_rsv_a'] == 1) | (swab_filtered['oa_rsv_b'] == 1)]        #filter for positive rsv a or b swabs
swab_positive = swab_positive[['ptid', 'swab_date', 'oa_rsv_a', 'oa_rsv_b']].copy()

swab_positive['collect_dt_ymd_swab'] = swab_positive['swab_date'].map(dmy_to_ymd)       #convert the swab dates to YYYY-MM-DD
swab_positive



Unnamed: 0,ptid,swab_date,oa_rsv_a,oa_rsv_b,collect_dt_ymd_swab
3008,20000401,27-Sep-22,1.0,0.0,2022-09-27
3009,20000401,2-Oct-22,1.0,0.0,2022-10-02
3010,20000401,10-Oct-22,1.0,0.0,2022-10-10
3011,20000401,16-Oct-22,1.0,0.0,2022-10-16
3067,20000401,14-Nov-23,0.0,1.0,2023-11-14
3103,20000533,27-Nov-22,0.0,1.0,2022-11-27
3280,20001363,7-Nov-22,1.0,0.0,2022-11-07
3295,20001363,19-Feb-23,1.0,0.0,2023-02-19
3477,20002793,28-Jan-23,1.0,0.0,2023-01-28
3539,20002793,6-Jan-24,0.0,1.0,2024-01-06
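
To see how much the choice of column matters, one could count positive swabs per ptid under each of the three definitions side by side. This sketch uses a made-up toy table (the values are invented; only the column names come from the swab sheet), and `count_positives_by_definition` is a hypothetical helper:

```python
import pandas as pd

def count_positives_by_definition(swab_df, subtype="a"):
    """Count positive swabs per ptid for the CRI/CRIMP, Open Array, and combined columns."""
    cols = [f"rsv_{subtype}", f"oa_rsv_{subtype}", f"rsv_{subtype}_all"]
    positives = swab_df[cols].eq(1)  # True where the swab is flagged positive
    return positives.groupby(swab_df["ptid"]).sum()

toy_swab = pd.DataFrame({
    "ptid":      [101, 101, 101, 102],
    "rsv_a":     [1, 1, 1, 0],
    "oa_rsv_a":  [1, 0, 1, 0],
    "rsv_a_all": [1, 1, 1, 0],
})
counts = count_positives_by_definition(toy_swab, "a")
print(counts)
```

A ptid whose counts differ across the three columns is one where the plots would change depending on the filter.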


### Now, we can concatenate the aliquot/date dataframe and the filtered swab dataframe so that the swab dates and the aliquot dates are in one table

In [55]:
swab_aliquot_concat = pd.concat([swab_positive, filtered_aliquots], ignore_index=True)
swab_aliquot_concat = swab_aliquot_concat[['ptid', 'aliquot_id', 'collect_dt_ymd_aliquot', 'oa_rsv_a', 'oa_rsv_b', 'collect_dt_ymd_swab']].sort_values(['ptid'])

df_event = swab_aliquot_concat
df_event


Unnamed: 0,ptid,aliquot_id,collect_dt_ymd_aliquot,oa_rsv_a,oa_rsv_b,collect_dt_ymd_swab
0,20000401.0,,NaT,1.0,0.0,2022-09-27
1,20000401.0,,NaT,1.0,0.0,2022-10-02
2,20000401.0,,NaT,1.0,0.0,2022-10-10
3,20000401.0,,NaT,1.0,0.0,2022-10-16
4,20000401.0,,NaT,0.0,1.0,2023-11-14
...,...,...,...,...,...,...
77,20074272.0,4751353g,2023-11-11,,,NaT
78,20074272.0,9669406g,2023-12-30,,,NaT
65,,EMPTY,NaT,,,NaT
67,,EMPTY,NaT,,,NaT
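
Concatenating two frames with different columns leaves `NaN`/`NaT` wherever a column is absent from one side, which is why each row above has either a swab date or an aliquot date but not both. An alternative layout, sketched here on toy rows (invented values; `event_date` and `event_type` mirror the names used in the plotting cells), is to give each source the same columns before stacking:

```python
import pandas as pd

# toy inputs; `swabs` is assumed to already contain only positive swabs
swabs = pd.DataFrame({
    "ptid": [101, 101],
    "collect_dt_ymd_swab": pd.to_datetime(["2022-09-27", "2023-11-14"]),
    "oa_rsv_a": [1.0, 0.0],
    "oa_rsv_b": [0.0, 1.0],
})
aliquots = pd.DataFrame({
    "ptid": [101],
    "aliquot_id": ["toy0001g"],
    "collect_dt_ymd_aliquot": pd.to_datetime(["2022-09-21"]),
})

swab_events = pd.DataFrame({
    "ptid": swabs["ptid"],
    "event_date": swabs["collect_dt_ymd_swab"],
    # label each positive swab by subtype (B takes precedence if both are flagged)
    "event_type": swabs["oa_rsv_b"].eq(1).map({True: "RSV B pos.", False: "RSV A pos."}),
})
aliquot_events = pd.DataFrame({
    "ptid": aliquots["ptid"],
    "event_date": aliquots["collect_dt_ymd_aliquot"],
    "event_type": "Aliquot",
})

# one row per event, every row with a date and a type
events = (
    pd.concat([swab_events, aliquot_events], ignore_index=True)
    .sort_values(["ptid", "event_date"])
    .reset_index(drop=True)
)
print(events)
```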


In [65]:
import pandas as pd
import altair as alt

# --- Step 0: Ensure datetime columns ---
df_event["collect_dt_ymd_swab"] = pd.to_datetime(df_event["collect_dt_ymd_swab"], errors="coerce")
df_event["collect_dt_ymd_aliquot"] = pd.to_datetime(df_event["collect_dt_ymd_aliquot"], errors="coerce")
df_event["event_date"] = df_event["collect_dt_ymd_swab"].combine_first(df_event["collect_dt_ymd_aliquot"])

df_sera["collect_dt_ymd_binding"] = pd.to_datetime(df_sera["collect_dt_ymd_binding"], errors="coerce")
df_sera["dt_ymd_neut"] = pd.to_datetime(df_sera["dt_ymd_neut"], errors="coerce")

# --- Step 1: Add event_type to df_event ---
df_event["event_type"] = None
df_event.loc[df_event["oa_rsv_a"] == 1, "event_type"] = "RSV A pos."
df_event.loc[df_event["oa_rsv_b"] == 1, "event_type"] = "RSV B pos."
df_event.loc[df_event["aliquot_id"].notna(), "event_type"] = "Aliquot"

timeline_df = df_event.dropna(subset=["event_type", "event_date"]).copy()
timeline_df["ptid"] = timeline_df["ptid"].astype(int)

# --- Step 2: Create master date list including ALL dates from events + binding + neut ---
all_dates = pd.concat([
    timeline_df[["event_date"]],
    df_sera[["collect_dt_ymd_binding"]].rename(columns={"collect_dt_ymd_binding": "event_date"}),
    df_sera[["dt_ymd_neut"]].rename(columns={"dt_ymd_neut": "event_date"})
]).dropna().drop_duplicates().sort_values("event_date").reset_index(drop=True)

# Create ordered categorical date labels
all_dates["date_label_master"] = all_dates["event_date"].dt.strftime("%Y-%m-%d")
ordered_labels = all_dates["date_label_master"].tolist()

# --- Step 3: Assign master date label to all dfs ---
timeline_df["date_label_master"] = pd.Categorical(
    timeline_df["event_date"].dt.strftime("%Y-%m-%d"),
    categories=ordered_labels,
    ordered=True
)

df_sera["date_label_master_binding"] = pd.Categorical(
    df_sera["collect_dt_ymd_binding"].dt.strftime("%Y-%m-%d"),
    categories=ordered_labels,
    ordered=True
)

df_sera["date_label_master_neut"] = pd.Categorical(
    df_sera["dt_ymd_neut"].dt.strftime("%Y-%m-%d"),
    categories=ordered_labels,
    ordered=True
)

# --- Step 4: ptid dropdown ---
ptid_dropdown = alt.binding_select(
    options=sorted(timeline_df["ptid"].unique()), name="Select ptid: "
)
ptid_selection = alt.selection_point(fields=["ptid"], bind=ptid_dropdown,
                                     value=sorted(timeline_df["ptid"].unique())[0])

# --- Step 5: Bottom timeline (ordinal axis, evenly spaced with all dates) ---
timeline_chart = (
    alt.Chart(timeline_df)
    .mark_point(size=120)
    .encode(
        x=alt.X(
            "date_label_master:N",
            title="Event Dates (Chronological)",
            axis=alt.Axis(labelAngle=-45, labelFontSize=12, titleFontSize=14),
            scale=alt.Scale(domain=ordered_labels),
        ),
        y=alt.value(50),
        color=alt.Color(
            "event_type:N",
            scale=alt.Scale(domain=["RSV A pos.", "RSV B pos.", "Aliquot"],
                            range=["red", "blue", "black"]),
            legend=alt.Legend(title="Event Type", labelFontSize=12, titleFontSize=14),
        ),
        shape=alt.Shape(
            "event_type:N",
            scale=alt.Scale(domain=["RSV A pos.", "RSV B pos.", "Aliquot"],
                            range=["circle", "square", "triangle"]),
        ),
        tooltip=["ptid", "event_type", "event_date"]
    )
    .add_params(ptid_selection)
    .transform_filter(ptid_selection)
    .properties(width=800, height=100, title="Timeline of Positive Swab Events and Selected Aliquots")
)

# --- Step 6: Binding plot (ordinal x aligned to timeline) ---
binding_df = df_sera.dropna(subset=["ar_rsv_pre_f", "collect_dt_ymd_binding"]).copy()
binding_df["ptid"] = binding_df["ptid"].astype(int)

binding_chart = (
    alt.Chart(binding_df)
    .mark_point(size=80, color="green")
    .encode(
        x=alt.X("date_label_master_binding:N", axis=None, scale=alt.Scale(domain=ordered_labels)),
        y=alt.Y("ar_rsv_pre_f:Q", title="Binding (ar_rsv_pre_f)"),
        tooltip=["ptid", "ar_rsv_pre_f", "collect_dt_ymd_binding"]
    )
    .add_params(ptid_selection)
    .transform_filter(ptid_selection)
    .properties(width=800, height=150, title="Binding Data")
)

# --- Step 7: Neutralization plot (ordinal x aligned to timeline) ---
neut_df = df_sera.dropna(subset=["ar_rsva_nd50", "dt_ymd_neut"]).copy()
neut_df["ptid"] = neut_df["ptid"].astype(int)

neut_chart = (
    alt.Chart(neut_df)
    .mark_point(size=80, color="purple")
    .encode(
        x=alt.X("date_label_master_neut:N", axis=None, scale=alt.Scale(domain=ordered_labels)),
        y=alt.Y("ar_rsva_nd50:Q", title="Neutralization (ar_rsva_nd50)"),
        tooltip=["ptid", "ar_rsva_nd50", "dt_ymd_neut"]
    )
    .add_params(ptid_selection)
    .transform_filter(ptid_selection)
    .properties(width=800, height=150, title="Neutralization Titers")
)

# --- Step 8: Stack vertically ---
final_chart = alt.vconcat(
    neut_chart,
    binding_chart,
    timeline_chart
)


final_chart
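
The three panels in the cell above line up because they share one ordered list of date labels as their x domain. A stripped-down sketch of that step in plain pandas, using invented dates (the variable names here are illustrative, not the ones in the notebook):

```python
import pandas as pd

event_dates = pd.Series(pd.to_datetime(["2023-11-14", "2022-09-27"]))
binding_dates = pd.Series(pd.to_datetime(["2022-09-21", None]))
neut_dates = pd.Series(pd.to_datetime(["2023-07-10", "2022-09-27"]))

# One sorted, de-duplicated list of every date that appears in any panel
all_dates = (
    pd.concat([event_dates, binding_dates, neut_dates])
    .dropna()
    .drop_duplicates()
    .sort_values()
)
ordered_labels = all_dates.dt.strftime("%Y-%m-%d").tolist()

# Each panel's dates become an ordered categorical over the shared labels
binding_labels = pd.Categorical(
    binding_dates.dt.strftime("%Y-%m-%d"), categories=ordered_labels, ordered=True
)
print(ordered_labels)
print(binding_labels.codes)  # a code of -1 marks a missing date
```

Because the axis is ordinal, points are evenly spaced regardless of how far apart the dates are; a temporal (`:T`) encoding on the datetime column would show the real gaps instead, at the cost of crowding dates that fall close together.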


### Now, we can plot the data on a timeline

In [56]:
df_event["collect_dt_ymd_swab"]  = pd.to_datetime(df_event["collect_dt_ymd_swab"], errors="coerce")
df_event["collect_dt_ymd_aliquot"] = pd.to_datetime(df_event["collect_dt_ymd_aliquot"], errors="coerce")

df_event["event_date"] = df_event["collect_dt_ymd_swab"].combine_first(df_event["collect_dt_ymd_aliquot"])

df_event["event_type"] = None
df_event.loc[df_event["oa_rsv_a"] == 1, "event_type"] = "RSV A pos."
df_event.loc[df_event["oa_rsv_b"] == 1, "event_type"] = "RSV B pos."
df_event.loc[df_event["aliquot_id"].notna(), "event_type"] = "Aliquot"

timeline_df = df_event.dropna(subset=["event_type", "event_date"]).copy()
timeline_df["ptid"] = timeline_df["ptid"].astype(int)

# --- STEP 1: CREATE DATE LABEL STRING ---
timeline_df["date_label"] = timeline_df["event_date"].dt.strftime("%Y-%m-%d")

# --- STEP 2: Make Dropdown Selection ---
ptid_dropdown = alt.binding_select(
    options=sorted(timeline_df["ptid"].unique()),
    name="Select ptid: "
)

ptid_selection = alt.selection_point(
    fields=["ptid"],
    bind=ptid_dropdown,
    value=sorted(timeline_df["ptid"].unique())[0]
)

# --- STEP 3: BUILD ALTAIR CHART ---
chart = (
    alt.Chart(timeline_df)
    .mark_point(size=120)
    .encode(
        x=alt.X(
            "date_label:N",
            title="Date (YYYY-MM-DD)",
            sort=alt.SortField(field="event_date", order="ascending"),
            axis=alt.Axis(
                labelAngle=-45,
                labelFontSize=14,
                titleFontSize=18
            ),
        ),
        y=alt.value(50),
        color=alt.Color(
            "event_type:N",
            scale=alt.Scale(
                domain=["RSV A pos.", "RSV B pos.", "Aliquot"],
                range=["red", "blue", "black"]
            ),
            legend=alt.Legend(
                title="Event Type",
                labelFontSize=14,
                titleFontSize=18
            )
        ),
        shape=alt.Shape(
            "event_type:N",
            scale=alt.Scale(
                domain=["RSV A pos.", "RSV B pos.", "Aliquot"],
                range=["circle", "square", "triangle"]
            )
        ),
        tooltip=["ptid", "event_type", "event_date"]
    )
    .add_params(ptid_selection)
    .transform_filter(ptid_selection)
    .properties(
        width=800, height=100,
        title=alt.TitleParams(
            text="Timeline of Positive Swab Events and Selected Aliquots",
            fontSize=20,
            anchor='middle'
        )
    )
)

chart