# **Exercise Session 1 - Solutions**
# Developed by Biljana Jonoska Stojkova, PhD
# Revised by Johnson Chen

## **Lecture 1 - Principles of Study Design and Sampling Techniques**

Today, we will explore the Copper case study to define primary research questions and study design. This preparation will set the stage for Day 2, where we will formulate the statistical problem while incorporating the complexities of data structures into the problem definition. 

Each student will have the opportunity to practice formulating research questions. You will learn how to identify the basic statistical concepts behind the study, including:

- **Primary Research Question (Day 1)**
- **Study Design Type (Day 1)**
- **Limitations from the Study Design and Analysis Methods (Day 1)**
- **Statistical Hypothesis Formulation (Day 2)**
- **Sample Size Considerations (Day 2)**
- **Presenting Your Study Protocol (Day 3)**

This exercise will be conducted in three parts. You will work within your team, and each of you will have a role to play. The final part of the exercise will involve a short presentation of your findings.

## **Today's learning goals**:

- Clearly define the research question

- Clearly explain a complex study design


## **Study: One-Year Trial Evaluating the Durability and Antimicrobial Efficacy of Copper in Public Transportation Systems**

### **Introduction**

The main objective of this study is to test the effect of three Copper products after 12 months of use on public transit. Copper is known for its biocidal properties against microorganisms. During the COVID pandemic, epidemiological measures were taken to reduce the spread of transmissible diseases. A mining company that produces Copper funded this research study to assess the usability of Copper on public transit.

The application of copper (Cu) alloys to high-touch surfaces could help reduce the risk of cross-contamination; however, little is known about the durability and efficacy of engineered copper surfaces after prolonged use.

Three different commercially available Cu alloy products, ranging from 80 to 91.3% Cu content, were installed on high-touch surfaces in buses and trains (SkyTrain) in Vancouver, as well as subway cars, streetcars, and buses in Toronto, and monitored over the course of one year. The primary objective of this study was to establish the antimicrobial efficacy and durability of Cu alloy surfaces over 12 months of use in public transit vehicles located in two Canadian cities.

For more details, read the published scientific article in Nature - Scientific Reports:
[https://www.nature.com/articles/s41598-024-56225-9](https://www.nature.com/articles/s41598-024-56225-9) 

Please note that the data set used in this course is simulated and follows the descriptions of the data in the published article.


### **Primary Research Question**

Does the use of Copper products reduce the biomass on public transit in Vancouver and Toronto after 12 months?

### **Study Design**

Stanchions (handrails on public transit) are high-touch surfaces frequently used by passengers.

In this study, copper products were randomly installed on 110 stanchions across three buses and four trains (SkyTrain) in Vancouver, and three buses, two subway cars, and two streetcars in Toronto. Each copper-coated stanchion was paired with a control stanchion placed nearby to ensure close proximity for comparison. Bacterial counts (Colony Forming Units, CFU) were measured every two months after peak morning routes. A Petrifilm Plate Reader Advanced imager was used to collect and process the microbial samples in the microbiology lab to obtain CFU numbers. Three replicate samples were taken from each stanchion for both copper-coated and control surfaces.

There is extensive literature supporting CFU as a reliable measure of biomass on high-touch surfaces in public transit, so CFU will be used as the primary outcome measure for this study.

The 12-month trial was conducted in collaboration with the Toronto Transit Commission (TTC) and Vancouver TransLink (TL). A total of 14 vehicles were used.

Microbial samples were collected one to three hours after the last passenger departed from the transit vehicle and prior to cleaning.

Although cleaning protocols changed over the study period, microbial samples for copper and control stanchions were taken simultaneously, as the main objective was to compare the copper-coated surfaces to the control surfaces after 12 months.

Read more here: https://asda.stat.ubc.ca/Workshops/asda.stat.ubc.ca/Workshop/2024-07-VSP_Course1/ExerciseStudyDesign.html

<img src="../images/ProblemFormulation-StatsMethodologie.drawio.png">


**Teams 1-18:**

Each team member is assigned a role: a researcher or a statistician.

**Researcher**

**Q1. Describe what you want to learn from the dataset.**

- Write a sentence on the research question using simple language, e.g., `Does using copper on handrails reduce germs on public transit?`


**Task Q1 Answer (Researcher):**  Does using copper on handrails reduce germs on public transit across all vehicles and across both cities, compared to stainless steel controls?


**Q2. Describe how you plan to collect data in simple terms.**

- Write a sentence on the study design in simple terms, such as `We will put copper on some handrails and not on others, and then we will count how many germs are on each.`

- If easier, use drawing software or pen and paper  to identify the structures in the dataset. A study design diagram or drawing can support the written details of the study design and illustrate the complexities in the data. This can also be a very useful tool when discussing with your team.

**Task Q2 Answer (Researcher):** We will install copper on three stanchions and stainless steel (controls) on another set of three matched stanchions within each vehicle in each city. We will then take three samples from each stanchion.


<img src="../images/41598_2024_56225_Fig1_HTML.png">

**Statistician**

Guide the researcher in formulating a clear research question. Ask questions like this:

- Can you please tell me more about how you are planning to measure the 'reduction of germs'?

**Researcher**:

Well, we have a standard measure in the microbiology field known as Colony-Forming Units (CFU). It is highly skewed, so it is typically analyzed on a log10 scale.

**Statistician:**

Oh, I see, so this is a relevant measure for the phenomenon of interest (reduced biomass on transit)!
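As a minimal sketch of the log10 scale the researcher mentions, the snippet below transforms a few CFU counts. The counts are hypothetical values chosen only for illustration (they are not study data), and adding an offset of 1 before taking the logarithm is an assumption made here so that zero counts remain defined; the study itself may handle zeros differently.

```python
import math

# Hypothetical CFU counts for illustration only (not study data).
cfu_counts = [0, 3, 12, 45, 150, 2300]


def log10_cfu(count, offset=1):
    """Return log10(count + offset).

    The offset is an assumption of this sketch so that a count of 0
    does not produce an undefined logarithm.
    """
    return math.log10(count + offset)


# Transform every count and print the raw and transformed values side by side.
log_cfu = [log10_cfu(c) for c in cfu_counts]
for count, value in zip(cfu_counts, log_cfu):
    print(f"CFU = {count:>5}  ->  log10(CFU + 1) = {value:.2f}")
```

On the log10 scale, a difference of 1 between two surfaces corresponds to a tenfold difference in CFU, which is easier to compare than raw counts that span several orders of magnitude.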

**Statistician:**

Your main goal is to compare CFU for Copper and Control handrails at 12 months. These measurements need to be comparable (as in the basic math rule, we need to compare apples to apples). How are you planning to install the Copper and Control (Steel) materials on transit?

**Researcher**:

Yes, we will pair Control and Copper handrails (Stanchions) and install them close to each other on each vehicle.

**Statistician:**

How many stanchions have Copper products installed, and how many have stainless steel?

**Researcher:**

We have 3 Copper-Control pairs on each of 3 vehicles in each of two cities (Vancouver and Toronto), making a total of 36 stanchions. Do you think this sample size will be enough?

**Statistician:**

Alright! I cannot determine anything about the sample size until we have clearly defined the primary statistical hypothesis. Your study design is complex, so we will need to discuss further to get there. Do you collect a variable on pairing that will indicate which stanchion is matched to which (between the Control and Copper)?

**Researcher:**

Oh, I am not sure! I will have to refer you to the company that collects the data and stores them in a database.

**Statistician:**

How many vehicles per city? How many stanchions per vehicle? Are any repeated measures taken from each stanchion? Are they balanced?

**Researcher:**

We have 3 vehicles per city, 6 stanchions per vehicle, and from each stanchion, we take 3 samples. There is minimal missing data.

Side Note: Of course, the real study sample size was much bigger. Here, we emulate that dataset, and it is much smaller, but it incorporates most of the complexities found in the real dataset.
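The numbers in this exchange (2 cities, 3 vehicles per city, 3 Copper-Control pairs per vehicle, 3 samples per stanchion) can be laid out as a design table to check the arithmetic. The sketch below is one possible layout: the column names and the ID scheme (`vehicle_id`, `pair_id`, `stanchion_id`) are hypothetical and may not match the variables in the course dataset.

```python
from itertools import product

# Design constants taken from the dialogue above.
CITIES = ["Vancouver", "Toronto"]
VEHICLES_PER_CITY = 3
PAIRS_PER_VEHICLE = 3
MATERIALS = ["Copper", "Control"]
SAMPLES_PER_STANCHION = 3

# One row per sample. The ID scheme below is a hypothetical naming convention.
design_rows = []
for city, vehicle, pair, material, sample in product(
    CITIES,
    range(1, VEHICLES_PER_CITY + 1),
    range(1, PAIRS_PER_VEHICLE + 1),
    MATERIALS,
    range(1, SAMPLES_PER_STANCHION + 1),
):
    vehicle_id = f"{city}-V{vehicle}"
    pair_id = f"{vehicle_id}-P{pair}"
    design_rows.append(
        {
            "city": city,
            "vehicle_id": vehicle_id,
            "pair_id": pair_id,  # links a Copper stanchion to its Control
            "stanchion_id": f"{pair_id}-{material}",
            "material": material,
            "sample": sample,
        }
    )

# Count the distinct units at each level of the design.
n_vehicles = len({row["vehicle_id"] for row in design_rows})
n_pairs = len({row["pair_id"] for row in design_rows})
n_stanchions = len({row["stanchion_id"] for row in design_rows})

print("vehicles:", n_vehicles)
print("pairs:", n_pairs)
print("stanchions:", n_stanchions)
print("samples (rows):", len(design_rows))
```

The `pair_id` column is the pairing variable the statistician asks about: without it, a Copper stanchion cannot be matched to its Control in the analysis.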

**Statistician:**

Given the complexity of the study design, we need to clarify your research question and prioritize the research questions.

Is your primary interest to assess the mean CFU reduction between Controls and Copper across all cities and vehicles and all repeated samples taken from each stanchion? Or do you want to assess the mean CFU reduction between Controls and Copper per city or per vehicle?

**Researcher:**

Great questions! The primary interest is to assess the mean CFU reduction between Controls and Copper across all cities and vehicles and all repeated samples.

However, we would also like to estimate the mean CFU difference between the Control and Copper for each city. But this is a secondary research question.

**Statistician:**

Alright, so for the secondary research question, we will estimate the mean CFU difference between the Control and Copper for each city, across all vehicles and all repeated samples taken from each stanchion?

**Researcher:**

Yes!

**Statistician:**

Splendid! I will write up the research questions, prioritized by primary and secondary research questions. I will also write a paragraph to describe the variables in the study design, as this needs to be explained for the report.
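To make the difference between the primary and secondary questions concrete, the sketch below computes a simple descriptive summary on simulated log10 CFU values: it averages the three samples per stanchion, takes the Control minus Copper difference within each pair, and then averages those differences overall (primary) and per city (secondary). This is only an illustration under stated assumptions. The simulation parameters (means of 2.0 and 1.5, standard deviation of 0.3) and the seed are arbitrary, and this naive averaging is not necessarily the analysis method used in the study or later in the course.

```python
import random
from statistics import mean

random.seed(1)  # arbitrary seed so the sketch is reproducible

CITIES = ["Vancouver", "Toronto"]


def simulate_log_cfu(material):
    """Draw one simulated log10 CFU value.

    The means and standard deviation are arbitrary assumptions,
    not estimates from the study.
    """
    base = 2.0 if material == "Control" else 1.5
    return random.gauss(base, 0.3)


# Build one record per Copper-Control pair: 2 cities x 3 vehicles x 3 pairs.
pair_differences = []
for city in CITIES:
    for vehicle in range(1, 4):
        for pair in range(1, 4):
            stanchion_means = {}
            for material in ("Control", "Copper"):
                # Average the three replicate samples from each stanchion.
                samples = [simulate_log_cfu(material) for _ in range(3)]
                stanchion_means[material] = mean(samples)
            pair_differences.append(
                {
                    "city": city,
                    "diff": stanchion_means["Control"] - stanchion_means["Copper"],
                }
            )

# Primary question: mean paired difference across all cities and vehicles.
overall = mean(d["diff"] for d in pair_differences)

# Secondary question: the same quantity, for each city separately.
by_city = {
    city: mean(d["diff"] for d in pair_differences if d["city"] == city)
    for city in CITIES
}

print(f"Overall mean log10 difference (Control - Copper): {overall:.2f}")
for city, value in by_city.items():
    print(f"{city}: {value:.2f}")
```

Because this toy design is balanced (nine pairs per city), the overall mean equals the average of the two city means. With unbalanced or missing data the two would differ, which is one reason the priority of the research questions needs to be fixed in advance.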

### In the following cells, please write one sentence for each research question and explain the study design.

**Hint:** Following the discussion between the statistician and the researcher, choose which of the research questions are primary and which are secondary research questions, and write (copy/paste) your answers in the cells below.

- **A.** Estimate the effect of copper in reducing the bacterial life on public transit, across all vehicles and across two Canadian cities (Vancouver and Toronto).

- **B.** Estimate the effect of copper in reducing the bacterial life on public transit, across all vehicles in each of the Canadian cities (Vancouver and Toronto) separately.

- **C.** Estimate the effect of copper in reducing the bacterial life on public transit for each vehicle in each of the Canadian cities (Vancouver and Toronto) separately.

- **D.** Estimate the effect of copper in reducing the bacterial life on public transit for each vehicle separately.

**Primary Research Question - Statistician**: A


**Secondary Research Question - Statistician**: B


**Study Design - Statistician**: Use graphics to illustrate the design. Identify which variables are structural, such as:

**Hint**:
- Which variable is the observational unit?
- Are there repeated measures for each observational unit?
- Are observational units clustered?
- Are clusters of observational units further clustered?

**Answer: Study Design - Statistician  Explain**:

- The stanchion is the observational unit.
- Yes, three repeated measures (samples) are collected from each stanchion.
- Yes, stanchions are clustered within pairs (a Copper and a Control stanchion within each vehicle).
- Yes, pairs of stanchions are clustered within each vehicle, and vehicles are clustered within each city.


<img src="../images/CopperStudyStructure.drawio.png">

**Statistician:**

Any estimates obtained for the primary and secondary analysis will be valid under the assumption that we can combine the data, i.e., CFU is measured according to the same protocols in each city and for each vehicle at the same time point.

Cleaning protocols need to be explained clearly and included in the research report as well. The cleaning protocol defines the scenario (and limitations) under which the estimates of this study will be valid. Therefore, I recommend that you include this as a major limitation of the study in your study protocol.

**Researcher:**

Got it! We have the cleaning protocols, and cleaning will follow the same protocol as much as possible, but real life is, of course, messy. Yes, we will write this up.


### In the following cells, the researcher should write one or a few sentences to address the major limitation

**Major limitation - Researcher**: When we combine the data from different vehicles and cities, we assume that the outcome of interest (CFU) is measured under comparable conditions across all vehicles and cities. Even though cleaning protocols are mostly consistent across the vehicles and the cities, any deviations from the standard protocols may render CFU incomparable between vehicles and cities, which can ultimately result in biased estimates of the effect of Copper.

**Upload your work from Lecture 1 Exercise session**

- Each student will upload the Jupyter notebook to Canvas Course 1 (https://canvas.ubc.ca/courses/144703), named as follows:

 `[Lecture_1_Exercise_Session 1]_[TeamNumber]_[student name].ipynb`

 e.g., `Lecture_1_Exercise_Session 1_Team21_Biljana_Jonoska_Stojkova.ipynb`

- Please indicate in the title who was responsible for writing each paragraph.

Navigate to the Assignments section on Canvas Course 1, and upload the Jupyter notebook under:
`Class Participation\Lecture 1 - Principles of study design and sampling techniques` 
