# Data Project Blog Post: Investigating Dengue Outbreaks in Singapore

## Singapore has a serious dengue problem. 
According to [Channel NewsAsia](https://www.channelnewsasia.com/singapore/dengue-cases-nea-warns-outbreak-2023-large-clusters-aedes-mosquito-3216781), more than 32,000 cases of dengue were reported in 2022. That's almost **6 times** the number reported in 2021.

Dengue is a virus spread by the Aedes mosquito:

<img src="photos/aedes.jpeg" width=300 height=200/>

These mosquitoes might be small but they should not be underestimated. Dengue can cause:
- bone pain
- fever
- rashes 
- nausea
- internal bleeding
- death

The Aedes mosquito thrives in tropical areas and since Singapore is a tropical country, its citizens are particularly vulnerable to the virus. That's why we think it's important to study potential predictors of dengue, so that concrete action can be taken to address the dengue outbreak.

## So...what are the potential predictors of dengue?

Due to Singapore's tropical climate, weather is an *obvious* predictor of dengue. BUT, what exactly about Singapore's weather makes it a breeding ground for these mosquitoes? Are certain weather elements more influential in creating optimal conditions for these mosquitoes?

The [National Environment Agency's](https://www.nea.gov.sg/dengue-zika/stop-dengue-now) "BLOCK" strategy to combat dengue involves the removal of stagnant water. This approach helped us narrow down which weather elements to examine: we chose water-related elements, namely **precipitation** and **relative humidity**.

Also, if you've been to Singapore, one thing you'll notice is that it is HOT!

![hot.gif](attachment:4c2c41ac-d386-4f7c-b910-b9fe8f7bd12b.gif)

The sun beats down on us mercilessly. This led us to think that maybe the **hours of sunshine** has something to do with dengue. Hence, we decided to focus on these 3 weather elements as potential predictors of dengue.

We said weather was an obvious predictor. What about a *less obvious* predictor?

People were searching about COVID-19 symptoms on Google 2-3 weeks before they were diagnosed with the disease (READ [THIS](https://www.nature.com/articles/s41746-021-00396-6)). The same was observed for chikungunya and **dengue fever** in India (READ [THIS](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6230529/)).

Given that strong correlations have been found between diseases and Google searches, it could be the same in Singapore for a virus like dengue! Google searches are not directly related to the transmission of dengue, but they could be indicative of impending dengue outbreaks. If a strong relationship is found between Google searches and dengue cases, Google searches could become an important source of information in ensuring Singapore's healthcare infrastructure is equipped to deal with impending dengue outbreaks.


## Research Questions and Hypotheses

Our overarching research question is: How do Google search trends correlate with various measures of weather and dengue cases in Singapore?

Let's break this down:

- RQ1: What is the relationship between trends of Google search terms for dengue and the number of reported dengue cases in Singapore?
    - H1: The number of dengue-related Google searches is positively correlated with the number of reported dengue cases in Singapore.
    - H2: An increase in dengue-related Google searches will be followed by a slightly delayed increase (within 0-2 weeks) in the number of reported dengue cases in Singapore.
    - H3: An increase in the number of reported dengue cases in Singapore will be followed by a slightly delayed increase (within 0-2 weeks) in dengue-related Google searches.
<p></p>

- RQ2: What is the relationship between various weather measures and dengue cases in Singapore?
    - H4: Precipitation rate is positively correlated with the number of dengue cases in Singapore.
    - H5: Relative humidity is positively correlated with the number of dengue cases in Singapore.
    - H6: Hours of sunshine is positively correlated with the number of dengue cases in Singapore.
<p></p>

- RQ3: What is the relationship between various weather measures and trends of Google search terms for dengue in Singapore?
    - H7: Precipitation is positively correlated with the number of dengue-related Google searches in Singapore.
    - H8: Humidity is positively correlated with the number of dengue-related Google searches in Singapore.
    - H9: Hours of sunshine is positively correlated with the number of dengue-related Google searches in Singapore.
    
<p></p>
Let's see if there's evidence to back these claims up!

## Finding & Cleaning Data

1. Dengue Data

[data.gov.sg](https://data.gov.sg/) is a repository of public data provided by the Singapore government. Through this website, we found official data on the number of reported dengue cases from 2012-2022. The data was provided in a weekly format, with a separate dataset for each year.

2. Google Data

Data on Google searches were found using [Google Trends](https://trends.google.com/home). We used 9 keywords: 'dengue', 'dengue fever', 'bone pain', 'rain', 'mosquito bite', 'fever', 'rashes', 'rash', 'mosquito'. These keywords were a mix of terms that directly mentioned the virus, common symptoms of the virus and weather elements associated with dengue.

We limited the location of searches to be in Singapore and the timeframe to be from 2012-2022. As Google Trends provides only *relative* data within the query duration, instead of downloading data from the entire period of 2012-2022, we downloaded datasets for each year in that period.
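As a rough illustration of what downloading one dataset per year implies, the sketch below stitches per-year series into a single timeline. The `yearly` dictionary and its values are made up for the example, and `stitch_years` is a hypothetical helper, not the code we actually ran. Note that each yearly download is scaled relative to its own peak, so values from different years are not on a shared scale; the sketch keeps the `year` alongside each value for that reason.

```python
from datetime import date

# Hypothetical per-year downloads: {year: [(week_start, relative_interest), ...]}
# The numbers are synthetic and only illustrate the data shape.
yearly = {
    2013: [(date(2013, 1, 6), 40), (date(2013, 1, 13), 100)],
    2012: [(date(2012, 12, 23), 100), (date(2012, 12, 30), 55)],
}


def stitch_years(yearly_series):
    """Concatenate per-year series into one chronologically sorted list."""
    combined = []
    for year in sorted(yearly_series):
        for week_start, value in yearly_series[year]:
            combined.append(
                {"year": year, "week_start": week_start, "interest": value}
            )
    # Sort by date in case a download was not already in chronological order.
    combined.sort(key=lambda row: row["week_start"])
    return combined


timeline = stitch_years(yearly)
for row in timeline:
    print(row["week_start"], row["year"], row["interest"])
```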

3. Weather Data

All weather data was downloaded from the [Open Meteo API](https://open-meteo.com/en/docs/historical-weather-api), which provides historical data for a variety of weather measures. Daily data was obtained for temperature and precipitation (rain amount), while hourly data was obtained for relative humidity (RH) and direct solar radiation (DNI), because Open Meteo did not offer daily aggregates of RH and DNI directly.

Corresponding to the dengue and search trend data, the weather data was scoped to the period of 2012-2022.
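Since RH and DNI only came as hourly readings, they have to be averaged before they can be compared with the other series. Here is a minimal sketch of that step using only the Python standard library. The readings are synthetic, `daily_mean` is a hypothetical helper, and taking a plain mean per calendar day is our assumption for illustration; other aggregations (e.g. a daily sum for radiation) are equally plausible.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def daily_mean(hourly_readings):
    """Average (timestamp, value) readings by calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in hourly_readings:
        by_day[timestamp.date()].append(value)
    # Return the days in chronological order.
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Synthetic example: 48 hourly relative humidity readings (%),
# 80 for every hour of the first day and 90 for every hour of the second.
start = datetime(2022, 1, 1)
readings = [(start + timedelta(hours=h), 80 if h < 24 else 90) for h in range(48)]

rh_daily = daily_mean(readings)
for day, value in rh_daily.items():
    print(day, value)
```

The same grouping idea extends to weekly averages by keying on the week instead of the day.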

## Let's get cracking!

After we cleaned the data, we calculated the correlation coefficients of the relevant variables. Here's what we found:

### Finding 1 (RQ1; H1)

The correlation between the Google searches and dengue cases is positive and of moderate strength (r = +0.451). 
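For readers who want to see what goes into an r-value, here is a minimal sketch of the Pearson correlation coefficient in plain Python. We are assuming Pearson's r here for illustration, and the `searches` and `cases` lists are made-up numbers, not our data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Synthetic weekly values, for illustration only.
searches = [10, 20, 30, 40, 50]
cases = [12, 25, 31, 48, 52]

r = pearson_r(searches, cases)
print(f"r = {r:+.3f}")
```

A value near +1 means the two series rise and fall together, a value near -1 means one rises as the other falls, and a value near 0 means there is little linear relationship.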

![Screenshot 2023-05-05 at 1.13.54 AM.png](attachment:1ba709da-828f-44bd-b95a-24ae2216ba9f.png)

However, this r-value was derived from the aggregated effect of all 9 keywords. We found that some terms produced a stronger correlation to dengue cases.

![Screenshot 2023-05-05 at 1.18.55 AM.png](attachment:fabd583b-b1ca-4568-b31f-8bb45bbeec0b.png)

We observe a trend that differentiates the strength of correlations. Terms that directly reference dengue, like 'dengue fever' and 'dengue', produce stronger correlations than all the terms as a whole (r = +0.508 and r = +0.619, respectively). Terms that reference dengue indirectly, e.g. terms about dengue symptoms, produced weaker correlations.

Given this trend, analysing these terms in 2 groups instead of 1 whole group might produce more fruitful results. So, we regrouped them. Of the terms indirectly referencing dengue, 'bone pain' and 'rain' produce negative correlations of negligible strength. We decided to exclude these 2 terms.

The correlation coefficient of direct dengue-related terms with reported dengue cases is +0.590, while the correlation coefficient of indirect dengue-related terms with dengue cases is +0.321.
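The regrouping can be pictured as combining the weekly series of each term in a group into one series per group. The sketch below averages across terms, which is an assumption made for illustration (a sum would be another reasonable choice). The group membership is inferred from the description above, the weekly values are synthetic, and `group_mean` is a hypothetical helper.

```python
# Group membership as described above: 'bone pain' and 'rain' are excluded.
DIRECT_TERMS = ["dengue", "dengue fever"]
INDIRECT_TERMS = ["mosquito bite", "fever", "rashes", "rash", "mosquito"]


def group_mean(weekly_interest, terms):
    """Average the weekly values of the given terms, week by week."""
    series = [weekly_interest[term] for term in terms]
    # zip(*series) walks through the terms' values one week at a time.
    return [sum(values) / len(values) for values in zip(*series)]


# Synthetic relative-interest values for three weeks, for illustration only.
weekly_interest = {
    "dengue": [10, 40, 70],
    "dengue fever": [20, 20, 50],
    "mosquito bite": [5, 10, 15],
    "fever": [30, 30, 30],
    "rashes": [10, 20, 30],
    "rash": [15, 25, 35],
    "mosquito": [40, 15, 40],
}

direct = group_mean(weekly_interest, DIRECT_TERMS)
indirect = group_mean(weekly_interest, INDIRECT_TERMS)
print("direct:", direct)
print("indirect:", indirect)
```

Each group series can then be correlated with weekly dengue cases in the same way as the combined series.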

### Finding 2 (RQ1; H2 & H3)

Previous research of similar nature found that there were lag times between increased Google searches and increased reports of a disease. We thought that the same might apply to dengue so we introduced lags of 1-2 weeks.

Introducing lag times weakened the positive correlations between dengue cases and the 2 types of Google search terms. 
- Terms directly referencing dengue (1 week lag): +0.484
- Terms indirectly referencing dengue (1 week lag): +0.283
- Terms directly referencing dengue (2 week lag): +0.439
- Terms indirectly referencing dengue (2 week lag): +0.279

Then, we thought: what if the reports of dengue came before increases in Google searches? More media coverage on dengue cases could lead to a greater interest in the topic.

However, again, the correlations between Google search terms and dengue cases were weakened:
- Terms directly referencing dengue (1 week lag): +0.524
- Terms indirectly referencing dengue (1 week lag): +0.254
- Terms directly referencing dengue (2 week lag): +0.483
- Terms indirectly referencing dengue (2 week lag): +0.215
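To make the lag idea concrete, here is a minimal sketch of a lagged correlation in plain Python. It pairs week t of a "leader" series with week t + lag of a "follower" series, then computes Pearson's r on the overlapping weeks. The function names and the data are hypothetical; the synthetic `cases` series is built to trail `searches` by exactly 2 weeks so that the effect of the shift is easy to see.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_r(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a lag >= 0 (in weeks)."""
    if lag < 0:
        raise ValueError("lag must be non-negative; swap the arguments instead")
    if len(leader) != len(follower):
        raise ValueError("both series must cover the same weeks")
    # Drop the weeks that have no partner after shifting.
    xs = leader[: len(leader) - lag]
    ys = follower[lag:]
    return pearson_r(xs, ys)


# Synthetic data: cases repeat the search pattern two weeks later.
searches = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
cases = [0, 0] + searches[:-2]

for lag in range(3):
    print(f"searches lead by {lag} week(s): r = {lagged_r(searches, cases, lag):+.3f}")
```

Testing the reverse direction (cases first, searches later) only requires swapping the arguments, e.g. `lagged_r(cases, searches, 1)`.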

It seems that Google searches made in the same week (with no lag) have the strongest relationship with reported dengue cases. This might be because people are searching on Google when they start to experience dengue symptoms and suspect they may have dengue. The time between experiencing these symptoms and receiving a diagnosis is likely to be shorter than the estimated onset of dengue.

### Finding 3 (RQ2; H4 & H5)

We were fairly confident that we would find evidence to support H4. In our minds, there was a straightforward relationship of rain = more stagnant water = more breeding grounds.

Our analysis proved us **wrong**. Precipitation is negatively and weakly correlated to dengue cases (r = -0.0653).

![mindblown.gif](attachment:ba98e75f-a8eb-400d-acc0-c5742782c086.gif)

AHA! For water to become stagnant, it would need to remain untouched for a while. We introduced lags, thinking that it would produce positive correlations.

Indeed, there was a positive correlation between precipitation and dengue cases after introducing a lag of 10 weeks. However, this correlation is also weak (r = +0.151). A lag of only 1 week still produced a negative correlation (r = -0.0935).

![Screenshot 2023-05-05 at 2.00.39 AM.png](attachment:161e1462-3e51-4e5d-92cc-f44f8e02c516.png)

So, it seems we're lacking evidence to claim that precipitation is positively correlated to dengue cases. 

As our data source provided data on temperature, we were curious to explore whether temperature had a relationship with dengue cases. It also produced a positive but weak correlation (r = +0.160). However, once we introduced a lag of 1 week, the correlation strengthened dramatically (r = +0.885). This is the strongest correlation we have seen across all the variables.

We were also confident that relative humidity would be positively correlated to dengue cases as humidity is affected by rain. Now that we know rain isn't a strong predictor of dengue, we adjusted our expectations.

Alas, we were right to do so. In fact, the correlation between relative humidity and dengue cases is so weak that it's negligible (r = +0.000821). 

### Finding 4 (RQ2; H6)

What about the last weather element, hours of sunshine? We found a positive correlation between the two variables. However, this correlation is so weak that it is also negligible (r = +0.0459). 

None of the weather elements showed relationships within our expectations.

### Finding 5 (RQ3; H7, H8 & H9)

It was hard to tell what the outcome of our analysis would be.

For precipitation, we saw that in RQ1, there was a positive correlation of moderate strength between Google searches and dengue cases while in RQ2, there was a negative correlation of negligible strength between precipitation and dengue cases. In the end, both types of Google search terms produced negative and weak correlations with precipitation (r = -0.115 and r = -0.112).

For relative humidity, we also see a weak and negative correlation to both types of Google search terms (r = -0.067 and r = -0.117). This defies our expectations of higher humidity creating better conditions for the reproduction of the Aedes mosquito. Perhaps we shouldn't be equating the reproduction of the mosquitoes to the transmission of the virus.

Lastly, for hours of sunshine, it is weakly and positively correlated to Google search terms (r = +0.089 and r = +0.131). 

Overall, our exploration of RQ3 (H7, H8 & H9) does not provide compelling evidence that weather and Google search terms are positively related.

## Takeaways

Overall, we find that many of our hypotheses were either not supported or even contradicted, at least based on the correlations and visual inspection of the line graph trends. One of the most intriguing findings is that neither rainfall nor relative humidity really correlates with dengue cases, which is the opposite of what we would usually expect, given the increased breeding of Aedes aegypti mosquitoes in wetter environments. Nonetheless, mean temperature was still found to be the strongest predictor of dengue cases at a lag of 1 week, in line with general expectations.

In terms of Google search trends, we find that search terms directly related to dengue are moderately and positively correlated with dengue cases when no lag is applied. This supports our overarching theory that people tend to search more about dengue when they have dengue symptoms, so search volume reflects dengue case numbers. Perhaps, given the high awareness of dengue in Singapore, people with prolonged symptoms or symptoms characteristic of dengue tend to think of dengue and search to check whether or not they might have it.

Further investigation into specific time periods could be made to evaluate more specific trends, as well as to uncover the possible reasons behind the markedly lower number of dengue cases from 2017 to 2019 compared to earlier or later years, and the subsequent sharp rise in dengue cases in 2022. One possibility is the COVID-19 pandemic, which may have affected the breeding habitats of the mosquitoes and the transmission of dengue.

Additionally, more work can be done to look at the number of cases in specific regions/areas of Singapore, to investigate whether characteristics of different regions/areas could be related to the number of dengue cases. For example, a more densely populated region/area could have a higher transmission rate due to the closer proximity of residents, or regions/areas with an older demographic may be more prone to dengue; but these speculations need to be tested.