# Breast Cancer Incidence Report (Python Version)

*Disclaimer: This report has been written for the author's learning purposes only and uses open data from Public Health Scotland under the UK Open Government Licence (OGL).*

## Aim

To inform the planning and provision of cancer treatment services by analysing breast cancer incidence data reported by NHS Borders.

## Introduction

Between 1997 and 2021, breast cancer had the third highest number of incidences of any cancer type reported by NHS Borders. In this period, breast cancer in males made up less than 1% of total breast cancer incidences, so this report will focus on incidences among females.

### Import Libraries

In [136]:
import pandas as pd
import janitor
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

### Read In Data

In [2]:
cancer_incidence = pd.read_csv("data/cancer_incidence_by_health_board.csv").clean_names()

In [53]:
five_year_summary = pd.read_csv("data/five_year_cancer_incidence_summary.csv").clean_names()

In [63]:
geography_codes = pd.read_csv("data/geography_codes_and_labels.csv").clean_names()

### Filter Data For Borders Only

In [3]:
cancer_incidence_borders = cancer_incidence[cancer_incidence["hb"] == "S08000016"].copy()
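The filter above hard-codes the health board code. As a minimal sketch, the code could instead be looked up by name, assuming `geography_codes` holds one row per health board with `hb` and `hbname` columns (an assumption based on the merge later in this report). The helper `lookup_hb_code` and the toy rows are hypothetical and only stand in for the real file.

```python
import pandas as pd

# Toy stand-in for geography_codes (illustrative rows only).
geography_codes = pd.DataFrame({
    "hb": ["S08000015", "S08000016", "S08000017"],
    "hbname": ["NHS Ayrshire and Arran", "NHS Borders", "NHS Dumfries and Galloway"],
})


def lookup_hb_code(codes: pd.DataFrame, name: str) -> str:
    """Return the single health board code matching `name` (hypothetical helper)."""
    matches = codes.loc[codes["hbname"] == name, "hb"].unique()
    if len(matches) != 1:
        # Fail loudly rather than silently filtering on nothing or on several boards.
        raise ValueError(f"expected exactly one code for {name!r}, found {len(matches)}")
    return matches[0]


borders_code = lookup_hb_code(geography_codes, "NHS Borders")
print(borders_code)
```

The returned value could then replace the literal string in the filter, which keeps the intent ("NHS Borders") visible in the code.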

### Total Cancer Incidences by Cancer Site

In [32]:
summarised_data = (cancer_incidence_borders[~(cancer_incidence_borders["cancersite"] == "All cancer types") 
                         & (cancer_incidence_borders["sex"] == "All")]
.groupby("cancersite")
.agg(total_incidences=("incidencesallages", "sum"))
)

In [122]:
(summarised_data[summarised_data["total_incidences"] > 2000]
.sort_values("total_incidences", ascending=False)
.reset_index()
.rename(columns={"cancersite":"Cancer Site", "total_incidences":"Total Incidences"})
.style.hide(axis="index")
)

Cancer Site,Total Incidences
Non-melanoma skin cancer,6174
Basal cell carcinoma of the skin,4049
Breast,2614
"Trachea, bronchus and lung",2534
Colorectal cancer,2514
Squamous cell carcinoma of the skin,2075


### Breast Cancer Incidences by Sex

In [120]:
(cancer_incidence_borders[(cancer_incidence_borders["cancersite"] == "Breast") 
                         & ~(cancer_incidence_borders["sex"] == "All")]
.groupby("sex")
.agg(total_incidences=("incidencesallages", "sum"))
.reset_index()
.rename(columns={"sex":"Sex", "total_incidences":"Total Incidences"})
.style.hide(axis="index")
)

Sex,Total Incidences
Females,2598
Male,16
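The claim that male cases make up less than 1% of breast cancer incidences can be checked directly from the two totals in the table above. This is a small arithmetic sketch; the variable names are illustrative.

```python
# Totals taken from the table above.
female_incidences = 2598
male_incidences = 16

# Male share of all breast cancer incidences, as a percentage.
male_share = male_incidences / (female_incidences + male_incidences) * 100
print(f"{male_share:.2f}%")  # prints 0.61%
```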


According to NHS Borders data, breast cancer among females has the highest number of incidences and the highest mean European age-standardised rate (EASR) of any cancer type.

### Female Cancer Incidences

In [133]:
(cancer_incidence_borders[~(cancer_incidence_borders["cancersite"] == "All cancer types") 
                         & (cancer_incidence_borders["sex"] == "Females")]
.groupby("cancersite")
.agg(total_incidences=("incidencesallages", "sum"))
.sort_values("total_incidences", ascending=False)
.head(3)
.reset_index()
.rename(columns={"cancersite":"Cancer Site", "total_incidences":"Total Incidences"})
.style.hide(axis="index").highlight_max(subset="Total Incidences", color = "#9fd6f3")
)

Cancer Site,Total Incidences
Breast,2598
Non-melanoma skin cancer,2519
Basal cell carcinoma of the skin,1882


### Female EASR by Cancer Type

In [135]:
(cancer_incidence_borders[~(cancer_incidence_borders["cancersite"] == "All cancer types") 
                         & (cancer_incidence_borders["sex"] == "Females")]
.groupby("cancersite")
.agg(mean_easr=("easr", "mean"))
.sort_values("mean_easr", ascending=False)
.head(3)
.reset_index()
.rename(columns={"cancersite":"Cancer Site", "mean_easr":"Mean EASR"})
.style.hide(axis="index").highlight_max(subset="Mean EASR", color = "#9fd6f3")
)

Cancer Site,Mean EASR
Breast,161.363978
Non-melanoma skin cancer,150.399554
Basal cell carcinoma of the skin,113.917832


## Trends Over Time

In [9]:
cancer_site_list = ["Breast", "All cancer types"]

In [10]:
ci_borders_female = cancer_incidence_borders[(cancer_incidence_borders["cancersite"].isin(cancer_site_list))
                         & (cancer_incidence_borders["sex"] == "Females")].copy()

In [163]:
fig1 = px.line(
    ci_borders_female,
    x="year",
    y="incidencesallages",
    color="cancersite",
    markers=True,
    template="plotly_white",
    title="Female Breast Cancer Incidences <br><sup>NHS Borders: 1997-2021</sup>",
)

fig1.update_layout(
    xaxis_title="Year",
    yaxis_title="Incidences",
    yaxis_range=[0, 500],
    xaxis=dict(tickmode="linear", tick0=1997, dtick=1),
)

fig1.update_xaxes(tickangle=45)

fig1.show()

**What does this visualisation tell us?**

- That breast cancer incidences in females appear to be driving the trend of all cancer type incidences.
- That breast cancer incidences appear to peak approximately every 3 years.

When we look at the year-on-year percentage changes in breast cancer incidences we can gain further insights. The table below shows:

- Across the 8 highlighted peaks, incidences rose by an average of 84% on the previous year.
- This pattern is less evident in 2020, when a peak might have been expected: the reported increase was only 9%.
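Year-on-year percentage changes like those described above could be computed with pandas' `pct_change`. The sketch below uses made-up numbers, not the NHS Borders figures, and assumes one row per year with the same `year` and `incidencesallages` column names used elsewhere in this report.

```python
import pandas as pd

# Illustrative values only; these are NOT the NHS Borders figures.
yearly = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018],
    "incidencesallages": [100, 80, 120, 90],
})

# Sort by year so each row is compared with the year immediately before it.
yearly = yearly.sort_values("year")

# pct_change returns a fraction (0.5 for +50%), so scale to a percentage.
# The first year has no predecessor and is therefore NaN.
yearly["pct_change"] = yearly["incidencesallages"].pct_change() * 100

print(yearly.round(1))
```

Applied to the real data, the same step would need the frame filtered to a single cancer site and sex first, otherwise consecutive rows would mix different series.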

**Why might there be a 3 year trend?**

Women who meet screening criteria are invited for breast screening once every 3 years (NHS National Services Scotland, 2022).

**Why might we not see the same peak in 2020 as we may have expected?**

Due to the COVID-19 pandemic, no invites to breast screenings were sent between 30 March 2020 and 3 August 2020 (Public Health Scotland, 2022).

## Health Board Comparison

To understand how these rates compare with other health boards in Scotland, we can visualise the EASR over a five-year period. The EASR is the European age-standardised incidence rate per 100,000 person-years at risk.
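To make the definition concrete, here is a minimal sketch of direct age standardisation, the general technique behind an age-standardised rate. The three age bands, case counts, person-years and weights are all hypothetical; the real European Standard Population uses many more age bands, and the published EASR values in the data are not reproduced here.

```python
# Hypothetical age bands: (cases, person-years at risk, standard-population weight).
age_bands = [
    (5, 20_000, 50_000),
    (30, 15_000, 35_000),
    (40, 5_000, 15_000),
]


def age_standardised_rate(bands, per=100_000):
    """Weight each band's crude rate by the standard population, then rescale."""
    total_weight = sum(weight for _, _, weight in bands)
    weighted_rates = sum(
        (cases / person_years) * weight for cases, person_years, weight in bands
    )
    return weighted_rates / total_weight * per


easr = age_standardised_rate(age_bands)
print(round(easr, 1))  # prints 202.5
```

Because every health board is weighted by the same standard population, differences in the resulting rates are not simply a reflection of one board having an older population than another.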

In [61]:
five_year_summary_filtered = five_year_summary[(five_year_summary["sex"] == "Females")
                  & (five_year_summary["cancersite"] == "Breast") 
                  & ~(five_year_summary["hb"] == "GR0800001")].copy()

In [65]:
summary_merged = five_year_summary_filtered.merge(geography_codes, on = "hb", how = "left")
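A left merge keeps rows whose code has no match in the lookup table, leaving their name as missing. As a hedged sketch, the `validate` and `indicator` parameters of `DataFrame.merge` can surface such problems. The frames below are toy data, and `S08000099` is a made-up code used only to trigger the unmatched case.

```python
import pandas as pd

# Toy data: one real-looking row and one deliberately unmatched, made-up code.
summary = pd.DataFrame({"hb": ["S08000016", "S08000099"], "easr": [164.8, 150.0]})
codes = pd.DataFrame({"hb": ["S08000016"], "hbname": ["NHS Borders"]})

merged = summary.merge(
    codes,
    on="hb",
    how="left",
    validate="many_to_one",  # raises if a code appears more than once in `codes`
    indicator=True,          # adds a "_merge" column describing each row's match
)

# Rows present only on the left side have no health board name.
unmatched = merged.loc[merged["_merge"] == "left_only", "hb"].tolist()
print(unmatched)  # prints ['S08000099']
```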

In [125]:
(summary_merged[["hbname", "easr"]]
.rename(columns={"hbname":"Health Board"})
.sort_values("easr", ascending=False)
.style.background_gradient(subset=["easr"]).hide(axis="index")
)

Health Board,easr
NHS Dumfries and Galloway,174.61535
NHS Lothian,172.317942
NHS Forth Valley,171.658482
NHS Lanarkshire,169.148598
NHS Greater Glasgow and Clyde,168.800748
NHS Borders,164.813609
NHS Fife,164.420714
NHS Tayside,163.422246
NHS Highland,162.50394
NHS Ayrshire and Arran,157.001853
