# Visualisation Tools for Screening
### Version 0.3.0
#### Demo Project

This demo provides an example of using the *vis_tools_screening* package. 

It will cover:
1. Datasets - loading the built-in ones or a custom one
2. Basic statistical graphs and possible customisations
3. More advanced visualisations, including:
   * Interactive bar charts
   * Choropleth maps

In [None]:
# Ignore warnings for purposes of demo notebook
import warnings
warnings.filterwarnings('ignore')

### Datasets

This package provides functions that take the raw data from a CSV file as input and perform several steps to prepare it for analysis, including handling missing data and removing irrelevant rows or columns. The functions are held in the module datasets.py.
Reading the data in with these functions ensures that it is clean and ready for the visualisation steps.
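
The package's own cleaning code is not reproduced in this demo. As a rough standard-library sketch of the kind of steps described above (skipping rows with missing data and dropping an irrelevant column), the example below assumes made-up sample rows, a hypothetical 'Notes' column and a hypothetical helper named clean_rows:

```python
import csv
import io

# Made-up sample data; 'Notes' is a hypothetical irrelevant column.
RAW_CSV = """Area Code,Area Name,Area Type,Time period,Value,Notes
X0001,Area A,LA,2010,77.5,
X0001,Area A,LA,2011,,value suppressed
X0002,Area B,LA,2010,75.2,
"""

KEEP_COLUMNS = ["Area Code", "Area Name", "Area Type", "Time period", "Value"]


def clean_rows(csv_text):
    """Hypothetical helper: drop rows with no 'Value' and keep selected columns."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["Value"].strip():
            continue  # missing-data handling: skip rows without a value
        record = {col: row[col] for col in KEEP_COLUMNS}
        record["Value"] = float(record["Value"])
        record["Time period"] = int(record["Time period"])
        cleaned.append(record)
    return cleaned


rows = clean_rows(RAW_CSV)
print(len(rows))  # 2
```

The built-in loaders return a DataFrame rather than a list of dictionaries, so this only illustrates the idea, not the package's implementation.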

There are three built-in loading functions: load_cerv(), load_bowel() and load_breast(). Each is tailored to cleaning and preprocessing the data of one of the three major UK cancer screening programmes: cervical, bowel and breast, respectively.

Import one of the built-in training datasets:

In [None]:
# Import the datasets module
import datasets as ds
# Load the cleaned cervical cancer DataFrame from a local file into the notebook
df = ds.load_cerv()

The package also has functionality for importing a custom dataset, using the load_custom() function. Please place your own dataset into the 'data' folder.

If you are using your own dataset, ensure its columns include:
* 'Area Code', e.g. 'E12000001'
* 'Area Name', e.g. 'Exeter'
* 'Area Type', e.g. 'LA'
* 'Time period', e.g. 2010
* 'Value', e.g. '77.5379545'
* 'Age', e.g. '25-64 yrs'
* 'Sex', e.g. 'Female'
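
Before calling load_custom(), it may help to check that a custom file has the columns listed above. The following is a minimal sketch, not part of the package: missing_columns is a hypothetical helper, and the example header is made up.

```python
# Column names taken from the list above.
REQUIRED_COLUMNS = [
    "Area Code", "Area Name", "Area Type", "Time period", "Value", "Age", "Sex",
]


def missing_columns(columns):
    """Hypothetical helper: return the required names absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Made-up header of a custom dataset that lacks two of the required columns.
header = ["Area Code", "Area Name", "Time period", "Value", "Sex"]
print(missing_columns(header))  # ['Area Type', 'Age']
```

Because the helper accepts any iterable of names, it could also be given the columns of a pandas DataFrame.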

### Basic Data Exploration

In [None]:
cerv_explore = ds.BasicDataExploration(df)
cerv_explore.explore()

### Basic Statistical Graphs
The basic graph tools are in the baseline.py module. Once the DataFrame has been loaded and cleaned, basic descriptive statistical graphs can be plotted. In this example, we plot a histogram of the screening percentage uptake data (the 'Value' column), with the number of datapoints in the dataset shown on the right.

Keep in mind that this histogram is not appropriate for extensive statistical analysis at this stage, because the uptake percentages being plotted are measured over different geographical units.
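
One way to address this caveat is to restrict the data to a single geographical unit before plotting. The sketch below assumes pandas is installed and uses made-up rows; only the column names and the 'LA' area type come from this demo, while the other area type values are assumptions.

```python
import pandas as pd

# Made-up rows for illustration; 'Region' and 'Country' are assumed area types.
demo = pd.DataFrame(
    {
        "Area Name": ["Area A", "Area B", "Region X", "England"],
        "Area Type": ["LA", "LA", "Region", "Country"],
        "Value": [71.2, 68.9, 70.4, 72.0],
    }
)

# Keep a single geographical unit so every plotted value is comparable.
la_only = demo[demo["Area Type"] == "LA"]
print(la_only["Value"].tolist())  # [71.2, 68.9]
```

A filtered DataFrame like la_only could then presumably be passed to the plotting functions in place of the full dataset.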

In [None]:
# Import baseline module
from baseline import *

#### Histogram 
You can choose which numeric column of the dataset to plot and change the parameters of the graph.
To learn more about the function, run: help(histogram).

In [None]:
help(histogram)

In [None]:
print('A basic histogram plot of the data: ')
histogram(df, col='Value', title='Percentage Uptake Histogram', x_label='Percentage Uptake (%)', y_label='Frequency')

#### Line plot
In this next example, we view the percentage uptake over time for six areas (Tendring, Rossendale, Bromsgrove, Wyre, Dartford and East Staffordshire) and compare them using a line plot.
This function can also be used to view the data for only one area.

In [None]:
print('A basic line plot of the data: ')
linear_comp(df, area_list=['Tendring', 'Rossendale', 'Bromsgrove','Wyre', 'Dartford', 'East Staffordshire'])

### Advanced visualisations
In the next part of this demo project, we use the package's advanced visualisations of screening uptake data: a London choropleth map and an animated choropleth map of the England regions, deprivation status graphs, an animated rank-based graph and a country-wide analysis plot.

In [None]:
# Import tools for advanced visualisations
import plotly.io as pio
pio.renderers.default = "notebook_connected"
from visualisation import *
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected = True)

#### London Choropleth

Our London and Regional England choropleth maps allow us to quickly and easily visualise the data on a geographical basis. Colour-coded regions indicate the relative uptake levels of the screening across the country, making it easy to spot trends and areas of higher or lower uptake.
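
A choropleth needs one value per area, and the London map in this section plots the mean uptake across all years. How LondonMap computes this is not shown in the demo, so the following is only a plain-Python sketch of that aggregation, with made-up rows and a hypothetical helper named mean_uptake_by_area:

```python
from collections import defaultdict
from statistics import mean

# Made-up (area, year, uptake %) rows for illustration only.
rows = [
    ("Borough A", 2015, 70.0), ("Borough A", 2016, 72.0),
    ("Borough B", 2015, 65.0), ("Borough B", 2016, 66.0),
]


def mean_uptake_by_area(records):
    """Hypothetical helper: average the uptake values of each area over all years."""
    values = defaultdict(list)
    for area, _year, value in records:
        values[area].append(value)
    return {area: mean(vals) for area, vals in values.items()}


means = mean_uptake_by_area(rows)
print(means)  # {'Borough A': 71.0, 'Borough B': 65.5}
```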

In [None]:
# Plot a map of London and mean uptake across all years
ldn_map = LondonMap(df)
ldn_map.val_labels = True
ldn_map.plot_london_map(colour_palette='fire')

#### Animated England Regions Choropleth

In [None]:
# Plot an animated Choropleth map of England regions
region_map = Region_Analysis(colorscale='mint')
# For information on available colorscales, run the command: 'help(Region_Analysis.process_colorscale)'

#### Deprivation Status Graphs

In [None]:
from deprivation import *

In [None]:
depriv = DeprivationPlots(df)
depriv.most_least_plot(2016)

#### Animated Rank Based Graph

The animated rank-based graph will provide a different perspective on the data. This graph will rank the uptake levels of the screening in specified areas across the country and show how they change over time. This will allow us to see how the uptake of the screening varies over time and make comparisons between different areas.
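
The ranking idea behind this graph can be sketched without any plotting: for each year, order the areas from highest to lowest uptake. The example below uses made-up values and a hypothetical helper named rank_by_year; it does not reproduce the package's Rank_Based_Graph implementation.

```python
from collections import defaultdict

# Made-up (area, year, uptake %) records for illustration only.
records = [
    ("Area A", 2015, 70.1), ("Area B", 2015, 72.4), ("Area C", 2015, 68.0),
    ("Area A", 2016, 71.5), ("Area B", 2016, 69.9), ("Area C", 2016, 70.2),
]


def rank_by_year(rows):
    """Hypothetical helper: map each year to its areas, highest uptake first."""
    by_year = defaultdict(list)
    for area, year, value in rows:
        by_year[year].append((value, area))
    return {
        year: [area for _, area in sorted(pairs, reverse=True)]
        for year, pairs in sorted(by_year.items())
    }


ranks = rank_by_year(records)
print(ranks[2015])  # ['Area B', 'Area A', 'Area C']
print(ranks[2016])  # ['Area A', 'Area C', 'Area B']
```

Each animation frame then corresponds to one year's ordering, which is how changes in rank become visible over time.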

In [None]:
# View a list of all regions included in the Rank-based graph
Rank_Based_Graph(df).list_areas(area_type="Region")

In [None]:
from visualisation import visualise_rank as vis
vis(area_type = "LA", list_reg = ['Tendring', 'Rossendale', 'Bromsgrove', 'Wyre', 'Dartford', 'East Staffordshire'], sns_palette = "Spectral", width = 900, height = 600, showlegend = False, rank_text_size = 16)

If you are using the tools to produce graphs outside a Jupyter Notebook environment, these commands can be used to plot them individually:  
Animated Bar Chart: Rank_Based_Graph(df).animated_bars(area_type="LA", list_reg=['Tendring', 'Rossendale', 'Bromsgrove','Wyre', 'Dartford', 'East Staffordshire'])   
Animated Scatter Plot: Rank_Based_Graph(df).animated_scatter(area_type="LA", list_reg=['Tendring', 'Rossendale', 'Bromsgrove','Wyre', 'Dartford', 'East Staffordshire'])   
Animated Bar Chart of all the regions: Rank_Based_Graph(df).plot_full_animated_graph(area_type='Region')   

#### Country-wide Analysis LinePlot

Finally, our country-wide analysis plot provides an overview of England's overall performance in screening uptake. There is also support for fetching the years with the highest and lowest uptake values.
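
The Country_Analysis methods for fetching these years are not shown in this demo. As a hedged illustration of the underlying idea only, the sketch below finds the years with the highest and lowest values in a made-up year-to-uptake mapping, using a hypothetical helper named extreme_years:

```python
# Made-up yearly uptake percentages for illustration only.
uptake_by_year = {2014: 73.5, 2015: 72.8, 2016: 72.1, 2017: 74.0}


def extreme_years(series):
    """Hypothetical helper: return (year_of_highest, year_of_lowest)."""
    highest = max(series, key=series.get)
    lowest = min(series, key=series.get)
    return highest, lowest


print(extreme_years(uptake_by_year))  # (2017, 2016)
```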

In [None]:
country_analysis = Country_Analysis()

In [None]:
country_analysis.lineplot_cancer_England()