# Animal rescue incidents in London
Pawel Maciej Darulewski

Student number: s200123

Dataset: [Animal rescue incidents attended by LFB](https://data.london.gov.uk/dataset/animal-rescue-incidents-attended-by-lfb)

Website: [Heroku](https://london-animal-rescue.herokuapp.com)

All plots were created with the Bokeh library and are stored on my GitHub page. If a plot does not render, you can click on the reference instead.

In [1]:
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from bokeh.io import output_file, show, reset_output, output_notebook
from bokeh.layouts import gridplot, layout
from bokeh.models import ColumnDataSource, LinearColorMapper, Toolbar, ToolbarBox, Legend, FactorRange, Range1d
from bokeh.models.tools import HoverTool, WheelZoomTool, PanTool
from bokeh.palettes import brewer, viridis
from bokeh.plotting import figure
from bokeh.tile_providers import get_provider, Vendors
from bokeh.transform import factor_cmap, factor_mark, transform
import pyproj

output_notebook()

In [2]:
dataframe = pd.read_csv('data/Animal Rescue incidents attended by LFB from Jan 2009.csv', encoding='ISO-8859-1')

# Motivation

## Dataset description
The dataset concerns the animal rescue incidents attended by the London Fire Brigade (LFB) and comes from the London Datastore [[1]](https://data.london.gov.uk/dataset/animal-rescue-incidents-attended-by-lfb). It is a relatively small dataset.
The dataset contains 6546 entries with 28 attributes:
```
RangeIndex: 6546 entries, 0 to 6545
Data columns (total 28 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   IncidentNumber              6546 non-null   object 
 1   DateTimeOfCall              6546 non-null   object 
 2   CalYear                     6546 non-null   int64  
 3   FinYear                     6546 non-null   object 
 4   TypeOfIncident              6546 non-null   object 
 5   PumpCount                   6503 non-null   float64
 6   PumpHoursTotal              6502 non-null   float64
 7   HourlyNotionalCost(£)       6546 non-null   int64  
 8   IncidentNotionalCost(£)     6502 non-null   float64
 9   FinalDescription            6541 non-null   object 
 10  AnimalGroupParent           6546 non-null   object 
 11  OriginofCall                6546 non-null   object 
 12  PropertyType                6546 non-null   object 
 13  PropertyCategory            6546 non-null   object 
 14  SpecialServiceTypeCategory  6546 non-null   object 
 15  SpecialServiceType          6546 non-null   object 
 16  WardCode                    6539 non-null   object 
 17  Ward                        6539 non-null   object 
 18  BoroughCode                 6540 non-null   object 
 19  Borough                     6540 non-null   object 
 20  StnGroundName               6546 non-null   object 
 21  PostcodeDistrict            6546 non-null   object 
 22  Easting_m                   3261 non-null   float64
 23  Northing_m                  3261 non-null   float64
 24  Easting_rounded             6546 non-null   int64  
 25  Northing_rounded            6546 non-null   int64  
 26  Latitude                    3261 non-null   float64
 27  Longitude                   3261 non-null   float64
dtypes: float64(7), int64(4), object(17)
memory usage: 1.4+ MB
```

## Why?
The dataset was chosen in order to draw attention to the problem of stray and wild animals in big agglomerations such as London. The problem is severe, especially as far as foxes are concerned: over the last two decades, the number of urban foxes in the United Kingdom has increased fourfold [[2]](https://www.theguardian.com/environment/2017/apr/16/urban-foxes-number-one-for-every-300-residents-study-suggests). Many stray dogs and cats are in distress every day, and analysing the incident data may pinpoint the parts of the city where many animals are in danger. This analysis may help to reduce the number of fatal cases and indicate where and when services such as the RSPCA should look for animals in distress.

## The goal
The goal of the analysis is to find the typical locations and types of animals that need help in the city. It may give insights into the places where the Fire Brigade or the RSPCA should look for animals.
As far as the visualisation is concerned, the aim is to create a descriptive visualisation providing information which may help to rescue animals in London. The visualisations should be interactive, consistent and eye-catching.

# Basic statistics

## Data cleaning and preprocessing
The dataset was relatively clean, but it needed some preprocessing. Firstly, exact coordinates (easting and northing, longitude and latitude) are present for only about half of the entries (3261 of 6546), whereas rounded easting and northing values are available for all of them. Moreover, Bokeh maps expect Web Mercator coordinates, so the missing values were filled in from the rounded columns and the coordinates were then converted. The conversion takes a lot of time, so the dataframe is saved to a separate CSV file after all of the preprocessing.

In [3]:
# Fall back to the rounded grid coordinates where the exact ones are missing.
dataframe['Easting_m'] = dataframe['Easting_m'].fillna(dataframe['Easting_rounded'])
dataframe['Northing_m'] = dataframe['Northing_m'].fillna(dataframe['Northing_rounded'])

# British National Grid (EPSG:27700) -> Web Mercator (EPSG:3857), as used by the Bokeh map tiles.
east_north_prj = pyproj.Proj('epsg:27700')
mercator_proj = pyproj.Proj('epsg:3857')

# Row-wise conversion; each call returns an (x, y) tuple.
dataframe['mercator'] = dataframe.apply(
    lambda row: pyproj.transform(east_north_prj, mercator_proj, row['Easting_m'], row['Northing_m']),
    axis=1
)

# Split the tuples into two columns and drop the helper column.
dataframe[['x_mercator', 'y_mercator']] = pd.DataFrame(dataframe['mercator'].tolist(), index=dataframe.index)
dataframe = dataframe.drop(['mercator'], axis=1)
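
For the rows that do have `Latitude` and `Longitude`, the Web Mercator coordinates can also be computed directly from the standard spherical formula. The snippet below is only a minimal sketch of that formula, not the method used in this notebook (which converts from the British National Grid with pyproj). The function name `lonlat_to_web_mercator` and the sample coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert WGS84 longitude/latitude in degrees to Web Mercator metres.

    Hypothetical helper, written only to illustrate the projection formula.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates of central London, used purely as an illustration.
x, y = lonlat_to_web_mercator(-0.1276, 51.5072)
print(round(x), round(y))
```

The resulting point should fall inside the `x_range` and `y_range` used for the maps later in the notebook, which is a quick sanity check for any conversion.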

Moreover, the dates were converted to the datetime type and new attributes were derived from them: hour, day of week, day, month, and year.

In [4]:
dataframe['DateTimeOfCall'] = pd.to_datetime(dataframe['DateTimeOfCall'])
dataframe['Hour'] = dataframe['DateTimeOfCall'].dt.hour
dataframe['DayOfWeek'] = dataframe['DateTimeOfCall'].dt.dayofweek
dataframe['Day'] = dataframe['DateTimeOfCall'].dt.day
dataframe['Month'] = dataframe['DateTimeOfCall'].dt.month
dataframe['Year'] = dataframe['DateTimeOfCall'].dt.year
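
One detail worth checking at this step is the order of day and month. The sketch below assumes that the raw `DateTimeOfCall` strings are written day-first (for example `02/01/2009 03:01`); both that format and the sample values are assumptions, so they should be compared with the actual file before reusing the explicit `format` argument.

```python
import pandas as pd

# Hypothetical sample values in an assumed day-first format.
raw = pd.Series(['02/01/2009 03:01', '13/07/2013 18:45'])

# An explicit format avoids ambiguous dates being read as month-first.
parsed = pd.to_datetime(raw, format='%d/%m/%Y %H:%M')

features = pd.DataFrame({
    'Hour': parsed.dt.hour,
    'DayOfWeek': parsed.dt.dayofweek,  # Monday = 0, Sunday = 6
    'Day': parsed.dt.day,
    'Month': parsed.dt.month,
    'Year': parsed.dt.year,
})
print(features)
```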

There are also many inconsistencies between uppercase and lowercase spellings of the same value, which inflate the number of categories. Therefore, every string value was converted to lowercase.

In [5]:
# Lowercase every string cell; numbers, dates and missing values are left untouched.
dataframe = dataframe.applymap(lambda x: x.lower() if isinstance(x, str) else x)
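
The effect of this normalisation on the number of categories can be illustrated on a toy column. The values below are hypothetical and only mimic the kind of inconsistency described above.

```python
import pandas as pd

# Hypothetical values with inconsistent capitalisation and one missing entry.
animals = pd.Series(['Cat', 'cat', 'CAT', 'Dog', 'dog', None])

before = animals.nunique()             # counts 'Cat', 'cat' and 'CAT' separately
after = animals.str.lower().nunique()  # missing values stay missing and are not counted

print(before, after)
```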

As mentioned before, the preprocessed dataframe is saved to a separate CSV file.

In [6]:
dataframe.to_csv('data/preprocessed.csv', index=False)

Because the operations above take some time, the rest of the notebook was run from this point, starting from the saved, preprocessed file.

In [17]:
dataframe = pd.read_csv('data/preprocessed.csv')
dataframe['DateTimeOfCall'] = pd.to_datetime(dataframe['DateTimeOfCall'])
dataframe['Year'] = dataframe['Year'].astype(str)
dataframe['Month'] = dataframe['Month'].astype(str)
dataframe['Day'] = dataframe['Day'].astype(str)
dataframe['DayOfWeek'] = dataframe['DayOfWeek'].astype(str)
dataframe['Hour'] = dataframe['Hour'].astype(str)

Lists of animal groups and boroughs were also created, because they come in handy when creating the plots.

In [8]:
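# The column holding the animal type is called AnimalGroupParent.
animal_list = sorted(dataframe['AnimalGroupParent'].dropna().unique())

# A few rows have no borough, so missing values are dropped before sorting.
borough_list = sorted(dataframe['Borough'].dropna().unique())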

A styling function was also declared to modify the backgrounds, axes, font colours, and so on. These options are set for every plot, so they were gathered in one function to reduce the amount of code.

In [9]:
def style_plot(plot, with_legend=False):
    """Apply the shared dark theme to a Bokeh figure and return it."""
    if with_legend:
        plot.legend.background_fill_alpha = 0.0
        plot.legend.border_line_alpha = 0.0
        plot.legend.label_text_color = 'white'

    # Backgrounds and title
    plot.border_fill_color = '#2C3033'
    plot.background_fill_color = '#2C3033'
    plot.title.text_color = 'white'

    # Axes
    plot.xaxis.major_label_text_color = 'white'
    plot.xaxis.major_tick_line_color = 'white'
    plot.xaxis.minor_tick_line_color = 'white'
    plot.xaxis.axis_line_color = 'white'
    plot.xaxis.axis_label_text_color = 'white'
    plot.yaxis.major_label_text_color = 'white'
    plot.yaxis.major_tick_line_color = 'white'
    plot.yaxis.minor_tick_line_color = 'white'
    plot.yaxis.axis_line_color = 'white'
    plot.yaxis.axis_label_text_color = 'white'

    # Grid
    plot.xgrid.grid_line_color = 'white'
    plot.xgrid.grid_line_alpha = 0.1
    plot.ygrid.grid_line_color = 'white'
    plot.ygrid.grid_line_alpha = 0.1

    return plot

## Dataset stats

## Count of categories

For the basic analysis, the number of incidents was investigated for the categorical attributes: animal type, borough, ward, postcode, property type, incident type, etc. Because some attributes contain many unique categories (e.g. *PostcodeDistrict* has 269 unique values), only the five most and five least frequent categories of such attributes were taken into consideration. Thanks to that, one can quickly find the most and least common values concerning the rescues.
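
The selection of the most and least frequent categories can be sketched in isolation as follows. The helper name `top_and_bottom` and the sample data are hypothetical; the sketch relies only on `value_counts` returning the counts sorted from the most to the least frequent category.

```python
import pandas as pd


def top_and_bottom(series, n=5):
    """Return the n most and n least frequent categories (hypothetical helper)."""
    counts = series.value_counts()  # sorted from most to least frequent
    if len(counts) > 2 * n:
        counts = pd.concat([counts.head(n), counts.tail(n)])
    return counts


# Hypothetical sample: 'a' appears once, 'b' twice, ..., 'g' seven times.
sample = pd.Series([letter for i, letter in enumerate('abcdefg', start=1) for _ in range(i)])
selected = top_and_bottom(sample, n=2)
print(selected)
```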

In [29]:
plots = []
categories = [
    'Hour', 'DayOfWeek', 'Month', 'Year', 'TypeOfIncident',
    'AnimalGroupParent', 'OriginofCall', 'PropertyType', 'PropertyCategory',
    'SpecialServiceTypeCategory', 'SpecialServiceType', 'Ward',
    'Borough', 'StnGroundName', 'PostcodeDistrict'
]

cmap = viridis(len(categories))

for index, category in enumerate(categories):
    
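    # Build the frame explicitly so that the column names do not depend on the pandas version.
    counts = dataframe[category].value_counts()
    df = pd.DataFrame({'index': counts.index.astype(str), category: counts.values})
    if len(df) > 10:
        df = pd.concat([df.head(5), df.tail(5)])
    source = ColumnDataSource(df)
    
    p = figure(
        x_range=FactorRange(factors=list(df['index'])),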
        plot_height=500,
        plot_width=600,
        toolbar_location=None,
        title='{} — Number of incidents'.format(category),
        x_axis_label=category,
        y_axis_label='Number of incidents',
    )
    
    p.vbar(
        x='index',
        top=category,
        source=source,
        width=0.6,
        fill_alpha=0.5,
        color=cmap[index]
    )
    
    p = style_plot(p, False)
    
    # p.y_range = Range1d(0, 500)
    p.toolbar.logo = None
    p.toolbar_location = None
    p.y_range.start = 0
    p.xaxis.major_label_orientation = math.pi/6
    plots.append(p)
    
distros = gridplot(plots, ncols=3, merge_tools=False)

# output_file('output/basic_stats.html')

show(distros)

Short conclusions concerning the plots:
* *Hour* – incidents are reported mainly in the mornings rather than in the evenings or at night,
* *DayOfWeek* – the fewest cases are reported on Wednesdays,
* *Month* – July has the most cases and February the fewest,
* *Year* – there is a decreasing trend over the years,
* *TypeOfIncident* – only one type occurs, *special incident*,
* *AnimalGroupParent* – the number of incidents involving cats is two and a half times that of the runner-up, birds,
* *OriginofCall* – the mobile phone is the most popular way of contacting the services,
* *PropertyType* – most cases occur in *house - single occupancy*, almost three times as many as in *purpose built flats/maisonettes - up to 3 storeys*,
* *PropertyCategory* – dwelling is the most common property category,
* *SpecialServiceTypeCategory* – the *other* type is the most common, probably because the remaining categories do not cover rescues at ground level,
* *SpecialServiceType* – domestic animals are rescued mainly from heights,
* *Ward* – Lea Bridge, a ward in north-east London, has the greatest number of cases, probably due to the rivers, lakes, and forests nearby,
* *Borough* – Enfield, in the north of London, has the greatest number of cases,
* *StnGroundName* – Edmonton, in Enfield, is the station handling the most cases,
* *PostcodeDistrict* – in the south, however, Croydon is the district with the greatest number of cases.

## Distributions considering the date and locations

Moreover, the distribution of incidents over the days of the week was analysed for each animal group. Because some animal groups have very few incidents, the line fixing the y-axis range was commented out so that the patterns remain visible.
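
The weekday table behind these plots can be sketched on a small, hypothetical sample. The sketch assumes integer weekday codes and adds a `reindex` step, which is not part of the notebook code, so that weekdays without any incident still get a row before the names are assigned.

```python
import pandas as pd

# Hypothetical incidents; DayOfWeek uses the pandas convention Monday = 0, Sunday = 6.
incidents = pd.DataFrame({
    'AnimalGroupParent': ['cat', 'cat', 'dog', 'fox', 'cat'],
    'DayOfWeek': [0, 0, 0, 4, 6],
})
weekday_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

table = (
    incidents.groupby(['DayOfWeek', 'AnimalGroupParent'])
    .size()                           # number of incidents per (weekday, animal)
    .unstack(fill_value=0)            # animals become columns
    .reindex(range(7), fill_value=0)  # keep weekdays without incidents
)
table.index = weekday_names
print(table)
```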

In [30]:
animals = pd.DataFrame(dataframe[['AnimalGroupParent', 'DayOfWeek']])
animals = animals.groupby(['DayOfWeek', 'AnimalGroupParent']).size().unstack()
animals.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

plots = []

for animal in animal_list:
    
    df = pd.DataFrame(animals[animal]).reset_index()
    
    source = ColumnDataSource(df)
    
    p = figure(
        x_range=FactorRange(factors=df['index']),
        plot_height=150,
        plot_width=400,
        toolbar_location=None,
        title='{} — Incidents per weekday'.format(animal),
        x_axis_label='Weekday',
        y_axis_label='Frequency',
    )
    
    p.vbar(
        x='index',
        top=animal,
        source=source,
        width=0.6,
        fill_alpha=0.5,
        color='#E84AA3'
    )
    
    p = style_plot(p, False)
    
    # p.y_range = Range1d(0, 500)
    p.toolbar.logo = None
    p.toolbar_location = None
    p.y_range.start = 0
    plots.append(p)
        
animals_distro = gridplot(plots, ncols=3, merge_tools=False)

# output_file('output/animals_distro.html')

show(animals_distro)

As one can see, there is no clear relation between the day of the week and the number of incidents. The hypothesis was that weekends would differ significantly from working days, because the people reporting incidents have more free time then. For foxes, however, the number of cases increases throughout the working week, probably along with the number of people going out after work. It is hard to draw conclusions about farm animals because of the small number of incidents. For the common types of animals, such as cats, dogs, and birds, the number of cases is roughly constant throughout the week.

To gain more insight into the dataset, a similar analysis of the number of cases was performed for boroughs and days of the week.

In [31]:
boroughs = pd.DataFrame(dataframe[['Borough', 'DayOfWeek']])
boroughs = boroughs.groupby(['DayOfWeek', 'Borough']).size().unstack()
boroughs.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

plots = []

for index, borough in enumerate(borough_list):    
    df = pd.DataFrame(boroughs[borough]).reset_index()
    
    source = ColumnDataSource(df)
    
    p = figure(
        x_range=FactorRange(factors=df['index']),
        plot_height=180,
        plot_width=350,
        toolbar_location=None,
        title='{} — Incidents per weekday'.format(borough),
        x_axis_label='Weekday',
        y_axis_label='Frequency'
    )
    
    p.vbar(
        x='index',
        top=borough,
        source=source,
        width=0.6,
        fill_alpha=0.5,
        color='#0CC6E8'
    )
    
    p = style_plot(p, False)
    
    # Fixed y-axis so that the boroughs can be compared directly.
    p.y_range = Range1d(0, 40)
    p.toolbar.logo = None
    p.toolbar_location = None

    plots.append(p)
        
boroughs_distro = gridplot(plots, merge_tools=False, ncols=5)
# output_file('output/boroughs_distro.html')
show(boroughs_distro)

In this case, the y-axis was fixed to show which boroughs have high and low numbers of incidents. There are many cases in the suburbs of London and very few in the city centre. Moreover, there is no significant relation between the borough and the day of the week.

## Locations

Because Bokeh has no heatmap feature comparable to the one in Folium, the animal categories were plotted on the map with semi-transparent shapes as a workaround.

In [32]:
tile_provider = get_provider(Vendors.STAMEN_TONER)
_tools_to_show = 'box_zoom,pan,save,hover,reset,tap,wheel_zoom'     

m = figure(
    title='Categories of animals',
    plot_width=880,
    plot_height=700,
    x_range=(-80000, 10000),
    y_range=(6654123, 6763961),
    x_axis_type='mercator',
    y_axis_type='mercator',
    tools=_tools_to_show
)

species = dataframe['AnimalGroupParent'].unique()
species.sort()

scatters = {}
items = list()

cmap = viridis(len(species))

# One renderer per animal group, so that each group can be toggled from the legend.
for index, animal in enumerate(species):
    source = ColumnDataSource(dataframe.loc[dataframe['AnimalGroupParent'] == animal])
    scatters[animal] = m.scatter(
        x='x_mercator', y='y_mercator',
        source=source,
        fill_alpha=0.3,
        size=15,
        color=cmap[index],
        marker='hex',
        visible=False, muted_alpha=0.00)
    
    items.append((animal, [scatters[animal]]))

# The hover tool is shared by all renderers, so it is configured once.
hover = m.select(dict(type=HoverTool))
hover.tooltips = [("Animal", "@AnimalGroupParent"),]
hover.mode = 'mouse'

legend = Legend(items=items)

m.toolbar.active_scroll = m.select_one(WheelZoomTool)
m.add_tile(tile_provider)
m.add_layout(legend)
m.legend.location = 'top_left'
m.legend.click_policy = 'hide'
m.legend.label_text_font_size = "10px"
m.toolbar.logo = None
# m.toolbar_location = None
# m.legend.orientation = "ver"

m = style_plot(m)
# output_file('output/map_categories.html')
show(m)

This plot may be hard to interpret for the frequent incident types, such as those involving dogs or cats. However, it pinpoints the crucial locations for deer, horses, cows and sheep. As the cost analysis below shows, incidents with these kinds of animals are costly, so those spots should be carefully protected to reduce the number of cases.


# Data Analysis

## Costs

The aim of this section is to find correlations between the animal type, the location of the incident, and its cost. The hypothesis is that the cost increases with the distance from the city centre and with the size and rarity of the animal.

To check that, a heatmap was created with the boroughs on the x-axis, the animal types on the y-axis, and the mean notional cost of the incidents as the heat value.
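
The reshaping needed for such a heatmap can be sketched on hypothetical numbers: a pivot table gives the mean cost per borough and animal type, and stacking it back yields one row per heatmap cell. The boroughs, animals and costs below are made up for illustration, and the explicit `dropna` is an addition that keeps empty cells out regardless of the pandas version.

```python
import pandas as pd

# Hypothetical incidents with made-up notional costs.
incidents = pd.DataFrame({
    'Borough': ['enfield', 'enfield', 'croydon', 'croydon'],
    'AnimalGroupParent': ['cat', 'cat', 'cat', 'horse'],
    'IncidentNotionalCost(£)': [260.0, 520.0, 260.0, 780.0],
})

# Wide table: one row per borough, one column per animal type, mean cost as value.
wide = incidents.pivot_table(
    index='Borough',
    columns='AnimalGroupParent',
    values='IncidentNotionalCost(£)',
    aggfunc='mean',
)

# Long table: one row per (borough, animal) pair that actually occurs.
long_form = wide.stack().dropna().rename('value').reset_index()
print(long_form)
```

Pairs that never occur in the data, such as horses in Enfield in this sample, produce no row and therefore no cell in the heatmap.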

In [33]:
# Mean notional cost per (borough, animal group), reshaped to one row per heatmap cell.
costs = pd.pivot_table(
    dataframe,
    index='Borough',
    columns='AnimalGroupParent',
    values='IncidentNotionalCost(£)',
    aggfunc='mean'
)
costs = costs.stack().rename("value").reset_index()

mapper = LinearColorMapper(
    palette='Turbo256',
    low=costs.value.min(), high=costs.value.max()
)

p = figure(
    plot_width=1300,
    plot_height=600,
    title="Mean notional cost of rescue — borough vs animal type",
    x_range=list(costs.Borough.drop_duplicates()),
    y_range=list(costs.AnimalGroupParent.drop_duplicates()),
    toolbar_location=None,
    tools="")

p.rect(
    x="Borough",
    y="AnimalGroupParent",
    width=1,
    height=1,
    source=ColumnDataSource(costs),
    line_color=None,
    fill_color=transform('value', mapper),
    fill_alpha=0.7
)

p.add_tools(HoverTool(
    tooltips=[
        ('Animal group', '@AnimalGroupParent'),
        ('Borough', '@Borough'),
        ("value", "@value")],
    show_arrow=True,mode="mouse", point_policy="follow_mouse"
))

p = style_plot(p, False)
p.xaxis.major_label_orientation = math.pi/4

# output_file('output/heatmap.html')
show(p)

It turned out that the rarity of the animal has the main influence on the cost of the rescue. For the majority of animal types, the cost is roughly constant horizontally (across the boroughs). However, the costs are higher for big animals, such as horses, farm animals and heavy livestock. The cost is also high for unknown wild animals, probably due to uncertainty about the species and the possible rescue circumstances.

This plot also shows which boroughs have incidents with certain types of animals. For example, in Tandridge there were no incidents with cats, dogs, birds or foxes in the dataset.

### Timeseries

For the datetime factors, monthly and weekly patterns were analysed. The time series of the number of incidents were produced with the code below. The datetime index was resampled, using the count to aggregate the number of incidents and the sum to aggregate the costs.
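
The resampling step can be illustrated on a handful of hypothetical incidents. Only the weekly frequency is shown; with the `'W'` alias pandas labels each bin with the Sunday that ends the week. All timestamps, identifiers and costs below are made up.

```python
import pandas as pd

# Hypothetical incidents indexed by the time of the call.
incidents = pd.DataFrame({
    'DateTimeOfCall': pd.to_datetime(['2020-01-06 09:00', '2020-01-07 14:30', '2020-01-14 22:15']),
    'IncidentNumber': ['a1', 'a2', 'a3'],
    'IncidentNotionalCost(£)': [260.0, 520.0, 260.0],
}).set_index('DateTimeOfCall')

weekly = pd.DataFrame({
    # count() gives the number of incidents per week
    'incidents': incidents['IncidentNumber'].resample('W').count(),
    # sum() gives the total notional cost per week
    'cost': incidents['IncidentNotionalCost(£)'].resample('W').sum(),
})
print(weekly)
```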

In [34]:
p = figure(
    x_axis_type="datetime",
    plot_height=300,
    plot_width=1800
)
    
sources = []
colours = ['#FEB2F4', '#BBE8FF']
for index, resampler in enumerate(['M', 'W']):
    # Count the incidents per month ('M') and per week ('W').
    group = dataframe.set_index('DateTimeOfCall')[['IncidentNumber']]
    group = group.resample(resampler).count()
    group = group.reset_index()

    sources.append(ColumnDataSource(group))

    p.line(
        x='DateTimeOfCall',
        y='IncidentNumber',
        line_width=3,
        source=sources[index],
        legend_label='Number of incidents {}'.format(resampler),
        color=colours[index],
        line_alpha=0.5,
    )

# The legend exists only after the lines are added, so the styling is applied last.
p = style_plot(p, True)

# output_file('output/timeseries_incidents.html')
show(p)



Two main things are visible in this plot. First of all, the monthly line seems to follow a seasonal pattern: every year there is one characteristic peak in the middle of the year, in the summer. It may be connected with the holiday season or with heatwaves. The second one is a huge drop in the number of cases at the beginning of 2020. Moreover, there is a huge peak in the second half of July 2013, in both the weekly and the monthly series.

## Costs

Sending a rescue team is expensive, so it may be possible to reduce the costs for various types of animals. A plot similar to the one above was created for the costs, again with weekly and monthly aggregation.

In [35]:
p = figure(
    x_axis_type="datetime",
    plot_height=300,
    plot_width=1800
)
    
sources = []
colours = ['#82FFD5', '#FFD564']
for index, resampler in enumerate(['M', 'W']):
    # Sum only the cost column; the text columns cannot be summed meaningfully.
    group = dataframe.set_index('DateTimeOfCall')[['IncidentNotionalCost(£)']]
    group = group.resample(resampler).sum()
    group = group.reset_index()

    sources.append(ColumnDataSource(group))

    p.line(
        x='DateTimeOfCall',
        y='IncidentNotionalCost(£)',
        line_width=3,
        source=sources[index],
        legend_label='Incident notional cost {}'.format(resampler),
        color=colours[index],
        line_alpha=0.5,
    )

# The legend exists only after the lines are added, so the styling is applied last.
p = style_plot(p, True)
    
# output_file('output/timeseries_costs.html')
show(p)



The plot is very similar to the one above, which suggests that the total cost is highly correlated with the number of incidents. It is important to remember that the dataset contains only the notional costs of the incidents, not the actual ones.
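
The visual similarity of the two plots could be quantified with a correlation coefficient. The sketch below uses hypothetical weekly totals, not values from the dataset; on the real data the two columns would come from the resampled series shown above.

```python
import pandas as pd

weekly = pd.DataFrame({
    'incidents': [10, 14, 9, 20, 12],                     # hypothetical weekly counts
    'cost': [2600.0, 3900.0, 2340.0, 5460.0, 3120.0],     # hypothetical weekly totals in £
})

# Series.corr computes the Pearson correlation coefficient by default.
r = weekly['incidents'].corr(weekly['cost'])
print(round(r, 3))
```

A value close to 1 would support the observation that the weekly cost is driven mostly by the number of incidents.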

### Incidents hourly

The fraction of incidents per hour of the day was investigated for each animal group. Although there were only a few incidents for some groups, e.g. bulls or fish, it is clearly visible that for cats and dogs the majority of cases happen during the day, whereas for deer the incidents occur mainly at night.
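
The normalisation behind these fractions divides each hourly count by the total number of incidents for the same animal group. A minimal sketch with hypothetical counts, using `groupby(...).transform('sum')` to broadcast the group totals back to every row:

```python
import pandas as pd

# Hypothetical hourly incident counts per animal group.
counts = pd.DataFrame({
    'Hour': [9, 10, 23, 9, 23],
    'AnimalGroupParent': ['cat', 'cat', 'cat', 'deer', 'deer'],
    'IncidentNumber': [6, 3, 1, 1, 3],
})

# Total number of incidents of each animal group, repeated on every row of that group.
totals = counts.groupby('AnimalGroupParent')['IncidentNumber'].transform('sum')
counts['Normalisation'] = counts['IncidentNumber'] / totals
print(counts)
```

After this step the fractions of every animal group sum to one, so groups with very different incident counts can be compared on the same axis.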

In [36]:
animals = dataframe[['AnimalGroupParent', 'Hour', 'IncidentNumber']]
animals = animals.groupby(['Hour', 'AnimalGroupParent']).count().reset_index()

# Share of each animal group's incidents that falls into a given hour.
animals['Normalisation'] = (
    animals['IncidentNumber']
    / animals.groupby('AnimalGroupParent')['IncidentNumber'].transform('sum')
)

animals_pivot = pd.pivot_table(animals, index='Hour', columns='AnimalGroupParent', values='Normalisation')

# Hours are stored as strings, so order them numerically rather than alphabetically.
animals_pivot = animals_pivot.loc[sorted(animals_pivot.index, key=int)]
animals_pivot.index = animals_pivot.index.map(str)

source = ColumnDataSource(animals_pivot)

p = figure(
    x_range=FactorRange(factors=animals_pivot.index),
    plot_height=850,
    plot_width=1600,
    toolbar_location=None,
    title='Incidents per hour',
    x_axis_label='Hour of the day',
    y_axis_label='Relative frequency'
)

# Use a seaborn colour palette in hex format to give each animal group its own colour.
cmap = sns.color_palette('husl', len(animals_pivot.columns)).as_hex()

bar = {}
items = list()

for index, animal in enumerate(animal_list):
    bar[animal] = p.vbar(
        x='Hour',
        top=animal,
        source=source,
        width=0.6,
        color=cmap[index],
        fill_alpha=0.5,
        muted=True,
        muted_alpha=0.05,
        legend_label=animal
    )
    items.append((animal, [bar[animal]]))
    
legend = Legend(items=items)

# p.add_layout(legend, 'left')
p.legend.location = 'top_left'
p.legend.click_policy = 'mute'
p.y_range.start = 0

p = style_plot(p, True)

# output_file('output/animals_hourly.html')
show(p)

The density of incidents was presented using a Bokeh hexbin plot of the coordinates.

In [37]:
source = ColumnDataSource(dataframe)

tile_provider = get_provider(Vendors.STAMEN_TONER)

p = figure(
    title='Density of actions',
    x_axis_type='mercator',
    y_axis_type='mercator',
    tools='wheel_zoom,reset',
    x_range=(-63000, 40000),
    y_range=(6674123, 6743961),
    plot_width=800,
    plot_height=600,
    
)

p = style_plot(p)

r, bins = p.hexbin(
    x=dataframe['x_mercator'],
    y=dataframe['y_mercator'],
    size=1800,
    hover_color='#A1D3FF',
    hover_alpha=0.8,
    fill_alpha=0.8,
    palette=list(brewer['Oranges'][9])[::-1]
)

p.add_tools(HoverTool(
    tooltips=[("Count", "@c")],
    show_arrow=True,mode="mouse", point_policy="follow_mouse", renderers=[r]
))

r = p.add_tile(tile_provider)
r.level = 'underlay'
p.toolbar.logo = None
p.toolbar_location = None

# output_file('output/map_hexbin.html')
show(p)

The plot confirms the aforementioned observations: the very centre of the city has significantly fewer cases than the areas further away from it, because there is much more space there for wild animals and more places for people to live.

## Genre

The genre of the visualisation is a partitioned poster. Infographics are eye-catching and are a good medium for sharing knowledge about current affairs. They are often used to raise awareness amongst citizens. The key is to make an informative and visually consistent poster which offers selection features and hover interactivity.

The tools used in the project:
* Visual Narrative
 * Consistent Visual Platform – the elements of the dashboard should be consistent so that the viewer is not distracted while reading.
 * Feature Distinction – the analysis contains various categorical features, such as the type of animal or the borough. It is helpful for the viewer to be able to distinguish them easily.
 * Zooming – it is not a video, so it is not zooming per se, but the maps and plots can be zoomed in or out to look at the analysis more closely. The plots feel more interactive this way, which is why the Bokeh library was chosen.

* Narrative Structure
 * User Directed Path – the elements on the webpage are not placed at random; however, the user chooses which one to focus on at a given moment, which gives them freedom of choice (they are not forced to follow certain steps).
 * Hover Highlighting / Details – the hover features give the viewer more insight into the analysis.
 * Filtering / Selection / Search – there are many categorical features in the dataset, so it is easier to filter out the unnecessary elements.
 * Captions / Headlines – the plots should be self-explanatory, so that viewers do not waste time trying to understand them.
 * Accompanying Article – further reading may provide better insight and back up the conclusions of the analysis.
 * Introductory Text – it is important to give a short introduction to the problem on the webpage.

## Visualisations

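1. The map of animal categories
2. The hexbin map of incident density
3. The heatmap of mean notional cost
 * used to look for a relation between borough, animal type, and cost
4. Bar plots
5. Time series of incidents and costs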

## Discussion

I am very happy with the plots. I have learnt a lot about the Bokeh library while doing this project, and I believe that the visualisations are of good quality. I build dashboards in Tableau as part of my projects at work, and I have started to think about them a little differently now. When everything has to be coded, it is no longer about "drag and drop" and looking for something interesting; it is important to have a good plan for the visualisations. Moreover, I find Bokeh a good alternative to Plotly as far as interactivity is concerned.

A good next step would be to combine this dataset with others, for example demographic data, and look for interesting insights. Feature engineering would probably make it possible to find relations between the number of incidents involving cats and dogs and the wealth of the district.

## Contributions

The whole assignment was done by me.