# Using Python to fetch and analyse stop and search data

This notebook shows how Python and the `pandas` library can be used to get a story out of stop and search data. 

Specifically, we're going to find out which day had the most stops and searches, a question that lets us try out a number of techniques.

We begin by importing the `pandas` library under the alias `pd` for ease of use, and then use the [police API's Stop and searches by force 'method'](https://data.police.uk/docs/method/stops-force/) to fetch data for the Metropolitan Police force for June 2021.

In [29]:
import pandas as pd
#read in some JSON from the UK police API - this should show stops by a particular force during a specified month
policestops = pd.read_json("https://data.police.uk/api/stops-force?force=metropolitan&date=2021-06")
#show the new variable
print(policestops)

      age_range  ...                    object_of_search
0       over 34  ...                    Controlled drugs
1         10-17  ...                   Offensive weapons
2         10-17  ...                   Offensive weapons
3         18-24  ...  Evidence of offences under the Act
4         25-34  ...                        Stolen goods
...         ...  ...                                 ...
16680     25-34  ...                    Controlled drugs
16681      None  ...                    Controlled drugs
16682      None  ...                    Controlled drugs
16683     10-17  ...                    Controlled drugs
16684     18-24  ...                    Controlled drugs

[16685 rows x 16 columns]
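The URL in the cell above packs two parameters into its query string: `force` (the force ID) and `date` (a year and month). To fetch a different force or month, it can help to build that URL from its parts. Here is a minimal sketch using the standard library's `urllib.parse.urlencode`; the helper name `build_stops_url` is hypothetical, and any force ID other than `metropolitan` is an assumption you should check against the API's list of forces. The function only builds the address; it does not fetch anything.

```python
from urllib.parse import urlencode

# base address of the police API's "stops-force" method, as used above
BASE_URL = "https://data.police.uk/api/stops-force"


def build_stops_url(force, date):
    """Return a stops-force URL for a force ID and a month in YYYY-MM format.

    build_stops_url is a hypothetical helper written for this notebook.
    """
    # urlencode joins the parameters as key=value pairs separated by '&'
    query = urlencode({"force": force, "date": date})
    return f"{BASE_URL}?{query}"


url = build_stops_url("metropolitan", "2021-06")
print(url)
```

The returned string can be passed to `pd.read_json()` exactly as in the cell above.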


Let's use `.info()` to get an overview.

In [7]:
policestops.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16685 entries, 0 to 16684
Data columns (total 16 columns):
 #   Column                               Non-Null Count  Dtype              
---  ------                               --------------  -----              
 0   age_range                            14484 non-null  object             
 1   outcome                              16685 non-null  object             
 2   involved_person                      16685 non-null  bool               
 3   self_defined_ethnicity               16466 non-null  object             
 4   gender                               16472 non-null  object             
 5   legislation                          16685 non-null  object             
 6   outcome_linked_to_object_of_search   0 non-null      float64            
 7   datetime                             16685 non-null  datetime64[ns, UTC]
 8   removal_of_more_than_outer_clothing  0 non-null      float64            
 9   outcome_object              

## Convert the date column to a string column

The information we need, the day of the month, has to be extracted from another column: specifically the 'datetime' column.

There are a couple of ways of extracting the day from that column. The most straightforward is to treat the dates as text and grab the characters that specify the day.

(Another is to use tools designed for working with datetime objects, such as the `datetime` library. We return to this approach at the end of the notebook.)

To convert a column from one type to another, we use the `.astype()` method.

In [31]:
#convert the datetime column to a string and store in a new column
policestops['dateasstring'] = policestops['datetime'].astype('str')
#show the first few rows
policestops['dateasstring'].head()

0    2021-06-04 12:51:00+00:00
1    2021-06-04 21:00:00+00:00
2    2021-06-05 21:10:00+00:00
3    2021-06-07 07:00:00+00:00
4    2021-06-09 12:00:00+00:00
Name: dateasstring, dtype: object

Note that this looks no different from the original column. Only the `dtype` line at the bottom tells us that this is an `object` column (holding strings) rather than `datetime64[ns, UTC]`, which is what you get when using `.head()` on the original column:

In [10]:
#show the original column for comparison
policestops['datetime'].head()

0   2021-06-04 12:51:00+00:00
1   2021-06-04 21:00:00+00:00
2   2021-06-05 21:10:00+00:00
3   2021-06-07 07:00:00+00:00
4   2021-06-09 12:00:00+00:00
Name: datetime, dtype: datetime64[ns, UTC]
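Besides `.astype('str')`, pandas' `.dt.strftime()` method turns datetimes into text in a format you choose, which can save some slicing later. A minimal sketch on two timestamps copied from the output above (a stand-in, not the full dataset):

```python
import pandas as pd

# a small stand-in for policestops['datetime'], using two timestamps shown above
sample = pd.Series(pd.to_datetime(["2021-06-04 12:51:00", "2021-06-05 21:10:00"], utc=True))

# .astype('str') keeps the full text, including the time and UTC offset
as_text = sample.astype('str')
print(as_text.tolist())

# .dt.strftime() lets you choose which parts become text: here just the date
date_only = sample.dt.strftime('%Y-%m-%d')
print(date_only.tolist())
```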

## Create a function to extract the day from the date string

Now that we've converted those dates to strings, we can create a function that extracts the day of the month.

We know that the string always follows a particular pattern: four digits for the year, a dash, two digits for the month, another dash, two digits for the day, then a space followed by the time and the UTC offset.

We can extract the day digits by using their positions - or **indices**.

Specifically, we can use a **slice**: a start index and an end index to grab from and to. Python counts positions from 0, and a slice includes the start position but stops just before the end position.

First, we test slicing on one of the dates to make sure we've got the right indices. 

In [19]:
#we test on the first item in this column, grabbing a slice from position 8 to position 9 (the 10 is not included)
policestops['dateasstring'][0][8:10]

'04'
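The same idea works for the other parts of the string. A short sketch on one of the date strings shown earlier, listing the positions each slice covers:

```python
# one of the date strings produced by .astype('str') above
date_string = "2021-06-04 12:51:00+00:00"

# Python counts positions from 0, and a slice stops *before* its end index
year = date_string[0:4]    # positions 0-3
month = date_string[5:7]   # positions 5-6
day = date_string[8:10]    # positions 8-9
time = date_string[11:16]  # positions 11-15 (hours and minutes)

print(year, month, day, time)
```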

Next, we create a function to store the steps we are going to go through with each date string.

In [20]:
#define a function called 'grabday' and name one parameter: 'date'
def grabday(date):
  #grab the characters at positions 8 and 9 in 'date' and store in a variable called 'chars8to9'
  chars8to9 = date[8:10]
  #return that variable
  return chars8to9

To apply this function to the whole of a column we can either loop through that column and run the function on each item, or we can use `.apply()`, which does the same thing in a single line.

In [26]:
#apply the function to all items in the column, and store in a variable called 'days'
days = policestops['dateasstring'].apply(grabday)
#show the first few rows
print(days.head())
#print the number of items
print(len(days))

0    04
1    04
2    05
3    07
4    09
Name: dateasstring, dtype: object
16685


This is how you would achieve a similar result using a loop. Note that it takes more lines of code.

In [32]:
#create an empty list to store the results of the loop
dayslist = []

#loop through each item in the column
for i in policestops['dateasstring']:
  #grab the characters at positions 8 and 9 in that item and store the result in variable 'day'
  #(alternatively we could run the function on that item)
  day = i[8:10]
  #add to the previously empty list
  dayslist.append(day)

#show the first 5 items in the list after the loop has finished
print(dayslist[:5])
#show how many items
print(len(dayslist))

['04', '04', '05', '07', '09']
16685
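A third option, not used elsewhere in this notebook, is pandas' vectorised string methods: the `.str` accessor applies a string operation to every item in a column without a separate function or loop. A small sketch on three date strings copied from the output above (a stand-in for the real column):

```python
import pandas as pd

# a small stand-in for policestops['dateasstring']
dates = pd.Series(["2021-06-04 12:51:00+00:00",
                   "2021-06-05 21:10:00+00:00",
                   "2021-06-07 07:00:00+00:00"])

# .str.slice(8, 10) applies the same [8:10] slice to every item at once
days_vectorised = dates.str.slice(8, 10)
print(days_vectorised.tolist())

# .str[8:10] is a shorthand for the same operation
days_shorthand = dates.str[8:10]
print(days_shorthand.tolist())
```

On the real data this would be `policestops['dateasstring'].str.slice(8, 10)`.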


We can now add this back to the dataframe.

In [28]:
policestops['day'] = days

## Counting the most frequently occurring value (day)

Now we can call `.value_counts()` on that column, which also helpfully returns the results ordered from highest to lowest.

In [32]:
policestops['day'].value_counts()

11    734
05    679
25    666
24    662
04    648
19    634
09    626
03    623
10    620
23    614
07    596
17    584
02    583
22    582
06    554
12    552
01    550
18    546
08    532
16    531
26    530
30    516
15    505
28    494
21    457
20    438
29    431
14    426
13    394
27    337
31     41
Name: day, dtype: int64
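June has only 30 days, so the 41 stops counted under `31` must have taken place in another month (presumably 31 May, though the day-only column cannot confirm that). Because the `day` column drops the month, the same day number from different months gets counted together. One way to avoid that is to count by the full calendar date with `.dt.date`, and `.idxmax()` then picks out the busiest date directly. A sketch on a few illustrative timestamps, not the real data:

```python
import pandas as pd

# illustrative timestamps (not the real data), including one from 31 May
stamps = pd.Series(pd.to_datetime([
    "2021-05-31 23:30:00",
    "2021-06-04 12:51:00",
    "2021-06-04 21:00:00",
    "2021-06-05 21:10:00",
], utc=True))

# .dt.date keeps year, month and day together, so 31 May stays separate from June
counts_by_date = stamps.dt.date.value_counts()
print(counts_by_date)

# .idxmax() returns the index label (here, the date) with the highest count
busiest = counts_by_date.idxmax()
print(busiest)
```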

We can even convert these numbers to a percentage of all stops and searches that month.

To do that we need the number of rows. We can get it by taking the first item from the dataframe's `.shape` attribute, which holds two numbers: the number of rows and the number of columns.

In [38]:
#show what we get when running shape (two items)
print(policestops.shape)
#and what we get when adding an index of 0 to select the first item (the number of rows)
print(policestops.shape[0])

(16685, 18)
16685


Here, then, is the calculation: each count is divided by the total number of rows (the number of stops and searches).

In [39]:
#divide the number of stops on each day by the number of stops in total to get a % 
policestops['day'].value_counts()/policestops.shape[0]
#0.04 means 4%

11    0.043992
05    0.040695
25    0.039916
24    0.039676
04    0.038837
19    0.037998
09    0.037519
03    0.037339
10    0.037159
23    0.036800
07    0.035721
17    0.035001
02    0.034942
22    0.034882
06    0.033203
12    0.033084
01    0.032964
18    0.032724
08    0.031885
16    0.031825
26    0.031765
30    0.030926
15    0.030267
28    0.029607
21    0.027390
20    0.026251
29    0.025832
14    0.025532
13    0.023614
27    0.020198
31    0.002457
Name: day, dtype: float64
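pandas can also do this division for you: `.value_counts(normalize=True)` returns each value's share of the total instead of its count. A sketch on a tiny stand-in for the `day` column, comparing it with the manual calculation above and then turning the shares into rounded percentages:

```python
import pandas as pd

# a small stand-in for policestops['day']
days = pd.Series(["11", "11", "05", "25"])

# dividing the counts by the number of rows, as in the cell above
manual = days.value_counts() / days.shape[0]

# value_counts(normalize=True) does the same division for us
shares = days.value_counts(normalize=True)
print(shares)

# multiply by 100 and round to one decimal place to show percentages
percentages = (shares * 100).round(1)
print(percentages)
```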

## Extracting the day from a datetime64 object

Here's [the other approach](https://newbedev.com/get-year-month-or-day-from-numpy-datetime64), which works with the datetime values directly rather than treating them as text:

In [6]:
#use the .day attribute to get the day of the month as a number
print(policestops['datetime'][0].day)
print(policestops['datetime'][2].day)

4
5


In [10]:
datedays = []

for i in policestops['datetime']:
  dateday = i.day
  datedays.append(dateday)

print(datedays)
print(len(datedays))

[4, 4, 5, 7, 9, 9, 12, 12, 12, 14, 14, 14, 15, 20, 23, 23, 1, 31, 1, 1, 1, 1, 1, 1, 2, 1, 31, 31, 1, 2, 2, 2, 1, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 6, 6, 6, 6, 6, 6, 6, 3, 7, 6, 7, 7, 7, 6, 7, 7, 7, 7, 8, 7, 6, 6, 6, 7, 8, 8, 7, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 8, 9, 9, 9, 9, 9, 10, 9, 9, 9, 9, 9, 10, 10, 10, 9, 10, 10, 10, 8, 10, 10, 10, 10, 11, 11, 9, 11, 11, 11, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 10, 11, 12, 12, 12, 12, 12, 12, 11, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 13, 12, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 16, 16, 16, 16, 16, 17, 9, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 17, 18, 17, 18, 18, 11, 18, 18, 19, 18, 19, 17, 19, 19, 19, 19, 19, 19, 18, 18, 28, 18, 18, 19, 19, 19, 19, 19, 16, 17, 19, 19, 19, 19, 19, 19, 4, 11, 18, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 20, 20, 21, 21, 21, 21
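Rather than reading `.day` from one item at a time in a loop, pandas' `.dt` accessor extracts it from a whole column in one go. A sketch on three timestamps from the output above; on the real data the equivalent would be `policestops['datetime'].dt.day`:

```python
import pandas as pd

# a small stand-in for policestops['datetime']
stamps = pd.Series(pd.to_datetime(["2021-06-04 12:51:00",
                                   "2021-06-04 21:00:00",
                                   "2021-06-05 21:10:00"], utc=True))

# .dt.day extracts the day of the month from every item, as integers
day_numbers = stamps.dt.day
print(day_numbers.tolist())

# the result can be counted just like the string-based 'day' column
print(day_numbers.value_counts())
```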

In [18]:
#https://stackoverflow.com/questions/13703720/converting-between-datetime-timestamp-and-datetime64
dateweekdays = []

for i in policestops['datetime']:
  #each item in this column is already a pandas Timestamp (which is why .day worked above),
  #so we can call its .weekday() method directly without converting it first
  #.weekday() returns the day of the week as a number: Monday is 0 and Sunday is 6
  dateweekday = i.weekday()
  dateweekdays.append(dateweekday)

print(dateweekdays)
print(len(dateweekdays))

[4, 4, 5, 0, 2, 2, 5, 5, 5, 0, 0, 0, 1, 6, 2, 2, 1, 0, 1, 1, 1, 1, 1, 1, 2, 1, 0, 0, 1, 2, 2, 2, 1, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 6, 6, 6, 6, 6, 6, 6, 3, 0, 6, 0, 0, 0, 6, 0, 0, 0, 0, 1, 0, 6, 6, 6, 0, 1, 1, 0, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 3, 3, 2, 3, 3, 3, 1, 3, 3, 3, 3, 4, 4, 2, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 3, 4, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 6, 6, 6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 5, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 4, 3, 4, 4, 4, 4, 4, 5, 4, 5, 3, 5, 5, 5, 5, 5, 5, 4, 4, 0, 4, 4, 5, 5, 5, 5, 5, 2, 3, 5, 5, 5, 5, 5, 5, 4, 4, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, 0, 0, 6, 6, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 3, 3, 3, 3, 4, 4, 3, 3, 4, 6, 4, 4, 4, 
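The `.dt` accessor covers weekdays too: `.dt.dayofweek` gives the same numbers as `.weekday()` (Monday is 0, Sunday is 6), and `.dt.day_name()` gives readable names that can be counted to see which day of the week had the most stops. A sketch on three timestamps from earlier in the notebook, standing in for the full column:

```python
import pandas as pd

# a small stand-in for policestops['datetime']
stamps = pd.Series(pd.to_datetime(["2021-06-04 12:51:00",
                                   "2021-06-05 21:10:00",
                                   "2021-06-07 07:00:00"], utc=True))

# .dt.dayofweek gives the same numbers as .weekday(): Monday=0 ... Sunday=6
weekday_numbers = stamps.dt.dayofweek
print(weekday_numbers.tolist())

# .dt.day_name() gives readable names instead
weekday_names = stamps.dt.day_name()
print(weekday_names.tolist())

# counting the names answers "which weekday had the most stops?"
print(weekday_names.value_counts())
```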