# Cleaning an .ods file in pandas

We are going to import some data from [Criminal Justice System statistics quarterly: March 2021](https://www.gov.uk/government/statistics/criminal-justice-system-statistics-quarterly-march-2021).

In [None]:
import pandas as pd

In [None]:
overview21mar = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1011767/overview-tables-march-2021.ods"

## Dealing with the ods spreadsheet format

The first problem we have is that this isn't a CSV or XLS file.

How do we import an .ods file? [Off to Google](https://stackoverflow.com/questions/17834995/how-to-convert-opendocument-spreadsheets-to-a-pandas-dataframe):

> "This is available natively in pandas 0.25. So long as you have odfpy installed (conda install odfpy OR pip install odfpy) you can do
>
> `pd.read_excel("the_document.ods", engine="odf")`

So first we need to install `odfpy`.


In [None]:
!pip install odfpy

Collecting odfpy
  Downloading odfpy-1.4.1.tar.gz (717 kB)
[?25l[K     |▌                               | 10 kB 20.7 MB/s eta 0:00:01[K     |█                               | 20 kB 20.7 MB/s eta 0:00:01[K     |█▍                              | 30 kB 11.8 MB/s eta 0:00:01[K     |█▉                              | 40 kB 9.7 MB/s eta 0:00:01[K     |██▎                             | 51 kB 5.1 MB/s eta 0:00:01[K     |██▊                             | 61 kB 5.4 MB/s eta 0:00:01[K     |███▏                            | 71 kB 5.8 MB/s eta 0:00:01[K     |███▋                            | 81 kB 6.6 MB/s eta 0:00:01[K     |████▏                           | 92 kB 6.7 MB/s eta 0:00:01[K     |████▋                           | 102 kB 5.2 MB/s eta 0:00:01[K     |█████                           | 112 kB 5.2 MB/s eta 0:00:01[K     |█████▌                          | 122 kB 5.2 MB/s eta 0:00:01[K     |██████                          | 133 kB 5.2 MB/s eta 0:00:01[K     |██████▍ 

Once installed, we can use the `read_excel` function from `pandas`, but specify `engine="odf"`.

The [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html) explains that the `engine=` argument supports the following 'engines':

> “xlrd”, “openpyxl”, “odf”, “pyxlsb”..... “odf” supports OpenDocument file formats (.odf, .ods, .odt).

We can also specify which sheet we want, and get it to skip the first 4 rows before the heading row.

In [None]:
#import the data, specifying the 'engine' (the docs say "Supported engines: “xlrd”, “openpyxl”, “odf”, “pyxlsb”")
#and the name of the sheet
#and asking it to skip the first 4 rows before the headings
overview21mardf = pd.read_excel(overview21mar, 
                                engine="odf", 
                                sheet_name="Q5_1",
                                skiprows=4)

The 4 rows we skipped hold '12 months ending March' above the years, and 'Number of offenders' above the last column.

In [None]:
overview21mardf.head()

Unnamed: 0,Offence type,Type of sentence,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Indictable only offences,Total sentenced,19976.0,19330.0,17144.0,15681.0,14597.0,14050.0,13731.0,13263.0,12006.0,12718.0,10573.0
1,,,,,,,,,,,,,
2,,Immediate custody,13868.0,13664.0,12377.0,11450.0,10752.0,10393.0,10138.0,9762.0,8796.0,9023.0,6945.0
3,,Suspended sentence,1699.0,1509.0,1517.0,1579.0,1683.0,1667.0,1539.0,1470.0,1155.0,1276.0,1414.0
4,,Community sentence,3724.0,3589.0,2901.0,2254.0,1682.0,1472.0,1392.0,1382.0,1402.0,1804.0,1770.0


## Cleaning while importing

There are other ways of achieving the same or similar results. [See the documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html) for the arguments you can use with `read_excel()`.

In [None]:
#import the data, specifying that the header is row 5 (header=4, because pandas counts from 0)
overview21mardf = pd.read_excel(overview21mar, 
                                engine="odf", 
                                sheet_name="Q5_1", 
                                header=4)
#show first few rows
overview21mardf.head()

Unnamed: 0,Offence type,Type of sentence,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Indictable only offences,Total sentenced,19976.0,19330.0,17144.0,15681.0,14597.0,14050.0,13731.0,13263.0,12006.0,12718.0,10573.0
1,,,,,,,,,,,,,
2,,Immediate custody,13868.0,13664.0,12377.0,11450.0,10752.0,10393.0,10138.0,9762.0,8796.0,9023.0,6945.0
3,,Suspended sentence,1699.0,1509.0,1517.0,1579.0,1683.0,1667.0,1539.0,1470.0,1155.0,1276.0,1414.0
4,,Community sentence,3724.0,3589.0,2901.0,2254.0,1682.0,1472.0,1392.0,1382.0,1402.0,1804.0,1770.0
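Why do `skiprows=4` and `header=4` give the same table? Here is a small sketch that uses `pd.read_csv()` on a made-up CSV laid out like the Q5_1 sheet. The CSV text is an assumption for illustration only; `read_csv()` accepts the same two arguments as `read_excel()`, although unlike the .ods import it always reads the year headings as text.

```python
import io

import pandas as pd

# A made-up CSV shaped like the Q5_1 sheet: four lines of titles/notes,
# then the real heading row, then the data (figures copied from the output above)
sheet_text = """Table Q5_1 (illustrative copy),,,
Number of offenders by sentence type,,,
,,12 months ending March,
,,,Number of offenders
Offence type,Type of sentence,2020,2021
Indictable only offences,Total sentenced,12718,10573
,,,
,Immediate custody,9023,6945
"""

# Option 1: throw away the first four lines, so the fifth line becomes the header
skipped = pd.read_csv(io.StringIO(sheet_text), skiprows=4)

# Option 2: say directly that the header is at row index 4 (the fifth line);
# the rows above it are discarded
headed = pd.read_csv(io.StringIO(sheet_text), header=4)

print(skipped.columns.tolist())
print(skipped.equals(headed))
```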


### Specifying column names

We can also specify things like column names and data types rather than leave it to `read_excel()` to figure out.

In the code below we create a list of column names that we can then use when importing the data, with the `names=` argument.

In [1]:
#create list to store our new column headers in
#it's already got the first two, but we can easily create the rest and add them on
newcolumnheaders = ['offence','sentencetype']

#loop through a range of numbers from 2011 to 2021
for i in range(2011,2022):
  #add a string version of that number to the list
  newcolumnheaders.append(str(i))

newcolumnheaders

['offence',
 'sentencetype',
 '2011',
 '2012',
 '2013',
 '2014',
 '2015',
 '2016',
 '2017',
 '2018',
 '2019',
 '2020',
 '2021']

Here's an alternative way of doing it: creating two lists and then joining them together.

In [2]:
#create a list of years as strings 
yrstrings = [str(i) for i in range(2011,2022)]
#create a list with the first two column names
newcols =['offence','sentencetype']
#join the two lists and store the result back in newcols
newcols= newcols+yrstrings
print(newcols)

['offence', 'sentencetype', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']


And here's how to use that list when importing the data, with the `names=` argument.

In [None]:

#import the data specifying the header row is row 5 and the names of columns
overview21mardf = pd.read_excel(overview21mar, 
                                engine="odf", 
                                sheet_name="Q5_1", 
                                header=4,
                                names=newcols) #use that list to specify the column names
#show first few rows
overview21mardf.head()
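To show what `names=` does alongside `header=`, here is a sketch on the same kind of made-up CSV (an illustrative stand-in for the sheet, read with `pd.read_csv()`, which shares these arguments). The row picked out by `header=4` is still used to find where the data starts, but its headings are replaced by the names we supply.

```python
import io

import pandas as pd

# Made-up text laid out like the Q5_1 sheet (illustrative figures only)
sheet_text = """Table Q5_1 (illustrative copy),,,
Number of offenders by sentence type,,,
,,12 months ending March,
,,,Number of offenders
Offence type,Type of sentence,2020,2021
Indictable only offences,Total sentenced,12718,10573
,,,
,Immediate custody,9023,6945
"""

# Build the new names the same way as above: two labels plus the years as strings
newcols = ["offence", "sentencetype"] + [str(year) for year in range(2020, 2022)]

# header=4 marks the old heading row; names= replaces those headings
df = pd.read_csv(io.StringIO(sheet_text), header=4, names=newcols)
print(df)
```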

### Specifying data types

Another problem we can solve at import is the data types.

Here are the data types as 'guessed' when importing.

In [None]:
#store data types of each column
dftypes = overview21mardf.dtypes
#show
dftypes

offence          object
sentencetype     object
2011             object
2012            float64
2013            float64
2014            float64
2015            float64
2016            float64
2017            float64
2018            float64
2019            float64
2020            float64
2021             object
dtype: object

We can see that the last column (2021) is an 'object' when it should be a number (float). The same applies to the 2011 column.

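What is an 'object'? It's pandas' catch-all data type: the column holds general Python objects rather than a single numeric type, which usually means it contains a mix of values (whole numbers, empty cells and text, for example). Each value keeps its own type, so let's look at the first two cells of that last column.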

(We use the `.iloc[]` indexer to select a cell by its row and column positions.)

In [None]:
#show the value of the 'cell' in row 0, column 12
print(overview21mardf.iloc[0,12])
#show the type
print(type(overview21mardf.iloc[0,12]))
#show the value of row 1, column 12
print(overview21mardf.iloc[1,12])
#show the type
print(type(overview21mardf.iloc[1,12]))

10573
<class 'int'>
nan
<class 'float'>
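A minimal sketch of how a column ends up as `object`, using made-up values rather than the real sheet: numbers plus empty cells can be stored as floats, but a single piece of text pushes the whole column to `object`.

```python
import pandas as pd

# Numbers plus an empty cell: pandas can store these as floats (NaN is a float)
numbers_only = pd.Series([10573, float("nan"), 6945])

# Add one piece of text and pandas falls back to the generic 'object' type
with_text = pd.Series([10573, float("nan"), 6945, "some text"])

print(numbers_only.dtype)  # float64
print(with_text.dtype)     # object

# In an object column each value keeps its own Python type
print([type(value).__name__ for value in with_text])
```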


We can use the `converters=` argument to ask for the last column to be converted to a float, but we will get an error because not all the cells contain values that can be converted.

In [None]:
#create a list of years as strings 
yrstrings = [str(i) for i in range(2011,2022)]
#create a list with the first two column names
newcols =['offence','sentencetype']
#join the two lists and store the result back in newcols
newcols= newcols+yrstrings
print(newcols)
#import the data specifying the header row is row 5 and the names of columns
overview21mardf = pd.read_excel(overview21mar, 
                                engine="odf", 
                                sheet_name="Q5_1", 
                                header=4,
                                names=newcols,
                                converters={'2021':float}) #specify the data types
#show first few rows
overview21mardf.head()

['offence', 'sentencetype', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']


ValueError: ignored
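One way around that error is to give `converters=` a function that returns `NaN` instead of raising. The helper `to_float_or_nan` below is hypothetical (our own name, not part of pandas), and the sample values, including the text `"n/a"`, are made up; the actual offending string in the sheet may be different. The sketch also shows `pd.to_numeric(..., errors="coerce")`, pandas' built-in way to do the same thing to a column that is already loaded.

```python
import math

import pandas as pd


def to_float_or_nan(value):
    """Hypothetical helper: convert a cell to float, or return NaN if that fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # text such as a note or footnote marker can't become a number
        return math.nan


# Made-up values like those in the 2021 column: ints, an empty cell, and some text
cells = [10573, math.nan, 6945, "n/a"]
converted = [to_float_or_nan(cell) for cell in cells]
print(converted)

# At import time it would be used like this (not run here, it needs the .ods file):
# pd.read_excel(overview21mar, engine="odf", sheet_name="Q5_1", header=4,
#               names=newcols, converters={"2021": to_float_or_nan})

# Built-in alternative for a column that is already loaded: unconvertible values become NaN
coerced = pd.to_numeric(pd.Series(cells, dtype=object), errors="coerce")
print(coerced)
```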

## Using the `collections` library to count frequency of values in a list

We can find out which values are causing the problem by looping through the column, checking the type of each value, and building a list of those types.

Then `Counter()` from the `collections` module can count how often each type appears in that list.

In [None]:
#create empty list
listofdtypes = []

#grab the column's values
colvalues = overview21mardf['2021']

#loop through that column
for i in colvalues:
  #add type of value to our list
  listofdtypes.append(type(i))



In [None]:
#create a list of the types of each value in that column
typelist = [type(i) for i in overview21mardf.iloc[:,12]]
#import a library to use the Counter function
import collections
#count the frequency of items in the list
print(collections.Counter(typelist))

Counter({<class 'int'>: 109, <class 'float'>: 73, <class 'str'>: 1})


There's one string, which is the culprit. But we can deal with that later.
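To track the culprit down, here is a sketch on a small made-up column: it repeats the `Counter` count, shows a pandas equivalent (`.map(type).value_counts()`), and then keeps only the rows that hold strings, which tells us where they are.

```python
import collections

import pandas as pd

# A made-up object column mixing ints, a missing value (a float NaN) and a string
col = pd.Series([10573, 6945, float("nan"), "2021"], dtype=object)

# Count how many values there are of each Python type
type_counts = collections.Counter(type(value) for value in col)
print(type_counts)

# The same count done in pandas: map each value to its type, then count
by_pandas = col.map(type).value_counts()
print(by_pandas)

# Keep only the rows holding strings, to see where the culprit(s) are
strings_only = col[col.map(lambda value: isinstance(value, str))]
print(strings_only)
```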

## Get rid of empty rows using `dropna`

Some rows are entirely empty. How can we get rid of those? Handily, `pandas` has a method that does that: `.dropna()`.

However, if used without any arguments, this drops every row that has a `NaN` value (an empty cell) *anywhere* in it (note that rows 1 to 12 all disappear from the index):

In [None]:
#import the data again
overview21mardf = pd.read_excel(overview21mar, 
                                engine="odf", 
                                sheet_name="Q5_1", 
                                skiprows=4)

In [None]:
overview21mardf.dropna()

Unnamed: 0,Offence type,Type of sentence,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Indictable only offences,Total sentenced,19976,19330.0,17144.0,15681.0,14597.0,14050.0,13731.0,13263.0,12006.0,12718.0,10573
13,Triable either way offences,Total sentenced,331344,318666.0,278796.0,277009.0,263120.0,244535.0,227700.0,202037.0,186242.0,184452.0,144916
26,Summary non-motoring,Total sentenced,495156,493826.0,456005.0,438850.0,463750.0,483856.0,470528.0,474189.0,460116.0,429662.0,188861
39,Summary motoring,Total sentenced,514207,462995.0,452644.0,450133.0,483453.0,510480.0,531615.0,510461.0,533451.0,531365.0,421399
52,Offence not known(7),Total sentenced,0,0.0,0.0,0.0,0.0,0.0,0.0,62.0,266.0,741.0,499
65,All offences,Total sentenced,1360683,1294817.0,1204589.0,1181673.0,1224920.0,1252921.0,1243574.0,1200012.0,1192081.0,1158938.0,766248
84,Offence type,Type of sentence,2011,2012.0,2013.0,2014.0,2015.0,2016.0,2017.0,2018.0,2019.0,2020.0,2021
85,Indictable only offences,Total sentenced,19976,19330.0,17143.0,15680.0,14594.0,14043.0,13728.0,13260.0,12006.0,12717.0,10571
98,Triable either way offences,Total sentenced,330513,317867.0,277989.0,276243.0,262318.0,243600.0,226862.0,201117.0,185318.0,183651.0,144565
111,Summary non-motoring,Total sentenced,493832,492718.0,455139.0,437869.0,462715.0,482200.0,468705.0,471679.0,457825.0,427706.0,188496


If we don't want that to happen, we can use the `subset=` argument to specify *which* columns to check for empty cells. Adding `inplace=True` changes the dataframe itself instead of returning a changed copy.

In [None]:
overview21mardf.dropna(subset=["2011"], inplace=True)
overview21mardf

Unnamed: 0,offence,sentencetype,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Indictable only offences,Total sentenced,19976,19330.000,17144.0000,15681.0000,14597.0000,14050.0000,13731.0000,13263.0000,12006.0000,12718.0000,10573
2,,Immediate custody,13868,13664.000,12377.0000,11450.0000,10752.0000,10393.0000,10138.0000,9762.0000,8796.0000,9023.0000,6945
3,,Suspended sentence,1699,1509.000,1517.0000,1579.0000,1683.0000,1667.0000,1539.0000,1470.0000,1155.0000,1276.0000,1414
4,,Community sentence,3724,3589.000,2901.0000,2254.0000,1682.0000,1472.0000,1392.0000,1382.0000,1402.0000,1804.0000,1770
5,,Fine,45,21.000,23.0000,51.0000,29.0000,47.0000,95.0000,75.0000,72.0000,79.0000,84
...,...,...,...,...,...,...,...,...,...,...,...,...,...
156,,Absolute discharge,8572,8013.000,7287.0000,6768.0000,5530.0000,8928.0000,4457.0000,5332.0000,4026.0000,4048.0000,1749
157,,Conditional discharge,90457,85148.000,78064.0000,75049.0000,71312.0000,63623.0000,53718.0000,45476.0000,39848.0000,35803.0000,24198
158,,Compensation,7790,6550.000,7940.0000,8487.0000,6180.0000,4886.0000,4870.0000,4955.0000,4850.0000,4506.0000,3539
159,,Otherwise dealt with(5),24655,21997.000,16609.0000,18626.0000,19747.0000,11804.0000,13228.0000,12475.0000,11570.0000,11970.0000,8507
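The three behaviours side by side, in a sketch on a cut-down made-up dataframe. Besides `subset=`, the `how="all"` argument is another option worth knowing: it drops only the rows that are completely empty, which matches the problem described above exactly.

```python
import pandas as pd

# A cut-down version of the sheet: the offence is only filled in on the first row
# of its block, and row 1 is a completely empty spacer row
df = pd.DataFrame({
    "offence": ["Indictable only offences", None, None, None],
    "sentencetype": ["Total sentenced", None, "Immediate custody", "Fine"],
    "2011": [19976, None, 13868, 45],
})

# No arguments: drop any row with a missing value anywhere (only row 0 survives)
print(df.dropna())

# how="all": drop only rows where every value is missing (just the spacer row)
print(df.dropna(how="all"))

# subset=: only look for missing values in the listed column(s)
print(df.dropna(subset=["2011"]))
```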


## Dealing with `KeyError` due to numeric column headers

If we hadn't converted those column names from numbers to strings we would have hit an error here. The year headings in the spreadsheet are read in as integers (`2011`), and pandas treats the integer `2011` and the string `"2011"` as different labels, so `subset=["2011"]` would raise a `KeyError`. Numbers (integers) are best avoided as column headings anyway, because pandas also uses integers for positions, which makes code such as `df[2011]` easy to misread.
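A minimal sketch of that `KeyError`, using a made-up two-column dataframe whose year heading is the integer 2011, as the .ods import produces:

```python
import pandas as pd

# Year headings read from a spreadsheet can arrive as integers, not strings
df = pd.DataFrame({"Offence type": ["Indictable only offences", None],
                   2011: [19976, None]})

try:
    df.dropna(subset=["2011"])  # the string "2011" is not the integer 2011
except KeyError as err:
    print("KeyError:", err)

# Either ask for the integer label...
print(df.dropna(subset=[2011]))

# ...or convert every column label to a string first
df.columns = df.columns.map(str)
print(df.dropna(subset=["2011"]))
```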


Another way of fixing the column names would have been to loop through the existing column names and convert them to strings, like so:

In [None]:
#create an empty list
newcolnames = []
#loop through the columns
for i in overview21mardf.columns:
  #convert to string
  strversion = str(i)
  #add to previously empty list
  newcolnames.append(strversion)
print(newcolnames)
#replace the old column names with the new ones
overview21mardf.columns = newcolnames

['Offence type', 'Type of sentence ', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']


### Using the `map` function to apply a function to all items in a list

Another approach is to use the `.map()` method of the column index, a pandas counterpart to Python's built-in [`map()` function](https://realpython.com/python-map-function/), which saves us writing the loop ourselves.

It applies the specified function (`str` in this case) to each column name and returns a new index:

In [None]:
overview21mardf.columns = overview21mardf.columns.map(str)
overview21mardf.columns

Index(['Offence type', 'Type of sentence ', '2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2018', '2019', '2020', '2021'],
      dtype='object')
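For comparison, a small sketch of Python's built-in `map()` next to pandas' `Index.map()` on a made-up set of column labels. The built-in returns a lazy iterator that we have to collect into a list; the pandas method hands back a new `Index` that can be assigned straight to `.columns`.

```python
import pandas as pd

labels = pd.Index(["Offence type", 2011, 2012])

# Python's built-in map() returns an iterator; list() collects the results
as_list = list(map(str, labels))

# pandas' Index.map() applies the function and returns a new Index directly
as_index = labels.map(str)

print(as_list)   # ['Offence type', '2011', '2012']
print(as_index)
```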

## Fill down using `ffill`

The `ffill` (forward fill) method fills each empty cell with the last non-empty value above it when given `axis=0` (to fill across from the left instead, use `axis=1`).

Note that this would also have filled the empty row 1 with the values from row 0, so it's a good thing we removed that row first.

In [None]:
#fill empty cells down each column (axis=0 works down the rows)
overview21mardf = overview21mardf.ffill(axis=0)
#show the results
overview21mardf

Unnamed: 0,Offence type,Type of sentence,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Indictable only offences,Total sentenced,19976,19330.000,17144.0000,15681.0000,14597.0000,14050.0000,13731.0000,13263.0000,12006.0000,12718.0000,10573.0000
2,Indictable only offences,Immediate custody,13868,13664.000,12377.0000,11450.0000,10752.0000,10393.0000,10138.0000,9762.0000,8796.0000,9023.0000,6945.0000
3,Indictable only offences,Suspended sentence,1699,1509.000,1517.0000,1579.0000,1683.0000,1667.0000,1539.0000,1470.0000,1155.0000,1276.0000,1414.0000
4,Indictable only offences,Community sentence,3724,3589.000,2901.0000,2254.0000,1682.0000,1472.0000,1392.0000,1382.0000,1402.0000,1804.0000,1770.0000
5,Indictable only offences,Fine,45,21.000,23.0000,51.0000,29.0000,47.0000,95.0000,75.0000,72.0000,79.0000,84.0000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
156,All offences,Absolute discharge,8572,8013.000,7287.0000,6768.0000,5530.0000,8928.0000,4457.0000,5332.0000,4026.0000,4048.0000,1749.0000
157,All offences,Conditional discharge,90457,85148.000,78064.0000,75049.0000,71312.0000,63623.0000,53718.0000,45476.0000,39848.0000,35803.0000,24198.0000
158,All offences,Compensation,7790,6550.000,7940.0000,8487.0000,6180.0000,4886.0000,4870.0000,4955.0000,4850.0000,4506.0000,3539.0000
159,All offences,Otherwise dealt with(5),24655,21997.000,16609.0000,18626.0000,19747.0000,11804.0000,13228.0000,12475.0000,11570.0000,11970.0000,8507.0000
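A minimal sketch of forward filling on a made-up dataframe: the offence type only appears on the first row of each block, as in the sheet. The second part shows the risk mentioned above: an empty spacer row that hasn't been removed gets filled too, turning it into a copy of the row above.

```python
import pandas as pd

# The offence type only appears on the first row of each block
df = pd.DataFrame({
    "Offence type": ["Indictable only offences", None, None, "Summary motoring", None],
    "Type of sentence ": ["Total sentenced", "Immediate custody", "Fine",
                          "Total sentenced", "Fine"],
})

# Forward fill: each gap takes the last non-missing value above it in the same column
filled = df.ffill(axis=0)
print(filled)

# An all-empty spacer row would be filled as well, becoming a copy of the row above
with_spacer = pd.DataFrame({"a": ["x", None], "b": [1.0, None]})
print(with_spacer.ffill())
```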


## Reshape wide to long

We have a column for each year but that's not ideal for data analysis. What we really want is a 'year' column.

To do this we need to [reshape the data from 'wide to long'](https://towardsdatascience.com/wide-to-long-data-how-and-when-to-use-pandas-melt-stack-and-wide-to-long-7c1e0f462a98). 

The `melt()` function is great for this. First, we need to know what the columns are called...

In [None]:
#remind ourselves what the columns are
#we can also copy directly from this
overview21mardf.columns

Index(['Offence type', 'Type of sentence ', '2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2018', '2019', '2020', '2021'],
      dtype='object')

We use those names in the arguments of the `melt()` function:

* `id_vars=` specifies which columns we want to keep as they are. This needs to be a list.
* `value_vars=` specifies which columns we want to reshape, turning their headings into values ('2011' becomes a value in a new column for the 'year').
* `var_name=` specifies what to call this new column ('year').
* `value_name=` specifies what to call the column that will hold all the values from the columns we're getting rid of. In this case it's 'total', indicating the total cases in each year and category.
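Here is how those four arguments fit together on a tiny made-up wide table (two rows and two year columns, with figures copied from the output earlier). Each year column is stacked in turn, so all the 2020 rows come first, then the 2021 rows.

```python
import pandas as pd

wide = pd.DataFrame({
    "Offence type": ["Indictable only offences", "Indictable only offences"],
    "Type of sentence ": ["Total sentenced", "Immediate custody"],
    "2020": [12718, 9023],
    "2021": [10573, 6945],
})

long = wide.melt(
    id_vars=["Offence type", "Type of sentence "],  # columns kept as they are
    value_vars=["2020", "2021"],                    # columns turned into rows
    var_name="year",                                # new column holding the old headings
    value_name="total",                             # new column holding the numbers
)
print(long)
```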

In [None]:
#use melt to convert the year column headings into a single column
overview21mardf.melt(id_vars=['Offence type', 'Type of sentence '], value_vars=['2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2018', '2019', '2020', '2021'], var_name='year', value_name='total')

Unnamed: 0,Offence type,Type of sentence,year,total
0,Indictable only offences,Total sentenced,2011,19976
1,Indictable only offences,Immediate custody,2011,13868
2,Indictable only offences,Suspended sentence,2011,1699
3,Indictable only offences,Community sentence,2011,3724
4,Indictable only offences,Fine,2011,45
...,...,...,...,...
1337,All offences,Absolute discharge,2021,1749
1338,All offences,Conditional discharge,2021,24198
1339,All offences,Compensation,2021,3539
1340,All offences,Otherwise dealt with(5),2021,8507


In [None]:
#store those results permanently
overview21mardf = overview21mardf.melt(id_vars=['Offence type', 'Type of sentence '], value_vars=['2011', '2012', '2013', '2014',
       '2015', '2016', '2017', '2018', '2019', '2020', '2021'], var_name='year', value_name='total')

In [None]:
overview21mardf.to_csv("cleaneddata.csv")

## Creating a pivot table and formatting the numbers properly

We can use that data to create a pivot table. Note that the results are presented in scientific notation.

In [None]:
pivot12 = overview21mardf.pivot_table(index="Type of sentence ", 
                        values="total",
                        columns="year", 
                        aggfunc="sum")
pivot12

Unnamed: 0_level_0,year,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
Type of sentence,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Absolute discharge,total,34374.0,32108.0,29240.0,27112.0,22176.0,35780.0,17850.0,21374.0,16128.0,16226.0,7012.0
Average custodial sentence (months)(6),total,,164.9658,173.0099,182.6475,192.405,197.4078,201.2571,330.9403,310.1586,316.2768,296.1874
Community sentence,total,755320.0,696540.0,565408.0,496040.0,444416.0,452004.0,409228.0,368660.0,367926.0,333000.0,237852.0
Compensation,total,31170.0,26202.0,31774.0,33950.0,24724.0,19554.0,19488.0,19834.0,19410.0,18034.0,14160.0
Conditional discharge,total,361958.0,340668.0,312344.0,300270.0,285340.0,254544.0,214940.0,181926.0,159426.0,143238.0,96808.0
Fine,total,3538968.0,3369884.0,3243524.0,3207376.0,3451232.0,3593686.0,3658198.0,3596642.0,3673930.0,3598430.0,2280840.0
Immediate custody,total,411348.0,421808.0,380588.0,372448.0,363560.0,363008.0,357524.0,336724.0,307736.0,302236.0,237744.0
Otherwise dealt with(5),total,98650.0,88034.0,66478.0,74550.0,79082.0,47302.0,53018.0,50016.0,46388.0,48006.0,34076.0
Suspended sentence,total,195272.0,190572.0,176216.0,201968.0,215588.0,228952.0,226568.0,205814.0,159814.0,159138.0,148102.0
Total sentenced,total,5427060.0,5165816.0,4805572.0,4713714.0,4886118.0,4994830.0,4956814.0,4780990.0,4750758.0,4618308.0,3056594.0


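To fix that we can add `.astype(int)` to the end of the `pivot_table()` call, which converts the numbers in the results to integers. Be aware that `astype(int)` simply drops any decimals (164.9658 becomes 164 below), and it raises an error if the table contains missing values (`NaN`), like the empty 2011 cell for 'Average custodial sentence' in the table above. In the examples below we filter the data to a single year first.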

In [None]:
pivot12 = overview21mardf[overview21mardf.year == '2012'].pivot_table(index="Type of sentence ", 
                        values="total",
                        columns="year", 
                        aggfunc="sum").astype(int)
pivot12

year,2012
Type of sentence,Unnamed: 1_level_1
Absolute discharge,32108
Average custodial sentence (months)(6),164
Community sentence,696540
Compensation,26202
Conditional discharge,340668
Fine,3369884
Immediate custody,421808
Otherwise dealt with(5),88034
Suspended sentence,190572
Total sentenced,5165816
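A sketch of those two pitfalls and some alternatives, on a small made-up table standing in for the pivot table: `astype(int)` fails on `NaN` and truncates decimals; rounding first gives the nearest whole number; and the nullable `"Int64"` dtype can hold missing values (shown as `<NA>`) once the numbers have been rounded.

```python
import pandas as pd

# A small stand-in for the pivot table: floats with one missing value
table = pd.DataFrame(
    {"2011": [34374.0, float("nan")], "2012": [32108.0, 164.9658]},
    index=["Absolute discharge", "Average custodial sentence (months)(6)"],
)

# astype(int) cannot represent NaN, so it raises (a ValueError subclass)
try:
    table.astype(int)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("astype(int) failed:", err)

# On a column with no gaps, astype(int) truncates: 164.9658 -> 164
truncated = table["2012"].astype(int)

# Rounding first gives the nearest whole number instead: 164.9658 -> 165
rounded = table["2012"].round().astype(int)

# The nullable "Int64" dtype keeps the missing value as <NA>
nullable = table.round().astype("Int64")
print(nullable)
```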


In [None]:
pivot12 = overview21mardf[overview21mardf.year == '2011'].pivot_table(index="Type of sentence ", 
                        values="total",
                        columns="year", 
                        aggfunc="sum").astype(int)
pivot12

Unnamed: 0_level_0,year,2011
Type of sentence,Unnamed: 1_level_1,Unnamed: 2_level_1
Absolute discharge,total,34374
Community sentence,total,755320
Compensation,total,31170
Conditional discharge,total,361958
Fine,total,3538968
Immediate custody,total,411348
Otherwise dealt with(5),total,98650
Suspended sentence,total,195272
Total sentenced,total,5427060
Type of sentence,total,2011

Note the last row: it is not a real sentence type but comes from a repeated heading row further down the sheet (row 84 in the `dropna()` output earlier), a sign that the sheet holds a second table we haven't removed yet.


## Using SQL-style functions `where`, `isin` etc.

We can also use SQL-style functions to filter a dataframe [such as `isin`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html).

In [None]:
#having called the column 'Type of sentence' we then use the .isin() function to ask if the specified string(s) are in it
overview21mardf['Type of sentence'].isin(['Total sentenced'])

KeyError: ignored

What happened there? Check the column name. 

In [None]:
overview21mardf.columns

Index([     'Offence type', 'Type of sentence ',                2011,
                      2012,                2013,                2014,
                      2015,                2016,                2017,
                      2018,                2019,                2020,
                      2021],
      dtype='object')

It actually has a trailing space: `'Type of sentence '`.
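If you'd rather not carry that trailing space around, one option is to strip whitespace from every column name. This is a sketch on a made-up one-row dataframe; it converts every label with `map(str)` first so that integer year headings don't trip up the `.str.strip()` text method.

```python
import pandas as pd

df = pd.DataFrame({"Offence type": ["All offences"],
                   "Type of sentence ": ["Total sentenced"],
                   2011: [1360683]})

# repr() makes the hidden trailing space visible
print([repr(name) for name in df.columns])

# Turn every label into a string, then strip leading/trailing spaces
df.columns = df.columns.map(str).str.strip()

# The column can now be reached without the trailing space
selected = df["Type of sentence"].isin(["Total sentenced"])
print(selected)
```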

In [None]:
#having called the column 'Type of sentence ' we then use the .isin() function to ask if the specified string(s) are in it
overview21mardf['Type of sentence '].isin(['Total sentenced'])

0       True
1      False
2      False
3      False
4      False
       ...  
178    False
179    False
180    False
181    False
182    False
Name: Type of sentence , Length: 183, dtype: bool

This can be used as a filter by adding it in square brackets after the name of the dataframe, like this...

In [None]:
#filter the dataframe on the True/False list created in square brackets
overview21mardf[overview21mardf['Type of sentence '].isin(['Total sentenced'])]

Unnamed: 0,Offence type,Type of sentence,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
0,Indictable only offences,Total sentenced,19976,19330.0,17144.0,15681.0,14597.0,14050.0,13731.0,13263.0,12006.0,12718.0,10573
13,Triable either way offences,Total sentenced,331344,318666.0,278796.0,277009.0,263120.0,244535.0,227700.0,202037.0,186242.0,184452.0,144916
26,Summary non-motoring,Total sentenced,495156,493826.0,456005.0,438850.0,463750.0,483856.0,470528.0,474189.0,460116.0,429662.0,188861
39,Summary motoring,Total sentenced,514207,462995.0,452644.0,450133.0,483453.0,510480.0,531615.0,510461.0,533451.0,531365.0,421399
52,Offence not known(7),Total sentenced,0,0.0,0.0,0.0,0.0,0.0,0.0,62.0,266.0,741.0,499
65,All offences,Total sentenced,1360683,1294817.0,1204589.0,1181673.0,1224920.0,1252921.0,1243574.0,1200012.0,1192081.0,1158938.0,766248
85,Indictable only offences,Total sentenced,19976,19330.0,17143.0,15680.0,14594.0,14043.0,13728.0,13260.0,12006.0,12717.0,10571
98,Triable either way offences,Total sentenced,330513,317867.0,277989.0,276243.0,262318.0,243600.0,226862.0,201117.0,185318.0,183651.0,144565
111,Summary non-motoring,Total sentenced,493832,492718.0,455139.0,437869.0,462715.0,482200.0,468705.0,471679.0,457825.0,427706.0,188496
124,Summary motoring,Total sentenced,508526,458176.0,447926.0,445392.0,478512.0,504651.0,525538.0,504365.0,527886.0,525416.0,417926


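The [`where` function](https://www.geeksforgeeks.org/python-pandas-dataframe-where/) provides similar functionality, with one difference: instead of dropping the rows that don't match, it keeps the original shape and replaces their values with `NaN`.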
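A small sketch of that difference on made-up data: boolean filtering with `isin()` drops the non-matching rows, while `Series.where()` keeps every row and blanks out the values that don't match.

```python
import pandas as pd

df = pd.DataFrame({
    "Type of sentence ": ["Total sentenced", "Immediate custody", "Total sentenced"],
    "2021": [10573, 6945, 144916],
})

# True where the sentence type is one of the listed strings
mask = df["Type of sentence "].isin(["Total sentenced"])

# Boolean filtering: rows where the mask is False are dropped
filtered = df[mask]
print(filtered)

# where(): every row is kept, but values in non-matching rows become NaN
masked = df["2021"].where(mask)
print(masked)
```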