# Care Quality Commission (CQC) Data

The CQC regulates and inspects health and social care services in England.

The CQC make available a range of open data products:

- *Locations regulated by CQC* (CQC Care Directory);
- *Care directory with filters* (contains details of registered managers and care home bed numbers); archived editions of this dataset are also available;
- *Care directory with ratings*; archived editions of this dataset are also available.


In [2]:
import pandas as pd
import sqlite3

In [3]:
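#To build the database from scratch, move any existing copy aside (renamed with today's date)
#so that sqlite3.connect() below creates a new, empty database file.
#Lines starting with ! run as shell commands; the errors below just mean there was no existing copy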
import time
!mv cqc.sqlite cqc_pre_{time.strftime("%Y-%m-%d")}.sqlite 
!rm cqc.sqlite

con = sqlite3.connect("cqc.sqlite")

mv: cqc.sqlite: No such file or directory
rm: cqc.sqlite: No such file or directory
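Outside a notebook, where `!` shell commands are not available, the same back-up step could be written in plain Python. This is a minimal sketch. `backup_database` is a hypothetical helper name. Unlike `mv`, it does nothing when no database exists yet, so there is no error.

```python
import os
import time


def backup_database(path="cqc.sqlite", stamp=None):
    """Move an existing database aside as <name>_pre_<date>.sqlite.

    Returns the new path, or None if there was nothing to move.
    """
    if not os.path.exists(path):
        return None  # nothing to back up; sqlite3.connect() will create a new file
    stamp = stamp or time.strftime("%Y-%m-%d")
    stem, ext = os.path.splitext(path)
    backup = f"{stem}_pre_{stamp}{ext}"
    os.replace(path, backup)  # overwrites any earlier backup made on the same day
    return backup
```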


## Locations regulated by CQC (CQC Care Directory)

The file name includes the release date, so the URL must be updated by hand for each new release.

http://www.cqc.org.uk/sites/default/files/02_August_2017_CQC_directory.zip
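The URL can be built from the release date instead of being edited by hand. This sketch assumes every release follows the `DD_Month_YYYY_CQC_directory.zip` pattern seen in the single example above, which is not guaranteed. `directory_url` is a hypothetical helper. Month names are spelled out explicitly so the result does not depend on the system locale.

```python
from datetime import date

# English month names, as used in the example file name above
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def directory_url(release):
    """Build the care directory URL for a release date.

    Assumes the naming pattern of the August 2017 release
    (DD_Month_YYYY_CQC_directory.zip); later releases may be named differently.
    """
    stub = f"{release.day:02d}_{MONTHS[release.month - 1]}_{release.year}_CQC_directory"
    return f"http://www.cqc.org.uk/sites/default/files/{stub}.zip"


# e.g. url = directory_url(date(2017, 8, 2))
```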

In [5]:
url='http://www.cqc.org.uk/sites/default/files/02_August_2017_CQC_directory.zip'

fn=url.split('/')[-1]
stub=fn.split('.')[0]

#Download the data from the CQC website
!wget -P downloads/ {url}
#Remove any previously unzipped copy of the data
!rm -r data/CQC
#Unzip the downloaded file into a subdirectory of the data folder, making sure the data folder exists first
!mkdir -p data
#The -o flag (overwrite without prompting) is not strictly needed, since we have just deleted the folder, but it does no harm
!unzip -o -d data/CQC downloads/{fn}
#Give the extracted CSV a name that does not depend on the release date
!mv data/CQC/{stub}.csv  data/CQC/locations.csv

--2017-08-07 16:39:53--  http://www.cqc.org.uk/sites/default/files/02_August_2017_CQC_directory.zip
Resolving www.cqc.org.uk... 192.229.233.39
Connecting to www.cqc.org.uk|192.229.233.39|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3743388 (3.6M) [application/zip]
Saving to: 'downloads/02_August_2017_CQC_directory.zip'


2017-08-07 16:39:54 (6.85 MB/s) - 'downloads/02_August_2017_CQC_directory.zip' saved [3743388/3743388]

Archive:  downloads/02_August_2017_CQC_directory.zip
  inflating: data/CQC/02_August_2017_CQC_directory.csv  
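The unzip-and-rename steps could also be done with Python's standard `zipfile` module. This sketch does not rebuild the CSV name from the URL. It takes the single CSV inside the archive instead, assuming (as in the output above) that there is exactly one. `extract_locations_csv` is a hypothetical helper name.

```python
import os
import shutil
import zipfile


def extract_locations_csv(zip_path, dest_dir="data/CQC", target="locations.csv"):
    """Extract the only CSV in a directory zip file and save it as dest_dir/target."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"expected one CSV in {zip_path}, found {csv_names}")
        out_path = os.path.join(dest_dir, target)
        # Stream the archive member straight to the target name, overwriting any old copy
        with zf.open(csv_names[0]) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path


# e.g. extract_locations_csv('downloads/02_August_2017_CQC_directory.zip')
```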


In [6]:
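#Skip the 4 lines of preamble above the column headers
locations=pd.read_csv('data/CQC/locations.csv',skiprows=4)
#Shorten the 'for office use only' ID column names
#(the location ID header in the source file lacks its closing bracket, so the key below matches it as-is)
locations.rename(columns={'CQC Location (for office use only':'CQC Location',
                          'CQC Provider ID (for office use only)':'CQC Provider ID'}, inplace=True)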

locations.head(3)

Unnamed: 0,Name,Also known as,Address,Postcode,Phone number,Service's website (if available),Service types,Date of latest check,Specialisms/services,Provider name,Local Authority,Region,Location URL,CQC Location,CQC Provider ID
0,Kingswood House Nursing Home,,"21-23 Chapel Park Road, St Leonards On Sea",TN37 6HR,1424716000.0,,Nursing homes,14/07/2016 - 00:00,Caring for adults under 65 yrs|Mental health c...,Innowood Limited,East Sussex,South East,https://www.cqc.org.uk/location/1-1000210669,1-1000210669,1-877912132
1,Human Support Group Limited - Sale,,"59 Cross Street, Sale",M33 7HF,1619429000.0,http://www.homecaresupport.co.uk,Community services - Nursing|Homecare agencies,28/07/2017 - 00:00,Sensory impairments|Caring for adults under 65...,The Human Support Group Limited,Trafford,North West,https://www.cqc.org.uk/location/1-1000312641,1-1000312641,1-101693918
2,Little Haven,,"133 Wellmeadow Road, London",SE6 1HP,2086974000.0,,Residential homes,30/11/2015 - 00:00,Mental health conditions|Accommodation for per...,Elizabeth Peters Care Homes Limited,Lewisham,London,https://www.cqc.org.uk/location/1-1000401911,1-1000401911,1-101666779
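The *Date of latest check* column is read as text in the form `14/07/2016 - 00:00`. Here is a sketch that converts it to proper datetimes. It assumes every entry follows that day-first pattern, as in the sample rows. Anything that does not match becomes `NaT` instead of raising an error. `parse_check_dates` is a hypothetical helper.

```python
import pandas as pd


def parse_check_dates(values):
    """Convert strings like '14/07/2016 - 00:00' to pandas Timestamps.

    Assumes day/month/year order, as in the sample rows; values that do not
    match the pattern become NaT.
    """
    return pd.to_datetime(pd.Series(values), format="%d/%m/%Y - %H:%M", errors="coerce")


# In the notebook this could be applied to the loaded data, e.g.
# locations["Date of latest check"] = parse_check_dates(locations["Date of latest check"])
```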


In [7]:
tmp=locations.set_index(['CQC Location'])
#If the table exists, replace it, under the assumption we are using a more recent version of the data
tmp.to_sql(con=con, name='locations',if_exists='replace')

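Column names that contain spaces and punctuation, such as `"CQC Location"`, have to be double-quoted in every SQL query. This notebook does not take the following approach, but one option is to normalise the names before calling `to_sql`. The sketch uses a hypothetical `snake_case` helper. Different headers could in principle collapse to the same name, so check that the renamed columns are still unique.

```python
import re


def snake_case(name):
    """Turn a column header such as 'Care home?' into 'care_home'.

    Every run of characters other than letters and digits becomes a single
    underscore; leading and trailing underscores are dropped.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()


# Hypothetical usage before writing a table (queries would then use the new
# names, e.g. WHERE cqc_location = ?):
# tmp = locations.rename(columns=snake_case).set_index("cqc_location")
```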


In [8]:
#We can now run a SQL query over the data; the location ID is passed as a query parameter
#rather than pasted into the SQL string
orgcode='1-1000210669'
pd.read_sql_query('SELECT * FROM locations WHERE "CQC Location"=?', con, params=(orgcode,))

Unnamed: 0,CQC Location,Name,Also known as,Address,Postcode,Phone number,Service's website (if available),Service types,Date of latest check,Specialisms/services,Provider name,Local Authority,Region,Location URL,CQC Provider ID
0,1-1000210669,Kingswood House Nursing Home,,"21-23 Chapel Park Road, St Leonards On Sea",TN37 6HR,1424716000.0,,Nursing homes,14/07/2016 - 00:00,Caring for adults under 65 yrs|Mental health c...,Innowood Limited,East Sussex,South East,https://www.cqc.org.uk/location/1-1000210669,1-877912132
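The same lookup pattern is used for each table in this notebook, so it could be wrapped in a small function. In this sketch, `lookup` and `quote_identifier` are hypothetical helpers. SQL parameters can only stand in for values, not for table or column names. The names are therefore quoted as SQLite identifiers (with any embedded double quotes doubled), while the value is passed as a `?` parameter.

```python
import pandas as pd


def quote_identifier(name):
    """Quote a table or column name for SQLite, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def lookup(con, table, key_column, value):
    """Return the rows of `table` whose `key_column` equals `value`."""
    sql = f"SELECT * FROM {quote_identifier(table)} WHERE {quote_identifier(key_column)} = ?"
    return pd.read_sql_query(sql, con, params=(value,))


# e.g. lookup(con, 'locations', 'CQC Location', '1-1000210669')
```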


## Care directory with filters

The URL always links to the latest file.

http://www.cqc.org.uk/sites/default/files/HSCA%20Active%20Locations.xlsx

In [9]:
url='http://www.cqc.org.uk/sites/default/files/HSCA%20Active%20Locations.xlsx'

!rm -r "data/CQC/HSCA Active Locations.xlsx"
#Download the data from the CQC website
!mkdir -p data
!wget -P data/CQC {url}

rm: data/CQC/HSCA Active Locations.xlsx: No such file or directory
--2017-08-07 16:40:01--  http://www.cqc.org.uk/sites/default/files/HSCA%20Active%20Locations.xlsx
Resolving www.cqc.org.uk... 192.229.233.39
Connecting to www.cqc.org.uk|192.229.233.39|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 25140076 (24M) [application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
Saving to: 'data/CQC/HSCA Active Locations.xlsx'


2017-08-07 16:40:04 (8.06 MB/s) - 'data/CQC/HSCA Active Locations.xlsx' saved [25140076/25140076]



In [11]:
xl=pd.ExcelFile('data/CQC/HSCA Active Locations.xlsx')
xl.sheet_names

['HSCA Active Locations']

In [12]:
#The column headers are on row 7 of the sheet, so skip the 6 rows above them
directory=pd.read_excel('data/CQC/HSCA Active Locations.xlsx',sheet_name='HSCA Active Locations',skiprows=6)
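The `skiprows=6` offset is specific to the layout of this release. If the number of preamble rows changes, one option is to read the top of the sheet without a header and search for the row that contains a known column name. This sketch assumes that the header row contains `Location ID`. `find_header_row` is a hypothetical helper. The row number it returns is then passed to `header=`, because that counts rows in the same way as the `header=None` preview.

```python
import pandas as pd


def find_header_row(preview, marker="Location ID"):
    """Return the 0-based row number of the first row containing `marker`.

    `preview` is expected to be the top of a sheet read with header=None,
    e.g. pd.read_excel(path, sheet_name=..., header=None, nrows=20).
    """
    for i, row in enumerate(preview.itertuples(index=False)):
        if marker in [str(v).strip() for v in row]:
            return i
    raise ValueError(f"no row containing {marker!r} in the first {len(preview)} rows")


# Possible use (not run here):
# preview = pd.read_excel(path, sheet_name='HSCA Active Locations', header=None, nrows=20)
# directory = pd.read_excel(path, sheet_name='HSCA Active Locations',
#                           header=find_header_row(preview))
```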

In [13]:
directory.head(2)

Unnamed: 0,Location ID,HSCA start date,Care home?,Location Name,Telephone Number,"Registered manager (note; where there is more than one manager at a location, only one is included here for ease of presentation. The full list is available if required).",Web Address,Care homes beds,Location Type/Sector,Location Primary Inspection Category,...,Service user band - Learning disabilities or autistic spectrum disorder,Service user band - Mental Health,Service user band - Older People,Service user band - People detained under the Mental Health Act,Service user band - People who misuse drugs and alcohol,Service user band - People with an eating disorder,Service user band - Physical Disability,Service user band - Sensory Impairment,Service user band - Whole Population,Service user band - Younger Adults
0,1-1000210669,2013-12-12,Y,Kingswood House Nursing Home,1424716303.0,"Turner, Patricia Anne",,22.0,Social Care Org,Residential social care,...,,Y,,,,,,,,Y
1,1-1000270393,2013-10-16,N,Red Kite Home Care,,"Hall, Pearl",,0.0,Social Care Org,Community based adult social care services,...,,Y,Y,,,,Y,,,Y


In [14]:
directory.columns.tolist()

['Location ID',
 'HSCA start date',
 'Care home?',
 'Location Name',
 'Telephone Number',
 'Registered manager (note; where there is more than one manager at a location, only one is included here for ease of presentation. The full list is available if required).',
 'Web Address',
 'Care homes beds',
 'Location Type/Sector',
 'Location Primary Inspection Category',
 'Region',
 'Local Authority',
 'Location CCG Code',
 'Location CCG',
 'Street Address',
 'Address Line 2',
 'City',
 'County',
 'Postal Code',
 'Brand ID',
 'Brand Name',
 'Provider Companies House Number',
 'Provider Charity Number',
 'Provider ID',
 'Provider Name',
 'Provider HSCA start date',
 'Provider Primary Inspection Category',
 'Provider - Telephone Number',
 'Provider - Web Address',
 'Provider - Street Address',
 'Provider - Address Line 2',
 'Provider - City',
 'Provider - County',
 'Provider - Postal Code',
 'Provider Nominated Individual Name',
 'Regulated activity - Accommodation for persons who require nursi

In [15]:
#Regulated activity
[i.split(' - ')[1] for i in directory.columns if i.startswith('Regulated activity')]

['Accommodation for persons who require nursing or personal care',
 'Accommodation for persons who require treatment for substance misuse',
 'Assessment or medical treatment for persons detained under the Mental Health Act 1983',
 'Diagnostic and screening procedures',
 'Family planning',
 'Management of supply of blood and blood derived products',
 'Maternity and midwifery services',
 'Nursing care',
 'Personal care',
 'Services in slimming clinics',
 'Surgical procedures',
 'Termination of pregnancies',
 'Transport services, triage and medical advice provided remotely',
 'Treatment of disease, disorder or injury']
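In the sample rows, the service user band columns act as flags: `Y` for yes and blank otherwise. Assuming the regulated activity columns are coded in the same way, this sketch counts the locations that carry out each activity. `count_flagged` is a hypothetical helper. Its labels keep everything after the first ` - ` in the column name.

```python
import pandas as pd


def count_flagged(df, prefix):
    """Count the rows marked 'Y' in each column whose name starts with `prefix`.

    Assumes flag columns hold 'Y' for yes and are blank (NaN) otherwise.
    """
    cols = [c for c in df.columns if c.startswith(prefix)]
    counts = (df[cols] == "Y").sum()
    counts.index = [c.split(" - ", 1)[1] for c in cols]
    return counts.sort_values(ascending=False)


# e.g. count_flagged(directory, 'Regulated activity')
```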

In [16]:
#Service types
[i.split(' - ')[1] for i in directory.columns if i.startswith('Service type')]

['Acute services with overnight beds',
 'Acute services without overnight beds / listed acute services with or without overnight beds',
 'Ambulance service',
 'Blood and Transplant service',
 'Care home service with nursing',
 'Care home service without nursing',
 'Community based services for people who misuse substances',
 'Community based services for people with a learning disability',
 'Community based services for people with mental health needs',
 'Community health care services',
 'Community health care services',
 'Community healthcare service',
 'Dental service',
 'Diagnostic and/or screening service',
 'Diagnostic and/or screening service',
 'Doctors consultation service',
 'Doctors treatment service',
 'Domiciliary care service',
 'Extra Care housing services',
 'Hospice services',
 'Hospice services at home',
 'Hospital services for people with mental health needs, learning disabilities and problems with substance misuse',
 'Hyperbaric Chamber',
 'Long term conditions serv
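Some names appear twice in the list above, for example *Community health care services* and *Diagnostic and/or screening service*. One plausible cause, not checked here, is that some column names contain a second ` - ` separator, in which case `split(' - ')[1]` keeps only the middle part. Passing `maxsplit=1` keeps everything after the first separator. The column names in this sketch are hypothetical.

```python
def short_name(column):
    """Return everything after the first ' - ' in a column name."""
    return column.split(" - ", 1)[1]


# Hypothetical column names, to show the difference between the two approaches
cols = ["Service type - Community health care services",
        "Service type - Community health care services - Nurses Agency only"]

print([c.split(" - ")[1] for c in cols])  # middle part only: the two names collide
print([short_name(c) for c in cols])      # full suffix: the names stay distinct
```

In the notebook, this would be `[short_name(i) for i in directory.columns if i.startswith('Service type')]`.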

In [17]:
#Service user bands
[i.split(' - ')[1] for i in directory.columns if i.startswith('Service user band')]

['Children 0-18 years',
 'Dementia',
 'Learning disabilities or autistic spectrum disorder',
 'Mental Health',
 'Older People',
 'People detained under the Mental Health Act',
 'People who misuse drugs and alcohol',
 'People with an eating disorder',
 'Physical Disability',
 'Sensory Impairment',
 'Whole Population',
 'Younger Adults']
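For SQL queries, a long table with one row per location and service user band can be easier to work with than twelve separate flag columns. For example, it can be grouped or joined directly. This sketch reshapes the flags with `DataFrame.melt`, again assuming `Y` for yes and blank otherwise. `flags_to_long` and the `service_user_bands` table name are hypothetical.

```python
import pandas as pd


def flags_to_long(df, prefix, id_col="Location ID"):
    """Reshape Y/blank flag columns into one row per (location, flagged value)."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    long_df = df.melt(id_vars=[id_col], value_vars=cols, var_name="column", value_name="flag")
    long_df = long_df[long_df["flag"] == "Y"].copy()
    # Keep the part of the column name after the first ' - ' as the value
    long_df["value"] = long_df["column"].str.split(" - ", n=1).str[1]
    return long_df[[id_col, "value"]].reset_index(drop=True)


# e.g. bands = flags_to_long(directory, 'Service user band')
# bands.to_sql(con=con, name='service_user_bands', if_exists='replace', index=False)
```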

In [18]:
tmp=directory.set_index(['Location ID'])
#If the table exists, replace it, under the assumption we are using a more recent version of the data
tmp.to_sql(con=con, name='directory',if_exists='replace')



In [19]:
#We can now run a SQL query over the data; the location ID is passed as a query parameter
#rather than pasted into the SQL string
orgcode='1-1000210669'
pd.read_sql_query('SELECT * FROM directory WHERE "Location ID"=?', con, params=(orgcode,))

Unnamed: 0,Location ID,HSCA start date,Care home?,Location Name,Telephone Number,"Registered manager (note; where there is more than one manager at a location, only one is included here for ease of presentation. The full list is available if required).",Web Address,Care homes beds,Location Type/Sector,Location Primary Inspection Category,...,Service user band - Learning disabilities or autistic spectrum disorder,Service user band - Mental Health,Service user band - Older People,Service user band - People detained under the Mental Health Act,Service user band - People who misuse drugs and alcohol,Service user band - People with an eating disorder,Service user band - Physical Disability,Service user band - Sensory Impairment,Service user band - Whole Population,Service user band - Younger Adults
0,1-1000210669,2013-12-12 00:00:00,Y,Kingswood House Nursing Home,1424716303,"Turner, Patricia Anne",,22.0,Social Care Org,Residential social care,...,,Y,,,,,,,,Y


In [20]:
#Find the brands with the most locations (only brands with more than 10 locations, largest first)
q='''
SELECT "Brand Name",COUNT(*) as cnt from directory
WHERE "Brand Name" != '-'
GROUP BY "Brand Name"
HAVING cnt > 10
ORDER BY cnt DESC
'''
pd.read_sql_query(q, con).head()

Unnamed: 0,Brand Name,cnt
0,BRAND IDH Dental,595
1,BRAND Acadia,311
2,BRAND Four Seasons Group,304
3,BRAND Oasis Dental Care,301
4,BRAND BUPA Group,296
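The same count can be made directly on the `directory` DataFrame, which is a useful cross-check on the SQL. In the query, rows with a missing brand are excluded by the `WHERE` clause, because a comparison with `NULL` is never true. `value_counts` also drops missing values, so the two approaches should agree. `popular_brands` is a hypothetical helper.

```python
import pandas as pd


def popular_brands(df, min_count=10):
    """Pandas equivalent of the query above: brands with more than `min_count` locations."""
    brands = df.loc[df["Brand Name"] != "-", "Brand Name"]
    counts = brands.value_counts()  # sorted by count, largest first; NaN dropped
    return counts[counts > min_count]


# e.g. popular_brands(directory).head()
```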


## Care directory with ratings

The URL always links to the latest file.

http://www.cqc.org.uk/sites/default/files/Latest%20ratings.xlsx

In [21]:
url='http://www.cqc.org.uk/sites/default/files/Latest%20ratings.xlsx'

!rm -r "data/CQC/Latest ratings.xlsx"
#Download the data from the CQC website
!mkdir -p data
!wget -P data/CQC {url}

rm: data/CQC/Latest ratings.xlsx: No such file or directory
--2017-08-07 16:41:59--  http://www.cqc.org.uk/sites/default/files/Latest%20ratings.xlsx
Resolving www.cqc.org.uk... 192.229.233.39
Connecting to www.cqc.org.uk|192.229.233.39|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19571036 (19M) [application/vnd.openxmlformats-officedocument.spreadsheetml.sheet]
Saving to: 'data/CQC/Latest ratings.xlsx'


2017-08-07 16:42:03 (8.10 MB/s) - 'data/CQC/Latest ratings.xlsx' saved [19571036/19571036]



In [22]:
xl=pd.ExcelFile('data/CQC/Latest ratings.xlsx')
xl.sheet_names

In [23]:
#Assumption: the ratings are on the first sheet, with the column headers in the first row;
#check the sheet names listed above and adjust sheet_name (and skiprows) if that is not the case
ratings=pd.read_excel('data/CQC/Latest ratings.xlsx',sheet_name=0)
ratings.head(2)


In [24]:
tmp=ratings.set_index(['Location ID'])
#If the table exists, replace it, under the assumption we are using a more recent version of the data
tmp.to_sql(con=con, name='ratings',if_exists='replace')



In [25]:
#We can now run a SQL query over the data
orgcode='1-1000210669'
pd.read_sql_query('SELECT * FROM ratings WHERE "Location ID"=?', con, params=(orgcode,))
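Because the directory and ratings tables are in the same database, they can be combined in a single query. This sketch assumes, as the query above does, that the ratings table has a `Location ID` column matching the one in the directory. A location may have several ratings rows, in which case several rows are returned. `ratings_for` is a hypothetical helper.

```python
import pandas as pd

# Join the ratings for one location onto its name from the directory table.
# Assumption: the ratings table has a "Location ID" column.
JOIN_SQL = '''
SELECT d."Location Name", r.*
FROM directory AS d
JOIN ratings AS r ON r."Location ID" = d."Location ID"
WHERE d."Location ID" = ?
'''


def ratings_for(con, location_id):
    return pd.read_sql_query(JOIN_SQL, con, params=(location_id,))


# e.g. ratings_for(con, '1-1000210669')
```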
