### 2RP Net - Data Engineer Test

1.	Use a local Git repository.

2.	Extract the prescribing data (english-prescribing-data-epd) for the last 3 months, not counting the most recent one. Source: https://opendata.nhsbsa.net/dataset/english-prescribing-data-epd. There are several ways to do this; use whichever you prefer. Check the documentation provided on the page and decide which approach best fits the architecture you want. The data description is available at https://opendata.nhsbsa.net/dataset/english-prescribing-data-epd/resource/af8dd944-fb82-42c1-a955-646c8866b939 :
a.	If you choose to collect the data as CSV, pay attention to the data volume.
b.	If you have trouble handling that amount of data, collect it through the API instead and limit the number of rows.

3.	Create a process to validate the extracted data.

4.	After collecting the data, split it into prescribers and prescriptions.

5.	Persist the data however you think best. Examples: files, MySQL, PostgreSQL, SQLite, MongoDB, Delta, cloud storage, etc.

7.	Build a routine that collects the latest month's data every month and adds only the data that is new. The routine must run automatically every month; choose whichever approach you prefer.

8.	Document as much as possible.
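Steps 2 and 7 both need to work out which months to collect, and step 7 must add only rows that are not stored yet. A minimal sketch of both ideas follows, using only the standard library. The helper names, the table layout, and the choice of key columns are hypothetical; in particular, it assumes the newest published month is the calendar month before today and that (YEAR_MONTH, PRACTICE_CODE, BNF_CODE, QUANTITY) identifies a row, which may not hold for the real dataset.

```python
import sqlite3
from datetime import date
from typing import Iterable, List, Tuple

# (YEAR_MONTH, PRACTICE_CODE, BNF_CODE, QUANTITY, ACTUAL_COST)
Row = Tuple[str, str, str, float, float]


def target_months(today: date, n: int = 3, skip_latest: int = 1) -> List[str]:
    """Return n YYYYMM labels, oldest first, skipping the newest `skip_latest` months.

    Assumption: the newest published month is the calendar month before `today`.
    """
    # Count months as year * 12 + zero-based month so stepping back is subtraction.
    latest = today.year * 12 + (today.month - 1) - 1
    newest_wanted = latest - skip_latest
    labels = []
    for offset in range(n - 1, -1, -1):
        idx = newest_wanted - offset
        labels.append(f"{idx // 12}{idx % 12 + 1:02d}")
    return labels


def append_new_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert rows, skipping keys that are already stored; return how many were new."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prescriptions (
               year_month TEXT, practice_code TEXT, bnf_code TEXT,
               quantity REAL, actual_cost REAL,
               PRIMARY KEY (year_month, practice_code, bnf_code, quantity))"""
    )

    def row_count() -> int:
        return conn.execute("SELECT COUNT(*) FROM prescriptions").fetchone()[0]

    before = row_count()
    # INSERT OR IGNORE drops any row whose primary key already exists.
    conn.executemany("INSERT OR IGNORE INTO prescriptions VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return row_count() - before


conn = sqlite3.connect(":memory:")
batch = [("202206", "Y03836", "20020200701", 10.0, 36.40326)]
print(target_months(date(2022, 9, 15)))                       # months for the initial load
print(target_months(date(2022, 9, 15), n=1, skip_latest=0))   # month for a monthly run
print(append_new_rows(conn, batch))                           # first run stores the row
print(append_new_rows(conn, batch))                           # re-running adds nothing
```

Because the insert is idempotent, a scheduler (cron, Airflow, or similar) can safely re-run the monthly job after a failure without duplicating rows.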

Below are some tips to help.

1. Coding
- Follow the coding best practices you consider necessary.
- Documentation is always welcome, although clean, clear code does not always need it.

2. README.md
- Explain in the README.md how to use your application.
- Make generous use of markdown in the explanations.
- Include a design/architecture diagram of the pipeline (you can use https://draw.io) and place the image(s) in the "/DOCS" directory.

3. Git/Gitflow
Use a local Git repository and follow the Gitflow methodology (https://medium.com/trainingcenter/utilizando-o-fluxo-git-flow-e63d5e0d5e04) for each new feature implemented.

6.	Write scripts that answer the requests below:

In [1]:
# Import packages
try:
    import pandas as pd
    import numpy as np
    import pandera as pa
    import os
    import sys
    import glob
    import time
    import urllib.request
    from urllib.request import urlretrieve
except ImportError as e:
    # Report which import failed
    print("Error: import failure: {}".format(e))

In [2]:
# Record the program start time
s_time_control = time.time()

#### Concatenate files and reduce size by converting to Parquet

In [3]:
os.chdir("/Jupyter/2RP")  

In [4]:
#extension = 'csv'
#all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

In [5]:
# Combine all files in the list
# df = pd.concat([pd.read_csv(f) for f in all_filenames ])

In [6]:
df = pd.read_csv('epd_202205.csv')
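Step 3 of the brief asks for a validation process. A minimal sketch using plain pandas checks is shown here; the function name and the specific rules (required columns, a six-digit `YEAR_MONTH`, non-negative numeric values) are illustrative assumptions rather than rules taken from the dataset documentation, and a schema library such as pandera could express the same checks.

```python
import pandas as pd

REQUIRED_COLUMNS = ["YEAR_MONTH", "PRACTICE_CODE", "BNF_CODE", "TOTAL_QUANTITY", "ACTUAL_COST"]


def validate_epd(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means the frame passed."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # Without the required columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]

    problems = []
    if not df["YEAR_MONTH"].astype(str).str.match(r"^\d{6}$").all():
        problems.append("YEAR_MONTH is not in YYYYMM format")
    if df["PRACTICE_CODE"].isna().any():
        problems.append("PRACTICE_CODE contains nulls")
    for col in ("TOTAL_QUANTITY", "ACTUAL_COST"):
        values = pd.to_numeric(df[col], errors="coerce")
        if values.isna().any() or (values < 0).any():
            problems.append(f"{col} has non-numeric or negative values")
    return problems


sample = pd.DataFrame({
    "YEAR_MONTH": [202206],
    "PRACTICE_CODE": ["Y03836"],
    "BNF_CODE": ["20020200701"],
    "TOTAL_QUANTITY": [10.0],
    "ACTUAL_COST": [36.40326],
})
print(validate_epd(sample))
```

Returning a list of problems instead of raising on the first one lets the pipeline log every issue found in a monthly file before deciding whether to load it.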

In [7]:
e_time_dask = time.time()
print("File creation time: ", round(e_time_dask-s_time_control)/60, "minutes")

File creation time:  1.45 minutes


In [8]:
df.head(3)

Unnamed: 0,YEAR_MONTH,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,ICB_NAME,ICB_CODE,PCO_NAME,PCO_CODE,PRACTICE_NAME,PRACTICE_CODE,ADDRESS_1,...,BNF_CODE,BNF_DESCRIPTION,BNF_CHAPTER_PLUS_CODE,QUANTITY,ITEMS,TOTAL_QUANTITY,ADQUSAGE,NIC,ACTUAL_COST,UNIDENTIFIED
0,202206,NORTH WEST,Y62,NHS CHESHIRE AND MERSEYSIDE INTEGRATED C,QYG,WIRRAL COMMUNITY HEALTH AND CARE NHS FOU,RY700,WIRRAL COMMUNITY NMP,Y03836,ST CATHERINE'S HC,...,20020200701,Viscopaste PB7 bandage 7.5cm x 6m,20: Dressings,10.0,1,10.0,0.0,38.9,36.40326,N
1,202206,NORTH WEST,Y62,NHS CHESHIRE AND MERSEYSIDE INTEGRATED C,QYG,WIRRAL COMMUNITY HEALTH AND CARE NHS FOU,RY700,WIRRAL WIC (APH)_WIC APH,N85645,ARROWE PARK HOSPITAL,...,20030100079,Mepore dressing 11cm x 15cm,20: Dressings,5.0,1,5.0,0.0,1.85,1.74307,N
2,202206,NORTH EAST AND YORKSHIRE,Y63,NHS SOUTH YORKSHIRE INTEGRATED CARE BOAR,QF7,NHS NOTTINGHAM AND NOTTINGHAMSHIRE ICB -,02Q00,BASSETLAW HEALTH PARTNERSHIP,Y03762,C/O RETFORD HOSPITAL,...,20030100167,Dressit sterile dressing pack with gloves,20: Dressings,10.0,6,60.0,0.0,41.4,38.7544,N


In [9]:
df.keys()

Index(['YEAR_MONTH', 'REGIONAL_OFFICE_NAME', 'REGIONAL_OFFICE_CODE',
       'ICB_NAME', 'ICB_CODE', 'PCO_NAME', 'PCO_CODE', 'PRACTICE_NAME',
       'PRACTICE_CODE', 'ADDRESS_1', 'ADDRESS_2', 'ADDRESS_3', 'ADDRESS_4',
       'POSTCODE', 'BNF_CHEMICAL_SUBSTANCE', 'CHEMICAL_SUBSTANCE_BNF_DESCR',
       'BNF_CODE', 'BNF_DESCRIPTION', 'BNF_CHAPTER_PLUS_CODE', 'QUANTITY',
       'ITEMS', 'TOTAL_QUANTITY', 'ADQUSAGE', 'NIC', 'ACTUAL_COST',
       'UNIDENTIFIED'],
      dtype='object')
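Step 4 of the brief asks for the data to be split into prescribers and prescriptions. Given the columns listed above, one possible split is sketched here: practice identity and address columns go to a prescribers table with one row per `PRACTICE_CODE`, and everything else stays in the prescriptions table with `PRACTICE_CODE` kept as the link. Which columns belong on which side is an assumption, and `split_epd` is a hypothetical helper name.

```python
import pandas as pd

# Assumed to describe the prescriber rather than the prescription.
PRESCRIBER_COLUMNS = [
    "PRACTICE_CODE", "PRACTICE_NAME",
    "ADDRESS_1", "ADDRESS_2", "ADDRESS_3", "ADDRESS_4", "POSTCODE",
]


def split_epd(df: pd.DataFrame):
    """Return (prescribers, prescriptions) built from one EPD frame."""
    prescriber_cols = [c for c in PRESCRIBER_COLUMNS if c in df.columns]
    # One row per practice; the first occurrence of each code wins.
    prescribers = (
        df[prescriber_cols]
        .drop_duplicates(subset="PRACTICE_CODE")
        .reset_index(drop=True)
    )
    # Keep PRACTICE_CODE on both sides so the tables can be joined again.
    keep = [c for c in df.columns if c not in prescriber_cols or c == "PRACTICE_CODE"]
    prescriptions = df[keep].reset_index(drop=True)
    return prescribers, prescriptions


demo = pd.DataFrame({
    "YEAR_MONTH": [202206, 202206, 202206],
    "PRACTICE_NAME": ["BASSETLAW HEALTH PARTNERSHIP", "BASSETLAW HEALTH PARTNERSHIP", "WIRRAL COMMUNITY NMP"],
    "PRACTICE_CODE": ["Y03762", "Y03762", "Y03836"],
    "ADDRESS_1": ["C/O RETFORD HOSPITAL", "C/O RETFORD HOSPITAL", "ST CATHERINE'S HC"],
    "BNF_CODE": ["20030100167", "20030100079", "20020200701"],
    "TOTAL_QUANTITY": [60.0, 5.0, 10.0],
})
prescribers, prescriptions = split_epd(demo)
print(prescribers)
print(prescriptions)
```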

In [10]:
df['REGIONAL_OFFICE_CODE'].unique()

array(['Y62', 'Y63', 'Y58', 'Y59', 'Y61', 'Y56', 'Y60', '-'], dtype=object)

In [11]:
df['REGIONAL_OFFICE_NAME'].unique()

array(['NORTH WEST', 'NORTH EAST AND YORKSHIRE', 'SOUTH WEST',
       'SOUTH EAST', 'EAST OF ENGLAND', 'LONDON', 'MIDLANDS',
       'UNIDENTIFIED'], dtype=object)

In [12]:
df[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE']].groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE']).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE
0,NORTH WEST,Y62
1,NORTH WEST,Y62
2,NORTH EAST AND YORKSHIRE,Y63
3,NORTH EAST AND YORKSHIRE,Y63
4,NORTH EAST AND YORKSHIRE,Y63
...,...,...
6550461,UNIDENTIFIED,-
6550462,UNIDENTIFIED,-
6550463,UNIDENTIFIED,-
6550464,UNIDENTIFIED,-


a.	Create a dataframe containing the top 10 prescribed chemical substances per region.

In [13]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('NORTH WEST')

In [14]:
df_Y62 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [15]:
df_Y62.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
384,NORTH WEST,Y62,Enteral nutrition,351905747.0
374,NORTH WEST,Y62,Emollients,47989129.0
839,NORTH WEST,Y62,Other emollient preparations,40477476.0
843,NORTH WEST,Y62,Other food for special diet preparations,34659858.0
31,NORTH WEST,Y62,Alginic acid compound preparations,31421966.0
889,NORTH WEST,Y62,Paracetamol,29887512.0
249,NORTH WEST,Y62,Co-codamol (Codeine phosphate/paracetamol),27837904.0
720,NORTH WEST,Y62,Metformin hydrochloride,22737750.0
86,NORTH WEST,Y62,Atorvastatin,20539876.0
615,NORTH WEST,Y62,Lactulose,19753595.0


In [16]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('NORTH EAST AND YORKSHIRE')

In [17]:
df_Y63 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [18]:
df_Y63.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
397,NORTH EAST AND YORKSHIRE,Y63,Enteral nutrition,313500457.0
386,NORTH EAST AND YORKSHIRE,Y63,Emollients,63289006.0
864,NORTH EAST AND YORKSHIRE,Y63,Other food for special diet preparations,46760120.0
907,NORTH EAST AND YORKSHIRE,Y63,Paracetamol,46449905.0
35,NORTH EAST AND YORKSHIRE,Y63,Alginic acid compound preparations,43973005.0
860,NORTH EAST AND YORKSHIRE,Y63,Other emollient preparations,35078907.0
733,NORTH EAST AND YORKSHIRE,Y63,Metformin hydrochloride,27259048.0
90,NORTH EAST AND YORKSHIRE,Y63,Atorvastatin,26474443.0
737,NORTH EAST AND YORKSHIRE,Y63,Methadone hydrochloride,23839482.0
257,NORTH EAST AND YORKSHIRE,Y63,Co-codamol (Codeine phosphate/paracetamol),22332058.0


In [19]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('SOUTH WEST')

In [20]:
df_Y58 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [21]:
df_Y58.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
366,SOUTH WEST,Y58,Enteral nutrition,117213879.0
355,SOUTH WEST,Y58,Emollients,40226383.0
854,SOUTH WEST,Y58,Paracetamol,25757715.0
815,SOUTH WEST,Y58,Other food for special diet preparations,17120163.5
789,SOUTH WEST,Y58,Omeprazole,15635594.0
811,SOUTH WEST,Y58,Other emollient preparations,15610380.0
31,SOUTH WEST,Y58,Alginic acid compound preparations,15505381.0
81,SOUTH WEST,Y58,Atorvastatin,15399647.0
694,SOUTH WEST,Y58,Metformin hydrochloride,15054532.0
238,SOUTH WEST,Y58,Co-codamol (Codeine phosphate/paracetamol),13046580.0


In [22]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('SOUTH EAST')

In [23]:
df_Y59 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [24]:
df_Y59.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
384,SOUTH EAST,Y59,Enteral nutrition,263444721.0
373,SOUTH EAST,Y59,Emollients,55460558.0
854,SOUTH EAST,Y59,Other food for special diet preparations,38912250.0
733,SOUTH EAST,Y59,Methadone hydrochloride,34309738.0
899,SOUTH EAST,Y59,Paracetamol,26529008.0
850,SOUTH EAST,Y59,Other emollient preparations,25081961.0
729,SOUTH EAST,Y59,Metformin hydrochloride,24957881.0
84,SOUTH EAST,Y59,Atorvastatin,21543886.0
827,SOUTH EAST,Y59,Omeprazole,19197739.0
248,SOUTH EAST,Y59,Co-codamol (Codeine phosphate/paracetamol),19144994.0


In [25]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('EAST OF ENGLAND')

In [26]:
df_Y61 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [27]:
df_Y61.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
384,EAST OF ENGLAND,Y61,Enteral nutrition,193741913.0
373,EAST OF ENGLAND,Y61,Emollients,41260874.0
848,EAST OF ENGLAND,Y61,Other food for special diet preparations,25345743.0
721,EAST OF ENGLAND,Y61,Metformin hydrochloride,19476258.0
889,EAST OF ENGLAND,Y61,Paracetamol,19237946.0
845,EAST OF ENGLAND,Y61,Other emollient preparations,17486945.0
84,EAST OF ENGLAND,Y61,Atorvastatin,17441278.0
247,EAST OF ENGLAND,Y61,Co-codamol (Codeine phosphate/paracetamol),15366863.0
33,EAST OF ENGLAND,Y61,Alginic acid compound preparations,13453583.0
613,EAST OF ENGLAND,Y61,Lactulose,12858787.0


In [28]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('LONDON')

In [29]:
df_Y56 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [30]:
df_Y56.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
387,LONDON,Y56,Enteral nutrition,290916061.0
375,LONDON,Y56,Emollients,72505881.0
865,LONDON,Y56,Other emollient preparations,32726201.0
738,LONDON,Y56,Metformin hydrochloride,31774176.0
869,LONDON,Y56,Other food for special diet preparations,26505017.0
84,LONDON,Y56,Atorvastatin,20265576.0
742,LONDON,Y56,Methadone hydrochloride,18058059.0
32,LONDON,Y56,Alginic acid compound preparations,16876288.0
913,LONDON,Y56,Paracetamol,16106571.0
56,LONDON,Y56,Amlodipine,14650890.0


In [31]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('MIDLANDS')

In [32]:
df_Y60 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [33]:
df_Y60.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

Unnamed: 0,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
391,MIDLANDS,Y60,Enteral nutrition,411086823.0
380,MIDLANDS,Y60,Emollients,83325531.0
869,MIDLANDS,Y60,Other food for special diet preparations,57877234.0
865,MIDLANDS,Y60,Other emollient preparations,45305436.0
915,MIDLANDS,Y60,Paracetamol,44579438.0
741,MIDLANDS,Y60,Metformin hydrochloride,35686882.0
34,MIDLANDS,Y60,Alginic acid compound preparations,34810676.0
257,MIDLANDS,Y60,Co-codamol (Codeine phosphate/paracetamol),33297832.0
87,MIDLANDS,Y60,Atorvastatin,29709322.0
633,MIDLANDS,Y60,Lactulose,26155255.0
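The per-region cells above repeat the same three steps for each region. As a sketch of an alternative, the whole answer can be built as a single dataframe by summing once, sorting, and taking the first rows of each region group. `top_chemicals_per_region` is a hypothetical helper name and the demo values are made up.

```python
import pandas as pd

GROUP_COLS = ["REGIONAL_OFFICE_NAME", "REGIONAL_OFFICE_CODE", "CHEMICAL_SUBSTANCE_BNF_DESCR"]


def top_chemicals_per_region(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n chemicals with the largest summed TOTAL_QUANTITY in every region."""
    totals = df.groupby(GROUP_COLS, as_index=False)["TOTAL_QUANTITY"].sum()
    totals = totals.sort_values(
        ["REGIONAL_OFFICE_NAME", "TOTAL_QUANTITY"], ascending=[True, False]
    )
    # head(n) on a groupby keeps the first n rows of each group in sorted order.
    return totals.groupby("REGIONAL_OFFICE_NAME").head(n).reset_index(drop=True)


demo = pd.DataFrame({
    "REGIONAL_OFFICE_NAME": ["NORTH WEST"] * 4 + ["LONDON"] * 2,
    "REGIONAL_OFFICE_CODE": ["Y62"] * 4 + ["Y56"] * 2,
    "CHEMICAL_SUBSTANCE_BNF_DESCR": [
        "Paracetamol", "Paracetamol", "Emollients", "Lactulose", "Emollients", "Amlodipine",
    ],
    "TOTAL_QUANTITY": [5.0, 3.0, 4.0, 1.0, 2.0, 9.0],  # made-up demo values
})
print(top_chemicals_per_region(demo, n=2))
```

This also includes the UNIDENTIFIED region, which the cells above do not cover; filter it out beforehand if it should be excluded.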


b.	Which prescribed chemical substances had the highest total cost per month? List the top 10.

In [34]:
df.groupby("CHEMICAL_SUBSTANCE_BNF_DESCR").ACTUAL_COST.sum().reset_index().sort_values('ACTUAL_COST',ascending=False).head(10)

Unnamed: 0,CHEMICAL_SUBSTANCE_BNF_DESCR,ACTUAL_COST
77,Apixaban,33007440.0
439,Enteral nutrition,24362040.0
114,Beclometasone dipropionate,23779420.0
1176,Rivaroxaban,18781050.0
208,Catheters,12954540.0
352,Detection Sensor Interstitial Fluid/Gluc,10911130.0
164,Budesonide,9732518.0
1421,Wound Management & Other Dressings,9588992.0
574,Glucose blood testing reagents,9230607.0
419,Edoxaban,9123207.0
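The result above covers the single file loaded into `df`. To answer the question per month once several monthly files are available, the frames can be concatenated and `YEAR_MONTH` added to the grouping. The sketch below assumes each monthly frame carries a `YEAR_MONTH` column; the function name and demo values are hypothetical.

```python
import pandas as pd


def top_costs_per_month(frames, n: int = 10) -> pd.DataFrame:
    """Return the n chemicals with the highest summed ACTUAL_COST for every month."""
    df = pd.concat(frames, ignore_index=True)
    totals = df.groupby(
        ["YEAR_MONTH", "CHEMICAL_SUBSTANCE_BNF_DESCR"], as_index=False
    )["ACTUAL_COST"].sum()
    totals = totals.sort_values(["YEAR_MONTH", "ACTUAL_COST"], ascending=[True, False])
    return totals.groupby("YEAR_MONTH").head(n).reset_index(drop=True)


# Made-up demo values for two months.
may = pd.DataFrame({
    "YEAR_MONTH": [202205, 202205, 202205],
    "CHEMICAL_SUBSTANCE_BNF_DESCR": ["Apixaban", "Apixaban", "Budesonide"],
    "ACTUAL_COST": [30.0, 3.0, 9.0],
})
june = pd.DataFrame({
    "YEAR_MONTH": [202206, 202206, 202206],
    "CHEMICAL_SUBSTANCE_BNF_DESCR": ["Budesonide", "Apixaban", "Edoxaban"],
    "ACTUAL_COST": [12.0, 10.0, 1.0],
})
print(top_costs_per_month([may, june], n=1))
```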


c.	Which prescriptions are the most common?

In [35]:
df[['BNF_DESCRIPTION','TOTAL_QUANTITY']].groupby('BNF_DESCRIPTION').sum().sort_values(by='TOTAL_QUANTITY', ascending=False).head(10)

Unnamed: 0_level_0,TOTAL_QUANTITY
BNF_DESCRIPTION,Unnamed: 1_level_1
Ensure Plus milkshake style liquid (9 flavours),187541000.0
Paracetamol 500mg tablets,152879229.0
Fortisip Bottle (8 flavours),151562000.0
Fortisip Compact liquid (8 flavours),139502750.0
Ensure Compact liquid (4 flavours),124729528.0
Lactulose 3.1-3.7g/5ml oral solution,115093595.0
Metformin 500mg tablets,111419337.0
Fortisip Compact Protein liquid (9 flavours),107179750.0
Omeprazole 20mg gastro-resistant capsules,99700161.0
Dermol 500 lotion,83283500.0


d.	Which chemical substance is prescribed most by each prescriber?

In [36]:
df_pres = df[['PRACTICE_NAME','CHEMICAL_SUBSTANCE_BNF_DESCR','TOTAL_QUANTITY']]

In [37]:
df_asc = df_pres.groupby(['PRACTICE_NAME','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False)

In [38]:
df_prescriber = df_asc.drop_duplicates('PRACTICE_NAME')

In [39]:
df_prescriber.sort_values('TOTAL_QUANTITY',ascending=False)

Unnamed: 0,PRACTICE_NAME,CHEMICAL_SUBSTANCE_BNF_DESCR,TOTAL_QUANTITY
1548621,MEDICUS HEALTH PARTNERS,Enteral nutrition,4253632.0
443167,CGL BIRMINGHAM SOUTH,Methadone hydrochloride,3970225.0
1600962,MODALITY PARTNERSHIP (AWC),Enteral nutrition,3949145.0
1573996,MIDLANDS MEDICAL PARTNERSHIP,Enteral nutrition,3365624.0
1812476,PARK SURGERY,Enteral nutrition,3291582.0
...,...,...,...
2023019,ROTHERHAM DISTRICT NURSING,Eye Products,1.0
2356363,TAMWORTH LOCALITY NETWORK,Salbutamol,1.0
995523,GPSI CYSTOSCOPY - IDLE,Testosterone esters,1.0
1727755,NT&H COMM DIAB N DDES,Glucagon,1.0
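The cells above identify a prescriber by `PRACTICE_NAME`, but different practices can share a name. A hedged alternative keys on `PRACTICE_CODE` and picks each practice's top row with `idxmax`. The helper name, practice codes, and quantities in the demo are hypothetical.

```python
import pandas as pd


def top_chemical_per_practice(df: pd.DataFrame) -> pd.DataFrame:
    """Return, for every PRACTICE_CODE, the chemical with the largest summed TOTAL_QUANTITY."""
    totals = df.groupby(
        ["PRACTICE_CODE", "PRACTICE_NAME", "CHEMICAL_SUBSTANCE_BNF_DESCR"], as_index=False
    )["TOTAL_QUANTITY"].sum()
    # idxmax gives the row label of the largest total inside each practice.
    best = totals.groupby("PRACTICE_CODE")["TOTAL_QUANTITY"].idxmax()
    return (
        totals.loc[best]
        .sort_values("TOTAL_QUANTITY", ascending=False)
        .reset_index(drop=True)
    )


# Two hypothetical practices that share one name.
demo = pd.DataFrame({
    "PRACTICE_CODE": ["P0001", "P0001", "P0001", "P0002"],
    "PRACTICE_NAME": ["PARK SURGERY"] * 4,
    "CHEMICAL_SUBSTANCE_BNF_DESCR": ["Paracetamol", "Paracetamol", "Atorvastatin", "Lactulose"],
    "TOTAL_QUANTITY": [10.0, 5.0, 12.0, 3.0],
})
print(top_chemical_per_practice(demo))
```

Grouping by name alone would merge the two practices into one row; keying on the code keeps them separate.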


e.	How many prescribers were added in the last month?

#### Prescribers are those who prescribe (a doctor or a hospital), whereas prescriptions are the orders they issue. I carried out the analysis from the perspective of the prescriptions

I carried out the analysis from the perspective of the prescribers, which in this case are the health centres. There was no increase in the number of units; in other words, the number of prescribers is fixed.

In [40]:
df_202207 = pd.read_csv('epd_202207.csv')

In [41]:
df_prescribers = df_202207[['PRACTICE_NAME']]

In [42]:
print(df_prescribers['PRACTICE_NAME'].nunique(), 'Prescribing centres - Treatment centres - Hospitals')

8387 Prescribing centres - Treatment centres - Hospitals


In [43]:
df_prescriptions = df_202207[['BNF_DESCRIPTION']]

In [44]:
print(df_prescriptions['BNF_DESCRIPTION'].count(), 'prescriptions issued')

17603900 prescriptions issued
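Counting the prescribers in one month does not by itself show how many were added. One way to check, sketched here under the assumption that `PRACTICE_CODE` identifies a prescriber, is to compare the codes of the latest month with those of the previous month. The function name and the demo codes are hypothetical.

```python
import pandas as pd


def new_prescribers(previous: pd.DataFrame, latest: pd.DataFrame) -> pd.DataFrame:
    """Return the practices present in `latest` whose code does not appear in `previous`."""
    known = set(previous["PRACTICE_CODE"].dropna())
    current = latest[["PRACTICE_CODE", "PRACTICE_NAME"]].drop_duplicates("PRACTICE_CODE")
    return current[~current["PRACTICE_CODE"].isin(known)].reset_index(drop=True)


# Hypothetical codes: P0003 appears only in the latest month.
previous = pd.DataFrame({
    "PRACTICE_CODE": ["P0001", "P0002", "P0001"],
    "PRACTICE_NAME": ["PRACTICE A", "PRACTICE B", "PRACTICE A"],
})
latest = pd.DataFrame({
    "PRACTICE_CODE": ["P0001", "P0002", "P0003", "P0003"],
    "PRACTICE_NAME": ["PRACTICE A", "PRACTICE B", "PRACTICE C", "PRACTICE C"],
})
added = new_prescribers(previous, latest)
print(len(added), "prescribers added")
print(added)
```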


f.	Which prescribers operate in more than one region? Sort by the number of regions served.

In [109]:
df_region = df[['PRACTICE_NAME','PRACTICE_CODE','REGIONAL_OFFICE_NAME']]

In [110]:
df_region

Unnamed: 0,PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
0,WIRRAL COMMUNITY NMP,Y03836,NORTH WEST
1,WIRRAL WIC (APH)_WIC APH,N85645,NORTH WEST
2,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
3,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
4,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
...,...,...,...
17603895,THE RISE GROUP PRACTICE,F83039,LONDON
17603896,THE RISE GROUP PRACTICE,F83039,LONDON
17603897,THE RISE GROUP PRACTICE,F83039,LONDON
17603898,THE RISE GROUP PRACTICE,F83039,LONDON


In [111]:
df_region.value_counts()

PRACTICE_NAME                            PRACTICE_CODE  REGIONAL_OFFICE_NAME    
MODALITY PARTNERSHIP (AWC)               B83033         NORTH EAST AND YORKSHIRE    11683
MEDICUS HEALTH PARTNERS                  F85002         LONDON                      10067
MIDLANDS MEDICAL PARTNERSHIP             M85063         MIDLANDS                     9982
SHORE MEDICAL                            J81012         SOUTH WEST                   8502
PORTSDOWN GROUP PRACTICE                 J82155         SOUTH EAST                   8232
                                                                                    ...  
TRAFFORD DRUG SERVICE                    Y00480         NORTH WEST                      1
HAWTHORNS SPINAL UNIT                    Y03574         NORTH EAST AND YORKSHIRE        1
ERP EAST                                 Y01701         NORTH EAST AND YORKSHIRE        1
CHILD & FAMILY COMMUNITY PAEDIATRICIANS  Y05299         MIDLANDS                        1
HMP EAST SUTTON PAR

In [112]:
df_region.drop_duplicates()

Unnamed: 0,PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
0,WIRRAL COMMUNITY NMP,Y03836,NORTH WEST
1,WIRRAL WIC (APH)_WIC APH,N85645,NORTH WEST
2,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
34,GP APH OOH,N85638,NORTH WEST
41,SEVERNSIDE MEDICAL PRACTICE,L84052,SOUTH WEST
...,...,...,...
17582508,THE PRACTICE AT 188,E83027,LONDON
17589495,QUEEN STREET SURGERY,B87600,NORTH EAST AND YORKSHIRE
17594786,GP LED HEALTH CENTRE,Y02854,NORTH WEST
17595508,THE REGENTS PARK PRACTICE,F83025,LONDON


In [114]:
df_region.groupby(['PRACTICE_NAME','PRACTICE_CODE','REGIONAL_OFFICE_NAME']).sum().head(8690)

PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
(FRACTURE CLINIC) NORTH OOH,Y00082,MIDLANDS
(IRLAM) SALFORD CARE CTRS MEDICAL PRACTI,P87657,NORTH WEST
(OUT PATIENT DEPARTMENT) NORTH OOH,Y00234,MIDLANDS
0-19 EAST CHESHIRE HEALTH VISITORS,Y05381,NORTH WEST
0-19 PUBLIC HEALTH SERVICE HARTLEPOOL,Y04082,NORTH EAST AND YORKSHIRE
...,...,...
WIGSTON CENTRAL SURGERY,C82071,MIDLANDS
WIGTON GROUP MEDICAL PRACTICE,A82045,NORTH EAST AND YORKSHIRE
WILBERFORCE SURGERY,B81032,NORTH EAST AND YORKSHIRE
WILBRAHAM SURGERY,P84071,NORTH WEST


In [115]:
df_region.groupby(['PRACTICE_NAME','PRACTICE_CODE','REGIONAL_OFFICE_NAME']).count().head(8690)

PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
(FRACTURE CLINIC) NORTH OOH,Y00082,MIDLANDS
(IRLAM) SALFORD CARE CTRS MEDICAL PRACTI,P87657,NORTH WEST
(OUT PATIENT DEPARTMENT) NORTH OOH,Y00234,MIDLANDS
0-19 EAST CHESHIRE HEALTH VISITORS,Y05381,NORTH WEST
0-19 PUBLIC HEALTH SERVICE HARTLEPOOL,Y04082,NORTH EAST AND YORKSHIRE
...,...,...
WIGSTON CENTRAL SURGERY,C82071,MIDLANDS
WIGTON GROUP MEDICAL PRACTICE,A82045,NORTH EAST AND YORKSHIRE
WILBERFORCE SURGERY,B81032,NORTH EAST AND YORKSHIRE
WILBRAHAM SURGERY,P84071,NORTH WEST


In [116]:
df_pract = df_region.groupby(['PRACTICE_CODE']).count()

In [53]:
df_pract

Unnamed: 0_level_0,REGIONAL_OFFICE_NAME
PRACTICE_NAME,Unnamed: 1_level_1
(FRACTURE CLINIC) NORTH OOH,1188
(IRLAM) SALFORD CARE CTRS MEDICAL PRACTI,1633
(OUT PATIENT DEPARTMENT) NORTH OOH,33
0-19 EAST CHESHIRE HEALTH VISITORS,1
0-19 PUBLIC HEALTH SERVICE HARTLEPOOL,2
...,...
YOUR HEALTHCARE NON MED PRES,269
YOXALL,1837
ZAIN MEDICAL CENTRE,1060
ZAMAN,2223


In [130]:
df_pract.groupby(['PRACTICE_NAME']).sum().count().sum()

3646

In [131]:
df_duplic = df_pract.sort_values('PRACTICE_NAME',ascending=True).duplicated()

In [132]:
df_duplic

PRACTICE_CODE
Y07362    False
Y06616     True
Y03941     True
Y06635     True
Y06638     True
          ...  
J82155    False
J81012    False
M85063    False
F85002    False
B83033    False
Length: 8927, dtype: bool

In [133]:
df_duplic.sum()

5281

In [134]:
df_duplic.count()

8927

In [135]:
df_region.drop_duplicates()

Unnamed: 0,PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
0,WIRRAL COMMUNITY NMP,Y03836,NORTH WEST
1,WIRRAL WIC (APH)_WIC APH,N85645,NORTH WEST
2,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
34,GP APH OOH,N85638,NORTH WEST
41,SEVERNSIDE MEDICAL PRACTICE,L84052,SOUTH WEST
...,...,...,...
17582508,THE PRACTICE AT 188,E83027,LONDON
17589495,QUEEN STREET SURGERY,B87600,NORTH EAST AND YORKSHIRE
17594786,GP LED HEALTH CENTRE,Y02854,NORTH WEST
17595508,THE REGENTS PARK PRACTICE,F83025,LONDON


In [136]:
# Show the duplicated row(s)
df_region[df_region.duplicated()]

Unnamed: 0,PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
3,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
4,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
5,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
6,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
7,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
...,...,...,...
17603895,THE RISE GROUP PRACTICE,F83039,LONDON
17603896,THE RISE GROUP PRACTICE,F83039,LONDON
17603897,THE RISE GROUP PRACTICE,F83039,LONDON
17603898,THE RISE GROUP PRACTICE,F83039,LONDON


In [137]:
# Show one copy of each duplicated row
df_region[df_region.duplicated()].drop_duplicates()

Unnamed: 0,PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
3,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
29,WIRRAL WIC (APH)_WIC APH,N85645,NORTH WEST
31,WIRRAL COMMUNITY NMP,Y03836,NORTH WEST
42,SEVERNSIDE MEDICAL PRACTICE,L84052,SOUTH WEST
45,SEVERNBANK SURGERY,L84085,SOUTH WEST
...,...,...,...
17582509,THE PRACTICE AT 188,E83027,LONDON
17589506,QUEEN STREET SURGERY,B87600,NORTH EAST AND YORKSHIRE
17594911,GP LED HEALTH CENTRE,Y02854,NORTH WEST
17595509,THE REGENTS PARK PRACTICE,F83025,LONDON


In [138]:
# Show one copy of each duplicated row
df_region[df_region.duplicated()].drop_duplicates()

Unnamed: 0,PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME
3,BASSETLAW HEALTH PARTNERSHIP,Y03762,NORTH EAST AND YORKSHIRE
29,WIRRAL WIC (APH)_WIC APH,N85645,NORTH WEST
31,WIRRAL COMMUNITY NMP,Y03836,NORTH WEST
42,SEVERNSIDE MEDICAL PRACTICE,L84052,SOUTH WEST
45,SEVERNBANK SURGERY,L84085,SOUTH WEST
...,...,...,...
17582509,THE PRACTICE AT 188,E83027,LONDON
17589506,QUEEN STREET SURGERY,B87600,NORTH EAST AND YORKSHIRE
17594911,GP LED HEALTH CENTRE,Y02854,NORTH WEST
17595509,THE REGENTS PARK PRACTICE,F83025,LONDON


In [139]:
# Keep one copy of each duplicated row
df_user = df_region[df_region.duplicated()].drop_duplicates()

In [182]:
df_drop = df_user[df_user['PRACTICE_NAME'].duplicated()].sort_values('PRACTICE_NAME',ascending=False)

In [183]:
df_drop[df_drop.duplicated()]

Unnamed: 0,PRACTICE_NAME,PRACTICE_CODE,REGIONAL_OFFICE_NAME,Users


In [191]:
df_drop.groupby(['REGIONAL_OFFICE_NAME','PRACTICE_NAME']).PRACTICE_NAME.size().sort_values(ascending=True)

REGIONAL_OFFICE_NAME      PRACTICE_NAME                  
EAST OF ENGLAND           ABBEY ROAD SURGERY                  1
NORTH WEST                HIGH STREET SURGERY                 1
                          HIGH STREET MEDICAL CENTRE          1
                          HEART FAILURE SERVICE               1
                          HEART FAILURE COMMUNITY SERVICE     1
                                                             ..
SOUTH EAST                UNIDENTIFIED DOCTORS               12
EAST OF ENGLAND           UNIDENTIFIED DOCTORS               12
MIDLANDS                  UNIDENTIFIED DOCTORS               15
NORTH EAST AND YORKSHIRE  UNIDENTIFIED DOCTORS               17
NORTH WEST                UNIDENTIFIED DOCTORS               23
Name: PRACTICE_NAME, Length: 383, dtype: int64

In [199]:
df_drop.groupby(['REGIONAL_OFFICE_NAME','PRACTICE_NAME']).PRACTICE_NAME.size().reset_index(name='SIZE').sort_values('REGIONAL_OFFICE_NAME',ascending=True)

Unnamed: 0,REGIONAL_OFFICE_NAME,PRACTICE_NAME,SIZE
0,EAST OF ENGLAND,ABBEY ROAD SURGERY,1
21,EAST OF ENGLAND,PARK MEDICAL CENTRE,1
22,EAST OF ENGLAND,PARKFIELD MEDICAL CENTRE,1
23,EAST OF ENGLAND,PARKSIDE MEDICAL CENTRE,1
24,EAST OF ENGLAND,PRIORY MEDICAL CENTRE,1
...,...,...,...
360,SOUTH WEST,CAMHS,1
359,SOUTH WEST,BIRCHWOOD MEDICAL PRACTICE,1
381,SOUTH WEST,WHITE HOUSE SURGERY,1
369,SOUTH WEST,PARK LANE PRACTICE,1


In [197]:
df_size = df_drop.groupby(['REGIONAL_OFFICE_NAME','PRACTICE_NAME']).PRACTICE_NAME.size().reset_index(name='SIZE').sort_values('PRACTICE_NAME',ascending=True)

In [198]:
df_size

Unnamed: 0,REGIONAL_OFFICE_NAME,PRACTICE_NAME,SIZE
98,MIDLANDS,ABBEY MEDICAL CENTRE,2
38,LONDON,ABBEY MEDICAL CENTRE,1
305,SOUTH EAST,ABBEY MEDICAL CENTRE,1
99,MIDLANDS,ABBEY MEDICAL PRACTICE,2
0,EAST OF ENGLAND,ABBEY ROAD SURGERY,1
...,...,...,...
172,MIDLANDS,WOODLANDS SURGERY,1
97,LONDON,WOODLANDS SURGERY,2
37,EAST OF ENGLAND,WOODLANDS SURGERY,1
304,NORTH WEST,WOODSIDE MEDICAL CENTRE,1



g.	What is the average price of the chemical substances prescribed in the last month collected?

In [63]:
df_202207.head(3)

Unnamed: 0,YEAR_MONTH,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,ICB_NAME,ICB_CODE,PCO_NAME,PCO_CODE,PRACTICE_NAME,PRACTICE_CODE,ADDRESS_1,...,BNF_CODE,BNF_DESCRIPTION,BNF_CHAPTER_PLUS_CODE,QUANTITY,ITEMS,TOTAL_QUANTITY,ADQUSAGE,NIC,ACTUAL_COST,UNIDENTIFIED
0,202206,NORTH WEST,Y62,NHS CHESHIRE AND MERSEYSIDE INTEGRATED C,QYG,WIRRAL COMMUNITY HEALTH AND CARE NHS FOU,RY700,WIRRAL COMMUNITY NMP,Y03836,ST CATHERINE'S HC,...,20020200701,Viscopaste PB7 bandage 7.5cm x 6m,20: Dressings,10.0,1,10.0,0.0,38.9,36.40326,N
1,202206,NORTH WEST,Y62,NHS CHESHIRE AND MERSEYSIDE INTEGRATED C,QYG,WIRRAL COMMUNITY HEALTH AND CARE NHS FOU,RY700,WIRRAL WIC (APH)_WIC APH,N85645,ARROWE PARK HOSPITAL,...,20030100079,Mepore dressing 11cm x 15cm,20: Dressings,5.0,1,5.0,0.0,1.85,1.74307,N
2,202206,NORTH EAST AND YORKSHIRE,Y63,NHS SOUTH YORKSHIRE INTEGRATED CARE BOAR,QF7,NHS NOTTINGHAM AND NOTTINGHAMSHIRE ICB -,02Q00,BASSETLAW HEALTH PARTNERSHIP,Y03762,C/O RETFORD HOSPITAL,...,20030100167,Dressit sterile dressing pack with gloves,20: Dressings,10.0,6,60.0,0.0,41.4,38.7544,N


In [64]:
df_202207['TOTAL_COST'] = df_202207['TOTAL_QUANTITY'] * df_202207['ACTUAL_COST']

In [65]:
df_202207.head(3)

Unnamed: 0,YEAR_MONTH,REGIONAL_OFFICE_NAME,REGIONAL_OFFICE_CODE,ICB_NAME,ICB_CODE,PCO_NAME,PCO_CODE,PRACTICE_NAME,PRACTICE_CODE,ADDRESS_1,...,BNF_DESCRIPTION,BNF_CHAPTER_PLUS_CODE,QUANTITY,ITEMS,TOTAL_QUANTITY,ADQUSAGE,NIC,ACTUAL_COST,UNIDENTIFIED,TOTAL_COST
0,202206,NORTH WEST,Y62,NHS CHESHIRE AND MERSEYSIDE INTEGRATED C,QYG,WIRRAL COMMUNITY HEALTH AND CARE NHS FOU,RY700,WIRRAL COMMUNITY NMP,Y03836,ST CATHERINE'S HC,...,Viscopaste PB7 bandage 7.5cm x 6m,20: Dressings,10.0,1,10.0,0.0,38.9,36.40326,N,364.0326
1,202206,NORTH WEST,Y62,NHS CHESHIRE AND MERSEYSIDE INTEGRATED C,QYG,WIRRAL COMMUNITY HEALTH AND CARE NHS FOU,RY700,WIRRAL WIC (APH)_WIC APH,N85645,ARROWE PARK HOSPITAL,...,Mepore dressing 11cm x 15cm,20: Dressings,5.0,1,5.0,0.0,1.85,1.74307,N,8.71535
2,202206,NORTH EAST AND YORKSHIRE,Y63,NHS SOUTH YORKSHIRE INTEGRATED CARE BOAR,QF7,NHS NOTTINGHAM AND NOTTINGHAMSHIRE ICB -,02Q00,BASSETLAW HEALTH PARTNERSHIP,Y03762,C/O RETFORD HOSPITAL,...,Dressit sterile dressing pack with gloves,20: Dressings,10.0,6,60.0,0.0,41.4,38.7544,N,2325.264


In [66]:
"£{:,.2f}".format(df_202207['TOTAL_COST'].mean())

'£90,143.95'
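
The sample rows above suggest that `ACTUAL_COST` may already be the cost of the whole row rather than a unit price (for example, a row with `TOTAL_QUANTITY` 60.0 has a NIC of 41.4). If that reading is right, multiplying by `TOTAL_QUANTITY` overstates the cost, and an average price per unit could instead be estimated as summed cost divided by summed quantity for each chemical. This is a sketch under that assumption, with a hypothetical helper name and made-up demo values; the dataset's field descriptions should be checked before relying on it.

```python
import pandas as pd


def average_unit_cost(df: pd.DataFrame) -> pd.Series:
    """Quantity-weighted mean cost per unit for each chemical.

    Assumes ACTUAL_COST is the cost of the whole row, not a unit price.
    """
    totals = df.groupby("CHEMICAL_SUBSTANCE_BNF_DESCR")[["ACTUAL_COST", "TOTAL_QUANTITY"]].sum()
    totals = totals[totals["TOTAL_QUANTITY"] > 0]  # avoid dividing by zero
    unit_cost = totals["ACTUAL_COST"] / totals["TOTAL_QUANTITY"]
    return unit_cost.rename("AVG_UNIT_COST").sort_values(ascending=False)


demo = pd.DataFrame({
    "CHEMICAL_SUBSTANCE_BNF_DESCR": ["Paracetamol", "Paracetamol", "Wound Management & Other Dressings"],
    "ACTUAL_COST": [2.0, 1.0, 36.0],        # made-up demo values
    "TOTAL_QUANTITY": [100.0, 50.0, 10.0],
})
print(average_unit_cost(demo))
```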

h.	Generate a table containing only the highest-value prescription for each user.

In [67]:
df[['PRACTICE_NAME', 'ACTUAL_COST']].groupby('PRACTICE_NAME').max().head(8387)

Unnamed: 0_level_0,ACTUAL_COST
PRACTICE_NAME,Unnamed: 1_level_1
(FRACTURE CLINIC) NORTH OOH,1183.65046
(IRLAM) SALFORD CARE CTRS MEDICAL PRACTI,1294.30272
(OUT PATIENT DEPARTMENT) NORTH OOH,214.10748
0-19 EAST CHESHIRE HEALTH VISITORS,4.20343
0-19 PUBLIC HEALTH SERVICE HARTLEPOOL,5.46635
...,...
YOUR HEALTHCARE NON MED PRES,293.46451
YOXALL,2588.08465
ZAIN MEDICAL CENTRE,1243.10165
ZAMAN,2090.79671
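
The `max()` above returns only the highest cost per practice name, without saying which prescription it belongs to. A sketch that keeps the whole winning row uses `idxmax` to locate it. It assumes "user" means the prescribing practice identified by `PRACTICE_CODE`; the helper name, codes, and costs in the demo are hypothetical.

```python
import pandas as pd


def highest_cost_row_per_practice(df: pd.DataFrame) -> pd.DataFrame:
    """Return, for every PRACTICE_CODE, the single row with the largest ACTUAL_COST."""
    best = df.groupby("PRACTICE_CODE")["ACTUAL_COST"].idxmax()
    columns = ["PRACTICE_CODE", "PRACTICE_NAME", "BNF_DESCRIPTION", "ACTUAL_COST"]
    return df.loc[best, columns].reset_index(drop=True)


demo = pd.DataFrame({
    "PRACTICE_CODE": ["P0001", "P0001", "P0002"],
    "PRACTICE_NAME": ["PRACTICE A", "PRACTICE A", "PRACTICE B"],
    "BNF_DESCRIPTION": [
        "Paracetamol 500mg tablets",
        "Dermol 500 lotion",
        "Metformin 500mg tablets",
    ],
    "ACTUAL_COST": [5.0, 9.5, 2.0],  # made-up demo values
})
print(highest_cost_row_per_practice(demo))
```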
