### 2RP Net - Data Engineer Test

1.	Use a local Git repository.

2.	Extract the prescription data (english-prescribing-data-epd) for the last 3 months, excluding the most recent one. Source: https://opendata.nhsbsa.net/dataset/english-prescribing-data-epd. There are several ways to do this; use whichever you prefer. Read the documentation on that page and decide which approach best fits the architecture you have in mind. The data description is available at https://opendata.nhsbsa.net/dataset/english-prescribing-data-epd/resource/af8dd944-fb82-42c1-a955-646c8866b939:
a.	If you choose to collect the data as CSV, pay attention to the data volume.
b.	If you have trouble handling this amount of data, collect it through the API instead and limit the number of rows.

3.	Create a process to validate the extracted data.

4.	After collecting the data, split it into prescribers and prescriptions.

5.	Persist the data however you see fit, for example: files, MySQL, PostgreSQL, SQLite, MongoDB, Delta, cloud storage, etc.

7.	Write a routine that collects the latest month's data every month and adds only the records that are new. The routine must run automatically every month; choose whatever mechanism you prefer.

8.	Document as much as possible.
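Step 2 leaves the extraction method open. One option, sketched here under assumptions that should be checked against the portal's API documentation, is to query a CKAN-style `datastore_search_sql` endpoint with one resource per month named `EPD_YYYYMM`, capping the rows with `LIMIT` (option b), and to derive the three months to extract from the most recent one. The endpoint URL, the resource naming, the SQL quoting and the response shape are all assumptions; `months_to_extract`, `build_query_url`, `parse_records` and `fetch_month` are hypothetical helpers, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumed CKAN-style endpoint; verify against the portal's API documentation.
BASE_URL = "https://opendata.nhsbsa.net/api/3/action/datastore_search_sql"


def months_to_extract(latest_month: str, n: int = 3) -> list[str]:
    """Return the n months before `latest_month` (YYYYMM), oldest first, excluding it."""
    year, month = int(latest_month[:4]), int(latest_month[4:])
    months = []
    for _ in range(n):
        month -= 1
        if month == 0:  # roll back into the previous year
            year, month = year - 1, 12
        months.append(f"{year}{month:02d}")
    return sorted(months)


def build_query_url(year_month: str, limit: int = 1000) -> str:
    """Build a SQL query URL for one monthly resource, capped at `limit` rows."""
    resource_id = f"EPD_{year_month}"  # assumed resource naming convention
    sql = f'SELECT * FROM "{resource_id}" LIMIT {limit}'  # assumed quoting style
    query = urllib.parse.urlencode({"resource_id": resource_id, "sql": sql})
    return f"{BASE_URL}?{query}"


def parse_records(payload: dict) -> list[dict]:
    """Extract row records from the JSON response.

    The records are assumed to sit under result.records, possibly nested one
    `result` level deeper; both shapes are accepted.
    """
    if not payload.get("success"):
        raise RuntimeError(f"API call failed: {payload.get('error')}")
    result = payload["result"]
    return result.get("result", result).get("records", [])


def fetch_month(year_month: str, limit: int = 1000) -> list[dict]:
    """Download up to `limit` rows for one month (performs a network request)."""
    with urllib.request.urlopen(build_query_url(year_month, limit), timeout=60) as resp:
        return parse_records(json.load(resp))
```

For example, `months_to_extract("202208")` yields the three months before August 2022, and `pd.DataFrame(fetch_month("202207"))` would turn one capped download into a dataframe.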

Below are some tips to help.

1. Coding
- Apply the coding best practices you consider necessary.
- Documentation is always welcome, although clean, clear code does not always need it.

2. README.md
- Explain clearly in README.md how to use your application.
- Make full use of Markdown in the explanations.
- Draw the pipeline design/architecture (you can use https://draw.io) and put the image(s) in the "/DOCS" directory.

3. Git/Gitflow
Use a local Git repository and follow the Gitflow methodology (https://medium.com/trainingcenter/utilizando-o-fluxo-git-flow-e63d5e0d5e04) for each new feature you implement.
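Steps 5 and 7 (persisting the data and the monthly incremental load) could be combined as in the sketch below. It assumes SQLite as the store (one of the options listed in step 5) and the prescribers/prescriptions split from step 4. `load_month` is a hypothetical helper: it appends only prescribers whose PRACTICE_CODE is not stored yet and only prescription rows whose YEAR_MONTH has not been loaded, so re-running it for an already loaded month does nothing. Scheduling is left to the operating system; the crontab line in the comment is illustrative and its script path is hypothetical.

```python
import sqlite3

import pandas as pd

# Example crontab entry (hypothetical script path), running at 06:00 on day 10 of each month:
# 0 6 10 * * python /opt/epd/monthly_load.py


def _existing_values(con: sqlite3.Connection, table: str, column: str) -> set:
    """Return the distinct values already stored in table.column (empty if the table is missing)."""
    exists = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if exists is None:
        return set()
    return {row[0] for row in con.execute(f'SELECT DISTINCT "{column}" FROM "{table}"')}


def load_month(prescribers: pd.DataFrame, prescriptions: pd.DataFrame,
               con: sqlite3.Connection) -> tuple[int, int]:
    """Append only unseen prescribers and unseen months; return (new prescribers, new rows)."""
    known_codes = _existing_values(con, "prescribers", "PRACTICE_CODE")
    new_prescribers = prescribers.drop_duplicates("PRACTICE_CODE")
    new_prescribers = new_prescribers[~new_prescribers["PRACTICE_CODE"].isin(known_codes)]
    if not new_prescribers.empty:
        new_prescribers.to_sql("prescribers", con, if_exists="append", index=False)

    loaded_months = _existing_values(con, "prescriptions", "YEAR_MONTH")
    new_rows = prescriptions[~prescriptions["YEAR_MONTH"].isin(loaded_months)]
    if not new_rows.empty:
        new_rows.to_sql("prescriptions", con, if_exists="append", index=False)
    con.commit()
    return len(new_prescribers), len(new_rows)
```

Keying the idempotency check on YEAR_MONTH keeps the logic simple; a month that was only partially loaded would need to be deleted before re-running.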

6.	Write scripts that answer the following requests:

In [1]:
# Import packages
try:
    import pandas as pd
    import numpy as np
    import pandera as pa
    import os
    import sys
    import glob
    import time
    import urllib.request
    from urllib.request import urlretrieve
except ImportError as e:
    # Report which package is missing instead of failing later with a NameError
    print("Error: failed to import packages: {}".format(e))

In [2]:
# Record the program start time
s_time_control = time.time()

#### Concatenating files and reducing size by converting to Parquet

In [3]:
os.chdir("/Jupyter/2RP")  

In [4]:
#extension = 'csv'
#all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

In [5]:
# combine all the files in the list
#df = pd.concat([pd.read_csv(f) for f in all_filenames ])

In [6]:
df = pd.read_csv('epd_202205.csv')

In [7]:
e_time_load = time.time()
print("File load time:", round(e_time_load - s_time_control) / 60, "minutes")

File load time: 1.45 minutes
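The heading above mentions reducing size, but the cells only read a single CSV fully into memory. A minimal sketch, assuming the goal is lower memory use: read only the needed columns (`usecols`) and store repetitive text columns such as region or practice names as pandas `category`. `load_epd` and its `max_unique_ratio` threshold are hypothetical. The result could then be written with `DataFrame.to_parquet`, which requires pyarrow or fastparquet to be installed.

```python
import io

import pandas as pd


def load_epd(source, usecols=None, max_unique_ratio=0.5):
    """Read an EPD CSV and store repetitive text columns as `category`.

    `source` may be a path or a file-like object; `usecols` limits the columns read.
    A text column becomes categorical when its number of distinct values is at most
    `max_unique_ratio` times the number of rows (threshold chosen arbitrarily).
    """
    df = pd.read_csv(source, usecols=usecols)
    for col in df.columns:
        is_text = not pd.api.types.is_numeric_dtype(df[col])
        if is_text and df[col].nunique() <= max_unique_ratio * len(df):
            df[col] = df[col].astype("category")
    return df


# Small in-memory example; with real data pass the CSV path, e.g. "epd_202205.csv".
sample_csv = io.StringIO(
    "YEAR_MONTH,REGIONAL_OFFICE_NAME,ACTUAL_COST\n"
    "202206,LONDON,1.5\n202206,LONDON,2.0\n202206,LONDON,3.0\n202206,MIDLANDS,0.5\n"
)
small = load_epd(sample_csv)
print(small.dtypes)
```

When grouping by categorical columns, passing `observed=True` to `groupby` returns only the combinations that actually occur in the data.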


In [8]:
df.head(3)

|   | YEAR_MONTH | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | ICB_NAME | ICB_CODE | PCO_NAME | PCO_CODE | PRACTICE_NAME | PRACTICE_CODE | ADDRESS_1 | ... | BNF_CODE | BNF_DESCRIPTION | BNF_CHAPTER_PLUS_CODE | QUANTITY | ITEMS | TOTAL_QUANTITY | ADQUSAGE | NIC | ACTUAL_COST | UNIDENTIFIED |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 202206 | NORTH WEST | Y62 | NHS CHESHIRE AND MERSEYSIDE INTEGRATED C | QYG | WIRRAL COMMUNITY HEALTH AND CARE NHS FOU | RY700 | WIRRAL COMMUNITY NMP | Y03836 | ST CATHERINE'S HC | ... | 20020200701 | Viscopaste PB7 bandage 7.5cm x 6m | 20: Dressings | 10.0 | 1 | 10.0 | 0.0 | 38.9 | 36.40326 | N |
| 1 | 202206 | NORTH WEST | Y62 | NHS CHESHIRE AND MERSEYSIDE INTEGRATED C | QYG | WIRRAL COMMUNITY HEALTH AND CARE NHS FOU | RY700 | WIRRAL WIC (APH)_WIC APH | N85645 | ARROWE PARK HOSPITAL | ... | 20030100079 | Mepore dressing 11cm x 15cm | 20: Dressings | 5.0 | 1 | 5.0 | 0.0 | 1.85 | 1.74307 | N |
| 2 | 202206 | NORTH EAST AND YORKSHIRE | Y63 | NHS SOUTH YORKSHIRE INTEGRATED CARE BOAR | QF7 | NHS NOTTINGHAM AND NOTTINGHAMSHIRE ICB - | 02Q00 | BASSETLAW HEALTH PARTNERSHIP | Y03762 | C/O RETFORD HOSPITAL | ... | 20030100167 | Dressit sterile dressing pack with gloves | 20: Dressings | 10.0 | 6 | 60.0 | 0.0 | 41.4 | 38.7544 | N |
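Step 3 asks for a validation process. The notebook imports pandera but does not use it; the sketch below sticks to plain pandas and runs a few checks on one monthly extract right after loading: required columns present, a single expected YEAR_MONTH, no null practice or BNF codes, numeric and non-negative ITEMS/TOTAL_QUANTITY/ACTUAL_COST, and no fully duplicated rows. The rules and the `validate_epd` helper are assumptions for illustration, not official NHSBSA constraints.

```python
import pandas as pd

# Columns the later analyses rely on (a subset of the columns returned by df.keys()).
REQUIRED_COLUMNS = ["YEAR_MONTH", "REGIONAL_OFFICE_NAME", "PRACTICE_CODE", "PRACTICE_NAME",
                    "CHEMICAL_SUBSTANCE_BNF_DESCR", "BNF_CODE", "BNF_DESCRIPTION",
                    "ITEMS", "TOTAL_QUANTITY", "ACTUAL_COST"]
NUMERIC_COLUMNS = ["ITEMS", "TOTAL_QUANTITY", "ACTUAL_COST"]


def validate_epd(df: pd.DataFrame, expected_month: int) -> list[str]:
    """Return the problems found in one monthly extract (an empty list means it passed)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df.empty:
        problems.append("extract has no rows")
    wrong_month = int((df["YEAR_MONTH"] != expected_month).sum())
    if wrong_month:
        problems.append(f"{wrong_month} rows with YEAR_MONTH != {expected_month}")
    for col in ["PRACTICE_CODE", "BNF_CODE"]:
        n_null = int(df[col].isna().sum())
        if n_null:
            problems.append(f"{n_null} null values in {col}")
    for col in NUMERIC_COLUMNS:
        values = pd.to_numeric(df[col], errors="coerce")  # non-numeric text becomes NaN
        n_bad = int(values.isna().sum())
        if n_bad:
            problems.append(f"{n_bad} missing or non-numeric values in {col}")
        n_neg = int((values < 0).sum())
        if n_neg:
            problems.append(f"{n_neg} negative values in {col}")
    n_dup = int(df.duplicated().sum())
    if n_dup:
        problems.append(f"{n_dup} fully duplicated rows")
    return problems
```

Applied to the dataframe loaded above, `validate_epd(df, 202206)` would return an empty list if every check passes.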


In [9]:
df.keys()

Index(['YEAR_MONTH', 'REGIONAL_OFFICE_NAME', 'REGIONAL_OFFICE_CODE',
       'ICB_NAME', 'ICB_CODE', 'PCO_NAME', 'PCO_CODE', 'PRACTICE_NAME',
       'PRACTICE_CODE', 'ADDRESS_1', 'ADDRESS_2', 'ADDRESS_3', 'ADDRESS_4',
       'POSTCODE', 'BNF_CHEMICAL_SUBSTANCE', 'CHEMICAL_SUBSTANCE_BNF_DESCR',
       'BNF_CODE', 'BNF_DESCRIPTION', 'BNF_CHAPTER_PLUS_CODE', 'QUANTITY',
       'ITEMS', 'TOTAL_QUANTITY', 'ADQUSAGE', 'NIC', 'ACTUAL_COST',
       'UNIDENTIFIED'],
      dtype='object')
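Step 4 asks for the data to be split into prescribers and prescriptions. Given the columns listed above, one possible split, sketched under the assumption that PRACTICE_CODE identifies a prescriber: the practice, organisation and address columns form a prescribers table with one row per PRACTICE_CODE, and the remaining columns, plus PRACTICE_CODE as a key, form the prescriptions table. `split_epd` is a hypothetical helper.

```python
import pandas as pd

# Columns assumed to describe the prescriber (practice) rather than a prescription line.
PRESCRIBER_COLUMNS = ["PRACTICE_CODE", "PRACTICE_NAME", "REGIONAL_OFFICE_NAME",
                      "REGIONAL_OFFICE_CODE", "ICB_NAME", "ICB_CODE", "PCO_NAME", "PCO_CODE",
                      "ADDRESS_1", "ADDRESS_2", "ADDRESS_3", "ADDRESS_4", "POSTCODE"]


def split_epd(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split one EPD extract into (prescribers, prescriptions).

    Prescribers keep one row per PRACTICE_CODE (the last one seen); prescriptions keep
    PRACTICE_CODE as a foreign key plus every column that is not a prescriber attribute.
    """
    present = [c for c in PRESCRIBER_COLUMNS if c in df.columns]
    prescribers = (df[present]
                   .drop_duplicates("PRACTICE_CODE", keep="last")
                   .reset_index(drop=True))
    fact_columns = ["PRACTICE_CODE"] + [c for c in df.columns if c not in PRESCRIBER_COLUMNS]
    prescriptions = df[fact_columns].reset_index(drop=True)
    return prescribers, prescriptions
```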

In [10]:
df['REGIONAL_OFFICE_CODE'].unique()

array(['Y62', 'Y63', 'Y58', 'Y59', 'Y61', 'Y56', 'Y60', '-'], dtype=object)

In [11]:
df['REGIONAL_OFFICE_NAME'].unique()

array(['NORTH WEST', 'NORTH EAST AND YORKSHIRE', 'SOUTH WEST',
       'SOUTH EAST', 'EAST OF ENGLAND', 'LONDON', 'MIDLANDS',
       'UNIDENTIFIED'], dtype=object)

In [12]:
df[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE']].groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE']).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE |
|---|---|---|
| 0 | NORTH WEST | Y62 |
| 1 | NORTH WEST | Y62 |
| 2 | NORTH EAST AND YORKSHIRE | Y63 |
| 3 | NORTH EAST AND YORKSHIRE | Y63 |
| 4 | NORTH EAST AND YORKSHIRE | Y63 |
| ... | ... | ... |
| 6550461 | UNIDENTIFIED | - |
| 6550462 | UNIDENTIFIED | - |
| 6550463 | UNIDENTIFIED | - |
| 6550464 | UNIDENTIFIED | - |
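`groupby(...).head(10)` returns up to ten of the original rows per group, which is why the output above repeats the same name/code pair many times. To list each region name with its code exactly once, dropping duplicate pairs is enough; `region_lookup` is a hypothetical helper showing this.

```python
import pandas as pd


def region_lookup(df: pd.DataFrame) -> pd.DataFrame:
    """Return each (REGIONAL_OFFICE_CODE, REGIONAL_OFFICE_NAME) pair once, sorted by code."""
    return (df[["REGIONAL_OFFICE_CODE", "REGIONAL_OFFICE_NAME"]]
            .drop_duplicates()
            .sort_values("REGIONAL_OFFICE_CODE")
            .reset_index(drop=True))
```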


a.	Create a dataframe containing the top 10 chemicals prescribed in each region.

In [14]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('NORTH WEST')

In [15]:
df_Y62 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [16]:
df_Y62.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|---|
| 384 | NORTH WEST | Y62 | Enteral nutrition | 351905747.0 |
| 374 | NORTH WEST | Y62 | Emollients | 47989129.0 |
| 839 | NORTH WEST | Y62 | Other emollient preparations | 40477476.0 |
| 843 | NORTH WEST | Y62 | Other food for special diet preparations | 34659858.0 |
| 31 | NORTH WEST | Y62 | Alginic acid compound preparations | 31421966.0 |
| 889 | NORTH WEST | Y62 | Paracetamol | 29887512.0 |
| 249 | NORTH WEST | Y62 | Co-codamol (Codeine phosphate/paracetamol) | 27837904.0 |
| 720 | NORTH WEST | Y62 | Metformin hydrochloride | 22737750.0 |
| 86 | NORTH WEST | Y62 | Atorvastatin | 20539876.0 |
| 615 | NORTH WEST | Y62 | Lactulose | 19753595.0 |


In [17]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('NORTH EAST AND YORKSHIRE')

In [18]:
df_Y63 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [19]:
df_Y63.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|---|
| 397 | NORTH EAST AND YORKSHIRE | Y63 | Enteral nutrition | 313500457.0 |
| 386 | NORTH EAST AND YORKSHIRE | Y63 | Emollients | 63289006.0 |
| 864 | NORTH EAST AND YORKSHIRE | Y63 | Other food for special diet preparations | 46760120.0 |
| 907 | NORTH EAST AND YORKSHIRE | Y63 | Paracetamol | 46449905.0 |
| 35 | NORTH EAST AND YORKSHIRE | Y63 | Alginic acid compound preparations | 43973005.0 |
| 860 | NORTH EAST AND YORKSHIRE | Y63 | Other emollient preparations | 35078907.0 |
| 733 | NORTH EAST AND YORKSHIRE | Y63 | Metformin hydrochloride | 27259048.0 |
| 90 | NORTH EAST AND YORKSHIRE | Y63 | Atorvastatin | 26474443.0 |
| 737 | NORTH EAST AND YORKSHIRE | Y63 | Methadone hydrochloride | 23839482.0 |
| 257 | NORTH EAST AND YORKSHIRE | Y63 | Co-codamol (Codeine phosphate/paracetamol) | 22332058.0 |


In [20]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('SOUTH WEST')

In [21]:
df_Y58 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [22]:
df_Y58.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|---|
| 366 | SOUTH WEST | Y58 | Enteral nutrition | 117213879.0 |
| 355 | SOUTH WEST | Y58 | Emollients | 40226383.0 |
| 854 | SOUTH WEST | Y58 | Paracetamol | 25757715.0 |
| 815 | SOUTH WEST | Y58 | Other food for special diet preparations | 17120163.5 |
| 789 | SOUTH WEST | Y58 | Omeprazole | 15635594.0 |
| 811 | SOUTH WEST | Y58 | Other emollient preparations | 15610380.0 |
| 31 | SOUTH WEST | Y58 | Alginic acid compound preparations | 15505381.0 |
| 81 | SOUTH WEST | Y58 | Atorvastatin | 15399647.0 |
| 694 | SOUTH WEST | Y58 | Metformin hydrochloride | 15054532.0 |
| 238 | SOUTH WEST | Y58 | Co-codamol (Codeine phosphate/paracetamol) | 13046580.0 |


In [23]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('SOUTH EAST')

In [24]:
df_Y59 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [25]:
df_Y59.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|---|
| 384 | SOUTH EAST | Y59 | Enteral nutrition | 263444721.0 |
| 373 | SOUTH EAST | Y59 | Emollients | 55460558.0 |
| 854 | SOUTH EAST | Y59 | Other food for special diet preparations | 38912250.0 |
| 733 | SOUTH EAST | Y59 | Methadone hydrochloride | 34309738.0 |
| 899 | SOUTH EAST | Y59 | Paracetamol | 26529008.0 |
| 850 | SOUTH EAST | Y59 | Other emollient preparations | 25081961.0 |
| 729 | SOUTH EAST | Y59 | Metformin hydrochloride | 24957881.0 |
| 84 | SOUTH EAST | Y59 | Atorvastatin | 21543886.0 |
| 827 | SOUTH EAST | Y59 | Omeprazole | 19197739.0 |
| 248 | SOUTH EAST | Y59 | Co-codamol (Codeine phosphate/paracetamol) | 19144994.0 |


In [26]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('EAST OF ENGLAND')

In [27]:
df_Y61 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [28]:
df_Y61.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|---|
| 384 | EAST OF ENGLAND | Y61 | Enteral nutrition | 193741913.0 |
| 373 | EAST OF ENGLAND | Y61 | Emollients | 41260874.0 |
| 848 | EAST OF ENGLAND | Y61 | Other food for special diet preparations | 25345743.0 |
| 721 | EAST OF ENGLAND | Y61 | Metformin hydrochloride | 19476258.0 |
| 889 | EAST OF ENGLAND | Y61 | Paracetamol | 19237946.0 |
| 845 | EAST OF ENGLAND | Y61 | Other emollient preparations | 17486945.0 |
| 84 | EAST OF ENGLAND | Y61 | Atorvastatin | 17441278.0 |
| 247 | EAST OF ENGLAND | Y61 | Co-codamol (Codeine phosphate/paracetamol) | 15366863.0 |
| 33 | EAST OF ENGLAND | Y61 | Alginic acid compound preparations | 13453583.0 |
| 613 | EAST OF ENGLAND | Y61 | Lactulose | 12858787.0 |


In [29]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('LONDON')

In [30]:
df_Y56 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [31]:
df_Y56.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|---|
| 387 | LONDON | Y56 | Enteral nutrition | 290916061.0 |
| 375 | LONDON | Y56 | Emollients | 72505881.0 |
| 865 | LONDON | Y56 | Other emollient preparations | 32726201.0 |
| 738 | LONDON | Y56 | Metformin hydrochloride | 31774176.0 |
| 869 | LONDON | Y56 | Other food for special diet preparations | 26505017.0 |
| 84 | LONDON | Y56 | Atorvastatin | 20265576.0 |
| 742 | LONDON | Y56 | Methadone hydrochloride | 18058059.0 |
| 32 | LONDON | Y56 | Alginic acid compound preparations | 16876288.0 |
| 913 | LONDON | Y56 | Paracetamol | 16106571.0 |
| 56 | LONDON | Y56 | Amlodipine | 14650890.0 |


In [32]:
df1 = df.groupby('REGIONAL_OFFICE_NAME').get_group('MIDLANDS')

In [33]:
df_Y60 = df1[['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR', 'TOTAL_QUANTITY']]

In [34]:
df_Y60.groupby(['REGIONAL_OFFICE_NAME','REGIONAL_OFFICE_CODE','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False).head(10)

|   | REGIONAL_OFFICE_NAME | REGIONAL_OFFICE_CODE | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|---|
| 391 | MIDLANDS | Y60 | Enteral nutrition | 411086823.0 |
| 380 | MIDLANDS | Y60 | Emollients | 83325531.0 |
| 869 | MIDLANDS | Y60 | Other food for special diet preparations | 57877234.0 |
| 865 | MIDLANDS | Y60 | Other emollient preparations | 45305436.0 |
| 915 | MIDLANDS | Y60 | Paracetamol | 44579438.0 |
| 741 | MIDLANDS | Y60 | Metformin hydrochloride | 35686882.0 |
| 34 | MIDLANDS | Y60 | Alginic acid compound preparations | 34810676.0 |
| 257 | MIDLANDS | Y60 | Co-codamol (Codeine phosphate/paracetamol) | 33297832.0 |
| 87 | MIDLANDS | Y60 | Atorvastatin | 29709322.0 |
| 633 | MIDLANDS | Y60 | Lactulose | 26155255.0 |
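Question a asks for a dataframe, while the cells above display one region at a time without storing the result. A single groupby over all regions followed by a per-region `head` produces one dataframe with the top chemicals of every region (an UNIDENTIFIED group would appear as well and can be filtered out). This is a sketch: `top_chemicals_by_region` is a hypothetical name, and ranking by TOTAL_QUANTITY follows the cells above.

```python
import pandas as pd


def top_chemicals_by_region(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n chemicals with the highest TOTAL_QUANTITY in each region, in one dataframe."""
    keys = ["REGIONAL_OFFICE_NAME", "REGIONAL_OFFICE_CODE", "CHEMICAL_SUBSTANCE_BNF_DESCR"]
    totals = df.groupby(keys, as_index=False, observed=True)["TOTAL_QUANTITY"].sum()
    return (totals
            .sort_values(["REGIONAL_OFFICE_NAME", "TOTAL_QUANTITY"], ascending=[True, False])
            .groupby("REGIONAL_OFFICE_NAME", observed=True)
            .head(n)  # first n rows per region after sorting = the top n
            .reset_index(drop=True))
```

In the notebook, `df_top10 = top_chemicals_by_region(df)` would replace the seven per-region cells.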


b.	Which prescribed chemicals had the highest total cost per month? List the top 10.

In [35]:
df.groupby("CHEMICAL_SUBSTANCE_BNF_DESCR").ACTUAL_COST.sum().reset_index().sort_values('ACTUAL_COST',ascending=False).head(10)

|   | CHEMICAL_SUBSTANCE_BNF_DESCR | ACTUAL_COST |
|---|---|---|
| 77 | Apixaban | 33007440.0 |
| 439 | Enteral nutrition | 24362040.0 |
| 114 | Beclometasone dipropionate | 23779420.0 |
| 1176 | Rivaroxaban | 18781050.0 |
| 208 | Catheters | 12954540.0 |
| 352 | Detection Sensor Interstitial Fluid/Gluc | 10911130.0 |
| 164 | Budesonide | 9732518.0 |
| 1421 | Wound Management & Other Dressings | 9588992.0 |
| 574 | Glucose blood testing reagents | 9230607.0 |
| 419 | Edoxaban | 9123207.0 |
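The cell above sums ACTUAL_COST over the whole dataframe, so it would mix months if several were loaded together. Adding YEAR_MONTH to the grouping ranks the chemicals by cost within each month separately. A sketch; `top_cost_by_month` is a hypothetical helper.

```python
import pandas as pd


def top_cost_by_month(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return, for each YEAR_MONTH, the n chemicals with the highest summed ACTUAL_COST."""
    totals = df.groupby(["YEAR_MONTH", "CHEMICAL_SUBSTANCE_BNF_DESCR"],
                        as_index=False, observed=True)["ACTUAL_COST"].sum()
    return (totals
            .sort_values(["YEAR_MONTH", "ACTUAL_COST"], ascending=[True, False])
            .groupby("YEAR_MONTH", observed=True)
            .head(n)
            .reset_index(drop=True))
```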


c.	What are the most common prescriptions?

In [36]:
df[['BNF_DESCRIPTION','TOTAL_QUANTITY']].groupby('BNF_DESCRIPTION').sum().sort_values(by='TOTAL_QUANTITY', ascending=False).head(10)

| BNF_DESCRIPTION | TOTAL_QUANTITY |
|---|---|
| Ensure Plus milkshake style liquid (9 flavours) | 187541000.0 |
| Paracetamol 500mg tablets | 152879229.0 |
| Fortisip Bottle (8 flavours) | 151562000.0 |
| Fortisip Compact liquid (8 flavours) | 139502750.0 |
| Ensure Compact liquid (4 flavours) | 124729528.0 |
| Lactulose 3.1-3.7g/5ml oral solution | 115093595.0 |
| Metformin 500mg tablets | 111419337.0 |
| Fortisip Compact Protein liquid (9 flavours) | 107179750.0 |
| Omeprazole 20mg gastro-resistant capsules | 99700161.0 |
| Dermol 500 lotion | 83283500.0 |
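TOTAL_QUANTITY adds up units that differ from product to product (for example tablets versus millilitres of a liquid), which may explain why liquid nutrition products dominate the list above. Counting prescription items (the ITEMS column) is another reading of "most common"; the sketch below ranks by it. `most_common_prescriptions` is a hypothetical helper.

```python
import pandas as pd


def most_common_prescriptions(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Rank products by the number of prescription items (ITEMS) rather than TOTAL_QUANTITY."""
    return df.groupby("BNF_DESCRIPTION")["ITEMS"].sum().nlargest(n).reset_index()
```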


d.	Which chemical does each prescriber prescribe the most?

In [37]:
df_pres = df[['PRACTICE_NAME','CHEMICAL_SUBSTANCE_BNF_DESCR','TOTAL_QUANTITY']]

In [38]:
df_asc = df_pres.groupby(['PRACTICE_NAME','CHEMICAL_SUBSTANCE_BNF_DESCR']).TOTAL_QUANTITY.sum().reset_index().sort_values('TOTAL_QUANTITY',ascending=False)

In [39]:
df_prescriber = df_asc.drop_duplicates('PRACTICE_NAME')

In [40]:
df_prescriber.sort_values('TOTAL_QUANTITY',ascending=False)

|   | PRACTICE_NAME | CHEMICAL_SUBSTANCE_BNF_DESCR | TOTAL_QUANTITY |
|---|---|---|---|
| 1548621 | MEDICUS HEALTH PARTNERS | Enteral nutrition | 4253632.0 |
| 443167 | CGL BIRMINGHAM SOUTH | Methadone hydrochloride | 3970225.0 |
| 1600962 | MODALITY PARTNERSHIP (AWC) | Enteral nutrition | 3949145.0 |
| 1573996 | MIDLANDS MEDICAL PARTNERSHIP | Enteral nutrition | 3365624.0 |
| 1812476 | PARK SURGERY | Enteral nutrition | 3291582.0 |
| ... | ... | ... | ... |
| 2023019 | ROTHERHAM DISTRICT NURSING | Eye Products | 1.0 |
| 2356363 | TAMWORTH LOCALITY NETWORK | Salbutamol | 1.0 |
| 995523 | GPSI CYSTOSCOPY - IDLE | Testosterone esters | 1.0 |
| 1727755 | NT&H COMM DIAB N DDES | Glucagon | 1.0 |
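The cells above group by PRACTICE_NAME, which may merge different practices that share a name. A sketch that groups by PRACTICE_CODE instead and keeps each practice's top chemical with `idxmax`; `top_chemical_per_practice` is a hypothetical helper.

```python
import pandas as pd


def top_chemical_per_practice(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per PRACTICE_CODE with the chemical of highest summed TOTAL_QUANTITY."""
    totals = df.groupby(["PRACTICE_CODE", "PRACTICE_NAME", "CHEMICAL_SUBSTANCE_BNF_DESCR"],
                        as_index=False, observed=True)["TOTAL_QUANTITY"].sum()
    # idxmax gives, per practice, the row label of its largest total
    best = totals.loc[totals.groupby("PRACTICE_CODE", observed=True)["TOTAL_QUANTITY"].idxmax()]
    return best.sort_values("TOTAL_QUANTITY", ascending=False).reset_index(drop=True)
```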



e.	How many prescribers were added in the last month?

#### Prescribers vs. prescriptions

Prescribers are the ones who prescribe (a doctor or a hospital), whereas prescriptions are the prescribed items themselves. I looked at the question from both angles. From the prescribers' point of view, which here are the health centres, the number of distinct prescribers did not increase: both the earlier extract and the latest month contain 8387 of them.

In [42]:
df_202207 = pd.read_csv('epd_202207.csv')

In [43]:
df_prescribers = df_202207[['PRACTICE_NAME']]

In [44]:
print(df_prescribers['PRACTICE_NAME'].nunique(), 'prescribing centres (treatment centres / hospitals)')

8387 prescribing centres (treatment centres / hospitals)


In [45]:
df_prescriptions = df_202207[['BNF_DESCRIPTION']]

In [46]:
print(df_prescriptions['BNF_DESCRIPTION'].count(), 'prescription records')

17603900 prescription records
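Equal totals in two months do not rule out one practice closing and another opening. Comparing the sets of PRACTICE_CODE between two extracts counts the prescribers that are actually new. `new_prescribers` is a hypothetical helper; in the notebook it would be called as `new_prescribers(df, df_202207)`.

```python
import pandas as pd


def new_prescribers(previous: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
    """Return the practices present in `current` whose PRACTICE_CODE never appears in `previous`."""
    known = set(previous["PRACTICE_CODE"].dropna())
    added = current.loc[~current["PRACTICE_CODE"].isin(known), ["PRACTICE_CODE", "PRACTICE_NAME"]]
    return added.drop_duplicates("PRACTICE_CODE").reset_index(drop=True)
```

`len(new_prescribers(df, df_202207))` then answers question e directly.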


f.	Which prescribers operate in more than one region? Sort them by the number of regions served.

In [98]:
df_region = df[['PRACTICE_NAME','REGIONAL_OFFICE_NAME']]

In [103]:
df_region.groupby(['PRACTICE_NAME','REGIONAL_OFFICE_NAME']).sum().head()

| PRACTICE_NAME | REGIONAL_OFFICE_NAME |
|---|---|
| (FRACTURE CLINIC) NORTH OOH | MIDLANDS |
| (IRLAM) SALFORD CARE CTRS MEDICAL PRACTI | NORTH WEST |
| (OUT PATIENT DEPARTMENT) NORTH OOH | MIDLANDS |
| 0-19 EAST CHESHIRE HEALTH VISITORS | NORTH WEST |
| 0-19 PUBLIC HEALTH SERVICE HARTLEPOOL | NORTH EAST AND YORKSHIRE |


In [100]:
df_region.groupby(['REGIONAL_OFFICE_NAME','PRACTICE_NAME']).count()

| REGIONAL_OFFICE_NAME | PRACTICE_NAME |
|---|---|
| EAST OF ENGLAND | ABBEY FIELD MEDICAL CENTRE |
| EAST OF ENGLAND | ABBEY ROAD SURGERY |
| EAST OF ENGLAND | ABBOTSWOOD MEDICAL CENTRE |
| EAST OF ENGLAND | ABRIDGE SURGERY |
| EAST OF ENGLAND | ACE LTD OOH |
| ... | ... |
| SOUTH WEST | YELVERTON SURGERY |
| SOUTH WEST | YETMINSTER MEDICAL CENTRE |
| SOUTH WEST | YORKLEIGH SURGERY(CT) |
| SOUTH WEST | YORKLEY HEALTH CENTRE(WG) |


In [101]:
df_pract = df_region.groupby(['PRACTICE_NAME']).count()

In [50]:
df_pract

| PRACTICE_NAME | REGIONAL_OFFICE_NAME (row count) |
|---|---|
| (FRACTURE CLINIC) NORTH OOH | 1188 |
| (IRLAM) SALFORD CARE CTRS MEDICAL PRACTI | 1633 |
| (OUT PATIENT DEPARTMENT) NORTH OOH | 33 |
| 0-19 EAST CHESHIRE HEALTH VISITORS | 1 |
| 0-19 PUBLIC HEALTH SERVICE HARTLEPOOL | 2 |
| ... | ... |
| YOUR HEALTHCARE NON MED PRES | 269 |
| YOXALL | 1837 |
| ZAIN MEDICAL CENTRE | 1060 |
| ZAMAN | 2223 |


In [70]:
df_duplic = df_pract.sort_values('PRACTICE_NAME',ascending=True).duplicated()

In [88]:
df_duplic

PRACTICE_NAME
(FRACTURE CLINIC) NORTH OOH                 False
(IRLAM) SALFORD CARE CTRS MEDICAL PRACTI    False
(OUT PATIENT DEPARTMENT) NORTH OOH          False
0-19 EAST CHESHIRE HEALTH VISITORS          False
0-19 PUBLIC HEALTH SERVICE HARTLEPOOL       False
                                            ...  
YOUR HEALTHCARE NON MED PRES                 True
YOXALL                                       True
ZAIN MEDICAL CENTRE                          True
ZAMAN                                        True
ZETLAND MEDICAL PRACTICE                     True
Length: 8387, dtype: bool

In [87]:
df_duplic.sum()

4694

In [75]:
df_duplic.count()

8387

In [89]:
df1 = df.groupby('PRACTICE_NAME').get_group('ZAMAN')

In [92]:
df_zaman = df1[['REGIONAL_OFFICE_NAME','PRACTICE_NAME']]

In [94]:
print(df_zaman)

        REGIONAL_OFFICE_NAME PRACTICE_NAME
7118248           NORTH WEST         ZAMAN
7118249           NORTH WEST         ZAMAN
7118254           NORTH WEST         ZAMAN
7118256           NORTH WEST         ZAMAN
7118257           NORTH WEST         ZAMAN
...                      ...           ...
7161866           NORTH WEST         ZAMAN
7161867           NORTH WEST         ZAMAN
7161868           NORTH WEST         ZAMAN
7161869           NORTH WEST         ZAMAN
7161870           NORTH WEST         ZAMAN

[2223 rows x 2 columns]


In [None]:
df_n_regions = df.groupby('PRACTICE_NAME')['REGIONAL_OFFICE_NAME'].nunique().sort_values(ascending=False)
df_n_regions[df_n_regions > 1]
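In the cells above, `df_pract` counts prescription rows per practice rather than regions, and `.duplicated()` on those counts flags practices whose row count happens to equal an earlier practice's, so it does not answer question f. Counting distinct regions per practice does. A sketch keyed on PRACTICE_CODE that also lists the regions; `practices_in_several_regions` is a hypothetical helper, and UNIDENTIFIED counts as a region unless it is filtered out first.

```python
import pandas as pd


def practices_in_several_regions(df: pd.DataFrame) -> pd.DataFrame:
    """Return practices seen in more than one region, sorted by number of regions (descending)."""
    per_practice = df.groupby("PRACTICE_CODE", observed=True).agg(
        PRACTICE_NAME=("PRACTICE_NAME", "first"),
        N_REGIONS=("REGIONAL_OFFICE_NAME", "nunique"),
        # comma-separated list of the distinct regions, for readability
        REGIONS=("REGIONAL_OFFICE_NAME", lambda s: ", ".join(sorted(s.dropna().unique()))),
    )
    multi = per_practice[per_practice["N_REGIONS"] > 1]
    return multi.sort_values("N_REGIONS", ascending=False).reset_index()
```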


g.	What is the average price of the chemicals prescribed in the last month collected?

In [None]:
# Average ACTUAL_COST per row in the most recent file loaded (epd_202207.csv)
"£{:,.2f}".format(df_202207['ACTUAL_COST'].mean())

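The mean of ACTUAL_COST averages over rows, and a single row can aggregate several items of one product for one practice. A sketch that restricts the data to the latest YEAR_MONTH and also reports total cost divided by total ITEMS, i.e. an average cost per prescription item; which of the two matches "average price" is an interpretation, and `average_cost_latest_month` is a hypothetical helper.

```python
import pandas as pd


def average_cost_latest_month(df: pd.DataFrame) -> dict:
    """Return the latest YEAR_MONTH with its mean cost per row and its cost per item."""
    latest = df["YEAR_MONTH"].max()
    month = df[df["YEAR_MONTH"] == latest]
    return {
        "YEAR_MONTH": int(latest),
        "mean_cost_per_row": float(month["ACTUAL_COST"].mean()),
        "cost_per_item": float(month["ACTUAL_COST"].sum() / month["ITEMS"].sum()),
    }
```

The values can be displayed in the notebook's currency format with `"£{:,.2f}".format(...)`.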
h.	Generate a table containing only the highest-value prescription of each user.

In [None]:
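# Assumes "user" means the prescriber (practice); idxmax picks each practice's most expensive row
df.loc[df.groupby('PRACTICE_CODE')['ACTUAL_COST'].idxmax(), ['PRACTICE_CODE', 'PRACTICE_NAME', 'BNF_DESCRIPTION', 'ACTUAL_COST']]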