# ASID Lookup Investigation 

We use the ASID lookup file, which the DIR team emails to us on the 1st of every month. The ODS Downloader uses it together with an ODS Portal API call to produce the organisation metadata JSON file: a list of organisations is obtained from the ODS Portal API and then enriched with ASIDs from the ASID lookup file. The organisation metadata JSON file is then used as an input to the data pipeline.

This investigation looks at the differences between earlier and later ASID lookup files.

In [1]:
import pandas as pd

In [2]:
asid_file_location = "s3://prm-gp2gp-data-sandbox-dev/asid-lookup/"
asid_one = pd.read_csv(asid_file_location + "asidLookup-Nov-2020.csv.gz")
asid_two = pd.read_csv(asid_file_location + "asidLookup-Apr-2021.csv.gz")

In [3]:
asid_one = asid_one.loc[asid_one["OrgType"] == "GP Practice"].drop("PName", axis=1)
asid_two = asid_two.loc[asid_two["OrgType"] == "GP Practice"].drop("PName", axis=1)

In [4]:
asid_one.shape[0]

14372

In [5]:
asid_two.shape[0]

14424

In [6]:
len(set(asid_one["ASID"].values).intersection(set(asid_two["ASID"].values)))

14236

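Finding: neither file is a subset of the other. The November 2020 file has 14372 GP practice rows and the April 2021 file has 14424, but only 14236 ASIDs appear in both.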
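
To see which ASIDs are missing from either side, the set comparison can be extended with set differences. This is a minimal sketch on made-up data: `older` and `newer` are hypothetical stand-ins for `asid_one` and `asid_two`, and the ASID and NACS values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for asid_one / asid_two.
older = pd.DataFrame({"ASID": ["001", "002", "003"], "NACS": ["A1", "A2", "A3"]})
newer = pd.DataFrame({"ASID": ["002", "003", "004", "005"], "NACS": ["A2", "A3", "A4", "A5"]})

older_asids = set(older["ASID"])
newer_asids = set(newer["ASID"])

shared = older_asids & newer_asids         # ASIDs present in both files
only_in_older = older_asids - newer_asids  # ASIDs that disappeared
only_in_newer = newer_asids - older_asids  # ASIDs that were added

# Neither set is a subset of the other when both differences are non-empty.
neither_is_subset = bool(only_in_older) and bool(only_in_newer)
print(len(shared), sorted(only_in_older), sorted(only_in_newer), neither_is_subset)
```

Applied to the real frames, the sizes of `only_in_older` and `only_in_newer` would show how many ASIDs were removed and added between the two files.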

In [7]:
unique_asid_rows = pd.concat([asid_one, asid_two]).drop_duplicates()

In [8]:
unique_asid_rows["ASID"].value_counts().value_counts()

1    14439
2      121
Name: ASID, dtype: int64

In [9]:
unique_asid_counts = unique_asid_rows["ASID"].value_counts()
unique_asid_rows_over_one_bool = unique_asid_counts > 1
repeated_asids = unique_asid_counts.loc[unique_asid_rows_over_one_bool].index
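
The reasoning behind the cells above: after concatenating both files and dropping exact duplicate rows, an ASID whose row is identical in both files appears once, while an ASID whose row differs in any column appears twice. A minimal sketch on invented data (the frames and their values are hypothetical, not taken from the real files):

```python
import pandas as pd

# Hypothetical toy data: only ASID "002" differs between the two frames.
older = pd.DataFrame({
    "ASID": ["001", "002", "003"],
    "OrgName": ["ALPHA SURGERY", "BETA SURGERY", "GAMMA SURGERY"],
})
newer = pd.DataFrame({
    "ASID": ["001", "002", "003"],
    "OrgName": ["ALPHA SURGERY", "BETA PRACTICE", "GAMMA SURGERY"],
})

# Identical rows collapse to one; rows that differ in any column survive twice.
unique_rows = pd.concat([older, newer]).drop_duplicates()

# Number of distinct rows per ASID, then the ASIDs with more than one.
rows_per_asid = unique_rows["ASID"].value_counts()
changed_asids = rows_per_asid[rows_per_asid > 1].index.tolist()
print(changed_asids)
```

In the real output, the 121 ASIDs with a count of 2 are therefore the ones whose details changed between November 2020 and April 2021.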

In [10]:
asid_two.set_index("ASID").loc[repeated_asids]

ASID,NACS,OrgName,MName,OrgType,PostCode
200000016875,B84007,BRIG ROYD SURGERY,THE PHOENIX PARTNERSHIP,GP Practice,HX6 4BN
200000018990,B83611,BARKEREND HC - EL ELIWI,THE PHOENIX PARTNERSHIP,GP Practice,BD3 0BS
200000021139,Y00446,MAGHULL PRACTICE,EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),GP Practice,L31 0DJ
200000023662,L84052,SEVERNSIDE MEDICAL PRACTICE,THE PHOENIX PARTNERSHIP,GP Practice,GL1 1XR
762405575015,J81076,THE TOLLERFORD PRACTICE,THE PHOENIX PARTNERSHIP,GP Practice,DT6 5BN
...,...,...,...,...,...
931662763040,P87022,MOCHA PARADE MEDICAL PRACTICE,IN PRACTICE SYSTEMS LTD,GP Practice,M7 3SE
200000017694,D81629,WILLOW TREE SURGERY,THE PHOENIX PARTNERSHIP,GP Practice,PE2 5RQ
200000023104,H84059,THAMESIDE MEDICAL PRACTICE,EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),GP Practice,TW11 8HU
579228803014,E85750,SPRING GROVE MEDICAL PRACTICE,EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),GP Practice,TW7 4HG


In [11]:
asid_one.set_index("ASID").loc[repeated_asids]

ASID,NACS,OrgName,MName,OrgType,PostCode
200000016875,B84007,DR LJ PICKLES AND PARTNERS,THE PHOENIX PARTNERSHIP,GP Practice,HX6 4BN
200000018990,B83611,BARKEREND HC - EL ELIWI,THE PHOENIX PARTNERSHIP,GP Practice,BD3 8QH
200000021139,Y00446,MAGHULL SURGERY,EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),GP Practice,L31 0DJ
200000023662,L84052,GLOUCESTER CITY HEALTH CENTRE,THE PHOENIX PARTNERSHIP,GP Practice,GL1 1XR
762405575015,J81076,THE TOLLERFORD PRACTICE,THE PHOENIX PARTNERSHIP,GP Practice,DT2 0DB
...,...,...,...,...,...
931662763040,P87022,MOCHA PARADE MEDICAL PRACTICE,IN PRACTICE SYSTEMS LTD,GP Practice,M7 1QE
200000017694,D81629,THE WILLOW TREE SURGERY,THE PHOENIX PARTNERSHIP,GP Practice,PE2 5RQ
200000023104,H84059,CHILDS (THAMESIDE),EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),GP Practice,TW11 8HU
579228803014,E85750,SPRING GROVE MEDICAL PRACTICE,EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),GP Practice,TW7 4HQ
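
Comparing the two tables by eye shows that some ASIDs changed organisation name (for example B84007) and others changed postcode (for example B83611). A rough sketch of how the changed columns could be flagged programmatically, assuming one row per ASID in each frame and no missing values; the frames and values below are hypothetical:

```python
import pandas as pd

# Hypothetical toy data: "002" changes name, "003" changes postcode.
older = pd.DataFrame({
    "ASID": ["001", "002", "003"],
    "OrgName": ["ALPHA SURGERY", "BETA SURGERY", "GAMMA SURGERY"],
    "PostCode": ["AA1 1AA", "BB2 2BB", "CC3 3CC"],
})
newer = pd.DataFrame({
    "ASID": ["001", "002", "003"],
    "OrgName": ["ALPHA SURGERY", "BETA PRACTICE", "GAMMA SURGERY"],
    "PostCode": ["AA1 1AA", "BB2 2BB", "CC3 9ZZ"],
})

# Put old and new values side by side for each ASID present in both frames.
merged = older.merge(newer, on="ASID", suffixes=("_old", "_new"))

# One boolean column per compared field: True where the value changed.
compared_columns = ["OrgName", "PostCode"]
changes = pd.DataFrame({
    col: merged[f"{col}_old"] != merged[f"{col}_new"] for col in compared_columns
})
changes.insert(0, "ASID", merged["ASID"])

# Keep only the ASIDs where at least one compared field changed.
changed_only = changes[changes[compared_columns].any(axis=1)]
print(changed_only)
```

Note that `!=` treats two missing values as different, so real data with gaps would need extra handling.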


## Outcome
- Produce an ASID lookup file that contains the most recent entry for all ASIDs

In [19]:
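# Import the ASID lookup files, listed from oldest to newest
asid_file_location = "s3://prm-gp2gp-data-sandbox-dev/asid-lookup/"
asid_files = [
    "asidLookup-Nov-2020.csv.gz",
    "asidLookup-Dec-2020.csv.gz",
    "asidLookup-Jan-2021.csv.gz",
    "asidLookup-Feb-2021.csv.gz",
    "asidLookup-Mar-2021.csv.gz",
    "asidLookup-Apr-2021.csv.gz",
]
asid_lookup_files = [asid_file_location + f for f in asid_files]
asid_lookup = pd.concat((
    pd.read_csv(f)
    for f in asid_lookup_files
))
# concat preserves file order, so the last row per ASID comes from the newest file.
# keep="last" retains the newest copy of a duplicated row; the default (keep="first")
# would drop it and could leave a stale row last if an entry changed and then reverted.
# Note: groupby(...).last() takes the last non-null value in each column.
asid_lookup = asid_lookup.drop_duplicates(keep="last").groupby("ASID").last().reset_index()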
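
One detail worth knowing about this approach: `groupby(...).last()` returns the last non-null value in each column, so if the newest row for an ASID has a missing field, the result mixes in the value from an older row. Whether the real files contain missing values has not been checked here. The sketch below uses invented data to contrast this with `drop_duplicates(subset="ASID", keep="last")`, which keeps the newest row as a whole:

```python
import numpy as np
import pandas as pd

# Hypothetical combined frame, rows ordered oldest to newest.
# The newest row for ASID "001" has a missing PostCode.
combined = pd.DataFrame({
    "ASID": ["001", "001", "002"],
    "OrgName": ["OLD NAME", "NEW NAME", "OTHER SURGERY"],
    "PostCode": ["AA1 1AA", np.nan, "BB2 2BB"],
})

# Last non-null value per column: PostCode falls back to the older row.
via_groupby = combined.groupby("ASID").last().reset_index()

# Last row per ASID, taken whole: PostCode stays missing.
via_drop = (
    combined.drop_duplicates(subset="ASID", keep="last")
    .sort_values("ASID")
    .reset_index(drop=True)
)

print(via_groupby)
print(via_drop)
```

Which behaviour is preferable depends on whether a missing field in a newer file should be read as "unchanged" or as "removed".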

In [20]:
asid_lookup

Unnamed: 0,ASID,NACS,OrgName,MName,PName,OrgType,PostCode
0,000032357014,K82022,KINGSWOOD SURGERY,EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),EMIS Web,GP Practice,HP13 7UN
1,000083218040,FQG85,THE WILLOWS PHARMACY,AAH PHARMACEUTICALS,ProScript Link,Pharmacy,BN27 4LE
2,000166272047,A83031,CARMEL MEDICAL PRACTICE,THE PHOENIX PARTNERSHIP,SystmOne,GP Practice,DL3 8SQ
3,000226234045,FDC52,DAY LEWIS PHARMACY,RX SYSTEMS,ProScript Connect,Pharmacy,SP10 1HF
4,000327466041,FXX85,PRESCRIPTIONS FIRST,RX SYSTEMS,ProScript Connect,Pharmacy,PR8 6QL
...,...,...,...,...,...,...,...
38492,999707455039,FK696,LLOYDSPHARMACY,LLOYDS PHARMACY LTD,CoMPaSS,Pharmacy,S11 8HN
38493,999756796044,P91627,LOSTOCK MEDICAL CENTRE,EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS),EMIS Web,GP Practice,M32 9PA
38494,999757880037,FJL19,LLOYDSPHARMACY,LLOYDS PHARMACY LTD,CoMPaSS,Pharmacy,FY5 5HT
38495,999785485049,FWC78,DUKES PHARMACY,HELIX HEALTH,PharmaSys,Pharmacy,RG45 6DS
