# NPI Lookup Tool: Summary

Owner/Publisher: Eric Kim, eric@kimanalytics.com

Many healthcare companies deal with missing data, such as NPIs, on a regular basis. The NPI is an identifier for providers that must be submitted to facilitators such as CMS for the submitting healthcare company to receive reimbursement payments. Missing NPIs can mean missed reimbursements and lower submission rates. This tool, developed in Python, uses the API from the NPPES NPI registry to look up missing NPIs in real time.

## NPPES NPI Registry

https://npiregistry.cms.hhs.gov/

The API is a new, faster alternative to the downloadable NPPES data files. It allows systems to access NPPES public data in real-time, rather than through batched uploads. The API retrieves data from NPPES daily.

In [1]:
import json # to read json files from NPPES API
import pandas as pd # to work with dataframes
import numpy as np # to work with numbers

## Load NPI file given to us by submissions team

This file will have both clean and missing NPIs

In [2]:
npi = pd.read_csv('npi.csv') # Use pandas to load csv
npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI
0,Jimmy,Zhao,NY,10000000000.0
1,Aru,Krishna,GA,94607.0
2,Niki,Desai,GA,
3,John,Rodgers,,
4,Ida,Kwok,CA,4555894000.0
5,Arun,Nepal,CA,8102774000.0
6,Marieta,Valev,CA,2062961000.0
7,James,Van Dyne,CA,5171034000.0
8,Zigui,Li,CA,2136934000.0


Simple data cleansing

In [3]:
npi['NPI'] = npi['NPI'].fillna(0) # Fill nan to zeros so that can convert to integer
npi['NPI'] = npi['NPI'].astype(np.int64) # convert to integer
npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI
0,Jimmy,Zhao,NY,9999999999
1,Aru,Krishna,GA,94607
2,Niki,Desai,GA,0
3,John,Rodgers,,0
4,Ida,Kwok,CA,4555893625
5,Arun,Nepal,CA,8102774266
6,Marieta,Valev,CA,2062961130
7,James,Van Dyne,CA,5171034323
8,Zigui,Li,CA,2136934325


## Adding NPI format types

Coders usually enter all "9's" for charts that are missing an NPI; on other occasions, they may mistake a zip code for the NPI. We provide a simple filter that marks any record above or below the 10-digit standard as missing, and all "9's" as missing as well. On rare occasions, a coder might enter the provider's 10-digit phone number as the NPI. This data integrity issue could be solved by matching the NPI against the provider's phone numbers, but the NPI and phone number could also be flipped. This happens extremely rarely, so we will assume 99% confidence in records that pass this filter.

In [4]:
npi['Type'] = "" # Add new blank column

for x in range(len(npi)):
    if str(npi['NPI'][x]) == '9999999999': # All 9's will be missing
        npi.at[x,'Type'] = 'Missing'
    elif len(str(npi['NPI'][x])) != 10: # Any record above or below 10 digits is invalid
        npi.at[x,'Type'] = 'Missing'
    else:
        npi.at[x,'Type'] = 'Clean' # Else, clean
        
npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type
0,Jimmy,Zhao,NY,9999999999,Missing
1,Aru,Krishna,GA,94607,Missing
2,Niki,Desai,GA,0,Missing
3,John,Rodgers,,0,Missing
4,Ida,Kwok,CA,4555893625,Clean
5,Arun,Nepal,CA,8102774266,Clean
6,Marieta,Valev,CA,2062961130,Clean
7,James,Van Dyne,CA,5171034323,Clean
8,Zigui,Li,CA,2136934325,Clean
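
The length filter above cannot catch a 10-digit phone number entered as an NPI. One extra check that may help, and that is not part of the original tool, is the NPI check digit: the last digit of an NPI is a Luhn check digit computed over the number with the prefix 80840 added. The sketch below is a minimal version of that check; the function name `has_valid_npi_check_digit` is hypothetical.

```python
def has_valid_npi_check_digit(npi):
    """Return True if `npi` has 10 digits and passes the Luhn check with the 80840 prefix."""
    digits = str(npi)
    if len(digits) != 10 or not digits.isdigit():
        return False
    total = 0
    # Walk right to left, doubling every second digit (Luhn algorithm).
    for position, char in enumerate(reversed('80840' + digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


print(has_valid_npi_check_digit(1760877484))  # NPI returned by the registry later in this notebook
print(has_valid_npi_check_digit(9999999999))  # placeholder entered by coders
```

A record that fails this check could be marked 'Missing' in the same way as a record with the wrong length. A record that passes is not guaranteed to belong to the right provider, so the registry lookup is still needed.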


## Splitting dataframes into "Clean" and "Missing"

We will work with the missing NPI records first and combine them with the clean records later.

In [5]:
clean_npi = npi[npi['Type'] == 'Clean']
clean_npi = clean_npi.reset_index(drop=True)
clean_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type
0,Ida,Kwok,CA,4555893625,Clean
1,Arun,Nepal,CA,8102774266,Clean
2,Marieta,Valev,CA,2062961130,Clean
3,James,Van Dyne,CA,5171034323,Clean
4,Zigui,Li,CA,2136934325,Clean


In [6]:
missing_npi = npi[npi['Type'] == 'Missing']
missing_npi = missing_npi.fillna('')
missing_npi = missing_npi.reset_index(drop=True)
missing_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type
0,Jimmy,Zhao,NY,9999999999,Missing
1,Aru,Krishna,GA,94607,Missing
2,Niki,Desai,GA,0,Missing
3,John,Rodgers,,0,Missing


Setting the confidence level for missing records to 0%

In [7]:
for x in range(len(missing_npi)):
    missing_npi.at[x, 'Confidence'] = 0.00 # 0% confidence level for missing records, will update as we find NPIs
    
missing_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Jimmy,Zhao,NY,9999999999,Missing,0.0
1,Aru,Krishna,GA,94607,Missing,0.0
2,Niki,Desai,GA,0,Missing,0.0
3,John,Rodgers,,0,Missing,0.0


## Starting with first missing NPI

Here we go through the 4 examples of missing NPIs on a case-by-case basis. This logic is an extremely simplified version of the original. The purpose is to show in detail how cases are resolved.

In [8]:
n = 0
missing_npi.loc[n]

Provider First Name         Jimmy
Provider Last Name           Zhao
Provider State                 NY
NPI                    9999999999
Type                      Missing
Confidence                      0
Name: 0, dtype: object

In [9]:
http = "https://npiregistry.cms.hhs.gov/api/?"

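fn = missing_npi.loc[n, 'Provider First Name'] # select by column label rather than position
ln = missing_npi.loc[n, 'Provider Last Name']
state = missing_npi.loc[n, 'Provider State']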

h = (http + 'first_name=' + fn + '&last_name=' + ln + '&state=' + state +'&version=2.1')
h

'https://npiregistry.cms.hhs.gov/api/?first_name=Jimmy&last_name=Zhao&state=NY&version=2.1'
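
Plain string concatenation works for the names in this file, but a last name such as "Van Dyne" contains a space that should be escaped before it goes into a URL. The sketch below shows one way to build the same query with `urllib.parse.urlencode` from the standard library. The helper name `build_query_url` is hypothetical, and leaving out a blank state is an assumption about how the query should behave rather than something the original tool does.

```python
from urllib.parse import urlencode

NPPES_API = "https://npiregistry.cms.hhs.gov/api/?"  # same endpoint as in the cell above


def build_query_url(first_name, last_name, state=""):
    """Build an NPPES query URL, escaping spaces and other special characters."""
    params = {"first_name": first_name, "last_name": last_name}
    if state:  # assumption: leave the state filter out when it is blank
        params["state"] = state
    params["version"] = "2.1"
    # safe="*" keeps a trailing wildcard readable instead of encoding it as %2A
    return NPPES_API + urlencode(params, safe="*")


print(build_query_url("Jimmy", "Zhao", "NY"))
print(build_query_url("James", "Van Dyne", "CA"))
```

For Jimmy Zhao this produces the same URL as the concatenation above; for "Van Dyne" the space is encoded as `+`.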

In [10]:
import urllib.request, json # to read json file
with urllib.request.urlopen(h) as url:
    provider = json.loads(url.read().decode())
    
provider

{'result_count': 1,
 'results': [{'enumeration_type': 'NPI-1',
   'number': 1760877484,
   'last_updated_epoch': 1427760000,
   'created_epoch': 1427760000,
   'basic': {'first_name': 'JIMMY',
    'last_name': 'ZHAO',
    'middle_name': 'LIU',
    'credential': 'M.D., PH.D.',
    'sole_proprietor': 'YES',
    'gender': 'M',
    'enumeration_date': '2015-03-31',
    'last_updated': '2015-03-31',
    'status': 'A',
    'name': 'ZHAO JIMMY'},
   'other_names': [],
   'addresses': [{'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'LOCATION',
     'address_type': 'DOM',
     'address_1': '505 E 70TH ST',
     'address_2': 'WEILL CORNELL INTERNAL MEDICINE ASSOCIATES',
     'city': 'NEW YORK',
     'state': 'NY',
     'postal_code': '100214872',
     'telephone_number': '212-746-3587',
     'fax_number': '212-746-8051'},
    {'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'MAILING',
     'address_type': 'DOM',
     'addr

Since only one result is returned from the NPPES NPI registry, we assume that it is correct with 95% confidence.

In [11]:
if provider['result_count'] == 1:
    missing_npi.at[n, 'NPI'] = provider['results'][0]['number']
    missing_npi.at[n, 'Type'] = 'Found'
    missing_npi.at[n, 'Confidence'] = 0.95

In [12]:
missing_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Jimmy,Zhao,NY,1760877484,Found,0.95
1,Aru,Krishna,GA,94607,Missing,0.0
2,Niki,Desai,GA,0,Missing,0.0
3,John,Rodgers,,0,Missing,0.0


## Second case, first and last names are flipped

If the provider can't be located on the NPPES NPI registry, the provider's first and last names may have been switched. Here we see what is going on and how this case is resolved.

In [13]:
n = 1
missing_npi.loc[n]

Provider First Name        Aru
Provider Last Name     Krishna
Provider State              GA
NPI                      94607
Type                   Missing
Confidence                   0
Name: 1, dtype: object

In [14]:
http = "https://npiregistry.cms.hhs.gov/api/?"

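fn = missing_npi.loc[n, 'Provider First Name'] # select by column label rather than position
ln = missing_npi.loc[n, 'Provider Last Name']
state = missing_npi.loc[n, 'Provider State']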

h = (http + 'first_name=' + fn + '&last_name=' + ln + '&state=' + state +'&version=2.1')
h

'https://npiregistry.cms.hhs.gov/api/?first_name=Aru&last_name=Krishna&state=GA&version=2.1'

If 0 results are returned, we will use an if-then to try flipping the order of the names.

In [15]:
import urllib.request, json # to read json file
with urllib.request.urlopen(h) as url:
    provider = json.loads(url.read().decode())
    
provider

{'result_count': 0, 'results': []}

In [16]:
if provider['result_count'] == 0:
    fn, ln = ln, fn # swap first and last name
    # retry with a trailing * wildcard on each name
    h = (http + 'first_name=' + fn + '*&last_name=' + ln + '*&state=' + state +'&version=2.1')

Results are found when the names are flipped. Note that the query above appends a trailing asterisk (*) wildcard to each name, so the NPPES NPI registry matches names that begin with the given text and finds the formal name "Krishnan Arunachalam" for Krishna Aru.

In [17]:
import urllib.request, json # to read json file
with urllib.request.urlopen(h) as url:
    provider = json.loads(url.read().decode())
    
provider

{'result_count': 1,
 'results': [{'enumeration_type': 'NPI-1',
   'number': 1013995935,
   'last_updated_epoch': 1183852800,
   'created_epoch': 1136246400,
   'basic': {'name_prefix': 'DR.',
    'first_name': 'KRISHNAN',
    'last_name': 'ARUNACHALAM',
    'credential': 'M.D.',
    'sole_proprietor': 'YES',
    'gender': 'M',
    'enumeration_date': '2006-01-03',
    'last_updated': '2007-07-08',
    'status': 'A',
    'name': 'ARUNACHALAM KRISHNAN'},
   'other_names': [],
   'addresses': [{'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'LOCATION',
     'address_type': 'DOM',
     'address_1': '2012 OCILLA HWY',
     'address_2': '',
     'city': 'DOUGLAS',
     'state': 'GA',
     'postal_code': '315332232',
     'telephone_number': '912-384-7822',
     'fax_number': '912-383-9542'},
    {'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'MAILING',
     'address_type': 'DOM',
     'address_1': 'PO BOX 1650',
     

In [18]:
if provider['result_count'] == 1:
    missing_npi.at[n, 'NPI'] = provider['results'][0]['number']
    missing_npi.at[n, 'Type'] = 'Found'
    missing_npi.at[n, 'Confidence'] = 0.95

In [19]:
missing_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Jimmy,Zhao,NY,1760877484,Found,0.95
1,Aru,Krishna,GA,1013995935,Found,0.95
2,Niki,Desai,GA,0,Missing,0.0
3,John,Rodgers,,0,Missing,0.0
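
The first two cases can be folded into one helper: query by name, and if nothing comes back, retry with the names swapped and a trailing wildcard. The sketch below is a simplified illustration, not the original tool's logic. The names `lookup_with_name_flip` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for the real HTTP call so the example runs without network access.

```python
def lookup_with_name_flip(first_name, last_name, state, fetch):
    """Query by name; if nothing is found, retry with the names swapped and wildcards added.

    `fetch` is any callable that takes (first_name, last_name, state) and
    returns the decoded JSON response as a dict.
    """
    provider = fetch(first_name, last_name, state)
    if provider["result_count"] == 0:
        provider = fetch(last_name + "*", first_name + "*", state)
    return provider


def fake_fetch(first_name, last_name, state):
    # Stand-in for the registry call; only knows the flipped query from this case.
    if (first_name, last_name) == ("Krishna*", "Aru*"):
        return {"result_count": 1, "results": [{"number": 1013995935}]}
    return {"result_count": 0, "results": []}


provider = lookup_with_name_flip("Aru", "Krishna", "GA", fake_fetch)
print(provider["result_count"], provider["results"][0]["number"])
```

Passing the fetch function in as an argument keeps the retry logic separate from the network code, which also makes it easy to test.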


## 2 NPIs for one Provider

In this case, we have 2 NPIs for one provider. We will use claims to get the DX codes associated with the specialty of the provider. We have a proprietary table of unique DX codes that only the designated specialist could capture, for example, eye-related diagnoses for optometrists and kidney-related diagnoses for urologists.

In [20]:
n = 2
missing_npi.loc[n]

Provider First Name       Niki
Provider Last Name       Desai
Provider State              GA
NPI                          0
Type                   Missing
Confidence                   0
Name: 2, dtype: object

In [21]:
http = "https://npiregistry.cms.hhs.gov/api/?"

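fn = missing_npi.loc[n, 'Provider First Name'] # select by column label rather than position
ln = missing_npi.loc[n, 'Provider Last Name']
state = missing_npi.loc[n, 'Provider State']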

h = (http + 'first_name=' + fn + '*&last_name=' + ln + '*&state=' + state +'&version=2.1')
h

'https://npiregistry.cms.hhs.gov/api/?first_name=Niki*&last_name=Desai*&state=GA&version=2.1'

2 records are returned. We will use claims paired with the unique DX table to find out which specialty the provider is associated with.

In [22]:
import urllib.request, json # to read json file
with urllib.request.urlopen(h) as url:
    provider = json.loads(url.read().decode())
    
provider

{'result_count': 2,
 'results': [{'enumeration_type': 'NPI-1',
   'number': 1659813756,
   'last_updated_epoch': 1478547480,
   'created_epoch': 1478476800,
   'basic': {'first_name': 'NIKITA',
    'last_name': 'DESAI',
    'sole_proprietor': 'NO',
    'gender': 'F',
    'enumeration_date': '2016-11-07',
    'last_updated': '2016-11-07',
    'status': 'A',
    'name': 'DESAI NIKITA'},
   'other_names': [],
   'addresses': [{'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'LOCATION',
     'address_type': 'DOM',
     'address_1': '318 MALL BLVD # 600',
     'address_2': '',
     'city': 'SAVANNAH',
     'state': 'GA',
     'postal_code': '314064797',
     'telephone_number': '912-356-3833'},
    {'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'MAILING',
     'address_type': 'DOM',
     'address_1': '318 MALL BLVD # 600',
     'address_2': '',
     'city': 'SAVANNAH',
     'state': 'GA',
     'postal_code': '31406479

In [23]:
if provider['result_count'] > 1:
    npis = []
    specialties = []
    for x in range(provider['result_count']):
        npis.append(provider['results'][x]['number'])
        specialties.append(provider['results'][x]['taxonomies'][0]['desc'])

One provider is an optometrist, who is more likely to capture eye-related DX codes, while the other, a student, is more likely to capture generic DX codes such as Z00.00 (Encounter for general adult medical exam w/o abnormal findings).

In [24]:
cols = ['NPI', 'Specialty']
df = pd.DataFrame(list(zip(npis, specialties)), columns = cols) 
df 

Unnamed: 0,NPI,Specialty
0,1659813756,Optometrist
1,1972130839,Student in an Organized Health Care Education/...


Going into claims to identify which DX code our provider administered

In [25]:
claims = pd.read_csv('claims.csv')
claims = claims[claims['Provider First Name'] == 'Niki']
claims = claims[claims['Provider Last Name'] == 'Desai']
claims

Unnamed: 0,Claim ID,Member First Name,Member Last Name,Provider First Name,Provider Last Name,Provider NPI,DX Code
0,72347524542,Eric,Kim,Niki,Desai,,H52.201


Comparing the DX codes with the specialty. This proprietary table includes DX codes that can only be administered by the listed specialties.

In [26]:
unique_dxs = pd.read_csv('unique_dxs.csv')
unique_dxs

Unnamed: 0,ICD-10,Description,Specialty
0,H52.201,"Unspecified astigmatism, right eye",Optometrist
1,H52.202,"Unspecified astigmatism, left eye",Optometrist
2,H52.203,"Unspecified astigmatism, bilateral",Optometrist
3,C64.1,"Malignant neoplasm of right kidney, except ren...",Urologist
4,C64.2,"Malignant neoplasm of left kidney, except rena...",Urologist


Doing an inner join to get associated specialty

In [27]:
left = claims
right = unique_dxs

result = pd.merge(left, right, left_on='DX Code', right_on ='ICD-10')
result

Unnamed: 0,Claim ID,Member First Name,Member Last Name,Provider First Name,Provider Last Name,Provider NPI,DX Code,ICD-10,Description,Specialty
0,72347524542,Eric,Kim,Niki,Desai,,H52.201,H52.201,"Unspecified astigmatism, right eye",Optometrist


Specialty found: Optometrist

In [28]:
df[df['Specialty'] == result['Specialty'][0]]

Unnamed: 0,NPI,Specialty
0,1659813756,Optometrist


Taking that NPI to fill in missing NPI table

In [29]:
missing_npi.at[n, 'NPI'] = df[df['Specialty'] == result['Specialty'][0]]['NPI'].iloc[0] # first matching row, by position
missing_npi.at[n, 'Type'] = 'Found'
missing_npi.at[n, 'Confidence'] = 0.95

In [30]:
missing_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Jimmy,Zhao,NY,1760877484,Found,0.95
1,Aru,Krishna,GA,1013995935,Found,0.95
2,Niki,Desai,GA,1659813756,Found,0.95
3,John,Rodgers,,0,Missing,0.0
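
The steps in this case (join claims to the unique DX table, then filter the candidate NPIs by the resulting specialty) can be wrapped in one function. The sketch below is a simplified illustration that rebuilds small versions of the tables shown above; the function name `pick_npi_by_specialty` is hypothetical, and returning `None` when no single candidate remains is an assumption about how the caller wants to handle that case.

```python
import pandas as pd


def pick_npi_by_specialty(candidates, claims, unique_dxs):
    """Return the single candidate NPI whose specialty matches a claim DX code, else None."""
    matched = pd.merge(claims, unique_dxs, left_on="DX Code", right_on="ICD-10")
    specialties = matched["Specialty"].unique()
    hits = candidates[candidates["Specialty"].isin(specialties)]
    if len(hits) == 1:
        return int(hits["NPI"].iloc[0])
    return None  # zero or several candidates remain, so another rule is needed


candidates = pd.DataFrame({
    "NPI": [1659813756, 1972130839],
    "Specialty": ["Optometrist", "Student in an Organized Health Care Education/..."],
})
claims = pd.DataFrame({"Claim ID": [72347524542], "DX Code": ["H52.201"]})
unique_dxs = pd.DataFrame({
    "ICD-10": ["H52.201", "C64.1"],
    "Specialty": ["Optometrist", "Urologist"],
})
print(pick_npi_by_specialty(candidates, claims, unique_dxs))
```

Using `.iloc[0]` selects the first matching row by position, so the lookup does not depend on the matching row happening to carry index label 0.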


## Final case, many NPIs for common provider with common DX

In [31]:
n = 3
missing_npi.loc[n]

Provider First Name       John
Provider Last Name     Rodgers
Provider State                
NPI                          0
Type                   Missing
Confidence                   0
Name: 3, dtype: object

In [32]:
http = "https://npiregistry.cms.hhs.gov/api/?"

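fn = missing_npi.loc[n, 'Provider First Name'] # select by column label rather than position
ln = missing_npi.loc[n, 'Provider Last Name']
state = missing_npi.loc[n, 'Provider State']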

h = (http + 'first_name=' + fn + '&last_name=' + ln + '&state=' + state +'&version=2.1')
h

'https://npiregistry.cms.hhs.gov/api/?first_name=John&last_name=Rodgers&state=&version=2.1'

In [33]:
import urllib.request, json # to read json file
with urllib.request.urlopen(h) as url:
    provider = json.loads(url.read().decode())
    
provider

{'result_count': 10,
 'results': [{'enumeration_type': 'NPI-1',
   'number': 1871589465,
   'last_updated_epoch': 1183852800,
   'created_epoch': 1127174400,
   'basic': {'name_prefix': 'DR.',
    'first_name': 'JOHN',
    'last_name': 'RODGERS',
    'middle_name': 'BARCLAY',
    'name_suffix': 'JR.',
    'credential': 'M.D.',
    'sole_proprietor': 'NO',
    'gender': 'M',
    'enumeration_date': '2005-09-20',
    'last_updated': '2007-07-08',
    'status': 'A',
    'name': 'RODGERS JOHN'},
   'other_names': [],
   'addresses': [{'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'LOCATION',
     'address_type': 'DOM',
     'address_1': '47 NEW SCOTLAND AVE',
     'address_2': '',
     'city': 'ALBANY',
     'state': 'NY',
     'postal_code': '122083412',
     'telephone_number': '518-262-5276',
     'fax_number': '518-262-6470'},
    {'country_code': 'US',
     'country_name': 'United States',
     'address_purpose': 'MAILING',
     'address_type': 'D

In [34]:
if provider['result_count'] > 1:
    npis = []
    specialties = []
    for x in range(provider['result_count']):
        npis.append(provider['results'][x]['number'])
        specialties.append(provider['results'][x]['taxonomies'][0]['desc'])

List of NPIs and specialties that the NPPES NPI registry returned

In [35]:
cols = ['NPI', 'Specialty']
df = pd.DataFrame(list(zip(npis, specialties)), columns = cols) 
df 

Unnamed: 0,NPI,Specialty
0,1871589465,Internal Medicine Gastroenterology
1,1588641112,Internal Medicine
2,1790745560,Orthopaedic Surgery Orthopaedic Trauma
3,1932164548,Occupational Therapist
4,1184689473,Urology
5,1134230659,Psychiatry & Neurology Psychiatry
6,1336274182,Pediatrics
7,1053633362,Pharmacist
8,1578868816,Marriage & Family Therapist
9,1215079660,Social Worker


Our claims returned an extremely generic DX Code that possibly any specialist could administer

In [36]:
claims = pd.read_csv('claims.csv')
claims = claims[claims['Provider First Name'] == 'John']
claims = claims[claims['Provider Last Name'] == 'Rodgers']
claims

Unnamed: 0,Claim ID,Member First Name,Member Last Name,Provider First Name,Provider Last Name,Provider NPI,DX Code
1,32789368377,Ivy,Kim,John,Rodgers,,Z00.00


In [37]:
unique_dxs = pd.read_csv('unique_dxs.csv')
unique_dxs

Unnamed: 0,ICD-10,Description,Specialty
0,H52.201,"Unspecified astigmatism, right eye",Optometrist
1,H52.202,"Unspecified astigmatism, left eye",Optometrist
2,H52.203,"Unspecified astigmatism, bilateral",Optometrist
3,C64.1,"Malignant neoplasm of right kidney, except ren...",Urologist
4,C64.2,"Malignant neoplasm of left kidney, except rena...",Urologist


No results returned

In [38]:
left = claims
right = unique_dxs

result = pd.merge(left, right, left_on='DX Code', right_on ='ICD-10')
result

Unnamed: 0,Claim ID,Member First Name,Member Last Name,Provider First Name,Provider Last Name,Provider NPI,DX Code,ICD-10,Description,Specialty


### Assumption

Since we couldn't narrow down the specialty and have no other data to narrow it down by, we simply take the record at the top of the list and treat every returned record as equally likely. In this case we have 10 records, so there is a 1/10 = 10% chance that this record is a hit (correct). Since we assume, as a safety measure, that the NPPES NPI registry is only 95% accurate, we also multiply that factor into the total confidence. Thus, for our provider, the confidence is (95%) x (1/10 records) = 9.5%.

In [39]:
missing_npi.at[n, 'NPI'] = df['NPI'][0]
missing_npi.at[n, 'Type'] = 'More than 1:1 Match'
missing_npi.at[n, 'Confidence'] = (0.95) * (1/len(df))

In [40]:
missing_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Jimmy,Zhao,NY,1760877484,Found,0.95
1,Aru,Krishna,GA,1013995935,Found,0.95
2,Niki,Desai,GA,1659813756,Found,0.95
3,John,Rodgers,,1871589465,More than 1:1 Match,0.095
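
The confidence rule used in this notebook can be written as a small function. This is a sketch of the assumption stated above (every returned record is equally likely, scaled by the assumed 95% registry accuracy); the function name `match_confidence` and the handling of zero candidates are hypothetical.

```python
def match_confidence(n_candidates, registry_confidence=0.95):
    """Confidence of picking one record out of `n_candidates` equally likely registry hits."""
    if n_candidates < 1:
        return 0.0  # nothing returned, so nothing to be confident about
    return registry_confidence * (1 / n_candidates)


print(match_confidence(1))   # single result, as in the first two cases
print(match_confidence(10))  # ten results, as in the final case
```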


## Confidence level for Coder-entry

We are assuming that only 1 out of every 100 NPI records that our coders enter is incorrect. Thus, we assume a 99% confidence level.

In [41]:
for x in range(len(clean_npi)):
    clean_npi.at[x, 'Confidence'] = 0.990
    
clean_npi

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Ida,Kwok,CA,4555893625,Clean,0.99
1,Arun,Nepal,CA,8102774266,Clean,0.99
2,Marieta,Valev,CA,2062961130,Clean,0.99
3,James,Van Dyne,CA,5171034323,Clean,0.99
4,Zigui,Li,CA,2136934325,Clean,0.99


Combining with our found NPI records

In [42]:
found = missing_npi[missing_npi['Type'] == 'Found']
clean = clean_npi
many = missing_npi[missing_npi['Type'] != 'Found']

frames = [clean, found]

npis = pd.concat(frames)
npis = npis.reset_index(drop=True)
npis = npis.sort_values(by=['Confidence'], ascending=False)
npis

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Ida,Kwok,CA,4555893625,Clean,0.99
1,Arun,Nepal,CA,8102774266,Clean,0.99
2,Marieta,Valev,CA,2062961130,Clean,0.99
3,James,Van Dyne,CA,5171034323,Clean,0.99
4,Zigui,Li,CA,2136934325,Clean,0.99
5,Jimmy,Zhao,NY,1760877484,Found,0.95
6,Aru,Krishna,GA,1013995935,Found,0.95
7,Niki,Desai,GA,1659813756,Found,0.95


## Average confidence

Following compliance regulations for submissions, suppose we have a 98.0% accuracy rate that we have to achieve. We take the average confidence level of our list and add or subtract records to meet or beat our accuracy rate.

In [43]:
npis['Confidence'].mean()

0.9750000000000001

Our average confidence for the total list is 97.5%. We have to take out the lowest-confidence records one by one until we hit the 98.0% level.

In [44]:
for x in range(len(npis)):
    if npis['Confidence'].mean() < 0.98:
        npis = npis[:-1]

After running the loop, we lose the 2 lowest-confidence records and finish with 98.3% confidence.

In [45]:
npis

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Ida,Kwok,CA,4555893625,Clean,0.99
1,Arun,Nepal,CA,8102774266,Clean,0.99
2,Marieta,Valev,CA,2062961130,Clean,0.99
3,James,Van Dyne,CA,5171034323,Clean,0.99
4,Zigui,Li,CA,2136934325,Clean,0.99
5,Jimmy,Zhao,NY,1760877484,Found,0.95


In [46]:
npis['Confidence'].mean()

0.9833333333333334
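
The trimming step can also be written as a reusable function with a `while` loop, which stops as soon as the target is met. The sketch below rebuilds only the confidence column from the table above; the function name `trim_to_target` is hypothetical.

```python
import pandas as pd


def trim_to_target(records, target=0.98):
    """Drop the lowest-confidence rows until the mean confidence reaches `target`."""
    kept = records.sort_values(by="Confidence", ascending=False, kind="stable")
    while len(kept) > 0 and kept["Confidence"].mean() < target:
        kept = kept.iloc[:-1]  # remove the last (lowest-confidence) row
    return kept


records = pd.DataFrame({"Confidence": [0.99] * 5 + [0.95] * 3})
trimmed = trim_to_target(records)
print(len(trimmed), round(trimmed["Confidence"].mean(), 4))
```

With five records at 0.99 and three at 0.95, two of the 0.95 records are dropped, which matches the result above.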

### Export final NPIs

In [47]:
npis.to_csv('final_npis.csv')

## Alternate example

Here we assume that our Coder-entries are better at a 99.9% accuracy level instead of a 99.0% accuracy level.

In [48]:
for x in range(len(clean_npi)):
    clean_npi.at[x, 'Confidence'] = 0.999

In [49]:
found = missing_npi[missing_npi['Type'] == 'Found']
clean = clean_npi
many = missing_npi[missing_npi['Type'] != 'Found']

frames = [clean, found]

npis = pd.concat(frames)
npis = npis.reset_index(drop=True)
npis = npis.sort_values(by=['Confidence'], ascending=False)
npis

for x in range(len(many)): # iterate over the "More than 1:1 Match" records only
    if npis['Confidence'].mean() > 0.98:
        npis = pd.concat([npis, many.iloc[[x]].reset_index(drop=True)])

In [50]:
npis

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Ida,Kwok,CA,4555893625,Clean,0.999
1,Arun,Nepal,CA,8102774266,Clean,0.999
2,Marieta,Valev,CA,2062961130,Clean,0.999
3,James,Van Dyne,CA,5171034323,Clean,0.999
4,Zigui,Li,CA,2136934325,Clean,0.999
5,Jimmy,Zhao,NY,1760877484,Found,0.95
6,Aru,Krishna,GA,1013995935,Found,0.95
7,Niki,Desai,GA,1659813756,Found,0.95
0,John,Rodgers,,1871589465,More than 1:1 Match,0.095


Since our average is high, we could add in records from the "More than 1:1 Match" table in descending order of confidence. Let's check our average confidence after adding in one record.

In [51]:
npis['Confidence'].mean()

0.8822222222222222

Since it dropped down to 88.2%, we have to take records back out, one by one, until we hit 98.0% total confidence or better.

In [52]:
for x in range(len(npis)):
    if npis['Confidence'].mean() < 0.98:
        npis = npis[:-1]

In [53]:
npis

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence
0,Ida,Kwok,CA,4555893625,Clean,0.999
1,Arun,Nepal,CA,8102774266,Clean,0.999
2,Marieta,Valev,CA,2062961130,Clean,0.999
3,James,Van Dyne,CA,5171034323,Clean,0.999
4,Zigui,Li,CA,2136934325,Clean,0.999
5,Jimmy,Zhao,NY,1760877484,Found,0.95
6,Aru,Krishna,GA,1013995935,Found,0.95
7,Niki,Desai,GA,1659813756,Found,0.95


Our final confidence is about 98.06%.

In [54]:
npis['Confidence'].mean()

0.9806250000000001
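
The add-then-remove sequence in this example can be avoided by checking the average before a record is added. The sketch below works on plain lists of confidence values and only accepts a low-confidence record if the average stays at or above the target. It is an alternative to the cells above, not the original logic, and the function name `add_while_above_target` is hypothetical.

```python
def add_while_above_target(base, extras, target=0.98):
    """Add confidences from `extras` (highest first) while the mean stays at or above `target`."""
    accepted = list(base)
    for confidence in sorted(extras, reverse=True):
        trial = accepted + [confidence]
        if sum(trial) / len(trial) < target:
            break  # every later candidate is lower or equal, so stop here
        accepted = trial
    return accepted


base = [0.999] * 5 + [0.95] * 3   # clean and found records from this example
result = add_while_above_target(base, [0.095])
print(len(result), round(sum(result) / len(result), 6))
```

Here the 0.095 record is rejected up front, so the list stays at 8 records and no removal pass is needed.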

## QA Step

Since we are using "fake NPIs" for the purpose of this demonstration, we can't really QA whether the NPIs are correct. However, if we were to use real provider names and real NPIs, we could get a real confidence number for the entries sent to us by our chart coders. We could also double-check the NPIs we found ourselves.

In [55]:
qa = npis.copy()
qa['NPPES NPI Exact Match'] = ''

qa

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence,NPPES NPI Exact Match
0,Ida,Kwok,CA,4555893625,Clean,0.999,
1,Arun,Nepal,CA,8102774266,Clean,0.999,
2,Marieta,Valev,CA,2062961130,Clean,0.999,
3,James,Van Dyne,CA,5171034323,Clean,0.999,
4,Zigui,Li,CA,2136934325,Clean,0.999,
5,Jimmy,Zhao,NY,1760877484,Found,0.95,
6,Aru,Krishna,GA,1013995935,Found,0.95,
7,Niki,Desai,GA,1659813756,Found,0.95,


In [56]:
http = "https://npiregistry.cms.hhs.gov/api/?"

for x in range(len(qa)):
    fn = str(qa.loc[x, 'Provider First Name'])
    ln = str(qa.loc[x, 'Provider Last Name'])
    num = str(qa.loc[x, 'NPI'])
    h = (http + 'number=' + num + '&version=2.1')
    with urllib.request.urlopen(h) as url:
        provider = json.loads(url.read().decode())
    # Flag 'Y' only when the registry returns one record whose name matches exactly.
    # .get avoids a KeyError if the response has no 'result_count'.
    if (
        provider.get('result_count') == 1 and
        provider['results'][0]['basic']['first_name'] == fn.upper() and
        provider['results'][0]['basic']['last_name'] == ln.upper()
    ):
        qa.at[x, 'NPPES NPI Exact Match'] = 'Y'
    else:
        qa.at[x, 'NPPES NPI Exact Match'] = 'N'

In [57]:
qa

Unnamed: 0,Provider First Name,Provider Last Name,Provider State,NPI,Type,Confidence,NPPES NPI Exact Match
0,Ida,Kwok,CA,4555893625,Clean,0.999,N
1,Arun,Nepal,CA,8102774266,Clean,0.999,N
2,Marieta,Valev,CA,2062961130,Clean,0.999,N
3,James,Van Dyne,CA,5171034323,Clean,0.999,N
4,Zigui,Li,CA,2136934325,Clean,0.999,N
5,Jimmy,Zhao,NY,1760877484,Found,0.95,Y
6,Aru,Krishna,GA,1013995935,Found,0.95,N
7,Niki,Desai,GA,1659813756,Found,0.95,N
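
Rows 6 and 7 are flagged 'N' even though their NPIs came from the registry, because the registry stores the formal names (for example KRISHNAN ARUNACHALAM) while the file holds the names as the coders entered them. The sketch below isolates the comparison so it can be checked without a network call. The function name `is_exact_match` is hypothetical, and the sample dictionary copies only the fields needed from the response shown in the second case.

```python
def is_exact_match(provider, first_name, last_name):
    """Return 'Y' when the registry returns exactly one record with the same first and last name."""
    if provider.get("result_count") != 1:
        return "N"
    basic = provider["results"][0].get("basic", {})
    same_first = basic.get("first_name", "").upper() == str(first_name).strip().upper()
    same_last = basic.get("last_name", "").upper() == str(last_name).strip().upper()
    return "Y" if same_first and same_last else "N"


registry_record = {
    "result_count": 1,
    "results": [{"number": 1013995935,
                 "basic": {"first_name": "KRISHNAN", "last_name": "ARUNACHALAM"}}],
}
print(is_exact_match(registry_record, "Aru", "Krishna"))
print(is_exact_match(registry_record, "Krishnan", "Arunachalam"))
```

An 'N' here therefore means "needs review" rather than "wrong NPI"; a looser comparison would be needed to accept flipped or shortened names automatically.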
