## Texas Children's Hospital Affiliation Web Scrape

The goal of this project is to collect all the provider info from the Texas Children's Hospital find-a-doctor website (https://www.texaschildrens.org/doctors)

Notes: the complete list of provider info is on that page; each bio page contains affiliation info; same website structure as Seton (both Ascension hosps)

### Overview
1. Get all provider info from initial page
2. Scrape affils for all providers listed on page

### Get all Provider Info

##### Notes:
1. Search by specialty for the following specialties listed on site (needed to search/manually get these):
    Anesthesiology (code: 1451; 6 pages)
    Emergency Medicine (1506; 3 pages)
    Neonatology (1496; 4 pages)
    Nuclear Radiology (1641; 2 pages)
    Pathology (1491; 3 pages)
    Pediatric Radiology (1521; 3 pages)
    OBGYN Anesthesiology (1651; 2 pages)

2. page address structure: "https://www.texaschildrens.org/doctorslist?title=&field_specialty_tid=1451&page=1" (zero start)

3. robots.txt for the site lists User-agent: * with Crawl-delay: 10 (wait 10 seconds between requests)
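
The URL pattern and crawl delay from the notes above can be sketched with the standard library alone. The helper name `roster_url` and the inline `ROBOTS_TXT` lines are illustrative assumptions; the notebook itself builds URLs by string concatenation and passes the delay by hand.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser

BASE = "https://www.texaschildrens.org/doctorslist"

def roster_url(spec_code, page):
    """Build the listing URL for one specialty code and zero-indexed page."""
    query = urlencode({"title": "", "field_specialty_tid": spec_code, "page": page})
    return f"{BASE}?{query}"

# The two robots.txt lines quoted in the notes, parsed locally (no request is made).
ROBOTS_TXT = ["User-agent: *", "Crawl-delay: 10"]
parser = RobotFileParser()
parser.parse(ROBOTS_TXT)
crawl_delay = parser.crawl_delay("*")  # seconds to wait between requests

# Anesthesiology (code 1451) is listed with 6 pages, numbered 0 through 5.
anesthesiology_urls = [roster_url(1451, page) for page in range(6)]
print(crawl_delay)
print(anesthesiology_urls[0])
```

Reading the delay from robots.txt rather than hard-coding it keeps the scraper in step with the site if the value changes.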

In [1]:
import pandas as pd
import time
import numpy as np

#### web scrape testing:

I'll test out the beautiful soup approaches to get the info I want

In [2]:
import requests
page = requests.get("https://www.texaschildrens.org/doctorslist?title=&field_specialty_tid=1451&page=1")
page.status_code

200

In [3]:
from bs4 import BeautifulSoup, CData
soup = BeautifulSoup(page.content, 'html.parser')

In [9]:
#get all the doc info content on each page
content_entries = soup.find_all(class_='view-content')
content_entries[0]

<div class="view-content">
<div class="views-responsive-grid views-responsive-grid-horizontal views-columns-4">
<div class="views-row views-row-1 views-row-first">
<div class="views-column views-column-1 views-column-first">
<div class="views-field views-field-field-photo"> <div class="field-content"><a href="/find-a-doctor/kathleen-chen-md-ms"><img alt="" height="165" src="https://www.texaschildrens.org/sites/default/files/styles/search_thumbnail/public/chen.jpeg?itok=LOcKWaRx" typeof="foaf:Image" width="165"/></a></div> </div>
<div class="views-field views-field-title"> <span class="field-content"><a href="/find-a-doctor/kathleen-chen-md-ms">Kathleen Chen, MD, MS</a></span> </div>
<div> <div></div> </div> </div>
<div class="views-column views-column-2">
<div class="views-field views-field-field-photo"> <div class="field-content"><a href="/find-a-doctor/julia-han-chen-md"><img alt="" height="165" src="https://www.texaschildrens.org/sites/default/files/styles/search_thumbnail/public/Ch

In [23]:
# get the doc-specific web string
[ce.a.attrs['href'] for ce in content_entries[0].find_all(class_="views-field-field-photo")]

['/find-a-doctor/kathleen-chen-md-ms',
 '/find-a-doctor/julia-han-chen-md',
 '/find-a-doctor/monica-chen-md',
 '/find-a-doctor/kevin-chu-md',
 '/find-a-doctor/nana-e-coleman-md',
 '/find-a-doctor/camille-m-colomb-md',
 '/find-a-doctor/michelle-r-dalton-md',
 '/find-a-doctor/moreshwar-s-desai-md',
 '/find-a-doctor/ronald-blaine-easley-md',
 '/find-a-doctor/amy-fang-md',
 '/find-a-doctor/mary-toni-felberg-md',
 '/find-a-doctor/thomas-p-fogarty-iii-md-faap',
 '/find-a-doctor/clint-fuller-md',
 '/find-a-doctor/priscilla-j-garcia-md',
 '/find-a-doctor/maria-carolina-gazzaneo-md',
 '/find-a-doctor/chris-d-glover-md',
 '/find-a-doctor/jordana-r-goldman-md',
 '/find-a-doctor/cheryl-ann-gore-md-mba-m-ed',
 '/find-a-doctor/michael-gorena-md',
 '/find-a-doctor/kalyani-govindan-md']

In [24]:
len(content_entries[0].find_all(class_="views-field-field-photo"))

20

In [21]:
content_entries[0].find_all(class_="views-field-field-photo")[0].a.attrs['href']

'/find-a-doctor/kathleen-chen-md-ms'

In [17]:
#get doc name
#views-field-title
[ce.get_text().rstrip().lstrip() for ce in content_entries[0].find_all(class_="views-field-title")]

['Kathleen Chen, MD, MS',
 'Julia Han Chen, MD',
 'Monica Chen, MD',
 'Kevin Chu, MD',
 'Nana E. Coleman, MD',
 'Camille M. Colomb, MD',
 'Michelle R. Dalton, MD',
 'Moreshwar S. Desai, MD',
 'Ronald Blaine Easley, MD',
 'Amy Fang, MD',
 'Mary (Toni) A. Felberg, MD',
 'Thomas P. Fogarty III, MD, FAAP',
 'Clint Fuller, MD',
 'Priscilla J. Garcia, MD',
 'Maria Carolina Gazzaneo, MD',
 'Chris D. Glover, MD, MBA',
 'Jordana R. Goldman, MD',
 'Cheryl Ann Gore, MD, MBA, M Ed',
 'Michael Gorena, MD',
 'Kalyani Govindan, MD']

In [25]:
content_entries[0].find_all(class_="views-field-title")[0].get_text().rstrip().lstrip()

'Kathleen Chen, MD, MS'

In [47]:
#import spec id table that I made; to pass the roster scrape function
SpecTable = pd.read_excel("SpecTable.xlsx")
SpecTable.head()

Unnamed: 0,Specialty,SpecCode,Pages
0,Anesthesiology,1451,6
1,Emergency Medicine,1506,3
2,Neonatology,1496,4
3,Nuclear Radiology,1641,1
4,Pathology,1491,3


In [28]:
test_df = SpecTable.loc[SpecTable['Specialty']=='Nuclear Radiology']
test_df.head()

Unnamed: 0,Specialty,SpecCode,Pages
3,Nuclear Radiology,1641,2


### Scrape Roster

In [51]:
#### DEFINE PROVIDER SITE SCRAPE FUNCTION
# https://www.texaschildrens.org/doctorslist?title=&field_specialty_tid=1451&page=1

def ProviderSiteList(dataframe, delay):
    """Scrape provider names and profile links from the roster pages of each specialty."""
    RosterList = []
    seconds = delay  # seconds to wait between GET requests
    for index, row in dataframe.iterrows():
        specialty = row['Specialty']
        specId = row['SpecCode']
        PageNumber = int(row['Pages'])
        # pages are zero-indexed; the +1 also requests one page past the listed count
        for n in range(PageNumber + 1):
            page = str(n)
            url = "https://www.texaschildrens.org/doctorslist?title=&field_specialty_tid=" + str(specId) + "&page=" + page  # concat url
            print(url)
            try:
                time.sleep(seconds)  # pause before each request to prevent website blacklisting
                rosterpage = requests.get(url)  # get html from url
                soup = BeautifulSoup(rosterpage.content, 'html.parser')  # make soup!
                content_entries = soup.find_all(class_='view-content')
                photos = content_entries[0].find_all(class_="views-field-field-photo")
                titles = content_entries[0].find_all(class_="views-field-title")
                print("Content Entries scraped")
                for photo, title in zip(photos, titles):
                    site = photo.a.attrs['href']
                    name = title.get_text().strip()
                    RosterList.append({'Name': name, 'Specialty': specialty, 'site': site})
                    print("Added info for Page: "+page+" & Specialty: "+specialty+" & Provider: "+name)
            except Exception as err:
                # name may not be set yet when a page fails, so report the error itself
                print("Roster ERROR for Page: "+page+" & Specialty: "+specialty+" ("+repr(err)+")")
    return RosterList
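
Before running the full scrape it can help to see how many requests the loop will make and how long the waits add up to. The sketch below is a dry run only: `SPEC_PAGES` copies the page counts from the notes at the top (one SpecTable output shows a different count for Nuclear Radiology), and `plan_requests` is a hypothetical helper, not part of the scraper.

```python
# Page counts per specialty, copied from the notes at the top of this notebook.
SPEC_PAGES = {
    "Anesthesiology": 6,
    "Emergency Medicine": 3,
    "Neonatology": 4,
    "Nuclear Radiology": 2,
    "Pathology": 3,
    "Pediatric Radiology": 3,
    "OBGYN Anesthesiology": 2,
}

def plan_requests(spec_pages, extra_pages=1):
    """List the (specialty, page) pairs the loop would request, pages starting at 0."""
    return [
        (specialty, page)
        for specialty, n_pages in spec_pages.items()
        for page in range(n_pages + extra_pages)
    ]

# extra_pages=1 mirrors range(Pages + 1) in the scrape function.
plan = plan_requests(SPEC_PAGES)
delay_seconds = 10
print(len(plan), "requests, at least", len(plan) * delay_seconds, "seconds of waiting")
```

Setting `extra_pages=0` shows the count when only the listed pages are requested.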

In [52]:
ChildrensSites = ProviderSiteList(SpecTable, 10)

https://www.texaschildrens.org/doctorslist?title=&field_specialty_tid=1451&page=0
Content Entries scraped
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Adam C. Adler, MS, MD, FAAP
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Dheeraj Ahuja, MD
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Titilopemi A.O. Aina, MD, MPH
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Ayse Akcan-Arikan, MD
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Melanie Jeanne Alo, MD
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Marc Michael Anders, MD
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Dean B. Andropoulos, MD, MHCM
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Amy Sharon Arrington, MD, PhD
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Rahul Geetendra Baijal, MD
Added info for Page: 0 & Specialty: Anesthesiology & Provider: Dalia A. Bashir, MD
Added info f

Content Entries scraped
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Premal M. Trivedi, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Sebastian C. Tume, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Mario Patino Velez, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: David Freed Vener, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Kenneth L. Wayman, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Eric A. Williams, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Erin S. Williams, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Karla Wyatt, MD, MS
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Ammar Yamani, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: David Allan Young, MD
Added info for Page: 5 & Specialty: Anesthesiology & Provider: Michael Blaine Zelisko, MD
Added info for Page: 5 & Specialty: Anesthes

Content Entries scraped
Added info for Page: 2 & Specialty: Neonatology & Provider: Madhulika A. Kulkarni, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Tommy Leonard Jr., MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Tommy Leonard, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Krithika Lingappan, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Jenelle E. Little, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Pablo Lohmann, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Torey L Mack, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Surbhi Maheshwari, MBBS
Added info for Page: 2 & Specialty: Neonatology & Provider: George Thomas Mandy, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Tiffany M. McKee-Garrett, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Valerie C. Moore, MD
Added info for Page: 2 & Specialty: Neonatology & Provider: Joseph A. Ga

Content Entries scraped
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Stephen F. Kralik, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Kamlesh U. Kukreja, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Marcia K. Kukreja, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Nadia F. Mahmood, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Maricarmen Nazario Malavé,  MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Prakash M. Masand, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Amy Mehollin-Ray, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Avner Meoded, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Amir Pezeshkmehr, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Ronald A. Rauch, MD
Added info for Page: 1 & Specialty: Pediatric Radiology & Provider: Alicia M

In [53]:
Site_df = pd.DataFrame(ChildrensSites)
Site_df.head()

Unnamed: 0,Name,Specialty,site
0,"Adam C. Adler, MS, MD, FAAP",Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap
1,"Dheeraj Ahuja, MD",Anesthesiology,/find-a-doctor/dheeraj-ahuja-md
2,"Titilopemi A.O. Aina, MD, MPH",Anesthesiology,/find-a-doctor/titilopemi-ao-aina-md
3,"Ayse Akcan-Arikan, MD",Anesthesiology,/find-a-doctor/ayse-akcan-arikan-md
4,"Melanie Jeanne Alo, MD",Anesthesiology,/find-a-doctor/melanie-jeanne-alo-md


In [82]:
Site_df.shape

(352, 3)

### Scrape locations from provider site

Now that I have a list of individual provider sites from the roster, I'll use that as input to scrape affiliations from each doc's profile page

#### Test profile page beautiful soup components

In [69]:

#url = 'https://www.texaschildrens.org'+site

url = 'https://www.texaschildrens.org/find-a-doctor/adam-c-adler-ms-md-faap'
affilpage = requests.get(url) #get html from url (single test request, so no delay here)
soup = BeautifulSoup(affilpage.content, 'html.parser') # make soup!
content_entries = soup.find_all(class_='adr')

#views-field-field-campus

In [70]:
content_entries

[<div class="adr">
 <div class="street-address">
 <span itemprop="streetAddress">6621 Fannin Street</span>
 <div class="additional" itemprop="streetAddress">
             Suite A3300          </div>
 </div>
 <span class="locality" itemprop="addressLocality">Houston,</span>
 <span class="region" itemprop="addressRegion">TX</span>
 <span class="postal-code" itemprop="postalCode">77030</span>
 <div class="country-name" itemprop="addressCountry">United States</div>
 </div>]

In [80]:
locations = [ce.get_text().lstrip().rstrip().replace('Location ', '') for ce in soup.find_all(class_="views-field-field-campus")]
locations

['Texas Medical Center']

In [78]:
#street-address
[ce.get_text().rstrip().lstrip().replace('\n\n            ', '--') for ce in content_entries[0].find_all(class_="street-address")]
#[ce.get_text().lstrip().rstrip() for ce in soup.find_all(class_="street-address")]

['6621 Fannin Street--Suite A3300']
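
The replace call above depends on the exact run of newlines and spaces in the page source. A regular expression that treats any whitespace run containing a newline as one separator is a little more tolerant; this is a sketch, and `join_address_lines` is a hypothetical helper name.

```python
import re

def join_address_lines(raw_text, sep="--"):
    """Collapse each run of whitespace that contains a newline into one separator."""
    return re.sub(r"\s*\n\s*", sep, raw_text.strip())

# Text shaped like the street-address block shown earlier (indentation may vary).
raw = "\n6621 Fannin Street\n\n            Suite A3300          \n"
print(join_address_lines(raw))
```

Spaces inside a single line are left alone, so only the line breaks between the street and the suite become "--".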

In [77]:
#City
[ce.get_text().replace(',', '') for ce in content_entries[0].find_all(class_="locality")]
#[ce.get_text().lstrip().rstrip() for ce in soup.find_all(class_="locality")]

['Houston']

In [75]:
#state
[ce.get_text().rstrip().lstrip() for ce in content_entries[0].find_all(class_="region")]
#[ce.get_text().lstrip().rstrip() for ce in soup.find_all(class_="region")]

['TX']

In [76]:
#postal-code
[ce.get_text().rstrip().lstrip() for ce in content_entries[0].find_all(class_="postal-code")]
#[ce.get_text().lstrip().rstrip() for ce in soup.find_all(class_="postal-code")]

['77030']

In [95]:
# Phone
[ce.get_text().rstrip().lstrip().replace('Phone: ', '') for ce in soup.find_all(class_="views-field-field-phone")]

['832-824-5800']

### Get Affiliations from each doc's bio page

In [96]:
# scrape campus, address, and phone from each provider's profile page

def ProviderAffilList(dataframe, delay):
    AffilList = []
    seconds = delay  # seconds to wait between GET requests
    for index, row in dataframe.iterrows():
        name = row['Name']
        site_string = row['site']
        try:
            url = "https://www.texaschildrens.org" + site_string  # concat url
            time.sleep(seconds)  # pause before each request to prevent website blacklisting
            page = requests.get(url)  # get html from url
            print(page.status_code)  # report status of request
            soup = BeautifulSoup(page.content, 'html.parser')  # make soup!
            locations = [ce.get_text().strip().replace('Location ', '') for ce in soup.find_all(class_="views-field-field-campus")]
            content_entries = soup.find_all(class_='adr')
            # only the first address block is read; a page without one raises IndexError
            first_adr = content_entries[0]
            street_address = [ce.get_text().strip().replace('\n\n            ', '--') for ce in first_adr.find_all(class_="street-address")]
            city = [ce.get_text().replace(',', '') for ce in first_adr.find_all(class_="locality")]
            state = [ce.get_text().strip() for ce in first_adr.find_all(class_="region")]
            zip_office = [ce.get_text().strip() for ce in first_adr.find_all(class_="postal-code")]
            phone = [ce.get_text().strip().replace('Phone: ', '') for ce in soup.find_all(class_="views-field-field-phone")]

            # one row per campus; every row reuses the first address block
            for i in range(len(locations)):
                dict1 = {'Name': name, 'Location': locations[i], 'OfficeAddress': str(street_address), 'OfficeCity': str(city), 'OfficeState': state, 'OfficeZip': zip_office, 'OfficePhone': phone}
                AffilList.append(dict1)
                print("Added Affil for Provider: "+name)
        except Exception:
            print("Provider Affil ERROR for Page: "+name)
    return AffilList

#### Test run

In [86]:
test_df = Site_df.iloc[0:3, :]
test_df.head()

Unnamed: 0,Name,Specialty,site
0,"Adam C. Adler, MS, MD, FAAP",Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap
1,"Dheeraj Ahuja, MD",Anesthesiology,/find-a-doctor/dheeraj-ahuja-md
2,"Titilopemi A.O. Aina, MD, MPH",Anesthesiology,/find-a-doctor/titilopemi-ao-aina-md


In [97]:
test_list = ProviderAffilList(test_df, 5)

200
Added Affil for Provider: Adam C. Adler, MS, MD, FAAP
200
Added Affil for Provider: Dheeraj Ahuja, MD
200
Added Affil for Provider: Titilopemi A.O. Aina, MD, MPH


In [98]:
pd.DataFrame(test_list)

Unnamed: 0,Location,Name,OfficeAddress,OfficeCity,OfficePhone,OfficeState,OfficeZip
0,Texas Medical Center,"Adam C. Adler, MS, MD, FAAP",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
1,Texas Medical Center,"Dheeraj Ahuja, MD",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
2,Texas Medical Center,"Titilopemi A.O. Aina, MD, MPH",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
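
The test table shows each address field wrapped in a list (for example ['Houston']), and once the file is written to Excel and read back those cells are plain text. One way to unwrap them is sketched below; `first_or_none` and `unwrap_cell` are hypothetical helpers and are not used elsewhere in this notebook.

```python
import ast

def first_or_none(values):
    """Return the first item of a scraped list, or None when nothing was found."""
    return values[0] if values else None

def unwrap_cell(cell):
    """Turn "['Houston']" (a list saved as text) or ['Houston'] into 'Houston'."""
    if isinstance(cell, str) and cell.startswith("["):
        try:
            cell = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            return cell  # not a Python literal (e.g. "[TX]"), leave it untouched
    if isinstance(cell, list):
        return first_or_none(cell)
    return cell

print(unwrap_cell("['Houston']"))
print(unwrap_cell(["TX"]))
```

Applied column by column (for instance with a DataFrame column's map method), this would leave plain strings in the office fields.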


#### Get that data!

In [99]:
affil_list = ProviderAffilList(Site_df, 10)

200
Added Affil for Provider: Adam C. Adler, MS, MD, FAAP
200
Added Affil for Provider: Dheeraj Ahuja, MD
200
Added Affil for Provider: Titilopemi A.O. Aina, MD, MPH
200
Added Affil for Provider: Ayse Akcan-Arikan, MD
200
Added Affil for Provider: Melanie Jeanne Alo, MD
200
Added Affil for Provider: Marc Michael Anders, MD
200
Added Affil for Provider: Dean B. Andropoulos, MD, MHCM
200
Added Affil for Provider: Amy Sharon Arrington, MD, PhD
200
Added Affil for Provider: Rahul Geetendra Baijal, MD
200
Added Affil for Provider: Dalia A. Bashir, MD
200
Added Affil for Provider: Aarti Bavare, MD
200
Added Affil for Provider: Saleh Bhar, MD
200
Added Affil for Provider: Tamer Elattary, MD
200
Added Affil for Provider: Ronald A. Bronicki, MD
200
Added Affil for Provider: Katrin Ann Campbell, MD
200
Added Affil for Provider: Carlos Javier Campos-Lopez, MD
200
Added Affil for Provider: Lisa A. Caplan, MD
200
Added Affil for Provider: Nicholas Patrick Carling, MD
200
Added Affil for Provider: D

200
200
Added Affil for Provider: Athis Rajh Arunachalam, MD
200
Added Affil for Provider: Amit A. Bhatt, MD, FAAP
200
Provider Affil ERROR for Page: Joel W. Bonaparte, MD
200
200
Added Affil for Provider: Courtney Ann Carmicheal-Swanner, MD
200
Provider Affil ERROR for Page: Rebeca Elizabeth Cavazos, MD
200
Provider Affil ERROR for Page: Amarnath H Chamkur, MD
200
200
Added Affil for Provider: Mufeed Al Islam Ashraf, MD
200
Added Affil for Provider: Jonathan Lyle Davies, MD
200
Added Affil for Provider: Melissa M. Carbajal, MD
200
Added Affil for Provider: Xanthi I. Couroucli, MD
200
200
Added Affil for Provider: Milenka Cuevas-Guaman, MD
200
Provider Affil ERROR for Page: Shaeequa P. Dasnadi, MD
200
200
Added Affil for Provider: Stephanie Blair Deal, MD
200
Added Affil for Provider: Karen T. deVille, MD
200
Provider Affil ERROR for Page: Daniela Dinu, MD
200
Added Affil for Provider: Adel A. ElHennawy, MD
200
Added Affil for Provider: Caraciolo J. Fernandes, MD, MBA
200
Added Affil f

200
Provider Affil ERROR for Page: Eric P. Ritter, MD
200
Provider Affil ERROR for Page: Derek M. Schoppa, MD
200
200
Provider Affil ERROR for Page: Jo Hsu Tu, DO
200
200
Provider Affil ERROR for Page: Odile Francoise Yacoub, MD


#### Check results

In [100]:
affil_df = pd.DataFrame(affil_list)
affil_df.head(20)

Unnamed: 0,Location,Name,OfficeAddress,OfficeCity,OfficePhone,OfficeState,OfficeZip
0,Texas Medical Center,"Adam C. Adler, MS, MD, FAAP",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
1,Texas Medical Center,"Dheeraj Ahuja, MD",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
2,Texas Medical Center,"Titilopemi A.O. Aina, MD, MPH",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
3,Texas Medical Center,"Ayse Akcan-Arikan, MD",['6651 Main Street--MC E1420'],['Houston'],[832-826-6230],[TX],[77030]
4,Texas Medical Center,"Melanie Jeanne Alo, MD",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
5,Texas Medical Center,"Marc Michael Anders, MD",['6651 Main Street--MC E1420'],['Houston'],[832-826-6230],[TX],[77030]
6,Texas Medical Center,"Dean B. Andropoulos, MD, MHCM",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
7,Texas Medical Center,"Amy Sharon Arrington, MD, PhD",['6651 Main Street--MC E1420'],['Houston'],[832-826-5048],[TX],[77030]
8,Texas Medical Center,"Rahul Geetendra Baijal, MD",['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
9,Texas Medical Center,"Dalia A. Bashir, MD",['6651 Main Street--MC E1420'],['Houston'],[832-822-6230],[TX],[77030]


In [101]:
# Merge affil data back to Roster
affil_specs_df = pd.merge(Site_df, affil_df, how = 'left', on = 'Name')
affil_specs_df.head(20)

Unnamed: 0,Name,Specialty,site,Location,OfficeAddress,OfficeCity,OfficePhone,OfficeState,OfficeZip
0,"Adam C. Adler, MS, MD, FAAP",Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
1,"Dheeraj Ahuja, MD",Anesthesiology,/find-a-doctor/dheeraj-ahuja-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
2,"Titilopemi A.O. Aina, MD, MPH",Anesthesiology,/find-a-doctor/titilopemi-ao-aina-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
3,"Ayse Akcan-Arikan, MD",Anesthesiology,/find-a-doctor/ayse-akcan-arikan-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],[832-826-6230],[TX],[77030]
4,"Melanie Jeanne Alo, MD",Anesthesiology,/find-a-doctor/melanie-jeanne-alo-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
5,"Marc Michael Anders, MD",Anesthesiology,/find-a-doctor/marc-michael-anders-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],[832-826-6230],[TX],[77030]
6,"Dean B. Andropoulos, MD, MHCM",Anesthesiology,/find-a-doctor/dean-b-andropoulos-md-mhcm,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
7,"Amy Sharon Arrington, MD, PhD",Anesthesiology,/find-a-doctor/amy-sharon-arrington-md-phd,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],[832-826-5048],[TX],[77030]
8,"Rahul Geetendra Baijal, MD",Anesthesiology,/find-a-doctor/rahul-geetendra-baijal-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030]
9,"Dalia A. Bashir, MD",Anesthesiology,/find-a-doctor/dalia-bashir-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],[832-822-6230],[TX],[77030]
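
A left merge on Name adds a row whenever a name matches more than one row on the right side, which can happen when a provider has several locations or when a name repeats. The roster has 352 rows while the file read back later has 354, so a quick check for repeated keys is worthwhile. The sketch below uses made-up records; `repeated_names` is a hypothetical helper.

```python
from collections import Counter

def repeated_names(records, key="Name"):
    """Return {name: count} for every name that appears more than once."""
    counts = Counter(record[key] for record in records)
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical records shaped like the scraped roster and affiliation lists.
roster = [{"Name": "A. Example, MD"}, {"Name": "B. Example, MD"}]
affils = [
    {"Name": "A. Example, MD", "Location": "Campus 1"},
    {"Name": "A. Example, MD", "Location": "Campus 2"},
    {"Name": "B. Example, MD", "Location": "Campus 1"},
]
print(repeated_names(roster))
print(repeated_names(affils))
```

The same function can be called on ChildrensSites and affil_list, since both are lists of dicts with a Name key.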


In [102]:
# write out results
from pandas import ExcelWriter
filename = 'TexasChildrensRosterAffilsRaw.xlsx'

with ExcelWriter(filename) as writer:
        affil_specs_df.to_excel(writer, sheet_name='TX_Childrens_RawRoster')

### Split Names for NPI Search

In order to cross-reference the docs in our DB, we'll need to get their National Provider Identifier (NPI) from the NPPES website.

CAVEAT: I'll use practice state and doc name to search, but there might be ambiguous search results for docs with the same name.

In [106]:
#split name/degrees

# new data frame: text before the first comma (name) and after it (degrees)
new = affil_specs_df.Name.str.split(',', n=1, expand = True)

# keep only the name portion in the Name column
affil_specs_df['Name']= new[0]

# put the degree string in its own column
affil_specs_df['Degree']= new[1]

In [117]:
# new data frame: split on the first space to isolate the first name
new = affil_specs_df.Name.str.split(' ', n=1, expand = True)

# making separate first name column from new data frame
affil_specs_df['FirstName']= new[0]

#split from right for last name
# (a trailing suffix is picked up as the last name, e.g. "III" for Thomas P. Fogarty III)
new2 = affil_specs_df["Name"].str.rsplit(" ", n = 1, expand = True)

# making separate last name column from new data frame
affil_specs_df['LastName']= new2[1]
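
Splitting on the last space treats a generational suffix as the last name; the NPI query log later in this notebook shows a search for "Thomas III" as a result. A possible alternative is sketched below. The suffix list is an assumption, and `split_provider_name` is a hypothetical helper rather than the method used above.

```python
import re

SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}  # assumed list; extend as needed

def split_provider_name(full_name):
    """Split 'First M. Last Suffix, Degrees' into (first, last, degrees)."""
    name, _, degrees = full_name.partition(",")
    # Drop parenthesised nicknames such as "(Toni)".
    name = re.sub(r"\([^)]*\)", " ", name)
    tokens = name.split()
    # Ignore trailing generational suffixes when picking the last name.
    while len(tokens) > 1 and tokens[-1].rstrip(".").lower() in SUFFIXES:
        tokens.pop()
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    return first, last, degrees.strip()

print(split_provider_name("Thomas P. Fogarty III, MD, FAAP"))
print(split_provider_name("Mary (Toni) A. Felberg, MD"))
```

Multi-word last names are still ambiguous under this approach, since only the final token is kept.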

In [118]:
affil_specs_df.head()

Unnamed: 0,Name,Specialty,site,Location,OfficeAddress,OfficeCity,OfficePhone,OfficeState,OfficeZip,Degree,FirstName,LastName
0,Adam C. Adler,Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030],"MS, MD, FAAP",Adam,Adler
1,Dheeraj Ahuja,Anesthesiology,/find-a-doctor/dheeraj-ahuja-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030],MD,Dheeraj,Ahuja
2,Titilopemi A.O. Aina,Anesthesiology,/find-a-doctor/titilopemi-ao-aina-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030],"MD, MPH",Titilopemi,Aina
3,Ayse Akcan-Arikan,Anesthesiology,/find-a-doctor/ayse-akcan-arikan-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],[832-826-6230],[TX],[77030],MD,Ayse,Akcan-Arikan
4,Melanie Jeanne Alo,Anesthesiology,/find-a-doctor/melanie-jeanne-alo-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],[832-824-5800],[TX],[77030],MD,Melanie,Alo


In [119]:
# write out results
from pandas import ExcelWriter
filename = 'TexasChildrensRosterAffilsRaw.xlsx'

with ExcelWriter(filename) as writer:
        affil_specs_df.to_excel(writer, sheet_name='TX_Childrens_RawRoster')

### Get Prac NPIs (including address and specialty, by name and state)

In order to get more specific NPI results/cross check them against the specialty listed on the hosp site, I'll query for doc address and specialty code as well
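
The function below joins each provider record to its addresses and then to its taxonomies, which gives one row per address and taxonomy combination. The plain-Python sketch here shows that shape on a made-up record; the nested field names follow the columns the function keeps, and the sample values are not real NPPES data.

```python
def flatten_npi_results(results):
    """Return one flat dict per (provider, address, taxonomy) combination."""
    rows = []
    for result in results:
        basic = result.get("basic", {})
        for address in result.get("addresses", []):
            for taxonomy in result.get("taxonomies", []):
                rows.append({
                    "number": result.get("number"),
                    "basic.first_name": basic.get("first_name"),
                    "basic.last_name": basic.get("last_name"),
                    "desc": taxonomy.get("desc"),
                    "address_purpose": address.get("address_purpose"),
                    "city": address.get("city"),
                    "state": address.get("state"),
                })
    return rows

# Made-up record for illustration only.
sample_results = [{
    "number": "0000000000",
    "basic": {"first_name": "JANE", "last_name": "EXAMPLE"},
    "addresses": [
        {"address_purpose": "LOCATION", "city": "HOUSTON", "state": "TX"},
        {"address_purpose": "MAILING", "city": "AUSTIN", "state": "TX"},
    ],
    "taxonomies": [{"desc": "Anesthesiology"}],
}]
rows = flatten_npi_results(sample_results)
print(len(rows))
```

A provider with two addresses and one taxonomy yields two rows, so the specialty description can be compared against the hospital listing on every address row.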

In [2]:
# Put it all together!
import pandas as pd
import json
import time
import requests
from bs4 import BeautifulSoup
import re
# pd.json_normalize (pandas 1.0+) replaces the older pandas.io.json import

def NPI_API_PROVIDER_NAME(dataframe, delay):
    """Query the NPPES NPI registry by first name, last name, and state (TX) for each row."""
    output_columns = ['basic.first_name', 'basic.last_name','number',  'address_1', 'address_2', 'address_purpose', 'city', 'state', 'postal_code', 'telephone_number', 'fax_number', 'basic.enumeration_date', 'desc']
    columns_to_keep = ['basic.first_name', 'basic.last_name', 'number',  'address_1', 'address_2', 'address_purpose', 'city', 'state', 'postal_code', 'telephone_number', 'fax_number', 'basic.enumeration_date']
    tax_columns_to_keep = ['basic.first_name', 'basic.last_name', 'number',  'desc','address_1', 'address_2', 'address_purpose', 'city', 'state', 'postal_code', 'telephone_number', 'fax_number', 'basic.enumeration_date']
    frames = []  # one DataFrame of results per provider
    seconds = delay  # seconds to wait between GET requests

    for index, row in dataframe.iterrows():
        url_1 = 'https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name='
        url_first_name = str(row['FirstName'])
        url_2 = '&use_first_name_alias=&last_name='
        url_last_name = str(row['LastName'])
        url_3 = '&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1'
        url = url_1+url_first_name+url_2+url_last_name+url_3
        print('Query url: '+ url)
        try:
            time.sleep(seconds)  # pause before each request
            page = requests.get(url)
            results = page.json()['results']
            df = pd.json_normalize(results)
            # one row per address, tagged with the provider's NPI number
            addresses_with_NPI = pd.json_normalize(results, record_path='addresses', meta=['number'])
            df_addresses = pd.merge(df, addresses_with_NPI, how='inner', on='number')
            df_addresses_subset = df_addresses.reindex(columns=columns_to_keep)
            # one row per taxonomy (specialty description), tagged with the NPI number
            taxonomies_with_NPI = pd.json_normalize(results, record_path='taxonomies', meta=['number'])
            df_taxonomies = pd.merge(df_addresses_subset, taxonomies_with_NPI, how='inner', on=['number'])
            frames.append(df_taxonomies.reindex(columns=tax_columns_to_keep))
            print('NPI added for '+ url_first_name+' '+url_last_name)
        except Exception:
            # e.g. a failed request or a response without usable results
            print('NPI ERROR for '+ url_first_name+' '+url_last_name)
    if not frames:
        return pd.DataFrame(columns=output_columns)
    # DataFrame.append was removed in pandas 2.0, so combine the pieces with concat
    return pd.concat(frames, ignore_index=True).reindex(columns=output_columns)
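
Names with spaces or punctuation are safer to send when the query string is percent-encoded. The sketch below builds the same kind of name-and-state query with `urlencode`; `npi_query_url` is a hypothetical helper, and leaving out the blank parameters (rather than sending them empty, as the function above does) is an assumption that has not been checked against the API here.

```python
from urllib.parse import urlencode

NPI_ENDPOINT = "https://npiregistry.cms.hhs.gov/api/"

def npi_query_url(first_name, last_name, state="TX"):
    """Build a name-and-state query URL with the names percent-encoded."""
    params = {
        "first_name": first_name,
        "last_name": last_name,
        "address_purpose": "LOCATION",
        "state": state,
        "version": "2.1",
    }
    return NPI_ENDPOINT + "?" + urlencode(params)

print(npi_query_url("Adam", "Adler"))
print(npi_query_url("Mary Ann", "O'Neil"))
```

With requests, passing the same dict through the `params` argument of `requests.get` would do the encoding as well.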

In [3]:
#read in the roster/affiliation file written out above
#names and state from this list are used to query NPIs, which are then cross checked against specialty
tx_childrens = pd.read_excel('TexasChildrensRosterAffilsRaw.xlsx')
tx_childrens.head()

Unnamed: 0,Name,Specialty,site,Location,OfficeAddress,OfficeCity,OfficePhone,OfficeState,OfficeZip,Degree,FirstName,LastName
0,Adam C. Adler,Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],"MS, MD, FAAP",Adam,Adler
1,Dheeraj Ahuja,Anesthesiology,/find-a-doctor/dheeraj-ahuja-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],MD,Dheeraj,Ahuja
2,Titilopemi A.O. Aina,Anesthesiology,/find-a-doctor/titilopemi-ao-aina-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],"MD, MPH",Titilopemi,Aina
3,Ayse Akcan-Arikan,Anesthesiology,/find-a-doctor/ayse-akcan-arikan-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],['832-826-6230'],['TX'],['77030'],MD,Ayse,Akcan-Arikan
4,Melanie Jeanne Alo,Anesthesiology,/find-a-doctor/melanie-jeanne-alo-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],MD,Melanie,Alo


In [4]:
tx_childrens.shape

(354, 12)

In [5]:
#get NPPES API results
npi_specs = NPI_API_PROVIDER_NAME(tx_childrens, 2)

Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Adam&use_first_name_alias=&last_name=Adler&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Adam Adler
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Dheeraj&use_first_name_alias=&last_name=Ahuja&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Dheeraj Ahuja
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Titilopemi&use_first_name_alias=&last_name=Aina&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Titilopemi Aina
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Ayse&use_first_name

NPI added for Ronald Easley
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Amy&use_first_name_alias=&last_name=Fang&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Amy Fang
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Mary&use_first_name_alias=&last_name=Felberg&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Mary Felberg
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Thomas&use_first_name_alias=&last_name=III&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Thomas III
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Clint&u

NPI added for Benjamin Lee
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Joyce&use_first_name_alias=&last_name=Liu&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Joyce Liu
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Yang&use_first_name_alias=&last_name=Liu&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Yang Liu
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Lauren&use_first_name_alias=&last_name=Lobaugh&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Lauren Lobaugh
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Michel

NPI added for Julie Schackman
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Brent&use_first_name_alias=&last_name=Schakett&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Brent Schakett
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Catherine&use_first_name_alias=&last_name=Seipel&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Catherine Seipel
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Thomas&use_first_name_alias=&last_name=Shaw&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Thomas Shaw
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_descrip

NPI added for Joseph Allen
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=David&use_first_name_alias=&last_name=Ashby&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for David Ashby
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Brian&use_first_name_alias=&last_name=Bassham&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Brian Bassham
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Alecia&use_first_name_alias=&last_name=Borkovich&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Alecia Borkovich
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=

NPI ERROR for Julie McManemy
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Sarah&use_first_name_alias=&last_name=Meskill&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Sarah Meskill
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Binita&use_first_name_alias=&last_name=Patel&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Binita Patel
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Faria&use_first_name_alias=&last_name=Pereira&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Faria Pereira
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&f

NPI added for Jonathan Davies
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Melissa&use_first_name_alias=&last_name=Carbajal&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Melissa Carbajal
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Xanthi&use_first_name_alias=&last_name=Couroucli&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Xanthi Couroucli
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Margo&use_first_name_alias=&last_name=Cox&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Margo Cox
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_descrip

NPI added for Lakshmi Katakam
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Mona&use_first_name_alias=&last_name=Khattab&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Mona Khattab
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Madhulika&use_first_name_alias=&last_name=Kulkarni&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Madhulika Kulkarni
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Tommy&use_first_name_alias=&last_name=Jr.&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Tommy Jr.
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description

NPI added for Talat Ahmed
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Salim&use_first_name_alias=&last_name=Bharwani&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Salim Bharwani
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=William&use_first_name_alias=&last_name=Caplan&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for William Caplan
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Zvi&use_first_name_alias=&last_name=Friedman&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Zvi Friedman
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&

NPI added for Andrea Marcogliese
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Carrie&use_first_name_alias=&last_name=Mohila&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Carrie Mohila
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Yuko&use_first_name_alias=&last_name=Mori-Akiyama&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Yuko Mori-Akiyama
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Angshumoy&use_first_name_alias=&last_name=Roy&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Angshumoy Roy
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_

NPI added for Angel Blanco
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Marcos&use_first_name_alias=&last_name=Botelho&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Marcos Botelho
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Richard&use_first_name_alias=&last_name=Braverman&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Richard Braverman
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Christopher&use_first_name_alias=&last_name=Cassady&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Christopher Cassady
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&t

NPI ERROR for Amir Pezeshkmehr
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Ronald&use_first_name_alias=&last_name=Rauch&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Ronald Rauch
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Alicia&use_first_name_alias=&last_name=Roman-Colon&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Alicia Roman-Colon
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Marla&use_first_name_alias=&last_name=Sammer&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI ERROR for Marla Sammer
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_desc

NPI added for Mark McMasters
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Rakesh&use_first_name_alias=&last_name=Narayan&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Rakesh Narayan
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=Cenk&use_first_name_alias=&last_name=Ozdogan&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for Cenk Ozdogan
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=George&use_first_name_alias=&last_name=Parker&organization_name=&address_purpose=LOCATION&city=&state=TX&postal_code=&country_code=&limit=&skip=&version=2.1
NPI added for George Parker
Query url: https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=
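
Some of the errors in the log above come from queries where a generational suffix was sent as the last name (`last_name=III`, `last_name=Jr.`). Below is a minimal sketch of one way to avoid that, assuming the name is split on whitespace before the query is built. `SUFFIXES`, `split_provider_name` and `build_npi_query` are hypothetical helpers, not the functions used earlier in this notebook; the query parameters are copied from the URLs printed in the log, and the example name is made up.

```python
from urllib.parse import urlencode

# Hypothetical suffix list; extend it if the roster contains other forms.
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}

def split_provider_name(full_name):
    """Return (first, last), dropping any trailing generational suffix."""
    parts = full_name.replace(",", " ").split()
    while len(parts) > 1 and parts[-1].lower() in SUFFIXES:
        parts.pop()
    return parts[0], parts[-1]

def build_npi_query(first, last, state="TX"):
    """Build a registry query URL from the parameters seen in the log."""
    params = {
        "first_name": first,
        "last_name": last,
        "address_purpose": "LOCATION",
        "state": state,
        "version": "2.1",
    }
    return "https://npiregistry.cms.hhs.gov/api/?" + urlencode(params)

# Made-up name, for illustration only.
first, last = split_provider_name("Jane Q. Doe Jr.")  # -> ('Jane', 'Doe')
url = build_npi_query(first, last)
```

This only handles suffixes; hyphenated or multi-part last names that fail to match would still need a manual check.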

#### Data Carpentry

In [6]:
# keep only the columns needed for the roster join
keep_cols = [
    'basic.first_name', 'basic.last_name', 'number', 'desc',
    'address_1', 'address_2', 'address_purpose', 'city', 'state',
    'postal_code', 'telephone_number', 'fax_number', 'basic.enumeration_date',
]
npi_specs = npi_specs[keep_cols]
npi_specs

Unnamed: 0,basic.first_name,basic.last_name,number,desc,address_1,address_2,address_purpose,city,state,postal_code,telephone_number,fax_number,basic.enumeration_date
0,ADAM,ADLER,1104141886,Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06
1,ADAM,ADLER,1104141886,Student in an Organized Health Care Education/...,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06
2,ADAM,ADLER,1104141886,Anesthesiology Pediatric Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06
3,ADAM,ADLER,1104141886,Anesthesiology,2 GREENWAY PLZ,SUITE 300,MAILING,HOUSTON,,770460297,832-828-3660,,2010-04-06
4,ADAM,ADLER,1104141886,Student in an Organized Health Care Education/...,2 GREENWAY PLZ,SUITE 300,MAILING,HOUSTON,,770460297,832-828-3660,,2010-04-06
5,ADAM,ADLER,1104141886,Anesthesiology Pediatric Anesthesiology,2 GREENWAY PLZ,SUITE 300,MAILING,HOUSTON,,770460297,832-828-3660,,2010-04-06
6,MELANIE,ALO,1689671240,Anesthesiology,6621 FANNIN,,LOCATION,HOUSTON,,77030,832-824-5800,832-825-5801,2005-06-28
7,MELANIE,ALO,1689671240,Anesthesiology,TWO GREENWAY PLAZA,SUITE 900,MAILING,HOUSTON,,77046,713-798-1750,713-798-1187,2005-06-28
8,MARC,ANDERS,1053708503,General Acute Care Hospital Children,6621 FANNIN ST,SECTION OF CRITICAL CARE,LOCATION,HOUSTON,,770302358,832-824-1000,,2015-04-22
9,MARC,ANDERS,1053708503,Pediatrics Pediatric Critical Care Medicine,6621 FANNIN ST,SECTION OF CRITICAL CARE,LOCATION,HOUSTON,,770302358,832-824-1000,,2015-04-22


In [7]:
# keep only practice-location rows (drop the MAILING addresses)
npi_specs_location = npi_specs.loc[npi_specs['address_purpose'] == 'LOCATION']
npi_specs_location.head()

Unnamed: 0,basic.first_name,basic.last_name,number,desc,address_1,address_2,address_purpose,city,state,postal_code,telephone_number,fax_number,basic.enumeration_date
0,ADAM,ADLER,1104141886,Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06
1,ADAM,ADLER,1104141886,Student in an Organized Health Care Education/...,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06
2,ADAM,ADLER,1104141886,Anesthesiology Pediatric Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06
6,MELANIE,ALO,1689671240,Anesthesiology,6621 FANNIN,,LOCATION,HOUSTON,,77030,832-824-5800,832-825-5801,2005-06-28
8,MARC,ANDERS,1053708503,General Acute Care Hospital Children,6621 FANNIN ST,SECTION OF CRITICAL CARE,LOCATION,HOUSTON,,770302358,832-824-1000,,2015-04-22


In [8]:
# take an explicit copy so adding columns does not raise SettingWithCopyWarning
npi_specs_location = npi_specs_location.copy()
# title-case the NPI names so they match the roster's FirstName/LastName columns
npi_specs_location['FirstName'] = npi_specs_location['basic.first_name'].str.title()
npi_specs_location['LastName'] = npi_specs_location['basic.last_name'].str.title()

npi_specs_location.head()

Unnamed: 0,basic.first_name,basic.last_name,number,desc,address_1,address_2,address_purpose,city,state,postal_code,telephone_number,fax_number,basic.enumeration_date,FirstName,LastName
0,ADAM,ADLER,1104141886,Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06,Adam,Adler
1,ADAM,ADLER,1104141886,Student in an Organized Health Care Education/...,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06,Adam,Adler
2,ADAM,ADLER,1104141886,Anesthesiology Pediatric Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608,832-824-1000,,2010-04-06,Adam,Adler
6,MELANIE,ALO,1689671240,Anesthesiology,6621 FANNIN,,LOCATION,HOUSTON,,77030,832-824-5800,832-825-5801,2005-06-28,Melanie,Alo
8,MARC,ANDERS,1053708503,General Acute Care Hospital Children,6621 FANNIN ST,SECTION OF CRITICAL CARE,LOCATION,HOUSTON,,770302358,832-824-1000,,2015-04-22,Marc,Anders
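
The LOCATION rows above still repeat each provider once per taxonomy description (three rows for NPI 1104141886), so a name-based join will duplicate roster rows. Here is a rough sketch of collapsing to one row per NPI before merging, assuming it is acceptable to keep the first value of every other column. `collapse_taxonomies` is a hypothetical helper, not part of the original notebook, and the demo frame reuses a few values from the output above.

```python
import pandas as pd

def collapse_taxonomies(df):
    """One row per NPI number; distinct taxonomy descriptions joined with '; '."""
    other_cols = [c for c in df.columns if c not in ("number", "desc")]
    agg = {c: "first" for c in other_cols}  # assumption: first value is representative
    agg["desc"] = lambda s: "; ".join(pd.unique(s.dropna()))
    return df.groupby("number", as_index=False, sort=False).agg(agg)

# Small demo frame built from values shown in the output above.
demo = pd.DataFrame({
    "number": [1104141886, 1104141886, 1689671240],
    "desc": [
        "Anesthesiology",
        "Anesthesiology Pediatric Anesthesiology",
        "Anesthesiology",
    ],
    "FirstName": ["Adam", "Adam", "Melanie"],
    "LastName": ["Adler", "Adler", "Alo"],
})
one_per_npi = collapse_taxonomies(demo)
```

Whether to collapse or keep one row per taxonomy depends on how the output file will be used; the merge below keeps the uncollapsed rows.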


In [9]:
# join NPIs back to the affiliation roster; the left join keeps providers with no NPI match
Texas_childrens_roster_npis = pd.merge(tx_childrens, npi_specs_location, how = 'left', on = ['FirstName', 'LastName'])
Texas_childrens_roster_npis.head(20)

Unnamed: 0,Name,Specialty,site,Location,OfficeAddress,OfficeCity,OfficePhone,OfficeState,OfficeZip,Degree,...,desc,address_1,address_2,address_purpose,city,state,postal_code,telephone_number,fax_number,basic.enumeration_date
0,Adam C. Adler,Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],"MS, MD, FAAP",...,Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608.0,832-824-1000,,2010-04-06
1,Adam C. Adler,Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],"MS, MD, FAAP",...,Student in an Organized Health Care Education/...,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608.0,832-824-1000,,2010-04-06
2,Adam C. Adler,Anesthesiology,/find-a-doctor/adam-c-adler-ms-md-faap,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],"MS, MD, FAAP",...,Anesthesiology Pediatric Anesthesiology,6701 FANNIN ST,,LOCATION,HOUSTON,,770302608.0,832-824-1000,,2010-04-06
3,Dheeraj Ahuja,Anesthesiology,/find-a-doctor/dheeraj-ahuja-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],MD,...,,,,,,,,,,
4,Titilopemi A.O. Aina,Anesthesiology,/find-a-doctor/titilopemi-ao-aina-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],"MD, MPH",...,,,,,,,,,,
5,Ayse Akcan-Arikan,Anesthesiology,/find-a-doctor/ayse-akcan-arikan-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],['832-826-6230'],['TX'],['77030'],MD,...,,,,,,,,,,
6,Melanie Jeanne Alo,Anesthesiology,/find-a-doctor/melanie-jeanne-alo-md,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],MD,...,Anesthesiology,6621 FANNIN,,LOCATION,HOUSTON,,77030.0,832-824-5800,832-825-5801,2005-06-28
7,Marc Michael Anders,Anesthesiology,/find-a-doctor/marc-michael-anders-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],['832-826-6230'],['TX'],['77030'],MD,...,General Acute Care Hospital Children,6621 FANNIN ST,SECTION OF CRITICAL CARE,LOCATION,HOUSTON,,770302358.0,832-824-1000,,2015-04-22
8,Marc Michael Anders,Anesthesiology,/find-a-doctor/marc-michael-anders-md,Texas Medical Center,['6651 Main Street--MC E1420'],['Houston'],['832-826-6230'],['TX'],['77030'],MD,...,Pediatrics Pediatric Critical Care Medicine,6621 FANNIN ST,SECTION OF CRITICAL CARE,LOCATION,HOUSTON,,770302358.0,832-824-1000,,2015-04-22
9,Dean B. Andropoulos,Anesthesiology,/find-a-doctor/dean-b-andropoulos-md-mhcm,Texas Medical Center,['6621 Fannin Street--Suite A3300'],['Houston'],['832-824-5800'],['TX'],['77030'],"MD, MHCM",...,Anesthesiology,6621 FANNIN ST,,LOCATION,HOUSTON,,770302303.0,832-824-5800,832-825-5801,2005-06-30
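
Rows such as Dheeraj Ahuja come back from the join with every NPI column empty. A small sketch of flagging those rows explicitly, using the `indicator` option of `pd.merge`; `merge_with_audit` and the `npi_matched` column are hypothetical names, and the two-row demo frames only mirror the shape of the data above.

```python
import pandas as pd

def merge_with_audit(roster, npis, keys=("FirstName", "LastName")):
    """Left-join NPI rows onto the roster and flag rows that found no match."""
    merged = pd.merge(roster, npis, how="left", on=list(keys), indicator=True)
    merged["npi_matched"] = merged["_merge"] == "both"
    return merged.drop(columns="_merge")

# Demo frames shaped like the roster and the NPI lookup results.
roster = pd.DataFrame({
    "FirstName": ["Adam", "Dheeraj"],
    "LastName": ["Adler", "Ahuja"],
})
npis = pd.DataFrame({
    "FirstName": ["Adam"],
    "LastName": ["Adler"],
    "number": [1104141886],
})
audited = merge_with_audit(roster, npis)
unmatched = audited.loc[~audited["npi_matched"], ["FirstName", "LastName"]]
```

The `unmatched` frame gives a list of providers to look up by hand. The unmatched rows are also why `postal_code` prints as a float (`770302608.0`) after the join: the missing values force the integer column to float, so casting it to string before merging would keep the ZIP codes intact.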


In [11]:
# write the merged roster out to an Excel file
from pandas import ExcelWriter
filename = 'TexasChildrensNPIScrape.xlsx'

with ExcelWriter(filename) as writer:
    Texas_childrens_roster_npis.to_excel(writer, sheet_name='NPI_with_spec')