## Single-Page Tabular Scrape

- <a href="https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AllRecordsAction.action">On this site</a>, scrape all the doctors' information on page 292.
- Export the content to a CSV file called ```page_292.csv```.

In [2]:
## import libraries
import pandas as pd
import requests

In [14]:
## request the page and check the status code
url = "https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AllRecordsAction.action"
response = requests.get(url)
response.status_code

200
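
A status code of 200 means the request succeeded. As a minimal sketch, that check can be made explicit so a failed request stops the notebook early instead of passing an error page to pandas. `fetch_html` is a hypothetical helper name, and the 30-second timeout is an assumed value rather than anything the site requires.

```python
import requests


def fetch_html(url, timeout=30, get=requests.get):
    """Return the page body, raising if the server reports an error.

    `get` is injectable only so the helper can be tried without a network.
    """
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` would then stand in for `requests.get(url)` followed by a manual look at `response.status_code`.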

In [29]:
## use Pandas to read tables on page
tables = pd.read_html(response.text)
tables

[                                  Physician Last Name  \
 0                                           Uzochukwu   
 1                                         Bendelstein   
 2                                             Gregory   
 3                                             Barnard   
 4                                           Loesevitz   
 5                                             Rankine   
 6                                         Vaccariello   
 7                                               Ramos   
 8                                               Yonan   
 9                                             Sherman   
 10                                               Weis   
 11                                            Delaney   
 12                                          Francisco   
 13                                           Kollisch   
 14                                          Putterman   
 15                                            Sudberg   
 16           

In [30]:
## let's look at the second table (the first table includes other page elements)
df = tables[1]
df

| | Physician Last Name | Physician First Name | Physician Middle Name | License Number | License Type | Effective Date | Date Updated | Year of Birth |
|---|---|---|---|---|---|---|---|---|
| 0 | Uzochukwu | Nzeadibenma | O. | 231276 | MD | 09/28/2023 | 09/21/2023 | 1973 |
| 1 | Bendelstein | Harold | L. | 160938 | MD | 09/26/2023 | 09/19/2023 | 1951 |
| 2 | Gregory | Joyce | | 310488 | MD | 09/25/2023 | 09/18/2023 | 1968 |
| 3 | Barnard | Morris | | 195220 | MD | 09/21/2023 | 09/14/2023 | 1963 |
| 4 | Loesevitz | Arthur | Woldemar Roderich | 182467 | MD | 09/21/2023 | 09/14/2023 | 1953 |
| 5 | Rankine | Kirk | P. | 294106 | MD | 09/20/2023 | 09/13/2023 | 1969 |
| 6 | Vaccariello | Charles | J. | 104837 | MD | 09/20/2023 | 09/13/2023 | 1941 |
| 7 | Ramos | Erwin | F. | 254719 | MD | 09/12/2023 | 09/05/2023 | 1970 |
| 8 | Yonan | Abdullah | M. | 209041 | MD | 08/22/2023 | 09/01/2023 | 1956 |
| 9 | Sherman | Lawrence | M. | 129824 | MD | 09/07/2023 | 08/31/2023 | 1949 |
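
`pd.read_html` returns a list with one DataFrame per `<table>` it finds, which is why an index such as `tables[1]` is needed. The sketch below shows one way to inspect that list before picking an index. The HTML is made up for illustration and is not the real page's markup, and it assumes an HTML parser such as `lxml` is installed.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for a page with a layout table and a data table.
sample_html = """
<table>
  <thead><tr><th>Search</th></tr></thead>
  <tbody><tr><td>navigation</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Physician Last Name</th><th>License Number</th></tr></thead>
  <tbody><tr><td>Example</td><td>123456</td></tr></tbody>
</table>
"""

# Wrapping the string in StringIO avoids the deprecation warning newer pandas
# versions emit when literal HTML is passed directly.
sample_tables = pd.read_html(StringIO(sample_html))

# Print each table's position, shape, and columns to decide which one to keep.
for position, table in enumerate(sample_tables):
    print(position, table.shape, list(table.columns))
```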


In [47]:
## use pandas to write to csv file
df.to_csv("page_292.csv", encoding="UTF-8", index=False)
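
A quick way to confirm an export like this is to read the file back. The sketch below uses made-up rows and a temporary folder (both assumptions for illustration) and shows that `index=False` keeps the row index out of the file, so no `Unnamed: 0` column appears on reload.

```python
import os
import tempfile

import pandas as pd

# Made-up rows; the real DataFrame comes from the scrape above.
sample = pd.DataFrame(
    {"Physician Last Name": ["Example", "Sample"], "License Number": [123456, 654321]}
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample_page.csv")
    sample.to_csv(path, encoding="UTF-8", index=False)
    reloaded = pd.read_csv(path, encoding="UTF-8")

# The reloaded frame has the same columns and row count as the original.
print(list(reloaded.columns), len(reloaded))
```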

## Multipage Tabular Scrape

- <a href="https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AllRecordsAction.action">On this site</a>, scrape all doctors whose last names begin with "P".
- Export the content into a CSV file called ```md_P.csv```.


In [32]:
## base url of site to scrape
base_url = "https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch="

In [43]:
## lowercase letters a-z (character codes 97 through 122)
alphabetlist = [chr(i) for i in range(97, 123)]
alphabetlist

['a',
 'b',
 'c',
 'd',
 'e',
 'f',
 'g',
 'h',
 'i',
 'j',
 'k',
 'l',
 'm',
 'n',
 'o',
 'p',
 'q',
 'r',
 's',
 't',
 'u',
 'v',
 'w',
 'x',
 'y',
 'z']

In [75]:
## build one search URL per letter using a list comprehension
urls_lc = [f"{base_url}{alphabet}*" for alphabet in alphabetlist]
urls_lc

['https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=a*',
 'https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=b*',
 'https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=c*',
 'https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=d*',
 'https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=e*',
 'https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=f*',
 'https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=g*',
 'https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/AlphabetSearchAction.action?alpbhabetSearch=h*',
 'https:
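
As a small variation on the two cells above, the letters can come from the standard library's `string.ascii_lowercase` instead of character codes. `build_letter_urls` and `search_url` are hypothetical names for this sketch; the URL repeats the endpoint from `base_url`, including its parameter spelling, so the block runs on its own.

```python
from string import ascii_lowercase

# Same search endpoint as base_url above, repeated to keep this block standalone.
search_url = (
    "https://apps.health.ny.gov/pubdoh/professionals/doctors/conduct/factions/"
    "AlphabetSearchAction.action?alpbhabetSearch="
)


def build_letter_urls(letters=ascii_lowercase):
    """Return one wildcard search URL per letter."""
    return [f"{search_url}{letter}*" for letter in letters]


all_urls = build_letter_urls()   # every letter, a through z
p_only = build_letter_urls("p")  # just the letter named in the task
print(len(all_urls), p_only[0])
```

Passing a single letter keeps the request count down when only one initial is needed.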

In [44]:
## Let's import the required libraries to create a delay
from random import randrange
import time

In [76]:
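## scraping
busted_links = []  # URLs that did not yield the expected table
df_all = []  # one DataFrame per successfully scraped URL
total_links = len(urls_lc)
counter = 1

for url in urls_lc:
    print(f"Scrapping {counter} of {total_links}")
    counter += 1
    response = requests.get(url)
    try:
        # the second table on the page holds the physician records
        df = pd.read_html(response.text)[1]
    except Exception:
        # catch parsing problems without swallowing KeyboardInterrupt
        print(f"Oh no, {url} returned {response.status_code}")
        busted_links.append(url)
    else:
        df_all.append(df)
        print(df)
    # pause a random 5-11 seconds so requests are not sent back to back
    snoozer = randrange(5, 12)
    print(f"Snoozing for {snoozer} seconds before next scrape")
    time.sleep(snoozer)
print("Done scrapping all provided links")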

Scrapping 1 of 26
   Physician Last Name Physician First Name Physician Middle Name  \
0       AR Medical Art                 P.C.                   NaN   
1                Aaron               Joseph                   NaN   
2               Aarons                 Mark                  Gold   
3                Abadi             Jamsheed                     S   
4               Abbasi                Abdul                Hafeez   
5              Abbassi                Jadan                   NaN   
6              Abbassi                Samih                     R   
7               Abboud                Aiman               Michael   
8              Abdalah                 Ehab                    F.   
9             Abdel-Al               Naglaa         Zidan Elsayed   
10        Abdel-Hameed             Mohammad           Fathi Ahmad   
11        Abdel-Hameed             Mohammad           Fathi Ahmad   
12        Abdel-Mageed              Mohamed                   NaN   
13          Abde

Scrapping 4 of 26
      Physician Last Name Physician First Name Physician Middle Name  \
0                D'Aconti                 John                   NaN   
1                 D'Amato               Thomas                   NaN   
2              D'Ambrosio              Francis                Gerard   
3              D'Ambrosio              Francis                Gerard   
4                D'Angelo               Carmen                     A   
5                 D'Anjou               Thomas                   NaN   
6                 D'Auria              Anthony                   NaN   
7                D'Emilia                 John           Constantine   
8                 D'Silva               Ashley                 James   
9                 D'Silva               Ashley                 James   
10                D'Souza                 Ivan                   NaN   
11  DWP Pain Free Medical                 P.C.                   NaN   
12                DaCosta               Gaston

Scrapping 7 of 26
   Physician Last Name Physician First Name Physician Middle Name  \
0             Gabelman              Charles                Grover   
1                Gaber                Ahmed                     H   
2           Gabinskaya              Tatyana                   NaN   
3             Gabinsky               Sergey                    N.   
4              Gabriel            Demetrios                     M   
5             Gabriels            F.Forrest                   NaN   
6             Gabriels               Firman                    F.   
7             Gabriels               Firmin                    F.   
8           Gabrielson               George                   NaN   
9              Gaerman                Moshe                   NaN   
10             Gaerman                Moshe                   NaN   
11                Gage                 Dana                   Lee   
12              Gaines                 Gary                   NaN   
13             G

Scrapping 10 of 26
   Physician Last Name Physician First Name Physician Middle Name  \
0         J.P. Medical                 P.C.                   NaN   
1               Jabari              Jawanza                    N.   
2                Jaber                Ahmad                     M   
3            Jackowitz              Michael                     S   
4              Jackson              Bertram                     V   
5              Jackson                Bryce                     V   
6              Jackson                Harry                     E   
7              Jackson               Joseph                     A   
8              Jackson                 Mark                    H.   
9              Jackson                 Mark                    H.   
10             Jackson                 Mary                     J   
11             Jackson                Oscar                     F   
12             Jackson              William                Andrew   
13             

Scrapping 13 of 26
   Physician Last Name Physician First Name Physician Middle Name  \
0     MNM Medical Care                 P.C.                   NaN   
1              Mabatid                Heidi                   NaN   
2                Mabry                 Myra                   NaN   
3            MacDonald                Glenn                  John   
4            MacDonald                Glenn                  John   
5              MacKoul                 Paul                    J.   
6           MacPherson             Geoffrey                  Alan   
7             Macaluso          Christopher                   NaN   
8            Macapagal              Zenaida                   NaN   
9             Maccabee                 Neta                   NaN   
10             Maccone               Robert                     J   
11                Mack               Jeremy                 Roger   
12           Mackenzie                Janet                     S   
13           Ma

Scrapping 16 of 26
   Physician Last Name Physician First Name Physician Middle Name  \
0                 Paal                 Adam                   NaN   
1                 Pace               Enrico                   NaN   
2                 Pace              Leonard                   NaN   
3              Pacetti              Stephen                     J   
4               Pachas               Hector                     M   
5              Pacheco                Denny                    J.   
6                Pacik                Peter                   NaN   
7                Pacis            Andresito                    B.   
8                 Pack                    A               Stephen   
9         Packianathan             Emmanuel                   NaN   
10        Packianathan             Emmanuel                   NaN   
11               Padeh                Asher                   NaN   
12               Padeh                Asher                   NaN   
13             

Scrapping 19 of 26
     Physician Last Name Physician First Name Physician Middle Name  \
0   SVS Wellcare Medical                 PLLC                   NaN   
1                  Saado                Walid                   NaN   
2                  Saado                Walid                   NaN   
3                  Saado                Walid                   NaN   
4                   Saba              Souheil                   NaN   
5                  Sabal              Gerardo                Casino   
6                  Sabal              Queenie                T.U.C.   
7                 Sabato               Ulises                 Cesar   
8                 Sabido             Benjamin                   NaN   
9                 Sabido            Frederick                   NaN   
10                 Sabir                Rafiq                   NaN   
11                  Sabo              Matthew                    J.   
12                  Sabo              Mildred             

Scrapping 22 of 26
                           Physician Last Name Physician First Name  \
0   V & G Physical Medicine and Rehabilitation                 P.C.   
1                                  Vaccariello              Charles   
2                                    Vaccarino               Robert   
3                                    Vadapalli              Maruthi   
4                                         Vaid               Mustak   
5                                       Vaidya              Kaushal   
6                                      Vaisman                 Naum   
7                                      Valbrun                 Leon   
8                                   Valdiviezo                Sonia   
9                                      Valente             Domenico   
10                                   Valentine                 John   
11                    Valiant Medical Services                 P.C.   
12                                     Vallala           M

Scrapping 26 of 26
   Physician Last Name Physician First Name Physician Middle Name  \
0              Zaccheo               Jerald                     D   
1            Zachariah              Abraham                   NaN   
2               Zachel             Gretchen                   NaN   
3               Zacher                Allan                   NaN   
4               Zackin                Henry                     J   
5               Zackin                Henry                     J   
6               Zackin                Henry                     J   
7                Zadeh               Mehran                   NaN   
8                Zafar                Kamal                   NaN   
9                Zafar                Kamal                   NaN   
10               Zafar                Syeda                   NaN   
11                Zahl              Kenneth                   NaN   
12              Zahler               Gideon                   NaN   
13             
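
The loop above combines three ideas: try each URL, record failures instead of stopping, and pause between requests. A minimal sketch of that pattern as a reusable function follows. `scrape_all`, `fetch`, and `sleep` are hypothetical names, and the fetch and sleep functions are passed in so the logic can be tried without touching the network.

```python
import time
from random import uniform


def scrape_all(urls, fetch, min_pause=5.0, max_pause=12.0, sleep=time.sleep):
    """Call fetch(url) for each URL, collecting results and failures."""
    results, failed = [], []
    for position, url in enumerate(urls, start=1):
        try:
            results.append(fetch(url))
        except Exception as error:
            # keep going, but remember which URL failed and why
            failed.append((url, repr(error)))
        if position < len(urls):
            # random pause between requests; skipped after the last one
            sleep(uniform(min_pause, max_pause))
    return results, failed


def fake_fetch(url):
    """Stand-in for a real request; fails on one URL to show the error path."""
    if url == "bad":
        raise ValueError("no tables found")
    return url.upper()


results, failed = scrape_all(["a", "bad", "c"], fake_fetch, sleep=lambda seconds: None)
print(results, failed)
```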

In [77]:
## which link broke?
busted_links

[]

In [78]:
## what does df_all hold?
df_all

[   Physician Last Name Physician First Name Physician Middle Name  \
 0       AR Medical Art                 P.C.                   NaN   
 1                Aaron               Joseph                   NaN   
 2               Aarons                 Mark                  Gold   
 3                Abadi             Jamsheed                     S   
 4               Abbasi                Abdul                Hafeez   
 5              Abbassi                Jadan                   NaN   
 6              Abbassi                Samih                     R   
 7               Abboud                Aiman               Michael   
 8              Abdalah                 Ehab                    F.   
 9             Abdel-Al               Naglaa         Zidan Elsayed   
 10        Abdel-Hameed             Mohammad           Fathi Ahmad   
 11        Abdel-Hameed             Mohammad           Fathi Ahmad   
 12        Abdel-Mageed              Mohamed                   NaN   
 13          Abdelja

In [79]:
## convert to a single df rather than a list of df
df = pd.concat(df_all, ignore_index=True)
df

| | Physician Last Name | Physician First Name | Physician Middle Name | License Number | License Type | Effective Date | Date Updated | Year of Birth |
|---|---|---|---|---|---|---|---|---|
| 0 | AR Medical Art | P.C. | | 207165 | | 12/08/2010 | 12/01/2010 | |
| 1 | Aaron | Joseph | | 72800 | MD | 01/13/1999 | | 1927.0 |
| 2 | Aarons | Mark | Gold | 161530 | MD | 12/13/2005 | 12/06/2005 | 1958.0 |
| 3 | Abadi | Jamsheed | S | 136045 | MD | 08/14/2013 | 08/07/2013 | 1939.0 |
| 4 | Abbasi | Abdul | Hafeez | 183025 | MD | 04/13/2004 | 04/15/2004 | 1955.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 496 | Zaki | Omar | S | 114194 | MD | 10/13/1994 | | |
| 497 | Zales | Michael | | 95317 | MD | 07/19/1994 | | |
| 498 | Zalmanov | Mikhail | Isaakovich | 158429 | MD | 11/08/2005 | 11/02/2005 | 1946.0 |
| 499 | Zalmanov | Mikhail | I | 158429 | MD | 03/06/2013 | 02/27/2013 | 1946.0 |
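
Two things in the combined output are worth a small illustration: `ignore_index=True` gives the result one continuous row index, and a column with missing values is stored as floats, which is why years appear as `1927.0`. The sketch below uses made-up rows; the conversion to pandas' nullable `Int64` type is an optional cleanup suggested here, not a step from the original notebook.

```python
import pandas as pd

# Two made-up frames standing in for per-letter scrape results.
first = pd.DataFrame({"Physician Last Name": ["Example"], "Year of Birth": [1950.0]})
second = pd.DataFrame({"Physician Last Name": ["Sample"], "Year of Birth": [float("nan")]})

# ignore_index=True renumbers rows 0..n-1 instead of repeating each frame's index
combined = pd.concat([first, second], ignore_index=True)

# Int64 (capital I) is a nullable integer type, so missing years stay missing
combined["Year of Birth"] = combined["Year of Birth"].astype("Int64")
print(combined)
```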


In [80]:
## export to csv
df.to_csv("md_P.csv", index=False, encoding="UTF-8")
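
The task asks only for doctors whose last names begin with "P", while the loop above collects every letter. One possible way to narrow the combined data before exporting is sketched below with made-up rows. The case-insensitive match is an assumption about the desired behaviour, and `p_doctors` is a hypothetical name.

```python
import pandas as pd

# Made-up rows standing in for the combined scrape result.
sample = pd.DataFrame(
    {
        "Physician Last Name": ["Aaron", "Pace", "pacheco", "Zafar"],
        "License Number": [72800, 111111, 222222, 333333],
    }
)

# Build a boolean mask; na=False treats missing names as non-matches
starts_with_p = sample["Physician Last Name"].str.upper().str.startswith("P", na=False)
p_doctors = sample[starts_with_p].reset_index(drop=True)
print(p_doctors)
```

Calling `p_doctors.to_csv("md_P.csv", index=False, encoding="UTF-8")` would then write only the matching rows.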