## Using the Requests module to get a file

Documentation for Requests is available at https://requests.readthedocs.io/en/latest/

This demonstration simply requests a file from the HHS Open Data portal: https://healthdata.gov/dataset/electronic-health-record-ehr-incentive-program-payments-eligible-hospitals

In this example, we get the file from HHS, inspect some interesting information about it, then write the data to a local file.

In [1]:
import requests

In [2]:
r = requests.get('http://dhcs-chhsagency.opendata.arcgis.com/datasets/8e4f3a0c75b9424d888d11c1f949cc32_0.csv?outSR={%22latestWkid%22:3857,%22wkid%22:102100}')

In [3]:
type(r)

requests.models.Response

In [4]:
r.status_code

200
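A status code of 200 means the request succeeded. Outside a notebook, where nobody is looking at the value, it may be worth failing fast when the server returns an error instead. The sketch below is one way to do that; it builds Response objects by hand so it runs without network access, and the helper name `ensure_ok` is hypothetical. `raise_for_status()` and `requests.HTTPError` are part of Requests.

```python
import requests


def ensure_ok(response):
    """Return the response unchanged, or raise requests.HTTPError on a 4xx/5xx status."""
    response.raise_for_status()
    return response


# Hand-built responses stand in for real ones so the sketch runs offline.
ok = requests.Response()
ok.status_code = 200

missing = requests.Response()
missing.status_code = 404

ensure_ok(ok)  # a 200 passes silently

try:
    ensure_ok(missing)
    failed = False
except requests.HTTPError:
    failed = True

print(failed)
```

With a live request, the same check would be a call to `r.raise_for_status()` right after `requests.get(...)`; passing a `timeout=` argument to `requests.get` is also a common precaution.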

In [5]:
len(r.text)

69065

In [7]:
print(r.text[0:1000])

﻿X,Y,Provider_Name,NPI,CCN,Medicaid_EP_Hospital_Type,Street_Address,City,County,State,Zip_Code,Payment_Year_Number,Program_Type,Total_payments,Last_Program_Year,Last_Payment_Year,Last_Payment_Criteria,Most_Recent_Disbursement_Amount,LONGITUDE,LATITUDE,FID
-122.435802,37.769062,Sutter Bay Hospitals,1659439834,50008,Acute Care Hospitals,CASTRO & DUBOCE STS,SAN FRANCISCO,San Francisco,CA,94114,3,Medicare/Medicaid,638474,2015,2017,MU,70942,-122.435802,37.769062,1
-117.913856,33.774499,PRIME HEALTHCARE SERVICES - GARDEN GROVE LLC,1659538858,50230,Acute Care Hospitals,12601 GARDEN GROVE BLVD,GARDEN GROVE,Orange,CA,92843,4,Medicare/Medicaid,3947489,2014,2015,MU,394749,-117.913856,33.774499,2
-117.262672,34.539918,ST MARY MEDICAL CENTER,1669456299,50300,Acute Care Hospitals,18300 US HIGHWAY 18,APPLE VALLEY,San Bernardino,CA,92307,3,Medicare/Medicaid,3062645,2014,2015,MU,340294,-117.262672,34.539918,3
-120.045618,36.945447,MADERA COMMUNITY HOSPITAL,1669673646,50568,Acute Care Hospitals,1250 E A

In [6]:
r.text[0:1000]

'\ufeffX,Y,Provider_Name,NPI,CCN,Medicaid_EP_Hospital_Type,Street_Address,City,County,State,Zip_Code,Payment_Year_Number,Program_Type,Total_payments,Last_Program_Year,Last_Payment_Year,Last_Payment_Criteria,Most_Recent_Disbursement_Amount,LONGITUDE,LATITUDE,FID\n-122.435802,37.769062,Sutter Bay Hospitals,1659439834,50008,Acute Care Hospitals,CASTRO & DUBOCE STS,SAN FRANCISCO,San Francisco,CA,94114,3,Medicare/Medicaid,638474,2015,2017,MU,70942,-122.435802,37.769062,1\n-117.913856,33.774499,PRIME HEALTHCARE SERVICES - GARDEN GROVE LLC,1659538858,50230,Acute Care Hospitals,12601 GARDEN GROVE BLVD,GARDEN GROVE,Orange,CA,92843,4,Medicare/Medicaid,3947489,2014,2015,MU,394749,-117.913856,33.774499,2\n-117.262672,34.539918,ST MARY MEDICAL CENTER,1669456299,50300,Acute Care Hospitals,18300 US HIGHWAY 18,APPLE VALLEY,San Bernardino,CA,92307,3,Medicare/Medicaid,3062645,2014,2015,MU,340294,-117.262672,34.539918,3\n-120.045618,36.945447,MADERA COMMUNITY HOSPITAL,1669673646,50568,Acute Care Hospital

In [8]:
with open('nadac.csv','w') as f:
    f.write(r.text)
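The first character of `r.text` is `'\ufeff'`, a byte order mark (BOM), as the unprinted string above shows. Written out unchanged, it becomes part of the first column name when the file is read back. A minimal sketch of one way around this, assuming a shortened stand-in for `r.text` and a temporary file (both hypothetical): reading with the `utf-8-sig` codec drops a leading BOM.

```python
import csv
import os
import tempfile

# A shortened stand-in for r.text, including the leading byte order mark.
sample_text = "\ufeffX,Y,Provider_Name\n-122.435802,37.769062,Sutter Bay Hospitals\n"

path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# Write with an explicit encoding so the result does not depend on the platform default.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(sample_text)

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
with open(path, encoding="utf-8-sig", newline="") as f:
    header = next(csv.reader(f))

print(header)  # ['X', 'Y', 'Provider_Name']
```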

In [10]:
type(r.headers)

requests.structures.CaseInsensitiveDict

In [9]:
import json
print(json.dumps(dict(r.headers), indent=4))

{
    "Content-disposition": "attachment; filename=Electronic_Health_Record_EHR_Incentive_Program_Payments_to_Eligible_Hospitals.csv",
    "Content-Encoding": "gzip",
    "Content-Type": "text/csv; charset=utf-8",
    "Date": "Wed, 26 Feb 2020 00:34:30 GMT",
    "ETag": "\"996569756dccc998a491bb0b489bb7a4\"",
    "Last-Modified": "Fri, 24 Aug 2018 21:36:33 GMT",
    "Server": "openresty",
    "Vary": "Accept-Encoding",
    "x-amz-meta-retrieved_at": "Fri Aug 24 2018 21:36:32 GMT+0000 (UTC)",
    "Content-Length": "21684",
    "Connection": "keep-alive"
}
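The headers also explain two things about this response. The Content-Length of 21684 is smaller than `len(r.text)` because it counts the gzip-compressed bytes sent over the wire (see Content-Encoding), which Requests decompresses automatically. The Content-disposition header carries the file name the server suggests. The sketch below pulls that name out with the standard library's `email.message.Message`; the header value is copied from the output above so the code runs offline, and using `Message` for this is one option, not something Requests prescribes.

```python
from email.message import Message

# Header value copied from the response headers shown above.
content_disposition = (
    "attachment; filename=Electronic_Health_Record_EHR_Incentive_Program_"
    "Payments_to_Eligible_Hospitals.csv"
)

msg = Message()
msg["Content-Disposition"] = content_disposition

# get_filename() reads the filename parameter of the Content-Disposition header.
filename = msg.get_filename()
print(filename)
```

With a live response, the value would come from `r.headers['content-disposition']`; the lookup works in any letter case because the headers are a CaseInsensitiveDict.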


### Total payments in this file?

In [11]:
import csv

In [12]:
total = 0

with open('nadac.csv') as f:
    reader = csv.reader(f)

    header = next(reader)
    print(header)
    payments_idx = header.index('Total_payments')
    
    for record in reader:
        total += int(record[payments_idx])



['\ufeffX', 'Y', 'Provider_Name', 'NPI', 'CCN', 'Medicaid_EP_Hospital_Type', 'Street_Address', 'City', 'County', 'State', 'Zip_Code', 'Payment_Year_Number', 'Program_Type', 'Total_payments', 'Last_Program_Year', 'Last_Payment_Year', 'Last_Payment_Criteria', 'Most_Recent_Disbursement_Amount', 'LONGITUDE', 'LATITUDE', 'FID']


In [13]:
print("CA hospitals have received ${:,.2f} in payments.".format(total))

CA hospitals have received $758,058,589.00 in payments.
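The same sum can be written with `csv.DictReader`, which looks columns up by name and so avoids the `header.index(...)` step. This is a minimal sketch on three records taken from the preview earlier, reduced to three columns and read from an in-memory string instead of the downloaded file, so the total covers only those three rows.

```python
import csv
import io

# Three columns from the first three records shown earlier; the real file has 21 columns.
sample = io.StringIO(
    "Provider_Name,County,Total_payments\n"
    "Sutter Bay Hospitals,San Francisco,638474\n"
    "PRIME HEALTHCARE SERVICES - GARDEN GROVE LLC,Orange,3947489\n"
    "ST MARY MEDICAL CENTER,San Bernardino,3062645\n"
)

# DictReader maps each row to {column name: value}, so no index lookup is needed.
sample_total = sum(int(row["Total_payments"]) for row in csv.DictReader(sample))

print("Sample total: ${:,.2f}".format(sample_total))
```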


## Reading internet files with Pandas

When you pass an HTTP URL to Pandas' read_csv, it fetches the data from the internet itself, so no separate download step is needed.

https://pandas.pydata.org/pandas-docs/version/0.23.4/io.html


In [14]:
import pandas as pd

In [15]:
df = pd.read_csv('http://dhcs-chhsagency.opendata.arcgis.com/datasets/8e4f3a0c75b9424d888d11c1f949cc32_0.csv')

In [16]:
df.head()

| | X | Y | Provider_Name | NPI | CCN | Medicaid_EP_Hospital_Type | Street_Address | City | County | State | ... | Payment_Year_Number | Program_Type | Total_payments | Last_Program_Year | Last_Payment_Year | Last_Payment_Criteria | Most_Recent_Disbursement_Amount | LONGITUDE | LATITUDE | FID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -122.435802 | 37.769062 | Sutter Bay Hospitals | 1659439834 | 50008 | Acute Care Hospitals | CASTRO & DUBOCE STS | SAN FRANCISCO | San Francisco | CA | ... | 3 | Medicare/Medicaid | 638474 | 2015 | 2017 | MU | 70942 | -122.435802 | 37.769062 | 1 |
| 1 | -117.913856 | 33.774499 | PRIME HEALTHCARE SERVICES - GARDEN GROVE LLC | 1659538858 | 50230 | Acute Care Hospitals | 12601 GARDEN GROVE BLVD | GARDEN GROVE | Orange | CA | ... | 4 | Medicare/Medicaid | 3947489 | 2014 | 2015 | MU | 394749 | -117.913856 | 33.774499 | 2 |
| 2 | -117.262672 | 34.539918 | ST MARY MEDICAL CENTER | 1669456299 | 50300 | Acute Care Hospitals | 18300 US HIGHWAY 18 | APPLE VALLEY | San Bernardino | CA | ... | 3 | Medicare/Medicaid | 3062645 | 2014 | 2015 | MU | 340294 | -117.262672 | 34.539918 | 3 |
| 3 | -120.045618 | 36.945447 | MADERA COMMUNITY HOSPITAL | 1669673646 | 50568 | Acute Care Hospitals | 1250 E ALMOND AVE | MADERA | Madera | CA | ... | 4 | Medicare/Medicaid | 2057365 | 2015 | 2016 | MU | 205737 | -120.045618 | 36.945447 | 4 |
| 4 | -117.117197 | 33.470664 | Temecula Valley Hospital Inc | 1679816201 | 50775 | Acute Care Hospitals | 31700 TEMECULA PKWY | TEMECULA | Riverside | CA | ... | 1 | Medicare/Medicaid | 474790 | 2016 | 2017 | AIU | 474790 | -117.117197 | 33.470664 | 5 |


In [17]:
df.shape

(328, 21)

In [18]:
print("CA hospitals have received ${:,.2f} in payments.".format(df['Total_payments'].sum()))

CA hospitals have received $758,058,589.00 in payments.
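Once the data is in a DataFrame, breakdowns of the total are short to write. The sketch below groups payments by county. It assumes an in-memory excerpt of the five rows shown by `df.head()`, reduced to three columns, so it runs without network access; `read_csv` accepts file-like objects as well as URLs. The variable names are hypothetical.

```python
import io

import pandas as pd

# Excerpt of the five rows shown by df.head(), reduced to three columns.
excerpt = io.StringIO(
    "Provider_Name,County,Total_payments\n"
    "Sutter Bay Hospitals,San Francisco,638474\n"
    "PRIME HEALTHCARE SERVICES - GARDEN GROVE LLC,Orange,3947489\n"
    "ST MARY MEDICAL CENTER,San Bernardino,3062645\n"
    "MADERA COMMUNITY HOSPITAL,Madera,2057365\n"
    "Temecula Valley Hospital Inc,Riverside,474790\n"
)

sample_df = pd.read_csv(excerpt)

# Sum payments per county, largest first.
by_county = (
    sample_df.groupby("County")["Total_payments"]
    .sum()
    .sort_values(ascending=False)
)

print(by_county)
```

On the full DataFrame, the same `groupby` call on `df` would give per-county totals for all 328 rows.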
