# Download via HTTP

## Using Async

One option is the `aiohttp` library, which is async-only.


In [1]:
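import aiohttp

url = 'https://geoconnex-demo-pages.internetofwater.dev/collections/demo-gpkg/items?f=json&limit=1000'
# Open the session as a context manager so it is closed when the download finishes.
async with aiohttp.ClientSession() as session:
    async with session.get(url, allow_redirects=True) as resp:
        with open("nldi_sample_aiohttp.geojson", 'wb') as fd:
            # Write the body to disk in 1 KiB chunks instead of loading it all at once.
            async for chunk in resp.content.iter_chunked(1024):
                fd.write(chunk)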

## Using httpx
A higher-level option, which supports both sync and async use, is `httpx`.

In [1]:

import httpx
with httpx.Client() as client:
    r = client.get('https://geoconnex-demo-pages.internetofwater.dev/collections/demo-gpkg/items?f=json&limit=100')
r.json()

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'geometry': {'type': 'Point',
    'coordinates': [-69.28754452464723, 46.49314140574131]},
   'properties': {'fid': 1,
    'SourceAgency': None,
    'SourceDataset': None,
    'SourceFeatureURL': None,
    'FeatureType': None,
    'ReachCode': '01010002001452',
    'Measure': 64.87204415982373,
    'ReachSMDate': '2012-03-07T00:00:00.000Z',
    'AddressDate': '2022-03-01T01:26:08.000Z',
    'Catchment': 0.0,
    'HU': '',
    'OnNetwork': 1,
    'HydroAddressID': '78f8fbec-d4e9-4e72-bcf9-32136d391bbb',
    'GNIS_NAME': 'Allagash River',
    'GNIS_ID': None,
    'SnapTolerance': 250.0,
    'SnapDistance': 93.1321,
    'InSnapTolerance': 1,
    'QCTolerance': 250.0,
    'InQCTolerance': 1,
    'QCApproved': 1,
    'NHDPv2ReachCode': '01010002001002',
    'NHDPv2Measure': 0.0,
    'Latitude': 46.49333333,
    'Longitude': -69.28833333,
    'uri': 'https://geoconnex.us/iow/demo/ME00237',
    'state': 'https://geoconnex.us/r

Note that the response object can parse JSON itself: `r.json()` returns the decoded body as Python dicts and lists.
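Because the parsed result is ordinary Python data, it can be explored with plain dictionary and list operations. The sketch below counts features per `GNIS_NAME`. The `summarize_features` helper and the tiny `sample` dictionary are illustrative assumptions, hand-made to mirror the shape of the output above; they are not part of `httpx` and not real server data.

```python
from collections import Counter


def summarize_features(feature_collection):
    """Count features per GNIS_NAME in a GeoJSON FeatureCollection dict."""
    names = Counter()
    for feature in feature_collection.get("features", []):
        # 'properties' may be missing or null in GeoJSON, so fall back to {}.
        props = feature.get("properties") or {}
        names[props.get("GNIS_NAME")] += 1
    return names


# Hand-made stand-in shaped like the response shown above (assumption).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"fid": 1, "GNIS_NAME": "Allagash River"}},
        {"type": "Feature", "properties": {"fid": 2, "GNIS_NAME": None}},
    ],
}

print(summarize_features(sample))
```

With a live response, the same helper could be called as `summarize_features(r.json())`.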

`httpx` has several mechanisms for downloading.  The above uses `get()`, which fetches the entire response into memory. For large responses, it may not be possible to hold the whole thing in memory.

In our case, a `stream` is likely the best option.  This lets us read the response data in chunks.  We can either process the response as it comes in, or write it to disk for later processing.

In [4]:
with httpx.Client() as client:
    with open("nldi_sample_httpx.geojson", "wb") as download_file:
        url = "https://geoconnex-demo-pages.internetofwater.dev/collections/demo-gpkg/items?f=json&limit=100"
        with client.stream("GET", url) as response:
            for chunk in response.iter_bytes(1024):  # 1 KiB chunks
                download_file.write(chunk)
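The other option mentioned above, processing the response as it arrives, might look like the following sketch. It computes a byte count and a SHA-256 digest from any iterable of byte chunks, so only one chunk is held in memory at a time. The `digest_chunks` name is a hypothetical helper for illustration, not something `httpx` provides.

```python
import hashlib


def digest_chunks(chunks):
    """Consume an iterable of bytes chunks; return (total_bytes, sha256_hex)."""
    hasher = hashlib.sha256()
    total = 0
    for chunk in chunks:
        # Each chunk is folded into the hash and then discarded.
        hasher.update(chunk)
        total += len(chunk)
    return total, hasher.hexdigest()


# Stand-in for a streamed body: a short list of byte chunks (assumption).
size, checksum = digest_chunks([b'{"type": ', b'"FeatureCollection"}'])
print(size, checksum)
```

Inside a `client.stream("GET", url)` block, the helper could be fed directly with `digest_chunks(response.iter_bytes(1024))`.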

Using a `stream` keeps the memory footprint of the running program close to the chunk size, rather than the size of the whole response. It also lets us quickly see whether a response holds data without having to download the whole thing.  We can just read the first chunk, then decide whether to continue:

In [24]:
import httpx
import ijson
Failing_URL = r"https://www.waterqualitydata.us/data/Station/search?mimeType=geojson&minactivities=1&counts=no"
Succeeding_URL = r"https://www.sciencebase.gov/catalog/file/get/60c7b895d34e86b9389b2a6c?name=vigil.geojson"

try:
    with httpx.Client() as client:
        with client.stream("GET", Succeeding_URL, timeout=5.0) as response:
            # iter_bytes() is lazy; next() actually pulls the first chunk off the wire.
            chunk = next(response.iter_bytes(2048), b"")
            print("Read some data")
except httpx.ReadTimeout:
    print("READ TIMEOUT")

Read some data
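Once a first chunk is in hand, the decision to continue could be as simple as checking what the body starts with. The sketch below is one possible heuristic, assuming the expected payload is JSON or GeoJSON; `looks_like_json` is a hypothetical helper, and a check on the status code or `Content-Type` header may be a better fit in practice.

```python
def looks_like_json(first_chunk: bytes) -> bool:
    """Heuristic: does the first chunk of a body start like a JSON document?"""
    head = first_chunk
    # Drop a UTF-8 byte-order mark, if present, before inspecting the content.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip()
    # A JSON document begins with an object or an array; an empty body fails.
    return head[:1] in (b"{", b"[")


print(looks_like_json(b'{"type": "FeatureCollection"'))
print(looks_like_json(b"<html><body>Service unavailable"))
```

An HTML error page or an empty body fails the check, so the download can be abandoned before the rest of the response is read.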
