# Create watersheds and catchments

The AKSSF project has ~500 sites that have been shifted to the flow networks.
We need to create a watershed for each site. This will make extracting spatial
and climatic covariates for modeling go much faster.

Dustin pointed out that the NHDPlus has FromNode and ToNode fields that can be used to navigate
upstream: save all the NHDPlusIDs, then select and merge the catchments to create a watershed for each site.
This works in R, so it just needs to be transferred to Python. The premise is to use a while loop that keeps
selecting new stream segments whose ToNode matches the FromNode of the last segment(s).
Logic for stopping the while loop:
Stop when the sum of the newly selected IDs is not greater than 0. The IDs are positive, so the sum only
drops to 0 when the selection is empty, i.e. no segments were found upstream. This works for some
watersheds, although I think it may be running infinitely on other watersheds.
Alternatively, if sum(StartFlag) == count(rec), then all of the newly selected NHDPlusIDs are headwater
streams and there is nothing further upstream to select.
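
The upstream navigation described above can be sketched without arcpy or pandas. This is a minimal illustration, not the code used for the AKSSF watersheds: the `segments` table and its IDs are made up, and `upstream_ids` is a hypothetical helper. It follows the same ToNode-matches-FromNode rule, but stops when no new segments are found and keeps a `visited` set, so it cannot loop forever even if the network contains a cycle.

```python
# Toy flow network: (NHDPlusID, FromNode, ToNode). Values are invented for illustration.
segments = [
    (1, 10, 20),  # outlet segment: flows from node 10 to node 20
    (2, 11, 10),  # ends at node 10, so it is directly upstream of segment 1
    (3, 12, 10),  # headwater that also ends at node 10
    (4, 13, 11),  # headwater upstream of segment 2
    (5, 14, 11),  # headwater upstream of segment 2
]


def upstream_ids(start_id, segments):
    """Return start_id plus the ID of every segment upstream of it."""
    from_node = {seg_id: frm for seg_id, frm, _ in segments}
    # Map each node to the segments that end at it (ToNode == node).
    ends_at = {}
    for seg_id, _, to in segments:
        ends_at.setdefault(to, []).append(seg_id)

    visited = {start_id}
    frontier = [start_id]
    while frontier:  # stop as soon as a pass finds nothing new
        next_frontier = []
        for seg_id in frontier:
            for up_id in ends_at.get(from_node.get(seg_id), []):
                if up_id not in visited:
                    visited.add(up_id)
                    next_frontier.append(up_id)
        frontier = next_frontier
    return sorted(visited)


print(upstream_ids(1, segments))  # [1, 2, 3, 4, 5]
print(upstream_ids(2, segments))  # [2, 4, 5]
```

Stopping on an empty frontier is equivalent to the "sum of IDs is 0" test when all IDs are positive, but it does not depend on the sign of the IDs.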

NOTE: creating an attribute index on the field that we are selecting on in the cats_merge
feature class is really important for creating large watersheds. This vastly reduced the run time
for the large Cook Inlet watersheds, like the mainstem Susitna. But AddIndex can only be run once
per feature class; after that the index already exists. The AddIndex call has been added to the
Cook Inlet watersheds steps 4-7 code chunk.

Create a loop and process watersheds for all the points. Start with Cook Inlet first.
Note that folders, geodbs, and merged catchments are created in the merge_grids script.
1. select catchments that intersect points to get NHDPlusID
2. create list of IDs
3. use loop to create watersheds
4. first get list of all upstream NHDPlusIDs
5. create temporary layer of catchments
6. select catchments that match the upstream IDs
7. dissolve those catchments and save to the Cook Inlet gdb, in the Watersheds feature dataset
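
Step 6 relies on a SQL `IN (...)` where clause built from the upstream IDs. A minimal sketch of that string-building step is below. `build_in_clause` is a hypothetical helper, not part of arcpy, and the assumption that IDs can safely be cast to `int` holds for the whole-number NHDPlusID, catID, and gridcode values used here. It removes duplicate IDs and falls back to `NULL` when the list is empty, so the clause stays valid SQL.

```python
def build_in_clause(field, ids):
    """Build a '"field" IN (...)' where clause from an iterable of numeric IDs."""
    # Cast to int so a double such as 75000200013536.0 is written without ".0",
    # and use a set to drop duplicate IDs.
    unique_ids = sorted({int(i) for i in ids})
    id_text = ", ".join(map(str, unique_ids)) or "NULL"
    return '"{0}" IN ({1})'.format(field, id_text)


print(build_in_clause("NHDPlusID", [75000600030148.0, 75000200013536.0]))
print(build_in_clause("catID", []))
```

The resulting string can be passed as the `where_clause` of a feature layer or to a select-by-attribute call, as the cells below do with their own expression.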

In [None]:
# COOK INLET
# steps 1 and 2
# intersect points with catchments and create list of NHDPlusIDs
import arcpy
arcpy.env.workspace = r"W:\GIS\AKSSF\Cook_Inlet\Cook_Inlet.gdb"
arcpy.env.overwriteOutput = True

points = r"T:\Aquatic\AKSSF\AKSSF_Hydrography.gdb\sites_outside_bb_verified_DM"
cats = r"W:\GIS\AKSSF\Cook_Inlet\Cook_Inlet.gdb\cats_merge"
idList = []
outcats = "cats_intersect"

arcpy.MakeFeatureLayer_management(cats, "tempLayer")
arcpy.management.SelectLayerByLocation("tempLayer", "INTERSECT", points)
arcpy.CopyFeatures_management("tempLayer", outcats)

fields = arcpy.ListFields("tempLayer")
for field in fields:
    print("{0}".format(field.name))
with arcpy.da.SearchCursor("tempLayer", ["NHDPlusID"]) as cursor:
    for row in cursor:
        idList.append(row[0])

print(len(idList))

#check if duplicate catchments in the idList
idset = set(idList)
print(idset)
print(len(idset))


In [None]:
#only 114 watersheds created for cook inlet, but 241 unique catchments
# see which ones are missing.
import arcpy

# Got through 95 watersheds and all other programs froze, restarted and finding which watersheds remain.
arcpy.env.workspace = r"W:\GIS\AKSSF\Cook_Inlet\Cook_Inlet.gdb\Watersheds"
wtds = arcpy.ListFeatureClasses()
#just get numeric part
wtds = [x[4:20] for x in wtds]
#convert to numeric
wtds = [int(i) for i in wtds]
print(wtds)
print(len(wtds))
print(len(idList))
#
idFilter = [x for x in idList if x not in wtds]
print(idFilter)
print("Original list of sites in Cook Inlet: " + str(len(idList)))
print("Watersheds completed: " + str(len(wtds)))
print("Watersheds remaining: " + str(len(idFilter)))

In [None]:
print(idList)
idListord = sorted(idList)
print(idListord)

In [None]:
# COOK INLET
# steps 4-7

import arcpy
import pandas as pd


# steps 4-7 for loop to create watersheds
arcpy.env.workspace = r"W:/GIS/AKSSF/Cook_Inlet/Cook_Inlet.gdb"
arcpy.env.overwriteOutput = True
arcpy.env.qualifiedFieldNames = False

vaa = "vaa_merge"
cats = "cats_merge"
output_SR = arcpy.Describe(cats).spatialReference
arcpy.env.outputCoordinateSystem = output_SR

#watersheds feature dataset for storing fcs
arcpy.management.CreateFeatureDataset(r"W:\GIS\AKSSF\Cook_Inlet\Cook_Inlet.gdb", "Watersheds", output_SR)

# field_names = [f.name for f in arcpy.ListFields(vaa)]
# print(field_names)
vaa_df = pd.DataFrame(arcpy.da.TableToNumPyArray(vaa, ("NHDPlusID", "FromNode", "ToNode", "StartFlag")))

#NHDPlusID for mainstem Susitna below Talkeetna, which doesn't seem to run.
# idFilter = [75000200013536]
# cache creek
# idFilter = [75000600030148]

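# AddIndex can only be run once, so skip it when the index already exists
if "NHD_index" not in [ix.name for ix in arcpy.ListIndexes(cats)]:
    arcpy.AddIndex_management(cats, "NHDPlusID", "NHD_index")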

for id in idList:
    print("Starting watershed for: " + str(id))
    rec = [id]
    up_ids = []
    up_ids.append(rec)
    rec_len = len(rec)
    hws_sum = 0

    while rec_len != hws_sum:
        fromnode = vaa_df.loc[vaa_df["NHDPlusID"].isin(rec), "FromNode"]
        rec = vaa_df.loc[vaa_df["ToNode"].isin(fromnode), "NHDPlusID"]
        rec_len = len(rec)
        rec_hws = vaa_df.loc[vaa_df["ToNode"].isin(fromnode), "StartFlag"]
        hws_sum = sum(rec_hws)
        # print(rec)
        # print(rec_len)
        # print(hws_sum)
        up_ids.append(rec)

    # print(up_ids)
    # print(len(up_ids))
    # up_ids holds one list/Series of IDs per loop pass; flatten it into a single list of NHDPlusIDs
    newup_ids = []
    for x in up_ids:
        newup_ids.extend(x)
    # print(newup_ids)
    print(len(newup_ids))

    # newup_ids = newup_ids[1:1000000]
    # print(len(newup_ids))

    print("Starting selection")
    tempLayer = "catsLyr"
    #expression = 'NHDPlusID IN {0}'.format(tuple(newup_ids))
    #trying expression to deal with one catchment (i.e. hws)
    expression = '"NHDPlusID" IN ({0})'.format(', '.join(map(str, newup_ids)) or 'NULL')
    arcpy.MakeFeatureLayer_management(cats, tempLayer, where_clause=expression)
    # arcpy.management.SelectLayerByAttribute(tempLayer, "NEW_SELECTION", expression, None)

    print("Starting dissolve")
    outwtd = "Watersheds\\wtd_" + str(round(id))
    print(outwtd)
    arcpy.management.Dissolve(tempLayer, outwtd)

Cook Inlet has some mainstem sites (e.g. Susitna River) whose watersheds cross HU8 boundaries. Along
those boundaries the catchments don't line up, which leaves small holes inside the dissolved
watershed. Use the Eliminate Polygon Part tool to find and fill those holes and make complete
watershed polygons. Also check whether any mainstem Copper River sites
run across HU8 boundaries; I don't think that there are any.

In [None]:
import arcpy
import os
import pandas as pd

regions = ["Copper_River", "Cook_Inlet"]

for region in regions:
    arcpy.env.workspace = "W:\\GIS\\AKSSF\\" + region + "\\" + region + ".gdb\\Watersheds"
    wtds = arcpy.ListFeatureClasses()
    print(region + ": " + str(len(wtds)) + " watersheds")

    for wtd in wtds:
        wtdName = wtd[4:20]
        print("Starting wtd: " + wtdName)
        wtdPath = os.path.join(arcpy.env.workspace, wtd)
        field_names = [f.name for f in arcpy.ListFields(wtdPath)]
        # print(field_names)
        if "Area_km2" in field_names:
            print("Area already calculated")
        else:
            arcpy.AddField_management(wtdPath, "Area_km2", "DOUBLE")
            # geometry token returns the polygon area in square kilometers
            expression1 = "!SHAPE.area@SQUAREKILOMETERS!"
            arcpy.CalculateField_management(wtdPath, "Area_km2", expression1, "PYTHON")
        #if area > 1000 km2, then eliminate polygon parts
        #print("Eliminating holes in watershed for catchment ID: " + wtdName)



In [None]:
# COPPER RIVER
# steps 1 and 2
# intersect points with catchments and create list of NHDPlusIDs
import arcpy
arcpy.env.workspace = r"W:\GIS\AKSSF\Copper_River\Copper_River.gdb"
arcpy.env.overwriteOutput = True

points = r"T:\Aquatic\AKSSF\AKSSF_Hydrography.gdb\sites_outside_bb_verified_DM"
cats = r"W:\GIS\AKSSF\Copper_River\Copper_River.gdb\cats_merge"
idList = []
outcats = "cats_intersect"

arcpy.MakeFeatureLayer_management(cats, "tempLayer")
arcpy.management.SelectLayerByLocation("tempLayer", "INTERSECT", points)
arcpy.CopyFeatures_management("tempLayer", outcats)

fields = arcpy.ListFields("tempLayer")
for field in fields:
    print("{0}".format(field.name))
with arcpy.da.SearchCursor("tempLayer", ["NHDPlusID"]) as cursor:
    for row in cursor:
        idList.append(row[0])

print(len(idList))

In [None]:
# COPPER RIVER
# steps 4-7

import arcpy
import pandas as pd

#idList = [75004300004324]

# steps 4-7 for loop to create watersheds
arcpy.env.workspace = r"W:/GIS/AKSSF/Copper_River/Copper_River.gdb"
arcpy.env.overwriteOutput = True
arcpy.env.qualifiedFieldNames = False

vaa = "vaa_merge"
cats = "cats_merge"
output_SR = arcpy.Describe(cats).spatialReference
arcpy.env.outputCoordinateSystem = output_SR

#watersheds feature dataset for storing fcs
arcpy.management.CreateFeatureDataset(r"W:\GIS\AKSSF\Copper_River\Copper_River.gdb", "Watersheds", output_SR)

vaa_df = pd.DataFrame(arcpy.da.FeatureClassToNumPyArray(vaa, ("NHDPlusID", "FromNode", "ToNode")))

for id in idList:
    print("Starting watershed for: " + str(id))
    rec = [id]
    print(type(rec))
    up_ids = []

    while sum(rec) > 0:
        up_ids.append(rec)
        fromnode = vaa_df.loc[vaa_df["NHDPlusID"].isin(rec), "FromNode"]
        rec = vaa_df.loc[vaa_df["ToNode"].isin(fromnode), "NHDPlusID"]

    # up_ids holds one list/Series of IDs per loop pass; flatten it into a single list of NHDPlusIDs
    newup_ids = []
    for x in up_ids:
        newup_ids.extend(x)

    print(type(newup_ids))
    tempLayer = "catsLyr"
    #expression = 'NHDPlusID IN {0}'.format(tuple(newup_ids))
    #trying expression to deal with one catchment (i.e. hws)
    expression = '"NHDPlusID" IN ({0})'.format(', '.join(map(str, newup_ids)) or 'NULL')
    arcpy.MakeFeatureLayer_management(cats, tempLayer)
    arcpy.management.SelectLayerByAttribute(tempLayer, "NEW_SELECTION", expression, None)

    outwtd = "Watersheds\\wtd_" + str(round(id))
    print(outwtd)
    arcpy.management.Dissolve(tempLayer, outwtd)


In [None]:
# BRISTOL BAY WATERSHEDS

import arcpy
import pandas as pd

arcpy.env.workspace = r"W:\GIS\AKSSF\Bristol_Bay\Bristol_Bay.gdb"
arcpy.env.overwriteOutput = True

points = r"W:\GIS\AKSSF\AKSSF_Hydrography.gdb\bb_MD_verified_DM"
cats = r"W:\GIS\AKSSF\Bristol_Bay\Bristol_Bay.gdb\cats_merge"
idList = []
outcats = "cats_intersect"

arcpy.MakeFeatureLayer_management(cats, "tempLayer")
arcpy.management.SelectLayerByLocation("tempLayer", "INTERSECT", points)
arcpy.CopyFeatures_management("tempLayer", outcats)

fields = arcpy.ListFields("tempLayer")
for field in fields:
    print("{0}".format(field.name))
with arcpy.da.SearchCursor("tempLayer", ["catID"]) as cursor:
    for row in cursor:
        idList.append(row[0])

print(len(idList))

In [None]:
# BRISTOL BAY
# steps 4-7

import arcpy
import pandas as pd
import numpy
import time

# idList = [492244] #for testing

# steps 4-7 for loop to create watersheds
arcpy.env.workspace = r"W:\GIS\AKSSF\Bristol_Bay\Bristol_Bay.gdb"
arcpy.env.overwriteOutput = True
arcpy.env.qualifiedFieldNames = False

streams = "streams_merge"
cats = "cats_merge"
output_SR = arcpy.Describe(cats).spatialReference
arcpy.env.outputCoordinateSystem = output_SR

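# AddIndex can only be run once, so skip it when the index already exists
if "catid_index" not in [ix.name for ix in arcpy.ListIndexes(cats)]:
    arcpy.AddIndex_management(cats, "catID", "catid_index")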

#watersheds feature dataset for storing fcs
arcpy.management.CreateFeatureDataset(r"W:\GIS\AKSSF\Bristol_Bay\Bristol_Bay.gdb", "Watersheds", output_SR)

str_df = pd.DataFrame(arcpy.da.FeatureClassToNumPyArray(streams, ("catID", "upCatID1", "upCatID2")))
hws_codes = [999999, 1999999, 2999999, 3999999, 4999999]

#idList if doing ALL watersheds.
for id in idList:
    print("Starting watershed for: " + str(id))
    rec = [id]
    up_ids = []
    sum_rec = sum(rec)
    timeout = time.time() + 60*15 # 15 minutes from this point

    while(sum_rec > 0):
        if time.time() > timeout:
            break
        up_ids.append(rec)
        rec = str_df.loc[str_df["catID"].isin(rec), ("upCatID1", "upCatID2")]
        rec = rec.replace(hws_codes, 0)
        rec = pd.concat([rec['upCatID1'], rec['upCatID2']])
        # print(rec)
        sum_rec = sum(rec)
    # print(up_ids)


    # up_ids holds one list/Series of IDs per loop pass; flatten it into a single list of catIDs
    # (headwater codes were replaced with 0 above, so 0 may appear in the list)
    newup_ids = []
    for x in up_ids:
        newup_ids.extend(x)

    # print(type(newup_ids))
    # print(newup_ids)
    tempLayer = "catsLyr"
    #expression = 'NHDPlusID IN {0}'.format(tuple(newup_ids))
    #trying expression to deal with one catchment (i.e. hws)
    expression = '"catID" IN ({0})'.format(', '.join(map(str, newup_ids)) or 'NULL')
    arcpy.MakeFeatureLayer_management(cats, tempLayer)
    arcpy.management.SelectLayerByAttribute(tempLayer, "NEW_SELECTION", expression, None)

    outwtd = "Watersheds\\wtd_" + str(round(id))
    arcpy.management.Dissolve(tempLayer, outwtd)
    print("Watershed created at:" + outwtd)

In [None]:
# code when trouble-shooting bb above.
import arcpy

# Got through 95 watersheds and all other programs froze, restarted and finding which watersheds remain.
arcpy.env.workspace = r"W:\GIS\AKSSF\Bristol_Bay\Bristol_Bay.gdb\Watersheds"
wtds = arcpy.ListFeatureClasses()
#just get numeric part
wtds = [x[4:20] for x in wtds]
#convert to numeric
wtds = [int(i) for i in wtds]
print(wtds)
print(len(wtds))
print(len(idList))
#
# idFilter = [x for x in idList if x not in wtds]
# print(idFilter)
# print("Original list of sites in BB: " + str(len(idList)))
# print("Watersheds completed: " + str(len(wtds)))
# print("Watersheds remaining: " + str(len(idFilter)))


In [4]:
# PRINCE WILLIAM SOUND WATERSHEDS

import arcpy
import pandas as pd

gdb = r"W:\GIS\AKSSF\Prince_William_Sound\Prince_William_Sound.gdb"
arcpy.env.workspace = gdb
arcpy.env.overwriteOutput = True

points = r"W:\GIS\AKSSF\AKSSF_Hydrography.gdb\sites_outside_bb_verified_DM"
cats = r"W:\GIS\AKSSF\Prince_William_Sound\Prince_William_Sound.gdb\cats_merge"
idList = []
outcats = gdb + "\\cats_intersect"

arcpy.MakeFeatureLayer_management(cats, "tempLayer")
arcpy.management.SelectLayerByLocation("tempLayer", "INTERSECT", points)
arcpy.CopyFeatures_management("tempLayer", outcats)

fields = arcpy.ListFields("tempLayer")
for field in fields:
    print("{0}".format(field.name))
with arcpy.da.SearchCursor("tempLayer", ["gridcode"]) as cursor:
    for row in cursor:
        idList.append(row[0])

print(len(idList))

OBJECTID
Shape
gridcode
Shape_Length
Shape_Area
19


In [5]:
# Prince_William_Sound
# steps 4-7

import arcpy
import pandas as pd
import numpy

arcpy.env.workspace = r"W:\GIS\AKSSF\Prince_William_Sound\Prince_William_Sound.gdb"
arcpy.env.overwriteOutput = True
arcpy.env.qualifiedFieldNames = False

streams = "streams_merge"
cats = "cats_merge"
output_SR = arcpy.Describe(cats).spatialReference
arcpy.env.outputCoordinateSystem = output_SR

# arcpy.AddIndex_management(cats, "NHDPlusID", "NHD_index")

#watersheds feature dataset for storing fcs
arcpy.management.CreateFeatureDataset(r"W:\GIS\AKSSF\Prince_William_Sound\Prince_William_Sound.gdb", "Watersheds", output_SR)

fields = arcpy.ListFields(streams)
for field in fields:
    print("{0}".format(field.name))

str_df = pd.DataFrame(arcpy.da.FeatureClassToNumPyArray(streams, ("LINKNO", "USLINKNO1", "USLINKNO2")))
hws_codes = [-1]

# idList = [46055]

#idList if doing ALL watersheds.
for id in idList:
    print("Starting watershed for: " + str(id))
    rec = [id]
    up_ids = []
    sum_rec = sum(rec)

    while(sum_rec > 0):
        up_ids.append(rec)
        rec = str_df.loc[str_df["LINKNO"].isin(rec), ("USLINKNO1", "USLINKNO2")]
        rec = pd.concat([rec['USLINKNO1'], rec['USLINKNO2']])
        sum_rec = sum(rec)
        print(sum_rec)


    # up_ids holds one list/Series of IDs per loop pass; flatten it into a single list of LINKNOs
    # (the headwater code -1 is not filtered out, so it may appear in the list)
    newup_ids = []
    for x in up_ids:
        newup_ids.extend(x)

    # print(type(newup_ids))
    # print(newup_ids)
    tempLayer = "catsLyr"
    #expression = 'NHDPlusID IN {0}'.format(tuple(newup_ids))
    #trying expression to deal with one catchment (i.e. hws)
    expression = '"gridcode" IN ({0})'.format(', '.join(map(str, newup_ids)) or 'NULL')
    arcpy.MakeFeatureLayer_management(cats, tempLayer)
    arcpy.management.SelectLayerByAttribute(tempLayer, "NEW_SELECTION", expression, None)

    outwtd = "Watersheds\\wtd_" + str(round(id))
    print(outwtd)
    arcpy.management.Dissolve(tempLayer, outwtd)

# merge into one fc at very end.

OBJECTID
Shape
LINKNO
DSLINKNO
USLINKNO1
USLINKNO2
DSNODEID
strmOrder
Length
Magnitude
DSContArea
strmDrop
Slope
StraightL
USContArea
WSNO
DOUTEND
DOUTSTART
DOUTMID
Shape_Length
Starting watershed for: 18457
27744
28132
28222
28392
31982
55458
70970
51980
46384
32378
49918
46944
27748
27942
30852
45518
23998
25882
41028
19208
-4
Watersheds\wtd_18457
Starting watershed for: 26464
31298
29766
29746
27746
26276
25536
11766
-4
Watersheds\wtd_26464
Starting watershed for: 28086
33752
31730
30390
28450
11820
-4
Watersheds\wtd_28086
Starting watershed for: 29854
57928
92356
136652
140448
114430
116608
71984
36578
50646
66676
74932
72752
47198
26782
27316
15926
-4
Watersheds\wtd_29854
Starting watershed for: 30884
49888
78696
58952
100026
123672
90698
94972
124242
89638
90142
122522
86948
85182
49012
49116
18847
-4
Watersheds\wtd_30884
Starting watershed for: 31865
46050
-4
Watersheds\wtd_31865
Starting watershed for: 36645
50980
50068
48998
45948
55168
67240
-8
Watersheds\wtd_36645
Starting w

In [6]:
# KODIAK WATERSHEDS

import arcpy
import pandas as pd

arcpy.env.workspace = r"W:\GIS\AKSSF\Kodiak\Kodiak.gdb"
arcpy.env.overwriteOutput = True

points = r"W:\GIS\AKSSF\AKSSF_Hydrography.gdb\sites_outside_bb_verified_DM"
cats = r"W:\GIS\AKSSF\Kodiak\Kodiak.gdb\cats_merge"
idList = []
outcats = "cats_intersect"

arcpy.MakeFeatureLayer_management(cats, "tempLayer")
arcpy.management.SelectLayerByLocation("tempLayer", "INTERSECT", points)
arcpy.CopyFeatures_management("tempLayer", outcats)

fields = arcpy.ListFields("tempLayer")
for field in fields:
    print("{0}".format(field.name))
with arcpy.da.SearchCursor("tempLayer", ["gridcode"]) as cursor:
    for row in cursor:
        idList.append(row[0])

print(len(idList))

OBJECTID
Shape
gridcode
proc_reg
Shape_Length
Shape_Area
28


In [7]:
# Kodiak
# steps 4-7

import arcpy
import pandas as pd
import numpy

arcpy.env.workspace = r"W:\GIS\AKSSF\Kodiak\Kodiak.gdb"
arcpy.env.overwriteOutput = True
arcpy.env.qualifiedFieldNames = False

streams = "streams_merge"
cats = "cats_merge"
output_SR = arcpy.Describe(cats).spatialReference
arcpy.env.outputCoordinateSystem = output_SR

#watersheds feature dataset for storing fcs
arcpy.management.CreateFeatureDataset(r"W:\GIS\AKSSF\Kodiak\Kodiak.gdb", "Watersheds", output_SR)

fields = arcpy.ListFields(streams)
for field in fields:
    print("{0}".format(field.name))

str_df = pd.DataFrame(arcpy.da.FeatureClassToNumPyArray(streams, ("LINKNO", "USLINKNO1", "USLINKNO2")))
hws_codes = [-1]

#idList if doing ALL watersheds.
for id in idList:
    print("Starting watershed for: " + str(id))
    rec = [id]
    up_ids = []
    sum_rec = sum(rec)

    while(sum_rec > 0):
        up_ids.append(rec)
        rec = str_df.loc[str_df["LINKNO"].isin(rec), ("USLINKNO1", "USLINKNO2")]
        rec = pd.concat([rec['USLINKNO1'], rec['USLINKNO2']])
        sum_rec = sum(rec)


    # up_ids holds one list/Series of IDs per loop pass; flatten it into a single list of LINKNOs
    # (the headwater code -1 is not filtered out, so it may appear in the list)
    newup_ids = []
    for x in up_ids:
        newup_ids.extend(x)

    # print(type(newup_ids))
    # print(newup_ids)
    tempLayer = "catsLyr"
    #expression = 'NHDPlusID IN {0}'.format(tuple(newup_ids))
    #trying expression to deal with one catchment (i.e. hws)
    expression = '"gridcode" IN ({0})'.format(', '.join(map(str, newup_ids)) or 'NULL')
    arcpy.MakeFeatureLayer_management(cats, tempLayer)
    arcpy.management.SelectLayerByAttribute(tempLayer, "NEW_SELECTION", expression, None)

    outwtd = "Watersheds\\wtd_" + str(round(id))
    print(outwtd)
    arcpy.management.Dissolve(tempLayer, outwtd)





OBJECTID
Shape
LINKNO
DSLINKNO
USLINKNO1
USLINKNO2
DSNODEID
strmOrder
Length
Magnitude
DSContArea
strmDrop
Slope
StraightL
USContArea
WSNO
DOUTEND
DOUTSTART
DOUTMID
proc_reg
Shape_Length
Starting watershed for: 48267
Watersheds\wtd_48267
Starting watershed for: 49617
Watersheds\wtd_49617
Starting watershed for: 50197
Watersheds\wtd_50197
Starting watershed for: 64593
Watersheds\wtd_64593
Starting watershed for: 72144
Watersheds\wtd_72144
Starting watershed for: 76954
Watersheds\wtd_76954
Starting watershed for: 77794
Watersheds\wtd_77794
Starting watershed for: 90346
Watersheds\wtd_90346
Starting watershed for: 93176
Watersheds\wtd_93176
Starting watershed for: 94216
Watersheds\wtd_94216
Starting watershed for: 97276
Watersheds\wtd_97276
Starting watershed for: 99516
Watersheds\wtd_99516
Starting watershed for: 100826
Watersheds\wtd_100826
Starting watershed for: 101556
Watersheds\wtd_101556
Starting watershed for: 103096
Watersheds\wtd_103096
Starting watershed for: 103196
Watersheds\