In this notebook: <br>
- Pull the CSV that correlates NOAA gauges to USGS gauges
- Pull the CSV of all the USGS gauges currently used in rivermaps.co (not the future version)
- Find any gauges on that list that have a NOAA prediction
- Find the USGS gauges that are in the current version but are not yet correlated with a NOAA prediction
- Add those to the NOAA-to-USGS CSV so we can expand our prediction reach

In [18]:
import pandas as pd
from bs4 import BeautifulSoup
import os
import requests
import time

In [19]:
import pickle
# raw string so the backslashes in the Windows path are not treated as escape sequences
path = r"C:\Springboard\Github\gauge_info"
os.chdir(path)

In [20]:
# load DF with NOAA and USGS IDs for all gauges in the Colorado River Basin that have NOAA predictions
with open("NOAA_USGS.pkl", "rb") as f:
    df = pickle.load(f)
df.head()

Unnamed: 0,NOAA_gauge,River,State,Elevation,Segment,USGS_link,usgs
0,SPRA3,San Pedro,AZ,2820,7,http://waterdata.usgs.gov/az/nwis/uv?09472050,9472050
1,MAOA3,Acdc,AZ,1230,6,0,0
2,MHFA3,Acdc,AZ,1225,7,0,0
3,MSXA3,Acdc,AZ,1220,8,0,0
4,ACHA3,Agua Caliente Wash,AZ,2588,2,0,0


## Pull the CSV that correlates NOAA to USGS

In [21]:
# load DF with NOAA and USGS that are CURRENTLY used in future forecast - these were put together manually
df2 = pd.read_csv("USGS_NOAA_old.csv")
df2

Unnamed: 0,USGS,NOAA
0,09067020,EALC2
1,09057500,BGMC2
2,09066325,GRVC2
3,09070000,GPSC2
4,09070500,EGLC2
...,...,...
81,09507980,EVDA3
82,09507480,FCSA3
83,09402000,LCCA3
84,09384000,LCLA3


In [22]:
# load list from CSV of all USGS (and CO Water) gauges that are currently used
import csv
USGS_current = []
with open('USGS_list.csv', 'r') as f:
    readCSV = csv.reader(f, delimiter=',')
    for row in readCSV:
        for i in row:
            USGS_current.append(i)

In [23]:
len(USGS_current)

270

Before we proceed, let's review the data that we have:
1. 459 NOAA sites throughout the Colorado River Forecast Basin; we have the corresponding USGS gauge for just about all of them. These are stored in df
2. 270 USGS (and CO Water) gauges that are currently used in the real-time display of water. These gauges are NOT just from the Colorado River Forecast Basin. These are stored in the list USGS_current
3. 86 NOAA prediction sites that were manually matched to USGS gauges. These are stored in df2

Next, let's find all of the possible USGS sites (from the 270 currently used on the real time page) that have a corresponding NOAA forecast. 

In [24]:
USGS_in_NOAA = []
for g in df['usgs']:
    if g in USGS_current:
        USGS_in_NOAA.append(g)
len(USGS_in_NOAA)

111

That means there are 111 gauges that are in both my current list of gauges and the NOAA predictions. Since this is more than the 86 that I am currently using, I expect to gain 25 gauges (111 - 86) that could have predictions. Let's see if that checks out.
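The same overlap can also be computed with `Series.isin`. Below is a minimal sketch on toy data: the names `toy_df` and `toy_current` and all IDs are made up for illustration and are not taken from the real files. It assumes gauge IDs are compared as strings, since an integer such as 9472050 would not match the string '09472050'.

```python
import pandas as pd

# Toy stand-ins for df and USGS_current; all values are made up for illustration.
toy_df = pd.DataFrame({
    "NOAA_gauge": ["AAAA1", "BBBB2", "CCCC3", "DDDD4"],
    "usgs": ["09000001", "09000002", "0", "09000004"],
})
toy_current = ["09000001", "09000004", "09000009"]

# Keep only the rows whose USGS ID appears in the currently used list.
overlap = toy_df[toy_df["usgs"].isin(toy_current)]

print(len(overlap))              # number of current gauges with a NOAA forecast
print(overlap["usgs"].tolist())  # their USGS IDs
```

Filtering the dataframe rather than building a list keeps the NOAA ID next to each matching USGS ID, which is handy for the later steps.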

## Find the NOAA predictions that are missing from the USGS-to-NOAA list

In [29]:
old_USGS = df2['USGS'].tolist()
# keep the gauges that have a NOAA prediction but are not yet in the manual list
new_USGS = [g for g in USGS_in_NOAA if g not in old_USGS]
print(new_USGS)

['10092700', '10016900', '09050700', '09497980', '09095500', '09034250', '09180500', '09065100', '09063000', '09242500', '09430500', '09188500', '09152500', '09064000', '10163000', '09497500', '09502000', '09510200', '09050100', '09149500', '09146200', '09506000', '09510000', '09504000', '09508500', '10130500', '10128500', '09505200']


In [30]:
len(new_USGS)

28

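We found 28 new gauges rather than the expected 25. This makes sense because I created a few gauges (Cataract Canyon comes to mind) to reflect some predictions, so a few of the 86 manual entries are not among the 111 matches. <br>
We will add these gauges to the existing NOAA-to-USGS dataframe (df2).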
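The gap between the expected and the observed count can be illustrated with set arithmetic. This is a small sketch with made-up placeholder IDs, not the real gauge lists: every manual entry that is absent from the overlap raises the number of new gauges by one over the naive subtraction.

```python
# Placeholder IDs, made up for illustration only.
in_both = {"g1", "g2", "g3", "g4", "g5"}   # in the current list AND in NOAA
manual = {"g1", "g2", "custom1"}           # manual list; "custom1" is a hand-made gauge

naive_expected = len(in_both) - len(manual)  # simple subtraction of the two sizes
actual_new = in_both - manual                # gauges that really are new
manual_only = manual - in_both               # manual entries outside the overlap

print(naive_expected, len(actual_new), len(manual_only))

# The observed count exceeds the naive estimate by the number of manual-only entries.
assert len(actual_new) == naive_expected + len(manual_only)
```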

In [45]:
new_USGS_NOAA = []
for g in new_USGS:
    # take the NOAA ID itself (first match), not the string form of the whole Series
    new_USGS_NOAA.append((g, df.loc[df['usgs'] == g, 'NOAA_gauge'].iloc[0]))

print(new_USGS_NOAA)

[('10092700', 'BIUI1'), ('10016900', 'EVAW4'), ('09050700', 'BLRC2'), ('09497980', 'CHRA3'), ('09095500', 'CAMC2'), ('09034250', 'CAWC2'), ('09180500', 'CLRU1'), ('09065100', 'CSSC2'), ('09063000', 'RERC2'), ('09242500', 'ENMC2'), ('09430500', 'GILN5'), ('09188500', 'WBRW4'), ('09152500', 'GJNC2'), ('09064000', 'HMSC2'), ('10163000', 'PPPU1'), ('09497500', 'SLCA3'), ...

In [None]:
# TODO: if the gauge's NOAA ID is NOT already in the USGS-to-NOAA list (df2),
# then add it to the USGS-to-NOAA list
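The step outlined in the comments above could be sketched roughly as follows. This is only an illustration on toy data: `toy_df2` and `toy_new_pairs` are hypothetical stand-ins for df2 and the list of (USGS, NOAA) pairs, and the IDs are made up. It assumes the manual table has the columns `USGS` and `NOAA`, as shown in the df2 output.

```python
import pandas as pd

# Toy stand-in for df2 (made-up values).
toy_df2 = pd.DataFrame({"USGS": ["09000001"], "NOAA": ["AAAA1"]})

# Toy stand-in for the list of (USGS, NOAA) pairs found above.
toy_new_pairs = [("09000002", "BBBB2"), ("09000001", "AAAA1")]

new_rows = pd.DataFrame(toy_new_pairs, columns=["USGS", "NOAA"])

# Drop pairs whose NOAA ID is already present in the manual table.
new_rows = new_rows[~new_rows["NOAA"].isin(toy_df2["NOAA"])]

# Append the remaining rows and renumber the index.
combined = pd.concat([toy_df2, new_rows], ignore_index=True)
print(combined)
```

Writing the result back out with `combined.to_csv(...)` would then update the CSV; the file name to use is left open here.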

In a future notebook:
- go to the USGS page for each gauge
- pull the longitude and latitude for that gauge
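One possible way to approach that future step is sketched below, under several assumptions. It assumes the coordinates are fetched in the tab-delimited RDB layout used by USGS, with columns named `site_no`, `dec_lat_va` and `dec_long_va`. The helper `parse_site_rdb` is hypothetical, and the sample text is made up for illustration (it is not real USGS output). The request shown in the trailing comment assumes the USGS Site Service endpoint and is not executed here.

```python
import csv
import io


def parse_site_rdb(rdb_text):
    """Return {site_no: (lat, lon)} from RDB (tab-delimited) text."""
    # Drop blank lines and comment lines, which start with '#'.
    lines = [ln for ln in rdb_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(
        io.StringIO("\n".join(lines)), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    coords = {}
    for row in reader:
        try:
            coords[row["site_no"]] = (
                float(row["dec_lat_va"]),
                float(row["dec_long_va"]),
            )
        except (TypeError, ValueError):
            # Skips the column-format row (e.g. "16s") and rows without coordinates.
            continue
    return coords


# Made-up sample in RDB layout, for illustration only.
sample = (
    "# illustrative sample, not real USGS output\n"
    "agency_cd\tsite_no\tstation_nm\tdec_lat_va\tdec_long_va\n"
    "5s\t15s\t50s\t16s\t16s\n"
    "USGS\t09000001\tEXAMPLE RIVER\t39.5\t-106.25\n"
)
print(parse_site_rdb(sample))

# Assumed usage against the live service (not run here):
# import requests
# resp = requests.get("https://waterservices.usgs.gov/nwis/site/",
#                     params={"format": "rdb", "sites": "09472050"}, timeout=30)
# coords = parse_site_rdb(resp.text)
```

Keeping the parsing separate from the download makes it possible to check the logic on a saved response before looping over every gauge.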