In this notebook: <br>
- Pull the CSV that correlates NOAA forecast sites to USGS gauges
- Pull the CSV of all the USGS gauges currently used in rivermaps.co (not the future version)
- Find any gauges on that list that have a NOAA prediction
- Find the USGS gauges that are in the current version but not yet correlated with a NOAA prediction
- Add those to the NOAA-to-USGS CSV so we can expand our prediction reach

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import os
import requests
import time

In [2]:
import pickle
path = r"C:\Springboard\Github\gauge_info"  # raw string so the backslashes are not read as escapes
os.chdir(path)

# Dataframe that contains all of the NOAA predictions and their corresponding USGS gage

In [3]:
# load DF with NOAA and USGS for all gauges in Colorado River Basin that have predictions in NOAA
df = pickle.load(open("NOAA_USGS.pkl", "rb"))
df.head()

Unnamed: 0,NOAA_gauge,River,State,Elevation,Segment,USGS_link,usgs
0,SPRA3,San Pedro,AZ,2820,7,http://waterdata.usgs.gov/az/nwis/uv?09472050,9472050
1,MAOA3,Acdc,AZ,1230,6,0,0
2,MHFA3,Acdc,AZ,1225,7,0,0
3,MSXA3,Acdc,AZ,1220,8,0,0
4,ACHA3,Agua Caliente Wash,AZ,2588,2,0,0


## CSV that correlates NOAA to USGS - currently used in production at /future

In [10]:
# load DF with NOAA and USGS that are CURRENTLY used in future forecast - these were put together manually
df2 = pd.read_csv("USGS_NOAA_new.csv", names=['USGS', 'NOAA'])
df2

Unnamed: 0,USGS,NOAA
0,09067020,EALC2
1,09057500,BGMC2
2,09066325,GRVC2
3,09070000,GPSC2
4,09070500,EGLC2
...,...,...
109,09504000,VDCA3
110,09508500,VDTA3
111,10130500,CLLU1
112,10128500,OAWU1


## List of all USGS (and CO Water) gages - currently used in production REAL-TIME

In [11]:
# load list from CSV of all USGS (and CO Water) gauges that are currently used
import csv
USGS_current = []
with open('USGS_list.csv', 'r') as f:
    readCSV = csv.reader(f, delimiter=',')
    for row in readCSV:
        for i in row:
            USGS_current.append(i)

In [12]:
len(USGS_current)

270

Before we proceed, let's review the data that we have:
1. 459 NOAA sites throughout the Colorado River Forecast Basin; we have the corresponding USGS gauge for just about all of them. These are stored in df.
2. 270 USGS (and CO Water) gauges that are currently used in the real-time display of water. These gauges are NOT just from the Colorado River Forecast Basin. These are stored in the list USGS_current.
3. 111 NOAA prediction sites that correspond with USGS gauges. These are stored in df2.

Next, let's find all of the possible USGS sites (from the 270 currently used on the real time page) that have a corresponding NOAA forecast. 

In [13]:
USGS_in_NOAA = []
for g in df['usgs']:
    if g in USGS_current:
        USGS_in_NOAA.append(g)
len(USGS_in_NOAA)

111
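The membership test above only matches when both sides store gauge IDs the same way, and the `usgs` column is displayed without leading zeros while the CSV list holds strings. Below is a minimal sketch of one way to guard against that, assuming purely numeric USGS IDs should be 8-character zero-padded strings. The helper `normalize_usgs`, the sample values, and the non-numeric ID `COWATER01` are illustrative assumptions, not production data.

```python
import pandas as pd

def normalize_usgs(value):
    """Return a gauge ID as a string; zero-pad purely numeric IDs to 8 digits."""
    text = str(value).strip()
    return text.zfill(8) if text.isdigit() else text

# Illustrative stand-ins for df['usgs'] and USGS_current (not the real notebook data)
noaa_usgs = pd.Series([9472050, 0, "09067020", 10130500])
current = ["09472050", "09067020", "COWATER01"]  # COWATER01 is a hypothetical non-USGS ID

# A set makes each membership test O(1) instead of scanning the list
current_ids = {normalize_usgs(g) for g in current}
in_both = [g for g in noaa_usgs.map(normalize_usgs) if g in current_ids]
print(in_both)
```

Non-numeric IDs are left untouched, so CO Water style identifiers would pass through unchanged under this assumption.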

That means 111 gauges are in both my current list of gauges and the NOAA predictions. Since this is more than the 86 that I am currently using, I expect to gain 25 gauges (111 - 86) that could have predictions. Let's see if that checks out.

## Find the NOAA predictions that now need to be pulled for the new models.

In [18]:
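# need a list of USGS gages from the models
with open("model_gages.pkl", "rb") as f:
    USGS_models = pickle.load(f)

# look up each model gage in df to pull its NOAA name
# model_USGS_NOAA holds (USGS, NOAA) pairs; the list name is an assumption
model_USGS_NOAA = []
for g in USGS_models:
    matches = df[df['usgs'] == g]['NOAA_gauge'].tolist()
    if matches:
        # append those gages to the new list
        model_USGS_NOAA.append((g, matches[0]))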

In [19]:
USGS_models

['09112500',
 '09124500',
 '09115500',
 '09067020',
 '09024000',
 '09085000',
 '09073400',
 '09070000',
 '09110000',
 '09342500',
 '09034250',
 '09085100',
 '09166500',
 '10105900',
 '09237500',
 '09107000',
 '09065100',
 '09081600',
 '09415000',
 '10140100',
 '09036000',
 '10141000']

In [14]:
old_USGS = df2['USGS'].tolist()
# keep only the gauges that are not already in the NOAA-to-USGS mapping
new_USGS = [g for g in USGS_in_NOAA if g not in old_USGS]
print(new_USGS)

[]


In [15]:
len(new_USGS)

0

This makes sense because I created a few gauges (Cataract Canyon comes to mind) to reflect some predictions. <br>
We will add these gauges to the existing NOAA-to-USGS dataframe (df2).

## Add the missing USGS and NOAA predictions to the df2 dataframe, alongside the 86 current ones

In [10]:
# create list of tuples first
new_USGS_NOAA = []
for g in new_USGS:
    new_USGS_NOAA.append((g, df[df['usgs']==g]['NOAA_gauge'].tolist()[0]))

print(new_USGS_NOAA)

[('10092700', 'BIUI1'), ('10016900', 'EVAW4'), ('09050700', 'BLRC2'), ('09497980', 'CHRA3'), ('09095500', 'CAMC2'), ('09034250', 'CAWC2'), ('09180500', 'CLRU1'), ('09065100', 'CSSC2'), ('09063000', 'RERC2'), ('09242500', 'ENMC2'), ('09430500', 'GILN5'), ('09188500', 'WBRW4'), ('09152500', 'GJNC2'), ('09064000', 'HMSC2'), ('10163000', 'PPPU1'), ('09497500', 'SLCA3'), ('09502000', 'SMDA3'), ('09510200', 'SYCA3'), ('09050100', 'TCFC2'), ('09149500', 'DLAC2'), ('09146200', 'UCRC2'), ('09506000', 'VCVA3'), ('09510000', 'VDBA3'), ('09504000', 'VDCA3'), ('09508500', 'VDTA3'), ('10130500', 'CLLU1'), ('10128500', 'OAWU1'), ('09505200', 'WBVA3')]


In [11]:
# create dataframe from that list of tuples
df_new = pd.DataFrame(new_USGS_NOAA, columns =['USGS', 'NOAA'])
df_new

Unnamed: 0,USGS,NOAA
0,10092700,BIUI1
1,10016900,EVAW4
2,09050700,BLRC2
3,09497980,CHRA3
4,09095500,CAMC2
5,09034250,CAWC2
6,09180500,CLRU1
7,09065100,CSSC2
8,09063000,RERC2
9,09242500,ENMC2
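The tuple-building loop above filters `df` once per gauge. As an alternative, here is a minimal sketch of the same lookup done as a single `merge`, assuming both frames store USGS IDs as strings. The frames `df_demo` and `df_new_demo` and the ID `99999999` are illustrative stand-ins; the other values are copied from the outputs above.

```python
import pandas as pd

# Illustrative stand-ins for `df` and `new_USGS`
df_demo = pd.DataFrame({
    "NOAA_gauge": ["BIUI1", "EVAW4", "BLRC2", "SPRA3"],
    "usgs": ["10092700", "10016900", "09050700", "09472050"],
})
new_ids = ["10092700", "09050700", "99999999"]  # last ID is hypothetical and has no match

# One row per USGS ID, with columns renamed to match df2
lookup = (
    df_demo.drop_duplicates("usgs")
    .rename(columns={"usgs": "USGS", "NOAA_gauge": "NOAA"})[["USGS", "NOAA"]]
)

# An inner join keeps only IDs that have a NOAA site, in the order of new_ids
df_new_demo = pd.DataFrame({"USGS": new_ids}).merge(lookup, on="USGS", how="inner")
print(df_new_demo)
```

Unlike indexing with `.tolist()[0]`, an ID with no NOAA match is simply dropped here instead of raising an `IndexError`.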


In [12]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df2 = pd.concat([df2, df_new], ignore_index=True)
df2

Unnamed: 0,USGS,NOAA
0,09067020,EALC2
1,09057500,BGMC2
2,09066325,GRVC2
3,09070000,GPSC2
4,09070500,EGLC2
...,...,...
109,09504000,VDCA3
110,09508500,VDTA3
111,10130500,CLLU1
112,10128500,OAWU1


This is excellent! Exactly what we wanted from this exploration! We went from 86 gauges to 114 (86 + 28 new) that we can have predictions for.

## Export those USGS and NOAA to CSV that can be used on the website

In [13]:
# df2.to_csv("USGS_NOAA_newer.csv", index=False, header=False)
# already exported this, so no need to do that again

## Find the list of USGS gauges that are NOT covered by the NOAA predictions

In [18]:
USGS_missing = []
old_USGS = df2['USGS'].tolist()
for g in USGS_current:
    if g not in old_USGS:
        USGS_missing.append(g)
len(USGS_missing)

156

These 156 gauges are the ones that we don't have predictions for. We will hopefully be able to build models for some of them so that we can use the existing predictions.
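The same "in the current list but not in the mapping" check can also be written as an anti-join. The sketch below uses `merge` with `indicator=True`, assuming both sides hold USGS IDs as strings. The frames `current`, `mapped`, and `missing` are illustrative names, and the sample IDs are reused from earlier outputs only as an example, not as a claim about which gauges are actually missing.

```python
import pandas as pd

# Illustrative stand-ins for USGS_current and df2
current = pd.DataFrame({"USGS": ["09067020", "09057500", "09112500", "09124500"]})
mapped = pd.DataFrame({"USGS": ["09067020", "09057500"], "NOAA": ["EALC2", "BGMC2"]})

# indicator=True adds a _merge column saying where each row was found
merged = current.merge(mapped, on="USGS", how="left", indicator=True)

# Rows marked left_only exist in the current list but have no NOAA mapping
missing = merged.loc[merged["_merge"] == "left_only", ["USGS"]].reset_index(drop=True)
print(len(missing))
```

The result is already a one-column DataFrame, so it could be pickled or written to CSV without the intermediate list.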

In [19]:
df_missing = pd.DataFrame(USGS_missing, columns =['USGS'])

In [20]:
## Export DF so that we can use them in the future notebooks
df_missing.to_pickle("USGS_missing.pkl")
df_missing.to_csv("USGS_missing.csv")

In a future notebook:
- go to the USGS page for each gauge
- pull the longitude and latitude for that gauge; also pull the river
- correlate it to nearby gauges
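For the last step of that plan, one possible way to find nearby gauges is great-circle distance. This is a minimal sketch using the haversine formula, assuming latitude and longitude in decimal degrees have already been collected. The function names, gauge IDs, and coordinates below are all hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_gauge(target, candidates):
    """Return (gauge_id, distance_km) of the candidate closest to target=(lat, lon)."""
    best = min(candidates.items(), key=lambda kv: haversine_km(*target, *kv[1]))
    return best[0], haversine_km(*target, *best[1])

# Hypothetical gauge IDs and coordinates, for illustration only
forecast_gauges = {"GAUGE_A": (39.0, -108.0), "GAUGE_B": (40.0, -106.0)}
gauge_id, dist = nearest_gauge((39.1, -107.9), forecast_gauges)
print(gauge_id, round(dist, 1))
```

Distance alone does not guarantee that two gauges sit on the same river, which is why the plan also pulls the river name; filtering candidates by river before calling `nearest_gauge` would be a reasonable refinement.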