# Get Sources Table

Schema/Table: nldi_data.crawler_source
```text
>COLUMN: {'name': 'crawler_source_id', 'type': INTEGER(), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'source_name', 'type': VARCHAR(length=500), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'source_suffix', 'type': VARCHAR(length=1000), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'source_uri', 'type': VARCHAR(length=256), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'feature_id', 'type': VARCHAR(length=500), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'feature_name', 'type': VARCHAR(length=500), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'feature_uri', 'type': VARCHAR(length=256), 'nullable': False, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'feature_reach', 'type': VARCHAR(length=500), 'nullable': True, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'feature_measure', 'type': VARCHAR(length=500), 'nullable': True, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'ingest_type', 'type': VARCHAR(length=5), 'nullable': True, 'default': None, 'autoincrement': False, 'comment': None}
>COLUMN: {'name': 'feature_type', 'type': VARCHAR(length=100), 'nullable': True, 'default': None, 'autoincrement': False, 'comment': None}
```

This table lists the sources that govern where and how the crawler looks for data. Each row is a potential source of data for the NLDI database.
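To make the role of the `feature_*` columns concrete, here is a minimal sketch. It assumes those columns hold the *names* of properties in each source's records rather than the values themselves, which the sample rows shown later in this notebook suggest. The function `extract_feature_fields` and the example record are hypothetical and are not part of the crawler.

```python
from typing import Any, Mapping


def extract_feature_fields(
    source: Mapping[str, Any], properties: Mapping[str, Any]
) -> dict:
    """Pick the id, name, and URI out of one record from a source.

    ASSUMPTION: feature_id, feature_name, and feature_uri name the
    properties to read; this is inferred from the sample rows only.
    """
    return {
        "id": properties.get(source["feature_id"]),
        "name": properties.get(source["feature_name"]),
        "uri": properties.get(source["feature_uri"]),
    }


# Column values copied from the Water Quality Portal row of crawler_source.
wqp_source = {
    "source_suffix": "WQP",
    "feature_id": "MonitoringLocationIdentifier",
    "feature_name": "MonitoringLocationName",
    "feature_uri": "siteUrl",
}

# Made-up record for illustration only; not real WQP data.
example_properties = {
    "MonitoringLocationIdentifier": "EXAMPLE-001",
    "MonitoringLocationName": "Example Creek at Example Road",
    "siteUrl": "https://example.org/sites/EXAMPLE-001",
}

fields = extract_feature_fields(wqp_source, example_properties)
print(fields)
```

Under that assumption, one row of the table is enough to tell a reader which properties of a source's records to use, without source-specific code.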

In [10]:
from sqlalchemy import MetaData, create_engine, inspect
DB_URL="postgresql://nldi_schema_owner:changeMe@172.17.0.1:5432/nldi" ## demo Database (CI is empty)
CONN = create_engine(DB_URL, client_encoding="UTF-8", echo=False, future=True)


In [11]:
SCHEMA = "nldi_data"
TABLE = "crawler_source"

In [12]:
m = MetaData(schema=SCHEMA)
m.reflect(bind=CONN)  # pass the engine here; MetaData(bind=...) is removed in SQLAlchemy 2.0



In [24]:
t = m.tables[SCHEMA + "." + TABLE]
for c in t.columns:
    print(c)


crawler_source.crawler_source_id
crawler_source.source_name
crawler_source.source_suffix
crawler_source.source_uri
crawler_source.feature_id
crawler_source.feature_name
crawler_source.feature_uri
crawler_source.feature_reach
crawler_source.feature_measure
crawler_source.ingest_type
crawler_source.feature_type
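The `>COLUMN:` listing at the top of this page resembles what SQLAlchemy's runtime inspector returns, and the notebook imports `inspect` without showing a call to it. The sketch below shows one way such a listing could be produced. It is an assumption about how the listing was generated, and it runs against an in-memory SQLite stand-in with only three of the columns so that it works without the NLDI database.

```python
from sqlalchemy import (
    Column,
    Integer,
    MetaData,
    String,
    Table,
    create_engine,
    inspect,
)

# Stand-in database: in-memory SQLite with three of the crawler_source columns.
engine = create_engine("sqlite://")
metadata = MetaData()
Table(
    "crawler_source",
    metadata,
    Column("crawler_source_id", Integer, primary_key=True),
    Column("source_name", String(500), nullable=False),
    Column("feature_reach", String(500), nullable=True),
)
metadata.create_all(engine)

# The inspector reads column metadata from the live database.
inspector = inspect(engine)
columns = inspector.get_columns("crawler_source")
for col in columns:
    print(f">COLUMN: {col}")
```

Against the PostgreSQL database used in this notebook, the equivalent call would presumably be `inspect(CONN).get_columns(TABLE, schema=SCHEMA)`. The exact keys in each dictionary vary somewhat by database dialect.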


## Import to a Pandas Dataframe

For simple tables (no relationships or joins), importing the 2D table into a pandas dataframe is straightforward.
See <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html>.

The complication is that we can't hand pandas the SQLAlchemy engine created above. `read_sql_table` is a pandas function, and with the
pandas version used here it does not appear to work with a SQLAlchemy 2.0-style engine (`future=True`). Given a connection string instead,
pandas sets up the connection itself, runs a simple table select, and populates a dataframe with the result. The key is to give pandas
the connection string rather than an engine object from SQLAlchemy.

In [25]:
import pandas as pd
# sources = pd.read_sql_table(table_name=TABLE, schema=SCHEMA, con=CONN)    ## NO !!!
print(f"Connect directly to {DB_URL} w/ pandas")
sources = pd.read_sql_table(table_name=TABLE, schema=SCHEMA, con=DB_URL)    ## YES !!!
sources.set_index("crawler_source_id", inplace=True)

Connect directly to postgresql://nldi_schema_owner:changeMe@172.17.0.1:5432/nldi w/ pandas


In [26]:
sources

| crawler_source_id | source_name | source_suffix | source_uri | feature_id | feature_name | feature_uri | feature_reach | feature_measure | ingest_type | feature_type |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | HUC12 Pour Points | huc12pp | https://www.sciencebase.gov/catalogMaps/mappin... | HUC_12 | HUC_12 | HUC_12 | | | point | hydrolocation |
| 1 | Water Quality Portal | WQP | https://www.waterqualitydata.us/data/Station/s... | MonitoringLocationIdentifier | MonitoringLocationName | siteUrl | | | point | varies |
| 5 | NWIS Surface Water Sites | nwissite | https://www.sciencebase.gov/catalog/file/get/6... | provider_id | name | subjectOf | nhdpv2_REACHCODE | nhdpv2_REACH_measure | reach | hydrolocation |
| 6 | Water Data Exchange 2.0 Sites | wade | https://www.hydroshare.org/resource/5f665b7b82... | feature_id | feature_name | feature_uri | | | point | varies |
| 7 | geoconnex.us reference gages | ref_gage | https://www.hydroshare.org/resource/3295a17b4c... | id | name | subjectOf | nhdpv2_REACHCODE | nhdpv2_REACH_measure | reach | hydrolocation |
| 8 | Streamgage catalog for CA SB19 | ca_gages | https://sb19.linked-data.internetofwater.dev/c... | site_id | sitename | uri | rchcd_medres | reach_measure | reach | hydrolocation |
| 9 | USGS Geospatial Fabric V1.1 Points of Interest | gfv11_pois | https://www.sciencebase.gov/catalogMaps/mappin... | prvdr_d | name | uri | n2_REACHC | n2_REACH_ | reach | hydrolocation |
| 10 | Vigil Network Data | vigil | https://www.sciencebase.gov/catalog/file/get/6... | SBID | Site Name | SBURL | nhdpv2_REACHCODE | nhdpv2_REACH_measure | reach | hydrolocation |
| 11 | NWIS Groundwater Sites | nwisgw | https://www.sciencebase.gov/catalog/file/get/6... | provider_id | name | subjectOf | | | point | point |
| 12 | New Mexico Water Data Initative Sites | nmwdi-st | https://locations.newmexicowaterdata.org/colle... | id | name | geoconnex | | | point | point |
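
Once the table is in a dataframe, ordinary pandas selection can be used to pick out sources. The following is a small sketch that rebuilds four of the rows above by hand (with only the columns it needs) so that it runs without a database connection. With the real `sources` dataframe, the same two selections should apply unchanged.

```python
import pandas as pd

# Four rows copied from the output above (only the columns needed here).
sources = pd.DataFrame(
    {
        "crawler_source_id": [1, 2, 5, 11],
        "source_suffix": ["WQP", "huc12pp", "nwissite", "nwisgw"],
        "feature_reach": [None, None, "nhdpv2_REACHCODE", None],
        "ingest_type": ["point", "point", "reach", "point"],
    }
).set_index("crawler_source_id")

# Sources whose ingest_type is "reach".
reach_sources = sources[sources["ingest_type"] == "reach"]
print(reach_sources["source_suffix"].tolist())

# Look up a single source by its suffix.
wqp = sources.loc[sources["source_suffix"] == "WQP"].iloc[0]
print(wqp["ingest_type"])
```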
