ISSUE: Setup ODBC Connection to Local NASIS #277

Closed · MollicMeyer opened this issue Oct 27, 2022 · 13 comments

@MollicMeyer

Please, any assistance would be greatly appreciated. A local NASIS ODBC connection to R is crucial for querying and harmonizing data relevant to the 2022 SPSD Soil Science Collaborative Research National Funding Opportunity (NFO) - No. USDA-NRCS-NHQ-SOILS-22- -NOFO0001124 – “Change Detection of Soil Carbon Stocks from Legacy Data”

I have outlined all of my system and software properties along with the NASIS query and selected set reports. I made 4 different attempts to set up the ODBC connection. The 4th attempt was successful, but the fetchNASIS() error report indicates that no local NASIS database is available. A 5th attempt was to restart R and NASIS and reselect the selected set.

I am at a dead end and not sure what to do at this point. The most peculiar issue is that the ODBC connection only works with the Windows 7 password.

My attempts and process are documented in the attached .docx:

NASIS Issues.docx

@brownag (Member) commented Oct 27, 2022

Thanks for reporting this. I would be happy to work with you 1:1 to sort this out, but please first check out the following ideas I have.

The most peculiar issue is that the ODBC connection only works with the windows 7 password.

You are running a fairly old version of NASIS (7.0.x), and might be running afoul of the heuristics soilDB uses internally to set the login information for the version of SQL Server Express bundled with your older NASIS install.

If only the Windows 7 credentials work, you might need to force soilDB to use the Windows 7 credentials despite the fact you are running Windows 10.

Can you try this and let me know if you get a different result?

library(soilDB)
# override the default credentials heuristic with the Windows 7-era read-only login
options(soilDB.NASIS.credentials = "DSN=nasis_local;UID=NasisSqlRO;PWD=nasisRe@d0n1y")
x <- fetchNASIS()
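If that still fails, it may help to first confirm whether the "nasis_local" DSN itself is reachable, independent of fetchNASIS(). A minimal check, assuming the default DSN name and a current soilDB version:

library(soilDB)

# returns TRUE if the local NASIS ODBC data source can be found and opened
local_NASIS_defined()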

It appears a newer version (NASIS 7.1.x) is available for installation on non-CCE machines. The above might work, but please also consider upgrading: https://new.cloudvault.usda.gov/index.php/s/xFTJabHiT45WDom

Also note that there are considerably newer versions of the SQL Server ODBC Drivers available: https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server

@MollicMeyer (Author)

Thanks for the prompt response! The first option you provided was successful. However, when I follow the suggested quality control steps to clean up the dataset, I get:

Error in get("sites.missing.pedons", envir = soilDB.env) : 
  object 'soilDB.env' not found

Is this possibly tied to my outdated versions as well?

@brownag (Member) commented Oct 28, 2022

Thanks for pointing that out. I believe that is an unrelated problem; I get the same result with my own selected set.

library(soilDB)
x <- fetchNASIS()
#> NOTE: some records are missing surface fragment cover
#> multiple horizontal datums present, consider using WGS84 coordinates (x_std, y_std)
#> Warning: non-standard notation in Munsell value, use getClosestMunsellChip()
#> mixing dry colors ... [37 of 2766 horizons]
#> mixing moist colors ... [52 of 2993 horizons]
#> NOTE: some records are missing rock fragment volume
#> NOTE: all records are missing artifact volume
#> replacing missing lower horizon depths with top depth + 1cm ... [67 horizons]
#> Warning: Horizon top depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> Warning: Horizon bottom depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> -> QC: sites without pedons: 
#>  Use `get('sites.missing.pedons', envir=soilDB.env) for site record IDs (siteiid)`
#> -> QC: duplicate pedons: 
#>  Use `get('dup.pedon.ids', envir=soilDB.env) for pedon record IDs (peiid)`
#> -> QC: pedons missing bottom hz depths:
#>  Use `get('missing.bottom.depths', envir=soilDB.env)` for pedon record IDs (peiid)
get('missing.bottom.depths', envir=soilDB.env)
#> Error in get("missing.bottom.depths", envir = soilDB.env): object 'soilDB.env' not found

That used to work... and soilDB.env should be created when the package gets loaded... but it is clearly not accessible in the global environment. Will look into fixing that... thanks!

I do know that there have been updates to various queries and metadata to support updated NASIS data models... so it is possible there are yet-to-be-found incompatibilities there with your old version. We can't guarantee backward compatibility for all of the prior data models... so if you run up against that sort of problem you may have to upgrade.

@brownag (Member) commented Oct 28, 2022

So, it is possible to work around the issue with soilDB.env by assigning into the global environment, but CRAN regards modification of the global (user) environment as something that "might be regarded as malicious or anti-social", so we cannot take that approach. I will need to see what other sort of workaround we can use.
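One CRAN-friendly pattern, sketched below for illustration only (this is not the actual soilDB source), is to keep the environment inside the package namespace and expose it through an exported accessor function instead of writing into the user's global environment:

# package-internal environment, created when the package is loaded
soilDB.env <- new.env(parent = emptyenv())

# exported accessor: users can reach the environment without the package
# ever assigning into globalenv()
get_soilDB_env <- function() {
  soilDB.env
}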

brownag added a commit that referenced this issue Oct 28, 2022
@brownag (Member) commented Oct 28, 2022

With the update in 8e2882c (soilDB v2.7.6), there is now a new exported function called get_soilDB_env() that returns the internal soilDB.env object.

So now the following should work:

library(soilDB)
x <- fetchNASIS()
get('sites.missing.pedons', envir = get_soilDB_env())

I have updated the various QC messages that the data access functions emit, but note that there are several other places in our documentation and training materials that still use the old code. It will take a little while before we have updated all such references.
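Since the QC results are just named objects stored in that environment, you can also list whatever got populated during a given fetch before deciding which ones to pull out; for example:

library(soilDB)
x <- fetchNASIS()

# list all QC objects created during the fetch
ls(envir = get_soilDB_env())

# then retrieve any of them by name
get('sites.missing.pedons', envir = get_soilDB_env())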

brownag added commits that referenced this issue Oct 28, 2022
@brownag (Member) commented Oct 28, 2022

Turns out the issue with soilDB.env was my fault... when I switched over to using Roxygen for handling the namespace, I neglected to export soilDB.env.

I handled all the functions but forgot that our generic export pattern historically would have exported everything that did not start with a ".". And the alias that "documented" this object was a bit hidden in the "soilDB package" docs.

The object soilDB.env is now exported and aliased to get_soilDB_env() documentation, so you should be able to use either syntax interchangeably!

get('sites.missing.pedons', envir = soilDB.env)
get('sites.missing.pedons', envir = get_soilDB_env())
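For anyone curious about the mechanics, the difference looks roughly like the sketch below (illustrative only, not the actual soilDB NAMESPACE or source): the old hand-maintained NAMESPACE exported everything by pattern, while roxygen2 only exports objects carrying an explicit tag.

# Old-style NAMESPACE directive: export everything that does not start with "."
# exportPattern("^[^\\.]")

# With roxygen2, a plain object such as an environment needs its own tag:

#' Environment used to store fetchNASIS() QC results
#' @export
soilDB.env <- new.env()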

@MollicMeyer (Author)

Excellent. I'm rerunning the entire process and will let you know if get_soilDB_env() ends up working. I'm working through the transition from the old manual QC cleaning methods and adjusting to what has already been automated in the newer versions.

Is there a metadata form for each of the site and horizon columns and how they were derived? For example, I know we used to have to manually simplify the fragment data, but now there is a fragvoltot column already included in the fetchNASIS() result. Does that stem from simplifyFragmentData()?

@brownag (Member) commented Oct 28, 2022

The NASIS metadata are described here: https://www.nrcs.usda.gov/sites/default/files/2022-09/NASIS_741_Tables_and_Columns_.pdf

There is not currently an in-depth description for each of the site and horizon columns included in the fetchNASIS() result, though many of the columns come directly from site/phorizon tables... there are others that are brought in from other child tables (such as Site Observation, or Ecological Site History), and others that are calculated. We really need to address this lack of fetchNASIS-column-specific information going forward.

The fragvoltot column is a value taken directly from the pedon horizon table (you will find its description in the above PDF). It was not always included in the fetchNASIS() result, but it has been added more recently for QC purposes. The older values calculated by R are still there.

I know we used to manually have to simplify the fragment data

Not sure how far back we are talking, but the simplifyFragmentData() function has been in place for some time (circa April 2016, so it predates my time in this position). That function calculates the individual fragment size class volumes, as well as the columns total_frags_pct (total fragment volume from the phfrags records) and total_frags_pct_nopf (total fragment volume without parafragments).

If there is a difference between fragvoltot and total_frags_pct, it indicates that the NASIS pedon horizon calculation "Fragment Total Volume" has not been run or needs to be re-run.
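A quick way to flag horizons where those two totals disagree, sketched here with the column names discussed above (assumed to be present in your fetchNASIS() result):

library(aqp)
library(soilDB)

x <- fetchNASIS()
h <- horizons(x)

# horizons where the NASIS-calculated total differs from the
# simplifyFragmentData()-derived total by more than 1% by volume
idx <- which(abs(h$fragvoltot - h$total_frags_pct) > 1)
h[idx, c("phiid", "fragvoltot", "total_frags_pct", "total_frags_pct_nopf")]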

@MollicMeyer (Author)

Thank you. Yes, the documentation for columns created by fetchNASIS() will be important to address. I've been tracing back through the source code in the meantime.

Your get('sites.missing.pedons', envir = get_soilDB_env()) works now. Thank you so much!

I see now that many of the created columns are listed after what has been extracted from the pedon horizon table (e.g., total_frags_pct_nopf). I will just have to pay attention going forward.

@MollicMeyer (Author)

Also, when I set fetchNASIS(lab = TRUE), I get the following error:

NOTICE: multiple phiid values exist in the phlabresults table, computing weighted averages and dominant values based on horizon thickness
Error in [.data.frame(d.nodups, orig_names) :
undefined columns selected

Any idea what is going on there?

@brownag (Member) commented Oct 28, 2022

Thanks! I appreciate you bringing up these issues.

Error in [.data.frame(d.nodups, orig_names) : undefined columns selected

I think this should be fixed now w/ fb65b02

The short story is that when there are multiple lab samples per morphologic horizon (a many:1 relationship between phlabsample and phorizon), there is code that calculates a thickness-weighted average for numeric lab data variables and the dominant condition (based on horizon thickness) for categorical ones.

There are some heuristics used to split the data.frame queried from phlabresults into three sets: numeric, character, and "ph" (which uses a log10 transformation for calculating the weighted average). It seems some of the columns were missed in the split, resulting in selecting undefined columns from the "duplicates removed" data. The reason they were missed appears to be that columns that are entirely NA have a "logical" data type rather than numeric, character, factor, etc., so if they are not caught by the other column filters they end up missing from the d.nodups data.frame.
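For reference, the general collapsing logic described above looks roughly like the sketch below. This is not the soilDB implementation (which, among other things, log10-transforms pH before averaging), and the id/depth column names here are placeholders:

# collapse multiple lab layers per morphologic horizon: thickness-weighted
# mean for numeric variables, thickness-dominant level for categorical ones
collapse_lab_layers <- function(d, idcol = "phiid", top = "hzdept", bottom = "hzdepb") {
  vars <- setdiff(names(d), c(idcol, top, bottom))
  # treat anything non-numeric (including all-NA "logical" columns) as character
  for (v in vars) {
    if (!is.numeric(d[[v]])) d[[v]] <- as.character(d[[v]])
  }
  do.call(rbind, lapply(split(d, d[[idcol]]), function(g) {
    w <- g[[bottom]] - g[[top]]          # layer thickness = weight
    out <- g[1, , drop = FALSE]
    for (v in vars) {
      if (all(is.na(g[[v]]))) next       # nothing to summarize; keep NA
      if (is.numeric(g[[v]])) {
        out[[v]] <- weighted.mean(g[[v]], w, na.rm = TRUE)
      } else {
        # dominant condition: the category with the greatest total thickness
        out[[v]] <- names(which.max(tapply(w, g[[v]], sum)))
      }
    }
    out
  }))
}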

Since you are interested in soil carbon stocks and lab data related to carbon, you may want to consider processing the output of the query used within soilDB:::.get_phlabresults_data_from_NASIS_db() yourself (if the assumptions baked into the function don't work for you). I imagine you may be interested in the details of subsamples where they are available.

Please take a look at R/get_phlabresults_data_from_NASIS_db.R and see if you have any suggestions.
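If it is easier than browsing the repository, you can also print the body of that unexported function straight from an R session:

library(soilDB)

# print the source of the internal function that builds and post-processes
# the phlabresults query
getAnywhere(".get_phlabresults_data_from_NASIS_db")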

@dylanbeaudette (Member)

Glad you two have made progress on this. NASIS + ODBC + soilDB is a moving target. Given that these questions are related to an NFO, I'd suggest setting up a call with the RO and anyone else interested in assisting. I suspect that there are issues to sort out beyond the technical ones. For example, I suspect that you may be more interested in data from the Soil Characterization Database ("KSSL") than in what might be in the NASIS phlabresults table.

@brownag (Member) commented Nov 7, 2022

I will close this issue, as the original problem (and several others) has been resolved. @MollicMeyer, if you run into any other trouble, feel free to reach out and/or post a new issue.

brownag closed this as completed Nov 7, 2022