Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search for term anywhere in database #59

Open
cboettig opened this issue Jun 26, 2015 · 11 comments
Open

Search for term anywhere in database #59

cboettig opened this issue Jun 26, 2015 · 11 comments

Comments

@cboettig
Copy link
Member

@cboettig cboettig commented Jun 26, 2015

@sckott I'm wondering if we can construct an endpoint to search for a term anywhere in the database.

Given how disorganized the FishBase SQL is, it can be pretty hard to know which table to find something in (e.g. min/max temp, ropensci/rfishbase#47, that several people have requested recently -- it must be there somewhere since it's on the species summary pages).

Haven't found a great solution for doing this in SQL, but there's a few ideas:

The information_schema answer in the first link looks promising. Let me know if you get a chance to take a whack at this?

@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented Jul 10, 2015

Hmm. We could index the whole thing in elasticsearch, and search that way. May not be appropriate, just off the top

@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented Jul 10, 2015

Something like this, playing with my installation locally:

SELECT TABLE_NAME,COLUMN_NAME FROM information_schema.`columns` limit 300, 5;
+-------------+--------------------+
| TABLE_NAME  | COLUMN_NAME        |
+-------------+--------------------+
| TABLESPACES | TABLESPACE_TYPE    |
| TABLESPACES | LOGFILE_GROUP_NAME |
| TABLESPACES | EXTENT_SIZE        |
| TABLESPACES | AUTOEXTEND_SIZE    |
| TABLESPACES | MAXIMUM_SIZE       |
+-------------+--------------------+
5 rows in set (0.09 sec)

Could do this query to find table, then merge columns required to get the data needed?

I guess if we're thinking of re-orging the API based on a flat file of everything, then this is moot

@cboettig

This comment has been minimized.

Copy link
Member Author

@cboettig cboettig commented Jul 13, 2015

Nice, this looks promising. Let's build an endpoint around this for now,
since it should be quick. (I think it will be a ways away before we can
replace everything with a flat file, but guess we'll see)

On Fri, Jul 10, 2015 at 12:10 PM, Scott Chamberlain <
notifications@github.com> wrote:

Something like this, playing with my installation locally:

SELECT TABLE_NAME,COLUMN_NAME FROM information_schema.columns limit 300, 5;

+-------------+--------------------+
| TABLE_NAME | COLUMN_NAME |
+-------------+--------------------+
| TABLESPACES | TABLESPACE_TYPE |
| TABLESPACES | LOGFILE_GROUP_NAME |
| TABLESPACES | EXTENT_SIZE |
| TABLESPACES | AUTOEXTEND_SIZE |
| TABLESPACES | MAXIMUM_SIZE |
+-------------+--------------------+
5 rows in set (0.09 sec)

Could do this query to find table, then merge columns required to get the
data needed?

I guess if we're thinking of re-orging the API based on a flat file of
everything, then this is moot


Reply to this email directly or view it on GitHub
#59 (comment)
.

Carl Boettiger
UC Santa Cruz
http://carlboettiger.info/

@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented Jul 21, 2015

@cboettig working on this, not quite sure what the use case is right now

Are users going to query for fields X, Y, and Z, and return those in any table, if found? Does this endpoint look for fields, and if exist, then return those fields (regardless of table they're in)? if so, then main parameter I guess is fields to specify what fields to return.

But there definitely are duplicate field names in different tables. how to deal with that? I guess users could give namespaced field names, like species:name where the table is species and the field they want is name - then we just parse those namespaced strings apart, and use them

@cboettig

This comment has been minimized.

Copy link
Member Author

@cboettig cboettig commented Jul 21, 2015

@sckott Here's an example use case: discover which table(s) contain the data of min and max temp for a species range (e.g. see ropensci/rfishbase#47).

I imagine the user would query something like "Min Temp", though not quite sure if one would need good fuzzy matching. The exact field is called TempMin and comes from the stocks table, which isn't particularly intuitive from the name "stocks" (as the issue 47 discussion highlights). So the basic return object for the function would be a list of table names and the corresponding field name (since users are unlikely to guess the exact format of the the field name, e.g. TempMin and not MinTemp etc).

Like you say, it's not clear how to handle duplicate field names -- I imagine one would just return a list of all tables containing a match. (that may be useful more generally; e.g. if the duplicate really does refer to the same thing, like a StockID, this may be a way to discover tables that can be joined using a given variable).

@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented Jul 21, 2015

@cboettig that makes sense I think. Already have a working route for getting all tables and fields. Will try to add support for a fields parameter passed in to filter by whatever user wants to find

sckott added a commit that referenced this issue Jul 21, 2015
@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented Jul 21, 2015

@cboettig Okay, added a route /listfields https://github.com/ropensci/fishbaseapi/blob/master/api_docs.md#listfields - we can rename to anything you want.

Restarted server and getting same problem as yesterday though, so it's not up yet

@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented Jul 21, 2015

when it's up try:

  • fishbase.ropensci.org/listfields
  • fishbase.ropensci.org/listfields?fields=Year
  • fishbase.ropensci.org/listfields?fields=Year,Genus
  • fishbase.ropensci.org/listfields?fields=Year,Genus&exact=true
@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented Mar 11, 2016

WRT searching anywhere in the DB, I imagine Solr is our best bet, or ES as long as we're sure it can be secured properly. Guess it's a matter of whether we want to put in the effort for this

@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented May 31, 2019

sckott added a commit that referenced this issue May 31, 2019
@sckott

This comment has been minimized.

Copy link
Member

@sckott sckott commented May 31, 2019

Added fuzzy search for the /taxa route only, for only the params genus and species

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.