Improve discoverability of tables #52
Okay, a few different approaches here, not sure what is best. Would welcome feedback from @sckott @rBatt @jebyrnes @jafflerbach and others here:

Option 1: "R way"
This is mostly what we've done so far. Each table has a corresponding R function.

Option 2: "API way"
Use the

Option 3: Something better
Perhaps more ideally, we should do a whole bunch more table joins on the server side to reduce the number of endpoints/tables one has to query. This would leave the user more to filter out, but would make discovery easier. I'm very interested in whether most users prefer a "one big table" or "many many smaller tables" approach. I think the latter is more intuitive when you get started, but turns out to be harder to use (now which table did I put that in again?) unless you're really good at table joins etc., but I'd love to hear more thoughts.

Option 4: Something else
Combination of the above or something else entirely.

Thoughts anyone?
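A rough sketch of how Options 1 and 2 differ from the user's side. The function names (`ecology()`, `fb_table()`) and the toy tables below are hypothetical illustrations, not the package's confirmed API:

```r
# Toy stand-in for FishBase: a named list of small tables.
# Real table and column names differ; this only contrasts the two styles.
tables <- list(
  ecology   = data.frame(Species = c("A", "B"), FeedingType = c("x", "y")),
  fecundity = data.frame(Species = "A", FecundityMin = 100)
)

# Option 1, "R way": one exported R function per table (hypothetical name).
ecology <- function(species) subset(tables$ecology, Species %in% species)

# Option 2, "API way": a single generic accessor keyed by table name,
# so a new endpoint needs no new wrapper function.
fb_table <- function(name, species) subset(tables[[name]], Species %in% species)

ecology("A")
fb_table("ecology", "A")
```

Option 2 trades a flatter learning curve (one function to learn) for less discoverability in the usual R help system, since table names become string arguments rather than documented functions.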
Sorry to chime in so late here, Carl. Seems like we're trying to solve this a bit on the API side by making new routes for finding fields across all tables. I'm curious to hear what
I haven't had much time to play around here lately, so my apologies if my suggestion mostly reflects ignorance. I like Carl's suggestion 3. I tend to prefer "one big table" setups, but in that case it's important to know the logical associations among the columns (e.g., some columns would be "oxygen" columns, etc.). I guess you could put the data in a "long" format (a column named "variable", a column named "vrbl_category", and one for "value"; handy for analysis, bad for subsetting and memory).

My perspective stems from how I have organized most of my data sets. These would be large surveys with columns for species, location [could be several columns, e.g., region-stratum-lat-lon], date [several columns], weight, length, abundance, etc. I end up with a lot of repeat values (e.g., if multiple individuals are caught in the same place at the same time, only a few columns change across those rows). I would look to rfishbase for additional information to

I'm not sure what the "simple interface" vs "all possible data" options would look like. In general, I tend to like when objects have the simple stuff on the surface, with the details still present underneath. It says "I'm guessing you want this, but if not,

I have a feeling I might not be on the same page as you with this topic of organization, but hopefully we can achieve that after a little back-and-forth, if that's helpful to you. So let me know if some of this doesn't make sense, or point to a more specific example of formatting options to help get me on the same page. Or just wait for me to use the package more .... (again, sorry about that). Thanks for the ongoing great work that you're doing.
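A minimal sketch of the "long" layout described above, using made-up column names and values (not rfishbase's actual schema):

```r
# Hypothetical wide table: one row per species, one column per measurement.
wide <- data.frame(
  species    = c("Gadus morhua", "Salmo salar"),
  oxygen_min = c(4.2, 5.1),
  temp_max   = c(12, 16)
)

# Long layout: one row per (species, variable) pair, with a category
# column grouping related variables (e.g. all "oxygen" fields together).
long <- data.frame(
  species  = rep(wide$species, times = 2),
  variable = rep(c("oxygen_min", "temp_max"), each = nrow(wide)),
  value    = c(wide$oxygen_min, wide$temp_max)
)
long$vrbl_category <- ifelse(grepl("^oxygen", long$variable),
                             "oxygen", "temperature")
```

As noted, this shape is handy for analysis (easy to group or filter by `vrbl_category`) but repeats the species key on every row, which costs memory and makes simple column subsetting clumsier.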
For my work I would prefer one big table as well, which I can use to select species and attributes of interest and then join to my own data (much like @rBatt said).
I normally prefer to read the entire table once, then slice it as needed afterwards, which can sometimes be done with just two commands. Reading small tables and then joining needs about one read statement and one join statement per table. So from that, I like Option 2 just because it lends itself to cleaner and simpler code. I don't know what design considerations you're balancing, so maybe there's some performance hit for passing around large tables. Is Option 3 getting at the idea of handling tables like

where

On a side note, I wasn't aware of
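The two access patterns contrasted in this thread can be sketched with toy data frames standing in for real FishBase tables (table and column names here are illustrative only):

```r
# Two small per-topic tables, linked by a species key.
species <- data.frame(
  SpecCode = 1:3,
  Species  = c("Gadus morhua", "Salmo salar", "Thunnus thynnus")
)
ecology <- data.frame(SpecCode = c(1, 3), FeedingType = c("benthic", "pelagic"))

# "Many small tables": roughly one read plus one join per table of interest.
joined <- merge(species, ecology, by = "SpecCode", all.x = TRUE)

# "One big table": read the pre-joined table once, then slice locally.
with_ecology <- subset(joined, !is.na(FeedingType))
```

The pre-joined form pushes work onto the user at filter time (more rows and columns to discard) but removes the "which table did I put that in?" discovery problem.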
As @layamene aptly put it in her review (#46), most tables are effectively "hidden." Tables should be more discoverable.
FishBase has too many tables. We need: