This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

GBIF searching takes too long when country/year is missing #7

Open
villanueval opened this issue Aug 20, 2018 · 2 comments
@villanueval (Member)

This is due to the large size of the database (23M records). When the query string lacks a country and/or year to filter on, we need a better way to do the string matching, or to push it into the database.

As an alternative, we can do the matching in chunks of data, but it will be slow.
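The chunked alternative could be sketched as follows. This is a Python stand-in for the R workflow (the function names, sample data, and chunk size are all hypothetical); it bounds how much is scored at once, but it still touches every row, which is why it would remain slow.

```python
import difflib

def iter_chunks(rows, size):
    # Yield successive slices so the whole table never has to be scored at once.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def chunked_best_match(query, rows, chunk_size=2):
    # Scan chunk by chunk, keeping only the running best match and its score.
    best, best_score = None, -1.0
    for chunk in iter_chunks(rows, chunk_size):
        for row in chunk:
            score = difflib.SequenceMatcher(None, query, row).ratio()
            if score > best_score:
                best, best_score = row, score
    return best

names = ["color", "flavor", "odor", "colure", "velour"]
best = chunked_best_match("colour", names)  # "color"
```

Chunking trades memory for time: each pass is small, but the total number of similarity computations is unchanged.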

@villanueval villanueval self-assigned this Aug 20, 2018
@ajs6f (Member) commented Aug 20, 2018

What is being used now? grep or similar?

@villanueval (Member, Author)

I wish. It does approximate string matching with the R package stringdist (e.g. this line). If the string doesn't include the country where the item was collected, it can't select a subset of the database, so it pulls all 23 million rows and then runs the approximate match, which takes about 3 minutes, and the user gets bored.
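The effect of that missing filter can be sketched in Python (a stand-in for the R stringdist workflow; the record fields, sample data, and helper names here are hypothetical, and `difflib.SequenceMatcher` substitutes for stringdist's metrics):

```python
import difflib

# Hypothetical GBIF-like records: (locality string, country code, year).
records = [
    ("Mount Kenya, near Nanyuki", "KE", 1998),
    ("Monte Kenia, Nanyuki", "KE", 1998),
    ("Mount Kilimanjaro, Moshi", "TZ", 2001),
    ("Nairobi National Park", "KE", 2005),
]

def best_match(query, candidates):
    # Return the record whose locality is most similar to the query string.
    return max(
        candidates,
        key=lambda rec: difflib.SequenceMatcher(None, query, rec[0]).ratio(),
    )

query = "Mt. Kenya near Nanyuki"

# Without a country/year filter, every record must be scored (23M in GBIF).
slow = best_match(query, records)

# With a filter, only the matching subset is scored.
subset = [r for r in records if r[1] == "KE" and r[2] == 1998]
fast = best_match(query, subset)  # same answer, far fewer comparisons
```

The fuzzy comparison itself is cheap per pair; the cost is the candidate count, so shrinking the subset before matching is where the speedup comes from.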

Another option is to run it as a batch and return the results at a later time/date. But that can wait.

@villanueval villanueval transferred this issue from another repository Feb 26, 2019