ensure googlesheets scraper skips malformed repo ids/rows #79

Merged: vsoch merged 1 commit into master from skip-empty-rows on Sep 8, 2022

vsoch commented Sep 5, 2022

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

NickleDave commented

I can confirm that I was able to run the import on a Google Sheet to completion.

If it helps, here's what the output looks like:

$ rse import --type google-sheet "https://docs.google.com/spreadsheets/d/e/2PACX-1vQkPsu14BG0bErrY0thXymfS55be0spEVX_WpWm2Yy3We8swMO0sIb3iD4Sg-i1lWnxSsiiN5JmWAD-/pub?gid=0&single=true&output=csv"
INFO:rse.main.import.google-sheet:Found software record: https://github.com/patriceguyot/Acoustic_Indices
INFO:rse.main.import.google-sheet:Found software record: https://www.adobe.com/products/audition.html
INFO:rse.main.import.google-sheet:Found software record: https://www.titley-scientific.com/us/anabat-insight.html
INFO:rse.main.import.google-sheet:Found software record: https://datadryad.org/stash/dataset/doi:10.5061/dryad.221mq23
INFO:rse.main.import.google-sheet:Found software record: https://github.com/ChristianBergler/ANIMAL-SPOT
INFO:rse.main.import.google-sheet:Found software record: https://arbimon.rfcx.org/
INFO:rse.main.import.google-sheet:Found software record: https://soundanalysis.wp.st-andrews.ac.uk/
INFO:rse.main.import.google-sheet:Found software record: https://www.audacityteam.org/download/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nwolek/audiomoth-scripts
INFO:rse.main.import.google-sheet:Found software record: https://github.com/sarabsethi/audioset_soundscape_feats_sethi2019
INFO:rse.main.import.google-sheet:Found software record: https://autoencoded-vocal-analysis.readthedocs.io/en/latest/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/timsainb/AVGN
INFO:rse.main.import.google-sheet:Found software record: http://www.avianz.net/index.php
INFO:rse.main.import.google-sheet:Found software record: http://www.avisoft.com/sound-analysis/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/EricArcher/banter
INFO:rse.main.import.google-sheet:Found software record: https://bitbucket.org/chrisscott/batclassify/src
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macaodha/batdetect
INFO:rse.main.import.google-sheet:Found software record: https://www.batlogger.com/en/products/batexplorer/
INFO:rse.main.import.google-sheet:Found software record: https://www.wsl.ch/en/services-and-products/software-websites-and-apps/batscope-4.html
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/bioacoustics/index.html
INFO:rse.main.import.google-sheet:Found software record: https://birdnet.cornell.edu/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxclassify
INFO:rse.main.import.google-sheet:Found software record: https://github.com/BirdVox/birdvoxdetect
INFO:rse.main.import.google-sheet:Found software record: https://github.com/OpenWild/caracal
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/crowsetta
INFO:rse.main.import.google-sheet:Found software record: https://github.com/MarineBioAcousticsRC/DetEdit
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DrCoffey/DeepSqueak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/nilomr/fieldtools
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DenaJGibbon/gibbonR-package
INFO:rse.main.import.google-sheet:Found software record: http://www.oldbird.org/glassofire.htm
INFO:rse.main.import.google-sheet:Found software record: https://www.goldwave.com/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/Cdevenish/hardRain
INFO:rse.main.import.google-sheet:Found software record: https://sites.google.com/view/alcore-suzuki/home/harkbird
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/hybrid-vocal-classifier
INFO:rse.main.import.google-sheet:Found software record: https://github.com/DanWoodrich/INSTINCT
INFO:rse.main.import.google-sheet:Found software record: http://bioacoustics.us/ishmael.html
INFO:rse.main.import.google-sheet:Found software record: https://www.wildlifeacoustics.com/products/kaleidoscope-pro
INFO:rse.main.import.google-sheet:Found software record: https://meridian.cs.dal.ca/2015/04/12/ketos/
INFO:rse.main.import.google-sheet:Found software record: https://koe.io.ac.nz/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shyamblast/Koogu/tree/v0.6.5
INFO:rse.main.import.google-sheet:Found software record: https://librosa.org/librosa/
INFO:rse.main.import.google-sheet:Found software record: https://rflachlan.github.io/Luscinia/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/monitoR/index.html
INFO:rse.main.import.google-sheet:Found software record: https://marce10.github.io/ohun/index.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/kitzeslab/opensoundscape
INFO:rse.main.import.google-sheet:Found software record: https://www.pamguard.org/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/TaikiSan21/PAMr
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YannickJadoul/Parselmouth
INFO:rse.main.import.google-sheet:Found software record: https://www.fon.hum.uva.nl/praat/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/shivChitinous/prinia-project
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-lite/
INFO:rse.main.import.google-sheet:Found software record: https://ravensoundsoftware.com/software/raven-pro
INFO:rse.main.import.google-sheet:Found software record: https://www.reaper.fm/
INFO:rse.main.import.google-sheet:Found software record: https://github.com/scikit-maad/scikit-maad
INFO:rse.main.import.google-sheet:Found software record: https://docs.scipy.org/doc/scipy/reference/signal.html
INFO:rse.main.import.google-sheet:Found software record: http://dx.doi.org/10.6084/m9.figshare.3792780
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/seewave/index.html
INFO:rse.main.import.google-sheet:Found software record: https://www.sonicvisualiser.org/
INFO:rse.main.import.google-sheet:Found software record: https://sonobat.com/
INFO:rse.main.import.google-sheet:Found software record: https://doi.org/10.1080/09524622.2013.827588
INFO:rse.main.import.google-sheet:Found software record: https://soundata.readthedocs.io/en/latest/
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/soundecology/vignettes/intro.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/macster110/aipam
INFO:rse.main.import.google-sheet:Found software record: https://github.com/rhine3/specky
INFO:rse.main.import.google-sheet:Found software record: https://github.com/YvesBas/Tadarida-L

https://github.com/YvesBas/Tadarida-D

https://github.com/YvesBas/Tadarida-C
INFO:rse.main.import.google-sheet:Found software record: https://www.cetus.ucsd.edu/technologies_triton.html
INFO:rse.main.import.google-sheet:Found software record: https://github.com/yardencsGitHub/tweetynet
INFO:rse.main.import.google-sheet:Found software record: https://github.com/vocalpy/vak
INFO:rse.main.import.google-sheet:Found software record: https://github.com/HaroldMills/Vesper
INFO:rse.main.import.google-sheet:Found software record: https://cran.r-project.org/web/packages/warbleR/index.html
Found 70 results
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.adobe.com/products/audition.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.titley-scientific.com/us/anabat-insight.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry datadryad.org/stash/dataset/doi:10.5061/dryad.221mq23
WARNING:rse.main.import.google-sheet:Skipping malformed entry arbimon.rfcx.org/
WARNING:rse.main.import.google-sheet:Skipping malformed entry soundanalysis.wp.st-andrews.ac.uk/
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.audacityteam.org/download/
WARNING:rse.main.import.google-sheet:Skipping malformed entry autoencoded-vocal-analysis.readthedocs.io/en/latest/index.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.avianz.net/index.php
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.avisoft.com/sound-analysis/
WARNING:rse.main.import.google-sheet:Skipping malformed entry bitbucket.org/chrisscott/batclassify/src
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.batlogger.com/en/products/batexplorer/
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.wsl.ch/en/services-and-products/software-websites-and-apps/batscope-4.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry cran.r-project.org/web/packages/bioacoustics/index.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry birdnet.cornell.edu/
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.oldbird.org/glassofire.htm
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.goldwave.com/
WARNING:rse.main.import.google-sheet:Skipping malformed entry sites.google.com/view/alcore-suzuki/home/harkbird
WARNING:rse.main.import.google-sheet:Skipping malformed entry bioacoustics.us/ishmael.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.wildlifeacoustics.com/products/kaleidoscope-pro
WARNING:rse.main.import.google-sheet:Skipping malformed entry meridian.cs.dal.ca/2015/04/12/ketos/
WARNING:rse.main.import.google-sheet:Skipping malformed entry koe.io.ac.nz/
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/tree/v0.6.5.
WARNING:rse.main.import.google-sheet:Skipping malformed entry github.com/shyamblast/Koogu/tree/v0.6.5
WARNING:rse.main.import.google-sheet:Skipping malformed entry librosa.org/librosa/
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/rflachlanhub.io/Luscinia.
WARNING:rse.main.import.google-sheet:Skipping malformed entry rflachlan.github.io/Luscinia/
WARNING:rse.main.import.google-sheet:Skipping malformed entry cran.r-project.org/web/packages/monitoR/index.html
ERROR:rse.utils.urls:Cannot find endpoint https://api.github.com/repos/ohun/index.html.
WARNING:rse.main.import.google-sheet:Skipping malformed entry marce10.github.io/ohun/index.html
INFO:rse.main.database.filesystem:github/kitzeslab/opensoundscape was added to the the database.
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.pamguard.org/
INFO:rse.main.database.filesystem:github/TaikiSan21/PAMr was added to the the database.
INFO:rse.main.database.filesystem:github/YannickJadoul/Parselmouth was added to the the database.
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.fon.hum.uva.nl/praat/
INFO:rse.main.database.filesystem:github/shivChitinous/prinia-project was added to the the database.
WARNING:rse.main.import.google-sheet:Skipping malformed entry ravensoundsoftware.com/software/raven-lite/
WARNING:rse.main.import.google-sheet:Skipping malformed entry ravensoundsoftware.com/software/raven-pro
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.reaper.fm/
INFO:rse.main.database.filesystem:github/scikit-maad/scikit-maad was added to the the database.
WARNING:rse.main.import.google-sheet:Skipping malformed entry docs.scipy.org/doc/scipy/reference/signal.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry dx.doi.org/10.6084/m9.figshare.3792780
WARNING:rse.main.import.google-sheet:Skipping malformed entry cran.r-project.org/web/packages/seewave/index.html
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.sonicvisualiser.org/
WARNING:rse.main.import.google-sheet:Skipping malformed entry sonobat.com/
WARNING:rse.main.import.google-sheet:Skipping malformed entry doi.org/10.1080/09524622.2013.827588
WARNING:rse.main.import.google-sheet:Skipping malformed entry soundata.readthedocs.io/en/latest/
WARNING:rse.main.import.google-sheet:Skipping malformed entry cran.r-project.org/web/packages/soundecology/vignettes/intro.html
INFO:rse.main.database.filesystem:github/macster110/aipam was added to the the database.
INFO:rse.main.database.filesystem:github/rhine3/specky was added to the the database.
INFO:rse.main.database.filesystem:github/YvesBas/Tadarida-C was added to the the database.
WARNING:rse.main.import.google-sheet:Skipping malformed entry www.cetus.ucsd.edu/technologies_triton.html
INFO:rse.main.database.filesystem:github/yardencsGitHub/tweetynet was added to the the database.
INFO:rse.main.database.filesystem:github/vocalpy/vak was added to the the database.
ERROR:rse.utils.urls:Permission denied to query https://api.github.com/repos/HaroldMills/Vesper: 403, rate limit exceeded
WARNING:rse.main.import.google-sheet:Skipping malformed entry github.com/HaroldMills/Vesper
WARNING:rse.main.import.google-sheet:Skipping malformed entry cran.r-project.org/web/packages/warbleR/index.html
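
The skipped entries in the output above fall into a few groups: hosts other than GitHub, GitHub URLs with extra path components (such as `/tree/v0.6.5`), GitHub Pages hosts (`*.github.io`), and one cell that holds several URLs. As a rough illustration only, and not the code that rse uses, a stricter check for plain GitHub repository URLs might look like the sketch below. The names `parse_github_repo` and `repos_from_cell` are hypothetical.

```python
from urllib.parse import urlparse


def parse_github_repo(url):
    """Return "owner/repo" for a GitHub repository URL, else None.

    Hypothetical helper for illustration; not part of rse.
    """
    url = url.strip()
    # Entries without a scheme (e.g. "github.com/owner/repo") would otherwise
    # be parsed as a bare path, so add one before parsing.
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    # Only accept the main GitHub host; this rejects *.github.io pages.
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    # Keep only owner and repo; ignore trailing parts such as "tree/v0.6.5".
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"


def repos_from_cell(cell):
    """Split a spreadsheet cell on whitespace and keep the valid repo ids."""
    repos = (parse_github_repo(item) for item in cell.split())
    return [repo for repo in repos if repo is not None]
```

With a check like this, `https://github.com/shyamblast/Koogu/tree/v0.6.5` would resolve to `shyamblast/Koogu` instead of being skipped, and the cell listing three Tadarida repositories would yield three ids. Note that `github.com/HaroldMills/Vesper` in the log was skipped because of a 403 rate limit response, not because the URL itself was malformed.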

vsoch commented Sep 8, 2022

Thanks!

vsoch merged commit 1d738ca into master on Sep 8, 2022
vsoch deleted the skip-empty-rows branch on September 8, 2022 at 01:51
vsoch commented Sep 8, 2022

Released! https://pypi.org/project/rse/0.0.45/

@NickleDave make sure you export a GITHUB_TOKEN (a personal access token, or PAT) so you don't run into the low rate limit for unauthenticated requests.
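
For context, the GitHub REST API allows far fewer requests per hour without a token than with one, which is consistent with the 403 "rate limit exceeded" error in the log above. The following is a minimal sketch of how a token exported as `GITHUB_TOKEN` can be attached to API requests. How rse reads the variable internally is not shown in this thread, and `github_headers` and `remaining_requests` are hypothetical helper names.

```python
import json
import os
import urllib.request


def github_headers(environ=None):
    """Build request headers, adding the token if GITHUB_TOKEN is set.

    Hypothetical helper for illustration; not part of rse.
    """
    if environ is None:
        environ = os.environ
    headers = {"Accept": "application/vnd.github+json"}
    token = environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def remaining_requests():
    """Ask the GitHub API how many core requests are left in this window."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit", headers=github_headers()
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["resources"]["core"]["remaining"]
```

Calling `remaining_requests()` before and after exporting the token in the shell is one way to confirm that the token is being picked up.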

NickleDave commented

That's helpful to know, thank you
