This repository has been archived by the owner on Jun 30, 2023. It is now read-only.
Scraping
Aaron Taylor edited this page Jun 19, 2014 · 7 revisions
- Scraping with Ruby
  - Intro to scraping: http://ruby.bastardsbook.com/chapters/web-scraping/
  - Nokogiri: http://ruby.bastardsbook.com/chapters/html-parsing/
  - Mechanize: http://readysteadycode.com/howto-scrape-websites-with-ruby-and-mechanize
  - Library index: https://www.ruby-toolbox.com/categories/Web_Content_Scrapers
  - RSS feed scraper: https://github.com/feedjira/feedjira
  - Mechanize and Nokogiri example: http://www.icicletech.com/blog/web-scraping-with-ruby-using-mechanize-and-nokogiri-gems
- Scraping with Python
  - Part 1: http://www.packtpub.com/article/web-scraping-with-python
  - Part 2: http://www.packtpub.com/article/web-scraping-with-python-part-2
- Diffbot scraping
  - Custom APIs: http://www.diffbot.com/products/custom/
  - Crawlbot: http://www.diffbot.com/products/crawlbot/
  - To do: contact Diffbot about a custom Calendar API in line with their other services
- Morph: https://morph.io
  - Source code repo: https://github.com/openaustralia/morph
- Crawlera: http://crawlera.com
- Import.io: https://import.io
  - HN comment thread: https://news.ycombinator.com/item?id=7582858
- AlchemyAPI: http://www.alchemyapi.com
  - Text analysis and data gathering. May be beyond what we need for straight database input of events.
- Top 10 algorithms in data mining (PDF hosted at UVM): http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf
- Everything Algorithm: http://gigaom.com/2014/05/23/meet-the-algorithm-that-can-learn-everything-about-anything/
- Machine Learning: http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms
- Data mining Books: http://christonard.com/12-free-data-mining-books/
- Machine Learning Basics Lectures: http://homepages.inf.ed.ac.uk/vlavrenk/iaml.html
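
For a feel of what the scraping resources above boil down to, here is a minimal sketch in Python that pulls event records out of a page using only the standard library. The sample markup, the `event`/`title`/`date` class names, and the `EventParser` and `parse_events` names are all assumptions made up for illustration; a real page would need its own selectors, and the libraries linked above (Nokogiri, Mechanize) would likely be more convenient in practice.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; real event pages will be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="event"><span class="title">Jazz Night</span> <span class="date">2014-06-20</span></li>
  <li class="event"><span class="title">Chess Club</span> <span class="date">2014-06-21</span></li>
</ul>
"""


class EventParser(HTMLParser):
    """Collects one dict per <li class="event"> with its title and date."""

    def __init__(self):
        super().__init__()
        self.events = []    # parsed event records, in document order
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "event" in classes:
            # Start a new, empty event record.
            self.events.append({})
        elif tag == "span" and self.events:
            # Remember which field the upcoming text belongs to.
            for name in ("title", "date"):
                if name in classes:
                    self._field = name

    def handle_data(self, data):
        # Store non-blank text for the field we are inside.
        if self._field and data.strip():
            self.events[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_events(html):
    """Return a list of {'title': ..., 'date': ...} dicts found in html."""
    parser = EventParser()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in parse_events(SAMPLE_HTML):
        print(event["date"], event["title"])
```

The resulting list of dicts is the sort of shape that could be written straight into a database table of events; fetching the HTML over the network is left out here to keep the sketch self-contained.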
© 2014 Peck