rmax/mit-ocw-crawler

MIT's OpenCourseWare Crawler

Author

Rolando Espinoza La fuente <darkrho@gmail.com>

About

MIT's OpenCourseWare is an excellent source of knowledge. This crawler fetches the course information for a department, such as the download links for course materials.

Requirements

  • Scrapy (the commands below use its scrapy-ctl.py project script)

Usage Example

First, choose a department on MIT's OpenCourseWare. Then find its DEPARTMENT_ID, which is part of the department's URL. This example uses the Nuclear Science and Engineering department, whose DEPARTMENT_ID is nuclear-engineering.
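
The DEPARTMENT_ID can also be read off the URL programmatically. The following is a minimal sketch, not part of the crawler. It assumes the department URL has the form https://ocw.mit.edu/courses/<DEPARTMENT_ID>/, which may not match the current site layout, and the helper name department_id_from_url is hypothetical.

```python
from urllib.parse import urlparse


def department_id_from_url(url: str) -> str:
    """Return the path segment that follows 'courses' in a department URL.

    Assumption: the URL looks like https://ocw.mit.edu/courses/<DEPARTMENT_ID>/.
    """
    # Split the path into its non-empty segments, e.g. ['courses', 'nuclear-engineering'].
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "courses" not in segments:
        raise ValueError(f"no 'courses' segment in URL path: {url!r}")
    index = segments.index("courses")
    if index + 1 >= len(segments):
        raise ValueError(f"no department segment after 'courses': {url!r}")
    return segments[index + 1]


if __name__ == "__main__":
    print(department_id_from_url("https://ocw.mit.edu/courses/nuclear-engineering/"))
```

The returned value is what gets passed to --set DEPARTMENT_ID=... on the command line.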

Finally, run scrapy-ctl.py to crawl the department and fetch the information for all of its courses.

  • To crawl all courses without storing the results:

    $ ./scrapy-ctl.py crawl materials --set DEPARTMENT_ID=nuclear-engineering
  • To store results in a CSV file:

    $ ./scrapy-ctl.py crawl materials --set DEPARTMENT_ID=nuclear-engineering --set EXPORT_FORMAT=csv --set EXPORT_FILE=materials.csv
  • To store only the download URLs for later use in a download manager:

    $ ./scrapy-ctl.py crawl materials --set DEPARTMENT_ID=nuclear-engineering --set EXPORT_FORMAT=csv --set EXPORT_FILE=materials.csv --set EXPORT_FIELDS=download_url
    $ wget -i materials.csv
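
If the download manager expects a plain list of URLs, the exported file can be filtered first. The sketch below is an illustration under assumptions, not part of the crawler. It assumes the CSV has the URL in its first column and may contain a header row or blank lines, and the helper name extract_urls is hypothetical.

```python
import csv
from typing import Iterable, List


def extract_urls(lines: Iterable[str], column: int = 0) -> List[str]:
    """Return the http(s) URLs found in one column of CSV input.

    Rows that are empty, too short, or hold anything other than a URL
    (for example a 'download_url' header row) are skipped.
    """
    urls = []
    for row in csv.reader(lines):
        if len(row) <= column:
            continue
        value = row[column].strip()
        if value.startswith(("http://", "https://")):
            urls.append(value)
    return urls


if __name__ == "__main__":
    import sys

    # Usage: python extract_urls.py materials.csv > urls.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], newline="") as handle:
            print("\n".join(extract_urls(handle)))
```

The resulting urls.txt can then be passed to wget -i in place of the raw CSV file.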
