MIT's OpenCourseWare Crawler

Author: Rolando Espinoza La fuente <darkrho@gmail.com>

About

MIT's OpenCourseWare is an excellent source of knowledge. This crawler fetches information about every course in a department, such as the download links for course materials.

Requirements

Scrapy

Usage Example

First, choose a department at MIT's OpenCourseWare. Then find its DEPARTMENT_ID, which is part of the department's URL. In this example we use the Nuclear Science and Engineering department, whose DEPARTMENT_ID is nuclear-engineering.
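The DEPARTMENT_ID is the department's path segment in its URL. The following is a minimal sketch that assumes department URLs of the form https://ocw.mit.edu/courses/<DEPARTMENT_ID>/, the layout that matches the nuclear-engineering example. The site's current URL scheme may differ. The helper `department_id_from_url` is hypothetical and not part of the crawler.

```python
from urllib.parse import urlparse


def department_id_from_url(url: str) -> str:
    """Return the path segment that follows /courses/ in a department URL.

    Assumes the (hypothetical) layout https://ocw.mit.edu/courses/<DEPARTMENT_ID>/
    """
    # Split the URL path into its non-empty segments, e.g. ["courses", "nuclear-engineering"]
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2 or segments[0] != "courses":
        raise ValueError(f"not a department URL: {url}")
    return segments[1]


print(department_id_from_url("https://ocw.mit.edu/courses/nuclear-engineering/"))
# prints: nuclear-engineering
```

The returned value is what you would pass on the command line as `--set DEPARTMENT_ID=...`.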

Finally, run scrapy-ctl.py to crawl the department and fetch information about all of its courses.

To crawl all courses without exporting the results:

$ ./scrapy-ctl.py crawl materials --set DEPARTMENT_ID=nuclear-engineering

To store results in a CSV file:

$ ./scrapy-ctl.py crawl materials --set DEPARTMENT_ID=nuclear-engineering --set EXPORT_FORMAT=csv --set EXPORT_FILE=materials.csv

To store only the download URLs for later use in a download manager such as wget:

$ ./scrapy-ctl.py crawl materials --set DEPARTMENT_ID=nuclear-engineering --set EXPORT_FORMAT=csv --set EXPORT_FILE=materials.csv --set EXPORT_FIELDS=download_url
$ wget -i materials.csv
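`wget -i` treats every line of its input file as a URL. If the CSV export contains a header row or quoted fields, wget will also try to fetch those lines. The following minimal Python sketch converts the export into a clean list with one URL per line. The helpers `extract_urls` and `write_url_list` are hypothetical, and the exact layout of the exporter's CSV (header row or not) is an assumption, so the code handles both cases.

```python
import csv
import io
from typing import List


def extract_urls(csv_text: str, field: str = "download_url") -> List[str]:
    """Return the non-empty URLs in `csv_text`, one per data row.

    If the first row contains `field`, it is treated as a header and the
    matching column is used; otherwise the first column is used.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    column = 0
    if rows and field in rows[0]:
        column = rows[0].index(field)
        rows = rows[1:]  # skip the header row
    urls = []
    for row in rows:
        # Blank lines come back as empty lists; skip them and empty cells
        if len(row) > column and row[column].strip():
            urls.append(row[column].strip())
    return urls


def write_url_list(src: str, dst: str) -> int:
    """Read the CSV export at `src`, write one URL per line to `dst`,
    and return how many URLs were written."""
    with open(src, newline="", encoding="utf-8") as f:
        urls = extract_urls(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(url + "\n" for url in urls)
    return len(urls)
```

For example, calling `write_url_list("materials.csv", "urls.txt")` and then running `wget -i urls.txt` passes wget only the URLs. The header check uses the field name given in EXPORT_FIELDS above.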
