edX search engine built on Python 2.7, Redis and Postgres
scraper
search_engine
utils
web
.gitignore
README.md
__init__.py
app.ini
app.py
app.sock
manage.py
requirements.txt
setup.sh

Coaster

Search engine for edx.org built on Python 2.7, Redis and Postgres.

The crawler combines Scrapy's distributed architecture with Selenium's ability to traverse JavaScript-based web applications. Selenium runs on a headless Ubuntu server via Docker.

All publicly accessible course data (sections, subsections, and units, as well as video links and transcripts) is categorized and stored in a relational Postgres database.
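
To make the hierarchy concrete, here is a minimal sketch of how such a schema could be laid out. Every table and column name below is hypothetical and not taken from the project, and an in-memory SQLite database stands in for Postgres so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: each level of the course hierarchy references its parent.
SCHEMA = """
CREATE TABLE course (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE section (
    id INTEGER PRIMARY KEY,
    course_id INTEGER NOT NULL REFERENCES course(id),
    title TEXT NOT NULL
);
CREATE TABLE subsection (
    id INTEGER PRIMARY KEY,
    section_id INTEGER NOT NULL REFERENCES section(id),
    title TEXT NOT NULL
);
CREATE TABLE unit (
    id INTEGER PRIMARY KEY,
    subsection_id INTEGER NOT NULL REFERENCES subsection(id),
    title TEXT NOT NULL
);
CREATE TABLE video (
    id INTEGER PRIMARY KEY,
    unit_id INTEGER NOT NULL REFERENCES unit(id),
    url TEXT NOT NULL,
    transcript TEXT
);
"""


def create_db():
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def videos_for_course(conn, course_id):
    """Return the URLs of all videos in a course by walking the hierarchy."""
    rows = conn.execute(
        """
        SELECT video.url
        FROM video
        JOIN unit ON video.unit_id = unit.id
        JOIN subsection ON unit.subsection_id = subsection.id
        JOIN section ON subsection.section_id = section.id
        WHERE section.course_id = ?
        ORDER BY video.id
        """,
        (course_id,),
    )
    return [row[0] for row in rows]
```

Keeping one table per level means a query such as `videos_for_course` can reach any video from its course through plain joins.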

The TF-IDF calculations and cache-like structures that the search engine requires are stored in Redis.
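
As a rough illustration of the TF-IDF scoring mentioned above, the sketch below builds a term-to-document weight index and ranks documents for a query. It is not the project's implementation: the function names are hypothetical, tokenization is a naive whitespace split, a plain dictionary stands in for Redis, and the code is written for Python 3.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_tfidf_index(docs):
    """Map each term to {doc_id: tf-idf weight}.

    docs is a dict of doc_id -> text. Here tf is the term count divided by
    the document length, and idf is log(N / document frequency).
    """
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[term])
            index[term][doc_id] = tf * idf
    return index


def search(index, query):
    """Rank documents by the sum of the tf-idf weights of the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return sorted(scores, key=scores.get, reverse=True)
```

One plausible way to keep such an index in Redis is a sorted set per term, with document IDs as members and weights as scores, so the top documents for a term can be read back in score order. That mapping is an assumption, not something the project documents.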

A simple web interface is built with Flask and jQuery.