Cloujera

Cloujera lets you run fine-grained searches for spoken words in Coursera's videos. It does this by performing full-text searches on the video transcripts.

Local Setup

  1. Bring up Vagrant (elasticsearch + redis): vagrant up

  2. Compile the ClojureScript (make sure you have Java > 1.7): lein cljsbuild once

  3. Start the app: lein run

  4. On the first run, visit http://127.0.0.1:8080/burglar/go to seed the db (if you skip this, it will error out with an IndexMissingException from elasticsearch).
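The seeding step can also be scripted instead of done in a browser. The following is a minimal sketch, assuming that visiting the URL amounts to a plain HTTP GET and that the app listens on the address from step 4; the function names `build_seed_request` and `seed` are hypothetical helpers, not part of cloujera.

```python
import urllib.error
import urllib.request


def build_seed_request(base_url: str = "http://127.0.0.1:8080") -> urllib.request.Request:
    """Build (but do not send) the GET request for the seeding endpoint."""
    # /burglar/go is the path given in the setup steps; base_url is an assumption
    # you can override, e.g. when the app is reached through a forwarded port.
    return urllib.request.Request(base_url.rstrip("/") + "/burglar/go", method="GET")


def seed(base_url: str = "http://127.0.0.1:8080", timeout: float = 300.0) -> int:
    """Send the seeding request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(build_seed_request(base_url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
```

Calling `seed()` once after `lein run` has started should be equivalent to opening the URL by hand. The long default timeout is a guess, since scraping may take a while.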

Testing uberjar inside Vagrant

$ vagrant ssh
$ cd /vagrant
$ source ./scripts/prod-env.sh
$ ./scripts/deploy.sh

NOTE: the uberjarred cloujera runs on port 8080 inside the VM; from the host, access it at http://127.0.0.1:8081 (see Vagrantfile).

FIXME: sourcing prod-env.sh is just a temporary fix while we move to docker.

Scraping courses

Visiting http://cloujera.whatever/burglar/go scrapes about 10 courses to get you started.

To scrape another course, you need to:

  1. Visit the Coursera session API at https://api.coursera.org/api/catalog.v1/sessions and choose a course

  2. Manually sign up for the course and agree to the honor code as the vise890+cloujera@gmail.com user

  3. Find the video lecture URL (videoLecturesURL)

  4. Send an HTTP POST to http://cloujera.whatever/burglar/raid with this JSON payload:

    { "url": videoLecturesURL }
    

    For example:

    { "url": "https://class.coursera.org/apcalcpart1-001/lecture" }
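Step 4 can be sketched in code as follows. This is an illustration under stated assumptions: `cloujera.whatever` is the placeholder host used above, the `Content-Type: application/json` header is an assumption (the steps only say the payload is JSON), and `build_raid_request` and `raid` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request


def build_raid_request(base_url: str, video_lectures_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks cloujera to scrape a course."""
    payload = json.dumps({"url": video_lectures_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/burglar/raid",
        data=payload,
        # Assumption: the endpoint expects the JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def raid(base_url: str, video_lectures_url: str, timeout: float = 60.0) -> int:
    """Send the scrape request and return the HTTP status code."""
    request = build_raid_request(base_url, video_lectures_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

For the example course above, the call would be `raid("http://cloujera.whatever", "https://class.coursera.org/apcalcpart1-001/lecture")`.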
    

Deployment

Provisioning (The first time)

$ ssh user@cloudmachine
$ git clone https://github.com/vise890/cloujera
$ cd cloujera
$ sudo ./scripts/provision.sh

(Re)-Deploying cloujera

# in the cloujera directory...
$ source ./scripts/prod-env.sh
$ ./scripts/deploy.sh

NOTE: deploy.sh pulls the most recent version of cloujera from the repo

Troubleshooting

Ensure that all the containers are running in the VM:

$ vagrant ssh
$ sudo docker ps -a

You should see the redis and elasticsearch containers listed as running.
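The same check can be automated. The sketch below assumes a Docker version whose `docker ps` supports `--format`, and it assumes the container names contain the substrings `redis` and `elasticsearch`; the real names depend on how the containers were created, so treat `REQUIRED` and both function names as hypothetical.

```python
import subprocess
from typing import List, Sequence

# Assumed name fragments; adjust to the names shown by `docker ps` in your VM.
REQUIRED = ("redis", "elasticsearch")


def missing_services(names_output: str, required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required services with no matching running container name."""
    names = [line.strip() for line in names_output.splitlines() if line.strip()]
    return [svc for svc in required if not any(svc in name for name in names)]


def running_container_names() -> str:
    """Return one container name per line for running containers only."""
    # No -a here: stopped containers should not count as running.
    result = subprocess.run(
        ["sudo", "docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Inside the VM, `missing_services(running_container_names())` returns an empty list when both services have a running container.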

Checking the cloujera logs

$ cat cloujera.log

Checking elasticsearch health

Visit http://localhost:9200/; you should see status: 200
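A scripted version of this check might look like the sketch below. It assumes the root endpoint replies with JSON carrying a top-level `status` field, as described above; if the field is absent, it falls back to the HTTP status code alone. `parse_es_status` and `elasticsearch_ok` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional


def parse_es_status(body: bytes) -> Optional[int]:
    """Extract the top-level "status" field from the root endpoint's JSON reply."""
    status = json.loads(body).get("status")
    return status if isinstance(status, int) else None


def elasticsearch_ok(url: str = "http://localhost:9200/", timeout: float = 5.0) -> bool:
    """Return True if elasticsearch answers and reports a healthy status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Accept a missing "status" field as long as the HTTP status is 200.
            return resp.status == 200 and parse_es_status(resp.read()) in (200, None)
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors; ValueError covers bad JSON.
        return False
```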

Checking that redis is running

redis-cli will drop you into a redis shell. Some useful commands are: INFO, MONITOR, HELP, HELP @server.

NOTE: this works from the host as well as in the Vagrant VM
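For a non-interactive check, a PING can be sent over a plain socket, since redis accepts inline commands and answers PING with `+PONG`. This is a minimal sketch, assuming redis listens on its default port 6379 on localhost without authentication; `is_pong` and `redis_alive` are hypothetical helper names.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """Return True if the raw reply is the simple-string answer to PING."""
    return reply.startswith(b"+PONG")


def redis_alive(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING and report whether redis answered with PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return is_pong(conn.recv(64))
    except OSError:
        # Connection refused, timeout, or other socket-level failure.
        return False
```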

BUGS

  • lein run doesn't give any output initially
  • lein run doesn't reload
