#phpGlluchEdX
phpGlluchEdX is a collection of scripts that extract metadata from a set of edX courses.
Tested in November 2015.
FreeLing must be running as a server. Edit normalize.php and set your server's IP address and port.
Mirror the whole site, for example with HTTrack.
In the mirrored copy, go to www.edx.org/course and copy everything in it to the courses directory of this project.
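
The copy step can be sketched as a small shell helper. This is only an illustration: the function name `copy_courses`, the directory name `edx_mirror`, and the mirror layout described in the comment are assumptions, so check where your mirroring tool actually put the files before relying on it.

```shell
#!/bin/sh
# Assumed layout: HTTrack writes the mirror to <output dir>/<host>/..., so
#   httrack "https://www.edx.org/course" -O edx_mirror
# would leave the course pages under edx_mirror/www.edx.org/course.

# copy_courses SRC DEST: copy the contents of SRC into DEST (hypothetical helper).
copy_courses() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "Mirror directory not found: $src" >&2
        return 1
    fi
    # "$src"/. copies the directory contents, not the directory itself.
    mkdir -p "$dest" && cp -R "$src"/. "$dest"/
}

# Example call (paths are placeholders):
# copy_courses edx_mirror/www.edx.org/course courses
```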
These files must be run with the PHP CLI, in this order:
- php description.php. Retrieves information from edX courses.
- php pre_clean.php. Deletes some irrelevant text.
- php lang_ordering.php. Puts each course in a directory named after its language. Results are in json0/en/ or json0/es/
- php normalize.php. Adds POS tagging information.
- php clean.php. Removes punctuation from the POS tagging.
The course information ends up in the json2 directory (POS-tagged results) or in json0/en/ and json0/es/ (without POS information).
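
The five steps above can be chained in a small wrapper so that a failing step stops the run. This is a minimal sketch, not part of the project: the function name `run_pipeline` and the `PHP_BIN` variable are assumptions, and it expects to be run from the directory that contains the scripts.

```shell
#!/bin/sh
# PHP_BIN is a hypothetical override for the PHP interpreter; defaults to "php".
PHP_BIN="${PHP_BIN:-php}"

# Script names in the order given in this README.
STEPS="description.php pre_clean.php lang_ordering.php normalize.php clean.php"

# Run each step in order and stop at the first one that fails.
run_pipeline() {
    for step in $STEPS; do
        echo "Running $step"
        "$PHP_BIN" "$step" || { echo "Step $step failed" >&2; return 1; }
    done
}

# Only start when the scripts are in the current directory.
if [ -f description.php ]; then
    run_pipeline
else
    echo "description.php not found; run this from the project directory" >&2
fi
```

Remember that normalize.php needs the FreeLing server to be reachable, so start it before running the wrapper.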
phpGlluchCoursera
phpGlluchCourseTalk
phpGlluchMiriadaX