diff --git a/AUTHORS.rst b/AUTHORS.rst index 98edf9b2..983997b0 100644 --- a/AUTHORS.rst +++ b/AUTHORS.rst @@ -2,12 +2,9 @@ Credits ======= -Development Lead ----------------- - * Jordan Yoshihara - -Contributors ------------- - * Aron Asor +* Jamie Alexandre +* Benjamin Bach +* Ivan Savov +* David Hu diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index be966108..62a9ec83 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -4,93 +4,130 @@ Contributing ============ -Contributions are welcome, and they are greatly appreciated! Every -little bit helps, and credit will always be given. +Contributions to this project are welcome and are in fact greatly appreciated! +Every little bit helps and credit will always be given. Whether you're a junior +Python programmer looking for an open source project to contribute to, or an advanced +programmer who can help us make `ricecooker` more efficient, we'd love to hear +from you. We've outlined below some of the ways you can contribute. + -You can contribute in many ways: Types of Contributions ---------------------- Report Bugs ~~~~~~~~~~~ - -Report bugs at https://github.com/learningequality/ricecooker/issues. +Report bugs at https://github.com/learningequality/ricecooker/issues . If you are reporting a bug, please include: -* Your operating system name and version. +* Which version of `ricecooker` you're using. +* Which operating system you're using (name and version). * Any details about your local setup that might be helpful in troubleshooting. * Detailed steps to reproduce the bug. + Fix Bugs ~~~~~~~~ - Look through the GitHub issues for bugs. Anything tagged with "bug" -and "help wanted" is open to whoever wants to implement it. +and "help wanted" is fair game for community contributors. + Implement Features ~~~~~~~~~~~~~~~~~~ - Look through the GitHub issues for features. Anything tagged with "enhancement" and "help wanted" is open to whoever wants to implement it. 
+ Write Documentation ~~~~~~~~~~~~~~~~~~~ +The `ricecooker` library can always use more documentation. You can contribute +fixes and improvements to the official `ricecooker` docs, add docstrings to code, +or write a blog post or article and share your experience using `ricecooker`. -ricecooker could always use more documentation, whether as part of the -official ricecooker docs, in docstrings, or even on the web in blog posts, -articles, and such. Submit Feedback ~~~~~~~~~~~~~~~ +The best way to send us your feedback is to file an issue at +https://github.com/learningequality/ricecooker/issues . -The best way to send feedback is to file an issue at https://github.com/learningequality/ricecooker/issues. - -If you are proposing a feature: +If you are proposing a new feature: * Explain in detail how it would work. -* Keep the scope as narrow as possible, to make it easier to implement. -* Remember that this is a volunteer-driven project, and that contributions - are welcome :) +* Try to keep the scope as narrow as possible to make it easier to implement. +* Remember this is a volunteer-driven project, and contributions are welcome :) + + + +Getting Started! +---------------- -Get Started! ------------- +Ready to contribute? In order to work on the `ricecooker` code you'll first need +to make sure you have `Python 3 <https://www.python.org/downloads/>`_ on your computer. +You'll also need to install the Python package `pip <https://pypi.python.org/pypi/pip>`_ +if you don't have it already. -Ready to contribute? Here's how to set up `ricecooker` for local development. +Here are the steps for setting up `ricecooker` for local development: 1. Fork the `ricecooker` repo on GitHub. -2. Clone your fork locally:: +2. Clone your fork of the repository locally, and go into the `ricecooker` directory:: + + git clone git@github.com:/ricecooker.git + cd ricecooker/ + +3. 
Create a Python virtual environment for this project (optional, but recommended): + + * Install the `virtualenv` package using the command `pip install virtualenv`. + + * The next steps depend on whether you're using a UNIX system (Mac/Linux) or Windows: + * For UNIX operating systems: + * Create a virtual env called `venv` in the current directory using the + command: `virtualenv -p python3 venv` + * Activate the virtualenv called `venv` by running: `source venv/bin/activate`. + Your command prompt will change to indicate you're working inside `venv`. + * For Windows systems: + * Create a virtual env called `venv` in the current directory using the + following command: `virtualenv -p C:/Python36/python.exe venv`. + You may need to adjust the `-p` argument depending on where your version + of Python is located. Note you'll need Python version 3.4 or higher. + * Activate the virtualenv called `venv` by running: `.\venv\Scripts\activate` -3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:: +4. Install the `ricecooker` code in the virtual environment using this command:: - $ mkvirtualenv ricecooker - $ cd ricecooker/ - $ python setup.py develop + pip install -e . -4. Create a branch for local development:: - $ git checkout -b name-of-your-bugfix-or-feature +5. Create a branch for local development:: + + git checkout -b name-of-your-bugfix-or-feature Now you can make your changes locally. -5. When you're done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:: - $ flake8 ricecooker tests - $ python setup.py test or py.test - $ tox +6. 
When you're done making changes, check that your changes pass flake8 linter rules + and the `ricecooker` test suite, including testing other Python versions with tox:: + + flake8 ricecooker tests + pytest + tox + + To get `flake8` and `tox`, just `pip`-install them into your virtualenv. + + +7. Commit your changes and push your branch to GitHub:: + + git add . + git commit -m "A detailed description of your changes." + git push origin name-of-your-bugfix-or-feature + + +8. Open a pull request through the GitHub web interface. + + - To get flake8 and tox, just pip install them into your virtualenv. -6. Commit your changes and push your branch to GitHub:: - $ git add . - $ git commit -m "Your detailed description of your changes." - $ git push origin name-of-your-bugfix-or-feature -7. Submit a pull request through the GitHub website. Pull Request Guidelines ----------------------- @@ -100,15 +137,17 @@ Before you submit a pull request, check that it meets these guidelines: 1. The pull request should include tests. 2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the - feature to the list in README.md. -3. The pull request should work for Python 3.4 and 3.5, and for PyPy. Check + feature to the list in `README.md`. +3. The pull request should work for Python 3.4, 3.5. Check https://travis-ci.org/learningequality/ricecooker/pull_requests and make sure that the tests pass for all supported Python versions. + + + Tips ---- -To run a subset of tests:: - -$ py.test tests.test_ricecooker +To run a subset of tests, you can specify a particular module name:: +$ py.test tests.test_licenses diff --git a/LICENSE b/LICENSE index 34749b5e..ce4f7dfb 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2016 Learning Equality +Copyright (c) 2016, 2017, 2018 Learning Equality. 
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/Makefile b/Makefile index 0b06d622..ac836834 100644 --- a/Makefile +++ b/Makefile @@ -66,6 +66,9 @@ coverage: ## check code coverage quickly with the default Python docs: ## generate Sphinx HTML documentation, including API docs pip install sphinx recommonmark nbsphinx ipython + pandoc -f gfm README.md -t rst -o docs/README.rst + sed -i '' 's/docs\///g' docs/README.rst + sed -i '' 's/\.md/\.html/g' docs/README.rst rm -f docs/ricecooker.rst rm -f docs/modules.rst rm -f docs/ricecooker.classes.rst @@ -74,7 +77,7 @@ docs: ## generate Sphinx HTML documentation, including API docs sphinx-apidoc -o docs/ ricecooker $(MAKE) -C docs clean $(MAKE) -C docs html - # $(MAKE) -C docs latex + #$(MAKE) -C docs latex $(BROWSER) docs/build/html/index.html servedocs: docs ## compile the docs watching for changes diff --git a/README.md b/README.md index 26386bf4..cde86806 100644 --- a/README.md +++ b/README.md @@ -1,372 +1,184 @@ -# Rice Cooker +ricecooker +========== +The `ricecooker` library is a framework for creating Kolibri content channels and +uploading them to [Kolibri Studio](https://studio.learningequality.org/), which +is the central content server that [Kolibri](http://learningequality.org/kolibri/) +applications talk to when they import content. + +The Kolibri content pipeline is pictured below: + +![The Kolibri Content Pipeline](docs/figures/content_pipeline_diagram.png) + +This `ricecooker` framework is the "main actor" in the first part of the content +pipeline, and touches all aspects of the pipeline within the region highlighted +in blue in the above diagram. 
+ + +Before we continue, let's have some definitions: + - A **Kolibri channel** is a tree-like data structure that consists of the following content nodes: + - Topic nodes (folders) + - Content types: + - Document (PDF files) + - Audio (mp3 files) + - Video (mp4 files) + - HTML5App zip files (generic container for web content: HTML+JS+CSS) + - Exercises + - A **sushi chef** is a Python script that uses the `ricecooker` library to + import content from various sources, organize content into Kolibri channels + and upload the channel to Kolibri Studio. + + + +## Overview + +Use the following shortcuts to jump to the most relevant parts of the `ricecooker` +documentation depending on your role: + + - **Content specialists and Administrators** can read the non-technical part + of the documentation to learn about how content works in the Kolibri platform. + - The best place to start is the [Kolibri Platform overview](platform/README.md). + - Read more about the supported [content types here](platform/content_types.md) + - Content curators can consult [this document](https://docs.google.com/document/d/1slwoNT90Wqu0Rr8MJMAEsA-9LWLRvSeOgdg9u7HrZB8/edit?usp=sharing) + for information about how to prepare "spec sheets" that guide developers on how + to import content into the Kolibri ecosystem. + - Of particular interest to non-technical users is the [CSV workflow](docs/csv_exercises.md) + for managing channel metadata as spreadsheets. + + + - **Chef authors** can read the remainder of this README, and get started using + the `ricecooker` library by following these first steps: + - [Quickstart](docs/tutorial/quickstart.ipynb), which will introduce you to + the steps needed to create a sushi chef script. + - After the quickstart, you should be ready to take things into your own + hands, and complete all steps in the [ricecooker tutorial](https://gist.github.com/jayoshih/6678546d2a2fa3e7f04fc9090d81aff6). 
+ - The next step after that is to read the [ricecooker usage docs](docs/usage.md), + which are also available as Jupyter notebooks under [docs/tutorial/](docs/tutorial/). + More detailed technical documentation is available on the following topics: + - [Installation](docs/installation.md) + - [Content Nodes](docs/nodes.md) + - [File types](docs/files.md) + - [Exercises](docs/exercises.md) + - [HTML5 apps](docs/htmlapps.md) + - [Parsing HTML](docs/parsing_html.md) + - [Running chef scripts](chefops.md) to learn about the command line args + for controlling chef operation, managing caches, and other options. + - [Sushi chef style guide](https://docs.google.com/document/d/1_Wh7IxPmFScQSuIb9k58XXMbXeSM0ZQLkoXFnzKyi_s/edit) + + + - **Ricecooker developers** should read all the documentation for chef authors, + and also consult the docs in the [developer/](docs/developer) folder for + additional information about the "behind the scenes" work needed to + support the Kolibri content pipeline: + - [Running chef scripts](chefops.md), also known as **chefops**. + - [Running chef scripts in daemon mode](developer/daemonization.md) + - [Managing the content pipeline](developer/sushops.md), also known as **sushops**. -A framework for creating channels on [Kolibri Studio](https://studio.learningequality.org/). ## Installation -* [Install ffmpeg](https://ffmpeg.org/) if you don't have it already. +We'll assume you have a Python 3 installation on your computer and are familiar +with best practices for working with Python code (e.g. `virtualenv` or `pipenv`). +If this is not the case, you can consult the Kolibri developer docs as a guide for +[setting up a Python virtualenv](http://kolibri-dev.readthedocs.io/en/latest/start/getting_started.html#virtual-environment). -* [Install pip](https://pypi.python.org/pypi/pip) if you don't have it already. 
+The `ricecooker` library is a standard Python library distributed through PyPI: + - Run `pip install ricecooker` to install it. + You can then use `import ricecooker` in your chef script. + - Some of the functions in `ricecooker.utils` require additional software: + - Make sure you install the command line tool [ffmpeg](https://ffmpeg.org/) + - Running JavaScript code while scraping webpages requires the PhantomJS browser. + You can run `npm install phantomjs-prebuilt` in your chef's working directory. -* Run `pip install ricecooker` +For more details and install options, see [docs/installation.md](docs/installation.md). -* You can now reference ricecooker using `import ricecooker` in your .py files -## Using the Rice Cooker +## Simple chef example -The rice cooker is a framework you can use to translate content into Kolibri-compatible objects. -The following steps will guide you through the creation of a program, or sushi chef, -that uses the `ricecooker` framework. -A sample sushi chef has been created [here](https://github.com/learningequality/ricecooker/blob/master/examples/sample_program.py). - - -### Step 1: Obtaining an Authorization Token ### -You will need an authorization token to create a channel on Kolibri Studio. In order to obtain one: - -1. Create an account on [Kolibri Studio](https://studio.learningequality.org/). -2. Navigate to the Tokens tab under your Settings page. -3. Copy the given authorization token. -4. Set `token="auth-token"` in your call to uploadchannel (alternatively, you can create a file with your - authorization token and set `token="path/to/file.txt"`). - - - -### Step 2: Creating a Sushi Chef class ### - -To use the Ricecooker, your chef script must define a sushi chef class that is a -subclass of the class `ricecooker.chefs.SushiChef`. Since it inheriting from the -`SushiChef` class, your chef class will have the method `run` which performs all -the work of uploading your channel to the content curation server. 
-Your sushi chef class will also inherit the method `main`, which your sushi chef -script should call when it runs on the command line. - -The sushi chef class for your channel must have the following attributes: - - - `channel_info` (dict) that looks like this: - - channel_info = { - 'CHANNEL_SOURCE_DOMAIN': '', # who is providing the content (e.g. learningequality.org) - 'CHANNEL_SOURCE_ID': '', # channel's unique id - 'CHANNEL_TITLE': 'Channel name shown in UI', - 'CHANNEL_LANGUAGE': 'en', # Use language codes from le_utils - 'CHANNEL_THUMBNAIL': 'http://yourdomain.org/img/logo.jpg', # (optional) local path or url to image file - 'CHANNEL_DESCRIPTION': 'What is this channel about?', # (optional) description of the channel (optional) - } - - - `construct_channel(**kwargs) -> ChannelNode`: This method is responsible for - building the structure of your channel (to be discussed below). - -To write the `construct_channel` method of your chef class, start by importing -`ChannelNode` from `ricecooker.classes.nodes` and create a `ChannelNode` using -the data in `self.channel_info`. Once you have the `ChannelNode` instance, the -rest of your chef's `construct_channel` method is responsible for constructing -the channel by adding various `Node`s using the method `add_child`. -`TopicNode`s correspond to folders, while `ContentNode`s correspond to different -type of content nodes. - -`ContentNode` objects (and subclasses like `VideoNode`, `AudioNode`, ...) store -the metadata associate with the content, and are associated with one or more -`File` objects (`VideoFile`, `AudioFile`, ...). - -For example, here is a simple sushi chef class whose `construct_channel` builds -a tree with a single topic. +This is a sushi chef script that uses the `ricecooker` library to create a Kolibri +channel with a single topic node (Folder), and puts a single PDF content node inside that folder. 
``` +#!/usr/bin/env python from ricecooker.chefs import SushiChef -from ricecooker.classes.nodes import ChannelNode, TopicNode +from ricecooker.classes.nodes import ChannelNode, TopicNode, DocumentNode +from ricecooker.classes.files import DocumentFile +from ricecooker.classes.licenses import get_license -class MySushiChef(SushiChef): - """ - This is my sushi chef... - """ + +class SimpleChef(SushiChef): channel_info = { - 'CHANNEL_SOURCE_DOMAIN': '', # make sure to change this when testing - 'CHANNEL_SOURCE_ID': '', # channel's unique id - 'CHANNEL_TITLE': 'Channel name shown in UI', - 'CHANNEL_THUMBNAIL': 'http://yourdomain.org/img/logo.jpg', # (optional) local path or url to image file - 'CHANNEL_DESCRIPTION': 'What is this channel about?', # (optional) description of the channel (optional) + 'CHANNEL_TITLE': 'Potatoes info channel', + 'CHANNEL_SOURCE_DOMAIN': '', # where you got the content (change me!!) + 'CHANNEL_SOURCE_ID': '', # channel's unique id (change me!!) + 'CHANNEL_LANGUAGE': 'en', # le_utils language code + 'CHANNEL_THUMBNAIL': 'https://upload.wikimedia.org/wikipedia/commons/b/b7/A_Grande_Batata.jpg', # (optional) + 'CHANNEL_DESCRIPTION': 'What is this channel about?', # (optional) } def construct_channel(self, **kwargs): - # create channel channel = self.get_channel(**kwargs) - # create a topic and add it to channel - potato_topic = TopicNode(source_id="", title="Potatoes!") + potato_topic = TopicNode(title="Potatoes!", source_id="") channel.add_child(potato_topic) + doc_node = DocumentNode( + title='Growing potatoes', + description='An article about growing potatoes on your rooftop.', + source_id='pubs/mafri-potatoe', + license=get_license('CC BY', copyright_holder='University of Alberta'), + language='en', + files=[DocumentFile(path='https://www.gov.mb.ca/inr/pdf/pubs/mafri-potatoe.pdf', + language='en')], + ) + potato_topic.add_child(doc_node) return channel -``` - -You can now run of you chef by creating an instance of the chef class and 
calling -it's `run` method: - - -``` -mychef = MySushiChef() -args = {'token': 'YOURTOKENHERE9139139f3a23232', 'reset': True, 'verbose': True} -options = {} -mychef.run(args, options) -``` - -Note: Normally you'll pass `args` and `options` on the command line, but you can -pass dict objects with the necessary parameters for testing. - -If you get an error, make sure you've replaced `YOURTOKENHERE9139139f3a23232` by -the token you obtained from the content curation server and you've changed -`channel_info['CHANNEL_SOURCE_DOMAIN']` and/or `channel_info['CHANNEL_SOURCE_ID']` -instead of using the default values. - -If the channel run was successful, you should be able to see your single-topic -channel on the content curation server. The topic node "Potatoes!" is nice to -look at, but it feels kind of empty. Let's add more nodes to it! - - -### Step 3: Creating Nodes ### - -Once your channel is created, you can start adding nodes. To do this, you need to -convert your data to the rice cooker's objects. Here are the classes that are -available to you (import from `ricecooker.classes.nodes`): - - - __TopicNode__: folders to organize to the channel's content - - __VideoNode__: content containing mp4 file - - __AudioNode__: content containing mp3 file - - __DocumentNode__: content containing pdf file - - __HTML5AppNode__: content containing zip of html files (html, js, css, etc.) 
- - __ExerciseNode__: assessment-based content with questions - - -Each node has the following attributes: - - - __source_id__ (str): content's original id - - __title__ (str): content's title - - __license__ (str or License): content's license id or object - - __description__ (str): description of content (optional) - - __author__ (str): who created the content (optional) - - __thumbnail__ (str or ThumbnailFile): path to thumbnail or file object (optional) - - __files__ ([FileObject]): list of file objects for node (optional) - - __extra_fields__ (dict): any additional data needed for node (optional) - - __domain_ns__ (uuid): who is providing the content (e.g. learningequality.org) (optional) - -**IMPORTANT**: nodes representing distinct pieces of content MUST have distinct `source_id`s. -Each node has a `content_id` (computed as a function of the `source_domain` and the node's `source_id`) that uniquely identifies a piece of content within Kolibri for progress tracking purposes. For example, if the same video occurs in multiple places in the tree, you would use the same `source_id` for those nodes -- but content nodes that aren't for that video need to have different `source_id`s. - -All non-topic nodes must be assigned a license upon initialization. You can use the license's id (found under `le_utils.constants.licenses`) or create a license object from `ricecooker.classes.licenses` (recommended). When initializing a license object, you can specify a __copyright_holder__ (str), or the person or organization who owns the license. If you are unsure which license class to use, a `get_license` method has been provided that takes in a license id and returns a corresponding license object. - -For example: -``` -from ricecooker.classes.licenses import get_license -from le_utils.constants import licenses - -node = VideoNode( - license = get_license(licenses.CC_BY, copyright_holder="Khan Academy"), - ... 
-) -``` - -Thumbnails can also be passed in as a path to an image (str) or a ThumbnailFile object. Files can be passed in upon initialization, but can also be added at a later time. More details about how to create a file object can be found in the next section. VideoNodes also have a __derive_thumbnail__ (boolean) argument, which will automatically extract a thumbnail from the video if no thumbnails are provided. - -Once you have created the node, add it to a parent node with `parent_node.add_child(child_node)` - - - -### Step 4a: Adding Files ### - -To add a file to your node, you must start by creating a file object from `ricecooker.classes.files`. Your sushi chef is responsible for determining which file object to create. Here are the available file models: - - __ThumbnailFile__: png or jpg files to add to any kind of node - - __AudioFile__: mp3 file - - __DocumentFile__: pdf file - - __HTMLZipFile__: zip of html files (must have `index.html` file at topmost level) - - __VideoFile__: mp4 file (can be high resolution or low resolution) - - __SubtitleFile__: vtt files to be used with VideoFiles - - __WebVideoFile__: video downloaded from site such as YouTube or Vimeo - - __YouTubeVideoFile__: video downloaded from YouTube using a youtube video id - - -Each file class can be passed a __preset__ and __language__ at initialization (SubtitleFiles must have a language set at initialization). A preset determines what kind of file the object is (e.g. high resolution video vs. low resolution video). A list of available presets can be found at `le_utils.constants.format_presets`. A list of available languages can be found at `le_utils.constants.languages`. - -ThumbnailFiles, AudioFiles, DocumentFiles, HTMLZipFiles, VideoFiles, and SubtitleFiles must be initialized with a __path__ (str). This path can be a url or a local path to a file. 
-``` -from le_utils.constants import languages - -file_object = SubtitleFile( - path = "file:///path/to/file.vtt", - language = languages.getlang('en').code, - ... -) -``` - -VideoFiles can also be initialized with __ffmpeg_settings__ (dict), which will be used to determine compression settings for the video file. -``` -file_object = VideoFile( - path = "file:///path/to/file.mp3", - ffmpeg_settings = {"max_width": 480, "crf": 20}, - ... -) -``` - -WebVideoFiles must be given a __web_url__ (str) to a video on YouTube or Vimeo, and YouTubeVideoFiles must be given a __youtube_id__ (str). WebVideoFiles and YouTubeVideoFiles can also take in __download_settings__ (dict) to determine how the video will be downloaded and __high_resolution__ (boolean) to determine what resolution to download. -``` -file_object = WebVideoFile( - web_url = "https://vimeo.com/video-id", - ... -) - -file_object = YouTubeVideoFile( - youtube_id = "abcdef", - ... -) -``` - - - -### Step 4b: Adding Exercises ### - -ExerciseNodes are special objects that have questions used for assessment. To add a question to your exercise, you must first create a question model from `ricecooker.classes.questions`. Your sushi chef is responsible for determining which question type to create. Here are the available question types: - - - __PerseusQuestion__: special question type for pre-formatted perseus questions - - __MultipleSelectQuestion__: questions that have multiple correct answers (e.g. check all that apply) - - __SingleSelectQuestion__: questions that only have one right answer (e.g. radio button questions) - - __InputQuestion__: questions that have text-based answers (e.g. fill in the blank) - - -Each question class has the following attributes that can be set at initialization: - - - __id__ (str): question's unique id - - __question__ (str): question body, in plaintext or Markdown format; math expressions must be in Latex format, surrounded by `$`, e.g. `$ f(x) = 2 ^ 3 $`. 
- - __answers__ ([{'answer':str, 'correct':bool}]): answers to question, also in plaintext or Markdown - - __hints__ (str or [str]): optional hints on how to answer question, also in plaintext or Markdown - - -To set the correct answer(s) for MultipleSelectQuestions, you must provide a list of all of the possible choices as well as an array of the correct answers (`all_answers [str]`) and `correct_answers [str]` respectively). -``` -question = MultipleSelectQuestion( - question = "Select all prime numbers.", - correct_answers = ["2", "3", "5"], - all_answers = ["1", "2", "3", "4", "5"], - ... -) -``` - -To set the correct answer(s) for SingleSelectQuestions, you must provide a list of all possible choices as well as the correct answer (`all_answers [str]` and `correct_answer str` respectively). -``` -question = SingleSelectQuestion( - question = "What is 2 x 3?", - correct_answer = "6", - all_answers = ["2", "3", "5", "6"], - ... -) -``` - -To set the correct answer(s) for InputQuestions, you must provide an array of all of the accepted answers (`answers [str]`). -``` -question = InputQuestion( - question = "Name a factor of 10.", - answers = ["1", "2", "5", "10"], -) -``` - -To add images to a question's question, answers, or hints, format the image path with `'![](path/to/some/file.png)'` and the rice cooker will parse them automatically. - - -In order to set the criteria for completing exercises, you must set __exercise_data__ to equal a dict containing a mastery_model field based on the mastery models provided under `le_utils.constants.exercises`. If no data is provided, the rice cooker will default to mastery at 3 of 5 correct. For example: -``` -node = ExerciseNode( - exercise_data={ - 'mastery_model': exercises.M_OF_N, - 'randomize': True, - 'm': 3, - 'n': 5, - }, - ... 
-) +if __name__ == '__main__': + """ + Run this script on the command line using: + python simple_chef.py -v --reset --token=YOURTOKENHERE9139139f3a23232 + """ + simple_chef = SimpleChef() + simple_chef.main() ``` -Once you have created the appropriate question object, add it to an exercise object with `exercise_node.add_question(question)` - - - -### Step 5: Running your chef script ### - -Your sushi chef scripts will run as standalone command line application -`mychef.py` which you can call from the command line. - -To make the script file `mychef.py` a command line program, you need to do three things: - - - Add the line `#!/usr/bin/env python` as the first line of `mychef.py` - - Add this code block at the bottom of `mychef.py`: - - if __name__ == '__main__': - chef = MySushiChef() - chef.main() - - - Make the file `mychef.py` executable by running `chmod +x mychef.py` on the - command line. - -The final chef script file `mychef.py` should look like this: +Let's assume the above code snippet is saved as the file `simple_chef.py`. - #!/usr/bin/env python - ... - ... - class MySushiChef(SushiChef): - channel_info = { ... } - def construct_channel(**kwargs): - ... - ... - ... - ... - if __name__ == '__main__': - chef = MySushiChef() - chef.main() +You can run the chef script by passing the appropriate command line arguments: -You can now call the script by passing the appropriate command line arguments: + python simple_chef.py -v --reset --token=YOURTOKENHERE9139139f3a23232 - ./mychef.py -v --token=YOURTOKENHERE9139139f3a23232 --reset +The most important argument when running a chef script is `--token` which is used +to pass in the Studio Access Token which you can obtain from your profile's +[settings page](http://studio.learningequality.org/settings/tokens). -To see the help menu, type +The flags `-v` (verbose) and `--reset` are generally useful in development. 
+These make sure the chef script will start the process from scratch and displays +useful debugging information on the command line. - ./mychef.py -h +To see all the `ricecooker` command line options, run `python simple_chef.py -h`. +For more details about running chef scripts see [the chefops page](docs/chefops.md). -Here the full list of the supported command line args: +If you get an error when running the chef, make sure you've replaced +`YOURTOKENHERE9139139f3a23232` by the token you obtained from Studio. +Also make sure you've changed the value of `channel_info['CHANNEL_SOURCE_DOMAIN']` +and `channel_info['CHANNEL_SOURCE_ID']` instead of using the default values. - - `-h` (help) will print how to use the rice cooker - - `-v` (verbose) will print what the rice cooker is doing - - `-u` (update) will force the ricecooker to redownload all files (skip checking the cache) - - `--download-attempts=3` will set the maximum number of times to retry downloading files - - `--debug` will print out debugging statements during rice cooking session - - `--warn` will print out warnings during rice cooking session - - `--quiet` will print out errors during rice cooking session - - `--compress` will compress your high resolution videos to save space - - `--token` will authorize you to create your channel (obtained in Step 1) - - `--resume` will resume your previous rice cooking session - - `--step=LAST` will specify at which step to resume your session - - `--reset` will automatically start the rice cooker from the beginning - - `--prompt` will prompt you to open your channel once it's been uploaded - - `--publish` will automatically publish your channel once it's been uploaded - - `--daemon` will start the chef in daemon mode (i.e. 
the chef will not execute - immediately; instead, it will wait to receive commands via the Sushi Bar) - - `[OPTIONS]` any additional key=value options you would like to pass to your construct_channel method +## Next steps -### Optional: Resuming the Rice Cooker ### + - See the [usage docs](docs/usage.md) for more explanations about the above code. + - See [nodes](docs/nodes.md) to learn how to create different content node types. + - See [file](docs/files.md) to learn about the file types supported, and how to create them. -If your rice cooking session gets interrupted, you can resume from any step that -has already completed using `--resume --step=` option. If step is not specified, -the rice cooker will resume from the last step you ran. If the specified step has -not been reached, the rice cooker will resume from. Other choices for `--step`: - - __LAST__: Resume where the session left off (default) - - __INIT__: Resume at beginning of session - - __CONSTRUCT_CHANNEL__: Resume with call to construct channel - - __CREATE_TREE__: Resume at set tree relationships - - __DOWNLOAD_FILES__: Resume at beginning of download process - - __GET_FILE_DIFF__: Resume at call to get file diff from Kolibri Studio - - __START_UPLOAD__: Resume at beginning of uploading files to Kolibri Studio - - __UPLOADING_FILES__: Resume at last upload request - - __UPLOAD_CHANNEL__: Resume at beginning of uploading tree to Kolibri Studio - - __PUBLISH_CHANNEL__: Resume at option to publish channel - - __DONE__: Resume at prompt to open channel +## Further reading + - Read the [Kolibri Studio docs](http://kolibri-studio.readthedocs.io/en/latest/) + to learn more about the Kolibri Studio features + - Read the [Kolibri user guide](http://kolibri.readthedocs.io/en/latest/) to learn + how to install Kolibri on your machine (useful for testing channels) + - Read the [Kolibri developer docs](http://kolibri-dev.readthedocs.io/en/latest/) + to learn about the inner workings of Kolibri. 
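The `docs` target in the Makefile hunk of this diff produces `docs/README.rst` from `README.md` by running pandoc and then two `sed` substitutions. As a rough illustration of what those substitutions do to link targets, here is a minimal Python sketch. The function name `rewrite_readme_links` is hypothetical and is not part of `ricecooker` or the Makefile; it only mirrors the two `sed` expressions. Note also that `sed -i ''` is the BSD/macOS spelling of in-place editing, so the Makefile lines may need adjusting on GNU sed.

```python
def rewrite_readme_links(rst_text: str) -> str:
    """Mimic the two sed substitutions from the Makefile `docs` target.

    Hypothetical helper for illustration only.
    """
    # sed 's/docs\///g': drop every "docs/" prefix, because the generated
    # file already lives inside the docs/ directory.
    without_prefix = rst_text.replace("docs/", "")
    # sed 's/\.md/\.html/g': point Markdown sources at their built HTML pages.
    return without_prefix.replace(".md", ".html")


if __name__ == "__main__":
    sample = "`usage docs <docs/usage.md>`__ and `chefops <chefops.md>`__"
    print(rewrite_readme_links(sample))
```

Both substitutions are global and unanchored, so any literal `docs/` or `.md` in the prose (for example a mention of `README.md`) is rewritten as well, not only link targets.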
diff --git a/docs/README.md b/docs/README.md deleted file mode 120000 index 32d46ee8..00000000 --- a/docs/README.md +++ /dev/null @@ -1 +0,0 @@ -../README.md \ No newline at end of file diff --git a/docs/README.rst b/docs/README.rst new file mode 100644 index 00000000..213e8948 --- /dev/null +++ b/docs/README.rst @@ -0,0 +1,227 @@ +ricecooker +========== + +The ``ricecooker`` library is a framework for creating Kolibri content +channels and uploading them to `Kolibri +Studio `__, which is the central +content server that `Kolibri `__ +applications talk to when they import content. + +The Kolibri content pipeline is pictured below: + +|The Kolibri Content Pipeline| + +This ``ricecooker`` framework is the "main actor" in the first part of +the content pipeline, and touches all aspects of the pipeline within the +region highlighted in blue in the above diagram. + +Before we continue, let's have some definitions: + +- A **Kolibri channel** is a tree-like data structure that consist of + the following content nodes: + + - Topic nodes (folders) + - Content types: + + - Document (PDF files) + - Audio (mp3 files) + - Video (mp4 files) + - HTML5App zip files (generic container for web content: + HTML+JS+CSS) + - Exercises + +- A **sushi chef** is a Python script that uses the ``ricecooker`` + library to import content from various sources, organize content into + Kolibri channels and upload the channel to Kolibri Studio. + +Overview +-------- + +Use the following shortcuts to jump to the most relevant parts of the +``ricecooker`` documentation depending on your role: + +- **Content specialists and Administrators** can read the non-technical + part of the documentation to learn about how content works in the + Kolibri platform. + + - The best place to start is the `Kolibri Platform + overview `__. 
+ - Read more about the supported `content types + here `__ + - Content curators can consult `this + document `__ + for information about how to prepare "spec sheets" that guide + developers how to import content into the Kolibri ecosystem. + - The Non-technical of particular interest is the `CSV + workflow `__ channel metadata as + spreadsheets + +- **Chef authors** can read the remainder of this README, and get + started using the ``ricecooker`` library by following these first + steps: + + - `Quickstart `__, which will + introduce you to the steps needed to create a sushi chef script. + - After the quickstart, you should be ready to take things into your + own hands, and complete all steps in the `ricecooker + tutorial `__. + - The next step after that is to read the `ricecooker usage + docs `__, which is also available Jupyter notebooks + under `tutorial/ `__. More detailed technical + documentation is available on the following topics: + - `Installation `__ + - `Content Nodes `__ + - `File types `__ + - `Exercises `__ + - `HTML5 apps `__ + - `Parsing HTML `__ + - `Running chef scripts `__ to learn about the command + line args, for controlling chef operation, managing caches, and + other options. + - `Sushi chef style + guide `__ + +- **Ricecooker developers** should read all the documentation for chef + authors, and also consult the docs in the + `developer/ `__ folder for additional information + info about the "behind the scenes" work needed to support the Kolibri + content pipeline: + + - `Running chef scripts `__, also known as **chefops**. + - `Running chef scripts in daemon + mode `__ + - `Managing the content pipeline `__, also + known as **sushops**. + +Installation +------------ + +We'll assume you have a Python 3 installation on your computer and are +familiar with best practices for working with Python codes (e.g. +``virtualenv`` or ``pipenv``). 
If this is not the case, you can consult +the Kolibri developer docs as a guide for `setting up a Python +virtualenv `__. + +The ``ricecooker`` library is a standard Python library distributed +through PyPI: + +- Run ``pip install ricecooker`` to install You can then use + ``import ricecooker`` in your chef script. +- Some of functions in ``ricecooker.utils`` require additional + software: + + - Make sure you install the command line tool + `ffmpeg `__ + - Running javascript code while scraping webpages requires the + phantomJS browser. You can run ``npm install phantomjs-prebuilt`` + in your chef's working directory. + +For more details and install options, see +`installation.html `__. + +Simple chef example +------------------- + +This is a sushi chef script that uses the ``ricecooker`` library to +create a Kolibri channel with a single topic node (Folder), and puts a +single PDF content node inside that folder. + +:: + + #!/usr/bin/env python + from ricecooker.chefs import SushiChef + from ricecooker.classes.nodes import ChannelNode, TopicNode, DocumentNode + from ricecooker.classes.files import DocumentFile + from ricecooker.classes.licenses import get_license + + + class SimpleChef(SushiChef): + channel_info = { + 'CHANNEL_TITLE': 'Potatoes info channel', + 'CHANNEL_SOURCE_DOMAIN': '', # where you got the content (change me!!) + 'CHANNEL_SOURCE_ID': '', # channel's unique id (change me!!) 
+ 'CHANNEL_LANGUAGE': 'en', # le_utils language code + 'CHANNEL_THUMBNAIL': 'https://upload.wikimedia.org/wikipedia/commons/b/b7/A_Grande_Batata.jpg', # (optional) + 'CHANNEL_DESCRIPTION': 'What is this channel about?', # (optional) + } + + def construct_channel(self, **kwargs): + channel = self.get_channel(**kwargs) + potato_topic = TopicNode(title="Potatoes!", source_id="") + channel.add_child(potato_topic) + doc_node = DocumentNode( + title='Growing potatoes', + description='An article about growing potatoes on your rooftop.', + source_id='pubs/mafri-potatoe', + license=get_license('CC BY', copyright_holder='University of Alberta'), + language='en', + files=[DocumentFile(path='https://www.gov.mb.ca/inr/pdf/pubs/mafri-potatoe.pdf', + language='en')], + ) + potato_topic.add_child(doc_node) + return channel + + + if __name__ == '__main__': + """ + Run this script on the command line using: + python simple_chef.py -v --reset --token=YOURTOKENHERE9139139f3a23232 + """ + simple_chef = SimpleChef() + simple_chef.main() + +Let's assume the above code snippet is saved as the file +``simple_chef.py``. + +You can run the chef script by passing the appropriate command line +arguments: + +:: + + python simple_chef.py -v --reset --token=YOURTOKENHERE9139139f3a23232 + +The most important argument when running a chef script is ``--token`` +which is used to pass in the Studio Access Token which you can obtain +from your profile's `settings +page `__. + +The flags ``-v`` (verbose) and ``--reset`` are generally useful in +development. These make sure the chef script will start the process from +scratch and displays useful debugging information on the command line. + +To see all the ``ricecooker`` command line options, run +``python simple_chef.py -h``. For more details about running chef +scripts see `the chefops page `__. + +If you get an error when running the chef, make sure you've replaced +``YOURTOKENHERE9139139f3a23232`` by the token you obtained from Studio. 
+Also make sure you've changed the value of +``channel_info['CHANNEL_SOURCE_DOMAIN']`` and +``channel_info['CHANNEL_SOURCE_ID']`` instead of using the default +values. + +Next steps +---------- + +- See the `usage docs `__ for more explanations about + the above code. +- See `nodes `__ to learn how to create different + content node types. +- See `file `__ to learn about the file types supported, + and how to create them. + +Further reading +--------------- + +- Read the `Kolibri Studio + docs `__ to learn + more about the Kolibri Studio features +- Read the `Kolibri user + guide `__ to learn how to + install Kolibri on your machine (useful for testing channels) +- Read the `Kolibri developer + docs `__ to learn about + the inner workings of Kolibri. + +.. |The Kolibri Content Pipeline| image:: figures/content_pipeline_diagram.png + diff --git a/docs/authors.rst b/docs/authors.rst deleted file mode 100644 index 98edf9b2..00000000 --- a/docs/authors.rst +++ /dev/null @@ -1,13 +0,0 @@ -======= -Credits -======= - -Development Lead ----------------- - -* Jordan Yoshihara - -Contributors ------------- - -* Aron Asor diff --git a/docs/chefops.md b/docs/chefops.md new file mode 100644 index 00000000..27375124 --- /dev/null +++ b/docs/chefops.md @@ -0,0 +1,132 @@ +Running chef scripts +==================== +The base class `SushiChef` provides a lot of command line arguments that control +the chef script's operation. It is expected that **every chef script will come +with a README** that explains the desired command line arguments for the chef script. +We refer to all this additional context necessary to run the chef as "chef ops". + + +Executable scripts +------------------ +On UNIX systems, you can make your sushi chef script (e.g. `chef.py`) run as a +standalone command line application.
To make a script program, you need to do three things: + + - Add the line `#!/usr/bin/env python` as the first line of `chef.py` + - Add this code block at the bottom of `chef.py` if it is not already there: + + if __name__ == '__main__': + chef = MySushiChef() # replace with you chef class name + chef.main() + + - Make the file `chef.py` executable by running `chmod +x chef.py` on the + command line. + +You can now call your sushi chef script using `./chef.py ...` instead of the longer +`python chef.py ...`. + + + +Ricecooker CLI +-------------- +You can run `./chef.py -h` to see an always-up-to-date info about the `ricecooker` CLI interface: + + usage: tutorial_chef.py [-h] [--token TOKEN] [-u] [-v] [--quiet] [--warn] + [--debug] [--compress] [--thumbnails] + [--download-attempts DOWNLOAD_ATTEMPTS] + [--reset | --resume] + [--step {INIT, CONSTRUCT_CHANNEL, CREATE_TREE, DOWNLOAD_FILES, GET_FILE_DIFF, + START_UPLOAD, UPLOADING_FILES, UPLOAD_CHANNEL, PUBLISH_CHANNEL,DONE, LAST}] + [--prompt] [--stage] [--publish] [--daemon] + [--nomonitor] [--cmdsock CMDSOCK] + + required arguments: + --token TOKEN Authorization token (can be token or path to file with token) + + optional arguments: + -h, --help show this help message and exit + -u, --update Force re-download of files (skip .ricecookerfilecache/ check) + -v, --verbose Verbose mode + --quiet Print only errors to stderr + --warn Print warnings to stderr + --debug Print debugging log info to stderr + --compress Compress high resolution videos to low resolution + videos + --thumbnails Automatically generate thumbnails for topics + --download-attempts DOWNLOAD_ATTEMPTS + Maximum number of times to retry downloading files + --reset Restart session, overwriting previous session (cannot + be used with --resume flag) + --resume Resume from ricecooker step (cannot be used with + --reset flag) + --step {INIT, ... 
Step to resume progress from (must be used with --resume flag) + --prompt Prompt user to open the channel after creating it + --stage Upload to staging tree to allow for manual + verification before replacing main tree + --publish Publish newly uploaded version of the channel + --daemon Run chef in daemon mode + --nomonitor Disable SushiBar progress monitoring + --cmdsock CMDSOCK Local command socket (for cronjobs) + + extra options: + You can pass arbitrary key=value options on the command line + + +### Extra options +In addition to the command line arguments described above, the `ricecooker` CLI +supports passing additional keyword options using the format `key=value key2=value2`. + +It is common for a chef script to accept a "language option" like `lang=fr` which +runs the French version of the chef script. This way a single chef codebase can +create multiple Kolibri Studio channels, one for each language. + +These extra options will be parsed along with the `ricecooker` arguments and +passed along to all the chef's methods: `pre_run`, `run`, `get_channel`, +`construct_channel`, etc. + +For example, a script started using `./chef.py ... lang=fr` could: + - Override the method `get_channel` to set the channel name to + `"Channel Name ({})".format(getlang('fr').native_name)` + - Use the language code `fr` in `pre_run`, `run`, and `construct_channel` to + crawl and scrape the French version of the source website + + +### Resuming interrupted chef runs +If your rice cooking session gets interrupted, you can resume from any step that +has already completed using the `--resume --step=` option. If step is not specified, +the rice cooker will resume from the last step you ran. If the specified step has +not been reached, the rice cooker will resume from. + +The "state" necessary to support these checkpoints is stored in the directory +`restore` in the folder where the chef runs. +Use the `--reset` flag to skip the auto-resume prompt.
+ + + +### Caching +Use the `--update` argument to skip checks for the `.ricecookerfilecache` directory. +This is required if you suspect the files on the source website have been updated. + +Note that some chef scripts implement their own caching mechanism, so you need +to disable those caches as well if you want to make sure you're getting new content. + + + +Run scripts +----------- +For complicated chef scripts that run in multiple languages or with multiple +options, the chef author can implement a "run script" that can be run as: + + ./run.sh + +The script should contain the appropriate command args and options (basically the +same thing as the instructions in the chef's README but runnable). + + + +Daemon mode +----------- +Starting a chef script with the `--daemon` argument makes it listen for remote +control commands from the [sushibar](https://sushibar.learningequality.org/) host. +See [daemonization](developer/daemonization.md) for more info. + + diff --git a/docs/conf.py b/docs/conf.py index e6f9c523..d4b2ece4 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -50,12 +50,6 @@ # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] -# The suffix(es) of source filenames. -# You can specify multiple suffix as a list of string: -# -# source_suffix = ['.rst', '.md'] -source_suffix = '.rst' - # The encoding of source files. # # source_encoding = 'utf-8-sig' @@ -65,8 +59,8 @@ # General information about the project. project = 'ricecooker' -copyright = '2017, Learning Equality' -author = 'Learning Equality' +copyright = '2018, Learning Equality' +author = 'Learning Equality Content Team' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the @@ -317,8 +311,10 @@ # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section).
man_pages = [ - (master_doc, 'ricecooker', 'ricecooker Documentation', - [author], 1) + (master_doc, + 'ricecooker', + 'ricecooker Documentation', + [author], 1) ] # If true, show URL addresses after external links. @@ -332,8 +328,12 @@ # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ - (master_doc, 'ricecooker', 'ricecooker Documentation', - author, 'ricecooker', 'One line description of project.', + (master_doc, + 'ricecooker', + 'ricecooker Documentation', + author, + 'ricecooker', + 'One line description of project.', 'Miscellaneous'), ] @@ -439,12 +439,22 @@ # epub_use_index = True -# Example configuration for intersphinx: refer to the Python standard library. -intersphinx_mapping = {'https://docs.python.org/': None} +# Configuration for intersphinx for various LE projects + +intersphinx_mapping = { + 'python': ('https://docs.python.org/3.6/', None), + 'django': ('https://django.readthedocs.org/en/latest/', None), + 'kolibri-user': ('http://kolibri.readthedocs.io/en/latest/', None), + 'kolibri': ('http://kolibri-dev.readthedocs.io/en/latest/', None), + 'studio-user': ('http://kolibri-studio.readthedocs.io/en/latest/', None), +} + # Also accept .md files (via https://github.com/rtfd/recommonmark) source_parsers = { '.md': CommonMarkParser, } +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: source_suffix = ['.rst', '.md'] \ No newline at end of file diff --git a/docs/contributing.rst b/docs/contributing.rst deleted file mode 100644 index 64f4eb55..00000000 --- a/docs/contributing.rst +++ /dev/null @@ -1,125 +0,0 @@ -.. highlight:: shell - -============ -Contributing -============ - -Contributions are welcome, and they are greatly appreciated! Every -little bit helps, and credit will always be given. 
- -You can contribute in many ways: - -Types of Contributions ----------------------- - -Report Bugs -~~~~~~~~~~~ - -Report bugs at https://github.com/learningequality/ricecooker/issues. - -If you are reporting a bug, please include: - -* Your operating system name and version. -* Any details about your local setup that might be helpful in troubleshooting. -* Detailed steps to reproduce the bug. - -Fix Bugs -~~~~~~~~ - -Look through the GitHub issues for bugs. Anything tagged with "bug" -and "help wanted" is open to whoever wants to implement it. - -Implement Features -~~~~~~~~~~~~~~~~~~ - -Look through the GitHub issues for features. Anything tagged with "enhancement" -and "help wanted" is open to whoever wants to implement it. - -Write Documentation -~~~~~~~~~~~~~~~~~~~ - -ricecooker could always use more documentation, whether as part of the -official ricecooker docs, in docstrings, or even on the web in blog posts, -articles, and such. - -Submit Feedback -~~~~~~~~~~~~~~~ - -The best way to send feedback is to file an issue at https://github.com/learningequality/ricecooker/issues. - -If you are proposing a feature: - -* Explain in detail how it would work. -* Keep the scope as narrow as possible, to make it easier to implement. -* Remember that this is a volunteer-driven project, and that contributions - are welcome :) - -Get Started! ------------- - -Ready to contribute? Here's how to set up `ricecooker` for local development. - -1. Fork the `ricecooker` repo on GitHub. -2. Clone your fork locally:: - - $ git clone git@github.com:learningequality/ricecooker.git - -3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:: - - $ mkvirtualenv ricecooker - $ cd ricecooker/ - $ python setup.py develop - -4. Create a branch for local development:: - - $ git checkout -b name-of-your-bugfix-or-feature - - Now you can make your changes locally. - -5. 
When you're done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:: - - $ flake8 ricecooker tests - $ python setup.py test or py.test - $ tox - - To get flake8 and tox, just pip install them into your virtualenv. - -6. Commit your changes and push your branch to GitHub:: - - $ git add . - $ git commit -m "Your detailed description of your changes." - $ git push origin name-of-your-bugfix-or-feature - -7. Submit a pull request through the GitHub website. - - - -Code conventions ----------------- -You can run the code linting tools by using the `pre-commit` package. - -To use them, run:: - - pip install pre-commit - pre-commit install - - -Pull Request Guidelines ------------------------ - -Before you submit a pull request, check that it meets these guidelines: - -1. The pull request should include tests. -2. If the pull request adds functionality, the docs should be updated. Put - your new functionality into a function with a docstring, and add the - feature to the list in README.md. -3. The pull request should work for Python 2.6, 2.7, 3.3, 3.4 and 3.5, and for PyPy. Check - https://travis-ci.org/learningequality/ricecooker/pull_requests - and make sure that the tests pass for all supported Python versions. - -Tips ----- - -To run a subset of tests:: - -$ py.te diff --git a/docs/csv_metadata/README.md b/docs/csv_metadata/README.md new file mode 100644 index 00000000..8b6211e6 --- /dev/null +++ b/docs/csv_metadata/README.md @@ -0,0 +1,25 @@ +CSV Metadata Workflow +===================== + +It is possible to create Kolibri channels by: + - Organizing content items (documents, videos, mp3 files) into a folder hierarchy + on the local file system + - Specifying metadata in the form of CSV files + +The CSV-based workflow is a good fit for non-technical users since it doesn't +require writing any code, but instead can use the Excel to provide all the metadata. 
+ + - [CSV-based workflow README](https://github.com/learningequality/sample-channels/tree/master/channels/csv_channel) + - [Example content folder](https://github.com/learningequality/sample-channels/tree/master/channels/csv_exercises/content) + - [Example Channel.csv metadata file](https://github.com/learningequality/sample-channels/blob/master/channels/csv_channel/content/Channel.csv) + - [Example Content.csv metadata file](https://github.com/learningequality/sample-channels/blob/master/channels/csv_channel/content/Content.csv) + +Organizing the content into folders and creating the CSV metadata files is most +of the work, and can be done by non-programmers. +The generic sushi chef script (`LineCook`) is then used to upload the channel. + + + +CSV Exercises +-------------- +See [this doc](./csv_exercises.md) for the technical details about creating exercises. diff --git a/docs/csv_exercises.md b/docs/csv_metadata/csv_exercises.md similarity index 98% rename from docs/csv_exercises.md rename to docs/csv_metadata/csv_exercises.md index 168b385d..1082488f 100644 --- a/docs/csv_exercises.md +++ b/docs/csv_metadata/csv_exercises.md @@ -12,7 +12,7 @@ The currently supported question types for the CSV workflow are: To prepare a CSV content channel with exercises, you need the usual things -(A channel directory `channeldir`, `Channel.csv`, and `Content.csv`) and two +(A channel directory `channeldir`, `Channel.csv`, and `Content.csv`) and two additional metadata files `Exercises.csv` and `ExerciseQuestions.csv`, the format of which is defined below. diff --git a/docs/souschef.md b/docs/csv_metadata/souschef.md similarity index 93% rename from docs/souschef.md rename to docs/csv_metadata/souschef.md index a2188a29..8649bca1 100644 --- a/docs/souschef.md +++ b/docs/csv_metadata/souschef.md @@ -1,12 +1,19 @@ # Writing a SousChef Kolibri is an open source educational platform to distribute content to areas with -little or no internet connectivity. 
Educational content is created and edited on [Kolibri Studio](https://studio.learningequality.org), -which is a platform for organizing content to import from the Kolibri applications. The purpose -of this project is to create a *chef*, or a program that scrapes a content source and puts it -into a format that can be imported into Kolibri Studio. This project will read a -given source's content and parse and organize that content into a folder + csv structure, -which will then be imported into Kolibri Studio. +little or no internet connectivity. Educational content is created and edited on +[Kolibri Studio](https://studio.learningequality.org), which is a platform for +organizing content to import from the Kolibri applications. + +A *souchef* is a program that scrapes content from a source website source and +puts the content into a format that can be imported into Kolibri Studio. +This project will read a given source's content and parse and organize that content +into a folder + csv structure, which will then be imported into Kolibri Studio. + + +## Definitions +A `sous chef` script is responsible for scraping content from a source and putting +it into a folder and CSV structure. @@ -34,21 +41,17 @@ which will then be imported into Kolibri Studio. * Run `pip install -r requirements.txt` to install the required python libraries. -## Description - -A sous chef is responsible for scraping content from a source and putting it into a folder -and csv structure. - ## Getting started -Here are some notes and sample code to help you get started. +Here are some notes and sample code to help you get started writing a sous chef. + ### Downloader -The Ricecooker script `utils/downloader.py` has a `read` function that can read from both -urls and file paths. To use: +The Ricecooker module `utils/downloader.py` provides a `read` function that can +read from both urls and file paths. 
To use: ``` from ricecooker.utils.downloader import read @@ -56,7 +59,6 @@ from ricecooker.utils.downloader import read local_file_content = read('/path/to/local/file.pdf') # Load local file web_content = read('https://example.com/page') # Load web page contents js_content = read('https://example.com/loadpage', loadjs=True) # Load js before getting contents - ``` The `loadjs` option will run the JavaScript code on the webpage before reading @@ -350,5 +352,8 @@ zipper.contains('index.html') # Returns True zipper.contains('css/style.css') # Returns False ``` +See the above example on BeautifulSoup on how to parse html. + + + -(See above example on BeautifulSoup on how to parse html) diff --git a/docs/developer/README.md b/docs/developer/README.md new file mode 100644 index 00000000..164930cf --- /dev/null +++ b/docs/developer/README.md @@ -0,0 +1,84 @@ +Notes for ricecooker library developers +======================================= + + + + +Computed identifiers +-------------------- + +### Channel ID + +The `channel_id` (uuid hex str) property is an important identifier that: + - Is used in the wire formats used to communicate between `ricecooker` and Kolibri Studio + - Appears as part of URLs on both Kolibri Studio and Kolibri + - Determines the filename for the channel sqlite3 database file that Kolibri imports + from Kolibri Studio. + +To compute the `channel_id`, you need to know the channel's `source_domain` (a.k.a. `channel_info['CHANNEL_SOURCE_DOMAIN']`) +and the channel's `source_id` (a.k.a. `channel_info['CHANNEL_SOURCE_ID']`): + + import uuid + channel_id = uuid.uuid5( + uuid.uuid5(uuid.NAMESPACE_DNS, source_domain), + source_id + ).hex + +The above code snippet is useful if you know the `source_domain` and `source_id` +and you want to determine the `channel_id` without creating a `ChannelNode` object.
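As a self-contained sketch of the computation above, the snippet below wraps the nested `uuid5` calls in a helper. The function name `compute_channel_id` and the example domain and source id are assumptions made for illustration; they are not part of the `ricecooker` API.

```python
import uuid


def compute_channel_id(source_domain, source_id):
    """Return the channel_id hex string for a (source_domain, source_id) pair.

    Hypothetical helper (not a ricecooker function) that mirrors the
    nested uuid5 computation described in the text.
    """
    # Namespace derived from the channel's source domain
    domain_namespace = uuid.uuid5(uuid.NAMESPACE_DNS, source_domain)
    # The channel id is the uuid5 of the source_id within that namespace
    return uuid.uuid5(domain_namespace, source_id).hex


# Placeholder values chosen for illustration only
channel_id = compute_channel_id("example.org", "sample-channel")
print(channel_id)  # a 32-character hex string, identical on every run
```

Because `uuid5` is deterministic, the same pair of inputs always yields the same `channel_id`, which is what makes it possible to predict the id before any channel object exists.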
+ +The `ChannelNode` class implements the following methods: + + class ChannelNode(Node): + def get_domain_namespace(self): + return uuid.uuid5(uuid.NAMESPACE_DNS, self.source_domain) + def get_node_id(self): + return uuid.uuid5(self.get_domain_namespace(), self.source_id) + +Given a channel object `ch`, you can find its id using `channel_id = ch.get_node_id().hex`. + + + + +### Node IDs + +Content nodes within the Kolibri ecosystem have the following identifiers: + - `source_id` (str): arbitrary string used to identify a content item within the + source website, e.g., a database id or URL. + - `node_id` (uuid): an identifier for the content node within the channel tree + - `content_id` (uuid): an identifier derived from the channel source_domain + and the content node's `source_id` used for tracking user interactions with + the content node (e.g. video watched, or exercise completed). + +When a particular piece of content appears in multiple channels, or in different +places within a tree, the `node_id` of each occurrence will be different, but the +`content_id` of each item will be the same for all copies. In other words, the +`content_id` keeps track of the "is identical to" information about content nodes. + +Content nodes inherit from the `TreeNode` class, which implements the following methods: + + class TreeNode(Node): + def get_domain_namespace(self): + return self.domain_ns if self.domain_ns else self.parent.get_domain_namespace() + def get_content_id(self): + return uuid.uuid5(self.get_domain_namespace(), self.source_id) + def get_node_id(self): + return uuid.uuid5(self.parent.get_node_id(), self.get_content_id().hex) + +The `content_id` identifier is computed based on the channel source domain, +and the `source_id` attribute of the content node. To find the `content_id` hex +value for a content node `node`, use `content_id = node.get_content_id().hex`.
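The relationship between `content_id` and `node_id` can be illustrated with a small standalone sketch. It reimplements the `uuid5` formulas from the `TreeNode` methods quoted above as plain functions; the function names and the example domain, topic, and file identifiers are hypothetical and are not `ricecooker` API.

```python
import uuid


def compute_content_id(domain_namespace, source_id):
    # content_id depends only on the channel's domain namespace and the source_id
    return uuid.uuid5(domain_namespace, source_id)


def compute_node_id(parent_node_id, content_id):
    # node_id depends on the parent's node_id and this node's content_id hex
    return uuid.uuid5(parent_node_id, content_id.hex)


# Placeholder channel, following the ChannelNode formulas
domain_ns = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")
channel_node_id = uuid.uuid5(domain_ns, "sample-channel")

# Two different topics placed directly under the channel
topic_a_id = compute_node_id(channel_node_id, compute_content_id(domain_ns, "topic-a"))
topic_b_id = compute_node_id(channel_node_id, compute_content_id(domain_ns, "topic-b"))

# The same document placed under both topics
doc_content_id = compute_content_id(domain_ns, "docs/intro.pdf")
doc_in_a = compute_node_id(topic_a_id, doc_content_id)
doc_in_b = compute_node_id(topic_b_id, doc_content_id)

print(doc_in_a != doc_in_b)  # True: each occurrence gets its own node_id
```

Both occurrences share `doc_content_id`, so interactions with either copy can be attributed to the same content item, while the two `node_id` values identify distinct positions in the tree.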
+ + +The `node_id` of a content node in a tree is computed based on the parent node's +`node_id` and the current node's `content_id`. + + + + def get_node_id(self): + return uuid.uuid5(self.get_domain_namespace(), self.source_id) + + return uuid.uuid5(self.get_domain_namespace(), self.source_id) + + + \ No newline at end of file diff --git a/docs/developer/authors.rst b/docs/developer/authors.rst new file mode 120000 index 00000000..49689011 --- /dev/null +++ b/docs/developer/authors.rst @@ -0,0 +1 @@ +../../AUTHORS.rst \ No newline at end of file diff --git a/docs/developer/contributing.rst b/docs/developer/contributing.rst new file mode 120000 index 00000000..e9a8ba64 --- /dev/null +++ b/docs/developer/contributing.rst @@ -0,0 +1 @@ +../../CONTRIBUTING.rst \ No newline at end of file diff --git a/docs/daemonization.md b/docs/developer/daemonization.md similarity index 59% rename from docs/daemonization.md rename to docs/developer/daemonization.md index 70c9a837..d449cc9b 100644 --- a/docs/daemonization.md +++ b/docs/developer/daemonization.md @@ -1,4 +1,3 @@ - Daemon mode =========== Running a chef scripts with the `--daemon` option will make it listen to remote @@ -18,15 +17,20 @@ Local control channel --------------------- To also enable local UNIX domain sockets commands, start the chef script using - ./chef.py --daemon --cmdsock=/var/run/chefname.sock + ./chef.py --daemon --cmdsock=/var/run/cmdsocks/channelA.sock -Once the chef is running, a chef run can be scheduled using the following command: +Once the chef is running, a chef run can be started by sending the appropriate +JSON data to the UNIX domain socket `/var/run/cmdsocks/channelA.sock`. +Use the `nc` command for this (install netcat using `apt-get install netcat-openbsd`).
- /bin/echo '{"command":"start"}' | /bin/nc -U /var/run/chefname.sock + /bin/echo '{"command":"start"}' | /bin/nc -UN /var/run/cmdsocks/channelA.sock If you need to override chef run `args` or `options` use: - /bin/echo '{"command":"start", "args":{"publish":true}, "options":{"lang":"en"} }' | /bin/nc -U /var/run/chefname.sock + /bin/echo '{"command":"start", "args":{"publish":true}, "options":{"lang":"en"} }' | /bin/nc -UN /var/run/cmdsocks/channelA.sock The above command will run the chef, re-using the command line args and options, but setting `publish` to `True` and also providing the keyword option `lang=en`. + +Chef runs can be scheduled by setting up cronjobs for the above commands. + diff --git a/docs/design.md b/docs/developer/design_cli.md similarity index 87% rename from docs/design.md rename to docs/developer/design_cli.md index ec81a983..202e8a99 100644 --- a/docs/design.md +++ b/docs/developer/design_cli.md @@ -1,17 +1,17 @@ -New `ricecooker` API -==================== +Command line interface +====================== -What: a new way to parse command line arguments and use the ricecooker - -Why: need new command line options for sushibar, and to allow chef scripts to define their own - -How: subclassing and `argparse` parsers linked via `parents` +This document describes logic `ricecooker` uses to parse command line arguments. +Under normal use cases you shouldn't need modify the command line parsing, but +you need to understand how `argparse` works if you want to add new command line +arguments for your chef script. Summary ------- A sushi chef script using the new API looks like this: + #!/usr/bin/env python ... ... @@ -36,11 +36,11 @@ The call to `chef.main()` results in the following sequence of six calls: 1. main() 2. parse_args_and_options() 3. run(args, options) - 4. uploadchannel(chef, **args, **options) + 4. uploadchannel(chef, *args, **options) ... - 5. get_channel(*kwargs) + 5. get_channel(**kwargs) ... - 6. construct_channel(*kwargs) + 6. 
construct_channel(**kwargs) ... ... DONE @@ -134,11 +134,11 @@ The call to `chef.main()` results in the following sequence of events: 1. main() 2. parse_args_and_options() 3. run(args, options) - 4. uploadchannel(chef, **args, **options) + 4. uploadchannel(chef, *args, **options) ... ... - 5. construct_channel(*kwargs) - 5'. construct_channel(*kwargs) + 5. construct_channel(**kwargs) + 5'. construct_channel(**kwargs) ... ... DONE @@ -195,18 +195,3 @@ to the SushiBar server and listens for more commands. -Possible alternative to get_channel method API ----------------------------------------------- - -Instead of `get_channel` we can rely on the info in `self.channel_info` (dict). -If they need extensibility, they can define a `get_channel_info` method that -returns a dict (this way users don't need to know what a `ChannelNode` is to provide the info). - -The sushi bar integration code can get the channel_id using - - uuid.uuid5( - uuid.uuid5(uuid.NAMESPACE_DNS, channel_info['CHANNEL_SOURCE_DOMAIN']), - channel_info['CHANNEL_SOURCE_ID'] - ).hex - - diff --git a/docs/developer/sushops.md b/docs/developer/sushops.md new file mode 100644 index 00000000..fdbc9724 --- /dev/null +++ b/docs/developer/sushops.md @@ -0,0 +1,67 @@ +SushOps +======= +SushOps engineers (also called ETL engineers) are responsible for making sure +the overall content pipeline runs smoothly. Assuming the [chefops](../chefops.md) +is done right, running the chef script should be as simple as running a single command. +SushOps engineers need to make sure not only that the chef is running correctly, +but also monitor content on the Sushibar dashboard, in Kolibri Studio, in +downstream remixed channels, and in Kolibri installations. + +SushOps is an internal role to Learning Equality but we'll document the responsibilities +here for convenience, since this role is closely related to the `ricecooker` library.
+ + + +Project management and support +------------------------------ +SushOps manage and support developers working on new chef scripts, by reviewing +spec sheets, writing technical specs, preregistering chefs on sushibar, creating +necessary git repos, reviewing pull requests, chefops, and participating in Q/A. + + +Cheffing servers +---------------- +Chef scripts run on various cheffing servers, equipped with appropriate storage +space and processing power (if needed for video transcoding). Currently we have: + - CPU-intensive chefs running on `vader` + - other chefs running on `cloud-kitchen` + - various other chefs running on partner orgs' infrastructure + + +Scheduled runs +-------------- +Chef scripts can be scheduled to run automatically on a periodic basis, e.g., +once a month. In between runs, chef scripts stay dormant (daemonized). +Scheduled chefs run by default with the `--stage` argument in order not to +accidentally overwrite the currently active content tree on Studio with a broken one. +If the channel content is relatively unchanged and raises no flags for review, +the staged tree will be ACTIVATED, and the channel PUBLISHed automatically as well. + + +Chef inventory +-------------- +In order to keep track of all the sushi chefs (30+ and growing), SushOps people +maintain this spreadsheet listing and keep it up-to-date for all chefs: + - chef_name, short, unique identifier, e.g., `khan_academy_en` + - chef repo url + - command necessary to run this chef, e.g., `./kachef.py ... lang=en` + - scheduled run settings (crontab format) + +This spreadsheet is used by humans as an inventory of the chef scripts currently +in operation. The automation scripts use the same data to provision chef script +environments and set up scheduling for them on the LE cheffing servers. + + +SushOps tooling and automation +------------------------------ +Some of the more repetitive system administration tasks have been automated using +`fab` commands.
+ + fab -R cloud-kitchen setup_chef:chef_name # clones the chef_name repo and installs requirements + fab -R cloud-kitchen update:chef_name # git fetch and git reset --hard to get latest chef code + fab -R cloud-kitchen run_chef:chef_name # runs the chef + fab -R cloud-kitchen schedule_chef:chef_name # set up chef to run as cronjob + +TODO: decide where to put sharable fab commands `ricecooker.utils.fabfile` ? + + diff --git a/docs/exercises.md b/docs/exercises.md index 11c4ef5d..8f201ad1 100644 --- a/docs/exercises.md +++ b/docs/exercises.md @@ -1,20 +1,82 @@ -Exercise Questions -================== +Exercise and exercise questions +=============================== +`ExerciseNode`s are special objects that have questions used for assessment. -Base class - class BaseQuestion: +In order to set the criteria for completing exercises, you must set __exercise_data__ +to equal a dict containing a mastery_model field based on the mastery models provided under `le_utils.constants.exercises`. +If no data is provided, the rice cooker will default to mastery at 3 of 5 correct. For example: +``` +node = ExerciseNode( + exercise_data={ + 'mastery_model': exercises.M_OF_N, + 'randomize': True, + 'm': 3, + 'n': 5, + }, + ... +) +``` -Perseus format (used by KA): - class PerseusQuestion(BaseQuestion) +To add a question to your exercise, you must first create a question model from `ricecooker.classes.questions`. +Your sushi chef is responsible for determining which question type to create. +Here are the available question types: + - __SingleSelectQuestion__: questions that only have one right answer (e.g. radio button questions) + - __MultipleSelectQuestion__: questions that have multiple correct answers (e.g. check all that apply) + - __InputQuestion__: questions that have text-based answers (e.g. 
fill in the blank) + - __PerseusQuestion__: special question type for pre-formatted perseus questions -Currently supported internal formats: - class SingleSelectQuestion(BaseQuestion) - class MultipleSelectQuestion(BaseQuestion) - class InputQuestion(BaseQuestion) +Each question class has the following attributes that can be set at initialization: + - __id__ (str): question's unique id + - __question__ (str): question body, in plaintext or Markdown format; + math expressions must be in LaTeX format, surrounded by `$`, e.g. `$f(x) = 2^3$`. + - __answers__ ([{'answer':str, 'correct':bool}]): answers to question, also in plaintext or Markdown + - __hints__ (str or [str]): optional hints on how to answer question, also in plaintext or Markdown + + +To set the correct answer(s) for MultipleSelectQuestions, you must provide a list +of all of the possible choices as well as an array of the correct answers +(`all_answers [str]` and `correct_answers [str]` respectively). +``` +question = MultipleSelectQuestion( + question = "Select all prime numbers.", + correct_answers = ["2", "3", "5"], + all_answers = ["1", "2", "3", "4", "5"], + ... +) +``` + +To set the correct answer(s) for SingleSelectQuestions, you must provide a list +of all possible choices as well as the correct answer (`all_answers [str]` and +`correct_answer str` respectively). + +``` +question = SingleSelectQuestion( + question = "What is 2 x 3?", + correct_answer = "6", + all_answers = ["2", "3", "5", "6"], + ... +) +``` + +To set the correct answer(s) for InputQuestions, you must provide an array of +all of the accepted answers (`answers [str]`). +``` +question = InputQuestion( + question = "Name a factor of 10.", + answers = ["1", "2", "5", "10"], +) +``` + +To add images to a question's body, answers, or hints, format the image path +with `'![](path/to/some/file.png)'` and the rice cooker will parse them automatically.
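To get a feel for what this image markup looks like in practice, the sketch below pulls image paths out of a question string with a regular expression. It is an illustration only: the pattern and the function name `find_image_paths` are assumptions made for this example and are not taken from the `ricecooker` source, whose actual parsing may differ.

```python
import re

# Matches Markdown images of the form ![alt text](path) and captures the path.
# This pattern is an assumption for illustration, not ricecooker's own parser.
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def find_image_paths(text):
    """Return the list of image paths referenced in a Markdown string."""
    return IMAGE_PATTERN.findall(text)


question_text = "Which shape is a triangle? ![](figures/shapes.png)"
print(find_image_paths(question_text))
```

A helper like this can be handy in a chef script for checking that every referenced image file exists before the exercise is uploaded.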
+ + +Once you have created the appropriate question object, add it to an exercise object +with `exercise_node.add_question(question)` diff --git a/docs/figures/content_pipeline_diagram.png b/docs/figures/content_pipeline_diagram.png new file mode 100644 index 00000000..4dd89504 Binary files /dev/null and b/docs/figures/content_pipeline_diagram.png differ diff --git a/docs/files.md b/docs/files.md index 551cc859..35e6bae6 100644 --- a/docs/files.md +++ b/docs/files.md @@ -1,30 +1,103 @@ - Files ===== +Each `ricecooker` content node is associated with one or more files stored in a +content-addressable file storage system. For example, to store the file `sample.pdf` +we first compute the `md5` hash of its contents (say `abcdef00000000000000000000000000`) +then store the file at the path `storage/a/b/abcdef00000000000000000000000000.pdf`. +The same storage mechanism is used on Kolibri Studio and Kolibri applications. + + +File objects +------------ +The following file classes are defined in the module `ricecooker.classes.files`: + + AudioFile # .mp3 + DocumentFile # .pdf + HTMLZipFile # .zip containing HTML,JS,CSS + VideoFile # .mp4 (`path` is local file system or url) + WebVideoFile # .mp4 (downloaded from `web_url`) + YouTubeVideoFile # .mp4 (downloaded from youtube based on `youtube_id`) + SubtitleFile # .vtt (`path` is local file system or url) + YouTubeSubtitleFile # .vtt (downloaded from youtube based on `youtube_id` and `language`) + ThumbnailFile # .png/.jpg/.jpeg (`path` is local file system or url) + Base classes +------------ +The file classes extend the base classes `File(object)` and `DownloadFile(File)`. +When creating a file object, you must specify the following attributes: + - `path` (str): this can be either a local path like `dir/subdir/file.ext`, or + a URL like 'http://site.org/dir/file.ext'. + - `language` (str or `le_utils` language object): the language of the + file contents.
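The content-addressable layout described at the top of the Files page (hash the contents, then use the first two characters of the hash as directory names) can be sketched in a few lines of Python. This is only an illustration of the scheme, not the actual `ricecooker` implementation; the function name `storage_path` and the default `storage` root are assumptions made for the example.

```python
import hashlib


def storage_path(data, ext, root="storage"):
    """Build a content-addressed path of the form root/<c1>/<c2>/<md5>.<ext>.

    Hypothetical helper: `data` is the raw file contents as bytes.
    """
    digest = hashlib.md5(data).hexdigest()  # 32 hexadecimal characters
    # The first two characters of the hash select the two directory levels.
    return "/".join([root, digest[0], digest[1], digest + "." + ext])


print(storage_path(b"example file contents", "pdf"))
```

Because the path depends only on the file contents, two identical files map to the same location and are stored once.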
+ + + +### Path +The `path` attribute can be either a path on the local filesystem relative to the +current working directory of the chef script, or the URL of a web resource. + +### Language +The Python package `le-utils` defines the internal language codes used throughout +the Kolibri platform (e.g. `en`, `es-MX`, and `zul`). To find the internal language +code for a given language, you can locate it in the [lookup table](https://github.com/learningequality/le-utils/blob/master/le_utils/resources/languagelookup.json), +or use one of the language lookup helper functions defined in `le_utils.constants.languages`: + - `getlang() --> lang_obj`: basic lookup used to ensure `` is a valid + internal language code (otherwise returns `None`). + - `getlang_by_name() --> lang_obj`: lookup by name, e.g. `French` + - `getlang_by_native_name() --> lang_obj`: lookup by native name, e.g., `français` + - `getlang_by_alpha2() --> lang_obj`: lookup by standard two-letter code, e.g., `fr` - class File(object) - class NodeFile(File) - class DownloadFile(File) +You can either pass `lang_obj` as the `language` attribute when creating nodes and files, +or pass the internal language code (str) obtained from the property `lang_obj.code`. +See [languages](./languages.md) to read more about language codes. -Audio - class AudioFile(DownloadFile) +Audio files +----------- +Use the `AudioFile(DownloadFile)` class to store `mp3` files. -PDFs - class DocumentFile(DownloadFile) + audio_file = AudioFile( + path='dir/subdir/lecture_recording.mp3', + language=getlang('en').code + ) -HTML + CSS + JS in a zip file +Document files +-------------- +Use the `DocumentFile` class to store PDF documents: - class HTMLZipFile(DownloadFile) + document_file = DocumentFile( + path='dir/subdir/lecture_slides.pdf', + language=getlang('en').code + ) -Videos + +HTMLZip files +------------- +The `HTMLZipFile` class is a generic zip container for web content like HTML, CSS, +and JavaScript.
To be a valid `HTMLZipFile` file, the file must have an `index.html` +in its root. The file `index.html` will be loaded within a sandboxed iframe when +this content item is accessed on Kolibri. + +Chef authors are responsible for scraping the HTML and all the related JS, CSS, +and images required to render the web content, and creating the zip file. +Creating an `HTMLZipFile` is then done using + + document_file = HTMLZipFile( + path='/tmp/interactive_js_simulation.zip', + language=getlang('en').code + ) + + + +Video files +----------- +The following file classes can be added to the `VideoNode`s: class VideoFile(DownloadFile) class WebVideoFile(File) @@ -32,15 +105,97 @@ Videos class SubtitleFile(DownloadFile) class YouTubeSubtitleFile(File) -Thumbs - class ThumbnailFile(ThumbnailPresetMixin, DownloadFile) - class TiledThumbnailFile(ThumbnailPresetMixin, File) - class ExtractedVideoThumbnailFile(ThumbnailFile) +To create a `VideoFile`, use the following code: + + video_file = VideoFile( + path='dir/subdir/lecture_video_recording.mp4', + language=getlang('en').code + ) + + +VideoFiles can also be initialized with __ffmpeg_settings__ (dict), +which will be used to determine compression settings for the video file. +``` +video_file = VideoFile( + path = "file:///path/to/file.mp4", + ffmpeg_settings = {"max_width": 480, "crf": 28}, + language=getlang('en').code +) +``` + +WebVideoFiles must be given a __web_url__ (str) to a video on YouTube or Vimeo, +and YouTubeVideoFiles must be given a __youtube_id__ (str). + +``` +video_file2 = WebVideoFile( + web_url = "https://vimeo.com/video-id", + language=getlang('en').code, +) + +video_file3 = YouTubeVideoFile( + youtube_id = "abcdef", + language=getlang('en').code, +) +``` + +WebVideoFiles and YouTubeVideoFiles can also take in __download_settings__ (dict) +to determine how the video will be downloaded and __high_resolution__ (boolean) +to determine what resolution to download.
+ + +Subtitle files can be created using +``` +subs_file = SubtitleFile( + path = "file:///path/to/file.vtt", + language = languages.getlang('en').code, +) +``` + +You can also get subtitles using `YouTubeSubtitleFile` which takes a `youtube_id` +and youtube `language` code (may be different from internal language codes). + + + + +Thumbnail files +--------------- +The class `ThumbnailFile` defines thumbnails that can be added to channels, +topic nodes, and content nodes. The extensions `.png`, `.jpg`, and `.jpeg` are supported. + +The recommended size for thumbnail images is 420px by 236px (aspect ratio 16:9). + + + +File size limits +---------------- +Kolibri Studio does not impose any max-size limits for files uploaded, but chef +authors need to keep in mind that content channels will often be downloaded over +slow internet connections and viewed on devices with limited storage. + +Below are some general guidelines for handling video files: + - Short videos (5-10 mins long) should be less than roughly 15MB + - Longer video lectures (1 hour long) should not be larger than 200MB + - High-resolution videos should be converted to lower resolution formats: + Here are some recommended choices for video vertical resolution: + - Use max height of `480` for videos that work well in low resolution (most videos) + - Use max height of `720` for high resolution videos (lectures with writing on board) + - Ricecooker can handle the video compression for you if you specify the + `--compress` command line argument, or by setting the `ffmpeg_settings` property + when creating `VideoFile`s. The default values for `ffmpeg_settings` are as follows: + ``` + ffmpeg_settings = {'crf':32, 'max_width':"'trunc(oh*a/2)*2:min(ih,480)'" } + ``` + - The `ffmpeg` option `crf` stands for Constant Rate Factor and is very useful + for controlling overall video quality.
Setting `crf=24` produces high quality + video (and possibly large file size), `crf=28` is a mid-range quality, and + values of `crf` above 30 produce highly-compressed videos with small size. + +PDF files are usually not large, but PDFs with many pages (more than 50 pages) +can be difficult to view and browse on devices with small screens, so we +recommend that long PDF documents be split into separate parts. -Images (supporting classes for Exercises) +Note: Kolibri Studio imposes a file storage quota on a per-user basis. By default +the storage limit for new accounts is 500MB. Please get in touch with the content +team by email (`content@le...`) if you need a quota increase. - class Base64ImageFile(ThumbnailPresetMixin, File) - class _ExerciseBase64ImageFile(Base64ImageFile) - class _ExerciseImageFile(DownloadFile) - class _ExerciseGraphieFile(DownloadFile) diff --git a/docs/htmlapps.md b/docs/htmlapps.md index 40e78952..19e93bb2 100644 --- a/docs/htmlapps.md +++ b/docs/htmlapps.md @@ -9,7 +9,6 @@ the zip file.
- Example of HTML5App nodes ------------------------- @@ -19,29 +18,33 @@ Example of HTML5App nodes * [medium complexity example](http://tessa-demo.learningequality.org/learn/#/45605d184d985e74960015190a6f4e4f/recommended/ecb158bff182511db6327be6f8a91891) * Download all parts of a multi-part lesson into a single HTML5Zip file * Original source didn't have a "table of contents" so added manually (really bad CSS I need to fix in final version) -* [complex example](https://kolibridemo.learningequality.org/learn/#/197934f144305350b5820c7c4dd8e194/recommended/4255a90d5e7352ee8e59c8d4d97014ce) - * Full javascript application +* [complex example](http://kolibridemo.learningequality.org/learn/#/topics/c/d165c4fbc3bd5bbeaf3e51360965af29) + * Full javascript application packaged as a zip file + * Source: [sushi-chef-phet](https://github.com/learningequality/sushi-chef-phet/blob/master/chef.py#L104) Links and navigation -------------------- -It's currently not possible to have navigation links between different HTML5App nodes, but relative links within the same zip file work (since rendered in same iframe). It's important to "cut" the source websites content into appropriately sized chunks: +It's currently not possible to have navigation links between different HTML5App nodes, +but relative links within the same zip file work (since they are rendered in the same iframe). +It's important to "cut" the source website's content into appropriately sized chunks: + + - As small as possible so that resources are individually trackable, assignable, and reusable in multiple places + - But not too small, e.g. if a lesson contains three parts intended to be followed one after the other, then all three parts should be included in a single HTML5App with internal links + - Use nested folder structure to represent complex sources. + Whenever an HTML page acts as a "container" with links to other pages + and PDFs, we try to turn it into a Folder and put content items inside it.
+ Nested folders are the main way of representing structured content. + + -* As small as possible so that resources are individually trackable, assignable, and reusable in multiple places -* But not too small, e.g. if a lesson contains three parts intended to be followed one after the other, then all three parts should be included in a single HTML5App with internal links -* Use nested folder structure to represent complex sources. Whenever an HTML page that acts as a "container" with links to other pages and PDFs we try to turn it into a Folder and put content items inside it. Basically, nested folders is our only current way of representing structured content. HTML Writer utility class ------------------------- - The class `HTMLWriter` in `ricecooker.utils.html_writer` provides basic helper methods for creating files within a zip file. -Docs: [docs/souschef.md](https://github.com/learningequality/ricecooker/blob/master/docs/souschef.md#htmlwriter-ricecookerutilshtml_writerpy) - -source code: +See the source code: [ricecooker/utils/html_writer.py](https://github.com/learningequality/ricecooker/blob/master/ricecooker/utils/html_writer.py#L5) - diff --git a/docs/index.rst b/docs/index.rst index 3eb29eeb..cc2dfbd7 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,29 +1,79 @@ -.. ricecooker documentation master file, created by - sphinx-quickstart on Wed Sep 21 19:57:19 2016. - You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. +Welcome to the ricecooker docs! +=============================== -Welcome to ricecooker's documentation! -====================================== -Contents: +Quickstart +---------- +Developers who are new to the ``ricecooker`` library can get started here. + +..
toctree:: + :maxdepth: 1 + + README + tutorial/index + + + +Kolibri content platform +------------------------ +The Kolibri content platform is described in the following docs, which should be +accessible to both technical and non-technical audiences. + +.. toctree:: + :maxdepth: 1 + + platform/README + platform/content_types + + + +Ricecooker API reference +------------------------ +The detailed information for content developers (chef authors) is presented here: .. toctree:: :maxdepth: 2 - README - installation usage - tutorial/index - design nodes files + languages + htmlapps exercises + installation + chefops + + +Ricecooker Utils +---------------- +The ``ricecooker`` library includes a number of utilities and helper functions: + +.. toctree:: + :maxdepth: 1 + parsing_html + csv_metadata/README + csv_metadata/csv_exercises + csv_metadata/souschef + + + +Ricecooker developer docs +------------------------- +To learn about the inner workings of the ``ricecooker`` library, consult the following: + +.. toctree:: + :maxdepth: 1 + + developer/README + chefops + developer/daemonization + developer/sushops + developer/design_cli + developer/contributing + developer/authors modules history - authors - contributing .. automodule:: ricecooker.classes diff --git a/docs/installation.md b/docs/installation.md new file mode 100644 index 00000000..0e6ead99 --- /dev/null +++ b/docs/installation.md @@ -0,0 +1,76 @@ +Installation +============ +The `ricecooker` library is published as a Python3-only [package on PyPI](https://pypi.python.org/pypi/ricecooker). + + +Software prerequisites +---------------------- +The `ricecooker` library requires Python 3.5+ and some additional tools like +`ffmpeg` for video compression, and `phantomjs` for scraping webpages that +require JavaScript to run before the DOM is rendered. 
+ +On a Debian-like Linux box, you can install all the necessary packages using: + + apt-get install build-essential gettext pkg-config \ + python3 python3-pip python3-dev python3-virtualenv virtualenv python3-tk \ + linux-tools libfreetype6-dev libxft-dev libwebp-dev libjpeg-dev libmagickwand-dev \ + ffmpeg phantomjs + +Mac OS X users can install the necessary software using Homebrew: + + brew install freetype imagemagick@6 ffmpeg phantomjs + brew link --force imagemagick@6 + + + +Stable release +-------------- +To install `ricecooker`, run this command in your terminal: + + pip install ricecooker + +This is the preferred method to install `ricecooker`, as it will always install +the most recent stable release. + +If you don't have `pip` installed, then this +[Python installation guide](http://docs.python-guide.org/en/latest/starting/installation/) +will guide you through the process of setting up. + +Note: We recommend you install `ricecooker` in a Python `virtualenv` specific for +cheffing work, rather than globally for your system python. For information about +creating and activating a virtualenv, you can follow the instructions provided +[here](http://kolibri-dev.readthedocs.io/en/develop/start/getting_started.html#virtual-environment). + + + +Install from GitHub +------------------- +You can install `ricecooker` directly from the [GitHub repo](https://github.com/learningequality/ricecooker) +using the following command: + + pip install git+https://github.com/learningequality/ricecooker + +Occasionally, you'll want to install a `ricecooker` version from a specific branch, +instead of the default branch version. You can do this with: + + pip install -U git+https://github.com/learningequality/ricecooker@somebranchname + +The `-U` flag forces the update instead of reusing any previously installed/cached versions.
+ + +Install from source +------------------- +Another option for installing `ricecooker` is to clone the repo and install using: + + git clone git://github.com/learningequality/ricecooker + cd ricecooker + pip install -e . + +The flag `-e` installs `ricecooker` in "editable mode," which means you can now +make changes to the source code and you'll see the changes reflected immediately. +This installation method is very useful if you're working around a bug in `ricecooker` +or extending the crawling/scraping/http/html utilities in `ricecooker/utils/`. + +Speaking of bugs, if you ever run into problems while using `ricecooker`, you should +let us know by [opening an issue](https://github.com/learningequality/ricecooker/issues). + diff --git a/docs/installation.rst b/docs/installation.rst deleted file mode 100644 index f14b0df3..00000000 --- a/docs/installation.rst +++ /dev/null @@ -1,51 +0,0 @@ -.. highlight:: shell - -============ -Installation -============ - - -Stable release --------------- - -To install ricecooker, run this command in your terminal: - -.. code-block:: console - - $ pip install ricecooker - -This is the preferred method to install ricecooker, as it will always install the most recent stable release. - -If you don't have `pip`_ installed, this `Python installation guide`_ can guide -you through the process. - -.. _pip: https://pip.pypa.io -.. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/ - - -From sources ------------- - -The sources for ricecooker can be downloaded from the `Github repo`_. - -You can either clone the public repository: - -.. code-block:: console - - $ git clone git://github.com/jayoshih/ricecooker - -Or download the `tarball`_: - -.. code-block:: console - - $ curl -OL https://github.com/jayoshih/ricecooker/tarball/master - -Once you have a copy of the source, you can install it with: - -.. code-block:: console - - $ python setup.py install - - -..
_Github repo: https://github.com/jayoshih/ricecooker -.. _tarball: https://github.com/jayoshih/ricecooker/tarball/master diff --git a/docs/languages.md b/docs/languages.md new file mode 100644 index 00000000..8ac6fea1 --- /dev/null +++ b/docs/languages.md @@ -0,0 +1,53 @@ +Kolibri Language Codes +---------------------- + +The file [le_utils/constants/languages.py](https://github.com/learningequality/le-utils/blob/master/le_utils/constants/languages.py) +and the lookup table in [le_utils/resources/languagelookup.json](https://github.com/learningequality/le-utils/blob/master/le_utils/resources/languagelookup.json) +define the internal representation for language codes used by Ricecooker, Kolibri, +and Kolibri Studio to identify content items in different languages. + +The internal representation uses a mixture of two-letter codes (e.g. `en`), +two-letter-and-country codes (e.g. `pt-BR` for Brazilian Portuguese), +and three-letter codes (e.g., `zul` for Zulu). + +In order to make sure you have the correct language code when interfacing with +the Kolibri ecosystem (e.g. when uploading new content to Kolibri Studio), you +must look up the language object using the helper method `getlang`: + +``` +>>> from le_utils.constants.languages import getlang +>>> language_obj = getlang('en') # lookup language using language code +>>> language_obj +Language(native_name='English', primary_code='en', subcode=None, name='English', ka_name=None) +``` +The function `getlang` will return `None` if the lookup fails. In such cases, you +can try the lookup-by-name or lookup-by-alpha2-code (ISO_639-1) methods defined below. + +Once you've successfully looked up the language object, you can obtain the internal +representation language code from the language object's `code` attribute: +``` +>>> language_obj.code +'en' +``` +The Ricecooker API expects these internal representation language codes to be +supplied for all `language` attributes (channel language, node language, and files language).
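The fallback strategy mentioned above (try `getlang` first and, if it returns `None`, try the other lookups) can be sketched as a small helper. The helper name `first_match` is hypothetical, and the stand-in lookup functions and their data exist only so the snippet runs without `le-utils` installed; in a chef script you would pass the real `getlang`, `getlang_by_name`, and `getlang_by_alpha2` functions instead.

```python
def first_match(query, lookups):
    """Try each lookup function in order and return the first non-None result."""
    for lookup in lookups:
        result = lookup(query)
        if result is not None:
            return result
    return None


# Stand-in lookups with made-up data; like the le-utils helpers,
# they return None when nothing is found.
BY_CODE = {"en": "English", "fr": "French"}
BY_NAME = {"English": "English", "French": "French"}


def by_code(query):
    return BY_CODE.get(query)


def by_name(query):
    return BY_NAME.get(query)


print(first_match("French", [by_code, by_name]))
print(first_match("Klingon", [by_code, by_name]))
```

Ordering the lookups from strictest to loosest keeps an exact internal code from being shadowed by a looser name match.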
+ + + +### More lookup helper methods + +The helper method `getlang_by_name` allows you to look up a language by name: +``` +>>> from le_utils.constants.languages import getlang_by_name +>>> language_obj = getlang_by_name('English') # lookup language by name +>>> language_obj +Language(native_name='English', primary_code='en', subcode=None, name='English', ka_name=None) +``` + +The module `le_utils.constants.languages` defines two other language lookup methods: + - Use `getlang_by_native_name` for looking up languages by native language name, + e.g., you look for 'Français' to find French. + - Use `getlang_by_alpha2` to perform lookups using the standard two-letter codes + defined in [ISO_639-1](https://en.wikipedia.org/wiki/ISO_639-1) that are + supported by the `pycountry` library. + \ No newline at end of file diff --git a/docs/nodes.md b/docs/nodes.md index 2ad80ccd..3102b949 100644 --- a/docs/nodes.md +++ b/docs/nodes.md @@ -1,28 +1,286 @@ - Nodes ===== +Kolibri channels are tree-like structures that consist of different types of topic +nodes (folders) and various content nodes (document, audio, video, html, exercise). +The module `ricecooker.classes.nodes` defines helper classes to represent each of +these supported content types and provide validation logic to check channel content +is valid before uploading it to Kolibri Studio. + +The purpose of the Node classes is to represent the channel tree structure and +store metadata necessary for each type of content item, while the actual content +data is stored in file objects (defined in `ricecooker.classes.files`) and exercise +question objects (defined in `ricecooker.classes.questions`) which are created separately. + + + +Overview +-------- +The following diagram lists all the node classes defined in `ricecooker.classes.nodes` +and shows the associated file and question classes that content nodes can contain.
+ + ricecooker.classes.nodes + | + | ricecooker.classes.files + class Node(object) | + class ChannelNode(Node) | + class TreeNode(Node) | + class TopicNode(TreeNode) | + class ContentNode(TreeNode) | + class AudioNode(ContentNode) files = [AudioFile] + class DocumentNode(ContentNode) files = [DocumentFile] + class HTML5AppNode(ContentNode) files = [HTMLZipFile] + class VideoNode(ContentNode) files = [VideoFile, WebVideoFile, YouTubeVideoFile, + SubtitleFile, YouTubeSubtitleFile] + class ExerciseNode(ContentNode) questions = [SingleSelectQuestion, + MultipleSelectQuestion, + InputQuestion, + PerseusQuestion] + | + | + ricecooker.classes.questions + + +In the remainder of this document we'll describe in full detail the metadata that +is needed to specify different content nodes. + +For more info about file objects see page [files](./files.md) and to learn about +the different exercise questions see the page [exercises](./exercises.md). + + + +Content node metadata +--------------------- +Each node has the following attributes: + - __source_id__ (str): content's original id + - __title__ (str): content's title + - __license__ (str or License): content's license id or object + - __language__ (str or lang_obj): language for the content node + - __description__ (str): description of content (optional) + - __author__ (str): who created the content (optional) + - __thumbnail__ (str or ThumbnailFile): path to thumbnail or file object (optional) + - __files__ ([FileObject]): list of file objects for node (optional) + - __extra_fields__ (dict): any additional data needed for node (optional) + - __domain_ns__ (uuid): who is providing the content (e.g. learningequality.org) (optional) + +**IMPORTANT**: nodes representing distinct pieces of content MUST have distinct `source_id`s. +Each node has a `content_id` (computed as a function of the `source_domain` and +the node's `source_id`) that uniquely identifies a piece of content within Kolibri +for progress tracking purposes. 
For example, if the same video occurs in multiple +places in the tree, you would use the same `source_id` for those nodes -- but +content nodes that aren't for that video need to have different `source_id`s. + + + +### Licenses +All content nodes within Kolibri and Kolibri Studio must have a license. The file +[le_utils/constants/licenses.py](https://github.com/learningequality/le-utils/blob/master/le_utils/constants/licenses.py) +contains the constants used to identify the license types. These constants are meant +to be used in conjunction with the helper method `ricecooker.classes.licenses.get_license` +to create `License` objects. + +To initialize a license object, you must specify the license type and the +`copyright_holder` (str) which identifies a person or an organization. For example: +``` +from ricecooker.classes.licenses import get_license +from le_utils.constants import licenses + +license_obj = get_license(licenses.CC_BY, copyright_holder="Khan Academy") +``` + +Note: The `copyright_holder` field is required for all license types except for +the public domain license, for which `copyright_holder` can be None, since works +in the public domain have no copyright holder. + + +### Languages +The Python package `le-utils` defines the internal language codes used throughout +the Kolibri platform (e.g. `en`, `es-MX`, and `zul`). To find the internal language +code for a given language, you can locate it in the [lookup table](https://github.com/learningequality/le-utils/blob/master/le_utils/resources/languagelookup.json), +or use one of the language lookup helper functions defined in `le_utils.constants.languages`: + - `getlang() --> lang_obj`: basic lookup used to ensure `` is a valid + internal language code (otherwise returns `None`). + - `getlang_by_name() --> lang_obj`: lookup by name, e.g.
`French` + - `getlang_by_native_name() --> lang_obj`: lookup by native name, e.g., `français` + - `getlang_by_alpha2() --> lang_obj`: lookup by standard two-letter code, e.g., `fr` + + +You can either pass `lang_obj` as the `language` attribute when creating nodes, +or pass the internal language code (str) obtained from the property `lang_obj.code`: +``` +from le_utils.constants.languages import getlang_by_native_name + +lang_obj = getlang_by_native_name('français') +print(lang_obj) # Language(native_name='Français', primary_code='fr', subcode=None, name='French') +print(lang_obj.code) # fr +``` +See [languages](./languages.md) to read more about language codes. -Base class - class Node(object) -Channel and it's tree's root node: +### Thumbnails +Thumbnails can be passed in as a local filesystem path to an image file (str) or +a `ThumbnailFile` object. +The recommended size for thumbnail images is 420px by 236px (aspect ratio 16:9). - class ChannelNode(Node) - class TreeNode(Node) -Folder nodes +Topic nodes +----------- +Topic nodes are folder-like containers that are used to organize the channel's content. + + + from ricecooker.classes.nodes import TopicNode + from le_utils.constants.languages import getlang + + topic_node = TopicNode( + title='The folder name', + description='A longer description of what the folder contains', + source_id='', + language=getlang('en').code, + thumbnail=None, + author='', + ) + +It is highly recommended to find suitable thumbnail images for topic nodes. The +presence of thumbnails will make the content more appealing and easier to browse. +The `--thumbnails` command line argument can be used to generate thumbnails for +topic nodes based on the thumbnails of the content nodes they contain.
+ + - class TopicNode(TreeNode) Content nodes +------------- +The table below summarizes the content node classes, their associated files, +and the file formats supported by each file class: + + ricecooker.classes.nodes ricecooker.classes.files + | | + AudioNode --files--> AudioFile # .mp3 + DocumentNode --files--> DocumentFile # .pdf + HTML5AppNode --files--> HTMLZipFile # .zip + VideoNode --files--> VideoFile, WebVideoFile, YouTubeVideoFile, # .mp4 + SubtitleFile, YouTubeSubtitleFile # .vtt + + +For your copy-paste convenience, here is the sample code for creating a content +node (`DocumentNode`) and an associated file (`DocumentFile`): + + content_node = DocumentNode( + source_id='', + title='Some Document', + author='First Last (author\'s name)', + description='Put file description here', + language=getlang('en').code, + license=get_license(licenses.CC_BY, copyright_holder='Copyright holder name'), + thumbnail='some/local/path/name_thumb.jpg', + files=[DocumentFile( + path='some/local/path/name.pdf', + language=getlang('en').code + )] + ) + +Files can be passed in upon initialization as in the above sample, or can be +added after initialization using the content_node's `add_files` method. + +Note that you can also use URLs for `path` and `thumbnail` instead of local filesystem paths, +and the files will be downloaded for you automatically. + +You can replace `DocumentNode` and `DocumentFile` with any of the other +combinations of content node and file types. +VideoNodes also have a __derive_thumbnail__ (boolean) argument, which will automatically +extract a thumbnail from the video if no thumbnail is provided. + + + +Exercise nodes +-------------- +The `ExerciseNode` class (also a subclass of `ContentNode`) acts as a container for +the various assessment question types defined in `ricecooker.classes.questions`. +The question types currently supported are: + - __SingleSelectQuestion__: questions that only have one right answer (e.g.
radio button questions) + - __MultipleSelectQuestion__: questions that have multiple correct answers (e.g. check all that apply) + - __InputQuestion__: questions that have as answers simple text or numeric expressions (e.g. fill in the blank) + - __PerseusQuestion__: perseus json question (used in Khan Academy chef) + + +The following code snippet creates an exercise node that contains the three simple +question types: + + exercise_node = ExerciseNode( + source_id='', + title='Basic questions', + author='LE content team', + description='Showcase of the simple question types supported by Ricecooker and Studio', + language=getlang('en').code, + license=get_license(licenses.PUBLIC_DOMAIN), + thumbnail=None, + exercise_data={ + 'mastery_model': exercises.M_OF_N, # \ + 'm': 2, # learners must get 2/3 questions correct to complete exercise + 'n': 3, # / + 'randomize': True, # show questions in random order + }, + questions=[ + MultipleSelectQuestion( + id='sampleEX_Q1', + question = "Which of the following numbers are even?", + correct_answers = ["2", "4",], + all_answers = ["1", "2", "3", "4", "5"], + hints=['Even numbers are divisible by 2.'], + ), + SingleSelectQuestion( + id='sampleEX_Q2', + question = "What is 2 times 3?", + correct_answer = "6", + all_answers = ["2", "3", "5", "6"], + hints=['Multiplication of $a$ by $b$ is like computing the area of a rectangle with length $a$ and width $b$.'], + ), + InputQuestion( + id='sampleEX_Q3', + question = "Name one of the *factors* of 10.", + answers = ["1", "2", "5", "10"], + hints=['The factors of a number are the divisors of the number that leave no remainder.'], + ) + ] + ) + + +Creating a `PerseusQuestion` requires first obtaining the perseus-format `.json` +file for the question. You can create questions using the [web interface](http://khan.github.io/perseus/). +[Click here](https://github.com/learningequality/ricecooker/tree/master/examples/data) +to see samples of questions in the perseus json format.
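To get a feel for the perseus format before wiring a question into an exercise, it can help to load an item with the standard `json` module and look at its parts. The sketch below is only illustrative: the item is a heavily trimmed stand-in modeled on the graph question used in this section, and `summarize_perseus_item` is a hypothetical helper, not part of `ricecooker`.

```python
import json

# Trimmed stand-in for a perseus item (assumption: real items exported from
# the editor carry many more fields per widget).
sample_item = {
    "question": {
        "content": "Move the points to graph the line.\n\n[[\u2603 interactive-graph 2]]",
        "images": {},
        "widgets": {
            "interactive-graph 2": {"type": "interactive-graph", "graded": True},
        },
    },
    "answerArea": {"calculator": False},
    "itemDataVersion": {"major": 0, "minor": 1},
    "hints": [],
}
RAW_PERSEUS_JSON_STR = json.dumps(sample_item)


def summarize_perseus_item(raw_json):
    """Return the question text, the widget types used, and the hint count."""
    item = json.loads(raw_json)
    question = item["question"]
    widgets = question.get("widgets", {})
    return {
        "content": question["content"],
        "widget_types": sorted({w["type"] for w in widgets.values()}),
        "num_hints": len(item.get("hints", [])),
    }


summary = summarize_perseus_item(RAW_PERSEUS_JSON_STR)
print(summary["widget_types"], summary["num_hints"])
```

A check like this catches a truncated or mis-pasted `.json` file early, since `json.loads` raises an error on invalid input before anything is uploaded.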
+ +The following code creates an exercise node with a single perseus question in it: + + # LOAD JSON DATA (as string) FOR PERSEUS QUESTIONS + RAW_PERSEUS_JSON_STR = open('../ricecooker/examples/perseus_graph_question.json', 'r').read() + # or (note: use the raw file URL, since the github.com/.../blob/... page returns HTML) + # import requests + # RAW_PERSEUS_JSON_STR = requests.get('https://raw.githubusercontent.com/learningequality/sample-channels/master/contentnodes/exercise/perseus_graph_question.json').text + exercise_node2 = ExerciseNode( + source_id='', + title='An exercise containing a perseus question', + author='LE content team', + description='An example exercise with a Perseus question', + language=getlang('en').code, + license=get_license(licenses.CC_BY, copyright_holder='Copyright holder name'), + thumbnail=None, + exercise_data={ + 'mastery_model': exercises.M_OF_N, + 'm': 1, + 'n': 1, + }, + questions=[ + PerseusQuestion( + id='ex2bQ4', + raw_data=RAW_PERSEUS_JSON_STR, + source_url='https://github.com/learningequality/sample-channels/blob/master/contentnodes/exercise/perseus_graph_question.json' + ), + ] + ) - class ContentNode(TreeNode) - class VideoNode(ContentNode) - class AudioNode(ContentNode) - class DocumentNode(ContentNode) - class HTML5AppNode(ContentNode) - class ExerciseNode(ContentNode) \ No newline at end of file +The example above uses the JSON from [this
question](http://khan.github.io/perseus/#content=%7B%22question%22%3A%7B%22content%22%3A%22Move%20the%20points%20in%20the%20figure%20below%20to%20obtain%20the%20graph%20of%20the%20line%20with%20equation%20%24y%3D%5C%5Cfrac%7B3%7D%7B2%7Dx-3%24.%5Cn%5Cn%5B%5B%E2%98%83%20interactive-graph%202%5D%5D%5Cn%22%2C%22images%22%3A%7B%7D%2C%22widgets%22%3A%7B%22interactive-graph%202%22%3A%7B%22type%22%3A%22interactive-graph%22%2C%22alignment%22%3A%22default%22%2C%22static%22%3Afalse%2C%22graded%22%3Atrue%2C%22options%22%3A%7B%22step%22%3A%5B1%2C1%5D%2C%22backgroundImage%22%3A%7B%22url%22%3Anull%7D%2C%22markings%22%3A%22graph%22%2C%22labels%22%3A%5B%22x%22%2C%22y%22%5D%2C%22showProtractor%22%3Afalse%2C%22showRuler%22%3Afalse%2C%22showTooltips%22%3Afalse%2C%22rulerLabel%22%3A%22%22%2C%22rulerTicks%22%3A10%2C%22range%22%3A%5B%5B-5%2C5%5D%2C%5B-5%2C5%5D%5D%2C%22gridStep%22%3A%5B0.5%2C0.5%5D%2C%22snapStep%22%3A%5B0.25%2C0.25%5D%2C%22graph%22%3A%7B%22type%22%3A%22linear%22%7D%2C%22correct%22%3A%7B%22type%22%3A%22linear%22%2C%22coords%22%3A%5B%5B0%2C-3%5D%2C%5B2%2C0%5D%5D%7D%7D%2C%22version%22%3A%7B%22major%22%3A0%2C%22minor%22%3A0%7D%7D%2C%22interactive-graph%201%22%3A%7B%22options%22%3A%7B%22labels%22%3A%5B%22x%22%2C%22y%22%5D%2C%22range%22%3A%5B%5B-10%2C10%5D%2C%5B-10%2C10%5D%5D%2C%22step%22%3A%5B1%2C1%5D%2C%22valid%22%3Atrue%2C%22backgroundImage%22%3A%7B%22url%22%3Anull%7D%2C%22markings%22%3A%22graph%22%2C%22showProtractor%22%3Afalse%2C%22showRuler%22%3Afalse%2C%22showTooltips%22%3Afalse%2C%22rulerLabel%22%3A%22%22%2C%22rulerTicks%22%3A10%2C%22correct%22%3A%7B%22type%22%3A%22linear%22%2C%22coords%22%3Anull%7D%7D%2C%22type%22%3A%22interactive-graph%22%2C%22version%22%3A%7B%22major%22%3A0%2C%22minor%22%3A0%7D%7D%2C%22expression%201%22%3A%7B%22options%22%3A%7B%22answerForms%22%3A%5B%7B%22value%22%3A%22y%3D%5C%5Cfrac%7B3%7D%7B2%7Dx-3%22%2C%22form%22%3Afalse%2C%22simplify%22%3Afalse%2C%22considered%22%3A%22correct%22%2C%22key%22%3A0%2C%22times%22%3Afalse%2C%22functions%22%3A%5B%22f%22%
2C%22g%22%2C%22h%22%5D%2C%22buttonSets%22%3A%5B%22basic%22%2C%22basic%20relations%22%5D%2C%22buttonsVisible%22%3A%22focused%22%2C%22linterContext%22%3A%7B%22contentType%22%3A%22%22%2C%22highlightLint%22%3Afalse%2C%22paths%22%3A%5B%5D%2C%22stack%22%3A%5B%5D%7D%7D%2C%7B%22considered%22%3A%22correct%22%2C%22form%22%3Afalse%2C%22key%22%3A1%2C%22simplify%22%3Afalse%2C%22value%22%3A%22%5C%5Cfrac%7B3%7D%7B2%7Dx-3%22%2C%22times%22%3Afalse%2C%22functions%22%3A%5B%22f%22%2C%22g%22%2C%22h%22%5D%2C%22buttonSets%22%3A%5B%22basic%22%2C%22basic%20relations%22%5D%2C%22buttonsVisible%22%3A%22focused%22%2C%22linterContext%22%3A%7B%22contentType%22%3A%22%22%2C%22highlightLint%22%3Afalse%2C%22paths%22%3A%5B%5D%2C%22stack%22%3A%5B%5D%7D%7D%5D%2C%22buttonSets%22%3A%5B%22basic%22%2C%22basic%20relations%22%5D%2C%22functions%22%3A%5B%22f%22%2C%22g%22%2C%22h%22%5D%2C%22times%22%3Afalse%2C%22static%22%3Afalse%7D%2C%22type%22%3A%22expression%22%2C%22version%22%3A%7B%22major%22%3A1%2C%22minor%22%3A0%7D%2C%22graded%22%3Atrue%2C%22alignment%22%3A%22default%22%2C%22static%22%3Afalse%7D%7D%7D%2C%22answerArea%22%3A%7B%22calculator%22%3Afalse%2C%22chi2Table%22%3Afalse%2C%22periodicTable%22%3Afalse%2C%22tTable%22%3Afalse%2C%22zTable%22%3Afalse%7D%2C%22itemDataVersion%22%3A%7B%22major%22%3A0%2C%22minor%22%3A1%7D%2C%22hints%22%3A%5B%5D%7D), +for which you can also a [rendered preview 
here](http://khan.github.io/perseus/?renderer#content=%7B%22question%22%3A%7B%22content%22%3A%22Move%20the%20points%20in%20the%20figure%20below%20to%20obtain%20the%20graph%20of%20the%20line%20with%20equation%20%24y%3D%5C%5Cfrac%7B3%7D%7B2%7Dx-3%24.%5Cn%5Cn%5B%5B%E2%98%83%20interactive-graph%202%5D%5D%5Cn%22%2C%22images%22%3A%7B%7D%2C%22widgets%22%3A%7B%22interactive-graph%202%22%3A%7B%22type%22%3A%22interactive-graph%22%2C%22alignment%22%3A%22default%22%2C%22static%22%3Afalse%2C%22graded%22%3Atrue%2C%22options%22%3A%7B%22step%22%3A%5B1%2C1%5D%2C%22backgroundImage%22%3A%7B%22url%22%3Anull%7D%2C%22markings%22%3A%22graph%22%2C%22labels%22%3A%5B%22x%22%2C%22y%22%5D%2C%22showProtractor%22%3Afalse%2C%22showRuler%22%3Afalse%2C%22showTooltips%22%3Afalse%2C%22rulerLabel%22%3A%22%22%2C%22rulerTicks%22%3A10%2C%22range%22%3A%5B%5B-5%2C5%5D%2C%5B-5%2C5%5D%5D%2C%22gridStep%22%3A%5B0.5%2C0.5%5D%2C%22snapStep%22%3A%5B0.25%2C0.25%5D%2C%22graph%22%3A%7B%22type%22%3A%22linear%22%7D%2C%22correct%22%3A%7B%22type%22%3A%22linear%22%2C%22coords%22%3A%5B%5B0%2C-3%5D%2C%5B2%2C0%5D%5D%7D%7D%2C%22version%22%3A%7B%22major%22%3A0%2C%22minor%22%3A0%7D%7D%7D%7D%2C%22answerArea%22%3A%7B%22calculator%22%3Afalse%2C%22chi2Table%22%3Afalse%2C%22periodicTable%22%3Afalse%2C%22tTable%22%3Afalse%2C%22zTable%22%3Afalse%7D%2C%22itemDataVersion%22%3A%7B%22major%22%3A0%2C%22minor%22%3A1%7D%2C%22hints%22%3A%5B%5D%7D). 
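The editor links above carry the whole question as URL-encoded JSON after `#content=`. As a rough sketch of how that JSON could be recovered with the standard library only, the code below builds a short stand-in URL (the item is a trimmed, hypothetical example) and decodes it again; `perseus_json_from_url` is an illustrative helper name, not a `ricecooker` function.

```python
import json
from urllib.parse import quote, unquote, urlsplit


def perseus_json_from_url(url):
    """Decode the JSON item stored in the '#content=' fragment of an editor URL."""
    fragment = urlsplit(url).fragment  # everything after '#', still percent-encoded
    prefix = "content="
    if not fragment.startswith(prefix):
        raise ValueError("URL has no '#content=' fragment")
    return json.loads(unquote(fragment[len(prefix):]))


# Stand-in item and URL (assumption: real editor URLs are much longer).
item = {
    "question": {"content": "What is 2 times 3?", "images": {}, "widgets": {}},
    "hints": [],
}
url = "http://khan.github.io/perseus/#content=" + quote(json.dumps(item))

decoded = perseus_json_from_url(url)
print(json.dumps(decoded, indent=2))
```

The decoded dictionary can be saved with `json.dump` to a `.json` file and then passed as `raw_data` in the same way as the file-based example.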
diff --git a/docs/parsing_html.md b/docs/parsing_html.md index 4f26df80..eea7d3ce 100644 --- a/docs/parsing_html.md +++ b/docs/parsing_html.md @@ -1,9 +1,10 @@ - Parsing HTML using `BeautifulSoup` ================================== +Basic code to GET the HTML source of a webpage and parse it: -Basic setup to get the HTML: + import requests + from bs4 import BeautifulSoup url = 'https://somesite.edu' html = requests.get(url).content @@ -12,9 +13,18 @@ Basic setup to get the HTML: Basic API uses `find` and `find_all`: - sections_ul = doc.find('ul', class_='some-special-class') - section_lis = sections_ul.find_all('li', recursive=False) # search only immediate children + special_ul = doc.find('ul', class_='some-special-class') + section_lis = special_ul.find_all('li', recursive=False) # search only immediate children for section_li in section_lis: print('processing a section
  • right now...') - print(section_li.prettify()) # useful for debugging + print(section_li.prettify()) # useful seeing HTML in when developing... + + + +Further reading +--------------- +You can learn more about BeautifulSoup from these excellent tutorials: + - http://akul.me/blog/2016/beautifulsoup-cheatsheet/ + - http://youkilljohnny.blogspot.ca/2014/03/beautifulsoup-cheat-sheet-parse-html-by.html + - http://www.compjour.org/warmups/govt-text-releases/intro-to-bs4-lxml-parsing-wh-press-briefings/ diff --git a/docs/platform/README.md b/docs/platform/README.md new file mode 100644 index 00000000..957e1049 --- /dev/null +++ b/docs/platform/README.md @@ -0,0 +1,111 @@ +Kolibri content platform +======================== +Educational content in the Kolibri platform is organized into **content channels**. +The `ricecooker` library is used for creating content channels and uploading them +to [Kolibri Studio](https://studio.learningequality.org/), which is the central +content server that [Kolibri](http://learningequality.org/kolibri/) applications +talk to when importing their content. + +The Kolibri content pipeline is pictured below: + +![The Kolibri Content Pipeline](../figures/content_pipeline_diagram.png) + +This `ricecooker` framework is the "main actor" in the first part of the content +pipeline, and touches all aspects of the pipeline within the region highlighted +in blue in the above diagram. 
+ + +Supported Content types +----------------------- +Kolibri channels are tree-like data structures that consist of the following types +of nodes: + + - Topic nodes (folders) + - Content types: + - Document (PDF files) + - Audio (mp3 files) + - Video (mp4 files) + - HTML5App zip files (generic container for web content: HTML+JS+CSS) + - Exercises, which contain different types of questions: + - SingleSelectQuestion (multiple choice) + - MultipleSelectQuestion (multiple choice with multiple correct answers) + - InputQuestion (good for numeric inputs) + - PerseusQuestion (a rich exercise question format developed at Khan Academy) + +You can learn more about the content types supported by the Kolibri ecosystem +[here](./content_types.md). + + + +Content import workflows +------------------------ +The following options are available for importing content into Kolibri Studio. + + +### Kolibri Studio web interface +You can use the [Kolibri Studio](https://studio.learningequality.org/) web interface +to upload various content types and organize them into channels. Kolibri Studio +allows you to explore pre-organized libraries of open educational resources, +and reuse them in your channels. You can also add tags, re-order, re-mix content, +and create exercises to support students' learning process. + +To learn more about Studio, we recommend reading the following pages in the +[Kolibri Studio User Guide](http://kolibri-studio.readthedocs.io/en/latest/): + - [Accessing Studio](http://kolibri-studio.readthedocs.io/en/latest/access_studio.html) + - [Working with channels](http://kolibri-studio.readthedocs.io/en/latest/working_channels.html) + - [Adding content to channels](http://kolibri-studio.readthedocs.io/en/latest/add_content.html) + +When creating large channels (50+ content items) or channels that will need to be +updated regularly, you should consider using one of the bulk-import options below.
+ + + +### Bulk-importing content programmatically +The [`ricecooker`](https://github.com/learningequality/ricecooker) library is a +tool that programmers can use to upload content to Kolibri Studio in an automated +fashion. We refer to these import scripts as **sushi chefs**, because their job +is to chop up the source material (e.g. an educational website) and package it +into tasty morsels (content items) with all the associated metadata. + +Using the bulk import option requires a content developer (sushi chef author) +to prepare the content and its metadata, and to run the chef script that performs the +upload to Kolibri Studio. + +Educators and content specialists can assist the developers by preparing a **spec sheet** +for the content source (usually a shared google doc), which provides detailed +instructions for how content should be structured and organized within the channel. + +Consult [this document](https://docs.google.com/document/d/1slwoNT90Wqu0Rr8MJMAEsA-9LWLRvSeOgdg9u7HrZB8/edit?usp=sharing) +for more info about writing spec sheets. + + + +### CSV metadata workflow +In addition to the web interface and the Python interface (`ricecooker`), there +exists a third option for creating Kolibri channels by: + - Organizing content items (documents, videos, mp3 files) into a folder hierarchy + on the local file system + - Specifying metadata in the form of CSV files + +The CSV-based workflow is a good fit for non-technical users since it doesn't +require writing any code; all the metadata can be provided using Excel.
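For large folders, a first draft of the metadata spreadsheet can be generated rather than typed by hand. The sketch below walks a content folder and writes one row per file; it is only an illustration of the idea. The column names `path` and `title` and both helper functions are placeholders chosen for this example, not the actual `Channel.csv` or `Content.csv` schema, which should be taken from the example files in the sample-channels repository.

```python
import csv
import os
import tempfile


def list_content_rows(content_dir):
    """Return one metadata row per content file, in a stable sorted order."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(content_dir):
        dirnames.sort()  # visit subfolders alphabetically
        for filename in sorted(filenames):
            if filename.lower().endswith(".csv"):
                continue  # skip metadata files themselves
            rel_path = os.path.relpath(os.path.join(dirpath, filename), content_dir)
            title = os.path.splitext(filename)[0].replace("_", " ")
            rows.append({"path": rel_path.replace(os.sep, "/"), "title": title})
    return rows


def write_metadata_csv(rows, csv_path):
    """Write rows to a CSV file (placeholder columns, not the real schema)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "title"])
        writer.writeheader()
        writer.writerows(rows)


# Demo on a throwaway folder hierarchy with two placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    content_dir = os.path.join(tmp, "content")
    for rel_path in ("math/fractions_intro.pdf", "audio/lesson_one.mp3"):
        full_path = os.path.join(content_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "wb") as f:
            f.write(b"placeholder")
    rows = list_content_rows(content_dir)
    csv_path = os.path.join(tmp, "metadata.csv")
    write_metadata_csv(rows, csv_path)
    with open(csv_path, encoding="utf-8", newline="") as f:
        csv_text = f.read()

print(csv_text)
```

The generated file can then be opened in Excel so that titles and descriptions are reviewed and completed by hand.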
+ + - [CSV-based workflow README](https://github.com/learningequality/sample-channels/tree/master/channels/csv_channel) + - [Example content folder](https://github.com/learningequality/sample-channels/tree/master/channels/csv_exercises/content) + - [Example Channel.csv metadata file](https://github.com/learningequality/sample-channels/blob/master/channels/csv_channel/content/Channel.csv) + - [Example Content.csv metadata file](https://github.com/learningequality/sample-channels/blob/master/channels/csv_channel/content/Content.csv) + - [CSV-based exercises info](https://github.com/learningequality/sample-channels/tree/master/channels/csv_exercises) + +Organizing the content into folders and creating the CSV metadata files is most +of the work, and can be done by non-programmers. +The generic sushi chef script (`LineCook`) is then used to upload the channel. + + + + +Further reading +--------------- + + - [Kolibri Studio User Guide](http://kolibri-studio.readthedocs.io/en/latest/index.html) + - [Sample channels](https://github.com/learningequality/sample-channels) + diff --git a/docs/platform/content_types.md b/docs/platform/content_types.md new file mode 100644 index 00000000..4671f987 --- /dev/null +++ b/docs/platform/content_types.md @@ -0,0 +1,88 @@ +Supported content types +======================= + +Audio +----- +The `AudioNode` and `AudioFile` are used to store mp3 files. + + +Videos +------ +The `VideoNode` and `VideoFile` are used to store videos. + + +Documents +--------- +The `DocumentNode` and `DocumentFile` are used to store PDF documents. + + +HTML5Apps +--------- +The most versatile and extensible option for importing content into Kolibri is to +package the content as HTML5App nodes. The HTML5 content type on Kolibri, consists +of a zip file with web content inside it. The Kolibri application serves the file +`index.html` from the root of the zip folder inside an iframe. 
It is possible to +package any web content in this manner: text, images, CSS, fonts, and JavaScript code. +The `iframe` rendering the content in Kolibri is sandboxed so no plugins are allowed (no swf/flash). +In addition, it is expected that all web resources are stored within the zip file, +and referenced using relative paths. This is what enables Kolibri to be used in offline settings. + + +Here are some samples: + + - [Sample Vue.js App](https://github.com/learningequality/sample-channels/tree/master/contentnodes/html5_vuejs): + Proof of concept of minimal webapp based on the vue.js framework. + Note the [shell script](https://github.com/learningequality/sample-channels/blob/master/contentnodes/html5_vuejs/update.sh#L22) + tweaks the output to make references relative paths. + + - [Sample React App](https://github.com/learningequality/sample-channels/tree/master/contentnodes/html5_react): + Proof of concept of minimal webapp based on the React framework. + Note the [shell script](https://github.com/learningequality/sample-channels/blob/master/contentnodes/html5_react/update.sh#L24) + tweaks required to make paths relative. + + + +Exercises +--------- +Kolibri exercises are based on the `perseus` exercise framework developed by Khan Academy. +Perseus provides a free-form interface for questions based on various "widgets": buttons, +draggables, expressions, etc. This is the native format for exercises on Kolibri. +An exercise question item is represented as a giant json file, with the main question +field stored as Markdown. Widgets are referenced in the main question field through a special Unicode +character (☃), and the widget metadata is stored separately as part of the json data. + +Exercises can be created programmatically or interactively using the perseus editor through the web: [http://khan.github.io/perseus/](http://khan.github.io/perseus/) +(try adding different widgets in the Question area and then click the JSON Mode +checkbox to "view source" for the exercise).
+ +You can then copy-paste the results as a .json file and import into Kolibri using ricecooker library (Python). + +Sample: [https://github.com/learningequality/sample-channels/blob/master/contentnodes/exercise/sample_perseus04.json](https://github.com/learningequality/sample-channels/blob/master/contentnodes/exercise/sample_perseus04.json) + + +Kolibri Studio provides helper classes for creating single/multiple-select questions, and numeric input questions: +[https://github.com/learningequality/ricecooker/blob/master/docs/exercises.md](https://github.com/learningequality/ricecooker/blob/master/docs/exercises.md) + +A simple multiple choice (single select) question can be created as follows: + + SingleSelectQuestion( + question = "What was the main idea in the passage you just read?", + correct_answer = "The right answer", + all_answers = ["The right answer", "Another option", "Nope, not this"] + ... + +Exercise activities allow student answers to be logged and enable progress reports +for teachers and coaches. Exercises can also be used as part of individual assignments +(playlist-like thing with a mix of content and exercises), group assignments, and exams. + + + + +Extending Kolibri +----------------- +New content types and presentation modalities will become available and supported +natively by future versions of Kolibri. The Kolibri software architecture is based +around the plug-in system that is easy to extend. All currently supported content +type renderers are based on this plug-in architecture. It might be possible to create +a Kolibri plugin for rendering specific content in custom ways. + diff --git a/docs/ricecooker.rst b/docs/ricecooker.rst index 9a964441..b7f1f0e2 100644 --- a/docs/ricecooker.rst +++ b/docs/ricecooker.rst @@ -13,40 +13,40 @@ Subpackages Submodules ---------- -ricecooker\.chefs module ------------------------- +ricecooker.chefs module +----------------------- .. 
automodule:: ricecooker.chefs :members: :undoc-members: :show-inheritance: -ricecooker\.commands module ---------------------------- +ricecooker.commands module +-------------------------- .. automodule:: ricecooker.commands :members: :undoc-members: :show-inheritance: -ricecooker\.config module -------------------------- +ricecooker.config module +------------------------ .. automodule:: ricecooker.config :members: :undoc-members: :show-inheritance: -ricecooker\.exceptions module ------------------------------ +ricecooker.exceptions module +---------------------------- .. automodule:: ricecooker.exceptions :members: :undoc-members: :show-inheritance: -ricecooker\.sushi\_bar\_client module -------------------------------------- +ricecooker.sushi\_bar\_client module +------------------------------------ .. automodule:: ricecooker.sushi_bar_client :members: diff --git a/docs/tutorial/languages.ipynb b/docs/tutorial/languages.ipynb index e217d62e..0bb7cb26 100644 --- a/docs/tutorial/languages.ipynb +++ b/docs/tutorial/languages.ipynb @@ -107,6 +107,28 @@ "language_obj.code" ] }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Language(native_name='Français, langue française', primary_code='fr', subcode=None, name='French', ka_name='francais')\n", + "fr\n" + ] + } + ], + "source": [ + "from le_utils.constants.languages import getlang_by_native_name\n", + "\n", + "lang_obj = getlang_by_native_name('français')\n", + "print(lang_obj) # \n", + "print(lang_obj.code) # 'fr')\n" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -698,7 +720,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.1" + "version": "3.6.4" } }, "nbformat": 4, diff --git a/docs/tutorial/quickstart.ipynb b/docs/tutorial/quickstart.ipynb index 1cd4d872..9dcc3c35 100644 --- a/docs/tutorial/quickstart.ipynb +++ b/docs/tutorial/quickstart.ipynb @@ -6,9 
+6,30 @@ "source": [ "# The `ricecooker` quick start\n", "\n", - "This short tutorial will walk you through the steps of creating a sushi chef class `MySushiChef` that uses the `ricecooker` framework to upload a content channel to the Kolibri Content Curation Server.\n", "\n", - "We'll go over the same steps as described in the [README](../../README.md) but this time showing the expected outcome is in each step. If you clone the `ricecooker` repository, you should be able to run the same commands by yourself and poke around." + "This mini-tutorial will walk you through the steps of running a simple chef script `SimpleChef` that uses the `ricecooker` framework to upload a content channel to the Kolibri Studio server.\n", + "\n", + "We'll go over the same steps as described in the [usage](../usage.md), but this time showing the expected output of each step.\n", + "\n", + "\n", + "### Running the notebooks\n", + "To follow along and run the code in this notebook, you'll need to clone the `ricecooker` repository, create a virtual environment, install `ricecooker` using `pip install ricecooker`, install Jupyter notebook using `pip install jupyter`, then start the Jupyter notebook server by running `jupyter notebook`. You will then be able to run all the code sections in this notebook and poke around." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1: Obtain a Studio Authorization Token\n", + "\n", + "You will need a Studio Authorization Token to create a channel on Kolibri Studio.\n", + "In order to obtain such a token:\n", + "1. Create an account on [Kolibri Studio](https://studio.learningequality.org/).\n", + "2. Navigate to the Tokens tab under your Settings page.\n", + "3. Copy the given authorization token to a safe place.\n", + "\n", + "You must pass the token on the command line as `--token=` when\n", + "calling your chef script.
Alternatively, you can create a file to store your token\n", + "and pass in the command line argument `--token=\"path/to/file.txt\"`.\n" ] }, { @@ -257,7 +278,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.1" + "version": "3.6.4" } }, "nbformat": 4, diff --git a/docs/usage.md b/docs/usage.md new file mode 100644 index 00000000..2f1c2bc8 --- /dev/null +++ b/docs/usage.md @@ -0,0 +1,245 @@ +# Using the `ricecooker` library + +The `ricecooker` library is used to transform various educational content types +into Kolibri-compatible formats and upload content to Kolibri Studio. +The following steps will guide you through the creation of a sushi chef script +that uses all the features of the `ricecooker` library. + + + +## Step 1: Obtain a Studio Authorization Token + +You will need a Studio Authorization Token to create a channel on Kolibri Studio. +In order to obtain such a token: +1. Create an account on [Kolibri Studio](https://studio.learningequality.org/). +2. Navigate to the Tokens tab under your Settings page. +3. Copy the given authorization token to a safe place. + +You must pass the token on the command line as `--token=` when +calling your chef script. Alternatively, you can create a file to store your token +and pass in the command line argument `--token="path/to/file.txt"`. + + + +## Step 2: Create a Sushi Chef script + +We'll use the following simple chef script as the running example in this section. +You can copy-paste this code into a file `mychef.py` and use it as a starting point +for the chef script you're working on.
+ +``` +#!/usr/bin/env python +from ricecooker.chefs import SushiChef +from ricecooker.classes.nodes import TopicNode, DocumentNode +from ricecooker.classes.files import DocumentFile +from ricecooker.classes.licenses import get_license + +class SimpleChef(SushiChef): # (1) + channel_info = { # (2) + 'CHANNEL_TITLE': 'Potatoes info channel', + 'CHANNEL_SOURCE_DOMAIN': 'gov.mb.ca', # change me!!! + 'CHANNEL_SOURCE_ID': 'website_docs', # change me!!! + 'CHANNEL_LANGUAGE': 'en', + 'CHANNEL_THUMBNAIL': 'https://upload.wikimedia.org/wikipedia/commons/b/b7/A_Grande_Batata.jpg', + 'CHANNEL_DESCRIPTION': 'A channel about potatoes.', + } + + def construct_channel(self, **kwargs): + channel = self.get_channel(**kwargs) # (3) + potato_topic = TopicNode(title="Potatoes!", source_id="les_patates") # (4) + channel.add_child(potato_topic) # (5) + doc_node = DocumentNode( # (6) + title='Growing potatoes', + description='An article about growing potatoes on your rooftop.', + source_id='inr/pdf/pubs/mafri-potatoe.pdf', + author=None, + language='en', # (7) + license=get_license('CC BY', copyright_holder='U. of Alberta'), # (8) + files=[ + DocumentFile( # (9) + path='https://www.gov.mb.ca/inr/pdf/pubs/mafri-potatoe.pdf', # (10) + language='en', # (11) + ) + ], + ) + potato_topic.add_child(doc_node) + return channel + +if __name__ == '__main__': # (12) + """ + Run this script on the command line using: + python mychef.py -v --reset --token=YOURTOKENHERE9139139f3a23232 + """ + simple_chef = SimpleChef() + simple_chef.main() # (13) + +``` + + +### Ricecooker Chef API +To use the `ricecooker` library, you create a **sushi chef** script that defines +a subclass of the base class `ricecooker.chefs.SushiChef`, as shown at (1) in the code. +By extending `SushiChef`, your chef class will inherit the following methods: + - `run`, which performs all the work of uploading your channel to Kolibri Studio.
+ A sushi chef run consists of multiple steps, the most important one being + the call to the chef class' `construct_channel` method. + - `main`, which is the function that runs when the sushi chef script is + called on the command line. + + +### Chef class attributes +A chef class should have the attribute `channel_info` (dict), which contains the +metadata for the channel, as shown on line (2). Define the `channel_info` as follows: + + channel_info = { + 'CHANNEL_TITLE': 'Channel name shown in UI', + 'CHANNEL_SOURCE_DOMAIN': '', # who is providing the content (e.g. learningequality.org) + 'CHANNEL_SOURCE_ID': '', # a unique identifier for this channel within the domain + 'CHANNEL_LANGUAGE': 'en', # use language codes from le_utils + 'CHANNEL_THUMBNAIL': 'http://yourdomain.org/img/logo.jpg', # (optional) local path or url to a thumbnail image + 'CHANNEL_DESCRIPTION': 'What is this channel about?', # (optional) longer description of the channel + } + +Note: make sure you change the values of `CHANNEL_SOURCE_DOMAIN` and `CHANNEL_SOURCE_ID` +before you try running this script. The combination of these two values is used +to compute the `channel_id` for the Kolibri channel you're creating. If you keep +the values from the sample script unchanged, you'll get an error because the channel with source +domain 'gov.mb.ca' and source id 'website_docs' already exists on Kolibri Studio. + + +### Construct channel +The code responsible for building the structure of your channel by +adding `TopicNode`s, `ContentNode`s, files, and exercise questions lives in `construct_channel`. +This is where most of the work of writing a chef script happens. + +Your chef class should have a method with the signature: +``` +def construct_channel(self, **kwargs) -> ChannelNode: + ... +``` + +To write the `construct_channel` method of your chef class, start by getting the +`ChannelNode` for this channel by calling `self.get_channel(**kwargs)`.
+An instance of the `ChannelNode` will be constructed for you, from the metadata +provided in `self.channel_info`. Once you have the `ChannelNode` instance, the +rest of your chef's `construct_channel` method is responsible for constructing +the channel by adding various `Node` objects to the channel using `add_child`. + + +### Topic nodes +Topic nodes are folder-like containers that are used to organize the channel's content. +Line (4) shows how to create a `TopicNode` (folder) instance titled "Potatoes!". +Line (5) shows how to add the newly created topic node to the channel. + + +### Content nodes +The `ricecooker` library provides classes like `DocumentNode`, `VideoNode`, +`AudioNode`, etc., to store the metadata associated with media content items. +Each content node also has one or more files associated with it, +`DocumentFile`, `VideoFile`, `AudioFile`, `ThumbnailFile`, etc. + +Line (6) shows how to create a `DocumentNode` to store the metadata for a pdf file. +The `title` and `description` attributes are set. We also set the `source_id` +attribute to a unique identifier for this document on the source domain `gov.mb.ca`. +The document does not specify authors, so we set the `author` attribute to `None`. + +On line (7), we set the `language` attribute to the internal language code `en`, to indicate +the content node is in English. We use the same language code later on line (11) +to indicate the file contents are in English. The Python package `le-utils` defines +the internal language codes used throughout the Kolibri platform (e.g. `en`, `es-MX`, and `zul`). +To find the internal language code for a given language, you can locate it in the +[lookup table](https://github.com/learningequality/le-utils/blob/master/le_utils/resources/languagelookup.json), +or use one of the language lookup helper functions defined in `le_utils.constants.languages`. + +Line (8) shows how we set the `license` attribute to the appropriate instance of +`ricecooker.classes.licenses.License`.
All non-topic nodes must be assigned a +license upon initialization. You can obtain the appropriate license object using +the helper function `get_license` defined in `ricecooker.classes.licenses`. +Use the predefined license ids given in `le_utils.constants.licenses` as the +first argument to the `get_license` helper function. + + +### Files +On lines (9, 10, and 11), we create a `DocumentFile` instance and set the appropriate +`path` and `language` attributes. Note that `path` can be a web URL as in the above example, +or a local filesystem path. + + +### Command line interface +You can run your chef script by passing the appropriate command line arguments: + + python mychef.py -v --reset --token=YOURTOKENHERE9139139f3a23232 + +The most important argument when running a chef script is `--token`, which is used +to pass in the Studio Access Token obtained in Step 1. + +The flags `-v` (verbose) and `--reset` are generally useful in development. +These make sure the chef script starts the process from scratch and displays +useful debugging information on the command line. + +To see the full list of `ricecooker` command line options, run `python mychef.py -h`. +For more details about running chef scripts see [the chefops page](./chefops.md). + +If you get an error when running the chef, make sure you've replaced +`YOURTOKENHERE9139139f3a23232` with the token you obtained from Studio. +Also make sure you've changed the values of `channel_info['CHANNEL_SOURCE_DOMAIN']` +and `channel_info['CHANNEL_SOURCE_ID']` instead of using the default values. + +If the channel run was successful, you should be able to see your single-topic +channel on the Kolibri Studio server. The topic node "Potatoes!" is nice to look at, +but it feels kind of empty. Let's add more nodes to it! + + + + + + +## Step 3: Add more content nodes and files + +Once your channel is created, you can start adding nodes. To do this, you need to +convert your data to `ricecooker` objects.
Here are the classes that are +available to you (import from `ricecooker.classes.nodes`): + + - __TopicNode__: folders to organize the channel's content + - __AudioNode__: content containing an mp3 file + - __DocumentNode__: content containing a pdf file + - __HTML5AppNode__: content containing a zip of html files (html, js, css, etc.) + - __VideoNode__: content containing an mp4 file + - __ExerciseNode__: assessment-based content with questions + +Once you have created the node, add it to a parent node with `parent_node.add_child(child_node)`. + +To read more about the different nodes, read the [nodes page](./nodes.md). + + +To add a file to your node, you must start by creating a file object from `ricecooker.classes.files`. +Your sushi chef is responsible for determining which file object to create. +Here are the available file models: + + - __AudioFile__: mp3 file + - __DocumentFile__: pdf file + - __HTMLZipFile__: zip of html files (must have `index.html` file at topmost level) + - __VideoFile__: mp4 file (can be high resolution or low resolution) + - __WebVideoFile__: video downloaded from a site such as YouTube or Vimeo + - __YouTubeVideoFile__: video downloaded from YouTube using a YouTube video id + - __SubtitleFile__: .vtt subtitle files to be used with VideoFiles + - __ThumbnailFile__: png or jpg files to add to any kind of node + + +Each file class can be passed a __preset__ and __language__ at initialization +(SubtitleFiles must have a language set at initialization). +A preset determines what kind of file the object is (e.g. high resolution video vs. low resolution video). +A list of available presets can be found at `le_utils.constants.format_presets`. + +ThumbnailFiles, AudioFiles, DocumentFiles, HTMLZipFiles, VideoFiles, and SubtitleFiles +must be initialized with a __path__ (str). This path can be a URL or a local path to a file. + +To read more about the different files, read the [files page](./files.md).
+ + + + +## Step 4: Adding exercises + +See the [exercises page](./exercises.md). + diff --git a/docs/usage.rst b/docs/usage.rst deleted file mode 100644 index fb3da50f..00000000 --- a/docs/usage.rst +++ /dev/null @@ -1,7 +0,0 @@ -===== -Usage -===== - -To use ricecooker in a project:: - - import ricecooker diff --git a/ricecooker/chefs.py b/ricecooker/chefs.py index 99f03a1b..96d34a3c 100644 --- a/ricecooker/chefs.py +++ b/ricecooker/chefs.py @@ -55,15 +55,14 @@ def __init__(self, *args, compatibility_mode=False, **kwargs): parser.add_argument('command', choices=['uploadchannel'], help='Main command for the chef script.') parser.add_argument('chef_script', help='Path to chef script file') # -h Help documentation # NO NEED BECAUSE AUTOMATIC + parser.add_argument('--token', default='#', help='Authorization token (can be token or path to file with token)') + parser.add_argument('-u', '--update', action='store_true', help='Force re-download of files (skip .ricecookerfilecache/ check)') parser.add_argument('-v', '--verbose', action='store_true', default=True, help='Verbose mode') - parser.add_argument('-u', '--update', action='store_true', help='Re-download files from file paths') - parser.add_argument('--warn', action='store_true', help='Print out warnings to stderr') - parser.add_argument('--debug', action='store_true', help='Print out debugging statements to stderr') - parser.add_argument('--quiet', action='store_true', help='Print out errors to stderr') - parser.add_argument('--stage', action='store_true', help='Stage updates rather than deploying them for manual verification on Kolibri Studio') + parser.add_argument('--quiet', action='store_true', help='Print only errors to stderr') + parser.add_argument('--warn', action='store_true', help='Print warnings to stderr') + parser.add_argument('--debug', action='store_true', help='Print debugging log info to stderr') parser.add_argument('--compress', action='store_true', help='Compress high resolution videos to low 
resolution videos') parser.add_argument('--thumbnails', action='store_true', help='Automatically generate thumbnails for topics') - parser.add_argument('--token', default='#', help='Authorization token (can be token or path to file with token)') parser.add_argument('--download-attempts',type=int,default=3, help='Maximum number of times to retry downloading files') rrgroup = parser.add_mutually_exclusive_group() rrgroup.add_argument('--reset', action='store_true', help='Restart session, overwriting previous session (cannot be used with --resume flag)') @@ -71,7 +70,8 @@ def __init__(self, *args, compatibility_mode=False, **kwargs): allsteps = [step.name.upper() for step in Status] parser.add_argument('--step',choices=allsteps,default='LAST', help='Step to resume progress from (must be used with --resume flag)') parser.add_argument('--prompt', action='store_true', help='Prompt user to open the channel after creating it') - parser.add_argument('--publish', action='store_true', help='Publish channel after creating it') + parser.add_argument('--stage', action='store_true', help='Upload to staging tree to allow for manual verification before replacing main tree') + parser.add_argument('--publish', action='store_true', help='Publish newly uploaded version of the channel') # [OPTIONS] --- extra key=value options are supported, but do not appear in help self.arg_parser = parser @@ -307,7 +307,7 @@ class JsonTreeChef(SushiChef): This sushi chef loads the data from a channel from a ricecooker json tree file which conatins the json representation of a full ricecooker node tree. 
For example the content hierarchy with two levels of subfolders and a PDF - content node looks like this: + content node looks like this:: { "title": "Open Stax", diff --git a/ricecooker/utils/downloader.py b/ricecooker/utils/downloader.py index d4635584..22cfef0a 100644 --- a/ricecooker/utils/downloader.py +++ b/ricecooker/utils/downloader.py @@ -31,26 +31,25 @@ def read(path, loadjs=False, session=None, driver=None, timeout=60, clear_cookies=True, loadjs_wait_time=3, loadjs_wait_for_callback=None): - """ read: Reads from source and returns contents - Args: - path: (str) url or local path to download - loadjs: (boolean) indicates whether to load js (optional) - session: (requests.Session) session to use to download (optional) - driver: (selenium.webdriver) webdriver to use to download (optional) - timeout: (int) Maximum number of seconds to wait for the request to - complete. - clear_cookies: (boolean) whether to clear cookies. - loadjs_wait_time: (int) if loading JS, seconds to wait after the - page has loaded before grabbing the page source - loadjs_wait_for_callback: (function) if loading - JS, a callback that will be invoked to determine when we can - grab the page source. The callback will be called with the - webdriver, and should return True when we're ready to grab the - page source. For example, pass in an argument like: - lambda driver: driver.find_element_by_id('list-container') - to wait for the #list-container element to be present before - rendering. - Returns: str content from file or page + """Reads from source and returns contents + + Args: + path: (str) url or local path to download + loadjs: (boolean) indicates whether to load js (optional) + session: (requests.Session) session to use to download (optional) + driver: (selenium.webdriver) webdriver to use to download (optional) + timeout: (int) Maximum number of seconds to wait for the request to complete. + clear_cookies: (boolean) whether to clear cookies. 
+ loadjs_wait_time: (int) if loading JS, seconds to wait after the + page has loaded before grabbing the page source + loadjs_wait_for_callback: (function) if loading + JS, a callback that will be invoked to determine when we can + grab the page source. The callback will be called with the + webdriver, and should return True when we're ready to grab the + page source. For example, pass in an argument like: + ``lambda driver: driver.find_element_by_id('list-container')`` + to wait for the #list-container element to be present before rendering. + Returns: str content from file or page """ session = session or DOWNLOAD_SESSION @@ -131,25 +130,22 @@ def download_static_assets(doc, destination, base_url, request_fn=make_request, url_blacklist=[], js_middleware=None, css_middleware=None, derive_filename=_derive_filename): """Download all static assets referenced from an HTML page. - The goal is to easily create HTML5 apps! Downloads JS, CSS, images, and audio clips. - - doc: The HTML page source as a string or BeautifulSoup instance. - destination: The folder to download the static assets to! - base_url: The base URL where assets will be downloaded from. - request_fn: The function to be called to make requests, passed to - ricecooker.utils.html.download_file(). Pass in a custom one for custom - caching logic. - url_blacklist: A list of keywords of files to not include in downloading. - Will do substring matching, so e.g. 'acorn.js' will match - '/some/path/to/acorn.js'. - js_middleware: If specificed, JS content will be passed into this callback - which is expected to return JS content with any modifications. - css_middleware: If specificed, CSS content will be passed into this callback - which is expected to return CSS content with any modifications. - derive_filename: A callback that is passed the URL to fetch and returns the - filename to save the file as. (optional) + Args: + doc: The HTML page source as a string or BeautifulSoup instance. 
+ destination: The folder to download the static assets to! + base_url: The base URL where assets will be downloaded from. + request_fn: The function to be called to make requests, passed to + ricecooker.utils.html.download_file(). Pass in a custom one for custom + caching logic. + url_blacklist: A list of keywords of files to not include in downloading. + Will do substring matching, so e.g. 'acorn.js' will match + '/some/path/to/acorn.js'. + js_middleware: If specified, JS content will be passed into this callback + which is expected to return JS content with any modifications. + css_middleware: If specified, CSS content will be passed into this callback + which is expected to return CSS content with any modifications. Return the modified page HTML with links rewritten to the locations of the downloaded static files, as a BeautifulSoup object. (Call str() on it to diff --git a/ricecooker/utils/linecook.py b/ricecooker/utils/linecook.py index a5f96963..f9ae77de 100644 --- a/ricecooker/utils/linecook.py +++ b/ricecooker/utils/linecook.py @@ -26,10 +26,11 @@ def chan_path_from_rel_path(rel_path, channeldir): """ Convert `rel_path` form os.walk tuple format to a tuple of directories and - subdirectories, starting with the `channeldir` folder, e.g., - >>> chan_path_from_rel_path('content/open_stax_zip/Open Stax/Math/Elementary', + subdirectories, starting with the `channeldir` folder, e.g.,:: + + >>> chan_path_from_rel_path('content/open_stax_zip/Open Stax/Math/Elementary', 'content/open_stax_zip/Open Stax') - 'Open Stax/Math/Elementary' + 'Open Stax/Math/Elementary' """ rel_path_parts = rel_path.split(os.path.sep) dirs_before_channeldir = channeldir.split(os.path.sep)[:-1] diff --git a/ricecooker/utils/path_builder.py b/ricecooker/utils/path_builder.py index 0da515c9..0617c4d5 100644 --- a/ricecooker/utils/path_builder.py +++ b/ricecooker/utils/path_builder.py @@ -1,7 +1,7 @@ class PathBuilder: """ - Class for formatting paths to write to DataWriter + Class for
formatting paths to write to DataWriter. """ path = None # List of items in path @@ -13,38 +13,38 @@ def __init__(self, channel_name=None): self.path = [self.channel_name] def __str__(self): - """ Converts path list to string - e.g. [Channel, Topic, Subtopic] -> Channel/Topic/Subtopic - Returns: str path + """Converts path list to string + e.g. [Channel, Topic, Subtopic] -> Channel/Topic/Subtopic + Returns: str path """ return "/".join(self.path) def reset(self): - """ reset: Clear path - Args: None - Returns: None + """Clear path + Args: None + Returns: None """ self.path = [self.channel_name] def set(self, *path): - """ set: Set path from root - Args: *path: (str) items to add to path - Returns: None + """Set path from root + Args: path: (str) items to add to path + Returns: None """ self.path = [self.channel_name] self.path.extend(list(path)) def open_folder(self, path_item): - """ open_folder: Add item to path - Args: path_item: (str) item to add to path - Returns: None + """Add item to path + Args: path_item: (str) item to add to path + Returns: None """ self.path.append(path_item) def go_to_parent_folder(self): - """ go_to_parent_folder: Go back one level in path - Args: None - Returns: last item in path + """Go back one level in path + Args: None + Returns: last item in path """ if len(self.path) > 1: return self.path.pop() diff --git a/ricecooker/utils/tokens.py b/ricecooker/utils/tokens.py index 558df49e..2ed1ca3e 100644 --- a/ricecooker/utils/tokens.py +++ b/ricecooker/utils/tokens.py @@ -23,11 +23,11 @@ def get_env(envvar): def get_content_curation_token(args_token): """ Get the token through one of four possible ways. Input `args_token` can be - 1. path to a token-containing file (path) - 2. actual token (str) in which case there's nothing to get just pass along - 3. `#` (default value when no --token is given on command line) - 3a: if environment variable CONTENT_CURATION_TOKEN exists, we'll use that - 3b: else we prompt the user interactively + 1. 
path to a token-containing file (path) + 2. actual token (str) in which case there's nothing to get just pass along + 3. `#` (default value when no --token is given on command line) + 3a. if environment variable CONTENT_CURATION_TOKEN exists, we'll use that + 3b. else we prompt the user interactively """ if args_token != "#": # retrieval methods 1, 2 if os.path.isfile(args_token):