The tag "</p>" and "<p>" appears on the questions of MagoGenie channel on CC. #222

harishbalachandran · 2017-04-18T07:28:10Z

Summary:
After importing the MagoGenie questions onto CC, we see that a tag "</p>" and/or
"<p>" appears on most of the questions.

Link used:
https://contentworkshop.learningequality.org/channels/f531b1ddf64755c9be5061a922317021/edit
Channel ID: f531b1ddf64755c9be5061a922317021

Screenshot:

harishbalachandran · 2017-04-22T03:52:44Z

@rtibbles @aronasorman Can you please put an ETA on this? It's a blocker for all the MG channels.

rtibbles · 2017-04-24T02:09:37Z

Looks like the Magogenie content needs to be run through an HTML-to-Markdown parser before being put in?

This package should do the trick, I think: https://pypi.python.org/pypi/html2text

jamalex · 2017-04-24T02:30:51Z

Note that ricecooker does not support HTML, only Markdown (with embedded $-delimited LaTeX formulas where needed). The tool that @rtibbles links to may be helpful for converting HTML to Markdown in the sushi chef.

yogeshmhaskule · 2017-04-25T09:56:01Z

@rtibbles @aronasorman @jamalex To clarify: will this issue be handled in ricecooker, or do we need to handle it in our own code (i.e. the sushi chef)? I remember that @aronasorman and @jayoshih took care of the same issue before.

jayoshih · 2017-04-25T17:05:37Z

@yogeshmhaskule For security reasons, we needed to escape HTML tags to prevent script attacks. To maintain the paragraphs, you'll need to use \n instead.
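
As a rough illustration of the suggestion above, the following sketch swaps paragraph tags for newlines using only the Python standard library. The helper name `paragraphs_to_newlines` is hypothetical and is not part of ricecooker; it handles only `<p>` and `<br>` tags and leaves every other tag untouched.

```python
import html
import re


def paragraphs_to_newlines(source):
    """Replace <p>/<br> tags with newlines (hypothetical helper, not a ricecooker API)."""
    # A closing </p> ends a paragraph, so it becomes a blank line.
    text = re.sub(r"</p\s*>", "\n\n", source, flags=re.IGNORECASE)
    # An opening <p ...> carries no content of its own, so drop it.
    # The pattern requires whitespace or ">" right after "p", so <pre> is not matched.
    text = re.sub(r"<p(\s[^>]*)?>", "", text, flags=re.IGNORECASE)
    # <br> and <br/> become a single newline.
    text = re.sub(r"<br\s*/?>", "\n", text, flags=re.IGNORECASE)
    # Decode entities such as &amp; and trim leading/trailing whitespace.
    return html.unescape(text).strip()


print(paragraphs_to_newlines("<p>What is 2 + 2?</p><p>Pick one &amp; only one.</p>"))
```

Regular expressions are enough for this narrow case, but they are not a general HTML parser; content with nested or malformed markup would probably need a real parser or a converter such as the html2text package mentioned earlier in the thread.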

yogeshmhaskule · 2017-04-26T14:32:20Z

@jayoshih @aronasorman I tried using \n in place of the <p> tag, but ran into the same problem with other tags, and then with .png and base64 images. To handle these I used the "html2text" Python package. It removes the 'img' tag of a base64 image and puts '![]' before the base64 image data, and puts "!\[\]\" before the .png image, so the .png image failed to download. If you could provide more detail on parsing HTML in the "Sample program" of ricecooker, it would help us understand the process and move forward.

I have attached the sample response of question in file:
sample_response.txt

Check the format of the answer content, which is similar to the question content. For reference, look at the answer content in the file; it is a combination of text, MathML, and base64.
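
To make the image problem described above concrete, here is a minimal sketch that rewrites `<img>` tags as Markdown images with the standard-library `html.parser`, without going through html2text. The class and function names are hypothetical, and the `\/` handling simply mirrors the un-escaping used in the script later in this thread; it is an assumption that the image URLs in the real data are escaped the same way.

```python
from html.parser import HTMLParser


class ImgToMarkdown(HTMLParser):
    """Collect text and rewrite <img> tags as Markdown images (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src") or ""
            # Assumption: slashes in URLs may be escaped as "\/"; undo that.
            src = src.replace("\\/", "/")
            self.parts.append("![](%s)" % src)
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # End of a paragraph becomes a blank line.
        if tag == "p":
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown_images(source):
    parser = ImgToMarkdown()
    parser.feed(source)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_markdown_images('<p>See <img src="http:\\/\\/example.com\\/a.png"></p>'))
```

The same function passes `data:` URIs through unchanged, so a base64 image keeps its payload inside the `![](...)` form. All tags other than `img`, `br`, and `p` are dropped while their text is kept, which is a deliberate simplification.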

jamalex · 2017-04-26T21:45:39Z

The examples in your sample text contain MathML source, but also include the images that are the rendered version of the MathML, so we can just use the images in this case. The examples you describe (e.g. ![](...)) are valid Markdown, and will work with the ricecooker, even with base64-encoded images. However, there's some escaping in there, and newlines, that throw it off. The code below shows an example of converting the source you have provided into something that works for the ricecooker. For fuller MathML code (with no image alternative), you'll need to follow the instructions in the other issue.

import json
import requests
import html2text

from ricecooker.classes.nodes import ChannelNode, ExerciseNode
from ricecooker.classes.questions import MultipleSelectQuestion

from le_utils.constants import licenses

def convert_html_to_markdown(html):
    # Un-escape "\/" and drop raw newlines before handing the HTML to html2text.
    return html2text.html2text(html.replace("\\/", "/").replace("\n", ""))

def construct_channel(*args, **kwargs):

    channel = ChannelNode(
        source_domain="test.com",
        source_id="test",
        title="Exercise test",
    )

    exercise = ExerciseNode(source_id="ex1", title="My Ex", license=licenses.CC_BY)
    channel.add_child(exercise)

    question_source = json.loads(requests.get("https://github.com/fle-internal/content-curation/files/958703/sample_response.txt").content.decode())["103898"]

    question = MultipleSelectQuestion(
        id="question",
        question=convert_html_to_markdown(question_source["question"]["content"]),
        correct_answers=[convert_html_to_markdown(a["content"]) for a in question_source["possible_answers"] if a["is_correct"]],
        all_answers=[convert_html_to_markdown(a["content"]) for a in question_source["possible_answers"]],
    )

    exercise.add_question(question)

    return channel
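
The script above relies on a particular shape for each question record: a "question" entry with "content", and a "possible_answers" list whose items carry "content" and "is_correct". The sketch below shows how the two answer lists are derived from that shape, using a small made-up payload. Only the field names and the "103898" key come from the script; the sample text and the `split_answers` helper are hypothetical.

```python
import json

# Hypothetical payload: the field names follow the script above, the values are invented.
raw = """
{"103898": {
  "question": {"content": "<p>Which numbers are even?</p>"},
  "possible_answers": [
    {"content": "<p>2</p>", "is_correct": true},
    {"content": "<p>3</p>", "is_correct": false},
    {"content": "<p>4</p>", "is_correct": true}
  ]
}}
"""


def split_answers(question_source, convert=lambda s: s):
    """Return (correct_answers, all_answers), applying `convert` to each answer's content."""
    answers = question_source["possible_answers"]
    all_answers = [convert(a["content"]) for a in answers]
    correct_answers = [convert(a["content"]) for a in answers if a["is_correct"]]
    return correct_answers, all_answers


question_source = json.loads(raw)["103898"]
correct, everything = split_answers(question_source)
print(correct)
print(everything)
```

Passing an HTML-to-Markdown function as `convert` applies the same conversion to both lists, so the entries in the correct-answer list stay identical to their counterparts in the full list.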

jayoshih mentioned this issue Apr 25, 2017

Magogenie questions(related to MathML) not rendering properly onto content curation server #223

Closed

rtibbles closed this as completed May 4, 2017
