This repository has been archived by the owner on Jan 28, 2020. It is now read-only.

Importer app #4

Closed
Ferdi opened this issue May 11, 2015 · 14 comments

@Ferdi
Contributor

Ferdi commented May 11, 2015

The importer module is responsible for providing APIs to:

  • Import course content as a tarball that users can generate by a) exporting the course from Studio or b) clicking “download” on the repository on GitHub.
  • Parse each item of the course (chapter, sequential, vertical, problem, etc.), extract the relevant metadata and the XML representation of the item, and insert them into the LORE database.
  • Assign each item a universally unique identifier (UUID) if it does not have one.

Uses the interface provided by #3
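
A minimal sketch of the three steps above, not the LORE implementation. It assumes the tarball holds a single self-contained `course.xml` (real exports usually split items across many files), that an existing identifier lives in a `uuid` attribute, and that `import_course` and `ITEM_TAGS` are hypothetical names. The returned records stand in for rows that would go to the database through the interface from #3.

```python
import io
import tarfile
import uuid
import xml.etree.ElementTree as ET

# Hypothetical set of tags treated as importable course items.
ITEM_TAGS = {"chapter", "sequential", "vertical", "problem", "video", "html"}


def import_course(tar_bytes):
    """Return one record per course item found in a course tarball.

    Assumption: the archive contains a single, self-contained course.xml.
    """
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        member = next(
            m for m in archive.getmembers() if m.name.endswith("course.xml")
        )
        root = ET.parse(archive.extractfile(member)).getroot()

    items = [element for element in root.iter() if element.tag in ITEM_TAGS]

    # First pass: give every item an identifier if it lacks one.
    for element in items:
        if not element.get("uuid"):
            element.set("uuid", str(uuid.uuid4()))

    # Second pass: extract metadata and the XML representation.
    return [
        {
            "uuid": element.get("uuid"),
            "type": element.tag,
            "title": element.get("display_name", ""),
            "xml": ET.tostring(element, encoding="unicode"),
        }
        for element in items
    ]
```

Assigning identifiers before serializing means a parent's stored XML already carries the UUIDs of its children.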

@ShawnMilo
Contributor

How should we handle the same course being imported multiple times? How do we
detect it -- is it unique by repository + course number? What if the "duplicate"
course has been modified?

If it is a re-import, do we version the imported Learning Modules, so
there are now multiple versions of those Learning Modules to choose from?
Do we mark older versions of the Learning Modules as deprecated and link
to the newer version?

Deleting or replacing Learning Modules seems like a
bad idea, because it could be destructive to users who have already selected
Learning Modules for export, or who have exported them and may wish to re-export
the same content.

@ShawnMilo
Contributor

Courses have static content (JavaScript, CSS, images). Is there a benefit in
uploading this to a CDN or online storage and embedding links in the imported
Learning Objects? It seems that something like this is needed (URL embedding)
even if only to point to an edX instance, and since there are no guarantees
that a course's content will be live on a particular instance, a CDN seems
like a good alternative. If permissions are an issue, S3 and IS&T are options.

@Ferdi
Contributor Author

Ferdi commented May 12, 2015

org + coursenumber + semester should uniquely identify the course.
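
One way to read this rule, sketched under assumptions: the key is the (org, course number, semester) triple, and a re-import whose content differs is kept as an additional version instead of replacing the old one. The versioning policy is still an open question in this thread, and `CourseKey`, `CourseRegistry`, and the content digest are hypothetical, with an in-memory dict standing in for the database.

```python
import hashlib
from collections import namedtuple

# Hypothetical key type: the three fields named in the comment above.
CourseKey = namedtuple("CourseKey", ["org", "course_number", "semester"])


def content_digest(xml_text):
    """Fingerprint the course content so modified re-imports can be detected."""
    return hashlib.sha256(xml_text.encode("utf-8")).hexdigest()


class CourseRegistry:
    """In-memory stand-in for a table of imported courses."""

    def __init__(self):
        self._versions = {}  # CourseKey -> list of content digests, oldest first

    def register(self, key, xml_text):
        """Return 'new', 'unchanged', or 'new_version' for this import."""
        digest = content_digest(xml_text)
        versions = self._versions.setdefault(key, [])
        if not versions:
            versions.append(digest)
            return "new"
        if versions[-1] == digest:
            return "unchanged"
        # Keep the older versions; nothing is deleted or overwritten.
        versions.append(digest)
        return "new_version"
```

Usage: `CourseRegistry().register(CourseKey("MITx", "8.01", "2015_Spring"), xml_text)`, where the key values are made up for illustration.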

@carsongee
Contributor

I think we will need to upload the static assets to be able to export problems, but we should use Django storage to do that so we can swap out the backend at will.

@ShawnMilo
Contributor

@carsongee, that implies that we need to parse the XML to determine which Learning Objects consume which static content. Looking at a course, it seems that there's no other way to determine which static file belongs to which Learning Object; multiple items could refer to the same resource in their XML, and due to the hierarchy, there will be overlap (if a problem uses a file, then by definition so will the enclosing chapter). So we need a table with a foreign key to Learning Object for storing metadata for the static content.
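
A rough sketch of that parsing step, assuming static content is referenced as `/static/<file_name>` somewhere in an item's attributes or text. Each returned row pairs an item with a file it uses and stands in for a row of the proposed table. The path-style item identifiers and the function names are hypothetical placeholders for a foreign key to a Learning Object.

```python
import re
import xml.etree.ElementTree as ET

# Assumed reference form: /static/ followed by a flat file name.
STATIC_REF = re.compile(r"/static/([\w.\-]+)")


def collect(element, path, rows):
    """Add (item_path, file_name) rows for element and its descendants.

    Returns the set of files used anywhere in the element's subtree, so an
    enclosing item is credited with every file its children use.
    """
    own_text = [element.text or ""] + list(element.attrib.values())
    # Text that follows a child belongs to this element, not to the child.
    own_text += [child.tail or "" for child in element]

    files = set()
    for text in own_text:
        files.update(STATIC_REF.findall(text))
    for index, child in enumerate(element):
        files |= collect(child, f"{path}/{child.tag}[{index}]", rows)

    rows.update((path, name) for name in files)
    return files


def static_references(xml_text):
    """Return sorted (item_path, file_name) rows for a course XML string."""
    root = ET.fromstring(xml_text)
    rows = set()
    collect(root, root.tag, rows)
    return sorted(rows)
```

The overlap described above shows up directly: a file used by a problem produces one row for the problem and one for each enclosing item.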

@carsongee
Contributor

Agreed, static files are tricky, and I'm not sure how completely we want to deal with them at first. We can always store everything for a course in a single folder/container and then, on export, parse the exported XML files to find the static content they need. That is how edX does it, using their "magic" /static/file_name.jpg URL expansion to find the correct files. If we do that, we don't have to worry about storing anything other than the files.
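
A small sketch of the export-time lookup, assuming all of a course's static files sit in one flat folder and that references use the plain `/static/file_name.jpg` form. It does not reproduce edX's actual URL expansion rules, and `files_for_export` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumed reference form: /static/ followed by a flat file name.
STATIC_URL = re.compile(r"/static/([\w.\-]+)")


def files_for_export(xml_text, course_static_dir):
    """Return (found, missing) for the /static/ names referenced in xml_text.

    found is a list of paths that exist in the course folder; missing is a
    list of referenced names with no matching file.
    """
    folder = Path(course_static_dir)
    names = sorted(set(STATIC_URL.findall(xml_text)))
    found = [folder / name for name in names if (folder / name).is_file()]
    missing = [name for name in names if not (folder / name).is_file()]
    return found, missing
```

With this approach nothing but the files themselves is stored at import time; the cost is re-scanning the XML on every export.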

@pwilkins
Contributor

re: "org + coursenumber + semester should uniquely identify the course." I know the assumption is that LORE courses have already run, but what happens if the course changes in its GitHub repository?

@ShawnMilo
Contributor

Regarding testing LORE imports: I need to use a course (or multiple courses) for unit tests. I was thinking it should be edx-demo-course, because it's probably not going to change too much, and it's open to the public (can't have the test suite blocked from running by permissions, nor can we embed private courses in this public repo).

The zip file from GitHub is 12 MB, so I don't want to add it to the repo and cause bloat. I also don't want to download it each time tests are run, so I was going to make a testdata directory in the repo and add it to .gitignore. During test setup, download it there if it isn't already present.

The downsides I can see so far are that edx-demo-course is probably the least likely course to have edge cases, and if it did change, then someone running the tests on a new machine could get different results. Maybe automatically purge & re-download if it's more than a week old or something...
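
The caching idea could be sketched roughly as follows. This is an illustration under assumptions, not code from this repo: `ensure_test_course` is a hypothetical helper, and the actual download is passed in as a caller-supplied function so the sketch does not hard-code a URL or an HTTP library.

```python
import os
import time

MAX_AGE_SECONDS = 7 * 24 * 60 * 60  # "more than a week old"


def ensure_test_course(path, download, max_age=MAX_AGE_SECONDS, now=None):
    """Make sure a course archive exists at path and is fresh enough.

    download is a caller-supplied function that writes the archive to the
    path it is given. Returns True if a download happened, False if the
    cached copy was reused.
    """
    now = time.time() if now is None else now
    if os.path.exists(path):
        if now - os.path.getmtime(path) <= max_age:
            return False
        os.remove(path)  # purge the stale copy before re-downloading
    # Create the (git-ignored) testdata directory on first use.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    download(path)
    return True
```

A test setup hook would call this once per archive, for example with a path under `testdata/`. Note that the weekly purge only bounds how stale a copy can get; it does not make results identical across machines.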

Ideally I'd like the tests to run against multiple, fairly complicated course repositories, and have those repos be guaranteed not to change without having to bloat this codebase.

Any suggestions?

@pdpinch
Member

pdpinch commented May 13, 2015

This doesn't completely address your question, but we should also include this course in testing: https://github.com/pmitros/edX-Insider

It is referenced from the OLX documentation as a model OLX course without a Studio provenance. It won't surface all our edge cases, but it will surface some.

Totally wacky idea -- @carsongee is there some (reasonable) way we could add course import testing to our course test suite? That works with private course repos, right?

@ShawnMilo
Contributor

@pdpinch: Thanks, I'll add that as well.

@carsongee
Contributor

I think we should make our own example course that exercises edge cases. That is what the devops course is for us, and there is no reason not to make one here and use submodules to include it in the testdata directory (we can also sanitize that course for public consumption if we want to use it). I would also expect to need multiple courses for unit testing. We can do that with very small, simplified courses, as is done here: https://github.com/edx/edx-platform/tree/master/common/test/data

@carsongee
Contributor

wow, hit submit too early

@pdpinch
Member

pdpinch commented May 19, 2015

Which PRs need to be merged to close this?

@ShawnMilo
Contributor

#21. It includes the other, but was rebased again.


5 participants