Migrate all content to markdown #10

Closed
lxbarth opened this Issue Dec 20, 2012 · 34 comments

7 participants
Contributor

lxbarth commented Dec 20, 2012

We'll need to migrate all content to the new site.

This will mean:

  • Come up with a process to do so (@lxbarth will have a look, any ideas welcome right away)
  • Do it (this might require a call for volunteers to chime in)
Contributor

lxbarth commented Dec 20, 2012

The migration targets are Markdown documents in the _posts directory of the gh-pages branch.

Documents are organized by language and by guide:

en/
en/beginner
en/beginner/0200-12-30-introduction.md
...

Note that the date in the path is used to order chapters.
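
A minimal Python sketch of how the date prefix in these file names could be used to order chapters. The helper names `chapter_sort_key` and `order_chapters`, the regular expression, and the second file name are illustrative assumptions, not part of the site build. Whether the site lists chapters oldest-first or newest-first is not stated here, so the direction is left as a parameter.

```python
import re

# Assumed file name shape, taken from the example path above:
# YYYY-MM-DD-slug.md
FILENAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.md$")


def chapter_sort_key(filename):
    """Return a sortable (year, month, day, slug) tuple for a chapter file."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected chapter file name: {filename!r}")
    year, month, day, slug = match.groups()
    return (int(year), int(month), int(day), slug)


def order_chapters(filenames, newest_first=False):
    """Sort chapter files by the date in their name (direction is an assumption)."""
    return sorted(filenames, key=chapter_sort_key, reverse=newest_first)


# "0200-12-29-example-chapter.md" is a made-up name for illustration only.
chapters = ["0200-12-30-introduction.md", "0200-12-29-example-chapter.md"]
print(order_chapters(chapters))
```

A file name that does not match the pattern raises an error rather than being silently sorted to one end, which makes stray files easy to spot during migration.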

mvexel commented Dec 20, 2012

@lxbarth you can assign me and @aplannersguide to this

Pandoc seems to be working fine for this. I have to download the Google Doc as HTML because pandoc doesn't work with HTTPS. Do we need to migrate the images as well? Also, how do you want to organize tasks so we are not replicating each other's work?

mvexel commented Dec 20, 2012

We either need to publish all the docs as HTML pages, or maybe you can pipe curl
output to pandoc?

Not sure. I've never used pandoc before. You can read more about it at http://johnmacfarlane.net/pandoc/README.html. I viewed the markdown files I created and they are pretty rough. I think I need to review the documentation a bit more to get the output into a usable format. I'm not a coder so I will need some hand holding along the way.

The markdown files pandoc creates from the downloaded Google Doc HTML file are not working. I think it has something to do with how Google formats the HTML in Google Docs. A workaround I found is to use pandoc to convert the HTML to rich text format files and then do some simple edits in the file (basic find and replace) before adding a new file to the GitHub repo. It is a bit more manual work but will get the job done. I'm really new to GitHub and need someone (@mvexel @lxbarth) to create the corresponding folders (intermediate, advanced) for me so I can get started on adding the intermediate and advanced tutorials.

mvexel commented Dec 21, 2012

Maybe there is a way to simplify the HTML first before feeding it into pandoc? I am trying rtf output from Google Docs, but that does not look very promising either. Next I will try ODT output and export html from LibreOffice and feed that into pandoc.

mvexel commented Dec 21, 2012

When I export the gdoc https://docs.google.com/document/d/1j-h2ke5rAc9wmg5O8dTRnokxoL4LhO6r6SnDpW4Y-ig/edit to ODT, load in LibreOffice, export to HTML, feed that into pandoc, I get this md: https://gist.github.com/4355365 which when I feed it into http://daringfireball.net/projects/markdown/dingus looks reasonable - not perfect, but manageable.

Contributor

lxbarth commented Jan 2, 2013

@mvexel this will need some manual cleanup, but not too bad. Any ETA when you'd like to push on the conversion or any road blocks where you'd need my help?

@mvexel @lxbarth I'm thinking we just need a process to break up the task
so we aren't duplicating each other's work. Personally I think it makes more
sense to break it up by the major sections of each of the guides instead of
by individual guide. It would be helpful for me if either of you could create
the folder structure for the major sections of the learning guides and
beginner guides on the GitHub site so I could fork and start adding my
contributions. If I weren't such a GitHub newbie I would do it myself.
Thoughts?

Tim Moreland
www.aplannersguide.com

mvexel commented Jan 4, 2013

Agreed, although in my impression we were just exploring some ways to get this done as efficiently as possible & in that phase some duplication is not bad at all. I want to explore a few more options to minimize manual labor before settling on a method. I can have another look at that tonight. If @lxbarth or @aplannersguide could create a sensible division of work in parallel I think we can be good to go soon.

Contributor

lxbarth commented Jan 10, 2013

@mvexel - soon enough to create division of labor once you come up with a migration plan? I'm thinking migration plan + first wave of migration can be done by a single person. We then divvy up manual review between us.

Member

MappingKat commented Jan 22, 2013

So have you all decided on a migration plan? I would like to mimic it for our team here in Indonesia.

Contributor

lxbarth commented Jan 23, 2013

@MappingKat - rough plan: convert with pandoc, clean up by hand. Seems like @mvexel hasn't had the time yet to beat the path, I suggest you go ahead :) Let me know how things go. I am also on IRC #learnosm right now.

If we are moving ahead, do you want to break it up into sections for concurrent edits without duplication? Another option would be to have one person do the basic conversion and another go back in manually afterwards. Thoughts?

Tim

Contributor

lxbarth commented Jan 23, 2013

If we are moving ahead

@aplannersguide - Right now all that is moving is

  1. Site build (@jueyang)
  2. Adding indonesian content (@MappingKat)
  3. Figuring out migration strategy (@mvexel) - this is essentially: deciding your question around "break it up into sections" and then taking a lead on coordination.

Any further content migration work outside of Indonesian is pending on (3). We have expedited Indonesian as the team there has resources freeing up right now. If you'd like to take a lead on content migration (3), please coordinate with @mvexel - looks like he's gotten sidetracked with something else. Also feel free to poke me on #learnosm to talk through this. Thank you!

mvexel commented Jan 23, 2013

I have - basically no news since my last msg. Did not find a viable easy
route to converting horrible gdocs html to md. If someone wants to take
lead that'd be great.

Member

MappingKat commented Jan 28, 2013

Sorry, but how are images taken care of? How are they extracted and put in a folder? Bit confused on that matter.

Contributor

lxbarth commented Jan 28, 2013

@MappingKat - Images all go in the images/ directory, organized by guide:

images/ - general images
images/beginner - images for beginner guide

Thinking about this again, I'm thinking we don't need directories for different language images (e.g. for screenshots) but should just use language suffixes in the file name, e.g.:

josm-split-way-en.png
josm-split-way-es.png

@wonderchook @MappingKat - thoughts? Using suffixes for languages saves us a weird directory structure where we would have directories for the same guide with often few pictures in multiple language directories.
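
A small Python sketch of the suffix idea, for illustration only. The function name `localized_image_path` and its parameters are hypothetical; the only things taken from this thread are the images/ and images/beginner directories and the `-en` / `-es` suffix pattern.

```python
from pathlib import PurePosixPath


def localized_image_path(name, lang, guide=None, ext="png"):
    """Build an image path that carries the language as a file name suffix.

    guide=None means a general image stored directly under images/.
    """
    directory = PurePosixPath("images")
    if guide:
        directory = directory / guide
    return str(directory / f"{name}-{lang}.{ext}")


print(localized_image_path("josm-split-way", "en", guide="beginner"))
print(localized_image_path("josm-split-way", "es", guide="beginner"))
```

Both language versions land in the same guide directory, which is the point of the suggestion: no per-language directory tree is needed.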

Member

MappingKat commented Jan 28, 2013

Well, there are definitely some screenshots that are in Indonesian. It is rare,
but there are a few. Hm. Let me think this one over and ask the team.

For text is this the best set of instructions?

TEXT

  1. Export GoogleDoc to ODT
  2. Open in LibreOffice or Office and export to HTML
  3. Feed that into Pandoc (downloaded at http://code.google.com/p/pandoc/downloads/list)
  4. Feed result into http://daringfireball.net/projects/markdown/dingus
  5. Add header and edit in Github

Contributor

lxbarth commented Jan 29, 2013

@MappingKat - Not sure what the dingus step is for?

Member

MappingKat commented Jan 29, 2013

I thought from the thread that the markdown from pandoc was not good
enough for GitHub and you needed to add the dingus step. Am I incorrect?

Contributor

lxbarth commented Jan 29, 2013

@MappingKat - sorry, missed that in @mvexel's earlier comment. Your process looks good, let me know how this goes.

Member

MappingKat commented Jan 30, 2013

Most members of the Indo team are using Windows and I am so confused how to use pandoc with it. Should we use the command: pandoc -f html -t markdown file.html?

If so, where does the markdown file go? A clear set of instructions would be super helpful! Thanks!

My understanding is that you have to change directories (CD) in the command
line to where the files are located. Then your command would look something
like this:

pandoc -f html -t markdown YOURFILE.html -o YOURFILE.md

This will place the markdown file into the folder where the file was
located. I think the only thing missing from your command was the output
option (-o). I hope this helps.

Tim Moreland
www.aplannersguide.com
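
For anyone who would rather script the pandoc call described above than type it for every file, here is a rough Python sketch. It assumes pandoc is installed and on the PATH; the helper names `build_pandoc_command` and `convert` are made up for this example. It uses only the `-f`, `-t` and `-o` options already mentioned in this thread.

```python
import subprocess
from pathlib import Path


def build_pandoc_command(html_file):
    """Return the pandoc argument list for converting one HTML file to markdown.

    The output file sits next to the input, with the extension changed to .md.
    """
    source = Path(html_file)
    output = source.with_suffix(".md")
    return ["pandoc", "-f", "html", "-t", "markdown", str(source), "-o", str(output)]


def convert(html_file):
    """Run pandoc on one file; raises CalledProcessError if pandoc fails."""
    subprocess.run(build_pandoc_command(html_file), check=True)


print(build_pandoc_command("YOURFILE.html"))
```

Because the command is passed as a list rather than a single string, the same script should behave the same way on Windows and on Linux or Mac, with no shell quoting to worry about.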

Contributor

jueyang commented Jan 31, 2013

I realized you can download Google Docs as .html directly. This saves the odt-->html step, making the conversion straightforward. See the wiki page: https://github.com/hotosm/learnosm/wiki/Google-Doc-to-Markdown-Procedures

Member

MappingKat commented Jan 31, 2013

I thought that the HTML version that Google Docs provided was a bit funky in
pandoc. If that is not the case, then it's excellent that we don't need one more
step.

Contributor

jueyang commented Jan 31, 2013

So far I haven't run into any funkiness using Download as .html directly from Google Docs. The resulting md posts are cleaner and do not require much editing. I didn't need to feed them to dingus.

Also, downloading directly from Google actually packages up the images nicely, which is great for syncing up images. See image process wiki

Contributor

lxbarth commented Mar 21, 2013

The beginner's guide was fully migrated before launch. Migration of any other content can continue. Neither @jueyang nor I plan to continue migrating other content. We should coordinate file structure.

Suggestion:

_posts/[LANG]/[GUIDE]/-0200-[MM]-[DD]-[path].md
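
A hedged Python sketch of building a post path in this structure. It follows the example path given earlier in this thread (en/beginner/0200-12-30-introduction.md) and therefore assumes no dash before the year; the function name `post_path` and its parameters are hypothetical.

```python
from pathlib import PurePosixPath


def post_path(lang, guide, month, day, slug, year="0200"):
    """Build _posts/LANG/GUIDE/YEAR-MM-DD-slug.md (layout is an assumption)."""
    filename = f"{year}-{month:02d}-{day:02d}-{slug}.md"
    return str(PurePosixPath("_posts") / lang / guide / filename)


print(post_path("en", "beginner", 12, 30, "introduction"))
```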

Member

yohanboniface commented Apr 1, 2013

As discussed yesterday on IRC, we (Haiti team) are interested in having (at least some of) the intermediate docs online and translated.
I will work today on migrating the contents and photos from Gdocs.
One question: do we need all the intermediate docs to be able to start publishing them in the brand new LearnOSM site, or can we publish/translate them one by one?

Member

yohanboniface commented Apr 1, 2013

Just pushed a first raw version of "Edit in detail" (in a separate branch, intermediate): https://github.com/hotosm/learnosm/blob/intermediate/_posts/en/intermediate/0300-12-31-edit-in-detail.md (including pictures).
I've made a quick markup review, but a good proofreading is needed.
And, btw, I find this doc too long to be really usable, and I suggest splitting it (maybe moving out the appendix, as a first step).

Member

yohanboniface commented Apr 1, 2013

For the record, and in case it can help other people in the future, here is the process I'm following.

  1. Go on EvilGoogleDoc and download as HTML zipped
  2. Unpack the zip
  3. In the unzipped dir, run (replace with correct names):
     $ pandoc -f html -o 0300-12-20-quality-assurance.md -t markdown 02QualityAssurance.html
  4. In the images directory, run (replace with correct prefix):
     $ for file in *.png ; do mv "$file" "en_quality_assurance_${file}" ; done
  5. In the created .md file, replace (using regex) images/(image\d{2}\.png) by {{site.baseurl}}/images/intermediate/en_quality_assurance_$1
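
For those without a regex-capable editor, the image path replacement in the last step can be sketched in Python as below. This is only an illustration under assumptions: the function name `rewrite_image_refs` is made up, the guide name and prefix are passed in as parameters, and the pattern matches the imageNN.png names from the step above with the dot escaped.

```python
import re

# Matches references like images/image01.png and captures the file name.
IMAGE_REF = re.compile(r"images/(image\d{2}\.png)")


def rewrite_image_refs(markdown, guide, prefix):
    """Point image references at the site's images/GUIDE/ directory.

    The captured file name is kept and given the same prefix used when
    renaming the files on disk.
    """
    def replacement(match):
        return "{{site.baseurl}}/images/" + guide + "/" + prefix + match.group(1)

    return IMAGE_REF.sub(replacement, markdown)


sample = "![](images/image01.png)"
print(rewrite_image_refs(sample, "intermediate", "en_quality_assurance_"))
```

Using a replacement function instead of a replacement string avoids any surprises if a prefix ever contains a backslash or a digit that could be mistaken for a group reference.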
Member

MappingKat commented Apr 3, 2013

+1

I don't see any problem with publishing them one by one... thanks for all the hard work and
the detailed migration process. Much appreciated.

jmarlena closed this Dec 10, 2014

althio referenced this issue in hotosm/hotosm-project-ideas Feb 19, 2015

Open

Define Guidelines and Workflow for Images in LearnOSM #14

@Nick-Tallguy Nick-Tallguy added a commit that referenced this issue Jun 6, 2015

@Nick-Tallguy Nick-Tallguy Merge pull request #10 from Nick-Tallguy/ac
All-chapters to head of index in remaining language sections
0b23018