Mister Wong Import tweaks #146

Closed
spackmat opened this Issue Nov 16, 2013 · 3 comments

spackmat commented Nov 16, 2013

Hi,

I migrated my old Mister Wong bookmarks file into Shaarli, but I had to work around several format glitches first. One of them could be automated by Shaarli, and one seems to be a kind of bug in Shaarli:

  • Mister Wong has its tags in the tags attribute of the link, but also in the format " (tags: tag1 tag2)" after the description. This is a Mister Wong-specific problem and can easily be corrected with a RegEx over the file before the import.
  • The dates in the bookmarks file come in the format Y-m-d H:i:s instead of a UNIX timestamp. This was a little harder for me to convert, so I scripted a small converter for it, which I have posted at the bottom of this report. The core could also be implemented in Shaarli, as it is just a simple PHP call to convert Y-m-d H:i:s dates into U dates:
// in importFile() instead of
elseif ($attr=='ADD_DATE') $raw_add_date=intval($value);

// you could use something like this
elseif ($attr=='ADD_DATE')
{
  // $value is a string here, so check for digits; is_int() would always be false
  if (ctype_digit($value)) $raw_add_date=intval($value);
  elseif (preg_match('/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/', $value)) $raw_add_date=intval(DateTime::createFromFormat("Y-m-d H:i:s", $value)->format("U"));
}
  • All descriptions are imported with a closing DD tag. Since the importer splits on the opening DD tag, this happens with every bookmark file that closes its DD tags. To fix it, simply replace the following line:
// in importFile() instead of
$link['description'] = (isset($d[1]) ? html_entity_decode(trim($d[1]),ENT_QUOTES,'UTF-8') : '');  // Get description (optional)

// use something like this
$link['description'] = (isset($d[1]) ? html_entity_decode(str_replace('</DD>', '', trim($d[1])),ENT_QUOTES,'UTF-8') : '');  // Get description (optional)

This would help former Mister Wong users a lot.
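
As a rough illustration of the date handling proposed above, the conversion could be pulled into a small helper. This is only a sketch: the function name `normalizeAddDate` is hypothetical and not part of Shaarli, and it assumes that dates without a timezone should be read as UTC, which the Mister Wong export does not state.

```php
<?php
// Hypothetical helper, not part of Shaarli: turn an ADD_DATE attribute value
// into a UNIX timestamp, accepting either plain digits or "Y-m-d H:i:s".
function normalizeAddDate($value)
{
    $value = trim($value);
    if (ctype_digit($value)) {
        // Already a UNIX timestamp stored as a string of digits.
        return (int) $value;
    }
    // Assumption: dates without a timezone are treated as UTC.
    $date = DateTime::createFromFormat('Y-m-d H:i:s', $value, new DateTimeZone('UTC'));
    // createFromFormat() returns false when the string does not match the format;
    // null is returned in that case so the caller can pick a fallback.
    return $date === false ? null : $date->getTimestamp();
}

var_dump(normalizeAddDate('1384560000'));          // int(1384560000)
var_dump(normalizeAddDate('2013-11-16 00:00:00')); // int(1384560000)
```

Returning null for unrecognised values leaves the fallback (for example the current time) to the caller instead of silently inventing a date.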

The code for my small Mister Wong cleanup script:

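$html = file_get_contents(__DIR__ . '/mister_wong_export.html');
// Remove the redundant " (tags: ...)" suffix together with the closing </DD> tag
$html = preg_replace('/( \(tags:.*?\))?<\/DD>/', '', $html);
// Convert "Y-m-d H:i:s" dates into UNIX timestamps
// (anonymous function instead of create_function(), which was removed in PHP 8)
$html = preg_replace_callback('/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/',
  function ($match) {
    return DateTime::createFromFormat("Y-m-d H:i:s", $match[0])->format("U");
  }, $html);
file_put_contents(__DIR__ . '/mister_wong_export_washed.html', $html);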
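
To make the effect of the two replacements easier to see, here is a small worked example on a single entry. The sample entry is made up, modelled on the format described in this report rather than copied from a real export, the function name `washMisterWongExport` is hypothetical, and UTC is assumed for the dates so that the result does not depend on the server's timezone.

```php
<?php
// Made-up sample entry, modelled on the format described in this report.
$sample = '<DT><A HREF="http://example.com/" ADD_DATE="2013-11-16 00:00:00" TAGS="php,tools">Example</A>' . "\n"
        . '<DD>A useful site (tags: php tools)</DD>';

// The same two steps as the cleanup script, wrapped in a hypothetical function.
function washMisterWongExport($html)
{
    // Drop the optional " (tags: ...)" suffix together with the closing </DD>.
    $html = preg_replace('/( \(tags:.*?\))?<\/DD>/', '', $html);
    // Convert "Y-m-d H:i:s" dates to UNIX timestamps (UTC is an assumption here).
    return preg_replace_callback('/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/', function ($match) {
        $date = DateTime::createFromFormat('Y-m-d H:i:s', $match[0], new DateTimeZone('UTC'));
        return $date->format('U');
    }, $html);
}

echo washMisterWongExport($sample), "\n";
// <DT><A HREF="http://example.com/" ADD_DATE="1384560000" TAGS="php,tools">Example</A>
// <DD>A useful site
```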

nodiscc commented Nov 4, 2014

Hi @spackmat, what do you think about adding this to the wiki? I could do it, but we need to format this as a proper script first.

There are already other tweaks at https://github.com/shaarli/Shaarli/wiki#notes for importing from some services. I don't think Shaarli should support every format out there (keep it simple). Buuuut having external tools/import/export scripts is nice.

spackmat commented Nov 14, 2014

Hi @nodiscc,

This is a good idea, but at the moment I'm on vacation and don't have any brain capacity to finish this. It would be cool if you could do it. Or simply link to this ticket from the wiki. But to be honest, I don't think there are many old Mister Wong users left who want to import their old exports into anything at all. So I think this should be more of a generic hint for such tasks in general: how to fix oddly formatted bookmark exports so that they import properly into Shaarli.

Greets,
spackmat

spackmat closed this Nov 14, 2014

nodiscc commented Nov 14, 2014

@spackmat cool, I linked to this issue from https://github.com/shaarli/Shaarli/wiki#importing-from-mister-wong. Yep, I didn't know of Mister Wong, so maybe it's not worth the headache to write a proper script for this (few users). Thanks for your help anyway!
