Non-English language support #29

lfschafaschek · 2021-05-25T14:45:19Z

With a Kindle set to Portuguese, the highlights' location and date aren't added in Notion. I think the problem is the date format, which is different in Portuguese.
I tested changing the language to English and creating a new highlight; this one was exported correctly (it's the last entry in the file).
My Clippings.txt

paperboi · 2021-05-25T17:44:01Z

For reference, this is the snippet of code that scrapes out the location, page and date information from the text file.
See lines 49-53 in /kindle2notion/parsing.py

The function that addresses this is pasted below:

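from typing import List, Tuple

from dateutil.parser import parse  # assumed origin of parse(); the pasted snippet omits its imports


def _parse_page_location_and_date(raw_clipping_list: List) -> Tuple[str, str, str]:
    # The second line of a clipping holds the metadata, separated by ' | '.
    second_line = raw_clipping_list[1]
    second_line_as_list = second_line.strip().split(' | ')
    page = location = date = ''
    for element in second_line_as_list:
        # Keywords are matched case-insensitively, and only in English.
        element = element.lower()
        if 'page' in element:
            page = element[element.find('page'):].replace('page', '').strip()
        if 'location' in element:
            location = element[element.find('location'):].replace('location', '').strip()
        if 'added on' in element:
            # Parse the English date string, then normalise its format.
            date = parse(element[element.find('added on'):].replace('added on', '').strip())
            date = date.strftime('%A, %d %B %Y %I:%M:%S %p')

    return page, location, date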

One would need to replace 'page', 'location' and 'added on' in lines 49, 51 and 53 with the equivalent terms used in the respective My Clippings.txt file to get the relevant result.

In your case, from my limited understanding, these would be 'destaque na página', 'destaque ou posição' and 'Adicionado:'.

Leaving this issue open because I'm unsure how to incorporate this feature within the structure of the package. I'm open to hearing input from the GH community on this one. A working solution may be to identify the language when scraping the first clipping and then adapt the keywords used for matching accordingly. I can change the language on my Kindle and make some test clippings so that they get saved in that language in the My Clippings file, and code from there.
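
A minimal sketch of the detect-then-adapt idea described above, not the package's actual implementation. The KEYWORDS table, detect_language and parse_page_location_and_date are hypothetical names, the Portuguese keywords are shortened forms of the guesses in this thread, and the sample line is made up for illustration; all of these would need checking against a real My Clippings.txt. The date is returned as raw text because parsing localized dates is a separate problem.

```python
from typing import Dict, Tuple

# Hypothetical keyword table. The Portuguese entries are unverified guesses
# based on this thread, not values taken from the package.
KEYWORDS: Dict[str, Dict[str, str]] = {
    "en": {"page": "page", "location": "location", "date": "added on"},
    "pt": {"page": "página", "location": "posição", "date": "adicionado:"},
}


def detect_language(second_line: str) -> str:
    """Return the first language whose date keyword appears in the line."""
    lowered = second_line.lower()
    for language, words in KEYWORDS.items():
        if words["date"] in lowered:
            return language
    return "en"  # fall back to the current English-only behaviour


def parse_page_location_and_date(second_line: str, language: str) -> Tuple[str, str, str]:
    """Return the text that follows each keyword, lowercased like the original."""
    words = KEYWORDS[language]
    found = {"page": "", "location": "", "date": ""}
    for element in second_line.strip().split(" | "):
        lowered = element.lower()
        for field, keyword in words.items():
            if keyword in lowered:
                start = lowered.find(keyword) + len(keyword)
                found[field] = lowered[start:].strip()
    return found["page"], found["location"], found["date"]


# Made-up metadata line in the assumed Portuguese layout.
sample = "- Seu destaque na página 12 | posição 180-182 | Adicionado: terça-feira, 25 de maio de 2021 14:45:19"
language = detect_language(sample)
page, location, date = parse_page_location_and_date(sample, language)
```

Detecting the language once, from the first clipping, keeps the per-clipping parsing cheap; a file that mixes languages would need detection per clipping instead.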

asyr01 · 2021-06-08T21:54:11Z

Really appreciate the hard work you put in.
There is no problem with English. However, with my Turkish books,
words that include special Turkish letters are incomplete in Notion.
For example, non-English letters such as "i, ç, ü, ö" are missing.
Maybe we could find some way to handle it.
Also, when we run the script a second time, could it skip the clippings that already exist
and only append the new ones? Is that possible?
Thanks, have a good one.

paperboi · 2021-06-10T01:29:33Z

Placing #46 here for reference. Thanks for contributing again @asyr01!

Regarding your second question, the current package is already capable of doing that. It can be optimized with a JSON structure to track clippings instead of the current method.
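
One way the JSON-based tracking mentioned above might look, as a rough sketch under stated assumptions: clipping_id, filter_new_clippings and the state file are hypothetical and do not describe how the package currently detects existing clippings. Each clipping is hashed, and the set of hashes seen so far is kept in a small JSON file.

```python
import hashlib
import json
from pathlib import Path
from typing import Iterable, List, Set


def clipping_id(clipping: str) -> str:
    """Stable identifier for a clipping: SHA-256 of its stripped text."""
    return hashlib.sha256(clipping.strip().encode("utf-8")).hexdigest()


def filter_new_clippings(clippings: Iterable[str], state_path: Path) -> List[str]:
    """Return clippings not seen in earlier runs and record them in state_path."""
    seen: Set[str] = set()
    if state_path.exists():
        seen = set(json.loads(state_path.read_text(encoding="utf-8")))

    new_clippings: List[str] = []
    for clipping in clippings:
        identifier = clipping_id(clipping)
        if identifier not in seen:
            seen.add(identifier)
            new_clippings.append(clipping)

    # Persist the updated set so the next run skips everything seen so far.
    state_path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return new_clippings
```

Running it twice on the same input returns every clipping the first time and an empty list the second time.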

mefonseca · 2021-06-11T14:27:46Z

Hi! Really appreciate this package!
I was also using a Kindle in Portuguese and not getting the location and date. I changed my device to English and it is all good now.
However, non-English letters are missing. I think it is the same problem as @asyr01's.
"Transformação" -> "Transformao"
"Mudança" -> "Mudana"
"Está" -> "est"
"Você" -> "voc"

I saw that the last commit was regarding this issue:

raw_clippings_text = raw_clippings_text.encode("ascii", errors="xmlcharrefreplace").decode()

If it only used utf-8-sig, I think it would read the non-English letters (I tried manually running the function "read_raw_clippings" on my "My Clippings.txt"), but I don't know what would happen in other parts of the code.

raw_clippings_text = open(clippings_file_path, "r", encoding="utf-8-sig").read()
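
For comparison, a small standalone sketch of what these calls do to one of the words above, using only the Python standard library. It shows the behaviour of the encodings themselves; which step in the package actually produced the dropped letters is not established in this thread.

```python
import codecs

word = "Transformação"

# ASCII with xmlcharrefreplace keeps the information, but as numeric
# character references rather than as the letters themselves.
escaped = word.encode("ascii", errors="xmlcharrefreplace").decode()

# ASCII with errors="ignore" silently drops every non-ASCII letter.
dropped = word.encode("ascii", errors="ignore").decode()

# Decoding with utf-8-sig strips a leading byte order mark, if present,
# and leaves the accented letters intact.
raw_bytes = codecs.BOM_UTF8 + word.encode("utf-8")
decoded = raw_bytes.decode("utf-8-sig")

print(escaped)  # Transforma&#231;&#227;o
print(dropped)  # Transformao
print(decoded)  # Transformação
```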

Thank you!

paperboi · 2021-06-25T17:59:14Z

Thanks for the tip @mefonseca! Implemented your request in the latest release.
@asyr01 please update the package and try running it on your system. It should account for those letters now.

@lfschafaschek Will implement custom Portuguese support soon!

Thank you all for your patience and goodwill. Hope this fix addresses your issues here.

huhlik-cz · 2021-10-05T11:58:03Z

Hi, I'm running the latest version and I have the same issue as above, but with Czech characters like these: ěščřžňů. Can the Czech language also be supported? Thank you!

paperboi changed the title ~~Highlights's date and locations aren't exported~~ Non-English language support May 25, 2021

paperboi added the labels enhancement (New feature or request), good first issue (Good for newcomers) and help wanted (Extra attention is needed) May 25, 2021

paperboi added this to To do in Enhancements Jun 25, 2021
