Justified Output of Chinese Characters not Aligned Properly #204

Closed
cheunglone opened this issue Nov 28, 2018 · 13 comments

@cheunglone

I am trying to set Chinese text in two columns with "justified" text alignment, but the text is not aligned properly.

English text is aligned properly with the same layout.xml.

image

image

@pgundlach pgundlach self-assigned this Nov 28, 2018
@pgundlach pgundlach added the Bug label Nov 28, 2018
@pgundlach
Member

This looks ugly. I can't promise a quick solution, but I'll have a look.

@cheunglone
Author

cheunglone commented Nov 28, 2018 via email

@pgundlach
Member

There was no attachment in your mail (probably filtered by GitHub), but I think I can start debugging without these files. I'll come back to you if I need help.

@pgundlach
Member

Sorry for the late reply. Do you have a test document? I got good results with my texts.

BTW: do you (by chance) have line breaks in the text without having set ignoreeol with <Options>?

@cheunglone
Author

Hi,

Thanks for your update.

I also wondered whether that was the case, but I checked and it seems unlikely. Please find the test document attached.

Uploading annual_report.zip…

@cheunglone
Author

data - ico - eng.xml.txt

data - ico - chn.xml.txt

layout.xml.txt

It seems I cannot upload a zip of the working folder. Please see the attached data.xml and layout.xml.

@pgundlach
Member

Thank you. I can now reproduce the problem with my setup. I'll have a look.

@pgundlach
Member

This is a smaller test file for the problem.

<Layout xmlns="urn:speedata.de:2009/publisher/en" xmlns:sd="urn:speedata:2009/publisher/functions/en">
	<Options ignoreeol="yes"/>
	<Trace grid="yes"/>

	<SetGrid nx="19" height="12pt"/>

	<Pagetype name="x" test="true()">
		<Margin left="1cm" right="2cm" top="1cm" bottom="1cm"/>
		<PositioningArea name="2c">
			<PositioningFrame width="9" height="15" row="10" column="1"/>
			<PositioningFrame width="9" height="15" row="10" column="11"/>
		</PositioningArea>
	</Pagetype>

	<LoadFontfile name="R" filename="Arial Unicode.ttf"/>

	<DefineFontfamily name="text" fontsize="10" leading="12">
		<Regular fontface="R"/>
	</DefineFontfamily>

	<Record element="data">
		<ProcessNode select="para"/>
	</Record>

	<Record element="para">
		<Output area="2c" allocate="auto">
			<Text>
				<Paragraph textformat="text">
					<Value select="."/>
				</Paragraph>
			</Text>
		</Output>
	</Record>
</Layout>

And the data:

<data>
  <para>本人謹代表揚科集團有限公司(「本公司」)董事會(「董事會」)欣然提呈本公司截至2018年3月31日止年度(「2018財政年度」)的年報,包括本公司及其附屬公司(統稱「本集團」)經審核綜合財務報表。</para>
  <para>2018財政年度對本集團而言是頗具挑戰的一年。本集團於2018財政年度錄得本公司股東應佔綜合虧損凈額約11.2百萬港元。虧損狀況乃由於本集團於2018財政年度不利的投標結果及本集團現有大型資訊科技項目實施階段較過往年度確認的收入大幅減少(原因為該等項目的實施階段已進入其收尾階段及大致上於2018財政年度完工)之合併結果。</para>
</data>

@iclukas
Contributor

iclukas commented Dec 14, 2018

I don’t think this is really a bug. The Chinese text is missing spaces, so it can’t be justified. You can add zero-width spaces after each character to make justification work, like so:

<Paragraph textformat="text">
	<Loop select="string-length(string(.))" variable="i">
		<Value select="substring(string(.),$i,1)"/>
		<HSpace width="0"/>
	</Loop>
</Paragraph>

You could also add zero-width spaces to your source data:

<data>
  <para>本&#8203;人&#8203;謹&#8203;代&#8203;表&#8203;揚&#8203;科&#8203;集&#8203;團&#8203;有&#8203;限&#8203;公&#8203;司&#8203;(&#8203;「&#8203;本&#8203;公&#8203;司&#8203;」&#8203;)&#8203;董&#8203;事&#8203;會&#8203;(&#8203;「&#8203;董&#8203;事&#8203;會&#8203;」&#8203;)&#8203;欣&#8203;然&#8203;提&#8203;呈&#8203;本&#8203;公&#8203;司&#8203;截&#8203;至&#8203;2018年&#8203;3月&#8203;31日&#8203;止&#8203;年&#8203;度&#8203;(&#8203;「&#8203;2018財&#8203;政&#8203;年&#8203;度&#8203;」&#8203;)&#8203;的&#8203;年&#8203;報&#8203;,&#8203;包&#8203;括&#8203;本&#8203;公&#8203;司&#8203;及&#8203;其&#8203;附&#8203;屬&#8203;公&#8203;司&#8203;(&#8203;統&#8203;稱&#8203;「&#8203;本&#8203;集&#8203;團&#8203;」&#8203;)&#8203;經&#8203;審&#8203;核&#8203;綜&#8203;合&#8203;財&#8203;務&#8203;報&#8203;表&#8203;。</para>
  <para>2018財&#8203;政&#8203;年&#8203;度&#8203;對&#8203;本&#8203;集&#8203;團&#8203;而&#8203;言&#8203;是&#8203;頗&#8203;具&#8203;挑&#8203;戰&#8203;的&#8203;一&#8203;年&#8203;。&#8203;本&#8203;集&#8203;團&#8203;於&#8203;2018財&#8203;政&#8203;年&#8203;度&#8203;錄&#8203;得&#8203;本&#8203;公&#8203;司&#8203;股&#8203;東&#8203;應&#8203;佔&#8203;綜&#8203;合&#8203;虧&#8203;損&#8203;凈&#8203;額&#8203;約&#8203;11.2百&#8203;萬&#8203;港&#8203;元&#8203;。&#8203;虧&#8203;損&#8203;狀&#8203;況&#8203;乃&#8203;由&#8203;於&#8203;本&#8203;集&#8203;團&#8203;於&#8203;2018財&#8203;政&#8203;年&#8203;度&#8203;不&#8203;利&#8203;的&#8203;投&#8203;標&#8203;結&#8203;果&#8203;及&#8203;本&#8203;集&#8203;團&#8203;現&#8203;有&#8203;大&#8203;型&#8203;資&#8203;訊&#8203;科&#8203;技&#8203;項&#8203;目&#8203;實&#8203;施&#8203;階&#8203;段&#8203;較&#8203;過&#8203;往&#8203;年&#8203;度&#8203;確&#8203;認&#8203;的&#8203;收&#8203;入&#8203;大&#8203;幅&#8203;減&#8203;少&#8203;(&#8203;原&#8203;因&#8203;為&#8203;該&#8203;等&#8203;項&#8203;目&#8203;的&#8203;實&#8203;施&#8203;階&#8203;段&#8203;已&#8203;進&#8203;入&#8203;其&#8203;收&#8203;尾&#8203;階&#8203;段&#8203;及&#8203;大&#8203;致&#8203;上&#8203;於&#8203;2018財&#8203;政&#8203;年&#8203;度&#8203;完&#8203;工&#8203;)&#8203;之&#8203;合&#8203;併&#8203;結&#8203;果&#8203;。&#8203;</para>
</data>

and use allowbreak:

<Paragraph textformat="text" allowbreak="&#8203;">
	<Value select="."/>
</Paragraph>
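
Adding the zero-width spaces by hand is tedious, so the data file could be preprocessed with a small script. The following is only a sketch under assumptions: the function names are made up for illustration, the text to process sits directly in `para` elements as in the test data above, and the rule (a zero-width space after every character except ASCII letters, digits, spaces and `.`) is inferred from the second paragraph of the sample data. It is not part of the Publisher.

```python
import xml.etree.ElementTree as ET

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, written as &#8203; in the XML data


def add_zero_width_spaces(text: str) -> str:
    """Append a zero-width space after every character, except ASCII
    letters, digits, spaces and '.', so runs such as "2018" or "11.2"
    stay together (hypothetical helper, rule inferred from the sample)."""
    out = []
    for ch in text:
        out.append(ch)
        keep_together = ch.isascii() and (ch.isalnum() or ch in ". ")
        if not keep_together:
            out.append(ZWSP)
    return "".join(out)


def process_data(xml_text: str) -> str:
    """Rewrite the text of every <para> element (assumes plain text only,
    no child elements inside <para>)."""
    root = ET.fromstring(xml_text)
    for para in root.iter("para"):
        para.text = add_zero_width_spaces(para.text or "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = add_zero_width_spaces("2018財政年度")
    # Show the result with the entity notation used in the data file.
    print(sample.replace(ZWSP, "&#8203;"))
    # prints: 2018財&#8203;政&#8203;年&#8203;度&#8203;
```

The output of `process_data` contains the raw U+200B character instead of the `&#8203;` entity; both forms are equivalent for an XML parser.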

@cheunglone
Author

Hi pgundlach and iclukas,

Cool, I tried the smaller test file and the suggestion. It works very nicely. Thanks so much.

Although it did not work immediately in my file, I will start working on it. Thanks again.

Adam

@pgundlach pgundlach reopened this Dec 17, 2018
@pgundlach
Member

I think this still needs some attention. In my test case above, when you set allocate="yes" instead of "auto", the formatting is different.

Also, I think the line-breaking rules for Chinese must be fixed. There are no rules whatsoever, and this is not good.
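
To illustrate what such rules could look like, here is a minimal sketch of the common convention that a line must not start with closing punctuation and must not end with opening punctuation. The character sets are a small illustrative subset, the function name is hypothetical, and this is not what the Publisher currently implements.

```python
from typing import List

# Closing punctuation that should not start a line (illustrative subset).
NO_BREAK_BEFORE = set(")」』】》,。、;:!?),")
# Opening punctuation that should not end a line (illustrative subset).
NO_BREAK_AFTER = set("(「『【《(")


def break_opportunities(text: str) -> List[int]:
    """Return every index i where a break between text[i-1] and text[i]
    is allowed under the simplified rules above."""
    allowed = []
    for i in range(1, len(text)):
        prev, nxt = text[i - 1], text[i]
        if nxt in NO_BREAK_BEFORE or prev in NO_BREAK_AFTER:
            continue
        # Keep ASCII runs such as "2018" or "11.2" together.
        if prev.isascii() and nxt.isascii() and not (prev.isspace() or nxt.isspace()):
            continue
        allowed.append(i)
    return allowed


if __name__ == "__main__":
    print(break_opportunities("(「本公司」)董事會"))
    # prints: [3, 4, 7, 8, 9]
```

In the example, no break is offered after `(` or `「`, and none before `」` or `)`, so the brackets stay attached to the text they enclose.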

@cheunglone
Author

I agree.

Though I can add empty spaces to the content, Chinese content does not naturally contain them.

Also, since Chinese characters are fixed width, shouldn't the alignment be easier?

Anyway, thanks for the ongoing feedback.
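
A rough calculation shows why fixed-width characters alone do not give flush columns. The sketch below makes several assumptions that the thread does not state: an A4 page (210 mm wide), grid cells that evenly divide the area between the margins of the test layout, TeX points (72.27 per inch), and glyphs exactly as wide as the 10pt font size. The function name is made up for illustration.

```python
def justify_fullwidth(column_width_pt: float, glyph_width_pt: float):
    """Return (glyphs per line, extra space needed per gap) for a line
    made only of equal-width glyphs."""
    n = int(column_width_pt // glyph_width_pt)
    leftover = column_width_pt - n * glyph_width_pt
    gaps = n - 1
    per_gap = leftover / gaps if gaps > 0 else 0.0
    return n, per_gap


MM_PER_INCH = 25.4
PT_PER_INCH = 72.27  # assumption: TeX points

# Test layout: margins 1cm left and 2cm right, nx="19", frame width="9".
# Assumption: 210 mm page width, cells divide the remaining width evenly.
column_mm = 9 * (210 - 10 - 20) / 19
column_pt = column_mm / MM_PER_INCH * PT_PER_INCH

n, per_gap = justify_fullwidth(column_pt, 10.0)
print(n, per_gap)
```

Under these assumptions a line holds 24 glyphs and a small remainder is left over, which has to be distributed across the gaps between characters. Digits and ASCII punctuation are narrower than the ideographs, so the remainder also varies from line to line. Without a stretchable space or break point between the characters, there is nowhere to put that remainder.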

@cheunglone cheunglone reopened this Jan 25, 2019
@pgundlach pgundlach added this to the Version 3.8 milestone Feb 12, 2019
@pgundlach pgundlach modified the milestones: Version 3.8, Version 3.10 Dec 20, 2019
@pgundlach
Member

The next version will focus on non-western scripts.

@pgundlach pgundlach modified the milestones: Version 3.10, Version 4.2 Oct 3, 2020