custome html format #41

Closed
AniFengx opened this issue Jan 18, 2018 · 13 comments
Comments

@AniFengx

Can I control how JODConverter converts a file to .html so that the output has the format I want?
For example, the output declares charset=gb2312, but I need UTF-8.
Sorry for my bad English.

@sbraconnier
Member

Could you please check whether it's possible to do this using LibreOffice or Apache OpenOffice without JODConverter? If the answer is yes, then there is a way to do it with JODConverter.

Could you also check whether #16 is what you are trying to do?

@AniFengx
Author

Hey, I have found the option to set the charset for HTML in the LibreOffice GUI.
But is there a way to set that property through JODConverter?
I tried to use JsonDocumentFormatRegistry to add a DocumentFormat by copying your HTML format and adding loadProperties to it, but it doesn't work.
For example:
String a = "[{\"name\": \"HTML\",\"extension\": \"ht\",\"mediaType\": \"text/html\", \"inputFamily\": \"TEXT\", \"loadProperties\": {\"CharacterSet\": \"utf-8\" },\"storeProperties\": { \"SPREADSHEET\": { \"FilterName\": \"HTML (StarCalc)\" }, \"PRESENTATION\": { \"FilterName\": \"impress_html_Export\" }, \"TEXT\": { \"FilterName\": \"HTML (StarWriter)\" } } }]";
JsonDocumentFormatRegistry customRegistry = JsonDocumentFormatRegistry.create(a);
LocalConverter.builder().formatRegistry(customRegistry).build();

@sbraconnier
Member

sbraconnier commented Jan 19, 2018

Can you please tell me where this option is in the LibreOffice GUI? I didn't find it.

Also, the custom registry you built will only be able to load and save files from/to HTML. And the charset you use will only be applied when loading an HTML document, not when writing one.

Can you please tell me the format of the source document you are trying to convert to HTML?
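
Since a charset placed in loadProperties only affects how an HTML input is read, one fallback for readers who only need the generated file in UTF-8 is to re-encode it after the conversion. The following is a minimal sketch of that post-processing step in plain Java. It is not a JODConverter feature, and the class name `HtmlReencodeSketch`, the method `toUtf8`, and the regular expression are hypothetical. It assumes the source encoding is known in advance; for the case in this thread that would be `Charset.forName("GB2312")`, which is assumed to be available on the JVM in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

final class HtmlReencodeSketch {

  // Matches the first charset declaration, e.g. charset=gb2312 or charset="gb2312".
  private static final Pattern CHARSET_DECL =
      Pattern.compile("(charset\\s*=\\s*[\"']?)[\\w.:-]+", Pattern.CASE_INSENSITIVE);

  /** Decodes the HTML with its known source charset, rewrites the declaration, re-encodes as UTF-8. */
  static byte[] toUtf8(byte[] html, Charset sourceCharset) {
    String text = new String(html, sourceCharset);
    String patched = CHARSET_DECL.matcher(text).replaceFirst("$1utf-8");
    return patched.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // ISO-8859-1 stands in for the legacy encoding so the demo runs on any JVM.
    byte[] legacy =
        "<meta charset=\"iso-8859-1\"><p>caf\u00e9</p>".getBytes(StandardCharsets.ISO_8859_1);
    byte[] utf8 = toUtf8(legacy, StandardCharsets.ISO_8859_1);
    System.out.println(legacy.length + " bytes in, " + utf8.length + " bytes out");
  }
}
```

Only the first charset declaration is rewritten, on the assumption that the generated file carries a single one in its head.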

@AniFengx
Author

This is the page about the option: https://help.libreoffice.org/Common/HTML_compatibility
Choose Tools - Options - Load/Save - HTML Compatibility.
The source documents I am trying to convert are .doc or .docx files.
I mainly convert MS Office files to .html for previewing in my program.

@AniFengx
Author

And another question about converting MS Office files to .html.
When I use JODConverter to convert a .doc to .html, it creates an .html file plus some .png files, because the Word document contains images. But if I use the LibreOffice GUI, those images are embedded as base64 in the HTML file.
Do you have any idea how to make JODConverter embed the images as base64 instead of writing separate image files?

@sbraconnier
Member

You have to set the templateProfileDir to the path of the user directory you configured if you want these settings to be applied while using JODConverter.

Here on my Windows machine, after setting up LibreOffice to suit my needs (using the GUI), I copied the directory C:\Users\myUser\AppData\Roaming\LibreOffice\4 into C:\JodConverter\templateProfile (to act as a template profile directory that won't change) and set the templateProfileDir to this copy:

LocalOfficeManager.builder()
    .templateProfileDir("C:\\JodConverter\\templateProfile")
    .install()
    .build();
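
The manual copy described above can also be scripted, so that the template profile is produced the same way each time the GUI settings change. This is a minimal sketch using only the JDK; the class name `TemplateProfileCopySketch` and the method `copyTree` are hypothetical, and nothing here is part of JODConverter. It copies the tree as-is and makes no attempt to decide which profile files are actually needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

final class TemplateProfileCopySketch {

  /** Recursively copies source into target, creating target if it does not exist. */
  static void copyTree(Path source, Path target) {
    // Files.walk visits a directory before its entries, so parents exist before files are copied.
    try (Stream<Path> paths = Files.walk(source)) {
      paths.forEach(path -> {
        Path destination = target.resolve(source.relativize(path));
        try {
          if (Files.isDirectory(path)) {
            Files.createDirectories(destination);
          } else {
            Files.copy(path, destination, StandardCopyOption.REPLACE_EXISTING);
          }
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("usage: TemplateProfileCopySketch <profileDir> <templateProfileDir>");
      return;
    }
    copyTree(Paths.get(args[0]), Paths.get(args[1]));
  }
}
```

With the paths from this comment, the two arguments would be C:\Users\myUser\AppData\Roaming\LibreOffice\4 and C:\JodConverter\templateProfile.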

I searched a lot for an answer to your second question and found that it is possible to embed the images using an option of the HTML format filter. Here's a working example:

final File input = new File("input.doc");
final File output = new File("output.html");

final DocumentFormat format = DocumentFormat.copy(DefaultDocumentFormatRegistry.HTML);
format.getStoreProperties(DocumentFamily.TEXT).put("FilterOptions", "EmbedImages");
JodConverter.convert(input).to(output).as(format).execute();

@sbraconnier
Member

Note that with my latest commit, the DocumentFormat must be created this way:

final File input = new File("input.doc");
final File output = new File("output.html");

final DocumentFormat format =
    DocumentFormat.builder()
        .from(DefaultDocumentFormatRegistry.HTML)
        .storeProperty(DocumentFamily.TEXT, "FilterOptions", "EmbedImages")
        .build();
JodConverter.convert(input).to(output).as(format).execute();

@AniFengx
Author

AniFengx commented Jan 24, 2018

Did you try it with the JODConverter Online module?
I use it like this:

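final DocumentFormat format =
    DocumentFormat.builder()
        .from(DefaultDocumentFormatRegistry.HTML)
        .storeProperty(DocumentFamily.TEXT, "FilterOptions", "EmbedImages")
        .build();
File tempFile1 = new File("d:/root/fileupload/DevCard/" + "a.doc");
try {
  // tempFile (the output file) is not declared in this snippet.
  OnlineConverter.make(officeManager)
      .convert(tempFile1)
      .as(DefaultDocumentFormatRegistry.DOC)
      .to(tempFile)
      .as(format)
      .execute();
} catch (OfficeException e) {
  // Error handling was not shown in the original snippet.
  throw new IllegalStateException("Conversion failed", e);
}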

and it still creates an HTML file that uses <img src="a_html_d9c47caae6455f3d.png" ...>

@sbraconnier
Member

sbraconnier commented Jan 24, 2018

LibreOffice/Collabora Online only supports conversion "as is". You cannot customize it using filters or custom load/store properties (as far as I know). Maybe that will be supported in the future, but for now the JODConverter Online module only uses the "extension" part of the DocumentFormat to build the URL required to execute the conversion.

See Using the Collabora Online / LibreOffice Online without JODConverter for more info.
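
Because the Online module cannot pass filter options, one fallback is to inline the images after the conversion. The sketch below is a plain-Java post-processing step and not part of JODConverter; the class name `ImageInlinerSketch`, its method names, and the regular expression are hypothetical. It assumes the generated `<img>` tags reference PNG files by bare file name in the same directory as the HTML file, as in the `a_html_d9c47caae6455f3d.png` example above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageInlinerSketch {

  // Matches <img ... src="name.png" where src is a bare file name (no path, no URL scheme).
  private static final Pattern IMG_SRC =
      Pattern.compile(
          "(<img\\b[^>]*?\\bsrc=\")([^\"/:\\\\]+\\.png)(\")", Pattern.CASE_INSENSITIVE);

  /** Builds a base64 data URI from raw bytes. */
  static String toDataUri(String mediaType, byte[] content) {
    return "data:" + mediaType + ";base64," + Base64.getEncoder().encodeToString(content);
  }

  /** Replaces references to PNG files that exist in baseDir with data URIs; other tags are kept. */
  static String inlineImages(String html, Path baseDir) {
    Matcher matcher = IMG_SRC.matcher(html);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      Path image = baseDir.resolve(matcher.group(2));
      String replacement = matcher.group();
      if (Files.isRegularFile(image)) {
        try {
          replacement =
              matcher.group(1)
                  + toDataUri("image/png", Files.readAllBytes(image))
                  + matcher.group(3);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
      matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(result);
    return result.toString();
  }

  public static void main(String[] args) {
    // No image file exists in this directory, so the tag is printed unchanged.
    String html = "<p><img src=\"a_html_d9c47caae6455f3d.png\" alt=\"\"></p>";
    System.out.println(inlineImages(html, Paths.get("no-such-directory")));
  }
}
```

A regular expression is a shortcut that fits the simple, generated markup assumed here; arbitrary HTML would call for a real parser.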

@AniFengx
Author

OK, I will consider using springboot-jodconverter with a local LibreOffice as a compromise.

@sbraconnier
Member

Note that I've uploaded a new sample REST API that can be used as the server, with the jodconverter-online module as the client.

Using this sample as server, this would work.

This is just a sample though...

@AniFengx
Author

@sbraconnier Thank you so much for doing this. I truly need it.

@cuipengfei

Just tried templateProfileDir and it worked well.
Is it recommended for production? Are there any recommendations on whether the AppData\Roaming\LibreOffice\4 folder should be cleaned or debloated before copying it into prod?
