Argument boxes_flow doesn't work for me (pdf2txt tool) #540

Closed
jiripraha opened this issue Oct 28, 2020 · 2 comments · Fixed by #682

Comments

@jiripraha

The boxes_flow argument doesn't work for me in the pdf2txt tool. Neither a float value nor the "disabled" option takes effect.

These two changes fixed it for me:

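 # In the converter for the boxes_flow argument:
 if x.lower().strip() == "disabled":
     return x.lower().strip()  # normalized, so the later == "disabled" check matches
 try:
     return float(x)  # return the parsed float so it is not lost
 except ValueError:
     raise argparse.ArgumentTypeError("invalid float value: {}".format(x))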

//--------

        paramv = locals().get(param, None)
        if param == "boxes_flow" and paramv == "disabled":
            # "disabled" is stored as None on the layout parameters
            setattr(laparams, param, None)
        elif paramv is not None:
            setattr(laparams, param, paramv)

Have a nice day

@pietermarsman
Member

Hi @jiripraha,

Unfortunately I don't understand what's not working for you.

Can you share the command and/or code you are using? Which version of pdfminer.six are you using?

@pietermarsman added this to new in pdfminer.six via automation on Nov 8, 2020
@pietermarsman moved this from new to needs more info in pdfminer.six on Nov 8, 2020
@jiripraha
Author

Hi,
I tried to use the boxes_flow parameter when running pdf2txt.py, and I think there is a bug. I'm just getting started with Python, so I don't feel ready to open a pull request yet, sorry. I described the fix above (lines 20 and 46 of pdf2txt.py). The parameter is processed incorrectly in pdf2txt, so its value is not used for further processing. This affects both float values and the disabled option. Thanks for your patience.
Examples of commands:
"python pdf2txt.py book.pdf --outfile book.html --output_type html --boxes-flow disabled --codec UTF8"
"python pdf2txt.py book.pdf --outfile book.html --output_type html --boxes-flow 0.9 --codec UTF8"
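
For readers who want to try the idea without a PDF at hand, here is a minimal, self-contained sketch of the behaviour the fix aims for, using the same two --boxes-flow values as the commands above. It is not the actual pdf2txt.py code: the helper names (float_or_disabled, apply_boxes_flow, parse_boxes_flow), the 0.5 default, and the SimpleNamespace stand-in for the layout parameters object are assumptions made for illustration only.

```python
import argparse
from types import SimpleNamespace


def float_or_disabled(value):
    """Convert a --boxes-flow argument to a float or the string "disabled"."""
    normalized = value.lower().strip()
    if normalized == "disabled":
        return normalized
    try:
        # Return the parsed number; without this return the value is lost.
        return float(value)
    except ValueError:
        raise argparse.ArgumentTypeError("invalid float value: {}".format(value))


def build_parser():
    """Build a parser with only the option discussed in this issue."""
    parser = argparse.ArgumentParser()
    # The 0.5 default is an assumption for this sketch.
    parser.add_argument("--boxes-flow", type=float_or_disabled, default=0.5)
    return parser


def apply_boxes_flow(laparams, boxes_flow):
    """Copy the parsed value onto the layout parameters object."""
    if boxes_flow == "disabled":
        # "disabled" is stored as None, as in the fix described above.
        laparams.boxes_flow = None
    elif boxes_flow is not None:
        laparams.boxes_flow = boxes_flow
    return laparams


def parse_boxes_flow(argv):
    """Parse argv and return the boxes_flow value that would be used."""
    args = build_parser().parse_args(argv)
    laparams = SimpleNamespace(boxes_flow=0.5)  # stand-in for the real object
    return apply_boxes_flow(laparams, args.boxes_flow).boxes_flow


if __name__ == "__main__":
    print(parse_boxes_flow(["--boxes-flow", "disabled"]))
    print(parse_boxes_flow(["--boxes-flow", "0.9"]))
```

Both branches matter here: if the converter does not return the float, argparse stores None and the value never reaches the layout parameters, and if "disabled" is not mapped to None, the layout object ends up holding a string.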

pdfminer.six automation moved this from needs more info to done Jan 25, 2022