I've used the profiler to find out which lines take the most time to execute. It is the '.find()' call in the inline 'isany()' function, inside the 'group_textboxes()' method of 'LTLayoutContainer', that takes 65% of the time!
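For anyone who wants to repeat this kind of measurement, here is a minimal sketch using the standard-library cProfile module. It is not necessarily the profiler used above, it reports per function rather than per line, and `profile_call` and `slow_membership` are hypothetical names made up for the illustration (the latter is only a stand-in for an expensive loop, not pdfminer code).

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_membership(n):
    # Hypothetical stand-in workload: repeated membership tests on a long list.
    items = list(range(n))
    return sum(1 for i in range(n) if i in items)


if __name__ == "__main__":
    _, report = profile_call(slow_membership, 2000)
    print(report)
```

To profile a real extraction, pass the function that processes your PDF to `profile_call` instead of the stand-in.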
This method takes so long because the boxes input is a long list. This is directly caused by group_objects() not grouping vertically aligned objects by default. Grouping them can be enabled by setting LAParams.detect_vertical to True.
So you can fix your problem by using laparams = LAParams(detect_vertical=True).
OK, thanks. I just ran some unit tests on normal (non-rotated) pages with detect_vertical=True and didn't see much of a performance loss, so I wonder why this is not enabled by default. This can be closed, though.
@SVasilev @Migliorati the issue described by @thf24 is fixed by enabling detection of vertical text boxes. I consider this issue closed because this specific question is answered.
I get that this solution does not work for all PDFs and for all code. If you have performance issues with specific PDFs, or if you think pdfminer.six is slow in general for some subset of PDFs, feel free to open a new issue.