Problem in extracting text from PDF (font firasans) #962
danltw started this conversation in Ask for help with specific PDFs
-
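It appears to be an issue with pdfminer (pdfplumber uses pdfminer internally). Extracting the text with pypdfium2, which pdfplumber.display already imports, works:
# pypdfium2 is the same module exposed as pdfplumber.display.pypdfium2
import pypdfium2
text = (
    pypdfium2.PdfDocument("Downloads/test.pdf")
    .get_page(0)
    .get_textpage()
    .get_text_range()
)
print(text)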
# 64
# HMM Leadership Topic Summaries
# 1. CHANGE MANAGEMENT
# • Foster skills for adapting to continual change
# • Identify and carry out opportunities for improvement
# • Implement formal change programs
# • Address factors that can derail change
# 2. COACHING
# • Identify and act on coaching opportunities
# • Listen and question effectively during coaching
# • Give constructive feedback during coaching
# • Coach employees to become agile learners
# • Develop awareness and skills to coach all employees
# 3. DEVELOPING EMPLOYEES
# • Tailor development strategies to individual employees
# • Help employees create and implement development plans
# • Identify and design experiences that foster individual development
# • Build your team members’ global skills
# 4. DIFFICULT INTERACTIONS
# • Determine which conflicts to resolve
# • Address the negative emotions conflict raises
# • Clarify the facts of an interpersonal conflict
# • Solve the problem underlying a difficult interaction
# • Manage conflict between direct reports
# 5. DIGITAL INTELLIGENCE
# • Adopt a digital mindset—and foster one in others
# • Cultivate a team culture that thrives in today’s digital world
# • Use data responsibly and effectively
# • Prioritize and act on digital opportunities
# 6. FEEDBACK ESSENTIALS
# • Give effective feedback
# • Tailor feedback to the individual
# • Create an environment that encourages improvement through feedback
# • Seek feedback to improve your performance
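If it helps, the workaround above can be wrapped in a small fallback: try pdfplumber first and only switch to pypdfium2 when nothing comes back. This is only a sketch, not anything pdfplumber provides itself. The helper names (extract_with_pdfplumber, extract_with_pdfium, first_nonempty_text) are made up for illustration, and it assumes both packages are installed and that only one page is of interest.

```python
from typing import Callable, Optional, Sequence, Tuple


def extract_with_pdfplumber(path: str, page_index: int = 0) -> str:
    """Extract one page's text via pdfplumber (pdfminer under the hood)."""
    import pdfplumber  # imported lazily so the selector below has no hard dependency

    with pdfplumber.open(path) as pdf:
        # extract_text() may yield None or "" when no characters are found
        return pdf.pages[page_index].extract_text() or ""


def extract_with_pdfium(path: str, page_index: int = 0) -> str:
    """Extract one page's text via pypdfium2, mirroring the snippet above."""
    import pypdfium2

    doc = pypdfium2.PdfDocument(path)
    return doc.get_page(page_index).get_textpage().get_text_range()


def first_nonempty_text(
    path: str, extractors: Sequence[Callable[[str], str]]
) -> Tuple[Optional[str], str]:
    """Return (extractor name, text) for the first extractor that yields text.

    Hypothetical helper: returns (None, "") if every extractor comes back
    empty or whitespace-only.
    """
    for extract in extractors:
        text = extract(path)
        if text.strip():
            return extract.__name__, text
    return None, ""
```

It could then be called as first_nonempty_text("Downloads/test.pdf", [extract_with_pdfplumber, extract_with_pdfium]); for the PDF in this thread, the expectation would be that the pypdfium2 extractor is the one that returns text.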
-
I'm not sure whether the issue is that the PDF uses the embedded font firasans. I've inspected the fonts in the PDF.
The file can be read as a PDF object with
pdf = pdfplumber.open()
but no text is extracted. I am also unable to get the text with pdf2txt.py from pdfminer.
PDF in question: test.pdf
pdfminer states that it supports the font type that the inspected PDF fonts are in. I don't have the source file.
Any advice on this?