Font size of the char given by pdfplumber is different from the actual font size. #63

Closed
suyogchadawar opened this issue May 25, 2018 · 8 comments

Comments

@suyogchadawar

Hi, the font size reported by pdfplumber does not match the font size shown by Adobe Acrobat Pro DC. Is there any correlation between them?
Can we get font color from pdfplumber?

@jsvine
Owner

jsvine commented May 25, 2018

the font size given by pdfplumber does not match with the font size given by adobe acrobat dc pro.

Can you provide the following?

  • An example of a PDF that displays this behavior
  • The output you get
  • The font size that Adobe Acrobat indicates

Can we get font color from pdfplumber?

In pdfplumber output for the text/characters of interest, do you see the fill and stroke attributes? If not, can you paste the pdfplumber object-representation for a character that you believe should have color information but doesn't?

@suyogchadawar
Author

In pdfplumber output for the text/characters of interest, do you see the fill and stroke attributes? If not, can you paste the pdfplumber object-representation for a character that you believe should have color information but doesn't

{'adv': Decimal('0.507'), 'fontname': 'QYZTAI+Calibri-Light', 'doctop': Decimal('83.708'), 'y1': Decimal('708.292'), 'bottom': Decimal('100.704'), 'text': u'1', 'top': Decimal('83.708'), 'object_type': 'char', 'height': Decimal('16.996'), 'width': Decimal('7.057'), 'page_number': 1, 'upright': True, 'y0': Decimal('691.296'), 'x0': Decimal('72.000'), 'x1': Decimal('79.057'), 'size': Decimal('16.996')}

I got this output using the following code:

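```python
import pdfplumber

# in_file: path to the PDF being inspected
file_read = pdfplumber.open(in_file)

for page in file_read.pages:
    # page.chars is a list of dicts, one per character on the page
    for char in page.chars:
        print(char)
```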
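
A quick way to answer the fill/stroke question for a record like the one above is to test the character dict for those keys. This is only a sketch: the key names `fill` and `stroke` are an assumption taken from the wording of the question, the helper `find_color_keys` is hypothetical, and the actual attribute names may differ between pdfplumber versions.

```python
from decimal import Decimal

# Assumed key names, taken from the phrase "fill and stroke attributes";
# the real names may differ between pdfplumber versions.
CANDIDATE_COLOR_KEYS = ("fill", "stroke")


def find_color_keys(char, candidates=CANDIDATE_COLOR_KEYS):
    """Return the candidate color keys that are present in a char dict."""
    return [key for key in candidates if key in char]


# Subset of the character record pasted above (not the full dict).
sample_char = {
    "fontname": "QYZTAI+Calibri-Light",
    "text": "1",
    "object_type": "char",
    "size": Decimal("16.996"),
}

print(find_color_keys(sample_char))  # prints []
```

An empty list means none of the assumed color keys are present, which matches the pasted record.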

@suyogchadawar
Author

suyogchadawar commented May 25, 2018

An example of a PDF that displays this behavior
The output you get
The font size that Adobe Acrobat indicates

1. splited0.pdf

2. {'adv': Decimal('0.507'), 'fontname': 'QYZTAI+Calibri-Light', 'doctop': Decimal('83.708'), 'y1': Decimal('708.292'), 'bottom': Decimal('100.704'), 'text': u'1', 'top': Decimal('83.708'), 'object_type': 'char', 'height': Decimal('16.996'), 'width': Decimal('7.057'), 'page_number': 1, 'upright': True, 'y0': Decimal('691.296'), 'x0': Decimal('72.000'), 'x1': Decimal('79.057'), 'size': Decimal('16.996')}

   I am getting a size of 16.996 from pdfplumber.

3. 13.92 (screenshot: capture)

@suyogchadawar
Author

Hi @jsvine, did you get a chance to look into this?

@jsvine
Owner

jsvine commented May 30, 2018

Hi, @suyogchadawar. I don't know exactly what's causing this issue; pdfplumber depends on pdfminer.six for text extraction. It does seem, though, that you can arrive at the Adobe-determined font size by dividing width by adv. In your example, that'd be: 7.057 / 0.507 = 13.919.

See here for the computation of the adv attribute: https://github.com/euske/pdfminer/blob/master/pdfminer/layout.py#L230
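
The width-by-adv workaround can be sketched as a small helper. This is an illustration based only on the values posted in this thread, not a documented pdfplumber feature; the function name `estimate_font_size` is hypothetical, and whether the same ratio holds for other PDFs or other pdfplumber/pdfminer versions is an assumption.

```python
from decimal import Decimal


def estimate_font_size(char):
    """Estimate the font size of a char dict as width / adv.

    Returns None when adv is zero, since the ratio is undefined there.
    """
    if char["adv"] == 0:
        return None
    return char["width"] / char["adv"]


# Relevant fields from the character record posted in this thread.
char = {
    "adv": Decimal("0.507"),
    "width": Decimal("7.057"),
    "size": Decimal("16.996"),
}

estimated = estimate_font_size(char)
print(round(estimated, 3))  # prints 13.919
```

For this record the estimate (13.919) is close to the 13.92 shown by Acrobat, while the reported `size` field is 16.996.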

@suyogchadawar
Author

Thanks, this worked for me:

It does seem, though, that you can arrive at the Adobe-determined font size by dividing width by adv.

But I am still not able to get the color of the character from pdfplumber.

@jsvine
Owner

jsvine commented May 30, 2018

It appears that character color is not currently accessible via pdfplumber. I'll see if there's a way to add that in future versions.

@suyogchadawar
Author

OK, sure, thanks.

@jsvine jsvine closed this as completed May 30, 2018