"Special letters" are being converted to regular ones #92

Open
jkreuz opened this issue Jan 27, 2021 · 0 comments
jkreuz commented Jan 27, 2021

Hello

Is there a way to specify the language of a news article so that its text is fetched correctly?
I used the library on a news article in Portuguese, and it converted the accented ("special") letters to plain ones.
This seriously compromises NLP procedures that deal with syntax, context, etc.

Example: "àáéóíúâôêãõç" is converted to "aaeoiuaoeaoc"

from newsfetch.news import newspaper
news = newspaper('https://g1.globo.com/sc/santa-catarina/noticia/2021/01/20/greve-na-comcap-coleta-feita-por-empresa-privada-em-florianopolis-vai-abranger-35percent-do-roteiro-diz-prefeitura.ghtml')
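A minimal sketch of what the symptom looks like, assuming the accents are lost through something like ASCII folding (I have not confirmed that this is what newsfetch does internally). The helper names `strip_accents` and `lost_accents` are hypothetical and use only the Python standard library; they may help check whether a fetched text is just the accent-stripped form of the original.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining accents removed (ASCII folding)."""
    # NFKD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything outside ASCII discards the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def lost_accents(original, fetched):
    """True if fetched differs from original only by missing accents."""
    return original != fetched and strip_accents(original) == fetched


sample = "àáéóíúâôêãõç"
folded = strip_accents(sample)
print(folded)                        # the sample with every accent removed
print(lost_accents(sample, folded))  # True: only the accents differ
```

Comparing a sentence copied from the web page against the fetched text with `lost_accents` is a quick way to confirm the problem on other articles.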

Looking inside the class, I saw that it uses the Newspaper3k scraper, and if I enforce the right language there, it returns the correct text.

from newspaper import Article
article = Article(url, language='pt')
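For completeness, here is a sketch of the workaround I am using for now, calling Newspaper3k directly instead of newsfetch. The wrapper `fetch_article_text` and its `article_cls` parameter are hypothetical names of mine, not part of either library; `Article`, `download()`, `parse()` and `text` come from Newspaper3k.

```python
def fetch_article_text(url, language="pt", article_cls=None):
    """Fetch an article's text with an explicit language code."""
    if article_cls is None:
        # Imported lazily so the helper can be defined without newspaper3k installed.
        from newspaper import Article
        article_cls = Article
    article = article_cls(url, language=language)
    article.download()  # fetch the HTML
    article.parse()     # extract title, text, and so on
    return article.text


# Usage (needs newspaper3k and network access):
# text = fetch_article_text(url)          # url as in the example above
# print(text[:200])
```

Something similar inside newsfetch, for example an optional language argument passed through to the scraper, would solve my problem, but that is only a suggestion.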

Thank you.
