Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed issue with printing unicode #126

Merged
merged 71 commits into from Nov 7, 2020
Merged

Conversation

DarrelDonald
Copy link
Contributor

Whenever there were unicode characters that needed to be printed, an error would be produced. I encoded the print statements and it resolved the issue.

Darrel Donald added 2 commits December 31, 2019 21:32
To fix error with printing unicode
Fix problem with printing unicode
@johnwmillr
Copy link
Owner

Hi Darrel,

Can you post some examples of songs that were giving you errors before this change?

Thanks,
John

@DarrelDonald
Copy link
Contributor Author

DarrelDonald commented Jan 10, 2020

"'Till I Collapse" by Eminem was the only one I encountered before modifying the code. I was trying to download all of Eminem's songs. I think there were a lot because I had it print a message in the console every time it would happen at first, but I couldn't see exactly which songs were doing it.

@johnwmillr
Copy link
Owner

I can't recreate this issue with the latest version of the package (1.8.2). Can you test your search with the latest version of the package? Or provide example code that produces the error?

John

@DarrelDonald
Copy link
Contributor Author

DarrelDonald commented Jan 15, 2020

python3 -m lyricsgenius song "'Till I Collapse" "Eminem" --save
Searching for "'Till I Collapse" by Eminem...
Done.
Traceback (most recent call last):
  File "\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "\lib\site-packages\lyricsgenius\__main__.py", line 56, in <module>
    main()
  File "\lib\site-packages\lyricsgenius\__main__.py", line 43, in main
    print("Saving lyrics to '{s}'...".format(s=song.title))
  File "\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 18: character maps to <undefined>

@johnwmillr
Copy link
Owner

Hi @DarrelDonald, sorry for the delay. Are you still running into this issue? What version of Python and OS are you using?

@DarrelDonald
Copy link
Contributor Author

I haven't used it since then. I was using Python 3.5 and my operating system was Windows 10.

@allerter
Copy link
Collaborator

allerter commented Sep 5, 2020

@johnwmillr, this happened to me too recently. Try this:

song = genius.search_song('60 days sober and cool', artist='Noah Cyrus')
song.to_text('song.txt')

The problem is with the \u2005 character which is one of the space characters. The difference with Darrel's issue is that there was a Unicode character in the song's title, unlike mine which was in the lyrics. I guess we should encode all text to 'utf8' when printing/saving if this issue doesn't have to do with my Python environment.
There is already an issue with this problem at #138

@johnwmillr
Copy link
Owner

@allerter, does the snippet you shared produce an error for you? When running in my environment, the song saves without issue.

@allerter
Copy link
Collaborator

allerter commented Sep 8, 2020

@johnwmillr, it does. I guess it might have to do with my environment. But either way, the problem in this issue is probably an actual problem. It's because the Windows console uses a different charset but Python 3 deals in Unicode. So when trying to print a string that has Unicode characters, it results in an error in Python 3.5 and lower. This was solved in 3.6 since Python bypasses console I/O to support Unicode. If someone with 3.5 and lower wanted to be able to print Unicode they would have to do this (from a solution on SO):

chcp 65001
set PYTHONIOENCODING=utf-8

@johnwmillr
Copy link
Owner

Thanks for the explanation, @allerter. Do you know of a package-wide approach that would address the <= 3.5 issue that would be more robust than adding .encode('utf8')) wherever we print or save text?

@allerter
Copy link
Collaborator

allerter commented Sep 8, 2020

@johnwmillr, unfortunately, I don't know of a package-wide way to achieve this. We could probably set the PYTHONIOENCODING environment variable to utf-8. That would solve the issue of printing Unicode characters. If we only set it once when the Genius class is instantiated, we would have to rely on the user not changing this later on. So I don't think that's a good idea.
Looking at this question on StackOverflow, I think this might be the way to go:

  • Saving: Using encoding='utf8' whenever we save a text file. The reason why saving the song worked with your environment is because Python uses locale.getpreferredencoding() to infer the file's encoding and set the encoding parameter and yours is probably set to utf8, but that's not the case for me. So I think it's best to explicitly set the encoding ourselves.
  • Printing: Encoding by utf8 and decoding by the user's stdout's encoding which we could make it a utility function to call whenever needed:
def print_unicode(s):
    print(s.encode('utf-8').decode(sys.stdout.encoding, errors='replace'))

The good thing about this function is that if the user's console can print Unicode characters, it will print everything, and when it can't, it will display the text with the unicode character like \u2005. For example:
String >>> There is a \u2005 here
If sys.stdout.encoding is utf8 or another that can handle the character >>> There is a   here
If sys.stdout.encoding can't handle the character >>> There is a \u2005 here
Although this looks good to me, there might be a better way to handle this that I'm not aware of.

added encoding='utf8' when saving text files
updated utils
@allerter allerter mentioned this pull request Sep 29, 2020
moved Artist and Song to types
added TokenRequiredError
updated docs,
added public_api to methods (incomplete)
added the public_api attr to Genius,
removed pytest from dependencies
update TokenRequiredError,
update overlapping methods
This reverts commit 3386282644198706639e858098af18b560976ccd.
- added "discussions" public API endpoint,
- updated docs
reconfigured Genius.lyrics() to:
    * fetch the page through the Sender
    * extract the lyrics not only more concisely, but also without changing the original format of the lyrics (adding newlines)
* reconfigured Sender to:
    * allow making requests to Genius at https://genius.com
    * remove SLEEP_TIME to allow users to supply their own rate limiting (by default 0.2) instead of enforcing it
    * move the authorization header to the authorization_header attribute instead of re-creating it for every API call
    * to check lyrics_state along with the title
* reconfigured search_* that return an object:
    * to check lyrics_state to avoid getting empty lyrics
* moved setting excluded_terms to __init__
* moved _clean_str to utils and renamed to clean_str
* added tests for genius.tag
* reordered some of the imports and the functions alphabetically
- fix genius.tag() example
@allerter

This comment has been minimized.

@allerter allerter linked an issue Nov 6, 2020 that may be closed by this pull request
- genius.tag(): replaced unicodedata.normalize with only replacing \xa0
@allerter
Copy link
Collaborator

allerter commented Nov 6, 2020

😅 I think I forgot to squash the merge. As unicodedata.normalize turned out not to work, I added the safe_unicode function in utils.py and used it wherever the package prints something that might lead to the UnicodeEncodeError. What do you think about this solution, @johnwmillr?
Also, all the open()s that save lyrics, will now have encoding='utf8' which will solve saving lyrics that contatin Unicode characters (#138). This kinda removes the need for the binary_encoding parameter if it was only meant for the Unicode issue.

@allerter
Copy link
Collaborator

allerter commented Nov 7, 2020

Another solution would be to use the logging module from Python's standard libraries.

@allerter
Copy link
Collaborator

allerter commented Nov 7, 2020

I resolved the conflicts but there was a green commit merge and since I wasn't sure if it would update the package or the PR, I didn't submit it.

@johnwmillr
Copy link
Owner

The PR looks good! Thank you @DarrelDonald and @allerter for your work on this. Merging now.

@johnwmillr johnwmillr merged commit 00e44c9 into johnwmillr:master Nov 7, 2020
johnwmillr added a commit that referenced this pull request Nov 8, 2020
* Update to origin (#8)

* added auth

* added voting methods for annotations

* renamed client_access_token to access_token

* added documentation for authorization,
renamed all instances of client_access_token to access_token

* fixed instance parse_redirected_url didn't work with 'token' flow

* added OAuth2 documentation,
modified OAuth2 to make it user-friendlier

* removed Authenticating yourself from snippets,
added missing commas

* Revert "removed Authenticating yourself from snippets,"

This reverts commit d2e99bb.

* added missing comma

* added more missing commas

* added missing word in sippets

* changed get_user_auth_url to auth.url property,
changed 'all' scope type to tuple,
fixed combining scopes (changed ',' to ' ')

* changed calling get_user_auth_url to url

* fixed client_only_app not in OAuth2 attributes

* fixed parse_redirected_url

* set None in POST body params when optional annotation parameteres aren't supplied

* revert to current _make_request

* fixed typo

* added some docs, fixed new methods' payload

* fixed response for 204 response,
added tests

* fixed added test bugs

* fixed update_annotation and added its test,
added docs for contributing

* added example redirect uri to snippets

* commit changes

* add text_format to API annotation methods

* fixed prompt_user bug

* fix flake8 error

* fixed bugs, added auth tests

* fix test_url error

* fix wrong function name

* removed test_url

Co-authored-by: John W. Miller <johnwmillr@users.noreply.github.com>

* - added encoding using utf-8 when saving files
- removed binary encoding
- genius.tag(): replaced unicodedata.normalize with only replacing \xa0
- fixed IndentationError
- __main__: made import relative

* fix flake8 error

Co-authored-by: John W. Miller <johnwmillr@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue With Encoding Character '\u2005' For Song "Closed on Sunday"
3 participants