sra_ids_to_runinfo.py UnicodeEncodeError #525

carrere · 2020-12-04T13:05:09Z

Dear nf-core team, first of all, many thanks for your amazing work, which makes our analyses easier and more straightforward!

I am using the nf-core/rnaseq pipeline (release 2.0) with the experimental feature --public_data_ids to retrieve SRA datasets, and I run into problems with SRA projects whose metadata contains non-ASCII characters.

Here is an example: for SRP290966, the experiment_title field contains the degree character "°" encoded in Unicode (ENA API result: https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP290966&result=read_run&fields=experiment_title).

The workflow ends with this error:

Traceback (most recent call last):
  File "/home/carrere/.nextflow/assets/nf-core/rnaseq/bin/sra_ids_to_runinfo.py", line 178, in <module>
    sys.exit(main())
  File "/home/carrere/.nextflow/assets/nf-core/rnaseq/bin/sra_ids_to_runinfo.py", line 174, in main
    fetch_sra_runinfo(args.FILE_IN,args.FILE_OUT,platform_list,library_layout_list)
  File "/home/carrere/.nextflow/assets/nf-core/rnaseq/bin/sra_ids_to_runinfo.py", line 131, in fetch_sra_runinfo
    for row in csv_dict:
  File "/opt/conda/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 260: ordinal not in range(128)

Thanks for your help,

Sébastien

drpatelh · 2020-12-04T13:34:33Z

Thanks for reporting this @carrere 👍 I can indeed reproduce this locally by manually running the sra_ids_to_runinfo.py script with an id file containing just SRR12971731. @JoseEspinosa would be great if you can take a look at this please? 🙂

carrere · 2020-12-04T13:54:28Z

You're welcome. I think the main problem comes from the EBI API, which does not declare the document encoding:

14:52 $ curl -I "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=SRP290966&result=read_run&fields=experiment_title"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Type: text/plain
Strict-Transport-Security: max-age=0
Date: Fri, 04 Dec 2020 13:52:07 GMT
Expires: 0
X-XSS-Protection: 1; mode=block
Pragma: no-cache
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Content-Length: 6226

Or in the firefox console:

"The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature. filereport"

I do not know if you can fix this automatically on the client side ... but if you know someone @ EBI, you could ask them to fix on the API side.
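One possible client-side workaround, as a minimal sketch: read the charset from the Content-Type header when one is declared, and fall back to a default otherwise. The helper names and the sample body below are hypothetical, and the UTF-8 fallback is an assumption about what the API actually sends, not something the response headers above confirm.

```python
def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Falls back to `default` when no charset is declared, as in the
    bare "text/plain" header shown above. The UTF-8 default is an assumption.
    """
    # Parameters follow the media type, separated by semicolons.
    for param in content_type.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset" and value:
            return value.strip('"').lower()
    return default


def decode_body(raw_bytes, content_type):
    """Decode a response body using the declared or fallback charset."""
    return raw_bytes.decode(charset_from_content_type(content_type))


# Hypothetical response body containing a degree sign, encoded as UTF-8.
body = "experiment_title\nsample grown at 37°C\n".encode("utf-8")
text = decode_body(body, "text/plain")
```

If the server later starts declaring a charset, the same helper picks it up without further changes.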

Sebastien

JoseEspinosa · 2020-12-11T14:33:20Z

I have been trying to solve this issue. The script works as is when run with Python 3, but not with Python 2. The reason is that UTF-8 is the default source encoding in Python 3 but not in Python 2. I tried to find a solution that works with both Python 2 and Python 3, but I didn't find one. That is why I just check the Python version and include this code. If you think this solution is suitable, I can make a PR with this patch.
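For context, the traceback in the original report fails inside Python 2.7's csv module, which is documented as not supporting Unicode input, so a unicode string containing "°" ends up being encoded with the ASCII codec. In Python 3 the csv module works on text (str) directly. The following is a minimal sketch of that behaviour, not the code from sra_ids_to_runinfo.py; the function name, the accession and the tab-separated sample are hypothetical.

```python
import csv
import io


def parse_tsv(text):
    """Parse tab-separated text into a list of dicts (Python 3 only).

    Python 3's csv module reads str objects, so non-ASCII characters
    such as the degree sign pass through unchanged.
    """
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical sample with a non-ASCII character in the title column.
sample = "run_accession\texperiment_title\nSRR0000001\tleaf at 4°C\n"
rows = parse_tsv(sample)
```

Under Python 3 the title comes back with the degree sign intact, which is consistent with the script working there without a version-specific patch.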

drpatelh · 2020-12-11T15:12:19Z

Thanks! But we should be using Python 3 for that process?

JoseEspinosa · 2020-12-13T09:36:21Z

I think we are using Python 2.7.13
Should we change the image instead?

drpatelh · 2020-12-13T10:04:09Z

Indeed we are!! Yes, that would be great. We just need to replace those lines with the snippet below, and that would be the best fix. Sorry, I was looking at the wrong process 🤦🏽

    conda (params.enable_conda ? "conda-forge::python=3.8.3" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        container "https://depot.galaxyproject.org/singularity/python:3.8.3"
    } else {
        container "quay.io/biocontainers/python:3.8.3"
    }

drpatelh · 2020-12-13T21:21:01Z

Thanks @JoseEspinosa. This will be fixed in the next release via f33eb6d

JoseEspinosa · 2020-12-13T22:05:42Z

Perfect @drpatelh ! was about to implement it now and saw that you closed the issue 😎

drpatelh · 2020-12-14T01:17:11Z

No worries! I had to use another container in the end that specifically contained requests. Thinking about it, I don't know if that is Python 3 or not 🤦🏽 It was late. Will check in the morning.

drpatelh · 2020-12-14T12:40:36Z

It's Python 3 (3.8.3) in the requests container 💥 I also tested with SRR12971731 and it's working.

$ singularity shell depot.galaxyproject.org-singularity-requests-2.24.0.img

Singularity depot.galaxyproject.org-singularity-requests-2.24.0.img:> python
Python 3.8.3 | packaged by conda-forge | (default, Jun  1 2020, 17:43:00)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

carrere · 2020-12-14T12:47:27Z

👍
Thank you !

drpatelh added the "bug" label (Something isn't working) Dec 4, 2020

drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Dec 13, 2020

Fix nf-core#525

f33eb6d

drpatelh mentioned this issue Dec 13, 2020

Final pre-release updates #535

Merged

drpatelh closed this as completed Dec 13, 2020
