Raise error if file format is incorrect #73

dominiquesydow · 2020-10-22T15:29:36Z

Code of Conduct

Description

Raise ValueError if an incorrect file format is loaded using the mol2 or pdb modules.

Two checks per mol2/pdb module would be optimal:

When loading DataFrames from file: Check if extension is correct (.mol2/.mol2.gz or .pdb/.pdb.gz, respectively).
When loading DataFrames from text: Check for certain tags in the text?
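
To make the two checks concrete, here is a minimal sketch. The helper names `check_extension` and `check_mol2_text` are hypothetical and are not part of biopandas; the error messages are only illustrative.

```python
def check_extension(path, allowed_suffixes):
    """Raise ValueError unless `path` ends with one of `allowed_suffixes`."""
    # str.endswith accepts a tuple of suffixes, so one call covers
    # both the plain and the gzipped variant.
    if not str(path).endswith(tuple(allowed_suffixes)):
        raise ValueError(
            "Wrong file format; allowed file formats are "
            + " and ".join(allowed_suffixes) + "."
        )


def check_mol2_text(text):
    """Raise ValueError unless the mol2 atom section tag is present."""
    # Hypothetical tag check: only looks for the @<TRIPOS>ATOM marker.
    if "@<TRIPOS>ATOM" not in text:
        raise ValueError("Text is not in mol2 format: @<TRIPOS>ATOM not found.")
```

For example, `check_extension("3eiy.pdb.gz", (".pdb", ".pdb.gz"))` passes silently, while the same call with `"3eiy.txt"` raises a `ValueError`.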

This PR suggests the following changes:

pdb from file: Raise ValueError in PandasPdb._read_pdb if file extension is not .pdb/.pdb.gz
TBD: pdb from text: Add a len(dfs['ATOM']) == 0 check to see whether any atomic data was loaded? However, I am not sure if you want this behaviour in general.
mol2 from text: Raise ValueError in PandasMol2._get_atomsection if format is not mol2 (@<TRIPOS>ATOM not found)
TBD: mol2 from file: Raise error if file extension is incorrect in mol2.mol2_io.split_multimol2. I tried to add:

    # mol2_io.py
    import gzip

    def split_multimol2(mol2_path):
        # Pick the opener and read mode based on the file extension
        if mol2_path.endswith('.mol2'):
            open_file = open
            read_mode = 'r'
        elif mol2_path.endswith('.mol2.gz'):
            open_file = gzip.open
            read_mode = 'rb'
        else:
            raise ValueError('Wrong file format; allowed file formats are .mol2 and .mol2.gz.')
        # rest of the function's code

However, the error is not thrown, which I don't understand right now.

Related issues or pull requests

Suggests changes for #71.

Pull Request Checklist

Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
Added appropriate unit test functions in the ./biopandas/*/tests directories (if applicable)
Modified documentation in the corresponding Jupyter Notebook under biopandas/docs/sources/ (if applicable)
Ran PYTHONPATH='.' pytest ./biopandas -sv and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./biopandas/classifier/tests/test_stacking_cv_classifier.py -sv)
Checked for style issues by running flake8 ./biopandas

rasbt · 2020-10-22T18:02:46Z

Thanks for the PR!

Right now, the unit tests are based on nosetest, which is why importing pytest made the unit tests fail. I moved another project to pytest and like it overall; however, for biopandas, this is best done in a separate PR (in the future). So, I made some adjustments to the unit tests here.

> TBD: mol2 from file: Raise error if file extension is incorrect in mol2.mol2_io.split_multimol2. I tried to add:

Huh, yeah, that was a weird one! It took me a bit to figure it out but then I remembered that I implemented split_multimol2 as a generator. So, split_multimol2('40_mol2_files.pdb') didn't do anything in the unit test and it required a change to next(split_multimol2('40_mol2_files.pdb')).
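
To illustrate the generator behaviour, here is a minimal sketch. `split_like_generator` is a hypothetical stand-in, not the real `split_multimol2`; it only reproduces the extension check followed by a `yield`.

```python
def split_like_generator(mol2_path):
    # Hypothetical stand-in for split_multimol2: because the body contains
    # `yield`, calling the function only creates a generator object.
    if not mol2_path.endswith(('.mol2', '.mol2.gz')):
        raise ValueError('Wrong file format; allowed file formats are .mol2 and .mol2.gz.')
    yield mol2_path


# Nothing in the body has run yet, so no error is raised here.
gen = split_like_generator('40_mol2_files.pdb')

# The check only executes once the first item is requested.
try:
    next(gen)
    message = None
except ValueError as err:
    message = str(err)
```

After this runs, `message` holds the error text, which shows that the `ValueError` only surfaces at the `next(...)` call.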

> pdb from text: Add len(dfs['ATOM']) == 0 to check if any atomic data was loaded? However, I am not sure if you want this behaviour in general.

I generally think that's a good idea. Maybe raising a warning would be a good compromise to notify users but allow them to investigate PDB files that don't have atom entries (but only hetatm/anisou entries etc).
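
As a rough sketch of the warning idea: the helper name `warn_if_no_atoms` and the message text are hypothetical, and plain lists stand in for the DataFrames in `dfs` to keep the example dependency-free (only `len()` is needed).

```python
import warnings


def warn_if_no_atoms(dfs):
    """Warn, rather than raise, when no ATOM records were loaded."""
    # `dfs` is assumed to map record names ('ATOM', 'HETATM', ...) to
    # sized containers such as DataFrames; a missing key counts as empty.
    if len(dfs.get('ATOM', ())) == 0:
        warnings.warn(
            'No ATOM entries have been loaded. '
            'Is the input in the pdb format?',
            stacklevel=2,
        )
    return dfs
```

With a warning, a structure that only has HETATM/ANISOU entries can still be loaded and inspected, while the user is notified that something may be off.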

coveralls · 2020-10-22T19:10:09Z

Coverage increased (+0.3%) to 94.558% when pulling 7020057 on dominiquesydow:raise-invalid-file-error into 8da42c2 on rasbt:master.

dominiquesydow · 2020-10-23T06:18:17Z

Thanks, excellent!!
My apologies for the pytest - nosetest confusion and thanks for fixing it.
I will look into the len(dfs['ATOM']) == 0 check later (by the beginning of next week at the latest).

rasbt · 2020-10-26T01:43:38Z

I appreciate the PR, and no worries at all!

pep8speaks · 2020-10-30T14:29:11Z

Hello @dominiquesydow! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-30 14:30:40 UTC

dominiquesydow · 2020-10-30T14:40:28Z

> > pdb from text: Add len(dfs['ATOM']) == 0 to check if any atomic data was loaded? However, I am not sure if you want this behaviour in general.
>
> I generally think that's a good idea. Maybe raising a warning would be a good compromise to notify users but allow them to investigate PDB files that don't have atom entries (but only hetatm/anisou entries etc).

Agreed! I added the warning (only checking for missing ATOM entries).

Is there anything else I should address in this PR that I might have missed?

rasbt · 2020-10-30T15:27:18Z

This looks great. Thanks a lot. It's good to merge!

dominiquesydow added 2 commits October 22, 2020 16:57

Raise error in PandasPdb._read_pdb if file extension not .pdb/.pdb.gz

00079a7

Raise ValueError to PandasPdb._get_atomsection if format is not mol2

f4e05a3

dominiquesydow mentioned this pull request Oct 22, 2020

Error handling when reading wrong file formats #71

Closed

rasbt added 2 commits October 22, 2020 12:45

using assert_raises

7432f2e

add split_multimol2 unit test

7e3cfd2

rasbt mentioned this pull request Oct 22, 2020

Add PandasPdb.read_pdb_from_list method + unit test #72

Merged

read_pdb_from_list to changelog

269b12f

Issue warning if no atoms have been loaded from PDB format

f5bb5f4

Remove trailing whitespace

7020057

rasbt merged commit 061eb70 into BioPandas:master Oct 30, 2020

nozomu-y pushed a commit to nozomu-y/biopandas that referenced this pull request Jan 9, 2023

Merge pull request BioPandas#73 from dominiquesydow/raise-invalid-fil…

d77d446

…e-error Raise error if file format is incorrect
