Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

💄 linting #12

Merged
merged 2 commits into from
Apr 17, 2023
Merged

💄 linting #12

merged 2 commits into from
Apr 17, 2023

Conversation

gessulat
Copy link
Member

No description provided.

@sambenfredj
Copy link
Collaborator

  • For the exception in utils.py can handle the error using StopIteration
  • For the exception in dataset.py we can fix it by using if .. else ..
 if target_column:
        return utils.convert_targets_column(df, target_column)
    else:
        return df

or we can handle it with keyword KeyError

@gessulat gessulat merged commit bf0c2ce into main Apr 17, 2023
@sambenfredj sambenfredj deleted the feature/linting branch July 18, 2023 12:42
gessulat added a commit that referenced this pull request Feb 27, 2024
💄 fix linting
gessulat added a commit that referenced this pull request Sep 18, 2024
…ation (wfondrie#119)

* 💄 lint mokapot

* 💄 lints tests

* 💄 fixes format with ruff

- adds line break in dataset.py
- updates call of ruff in CI
- updates pyproject.toml according to new ruff api

* 💄 fixes format with ruff

- adds line break in dataset.py
- updates call of ruff in CI
- updates pyproject.toml according to new ruff api

* 💄 make ruff and black happy together

* Fix problems with nnls

* Feature/improve speed and limit memory (#11)

Improve speed and limit memory consumption

- stream input files for inference
- add feature: skip deduplication
- add feature: ensemble model
- add feature: rescale input before inference with pre-trained models

* 💄 linting (#12)

:lipstick: fix linting

* Fix bugs (#17)

- fix bug member variables not assigned when model is not trained
- allow throw when input file is malformed: remove skip on bad lines from pandas read function

* fix test model: remove subset_max_train from percolator model (#18)

* Fix test brew: (#20)

- Create new object of OnDiskPsmDataset to use for brew tests
- Update brew function outputs and assert statements

* fix test datasets: (#19)

- remove assign confidence tests because datasets don't have assign confidence methods anymore
- add eval_fdr value to the _update_labels function

* Fix test confidence (#22)

* Fix test confidence:
- fix bugs for grouped confidence
- fix test_one_group : create file using psm_df_1000 to create OnDiskPsmDataset.
- remove test_pickle because confidence does not return dataframe results anymore.
- add test_multi_groups to test that different group results are saved correctly.

* fix bugs:
- overwrite default fdr for update_labels function
- return dataframe for psm_df_1000 to use with LinearPsmDataset

* Fix cli tests: (#28)

- Remove test_cli_pepxml because xml files don't work with streaming
- Replace old output file names
- Add random generator 'rng' variable to confidence since it is required for proteins
- Remove subset_max_train from PluginModel
- Fix bug: convert targets column after reading in chunks
- Fix peptide column name for confidence
- Fix test cli plugins : replace DecisionTreeClassifier with LinearSVC BECAUSE DecisionTreeClassifier return scores as 0 or 1

* Fix system tests: (#29)

- Refactor test structure : Separate brew and confidence functions, read results from output files.
- Fix bugs: fix output columns for proteins, sort proteins data by score

* Fix parser pin test: (#30)

- Add label value to initial direction because it has to have a numerical number
- Read pin does not return dataframe anymore
- Compare output of read_pin function to example dataframe

* Add tests: (#31)

- Add skip_deduplication flag test
- Add ensemble flag test
- Agg rescale flag test
- Fix bug: remove target_column variable from read file for read_data_for_rescale

* Fix writer tests: (#32)

- Remove writer tests with confidence object becaause LinearPsmDataset does not have asign_confidence method anymore and results are streamed to output files while computing confidence

* fix error no psms found during training : if no psms passed the fdr value then raise error that model performed worse (#33)

* Introduce new executable and bug fixes

* Create new executable to aggregate psms to peptides.
* Fix bugs:
- fix error no psms found during training : if no psms passed the fdr value then raise error that model performed worse
- raise error when pep values are all equal to 1
- prefixes paths to dest_dir to not pollute the workdir
- catch error to prevent traces logged: Catch all errors to not break structured logging by error traces
- fixes parallelism in parse_in_chunks to max_workers
- fix indeterminism
- fixed small column chunk bug
- fix bug when using multiple input files
* Fix and add tests:
- remove writer tests with confidence object because LinearPsmDataset does not have asign_confidence method anymore and results are streamed to output files while computing confidence
- add test for the new function "get_unique_peptides_from_psms"
- add cli test for aggregatePsmsToPeptides

* ✨ force ci re-run

* 💄 lint mokapot

* 💄 lints tests

* 💄 fixes format with ruff

- adds line break in dataset.py
- updates call of ruff in CI
- updates pyproject.toml according to new ruff api

* 💄 fixes format with ruff

- adds line break in dataset.py
- updates call of ruff in CI
- updates pyproject.toml according to new ruff api

* 💄 make ruff and black happy together

* ✨ removed deprecated error ignore

* Fix two boolean conditions in nnls algorithm

* Set tolerance to fixed value in fit_nnls to avoid non-convergence

* Adjust unittest for hist_nnls to new error cases

* Add documentation and test for create_chunks

* Make cli unit tests for aggregateP2P easier debuggable

* Improve test for peptide_csv in test_utils

* Improve and test convert_targets_column function

And fix some minor pep8 issues

* Enable switching in system tests from subprocess to direct calls

* Fix cli system and utils tests

* Fix unit tests

* Add documentation and test for create_chunks

* Make cli unit tests for aggregateP2P easier debuggable

* Improve test for peptide_csv in test_utils

* Improve and test convert_targets_column function

And fix some minor pep8 issues

* Enable switching in system tests from subprocess to direct calls

* Fix cli system and utils tests

* Fix unit tests

* parquet reader for mokapot

* merge sort function adapted for parquet

* brew function adapted for parquet input

* confidence assignment modified for parquet format

* merge sort chunk size added as constant

* update label func modified for parquet

* main function uses format arg to choose between csv and parquet

* pyarrow added to dependancies

* Change conversion of target column values

Revert it to "old style" because otherwise some tests break

* fixed failing tests

* added new tests for parquet

* refactor: Add type hinting to tuplize function

* refactor: Refactor find_column(s) functions and use it in read_percolator

* refactor: Insert some newlines for improved readability

* refactor: Insert some newlines for improved readability

* refactor: Insert some newlines for improved readability
refactor: Insert some newlines for improved readability

* refactor: Remove redundant test case for case-sensitive column matching.

* Add typeguard

* refactored unchunked file reader for parquet and csv

* Add map_columns_to_indices function and more type checking

* Make debugging dataframe issues easier in unit test

by adding some pandas config options to pytest_sessionstart in
conftest.py

* Fix test_utils

* Add level_columns to OnDiskPsmDataset

* Rename deduplication to do_rollup

* Change deduplication to do_rollup

* Fix pin reading by adding level_columns

* Revert "Merge branch 'feature/parquet_parser' into 'main'"

This reverts merge request !39

* Move peptides.csv to dest_dir and remove it where it's created

* Save changes

* Get rid of path manipulation via strings

* Clean up more path related stuff

* Correct documentation of return values of brew function

* Simplify and generalize path definitions in confidence.py

* Move confidence related functions to confidence

* Fix problem with parameter lists in cli tests

* Add checking of column names for OnDiskPsmDataset

* Fix column index stuff

* Disable parallel unit tests when debugging

* Refactor the confidence.to_txt function

* Fix chunked reader to read in column order as passed to the function

* Add comments and put temp file naming for merge-sorting in one place

* Add test for chunked reader

* Fix warning in read_file_in_chunks test

* Remove unnecessary conversion (back and fro) in confidence.py

* Add pyarrow as a dependency

* Remove superfluous conversions and some more superfluous stuff

* Improve find_column* and use consistently in pin parser

* Introduce tabbed reader and writers (for csv for now)

* Correct buggy import statement

* Fix bug and add test related to --aggregate flag

* Remove ignoring of warnings

Warnings should not be ignored, except when the verbosity is set to
error level. If there are annoying and known warnings, that are handled
appropriately, they should be ignored locally on a case by case basis.

* Comment regarding to_txt function (can be removed)

* Add file type detection to readers and writers

* Ignore warning in PIN reader (locally now)

* Improve column ordering/mapping and add unit test

* Correct targets conversion function and fix offending unit tests

* Improve on label updating and type safety

* Correct output capturing in test helper

* Use TabbedWriter in save_sorted_metadata_chunks

* Progress towards rollup

* Improve map_columns_to_indices for dicts

* Add stringify methods to tabbed readers

* Change assign_confidence for rollup

* Fix column ordering problem (temp)

* Add tests for the rollup

* Fix a bug in the rollup unit test

* Rename pcms to precursors

* Remove a now superfluous function and comments

* squashed sqlite writer branch

* failing test fixed

* unused function get_unique_peptides_from_psms removed

* format interpreted implicitly using filename

* Confidencewriter class implemented

* sqlite path changed to Path type

* perquet module deleted and integrated into read_pin

* failing tests fixed

* instantiation done before returning object

* PSM_PEP column name changed to POSTERIOR_ERROR_PROBABILITY

* add pipeline status for main branch

* ✨ fixes wfondrie#54

* Do some cosmetics

* Remove rescale stuff

Was broken and unused anyway

* Renamed TabbedFileReader and Writer to TabularDataReader and Writer

* Separate general tabular data and confidence writer stuff

* Fix import bug in confidence

* Fix another import bug

Due to pycharm refactoring... sigh...

* Add better unit test for (chunked) confidence

* Make chunked confidence unit test fail with small chunk size

* Fix confidence chunk size bug

* test case added for sqlite writer

* test data added for sqlite writer

* Add option for suppressing warnings

* Revert the change in confidence and adapt unit test

CONFIDENCE_CHUNK_SIZE is now directly modified from the unit test.

* prepare tables sqlite db added as helper func

* Fix problems with sqlit after the merge

* Remove all group related stuff

* Remove crosslink stuff

* Remove plugins

* Remove skipped tests and skip marks

* Do some minor cleanup

* Add type inference and tests for tabular data

* Add reader and tests for in memory dataframe reader

* Add streaming module

* Add checks to merged reader

* Add creation of dataframe reader from series and arrays

* Add JoinedTabularData and tests

* Add column renaming for tabular data readers

* Add context manager to tabular data writer and make confidence writer a function

* Get rid of all kinds of warnings during tests

* Fix another warning

* Add context manager to TabularDataWriter

* Fix a problem with indexing in the merged reader

* Add buffering to writers

* Correct problem in unit test with log output and typechecking

* Add functionality to add computed columns to TabularData

* Add method to get an associated reader from a writer

* Add cli and test for the rollup

* Fix underscore problem

* Fix problem with typechecked/contextmanager order

* Remove typechecking from auto_finalizer for the moment

* Fix path problem in rollup unit test

* Show rollup levels not found only if non-empty

* Simplify sqlite connection in unit tests

* Add new suffixes for csv

* Change options: add src_dir and remove keep_decoys

* Add files for rollup testing

* buffered write for parquet intermediary files implemented

* tests updated for parquet writing

* fixed aggregatePsmstoPeptides cli

* test data structure changed to list of dicts to match merge sort output structure

* test data updated to be dataframe readable

* Remove aggregatePsmsToPeptides

* Move remove_columns function to tabular_data

* Fix program name in cli output

* Let brew_rollup also search for parquet files

* Use csv or parquet suffix also for temp and output files

* Filter a warning in the system tests

* Make the column types a bit more lenient

* fixed rollup app for parquets

* Fix unclosed files problem

* Remove unused parameter target_column from merge_sort

* Remove superfluous passing of sep

* Change tabs to colons in protein(s) column of pin file

* Test parquet merge_sort more extensively

* Unify csv and parquet methods in merge_sort

* Simplify get_row_iterator

* Make brew rollup faster

* Fix bug in MergedTabularReader

* Fix problem with last line in buffering

* Fix problem with type conversions in merge_sort

(And make it a bit less ugly...)

* ✨ addresses @jspaezp suggestions from PR wfondrie#119

- removes MSAID internal status banner in README
- removes @TypeChecked for brew() (we can reintroduce it when we have a
  typed version of brew()
- uses yield from pattern
- uses dictionaries to assert for correct column-name and type mappings
- removes installation of sample-plugin from tests

* ✨ addresses review suggestions

- adds back typechecked for brew
- uses parameterized tests for test_peps_hist_nnls

* Add check for length of mokapot output file

* Make test for output file length more "elastic"

* Revert documentation on psms parameter for brew function.

And add some additional info.

* Change f-string to normal string where unnecessary

* Remove types_from_dataframe function, since unnecessary

* (chore) updated cicd,ruff,black and tests (#42)

* (chore) updated cicd, ruff, black and tests
* resolved comment question
by @jspaezp

* ✨ fix setting scores when training failed

* Readd previously commented out check for feature columns

* ✨ proper docstring for `sqlite_db_path`

* Add doc strings for write_confidence (and improve types)

* Improve documentation and type hints of assign_confidence

* Add class and module documentation for the tabular data classes

* Feature/remove nnls patch (#43)

* ✨ remove patched nnls (using fixed scipy version now)
* 💄 linting; line breaks

Co-authored-by: Elmar Zander <elmar@zandere.de>

* Fix/windows tests (#44)

* Remove nnls patch and fix scipy version (with fixed nnls)

* ✨ remove patched nnls (using fixed scipy version now)

* 💄 linting; line breaks

* 🚑 fix dtypes to ensure windows ci to pass

---------

Co-authored-by: Elmar Zander <elmar@zandere.de>

* Fix/windows tests (#45)

* Remove nnls patch and fix scipy version (with fixed nnls)

* ✨ remove patched nnls (using fixed scipy version now)

* 💄 linting; line breaks

* 🚑 fix dtypes to ensure windows ci to pass

* 💄  linting

---------

Co-authored-by: Elmar Zander <elmar@zandere.de>

* ✨ draft of pin to tsv converter

* ✨ adds is_valid_tsv

* ✨ adds tsv verification for pin files and conversion

* 📝 remove print

* 🔥 add required default for --dest_dir

- using Mokapot without --dest_dir was broken without this new default
  before

---------

Co-authored-by: Siegfried Gessulat <siegfried@msaid.de>
Co-authored-by: Elmar Zander <elmar@zandere.de>
Co-authored-by: sambenfredj <100685091+sambenfredj@users.noreply.github.com>
Co-authored-by: Elmar Zander <elmar.zander@toptal.com>
Co-authored-by: Vishal Sukumar <vishal.sukumar@msaid.de>
Co-authored-by: Florian Seefried <florian.seefried@msaid.de>
Co-authored-by: Graber Michael <michael.graber@msaid.de>
Co-authored-by: Tobias Schmidt <tobias.schmidt@msaid.de>
Co-authored-by: J. Sebastian Paez <jspaezp@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants