-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
analysis tests #62
analysis tests #62
Conversation
@pm-blanco Are these analysis functions actually used by pyMBE? Maybe we should consider removing or at least updating some of them? |
@kosovan |
@kosovan I took the liberty to finish this PR. I cleaned up |
@paobtorres this PR is now ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few inconsistencies in the docstring of this file. For example, in Args from analyze_time_series()
has no items (-) and the Returns provides only the format output. Then, in block_analyze()
each Args has an item symbol (-) the the Returns contains the variable name of the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paobtorres thank you for your revision and for bringing up this point. I have reviewed the docstrings of analysis.py
and pyMBE.py
to tried to standardize the format and, in the process, I solved numerous inaccuracies and ambiguities on variable types. Regarding the format of the Return arguments, I think that we should actually not enforce that there should be a variable name in the docstrings because there are some functions like pmb.check_if_df_cell_has_a_value
or check_if_name_is_defined_in_df
that do not really store their output into a variable but simply return it and then the user can put any arbitrary variable name to it. I would suggest to either drop all variable names for return arguments (because they are actually not important for users anyway) or to only add the variable name in return arguments when it is actually defined in the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pm-blanco I think that depends on which Python code docstring guideline we follow. I believe that when there is a variable name we should put it because it helps the user to better understand how each function works by reading the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paobtorres we follow the Google Style Python Docstrings. In that standard, the variable names are not provided for return arguments. I do not think that it is enforced that we should not provide variable names, so we can agree that we provide variable names for return variables to help users reading the code but we do not enforce that all functions should have a return variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the docstring in analysis.py
should be changed to keep a consisting format. Other than that, I did some testing and everything ran successfully.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the changes and quickly ran a test everything is ok.
Provides unit tests for:
lib.analysis.get_params_from_file_name()
lib.analysis.block_analyze()
lib.analysis.get_dt()
lib.analysis.split_dataframe_in_equal_blocks()
lib.analysis.split_dataframe()
lib.analysis.add_data_to_df()
lib.analysis.analyze_time_series()
Deprecates:
lib.analysis.do_binning_analysis()
lib.analysis.merge_time_series_dfs()
lib.analysis.get_time_series_from_average_df()
lib.analysis.read_csv_file()
lib.analysis.get_distribution_from_df()
lib.analysis.create_histogram_df_from_distribution_list()
lib.analysis.find_index_with_value_in_df()
Coverage in
analysis.py
: 100%; Increases the total coverage of the module to 83%