
Code cleanup and streamlining #6

Merged
merged 14 commits into emdb-empiar:dev on Jan 28, 2022

Conversation

hernot (Contributor) commented Jul 25, 2021

I could not help doing some streamlining, code clean-up and documentation updates, hopefully bringing the whole closer to an actual release.

Details about the tiny bits and polishing can be found in the commit messages.

I have tested/tox'ed the whole for Python 2.7, 3.6, 3.7 and 3.8. For 3.5 I do not have any installation any more, and for 3.9 not yet.

As for the extra.py module: if it is still an integral part of ahds, you will still have to write its test suite; it is the only one missing for reaching 100% code coverage.

* Cleanup of code that was commented out but not yet removed.
* Using the else clause of while and for loops where sensible (see the
  sketch right below) instead of patterns like:

  ```
  a = None
  while b:
      if c:
          a = c
          break
  if a is None:
      a = ''
  ```
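
  A sketch of the equivalent else clause form (with illustrative data in
  place of the schematic b and c of the snippet above):

  ```python
  queue = [0, 0, 3]          # illustrative data
  while queue:
      c = queue.pop(0)
      if c:                  # found the item we were looking for
          a = c
          break
  else:                      # runs only when the loop ended without break
      a = ''
  ```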

* remodeled the tests for data_stream.py, grammar.py and header.py to first
  test individual classes and their methods before testing complex
  usage scenarios. This simplifies the setUpClass method and allows errors
  to be identified more up front, without stepping through cascades of
  function calls. Added the tests/test_03_data_streams_common.py test suite,
  which tests functions and classes imported and required by header.py up front.

* reduced the methods exported by grammar.py to get_header only
* added custom parser classes derived from simpleparse.parser for
  AmiraMesh and HyperSurface files. Both share the same constructor
  interface, accepting additional keyword arguments. The latter are
  passed by the parse method of the HyperSurfaceParser class to the
  parse_hypersurface_data method to parse the content structure of
  HyperSurface files after parsing the header. This ensures that both
  parsers return a complete structure of the parsed files.

* updated the documentation and readme as far as possible.
  docs/ahds_classes.png still needs to be updated to reflect the new class
  structure. A check whether further documentation updates are appropriate
  will be necessary.
- github workflow now uses the tox-gh-actions plugin for tox (see the
  tox.ini sketch after this list):
  * all testing, including the syntax check, is now run from within a tox
    environment. This ensures that most errors can be identified through a
    local tox run and fixed before committing and pushing to github, for
    all locally installed python interpreter versions. Furthermore, most
    errors occurring when the github workflow runs can be reproduced through
    a local tox run. The only requirement for this is to ensure that the
    affected python version is installed locally.

- github workflow extended by Windows and macOS builds (experimental)

- some minor fixes for future breaking points identified through a
  local flake8 run.

TODO: we need to discuss which of the issues reported by flake8 should
      be addressed before release.
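
A minimal sketch of the kind of tox-gh-actions mapping mentioned in the
workflow bullet above (environment names are illustrative, not necessarily
the project's actual tox.ini):

```ini
[tox]
envlist = py27, py36, py37, py38

[gh-actions]
# maps the Python version provided by actions/setup-python to a tox env
python =
    2.7: py27
    3.6: py36
    3.7: py37
    3.8: py38
```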
hernot (Contributor, Author) commented Aug 16, 2021

People taking an apple, taking a bite, putting it on the table and calling it a computer are suspicious.
People staring inside through any window, complaining about everything and thereby pretending to work on the computer are not to be trusted.
Still, ahds now builds on Python >= 3 for both Windows and macOS, and on Python 2.7 for macOS.
I had to modify the GitHub Actions workflow a bit to ensure that all tests and the upload to Coveralls work on all platforms. The GitHub action selected originally only worked on Linux, and the official Coveralls GitHub action insists on loading lcov files, which are not generated by the pytest-cov module, only by coverage directly. Luckily, coveralls.py directly supports GitHub and can directly parse the .coverage database file. All the details are in the commit message.
Furthermore, I changed the workflow so that all testing, including the style check through flake8, is run through tox. That allows most of the testing for different Python versions to be done locally, before the first push.


--------------------------------------------
Installation
--------------------------------------------
Presently, ``ahds`` only works with Python 2.7 but will soon work on Python 3. Please begin by

Member: Is it possible to install without numpy? Please try in an empty virtual env.


pip install ahds

----------------------------------------------
License

Member: The license and copyright need to be retained. Could you please add these back?

@@ -112,7 +293,7 @@ The `designation` is the first line and conveys several important details about

* extra format data e.g. ``<hxsurface>`` specifying that an AmiraMesh file will contain HyperSurface data

A series of `definitions` follow that refer to data found in the data pointer sections that either begin with the word ‘define’ or have ‘n’ prepended to a variable. For example:
A series of `declarations` defines the structure and layout of the data and how many elements the corresponding array, vector or matrix contains. A declaration line starts with the word ‘define’ or, in files created by early Amira versions, has ‘n’ prepended. For example:

Member: Is declarations the official term?

hernot (Contributor, Author): Good question; I need to check whether they have a term for it in their reference guide. Otherwise I would believe we are free to find an appropriate name for the part defining the array/matrix size and the part describing each data item therein, and thus the data-pointers, in detail.

hernot (Contributor, Author): OK, I checked the reference guide for version 5; they talk about arrays and entries:

[...] one-dimensional array Nodes containing entries of type float[3] called
Coordinates, as well as a one-dimensional array Tetrahedra containing entries of
type int[4] called Nodes. [...]

and further down they describe the statement defining the array

[...] The statement define Nodes 4 defines a one-dimensional array of size 4.
Later on, this array can be referenced using the name Nodes. Similarly, a
statement define Array 100 100 defines a two-dimensional array of size 100 x
100 [...]

and the statement defining the data element / item

The statement Nodes { float[3] Coordinates } = @1 specifies that
for each element of the array Nodes defined earlier, three floating point numbers
(floats) should be stored. These data are given the name Coordinates and a tag
in this line and will appear below as a tagged block with the marker @1

So their wording uses array and element definition, and tag, tagged block and marker when they mean the @N symbol. I would therefore change the wording to array definitions and element definitions if we want to stick strictly to their wording. Should we also reword the corresponding grammar symbols and variable names to be consistent and to ease reading and understanding the code, for future contributors and for ourselves when we have to amend it in one of our next lives?

@@ -1,15 +1,32 @@
``ahds`` package
================

.. automodule:: ahds
.. automodule:: src.ahds

Member: I suppose this is for practical purposes, i.e. it only works when pointing to src.

hernot (Contributor, Author): Yes, it had something to do with testing, etc. I would have to check the commit message for the detailed reasons (I hope I put them there).

hernot (Contributor, Author) commented Sep 20, 2021:

OK, I have checked: this was necessary because in pull request #4, which you already merged with commit 964eff1, I had to change the file system structure to consistently be able to reproduce locally, using pytest and tox, a bug which I had encountered on the CI (back then still Travis). The only way I found back then to get hold of the bug without overstressing the CI was to change the filesystem structure to the one preferred by pytest.
This change now reflects that previous change in the doc sources, to ensure sphinx is able to collect the source file documentation from the new file locations.

Added tests
-----------
 * ahds_readonly_descriptor
 * ahds_parent_descriptor
 * ahds_member_descriptor
 * BlockMetaClass
 * Python 2 specific patches, e.g. dict

Added conftest.py to ensure the essential tests for
ahds_readonly_descriptor, ahds_parent_descriptor and BlockMetaClass can
be executed even if an error in BlockMetaClass or other essential parts
prevents the regular import of ahds/core.py.
For this it implements a pytest_sessionstart hook which tests whether importing
ahds/core.py succeeds. In case the import fails, a fallback module is loaded
and patched with all the components successfully created before the error
occurred. The resulting module is then placed in the sys.modules dictionary
instead of the failed module. This ensures that the test collection phase of
pytest succeeds even though the ahds/core module cannot be imported.
In case the import is not possible, the pytest_collection_finish hook adds the
pytest.mark.xfail mark to all tests unless a test has explicitly been
marked with the run_recover mark, which is defined by the
pytest_configure hook. Any test to which the run_recover mark is
attached will be run even if ahds/core.py cannot be imported and is
replaced by the patched fallback module. All other tests, which are
marked xfail, will not be run at all.
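
A minimal sketch of such a hook pair (illustrative only; it uses the more
common pytest_collection_modifyitems hook in place of the
pytest_collection_finish hook described above, and omits the patching of
partially created components):

```python
# conftest.py -- hypothetical sketch, not the actual ahds conftest
import sys
import types

import pytest

CORE_IMPORT_FAILED = False

def pytest_sessionstart(session):
    global CORE_IMPORT_FAILED
    try:
        import ahds.core  # noqa: F401
    except Exception:
        CORE_IMPORT_FAILED = True
        # install a fallback so collection can still import the name
        sys.modules["ahds.core"] = types.ModuleType("ahds.core")

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "run_recover: run even if ahds.core failed to import"
    )

def pytest_collection_modifyitems(config, items):
    if not CORE_IMPORT_FAILED:
        return
    for item in items:
        if "run_recover" not in item.keywords:
            # run=False means the test is not executed at all
            item.add_marker(
                pytest.mark.xfail(run=False, reason="ahds.core failed to import")
            )
```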

Python 2 dict remap
-------------------
Removed the _dict_iter_items, _dict_iter_keys and _dict_iter_values
exports of the unbound dict.iteritems, dict.iterkeys and dict.itervalues
functions. Instead, the Python 2 specific section defines a dict subclass
of builtins.dict that remaps these methods to dict.items, dict.keys and
dict.values; the original dict.items, dict.keys and dict.values methods
are mapped to dict.listitems, dict.listkeys and dict.listvalues. The
original builtins.dict class is replaced by this Python 3 conformant
facade.
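
A hypothetical sketch of such a facade (illustrative; the actual remapping
in ahds/core.py may differ in detail):

```python
# Python 2 only -- hypothetical sketch of the dict facade described above
import sys

if sys.version_info[0] == 2:
    import __builtin__ as builtins

    class dict(builtins.dict):
        # Python 3 style: items/keys/values return iterators instead of lists
        items = builtins.dict.iteritems
        keys = builtins.dict.iterkeys
        values = builtins.dict.itervalues
        # the original list-returning methods stay reachable under new names
        listitems = builtins.dict.items
        listkeys = builtins.dict.keys
        listvalues = builtins.dict.values

    builtins.dict = dict  # replace the builtin with the conformant facade
```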

TODO: it has to be tested whether any of the still used Python 2 based production
environments in which ahds is used requires that the dict.items, dict.keys and
dict.values methods return an indexable list instead of a DictItemsView,
DictKeysView or DictValuesView object. If this is the case, the Python 3
conformant remapping has to be removed again. Otherwise it will help to
motivate developers to stick to Python 3 instead of Python 2 for new
projects and programs.

Python 2 range remap
--------------------
Remapped builtins.range to builtins.xrange; the original range can be
accessed through builtins.slow_range.
TODO: if any productive software requires that range returns a list and
not an xrange object (both of which are indexable), this has to be undone
again.
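
Again a hypothetical sketch of the remap just described:

```python
# Python 2 only -- hypothetical sketch of the range remap described above
import sys

if sys.version_info[0] == 2:
    import __builtin__ as builtins

    builtins.slow_range = builtins.range  # keep the list-returning original
    builtins.range = builtins.xrange      # Python 3 like lazy range
```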

Some notes capturing ideas on how to write AmiraMesh files
----------------------------------------------------------
Added some notes on how writing out valid AmiraMesh files could be
accomplished. Just a first collection of thoughts and ideas for discussion.
hernot (Contributor, Author) commented Sep 20, 2021

OK, as Coveralls seems to have been fighting with their database since yesterday and still encounters problems, I will retry once again tomorrow, when uploading reports is flawlessly possible again, and then squash all the unsuccessful upload runs. @paulkorir, can you check in the GitHub Actions settings whether retriggering unsuccessful runs can be configured, and what possibilities the configuration allows to restrict this to you as maintainer and to the submitter of the pull request? Failed runs due to Coveralls downtime can happen anytime again.

hernot (Contributor, Author) commented Sep 21, 2021

OK, Coveralls is back up again, just right on the spot for pushing the update removing the numpy build dependency.
Furthermore, I have checked that the current pull request re-adds the license statement in README.rst and docs/overview.rst.

The last commit also includes some fixes reducing the number of code changes necessary on removal of Python 2. The tricky part about them is that they monkeypatch the builtins module to gain Python 3 compatible interface behavior. The downside is that these fixes only work safely as long as none of your still Python 2 based production environments importing the ahds library requires that dict.items(), dict.keys(), dict.values() and range return indexable list objects: the dict object is monkey patched with a facade class rewiring dict.items to dict.viewitems, dict.keys to dict.viewkeys and dict.values to dict.viewvalues, and the range function is monkey patched to point to the Python 3 range like xrange function. Let me know if any of these monkey patches could pose a problem on your systems not yet migrated to Python 3. Otherwise, as according to the roadmap dropping Python 2 support is not far ahead anyway, I would keep them.

I have also included in the weekend commit some thoughts on how writing AmiraMesh files could work. The idea is to have a starting point for a discussion and design phase.

EDIT
Just checked a consideration (idea) which came to my mind this morning. Counting the formatting support we already get from Python, I do think we have about 80% of what we need for writing; it just needs a refactoring of the _load_declarations and _load_definitions functions and an appropriate way to generate the Parameters section. With this also comes the question of what a simple interface should look like for creating a valid ahds object structure which can be converted into a valid AmiraMesh file and, in future, possibly also a valid HxSurface file. I will update the list of my thoughts with these considerations in the next days.

EDIT
Fixed some minor issue in the GitHub Actions workflow file causing Windows runners to upload an empty coverage report.

Building the decoders module no longer requires numpy. The
same holds for the hxbyterle_decoder fallback python function. Both now
return a bytearray object which can be passed to np.frombuffer for
converting it into a numpy array. Both functions, as well as the
raw decode function, now mimic the call signature of zlib.decompress,
which is used to decode HxZip encoded data.
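
A hypothetical usage sketch of the calling convention described above
(raw_decoder stands in for the actual decoder functions, whose names and
exact signatures may differ):

```python
import zlib

import numpy as np

def raw_decoder(data, size=None):
    """Stand-in decoder mimicking the zlib.decompress call signature."""
    return bytearray(data)

encoded = zlib.compress(b"\x00\x01\x02\x03")
hxzip_data = zlib.decompress(encoded)            # HxZip encoded streams
raw_data = raw_decoder(b"\x00\x01\x02\x03")      # fallback, returns bytearray
array = np.frombuffer(raw_data, dtype=np.uint8)  # numpy conversion is separate
```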

GitHub Actions update:
----------------------
On Windows runners, empty coverage reports were uploaded, leaving the
impression that no line was tested, even though pytest reported
successful completion and that all lines in all files had been tested,
except extra.py, for which no tests exist yet.

This issue is fixed by forcing GitHub Actions runners to use bash as the
shell in the preparatory coveralls upload steps:

  Coveralls python 2.7 preconditions
  Coveralls recent python preconditions

instead of the default shell, which would be PowerShell on Windows runners
and bash for all others.
According to the documentation, legacy PowerShell does not apply proper utf8
encoding, which is a precondition for successfully appending environment
variables to $GITHUB_ENV to be used in subsequent steps. Enforcing the use
of bash, according to the documentation, selects on Windows runners the bash
bundled with git for Windows instead of the default pwsh (PowerShell).
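
A minimal sketch of such a step (the step name is taken from above; the run
content and matrix variables are illustrative):

```yaml
- name: Coveralls recent python preconditions
  shell: bash        # force git-bash on Windows runners instead of pwsh
  run: |
    # values appended to $GITHUB_ENV are read back by subsequent steps
    echo "COVERALLS_FLAG_NAME=${{ matrix.os }}-${{ matrix.python-version }}" >> $GITHUB_ENV
```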
… name; fixed stream handle swallowing the final byte(s) of a stream when their value equals the b'\n' character; fixed a missing verbose flag check; tests still missing
hernot (Contributor, Author) commented Oct 20, 2021

While trying to use the forthcoming version in my productive work (testing with real-world data), I came across a rather strange HxSpreadSheet file with column names which do not include a column number counter and are all the same. That seemed, at least in the past, to be a valid AmiraMesh file. Therefore I'm currently refactoring my proposal on how specialised ContentTypes like HxSpreadSheet could be supported more reliably and hopefully then again written to file such that Amira is able to read them properly. I'm trying to come up with a simpler, better maintainable and thereby less complex solution. I already have an idea which I'm currently testing. It will need some days; I will let you know when finished.

While testing with old real data generated by Amira, a rather unusually
formatted header caused problems properly loading the data. The header
of the 'tests/data/EqualLabeledColumnsSpreadSheet.am' file defines
9 arrays with exactly the same name 'AT' and it also defines nine data
streams with exactly the same name 'AT'. Only from the parameter section
can it be deduced that each of the 9 arrays represents one '__Column',
as the '__ColumnNameXXXX' parameters still read '__ColumnName0000'
through to '__ColumnName0009'. With all 9 arrays and corresponding data
having the same name, only the last array was added to the AmiraHeader
structure. All others were dropped.

This is solved by refactoring the dispatch time filtering of
array_definitions for collecting the array definitions which could be
part of a dedicated special 'ContentType' like a 'HxSpreadSheet'. The
filtering is now only applied if the special 'ContentType' parameter
is encountered while parsing the Parameters section of the file.
The main purpose of the 'DispatchFilter' objects used for filtering is
twofold:
  1) Indicate whether the content of the file needs special handling
     during setup of the AmiraHeader file structure
  2) Filter and rewrite the names of the arrays and the corresponding
     data_names where necessary, and update the affected low level
     array_definition objects and data_definition objects accordingly

The benefit of this refactoring is that the dispatch time filtering
became more stable and the interface could be further simplified.
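
A hypothetical sketch of what such a DispatchFilter-style object could look
like (class name, method names and data layout are illustrative, not the
actual ahds interface):

```python
# hypothetical sketch -- not the actual ahds DispatchFilter interface
class SpreadSheetDispatchFilter:
    content_type = "HxSpreadSheet"

    def applies(self, parameters):
        # purpose 1: only engage when the special ContentType parameter
        # was encountered in the Parameters section
        return parameters.get("ContentType") == self.content_type

    def rewrite(self, array_definitions, parameters):
        # purpose 2: rename duplicate arrays (e.g. nine columns all named
        # 'AT') from the __ColumnNameXXXX parameters, so that every array
        # keeps a unique name on the AmiraHeader structure
        for index, definition in enumerate(array_definitions):
            new_name = parameters.get("__ColumnName%04d" % index)
            if new_name:
                definition["array_name"] = new_name
        return array_definitions
```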
hernot (Contributor, Author) commented Oct 21, 2021

OK, refactoring done. A nice effect: the interface for dispatch time prefiltering and, if necessary, recoding array_names, array_references and data_names became clearer, and the code to handle the included HxSpreadSheet example became even smaller and cleaner.
EDIT
After fixing the small bug in the regular expressions used to identify the start of HxSurface streams, everything works again. Along the way I added a static method called parametrize to the Py23FixTestcase class. Like pytest's own parametrize, it allows defining a test function which is called multiple times with different parameters. The reason for this home-brew version is that pytest.mark.parametrize is not supported by the unittest module, and the unittest module does not provide an equivalent. I didn't want to write the very same test code five times just to test the two stream header start regular expressions for "AmiraMesh" and "HxSurface" files with various 'ascii' and 'binary' files.
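
A minimal sketch of such a unittest-compatible parametrize helper
(illustrative; the actual Py23FixTestcase.parametrize may work differently,
and subTest requires Python 3):

```python
import unittest

def parametrize(*cases):
    """Decorator running the wrapped test once per parameter tuple."""
    def decorator(test_func):
        def wrapper(self):
            for case in cases:
                with self.subTest(case=case):
                    test_func(self, *case)
        return wrapper
    return decorator

class StreamHeaderRegexTests(unittest.TestCase):
    @parametrize(("AmiraMesh", True), ("HyperSurface", True), ("garbage", False))
    def test_recognises_stream_header(self, token, expected):
        self.assertEqual(token in ("AmiraMesh", "HyperSurface"), expected)

if __name__ == "__main__":
    unittest.main()
```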

…ing in case it cannot be excluded that files are opened for loading.
paulkorir merged commit ac19eac into emdb-empiar:dev on Jan 28, 2022