Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add local file option and formatter #48

Merged
merged 46 commits into from
May 2, 2024
Merged

Conversation

AlecThomson
Copy link
Contributor

@AlecThomson AlecThomson commented Apr 10, 2024

Allows local file support (i.e. if you wget a bunch of IONEX data for offline use).

Adds a formatter for the ever-changing IONEX formats.

Also some tweaks to typing for dev use

Closes #46

@maaijke maaijke marked this pull request as ready for review April 19, 2024 14:09
@maaijke maaijke self-assigned this Apr 19, 2024
RMextract/getIONEX.py Outdated Show resolved Hide resolved
RMextract/getIONEX.py Outdated Show resolved Hide resolved
@AlecThomson
Copy link
Contributor Author

@maaijke - I've reworked a few things, which should address your comments. I've also added some more standard use of pathlib and urllib. Although, I haven't been able to do this throughout the entire codebase. The former makes life much easier for dealing with paths (over using strings and os.path).

I've added a new formatters submodule, which should lay things out more explicitly for a user. I've also dropped the redundant yy from the format.

On the change of filename extension, I think it should be easy enough to do in the new submodule. For example (that I'm making up), if the extensions change after 2025 we could add something like:

def ftp_aiub_unibe_ch(server: str, year: int, dayofyear: int, prefix: str) -> str:
    """
    Format the URL for the ftp.aiub.unibe.ch server
    """
    ext = ".Z" if year <= 2024 else ".gz"
    return f"{server}/CODE/{year:4d}/{prefix.upper()}{dayofyear:03d}0.{year%100:02d}I{ext}"

@AlecThomson
Copy link
Contributor Author

Oh, I also did a quick bit of autoformatting to make my IDE happy. Sorry if this touched a few files. This mostly fixed up the order of imports (compliant with isort), so shouldn't change any actual code behaviour.

Copy link
Collaborator

@gmloose gmloose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You made a lot of good improvements. This is very much appreciated, because the code needed some extra love and care. I made a few minor remarks.

I saw in the email conversation you also suggested to use a linting tool like ruff, and to use more rigorous code formatting (I would suggest to use black for that). I can only support that.
Also, I think the package desperately needs tests. I consider that even more important than proper formatting.

Please feel free to become more closely involved in improving RMextract.

Comment on lines 20 to 23
logger.debug('casacore will be used to compute positions')
except ImportError:
logger.debug('We will need PyEphem to perform calculations!')
logger.debug('the accuracy of results might decease a bit')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would log this at info level.

RMextract/PosTools.py Outdated Show resolved Hide resolved
RMextract/PosTools.py Show resolved Hide resolved
RMextract/getIONEX.py Outdated Show resolved Hide resolved
RMextract/getIONEX.py Show resolved Hide resolved
RMextract/getIONEX.py Show resolved Hide resolved
RMextract/getIONEX.py Outdated Show resolved Hide resolved
RMextract/getIONEX.py Show resolved Hide resolved
RMextract/getRM.py Show resolved Hide resolved
use_mean: Optional[bool] = None,
stat_names=[],
useEMM=False,
object = '',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind-a ugly to use a built-in class name as variable name. You may want to rename it.

Copy link
Contributor Author

@AlecThomson AlecThomson Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - this argument was copied from the existing hidden kwarg:

if key=='object':

Happy to change, but it could break existing use elsewhere.

@gmloose
Copy link
Collaborator

gmloose commented Apr 24, 2024

One thing that bothers me a bit, but is unrelated to your changes is the dependency on PySocks. That package seems to be unmaintained. The latest release was made in 2019, and I read somewhere that people already run into issues with Python 3.10. I think it should be replaced with something more modern (don't know yet which package that would be), or we should fall back to the socket package from the Python standard library.

@AlecThomson
Copy link
Contributor Author

You made a lot of good improvements. This is very much appreciated, because the code needed some extra love and care. I made a few minor remarks.

I saw in the email conversation you also suggested to use a linting tool like ruff, and to use more rigorous code formatting (I would suggest to use black for that). I can only support that. Also, I think the package desperately needs tests. I consider that even more important than proper formatting.

Please feel free to become more closely involved in improving RMextract.

Thanks @gmloose! Happy to help out where I can!

A few top-level thoughts and some additional context. My changes here started from using RMextract in a) a highly parallel manner, and b) an environment where FTP was blocked on compute nodes. This is what lead me to adding the formatting option (allowing for easier offline file access) and catching some race conditions. I'll admit that perhaps this PR has grown a little larger than that narrower set of objectives. For the most part, I tried not to touch any of the actual logic in the existing code.

@maaijke mentioned that some more major reworks are potentially in order as well, which added some more motivation for me not to change much of the logical code - rather I tried to make the existing logic clearer for any devs / contributors (starting with myself!). I'm happy to provide some suggestions on how a rework might look - but I can't commit the time to the actual implementation. Plus, I think there'd need to be clear agreement from the core maintainers about how to go about that.

I'm happy to start adding tests for the functions I've 'touched' along the way. I've found this to be the easiest way to get unit tests in for a large project that has grown without them. Some less unitary end-to-end tests are useful as well. Perhaps after this PR merge we can start looking at creating those frameworks.

On formatters, I'm also a fan of Black. However, I've recently made the move to Ruff as it seems to be where the wider Python community is moving - and the completely replicates the feature set with greater speed.

One thing that bothers me a bit, but is unrelated to your changes is the dependency on PySocks. That package seems to be unmaintained. The latest release was made in 2019, and I read somewhere that people already run into issues with Python 3.10. I think it should be replaced with something more modern (don't know yet which package that would be), or we should fall back to the socket package from the Python standard library.

I added socks to the setup.py as it was already a silent dependency in some of the functions. However, I think it went unnoticed since it was buried in proxy server logic.

@gmloose
Copy link
Collaborator

gmloose commented Apr 24, 2024

One thing that bothers me a bit, but is unrelated to your changes is the dependency on PySocks. That package seems to be unmaintained. The latest release was made in 2019, and I read somewhere that people already run into issues with Python 3.10. I think it should be replaced with something more modern (don't know yet which package that would be), or we should fall back to the socket package from the Python standard library.

Funny. The well-known and often-used requests package added support for socks in version 2.10. And guess what..., for this it depends on the third-party package PySocks. 🤔

@gmloose
Copy link
Collaborator

gmloose commented May 1, 2024

@maaijke, do you want to press the Merge button? Please do not forgot to select "Squash and merge".

I think I need one more change to make sure the Linc interface script is still working. Let me add that and then we can go for it. Can we add Alec as a contributor to the main page?

@maaijke maaijke merged commit 45498a5 into lofar-astron:master May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for local path to IONEX data
3 participants