-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding more features for Enzo, yt, and rockstar #205
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This looks really neat as we discussed before.
I spotted a few things that I've commented on individually, all very minor.
The failing integration test should start passing if you merge the latest master branch -- it is related to changes there rather than anything that looks wrong here.
I was wondering whether you have any smallish sample files that you would be willing to add to the testing files distribution? This way we could add them to the test database build to guard against us inadvertently breaking things later, as most of us are not using yt so it might fly under the radar.
@@ -206,8 +217,37 @@ def filename(cls, timestep_filename): | |||
if basename.startswith("snapshot_"): | |||
timestep_id = int(basename[9:]) | |||
return os.path.join(dirname, "out_%d.list"%timestep_id) | |||
elif basename.startswith(("RD","DD")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the basename starting with RD/DD indicate that rockstar was run with yt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I've separated all of the DD/RD checks out from the checks for datasets.txt. The new structure is: check for 'snapshot_' format, check for the existence of 'datasets.txt', then check for 'RD/DD' format. I guess we could also move the datasets.txt check above the 'snapshot_' check in case outputs with this format were run with yt's rockstar, but I don't know enough about GADGET outputs to know if that's useful.
tangos/examples/misc.py
Outdated
import re | ||
|
||
|
||
def timestep_index(self,tstep,**kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is potentially confusing because some people may ingest simulations with only some time steps from the original sequence. If I understand what you're doing here correctly, it's pulling out the probable output number from the filename of the simulation itself. If so, I'm not sure this is worth including, as may cause more confusion than help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! I removed this.
@@ -206,8 +217,37 @@ def filename(cls, timestep_filename): | |||
if basename.startswith("snapshot_"): | |||
timestep_id = int(basename[9:]) | |||
return os.path.join(dirname, "out_%d.list"%timestep_id) | |||
elif basename.startswith(("RD","DD")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the significance of starting with RD or DD -- why does this indicate that rockstar was run with yt? Would it be ok just to look for the existence of datasets.txt
as your next check does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above - you're right, no reason only Enzo outputs can use yt's rockstar!
tangos/input_handlers/yt.py
Outdated
|
||
def load_timestep_without_caching(self, ts_extension, mode=None): | ||
from yt.data_objects.particle_filters import add_particle_filter | ||
if mode!=None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be mode is not None
for style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
tangos/properties/yt/basic.py
Outdated
potids = dbid[masses<halo_entry['Mvir']] | ||
return np.array([db.get_halo(x) for x in potids[host_mask]]) | ||
|
||
class GetTimestepName(LivePropertyCalculation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this already exists as step_path
so I would prefer not to duplicate it here under another name
You can find the existing implementation in live_calculation.builtin_functions.IntrinsicProperties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! Removed this.
tangos/properties/yt/basic.py
Outdated
|
||
class FindCenter(PropertyCalculation): | ||
"""Returns center array in halo finder units""" | ||
names = "Center" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally names of properties implemented in tangos don't have a capital letter in them unless there is a particular reason. e.g. M200
deserves a capital because it is referring to a variable name with the capital, but Center
to my mind doesn't because it's just a normal word (so I'd call it center
). I realise there are a few exceptions, but generally this is true. Is it possible to make the property names lower case here, or will that break your workflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine! I've changed the variable names so that most are lower case. The only ones I left as is were those that are translated directly out of rockstar (e.g., X_Mpc). Is this okay? Or would you prefer that these be lower case, as well?
if match: | ||
# Check whether datasets.txt exists (i.e., if rockstar was run with yt) | ||
if os.path.exists(os.path.join(basedir, "datasets.txt")): | ||
with open(os.path.join(basedir, "datasets.txt")) as f: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice you've used this datasets.txt
files in a few different places. It may be that the logic about its use deserves to be in a utility function somewhere, and called from all the different places rather than reimplemented each time. Not 100% necessary, just would be neater if it's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a shot at writing a quick utility function for this. For now, it's in tangos/util/read_datasets_file.py. Is this the right place for it? (and does the function itself look okay? It seems to work as expected, but I know there might be a more efficient way to accomplish the same thing)
Thank you so much for your feedback - I'm sorry it's taken me so long to get back around to this! I finally had a chance to do some work on the PR over the last couple of days. I've messaged Molly and Jason about sending you a few sample files. We ran some very small dwarf volumes about a year ago that I think might work and which I doubt they'll mind sharing. I'll let you know when I hear back from them! |
It turns out even our dwarf volumes are fairly hefty. However, there are a couple of low resolution Enzo datasets that are both very small and publicly available, so I ran rockstar and my fork of tangos on a few timesteps from one of them. Everything seems to have gone smoothly! I've uploaded all of the relevant data here. Do you think this would work as an addition to the testing files distribution? |
Thanks so much for doing this! It is actually much smaller than I meant by 'smallish' (I thought GB would still be fine) but that's not a problem - it can still be added as part of the test database build. Leave that with me and I will try to merge this soon. The code looks great now. |
Awesome! |
Hi @anchwr -- I've just got back to merging this, and I am wondering if you could guide me on what can be tested using your small snapshots. I can |
Oh! Good question. Here are the commands I used to make the db:
We probably don't need all of that, as some of it is really testing the same set of functions as other bits, but it should hopefully all work! |
This is perfect, thanks. When you use (You can see an example here: https://github.com/pynbody/tangos/actions/runs/3051994309/jobs/4920876522#step:10:1244) |
Thanks very much for all this work. It's now merged. We should deal with |
Oh that's bizarre (and a bit worrying)! I don't get those warnings. Maybe I can experiment a bit with yt and gc and see if there are certain circumstances under which they just don't cooperate. |
Yes, if you have any insights let me know! |
PS it may be a yt version thing? I am using v4.0.5 and so is the github action. |
Oh, maybe! I'm on the 4.1 dev version. I will try installing 4.0.5 and see if the warnings appear. |
No warnings so far! I'll keep thinking about it, though. |
I've added a handler for Enzo simulations run with the rockstar halo finder, a few extra properties for yt, and the option to translate some of the quantities calculated by rockstar into physical units.