Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding more features for Enzo, yt, and rockstar #205

Merged
merged 17 commits into from
Sep 14, 2022
Merged

Conversation

anchwr
Copy link
Contributor

@anchwr anchwr commented Jul 18, 2022

I've added a handler for Enzo simulations run with the rockstar halo finder, a few extra properties for yt, and the option to translate some of the quantities calculated by rockstar into physical units.

Copy link
Member

@apontzen apontzen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks really neat as we discussed before.

I spotted a few things that I've commented on individually, all very minor.

The failing integration test should start passing if you merge the latest master branch -- it is related to changes there rather than anything that looks wrong here.

I was wondering whether you have any smallish sample files that you would be willing to add to the testing files distribution? This way we could add them to the test database build to guard against us inadvertently breaking things later, as most of us are not using yt so it might fly under the radar.

@@ -206,8 +217,37 @@ def filename(cls, timestep_filename):
if basename.startswith("snapshot_"):
timestep_id = int(basename[9:])
return os.path.join(dirname, "out_%d.list"%timestep_id)
elif basename.startswith(("RD","DD")):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the basename starting with RD/DD indicate that rockstar was run with yt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I've separated all of the DD/RD checks out from the checks for datasets.txt. The new structure is: check for 'snapshot_' format, check for the existence of 'datasets.txt', then check for 'RD/DD' format. I guess we could also move the datasets.txt check above the 'snapshot_' check in case outputs with this format were run with yt's rockstar, but I don't know enough about GADGET outputs to know if that's useful.

import re


def timestep_index(self,tstep,**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is potentially confusing because some people may ingest simulations with only some time steps from the original sequence. If I understand what you're doing here correctly, it's pulling out the probable output number from the filename of the simulation itself. If so, I'm not sure this is worth including, as may cause more confusion than help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! I removed this.

@@ -206,8 +217,37 @@ def filename(cls, timestep_filename):
if basename.startswith("snapshot_"):
timestep_id = int(basename[9:])
return os.path.join(dirname, "out_%d.list"%timestep_id)
elif basename.startswith(("RD","DD")):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significance of starting with RD or DD -- why does this indicate that rockstar was run with yt? Would it be ok just to look for the existence of datasets.txt as your next check does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above - you're right, no reason only Enzo outputs can use yt's rockstar!


def load_timestep_without_caching(self, ts_extension, mode=None):
from yt.data_objects.particle_filters import add_particle_filter
if mode!=None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be mode is not None for style

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

potids = dbid[masses<halo_entry['Mvir']]
return np.array([db.get_halo(x) for x in potids[host_mask]])

class GetTimestepName(LivePropertyCalculation):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this already exists as step_path so I would prefer not to duplicate it here under another name

You can find the existing implementation in live_calculation.builtin_functions.IntrinsicProperties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Removed this.


class FindCenter(PropertyCalculation):
"""Returns center array in halo finder units"""
names = "Center"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally names of properties implemented in tangos don't have a capital letter in them unless there is a particular reason. e.g. M200 deserves a capital because it is referring to a variable name with the capital, but Center to my mind doesn't because it's just a normal word (so I'd call it center). I realise there are a few exceptions, but generally this is true. Is it possible to make the property names lower case here, or will that break your workflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine! I've changed the variable names so that most are lower case. The only ones I left as is were those that are translated directly out of rockstar (e.g., X_Mpc). Is this okay? Or would you prefer that these be lower case, as well?

if match:
# Check whether datasets.txt exists (i.e., if rockstar was run with yt)
if os.path.exists(os.path.join(basedir, "datasets.txt")):
with open(os.path.join(basedir, "datasets.txt")) as f:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice you've used this datasets.txt files in a few different places. It may be that the logic about its use deserves to be in a utility function somewhere, and called from all the different places rather than reimplemented each time. Not 100% necessary, just would be neater if it's possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a shot at writing a quick utility function for this. For now, it's in tangos/util/read_datasets_file.py. Is this the right place for it? (and does the function itself look okay? It seems to work as expected, but I know there might be a more efficient way to accomplish the same thing)

@anchwr
Copy link
Contributor Author

anchwr commented Aug 10, 2022

Thank you so much for your feedback - I'm sorry it's taken me so long to get back around to this! I finally had a chance to do some work on the PR over the last couple of days. I've messaged Molly and Jason about sending you a few sample files. We ran some very small dwarf volumes about a year ago that I think might work and which I doubt they'll mind sharing. I'll let you know when I hear back from them!

@anchwr
Copy link
Contributor Author

anchwr commented Aug 20, 2022

It turns out even our dwarf volumes are fairly hefty. However, there are a couple of low resolution Enzo datasets that are both very small and publicly available, so I ran rockstar and my fork of tangos on a few timesteps from one of them. Everything seems to have gone smoothly! I've uploaded all of the relevant data here. Do you think this would work as an addition to the testing files distribution?

@apontzen
Copy link
Member

Thanks so much for doing this! It is actually much smaller than I meant by 'smallish' (I thought GB would still be fine) but that's not a problem - it can still be added as part of the test database build. Leave that with me and I will try to merge this soon. The code looks great now.

@anchwr
Copy link
Contributor Author

anchwr commented Aug 21, 2022

Awesome!

@apontzen
Copy link
Member

Hi @anchwr -- I've just got back to merging this, and I am wondering if you could guide me on what can be tested using your small snapshots. I can tangos add them successfully, but then it would be good to import/calculate some properties and merger trees to test all is working well. What should I expect to work?

@anchwr
Copy link
Contributor Author

anchwr commented Sep 13, 2022

Oh! Good question. Here are the commands I used to make the db:

tangos add enzo.tinycosmo --handler=yt.YtInputHandler --min-particles 100
tangos import-consistent-trees --for enzo.tinycosmo --with-ids
tangos import-properties Mvir Rvir X Y Z VX VY VZ --for enzo.tinycosmo
tangos import-properties Mvir_Msun Rvir_kpc X_Mpc Y_Mpc Z_Mpc --for enzo.tinycosmo
tangos write center center_Mpc --for enzo.tinycosmo
mpiexec -np 8 tangos write Mgas Mcoldgas Mstar contamfrac --for enzo.tinycosmo --backend mpi4py

We probably don't need all of that, as some of it is really testing the same set of functions as other bits, but it should hopefully all work!

@apontzen
Copy link
Member

This is perfect, thanks.

When you use tangos write, do you see warnings along the lines of check_deleted failed? I am getting these, and I think it means yt is not allowing the individual snapshots to be garbage-collected, which could result in spiralling memory usage. Was wondering if you saw these also.

(You can see an example here: https://github.com/pynbody/tangos/actions/runs/3051994309/jobs/4920876522#step:10:1244)

apontzen added a commit that referenced this pull request Sep 14, 2022
@apontzen apontzen merged commit 24db730 into pynbody:master Sep 14, 2022
@apontzen
Copy link
Member

apontzen commented Sep 14, 2022

Thanks very much for all this work. It's now merged. We should deal with check_deleted failed separately, will be interested to hear your thoughts on it

@anchwr
Copy link
Contributor Author

anchwr commented Sep 14, 2022

When you use tangos write, do you see warnings along the lines of check_deleted failed? I am getting these, and I think it means yt is not allowing the individual snapshots to be garbage-collected, which could result in spiralling memory usage. Was wondering if you saw these also.

(You can see an example here: https://github.com/pynbody/tangos/actions/runs/3051994309/jobs/4920876522#step:10:1244)

Oh that's bizarre (and a bit worrying)! I don't get those warnings. Maybe I can experiment a bit with yt and gc and see if there are certain circumstances under which they just don't cooperate.

@apontzen
Copy link
Member

Yes, if you have any insights let me know!

@apontzen
Copy link
Member

PS it may be a yt version thing? I am using v4.0.5 and so is the github action.

@anchwr
Copy link
Contributor Author

anchwr commented Sep 14, 2022

Oh, maybe! I'm on the 4.1 dev version. I will try installing 4.0.5 and see if the warnings appear.

@anchwr
Copy link
Contributor Author

anchwr commented Sep 14, 2022

No warnings so far! I'll keep thinking about it, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants