Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Environment variables #7

Closed
hannorein opened this issue Jan 14, 2023 · 29 comments
Closed

Environment variables #7

hannorein opened this issue Jan 14, 2023 · 29 comments

Comments

@hannorein
Copy link
Collaborator

I suggest not using environment variables. I think this can be confusing to users who might not be familiar with the concept (one rarely comes across environment variables when using standard python packages).

Right now environment variables are only used in two locations to specify the path for data files. I suggest having an explicit init function that has to be called by the user before using assist. Something like

#include "assist.h"
assist_init("path/to/planet/ephem", NULL);
// or
assist_init("path/to/planet/ephem", "path/to/smallbody/ephem");

and similarly in python

import assist
assist.init("path/to/planet/ephem")
# or 
assist.init("path/to/planet/ephem", "path/to/smallbody/ephem")

If this function call results in an error for the user, then it's immediately clear what the issue is and how to fix it. It removes the stumbling block of learning what an environment variable is and how to set it.

Somewhat related to this, the metadata and the pointer to the data structures get stored in a static variable, which is effectively a global variable. If you only every read from that variable after it's initialized (which I think is the case), then the reading part is thread safe. However, the initialization is not. Imagine two threads both initializing this structure at the same time. It might work in practice, because it's a simple data structure, but there really should be a mutex around the init function if we want to claim it's thread safe (and to prevent we're accidentally loading in the data and allocating memory more than once).

@matthewholman
Copy link
Owner

Should this be part of the existing assist_initialize() function or something separate?

@hannorein
Copy link
Collaborator Author

I'm not sure yet how to do this best...

@matthewholman
Copy link
Owner

I made a first attempt at this. There is now an assist_ephem_init("/path/to/planets/file", "path/to/asteroids/file") function. This has to be called before assist_integrate() is called.

There is probably a more graceful solution, but this is a start.

I need to add a corresponding python function.

@matthewholman
Copy link
Owner

Hanno, you'll need to add the paths to your ephemeris files for the unit tests to pass.

@hannorein
Copy link
Collaborator Author

Cool. Something like this is what I had in mind. I'm teaching today, but I'll think about how to make it more graceful/optional. I think it would make sense to look for the datafile in ASSIST_DIR/data by default.

This is much easier in python where you can just have an optional argument.

@aryaakmal
Copy link
Collaborator

aryaakmal commented Jan 23, 2023

Matt- it looks like your changes are specific to 'mholman' ... does the user need to modify the code ad hoc, or do you intend to make this more general? Update: Nevemind - looks like that is only in the unit tests (and the problem.c) .

@aryaakmal
Copy link
Collaborator

Would it make sense to look for the ephemeris files in the default location and only call assist_ephem_init if they are not found? Or modify the function in forces.c to look in the default location ...

@matthewholman
Copy link
Owner

The main ideas are (1) to avoid using environment variables, because users are not always familiar with those; and (2) to have the users explicitly call something like assist_ephem_init() with paths to the ephemeris files, so that any errors are very clear.

@aryaakmal
Copy link
Collaborator

aryaakmal commented Jan 23, 2023

So, by default assist looks for rebound installation in the same directory unless the REB_DIR environment variable is set. By the same token, cannot assist look for the files in assist/data by default before calling assist_ephem_init? The version with environment variables would look there, unless the environment variable were set to something else - so this would follow the same logic without requiring the user to set them, or to call assist_ephem_init, unless they choose to use some non-default location.

@hannorein
Copy link
Collaborator Author

I like the idea of @aryaakmal. We could have a char[] in assist_extras which is set NULL by default (look for files in default directory). If the user calls assist_ephem_init (or maybe assist_set_path) then it would set the char[] to something other than null and ASSIST would look there for the files.

@hannorein
Copy link
Collaborator Author

(with a graceful exit and message if the files are not in the default location)

@aryaakmal
Copy link
Collaborator

aryaakmal commented Jan 23, 2023

The same logic should apply to the wrapper. Currently a pip install of assist will not find the ephemeris files unless the environment variable is set - or unless the files are in the default location and the following is set

sys.path.append(r'/Users/mholman/assist')

@matthewholman
Copy link
Owner

I like the idea of storing the file names (or possibly pointers to the ephemeris objects) in assist_extras, but I am getting a little stuck on the order of things. With the current design, memory for a version of assist_extras is allocated when assist_integrate() calls assist_attach(), which then calls assist_initialize(). But that happens after assist_ephem_init().

@matthewholman
Copy link
Owner

I pushed another attempt at this, but it seems to have broken the benchmark unit test (and perhaps others).

@hannorein
Copy link
Collaborator Author

hannorein commented Jan 24, 2023

I agree, that order of function calls and memory management are a bit confusing. We struggled a lot with that with REBOUND and REBOUNDx too. One issue with having one function assist_integrate() is that everything is happening in one function call. So any new functionality needs to be a new argument which can quickly become overwhelming for someone who only uses a fraction of the functionality. If you look at some other C libraries, the workflow is typically something like:

struct object = smth_create();
object->setting = 0;
smth_do(object);
smth_free(object);

This is what I tried to replicate using the "plain interface". When it comes to memory management, the important question to always ask is: who owns the memory? The answer almost always helps with figuring when to allocate and free memory. For example, right now the space for the ephemeris data is allocated in assist_jpl_init. But it's never freed. That's not completely wrong, because the memory is allocated as a static variable. But I think it would make sense to add the data to the assist_extras struct. Then it's clear who it belongs to and it can be freed by assist_free.

@matthewholman
Copy link
Owner

This makes a lot of sense, but I think I am still missing something. Does this mean that there can only be one assist_extras struct?

@hannorein
Copy link
Collaborator Author

No quite the opposite. You could have multiple assist_extras in one simulation. For example, say, one which has GR turned on and one where it's off. Then you could integrate both of them and study the effect of GR on the trajectory.

@matthewholman
Copy link
Owner

That's what I hoped! But how do we share the memory allocated by assist_jpl_int and then free it when all the runs are finished?

@hannorein
Copy link
Collaborator Author

hannorein commented Jan 24, 2023

I'm not sure about the best approach! I can see two options:

  1. Keep that memory static, but add a "reference counter" (also static) that gets increased when assist_extras is called and decreased when assist_free is called. When the last assist_free sees a reference counter of 0, it can free the static memory.
  2. Just not bother and have the memory allocated multiple times so that each assist_extras struct owns their own memory can can free it when itself is freed.

I think 2) would be the cleanest approach. It also works on multiple threads without any extra work. Do you think performance would be an issue? It depends how short or long individual integrations typically are.

@hannorein
Copy link
Collaborator Author

Somewhat related: I'm trying to understand a bit better what assist_jpl_init does. It seems to read in names and values of constants in the ephemeris file. But they are never used. Instead, the constants that are hard coded in const.h are used. That file was generated with dev_tools/constants/main.c.

Wouldn't it make sense to just use the constants in the ephemeris file without this intermediate step?

@matthewholman
Copy link
Owner

Option 2 is definitely cleaner, but there is some overhead in the initialization step, as you just saw. Option 1 seems easy enough, but I don't know what would be entailed in making that thread safe.

@aryaakmal
Copy link
Collaborator

aryaakmal commented Jan 24, 2023 via email

@matthewholman
Copy link
Owner

The const.h file gives us easy access to the various constants, but it be done other ways. I believe @aryaakmal is correct that the version in jpl_init is vestigial.

@hannorein
Copy link
Collaborator Author

(I think it would be more consistent to read the constants from the ephemeris file directly. If the file ever changes, one would need to manually recreate const.h. And given that the ephemeris file is constantly queried for all the actual ephemeris data, the ~20 constants that would be read only once at the beginning should not affect the performance. But I think that leads us away from the main topic.)

Regarding the bigger picture. How about the following:

We could create a new structure assist_ephem which represents the ephemeris data. The syntax could look something like:

struct assist_ephem* ephem = assist_ephem_alloc("path to file");
...
xyz = assist_ephem_get_position(ephem, PLANET_MARS, 12345.55);
...
assist_ephem_free(ephem);

After creating assist_ephem, it's read only. Nothing ever needs to be changed. Because of that, thread safety is easy. Functions that just query the position of a body at a given time such as assist_ephem_get_position above would only need access to assist_ephem. They don't need access to an assist_extras. Maybe users actually like the ability to simply query the ephemeris file for positions without doing an integration. Internally, the force routines would also call something like assist_ephem_get_position.

When one wants to do an integration then one creates an assist_extras structure. This would need two arguments, the REBOUND simulation to attach itself to, and an assist_ephem to query the ephemeris. So the entire syntax could look something like this:

struct reb_simulation* sim = reb_create_simulation()
struct assist_ephem* ephem = assist_ephem_alloc("path to file");
struct assist_extras* extras = assist_extras_init(sim, ephem);

assist_integrate(extras, ...);
// or
reb_integrate(sim, 1234.56);

...
rebound_simulation_free(sim);
assist_extras_free(extras);
assist_ephem_free(ephem);

One assist_ephem can be used by multiple assist_extras (because it's read only), so only one (slow) initialization is needed. The syntax is more verbose than before. But I think this would make sense logically. ASSIST is basically a glue code providing a bridge between REBOUND and the JPL ephemeris data. The JPL ephemeris functionality would be a feature of their own. And maybe attract a few more users who are only interested in accessing the ephemeris data and don't want to do an integration. Doing it this way would also get rid of all the static variables (which people in general don't like too much).

@matthewholman
Copy link
Owner

This is a nice solution, @hannorein. Thank you for sketching it out so clearly.

I agree that the ephemeris access routines will be useful by themselves. I've been using all_ephem() for various things; it's accessible from python. We can make sure the new version is too.

I think I can get started on this code, unless you or @aryaakmal would prefer to.

@hannorein
Copy link
Collaborator Author

Either way is fine. I should also have time tomorrow. One thing I’m not super happy with yet is the naming: init, create, alloc, attach all kind of do the same thing. I think I’m mostly to blame for this. It has never been consistent in Rebound …

@matthewholman
Copy link
Owner

unit_tests/benchmark has an issue where it appears that memory is being modified about being freed. The error says it's in rebound, but my bet is that it's something to do with the changes I made to assist_ephem_init(). Unfortunately, I haven't identified the problem yet. @hannorein, could you take a look?

@hannorein
Copy link
Collaborator Author

Will do.

I'm implementing the stuff we talked about above right now. Let me know if that interferes with what you're doing and I should stop.

@matthewholman
Copy link
Owner

No, it doesn't, but I will pause anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants