Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NamedStar() ignores Loader.directory #174

Closed
murison opened this issue Apr 20, 2018 · 9 comments
Closed

NamedStar() ignores Loader.directory #174

murison opened this issue Apr 20, 2018 · 9 comments

Comments

@murison
Copy link

murison commented Apr 20, 2018

skyfield.api.NamedStar(name), when it needs to download hip_main.dat.gz, ignores Loader.directory and instead always downloads to (and looks in) the current directory. This seems to be an oversight?

@brandon-rhodes
Copy link
Member

Yes! It was an experimental feature added by a volunteer, and never fully integrated into how Skyfield works; at some point I want to spend more time on stars, making it possible to do transforms on whole arrays of stars etc. "NamedStar" therefore isn't officially in the API docs at this point:

http://rhodesmill.org/skyfield/api.html

I'm happy to keep this issue open to remind us that we need a better story for stars and drawing star charts.

@anielsen001
Copy link

I created a Hipparcos class in a fork that tries to handle loading the Hipparcos catalog via the standard Loader object here: https://github.com/anielsen001/python-skyfield/blob/iohip/skyfield/data/hipparcos.py

It currently breaks the NamedStar class, but handles creating an Hipparcos class that can be accessed similar to a dictionary by ID and you can cross-reference with the named_star_dict to get a Star object.

I thought about creating a pull request but it appears that the hipparcos module was being worked as I wrote my version and thought it would be better to ask if my changes might be of interest since the won't merge as is right now.

@brandon-rhodes
Copy link
Member

Thanks for asking, @anielsen001! As I'm seriously working on drawing star charts with Skyfield for the first time, and am wanting to do things like show all stars down to 8th magnitude or even dimmer, the number of matching stars is reaching many thousands, and it looked like it was going to be cumbersome to try to represent every star by a separate Python object.

So the experiment you see underway in the hipparcos.py file imposes a new burden on users that isn't normally needed with Skyfield — to use the routine, they would need to get pandas installed — but the result is that reading in the whole Hipparcos catalog becomes just a few lines of code, and that the catalog can then trivially be filtered by magnitude or RA and dec because there only exists a single Star object whose RA, dec, and magnitude are huge arrays with an array entry for every single Hipparcos star.

Given how easy the result is to use, and how quickly I can put thousands of stars on a chart, I suspect that this is the best way forward — just like Skyfield's Time object can represent thousands of dates, and a position can be thousands of separate positions, the star object should now normally be used to hold the whole set of stars the user is interested in processing.

Some interesting directions in the future might be:

  1. Adding more catalogs besides just Hipparcos.
  2. Adding color and proper motion as an additional Hipparcos fields; we don't want to load too many, because that would slow everything down, but should at least be loading the ones Skyfield users are likely to use.
  3. Maybe adding an alternative loading routine that doesn't need Pandas? We should probably wait and, once the new DataFrame-based code is released, watch the Issues for whether folks have a hard time getting Pandas installed or not.
  4. Users will still want a way to fetch a known, named star without looking up its Hipparcos ID, so we should look at easy ways, once Hipparcos is loaded as a Pandas DataFrame, that users could easily select one star out of the catalog. I'm still thinking through how the DataFrame should become a Star object. Right now I'm just copying fields over. I don't think I want Star to actually become a DataFrame subclass, because that would be icky — Star couldn't then be used without Pandas installed. But maybe there should be a Star.from_catalog(catalog) method or something?

Anyway, yes, this part of the code is moving rapidly right not and, alas, it looks from my tests like having one Star object per star won't scale to large catalogs and large star charts — it's simply too expensive to represent each point with a separate class instance, so my guess is that your interesting experiment should continue to power your own project you're using it in, but that solving the general problem of star catalogs needs to either use Pandas DataFrames (as in my current experiment) or at the very least NumPy arrays of RA, Dec, and magnitude.

I'm glad you are interested in this part of the code, and would love to hear your use case so that the new experiment in Pandas can be tried out by someone else using Stars for different use cases than I'm currently tackling, so I'd love to hear more!

@anielsen001
Copy link

Thanks for your comments. I think that the pandas solution will work better for most cases. My solution could use pandas under the hood as well which might make more sense than scanning the entire text file repeatedly for each star request.

I'm working on a celestial navigation project and I came here trying to generate some simulated scenarios for testing. Right now, I only need a handful of stars so the single star per object works OK for me. I'm using hand-held cell phone images and I get solutions from astrometry.net, but there are lots of issues and I'm developing an understanding of the error process. I'm actually curious to try out your star-chart generator for this, so if you need testers, I'll volunteer.

I started trying to get real star names by implementing a cross-reference tool which is similar to #40 and you can find my rudimentary example here anielsen001/astrocat-utils

@murison
Copy link
Author

murison commented Jun 3, 2018

Is there something that requires Pandas? Wouldn't astropy Tables fit the same bill?

@brandon-rhodes
Copy link
Member

That's a good question! Pandas seems at the moment to be a much more widely used tool, and it happened to be one that I already know (I taught a tutorial on it a few years ago at PyCon) which made it easy to immediately get started with. Also, I'm not entirely familiar with why astropy has its own internal implementation of a DataFrame; and I suspected that more folks have access to Pandas than astropy, if there's any different at all in availability between the two of them.

It's also possible I should just use NumPy and not introduce another dependency beyond it, and folks could use either Pandas or Astropy on top of that.

@JoshPaterson
Copy link
Contributor

Would the Star object also be useful for catalogs of deep sky objects?

@brandon-rhodes
Copy link
Member

@JoshPaterson Yes — for any objects whose positions are fixed, or are modeled as moving in a straight line across the sky.

@brandon-rhodes
Copy link
Member

In a few minutes I'll be releasing Skyfield 1.7 and will update the "Stars" documentation to reflect the new approach of loading a star catalog as a Pandas dataframe:

http://rhodesmill.org/skyfield/stars.html

Stars will now be looked up by using the Pandas lookup operator with the star's designation in that catalog; the first example shown will be using the Hipparcos "HIP" number. We can in the future expand the options for translating names like "Polaris" into the identifiers for specific catalogs, and we can also slowly add support for other star catalogs besides Hipparcos.

I am marking the old NamedStar() experiment as deprecated. But the dict of star names is interesting; I may in the next version work it back in as an alternative way to designate stars.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants