Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy of MulensData instance #40

Open
rpoleski opened this issue Apr 19, 2022 · 12 comments
Open

copy of MulensData instance #40

rpoleski opened this issue Apr 19, 2022 · 12 comments
Assignees

Comments

@rpoleski
Copy link
Owner

I'm writing code for cleaning the data. It involves masking and un-masking points as bad. I've found it quite complicated - one cannot do

dataset.bad[i] = True

because it doesn't change dataset.good. One has to do something like:

bad = dataset.bad.copy()
bad[i] = True
dataset.bad = bad

We can either change implementation of .bad and .good (I don't know how - @ketozhang do you know how?) or we can add a function that makes a copy of a MulensData object. @jenniferyee What do you think?

@jenniferyee
Copy link
Collaborator

I agree that this is a problem.

Currently "bad" and "good" are lists or np.array or something similar. There must be a way to make a class that behaves the way it needs to. For example, a "BadData" class (or maybe the class should be "DataQuality"?) whose properties "bad" and "good" are accessed by MulensData.bad and MulensData.good . Then it would have some kind of indexing method that inherits from list?

@rpoleski
Copy link
Owner Author

.bad and .good are two np.arrays of bool values and we want one to be logical not of the other all the time. I'm guessing it would be possible to define a new class that inherits from np.array and whenever a value of one of these arrays is changed, then the other is recalculated. However, to work fully it will require defining all operators that make sense for np.array, which would be lots of work. Maybe we should add notes in docstrings (so that users don't do data.bad[i] = True) and add the function that makes a copy of the MulensData class

@ketozhang
Copy link
Contributor

ketozhang commented Apr 19, 2022

I don't see a reason to keep a separate arrays for MulensData::_good and MulensData::_bad if they're opposite of each other. Can you do something like this?

class MulensData:

  @property
  def good(self):
    return self._good
  
  @property
  def bad(self):
    return ~self._good
  
  @good.setter
  def good(self, value):
    self._good = value
  
  @bad.setter
  def bad(self, value):
    self._good = not value

@rpoleski
Copy link
Owner Author

We had something like that before. But it turned out that np.logical_not() is extremely slow (at least on some computers). It was dominating the run time in case of the fitting tests we performed. Maybe it would be best to return to the previous version and use self._good as @ketozhang suggested. As far as I remember, we used self._bad previously. I'm guessing self._bad is very rarely used - maybe only in some plotting functions.

@ketozhang
Copy link
Contributor

Will have to take a look how it was used: shape of array and how often MulensData::bad was called.

Instead of @property and building your own cache, you could use functools caching to handle this. See this SO question

@rpoleski
Copy link
Owner Author

I've checked that the only place where MulensData.bad is used is plotting the dataset. There are a few calls in that function, but of course we could change that to a single call. Now I agree with Keto's comment 3 posts back.

@ketozhang
Copy link
Contributor

The solution I propose won't solve the problem where we want to allow the users to mutate MulensModel::bad[i] = value

To me, I think it's trivial to get the inverse (i.e., ~MulensModel::good) and to remove the property MulensModel::bad altogether (or the reverse since I more often see bad flags stored than good ones) whenever the user needs it.

@rpoleski rpoleski self-assigned this Oct 5, 2022
@rpoleski
Copy link
Owner Author

rpoleski commented Oct 9, 2022

MulensData.copy() added in commit 2aff631. Issue not yet fully resolved.

@jenniferyee
Copy link
Collaborator

jenniferyee commented Oct 5, 2023

Added a use case in dd54d4e

@rpoleski
Copy link
Owner Author

rpoleski commented Oct 6, 2023

I've checked that the solution from Keto's first reply in this thread is not solving calls like:

data.bad[12] = True

or

data.bad[np.where(data.mag < 13.)] = True

Though the same ones for data.good do work. The above ones don't work, but there is no error and no warning!

@ketozhang
Copy link
Contributor

ketozhang commented Oct 6, 2023

That's right, the first solution I proposed will not let you mutate bad.

I'm still inclined to suggest not providing your user good. If they want only goods, many familiar with numpy know how to filter out things from an array:

good = data[~data.bad]

@ketozhang
Copy link
Contributor

FYI, giving the user access to only the bad is the same behavior implemented in numpy's masked array.
https://numpy.org/doc/stable/reference/maskedarray.generic.html#accessing-only-the-valid-entries

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants