Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Step index of DataCollector Dataframes are not synchronized between agent and model variables #1768

Open
jabrell opened this issue Aug 17, 2023 · 6 comments
Labels
bug Release notes label

Comments

@jabrell
Copy link

jabrell commented Aug 17, 2023

Describe the bug
Using the DataCollector indexes for the model step of the DataFrames coming from .get_model_vars_dataframe() and .get_agent_vars_dataframe() are different if the DataCollector is called after the schedule.step(). I.e., if the model's step function looks like:

def step(self):
    self.datacollector.collect(self)
    self.schedule.step()

The "Step" index of the dataframe returned by .get_agent_vars_dataframe() will start with 1. In contrast, the index of the dataframe from get_model_vars_dataframe() will start with 0. This makes it diificult to merge frames in result processing.

Expected behavior
The indexes referring to the model step should be consistent. In the setting above, I would expect all step index starting at 1 (since the model has made one step before the report).

To Reproduce
The behavior can be reproduce by altering the step function to the code above in the MoneyModel in the Collecting data section of the introductory tutorial

Additional context
I am a newbie to Mesa, so I can only guess on the reasons. My first guess would be that the different index are caused by the different treatment of model and agent_reporters in the collect function of the DataCollector. The model_reporter seems to be list-based and the index is implicitly derived leading to the index starting at zero independently whether the collector is called before or after execution of the step. In contrast, the agent_reporters use the Scheduelers step property which seems to be advanced by one if the collector is called after execution of the step.

@rht
Copy link
Contributor

rht commented Aug 19, 2023

It's definitely confusing for the use case when you want to merge the 2 DF's. I don't have a good design solution for this. For now you can just do df_model.index += 1 or something.

But even without merging, people could confuse the index of the model DF as step number.

@jabrell
Copy link
Author

jabrell commented Aug 21, 2023

Thats what I ended up doing. I see that point that there is no "easy" fix for this.

To lower the confusion, what about adjusting the docstring of "get_model_vars_dataframe"? Currently it states:
The DataFrame has one column for each model variable, and the index is (implicitly) the model tick.

Maybe changing it to something like:
The DataFrame has one column for each model variable, and the index is equal to the order of reporting these variables. If the data collector is called before the step function of the model, that (implicitly) equals the model step.

@tpike3 tpike3 added the bug Release notes label label Sep 9, 2023
@rht
Copy link
Contributor

rht commented Sep 9, 2023

@jabrell we are marking this to be resolved before the next patch release.

@NirobNabil
Copy link

hello @rht, I was looking into this issue as preparation for gsoc24. Do you think it would be a good idea to change model_vars[var] from Lists to Dictionaries? In that case i can try to submit a PR with accompanied changes.

Also i found some TODO while going through the code, given the extensive discussions going on about revamping datacollector, is it a good idea to work on the TODOs? Because if you are planning to replace the datacollector with new code anyway then working on it wouldn't make sense.

@rht
Copy link
Contributor

rht commented Mar 27, 2024

hello @rht, I was looking into this issue as preparation for gsoc24. Do you think it would be a good idea to change model_vars[var] from Lists to Dictionaries? In that case i can try to submit a PR with accompanied changes.

SGTM. There shouldn't be much performance hit due to switching to dictionary.

It's fine to work on the TODO's for components that are about to be deprecated, for exercise/educational purpose. For example, place_agent will be deprecated, but we accept #2083 anyway.

@NirobNabil
Copy link

Thank you, I'll get to work right away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Release notes label
Projects
None yet
Development

No branches or pull requests

4 participants