Duplicate rows in group source downloads #3516

Closed
bfhealy opened this issue Sep 29, 2022 · 7 comments

Comments

@bfhealy
Contributor

bfhealy commented Sep 29, 2022

When downloading group sources from Fritz using the frontend or the API, the result sometimes contains duplicate rows. The total number of rows is always correct, so each duplicate means another source is missing. I have not encountered this problem for smaller groups (fewer than roughly 1,000 sources), but for larger groups I get a variable number of duplicates each time I download. Some additional info:

  • There are no duplicates on any single page; they instead appear on separate pages.
  • The number of duplicates increases as the ratio of total sources to sources per page grows, i.e. as the download spans more pages.
  • I have been unable to replicate this behavior on my local SkyPortal instance, including in scenarios where sources are members of multiple groups or other sources are in the process of being posted.
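To make the symptom above easier to check, here is a minimal sketch that reports which source IDs appear on more than one page of a download. It assumes each page has already been reduced to a list of source IDs; the function name `find_cross_page_duplicates` and the sample IDs are hypothetical and are not part of SkyPortal.

```python
from collections import defaultdict


def find_cross_page_duplicates(pages):
    """Map each source ID seen on more than one page to the pages it appears on.

    `pages` is a list of pages, each a list of source IDs (assumed layout).
    Page numbers in the result are 1-indexed.
    """
    seen = defaultdict(set)
    for page_number, ids in enumerate(pages, start=1):
        for source_id in ids:
            seen[source_id].add(page_number)
    # Keep only IDs that occur on at least two different pages.
    return {sid: sorted(nums) for sid, nums in seen.items() if len(nums) > 1}


# Made-up example: "src1" shows up on pages 1 and 2.
pages = [["src1", "src2"], ["src3", "src1"], ["src4", "src5"]]
duplicates = find_cross_page_duplicates(pages)
print(duplicates)
```

Because the total row count stays correct, the number of sources missing from a download equals the number of extra copies reported here.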

It is possible that modifying this call of grab_query_results:

query_results = grab_query_results(

by setting use_cache=True, see

def grab_query_results(

will solve the problem. We'd also have to decide on how long to maintain the cache.
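As a rough illustration of the idea (not the actual `grab_query_results` implementation, and not confirmed to match what `use_cache` does), the sketch below freezes the ordered list of IDs once and serves every page from that snapshot until it expires. The class `SnapshotPager`, the `fetch_ids` callable, and the `max_age_seconds` parameter are all hypothetical names chosen for this example.

```python
import time


class SnapshotPager:
    """Hypothetical helper: snapshot an ordered ID list once, then slice pages from it."""

    def __init__(self, fetch_ids, max_age_seconds=60.0, clock=time.monotonic):
        self._fetch_ids = fetch_ids      # callable returning the full ordered ID list
        self._max_age = max_age_seconds  # how long a snapshot stays valid
        self._clock = clock
        self._snapshot = None
        self._taken_at = None

    def _ids(self):
        now = self._clock()
        expired = self._taken_at is None or now - self._taken_at > self._max_age
        if expired:
            # Re-run the (possibly unstably ordered) query only when the snapshot is stale.
            self._snapshot = list(self._fetch_ids())
            self._taken_at = now
        return self._snapshot

    def page(self, page_number, per_page):
        """Return the 1-indexed page, sliced from the cached snapshot."""
        ids = self._ids()
        start = (page_number - 1) * per_page
        return ids[start:start + per_page]
```

With this design, every page requested within the cache lifetime is cut from the same ordering, so no ID can appear on two pages. The lifetime is the trade-off: it needs to outlast a full download, while a longer value delays the appearance of newly posted sources.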

@bfhealy
Contributor Author

bfhealy commented Oct 3, 2022

I've tested setting use_cache=True as described above in my local SkyPortal instance, and group source downloads (frontend and API) still work as expected. Is there any other functionality that may be adversely affected by setting this keyword?

@mcoughlin
Contributor

@guynir42 What do you think of trying this and seeing how it goes this week?

@guynir42
Contributor

guynir42 commented Oct 3, 2022

You mean adding use_cache=True to the download CSV option?

@mcoughlin
Contributor

@guynir42 yes.

@bfhealy
Contributor Author

bfhealy commented Oct 10, 2022

Unfortunately, PR #3569 has not resolved the duplicate row issue on Fritz. I am still unable to replicate the issue locally.

@bfhealy
Contributor Author

bfhealy commented Oct 11, 2022

Testing on a group of ~10,000 sources in preview did not produce any duplicates. It seems this issue is specific to fritz.science.

@bfhealy
Contributor Author

bfhealy commented Feb 13, 2023

@mcoughlin @guynir42 In the last several weeks, my downloads of group sources on Fritz have produced no duplicates. Not sure what changed for the better, but I'll close this issue for now.

@bfhealy bfhealy closed this as completed Feb 13, 2023

3 participants