
read_gbq results in lingering system thread after function call #613

Closed
jlynchMicron opened this issue Feb 17, 2023 · 4 comments
Labels: api: bigquery (Issues related to the googleapis/python-bigquery-pandas API), priority: p3 (Desirable enhancement or fix. May not be included in next release.)


@jlynchMicron

Environment details

  • OS type and version: Linux CentOS 7
  • Python version: 3.10
  • pandas-gbq version: 0.19.0

Steps to reproduce

  1. Start a debugger session.
  2. Run read_gbq.
  3. After the call returns, inspect the call stack and note the extra system thread that is still running.

Code example

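import pandas as pd
# query_str, bq_wrap, profile, creds and use_bqstorage_api are defined elsewhere
ret_df = pd.read_gbq(
    query_str,
    project_id=bq_wrap.bq_billing_project,  # billing project
    configuration={"query": {"defaultDataset": {
        "datasetId": profile.bq_dataset,
        "projectId": bq_wrap.bq_project,
    }}},
    credentials=creds,
    use_bqstorage_api=use_bqstorage_api,
    progress_bar_type="tqdm",  # show download progress with a tqdm bar
)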

Stack trace

(Screenshot of the call stack; image not preserved.)

product-auto-label bot added the api: bigquery label Feb 17, 2023
@jkelly80

Having the same issue! It's also causing memory leaks for me: these extra threads seem to hold references to the data that read_gbq returns, which prevents the garbage collector from freeing it.
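The mechanism suspected here can be illustrated with the standard library alone: while a live thread holds a reference to an object, the garbage collector cannot reclaim it. This is a minimal sketch; `Payload` and `hold` are hypothetical stand-ins (for the returned DataFrame and for whatever the thread runs), and it does not show that tqdm's thread actually references read_gbq's result.

```python
import gc
import threading
import weakref


class Payload:
    """Hypothetical stand-in for a large result such as a DataFrame."""


def hold(obj: Payload, release: threading.Event) -> None:
    # This thread's frame keeps `obj` alive until `release` is set.
    release.wait()


payload = Payload()
ref = weakref.ref(payload)  # lets us observe whether the object still exists
release = threading.Event()
worker = threading.Thread(target=hold, args=(payload, release), daemon=True)
worker.start()

del payload  # drop the caller's only direct reference
gc.collect()
alive_while_running = ref() is not None  # still reachable through the thread

release.set()  # let the thread finish
worker.join()  # Thread.run drops its target/args references when it returns
gc.collect()
alive_after_join = ref() is not None
```

Once the thread exits, the object becomes collectable; a thread that never exits would keep it alive for the life of the process.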

@Linchin (Contributor) commented Apr 9, 2024

Thanks for reporting the issue! I am able to reproduce it, but it only seems to happen when the tqdm progress bar is used (progress_bar_type='tqdm').

@Linchin (Contributor) commented Apr 9, 2024

It seems tqdm starts a new thread when a tqdm object is created, but that thread is not stopped when the progress bar is closed.

import tqdm

# Create a list of numbers
numbers = list(range(3))

# Create a progress bar
pbar = tqdm.tqdm(numbers)

# Iterate over the list of numbers
for number in pbar:
    # Do something with the number
    pass

# Close the progress bar
pbar.close()

# There is a "Thread-7" at this breakpoint
breakpoint()

exit(0)
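Outside a debugger, the same check can be scripted by comparing `threading.enumerate()` before and after the call under test. This is a minimal stdlib sketch; the helper `threads_started_by` and the stand-in `leaky_call` are hypothetical, and the helper only reports threads that are still alive when the call returns.

```python
import threading
from typing import Callable, List


def threads_started_by(func: Callable[[], object]) -> List[threading.Thread]:
    """Run func() and return threads that appeared during the call and are still alive."""
    before = set(threading.enumerate())
    func()
    return [t for t in threading.enumerate() if t not in before]


# Hypothetical stand-in for a call that leaves a background thread running.
stop = threading.Event()


def leaky_call() -> None:
    threading.Thread(target=stop.wait, name="leftover", daemon=True).start()


leftover = threads_started_by(leaky_call)
leftover_names = [t.name for t in leftover]
print(leftover_names)

# Clean up the stand-in thread.
stop.set()
for t in leftover:
    t.join()
```

Wrapping the body of the tqdm snippet above in a function and passing it to the helper reports the same lingering threads that the breakpoint shows.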

@Linchin (Contributor) commented Apr 10, 2024

This is caused by tqdm creating a new thread of class TMonitor when a new tqdm.tqdm object is created. A workaround is to set tqdm.tqdm.monitor_interval = 0 before creating any progress bar, for example right after the library is imported. Overall, though, I think this is a bug in tqdm. I opened an issue with tqdm, so I will close this one. Still, please leave a comment or open a new issue if you have any questions :)
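The workaround relies on the monitor thread being gated by a class-level `monitor_interval` setting: with it set to 0, no monitor is started. The stdlib-only toy model below mirrors that gating. It is not tqdm's code; `ToyBar` and `ToyMonitor` are hypothetical classes used only to show the idea.

```python
import threading


class ToyMonitor(threading.Thread):
    """Hypothetical background thread that wakes up every `interval` seconds."""

    def __init__(self, interval: float) -> None:
        super().__init__(name="ToyMonitor", daemon=True)
        self.interval = interval
        self.stopped = threading.Event()
        self.start()  # the thread starts as soon as the monitor is created

    def run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets it.
        while not self.stopped.wait(self.interval):
            pass  # a real monitor would inspect its progress bars here

    def stop(self) -> None:
        self.stopped.set()
        self.join()


class ToyBar:
    """Hypothetical progress bar that shares one monitor thread per class."""

    monitor_interval = 10.0  # seconds; 0 disables the monitor entirely
    monitor = None

    def __init__(self) -> None:
        cls = type(self)
        # Start the shared monitor only if monitoring is enabled.
        if cls.monitor_interval and cls.monitor is None:
            cls.monitor = ToyMonitor(cls.monitor_interval)


ToyBar()
started_by_default = ToyBar.monitor is not None and ToyBar.monitor.is_alive()
ToyBar.monitor.stop()  # clean up before trying the workaround
ToyBar.monitor = None

ToyBar.monitor_interval = 0  # analogous to tqdm.tqdm.monitor_interval = 0
ToyBar()
started_when_disabled = ToyBar.monitor is not None
```

Because the setting is read from the class when each bar is created, it has to be changed before the first bar exists, which is why the comment above suggests doing it right after importing the library.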

Linchin closed this as completed Apr 10, 2024
Linchin added the priority: p3 label Apr 10, 2024