This issue was moved to a discussion.

manually changing dataframe for catalog #579

Closed
jgiguereCC opened this issue Mar 9, 2023 · 2 comments

Comments

@jgiguereCC
jgiguereCC commented Mar 9, 2023

Hi! I'm trying to manually change the dataframe for an esm-datastore and then assign the modified dataframe back to a catalog to read in CMIP6 models. I've tried the approach shown in the issue raised by @jbusecke here for intake-esm, and the from_df() method shown here, but I get AttributeError: can't set attribute and AttributeError: from_df from these two methods, respectively. Is there anything I can do to restrict the dataframe and then make a new catalog from it? I'm still quite new to intake-esm, so apologies if this isn't the intended functionality!

intake-esm version:

intake_esm.show_versions()

INSTALLED VERSIONS
------------------

cftime: 1.6.2
dask: 2022.9.2
fastprogress: 0.2.7
fsspec: 2021.10.0
gcsfs: 2021.07.0
intake: 0.6.7
intake_esm: 2022.9.18
netCDF4: 1.6.2
pandas: 1.5.3
requests: 2.28.2
s3fs: 2022.8.2
xarray: 2022.9.0
zarr: 2.13.2

The Issue

import intake
import dask
url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
scenarios = ["ssp370", "piControl", "historical"]  # set desired scenarios
var_name = 'tos'
time_step = ['Oday']
query = dict(experiment_id = scenarios,
             variable_id=var_name,
             table_id = time_step,
             member_id = 'r1i1p1f1'
            )
cat = col.search(require_all_on="source_id", **query)
correct_order = list(cat.df.columns)
new_df = cat.df.groupby(['source_id','experiment_id']).first().reset_index()[correct_order]
cat.df= new_df

Yields the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 19
     17 correct_order = list(cat.df.columns)
     18 new_df = cat.df.groupby(['source_id','experiment_id']).first().reset_index()[correct_order]
---> 19 cat.df= new_df

AttributeError: can't set attribute

Thanks!
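
A side note on the groupby(...).first() step in the snippet above: in pandas, first() returns the first non-null entry of each column within a group, which is not always the first row of that group. The sketch below uses a small made-up table (the version values are hypothetical, not taken from the real catalog) to contrast it with drop_duplicates, which keeps the first row as-is. It is only an illustration of the difference, not a recommendation from the thread.

```python
import pandas as pd

# Toy stand-in for cat.df; the "version" values are invented for illustration.
df = pd.DataFrame(
    {
        "source_id": ["A", "A", "B"],
        "experiment_id": ["historical", "historical", "historical"],
        "version": [None, "v2", "v1"],
    }
)
keys = ["source_id", "experiment_id"]

# Same pattern as in the issue: one row per (source_id, experiment_id),
# built from the first non-null value of every other column.
first_per_group = df.groupby(keys).first().reset_index()[list(df.columns)]

# Alternative: keep the first row of each group unchanged, nulls included.
first_rows = df.drop_duplicates(subset=keys).reset_index(drop=True)

print(first_per_group)
print(first_rows)
```

If the catalog has no missing values in the columns of interest, the two approaches should select the same rows.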

@andersy005
Member

@jgiguereCC, thank you for putting together this reproducible issue :)

Try the following instead,

In [6]: cat.esmcat._df = new_df
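
For readers wondering why cat.df = new_df fails while the line above works: the error is what Python raises when assigning to a property that has no setter. The sketch below is a toy model of that pattern, assuming df is exposed as a read-only property backed by a private _df attribute on the esmcat object. ToyCatalogModel and ToyDatastore are hypothetical names, not intake-esm classes, and plain lists stand in for dataframes.

```python
class ToyCatalogModel:
    """Hypothetical stand-in for the object behind cat.esmcat."""

    def __init__(self, df):
        self._df = df

    @property
    def df(self):
        # Read-only view: no setter is defined for this property.
        return self._df


class ToyDatastore:
    """Hypothetical stand-in for the catalog object (cat)."""

    def __init__(self, df):
        self.esmcat = ToyCatalogModel(df)

    @property
    def df(self):
        # Delegates to the underlying model; also read-only.
        return self.esmcat.df


cat = ToyDatastore(df=["row_a", "row_b", "row_c"])
new_df = ["row_a"]

try:
    cat.df = new_df  # fails: the property has no setter
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Writing to the private attribute bypasses the property.
cat.esmcat._df = new_df
print(assignment_failed, cat.df)
```

Because _df is a private attribute, this kind of assignment relies on implementation details and may stop working in other versions of the library.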

@jgiguereCC
Author

that seems to work! thanks!

@intake intake locked and limited conversation to collaborators Mar 10, 2023
@andersy005 andersy005 converted this issue into discussion #580 Mar 10, 2023
