Creating a catalog from a list of file names and then using the gui to select a source from that catalog. #774
Comments
@tedhabermann, I suggest taking a look at this Project Pythia Intake notebook: https://projectpythia.org/intake-cookbook/notebooks/creating_catalogs.html It's nearly exactly what you want to do: you can just change it to read CSV files instead of Zarr datasets! |
Thanks Rich - I'll take a look. |
Rich - you are right, this is a very interesting and helpful tutorial. I learned a lot about ways to use catalogs. Unfortunately, there are some important differences. This tutorial uses the csv data source to inform users about options, while I was trying to read the file names and add them as sources to the very cool intake GUI. This tutorial seems to be aimed at users who are rather fluent in Python, which is fine, but I am aiming at users with less interest in writing code to select the datasets they are interested in... Also, I really liked the integration of different types of data sources into the catalog. I will definitely use that capability once I figure out how to create a catalog that works and add it to the GUI! |
@tedhabermann Here's an example of creating a catalog from several CSV files -- it was trickier than I thought! Also, if you want a nice user interface for folks, you might want to consider https://lumen.holoviz.org/ instead of the intake gui. |
Rich, Thanks again. I want to make sure I understand this.
This seems like quite a bit of disc access... and a little kludgey... |
@tedhabermann, yes, I see your point. I think the add approach would be nice if you have a variety of different datasets, but if they are all CSVs, it would be nice to just generate the entries directly. Based on this information in the Intake documentation, I created this notebook, which uses this pattern:
from intake.catalog.local import LocalCatalogEntry
from intake.catalog import Catalog

cat = Catalog()
csv_list = [['states1', 'states_1.csv'],
            ['states2', 'states_2.csv']]
# Build one CSV entry per (name, url) pair.
cat._entries = {name: LocalCatalogEntry(name, description='',
                                        driver='intake.source.csv.CSVSource',
                                        args={"urlpath": url})
                for name, url in csv_list}
cat.save('catalog.yml')
For a more complex |
@tedhabermann and finally, what you probably were asking for in the beginning. 🙂 My colleague pointed out an alternative, so here we create a dict of sources using LocalCatalogEntry and pass it to Catalog.from_dict:
import intake
from intake.catalog.local import LocalCatalogEntry
from intake.catalog import Catalog
from pathlib import Path

source_list = ['states_1.csv', 'states_2.csv']
intake_sources = {}
for source in source_list:
    # Use the file name without its extension as the entry name.
    name = Path(source).stem
    intake_sources[name] = LocalCatalogEntry(
        name=name,
        description=f'CSV file {name}',
        driver='intake.source.csv.CSVSource',
        args={
            'urlpath': source
        },
        metadata={
            'agency': 'blah',
            'another tag': 'blah'
        }
    )
cat = Catalog.from_dict(
    intake_sources,
    name="CSV Files",
    description="CSV Files from Intake Examples",
)
cat.save('catalog.yml')
Here is the Full Notebook! |
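The source_list in the example above is typed in by hand, while the original question starts from a directory of data files. As a minimal sketch, assuming the files all sit in one directory and end in .csv, the name and path pairs could be collected with pathlib alone. The helper name list_csv_sources is hypothetical and is not part of intake.

```python
from pathlib import Path


def list_csv_sources(data_dir):
    """Return {entry_name: path_string} for every CSV file in data_dir.

    Hypothetical helper: it only gathers names and paths and does not
    touch intake at all.
    """
    sources = {}
    for path in sorted(Path(data_dir).expanduser().glob("*.csv")):
        # Path.stem drops the directory and the ".csv" suffix,
        # e.g. "states_1.csv" becomes "states_1".
        sources[path.stem] = str(path)
    return sources
```

Each name and path pair returned this way could then feed the LocalCatalogEntry loop shown above in place of the hard-coded list.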
Rich - yes, I think this is it... Thanks so much for your patience and persistence! |
Rich - the problem seems to be how more complex catalog structures get passed into the LocalCatalogEntry function. I was trying to pass a list of arguments like userParameter_l = [UserParameter(x) for x in cat_d['parameters']], but that did not work even when I tried to follow the single-parameter example that Martin provided on Stack Overflow, so I gave up on parameters. Now I am trying to pass an argument dictionary that has a dtype dictionary in it (urlpath is added to this dictionary as I loop over the file names). In the catalog this becomes something that looks like it has a definition of &id001 and later references to *id001. Unfortunately, the urlpath is not the same in the sources that reference *id001. The urlpath should be unique for each source, so this reference approach will not work. I am trying to create these LocalCatalogEntry objects as |
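In the catalog dump from the original question, the &id001 anchor sits on a parameter's type value (the built-in str class) and the *id001 references are the other parameters' type values, so the alias may be shared only between the parameter types rather than between urlpath values. YAML serializers commonly emit an anchor when the same Python object appears more than once. The following is a rough sketch, assuming the parameters are held as a dict keyed by name as in cat_d['parameters']. It turns each spec into its own dict with the type stored as a plain string. The helper name normalise_parameters is hypothetical, and whether the resulting dicts can be passed straight to UserParameter should be checked against the installed intake version.

```python
def normalise_parameters(parameters):
    """Turn {name: spec} into a list of specs whose 'type' is a string.

    Hypothetical helper, not part of intake. The input is left untouched.
    """
    normalised = []
    for name, spec in parameters.items():
        entry = dict(spec, name=name)      # shallow copy plus the name
        declared = entry.get("type")
        if declared is not None and not isinstance(declared, str):
            # Store "str" instead of the class object <class 'str'>,
            # so no single object is shared between parameters.
            entry["type"] = declared.__name__
        normalised.append(entry)
    return normalised


parameters = {
    "agency": {"default": "USAID", "description": "agency acronym", "type": str},
    "timestamp": {"default": "2023-09-18", "description": "YYYY-MM-DD", "type": str},
}
specs = normalise_parameters(parameters)
```

Note that iterating over a dict directly, as in the list comprehension quoted above, yields only its keys, so each parameter's description, type, and default would not reach UserParameter that way.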
@tedhabermann it's hard to read this without code and syntax highlighting. Can you please spend a few minutes editing your questions? |
I have read the documentation many times but am still missing something simple.
I am trying to create a catalog from a directory with a bunch of data files.
For each file I create a dictionary (cat_d) that looks like:
args:
  csv_kwargs:
    dtype:
      Agency Portal URL: object
      Datasets: object
      Grant ID: object
      Issue: object
      ORCID: object
  urlpath: ~/CHORUS/data/USAID-2023-09-18-AllReport.csv
description: CHORUS USAID All Report
driver: csv
name: USAID-2023-09-18-All
parameters:
  agency:
    default: USAID
    description: agancy acronym
    type: &id001 !!python/name:builtins.str ''
  dataType:
    default: all
    description: CHORUS data type
    type: *id001
  timestamp:
    default: '2023-09-18'
    description: YYYY-MM-DD
    type: *id001
and I try to create a LocalCatalogEntry from this like:
localCatEntry = LocalCatalogEntry(**cat_d)
or
localCatEntry = LocalCatalogEntry(
    name=cat_d['name'],
    description=cat_d['description'],
    parameters=cat_d['parameters'],
    driver=cat_d['driver'],
    args=cat_d['args']
)
and the localCatEntry has what appears to me to be some extraneous stuff, like the empty args key, that I don't understand.
!!python/object:intake.catalog.local.LocalCatalogEntry
args: []
cls: intake.catalog.local.LocalCatalogEntry
kwargs:
  name: USAID-2023-09-18-All
  description: CHORUS USAID All Report
  parameters:
    agency:
      default: USAID
      description: agancy acronym
      type: &id001 !!python/name:builtins.str ''
    dataType:
      default: all
      description: CHORUS data type
      type: *id001
    timestamp:
      default: '2023-09-18'
      description: YYYY-MM-DD
      type: *id001
  driver: csv
  args:
    csv_kwargs:
      dtype:
        Agency Portal URL: object
        Datasets: object
        Grant ID: object
        Issue: object
        ORCID: object
    urlpath: /Users/tedhabermann/Documents/MetadataGameChanger/ProjectsAndPlans/INFORMATE/CHORUS/data/USAID-2023-09-18-AllReport.csv
Undeterred, I add this localCatEntry to a dictionary, where sourceName = 'USAID-2023-09-18-All':
catalog_d.update({sourceName : localCatEntry})
Then I make a catalog:
mycat = Catalog.from_dict(catalog_d)
Now I try to add this catalog to the GUI:
intake.gui.add(mycat)
and get:
TypeError Traceback (most recent call last)
Cell In[33], line 1
----> 1 intake.gui.add(mycat)
File ~/anaconda3/lib/python3.11/site-packages/intake/interface/gui.py:65, in GUI.add(self, *args, **kwargs)
63 def add(self, *args, **kwargs):
64 """Add to list of cats"""
---> 65 return self.cat.select.add(*args, **kwargs)
File ~/anaconda3/lib/python3.11/site-packages/intake/interface/base.py:222, in BaseSelector.add(self, items)
220 self.widget.options.update(options)
221 self.widget.param.trigger("options")
--> 222 self.widget.value = list(options.values())[:1]
File ~/anaconda3/lib/python3.11/site-packages/param/parameterized.py:367, in instance_descriptor.<locals>._f(self, obj, val)
365 instance_param = getattr(obj, '_instance__params', {}).get(self.name)
366 if instance_param is not None and self is not instance_param:
--> 367 instance_param.set(obj, val)
368 return
369 return f(self, obj, val)
File ~/anaconda3/lib/python3.11/site-packages/param/parameterized.py:369, in instance_descriptor.<locals>._f(self, obj, val)
367 instance_param.set(obj, val)
368 return
--> 369 return f(self, obj, val)
File ~/anaconda3/lib/python3.11/site-packages/param/parameterized.py:1252, in Parameter.set(self, obj, val)
1250 # Copy watchers here since they may be modified inplace during iteration
1251 for watcher in sorted(watchers, key=lambda w: w.precedence):
-> 1252 obj.param._call_watcher(watcher, event)
1253 if not obj.param._BATCH_WATCH:
1254 obj.param._batch_call_watchers()
File ~/anaconda3/lib/python3.11/site-packages/param/parameterized.py:2043, in Parameters.call_watcher(self, watcher, event)
2041 event = self_.update_event_type(watcher, event, self.self_or_cls.param.TRIGGER)
2042 with batch_call_watchers(self.self_or_cls, enable=watcher.queued, run=False):
-> 2043 self._execute_watcher(watcher, (event,))
File ~/anaconda3/lib/python3.11/site-packages/param/parameterized.py:2025, in Parameters._execute_watcher(self, watcher, events)
2023 async_executor(partial(watcher.fn, *args, **kwargs))
2024 else:
-> 2025 watcher.fn(*args, **kwargs)
File ~/anaconda3/lib/python3.11/site-packages/intake/interface/catalog/select.py:86, in CatSelector.callback(self, event)
85 def callback(self, event):
---> 86 self.expand_nested(event.new)
87 if self.done_callback:
88 self.done_callback(event.new)
File ~/anaconda3/lib/python3.11/site-packages/intake/interface/catalog/select.py:113, in CatSelector.expand_nested(self, cats)
111 name = next(k for k, v in old if v == cat)
112 index = next(i for i, (k, v) in enumerate(old) if v == cat)
--> 113 if right in name:
114 prefix = f"{name.split(right)[0]}{down} {right}"
115 else:
TypeError: argument of type 'NoneType' is not iterable
Running intake.gui suggests that it knows this catalog is there, as it displays None in the list of catalogs, but the source is not there.
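The last frame of the traceback fails on the check `if right in name`, and the GUI lists the catalog as None, which together suggest that the catalog was created without a name. The sketch below reproduces only that membership test in plain Python. The function display_prefix, the marker value, and the catalog name "CHORUS reports" are hypothetical stand-ins, not intake code.

```python
def display_prefix(name, marker=">"):
    """Mimic the check `if right in name` from the last traceback frame."""
    if marker in name:          # raises TypeError when name is None
        return name.split(marker)[0]
    return name


def fails_for(name):
    """Report whether the membership test raises TypeError for this name."""
    try:
        display_prefix(name)
    except TypeError:
        return True
    return False


unnamed_fails = fails_for(None)             # a catalog with no name
named_fails = fails_for("CHORUS reports")   # a catalog with a name
```

The Catalog.from_dict call earlier in this thread passes name="CSV Files"; doing the same for mycat might avoid the error, though that is an assumption that has not been tested here.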
I also tried something like:
newgui = intake.interface.gui.GUI
and
newgui.add(mycat)
but got
AttributeError Traceback (most recent call last)
Cell In[46], line 1
----> 1 newgui.add(mycat)
File ~/anaconda3/lib/python3.11/site-packages/intake/interface/gui.py:65, in GUI.add(self, *args, **kwargs)
63 def add(self, *args, **kwargs):
64 """Add to list of cats"""
---> 65 return self.cat.select.add(*args, **kwargs)
AttributeError: 'NoneType' object has no attribute 'select'
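One reading of this second error is that newgui = intake.interface.gui.GUI, written without parentheses, refers to the class itself rather than to an instance, so newgui.add(mycat) binds mycat as self and then looks up mycat.cat. The sketch below illustrates that mechanism with hypothetical stand-in classes (FakeGUI and FakeCatalog). It does not use intake, and the real GUI constructor may need arguments that are not shown here.

```python
class FakeGUI:
    """Hypothetical stand-in for a GUI class whose add() delegates to self.cat."""

    def __init__(self):
        self.cat = []

    def add(self, *args):
        self.cat.extend(args)
        return len(self.cat)


class FakeCatalog:
    """Hypothetical stand-in for a catalog whose own `cat` attribute is None."""

    cat = None


gui_class = FakeGUI        # no parentheses: this is the class
gui_instance = FakeGUI()   # parentheses: this is an instance

try:
    # The catalog is bound as `self`, so self.cat is None.
    gui_class.add(FakeCatalog())
    class_call_error = None
except AttributeError as exc:
    class_call_error = exc

# Called on an instance, the catalog is passed as an argument as intended.
count = gui_instance.add(FakeCatalog())
```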