
Press Release from the Future

JupyterLab Data Explorer Press Release

JupyterLab gains another set of groundbreaking features to help data scientists work more effectively by treating data as a first-class entity. This enables practitioners in an organization to clearly see which datasets they have access to within JupyterLab, and it enables organizations to publish datasets to their end users smoothly through JupyterLab plugins. The new dataset entity in JupyterLab is flexible: it can reflect the data in a single file, a collection of files, or even queries to a structured relational database. Examples of datasets include .csv files, GIS data, images, videos, audio, SQL databases, and any grouping of these and other data formats. There are two components to this release: the Data Registry enables extension developers to quickly create extensions which provide, explain, and visualize data, and the Data Explorer enables users to easily browse, discover, and collaborate on their datasets.

JupyterLab is a platform where users explore data to extract insights and make decisions. They can visualize and model datasets from a variety of sources such as files, databases, and notebooks. Prior to the Data Registry, accessing data in JupyterLab required navigating a complex web of 'places' where data could live. For example, certain datasets might only become accessible by running a particular notebook that reads the data from another location on your hard drive or from a remote server.

While JupyterLab has always been a fantastic tool for interfacing with data, only now has it become aware of what a dataset actually is. New possibilities open up: organizations can curate and publish collections of datasets to their users through JupyterLab, and results become easier to share when all users refer to datasets in a standard way through shared curation. Similarly, visualization tools now have a standard way to ingest datasets, so they can be built more quickly and more generically, and they can be presented to users in standardized user interface workflows. Together, the Data Registry and Data Explorer provide a focal point for all data-related extensions in JupyterLab, making dataset interactions easy for developers and seamless for users.

The unified Data Explorer UI enables users to see all of their datasets in one place. They can filter datasets by where they came from and by type. They can browse large data catalogs and search for datasets which fit their needs. Once they find a dataset of interest, they can “one-click” visualize it with different tools, import it into a notebook, or see related metadata or comments about it. Taken together, users can now broadly peruse and deeply examine datasets in an intuitive way within JupyterLab.

The Data Registry dramatically eases extension development, which will boost the growth of JupyterLab’s ecosystem. In the past, creating data-centric extensions to Jupyter Notebooks was complex, and developing or maintaining such extensions required a high level of effort. The Data Registry alleviates this issue by providing extension developers with a public API so that they can list available datasets, register conversions between different dataset types, and link new datasets into the JupyterLab Data Registry. The available datasets and their corresponding views are exposed to the user in a centralized UI, the Data Explorer. This allows extension developers to reuse data conversions provided by other extensions and rely on the built-in UI to expose their data. This standard set of interfaces allows different data extensions to work together without explicitly knowing about each other.
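To make the three operations named above concrete (listing datasets, registering conversions between dataset types, and linking new datasets), here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not the actual Data Registry API: the `DataRegistry` class, its method names, the `Dataset` shape, and the `application/x.*` type strings are all hypothetical and chosen only for this example.

```typescript
// Hypothetical sketch: none of these names come from the real Data Registry API.
type Converter = (data: unknown) => unknown;

interface Dataset {
  url: string;      // where the dataset lives (assumed identifier)
  mimeType: string; // what kind of dataset it is
  data: unknown;
}

class DataRegistry {
  private datasets = new Map<string, Dataset>();
  // converters.get(from).get(to) turns `from` data into `to` data.
  private converters = new Map<string, Map<string, Converter>>();

  // Link a new dataset into the registry.
  register(dataset: Dataset): void {
    this.datasets.set(dataset.url, dataset);
  }

  // List every available dataset.
  list(): Dataset[] {
    return Array.from(this.datasets.values());
  }

  // Register a conversion between two dataset types.
  registerConverter(from: string, to: string, fn: Converter): void {
    if (!this.converters.has(from)) {
      this.converters.set(from, new Map<string, Converter>());
    }
    this.converters.get(from)!.set(to, fn);
  }

  // Breadth-first search over registered conversions, so converters from
  // different extensions can be chained. Converters are applied eagerly
  // during the search, which is fine for a sketch but wasteful in practice.
  convert(url: string, target: string): unknown {
    const dataset = this.datasets.get(url);
    if (dataset === undefined) {
      throw new Error(`Unknown dataset: ${url}`);
    }
    const queue = [{ type: dataset.mimeType, data: dataset.data }];
    const seen = new Set<string>([dataset.mimeType]);
    while (queue.length > 0) {
      const current = queue.shift()!;
      if (current.type === target) {
        return current.data;
      }
      const outgoing = this.converters.get(current.type);
      if (outgoing === undefined) {
        continue;
      }
      outgoing.forEach((fn, to) => {
        if (!seen.has(to)) {
          seen.add(to);
          queue.push({ type: to, data: fn(current.data) });
        }
      });
    }
    throw new Error(`No conversion from ${dataset.mimeType} to ${target}`);
  }
}

// "Extension A" links a dataset (made-up sample data).
const registry = new DataRegistry();
registry.register({
  url: "file:///data/sample.csv",
  mimeType: "text/csv",
  data: "name,value\na,1\nb,2",
});

// "Extension B" knows how to parse CSV text into rows.
registry.registerConverter("text/csv", "application/x.rows", (data) =>
  (data as string).split("\n").map((line) => line.split(","))
);

// "Extension C" only knows about rows, yet it works on the CSV dataset.
registry.registerConverter(
  "application/x.rows",
  "application/x.row-count",
  (data) => (data as string[][]).length - 1 // exclude the header row
);

const rowCount = registry.convert(
  "file:///data/sample.csv",
  "application/x.row-count"
);
```

In this sketch the third converter never mentions CSV; it reaches the dataset only through the conversion registered by another extension, which is the kind of decoupling the paragraph above describes.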

Overall, the release of the Data Registry and Data Explorer pushes JupyterLab into a new realm of tooling. With this release, JupyterLab enters the world of organized, disciplined, and shareable dataset curation, which will drive individuals and organizations to the next level of productivity. Organizations with existing data catalogs can now integrate them into the JupyterLab user experience. Individuals or small working groups who don’t have an existing external data catalog can also use the Data Registry locally to organize datasets on their projects, giving themselves a way to clearly document dataset usage in those projects. Additionally, JupyterLab is well-positioned to grow organically through the open-source community by offering a developer-friendly extension framework, which will invite innovative data-centric tools for visualizing and processing all kinds of data.
