# Dataframes

The `nbgallery.database.dataframes` module provides direct access to parts of the nbgallery database as pandas dataframes, with no SQL required. These dataframes are commonly used as input to data science tasks such as computing summary statistics and building recommenders.

In [None]:
import nbgallery.database.dataframes as nbgdf

## General data

Metadata for each notebook, with or without summary stats. The summary stats are computed periodically by nbgallery and include things like the number of times the notebook has been viewed and how many users viewed it.

In [None]:
nbgdf.notebooks()

In [None]:
nbgdf.notebooks_with_summaries()

Metadata for each user, with or without summary stats.  The summary stats include things like the author contribution score.

In [None]:
nbgdf.users()

In [None]:
nbgdf.users_with_summaries()

## Click data

An nbgallery "click" is an interaction between a user and a notebook, such as viewing it in the gallery or launching it into Jupyter.  Each click object contains a notebook id, user id, action, and timestamp.

You can retrieve all clicks, but the result may be large, so several filtering options are available.

In [None]:
nbgdf.clicks()

Actions involving a subset of notebooks during a range of dates.

In [None]:
nbgdf.clicks(notebook_id=[23, 24, 25], min_date='2020-01-01', max_date='2020-12-31')

Actions by a user in the last 90 days.

In [None]:
nbgdf.clicks(user_id=1, days_ago=90)
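
To make the filter semantics concrete, here is a minimal pandas sketch of equivalent filtering on an in-memory table. It assumes hypothetical column names (`user_id`, `notebook_id`, `action`, `timestamp`) and a hypothetical helper `filter_clicks`; the real `clicks()` function may apply these filters in its database query, and its column names and date-boundary handling may differ.

```python
import pandas as pd

# Hypothetical click records; column names are assumptions, not the nbgallery schema.
clicks = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "notebook_id": [23, 40, 24, 25],
    "action": ["viewed notebook", "ran notebook", "viewed notebook", "downloaded notebook"],
    "timestamp": pd.to_datetime(["2020-03-01", "2020-06-15", "2021-01-10", "2020-11-30"]),
})


def filter_clicks(df, notebook_id=None, user_id=None, min_date=None, max_date=None,
                  days_ago=None, now=None):
    """Hypothetical helper: ids may be a single value or a list; date bounds are inclusive."""
    mask = pd.Series(True, index=df.index)
    if notebook_id is not None:
        ids = notebook_id if isinstance(notebook_id, (list, tuple, set)) else [notebook_id]
        mask &= df["notebook_id"].isin(ids)
    if user_id is not None:
        ids = user_id if isinstance(user_id, (list, tuple, set)) else [user_id]
        mask &= df["user_id"].isin(ids)
    if min_date is not None:
        mask &= df["timestamp"] >= pd.Timestamp(min_date)
    if max_date is not None:
        mask &= df["timestamp"] <= pd.Timestamp(max_date)
    if days_ago is not None:
        # Keep rows newer than `now - days_ago`; `now` is injectable for reproducibility.
        now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
        mask &= df["timestamp"] >= now - pd.Timedelta(days=days_ago)
    return df[mask]


# Mirrors the shape of nbgdf.clicks(notebook_id=[...], min_date=..., max_date=...)
subset = filter_clicks(clicks, notebook_id=[23, 24, 25],
                       min_date="2020-01-01", max_date="2020-12-31")
print(subset)
```

Here notebook 24 is dropped because its only click falls in 2021, outside the date range.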

Two rollup functions compress the data. The first returns one row per (user, notebook, action) with a count and the first and last timestamps. It accepts the same filter options.

In [None]:
nbgdf.clicks_rollup(days_ago=90)
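
A rollup like this amounts to a group-by with a few aggregations. The sketch below reproduces that shape with pandas named aggregation on a small hypothetical click table; the column names (`count`, `first_click`, `last_click`) are assumptions and may not match what `clicks_rollup()` actually returns.

```python
import pandas as pd

# Hypothetical click records (column names are assumptions).
clicks = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "notebook_id": [23, 23, 23, 23],
    "action": ["viewed notebook", "viewed notebook", "ran notebook", "viewed notebook"],
    "timestamp": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-02-02", "2020-03-01"]),
})

# One row per (user, notebook, action): number of clicks plus first and last occurrence.
rollup = (
    clicks.groupby(["user_id", "notebook_id", "action"])["timestamp"]
    .agg(count="count", first_click="min", last_click="max")
    .reset_index()
)
print(rollup)
```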

This one has one row per (user, notebook) with the action counts pivoted into the row.

In [None]:
nbgdf.clicks_rollup_pivot()
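
Pivoting action counts into columns is essentially a cross-tabulation. This sketch uses `pd.crosstab` on a hypothetical click table to show the idea; it is not necessarily how `clicks_rollup_pivot()` is implemented, and the real action names and column layout may differ.

```python
import pandas as pd

# Hypothetical click records (column names and action strings are assumptions).
clicks = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "notebook_id": [23, 23, 23, 23],
    "action": ["viewed notebook", "viewed notebook", "ran notebook", "viewed notebook"],
})

# One row per (user, notebook); each distinct action becomes a column of counts.
# Pairs that never performed an action get 0 rather than NaN.
pivot = (
    pd.crosstab([clicks["user_id"], clicks["notebook_id"]], clicks["action"])
    .rename_axis(columns=None)
    .reset_index()
)
print(pivot)
```

A wide layout like this is convenient as a feature matrix, for example as input to a recommender.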

## Execution data

When nbgallery's Jupyter instrumentation is enabled, cell-level execution logs from Jupyter are recorded in nbgallery's database. Each entry contains the user id, the code cell id (which links back to a notebook id and cell number), success or failure (whether the cell threw an exception), and a timestamp.

As with clicks, you can retrieve all executions or apply the same filtering options.

In [None]:
nbgdf.executions(user_id=[1, 2, 3], days_ago=90)

This rollup has one row per code cell, with the number of users and the cumulative success rate.

In [None]:
nbgdf.cell_execution_rollup()
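
As a rough illustration of what a per-cell rollup computes, the sketch below groups a hypothetical execution log by code cell, counting distinct users and averaging the boolean success flag (so the rate covers every execution recorded for that cell). The column names `code_cell_id`, `users`, and `success_rate` are assumptions, not necessarily what `cell_execution_rollup()` returns.

```python
import pandas as pd

# Hypothetical execution log; True means the cell ran without raising an exception.
executions = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "code_cell_id": [10, 10, 10, 11, 11],
    "success": [True, False, True, True, True],
})

# One row per code cell: distinct users and the fraction of all executions that succeeded.
cell_rollup = (
    executions.groupby("code_cell_id")
    .agg(users=("user_id", "nunique"), success_rate=("success", "mean"))
    .reset_index()
)
print(cell_rollup)
```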

And this one has one row per notebook.

In [None]:
nbgdf.notebook_execution_rollup()
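
Because each code cell id links back to a notebook, a per-notebook rollup can be sketched as a join from executions to cells followed by a group-by on notebook id. Both tables, their column names, and the chosen aggregates (`users`, `cells_run`, `success_rate`) are hypothetical; the real `notebook_execution_rollup()` may report different statistics.

```python
import pandas as pd

# Hypothetical cell table: maps each code cell id to its notebook and position.
cells = pd.DataFrame({
    "code_cell_id": [10, 11, 12],
    "notebook_id": [23, 23, 24],
    "cell_number": [0, 1, 0],
})

# Hypothetical execution log.
executions = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "code_cell_id": [10, 10, 11, 12, 12],
    "success": [True, False, True, False, False],
})

# Attach notebook ids to executions, then aggregate per notebook.
notebook_rollup = (
    executions.merge(cells, on="code_cell_id", how="left")
    .groupby("notebook_id")
    .agg(users=("user_id", "nunique"),
         cells_run=("code_cell_id", "nunique"),
         success_rate=("success", "mean"))
    .reset_index()
)
print(notebook_rollup)
```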

## More help

In [None]:
help(nbgdf)