# Applets

Applets are special Jupyter notebooks whose cells run automatically alongside your notebooks to show auxiliary information and visualizations about the data you're working with. For example, [histograms.ipynb](./histograms.ipynb) automatically shows a histogram for every numeric column in the data frame you're working with. You can enable it by pressing the "histograms" button in the data chimp view.

You can create a new applet by adding a notebook to this directory. The notebook's name becomes the label of its enable/disable button in the UI. Each code cell within an applet has access to a `df` variable that is set to the data frame you're working with. It's also possible to create lists of visualizations from a single data frame using the `get_args` and `visualization` functions. See [histograms.ipynb](./histograms.ipynb) for an example.
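As a rough illustration of what a cell in a custom applet might look like, the sketch below summarizes the numeric columns of `df`. The small data frame is only a stand-in so the snippet runs on its own; inside a real applet, `df` is supplied for you and should not be defined by hand. The summary itself is a hypothetical example, not one of the bundled applets.

```python
import pandas as pd

# Stand-in for the `df` variable that applet cells receive automatically.
# Inside a real applet, leave this definition out.
df = pd.DataFrame({
    "age": [22, 38, 26, 35],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
})

# Hypothetical applet cell: one row of summary statistics per numeric column.
summary = df.select_dtypes(include="number").describe().T
summary
```

Because the last expression of the cell is the summary table, it is what would be displayed as the cell's output.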

## The "default" applet

This notebook is the default applet, whose cells always run. The default applet is useful for data quality checks. The check below, for example, lists every column with more than 3% missing values, and shows nothing when the data frame you're working with has no such column.

In [4]:
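# Percentage of missing values per column, highest first.
missing_df = (df
  .isnull()
  .mean()
  .round(4)
  .mul(100)
  .sort_values(ascending=False)
)
# Keep only the columns with more than 3% missing values.
badly_missing_df = missing_df[missing_df > 3]
# Evaluate to None so the cell shows nothing when no column qualifies.
badly_missing_df if not badly_missing_df.empty else None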

Cabin    77.10
Age      19.87
dtype: float64

In [None]:
import os
import sys

def flag_bad_tables():
  global dc_code
  # tomllib joined the standard library in Python 3.11; older versions skip the check.
  if sys.version_info >= (3, 11):
    import tomllib
    from functools import reduce
    with open("./data_chimp/flagged.toml", "rb") as f:
      flagged = tomllib.load(f)
      if len(flagged) == 0:
        return
      flagged_tables = [
        f"'{table['name']}' table flagged because: {table['reason']}" 
        for table in flagged["tables"] 
        if "columns" not in table and table["name"].casefold() in dc_code.casefold()
      ]
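      # Flatten the per-table column lists. The [] initializer keeps reduce
      # from raising TypeError when no table defines any columns.
      columns = reduce(lambda a, b: a + b, [
        table["columns"]
        for table in flagged["tables"]
        if table.get("columns")
      ], [])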
      flagged_columns = [
        f"'{column['name']}' column flagged because: {column['reason']}"
        for column in columns 
        if column['name'].casefold() in dc_code.casefold()
      ]
      # Join all messages at once so an empty list does not leave a blank line.
      messages = flagged_tables + flagged_columns
      return os.linesep.join(messages) if messages else None

flag_bad_tables()

FileNotFoundError: [Errno 2] No such file or directory: './data_chimp/flagged.toml'
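The error above simply means that `./data_chimp/flagged.toml` does not exist yet. The file's layout is not documented here, so the sketch below infers one from the keys that `flag_bad_tables` reads: a `tables` array whose entries have a `name` and either a `reason` or a list of `columns`, each with its own `name` and `reason`. The table names, column names, and reasons are made up for illustration.

```python
import sys

# Hypothetical contents of ./data_chimp/flagged.toml, inferred from the keys
# that flag_bad_tables() looks up. All names and reasons are invented.
EXAMPLE_FLAGGED_TOML = """
[[tables]]
name = "customers"
reason = "contains duplicate rows"

[[tables]]
name = "users"

[[tables.columns]]
name = "last_purchase"
reason = "timestamps use mixed time zones"
"""


def parse_example():
    """Parse the example text, or return None where tomllib is unavailable."""
    if sys.version_info >= (3, 11):
        import tomllib
        return tomllib.loads(EXAMPLE_FLAGGED_TOML)
    return None


flagged = parse_example()
```

In this layout, a table without `columns` is flagged as a whole, while a table with `columns` contributes only its listed columns.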

The following cells exist only to test the cells above. We tell data chimp to ignore them by adding the `dchimp.ignore` cell tag.

In [2]:
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")

In [None]:
dc_code = """
SELECT id, last_purchase FROM customers
JOIN users USING (id)
"""