# Demo of the *skimpy* Python package

This notebook is a quick demo of how to use skimpy in practice. First, let's make sure it's installed in this Google Colab notebook.

In [None]:
!pip install skimpy pandas

If this is the first time you've run this notebook, you may need to restart the runtime now. Click 'Runtime', then 'Restart runtime', in the menu at the top of the page.

Now we can import and use the package. Let's grab the example data included in the package, along with the function that will summarise it, *skim*.

Here are the imports and the dataframe:

In [None]:
from skimpy import skim, generate_test_data

df = generate_test_data()

df.head()

It's also worth noting that this data has its datatypes set in advance; you'll get more informative skims from dataframes whose datatypes have been set first. Here are the datatypes in this dataframe:

In [None]:
df.info()
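As a rough sketch of what setting datatypes first can look like, here is a small pandas example. The dataframe `raw` and its column names are invented for illustration and are not part of skimpy's test data; the assumption is simply that every column starts out as strings.

```python
import pandas as pd

# Hypothetical raw data in which every column starts out as strings.
raw = pd.DataFrame(
    {
        "city": ["Leeds", "York", "Leeds"],
        "visits": ["3", "7", "5"],
        "joined": ["2021-01-04", "2021-02-11", "2021-03-19"],
    }
)

# Give each column a more specific datatype before summarising it.
typed = raw.assign(
    city=raw["city"].astype("category"),    # repeated labels -> categorical
    visits=pd.to_numeric(raw["visits"]),    # digit strings -> numeric
    joined=pd.to_datetime(raw["joined"]),   # ISO date strings -> datetime
)

print(typed.dtypes)
```

Passing a dataframe like `typed` to *skim*, rather than `raw`, should then let the summary treat each column according to its type.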

## Running skimpy

Okay, we're ready to run *skim* on our dataframe!

In [None]:
skim(df)

## Options

A few options are available for customisation.

For example, you can change the header style of the first three tables. Styles are described in the documentation of the [**rich** package](https://rich.readthedocs.io/en/stable/index.html), which **skimpy** builds on:

In [None]:
skim(df, header_style="italic green")

## Cleaning Column Names

**skimpy** also comes with a function to clean up column names. Here's an example of some messy column names:

In [None]:
import pandas as pd
from rich import print
from skimpy import clean_columns

columns = [
    "bs lncs;n edbn ",
    "Nín hǎo. Wǒ shì zhōng guó rén",
    "___This is a test___",
    "ÜBER Über German Umlaut",
]
messy_df = pd.DataFrame(columns=columns, index=[0], data=[range(len(columns))])
print("Column names:")
print(list(messy_df.columns))

Now we'll clean them up:

In [None]:
clean_df = clean_columns(messy_df)
print(list(clean_df.columns))
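To give a feel for the kind of normalisation involved, here is a minimal standard-library sketch. It is not skimpy's implementation, and `to_snake_case` is a hypothetical helper written for this illustration, so its results may differ from what `clean_columns` returns.

```python
import re
import unicodedata


def to_snake_case(name: str) -> str:
    """Hypothetical helper: approximate snake_case cleaning of one column name."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with one underscore.
    collapsed = re.sub(r"[^0-9A-Za-z]+", "_", ascii_only)
    # Trim leading and trailing underscores, then lowercase the result.
    return collapsed.strip("_").lower()


messy_names = [
    "bs lncs;n edbn ",
    "Nín hǎo. Wǒ shì zhōng guó rén",
    "___This is a test___",
    "ÜBER Über German Umlaut",
]
sketch_names = [to_snake_case(name) for name in messy_names]
print(sketch_names)
```

The sketch handles only the simple cases shown here: accents, punctuation, and stray underscores or spaces.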