# Pandas-Log Usage Walkthrough

## Why pandas-log?

Pandas-log is a Python implementation of the R package tidylog, and provides feedback about basic pandas operations.

Pandas has been invaluable for the data science ecosystem. A typical pandas workflow consists of a series of steps that transform raw data into an understandable, usable format. These steps need to run in a certain sequence, and if the result is unexpected, it's hard to tell what happened. Pandas-log logs metadata about each operation, which makes it possible to pinpoint the issue.
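To make the idea of per-operation metadata concrete, here is a minimal hand-rolled sketch that uses only pandas' own `DataFrame.pipe`. The `log_step` helper and the toy data are hypothetical illustrations, not pandas-log's API or implementation; pandas-log records this kind of information for you.

```python
import pandas as pd


def log_step(df, func, label):
    """Hypothetical helper (not part of pandas-log): apply `func` to `df`
    and print how the number of rows and columns changed."""
    result = func(df)
    print(f"{label}: rows {len(df)} -> {len(result)}, "
          f"columns {df.shape[1]} -> {result.shape[1]}")
    return result


# Tiny made-up frame, for illustration only.
toy = pd.DataFrame({
    "name": ["Charmander", "Squirtle", "Moltres"],
    "type_1": ["Fire", "Water", "Fire"],
    "legendary": [False, False, True],
})

# DataFrame.pipe(func, *args) calls func(df, *args) and returns its result.
trimmed = (toy
           .pipe(log_step, lambda d: d[~d["legendary"]], "filter legendary")
           .pipe(log_step, lambda d: d.drop(columns="legendary"), "drop legendary"))
# filter legendary: rows 3 -> 2, columns 3 -> 3
# drop legendary: rows 2 -> 2, columns 3 -> 2
```

Doing this by hand means wrapping every step yourself, which is exactly the bookkeeping pandas-log is meant to take over.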

## Pandas-log Demo


#### First, we need to load some libraries, including pandas-log

```python
import pandas as pd
import numpy as np
import pandas_log
```

#### Let's take a look at our dataset:

```python
df = pd.read_csv("pokemon.csv")
df.head(10)
```

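|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire |  | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 7 | 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
| 8 | 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water |  | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |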
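If `pokemon.csv` is not at hand, a small stand-in can be typed in from the rows shown above. This is a hypothetical reconstruction limited to the columns the pipeline uses (`name`, `type_1`, `type_2`, `total`, `legendary`), with the empty `type_2` cells written as `None`; it is not the full 800-row dataset, so row counts in the logs will differ.

```python
import pandas as pd

# Hypothetical stand-in for pokemon.csv: the ten rows shown above,
# restricted to the columns the pipeline below actually touches.
df = pd.DataFrame({
    "name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur",
             "Charmander", "Charmeleon", "Charizard",
             "CharizardMega Charizard X", "CharizardMega Charizard Y",
             "Squirtle"],
    "type_1": ["Grass", "Grass", "Grass", "Grass",
               "Fire", "Fire", "Fire", "Fire", "Fire", "Water"],
    "type_2": ["Poison", "Poison", "Poison", "Poison",
               None, None, "Flying", "Dragon", "Flying", None],
    "total": [318, 405, 525, 625, 309, 405, 534, 634, 634, 314],
    "legendary": [False] * 10,
})
```

Running the same chain as in the walkthrough on this stand-in also comes back empty, so the problem below can be reproduced without the CSV.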


#### Let's say we want to find out:
## Who is the weakest non-legendary fire pokemon?

<img src="fire_pokemons.jpg" width="540" height="340" align="left"/>

#### The strategy will probably be something like:
1. Filter out legendary pokemon using `.query()`.
1. Keep only fire pokemon using `.query()`.
1. Drop the `legendary` column using `.drop()`.
1. Keep the weakest pokemon among them using `.nsmallest()`.
1. Reset the index using `.reset_index()`.

```python
res = (df.copy()
         .query("legendary==0")
         .query("type_1=='fire' or type_2=='fire'")
         .drop("legendary", axis=1)
         .nsmallest(1, "total")
         .reset_index(drop=True)
      )
res
```

|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|


### Oh no! Our code does not work: we got no records!


<img src="shocked.gif" width="490" height="340" align="left"/>

### If only there were a way to track down the issue

Fortunately, that's what **pandas-log** is for! It can be enabled either globally or as a context manager.
Here is an example using pandas_log's context manager, `pandas_log.enable()`:

```python
with pandas_log.enable():
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='fire' or type_2=='fire'")
             .drop("legendary", axis=1)
             .nsmallest(1, "total")
             .reset_index(drop=True)
          )
res
```


```text
1) query(expr="legendary==0", inplace=False):
    * Removed 65 rows (0.08125%), 735 rows remaining.
    * Step Took 0.0025560855865478516 seconds

2) query(expr="type_1=='fire' or type_2=='fire'", inplace=False):
    * Removed 735 rows (1.0%), 0 rows remaining.
    * Step Took 0.0040740966796875 seconds

3) drop(labels="legendary", axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'):
    * Removed the following columns (legendary) now only have the following columns (sp_def,defense,generation,speed,name,type_2,hp,sp_atk,type_1,#,total,attack).
    * No change in number of rows.
    * Step Took 0.0007641315460205078 seconds

4) nsmallest(n=1, columns="total", keep='first'):
    * Picked 1 smallest rows by columns (total).
    * Step Took 0.0023779869079589844 seconds
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
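To give an intuition for how a context manager can switch logging on for just one block, here is a minimal sketch built on the standard library's `contextlib.contextmanager`. It temporarily wraps a single `DataFrame` method and restores it on exit. The `log_row_counts` helper is hypothetical and is not how pandas-log is implemented; it only illustrates the enable/restore pattern.

```python
import contextlib
import functools

import pandas as pd


@contextlib.contextmanager
def log_row_counts(method_name):
    """Hypothetical helper (not pandas-log's code): while the block runs,
    wrap one DataFrame method so each call prints how many rows it removed."""
    original = getattr(pd.DataFrame, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        if isinstance(result, pd.DataFrame):  # inplace=True returns None
            removed = len(self) - len(result)
            print(f"{method_name}: removed {removed} rows, "
                  f"{len(result)} remaining")
        return result

    setattr(pd.DataFrame, method_name, wrapper)
    try:
        yield
    finally:
        # Put the original method back, even if the block raised.
        setattr(pd.DataFrame, method_name, original)


demo = pd.DataFrame({"name": ["Charmander", "Squirtle"],
                     "type_1": ["Fire", "Water"]})
original_query = pd.DataFrame.query

with log_row_counts("query"):
    filtered = demo.query("type_1 == 'fire'")  # prints: query: removed 2 rows, 0 remaining

print(pd.DataFrame.query is original_query)  # True: restored on exit
```

The `try`/`finally` is what makes the context-manager form attractive: logging is switched off again even if the pipeline raises.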


Here is the same pipeline using pandas_log's global `auto_enable()` and `auto_disable()` functions:

```python
pandas_log.auto_enable()   # switch logging on globally
res = (df.copy()
         .query("legendary==0")
         .query("type_1=='fire' or type_2=='fire'")
         .drop("legendary", axis=1)
         .nsmallest(1, "total")
         .reset_index(drop=True)
      )
pandas_log.auto_disable()  # switch it off again
res
```


```text
1) query(expr="legendary==0", inplace=False):
    * Removed 65 rows (0.08125%), 735 rows remaining.
    * Step Took 0.0027070045471191406 seconds

2) query(expr="type_1=='fire' or type_2=='fire'", inplace=False):
    * Removed 735 rows (1.0%), 0 rows remaining.
    * Step Took 0.0044138431549072266 seconds

3) drop(labels="legendary", axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'):
    * Removed the following columns (legendary) now only have the following columns (sp_def,defense,generation,speed,name,type_2,hp,sp_atk,type_1,#,total,attack).
    * No change in number of rows.
    * Step Took 0.0010120868682861328 seconds

4) nsmallest(n=1, columns="total", keep='first'):
    * Picked 1 smallest rows by columns (total).
    * Step Took 0.0033338069915771484 seconds
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|


#### We can clearly see that the second step (`.query()`) filters out all the rows! Indeed, we should have written `Fire` as opposed to `fire`

```python
res = (df.copy()
         .query("type_1=='Fire' or type_2=='Fire'")
         .query("legendary==0")
         .drop("legendary", axis=1)
         .nsmallest(1, "total")
         .reset_index(drop=True)
      )
res
```

|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 218 | Slugma | Fire |  | 250 | 40 | 40 | 40 | 70 | 40 | 20 | 2 |


### Voilà, we got Slugma!

<img src="slugma.jpg" width="250" height="340" align="left"/>
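A cheap way to catch this kind of mismatch is to look at the actual values before filtering on them, or to compare case-insensitively. The sketch below uses pandas' `Series.unique` and `Series.str.lower` on a few made-up rows with the same column names; it is one option, not the fix used in the walkthrough, which simply corrects the spelling.

```python
import pandas as pd

# Made-up rows that reuse the walkthrough's column names.
pokemon = pd.DataFrame({
    "name": ["Charmander", "Charizard", "Squirtle"],
    "type_1": ["Fire", "Fire", "Water"],
    "type_2": [None, "Flying", None],
})

# Check how the values are actually spelled before filtering on them.
print(sorted(pokemon["type_1"].unique()))  # ['Fire', 'Water']

# Case-insensitive match; missing type_2 values stay missing after
# .str.lower(), and .eq("fire") treats them as "no match".
is_fire = (pokemon["type_1"].str.lower().eq("fire")
           | pokemon["type_2"].str.lower().eq("fire"))
fire = pokemon[is_fire]
print(list(fire["name"]))  # ['Charmander', 'Charizard']
```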

## Some more advanced usage


#### One can use the `verbose` parameter, which enables lower-level logs, such as whether the dataframe was copied as part of the pipeline.
This can help explain comparison issues.

```python
with pandas_log.enable(verbose=True):
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='Fire' or type_2=='Fire'")
             .drop("legendary", axis=1)
             .nsmallest(1, "total")
             .reset_index(drop=True)
          )
res
```


```text
1) copy(deep=True):
    * using default strategy (some metric might not be relevant)
    * Step Took 0.0005130767822265625 seconds

2) query(expr="legendary==0", inplace=False):
    * Removed 65 rows (0.08125%), 735 rows remaining.
    * Step Took 0.0033111572265625 seconds

3) query(expr="type_1=='Fire' or type_2=='Fire'", inplace=False):
    * Removed 679 rows (0.9238095238095239%), 56 rows remaining.
    * Step Took 0.003696918487548828 seconds

4) drop(labels="legendary", axis=0, index=None, columns=None, level=None, inplace=False, errors='raise'):
    * Removed the following columns (legendary) now only have the following columns (sp_def,defense,generation,speed,name,type_2,hp,sp_atk,type_1,#,total,attack).
    * No change in number of rows.
    * Step Took 0.0008273124694824219 seconds

5) copy(deep=True):
    * using default strategy (some metric might not be relevant)
    * Step Took 0.00017905235290527344 seconds

5) nsmallest(n=1, columns="total", keep='first'
```

|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 218 | Slugma | Fire |  | 250 | 40 | 40 | 40 | 70 | 40 | 20 | 2 |


As we can see, the dataframe was copied after both the `drop` and the `nsmallest` steps.
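One plausible reading of the "comparison issues" mentioned above (our assumption; the walkthrough does not spell it out) is identity checks: because pipeline steps hand back new objects, an `is` comparison against the original frame fails even when the data is unchanged. A minimal pandas-only illustration with a made-up one-row frame:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["Slugma"], "type_1": ["Fire"], "legendary": [False]})

# With the default inplace=False, drop() returns a new DataFrame object.
dropped = frame.drop("legendary", axis=1)
print(dropped is frame)          # False

# The same holds for an explicit copy: identity differs, contents match.
duplicate = frame.copy()
print(duplicate is frame)        # False
print(duplicate.equals(frame))   # True
```

When the question is "is the data the same?", `DataFrame.equals` (or a column-wise comparison) answers it; `is` only answers "is it the same object?".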

#### One can use the `silent` parameter to suppress output to stdout

```python
with pandas_log.enable(silent=True):
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='Fire' or type_2=='Fire'")
             .drop("legendary", axis=1)
             .nsmallest(1, "total")
             .reset_index(drop=True)
          )
res
```


```text
1) copy(deep=True):
    * using default strategy (some metric might not be relevant)
    * Step Took 0.00025963783264160156 seconds
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 218 | Slugma | Fire |  | 250 | 40 | 40 | 40 | 70 | 40 | 20 | 2 |
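If you would rather keep the log text than discard it, Python's standard `contextlib.redirect_stdout` can capture anything printed to `sys.stdout` inside a block. This is a generic Python technique, not a pandas-log option; it assumes the log is printed to stdout (as the `silent` description suggests), and `noisy_step` below is a hypothetical stand-in for a logged pandas call.

```python
import contextlib
import io


def noisy_step():
    """Hypothetical stand-in for an operation that prints a log line."""
    print('1) query(expr="legendary==0")')
    return 42


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    value = noisy_step()  # nothing reaches the terminal here

captured = buffer.getvalue()  # the text that would have been printed
print(captured.strip())
```

The same `redirect_stdout` block could in principle surround a `pandas_log.enable()` block, provided pandas-log writes through `sys.stdout` at call time; we have not verified that.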


#### One can set `full_signature=False` to print a shorter method signature in the logs

```python
with pandas_log.enable(full_signature=False):
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='Fire' or type_2=='Fire'")
             .drop("legendary", axis=1)
             .nsmallest(1, "total")
             .reset_index(drop=True)
          )
res
```


```text
1) copy(deep=True):
    * using default strategy (some metric might not be relevant)
    * Step Took 0.0002608299255371094 seconds

2) query(expr="legendary==0"):
    * Removed 65 rows (0.08125%), 735 rows remaining.
    * Step Took 0.002346038818359375 seconds

3) query(expr="type_1=='Fire' or type_2=='Fire'"):
    * Removed 679 rows (0.9238095238095239%), 56 rows remaining.
    * Step Took 0.0029571056365966797 seconds

4) drop(labels="legendary"):
    * Removed the following columns (legendary) now only have the following columns (sp_def,defense,generation,speed,name,type_2,hp,sp_atk,type_1,#,total,attack).
    * No change in number of rows.
    * Step Took 0.0006778240203857422 seconds

5) copy():
    * using default strategy (some metric might not be relevant)
    * Step Took 0.00016117095947265625 seconds

5) nsmallest(n=1, columns="total"):
    * Picked 1 smallest rows by columns (total).
    * Step Took 0.0014069080352783203 seconds

6) copy():
    * using defaul
```

|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 218 | Slugma | Fire |  | 250 | 40 | 40 | 40 | 70 | 40 | 20 | 2 |
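The difference between a full and a shortened signature can be sketched with the standard library's `inspect.signature`: binding the call and then calling `BoundArguments.apply_defaults()` fills in every default, while skipping that step keeps only the arguments that were passed. `describe_call` and the toy `nsmallest` function are hypothetical; pandas-log's own formatting evidently differs in details (for example, the short `drop` line in the log omits `axis` even though it was passed).

```python
import inspect


def describe_call(func, *args, full_signature=True, **kwargs):
    """Hypothetical formatter (not pandas-log's code): render a call either
    with every parameter, defaults included, or with only what was passed."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    if full_signature:
        bound.apply_defaults()  # add parameters left at their defaults
    params = ", ".join(f"{name}={value!r}" for name, value in bound.arguments.items())
    return f"{func.__name__}({params})"


def nsmallest(n, columns, keep="first"):
    """Toy function with the same parameter names as DataFrame.nsmallest."""


print(describe_call(nsmallest, 1, "total"))
# nsmallest(n=1, columns='total', keep='first')
print(describe_call(nsmallest, 1, "total", full_signature=False))
# nsmallest(n=1, columns='total')
```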
