# Pandas-Log Usage Walkthrough

## Why pandas-log?

Pandas-log is a Python implementation of the R package tidylog and provides feedback about basic pandas operations.

Pandas has been invaluable to the data science ecosystem. A typical pandas workflow consists of a series of steps that transform raw data into an understandable, usable format. These steps need to run in a certain sequence, and when the result is unexpected it's hard to understand what happened. Pandas-log logs metadata on each operation, which makes it possible to pinpoint the issue.
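To make the idea concrete, here is a minimal sketch of the kind of metadata such a logger records for each step: how long it took and how many rows it removed. This is not pandas-log's implementation; `logged_step`, `keep_type`, and the three-row DataFrame (taken from the dataset shown below) are hypothetical names for illustration.

```python
import time

import pandas as pd


def logged_step(df, func, *args, **kwargs):
    """Hypothetical helper: run one DataFrame step and print what changed."""
    start = time.perf_counter()
    result = func(df, *args, **kwargs)
    elapsed = time.perf_counter() - start
    removed = len(df) - len(result)
    print(f"{func.__name__}: took {elapsed:.6f} s, "
          f"removed {removed} rows, {len(result)} rows remaining")
    return result


def keep_type(df, type_name):
    # Keep rows whose primary or secondary type equals type_name (case-sensitive)
    return df[(df["type_1"] == type_name) | (df["type_2"] == type_name)]


df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Squirtle"],
    "type_1": ["Grass", "Fire", "Water"],
    "type_2": ["Poison", None, None],
    "total": [318, 309, 314],
})

fire = df.pipe(logged_step, keep_type, "Fire")
```

`DataFrame.pipe` passes the frame as the first argument, so `df.pipe(logged_step, keep_type, "Fire")` calls `logged_step(df, keep_type, "Fire")`.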

## Pandas-log Demo


#### First, we need to load some libraries, including pandas-log

In [1]:
```python
import pandas as pd
import numpy as np
import pandas_log
```

#### Let's take a look at our dataset:

In [2]:
```python
df = pd.read_csv("pokemon.csv")
df.head(10)
```

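|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 5 | 5 | Charmeleon | Fire |  | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
| 6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
| 7 | 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
| 8 | 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
| 9 | 7 | Squirtle | Water |  | 314 | 44 | 48 | 65 | 50 | 64 | 43 | 1 | False |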


#### Let's say we want to find out:
## Who is the weakest non-legendary fire pokemon?

<img src="fire_pokemons.jpg" width="540" height="340" align="left"/>

#### The strategy will probably be something like:
1. Filter out legendary pokemon using `.query()`.
1. Keep only fire pokemon using `.query()`.
1. Drop the `legendary` column using `.drop()`.
1. Keep the weakest pokemon among them (lowest `total`) using `.nsmallest()`.
1. Reset the index using `.reset_index()`.

In [3]:
```python
res = (df.query("legendary==0")
         .query("type_1=='fire' or type_2=='fire'")
         .drop("legendary", axis=1)
         .nsmallest(1,"total")
         .reset_index(drop=True)
      )
res
```

|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|


### Oh no! Our code doesn't work: we got no records!


<img src="shocked.gif" width="490" height="340" align="left"/>

### If only there were a way to track down such issues...

Fortunately, that's what **pandas-log** is for! It can be used either as a context manager or through global functions.
Here is the example with pandas-log's context manager, `pandas_log.enable()`:

In [9]:
```python
with pandas_log.enable():
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='fire' or type_2=='fire'")
             .drop("legendary", axis=1)
             .nsmallest(1,"total")
             .reset_index(drop=True)
          )
res
```


```text
1) copy():
	* Step Took 0.0005021095275878906 seconds

2) query(expr=legendary==0):
	* Step Took 0.002875089645385742 seconds
	* Removed 65 rows (0.08125%), 735 rows remaining.

3) query(expr=type_1=='fire' or type_2=='fire'):
	* Step Took 0.003099203109741211 seconds
	* Removed 735 rows (1.0%), 0 rows remaining.

4) drop(labels=legendary):
	* Step Took 0.0006909370422363281 seconds
	* Removed the following columns (legendary) now only have the following columns (defense,total,speed,sp_def,name,generation,hp,type_2,attack,type_1,sp_atk,#).
	* No change in number of rows.

5) copy():
	* Step Took 0.00014400482177734375 seconds

5) reset_index():
	* Step Took 0.0002980232238769531 seconds

5) nsmallest(n=1, columns=total):
	* Step Took 0.003013134002685547 seconds
	* Picked 1 smallest rows by columns (total).

6) copy():
	* Step Took 0.00015878677368164062 seconds

6) reset_index():
	* Step Took 0.0003192424774169922 seconds
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
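The percentages in this log look like fractions rather than percentages: the number of removed rows divided by the row count before the step. A quick check with the numbers from step 2 (65 rows removed, 735 remaining):

```python
# Row counts taken from the log above: 65 removed + 735 remaining = 800 rows before the step
rows_before = 65 + 735
removed = 65

fraction = removed / rows_before
print(fraction)           # 0.08125, the value shown in the log
print(f"{fraction:.3%}")  # 8.125%, the actual percentage
```

By the same reasoning, the `1.0%` in step 3 means that all of the remaining rows were removed.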


Here is the same example with pandas-log's global functions, `auto_enable()` and `auto_disable()`:

In [10]:
```python
pandas_log.auto_enable()
res = (df.copy()
         .query("legendary==0")
         .query("type_1=='fire' or type_2=='fire'")
         .drop("legendary", axis=1)
         .nsmallest(1,"total")
         .reset_index(drop=True)
      )
pandas_log.auto_disable()
res
```


```text
1) copy():
	* Step Took 0.0002942085266113281 seconds

2) query(expr=legendary==0):
	* Step Took 0.0027730464935302734 seconds
	* Removed 65 rows (0.08125%), 735 rows remaining.

3) query(expr=type_1=='fire' or type_2=='fire'):
	* Step Took 0.0031180381774902344 seconds
	* Removed 735 rows (1.0%), 0 rows remaining.

4) drop(labels=legendary):
	* Step Took 0.0007269382476806641 seconds
	* Removed the following columns (legendary) now only have the following columns (defense,total,speed,sp_def,name,generation,hp,type_2,attack,type_1,sp_atk,#).
	* No change in number of rows.

5) copy():
	* Step Took 0.00014710426330566406 seconds

5) reset_index():
	* Step Took 0.00040912628173828125 seconds

5) nsmallest(n=1, columns=total):
	* Step Took 0.002457857131958008 seconds
	* Picked 1 smallest rows by columns (total).

6) copy():
	* Step Took 0.0002651214599609375 seconds

6) reset_index():
	* Step Took 0.000843048095703125 seconds
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
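Both styles produce the same log. One practical difference is sketched below with a toy global switch rather than pandas-log itself. If a step raises, the disable call after it is never reached and the switch stays on, whereas a context manager written with `try`/`finally` restores the previous state on exit. `LOGGING`, `toy_enable`, `toy_disable`, `toy_logging`, and `failing_pipeline` are hypothetical stand-ins.

```python
from contextlib import contextmanager

# Hypothetical global switch standing in for a library's logging state
LOGGING = {"enabled": False}


def toy_enable():
    LOGGING["enabled"] = True


def toy_disable():
    LOGGING["enabled"] = False


@contextmanager
def toy_logging():
    """Switch logging on and restore the previous state on exit, even after an error."""
    previous = LOGGING["enabled"]
    LOGGING["enabled"] = True
    try:
        yield
    finally:
        LOGGING["enabled"] = previous


def failing_pipeline():
    # Simulates a pipeline step that raises an exception
    raise KeyError("legendary")


# Global functions: the exception skips toy_disable(), so logging stays on
try:
    toy_enable()
    failing_pipeline()
    toy_disable()  # never reached
except KeyError:
    pass
after_globals = LOGGING["enabled"]  # True

toy_disable()  # reset by hand

# Context manager: the finally clause restores the switch despite the exception
try:
    with toy_logging():
        failing_pipeline()
except KeyError:
    pass
after_context = LOGGING["enabled"]  # False

print(after_globals, after_context)  # True False
```

With pandas-log's global functions, the same guarantee comes from calling `pandas_log.auto_disable()` in a `finally` clause.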


#### We can clearly see that the second `.query()` (step 3 in the log) filters out all the rows! Indeed, we should have written `Fire` as opposed to `fire`

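In [6]:
```python
# 'Fire' is capitalized as in the data, and the legendary column is dropped
# only after query("legendary==0") has used it
res = (df.copy()
         .query("type_1=='Fire' or type_2=='Fire'")
         .query("legendary==0")
         .drop("legendary", axis=1)
         .nsmallest(1,"total")
         .reset_index(drop=True)
      )
res
```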


```text
1) query(expr=type_1=='Fire' or type_2=='Fire'):
	* Step Took 0.0033431053161621094 seconds
	* Removed 736 rows (0.92%), 64 rows remaining.
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 218 | Slugma | Fire |  | 250 | 40 | 40 | 40 | 70 | 40 | 20 | 2 |


### Voilà, we got Slugma!

<img src="slugma.jpg" width="250" height="340" align="left"/>
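The root cause was a case mismatch: string comparisons in `.query()` are case-sensitive, so `'fire'` never matches `'Fire'`. Listing a column's distinct values before filtering would have exposed this. The sketch below uses a few rows from the table at the top; lower-casing both type columns with `.str.lower()` is one way to make the filter case-insensitive, shown as an alternative to simply writing `Fire`.

```python
import pandas as pd

# A few rows from the dataset shown at the top
df = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Charizard", "Squirtle"],
    "type_1": ["Grass", "Fire", "Fire", "Water"],
    "type_2": ["Poison", None, "Flying", None],
    "total": [318, 309, 534, 314],
})

# The distinct values show how the types are actually capitalized
print(sorted(df["type_1"].unique()))  # ['Fire', 'Grass', 'Water']

# Case-sensitive comparison: 'fire' matches nothing
no_match = df.query("type_1=='fire' or type_2=='fire'")
print(len(no_match))  # 0

# Case-insensitive: lower-case before comparing (missing type_2 values never match)
is_fire = (df["type_1"].str.lower() == "fire") | (df["type_2"].str.lower() == "fire")
print(df[is_fire]["name"].tolist())  # ['Charmander', 'Charizard']
```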

## Some more advanced usage


#### One can use the `verbose` parameter, which enables lower-level logs, such as whether the DataFrame was copied as part of the pipeline.
This can help explain comparison issues.

In [7]:
```python
with pandas_log.enable(verbose=True):
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='Fire' or type_2=='Fire'")
             .drop("legendary", axis=1)
             .nsmallest(1,"total")
             .reset_index(drop=True)
          )
res
```


```text
1) query(expr=legendary==0):
	* Step Took 0.002256155014038086 seconds
	* Removed 65 rows (0.08125%), 735 rows remaining.

2) query(expr=type_1=='Fire' or type_2=='Fire'):
	* Step Took 0.0033159255981445312 seconds
	* Removed 679 rows (0.9238095238095239%), 56 rows remaining.

3) drop(labels=legendary):
	* Step Took 0.0007841587066650391 seconds
	* Removed the following columns (legendary) now only have the following columns (defense,total,speed,sp_def,name,generation,hp,type_2,attack,type_1,sp_atk,#).
	* No change in number of rows.

4) copy():
	* Step Took 0.00041413307189941406 seconds

4) reset_index():
	* Step Took 0.0006079673767089844 seconds

4) nsmallest(n=1, columns=total):
	* Step Took 0.0018150806427001953 seconds
	* Picked 1 smallest rows by columns (total).

5) copy():
	* Step Took 0.0001609325408935547 seconds

5) reset_index():
	* Step Took 0.000308990478515625 seconds
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 218 | Slugma | Fire |  | 250 | 40 | 40 | 40 | 70 | 40 | 20 | 2 |


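As we can see, the verbose log also lists internal calls: entries that share a step number belong to the same pipeline step, so the DataFrame was copied inside both `nsmallest` (step 4) and `reset_index` (step 5).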

#### One can use the `silent` parameter to suppress the log output on stdout

In [8]:
```python
with pandas_log.enable(silent=True):
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='Fire' or type_2=='Fire'")
             .drop("legendary", axis=1)
             .nsmallest(1,"total")
             .reset_index(drop=True)
          )
res
```


```text
1) query(expr=legendary==0):
	* Step Took 0.001997709274291992 seconds
	* Removed 65 rows (0.08125%), 735 rows remaining.
```


|   | # | name | type_1 | type_2 | total | hp | attack | defense | sp_atk | sp_def | speed | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 218 | Slugma | Fire |  | 250 | 40 | 40 | 40 | 70 | 40 | 20 | 2 |
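If you would rather keep the log than discard it, for example to inspect it later, the standard library's `contextlib.redirect_stdout` can capture anything a block prints into a string. This is a general Python technique rather than a pandas-log feature, and it assumes the log is written to stdout; `noisy_step` below is a hypothetical stand-in for a logged pipeline step.

```python
import io
from contextlib import redirect_stdout

import pandas as pd


def noisy_step(df):
    """Hypothetical stand-in for a pipeline step that prints a log line."""
    result = df[~df["legendary"]]
    print(f"query: removed {len(df) - len(result)} rows, {len(result)} rows remaining")
    return result


df = pd.DataFrame({"legendary": [False, True, False]})

buffer = io.StringIO()
with redirect_stdout(buffer):
    res = noisy_step(df)  # the print goes into buffer, not the console

log_text = buffer.getvalue()
print(log_text, end="")  # query: removed 1 rows, 2 rows remaining
```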


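#### One can use the `full_signature` parameter to suppress the full method signature in the log. Note that the pandas-log version used here did not accept it, and the call raised a `TypeError`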

In [11]:
```python
with pandas_log.enable(full_signature=False):
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='Fire' or type_2=='Fire'")
             .drop("legendary", axis=1)
             .nsmallest(1,"total")
             .reset_index(drop=True)
          )
res
```

```text
TypeError: enable() got an unexpected keyword argument 'full_signature'
```
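The `TypeError` means that the installed version's `enable()` has no `full_signature` parameter. Before relying on an optional keyword, you can check a function's parameters with the standard library's `inspect.signature`. In this sketch `enable` is a hypothetical stand-in whose parameters mirror only the `verbose` and `silent` options shown above; with pandas-log installed you would pass `pandas_log.enable` instead, and the answer depends on your installed version.

```python
import inspect
from contextlib import contextmanager


@contextmanager
def enable(verbose=False, silent=False):
    """Hypothetical stand-in for a logging context manager."""
    yield


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        if param.kind == inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


print(accepts_keyword(enable, "verbose"))         # True
print(accepts_keyword(enable, "full_signature"))  # False
```

`inspect.signature` follows the `__wrapped__` attribute that `contextlib.contextmanager` sets, so it reports the decorated function's own parameters.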