# Datafaucet

Datafaucet is a productivity framework for ETL and ML applications. It simplifies common data pipeline activities such as project scaffolding, data ingestion, star schema generation, and forecasting.

In [1]:
import datafaucet as dfc
from datafaucet import logging

## Logging

A central goal is to keep configuration and code in separate files. A project is mostly about setting the correct working directories in which to run and find your notebooks, Python files, and configuration files. When a datafaucet project is loaded, it first searches for a `__main__.py` file, following Python module file naming conventions. When such a file is found, its directory is set as the project root path. All module and alias paths are relative to the project root path.
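The root lookup described above can be sketched with the standard library. This is a minimal illustration, not datafaucet's implementation: the helper name `find_project_root` is hypothetical, and the upward search from a starting directory is an assumption, since the text only says that a `__main__.py` file is searched for.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "__main__.py") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: the search direction (upwards) is an assumption.
    """
    start = start.resolve()
    # Check the starting directory first, then each of its ancestors.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the detected root for the current directory, or None.
    print(find_project_root(Path.cwd()))
```

Once a root is known, relative module and alias paths can be resolved against it with `root / relative_path`.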

### Logs

Logging via datafaucet supports 5 levels:
  - info
  - notice
  - warning
  - error
  - fatal

#### No project metadata loaded.
Logging works without loading any project metadata configuration, but in that case it uses the default configuration of the Python root logger. By default, the `debug`, `info` and `notice` levels are filtered out. To enable the full functionality, including logging to Kafka and logging custom information about the project (sessionid, username, etc.), you must load a project first.

In [2]:
logging.debug('debug')
logging.info('info')
logging.notice('notice')
logging.warning('a warning message')
logging.error('this is an error')
logging.critical('critical condition')

this is an error
critical condition
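The filtering above follows from the standard library: the Python root logger defaults to the `WARNING` level, and loggers without their own level inherit it. The sketch below uses only the standard `logging` module. The numeric value 25 for `NOTICE` and the logger name `levels_demo` are assumptions made for illustration; the value datafaucet actually uses is not stated here.

```python
import logging

# Assumption: 25 places NOTICE between INFO (20) and WARNING (30).
NOTICE = 25
logging.addLevelName(NOTICE, "NOTICE")

# Hypothetical logger name; it has no level of its own, so it inherits
# the effective level of its parent (the root logger).
logger = logging.getLogger("levels_demo")

LEVELS = (logging.DEBUG, logging.INFO, NOTICE,
          logging.WARNING, logging.ERROR, logging.CRITICAL)


def enabled_levels(log):
    """Return the names of the levels that `log` would currently emit."""
    return [logging.getLevelName(level) for level in LEVELS if log.isEnabledFor(level)]


print(enabled_levels(logger))

# Lowering the threshold lets INFO and NOTICE through as well.
logger.setLevel(logging.INFO)
print(enabled_levels(logger))
```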


Messages propagate to the parent logger as usual, so you can define your own formatting and configuration for the logger.

In [3]:
import logging as python_logging
python_logging.basicConfig(format=("%(asctime)s %(levelname)s (%(threadName)s) [%(name)s] %(message)s"))
python_logging.getLogger().setLevel(python_logging.INFO)

logging.debug('debug')
logging.info('info')
logging.notice('notice')
logging.warning('a warning message')
logging.error('this is an error')
logging.critical('critical condition')

2019-12-18 09:35:09,075 INFO (MainThread) [datafaucet] info
2019-12-18 09:35:09,075 NOTICE (MainThread) [datafaucet] notice
2019-12-18 09:35:09,076 ERROR (MainThread) [datafaucet] this is an error
2019-12-18 09:35:09,077 CRITICAL (MainThread) [datafaucet] critical condition
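Propagation is standard `logging` behavior: a logger with no handlers of its own passes records up to its ancestors, which apply their handlers and formatters. The following sketch shows this with two hypothetical logger names (`demo_parent` and `demo_parent.child`) and an in-memory stream, so it does not touch the root logger configured above.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s] %(message)s"))

# The parent owns the handler, the formatter, and the level threshold.
parent = logging.getLogger("demo_parent")
parent.setLevel(logging.INFO)
parent.addHandler(handler)
parent.propagate = False  # keep the demo output away from the root logger

# The child has no handler and no level: records propagate to the parent.
child = logging.getLogger("demo_parent.child")
child.debug("debug")  # dropped: below the inherited INFO threshold
child.info("info")    # formatted by the parent's handler

print(stream.getvalue(), end="")
```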


### Initializing the datafaucet logger
If a logging configuration is loaded or initialized, extra functionality becomes available.  
In particular, log records carry datafaucet-specific information such as the session id, and data can be passed as a dictionary, optionally with a custom message.  
The following extra fields are available for logging:
 - dfc_sid: datafaucet session id
 - dfc_username: username
 - dfc_filepath: file name being run
 - dfc_reponame: repository name if under git
 - dfc_repohash: repository short hash if under git
 - dfc_funcname: function name being run
 - dfc_data: any extra data passed via the 'extra=' parameter in the logging
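One common way to attach fields like these to every record is a `logging.Filter` installed on a handler. The sketch below is an illustration with the standard library only, not datafaucet's code: the class name `SessionContextFilter`, the logger name, and the fixed `sid` and `username` values are all hypothetical.

```python
import io
import logging


class SessionContextFilter(logging.Filter):
    """Attach session-wide fields to every record (hypothetical helper)."""

    def __init__(self, sid, username):
        super().__init__()
        self.sid = sid
        self.username = username

    def filter(self, record):
        record.dfc_sid = self.sid
        record.dfc_username = self.username
        # Records logged without extra data still need the attribute,
        # otherwise the format string below would fail.
        if not hasattr(record, "dfc_data"):
            record.dfc_data = None
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(SessionContextFilter(sid="0x1234", username="alice"))
handler.setFormatter(
    logging.Formatter("%(dfc_sid)s %(dfc_username)s | %(message)s | %(dfc_data)s")
)

logger = logging.getLogger("context_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.info("plain message")
# With the standard library, `extra` must be a dict whose keys become
# attributes of the record.
logger.info("with data", extra={"dfc_data": {"test_value": 42}})

print(stream.getvalue(), end="")
```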

In [4]:
logging.init('info', True, 'datafaucet.log')

In [5]:
logging.debug('debug')
logging.info('info')
logging.notice('notice')
logging.warning('a warning message')
logging.error('this is an error')
logging.critical('critical condition')

 [datafaucet] INFO logging.ipynb:notebook:cell | info
 [datafaucet] NOTICE logging.ipynb:notebook:cell | notice
 [datafaucet] ERROR logging.ipynb:notebook:cell | this is an error
 [datafaucet] CRITICAL logging.ipynb:notebook:cell | critical condition


In [6]:
# custom message
dfc.logging.notice('hello world')

 [datafaucet] NOTICE logging.ipynb:notebook:cell | hello world


In [7]:
# *args: multiple positional arguments are concatenated, similar to print
dfc.logging.warning('message', 'can have', 'multiple parts', 'and', 'types:', dfc.__name__, 'is a', type(dfc))
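The print-like concatenation can be pictured as follows. This is a minimal sketch with a hypothetical helper `join_message`; the single-space separator is an assumption borrowed from `print`, not something the text confirms for datafaucet.

```python
import math


def join_message(*args):
    """Convert each part to a string and join with spaces, like print does."""
    return " ".join(str(arg) for arg in args)


# Parts of different types are stringified before joining.
print(join_message("module", math.__name__, "is a", type(math)))
```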



In [8]:
# pass custom data as a dictionary
dfc.logging.warning('custom data + message', extra={'test_value':42})



In [9]:
# extra dictionary is not shown in stdout, but does show in file (jsonl format) and kafka log messages
!tail -n 1 datafaucet.log | jq .

{
  "@timestamp": "2019-12-18T09:35:24.243715",
  "sid": "0xa40823ba213611ea",
  "repohash": "78e2847",
  "reponame": "datalabframework.git",
  "username": "natbusa",
  "filepath": "logging.ipynb",
  "funcname": "notebook:cell",
  "message": "custom data + message",
  "data": {
    "test_value": 42
  }
}


In [10]:
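import my_module  # local helper module that logs from its own functions

def my_nested_function():
    logging.info('another message')
    # `extra` is not limited to dictionaries
    logging.info('custom', extra=[1, 2, 3])

def my_function():
    # a log entry can carry only data, with no message
    logging.info(extra={'a': 'text', 'b': 2})
    my_nested_function()

my_function()
my_module.foo()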

 [datafaucet] INFO logging.ipynb:notebook:my_function | 
 [datafaucet] INFO logging.ipynb:notebook:my_nested_function | another message
 [datafaucet] INFO logging.ipynb:notebook:my_nested_function | custom
 [datafaucet] INFO logging.ipynb:my_module:foo | foo
 [datafaucet] INFO logging.ipynb:my_module:bar | bar
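The calling function's name in each line above has a direct counterpart in the standard library, which exposes it as the `%(funcName)s` record attribute. The sketch below reproduces the nested-call pattern with plain `logging`; the logger name `funcname_demo` and the output format are assumptions, and how datafaucet builds its `file:module:function` prefix is not shown here.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
# %(funcName)s is filled in by the logging module from the caller's frame.
handler.setFormatter(logging.Formatter("%(levelname)s %(funcName)s | %(message)s"))

log = logging.getLogger("funcname_demo")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(handler)


def my_nested_function():
    log.info("another message")


def my_function():
    log.info("first message")
    my_nested_function()


my_function()
print(stream.getvalue(), end="")
```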


In [11]:
!tail -n 5 datafaucet.log | jq .

{
  "@timestamp": "2019-12-18T09:35:26.020223",
  "severity": "INFO",
  "sid": "0xa40823ba213611ea",
  "repohash": "78e2847",
  "reponame": "datalabframework.git",
  "username": "natbusa",
  "filepath": "logging.ipynb",
  "funcname": "notebook:my_function",
  "message": "",
  "data": {
    "a": "text",
    "b": 2
  }
}
{
  "@timestamp": "2019-12-18T09:35:26.032405",
  "severity"