# Datafaucet

Datafaucet is a productivity framework for ETL and ML applications. It simplifies common data pipeline activities such as project scaffolding, data ingestion, star schema generation, and forecasting.

In [1]:
import datafaucet as dfc
from datafaucet import logging

## Logging

A central goal is to keep configuration and code in separate files. A project is defined by its working directories: the places where your notebooks, Python files, and configuration files live and run. When a datafaucet project is loaded, it first searches for a `__main__.py` file, following Python module file naming conventions. The directory containing that file becomes the root path of the project. All module and alias paths are relative to the project root path.
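
The root lookup described above can be sketched in a few lines. This is a minimal illustration, not datafaucet's implementation: the helper name `find_project_root` is hypothetical, and the assumption that the search walks upward from a starting directory is mine, since the text only says that a `__main__.py` file is searched for.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "__main__.py") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: the upward search order is an assumption, not
    datafaucet's documented behavior.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    return None


# Returns None when no marker file exists at or above the current directory.
root = find_project_root(Path.cwd())
```

With a root in hand, module and alias paths can be resolved as `root / relative_path`, which keeps notebooks in subdirectories independent of the directory they are launched from.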

### Logs

Logging via datafaucet supports 5 levels:
  - info
  - notice
  - warning
  - error
  - fatal

#### No project metadata loaded.
Logging works without loading any project metadata configuration, but in that case it uses the default configuration of the Python root logger. By default, the `debug`, `info` and `notice` levels are filtered out. To enable the full functionality, including logging to Kafka and logging custom information about the project (session id, username, etc.), you must load a project first.

In [2]:
logging.debug('debug')
logging.info('info')
logging.notice('notice')
logging.warning('a warning message')
logging.error('this is an error')
logging.critical('critical condition')

this is an error
critical condition
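
Python's standard `logging` module has no `notice` level, and its root logger defaults to a `WARNING` threshold, which is why the lower levels are dropped when nothing is configured. The sketch below reproduces that filtering with the standard library only. The numeric value 25 for `NOTICE` and the names `ListHandler` and `emit_all_levels` are assumptions made for illustration; the document does not state how datafaucet defines the level internally.

```python
import logging

# Assumption: NOTICE sits between INFO (20) and WARNING (30); the numeric
# value datafaucet actually uses is not stated in this document.
NOTICE = 25
logging.addLevelName(NOTICE, "NOTICE")


class ListHandler(logging.Handler):
    """Collect messages in memory so the effect of the threshold is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def emit_all_levels(threshold: int) -> list:
    """Log one message per level and return the ones that pass `threshold`."""
    logger = logging.getLogger(f"sketch.threshold.{threshold}")
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.setLevel(threshold)
    handler = ListHandler()
    logger.handlers = [handler]
    logger.debug("debug")
    logger.info("info")
    logger.log(NOTICE, "notice")
    logger.warning("a warning message")
    logger.error("this is an error")
    logger.critical("critical condition")
    return handler.messages


default_like = emit_all_levels(logging.WARNING)  # same threshold as the stdlib root default
verbose = emit_all_levels(logging.INFO)          # lets info and notice through
```

Lowering the threshold to `logging.INFO` lets the `info` and `notice` messages through while `debug` stays filtered.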


#### Loading a metadata profile
If a logging configuration is loaded, extra functionality becomes available. In particular, log records include datafaucet-specific information, such as the session id, and data can be passed as a dictionary, optionally with a custom message.

In [3]:
logging.init('info', True, 'datafaucet.log')

In [4]:
logging.debug('debug')
logging.info('info')
logging.notice('notice')
logging.warning('a warning message')
logging.error('this is an error')
logging.critical('critical condition')

INFO:datafaucet:logging.ipynb:notebook:cell | info
NOTICE:datafaucet:logging.ipynb:notebook:cell | notice
ERROR:datafaucet:logging.ipynb:notebook:cell | this is an error
CRITICAL:datafaucet:logging.ipynb:notebook:cell | critical condition


In [5]:
# custom message
dfc.logging.notice('hello world')

NOTICE:datafaucet:logging.ipynb:notebook:cell | hello world


In [6]:
# *args similar to print
dfc.logging.warning('message', 'can have', 'multiple parts', 'and', 'types:', dfc.__name__, 'is a', type(dfc))



In [7]:
# add custom data as a dictionary
dfc.logging.warning('custom data + message', extra={'test_value':42})
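
To illustrate how a message and a data payload can end up in one JSON Lines record, here is a minimal sketch built on the standard library. It is not datafaucet's code: the `JsonlFormatter` class and the convention of passing the payload under a `data` key of `extra` are assumptions made for this example, and an in-memory stream stands in for the log file.

```python
import io
import json
import logging
from datetime import datetime


class JsonlFormatter(logging.Formatter):
    """Render each record as one JSON object per line (JSON Lines)."""

    def format(self, record):
        entry = {
            "@timestamp": datetime.fromtimestamp(record.created).isoformat(),
            "severity": record.levelname,
            "funcname": record.funcName,
            "message": record.getMessage(),
        }
        # The stdlib copies the keys of `extra` onto the record; using a
        # `data` key is this sketch's convention, not a datafaucet API.
        data = getattr(record, "data", None)
        if data is not None:
            entry["data"] = data
        return json.dumps(entry)


stream = io.StringIO()  # stands in for the log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonlFormatter())

logger = logging.getLogger("sketch.jsonl")
logger.propagate = False  # keep the demo isolated from the root logger
logger.setLevel(logging.INFO)
logger.handlers = [handler]

logger.warning("custom data + message", extra={"data": {"test_value": 42}})

# Parse the last line back to confirm it is valid JSON carrying the payload.
last_record = json.loads(stream.getvalue().splitlines()[-1])
```

Keeping one JSON object per line means the file can be appended to safely and parsed line by line by tools such as `jq`.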



In [8]:
# extra dictionary is not shown in stdout, but does show in file (jsonl format) and kafka log messages
!tail -n 1 datafaucet.log | jq .

{
  "@timestamp": "2019-12-10T13:59:55.954334",
  "sid": "0x2fb5f39a1b1211ea",
  "repohash": "68d044a",
  "reponame": "datalabframework.git",
  "username": "natbusa",
  "filepath": "logging.ipynb",
  "funcname": "notebook:cell",
  "message": "custom data + message",
  "data": {
    "test_value": 42
  }
}
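
The same inspection can be done from Python when `jq` is not installed. The helper below is a small sketch; `tail_jsonl` is a hypothetical name, and it reads the whole file into memory, which is fine for modest log sizes but not for very large files.

```python
import json
from pathlib import Path


def tail_jsonl(path, n=1):
    """Return the last `n` records of a JSON Lines file as Python objects."""
    if n <= 0:
        return []
    text = Path(path).read_text(encoding="utf-8")
    # Skip blank lines so a trailing newline does not produce a parse error.
    lines = [line for line in text.splitlines() if line.strip()]
    return [json.loads(line) for line in lines[-n:]]
```

For example, `tail_jsonl('datafaucet.log', n=1)` would return a one-element list holding the last record as a dictionary, from which the `data` field can be read directly.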


In [13]:
# from a function
import datafaucet as dfc
from datafaucet import logging
logging.init('info', True, 'datafaucet.log')

import my_module

def my_nested_function():
    # message only, then message plus a list as custom data
    logging.info('another message')
    logging.info('custom', extra=[1, 2, 3])

def my_function():
    # custom data only, no message
    logging.info(extra={'a': 'text', 'b': 2})
    my_nested_function()

my_function()
my_module.foo()

INFO:datafaucet:logging.ipynb:notebook:my_function | 
INFO:datafaucet:logging.ipynb:notebook:my_nested_function | another message
INFO:datafaucet:logging.ipynb:notebook:my_nested_function | custom
INFO:datafaucet:logging.ipynb:my_module:foo | foo
INFO:datafaucet:logging.ipynb:my_module:bar | bar
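
The output above labels each message with the module and function it was logged from. One way such a label could be derived is by inspecting the call stack, as in the sketch below. The helper `caller_label` is hypothetical and relies on `sys._getframe`, which is specific to CPython; how datafaucet resolves these names, including the `notebook` prefix, is not described in this document and is not reproduced here.

```python
import sys


def caller_label(depth: int = 1) -> str:
    """Return '<module>:<function>' for the frame `depth` levels above this call."""
    frame = sys._getframe(depth)
    module = frame.f_globals.get("__name__", "?")
    return f"{module}:{frame.f_code.co_name}"


def nested_step():
    # depth=1 resolves to the function that called caller_label, i.e. nested_step
    return caller_label()


def top_level_step():
    return nested_step()


label = top_level_step()  # ends with ':nested_step'
```

A logging wrapper built this way would need a larger `depth` to skip its own frames, so that the label points at the user's function and not at the wrapper.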


In [15]:
!tail -n 3 datafaucet.log | jq .

{
  "@timestamp": "2019-12-10T14:05:53.070669",
  "severity": "INFO",
  "sid": "0x1f2b0e601b1311ea",
  "repohash": "68d044a",
  "reponame": "datalabframework.git",
  "username": "natbusa",
  "filepath": "logging.ipynb",
  "funcname": "notebook:my_nested_function",
  "message": "custom",
  "data": [
    1,
    2,
    3
  ]
}
{
  "@timestamp": "2019-12-10T14:05:53.083559",
  "severity": "INFO"