# HyFI Test Notebook

This Jupyter Notebook is used to test the HyFI package. It contains examples of how to use the package and how to create a new HyFI model.


In [1]:
from corprep import HyFI




## Check Version

Now, let's get the version of the `hyfi` package.


In [2]:
version = HyFI.__version__
print("HyFI version:", version)

HyFI version: 1.17.2


In [3]:
HyFI.dotenv().dict()

{'DOTENV_FILENAME': '.env',
 'DOTENV_DIR': '/raid/cis/yjlee/workspace/projects/corporate-reputation',
 'DOTENV_FILE': '/raid/cis/yjlee/workspace/projects/corporate-reputation/.env',
 'HYFI_SECRETS_DIR': 'secrets',
 'HYFI_RESOURCE_DIR': None,
 'HYFI_GLOBAL_ROOT': '/home/yjlee',
 'HYFI_GLOBAL_WORKSPACE_NAME': '.hyfi',
 'HYFI_PROJECT_NAME': None,
 'HYFI_PROJECT_DESC': None,
 'HYFI_PROJECT_ROOT': None,
 'HYFI_PROJECT_WORKSPACE_NAME': 'workspace',
 'HYFI_LOG_LEVEL': 'INFO',
 'HYFI_VERBOSE': False,
 'HYFI_NUM_WORKERS': 1,
 'CACHED_PATH_CACHE_ROOT': None,
 'CUDA_DEVICE_ORDER': 'PCI_BUS_ID',
 'CUDA_VISIBLE_DEVICES': None,
 'WANDB_PROJECT': None,
 'WANDB_DISABLED': None,
 'WANDB_DIR': None,
 'WANDB_NOTEBOOK_NAME': None,
 'WANDB_SILENT': None,
 'LABEL_STUDIO_SERVER': None,
 'KMP_DUPLICATE_LIB_OK': 'True',
 'TOKENIZERS_PARALLELISM': False}
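
In the output above, `DOTENV_DIR` is the project root, which suggests the `.env` file was located by searching upward from the working directory. The following is a minimal sketch of that kind of upward search using only the standard library. `find_dotenv_upwards` is a hypothetical helper written for illustration; it is not part of HyFI, and HyFI's actual lookup may differ.

```python
from pathlib import Path
from typing import Optional


def find_dotenv_upwards(start: Path, filename: str = ".env") -> Optional[Path]:
    """Return the first `filename` found in `start` or any of its parents.

    Hypothetical helper for illustration only (not HyFI's implementation).
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


# Search from the current working directory; prints None if no .env is found.
print(find_dotenv_upwards(Path.cwd()))
```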

In [4]:
# Test expanding the $WORKSPACE_ROOT and $USER variables
posix_expr = "The system workspace root is $WORKSPACE_ROOT and the user is $USER."

expanded_expr = HyFI.expand_posix_vars(posix_expr)
print(expanded_expr)

The system workspace root is /raid/cis/yjlee/workspace and the user is yjlee.
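
For comparison, POSIX-style `$NAME` and `${NAME}` references can be expanded with the standard library alone. This is a rough sketch of the general technique, not HyFI's implementation. `expand_vars` and the sample values are hypothetical, and the sketch assumes unknown variables should be left untouched.

```python
import os
from string import Template
from typing import Mapping, Optional


def expand_vars(text: str, extra: Optional[Mapping[str, str]] = None) -> str:
    """Expand $NAME and ${NAME} using os.environ plus optional overrides.

    Hypothetical helper for illustration; unknown variables are left as-is.
    """
    # Overrides take precedence over values from the process environment.
    mapping = {**os.environ, **(extra or {})}
    return Template(text).safe_substitute(mapping)


# Sample values are made up for the demonstration.
sample = {"WORKSPACE_ROOT": "/tmp/ws", "USER": "alice"}
print(expand_vars("The workspace root is $WORKSPACE_ROOT and the user is ${USER}.", sample))
```

`Template.safe_substitute` is used instead of `substitute` so that a reference to an undefined variable does not raise an exception.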



## Initialize Project

We'll initialize the project using the `HyFI.init_project` function. The function takes the following parameters:

- `project_name`: Name of the project to use.
- `project_description`: Description of the project that will be used.
- `project_root`: Root directory of the project.
- `project_workspace_name`: Name of the project's workspace directory.
- `global_hyfi_root`: Root directory for the global HyFI workspace.
- `global_workspace_name`: Name of the global hierarchical workspace directory.
- `num_workers`: Number of workers to use.
- `log_level`: Logging level, for example `INFO` or `DEBUG`.
- `autotime`: Whether to track run times automatically.
- `retina`: Whether to enable retina display output.
- `verbose`: Enables or disables logging.

We'll check if we're running in Google Colab, and if so, we'll mount Google Drive.


In [5]:
if HyFI.is_colab():
    HyFI.mount_google_drive()

ws = HyFI.init_project(
    project_name="hyfi",
    log_level="DEBUG",
    verbose=True,
)

print("project_dir:", ws.root_dir)
print("project_workspace_dir:", ws.workspace_dir)

INFO:hyfi.utils.notebooks:Google Colab not detected.
INFO:hyfi.utils.notebooks:Extension autotime not found. Install it first.
DEBUG:matplotlib.pyplot:Loaded backend module://matplotlib_inline.backend_inline version unknown.
INFO:hyfi.utils.envs:[/raid/cis/yjlee/workspace/projects/corporate-reputation/tests/notebook/.env] not found, finding .env in parent dirs
INFO:hyfi.utils.envs:Loaded .env from [/raid/cis/yjlee/workspace/projects/corporate-reputation/.env]
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
DEBUG:hydra.core.utils:Setting JobRuntime:name=app
INFO:hyfi.joblib.joblib:initialized batcher with <hyfi.joblib.batch.batcher.Batcher object at 0x7fa8240b2910>
INFO:hyfi.main.con

project_dir: /home/yjlee/.hyfi/projects/hyfi
project_workspace_dir: /home/yjlee/.hyfi/projects/hyfi/workspace
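
The Colab check and Drive mount in the cell above can also be written without HyFI. The sketch below assumes the common heuristic that the `google.colab` module is already loaded in a Colab kernel. `running_in_colab` and `mount_drive_if_colab` are hypothetical names, and HyFI's own `is_colab` and `mount_google_drive` may work differently.

```python
import sys


def running_in_colab() -> bool:
    """Heuristic: a Colab kernel has `google.colab` loaded at startup (assumption)."""
    return "google.colab" in sys.modules


def mount_drive_if_colab(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when in Colab; return True only if a mount was attempted."""
    if not running_in_colab():
        return False
    # Imported lazily because the module exists only inside Colab.
    from google.colab import drive

    drive.mount(mount_point)
    return True


print("mounted:", mount_drive_if_colab())
```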


In [None]:
ws.dotenv.dict()

## Compose Configuration

We can use the `HyFI.compose` function to load a configuration. In this example, we'll compose the `path` configuration by specifying `path=__batch__`.


In [None]:
cfg = HyFI.compose("path=__batch__")

## Display Configuration

Now, let's print the loaded configuration using the `HyFI.print` function.


In [None]:
HyFI.print(cfg)

'/raid/cis/yjlee/workspace/projects/corporate-reputation'

In [9]:
data_file = f"{HyFI.dotenv().DOTENV_DIR}/workspace/datasets/processed/kakao_nouns_similar_6.parquet"
data = HyFI.load_dataframe(data_file)

In [20]:
qry = "cleaned_text.str.split().str.len() <= 10"
d_ = data.query(qry, engine="python")
print(d_.shape)
d_[["cleaned_text", "duplicate"]]

(477, 17)


|     | cleaned_text | duplicate |
| --- | --- | --- |
| 32 | 주요 카카오 서비스카카오 플랫폼 | False |
| 18 | 다음 카카오 흡수 합병 | False |
| 41 | 다음커뮤니케이션은 카카오와 합병을 결정했다고 일 공시했다 공식 ... | False |
| 249 | 공정위 카카오 불공정거래 조사 착수기업결합모바일 상품권 판매업체 계약 해지 등 | True |
| 88 | 유상증자파캔분할합병대유신소재다음모토닉추가 상장유원컴텍엘컴텍아이컴포넌트 네오티스 ... | False |
| ... | ... | ... |
| 607 | 기관외국인 순매수 주요 종목하이닉스삼성전자삼성전기삼성바이오로직스현대차우셀트리온현대... | False |
| 289 | 신규상장미래에셋비전기업인수목적호추가상장와이투솔루션 초록뱀이앤엠 테크엘 카카오 쏘카 | False |
| 130 | 카카오 분기 영업이익 억원전년 보다 감소 | True |
| 330 | 보통주추가상장대양금속더메디팜 영풍제지 카카오 주권변경상장삼성물산신주배정기준일대성창... | False |
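
The word-count filter used in the query can be checked on a small frame. The sketch below uses made-up rows (not the Kakao dataset) and a threshold of 3 words instead of 10. It shows that the `query` string with `engine="python"` selects the same rows as an explicit boolean mask.

```python
import pandas as pd

# Made-up sample rows for illustration; not taken from the real dataset.
df = pd.DataFrame(
    {
        "cleaned_text": [
            "one two three",
            "a b c d e f g h i j k l",
            "short headline",
        ],
        "duplicate": [False, True, False],
    }
)

# Keep rows whose whitespace-separated word count is at most 3.
by_query = df.query("cleaned_text.str.split().str.len() <= 3", engine="python")

# Equivalent filter written as a boolean mask.
by_mask = df[df["cleaned_text"].str.split().str.len() <= 3]

print(by_query.shape)
print(by_query.equals(by_mask))
```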


That's it! This example demonstrated the basic usage of the `hyfi` package. You can now use this package to manage your own projects and tasks in a structured manner.
