# filoma — Quick interactive examples

This notebook demonstrates key filoma capabilities and includes lightweight checks to see if it works in your environment.

It covers: imports and version checks, probing a file and a directory, working with the `filoma.DataFrame` wrapper, using `probe_to_df`, a small image probe example, and saving a CSV export.

Note: cells wrap operations in `try/except` so the notebook still runs if optional dependencies (e.g. `polars`, `numpy`, or image backends) are missing.

In [1]:
# Basic environment and import checks
from pathlib import Path

import filoma
from filoma import DataFrame


def check_imports():
    results = {}
    try:
        import filoma

        results["filoma"] = getattr(filoma, "__version__", "unknown")
    except Exception as e:
        results["filoma"] = f"IMPORT ERROR: {e}"

    for pkg in ("polars", "numpy", "PIL"):
        try:
            __import__(pkg if pkg != "PIL" else "PIL.Image")
            results[pkg] = "available"
        except Exception as e:
            results[pkg] = f"missing ({e})"

    # show where we are running the notebook from
    results["cwd"] = str(Path(".").resolve())
    return results


check_imports()

{'filoma': '1.7.6',
 'polars': 'available',
 'numpy': 'available',
 'PIL': 'available',
 'cwd': '/home/kalfasy/repos/filoma/notebooks'}
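
The check above imports each package to find out whether it is present. A lighter alternative, sketched here with only the standard library, asks the import system whether a top-level module can be located without importing it. The helper name `find_optional` is hypothetical and is not part of filoma.

```python
from importlib.util import find_spec


def find_optional(packages=("polars", "numpy", "PIL")):
    """Report which top-level packages can be located, without importing them."""
    return {
        name: ("available" if find_spec(name) is not None else "missing")
        for name in packages
    }


print(find_optional())
```

Locating a module does not prove that importing it will succeed (a broken install can still fail at import time), so this complements the `try/except` import check rather than replacing it.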

## 1) Quick probe: a single file and a directory

Try probing a README or small file, then probe a lightweight sample directory from the repo's `tests/` tree.

In [2]:
file_candidate = "../README.md"
dir_candidate = "../tests/"

print("probing file ->", file_candidate)
if Path(file_candidate).exists():
    try:
        file_report = filoma.probe(file_candidate)
        print("file probe result type:", type(file_report))
        try:
            # many filoma dataclasses implement a nice repr or to-dict
            print(file_report)
        except Exception:
            pass
    except Exception as e:
        print("file probe failed:", e)
else:
    print("No small file found to probe in the repository root.")

print("probing directory ->", dir_candidate)
if Path(dir_candidate).is_dir():
    try:
        dir_report = filoma.probe(dir_candidate, max_depth=2, threads=2)
        print("directory probe returned an object of type:", type(dir_report))
        # If it exposes a to_df() method we can inspect a little
        if hasattr(dir_report, "to_df"):
            try:
                dfw = dir_report.to_df()
                print("to_df() -> wrapper type:", type(dfw))
            except Exception as e:
                print("to_df() raised:", e)
    except Exception as e:
        print("directory probe failed:", e)
else:
    print("No small directory found to probe in tests/; adjust the path and re-run.")

2025-09-20 21:43:20.513 | DEBUG    | filoma.directories.directory_profiler:__init__:352 - Interactive environment detected, disabling progress bars to avoid conflicts
2025-09-20 21:43:20.513 | INFO     | filoma.directories.directory_profiler:probe:439 - Starting directory analysis of '../tests/' using 🦀 Rust (Parallel) implementation
2025-09-20 21:43:20.515 | SUCCESS  | filoma.directories.directory_profiler:probe:455 - Directory analysis completed in 0.00s - Found 179 items (154 files, 25 folders) using 🦀 Rust (Parallel)

probing file -> ../README.md
file probe result type: <class 'filoma.files.file_profiler.Filo'>
Filo(path=PosixPath('/home/kalfasy/repos/filoma/README.md'), size=6413, mode='0o100664', mode_str='-rw-rw-r--', owner='kalfasy', group='kalfasy', created=datetime.datetime(2025, 9, 20, 0, 24, 52), modified=datetime.datetime(2025, 9, 20, 0, 24, 52), accessed=datetime.datetime(2025, 9, 20, 0, 24, 52), is_symlink=False, is_file=True, is_dir=False, target_is_file=None, target_is_dir=None, rights={'read': True, 'write': True, 'execute': False}, inode=7601600, nlink=1, sha256=None, xattrs={})
probing directory -> ../tests/
directory probe returned an object of type: <class 'filoma.directories.directory_profiler.DirectoryAnalysis'>
to_df() -> wrapper type: <class 'NoneType'>


## 2) Working with `filoma.DataFrame` wrapper

Construct a `filoma.DataFrame` from a list of paths and run the convenience enrichers: `.add_path_components()`, `.add_file_stats_cols()`, and `.add_depth_col()`.

In [3]:
sample_paths = [p for p in (Path("../README.md"), Path("../pyproject.toml"), Path("../Cargo.toml")) if p.exists()]
if not sample_paths:
    # fallback to a couple of files from tests if present
    sample_paths = [p for p in (Path("../tests/test_basic_dataframe.py"), Path("../tests/test_rust_comprehensive.py")) if p.exists()]

print("sample paths used:", sample_paths)
dfw = DataFrame(sample_paths)
print("Initial wrapper and head:")
print(dfw.head(10))

print("With path components:")
try:
    df_components = dfw.add_path_components()
    print(df_components.head(10))
except Exception as e:
    print("add_path_components failed:", e)

print("With file stats:")
try:
    df_stats = dfw.add_file_stats_cols()
    print(df_stats.head(10))
except Exception as e:
    print("add_file_stats_cols failed:", e)

print("Add depth column relative to the current directory:")
try:
    df_depth = dfw.add_depth_col(Path("."))
    print(df_depth.head(10))
except Exception as e:
    print("add_depth_col failed:", e)

sample paths used: [PosixPath('../README.md'), PosixPath('../pyproject.toml'), PosixPath('../Cargo.toml')]
Initial wrapper and head:
filoma.DataFrame with 3 rows
shape: (3, 1)
┌───────────────────┐
│ path              │
│ ---               │
│ str               │
╞═══════════════════╡
│ ../README.md      │
│ ../pyproject.toml │
│ ../Cargo.toml     │
└───────────────────┘
With path components:
filoma.DataFrame with 3 rows
shape: (3, 5)
┌───────────────────┬────────┬────────────────┬───────────┬────────┐
│ path              ┆ parent ┆ name           ┆ stem      ┆ suffix │
│ ---               ┆ ---    ┆ ---            ┆ ---       ┆ ---    │
│ str               ┆ str    ┆ str            ┆ str       ┆ str    │
╞═══════════════════╪════════╪════════════════╪═══════════╪════════╡
│ ../README.md      ┆ ..     ┆ README.md      ┆ README    ┆ .md    │
│ ../pyproject.toml ┆ ..     ┆ pyproject.toml ┆ pyproject ┆ .toml  │
│ ../Cargo.toml     ┆ ..     ┆ Cargo.toml     ┆ Cargo     ┆ .toml  │
└────────
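
The `parent`, `name`, `stem`, and `suffix` columns shown above match what `pathlib` reports for the same paths. The sketch below reproduces those four values with the standard library only. The helper `path_components` is a hypothetical illustration, not filoma's implementation.

```python
from pathlib import PurePosixPath


def path_components(path):
    """Split a path string into the components shown in the table above."""
    p = PurePosixPath(path)
    return {
        "path": str(p),
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
    }


for raw in ("../README.md", "../pyproject.toml", "../Cargo.toml"):
    print(path_components(raw))
```

`PurePosixPath` is used so the result does not depend on the operating system running the example.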

## 3) Build a DataFrame from a directory using `probe_to_df`

This uses filoma's `probe_to_df` convenience function, which returns a `filoma.DataFrame` wrapper (Polars is used internally if available). We probe the repo's `tests/` directory with `max_depth=2` to keep the runtime small.

In [4]:
from filoma import probe_to_df

dir_path = "../tests"
if dir_path is None:
    print("No test directory available for probe_to_df; skip this cell.")
else:
    try:
        dfw = probe_to_df(dir_path, to_pandas=False, enrich=True, max_depth=2, threads=2)
        print("probe_to_df returned a filoma.DataFrame with shape:", dfw.shape)
        # Show a small sample and a group_by_extension summary when available
        try:
            print("Sample rows:")
            print(dfw.head(5))
        except Exception:
            pass
        try:
            print("Extension counts:")
            print(dfw.group_by_extension().head(10))
        except Exception as e:
            print("group_by_extension failed:", e)
    except Exception as e:
        print("probe_to_df failed:", e)

2025-09-20 21:43:20.530 | DEBUG    | filoma.directories.directory_profiler:__init__:352 - Interactive environment detected, disabling progress bars to avoid conflicts
2025-09-20 21:43:20.531 | INFO     | filoma.directories.directory_profiler:probe:439 - Starting directory analysis of '../tests' using 🦀 Rust (Parallel) implementation
2025-09-20 21:43:20.534 | SUCCESS  | filoma.directories.directory_profiler:probe:455 - Directory analysis completed in 0.00s - Found 179 items (154 files, 25 folders) using 🦀 Rust (Parallel)

probe_to_df returned a filoma.DataFrame with shape: (146, 18)
Sample rows:
filoma.DataFrame with 5 rows
shape: (5, 18)
┌───────────────────┬───────┬──────────┬──────────────────┬───┬──────────┬───────┬────────┬────────┐
│ path              ┆ depth ┆ parent   ┆ name             ┆ … ┆ inode    ┆ nlink ┆ sha256 ┆ xattrs │
│ ---               ┆ ---   ┆ ---      ┆ ---              ┆   ┆ ---      ┆ ---   ┆ ---    ┆ ---    │
│ str               ┆ i64   ┆ str      ┆ str              ┆   ┆ i64      ┆ i64   ┆ str    ┆ str    │
╞═══════════════════╪═══════╪══════════╪══════════════════╪═══╪══════════╪═══════╪════════╪════════╡
│ ../tests/test_asy ┆ 1     ┆ ../tests ┆ test_async_rust_ ┆ … ┆ 7601345  ┆ 1     ┆ null   ┆ {}     │
│ nc_rust_extra…    ┆       ┆          ┆ extra.py         ┆   ┆          ┆       ┆        ┆        │
│ ../tests/test_bas ┆ 1     ┆ ../tests ┆ test_basic_dataf ┆ … ┆ 7602664  ┆ 1     ┆ null   ┆ {}     │
│ ic_dataframe.…    ┆       ┆          ┆ rame.py          ┆   ┆          
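
The cell above calls `group_by_extension()` for a per-extension summary. As a rough, dependency-free illustration of what such a summary contains, the sketch below counts suffixes with `collections.Counter`. This is not filoma's implementation, the helper name `count_extensions` is hypothetical, and filoma's actual output columns may differ.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(paths):
    """Count paths per suffix; entries without a suffix are keyed by ''."""
    return Counter(PurePosixPath(p).suffix for p in paths)


sample = [
    "../tests/test_async_rust_extra.py",
    "../tests/test_basic_dataframe.py",
    "../tests/scripts",
    "../tests/test_ml_core.py",
]
for suffix, count in count_extensions(sample).most_common():
    print(repr(suffix), count)
```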

## 4) Image probing (in-memory)

Create a small numpy array and pass it to `filoma.probe_image` to exercise the image path that accepts arrays. This avoids needing image files or heavy dependencies.

In [5]:
try:
    import numpy as np

    arr = np.random.randn(16, 16)
    img_report = filoma.probe_image(arr)
    print("probe_image on numpy array returned type:", type(img_report))
    try:
        print(img_report)
    except Exception:
        pass
except Exception as e:
    print("Skipping image probe; numpy unavailable or probe failed:", e)

probe_image on numpy array returned type: <class 'filoma.images.image_profiler.ImageReport'>
ImageReport(path=None, file_type=None, shape=(16, 16), dtype='float64', min=-2.8289461122786097, max=2.6011147464964393, mean=0.034071740265040014, nans=0, infs=0, unique=256, status=None)
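
The `ImageReport` above lists `min`, `max`, `mean`, `nans`, `infs`, and `unique`. To make those fields concrete, here is a pure-Python sketch that computes similarly named statistics over a flat list of floats. The helper `summarize` is hypothetical, and restricting `min`, `max`, `mean`, and `unique` to finite values is an assumption of this sketch, not a statement about how filoma computes them.

```python
import math


def summarize(values):
    """Compute basic statistics over a flat sequence of floats."""
    # Assumption: min/max/mean/unique consider finite values only.
    finite = [v for v in values if math.isfinite(v)]
    return {
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
        "mean": sum(finite) / len(finite) if finite else None,
        "nans": sum(1 for v in values if math.isnan(v)),
        "infs": sum(1 for v in values if math.isinf(v)),
        "unique": len(set(finite)),
    }


print(summarize([1.0, 2.0, 2.0, float("nan"), float("inf")]))
```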


## 5) Save a small CSV export (if `polars` is available)

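This cell attempts to save whichever DataFrame `dfw` currently refers to (the `probe_to_df` result if section 3 ran, otherwise the small example from section 2) to `/tmp/filoma_example.csv`. It then prints the first few lines of the file as a quick verification.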

In [6]:
out_path = Path("/tmp/filoma_example.csv")
saved = False
try:
    if "dfw" in globals():
        try:
            dfw.df.write_csv(str(out_path))
            saved = True
        except Exception as e:
            # report the reason instead of silently swallowing it
            print("write_csv failed:", e)
    if saved:
        print("Saved CSV to", out_path)
        try:
            print("CSV sample:", out_path.read_text().splitlines()[:10])
        except Exception:
            pass
    else:
        print("Could not save CSV; polars or file-writer not available.")
except Exception as e:
    print("Saving CSV failed:", e)

Saved CSV to /tmp/filoma_example.csv
CSV sample: ['path,depth,parent,name,stem,suffix,size_bytes,modified_time,created_time,is_file,is_dir,owner,group,mode_str,inode,nlink,sha256,xattrs', '../tests/test_async_rust_extra.py,1,../tests,test_async_rust_extra.py,test_async_rust_extra,.py,1938,2025-09-10 23:02:11,2025-09-10 23:02:11,true,false,kalfasy,kalfasy,-rw-rw-r--,7601345,1,,{}', '../tests/test_basic_dataframe.py,1,../tests,test_basic_dataframe.py,test_basic_dataframe,.py,1413,2025-09-10 23:02:12,2025-09-10 23:02:12,true,false,kalfasy,kalfasy,-rw-rw-r--,7602664,1,,{}', '../tests/scripts,1,../tests,scripts,scripts,"",4096,2025-09-04 20:14:19,2025-09-04 20:14:19,false,true,kalfasy,kalfasy,drwxrwxr-x,13593175,3,,{}', '../tests/test_ml_core.py,1,../tests,test_ml_core.py,test_ml_core,.py,5741,2025-09-20 21:41:56,2025-09-20 21:41:56,true,false,kalfasy,kalfasy,-rw-rw-r--,7602966,1,,{}', '../tests/test_rust_absolute_paths.py,1,../tests,test_rust_absolute_paths.py,test_rust_absolute_paths,.py,
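
The verification above prints raw lines, which are hard to read for an 18-column file. A small sketch using the standard library `csv` module parses the header and returns the first few rows as dictionaries instead. The helper `read_rows` is hypothetical, and the demo writes its own two-row sample (a subset of the columns shown above) to a temporary directory so it runs without filoma.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(csv_path, limit=5):
    """Return up to `limit` rows of a CSV file as dictionaries keyed by header."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        return [row for _, row in zip(range(limit), reader)]


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "filoma_example.csv"
    demo.write_text(
        "path,depth,name,suffix,size_bytes\n"
        "../tests/test_async_rust_extra.py,1,test_async_rust_extra.py,.py,1938\n"
        "../tests/test_basic_dataframe.py,1,test_basic_dataframe.py,.py,1413\n",
        encoding="utf-8",
    )
    rows = read_rows(demo)

for row in rows:
    print(row["name"], row["size_bytes"])
```

Values come back as strings, so numeric columns such as `size_bytes` need an explicit `int(...)` conversion before arithmetic.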

---

### Notes and next steps

- If a cell reported a missing dependency, install `polars`, `numpy`, and optionally `pillow`, then re-run it.
- To scan deeper trees, increase `max_depth` in the `probe()` calls; raising `threads` may speed up large scans.
- Use `probe_to_df(..., to_pandas=True)` to get a `pandas.DataFrame` if you prefer pandas.