
Recipes

Thor Whalen edited this page Oct 21, 2023 · 6 revisions

Problem: I want to copy data from a source store to another

Solution

Simple answer: Make the appropriate stores for target and src, then:

target.update(src)  # update target with src

If update hasn't been transformed, this is equivalent to:

for k, v in src.items():
    target[k] = v

But the key lies in what's "appropriate". By that I mean the stores that will make target.update(src) do what you want it to do.

Considerations:

  • src values and keys should be aligned for the task. This means src should be able to read from where you want the data copied from and give you values (obj_of_data) that target knows how to convert for writing in its own system or format (data_of_obj).
  • src and target must use a common key, though they can have different internal (id) representations.
  • Consider whether target should be emptied first (so that it only contains src data) or be "enhanced" with src data.
  • Consider whether existing data should be overwritten with src data. If not, either take care of this case in a for loop, or wrap target's __setitem__ with "skip silently if key already exists" logic.
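The last two considerations can be spelled out in a small helper. This is only a minimal sketch: copy_store and its overwrite and clear_first parameters are hypothetical names (not part of dol), and plain dicts stand in for the stores.

```python
from collections.abc import Mapping, MutableMapping


def copy_store(src: Mapping, target: MutableMapping, *, overwrite=True, clear_first=False):
    """Copy src into target, making the two policy choices explicit.

    copy_store and its parameters are hypothetical names for this sketch.
    """
    if clear_first:
        # Empty target so it ends up containing only src's data.
        for k in list(target):
            del target[k]
    for k, v in src.items():
        if not overwrite and k in target:
            continue  # silently skip keys that already exist in target
        target[k] = v
    return target


src = {"a": 1, "b": 2}
target = {"b": 20, "c": 30}
copy_store(src, target, overwrite=False)
print(target)  # {'b': 20, 'c': 30, 'a': 1}
```

With the defaults (overwrite=True, clear_first=False) this does the same as target.update(src).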

Problem: Make directories automatically and copy files recursively

Solution: Use Files, mk_dirs_if_missing, and .update

from dol import Files, mk_dirs_if_missing

@mk_dirs_if_missing
class F(Files):
    """Files reader/writer that creates directories as needed (on write)"""

def copy_files_recursively(src_dir, targ_dir):
    F(targ_dir).update(Files(src_dir))
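To see what "creating directories as needed (on write)" amounts to, here is a stdlib-only sketch of the same pattern. DirMakingFiles is a hypothetical stand-in written for this page, not dol's implementation; it only assumes that keys are relative paths and values are bytes.

```python
import os
import tempfile
from collections.abc import MutableMapping


class DirMakingFiles(MutableMapping):
    """Maps relative paths to file bytes; creates parent dirs on write.

    A hypothetical stdlib-only stand-in, not dol's implementation.
    """

    def __init__(self, rootdir):
        self.rootdir = rootdir

    def _path(self, k):
        return os.path.join(self.rootdir, k)

    def __getitem__(self, k):
        try:
            with open(self._path(k), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(k)

    def __setitem__(self, k, v):
        path = self._path(k)
        dirname = os.path.dirname(path)
        if dirname:
            os.makedirs(dirname, exist_ok=True)  # the "make dirs if missing" step
        with open(path, "wb") as f:
            f.write(v)

    def __delitem__(self, k):
        try:
            os.remove(self._path(k))
        except FileNotFoundError:
            raise KeyError(k)

    def __iter__(self):
        # Walk the whole tree and yield paths relative to rootdir.
        for dirpath, _, filenames in os.walk(self.rootdir):
            for name in filenames:
                yield os.path.relpath(os.path.join(dirpath, name), self.rootdir)

    def __len__(self):
        return sum(1 for _ in self)


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as targ_dir:
    src = DirMakingFiles(src_dir)
    src[os.path.join("a", "b", "c.txt")] = b"hello"
    DirMakingFiles(targ_dir).update(src)  # same pattern as the recipe above
    copied = DirMakingFiles(targ_dir)[os.path.join("a", "b", "c.txt")]
```

The update call works because MutableMapping provides it on top of __setitem__, so the directory creation happens once per written key.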
    


Problem: Deserialize or format data according to extension

Solution: Use wrap_kvs with the postget argument.

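from io import BytesIO

import pandas as pd
from dol import wrap_kvs

def deserialize_based_on_key(k, v):
    """Deserialize bytes v into a DataFrame, based on the extension of key k."""
    v = BytesIO(v)  # the pandas readers accept file-like objects
    if k.endswith('.csv'):
        return pd.read_csv(v)
    elif k.endswith('.xlsx'):
        return pd.read_excel(v)
    else:
        raise TypeError(f"Cannot handle extension of key: {k}")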
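If the list of extensions grows, one possible variation is to replace the if/elif chain with a lookup table. The sketch below keeps the same (k, v) signature that postget expects, but uses the stdlib csv and json modules instead of pandas so it runs without extra dependencies. DESERIALIZERS and deserialize_by_extension are hypothetical names for illustration.

```python
import csv
import io
import json
import os

# Hypothetical registry: extension -> function turning bytes into an object.
DESERIALIZERS = {
    ".json": lambda b: json.loads(b.decode("utf-8")),
    ".csv": lambda b: list(csv.reader(io.StringIO(b.decode("utf-8")))),
}


def deserialize_by_extension(k, v):
    """Same (key, value) -> object signature as a postget function."""
    _, ext = os.path.splitext(k)
    try:
        deserialize = DESERIALIZERS[ext.lower()]
    except KeyError:
        raise TypeError(f"No deserializer registered for extension {ext!r} (key: {k})") from None
    return deserialize(v)


rows = deserialize_by_extension("table.csv", b"a,b\n1,2\n")
print(rows)  # [['a', 'b'], ['1', '2']]
```

The rest of this recipe continues with deserialize_based_on_key as defined above.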

You can now apply deserialize_based_on_key to any store that gives you the bytes of csv (or xlsx) content:

from py2store import LocalBinaryStore, FilesOfZip

@wrap_kvs(postget=deserialize_based_on_key)
class CsvStoreForCsvFiles(LocalBinaryStore):
    """Docs"""

@wrap_kvs(postget=deserialize_based_on_key)
class CsvStoreForZipFiles(FilesOfZip):
    """Docs"""

How can I make my own custom wrapper based on this?

If you're only using one of dol's wrappers (you'll find a bunch in dol.trans) and just want to fix some of its parameters, use functools.partial:

from functools import partial

csv_wrap = partial(wrap_kvs, postget=deserialize_based_on_key)
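To show why partial is enough here, the sketch below applies the same trick to a toy wrapper. with_postget is a hypothetical stand-in written for this example (it is not dol's wrap_kvs); the only thing it shares with wrap_kvs is that it takes the store class first and postget as a keyword.

```python
from functools import partial


def with_postget(store_cls, *, postget):
    """Hypothetical toy wrapper: subclass store_cls so reads go through postget(k, v)."""

    class Wrapped(store_cls):
        def __getitem__(self, k):
            return postget(k, super().__getitem__(k))

    Wrapped.__name__ = store_cls.__name__
    return Wrapped


# Fix the postget parameter once, and reuse the result as a class decorator.
upper_wrap = partial(with_postget, postget=lambda k, v: v.upper())


@upper_wrap
class UpperDict(dict):
    """A dict whose values are upper-cased on read."""


d = UpperDict(greeting="hello")
print(d["greeting"])  # HELLO
```

The decorator call is just upper_wrap(UpperDict), which partial expands to with_postget(UpperDict, postget=...).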

If your wrapper is more complex, you may want to use a function to spell out the complexity.

For example, the previous csv_wrap takes care of giving you csv bytes as pandas data frames, but if you point your store at something that contains anything other than csv files, it'll choke on those. So let's take care of that.

How to filter out files that are not csvs

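from dol import filt_iter, wrap_kvs

def csv_wrap(store):
    # only keep the keys that end with '.csv'
    store = filt_iter(store, filt=lambda k: k.endswith('.csv'))
    # deserialize the values when they are read
    store = wrap_kvs(store, postget=deserialize_based_on_key)
    return store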
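To make the filtering step concrete, here is a stdlib-only sketch of what a key filter does to a store: filtered-out keys disappear from iteration and behave as if absent on lookup. FilteredView is a hypothetical stand-in for the idea, not dol's filt_iter, and a dict stands in for the store.

```python
from collections.abc import Mapping


class FilteredView(Mapping):
    """Read-only view of a store showing only keys accepted by filt.

    A hypothetical stand-in for the idea behind key filtering, not dol's code.
    """

    def __init__(self, store, filt):
        self.store, self.filt = store, filt

    def __getitem__(self, k):
        if not self.filt(k):
            raise KeyError(k)  # filtered-out keys behave as if absent
        return self.store[k]

    def __iter__(self):
        return (k for k in self.store if self.filt(k))

    def __len__(self):
        return sum(1 for _ in self)


files = {"a.csv": b"x,y\n1,2\n", "notes.txt": b"not a table"}
csv_only = FilteredView(files, filt=lambda k: k.endswith(".csv"))
print(list(csv_only))  # ['a.csv']
```

The csv_wrap function above combines this kind of filtering with the postget deserialization.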

You can now apply csv_wrap to any store that gives you the bytes of csv files, say a local files store, or the files of a zip:

from py2store import LocalBinaryStore, FilesOfZip

@csv_wrap
class CsvStoreForCsvFiles(LocalBinaryStore):
    """Docs"""

@csv_wrap
class CsvStoreForZipFiles(FilesOfZip):
    """Docs"""
    

Problem: I want to convert a filesystem to a Store, while having folder views

Solution

Here is a simple one-liner that maps each folder under rootdir to a store of its files:

import os
from dol import Files

def to_dict_of_stores(rootdir):
    return {root: Files(root) for root, _, _ in os.walk(rootdir)}
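To see which keys such a dict ends up with, the following stdlib-only sketch walks a temporary tree the same way. to_dict_of_listings is a hypothetical stand-in that maps each folder to the file names directly inside it, rather than to a Files store (what a Files store lists for a folder depends on dol's own settings).

```python
import os
import tempfile


def to_dict_of_listings(rootdir):
    """Hypothetical stand-in: map each folder path to the file names directly in it."""
    return {root: sorted(files) for root, _, files in os.walk(rootdir)}


with tempfile.TemporaryDirectory() as rootdir:
    os.makedirs(os.path.join(rootdir, "sub"))
    for relpath in ("top.txt", os.path.join("sub", "inner.txt")):
        with open(os.path.join(rootdir, relpath), "w") as f:
            f.write("data")
    views = to_dict_of_listings(rootdir)
    # Show the folder keys relative to rootdir, to keep the output readable.
    folder_names = sorted(os.path.relpath(k, rootdir) for k in views)

print(folder_names)  # ['.', 'sub']
```

Since os.walk yields rootdir itself first, the root folder is one of the keys, along with every nested folder.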