# iRODS sync tutorial

`sync` synchronizes the data between a local copy (local file system) and the copy stored in iRODS. It compares path, size and optionally checksum of local and remote files to determine whether they have changed and should be synchronized. It creates files or overwrites older copies, but does not delete files from the target location when they have been deleted from the source.
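
The comparison described above can be illustrated with a small local-only sketch. It is not the iBridges implementation: the helper names `file_checksum` and `needs_sync` are hypothetical, and SHA-256 is assumed purely for illustration. It shows why a size-only comparison can miss a change that a checksum comparison catches.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks (assumed algorithm)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_sync(source: Path, target: Path, ignore_checksum: bool = False) -> bool:
    """Hypothetical helper: decide whether `source` should overwrite `target`."""
    if not target.exists():
        return True  # nothing at the target yet
    if source.stat().st_size != target.stat().st_size:
        return True  # different sizes always mean different content
    if ignore_checksum:
        return False  # equal size is treated as "unchanged"
    return file_checksum(source) != file_checksum(target)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src.txt")
    dst = Path(tmp, "dst.txt")
    src.write_text("abcd")
    dst.write_text("abce")  # same size, different content
    size_only = needs_sync(src, dst, ignore_checksum=True)
    with_checksum = needs_sync(src, dst, ignore_checksum=False)

print(size_only, with_checksum)
```

In this sketch the size-only check reports that nothing needs to be synchronized, while the checksum check detects the changed content.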

The command runs in one of two modes: it synchronizes data either from the client's local file system to iRODS, or from iRODS to the local file system.

In [None]:
import os
from pathlib import Path

from ibridges import Session
from ibridges.path import IrodsPath
# The cells below call the function as `sync`, so import it under that name.
from ibridges.sync import sync_data as sync

### Create a session

Set up a session to an iRODS server. In this example we assume you have a valid locally cached iRODS password from a previous session.

In [None]:
session = Session(irods_env_path=Path.home().joinpath(".irods", "irods_environment.json"))

### Uploading/downloading
Upload or download mode is determined by the type of `source` and `target` (`IrodsPath` or `str`/`Path`).

When uploading, `source` must be an existing local folder and `target` an existing iRODS collection; when downloading, the roles are reversed. An exception is raised if either does not exist.

In [None]:
# Replace the <...> placeholders with your own collection and folder paths.
target = IrodsPath(session, "~", <irods coll path>)
source = Path(os.path.expanduser("~"), <your path>)
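
The type-based choice between upload and download can be sketched without a server. In the block below, `RemotePath` is a hypothetical stand-in for `IrodsPath`, and `transfer_mode` is an illustrative helper, not part of iBridges. Raising an error when both paths are of the same kind is an assumption of this sketch.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RemotePath:
    """Hypothetical stand-in for IrodsPath so the sketch runs without a server."""
    path: str


def transfer_mode(source, target) -> str:
    """Return "upload" or "download" depending on which side is remote."""
    source_remote = isinstance(source, RemotePath)
    target_remote = isinstance(target, RemotePath)
    if source_remote == target_remote:
        # Assumption of this sketch: local-to-local and remote-to-remote are rejected.
        raise TypeError("Exactly one of source and target must be a remote path.")
    return "download" if source_remote else "upload"


upload = transfer_mode(Path("data"), RemotePath("~/data"))
download = transfer_mode(RemotePath("~/data"), "data")
print(upload, download)
```

Dispatching on the path type keeps a single entry point for both directions, so the caller only has to decide which side lives in iRODS.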

### Setting sync options
`sync` takes various options:

- The `ignore_checksum` option controls how files are compared. By default, `sync` compares both the checksum values and the file sizes of the source and target files to determine whether synchronization is needed. If `ignore_checksum` is set to True, only the file size is compared. This is potentially faster but less accurate: files of equal size with different content are not detected as changed.
- The `max_level` option controls the depth up to which the file tree will be synchronized. With `max_level` set to None (default), there is no limit (full recursive synchronization). A max level of 1 synchronizes only the source's root, max level 2 also includes the first set of subfolders/subcollections and their contents, etc.
- The `copy_empty_folders` (default False) option controls whether folders/collections that contain no files or subfolders/subcollections will be synchronized.
- The `dry_run` option lists all the source files and folders that need to be synchronized without actually performing the synchronization.
- The `verify_checksum` (default True) option will calculate and verify the checksum on the data after up- or downloading. A checksum mismatch will generate an error, but will not abort the synchronization process.
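
To make the `max_level` and `copy_empty_folders` semantics concrete, here is a minimal sketch that applies the same rules to a local folder tree. The function `select_for_sync` is hypothetical and only mimics the behaviour described in the list above; how iBridges treats folders exactly at the cut-off depth is not covered here.

```python
import os
import tempfile
from pathlib import Path


def select_for_sync(root, max_level=None, copy_empty_folders=False):
    """Hypothetical helper: list files (and optionally empty folders) up to max_level."""
    root = Path(root)
    files, empty_folders = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        level = len(current.relative_to(root).parts) + 1  # the root itself is level 1
        if copy_empty_folders and current != root and not dirnames and not filenames:
            empty_folders.append(current.relative_to(root).as_posix())
        for name in filenames:
            files.append((current / name).relative_to(root).as_posix())
        if max_level is not None and level >= max_level:
            dirnames[:] = []  # prune: do not descend any deeper
    return sorted(files), sorted(empty_folders)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deep").mkdir(parents=True)
    (root / "empty").mkdir()
    (root / "a.txt").write_text("a")
    (root / "sub" / "b.txt").write_text("b")
    (root / "sub" / "deep" / "c.txt").write_text("c")

    level_1, _ = select_for_sync(root, max_level=1)
    level_2, empty_2 = select_for_sync(root, max_level=2, copy_empty_folders=True)
    unlimited, _ = select_for_sync(root)

print(level_1)
print(level_2, empty_2)
print(unlimited)
```

With `max_level=1` only `a.txt` is selected, `max_level=2` adds `sub/b.txt` and the empty folder `empty`, and `max_level=None` walks the whole tree.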

In [None]:
ignore_checksum = True
max_level = None
copy_empty_folders = True
# copy_empty_folders = False
dry_run = True

### Dry run
Setting `dry_run` to True lists what would be synchronized without transferring any data. Note that `verify_checksum` has no effect during a dry run.

In [None]:
ignore_checksum = False  # compare checksums as well as file sizes
sync(
    session=session,
    source=source,
    target=target,
    max_level=max_level,
    dry_run=dry_run,
    ignore_checksum=ignore_checksum,
    copy_empty_folders=copy_empty_folders,
)
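
The dry-run idea can be tried out locally with the following sketch, which mirrors one local folder into another. The `mirror` function is a hypothetical illustration rather than the iBridges code: it reports what it would copy, only copies when `dry_run` is False, compares by size only, and never deletes anything from the target.

```python
import shutil
import tempfile
from pathlib import Path


def mirror(source: Path, target: Path, dry_run: bool = False) -> list:
    """Hypothetical helper: copy new or size-changed files; never delete."""
    planned = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = target / rel
        if dst_file.exists() and dst_file.stat().st_size == src_file.stat().st_size:
            continue  # considered unchanged in this size-only sketch
        planned.append(rel.as_posix())
        if not dry_run:
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)
    return planned


with tempfile.TemporaryDirectory() as tmp:
    source, target = Path(tmp, "source"), Path(tmp, "target")
    (source / "sub").mkdir(parents=True)
    target.mkdir()
    (source / "new.txt").write_text("new")
    (source / "sub" / "data.txt").write_text("data")
    (target / "only_in_target.txt").write_text("kept")

    plan = mirror(source, target, dry_run=True)    # report only
    exists_after_dry_run = (target / "new.txt").exists()
    done = mirror(source, target, dry_run=False)   # perform the copies
    exists_after_sync = (target / "new.txt").exists()
    extra_kept = (target / "only_in_target.txt").exists()
    second_plan = mirror(source, target, dry_run=True)

print(plan, exists_after_dry_run, exists_after_sync, extra_kept, second_plan)
```

Running the dry run first and comparing its plan with the result of the real run is a cheap way to confirm that a synchronization will touch only the files you expect.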

To perform the actual synchronization, set `dry_run` to False, and run again.

In [None]:
dry_run = False
verify_checksum = True

In [None]:
sync(
    session=session,
    source=source,
    target=target,
    max_level=max_level,
    dry_run=dry_run,
    verify_checksum=verify_checksum,
    copy_empty_folders=copy_empty_folders,
)