A walkthrough of the basic features of git-annex.

[[!toc]]

creating a repository

Creating a repository is straightforward: initialize git as usual, then give git-annex a description of the repository.

# mkdir ~/annex
# cd ~/annex
# git init
# git annex init "my laptop"

adding a remote

Like any other git repository, git-annex repositories have remotes. Let's start by adding a USB drive as a remote.

# sudo mount /media/usb
# cd /media/usb
# git clone ~/annex
# cd annex
# git annex init "portable USB drive"
# git remote add laptop ~/annex
# cd ~/annex
# git remote add usbdrive /media/usb/annex

This is all standard ad-hoc distributed git repository setup. The only git-annex-specific part is telling it the description of the new repository created on the USB drive.

Notice that both repos are set up as remotes of one another. This lets either get annexed files from the other. You'll want to do that even if you are using git in a more centralized fashion.
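To check that each repository really lists the other as a remote, `git remote` can be run on both sides. The sketch below uses plain git on throwaway directories (the `$work/laptop` and `$work/usb` paths are hypothetical stand-ins for `~/annex` and the clone on the USB drive), so it can be tried without touching a real annex.

```shell
# Scratch stand-ins for the laptop repository and the USB drive clone.
work=$(mktemp -d)
git init -q "$work/laptop"
git init -q "$work/usb"

# Point each repository at the other, as in the walkthrough.
git -C "$work/usb" remote add laptop "$work/laptop"
git -C "$work/laptop" remote add usbdrive "$work/usb"

# List the remotes each side knows about.
laptop_remotes=$(git -C "$work/laptop" remote)
usb_remotes=$(git -C "$work/usb" remote)
echo "laptop repository sees: $laptop_remotes"
echo "usb repository sees: $usb_remotes"
```

In the walkthrough the USB repository was created with git clone, so it would also list an origin remote.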

adding files

# cd ~/annex
# cp /tmp/big_file .
# cp /tmp/debian.iso .
# git annex add .
add big_file ok
add debian.iso ok
# git commit -a -m added

When you add a file to the annex and commit it, only a symlink to the annexed content is committed. The content itself is stored in git-annex's backend.
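Because an annexed file is just a symlink in the working tree, it can be inspected with ordinary tools. The helper below is a hypothetical sketch, not part of git-annex: it assumes the link target lies somewhere under `.git/annex/`, and the key name `EXAMPLE-KEY` is made up for the demonstration.

```shell
# Succeed if the path is a symlink whose target lies under .git/annex/.
# Assumption: annexed content is stored somewhere below .git/annex/.
is_annexed() {
    [ -L "$1" ] || return 1
    case $(readlink "$1") in
        *.git/annex/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demonstration in a scratch directory with a made-up key name.
demo=$(mktemp -d)
ln -s .git/annex/EXAMPLE-KEY "$demo/big_file"
echo "ordinary content" > "$demo/notes.txt"

for f in "$demo/big_file" "$demo/notes.txt"; do
    if is_annexed "$f"; then
        echo "$f: annexed symlink"
    else
        echo "$f: not annexed"
    fi
done
```

The check only looks at the link target, so it also succeeds when the content itself is not present in this repository.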

renaming files

# cd ~/annex
# git mv big_file my_cool_big_file
# mkdir iso
# git mv debian.iso iso/
# git commit -m moved

You can use any normal git operations to move files around, or even make copies or delete them.

Notice that, since annexed files are represented by symlinks, the symlink will break when the file is moved into a subdirectory. But, git-annex will fix this up for you when you commit -- it has a pre-commit hook that watches for and corrects broken symlinks.
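The link breaks because its target is relative: a link that works at the top of the repository needs one extra `../` for every directory it is moved down. The function below is an illustrative sketch of that arithmetic only; `annex_link_prefix` is a hypothetical name, and git-annex's own hook may compute the corrected target differently.

```shell
# Print the "../" prefix a relative symlink needs to reach the repository
# root, given the file's path relative to that root.
annex_link_prefix() {
    dir=$(dirname -- "$1")
    prefix=""
    # Walk up one directory component at a time until the root is reached.
    while [ "$dir" != "." ] && [ "$dir" != "/" ]; do
        case $(basename -- "$dir") in
            .) ;;                          # ignore a leading "./"
            *) prefix="../$prefix" ;;
        esac
        dir=$(dirname -- "$dir")
    done
    printf '%s\n' "$prefix"
}

# A top-level file needs no prefix; a file one directory down needs "../".
printf 'my_cool_big_file -> "%s"\n' "$(annex_link_prefix my_cool_big_file)"
printf 'iso/debian.iso   -> "%s"\n' "$(annex_link_prefix iso/debian.iso)"
```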

getting file content

A repository does not always have all annexed file contents available. When you need the content of a file, you can use "git annex get" to make it available.

We can use this to copy everything in the laptop's annex to the USB drive.

# cd /media/usb/annex
# git pull laptop master
# git annex get .
get my_cool_big_file (copying from laptop...) ok
get iso/debian.iso (copying from laptop...) ok

Notice that you had to git pull from laptop first. This lets git-annex know what has changed in laptop, so it knows about the files present there and can get them.

transferring files: When things go wrong

After a while, you'll have several annexes, with different file contents. You don't have to try to keep all that straight; git-annex does [[location_tracking]] for you. If you ask it to get a file and the drive or file server is not accessible, it will let you know what it needs to get it:

# git annex get video/hackity_hack_and_kaxxt.mov
get video/hackity_hack_and_kaxxt.mov (not available)
  I was unable to access these remotes: usbdrive, server
  Try making some of these repositories available:
    5863d8c0-d9a9-11df-adb2-af51e6559a49  -- my home file server
    58d84e8a-d9ae-11df-a1aa-ab9aa8c00826  -- portable USB drive
    ca20064c-dbb5-11df-b2fe-002170d25c55  -- backup SATA drive
failed
# sudo mount /media/usb
# git annex get video/hackity_hack_and_kaxxt.mov
get video/hackity_hack_and_kaxxt.mov (copying from usbdrive...) ok
# git commit -a -m "got a video I want to rewatch on the plane"

removing files

You can always drop files safely. Git-annex checks that some other annex has the file before removing it.

# git annex drop iso/debian.iso
drop iso/debian.iso ok
# git commit -a -m "freed up space"

removing files: When things go wrong

Before dropping a file, git-annex wants to be able to look at other remotes and verify that they still have the file. After all, it could have been dropped from them too. If the remotes are not mounted or otherwise available, you'll see something like this.

# git annex drop important_file other.iso
drop important_file (unsafe)
  Could only verify the existence of 0 out of 1 necessary copies
  I was unable to access these remotes: usbdrive
  Try making some of these repositories available:
    58d84e8a-d9ae-11df-a1aa-ab9aa8c00826  -- portable USB drive
    ca20064c-dbb5-11df-b2fe-002170d25c55  -- backup SATA drive
  (Use --force to override this check, or adjust annex.numcopies.)
failed
drop other.iso (unsafe)
  Could only verify the existence of 0 out of 1 necessary copies
      No other repository is known to contain the file.
  (Use --force to override this check, or adjust annex.numcopies.)
failed

Here you might --force it to drop important_file if you trust your backup. But other.iso looks to have never been copied to anywhere else, so if it's something you want to hold onto, you'd need to transfer it to some other repository before dropping it.
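The failure messages above name the annex.numcopies setting. A minimal sketch of adjusting it, assuming it is read from ordinary git configuration: the value 2 is an arbitrary example, and the commands run against a throwaway repository so they can be tried without touching a real annex.

```shell
# Scratch repository standing in for ~/annex (hypothetical path).
repo=$(mktemp -d)
git init -q "$repo"

# Ask for two verified copies elsewhere before a drop is allowed.
# Assumption: 2 is only an example value; pick what suits your backups.
git -C "$repo" config annex.numcopies 2

# Read the setting back to confirm what is stored in .git/config.
numcopies=$(git -C "$repo" config --get annex.numcopies)
echo "annex.numcopies is now $numcopies"
```

In a real annex the same `git config` command would be run from inside the repository. Raising the value makes drops stricter; it does not by itself create any additional copies.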

using ssh remotes

So far in this walkthrough, git-annex has been used with a remote repository on a USB drive. But it can also be used with a git remote that is truly remote: a host accessed by ssh.

Say you have a desktop on the same network as your laptop and want to clone the laptop's annex to it:

# git clone ssh://mylaptop/home/me/annex ~/annex
# cd ~/annex
# git annex init "my desktop"

Now you can get files and they will be transferred by scp:

# git annex get my_cool_big_file
get my_cool_big_file (getting UUID for origin...) (copying from origin...)
WORM:1285650548:2159:my_cool_big_file       100% 2159     2.1KB/s   00:00
ok

When you drop files, git-annex will ssh over to the remote and make sure the file's content is still there before removing it locally:

# git annex drop my_cool_big_file
drop my_cool_big_file (checking origin..) ok

Note that git-annex normally prefers to use non-ssh remotes, like a USB drive, before ssh remotes. They are assumed to be faster and cheaper to access, if available. There is an annex-cost setting you can configure in .git/config to adjust which repositories it prefers. See [[the_man_page|git-annex]] for details.

Also, note that you need full shell access for this to work -- git-annex needs to be able to ssh in and run commands.
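The annex-cost setting mentioned above is configured per remote. The sketch below assumes the key is spelled `remote.<name>.annex-cost` and that remotes with lower costs are preferred; confirm both against the man page. The cost values and the `example.com` URL are made up, and the commands run against a throwaway repository.

```shell
# Scratch repository with two remotes (the names follow the walkthrough,
# the paths and URL are hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" remote add usbdrive /media/usb/annex
git -C "$repo" remote add server ssh://example.com/home/me/annex

# Assumption: a lower cost makes a remote more attractive to git-annex.
git -C "$repo" config remote.usbdrive.annex-cost 50
git -C "$repo" config remote.server.annex-cost 300

# Read the values back from .git/config.
usb_cost=$(git -C "$repo" config --get remote.usbdrive.annex-cost)
server_cost=$(git -C "$repo" config --get remote.server.annex-cost)
echo "usbdrive cost: $usb_cost, server cost: $server_cost"
```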

moving file content between repositories

Often you will want to move some file contents from a repository to some other one. For example, your laptop's disk is getting full; time to move some files to an external disk before moving another file from a file server to your laptop. Doing that by hand (by using git annex get and git annex drop) is possible, but a bit of a pain. git annex move makes it very easy.

# git annex move my_cool_big_file --to usbdrive
move my_cool_big_file (moving to usbdrive...) ok
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
move video/hackity_hack_and_kaxxt.mov (moving from fileserver...)
WORM:1274316523:86050597:hackity_hack_and_kax 100%   82MB 199.1KB/s   07:02
ok

using the URL backend

git-annex has multiple key-value [[backends]]. So far this walkthrough has demonstrated the default, WORM (Write Once, Read Many) backend.

Another handy backend is the URL backend, which can fetch a file's content from a remote URL. Here's how to set up a file in your repository that uses this backend:

# git annex fromkey --backend=URL --key=http://www.archive.org/somefile somefile
fromkey somefile ok
# git commit -m "added a file from the Internet Archive"

Now if you ask git-annex to get that file, it will download it and cache it locally.

# git annex get somefile
get somefile (downloading)
#########################################################################100.0%
ok

You can always drop files downloaded by the URL backend. It is assumed that the URL is stable; no local backup is kept.

# git annex drop somefile
drop somefile (ok)