# Syncing files to Dropbox

In this project, you will write a script that syncs your JupyterHub folder to your Dropbox account, checking each file or directory to see if it needs updating to Dropbox.

This notebook will not work on Binder or on your laptop, as it needs to have access to your JupyterHub files. I suggest downloading `get_dropbox_api.ipynb` and `sync_to_dropbox.ipynb` to your JupyterHub folder (preferably in a subdirectory e.g. `dropbox-sync` for better organisation) and editing them from there.

This notebook will also not work on SSOE laptops, as some required libraries (e.g. `requests`) are not available on them.

Go through [Accessing the Dropbox API](get_dropbox_api.ipynb) first to get a Dropbox API token, which you will need to access the Dropbox API. Paste that token as a string in the code cell below, so that you can use it again easily via the `token` variable.

In [None]:
# This code sets up the Dropbox API class so it can be used later.
from dropbox_api import DBApi, APIError, RateLimitError

token = 'paste your Dropbox API token here'

## 1 Try out the Dropbox API

Instantiate the Dropbox API:

    db = DBApi(token)

In [None]:
db = DBApi(token)

The `DBApi` class makes a few methods available to us:

- `db.list_folder()`
- `db.create_folder()`
- `db.upload_file()`
- `db.delete()`

And an attribute:

- `db.root`

You can use `help()` to explore the usage of each method.

Before you start writing code that uses these methods, it is good practice to call each of them (e.g. `db.list_folder()`) and examine its response. How is the response structured? Get familiar with the output of the above methods before you proceed.

## 2 Get file/directory listing from Dropbox

Get a list of files and directories from Dropbox using the Dropbox API.

In [None]:
# Get a list of entries, each entry being a file or directory in Dropbox.
# Store it in the variable `db_entries`
# By this point you shouldn't have anything in your Dropbox app folder,
# so don't be surprised to see an empty list!

db_entries = 



## 3 Get file/directory listing from local directory using `os`

Use `os.listdir()` to generate a list of files and directories in your JupyterHub home directory.

The full path of your home directory is `'/home/jupyter-<your_username>'`. So if your username is `anewuser`, your home directory will be at `'/home/jupyter-anewuser'`.

Note that `os.listdir()` will not list the contents of subdirectories for you; you will have to list them recursively.
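One possible way to list subdirectories recursively is sketched below. The function name `list_entries` and the `(relative_path, is_dir)` tuple format are illustrative assumptions, not something this project requires; you may prefer a different structure.

```python
import os

def list_entries(top):
    """Return a sorted list of (relative_path, is_dir) pairs found under `top`."""
    entries = []
    for name in os.listdir(top):
        full_path = os.path.join(top, name)
        # Treat symlinks as plain entries so that a link loop cannot recurse forever.
        is_dir = os.path.isdir(full_path) and not os.path.islink(full_path)
        entries.append((name, is_dir))
        if is_dir:
            # Recurse, then prefix each child path with this directory's name.
            for child_path, child_is_dir in list_entries(full_path):
                entries.append((os.path.join(name, child_path), child_is_dir))
    return sorted(entries)
```

Calling `list_entries('/home/jupyter-anewuser')` would then return paths relative to the home directory, which tend to be easier to match against Dropbox paths than absolute ones.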

In [None]:
# Write code below to get a list of entries, each entry being a file or
# directory in your current directory.
# Store it in the variable `local_entries`

import os

local_entries = 



**(optional)** There are many files and folders in the listing that can be ignored. Edit your code above to ignore the following files and directories.

- `.ipynb_checkpoints`
- `__pycache__`
- `.ipython`
- `.bash_history`
- `.bashrc`
- `.local`
- `.npm`
- `.conda`
- `.cache`
- `.jupyter`
- `.config`
- `.profile`
- `.bash_logout`
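A minimal sketch of one way to filter these out, assuming your entries are stored as relative paths. The names `IGNORED` and `is_ignored` are hypothetical helpers introduced for illustration.

```python
import os

# Names from the list above; an entry is skipped if any part of its path matches.
IGNORED = {
    '.ipynb_checkpoints', '__pycache__', '.ipython', '.bash_history',
    '.bashrc', '.local', '.npm', '.conda', '.cache', '.jupyter',
    '.config', '.profile', '.bash_logout',
}

def is_ignored(rel_path):
    """Return True if any component of the relative path is in IGNORED."""
    return any(part in IGNORED for part in rel_path.split(os.sep))
```

Checking every path component, rather than only the last one, also skips files that sit inside an ignored directory, such as `project/__pycache__/module.pyc`.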

## 4 Compare Dropbox and local entries

Now we try to sync the Dropbox entries in `db_entries` and the local entries in `local_entries`.

Look at the format of `db_entries` and `local_entries` and think about how you would compare the two listings. Is the current format easy to compare?

You might want to update your code to make them easier to compare.

### (A) Syncing algorithm

The general algorithm for mirroring your current directory on Dropbox (one-way sync) is as follows:

1. If there are any entries in `db_entries` that are not in `local_entries`, **delete** them.
2. If there are any entries in `local_entries` that are not in `db_entries`:
    - If the entry is a file, **upload** it.
    - If the entry is a directory, **create** it.
3. If there are any entries that are both in `db_entries` and `local_entries`:
    - If the entries are files, compare them to see if they have changed.
    - If the entries are directories, ignore them.
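
The three cases above map naturally onto set operations. The sketch below assumes both listings have already been converted to dictionaries mapping a path to `'file'` or `'folder'`; that format and the name `plan_sync` are assumptions made for illustration, and the function only decides what to do without calling the Dropbox API.

```python
def plan_sync(local, remote):
    """Split paths into the cases of the one-way sync.

    `local` and `remote` are dicts mapping a path to 'file' or 'folder'.
    Returns four sorted lists: paths to delete, folders to create,
    files to upload, and files present on both sides to compare.
    """
    local_paths, remote_paths = set(local), set(remote)
    # Step 1: on Dropbox but not local.
    to_delete = sorted(remote_paths - local_paths)
    # Step 2: local but not on Dropbox, split by kind.
    only_local = sorted(local_paths - remote_paths)
    to_create = [p for p in only_local if local[p] == 'folder']
    to_upload = [p for p in only_local if local[p] == 'file']
    # Step 3: on both sides; only files need comparing.
    to_compare = sorted(p for p in local_paths & remote_paths if local[p] == 'file')
    return to_delete, to_create, to_upload, to_compare
```

Sorting the paths means a parent folder such as `a` comes before its contents such as `a/x.txt`, which is convenient when folders have to exist before files are uploaded into them.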
    
On what basis shall we compare the files in step 3? We will look into that later. For now, carry out steps 1 and 2. Start by trying each operation by hand:

1. Create a folder with `db.create_folder()`.
2. Upload a file into this folder with `db.upload_file()`.
3. Delete the file and folder with `db.delete()`.

After each step, verify that it was correctly executed by checking your Dropbox account. The folders and files should be created in `Apps/<your app name>`.

Implement the one-way sync algorithm to sync your JupyterHub home directory to your Dropbox folder.

Before you embark on this task, drop me a message and I will back up your home directory for you. Just in case.

In [None]:
# Write your one-way sync program here



### (B) Checking files for changes with `rev`

Notice that in the response from `db.upload_file()` (or `db.list_folder()`), files carry some additional information. There is a `content_hash` attribute and a `rev` attribute.

The `content_hash` is used to verify that a file has been uploaded correctly. This hash should match the hash of the file generated locally. The algorithm for doing so can be found [in Dropbox's documentation](https://www.dropbox.com/developers/reference/content-hash). **We will not do this hashing for this project.**

The `rev` (short for **revision**) is regenerated each time the file is changed on Dropbox. By storing a copy of the `rev` and comparing it to the `rev` from Dropbox, we can easily tell whether the file has changed on Dropbox.

To carry out step 3, therefore, we need to store the `rev` locally and compare it to the one returned from Dropbox.

The one-way sync algorithm with rev is as follows:

1. If there are any entries in `db_entries` that are not in `local_entries`, delete them.
2. If there are any entries in `local_entries` that are not in `db_entries`:
    - If the entry is a file, upload it **and store the `rev`**.
    - If the entry is a directory, create it.
3. If there are any entries that are both in `db_entries` and `local_entries`:
    - If the entries are files, compare **their `rev`s** to see if they have changed.
    - If the entries are directories, ignore them.

Implement the syncing algorithm. You may have to edit your code from section 3 to store the `rev` in `local_entries`.

It is recommended that you initialise any `rev`s for local file entries to `None` rather than `0` or `False`, to avoid invisible errors.
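To see why `None` is the safer placeholder: in Python, `0 == False` is `True`, and a check like `if not rev` treats `0`, `False`, and `''` all as "missing". Testing with `is None` makes the "no stored rev yet" case explicit. A small sketch, with `needs_upload` as a hypothetical helper name:

```python
def needs_upload(local_rev, remote_rev):
    """Decide whether a file present on both sides should be uploaded again."""
    if local_rev is None:
        # No rev stored locally: the file is new or has been modified.
        return True
    # Otherwise upload only when the stored rev differs from Dropbox's rev.
    return local_rev != remote_rev
```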

In [None]:
# Delete entries from db_entries that are not in local_entries
# Upload files from local_entries that are not in db_entries
# For files in both local_entries and db_entries, compare their revs,
# upload the file if the rev is different



## 5 Store the results to a file

The next time we sync the folder contents, we will need to know the previous `rev` of each file. That means we need to store them somewhere. The most straightforward way is probably a CSV file.

Write a function, `export_csv()`, that will write `local_entries` and their `rev`s (where necessary) to a CSV file.

You should decide an appropriate format to store the entries and their `rev`s before writing your code.
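As one possible format, each entry could be a row with a column per attribute. The sketch below uses the standard `csv` module; the column names in `FIELDS`, the list-of-dicts entry format, and the function name `write_entries` are all assumptions for illustration, and more columns can be added as your entries grow.

```python
import csv

# Hypothetical column names; pick whatever suits your own entry format.
FIELDS = ['path', 'kind', 'rev']

def write_entries(filename, entries):
    """Write a list of entry dicts to `filename`, one row per entry."""
    # newline='' lets the csv module control line endings itself.
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for entry in entries:
            writer.writerow(entry)
```

Note that the `csv` module writes `None` as an empty field, so a `rev` of `None` will come back as an empty string when the file is read and must be converted back.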

In [None]:
# Export local_entries to a CSV file

def export_csv(filename, entries):
    '''Store (current) local_entries with rev and mtime to CSV.'''
    # Write your code here

The next time you sync the files, you wouldn’t want to generate the `local_entries` from scratch: there wouldn't be any `rev`s to check!

You would have to load it from the CSV file created previously.

You would also need to check it against the **most current** file/directory listing to see if anything has changed. But how, since `os.listdir()` doesn't generate `rev`s for us to check?

Whenever a file is modified, the operating system updates its **modified date/time**. You can get the **modified date/time** for a file using `os.path.getmtime()`.
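`os.path.getmtime()` returns the time as a float: the number of seconds since the epoch. The sketch below returns that raw value alongside a human-readable form; the helper name `modified_time` is hypothetical.

```python
import os
from datetime import datetime

def modified_time(path):
    """Return (mtime, readable) for `path`."""
    mtime = os.path.getmtime(path)  # seconds since the epoch, as a float
    readable = datetime.fromtimestamp(mtime).isoformat(sep=' ', timespec='seconds')
    return mtime, readable
```

The raw float is the value to store and compare; the readable string is only for inspection.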

Hmm, deeper and deeper ... now we have to store an attribute for the **modified date/time**, `mtime`, so that we can check it against the latest listing.

Write a function, `import_csv()`, that will read `local_entries`, their `rev`s, and their `mtime`s from a CSV file. You will also need to update your `export_csv()` code above to export the `mtime` of each file.
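
One thing to watch when reading a CSV back: every field arrives as a string. A stored `mtime` needs converting to `float`, and an empty `rev` field needs converting back to `None`. A minimal sketch, assuming hypothetical columns named `path`, `kind`, `rev`, and `mtime`, with `read_entries` as an illustrative function name:

```python
import csv

def read_entries(filename):
    """Read entries from a CSV file, restoring None and float values."""
    entries = []
    with open(filename, newline='') as f:
        for row in csv.DictReader(f):
            entries.append({
                'path': row['path'],
                'kind': row['kind'],
                # Empty string means no rev was stored.
                'rev': row['rev'] or None,
                # Folders may have no mtime recorded.
                'mtime': float(row['mtime']) if row['mtime'] else None,
            })
    return entries
```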

In [None]:
# Import local_entries from a CSV file

def import_csv(filename):
    '''Read in (previous) local_entries, rev, and mtime from CSV.'''
    # Write your code here

Update your earlier code for generating and saving `local_entries`. It must first read `local_entries` from `'local_entries.csv'`, then compare the `mtime` of each file in `local_entries` to the current file and directory listing:

- Where the `mtime` is the same, do nothing.
- Where the `mtime` is different, set the `rev` to `None` so that the file will be uploaded to Dropbox.
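
The `mtime` check can be sketched as follows. This assumes each entry is a dict with hypothetical keys `path`, `kind`, `rev`, and `mtime`, that paths are relative to a directory `top`, and that every listed file still exists; entries added or removed since the last sync would need handling separately.

```python
import os

def reset_changed_revs(entries, top):
    """Set rev to None for every file whose mtime differs from the stored one."""
    for entry in entries:
        if entry['kind'] != 'file':
            continue  # directories are not compared
        current = os.path.getmtime(os.path.join(top, entry['path']))
        if current != entry['mtime']:
            entry['rev'] = None        # forces an upload on the next sync
            entry['mtime'] = current   # remember the new modified time
    return entries
```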

## 6 Put it all together

Now that you have all the pieces written, see if you can work out the logic of the program. What should happen first, and what should happen next? Feel free to move the cells around and edit them so that it makes more sense to you.

Put them in the right order, and you should have a program that successfully copies your JupyterHub home directory to Dropbox. You can run the entire script by going to `Kernel` → `Restart and Run All`.