# How can I upload a file?
### Overview
Here we introduce file upload via API. Specifically we will:

 1. create a new project
 2. check that there are no files
 3. upload some files
 4. set any metadata we like
 5. search for the files via metadata
 
### Prerequisites
 1. You need your _authentication token_ and the API needs to know about it. See <a href="Setup_API_environment.ipynb">**Setup_API_environment.ipynb**</a> for details.
 2. You have downloaded or cloned the whole repo, so the files we will try to upload exist locally.
 
## Imports
We import the _Api_ class from the official sevenbridges-python bindings below.

In [None]:
import sevenbridges as sbg

## Initialize the object
The `Api` object needs to know your **auth\_token** and the correct path. Here we assume you are using the credentials file in your home directory. For other options, see <a href="Setup_API_environment.ipynb">Setup_API_environment.ipynb</a>.
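
As a quick pre-flight check, the sketch below tests whether a profile name appears in a credentials file before it is handed to `sbg.Config`. It assumes the file is INI-formatted, with one section per profile and `api_endpoint` and `auth_token` keys. The sample text, the placeholder values, and the helper name `profile_is_configured` are illustrative assumptions, so check them against Setup_API_environment.ipynb.

```python
import configparser

# ASSUMPTION: the credentials file is INI-style, one [section] per profile.
# The values below are placeholders, not real endpoints or tokens.
SAMPLE_CREDENTIALS = """
[sbpla]
api_endpoint = <API_ENDPOINT>
auth_token = <YOUR_TOKEN>
"""


def profile_is_configured(credentials_text, profile):
    """Return True if `profile` has both expected keys (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return False
    return all(parser.has_option(profile, key)
               for key in ('api_endpoint', 'auth_token'))


print(profile_is_configured(SAMPLE_CREDENTIALS, 'sbpla'))
print(profile_is_configured(SAMPLE_CREDENTIALS, 'cgc'))
```

In practice you would read the text from your own credentials file instead of using the sample string.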

In [None]:
# [USER INPUT] specify platform {cgc, sbpla, etc}
prof = 'sbpla'


config_file = sbg.Config(profile=prof)
api = sbg.Api(config=config_file)

## Create a shiny, new project
To start from a clean slate, we will make a new project. If a project with this name already exists, the code below raises a `KeyboardInterrupt` and stops. Be _creative_ with your project names; it's something you will look back on and laugh about.

#### PROTIPS
This next cell is described in more detail in this [recipe](projects_makeNew.ipynb).

In [None]:
# [USER INPUT] Set project name and billing group index here:
# Note that you can have multiple apps or projects with the same name. It is best practice to reference entities by ID.
new_project_name = 'Shiny and Newer'                          
index_billing = -1

# Check if this project already exists. LIST all projects and check for name match
my_project = [p for p in api.projects.query(limit=100).all() \
              if p.name == new_project_name]      
              
if my_project:    # exploit fact that empty list is False
    print('A project named {} exists, please choose a unique name'
          .format(new_project_name))
    raise KeyboardInterrupt
else:
    # Create a new project
    # What are my funding sources?
    billing_groups = api.billing_groups.query()  
    print((billing_groups[index_billing].name + \
           ' will be charged for computation and storage (if applicable)'))

    # Set up the information for your new project
    new_project = {
            'billing_group': billing_groups[index_billing].id,
            'description': """A project created by the API recipe (files_upload_and_setMetadata).
                          This also supports **markdown**
                          _Pretty cool_, right?
                       """,
            'name': new_project_name
    }

    my_project = api.projects.create(
        name=new_project['name'], billing_group=new_project['billing_group'], 
        description=new_project['description']
    )
    
    # (re)list all projects, and get your new project
    my_project = [p for p in api.projects.query(limit=100).all() 
              if p.name == new_project_name][0]

    print('Your new project {} has been created.'.format(
        my_project.name))
    # Print description if it exists
    if hasattr(my_project, 'description'): 
        print('Project description: \n {}'.format(my_project.description)) 

## Sanity-check: do I have any files?
Since you have just created the project, there will be **no** _Files_, _Apps_, or _Tasks_ in it. But just to be sure, let's query the files in our project.

#### PROTIPS
This next cell is described in more detail in this [recipe](files_listAll.ipynb).

In [None]:
my_files = api.files.query(project = my_project)
print('In project {}, you have {} files.'.format(
    my_project.name, my_files.total))

## Upload some toy files
Here we use some of the recipes from the [ok, API](https://github.com/sbg/okAPI) repository as toy files. This **synchronous** upload does not report any progress while it runs. Next, we set the _same metadata_ on all of the files (except one). What is really excellent about this **flexible metadata** is that it is searchable, so you can use it to build tasks later. Furthermore, you can set **tags** via the API. These will be visible in the GUI and via the API.

#### Notes:

 * The search-by-metadata function does **not** work with booleans or integers right now. This is a **known** bug, so you **know** we are on it! However, I'm confident you will be able to do something clever, like changing True to 'True' or 1 to 'one', if you really need it.
 * Alternatively, and **orders of magnitude** more slowly, you could get the metadata of each file individually and search it (including booleans and integers) in Python. An example of that is [here](files_listByMetadata.ipynb).
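
The first workaround can be sketched as a small helper that converts boolean and integer values to strings before the metadata is set. This is only an illustration: the name `searchable_metadata` is hypothetical, and it uses `str()` (so 1 becomes '1' rather than 'one').

```python
def searchable_metadata(metadata):
    """Return a copy of `metadata` with bool and int values converted to strings."""
    converted = {}
    for key, value in metadata.items():
        # bool is a subclass of int, so one isinstance check covers both
        if isinstance(value, int):
            converted[key] = str(value)
        else:
            converted[key] = value
    return converted


example_md = {'toy_example': False, 'extension': 'ipynb', 'revision_number': 7}
print(searchable_metadata(example_md))
# {'toy_example': 'False', 'extension': 'ipynb', 'revision_number': '7'}
```

The same string form then has to be used in the search, for example `metadata={'toy_example': 'False'}`.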

In [None]:
# [USER INPUT] file names to upload:
file_list = ['files_listAll.ipynb',
            'files_copyFromMyProject.ipynb',
            'files_copyFromPublicReference.ipynb',
            'files_detailOne.ipynb',
            'files_upload_and_setMetadata.ipynb']

for f in file_list:
    api.files.upload(project=my_project, path=f)
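
Since the upload loop assumes every file in `file_list` exists locally, it can help to check that first. The sketch below uses only the standard library; the helper name `split_by_existence` is hypothetical.

```python
import os


def split_by_existence(paths):
    """Split `paths` into (present, missing) lists based on os.path.isfile."""
    present = [p for p in paths if os.path.isfile(p)]
    missing = [p for p in paths if not os.path.isfile(p)]
    return present, missing


candidates = ['files_listAll.ipynb', 'files_detailOne.ipynb']
present, missing = split_by_existence(candidates)
if missing:
    print('Not found locally, skipping: {}'.format(', '.join(missing)))
```

Only the paths in `present` would then be passed to `api.files.upload`.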

In [None]:
# List all files in the project
my_files = api.files.query(project=my_project)
print('In project {}, you have {} files.\n'.format(
    my_project.name, my_files.total))
for f in my_files:
    print(f.name)
    
# Set file metadata
base_md = {
    'toy_example': False,
    'extension': 'ipynb',
    'revision_number': 7,
    'Hello':'Nope!'
}

# We could go through each individual file and set metadata
# for f in my_files:
#     f.metadata = base_md
#     f.save()
# But note that this means one API request for each f.save()

# But it is much more efficient if it is done in bulk, we can update up to 100 files with one request
for f in my_files:
    f.metadata = base_md
api.files.bulk_update(my_files)

# change one file's metadata to look for it later
f = my_files[2]
f.metadata['Hello'] = "is it me you're looking for?"
f.save()
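
Because a bulk update handles at most 100 files per request, a project with more files needs the list split into batches. A minimal sketch of that batching, using plain integers as stand-ins for file objects (the helper name `chunked` is hypothetical):

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in for a list of File objects; with real files, each batch would be
# passed to api.files.bulk_update(batch).
fake_files = list(range(250))
print([len(batch) for batch in chunked(fake_files)])
# [100, 100, 50]
```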

In [None]:
# Also set a tag on that file
f.tags = ['example']
f.save()

In [None]:
# List files based on metadata
my_matched_files = api.files.query(
    project=my_project, 
    metadata = {'Hello' : "is it me you're looking for?"}
)

print('In project {}, you have {} matching files.\n'.format(
    my_project.name, my_matched_files.total))

for f in my_matched_files:
    print("""File named ({}) matched search criteria. 
    File metadata is {}.
    File tags are {}."""
          .format(f.name, f.metadata, f.tags))
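
For booleans and integers, which the search above does not handle, the slower client-side alternative is to compare each file's metadata in Python. The sketch below uses plain dictionaries as stand-ins for each file's `metadata`; the helper name `matches_metadata` and the sample records are hypothetical.

```python
def matches_metadata(file_metadata, wanted):
    """True if every key in `wanted` is in `file_metadata` with an equal value."""
    # Note: Python treats True == 1 and False == 0 when comparing values.
    return all(key in file_metadata and file_metadata[key] == value
               for key, value in wanted.items())


# Plain dicts stand in for the metadata of each file
records = {
    'files_listAll.ipynb': {'toy_example': False, 'revision_number': 7},
    'files_detailOne.ipynb': {'toy_example': True, 'revision_number': 7},
}
hits = [name for name, md in records.items()
        if matches_metadata(md, {'toy_example': False})]
print(hits)
# ['files_listAll.ipynb']
```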

## (optional) Upload real-sized files
Toy files are great, but are not going to rock the genomic world. What about _hundreds of GB_? The API uploader deals with that pretty well too. However, the method above doesn't give any indication of progress, which is rather unsettling. So let's use **asynchronous uploads** with a **progress bar**.

_IPython_ is **not very reliable with printing to screen**, so I would recommend using this in plain _Python_ (in fact, the code below **does not work** here). It is _especially unreliable_ in a for loop, as the progress bar from the prior upload can interfere with the current one. So here we are only showing a _single file_ (which you need to bring from your own files).

#### Note
By setting
``` python
wait=False
```
we are using _asynchronous_ uploading. This also means we need to **.start()** each upload when we are ready. This is different from the prior cell, where we used _synchronous_ uploads, which started automatically.
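
Because an in-place progress bar can garble notebook output, one alternative is a callback that prints a whole line only when progress crosses a threshold. The sketch below is an assumption-laden illustration, not the library's own callback: it assumes the object passed to a progress callback exposes a numeric `progress` attribute holding a percentage, and it is exercised here only against a stand-in class. The names `make_line_logger` and `FakeProgress` are hypothetical.

```python
def make_line_logger(step=25, emit=print):
    """Build a callback that reports once per `step` percent (hypothetical helper)."""
    state = {'next': step}

    def callback(progress):
        # ASSUMPTION: `progress.progress` is a percentage between 0 and 100
        percent = float(getattr(progress, 'progress', 0.0))
        while state['next'] <= percent and state['next'] <= 100:
            emit('{}% uploaded'.format(state['next']))
            state['next'] += step

    return callback


class FakeProgress:
    """Stand-in for the object passed to a progress callback."""
    def __init__(self, percent):
        self.progress = percent


lines = []
log = make_line_logger(step=25, emit=lines.append)
for pct in (10, 30, 55, 100):
    log(FakeProgress(pct))
print(lines)
# ['25% uploaded', '50% uploaded', '75% uploaded', '100% uploaded']
```

If the assumption about the `progress` attribute holds, such a callback could be registered with `upload.add_progress_callback(make_line_logger())` before calling `upload.start()`.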

In [None]:
# [USER INPUT] file to upload:
file_name = 'heavy.sites.vcf'    # TODO: Replace with your own large, local file


from sevenbridges.transfer.utils import simple_progress_bar

upload = api.files.upload(
    path = file_name, project = my_project, wait=False
)
upload.status

In [None]:
upload.add_progress_callback(simple_progress_bar)
upload.status
upload.start()

## Additional Information
Detailed documentation of this REST API request is available in this [section](http://docs.sevenbridges.com/docs/upload-files).