# Working with files (downloading, moving, deleting, etc.)

* Download data from, and upload data to, the Internet
* Read and write files
* Delete local files and check whether they exist

Requests documentation: http://docs.python-requests.org/en/master/

### Download and save a large file

In [6]:
import os
import requests

#file_url = 'ftp://ftp.ncbi.nlm.nih.gov/bioproject/README'
file_url = 'https://gist.github.com/oxyko/10798051fb9cf1e11f4baac2c6c49f3b/archive/44e343bfe87f56fbc8bb6fbf3a48294aa7b0a1b6.zip'
os.makedirs('out', exist_ok=True)  # open() below fails if the 'out' directory is missing
local_file_name = 'out/' + file_url.split('/')[-1]
# stream=True defers fetching the body, so the whole file is never held in memory at once
r = requests.get(file_url, stream=True)
r.raise_for_status()  # stop on an HTTP error instead of saving an error page
with open(local_file_name, 'wb') as f:
    for chunk in r.iter_content(chunk_size=512*1024):  # 512 KiB per chunk
        if chunk:
            f.write(chunk)

Now the file should be downloaded.

### Check if file exists

In [3]:
import os.path
os.path.isfile(local_file_name)

True

In [17]:
from pathlib import Path
my_file = Path(local_file_name)
my_file.is_file()

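NameError: name 'local_file_name' is not defined

This error only means the cell was run in a session where local_file_name had not been defined yet. Run the download cell first; is_file() then returns True for an existing file, just like os.path.isfile().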

### Check file type

In [19]:
os.path.isfile('data')

False

In [20]:
os.path.isdir('data')

True
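The same checks are available through pathlib. The following is a minimal sketch that builds its own throwaway directory with tempfile, so it does not depend on the 'data' directory used above; the names base, sample_file and missing are made up for illustration.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    sample_file = base / 'sample.txt'   # hypothetical file name
    sample_file.write_text('hello')
    missing = base / 'missing.txt'      # never created

    checks = {
        'dir is_dir': base.is_dir(),
        'dir is_file': base.is_file(),
        'file is_file': sample_file.is_file(),
        'missing exists': missing.exists(),
    }

print(checks)
```

Both is_file() and is_dir() return False for a path that does not exist, so neither call needs a separate existence check first.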

### List everything in a directory

In [6]:
import os
# This will list hidden files, directories, everything
os.listdir('data')

['.DS_Store',
 'nr.80.tar.gz',
 'open-data',
 'test.xlsx',
 'test.txt',
 'test.tar.gz']

### List only files in the directory

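Pay attention to the directory passed to listdir and isfile. listdir returns bare names, and isfile resolves relative paths against the current working directory, so the directory must be prepended to each name.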

In [14]:
[f for f in os.listdir('data') if os.path.isfile(os.path.join('data', f))]

['.DS_Store', 'nr.80.tar.gz', 'test.xlsx', 'test.txt', 'test.tar.gz']
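An alternative that avoids building paths by hand is os.scandir, whose entries know their own location. This is only a sketch: list_files is a hypothetical helper, and the file names are created in a temporary directory purely for the demonstration.

```python
import os
import tempfile

def list_files(directory):
    """Return the sorted names of regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        # entry.is_file() checks the entry itself, so no path joining is needed
        return sorted(entry.name for entry in entries if entry.is_file())

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'a.txt'), 'w').close()
    open(os.path.join(tmp, 'b.txt'), 'w').close()
    os.mkdir(os.path.join(tmp, 'subdir'))   # should be skipped
    found = list_files(tmp)

print(found)
```

The result is sorted because neither os.listdir nor os.scandir guarantees any particular order.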

### Delete a file
- os.remove() will remove a file.
- os.rmdir() will remove an empty directory.
- shutil.rmtree() will delete a directory and all its contents. 

In [5]:
os.remove(local_file_name)

In [12]:
os.path.isfile(local_file_name)

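NameError: name 'local_file_name' is not defined

The error comes from running the cell in a session where local_file_name was no longer defined, not from the deletion itself. With the variable defined, isfile() returns False once the file has been removed.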
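The three functions listed at the top of this section can be compared side by side. The sketch below works in a scratch directory from tempfile.mkdtemp(); all file and directory names are invented for the example.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                      # scratch directory
file_path = os.path.join(base, 'scratch.txt')  # hypothetical names
empty_dir = os.path.join(base, 'empty')
full_dir = os.path.join(base, 'full')

open(file_path, 'w').close()
os.mkdir(empty_dir)
os.makedirs(os.path.join(full_dir, 'nested'))

os.remove(file_path)      # deletes a single file
os.rmdir(empty_dir)       # works only because the directory is empty

try:
    os.rmdir(full_dir)    # fails: the directory still contains 'nested'
    rmdir_failed = False
except OSError:
    rmdir_failed = True

shutil.rmtree(full_dir)   # deletes the directory and everything in it
remaining = os.listdir(base)
shutil.rmtree(base)       # clean up the scratch directory

print(rmdir_failed, remaining)
```

shutil.rmtree() does not ask for confirmation and does not use a trash folder, so it is worth double-checking the path before calling it.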

### Copy all files in a directory

In [14]:
import shutil

## Copies the source directory ('data'), including all subfolders, to the destination dir ('out/datacopy')
## copytree creates the destination; by default it raises FileExistsError if the destination already exists
shutil.copytree('data', 'out/datacopy')

'out/datacopy'
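Running copytree a second time against the same destination is a common stumbling block. The sketch below, which assumes Python 3.8 or newer for the dirs_exist_ok parameter, shows the default failure and the merge behaviour; src, dst and note.txt are hypothetical names inside a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'src'
    dst = Path(tmp) / 'dst'
    (src / 'sub').mkdir(parents=True)
    (src / 'sub' / 'note.txt').write_text('copied')

    shutil.copytree(src, dst)              # first copy creates dst

    try:
        shutil.copytree(src, dst)          # second copy: dst already exists
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True

    # Python 3.8+: copy into the existing destination instead of failing
    shutil.copytree(src, dst, dirs_exist_ok=True)
    copied_text = (dst / 'sub' / 'note.txt').read_text()

print(second_copy_failed, copied_text)
```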

### Directory: check if it exists, create, delete

In [19]:
new_dir = 'out/somedir'
os.path.exists(new_dir)

False

In [22]:
if not os.path.exists(new_dir):
    os.makedirs(new_dir)

os.path.exists(new_dir)

True

In [24]:
os.rmdir(new_dir)
os.path.exists(new_dir)

False
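The check-then-create pattern can be collapsed into a single call with exist_ok=True. This is a small sketch run inside a temporary directory; nested_dir is a made-up path that mirrors 'out/somedir'.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested_dir = os.path.join(tmp, 'out', 'somedir')  # hypothetical path

    os.makedirs(nested_dir, exist_ok=True)   # creates 'out' and 'somedir'
    os.makedirs(nested_dir, exist_ok=True)   # second call is a no-op
    created = os.path.isdir(nested_dir)

    os.rmdir(nested_dir)                     # removes only the empty leaf
    leaf_exists = os.path.exists(nested_dir)
    parent_exists = os.path.isdir(os.path.join(tmp, 'out'))

print(created, leaf_exists, parent_exists)
```

Note that os.makedirs() creates every missing parent, while os.rmdir() removes only the last directory in the path and leaves the parents in place.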

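### Query a JSON API

In [1]:
import requests

#query_url = 'http://localhost:8983/solr/CFIA_all/select?fl=id&q=title:grain'
query_url = 'http://localhost:8983/solr/cfia_all/select?fl=id&q="Puccinia graminis"&rows=7000'
r = requests.get(query_url)
r.json()  # parses the response body as JSON

The recorded run of this cell failed with NameError: name 'requests' is not defined, because requests had not been imported in that session; the import above fixes that. The query itself still needs a Solr server listening on localhost:8983.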

### Unzip a .tar.gz file

When you specify a directory to extract to, .extractall() creates it if it doesn't exist.

In [10]:
import tarfile

filename = 'data/nr.80.tar.gz'

try:
    if filename.endswith("tar.gz"):
        # The with block closes the archive even if extraction fails
        with tarfile.open(filename, "r:gz") as tar:
            tar.extractall(path='out/unzip')
        # Report success only when the archive was actually extracted
        print('Done! See the files in the "out/unzip" directory.')
except FileNotFoundError as e:
    print("File not found. Error: {}".format(e))

Done! See the files in the "out/unzip" directory.
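To try this without the 'data/nr.80.tar.gz' archive, the round trip below creates a small .tar.gz and extracts it again. It is a sketch under assumed names (hello.txt, sample.tar.gz and the target path are all invented), run inside a temporary directory.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'hello.txt')         # hypothetical file
    archive = os.path.join(tmp, 'sample.tar.gz')
    target = os.path.join(tmp, 'unpacked', 'here')  # does not exist yet

    with open(source, 'w') as f:
        f.write('hello archive')

    # 'w:gz' writes a gzip-compressed archive
    with tarfile.open(archive, 'w:gz') as tar:
        tar.add(source, arcname='hello.txt')        # store without the temp path

    # 'r:gz' reads it back; missing target directories are created on extraction
    with tarfile.open(archive, 'r:gz') as tar:
        member_names = tar.getnames()
        tar.extractall(path=target)

    with open(os.path.join(target, 'hello.txt')) as f:
        extracted_text = f.read()

print(member_names, extracted_text)
```

Calling getnames() before extracting is a cheap way to inspect an archive; archives from untrusted sources deserve that check, since member paths can point outside the target directory.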


### Read from file

In [11]:
with open('data/test.txt', 'r') as f:
    file_contents = f.read()
file_contents

'Some lovely text to read in a test.\n\n'
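Writing follows the same pattern as reading, with a different mode string. The sketch below writes, appends and then reads a file line by line; notes.txt is a made-up name in a temporary directory, and the explicit encoding is an assumption added for portability.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:   # 'w' creates or truncates
        f.write('first line\n')

    with open(path, 'a', encoding='utf-8') as f:   # 'a' appends to the end
        f.write('second line\n')

    with open(path, 'r', encoding='utf-8') as f:
        # Iterating over the file yields one line at a time
        lines = [line.rstrip('\n') for line in f]

print(lines)
```

Iterating over the file object instead of calling f.read() keeps memory use low for large files, in the same spirit as the chunked download at the start of this notebook.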