# Uploading to HDFS

It is time to learn how to store files in HDFS.  There are many ways to do this, but we will concentrate on doing it from Python.

A tutorial describing the HDFS Python client library is here:

https://hdfscli.readthedocs.io/en/latest/quickstart.html#python-bindings

and the full reference here:

https://hdfscli.readthedocs.io/en/latest/api.html#api-reference

In [1]:
import hdfs
from hdfs import InsecureClient

First we will access HDFS as `root` so that we have enough rights to create a directory where the `vagrant` user can work:

In [2]:
client = InsecureClient('http://namenode:50070', user='root')
# Uncomment to delete /Users and everything under it (destructive):
#client.delete('/Users', recursive=True)

In [3]:
client.list('/')

['Users']

In [4]:
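# makedirs also creates missing parent directories, so the first call is optional.
client.makedirs('/Users')
client.makedirs('/Users/vagrant')
# Hand the new directory over to the vagrant user.
client.set_owner('/Users/vagrant', owner='vagrant', group='vagrant')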
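
The directory setup above can be wrapped so that it also reports whether the directory was already there. This is a minimal sketch, assuming `status(path, strict=False)` returns `None` for a missing path, as the API reference linked above describes. `ensure_home_dir` is a hypothetical helper name, not part of the library.

```python
def ensure_home_dir(client, user, base='/Users'):
    """Create base/user on HDFS if it is missing and give it to `user`.

    `client` is expected to be an hdfs client (for example the root
    InsecureClient above) with rights to create directories and change owners.
    Returns True if the directory was created, False if it already existed.
    """
    path = '{}/{}'.format(base, user)
    # strict=False asks status() to return None instead of raising.
    already_there = client.status(path, strict=False) is not None
    if not already_there:
        client.makedirs(path)  # also creates missing parents
    # Setting the owner again on an existing directory is harmless.
    client.set_owner(path, owner=user, group=user)
    return not already_there
```

With the `root` client from the cell above, `ensure_home_dir(client, 'vagrant')` would stand in for the three calls.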

In [5]:
client.list('/')

['Users']

In [6]:
client.list('/Users')

['vagrant']

Now let's create a new session as the `vagrant` user:

In [7]:
client = InsecureClient('http://namenode:50070', user='vagrant')

and upload the local `structured-2018*` data directories. First, check the current working directory to see where the data lives:

In [8]:
import os

os.getcwd()

'/home/vagrant/work/week7'

In [9]:
datadir = os.getcwd()
print(datadir)

/home/vagrant/work/week7


In [10]:
from glob import glob
# Collect every local data directory whose name starts with "structured-2018".
the_dirs = glob(os.path.join(datadir, "structured-2018*"))
print(the_dirs)
type(the_dirs)

['/home/vagrant/work/week7/structured-2018-04-01-birmingham', '/home/vagrant/work/week7/structured-2018-01-14-neworleans', '/home/vagrant/work/week7/structured-2018-04-19-relegation', '/home/vagrant/work/week7/structured-2018-04-08-proleague1', '/home/vagrant/work/week7/structured-2018-08-19-champs', '/home/vagrant/work/week7/structured-2018-07-29-proleague2', '/home/vagrant/work/week7/structured-2018-06-17-anaheim', '/home/vagrant/work/week7/structured-2018-03-11-atlanta', '/home/vagrant/work/week7/structured-2018-04-22-seattle']


list
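
As the output shows, `glob` returns matches in no particular order, and it would also match plain files that happen to share the prefix. A small sketch of a stricter lookup follows; `find_data_dirs` is a hypothetical helper, and the default pattern simply mirrors the one used above.

```python
import os
from glob import glob

def find_data_dirs(base_dir, pattern='structured-2018*'):
    """Return the matching directories under base_dir, sorted by path."""
    matches = glob(os.path.join(base_dir, pattern))
    # Keep directories only, and sort so the order is reproducible.
    return sorted(p for p in matches if os.path.isdir(p))
```

In this notebook, `the_dirs = find_data_dirs(datadir)` should yield the same nine paths in sorted order.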

In [11]:
# Disabled single-directory variant. Note that upload() takes the HDFS path first.
#local_path = datadir
#hdfs_path = '/Users/vagrant/'
#client.upload(hdfs_path, local_path)

In [12]:
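# Upload each local directory into the vagrant user's HDFS directory.
# Argument order is client.upload(hdfs_path, local_path).
hdfs_path = '/Users/vagrant/'
for local_path in the_dirs:
    client.upload(hdfs_path, local_path)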

HdfsError: Remote path '/Users/vagrant/structured-2018-04-01-birmingham' already exists.
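
The error above appears when a directory of the same name is already present under `/Users/vagrant/`, which is what happens when the upload cell is run a second time. One way to make the cell safe to re-run is to skip paths that already exist, or to overwrite them explicitly. The sketch below assumes that `upload` accepts an `overwrite` keyword and that `status(path, strict=False)` returns `None` for a missing path, as the API reference describes; check both against the installed version. `upload_dirs` is a hypothetical helper name.

```python
import os
import posixpath

def upload_dirs(client, local_dirs, hdfs_root, overwrite=False):
    """Upload each local directory under hdfs_root.

    Returns (uploaded, skipped): two lists of remote paths.
    """
    uploaded, skipped = [], []
    for local_path in local_dirs:
        name = os.path.basename(local_path.rstrip(os.sep))
        remote = posixpath.join(hdfs_root, name)  # HDFS paths always use '/'
        exists = client.status(remote, strict=False) is not None
        if exists and not overwrite:
            skipped.append(remote)
            continue
        # hdfs_root is an existing directory, so the upload lands inside it.
        client.upload(hdfs_root, local_path, overwrite=overwrite)
        uploaded.append(remote)
    return uploaded, skipped
```

With the `vagrant` client, `upload_dirs(client, the_dirs, '/Users/vagrant/')` would upload only what is missing; passing `overwrite=True` would replace the existing copies instead.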

In [13]:
client.list('/Users/vagrant/')

['structured-2018-01-14-neworleans',
 'structured-2018-03-11-atlanta',
 'structured-2018-04-01-birmingham',
 'structured-2018-04-08-proleague1',
 'structured-2018-04-19-relegation',
 'structured-2018-04-22-seattle',
 'structured-2018-06-17-anaheim',
 'structured-2018-07-29-proleague2',
 'structured-2018-08-19-champs',
 'week7']
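
Rather than comparing the listing by eye, the local directory names can be checked against the remote listing. A minimal sketch; `missing_on_hdfs` is a hypothetical helper that compares names only, not file contents.

```python
import os

def missing_on_hdfs(local_dirs, remote_names):
    """Return the names of local directories that are absent from the remote listing."""
    expected = {os.path.basename(p.rstrip(os.sep)) for p in local_dirs}
    return sorted(expected - set(remote_names))
```

Calling `missing_on_hdfs(the_dirs, client.list('/Users/vagrant/'))` returns an empty list when every local directory has a remote counterpart.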