# Uploading to HDFS

This section shows how to store files in HDFS. There are many ways to do this, but we will concentrate on doing it from Python.

A tutorial describing the HDFS Python client library (HdfsCLI, imported as `hdfs`) is here:

https://hdfscli.readthedocs.io/en/latest/quickstart.html#python-bindings

and the full reference here:

https://hdfscli.readthedocs.io/en/latest/api.html#api-reference

In [3]:
from hdfs import InsecureClient

First we will access HDFS as `root` so that we have enough rights to create a directory where the `vagrant` user can work. The client talks to the NameNode over WebHDFS (HTTP port 50070, the Hadoop 2.x default):

In [19]:
client = InsecureClient('http://namenode:50070', user='root')
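# Uncomment to remove a /Users tree left over from an earlier run;
# delete() returns True if something was actually removed.
#client.delete('/Users', recursive=True)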

True

In [20]:
client.list('/')

[]

In [9]:
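# makedirs() also creates missing parent directories, so the first call is optional.
client.makedirs('/Users')
client.makedirs('/Users/vagrant')
# Hand the new directory over to the vagrant user.
client.set_owner('/Users/vagrant', owner='vagrant', group='vagrant')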

In [11]:
client.list('/')

['Users']

In [12]:
client.list('/Users')

['vagrant']
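
To confirm that the ownership change took effect, `list` can also return the status of each entry. The following is a minimal sketch, assuming the `type`, `owner`, and `group` keys of the WebHDFS `FileStatus` object; `describe` is a hypothetical helper name, not part of the library.

```python
def describe(client, hdfs_path):
    """Return (name, type, owner, group) for each entry under hdfs_path."""
    # With status=True, list() yields (name, status_dict) pairs instead of bare names.
    entries = client.list(hdfs_path, status=True)
    return [
        (name, st.get('type'), st.get('owner'), st.get('group'))
        for name, st in entries
    ]
```

Called as `describe(client, '/Users')` with the `root` client above, this should report the `vagrant` directory together with its owner and group.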

Now let's create a new session as the `vagrant` user:

In [13]:
client = InsecureClient('http://namenode:50070', user='vagrant')

and upload a single `README.md` file (just to demonstrate how upload works):

In [15]:
import os

datadir = os.path.join('./', 'cwl-data', 'data', 'structured')
print(datadir)

./cwl-data/data/structured


In [16]:
localreadmepath = os.path.join(datadir, 'README.md')
hdfsreadmepath = '/Users/vagrant/README.md'
# upload() takes the HDFS target first, then the local source, and returns the remote path.
client.upload(hdfsreadmepath, localreadmepath)

'/Users/vagrant/README.md'
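
Running the upload cell a second time is expected to fail, because the target file already exists. A hedged sketch of a re-runnable variant follows: it passes `overwrite=True` and then compares the local file size with the `length` field reported by `status`. The helper name `upload_checked` is hypothetical, and the size comparison is only a basic sanity check, not a content checksum.

```python
import os


def upload_checked(client, hdfs_path, local_path):
    """Upload local_path to hdfs_path, replacing any existing file, and verify the size."""
    # overwrite=True replaces an existing remote file instead of raising an error.
    remote_path = client.upload(hdfs_path, local_path, overwrite=True)
    local_size = os.path.getsize(local_path)
    # status() returns the WebHDFS FileStatus dict; 'length' is the size in bytes.
    remote_size = client.status(remote_path)['length']
    if remote_size != local_size:
        raise IOError(
            'size mismatch for %s: local %d bytes, remote %d bytes'
            % (remote_path, local_size, remote_size)
        )
    return remote_path
```

With the `vagrant` client, `upload_checked(client, hdfsreadmepath, localreadmepath)` could stand in for the plain `client.upload` call when the notebook is executed repeatedly.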

In [17]:
client.list('/Users/vagrant/')

['README.md']
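
As a final check, the file can be read back from HDFS. The sketch below uses the client's `read` context manager with an `encoding` so that text rather than bytes is returned; `read_head` and its `n_lines` parameter are illustrative names, not library API. It reads the whole file, so it is only suitable for small files such as this one.

```python
def read_head(client, hdfs_path, n_lines=5):
    """Return the first n_lines lines of a text file stored in HDFS."""
    # read() is a context manager; passing an encoding makes the reader return str.
    with client.read(hdfs_path, encoding='utf-8') as reader:
        text = reader.read()
    return text.splitlines()[:n_lines]
```

With the `vagrant` client above, `read_head(client, '/Users/vagrant/README.md')` should return the opening lines of the uploaded file.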