# Interacting with HopsFS

HopsFS is a fork of the Hadoop Distributed File System (HDFS). 

To see what distinguishes HopsFS from HDFS from an architectural point of view, refer to:

- [blogpost](https://www.logicalclocks.com/introducing-hops-hadoop/)
- [papers](https://www.logicalclocks.com/research-papers/)

To interact with HopsFS from Python, use the `hdfs` module in the hops-util-py library. It provides an easy-to-use API that resembles working with the local filesystem through Python's `os` module.

## Import the Module

In [1]:
from hops import hdfs

Starting Spark application


| ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session? |
|---|---|---|---|---|---|---|
| 10 | application_1537374274509_0011 | pyspark | idle | Link | Link | ✔ |


SparkSession available as 'spark'.


## Getting Project Information

When interacting with HopsFS from Hopsworks, you are always inside a **project**. Inside a project, your active HDFS user is `projectname__username`. This naming enables project-specific access control and multi-tenancy (you can read more about the low-level details in the [hopsworks blogpost](https://www.logicalclocks.com/introducing-hopsworks/)).

In [2]:
project_user = hdfs.project_user()
project_name = hdfs.project_name()
project_path = hdfs.project_path()
print("project user: {}\nproject name: {}\nproject path: {}".format(project_user, project_name, project_path))

project user: HopsFS_Operations__meb10000
project name: HopsFS_Operations
project path: hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/
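
The project user printed above follows the `projectname__username` pattern. As a minimal sketch, assuming the double underscore appears only as the separator (an assumption, not something the library guarantees), the two parts can be recovered with plain string handling. The helper name `split_project_user` is hypothetical and not part of hops-util-py.

```python
def split_project_user(project_user):
    """Split 'projectname__username' into (project, user).

    Hypothetical helper; assumes '__' occurs exactly once, as the separator.
    """
    project, sep, user = project_user.partition("__")
    if not sep or not project or not user:
        raise ValueError(
            "expected 'projectname__username', got {!r}".format(project_user)
        )
    return project, user


project, user = split_project_user("HopsFS_Operations__meb10000")
print("project: {}, user: {}".format(project, user))
```

In the notebook, the argument would be the value returned by `hdfs.project_user()`.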

## Read/Write From/To HDFS

In [3]:
logs_README = hdfs.load("Logs/README.md")
print("logs README: {}".format(logs_README.decode("utf-8")))
hdfs.dump("test", "Logs/README_dump_test.md")
logs_README_dumped = hdfs.load("Logs/README_dump_test.md")
print("logs README dumped: {}".format(logs_README_dumped.decode("utf-8")))

logs README: *This is an auto-generated README.md file for your Dataset!*
To replace it, go into your DataSet and edit the README.md file.

*Resources* DataSet
===

## Contains resources used by jobs, for example, jar files.
logs README dumped: test
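
Because the cell above calls `.decode("utf-8")` on the result of `hdfs.load`, the loaded content evidently arrives as bytes. The following is a minimal sketch of round-tripping structured data through that kind of interface using only the standard library. The helpers `to_text` and `from_bytes` are hypothetical, and the `raw` value merely stands in for what a load call would return.

```python
import json


def to_text(obj):
    """Serialize a Python object to a JSON string usable as file content."""
    return json.dumps(obj, sort_keys=True)


def from_bytes(raw):
    """Decode UTF-8 bytes (as a load-style call returns them) and parse the JSON."""
    return json.loads(raw.decode("utf-8"))


config = {"epochs": 10, "name": "test"}
text = to_text(config)

# Stand-in for the bytes that loading the written file would return.
raw = text.encode("utf-8")

restored = from_bytes(raw)
print(restored == config)
```

In the notebook, `text` would be the first argument to `hdfs.dump`, and `raw` would come from `hdfs.load` on the same path.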

## Copy Local FS <--> HDFS

In [4]:
# creates file in current working directory with a string
with open('test.txt', 'w') as f:
    f.write("test")
hdfs.copy_to_hdfs("test.txt", "Resources", overwrite=True)
hdfs.copy_to_local("Resources/test.txt", "", overwrite=True)
hdfs_copied_file = hdfs.load("Resources/test.txt")
with open('test.txt', 'r') as f:
    local_copied_file = f.read()
print("copied file from local to hdfs: {}".format(hdfs_copied_file.decode("utf-8")))
print("copied file from hdfs to local: {}".format(local_copied_file))

copied file from local to hdfs: test
copied file from hdfs to local: test

## List Directories

In [5]:
logs_files = hdfs.ls("Logs/")
print(logs_files)
logs_files_md = hdfs.glob("Logs/*.md")
print(logs_files_md)
logs_path_names = hdfs.lsl("Logs/")
print(logs_path_names)

[u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README_dump_test.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test4.txt']
[u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README_dump_test.md']
[{'size': 211L, 'kind': u'file', 'group': u'HopsFS_Operations__Logs', 'name': u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', 'replication': 3L, 'last_mod': 1537437288L, 'owner': u'HopsFS_Operations__meb10000', 'path': u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', 'last_access': 1537441705L, 'block_size': 134217728L, 'permissions': 777L}, {'size': 227L, 'kind': u'file', 'group': u'HopsFS_
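
Each entry in the detailed listing is a dictionary, so it can be post-processed with ordinary Python. The sketch below is written for Python 3 and works on one entry copied from the output above. It assumes `last_mod` is a Unix timestamp in seconds, which the notebook does not state explicitly. The helper `summarize` is hypothetical.

```python
import posixpath
from datetime import datetime, timezone

# One entry copied from the listing output above
# (the Python 2 'L' and 'u' markers are dropped).
entries = [
    {
        "size": 211,
        "kind": "file",
        "group": "HopsFS_Operations__Logs",
        "name": "hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md",
        "replication": 3,
        "last_mod": 1537437288,
        "owner": "HopsFS_Operations__meb10000",
        "block_size": 134217728,
    },
]


def summarize(entry):
    """Return (file name, size in bytes, last modification time in UTC)."""
    file_name = posixpath.basename(entry["name"])
    # Assumption: last_mod counts seconds since the Unix epoch.
    modified = datetime.fromtimestamp(entry["last_mod"], tz=timezone.utc)
    return file_name, entry["size"], modified


for entry in entries:
    if entry["kind"] == "file":
        name, size, modified = summarize(entry)
        print("{}: {} bytes, modified {}".format(name, size, modified.isoformat()))
```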

## Copy Within HDFS

In [6]:
hdfs.cp("Resources/test5.txt", "Logs/")
logs_files = hdfs.ls("Logs/")
print(logs_files)

[u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README_dump_test.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test4.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test5.txt']

## Create and Remove Directories

In [7]:
hdfs.mkdir("Logs/test_dir")
logs_files_prior_delete = hdfs.ls("Logs/")
print("files prior to delete: {}".format(logs_files_prior_delete))
hdfs.rmr("Logs/test_dir")
logs_files_after_delete = hdfs.ls("Logs/")
print("files after delete: {}".format(logs_files_after_delete))

files prior to delete: [u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README_dump_test.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test4.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test5.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test_dir']
files after delete: [u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README_dump_test.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test4.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test5.txt']

## Move/Rename Files

In [8]:
logs_files_prior_move = hdfs.ls("Logs/")
print("files prior to move: {}".format(logs_files_prior_move))
hdfs.move("Logs/README_dump_test.md", "Logs/README_dump_test2.md")
logs_files_after_move = hdfs.ls("Logs/")
print("files after move: {}".format(logs_files_after_move))
logs_files_prior_rename = hdfs.ls("Logs/")
print("files prior to rename: {}".format(logs_files_prior_rename))
hdfs.rename("Logs/README_dump_test2.md", "Logs/README_dump_test.md")
logs_files_after_rename = hdfs.ls("Logs/")
print("files after rename: {}".format(logs_files_after_rename))

files prior to move: [u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README_dump_test.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test4.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test5.txt']
files after move: [u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README_dump_test2.md', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test4.txt', u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/test5.txt']
files prior to rename: [u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', u'hdfs://10.0.2.15:8020/Projec

## Change Owner and Change Mode

In [9]:
import stat
file_stat = hdfs.stat("Logs/README.md")
print("file permissions prior to chmod: {0:b}".format(file_stat.st_mode))
hdfs.chmod("Logs/README.md", 700)
file_stat = hdfs.stat("Logs/README.md")
print("file permissions after chmod: {0:b}".format(file_stat.st_mode))
# restore the permissions the file had before this cell ran
hdfs.chmod("Logs/README.md", 777)
file_owner = file_stat.st_uid
#print("file owner prior to chown: {}".format(file_owner))
#hdfs.chown("Logs/README.md", "meb10000", "meb10000")

file permissions prior to chmod: 1100001001
file permissions after chmod: 1010111100
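
The binary strings printed above correspond to the decimal integers 777 and 700, not to the octal permission values `0o777` and `0o700` familiar from the shell `chmod` command. The plain-Python sketch below shows the difference. Whether `hdfs.chmod` expects an octal literal is not stated in this notebook, so treat this as background on integer literals rather than a description of the library's behavior.

```python
import stat

modes = [
    ("decimal 700", 700),
    ("octal 0o700", 0o700),
    ("decimal 777", 777),
    ("octal 0o777", 0o777),
]

for label, mode in modes:
    # Same format spec as the cell above: the mode rendered as a binary string.
    print("{:<12} -> {:b}".format(label, mode))

# stat.filemode renders permission bits the way `ls -l` does;
# stat.S_IFREG marks the mode as belonging to a regular file.
readable = stat.filemode(stat.S_IFREG | 0o700)
print(readable)
```

With octal literals, each digit maps to one read/write/execute triple (owner, group, others), which is why `0o700` renders as owner-only access.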

## File Metadata

In [10]:
file_stat = hdfs.stat("Logs/README.md")
print("file_stat: {}".format(file_stat))
file_access = hdfs.access("Logs/README.md", 777)
print("file access: {}".format(file_access))

file_stat: StatResult(st_atime=1537441705, st_blksize=134217728, st_blocks=1L, st_ctime=0, st_dev=0L, st_gid='HopsFS_Operations__Logs', st_ino=0, st_mode=777, st_mtime=1537437288, st_nlink=1, st_size=211L, st_uid='HopsFS_Operations__meb10000', kind='file', name=u'hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/README.md', replication=3)
file access: False

## Get Absolute Path

In [11]:
abs_path = hdfs.abs_path("Logs/")
print(abs_path)

hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/Logs/
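
The output above is the project path with the relative path appended. The sketch below reproduces that relationship in plain Python. It is a hypothetical stand-in, not the library's implementation, and the function name `resolve` is an assumption made for illustration.

```python
def resolve(project_path, path):
    """Resolve a project-relative path against the project root (illustrative only)."""
    if "://" in path:
        # Already a fully qualified URI; return it unchanged.
        return path
    # Join with exactly one slash between the root and the relative part.
    return project_path.rstrip("/") + "/" + path.lstrip("/")


project_path = "hdfs://10.0.2.15:8020/Projects/HopsFS_Operations/"
print(resolve(project_path, "Logs/"))
```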