# Organizing and Managing Information as a Tree of Files and Directories

Before we get our hands dirty, there are a few ideas worth understanding that will make our UNIX adventure a little easier.

## Information Management and Processing

![pdp11](../images/Files-Cabinets.jpg)

Using computers and managing information have become synonymous. You probably take for granted that everyone knows what a computer "file" and "folder" are and that they have always existed.

This, however, is not true: a fundamental contribution of operating systems was standardizing the notion of files and their organization into a hierarchical tree structure.

Files and directories are so fundamental to UNIX that it is very hard to say anything about UNIX without first understanding a little bit about both.


## Files

<img align="right" src="../images/file.pdf">In UNIX, a file is an abstract object in which we can store information. In particular, we can write contents to it and read contents from it. Beyond the information it contains, a file has several other descriptive facts associated with it. Examples of these facts include:
 - who owns the file 
 - the length of the file (measured in [bytes](../assembly/byte))
 - who has permissions to read or write its contents 
 - the time the contents were last modified
 - the time the contents were last read
 - the time the descriptive facts were last changed (e.g. the file permissions were modified)

We think of the contents of the file as the "data" of the file and we think of other information as the "meta-data" of the file.  
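To make the split between data and meta-data concrete, here is a minimal Python sketch that creates a small scratch file and then asks the operating system for its descriptive facts using `os.stat`. The file name `example.txt` is hypothetical and chosen only for illustration. The sketch assumes a UNIX-like system, where `st_ctime` reports the time the meta-data last changed (on other systems it may mean something else).

```python
import os
import stat
import tempfile
import time

# Work in a temporary directory so the example does not touch any real files.
with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "example.txt")   # hypothetical file name

    # The "data" of the file: six bytes of contents.
    with open(path, "wb") as f:
        f.write(b"hello\n")

    # The "meta-data" of the file: facts about it, kept by the operating system.
    info = os.stat(path)

    print("owner (numeric user id):", info.st_uid)
    print("length in bytes:        ", info.st_size)
    print("permissions:            ", stat.filemode(info.st_mode))
    print("contents last modified: ", time.ctime(info.st_mtime))
    print("contents last read:     ", time.ctime(info.st_atime))
    print("meta-data last changed: ", time.ctime(info.st_ctime))
```

Note that none of the printed values come from reading the contents of the file; they are all stored alongside it.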

> Why do we say that a file is an "abstract" object?  Well, when it comes to computers, most of the things we think of as "real" are just constructions of software.  An operating system like UNIX is responsible for providing us with things like files, but in reality there is rarely a single physical object that corresponds to any given file.  Rather, it is the job of the operating system to use the resources of the computer to create an abstract file object if we ask it to do so.

## Directory Tree
<img align="right" src="../images/359px-ENC_SYSTEME_FIGURE.jpeg">
While files serve as a great building block for managing information, they are not enough.  People who have managed and curated large bodies of information, like librarians, know that you need a consistent and yet flexible way to group, organize and index information.

One organization that early operating systems developers settled on was a hierarchical tree of nested "directories".



### Directories
A "directory" contains a list of names, where each name identifies either a single file or another directory.  These entries are said to be within the directory.  A directory that is within a directory is said to be a sub-directory.  Like a file, a directory has the same kind of meta-data associated with it (owner, permissions, etc).  The contents of a directory, however, is the list of its entries.  This structure results in the directories and files forming a tree.  As such, the name of any file or directory is actually a path along the tree.

While this might sound confusing, it is really quite intuitive and you have likely been using such a directory hierarchy for most of your digital life.
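One way to see that "a directory is a list of named entries" naturally produces a tree is to model it in a few lines of Python. This is only an in-memory sketch, not how UNIX stores directories: here a directory is assumed to be a `dict` that maps names to entries, and a file is assumed to be a plain string holding its contents. All of the names used (`notes`, `drafts`, `todo.txt`, `readme.txt`) are hypothetical.

```python
def is_directory(entry):
    """In this sketch a directory is a dict; anything else is treated as a file."""
    return isinstance(entry, dict)


def render(directory, depth=0):
    """Return one line per entry, indenting the entries of sub-directories."""
    lines = []
    for name in sorted(directory):
        entry = directory[name]
        kind = "[dir]" if is_directory(entry) else "[file]"
        lines.append("  " * depth + name + " " + kind)
        if is_directory(entry):
            # A sub-directory has its own list of entries, so we recurse.
            lines.extend(render(entry, depth + 1))
    return lines


# A small hypothetical directory with one file and one sub-directory.
top = {
    "notes": {"todo.txt": "buy milk\n", "drafts": {}},
    "readme.txt": "hello\n",
}

print("\n".join(render(top)))
```

Because the keys of a `dict` are unique, this model also captures the rule that two entries in the same directory cannot share a name.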

Perhaps the most important thing to realize is that a UNIX user is allowed to create directories and files and name them as they see fit.  This ability allows a user to flexibly organize their information in any way they like and that makes sense to them. To get a better handle on this, let's walk through a simple abstract example.  Later we will repeat this example using UNIX commands to get a more exact understanding.

Let's assume that you are a CS student who likes poetry and wants to get organized.  To this end, you might choose to create the following directory and file structure.

![dirtree](../images/dirtree.pdf)


In this diagram we use circles to represent directories and rectangles to represent files.  The name of a directory or file is its label.  Arrows are used to show the entries of a particular directory (a file cannot have any arrows starting from it).

#### HOME Directory
The above diagram assumes that we are the user "jonathan", who has a directory that they own.  By creating files and sub-directories within the "jonathan" directory we can organize our information. In UNIX, this personal user directory is called the user's **HOME** directory, and its name matches the user's UNIX user name.  We will say more about this when we revisit this example.

Here the user, jonathan, has created two sub-directories: Poetry and Classes. Inside the Poetry directory the user has created one sub-directory (Vogon, to store their favorite Vogon poetry) and two files (ToBitOrToByte and OdeToASemicolon, two poems they have written).

Similarly, we see that the user has created a more complex directory structure to organize their class work.  One thing to note is that, to keep things sensible, the names of entries in a single directory must be unique.  However, entries in different directories can have the same name -- as we can see with the three "Problem1" files at the bottom of the figure.  Each of these is in a different directory.
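As a rough preview, the following Python sketch builds the parts of this example that the text names inside a temporary directory and then checks the two naming rules. It is only an illustration under stated assumptions: the tree is created under a throwaway location rather than a real HOME directory, the files are left empty, and any entries that appear only in the diagram are omitted.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    home = os.path.join(top, "jonathan")

    # Directories: makedirs creates every missing directory along the path.
    os.makedirs(os.path.join(home, "Poetry", "Vogon"))
    for n in (1, 2, 3):
        os.makedirs(os.path.join(home, "Classes", "CS", "210", f"Assignment{n}"))

    # Files: two poems, plus one Problem1 in each assignment directory.
    file_paths = [
        os.path.join(home, "Poetry", "ToBitOrToByte"),
        os.path.join(home, "Poetry", "OdeToASemicolon"),
    ] + [
        os.path.join(home, "Classes", "CS", "210", f"Assignment{n}", "Problem1")
        for n in (1, 2, 3)
    ]
    for p in file_paths:
        open(p, "w").close()   # create an empty file

    # The contents of a directory is the list of its entries.
    poetry_entries = sorted(os.listdir(os.path.join(home, "Poetry")))
    print(poetry_entries)   # ['OdeToASemicolon', 'ToBitOrToByte', 'Vogon']

    # The same name can appear in different directories.
    problem1_dirs = sorted(
        os.path.relpath(dirpath, home)
        for dirpath, dirnames, filenames in os.walk(home)
        if "Problem1" in filenames
    )
    print(len(problem1_dirs), "directories contain an entry named Problem1")

    # But a name can be used only once within a single directory.
    try:
        os.mkdir(os.path.join(home, "Poetry"))
        duplicate_allowed = True
    except FileExistsError:
        duplicate_allowed = False
    print("duplicate name allowed:", duplicate_allowed)
```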

#### PATHS and the ROOT
As a matter of fact, from the above diagram we can see that the unique name of a file or directory is really a composition of names that traces a path through the tree.  In UNIX, the very top of the tree is the one directory that always exists and is not a sub-directory of any other directory.  This directory is called the **ROOT** directory.  For reasons we will see later, in UNIX the **ROOT** directory's name is **/** all by itself.


For example, the names of the Problem1 files at the bottom of the diagram are:


- / + home + jonathan + Classes + CS + 210 + Assignment1 + Problem1 
- / + home + jonathan + Classes + CS + 210 + Assignment2 + Problem1 
- / + home + jonathan + Classes + CS + 210 + Assignment3 + Problem1 

In UNIX, the way we actually specify the path name of a file or directory is to separate the individual components with the "/" character.  So the above three files as proper UNIX path names would be:

- /home/jonathan/Classes/CS/210/Assignment1/Problem1 
- /home/jonathan/Classes/CS/210/Assignment2/Problem1 
- /home/jonathan/Classes/CS/210/Assignment3/Problem1 

Remember, "/" by itself is the name of the root directory.  The full name of any other directory or file begins with / and is then composed of the names of all its parent directories, followed by its own name, each separated by an additional /.
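The composition rule can be tried out without touching any real files. The sketch below uses Python's `pathlib.PurePosixPath`, which only manipulates path names as text using UNIX conventions; it does not check whether anything exists on disk. The variable names are illustrative choices.

```python
from pathlib import PurePosixPath

# The components of the first Problem1 file, listed from the root downward.
components = ["home", "jonathan", "Classes", "CS", "210", "Assignment1", "Problem1"]

# Start at the root directory "/" and add one component per level of the tree.
path = PurePosixPath("/", *components)
print(path)          # /home/jonathan/Classes/CS/210/Assignment1/Problem1

# Going the other way: split a path name back into its components.
print(path.parts)    # ('/', 'home', 'jonathan', 'Classes', 'CS', '210', 'Assignment1', 'Problem1')
print(path.name)     # Problem1
print(path.parent)   # /home/jonathan/Classes/CS/210/Assignment1

# The three Problem1 files share a final name but have different full path names.
problem_paths = {
    str(PurePosixPath("/home/jonathan/Classes/CS/210", f"Assignment{n}", "Problem1"))
    for n in (1, 2, 3)
}
print(len(problem_paths))   # 3
```

The final component alone (`Problem1`) is ambiguous, which is why the full path name is needed to identify a particular file.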



