# Ch 14: Files


### Reading and Writing

A text file is a sequence of characters stored on a permanent medium such as a hard drive or flash memory. We saw how to open and read a file in Reading Word Lists. To write a file, you have to open it with mode `"w"` as the second argument:


In [1]:
fout = open("output.txt", "w")

IOStream(<file output.txt>)

If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful! If the file doesn’t exist, a new one is created. The function `open` returns a file object and the function `write` puts data into the file.

In [2]:
line1 = "This here's the wattle,\n";
write(fout, line1)

24

The return value is the number of bytes written, which for ASCII text like this equals the number of characters. The file object keeps track of where it is, so if you call `write` again, it adds the new data to the end of the file.

In [3]:
line2 = "the emblem of our land.\n";
write(fout, line2)

24

When you are done writing, you should close the file.

In [4]:
close(fout)
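
As a quick check of the behaviour described above, the following sketch writes the same two lines and reads them back. It assumes a scratch directory created with `mktempdir`, so it does not overwrite any real `output.txt`; the variable names are only illustrative.

```julia
# Write two lines to a scratch file, then read the file back.
dir = mktempdir()                      # scratch directory, so no real file is touched
path = joinpath(dir, "output.txt")

fout = open(path, "w")
n1 = write(fout, "This here's the wattle,\n")   # returns the number of bytes written
n2 = write(fout, "the emblem of our land.\n")   # lands after the first line
close(fout)

contents = read(path, String)          # the whole file as one string
print(contents)
```

Both calls to `write` should return 24, and `contents` should hold the two lines in the order they were written.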

***Formatting***

`write` stores a non-string value in its binary form, so if we want to put other values in a text file as readable text, we have to convert them to strings. The easiest way to do that is with the `string` function or string interpolation:

In [5]:
fout = open("output.txt", "w")
write(fout, string(150))

3

An alternative is to use the `print` and `println` functions, which convert their arguments to text for you.

In [6]:
camels = 42
println(fout, "I have spotted $camels camels.")

A more powerful alternative is the `@printf` macro, which prints using a C-style format specification string. You can read about it at https://docs.julialang.org/en/v1/stdlib/Printf/
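
A minimal sketch of `@printf`, assuming an in-memory `IOBuffer` in place of a file so that nothing is written to disk. The variable `price` and the format strings are made up for illustration.

```julia
using Printf

camels = 42
price = 3.14159                                   # hypothetical value for the example

io = IOBuffer()                                   # in-memory stream standing in for a file
@printf(io, "I have spotted %d camels.\n", camels)
@printf(io, "Price: %8.2f\n", price)              # width 8, two digits after the point
text = String(take!(io))
print(text)

s = @sprintf("%05d", camels)                      # returns the string instead of printing
println(s)
```

Passing a file object such as `fout` as the first argument of `@printf` works the same way as passing `io` here.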

In [7]:
close(fout)

### Filenames and Paths

Files are organized into directories (also called “folders”). Every running program has a “current directory”, which is the default directory for most operations. For example, when you open a file for reading, Julia looks for it in the current directory. 

The function `pwd` returns the name of the current directory:

In [8]:
cwd = pwd()

"C:\\users\\st\\OneDrive\\STJuliaLessons"

`cwd` stands for “current working directory”. A string like `"/home/ben"` that identifies a file or directory is called a path.

A simple filename, like `memo.txt`, is also considered a path, but it is a relative path because it is interpreted relative to the current directory.

If the current directory is `C:\users\st\OneDrive\STJuliaLessons`, the filename `memo.txt` refers to `C:\users\st\OneDrive\STJuliaLessons\memo.txt`. A path that begins with `/` (or, on Windows, with a drive letter such as `C:\`) does not depend on the current directory; it is called an absolute path. To find the absolute path to a file, you can use `abspath`:

In [9]:
abspath("emma.txt")

"C:\\users\\st\\OneDrive\\STJuliaLessons\\emma.txt"
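
The difference between relative and absolute paths can be checked directly. This sketch uses `memo.txt` purely as an example name; the file does not need to exist, because these functions only manipulate the path string.

```julia
# Relative vs. absolute paths; "memo.txt" is only an example name.
rel = "memo.txt"
full = abspath(rel)                     # resolved against the current directory

println(isabspath(rel))                 # false: the meaning depends on the current directory
println(isabspath(full))                # true
println(full == joinpath(pwd(), rel))   # abspath prepends the current directory
println(basename(full))                 # the last component of the path
```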

Julia provides other functions for working with filenames and paths. For example, `ispath` checks whether a file or
directory exists:

In [11]:
ispath("emma.txt")

true

If it exists, `isdir` checks whether it’s a directory:

In [12]:
isdir("emma.txt")

false

Similarly, `isfile` checks whether it’s a file. `readdir` returns an array of the files (and other directories) in the given directory:

In [13]:
readdir(cwd)

32-element Array{String,1}:
 ".ipynb_checkpoints"                       
 "cmudict-0.7b.txt"                         
 "cmudict-rem-punc.txt"                     
 "emma.txt"                                 
 "EricHalfBee.txt"                          
 "intro-to-julia-for-data-science"          
 "intro-to-ml"                              
 "MyThinkJulia-Ch00-JupyterNB.ipynb"        
 "MyThinkJulia-Ch01-Intro.ipynb"            
 "MyThinkJulia-Ch02-VarExpState.ipynb"      
 "MyThinkJulia-Ch03-Functions.ipynb"        
 "MyThinkJulia-Ch04-Interface.ipynb"        
 "MyThinkJulia-Ch04-Turtle graphic geom.pdf"
 ⋮                                          
 "MyThinkJulia-Ch09-WordPlay.ipynb"         
 "MyThinkJulia-Ch10-Arrays.ipynb"           
 "MyThinkJulia-Ch11-Dictionary.ipynb"       
 "MyThinkJulia-Ch12-Tuple.ipynb"            
 "MyThinkJulia-Ch13-DataStructure.ipynb"    
 "MyThinkJuliaCh16.ipynb"                   
 "output.txt"                               
 "select-notebooks"        
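
To try these functions without depending on the contents of any particular directory, here is a sketch that builds a small scratch directory first. The names `notes.txt`, `sub` and `missing.txt` are hypothetical.

```julia
# Build a small scratch directory and query it.
dir = mktempdir()
touch(joinpath(dir, "notes.txt"))    # create an empty file
mkdir(joinpath(dir, "sub"))          # create a subdirectory

for name in readdir(dir)
    path = joinpath(dir, name)
    kind = isdir(path) ? "directory" : isfile(path) ? "file" : "other"
    println(name, ": ", kind)
end

println(ispath(joinpath(dir, "missing.txt")))   # false: nothing exists at that path
```

Note that `readdir` returns bare names, so each one has to be joined with the directory before it is passed to `isfile` or `isdir`.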

To demonstrate these functions, the following example “walks” through a directory, prints the names of all the files, and
calls itself recursively on all the directories.

In [14]:
function walk(dirname)
    for name in readdir(dirname)
        path = joinpath(dirname, name)
        if isfile(path)
            println(path)
        else
            walk(path)
        end
    end
end

walk (generic function with 1 method)

In [16]:
walk("C:\\users\\st\\OneDrive\\STJuliaLessons")

C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\create_a_caesar_cipher_solutions-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch00-JupyterNB-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch03rev-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch04rev-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch04z5-Plotting-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch05-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch06-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch06z56S-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch07-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints\MyThinkJulia-Ch08-checkpoint.ipynb
C:\users\st\OneDrive\STJuliaLessons\.ipynb_checkpoints

C:\users\st\OneDrive\STJuliaLessons\select-notebooks\110 Multiple dispatch.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\115 Multiple dispatch examples.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\120 Types.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\130 OneHot Vector.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\140 ModInt.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\150 Iterators.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\160 AutoDiff.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\170 Basic linear algebra.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\180 Factorizations and other fun.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\calculate_pi.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\compressing_an_image.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-notebooks\compressing_an_image_solutions.ipynb
C:\users\st\OneDrive\STJuliaLessons\select-noteboo

In [17]:
?walkdir

search: walkdir



```
walkdir(dir; topdown=true, follow_symlinks=false, onerror=throw)
```

Return an iterator that walks the directory tree of a directory. The iterator returns a tuple containing `(rootpath, dirs, files)`. The directory tree can be traversed top-down or bottom-up. If `walkdir` encounters a `SystemError` it will rethrow the error by default. A custom error handling function can be provided through `onerror` keyword argument. `onerror` is called with a `SystemError` as argument.

# Examples

```julia
for (root, dirs, files) in walkdir(".")
    println("Directories in $root")
    for dir in dirs
        println(joinpath(root, dir)) # path to directories
    end
    println("Files in $root")
    for file in files
        println(joinpath(root, file)) # path to files
    end
end
```

```julia-repl
julia> mkpath("my/test/dir");

julia> itr = walkdir("my");

julia> (root, dirs, files) = first(itr)
("my", ["test"], String[])

julia> (root, dirs, files) = first(itr)
("my/test", ["dir"], String[])

julia> (root, dirs, files) = first(itr)
("my/test/dir", String[], String[])
```
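
The hand-written `walk` function and the built-in `walkdir` cover the same ground. As a sketch, the function below uses `walkdir` to collect file paths into an array instead of printing them. `listfiles` is a hypothetical helper name, and the scratch tree exists only for this example.

```julia
# Collect the path of every file below `dirname` using walkdir.
function listfiles(dirname)
    paths = String[]
    for (root, dirs, files) in walkdir(dirname)
        for file in files
            push!(paths, joinpath(root, file))
        end
    end
    return paths
end

# Try it on a scratch tree containing a.txt and sub/b.txt.
top = mktempdir()
mkpath(joinpath(top, "sub"))
touch(joinpath(top, "a.txt"))
touch(joinpath(top, "sub", "b.txt"))

found = listfiles(top)
foreach(println, found)
```

Returning an array rather than printing makes the result easier to filter or count afterwards.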


### Catching Exceptions

A lot of things can go wrong when you try to read and write files. If you try to open a file that doesn’t exist, you get a SystemError:

In [18]:
fin = open("bad_file")

SystemError: opening file "bad_file": No such file or directory

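You could guard against such errors with checks like `ispath` and `isfile`, but testing for every possibility takes a lot of code. It is easier to go ahead and try, and deal with problems if they happen, which is exactly what the `try` statement does. The syntax is similar to an `if` statement: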

In [21]:
f = open("output.txt")
try
    line = readline(f)
    println(line)
catch exc
    println("Something went wrong: $exc")
finally
    close(f)
end

150I have spotted 42 camels.


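The `finally` clause provides a way to run some code when a given block of code exits, regardless of how it exits. In the example above, `close(f)` is called whether or not `readline` throws, so the file is always closed.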
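
In the example above, `open` sits outside the `try` block, so a missing file would still raise an error. One possible way to handle both cases is sketched below. `firstline` is a hypothetical helper, not part of Julia, and returning `nothing` on failure is just one design choice among several.

```julia
# Return the first line of a file, or `nothing` if the file cannot be opened.
# `firstline` is a hypothetical helper written for this example.
function firstline(path)
    f = try
        open(path)
    catch exc
        println("Could not open $path: ", typeof(exc))
        return nothing
    end
    try
        return readline(f)
    finally
        close(f)          # runs whether readline succeeds or throws
    end
end

# A scratch file to read from, plus a path where nothing exists.
dir = mktempdir()
good = joinpath(dir, "lines.txt")
write(good, "one\ntwo\n")

println(firstline(good))
println(firstline(joinpath(dir, "bad_file")))
```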

### Command Objects

Most operating systems provide a command-line interface, also known as a shell. Shells usually provide commands to navigate the file system and launch applications. For example, in Unix you can change directories with `cd`, display the contents of a directory with `ls`, and launch a web browser by typing (for example) `firefox`. Any program that you can launch from the shell can also be launched from Julia using a command object:

In [22]:
cmd = `echo hello`

`echo hello`

Backticks are used to delimit the command.
The function run executes the command:

In [23]:
run(cmd)

hello


Process(`echo hello`, ProcessExited(0))

If you want to capture the output of the external command, use `read` instead:

In [24]:
a = read(cmd, String)

"hello\n"
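
Values can be interpolated into a command with `$`, much as in strings. The sketch below assumes a Unix-like system where `echo` is an executable on the `PATH`; the variable `word` is only illustrative.

```julia
# Assumes a Unix-like system where `echo` is an executable on the PATH.
word = "hello world"            # contains a space
cmd = `echo $word`              # interpolated as a single argument; no quoting needed
println(cmd.exec)               # the program and its arguments as an array of strings
out = read(cmd, String)         # run the command and capture its output
println(chomp(out))             # chomp removes the trailing newline
```

Because the command is not passed through a shell, the space inside `word` does not split it into two arguments.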

For example, most Unix systems provide a command called `md5sum` or `md5` that reads the contents of a file and computes a “checksum”. You can read about MD5 at https://en.wikipedia.org/wiki/Md5. This command provides an efficient way to check whether two files have the same contents. The probability that different contents yield the same checksum is very small (that is, unlikely to happen before the universe collapses).
You can use a command object to run such a command from Julia and get the result. The cell below uses `CertUtil` on Windows; the commented-out line shows the Unix form:

In [26]:
filename = "output.txt"

#cmd = `md5 $filename`  # for UNIX

cmd = `CertUtil -hashfile $filename MD5`

res = read(cmd, String)

"MD5 hash of output.txt:\r\n5bf98090c23bbd88314ac9bdf319ba1f\r\nCertUtil: -hashfile command completed successfully.\r\n"

### Modules

Julia uses modules to create separate variable workspaces, i.e. new global scopes.

A module starts with the keyword `module` and ends with `end`. Modules prevent naming conflicts between your own top-level definitions and those found in somebody else’s code.

`import` and `using` control which names from other modules are visible, and `export` specifies which of your names are public, i.e. can be used outside the module without being prefixed with the name of the module.

In [1]:
module LineCount

    export linecount

    function linecount(filename)
        count = 0
        for line in eachline(filename)
            count += 1
        end
        count
    end

end

Main.LineCount

The module `LineCount` provides the function `linecount`. After `using .LineCount`, the exported name can be called without a prefix:

In [2]:
using .LineCount
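
As a sketch of how the module might be used, the block below repeats the definition so that it runs on its own, then counts the lines of a scratch file. The file `three.txt` and its contents are made up for the example.

```julia
module LineCount            # same definition as above, repeated so the sketch is self-contained
    export linecount
    function linecount(filename)
        count = 0
        for line in eachline(filename)
            count += 1
        end
        count
    end
end

using .LineCount            # the leading dot refers to a module defined in the current scope

path = joinpath(mktempdir(), "three.txt")   # hypothetical scratch file
write(path, "one\ntwo\nthree\n")

println(linecount(path))             # exported name, usable without a prefix
println(LineCount.linecount(path))   # the qualified name works as well
```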

### Debugging

When you are reading and writing files, you might run into problems with whitespace. These errors can be hard to debug because spaces, tabs and newlines are normally invisible.

The built-in functions `repr` and `dump` can help. `repr` takes any object as an argument and returns a string representation of the object; `dump` prints a representation that also shows the object's type.

In [30]:
s = "1 2\t 3\n 4";
repr(s)

"\"1 2\\t 3\\n 4\""

In [31]:
dump(s)

String "1 2\t 3\n 4"
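
A typical case is a comparison that fails because of invisible trailing characters. In this sketch the strings `a` and `b` are invented examples; the carriage return in `b` is the kind of character a file with Windows line endings might leave behind.

```julia
a = "42 camels"
b = "42 camels \r"          # trailing space and carriage return

println(a == b)             # false, although the two look alike when printed
println(repr(a))
println(repr(b))            # the escapes make the extra characters visible
println(strip(b) == a)      # strip removes leading and trailing whitespace
```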
