# Google Python class in Julia Part 3: Copyspecial

As part of teaching myself Python (after doing so half-heartedly for about a year), I completed Google's Python course. During the same period, I was learning Julia, mostly using its packages (GLM, MixedModels) to check the results of various statistical models. What is nice about Julia is that it combines the best parts of Python, MATLAB, and R. For current purposes, it shares several data structures with Python.

One goal of mine is to make code and analyses portable across platforms and programs. The Google Python course is good for this, as it teaches how to perform basic tasks (file I/O, counts, low-level tokenization) using base Python. Julia is a good language to port this to, as it not only shares data structures with Python but is also designed to be fast, which could be very useful when doing basic NLP-like tasks in batches. So in order to familiarize myself with Julia and learn how to port things, I will implement the exercises in Julia.

## Copyspecial exercise

The goal of the copyspecial exercise is to take a list of files contained in a directory and either copy them to a new directory or place them into a ZIP archive. The first part of the exercise focuses on getting the absolute paths of the files. The second part involves copying the files. The third part focuses on commands to create the ZIP archive.


## Part A: Manipulating file paths
The first part of the exercise involves:  
1. specifying a directory;  
2. printing an error if the files with the correct format are not found;  
3. if the files are found, put them in a list;  
4. print the absolute path for each file;  
5. return the list of file names (with complete path)  

The first step is to define a function that takes a directory as an argument and then implements steps 1 - 5.

In [1]:
function get_special_paths(dir)
    """Prints the absolute paths of each of the 'special' files in 
    the directory """
    source_path = expanduser(dir);
    filenames = readdir(source_path);
    # \. matches a literal dot before the file extension
    special_list = matchall(r"\w+__\w+__\.\w+", join(filenames, " "));
    if length(special_list) == 0
        print("Special files not found in this directory", "\n")
    else
        print("Special files found in this directory", "\n")
    end
    special_paths = ASCIIString[];
    for file in special_list
        # resolve against the searched directory, not the working directory
        full_path = abspath(joinpath(source_path, file));
        push!(special_paths, full_path);
        print(full_path, "\n")
    end
    return special_paths
end

get_special_paths (generic function with 1 method)

Let's break down the above function:  

First, the function is defined with a single argument: the name of the directory to search in. Next, the `filenames` in that directory are read with the `readdir` function, after tilde expansion with `expanduser`. The names are then joined into a single string, and `matchall` extracts the ones that fit the special pattern (a word wrapped in double underscores, as in `xyz__hello__.txt`). After that, an `if-else` statement reports whether any matches were found. Lastly, a new empty `Array` is created, the absolute path of each special file is appended to it, and each path is printed. The function returns the array with the absolute paths of the special files.  
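
In newer Julia releases, `matchall` and `ASCIIString` have been removed, so the function above will not run there unchanged. As a rough sketch of the same steps, assuming Julia 1.x, the matching can be done per file name with `filter` and `occursin`. The names `special_paths_sketch` and `SPECIAL_PATTERN` are illustrative and not part of the original exercise.

```julia
# Sketch for Julia 1.x (an assumption; the post itself targets 0.3.8).
const SPECIAL_PATTERN = r"^\w+__\w+__\.\w+$"   # e.g. xyz__hello__.txt

function special_paths_sketch(dir)
    source = abspath(expanduser(dir))
    # keep only the file names that match the special pattern
    special = filter(name -> occursin(SPECIAL_PATTERN, name), readdir(source))
    if isempty(special)
        println("Special files not found in this directory")
    else
        println("Special files found in this directory")
    end
    # join each name with the searched directory to get its absolute path
    paths = [joinpath(source, name) for name in special]
    foreach(println, paths)
    return paths
end
```

Matching each name separately avoids joining the names into one string, so a file name containing a space cannot be split into a false match.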

Now to test the results:

In [2]:
get_special_paths("~/GitHub/google-python-julia/copyspecial");

Special files found in this directory
/Users/julian/GitHub/google-python-julia/copyspecial/xyz__hello__.txt
/Users/julian/GitHub/google-python-julia/copyspecial/zz__something__.jpg


In [3]:
get_special_paths("~/GitHub/model-comparison-r-julia");

Special files not found in this directory


## Part B: copying files
The next step is to create a function to copy the files. This function will take two directories as arguments, a source and a target, and then copy the files from source to target. If the target directory does not exist, then it will be created.

In [4]:
function copy_to(source_dir, target_dir)
    """ Copies the special files from source_dir into target_dir
    If target_dir does not exist, it is created with a warning.
    Calls get_special_paths() to get the list of files and then proceeds."""
    special_paths = get_special_paths(source_dir);
    target_path = expanduser(target_dir);
    if isdir(target_path)
        for dir_file in special_paths
            run(`cp $dir_file $target_path`)
        end
    else
        mkpath(target_path)
        print("Target did not exist, so it was created prior to copying.", "\n")
        for dir_file in special_paths
            run(`cp $dir_file $target_path`)
        end
    end
end


copy_to (generic function with 1 method)

Breaking down the above function:  

The first call is to the `get_special_paths` function to get the full paths of the special files. Next, an `if-else` statement tests whether `target_dir` exists. If the test evaluates to `true`, each file in `special_paths` is copied into the target directory. If it evaluates to `false`, the directory is created, a message is printed saying so, and then the files are copied.  

I used Julia's ability to call shell commands (the `run` function) because I kept getting an `EISDIR` error when I called the built-in `cp` function. The `$` symbol interpolates the value of each variable into the command. This might be handled better in release 0.4.0 with the `recursive` flag.  
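
One possible explanation for the `EISDIR` error, offered here as an assumption rather than a diagnosis, is that the built-in `cp` expects the destination to name the file itself rather than the directory that will contain it. The sketch below, written for Julia 1.x, builds the full destination path with `joinpath` and `basename` before copying. The helper name `copy_files_sketch` and the use of the `force` keyword are not from the original code.

```julia
# Sketch for Julia 1.x; copy_files_sketch is a hypothetical helper name.
function copy_files_sketch(paths, target_dir)
    target = expanduser(target_dir)
    mkpath(target)                      # does nothing if the directory exists
    for src in paths
        # the destination names the file itself, not just its directory
        dst = joinpath(target, basename(src))
        cp(src, dst; force=true)        # force=true overwrites an existing copy
    end
    return target
end
```

This keeps the copy inside Julia, which would also make the function usable on systems without a `cp` shell command.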

Now to test out the function:

In [5]:
copy_to("~/GitHub/google-python-julia/copyspecial", "~/GitHub/google-python-julia/copyspecial/test-dir")

Special files found in this directory
/Users/julian/GitHub/google-python-julia/copyspecial/xyz__hello__.txt
/Users/julian/GitHub/google-python-julia/copyspecial/zz__something__.jpg


In [6]:
test_location  = expanduser("~/GitHub/google-python-julia/copyspecial/test-dir");
run(`ls $test_location`)

test-zip.zip
xyz__hello__.txt
zz__something__.jpg


The next step is to create a function that takes the files and puts them in a ZIP archive if a specific flag is present.

## Part C: creating a ZIP archive
Although the [ZipFile](https://zipfilejl.readthedocs.org/en/latest/) package can create ZIP archives, the goal was to stick to base Julia as much as possible, so the archive is created with a shell command called from Julia.  

The function structure looks like this:  
1. call `get_special_paths` to get the file names  
2. call the `zip` command using the `run` function  
3. print an error if the archive cannot be created

In [7]:
function zip_to(source_dir, zip_file)
    """ create ZIP archive of the special files.
    Gives zip -j error message if command fails.
    Calls get_special_paths() first to get list of files."""
    special_paths = get_special_paths(source_dir);
    zip_path = expanduser(zip_file)
    try
        run(`zip -j $zip_path $special_paths`)
        print("Creating ZIP archive")
    catch err
        showerror(STDOUT, err, backtrace())
    end
end

zip_to (generic function with 1 method)

Time to parse the function again. First, `get_special_paths` is called to get the file names. Second, the path of the ZIP archive is expanded. Then a `try-catch` statement attempts to create the archive and, if the command fails, prints the error message along with a backtrace.  Time to see this in action:

In [8]:
zip_to("~/GitHub/google-python-julia/copyspecial", "~/GitHub/google-python-julia/copyspecial/test-dir/test-zip.zip")

Special files found in this directory
/Users/julian/GitHub/google-python-julia/copyspecial/xyz__hello__.txt
/Users/julian/GitHub/google-python-julia/copyspecial/zz__something__.jpg
updating: xyz__hello__.txt (deflated 9%)
updating: zz__something__.jpg (deflated 3%)
Creating ZIP archive

In [9]:
run(`ls $test_location`)

test-zip.zip
xyz__hello__.txt
zz__something__.jpg
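
The `zip` call in `zip_to` relies on how Julia interpolates values into backtick commands: an array expands into one argument per element, and each value is passed as a single argument without going through a shell. The sketch below builds such a command without running it and prints its arguments. The paths are made up for illustration, and reading the `exec` field of the command object is assumed to behave as it does in Julia 1.x.

```julia
# Hypothetical paths, used only to show how interpolation expands.
zip_path = "/tmp/test-zip.zip"
special_paths = ["/tmp/xyz__hello__.txt", "/tmp/zz__something__.jpg"]

# Building a command does not execute it; run(cmd) would.
cmd = `zip -j $zip_path $special_paths`

# Each element of the array becomes its own argument.
for arg in cmd.exec
    println(arg)
end
```

Because the array is expanded by Julia rather than by a shell, file names containing spaces stay intact as single arguments.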


## Specifying argument structure
The last thing to do is to specify how the program/script is to behave, based on the presence or absence of flags. The usage looks like this:  

`julia copyspecial.jl --todir dir source`  

-or-  

`julia copyspecial.jl --tozip zip source`.  


If arguments are misspecified, then the script exits (not executed).

In [None]:
# ARGS is an array of strings, so check how many arguments were given
if length(ARGS) != 3
    print("usage: [--todir dir] source or [--tozip zip] source", "\n")
    exit(1)
end

Now to specify what happens when the different flags are present (not executed).

In [None]:
directory = ARGS[3]

# each flag is either true or false, so both are always defined
todir = ARGS[1] == "--todir"
tozip = ARGS[1] == "--tozip"

if todir
    copy_to(directory, ARGS[2])
elseif tozip
    zip_to(directory, ARGS[2])
else
    # no recognized flag: just print the special file paths
    get_special_paths(directory);
end
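
The flag handling can also be collected into one function that validates the arguments before anything is copied or zipped. This is a sketch for Julia 1.x, not the script's actual code: `parse_args_sketch` and its `(mode, destination, source)` return value are illustrative choices, with `nothing` standing for a usage error.

```julia
# Sketch for Julia 1.x; parse_args_sketch is a hypothetical helper.
# Expected forms: --todir dir source   or   --tozip zip source
function parse_args_sketch(args)
    length(args) == 3 || return nothing
    flag, destination, source = args
    if flag == "--todir"
        return (:todir, destination, source)
    elseif flag == "--tozip"
        return (:tozip, destination, source)
    end
    return nothing   # unknown flag
end

parsed = parse_args_sketch(["--todir", "~/backup", "."])
if parsed === nothing
    println("usage: [--todir dir] source or [--tozip zip] source")
else
    mode, destination, source = parsed
    println(mode, " ", destination, " ", source)
end
```

In a script, `ARGS` would be passed in place of the literal array, and the returned mode would decide whether `copy_to` or `zip_to` is called.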


Source code is in the directory.

# References
Julia By Example. Its error-handling code was used here to print and trace the error. http://samuelcolvin.github.io/JuliaByExample/

In [11]:
versioninfo()

Julia Version 0.3.8
Commit 79599ad (2015-04-30 23:40 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
