# Data workflows and automation

## For loops

Loops allow us to repeat a workflow (or series of actions) a given number of
times or while some condition is true. We could use a loop to automatically
process data that's stored in multiple files (daily values with one file per
year, for example). Loops lighten our workload by performing repeated tasks
without our direct involvement, and they make it less likely that we'll
introduce errors by processing each file by hand.
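
As a minimal sketch, here is a `for` loop over a hypothetical list of animal names (the list `animals` and the loop variable `creature` are illustrative, not taken from a real dataset). The indented body runs once per item, and a body containing only `pass` runs but does nothing.

```python
# A hypothetical list of animal names to loop over
animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']

# The indented body runs once for each item; `creature` holds the current item
for creature in animals:
    print(creature)

# `pass` is a placeholder statement that does nothing
for creature in animals:
    pass

# After the loop finishes, the loop variable still holds the last item
print('The loop variable is now: ' + creature)
```

The last line prints `The loop variable is now: hippo`, which shows that the loop variable outlives the loop itself.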

> ## Challenge - Loops
>
> 1. What happens if we don't include the `pass` statement?
>
> 2. Rewrite the loop so that the animals are separated by commas, not new lines.
> (Hint: you can concatenate strings using a plus sign. For example, if
> `string1 = 'abc'` and `string2 = 'def'`, then `print(string1 + string2)`
> outputs `abcdef`.)



## Automating data processing using for loops

> ## Challenge - Modifying loops
>
> 1. Some of the surveys you saved are missing data: they have null values that
> show up as NaN (Not a Number) in the DataFrames and as empty fields in the text
> files. Modify the for loop so that the entries with null values are not
> included in the yearly files.
>
> 2. Suppose you only want to look at data from every few years. How would you
> modify your loop to generate a data file for only every 5th year, starting
> from 1977?
>
> 3. Instead of splitting out the data by years, a colleague wants to analyze
> each species separately. How would you write a separate CSV file for each
> species?
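
For reference, the kind of loop these challenges modify can be sketched as below. This is a minimal, self-contained version that assumes the survey data sit in a pandas DataFrame with a `year` column; the small example DataFrame, its other columns, and the temporary output directory are hypothetical stand-ins for the real survey file and output folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the survey data; real data would come from pd.read_csv
surveys_df = pd.DataFrame({
    'year': [1977, 1977, 1978, 1979, 1979],
    'species_id': ['DM', 'NL', 'DM', 'PF', None],
    'weight': [40.0, None, 45.0, 8.0, 30.0],
})

# Write the yearly files into a temporary directory (hypothetical location)
output_dir = tempfile.mkdtemp()

# Loop over each year present in the data and write one CSV file per year
for year in surveys_df['year'].unique():
    # Select only the rows for this year
    surveys_year = surveys_df[surveys_df['year'] == year]

    # Build a file name such as surveys1977.csv
    filename = os.path.join(output_dir, 'surveys' + str(year) + '.csv')
    surveys_year.to_csv(filename, index=False)

print(sorted(os.listdir(output_dir)))
```

Note that the rows containing `None` are still written; removing them is what the first challenge asks for.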

## Building reusable and modular code with functions

Functions are declared following this general structure:

```python
def this_is_the_function_name(input_argument1, input_argument2):
    # The body of the function is indented
    # This function prints the two arguments to screen
    print("The function arguments are:", input_argument1, input_argument2, "this is done inside the function")

    # And returns their product
    return input_argument1 * input_argument2
```
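
Defining a function does not run it; the body only executes when the function is called. A sketch of such a call, with illustrative argument values 2 and 5, stores the returned value in `product_of_inputs`:

```python
def this_is_the_function_name(input_argument1, input_argument2):
    # The body of the function is indented
    # This function prints the two arguments to screen
    print("The function arguments are:", input_argument1, input_argument2, "this is done inside the function")

    # And returns their product
    return input_argument1 * input_argument2


# Call the function and store the value it returns in a variable
product_of_inputs = this_is_the_function_name(2, 5)
print("Their product is:", product_of_inputs, "(this is done outside the function)")
```

This prints `The function arguments are: 2 5 this is done inside the function` from inside the function, then `Their product is: 10 (this is done outside the function)` from the main body.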

> ## Challenge - Functions
>
> 1. Change the values of the arguments in the function and check its output
> 2. Try calling the function by giving it the wrong number of arguments (not 2)
>   or not assigning the function call to a variable (no `product_of_inputs =`)
> 3. Declare a variable inside the function and test to see where it exists (Hint:
>   can you print it from outside the function?)
> 4. Explore what happens when a variable inside the function and a variable
>   outside the function have the same name. What happens to the global variable
>   when you change the value of the local variable?
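
The challenges below refer to two functions, `one_year_csv_writer` and `yearly_data_csv_writer`, that wrap the yearly-file loop. Their definitions are not shown in this section, so the following is only a minimal sketch. It assumes the signatures `one_year_csv_writer(this_year, all_data)` and `yearly_data_csv_writer(start_year, end_year, all_data)`, a pandas DataFrame with a `year` column, and a hypothetical output directory `OUTPUT_DIR`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical output directory; a real workflow might use a folder such as 'data/yearly_files'
OUTPUT_DIR = tempfile.mkdtemp()


def one_year_csv_writer(this_year, all_data):
    """
    Writes a CSV file for data from a given year.

    this_year --- year for which data is extracted
    all_data --- DataFrame with multi-year data
    """
    # Select the rows for the requested year
    surveys_year = all_data[all_data['year'] == this_year]

    # Write the selected rows to a file named after the year
    filename = os.path.join(OUTPUT_DIR, 'function_surveys' + str(this_year) + '.csv')
    surveys_year.to_csv(filename, index=False)


def yearly_data_csv_writer(start_year, end_year, all_data):
    """
    Writes separate CSV files for each year of data.

    start_year --- the first year of data we want
    end_year --- the last year of data we want
    all_data --- DataFrame with multi-year data
    """
    # range() stops before its second argument, so add 1 to include end_year
    for year in range(start_year, end_year + 1):
        one_year_csv_writer(year, all_data)


# Hypothetical example data standing in for the survey DataFrame
surveys_df = pd.DataFrame({'year': [1977, 1978, 1978, 1979],
                           'weight': [40.0, 45.0, 12.0, 8.0]})
yearly_data_csv_writer(1977, 1979, surveys_df)
print(sorted(os.listdir(OUTPUT_DIR)))
```

Splitting the work this way keeps the single-year logic in one place, so the multi-year function only has to decide which years to process.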

> ## Challenge- More functions
>
> 1. Add two arguments to the functions we wrote that take the path of the
>    directory where the files will be written and the root of the file name.
>    Create a new set of files with a different name in a different directory.
> 2. How could you use the function `yearly_data_csv_writer` to create a csv file
>    for only one year? (Hint: think about the syntax for `range`)
> 3. Make the functions return a list of the files they have written. There are
>    many ways you can do this (and you should try them all!): either of the
>    functions can print to screen, either can use a return statement to give back
>    numbers or strings to their function call, or you can use some combination of
>    the two. You could also try using the `os` library to list the contents of
>    directories.
> 4. Explore what happens when variables are declared inside each of the functions
>    versus in the main (non-indented) body of your code. What is the scope of the
>    variables (where are they visible)? What happens when they have the same name
>    but are given different values?
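
The next challenge and the If Statements section refer to a test function `yearly_data_arg_test` whose year arguments default to `None`. Its definition is not included here; a minimal sketch, assuming it takes the DataFrame first and simply returns the start and end years it would use, might look like this:

```python
import pandas as pd


def yearly_data_arg_test(all_data, start_year=None, end_year=None):
    """
    Returns the start and end years that a yearly writer would use.
    Missing years default to the earliest and latest years in the data.
    """
    # If no start year was given, fall back to the earliest year in the data
    if start_year is None:
        start_year = int(all_data['year'].min())
    # If no end year was given, fall back to the latest year in the data
    if end_year is None:
        end_year = int(all_data['year'].max())

    return start_year, end_year


# Hypothetical example data spanning 1977 to 2002
surveys_df = pd.DataFrame({'year': [1977, 1985, 1990, 2002]})

print(yearly_data_arg_test(surveys_df))              # (1977, 2002)
print(yearly_data_arg_test(surveys_df, 1988, 1993))  # (1988, 1993)
```

Using `None` as the default lets the function tell "no value given" apart from any real year, and the `int()` calls convert the pandas results into plain Python integers.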

> ## Challenge - Variables
>
> 1. What type of object corresponds to a variable declared as `None`? (Hint:
> create a variable set to `None` and use the function `type()`)
>
> 2. Compare the behavior of the function `yearly_data_arg_test` when the
> arguments have `None` as a default and when they do not have default values.
>
> 3. What happens if you only include a value for `start_year` in the function
> call? Can you write the function call with only a value for `end_year`? (Hint:
> think about how the function must be assigning values to each of the arguments;
> this is related to the need to put the arguments without default values before
> those with default values in the function definition!)


## If Statements

The body of the test function `yearly_data_arg_test` now has two conditionals
(if statements) that check the values of `start_year` and `end_year`. If
statements execute a segment of code when some condition is met. They commonly
look something like this:

```python
a = 5
if a < 0:  # Meets first condition?
    # if a IS less than zero
    print('a is a negative number')
elif a > 0:  # Did not meet first condition. Meets second condition?
    # if a ISN'T less than zero and IS more than zero
    print('a is a positive number')
else:  # Met neither condition
    # if a ISN'T less than zero and ISN'T more than zero
    print('a must be zero!')
```
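
Python checks the conditions from top to bottom and runs only the first branch whose condition is true. The sketch below wraps the same chain in a hypothetical function `describe_sign` so it can be tried on several values in a loop.

```python
def describe_sign(a):
    """Return a description of whether a number is negative, positive, or zero."""
    if a < 0:
        return 'a is a negative number'
    elif a > 0:
        return 'a is a positive number'
    else:
        return 'a must be zero!'


# Only the first branch whose condition is true runs for each value
for value in [-3, 0, 5]:
    print(value, '->', describe_sign(value))
```

This prints one line per value: `-3 -> a is a negative number`, `0 -> a must be zero!`, and `5 -> a is a positive number`.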

> ## Challenge - Modifying functions
>
> 1. Rewrite the `one_year_csv_writer` and `yearly_data_csv_writer` functions to
> have keyword arguments with default values
>
> 2. Modify the functions so that they don't create yearly files if there is no
> data for a given year and display an alert to the user (Hint: use conditional
> statements to do this. For an extra challenge, use `try`
> statements!)
>
> 3. The code below checks whether a directory exists and creates one if it
> doesn't. Add some code to the function that writes out the CSV files so that
> it checks for a directory to write to.
>
> ```python
> import os
>
> if 'dir_name_here' in os.listdir('.'):
>     print('Processed directory exists')
> else:
>     os.mkdir('dir_name_here')
>     print('Processed directory created')
> ```
>
> 4. The code that you have written so far to loop through the years is good;
> however, it is not necessarily reusable with different datasets.
> For instance, what happens to the code if we have additional years of data
> in our CSV files? Using the tools that you learned in the previous activities,
> make a list of all years represented in the data. Then use that list to create
> a loop that processes your data from the earliest year to the latest year.
>
> HINT: you can create a loop with a list as follows: `for year in year_list:`
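
One possible way to combine several of these ideas is sketched below, under stated assumptions: the data sit in a pandas DataFrame with a `year` column, and `write_yearly_files`, its parameters, and the file-name pattern are hypothetical. It builds the list of years from the data, creates the output directory with `os.makedirs(..., exist_ok=True)` (an alternative to the `os.listdir` check above), skips years with no rows, and returns the paths it wrote.

```python
import os
import tempfile

import pandas as pd


def write_yearly_files(all_data, output_dir, file_root='surveys'):
    """
    Write one CSV file per year of data and return the list of paths written.

    all_data --- DataFrame with a 'year' column (assumed)
    output_dir --- directory for the output files (created if missing)
    file_root --- beginning of each file name (hypothetical default)
    """
    # Create the output directory if it does not already exist
    os.makedirs(output_dir, exist_ok=True)

    # List every year represented in the data, from earliest to latest
    year_list = sorted(int(y) for y in all_data['year'].unique())

    written = []
    # Loop from the earliest to the latest year in the list, inclusive
    for year in range(year_list[0], year_list[-1] + 1):
        surveys_year = all_data[all_data['year'] == year]
        if surveys_year.empty:
            # Alert the user instead of writing an empty file
            print('No data for', year, '(no file written)')
            continue
        filename = os.path.join(output_dir, file_root + str(year) + '.csv')
        surveys_year.to_csv(filename, index=False)
        written.append(filename)
    return written


# Hypothetical data with no rows for 1978
surveys_df = pd.DataFrame({'year': [1977, 1979, 1979], 'weight': [40.0, 8.0, 12.0]})
paths = write_yearly_files(surveys_df, tempfile.mkdtemp())
print([os.path.basename(p) for p in paths])
```

With this example data the function prints an alert for 1978 and writes only `surveys1977.csv` and `surveys1979.csv`, so adding more years to the input requires no change to the code.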