# Hands-on Practice Writing DRY Code

**Author:** John Bryan Curtis

**Date:** August 20, 2018

## Import Python Packages

In the questions below, you will be working with **numpy arrays** and **pandas dataframes**.

You will also download files using **urllib.request** and access directories and files on your computer using **os**. Finally, you will create **plots** of your data.

In [27]:
# import the necessary Python packages
import os # package to access local directory and set working directory
import glob # package for creating lists of directory and file names
import urllib.request # package to download data from url
import numpy as np # package to work with numpy arrays
import pandas as pd # package to work with pandas dataframes
import matplotlib.pyplot as plt # plotting package

# print message after packages imported successfully
print("import of packages successful")

import of packages successful


## Get Data

Files to import as numpy arrays:

1. monthly-precip-1988-to-1992.csv from https://ndownloader.figshare.com/files/12807380

2. monthly-precip-1993-to-1997.csv from https://ndownloader.figshare.com/files/12807383

Files to import as pandas dataframes:

1. temp-1991-to-1995-months.csv from https://ndownloader.figshare.com/files/12807389

2. temp-1996-to-2000-months.csv from https://ndownloader.figshare.com/files/12807386

In [28]:
# set the working directory to the `earth-analytics-bootcamp` directory
os.chdir("/Users/JBC/earth-analytics-bootcamp/")

In [29]:
# download .csv containing monthly precip between 1988 and 1992
urllib.request.urlretrieve(url="https://ndownloader.figshare.com/files/12807380",
                           filename="data/monthly-precip-1988-to-1992.csv")

# download .csv containing monthly precip between 1993 and 1997
urllib.request.urlretrieve(url="https://ndownloader.figshare.com/files/12807383",
                           filename="data/monthly-precip-1993-to-1997.csv")

# download .csv containing monthly temperature between 1991 and 1995
urllib.request.urlretrieve(url="https://ndownloader.figshare.com/files/12807389",
                           filename="data/temp-1991-to-1995-months.csv")

# download .csv containing monthly temperature between 1996 and 2000
urllib.request.urlretrieve(url="https://ndownloader.figshare.com/files/12807386",
                           filename="data/temp-1996-to-2000-months.csv")

# print message that data downloads were successful
print("datasets downloaded successfully")

# check that files are in 'data' folder
os.listdir("/Users/JBC/earth-analytics-bootcamp/data/")


datasets downloaded successfully


['avg-monthly-precip.txt',
 'avg-monthly-temp.txt',
 'avg-precip-months-seasons.csv',
 'avg-temp-months-seasons.csv',
 'boulder-precip-1996-to-2006-months.csv',
 'boulder-precip-2007-to-2017-months-seasons.csv',
 'boulder-temp-2004-to-2009.csv',
 'boulder-temp-2010-to-2014.csv',
 'boulder-temp-2015.txt',
 'boulder-temp-2016.txt',
 'boulder-temp-2017.txt',
 'monthly-precip-1988-to-1992.csv',
 'monthly-precip-1993-to-1997.csv',
 'monthly-precip-2002-2013.csv',
 'months.txt',
 'precip-2002-2013-months-seasons.csv',
 'snow-2007-to-2017-months-seasons.csv',
 'snow-2007-to-2017.csv',
 'temp-1991-to-1995-months.csv',
 'temp-1996-to-2000-months.csv']
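
The four download calls above differ only in their URL and file name, so one way to keep this cell DRY is to loop over (file name, URL) pairs. The sketch below reuses the URLs and file names listed in this section. The names `datasets`, `download_plan`, and `download_all` are hypothetical helpers, and the code assumes a `data` directory already exists in the working directory.

```python
import os
import urllib.request

# file names and URLs as listed in the Get Data section
datasets = {
    "monthly-precip-1988-to-1992.csv": "https://ndownloader.figshare.com/files/12807380",
    "monthly-precip-1993-to-1997.csv": "https://ndownloader.figshare.com/files/12807383",
    "temp-1991-to-1995-months.csv": "https://ndownloader.figshare.com/files/12807389",
    "temp-1996-to-2000-months.csv": "https://ndownloader.figshare.com/files/12807386",
}


def download_plan(datasets, data_dir="data"):
    """Return a list of (url, local path) pairs, one per dataset."""
    return [(url, os.path.join(data_dir, name)) for name, url in datasets.items()]


def download_all(datasets, data_dir="data"):
    """Download every dataset into data_dir (requires network access)."""
    for url, path in download_plan(datasets, data_dir):
        urllib.request.urlretrieve(url=url, filename=path)
        print("downloaded", path)
```

Calling `download_all(datasets)` would then stand in for the four separate `urlretrieve` calls, and adding a fifth file would only mean adding one entry to the dictionary.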

In [30]:
# import precip data into new numpy arrays

precip_1988_to_1992 = np.loadtxt(fname="/Users/JBC/earth-analytics-bootcamp/data/monthly-precip-1988-to-1992.csv", delimiter=",")

precip_1993_to_1997 = np.loadtxt(fname="/Users/JBC/earth-analytics-bootcamp/data/monthly-precip-1993-to-1997.csv", delimiter=",")


print(precip_1988_to_1992)
print("")
print(precip_1993_to_1997)

[[0.4  1.14 2.53 1.48 3.7  0.7  0.71 1.33 2.02 0.03 0.75 2.16]
 [1.19 1.27 0.97 1.95 2.68 2.93 1.43 1.63 3.54 1.4  0.09 1.54]
 [1.04 1.32 4.55 2.16 1.73 0.39 4.23 1.13 1.84 0.96 1.6  0.75]
 [1.05 0.15 0.43 2.41 2.9  3.59 3.11 2.08 1.21 0.93 3.3  0.01]
 [0.67 0.   5.17 0.46 1.7  0.96 1.13 3.08 0.02 0.79 2.56 0.84]]

[[0.25 0.9  2.15 2.56 1.73 3.38 1.4  1.04 3.32 2.42 2.17 0.55]
 [0.86 1.37 1.61 3.46 1.35 0.93 0.35 2.56 0.54 1.02 2.25 0.49]
 [0.64 1.53 1.21 5.45 9.59 4.03 0.72 1.45 2.96 0.59 1.51 0.25]
 [1.89 0.29 2.16 1.49 4.63 2.77 1.96 0.63 3.48 0.28 1.43 0.37]
 [0.87 1.83 0.91 5.77 2.19 3.69 1.14 5.27 1.92 2.7  1.52 0.68]]


In [31]:
# import temp data into new pandas dataframes

temp_1991_to_1995 = pd.read_csv("/Users/JBC/earth-analytics-bootcamp/data/temp-1991-to-1995-months.csv")

temp_1991_to_1995

Unnamed: 0,Year,January,February,March,April,May,June,July,August,September,October,November,December
0,1991,29.9,40.9,42.8,47.8,58.2,66.6,70.5,69.2,61.7,52.1,36.8,35.3
1,1992,35.9,40.6,43.3,54.3,59.1,62.9,68.3,66.3,64.4,54.1,34.1,29.2
2,1993,28.3,30.6,42.4,47.6,57.5,64.5,69.5,67.3,58.8,48.7,35.6,35.4
3,1994,35.5,31.9,43.9,47.6,60.8,70.0,71.2,70.9,65.0,50.6,36.6,36.1
4,1995,34.5,38.3,42.1,45.1,50.9,62.4,70.5,74.0,60.4,50.5,45.0,36.3


In [32]:
# import temp data into new pandas dataframes

temp_1996_to_2000 = pd.read_csv("/Users/JBC/earth-analytics-bootcamp/data/temp-1996-to-2000-months.csv")

temp_1996_to_2000

Unnamed: 0,Year,January,February,March,April,May,June,July,August,September,October,November,December
0,1996,29.7,37.7,37.9,50.4,58.9,66.9,71.5,69.5,60.8,53.1,40.6,36.5
1,1997,31.3,32.8,45.5,42.8,57.4,66.5,71.4,68.7,64.0,52.7,37.9,33.9
2,1998,36.5,36.4,38.7,46.5,58.8,62.1,72.8,70.7,67.1,50.4,44.0,32.2
3,1999,36.4,42.1,46.0,44.5,55.6,64.8,73.5,69.3,58.5,51.9,48.0,36.9
4,2000,36.4,41.0,42.9,51.2,61.0,67.4,74.7,73.0,63.1,49.6,31.4,31.2


## Question 1: Use Indexing to Select from Numpy Array

Select the second row of data (including all columns) from the numpy array containing the data for 1988 to 1992, and save it to a new numpy array.

In [33]:
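# index 1 is the second row; assuming the rows are ordered by year starting at 1988,
# this row would hold the 1989 values even though the variable is named precip_1988
precip_1988 = precip_1988_to_1992[1:2,]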

precip_1988

array([[1.19, 1.27, 0.97, 1.95, 2.68, 2.93, 1.43, 1.63, 3.54, 1.4 , 0.09,
        1.54]])
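
The output above has two pairs of brackets because a slice such as `1:2` keeps the row axis, whereas a single integer index drops it. The following is a minimal sketch of that difference on a small hypothetical array (the values and the names `precip`, `row_slice`, and `row_index` are made up for illustration).

```python
import numpy as np

# small hypothetical array: 3 rows (years) by 4 columns (months)
precip = np.array([[0.4, 1.1, 2.5, 1.5],
                   [1.2, 1.3, 1.0, 2.0],
                   [1.0, 1.3, 4.6, 2.2]])

row_slice = precip[1:2, :]  # a slice keeps the row axis, so the result is 2-D
row_index = precip[1, :]    # an integer index drops the row axis, so the result is 1-D

print(row_slice.shape)  # (1, 4)
print(row_index.shape)  # (4,)
```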

## Question 2: Write a Conditional Statement to Check Dimensions of Numpy Array

Write a conditional statement that checks whether the numpy array created in the previous question (i.e. the selection) is a one-dimensional numpy array.

In [34]:
if precip_1988.ndim == 1:
    print("precip_1988 is a one-dimensional array")
    
else:
    print("precip_1988 is NOT a one-dimensional array")

precip_1988 is NOT a one-dimensional array


## Question 3: Expand Conditional Statement to Execute Different Code

Modify your conditional statement from the previous question, so that your if and else statements execute different code, not just printing messages.

For the if statement, rather than printing a message, print the shape of the numpy array from the previous question (i.e. the selection).

For the else statement, rather than printing a message, include the following code lines to be executed (i.e. if the array is not one-dimensional):

arrayname_1d = arrayname.flatten()

print(arrayname_1d.shape)

In [35]:
if precip_1988.ndim == 1:
    print(precip_1988.shape)
    
else:
    precip_1988_1d = precip_1988.flatten()
    print(precip_1988_1d.shape)
    print("precip_1988_1d has", precip_1988_1d.ndim, "dimension(s)")

(12,)
precip_1988_1d has 1 dimension(s)
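
If the same check is needed for more than one array, the conditional can be wrapped in a function so it is written only once. This is a sketch under that assumption; `to_one_dimension` and `selection` are hypothetical names, not part of the activity.

```python
import numpy as np


def to_one_dimension(array):
    """Return the array unchanged if it is 1-D, otherwise a flattened copy."""
    if array.ndim == 1:
        return array
    return array.flatten()


# hypothetical 2-D selection with shape (1, 3)
selection = np.array([[1.19, 1.27, 0.97]])
selection_1d = to_one_dimension(selection)

print(selection_1d.shape)  # (3,)
```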


## Question 4: Write a Conditional Statement to Check Dimensions of Two Numpy Arrays

Manually create a one-dimensional numpy array that contains the month names (i.e. January to December).

Write a conditional statement to check that this new array for month names has the same shape as the numpy array from the previous question (i.e. the selection).

In [36]:
months_np = np.array(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])

months_np

array(['January', 'February', 'March', 'April', 'May', 'June', 'July',
       'August', 'September', 'October', 'November', 'December'],
      dtype='<U9')

In [37]:
if months_np.shape == precip_1988_1d.shape:
    
    print("These arrays have the same shape and can be plotted together")
    
else:
    print("These arrays do NOT have the same shape and cannot be plotted together")

These arrays have the same shape and can be plotted together


## Question 5: Practice Pseudo Coding

**Reflect on your conditional statement from the previous question.**

**Write a sentence or two on how you could expand on your conditional statement from the previous question to create a plot from the two numpy arrays if they do indeed have the same shape.**

**Hint: what did you do in Question 3 to expand on your conditional statement?**

**A:** The *goal* is to create a plot of precipitation by month only if the precip and month arrays have the same shape. I would *code* this with a conditional statement that plots precip by month if the arrays have the same shape and prints a message instead if they do not. The `plt` plotting functions would be called inside the `if` branch, so they run only when the condition is True.
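
One possible way to turn this pseudocode into code is sketched below. It assumes matplotlib is available as imported at the top of the activity. The function name `plot_if_same_shape`, the short example arrays, and the axis labels are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_if_same_shape(labels, values):
    """Draw a bar plot only when labels and values have the same shape."""
    if labels.shape == values.shape:
        fig, ax = plt.subplots()
        ax.bar(labels, values)
        ax.set(xlabel="Month", ylabel="Precipitation")
        return ax
    else:
        print("These arrays do NOT have the same shape and cannot be plotted together")
        return None


# hypothetical short arrays standing in for the month names and precip values
month_labels = np.array(["Jan", "Feb", "Mar"])
precip_values = np.array([1.19, 1.27, 0.97])

ax = plot_if_same_shape(month_labels, precip_values)
```

Returning the axes object (or `None`) lets the calling code decide whether to keep customizing the plot.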

## Question 6: Loop on Pandas Dataframes

Write a loop to run the info() method on the two pandas dataframes that you imported in this activity, and print the results.

In [38]:
# create a list of the pandas dataframes
pandas_list = [temp_1991_to_1995 , temp_1996_to_2000]

for i in pandas_list:
    
    # info() prints its own summary and returns None, so print() also emits a "None" line
    print(i.info())
    print("")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 13 columns):
Year         5 non-null int64
January      5 non-null float64
February     5 non-null float64
March        5 non-null float64
April        5 non-null float64
May          5 non-null float64
June         5 non-null float64
July         5 non-null float64
August       5 non-null float64
September    5 non-null float64
October      5 non-null float64
November     5 non-null float64
December     5 non-null float64
dtypes: float64(12), int64(1)
memory usage: 600.0 bytes
None

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 13 columns):
Year         5 non-null int64
January      5 non-null float64
February     5 non-null float64
March        5 non-null float64
April        5 non-null float64
May          5 non-null float64
June         5 non-null float64
July         5 non-null float64
August       5 non-null float64
September    5 non-null float64
Octo
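
Because `info()` writes its summary directly and returns `None`, a loop that needs to keep the text has to capture it another way. The sketch below assumes that goal and uses the `buf` parameter of `DataFrame.info` to collect each summary in a dictionary. The dataframes are small stand-ins built from the first two January values of each temperature file, and the names `frames` and `summaries` are hypothetical.

```python
import io
import pandas as pd

# small stand-ins for the two temperature dataframes
frames = {
    "temp_1991_to_1995": pd.DataFrame({"Year": [1991, 1992], "January": [29.9, 35.9]}),
    "temp_1996_to_2000": pd.DataFrame({"Year": [1996, 1997], "January": [29.7, 31.3]}),
}

summaries = {}
for name, df in frames.items():
    buffer = io.StringIO()
    df.info(buf=buffer)  # write the summary into the buffer instead of printing it
    summaries[name] = buffer.getvalue()
    print(name)
    print(summaries[name])
```

Using a dictionary rather than a list also labels each summary with the name of the dataframe it came from.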

## Question 7: Loop on Columns in Pandas Dataframes

Write a loop to run the .describe() method on each column in the pandas dataframe containing the data for 1996 to 2000.

In [39]:
# Set year as the index for temp_1996_to_2000 (only describe month columns below)

temp_1996_to_2000 = temp_1996_to_2000.set_index("Year")

# Create a list of column names for the 1996 to 2000 temperature data (pandas dataframe)

months = list(temp_1996_to_2000)

# For loop to calculate summary stats for each month (column) across multiple years (row)

for i in months:
    
    print(temp_1996_to_2000[[i]].describe())
    #print("")
    

         January
count   5.000000
mean   34.060000
std     3.298939
min    29.700000
25%    31.300000
50%    36.400000
75%    36.400000
max    36.500000
        February
count   5.000000
mean   38.000000
std     3.724916
min    32.800000
25%    36.400000
50%    37.700000
75%    41.000000
max    42.100000
           March
count   5.000000
mean   42.200000
std     3.760319
min    37.900000
25%    38.700000
50%    42.900000
75%    45.500000
max    46.000000
           April
count   5.000000
mean   47.080000
std     3.650616
min    42.800000
25%    44.500000
50%    46.500000
75%    50.400000
max    51.200000
           May
count   5.0000
mean   58.3400
std     1.9995
min    55.6000
25%    57.4000
50%    58.8000
75%    58.9000
max    61.0000
            June
count   5.000000
mean   65.540000
std     2.157081
min    62.100000
25%    64.800000
50%    66.500000
75%    66.900000
max    67.400000
            July
count   5.000000
mean   72.780000
std     1.391761
min    71.400000
25%    71.50000
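
The loop is good practice, but `describe()` called on the whole dataframe already summarizes every numeric column at once. As a sketch of that alternative, the example below rebuilds only the January and February columns for 1996 to 2000 from the values shown earlier. The names `temps` and `all_stats` are hypothetical.

```python
import pandas as pd

# January and February values for 1996 to 2000, copied from the dataframe shown earlier
temps = pd.DataFrame(
    {"January": [29.7, 31.3, 36.5, 36.4, 36.4],
     "February": [37.7, 32.8, 36.4, 42.1, 41.0]},
    index=pd.Index([1996, 1997, 1998, 1999, 2000], name="Year"),
)

# one call returns a table with one column of summary statistics per month
all_stats = temps.describe()

# individual statistics can then be selected by row label
print(all_stats.loc["mean"])
```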

## Question 8: Write Function to Summarize Numpy Array Using Axes

Write a function that calculates the mean across columns of a numpy array.

In [40]:
# function to calculate mean for each month (column) of precip in 1993_to_1997 numpy array

def np_mean(array):
    
    # calculate the mean of each column in a two-dimensional numpy array
    # input: a two-dimensional numpy array (here, rows are years and columns are months)
    # output: a one-dimensional numpy array with one mean per column
    
    mean_column = np.mean(array, axis=0)
    return mean_column
    

In [41]:
# call the mean function on the precip_1993_to_1997 array and save the output to a new array
year_mean_precip_1993_1997 = np_mean(precip_1993_to_1997)

# print data in year_mean_precip_1993_1997
year_mean_precip_1993_1997

array([0.902, 1.184, 1.608, 3.746, 3.898, 2.96 , 1.114, 2.19 , 2.444,
       1.402, 1.776, 0.468])
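
The `axis` argument decides which dimension is collapsed: `axis=0` averages down the rows and returns one value per column, while `axis=1` averages across the columns and returns one value per row. The sketch below illustrates this on a small hypothetical array.

```python
import numpy as np

# hypothetical array: 2 rows (years) by 3 columns (months)
precip = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0]])

mean_per_column = np.mean(precip, axis=0)  # collapses rows: one mean per month
mean_per_row = np.mean(precip, axis=1)     # collapses columns: one mean per year

print(mean_per_column)  # [2. 3. 4.]
print(mean_per_row)     # [2. 4.]
```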

## Question 10: Practice Pseudo Coding

**You have already learned how to save the output from one run of a function (see Question 9). What if you wanted to run the function on multiple numpy arrays?**

**Write a sentence or two on what you would need to know how to do, in order to save the output from a function that is running on multiple arrays in a loop.**

**Hint: think about how you can append values to a list using a loop (i.e. create an empty list that gets values appended to it in the loop).**


**A:** The *goal* here is to calculate the mean precipitation from multiple numpy arrays. *To accomplish this*, I would need to know how to put multiple arrays in a list, call a function within a loop, and then print the combined output of all the function calls. This would require creating an empty list before the loop, so that the output of each function call can be appended to it.
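
A minimal sketch of this idea follows, assuming a column-mean function like the one from Question 8. The two small arrays and the names `precip_a`, `precip_b`, `column_means`, and `combined` are hypothetical.

```python
import numpy as np


def np_mean(array):
    """Return the mean of each column of a two-dimensional numpy array."""
    return np.mean(array, axis=0)


# hypothetical arrays standing in for the two precip arrays
precip_a = np.array([[1.0, 2.0], [3.0, 4.0]])
precip_b = np.array([[5.0, 6.0], [7.0, 8.0]])

# start with an empty list and append one result per loop iteration
column_means = []
for array in [precip_a, precip_b]:
    column_means.append(np_mean(array))

# stack the saved results into one array: one row per input array
combined = np.array(column_means)
print(combined.shape)  # (2, 2)
```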