# Lab 2

## Lab purpose
This lab takes a step back from working with and within ArcGIS to explore Python as a language in and of itself. In this lab, you’ll be asked to solve a variety of problems using Python. Some of these problems will have immediate applied uses, while others will simply be asking you to think computationally – about what you can and cannot solve using Python and how it might be used in a variety of generalized tasks. You will also gain familiarity with the specific syntax of the Python language.

There are a wide variety of articles, guides, tutorials, and reference materials available on the Python language. You’re encouraged to read through many of these and refer to them when you run into difficulty. Make sure you understand why any solution works or you will run into significant difficulties later.

You will turn in a series of Python scripts that you create to solve each problem. You will host them [in the repository here](https://github.com/UWTMGIS/TGIS501_W18/tree/master/lab2). If you are using a notebook, you will enter all of your code here and upload a single .ipynb file named LastName.ipynb. If you are uploading individual scripts, name them LastName_ProblemNumber.py.

**It is strongly recommended that you work through the entirety of Exercise 4 from the Scripting… book before attempting these problems. This lab may be completed using either Python 2 or 3.**

## Problem 1: The Trouble With Turtles

Depending on your age, you may remember playing with a Turtle drawing program in elementary school. The history behind the Turtle is a bit longer (dating back to the 1960s) and “Turtle Graphics” generally refers to a means of drawing vector graphics on a Cartesian plane. You draw by moving an imaginary turtle around the screen. Each turtle (and there can be more than one) has a location, a pen, and an orientation. With simple commands you can draw truly complex shapes.

Follow [this link](http://openbookproject.net/thinkcs/python/english3e/hello_little_turtles.html) and work up through section 3.6 (there are challenge problems below you can do for fun, but perhaps come back to them later).


### Question 1

**Write a script that _asks the user to input a number and then draws a shape with that number of sides_**

If you need help getting keyboard input in Python, take a look [here](http://www.python-course.eu/input.php).

If you are using a notebook for this assignment, you can enter your code in the cell below; otherwise, write your own script.

In [None]:
# import the turtle module
import turtle

# check user input is valid
def checkValue(val):
    # Note: large values *do* work, but go way off screen, 
    # since a fixed length is used for the sides
    # check it can be changed to int, is positive, and is not too large
    return val.isdigit() and 0 < int(val) <= 360

# draw shape with N sides
def drawNSides(sides):
    angle = 360.0 / sides # the angle to turn alex (float division in both Python 2 and 3)
    length = 50 # length of segment
    
    # setup screen
    wn = turtle.Screen()
    wn.bgcolor("purple")
    
    # setup turtle
    alex = turtle.Turtle()
    alex.shape("turtle")
    alex.color("yellow")
    
    # loop to create each side
    alex.forward(length) # create first segment then enter loop
    i = 1 # start at 1 since first segment is already created
    while i < sides:
        alex.left(angle)
        alex.forward(length)
        i += 1
        
    # Wait for user to close window.
    # Note: doesn't seem to exit gracefully in Jupyter,
    # must restart kernel before running again
    wn.exitonclick() 
    
# main program with recursive loop
def drawShapeMain():
    # get number of sides from user
    get_sides = input("\nDraw a shape with 'N' sides, where 'N' is a positive integer (360 max)." +
                      "\nHow many sides do you want? ") 
    
    # run check
    check = checkValue(get_sides)
    
    # draw if check passes, recurse if not
    if check:
        drawNSides(int(get_sides))
    else:
        print("\nInvalid value entered.")
        drawShapeMain()

# run the program
drawShapeMain()
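
The note in the script above points out that a fixed side length pushes many-sided shapes off screen. One possible way around that, sketched here under the assumption that the shape should fit inside a circle of a chosen radius, is to derive the side length from the number of sides. The names `exterior_angle` and `side_length_for_radius`, and the default radius of 100, are hypothetical and not part of the lab.

```python
import math


def exterior_angle(sides):
    """Angle (in degrees) the turtle turns at each corner of a regular polygon."""
    return 360.0 / sides


def side_length_for_radius(sides, radius=100.0):
    """Side length of a regular polygon inscribed in a circle of the given radius."""
    return 2.0 * radius * math.sin(math.pi / sides)


# Print the turn angle and side length for a few polygon sizes.
for n in (3, 4, 6, 360):
    print(n, round(exterior_angle(n), 2), round(side_length_for_radius(n), 2))
```

In `drawNSides`, the fixed `length = 50` could then be replaced with `length = side_length_for_radius(sides)`, so the side shrinks as the number of sides grows.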

## Problem 2: Is GIS the best, and the looming horrors of New England

Problem 2 will have you working with text files to manipulate and analyze them. Here, we'll be working with plain text files and doing some very rudimentary forms of analysis. We'll return to these ideas later in the course using a more sophisticated approach (natural language processing), but for now we're focusing on the basics of opening files, manipulating data, and using flow control to iterate across datasets. 



### Question 3

You have written a long, beautiful ode to GIS and called it GIS_is_the_best.txt. Wow, you're very proud of yourself.
You can find this file, which you have written, [in this repository](https://github.com/UWTMGIS/TGIS501_Files).

You can either read it directly from the web (which we'll cover in a later lab, but you can likely figure out now if you wish) or just clone/download it somewhere local. 

A couple of hints before we go on:
1. Check out the methods .upper() and .lower()
2. The assigned readings for this week covered opening files. Exercise 4 (recommended above) discusses counting words. You can also check out a quick tutorial on text files [here](http://opentechschool.github.io/python-data-intro/core/text-files.html).
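
As a rough illustration of the two hints above, the following minimal sketch counts whitespace-separated tokens and shows what `.upper()` and `.lower()` do. The helper names `count_words` and `count_words_in_file` are hypothetical. Because this version counts every token, including numbers and stray punctuation, its total may differ from a count that filters tokens with a regular expression.

```python
def count_words(text):
    """Count whitespace-separated tokens in a string."""
    return len(text.split())


def count_words_in_file(path):
    """Read a text file and count its whitespace-separated tokens."""
    with open(path) as handle:
        return count_words(handle.read())


sample = "GIS is the best\nGIS IS THE BEST"
print(count_words(sample))

# .upper() and .lower() make comparisons case-insensitive
print("Best".lower() == "BEST".lower())
```

Once the file is downloaded, `count_words_in_file("GIS_is_the_best.txt")` would apply the same count to it.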


In the next cell (or in your own script), write a script that prints the total number of words in the document.


In [124]:
# import the regular expressions module because good programmers are lazy
import re 

# look for one or more alpha characters,
# Not robust, but the text file is not complex
pattern = re.compile("[a-zA-Z]+")

# open the text file and read it, closing the file when done
with open("GIS_is_the_best.txt") as ode:
    odeText = ode.read()

# use re to split text on whitespace characters (might be cleaner than using str.split()?)
odeList = re.split(r"\s", odeText)

# convert to list, only add value if it matches the re pattern
wordList = [item for item in odeList if pattern.match(item) is not None]

# print the number of items in the list
print("Word Count: ", len(wordList))

Word Count:  28177


**That was pretty fun!**

But, don't worry, we have something even more exciting in store. It turns out that you've recently become enamored with the unspeakable horrors and non-Euclidean geometries found in H.P. Lovecraft's prose. You've decided you want to investigate his work _The Shunned House_ to find two things: How many unique words he uses and how many times he uses the word "uncle".

You can find the text file for _The Shunned House_ in the same directory as your ode to GIS, [here](https://github.com/UWTMGIS/TGIS501_Files).

As a hint, you may want to look into the [collections module](https://docs.python.org/2/library/collections.html#collections).
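
To give a sense of what the collections module offers, here is a small sketch using `collections.Counter` on a made-up word list (the sample words are an assumption for illustration, not taken from the story). A `Counter` maps each distinct item to the number of times it appears.

```python
from collections import Counter

# hypothetical sample data, already lowercased
words = ["house", "uncle", "house", "cellar", "uncle", "house"]

counts = Counter(words)

print(len(counts))            # number of distinct words
print(counts["uncle"])        # how often one particular word appears
print(counts.most_common(1))  # the most frequent word and its count
```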


### Question 4

How many unique words does Lovecraft use in _The Shunned House_?

Like before, case does not matter; so, "whisker" and "Whisker" would be the same. Make sure you strip out punctuation, by the by, otherwise you might end up with "whisker.", "whisker?", and "whisker" as separate words!
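
One way to handle both requirements, sketched here as an assumption rather than the expected answer, is to lowercase each token and strip punctuation from its ends before collecting the results in a set. The helpers `normalize` and `unique_words` are hypothetical names. Note that `string.punctuation` covers ASCII punctuation only, so typographic (curly) quotes would need separate handling.

```python
import string


def normalize(token):
    """Lowercase a token and strip ASCII punctuation from both ends."""
    return token.lower().strip(string.punctuation)


def unique_words(text):
    """Return the set of distinct normalized words in a string."""
    cleaned = (normalize(tok) for tok in text.split())
    # drop tokens that were nothing but punctuation
    return set(word for word in cleaned if word)


sample = 'A whisker. "Whisker?" whisker'
print(sorted(unique_words(sample)))
```

Because only the ends of each token are stripped, hyphenated words keep their internal hyphen.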

In [112]:
# import the regular expressions
import re 

# Closest I could get for a word pattern. Accounts for underscores and hyphenated words.
# Thankfully, he only seems to use apostrophes for possessive case, so I didn't worry about
# those vs contractions vs funky abbreviations like 'til
# I used grouping as a (possibly hacky) way to get the words starting with underscores
# without including the underscore.
# Note: the lookahead requires at least two letters, so one-letter words ("A", "I") are skipped.
pattern = re.compile(r"[_]?([A-Z]+[-]?(?=[A-Z])[A-Z]*)")

# open the text file, make all letters uppercase, and split it into a list
with open("shunned_house.txt") as shunned:
    rawList = re.split(r"\s", shunned.read().upper())

# run the pattern once per item
matches = [pattern.match(item) for item in rawList]

# keep the captured word from each match, make a 'set' in order to remove duplicates,
# sort the set (optional, but nice for printing it out)
wordList = sorted(set(m.group(1) for m in matches if m is not None))

# print the number of items in the list
print("Unique word count: ", len(wordList), "\n\nWords:")
for word in wordList:
    print(word)

Unique word count:  2923 

Words:
ABHORRENT
ABIGAIL
ABNORMAL
ABNORMALITY
ABODE
ABOUT
ABOVE
ABROAD
ABRUPTLY
ABSENT
ABSURD
ABSURDITY
ABUTTING
ABYSMAL
ACCEPTED
ACCESS
ACCOMPANIED
ACCORDING
ACCOUNT
ACCOUNTS
ACCUMULATE
ACCURATELY
ACCURSED
ACID
ACROSS
ACTION
ACTIVE
ACTIVELY
ACTUAL
ACTUALLY
ADAPTED
ADDED
ADDITION
ADDITIONAL
ADDITIONS
ADULT
ADVANCE
ADVENT
AFFAIR
AFFECTED
AFFECTION
AFFINITY
AFFIRMATION
AFFORD
AFRAID
AFTER
AFTERNOON
AFTERNOONS
AFTERWARD
AGAIN
AGAINST
AGE
AGES
AGGRESSIVE
AGITATION
AGO
AGREED
AGRICULTURE
AHEAD
AID
AIMLESSLY
AIR
AJAR
ALARMINGLY
ALERT
ALICE
ALIEN
ALIENAGE
ALIVE
ALL
ALLAN
ALLEGED
ALLEGING
ALLOTTED
ALLOWED
ALLUDING
ALLUSION
ALLUSIONS
ALMOST
ALONE
ALONG
ALSO
AM
AMAZON
AMIABLE
AMIDST
AMONG
AMONGST
AMOUNT
AN
ANCIENT
AND
ANDIRONS
ANDROS
ANEMIA
ANGELL
ANGRY
ANN
ANNALS
ANOMALY
ANOTHER
ANTHROPOLOGICAL
ANTHROPOMORPHIC
ANTIQUARIAN
ANTIQUATED
ANTIQUE
ANY
ANYBODY
ANYONE
ANYTHING
APARTMENT
APERTURE
APPALLINGLY
APPARATUS
APPARENT
APPARENTLY
APPEAR
APPEARED
APPLES
APPREHENSION
APPR

OTHER
OTHERS
OTHERWISE
OUR
OUT
OUTBREAK
OUTER
OUTLINES
OUTRANKS
OUTSIDE
OVEN
OVER
OVERHEAD
OVERMANTEL
OVERRUN
OVERWHELMING
OWN
OWNED
OWNER
OWNERS
PAGEANT
PAID
PAIN
PAINTING
PAIR
PALE
PALLOR
PALPABLY
PANELLING
PANES
PANICS
PAPER
PAPERS
PARAPHERNALIA
PARCHED
PARDON
PARIS
PARLIAMENT
PARODIES
PART
PARTIAL
PARTICULAR
PARTICULARLY
PARTLY
PARTS
PASS
PASSAGES
PASSED
PASSERS-BY
PASSING
PAST
PATCH
PATCHES
PATH
PATHETIC
PATIENT
PATTED
PATTERN
PATTERNS
PAUL
PAWTUCKET
PEACE
PEAKED-ROOF
PECULIAR
PECULIARLY
PEDESTRIANS
PEDIMENT
PEELING
PELEG
PENETRATE
PENETRATES
PENETRATING
PEOPLE
PERCEIVED
PERCEPTIBLY
PERCHED
PERHAPS
PERIOD
PERIODS
PERISH
PERISHED
PERMANENT
PERMANENTLY
PERMISSION
PERNICIOUSLY
PERSISTENT
PERSON
PERSONAL
PERSONS
PERSPIRATION
PERTINENT
PERTURBATION
PERTURBING
PERVASIVE
PEST
PESTERING
PHANTASMAL
PHASES
PHEBE
PHENOMENALLY
PHOSPHORESCENCE
PHOSPHORESCENT
PHOTOGRAPHED
PHRASES
PHYSICAL
PHYSICIAN
PICKAX
PICTURES
PICTURESQUE
PICTURESQUENESS
PIECE
PIERCE
PIERCED
PILASTERS
PIT
PLACE
PLACED
PLACE

### Question 5

How many times does Lovecraft use the word "uncle"? Again, case does not matter, and make sure you strip punctuation.

In [116]:
# import regular expressions module
import re

# set pattern to match uppercase uncle
pattern = re.compile('UNCLE')

# create a variable to hold the count
uncles = 0

# open the text file and use re to split on non-word characters,
# which also simplifies the pattern expression
with open("shunned_house.txt") as shunned:
    wordList = re.split(r"\W+", shunned.read().upper())

# check if the word matches, and if so, increment the counter
# Note: match() only anchors at the start of the string, so longer words that
# begin with UNCLE (e.g. UNCLEAN) are counted too; item == "UNCLE" would be exact
for item in wordList:
    # print(item, pattern.match(item)) # test output to look for anomalies
    if pattern.match(item):
        uncles += 1

# print the final count
print("Count: ", uncles)

Count:  39


### Bonus Questions (+1 pt each)

These questions are _meant_ to be hard. I will only give you limited help with these. You will find some questions harder than others, and some more interesting than others. All of the questions are possible.

#### Bonus Question 1
Excluding prepositions and articles ("from", "the", "an", "with", etc.) - what are the five most frequently used words in _The Shunned House_? 

In [115]:
import re # import regular expressions module
from collections import Counter

# create pattern for matching
pattern = re.compile(r"[_]?([A-Z]+[-]?(?=[A-Z])[A-Z]*)")
wordCounter = Counter() # instantiate a counter object

# create list from text file
with open("shunned_house.txt") as shunnedText:
    shunned = re.split(r"\s", shunnedText.read().upper())

# create set of 'banned' words from text file
# that lists articles and prepositions
with open("banned_words.txt") as bannedText:
    banned = set(bannedText.read().upper().split())

for word in shunned:
    # run pattern match
    m = pattern.match(word)

    # if there is a match AND the cleaned word is not in the banned word list,
    # increment the counter for the word
    # (check the captured group, not the raw token, so "THE," is excluded too)
    if m and m.group(1) not in banned:
        wordCounter[m.group(1)] += 1
        
print("{:<10}{:<10}".format("Word","Count")) # heading
for a, b in wordCounter.most_common(5):
    print("{:<10}{:<10}".format(a, b)) # use format() to print in columns! (first time using it this way!)

Word      Count     
AND       435       
WAS       153       
HAD       133       
THAT      133       
IT        123       


#### Bonus Question 2
Write a script that asks for a number, checks to make sure that it is in fact a number, and then finds said number’s square root (within an error of .0001). __Do not use any built-in commands that find square roots (such as sqrt() or x**(1/2)). You must build the script using only multiplication, division, addition, and subtraction (you may also use absolute value).__

Pay attention to how many iterations it takes to solve, and try to minimize that number. (By the way, there is a 'best' solution here; try not to look it up.)
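
As a starting point for thinking about iteration counts, here is a deliberately simple baseline that uses bisection. It is an assumption chosen for illustration and is not the 'best' solution hinted at above. The function name `bisect_sqrt` is hypothetical. It uses only addition, subtraction, multiplication, division, and comparisons, and it returns the number of iterations alongside the estimate so that other approaches can be compared against it.

```python
def bisect_sqrt(x, tolerance=0.0001):
    """Approximate the square root of x >= 0 by bisection.

    Returns (estimate, iterations). The true root always lies between
    low and high, and the loop halves that interval each pass.
    """
    if x < 0:
        raise ValueError("x must be non-negative")

    # the root of x lies in [0, x] when x >= 1 and in [0, 1] when x < 1
    low, high = 0.0, max(1.0, float(x))
    iterations = 0

    while high - low > tolerance:
        mid = (low + high) / 2.0
        if mid * mid > x:
            high = mid  # mid is too large, so the root is in the lower half
        else:
            low = mid   # mid is too small (or exact), so keep the upper half
        iterations += 1

    return (low + high) / 2.0, iterations


estimate, steps = bisect_sqrt(2.0)
print(estimate, steps)
```

Because the interval only halves on each pass, the iteration count grows with the size of the input, which is the cost a better method would reduce.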

Once your lab is done, either as a series of separate scripts with names of the form LastName_QuestionNumber.py or within this document itself (a .ipynb file), upload it to the [lab2](https://github.com/UWTMGIS/TGIS501_W18/tree/master/lab2) area of the repository.