# Lecture 4

## Review
* Using `git` to submit homework
* Questions about prior homework 

## Working with files
  
* Writing and reading text files
  * `open` 
  * writing lines into a text file
  * using the `with` statement
  * `readlines`
  * iterating through a text file 
  * Reading and writing JSON files
  * downloading files from a URL

## Algorithms

* A few tricks
  * Logical short-circuiting
  * Chained comparisons
  * De Morgan's laws
  * string comparisons
  * list and tuple comparisons
  * `zip` function
  * `key` argument in `max`, `min`, and `sorted`
  
* Algorithm examples
  * Swapping values
  * Sieve of Eratosthenes 
  * Greatest Common Divisor 
  * Shoelace formula
  

### See also:
  1. Allen Downey's "Think Python 2": http://greenteapress.com/thinkpython2/thinkpython2.pdf
    * Chapter 6: Fruitful functions
    * Chapter 8: Strings
    * Chapter 10: Lists
    * Chapter 11: Dictionaries
    * Chapter 12: Tuples
    * Chapter 14: Files
    
  1. Deitels' "Python for Programmers" https://www.oreilly.com/library/view/python-for-programmers/9780135231364/
    * Chapter 9: Files
    
  1. Python Tutorial:
    * Chapter 7: Input and Output https://docs.python.org/3.8/tutorial/inputoutput.html
  1. Driscoll's Python 101
    * Chapter 8: Working with Files:  https://python101.pythonlibrary.org/chapter8_file_io.html
  1. Wes McKinney, Python for Data Analysis, 2nd Edition https://learning.oreilly.com/library/view/python-for-data/9781491957653/
    * Chapter 3: Built-in data types, functions, and files
    
    
### The Standard Library
Learn about the Python standard library, i.e. the collection of modules that are distributed with Python: https://docs.python.org/3/library/

For example, review the documentation for modules `math`, `statistics`, `random`, `csv`, `json`, `os`, `sys`, and `datetime`. What functionality do they provide?

Select two modules from the standard library that seem useful to you. In class, describe why you found them useful.

In addition to the standard library, over a hundred thousand other packages can be installed from the Python Package Index (PyPI) using the pip utility.

### Sample data
We will use some sample data in upcoming assignments. 
1. NY Times Covid-19 Deaths https://github.com/nytimes/covid-19-data/blob/master/us-states.csv
2. 1000 US cities https://gist.githubusercontent.com/Miserlou/c5cd8364bf9b2420bb29/raw/2bf258763cdddd704f8ffd3ea9a3e81d25e2c6f6/cities.json
3. 60,000 English words: http://www.mieliestronk.com/corncob_lowercase.txt

Feel free to propose others.

### Practice 
There are many online tutorials and challenges, from beginner to advanced, for practicing problem solving in Python and building up skills.

For example:
* https://learnpython.org 
* Python game https://checkio.org 
* Project Euler: https://projecteuler.net - clever maths
* https://www.101computing.net

# Lecture

## Writing Text files

`open`, `close`, `write`, `with`

In [10]:
f = open('test.txt', 'wt')

In [11]:
f.write('one\n')

4

In [12]:
f.write('two\n')

4

In [13]:
f.close()

In [15]:
!cat test.txt

one
two


In [27]:
f = open('test.txt', 'wt')
f.write('one\n')
f.write('two\n')
f.close()

In [28]:
with open('test.txt', 'wt') as file:
    file.write('one\n')
    file.write('two\n')
print("I'm done")

I'm done


In [29]:
cat test.txt

one
two


## Reading Text files

`with` statement 

writing text files

checking if file exists

exception handling
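
The topics listed above can be sketched in one short cell. This is a minimal illustration, not the code used in class: the file `sample.txt` and the helper `count_lines` are hypothetical names, and the file is created in a temporary directory so the example does not depend on any downloaded data.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory for illustration only
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

with open(path, 'wt') as f:
    f.write('one\ntwo\nthree\n')

# readlines() returns a list of lines; each line keeps its trailing newline
with open(path) as f:
    lines = f.readlines()

# Iterating over the file object reads one line at a time
stripped = []
with open(path) as f:
    for line in f:
        stripped.append(line.rstrip('\n'))


def count_lines(filepath):
    """Return the number of lines in a text file, or None if it does not exist."""
    try:
        with open(filepath) as f:
            return sum(1 for _ in f)
    except FileNotFoundError:
        # Handle the missing file instead of letting the program stop
        return None


print(lines)
print(stripped)
print(os.path.isfile(path), count_lines(path), count_lines(path + '.missing'))
```

Checking `os.path.isfile` first and catching `FileNotFoundError` are two ways to deal with a missing file; the sketch shows both.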

### Download files for analysis

In [30]:
import urllib.request
import os

def download_file(url, filepath):
    if os.path.isfile(filepath):
        print('File already exists')
    else:
        urllib.request.urlretrieve(url, filepath)    

In [31]:
# download cities database
download_file(
    url = 'https://gist.githubusercontent.com/Miserlou/c5cd8364bf9b2420bb29/raw/2bf258763cdddd704f8ffd3ea9a3e81d25e2c6f6/cities.json',
    filepath = 'us-cities.json')

File already exists


In [34]:
# download english words
download_file(
    url = 'http://www.mieliestronk.com/corncob_lowercase.txt',
    filepath = 'english-words.txt')

File already exists


In [35]:
# download NY Covid-19 data
download_file(
    url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv',
    filepath = 'nytimes-covid-19.txt')

File already exists


In [36]:
with open('english-words.txt') as f:
    g = f.read()

In [37]:
g



### A few tricks

#### Inline conditions (ternary operator)
```python
x if cond else y
```
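
As a small illustration of the pattern above, the following sketch uses two hypothetical helper functions, `parity` and `absolute`, that are not part of the lecture code.

```python
def parity(n):
    # Evaluates to 'even' when the condition holds, otherwise to 'odd'
    return 'even' if n % 2 == 0 else 'odd'


def absolute(x):
    # Same idea with numbers: pick x itself or its negation
    return x if x >= 0 else -x


print(parity(4), parity(7), absolute(-3))
```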

#### logical short-circuiting

In [None]:
def compliment(name):
    print(f"You look good, {name}!")
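
One way to see short-circuiting in action is to record which operands actually get evaluated. The sketch below is an assumption about how `compliment` might be combined with `and` and `or`; the helper `check` and the list `calls` are hypothetical and exist only for this illustration.

```python
def compliment(name):
    print(f"You look good, {name}!")


calls = []


def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value


# `and` stops at the first falsy operand, so 'b' is never evaluated
result_and = check('a', False) and check('b', True)

# `or` stops at the first truthy operand, so 'd' is never evaluated
result_or = check('c', True) or check('d', False)

name = ''
# An empty string is falsy, so compliment() is not called here
name and compliment(name)

# A common idiom: fall back to a default when the value is empty
default = name or 'stranger'

print(calls, result_and, result_or, default)
```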

#### chained conditions
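
Python allows comparison operators to be chained, so `low <= x < high` means `(low <= x) and (x < high)` with `x` evaluated only once. A minimal sketch, using the hypothetical helpers `in_range` and `is_teen`:

```python
def in_range(x, low, high):
    # Equivalent to (low <= x) and (x < high)
    return low <= x < high


def is_teen(age):
    return 13 <= age <= 19


print(in_range(5, 0, 10), in_range(10, 0, 10), is_teen(15), is_teen(20))
```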

#### `zip` function

In [None]:
first_names = ["Alice", "Bob", "Caren", "Dave"]
last_names = ["Adler", "Briggs", "Collins", "Dawson"]
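
A few typical uses of `zip` with the two lists above are sketched here. The variable names `full_names`, `lookup`, `pairs`, `firsts`, and `lasts` are introduced only for this example.

```python
first_names = ["Alice", "Bob", "Caren", "Dave"]
last_names = ["Adler", "Briggs", "Collins", "Dawson"]

# Walk through both lists in parallel
full_names = [f"{first} {last}" for first, last in zip(first_names, last_names)]

# Build a dictionary from two aligned lists
lookup = dict(zip(first_names, last_names))

# zip(*pairs) reverses the pairing and yields tuples
pairs = list(zip(first_names, last_names))
firsts, lasts = zip(*pairs)

print(full_names)
print(lookup)
print(firsts, lasts)
```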

#### The `key` argument in `sorted`, `min`, and `max`.

Example: find the longest word. Or sort.

Example: find the word that has the most unique letters.  Or sort.

Example: find the longest word composed of unique letters
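
The three examples above could be approached as follows. This is a sketch on a small made-up word list rather than on `english-words.txt`; the list `words` and the other variable names are assumptions for illustration.

```python
# A small made-up word list, for illustration only
words = ["banana", "kiwi", "grapefruit", "plum", "orange"]

# key=len compares words by their length instead of alphabetically
longest = max(words, key=len)
by_length = sorted(words, key=len)

# The key can be any function; here, the number of distinct letters
most_unique = max(words, key=lambda w: len(set(w)))

# Keep only words without repeated letters, then take the longest one
unique_only = [w for w in words if len(set(w)) == len(w)]
longest_unique = max(unique_only, key=len)

print(longest, by_length, most_unique, longest_unique)
```

Because `sorted` is stable, words of equal length keep their original relative order.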


#### String comparisons
Inequality comparisons
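
Strings are compared character by character using their Unicode code points, which has a few consequences worth seeing. The variable names in this sketch are chosen only for illustration.

```python
# Ordinary alphabetical order for lowercase words
alphabetical = "apple" < "banana"

# A prefix sorts before the longer string
prefix_first = "app" < "apple"

# Uppercase letters have smaller code points than lowercase letters
uppercase_first = "Zebra" < "apple"

# Digits inside strings are compared as characters, not as numbers
digits_as_text = "10" < "9"

# Normalizing the case gives the dictionary order most people expect
case_insensitive = "Zebra".lower() < "apple".lower()

print(alphabetical, prefix_first, uppercase_first, digits_as_text, case_insensitive)
print(ord('Z'), ord('a'))
```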

#### List and tuple comparisons

Problem: determine if a word is sorted.
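
Lists and tuples are compared element by element, much like strings: the first unequal pair of elements decides the result. The sketch below shows the general rule only and leaves the problem above open; all names are illustrative.

```python
# The first difference is at the last position: 3 < 4
element_wise = (1, 2, 3) < (1, 2, 4)

# If one sequence is a prefix of the other, the shorter one is smaller
shorter_prefix = (1, 2) < (1, 2, 0)

# The first elements differ (2 > 1), so the remaining elements are ignored
first_decides = [2] > [1, 99]

# Sorting tuples therefore orders by the first item, then by the second
records = [(2, 'b'), (1, 'z'), (1, 'a')]
ordered = sorted(records)

print(element_wise, shorter_prefix, first_decides, ordered)
```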

In [None]:
help(max)

## Algorithms

### Swapping the values of variables

In [None]:
a, b = 20, 13
...

In [None]:
print(a, b)
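
One common way to swap two values in Python is tuple assignment, the same construct used in the GCD loop later in this lecture. The sketch below contrasts it with the classic temporary-variable approach; `x`, `y`, and `temp` are names introduced only for the comparison.

```python
# Tuple assignment: the right-hand side is evaluated first, then unpacked
a, b = 20, 13
a, b = b, a
print(a, b)

# The same swap written with a temporary variable
x, y = 20, 13
temp = x
x = y
y = temp
print(x, y)
```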

### Is it a prime? 
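Determine whether the number `n` is prime: 
 - check all divisors from 2 to n-1
 - check 2, then only odd divisors from 3 up to the square root of n
 - check 2 and 3, then only divisors of the form 6k-1 and 6k+1 (5, 7, 11, 13, ...) up to the square root of n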
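
The three strategies above can be sketched as three functions. The names `is_prime_naive`, `is_prime_odd`, and `is_prime_6k` are hypothetical, and `math.isqrt` assumes Python 3.8 or later.

```python
import math


def is_prime_naive(n):
    """Try every divisor from 2 to n-1."""
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def is_prime_odd(n):
    """Handle 2 separately, then try odd divisors up to the square root."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True


def is_prime_6k(n):
    """Handle 2 and 3, then try divisors 5, 7, 11, 13, ... in steps of 6."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        # i is of the form 6k-1 and i + 2 is of the form 6k+1
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True


print([n for n in range(30) if is_prime_6k(n)])
```

Each version does less work than the previous one; timing them with `%%timeit` on a large `n` shows the difference.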

In [None]:
%%timeit

n = 10000
for i in range(2, n):
    if n % i == 0:
        break  # found a divisor, so n is not prime

### Sieve of Eratosthenes
* Generate the sequence of prime numbers up to $n$
* https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
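
A minimal sketch of the sieve follows. The function name `primes_up_to` is an assumption made for this example; the approach is the standard one of crossing out multiples of each prime, starting from its square.

```python
def primes_up_to(n):
    """Return the list of all primes p with 2 <= p <= n."""
    if n < 2:
        return []
    # is_prime[i] stays True until i is crossed out as a multiple of a prime
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))
```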

### Greatest common divisor (GCD)
Euclidean algorithm https://en.wikipedia.org/wiki/Euclidean_algorithm

In [None]:
b, a = 30, 75

In [None]:
while b: 
    a, b = b, a % b 
    
print(a)
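
The loop above can be wrapped in a function and checked against `math.gcd` from the standard library. The wrapper name `gcd` is an assumption made for this sketch.

```python
import math


def gcd(a, b):
    """Greatest common divisor of two non-negative integers (Euclidean algorithm)."""
    while b:
        # Replace (a, b) with (b, a mod b) until the remainder is zero
        a, b = b, a % b
    return a


print(gcd(75, 30), math.gcd(75, 30))
```

The order of the arguments does not matter: if `a < b`, the first iteration simply swaps them.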

### Pythagorean triples
Define the function `pythagorian_triples(n=100)` that prints all pairs of positive integer numbers $(a, b)$ such that $c=\sqrt{a^2 + b^2}$ is also a whole number and $1 \le c \le n$. These triplets represent right triangles whose sides are integers.

### Shoelace formula for Computing the Area of a Polygon
* https://en.wikipedia.org/wiki/Shoelace_formula
* https://www.101computing.net/the-shoelace-algorithm/
* https://www.youtube.com/watch?v=iKIpraBC-Nw
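
For reference, the formula can be written as follows for a polygon with vertices $(x_1, y_1), \dots, (x_n, y_n)$ listed in order, with the convention $(x_{n+1}, y_{n+1}) = (x_1, y_1)$. The notation is chosen here for illustration and matches the two sums computed in the code in this section.

$$
A = \frac{1}{2}\left|\sum_{i=1}^{n} \left(x_i\, y_{i+1} - x_{i+1}\, y_i\right)\right|
$$

As a worked check with the vertices $(2,7)$, $(10,1)$, $(8,6)$, $(11,7)$, $(7,10)$ used in this section, the five terms are $-68$, $52$, $-10$, $61$, and $29$, which sum to $64$, so $A = 32$.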

In [None]:
# Polygon (x,y) Coordinates  
A = [2,7]
B = [10,1]
C = [8,6]
D = [11,7]
E = [7,10]
# Define a polygon as a list of vertices (in anticlockwise order)
polygon = [A, B, C, D, E]  

In [None]:
polygon

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
plt.plot(*zip(*(polygon + [polygon[0]])))
plt.grid(True)
plt.axis('equal')
plt.title("What's the area of this polygon?")

In [None]:
#The Shoelace Algorithm from www.101computing.net/the-shoelace-algorithm

def polygonArea(vertices):
    # A function to apply the Shoelace algorithm
    numberOfVertices = len(vertices)
    sum1 = 0
    sum2 = 0
    for i in range(0, numberOfVertices-1):
        sum1 = sum1 + vertices[i][0] *  vertices[i+1][1]
        sum2 = sum2 + vertices[i][1] *  vertices[i+1][0]
    # Add xn.y1
    sum1 = sum1 + vertices[numberOfVertices-1][0]*vertices[0][1]
    # Add x1.yn
    sum2 = sum2 + vertices[0][0]*vertices[numberOfVertices-1][1]   
  
    area = abs(sum1 - sum2) / 2
    return area

In [None]:
polygonArea(polygon)

In [None]:
abs(sum(v[0] * u[1] - v[1] * u[0] for v, u in zip(polygon, polygon[1:]+[polygon[0]])))/2

In [None]:
#Vertices (x,y) Coordinates  
A = [2,7]
B = [10,1]
C = [8,6]
D = [11,7]
E = [7,10]
# Define a polygon as a list of vertices (in anticlockwise order)
polygon = [A,B,C,D,E]  

area = polygonArea(polygon)
print("Polygon Vertices:")
print(polygon)
print("")
print("Area = " + str(area) + "cm2")

# Homework

#### Problem 1. Give a random compliment

Extend the `compliment` program from the lecture to give one of several compliments given at random. You can consult with https://www.verywellmind.com/positivity-boosting-compliments-1717559 for examples. Use at least six different compliments to avoid being repetitive.

**Hint**: Check out function `random.choice` from the standard library.

#### Problem 2.
Write the function `all_equal(sequence)` that takes a sequence (`list` or `tuple` or `str`) and returns `True` if all its elements are equal and `False` otherwise.

In [None]:
def all_equal(sequence):
    ...
    
assert all_equal([1, 1.0, 1, 9/9, 1, True])
assert all_equal('aaaaaaaaaa')
assert all_equal((1,))
assert all_equal([])
assert not all_equal('aAaaaaab')
assert not all_equal(("1", 1, 2))
assert not all_equal([1, 1.001, 1, 1, 1])

#### Problem 3.
Write the function `is_increasing(sequence)` that takes a sequence (`list` or `tuple`) and returns `True` if each of its elements is greater than the preceding element in the sequence and `False` otherwise.

In [None]:
def is_increasing(sequence):
    ...
    
assert is_increasing([])
assert is_increasing((1, 1.2, 2, 3.2, 9, 27))

assert not is_increasing((1, 1.001, 1, 1, 1))

#### Problem 4. 
Define the function `pythagorian_coprimes(n=100)` that prints all pairs of positive integer numbers $(a, b)$ such that $c=\sqrt{a^2 + b^2}$ is also a whole number and $1 \le c \le n$. Include only those triples that are co-prime (do not have any common divisors other than 1). For example, (3, 4, 5) is okay but (30, 40, 50) should be skipped.

**Help**: As a starting example, examine the function `pythagorian_triples` that yields all triples. Modify it to retain only those triplets that are co-prime, i.e. whose `gcd` is 1.

In [None]:
def pythagorian_triples(n=100):
    """
    Generate pythagorean triples: integers a, b, c such that
    a^2 + b^2 = c^2 and c <= n 
    """
    for a in range(2, n):
        for b in range(2, a):
            c = (a*a + b*b)**0.5
            if c > n:
                break
            if c == round(c):
                yield a, b, int(c)

In [None]:
for a, b, c in pythagorian_triples(20):
    print(a, b, c)

#### Problem 5.  Acronyms
Define the function `acronym` that returns a string containing the first letter of each word in its input, capitalized.

In [None]:
def acronym(string):
    ...

assert acronym("University of Saint Thomas") == "UOST"
assert acronym("The Museum of Fine Arts") == "TMOFA"

#### Problem 6. More acronyms 

Modify `acronym` to ignore the words `"of"` and `"the"`.

In [None]:
def acronym(string):
    ...


assert acronym("University of Saint Thomas") == "UST"
assert acronym("The Museum of Fine Arts") == "MFA"

#### 7. Write the code that computes the average length of words in English.
* use `english-words.txt` 
* no need to define a function

In [None]:
with open('english-words.txt') as f:
    words = f.read().split()
    
average_length = ...

print(f"The average English word length is {average_length}.")

#### 8. Count vowels
Write the function `count_vowels` that takes a word and counts how many of its letters are one of `"aeiouy"`. We use a rather loose definition of a vowel here.

In [None]:
def count_vowels(word):
    """
    :param word: a string
    :return: number of letters that are one of "aeiouy"
    """
    ...
    
assert count_vowels('generation') == 5
assert count_vowels('style') == 2

#### 9. Most vowels
Write the code to find the English word with the most vowels:
* Use `english-words.txt` 
* Use your function `count_vowels`
* No need to define a function

In [None]:
with open('english-words.txt') as f:
    words = f.read().split()

#### 10.  `sorted_word` 
Write the function `sorted_word` that takes a word and returns the string made from its letters arranged in alphabetical order, or in reverse alphabetical order when `reverse` is `True`.

In [None]:
def sorted_word(word, reverse=False):
    ...

assert sorted_word('Mississippi') == 'Miiiippssss'
assert sorted_word('pizzazz', True) == 'zzzzpia'

#### 11. Sorted words
Find all English words that are at least 6 letters long and whose letters appear in alphabetical order or in reverse alphabetical order.

* Use `english-words.txt` or the variable `words`
* Use your function `sorted_word`
* No need to define a function

#### 12. Anagrams
Find all pairs of English words that are at least 8 letters in length and are anagrams of each other, i.e. they are made up by re-arranging the same letters:
* Use `english-words.txt` or the variable `words`
* Use your function `sorted_word`
* No need to define a function
* Hint: you might need a loop nested inside another loop or, more efficiently, use the function `itertools.combinations`
* Examples: (behaviourism, misbehaviour), (colonialists, oscillations)