<img src="https://courses.edx.org/asset-v1:ACCA+ML001+2T2021+type@asset+block@acca-logo.jpg" alt="ACCA logo" style="width: 400px;"/>

# Introduction to Python
## Part 5 - Exercises - `Solutions` ✅

* **Course:** __Machine learning with Python for finance professionals__ by ACCA
* **Instructor:** [Coefficient](https://coefficient.ai) / [@CoefficientData](https://twitter.com/CoefficientData)

---

Congratulations, you have now covered the key fundamentals you need to start working with Python! Regular practice is essential for embedding what you have learnt in this module and for improving your programming skills.

This notebook contains exercises for you to test your understanding of the content covered so far. It's wise to first tackle them yourself in order to reinforce what you've learnt. If you hit any errors, copy/paste the error message into Google and see if you can figure out how to solve them yourself.

Even if you feel comfortable solving these exercises, they can be solved in multiple ways, some more efficient than others. We strongly recommend you check out the solutions as there's some neat Python tips & tricks included!

> ### 🚩 Exercise 1
> 
> You have been provided with a list of 200 numbers, each representing the total amount due from an invoice. A payment of exactly £10,000.00 was made for some invoices, but we don't know which ones! You have been tasked with finding the two invoices that sum to 10000.
> 
> In simple terms, **find the two numbers from the list that sum to 10000**.
> 
> For example, suppose your invoices report contained the following:
> ```txt
> 1184.27
> 1283.89
> 3987.02
> 6012.98
> 6298.75
> 7997.77
> ```
>
> In this list, the two invoices that sum to 10000 are 3987.02 and 6012.98.

In [1]:
# Which two invoices sum to 10000?
invoices = [
    3156.98, 5741.76, 6285.81, 2789.09, 9476.63, 9434.73, 7572.64, 321.09, 5779.01, 2317.08,
    736.05, 3585.60, 6089.20, 504.96, 6570.91, 874.59, 1740.26, 6610.48, 3882.67, 111.28,
    4471.35, 2736.32, 5496.16, 1698.49, 9789.85, 9922.64, 727.68, 3162.53, 8538.63, 3279.36,
    8208.15, 1862.54, 9613.07, 3761.58, 6513.15, 7027.79, 4951.12, 1904.82, 3957.28, 446.85,
    7002.08, 6519.58, 644.93, 1665.60, 849.53, 4953.99, 4824.48, 8262.78, 3892.76, 6297.52,
    3787.09, 8789.21, 6595.63, 5833.87, 1315.51, 6394.10, 1713.57, 3647.18, 1478.18, 5155.82,
    7998.52, 2516.10, 2792.20, 8926.61, 8298.95, 9625.43, 475.18, 2370.03, 8105.16, 2377.92,
    4405.81, 4734.28, 2059.39, 8627.45, 1926.49, 1294.41, 1330.06, 5969.54, 2387.68, 7983.72,
    2205.27, 7082.94, 4143.00, 6262.01, 8521.51, 7509.23, 1224.84, 3589.03, 1596.35, 5492.33,
    2720.23, 416.55, 7803.78, 1289.50, 3734.47, 7491.88, 2209.56, 1266.30, 3821.89, 8289.05,
    6228.39, 6742.49, 830.14, 2755.48, 8111.47, 9048.47, 7348.74, 2750.60, 6007.70, 8459.98,
    5507.28, 2026.29, 7537.75, 2447.51, 2517.41, 8800.51, 8968.01, 6834.14, 9249.43, 6043.55,
    2691.57, 605.77, 9884.69, 2431.77, 3564.64, 4502.23, 6769.02, 505.50, 3766.30, 6450.18,
    2314.20, 9167.63, 230.87, 8508.05, 7312.12, 3961.05, 9135.30, 3429.74, 9735.66, 9505.95,
    8800.71, 7712.33, 4726.63, 5078.72, 9740.91, 2513.99, 1502.42, 1353.35, 8438.59, 1647.70,
    7012.61, 5039.55, 7537.23, 1257.30, 7279.77, 8599.24, 5260.81, 9675.02, 812.49, 6093.37,
    8576.88, 7219.41, 260.63, 6685.82, 1812.51, 9739.62, 2139.48, 2621.23, 5950.70, 1020.24,
    301.17, 9211.22, 1047.03, 1291.06, 5965.06, 4364.19, 3487.26, 3665.12, 6001.96, 8575.88,
    7536.67, 2078.74, 8947.27, 2022.55, 4342.10, 8411.86, 4303.16, 6355.92, 6245.04, 1348.40,
    3912.03, 6333.90, 4804.86, 9811.86, 1226.57, 8445.80, 3406.66, 7712.23, 1568.00, 5261.75
]

---

In [2]:
# ✅ SOLUTION

In [3]:
# For simplicity, we'll start by just looking at the first 3 numbers.
first_3 = invoices[:3]

In [4]:
# For loops are OK...
for a in first_3:
    print(a * 100)

315698.0
574176.0
628581.0


In [5]:
results = []
for a in first_3:
    if a > 4000:
        results.append(a * 100)
    
results

[574176.0, 628581.0]

In [6]:
results = [a * 100 for a in first_3 if a > 4000]
results

[574176.0, 628581.0]

In [7]:
# ...but list comprehensions are great!
[a * 100 for a in first_3]

[315698.0, 574176.0, 628581.0]

In [8]:
# Let's try a double comprehension (!)
[(a, b) for a in first_3 for b in first_3]

[(3156.98, 3156.98),
 (3156.98, 5741.76),
 (3156.98, 6285.81),
 (5741.76, 3156.98),
 (5741.76, 5741.76),
 (5741.76, 6285.81),
 (6285.81, 3156.98),
 (6285.81, 5741.76),
 (6285.81, 6285.81)]

In [9]:
# We can be more efficient by skipping pairs we already visited in the opposite order
# (note that each number is still paired with itself)
[(a, b) for a in first_3 for b in first_3[first_3.index(a) :]]

[(3156.98, 3156.98),
 (3156.98, 5741.76),
 (3156.98, 6285.81),
 (5741.76, 5741.76),
 (5741.76, 6285.81),
 (6285.81, 6285.81)]

In [10]:
# Note how we used the .index() method here: it returns the position of the first matching item
invoice_7 = invoices[7]
print("invoice_7 =", invoice_7)

position_of_invoice_7 = invoices.index(invoice_7)
print("position_of_invoice_7 =", position_of_invoice_7)

invoice_7 = 321.09
position_of_invoice_7 = 7
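
One thing to keep in mind: `.index()` returns the position of the *first* match, so calling it inside a comprehension rescans the list for every value and would give the wrong slice if two invoices shared the same amount. A minimal sketch of an alternative, assuming we are happy to use the built-in `enumerate()` (the name `unique_pairs` is just illustrative):

```python
# The first three invoice amounts from the list above
first_3 = [3156.98, 5741.76, 6285.81]

# enumerate() yields (position, value), so we never need .index();
# slicing from i + 1 also skips pairing an invoice with itself
unique_pairs = [
    (a, b)
    for i, a in enumerate(first_3)
    for b in first_3[i + 1 :]
]
print(unique_pairs)
# [(3156.98, 5741.76), (3156.98, 6285.81), (5741.76, 6285.81)]
```

This produces three pairs instead of six, because the self-pairs such as `(3156.98, 3156.98)` are no longer generated.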


In [11]:
# Let's now add the if statement...
[(a, b) for a in invoices for b in invoices[invoices.index(a) :] if a + b == 10000]

[(2720.23, 7279.77)]

In [12]:
invoices.index(2720.23)

90

In [13]:
invoices.index(7279.77)

154
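
The comparison `a + b == 10000` works for this list, but exact equality on floating-point numbers can occasionally miss a match because of rounding. Here is a hedged sketch of a more tolerant version using `itertools.combinations` and `math.isclose` from the standard library. The function name `find_pairs`, the small `sample_invoices` list (taken from the worked example in the exercise text) and the half-penny tolerance are all assumptions made for illustration:

```python
from itertools import combinations
import math

# The six example amounts from the exercise description
sample_invoices = [1184.27, 1283.89, 3987.02, 6012.98, 6298.75, 7997.77]


def find_pairs(amounts, target=10000.0):
    # combinations(amounts, 2) yields every unordered pair exactly once
    # and never pairs an item with itself
    return [
        (a, b)
        for a, b in combinations(amounts, 2)
        # allow a difference of up to half a penny (an assumed tolerance)
        if math.isclose(a + b, target, abs_tol=0.005)
    ]


pairs = find_pairs(sample_invoices)
print(pairs)
# [(3987.02, 6012.98)]
```

You could call `find_pairs(invoices)` on the full list of 200 amounts in the same way.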

> ### 🚩 Exercise 2
> 
> You receive the following email from the Accounts Payable team:
> 
> ```txt
> Great work tracking down those two invoices! We looked into them, and unfortunately they're already accounted for. Perhaps there are THREE invoices that, together, sum up to £10,000?
> ```
> 
> **Find the _three_ invoices that sum to 10,000.00.**

In [14]:
# ✅ SOLUTION

# Which three items add up to 10000?
[
    (a, b, c)
    for a in invoices
    for b in invoices[invoices.index(a) :]
    for c in invoices[invoices.index(b) :]
    if a + b + c == 10000
]

[(6834.14, 1353.35, 1812.51)]
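
Another way to avoid floating-point surprises when adding several amounts is to work in whole pence. The sketch below is one possible approach rather than the course's method: `to_pence` is a hypothetical helper, and `sample` contains the three amounts found above plus two made-up amounts so that the example stays small.

```python
from itertools import combinations


def to_pence(amount):
    # Convert pounds (float) to a whole number of pence (int)
    return round(amount * 100)


# Three amounts from the answer above plus two made-up amounts
sample = [6834.14, 1353.35, 1812.51, 2500.00, 4100.10]
target_pence = to_pence(10000.00)

triples = [
    combo
    for combo in combinations(sample, 3)  # every group of three, once each
    if sum(to_pence(x) for x in combo) == target_pence
]
print(triples)
# [(6834.14, 1353.35, 1812.51)]
```

Because the sums are done on integers, the `==` comparison is exact.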

> ### 🚩 Exercise 3
> 
> We can use Python to count how many times a word is used within a sentence or document. This can be useful for text analysis projects and other types of reporting. 
>
> First, we will provide some tips on handling text and cleaning it up ready for analysis. We will use the following sentence for this task:
>
> <i>"How much wood would a woodchuck chuck if a woodchuck could chuck wood?"</i>

In [15]:
# Create the string variable
sentence = "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"

In [16]:
# Let's handle the capital letters by making everything lowercase
sentence.lower()

'how much wood would a woodchuck chuck if a woodchuck could chuck wood?'

In [17]:
# Remove the question mark
sentence.replace("?", "")

'How much wood would a woodchuck chuck if a woodchuck could chuck wood'

In [18]:
# We can use "method chaining" to do both in one go!
sentence.lower().replace("?", "")

'how much wood would a woodchuck chuck if a woodchuck could chuck wood'

In [19]:
# You can use the .split() method to extract the words from the sentence
sentence.split()

['How',
 'much',
 'wood',
 'would',
 'a',
 'woodchuck',
 'chuck',
 'if',
 'a',
 'woodchuck',
 'could',
 'chuck',
 'wood?']

In [20]:
# Let's put it all together and assign to a variable ready to use in the next step
words = sentence.lower().replace("?", "").split()
words

['how',
 'much',
 'wood',
 'would',
 'a',
 'woodchuck',
 'chuck',
 'if',
 'a',
 'woodchuck',
 'could',
 'chuck',
 'wood']

> **Exercise:** Create a function `count_words()` that takes a single string as an argument (you can call your argument `sentence`), and returns a dictionary containing:
> - Each unique word as a key
> - The number of times that word was used as the value
> 
> _Example: for the phrase `I am who I am` we'd create a dictionary of word counts as follows:_
> 
> ```python
> {"i": 2, "am": 2, "who": 1}
> ```

In [21]:
# The words list is ready to go
print(words)

['how', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if', 'a', 'woodchuck', 'could', 'chuck', 'wood']


In [22]:
# ✅ SOLUTION

In [23]:
# Create a dictionary ready to hold the words (as keys) and their counts (as values)
word_counts = {}

# Loop through the words
for word in words:

    # Is the word in the dictionary yet?
    if word in word_counts:
        # If it is, add +1 to the current count
        word_counts[word] += 1

    else:
        # If it isn't, put it in with a count of 1
        word_counts[word] = 1

print(word_counts)

{'how': 1, 'much': 1, 'wood': 2, 'would': 1, 'a': 2, 'woodchuck': 2, 'chuck': 2, 'if': 1, 'could': 1}


In [24]:
# Put the solution into a function called count_words()

def count_words(sentence):
    words = sentence.lower().replace("?", "").split()
    word_counts = {}
    for word in words:
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1

    return word_counts

In [25]:
# Let's test it out
count_words(sentence)

{'how': 1,
 'much': 1,
 'wood': 2,
 'would': 1,
 'a': 2,
 'woodchuck': 2,
 'chuck': 2,
 'if': 1,
 'could': 1}

In [26]:
# We can simplify our function slightly by using default values with dictionary lookups.

# Let's say we have a dictionary containing everyone we owe money to, and the amount owed.
creditors = {"bill": 100, "charlie": 50, "percy": 50}

# We can check how much we owe someone using a dictionary lookup.
creditors["charlie"]

50

In [27]:
# This will error if you provide someone who isn't in the dictionary.
creditors["george"]

KeyError: 'george'

In [28]:
# Instead, let's supply a default value - this won't break now if we try someone who isn't in the dictionary.
creditors.get("george", 0)

0

In [29]:
# But for keys that ARE present, it works just the same as before.
creditors.get("charlie", 0)

50

In [30]:
# Now let's use .get() with a default value to simplify our previous solution.

def count_words(sentence):
    words = sentence.lower().replace("?", "").split()
    word_counts = {}
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1

    return word_counts

In [31]:
count_words(sentence)

{'how': 1,
 'much': 1,
 'wood': 2,
 'would': 1,
 'a': 2,
 'woodchuck': 2,
 'chuck': 2,
 'if': 1,
 'could': 1}
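
For comparison, Python's standard library includes `collections.Counter`, a dictionary subclass built for exactly this kind of tallying. The sketch below is an alternative to the loop above, not a replacement for understanding it; the function name `count_words_with_counter` is made up for this example.

```python
from collections import Counter


def count_words_with_counter(sentence):
    # Same cleaning steps as before, then let Counter do the tallying
    words = sentence.lower().replace("?", "").split()
    return Counter(words)


counter_result = count_words_with_counter(
    "How much wood would a woodchuck chuck if a woodchuck could chuck wood?"
)
print(counter_result["wood"])      # 2
print(counter_result["elephant"])  # 0 (missing keys default to zero)
```

A `Counter` also offers a `.most_common(n)` method, which returns the `n` most frequent items as a list of `(word, count)` pairs.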

> ### 🚩 Exercise 4
> 
> Repeat your word counting analysis using a text source of your choice. If you're stuck for ideas, you could try a famous speech such as [The Gettysburg Address](https://en.wikipedia.org/wiki/Gettysburg_Address#Text_of_the_Gettysburg_Address) or the [full-text minutes of the Federal Reserve](https://www.federalreserve.gov/monetarypolicy/fomccalendars.htm) or even a whole novel such as [The Great Gatsby](https://www.gutenberg.org/files/64317/64317-h/64317-h.htm)!
> 
> We've included some tips below that you may find useful.

In [32]:
# Tip: You may want to remove more than just question marks. You can break your method chains
#      across multiple lines if you surround the whole chain with round brackets.
(
    sentence.lower()
    .replace("?", "")  # remove ?
    .replace(".", "")  # remove .
    .replace(",", "")  # remove ,
    .split()
)

['how',
 'much',
 'wood',
 'would',
 'a',
 'woodchuck',
 'chuck',
 'if',
 'a',
 'woodchuck',
 'could',
 'chuck',
 'wood']
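
If you find yourself chaining many `.replace()` calls, one possible shortcut is `str.translate()` together with `string.punctuation`, both from the standard library. This is a sketch under some assumptions: `strip_punctuation` is a hypothetical helper, and `string.punctuation` only covers ASCII punctuation (it also removes hyphens and apostrophes, and it does not remove curly quotes or long dashes).

```python
import string


def strip_punctuation(text):
    # Build a translation table that deletes every ASCII punctuation character
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


cleaned_words = strip_punctuation("Hello, world! Is this OK?").lower().split()
print(cleaned_words)
# ['hello', 'world', 'is', 'this', 'ok']
```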

In [33]:
# Tip: You can copy/paste large amounts of text into a single Python string by using multi-line strings.
#      These are specified using a pair of triple-quotes (double or single).

multi_line_string = """This text
spans
multiple lines.
"""

In [34]:
print(multi_line_string)

This text
spans
multiple lines.



In [35]:
# Tip: Here's a function that sorts a dictionary by its values. You can use this to sort your word counts.
def sort_dictionary(input_dictionary, reverse=True):
    return sorted(input_dictionary.items(), key=lambda x: x[1], reverse=reverse)

In [36]:
# Here's how to use it

counts = {'one': 1, 'ten': 10, 'three': 3, 'six': 6}
sort_dictionary(counts)

[('ten', 10), ('six', 6), ('three', 3), ('one', 1)]
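
When several words share the same count, `sort_dictionary()` leaves them in the order they were added to the dictionary. If you would prefer ties to be listed alphabetically, here is a small sketch of one way to do it; the function name `sort_by_count_then_word` and the sample counts are made up for illustration.

```python
def sort_by_count_then_word(input_dictionary):
    # Sort by count (highest first), then alphabetically by word for ties.
    # Negating the count makes the default ascending sort put large counts first.
    return sorted(input_dictionary.items(), key=lambda item: (-item[1], item[0]))


tied_counts = {"wood": 2, "a": 2, "how": 1, "chuck": 2}
sorted_ties = sort_by_count_then_word(tied_counts)
print(sorted_ties)
# [('a', 2), ('chuck', 2), ('wood', 2), ('how', 1)]
```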

---

#### Steps:
1. Adapt your `count_words()` function from the previous exercise so that it strips out punctuation using the code tip provided above.
2. Try out your adapted function with a long-form piece of text of your choice, using triple quotes (`"""`) to store this information in a single multi-line string variable.
3. Find the top 20 words in your chosen text using the `sort_dictionary()` function provided above. (This returns a list, so you can use list slicing to reduce it down to just the top 20.)

In [37]:
# ✅ SOLUTION

def count_words(sentence):
    words = (
        sentence.lower()
        .replace("?", "")  # remove ?
        .replace(".", "")  # remove .
        .replace(",", "")  # remove ,
        .split()
    )
    word_counts = {}
    for word in words:
        word_counts[word] = word_counts.get(word, 0) + 1

    return word_counts

In [38]:
# Gettysburg
sentence = """
Four score and seven years ago our fathers brought forth upon this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate—we can not consecrate—we can not hallow—this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom—and that government of the people, by the people, for the people, shall not perish from the earth.
"""

sort_dictionary(count_words(sentence))[:20]

[('the', 11),
 ('that', 10),
 ('to', 8),
 ('we', 8),
 ('here', 8),
 ('a', 7),
 ('and', 5),
 ('nation', 5),
 ('can', 5),
 ('of', 5),
 ('have', 5),
 ('for', 5),
 ('it', 5),
 ('not', 5),
 ('in', 4),
 ('dedicated', 4),
 ('this', 3),
 ('are', 3),
 ('great', 3),
 ('so', 3)]
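
One subtlety with this text: `.split()` only splits on whitespace, so the words on either side of each long dash (em-dash) stay glued together as a single token. A hedged sketch of one way to handle this is to replace the dash with a space before splitting. The helper name `split_with_dashes` is hypothetical, and the counts shown above were produced *without* this step, so they would change if you applied it.

```python
EM_DASH = "\u2014"  # the long dash character used in the speech


def split_with_dashes(text):
    return (
        text.lower()
        .replace(EM_DASH, " ")  # turn long dashes into spaces so words separate
        .replace(".", "")  # remove .
        .replace(",", "")  # remove ,
        .split()
    )


phrase = "we can not dedicate\u2014we can not consecrate\u2014we can not hallow\u2014this ground."
dash_words = split_with_dashes(phrase)
print(dash_words)
# ['we', 'can', 'not', 'dedicate', 'we', 'can', 'not', 'consecrate',
#  'we', 'can', 'not', 'hallow', 'this', 'ground']
```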

In [39]:
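# Gatsby (assumes the novel's text has been saved locally as gatsby.txt, encoded as UTF-8)
with open("gatsby.txt", "r", encoding="utf-8") as f: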
    sentence = f.read()

# Top 20 words in The Great Gatsby
top_words = sort_dictionary(count_words(sentence))
top_words[:20]

[('the', 2544),
 ('and', 1565),
 ('a', 1440),
 ('of', 1224),
 ('to', 1197),
 ('i', 1000),
 ('in', 849),
 ('he', 771),
 ('was', 758),
 ('that', 565),
 ('his', 488),
 ('it', 470),
 ('with', 463),
 ('you', 428),
 ('at', 411),
 ('had', 377),
 ('her', 374),
 ('on', 362),
 ('she', 353),
 ('for', 335)]
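
The most frequent words in almost any English text are very common words such as "the" and "and". If you want to see the more distinctive words, one option is to filter these out. The sketch below is illustrative only: `STOP_WORDS` is a small hand-picked set rather than an official list, `remove_stop_words` is a hypothetical helper, and `toy_counts` is made-up data in the same shape that `sort_dictionary()` returns.

```python
# A small, hand-picked set of very common words (an assumption, not a standard list)
STOP_WORDS = {"the", "and", "a", "of", "to", "i", "in", "he", "was", "that"}


def remove_stop_words(sorted_counts, stop_words=STOP_WORDS):
    # Keep only the (word, count) pairs whose word is not a stop word
    return [(word, count) for word, count in sorted_counts if word not in stop_words]


# Made-up data in the same (word, count) format as sort_dictionary() output
toy_counts = [("the", 5), ("invoice", 3), ("and", 2), ("payment", 1)]
filtered = remove_stop_words(toy_counts)
print(filtered)
# [('invoice', 3), ('payment', 1)]
```

With the real data you could call `remove_stop_words(top_words)[:20]`.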

In [40]:
# Most used words with more than 10 letters (must have more than 4 occurrences)
[(word, count) for (word, count) in top_words
 if len(word) > 10 and count > 4]

[('immediately', 15),
 ('information', 12),
 ('conversation', 10),
 ('impatiently', 7),
 ('interrupted', 7),
 ('disappeared', 7),
 ('distributing', 7),
 ('living-room', 6),
 ('wolfshiem’s', 6),
 ('incessantly', 6),
 ('distribution', 6),
 ('interesting', 5),
 ('incredulously', 5),
 ('continually', 5),
 ('embarrassment', 5),
 ('klipspringer', 5),
 ('replacement', 5)]

---
<div class="alert alert-block alert-success">
<b>🎉 Congratulations</b><br>
You have reached the end of this module.
</div>