# Make your life easier by using unittest.**mock**

### Also, a small introduction to mocking, APIs and cats 🐈

## What is a mock?
### Is it the same as a *stub*? Or as a *dummy*? Or as a *fake*?

![SO](http://localhost:5500/lib/img3.png)

#### A mock is an *interaction-based* object whose purpose is to *override* other objects and return user-defined values. 
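To make that definition concrete, here is a minimal sketch using `mock.Mock` from the standard library. The name `fake_cat_api`, its `get_fact` method and the canned fact are made up for illustration; they don't come from any real package.

```python
from unittest import mock

# A Mock accepts any attribute access or call and records it.
# fake_cat_api and get_fact are hypothetical names, used only for this sketch.
fake_cat_api = mock.Mock()

# User-defined return value: whatever we decide the "API" should answer.
fake_cat_api.get_fact.return_value = {"fact": "Cats sleep a lot", "length": 16}

result = fake_cat_api.get_fact(max_length=50)

# Interaction-based check: we assert on *how* the object was used.
fake_cat_api.get_fact.assert_called_once_with(max_length=50)
print(result["fact"])
```

The last assertion is the "interaction-based" part: the check is about the call that was made, not only about the value that came back.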

Its main implementation in Python is the standard-library module `unittest.mock`. The [official documentation](https://docs.python.org/3/library/unittest.mock.html) says that
>`unittest.mock` is a library for testing in Python. It allows you to replace parts of your system under test with mock objects and make assertions about how they have been used.

In [None]:
# Hidden imports go here
import sys
import ipytest
import requests
import pyspark
import unittest
import time
from unittest import mock

Let's start with the *very* basics:

In [None]:
def a_random_function(arg1: str, arg2: str) -> str:
    return arg2 + arg1

a_random_function("AMRO", "ABN")

Now let's make things slightly less linear:

In [None]:
def another_function(arg1: int, arg2: int, arg3: str, arg4: str) -> str:
    return a_random_function(arg3, arg4) + str((arg1 + arg2)//2)

another_function(2, 4, "ʘ=)∫", "(=ʘᆽ")

### How is this related to unit testing?    
Let's take the definition of *unit testing*. According to Wikipedia (the main source of information for our century), we can define unit testing as:
>"A software testing method by which individual units of source code — sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures — are tested to determine whether they are fit for use."

The dilemma is:
> "How can we test something over which we have no control?"

In our specific case, let's take `a_random_function(arg1, arg2)` as given.    
Let's say we import it from some other package, that doesn't belong to us.    

We run into this issue many times: the most common use case is when we interact with the filesystem.

Let's redefine our `another_function()` as follows:

In [None]:
def another_function(arg1: int, arg2: int) -> str:
    return a_random_function(sys.argv[0], sys.argv[1]) + " - " + str((arg1 + arg2)//2)

another_function(10, 200)

As you can see, I have no control over `sys.argv`: its contents are set at runtime, when I launch my application.    
How can we account for them, or get *some* degree of control?

Fortunately, `unittest.mock` allows us to perform this task in a relatively relaxed way.

In [None]:
@mock.patch("sys.argv", ["One", "Two"])
def another_function(arg1: int, arg2: int) -> str:
    return a_random_function(sys.argv[0], sys.argv[1]) + " - " + str((arg1 + arg2)//2)

another_function(10, 200)

In [None]:
print(sys.argv)
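
The patch only lasts as long as the patched code runs; afterwards `sys.argv` is back to its real value. A small sketch of the same idea, written with `mock.patch.object` as a context manager instead of a decorator (the variable names are illustrative only):

```python
import sys
from unittest import mock

# Remember the real arguments so we can compare afterwards.
original_argv = list(sys.argv)

# Inside the "with" block sys.argv is replaced by our list.
with mock.patch.object(sys, "argv", ["One", "Two"]):
    inside = list(sys.argv)

# On exit the original value is restored automatically.
restored = sys.argv == original_argv
print(inside, restored)
```

The decorator form and the context-manager form behave the same way; which one to pick is mostly a matter of how much code should see the patched value.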

![cat](http://localhost:5500/lib/img1.jpg)

In [None]:
# Test-related imports go here

ipytest.autoconfig()
test_args = ["--showlocals", 
            "-x", 
            "--cov-report", 
            "term-missing",
            "--cov",
            "neon.utils.functions"]

## Let's get to our cats 🐈
Since I really love cats, I'd like to know more about them.    
Fortunately, someone created the **completely free** [Cat facts API](https://catfact.ninja/) which can return nice (and interesting) facts about our feline friends.

In [None]:
requests.get(url="https://catfact.ninja/fact").json()["fact"]

For the purpose of this presentation, I made a **whole application** to collect facts about cats. The application does two main things:
- Collects a certain number of facts by querying the API
- Saves everything as a nice table into my local Hive metastore

I called the application **Neon** (just because that's the first name I got from a random generator).    
The application code and its tests are freely available on my GitHub profile (link at the end of the presentation).

### Let's start making use of it then!

In [None]:
from neon.utils.functions import *

![outline](http://localhost:5500/lib/img4.png)

In [None]:
interesting_facts = process_data(usernumber=3)

Each API call returns a JSON object.    
`process_data()` conveniently packs them into a list, over which we can easily iterate: 

In [None]:
for entry in interesting_facts:
    print(entry["fact"])

Let's now store these important facts inside my table, so that they don't get lost:

In [None]:
spark = establish_spark()
group_and_save(spark, facts=interesting_facts)
df = spark.read.table("default.random_cats_facts")

In [None]:
df.show(100, truncate=False)

As you can see, the whole procedure is **very slow**.

This is due to a random `waiting` variable that I added in order to simulate a slow connection, a very inefficient backend on the server side, or anything that can get in between.

Since the `waiting` variable draws a random number of seconds between 1 and 10, we're talking here about an *average* waiting time of *roughly 5 seconds per request* (5.5, if every value is equally likely).
In case you don't believe me, check the [central limit theorem](https://en.wikipedia.org/wiki/Central_limit_theorem) for statistical reference 🤓
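
For the sceptics, the average can also be checked numerically. This is only a sketch under an assumption: that the wait is a whole number of seconds drawn uniformly from 1 to 10 (the actual sampling code inside the application isn't shown here).

```python
import random
from statistics import mean

# Assumption: whole seconds, each value from 1 to 10 equally likely.
possible_waits = range(1, 11)
exact_average = mean(possible_waits)

# Simulate many requests with a fixed seed so the run is repeatable.
rng = random.Random(42)
simulated_waits = [rng.randint(1, 10) for _ in range(10_000)]
simulated_average = mean(simulated_waits)

print(exact_average, round(simulated_average, 2))
```

Under that assumption the exact average is 5.5 seconds, and the simulated one lands very close to it.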

### What can I do to make the whole thing more efficient, while testing for the main functionality?

There are two main issues over here:
1) **I don't wanna wait 5 seconds for each call to the API**, but I'd like to be sure that the call works nonetheless (i.e. no test data, I don't care about the data: I want my functionality to work)    
2) **I don't wanna save the data I retrieve every single time**, since I have no control over its saving process. I just know that, as soon as I run the application, the data will be saved somewhere. That's not good for testing purposes!
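
Before touching the real application, the first issue can be sketched in isolation. `slow_fetch` below is a hypothetical stand-in for a slow API call, not Neon's actual code; the only assumption is that the delay comes from `time.sleep`.

```python
import time
from unittest import mock


def slow_fetch(waiting: int) -> dict:
    """Hypothetical stand-in for a slow API call (not Neon's real code)."""
    time.sleep(waiting)  # simulated latency
    return {"fact": "Cats purr", "length": 9}


# While the patch is active, time.sleep is a mock that returns immediately.
with mock.patch("time.sleep", return_value=None) as patched_sleep:
    start = time.perf_counter()
    fact = slow_fetch(waiting=5)
    elapsed = time.perf_counter() - start

# The function still "asked" to wait 5 seconds, but nobody actually waited.
patched_sleep.assert_called_once_with(5)
print(sorted(fact), elapsed < 1)
```

The functionality is exercised end to end, the requested delay is still verified, and the test finishes almost instantly.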

Let's use our mocks:

In [None]:
class TestAPI(unittest.TestCase):
    # fake_request is replaced by a mock that returns None immediately
    @mock.patch("neon.utils.functions.fake_request", return_value=None)
    def test_retrieve_data_without_waiting(self, patched_request):
        t0 = time.perf_counter()
        actual: dict = retrieve_data(waiting=5)
        actual_keys: list = list(actual)
        expected_keys: list = ["fact", "length"]
        elapsed = time.perf_counter() - t0
        self.assertEqual(actual_keys, expected_keys)
        self.assertLess(elapsed, 10)

    # retrieve_data is replaced by a mock that always returns the same canned fact
    @mock.patch("neon.utils.functions.retrieve_data", return_value={"fact": "This is a random fact", "length": "-99"})
    def test_process_data_with_custom_load(self, patched_retrieve):
        t0 = time.perf_counter()
        actual: list = process_data(usernumber=5, waiting=5)
        expected: list = [{"fact": "This is a random fact", "length": "-99"}] * 5
        elapsed = time.perf_counter() - t0
        self.assertEqual(actual, expected)
        self.assertLess(elapsed, 10)

In [None]:
first_test_suite = TestAPI()
ipytest.run(*test_args)

And now to the Spark part:

In [None]:
class TestSparkFunctions(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        cls.mocked_data: list[dict] = [{"fact": "This is a random fact", "length": "-99"}]

    # saveAsTable is replaced by a mock, so the call is recorded instead of executed
    @mock.patch("pyspark.sql.readwriter.DataFrameWriter.saveAsTable")
    def test_group_and_save_with_patching(self, patched_writer):
        spark: SparkSession = establish_spark()
        data: list[dict] = self.mocked_data
        group_and_save(spark, data)
        patched_writer.assert_called()

    @mock.patch("neon.utils.functions.fake_request", return_value=None)
    def test_group_and_save_with_api_load(self, patched_request):
        # The autospecced stand-in keeps group_and_save's signature and returns True
        mocked_save = mock.create_autospec(group_and_save, return_value=True)
        spark: SparkSession = establish_spark()
        data: list[dict] = process_data(usernumber=5, waiting=2)
        expected: bool = mocked_save(spark, data)
        mocked_save.assert_called()
        self.assertTrue(expected)

In [None]:
second_test_suite = TestSparkFunctions()
ipytest.run(*test_args)
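
The two techniques used in the Spark tests can also be tried without a Spark installation. The sketch below is self-contained: `TableWriter` and `group_and_save_sketch` are hypothetical stand-ins invented for this example, not the real PySpark writer or Neon's `group_and_save`.

```python
from unittest import mock


class TableWriter:
    """Hypothetical stand-in for an object whose method has a side effect."""

    def save_as_table(self, name: str) -> None:
        raise RuntimeError("would write to a real metastore")


def group_and_save_sketch(writer: TableWriter, facts: list) -> bool:
    """Hypothetical function under test: saves only when there is data."""
    if not facts:
        return False
    writer.save_as_table("default.random_cats_facts")
    return True


# 1) Patch the method on the class: the real save never runs,
#    but the call and its arguments are recorded.
with mock.patch.object(TableWriter, "save_as_table") as patched_save:
    saved = group_and_save_sketch(TableWriter(), [{"fact": "x", "length": 1}])
patched_save.assert_called_once_with("default.random_cats_facts")

# 2) create_autospec copies the function signature,
#    so calling the stand-in with the wrong arguments raises TypeError.
autospecced = mock.create_autospec(group_and_save_sketch, return_value=True)
try:
    autospecced()  # missing both arguments
    signature_enforced = False
except TypeError:
    signature_enforced = True

print(saved, signature_enforced)
```

Patching the method tests the real function while neutralising its side effect; the autospecced stand-in is more useful when some *other* code calls the function and only the interaction matters.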

### Problem solved!

![cat](http://localhost:5500/lib/img2.jpg)

### Useful links:
[Catfacts (API reference)](https://catfact.ninja/)    
[This application (Git repo)](https://github.com/jean-n92/pytest-mock-presentation)    
[Where to patch (Documentation)](https://docs.python.org/3/library/unittest.mock.html#where-to-patch)    
[Quick start with mock (Documentation)](https://docs.python.org/3/library/unittest.mock.html#quick-guide)    
[Difference between mock and stub (StackOverflow)](https://stackoverflow.com/questions/3459287/whats-the-difference-between-a-mock-stub)