This repository was archived by the owner on Feb 26, 2025. It is now read-only.

Conversation

@heyitsarpit
Contributor

@heyitsarpit heyitsarpit commented May 31, 2019

This pull request will track the progress of the publish_dafsa.py script, which uploads a binary file containing the public suffix list data to the Remote Settings server.

More details about the project can be found here

Tasks

  • Find the latest commit hash of the suffix list repository using the GitHub API v3
  • Check if the hash is new by comparing to previous stored hash
  • Download make_dafsa.py, prepare_tlds.py and the raw public suffix list from remote sources
  • Store the above three in a temporary directory
  • Execute prepare_tlds.py to output the dafsa (currently the C++ hex array, awaiting changes in make_dafsa.py)
  • Make sure the program is safe (i.e. the hash is not updated if the attachment upload fails, and vice versa)
  • Make sure the program fails loudly (i.e. crashes on a 404 during download, if a command fails, or if the upload fails)
  • Write a small unit-test for the nominal case
  • Request review when the file is uploaded
  • Publish the new hash to remote-settings server with the binary as attachment
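The first two tasks could be sketched roughly as below. This is a hypothetical sketch, not the script's actual code: the endpoint constant assumes the publicsuffix/list repository, and a plain urllib client stands in for whatever HTTP library the script uses.

```python
import json
import urllib.request

# Assumed endpoint: GitHub API v3 commits listing for the suffix list repo.
COMMIT_HASH_URL = "https://api.github.com/repos/publicsuffix/list/commits"

def extract_latest_sha(commits):
    # The commits endpoint returns newest-first; a full git SHA-1 is 40 hex chars.
    sha = commits[0]["sha"]
    if len(sha) != 40:
        raise ValueError(f"unexpected commit hash: {sha!r}")
    return sha

def get_latest_hash(url=COMMIT_HASH_URL):
    # urlopen raises urllib.error.HTTPError on a 404, so failures are loud.
    with urllib.request.urlopen(url) as response:
        return extract_latest_sha(json.load(response))
```

Keeping the SHA extraction separate from the network call makes the "hash is a string of length 40" check testable without hitting GitHub.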

Testing Strategy

Function Tests
def get_latest_hash(url):
  • Check if the fetched hash is a string of length 40.
  • Check if HTTPError is raised when the response status is a 404 code
def download_resources(directory, *urls):
  • Check all the files are downloaded with the correct names.
  • Check if HTTPError is raised when the response status is a 404 code
def get_stored_hash(clients):
  • Check if the hash is fetched successfully
  • Check if KintoException is raised when record fetching fails
def prepare_dafsa(directory):
  • Check if the created file has size greater than zero
  • Check if an exception is raised when the dafsa creation process fails
def remote_settings_publish(client, latest_hash, binary_path):
  • Check if a POST request was made successfully for record upload
  • Check if a PATCH request was made successfully for review request
def publish_dafsa(event, context):
  • Check that prepare_dafsa() and remote_settings_publish() are not called when the record hash and the latest hash match
  • Check that prepare_dafsa() and remote_settings_publish() are called when the record hash and the latest hash do not match
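As a self-contained sketch of how the get_latest_hash checks above could run without the network, the GitHub response can be faked with unittest.mock (the function here is a stand-in, not the script's actual code):

```python
import json
import unittest
from unittest import mock
import urllib.request

def get_latest_hash(url):
    # Stand-in for the function under test.
    with urllib.request.urlopen(url) as response:
        return json.load(response)[0]["sha"]

class TestGetLatestHash(unittest.TestCase):
    def test_fetched_hash_is_a_string_of_length_40(self):
        # Fake the HTTP response: a context manager whose read() yields JSON.
        fake = mock.MagicMock()
        fake.__enter__.return_value = fake
        fake.read.return_value = json.dumps([{"sha": "a" * 40}]).encode()
        with mock.patch("urllib.request.urlopen", return_value=fake):
            sha = get_latest_hash("https://example.invalid/commits")
        self.assertIsInstance(sha, str)
        self.assertEqual(len(sha), 40)
```

The same pattern (patch the transport, assert on the result) covers the HTTPError cases by giving the mock a side_effect instead of a return value.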

@heyitsarpit
Contributor Author

heyitsarpit commented Jun 5, 2019

@leplatrem I think I have settled on a general flow of how the script will execute. There's more left to handle, but let me know if this execution pattern is fine.

def publish_dafsa():  # Script entry point
    client = Client(server_url=SERVER, auth=CREDENTIALS) # Create a new client 
    latest_hash = get_latest_hash(COMMIT_HASH_URL) # Fetch the latest hash of psl repository
    record = client.get_record(id=RECORD_ID, bucket=BUCKET_ID, collection=COLLECTION_ID) # Fetch the record from kinto server
    if record_exists(record):   # Check if a valid record exists already
        if record["data"]["latest-commit-hash"] == latest_hash:  # Compare the two hashes, new and stored
            return 0  # Return if true, the rest of the script need not execute
        else:
            make_dafsa_and_publish(client, latest_hash)   # Updating the record, make the dafsa and publish the record
    else:
        make_dafsa_and_publish(client, latest_hash)  # Making a new record, make the dafsa and publish the record

The make_dafsa_and_publish(client, latest_hash) function handles all the major work.
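For reference, the body of make_dafsa_and_publish() could look roughly like the sketch below. The helper names come from the task list; they are passed in as parameters here only so the sketch is self-contained and easy to stub in tests.

```python
import tempfile

def make_dafsa_and_publish(client, latest_hash,
                           download_resources, prepare_dafsa,
                           remote_settings_publish, resource_urls=()):
    # Work in a temporary directory that is cleaned up automatically.
    with tempfile.TemporaryDirectory() as directory:
        # Fetch make_dafsa.py, prepare_tlds.py and the raw suffix list.
        download_resources(directory, *resource_urls)
        # Run prepare_tlds.py to produce the DAFSA binary.
        binary_path = prepare_dafsa(directory)
        # Upload the record (with the binary as attachment) and the new hash.
        remote_settings_publish(client, latest_hash, binary_path)
```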

@leplatrem
Contributor

leplatrem commented Jun 5, 2019

@leplatrem I think I have settled on a general flow of how the script will execute. There's more left to handle, but let me know if this execution pattern is fine.

Yes, it's good!

It could even be simplified. The two else: blocks are redundant and not necessary since we return early.

Also, check out 😉 #373 (comment) Ok done.
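One way to express that simplification is to fold the decision into a pure helper, so both else: branches disappear. This is a hypothetical sketch: record_exists and needs_update are illustrative names, and the record layout is taken from the snippet above.

```python
def record_exists(record):
    # A record is usable only if it carries a stored commit hash.
    return bool(record) and "latest-commit-hash" in record.get("data", {})

def needs_update(record, latest_hash):
    # Publish when there is no valid record yet, or the stored hash changed.
    return (not record_exists(record)
            or record["data"]["latest-commit-hash"] != latest_hash)
```

The entry point then reduces to an early return (`if not needs_update(record, latest_hash): return 0`) followed by a single make_dafsa_and_publish(client, latest_hash) call.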


@leplatrem leplatrem left a comment


It starts to look a lot better!

Some new comments :)

@heyitsarpit
Contributor Author

I ran the whole script for the first time, and it's working!!!
I had to change permissions for the bucket but the record creation and attachment upload are happening.

@heyitsarpit
Contributor Author

For now I'm struggling with the test for the kinto client. Since most of this code is not in separate functions with explicit exports, I'm having to rewrite most of this logic again and test incrementally. Is that fine?

@leplatrem
Contributor

For now I'm struggling with the test for the kinto client. Since most of this code is not in separate functions with explicit exports, I'm having to rewrite most of this logic again and test incrementally. Is that fine?

I don't think so. Based on what I see, you should not have to rewrite anything of what you have here.
You could indeed extract two functions from your code, like one for prepare_tlds() and one for remote_settings_publish().

I believe you didn't start off on the right foot. The test code should not duplicate anything from the feature code, or as little as possible.

You could have 2 main test suites:

  • One for publish_dafsa() that asserts that make_dafsa_and_publish() is only called when the hash is missing or has changed. You can use mock to assert that a certain function is called
  • One for make_dafsa() that will assert that a POST request is sent.

You should probably take some time to understand how tests work in general with the responses library. You have to mock fake server responses, call the code you want to test (e.g. make_dafsa_and_publish()), and then assert that a certain HTTP endpoint was called.

You can read this also http://blog.mathieu-leplatre.info/your-tests-as-your-specs.html

@leplatrem
Contributor

Here is an example to test the prepare_dafsa function:

import tempfile
import unittest
from unittest import mock

from commands.publish_dafsa import prepare_dafsa

class TestPrepareDafsa(unittest.TestCase):
    def test_exception_is_raised_when_process_returns_non_zero(self):
        with tempfile.TemporaryDirectory() as tmp:
            with mock.patch("subprocess.Popen") as mocked:
                mocked.return_value.returncode = 1

                with self.assertRaises(Exception) as cm:
                    prepare_dafsa(tmp)

                self.assertIn("DAFSA Build Failed", str(cm.exception))

@leplatrem
Contributor

And here is a more complex one, that leverages responses to simulate server responses:

import unittest
import tempfile
from unittest import mock

import responses

from commands.publish_dafsa import publish_dafsa, prepare_dafsa, COMMIT_HASH_URL

class TestPublishDafsa(unittest.TestCase):
    def setUp(self):
        mocked = mock.patch("commands.publish_dafsa.prepare_dafsa")
        self.addCleanup(mocked.stop)
        self.mocked_prepare = mocked.start()

        mocked = mock.patch("commands.publish_dafsa.remote_settings_publish")
        self.addCleanup(mocked.stop)
        self.mocked_publish = mocked.start()

    @responses.activate
    def test_prepare_and_publish_are_not_called_when_hash_matches(self):
        event = {
            "server": "http://fakeserver/v1",
        }
        record_url = "http://fakeserver/v1/buckets/main-workspace/collections/public-suffix-list/records/tld-dafsa"
        responses.add(responses.GET, record_url, json={
            "data": {
                "commit-hash": "abc",
            }
        })
        responses.add(responses.GET, COMMIT_HASH_URL, json=[{
            "sha": "abc"
        }])

        publish_dafsa(event, context=None)

        self.assertFalse(self.mocked_prepare.called)
        self.assertFalse(self.mocked_publish.called)

@leplatrem
Contributor

I ran the lambda manually. There was no error, but the attachment was published with a size of 0. Something has to be improved here ;)

@heyitsarpit
Contributor Author

heyitsarpit commented Jun 26, 2019

Can you run it once again? The scripts are being pulled from a temporary GitHub repo that I just updated with the latest code.
Edit: never mind, the dafsa creation is suddenly failing for some reason :(
Edit: Working again!!


@leplatrem leplatrem left a comment


That is kind of heavy going... I'm not sure how to help :/

Please ask questions when something is not clear! Experimenting is a great way to learn and that's awesome! But before submitting your code, you should also take some time to give it a minimum of polish, to preserve the reviewer's patience ;)

@heyitsarpit
Contributor Author

heyitsarpit commented Jun 30, 2019

I have refactored record fetching into a separate function and created tests for all of them, which are all passing currently!!!

@heyitsarpit heyitsarpit reopened this Jun 30, 2019

@leplatrem leplatrem left a comment


Great! 🌟

A last round of polishing and we're good to merge :)

@heyitsarpit
Contributor Author

I've addressed all the recent concerns that you raised. I suppose we are almost ready to merge now?

@leplatrem leplatrem changed the title [WIP] Publishing Public Suffix List as a DAFSA binary Publishing Public Suffix List as a DAFSA binary Jul 2, 2019

@leplatrem leplatrem left a comment


👏 👏

Well done! We're good to ship!

@leplatrem leplatrem merged commit 26de3e8 into mozilla-services:master Jul 2, 2019