Support uploading content to pulp #28
Implemented basic uploading functionality, which can upload a single file to Pulp.
I didn't add upload_one_file to fake.client; it seems like it would just return None in the current implementation. I'm not sure whether any exception handling is needed in the upload_one_file function; if so, I can do the same in the fake function.
rohanpm
left a comment
Yeah, I think that the fake client really does need the equivalent API as well. The idea is that you should be able to swap in a fake client into any code written against a real client, and have that code succeed.
Since the API right now doesn't have any way of searching for content, the fake client implementation might not need to do anything except just validate the arguments and "succeed". Later on, if we add something in here for searching content, we'll have to update fake client to remember what has been uploaded and return it in searches.
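A minimal sketch of what such a fake could look like. The class and method names here are assumptions standing in for the real fake-client code, and a plain `concurrent.futures.Future` stands in for the `f_return` helper from more_executors:

```python
import os
from concurrent.futures import Future


def _f_return(value=None):
    # Stand-in for more_executors' f_return: an already-resolved future.
    f = Future()
    f.set_result(value)
    return f


class FakeFileRepository:
    # Hypothetical sketch; the real fake client's class names may differ.
    def upload_file(self, file_obj, relative_url=None):
        # Validate the arguments the way the real client would...
        if isinstance(file_obj, str) and not os.path.exists(file_obj):
            raise FileNotFoundError(file_obj)
        # ...then simply "succeed" without talking to any Pulp server.
        return _f_return(["fake-task-1"])
```

The point is only that code written against the real client keeps working when this object is swapped in; no history is kept yet.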
One more comment: as you're adding new public API, can you please also bump the minor version in setup.py in this commit, and add an entry to CHANGELOG.md? I.e. if the current version is x.y.z, then because you've added new backwards-compatible API, per semver the next version should be x.(y+1).0.
1. Bump version and update CHANGELOG.md
2. Move the upload API to the FileRepository class
3. Remove some unused arguments for importing to an ISO repo
4. Uploading/importing a file to Pulp is still blocking-style; the API imports just one file. Users who want to import many files at the same time should wrap it in threads
5. Read files in binary mode
6. Add verifications in tests
JayZ12138
left a comment
> Yeah, I think that the fake client really does need the equivalent API as well. The idea is that you should be able to swap in a fake client into any code written against a real client, and have that code succeed.
>
> Since the API right now doesn't have any way of searching for content, the fake client implementation might not need to do anything except just validate the arguments and "succeed". Later on, if we add something in here for searching content, we'll have to update fake client to remember what has been uploaded and return it in searches.

Now that there's no public API in the client for this, is a fake one for _do_upload_file still needed?
What's needed is that, if you use the fake client to get a FileRepository instance, then the public upload_file method on that instance should be implemented and should do something reasonable according to the documentation. It should probably at least crash if the file isn't readable.
rohanpm
left a comment
Jay, could you consider writing an example for this too? It could help to ensure the API is right. Might help you develop and test it too.
For instance, in the "examples" directory there could be examples/upload-files with the behavior "upload all files in a directory to a given repo", run like e.g. `upload-files --repo-id my-repo ~/path/to/dir`.
This, in combination with debug logging, could help with investigating how the blocking/non-blocking upload is working. For instance, if you had 10 files in the directory and you have blocking implementation you should be able to easily see that it's only handling one file at a time. If you have non-blocking implementation you should be able to get the example to upload multiple files at a time and it should be obviously faster.
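A hypothetical sketch of such an examples/upload-files script. The `Client` constructor arguments, the default URL, and the exact option names are assumptions, not the library's confirmed API:

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(
        description="Upload all files in a directory to a given repo"
    )
    parser.add_argument("--repo-id", required=True)
    # Hypothetical default URL, for illustration only.
    parser.add_argument("--url", default="https://pulp.example.com/")
    parser.add_argument("dir", help="directory whose files should be uploaded")
    return parser


def main():
    args = build_parser().parse_args()

    # Imported lazily so argument parsing stays testable without the library.
    from pubtools.pulplib import Client

    client = Client(args.url)
    repo = client.get_repository(args.repo_id).result()

    # Kick off every upload before waiting on any of them: with a blocking
    # implementation these still run one at a time; with a non-blocking one
    # they overlap and the whole run should be visibly faster.
    uploads = [
        repo.upload_file(os.path.join(args.dir, name))
        for name in sorted(os.listdir(args.dir))
    ]
    for upload in uploads:
        upload.result()


if __name__ == "__main__":
    main()
```

Combined with debug logging, the interleaving (or not) of the per-file log lines would show directly whether uploads overlap.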
Yeah, definitely.
Various small fixes. Will add an example and more test cases if the idea here is OK. Still blocking style, with uploads done chunk by chunk in sequence.
I'm working on the fake parts now. I have a question: it feels like I should have a file.py under the fake module?
Similar to the above: the bare minimum for the fake client would be for uploading to return a successful future. I'd also recommend it try to enforce the same requirements on its inputs as the real client does. For example: if you pass the name of a file which can't be opened, the real client will give a failing future, so the fake client should do the same.
```python
def _do_upload_file(self, upload_id, file_obj, name):
    # pylint: disable=unused-argument
    out = f_return((hashlib.sha256().hexdigest(), random.randint(10, 1000)))
```
Is there a particular reason that you just return a static checksum and random size, instead of actually reading the file object?
I think it would be preferable to truly read the file object for a couple of reasons:
- so that it crashes if the file object can't actually be read - same as the real client would
- so that you get stable behavior with respect to the checksum & size, instead of using something random
Scaling isn't really a concern with the fake client, so I think you can feel free to implement that in a "simple" way, like just reading the data directly in one chunk.
Yeah, agreed.
```python
    return f_return([task])


def _delete_upload_request(self):
    return f_return()  # pragma: no cover
```
Why no cover? If this is never called then you don't even need to implement it.
I guess you really don't need to implement it, as deleting upload requests is an implementation detail not exposed in the API of the real client in any way, so there's no reason for the fake client to have any code for it.
Hm, I felt I needed it since it's called in FileRepository.upload_file, though it's not consistent.
By consistent I mean: since we always return the import task, the caller of upload_file().result() can't be sure _delete_upload_request has been called, at least with the real client. I'm not sure if it will be called in the fake client; I'll do some testing.
Turns out it's never called. I removed it, though I don't know why it's never called.
tests/fake/test_fake_publish.py (outdated):

```python
    assert "repo1 not found" in str(exception)


def test_can_upload(tmpdir):
```
Please don't add the upload test to the file named test_fake_publish, as publish and upload are different things. In this project, I tried to keep test files smaller and more specific so we don't end up with what we see in many other projects, where each test file keeps growing indefinitely and it's tough to clean them up or to understand the relationship between tests and fixtures.
Can you please add that to a new file, test_fake_upload.py? There's nothing wrong with having a test file starting with only 1-2 tests.
whoops.
tests/fake/test_fake_publish.py (outdated):

```python
    else:
        exception = FileNotFoundError
    with pytest.raises(exception):
        upload_f = repo1.upload_file("nonexistent_file")
```
I think this is actually pointing to a difference between the real and fake clients.
Question: if you use the real client and you attempt to upload a file which doesn't exist, does it:
- (A) - raise an exception, or
- (B) - return a future which raises an exception
I think it might be (B), but you've made the fake client do (A).
Can you please investigate and ensure they behave the same?
Makes sense.
Hm, actually, it just raises an exception directly, because opening the file isn't wrapped in a future. See the code:

In file.py, upload_file calls _do_upload_file as:

```python
upload_complete_f = self._client._do_upload_file(upload_id, file_obj, name)
```

In client.py, it opens the file first, then wraps the upload in a future and returns:

```python
if not is_file_object:
    file_obj = open(file_obj, "rb")
LOG.info("Uploading %s to Pulp", name)
upload_f = f_flat_map(f_return(), lambda _: do_next_upload(hashlib.sha256(), 0))
```

So this is not completely non-blocking; trying to upload a file that doesn't exist will fail the whole process. Is this OK?
I am fine with either way, as long as both clients are consistent.
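If behavior (B) is what both clients should converge on, one way to get it is to do the open() as part of producing the future, so a missing file yields a failing future instead of raising directly. A hypothetical sketch, with `concurrent.futures.Future` standing in for the more_executors helpers:

```python
from concurrent.futures import Future


def upload_file_nonblocking(path):
    # Option (B): a missing file produces a *failing* future rather than
    # an exception raised in the caller's thread.
    out = Future()
    try:
        file_obj = open(path, "rb")
    except OSError as exc:
        out.set_exception(exc)
        return out
    # ...the real upload chain would continue with file_obj here...
    file_obj.close()
    out.set_result(["task1"])
    return out
```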
```python
    messages = caplog.messages
    # task's spawned and completed
    assert "Created Pulp task: task1" in messages
    assert "Pulp task completed: task1" in messages
```
As above, I'd like to suggest you wait for the upload deletion to happen and verify it too.
```python
from .match import match_object


Publish = namedtuple("Publish", ["repository", "tasks"])
Upload = namedtuple("Upload", ["repo_id", "tasks"])
```
I'm not quite sure if this API with Upload and upload_history is complete enough. I see a couple of issues with it:
Since it doesn't give you any info about what content was uploaded, all you can really check is whether the count of uploads is what you expected per repo. So, for instance, if you had a class which was generating content and uploading them to Pulp repos, you couldn't use the fake client to check anything about the content uploaded, whether they were uploaded with correct filenames, etc. It's really quite limited what you can do here. This isn't a deal breaker since it could potentially have more attributes added later.
The bigger issue is - what about other content types in the future? Right now, you've got upload_history and we only support file uploads. Later we will support other kinds of uploads. I'm thinking that it's quite possible different types of uploads ought to have different attributes collected in the history, so it may be a mistake to add one API which claims to support all kinds of uploads.
Can you please consider what to do about these issues? Also, it's always an option to leave out that new history API altogether now and let someone add it when they have a use-case for it.
```python
    # pylint: disable=unused-argument
    task = Task(id=self._next_task_id(), completed=True, succeeded=True)

    self._upload_history.append(Upload(repo_id, [task]))
```
Can you please make this fail if the repo doesn't exist, like a real client would?
I got this reply from Jay but now I can't find it in github for some reason:
> Is this really necessary?
> The use case for upload is to get the repository first and then call repo.upload_file.
> If the repo doesn't exist, it should fail at get_repository; do we need to double-check here?
I would say that it is indeed still necessary, since there can be any number of references to the same repository. I think there are already some autotests on the publish side which cover this.
For example you can have:
```python
repo1 = client.get_repository("some-repo").result()
repo2 = client.get_repository("some-repo").result()
repo1.delete().result()

# and now try to upload to repo2
# expected behavior: it should fail!
repo2.upload_file(...)
```
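One way for a fake client to satisfy this is to have every repository handle consult shared state, so a delete through one reference is visible to all the others. A hypothetical sketch (class and attribute names are assumptions, with `concurrent.futures.Future` standing in for the library's futures):

```python
from concurrent.futures import Future


class FakeRepository:
    # Hypothetical sketch: two handles to the same repo share state, so an
    # upload through one reference fails after a delete through another.
    def __init__(self, shared_state):
        self._state = shared_state  # e.g. {"exists": True}

    def delete(self):
        self._state["exists"] = False
        out = Future()
        out.set_result([])
        return out

    def upload_file(self, path):
        out = Future()
        if not self._state["exists"]:
            # Fail like the real client would on a removed repository.
            out.set_exception(RuntimeError("repository not found"))
        else:
            out.set_result(["task1"])
        return out
```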
```
    Each element of this list is a named tuple with the following attributes,
    in order:

    ``repository``:
```
Documentation doesn't match reality. The namedtuple you declared actually has a "repo_id" and not a "repository".
In this case, I think the documentation is better and the code should be fixed. Publish returns a repository object and not a repo_id, why should Upload be inconsistent with that? Can you please fix that to have a repository object as well?
copy-pasted and wrong :(
```
        passing `relative_url` is mandatory.

    Returns:
        Future[:class:`~pubtools.pulplib.Task`]
```
Documentation maybe doesn't match reality again - isn't it returning a future with a list of tasks, not a single task?
copy-pasted and wrong again :(
```python
from .match import match_object


Publish = namedtuple("Publish", ["repository", "tasks"])
Upload = namedtuple("Upload", ["repository", "tasks", "unit_type_id", "unit_key"])
```
I think this is not too good either: exposing unit_type_id and unit_key here encourages devs to write tests against Pulp2 implementation details where it's not necessary to do so. upload_file uploads a file "somehow", and the specifics of unit_type_id and unit_key are intentionally hidden; that's undermined by exposing them here like this.
If we are adding more attributes here I think they should try to be generic so as to expose as few Pulp2 implementation details as possible, while still meeting the requirements. So, what requirement is being met by these?
I think it's probably: allowing the developer of a test to verify that correct content was uploaded to correct path. Then I think it should be done more directly, like exposing "name" and ("sha256" or "content") attributes.
If you expose unit_key here, technically developers can look at unit_key["checksum"] to get the checksum. But that kind of implementation detail - "On Pulp, ISO units have a 'checksum' in unit key" - is exactly what this library is meant to be hiding.
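A hypothetical shape for a more generic history entry along these lines, recording what a test actually needs (what content went where) without leaking Pulp2 details:

```python
from collections import namedtuple

# "name" and "sha256" are the suggested generic attributes; the field set
# here is an illustration, not the library's settled API.
Upload = namedtuple("Upload", ["repository", "tasks", "name", "sha256"])
```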