Support uploading content to pulp #28
Implemented basic uploading functionality, which can upload a single file to Pulp.
I didn't add upload_one_file to fake.client; it seems like it would just return None in the current implementation. I'm not sure whether any exception handling is needed in the upload_one_file function; if so, I can do the same in the fake function.
rohanpm
left a comment
Yeah, I think that the fake client really does need the equivalent API as well. The idea is that you should be able to swap in a fake client into any code written against a real client, and have that code succeed.
Since the API right now doesn't have any way of searching for content, the fake client implementation might not need to do anything except just validate the arguments and "succeed". Later on, if we add something in here for searching content, we'll have to update fake client to remember what has been uploaded and return it in searches.
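A minimal sketch of what such a fake could look like. The class and method names here are assumptions standing in for the real fake-client code, and a plain `concurrent.futures.Future` stands in for the `f_return` helper from more_executors:

```python
import os
from concurrent.futures import Future


def _f_return(value=None):
    # Stand-in for more_executors' f_return: an already-resolved future.
    f = Future()
    f.set_result(value)
    return f


class FakeFileRepository:
    # Hypothetical sketch; the real fake client's class names may differ.
    def upload_file(self, file_obj, relative_url=None):
        # Validate the arguments the way the real client would...
        if isinstance(file_obj, str) and not os.path.exists(file_obj):
            raise FileNotFoundError(file_obj)
        # ...then simply "succeed" without talking to any Pulp server.
        return _f_return(["fake-task-1"])
```

The point is only that code written against the real client keeps working when this object is swapped in; no history is kept yet.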
One more comment: as you're adding new public API, can you please also bump the minor version in setup.py in this commit, and add an entry to CHANGELOG.md? I.e. if the current version is x.y.z, then because you've added new backwards-compatible API, per semver the next version should be x.(y+1).0.
1. Bump version and update CHANGELOG.md
2. Move the upload API to the FileRepository class
3. Remove some unused arguments for importing to an ISO repo
4. Uploading/importing a file to Pulp is still blocking-style; the API imports just one file. Users who want to import many files at the same time should wrap it in threads
5. Read files in binary mode
6. Add verifications in tests
JayZ12138
left a comment
> Yeah, I think that the fake client really does need the equivalent API as well. The idea is that you should be able to swap in a fake client into any code written against a real client, and have that code succeed.
>
> Since the API right now doesn't have any way of searching for content, the fake client implementation might not need to do anything except just validate the arguments and "succeed". Later on, if we add something in here for searching content, we'll have to update fake client to remember what has been uploaded and return it in searches.

Now that there's no public API in the client for this, is a fake one for _do_upload_file still needed?
What's needed is that, if you use the fake client to get a FileRepository instance, then the public upload_file method on that instance should be implemented and should do something reasonable according to the documentation. It should probably at least crash if the file isn't readable.
rohanpm
left a comment
Jay, could you consider writing an example for this too? It could help to ensure the API is right. Might help you develop and test it too.
For instance, in the "examples" directory there could be examples/upload-files with the behavior "upload all files in a directory to a given repo", run like e.g. `upload-files --repo-id my-repo ~/path/to/dir`.
This, in combination with debug logging, could help with investigating how the blocking/non-blocking upload is working. For instance, if you had 10 files in the directory and you have blocking implementation you should be able to easily see that it's only handling one file at a time. If you have non-blocking implementation you should be able to get the example to upload multiple files at a time and it should be obviously faster.
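A hypothetical sketch of such an examples/upload-files script. The `Client` constructor arguments, the default URL, and the exact option names are assumptions, not the library's confirmed API:

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(
        description="Upload all files in a directory to a given repo"
    )
    parser.add_argument("--repo-id", required=True)
    # Hypothetical default URL, for illustration only.
    parser.add_argument("--url", default="https://pulp.example.com/")
    parser.add_argument("dir", help="directory whose files should be uploaded")
    return parser


def main():
    args = build_parser().parse_args()

    # Imported lazily so argument parsing stays testable without the library.
    from pubtools.pulplib import Client

    client = Client(args.url)
    repo = client.get_repository(args.repo_id).result()

    # Kick off every upload before waiting on any of them: with a blocking
    # implementation these still run one at a time; with a non-blocking one
    # they overlap and the whole run should be visibly faster.
    uploads = [
        repo.upload_file(os.path.join(args.dir, name))
        for name in sorted(os.listdir(args.dir))
    ]
    for upload in uploads:
        upload.result()


if __name__ == "__main__":
    main()
```

Combined with debug logging, the interleaving (or not) of the per-file log lines would show directly whether uploads overlap.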
Yeah, definitely.
Various small fixes. Will add an example and more test cases if the idea here is OK. Still blocking style, with uploads done chunk by chunk in sequence.
I'm working on the fake parts now. I have a question: it feels like I should have a file.py under the fake module?
Similar to the above: the bare minimum for the fake client would be for uploading to return a successful future. I'd also recommend it try to enforce the same requirements on its inputs as the real client does. For example: if you pass the name of a file which can't be opened, the real client will give a failing future, so the fake client should do the same.
```python
def _do_upload_file(self, upload_id, file_obj, name):
    # pylint: disable=unused-argument
    out = f_return((hashlib.sha256().hexdigest(), random.randint(10, 1000)))
```
Is there a particular reason that you just return a static checksum and random size, instead of actually reading the file object?
I think it would be preferable to truly read the file object for a couple of reasons:
- so that it crashes if the file object can't actually be read - same as the real client would
- so that you get stable behavior with respect to the checksum & size, instead of using something random
Scaling isn't really a concern with the fake client, so I think you can feel free to implement that in a "simple" way, like just reading the data directly in one chunk.
Yeah, agreed.
```python
    return f_return([task])


def _delete_upload_request(self):
    return f_return()  # pragma: no cover
```
Why no cover? If this is never called then you don't even need to implement it.
I guess you really don't need to implement it, as deleting upload requests is an implementation detail not exposed in the API of the real client in any way, so there's no reason for the fake client to have any code for it.
Hm, I felt I needed it since it's called in FileRepository.upload_file, though it's not consistent.
By consistent I mean: since we always return the import task, the caller of upload_file().result() can't be sure _delete_upload_request has been called, at least with the real client. I'm not sure if it will be called in the fake client; I'll do some testing.
Turns out it's never called. I removed it, though I don't know why it's never called.
tests/fake/test_fake_publish.py (outdated):

```python
    assert "repo1 not found" in str(exception)


def test_can_upload(tmpdir):
```
Please don't add the upload test to the file named test_fake_publish, as publish and upload are different things. In this project, I tried to keep test files smaller and more specific so we don't end up with what we see in many other projects, where each test file keeps growing indefinitely and it's tough to clean them up or to understand the relationship between tests and fixtures.
Can you please add that to a new file, test_fake_upload.py? There's nothing wrong with having a test file starting with only 1-2 tests.
whoops.
tests/fake/test_fake_publish.py (outdated):

```python
    else:
        exception = FileNotFoundError
    with pytest.raises(exception):
        upload_f = repo1.upload_file("nonexistent_file")
```
I think this is actually pointing to a difference between the real and fake clients.
Question: if you use the real client and you attempt to upload a file which doesn't exist, does it:
- (A) - raise an exception, or
- (B) - return a future which raises an exception
I think it might be (B), but you've made the fake client do (A).
Can you please investigate and ensure they behave the same?
Makes sense.
Hm, actually, it just raises an exception directly, because opening the file isn't wrapped in a future. See the code:

In file.py, upload_file calls _do_upload_file as:

```python
upload_complete_f = self._client._do_upload_file(upload_id, file_obj, name)
```

In client.py, it opens the file first, then wraps the upload in a future and returns:

```python
if not is_file_object:
    file_obj = open(file_obj, "rb")
LOG.info("Uploading %s to Pulp", name)
upload_f = f_flat_map(f_return(), lambda _: do_next_upload(hashlib.sha256(), 0))
```

So this is not completely non-blocking; trying to upload a file that doesn't exist will fail the whole process. Is this OK?
I am fine with either way, as long as both clients are consistent.
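If behavior (B) is what both clients should converge on, one way to get it is to do the open() as part of producing the future, so a missing file yields a failing future instead of raising directly. A hypothetical sketch, with `concurrent.futures.Future` standing in for the more_executors helpers:

```python
from concurrent.futures import Future


def upload_file_nonblocking(path):
    # Option (B): a missing file produces a *failing* future rather than
    # an exception raised in the caller's thread.
    out = Future()
    try:
        file_obj = open(path, "rb")
    except OSError as exc:
        out.set_exception(exc)
        return out
    # ...the real upload chain would continue with file_obj here...
    file_obj.close()
    out.set_result(["task1"])
    return out
```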
```python
    messages = caplog.messages
    # task's spawned and completed
    assert "Created Pulp task: task1" in messages
    assert "Pulp task completed: task1" in messages
```
As above, I'd like to suggest you wait for the upload deletion to happen and verify it too.
```python
from .match import match_object


Publish = namedtuple("Publish", ["repository", "tasks"])
Upload = namedtuple("Upload", ["repo_id", "tasks"])
```
I'm not quite sure if this API with Upload and upload_history is complete enough. I see a couple of issues with it:
Since it doesn't give you any info about what content was uploaded, all you can really check is whether the count of uploads is what you expected per repo. So, for instance, if you had a class which was generating content and uploading them to Pulp repos, you couldn't use the fake client to check anything about the content uploaded, whether they were uploaded with correct filenames, etc. It's really quite limited what you can do here. This isn't a deal breaker since it could potentially have more attributes added later.
The bigger issue is - what about other content types in the future? Right now, you've got upload_history and we only support file uploads. Later we will support other kinds of uploads. I'm thinking that it's quite possible different types of uploads ought to have different attributes collected in the history, so it may be a mistake to add one API which claims to support all kinds of uploads.
Can you please consider what to do about these issues? Also, it's always an option to leave out that new history API altogether now and let someone add it when they have a use-case for it.
```python
    # pylint: disable=unused-argument
    task = Task(id=self._next_task_id(), completed=True, succeeded=True)

    self._upload_history.append(Upload(repo_id, [task]))
```
Can you please make this fail if the repo doesn't exist, like a real client would?
I got this reply from Jay but now I can't find it in github for some reason:
> Is this really necessary?
> The use case for upload is to get the repository first and then call repo.upload_file.
> If the repo doesn't exist, it should fail at get_repository; do we need to double-check here?
I would say that it is indeed still necessary, since there can be any number of references to the same repository. I think there are already some autotests on the publish side which cover this.
For example you can have:
```python
repo1 = client.get_repository("some-repo").result()
repo2 = client.get_repository("some-repo").result()
repo1.delete().result()

# and now try to upload to repo2
# expected behavior: it should fail!
repo2.upload_file(...)
```
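One way for a fake client to satisfy this is to have every repository handle consult shared state, so a delete through one reference is visible to all the others. A hypothetical sketch (class and attribute names are assumptions, with `concurrent.futures.Future` standing in for the library's futures):

```python
from concurrent.futures import Future


class FakeRepository:
    # Hypothetical sketch: two handles to the same repo share state, so an
    # upload through one reference fails after a delete through another.
    def __init__(self, shared_state):
        self._state = shared_state  # e.g. {"exists": True}

    def delete(self):
        self._state["exists"] = False
        out = Future()
        out.set_result([])
        return out

    def upload_file(self, path):
        out = Future()
        if not self._state["exists"]:
            # Fail like the real client would on a removed repository.
            out.set_exception(RuntimeError("repository not found"))
        else:
            out.set_result(["task1"])
        return out
```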
```
    Each element of this list is a named tuple with the following attributes,
    in order:

    ``repository``:
```
Documentation doesn't match reality. The namedtuple you declared actually has a "repo_id" and not a "repository".
In this case, I think the documentation is better and the code should be fixed. Publish returns a repository object and not a repo_id, why should Upload be inconsistent with that? Can you please fix that to have a repository object as well?
copy-pasted and wrong :(
```
        passing `relative_url` is mandatory.

    Returns:
        Future[:class:`~pubtools.pulplib.Task`]
```
Documentation maybe doesn't match reality again - isn't it returning a future with a list of tasks, not a single task?
copy-pasted and wrong again :(
```python
from .match import match_object


Publish = namedtuple("Publish", ["repository", "tasks"])
Upload = namedtuple("Upload", ["repository", "tasks", "unit_type_id", "unit_key"])
```
I think this is not too good either: exposing unit_type_id and unit_key here encourages devs to write tests against Pulp2 implementation details where it's not necessary to do so. upload_file uploads a file "somehow", and the specifics of unit_type_id and unit_key are intentionally hidden; that's undermined by exposing them here like this.
If we are adding more attributes here I think they should try to be generic so as to expose as few Pulp2 implementation details as possible, while still meeting the requirements. So, what requirement is being met by these?
I think it's probably: allowing the developer of a test to verify that correct content was uploaded to correct path. Then I think it should be done more directly, like exposing "name" and ("sha256" or "content") attributes.
If you expose unit_key here, technically developers can look at unit_key["checksum"] to get the checksum. But that kind of implementation detail - "On Pulp, ISO units have a 'checksum' in unit key" - is exactly what this library is meant to be hiding.
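A hypothetical shape for a more generic history entry along these lines, recording what a test actually needs (what content went where) without leaking Pulp2 details:

```python
from collections import namedtuple

# "name" and "sha256" are the suggested generic attributes; the field set
# here is an illustration, not the library's settled API.
Upload = namedtuple("Upload", ["repository", "tasks", "name", "sha256"])
```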