Feature Request: Fake Binary Content #603

Closed
shahar4499 opened this issue Sep 17, 2017 · 12 comments · Fixed by #1069

Comments

@shahar4499

Let's say I want to write unit tests for my web service that has a file upload feature.
It would be great if I could fake random binary content for popular binary file types, especially images and documents, like:

  • *.jpg
  • *.pdf
  • *.doc/docx

I think image uploading is probably the most common type of uploaded binary content, so it may be a good place to start.

Why would someone need this?
This could be great mainly for testing file upload APIs for several reasons.
It's way easier if the unit tests can generate fake binary content, so they will not depend on actual fixed binary files (and syncing them between teammates, YUCK).

Some use case examples:

  • A user uploads any format of an image (jpg, png, bmp, etc.) and the server needs to recognize the image type (by its binary structure, headers etc.) and store/convert/compress/whatever according to it.
  • A user uploads a document in a popular format (.pdf, .docx, .pages) and the server needs to read the content for parsing/analysis/whatever.
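
To make the first use case concrete, here is a minimal sketch of generating a small but structurally valid PNG with only the standard library, so that a server-side type check based on the binary header has something realistic to inspect. The function name `fake_png` and its parameters are hypothetical and are not part of Faker's API.

```python
import random
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def fake_png(width: int = 8, height: int = 8, seed=None) -> bytes:
    """Return the bytes of a random 8-bit RGB PNG (hypothetical helper)."""
    rng = random.Random(seed)
    # IHDR: width, height, bit depth 8, color type 2 (RGB),
    # compression 0, filter 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter type 0, followed by 3 bytes per pixel.
    rows = b"".join(
        b"\x00" + bytes(rng.getrandbits(8) for _ in range(width * 3))
        for _ in range(height)
    )
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(rows))
        + _chunk(b"IEND", b"")
    )
```

Passing a fixed `seed` makes the output reproducible, which tends to be convenient in unit tests.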
@shahar4499
Author

There's a couple of code samples I found to get an idea of how we can accomplish this for images.
These depend on Pillow (a fork of the Python Imaging Library that has better support and an easier API).

@malefice
Contributor

@fcurella I will try and take a swing at this, but my main concern is the additional dependencies. Do we keep dependencies like Pillow optional?

@malefice
Contributor

I am already done with fake zip and tar files, both with options to specify the number of files, the minimum file size per file, the total uncompressed size, and compression type to use. Hopefully you can give me feedback regarding additional dependencies @fcurella.

@mabuelhagag

@malefice Is there a PR for this?

@malefice
Contributor

@mabuelhagag nope, it is still in my local branch. My other PR #1052 has some significant changes, and I would rather have that merged first, so that my PR for this will already be rebased properly.

I think it has been a busy last couple of days in the US because it is almost Thanksgiving and Black Friday, so I suggest checking back later.

@fcurella
Collaborator

fcurella commented Dec 5, 2019

I think the best approach would be to have those providers (i.e., the ones requiring additional deps) as external providers, and link to them from the "community providers" page.

@fcurella fcurella reopened this Dec 5, 2019
@mabuelhagag

I agree with @fcurella.

@malefice if you have code or pointers on how to approach this, I will help.

@n1ngu
Contributor

n1ngu commented Aug 26, 2020

I'd love to see this implemented.

@malefice Regarding additional dependencies, I think this could be addressed with setuptools "Extras" https://setuptools.readthedocs.io/en/latest/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies

If someone depends on, let's say, faker[img, pdf], that would tell pip to install whatever Faker needs to generate those blobs. Pillow and WeasyPrint, for example.

Then, at import time every extra faker can be enabled/disabled using blocks like

try:
  import PIL
except ImportError:
  # Pillow is not installed: skip the image faker,
  # or write a dummy faker that won't work with the same interface
  pass
else:
  # implementation of the image faker goes here
  ...

(There are many ways to get this working; I couldn't say which is best.)

This means that if an extra dependency happens to be installed and can be imported, the extra fakers would be functional regardless of why that dependency got installed.
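
One possible variation on the idea above, as a sketch only: detect the optional package without importing it, and fail with a clear message when a faker that needs it is called. The helper names `is_available` and `fake_image` and the `faker[img]` extra are hypothetical; on the packaging side, the extra would be declared through setuptools' `extras_require` keyword, for example `extras_require={"img": ["Pillow"]}`.

```python
import importlib.util
import io


def is_available(module_name: str) -> bool:
    """Return True when a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Evaluated once at import time, regardless of why Pillow got installed.
PILLOW_AVAILABLE = is_available("PIL")


def fake_image(size=(16, 16)) -> bytes:
    """Return PNG bytes of a solid-color image (hypothetical helper)."""
    if not PILLOW_AVAILABLE:
        raise RuntimeError(
            "fake_image requires Pillow; with a hypothetical 'img' extra this "
            "would be installed via: pip install faker[img]"
        )
    from PIL import Image  # imported lazily so the module loads without Pillow

    buffer = io.BytesIO()
    Image.new("RGB", size, color=(255, 0, 0)).save(buffer, format="PNG")
    return buffer.getvalue()
```

Raising at call time rather than silently skipping the faker keeps the interface stable and tells the user which extra is missing.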

@n1ngu
Contributor

n1ngu commented Aug 26, 2020

What bugs me is: what should the faker return to be usable? A file-like object? A byte array?

Byte arrays can have a heavy memory footprint for big files.

If we go for file-like objects, this https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile looks like a good default backend to me. But then, should the file backend be pluggable? Should Faker require the user to provide the file object?
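
As a rough sketch of the file-like option, the content could be written in chunks into a `tempfile.SpooledTemporaryFile`, which stays in memory up to `max_size` and spills to disk beyond it. The function name `fake_binary_file` and its parameters are hypothetical, and `os.urandom` stands in for whatever a real provider would generate.

```python
import os
import tempfile


def fake_binary_file(size: int, max_memory: int = 1024 * 1024,
                     chunk_size: int = 64 * 1024):
    """Return a file-like object holding `size` random bytes (hypothetical helper)."""
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory)
    remaining = size
    while remaining > 0:
        step = min(chunk_size, remaining)
        # Writing chunk by chunk avoids building one large bytes object.
        spooled.write(os.urandom(step))
        remaining -= step
    spooled.seek(0)  # rewind so callers can read from the start
    return spooled
```

The caller owns the returned object and should close it, for instance with a `with` block.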

How would this integrate with django file/image fields, for example? (I'm thinking about how this would be used in the factory-boy library to create django models and whatnot, but maybe it's not an issue for this project.)

@malefice
Contributor

@n1ngu, thanks for the suggestions. Currently, I am working on #1162, so if you would like to take a shot at this issue, feel free to submit a PR.

> What bugs me is: what should the faker return to be usable? A file-like object? A byte array?

Currently, the zip and tar provider methods return bytes, and the assorted provider methods for creating delimiter-separated values return strings. Internally, for buffering, the former use io.BytesIO and the latter use io.StringIO, so it should be easy to add support for returning those objects instead of bytes or strings if desired.
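
To illustrate the buffering approach described here, this is a simplified sketch built on the standard library's `zipfile` and `io.BytesIO`. It is not the actual provider code; `fake_zip`, its parameters, and the `as_file` switch are hypothetical.

```python
import io
import os
import zipfile


def fake_zip(num_files: int = 3, file_size: int = 128,
             compression: int = zipfile.ZIP_DEFLATED, as_file: bool = False):
    """Build a zip archive of random files in memory (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression) as archive:
        for index in range(num_files):
            archive.writestr(f"file_{index}.bin", os.urandom(file_size))
    if as_file:
        buffer.seek(0)  # hand back the rewound buffer as a file-like object
        return buffer
    return buffer.getvalue()  # default: raw bytes
```

Because the archive already lives in a `BytesIO` buffer, returning the buffer instead of its contents is a one-line change.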

> How would this integrate with django file/image fields, for example?

If you are using factory-boy with the current return values, you wrap the return value in a file-like object, and then use the from_file kwarg of factory.django.FileField or factory.django.ImageField. View the relevant docs for other options. It is up to you how you want to do it, which is why Faker just generates the raw content.

@github-actions

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Jun 14, 2022
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.

5 participants