Reset increment #1263

Closed
rgoubet opened this issue Sep 15, 2022 · 6 comments

@rgoubet

rgoubet commented Sep 15, 2022

Feature request

Unless I missed it, there doesn't seem to be a way to reset increments: if you generate data several times with the same schema, increments will pick up from the previous creation:

from mimesis import Field, Schema

_ = Field()
schema = Schema(schema=lambda: {
    "id": _('increment'),
    'name': _('full_name')})

for i in range(0,5):
    data = schema.create(5)
    print(data[0]['id'])

This returns:

1
6
11
16
21

Thesis

There should be an option to reset the increment each time data is generated.

Reasoning

When creating large amounts of data to export several times, you don't necessarily want increments to become huge.

@lk-geimfari
Owner

lk-geimfari commented Sep 15, 2022

Hi! Actually, there is an accumulator argument for such cases: https://mimesis.name/en/master/api.html#mimesis.Numeric.increment

Here is a usage example:

>>> from mimesis import Numeric
>>> numeric = Numeric()
>>> numeric.increment()
1
>>> numeric.increment(accumulator="a")
1
>>> numeric.increment()
2
>>> numeric.increment(accumulator="a")
2
>>> numeric.increment(accumulator="b")
1
>>> numeric.increment(accumulator="a")
3
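
To make the behaviour in the session above concrete, here is a minimal pure-Python model of named accumulators. It is only a sketch of the semantics, not how mimesis implements `increment`; the class name `NamedCounters` and the `"default"` key for the unnamed accumulator are assumptions made for illustration.

```python
from collections import defaultdict


class NamedCounters:
    """Toy model of named accumulators; NOT mimesis's actual implementation."""

    DEFAULT = "default"  # hypothetical key used when no accumulator is named

    def __init__(self):
        # Each accumulator name maps to its own independent counter.
        self._values = defaultdict(int)

    def increment(self, accumulator=None):
        key = accumulator if accumulator is not None else self.DEFAULT
        self._values[key] += 1
        return self._values[key]


counters = NamedCounters()
sequence = [
    counters.increment(),
    counters.increment(accumulator="a"),
    counters.increment(),
    counters.increment(accumulator="a"),
    counters.increment(accumulator="b"),
    counters.increment(accumulator="a"),
]
print(sequence)  # [1, 1, 2, 2, 1, 3]
```

Each name keeps its own count, so using a fresh name gives a sequence that starts again from 1.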

@lk-geimfari
Owner

lk-geimfari commented Sep 15, 2022

In your case, you are using schemas the wrong way.

Instead of doing this:

for i in range(0,5):
    data = schema.create(5)
    print(data[0]['id'])

Do this:

for i in schema.create(5):
    print(i['id'])

lk-geimfari self-assigned this Sep 15, 2022
lk-geimfari added the question label Sep 15, 2022
@rgoubet
Author

rgoubet commented Sep 15, 2022

In your case, you are using schemas the wrong way.

In my code example, I'm trying to create 5 fulfilled schemas (that I could then export 5 times) based on the same logical schema. Here, I cannot use a new accumulator each time unless I instantiate a new Schema object each time.

@lk-geimfari
Owner

@rgoubet Sorry, I don't get the idea. Can you please illustrate it with an example?

@rgoubet
Author

rgoubet commented Sep 26, 2022

My use case is that I want to create multiple large random data sets in Excel files (generated with openpyxl) for stress-testing purposes. So, let's say I want to create 5 files with 1 million rows each (I show 5 columns here for readability, while in practice I have 30):

import os

from mimesis import Field, Schema
from openpyxl import Workbook

_ = Field()

schema = Schema(schema=lambda: {
    "id": _('increment'),
    "timestamp": _('datetime'),
    'version': _('version'),
    'e-mail': _('person.email', domains=['argenx.com']),
    'token': _('token_hex'),
})

Now, I'll run a loop for each file, and use the iterator to preserve memory:

for i in range(0,5):
    wb = Workbook(write_only=True)
    ws = wb.create_sheet()
    for ix, v in enumerate(schema.iterator(1_000_000)):
        if ix==0:
            ws.append(list(v.keys())) # write headers once, before the first row
        ws.append(list(v.values())) # write data (including the first row)
    # `path` is the output directory, defined elsewhere
    xl_file = os.path.join(path, f'data{str(i).zfill(3)}.xlsx')
    wb.save(xl_file)
    wb.close()

This all works, except that the id column keeps incrementing from one file to the next instead of restarting from 1. That could have been a problem for me, because the values can grow larger than the data type I want to use allows (it turned out OK in the end).

As I said, maybe I missed something, but it would be nice to have a reset option (e.g. in the create and iterator methods) for the increments. Not critical at all, though.
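
Until such an option exists, one possible workaround is to renumber the rows after they are generated, independently of mimesis. The sketch below assumes each row is a plain dict with an "id" key, as in the schema above; `renumber` is a hypothetical helper, not part of mimesis, and the sample rows stand in for what `schema.iterator(...)` would yield.

```python
def renumber(rows, key="id", start=1):
    """Yield a copy of each row with `key` replaced by a fresh counter."""
    for n, row in enumerate(rows, start=start):
        # {**row, key: n} keeps the original column order and does not
        # modify the input row.
        yield {**row, key: n}


# Stand-in for generated rows whose ids continue from an earlier run.
rows = [
    {"id": 6, "name": "a"},
    {"id": 7, "name": "b"},
    {"id": 8, "name": "c"},
]
renumbered = list(renumber(rows))
print([row["id"] for row in renumbered])  # [1, 2, 3]
```

Because `renumber` is a generator, it can wrap `schema.iterator(1_000_000)` in the per-file loop without holding all rows in memory, and every file then starts again from 1.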

@stale

stale bot commented Jun 18, 2023

This issue has been automatically marked as stale because it has not had activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jun 18, 2023