This repository has been archived by the owner on Jan 25, 2023. It is now read-only.

Merge pull request #7 from moj-analytical-services/gb
Gb
George Kelly committed Feb 26, 2021
2 parents 7093762 + 2dfdfe4 commit 4fe81d7
Showing 7 changed files with 93 additions and 50 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pythonpackage.yml
@@ -7,7 +7,7 @@ jobs:
strategy:
max-parallel: 4
matrix:
- python-version: [3.6, 3.7]
+ python-version: [3.6, 3.7, 3.8, 3.9]

steps:
- uses: actions/checkout@v1
21 changes: 21 additions & 0 deletions README.md
@@ -75,3 +75,24 @@ sc = {
```

The `sc` parameter is a dictionary of key/value pairs in which each key is a column name (in the data to be generated) and each value is a string naming the Faker provider method that this package calls under the hood ([this line shows how](data_generator/data_generator.py#L103)). To find what other specific strings you can generate (e.g. address, name, last_name), see [Faker providers](https://faker.readthedocs.io/en/stable/providers/baseprovider.html).


## Setting a seed

To get the same data on every run, set the seed.

```python
mf = MetaFaker(meta=meta, special_cols=sc)

mf.seed = 888
```
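
The effect of a fixed seed can be illustrated with the standard library alone. The sketch below is not how `MetaFaker` implements seeding (the source does not show that); `make_rows` is a hypothetical stand-in that draws values from a seeded `random.Random`, only to show that the same seed reproduces the same rows.

```python
import random


def make_rows(seed, n=3):
    """Hypothetical stand-in for a seeded generator (not MetaFaker itself)."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    return [
        {
            "my_int": rng.randint(10, 20),
            "my_character_enum": rng.choice(["a", "b", "c"]),
        }
        for _ in range(n)
    ]


first = make_rows(888)
second = make_rows(888)

# Same seed, same sequence of pseudo-random draws, same rows
print(first == second)
```

A different seed would normally give different rows, which is why the expected CSV in the test suite is tied to one specific seed value.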

## Changing the locale

As of `v0.0.4`, this package defaults to the British locale (`en_GB`), so special columns use British-style values where they are available (e.g. address). To change this, pass a `locale`:

```python
mf = MetaFaker(meta=meta, special_cols=sc, locale="en_US")
```

For a full list of locales, see https://faker.readthedocs.io/en/master/locales.html.
4 changes: 3 additions & 1 deletion data_generator/data_generator.py
@@ -15,7 +15,7 @@ def __init__(self, meta: dict, **kwargs):
"""
self.columns = meta["columns"]
self.special_cols = kwargs.get("special_cols", {})
- self.fake = Faker()
+ self.fake = Faker(kwargs.get("locale", "en_GB"))
self.default_min = kwargs.get("default_min", -1000)
self.default_max = kwargs.get("default_max", 1000)
self.null_probability = kwargs.get("null_probability", 0.1)
@@ -98,9 +98,11 @@ def fake_character(self, special_type: Optional[str] = None) -> str:
By default uses faker to return 1 - 10 random words with normal spacing.
If special_type is given then the special type of faker property is called.
E.g. if special_type = 'email' then the function would return fake.email().
+ Also removes new line characters from special types.
"""
if special_type:
value = getattr(self.fake, special_type)()
+ value = value.replace("\n", " ")
else:
value = " ".join(self.fake.words(randint(1, 10)))
return value
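
The dispatch in `fake_character` can be sketched without Faker installed. `StubFaker` below is a hypothetical stand-in with hard-coded return values (real Faker providers return random values), and the fixed word count of 3 replaces the `randint(1, 10)` of the original. The function mirrors the diff: look the method up by name with `getattr`, call it, then replace newlines so that a multi-line value such as an address stays on one line.

```python
from typing import Optional


class StubFaker:
    """Hypothetical stand-in for faker.Faker with fixed return values."""

    def email(self) -> str:
        return "someone@example.com"

    def address(self) -> str:
        # Illustrative multi-line value, as an address provider might return
        return "7 Stanley summit\nAbdulview\nTQ8E 3QR"

    def words(self, nb: int) -> list:
        return ["word"] * nb


def fake_character(fake, special_type: Optional[str] = None) -> str:
    if special_type:
        # Look the provider method up by name, then call it
        value = getattr(fake, special_type)()
        # Flatten multi-line values onto a single line
        value = value.replace("\n", " ")
    else:
        # Fixed count here; the original picks 1 to 10 words at random
        value = " ".join(fake.words(3))
    return value


flat_address = fake_character(StubFaker(), "address")
print(flat_address)
```

Keeping each value on a single line matters for the CSV test fixture, where an embedded newline would otherwise split one record across several lines unless the field is quoted.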
80 changes: 49 additions & 31 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "data_generator"
- version = "0.0.3"
+ version = "0.0.4"
description = "Generates data from etl manager meta data"
authors = ["Karik Isichei <karik.isichei@digital.justice.gov.uk>"]
license = "MIT"
@@ -18,7 +18,7 @@ classifiers = [

[tool.poetry.dependencies]
python = "^3.6"
- Faker = "^4"
+ Faker = "^6.0"

[tool.poetry.dev-dependencies]
pytest = "^3.4"
22 changes: 11 additions & 11 deletions tests/data/output/seed_test.csv
@@ -1,11 +1,11 @@
- my_int,my_character_enum,my_email,my_datetime
- ,b,ruth65@jackson-long.info,2013-06-13 05:11:07
- 16,b,alexcarpenter@yahoo.com,1995-04-30 10:23:29
- 16,c,jeremyfranklin@sosa.info,2017-10-15 20:25:05
- 17,a,heather83@smith-moss.net,1991-12-27 06:57:23
- 18,c,mvalentine@hotmail.com,1980-03-28 07:31:18
- ,a,uwalker@hotmail.com,1984-04-21 18:36:57
- 13,c,wstevenson@hotmail.com,1992-11-08 17:18:12
- 11,c,zwise@yahoo.com,1972-10-21 20:26:53
- 10,a,harringtonapril@yahoo.com,1973-05-18 07:35:46
- 11,b,jacksonkyle@nguyen.com,1991-03-13 15:48:11
+ my_int,my_character_enum,my_email,my_datetime,my_address
+ ,b,fjackson@yates-henderson.com,2017-11-25 14:22:40,Studio 64y Kimberley pines Patriciabury EX3 9NN
+ 20,c,msmith@booth.com,2005-03-26 21:30:35,7 Stanley summit Abdulview TQ8E 3QR
+ 16,b,charliehill@hotmail.com,2020-04-27 19:38:40,5 Paula trail Stewartbury B8 7WD
+ 20,b,hollowaybradley@thompson.com,1986-11-03 05:42:53,Flat 4 Jemma unions North Craigberg B1A 9UG
+ 14,a,norman87@hotmail.co.uk,1975-08-24 19:44:41,6 Beth union Gavinport M9 4QJ
+ 11,b,carlynaylor@gmail.com,2001-06-14 21:54:05,036 Reed pine Marcstad WC74 9EE
+ 10,c,shaun44@hotmail.co.uk,2019-08-07 07:34:51,Flat 0 Janice loop Port Kim BB4 8PA
+ 13,c,seanowen@yahoo.com,2009-04-06 15:46:45,2 Mary plaza Joannahaven TQ66 5AW
+ ,c,thomasleon@morgan.biz,1977-10-12 10:23:50,Flat 3 Joel bypass Lake Mathew B4E 7BE
+ 20,a,bradleyrhys@pugh-turner.com,2014-11-11 09:09:19,059 Pamela lake Quinnfurt N50 8UG
10 changes: 6 additions & 4 deletions tests/test_meta_faker.py
@@ -30,8 +30,9 @@ def test_readme():
row = mf.generate_row()

# Make checks against row
- assert row["my_int"] >= 10 or row["my_int"] is None
- assert row["my_int"] <= 20 or row["my_int"] is None
+ if row["my_int"] is not None:
+     assert row["my_int"] >= 10
+     assert row["my_int"] <= 20

assert row["my_character_enum"] in ["a", "b", "c"]
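
A likely reason for restructuring the `my_int` assertions (the commit itself does not state one) can be reproduced in isolation. In Python 3, `None >= 10` raises `TypeError` before the `or` can fall through to the `is None` check, so the old form errors whenever the generated value is null. A minimal sketch with a hand-written `row` (hypothetical data, not generator output):

```python
row = {"my_int": None}  # hypothetical row with a null value

# Old form: the comparison is evaluated first and raises on None
try:
    assert row["my_int"] >= 10 or row["my_int"] is None
    old_form_raised = False
except TypeError:
    old_form_raised = True

# New form: guard first, compare only when a value is present
if row["my_int"] is not None:
    assert 10 <= row["my_int"] <= 20

print(old_form_raised)
```

Swapping the operands of the old form (`row["my_int"] is None or row["my_int"] >= 10`) would also avoid the error; the explicit `if` guard used in the commit reads more plainly.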

@@ -177,12 +178,13 @@ def test_seed():
{"name": "my_character_enum", "type": "character", "enum": ["a", "b", "c"]},
{"name": "my_email", "type": "character",},
{"name": "my_datetime", "type": "datetime",},
+ {"name": "my_address", "type": "character",},
]
}

- sc = {"my_email": "email"}
+ sc = {"my_email": "email", "my_address": "address"}

- mf = MetaFaker(meta=meta, special_cols=sc)
+ mf = MetaFaker(meta=meta, special_cols=sc, locale="en_GB")

mf.seed = 888
