Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Template functionality #45

Closed
KevinGimbel opened this issue May 31, 2017 · 22 comments
Closed

Proposal: Template functionality #45

KevinGimbel opened this issue May 31, 2017 · 22 comments
Assignees
Labels

Comments

@KevinGimbel
Copy link
Collaborator

fakedata template specs

This issue describes the upcoming template feature for fakedata and is the basis of discussion around implementation

Synopsis

Despite being powerful already, fakedata will benefit from a template mechanism. With the underlying Go template engine we can implement a easy-to-use template mechanism which allows users to generate fake datasets in all kinds of formats, regardless of build-in formatters like sql, csv, et all.

Go Templates allow to combine values with printf so we get a way to combine generators our of the box, e.g. {{ printf "%s, %s" Lastname Firstname }} to print out names in an alternative format to what we are used to in Germany (and other countries).

Command Line Interface changes

In order to support templates we need to add a new "top level" command to fakedata which can accomplish the following tasks:

  1. Read a file from disk
  2. Parse the file
  3. Run data generators over the provided template in a loop
  4. Write the combined output to Stdout or to a file

By introducing a new top level command we avoid interference with existing generators and separate this functionality from the "direct" generator invocation.

Template usage

All generators are available as template functions and can be used inside the template to generate data as seen below:

{{ Name }} <{{ Email }}>

The example above will generate Name and E-Mail pairs like so:

Jane Doe <jane@doe.com>
Jon Doe <jon@doe.com>

As mentioned above we can now also combine generators inside the templates which gives a huge advantage over using generators in a "as-is" way. Users can define their own format or create specific datasets if needed.

Additional template functions

Go templates have a FuncMap which is a mapping of functions available inside the parsed template. We either need to manually add all generators or find a way to automatically make them available.

I would like to propose the addition of functions like Last, First, Odd, and Even. I believe these "little helpers" can provide a great interface when generating data, e.g. for adding a final semicolon to the end of an SQL statement or to close a JSON block properly.

{
  {{ range . }}
    {
      "name": "{{ Name }}",
      "email": "{{ Email }}",
    } {{ if not Last }},{{end}}
  {{ end }}
}

Running the template generator on the template will create a file with valid JSON as seen below:

{
  {
    "name": "Kevin",
    "email": "mail@nota.website"
  },
  {
    "name": "Emma",
    "email": "emma@nota.website"
  }
}

So this is my proposal written off the top of my head. Are there any things I missed? Feel free to add your input, I believe I will start work on Friday or next week (Monday/Tuesday).

@KevinGimbel
Copy link
Collaborator Author

KevinGimbel commented Jun 1, 2017

Small Update: I started to work on the template thing yesterday and got a (very early) version to work. I'll keep developing it in my branch.

As of now it can handle templates like these:

{{- range $i, $j := (Loop 5) -}}
    {{ if Odd $i }}
        Odd = {{ $i }}
    {{- end -}}
    {{ if Even $i }}
        Even = {{ $i }}
     {{- end -}}
{{- end -}}
<table>
<thead>
    <tr>
        <td>ID</td>
        <td>Type</td>
        <td>Reported by</td>
    </tr>
</thead>
<tbody>
{{ range Loop 5 }}
    <tr>
        <td>{{ Int 12 15 }}</td>
        <td>{{ Enum "issue" "feature" "documentation" }}</td>
        <td>{{ Generator "email" }}</td>
    </tr>
{{ end }}
</tbody>
</table>

The Generators are currently (all) available via {{ Generator "genname" }}, tho this is a horrible API IMO. Some of the generators that take extra arguments need to have their own functions, like Enum and Int.

Here's a asciinema showing me execute the two templates. https://asciinema.org/a/5kkf2d7k4qlalklmpx0mu2row

@lucapette
Copy link
Owner

@KevinGimbel I'm sorry it took me a couple of days to reply (busy times!) and I feel a bit bad about the fact I'm not sure this is the right direction. The thing is: I'd like to keep the API as simple as possible and the fact we will have a range . and look inside the template makes the simplest format harder to write.

I've looked briefly at https://github.com/peteraba/fakie and the format seems nice to me. What do you think of that approach? I'm still up for introducing a more advanced format/set of APIs but I'd like to try avoiding a complex format for simple scenarios like:

$ echo '{"email":"{{email}}", "subject": "welcome!"}' | fakedata

As far as I understand we'd need to loop inside the template only with this proposal? I wonder if we could offer both approaches (like in "simple templates by row" and "complex ones where you loop on your own). What do you think?

@KevinGimbel
Copy link
Collaborator Author

Thank you for your feedback!

As far as I understand we'd need to loop inside the template only with this proposal?

As far as the implementation goes this is true. I am not happy with the fact a loop is required inside the template yet, it was simply the fastest approach.

I wonder if we could offer both approaches

I hope so! I am not yet sure how this would work but it's on my "list". Maybe we can tweek the template implementation so that it will loop when a) --limit is set or b) there's no range inside the template - I am not yet sure how this is done.

I'll take a look at fakie and see how they do the template implementation.

Some changes and additions I have in mind:

  • loop by default (maybe no-loop flag? / detection of range?)
  • Unified way to access generators - currently there's a handful implemented and the rest behind the Generator function
  • reading from Stdin - this is really nice to have
  • proper error handling

The current state is the result of 1-2h of work, so there's still a good bit of work ahead. I didn't open a PR so far because it's really just a first basic implementation - I'll keep working on it as I have the time and comment/update this issue along the way.

This was referenced Jun 3, 2017
@jorinvo
Copy link
Contributor

jorinvo commented Jun 3, 2017

Hi,
I like what you have so far!

  • I think the range functionality doesn't really need any special treatment. The default can stay as it is, generating lines of output, and When someone likes to control the looping manually because there is more going on as in the HTML example above, then a --limit 1 should be enough to restrict the template to be only used once and the looping can be done with range instead.

  • It should be definitely possible to automatically generate the FuncMap from the existing map of generators. I don't think we manually need to specify them.

  • And +1 for stdin!

@lucapette
Copy link
Owner

@jorinvo I don't think I understand what you mean about the range functionality. --limit 1 sounds like we'd be hacking the options :)

@KevinGimbel

loop by default (maybe no-loop flag? / detection of range?)

Loop by default is the only thing I have a strong opinion about. It's a much simpler/better user experience that way so I can't agree more. Detection of range doesn't sound that easy but, if it's possible, that's a great feature.

Unified way to access generators - currently there's a handful implemented and the rest behind the Generator function

Not sure I fully understand this. Probably an implementation detail? I generally get to the code only when I really understand what the feature is supposed to to. So my bad if I'm not getting it!

reading from Stdin - this is really nice to have

I can't agree more. I would still suggest we ignore it until the first implementation of the feature reaches master. But it's definitely something I'd like to add.

proper error handling

👍

@jorinvo
Copy link
Contributor

jorinvo commented Jun 3, 2017

@jorinvo I don't think I understand what you mean about the range functionality.

Sorry, maybe an example is more clear:

echo '{{range Loop 3}} {{name}} {{email}}{{end}}' | fakedata -l 3

That would output 9 lines:

Lenard Newton mattbilotti@example.us
Ambrose Arnold koridhandy@example.name
Olin Lyons aaronalfred@test.me
Drema Graves jarjan@test.com
Ching Hines xtopherpaul@example.name
Karyl Pittman arindam_@example.name
Aliza Morgan taylorling@test.org
Ezekiel Poole andrewcohen@test.me
Phylicia Hayes aaroni@example.org

And this one 3:

echo '{{range Loop 3}} {{name}} {{email}}{{end}}' | fakedata -l 1

Lenard Newton mattbilotti@example.us
Ambrose Arnold koridhandy@example.name
Olin Lyons aaronalfred@test.me

I can use the second example, if I like to handle all the looping myself.

For each "line" fakedata generates, I can still call as many generators in the template as I like right?

Then I could do something like this:

echo '<table>{{range Loop 3}}
  <tr>
    <td>{{name}}</td><td>{{email}}</td>
  </tr>
{{end}}</table>' | fakedata -l 1

to do this:

<table>
  <tr>
    <td>Lenard Newton</td><td>mattbilotti@example.us</td>
  </tr>
  <tr>
    <td>Ambrose Arnold</td><td>koridhandy@example.name</td>
  </tr>
  <tr>
    <td>Olin Lyons</td><td>aaronalfred@test.me</td>
  </tr>
</table>

Does this makes sense or am I missing something?

@lucapette
Copy link
Owner

I see! Thank you for the much more detailed explanation! Very much appreciated.

I think this would be too much overhead on the user (and it exposes the internals of Go templating too in a way). For simple use cases like:

echo '{{range Loop 3}} {{name}} {{email}}{{end}}' | fakedata -l 3

I really think users should be able to write:

echo '{{name}} {{email}}' | fakedata -l=3

The basic idea is that in most cases (maybe a personal opinion, I'm not so sure about it), users won't need to control the looping so I'd say we have to serve the most common case with the simplest feature. I think the fact that similar tools use this approach is a good indication that we shouldn't complicate the simplest use case.

I'm even thinking about dropping (in the first iteration of the feature) the support for a "range template" (this feature needs a name :)) so that we get the first version of this into master much faster. It would unlock 1.0.0 and give the opportunity of restructuring the docs (I think without knowing how templates look like, it's not worth it to re-organize the README).

To recap, this is what I would do:

  • First feature that loops on the template automatically. Template specified with an arg --from-template or something
  • Automatic range detection or new parameter that does --from-full-template (naming isn't gresat here)
  • Automatic stdin detection

The order of the last two features isn't relevant though (I think we could do them in parallel which makes it nicer).

@jorinvo
Copy link
Contributor

jorinvo commented Jun 3, 2017

The first example you gave will yield 9 lines. The second 3.
Also, only because the first one is possible, it doesn't stop the user from still using the second version.
And range is not a feature we need to implement. It's just there, already included in the template package.
What I try to suggest here is, that we don't need to do anything special about range. The way it already exists works already well together with fakedata.

@lucapette
Copy link
Owner

lucapette commented Jun 3, 2017

@jorinvo I'm not following you then. Because all your examples do use a range keyword and my point is: by default we should not ask users to come up with such a complex template. If what you're saying is that range always needs to be part of the template, then we just disagree on how the feature should look like :)

I know range is part of the Go syntax and I think that may be the source of confusion here. @KevinGimbel and I weren't debating how to implement loops in Go templating (I think we all know how it works but if we don't that's fine, we'll learn by doing :)). We discussed the public API for users. And I think we agree we don't want them to specify a loop unless they really need something special.

So maybe we're just saying the same thing? By default, no need to specify range, use it if you need a complex template.

@jorinvo
Copy link
Contributor

jorinvo commented Jun 3, 2017

That's basically what I'm saying. There is no need to use range.

My argument was about the issue of adding extra flags / ways of detecting if range is used or not.
I think that the combination of range and --limit allow for all possible customization a user might want to do and we don't need to add any extra functionality here.

@lucapette
Copy link
Owner

@jorinvo 👍

"communication is hard" :)

@KevinGimbel
Copy link
Collaborator Author

Quite some input! Thanks @lucapette and @jorinvo.

What I get out of the discussion is the following:

  • Use --limit for template looping (default)
  • Document use of Go template range and --limit 1
  • Find a way to generate FuncMap
  • Read from Stdin

@lucapette
Copy link
Owner

@KevinGimbel 👍 as it's a lot of work I'd take out reading from stdin from the first take but not saying no to it if you feel like we can do it in one big PR :)

@jorinvo
Copy link
Contributor

jorinvo commented Jun 6, 2017

I think ioutil.ReadAll(os.Stdin) is even simpler than adding a new flag to specify a file, validate it and read the file's content :)
Or would you like to support both ways?

@KevinGimbel
Copy link
Collaborator Author

@lucapette @jorinvo I'll look into reading os.Stdin. I think it is not too complex after all. :)

@lucapette
Copy link
Owner

@jorinvo @KevinGimbel I wasn't specifically talking about the production code but more all the things around building it (docs, integration tests, and so on).

Reading from stdin per se is not that hard, I agree. If we support both ways (pipe and argument) then we'd need some automated detection of input coming from stdin (right? I don't think I'm complicating it but maybe I'm missing something).

I'm thinking we should support offer both approaches otherwise we'd be forcing users to pipe templates even when they have it stored in a file which may be not as nice as fakedata --from-template=/tmp/whatever.tmpl

@jorinvo
Copy link
Contributor

jorinvo commented Jun 6, 2017

I agree, if you plan on supporting both, it's probably simpler to only implement one first.

KevinGimbel added a commit that referenced this issue Jun 7, 2017
This commit adds the abillity to use cli pipes to pipe templates into
`fakedata`. It also adds all generators as named functions to the
template funcMap.
@KevinGimbel
Copy link
Collaborator Author

KevinGimbel commented Jun 7, 2017

Update: CLI pipelines and templates work now. By default the limitFlag is used and the template is executed multiple times. If a range inside the template is needed there's a Loop function.

I added a few comments to the commit to explain a few changes, see: 5508aa1

Well the biggest change is that I opted in for defining all generators by hand as you can see here. I spent a few hours trying to load/assign the generators automatically but it just did not work. I created a loop to go over all generators and then tried to assign their .Func to the template - this worked in theory but had a weird side effect: Only one function (.Func) would be assigned to all named functions. So {{ Name }}, {{ Int }} could produce a template with 2 values from Enum - I tried to understand what was going on but couldn't find a solution to the propblem.

In the end - and because some functions needed to be declared by "hand" anyway, I decided to just add the generators by myself. This way we also have a bit more control over them and can handle custom params like with Enum (see: https://github.com/lucapette/fakedata/blob/feat/45/templateGenerator/pkg/fakedata/template.go#L116-L119).

Showcase:

@KevinGimbel
Copy link
Collaborator Author

You may have seen that the Travis CI build is still failing. I'm on it. 😅 https://travis-ci.org/lucapette/fakedata/jobs/240295071

@lucapette
Copy link
Owner

@KevinGimbel this is looking pretty promising! Do you mind opening a PR already and mark it as wip? I think I'd get a better sense of how to plan shipping it in the context of #43 and #48 (seems hard!

@KevinGimbel
Copy link
Collaborator Author

@lucapette I created a pull request, see #49

@KevinGimbel
Copy link
Collaborator Author

Closing this issue because the PR has been merged. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants