Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEATURE REQ: concatenate fields #17

Closed
gnanet opened this issue May 19, 2017 · 21 comments
Closed

FEATURE REQ: concatenate fields #17

gnanet opened this issue May 19, 2017 · 21 comments
Labels

Comments

@gnanet
Copy link

gnanet commented May 19, 2017

I know the name generators do only english names, but i wanted to generate a hungarian name order with "lastname firstname" into one field, and found out that this is currently not possible.

This also leads to a possible new feature, that a language specific dict map may be loaded run-time with a parameter :) But let me go back to the original idea

The enum function allows freely definiable datasets, but sometimes it would be desired to combine at least two fileds into one, with a separator (that may also be empty)

I am in no means a programmer, but can read code and found there are working examples that already do this kind of data generation:
domain => domain.name,domain.tld,"."
email => username,domain,"@"
name => name.first,name.last," "

Maybe a generator similar to enum could implement the withSep function as a field definition, that is able to understand the 3 parameters, but the splitting of the parameters would need a bit thinking...

Anyway, thank you for this great tool, and that you in advance, if you find the time and motivation to work on this feature.

@gnanet
Copy link
Author

gnanet commented May 21, 2017

oh f..k, it looks like have done it, and it survives make test, and seemingly it works,

But i did not done a fork, and i dont know how to make a pull request for my changes.

also i added a nev generator for enabling hungarian name order, and changed the dict.go to contain hungarian first and lastnames

@gnanet
Copy link
Author

gnanet commented May 21, 2017

Created #18 but seems that with editing here on github, i could not to resolve the gofmt -s problem, that is reported by travisCI.
make ci worked for me locally, but that is a checkout of the original master of you, and i am also not experienced in git either to push my changes to my own fork on github :(

@gnanet
Copy link
Author

gnanet commented May 21, 2017

Closed Pull request #18 , and opened #19 this should also pass TravisCI tests

@lucapette
Copy link
Owner

@gnanet first of all, thank you very much for the feature request! It's really
cool to see some real world needs.

I'm very grateful you started working on a PR but somehow sad I couldn't manage
to provide some input before you started. I'll provide some specific feedback in
the PR itself.

I've been thinking about a way to provide a composed generator but, so far, I
couldn't find an API that satisfies me. I think your attempt is good and
therefore a good entry point for the conversation but I would like to provide a
more general API. What I have in mind is the following:

  • Come up with some syntax that let's users define how to generate a specific
    column
  • Make sure that the syntax we use allows users to use constraints on the
    generated data (arguably the most interesting feature about fakedata in the
    long term. Now it's just a little project:)
  • Do not introduce a new option (like we did for -t=SQL_TABLE). This point is
    important because I think it's relevant that fakedata keeps the API in the
    form of fakedata [...option] [...generator]
  • If we're introducing a new generator (I think we should!) then I would avoid
    the constraint the current suggestion has: in the current proposal it's
    possible to join only two fields together.

Given these points, I suggest we try something like:

$ fakedata composed,"TEMPLATE"

Where "TEMPLATE" follows a specific syntax. This approach checks the points I
mentioned but it does come with some tricky points that need discussion:

  • composed isn't great naming. I thought of template too but I like none so
    far. Suggestions are more than welcome!
  • I would like to provide close to no syntax. As the less syntax there is, the
    more likely people will use the feature.
  • Building a composed (or whatever name we find together) generator opens the
    door to another feature I have in mind. I'd like to add aliases. The idea is
    that some generators are composed with existing generators (example: email
    composed ofusername and domain and a @). That means we can rewrite some
    of the generators using aliases (and use the syntax in the docs too). That way
    we could support some .fakedatarc that loads user-defined aliases. Maybe I'm
    taking the idea too far :) but I think it's worth considering at least the
    rewriting part when implementing the new generator

Of course, the hardest part is the syntax. My first suggestion would be taking
inspiration from phony (which, in turn,
greatly inspired fakedata) and go for something like:

$ fakedata email composed,"server{int,1..5}.{domain}"
test@example.com server3.test.com

I feel that requiring users to use quotes and surround fields with {}is kind
of cumbersome but it does meet all the requirements and we could consider this
kind of an advanced feature.

Personally, I'm not really happy with my own proposed syntax (sigh!) and I hope
to get some feedback about it and finally start working on it.

@KevinGimbel
Copy link
Collaborator

I thought about this for a while after working on the validations and I believe the most dynamic/open way would be to add template support where users could actually write a template file and we parse it. Go has a great standard template engine which isn't too complex to get into (from a users point of view).

By adding template support we could also add custom function for joining generators and then people could use these functions inside the templates to generate whatever type of datasets they want and join generators as needed.

I guess this is harder to implement but - in my opinion - brings the most flexibility in the long term.

@gnanet
Copy link
Author

gnanet commented May 21, 2017

Without deep knowledge about the mentioned template engine of Go, i would say that we should not re-implement the wheel, if we can build the data template syntax on top of it, then we should do it so.

Btw, i for at least 1,5 hours i was not aware what the purpose of generators.golden was, i just simply followed the information it contained, and while extending generators.go, i also added the relevant information. I did not know that this is the intergration test, that said long time i just could not understand why make test fails at that diff part, until i done the diff myself for the -g output and understood that a newline on the end of generators.golden made my test fail so much hours.... :)

I will continue thinking about the above mentioned points of the custom/complex/freeform generator (just to add some extra name ideas ;) ) and get back to you if i have useful to write.

@KevinGimbel
Copy link
Collaborator

@gnanet it's a good point you raised with the names. Names are hard to get right because there are so many different formats (like you said, in Hungarian it's lastname firstname). I believe we can come up with a good solution and the diversity you add to the project is already valuable! :)

@KevinGimbel
Copy link
Collaborator

@gnanet also in case you want to read up on Go templates here's the package documentation: https://golang.org/pkg/text/template/

@gnanet
Copy link
Author

gnanet commented May 21, 2017

@KevinGimbel yesterday i tried to be smart by reading the docs, but the golang docs seem dry to me like the data descriptions from the RFC about any internet protocol. I think i can learn better trough examples, but i will surely find my way ;)

Right now an idea also came into my mind, that a separate ticket should be opened about the functionality and purpose of .fakedatarc

@lucapette
Copy link
Owner

@KevinGimbel I'm not sure I understand how your suggestion would work in practice. I don't have a strong opinion on the syntax we use but I'd like to keep the API on the commandline as simple as possible. If what you're proposing is harder to implement but easier to use for the end user then we shouldn't care it's harder to do :) Do you mind sharing some example of how your proposal would look like?

As far as I understand, it would be something like:

$ fakedata --input=template.txt

which is different from what I have in mind but I see the point. We could still provide a shortcut for that with an inline (maybe better name) generator.

@gnanet I wrote about integration tests here. There's a short explanation of what Golden files are too.

@KevinGimbel
Copy link
Collaborator

@lucapette I made a quick-n-dirty go snippet to showcase the template approach. https://play.golang.org/p/5qQYBqnWax

The todos would basically be:

  • Implement a Template mechanism around text/template
  • Implement template helpers (lowercase, join, split, etc.) and make all generators available inside templates
  • Document (advanced) Template usage

The CLI usage would still stay the same because the template feature builds upon but does not change the existing generators (at least I believe it's that way). People who just need a quick set of data can still run fakedata name email int,10..1000 and get their data. People who want more control or need a different format (e.g. JSON or XML) can use a template.

{{ range . }}
{{- $name := printf "%s %s" .Lastname .Firstname -}}
{{- $orderItems := RandomInt 10 15 -}}

<order orderid="{{ RandomInt 10000 99999 }}">
<customer>
  <name>{{ $name }}</name>
  <email>{{ .Email }}</name>
</customer>
<items>
{{ range loop $orderItems }}
<item>
  <name>{{ ProductName }}</name>
</item>
{{ end }}
</items>
</order>
{{end}}

Go Playground: https://play.golang.org/p/3j_fcdi7XW

Templates probably need the most planning and the most work but long term they are the most flexible. Theoretically they could be written inline but it's a lot of stuff to write. Maybe we can allow every generator to use a template inline like so $ fakedata name"{{ .Lastname .Firstname }}" but that's the syntax you don't like and it can become complex real quick.

@lucapette
Copy link
Owner

@KevinGimbel thank you for the example. It does look like a lot of work but if we can justify it I don't mind. I do like the idea of providing a more powerful API for those that need to create more complex random data set while keeping the everyday usage of the CLI as easy as today.

Anyway, I'm still not sure how you'd like users to use templates from a CLI perspective. What do you have in mind?

@lucapette
Copy link
Owner

About the ability to inline generators. I wasn't thinking about rewriting the existing API. I really like the simplicity of the current approach: fakedata email ip looks great to me. What I was trying to suggest was a way to leverage your proposal from the CLI too as a column so we can do things like fakedata inline,"uppercase .Firstname". Which would look great for the avg use case (in a way that's how this conversation started I think)

@KevinGimbel
Copy link
Collaborator

I, too, would like to keep fakedata simple and straightforward. The default should stay with generation to stdout, so fakedata email ip works as expected and keeps working.

For this reason I would introduce a sub-command, fakedata template (or compose, source, file, ...) with --in and --out as options. If --out is not defined we write to stdout, if it's defined we create a file, for example $ fakedata compose --in order.tmpl --out order.xml creates a file named order.xml from the template order.tmpl. For template based processing I'd like to not have to specify generators, the only parameter left are --limit, --in, and --out. They should be available from the template and the template defines what generators I want to use - I am not sure yet how this would be implemented.

The "workflow" would be:

$ cat user.tmpl
<users>
{{ range . }}
  <user>
    <name>{{ .Name }}</name>
    <age> {{ Int 18 99 }}</age>
    <email> {{ .Email }}</email>
  </user>
{{ end }}
</users>

$ fakedata compose --in user.tmpl --out user.xml --limit 1
$ cat user.xml
<users>
  <user>
    <name>Kevin Gimbel</name>
    <age>23</age>
    <email>catchall@nota.website</email>
  </user>
</users>

The names (compose, --in, --out) are of course discussable and just what came to my mind.

As for inline, it will get quite messy IMO. We need another separator or keyword to assign templates "on the fly", e.g. $ fakedata email name;".Lastname .Firstname". As far as I can tell we currently have the following special charachters:

  • , to indicate a generator option, e.g. (int,1..5)
  • .. for a range (e.g., int,1..100
  • = for column names (sql) (e.g. login=name)

Introducing a new separator like ; might not be the best idea. It gets complicated and it's hard-ish to type, e.g. fakedata login=name;".Lastname .Firstname". Keeping , as separator means we need to do more thinking when parsing the keys or enforce a fixed order - both not optimal IMO. Also ; is a shell character, this might break things anyway. 😄

When I find the time I'll do some prototyping and see what I can come up with as I work. Maybe it's becoming clearer to me then.

@lucapette
Copy link
Owner

@KevinGimbel ok you convinced me :) I like the workflow too.

A few notes:

  • Let's ignore the inlining for now. I agree it can get messier so I'd say we give it another try once the feature is stable
  • I think we could do without the subcommand compose. Like in fakedata --in file.tmpl and have a check that says "You can either use --in or generators args". Not sure I'm explaning myself the best way today. But it's a messy day :)
  • I think --in or --input are fine names but I feel like stating the obvious may be a better choice. Something like --from-template=file.tmpl
  • I would always default to stdout as it's easy to do fakedata --from-template=file > output.xml. But maybe there's a good reason to add the option and I can't see it?

As a side note, I'd keep in mind that it would be good user experience if fakedata would detect there's data on the standard input so users can echo "{{.Name}}" | fakedatawhich is nice and short. Maybe it removes the need for the inlining I so much want (ehehe sorry :)) completely. I would add this feature on top of the actual templating feature so we can focus on this beatiful addition tofakedata`. There's a lot to write to make it work and it's very exciting.

I think this feature + some more generators and minor gardening can give us a proper v.1.0.0. So I will do the following as soon as I find a minute:

  • I'll create all the issues for the generators I have in mind and the minor additions so that we can work in parallel and give other people chances to contribute with smaller features
  • I create a couple of milestones so that we know where we're going
  • 🎉

I'm very happy about this conversation and the feature we're designing. Thank you everyone for the awesome help!

@KevinGimbel
Copy link
Collaborator

Looks good to me! I'm a huge fan of piping output on the CLI, so echo "{{.Name}}" | fakedata is a thing I would like to have. I've no idea how this is done with Go but I am sure there is a way to read from stdin / check if something is written to stdin.

As for the output redirection with fakedata --from-template=file > output.xml I only see one issue and that's an issue we have now, too: If the command fails, the error message is written into output.xml - I am not sure if there is a way to break the output redirect on error or if we need to handle this at all.

I'd love to work on the templating functions, I wanted to learn more about Go templates anyway and having a real use case makes it "easier" because I don't need to think of a project of my own. 😄

@lucapette
Copy link
Owner

@KevinGimbel I've looked into detecting stdin a while ago but I don't remember the details right now, we can do it in a separate PR once we've got the feature.

I think you have a good point about the error. I'll do some research about how other CLI handle the issue, right now it's not a big issue as I think it'll take a while before we land the feature in master. I'm very happy to hear you'd like to give it a try. I'll focus on the issues/milestone then!

@gnanet do you like where we're heading with the feature? In the end you requested it, your feedback is more than welcome!

@gnanet
Copy link
Author

gnanet commented May 22, 2017

Guys, as you may see i do lot more at night/evening at least in CEST zone. I will have to read all trough, but the idea to keep things still simple, while adding a lot more complex templating feature. This goes beyond my simple expectations, but i find, it will be good adventure into golang for me.

@KevinGimbel
Copy link
Collaborator

@gnanet would you like to give templates a try? Otherwise I'll start working on it in the coming days. :)

@lucapette lucapette added feature and removed feature labels May 26, 2017
@lucapette lucapette mentioned this issue May 30, 2017
7 tasks
@gnanet
Copy link
Author

gnanet commented May 30, 2017

The subject of my issue got far from to what it is evolved, do you think it would be wise to change the subject at least to better reflect the changes it started in the project?

@lucapette
Copy link
Owner

@KevinGimbel @gnanet I'm closing this one as we're discussing it in #45

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants