how to change output format? #61

Closed
PratikshaTaneja opened this issue Nov 15, 2016 · 3 comments

Comments

@PratikshaTaneja

The output produced is in paragraph form. I want it in dialogue form, with each dialogue on its own line. How can I do that?

@ubergarm
Collaborator

ubergarm commented Mar 6, 2017

The output is not necessarily in paragraph form unless your input is. The output will attempt to mimic the exact syntax structure of the input, so design your input in the format you want the output to resemble. Perhaps use <start>a dialogue here<stop> markers around atomic dialogue units, or a special split character, etc.
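
As a rough illustration of the marker idea, here is a minimal Python sketch that wraps each dialogue unit in `<start>`/`<stop>` markers, one unit per line, before the text is used as training input. The function name `format_dialogues` and the sample lines are hypothetical and not part of this project.

```python
def format_dialogues(dialogues, start="<start>", stop="<stop>"):
    """Wrap each dialogue unit in markers and put it on its own line.

    `format_dialogues` is an illustrative helper, not part of this project.
    """
    lines = []
    for text in dialogues:
        # Collapse internal whitespace so one unit never spans several lines.
        unit = " ".join(text.split())
        if unit:  # skip empty or whitespace-only units
            lines.append(f"{start}{unit}{stop}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample_lines = ["Hello, how are you?", "Fine,\n thanks.", "   "]
    print(format_dialogues(sample_lines), end="")
```

Because every training line then has the same shape, the model has a consistent pattern to imitate, and the markers give you something to split on afterwards.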

@PratikshaTaneja
Author

PratikshaTaneja commented Mar 21, 2017 via email

@ubergarm
Collaborator

ubergarm commented Mar 21, 2017

This algorithm is trained by learning the probability distribution over sequences of arbitrary characters. You then generate strings of characters with sample.py, based on the trained model.
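
To make "learning a probability distribution over characters" a bit more concrete, here is a toy sketch. It is not the network this project trains: it swaps in simple bigram counts (next character given the previous one) purely for illustration, and all names in it are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def sample(counts, seed_char, length, rng=None):
    """Generate up to `length` more characters, one at a time, from the counts."""
    rng = rng or random.Random(0)
    out = [seed_char]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no observed continuation for this character
        chars = list(followers.keys())
        weights = list(followers.values())
        # Pick the next character in proportion to how often it was seen.
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)


if __name__ == "__main__":
    model = train_bigram("<start>hi<stop>\n<start>ho<stop>\n")
    print(sample(model, "<", 40))
```

Note that newlines and marker characters are treated like any other character, which is why the structure of the input tends to show up in the output.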

The algorithm has no concept of higher-level semantics as we might discuss them. This is very powerful, because it can work over almost any input data without a human trying to impart "expert knowledge", etc.

So if you want output that makes more sense to a human, with guarantees around semantics/syntax, you could:

  1. Start with, say, >= 10 MiB of input data that consistently looks exactly like the output you want.
  2. Try to train at least a dozen different models, varying the parameters of each.
  3. Compare the results of all of your models and select the best one by your criteria.
  4. Write your own custom post-processor that fixes up syntax and generates "dialogues" in the format you desire.
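
Step 4 could be sketched roughly as follows, assuming the training data wrapped each dialogue in `<start>`/`<stop>` markers as suggested earlier in this thread. The function name `extract_dialogues` and the example string are hypothetical; real sampled text will likely need its own cleanup rules.

```python
import re


def extract_dialogues(generated, start="<start>", stop="<stop>"):
    """Return the dialogue units found in generated text, one per list item.

    `extract_dialogues` is an illustrative helper, not part of this project.
    """
    pattern = re.compile(re.escape(start) + r"(.*?)" + re.escape(stop), re.DOTALL)
    dialogues = []
    for match in pattern.finditer(generated):
        # If the model opened a marker again before closing the previous one,
        # keep only the text after the most recent opening marker.
        body = match.group(1).split(start)[-1]
        unit = " ".join(body.split())  # collapse stray whitespace and newlines
        if unit:
            dialogues.append(unit)
    return dialogues


if __name__ == "__main__":
    raw = "junk<start>Hi  there<stop>xx<start>A<start>B\nC<stop><start>unfinished"
    print("\n".join(extract_dialogues(raw)))
```

Text outside the markers and any unit that never reaches a closing marker is dropped, so the result is always one complete dialogue per line.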

This is a powerful algorithm, but it is still not easy to make something that always passes the Turing test across a broad input spectrum.

If you can't get the output you need, there are other similar systems that work on word-level instead of character-level encoding.

Thanks and good luck!
