Getting the same sentences multiple times #241
Comments
One part of it is that the sentence dataset is really small, at 188 sentences (I expected thousands.) May I suggest the use of Wikisource? Something speech-heavy, like Agatha Christie, or modern speeches, maybe a play? The second part is that it simply uses |
I think this should be dealt with as a high priority. If you, as an avid and excited user, see and record the same sentences over and over again, it takes away your motivation to contribute. |
Is this still happening? We have increased our sentence pool to several thousand (with more coming soon). (Note: this is not a duplicate of #260. That one is about listening to sentences; this one is about recording them. The two draw from entirely different pools.) |
@mikehenrty I am receiving a lot of new sentences now. Thanks! |
Am late to this, but if you're expanding the range of sentences, might be worth considering phonetic pangrams, as by definition these cover a large chunk of sounds quickly (although they are typically unrealistic) |
I have tried a bit, and it seems like there are now sufficiently many different sentences to avoid getting the same ones multiple times. Thanks for fixing! |
I'm reopening because the pool of sentences is not so large after all: you can still get the same sentences occasionally when you record a sufficient number of them (around 100), even in the same session. I think this could be fixed (within one session) by remembering which sentences have already been recorded, and not asking for these same sentences again. |
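To put rough numbers on why repeats still show up after about 100 recordings, here is a minimal sketch. It assumes sentences are drawn uniformly at random with replacement, which this thread does not confirm, and it uses 3000 as a stand-in for "several thousand". The function name `duplicate_probability` is hypothetical.

```python
def duplicate_probability(pool_size: int, draws: int) -> float:
    """Chance that at least one sentence repeats in `draws` picks.

    Assumption: each pick is uniform over `pool_size` sentences and
    independent of earlier picks (sampling with replacement).
    """
    if draws > pool_size:
        return 1.0  # pigeonhole: a repeat is unavoidable
    p_all_distinct = 1.0
    for i in range(draws):
        # The (i+1)-th pick must avoid the i sentences already seen.
        p_all_distinct *= (pool_size - i) / pool_size
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # 188 is the original pool size reported in this issue;
    # 3000 is an assumed value for "several thousand".
    for pool in (188, 3000):
        for draws in (3, 100):
            p = duplicate_probability(pool, draws)
            print(f"pool={pool} draws={draws} P(repeat)={p:.3f}")
```

Under these assumptions, 100 recordings from a pool of 3000 give roughly an 80% chance of at least one repeat, so growing the pool alone is unlikely to make repeats rare within a long session.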
Skipping sentences would help but it's a bit more tedious, and also as a user I'm not always sure whether I have already seen a sentence or not. (Did I see it when recording, or when validating? Was it that sentence, or another sentence from the same novel? etc.) So even if users can skip sentences there would probably be some frustration and some duplicate recordings (but I don't know whether having duplicate recordings of the same sentence by the same speaker pollutes the dataset). |
Let's close this bug as we are actively trying to gather new sentences in #341. We are increasing our sentences by the day. |
I think this should be reopened: while recording some sentences this morning I got the same one multiple times again. ("Gossips are frogs, they drink and talk", and "Where did he get it", if I remember correctly.) |
As a user, when recording, I often get the same sentences multiple times. Sometimes the same sentence even comes up twice in one batch of three recordings.
There should be some way to remember which sentences have already been served to a given session, and to not serve them again once they have been recorded.
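The idea could be sketched roughly as follows. This is only an illustration, not the project's actual code: the class `SessionSentencePicker` and its methods are hypothetical names, and it assumes the full sentence pool is small enough to hold in memory for one session.

```python
import random
from typing import List, Optional, Sequence, Set


class SessionSentencePicker:
    """Serves random sentences without repeating any within one session."""

    def __init__(self, sentences: Sequence[str],
                 rng: Optional[random.Random] = None) -> None:
        # Drop duplicate entries from the pool while keeping their order.
        self._pool: List[str] = list(dict.fromkeys(sentences))
        self._served: Set[str] = set()
        self._rng = rng if rng is not None else random.Random()

    def next_sentence(self) -> Optional[str]:
        """Return a sentence not yet served, or None if the pool is used up."""
        remaining = [s for s in self._pool if s not in self._served]
        if not remaining:
            return None
        choice = self._rng.choice(remaining)
        self._served.add(choice)
        return choice

    def release(self, sentence: str) -> None:
        """Make a served sentence eligible again (e.g. skipped, never recorded)."""
        self._served.discard(sentence)


if __name__ == "__main__":
    picker = SessionSentencePicker([
        "Where did he get it",
        "Gossips are frogs, they drink and talk",
        "A third example sentence",  # made-up filler for the demo
    ])
    batch = [picker.next_sentence() for _ in range(3)]
    print(batch)                    # three distinct sentences, random order
    print(picker.next_sentence())   # None: nothing left for this session
```

Tracking what was served, rather than only what was recorded, also prevents the same sentence from appearing twice in one batch of three; `release` covers the case where a sentence was shown but not recorded.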