Getting the same sentences multiple times #241

Closed
a3nm opened this Issue Jun 24, 2017 · 11 comments

a3nm commented Jun 24, 2017

As a user, when recording, I often get the same sentence multiple times; sometimes the same sentence even comes up twice in one batch of three recordings.

There should be some way to remember which sentences have already been served to a given session, and to not serve them again once they have been recorded.

espadrine commented Jun 24, 2017

One part of it is that the sentence dataset is really small, at 188 sentences (I expected thousands.) May I suggest the use of Wikisource? Something speech-heavy, like Agatha Christie, or modern speeches, maybe a play?

The second part is that it simply uses Math.random(). With independent random draws the sentences will not be covered evenly, so you risk having a large disparity between the number of recordings per sentence. One solution is to increment a global variable on the server and yield the sentence at that index (wrapping around, obviously). That variable should be initialized to a random value (or persisted), to prevent one part of the list being repeated due to server restarts.
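
For illustration, the counter idea described above could look roughly like the following. This is only a sketch under assumptions: the class name `SentenceCycler`, its `next()` method and the optional `startIndex` argument are hypothetical and do not come from the project's code.

```typescript
// Hypothetical sketch of a wrapping server-side counter; not the project's actual code.
class SentenceCycler {
  private readonly sentences: readonly string[];
  private index: number;

  constructor(sentences: readonly string[], startIndex?: number) {
    if (sentences.length === 0) {
      throw new Error("sentence list must not be empty");
    }
    this.sentences = sentences;
    // Start at a persisted index if one is given, otherwise at a random one,
    // so that server restarts do not always replay the start of the list.
    const start =
      startIndex !== undefined
        ? Math.floor(startIndex)
        : Math.floor(Math.random() * sentences.length);
    // Normalize into the range [0, length), also for negative inputs.
    this.index = ((start % sentences.length) + sentences.length) % sentences.length;
  }

  // Returns the sentence at the current index, then advances and wraps around.
  next(): string {
    const sentence = this.sentences[this.index];
    this.index = (this.index + 1) % this.sentences.length;
    return sentence;
  }
}
```

With this approach every sentence is served once before any sentence is served a second time, so the number of recordings per sentence can differ by at most one across the whole list.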

orschiro commented Jul 16, 2017

I think this should be dealt with as a high priority. If you, as an avid and excited user, get to see and record the same sentences over and over again, it takes away your motivation to contribute.

Contributor

mikehenrty commented Jul 17, 2017

Is this still happening? We have increased our sentences to several thousand (with more coming soon).

(Note: this is not a duplicate of #260. That one is about listening to sentences; this one is about recording. They come from entirely different pools.)

@mikehenrty mikehenrty marked this as a duplicate of #260 Jul 17, 2017

orschiro commented Jul 18, 2017

@mikehenrty I am receiving a lot of new sentences now. Thanks!

Contributor

nmstoker commented Jul 18, 2017

Am late to this, but if you're expanding the range of sentences, might be worth considering phonetic pangrams, as by definition these cover a large chunk of sounds quickly (although they are typically unrealistic)

https://www.quora.com/Is-there-a-text-that-covers-the-entire-English-phonetic-range/answer/Sheetal-Srivastava-1

a3nm commented Jul 18, 2017

I have tried a bit, and it seems like there are now sufficiently many different sentences to avoid getting the same ones multiple times. Thanks for fixing!

@a3nm a3nm closed this Jul 18, 2017

a3nm commented Jul 23, 2017

I'm reopening because the pool of sentences is not so large after all: you can still get the same sentences occasionally when you record a sufficient number of them (around 100), even in the same session.

I think this could be fixed (within one session) by remembering which sentences have already been recorded, and not asking for these same sentences again.
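
A rough sketch of what such per-session bookkeeping could look like is given below. Everything in it is an assumption made for illustration: the names `SessionSentencePicker`, `pick` and `markRecorded` are hypothetical, and the thread does not say where the project would keep this state (client or server).

```typescript
// Hypothetical sketch of per-session memory; not the project's actual code.
class SessionSentencePicker {
  private readonly sentences: readonly string[];
  // Sentences already recorded in this session.
  private readonly recorded = new Set<string>();

  constructor(sentences: readonly string[]) {
    this.sentences = sentences;
  }

  // Returns a random sentence not yet recorded in this session,
  // or undefined once every sentence in the pool has been recorded.
  pick(): string | undefined {
    const remaining = this.sentences.filter((s) => !this.recorded.has(s));
    if (remaining.length === 0) {
      return undefined;
    }
    return remaining[Math.floor(Math.random() * remaining.length)];
  }

  // Call this after the user has actually recorded the sentence.
  markRecorded(sentence: string): void {
    this.recorded.add(sentence);
  }
}
```

Marking a sentence only after it has been recorded, rather than when it is served, means a sentence the user abandons can still come back later in the same session.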

@a3nm a3nm reopened this Jul 23, 2017

Contributor

Omniscimus commented Jul 23, 2017

This will probably be fixed once #304 gets accepted, but before that remembering sentences in a session could be a temporary fix. Allowing users to skip sentences (#278) would probably be a good enough fix as well for now (and skipping would be useful anyway).

a3nm commented Jul 23, 2017

Skipping sentences would help but it's a bit more tedious, and also as a user I'm not always sure whether I have already seen a sentence or not. (Did I see it when recording, or when validating? Was it that sentence, or another sentence from the same novel? etc.) So even if users can skip sentences there would probably be some frustration and some duplicate recordings (but I don't know whether having duplicate recordings of the same sentence by the same speaker pollutes the dataset).

Contributor

mikehenrty commented Jul 25, 2017

Let's close this bug as we are actively trying to gather new sentences in #341. We are increasing our sentences by the day.

@mikehenrty mikehenrty closed this Jul 25, 2017

a3nm commented May 26, 2018

I think this should be reopened: while recording some sentences this morning I got the same one multiple times again. ("Gossips are frogs, they drink and talk", and "Where did he get it", if I remember correctly.)
