After saved, new words created, duplicates. #10

Closed
xshy216 opened this issue Aug 27, 2020 · 16 comments
Labels
bug Something isn't working

Comments

@xshy216

xshy216 commented Aug 27, 2020

Describe the bug

"slate-transcript-editor": "0.0.15",

3cbfe40c-99a9-463a-b58e-39d14e02cdcb.wav.dpe.json.zip

I am using the Slate editor. The export function has an issue: when I export with timecodes, it changes the transcript and duplicates words.

{"words":[
{"start":4.11,"confidence":0.779731,"end":4.5,"text":"Yeah.","id":0,"index":0},
{"start":5.03,"confidence":0.852385,"end":5.48,"text":"Okay.","id":1,"index":1},
{"start":6.18,"confidence":0.793509,"end":6.5,"text":"Now,","id":2,"index":2},
{"start":6.5,"confidence":0.821012,"end":6.82,"text":"so","id":3,"index":3},
{"start":6.86,"confidence":0.754611,"end":7.24,"text":"probably","id":4,"index":4},
{"start":7.81,"confidence":0.870976,"end":8.05,"text":"would","id":5,"index":5},
....],
"paragraphs":[
{"id":0,"start":4.11,"end":4.5,"speaker":2},
{"id":1,"start":5.03,"end":5.48,"speaker":1},
{"id":2,"start":6.18,"end":15.08,"speaker":2},
{"id":3,"start":15.37,"end":15.98,"speaker":1},
...]}

Changed to:

{"words":[
{"text":"Okay.","start":3.9399999999999977,"end":4.259999999999998},
{"text":"Now,","start":4.259999999999998,"end":4.579999999999998},
{"text":"Okay.","start":4.579999999999998,"end":4.899999999999999},
{"text":"Now,","start":4.899999999999999,"end":5.219999999999999},
{"text":"Okay.","start":5.219999999999999,"end":5.539999999999999},
{"text":"Now,","start":5.539999999999999,"end":5.859999999999999},
{"end":6.18,"start":5.859999999999999,"text":"Okay."},
{"end":6.5,"start":6.18,"text":"Now,"},
{"end":6.86,"start":6.5,"text":"So"},
{"end":7.81,"start":6.86,"text":"probably"},
{"end":8.05,"start":7.81,"text":"would"},
...],
"paragraphs":[
{"speaker":"2","start":4.11,"end":4.259999999999998,"id":"0"},
{"speaker":"1","start":5.859999999999999,"end":6.5,"id":"1"},
{"speaker":"2","start":6.5,"end":10.754999999999999,"id":"2"},
{"speaker":"Speaker A","start":10.754999999999999,"end":15.37,"id":"3"},
...]}
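When triaging a report like this, one quick check is to compare the word sequence before and after a save in which no text was edited. The sketch below is a minimal TypeScript example, assuming the simplified `{ text, start, end }` word shape from the JSON above; `checkSave` and `SaveReport` are hypothetical names, not part of slate-transcript-editor.

```typescript
// Simplified DPE word shape (assumption, based on the JSON samples above).
interface DpeWord {
  text: string;
  start: number;
  end: number;
}

interface SaveReport {
  firstMismatch: number | null; // index of the first differing word, or null if identical
  countDelta: number; // after.length - before.length
  nonMonotonic: number[]; // indices where start goes backwards or end < start
}

// Compare word sequences before and after a save that should not have changed the text.
function checkSave(before: DpeWord[], after: DpeWord[]): SaveReport {
  let firstMismatch: number | null = null;
  const n = Math.min(before.length, after.length);
  for (let i = 0; i < n; i++) {
    if (before[i].text !== after[i].text) {
      firstMismatch = i;
      break;
    }
  }
  // Same prefix but different length: the mismatch starts where the shorter list ends.
  if (firstMismatch === null && before.length !== after.length) {
    firstMismatch = n;
  }
  // Flag timing problems in the saved words.
  const nonMonotonic: number[] = [];
  for (let i = 0; i < after.length; i++) {
    const w = after[i];
    const prev = i > 0 ? after[i - 1] : undefined;
    if (w.end < w.start || (prev !== undefined && w.start < prev.start)) {
      nonMonotonic.push(i);
    }
  }
  return { firstMismatch, countDelta: after.length - before.length, nonMonotonic };
}
```

Run on the first words of the two samples above, `firstMismatch` is 0 ("Yeah." vs "Okay."), while the saved timings are still non-decreasing, so a timing check alone would not catch this kind of duplication.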
@xshy216 xshy216 added the bug Something isn't working label Aug 27, 2020
@pietrop
Owner

pietrop commented Aug 27, 2020

👋 Thanks for flagging this.
Does this happen only when exporting with timecodes? And for all timecode export options or just some?

@xshy216
Author

xshy216 commented Aug 28, 2020

It happens when saving as well, and all timecode export options have the same problem.

@pietrop
Owner

pietrop commented Aug 28, 2020

OK, that makes sense: when it saves, it runs timecode realignment.

What do you use for speech-to-text before converting it to DPE format?
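The thread doesn't describe how slate-transcript-editor's realignment works internally. For context, one common approach to timecode realignment is to align the edited words against the original timed words with a longest-common-subsequence (LCS) diff, keep the timings of matched words, and interpolate timings for inserted or changed words. The TypeScript sketch below illustrates that general technique; it is not the library's implementation, and `lcsPairs` and `realign` are hypothetical names.

```typescript
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// LCS alignment of two token arrays; returns pairs [i, j] where a[i] matches b[j].
function lcsPairs(a: string[], b: string[]): Array<[number, number]> {
  const n = a.length;
  const m = b.length;
  // dp[i][j] = LCS length of a[i..] and b[j..]
  const dp: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  const pairs: Array<[number, number]> = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      pairs.push([i, j]);
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      i++;
    } else {
      j++;
    }
  }
  return pairs;
}

// Keep timings for matched words; spread each run of unmatched words
// evenly across the gap between its neighbouring matched words.
function realign(original: TimedWord[], editedText: string[]): TimedWord[] {
  const pairs = lcsPairs(original.map((w) => w.text), editedText);
  const result: TimedWord[] = editedText.map((text) => ({ text, start: NaN, end: NaN }));
  for (const [oi, ei] of pairs) {
    result[ei].start = original[oi].start;
    result[ei].end = original[oi].end;
  }
  let pos = 0;
  while (pos < result.length) {
    if (!Number.isNaN(result[pos].start)) {
      pos++;
      continue;
    }
    const runStart = pos;
    while (pos < result.length && Number.isNaN(result[pos].start)) pos++;
    const gapStart = runStart > 0 ? result[runStart - 1].end : (original[0]?.start ?? 0);
    const gapEnd =
      pos < result.length ? result[pos].start : (original[original.length - 1]?.end ?? gapStart);
    const step = (gapEnd - gapStart) / (pos - runStart);
    for (let k = runStart; k < pos; k++) {
      result[k].start = gapStart + step * (k - runStart);
      result[k].end = gapStart + step * (k - runStart + 1);
    }
  }
  return result;
}
```

With unchanged text every word keeps its original timing, so a save without edits should be a no-op. Interpolation of this kind gives evenly spaced timings to unmatched runs, which resembles the 0.32 s steps in the output above, though the thread doesn't confirm that this is the cause.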

@pietrop
Owner

pietrop commented Aug 28, 2020

I can't seem to reproduce this in Storybook. What version of slate-transcript-editor are you on?

Update: sorry, I see you mentioned 0.0.15 in your first post.

@xshy216
Author

xshy216 commented Aug 28, 2020

I am using Azure, Xfyun, etc. This file is from Azure; I converted it to DPE.

@overZellis133

overZellis133 commented Oct 27, 2020

@pietrop, we are seeing this too. Today a student at American University had their transcript rendered pretty much unusable when some words were replicated thousands of times across different portions of it. We use Google STT before converting to DPE, and we see the issue sometimes upon saving.

@pietrop
Owner

pietrop commented Oct 27, 2020

Thanks for flagging this, @overZellis133. It would be good to take a close look at the sample data.

@pietrop
Owner

pietrop commented Oct 27, 2020

To recap our convo:

I am not sure whether this is caused by the conversion of the data provided to SlateJS, which takes DPE format.

For GCP, I made a converter, pietrop/gcp-to-dpe. In the latest v2 it has been refactored (removing the intermediate DraftJS conversion, as it originally came from @bbc/react-transcript-editor), and it uses GCP speaker diarization to break paragraphs on speaker change.

So it's worth trying that to see if the issue still persists.
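Breaking paragraphs on speaker change, as described for gcp-to-dpe above, amounts to grouping consecutive words that share a diarization speaker tag. This is a minimal TypeScript sketch of that idea, not the gcp-to-dpe code: the `DiarizedWord` input with numeric seconds is a hypothetical simplification (real Google Cloud Speech-to-Text responses encode times differently, so a converter would map to this shape first), and `toDpe` is an illustrative name.

```typescript
// Hypothetical, simplified diarized word (times in seconds).
interface DiarizedWord {
  word: string;
  start: number;
  end: number;
  speakerTag: number;
}

// DPE-style output, mirroring the sample JSON earlier in this issue.
interface DpeWord {
  id: number;
  text: string;
  start: number;
  end: number;
}

interface DpeParagraph {
  id: number;
  start: number;
  end: number;
  speaker: number;
}

function toDpe(input: DiarizedWord[]): { words: DpeWord[]; paragraphs: DpeParagraph[] } {
  const words: DpeWord[] = input.map((w, i) => ({ id: i, text: w.word, start: w.start, end: w.end }));
  const paragraphs: DpeParagraph[] = [];
  for (const w of input) {
    const last = paragraphs.length > 0 ? paragraphs[paragraphs.length - 1] : undefined;
    if (last !== undefined && last.speaker === w.speakerTag) {
      // Same speaker: extend the current paragraph.
      last.end = w.end;
    } else {
      // Speaker change (or first word): start a new paragraph.
      paragraphs.push({ id: paragraphs.length, start: w.start, end: w.end, speaker: w.speakerTag });
    }
  }
  return { words, paragraphs };
}
```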

@xshy216
Author

xshy216 commented Nov 6, 2020 via email

@pietrop
Owner

pietrop commented Nov 6, 2020

Ah, that's interesting, but I wasn't able to see the image?

Are you using @pietrop/slate-transcript-editor or the earlier @bbc/react-transcript-editor?

@xshy216
Author

xshy216 commented Nov 6, 2020 via email

@pietrop
Owner

pietrop commented Nov 6, 2020

Cool. It would be good to see what the pagination looks like; might you be able to share it as a PR in @bbc/react-transcript-editor?
That's a problem we are still trying to solve for that project.

@xshy216
Author

xshy216 commented Nov 6, 2020

Ok, I will try to make a PR there.

I hadn't coded for 20 years and only came back recently for a project. I didn't write it cleanly as a separate component, I just made it work, so I need some time to turn it into a PR.

For your reference, here is a screenshot:
[Screenshot: ScreenHunter 72]

@pietrop
Owner

pietrop commented Nov 6, 2020

Yeah, no rush, and no worries if the code isn't perfect. It would just be interesting to see the code/PR to understand the concept behind the pagination in DraftJS 😊

@xshy216
Author

xshy216 commented Nov 6, 2020

Hm, I did it outside of DraftJS: the pagination lives in the editor component, not in DraftJS. I added one more prop to the editor to pass the whole transcript, but only one page is loaded into the editor (DPE, then DraftJS) for editing. When the page changes, the current page is saved into memory (a slice of the array) and the requested page is loaded into the editor. When you choose Save, the whole transcript is saved to local storage or the database.
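The pagination approach described above, keeping the whole transcript outside the editor and handing it one page at a time, might look roughly like the following. It is a minimal TypeScript sketch under assumptions: `TranscriptPager`, `changePage`, and `saveAll` are hypothetical names, and pages are split once at load time so that adding or deleting words on one page doesn't shift the boundaries of the others.

```typescript
interface DpeWord {
  text: string;
  start: number;
  end: number;
}

class TranscriptPager {
  private readonly pages: DpeWord[][] = [];
  private pageIndex = 0;

  constructor(words: DpeWord[], pageSize: number) {
    if (!Number.isInteger(pageSize) || pageSize < 1) {
      throw new RangeError("pageSize must be a positive integer");
    }
    // Split once at load time so edits on one page never shift the others.
    for (let i = 0; i < words.length; i += pageSize) {
      this.pages.push(words.slice(i, i + pageSize));
    }
    if (this.pages.length === 0) this.pages.push([]);
  }

  get pageCount(): number {
    return this.pages.length;
  }

  get currentIndex(): number {
    return this.pageIndex;
  }

  // The only words handed to the editor (a shallow copy of the current page).
  currentPage(): DpeWord[] {
    return [...this.pages[this.pageIndex]];
  }

  // Keep the edited current page in memory, then switch to another page.
  changePage(page: number, editedCurrent: DpeWord[]): DpeWord[] {
    this.pages[this.pageIndex] = [...editedCurrent];
    this.pageIndex = Math.min(Math.max(0, page), this.pages.length - 1);
    return this.currentPage();
  }

  // On Save: keep the current page, then return the whole transcript
  // for persisting to local storage or a database.
  saveAll(editedCurrent: DpeWord[]): DpeWord[] {
    this.pages[this.pageIndex] = [...editedCurrent];
    return ([] as DpeWord[]).concat(...this.pages);
  }
}
```

Only one page's worth of words ever reaches the editor, which is what keeps long transcripts responsive in this design; the trade-off is that edits spanning a page boundary have to be made on each page separately.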

@pietrop
Owner

pietrop commented May 19, 2021

Closing in favor of pietrop/digital-paper-edit-electron#74 (comment), but feel free to raise another issue if you run into this again. Please provide as much information as possible, including detailed steps to reproduce the issue, sample JSON, etc.

@pietrop pietrop closed this as completed May 19, 2021
3 participants