Fix for Italian original sentence dataset #2406

Closed. Mte90 wants to merge 6 commits.

Mte90 (Contributor) commented Nov 22, 2019

Waiting for a check by @Sav22999.

I opened the PR in the wrong repo (we usually open it on our fork first), but the changes are very few anyway. They are based on MozillaItalia#35.

Mte90 changed the title from "Correzioni di #35" to "Fix for Italian original sentence dataset" on Nov 22, 2019

Sav22999 (Contributor) left a comment

Corrections

server/data/it/frasi.txt

Sav22999 (Contributor) commented Nov 23, 2019

Other corrections:

  • At row 4755 -> remove the stray " (Ho pensato: "È probabile siano stati sabotati gli addetti alle pulizie.)
  • At row 4789 -> remove the stray " ("Preparati giovane Space, adesso tutte le tue malefatte verranno a galla.)
  • At row 4792 -> remove the stray " (Finirai come Neo in Matrix nell'interrogatorio con Mr. Smith".)
  • At row 5786 -> remove "kk" and "cch" (È necessario in quest'epoca in cui, ad esempio, si usa sempre più spesso "kk" invece che "cch".) -> replace the sentence with: È necessario in quest'epoca in cui si usa sempre più spesso.
  • At rows 2879, 2880, 2910, 2911, 2912, 2913, 2914, 9782, 10021, 10022, 10048, 10225, 10577, 10578, 10582, 10941, 10942, 11137, 11138, 11139, 11144, 11145 (and 11175): replace % with percento
  • At row 220: replace mozillians.org with sito web (Che hanno contribuito in modo significativo alle attività Locali e hanno ottenuto un vouch su mozillians.org.)
  • At row 313: es. -> esempio (remove abbreviation)
  • At row 2975: n.109/96 -> numero 109 del 1996
  • At row 4694: a.C. -> avanti Cristo
  • At row 4792: Mr. -> il signor
  • At row 6088: n. -> numero
  • At rows 10353, 10354 and 10370: M.I.T. -> MIT (treat it as a single word)
  • At row 10382: S. -> Steven
  • At row 10471: comp.os.minix -> Minix
  • At row 10765: MSN.com -> di Microsoft
  • At rows 165, 169, 574, 4141, 5776, 9658, 9852, 9921 and 11167: / -> oppure
  • At rows 9920, 9936, 9981, 9987, 9991, 9995, 9998, 10003 and 10005: m -> metri
  • At rows 469, 1396 and 1545: e/o -> e
  • At rows 380, 10452, 10548, 10631, 10663 and 10887: / -> a single space
  • At row 2867: 109/1996 -> 109 del 1996
  • At row 2869: 4/10 -> 4 del 2010 and 50/10 -> 50 del 2010
  • At row 2947: 109/96 -> 109 del 1996
  • At row 1594: 1/3 -> un terzo
  • At rows 1587, 1735: cm -> centimetri
  • At row 1769: km -> chilometri
  • At row 9921: 50m/100m -> 50 metri oppure 100 metri
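
Row-scoped replacements like the ones listed above could be applied with a small script rather than by hand. The following is only a minimal sketch, not the tooling used for this PR: the function name `apply_corrections`, the `SAMPLE_CORRECTIONS` table, and the choice to fail when a row no longer contains the expected text are all assumptions made for illustration. Only a few of the listed rows are included.

```python
from typing import Dict, List, Sequence, Tuple

Corrections = Dict[int, List[Tuple[str, str]]]

# Hypothetical subset of the corrections listed above, keyed by 1-based row.
SAMPLE_CORRECTIONS: Corrections = {
    313: [("es.", "esempio")],
    1594: [("1/3", "un terzo")],
    1769: [("km", "chilometri")],
    4694: [("a.C.", "avanti Cristo")],
}


def apply_corrections(lines: Sequence[str], corrections: Corrections) -> List[str]:
    """Return a copy of `lines` with each (old, new) pair applied to its row."""
    fixed = list(lines)
    for row, pairs in sorted(corrections.items()):
        index = row - 1  # rows in the review comment are 1-based
        if not 0 <= index < len(fixed):
            raise IndexError(f"row {row} is outside the file")
        for old, new in pairs:
            if old not in fixed[index]:
                # Fail loudly: row numbers may have shifted since the review.
                raise ValueError(f"row {row} does not contain {old!r}")
            fixed[index] = fixed[index].replace(old, new)
    return fixed
```

In use, the lines would be read from server/data/it/frasi.txt, passed through `apply_corrections`, and written back. Plain substring replacement is too blunt for short tokens such as `m` or `/`, which would need word-boundary matching or a manual check.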
Mte90 added 2 commits Nov 24, 2019

Sav22999 (Contributor) commented Nov 24, 2019

👍 perfect now, in my opinion.

phirework (Collaborator) commented Nov 25, 2019

Hi @Mte90 and @Sav22999, thank you both for your hard work on this PR! We don't accept localization content in voice-web directly, since this project uses Mozilla's Pontoon system to manage and merge localizations.

You can find the page for Italian here: https://pontoon.mozilla.org/it/common-voice/

Once you've submitted your changes, a community reviewer will check and approve them, and then they will be auto-incorporated into the website in our next deployment.

Thanks!

phirework closed this Nov 25, 2019

Mte90 (Contributor, Author) commented Nov 25, 2019

nukeador (Collaborator) commented Nov 26, 2019

Hi there.

For sentences, please remember we only accept PRs with removals, and corrected sentences should go through the sentence collector to ensure automatic filters are applied.

https://common-voice.github.io/sentence-collector/#/

If you change this PR to just remove sentences we will be able to merge.

Thanks!
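
One way to read the "removals only" rule above is that every line of the updated sentence file must already appear in the original file, in the same order. The sketch below checks exactly that. It is an illustrative assumption about what the rule means, not the check the maintainers run, and the name `is_removal_only` is hypothetical.

```python
from typing import Iterable


def is_removal_only(original: Iterable[str], updated: Iterable[str]) -> bool:
    """True if `updated` can be obtained from `original` by deleting lines only."""
    remaining = iter(original)
    # Each updated line must match some not-yet-consumed original line, in order.
    # The shared iterator advances past every candidate it inspects, so reordered
    # or edited lines make the check fail.
    return all(
        any(line == candidate for candidate in remaining)
        for line in updated
    )
```

With this reading, a PR that rewrites `km` to `chilometri` in a sentence fails the check, because the rewritten line does not exist in the original file, while a PR that simply drops that sentence passes.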

Mte90 (Contributor, Author) commented Nov 26, 2019

We fixed issues that the filters cannot detect in the first dataset, which we created before the sentence collector existed. This doesn't involve the sentence collector or Wikipedia.

nukeador (Collaborator) commented Nov 26, 2019

I know, but please understand why the process works this way: we can't review every locale's PR ourselves. That's why we only accept removal PRs and ask that modified sentences be sent through the sentence collector, where the process is automated.

Thanks!

nukeador added a commit that referenced this pull request Nov 27, 2019