Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Hebrew language code to he per IANA registry #401

Merged
merged 3 commits into from
Dec 7, 2022

Conversation

altryne
Copy link
Contributor

@altryne altryne commented Oct 23, 2022

Per IANA registry, iw was deprecated as the code for Hebrew in 1989 and the preferred code is he

The correct subtag:

%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%

And the deprecation

%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
@@ -29,6 +29,7 @@
"fi": "finnish",
"vi": "vietnamese",
"iw": "hebrew",
Copy link
Contributor Author

@altryne altryne Oct 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left the previous code here for backwards compatibility, but the TO_LANGUAGE_CODE lookup now returns he as it should

@jongwook
Copy link
Collaborator

Thanks for pointing this out. We (carelessly) took the language code from VoxLingua107, which had a few entries that did not match the ISO-639-1 codes.

However, the dictionary LANGUAGE makes uses of the order-sensitive nature of Python dicts, so this PR will break the languages that comes after Hebrew.

"<|startoftranscript|>",
*[f"<|{lang}|>" for lang in LANGUAGES.keys()],
"<|translate|>",

I think it wouldn't hurt much to break the backward compatibility and replace iw with he everywhere, maybe redirecting the input "iw" with "he" where appropriate.

altryne and others added 2 commits December 7, 2022 09:02
Per discussion, it's ok to make this change without backwards compatibility
@altryne
Copy link
Contributor Author

altryne commented Dec 7, 2022

@jongwook hey, apologies, I completely missed your comment.
Agree, and changed to he per comment.

@jongwook jongwook merged commit b9265e5 into openai:main Dec 7, 2022
@altryne altryne deleted the patch-1 branch December 16, 2022 17:00
altryne added a commit to altryne/transformers that referenced this pull request Jan 25, 2023
Here's my original PR into whisper that changes the same: 
openai/whisper#401

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
sgugger pushed a commit to huggingface/transformers that referenced this pull request Jan 26, 2023
Here's my original PR into whisper that changes the same: 
openai/whisper#401

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
miyu386 pushed a commit to miyu386/transformers that referenced this pull request Feb 9, 2023
Here's my original PR into whisper that changes the same: 
openai/whisper#401

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
ArthurZucker pushed a commit to ArthurZucker/transformers that referenced this pull request Mar 2, 2023
Here's my original PR into whisper that changes the same: 
openai/whisper#401

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
ilanit1997 pushed a commit to ilanit1997/whisper that referenced this pull request May 16, 2023
* Update Hebrew language code to he per IANA registry

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```

* Update hebrew ISO code to he

Per discussion, it's ok to make this change without backwards compatibility
abyesilyurt pushed a commit to abyesilyurt/whisper that referenced this pull request Nov 13, 2023
* Update Hebrew language code to he per IANA registry

Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he`

The correct subtag: 
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
``` 
And the deprecation
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```

* Update hebrew ISO code to he

Per discussion, it's ok to make this change without backwards compatibility
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants