Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: craete a script to ease the process of i18n by make i18n #1991

Closed
wants to merge 1 commit into from

Conversation

ghassanmas
Copy link
Member

@ghassanmas ghassanmas commented Nov 4, 2021

The make i18n script once run, will generate both pot files and
transifex config file, to east the process of translating
the specific edx-doc projects. before running the script, you need to install
sphinx-intl via pip install sphinx-intl

Related ticket: https://trello.com/c/q4JJOHeo

Please note:
This might generate duplicate strings, because the three projects "open_edx_students", "open_edx_course_authors", and "en_us/open_edx_release_notes" I assume would share strings or resources from "en_us/shared", so it would be important to figure out if there would be a way to not have duplicate or at least to let translators know about it!

@ghassanmas ghassanmas requested a review from a team November 4, 2021 13:25
@openedx-webhooks openedx-webhooks added needs triage open-source-contribution PR author is not from Axim or 2U labels Nov 4, 2021
@openedx-webhooks
Copy link

Thanks for the pull request, @ghassanmas! I've created OSPR-6206 to keep track of it in JIRA, where we prioritize reviews. Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@ghassanmas
Copy link
Member Author

@feanil following our discussing in the last translation working group, I have created this PR

@natabene
Copy link

natabene commented Nov 4, 2021

@ghassanmas Thank you for your contribution. @nedbat @feanil Would either of you would like to review?

@ghassanmas
Copy link
Member Author

ghassanmas commented Dec 7, 2021

The latest commits this 85bf3e9 and this 94edab0, are made to automate the process of pushing resoucres to transfix , as was propsed by @ehuthmacher (Eden Huthmacher ) during our call today 7 Dec.

Note: In order to run, it's important that somone with access to github env, to add tx token (which has permission to push resouces).
The env variable name is expected to be stored or saved as secrets.TX_TOKEN, hence this why the tests are failing: https://github.com/edx/edx-documentation/runs/4446825219?check_suite_focus=true#step:7:863

tx ERROR: Authorization Required

@natabene natabene requested a review from sarina December 8, 2021 15:39
@sarina
Copy link
Contributor

sarina commented Dec 8, 2021

Hi @ghassanmas - I spoke with @feanil and we have some major docs work coming up, and are currently not that versed in the structure of the docs as-is. We need to pull apart edX-specific docs from Open edX-specific docs, for example. The repo may change significantly. As such I'm not exactly sure what to do with this PR, as it seems potentially premature - and it wouldn't be great to work on translating docs just to see them change out from under you. What are your thoughts?

@ghassanmas
Copy link
Member Author

@sarina, if I got you right there are two things that can be changed from these docs:

  • A The Code The sphinx project structure, architecture ...etc
  • B The Content The content of docs can be branched into two
    1. The content that would takin away: i.e. open edx/tcril vs u2/edx content
    2. New major content update e.g. someone is expected to change the content structure of content heavily of which is considered to be part of open edx content.

If you are referring to changes of B.i then thus already be taken care of as what @feanil suggested, hence: https://github.com/ghassanmas/edx-documentation/blob/94edab0c17f5cf8e8945041d80c703524e57595a/i18n.sh#L38-L42 . But may still not clear about the content?

If you are referring to changes of B.ii then yeah I would totally agree it doesn't make sense to ask translators to translate and then after a short period of time the whole content or large part of the content is expected to be changed.

If you are referring to changes of A then it would be okay to rewrite the i18n.sh (Since I don't expect changes of code would affect changes of content). I would just optimize for giving the opportunity for community members to have translatable docs ASAP instead of them having to wait until U2 and open edx decided what content to be changed...etc

But of course, you might have more input about how long this process is gonna take, Also regarding the structure, I would say it would maybe better to have just one sphinx project instead of having multiple projects, it would just make the process of generating i18n strings simpler and most importantly less prone to errors hence:

Please note:
This might generate duplicate strings, because the three projects "open_edx_students", "open_edx_course_authors", and "en_us/open_edx_release_notes" I assume would share strings or resources from "en_us/shared", so it would be important to figure out if there would be a way to not have duplicate or at least to let translators know about it!
ref qoute

Copy link
Contributor

@feanil feanil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghassanmas I didn't see the limiting to only 3 books under here. Thanks for pointing that out. I gave you an initial review but have a few questions because I'm not very familiar with our translations infrastructure.

Also, to confirm, the TX_TOKEN is not currently defined, is that right? Can we make it so that translations sync won't run if that setting is missing?

TX_TOKEN: ${{ secrets.TX_TOKEN }}
run: |
./i18n.sh
tx push -s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add trailing newline.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's totally doable I would just add that to run when TX_TOKEN is missing and the running env is github, because if run locally there would be different way for transfiex to check for authtincaiion.

pyjwt[crypto]==1.7.1
# elasticsearch>=7.14.0 includes breaking changes in it which caused issues in discovery upgrade process.
# elastic search changelog: https://www.elastic.co/guide/en/enterprise-search/master/release-notes-7.14.0.html
elasticsearch<7.14.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in the constraints file? Doesn't seem relevant to edx-documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was generated by make upgrade because I added new dependences

.tx/config Outdated
[main]
host = https://www.transifex.com

[open-edx-documentation-project.OpenSFD_certificates]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with how this configuration is setup, can you provide some context?

  • What does OpenSFD stand for?
  • Why are there so many resource entries in this file? Is it one per RST file?
  • Was this generated in some way? If so, what tool or process was used to generate it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does OpenSFD stand for?

It's this rst file https://github.com/edx/edx-documentation/blob/master/en_us/open_edx_students/source/OpenSFD_certificates.rst to be more accuarte it's OpenSFD_cerficates

Why are there so many resource entries in this file? Is it one per RST file?

Yes there is one resoucre per rst file from each project and also from the shared proejct

Was this generated in some way? If so, what tool or process was used to generate it?

That exactly what does i18n.sh does, it suses sphinx-intl tool to generate those file for each project (the open edx projects).
I tried to dcoument each step in the file. I used sphinx-intl because the sphinx is what is used here to handle doc also as suggested by @jmbowman in this trello ticket

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that file is generated, does it make sense to check it in? Wouldn't it be better to generate it when we need it? Otherwise, we'll need to ensure that anyone that adds a new rst file will also need to know to re-generate the config file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it make sense to remove it given also the now github workflow will generate it anyway...

@ghassanmas ghassanmas force-pushed the transfix-sphinx branch 4 times, most recently from 2482d37 to c87bd5e Compare December 11, 2021 11:54
@ghassanmas
Copy link
Member Author

@feanil I have pushed a commit to reolsve your suggestion, let me know if there anything I need to do

Copy link
Contributor

@feanil feanil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that we're closing enough to moving these repos over, that we may want to hold merging this until it's moved over to the openedx org but I think it's starting to make sense to me as a starting point and it's pretty close to being ready to merge from my perspective.

i18n.sh Outdated Show resolved Hide resolved

- name : "Translations Sync"
env:
TX_TOKEN: ${{ secrets.TX_TOKEN }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it clear which token this is, rename it to TRANSIFEX_TOKEN Can you also add docs for what permissions the token user needs to successfully run this script?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean rename it in the secret or in the env. Because I am afraid for env it has to be TX_TOKEN as per transifex doc. Also tx is cli name for transifex, so it should be okay to assume future developer be famiiliar with tx.

Yes I can add doc about what the permission of token needed to be, I will refer to this doc, simply it's as per tx doc the one or the user generating the token should have permission of project maintaer or administoraor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great, linking to the doc and adding the one line that repeats the critical info is a good improvement. That way if the doc moves we still have the info and if the doc does not move, we can always get more details if needed.

I think it's fine to leave it as TX_TOKEN, but we'll need to add this as an org secret and before this job will succeed.


- name : "Translations Sync"
env:
TX_TOKEN: ${{ secrets.TX_TOKEN }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great, linking to the doc and adding the one line that repeats the critical info is a good improvement. That way if the doc moves we still have the info and if the doc does not move, we can always get more details if needed.

I think it's fine to leave it as TX_TOKEN, but we'll need to add this as an org secret and before this job will succeed.

@@ -38,3 +38,11 @@ jobs:
- name: "Build docs"
run: |
./run_tests.sh

- name : "Translations Sync"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking more at this workflow, currently this workflow just builds the docs as a part of the testing so this build.yml is a part of CI, what we're doing is adding a release step. If we leave it as is, it will also push to transifex anytime a PR is made which is not what we want. We should only push on merges to master.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just added a condition on the translation step, which should stop it from running unless it's a merging request, hence the build succeed because its a PR and the tx token no longer is releavnt! and Merry Christmas 🎅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghassanmas I don't think that's quite right either, because then it will run on any PR merging to any branch won't it?

I think the way I'd like to do this is to have just a separate github action for building and testing from the action to push translation strings to transifex. Can you create a separate github action(push_translation_strings.yml) which runs on push to master?

@ghassanmas
Copy link
Member Author

@feanil I have just wrapped up the commit log into one. And the building tests behaved as expected with the openedx migration (translation script have been skipped).
Let me know if there any blockers from my end to merge it.

@@ -38,3 +38,11 @@ jobs:
- name: "Build docs"
run: |
./run_tests.sh

- name : "Translations Sync"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghassanmas I don't think that's quite right either, because then it will run on any PR merging to any branch won't it?

I think the way I'd like to do this is to have just a separate github action for building and testing from the action to push translation strings to transifex. Can you create a separate github action(push_translation_strings.yml) which runs on push to master?

@ghassanmas
Copy link
Member Author

@feanil It will run on push to master or open release./master as specified here

branches:
- "master"
- "open-release/*.master"

I can do that by creating a separate workflow file..etc for it anyway (for separation of concern), but anyway, do we want it to push translation each time push to master happens? because I sensed from @sarina that we better push on each release?

@sarina
Copy link
Contributor

sarina commented Jan 11, 2022

@feanil not sure if you were there in standup, but yes, each push to master has the potential to wipe a whole page of translations. so we'd want this to somehow push only when a named release branch is made (either as part of release tooling or automatically when a tag is created - however because creating a new named release branch would require a new transifex project, we'd likely need to do it manually)

@feanil
Copy link
Contributor

feanil commented Jan 11, 2022

Right, yes, thank you for the reminder, it slipped my mind. We should only do it on pushes to the release branches but I think it still makes sense to have it as a separate action.

@ghassanmas
Copy link
Member Author

ghassanmas commented Jan 11, 2022

So I should create a github action than be triggerd manually i.e. on: workflow_dispatch ref: https://docs.github.com/en/actions/managing-workflow-runs/manually-running-a-workflow#configuring-a-workflow-to-run-manually

@feanil
Copy link
Contributor

feanil commented Jan 11, 2022 via email

 - The script runs on new releases, will generate .tx directory.
- .tx contains transifex config files of *specific* edx-doc projects,
the files will be used by tx-client in order to push resources.
- Both sphinx-intl and tx-client libraries are being used and thus are
 added to the dependency.
- TX_TOKEN is expected to be available in the env with push permission,
 for more info refer to:
  https://docs.transifex.com/client/client-configuration
- Related ticket: https://trello.com/c/q4JJOHeo
@ghassanmas
Copy link
Member Author

@feanil done

@sarina
Copy link
Contributor

sarina commented Jan 12, 2022

@feanil

No it should run on merges(push) to the open.release/*.master branches.

How do you make sure it automatically goes to the correct project? I envision that the lilac.master branch creates a Lilac project in transifex, the maple.master branch a Maple project, etc (my Transifex terms may be wrong - I don't actually know how to create separate named branches within a Transifex repo/project).

@ghassanmas
Copy link
Member Author

@sarina the i18n.sh will always push to the same project in tx which is open-edx-documentation-project ref

The thing is if we pushed to a new project each time a new release happens then isn't the translation process needed to be started from scratch all over again?

Regarding what happens when a file is a resource file being pushed again or updated, its that TX will work out which strings have been changed in that file and a new string for translation accordingly ref

The thing that hasn't been thought of is when and how should the translation for each release will be pulled, should this repo try for example through a GitHub action to pull the translation from tx on e.g. weekly basis?

@sarina
Copy link
Contributor

sarina commented Jan 13, 2022

@ghassanmas this is exactly what I've been saying needs some serious investigation/thought. Right now, people are running on instances 1 or 2 or even 6 named releases behind the current named release. We need a way of maintaining translations for previous named releases, not just the most current.

@ghassanmas
Copy link
Member Author

If it's critical to keep maintaining translation for multiple releases then to my knowlege of tx we have two ways to resolve this

  1. Have a new project for each release i.e. "Open edx documentation project "
  2. Have a new translation file for each release i.e. "readme..po"

I think both are doable, however, if we choose (1) and do it the normal way, then translators might be frustrated because it would require to rereview or retranslate the whole project again. But we could do it in a handy way such that when instead of creating an empty project (for the new release) we could copy or clone the translation of the last release and then push the new resource files, TX mentioned this here and even refer to a python script that can be utilized for that.

For (2) I am not sure about that for now, because it would affect the translation experience like how many resources will end be in the project? Also, it might be complicated if for example for a new release new rst files are being added

@sarina
Copy link
Contributor

sarina commented Jan 13, 2022 via email

@ghassanmas
Copy link
Member Author

ghassanmas commented Jan 13, 2022

Can you outline the requirements explicitly again? Is it just to save the transaltion per release?

@sarina
Copy link
Contributor

sarina commented Jan 19, 2022

Requirements:

  • ALL named releases have THEIR OWN translation. That is, I can access the Maple docs, the Nutmeg docs, etc in my language

  • Making a push to a named release branch updates the doc for that release. So pushing a change to the open-release/nutmeg branch DOES NOT change the Maple docs

  • You will need to propose a way of handling the fact that we cut multiple "beta" branches prior to a "master" branch of a given release
    image

  • For translators, there must be a way to preserve translation memory.

@feanil
Copy link
Contributor

feanil commented Jan 31, 2022

@ghassanmas based on the conversations here and in a few other places, I wrote up #2018 to clearly define what we want done around translations. While I think the tooling discovery you've done in this PR has been really useful, I think we need to have a much better understanding about how we're going to manage the translations process around documentation before we merge this and create a bunch of translation work that we may have to throw away.

Thanks for this first attempt at this, it really helps clarify what questions we need answered before we can proceed with having a good documentation translation process but I don't think we should merge this as is. Let's talk more about answering the questions raised from #2018 before we land the tooling to start pushing things up to transifex.

@openedx-webhooks
Copy link

@ghassanmas Even though your pull request wasn’t merged, please take a moment to answer a two question survey so we can improve your experience in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U rejected
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants