Language Independence #651

Closed
ianmilligan1 opened this issue Nov 6, 2017 · 68 comments

Comments

@ianmilligan1
Contributor

commented Nov 6, 2017

I wanted to move the conversation from #647 so policy discussions don't get wrapped up with an individual lesson.

@arojascastro noted below:

I may be wrong but the point is that Geoparser only recognizes named entities in English; I have never used it but I checked the website and I did not see any way to change the language parameter or whatever. So if anyone translates this lesson, s/he will have to use the text in English - because it will not work with texts in Spanish.

What do you think? @vgayolrs @mariajoafana

On the other hand, maybe it would be good practice to accept the publication of lessons that take into consideration how dependent the content is on language.

Examples:

  • is it possible to use texts in languages other than English? Then it may be accepted
  • can you use resources that are transnational (for instance Europeana)? Fine!
  • are you using tools that are available in different languages? Perfect!

In my opinion, if the PH is committed to open access and free technologies, it may be a good idea to extend its commitment to multilingual technologies as well... of course a first step could be technologies that are localisation friendly...

@ianmilligan1

Contributor Author

commented Nov 6, 2017

My own two cents is that I think it's worth considering, but I don't think we want to have yet another blanket policy.

The Geoparser is a good example: by default it focuses on English, but a research team could tweak different elements to make it work in different languages. It would take effort and time, and it's not realistically feasible for tool projects to implement all of this out of the box with the resources that most DH projects have.

For what it's worth, my work for example focuses on English-language tools out of the box, mostly because (a) that's our expertise; (b) our pipeline of tools; and (c) our local users. We make it so that it's extendable down the road for other linguistic communities, but in some cases that's going to require people to do a bit of their own legwork on the text classifier front. With these complex tools, language independence isn't something you're going to easily have out of the box, especially given interdependent processing steps.

@arojascastro

Contributor

commented Nov 6, 2017

Yup, I mostly agree. I think you could take this into account when reviewing proposals, just as one factor amongst others.

However, your scenario applies well to programming languages, tools and methods mostly.

What about the reuse of information already published by third parties? I think a great amount of work can be done in this second direction: using resources that are transnational and multilingual. They already exist. So for instance, instead of mining the Internet Archive, why not mine Europeana?

At the moment all PH contents, examples and resources are focused on the English culture, which of course makes total sense because authors are English or American, but the audience... is it English or American only?

I do not think it would take a lot of effort to look for examples and resources that are not that dependent on language -- that are more diverse. We cannot limit diversity to the composition of our team board... it must be reflected in content as well. I insist this is feasible for all authors.

@acrymble

Contributor

commented Nov 6, 2017

It could be something that someone mentions at the review stage if a lesson looks like it will be heavily focused on English material. Maybe someone with experience translating could just say something like: 'is there a way this could be altered so that we could more easily translate it into Spanish and other languages?'

@mdlincoln

Member

commented Nov 27, 2017

@arojascastro Does this guideline need to be written down in our editorial and/or reviewer guidelines?

@arojascastro

Contributor

commented Nov 27, 2017

In my opinion, yes. But maybe we should vote? Or see more views? It is up to you (English team) really.

@mdlincoln

Member

commented Nov 27, 2017

I think it makes sense as a guideline - something that editors and reviewers should keep in mind, much like they do all the other traits of lessons that we try to cultivate. @arojascastro would you like to draft some additions to those pages and submit it as a PR? Then we can all take a look at those.

@arojascastro

Contributor

commented Dec 8, 2017

Yes, @mdlincoln I'll take this. I need some days, but I am back on track and working for the PHes fully (I had to focus on teaching).

@walshbr referenced this issue Jan 18, 2018

Closed

Editorial Meeting Agenda - 1/22/18 #678

8 of 8 tasks complete
@arojascastro

Contributor

commented Feb 7, 2018

I found these resources by the W3C about internationalization and localization: https://www.w3.org/International/getting-started/index

After reading, I drafted this text. Of course, my English can be improved, do not hesitate to suggest improvements concerning language and contents.

Internationalization and localization

As the World Wide Web Consortium (W3C) reminds us, internationalization "is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language". Since 2017 The Programming Historian has become an international and multilingual website that provides lessons for English and Spanish audiences (February 2018). For this reason, we encourage authors to adopt an international outlook while writing their lessons. Key aspects to be considered include but are not limited to:

  • Images, texts and datasets should be as cosmopolitan as possible, in keeping with our global audience. We recommend using international resources well known all around the world.
  • Examples, representations, icons, symbols and colors are important. We encourage seeking cases that embrace diversity and inclusivity and that are not offensive to other cultures - a trivial thumbs-up gesture or emoji may be misunderstood in Australia, Greece or the Middle East.
  • Do not expect that everybody knows referents originating in a specific cultural context. Give the full name of organizations and the abbreviated form in parentheses. For people, places or events, some explanations or links to Wikipedia may be necessary.
  • Use methods and tools that can be used, adapted or trained in languages other than the language of your tutorial.
  • Use methods and tools whose web interface and documentation are available in several languages.
  • Support different character sets. If dealing with text processing, consider that most languages make use of non-Latin characters as well as accents and various other diacritics.
  • Be aware of cultural differences related to display capabilities, time and date formats, and personal names.
  • Provide information and links to further resources and bibliographical references in several languages.

These key aspects are desirable in order to make the localization process easier. Our editors will assist authors and reviewers and give appropriate feedback in case of doubt.
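To make the character-set point concrete, here is a minimal Python sketch of what "supporting accents and diacritics" can involve in a text-processing lesson. It assumes Python 3 and the standard `unicodedata` module; the function names and sample strings are hypothetical and are not taken from any Programming Historian lesson.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Return text in NFC form so composed and decomposed accents compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text: str) -> str:
    """Remove combining marks. This is lossy and may be wrong for some languages."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same word encoded two ways (illustrative sample data).
composed = "Espa\u00f1a"      # 'ñ' stored as a single code point
decomposed = "Espan\u0303a"   # 'n' followed by a combining tilde

print(composed == decomposed)                                   # False
print(normalize_text(composed) == normalize_text(decomposed))   # True
print(strip_diacritics("M\u00e4rz"))                            # Marz
```

A lesson that compares or counts words without a normalization step like this may silently treat the two encodings as different words, which is one reason the choice of sample texts matters for translators.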

@arojascastro

Contributor

commented Feb 7, 2018

You may also be interested in this Internationalization Checklist by Windows: https://msdn.microsoft.com/en-us/library/windows/desktop/ee845046(v=vs.85).aspx

There is more technical material that you may want to include @mdlincoln @acrymble ...

@mdlincoln

Member

commented Feb 7, 2018

@arojascastro Thank you for drafting this. I really like this text - it is clear and covers many important issues, without being overly long.

Two thoughts

  1. It might be useful to make even more explicit that we encourage internationalization in both methods and sources. For example, if the text analysis tool you're using works for multiple languages, you might consider showing examples of source data from different languages, demonstrating any required configuration changes etc.
  2. This text is going in the reviewer guidelines, yes? I think it would make sense there, since that is where we already have many of our conceptual guidelines for lessons. The author guidelines specifically advise potential authors to consult the reviewer guidelines to understand what would make for a good lesson.

There are a few other small wording changes I might suggest - but I will put those in after you have opened a PR.

@arojascastro

Contributor

commented Feb 8, 2018

Thank you @mdlincoln for your feedback.

  1. How would you make that more explicit? Would you add one more bullet item saying "We encourage internationalization in both methods and sources. For example, if the text analysis tool you're using works for multiple languages, you might consider showing examples of source data from different languages, demonstrating any required configuration changes"? Or would you add that idea in the first paragraph? For instance:

As the World Wide Web Consortium (W3C) reminds us, internationalization "is the design and development of a product, application or document content that enables easy localization for target audiences that vary in culture, region, or language". Since 2017 The Programming Historian has become an international and multilingual website that provides lessons for English and Spanish audiences (February 2018). For this reason, we encourage authors to adopt an international outlook while writing their lessons in both methods and sources. Key aspects to be considered include but are not limited to:

  2. I am not sure. I drafted this thinking that the author should read these guidelines before starting to write - but for sure the reviewers and editor should take them into account while reviewing proposals. If they are going to be part of the reviewer guidelines maybe the text should be a bit different - instead of using the imperative form we should use questions. For instance:

    • Images, texts and datasets should be as cosmopolitan as possible, in keeping with our global audience. Does the author use international resources well known all around the world?
    • Examples, representations, icons, symbols and colors are important. A trivial thumbs-up gesture or emoji may be misunderstood in Australia, Greece or the Middle East. Do examples embrace diversity and inclusivity? Are they respectful and non-offensive to other cultures?
    • Referents originating in a specific cultural context may be unknown to different audiences. Does the author give the full name of organizations and the abbreviated form in parentheses? Are there explanations or links to Wikipedia concerning mentioned people, places or events?
    • Does the author use methods and tools that can be used, adapted or trained in languages other than the language of the tutorial?
    • Does the author use methods and tools whose web interface and documentation are available in several languages?
    • Does the author support different character sets? If dealing with text processing, are non-Latin characters as well as accents and various other diacritics covered in the tutorial?
    • Are cultural differences such as display capabilities, time and date formats, and personal names discussed?
    • Does the author provide information and links to further resources and bibliographical references in several languages?

I think this is an important issue because if we start adopting these criteria we will make translations and localization into Spanish, French or another language much easier and truly international. On the other hand, it requires that the team understands and supports these ideas and sees a real benefit in the long term. Asking authors for this kind of contribution is going to be difficult because most of them will have to go beyond their comfort zone and explore unknown territories -- for example, how many of our current tutorials contain bibliographic references other than in English? I would say none.

@arojascastro

Contributor

commented Feb 8, 2018

Bonus: I found this article about "Writing for a Global Audience" interesting. But I do not think we should go in that detail, right? https://www.globalme.net/blog/writing-for-a-global-audience-25-dos-and-donts

@acrymble

Contributor

commented Feb 8, 2018

This is an interesting discussion. I support it in principle (especially to get English speakers to think about the rest of the world when they write). But I am wary that we don't want to create a situation where local variation in needs and culture cannot be expressed. The DH skills needs of a place like Africa or S. America might be quite different from those of Europe. We wouldn't want a policy that prohibited or otherwise made it difficult for those different regional needs for different types of skills/tools to be published because they didn't necessarily meet the needs of other cultures and locations.

So we need to find a balance. And in the first instance, I think it should be something editors are responsible for considering as they conduct their first pass reviews, so we can see what effects it has (if any) on lessons.

@arojascastro

Contributor

commented Feb 8, 2018

True, I agree that local needs may differ, but for that reason we are also asking for original tutorials in Spanish. The point of this draft is to make it easy to translate texts from one language into another. The same would apply to original tutorials in Spanish - if we ever publish one and you translate it into English you will encounter many problems. On the other hand, I have done a lot of research and I think this draft summarizes the current state of the art when trying to go international in order to enable localization. But yes, it would be great to hear more opinions.

@drjwbaker

Member

commented Feb 8, 2018

@arojascastro Thanks for all the research done on this. Looks great. One comment. After "for this reason, we encourage authors to adopt an international outlook while writing their lessons. Key aspects to be considered include but are not limited to:" why isn't the first thing we say something about authors considering writing in their language of choice (or contacting the editor to see if that is possible)? I know there are big infrastructure complications for us here, but it seemed - to me - like the elephant in the room which we should at least address (~"we encourage lessons in English and Spanish. We don't currently have infrastructure to support publishing lessons in other languages. However if you want to write in a language other than English and Spanish, contact the editors")

@arojascastro

Contributor

commented Feb 8, 2018

I agree with @drjwbaker, I like that clarification. By the way I saw that some lessons have been already translated into French...

On the other hand, I did not mean to castrate the "local" needs - none of these criteria should be compulsory, only desirable. Let's say that a tutorial focuses on a resource available only in English and that the method only works with English texts. For me that is fine, but it would be good if the tutorial discussed the pros and cons, acknowledged this and tried to suggest alternatives or solutions. This would help translators a lot. Why? Because then editors and translators do not have to 1. ignore problems and keep translating as if it all were perfect; or 2. intervene, adapt and rewrite the lesson in order to solve those problems.

@acrymble

Contributor

commented Feb 10, 2018

I will use these guidelines as a talking point in the lesson I've just started editing. It's important that we don't change the goalposts for people further down the review process, but I think this is a positive idea. Thank you @arojascastro. Let's test it out and see how it goes before adopting it as formal policy (or adapting it before doing so).

@arojascastro

Contributor

commented Feb 12, 2018

Great! Let's test them and then adapt or propose a different approach. Let us know about the feedback from authors.

@drjwbaker

Member

commented Apr 24, 2018

If @mariajoafana @arojascastro (that is the ES members who've commented on the ticket) are happy, then I am. Thanks Adam for the revision.

@arojascastro

Contributor

commented Apr 24, 2018

I am a bit surprised because the first original proposal that we received precisely uses a tool that works with English, Spanish and Japanese and whose web interface is available in three languages.

I recommend people to read the tutorial about "Stylometry with Python" where we put these ideas into practice. I think the final output is much better after considering these issues.

I agree about the balance between local needs and global coverage; however, these guidelines are not dangerous at all and they have a heuristic dimension as well -- we as a DH project stand for these values and help to change the status quo. I see translation as an agent of change and to be honest I am less interested in fulfilling local needs than serving as a communication enabler between communities -- that is why I tend to favour translations rather than original content. But this is just my opinion and you may have other preferences.

That said, I am happy with any text that addresses the problem and allows us to move on. I am also confident that we can change these guidelines or improve them in the future when the English team starts translating from Spanish.

@drjwbaker

Member

commented Apr 24, 2018

@arojascastro To clarify, when you say ..

I am a bit surprised because the first original proposal that we received precisely uses a tool that works with English, Spanish and Japanese and whose web interface is available in three languages.

.. what are you referring to here? The first ES proposal that has been received?

@arojascastro

Contributor

commented Apr 24, 2018

Yes, exactly, by we I meant the Spanish team. The author is sending the final proposal in the following days.

@drjwbaker

Member

commented Apr 24, 2018

Thanks for the clarification @arojascastro.

@walshbr

Contributor

commented Apr 24, 2018

Just explicitly saying here that I support this and the conversations around it. Have been keeping silent because the other opinions rendered here have been very thoughtful and useful - I trust your judgements. The new text looks good to me. As I mentioned on the call, guidelines like these are also helpful for new editors as we try to find specific points of intervention with new proposals.

@arojascastro

Contributor

commented Apr 24, 2018

Just let me add three things.

In my original proposal, I stated the following:

  • Be aware of cultural differences related to display capabilities, time and date formats, and personal names.

  • Provide information and links to further resources and bibliographical references in several languages.

In the latest proposal we have:

  • Use internationally recognised formats for time, dates, etc.

  • When possible, add a multi-lingual documentation section ("Further reading" or "References") at the end of your tutorial.

The first bullet is the opposite of what I meant. What I meant is that there is no single standard for dates or times and thus the author should cover a range of different formats. For instance, in British English you may say 14th March 2016 but in American English you might say March 14th, 2016 and in Spanish 14 de marzo de 2016, while in German it is 14. März 2016 (if I am not wrong).

Something similar happens with personal names: you have one family name whereas in Spain we have two.

In respect of the second bullet point, I am not sure adding an appendix is a solution: a global outlook should not be something that you add as a supplement or an amendment, it should be something that is organically present throughout the tutorial.

Also in my original proposal there was an inclusivity component that has been lost: "We encourage seeking cases that embrace diversity and inclusivity and that are not offensive to other cultures".

We have also lost the links to W3C on internationalization: https://www.w3.org/International/questions/qa-i18n.en

But as I said, at this moment I am happy with any text that reaches a consensus.

@acrymble

Contributor

commented Apr 24, 2018

Sorry @arojascastro I guess I misunderstood.

For the first one, is this something that authors can actually follow? Other than making them better people, what will being aware of different ways of showing the date do to the lessons?

  • Be aware of cultural differences related to display capabilities, time and date formats, and personal names.
@drjwbaker

Member

commented Apr 25, 2018

On ..

Use internationally recognised formats for time, dates, etc.

.. can we resolve this by making a distinction between dates/times in free text and dates/times in code or metadata.

So, for dates in free text we accept that there are many ways of representing dates or times.

But for dates/times in code or metadata there are international standards that are useful for ensuring interoperability of code/metadata and that I recommend we use. Specifically, I refer to ISO 8601:2004 - Data elements and interchange formats -- Information interchange -- Representation of dates and times (see https://www.iso.org/standard/40874.html or https://www.loc.gov/standards/datetime/iso-tc154-wg5_n0039_iso_wd_8601-2_2016-02-16.pdf).
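To illustrate the distinction between free-text dates and dates in code or metadata, here is a minimal Python sketch. It assumes Python 3.7 or later and only the standard `datetime` module; the variable names and the sample dates are hypothetical choices for illustration.

```python
from datetime import date, datetime

# In code or metadata: store the date once, in ISO 8601 form (YYYY-MM-DD).
lesson_date = date(2016, 3, 14)
iso_string = lesson_date.isoformat()
print(iso_string)  # 2016-03-14

# An ISO 8601 string can be read back without guessing the convention used.
parsed = date.fromisoformat(iso_string)
print(parsed == lesson_date)  # True

# In free text, the same characters can mean different days.
ambiguous = "03/04/2016"
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()    # day/month reading
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # month/day reading
print(day_first.isoformat())    # 2016-04-03
print(month_first.isoformat())  # 2016-03-04
```

The ambiguous string parses successfully under both conventions, so code that assumes one of them may produce wrong dates for readers elsewhere without raising any error.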

@arojascastro

Contributor

commented Apr 25, 2018

Absolutely @drjwbaker

I took the idea from W3C: https://www.w3.org/International/articles/definitions-time/

I guess this is relevant if we expect tutorials that need input from users (for instance, forms) or about web design. It may be less relevant in text analysis -- but maybe those tutorials focus on creating a list of dates using XSLT, Python or R as well. In any case, if you think it is not relevant, we should remove that bullet.

Have a nice day.

@drjwbaker

Member

commented Apr 25, 2018

  • When choosing your methods or tools, try try to make choices with multi-lingual readers in mind. This is particularly important when working on textual analysis methods, or where users may reasonably want to have support for different character sets (eg, accented characters, non-Latin, etc).
  • When choosing primary sources, images, producing figures, or taking screen shots, consider how they will present themselves to a global audience.
  • When writing, avoid jokes, cultural references, puns, plays on words, idiomatic expressions, sarcasm, emojis, or language that is more difficult than it needs to be. Mentions of persons, organisations, or historical details should always come with contextual information. It may help to assume your reader does not live in your country or speak your language.
  • In code or metadata, use internationally recognised standard formats for dates and times (ISO 8601:2004). In free text, be aware of cultural differences related to the representation of dates and times.
  • When possible, add a multi-lingual documentation section ("Further reading" or "References") at the end of your tutorial.
@acrymble

Contributor

commented Apr 25, 2018

Ok continuing to revise based on feedback:

Write For a Global Audience

Programming Historian readers live all around the world, and operate in a range of cultural contexts. To help reach that global audience, we have been publishing in more than one language since 2017, and aim to translate all tutorials. While we recognise that not all methods or tools are fully internationally accessible, authors can and should take steps to write their lesson in a way that is accessible to as many people as possible. Please consider the following when writing your tutorial:

  • When choosing your methods or tools, try to make choices with multi-lingual readers in mind. This is particularly important when working on textual analysis methods, or where users may reasonably want to have support for different character sets (eg, accented characters, non-Latin, etc).
  • Where possible, choose methods and tools that have multi-lingual documentation.
  • When choosing primary sources, we encourage authors to use materials they know well, but where possible they should make choices that are internationally focused and that could reasonably be used in translated lessons.
  • When choosing primary sources, images, producing figures, or taking screen shots, consider how they will present themselves to a global audience.
  • When writing, avoid jokes, cultural references, puns, plays on words, idiomatic expressions, sarcasm, emojis, or language that is more difficult than it needs to be. Mentions of persons, organisations, or historical details should always come with contextual information. It may help to assume your reader does not live in your country or speak your language.
  • In code examples or metadata, use internationally recognised standard formats for dates and times (ISO 8601:2004). In free text, be aware of cultural differences related to the representation of dates and times which might cause confusion.
  • Where possible, choose methods and tools that have multi-lingual documentation. If this is not practical, it would be great if you could add some multi-lingual references at the end of your tutorial.

Contact your editor if you require guidance on any of these matters. Tutorials that are unable to meet these guidelines may not be translated, but are still welcome for consideration for monolingual publication.

@jenniferisasi

Contributor

commented Apr 25, 2018

I like this! Only, on "When choosing your methods or tools, try try to make choices with multi-lingual readers in mind." there are two "try"

@mdlincoln referenced this issue Apr 27, 2018

Merged

Language links refactor #818

3 of 3 tasks complete
@arojascastro

Contributor

commented May 5, 2018

I just want to add that I'm very happy with the final proposal -- it is better and more balanced than my original text. I thank you all also for the collaborative spirit of this issue. You helped a lot to shape my thoughts and I also learned and changed my perspective thanks to your criticisms and comments.

@drjwbaker

Member

commented May 5, 2018

I feel the same @arojascastro. This has been an excellent, passionate, therapeutic discussion.

acrymble added a commit that referenced this issue May 8, 2018

@acrymble

Contributor

commented May 8, 2018

I have created a pull request. This is awaiting translation before being published. Thanks for your views everyone.

@arojascastro

Contributor

commented May 8, 2018

Thank you! I can translate the texts on the weekend -- unless someone else can or wants to.

@acrymble

Contributor

commented May 9, 2018

It would be great if someone else on the Spanish team would help with this. We rely on @arojascastro a lot for this type of work. I'm sure he would appreciate more help.

@jenniferisasi

Contributor

commented May 9, 2018

@acrymble and @arojascastro here is a preliminary version of the translation; in this manner, I hope someone can review it, make the necessary changes and paste it where appropriate cos I'm not sure where this goes and the last thing I want is to break the guidelines website :)

Escribir para una audiencia global

Los lectores de Programming Historian viven por todo el mundo y, como tal, trabajan en un amplio rango de contextos culturales. Para poder alcanzar a dicha audiencia global, hemos estado publicando en más de un idioma desde 2017 y nuestro objetivo es traducir todos los tutoriales. Aunque reconocemos que no todos los métodos o herramientas son totalmente accesibles a nivel internacional, los autores pueden y deben tomar medidas para escribir sus lecciones de manera que sea accesible al mayor número de personas posible. Por favor, considera lo siguiente al escribir tu tutorial:

  • Al elegir tus métodos o herramientas, toma tus decisiones con una audiencia multilingüe en mente. Esto es particularmente importante cuando se trabaja en métodos de análisis textual, o cuando los usuarios quieran tener soporte para diferentes conjuntos de caracteres de forma razonable (por ejemplo, caracteres acentuados, no latinos, etc.).
  • Al elegir tus fuentes primarias e imágenes, al crear figuras o al hacer capturas de pantalla, considera cómo se presentarán a una audiencia global.
  • Al escribir evita usar chistes, referencias culturales, juegos de palabras, expresiones idiomáticas, el sarcasmo, emojis o un lenguaje innecesariamente complicado. Las menciones a personas, a organizaciones o a eventos históricos siempre deben ir acompañadas de información contextual. Te puede resultar útil pensar que tu audiencia no vive en tu país o que no habla tu mismo idioma.
  • En tus ejemplos de código o metadatos, utiliza los formatos estándar internacionalmente reconocidos para fechas y horas ([ISO 8601:2004](https://www.iso.org/standard/40874.html)). En tu texto, ten en cuenta las diferencias culturales relacionadas con la presentación de fechas y horas que puedan causar confusión.
  • Cuando sea posible escoge métodos y herramientas que tengan documentación multilingüe. Si esto no es posible, trata de agregar algunas referencias multilingües al final de tu tutorial.

Contacta con tu editor si necesitas orientación sobre alguno de estos asuntos. Puede que no podamos traducir tu tutorial si no cumple con estas pautas, pero aún así lo consideraremos para su publicación monolingüe.

acrymble added a commit that referenced this issue May 10, 2018

@acrymble

Contributor

commented May 10, 2018

I've added this to the pull request. Just need a reviewer to look it all over.

@arojascastro

Contributor

commented May 10, 2018

I can take care later today.

@arojascastro

Contributor

commented May 10, 2018

Done.

@acrymble

Contributor

commented May 10, 2018

Thanks everyone, and especially @arojascastro for this important policy change.
