From 353ff094bef9e184a22f888a3ca951a41ae6af66 Mon Sep 17 00:00:00 2001
From: Anisa Hawes <87070441+anisa-hawes@users.noreply.github.com>
Date: Fri, 3 Feb 2023 17:59:15 +0000
Subject: [PATCH 1/3] Update introduction-to-stylometry-with-python.md

- Reinstate caution re: Python v3.6 or higher at line 56
- Adjust code`with open(f'data/federalist_{filename}.txt', 'r') as f:` at line 156
- Add the line `nltk.download('punkt')` to follow `import nltk` at line 194
- Remove indentations which preceded token length code at lines 212-214
---
 .../introduction-to-stylometry-with-python.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/en/lessons/introduction-to-stylometry-with-python.md b/en/lessons/introduction-to-stylometry-with-python.md
index dc38b9cfc1..228f83276f 100755
--- a/en/lessons/introduction-to-stylometry-with-python.md
+++ b/en/lessons/introduction-to-stylometry-with-python.md
@@ -53,7 +53,7 @@ Please note that the code in this lesson has been designed to run sequentially.
 
 ## Prior Reading
 
-If you do not have experience with the Python programming language or are finding examples in this tutorial difficult, the author recommends you read the lessons on [Working with Text Files in Python](/lessons/working-with-text-files) and [Manipulating Strings in Python](/lessons/manipulating-strings-in-python).
+If you do not have experience with the Python programming language or are finding examples in this tutorial difficult, the author recommends you read the lessons on [Working with Text Files in Python](/lessons/working-with-text-files) and [Manipulating Strings in Python](/lessons/manipulating-strings-in-python). Please note, that those lessons were written in Python version 2 whereas this one uses Python version 3. The differences in [syntax](https://en.wikipedia.org/wiki/Syntax) between the two versions of the language can be subtle. If you are confused at any time, follow the examples as written in this lesson and use the other lessons as background material. (More precisely, the code in this tutorial was written using [Python 3.6.4](https://www.python.org/downloads/release/python-364/); the [f-string construct](https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals) in the line `with open(f'data/federalist_{filename}.txt', 'r') as f:`, for example, requires Python 3.6 or a more recent version of the language.)
 
 ## Required materials
 
@@ -153,7 +153,7 @@ Next, as we are interested in each author's vocabulary, we will define a short P
 def read_files_into_string(filenames):
     strings = []
     for filename in filenames:
-        with open(f'data/federalist_{filename}.txt') as f:
+        with open(f'data/federalist_{filename}.txt', 'r') as f:
             strings.append(f.read())
     return '\n'.join(strings)
 ```
@@ -191,6 +191,7 @@ The code required to calculate characteristic curves for the *Federalist*'s auth
 ```python
 # Load nltk
 import nltk
+nltk.download('punkt')
 %matplotlib inline
 
 # Compare the disputed papers to those written by everyone,
@@ -207,10 +208,10 @@ for author in authors:
     federalist_by_author_tokens[author] = ([token for token in tokens
                                             if any(c.isalpha() for c in token)])
 
-    # Get a distribution of token lengths
-    token_lengths = [len(token) for token in federalist_by_author_tokens[author]]
-    federalist_by_author_length_distributions[author] = nltk.FreqDist(token_lengths)
-    federalist_by_author_length_distributions[author].plot(15,title=author)
+# Get a distribution of token lengths
+token_lengths = [len(token) for token in federalist_by_author_tokens[author]]
+federalist_by_author_length_distributions[author] = nltk.FreqDist(token_lengths)
+federalist_by_author_length_distributions[author].plot(15,title=author)
 ```
 
 The '%matplotlib inline' declaration below 'import nltk' is required if your development environment is a [Jupyter Notebook](http://jupyter.org/), as it was for me while writing this tutorial; otherwise you may not see the graphs on your screen. If you work in [Jupyter Lab](http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html), please replace this clause with '%matplotlib ipympl'.

From 66aa1093f6d7edea18460fbf9e0848d50931f713 Mon Sep 17 00:00:00 2001
From: Anisa Hawes <87070441+anisa-hawes@users.noreply.github.com>
Date: Fri, 3 Feb 2023 18:07:13 +0000
Subject: [PATCH 2/3] Update introduction-a-la-stylometrie-avec-python.md

- Reinstate caution re: Python v3.6 or higher at line 61
- Adjust code`with open(f'data/federalist_{nom_fichier}.txt', 'r') as f:` at line 162
- Add the line `nltk.download('punkt')` to follow `import nltk` at line 194
- Remove indentations which preceded token length code at lines 220-222
---
 .../introduction-a-la-stylometrie-avec-python.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fr/lecons/introduction-a-la-stylometrie-avec-python.md b/fr/lecons/introduction-a-la-stylometrie-avec-python.md
index 47a4409b6a..7f91476781 100644
--- a/fr/lecons/introduction-a-la-stylometrie-avec-python.md
+++ b/fr/lecons/introduction-a-la-stylometrie-avec-python.md
@@ -58,7 +58,7 @@ Veuillez noter que le code informatique de cette leçon a été conçu pour êtr
 
 ## Lectures préalables
 
-Si vous n'avez pas d'expérience de programmation en Python ou si vous trouvez les exemples dans ce tutoriel difficiles, l'auteur vous recommande de lire les leçons intitulées [Travailler avec des fichiers texte en Python](/fr/lecons/travailler-avec-des-fichiers-texte) et [Manipuler des chaînes de caractères en Python](/fr/lecons/manipuler-chaines-caracteres-python).
+Si vous n'avez pas d'expérience de programmation en Python ou si vous trouvez les exemples dans ce tutoriel difficiles, l'auteur vous recommande de lire les leçons intitulées [Travailler avec des fichiers texte en Python](/fr/lecons/travailler-avec-des-fichiers-texte) et [Manipuler des chaînes de caractères en Python](/fr/lecons/manipuler-chaines-caracteres-python). Notez aussi que ces leçons ont à l'origine été rédigées en Python 2 tandis que ce tutoriel utilise Python 3. Les différences de [syntaxe](https://fr.wikipedia.org/wiki/Syntaxe) entre les deux versions du langage peuvent être subtiles. En cas de conflit, suivez les exemples tels qu'ils sont codés dans le présent tutoriel et n'utilisez les autres ressources qu'à titre indicatif. (Plus précisément, le code intégré à ce tutoriel a été écrit en [Python 3.6.4](https://www.python.org/downloads/release/python-364/); la chaîne de type [f-string](https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals) qui apparaît dans la ligne `with open(f'data/federalist_{nom_fichier}.txt', 'r') as f:`, par exemple, requiert Python 3.6 ou une version plus récente du langage.)
 
 ## Matériel requis
 
@@ -159,7 +159,7 @@ Ensuite, puisque nous nous intéressons au vocabulaire employé par chaque auteu
 def lire_fichiers_en_chaine(noms_fichiers):
     chaines = []
     for nom_fichier in noms_fichiers:
-        with open(f'data/federalist_{nom_fichier}.txt') as f:
+        with open(f'data/federalist_{nom_fichier}.txt', 'r') as f:
             chaines.append(f.read())
     return '\n'.join(chaines)
 ```
@@ -198,6 +198,7 @@ Le code requis pour calculer les courbes caractéristiques des auteurs du _Féd
 ```python
 # Charger nltk
 import nltk
+nltk.download('punkt')
 %matplotlib inline
 
 # Comparons les articles contestés à ceux écrits par chaque
@@ -215,10 +216,10 @@ for auteur in auteurs:
                                            if any(c.isalpha() for c in occ)])
 
 
-    # Obtenir et dessiner la distribution des fréquences de longueurs
-    occs_longueurs = [len(occ) for occ in federalist_par_auteur_occs[auteur]]
-    federalist_par_auteur_dist_longueurs[auteur] = nltk.FreqDist(occs_longueurs)
-    federalist_par_auteur_dist_longueurs[auteur].plot(15,title=auteur)
+# Obtenir et dessiner la distribution des fréquences de longueurs
+occs_longueurs = [len(occ) for occ in federalist_par_auteur_occs[auteur]]
+federalist_par_auteur_dist_longueurs[auteur] = nltk.FreqDist(occs_longueurs)
+federalist_par_auteur_dist_longueurs[auteur].plot(15,title=auteur)
 ```
 
 La clause `%matplotlib inline` sous la ligne `import nltk` est nécessaire si vous travaillez dans un environnement de développement [Jupyter Notebook](https://jupyter.org/), comme c'était le cas pour moi lorsque j'ai rédigé ce tutoriel; en son absence, les graphes pourraient ne pas apparaître à l'écran. Si vous travaillez plutôt dans [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html), veuillez remplacer cette clause par `%matplotlib ipympl`.

From 796b85487c4ec9d87c23d84f8b161a81f455adc5 Mon Sep 17 00:00:00 2001
From: Anisa Hawes <87070441+anisa-hawes@users.noreply.github.com>
Date: Fri, 3 Feb 2023 18:29:31 +0000
Subject: [PATCH 3/3] Update introducao-estilometria-python.md

- Adjust code `with open(f'dados/pg{id_ficheiro}.txt', encoding='utf-8') as f:` to `with open(f'dados/federalist_{id_ficheiro}.txt', 'r', encoding='utf-8') as f:` at lines 59 and 149
- Add the line `nltk.download('punkt')` to follow `import nltk` at line 194
- Remove indentations which preceded token length code at lines 213-215
---
 pt/licoes/introducao-estilometria-python.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/pt/licoes/introducao-estilometria-python.md b/pt/licoes/introducao-estilometria-python.md
index 95d299b834..8a52335fd8 100644
--- a/pt/licoes/introducao-estilometria-python.md
+++ b/pt/licoes/introducao-estilometria-python.md
@@ -56,7 +56,7 @@ No final desta lição, teremos percorrido os seguintes tópicos:
 
 ## Leitura prévia
 
-Se você não tem experiência com a linguagem de programação Python ou está tendo dificuldade nos exemplos apresentados neste tutorial, o autor recomenda que você leia as lições [Trabalhando com ficheiros de texto em Python](/pt/licoes/trabalhando-ficheiros-texto-python) e [Manipular Strings com Python](/pt/licoes/manipular-strings-python). Note que essas lições foram escritas em Python versão 2, enquanto esta usa Python versão 3. As diferenças de [sintaxe](https://perma.cc/E5LQ-S65P) entre as duas versões da linguagem podem ser sutis. Se você ficar em dúvida, siga os exemplos conforme descritos nesta lição e use as outras lições como material de apoio. (Este tutorial encontra-se atualizado até à versão [Python 3.8.5](https://perma.cc/XCT2-Q4AT); as [strings literais formatadas](https://perma.cc/U6Q6-59V3) na linha `with open(f'data/pg{filename}.txt', encoding='utf-8') as f:`, por exemplo, requerem Python 3.6 ou uma versão mais recente da linguagem.)
+Se você não tem experiência com a linguagem de programação Python ou está tendo dificuldade nos exemplos apresentados neste tutorial, o autor recomenda que você leia as lições [Trabalhando com ficheiros de texto em Python](/pt/licoes/trabalhando-ficheiros-texto-python) e [Manipular Strings com Python](/pt/licoes/manipular-strings-python). Note que essas lições foram escritas em Python versão 2, enquanto esta usa Python versão 3. As diferenças de [sintaxe](https://perma.cc/E5LQ-S65P) entre as duas versões da linguagem podem ser sutis. Se você ficar em dúvida, siga os exemplos conforme descritos nesta lição e use as outras lições como material de apoio. (Este tutorial encontra-se atualizado até à versão [Python 3.8.5](https://perma.cc/XCT2-Q4AT); as [strings literais formatadas](https://perma.cc/U6Q6-59V3) na linha `with open(f'data/pg{filename}.txt', 'r', encoding='utf-8') as f:`, por exemplo, requerem Python 3.6 ou uma versão mais recente da linguagem.)
 
 ## Materiais requeridos
 
@@ -146,7 +146,7 @@ def ler_ficheiros_para_string(ids_ficheiros):
     global texto
     strings = []
     for id_ficheiro in ids_ficheiros:
-        with open(f'dados/pg{id_ficheiro}.txt',
+        with open(f'dados/pg{id_ficheiro}.txt', 'r',
                   encoding='utf-8') as f:
             texto = f.read()
             texto = re.search(r"(START.*?\*\*\*)(.*)(\*\*\* END)",
@@ -191,6 +191,7 @@ O trecho de código necessário para calcular e exibir as curvas característica
 ```python
 # Carregar nltk e matpotlib
 import nltk
+nltk.download('punkt')
 import matplotlib.pylab as plt
 
 obras_tokens = {}
@@ -209,9 +210,9 @@ for autor in autores:
     obras_tokens[autor] = ([token for token in tokens
                             if any(c.isalpha() for c in token)])
 
-    # Obter a distribuição de comprimentos de tokens
-    token_comprimentos = [len(token) for token in obras_tokens[autor]]
-    obras_distribuicao_comprimento[autor] = nltk.FreqDist(token_comprimentos)
+# Obter a distribuição de comprimentos de tokens
+token_comprimentos = [len(token) for token in obras_tokens[autor]]
+obras_distribuicao_comprimento[autor] = nltk.FreqDist(token_comprimentos)
 
     # Plotar a curva característica de composição
     lista_chaves = []