Merged
13 changes: 7 additions & 6 deletions en/lessons/introduction-to-stylometry-with-python.md
@@ -53,7 +53,7 @@ Please note that the code in this lesson has been designed to run sequentially.

## Prior Reading

-If you do not have experience with the Python programming language or are finding examples in this tutorial difficult, the author recommends you read the lessons on [Working with Text Files in Python](/lessons/working-with-text-files) and [Manipulating Strings in Python](/lessons/manipulating-strings-in-python).
+If you do not have experience with the Python programming language or are finding examples in this tutorial difficult, the author recommends you read the lessons on [Working with Text Files in Python](/lessons/working-with-text-files) and [Manipulating Strings in Python](/lessons/manipulating-strings-in-python). Please note that those lessons were written in Python version 2, whereas this one uses Python version 3. The differences in [syntax](https://en.wikipedia.org/wiki/Syntax) between the two versions of the language can be subtle. If you are confused at any time, follow the examples as written in this lesson and use the other lessons as background material. (More precisely, the code in this tutorial was written using [Python 3.6.4](https://www.python.org/downloads/release/python-364/); the [f-string construct](https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals) in the line `with open(f'data/federalist_{filename}.txt', 'r') as f:`, for example, requires Python 3.6 or a more recent version of the language.)
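
As a rough illustration of the f-string remark above, the sketch below builds the same file path twice: once with an f-string (Python 3.6 or later) and once with `str.format`, which also works on older versions. The value assigned to `filename` is a hypothetical paper number chosen for this example only.

```python
# Hypothetical paper number, used only for illustration
filename = 10

# f-string: the expression inside the braces is evaluated in place (Python 3.6+)
path_fstring = f'data/federalist_{filename}.txt'

# str.format: produces the same string on older Python versions
path_format = 'data/federalist_{}.txt'.format(filename)

print(path_fstring)
print(path_fstring == path_format)
```

Both spellings produce the same path, so the f-string is a readability choice rather than a change in behaviour.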

## Required materials

@@ -153,7 +153,7 @@ Next, as we are interested in each author's vocabulary, we will define a short P
 def read_files_into_string(filenames):
     strings = []
     for filename in filenames:
-        with open(f'data/federalist_{filename}.txt') as f:
+        with open(f'data/federalist_{filename}.txt', 'r') as f:
             strings.append(f.read())
     return '\n'.join(strings)
```
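
The sketch below exercises a variant of this helper on two throwaway files so it can be run without the lesson's `data/` folder. The `data_dir` parameter and the sample texts are assumptions added for the example; they are not part of the lesson. Since `'r'` is already the default mode of `open()`, spelling it out changes nothing in behaviour and only makes the intent explicit.

```python
import os
import tempfile


def read_files_into_string(filenames, data_dir):
    """Concatenate several text files; data_dir is a hypothetical extra parameter."""
    strings = []
    for filename in filenames:
        path = os.path.join(data_dir, f'federalist_{filename}.txt')
        # 'r' (read, text mode) is the default; it is written out for clarity
        with open(path, 'r') as f:
            strings.append(f.read())
    return '\n'.join(strings)


with tempfile.TemporaryDirectory() as tmp:
    # Create two small stand-in files instead of the real Federalist papers
    for number, text in [(1, 'first paper'), (2, 'second paper')]:
        with open(os.path.join(tmp, f'federalist_{number}.txt'), 'w') as f:
            f.write(text)
    combined = read_files_into_string([1, 2], tmp)

print(combined)
```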
@@ -191,6 +191,7 @@ The code required to calculate characteristic curves for the *Federalist*'s auth
```python
 # Load nltk
 import nltk
+nltk.download('punkt')
 %matplotlib inline

# Compare the disputed papers to those written by everyone,
@@ -207,10 +208,10 @@ for author in authors:
     federalist_by_author_tokens[author] = ([token for token in tokens
                                             if any(c.isalpha() for c in token)])

-# Get a distribution of token lengths
-token_lengths = [len(token) for token in federalist_by_author_tokens[author]]
-federalist_by_author_length_distributions[author] = nltk.FreqDist(token_lengths)
-federalist_by_author_length_distributions[author].plot(15,title=author)
+    # Get a distribution of token lengths
+    token_lengths = [len(token) for token in federalist_by_author_tokens[author]]
+    federalist_by_author_length_distributions[author] = nltk.FreqDist(token_lengths)
+    federalist_by_author_length_distributions[author].plot(15,title=author)
```

The '%matplotlib inline' declaration below 'import nltk' is required if your development environment is a [Jupyter Notebook](http://jupyter.org/), as it was for me while writing this tutorial; otherwise you may not see the graphs on your screen. If you work in [Jupyter Lab](http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html), please replace this clause with '%matplotlib ipympl'.
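
For readers who want to check the token-length step without installing nltk or opening a notebook, here is a minimal standard-library sketch. It is a stand-in, not the lesson's method: `str.split` replaces `nltk.word_tokenize`, `collections.Counter` replaces `nltk.FreqDist` (which is itself a `Counter` subclass), and the sample sentence is made up for the example.

```python
import string
from collections import Counter

# Made-up sample text standing in for an author's papers
sample = "The powers delegated to the federal government are few and defined."

# Naive whitespace tokenization (an assumption; the lesson uses nltk.word_tokenize)
tokens = sample.split()

# Keep tokens that contain a letter, then strip surrounding punctuation
words = [token.strip(string.punctuation) for token in tokens
         if any(c.isalpha() for c in token)]

# Count how many tokens there are of each length
token_lengths = [len(word) for word in words]
length_distribution = Counter(token_lengths)

for length, count in sorted(length_distribution.items()):
    print(length, count)
```

Plotting these counts against token length gives the same kind of characteristic curve that `FreqDist.plot` draws in the lesson.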
13 changes: 7 additions & 6 deletions fr/lecons/introduction-a-la-stylometrie-avec-python.md
@@ -58,7 +58,7 @@ Veuillez noter que le code informatique de cette leçon a été conçu pour êtr

## Prior Reading

-If you have no Python programming experience or find the examples in this tutorial difficult, the author recommends reading the lessons [Working with Text Files in Python](/fr/lecons/travailler-avec-des-fichiers-texte) and [Manipulating Strings in Python](/fr/lecons/manipuler-chaines-caracteres-python).
+If you have no Python programming experience or find the examples in this tutorial difficult, the author recommends reading the lessons [Working with Text Files in Python](/fr/lecons/travailler-avec-des-fichiers-texte) and [Manipulating Strings in Python](/fr/lecons/manipuler-chaines-caracteres-python). Note also that those lessons were originally written in Python 2, whereas this tutorial uses Python 3. The differences in [syntax](https://fr.wikipedia.org/wiki/Syntaxe) between the two versions of the language can be subtle. In case of conflict, follow the examples as coded in this tutorial and use the other resources for reference only. (More precisely, the code in this tutorial was written in [Python 3.6.4](https://www.python.org/downloads/release/python-364/); the [f-string](https://docs.python.org/3/whatsnew/3.6.html#pep-498-formatted-string-literals) that appears in the line `with open(f'data/federalist_{nom_fichier}.txt', 'r') as f:`, for example, requires Python 3.6 or a more recent version of the language.)

## Required materials

@@ -159,7 +159,7 @@ Ensuite, puisque nous nous intéressons au vocabulaire employé par chaque auteu
 def lire_fichiers_en_chaine(noms_fichiers):
     chaines = []
     for nom_fichier in noms_fichiers:
-        with open(f'data/federalist_{nom_fichier}.txt') as f:
+        with open(f'data/federalist_{nom_fichier}.txt', 'r') as f:
             chaines.append(f.read())
     return '\n'.join(chaines)
```
@@ -198,6 +198,7 @@ Le code requis pour calculer les courbes caractéristiques des auteurs du _Féd
```python
 # Load nltk
 import nltk
+nltk.download('punkt')
 %matplotlib inline

 # Compare the disputed papers to those written by each
@@ -215,10 +216,10 @@ for auteur in auteurs:
if any(c.isalpha() for c in occ)])


-# Get and plot the frequency distribution of lengths
-occs_longueurs = [len(occ) for occ in federalist_par_auteur_occs[auteur]]
-federalist_par_auteur_dist_longueurs[auteur] = nltk.FreqDist(occs_longueurs)
-federalist_par_auteur_dist_longueurs[auteur].plot(15,title=auteur)
+    # Get and plot the frequency distribution of lengths
+    occs_longueurs = [len(occ) for occ in federalist_par_auteur_occs[auteur]]
+    federalist_par_auteur_dist_longueurs[auteur] = nltk.FreqDist(occs_longueurs)
+    federalist_par_auteur_dist_longueurs[auteur].plot(15,title=auteur)
```

The `%matplotlib inline` clause below the `import nltk` line is needed if you work in a [Jupyter Notebook](https://jupyter.org/) development environment, as I did when writing this tutorial; without it, the graphs might not appear on screen. If you work in [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) instead, please replace this clause with `%matplotlib ipympl`.
11 changes: 6 additions & 5 deletions pt/licoes/introducao-estilometria-python.md
@@ -56,7 +56,7 @@ No final desta lição, teremos percorrido os seguintes tópicos:

## Prior Reading

-If you have no experience with the Python programming language or are having difficulty with the examples in this tutorial, the author recommends that you read the lessons [Working with Text Files in Python](/pt/licoes/trabalhando-ficheiros-texto-python) and [Manipulating Strings in Python](/pt/licoes/manipular-strings-python). Note that those lessons were written in Python version 2, whereas this one uses Python version 3. The differences in [syntax](https://perma.cc/E5LQ-S65P) between the two versions of the language can be subtle. If in doubt, follow the examples as described in this lesson and use the other lessons as supporting material. (This tutorial is current as of [Python 3.8.5](https://perma.cc/XCT2-Q4AT); the [formatted string literals](https://perma.cc/U6Q6-59V3) in the line `with open(f'data/pg{filename}.txt', encoding='utf-8') as f:`, for example, require Python 3.6 or a more recent version of the language.)
+If you have no experience with the Python programming language or are having difficulty with the examples in this tutorial, the author recommends that you read the lessons [Working with Text Files in Python](/pt/licoes/trabalhando-ficheiros-texto-python) and [Manipulating Strings in Python](/pt/licoes/manipular-strings-python). Note that those lessons were written in Python version 2, whereas this one uses Python version 3. The differences in [syntax](https://perma.cc/E5LQ-S65P) between the two versions of the language can be subtle. If in doubt, follow the examples as described in this lesson and use the other lessons as supporting material. (This tutorial is current as of [Python 3.8.5](https://perma.cc/XCT2-Q4AT); the [formatted string literals](https://perma.cc/U6Q6-59V3) in the line `with open(f'data/pg{filename}.txt', 'r', encoding='utf-8') as f:`, for example, require Python 3.6 or a more recent version of the language.)

## Required materials

@@ -146,7 +146,7 @@ def ler_ficheiros_para_string(ids_ficheiros):
     global texto
     strings = []
     for id_ficheiro in ids_ficheiros:
-        with open(f'dados/pg{id_ficheiro}.txt',
+        with open(f'dados/pg{id_ficheiro}.txt', 'r',
                   encoding='utf-8') as f:
             texto = f.read()
             texto = re.search(r"(START.*?\*\*\*)(.*)(\*\*\* END)",
@@ -191,6 +191,7 @@ O trecho de código necessário para calcular e exibir as curvas característica
```python
 # Load nltk and matplotlib
 import nltk
+nltk.download('punkt')
 import matplotlib.pylab as plt

obras_tokens = {}
@@ -209,9 +210,9 @@ for autor in autores:
     obras_tokens[autor] = ([token for token in tokens
                             if any(c.isalpha() for c in token)])

-# Get the distribution of token lengths
-token_comprimentos = [len(token) for token in obras_tokens[autor]]
-obras_distribuicao_comprimento[autor] = nltk.FreqDist(token_comprimentos)
+    # Get the distribution of token lengths
+    token_comprimentos = [len(token) for token in obras_tokens[autor]]
+    obras_distribuicao_comprimento[autor] = nltk.FreqDist(token_comprimentos)

# Plot the characteristic curve of composition
lista_chaves = []