In [6]:
import tensorflow as tf

print("TensorFlow version: ", tf.__version__)

gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(gpus))

device_name = tf.test.gpu_device_name()
if device_name:
    print(f"Device name: '{device_name}'")
    print("Device properties:", tf.config.experimental.get_device_details(gpus[0]))
else:
    print("No GPU device found.")

# Check bfloat16 support. 'compute_capability' is a (major, minor) tuple, so
# compare it numerically: NVIDIA GPUs with compute capability 8.0 (Ampere) or
# higher have native bfloat16 hardware support.
bf16_supported = any(
    tf.config.experimental.get_device_details(gpu).get('compute_capability', (0, 0)) >= (8, 0)
    for gpu in gpus
)

print("Supports bfloat16." if bf16_supported else "Does not support bfloat16.")

TensorFlow version:  2.12.0
Num GPUs Available:  1
Device name: '/device:GPU:0'
Device properties: {'compute_capability': (7, 5), 'device_name': 'NVIDIA GeForce RTX 2060 SUPER'}
Does not support bfloat16.



# Ungraded Lab: Tokenizer Basics

In most NLP tasks, the first step in preparing your data is to extract a vocabulary of words from your *corpus* (i.e. the input texts). You will need to define how to represent the texts as numeric representations that can be used to train a neural network. These representations are called *tokens*, and TensorFlow and Keras make it easy to generate them using their APIs. You will see how to do this in the next cells.

## Generating the Vocabulary

In this notebook, you will first see how to build a lookup dictionary for each word. The code below takes a list of sentences, extracts each word from them, and assigns it an integer. This is done with the [fit_on_texts()](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer#fit_on_texts) method, and you can inspect the result through the `word_index` property. <u>More frequent words have a lower index</u>.

In [3]:
from tensorflow.keras.preprocessing.text import Tokenizer

# Define input sentences
sentences = [
    'i love my dog',
    'I, love my cat'
    ]

# Initialize the Tokenizer class
tokenizer = Tokenizer(num_words = 100)

# Generate indices for each word in the corpus
tokenizer.fit_on_texts(sentences)

# Get the word index and print it
word_index = tokenizer.word_index
print(word_index)

{'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}
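To make the frequency rule concrete, the sketch below rebuilds a comparable index in plain Python. It is an illustration only, not the Keras implementation: `build_word_index` is a hypothetical helper, the small corpus is made up for the example, and the cleanup is assumed to be just lowercasing plus whitespace splitting.

```python
from collections import Counter

def build_word_index(texts):
    """Hypothetical helper: map each word to a 1-based index, most frequent first."""
    counts = Counter()
    for text in texts:
        # Simplified cleanup (assumption): lowercase, then split on whitespace
        counts.update(text.lower().split())
    # most_common() orders by count, descending; ties keep first-seen order
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

demo_index = build_word_index(['the cat sat', 'the cat ran', 'the dog'])
print(demo_index)  # {'the': 1, 'cat': 2, 'sat': 3, 'ran': 4, 'dog': 5}
```

Here `the` appears three times and `cat` twice, so they receive indices 1 and 2; the remaining words each appear once and keep the order in which they were first seen.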


The `num_words` parameter passed to the initializer caps the vocabulary used when generating sequences: only the `num_words - 1` most frequent words are kept. You will see this in a later exercise. For now, the important point is that it does not affect how the `word_index` dictionary is built. You can try passing `1` instead of `100`, as the cell further below does, and `word_index` will still contain every word in the corpus.
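As a rough sketch of what that limit means once sequences are generated, the plain-Python example below keeps only words whose index is below `num_words`. It does not call the Keras API: `encode` is a hypothetical helper, `example_index` is copied from the output printed above, and the sketch assumes no out-of-vocabulary token is configured.

```python
example_index = {'i': 1, 'love': 2, 'my': 3, 'dog': 4, 'cat': 5}

def encode(text, word_index, num_words=None):
    """Hypothetical helper: turn a sentence into a list of word indices."""
    sequence = []
    for word in text.lower().split():
        index = word_index.get(word)
        if index is None:
            continue  # unknown words are skipped (no OOV token in this sketch)
        if num_words is not None and index >= num_words:
            continue  # only the num_words - 1 most frequent words survive
        sequence.append(index)
    return sequence

print(encode('i love my dog', example_index, num_words=100))  # [1, 2, 3, 4]
print(encode('i love my dog', example_index, num_words=3))    # [1, 2]
print(encode('i love my dog', example_index, num_words=1))    # []
```

The dictionary is identical in all three calls; only the encoded output shrinks as `num_words` gets smaller.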

Also notice that by default, all punctuation is ignored and words are converted to lower case. You can override these behaviors by modifying the `filters` and `lower` arguments of the `Tokenizer` class as described [here](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer#arguments). You can try modifying these in the next cell below and compare the output to the one generated above.

In [None]:
# Define input sentences
sentences = [
    'i love my dog',
    'I, love my cat',
    'You love my dog!'
]

# Initialize the Tokenizer class
tokenizer = Tokenizer(num_words = 1)

# Generate indices for each word in the corpus
tokenizer.fit_on_texts(sentences)

# Get the word index and print it
word_index = tokenizer.word_index
print(word_index)
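For the `filters` and `lower` arguments mentioned earlier, the following plain-Python sketch approximates how text is cleaned before words are indexed. `split_words` is a hypothetical helper rather than a Keras function, and `DEFAULT_FILTERS` is the default documented for `Tokenizer`, assumed to match your installed version.

```python
# Default of the `filters` argument per the Keras docs (assumed unchanged in your version)
DEFAULT_FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

def split_words(text, filters=DEFAULT_FILTERS, lower=True):
    """Hypothetical helper: approximate the cleanup applied before indexing."""
    if lower:
        text = text.lower()
    # Replace every filtered character with a space, then split on whitespace
    text = text.translate(str.maketrans({ch: ' ' for ch in filters}))
    return text.split()

print(split_words('I, love my cat'))                           # ['i', 'love', 'my', 'cat']
print(split_words('I, love my cat', filters='', lower=False))  # ['I,', 'love', 'my', 'cat']
print(split_words('You love my dog!', filters=','))            # ['you', 'love', 'my', 'dog!']
```

With filtering disabled, `I,` and `dog!` would become distinct vocabulary entries, which is usually why the defaults are left in place for simple word-level tokenization.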

That concludes this short exercise on tokenizing input texts!