# Naive Bayes
What it is: **Classification algorithm**

In a nutshell: **computes class probabilities assuming the features are independent of each other**

Reference: https://www.alura.com.br/artigos/machine-learning

Naive Bayes is a family of classification algorithms based on Bayes' theorem that assume the features are independent of each other given the class.

It is called "naive" because this assumption rarely holds in practice; even so, the algorithm is effective for many practical problems.

Bayes' Theorem
Bayes' theorem is the foundation of the Naive Bayes algorithm. It gives the probability of a class given the observed features:

$$P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}$$


Where:
* $P(C|X)$ is the posterior probability of class $C$ given the observed data $X$
* $P(X|C)$ is the probability of observing $X$ given that the class is $C$
* $P(C)$ is the prior probability of class $C$
* $P(X)$ is the marginal probability of the data $X$

**Types of Naive Bayes**
1. **Gaussian Naive Bayes:** Used when the features are continuous and assumed to follow a normal distribution.
2. **Multinomial Naive Bayes:** Used mainly for text classification, where the features are word counts.
3. **Bernoulli Naive Bayes:** Used when the features are binary (0 or 1).
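As a rough illustration of what the Gaussian variant does internally, the sketch below implements it with NumPy only. The data is synthetic (two classes drawn from normal distributions, an assumption made for this example), and the `predict` helper is hypothetical, not scikit-learn's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (an assumption for illustration): two features per
# sample, each class drawn from its own normal distribution.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

classes = np.unique(y)
# Per-class prior, plus per-feature mean and variance within each class
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])

def predict(samples):
    """Return the class maximizing log P(C) + sum_i log P(x_i | C)."""
    samples = np.atleast_2d(samples)
    scores = []
    for prior, mean, var in zip(priors, means, variances):
        # Log of the normal density, evaluated feature by feature
        log_likelihood = -0.5 * (np.log(2 * np.pi * var) + (samples - mean) ** 2 / var)
        # The "naive" step: per-feature log-likelihoods are simply added
        scores.append(np.log(prior) + log_likelihood.sum(axis=1))
    return classes[np.argmax(np.array(scores), axis=0)]

print(predict([[0.2, -0.1], [2.8, 3.1]]))
print("training accuracy:", np.mean(predict(X) == y))
```

The denominator $P(X)$ is omitted because it is the same for every class and does not change which class scores highest.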

Import the Required Libraries

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.metrics import accuracy_score

Load a Dataset

In [3]:
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target

Split the Data into Training and Test Sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Train and Evaluate Gaussian Naive Bayes

In [5]:
# Instantiate the classifier
gnb = GaussianNB()
# Train the model
gnb.fit(X_train, y_train)
# Make predictions
y_pred = gnb.predict(X_test)
# Evaluate the model
print(f'Gaussian Naive Bayes Accuracy: {accuracy_score(y_test, y_pred):.2f}')


Gaussian Naive Bayes Accuracy: 0.98
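To connect the fitted model back to Bayes' theorem, the sketch below refits the same classifier on the same split and inspects what it learned. It relies on the `class_prior_` and `theta_` attributes and the `predict_proba` method of scikit-learn's `GaussianNB`; no output is shown because it was not part of the original run.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
gnb = GaussianNB().fit(X_train, y_train)

# class_prior_: fraction of training samples in each class, i.e. P(C)
print("priors:", gnb.class_prior_.round(3))
# theta_: mean of each feature within each class (one row per class)
print("per-class feature means:\n", gnb.theta_.round(2))
# predict_proba: posterior P(C|X) for each class; each row sums to 1
print("posteriors for the first test sample:", gnb.predict_proba(X_test[:1]).round(3))
```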


Train and Evaluate Multinomial Naive Bayes

In [6]:
# Instantiate the classifier
mnb = MultinomialNB()
# MultinomialNB requires non-negative features; np.abs is only a safeguard here,
# since the iris measurements are already non-negative
X_train_mnb = np.abs(X_train)
X_test_mnb = np.abs(X_test)
# Train the model
mnb.fit(X_train_mnb, y_train)
# Make predictions
y_pred_mnb = mnb.predict(X_test_mnb)
# Evaluate the model
print(f'Multinomial Naive Bayes Accuracy: {accuracy_score(y_test, y_pred_mnb):.2f}')


Multinomial Naive Bayes Accuracy: 0.96
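Since the multinomial variant is mainly used with word counts, a tiny text example may be a more natural fit than iris. The corpus and labels below are made up for illustration; `CountVectorizer` turns each sentence into a vector of word counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus, purely for illustration
texts = ["win money now", "free prize money", "meeting at noon", "lunch at noon tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

# Learn the vocabulary and convert each text into a row of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(texts)

clf = MultinomialNB().fit(X_counts, labels)

# New text must be transformed with the SAME fitted vectorizer
new_counts = vectorizer.transform(["free money"])
prediction = clf.predict(new_counts)
print(prediction)
```

Both words in the new sentence appear only in the spam examples, so the classifier labels it as spam.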


Train and Evaluate Bernoulli Naive Bayes

In [7]:
# Binarize the data: 1 if the value is above the feature mean, 0 otherwise
# (each split is thresholded with its own means here; reusing the training means
# for the test split would keep the two transformations consistent)
X_train_bnb = np.where(X_train > np.mean(X_train, axis=0), 1, 0)
X_test_bnb = np.where(X_test > np.mean(X_test, axis=0), 1, 0)
# Instantiate the classifier
bnb = BernoulliNB()
# Train the model
bnb.fit(X_train_bnb, y_train)
# Make predictions
y_pred_bnb = bnb.predict(X_test_bnb)
# Evaluate the model
print(f'Bernoulli Naive Bayes Accuracy: {accuracy_score(y_test, y_pred_bnb):.2f}')

Bernoulli Naive Bayes Accuracy: 0.78
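One possible variation, sketched below, learns the binarization thresholds from the training split only and applies them unchanged to the test split, so both sets are transformed the same way. The accuracy it prints may differ from the value above and is not shown because it was not part of the original run.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Thresholds are learned from the training split only...
thresholds = X_train.mean(axis=0)
# ...and then applied unchanged to both splits
X_train_bin = (X_train > thresholds).astype(int)
X_test_bin = (X_test > thresholds).astype(int)

bnb = BernoulliNB().fit(X_train_bin, y_train)
acc = accuracy_score(y_test, bnb.predict(X_test_bin))
print(f"Bernoulli Naive Bayes accuracy (training-mean thresholds): {acc:.2f}")
```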


❇️ Example:

In [8]:
#import numpy as np
#import pandas as pd
#from sklearn.model_selection import train_test_split
#from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
#from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix
#from yellowbrick.classifier import ConfusionMatrix

⚠️ `LabelEncoder` converts categories to integers using a simple mapping.

Each unique category is assigned an increasing integer, starting at zero, in alphabetical order.

For example, given a list of fruits such as `["maçã", "banana", "cereja"]`, the LabelEncoder returns `[2, 0, 1]`, because the sorted categories are `banana` (0), `cereja` (1), and `maçã` (2).

In [6]:
from sklearn.preprocessing import LabelEncoder

# List of categories
frutas = ["maçã", "uva", "cereja", "uva", "maçã", "cereja", "banana", "amora"]

# Instantiate the LabelEncoder
le = LabelEncoder()

# Fit and transform the categories
frutas_transformadas = le.fit_transform(frutas)

print("frutas:", frutas)
print("frutas transformadas:", frutas_transformadas)
print("classes:", le.classes_)


frutas: ['maçã', 'uva', 'cereja', 'uva', 'maçã', 'cereja', 'banana', 'amora']
frutas transformadas: [3 4 2 4 3 2 1 0]
classes: ['amora' 'banana' 'cereja' 'maçã' 'uva']


Goal: identify good and bad payers

In [9]:
credito = pd.read_csv('Credit.csv') 
credito

Unnamed: 0,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,personal_status,other_parties,...,property_magnitude,age,other_payment_plans,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker,class
0,<0,6,'critical/other existing credit',radio/tv,1169,'no known savings',>=7,4,'male single',none,...,'real estate',67,none,own,2,skilled,1,yes,yes,good
1,0<=X<200,48,'existing paid',radio/tv,5951,<100,1<=X<4,2,'female div/dep/mar',none,...,'real estate',22,none,own,1,skilled,1,none,yes,bad
2,'no checking',12,'critical/other existing credit',education,2096,<100,4<=X<7,2,'male single',none,...,'real estate',49,none,own,1,'unskilled resident',2,none,yes,good
3,<0,42,'existing paid',furniture/equipment,7882,<100,4<=X<7,2,'male single',guarantor,...,'life insurance',45,none,'for free',1,skilled,2,none,yes,good
4,<0,24,'delayed previously','new car',4870,<100,1<=X<4,3,'male single',none,...,'no known property',53,none,'for free',2,skilled,2,none,yes,bad
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,'no checking',12,'existing paid',furniture/equipment,1736,<100,4<=X<7,3,'female div/dep/mar',none,...,'real estate',31,none,own,1,'unskilled resident',1,none,yes,good
996,<0,30,'existing paid','used car',3857,<100,1<=X<4,4,'male div/sep',none,...,'life insurance',40,none,own,1,'high qualif/self emp/mgmt',1,yes,yes,good
997,'no checking',12,'existing paid',radio/tv,804,<100,>=7,4,'male single',none,...,car,38,none,own,1,skilled,1,none,yes,good
998,<0,45,'existing paid',radio/tv,1845,<100,1<=X<4,4,'male single',none,...,'no known property',23,none,'for free',1,skilled,1,yes,yes,bad


Separate the predictors from the class

In [10]:
previsores = credito.iloc[ : , 0:20]
previsores

Unnamed: 0,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,personal_status,other_parties,residence_since,property_magnitude,age,other_payment_plans,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker
0,<0,6,'critical/other existing credit',radio/tv,1169,'no known savings',>=7,4,'male single',none,4,'real estate',67,none,own,2,skilled,1,yes,yes
1,0<=X<200,48,'existing paid',radio/tv,5951,<100,1<=X<4,2,'female div/dep/mar',none,2,'real estate',22,none,own,1,skilled,1,none,yes
2,'no checking',12,'critical/other existing credit',education,2096,<100,4<=X<7,2,'male single',none,3,'real estate',49,none,own,1,'unskilled resident',2,none,yes
3,<0,42,'existing paid',furniture/equipment,7882,<100,4<=X<7,2,'male single',guarantor,4,'life insurance',45,none,'for free',1,skilled,2,none,yes
4,<0,24,'delayed previously','new car',4870,<100,1<=X<4,3,'male single',none,4,'no known property',53,none,'for free',2,skilled,2,none,yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,'no checking',12,'existing paid',furniture/equipment,1736,<100,4<=X<7,3,'female div/dep/mar',none,4,'real estate',31,none,own,1,'unskilled resident',1,none,yes
996,<0,30,'existing paid','used car',3857,<100,1<=X<4,4,'male div/sep',none,4,'life insurance',40,none,own,1,'high qualif/self emp/mgmt',1,yes,yes
997,'no checking',12,'existing paid',radio/tv,804,<100,>=7,4,'male single',none,4,car,38,none,own,1,skilled,1,none,yes
998,<0,45,'existing paid',radio/tv,1845,<100,1<=X<4,4,'male single',none,4,'no known property',23,none,'for free',1,skilled,1,yes,yes


In [11]:
previsores = credito.iloc[ : , 0:20].values
previsores

array([['<0', 6, "'critical/other existing credit'", ..., 1, 'yes',
        'yes'],
       ['0<=X<200', 48, "'existing paid'", ..., 1, 'none', 'yes'],
       ["'no checking'", 12, "'critical/other existing credit'", ..., 2,
        'none', 'yes'],
       ...,
       ["'no checking'", 12, "'existing paid'", ..., 1, 'none', 'yes'],
       ['<0', 45, "'existing paid'", ..., 1, 'yes', 'yes'],
       ['0<=X<200', 45, "'critical/other existing credit'", ..., 1,
        'none', 'yes']], dtype=object)

In [12]:
classe = credito.iloc[:, 20]
classe

0      good
1       bad
2      good
3      good
4       bad
       ... 
995    good
996    good
997    good
998     bad
999    good
Name: class, Length: 1000, dtype: object

In [13]:
classe = credito.iloc[:, 20].values
classe

array(['good', 'bad', 'good', 'good', 'bad', 'good', 'good', 'good',
       'good', 'bad', 'bad', 'bad', 'good', 'bad', 'good', 'bad', 'good',
       'good', 'bad', 'good', 'good', 'good', 'good', 'good', 'good',
       'good', 'good', 'good', 'good', 'bad', 'good', 'good', 'good',
       'good', 'good', 'bad', 'good', 'bad', 'good', 'good', 'good',
       'good', 'good', 'good', 'bad', 'good', 'good', 'good', 'good',
       'good', 'good', 'good', 'good', 'good', 'bad', 'good', 'bad',
       'good', 'good', 'bad', 'good', 'good', 'bad', 'bad', 'good',
       'good', 'good', 'good', 'bad', 'good', 'good', 'good', 'good',
       'good', 'bad', 'good', 'bad', 'good', 'good', 'good', 'bad',
       'good', 'good', 'good', 'good', 'good', 'good', 'bad', 'good',
       'bad', 'good', 'good', 'bad', 'good', 'good', 'bad', 'good',
       'good', 'good', 'good', 'good', 'good', 'good', 'good', 'good',
       'bad', 'bad', 'good', 'good', 'good', 'good', 'good', 'good',
       'bad', 'good', 'go

Convert only the categorical attributes to numeric ones, addressing each categorical column by its index

In [14]:
labelencoder1 = LabelEncoder()
previsores[:, 0] = labelencoder1.fit_transform(previsores[:, 0])

labelencoder2 = LabelEncoder()
previsores[:, 2] = labelencoder2.fit_transform(previsores[:, 2])

labelencoder3 = LabelEncoder()
previsores[:, 3] = labelencoder3.fit_transform(previsores[:, 3])

labelencoder4 = LabelEncoder()
previsores[:, 5] = labelencoder4.fit_transform(previsores[:, 5])

labelencoder5 = LabelEncoder()
previsores[:, 6] = labelencoder5.fit_transform(previsores[:, 6])

labelencoder6 = LabelEncoder()
previsores[:, 8] = labelencoder6.fit_transform(previsores[:, 8])

labelencoder7 = LabelEncoder()
previsores[:, 9] = labelencoder7.fit_transform(previsores[:, 9])

labelencoder8 = LabelEncoder()
previsores[:, 11] = labelencoder8.fit_transform(previsores[:, 11])

labelencoder9 = LabelEncoder()
previsores[:, 13] = labelencoder9.fit_transform(previsores[:, 13])

labelencoder10 = LabelEncoder()
previsores[:, 14] = labelencoder10.fit_transform(previsores[:, 14])

labelencoder11 = LabelEncoder()
previsores[:, 16] = labelencoder11.fit_transform(previsores[:, 16])

labelencoder12 = LabelEncoder()
previsores[:, 18] = labelencoder12.fit_transform(previsores[:, 18])

labelencoder13 = LabelEncoder()
previsores[:, 19] = labelencoder13.fit_transform(previsores[:, 19])


In [15]:
# Simplified code (kept inside a string, so it is not executed):

'''# Indices of the columns that need to be encoded
colunas_para_codificar = [0, 2, 3, 5, 6, 8, 9, 11, 13, 14, 16, 18, 19]

# Initialize the LabelEncoder
# (note: reusing a single encoder overwrites its fitted classes on every
# iteration, so the per-column mappings are not kept)
labelencoder = LabelEncoder()

# Apply the LabelEncoder to each listed column
for coluna in colunas_para_codificar:
    previsores[:, coluna] = labelencoder.fit_transform(previsores[:, coluna])'''

In [16]:
previsores

array([[2, 6, 1, ..., 1, 1, 1],
       [1, 48, 3, ..., 1, 0, 1],
       [0, 12, 1, ..., 2, 0, 1],
       ...,
       [0, 12, 3, ..., 1, 0, 1],
       [2, 45, 3, ..., 1, 1, 1],
       [1, 45, 1, ..., 1, 0, 1]], dtype=object)

Split the dataset into training (70%) and test (30%) sets

In [17]:
X_treinamento, X_teste, y_treinamento, y_teste = train_test_split(previsores, classe, test_size = 0.3, random_state = 0)
# 70% = X_treinamento and y_treinamento
# 30% = X_teste and y_teste
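The sketch below checks the resulting split sizes on stand-in data with the same number of rows. The arrays `X_demo` and `y_demo` are hypothetical, and the `stratify` argument is an optional addition (not used in the cell above) that keeps the class proportions equal in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows, 70% labeled "good" and 30% "bad"
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.array(["good"] * 700 + ["bad"] * 300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print(X_tr.shape, X_te.shape)  # (700, 1) (300, 1)
# With stratify, the share of "good" is the same in both splits
print(np.mean(y_tr == "good"), np.mean(y_te == "good"))
```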


In [18]:
X_treinamento

array([[1, 24, 1, ..., 2, 1, 1],
       [0, 36, 3, ..., 1, 1, 1],
       [2, 15, 1, ..., 2, 1, 1],
       ...,
       [0, 9, 3, ..., 1, 0, 1],
       [1, 18, 1, ..., 1, 0, 1],
       [1, 36, 2, ..., 2, 1, 1]], dtype=object)

In [19]:
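# Note: X_test is the iris test split created earlier, which is why the output
# below has four numeric columns; the credit test split is named X_teste.
X_test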

array([[6.1, 2.8, 4.7, 1.2],
       [5.7, 3.8, 1.7, 0.3],
       [7.7, 2.6, 6.9, 2.3],
       [6. , 2.9, 4.5, 1.5],
       [6.8, 2.8, 4.8, 1.4],
       [5.4, 3.4, 1.5, 0.4],
       [5.6, 2.9, 3.6, 1.3],
       [6.9, 3.1, 5.1, 2.3],
       [6.2, 2.2, 4.5, 1.5],
       [5.8, 2.7, 3.9, 1.2],
       [6.5, 3.2, 5.1, 2. ],
       [4.8, 3. , 1.4, 0.1],
       [5.5, 3.5, 1.3, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.1, 3.8, 1.5, 0.3],
       [6.3, 3.3, 4.7, 1.6],
       [6.5, 3. , 5.8, 2.2],
       [5.6, 2.5, 3.9, 1.1],
       [5.7, 2.8, 4.5, 1.3],
       [6.4, 2.8, 5.6, 2.2],
       [4.7, 3.2, 1.6, 0.2],
       [6.1, 3. , 4.9, 1.8],
       [5. , 3.4, 1.6, 0.4],
       [6.4, 2.8, 5.6, 2.1],
       [7.9, 3.8, 6.4, 2. ],
       [6.7, 3. , 5.2, 2.3],
       [6.7, 2.5, 5.8, 1.8],
       [6.8, 3.2, 5.9, 2.3],
       [4.8, 3. , 1.4, 0.3],
       [4.8, 3.1, 1.6, 0.2],
       [4.6, 3.6, 1. , 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [6.7, 3.1, 4.4, 1.4],
       [4.8, 3.4, 1.6, 0.2],
       [4.4, 3

In [20]:
print(y_treinamento)
y_treinamento.shape

['bad' 'bad' 'good' 'good' 'good' 'good' 'bad' 'bad' 'bad' 'good' 'bad'
 'bad' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good'
 'good' 'good' 'bad' 'good' 'bad' 'good' 'bad' 'good' 'good' 'good' 'good'
 'good' 'good' 'good' 'good' 'bad' 'good' 'bad' 'good' 'bad' 'good' 'bad'
 'good' 'good' 'bad' 'bad' 'good' 'bad' 'bad' 'good' 'good' 'good' 'good'
 'good' 'good' 'good' 'bad' 'bad' 'good' 'good' 'good' 'bad' 'bad' 'good'
 'good' 'bad' 'good' 'bad' 'bad' 'good' 'bad' 'good' 'good' 'good' 'bad'
 'bad' 'good' 'good' 'good' 'bad' 'good' 'good' 'good' 'bad' 'bad' 'bad'
 'good' 'good' 'bad' 'good' 'good' 'good' 'good' 'bad' 'good' 'good' 'bad'
 'good' 'bad' 'good' 'good' 'bad' 'good' 'good' 'bad' 'bad' 'good' 'bad'
 'bad' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good'
 'bad' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'bad' 'good' 'bad'
 'good' 'good' 'good' 'good' 'good' 'good' 'bad' 'good' 'bad' 'bad' 'good'
 'good' 'good' 'good' 'good' 'good' 'good' 'goo

(700,)

In [21]:
print(y_teste)
y_teste.shape

['good' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'bad'
 'bad' 'bad' 'good' 'good' 'bad' 'good' 'good' 'bad' 'good' 'good' 'good'
 'good' 'good' 'good' 'good' 'bad' 'good' 'good' 'good' 'bad' 'good'
 'good' 'bad' 'bad' 'good' 'bad' 'good' 'good' 'bad' 'good' 'bad' 'good'
 'good' 'good' 'good' 'bad' 'bad' 'good' 'good' 'good' 'bad' 'bad' 'good'
 'good' 'bad' 'good' 'good' 'good' 'good' 'good' 'bad' 'good' 'good'
 'good' 'good' 'good' 'good' 'bad' 'bad' 'good' 'good' 'good' 'good'
 'good' 'good' 'bad' 'good' 'bad' 'bad' 'good' 'good' 'good' 'good' 'good'
 'good' 'good' 'bad' 'good' 'good' 'good' 'good' 'good' 'good' 'good'
 'good' 'bad' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good'
 'good' 'good' 'good' 'bad' 'bad' 'good' 'good' 'bad' 'bad' 'good' 'good'
 'good' 'bad' 'good' 'good' 'good' 'bad' 'good' 'good' 'bad' 'good' 'good'
 'good' 'good' 'good' 'good' 'bad' 'good' 'bad' 'good' 'good' 'bad' 'good'
 'good' 'good' 'good' 'good' 'good' 'good' 'bad' 'bad' 'bad' 'g

(300,)

Always train the model on the 70% training split (X and y)

In [22]:
naive_bayes = GaussianNB()
naive_bayes.fit(X_treinamento, y_treinamento)

Use the trained model to predict the 30% held out in the test set

In [23]:
previsoes = naive_bayes.predict(X_teste)
print(previsoes)
previsoes.shape

['bad' 'good' 'good' 'good' 'bad' 'good' 'good' 'good' 'good' 'bad' 'bad'
 'bad' 'good' 'bad' 'good' 'good' 'good' 'good' 'bad' 'good' 'bad' 'good'
 'bad' 'good' 'good' 'bad' 'good' 'good' 'good' 'bad' 'good' 'good' 'good'
 'good' 'good' 'bad' 'good' 'good' 'good' 'good' 'good' 'bad' 'good'
 'good' 'good' 'bad' 'bad' 'bad' 'bad' 'bad' 'good' 'bad' 'good' 'good'
 'good' 'good' 'bad' 'good' 'good' 'good' 'bad' 'good' 'good' 'good'
 'good' 'good' 'good' 'good' 'bad' 'good' 'good' 'good' 'good' 'good'
 'bad' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'bad'
 'good' 'good' 'bad' 'bad' 'good' 'bad' 'good' 'good' 'good' 'good' 'good'
 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'good' 'bad' 'good'
 'good' 'bad' 'bad' 'good' 'good' 'good' 'bad' 'good' 'good' 'bad' 'good'
 'good' 'good' 'good' 'bad' 'bad' 'good' 'good' 'bad' 'good' 'good' 'good'
 'good' 'good' 'good' 'good' 'good' 'bad' 'good' 'good' 'good' 'good'
 'good' 'good' 'bad' 'bad' 'bad' 'good' 'bad' 'good' 'good' 'goo

(300,)

Confusion matrix: compares the predictions on the test set with what actually happened<BR>
✅|❌<BR>
❌|✅

In [24]:
confusao = confusion_matrix(y_teste, previsoes)
print("Confusion matrix:\n", confusao)
print("\n✅", confusao[0,0],"+ ✅",confusao[1,1], "=", confusao[0,0] + confusao[1,1])
print("❌", confusao[1,0], "+ ❌", confusao[0,1], "=", confusao[1,0] + confusao[0,1])

Confusion matrix:
 [[ 41  45]
 [ 42 172]]

✅ 41 + ✅ 172 = 213
❌ 42 + ❌ 45 = 87
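The same matrix also yields per-class precision and recall. The sketch below recomputes them from the four counts reported above, assuming rows are true classes and columns are predictions in sorted label order (`'bad'`, `'good'`), which is the default layout of `confusion_matrix`.

```python
import numpy as np

# Counts copied from the confusion matrix above
# rows: true class, columns: predicted class, order: ('bad', 'good')
confusao = np.array([[41, 45],
                     [42, 172]])
labels = ["bad", "good"]

correct = confusao.diagonal()
recall = correct / confusao.sum(axis=1)     # correct / actual members of the class
precision = correct / confusao.sum(axis=0)  # correct / samples predicted as the class
accuracy = correct.sum() / confusao.sum()

for name, p, r in zip(labels, precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Under that layout, only 41 of the 86 actual bad payers are caught (recall of about 0.48), while 172 of the 214 good payers are recognized (about 0.80).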


Compute the accuracy

In [25]:
taxa_acerto = accuracy_score(y_teste, previsoes)
# accuracy_score returns a fraction between 0 and 1, so format it as a percentage
print(f"✅ Accuracy: {taxa_acerto:.2%}")

taxa_erro = 1 - taxa_acerto
print(f"❌ Error rate: {taxa_erro:.2%}")


✅ Accuracy: 71.00%
❌ Error rate: 29.00%
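An accuracy figure is easier to judge next to a simple baseline. The sketch below compares the model with a classifier that always predicts the most frequent class, using only the counts from the confusion matrix reported earlier; the baseline itself is an addition for context, not part of the original analysis.

```python
# Counts taken from the confusion matrix reported earlier
true_bad, true_good = 41 + 45, 42 + 172   # actual class sizes in the test set
correct = 41 + 172                        # diagonal of the matrix
total = true_bad + true_good

model_accuracy = correct / total
# Baseline: always predict the most frequent class
baseline_accuracy = max(true_bad, true_good) / total

print(f"model:    {model_accuracy:.3f}")     # 0.710
print(f"baseline: {baseline_accuracy:.3f}")  # 0.713
```

On this split the two values are nearly identical, which suggests looking at per-class results rather than accuracy alone.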


Use the model on new, real-world data

In [26]:
novo_credito = pd.read_csv("NovoCredit.csv")
novo_credito

Unnamed: 0,checking_status,duration,credit_history,purpose,credit_amount,savings_status,employment,installment_commitment,personal_status,other_parties,residence_since,property_magnitude,age,other_payment_plans,housing,existing_credits,job,num_dependents,own_telephone,foreign_worker
0,'no checking',12,'existing paid',radio/tv,804,<100,>=7,4,'male single',none,4,car,38,none,own,1,skilled,1,none,yes


Store only the values, without the column headers

In [27]:
novo_credito_previsor = novo_credito.iloc[:, 0:20].values
novo_credito_previsor

array([["'no checking'", 12, "'existing paid'", 'radio/tv', 804, '<100',
        '>=7', 4, "'male single'", 'none', 4, 'car', 38, 'none', 'own',
        1, 'skilled', 1, 'none', 'yes']], dtype=object)

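Loop over the categorical columns to convert them to numeric values

In [28]:
colunas_conversao = [0, 2, 3, 5, 6, 8, 9, 11, 13, 14, 16, 18, 19]
labelencoder = LabelEncoder()

# Caution: fit_transform refits the encoder on this single row, so every category
# is encoded as 0 regardless of the code it received in the training data.
# Calling transform on the encoders fitted during training
# (labelencoder1 ... labelencoder13) would keep the codes consistent with
# what the model learned.
for coluna in colunas_conversao:
    novo_credito_previsor[:, coluna] = labelencoder.fit_transform(novo_credito_previsor[:, coluna])

novo_credito_previsor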

array([[0, 12, 0, 0, 804, 0, 0, 4, 0, 0, 4, 0, 38, 0, 0, 1, 0, 1, 0, 0]],
      dtype=object)
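Every categorical value in the array above became 0 because the encoder was refit on a single row. A minimal sketch of one way to keep the codes consistent is shown below: fit one encoder per column on the training data, store it, and only call `transform` on new rows. The arrays `train` and `new` and the `encoders` dictionary are hypothetical stand-ins, not the credit data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical training data: column 0 is categorical, column 1 is numeric
train = np.array(
    [["<0", 6], ["0<=X<200", 48], ["'no checking'", 12], ["<0", 42]],
    dtype=object,
)
# Hypothetical new row to score
new = np.array([["<0", 24]], dtype=object)

categorical_columns = [0]

# Fit one encoder per categorical column on the training data and keep it
encoders = {}
for col in categorical_columns:
    encoders[col] = LabelEncoder()
    train[:, col] = encoders[col].fit_transform(train[:, col])

# Reuse the fitted encoders on new data: transform only, never refit
for col in categorical_columns:
    new[:, col] = encoders[col].transform(new[:, col])

print(train[:, 0])  # [2 1 0 2]
print(new[:, 0])    # [2]
```

Here `"<0"` is encoded as 2 in both arrays, whereas refitting on the single new row would have encoded it as 0.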

Predict

In [29]:
naive_bayes.predict(novo_credito_previsor)

array(['good'], dtype='<U4')