# **Use AutoKeras for Binary Classification**

- [Article link for English speakers](https://inside-machinelearning.com/en/autokeras-the-revolutionary-library-for-deep-learning/)
- [Article link for French speakers](https://inside-machinelearning.com/autokeras-la-librairie-du-futur/)

## **Load the data**

We begin by importing the **basic libraries** for **Machine Learning** on Kaggle:
- numpy
- pandas
- os

In [None]:
import numpy as np
import pandas as pd

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
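The cell above walks `/kaggle/input` with `os.walk`. As a minimal sketch, the same listing can be written with `pathlib`. Here a temporary directory stands in for `/kaggle/input` so the snippet runs outside Kaggle; the helper name `list_input_files` and the file layout are made up for illustration.

```python
import tempfile
from pathlib import Path


def list_input_files(root):
    """Return every file under `root`, recursively, as sorted path strings."""
    return sorted(str(p) for p in Path(root).rglob("*") if p.is_file())


# Hypothetical stand-in for /kaggle/input so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    competition_dir = Path(tmp) / "nlp-getting-started"
    competition_dir.mkdir()
    for name in ("train.csv", "test.csv"):
        (competition_dir / name).write_text("id,text\n")

    files = list_input_files(tmp)
    for f in files:
        print(f)
    names = [Path(f).name for f in files]
```

On Kaggle itself you would call `list_input_files('/kaggle/input')` directly.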

Then we **load** the **train** and **test data**.

Here is an **unusual difference**: to use AutoKeras, we must **transform our list** of tweets into a **numpy array**. For that, we use the *to_numpy()* method.

In [None]:
train_data = pd.read_csv('/kaggle/input/nlp-getting-started/train.csv', index_col = 'id')
train_data = train_data.reset_index(drop = True)

X_train = train_data[['text']].to_numpy()
y_train = train_data[['target']].to_numpy()
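To see what `index_col='id'` followed by `reset_index(drop=True)` does, here is a minimal sketch on a tiny in-memory CSV. The rows are invented for illustration; only the column names (`id`, `keyword`, `location`, `text`, `target`) mirror the competition's `train.csv`.

```python
import io

import pandas as pd

# Invented rows; only the column layout mimics train.csv.
csv_text = """id,keyword,location,text,target
1,,,Flood warning issued downtown,1
4,,,What a lovely sunny day,0
5,,,Evacuation ordered near the river,1
"""

train_data = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(train_data.index.tolist())  # the ids became the index: [1, 4, 5]

# Discard the id index and replace it with a plain 0..n-1 RangeIndex.
train_data = train_data.reset_index(drop=True)
print(train_data.index.tolist())  # [0, 1, 2]

# Double brackets select a one-column DataFrame, so each array is 2-D.
X_train = train_data[["text"]].to_numpy()
y_train = train_data[["target"]].to_numpy()
print(X_train.shape, y_train.shape)  # (3, 1) (3, 1)
```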

In [None]:
test_data = pd.read_csv('/kaggle/input/nlp-getting-started/test.csv')

test_id = test_data[['id']]

X_test = test_data[['text']].to_numpy()

We can then **check** that our data is a **numpy array**:

In [None]:
X_train
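Beyond printing `X_train`, a few explicit checks make the array's layout visible. This sketch, on a made-up two-row DataFrame standing in for `train_data`, also shows why the notebook uses double brackets: `df[['text']]` is a one-column DataFrame, so `to_numpy()` returns a 2-D array of shape `(n, 1)`, whereas `df['text']` is a Series and yields a 1-D array of shape `(n,)`.

```python
import numpy as np
import pandas as pd

# Hypothetical two-tweet DataFrame standing in for train_data.
df = pd.DataFrame({"text": ["Storm knocked out power", "Just had lunch"],
                   "target": [1, 0]})

X_2d = df[["text"]].to_numpy()  # double brackets: DataFrame -> 2-D array
X_1d = df["text"].to_numpy()    # single brackets: Series -> 1-D array

print(type(X_2d), X_2d.shape)  # <class 'numpy.ndarray'> (2, 1)
print(type(X_1d), X_1d.shape)  # <class 'numpy.ndarray'> (2,)
```

Both arrays hold the same tweets; the notebook keeps the `(n, 1)` layout produced by `[['text']]`.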

## **AutoKeras Model**

To use **AutoKeras**, the first step is to **install the library** on our server:

In [None]:
!pip install autokeras &> /dev/null

Then we **import the library**.

In [None]:
import autokeras as ak

The **interesting part** begins! We want to **classify text**, so we use the **AutoKeras** class *TextClassifier()*.

Its **main parameter** is **max_trials**.

**max_trials** sets the maximum **number of models** that **AutoKeras** will try before keeping **the best one**.

Other **parameters** are described in the [documentation](https://autokeras.com/text_classifier/).

In [None]:
clf = ak.TextClassifier(
    overwrite=True,  # start a fresh search instead of reloading a previous project
    max_trials=3,    # try at most 3 different models
)
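To make `max_trials` more concrete, here is a toy, pure-Python sketch of the idea behind it: try at most `max_trials` candidate configurations, score each one, and keep the best. It is **not** AutoKeras's actual search algorithm (AutoKeras builds and trains real Keras models with its own tuner). The search space and the `evaluate` scoring function are hypothetical stand-ins for training a model and measuring its validation score, and the candidates are drawn by plain random search.

```python
import random

# Hypothetical search space; AutoKeras explores its own, much richer one.
SEARCH_SPACE = {
    "embedding_dim": [32, 64, 128],
    "dropout": [0.0, 0.25, 0.5],
}


def evaluate(config):
    """Hypothetical stand-in for 'train this model and return its validation score'."""
    return config["embedding_dim"] / 128 - config["dropout"] / 2


def search(max_trials, seed=0):
    rng = random.Random(seed)
    tried = []
    for _ in range(max_trials):
        # Sample one candidate configuration at random.
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        tried.append((evaluate(config), config))
    # Keep the candidate with the highest score.
    best_score, best_config = max(tried, key=lambda pair: pair[0])
    return best_score, best_config, tried


best_score, best_config, tried = search(max_trials=3)
print(f"tried {len(tried)} configs, best: {best_config} (score {best_score:.2f})")
```

A larger `max_trials` explores more candidates, at the cost of more training time.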

Next, we **train our model!**

In [None]:
clf.fit(X_train, y_train, epochs=5)

**Simple, fast, efficient...** what more could we ask for?

We then **make our prediction**.

In [None]:
predictions = clf.predict(X_test)

This **prediction** is made of **1** and **0** values in **float format**, so we convert them to **int**.

In [None]:
predictions = list(map(int, predictions))
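`list(map(int, predictions))` calls `int()` on each row of the 2-D prediction array. A vectorized alternative, sketched here on made-up values and assuming the same one-label-per-row shape, flattens the array with `ravel()` and casts it with `astype(int)`. It also avoids the deprecation warning that recent NumPy versions emit when `int()` is applied to a one-element array instead of a scalar.

```python
import numpy as np

# Made-up predictions, one label per row as in the notebook's (n, 1) output.
sample_predictions = np.array([[1.0], [0.0], [1.0]])

# Flatten (3, 1) -> (3,), then cast every label to an integer in one step.
labels = sample_predictions.ravel().astype(int)
print(labels.tolist())  # [1, 0, 1]
```

`labels` is a NumPy array rather than a list, which `pd.DataFrame` accepts just as well.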

**Finally**, we put them into a **DataFrame**.

In [None]:
output = pd.DataFrame({'id': test_id.id, 'target': predictions})

And we **save our prediction** as a submission file!

In [None]:
output.to_csv('my_submission.csv', index=False)
print("Your submission was successfully saved!")
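Before uploading, it can help to read the file back and check its format. As a minimal sketch, with made-up ids and labels standing in for `test_id` and `predictions`, and an in-memory buffer in place of `my_submission.csv`, this checks that the submission has exactly the `id` and `target` columns, one row per test tweet, and only 0/1 labels.

```python
import io

import pandas as pd

# Hypothetical stand-ins for test_id and predictions.
test_id = pd.DataFrame({"id": [0, 2, 3]})
predictions = [1, 0, 1]

output = pd.DataFrame({"id": test_id.id, "target": predictions})

buffer = io.StringIO()  # plays the role of my_submission.csv
output.to_csv(buffer, index=False)
buffer.seek(0)
submission = pd.read_csv(buffer)

# Sanity checks on the round-tripped file.
assert list(submission.columns) == ["id", "target"]
assert len(submission) == len(test_id)
assert set(submission["target"]) <= {0, 1}
print(submission)
```

On Kaggle, the same checks can be run on `pd.read_csv('my_submission.csv')`.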