# What is AutoNLP?

AutoNLP is similar to AutoML, but for text: it automates exploratory data analysis (EDA) and text processing, and helps data scientists arrive at the best model. AutoNLP is provided by the `Auto_NLP` function in the Auto_ViML framework, which is built on scikit-learn, NumPy, pandas, and matplotlib. Auto_ViML is designed to build high-performance, interpretable models with the fewest variables.
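Text processing for tweets usually starts with normalization. The sketch below shows typical cleaning steps with Python's `re` module. `clean_tweet` is a hypothetical helper for illustration only; it is not the cleaning that Auto_NLP itself performs.

```python
import re


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, '#' signs and non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)                   # drop @mentions
    text = text.replace("#", " ")                       # keep hashtag words, drop the '#'
    text = re.sub(r"[^a-z\s]", " ", text)               # keep only letters and whitespace
    return re.sub(r"\s+", " ", text).strip()            # collapse repeated spaces


print(clean_tweet("Loving the new #AutoML release! @user https://t.co/xyz :)"))
# loving the new automl release
```

Each substitution replaces matches with a space, not an empty string, so neighbouring words are not glued together. The final step collapses the extra spaces.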

AutoNLP guides a data scientist through:
* Exploratory data analysis of text
* Data cleaning
* Feature reduction
* Variable classification
* Model performance reporting, with results shown as graphs

The wider Auto_ViML framework can also:
* Handle text, date-time, struct, numeric, boolean, factor, and categorical variables
* Use the featuretools library for feature engineering

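```python
from autoviml.feature_engineering import feature_engineering

# df: a pandas DataFrame; preds: a list of predictor column names (both assumed to be defined)
print(df[preds].shape)
# ['add'] is the list of featuretools primitives to apply; 'ID' is the id column
dfmod = feature_engineering(df[preds], ['add'], 'ID')
print(dfmod.shape)  # compare with the shape printed above
```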
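featuretools builds new columns by applying "transform primitives" to existing ones. The `['add']` list above presumably requests sums of numeric columns. The plain-pandas sketch below illustrates only that idea and does not use featuretools. `add_pairwise_sums` is a hypothetical helper, and the toy frame is made up.

```python
from itertools import combinations

import pandas as pd


def add_pairwise_sums(df: pd.DataFrame, id_col: str) -> pd.DataFrame:
    """Append one 'a + b' column for every pair of numeric columns except id_col."""
    numeric = [c for c in df.select_dtypes(include="number").columns if c != id_col]
    out = df.copy()
    for a, b in combinations(numeric, 2):
        out[f"{a} + {b}"] = df[a] + df[b]
    return out


# Made-up toy frame: an id column plus three numeric predictors
toy_df = pd.DataFrame({"ID": [1, 2, 3], "x": [1.0, 2.0, 3.0], "y": [10, 20, 30], "z": [0, 1, 0]})
toy_mod = add_pairwise_sums(toy_df, "ID")
print(toy_df.shape, toy_mod.shape)  # (3, 4) (3, 7)
```

Three numeric predictors yield three pairs, so three columns are added. This quadratic growth is why feature reduction usually follows automated feature engineering.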



# Case Study

## 1. Install autoviml

```python
!pip install autoviml
```

```python
import pandas as pd
import numpy as np
```

## 2. Loading data


```python
data = pd.read_csv('../input/twitter-sentiment-analysis/train_2.csv')
data.head()
```
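Before modelling, a quick text EDA shows the class balance and how long the tweets are. The sketch below uses a made-up toy frame, `toy`, with the same `tweet` and `label` columns that the notebook uses later. On the real data, replace `toy` with `data`.

```python
import pandas as pd

# Made-up stand-in with the same column names as the Kaggle file
toy = pd.DataFrame({
    "tweet": ["i love this", "worst day ever", "so happy today", "not great", "great game tonight"],
    "label": [0, 1, 0, 1, 0],
})

print(toy["label"].value_counts(normalize=True))      # share of each class
toy["n_words"] = toy["tweet"].str.split().str.len()   # tweet length in words
print(toy.groupby("label")["n_words"].mean())         # average length per class
```

A strongly skewed class share is the situation where a metric such as balanced accuracy matters.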

## 3. Importing Auto_NLP and splitting the data


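```python
from sklearn.model_selection import train_test_split
from autoviml.Auto_NLP import Auto_NLP

# Hold out 20% of the labeled tweets for evaluation; random_state makes the split repeatable
train, test = train_test_split(data, test_size=0.2, random_state=42)
```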
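With imbalanced labels, a plain random split can leave the test set with a different class ratio than the training set. `train_test_split` accepts a `stratify` argument that preserves the ratio. The sketch below demonstrates this on a made-up, 80/20 imbalanced toy frame. Whether stratifying suits your data is a judgment call, not something the original notebook does.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up imbalanced stand-in: 80 rows of label 0, 20 rows of label 1
toy = pd.DataFrame({
    "tweet": [f"tweet {i}" for i in range(100)],
    "label": [0] * 80 + [1] * 20,
})

# stratify keeps the label ratio identical in both splits
toy_train, toy_test = train_test_split(
    toy, test_size=0.2, stratify=toy["label"], random_state=42
)
print(toy_train["label"].mean(), toy_test["label"].mean())  # 0.2 0.2
```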

## 4. Specifying the Response and Predictor variables


```python
# Text column to model (predictor) and label column to predict (response)
input_feature, target = "tweet", "label"
```

## 5. Run AutoNLP


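```python
# Returns the transformed train/test frames, the fitted pipeline (final)
# and the test-set predictions (predicted)
train_x, test_x, final, predicted = Auto_NLP(
    input_feature, train, test, target,
    score_type="balanced_accuracy",   # metric used to compare candidate models
    top_num_features=350,             # upper limit on the number of text features kept
    modeltype="Classification",
    verbose=2,
    build_model=True,                 # also fit a model, not just the text features
)
```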
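Auto_NLP does not expose its internal steps here. The sketch below is therefore an assumed, hand-built scikit-learn analogue, not what Auto_NLP actually runs. It shows roughly what a feature cap such as `top_num_features` and a metric such as `score_type` control: TF-IDF vectorization, selection of the `k` highest-scoring terms, and a classifier scored by balanced accuracy. The corpus and labels are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Made-up toy corpus: five rows per class
texts = ["i love this", "what a great day", "so happy today", "great game tonight",
         "love the weather", "worst day ever", "i hate this", "so sad today",
         "terrible game tonight", "hate the weather"]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

pipe = make_pipeline(
    TfidfVectorizer(),        # text -> TF-IDF term weights
    SelectKBest(chi2, k=5),   # keep the 5 terms most associated with the label
    LogisticRegression(),     # classifier on the selected terms
)
# Score each of 5 stratified folds with the same metric the notebook asks Auto_NLP to use
scores = cross_val_score(pipe, texts, labels, cv=5, scoring="balanced_accuracy")
print(scores.mean())
```

Keeping the vectorizer and selector inside the pipeline means they are refit on each training fold. The held-out fold therefore never influences which terms are kept.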

```python
# Predict labels for the held-out tweets with the fitted pipeline
final.predict(test_x[input_feature])
```
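When held-out predictions are compared with the true labels (for example `test[target]`), the metric chosen above, balanced accuracy, is the mean of the per-class recall. The worked example below uses made-up labels and checks a manual computation against scikit-learn's `balanced_accuracy_score`.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

# Made-up true labels and predictions for an imbalanced binary problem
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

# Recall per class: class 0 -> 7/8 = 0.875, class 1 -> 1/2 = 0.5
recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
manual = np.mean(recalls)

print(manual, balanced_accuracy_score(y_true, y_pred))  # 0.6875 0.6875
```

Plain accuracy on the same arrays is 0.8. Balanced accuracy drops to 0.6875 because the minority class is only half recovered, which is why it suits skewed sentiment labels.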

## 6. Prediction

```python
# Load the unlabeled test tweets once
testing = pd.read_csv('../input/twitter-sentiment-analysis/test_2.csv')

# Predict a label for each tweet and attach it to a copy of the test data
prediction = testing.copy()
prediction['label'] = final.predict(testing[input_feature])

# Save the predictions without the pandas index
prediction.to_csv('prediction.csv', index=False)
```

# Reference

* https://github.com/AutoViML/Auto_ViML