<a href="https://colab.research.google.com/github/shubham62025865/project/blob/main/project_of_nlp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PROJECT OF NATURAL LANGUAGE PROCESSING**

Natural Language Processing with TensorFlow

The main goal of natural language processing (NLP) is to derive information from natural language.

Natural language is a broad term but you can consider it to cover any of the following:

* Text (such as that contained in an email, blog post, book, Tweet)
* Speech (a conversation you have with a doctor, voice commands you give to a smart speaker)

Under the umbrellas of text and speech there are many different things you might want to do.

If you're building an email application, you might want to scan incoming emails to see if they're spam or not spam (classification).

If you're trying to analyse customer feedback complaints, you might want to discover which section of your business they're for.

🔑 Note: Both of these types of data are often referred to as sequences (a sentence is a sequence of words). So a common term you'll come across in NLP problems is called seq2seq, in other words, finding information in one sequence to produce another sequence (e.g. converting a speech command to a sequence of text-based steps).

To get hands-on with NLP in TensorFlow, we're going to practice the steps we've used previously but this time with text data:

Text -> turn into numbers -> build a model -> train the model to find patterns -> use patterns (make predictions)

Unzipping nlp_getting_started.zip gives the following 3 .csv files:

* sample_submission.csv - an example of the file you'd submit to the Kaggle competition with your model's predictions.
* train.csv - training samples of real and not real disaster Tweets.
* test.csv - testing samples of real and not real disaster Tweets.


Since we have two target values, we're dealing with a binary classification problem.

It's fairly balanced too, about 60% negative class (target = 0) and 40% positive class (target = 1).

Where:

* 1 = a real disaster Tweet
* 0 = not a real disaster Tweet

And what about the total number of samples we have?

In [None]:
# Unzip data (-o overwrites existing files without prompting)
!unzip -o "/content/drive/MyDrive/nlp_getting_started (1).zip"

Archive:  /content/drive/MyDrive/nlp_getting_started (1).zip

In [None]:
# Turn .csv files into pandas DataFrames
import pandas as pd
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
train_df.head()

Unnamed: 0,id,keyword,location,text,target
0,1,,,Our Deeds are the Reason of this #earthquake M...,1
1,4,,,Forest fire near La Ronge Sask. Canada,1
2,5,,,All residents asked to 'shelter in place' are ...,1
3,6,,,"13,000 people receive #wildfires evacuation or...",1
4,7,,,Just got sent this photo from Ruby #Alaska as ...,1


In [None]:
train_df.shape

(7613, 5)

In [None]:
train_df_shuffled = train_df.sample(frac=1, random_state=42) # shuffle with random_state=42 for reproducibility
train_df_shuffled.head()

Unnamed: 0,id,keyword,location,text,target
2644,3796,destruction,,So you have a new weapon that can cause un-ima...,1
2227,3185,deluge,,The f$&amp;@ing things I do for #GISHWHES Just...,0
5448,7769,police,UK,DT @georgegalloway: RT @Galloway4Mayor: ÛÏThe...,1
132,191,aftershock,,Aftershock back to school kick off was great. ...,0
6845,9810,trauma,"Montgomery County, MD",in response to trauma Children of Addicts deve...,0


In [None]:
# What proportion of examples belongs to each class?
train_df.target.value_counts(normalize=True)

0    0.57034
1    0.42966
Name: target, dtype: float64

In [None]:
# How many samples total?
print(f"Total training samples: {len(train_df)}")
print(f"Total test samples: {len(test_df)}")
print(f"Total samples: {len(train_df) + len(test_df)}")

Total training samples: 7613
Total test samples: 3263
Total samples: 10876



In [None]:
from sklearn.model_selection import train_test_split

# Use train_test_split to split training data into training and validation sets
train_sentences, val_sentences, train_labels, val_labels = train_test_split(train_df_shuffled["text"].to_numpy(),
                                                                            train_df_shuffled["target"].to_numpy(),
                                                                            test_size=0.1, # dedicate 10% of samples to validation set
                                                                            random_state=42) # random state for reproducibility

In [None]:
# Check the lengths
len(train_sentences), len(train_labels), len(val_sentences), len(val_labels)

(6851, 6851, 762, 762)

In [None]:
train_sentences.shape,train_labels.shape

((6851,), (6851,))
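The split sizes follow from simple arithmetic. The sketch below assumes scikit-learn's documented behaviour of rounding the validation set size up when `test_size` is given as a fraction; the variable names are illustrative only.

```python
import math

n_samples = 7613   # rows in train.csv, from train_df.shape above
test_size = 0.1    # fraction passed to train_test_split

n_val = math.ceil(test_size * n_samples)  # fractional test sizes are rounded up
n_train = n_samples - n_val               # everything else goes to training

print(n_train, n_val)
```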

In [None]:
# View the first 5 training sentences and their labels
train_sentences[:5], train_labels[:5]

(array(['@mogacola @zamtriossu i screamed after hitting tweet',
        'Imagine getting flattened by Kurt Zouma',
        '@Gurmeetramrahim #MSGDoing111WelfareWorks Green S welfare force ke appx 65000 members har time disaster victim ki help ke liye tyar hai....',
        "@shakjn @C7 @Magnums im shaking in fear he's gonna hack the planet",
        'Somehow find you and I collide http://t.co/Ee8RpOahPk'],
       dtype=object),
 array([0, 0, 1, 0, 0]))

In [None]:
import re

import nltk
import numpy as np
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("popular")

def preprocess(sentences):
  """Removes URLs, lowercases, tokenizes and drops English stopwords from each Tweet."""
  stop_words = set(stopwords.words("english")) # build the stopword set once instead of once per token
  tweet_list = []
  for tweet in sentences:
    tweet_cleaned = re.sub(r'http\S+', '', tweet) # remove URLs
    tweet_cleaned = tweet_cleaned.lower()
    tokens = word_tokenize(tweet_cleaned)
    clean_list = [word for word in tokens if word not in stop_words]
    tweet_cleaned = " ".join(clean_list)
    tweet_list.append(tweet_cleaned)

  return np.array(tweet_list)




[nltk_data] Downloading collection 'popular'
[nltk_data]    | 
[nltk_data]    | Downloading package cmudict to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/cmudict.zip.
[nltk_data]    | Downloading package gazetteers to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/gazetteers.zip.
[nltk_data]    | Downloading package genesis to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/genesis.zip.
[nltk_data]    | Downloading package gutenberg to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/gutenberg.zip.
[nltk_data]    | Downloading package inaugural to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/inaugural.zip.
[nltk_data]    | Downloading package movie_reviews to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping corpora/movie_reviews.zip.
[nltk_data]    | Downloading package names to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/names.zip.
[nltk_data]    | Downloading package shakespeare to /root/nltk_data...
[nlt

In [None]:
train_sentences = preprocess(train_sentences)
val_sentences = preprocess(val_sentences)

In [None]:
# Check the lengths
len(train_sentences), len(train_labels), len(val_sentences), len(val_labels)

(6851, 6851, 762, 762)

In [None]:
# View the first 5 training sentences and their labels
train_sentences[:5], train_labels[:5]

(array(['@ mogacola @ zamtriossu screamed hitting tweet',
        'imagine getting flattened kurt zouma',
        '@ gurmeetramrahim # msgdoing111welfareworks green welfare force ke appx 65000 members har time disaster victim ki help ke liye tyar hai ....',
        "@ shakjn @ c7 @ magnums im shaking fear 's gon na hack planet",
        'somehow find collide'], dtype='<U166'),
 array([0, 0, 1, 0, 0]))
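The effect of the cleaning steps can be traced on a single Tweet. The sketch below is a simplified stand-in for `preprocess`: it splits on whitespace instead of calling NLTK's `word_tokenize`, and `SMALL_STOPWORDS` is a tiny hand-picked set assumed for illustration rather than NLTK's full English list.

```python
import re

# Assumption: a tiny stand-in for stopwords.words("english")
SMALL_STOPWORDS = {"you", "and", "i"}

def simple_clean(tweet):
    no_urls = re.sub(r"http\S+", "", tweet)   # drop anything that starts with "http"
    tokens = no_urls.lower().split()          # whitespace split instead of word_tokenize
    kept = [tok for tok in tokens if tok not in SMALL_STOPWORDS]
    return " ".join(kept)

cleaned = simple_clean("Somehow find you and I collide http://t.co/Ee8RpOahPk")
print(cleaned)
```

On this Tweet the simplified version gives the same result as the fifth cleaned sentence shown above; Tweets containing punctuation or mentions would differ because `word_tokenize` splits those into separate tokens.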

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization  #https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization

# Use the default TextVectorization variables
text_vectorizer = TextVectorization(max_tokens=None, # how many words in the vocabulary (all of the different words in your text)
                                    standardize="lower_and_strip_punctuation", # how to process text
                                    split="whitespace", # how to split tokens
                                    ngrams=None, # create groups of n-words?
                                    output_mode="int", # how to map tokens to numbers
                                    output_sequence_length=None) # how long should the output sequence of tokens be?
                                    # pad_to_max_tokens=True) # Not valid if using max_tokens=None

In [None]:
tweet_length = []
for tweet in train_sentences:
  tweet_list = tweet.split()
  tweet_length.append(len(tweet_list))

In [None]:
import numpy as np
sum(tweet_length)/ len(tweet_length)

12.297036928915487

In [None]:
# Find average number of tokens (words) in training Tweets
sum([len(i.split()) for i in train_sentences])/len(train_sentences)

12.297036928915487
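An average alone does not show how many Tweets a given sequence length would cut short. A minimal sketch of that check follows, using a made-up list of token counts (`toy_lengths`) in place of the real training data; the cut-off of 14 matches the `max_length` the notebook sets in the next step.

```python
# Assumption: toy token counts standing in for len(tweet.split()) over train_sentences
toy_lengths = [5, 8, 12, 12, 13, 14, 15, 20, 9, 15]
max_length = 14

mean_length = sum(toy_lengths) / len(toy_lengths)
truncated = sum(1 for n in toy_lengths if n > max_length)  # Tweets that would lose tokens
share_truncated = truncated / len(toy_lengths)

print(f"mean: {mean_length:.1f}, truncated: {share_truncated:.0%}")
```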

In [None]:
# Setup text vectorization with custom variables
max_vocab_length = 10000 # max number of words to have in our vocabulary
max_length = 14 # max length our sequences will be (e.g. how many words from a Tweet does our model see?)

text_vectorizer = TextVectorization(max_tokens=max_vocab_length,
                                    output_mode="int",
                                    output_sequence_length=max_length)

In [None]:
# Fit the text vectorizer to the training text
text_vectorizer.adapt(train_sentences)

In [None]:
# Create sample sentence and tokenize it
sample_sentence = "70 Years After Atomic Bombs Atomic Japan Still Struggles With War Past: The anniversary of the devastation wrought b..."
print(sample_sentence)
text_vectorizer([sample_sentence])

70 Years After Atomic Bombs Atomic Japan Still Struggles With War Past: The anniversary of the devastation wrought b...


<tf.Tensor: shape=(1, 14), dtype=int64, numpy=
array([[ 220,   74,    1,  136, 1038,  136,  130,   21, 1354, 6530,   59,
         415,  364,  429]])>

In [None]:
# Choose a random sentence from the training dataset and tokenize it
import random
random_sentence = random.choice(train_sentences)
print(f"Original text:\n{random_sentence}\
      \n\nVectorized version:")
text_vectorizer([random_sentence])

Original text:
heard # skh radio first time . almost crashed car . @ 5sos @ ashton5sos @ luke5sos @ michael5sos @ calum5sos      

Vectorized version:


<tf.Tensor: shape=(1, 14), dtype=int64, numpy=
array([[ 397, 8210,  728,   35,   32,  565,  314,   64, 6120,    1,    1,
        2738, 3904,    0]])>
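The `1` values and the trailing `0` in the output above are the out-of-vocabulary and padding indices. The pure-Python sketch below mimics that behaviour on a toy vocabulary; `toy_vocab` and `vectorize` are hypothetical names, and the real layer also strips punctuation, which is skipped here.

```python
PAD, UNK = 0, 1  # in "int" mode, index 0 is padding and index 1 is [UNK]

# Assumption: a toy vocabulary; real indices come from text_vectorizer.adapt(...)
toy_vocab = {"fire": 2, "flood": 3, "near": 4, "city": 5}

def vectorize(sentence, max_length):
    ids = [toy_vocab.get(tok, UNK) for tok in sentence.lower().split()]
    ids = ids[:max_length]                        # truncate long sequences
    return ids + [PAD] * (max_length - len(ids))  # pad short ones with zeros

short_ids = vectorize("Fire near harbour", 5)
long_ids = vectorize("flood near city fire near city", 5)
print(short_ids)
print(long_ids)
```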

In [None]:
# Get the unique words in the vocabulary
words_in_vocab = text_vectorizer.get_vocabulary()
top_5_words = words_in_vocab[:5] # most common tokens (notice the [UNK] token for "unknown" words)
bottom_5_words = words_in_vocab[-5:] # least common tokens
print(f"Number of words in vocab: {len(words_in_vocab)}")
print(f"Top 5 most common words: {top_5_words}")
print(f"Bottom 5 least common words: {bottom_5_words}")

Number of words in vocab: 10000
Top 5 most common words: ['', '[UNK]', 's', 'nt', 'like']
Bottom 5 least common words: ['nvr', 'nuys', 'nutsandboltssp', 'nuts', 'nut']


In [None]:
tf.random.set_seed(42)
from tensorflow.keras import layers

embedding = layers.Embedding(input_dim=max_vocab_length, # size of the vocabulary
                             output_dim=128, # set size of embedding vector
                             embeddings_initializer="uniform", # default, initialize uniformly
                             input_length=max_length, # how long is each input
                             name="embedding_1")

embedding

<keras.src.layers.core.embedding.Embedding at 0x7f4b3bd354e0>

In [None]:
# Get a random sentence from training set
import random
random_sentence = random.choice(train_sentences)
print(f"Original text:\n{random_sentence}\
      \n\nEmbedded version:")

# Embed the random sentence (turn it into numerical representation)
sample_embed = embedding(text_vectorizer([random_sentence]))
sample_embed

Original text:
@ bigburgerboi55 flat footballs ! ! ? ? like flattened spartans crushing back day ! ! ! ! # hail      

Embedded version:


<tf.Tensor: shape=(1, 14, 128), dtype=float32, numpy=
array([[[ 0.03854031,  0.02699376, -0.03334167, ...,  0.03214116,
         -0.00069686, -0.00573311],
        [-0.02984565, -0.0396796 , -0.04016267, ...,  0.00916386,
          0.03475389, -0.03628135],
        [ 0.03854031,  0.02699376, -0.03334167, ...,  0.03214116,
         -0.00069686, -0.00573311],
        ...,
        [ 0.0460467 ,  0.01789996, -0.0176149 , ..., -0.01069229,
         -0.01788081,  0.00611104],
        [ 0.0460467 ,  0.01789996, -0.0176149 , ..., -0.01069229,
         -0.01788081,  0.00611104],
        [ 0.0460467 ,  0.01789996, -0.0176149 , ..., -0.01069229,
         -0.01788081,  0.00611104]]], dtype=float32)>
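Conceptually, an embedding layer is a lookup table with one row per token index, which is why repeated indices (such as padding) produce identical rows in the output above. A small sketch with plain Python lists, assuming a made-up 4-token vocabulary and 3-dimensional vectors:

```python
import random

random.seed(42)
vocab_size, embedding_dim = 4, 3  # toy sizes; the notebook uses 10000 and 128

# One randomly initialised vector per token index
table = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
         for _ in range(vocab_size)]

def embed(token_ids):
    return [table[i] for i in token_ids]  # row lookup, one vector per token

embedded = embed([2, 3, 2, 0, 0])
print(len(embedded), len(embedded[0]))  # sequence length, embedding dimension
print(embedded[0] == embedded[2])       # same token id gives the same vector
```

During training, the rows of this table are the weights that get updated, so tokens that behave similarly for the task can end up with similar vectors.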

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Create tokenization and modelling pipeline
model_0 = Pipeline([
                    ("tfidf", TfidfVectorizer()), # convert words to numbers using tfidf
                    ("clf", MultinomialNB()) # model the text
])

# Fit the pipeline to the training data
model_0.fit(train_sentences, train_labels)
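For intuition about what the `tfidf` step produces, the sketch below computes TF-IDF weights by hand on two toy documents. It assumes scikit-learn's default smoothed formula, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalisation of each row; the documents and the helper name `tfidf` are made up.

```python
import math
from collections import Counter

docs = [["fire", "near", "forest"], ["fire", "sale", "today"]]  # toy, pre-tokenized
n_docs = len(docs)
df = Counter(term for doc in docs for term in set(doc))  # document frequency per term

def tfidf(doc):
    counts = Counter(doc)
    raw = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))   # L2 norm of the row
    return {t: w / norm for t, w in raw.items()}

weights = tfidf(docs[0])
print(weights)
```

The word "fire" appears in both documents, so it receives a lower weight than "near" or "forest", which appear in only one.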

In [None]:
val_sentences.shape, val_labels.shape

((762,), (762,))

In [None]:
baseline_score = model_0.score(val_sentences, val_labels)
print(f"Our baseline model achieves an accuracy of: {baseline_score*100:.2f}%")

In [None]:
# Make predictions
baseline_preds = model_0.predict(val_sentences)
baseline_preds[:20]

In [None]:
# Function to evaluate: accuracy, precision, recall, f1-score
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def calculate_results(y_true, y_pred):
  """
  Calculates model accuracy, precision, recall and f1 score of a binary classification model.

  Args:
  -----
  y_true = true labels in the form of a 1D array
  y_pred = predicted labels in the form of a 1D array

  Returns a dictionary of accuracy, precision, recall, f1-score.
  """
  # Calculate model accuracy
  model_accuracy = accuracy_score(y_true, y_pred) * 100
  # Calculate model precision, recall and f1 score using "weighted" average
  model_precision, model_recall, model_f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
  model_results = {"accuracy": model_accuracy,
                  "precision": model_precision,
                  "recall": model_recall,
                  "f1": model_f1}
  return model_results

In [None]:
# Get baseline results
baseline_results = calculate_results(y_true=val_labels,
                                     y_pred=baseline_preds)
baseline_results
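To make the "weighted" average concrete, the sketch below recomputes precision, recall and F1 by hand: each class's score is weighted by its share of the true labels. The function name `weighted_prf` and the toy labels are assumptions for illustration; `calculate_results` relies on scikit-learn for the real computation.

```python
def weighted_prf(y_true, y_pred):
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = (tp + fn) / len(y_true)  # class support as a fraction of all samples
        totals["precision"] += precision * weight
        totals["recall"] += recall * weight
        totals["f1"] += f1 * weight
    return totals

toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_results = weighted_prf(toy_true, toy_pred)
print(toy_results)
```

Note that support-weighted recall always equals plain accuracy (6 of 8 correct here), so the `recall` and `accuracy` entries carry the same information on different scales.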

In [None]:
from tensorflow.keras import layers

inputs = layers.Input(shape=(1,), dtype="string") # named `inputs` to avoid shadowing the built-in input()
tv = text_vectorizer(inputs)
ebd = embedding(tv)
ga = layers.GlobalAveragePooling1D()(ebd)
# d1 = layers.Dense(64, activation = "relu")(ga)
outputs = layers.Dense(1, activation="sigmoid")(ga)


In [None]:
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()

In [None]:
model.compile(
    loss = "binary_crossentropy",
    optimizer = "adam",
    metrics = ["accuracy"]
)

# train_sentences, val_sentences, train_labels, val_labels

model.fit(train_sentences, train_labels, epochs = 5, validation_data = (val_sentences, val_labels))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7f4b382cc160>

In [None]:
# Build model with the Functional API
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string") # inputs are 1-dimensional strings
x = text_vectorizer(inputs) # turn the input text into numbers
x = embedding(x) # create an embedding of the numerized text
x = layers.GlobalAveragePooling1D()(x) # lower the dimensionality of the embedding (try running the model without this layer and see what happens)
outputs = layers.Dense(1, activation="sigmoid")(x) # create the output layer, want binary outputs so use sigmoid activation
model_1 = tf.keras.Model(inputs, outputs, name="model_1_dense") # construct the model
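The pooling layer collapses the `(sequence_length, embedding_dim)` output of the embedding into a single `embedding_dim` vector by averaging over the tokens, which is what lets the final `Dense(1)` layer emit one prediction per Tweet. A minimal sketch of that averaging, with made-up numbers and assuming no masking of padding tokens:

```python
# Toy embedded Tweet: 4 tokens, each a 3-dimensional vector (values are made up)
embedded_tweet = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0],
]

def global_average_pool(sequence):
    n_tokens = len(sequence)
    # average each embedding dimension across the token axis
    return [sum(dim_values) / n_tokens for dim_values in zip(*sequence)]

pooled = global_average_pool(embedded_tweet)
print(pooled)
```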

In [None]:
# Compile model
model_1.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

In [None]:
# Get a summary of the model
model_1.summary()

In [None]:
# Fit the model
model_1_history = model_1.fit(train_sentences, # inputs can be raw strings because the text vectorization layer is built into the model
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
embedding.weights

[<tf.Variable 'embedding_1/embeddings:0' shape=(10000, 128) dtype=float32, numpy=
 array([[ 0.02762303,  0.05990589, -0.04322016, ..., -0.01768667,
          0.024793  ,  0.05269305],
        [ 0.0183447 ,  0.0723121 , -0.07765169, ...,  0.03172897,
          0.04322051,  0.0474329 ],
        [-0.04573086,  0.04318729, -0.06912646, ..., -0.01351985,
          0.02506709,  0.08711828],
        ...,
        [-0.11511342,  0.06098103,  0.08267513, ...,  0.05766258,
          0.05717307,  0.07217729],
        [-0.08760691,  0.01632658,  0.05407988, ...,  0.06204621,
          0.03205802,  0.0631576 ],
        [-0.13963863,  0.10337882,  0.03440196, ...,  0.13659963,
          0.06563178,  0.08298192]], dtype=float32)>]

In [None]:
embed_weights = model_1.get_layer("embedding_1").get_weights()[0]
print(embed_weights.shape)

(10000, 128)


In [None]:
# Make predictions (these come back in the form of probabilities)
model_1_pred_probs = model_1.predict(val_sentences)
model_1_pred_probs[:10] # only print out the first 10 prediction probabilities



array([[0.32290438],
       [0.76082015],
       [0.9992501 ],
       [0.11179618],
       [0.4405569 ],
       [0.9572143 ],
       [0.88378125],
       [0.9990426 ],
       [0.95531666],
       [0.27437901]], dtype=float32)
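Each probability is the model's estimate that a Tweet is about a real disaster, so turning it into a class label only needs a cut-off. The sketch below applies a 0.5 threshold (an assumption, though it matches the `tf.round` call used for the later models) to the ten values printed above.

```python
# First ten validation probabilities, copied from the output above
pred_probs = [0.32290438, 0.76082015, 0.9992501, 0.11179618, 0.4405569,
              0.9572143, 0.88378125, 0.9990426, 0.95531666, 0.27437901]

threshold = 0.5  # assumption: the usual cut-off for a sigmoid output
pred_labels = [1 if p > threshold else 0 for p in pred_probs]
print(pred_labels)
```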

In [None]:
# Get the vocabulary from the text vectorization layer
words_in_vocab = text_vectorizer.get_vocabulary()
len(words_in_vocab), words_in_vocab[:10]

(10000, ['', '[UNK]', 's', 'nt', 'like', 'amp', 'm', 'fire', 'via', 'new'])

In [None]:

model_1.summary()


In [None]:
# Get the weight matrix of embedding layer
# (these are the numerical patterns between the text in the training dataset the model has learned)
embed_weights = model_1.get_layer("embedding_1").get_weights()[0]
print(embed_weights.shape) # (vocab size, embedding_dim): each word is an embedding_dim-sized vector

(10000, 128)


In [None]:
# Code below is adapted from: https://www.tensorflow.org/tutorials/text/word_embeddings#retrieve_the_trained_word_embeddings_and_save_them_to_disk
import io

# Write embedding vectors and words to file (the files are closed automatically on exit)
with io.open("embedding_vectors.tsv", "w", encoding="utf-8") as out_v, \
     io.open("embedding_metadata.tsv", "w", encoding="utf-8") as out_m:
  for num, word in enumerate(words_in_vocab):
    if num == 0:
      continue # skip padding token
    vec = embed_weights[num]
    out_m.write(word + "\n") # write words to file
    out_v.write("\t".join([str(x) for x in vec]) + "\n") # write corresponding word vector to file

# Download files locally to upload to Embedding Projector
try:
  from google.colab import files
except ImportError:
  pass
else:
  files.download("embedding_vectors.tsv")
  files.download("embedding_metadata.tsv")

In [None]:
# Set random seed and create embedding layer (new embedding layer for each model)
tf.random.set_seed(42)
from tensorflow.keras import layers
model_2_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_2")


# Create LSTM model
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_2_embedding(x)
print(x.shape)
x = layers.LSTM(64)(x) # return vector for whole sequence
print(x.shape)
# x = layers.Dense(64, activation="relu")(x) # optional dense layer on top of output of LSTM cell
outputs = layers.Dense(1, activation="sigmoid")(x)
model_2 = tf.keras.Model(inputs, outputs, name="model_2_LSTM")
model_2.summary()
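The two `print(x.shape)` calls should show the LSTM reducing a `(None, 14, 128)` sequence to a single `(None, 64)` vector per Tweet. The parameter counts that `model_2.summary()` reports can be estimated by hand; the sketch below assumes the standard Keras LSTM layout (four gates, each with input weights, recurrent weights and a bias) and is worked out from the layer sizes rather than copied from the notebook's output.

```python
vocab_size, embedding_dim, lstm_units = 10000, 128, 64

embedding_params = vocab_size * embedding_dim
# 4 gates, each with input weights, recurrent weights and a bias vector
lstm_params = 4 * (lstm_units * (embedding_dim + lstm_units) + lstm_units)
dense_params = lstm_units * 1 + 1  # weights plus one bias for the sigmoid unit

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params)
print(total_params)
```

Almost all of the parameters sit in the embedding table, which is one reason the vocabulary size and embedding dimension dominate the size of small text models like this one.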

In [None]:
# Compile model
model_2.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

In [None]:
model_2.summary()

In [None]:
# Fit model
model_2_history = model_2.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              )

In [None]:
# Make predictions on the validation dataset
model_2_pred_probs = model_2.predict(val_sentences)
model_2_pred_probs.shape, model_2_pred_probs[:10] # view the first 10

In [None]:
# Round out predictions and reduce to 1-dimensional array
model_2_preds = tf.squeeze(tf.round(model_2_pred_probs))
model_2_preds[:10]

In [None]:
# Calculate LSTM model results
model_2_results = calculate_results(y_true=val_labels,
                                    y_pred=model_2_preds)
model_2_results

In [None]:
import tensorflow_hub as hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4") # load Universal Sentence Encoder
sample_sentences = ([
                      "When you call the universal sentence encoder on a sentence, it turns it into numbers."])
embed_samples=embed(sample_sentences)
print(embed_samples[0][:50])

In [None]:
embed_samples[0].shape

In [None]:
import tensorflow as tf

In [None]:
import tensorflow_hub as hub
from tensorflow.keras import layers
sentence_encoder_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                                        input_shape=[], # shape of inputs coming to our model
                                        dtype=tf.string, # data type of inputs coming to the USE layer
                                        trainable=False, # keep the pretrained weights (we'll create a feature extractor)
                                        name="USE")

In [None]:
# Create model using the Sequential API
import tensorflow as tf
model_6 = tf.keras.Sequential([
  sentence_encoder_layer, # take in sentences and then encode them into an embedding
  layers.Dense(64, activation="relu"),
  layers.Dense(1, activation="sigmoid")
], name="model_6_USE")

# Compile model
model_6.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

model_6.summary()

In [None]:
# Train a classifier on top of pretrained embeddings
model_6_history = model_6.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels)
                              )

In [None]:
# Make predictions with USE TF Hub model
model_6_pred_probs = model_6.predict(val_sentences)
model_6_pred_probs[:10]

In [None]:
# Convert prediction probabilities to labels
model_6_preds = tf.squeeze(tf.round(model_6_pred_probs))
model_6_preds[:10]

In [None]:
# Calculate model 6 performance metrics
model_6_results = calculate_results(val_labels, model_6_preds)
model_6_results

In [None]:
# Save TF Hub Sentence Encoder model to HDF5 format
model_6.save("model_6.h5")

In [None]:
# Load model with custom Hub Layer (required with HDF5 format)
loaded_model_6 = tf.keras.models.load_model("model_6.h5",
                                            custom_objects={"KerasLayer": hub.KerasLayer})

In [None]:
# How does our loaded model perform?
loaded_model_6.evaluate(val_sentences, val_labels)

In [None]:
# Save TF Hub Sentence Encoder model to SavedModel format (default)
model_6.save("model_6_SavedModel_format")

In [None]:
import shutil
shutil.make_archive("folder", format = "zip", base_dir = "/content/model_6_SavedModel_format")

'folder.zip'

In [None]:
! pip install streamlit -q

In [None]:
! pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-7.0.0.tar.gz (718 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyngrok
  Building wheel for pyngrok (setup.py) ... done
  Created wheel for pyngrok: filename=pyngrok-7.0.0-py3-none-any.whl size=21129 sha256=cca6fe90e20aadb8523abd4307a4a6b46b3e6c0571153fcfa14fc7c3bfee438b
  Stored in directory: /root/.cache/pip/wheels/60/29/7b/f64332aa7e5e88fbd56d4002185ae22dcdc83b35b3d1c2cbf5
Successfully built pyngrok
Installing collected packages: pyngrok
Successfully installed pyngrok-7.0.0


In [None]:
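from pyngrok import ngrok # Python wrapper around the ngrok tunnelling client

!nohup streamlit run app.py --server.port 80 &
url = ngrok.connect(80) # recent pyngrok versions take the port as the first (addr) argument
print(url)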

In [None]:
%%writefile app.py
# Streamlit front end, saved as app.py so that `streamlit run app.py` can serve it

import streamlit as st
import tensorflow as tf
from PIL import Image

img = Image.open("disaster.jpg")


st.image(img)

st.write("# Disaster Tweet Prediction")


tweet = st.text_input(
        "Enter tweet to classify",
        placeholder="Enter or paste a tweet here", # shown as a hint, not submitted as input
        key="placeholder",
    )


#load model
@st.cache_resource
def cache_model(model_name):
    model = tf.keras.models.load_model(model_name)
    return (model)

model = cache_model("model_6_SavedModel_format")
# Load TF Hub Sentence Encoder SavedModel
# model = tf.keras.models.load_model("model_6_SavedModel_format")

def predict_on_sentence(model, sentence):
  """
  Uses model to make a prediction on sentence.

  Writes the sentence, the predicted label and the prediction probability to the page.
  """
  pred_prob = model.predict([sentence])
  pred_label = tf.squeeze(tf.round(pred_prob)).numpy()

  st.write(f"## {sentence}")
  if pred_label == 0:
    st.write(f"This is a non-disaster tweet with probability: {round((1 - pred_prob[0][0]) * 100, 2)}%")
  else:
    st.write(f"This is a disaster tweet with probability: {round(pred_prob[0][0]*100, 2)}%")


if tweet:
    predict_on_sentence(model, tweet)