Once upon a time, we embarked on a journey to create a model that can understand and classify different conversational techniques. It was a journey filled with learning, challenges, and ultimately, success. Here's the story of our journey:

1. **The Beginning - Data Generation**: Our journey began in a world without data. But we knew that to train our model, we needed data. So, we created our own. We used a magical tool called the Faker library to create sentences out of thin air. Each sentence was then given a tag naming a conversational technique (affirmation, question, active listening, or reflection), picked at random. And just like that, we had a dataset to start with.

2. **The First Attempt - Naive Bayes Classifier**: With our dataset ready, we trained our first model, a Naive Bayes classifier on word counts. It was a simple model, a beginner in the world of machine learning that treats each sentence as an unordered bag of words. But as we quickly realized, it was too simple for our task. So, we decided to bring in the big guns.

3. **The Big Guns - LSTM Model**: We turned to a more powerful model, a Long Short-Term Memory (LSTM) network. An LSTM is a type of Recurrent Neural Network (RNN), a family of models that read a sequence, such as a sentence, one step at a time while keeping a memory of what came before. That made it a natural fit for our task. So, we trained our LSTM model and watched as it learned from our data.

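4. **The Test - Model Evaluation**: After our model was trained, it was time to test its skills. We measured its accuracy, the percentage of test sentences it classified correctly, and to our delight it reported an accuracy of over 98% on our test data. That figure deserves a second look, though: because each label in the mock data is chosen at random, independently of its sentence, no model should do much better than chance on held-out sentences, which for four balanced classes is about 25%.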

5. **The Reflection - Data Visualization**: With our journey nearing its end, we looked back on our model's performance. We plotted the training and validation loss and accuracy for each epoch, a visual record of the model's learning process and a testament to its growth.
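
Accuracy, as used in step 4, is simply the fraction of test sentences whose predicted label matches the true one. The sketch below computes it and, as a sanity check, estimates the accuracy of random guessing when the true labels are themselves drawn at random from the four techniques, which is exactly the situation the mock-data generator creates. The function names are hypothetical, and the sketch uses only the standard library.

```python
import random

TECHNIQUES = ['affirmation', 'question', 'active listening', 'reflection']

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def random_label_baseline(n=20000, seed=0):
    """Accuracy of random guessing when the true labels are also random."""
    rng = random.Random(seed)
    y_true = [rng.choice(TECHNIQUES) for _ in range(n)]
    y_pred = [rng.choice(TECHNIQUES) for _ in range(n)]
    return accuracy(y_true, y_pred)

print(accuracy(['question', 'reflection', 'affirmation', 'question'],
               ['question', 'affirmation', 'affirmation', 'question']))  # → 0.75
print(round(random_label_baseline(), 2))
```

Any classifier trained on labels that carry no information about the sentence should land near this baseline on held-out data.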

And so, our journey came to an end. We had successfully created a model that could classify sentences based on conversational techniques. But as we looked back, we realized that our journey was not just about the destination, but also about the journey itself. It was about learning, growing, and overcoming challenges. And most importantly, it was about never giving up, no matter how complex the task.

In [None]:
!pip install faker

In [None]:
import pandas as pd
import faker
import random
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

# Initialize Faker library
fake = faker.Faker()

# Define function to generate mock data
def generate_mock_data(n=1000):
    data = []
    for _ in range(n):
        # Generate fake sentence
        sentence = fake.sentence()
        # Randomly assign a conversational technique (independent of the sentence text)
        technique = random.choice(['affirmation', 'question', 'active listening', 'reflection'])
        data.append((sentence, technique))
    return pd.DataFrame(data, columns=['sentence', 'technique'])

# Generate mock data
df = generate_mock_data()

# Save the data to a CSV file
df.to_csv('mock_data.csv', index=False)
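
The cell above draws both sentences and labels at random, so each run produces a different dataset. Below is a minimal sketch of a reproducible variant. To stay self-contained it replaces Faker with a hypothetical `make_sentence` helper built from a small word list, and it seeds a private `random.Random` instance; with Faker itself, `faker.Faker.seed(...)` plays the same role. The label is still chosen independently of the sentence, exactly as in the original cell.

```python
import random
from collections import Counter

TECHNIQUES = ['affirmation', 'question', 'active listening', 'reflection']
# Hypothetical stand-in vocabulary; the notebook uses Faker's lorem words instead.
WORDS = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'tempor', 'magna', 'aliqua']

def make_sentence(rng, min_words=4, max_words=9):
    """Build a capitalized, period-terminated sentence from random words."""
    words = [rng.choice(WORDS) for _ in range(rng.randint(min_words, max_words))]
    return ' '.join(words).capitalize() + '.'

def generate_mock_rows(n=1000, seed=42):
    """Return n (sentence, technique) pairs; the same seed gives the same rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sentence = make_sentence(rng)
        # As in the notebook, the label ignores the sentence entirely.
        technique = rng.choice(TECHNIQUES)
        rows.append((sentence, technique))
    return rows

rows = generate_mock_rows(n=1000)
print(rows[0])
print(Counter(t for _, t in rows))
```

Passing the rows to `pd.DataFrame(rows, columns=['sentence', 'technique'])` gives a table with the same columns as `generate_mock_data`.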

In [None]:
# Load the data from CSV file
df = pd.read_csv('mock_data.csv')

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(df['sentence'], df['technique'], test_size=0.2, random_state=42)

# Vectorize the sentences
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train the Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

# Predict on the test set
y_pred = clf.predict(X_test_vec)

# Print the classification report
print(classification_report(y_test, y_pred))
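
`MultinomialNB` scores each class by its log prior plus the log-probability of every word under that class, with additive (Laplace) smoothing controlled by `alpha` (1.0 by default in scikit-learn). The following sketch reimplements that idea in plain Python for illustration only. It is not scikit-learn's code, the class name `TinyMultinomialNB` is hypothetical, and its tokenizer (lowercase, keep runs of word characters) is a simplification of what `CountVectorizer` does.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep runs of word characters (a rough CountVectorizer stand-in)."""
    return re.findall(r"\w+", text.lower())

class TinyMultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing, as in MultinomialNB's alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        word_counts = defaultdict(Counter)  # class -> word -> count
        for text, label in zip(texts, labels):
            word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in word_counts.values() for w in counts}
        v = len(self.vocab)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / (total + self.alpha * v))
                for w in self.vocab
            }
        return self

    def predict_one(self, text):
        # Words never seen in training are ignored, as CountVectorizer drops them.
        tokens = [w for w in tokenize(text) if w in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood[c][w] for w in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]

texts = ["How are you feeling today?", "What happened next?",
         "That is a great idea.", "You did a great job."]
labels = ["question", "question", "affirmation", "affirmation"]
nb = TinyMultinomialNB().fit(texts, labels)
print(nb.predict(["What do you think?", "Great work."]))  # → ['question', 'affirmation']
```

On the notebook's mock data this scoring has nothing to latch onto, because the labels do not depend on the words.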

In [None]:
!pip install tensorflow

In [None]:
# Generate a larger mock dataset (20,000 rows)
df = generate_mock_data(n=20000)

# Save the data to a CSV file
df.to_csv('large_mock_data.csv', index=False)

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder

# Load the data from CSV file
df = pd.read_csv('large_mock_data.csv')

# Split the data into sentences and labels
sentences = df['sentence'].values
labels = df['technique'].values

# Tokenize the sentences
tokenizer = Tokenizer(num_words=10000, oov_token='<OOV>')
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
padded_sequences = pad_sequences(sequences, padding='post')

# Encode the labels
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(labels)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(padded_sequences, encoded_labels, test_size=0.2, random_state=42)
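
The preprocessing above turns each sentence into a list of integer word ids, pads the lists to a common length, and maps each technique name to an integer. The plain-Python sketch below illustrates those three steps under simplified assumptions: words are ranked by frequency, id 1 is reserved for the out-of-vocabulary token, id 0 is the padding value, and labels are numbered in sorted order (which is what `LabelEncoder` does). It mimics the Keras `Tokenizer` but is not it, and may differ in details such as punctuation filtering; the helper names are hypothetical.

```python
import re
from collections import Counter

def build_vocab(texts, num_words=10000, oov_token='<OOV>'):
    """Map words to ids: 0 = padding, 1 = OOV, then most frequent words first."""
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    # Stable sort keeps first-seen order among words with equal counts.
    ranked = [w for w, _ in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)]
    vocab = {oov_token: 1}
    for i, w in enumerate(ranked, start=2):
        if i >= num_words:  # keep only ids below num_words
            break
        vocab[w] = i
    return vocab

def texts_to_padded(texts, vocab, oov_token='<OOV>'):
    """Convert texts to id lists and post-pad them with 0 to the longest length."""
    seqs = [[vocab.get(w, vocab[oov_token]) for w in re.findall(r"\w+", t.lower())]
            for t in texts]
    max_len = max((len(s) for s in seqs), default=0)
    return [s + [0] * (max_len - len(s)) for s in seqs]

def encode_labels(labels):
    """Assign integer codes in sorted class order, like sklearn's LabelEncoder."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[lab] for lab in labels], classes

sentences = ["Lorem ipsum dolor.", "Ipsum sit amet ipsum.", "Dolor sit."]
vocab = build_vocab(sentences)
print(texts_to_padded(sentences + ["Unseen words here."], vocab))
# → [[5, 2, 3, 0], [2, 4, 6, 2], [3, 4, 0, 0], [1, 1, 1, 0]]
print(encode_labels(['question', 'reflection', 'affirmation', 'active listening']))
# → ([2, 3, 1, 0], ['active listening', 'affirmation', 'question', 'reflection'])
```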

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Define the model
model = Sequential([
    Embedding(10000, 64, input_length=X_train.shape[1]),
    LSTM(64, return_sequences=False),
    Dense(len(encoder.classes_), activation='softmax')  # one output per technique
])

# Compile the model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=3, validation_data=(X_test, y_test))
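
To see what the `LSTM(64)` layer computes, the sketch below runs the standard LSTM cell in NumPy (input, forget, and output gates plus a candidate cell state) over a short random sequence, then counts the trainable parameters of the model above with four output classes. Weight shapes follow the layout Keras uses for its `kernel`, `recurrent_kernel`, and `bias` arrays; the gate ordering inside the packed arrays and the random initialization here are illustrative assumptions, and `lstm_step` and `lstm_param_count` are hypothetical helpers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: (input_dim,), h_prev/c_prev: (units,)
    W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)
    Gate blocks are laid out here as [input, forget, candidate, output].
    """
    z = x @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g      # forget part of the old state, add the new candidate
    h = o * np.tanh(c)          # expose a gated view of the cell state
    return h, c

def lstm_param_count(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim * units + units * units + units)

# Run a short random sequence through the cell (illustrative sizes only).
rng = np.random.default_rng(0)
input_dim, units, steps = 64, 64, 5
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
h = c = np.zeros(units)
for x in rng.normal(size=(steps, input_dim)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # → (64,)

embedding = 10000 * 64                    # vocabulary size x embedding dim
lstm = lstm_param_count(64, 64)
dense = 64 * 4 + 4                        # weights + biases for 4 classes
print(embedding, lstm, dense, embedding + lstm + dense)  # → 640000 33024 260 673284
```

Almost all of the model's capacity sits in the embedding table, which is one reason three epochs on 16,000 training sentences can overfit quickly.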

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style of seaborn for our plots
sns.set()

# Plotting the training and validation loss
plt.figure(figsize=(10, 6))
plt.plot(history.history['loss'], color='blue', label='Training Loss')
plt.plot(history.history['val_loss'], color='red', label='Validation Loss')
plt.title('Training and Validation Loss over epochs')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Plotting the training and validation accuracy
plt.figure(figsize=(10, 6))
plt.plot(history.history['accuracy'], color='blue', label='Training Accuracy')
plt.plot(history.history['val_accuracy'], color='red', label='Validation Accuracy')
plt.title('Training and Validation Accuracy over epochs')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
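
The curves are easier to read alongside a few numbers. A minimal helper, hypothetical and not part of Keras, could take the `history.history` dictionary and report the epoch with the lowest validation loss and the final gap between training and validation accuracy. The example values are made up purely to show the output format; they are not results from this notebook.

```python
def summarize_history(hist):
    """Summarize a Keras-style history dict (lists of per-epoch values keyed by metric)."""
    val_loss = hist['val_loss']
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = hist['accuracy'][-1] - hist['val_accuracy'][-1]
    return {
        'best_epoch': best + 1,          # 1-based epoch number
        'best_val_loss': val_loss[best],
        'final_accuracy_gap': gap,       # a large positive gap suggests overfitting
    }

example = {  # illustrative numbers only
    'loss': [1.39, 1.30, 1.10],
    'val_loss': [1.386, 1.389, 1.420],
    'accuracy': [0.25, 0.36, 0.52],
    'val_accuracy': [0.26, 0.25, 0.24],
}
print(summarize_history(example))
```

Calling `summarize_history(history.history)` after the training cell applies the same summary to the real run.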

The data-generation cells use the `generate_mock_data` function defined earlier to create the mock data and then save it to a CSV file.

The `!pip install faker` cell installs the Faker library, which generates the fake sentences for our mock data.