# NLTK-Enhanced Financial Chatbot Project

---
This **Jupyter Notebook** demonstrates the development of a Flask-based chatbot, enhanced with *Natural Language Processing (NLP)* capabilities using NLTK. Our chatbot, **FinBot**, aims to make financial data accessible and actionable through natural language queries.

## Table of Contents

- [Setup and Imports](#Setup-and-Imports)
- [Initializing Flask and Loading Data](#Initializing-Flask-and-Loading-Data)
- [Utility Functions for NLP Tasks](#Utility-Functions-for-NLP-Tasks)
- [Implementing Chatbot Functionality with NLP](#Implementing-Chatbot-Functionality-with-NLP)
- [Flask App Routes for User Interaction](#Flask-App-Routes-for-User-Interaction)
- [Running the Flask App in a Jupyter Notebook](#Running-the-Flask-App-in-a-Jupyter-Notebook)
- [Conclusion](#Conclusion)

### Prerequisites

- Python 3.x
- Flask
- NLTK
- pandas

We'll use NLTK for natural language processing to:

- Tokenize the query (split it into words and punctuation).
- Tag these tokens with part-of-speech tags to help identify nouns, numbers, etc.
- Recognize the intent of the query: is the user asking about revenue or earnings?
- Extract entities such as company names and years.

Before starting, let's make sure the required packages are installed:

In [1]:
import subprocess
import sys

# List of packages to install
packages = ["Flask", "nltk", "pandas"]

# Run pip install
result = subprocess.run([sys.executable, "-m", "pip", "install", "-q"] + packages, capture_output=True)

# Check if the installation was successful
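if result.returncode == 0:  # pip exits with 0 on success and a non-zero code on failure
    print("Successfully installed required packages.")
else:
    print("Installation failed.")
    # Show pip's error output so the cause of the failure is visible
    print(result.stderr.decode(errors="replace"))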

Successfully installed required packages.
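
A zero exit code from pip does not by itself show that the packages can be imported in the running kernel. As a minimal sketch (the helper name `missing_modules` is hypothetical and not part of the notebook), importability could be checked with the standard library:

```python
import importlib.util


def missing_modules(module_names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# These are import names, not pip names: the Flask package is imported as "flask".
print(missing_modules(["flask", "nltk", "pandas"]))
```

An empty list means all three modules were found; any name that is printed would need to be installed before continuing.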


---

## Setup and Imports
This section covers the setup our chatbot needs: importing the libraries it relies on and downloading the required NLTK data.

#### Libraries Overview

- **nltk (Natural Language Toolkit)**: A leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

- **Flask**: A lightweight WSGI web application framework. It's designed to make getting started quick and easy, with the ability to scale up to complex applications. For our chatbot, Flask will handle HTTP requests and serve our web interface.

- **pandas**: An open-source data analysis and manipulation tool, built on top of the Python programming language. It offers data structures and operations for manipulating numerical tables and time series, crucial for handling our financial dataset.

- **threading**: Part of the Python standard library; it provides higher-level threading interfaces built on top of the lower-level `_thread` module. We'll use it to run our Flask application in a background thread within the Jupyter Notebook environment, allowing interactive chatbot testing without interrupting the notebook's execution.

In [2]:
# Standard library imports
from threading import Thread  # For multi-threading support

# Flask framework for building web applications
from flask import Flask, request, jsonify, render_template

# Pandas for data manipulation and analysis
import pandas as pd

# NLTK for natural language processing tasks
import nltk
from nltk.tokenize import word_tokenize  # Tokenizing text into words
from nltk.tag import pos_tag  # Assigning parts of speech to tokens
from nltk.corpus import wordnet as wn  # Accessing WordNet lexical database

#### Downloading NLTK Data

To process natural language and understand the structure and meaning of the user's queries, we will download the following NLTK resources:
- `punkt`: A tokenizer that divides a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences.
- `averaged_perceptron_tagger`: A part-of-speech tagger that uses the averaged perceptron algorithm to tag words with their parts of speech.
- `wordnet`: A large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept.

In [4]:
# Download NLTK data
nltk.download('punkt')  # For tokenization
nltk.download('averaged_perceptron_tagger')  # For part-of-speech tagging
nltk.download('wordnet')  # For synonym lookup

[nltk_data] Error loading punkt: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [Errno 11001] getaddrinfo failed>
[nltk_data] Error loading wordnet: <urlopen error [Errno 11001]
[nltk_data]     getaddrinfo failed>


False

The output above comes from a run without network access: `getaddrinfo failed` is a DNS lookup error, and `nltk.download` returns `False` when a download does not succeed. The cells below only work if these resources are already available locally or the downloads are rerun with a working connection.

---

## Initializing Flask and Loading Data
Here, we initialize our Flask app and load the dataset that our chatbot will use to find financial information about companies.

#### Initializing the Flask Application

We initialize our Flask application by creating an instance of the `Flask` class. This instance acts as the central registry for our chatbot's routes, views, and other web functionalities.

In [None]:
app = Flask(__name__)

#### Loading the Financial Dataset

Our chatbot will use the dataset `detailed_financial_data.csv`, created in an earlier notebook, which contains financial information for various companies across different fiscal years. We will use pandas to load this CSV file into a DataFrame, making the data easily accessible for our chatbot's logic.

In [None]:
df = pd.read_csv('detailed_financial_data.csv')
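
The chatbot code later in this notebook reads the columns `Company`, `Fiscal Year`, `Total Revenue`, and `Net Income`. The sketch below checks for those columns at load time and runs the same kind of filter on a tiny in-memory sample. The sample rows, the `REQUIRED_COLUMNS` constant, and the `load_financial_data` helper are hypothetical illustrations; the real CSV may contain different companies and additional columns.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real detailed_financial_data.csv may differ.
SAMPLE_CSV = """Company,Fiscal Year,Total Revenue,Net Income
Alpha Corp,2021,100.0,20.0
Alpha Corp,2022,120.0,25.0
Beta Inc,2022,80.0,10.0
"""

# Columns that the chatbot's lookup logic reads.
REQUIRED_COLUMNS = {"Company", "Fiscal Year", "Total Revenue", "Net Income"}


def load_financial_data(source):
    """Load the CSV and fail early if an expected column is missing."""
    frame = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return frame


sample_df = load_financial_data(io.StringIO(SAMPLE_CSV))

# Case-insensitive company match combined with an exact fiscal-year match.
row = sample_df[
    (sample_df["Company"].str.lower() == "alpha corp")
    & (sample_df["Fiscal Year"] == 2022)
]
print(row.iloc[0]["Total Revenue"])
```

Failing early on a missing column gives a clearer error than a `KeyError` raised in the middle of a user request.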

---

## Utility Functions for NLP Tasks

Natural Language Processing (NLP) is key to making our chatbot understand the nuances of human language. By implementing utility functions for common NLP tasks, we can enhance our chatbot's ability to parse user queries, identify key information (like company names and financial metrics), and generate relevant responses. In this section, we'll create functions for synonym lookup, intent recognition, and entity extraction.

#### Synonym Identification

To make our chatbot more flexible and understanding, it's helpful to identify synonyms of key words in user queries. This allows the chatbot to recognize various ways users might ask about the same concept. We'll use WordNet, a lexical database for the English language, to find synonyms.

**Example Usage:**

- **Input Word:** "profit"
- **Synonyms (illustrative; the exact set depends on WordNet):** "earnings", "income", "returns"

In [None]:
def get_synonyms(word):
    """
    Fetch synonyms for a given word using NLTK's WordNet.

    Args:
    - word (str): The word for which synonyms are to be found.

    Returns:
    - set: A set containing synonyms of the given word.
    """
    # Create an empty set called 'synonyms'. This is where we'll store all the unique synonyms we find.
    synonyms = set()

    # Loop through each synset (a group of synonymous words) that WordNet has for the given 'word'.
    for syn in wn.synsets(word):

        # Inside each synset, loop through each lemma. A lemma is basically a synonym in the synset.
        for lemma in syn.lemmas():

            # Add the synonym to our 'synonyms' set. Before adding, replace any underscores ('_') 
            # in the synonym name with spaces, as WordNet uses underscores to denote spaces.
            synonyms.add(lemma.name().replace('_', ' '))

    # Return the set of synonyms we've collected.
    return synonyms

#### Intent Recognition

To accurately respond to queries, the chatbot needs to discern the user's intent, like whether the query pertains to "Total Revenue" or "Net Income". The `intent_recognition` function analyzes the query for keywords and their synonyms related to these financial metrics.

**Example Flow:**

1. **User Query:** "What was the net income for Apple in 2022?"
2. **Identified Intent:** "Net Income"

In [5]:
def intent_recognition(query):
    """
    Determine the intent of the query based on keyword synonyms.

    Args:
    - query (str): User's query as input.

    Returns:
    - str: Identified financial metric ('Total Revenue' or 'Net Income'), or None if unrecognized.
    """
    
    # Extend the set of synonyms for 'revenue' with 'sales', which is also commonly used to indicate revenue.
    revenue_synonyms = get_synonyms('revenue') | {'sales'}
    
    # Extend the set of synonyms for 'earnings' with 'profit', 'income', and 'net income', 
    # which are terms often used interchangeably with earnings.
    earnings_synonyms = get_synonyms('earnings') | {'profit', 'income', 'net income'}
    
    # Tokenize the user's query into individual lowercase words to facilitate matching against synonyms.
    tokens = word_tokenize(query.lower())
    
    # Check if any of the tokens (words) in the query match any of the synonyms for revenue.
    # If a match is found, return 'Total Revenue' as the intent.
    if any(token in revenue_synonyms for token in tokens):
        return 'Total Revenue'
    
    # If no revenue-related tokens are found, check if any tokens match the synonyms for earnings.
    # If a match is found, return 'Net Income' as the intent.
    elif any(token in earnings_synonyms for token in tokens):
        return 'Net Income'
    
    # If no matches are found in either synonym set, return None to indicate the intent is unrecognized.
    return None
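
One detail worth noting: the function compares single tokens against the synonym sets, so a multi-word entry such as `'net income'` can never match on its own (the query still resolves because `'income'` matches). A minimal sketch of phrase-aware matching follows. It uses hand-written keyword sets and a regex tokenizer as stand-ins for the WordNet-derived sets and `word_tokenize`; the names `simple_tokens`, `candidate_phrases`, and `recognize_intent`, and the extra term `"bottom line"`, are hypothetical.

```python
import re

# Hypothetical keyword sets standing in for the WordNet-derived ones.
REVENUE_TERMS = {"revenue", "sales"}
EARNINGS_TERMS = {"earnings", "profit", "income", "net income", "bottom line"}


def simple_tokens(text):
    """Lowercase alphanumeric tokens; a simplified stand-in for word_tokenize."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_phrases(tokens):
    """Return the single tokens plus every adjacent two-word phrase."""
    bigrams = {f"{first} {second}" for first, second in zip(tokens, tokens[1:])}
    return set(tokens) | bigrams


def recognize_intent(query):
    """Map a query to 'Total Revenue', 'Net Income', or None."""
    phrases = candidate_phrases(simple_tokens(query))
    if phrases & REVENUE_TERMS:
        return "Total Revenue"
    if phrases & EARNINGS_TERMS:
        return "Net Income"
    return None


print(recognize_intent("What was Apple's bottom line in 2022?"))
```

As in the original function, revenue terms are checked first, so a query that mentions both metrics resolves to "Total Revenue".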

#### Extracting Entities from Queries

For the chatbot to provide specific financial data, it must identify the company and fiscal year mentioned in a query. The `extract_entities` function parses tokenized and tagged queries to find and return these critical pieces of information.

**Example:**

- **User Query:** "How much did Tesla earn in 2021?"
- **Extracted Entities:** Company - "Tesla", Year - 2021

In [1]:
def extract_entities(tagged_tokens):
    """
    Extract potential company names and fiscal years from tagged tokens.

    Args:
    - tagged_tokens (list of tuples): Tokenized and part-of-speech tagged tokens.

    Returns:
    - tuple: (company, year) where 'company' is the first proper noun and 'year' is the first four-digit number found, or None if not found.
    """
    # Initialize variables to store the first found company name and fiscal year.
    company, year = None, None
    
    # Loop through each tagged token.
    for word, tag in tagged_tokens:
        # Check if the current word is tagged as a proper noun ('NNP') and we haven't found a company name yet.
        if tag == 'NNP' and not company:
            # If conditions are met, this is the first proper noun encountered, so we treat it as the company name.
            company = word
        
        # Check if the current word is a digit, exactly four characters long, and we haven't found a year yet.
        elif word.isdigit() and len(word) == 4 and not year:
            # If conditions are met, this is the first four-digit number encountered, so we treat it as the fiscal year.
            # Convert the string to an integer for the year.
            year = int(word)
    
    # After processing all tokens, return the first found company name and fiscal year (if any).
    return company, year
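
Because `extract_entities` keeps only the first proper noun, a two-word company name would be cut to its first word. The variant below is a sketch of one possible extension, not the notebook's method: it joins the first run of consecutive `NNP` tokens. The function name `extract_entities_multiword` is hypothetical, and the tagged tokens are written by hand for illustration rather than produced by `pos_tag`.

```python
def extract_entities_multiword(tagged_tokens):
    """Join the first run of consecutive proper nouns into one company name."""
    company_parts = []
    company_done = False
    year = None

    for word, tag in tagged_tokens:
        if tag == "NNP" and not company_done:
            # Still inside (or just starting) the first run of proper nouns.
            company_parts.append(word)
        else:
            # Any other token ends the run once it has started.
            if company_parts:
                company_done = True
            # The first four-digit number is treated as the fiscal year.
            if year is None and word.isdigit() and len(word) == 4:
                year = int(word)

    company = " ".join(company_parts) if company_parts else None
    return company, year


# Hand-written tags for illustration; real pos_tag output may differ.
tagged = [
    ("How", "WRB"), ("much", "JJ"), ("did", "VBD"),
    ("Goldman", "NNP"), ("Sachs", "NNP"),
    ("earn", "VB"), ("in", "IN"), ("2021", "CD"), ("?", "."),
]
print(extract_entities_multiword(tagged))
```

With these hand-written tags the function returns `("Goldman Sachs", 2021)`, whereas the first-proper-noun rule would return only `"Goldman"`.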

Integrating these advanced NLP utilities enhances our chatbot's workflow, enabling it to:

1. **Comprehend a broader range of user queries** through synonym identification.
2. **Determine the user's intent** with greater accuracy.
3. **Extract key information** needed to fetch relevant financial data.

---

## Implementing Chatbot Functionality with NLP

Building upon our foundation of NLP utilities, we've developed a core function, `nltk_enhanced_chatbot`, where the chatbot's NLP capabilities converge. When a user submits a query, the function works through a series of steps to understand it and generate an appropriate response. Below is an outline of this process:

1. **Query Reception**: The chatbot receives the user's natural language query.
2. **Tokenization**: The query is broken down into individual words or tokens.
3. **Part-of-Speech Tagging**: Each token is tagged with its respective part of speech, aiding in understanding the query's grammatical structure.
4. **Intent Recognition**: The chatbot determines the query's intent, such as whether the user is asking about total revenue, net income, etc.
5. **Entity Extraction**: Key information, such as the company name and fiscal year, is extracted from the query.
6. **Data Retrieval**: Based on the recognized intent and extracted entities, the chatbot fetches the relevant financial data from the dataset.
7. **Response Generation**: The chatbot crafts a response incorporating the retrieved data, which is then presented to the user.

In [2]:
def nltk_enhanced_chatbot(user_query):
    """
    Process a user's query to determine the company's total revenue or net income for a given year.

    Args:
    - user_query (str): The user's text query.

    Returns:
    - str: Response generated based on the query's intent and extracted entities.
    """
    # Tokenize the user query into individual words.
    tokens = word_tokenize(user_query)
    
    # Tag each token with its part of speech to understand the structure of the sentence.
    tagged = pos_tag(tokens)
    
    # Determine the intent of the query (e.g., asking about revenue or net income) using a predefined function.
    metric = intent_recognition(user_query)
    
    # Extract potential company name and fiscal year from the tagged tokens.
    company, year = extract_entities(tagged)
    
    # If a company name, year, and financial metric were successfully identified in the query...
    if company and year and metric:
        # Example: Query a DataFrame 'df' containing financial data to find the requested information.
        # This filters the data for the specified company and year.
        data = df[(df['Company'].str.lower() == company.lower()) & (df['Fiscal Year'] == year)]
        
        # If matching data is found, extract and format the financial metric value to respond to the user.
        if not data.empty:
            value = data.iloc[0][metric]
            return f"The {metric.lower()} for {company} in {year} was ${value} billion."
        else:
            # If no data matches the query, inform the user.
            return "Data not found for your query. Please check the company name and fiscal year."
    else:
        # If the query could not be understood (missing or unclear company name, year, or metric), ask for clarity.
        return "Sorry, I couldn't understand your query. Please ask about a company's total revenue or earnings for a specific year."

#### Example Interaction: Query to Response

Consider a user query: "What was Microsoft's revenue in 2021?"

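1. **Tokenization**: "What", "was", "Microsoft", "'s", "revenue", "in", "2021", "?" (`word_tokenize` splits the possessive `'s` and the final question mark into separate tokens)
2. **POS Tagging**: Typically [("What", "WP"), ("was", "VBD"), ("Microsoft", "NNP"), ("'s", "POS"), ("revenue", "NN"), ("in", "IN"), ("2021", "CD"), ("?", ".")]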
3. **Intent Recognition**: The chatbot identifies the intent as querying for "Total Revenue".
4. **Entity Extraction**: Company = "Microsoft", Year = 2021
5. **Data Retrieval**: Looks up the total revenue for Microsoft in 2021 within the dataset.
6. **Response Generation**: "The total revenue for Microsoft in 2021 was $143 billion."

---

## Flask App Routes for User Interaction

The Flask framework enables our chatbot to communicate with users via a web interface. By defining routes in our Flask app, we can specify how the chatbot should handle incoming HTTP requests and generate responses. This section outlines the key routes that facilitate user interaction with our NLTK-enhanced chatbot.

#### The Home Route (`/`)

The home route is the primary entry point to our chatbot. When users visit this route, they are greeted with the chatbot's interface, which is rendered using an HTML template.

In [None]:
@app.route('/')
def home():
    return render_template('index.html')

The `index.html` template contains the HTML and JavaScript necessary for users to interact with the chatbot. It provides a text input field for users to enter their queries and a submit button to send these queries to the chatbot for processing.

#### The Chatbot Route (`/chatbot`)

The `/chatbot` route handles the core functionality of our chatbot: processing user queries and generating responses. It listens for POST requests, which contain the user's query in JSON format.

In [6]:
@app.route('/chatbot', methods=['POST'])
def chatbot_response():
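    # silent=True returns None instead of raising on a missing or malformed JSON body;
    # "or {}" then guarantees a dict, so the .get() call below is always safe.
    data = request.get_json(silent=True) or {}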
    user_query = data.get('query', '')
    response = nltk_enhanced_chatbot(user_query)
    return jsonify({'response': response})

Upon receiving a query, this route invokes the `nltk_enhanced_chatbot` function, passing in the user's query. The function processes the query using natural language processing (NLP) techniques, determines the intent, extracts relevant information, and retrieves the appropriate response from the dataset. The response is then returned to the user in JSON format, which the front-end interface displays.
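
The route can also be exercised without a browser or a running server by using Flask's built-in test client. The sketch below is self-contained and therefore builds its own small app: `demo_app` and `fake_chatbot` are hypothetical stand-ins for the notebook's `app` and `nltk_enhanced_chatbot`.

```python
from flask import Flask, jsonify, request

demo_app = Flask(__name__)


def fake_chatbot(user_query):
    """Hypothetical stand-in for nltk_enhanced_chatbot."""
    return f"You asked: {user_query}"


@demo_app.route("/chatbot", methods=["POST"])
def chatbot_response():
    # Tolerate a missing or non-JSON body by falling back to an empty dict.
    data = request.get_json(silent=True) or {}
    return jsonify({"response": fake_chatbot(data.get("query", ""))})


# The test client sends requests to the app in-process, with no network involved.
client = demo_app.test_client()
reply = client.post("/chatbot", json={"query": "Apple revenue 2022"})
print(reply.status_code, reply.get_json())
```

The same pattern should work against the notebook's own `app` object, which makes it a convenient way to check the JSON contract before opening the web interface.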


#### Interaction Flow:
Below is the interaction flow between the user, the Flask app routes, and the chatbot:

1. **Home route (`/`)**: Serves the chatbot interface when the user opens the app in a browser.
2. **User submits query**: The user types a query in the web interface and submits it.
3. **Chatbot route (`/chatbot`)**: Receives the user's query, processes it, and returns the chatbot's response.
4. **Display Response**: The web interface displays the chatbot's response to the user.

---

## Running the Flask App in a Jupyter Notebook

Deploying our Flask-based chatbot directly from a Jupyter Notebook presents unique challenges, chiefly how to run the Flask server in a way that doesn't interfere with the notebook's interactivity. By leveraging Python's `threading` module, we can start the Flask app in a separate thread, allowing both the server and the notebook to run concurrently. This setup is particularly useful for development and testing purposes.

#### Implementation

To achieve this, we define a `run_app` function that specifies the Flask app's running configuration. Then, we create and start a thread targeting this function. Here's how it's done:

In [7]:
def run_app():
    app.run(port=5001, debug=False, use_reloader=False)

flask_thread = Thread(target=run_app)
flask_thread.start()

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:5001/ (Press CTRL+C to quit)
127.0.0.1 - - [27/Mar/2024 09:43:04] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/Mar/2024 09:43:20] "POST /chatbot HTTP/1.1" 200 -


By setting `debug=False` and `use_reloader=False`, we ensure that the Flask app does not attempt to restart within the notebook environment, which could cause issues. The chosen port, `5001`, can be adjusted based on your preferences and any potential port conflicts on your machine.
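
If port `5001` might already be in use, a quick check before starting the server can save a confusing error. The following is a standard-library sketch; the helper names `port_is_free` and `pick_free_port` are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if the port can currently be bound on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
        return True


def pick_free_port(host="127.0.0.1"):
    """Ask the operating system for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        return probe.getsockname()[1]


print(port_is_free(5001), pick_free_port())
```

The check is only a snapshot: another process could claim the port between the check and the call to `app.run`, so treat the result as a hint rather than a guarantee.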

#### Best Practices and Considerations

- **Port Selection**: Ensure the chosen port is free and not blocked by other applications.
- **Debugging**: Running Flask in debug mode within a Jupyter Notebook is not recommended due to potential conflicts with the notebook server. Keep `debug=False` for stability.
- **Stopping the Server**: To stop the Flask server, you may need to manually interrupt the kernel or close the Jupyter Notebook. Be aware that the thread will continue to run until explicitly stopped or until the notebook session ends.

This setup is adequate for development, but for production deployment, Flask apps should be run in a more robust server environment, such as Gunicorn or uWSGI behind Nginx or Apache.
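
For a server that can be stopped without interrupting the kernel, one option is to build the server object with Werkzeug's `make_server` and call its `shutdown()` method. This is a sketch of an alternative to `app.run`, not the approach used in this notebook; `demo_app`, the `/ping` route, and the `ServerThread` class are hypothetical, and port `0` asks the operating system to pick a free port for the demonstration.

```python
import urllib.request
from threading import Thread

from flask import Flask
from werkzeug.serving import make_server

demo_app = Flask(__name__)  # stand-in for the notebook's app


@demo_app.route("/ping")
def ping():
    return "pong"


class ServerThread(Thread):
    """Run a WSGI server in a background thread that can be stopped cleanly."""

    def __init__(self, flask_app, host="127.0.0.1", port=5001):
        super().__init__(daemon=True)  # daemon: never blocks interpreter exit
        self.server = make_server(host, port, flask_app)

    def run(self):
        self.server.serve_forever()

    def stop(self):
        self.server.shutdown()  # ends the serve_forever loop
        self.join(timeout=5)


server_thread = ServerThread(demo_app, port=0)
server_thread.start()

port = server_thread.server.server_port  # the port actually bound
with urllib.request.urlopen(f"http://127.0.0.1:{port}/ping", timeout=5) as resp:
    body = resp.read()
print(body)

server_thread.stop()
print(server_thread.is_alive())
```

In the notebook, the same class could wrap `app` on port `5001`, with `stop()` called from a later cell when testing is finished.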

#### Testing the Chatbot

To interact with the chatbot:
1. Start the Flask app by running the cell above, which launches the server in a background thread.
2. Navigate to the home route on the web browser to access the chat interface.
3. Enter a query and submit it to see the chatbot's response.
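
The running server can also be queried from a notebook cell instead of the browser. Below is a minimal standard-library sketch; the helper names `build_chat_request` and `ask_chatbot` are hypothetical, and the default URL assumes the server is listening on port `5001` as configured above.

```python
import json
import urllib.request


def build_chat_request(query, base_url="http://127.0.0.1:5001"):
    """Build a POST request carrying the query as a JSON body."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chatbot",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_chatbot(query, base_url="http://127.0.0.1:5001", timeout=5):
    """Send the query to the /chatbot route and return the 'response' field."""
    request = build_chat_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Requires the Flask server to be running:
# print(ask_chatbot("What was Apple's revenue in 2022?"))
```

The `Content-Type: application/json` header matters because the route reads the body with `request.get_json()`.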

---

## Conclusion

We developed an NLTK-enhanced financial chatbot, leveraging the robust capabilities of Flask, pandas, and the Natural Language Toolkit (NLTK). 

**Key Achievements:**

- **Integration of NLP Techniques:** We integrated tokenization, part-of-speech tagging, and WordNet synonym lookup to improve our chatbot's understanding of user queries, allowing it to answer questions about a company's total revenue or net income for a given fiscal year.
- **Flask for Web Interactivity:** By utilizing Flask, we demonstrated how to effectively build and deploy a web-based chatbot, making sophisticated data analysis accessible through simple natural language queries.
- **Interactive Testing in Jupyter:** The innovative use of threading to run the Flask application within a Jupyter Notebook environment underscored the notebook's utility as a powerful tool for interactive development and testing.

**Learnings and Reflections:**

This project underscored the importance of clean, structured data and the power of NLP in extracting meaningful information from user input. It also showcased the challenges and solutions in deploying interactive applications directly from a Jupyter Notebook, providing valuable insights into the development workflow and debugging practices.

**Looking Ahead:**

- **Enhancing NLP Capabilities:** Future work could explore more sophisticated NLP models and techniques, such as named entity recognition (NER) and sentiment analysis, to further improve the chatbot’s understanding and responses.
- **Expanding the Dataset:** Incorporating a more extensive and diverse financial dataset would enable the chatbot to provide insights across a broader spectrum of companies and financial metrics.
- **Deployment and Scaling:** Moving beyond the notebook, deploying the chatbot on a cloud platform would be a crucial step towards scaling and making the application widely accessible.

**Closing Thoughts:**

This project shows the potential of combining NLP with web technologies to create intelligent, user-friendly applications. As we continue to refine and expand this chatbot, it illustrates how much data science and web development gain from working together.