This project implements a Visual Question Answering system using deep learning techniques in JavaScript. Users ask a question about an image, and the model answers based on the image's visual content.
Visual Question Answering (VQA) combines Computer Vision and Natural Language Processing so that a system can answer questions about the content of images. This project pairs a convolutional network, which processes the image, with a recurrent network, which processes the question text, to produce an answer.
Ensure that you have the following installed:
- Node.js (Version 14 or higher)
- npm (Node package manager)
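As a quick way to confirm the Node.js requirement above, the following sketch reads the running version and compares it with the stated minimum of 14. The helper name `nodeMajorVersion` and the constant `MIN_MAJOR` are illustrative and not part of this repository.

```javascript
// Minimal sketch: check that the running Node.js meets the stated minimum (14).
// MIN_MAJOR and nodeMajorVersion are hypothetical names used for illustration.
const MIN_MAJOR = 14;

// Extract the major version from a string such as "18.17.1".
function nodeMajorVersion(versionString) {
  return Number.parseInt(versionString.split(".")[0], 10);
}

const major = nodeMajorVersion(process.versions.node);
if (major < MIN_MAJOR) {
  console.error(`Node.js ${MIN_MAJOR}+ is required, found ${process.versions.node}`);
  process.exitCode = 1; // signal failure without aborting immediately
} else {
  console.log(`Node.js ${process.versions.node} satisfies the requirement.`);
}
```

Saved to a file, this can be run with `node` before installing dependencies; npm ships with Node.js, so a recent Node.js install normally covers both prerequisites.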
To install the project:
- Clone this repository:
  git clone https://github.com/subratadasGit/Visual_Question_Answering.git
- Navigate to the project directory:
  cd Visual_Question_Answering
- Install the required dependencies:
  npm install
To run the model:
- Place your image in the images/ directory.
- Run the VQA script:
  node vqa.js --image path_to_image --question "Your question here"
- The model will output an answer based on the image and question provided.
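The command above passes two options, `--image` and `--question`. How `vqa.js` actually reads them is not shown here, so the following is only a sketch of one way such options could be collected and validated in Node.js. The function name `parseVqaArgs` and the sample file name `images/cat.jpg` are placeholders.

```javascript
// Hypothetical helper: collects `--name value` pairs from an argv-style array.
// The real vqa.js may parse its arguments differently.
function parseVqaArgs(argv) {
  const options = {};
  for (let i = 0; i < argv.length; i++) {
    const token = argv[i];
    if (token.startsWith("--")) {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`Missing value for ${token}`);
      }
      options[token.slice(2)] = value;
      i++; // skip the value that was just consumed
    }
  }
  // Both options are needed to ask a question about an image.
  for (const required of ["image", "question"]) {
    if (!(required in options)) {
      throw new Error(`Missing required option --${required}`);
    }
  }
  return options;
}

// In a script this would typically be parseVqaArgs(process.argv.slice(2)).
const example = parseVqaArgs([
  "--image", "images/cat.jpg", // placeholder path
  "--question", "What animal is this?",
]);
console.log(example);
```

Quoting the question on the command line matters: without the quotes, the shell splits it into several arguments and only the first word would be read as the value of `--question`.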
The Visual Question Answering model combines Convolutional Neural Networks (CNNs) for image feature extraction and Recurrent Neural Networks (RNNs) for processing the textual input (questions).
- CNN: Extracts features from the image.
- RNN: Encodes the question; the answer is then predicted from this encoding together with the extracted image features.
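To make the two-branch idea concrete, here is a toy sketch of how an image feature vector and a question encoding might be combined to pick an answer. It does not reproduce this project's model: the vectors, weights, and answer list are made-up numbers, and the element-wise product is just one common fusion choice assumed here for illustration.

```javascript
// Toy sketch only: no trained CNN or RNN is involved, and all values are invented.

// Fuse two same-length feature vectors with an element-wise product (an assumed choice).
function fuse(imageFeatures, questionEncoding) {
  if (imageFeatures.length !== questionEncoding.length) {
    throw new Error("Feature vectors must have the same length");
  }
  return imageFeatures.map((value, i) => value * questionEncoding[i]);
}

// Convert raw scores to probabilities (shifted by the max for numerical stability).
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Score each candidate answer with one weight row, then pick the most probable.
function predictAnswer(imageFeatures, questionEncoding, answerWeights, answers) {
  const fused = fuse(imageFeatures, questionEncoding);
  const scores = answerWeights.map((row) =>
    row.reduce((acc, w, i) => acc + w * fused[i], 0)
  );
  const probabilities = softmax(scores);
  const best = probabilities.indexOf(Math.max(...probabilities));
  return { answer: answers[best], probabilities };
}

const imageFeatures = [0.9, 0.1, 0.4];    // stand-in for CNN output
const questionEncoding = [1.0, 0.2, 0.5]; // stand-in for the RNN's final state
const answers = ["cat", "dog"];
const answerWeights = [
  [2.0, 0.0, 1.0], // weights for "cat"
  [0.0, 2.0, 0.0], // weights for "dog"
];

const result = predictAnswer(imageFeatures, questionEncoding, answerWeights, answers);
console.log(result.answer, result.probabilities);
```

In a real system the two input vectors would come from trained networks and the weight matrix would be learned; the sketch only shows the data flow from two encodings to a single answer.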
This project is licensed under the MIT License - see the LICENSE file for details.
- Authors: Subrata Das, Rimi Sreemany
- Email: subratadasgit786@gmail.com, rimisreemany21@gmail.com